Some Myths of Writing Networked Multiplayer Games

Networked games use the internet, and the difficulties of making these games evolve on Internet Time, which means that the articles people wrote as recently as a year ago on how to make a networked or multiplayer game are already out of date. Most of the literature is more than 5 years old, and some as much as 10 years old – hopelessly out of date in the modern world of internet and online gaming.

Anyway, to get you thinking (I’m not providing definite answers here, just some stuff to make you think more carefully about how you’re doing your networking), here are some common rules that perhaps no longer apply the way they used to:

I’ve been interviewing candidates recently for Network Programmer roles at NCsoft, one of the world’s largest developers of online games, so we look for the best people (and are usually quite good at spotting them). As an experiment, among the normal interview questions I ask, I started asking some ultra-simple questions which have famous “correct” answers that are no longer true. Interestingly, this showed that a great many applicants were still parroting answers they’d read in their undergraduate course-notes or in old books, or had discovered themselves 5 years ago – but apparently hadn’t done any real programming since, because they hadn’t noticed how things have already changed.

What does this mean? Well, for a start, some extremely powerful techniques in network development which people could actually use today are probably being ignored by many developers because “received wisdom” suggests they simply cannot work. It also means I’m going to have start thinking up some new interview questions, of course, because now everyone coming for interview is going to Google this page and change their answers accordingly…

Myth #1: Client bandwidth is the bottleneck

I moved house at the start of this year. I tried to get a cheap, relatively fast, internet connection. 1 Mbit would be fine, I wasn’t going to use it much at home (I already had mobile internet).

The slowest anyone was willing to sell me was 8 Mbits, and the fastest 24 Mbits. Overkill? Well … those DVDs everyone’s downloading these days are pretty good at soaking up all the bandwidth you can throw at them, I guess…

Of course, I could pay more, and get less bandwidth (there’s a few sharks out there), but that’s just stupid.

And yet … most multiplayer games from the late 90’s through to the mid 2000’s were carefully optimized to get their maximum bandwidth usage per client down to around 1-2 KB/s, i.e. 8-16 Kbits. Or, to put it another way, around 1/3000th of the bandwidth I was offered for the grand sum of £24 a month.
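To put numbers on that gap, here’s a quick back-of-the-envelope sketch (Python, using the figures from the text – 1 KB/s taken as 8192 bits/s):

```python
# Back-of-the-envelope: how big is the gap between what a carefully
# optimized late-90s game sends and what a modern connection offers?
game_rate_bits = 8 * 1024           # ~1 KB/s per client, the old optimization target
isp_rate_bits = 24 * 1000 * 1000    # 24 Mbit/s home broadband

ratio = isp_rate_bits / game_rate_bits
print(f"the ISP link is roughly {ratio:.0f}x the optimized game rate")
```

Depending on whether you count in powers of two or ten, that comes out at roughly the 1/3000th figure above.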

Myth #2: The network is orders of magnitude “slower” than the rest of the computer: the server itself will never be the bottleneck

Another one that crops up often as a rule of thumb – you’ll be lucky to be getting 100 messages a second from your network connection, whilst your PC’s processor runs at 1 GHz, which means that in the same period of time it can execute roughly one billion individual instructions. One billion. That’s quite a lot more than one hundred.

Of course, over the LAN you’ll get a lot more than a hundred messages, but even with a decent fast hub and 100 Mbit ethernet you’ll probably max out at somewhere around ten thousand or so per second. That’s still around a hundred thousand times fewer (“slower”) than your CPU’s instruction rate.

Now at this point you’ll have spotted the deliberate mistake there, and be saying “but no-one uses hubs these days, switches are now so cheap you can’t even buy hubs any more,” (clever you) “and we’ve got gigabit ethernet as standard on all the motherboards – you probably can’t even BUY a 100 Mbit network card any more!”. Right. So … the network can theoretically handle ten times as much traffic (wow!), and in practice probably more like twenty times as much now we’ve got rid of those shoddy hubs. Then again … nowadays it’s not the crapness of Ethernet that makes it struggle to hit its rated speeds, it’s usually the fact that your network card is “too” cheap, and you can’t buy a decent one because there’s no market left for them (oh, how ironic!).

Still, we’ve definitely got an order of magnitude more bandwidth, and improvements to the OS IP stacks mean we’ve probably got a fairly good improvement in latencies and the handling of vast numbers of packets on the wire. But we were talking about a factor of a hundred thousand or more, so even a few orders of magnitude of improvement is still a total irrelevance.
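The comparison above can be made concrete with a little arithmetic (a deliberate simplification: one instruction per cycle, figures from the text):

```python
# Rough sketch of the "slower" comparison: how many instructions the CPU
# can retire per network message received. 1 GHz, one instruction/cycle
# (a deliberate simplification -- real CPUs retire several per cycle).
cpu_instructions_per_sec = 1_000_000_000
internet_msgs_per_sec = 100      # typical internet message rate
lan_msgs_per_sec = 10_000        # typical fast-LAN message rate

print(cpu_instructions_per_sec // internet_msgs_per_sec)  # per internet message
print(cpu_instructions_per_sec // lan_msgs_per_sec)       # per LAN message
```

Ten million instructions per internet message, a hundred thousand per LAN message – that’s the gap we’re talking about.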

Hmm. Seems I’m arguing that the network really IS a lot “slower” than your computer. Except … the suspicious amongst you, Dear Reader, will have noticed the quotes I keep putting around the word slower. Yeah. There’s a reason for that.

When you’re writing games people always talk about the number of messages they want to send per second, and ask if it’s reasonable. And in any online game, you’re never going to care about total bandwidth because you’re only sending tiny amounts of information around (c.f. the previous point about games getting down to a few kilobytes of traffic per second). But that’s the problem: mixing the two concepts of latency and bandwidth into one rating of “speed”.

When it comes to the server, the “total” latency it’s experiencing does NOT increase or decrease when you increase or decrease the number of connected clients (well, it does, a very very small amount), but the total bandwidth it’s experiencing DOES increase – linearly, in fact. And if we’ve taken any advantage of Myth #1 up above, and started letting our players transfer stuff like streaming movies of themselves to each other in-game, then potentially every new client is going to add their total ISP-provided bandwidth to the mix.
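That linear scaling is trivial to sketch (hypothetical per-client figures, in the ballpark of the numbers above):

```python
def server_bytes_per_sec(clients, per_client_bits_per_sec):
    """Aggregate server-side bandwidth grows linearly with client count."""
    return clients * per_client_bits_per_sec // 8

# Classic optimized game: ~8 Kbit/s (1 KB/s) per client.
print(server_bytes_per_sec(1_000, 8_000))    # 1,000,000 bytes/s -- trivial
print(server_bytes_per_sec(10_000, 8_000))   # 10x the clients -> exactly 10x the bandwidth
```

The latency picture barely moves as you add clients; the bandwidth picture is a straight line upwards.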

For a while now there has been something of a problem here with bandwidth usage per client, which is just the raw cost of the ISP bandwidth bill for your server. For example, back in 2002, even with just 10,000 players, the typical small MMOG could be paying a six-figure sum in annual bandwidth bills. But costs have come down, especially if you buy in bulk, so that’s not such a big problem now.

The problem nowadays is not “how much can the internet handle?”, and not always “how much will it cost you to get that much internet traffic?”, but sometimes just “can your server cope, physically, with the volume of data you’re trying to pump in and out of it every second?”.

1,000 players each saturating their 24Mbit home broadband connections?

That’s 3 gigabytes (bytes, not bits) of data. Every second.

Phew. Well…that means we’ll need to stick 6 x 4-way gigabit ethernet cards in the server. That’s probably OK, they only cost a few hundred dollars each these days.

But … um … how much bandwidth is there INSIDE the server? Have you looked at the bandwidth of modern RAM these days? It peaks out at around 10 GB/s – enough that the sheer volume of your network traffic is seriously interfering with the PC’s ability just to function as a processing system. And what about the peripheral bus, which sits between you and the CPU/RAM? A nice fast PCIe x16 slot manages a mere 4 GB/s, only just enough to do the basic data-transfer (and this assumes the computer isn’t doing anything else except transferring data – obviously we want it to do actual processing too!).
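Working through those numbers (figures from the text: 1,000 clients saturating 24 Mbit/s each, PCIe x16 at ~4 GB/s):

```python
# How much of the server's internal bandwidth does that traffic consume?
clients = 1_000
per_client_bps = 24_000_000                        # 24 Mbit/s each, fully saturated
aggregate_bytes = clients * per_client_bps // 8    # aggregate bytes/sec hitting the server

pcie_x16_bytes = 4_000_000_000                     # ~4 GB/s PCIe x16 figure from the text
print(aggregate_bytes)                             # 3,000,000,000 -- the 3 GB/s above
print(aggregate_bytes / pcie_x16_bytes)            # 0.75 -- three quarters of the bus, gone
```

And that 75% is just getting the bytes across the bus once – before the CPU has done anything useful with any of them.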

Myth #3: Server Clusters should use TCP internally

Pretty much all multiplayer games these days do some clustering on the server side, and it’s a strange world of its own.

(do you think that because you’re not doing an MMO you don’t care about clusters? Well, tell me: exactly HOW is your multiplayer racing game (or FPS) going to allow people to start new games, hmm? A lobby-server, perhaps? Maybe even a login-server, that stores stats for each individual player? Maybe those need to talk to each other, and to the game-server itself?)

Although I hate it when people (usually forum posters ;)) insist that games should all use UDP “because it’s faster” or some such rubbish (define “faster”, please…), the opposite seems to be true with server clusters. Time and again people go with TCP, apparently without considering whether UDP might be better for them. (see? I said “might be” there. You won’t find any gross sweeping generalizations here! Um. Ahem.)

It’s easy to see why people go for TCP: you want the thing that most simplifies your server architecture, because, damn it!, it’s hard enough making the bloody thing work in the first place – you have more than your fair share of headaches to deal with already, what with race conditions, object persistence, transactional integrity, security holes, etc.

Actually, I suspect it’s as much because cluster-focussed middleware is usually TCP based (think of all the little network tools and OS features that go together to make up your cluster, and how many of them are TCP based as opposed to UDP? How many of them give you a choice what to use as the stream provider?) and that developers just go with the flow.

A lot of the UDP vs TCP debate evaporates inside a server cluster. In particular, there’s no routers, no DSL modems, no switches all doing weird and horrible things to your traffic (TCP header compression? UDP delayed by higher QoS traffic? etc). So it’s a bit simpler.

But, most obviously, *there’s no such thing as packet loss*. So … UDP is suddenly a “reliable protocol” (whoa!). But, also, TCP suddenly has no artificial delays on received packets due to out-of-order reception (double-whoa!). Then again, that means that UDP is ALSO going to receive packets in-order.

(NB: that’s not really true, unless your routers and switches and OS’s behave themselves. But … with the cluster, you get to buy those bits yourself, so you have – in theory – the option to MAKE it true)
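As a quick sanity check of the in-order claim, here’s a loopback sketch in Python – localhost standing in for a network you fully control. This is a demonstration of typical behaviour on a well-behaved path, not a guarantee the protocol itself makes:

```python
import socket

# On a path you fully control (here, just loopback), UDP datagrams
# normally arrive complete and in order -- the "reliable UDP" effect
# described above. A demonstration, not a protocol guarantee.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))    # let the OS pick a free port
recv_sock.settimeout(2.0)           # don't hang forever if a datagram does vanish
addr = recv_sock.getsockname()

send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for i in range(10):
    send_sock.sendto(str(i).encode(), addr)

received = [int(recv_sock.recvfrom(64)[0]) for _ in range(10)]
print(received)
send_sock.close()
recv_sock.close()
```

Run it across real cluster hardware – your own switches, your own NICs – and you’re verifying exactly the property the NB above says you get to engineer for yourself.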

So … what’s wrong with TCP? Well, the one bit that’s really wrong with it is this:

                                 UDP                                TCP
Lossy network (internet)         no special support, sucks badly    works great, designed especially for this environment
…extra code you need to write?   lots                               none

Private LAN (cluster)            Just Works ™                       Just Works ™
…extra code you need to write?   none                               none
i.e. UDP does “nothing special”, it just sends data, but TCP has loads of “extra stuff” going on behind the scenes, including an 11-state FSM (Finite State Machine), that’s just sitting there hogging resources and adding extra delay onto things like opening and closing connections. That’s all wasted effort – You Ain’t Gonna Need It (so don’t use it).

Sadly, poor old TCP suffers from the exact same problem it always suffers from in games development – nothing is optional. Which means you HAVE to use all that stuff, even though it’s doing you no benefit.

So, TCP is simply at least as expensive for clusters (uses more resources) than UDP – and usually more so – without really helping. And in a lot of cases, that causes problems as you scale up your number of clients, or number of simultaneous matches.
