(One of a series of posts about converting Unity3D into a 1st-class Entity Component System. Read others here)
It’s a year and a half since I wrote about choosing Data Structures for high-performance Entity Systems in A-AAA games. I always had a followup post in mind, but never wrote it down (thanks largely to the demise of #AltDevBlog). These days I’m too busy to write such big articles, but it’s long overdue … This post is a shorter version of what I had planned, but should have enough to get you going in the right directions.
Recap … what we concluded last time
If you haven’t already, then you should read the previous post on Data Structures. Diagrams and everything! But to save time, here’s a recap:
- A medium/large desktop game will have:
- 100’s of Component types
- 10,000’s of pre-made templates for game-objects
- 100,000’s of entities in live game
- 100’s of bytes storage per Component
- Contiguous memory storage is the holy grail
- Processors usually read MULTIPLE Components at once, per-Entity
- It’s easy to make memory optimal for any one set of Components, but almost impossible for all the different combinations of Components that you see on your Entities
- We can achieve a lot by messing around with only small differences in how we fill / layout our Arrays in memory
- Versioning (keeping multiple copies of the same variable at once) was completely ignored, other than to say “we’ll need it, and none of these approaches help at all”
- Fragmentation is a recurrent nightmare
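To make the “contiguous memory” and “multiple Components per Entity” points concrete, here’s a minimal struct-of-arrays sketch. It’s Python for brevity (the post’s target is C#, but the idea is language-agnostic), and every name in it is illustrative:

```python
# Minimal struct-of-arrays sketch: each Component type owns its own
# contiguous arrays, indexed by a dense slot; an Entity id maps to a slot.
from array import array

class PositionStore:
    def __init__(self):
        self.entity_ids = []   # slot -> entity id
        self.xs = array("f")   # contiguous floats: cache-friendly iteration
        self.ys = array("f")

    def add(self, entity_id, x, y):
        self.entity_ids.append(entity_id)
        self.xs.append(x)
        self.ys.append(y)
        return len(self.entity_ids) - 1  # slot index

    def iterate(self):
        # A Processor walks the arrays linearly - no pointer-chasing.
        for slot, eid in enumerate(self.entity_ids):
            yield eid, self.xs[slot], self.ys[slot]

store = PositionStore()
store.add(1, 0.0, 0.0)
store.add(2, 3.0, 4.0)
print(list(store.iterate()))
```

The pain described above starts the moment a Processor needs to walk *two* such stores at once for the same Entities.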
That article focussed on the “core” / basic questions of memory-management for Entity Systems; this one focusses on the networking and multithreading issues for memory-management. These features are a huge advantage whether or not you do any networking, but networking gives me a source of concrete examples for how/why/where you’d want them in-game.
Data needs: going beyond a single thread
How do you implement good quality real-time multiplayer networking in a computer game? I used to be an expert on this … a decade ago. How it’s done today? I’ve no idea; I get the impression it’s much the same: the main limitations 10 years ago were “the internet isn’t perfect” and “nothing can go faster than the speed of light”. I don’t think those have changed?
Networking: takeaways (not just from Q3)
I used to be a Network Programmer in a AAA studio. Off the top of my head, these are the generic requirements that most/all game-networking subsystems have of their data / memory management:
- Changes to game-state are stored efficiently (less RAM = fewer bytes over internet = faster responsiveness)
- Game-state (and changes to game-state) are easy to serialize/deserialize at low CPU time, and work for all current and future game-logic/game-classes, without ever rewriting the code
- Server and Client easily hold many different copies of the complete game-state at any one time
- All changes to game-state are tracked, no matter which code in which thread on which CPU made the actual change (so you can diff it, both client and server)
- The networking layer knows the “true” game-state; all other parts of game (rendering, physics, AI) are using multiple, different, deliberately incorrect copies of the game-state. This e.g. lets rendering hide rubber-banding by faking on-screen positions of players.
- …TBA: probably several others I’ve forgotten…
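For the first two requirements (small change-sets, generic serialization), here’s a hypothetical sketch of diffing two game-state snapshots into exactly the data that would go over the wire; the `{entity: {component: value}}` shape and all names are made up for illustration:

```python
# Hypothetical sketch: game-state as {entity_id: {component: value}};
# diff two snapshots into the minimal change-set (plus deletions).
def diff_state(old, new):
    changes = {}
    for eid, comps in new.items():
        old_comps = old.get(eid, {})
        delta = {k: v for k, v in comps.items() if old_comps.get(k) != v}
        if delta or eid not in old:
            changes[eid] = delta     # new entity, or changed Components only
    removed = [eid for eid in old if eid not in new]
    return changes, removed

old = {1: {"pos": (0, 0), "hp": 100}, 2: {"pos": (5, 5)}}
new = {1: {"pos": (1, 0), "hp": 100}, 3: {"pos": (9, 9)}}
changes, removed = diff_state(old, new)
print(changes)   # only entity 1's pos changed; entity 3 is new
print(removed)   # entity 2 was deleted
```

A real implementation would serialize `changes` into a compact byte format, but the point stands: less state changed means fewer bytes sent.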
For threading, we want:
- As much data as possible is read-only. Ideally complete data-structures.
- With read-only data, many CPU’s can access at once at full speed without having to write any MT code
- Mistakes in MT code are some of the most expensive to debug in programmer-time. You can waste literally months solving simple problems – months you couldn’t afford!
- Our MT code needs to be idiot-proof.
- As much as possible, our MT structures and methods should be “invisible”: they happen automatically, without us ever having to “remember” special rules when writing game-logic
- Different subsystems can and should run at different frequencies.
- Rendering at 60FPS? Cool! But your AI shouldn’t, your Physics doesn’t have to, etc
- You can buy a lot of performance (when you’re having trouble optimizing your frame rate) simply by running some things slower
- When debugging, it can be a huge advantage to slow down (or stop/pause) some systems around the key point, while others keep running. The volume of debug data you’re watching becomes smaller and more (humanly) manageable.
- Unity sucks at MT code
- If you ever wonder “does Unity have a weak/crappy core architecture?” look no further than this: Unity3D in 2015 still cannot run on multiple threads
- You can work around it; we will work around it. But it’s likely to cause recurring challenges.
Solution 1: Ctrl-D the Universe
Long story short: one interesting possible approach is to combine all the above and re-think our arrangement of memory at a more abstract level. Instead of “storing our data in a (set of) data structures”, let’s “store multiple overlapping copies of the same data in different places in memory”.
- When you spin-up a new thread/CPU core: Duplicate the current game-state, give it to that thread, … Profit! (no MT code to worry about, it can get running immediately)
- Networking can hold different copies of the game-state based on “time” and “ACKnowledged server/client state”
- BONUS: you can cache frequently-used Component-lookups (e.g. “all entities with a Physics Component and a Render Component”), especially read-only ones, into tightly-packed, highly optimized data-structures
(That last one should be ringing some bells from the previous post … best of all worlds, yes?)
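That cached-lookup bonus might look like this sketch: given a read-only Copy, compute the join “all entities with a Physics Component AND a Render Component” once, into a tightly-packed list. All names here are invented for illustration:

```python
# Sketch: per-Component-type storage; a cached "join" of the entities
# that have BOTH Components, computed once for a read-only Copy.
physics = {1: {"velocity": (1, 0)}, 2: {"velocity": (0, 1)}, 4: {"velocity": (2, 2)}}
render  = {1: {"sprite": "ship"}, 3: {"sprite": "rock"}, 4: {"sprite": "ufo"}}

# Because the Copy is read-only, this cache can never go stale.
physics_and_render = sorted(physics.keys() & render.keys())
print(physics_and_render)  # the tightly-packed entity list Processors iterate over
```

In a real engine you’d pack the joined Component data itself contiguously, not just the ids, but the invariant is the same: read-only input means the cache is computed once and trusted forever.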
Using it: API for the data-structures
Originally, our Data Structures were really simple – just an array. The average programmer could easily remember how to use it. Then it became a set of similar arrays. Then a bit more funky array with internal state. Now … now, it’s getting complicated. To keep it idiot-proof (and let’s be honest: all programmers do something idiotic, sooner or later. Where do you think bugs come from? :)), we need a small, well-defined API in front of it all.
Quick and dirty needs for our API:
- Get me a (set of) data structures that represent a new Copy of the Universe
- Give me a customized Copy
- …give me only specific Components, from specific Entities
- …give me the Copy as at a specific moment in time
- Merge my changed Copy of the Universe with other people’s cop(ies)
- Update mine by applying changes from N specific copies with mine
- Update all of theirs by sending my changes out to N different other copies
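Sketched as code, that wish-list might become an interface like the following. Everything here is hypothetical (class and method names invented for illustration, state stored as plain dicts for brevity):

```python
# Hypothetical API sketch for "Copies of the Universe".
class Universe:
    def __init__(self, state=None):
        self.state = state or {}   # {entity_id: {component_name: value}}

    def copy(self):
        """A new, independent Copy of the whole Universe."""
        return Universe({eid: dict(c) for eid, c in self.state.items()})

    def copy_filtered(self, component_names):
        """A customized Copy: only specific Components, from the Entities that have them."""
        out = {}
        for eid, comps in self.state.items():
            kept = {k: v for k, v in comps.items() if k in component_names}
            if kept:
                out[eid] = kept
        return Universe(out)

    def merge_from(self, other):
        """Apply another Copy's changes over ours (naive last-write-wins)."""
        for eid, comps in other.state.items():
            self.state.setdefault(eid, {}).update(comps)

master = Universe({1: {"pos": (0, 0), "hp": 100}})
view = master.copy_filtered({"pos"})       # e.g. a Physics thread's slice
view.state[1]["pos"] = (5, 5)              # thread mutates its own Copy freely
master.merge_from(view)                    # ...then merges back into the master
print(master.state[1])
```

The “as at a specific moment in time” requirement is deliberately missing here; it falls out naturally once Copies are stored as deltas, below.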
In practice, this forces us into a couple of architecture-design activities. We’ll have to codify our main Data Structure for lists-of-components, we’ll have to define our filtering/search parameters – and the algorithms we’ll support for filtering. We’ll also need a representation of “which” copy of the universe is which, and some concept of a “master” copy that all others merge into / update from.
Thinking ahead, the easiest implementation strategy will be to break this down into “deltas”. Each Copy will be stored / defined as “how it is different from a reference/base Copy”. This makes a lot of the operations above much easier to implement, and it’s also a highly efficient way to store, debug, and act upon the complex data.
Start thinking about your favourite SCM (Git? Mercurial? SVN?), and how it’s implemented; such knowledge will come in useful
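Stored as deltas, a Copy looks a lot like a commit: a reference to its base, plus only the changes. A minimal sketch (illustrative names; a real version would delta whole Components, not single values):

```python
# Sketch: each Copy is (reference to base, delta). Resolving a value
# walks the delta chain back toward the root - like commits in a SCM.
class DeltaCopy:
    def __init__(self, base=None):
        self.base = base     # parent DeltaCopy, or None for the root
        self.delta = {}      # {(entity_id, component): value}

    def set(self, eid, comp, value):
        self.delta[(eid, comp)] = value

    def get(self, eid, comp):
        copy = self
        while copy is not None:
            if (eid, comp) in copy.delta:
                return copy.delta[(eid, comp)]
            copy = copy.base
        raise KeyError((eid, comp))

root = DeltaCopy()
root.set(1, "pos", (0, 0))
frame2 = DeltaCopy(base=root)     # cheap: stores only what changed
frame2.set(1, "pos", (1, 0))
print(root.get(1, "pos"), frame2.get(1, "pos"))
```

Note that `root` still answers with its own value: old Copies are untouched by new ones, which is exactly what the networking layer’s per-Client ACKnowledged states need.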
Basic implementation 1: Events
Conceptually, the easiest way to implement this is for every change to any piece of data to be stored as an Event. Your storage of data is simply storage of lists of events.
This is grossly inefficient. To find out “what’s the Position of object X?” your CPU has to iterate over every change that has ever happened to X’s position, and apply each change, one by one … all to find out one single value.
You could do it this way; let’s not even go there.
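For concreteness, here is the naive event-sourced store, sketched with illustrative names. Note the read path: one lookup scans the entire history:

```python
# Sketch of the naive event-sourced store: the ONLY storage is an
# append-only log, so every read replays the full history.
events = []  # list of (entity_id, component, value)

def set_value(eid, comp, value):
    events.append((eid, comp, value))

def get_value(eid, comp):
    # O(total events) per read: walk everything that ever happened.
    current = None
    for e, c, v in events:
        if e == eid and c == comp:
            current = v
    return current

for frame in range(1000):          # 1000 frames of movement...
    set_value(1, "pos", (frame, 0))
print(get_value(1, "pos"))         # ...and one read scans all 1000 events
```

Event logs are still worth keeping around as a *secondary* structure (for replay and debugging), just not as the primary store.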
Implementation 2: Copy on Write
Another simple approach is for each Data Structure to wrap two internal DS’s:
- Base copy (All Components for all Entities)
- Changed subset (Only the Components that have changed, only for the Entities that have changed)
This has a few benefits:
- Fetching “the current value of variable V on Component C” is a single lookup: directly access that Component from the Changed subset
- Merging this back onto the original you copied from is fairly efficient: iterate across the Changed set and clobber the original’s Components, via direct RAM copy / array-copy
- It’s not hugely optimal: you’re copying probably tiny pieces of RAM at once, rather than big, contiguous sections
- Assessing “how much” has changed is trivial and fast: count the size of the Changed set
- Memory usage for a Copy is at worst twice the original (so three times in total, counting the original itself), but on average a lot less
- …unless your use-case for this particular Copy is to change EVERYTHING (e.g. the Movement Processor looping over every Position Component and changing every single one of them, adding their .velocity to their .position)
- We can safely delete Entities and Components in one thread, and it won’t cause any dangling-pointers or memory leaks elsewhere
- …but merging two Copies where one has deleted the Component and the other hasn’t … will require some careful design, and careful coding
There’s still a notable weakness: if you want to store many Copies that differ by a small amount – e.g. a Server storing the ACKnowledged game-state for each Client – you’ll end up wasting huge amounts of RAM for the spammed “Base” copies.
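The two-part structure above (Base copy + Changed subset) might be sketched like this; names are illustrative, and the Base is treated as read-only until merge time:

```python
# Sketch of the copy-on-write wrapper: reads check the Changed subset
# first; writes only ever touch the Changed subset; merging clobbers
# the Base's Components with the Changed set.
class CowCopy:
    def __init__(self, base):
        self.base = base      # {(entity_id, component): value} - read-only until merge
        self.changed = {}     # only what this Copy has written

    def get(self, eid, comp):
        key = (eid, comp)
        return self.changed.get(key, self.base.get(key))  # single lookup path

    def set(self, eid, comp, value):
        self.changed[(eid, comp)] = value

    def merge_into_base(self):
        # Iterate the Changed set and clobber the original's Components.
        self.base.update(self.changed)
        self.changed.clear()

base = {(1, "pos"): (0, 0), (2, "pos"): (5, 5)}
copy = CowCopy(base)
copy.set(1, "pos", (1, 0))
print(copy.get(1, "pos"), copy.get(2, "pos"))  # changed value vs. fallthrough
copy.merge_into_base()
print(base[(1, "pos")])
```

Counting `len(copy.changed)` before the merge is the “how much has changed” metric from the benefits list, for free.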
There are plenty of other possible implementations; I leave them as an exercise for the reader ;). Feel free to suggest in the Comments below (as noted at the top, this post isn’t ideal – I don’t have time to do an exhaustive look at the topic. But this should have got you started…)
I’m off to try this out in C#, and see what challenges arise. In the meantime …