Epic saga of Unity3D deserialization of arbitrary Objects/Classes

(a summary writeup of the things I tried – based on this Unity blog post: blogs.unity3d.com/2014/06/24/serialization-in-unity/ – to get Unity 4.5 to correctly serialize/deserialize simple object graphs)

This is not easy to read, it’s bullet points summarising approximately 2 weeks of testing and deciphering (plus a month or so spent learning-the-hard-way the things that are in the Unity blog post above. Note that the blog post only came out very recently! Until then, you had to discover this stuff by trial and error)

Serializable-thing:

– example on blog won’t work if e.g. your Class is an object graph.
– e.g. class Room { List doors; }. class Door { Door otherDoor; }
– …can’t be serialized. Can’t be stored as a struct, unless you have a GUID per “Door” instance. This is standard in most serialization systems.
– So we go on a quest to create a GUID…

GUID1:

– class Door : ISerializeCallback { Door otherDoor; public string guid = Guid.NewGuid().ToString(); /* use OnBefore / OnAfter callbacks to store the guid of the other door / find the door from guid */ }

Unity inspector lists:

– if you add an item to the list, 2 things corrupt your data:
1. all existing objects are NOT MOVED, but are CLONED into new list. This breaks our GUID.
2. the new item is a carbon-copy clone of the old – so it ends up with the SAME GUID (ouch)

GUID2:

– class Door { [SerializeField]private string _guid = Guid.NewGuid().ToString(); }
– now the inspector no longer clones the guid … BUT … instead it sets it to null.
– it seems to run the constructor, then use serialization to delete all values it finds

GUID3:

– class Door { private string _guid = Guid.NewGuid().ToString(); public string serializedGuid; /** use serialize callbacks to maintain serializedGuid */ }
– Disable the Editor’s list-renderer: it has at least two major bugs that corrupt data. Use the UnityEditorInternal.ReorderableList.
– NB: ReorderableList – with no customization – does not have either of the two bugs I saw with built-in Inspector’s lists.
– NB: as a bonus, with RL, we can avoid the “run constructor, delete all data, deserialize, reserialize” cycle, and directly create a new, blank instance with a safely generated – FRESH! – Guid.

Ctrl-D in Editor:

– breaks this, because we have no idea if a deserialize event comes from “ctrl-d” or from “read back from disk” (or from a prefab instantiating into the scene).

GUID4:

– class Door { public Room serializedRoom; private string _guid; /** use callbacks to peek into the serialized room, and use the guid as a “local ID within this room” */ }

THIS WORKS! Until you restart Unity:

– restarting unity causes this to deserialize on a different thread, and now you’re NOT ALLOWED to peek into the serialized room. You cannot use it as a substitute ID. You get crashes at load-time).
– …apart from that, this works fine. We have a solution that worked for ctrl-D, for prefabs.

GUID5:

– class Door { public Room serializedRoom; private bool _isUnityStillOnBackgroundThread; /** use the bool to say “deserialize happened but the data is NOT VALID, DONT RUN ANY METHODS” */ }
– …also use an AutoRun script with [InitializeOnLoad] and “EditorApplication.update += RunOnce;”
– …the autorun script waits for Unity to startup, then goes and Finds all objects, and finds all Rooms, all Doors — and does the fixup.

Fail, but can’t be debugged:

– I have no idea why this failed. It appeared perfect. Simply moving code that worked inside the deserialize (but crashed because it was run on “wrong” thread) out into the post-launc RunOnce method … silently fails. No errors, but no data.
– NB: this is impossible to debug, because we can’t even call ToString() on the deserialize method :(.

Conclusion

Give up. Make 10,000′s of Component instances instead. Waste memory, kill performance (especially at runtime: imagine how bad the AI’s going to be iterating over this!) but hey – IT WILL WORK. Because it leans on the built-in hacks that MonoBehaviour has had for years (which we’ve all been using millions of times a day without realising / appreciating ;)).

Debugging A* Pathfinding in Unity

I needed fast, efficient, … but powerful, adaptable … pathfinding for my hobby project. My minions have to fly, walk, crawl, teleport, tunnel their way across a procedural landscape. Getting to that point has been tricky, but the journey’s been interesting.

We want something that intelligently routes creatures around obstacles, and picks “shallow” routes instead of scaling cliffs (or vice versa for Giant Spiders etc). Something like this:

Screen Shot 2014-08-10 at 20.09.36

Continue reading

OpenGL ES 2 – Video on iPhone in 3D, and Texture-Map sources

This is a blog post I wrote 6 months ago, and my life got too busy to finish it off, finish testing the code, improving the text. So I’m publishing this now, warts and all. There are probably bugs. There are bits that I could explain better – but I don’t have time. Note that the code is ripped from apps where I ACTUALLY USED IT – so I know it does work, and works well!

In brief:

  1. Using different “sources” of textures in OpenGL, not just images!
  2. Using CALayer’s as a source — allows for video in 3D!
  3. Using “Video straight from the video decoding chip” as a source – faster video!

Continue reading

Review: good SMS apps for Android

Why do we need this?

In a bizarre move, Google deleted the perfectly good SMS app from Android. Yeah. Hmm. Well, read this TechRadar piece for some thoughts on that.

Sadly, Google’s corporate policy is that the consumer is always wrong: you’re not allowed to simply “use the app I already had, that worked great”. Time for a trip to the Android App Store…

I tried:

Quick review of each

TextSecure

  1. does what it’s supposed to: show all conversations, send texts
  2. uses the “standard” view of SMS that we’re all accustomed to

…Deal breaker

  1. there’s no widget. Without a widget, you can’t get your phone to show “at a glance” new and recent SMS texts on the home screen / dashboard

Evolve SMS

  1. has a widget that works

…Dealbreakers

  1. enormously unpleasant “forced swipe” control system. I’m not sure why. Maybe: “Facebook did it, and it sucked, so Facebook stopped. But we want to be Facebook, so let’s copy their failures!”
  2. the back button is actually broken. This is a core feature of Android, one of the biggest improvements compared to iPhone. And Evolve broke it. (Sad face).
  3. the widget can’t be resized. Fixed size widgets are a real pain – you can’t place them where you want, cant put the things you want on same screen, can’t control the info displayed

HelloSMS

  1. basically works

…Dealbreakers

  1. stupid forced-swipe control (see above)
  2. On first start, requires you to swipe in wrong direction
  3. no widget

Go SMS Pro

  1. has a widget!
  2. basically works, but ugly

…Dealbreakers

  1. The widget WILL NOT DISPLAY incoming/recent texts. Instead it displays ONE text, and you have to hit buttons to “scroll” the widget to next / previous text. Wat?
  2. The app hangs if it’s not your “default SMS app”: you type something, it hangs. No other app had this problem, they all did a popup explaining, including a single-tap “fix this” button.

chompSMS

  1. has a widget
  2. looks really nice

…Dealbreaker

  1. The widget … is simply “an ugly resized version of the app icon” … and it does nothing. Seriously, a developer chose to add this feature, and implemented … nothing. Wat?

Summary / Winners

TextSecure is a perfect recreation of ” an SMS app that just works, the way its supposed to” … Apart from the lack of widget (essential!). It even has Apple iPhone style “free text messages to other users of the app”. If they’d fix that missing widget, it would be worth paying for…

chompSMS is very much like TextSecure, perhaps slightly prettier (personal preference). But the broken widget is just as bad as TextSecure’s, maybe worse. And it lacks the “free SMS” mode and “secure SMS” modes of TextSecure.

Evolve has the only working widget (essential!). But … you’ll frequently exit the app unintentionally when you hit the back button. And it’s unusable unless you like “swipe all the time to control UI instead of using the back button” behaviour.

Final thoughts

The rest … are not winners. I’m using TextSecure for now. When people say “people on Android don’t buy stuff”, partly it’s because the apps are incomplete: most of these apps haven’t been fully developed (how can you have a broken widget? Why did you launch without a widget? etc).

NB: as a developer, I can attest that “adding a widget” is very easy. It’s stressful the first time, and then you read the docs and examples, and realise “wow! this is going to be super quick to implement!”.

World-exploration game, done differently

Inspired by Markus / @notch, and how he got MineCraft going in the early days, I’m live-prototyping a game idea I’ve been thinking about for ages. It’s about exploring a landscape … where cities affect dungeons, and dungeons affect cities.

(this is stress-relief for me, an “evenings and weekends” project. By day, I’m helping Schools teach children to program, but that’s HARD (no investors, lots of costs – at the moment, I get paid nothing))

Play in web browser here (very early prototype!)

Screen Shot 2014-07-25 at 12.31.23

July 2014

Current build focusses on the “City building” interface. Initially, you won’t be building cities, you’ll be building small camps / trading forts, etc. So this lets me play with the interface, the scale, the rendering, etc. Make sure it’s fun.

Create something – but take screenshots! Because there’s no “save” yet. Super-early prototype.

Long term gameplay

  1. Explore a huge landscape in FPS
  2. Any location, “create a camp”, and it switches to the isometric view you see in current demo. Build yourself a home/fort/etc.
  3. Fast-travel between locations, but this triggers “random encounter” events … or run between locations manually.
  4. Turn-based / DungeonMaster style dungeons are randomly generated near to each camp. Explore the dungeons for resources and to level-up.

Unlike e.g. Skyrim … when you find an interesting location, you can’t fast-travel to it. You have to build a camp next to it, and fast-travel to that. But your camp can be overrun/destroyed/stolen while you’re away.

So … you’ll be building (and maintaining) a lot of bases around the world. I want to be sure that base-building is easy and fun, even after you’ve done it lots of times already, before progressing with the rest!

(I’m trying to make it “repeatably fun” by having the landscape HUGELY affect your base design)

Webmail on your Debian server: exim4 + dovecot + roundcube

95% of linux configuration on Debian servers is simple, well-documented, well-designed, easy to do, with only a tiny bit of reading of docs.

Sadly, “making email work” is most of the 5% that’s: nearly impossible, very badly designed, badly packaged/documented. This OUGHT to take an hour or two, in practice it takes ONE WEEK to setup. WTF? In 2014? Unacceptable!

So I took several incomplete/broken guides, dozens of pages of help and advice, and synthesized this complete, step-by-step guide. This should get you the webmail you actually want (!) in an hour or less.

Continue reading

Your Pricing Models dictate your business success

Many businesses underestimate the power of a clever Pricing Model. I’m selling a new product at the moment (we’re helping schools teach children to program), so pricing models occupy a lot of my head right now.

Startups are so unsure of what model to use that they often say “anything, so long as we get sales”. They usually focus on simplicity (with a “new” product, a complex pricing model can tip people over the edge and make them “give up” on buying).

Which is fine, but a cunning pricing model can work wonders.
Continue reading

How to tidy your Unity code by putting it in a DLL

The importance of Re-use

Here’s a story that may sound familiar:

You downloaded Unity, watched some tutorials on YouTube, and had a physics “game” running the same day. Excited, you started writing the game you’ve always wanted to make. Possibly – bitten before when trying this with GameMaker etc – you thought to organize your source code: folders, subfolders, logically-named scripts, etc.

Fast-forward a few months, and you have 50 scripts in 40 sub-folders, some of it re-usable, some of it hardcoded for your game. Hoooo-Kayyyy… Well, it’s not too bad. And then a new opportunity comes along – a paid contract, a game-jam, a collaboration with a friend – and you want to re-use some code from your main project. You look at the folder structure, you look at how intertwined it’s all become … and you weep.

Without re-use, what seemed like an “easy” project becomes just a little too much work, and you never finish it. Back to square one :(.

Code re-use in Unity

There are five major aspects to code re-use when making Unity apps. Most of these are shared with general programming, but these are the ones I’ve found *especially* important when doing Unity development.

Good names

The best way to get good at naming things in a program is to focus on “what makes a bad name”, and “Don’t do that”. e.g.

  • If you have to open a Script to remember what it does … it has the wrong name.
  • If you keep typo’ing the script when referencing it in other scripts … it’s the wrong name
  • If your script is more than a thousand lines long … it’s the wrong name AND you’ve put too much junk in there; split it!
  • If you can’t find a script, because you can’t remember which folder it’s in … the folder name is wrong, AND the script name is wrong (should have been a name that forced you to put it in the right place first time!)
  • …etc

Renaming scripts in Unity is a minor pain: you rename it, then you have to close the file in MonoDevelop, re-open it, and (assuming it was a C# class) rename the class in source-code too. (I keep hoping latest Unity will do this automatically, but I’m a version behind at the moment).

We need Sub-folders. Lots of Sub-Folders

… you can never have too many. Give them good names, though!

De-couple your code

Ah, now it gets tricky. The art of decoupling takes much practice to master.

Decoupling is why C# (and Java) has the “interface” keyword. The concept is much easier to understand after you cocked it up, written code you can’t re-use, because everything’s too inter-dependent.

“I can’t just copy/paste that script to my new project, because it references 3 other scripts. Each of which … references another 12. ARGH!”

Unity itself doesn’t use Interfaces (though it really ought to!) – and you have to use a little ingenuity to use them yourself (Google it, it’s only a few small caveats). But C#-interfaces simply give you a compiler-compatible way of decoupling: you have to design your classes as decoupled to start with. Again, google this and ask around – it’s too big a topic for me to do justice to here!

Package your code, so you can re-use it

Libraries. This is why The God Of Programming invented Libraries. So, how do you do a Library in Unity?

Libraries in Unity: the easy DLL

There are two kinds of DLL’s:

  1. Real DLL’s, as used by Real Men, who write all their code in C++
  2. Fake DLL’s, as used by the rest of us, who just want an easy life so we can focus on making our game

A DLL is a great way of packaging code – it’s a very widely-used standard. So, Unity uses DLL’s for this (good move) – but if you google “Unity make DLL” you’ll get distracted by the huge complexity and depth of C++/Unity integration, and you don’t need any of it. Instead, you’ll be doing “DLL light”, which is easy.

But, as with most Unity features, it’s almost entirely undocumented. And, out of the box, it will fail. For bonus points, Unity will give you completely the wrong error message when it goes wrong. Yay!

Step 1: write a Unity script in Unity

Do something simple. I wrote a class that makes random numbers, following XKCD’s advice:

using UnityEngine;

/** In Unity 3, you cannot have namespaces. So comment out this next line. Unity 4 is fine */
namespace MyTestDLL
{
	public class DLLClass
	{
		public static int GetRandom()
		{
			return 4;
		}
	}
/** In Unity 3, you cannot have namespaces. So comment out this next line. Unity 4 is fine */
}

Make a second script that uses that first one, e.g.

using UnityEngine;

public class TestScript : MonoBehaviour
{
	void Start()
	{
		print( DLLClass.GetRandom() );
	}
}

…attach it to a GameObject, check it works.

Step 2: follow Unity’s docs on creating and using a DLL

You’d think this would be enough, but it isn’t. However, the docs are simple, direct, and I found them easy to follow.

So http://docs.unity3d.com/Documentation/Manual/UsingDLL.html = read this and do what it says

NB: they write a slightly more complex script to use than mine, with but whatever. The idea is the same.

Step 3: Use the namespace, Luke. And Interfaces. And …

Now that your DLL is compiling/building in MonoDevelop … you are no longer stuck with Unity’s arbitrary rules and restrictions! The world is yours!

(so, even in Unity 3, you can now use the namespace. Which makes it MUCH easier to keep your code well-organized ;))

Step 4: What about the [square brackets]?

These Just Work ™ exactly as they did in Unity scripts. e.g.

using UnityEngine;

namespace MyTestDLL
{

/** Hey! Look! Unity editor-features ... coded outside of Unity */
[ExecuteInEditMode]
[System.Serializable]
	public class DLLClass
	{
...

Step 5: Editor extensions, editor scripts, editor GUIs, etc

This is where it breaks. If you had any Editor scripts (which, in Unity, MUST be stored in the “Editor” root folder, or one of its subfolders), you can include them in your DLL – but they won’t work.

Worse, when you trigger them (e.g. by selecting a GameObject that did custom editor rendering), you’ll get:

Error: multi-object editing not supported !

…at least, you do in Unity 3.x. Fixed in 4.x, I suspect? But I haven’t gone back to try it since I fixed the bug :).

There is some mumbo-jumbo on the internet (and Unity forums) about complicated workarounds, notably courtesy of AngryAnt (with a Jan 2014 post here, with some extra info worth reading but NOT required!). But these are long-since outdated and unnecessary. The fix is very simple: we must create TWO DLL’s!

  1. DLL #1: the one we already had
  2. DLL #2: will contain ONLY the editor-scripts, and we’ll drag/drop it into Unity’s “Editor” folder.

This is extremely logical, simple, and easy to remember. And it works beautifully! But .. how?

Step 5a: Upgrade MonoDevelop project to output not one, but two, DLL’s

First, right-click on the top-most item in MonoDevelop’s “Solution” panel, and select “Add New Project…”:

Screen Shot 2014-05-24 at 20.54.53

As you did earlier, choose “Library (C#)”. I recommend naming this “SomethingSomethingEditorScripts” (where “SomethingSomething” is your library name, which you used for the first DLL).

Second, you need to edit the References (as you did with first DLL), and this time repeat the steps and add ALL of:

  • UnityEngine.dll
  • UnityEditor.dll
  • AND: instead of the “.Net Assembly” tab, use the “Projects” tab and select your main DLL/library. It should be the only option

That gives you something like this:

Screen Shot 2014-05-24 at 20.58.22

…see how I have two “projects” in MonoDevelop, one which just uses normal Unity stuff, the other which additionally uses Unity Editor stuff? And the second one “references” the first, so it can see/use/create the classes and methods from the first one.

Step 5b: Put the DLL’s in different places

Build, and find the output files. The first project/DLL will still only be making one DLL, but the folder for the SECOND project/DLL will conveniently build BOTH projects and contain BOTH DLL’s.

Make sure you put one DLL “anywhere except Editor” and the other one “in Editor, or a subfolder of Editor”.

Finally, create some GameObject’s, open up the DLL’s in Unity’s Project view (expand the triangle) and drag/drop the Scripts onto your objects as desired. Click on them in the Scene view, and all your OnGUI, OnGizmos etc methods should run exactly as normal.

Step 6: Rejoice!

Now you can share your code with other Unity projects simply by drag/dropping the 2 x DLL’s into a new Unity project. BOOM!

Beautiful! Easy, simple, impossible-to-screw-up ;). That’s how I like my code re-use…

Should freelancers in gamedev industry sign NDA’s?

I’m sure I’ve written about this before, but it’s come up again on Reddit, so here’s my thoughts…

IANAL, but … I have been offered more than 100 NDA’s, both as an individual and working in large companies (including a games publisher, and including a funding team).

I have signed fewer than 30 – and most of those were before I realised how “wrong” it is to sign an NDA.

These days, I only sign 1 in every 10 NDA’s sent my way – and yet I still end up working for, or doing business with, at least 50% of the people who initially “required” an NDA.

Often, people send me a bad, broken, poorly worded, typo-riddled NDA and I say “I’m not signing that. Fix it, get it down to 1.5 pages max, and I’ll reconsider” — and they usually don’t bother, they just tell me their “secrets” anyway. They didn’t really need or want the NDA!

Also: In most jurisdictions, an NDA can’t realistically be more than 2 pages, maybe 3 if it’s badly worded … if it’s more than 2 pages, you’re either in a niche industry (not games), or … it’s not an NDA, it’s a contract that’s pretending to be an NDA, perhaps for evil reasons. When someone sends me a 5-10 page NDA I say “send me a 1-2 page NDA. Don’t send me a secret employment contract”. AGain, they often simply carry on talking to me without NDA.

One large games company (100 staff) sent me (IIRC) a TWENTY TWO page NDA. Buried on page 17 was something like: “we own all converations with you, and at YOUR cost we can confiscate all hard-disks and storage belonging to you or your company if we have any suspicion you might be developing ideas similar to our own”.

(It specifically took ownership of any ideas similar – even if we’d never discussed them. It was disgusting.)

There are some exceptions to be aware of – but the company can easily explain + justify this to you. For instance, if the company is based on a patent, then in Europe they may be required to prove everyone is under NDA or risk invalidating the patent (not the same as USA, where you can talk about patents publically, IIRC).

TL;DR – many “NDA’s” are a scam. Refuse them without hesitation. Or question them, and discover the company didn’t really need it anyway. Most “decent” companies won’t care that you’re not under NDA; the obsession with NDA’s usually comes from the kind of dumb-ass employer that is a PITA to work for.

(to reiterate: IANAL. I’m happy giving advice, but legally you shouldn’t trust random strangers on the internet, and I take no responsibility for your actions ! :))

Great use of UX: too-close signs on cars

5 years as an iOS developer has pushed me to understand and improve my product/physical/functional design skills. Especially “industrial” design – problems and solutions.

Here’s a great one I saw the other day: combining a Snellen Chart with Stopping Distances, i.e.:

Screen Shot 2014-04-26 at 15.24.14 + dg_070645

Existing (bad) solutions

The status quo on driving “too close for safety” is provocative, antagonistic. Have a look at a search for eBay: “too close bumper sticker”:

Screen Shot 2014-04-26 at 15.57.35

There’s a couple of major problems with this:

  1. It’s wrong. If you’re stationary, 6 feet behind the vehicle, you can read it – but you’re definitely NOT too close.
  2. It’s combative and offensive. It appeals to the anger and frustration of the driver in front – when we’re trying to modify the behaviour of the driver behind. Offending people who you’re trying to convince to change … is generally unsuccessful.

But combining the charts above, we get…

Net result

A plate you attach to your vehicle that informs the viewer how much too close/far they are right now, using a dynamic scale:

Screen Shot 2014-04-26 at 15.26.36

I couldn’t get a close enough photo to show it clearly, so here’s my redrawing of what the plate looked like:

too-close-am-vector

Why you shouldn’t use webfonts instead of images

Warning: principled rant against sloppy design and bad coding about to start; kids these days! Get off my lawn!

There’s a terrible disease affecting modern web developers – Twitter.com has just fallen ill with it, and it could be a long time before they cure themselves.

Deleting all images, and replacing them with the dreaded “web font”.

It’s the wrong solution for the problem, it doesn’t do what you think it does, and it pisses all over some of the Web’s core principles. If you’re a web developer, and you respect your craft, you shouldn’t even consider this, not for one moment. Here, for instance, is how Twitter currently looks for me – ugly, and very hard to use!

Screen Shot 2014-04-15 at 10.18.17

GitHub was another recent victim of this disease. They’re still in recovery, but they at least made their site “slightly usable” by adding tooltips:

Screen Shot 2014-04-15 at 10.30.18

What’s going on?

The problem

Core Web principle:

Information is everything; presentation is optional. It’s acceptable to forego presentation, so long as everyone can access the information.

Effect: When you load a webpage, your browser requests every image separately. This is a long way short of the “most optimal” code implementation.

Is this a big problem? For most of us … Not really. The system works, it’s flexible, it’s powerful – it’s a little inefficient, but for the corporations that care there are plenty of hacks and optimizations they can deploy.

The other problem

Most artists should be creating Vector images most of the time, but the software vendors who made the editing tools for Vectors … all died out around 15 years ago. Back then, the advantages of Vectors were small, because most screens were low res.

We still haven’t recovered. There are many standards for bitmap image files, and two very popular ones – PNG, JPG. There are many standards for vector image files, but no popular ones.

Effect: web developers end up looking to Web Fonts as a de facto “vector image” standard.

SVG, or not to SVG?

There is an official Web/HTML approved vector standard – SVG – in wide use, with strong support in all current browsers.

Does it work? Let’s see (http://caniuse.com/svg) …

Screen Shot 2014-04-15 at 10.45.46
Screen Shot 2014-04-15 at 10.45.33

… but many software companies ignore it. For instance, Apple allows programmers to use both PNG and JPG (why?) in core iOS, but not SVG (despite having a full SVG parser built-in to their web browser). Many programmers I speak to believe that SVG isn’t supported at all – FUD wins again.

The other, other problem

A core principle of the Web is that information is accurately described (the M in HTML). A recent trend in web development is “progressive enhancement”.

def. “Progressive Enhancement HTML”: a webpage written the way you were supposed to write it, instead of being hacked-together by unskilled monkeys

HTML has always been “progressive” – this was a core principle 20 years ago. But HTML was so easy to use and abuse that many of us (most of us? nearly all of us?) have been writing poor HTML most of that time. Shame on us (shame on me, certainly – been there, done that :( ).

But … the key point here is: Progressive Enhancement isn’t an “optional extra”, it is the Web. If you fight PE, you’re fighting the entire web infrastructure – and we know how that war will end.

…whatever. What about Web fonts?

So, when you remove an “image” and put a “web font letter” there instead, and change that letter so that the font contains an image you wanted …

…your HTML is now a lie.

Maybe you’re the kind of web developer who scorns blind people (and partially blind), who ignores the Internationalization features of software. You laugh in the face of Accessibility Standards, so you can reduce development time.

But HTML doesn’t make these things optional. They are so core to HTML that they are “always on”, even if you personally never use (need) them. One of the beautiful features of HTML is that if you do nothing, most of the Accessibility is automatically done for you.

With HTML, you have to go out of your way to prevent Accessibility. For instance: replacing images with magic-letters from a magic font.

You’re not blind; why does the Web Font fail?

The thing about custom Web Fonts is … the user can disable them.

Again, this is fundamental to the web. Partly for the Accessibility issue (who are you to decide which users require Accessibility? No. It’s for the user to decide).

But also to support the web principles of openness, and user-control (not corporation-control). My machine, my browser, my choice.

Just as you cannot prevent users from hitting the “View -> Zoom” menu option and making your web page take more or fewer pixels on their screen (I’ve worked with web designers – mostly ex-print designers – who HATED this, and felt it was a feature that should be banned) … you cannot force a crappy font on the user.

Information is everything; presentation is optional. It’s acceptable to forego presentation, so long as everyone can access the information.

In my case: I have a MacBook Air. While wonderful in many ways, they have tiny (11″), non-retina screens, and they’re laptops – so the screen is often further away than I’d like. When WebFonts came to CSS, a lot of the websites I use (art sites, design agencies) started using “beautiful but TINY” fonts that were unreadable. Game Studios still do this today, sadly – lots of hard-to-read but “edgy!” fonts and bad colour choices.

Sometimes, the only way I can do my day to day work is to disable the crappy 3rd party fonts.

Another solution?

SVG says “Hi!”. Think about it.

Debugging odd problems when writing game servers

Cas (@PuppyGames) (of PuppyGames fame – Titan Attacks, Revenge of the Titans, etc) is working on an awesome new MMO RTS … thing.

He had a weird networking problem, and was looking for suggestions on possible causes. I used to do a lot of MMO dev, so we ran through the “typical” problems. This isn’t exhaustive, but as a quick-n-dirty checklist, I figured it could help some other indie gamedevs doing multiplayer/network code.

Scenario

  1. Java, Windows 7
  2. DataInputStream.readFully() from a socket… a local socket at that… taking 4.5 seconds to read the first bytes
  3. and I’ve already read a few bits and bobs out of it

Initial thoughts

First thought: in networking, everything has a realm of durations. The speed of light means that a network connection from New York to London takes a few milliseconds (unavoidably) – everything is measured in ms, unless you’re on a local machine, in which case: the OS internally works in microseconds (almost too small to measure). Since Cas’s problem is measured in SECONDS – but is on localhost – it’s probably something external to the OS, external to the networking itself.

Gut feel would be something like Nagle’s algorithm, although I’m sure everyone’s already configured that appropriately ;).

That, or a routing table that’s changing dynamically – e.g. first packet is triggering a “let me start the dialup connection for you”, causing everything to pause (note that “4 seconds” is in the realm of time it takes for modems to connect; it’s outside the realm of ethernet delays, as already noted)

My general advice for “bizarre server-networking bugs”: start doing the things you know are “wrong” from a performance view, and prove that each makes it worse. If one does not, that de facto that one is somehow not configured as intended.

Common causes / things to check

1. Contention: waht’s the CPU + C libraries doing in background? If system has any significant (but small) load, you could be contended on I/O, CPU mgiht not be interrupting to shunt data from hardware up to software (kernel), up to software (OS user), up to JVM

Classic example: MySQL DB running on a system can block I/O even with small CPU load

2. file-level locking in OS FS. Check using lsof which files your JVM is accessing, and what other processes in the machine may have same files open.

(NB: there’s a bunch of tools like “lsof” which let you see which files are in use by which processes in a unix/linux system. As a network programmer, you should learn them all – they can save a lot of time in development. As a network programmer, you need to be a competent SysAdmin!)

Classic problem: something unexpected has an outstanding read (or a pre-emptive write lock) which is causing silliness. I remember several times having a text editor open, or etc, that was inadvertently slowing access to local files (doh).

3. can you snoop the traffic? (i.e. its socket, not pipe)

Classic problem: some unrelated service is bombarding the loopback with crap. eg. Windows networking (samba on linux) going nuts

Try running Wireshark, filtered to show only 127. and see what’s going through at same time

Also: this is a good way to check where the delay is happening. Is it at point of send, or point of receive? … both?

There’s “net traffic” and there’s “net traffic”. Wireshark often shows traffic that OS normally filters out from monitoring apps…

4. Check your routing table? AND: Check it’s not changing before / aftter the attempted read?

5. Try enabling Nagle, see if it has any effect?

My point is: use this as a check: it ought to make things worse. If not … perhaps the disabling wasn’t working?

6. Have you done ANY traffic shaping (or firewalling) on this machine at any time in the past?

Linux: in particular, check the iptables output. Might be an old iptables rule still stuck in there – or even a firewall rule.

Linux + Windows: disable all firewalls, completely.

7. Similarly: do you have any Anti-Virus software?

Disconnect your test machines from the internet, and remove all AV.

AV software can even corrupt your files when they mistakenly think that the binary files used by the compiler/linker are “suspicious” (IIRC, that happened to us with early PlayStation3 devkits).

8. On a related note: security / IPS tools installed? They will often insert artificial delays silently.

CHECK YOUR SYSTEM LOG FILES! …whatever is causing the delay is quite possibly reporting at least a “warning” of some kind.

9. (in this case, Cas’s socket is using SSL): Perhaps something is looking up a certificate remotely over the web?

…checked your certificate chain? (if there’s an unknown / suspicious cert in the chain, your OS might be trying to check / resolve it before it allows the connection)

10. (in this case, Cas is using custom SSL code in Java to “hack” it): Get a real SSL cert from somewhere, see if it behaves any different

(I think it would be very reasonable for your OS to detect any odd SSL stuff and delay it, as part of anti-malware / virus protection!)

11. “Is it cos I is custom?” – Setup an off-the-shelf webserver on the same machine and check if YOUR READING CODE can read from that SSL localhost fast?

(it often helps to get an answer to “is it the writer, the reader, both … is it the hacked SSL, or all SSL?” etc)

12. (getting specific now, as the general ideas didn’t help): if you bind to the local IP address instead of 127 ?

13. Howabout reverse-DNS? (again, 4 seconds is in the realm of a failed DNS lookup)

Might be a reverse DNS issue … e.g. something in OS (kernel or virus protection), or in JVM SSL library, that’s trying to reverse-lookup the 127 address in order to resolve some aspect of SSL.

I’ve had to put fake entries into hosts file in the past to speed-up this stuff, some libs just need a quick answer

Result

Turns out we were able to stop here…

Cas: “Well, whaddya know. It was a hosts lookup after all – because I’m using 127.0.0.2 and 127.0.0.3 it was doing a hostname lookup. I added those to hosts and all is solved

Me: “(dance)” (yeah. It was skype. The dance smiley comes in handy)

Unconference on Entity Systems – Pre-Signup

Entity Systems are widely used in gamedev and starting to appear in mainstream IT / software development. But we want MORE developers and designers to benefit from this…

So, Richard Lord and I are going to do a mini conference on ES ideas/designs/uses/implementation/etc.

I’ll arrange venue, agenda (e.g. a Keynote) – but this will be an Unconference, so most/all sessions will be interactive, freeform, with no fixed Speakers.

To book a venue, I need an idea of numbers. So, if you’d like to come, please fill out this google form. Note: I’m planning to do this in Brighton, (UK) – less than 30 minutes from an International Airport (Gatwick).

https://docs.google.com/forms/d/156Sb9Qhx4RIX1d0BZg_22MVlUSoUb-gRqwrOq0zeT-w/viewform

Key info:

  • Date: Summer/Autumn 2014
  • Location: Brighton, UK
  • Cost: minimal (depends on demand + venue)

Your best links and articles on Entity Systems / Component Systems

The Entity Systems wiki is pretty good – simple source-code examples in 5 different languages, and links to 10 x richer, more complex “frameworks”.

But it’s got little (almost nothing) in links to articles, introductions, tutorials on the topic. I know there’s a lot that’s been written – just look at all the trackbacks on my old ES blog posts.

Let’s fill out a page on the wiki with tutorials, techniques, etc. What are your favourites (and can you give a 1-sentence summary of what each one does well?)

How to Become a Game Designer (2014)

Step 1: Do not buy the book with this title.

Writing a book is tough, and I respect the time and effort that goes into it. But I don’t rate a book highly simply because it was hard to write: it has to fulfil it’s purpose, and help the reader. Some books are so poor they actively hinder the reader.

This one is being marketed to bloggers to promote it (for cash) without reading it. This (no review copies) might be a beginner’s mistake. I hope so; the alternative is horribly cynical.

For bonus points, the email was sent from a bouncing email address. If you want a job in the games industry, you probably don’t want to be advised by someone so careless/clueless they send out emails from broken email accounts.

That annoyed me enough to look through the marketing materials for this book, and it seems weak.

The author does not appear to have been a recruiter, nor a hiring manager, and apparently has never run a studio. If you read a book on this topic – make sure the author has spent years recruiting, hiring, and managing people in the industry. That’s the core expertise you want from the author – if they don’t have it, they need a really good excuse.

Further, the title seems deliberately misleading – linkbait for readers. It’s titled “become a game designer” but the content appears to be “get a job, somehow, doing something – anything you can scrape-by on – in the games industry”. Core topics are missing from those listed – e.g. how to advance your skills as a game-designer (key to getting hired!), etc.

…but I haven’t read the book, so I can only give vague suggestions and guesses at what it contains.

TL;DR

Personally I’d steer clear of all such books, and spend the time entering Game Jams instead. Every jam you enter will teach you more, and give you more experience that’s directly exciting to hiring managers, than reading books like this one.

I have classes of teenagers making their own games with little or no help; if they can do it, why can’t you?

Data Structures for Entity Systems: Contiguous memory

This year I’m working on two different projects that need an Entity System (ES). One of them is a non-game app written natively on iOS + Android. The other is an FPS in Unity3D.

There are good, basic Open-Source ES’s out there today (and c.f. the sidebar there). I tried porting a few, but none of them were optimized for performance, and most of them were too tightly coupled to a single programming language or platform. I’ve started a new ES of my own – Aliqua.org – to fix these problems, and I’m already using it in an app that’s in alpha-testing.

I’ll be blogging experiences, challenges, and ideas as I go.

Background: focus on ES theory, or ES practice?

If you’re new to ES’s, you should read my old blog posts (2007 onwards), or some of the source code + articles from the ES wiki.

My posts focussed on theory: I wanted to inspire developers, and get people using an ES effectively. I was fighting institutionalised mistakes – e.g. the over-use of OOP in ES development – and I wrote provocatively to try and shock people out of their habits.

But I avoided telling people “how” to implement their ES. At the extreme, I feared it would end up specifying a complete Game Engine:

…OK. Fine. 7 years later, ES’s are widely understood and used well. It’s time to look at the practice: how do you make a “good” ES?

NB: I’ve always assumed that well-resourced teams – e.g. AAA studios – need no help writing a good ES. That’s why I focussed on theory: once you grok it, implementation concerns are no different from writing any game-engine code. These posts are aimed at non-AAA teams: those who lack the money (or expertise) to make an optimized ES first time around.

For my new ES library, I’m starting with the basics: Data Structures, and how you store your ES data in memory.

Where you see something that can be done better – please comment!

Aside on Terminology: “Processors, née Systems”

ES “Systems” should be batch-processing algorithms: you give them an array/stream of homogeneous data, and they repeat one algorithm on each row/item. Calling them “Processors” instead of “Systems” reduces confusion.

Why care about Data Structures?

There is a tension at the heart of Entity Systems:

  • In an ES game, we design our code to be Functional: independent, data-oriented, highly efficient for streaming, batching, and multi-threaded execution. Individual Processors should be largely independent, and easy to split out onto different CPU cores.
  • With the “Entity” (ID) itself, we tie those Functional chunks together into big, messy, inter-dependent, cross-functional … well, pretty much: BLOBs. And we expect everything to Just Work.

If our code/data were purely independent, we’d have many options for writing high-performance code in easy ways.

If our data were purely chunked, fixed at compile-time, we’d have tools that could auto-generate great code.

But combining the two, and muddling it around at runtime, poses tricky problems. For instance:

  1. Debugging: we’ve gone from clean, separable code you can hold in your head … to amorphous chunks that keep swelling and contracting from frame-to-frame. Ugh.
  2. Performance: we pretend that ES’s are fast, cache-efficient, streamable … but at runtime they’re the opposite: re-assembled every frame from their constituent parts, scattered all over memory
  3. Determinism: BLOBs are infamously difficult to reason about. How big? What’s in them? What’s the access cost? … we probably don’t know.

With a little care, ES’s handle these challenges well. Today I’m focussing on performance. Let’s look at the core need here:

  • Each frame, we must:
    1. Iterate over all the Processors
    2. For each Processor:
      1. Establish what subset of Entity/Component blobs it needs (e.g. “everything that has both a Position and a Velocity”)
      2. Select that from the global Entity/Component pool
      3. Send the data to the CPU, along with the code for the Processor itself

The easiest way to implement selection is to use Maps (aka Associative Arrays, aka Dictionaries). Each Processor asks for “all Components that meet [some criteria]“, and you jump around in memory, looking them up and putting them into a List, which you hand to the Processor.

But Maps scatter their data randomly across RAM, by design. And the words “jump around in memory” should have every game-developer whimpering: performance will be bad, very bad.

NB: my original ES articles not only use Maps, but give complete source implementations using them. To recap: even in 2011, Android phones could run realtime 30 FPS games using this. It’s slow – but fast enough for simple games

Volume of data in an ES game

We need some figures as a reference point. There’s not enough detailed analysis of ES’s in particular, so a while back I wrote an analysis of Components needed to make a Bomberman clone.

…that’s effectively a high-end mobile game / mid-tier desktop game.

Reaching back to 2003, we also have the slides from Scott’s GDC talk on Dungeon Siege.

…that’s effectively a (slightly old) AAA desktop game.

From that, we can predict:

  • Number of Component-types: 50 for AA, 150 for AAA
  • Number of unique assemblages (sets of Component-types on an Entity): 1k for AA, 10k for AAA
  • Number of Entities at runtime: 20k for AA, 100k for AAA
  • Size of each Component in bytes: 64bits * 10-50 primitives = 100-500 bytes

How do OS’s process data, fast?

In a modern game the sheer volume of data slows a modern computer to a crawl – unless you co-operate with the OS and Hardware. This is true of all games. CPU and RAM both run at a multiple of the bus-speed – the read/write part is massively slow compared to the CPU’s execution speed.

OS’s reduce this problem by pre-emptively reading chunks of memory and caching them on-board the CPU (or near enough). If the CPU is processing M1, it probably wants M2 next. You transfer M2 … Mn in parallel, and if the CPU asks for them next, it doesn’t have to wait.

Similarly, RAM hardware reads whole rows of data at once, and can transfer it faster than if you asked for each individual byte.

Net effect: Contiguous memory is King

If you store your data contiguously in RAM, it’ll be fast onto the Bus, the CPU will pre-fetch it, and it’ll remain in cache long enough for the CPU(s) to use it with no extra delays.

NB: this is independent of the programming-language you’re using. In C/C++ you can directly control the data flow, and manually optimize CPU-caching – but whatever language you use, it’ll be compiled down to something similar. Careful selection and use of data-structures will improve CPU/cache performance in almost all languages

But this requires that your CPU reads and writes that data in increasing order: M1, M2, M3, …, M(n).

With Data Structures, we’ll prioritize meeting these targets:

  1. All data will be as contiguous in RAM as possible; it might not be tightly-packed, but it will always be “in order”
  2. All EntitySystem Processors will process their data – every frame (tick) – in the order it sits in RAM
    • NOTE: a huge advantage of ES’s (when used correctly) is that they don’t care what order you process your gameobjects. This simplifies our performance problems
  3. Keep the structures simple and easy to use/debug
  4. Type-safety, compile-time checks, and auto-complete FTW.

The problem in detail: What goes wrong?

When talking about ES’s we often say that they allow or support contiguous data-access. What’s the problem? Isn’t that what we want?

NB: I’ll focus on C as the reference language because it’s the closest to raw hardware. This makes it easier to describe what’s happening, and to understand the nuances. However, these techniques should also be possible directly in your language of choice. e.g. Java’s ByteBuffer, Objective-C’s built-in C, etc.

Usually you see examples like a simple “Renderer” Processor:

  • Reads all Position components
    • (Position: { float: x, float y })
  • Each tick, draws a 10px x 10px black square at the Position of each Component

We can store all Position components in a tightly-packed Array:

compressed-simple-array

This is the most efficient way a computer can store / process them – everything contiguous, no wasted space. It also gives us the smallest possible memory footprint, and lets the RAM + Bus + CPU perform at top speed. It probably runs as fast or faster than any other engine architecture.

But … in reality, that’s uncommon or rare.

The hard case: One Processor reads/writes multiple Component-types

To see why, think about how we’d update the Positions. Perhaps a simple “Movement” Processor:

  • Reads all Position components and all Velocity components
    • (Position: { float: x, float y })
    • (Velocity: { float: dx, float dy })
  • Each tick, scales Velocity.dx by frame-time, and adds it to Position.x (and repeats for .dy / .y)
  • Writes the results directly to the Position components

“Houston, we have a problem”

This is no longer possible with a single, purely homogeneous array. There are many ways we can go from here, but none of them are as trivial or efficient as the tight-packed array we had before.

Depending on our Data Structure, we may be able to make a semi-homogeneous array: one that alternates “Position, Velocity, Position, Velocity, …” – or even an array-of-structs, with a struct that wraps: “{ Position, Velocity }”.

…or maybe not. This is where most of our effort will go.

The third scenario: Cross-referencing

There’s one more case we need to consider. Some games (for instance) let you pick up items and store them in an inventory. ARGH!

…this gives us an association not between Components (which we could handle by putting them on the same Entity), but between Entities.

To act on this, one of our Processors will be iterating across contiguous memory and will suddenly (unpredictably) need to read/write the data for a different Entity (and probably a different ComponentType) elsewhere.

This is slow and problematic, but it only happens thousands of times per second … while the other cases happen millions of times (they have to read EVERYTHING, multiple times – once per Processor). We’ll optimize the main cases first, and I’ll leave this one for a later post.

Iterating towards a solution…

So … our common-but-difficult case is: Processors reading multiple Components in parallel. We need a good DS to handle this.

Iteration 1: a BigArray per ComponentType

The most obvious way forwards is to store the EntityID of each row into our Arrays, so that you can match rows from different Arrays.

If we have a lot of spare memory, instead of “tightly-packing” our data into Arrays, we can use the array-index itself as the EntityID. This works because our EntityID’s are defined as integers – the same as an array-index.

rect3859

Usage algorithm:

  • For iterating, we send the whole Array at once
  • When a Processor needs to access N Components, we send it N * big-arrays
  • For random access, we can directly jump to the memory location
    • The Memory location is: (base address of Array) + (Component-size * EntityID)
    • The base-address can easily be kept/cached with the CPU while iterating
    • Bonus: Random access isn’t especially random; with some work, we could optimize it further

Problem 1: Blows the cache

This approach works for our “simple” scenario (1 Component / Processor). It seems to work for our “complex” case (multiple Components / Processor) – but in practice it fails.

We iterate through the Position array, and at each line we now have enough info to fetch the related row from the Velocity array. If both arrays are small enough to fit inside the CPU’s L1 cache (or at least the L2), then we’ll be OK.

Each instance is 500 bytes
Each BigArray has 20k entries

Total: 10 MegaBytes per BigArray

This quickly overwhelms the caches (even an L3 Cache would struggle to hold a single BigArray, let alone multiple). What happens net depends a lot on both the algorithm (does it read both arrays on every row? every 10th row?), and the platform (how does the OS handle RAM reads when the CPU cache is overloaded?).

We can optimize this per-platform, but I’d prefer to avoid the situation.

Problem 2: memory usage

Our typeArray’s will need to be approimately 10 megabytes each:

For 1 Component type: 20,000 Entities * 50 variables * 8 bytes each = 8 MB

…and that’s not so bad. Smaller components will give smaller typeArrays, helping a bit. And with a maximum of 50 unique ComponentTypes, we’ve got an upper bound of 500 MB for our entire ES. On a modern desktop, that’s bearable.

But if we’re doing mobile (Apple devices in 2014 still ship with 512 MB RAM), we’re way too big. Or if we’re doing dynamic textures and/or geometry, we’ll lose a lot of RAM to them, and be in trouble even on desktop.

Problem 3: streaming cost

This is tied to RAM usage, but sometimes it presents a bottleneck before you run out of memory.

The data has to be streamed from RAM to the CPU. If the data is purely contiguous (for each component-type, it is!), this will be “fast”, but … 500 MB data / frame? DDR3 peaks around 10 Gigabytes / second, i.e.:

Peak frame rate: 20 FPS … divided by the number of Processors

1 FPS sound good? No? Oh.

Summary: works for small games

If you can reduce your entity count by a factor of 10 (or even better: 100), this approach works fine.

  • Memory usage was only slightly too big; a factor of 10 reduction and we’re fine
  • CPU caching algorithms are often “good enough” to handle this for small datasets

The current build of Aliqua is using this approach. Not because I like it, but because it’s extremely quick and easy to implement. You can get surprisingly far with this approach – MyEarth runs at 60 FPS on an iPad, with almost no detectable overhead from the ES.

Iteration 2: the Mega-Array of Doom

Even on a small game, we often want to burst up to 100,000+ Entities. There are many things we could do to reduce RAM usage, but our biggest problem is the de-contiguous data (multiple independent Arrays). We shot ourselves in the foot. If we can fix that, our code will scale better.

es-datastructures-structured-bigarray

In an ideal world, the CPU wants us to interleave the components for each Entity. i.e. all the Components for a single Entity are adjacent in memory. When a Processor needs to “read from the Velocity and write to the Position”, it has both of them immediately to hand.

Problem 1: Interleaving only works for one set at a time

If we interleave “all Position’s with all Velocity’s”, we can’t interleave either of them with anything else. The Velocity’s are probably being generated by a different Processor – e.g. a Physics Engine – from yet another ComponentType.

mega-array

So, ultimately, the mega-array only lets us optimize one Processor – all the rest will find their data scattered semi-randomly across the mega-array.

NB: this may be acceptable for your game; I’ve seen cases where one or two Processors accounted for most of the CPU time. The authors optimized the DS for one Processor (and/or had a duplicate copy for the other Processor), and got enough speed boost not to worry about the rest

Summary: didn’t really help?

The Mega Array is too big, and it’s too interconnected. In a lot of ways, our “lots of smaller arrays – one per ComponentType” was a closer fit. Our Processors are mostly independent of one another, so our ideal Data Structure will probably consist of multiple independent structures.

Perhaps there’s a halfway-house?

Iteration 3: Add internal structure to our MegaArray

When you use an Entity System in a real game, and start debugging, you notice something interesting. Most people start with an EntityID counter that increases by 1 each time a new Entity is created. A side-effect is that the layout of components on entities becomes a “map” of your source code, showing how it executed, and in what order.

e.g. With the Iteration-1 BigArrays, my Position’s array might look like this:

rect3859

  1. First entity was an on-screen “loading” message, that needed a position
  2. BLANK (next entity holds info to say if loading is finished yet, which never renders, so has no position)
  3. BLANK (next entity is the metadata for the texture I’m loading in background; again: no position)
  4. Fourth entity is a 3d object which I’ll apply the texture to. I create this once the texture has finished loading, so that I can remove the “loading” message and display the object instead
  5. …etc

If the EntityID’s were generated randomly, I couldn’t say which Component was which simply by looking at the Array like this. Most ES’s generate ID’s sequentially because it’s fast, it’s easy to debug (and because “lastID++;” is quick to type ;)). But do they need to? Nope.

If we generate ID’s intelligently, we could impose some structure on our MegaArray, and simplify the problems…

  1. Whenever a new Entity is created, the caller gives a “hint” of the Component Types that entity is likely to acquire at some time during this run of the app
  2. Each time a new unique hint is presented, the EntitySystem pre-reserves a block of EntityID’s for “this and all future entities using the same hint”
  3. If a range runs out, no problem: we add a new range to the end of the MegaArray, with the same spec, and duplicate the range in the mini-table.
  4. Per frame, per Processor: we send a set of ranges within the MegaArray that are needed. The gaps will slow-down the RAM-to-CPU transfer a little – but not much

es-datastructures-structured-megaarray

Problem 1: Heterogeneity

Problem 1 from the MegaArray approach has been improved, but not completely solved.

When a new Entity is created that intends to have Position, Velocity, and Physics … do we include it as “Pos, Vel”, “Pos, Phys” … or create a new template, and append it at end of our MegaArray?

If we include it as a new template, and insist that templates are authoritative (i.e. the range for “Pos, Vel” templates only includes Entities with those Components, and no others) … we’ll rapidly fragment our mini-table. Every time an Entity gains or loses a Component, it will cause a split in the mini-table range.

Alternatively, if we define templates as indicative (i.e. the range for “Pos, Vel” contains things that are usually, but not always Pos + Vel combos), we’ll need some additional info to remember precisely which entities in that range really do have Pos + Vel.

Problem 2: Heterogeneity and Fragmentation from gaining/losing Components

When an Entity loses a Component, or gains one, it will mess-up our mini-table of ranges. The approach suggested above will work … the mini-table will tend to get more and more fragmented over time. Eventually every range is only one item long. At that point, we’ll be wasting a lot of bus-time and CPU-cache simply tracking which Entity is where.

NB: As far as I remember, it’s impossible to escape Fragmentation when working with dynamic data-structures – it’s a fundamental side effect of mutable data. So long as our fragmentating problems are “small” I’ll be happy.

Problem 3: Heterogeneity and Finding the Components within the Array

If we know that “Entity 4″ starts at byte-offset “2048″, and might have a Position and Velocity, that’s great.

But where do we find the Position? And the Velocity?

They’re at “some” offset from 2048 … but unless we know all the Components stored for Entity 4 … and what order they were appended / replaced … we have no idea which. Raw array-data is typeless by nature…

Iteration 4: More explicit structure; more indexing tables

We add a table holding “where does each Entity start”, and tables for each Component stating “offset for that Component within each Entity”. Conveniently, this also gives us a small, efficient index of “which Entities have Component (whatever)”:

es-datastructures-structured-megaarray-by-component

Problem 1: non-contiguous data!

To iterate over our gameobjects, we now need:

  • One big mega-array (contiguous)
  • N x mini arrays (probably scattered around memory)

Back to square one? Not quite – the mini-arrays are tiny. If we assume a limit of 128,000 entities, and at most 8kb of data for all Components on an Entity, our tables will be:

[ID: 17bits][Offset: 13 bits] = 30 bits per Component

…so that each mini-array is 1-40 kB in size. That’s small enough that several could fit in the cache at once.

Good enough? Maybe…

At this point, our iterations are quite good, but we’re seeing some recurring problems:

  • Re-allocation of arrays when Components are added/removed (I’ve not covered this above – if you’re not familiar with the problem, google “C dynamic array”)
  • Fragmentation (affects every iteration after Iteration 1, which doesn’t get any worse simple because it’s already as bad as it could be)
  • Cross-referencing (which I skipped)

I’ve also omitted history-tracking – none of the DS’s above facilitate snapshots or deltas of game state. This doesn’t matter for e.g. rendering – but for e.g. network code it becomes extremely important.

There’s also an elephant in the room: multi-threaded access to the ES. Some ES’s, and ES-related engines (*cough*Unity*cough*), simply give-up on MT. But the basis of an ES – independent, stateless, batch-oriented programming – is perfect for multi threading. So there must be a good way of getting there…

…which gives me a whole bunch of things to look at in future posts :).

PS … thanks to:

Writing these things takes ages. So much to say, so hard to keep it concise. I inflicted early drafts of this on a lot of people, and I wanted to say “thanks, guys” :). In no particular order (and sorry in advance if final version cut bits you thought should be in there, or vice versa): TCE’ers (especially Dog, Simon Cooke, doihaveto, archangelmorph, Hypercube, et al), ADB’ers (Amir Ebrahimi, Yggy King, Joseph Simons, Alex Darby, Thomas Young, etc). Final edit – and any stupid mistakes – are mine, but those people helped a lot with improving, simplifying, and explaining what I was trying to say.

OpenGL ES2 – Shader Uniforms

There’s a famous animated GIF of an infinitely swirling snake (here’s one that’s been Harry Potterised with the Slytherin logo):

Impressive, right?

What if I said it only relies upon one variable, and that you can reproduce this yourself in 3D in mere minutes? It’ll take quite a lot longer to read and understand, but once you’ve grokked it, you’ll be able to do this easily.

Background reading

Make sure you understand:

  1. Draw Calls
  2. VAOs + VBOs
  3. and at least one kind of Texturing:

Uniforms vs. Vertex Attributes

In theory, you don’t need Uniforms: they are a special kind of Vertex-Attribute (which gives them their name: they are a “uniform attribute”).

In practice, as we’ve already seen, bitmap-based texture-mapping in GL ES requires Uniforms to make a link between Texture and Shader.

The code for sending them to the GPU is different from sending Vertex Attributes – annoyingly, it’s more complicated – but the concept is identical. Then why do we have them?

Performance.

And, as a bonus: convenience.

How many items per Spaceship?

A typical 3D model of a spaceship has:

  • 5,000 polygons
  • 10,000 vertices
  • 5 bitmap tetures

OK, fine, so …

Performance basics

Your vertices each specify which texture(s) they’re using. If you want to change the textures, from “standard” ones to “damaged” ones, you’ll have to:

  1. Upload the new texture (one-time cost; once it’s on GPU you can fast-switch between textures)
  2. Re-Upload the “uses texture 1 (out of the 5)” vertex-attribute … once for every vertex (repeated cost: has to be done every time)

Uniforms bypass this by saying:

A GL Uniform is a Vertex Attribute that has the same value for EVERY vertex. It only needs to be uploaded ONCE and is immediately applied to every vertex

But there’s more …

Each vertex can hold up to 2kb of data (in OpenGL ES; more on desktop GL) – making our ship take 10 megabytes of GPU RAM. But that’s small by today’s standards – and as the model gets more complicated, the storage needed increases.

By contrast, the number of Uniforms needed for a model is typically constant.

The net effect is that GPU vendors can afford to use faster RAM for their Uniforms, boosting performance even further.

Convenience basics

Revisiting that spaceship, if we’re using it in a game, there’s a lot more things we’ll want to include in the 3D model:

  • 10 different versions, one for each player. They’re all the same, but some elements change colour to match the Player’s colour
  • Some of the textures animate: e.g. Landing strips, with lights that strobe
  • Gun turrets need to rotate hundreds of vertices at once, without affecting their neighbours
  • … and so on.

Each of these becomes trivial when your DrawCall has global-constants – i.e Uniforms. For instance:

  • 10 different versions…:
    • The vertices that might change colour have a vertex attribute signalling this; at render-time, the shader sees this flag on the verte, and reads from a Uniform what the “actual” colour should be. Change the uniform, and the colour changes everywhere on the model at once
  • Textures animate…:
    • When you read a U,V value from the texture-bitmap, add a number to U and/or V that comes from a Uniform.
    • Mark the teture as “GL_REPEAT”, so that GL treats it like an infinitely tiled teture
    • Increase that uniform by a tiny amount each time you render a frame (e.g. 0.001), and the texture appears to “scroll”
  • Gun turrets rotate…:
    • Use a second DrawCall to draw the turrets.
    • Each turret has a Uniform “rotation angle in degrees”
    • When rendering, your Shader pre-rotates ALL vertices in each turret by the Uniform’s value.
    • Per frame, change the Uniform for “rotation angle”, and the whole turret rotates at once
  • …etc…

Implementing and Using Uniforms in OpenGL

Render / Update cycle for Uniforms

With Vertex-Attributes, it was easy:

  1. CPU: Generate geometry (load from a .3ds file; algorithm for making a Cube; etc)
  2. GPU: Create 1 or more VBO’s to hold the data on GPU
  3. CPU->GPU: Upload the geometry from CPU to GPU in a single big chunk
  4. Every frame: GPU reads the data from local RAM, and renders it
  5. To change the data, re-do all the above

With Uniforms, it’s more tricky.

Firstly – like everything else in Shaders – Uniforms ignore any OpenGL features that already existed. The GPU intelligently selected the correct VBOs each frame, by using the data inside the VAO. But Shaders ignore the VAO, and need to be manually switched over.

Secondly – in VBO’s, OpenGL does not care what format your data is in. But with Uniforms, suddenly it does care: you have to specify, every time you upload them.

Thirdly – GL uses an efficient C-language mechanism for uploading Uniforms with minimal overhead. With VBO’s, the VAO took care of this automatically, but again: Shaders need you to do it by hand.

Together, these complicate the process:

  1. CPU: generate a value for the Uniform
  2. CPU: create an area in RAM that will hold the value, and place it there
  3. GPU: automatically creates storage for the Uniform when you compile/link the ShaderProgram
  4. CPU->GPU: switch to the specific ShaderProgram that will use the Uniform
  5. CPU->GPU: don’t send the data; instead, send the “memory-address” of the data
  6. CPU->GPU: upload using one of thirty three unique methods (instead of the one for Verte Attributes)
  7. Every frame: GPU reads the data from local RAM, but each ShaderProgram has its own copy
  8. To change the data, re-do all the above

Uploading a value to a Uniform

After you’ve linked your ShaderProgram, you can ask OpenGL about the Uniforms it found. For each Uniform, you get:

  1. The human-readable name used in the GLSL file
  2. The OpenGL-readable name generated automatically (an integer: GLint)
  3. The GLType (int, bool, float, vec2, vec3, vec4, mat2, mat3, mat4, etc)
  4. Is this one value, or an array of values? If it’s an array: how many slots in the array? (all GLSL arrays are fied length)

The GLType has to be saved, because when you want to upload, there’s a different upload method for each distinct type:

glUniform1i – sends 1 * integer

glUniform3f – sends 3 * floats

glUniformMatri4fv – sends N * 4×4-matrices, each using floats internally

etc

To handle this automatically, I wrote three chunks of code:

  1. GLK2Uniform.h/m:: stores the GLType, is it a Matrix or Vector (or float?), etc
  2. GLK2ShaderProgram.h/m .. -(NSMutableDictionary*) fetchAllUniformsAfterLinking: parses the data from the Shader, and creates GLK2Uniform instances
  3. GLK2ShaderProgram.h/m .. -(void) setValue:(const void*) value forUniform:(GLK2Uniform*) uniform: uses the GLType data etc to pick the appropriate GL method to upload this value to the specified Uniform

That last method takes “const void*” as argument: i.e. it has no type-checking. I find this much simpler than continually specifying the type. It also intelligently handles dereferencing the pointer (for matrices and vectors) or not (for ints, floats, etc).

Uniforms and VAOs: a missing feature from OpenGL

So far, we’ve used VAO’s. They’re very useful, seemingly they:

…store all the render-state that is specific to a particular Draw call

Tragically: Shaders ignore VAO’s. Once you start using Uniforms, you find that VAO’s actually:

…store all the render-state that is specific to a particular Draw call, so long as that state isn’t in a ShaderProgram (i.e. isn’t a Uniform)

Shaders are the only place where VAO’s aren’t used, and it’s very easy to forget this and have your code break in weird and wonderful ways. If you find Shader state seems to be leaking between DrawCalls, you almost certainly forgot to eplicitly switch ShaderProgram somewhere.

Note: this applies not only for rendering a DrawCall, but also for setting the value of the Uniform. You must call glUseProgram() before setting a Uniform value.

Because we often want to set Uniform values outside of the draw loop – e.g. when configuring a Shader at startup – I added a method that automatically switches to the right program for you. If you use this repeatedly on every frame, it’ll damage performance, but it’s great for checking if you’ve forgotten a glUseProgram somewhere:

GLK2ShaderProgram.m:

-(void) setValueOutsideRenderLoopRestoringProgramAfterwards:(const void*) value forUniform:(GLK2Uniform*) uniform 
{
	GLint currentProgram;
	glGetIntegerv( GL_CURRENT_PROGRAM, &currentProgram);
	
	glUseProgram(self.glName);
	[self setValue:value forUniform:uniform];
	glUseProgram(currentProgram);
}

Uniforms in your Game Engine

Game-Engine code has to treat Uniforms a little specially:

  • Unlike Vertex-Attributes, we tend to update Uniforms very frequently – often every frame.
  • We have to reference them by human-readable name (instead of simply ramming them into a homogeneous C-array).
  • We have to remember to keep calling glUseProgram() each time we write to a Uniform, or render a DrawCall.

You can layer it in fancy OOP wrappers, but ultimately you’re forced to have a Hashtable/Map somewhere that goes from “human-readable Uniform name” to “chunk of memory holding the current value, that can be sent to the GPU whenever it changes”.

Desktop GL is different; they modified the GLSL / Shader spec so that it allowed for slightly tighter integration of variables with your main app. Sadly, they didn’t include those features in GL ES

I’ve tried it a few different ways, but the problem is that you have to store a “string” mapping to a “C-struct”. Worse, OpenGL ignores the value of the struct, it only uses the memory-address. So that struct has to be at a stable location in RAM.

This might not seem a problem, but Apple’s system for storing structs in NSDictionary is to create and destroy them on-the-fly (on the stack) – so there’s never a stable memory-address.

Here’s my current best workaround for ObjectiveC OpenGL apps…

An intelligent, C-based, “Map” class

Observations:

  1. All our data will be structs
    1. C can easily store data if it’s homogeneous
    2. and OpenGL only has circa 10 unique structs for Uniforms
    3. …so: 10 arrays will be enough to store “all possible” Uniform values for a given ShaderProgram
  2. Data is unique per ShaderProgram
    1. C arrays-of-structs can’t change size once created :(
    2. But: the Uniforms for a ShaderProgram are hard-coded, cannot change at runtime
    3. …so: we can create one Map per ShaderProgram, and we know it will always be correct
  3. C-strings are horrible, and we want to avoid them like the plague
    1. We can easily convert C-strings into Objective-C strings (NSString)
    2. Apple’s NSArray stores NSString’s, and returns an int when you ask “which slot contains NSString* blah?”
    3. C allows int’s for direct-fetching of locations in an array-of-structs
    4. …so: we can have a Data Structure of NSString’s, and a separate C-array of structs, and they never have to interact

GLK2UniformMap.h:

@interface GLK2UniformMap : NSObject

+(GLK2UniformMap*) uniformMapForLinkedShaderProgram:(GLK2ShaderProgram*) shaderProgram;

- (id)initWithUniforms:(NSArray*) allUniforms;

-(GLKMatrix2*) pointerToMatrix2Named:(NSString*) name;
-(GLKMatrix3*) pointerToMatrix3Named:(NSString*) name;
-(GLKMatrix4*) pointerToMatrix4Named:(NSString*) name;
-(void) setMatrix2:(GLKMatrix2) value named:(NSString*) name;
-(void) setMatrix3:(GLKMatrix3) value named:(NSString*) name;
-(void) setMatrix4:(GLKMatrix4) value named:(NSString*) name;

-(GLKVector2*) pointerToVector2Named:(NSString*) name;
-(GLKVector3*) pointerToVector3Named:(NSString*) name;
-(GLKVector4*) pointerToVector4Named:(NSString*) name;
-(void) setVector2:(GLKVector2) value named:(NSString*) name;
-(void) setVector3:(GLKVector3) value named:(NSString*) name;
-(void) setVector4:(GLKVector4) value named:(NSString*) name;

@end

You create a GLK2UniformMap from a specific GLK2ShaderProgram. It reads the ShaderProgram, finds out how many Uniforms of each GLType there are, and allocates C-arrays for each of them.

Later, you can use the “setBLAH:named:” methods to set-by-value any struct. Importantly, this does NOT take a pointer! This ensures you can create a struct on the fly – all of Apple’s GLKit methods do this. e.g. you can do:

...
GLK2UniformMap* mapOfUniforms = ...
...
[mapOfUniforms setVector3: GLKVector3Make( 0.0, 1.0, 0.0 ) named:@"position"];
...

Connecting the GLK2UniformMap to a GLK2DrawCall

In previous posts, I created the GLK2UniformValueGenerator protocol. This is a simple protocol that uses the same method signatures as used by OpenGL’s Uniform-upload commands.

We etend GLK2UniformMap, and implement that protocol, to create something we can attach to a GLK2DrawCall, and have our rendering do everything else automatically:

GLK2UniformMapGenerator.h:

@interface GLK2UniformMapGenerator : GLK2UniformMap <GLK2UniformValueGenerator>

+(GLK2UniformMapGenerator*) generatorForShaderProgram:(GLK2ShaderProgram*) shaderProgram;
+(GLK2UniformMapGenerator *)createAndAddToDrawCall:(GLK2DrawCall *)drawcall;

@end

Internally, the methods are very simple, e.g.:

GLK2UniformMapGenerator.m:

@implementation GLK2UniformMapGenerator
...
-(GLKMatrix2*) matrix2ForUniform:(GLK2Uniform*) v inDrawCall:(GLK2DrawCall*) drawCall
{
	return [self pointerToMatrix2Named:v.nameInSourceFile];
}
...

NB: in the protocol, I included the GLK2DrawCall that’s making the request. This is unnecessary. In future updates to the source, I’ll probably remove that argument.

Animated textures: the magic of Uniforms

Finally, let’s do something interesting: animate a texture-mapped object.

The sample code has jumped ahead a bit on GitHub, as I’ve been using it to demo things to a couple of different people.

Have a look around the project, but I’ve split into two projects. One contains the reusable library code, the other contains a Demo app that shows the library-code in use.

I simplified all the reusable render code to date into a Library class: GLK2DrawCallViewController (etends Apple’s GLKViewController)

I’ve also moved the boilerplate “create a triangle”, “create a cube” etc code into a Demo class: CommonGLEngineCode

The sample project – permanent link to branch for this article – has a simple ViewController that loads a snake image and puts it on a triangle:

AnimatedTextureViewController.m:

@interface AnimatedTextureViewController ()
@property(nonatomic,retain) GLK2UniformMapGenerator* generator;
@end

@implementation AnimatedTextureViewController

-(NSMutableArray*) createAllDrawCalls
{	
	/** All the local setup for the ViewController */
	NSMutableArray* result = [NSMutableArray array];
	
	/** -- Draw Call 1:
	 
	 triangle that contains a CALayer texture
	 */

	GLK2DrawCall* dcTri = [CommonGLEngineCode drawCallWithUnitTriangleAtOriginUsingShaders:
						   [GLK2ShaderProgram shaderProgramFromVertexFilename:@"VertexProjectedWithTexture" fragmentFilename:@"FragmentTextureScrolling"]];
...

That’s using the refactored CommonGLEngineCode class to make a unit triangle appear roughly in the middle of the screen.

Then we setup the UniformMapGenerator (no values yet):

...
	self.generator = [GLK2UniformMapGenerator createAndAddToDrawCall:dcTri];
...

NB: the generator class automatically detects requests for Sampler2D, and ignores them. Those are only used for texture-mapping, which we handle automatically inside the GLK2DrawCall class (see previous post for details).

...
	/** Load a scales texture - I Googled "Public Domain Scales", you can probably find much better */
	GLK2Texture* newTexture = [GLK2Texture textureNamed:@"fakesnake.png"];
	/** Make the texture infinitely tiled */
	glBindTexture( GL_TEXTURE_2D, newTexture.glName);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
	
	/** Add the GL texture to our Draw call / shader so it uses it */
	GLK2Uniform* samplerTexture1 = [dcTri.shaderProgram uniformNamed:@"s_texture1"];
	[dcTri setTexture:newTexture forSampler:samplerTexture1];
...

…again: c.f. previous post for details of what’s happening here, nothing’s changed.

...	
	/** Set the projection matrix to Identity (i.e. "dont change anything") */
	GLK2Uniform* uniProjectionMatrix = [dcTri.shaderProgram uniformNamed:@"projectionMatrix"];
	GLKMatrix4 rotatingProjectionMatrix = GLKMatrix4Identity;
	[dcTri.shaderProgram setValueOutsideRenderLoopRestoringProgramAfterwards:&rotatingProjectionMatrix forUniform:uniProjectionMatrix];
	
	[result addObject:dcTri];
	
	return result;
}

…ditto.

Finally, we now have to implement a callback to update our Generator’s built-in structs and ints and floats once per frame:

-(void)willRenderDrawCallUsingVAOShaderProgramAndDefaultUniforms:(GLK2DrawCall *)drawCall
{
	/** Generate a smoothly increasing value using GLKit's built-in frame-count and frame-timers */
	double framesOutOfFramesPerSecond = (self.framesDisplayed % (4*self.framesPerSecond)) / (double)(4.0*self.framesPerSecond);
	
	[self.generator setFloat: framesOutOfFramesPerSecond named:@"timeInSeconds"];
}

Run the project, tap the button, and you should see snakey skin scrolling along the surface of a 3D triangle:

Screen Shot 2014-02-20 at 01.51.20

From scales-on-a-triangle to realistic snake

The scrolling works by moving our offset across the surface of the triangle. Doing this with a Uniform means that the speed is constant relative to the corners of the triangle.

i.e. if you make the triangle smaller, it will take the same time to cover the distance, but it’s covering a shorter distance, so appears to move slower.

We’re getting the effect – for free! – of skin bunching up and stretching out. All you have to do is make your triangles shorter on the inside of a snake-coil, and longer on the outside.

If you model your snake the easiest possible way, this bunching will happen automatically. Simply take a cylinder and bend it with a transform – the vertex attributes (that force the texture to map across each triangle) won’t change, but the triangle sizes will, causing realistic bunching/stretching of the skin.