Category Archives: programming

OpenGL ES 2 – Video on iPhone in 3D, and Texture-Map sources

This is a blog post I wrote 6 months ago, and my life got too busy to finish it off, finish testing the code, improving the text. So I’m publishing this now, warts and all. There are probably bugs. There are bits that I could explain better – but I don’t have time. Note that the code is ripped from apps where I ACTUALLY USED IT – so I know it does work, and works well!

In brief:

  1. Using different “sources” of textures in OpenGL, not just images!
  2. Using CALayer’s as a source — allows for video in 3D!
  3. Using “Video straight from the video decoding chip” as a source – faster video!

Continue reading

World-exploration game, done differently

Inspired by Markus / @notch, and how he got MineCraft going in the early days, I’m live-prototyping a game idea I’ve been thinking about for ages. It’s about exploring a landscape … where cities affect dungeons, and dungeons affect cities.

(this is stress-relief for me, an “evenings and weekends” project. By day, I’m helping Schools teach children to program, but that’s HARD (no investors, lots of costs – at the moment, I get paid nothing))

Play in web browser here (very early prototype!)

Screen Shot 2014-07-25 at 12.31.23

July 2014

Current build focusses on the “City building” interface. Initially, you won’t be building cities, you’ll be building small camps / trading forts, etc. So this lets me play with the interface, the scale, the rendering, etc. Make sure it’s fun.

Create something – but take screenshots! Because there’s no “save” yet. Super-early prototype.

Long term gameplay

  1. Explore a huge landscape in FPS
  2. Any location, “create a camp”, and it switches to the isometric view you see in current demo. Build yourself a home/fort/etc.
  3. Fast-travel between locations, but this triggers “random encounter” events … or run between locations manually.
  4. Turn-based / DungeonMaster style dungeons are randomly generated near to each camp. Explore the dungeons for resources and to level-up.

Unlike e.g. Skyrim … when you find an interesting location, you can’t fast-travel to it. You have to build a camp next to it, and fast-travel to that. But your camp can be overrun/destroyed/stolen while you’re away.

So … you’ll be building (and maintaining) a lot of bases around the world. I want to be sure that base-building is easy and fun, even after you’ve done it lots of times already, before progressing with the rest!

(I’m trying to make it “repeatably fun” by having the landscape HUGELY affect your base design)

How to tidy your Unity code by putting it in a DLL

The importance of Re-use

Here’s a story that may sound familiar:

You downloaded Unity, watched some tutorials on YouTube, and had a physics “game” running the same day. Excited, you started writing the game you’ve always wanted to make. Possibly – bitten before when trying this with GameMaker etc – you thought to organize your source code: folders, subfolders, logically-named scripts, etc.

Fast-forward a few months, and you have 50 scripts in 40 sub-folders, some of it re-usable, some of it hardcoded for your game. Hoooo-Kayyyy… Well, it’s not too bad. And then a new opportunity comes along – a paid contract, a game-jam, a collaboration with a friend – and you want to re-use some code from your main project. You look at the folder structure, you look at how intertwined it’s all become … and you weep.

Without re-use, what seemed like an “easy” project becomes just a little too much work, and you never finish it. Back to square one :(.

Code re-use in Unity

There are five major aspects to code re-use when making Unity apps. Most of these are shared with general programming, but these are the ones I’ve found *especially* important when doing Unity development.

Good names

The best way to get good at naming things in a program is to focus on “what makes a bad name”, and “Don’t do that”. e.g.

  • If you have to open a Script to remember what it does … it has the wrong name.
  • If you keep typo’ing the script when referencing it in other scripts … it’s the wrong name
  • If your script is more than a thousand lines long … it’s the wrong name AND you’ve put too much junk in there; split it!
  • If you can’t find a script, because you can’t remember which folder it’s in … the folder name is wrong, AND the script name is wrong (should have been a name that forced you to put it in the right place first time!)
  • …etc

Renaming scripts in Unity is a minor pain: you rename it, then you have to close the file in MonoDevelop, re-open it, and (assuming it was a C# class) rename the class in source-code too. (I keep hoping latest Unity will do this automatically, but I’m a version behind at the moment).

We need Sub-folders. Lots of Sub-Folders

… you can never have too many. Give them good names, though!

De-couple your code

Ah, now it gets tricky. The art of decoupling takes much practice to master.

Decoupling is why C# (and Java) has the “interface” keyword. The concept is much easier to understand after you cocked it up, written code you can’t re-use, because everything’s too inter-dependent.

“I can’t just copy/paste that script to my new project, because it references 3 other scripts. Each of which … references another 12. ARGH!”

Unity itself doesn’t use Interfaces (though it really ought to!) – and you have to use a little ingenuity to use them yourself (Google it, it’s only a few small caveats). But C#-interfaces simply give you a compiler-compatible way of decoupling: you have to design your classes as decoupled to start with. Again, google this and ask around – it’s too big a topic for me to do justice to here!

Package your code, so you can re-use it

Libraries. This is why The God Of Programming invented Libraries. So, how do you do a Library in Unity?

Libraries in Unity: the easy DLL

There are two kinds of DLL’s:

  1. Real DLL’s, as used by Real Men, who write all their code in C++
  2. Fake DLL’s, as used by the rest of us, who just want an easy life so we can focus on making our game

A DLL is a great way of packaging code – it’s a very widely-used standard. So, Unity uses DLL’s for this (good move) – but if you google “Unity make DLL” you’ll get distracted by the huge complexity and depth of C++/Unity integration, and you don’t need any of it. Instead, you’ll be doing “DLL light”, which is easy.

But, as with most Unity features, it’s almost entirely undocumented. And, out of the box, it will fail. For bonus points, Unity will give you completely the wrong error message when it goes wrong. Yay!

Step 1: write a Unity script in Unity

Do something simple. I wrote a class that makes random numbers, following XKCD’s advice:

using UnityEngine;

/** In Unity 3, you cannot have namespaces. So comment out this next line. Unity 4 is fine */
namespace MyTestDLL
	public class DLLClass
		public static int GetRandom()
			return 4;
/** In Unity 3, you cannot have namespaces. So comment out this next line. Unity 4 is fine */

Make a second script that uses that first one, e.g.

using UnityEngine;

public class TestScript : MonoBehaviour
	void Start()
		print( DLLClass.GetRandom() );

…attach it to a GameObject, check it works.

Step 2: follow Unity’s docs on creating and using a DLL

You’d think this would be enough, but it isn’t. However, the docs are simple, direct, and I found them easy to follow.

So = read this and do what it says

NB: they write a slightly more complex script to use than mine, with but whatever. The idea is the same.

Step 3: Use the namespace, Luke. And Interfaces. And …

Now that your DLL is compiling/building in MonoDevelop … you are no longer stuck with Unity’s arbitrary rules and restrictions! The world is yours!

(so, even in Unity 3, you can now use the namespace. Which makes it MUCH easier to keep your code well-organized ;))

Step 4: What about the [square brackets]?

These Just Work ™ exactly as they did in Unity scripts. e.g.

using UnityEngine;

namespace MyTestDLL

/** Hey! Look! Unity editor-features ... coded outside of Unity */
	public class DLLClass

Step 5: Editor extensions, editor scripts, editor GUIs, etc

This is where it breaks. If you had any Editor scripts (which, in Unity, MUST be stored in the “Editor” root folder, or one of its subfolders), you can include them in your DLL – but they won’t work.

Worse, when you trigger them (e.g. by selecting a GameObject that did custom editor rendering), you’ll get:

Error: multi-object editing not supported !

…at least, you do in Unity 3.x. Fixed in 4.x, I suspect? But I haven’t gone back to try it since I fixed the bug :).

There is some mumbo-jumbo on the internet (and Unity forums) about complicated workarounds, notably courtesy of AngryAnt (with a Jan 2014 post here, with some extra info worth reading but NOT required!). But these are long-since outdated and unnecessary. The fix is very simple: we must create TWO DLL’s!

  1. DLL #1: the one we already had
  2. DLL #2: will contain ONLY the editor-scripts, and we’ll drag/drop it into Unity’s “Editor” folder.

This is extremely logical, simple, and easy to remember. And it works beautifully! But .. how?

Step 5a: Upgrade MonoDevelop project to output not one, but two, DLL’s

First, right-click on the top-most item in MonoDevelop’s “Solution” panel, and select “Add New Project…”:

Screen Shot 2014-05-24 at 20.54.53

As you did earlier, choose “Library (C#)”. I recommend naming this “SomethingSomethingEditorScripts” (where “SomethingSomething” is your library name, which you used for the first DLL).

Second, you need to edit the References (as you did with first DLL), and this time repeat the steps and add ALL of:

  • UnityEngine.dll
  • UnityEditor.dll
  • AND: instead of the “.Net Assembly” tab, use the “Projects” tab and select your main DLL/library. It should be the only option

That gives you something like this:

Screen Shot 2014-05-24 at 20.58.22

…see how I have two “projects” in MonoDevelop, one which just uses normal Unity stuff, the other which additionally uses Unity Editor stuff? And the second one “references” the first, so it can see/use/create the classes and methods from the first one.

Step 5b: Put the DLL’s in different places

Build, and find the output files. The first project/DLL will still only be making one DLL, but the folder for the SECOND project/DLL will conveniently build BOTH projects and contain BOTH DLL’s.

Make sure you put one DLL “anywhere except Editor” and the other one “in Editor, or a subfolder of Editor”.

Finally, create some GameObject’s, open up the DLL’s in Unity’s Project view (expand the triangle) and drag/drop the Scripts onto your objects as desired. Click on them in the Scene view, and all your OnGUI, OnGizmos etc methods should run exactly as normal.

Step 6: Rejoice!

Now you can share your code with other Unity projects simply by drag/dropping the 2 x DLL’s into a new Unity project. BOOM!

Beautiful! Easy, simple, impossible-to-screw-up ;). That’s how I like my code re-use…

Should freelancers in gamedev industry sign NDA’s?

I’m sure I’ve written about this before, but it’s come up again on Reddit, so here’s my thoughts…

IANAL, but … I have been offered more than 100 NDA’s, both as an individual and working in large companies (including a games publisher, and including a funding team).

I have signed fewer than 30 – and most of those were before I realised how “wrong” it is to sign an NDA.

These days, I only sign 1 in every 10 NDA’s sent my way – and yet I still end up working for, or doing business with, at least 50% of the people who initially “required” an NDA.

Often, people send me a bad, broken, poorly worded, typo-riddled NDA and I say “I’m not signing that. Fix it, get it down to 1.5 pages max, and I’ll reconsider” — and they usually don’t bother, they just tell me their “secrets” anyway. They didn’t really need or want the NDA!

Also: In most jurisdictions, an NDA can’t realistically be more than 2 pages, maybe 3 if it’s badly worded … if it’s more than 2 pages, you’re either in a niche industry (not games), or … it’s not an NDA, it’s a contract that’s pretending to be an NDA, perhaps for evil reasons. When someone sends me a 5-10 page NDA I say “send me a 1-2 page NDA. Don’t send me a secret employment contract”. AGain, they often simply carry on talking to me without NDA.

One large games company (100 staff) sent me (IIRC) a TWENTY TWO page NDA. Buried on page 17 was something like: “we own all converations with you, and at YOUR cost we can confiscate all hard-disks and storage belonging to you or your company if we have any suspicion you might be developing ideas similar to our own”.

(It specifically took ownership of any ideas similar – even if we’d never discussed them. It was disgusting.)

There are some exceptions to be aware of – but the company can easily explain + justify this to you. For instance, if the company is based on a patent, then in Europe they may be required to prove everyone is under NDA or risk invalidating the patent (not the same as USA, where you can talk about patents publically, IIRC).

TL;DR – many “NDA’s” are a scam. Refuse them without hesitation. Or question them, and discover the company didn’t really need it anyway. Most “decent” companies won’t care that you’re not under NDA; the obsession with NDA’s usually comes from the kind of dumb-ass employer that is a PITA to work for.

(to reiterate: IANAL. I’m happy giving advice, but legally you shouldn’t trust random strangers on the internet, and I take no responsibility for your actions ! :))

Why you shouldn’t use webfonts instead of images

Warning: principled rant against sloppy design and bad coding about to start; kids these days! Get off my lawn!

There’s a terrible disease affecting modern web developers – has just fallen ill with it, and it could be a long time before they cure themselves.

Deleting all images, and replacing them with the dreaded “web font”.

It’s the wrong solution for the problem, it doesn’t do what you think it does, and it pisses all over some of the Web’s core principles. If you’re a web developer, and you respect your craft, you shouldn’t even consider this, not for one moment. Here, for instance, is how Twitter currently looks for me – ugly, and very hard to use!

Screen Shot 2014-04-15 at 10.18.17

GitHub was another recent victim of this disease. They’re still in recovery, but they at least made their site “slightly usable” by adding tooltips:

Screen Shot 2014-04-15 at 10.30.18

What’s going on?

The problem

Core Web principle:

Information is everything; presentation is optional. It’s acceptable to forego presentation, so long as everyone can access the information.

Effect: When you load a webpage, your browser requests every image separately. This is a long way short of the “most optimal” code implementation.

Is this a big problem? For most of us … Not really. The system works, it’s flexible, it’s powerful – it’s a little inefficient, but for the corporations that care there are plenty of hacks and optimizations they can deploy.

The other problem

Most artists should be creating Vector images most of the time, but the software vendors who made the editing tools for Vectors … all died out around 15 years ago. Back then, the advantages of Vectors were small, because most screens were low res.

We still haven’t recovered. There are many standards for bitmap image files, and two very popular ones – PNG, JPG. There are many standards for vector image files, but no popular ones.

Effect: web developers end up looking to Web Fonts as a de facto “vector image” standard.

SVG, or not to SVG?

There is an official Web/HTML approved vector standard – SVG – in wide use, with strong support in all current browsers.

Does it work? Let’s see ( …

Screen Shot 2014-04-15 at 10.45.46
Screen Shot 2014-04-15 at 10.45.33

… but many software companies ignore it. For instance, Apple allows programmers to use both PNG and JPG (why?) in core iOS, but not SVG (despite having a full SVG parser built-in to their web browser). Many programmers I speak to believe that SVG isn’t supported at all – FUD wins again.

The other, other problem

A core principle of the Web is that information is accurately described (the M in HTML). A recent trend in web development is “progressive enhancement”.

def. “Progressive Enhancement HTML”: a webpage written the way you were supposed to write it, instead of being hacked-together by unskilled monkeys

HTML has always been “progressive” – this was a core principle 20 years ago. But HTML was so easy to use and abuse that many of us (most of us? nearly all of us?) have been writing poor HTML most of that time. Shame on us (shame on me, certainly – been there, done that :( ).

But … the key point here is: Progressive Enhancement isn’t an “optional extra”, it is the Web. If you fight PE, you’re fighting the entire web infrastructure – and we know how that war will end.

…whatever. What about Web fonts?

So, when you remove an “image” and put a “web font letter” there instead, and change that letter so that the font contains an image you wanted …

…your HTML is now a lie.

Maybe you’re the kind of web developer who scorns blind people (and partially blind), who ignores the Internationalization features of software. You laugh in the face of Accessibility Standards, so you can reduce development time.

But HTML doesn’t make these things optional. They are so core to HTML that they are “always on”, even if you personally never use (need) them. One of the beautiful features of HTML is that if you do nothing, most of the Accessibility is automatically done for you.

With HTML, you have to go out of your way to prevent Accessibility. For instance: replacing images with magic-letters from a magic font.

You’re not blind; why does the Web Font fail?

The thing about custom Web Fonts is … the user can disable them.

Again, this is fundamental to the web. Partly for the Accessibility issue (who are you to decide which users require Accessibility? No. It’s for the user to decide).

But also to support the web principles of openness, and user-control (not corporation-control). My machine, my browser, my choice.

Just as you cannot prevent users from hitting the “View -> Zoom” menu option and making your web page take more or fewer pixels on their screen (I’ve worked with web designers – mostly ex-print designers – who HATED this, and felt it was a feature that should be banned) … you cannot force a crappy font on the user.

Information is everything; presentation is optional. It’s acceptable to forego presentation, so long as everyone can access the information.

In my case: I have a MacBook Air. While wonderful in many ways, they have tiny (11″), non-retina screens, and they’re laptops – so the screen is often further away than I’d like. When WebFonts came to CSS, a lot of the websites I use (art sites, design agencies) started using “beautiful but TINY” fonts that were unreadable. Game Studios still do this today, sadly – lots of hard-to-read but “edgy!” fonts and bad colour choices.

Sometimes, the only way I can do my day to day work is to disable the crappy 3rd party fonts.

Another solution?

SVG says “Hi!”. Think about it.

Debugging odd problems when writing game servers

Cas (@PuppyGames) (of PuppyGames fame – Titan Attacks, Revenge of the Titans, etc) is working on an awesome new MMO RTS … thing.

He had a weird networking problem, and was looking for suggestions on possible causes. I used to do a lot of MMO dev, so we ran through the “typical” problems. This isn’t exhaustive, but as a quick-n-dirty checklist, I figured it could help some other indie gamedevs doing multiplayer/network code.


  1. Java, Windows 7
  2. DataInputStream.readFully() from a socket… a local socket at that… taking 4.5 seconds to read the first bytes
  3. and I’ve already read a few bits and bobs out of it

Initial thoughts

First thought: in networking, everything has a realm of durations. The speed of light means that a network connection from New York to London takes a few milliseconds (unavoidably) – everything is measured in ms, unless you’re on a local machine, in which case: the OS internally works in microseconds (almost too small to measure). Since Cas’s problem is measured in SECONDS – but is on localhost – it’s probably something external to the OS, external to the networking itself.

Gut feel would be something like Nagle’s algorithm, although I’m sure everyone’s already configured that appropriately ;).

That, or a routing table that’s changing dynamically – e.g. first packet is triggering a “let me start the dialup connection for you”, causing everything to pause (note that “4 seconds” is in the realm of time it takes for modems to connect; it’s outside the realm of ethernet delays, as already noted)

My general advice for “bizarre server-networking bugs”: start doing the things you know are “wrong” from a performance view, and prove that each makes it worse. If one does not, that de facto that one is somehow not configured as intended.

Common causes / things to check

1. Contention: waht’s the CPU + C libraries doing in background? If system has any significant (but small) load, you could be contended on I/O, CPU mgiht not be interrupting to shunt data from hardware up to software (kernel), up to software (OS user), up to JVM

Classic example: MySQL DB running on a system can block I/O even with small CPU load

2. file-level locking in OS FS. Check using lsof which files your JVM is accessing, and what other processes in the machine may have same files open.

(NB: there’s a bunch of tools like “lsof” which let you see which files are in use by which processes in a unix/linux system. As a network programmer, you should learn them all – they can save a lot of time in development. As a network programmer, you need to be a competent SysAdmin!)

Classic problem: something unexpected has an outstanding read (or a pre-emptive write lock) which is causing silliness. I remember several times having a text editor open, or etc, that was inadvertently slowing access to local files (doh).

3. can you snoop the traffic? (i.e. its socket, not pipe)

Classic problem: some unrelated service is bombarding the loopback with crap. eg. Windows networking (samba on linux) going nuts

Try running Wireshark, filtered to show only 127. and see what’s going through at same time

Also: this is a good way to check where the delay is happening. Is it at point of send, or point of receive? … both?

There’s “net traffic” and there’s “net traffic”. Wireshark often shows traffic that OS normally filters out from monitoring apps…

4. Check your routing table? AND: Check it’s not changing before / aftter the attempted read?

5. Try enabling Nagle, see if it has any effect?

My point is: use this as a check: it ought to make things worse. If not … perhaps the disabling wasn’t working?

6. Have you done ANY traffic shaping (or firewalling) on this machine at any time in the past?

Linux: in particular, check the iptables output. Might be an old iptables rule still stuck in there – or even a firewall rule.

Linux + Windows: disable all firewalls, completely.

7. Similarly: do you have any Anti-Virus software?

Disconnect your test machines from the internet, and remove all AV.

AV software can even corrupt your files when they mistakenly think that the binary files used by the compiler/linker are “suspicious” (IIRC, that happened to us with early PlayStation3 devkits).

8. On a related note: security / IPS tools installed? They will often insert artificial delays silently.

CHECK YOUR SYSTEM LOG FILES! …whatever is causing the delay is quite possibly reporting at least a “warning” of some kind.

9. (in this case, Cas’s socket is using SSL): Perhaps something is looking up a certificate remotely over the web?

…checked your certificate chain? (if there’s an unknown / suspicious cert in the chain, your OS might be trying to check / resolve it before it allows the connection)

10. (in this case, Cas is using custom SSL code in Java to “hack” it): Get a real SSL cert from somewhere, see if it behaves any different

(I think it would be very reasonable for your OS to detect any odd SSL stuff and delay it, as part of anti-malware / virus protection!)

11. “Is it cos I is custom?” – Setup an off-the-shelf webserver on the same machine and check if YOUR READING CODE can read from that SSL localhost fast?

(it often helps to get an answer to “is it the writer, the reader, both … is it the hacked SSL, or all SSL?” etc)

12. (getting specific now, as the general ideas didn’t help): if you bind to the local IP address instead of 127 ?

13. Howabout reverse-DNS? (again, 4 seconds is in the realm of a failed DNS lookup)

Might be a reverse DNS issue … e.g. something in OS (kernel or virus protection), or in JVM SSL library, that’s trying to reverse-lookup the 127 address in order to resolve some aspect of SSL.

I’ve had to put fake entries into hosts file in the past to speed-up this stuff, some libs just need a quick answer


Turns out we were able to stop here…

Cas: “Well, whaddya know. It was a hosts lookup after all – because I’m using and it was doing a hostname lookup. I added those to hosts and all is solved

Me: “(dance)” (yeah. It was skype. The dance smiley comes in handy)

Data Structures for Entity Systems: Contiguous memory

This year I’m working on two different projects that need an Entity System (ES). One of them is a non-game app written natively on iOS + Android. The other is an FPS in Unity3D.

There are good, basic Open-Source ES’s out there today (and c.f. the sidebar there). I tried porting a few, but none of them were optimized for performance, and most of them were too tightly coupled to a single programming language or platform. I’ve started a new ES of my own – – to fix these problems, and I’m already using it in an app that’s in alpha-testing.

I’ll be blogging experiences, challenges, and ideas as I go.

2016 Update: Support me on Patreon, writing about Entity Systems and sharing tech demos and code examples

Background: focus on ES theory, or ES practice?

If you’re new to ES’s, you should read my old blog posts (2007 onwards), or some of the source code + articles from the ES wiki.

My posts focussed on theory: I wanted to inspire developers, and get people using an ES effectively. I was fighting institutionalised mistakes – e.g. the over-use of OOP in ES development – and I wrote provocatively to try and shock people out of their habits.

But I avoided telling people “how” to implement their ES. At the extreme, I feared it would end up specifying a complete Game Engine:

…OK. Fine. 7 years later, ES’s are widely understood and used well. It’s time to look at the practice: how do you make a “good” ES?

NB: I’ve always assumed that well-resourced teams – e.g. AAA studios – need no help writing a good ES. That’s why I focussed on theory: once you grok it, implementation concerns are no different from writing any game-engine code. These posts are aimed at non-AAA teams: those who lack the money (or expertise) to make an optimized ES first time around.

For my new ES library, I’m starting with the basics: Data Structures, and how you store your ES data in memory.

Where you see something that can be done better – please comment!

Aside on Terminology: “Processors, née Systems”

ES “Systems” should be batch-processing algorithms: you give them an array/stream of homogeneous data, and they repeat one algorithm on each row/item. Calling them “Processors” instead of “Systems” reduces confusion.

Why care about Data Structures?

There is a tension at the heart of Entity Systems:

  • In an ES game, we design our code to be Functional: independent, data-oriented, highly efficient for streaming, batching, and multi-threaded execution. Individual Processors should be largely independent, and easy to split out onto different CPU cores.
  • With the “Entity” (ID) itself, we tie those Functional chunks together into big, messy, inter-dependent, cross-functional … well, pretty much: BLOBs. And we expect everything to Just Work.

If our code/data were purely independent, we’d have many options for writing high-performance code in easy ways.

If our data were purely chunked, fixed at compile-time, we’d have tools that could auto-generate great code.

But combining the two, and muddling it around at runtime, poses tricky problems. For instance:

  1. Debugging: we’ve gone from clean, separable code you can hold in your head … to amorphous chunks that keep swelling and contracting from frame-to-frame. Ugh.
  2. Performance: we pretend that ES’s are fast, cache-efficient, streamable … but at runtime they’re the opposite: re-assembled every frame from their constituent parts, scattered all over memory
  3. Determinism: BLOBs are infamously difficult to reason about. How big? What’s in them? What’s the access cost? … we probably don’t know.

With a little care, ES’s handle these challenges well. Today I’m focussing on performance. Let’s look at the core need here:

  • Each frame, we must:
    1. Iterate over all the Processors
    2. For each Processor:
      1. Establish what subset of Entity/Component blobs it needs (e.g. “everything that has both a Position and a Velocity”)
      2. Select that from the global Entity/Component pool
      3. Send the data to the CPU, along with the code for the Processor itself

The easiest way to implement selection is to use Maps (aka Associative Arrays, aka Dictionaries). Each Processor asks for “all Components that meet [some criteria]”, and you jump around in memory, looking them up and putting them into a List, which you hand to the Processor.

But Maps scatter their data randomly across RAM, by design. And the words “jump around in memory” should have every game-developer whimpering: performance will be bad, very bad.

NB: my original ES articles not only use Maps, but give complete source implementations using them. To recap: even in 2011, Android phones could run realtime 30 FPS games using this. It’s slow – but fast enough for simple games

Volume of data in an ES game

We need some figures as a reference point. There’s not enough detailed analysis of ES’s in particular, so a while back I wrote an analysis of Components needed to make a Bomberman clone.

…that’s effectively a high-end mobile game / mid-tier desktop game.

Reaching back to 2003, we also have the slides from Scott’s GDC talk on Dungeon Siege.

…that’s effectively a (slightly old) AAA desktop game.

From that, we can predict:

  • Number of Component-types: 50 for AA, 150 for AAA
  • Number of unique assemblages (sets of Component-types on an Entity): 1k for AA, 10k for AAA
  • Number of Entities at runtime: 20k for AA, 100k for AAA
  • Size of each Component in bytes: 64bits * 10-50 primitives = 100-500 bytes

How do OS’s process data, fast?

In a modern game the sheer volume of data slows a modern computer to a crawl – unless you co-operate with the OS and Hardware. This is true of all games. CPU and RAM both run at a multiple of the bus-speed – the read/write part is massively slow compared to the CPU’s execution speed.

OS’s reduce this problem by pre-emptively reading chunks of memory and caching them on-board the CPU (or near enough). If the CPU is processing M1, it probably wants M2 next. You transfer M2 … Mn in parallel, and if the CPU asks for them next, it doesn’t have to wait.

Similarly, RAM hardware reads whole rows of data at once, and can transfer it faster than if you asked for each individual byte.

Net effect: Contiguous memory is King

If you store your data contiguously in RAM, it’ll be fast onto the Bus, the CPU will pre-fetch it, and it’ll remain in cache long enough for the CPU(s) to use it with no extra delays.

NB: this is independent of the programming-language you’re using. In C/C++ you can directly control the data flow, and manually optimize CPU-caching – but whatever language you use, it’ll be compiled down to something similar. Careful selection and use of data-structures will improve CPU/cache performance in almost all languages

But this requires that your CPU reads and writes that data in increasing order: M1, M2, M3, …, M(n).

With Data Structures, we’ll prioritize meeting these targets:

  1. All data will be as contiguous in RAM as possible; it might not be tightly-packed, but it will always be “in order”
  2. All EntitySystem Processors will process their data – every frame (tick) – in the order it sits in RAM
    • NOTE: a huge advantage of ES’s (when used correctly) is that they don’t care what order you process your gameobjects. This simplifies our performance problems
  3. Keep the structures simple and easy to use/debug
  4. Type-safety, compile-time checks, and auto-complete FTW.

The problem in detail: What goes wrong?

When talking about ES’s we often say that they allow or support contiguous data-access. What’s the problem? Isn’t that what we want?

NB: I’ll focus on C as the reference language because it’s the closest to raw hardware. This makes it easier to describe what’s happening, and to understand the nuances. However, these techniques should also be possible directly in your language of choice. e.g. Java’s ByteBuffer, Objective-C’s built-in C, etc.

Usually you see examples like a simple “Renderer” Processor:

  • Reads all Position components
    • (Position: { float: x, float y })
  • Each tick, draws a 10px x 10px black square at the Position of each Component

We can store all Position components in a tightly-packed Array:


This is the most efficient way a computer can store / process them – everything contiguous, no wasted space. It also gives us the smallest possible memory footprint, and lets the RAM + Bus + CPU perform at top speed. It probably runs as fast or faster than any other engine architecture.

But … in reality, that’s uncommon or rare.

The hard case: One Processor reads/writes multiple Component-types

To see why, think about how we’d update the Positions. Perhaps a simple “Movement” Processor:

  • Reads all Position components and all Velocity components
    • (Position: { float: x, float y })
    • (Velocity: { float: dx, float dy })
  • Each tick, scales Velocity.dx by frame-time, and adds it to Position.x (and repeats for .dy / .y)
  • Writes the results directly to the Position components

“Houston, we have a problem”

This is no longer possible with a single, purely homogeneous array. There are many ways we can go from here, but none of them are as trivial or efficient as the tight-packed array we had before.

Depending on our Data Structure, we may be able to make a semi-homogeneous array: one that alternates “Position, Velocity, Position, Velocity, …” – or even an array-of-structs, with a struct that wraps: “{ Position, Velocity }”.

…or maybe not. This is where most of our effort will go.

The third scenario: Cross-referencing

There’s one more case we need to consider. Some games (for instance) let you pick up items and store them in an inventory. ARGH!

…this gives us an association not between Components (which we could handle by putting them on the same Entity), but between Entities.

To act on this, one of our Processors will be iterating across contiguous memory and will suddenly (unpredictably) need to read/write the data for a different Entity (and probably a different ComponentType) elsewhere.

This is slow and problematic, but it only happens thousands of times per second … while the other cases happen millions of times (they have to read EVERYTHING, multiple times – once per Processor). We’ll optimize the main cases first, and I’ll leave this one for a later post.

Iterating towards a solution…

So … our common-but-difficult case is: Processors reading multiple Components in parallel. We need a good DS to handle this.

Iteration 1: a BigArray per ComponentType

The most obvious way forwards is to store the EntityID of each row into our Arrays, so that you can match rows from different Arrays.

If we have a lot of spare memory, instead of “tightly-packing” our data into Arrays, we can use the array-index itself as the EntityID. This works because our EntityID’s are defined as integers – the same as an array-index.


Usage algorithm:

  • For iterating, we send the whole Array at once
  • When a Processor needs to access N Components, we send it N * big-arrays
  • For random access, we can directly jump to the memory location
    • The Memory location is: (base address of Array) + (Component-size * EntityID)
    • The base-address can easily be kept/cached with the CPU while iterating
    • Bonus: Random access isn’t especially random; with some work, we could optimize it further

Problem 1: Blows the cache

This approach works for our “simple” scenario (1 Component / Processor). It seems to work for our “complex” case (multiple Components / Processor) – but in practice it fails.

We iterate through the Position array, and at each line we now have enough info to fetch the related row from the Velocity array. If both arrays are small enough to fit inside the CPU’s L1 cache (or at least the L2), then we’ll be OK.

Each instance is 500 bytes
Each BigArray has 20k entries

Total: 10 MegaBytes per BigArray

This quickly overwhelms the caches (even an L3 Cache would struggle to hold a single BigArray, let alone multiple). What happens net depends a lot on both the algorithm (does it read both arrays on every row? every 10th row?), and the platform (how does the OS handle RAM reads when the CPU cache is overloaded?).

We can optimize this per-platform, but I’d prefer to avoid the situation.

Problem 2: memory usage

Our typeArray’s will need to be approimately 10 megabytes each:

For 1 Component type: 20,000 Entities * 50 variables * 8 bytes each = 8 MB

…and that’s not so bad. Smaller components will give smaller typeArrays, helping a bit. And with a maximum of 50 unique ComponentTypes, we’ve got an upper bound of 500 MB for our entire ES. On a modern desktop, that’s bearable.

But if we’re doing mobile (Apple devices in 2014 still ship with 512 MB RAM), we’re way too big. Or if we’re doing dynamic textures and/or geometry, we’ll lose a lot of RAM to them, and be in trouble even on desktop.

Problem 3: streaming cost

This is tied to RAM usage, but sometimes it presents a bottleneck before you run out of memory.

The data has to be streamed from RAM to the CPU. If the data is purely contiguous (for each component-type, it is!), this will be “fast”, but … 500 MB data / frame? DDR3 peaks around 10 Gigabytes / second, i.e.:

Peak frame rate: 20 FPS … divided by the number of Processors

1 FPS sound good? No? Oh.

Summary: works for small games

If you can reduce your entity count by a factor of 10 (or even better: 100), this approach works fine.

  • Memory usage was only slightly too big; a factor of 10 reduction and we’re fine
  • CPU caching algorithms are often “good enough” to handle this for small datasets

The current build of Aliqua is using this approach. Not because I like it, but because it’s extremely quick and easy to implement. You can get surprisingly far with this approach – MyEarth runs at 60 FPS on an iPad, with almost no detectable overhead from the ES.

Iteration 2: the Mega-Array of Doom

Even on a small game, we often want to burst up to 100,000+ Entities. There are many things we could do to reduce RAM usage, but our biggest problem is the de-contiguous data (multiple independent Arrays). We shot ourselves in the foot. If we can fix that, our code will scale better.


In an ideal world, the CPU wants us to interleave the components for each Entity. i.e. all the Components for a single Entity are adjacent in memory. When a Processor needs to “read from the Velocity and write to the Position”, it has both of them immediately to hand.

Problem 1: Interleaving only works for one set at a time

If we interleave “all Position’s with all Velocity’s”, we can’t interleave either of them with anything else. The Velocity’s are probably being generated by a different Processor – e.g. a Physics Engine – from yet another ComponentType.


So, ultimately, the mega-array only lets us optimize one Processor – all the rest will find their data scattered semi-randomly across the mega-array.

NB: this may be acceptable for your game; I’ve seen cases where one or two Processors accounted for most of the CPU time. The authors optimized the DS for one Processor (and/or had a duplicate copy for the other Processor), and got enough speed boost not to worry about the rest

Summary: didn’t really help?

The Mega Array is too big, and it’s too interconnected. In a lot of ways, our “lots of smaller arrays – one per ComponentType” was a closer fit. Our Processors are mostly independent of one another, so our ideal Data Structure will probably consist of multiple independent structures.

Perhaps there’s a halfway-house?

Iteration 3: Add internal structure to our MegaArray

When you use an Entity System in a real game, and start debugging, you notice something interesting. Most people start with an EntityID counter that increases by 1 each time a new Entity is created. A side-effect is that the layout of components on entities becomes a “map” of your source code, showing how it executed, and in what order.

e.g. With the Iteration-1 BigArrays, my Position’s array might look like this:


  1. First entity was an on-screen “loading” message, that needed a position
  2. BLANK (next entity holds info to say if loading is finished yet, which never renders, so has no position)
  3. BLANK (next entity is the metadata for the texture I’m loading in background; again: no position)
  4. Fourth entity is a 3d object which I’ll apply the texture to. I create this once the texture has finished loading, so that I can remove the “loading” message and display the object instead
  5. …etc

If the EntityID’s were generated randomly, I couldn’t say which Component was which simply by looking at the Array like this. Most ES’s generate ID’s sequentially because it’s fast, it’s easy to debug (and because “lastID++;” is quick to type ;)). But do they need to? Nope.

If we generate ID’s intelligently, we could impose some structure on our MegaArray, and simplify the problems…

  1. Whenever a new Entity is created, the caller gives a “hint” of the Component Types that entity is likely to acquire at some time during this run of the app
  2. Each time a new unique hint is presented, the EntitySystem pre-reserves a block of EntityID’s for “this and all future entities using the same hint”
  3. If a range runs out, no problem: we add a new range to the end of the MegaArray, with the same spec, and duplicate the range in the mini-table.
  4. Per frame, per Processor: we send a set of ranges within the MegaArray that are needed. The gaps will slow-down the RAM-to-CPU transfer a little – but not much


Problem 1: Heterogeneity

Problem 1 from the MegaArray approach has been improved, but not completely solved.

When a new Entity is created that intends to have Position, Velocity, and Physics … do we include it as “Pos, Vel”, “Pos, Phys” … or create a new template, and append it at end of our MegaArray?

If we include it as a new template, and insist that templates are authoritative (i.e. the range for “Pos, Vel” templates only includes Entities with those Components, and no others) … we’ll rapidly fragment our mini-table. Every time an Entity gains or loses a Component, it will cause a split in the mini-table range.

Alternatively, if we define templates as indicative (i.e. the range for “Pos, Vel” contains things that are usually, but not always Pos + Vel combos), we’ll need some additional info to remember precisely which entities in that range really do have Pos + Vel.

Problem 2: Heterogeneity and Fragmentation from gaining/losing Components

When an Entity loses a Component, or gains one, it will mess-up our mini-table of ranges. The approach suggested above will work … the mini-table will tend to get more and more fragmented over time. Eventually every range is only one item long. At that point, we’ll be wasting a lot of bus-time and CPU-cache simply tracking which Entity is where.

NB: As far as I remember, it’s impossible to escape Fragmentation when working with dynamic data-structures – it’s a fundamental side effect of mutable data. So long as our fragmentating problems are “small” I’ll be happy.

Problem 3: Heterogeneity and Finding the Components within the Array

If we know that “Entity 4” starts at byte-offset “2048”, and might have a Position and Velocity, that’s great.

But where do we find the Position? And the Velocity?

They’re at “some” offset from 2048 … but unless we know all the Components stored for Entity 4 … and what order they were appended / replaced … we have no idea which. Raw array-data is typeless by nature…

Iteration 4: More explicit structure; more indexing tables

We add a table holding “where does each Entity start”, and tables for each Component stating “offset for that Component within each Entity”. Conveniently, this also gives us a small, efficient index of “which Entities have Component (whatever)”:


Problem 1: non-contiguous data!

To iterate over our gameobjects, we now need:

  • One big mega-array (contiguous)
  • N x mini arrays (probably scattered around memory)

Back to square one? Not quite – the mini-arrays are tiny. If we assume a limit of 128,000 entities, and at most 8kb of data for all Components on an Entity, our tables will be:

[ID: 17bits][Offset: 13 bits] = 30 bits per Component

…so that each mini-array is 1-40 kB in size. That’s small enough that several could fit in the cache at once.

Good enough? Maybe…

At this point, our iterations are quite good, but we’re seeing some recurring problems:

  • Re-allocation of arrays when Components are added/removed (I’ve not covered this above – if you’re not familiar with the problem, google “C dynamic array”)
  • Fragmentation (affects every iteration after Iteration 1, which doesn’t get any worse simple because it’s already as bad as it could be)
  • Cross-referencing (which I skipped)

I’ve also omitted history-tracking – none of the DS’s above facilitate snapshots or deltas of game state. This doesn’t matter for e.g. rendering – but for e.g. network code it becomes extremely important.

There’s also an elephant in the room: multi-threaded access to the ES. Some ES’s, and ES-related engines (*cough*Unity*cough*), simply give-up on MT. But the basis of an ES – independent, stateless, batch-oriented programming – is perfect for multi threading. So there must be a good way of getting there…

…which gives me a whole bunch of things to look at in future posts :).

PS … thanks to:

Writing these things takes ages. So much to say, so hard to keep it concise. I inflicted early drafts of this on a lot of people, and I wanted to say “thanks, guys” :). In no particular order (and sorry in advance if final version cut bits you thought should be in there, or vice versa): TCE’ers (especially Dog, Simon Cooke, doihaveto, archangelmorph, Hypercube, et al), ADB’ers (Amir Ebrahimi, Yggy King, Joseph Simons, Alex Darby, Thomas Young, etc). Final edit – and any stupid mistakes – are mine, but those people helped a lot with improving, simplifying, and explaining what I was trying to say.

OpenGL ES2 – Shader Uniforms

There’s a famous animated GIF of an infinitely swirling snake (here’s one that’s been Harry Potterised with the Slytherin logo):

Impressive, right?

What if I said it only relies upon one variable, and that you can reproduce this yourself in 3D in mere minutes? It’ll take quite a lot longer to read and understand, but once you’ve grokked it, you’ll be able to do this easily.

Background reading

Make sure you understand:

  1. Draw Calls
  2. VAOs + VBOs
  3. and at least one kind of Texturing:

Uniforms vs. Vertex Attributes

In theory, you don’t need Uniforms: they are a special kind of Vertex-Attribute (which gives them their name: they are a “uniform attribute”).

In practice, as we’ve already seen, bitmap-based texture-mapping in GL ES requires Uniforms to make a link between Texture and Shader.

The code for sending them to the GPU is different from sending Vertex Attributes – annoyingly, it’s more complicated – but the concept is identical. Then why do we have them?


And, as a bonus: convenience.

How many items per Spaceship?

A typical 3D model of a spaceship has:

  • 5,000 polygons
  • 10,000 vertices
  • 5 bitmap tetures

OK, fine, so …

Performance basics

Your vertices each specify which texture(s) they’re using. If you want to change the textures, from “standard” ones to “damaged” ones, you’ll have to:

  1. Upload the new texture (one-time cost; once it’s on GPU you can fast-switch between textures)
  2. Re-Upload the “uses texture 1 (out of the 5)” vertex-attribute … once for every vertex (repeated cost: has to be done every time)

Uniforms bypass this by saying:

A GL Uniform is a Vertex Attribute that has the same value for EVERY vertex. It only needs to be uploaded ONCE and is immediately applied to every vertex

But there’s more …

Each vertex can hold up to 2kb of data (in OpenGL ES; more on desktop GL) – making our ship take 10 megabytes of GPU RAM. But that’s small by today’s standards – and as the model gets more complicated, the storage needed increases.

By contrast, the number of Uniforms needed for a model is typically constant.

The net effect is that GPU vendors can afford to use faster RAM for their Uniforms, boosting performance even further.

Convenience basics

Revisiting that spaceship, if we’re using it in a game, there’s a lot more things we’ll want to include in the 3D model:

  • 10 different versions, one for each player. They’re all the same, but some elements change colour to match the Player’s colour
  • Some of the textures animate: e.g. Landing strips, with lights that strobe
  • Gun turrets need to rotate hundreds of vertices at once, without affecting their neighbours
  • … and so on.

Each of these becomes trivial when your DrawCall has global-constants – i.e Uniforms. For instance:

  • 10 different versions…:
    • The vertices that might change colour have a vertex attribute signalling this; at render-time, the shader sees this flag on the verte, and reads from a Uniform what the “actual” colour should be. Change the uniform, and the colour changes everywhere on the model at once
  • Textures animate…:
    • When you read a U,V value from the texture-bitmap, add a number to U and/or V that comes from a Uniform.
    • Mark the teture as “GL_REPEAT”, so that GL treats it like an infinitely tiled teture
    • Increase that uniform by a tiny amount each time you render a frame (e.g. 0.001), and the texture appears to “scroll”
  • Gun turrets rotate…:
    • Use a second DrawCall to draw the turrets.
    • Each turret has a Uniform “rotation angle in degrees”
    • When rendering, your Shader pre-rotates ALL vertices in each turret by the Uniform’s value.
    • Per frame, change the Uniform for “rotation angle”, and the whole turret rotates at once
  • …etc…

Implementing and Using Uniforms in OpenGL

Render / Update cycle for Uniforms

With Vertex-Attributes, it was easy:

  1. CPU: Generate geometry (load from a .3ds file; algorithm for making a Cube; etc)
  2. GPU: Create 1 or more VBO’s to hold the data on GPU
  3. CPU->GPU: Upload the geometry from CPU to GPU in a single big chunk
  4. Every frame: GPU reads the data from local RAM, and renders it
  5. To change the data, re-do all the above

With Uniforms, it’s more tricky.

Firstly – like everything else in Shaders – Uniforms ignore any OpenGL features that already existed. The GPU intelligently selected the correct VBOs each frame, by using the data inside the VAO. But Shaders ignore the VAO, and need to be manually switched over.

Secondly – in VBO’s, OpenGL does not care what format your data is in. But with Uniforms, suddenly it does care: you have to specify, every time you upload them.

Thirdly – GL uses an efficient C-language mechanism for uploading Uniforms with minimal overhead. With VBO’s, the VAO took care of this automatically, but again: Shaders need you to do it by hand.

Together, these complicate the process:

  1. CPU: generate a value for the Uniform
  2. CPU: create an area in RAM that will hold the value, and place it there
  3. GPU: automatically creates storage for the Uniform when you compile/link the ShaderProgram
  4. CPU->GPU: switch to the specific ShaderProgram that will use the Uniform
  5. CPU->GPU: don’t send the data; instead, send the “memory-address” of the data
  6. CPU->GPU: upload using one of thirty three unique methods (instead of the one for Verte Attributes)
  7. Every frame: GPU reads the data from local RAM, but each ShaderProgram has its own copy
  8. To change the data, re-do all the above

Uploading a value to a Uniform

After you’ve linked your ShaderProgram, you can ask OpenGL about the Uniforms it found. For each Uniform, you get:

  1. The human-readable name used in the GLSL file
  2. The OpenGL-readable name generated automatically (an integer: GLint)
  3. The GLType (int, bool, float, vec2, vec3, vec4, mat2, mat3, mat4, etc)
  4. Is this one value, or an array of values? If it’s an array: how many slots in the array? (all GLSL arrays are fied length)

The GLType has to be saved, because when you want to upload, there’s a different upload method for each distinct type:

glUniform1i – sends 1 * integer

glUniform3f – sends 3 * floats

glUniformMatri4fv – sends N * 4×4-matrices, each using floats internally


To handle this automatically, I wrote three chunks of code:

  1. GLK2Uniform.h/m:: stores the GLType, is it a Matrix or Vector (or float?), etc
  2. GLK2ShaderProgram.h/m .. -(NSMutableDictionary*) fetchAllUniformsAfterLinking: parses the data from the Shader, and creates GLK2Uniform instances
  3. GLK2ShaderProgram.h/m .. -(void) setValue:(const void*) value forUniform:(GLK2Uniform*) uniform: uses the GLType data etc to pick the appropriate GL method to upload this value to the specified Uniform

That last method takes “const void*” as argument: i.e. it has no type-checking. I find this much simpler than continually specifying the type. It also intelligently handles dereferencing the pointer (for matrices and vectors) or not (for ints, floats, etc).

Uniforms and VAOs: a missing feature from OpenGL

So far, we’ve used VAO’s. They’re very useful, seemingly they:

…store all the render-state that is specific to a particular Draw call

Tragically: Shaders ignore VAO’s. Once you start using Uniforms, you find that VAO’s actually:

…store all the render-state that is specific to a particular Draw call, so long as that state isn’t in a ShaderProgram (i.e. isn’t a Uniform)

Shaders are the only place where VAO’s aren’t used, and it’s very easy to forget this and have your code break in weird and wonderful ways. If you find Shader state seems to be leaking between DrawCalls, you almost certainly forgot to eplicitly switch ShaderProgram somewhere.

Note: this applies not only for rendering a DrawCall, but also for setting the value of the Uniform. You must call glUseProgram() before setting a Uniform value.

Because we often want to set Uniform values outside of the draw loop – e.g. when configuring a Shader at startup – I added a method that automatically switches to the right program for you. If you use this repeatedly on every frame, it’ll damage performance, but it’s great for checking if you’ve forgotten a glUseProgram somewhere:


-(void) setValueOutsideRenderLoopRestoringProgramAfterwards:(const void*) value forUniform:(GLK2Uniform*) uniform 
	GLint currentProgram;
	glGetIntegerv( GL_CURRENT_PROGRAM, &currentProgram);
	[self setValue:value forUniform:uniform];

Uniforms in your Game Engine

Game-Engine code has to treat Uniforms a little specially:

  • Unlike Vertex-Attributes, we tend to update Uniforms very frequently – often every frame.
  • We have to reference them by human-readable name (instead of simply ramming them into a homogeneous C-array).
  • We have to remember to keep calling glUseProgram() each time we write to a Uniform, or render a DrawCall.

You can layer it in fancy OOP wrappers, but ultimately you’re forced to have a Hashtable/Map somewhere that goes from “human-readable Uniform name” to “chunk of memory holding the current value, that can be sent to the GPU whenever it changes”.

Desktop GL is different; they modified the GLSL / Shader spec so that it allowed for slightly tighter integration of variables with your main app. Sadly, they didn’t include those features in GL ES

I’ve tried it a few different ways, but the problem is that you have to store a “string” mapping to a “C-struct”. Worse, OpenGL ignores the value of the struct, it only uses the memory-address. So that struct has to be at a stable location in RAM.

This might not seem a problem, but Apple’s system for storing structs in NSDictionary is to create and destroy them on-the-fly (on the stack) – so there’s never a stable memory-address.

Here’s my current best workaround for ObjectiveC OpenGL apps…

An intelligent, C-based, “Map” class


  1. All our data will be structs
    1. C can easily store data if it’s homogeneous
    2. and OpenGL only has circa 10 unique structs for Uniforms
    3. …so: 10 arrays will be enough to store “all possible” Uniform values for a given ShaderProgram
  2. Data is unique per ShaderProgram
    1. C arrays-of-structs can’t change size once created :(
    2. But: the Uniforms for a ShaderProgram are hard-coded, cannot change at runtime
    3. …so: we can create one Map per ShaderProgram, and we know it will always be correct
  3. C-strings are horrible, and we want to avoid them like the plague
    1. We can easily convert C-strings into Objective-C strings (NSString)
    2. Apple’s NSArray stores NSString’s, and returns an int when you ask “which slot contains NSString* blah?”
    3. C allows int’s for direct-fetching of locations in an array-of-structs
    4. …so: we can have a Data Structure of NSString’s, and a separate C-array of structs, and they never have to interact


@interface GLK2UniformMap : NSObject

+(GLK2UniformMap*) uniformMapForLinkedShaderProgram:(GLK2ShaderProgram*) shaderProgram;

- (id)initWithUniforms:(NSArray*) allUniforms;

-(GLKMatrix2*) pointerToMatrix2Named:(NSString*) name;
-(GLKMatrix3*) pointerToMatrix3Named:(NSString*) name;
-(GLKMatrix4*) pointerToMatrix4Named:(NSString*) name;
-(void) setMatrix2:(GLKMatrix2) value named:(NSString*) name;
-(void) setMatrix3:(GLKMatrix3) value named:(NSString*) name;
-(void) setMatrix4:(GLKMatrix4) value named:(NSString*) name;

-(GLKVector2*) pointerToVector2Named:(NSString*) name;
-(GLKVector3*) pointerToVector3Named:(NSString*) name;
-(GLKVector4*) pointerToVector4Named:(NSString*) name;
-(void) setVector2:(GLKVector2) value named:(NSString*) name;
-(void) setVector3:(GLKVector3) value named:(NSString*) name;
-(void) setVector4:(GLKVector4) value named:(NSString*) name;


You create a GLK2UniformMap from a specific GLK2ShaderProgram. It reads the ShaderProgram, finds out how many Uniforms of each GLType there are, and allocates C-arrays for each of them.

Later, you can use the “setBLAH:named:” methods to set-by-value any struct. Importantly, this does NOT take a pointer! This ensures you can create a struct on the fly – all of Apple’s GLKit methods do this. e.g. you can do:

GLK2UniformMap* mapOfUniforms = ...
[mapOfUniforms setVector3: GLKVector3Make( 0.0, 1.0, 0.0 ) named:@"position"];

Connecting the GLK2UniformMap to a GLK2DrawCall

In previous posts, I created the GLK2UniformValueGenerator protocol. This is a simple protocol that uses the same method signatures as used by OpenGL’s Uniform-upload commands.

We etend GLK2UniformMap, and implement that protocol, to create something we can attach to a GLK2DrawCall, and have our rendering do everything else automatically:


@interface GLK2UniformMapGenerator : GLK2UniformMap <GLK2UniformValueGenerator>

+(GLK2UniformMapGenerator*) generatorForShaderProgram:(GLK2ShaderProgram*) shaderProgram;
+(GLK2UniformMapGenerator *)createAndAddToDrawCall:(GLK2DrawCall *)drawcall;


Internally, the methods are very simple, e.g.:


@implementation GLK2UniformMapGenerator
-(GLKMatrix2*) matrix2ForUniform:(GLK2Uniform*) v inDrawCall:(GLK2DrawCall*) drawCall
	return [self pointerToMatrix2Named:v.nameInSourceFile];

NB: in the protocol, I included the GLK2DrawCall that’s making the request. This is unnecessary. In future updates to the source, I’ll probably remove that argument.

Animated textures: the magic of Uniforms

Finally, let’s do something interesting: animate a texture-mapped object.

The sample code has jumped ahead a bit on GitHub, as I’ve been using it to demo things to a couple of different people.

Have a look around the project, but I’ve split into two projects. One contains the reusable library code, the other contains a Demo app that shows the library-code in use.

I simplified all the reusable render code to date into a Library class: GLK2DrawCallViewController (etends Apple’s GLKViewController)

I’ve also moved the boilerplate “create a triangle”, “create a cube” etc code into a Demo class: CommonGLEngineCode

The sample project – permanent link to branch for this article – has a simple ViewController that loads a snake image and puts it on a triangle:


@interface AnimatedTextureViewController ()
@property(nonatomic,retain) GLK2UniformMapGenerator* generator;

@implementation AnimatedTextureViewController

-(NSMutableArray*) createAllDrawCalls
	/** All the local setup for the ViewController */
	NSMutableArray* result = [NSMutableArray array];
	/** -- Draw Call 1:
	 triangle that contains a CALayer texture

	GLK2DrawCall* dcTri = [CommonGLEngineCode drawCallWithUnitTriangleAtOriginUsingShaders:
						   [GLK2ShaderProgram shaderProgramFromVertexFilename:@"VertexProjectedWithTexture" fragmentFilename:@"FragmentTextureScrolling"]];

That’s using the refactored CommonGLEngineCode class to make a unit triangle appear roughly in the middle of the screen.

Then we setup the UniformMapGenerator (no values yet):

	self.generator = [GLK2UniformMapGenerator createAndAddToDrawCall:dcTri];

NB: the generator class automatically detects requests for Sampler2D, and ignores them. Those are only used for texture-mapping, which we handle automatically inside the GLK2DrawCall class (see previous post for details).

	/** Load a scales texture - I Googled "Public Domain Scales", you can probably find much better */
	GLK2Texture* newTexture = [GLK2Texture textureNamed:@"fakesnake.png"];
	/** Make the texture infinitely tiled */
	glBindTexture( GL_TEXTURE_2D, newTexture.glName);
	/** Add the GL texture to our Draw call / shader so it uses it */
	GLK2Uniform* samplerTexture1 = [dcTri.shaderProgram uniformNamed:@"s_texture1"];
	[dcTri setTexture:newTexture forSampler:samplerTexture1];

…again: c.f. previous post for details of what’s happening here, nothing’s changed.

	/** Set the projection matrix to Identity (i.e. "dont change anything") */
	GLK2Uniform* uniProjectionMatrix = [dcTri.shaderProgram uniformNamed:@"projectionMatrix"];
	GLKMatrix4 rotatingProjectionMatrix = GLKMatrix4Identity;
	[dcTri.shaderProgram setValueOutsideRenderLoopRestoringProgramAfterwards:&rotatingProjectionMatrix forUniform:uniProjectionMatrix];
	[result addObject:dcTri];
	return result;


Finally, we now have to implement a callback to update our Generator’s built-in structs and ints and floats once per frame:

-(void)willRenderDrawCallUsingVAOShaderProgramAndDefaultUniforms:(GLK2DrawCall *)drawCall
	/** Generate a smoothly increasing value using GLKit's built-in frame-count and frame-timers */
	double framesOutOfFramesPerSecond = (self.framesDisplayed % (4*self.framesPerSecond)) / (double)(4.0*self.framesPerSecond);
	[self.generator setFloat: framesOutOfFramesPerSecond named:@"timeInSeconds"];

Run the project, tap the button, and you should see snakey skin scrolling along the surface of a 3D triangle:

Screen Shot 2014-02-20 at 01.51.20

From scales-on-a-triangle to realistic snake

The scrolling works by moving our offset across the surface of the triangle. Doing this with a Uniform means that the speed is constant relative to the corners of the triangle.

i.e. if you make the triangle smaller, it will take the same time to cover the distance, but it’s covering a shorter distance, so appears to move slower.

We’re getting the effect – for free! – of skin bunching up and stretching out. All you have to do is make your triangles shorter on the inside of a snake-coil, and longer on the outside.

If you model your snake the easiest possible way, this bunching will happen automatically. Simply take a cylinder and bend it with a transform – the vertex attributes (that force the texture to map across each triangle) won’t change, but the triangle sizes will, causing realistic bunching/stretching of the skin.

GLKit Extended: Refactoring the View Controller

If you’ve been following my tutorials on OpenGL ES 2 for iOS, by the time you finish Texturing (Part 6 or so) you’ll have a lot of code crammed into a single UIViewController. This is intentional: the tutorials are largely self-contained, and only create classes and objects where OpenGL itself uses class-like-things.

…but most of the code is re-used from tutorial to tutorial, and it’s getting in the way. It’s time for some refactoring.
Continue reading

Artemis Entity System in ObjectiveC

I wanted to try the latest version of Artemis, and I had an old game project that was quickly written in OOP style. So I went looking for an ObjC port…

Existing port: outdated

There was a port linked on the Artemis site that was OK – but had no documentation or updates, and Artemis has moved on since then.

There were also no unit tests etc – and the current Artemis getting-started wouldn’t work with this port (because so much has changed). So I started a new port…

New port: ObjC Artemis 100%

Primary aim:

  • make it identical to the core Artemis.

Continue reading

OpenGL ES 2 – Textures 2 of 3: Texture Mapping

Recap: Texturing in OpenGL can be achieved in two separate ways, using different API’s and hardware features. The preferred, modern approach – procedural texturing – uses nothing more than a Fragment shader (this is really what Fragment shaders were created for: Texturing).

Part 5 (Textures: 1 of 3) covered that in detail – but there’s another option: Texture-Mapping (also known as “UV Mapping”, it’s identical).
Continue reading

SVGKit: Programmatic editing of SVG files on iOS

(this is a post mainly to document some new features and fixes I’ve added to SVGKit version 1.2 today – they are already on the main development branch (currently: 1.x), ready for use)

SVGKit overview, November 2013

While a group of us were cleaning up and improving SVGKit about 2 years ago, I did a quick-n-simple re-architect of the in-memory data structures to split it into three wholly independent sets of source code / classes:

  1. Parsing an SVG file
    1. Parsing a legal XML file, with a new conforming DOM parser I wrote from scratch (because SVG Spec requires you to use a DOM parser, and Apple won’t let you use/extend their one on iOS/OSX!)
    2. Parsing the core SVG spec where it differs from XML-DOM
    3. Adding a system for user-supplied custom-parsers so that you can parse your custom XML in-line with the SVG (this is a major feature of SVG, but difficult to parse!)
  2. Rendering an SVG file with pixels on-screen
    1. Converting in-memory SVG data into Apple’s CALayer rendering format (used on all Apple Operating Systems)
    2. Optionally converting Apple’s CALayer format into highly-optimized hybrid data that lets you render SVG’s *fast*
    3. Painstakingly implementing every feature of SVG, from radial gradients to rich text (we’re about 90% complete now, still features left to add – please help!)
  3. Outputting an SVG file back to disk
    1. …not supported … until now!

Continue reading

OpenGL dumb mistakes: the mysterious Perfect Circular Hole

I’ve got a long-running project around rendering 3D Earth in interesting ways. It’s being used in a game project, and some non-game stuff too. Here’s a recent screenshot:

I added a particle system that uses Vector Fields to move the particles based on arbitrary incoming data, and then wraps it round a globe. Yay!:

…and then I increased the size of each particle, and something very strange happened. I wanted to splurge them out to be huge, but

The mysterious Holes Of Doom

… these weird holes appeared in the middle:

WTF? I must have done something stupid. I double checked everything. I removed the textures. I switched from quads to basic tris, with vertex colours … but still, this perfect circle cut out of the center! My vertex and fragment shaders by this point were exactly one line of code!

…I spent ages looking at this, trying to understand what could possibly create “holes” in my geometry. They were obviously holes – you can see through them!

Eventually, tired, and with lots of more important work to do, I gave up, shelved it, went off to do other things. Then today I suddenly realised the problem. I’m sure it’s obvious to most of you, but it’s taken me ages to figure it out.

When a hole is not a hole

And this is what makes it worth posting: the hole is not a hole.

My particles are on a 2D plane, so that I can calculate simple x/y coords from a vector-field. Great. My shader wraps the corners to a sphere. I’m rendering quads / tris like this:


Great. But GL doesn’t “bend” the edges – it draws straight lines between corners. So … when I made my tris larger … the corners were still slightly above the surface of the globe.

…and the EDGES were still slightly above

…but the centers … slightly interesected the globe. And so, of course, the depth-test cut them out. In the diagram below, it looks as if the quad moved closer to the center of the globe – but it hasn’t. The corners are all still the same height above the surface – only the middle has (implicitly) gone “inwards”.


Lesson to self: sometimes, it’s not a triangle-with-a-hole-that-I-can-see-through … instead, it’s a triangle-that-has-a-sphere-poking-through-the-middle


What programming language should aspiring game developers learn in their free time?

I was asked this recently in private mail by someone heading off to University / College. For people choosing between a Computer Science/Software Engineering course vs. a Game Design / Game Development course, I’d said:

If you love games, you’ll do both anyway; one you’ll learn in lectures, the other you’ll self-teach. Which will you find easier to self-teach (given no-one is prodding you), and which will you need the extra benefit of having a formal course/teaching/teachers? (usually: people find it easier to self-teach “designing and making my own game experiments”. Usually, people find formal Computer Science / Maths is ‘hard’, and they need the help of a formal course)

Gross generalization: Most game-developers in the indie community are programmers; most of them are programmer-centric when looking at the world. I’m a programmer, but I’ve done approx 40% of my career in project-management and studio management (and even a tiny bit of publishing) over the years.

There’s one interesting point I want to clarify a little: By “indie community” I mean “focussing on the places filled with people who actually ship games”. Most game designers who want to make their own stuff have to learn programming, even if they’re 100% non-programmers, since it’s the only way to get “your” vision launched. Even the ones that started-off as pure artists often end up being highly technical (whether they admit that to themselves or not).

So … this is not a popular view, but I think it’s more accurate than most: by the time you finish a 3-4 year course starting “today” … C++ won’t be the game-dev language people care about. Already, for entry-level jobs most studios are more interested in “how good are you at Unity? How good’s your C#?”. That trend will only continue/increase.
Continue reading

iOS Open GL ES 2: Multiple objects at once


  1. Part 1 – Overview of GLKit
  2. Part 2 – Drawcalls, and how OpenGL code is architected
  3. Part 3 – Vertices, Shaders and Geometry
  4. Part 4 – (additions to Part 3); preparing for Textures

…but looking back, I’m really unhappy with Part 4. Xcode5 invalidates almost 30% of it, and the remainder wasn’t practical – it was code-cleanup.

So, I’m going to try again, and do it better this time. This replaces my previous Part 4 – call it “4b”.

Drawing multiple 2D / 3D objects

A natural way to “draw things” is to maintain a list of what you want to draw, and then – when the OS / windowing library / whatever is ready to draw, you iterate over your “things” something like:
Continue reading

OpenGL ES 2 for iOS: glDebugging and cleaning-up our VBOs, VAOs, and Draw calls

UPDATE: This post sucks; it has some useful bits about setting up a breakpoint in the Xcode debugger – but apart from that: I recommend skipping it and going straight to Part 4b instead, which explains things much better.

This is Part 4, and explains how to debug in OpenGL, as well as improving some of the reusable code we’ve been using (Part 1 has an index of all the parts, Part 3 covered Geometry).

Last time, I said we’d go straight to Textures – but I realised we need a quick TIME OUT! to cover some code-cleanup, and explain in detail some bits I glossed-over previously. This post will be a short one, I promise – but it’s the kind of stuff you’ll probably want to bookmark and come back to later, whenever you get stuck on other parts of your OpenGL code.

Cleanup: VAOs, VBOs, and Draw calls

In the previous part, I deliberately avoided going into detail on VAO (Vertex Array Objects) vs. VBO (Vertex Buffer Objects) – it’s a confusing topic, and (as I demonstrated) you only need 5 lines of code in total to use them correctly! Most of the tutorials I’ve read on GL ES 2 were simply … wrong … when it came to using VAO/VBO. Fortunately, I had enough experience of Desktop GL to skip around that – and I get the impression a lot of GL programmers do the same.

Let’s get this clear, and correct…
Continue reading

Why is a Fragment Shader named a Fragment Shader?

I’m writing some KISS tutorials on using OpenGL ES 2.0 in iOS/mobile, because I was frustrated that most of the tutorials on GL ES used incorrect or outdated approaches.

(I’m using the OpenGL ES 2.0 Programming Guide as my baseline, along with my own experiences of game-programming, and checking it all against Khronos Group /’s docs as I go)

I use this stuff in iOS apps, but … I’m a relative novice with shader-pipelines. So I’ve been leaning on the guys at The Chaos Engine for proofreading and to help me avoid spreading misinformation. Most stuff people broadly agree on, but there’s been some interesting debate over a seemingly trivial question:

Why does OpenGL use the term “Fragment Shaders” to describe the shaders-that-generate-pixels? (where other API’s call the same thing “Pixel Shaders”, a much easier to understand/remember name)

Continue reading