May 2009 – The Instruction Limit

Top Sources of Heap Garbage

In the last Xbox-related entry I posted, I mentioned how the CLR Profiler can be a very useful tool to know what are the biggest sources of heap garbage in your .NET project. I used it extensively in the past month to optimize my game to run on the Xbox, and here’s a rundown of my biggest programming “mistakes” (or problematic liberties?).

1. LINQ

Even the simplest LINQ queries like :

List<List<Potato>> potatoBags;
foreach (var potato in potatoBags.SelectMany(x => x))
{
  // ...
}

…will cause a noticeable amount of heap garbage. This includes Where clauses, OrderBy clauses, everything! It’s sad because I think that LINQ is a fantastic code-thinning tool, but it’s not an option on limited-memory systems like the Xbox.

That said, if you want to use LINQ at initialization/loading time, feel free to do so. The problems only arise in update/draw calls.

2. Automatic XNB Deserialization

At one point, I got sick or writing ContentTypeWriter and ContentTypeReader classes and started building an XNB automatic serializer and deserializer based on Alexander’s (John Doe?) work on the subject. On Windows, the load times remained the same and it greatly simplified or deleted many of my content pipeline classes.

But on the Xbox the load times were horrible. Even in Release and without the debugger attached, the load times were at least 5x as slow as on my PC. I then discovered that reflection calls generate a lot of heap garbage — and it’s not even clear if memory stays allocated or if it eventually gets compacted by the GC…

So I swallowed my pride and switched back to good old Reader/Writers. But hey, now the load times are super fast.

3. Using classes when you can use structs

Coming from a Java background, I’m very used to classes and I’ll use them for pretty much everything. Even data structures that get created at runtime, because it’s so handy to have references and everyone pointing to the same object…

Turns out using structs has a lot of advantages. Intuitively I thought that the by-copy parameter passing would just make everything slower, but if you keep your structures small enough it has little to no effect on performance. The fact that they reside on the stack and not on the heap makes them a much better option for the Xbox. So datatypes like collision result objects, object identifiers, anything that you need to create often when the game is running should be made into structs.

4. Object pools (are a good thing)

Sometimes you just need to dynamically allocate reference objects in your algorithms. Or even value-types can get allocated on the heap if they’re at class-local scope. But you can minimize the damage by using object pools!

They’re really easy to set up (I found this Ziggyware article to be a good starting point) and they’ll save you heap garbage by preallocating to the number of objects you’ll actually need, and extending the lifetime of objects that would otherwise be disposable trash.

5. Removing objects from a collection that you’re enumerating

I thought I had found a really good way to fix the old problem of “Collection was modified; enumeration operation may not execute” when you remove an object from a collection when you’re foreach’ing on it :

foreach (var potato in potatoBag.ToArray())
{
  if (potato.Expired)
    potatoBag.Remove(potato);
}

It’s pretty cute, no? Very little impact on the iteration code. But it also copies the whole collection to a brand new array everytime you’re enumerating it… :(

What I ended up doing instead to minimize garbage is :

// This is allocated once, at the class-level scope
readonly List<Potato> expiredPotatoes = new List<Potato>();

public void Update()
{
  // Standard update
  foreach (var potato in potatoBag)
  {
    if (potato.Expired)
      expiredPotatoes.Add(potato);
  }
  // Removing pass
  foreach (var expired in expiredPotatoes)
    potatoes.Remove(expired);
  expiredPotatoes.Clear();
}

It’s certainly heavier code-wise, but at least it’s clean. And it’s faster too.

6. Enums as Dictionary keys

If you use an enum as the TKey type parameter for a Dictionary<TKey, TValue> object, you’ll have a small amount of garbage generated everytime you access the dictionary. But there’s an easy way around it : you just need to build a Comparer class for the enum type (which is under 10 lines of code) and pass it to the constructor of your dictionary.

Cheers to Nick Gravelyn for pointing out a solution to that problem.

7. Collections should be pre-allocated

When possible, you should use the parameterized constructors of all your Lists, Dictionaries, HashSets and whatever other collection types that you use, such that their backing arrays are pre-allocated to the number of elements that you plan to add to them.

Starting them with the default parameterless constructor will force the collection to grow (using Array.Resize, which trashes the old array and creates a new, bigger one) until you filled it completely.

Conclusion

That’s it for now.
I know, 7 is a terrible number for a “Top N” list, but I can’t think of other major sources of garbage that I’ve encountered. The rest goes down to good programming practices. (don’t instantiate reference types all over the place, etc.)

Hope it helped!

A Shared Content Manager for XNA

In an average-sized XNA game, you’ll end up having many levels using many art assets, with most of them sharing textures and models between each other. Using the standard ContentManager class, the basic approach is to load all of a level’s assets into a single ContentManager, and unload it when switching levels : this way there is no possible memory leak and memory usage is kept to a minimum.

But what about load times? Users usually want level transitions to be as seamless as possible, yet we can’t just pre-load everything, you gotta watch the memory budget…

Sharing is caring

One solution is to preserve shared assets : an asset that is loaded for Level #1 and re-used in Level #2 can be kept in memory instead of being destroyed and reloaded. Memory-wise it’s costless because you were about to reload it anyway; keeping it for a longer time has no negative effect.

A simple way to keep track of shared assets is to use reference counting : increment a counter whenever you ask to load an asset, and flush assets that have 0 references when you unload. But even the almighty Shawn Hargreaves thinks it’s a bad idea…

[…] reference counting sucks for all sorts of reasons I can’t be bothered to go into here. It is better than nothing, but falls short of the automatic, rapid development approach .NET developers have rightly come to expect.

Fair enough, but how about making asset disposal transparent by using the same ContentManager containers with the same public interface, yet use reference counting in the background?

I tried doing exactly that, and had great success with it, so I suggest you take a look at the code below and give it a shot!

public class SharedContentManager : ContentManager
{
    static CommonContentManager Common;
    List<string> loadedAssets;

    public SharedContentManager(IServiceProvider serviceProvider, string rootDirectory) 
        : base(serviceProvider, rootDirectory)
    {
        EnsureSharedInitialized();
        loadedAssets = new List<string>();
    }

    static void EnsureSharedInitialized() 
    {
        if (Common == null)
            Common = new CommonContentManager(ServiceProvider, RootDirectory);
    }

    // This is ripped straight off the ContentManager disassembled source...
    // Wouldn't have to do that if it were protected! :)
    internal static string GetCleanPath(string path)
    {
        // Ugly, boring code that you'll get if you download the codefile
    }

    public override T Load<T>(string assetName)
    {
        assetName = GetCleanPath(assetName);
        loadedAssets.Add(assetName);
        return Common.Load<T>(assetName);
    }

    public override void Unload()
    {
        if (loadedAssets == null)
            throw new ObjectDisposedException(typeof(SharedContentManager).Name);

        Common.Unload(this);
        loadedAssets = null;

        base.Unload();
    }

    class CommonContentManager : ContentManager
    {
        readonly Dictionary<string, ReferencedAsset> references;

        public CommonContentManager(IServiceProvider serviceProvider, string rootDirectory) 
            : base(serviceProvider, rootDirectory)
        {
            references = new Dictionary<string, ReferencedAsset>();
        }


        public override T Load<T>(string assetName)
        {
            assetName = GetCleanPath(assetName);

            ReferencedAsset refAsset;
            if (!references.TryGetValue(assetName, out refAsset))
            {
                refAsset = new ReferencedAsset { Asset = ReadAsset<T>(assetName, null) };
                references.Add(assetName, refAsset);
            }
            refAsset.References++;

            return (T) refAsset.Asset;
        }

        public void Unload(SharedContentManager container)
        {
            foreach (var assetName in container.loadedAssets)
            {
                var refAsset = references[assetName];
                refAsset.References--;
                if (refAsset.References == 0)
                {
                    if (refAsset.Asset is IDisposable)
                        (refAsset.Asset as IDisposable).Dispose();
                    references.Remove(assetName);
                }
            }
        }

        class ReferencedAsset
        {
            public object Asset;
            public int References;
        }
    }
}

Notes

By design, the class assumes that all your content managers will have the same root path and use the same service provider. This version uses the constructor parameters of the first instance for all subsequent instances. It’s kind of redundant to pass those parameters everytime since they aren’t used after the first instance has been created, you can probably simplify and optimize that part (I did otherwise in my project but it’s tied to my engine code).

Content loading is not thread-safe with this method. The version I use in my project again uses a different way to initialize the common content manager and monitors, but I thought it made the implementation too heavy for demonstration… this too would need work if you use threaded loading.

It works if you use forward slashes for paths because of the GetCleanPath method. But fun fact, it treats paths and filenames as case-sensitive so it will reload assets if you change the case between loadings! So be careful with that, or fix it. :P

Usage

Here’s the procedure for level transitions :

// Create a content manager for the next level
var nextLevelCM = new SharedContentManager(Game.Services, Game.Content.RootDirectory);

// Load the content for this next level
var fooTexture = nextLevelCM.Load<Texture>("foo");
var barSound = nextLevelCM.Load<SoundEffect>("bar");

// Unload the current (old) level's content manager
currentLevelCM.Unload();

// Cycle
currentLevelCM = nextLevelCM;

If you unload the last level’s content manager before you load the next level’s content, all the assets will be reloaded, which renders my code useless. Make sure you follow that order!

The code can be downloaded here : SharedContentManager.cs (4 kB, XNA 3.0 / C#3.5)

And that’s it! Hope it works for you!

Behind Fez : Trixels (part two)

I decided that I would write a more detailed article about the rendering module of the Trixels engine. Many things changed implementation-wise since last year, and I feel that now is a good time to go public as it’s probably not going to change much anymore.

I really don’t mind “coming clean” about how things are done, since Fez certainly didn’t invent non-interpolated orthographic voxels or whatever you might call trixels in a more formal language. Trixels are just a technology support for Fez’s art style, not necessarily the best, but one that works and that I’d like to share.

You should probably check the first post I made about trixels to get the basic idea first.

Here’s the trile I’ll dissect for this post, in a perspective view :

Memory representation

Each trile at its creation is a full 16³ cube, without any holes. The editing process consists of carving trixels out of the full shape to get a more detailed object.

The very first version of Fez recorded all the trixels that are present inside a trile, which means an untouched trile would contain a list of all the possible positions inside the trile to say that these positions are filled.

It became obvious early on that most triles have a lot more matter than holes, so the second version recorded the missing trixels instead. But even that became hardly manageable, especially in the text-format/human readable intermediate trileset description files; they were still too immense to edit by hand if needed, and took a while to parse and load because of their size. The memory usage was questionably high too for the little data that was represented.

I tried using octrees, but that kinda failed too. Thankfully Saint on the tigsource forums gave me a better idea; to use missing trixel boxes, so the smallest amount of the largest possible missing trixels cuboids. The good thing about these is that a box is represented by a 3D vector for its size and a 3D point for its position, that’s it!

Here’s a visualization of what these boxes looks like. Adjacent boxes have the same color.

The editor tries to keep these boxes as big as possible and their number as small as possible in realtime while you edit, but my algorithms aren’t perfect yet. A “best effort” scenario is fine though, a couple of superfluous boxes have little effect on memory usage/file size.

Polygonization

The rendering strategy of Fez is to draw the bounding surfaces of each trile as a triangle list and ignore everything that’s inside a trile. To do that, we must first isolate the contiguous surfaces of the trile, and then split it in triangles.

To extract surfaces, I assume that all of the actions from the initial filled trile are incremental. By that I mean that all you can do in Fezzer when sculpting a trile is remove a trixel or add a trixel, and each of these operation creates, destroys or invalidates surfaces; but each is treated separately and acts on the current state.
…I was going to explain the algorithm in detail, but I feel like it’d be just confusing and unnecessary. So, exercise to the reader. :)

Whenever a surface is modified/created, another pass tries to find the smallest amount of the biggest possible rectangles inside that surface. It’s the same thing as with the missing trixel boxes, but in 2D this time (also alot easier to do right!). My algorithm traverses the surface from its center in an outward spiral and marks all the cells that form a rectangle; the remaining cells are traversed later, recursively until the whole surface is covered with rectangles.

Each rectangle is then a quad that is formed of two triangles, which we can render directly. A vertex pool makes sure that there are no duplicate vertices, and maps the indices appropriately.

Below is a visualization of the rectangular surface parts of the same trile, in filled geometry mode and in wireframe.

Mass rendering

So that’s a single trile. A level is formed of a ton of triles, we don’t want to draw everything at all times. How do we manage that?
And rendering a lot of small objects in sequence is not very GPU-friendly, so how do we group batches efficiently?

The answer to the first question is efficient culling. Since the world is in essence formed of grid-aligned tiles, it’s easy to find which are visible in the screen or not, and render selectively. But Fez also works in the third dimension, and the depth range can be really big, so we need to find which triles we can skip rendering if they’re behind another trile. Simple enough, traverse to the first visible trile for each screen-space tile, and render this one only. But some triles can be flagged as see-through and let the traversing continue until we hit a non-seethrough trile or the level’s boundaries.

For tilted isometric views, a similar culling algorithm is used, where we try to find the triles with no neighbours on the faces that the camera can see. In perspective view, it’s a lot harder to cull without occlusion queries, which I didn’t want to get into… But 99% of the game is played from an isometric perspective so it’s no big deal.

Here’s a scene from the GDC ’09 trailer in a 2D view that you’d play in, and how it’s culled. The world extends in the third dimension, but we only need to see its shell!

The second thing is batching.

In the very first XNA version of Fez, I tried to call DrawUserIndexedPrimitives repeatedly for each trile in the world and hope for the best. It was unplayable, because as it turns out there is considerable overhead to draw calls and doing fewer, bigger draw calls is the key to 3D performance.

Each level has a trile set that contains a restricted number of different trile templates, and the level indexes integer 3D grid positions with elements of that trile set. Trile instances have a very limited set of properties to themselves (like rotation and offset) but every instance of a template trile shares geometry, texture and collision information. So I felt that geometry instancing was the way to go for batch rendering.

Different hardware supports different flavours of instancing but shader instancing is the common baseline for all Shader Model 2 and above GPUs, so I went with that. My current implementation of SM2 instancing supports 237 instances per batch, and the instance information is stored as vertex shader constants. The number will probably go down if I need to add more information to individual instances, it’s really minimalistic right now. But I found instancing to provide excellent performance on older hardware, current-gen GPUs and consoles, and not too hard to get to work well.

That’s it! Hope you enjoyed the tour. :)
Any questions?