C# – Page 3 – The Instruction Limit

Full Xbox 360 Gamepad Support In C#, Without XNA

I think that XNA’s implementation of the XInput API for Xbox 360 controllers is terrific. Simple, complete, ready to use. So when I wanted to add gamepad support (including analog input and vibration) to Super HYPERCUBE, a TV3D 6.5 game, I just added a reference to the XNA Framework assembly and carried on like I did in Fez. But I realize that referencing XNA just for controller support is silly, and I don’t want to assume that it’s installed on clients, nor force-install the redistributable.

So I recently looked for a more proper solution. It looks like MDX 2.0 Beta had support for XInput, but it’s been discontinued in favor of XNA for some time. There is also a handful of managed wrappers around the XInput native DLL that use P/Invoke… I wasn’t too keen on that either.

Then I recalled that SlimDX is the community-maintained successor to MDX, and that it’s awesome. And as a matter of fact, it has complete support for XInput! So I might as well use that.

But SlimDX is, well, slim. It’s a very thin wrapper and doesn’t do any normalization or dead zone detection for thumbsticks. It’s not much of a bother to do yourself, but I figured I’d post my code as a reference to people who want to use SlimDX’s XInput implementation in the real world.

Details

My dead zone code is based on the “Getting Started With XInput” guide on MSDN, but given the C#3/SlimDX beautifying treatment. :)
I also found that analog triggers on my controllers did not need dead zones, but if you want to use them as binary buttons, you can check against the appropriate SlimDX constant.
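For the binary-button case, here is a minimal sketch. Rather than guessing at the exact SlimDX constant name, it hard-codes 30, the value of XINPUT_GAMEPAD_TRIGGER_THRESHOLD in the native XInput headers, and it assumes the trigger readings arrive as bytes in the 0..255 range. TriggerInput is a hypothetical helper, not part of SlimDX.

```csharp
using System;

// Hypothetical helper for analog triggers (raw readings assumed to be 0..255).
static class TriggerInput
{
    // Same value as XINPUT_GAMEPAD_TRIGGER_THRESHOLD in the native XInput headers.
    public const byte TriggerThreshold = 30;

    // Treat the trigger as a digital button: readings at or below the
    // threshold count as "released".
    public static bool IsPressed(byte rawTrigger)
    {
        return rawTrigger > TriggerThreshold;
    }

    // Analog use without a dead zone: map 0..255 linearly to 0..1.
    public static float ToAnalog(byte rawTrigger)
    {
        return rawTrigger / 255f;
    }
}
```

With SlimDX you would pass the gamepad state’s left or right trigger value to IsPressed, assuming those properties are exposed as bytes in your SDK version.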

My “state class” does not wrap everything that the SlimDX Controller class exposes, like voice support, battery information, etc. So I made the local Controller instance a public readonly member, and you can access it to query whatever other information you need. Or of course you can add the getters/state variables that you need.

Download

GamepadState.cs (C#3 – 5 Kb, SlimDX March 2009 SP1 SDK or later needed)

Just the dead zone code

If you’re only looking for that, here it is :

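using System;        // Math
using SlimDX;        // Vector2
using SlimDX.XInput; // Controller, Gamepad

// Called once per frame (the method name is illustrative); Controller is the SlimDX Controller instance
void UpdateThumbsticks()
{
    var gamepadState = Controller.GetState().Gamepad;
    var leftStick = Normalize(gamepadState.LeftThumbX, gamepadState.LeftThumbY, Gamepad.GamepadLeftThumbDeadZone);
    var rightStick = Normalize(gamepadState.RightThumbX, gamepadState.RightThumbY, Gamepad.GamepadRightThumbDeadZone);
    // ... store or use leftStick and rightStick here
}

// Radial dead zone: the result has length 0 inside the dead zone,
// then grows linearly up to 1 at the edge of the stick's range
static Vector2 Normalize(short rawX, short rawY, short threshold)
{
    var value = new Vector2(rawX, rawY);
    var magnitude = value.Length();
    var direction = value / (magnitude == 0 ? 1 : magnitude);   // avoid dividing by zero at rest

    var normalizedMagnitude = 0.0f;
    // Remap [threshold, 32767] to [0, 1]; clamp because diagonals can exceed short.MaxValue
    if (magnitude - threshold > 0)
        normalizedMagnitude = Math.Min((magnitude - threshold) / (short.MaxValue - threshold), 1);

    return direction * normalizedMagnitude;
}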
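To make the remapping concrete, here is the formula that Normalize applies to the stick magnitude m with dead zone threshold t, worked through with t = 7849 (assuming SlimDX’s left-stick constant mirrors XINPUT_GAMEPAD_LEFT_THUMB_DEADZONE, which is 7849). The returned vector is the stick direction scaled by n.

```latex
\[
n =
\begin{cases}
0 & m \le t \\[4pt]
\min\!\left(\dfrac{m - t}{32767 - t},\ 1\right) & m > t
\end{cases}
\qquad
t = 7849,\ m = 20308 \;\Rightarrow\; n = \frac{20308 - 7849}{32767 - 7849} = \frac{12459}{24918} = 0.5
\]
```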

Isotropic Specular Reflection Models Comparison

I’ve decided to repost all my remaining TV3D 6.5 samples to this blog (until I get bored). These are not new, but they were only downloadable from the TV3D forums until now!

This demo (originally released as VB.Net 2005 on February 13th, 2007 here) is a visual and performance comparison (and reference implementation) of five different per-pixel lighting models for isotropic specular reflections :

Phong reflection model
Blinn-Phong (Blinn D1, Phong) specular distribution
Lyon halfway method 1 (for k=2 and D = H* – L)
Trowbridge-Reitz (Blinn D3) specular distribution
Torrance-Sparrow (Blinn D2, Gaussian) specular distribution

My main goals were to :

Make an optimized HLSL implementation of each model that fits in a single Shader Model 2.0 pass and supports 3 lights
Evaluate the performance of each model in a multiple light, per-pixel rendering context
Determine which model keeps the most numerical precision and does not produce artifacts when used with normal-mapping

Download

IsotropicModels.zip [2.7 Mb] – C#3 (VS.NET 2008, TV3D 6.5 Prerelease .NET DLL Required)


Details

The HLSL shader supplied with this sample was made to mimic the built-in TV3D offset-bumpmapping shader as closely as possible. As a result, almost all of its effect parameters are mapped to standard semantics. It supports :

One colored directional light
Two colored point lights in SM2.0, and four in more recent models (SM2a/b and SM3)
All types of vertex fog
Parallax mapping of texture coordinates using a grayscale heightmap
Diffuse mapping with alpha support (a.k.a. texturing)
Normal mapping
Specular mapping (using the alpha channel of the normalmap)
Emissive mapping (colored!)
Usage of all material terms (diffuse, ambient, specular, emissive, power and opacity)

There is no support for point light attenuation, as it would have gone over the 64-instruction arithmetic limit of the ps_2_0 profile (TV3D also doesn’t provide semantics for it).
There is no support for spot lights for the same reasons, but I believe spots will be processed as regular point lights, ignoring their spot-specific parameters.

Techniques

With the realtime controls, you can choose from three different techniques : FiveLightsBranching, FiveLights and ThreeLights. On SM2.0 hardware, only the third option will be valid.

The FiveLightsBranching mode uses loops and “if” statements to produce dynamic branching on SM3.0 compatible hardware. This can (but may not) be beneficial, because only the calculations for enabled lights are performed.
The FiveLights and ThreeLights modes respectively process five and three lights (WHAT YOU SAY !!), but all in a static manner. They don’t just unroll the loop! Most of the calculations are done with matrices, which makes them more efficient on most hardware.

To keep the shader “simple” (or to keep it from becoming even more complex…), I decided not to implement a multipass five-light technique for SM2.0… sorry!

Blinn vs. Phong

There are two major categories among the models I tested : the ones that use the halfway vector, and the Phong model that works with the reflected vector. (See the Wiki entry on Blinn-Phong for details on these vectors.)

A directional light reflecting on a surface with a power value of 64

According to a paper from Siggraph 2004 called Experimental Validation of Analytical BRDF Models, the halfway methods generate specular highlights with more realistic shapes than the Phong model. I realized that myself when working on an ocean rendering shader that had a Phong specular reflection, and it was impossible to get a long grazing highlight when the sun was setting.
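To make the two families concrete, here is a CPU-side C# sketch of both specular terms using System.Numerics.Vector3 (available in .NET Framework 4.6+ and .NET Core). It is an illustration, not the sample’s HLSL, and it assumes all vectors are unit length with L and V pointing away from the surface.

```csharp
using System;
using System.Numerics;

static class Specular
{
    // Phong: reflect L about N, then compare the reflected vector with the view direction.
    public static float Phong(Vector3 n, Vector3 l, Vector3 v, float power)
    {
        Vector3 r = Vector3.Reflect(-l, n);              // r = 2(N.L)N - L
        float rDotV = Math.Max(Vector3.Dot(r, v), 0f);
        return (float)Math.Pow(rDotV, power);
    }

    // Blinn-Phong: compare N with the halfway vector between L and V.
    public static float BlinnPhong(Vector3 n, Vector3 l, Vector3 v, float power)
    {
        Vector3 h = Vector3.Normalize(l + v);
        float nDotH = Math.Max(Vector3.Dot(n, h), 0f);
        return (float)Math.Pow(nDotH, power);
    }
}
```

Both terms peak at 1 in the mirror configuration. Away from it, the Phong lobe is narrower for the same power: when N, L and V are coplanar, the angle between R and V is twice the angle between N and H.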

Normal Mapping Artifacts

One of the things that made me do this whole analysis is that I was dissatisfied with the image quality of Phong and Blinn-Phong when used with a normal map (i.e. with per-pixel lighting). I had huge block artifacts on my water surface, and on everything with a high enough specular power value and a bumpy surface. So I found out about a reformulation of the Blinn-Phong model by Richard F. Lyon, written in 1993 (!) for Apple. (trivia : Mr. Lyon also invented the optical mouse… how awesome is that!)

This reformulation is interesting because it does not use the specular power literally as an exponent; instead, it uses a distance metric and a much lower power value to produce results very similar to the Blinn-Phong model. That matters because using a high specular power (32 or more) hurts floating-point accuracy, even in full-precision mode.

That said, I have seen hardware that does not have this problem. I am starting to think that it may be a driver issue, or something about mobile GPUs… In any case, the safe thing to do is to choose the model that never produces artifacts, right?
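The sketch below illustrates the general idea of trading a large exponent for a distance metric and a small power. It is our own simplified form, not a port of Lyon’s paper or of the sample’s shader. For unit N and H, N·H = 1 − |H − N|²/2, and raising (1 − x/β) to a small power β approximates the same exponential falloff as a large power, so (N·H)^n can be approximated by max(0, 1 − n·|H − N|²/(2β))^β. The choice β = 4 is an arbitrary assumption for illustration.

```csharp
using System;
using System.Numerics;

static class LyonStyle
{
    // Reference Blinn-Phong term: (N.H)^power.
    public static float Exact(Vector3 n, Vector3 h, float power)
    {
        return (float)Math.Pow(Math.Max(Vector3.Dot(n, h), 0f), power);
    }

    // Distance-based stand-in with a small fixed exponent (beta = 4, an arbitrary choice):
    // (N.H)^power ~= max(0, 1 - power * |H - N|^2 / (2 * beta))^beta
    public static float Approx(Vector3 n, Vector3 h, float power)
    {
        const float beta = 4f;
        float d2 = Vector3.DistanceSquared(h, n);    // equals 2 - 2(N.H) for unit vectors
        float t = Math.Max(1f - power * d2 / (2f * beta), 0f);
        float t2 = t * t;
        return t2 * t2;                              // t^4 with two multiplies instead of pow()
    }
}
```

With power 64 and N·H = 0.99, Exact gives about 0.53 and Approx about 0.50, while the approximation only ever raises a value to the 4th power.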

The Other Blinn Distributions

The two other models I implemented (Trowbridge-Reitz and Torrance-Sparrow) were “ported” from the MATLAB code in Lyon’s reformulation paper. I wanted to test them out to see if they had the same artifact problems, and how different they looked from the classic models.

Trowbridge-Reitz is an interesting model because of how it looks. It’s slower than Blinn-Phong, but it has a distinct smoothness to it. The falloff of its specular highlights is softer than the other models… I’m not sure if it’s more accurate, but it looks pretty. Sadly, it has the same problems with normal mapping.

Torrance-Sparrow is visually identical to the Blinn-Phong model. It’s the same thing, but slower and more instruction-heavy. It doesn’t even fit in SM2.0 with 3 lights, so I suggest you disregard it for realtime graphics.

Performance

I found that performance varies a lot depending on which technique you use, which shader model you support, how much your GPU is fillrate-limited instead of arithmetic-limited… So I’ll just say this : the Lyon model looks great, and it’s simple and fast enough to be worth considering. If you don’t experience the artifacts I describe, then the Blinn-Phong model is your best shot, but test Trowbridge-Reitz to see if it’s fast on your hardware.

It’s also worth mentioning that many things could be optimized by factorizing equations into small 1D or 2D textures (or perhaps a normalization cubemap), if your GPU loves pixels and hates instructions. But I don’t believe that the shader can be optimized that much by reorganizing code or removing useless statements. At least not without hurting visual quality.

Component System Updates

This sample contains a major breaking change to my component framework : The Service baseclass is gone. This makes Components able to “be” services (and publish many service interfaces), and allows this sample to have a much simpler class structure… no more state classes! The components just publish whatever data they want via their service interfaces. And with the new Eventful<T> class, it’s really easy to propagate changes from a controller to a view.

Top Sources of Heap Garbage

In the last Xbox-related entry I posted, I mentioned how the CLR Profiler can be a very useful tool for finding the biggest sources of heap garbage in your .NET project. I used it extensively in the past month to optimize my game to run on the Xbox, and here’s a rundown of my biggest programming “mistakes” (or problematic liberties?).

1. LINQ

Even the simplest LINQ queries like :

List<List<Potato>> potatoBags;
foreach (var potato in potatoBags.SelectMany(x => x))
{
  // ...
}

…will cause a noticeable amount of heap garbage. This includes Where clauses, OrderBy clauses, everything! It’s sad because I think that LINQ is a fantastic code-thinning tool, but it’s not an option on limited-memory systems like the Xbox.

That said, if you want to use LINQ at initialization/loading time, feel free to do so. The problems only arise in update/draw calls.
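A minimal sketch of a LINQ-free version of the SelectMany loop above, assuming the same List<List<Potato>> layout (Potato and CountExpired are hypothetical). It relies on foreach over a variable typed as List<T> using List<T>.Enumerator, which is a struct; that holds for the C# compiler on the desktop, but it’s worth confirming with the CLR Profiler on your target.

```csharp
using System.Collections.Generic;

// Hypothetical Potato type standing in for the one in the post.
class Potato
{
    public bool Expired;
}

static class PotatoLoops
{
    // Same traversal as potatoBags.SelectMany(x => x), without LINQ:
    // no iterator objects or delegates are allocated per call.
    public static int CountExpired(List<List<Potato>> potatoBags)
    {
        int expired = 0;
        foreach (var bag in potatoBags)         // List<T>.Enumerator is a struct
        {
            foreach (var potato in bag)
            {
                if (potato.Expired)
                    expired++;
            }
        }
        return expired;
    }
}
```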

2. Automatic XNB Deserialization

At one point, I got sick of writing ContentTypeWriter and ContentTypeReader classes and started building an automatic XNB serializer and deserializer based on Alexander’s (John Doe?) work on the subject. On Windows, the load times remained the same, and it greatly simplified or deleted many of my content pipeline classes.

But on the Xbox the load times were horrible. Even in Release and without the debugger attached, loading was at least 5x slower than on my PC. I then discovered that reflection calls generate a lot of heap garbage, and it’s not even clear if that memory stays allocated or if it eventually gets compacted by the GC…

So I swallowed my pride and switched back to good old Reader/Writers. But hey, now the load times are super fast.

3. Using classes when you can use structs

Coming from a Java background, I’m very used to classes and I’ll use them for pretty much everything. Even data structures that get created at runtime, because it’s so handy to have references and everyone pointing to the same object…

Turns out using structs has a lot of advantages. Intuitively I thought that the by-copy parameter passing would just make everything slower, but if you keep your structures small enough it has little to no effect on performance. Because a struct lives on the stack (or inline inside the object or array that contains it) instead of being a separate heap object, creating one produces no garbage, which makes structs a much better option for the Xbox. So datatypes like collision results, object identifiers, anything that you need to create often while the game is running should be made into structs.

4. Object pools (are a good thing)

Sometimes you just need to dynamically allocate reference objects in your algorithms. Even value types can end up as heap objects, for instance when they get boxed (cast to object or to an interface type). But you can minimize the damage by using object pools!

They’re really easy to set up (I found this Ziggyware article to be a good starting point) and they’ll save you heap garbage by preallocating to the number of objects you’ll actually need, and extending the lifetime of objects that would otherwise be disposable trash.
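Here is a minimal generic pool under our own assumptions: a Stack<T> of preallocated instances, with callers responsible for resetting an object’s state before reusing it. It is not the Ziggyware article’s implementation.

```csharp
using System.Collections.Generic;

// Minimal pool sketch. Objects handed back with Return() must be reset
// by the caller before they are reused.
sealed class Pool<T> where T : class, new()
{
    readonly Stack<T> free;

    // Preallocate the number of objects you expect to need at peak.
    public Pool(int initialSize)
    {
        free = new Stack<T>(initialSize);
        for (int i = 0; i < initialSize; i++)
            free.Push(new T());
    }

    public int FreeCount
    {
        get { return free.Count; }
    }

    // Reuse an idle instance; allocate only if the pool has run dry.
    public T Get()
    {
        return free.Count > 0 ? free.Pop() : new T();
    }

    // Give an instance back so a later Get() reuses it instead of allocating.
    public void Return(T item)
    {
        free.Push(item);
    }
}
```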

5. Removing objects from a collection that you’re enumerating

I thought I had found a really good way to fix the old problem of “Collection was modified; enumeration operation may not execute” when you remove an object from a collection when you’re foreach’ing on it :

foreach (var potato in potatoBag.ToArray())
{
  if (potato.Expired)
    potatoBag.Remove(potato);
}

It’s pretty cute, no? Very little impact on the iteration code. But it also copies the whole collection to a brand new array every time you enumerate it… :(

What I ended up doing instead to minimize garbage is :

// This is allocated once, at the class-level scope
readonly List<Potato> expiredPotatoes = new List<Potato>();

public void Update()
{
  // Standard update
  foreach (var potato in potatoBag)
  {
    if (potato.Expired)
      expiredPotatoes.Add(potato);
  }
  // Removing pass
  foreach (var expired in expiredPotatoes)
    potatoBag.Remove(expired);
  expiredPotatoes.Clear();
}

It’s certainly heavier code-wise, but at least it’s clean. And it’s faster too.
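When the collection is a List<T>, another option worth considering is to walk it backwards by index and call RemoveAt, which needs neither a copied array nor a second list. A sketch (Potato and RemoveExpired are hypothetical):

```csharp
using System.Collections.Generic;

// Hypothetical Potato type standing in for the one in the post.
class Potato
{
    public bool Expired;
}

static class PotatoCleanup
{
    // Iterating from the end means RemoveAt(i) only shifts items that were
    // already visited, so nothing is skipped and no extra collection is needed.
    public static void RemoveExpired(List<Potato> potatoBag)
    {
        for (int i = potatoBag.Count - 1; i >= 0; i--)
        {
            if (potatoBag[i].Expired)
                potatoBag.RemoveAt(i);
        }
    }
}
```

This visits items in reverse order, which matters if your update logic depends on ordering, and each RemoveAt shifts the tail of the list, so it costs more when many items expire at once.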

6. Enums as Dictionary keys

If you use an enum as the TKey type parameter of a Dictionary<TKey, TValue>, you’ll generate a small amount of garbage every time you access the dictionary, because the default comparer boxes the enum values. But there’s an easy way around it : you just need to write an IEqualityComparer<T> class for the enum type (which is under 10 lines of code) and pass it to the constructor of your dictionary.

Cheers to Nick Gravelyn for pointing out a solution to that problem.
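A sketch of such a comparer for a hypothetical Direction enum. It compares and hashes the enum values directly, so lookups never go through object.Equals or object.GetHashCode on a boxed value.

```csharp
using System.Collections.Generic;

// Hypothetical enum used as a dictionary key.
enum Direction { Up, Down, Left, Right }

// Equality comparer that works on the enum type itself (no boxing).
sealed class DirectionComparer : IEqualityComparer<Direction>
{
    public bool Equals(Direction x, Direction y)
    {
        return x == y;
    }

    public int GetHashCode(Direction obj)
    {
        return (int)obj;
    }
}
```

Usage: new Dictionary<Direction, float>(new DirectionComparer()). A single comparer instance can be shared by every dictionary keyed on that enum.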

7. Collections should be pre-allocated

When possible, you should use the parameterized constructors of all your Lists, Dictionaries, HashSets and whatever other collection types that you use, such that their backing arrays are pre-allocated to the number of elements that you plan to add to them.

Starting them with the default parameterless constructor will force the collection to grow (trashing the old backing array and allocating a new, bigger one, much like Array.Resize) every time it runs out of room while you fill it.
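For example, with a hypothetical upper bound of 256 entries, a List<T> created with that capacity allocates its backing array once and never regrows while being filled up to that size:

```csharp
using System.Collections.Generic;

static class Preallocation
{
    // Hypothetical upper bound on the number of entries a level will hold.
    public const int MaxEntities = 256;

    public static List<int> BuildIds()
    {
        // The backing array is allocated once, at the requested capacity...
        var ids = new List<int>(MaxEntities);
        for (int i = 0; i < MaxEntities; i++)
            ids.Add(i);
        // ...so filling it up to that capacity never triggers a regrow.
        return ids;
    }
}
```

Dictionary<TKey, TValue> has a matching constructor that takes an int capacity.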

Conclusion

That’s it for now.
I know, 7 is a terrible number for a “Top N” list, but I can’t think of other major sources of garbage that I’ve encountered. The rest comes down to good programming practices. (don’t instantiate reference types all over the place, etc.)

Hope it helped!

A Shared Content Manager for XNA

In an average-sized XNA game, you’ll end up having many levels using many art assets, with most of them sharing textures and models between each other. Using the standard ContentManager class, the basic approach is to load all of a level’s assets into a single ContentManager, and unload it when switching levels : this way there is no possible memory leak and memory usage is kept to a minimum.

But what about load times? Users usually want level transitions to be as seamless as possible, yet we can’t just pre-load everything, you gotta watch the memory budget…

Sharing is caring

One solution is to preserve shared assets : an asset that is loaded for Level #1 and re-used in Level #2 can be kept in memory instead of being destroyed and reloaded. Memory-wise it’s costless because you were about to reload it anyway; keeping it for a longer time has no negative effect.

A simple way to keep track of shared assets is to use reference counting : increment a counter whenever you ask to load an asset, and flush assets that have 0 references when you unload. But even the almighty Shawn Hargreaves thinks it’s a bad idea…

[…] reference counting sucks for all sorts of reasons I can’t be bothered to go into here. It is better than nothing, but falls short of the automatic, rapid development approach .NET developers have rightly come to expect.

Fair enough, but how about making asset disposal transparent by keeping the same ContentManager containers with the same public interface, yet using reference counting in the background?

I tried doing exactly that, and had great success with it, so I suggest you take a look at the code below and give it a shot!

using System;
using System.Collections.Generic;
using Microsoft.Xna.Framework.Content;

public class SharedContentManager : ContentManager
{
    static CommonContentManager Common;
    List<string> loadedAssets;

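    public SharedContentManager(IServiceProvider serviceProvider, string rootDirectory) 
        : base(serviceProvider, rootDirectory)
    {
        EnsureSharedInitialized(serviceProvider, rootDirectory);
        loadedAssets = new List<string>();
    }

    // The shared container is created from the first instance's parameters
    // (a static method can't read the instance properties ServiceProvider/RootDirectory)
    static void EnsureSharedInitialized(IServiceProvider serviceProvider, string rootDirectory) 
    {
        if (Common == null)
            Common = new CommonContentManager(serviceProvider, rootDirectory);
    }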

    // This is ripped straight off the ContentManager disassembled source...
    // Wouldn't have to do that if it were protected! :)
    internal static string GetCleanPath(string path)
    {
        // Ugly, boring code that you'll get if you download the codefile
    }

    public override T Load<T>(string assetName)
    {
        assetName = GetCleanPath(assetName);
        loadedAssets.Add(assetName);
        return Common.Load<T>(assetName);
    }

    public override void Unload()
    {
        if (loadedAssets == null)
            throw new ObjectDisposedException(typeof(SharedContentManager).Name);

        Common.Unload(this);
        loadedAssets = null;

        base.Unload();
    }

    class CommonContentManager : ContentManager
    {
        readonly Dictionary<string, ReferencedAsset> references;

        public CommonContentManager(IServiceProvider serviceProvider, string rootDirectory) 
            : base(serviceProvider, rootDirectory)
        {
            references = new Dictionary<string, ReferencedAsset>();
        }


        public override T Load<T>(string assetName)
        {
            assetName = GetCleanPath(assetName);

            ReferencedAsset refAsset;
            if (!references.TryGetValue(assetName, out refAsset))
            {
                refAsset = new ReferencedAsset { Asset = ReadAsset<T>(assetName, null) };
                references.Add(assetName, refAsset);
            }
            refAsset.References++;

            return (T) refAsset.Asset;
        }

        public void Unload(SharedContentManager container)
        {
            foreach (var assetName in container.loadedAssets)
            {
                var refAsset = references[assetName];
                refAsset.References--;
                if (refAsset.References == 0)
                {
                    if (refAsset.Asset is IDisposable)
                        (refAsset.Asset as IDisposable).Dispose();
                    references.Remove(assetName);
                }
            }
        }

        class ReferencedAsset
        {
            public object Asset;
            public int References;
        }
    }
}

Notes

By design, the class assumes that all your content managers will have the same root path and use the same service provider. This version uses the constructor parameters of the first instance for all subsequent instances. It’s kind of redundant to pass those parameters every time, since they aren’t used after the first instance has been created; you can probably simplify and optimize that part (I did it differently in my project, but that version is tied to my engine code).

Content loading is not thread-safe with this method. The version I use in my project again initializes the common content manager differently and uses monitors (locks), but I thought that made the implementation too heavy for demonstration… so this too would need work if you use threaded loading.

It works if you use forward slashes in paths, thanks to the GetCleanPath method. But fun fact : it treats paths and filenames as case-sensitive, so it will reload an asset if you change its case between loads! So be careful with that, or fix it. :P
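If you need threaded loading and case-insensitive names, the same reference-counting idea could look like the framework-agnostic sketch below. RefCountedCache and its loader delegate are hypothetical stand-ins for the XNA-specific pieces (in the real class the loader would wrap ReadAsset<T>). It puts a single lock around the dictionary and keys it with StringComparer.OrdinalIgnoreCase; it does not normalize slashes, so paths would still need cleaning the way GetCleanPath does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical thread-safe, case-insensitive reference-counted cache.
sealed class RefCountedCache<T> where T : class
{
    sealed class Entry
    {
        public T Asset;
        public int References;
    }

    readonly Dictionary<string, Entry> entries =
        new Dictionary<string, Entry>(StringComparer.OrdinalIgnoreCase);
    readonly object sync = new object();
    readonly Func<string, T> loader;

    public RefCountedCache(Func<string, T> loader)
    {
        this.loader = loader;
    }

    // Load on first request, otherwise just bump the reference count.
    public T Acquire(string name)
    {
        lock (sync)
        {
            Entry entry;
            if (!entries.TryGetValue(name, out entry))
            {
                entry = new Entry { Asset = loader(name) };
                entries.Add(name, entry);
            }
            entry.References++;
            return entry.Asset;
        }
    }

    // Drop one reference; dispose and forget the asset when none are left.
    public void Release(string name)
    {
        lock (sync)
        {
            Entry entry = entries[name];
            if (--entry.References > 0)
                return;
            entries.Remove(name);
            var disposable = entry.Asset as IDisposable;
            if (disposable != null)
                disposable.Dispose();
        }
    }

    public int Count
    {
        get { lock (sync) return entries.Count; }
    }
}
```

Holding the lock while the loader runs serializes all loads; that keeps the sketch simple at the cost of parallel loading.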

Usage

Here’s the procedure for level transitions :

// Create a content manager for the next level
var nextLevelCM = new SharedContentManager(Game.Services, Game.Content.RootDirectory);

// Load the content for this next level
var fooTexture = nextLevelCM.Load<Texture>("foo");
var barSound = nextLevelCM.Load<SoundEffect>("bar");

// Unload the current (old) level's content manager
currentLevelCM.Unload();

// Cycle
currentLevelCM = nextLevelCM;

If you unload the previous level’s content manager before you load the next level’s content, all the shared assets will be reloaded, which renders my code useless. Make sure you follow that order!

The code can be downloaded here : SharedContentManager.cs (4 kB, XNA 3.0 / C#3.5)

And that’s it! Hope it works for you!

Things you should know before/while making an Xbox XNA game

So I started toying around (read: developing full-time) with Xbox programming using XNA GS 3.0. In fact I took a big Windows Game project with many satellite Game Libraries, a Content Pipeline Extension, a content editor, an automatic serialization library… and “ported” most of it to Xbox. But since the editor will remain Windows-based, the engine and most of the code need to stay cross-platform, compatible with both Windows and Xbox.

And I hit a few walls.

I feel like these are things many people working with XNA will encounter. XNA’s been around for a while now, since version 1.0. Many of these things are already widely discussed in the blogosphere and forums. Also, I’m aware that GS 3.1 is around the corner and it’ll address at least the first point of my rant…

Still, here’s a handful of things that surprised or annoyed me in the transition :

1. ContentTypeReaders need to stay out of Content Pipeline Extension project(s)

Suppose you have a pretty big project that has custom datatypes, and those datatypes are compiled to XNB files using custom ContentTypeWriters and then read back using ContentTypeReaders. You usually need a Content Pipeline Extension project for that, and this project would reference your Engine or whatever project owns the datatypes that you want to compile.

Until very recently, I never quite understood why all official samples had the Reader classes in the Game project, while the Processors, Writers and Importers were all in the Content Pipeline Extension project. Why decouple it like that, and why join them with a fully-qualified assembly string in the GetRuntimeReader method of Writer classes? Moreover, putting Readers in my Extension project always worked in my Windows-only solutions, and it all felt nice and clean.

But when doing everything for Xbox and Windows, the reason becomes clear…

The Content Pipeline Extension project is a standard C# project in XNA GS 3.0. Not a “Game Library” project or any other special container. This means that it won’t be duplicated if you do an Xbox version, and it makes sense; you only need to compile content on your Windows machine.

So your Content Pipeline Extension needs to have a reference to your content datatypes, in some Game Library project. Since the Extension is for Windows only, it’d reference the Windows version of that Game Library. And then if your Readers are in the Extension, your Xbox game needs to reference it to load assets… which means the Xbox and Windows versions of your data structure Library project would coexist on the Xbox. This can’t work!

Besides, the Xbox project doesn’t need to access Processors, Importers and Writers. All it needs is to be able to read content and use it. These other content pipeline classes may even use Windows-specific assemblies like GDI+, why not? They’re certainly useful for image processing.

So bottom line, keep Readers in your Game project if it’s a small project, or in a Game Library for both Xbox and Windows. And if your Content Pipeline can reference this Readers container, then there’s no need for a hardcoded string in GetRuntimeReader; you can just get the assembly-qualified name through reflection (e.g. typeof(YourReader).AssemblyQualifiedName)!

2. The Xbox doesn’t like garbage

The first thing I noticed after I got the game running was a hiccup in the framerate every two seconds or so, even though the framerate was a constant 60 apart from that. I half-expected this… it’s the garbage collection.

This paper by three people at the FZI Research Center for Information Technology explains it much better than I can, so I’ll just quote them… :

The .NET Framework on PC uses a generational approach, making garbage collections less painful and more resilient to large numbers of objects. With 512 MiB of GDDR3 memory shared between GPU and CPU the Xbox 360 garbage collector can’t afford such luxury.
The garbage collection of the .NET Compact Framework for the Xbox 360 always has to check all objects. Therefore the time a collection takes increases linearly with the number of objects. Further a collection will be triggered whenever 1 MiB of memory has been allocated.

This means you really need to stop carelessly allocating on the heap when making an Xbox game. There are several “known causes” of heap garbage with the XNA Framework, but it’s easy to go on a witch-hunt and replace all foreach(in) statements with plain for(;;) loops or stuff like that… It’s a much better idea to find out where the bottlenecks in your application are and fix them, starting with the biggest ones. You’ll probably end up solving most of the jittering without making your code look like C.

The above paper presents some options for memory profiling like the CLR Profiler and XNA Framework Remote Performance Monitor, both of which I have yet to try, but they sound like excellent free tools to address this issue.

Update : See this post for more information on typical causes of heap garbage.

3. The Compact .NET Framework needs your help

The Xbox .NET implementation is not the full-blown framework, it’s based on a subset called the Compact Framework, which is also used on mobile devices and embedded systems. This comes at a small cost : you have to complete it to fit your needs.

It’s actually pretty cool that LINQ is supported and all C# 3.0 features work flawlessly. But here’s a short list (off the top of my head) of things I found missing, some important, some easily worked around, all of them at least mildly annoying… :

Enum.GetValues(), Enum.GetNames(), Enum.GetName() : These are all missing from the CF. There is an old thread on the XNA forums that proposes alternatives that use Reflection. I found them to be working great.
ParameterizedThreadStart : You can’t start a Thread with a context object in the CF. You then need a shared context object in the parent class.
Type.GetInterface(string, [bool]) : You can’t query the interface of a type via reflection in the CF… at least not a single one by name. The GetInterfaces() method is supported, so might as well just use that.
Math.Log(double, double) : Actually, there is a Log(double) function, but it’s with the natural base. The custom-base one is not supported. Seriously? (and I know, the workaround is a one-liner)
HashSet : I love the .NET 3.5 HashSet generic class. It’s really complete, super fast… but the CF doesn’t have it. I ended up faking one with a Dictionary as a backing collection, and rewriting the set operators (UnionWith, IntersectWith, etc.) that I really used.
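Here are minimal sketches of some of the workarounds mentioned in the list, written against the desktop framework. They assume, as the forum thread suggests, that Type.GetFields(BindingFlags) and FieldInfo.GetValue are available on the Compact Framework; the helper names are hypothetical. The documentation doesn’t guarantee the order of fields returned by GetFields, although in practice it follows declaration order.

```csharp
using System;
using System.Reflection;
using System.Threading;

static class CompactFrameworkHelpers
{
    // Enum.GetValues replacement: an enum's members are its public static fields.
    public static T[] GetEnumValues<T>()
    {
        FieldInfo[] fields = typeof(T).GetFields(BindingFlags.Public | BindingFlags.Static);
        var values = new T[fields.Length];
        for (int i = 0; i < fields.Length; i++)
            values[i] = (T)fields[i].GetValue(null);
        return values;
    }

    // Enum.GetNames replacement, using the same fields.
    public static string[] GetEnumNames<T>()
    {
        FieldInfo[] fields = typeof(T).GetFields(BindingFlags.Public | BindingFlags.Static);
        var names = new string[fields.Length];
        for (int i = 0; i < fields.Length; i++)
            names[i] = fields[i].Name;
        return names;
    }

    // Math.Log(double, double) replacement: change of base.
    public static double Log(double value, double newBase)
    {
        return Math.Log(value) / Math.Log(newBase);
    }

    // ParameterizedThreadStart replacement: the lambda captures the context,
    // so the compiler generates the "shared context object" for you.
    public static Thread StartWithContext<TContext>(Action<TContext> work, TContext context)
    {
        var thread = new Thread(() => work(context));
        thread.Start();
        return thread;
    }
}
```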

4. You’ll pretty much need a Content project

I don’t like the fact that in a standard XNA project, the content is compiled at build-time, in the Visual Studio IDE. For many reasons… one of them being that I’m not the one producing the content, the artist does, and he certainly doesn’t want to have Visual Studio installed. Another one being that my Content Processors are super heavy and VS sometimes crashes with an OutOfMemoryException before the build completes.

So what I did is write a content compiling tool that uses the MSBuild API to generate something like the .contentproj (yet simpler) based on the filesystem automatically, and compile it externally without needing Visual Studio. This works really great for Windows, we’ve been using it for months now.

But for Xbox… the deployment process is also tied to Visual Studio. And I’m not expert enough at MSBuild technologies to be able to replicate deployment outside of it. So I ended up making a content project only for the Xbox version of my Game project, and compiling it separately when I need to test on XNA Game Studio Connect. This works OK, but I’m still a little unhappy about this whole Visual Studio dependency. I hope they look into it properly in the future.

So I was under the impression that you needed to have a content project for Xbox deployment, but Leaf (first comment) pointed out two ways of handling content compilation outside of Visual Studio. I’m definitely going to use the first one!

And…

These are the big points for now. Stuff I thought about adding : RenderTargets behave differently on Xbox and PC (but Shawn Hargreaves already blogged extensively on the subject and it’s way better than it used to be), Edit-And-Continue is not supported when debugging on the Xbox (but that would’ve been asking for the moon!), you get different warnings for shader compilation when targeting the Xbox360 platform so you should pay attention to that, etc. etc.

I’m still very much halfway through the conversion process, and I’m still learning, and still discovering oddities. If you have advice or corrections, please let me know through comments! On my part I’ll keep this post updated if I hit another big wall.

Dirtyable Objects

I’m going to start this post with a warning : I haven’t yet decided if this class or practice is a good idea. But I thought it was an interesting use of implicit casting operators, and a cool variation on the Nullable<T> pattern.

public class Dirtyable<T>
{
    T value;

    public T Value 
    { 
        get { return value; }
    }

    public bool Dirty { get; private set; }

    public void Clean()
    {
        Dirty = false;
    }

    internal void Set(T newValue)
    {
        value = newValue;
        Dirty = true;
    }

    public static implicit operator T(Dirtyable<T> dirtyable)
    {
        return dirtyable.Value;
    }
    public static implicit operator Dirtyable<T>(T dirtyable)
    {
        return new Dirtyable<T> { value = dirtyable };
    }
}

Motivation

I made this class in an attempt to abstract the concept that objects that are replaced or written to can be marked as “dirty”, then processed and cleaned by an external actor, such that the “on-dirty” processing is only done when it is needed. I feel like this works better than actual “on-changed” C# events in some cases because there can be many writes or many changes to this value, but the only value that matters is the last write; events would do the processing job on every intermediate change.

It’s important to note that my class only works with immutable objects or value types like primitives or structs, because it isn’t possible (read: easy) to watch the internal state of a mutable reference object. It can only detect that you wrote over and replaced the current value.

Usage

// Host declaration
readonly Dirtyable<Matrix> textureMatrix = new Dirtyable<Matrix>();
public Dirtyable<Matrix> TextureMatrix
{
    get { return textureMatrix; }
    set { textureMatrix.Set(value); }
}

// Modification
mesh.TextureMatrix = new Matrix { M11 = 1, M22 = 1, M33 = 1, M31 = something };

// Access
var m22 = mesh.TextureMatrix.Value.M22;
Matrix m = mesh.TextureMatrix;

// On-dirty processing
if (mesh.TextureMatrix.Dirty)
{
    // This is the "slow" operation
    textureMatrixEffectParameter.SetValue(mesh.TextureMatrix);
    mesh.TextureMatrix.Clean();
}

If you need to access the field a lot and the dereferencing gets annoying, you can also make another public getter in the host class that does it for you. I did that on a couple of occasions.

Thoughts? Objections? Confusion?

A note on heap garbage

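The implicit conversion from T to Dirtyable<T> seemed like a good idea at first, but when profiling my code on the Xbox I found out that it was one of the main sources of heap garbage in my code. Since Dirtyable<T> is a class, every assignment through that operator allocates a brand new wrapper object, only for the host’s setter to unwrap it and throw it away.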

This can be avoided by making the “Set” method public and using it as follows :

mesh.TextureMatrix.Set(new Matrix { M11 = 1, M22 = 1, M33 = 1, M31 = something });

Not much uglier, and less stress on weaker GCs. :)
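To make the last two suggestions concrete (a public Set() and a convenience getter on the host), here is a minimal self-contained sketch. It swaps the XNA/TV3D Matrix for a hypothetical Transform2D struct and uses a hypothetical HostMesh class. It also drops the T to Dirtyable<T> conversion entirely so the allocating path can’t be hit by accident; that last choice is mine, not necessarily the right one for every codebase.

```csharp
using System;

// Sketch of Dirtyable<T> with Set() made public, and without the allocating T -> Dirtyable<T> operator.
public class Dirtyable<T>
{
    T value;

    public T Value { get { return value; } }

    public bool Dirty { get; private set; }

    public void Clean() { Dirty = false; }

    // Public now: updating the existing wrapper in place allocates nothing.
    public void Set(T newValue)
    {
        value = newValue;
        Dirty = true;
    }

    // Reading through an implicit conversion is still fine: it doesn't allocate.
    public static implicit operator T(Dirtyable<T> dirtyable) { return dirtyable.Value; }
}

// Hypothetical stand-in for Matrix, to keep the sketch self-contained.
public struct Transform2D { public float ScaleX, ScaleY, OffsetX; }

// Hypothetical host class.
public class HostMesh
{
    // The only wrapper instance, allocated once up front.
    readonly Dirtyable<Transform2D> textureTransform = new Dirtyable<Transform2D>();

    // Getter only: callers update through Set() on this instance.
    public Dirtyable<Transform2D> TextureTransform { get { return textureTransform; } }

    // Convenience getter that skips the .Value dereference.
    public Transform2D CurrentTextureTransform { get { return textureTransform.Value; } }
}
```

Because TextureTransform has no setter, `mesh.TextureTransform = ...` no longer compiles, which is the point: every update goes through `mesh.TextureTransform.Set(...)` on the same wrapper.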

Loop Parallelism Revisited

Here’s a follow-up to a previous post I made on “Loop Parallelism”.

A faster PersistentThread

In my last post, I wrote a class that keeps a single thread alive in order to reuse it without having to create threads over and over, which sounded like a lot of overhead when you run over 30 updates per second (i.e. in a game loop). However, the benchmarks showed that waiting on synchronization primitives was actually slower than just recreating the threads every time!

It turns out that using a Monitor for the wait/signal operations wasn’t such a good idea. There are lighter primitives in .NET for that simple operation, where all you need is a boolean semaphore. After a bit of research, I found that the best object for the job is the ManualResetEvent. It lets you put a thread to sleep with the WaitOne() method and wake it from another thread with the Set() method; being a manual-reset event, it then stays signaled until you call Reset().

I ended up using two ManualResetEvents : one that the kept-alive thread waits on until it’s handed a new work unit, and another to simulate a thread “Join” operation (without actually ending the thread). And the cost of using them is MUCH smaller than Monitors! Here are the new performance numbers :

Test ‘Multi-Threaded, PersistentThread (single kept-alive thread w/ generic context & delegate)’ Started… Completed.
Time Elapsed : 00:00:11.3376303 s
Test ‘Multi-Threaded, ParameterizedThreadStart’ Started… Completed.
Time Elapsed : 00:00:11.4678206 s
Test ‘Single-Threaded’ Started… Completed.
Time Elapsed : 00:00:22.3374592 s

Seeing as the single-threaded version takes ~22.34 seconds to execute, the theoretical minimum time it could take on my Core 2 Duo is half that, ~11.17s. And the new PersistentThread takes only ~0.17s more than that!
Also notice that the former speed champion, the baseline ParameterizedThreadStart method, is now slightly slower than the new PersistentThread. Honestly, I don’t think one can do any better with just two threads.
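The two-event scheme described above can be sketched roughly as follows. This is not the PersistentThread from the test project (which also carries a generic context and delegate); PersistentWorker is a hypothetical, stripped-down name. The sketch assumes a single thread calls Start() and Join(), and that work units don’t throw.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a kept-alive worker thread driven by two ManualResetEvents.
public sealed class PersistentWorker : IDisposable
{
    // Signaled when a new work unit is ready for the worker thread.
    readonly ManualResetEvent startSignal = new ManualResetEvent(false);
    // Signaled while the worker is idle; waiting on it acts as a "Join".
    readonly ManualResetEvent doneSignal = new ManualResetEvent(true);
    readonly Thread thread;
    Action work;
    volatile bool stopping;

    public PersistentWorker()
    {
        thread = new Thread(Run) { IsBackground = true };
        thread.Start();
    }

    // Hands a work unit to the kept-alive thread.
    public void Start(Action workUnit)
    {
        doneSignal.WaitOne();   // the previous unit must be finished
        work = workUnit;
        doneSignal.Reset();     // manual-reset events stay signaled until Reset()
        startSignal.Set();      // wake the worker
    }

    // Blocks until the current work unit is done, without ending the thread.
    public void Join()
    {
        doneSignal.WaitOne();
    }

    void Run()
    {
        while (true)
        {
            startSignal.WaitOne();
            startSignal.Reset();
            if (stopping) return;
            work();             // work units are assumed not to throw
            doneSignal.Set();
        }
    }

    public void Dispose()
    {
        Join();
        stopping = true;
        startSignal.Set();      // wake the worker so it can leave its loop
        thread.Join();
        startSignal.Close();
        doneSignal.Close();
    }
}
```

Splitting a loop in two then means handing the first half to `worker.Start(...)`, processing the second half on the current thread, and calling `worker.Join()` before reading the results.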

Task Parallel Library

The fancy lads at Microsoft have been working hard lately on new ways of writing concurrent code. It was one of the big topics on Channel 9 this autumn.

So back in June they released a CTP of the Parallel Extensions for .NET, which features the Task Parallel Library, the Parallel static class as an easy front-end to it, and Parallel LINQ. I love it, and I can’t wait to use the official, final release in .NET 4.0. But for now, the CTP is an excellent reference point to see what kind of performance and ease of integration the TPL brings to your project.

The simplicity of my test codepath for the TPL boggles the mind, compared to all the other methods I’ve tested.

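for (int i = 0; i < OuterLoops; i++)
{
    // Caution: System.Random is not thread-safe, so sharing rnd across the
    // parallel loop body can corrupt its state (and rnd.Next() can return 0).
    Parallel.ForEach(testData, sample => { sample.Number = SlowFunctions.Bessel(sample.Number / rnd.Next()); });
}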

The number of threads used, how the data is partitioned, and the actual thread/context handling are all abstracted away. It really makes C# 3.0 lambda expressions shine.
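One caveat with the loop above: System.Random is not thread-safe, and the TPL may run the lambda on several threads at once. A minimal sketch of one common workaround, assuming the final .NET 4.0+ System.Threading.Tasks API rather than the 2008 CTP, gives each worker its own Random through the localInit overload of Parallel.For. Sample and SlowFunction are hypothetical stand-ins for the benchmark’s data type and SlowFunctions.Bessel.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the benchmark's data type.
public sealed class Sample
{
    public double Number;
}

public static class ParallelSketch
{
    // Hypothetical stand-in for SlowFunctions.Bessel, which isn't reproduced here.
    static double SlowFunction(double x)
    {
        return Math.Sqrt(Math.Abs(x)) + 1.0;
    }

    public static void Process(Sample[] testData, int seed)
    {
        Parallel.For(
            0, testData.Length,
            // localInit: runs once per worker, so each one gets its own Random.
            () => new Random(Interlocked.Increment(ref seed)),
            // body: only touches the worker-local Random, never a shared one.
            (i, loopState, rnd) =>
            {
                // Next(1, int.MaxValue) never returns 0, so there's no division by zero.
                testData[i].Number = SlowFunction(testData[i].Number / rnd.Next(1, int.MaxValue));
                return rnd;
            },
            // localFinally: nothing to merge back in this example.
            rnd => { });
    }
}
```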

So, it’s cute all right. But how does it perform? First, note that the TPL allocates as many threads as it thinks would be beneficial, like an automatic thread pool, based on how many processors you have and the workload you give it. I’m not sure of the details, but that’s how I understood it. So I suspect that it uses more than two threads if doing so gives your code more processor time; this matters when many other threads are waiting for the processor, since the more of those threads are yours, the better the chance that your code gets executed.

So here are the numbers :

Test ‘Task Parallel Library’ Started… Completed.
Time Elapsed : 00:00:11.4088743 s

It’s definitely fast, but not quite as fast as my new PersistentThread! Still, it’s so much more flexible, easy to use and scalable that it’s the obvious choice for concurrency in .NET… once it’s officially released. :)

A final note on the Parallel Extensions: this CTP includes a new class called ManualResetEventSlim, whose name suggests that it’s leaner or more optimized. I didn’t use it because it depends on libraries that cannot be shipped, but a quick test showed that its performance was more or less the same as ManualResetEvent’s.

Vista > XP

Since the last post, I upgraded (though some might say downgraded…) to Windows Vista. The experience isn’t totally smooth since my laptop is being dumb, but one of the upsides of upgrading is having access to new kernel code that appears to be more efficient.

My last benchmarks showed quite a margin between the different tests (generic context & delegate, class-local context, etc.). In Vista, they all perform pretty much the same :

Test ‘Multi-Threaded, ParameterizedThreadStart’ Started… Completed.
Time Elapsed : 00:00:11.5178019 s
Test ‘Multi-Threaded, Class-Local Context’ Started… Completed.
Time Elapsed : 00:00:11.4912111 s
Test ‘Multi-Threaded, Generic Context & Delegate’ Started… Completed.
Time Elapsed : 00:00:11.4255807 s

Those differences are really insignificant. I suspect that calling delegates, creating objects or just casting objects has become faster in Vista. It’s really hard to tell, but the good news is that you can choose whichever approach you like best : they’re all equally fast.

Two threads per core?

A final test I wanted to share is doubling the number of threads on an otherwise idle processor to see what happens. So instead of creating a single PersistentThread, I created 3, which means 4 active threads including the current one. The data is split into four equal parts, and…

Test ‘Multi-Threaded, 3x PersistentThread’ Started… Completed.
Time Elapsed : 00:00:12.3893090
Time Elapsed : 00:00:11.3681673
Time Elapsed : 00:00:15.0850932
Time Elapsed : 00:00:10.2940759
Time Elapsed : 00:00:16.0883727

… and I don’t know what to say. The results are ridiculously unstable, and the code is just confusing and ugly. I wouldn’t recommend it.

Also, I did consider using the .NET 2.0 ThreadPool static class, but it seemed a little confusing to use, especially since I want to join the threads once everyone’s done their work. I think I can wait for the TPL for a proper solution. ;)
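For the record, here’s a minimal sketch of one way to join work items queued on the .NET 2.0 ThreadPool: a shared counter decremented with Interlocked, and a ManualResetEvent set by whichever work item finishes last. RunAll is a hypothetical helper, and exceptions thrown by the actions aren’t handled here.

```csharp
using System;
using System.Threading;

public static class ThreadPoolJoin
{
    // Runs each action on the ThreadPool and blocks until all of them have finished.
    public static void RunAll(Action[] actions)
    {
        if (actions.Length == 0) return;
        int remaining = actions.Length;

        using (var allDone = new ManualResetEvent(false))
        {
            foreach (Action action in actions)
            {
                Action captured = action; // explicit copy for pre-C# 5 foreach capture rules
                ThreadPool.QueueUserWorkItem(state =>
                {
                    captured();
                    // The last work item to finish wakes up the waiting thread.
                    if (Interlocked.Decrement(ref remaining) == 0)
                        allDone.Set();
                });
            }
            allDone.WaitOne(); // the "join"
        }
    }
}
```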

Updated test project

Here’s the updated project; you’ll now need the Parallel Extensions CTP for it to compile!
LoopParallelismUpdate.zip (150kb), Visual Studio 2008 Solution for C# 3.5.

Super HYPERCUBE

Note : SUPERHYPERCUBE has been released by Kokoromi for PSVR. I did not work on that version at all, but this article shows its development history.

Super HYPERCUBE (capitalization may vary) is a game I made with the fine folks at Kokoromi for this year’s (2008) Gamma art/game show in Montréal. Gamma is a themed game party that’s been happening for three years now, each year with a design constraint that all participating games must follow; this year’s Gamma 3D was about red/cyan stereoscopy, a.k.a. color anaglyphs.

SHC Logo/Splash Screen : It’s actually all 3D and animated.

The idea is that you have to fit a cluster of cubes inside a wall that represents a projection of that cluster on one of its faces, with a series of rotations applied. So it’s a bit like the Japanese “Human Tetris” game shows, which is the comparison our recent blog coverage has been using, and it’s exactly right. Except you’re handling a random cluster of cubes.

Development Timeline : From concept art, to SketchUp mockup, to early prototype, to final product!

The game, like all other Gamma games, was made to be easy to learn and fun within 5 minutes, because it was going to be played by the public and we wanted as many people as possible to play it. So the concept is fairly simple, but I was surprised by how competitive the gameplay was on the show floor! Until the last minute, I was fighting a fellow party-goer for the #1 high score, which I won by an unfair margin, probably due to luck and… well… hours of testing the game while making it. ;)

Good Luck With That : The shapes get pretty crazy in the last moments.

But our game’s most awesome feature is not just stereoscopy, it’s Wiimote headtracking! Which is a bummer, because even though the game is now available for download, I assume no one will have the setup to play it as it was meant to be played. (The most important part being IR-LED-mounted glasses!)

You can still use an Xbox 360 gamepad or just the keyboard to play it, and that’s how I’ve been testing it most of the time. It’s just nowhere near as immersive without the headtracking… the combination of that and stereoscopy worked really well for us. There will probably be videos of people playing at Gamma 3D sometime soon; I’ll update this post with links.

Updates :

Here’s a photo taken by my friend Matthew, of someone playing SHC with the LED-mounted glasses!
Aaaand Infinite Ammo just posted video footage of SHC @ Gamma 3D with many other Gamma games like the delicious Paper Moon and Fireflies! Also an epic dance routine.
A handful of additional shots of Super HYPERCUBE at Gamma 3D, including me playing it (and clearly enjoying myself)! Thanks to Ivan “Toastie” Safrin for the pics.

Downloads

Binaries (this one isn’t open-source, sorry…) : sHC_final.zip (1.6 Mb)
Update 21/11 02h57 GMT-5 : Put the required font in a texture instead of looking up the TTF. I didn’t realize that Century Gothic wasn’t shipped with Windows anymore…

You will need the .NET 3.5 SP1 framework installed, and TV3D requires some oft-missing DirectX DLLs which you can get with the End-User Runtimes.

Acknowledgments

I have to say that the Wiimote headtracking technology is all thanks to Johnny Chung Lee’s inspiring work on the subject (and free code!), as well as Brian Peek’s C# Wiimote library, without which this would never have happened.

The game itself was programmed using C# 3.5, the Truevision3D 6.5 engine and part of the XNA framework (I’ve bundled the DLL) for full Xbox controller support. There is no sound, and that’s deliberate… there was a DJ at the actual event. :)

And last but not least,…

Credits (I’m not alone in this one!) :

Renaud Bédard – Polytron (Concept, Programming, Hardware)
Phil Fish – Kokoromi/Polytron (Concept, Design)
Jason DeGroot – Polytron (Concept, Hardware)
Cindy Poremba – Kokoromi (Design)
Heather Kelley – Kokoromi (Design)
Damien Di Fede – Kokoromi (Play-Testing)

Trouble In Euclidea, a mini-game

Chances are that if you’re reading this blog, you’ve already heard of this, but I’ll put it here for archiving. :)

Trouble In Euclidea is a game I made for the TIGSource Bootleg Demake Competition as an obvious demake of Geometry Wars. My intent was to demake the graphics (by using ASCII art for everything) more than the gameplay, but it turns out most people did both, which makes much more sense as the gameplay ends up being more original and it plays less like a cheap clone. Mine kinda does. :P

I ended up with five votes in the competition, which places my game 26th out of 30. It may not sound like much, but I’m really happy that I got votes at all! There are really amazing entries that scored around mine, SHADE: Ghost Academy and DamN for instance…

It was also a nice experiment in fast prototyping. I made the game in a single month, but only on weekends except for the last week, which means I spent at most 15 days on it from start to finish, graphics and game code included.

I used C# 3.5, IrrKlang.NET, TV3D and my spiffy new XNA-inspired Component Framework to build it. It worked really well for me, so I decided to release the source of the whole project. As with most of my code recently, it has an almost complete absence of comments, but should be fairly self-explanatory.

Downloads

Source with libraries and content : TroubleInEuclidea_src_r4.zip (2.7 Mb)

Binaries only : TroubleInEuclidea_r4.zip (1.6 Mb)

(for those wondering, the fourth update “r4” only contains bugfixes in the component framework, a new version of IrrKlang and very little code cleanup)

Of course you’ll need .NET 3.5 installed, and TV3D requires some oft-missing DirectX DLLs which you can get with the End-User Runtimes.

Closing Notes

There are three known bugs :

Sometimes fuchsia octagonal enemies make their spawning sound, but don’t actually spawn.
Sometimes enemies appear too close together and “bounce” very quickly, sometimes traversing the whole screen in less than a second.
If you close the game with Alt+F4, it won’t actually close. Use the ESC key!!

There are also two achievements, read the ReadMe file for more info! :)

Fast .NET Reflection and Serialization

(sorry if you got this twice in your RSS, I hit the “publish” button too early…)

A while ago I decided to make an automatic serializer that works just like the XmlSerializer but for the SDL file format, since I like the simplicity and elegance of this data language. The XmlSerializer also doesn’t work natively with Dictionary objects, and crashes when used with certain visibility combinations and C# 3.0 auto-implemented properties.

Making a serializer for any language implies heavy use of reflection to determine the structure of the objects you’re reading from or writing to a data file, and also to invoke the getters and setters of the members you’re serializing.

Performance considerations

Some reflection operations come at a heavy performance cost. Not all of them though! This 2005 article in MSDN Magazine explains that fetching custom attributes, FieldInfo/PropertyInfo objects, invoking functions/properties and members and creating new instances are the costliest operations. Well that’s a problem, because all of those will be handy when writing our serializer.

The same article goes on to compare method invocation techniques from fastest to slowest. The fastest are direct calls, virtual method calls and direct delegate invocation, but those are impossible to use if all you’ve got is a Type and an Object. The next best thing is using a DynamicMethod object, IL emission and a delegate. Having never used IL before, I didn’t grasp all of that, but thankfully there are many other resources concerning the use of DynamicMethod out there.

A post on Haibo Luo’s blog from 2005 makes a performance comparison between Activator.CreateInstance() (by the way, doing “new T()” with a generic type parameter that’s constrained as “new()” is the exact same as calling this method) and various other techniques including DynamicMethod and using it as a delegate. This last technique blows the rest out of the water in terms of speed.
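Here’s a minimal sketch of that last technique, assuming full trust so the DynamicMethod can be associated with the target type. It emits a newobj (or a boxed default value for structs) once, then returns a Func<object> that creates instances without going through Activator.CreateInstance(). ConstructorFactory, Widget and Cell are hypothetical names for illustration.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical demo types.
public class Widget { public int Size = 3; }
public struct Cell { public int Value; }

public static class ConstructorFactory
{
    // Builds a Func<object> that creates instances of 'type' via emitted IL.
    // Build it once and reuse it: emitting on every call is far slower than plain reflection.
    public static Func<object> CreateFactory(Type type)
    {
        var method = new DynamicMethod(
            "Create_" + type.Name, typeof(object), Type.EmptyTypes,
            type,   // owner: the method is logically associated with this type
            true);  // skipVisibility: allows non-public constructors
        ILGenerator il = method.GetILGenerator();

        if (type.IsValueType)
        {
            // Structs: zero-initialize a local (explicit here for clarity), then box it.
            LocalBuilder local = il.DeclareLocal(type);
            il.Emit(OpCodes.Ldloca, local);
            il.Emit(OpCodes.Initobj, type);
            il.Emit(OpCodes.Ldloc, local);
            il.Emit(OpCodes.Box, type);
        }
        else
        {
            // Classes: call the parameterless constructor, public or not.
            ConstructorInfo ctor = type.GetConstructor(
                BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic,
                null, Type.EmptyTypes, null);
            if (ctor == null)
                throw new ArgumentException("No parameterless constructor on " + type);
            il.Emit(OpCodes.Newobj, ctor);
        }

        il.Emit(OpCodes.Ret);
        return (Func<object>)method.CreateDelegate(typeof(Func<object>));
    }
}
```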

This GPL library on CodeProject written by Alessandro Febretti provides an excellent dynamic method factory. And this other article on CodeProject goes a bit further and shows how to set/get values on fields, and isolates the boxing in helper functions.
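For the field access side, here’s a rough sketch of the same idea applied to reads (not taken from either CodeProject article). It emits a small method that casts or unboxes the target, loads the field and boxes the result if needed, so the boxing stays isolated in one spot. FieldGetterFactory, SampleData and SamplePoint are hypothetical.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical demo types: one class and one struct.
public class SampleData { public int Count = 42; public string Name = "cube"; }
public struct SamplePoint { public int X; }

public static class FieldGetterFactory
{
    // Builds a Func<object, object> that reads an instance field of its argument.
    public static Func<object, object> CreateGetter(FieldInfo field)
    {
        Type declaringType = field.DeclaringType;
        var method = new DynamicMethod(
            "Get_" + field.Name,
            typeof(object),               // return type
            new[] { typeof(object) },     // single parameter: the target instance
            declaringType,                // owner
            true);                        // skipVisibility: allows non-public fields

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        // Turn the object argument into a reference (class) or managed pointer (struct) for ldfld.
        il.Emit(declaringType.IsValueType ? OpCodes.Unbox : OpCodes.Castclass, declaringType);
        il.Emit(OpCodes.Ldfld, field);
        // Box value-type fields so they can be returned as object.
        if (field.FieldType.IsValueType)
            il.Emit(OpCodes.Box, field.FieldType);
        il.Emit(OpCodes.Ret);

        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }
}
```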

What I ended up doing was taking bits from all of these examples, fixing the problems outlined in the comments of both CodeProject samples, and building an IReflectionProvider interface that exposes all these costly operations and can be implemented in three different ways :

DirectReflector : Simply via reflection
EmitReflector : With IL emission but no caching performed (the DynamicMethods and delegates are rebuilt on each call)
CachedReflector : With IL emission and caching (the resulting delegates are created only once, then accessed with a dictionary lookup)

I’m aware that the second test case is ridiculous (you should never emit IL and generate methods repeatedly at runtime), but I wanted to highlight the importance of caching.
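The caching idea behind CachedReflector boils down to building each delegate once and looking it up afterwards. The sketch below is hypothetical (it isn’t the sample’s CachedReflector API): it takes the expensive delegate builder as a constructor parameter, so an IL-emitting factory could be plugged in there, and guards the dictionary with a lock in case several threads serialize at once.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch: one delegate build per field, then dictionary lookups.
public sealed class CachedGetters
{
    readonly Func<FieldInfo, Func<object, object>> buildGetter;
    readonly Dictionary<FieldInfo, Func<object, object>> cache =
        new Dictionary<FieldInfo, Func<object, object>>();
    readonly object gate = new object();

    // How many times the expensive builder actually ran.
    public int BuildCount { get; private set; }

    public CachedGetters(Func<FieldInfo, Func<object, object>> buildGetter)
    {
        this.buildGetter = buildGetter;
    }

    public object GetValue(FieldInfo field, object target)
    {
        Func<object, object> getter;
        lock (gate)
        {
            if (!cache.TryGetValue(field, out getter))
            {
                getter = buildGetter(field);   // e.g. emit a DynamicMethod here
                cache.Add(field, getter);
                BuildCount++;
            }
        }
        return getter(target);
    }
}
```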

The serializer

When making this sample, I wanted to provide both a fast .NET reflection library and a proper generic implementation of a reflective serializer. But I didn’t want to spend time on string parsing/formatting, which serializers usually need since they output a text file or some specific data format. So the tradeoff I chose makes it somewhat unusable in the real world…

It outputs objects that attempt to generalize all .NET objects. There are three main categories :

SerializedAtoms are indivisible, single-valued and immutable. All primitive types will serialize to atoms, in addition to strings, enums and nullable types.
SerializedCollections are multi-valued object bags that don’t give a specific meaning to keys or indices other than natural ordering. All classes that implement ICollection<T> will serialize into this.
SerializedAggregates are multi-valued object maps that use the key or index for identification. Everything that doesn’t fall into the other two categories serializes to an aggregate, so dictionaries and just about any other class.
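The three-way split above could be decided from the runtime type roughly like this. It’s a hypothetical sketch, not the sample’s code. Note that Dictionary<K, V> also implements ICollection<KeyValuePair<K, V>>, so dictionaries have to be checked before the collection test to land in the aggregate category. Treating decimal as an atom is my assumption, since it isn’t a CLR primitive.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public enum SerializedCategory { Atom, Collection, Aggregate }

// Hypothetical classifier following the rules described above.
public static class CategoryClassifier
{
    public static SerializedCategory Classify(Type type)
    {
        bool isNullable = type.IsGenericType && type.GetGenericTypeDefinition() == typeof(Nullable<>);
        if (type.IsPrimitive || type.IsEnum || type == typeof(string) || type == typeof(decimal) || isNullable)
            return SerializedCategory.Atom;

        // Keys carry meaning in dictionaries, so they're aggregates, even though
        // they also implement ICollection<KeyValuePair<K, V>>.
        if (typeof(IDictionary).IsAssignableFrom(type) || ImplementsGeneric(type, typeof(IDictionary<,>)))
            return SerializedCategory.Aggregate;

        if (ImplementsGeneric(type, typeof(ICollection<>)))
            return SerializedCategory.Collection;

        return SerializedCategory.Aggregate;
    }

    // True if 'type' is or implements a closed form of the given generic interface.
    static bool ImplementsGeneric(Type type, Type genericInterface)
    {
        return type.GetInterfaces()
            .Concat(type.IsInterface ? new[] { type } : Type.EmptyTypes)
            .Any(i => i.IsGenericType && i.GetGenericTypeDefinition() == genericInterface);
    }
}
```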

Only atoms contain actual values, and they hold them as plain objects. No string conversion is done in the end; it all stays in memory. Serialized objects also retain the name of their host field or dictionary entry if any, and the runtime type if it differs from the declared one.

To customize the serialization output to an extent, I made a custom attribute called [Serialization] which lets you force an alternate name onto a serialized member, mark a member as ignored by the serializer, or mark it as required. I could’ve used “optional” instead, but I find it more logical to skip serialization of all null or default-valued fields by default.
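A hypothetical shape for such an attribute, and for the rule that null or default-valued members are skipped unless marked as required, could look like this. The property names (Name, Ignore, Required) and the GameSettings class are assumptions based on the description above, not the sample’s actual API.

```csharp
using System;
using System.Reflection;

// Hypothetical [Serialization] attribute.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property, AllowMultiple = false)]
public sealed class SerializationAttribute : Attribute
{
    public string Name { get; set; }     // alternate name for the member
    public bool Ignore { get; set; }     // never serialize this member
    public bool Required { get; set; }   // serialize even when null/default
}

// Hypothetical serializable class.
public class GameSettings
{
    [Serialization(Name = "res")] public string Resolution = "1280x720";
    [Serialization(Ignore = true)] public int CachedHash;
    [Serialization(Required = true)] public string Profile;   // null, but still written
    public float Volume;                                        // skipped while it's 0
}

public static class SerializationRules
{
    // Returns the name to serialize a member under, or null if it should be skipped.
    public static string ResolveName(MemberInfo member, object value)
    {
        var attr = (SerializationAttribute)Attribute.GetCustomAttribute(member, typeof(SerializationAttribute));
        if (attr != null && attr.Ignore)
            return null;

        bool isDefault = value == null ||
            (value.GetType().IsValueType && value.Equals(Activator.CreateInstance(value.GetType())));
        if (isDefault && (attr == null || !attr.Required))
            return null;

        return attr != null && attr.Name != null ? attr.Name : member.Name;
    }
}
```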

Just like the XmlSerializer, it only serializes public instance fields and properties. So unlike the BinaryFormatter (which does deep serialization), my serializer does shallow serialization.
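As a rough approximation of that public-only rule, the members such a serializer would consider can be gathered with BindingFlags.Public | BindingFlags.Instance. The exact filtering here (skipping readonly fields, properties without both a public getter and setter, and indexers) is an assumption for illustration, not a faithful copy of XmlSerializer’s rules; it does exclude auto-implemented properties with a private setter. PublicMembers and ExampleConfig are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class PublicMembers
{
    // Public instance fields and public read/write, non-indexed properties.
    public static List<MemberInfo> Serializable(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        var members = new List<MemberInfo>();
        members.AddRange(type.GetFields(flags).Where(f => !f.IsInitOnly));
        members.AddRange(type.GetProperties(flags).Where(p =>
            p.GetGetMethod() != null && p.GetSetMethod() != null &&   // public accessors only
            p.GetIndexParameters().Length == 0));                     // no indexers
        return members;
    }
}

// Hypothetical class exercising each rule.
public class ExampleConfig
{
    public int Visible;                                        // kept
    public readonly int ReadOnlyField = 1;                     // skipped: readonly
    private int hidden = 2;                                    // skipped: private
    public string Name { get; set; }                           // kept
    public string Tag { get; private set; }                    // skipped: private setter
    public string Computed { get { return Name + hidden; } }   // skipped: no setter
    public int this[int i] { get { return i; } set { } }       // skipped: indexer
}
```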

I have tested the implementation with many (if not all) combinations of value type/class, serialized object category and visibility, so I can say it’s pretty robust and tolerant of what you feed it.

Results

This is the whole point… how fast does “Fast .NET Reflection” go? Here are the timings for 10 outer loops (so 10 serializer creations) and 100 inner loops (100 serializations per outer loop), which means 1000 serializations of the same complex aggregate object.

Test ‘Standard Reflection’ Started… Completed.
Time Elapsed : 00:00:08.2473666
Test ‘Reflection.Emit + Delegate (No Caching)’ Started… Completed.
Time Elapsed : 00:01:52.4517968
Test ‘DynamicMethod + Delegate, Cached’ Started… Completed.
Time Elapsed : 00:00:00.9970487

Well, I did say that no caching was a very bad idea.

Still, the highlight here is that when the same serialization code runs with two different reflection providers, dynamic IL methods plus a healthy dose of caching are about eight (8!) times faster than standard reflection (~1.00 s versus ~8.25 s).

Sample code

The code for this sample (C# 3.5, VS.NET 2008) can be found here : FastReflection.zip (46 Kb)

Even if you’re not interested in serialization, I suggest you take a look at the EmitHelper class and how it’s used in CachedReflector. All tasks that need Reflection in a time-critical context should use dynamic methods!