Loop Parallelism in a Game Context

An update to this post is available here.

Recently I was coding some physics-enabled particle systems and ended up with some fairly CPU-intensive stuff that loops as many times as there are active particles in a system, and as many times as there are active particle systems, each update cycle.

And it got choppy.

I don’t like choppy.

So I had three choices :

  • Offload to the GPU using GPGPU or Vertex Textures, but doing that in XNA with Shader Model 2 support is… time-consuming. Especially when you have no background on the subject.
  • Profile and optimize my physics code, or switch to a physics package like Havok or Farseer, but I expect that would also take too much time.
  • Multithread using loop parallelism to use 100% of my CPU power instead of 50%! (I have a Core 2 Duo processor, and there are more and more mainstream PCs with dual-, quad- and even octo-core CPUs)

I ended up doing the last one because it was the simplest, even though multithreading is rarely ever simple. But in some specific cases, it’s not much of a headache, really.

If your looped operation does not change the context of further iterations, that is, if every iteration can run in any order and on its own without any write locking, it takes like 30 minutes to get it working in C#, with no concurrency problems. It’s like offloading static Web content to a different server, just a matter of routing to the right hardware.
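
As a rough idea of the pattern, here’s a minimal sketch (the Particle fields and the physics step are made up, this is not my actual particle code) that splits an independent per-element update across a fixed number of threads :

using System;
using System.Threading;

class Particle { public float X, Y, VelX, VelY; }

static class ParallelLoop
{
    // Each thread writes only to its own slice of the array, so no locking is needed.
    public static void UpdateInParallel(Particle[] particles, float elapsed, int threadCount)
    {
        Thread[] threads = new Thread[threadCount];
        int chunk = (particles.Length + threadCount - 1) / threadCount;

        for (int t = 0; t < threadCount; t++)
        {
            int start = t * chunk;
            int end = Math.Min(start + chunk, particles.Length);

            threads[t] = new Thread(delegate()
            {
                for (int i = start; i < end; i++)
                {
                    // Placeholder for the real physics step
                    particles[i].X += particles[i].VelX * elapsed;
                    particles[i].Y += particles[i].VelY * elapsed;
                }
            });
            threads[t].Start();
        }

        // Block the update cycle until every slice is done
        foreach (Thread thread in threads)
            thread.Join();
    }
}

Each slice is a contiguous range and no iteration reads another iteration’s output, which is exactly the “any order, no write locking” condition above.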

Annoyances and questions

  • ParameterizedThreadStart is not generic. If you have to pass a context to your threaded operation, you have to cast it back to what it really is, at each iteration (see the sketch right after this list). It’s unnecessary and annoying… is it slow?
  • Once a thread dies, you can’t resurrect it. Correct me if I’m wrong, but FSM knows I’ve tried, and a thread that finished its execution cannot be re-run, even if its thread-start method is the same. You have to instantiate another. Does that affect performance…?
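
To make the first point concrete, here’s the shape of the code in question; the context and particle types are hypothetical, only the casting pattern matters :

using System.Threading;

class Particle { /* position, velocity, etc. */ }

// Hypothetical per-system context handed to the worker thread
class ParticleContext
{
    public Particle[] Particles;
    public float Elapsed;
}

static class ThreadedUpdater
{
    // ParameterizedThreadStart only hands us an Object, so the context
    // has to be cast back to its real type every time the thread runs.
    static void ThreadedUpdate(object state)
    {
        ParticleContext context = (ParticleContext)state;
        // ... update context.Particles using context.Elapsed ...
    }

    // One throwaway thread per particle system, per update cycle
    public static void StartUpdate(Particle[] particles, float elapsed)
    {
        Thread thread = new Thread(new ParameterizedThreadStart(ThreadedUpdate));
        thread.Start(new ParticleContext { Particles = particles, Elapsed = elapsed });
    }
}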

The problem with thread creation is that I create a lot of temporary threads. They live a single update cycle, and I instantiate one per particle system per update cycle, at about 60 cycles per second. It sounds like it could slow things down.

So I made a test!

Benchmark

The (fictional) situation is the following : in the middle of a game loop, you have to evaluate the 0th Order Modified Bessel Function Of The First Kind for some reason. And you need to do that for a large dataset, say 500 times per update cycle.

Actually, this kinda makes sense if you’re calculating weights for a very wide Kaiser filter, which I will probably cover later. I had code lying around, which is why I used the elegantly-named Bessel function.
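
For the curious, this Bessel function can be written as an integral, which is roughly what the benchmark grinds through; here’s a naive sketch of the evaluation (the step count is arbitrary, and this is not the exact code from the test project) :

using System;

static class Bessel
{
    // 0th order modified Bessel function of the first kind :
    // I0(x) = (1/pi) * integral from 0 to pi of exp(x * cos(theta)) d(theta),
    // evaluated with a plain midpoint rule. Slow on purpose, like the benchmark.
    public static double I0(double x)
    {
        const int steps = 10000;
        double sum = 0.0;
        double h = Math.PI / steps;

        for (int i = 0; i < steps; i++)
        {
            double theta = (i + 0.5) * h;
            sum += Math.Exp(x * Math.Cos(theta));
        }
        return sum * h / Math.PI;
    }
}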

So, this is slow because it’s evaluated as an integral, and that’s why multi-threading would be beneficial. The different strategies are :

  • Single-Threading; it’s always good to know how slow it went before all optimization… if only for programmer ego.
  • Multi-Threading, using the ParameterizedThreadStart delegate, which means we’ll have to cast the Object to our real context type.
  • Multi-Threading, using a class-local, strongly-typed context and the (parameterless) ThreadStart delegate. This way we avoid the cast, and sacrifice a weird variable in the host class’ header.
  • Multi-Threading, using a generic Thread wrapper that does the strongly-typed context caching instead of leaving it lying around in the host class. It’s basically a proxy for Thread with the context variable.
  • Multi-Threading, using a single “kept-alive” Thread that is started once for all update cycles, and that is forced into an artificial idle state between update cycles.

I hoped, even half-expected, that these solutions would go from slowest to fastest. The last one was particularly interesting because it skips all the thread instantiations and instead relies on a Monitor and a sync-lock object to “wait” between two update cycles. It also produces much less garbage.
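
The kept-alive strategy boils down to something like this sketch (simplified, shutdown handling omitted, and not the exact test code) : the worker thread lives for the whole run, and a Monitor is used to wake it up once per cycle and to signal completion back.

using System.Threading;

class KeptAliveWorker
{
    readonly object syncRoot = new object();
    bool workPending;
    bool running = true;

    public KeptAliveWorker()
    {
        Thread worker = new Thread(WorkLoop);
        worker.IsBackground = true;     // don't keep the process alive on exit
        worker.Start();
    }

    // Called once per update cycle instead of spawning a new thread
    public void RunCycle()
    {
        lock (syncRoot)
        {
            workPending = true;
            Monitor.Pulse(syncRoot);        // wake the worker up
            while (workPending)
                Monitor.Wait(syncRoot);     // wait until it reports back
        }
    }

    void WorkLoop()
    {
        lock (syncRoot)
        {
            while (running)
            {
                while (!workPending)
                    Monitor.Wait(syncRoot); // artificial idle state between cycles

                DoWork();                   // the actual per-cycle processing

                workPending = false;
                Monitor.Pulse(syncRoot);    // tell RunCycle we're done
            }
        }
    }

    void DoWork() { /* e.g. evaluate the Bessel function over the data-set */ }
}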

Results

Here’s the result for 500 update cycles, in which I update a 500-element data-set, in Release mode, outside the IDE (so without any debugging), and with as little as possible running in the background. I used a Stopwatch object to collect these timings.

Test ‘Single-Threaded’ Started… Completed.
Time Elapsed : 27.8970399 s

Test ‘Multi-Threaded, ParameterizedThreadStart’ Started… Completed.
Time Elapsed : 18.1804179 s

Test ‘Multi-Threaded, Class-Local Context’ Started… Completed.
Time Elapsed : 19.5161083 s

Test ‘Multi-Threaded, Generic Context’ Started… Completed.
Time Elapsed : 20.2257278 s

Test ‘Multi-Threaded, Kept-Alive Thread’ Started… Completed.
Time Elapsed : 23.7845815 s

Of course, everything goes differently from what I expected. :P

The good news is, thread creation in .NET is very fast. No need to worry about that anymore. In fact, trying to circumvent it with a single kept-alive thread and a Monitor makes it go almost 30% slower!

Also, casting the object context every iteration is faster than any other strongly-typed alternative. Unless the word “Object” in your code makes you want to shoot someone, it’s the best way to go. Kind of sad, that.

Here’s the C# 3.0 (VS 2008) test project : Loop Threading Performance (24 Kb)

Sorry again for the absolute absence of comments, it’s late and I don’t feel like it. D:

A Note On Debugging

One thing that’s slightly alarming, and certainly worth mentioning, is that the performance in the IDE (with debugging attached, but not necessarily in “Debug” mode) is hugely different.

When debugging, the “kept-alive” thread is almost always faster than all the other strategies, by as much as 15%. This never replicates in the real world. So kids, always test outside the IDE, or add the empty green arrow (Start Without Debugging, Ctrl+F5) to your Visual Studio toolbar!

Hash tables and mutable keys, final

I am really, really sick of seeing the preceding post and its cliffhanger on my blog’s front page, so here’s an abrupt end to it.

There is no sample or code, and there won’t be any. But I do have one conclusion : if your algorithm needs a hash table with mutable keys, then don’t mutate their hash code.

There is absolutely no good reason for a hash set to have keys that get lost, and no point in hacking the data structure to trace all those swaps and recover from them.

Why do I say this after two lengthy posts about how to circumvent the problem? Simple. A hash set should be, first and foremost, a set. And sets are built upon the idea that all their elements are unique, with no possible duplicates. But if an element can mutate enough to modify what uniquely identifies it, the operation potentially breaks that assumption. How can you know if your set has unique elements if the elements keep changing, and you can’t keep them from changing?

This leads me to the inevitable (yet predictable) conclusion that mutating keys is not a good idea. In my tests and situations (and I’ve tried for weeks to find a test case where mutable keys in a set are absolutely needed), there is always a way around it, and that way is safer and makes more sense.

So next time you see exceptions or problems of that nature (e.g. unreachable elements), think for a minute. Do you really need mutable keys? You probably don’t, but what if you do? Just don’t override GetHashCode() and move on to something else.
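
A minimal sketch of what that looks like (the types are made up for illustration) : either rely on the default, reference-based GetHashCode(), or hash only on data that never changes.

using System;

// Option 1 : no GetHashCode()/Equals() override at all.
// The default implementation hashes on reference identity, which never changes,
// so the object can mutate freely while sitting in a HashSet or Dictionary.
class Particle
{
    public float X, Y;
}

// Option 2 : override, but base the hash only on an immutable field.
class NamedThing
{
    readonly int id;
    public string Label;                 // mutable, but not part of the hash

    public NamedThing(int id) { this.id = id; }

    public override int GetHashCode() { return id; }
    public override bool Equals(object obj)
    {
        NamedThing other = obj as NamedThing;
        return other != null && other.id == id;
    }
}

Either way, the set keeps finding its entries no matter how much the values mutate.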

There, problem solved. Moving on. :)

Hash tables and mutable keys, part two

(Part one is here.)

You know, I was so confident when writing the first post that I was addressing a common problem (which is not entirely untrue considering Google hits) and that it would be a pretty straightforward “here’s the issue, here’s the fix, enjoy” scenario.

Boy, was I wrong.

The common approach when trying to fix a bug is to first reproduce the problem. While I had no difficulty making a fictitious test case that does reproduce the “unreachable hashed entry” issue, I’m having all the trouble in the world finding a proper, real-world, justifiable algorithm that needs a hash table with mutable keys.
It just never seems to me like a reasonable thing to do.

Nevertheless, being the stubborn bastard that I am, I’ll present here a recipe to encounter the problem, with as much justification as I could make up. And if you’re ever stuck in a situation where you do need a hash set to behave properly with mutable entries, well, you’ll have a solution of sorts.

Take a deep breath (this is a looooong one) and hit the jump.

Continue reading Hash tables and mutable keys, part two

Hash tables and mutable keys, part one

In this little series of posts (probably two or three), I’m going to address a problem that happened to me yesterday and was a real blocker in my algorithm : how to get a hash table (HashSet or Dictionary) to work with keys that change their hash code over time.
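
To give an idea of the symptom before the jump, here’s a made-up minimal case (not the actual blocker from my algorithm) :

using System;
using System.Collections.Generic;

class MutableKey
{
    public int Value;

    public override int GetHashCode() { return Value; }
    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Value == Value;
    }
}

static class Demo
{
    static void Main()
    {
        var set = new HashSet<MutableKey>();
        var key = new MutableKey { Value = 1 };
        set.Add(key);

        key.Value = 2;                          // the hash code changes while the key is in the set

        Console.WriteLine(set.Contains(key));   // False : the lookup probes the wrong bucket
        Console.WriteLine(set.Count);           // 1     : ...yet the entry is still in there, unreachable
    }
}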

Continue reading Hash tables and mutable keys, part one

Effect Compiler & Disassembler

Updated! Now supports nVidia’s ShaderPerf tool.

Downloads

EffectCompiler.zip [51.3kb] – XNA Game Studio Express 2.0 (Visual C# 2005 Express), Source + Binaries

Description

Yesterday I took apart my Effect Compiling Tool, which took an HLSL shader and converted it to Windows/Xbox 360 bytecode, and made it into something more useful outside of XNA.

It’s always been somewhat of a hassle for me to compile and disassemble HLSL shaders. I can edit them pretty well in Visual Studio with code coloring, tabs, undo and whatnot, but to compile them I always had to go with something else. I had read in the book Programming Vertex and Pixel Shaders by W. Engel how to compile them in VC++ 2005 using a Custom Build Step and fxc.exe, but when working in C# that meant keeping a parallel C++ project just for shaders, which is dumb. Also, fxc.exe has become less and less stable for some reason… So I finally made my own compiler and disassembler using XNA 2.0.

Continue reading Effect Compiler & Disassembler

XNA 2.0 and Windows Forms

I’ve been trying out XNA 2.0 recently and sadly, my former method of using the internal WindowsGameForm and play with it to add all the desired controls has seemingly stopped working. :(

Fortunately, Pedro Güida on CodeProject has found a much simpler way to achieve that, one which enables the use of the Windows Forms Designer instead of adding all the controls by hand like I did. I’m currently implementing it in the Fez editor and I encourage you to do the same! It’s the cleanest method I’ve seen yet.

I’m mostly posting this because my article on Windows Forms has been the most popular on my blog since I posted it, and I wouldn’t like people to take the time to implement it and find out it just doesn’t work anymore. I’ll put a note on the original article that says “XNA 1.0 Refresh only!”, if people are still using that.

The aforementioned article also presents how to use XNA with Visual Studio 2008, WPF projects and even Silverlight, so it’s a good read in any case.

XNA Service Injection Sample

Download

ServicesDemo.rar [13 kb] – XNA GSE 1.0 Refresh Project (source only, the executable is pointless)

Description

Finally, here it is!

In my last blog post I said I’d be doing a sample of what’s explained in it : service dependencies and service injection in components and other services, in an XNA context. The technique has actually been working in my own game for many weeks, but I only just found the time and motivation to finish up the sample.

Details about the sample structure after the jump.

Continue reading XNA Service Injection Sample

Service Dependencies in Game Components

Update : See this post for a sample project!

Last summer I worked with Microsoft patterns & practices’ Guidance Automation Toolkit, one of their software factories used for convenient Visual Studio 2005 extensibility. It imposed a strict but well-made service/component model not unlike XNA’s, but had some more stuff that I thought could become necessary in a big XNA game project; two of those being service dependencies and injection in components and services. So I went ahead and implemented them in an XNA context.

The principle is an extension to the current service/component model of XNA :

  • Components should be “pluggable” and react properly when removed from or appended to a project;
  • The only communication points between a component and its environment are the services it obtains from the game context.

This already works in a typical XNA project. My problem was that if a component needs a service to behave properly, it has no direct means of telling the game, so if the game can’t provide it, the component will throw a NullReferenceException when using the null service that the service collection returned.
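
A bare-bones illustration of the failure mode (the service interface and the component are hypothetical, only the pattern matters) :

using Microsoft.Xna.Framework;

// Hypothetical service that the component depends on
interface ICollisionService { void Resolve(); }

class ParticleComponent : GameComponent
{
    ICollisionService collision;

    public ParticleComponent(Game game) : base(game) { }

    public override void Initialize()
    {
        // GetService silently returns null if nothing registered the service...
        collision = (ICollisionService)Game.Services.GetService(typeof(ICollisionService));
        base.Initialize();
    }

    public override void Update(GameTime gameTime)
    {
        // ...so the component only blows up here, with a NullReferenceException,
        // instead of failing fast and telling the game what it was missing.
        collision.Resolve();
        base.Update(gameTime);
    }
}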

Continue reading Service Dependencies in Game Components

A Generic Helper For XNA Services

Sick of lines like these in your Game Components? :

graphicsService = (IGraphicsDeviceService)Game.Services.GetService(typeof(IGraphicsDeviceService));

Well, me too. And since my efforts to bring it up in the XNA forums were a bit fruitless, I decided to give it a shot.

Here’s a pretty small snippet that will save you tons of typecasts and typeofs.
Just enclose with your favourite namespace declaration.
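
The actual snippet is after the jump; as a rough idea of the shape it takes, a generic helper along these lines does the trick (the name and exact signature here are mine, not necessarily the posted version) :

using System;

public static class ServiceHelper
{
    // Wraps the typeof/typecast dance behind a single generic call.
    // Works with anything that implements IServiceProvider, including Game.Services.
    public static T Get<T>(IServiceProvider services) where T : class
    {
        return (T)services.GetService(typeof(T));
    }
}

The line quoted above then shrinks to something like :

graphicsService = ServiceHelper.Get<IGraphicsDeviceService>(Game.Services);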

Continue reading A Generic Helper For XNA Services