Tools – The Instruction Limit

Squaring The Thumbsticks

Update (Oct. 26th) : Better interpolation = better shaping, and the simpler “inscribed square” method.
Update (Nov. 4th) : Exact and customizable solution!

The input values of thumbsticks on an Xbox 360 controller are always contained inside the unit circle, a disk centered on the origin and whose radius is 1. This is usually fine, because you want to interpret the values as a vector inside that circle… but if you want to provide an analog version of a D-Pad, for instance to control a character in a 2D platformer, then you’ll find that your character will never reach maximum horizontal speed unless you hit the (1, 0) or (-1, 0) vectors, exactly right or left. Anything tilted up or down will give a fractional speed, and if you hit a diagonal corner, you’ll end up with ±√½.

So what if the analog stick was a square area instead, such that the entire circumference has a least one component maxed out, like the contour of a square?
It may seem like a simple thing to do, but it proved to be a lot of trouble, ~~and in the end I chose to use an approximation~~ (nope, it worked out in the end!)… but here’s how I researched the problem.

Attempt #1 : Wrong way around

I first googled “Mapping circle to square“, and ended up on a blog entry that describes the inverse transform. Fine, I’ll just invert it, isolate the x and y variables as a function of x’ and y’… But that proved to be a problem.

Wolfram|Alpha doesn’t know what to do with it, and after making it more manageable (for x, for y), it gives 4 different answers that seem to work only for specific intervals, but doesn’t say which…
I tried doing it by hand and was stuck early on, because I suck at this.

And since I don’t have access to (nor know how to use) proper nonlinear equation solvers, I decided to give up.

Attempt #2 : Projections

By googling some more and varying search terms, I stumbled upon a cool flicker image titled “Conformal Transformation: from Circle to Square“. That’s exactly what I need!
But alas, it involves something called elliptic integrals and this also goes beyond my math understanding, or the amount of effort I was willing to put in this. The description also links to the Peirce quincuncial projection, which maps a sphere to a square, but the math also goes above my head (complex numbers arithmetic and again, elliptic integrals).

At this point I realized that what I was trying to do was not trivial.

Attempt #3 : Intuition serves best

So I just went back what I usually do : analyze the problem, determine what I expect as a basic solution, and work my way there.

Problem : The smallest horizontal component I end up with is ±√½, or ±~0.707, when the θ angle is π/4, 3π/4, 5π/4 or 7π/4.
Solution : Scale this to ±1, so by a factor of √2.

The scale for angles close to diagonals should fall down to identity (1) when it reaches a cardinal direction, though I’m not sure about the interpolation curve… Linear should do fine?

(code removed because it’s kinda shitty)

And for additional fun and testing, here’s how evenly-spaced random points inside the unit disk stretch to the unit square using this function (points generated by my Uniform Poisson-Disk code) :

Linear interpolation

Not bad? Good enough for me.

Attempt #4 : Interpolation and variations

I then tried different interpolation/easing methods for the scaling factor, and tried raigan’s idea of just scaling and clamping linearly the circle to a square, here’s what it looks like!

Quadratic interpolation

Inscribed square

I tried sine, quadratic and circular ease in/out/in-out from my easing functions library, and quadratic ease-in was the most balanced. It almost shapes everything to a square, very little clamping is necessary, so I’ll keep that in.

The inscribed square method is nice and simple, but cuts off a lot of values and keeps circular gradients in its area. But in the real world, for 2D platformer input, it’s probably just fine…

Attempt #5 : Epic win

So, I decided to give another shot at this after speaking on the bus with a friend about polar coordinates and trigonometry. Turns out there is a more exact solution, and by adding a “inner roundness” parameter you can also tweak how much of the center input will remain circular, while still touching the sides for circumference values.

Here is my final C# function, then I’ll explain how it works :

static Vector2 CircleToSquare(Vector2 point)
{
	return CircleToSquare(point, 0);
}
static Vector2 CircleToSquare(Vector2 point, double innerRoundness)
{
	const float PiOverFour = MathHelper.Pi / 4;

	// Determine the theta angle
	var angle = Math.Atan2(point.Y, point.X) + MathHelper.Pi;

	Vector2 squared;
	// Scale according to which wall we're clamping to
	// X+ wall
	if (angle <= PiOverFour || angle > 7 * PiOverFour)
		squared = point * (float)(1 / Math.Cos(angle));
	// Y+ wall
	else if (angle > PiOverFour && angle <= 3 * PiOverFour)
		squared = point * (float)(1 / Math.Sin(angle));
	// X- wall
	else if (angle > 3 * PiOverFour && angle <= 5 * PiOverFour)
		squared = point * (float)(-1 / Math.Cos(angle));
	// Y- wall
	else if (angle > 5 * PiOverFour && angle <= 7 * PiOverFour)
		squared = point * (float)(-1 / Math.Sin(angle));
	else throw new InvalidOperationException("Invalid angle...?");

	// Early-out for a perfect square output
	if (innerRoundness == 0)
		return squared;

	// Find the inner-roundness scaling factor and LERP
	var length = point.Length();
	var factor = (float) Math.Pow(length, innerRoundness);
	return Vector2.Lerp(point, squared, factor);
}

Inner roundness = 0

Inner roundness = 1

Inner roundness = 5

There are two concepts here :

The scaling factor can be calculated for every theta angle without any need for interpolation. It happens to be the inverse of the sine or cosine of the angle, depending on which wall you’re trying to clamp to. This gives a perfect square all around.
By interpolating relatively to the length of the input vector, it’s possible to preserve the circular shape in the center and still clamp to the sides for extreme inputs.

I think I’m done with this now! Yay.

A Render/SamplerState Stack and Manager for XNA

I’ve been meaning to make something like this for a long time, and I know that some engines like Xen offer a viable solution already, but I wanted to make my KISS solution that interferes with the standard XNA interface as little as possible.

It’s a pretty lengthy snippet of code, so I won’t paste it all here, but basically there’s three classes : the StateStack, ManagedRenderStates and ManagedSamplerStates.

Description

StateStack is a static extension class to the GraphicsDevice. It could be just another static class, or even a global XNA service, but since it’s going to be used in combination with the GraphicsDevice most of the time, I thought it was logical to use extensions.
You can push, pop and peek the current state, which takes and returns ManagedRenderState objects. It’s very similar to how OpenGL works with the attribute stack (glPushAttrib), except you don’t choose what you push, the entire state gets pushed/popped everytime. I didn’t think isolating states in categories was useful or natural in XNA.
The ResetState and CommitState methods are shorthands to the same methods in ManagedRenderState.

ManagedRenderState is like a wrapper over a RenderState object, such that it has the same interface : a read/write property for each render state and access to the SamplerState collection.
The difference is that it tracks changes to render states as you make them, marks properties as dirty and applies only what’s needed when you decide to Commit. The Reset method refreshes the object with the current state of the GraphicsDevice; this needs to be done at the beginning of every draw call, or everytime you think the state has been tampered with.

ManagedSamplerState does the same thing but for sampler states : addressing, filtering, etc. You don’t call Reset or Commit on it directly, it gets done automatically by its host ManagedRenderState.

I added blending mode and texture filtering helper methods to both State classes, because I end up using that a lot… and with proper state management, you don’t have to worry about redundant assignments! So you can be permissive and set everything you need, every time.

Usage

So what does it look like to use it?
The Game class needs to do some initialization and per-draw-call stuff first…

protected override void Initialize()
{
    GraphicsDevice.PushState();

    // Other stuff
    base.Initialize();
}

protected override void Draw(GameTime gameTime)
{
    GraphicsDevice.ResetState();
    SetDefaultRenderStates();

    GraphicsDevice.Clear(Color.CornflowerBlue);
    base.Draw(gameTime);
}

void SetDefaultRenderStates()
{
    var rs = GraphicsDevice.PeekState();

    rs.DepthBufferEnable = true;
    rs.SetBlendingMode(BlendingMode.Alphablending);

    rs.SamplerStates[0].SetFiltering(FilteringMode.Anisotropic);
    rs.SamplerStates[0].AddressU = rs.SamplerStates[0].AddressV = TextureAddressMode.Wrap;

    GraphicsDevice.CommitState();
}

Then to use the state stack, you don’t ever touch the GraphicsDevice.RenderState object : always use the Managed equivalent.

// Push a copy of the state on the stack
var rs = GraphicsDevice.PushState();

// Always-on-top, and don't disturb the depth buffer, and no culling
rs.DepthBufferFunction = CompareFunction.Always;
rs.DepthBufferWriteEnable = false;
rs.CullMode = CullMode.None;

// Commit changes
GraphicsDevice.CommitState();

// Draw some stuff
Draw();

// Pop back to the last state
GraphicsDevice.PopState();
// We are now back to the last state

Download & Conclusion

I think it makes a lot of sense to stack states in a component system, because each component can push its own local state and work with it, and just dispose it afterwards. Add this to dirty-checking to remove almost all redundant state changes, and you’ve got a very usable system!

The next step is batching draw calls to keep calls that use the same RenderStates together… in my system, this is left to the user’s discretion. Kevin Gadd presented a way of doing this (and many other things including threaded rendering) on his blog.

Another rather important note : ManagedRenderState contains a MaxSamplers constant that you can tweak depending on the number of sampler states you know you’ll use. Leaving it at 16 will make update/reset/refresh operations kind of slow… not sure yet if it’s noticeable.

And because of the immense amount of copy-paste that’s been needed to produce these classes, I can’t promise that it doesn’t have a typo or two… I haven’t tested every single state. But up to now, it works great, and I’ll update it if needed.

StateStack.cs (2 kb – C# class)
ManagedRenderState.cs (30 kb – C# class)
ManagedSamplerState.cs (7 kb – C# class)

Bonuses in the code files : a method to unpack a packed color from uint to Color, and a simplistic object Pool implementation (to eliminate garbage when stacking up states).

Hope you find it useful!

Fast Uniform Poisson-Disk Sampling in C#

Updated July 26th : Added a circle sampling method.

This appears to be tool-class weekend! Here’s another that I just finished up.

I’ve been wanting uniform poisson-disk distribution generation code for ages, ever since I started working with shadow map filtering in October 2006. I had difficulty decrypting the whitepapers that popped in the first results of Google at the time, so I just gave up for a little bit until… well… someone did a simple implementation for me to grab.

Thankfully, it happened!

The fine gentlemen at Luma labs (Herman Tulleken is credited for the class I converted) released a Java, Python and Ruby implementation of uniform and non-uniform poisson-disk sampling. It’s based on an oddly straightforward whitepaper by Robert Bridson at the UBC (in Canada!).

It took me under an hour to make it work in C#. This will prove very useful in my PCF filtering code, and I will be using it immediately in a Depth-of-Field shader I’m working on.

Download

UniformPoissonDiskSampler.cs (4 kb – C# class)

This one uses Vector2’s from the SlimDX namespace, but it would be easy to just replace them with XNA Vector2’s or TV3D TV_2DVECTOR’s.

The original Java code used a non-static class, I decided to make it static and use structures for settings and state. This will reduce garbage if it’s used repeatedly, and since I send the structures as references, it’s just as fast as instance variables. I feel it’s a bit cleaner too.

The pointsPerIteration parameter defaulted to 30 in the original code, I did this too and it works well. You can reduce it to have potentially less dense distributions but much faster generation, or make it higher to make sure the whole space is filled.

Again, I won’t post a sample because the class usage is as simple as can be : it returns a List<Vector2> in the specified domain.

Enjoy!

Flash-style Tween/Easing Functions in C#

Easing functions make any movement look pretty. So you’re going to need them at some point.

And implementations are so widely available that you don’t have an excuse not to use them. I found this ActionScript reference to be very helpful in creating my own C# easing functions library, which I’d like to share… so here it is!

First line is EaseIn, second is EaseOut, third is EaseInOut

Download

Easing.cs (4 kb – C# class)

It’s a static class and it’s framework-agnostic, completely standalone. The screenshot you see above is my TV3D test class. I won’t post the whole sample, but here’s the code if you want to see how I used it. In order to run, one would need a version of my components framework that I haven’t released yet.

I also didn’t implement each and every function that Robert Penner presents, just the ones I figured I’d use. Bounce and Elastic sound like physics to me, I don’t think I’d use easing functions to achieve that.

Loop Parallelism in a Game Context

An update to this post is available here.

Recently I was coding some physics-enabled particle systems and ended up with some fairly CPU-intensive stuff that looped as times as there are active particles in a system, and as many as there are active particle systems, each update cycle.

And it got choppy.

I don’t like choppy.

So I had three choices :

Offload to the GPU using GPGPU or Vertex Textures, but doing that in XNA with Shader Model 2 support is… time-consuming. Especially when you have no background work on the subject.
Profile and optimize my physics code, or switch to a physics package like Havok or Farseer, but I expect that would also take too much time.
Multithread using loop parallelism to use 100% of my CPU power instead of 50%! (I have a Core 2 Duo processor, and there’s more and more mainstream PCs with dual-, quad- and even octo-cores)

I ended up doing the latter because it was the simplest, and multithreading is rarely ever simple. But in some specific cases, it’s not much of a headache, really.

If your looped operation does not change the context of further loops, that is if every loop can be done in any order and individually without any write locking, it takes like 30 minutes to get it to work in C#, and with no concurrency problems. It’s like offloading static Web content to a different server, just a matter of routing to the right hardware.

Annoyances and interrogations

ParameterizedThreadStart is not generic. If you have to pass a context to your threaded operation, you have to cast it back to what it really is, at each iteration. It’s unnecessary and annoying… is it slow?
Once a thread dies, you can’t resurrect it. Correct me if I’m wrong, but FSM knows I’ve tried, and a thread that finished its execution cannot be re-run, even if its thread-start method is the same. You have to instantiate another. Does that affect performance…?

The problem with thread creation is that I create a lot of temporary threads. They live a single update cycle, and I instantiate one per particle system per update cycle, at about 60 cycles per second. It sounds like it could slow things down.

So I made a test!

Benchmark

The (fictional) situation is the following : in the middle of a game loop, you have to evaluate the 0th Order Modified Bessel Function Of The First Kind for some reason. And you need to do that for a large dataset, say 500 times per update cycle.

Actually, this kinda makes sense if you’re calculating weights for a very wide Kaiser filter, which I will probably cover later. I had code laying around, which is why I used the elegantly-named Bessel function.

So, this is slow, because it’s an integral. And so multi-threading would be beneficial. The different strategies are :

Single-Threading; it’s always good to know how slow it went before all optimization… if only for programmer ego.
Multi-Threading, using the ParameterizedThreadStart delegate, which means we’ll have to cast the Object to our real context type.
Multi-Threading, using a class-local, strongly-typed context and the (parameterless) ThreadStart delegate. This way we avoid the cast, and sacrifice a weird variable in the host class’ header.
Multi-Threading, using a generic Thread wrapper that does the strongly-typed context caching instead of laying it around. It’s basically a proxy for Thread with the context variable.
Multi-Threading, using a single “kept-alive” Thread that is started once for all update cycles, and that is forced into an artificial idle state between update cycles.

I hoped, even half-expected that these solutions would go from slowest to fastest. The last one was particularly interesting because it skipped all the thread instantiations and instead, relies on a Monitor and a sync-lock object to “wait” between two update cycles. It also produces much less garbage.

Results

Here’s the result for 500 update cycles, in which I update a 500-sized data-set, in Release mode, out of the IDE so without any debugging, and as little stuff as possible running in the background. I used a Stopwatch object to calculate these timings.

Test ‘Single-Threaded’ Started… Completed.
Time Elapsed : 27.8970399 s
Test ‘Multi-Threaded, ParameterizedThreadStart’ Started… Completed.
Time Elapsed : 18.1804179 s
Test ‘Multi-Threaded, Class-Local Context’ Started… Completed.
Time Elapsed : 19.5161083 s
Test ‘Multi-Threaded, Generic Context’ Started… Completed.
Time Elapsed : 20.2257278 s
Test ‘Multi-Threaded, Kept-Alive Thread’ Started… Completed.
Time Elapsed : 23.7845815 s

Of course, everything goes differently from what I expected. :P

The good news is, thread creation in .NET is very fast. No need to worry about that anymore. In fact, trying to circumvent that using a single kept-alive thread and monitor use makes it go almost 30% slower!

Also, casting the object context every iteration is faster than any other strongly-typed alternative. Unless the word “Object” in your code makes you want to shoot someone, it’s the best way to go. Kind of sad, that.

Here’s the C#3.0 (VS.NET 2008) test project : Loop Threading Performance (24 Kb)

Sorry again for the absolute absence of comments, it’s late and I don’t feel like it. D:

A Note On Debugging

One thing that’s slightly alarming and certainly worth mentioning, is that the performance in the IDE (with Debugging, but not necessarily in “Debug” mode) is hugely different.

In debugging, the “kept-alive” thread is almost always faster than all other strategies, by as much as 15%. This never replicates in the real-world. So kids, always test out of the IDE, or add the empty green arrow (Start Without Debugging, Ctrl+F5) in your Visual Studio toolbar!

Hash tables and mutable keys, part one

In this little series of posts (probably two three), I’m going to address a problem that has happened to me yesterday and was a real blocker in my algorithm : how to get a hash table (HashSet or Dictionary) to work with keys that change their hash code over time.

Continue reading Hash tables and mutable keys, part one

Effect Compiler & Disassembler

Updated! Now supports nVidia’s ShaderPerf tool.

Downloads

EffectCompiler.zip [51.3kb] – XNA Game Studio Express 2.0 (Visual C# 2005 Express), Source + Binaries

Description

Yesterday I took apart my Effect Compiling Tool which took a HLSL shader and converted it to Windows/Xbox360 bytecode, and made it into something more useful outside of XNA.

It’s always been somewhat of a hassle for me to compile and disassemble HLSL shaders. I can edit them pretty well in Visual Studio with code coloring and tabulations/undo’s/whatnot, but to compile them I always had to go with something else. I had read in the book Programming Vertex and Pixel Shaders by W.Engel how to compile them in VC++ 2005 using a Custom Build Step and fxc.exe, but when working in C# I had to have a parallel C++ project just for shaders, which is dumb. Also, fxc.exe has become less and less stable for some reason… So I finally made my own compiler and disassembler using XNA 2.0.

Continue reading Effect Compiler & Disassembler

Making an installer for an XNA GS 2.0 game

There is an ongoing thread on the XNA Creators Club forums about how to create an installer using Inno Setup, a free and very capable installer program.

The script described by Pelle in the first posts of this thread works great for XNA GSE 1.0 and Refresh, but is problematic for XNA GS 2.0 games because of the dependency on .NET 2.0 Service Pack 1. I won’t bring much new information in this post, but will try to summarize everything I read and tried it into a tested & true solution that I’m currently using for the Fez installer.

Please note that this solution does not work if your game requires use of the GamerServicesComponent of live networking using XNA, since you need the whole Game Studio for it to work. (see this post)
Also, it wasn’t made with 64-bit operating systems in mind, so if you’re going to support them you’ll have to modify it. It has been known to work on Vista though.

Continue reading Making an installer for an XNA GS 2.0 game

A Generic Helper For XNA Services

Sick of lines like these in your Game Components? :

graphicsService = (IGraphicsDeviceService)Game.Services.GetService(typeof(IGraphicsDeviceService));

Well, me too. And since my efforts to bring it up in the XNA forums were a bit fruitless, I decided to give it a shot.

Here’s a pretty small snippet that will save you tons of typecasts and typeofs.
Just enclose with your favourite namespace declaration.

Continue reading A Generic Helper For XNA Services

XNA Effect Compiling Tool

Downloads

EffectCompilingTool.rar [41.5kb] – XNA GSE 1.0 Refresh Project w/ Binaries

Description

I love the content pipeline in XNA, but for effects, like the author of the XNA Effect Generator understood and mentioned already, it’s just not practical.

I liked the XNA Effect Generator too, but I wanted to implement my own effect framework so the generated classes were not what I was looking for either. Besides I had some problems with it (localization issues, dreaded french accents…). So I made a little tool that gets the Xbox 360 and Windows byte-code out of a shader in a convenient format and presentation.

Continue reading XNA Effect Compiling Tool