Behind Fez : Collision and Physics

Here’s the third part of the “Behind Fez” series (part 1, part 2), in which I’ll try to explain how collision detection and physics work in Fez.

And yes, Fez is still actively developed in all areas. Making a game on your own : IT’S HARD.

Collision Engine as of Early 2008

Back when we made the IGF 2008 build, we had at least two massive limitations that made culling and collision detection very simple :

  1. The world was completely static. No moving platforms, no physics except for the player sprite, you can assume that if something is at one place at level-loading time, it’ll stay there until you quit the game. (as a matter of fact there WAS only a single level, but that’s another topic)
  2. Everything was aligned to the world grid. Everything took up a full cell worth of collision boundaries, nothing bigger or smaller than 16x16x16 trixels – exactly one trile.

The super-blocky world of Fez circa 2008

This allowed me to only calculate collision detection of the player’s collision rectangle (for each of its 4 vertices, point-to-line) whenever it traversed a world grid “line”, and since this was done very rarely, optimization was not an issue, and I went with the most intuitive and naive way possible.

Consider the world as a 3D array (or whatever indexed data structure you can think of) with filled or empty spaces, and each filled space containing visual and physical information. Visual information consists of the polygonal mesh, the textures, etc. Physical information defines if that trile should collide with the player, and from which of its 2D boundaries.
We decided early on that the three possible “collision types” are : no collision, top-only collision (for fall-through/climbable platforms) and all-sides collision (for blocking level boundaries or obstacles).

The three different collision types as seen in Fezzer

This way of mapping world entities with their collision information is elegant because the level designer doesn’t need to paint a separate collision map, or add invisible objects that act as colliders. It also means that any change you make to the level visually is propagated physically to how it plays.

Fez is obviously played from a 2D perspective. The collision results must match what the player sees, and visibility works front-to-back, with only the top-most layer being visible and active.
Knowing the collision type of each and every space (if filled), it’s easy to find the 1D “row” of possible colliders if you have the 2D screen coordinates in hand. Then you just traverse front-to-back, and the first hit is kept, at which point you can early-out from the loop.

Depth “Collision”

So now I know what’s blocking the player in 2D. But we had to make additional rules for the Z position or depth of the player, so that the game would behave like a 2D platformer AND still make sense in the 3D world :

  1. Gomez should stay visible. He should stay on-top of the world geometry as long as he doesn’t rotate the viewpoint. This is done by correcting the depth such that Gomez stands right in front of the geometry.
  2. Gomez should never walk in mid-air. In 2D this is solved by the collision detection, but in the remaining axis it needs to be enforced, such that Gomez stands on the platform nearest to the camera (this is an arbitrary rule-of-thumb that we chose).
  3. Otherwise, don’t change Gomez’s depth for no reason. The player expects it not to change. It’s really easy to get lost in Fez, and if the engine messes up the little spacial perception you’ve got left, it’s not fair anymore.

A hilarious mock-up that shows rules 1 & 2 of depth adjustment

The player will never see that Gomez moves around in the Z axis because the view is flattened and it has absolutely no depth perception, so we can do all we want to ensure that rules 1 & 2 are enforced.

Breaking the Grid (Late 2008 to Late 2009)

So that was good until we decided to implement crate physics, moving platforms, offset triles and variable trile size. Then, this happened :

  1. Every rule defined above has to be tested every time the player moves. If the triles aren’t aligned to the world grid, the “only test when a grid line is traversed” trick won’t work anymore.
  2. For both culling and collision, the world grid stops being an exact reference of how the world appears/behaves, and more of a helper structure where more than one trile can be in a cell, and some triles overlap many cells.
  3. Collision stops being specific to the player, it needs to be generalized in order to support particles and other objects that should have all the same 2D/3D tricks.

The many joys of offset and oddly shaped triles

None of these problems is trivial, but the hardest by far to implement was #2. Thing is, I didn’t want to throw away everything I did and start over. So I made small, incremental changes until the new features were supported. And just by then I had ~1.5 years worth of C# code to maintain…

To explain my final approach, I need to specify that “variable trile size” does not mean that a trile can be bigger than 16x16x16, only that its collision volume can be smaller.
With that in mind, here’s what I did :

  • A trile always has a majority of its volume within a single world cell, even if it’s oddly shaped or positioned arbitrarily. In other words, a single cell holds the center of a trile. This world cell is where it’s stored.
  • When colliding a vertex of a collision rectangle to the world, look up the 4 nearest 1D “rows” (in 2D screen space) of possible collider cells from the world grid. Traverse front-to-back each of those rows, and test if one of the triles contained at each level ACTUALLY collides with the point, taking in consideration the trile’s positional offset and size. The 4 neighbour rows need to be tested because triles within these rows may exceed the cell boundaries by up to 50%!
  • When triles move, update their location within the world grid only if the center changed to a new cell.

So everything’s covered, we’re good! Right?


But it was slow as molasses. I do many, many more checks than I did before, and especially on the Xbox where the JIT compiler is less efficient, all those random accesses killed the game’s performance. Truly a case of CPU/Memory bottlenecking.

This section is a work-in-progress… As long as I’ll be maintaining/developing the game, I’ll worry about it going too slow. But here’s the steps I’ve taken up to now :

  • After every camera rotation, cache the nearest and farthest trile for each screen-space world grid location. This way, I don’t have to loop through the entire level boundaries and test for trile presence, I know that within these cached bounds, I have data. Parts of the cache need to be invalidated every time an object moves. The caching process of the whole level has to be done in another thread while the rotation happens, else it pauses the game for ~250ms… And threading is a headache.
  • Simplify the algorithm for particles and other small objects. The player won’t notice if particles physics aren’t 100% accurate; I can reduce the collision points to a single centered one, and ignore some rules.
  • All the standard optimization techniques… Avoid dynamic memory allocations. Ensure cache spacial locality (still struggling with this one). Start up ANTS Profiler, find a bottleneck, eliminate it, rinse, repeat.

The problem with the world being so dynamic is that I can’t precache everything. I certainly can’t precache the collision result of every pixel of the screen everytime a viewpoint rotation occurs.
Separating dynamic objects from static objects and treating their collisions separately is something I’d like to try if necessary. But it means so many changes to the current system that it scares me a little bit.


In Mid 2008, we decided to implement something called Big Art Objects, which are like triles but bigger than 16³ (they can go up to 128³). They are sculpted like triles, but they don’t have any collision information attached to them, because they stick out of the “world grid” system.

A tree art object, with its collision triles faded in. (T = top-only, π = no-collide)

To make them look like they’re standard world objects, we fill them with invisible collision triles. (yes, I said we wouldn’t need those, but that’s a special case :P)
It’s worth it in the end, because they look fantastic and break the mold of lego-like blocky structures.

Another common exception is what we call immaterial triles. They’re no-collide triles that ALSO don’t make Gomez go in front of them. Strands of grass can pass in front of Gomez, it just looks better that way.

I could go on about the other exceptions, but then I’d reveal features that we haven’t announced or shown. So I’ll just stop now and let your imagination do the rest. :) Scrobble Fetcher & Mapper

Update May 15th 2016 : New v3.0 build on the GitHub project page with many fixes and additions.

Update August 26th 2010 : New v2.1 build on google code that fixes the “Invalid XML Characters” issue! Try that one if you had errors when fetching scrobbles.

Updated June 26th 2010 : This project is now hosted on Google Code! I’m new to this so bear with me, but I’ll try to make it nice so that people can contribute. (Google Code no longer exists and the project is now on GitHub)

Updated March 5th 2010 : You shouldn’t need to install both iTunes and WMP for it work, just the one you want to use! Finally.

Original March 15th 2009 post follows.


Get the freshest releases (source or binaries) on the GitHub project page.


The Scrobble Fetcher & Mapper does exactly that. It fetches all your scrobbles from your (or anyone’s, really) account, assembles them in terms of Play Count and Date Last Played for each music track, and then exports this data to either Windows Media Player 11 or iTunes.

I’ve been working on this application on and off for some time now and I think it’s ready for deployment. I built it because I have quite an extensive music library, I like my playlists smart and automatic, and I have poor luck with music players, hardware failure and vendor lock-in.

It seems that the “Play Count” and “Date Last Played” metadata fields are not part of a file’s ID3 tags, but rather is stored in the player’s local library. This is usually fine, and probably more efficient than writing to files all the time, but it means that if your player’s database gets corrupted (as happened to me with Windows Media Player) or that you decide/are forced to use another player such as iTunes because your iPhone doesn’t want to sync with anything else, then you’re screwed. And same thing of course if you reinstall your machine or get a new one.

I feel that this metadata is important because I like to have automatic “best-of” playlists that are based upon it, and sometimes it’s nice to listen to a comfortable, time-proven playlist.

Thankfully, if like me you’ve been using the services since they were called Audioscrobbler, you’ve gathered an impressive amount of playback information in your account over the years. And using their web services, and my application, now you can take it back home!


I wanted this application to be my cleanest, most error-tolerant and most efficient piece of work yet in application design. I also tried to exploit all C# 3.5 features, having accumulated some months of experience with LINQ, lambda expressions, etc.

Here’s a rundown of its tech features :

  • Multi-threaded

    • Uses the Parallel Extensions for .NET June 2008 CTP and a little home-made framework for progress-reporting asynchronous foreground tasks.
    • Most operations will use multiple cores thanks to the Task Parallel Library’s capabilities, and will seldom lock because of its lockless concurrent data structures.
  • WCF-enabled

    • Uses the Windows Communication Foundation classes in .NET 3.5 to communicate with’s RESTful API.
    • All the response data objects are deserialized automatically from XML using the built-in XmlSerializer.
  • Error tolerant

    • Since track data will not always match your library’s perfectly, I used the Levenshtein distance string metric and “neutralized” strings (removes all accentuated characters, symbols, whitespace, etc.) to get accurate matches.
    • Will (should!) recover gracefully from errors, both caused by user input or unexpected conditions. It also allows graceful cancelation of long-running operations.


scrobblemapper9 scrobblemapper8scrobblemapper5 scrobblemapper3

Closing notes

– This code uses a Community Technical Preview version of the Parallel Extensions for .NET, and one that is almost a year old… So it’s delivered as-is, and you cannot (and should not) use that library in commercial products. Although I have functionally tested the program pretty extensively, I cannot guarantee it’s not going to corrupt your library or overwrite some information. Use at your own risk!

– I noticed that iTunes checks if the file is writable before setting any metadata concerning it. So you should make your files writables unless you’ll get a ton of errors in the error log after the scrobble-mapping.

– You need to keep the iTunes instance running for the iTunes mapping to work. The COM interface requires it to be open, but you can minimize it to the tray.

– The code is released under the Creative Commons Attribution-Share Alike 2.5 license.

If you encounter anything unusual, if you have ideas for extensions to this program or have comments/questions about how it works, do ask! That said I will probably blog some more about parts of this program, like the WCF usage or the asynchronous task classes. Enjoy!

Smart Screen-Space Blurring for Shadows

I’ve been working on a screen-space solution for blurring shadows that has the following features :

  • Variable-sized kernel based on surface distance and view angle
  • Rejection of samples if they lie on a non-contiguous surface, by identifying depth and normal discontinuities
  • Bilateral, two-pass blur filter to maximize the possible kernel size
  • Vertex and pixel shader 3.0, so no worry about 64 instructions in the PS

…but also works with the following constraints, to simplify matters :

  • Flat surfaces only, no curves! (to simplify the sample rejection process)
  • Full control over the rendering pipeline, so the “main render” pass can output weird values in order to be properly blurred

I quickly realized that it’s very hard or impossible to do a variable-sized Gaussian filter in real-time. If you change the kernel size, you change the standard deviation, and you need to recalculate the weights… this is too heavy for a pixel shader. So I chose to use a box filter with uniformly-spaced samples.
I totally would’ve liked to use Poisson-Disk distribution, but it’s not doable in a two-pass bilateral scenario. And you can’t achieve big kernels in real-time without separating the process in two passes.

My XNA3 implementation currently uses no additional render targets (!) but just a “resolve texture”, and resolves from the main buffer quite a lot. A R32F (Single) render target is used for the shadowmapping process itself, but otherwise everything’s done with a A8R8G8B8 (Color) main buffer.
The shadowmapping solution is standard orthographic/directional depth testing, but I’m using Exponential Shadow Maps (ESM) to simplify the depth biasing problems. I could never get my hands on a really good way to do Slope-Scale Biasing for standard depth testing, so ESM saves the day here. Otherwise, there’s no fancy cascading, splitting or projection tricks.

Here’s how it looks at different distances. The blur kernel stays approximately the same size in world-space even if the whole process is screen-space!

I’ll keep working on a clean sample to show, and I’ll definitely release the HLSL code if I can’t release the source to the whole thing. Stay tuned!

Common Kanji Character Ranges for XNA SpriteFont Rendering

Note : This sample is practically useless, because the XNA Localization sample has a much better alternative using the Content Pipeline and character detection from resource files, which works for any language (Chinese, Korean, Japanese…). But I guess if you wanted to get the ranges of common Kanji, here’s how.

While working on Japanese language support in XNA, I realized a couple of things about Japanese writing (some of which may seem obvious, but wasn’t for me) :

  • There’s two broad character sets : Kana (syllabic) and Kanji (logographic)
  • Kana has two modern subsystems or components, Hiragana and Katakana, each with two distinct Unicode regions of respectively 92 and 95 different glyphs (187 total)
  • Kanji originate from Chinese “Han” characters, and are stored within the CJK (Chinese, Japanese, Korean) portion of Unicode. But CJK characters don’t uniquely reference Japanese logographs, and it contains over 20000 glyphs!
  • There’s over 10000 actual Japanese kanji, but only about 2000 of which every high-school grade Japanese person should know

In XNA, the SpriteFont class and its associated content pipeline use bitmap fonts internally to cache and render text strings. It becomes obvious that generating a bitmap font with 20000 Han characters would take a very long time, and is also very irresponsible memory usage. Even 10000 characters seems ridiculous.
I wanted to keep using SpriteFonts, so switching to a realtime font rendering option like FreeType was out of question. So how does one make bitmap fonts usable in Japanese?

While researching the subject, I stumbled upon a whitepaper called “Unicode and Japanese Kanji” by Tony Pottier, in which he discusses how to isolate Japanese Kanji from the CJK characters, and even dresses a list of all 1946 unique characters that are learned in Japanese education up to grade 7 (sorted by Unicode point, or by learning grade). Even if it’s a large amount of glyphs, it’s a lot more reasonable than 10k.

So the only remaining step is to make this table into a a list of XML CharacterRegion elements so that we can use them in an XNA SpriteFont declaration.
I made a little C#3 program that takes a list of Kanji, one per line, and blurts out the expected XML; it also joins the succeeding characters into regions to save space.

using System;
using System.IO;

namespace KanjiFinder
    static class Program
        static void Main()
            var output = new StreamWriter("regions.xml");
            var input = File.ReadAllLines("kanjis.txt");
            var writeCount = 0;
            var intervalsCount = 0;

            var start = (int)char.Parse(input[0]);
            var end = start;
            foreach (var line in input)
                var cur = (int)char.Parse(line);
                if (cur - start > 1)
                    output.WriteLine(string.Format("<CharacterRegion><Start>&#x{0:X4};</Start><End>&#x{1:X4};</End></CharacterRegion>", start, end));
                    writeCount += end - start + 1;
                    start = cur;
                end = cur;
            output.WriteLine(string.Format("<CharacterRegion><Start>&#x{0:X4};</Start><End>&#x{1:X4};</End></CharacterRegion>", start, end));
            writeCount += end - start + 1;


            if (writeCount != input.Length)
                throw new InvalidOperationException();

Here’s its input kanjis.txt (in Unicode format), and its result is regions.xml.

I chose to go up to Grade 7, but one may choose to ignore Grade 7 characters and just do 1-6. I don’t know whether Grade 7 characters are useful in game menus and usual dialogue.

All that’s left is to put those regions in a <CharacterRegions> tag inside a .spritefont file, and supply a valid Japanese font! Thankfully, Windows 7 comes with a bundle of these (MS Gothic, Kozuka, Mieryo and Mincho) and the M+ Fonts offer a public domain alternative.

Apart from the Kanji regions list, a couple more regions you’ll probably need :

<!-- Ideographic Symbols and Punctuation -->
<!-- Hiragana -->
<!-- Katakana -->
<!-- Fullwidth Latin -->
<!-- and/or the standard Latin set... -->

That’s it! Hope it helped.

Reflective Floor

Nov. 13th Update : Added fresnel term, min/max reflection factors, pixel-perfect sampling.

Back in TV3D land for a minute!


(the reflection looks offset in these pics but it’s fine in realtime… not sure why it messed up here.)


It might sound like an easy thing to do, but planar reflections are pretty challenging to do per-pixel. It involves a texture projection shader, some pretty scary matrix play, and so on and so forth. So I decided to make a proper sample that shows how I did it in my reflective water sample, but in its simplest expression.

The floor uses two mesh groups, one for the reflection and one for the actual floor with texture. This makes it possible to only use a shader on the reflection group, and then the other one uses standard TV3D rendering.
Sadly, it’s impossible to do per-pixel reflections without a shader. Thinking about it, an even simpler way would be to use the stencil buffer, but then if you’re working with pixels you can apply fun effects like bump-mapping and fresnel attenuation.

Update Notes

The updated version has fresnel per-pixel attenuation, which means that grazing angles will get full reflection and looking down on the floor will get next to none. The amount of reflection and the effect of the fresnel term can be tweaked with constants.
In order to make this thing work, I had to make it 3-pass :

  1. The first pass is the standard texture mapping one, without shader;
  2. The second pass multiplies the inverse of the fresnel term with the texture-mapped mesh, so that the reflection won’t be overpowering when we blend it in;
  3. The third and last pass additively-blends the reflection with fresnel applied.

The two last passes use the shader, but since both shader passes are enabled, I only need a single group for both. Nice and clean!

I also removed the texel offset thing because if you point-sample the texture (which you should, unless you’re bump-mapping), it’s pixel-perfect without any needed fix. And the blending modes and depth-write flags are moved to the shader, to make things even cleaner!


Binaries + Code : (8.7 Mb – C#3 VS2008 Solution)

Just the code : Projection.fx (the shader) and PlaneReflection.cs (the component that draws everything)


Squaring The Thumbsticks

Update (Oct. 26th) : Better interpolation = better shaping, and the simpler “inscribed square” method.
Update (Nov. 4th) : Exact and customizable solution!

The input values of thumbsticks on an Xbox 360 controller are always contained inside the unit circle, a disk centered on the origin and whose radius is 1. This is usually fine, because you want to interpret the values as a vector inside that circle… but if you want to provide an analog version of a D-Pad, for instance to control a character in a 2D platformer, then you’ll find that your character will never reach maximum horizontal speed unless you hit the (1, 0) or (-1, 0) vectors, exactly right or left. Anything tilted up or down will give a fractional speed, and if you hit a diagonal corner, you’ll end up with ±√½.

So what if the analog stick was a square area instead, such that the entire circumference has a least one component maxed out, like the contour of a square?
It may seem like a simple thing to do, but it proved to be a lot of trouble, and in the end I chose to use an approximation (nope, it worked out in the end!)… but here’s how I researched the problem.

Attempt #1 : Wrong way around

I first googled “Mapping circle to square“, and ended up on a blog entry that describes the inverse transform. Fine, I’ll just invert it, isolate the x and y variables as a function of x’ and y’… But that proved to be a problem.

Wolfram|Alpha doesn’t know what to do with it, and after making it more manageable (for x, for y), it gives 4 different answers that seem to work only for specific intervals, but doesn’t say which…
I tried doing it by hand and was stuck early on, because I suck at this.

And since I don’t have access to (nor know how to use) proper nonlinear equation solvers, I decided to give up.

Attempt #2 : Projections

By googling some more and varying search terms, I stumbled upon a cool flicker image titled “Conformal Transformation: from Circle to Square“. That’s exactly what I need!
But alas, it involves something called elliptic integrals and this also goes beyond my math understanding, or the amount of effort I was willing to put in this. The description also links to the Peirce quincuncial projection, which maps a sphere to a square, but the math also goes above my head (complex numbers arithmetic and again, elliptic integrals).

At this point I realized that what I was trying to do was not trivial.

Attempt #3 : Intuition serves best

So I just went back what I usually do : analyze the problem, determine what I expect as a basic solution, and work my way there.

  • Problem : The smallest horizontal component I end up with is ±√½, or ±~0.707, when the θ angle is π/4, 3π/4, 5π/4 or 7π/4.
  • Solution : Scale this to ±1, so by a factor of √2.

The scale for angles close to diagonals should fall down to identity (1) when it reaches a cardinal direction, though I’m not sure about the interpolation curve… Linear should do fine?

(code removed because it’s kinda shitty)

And for additional fun and testing, here’s how evenly-spaced random points inside the unit disk stretch to the unit square using this function (points generated by my Uniform Poisson-Disk code) :

Linear interpolation

Not bad? Good enough for me.

Attempt #4 : Interpolation and variations

I then tried different interpolation/easing methods for the scaling factor, and tried raigan’s idea of just scaling and clamping linearly the circle to a square, here’s what it looks like!

Quadratic interpolation

Inscribed square

I tried sine, quadratic and circular ease in/out/in-out from my easing functions library, and quadratic ease-in was the most balanced. It almost shapes everything to a square, very little clamping is necessary, so I’ll keep that in.

The inscribed square method is nice and simple, but cuts off a lot of values and keeps circular gradients in its area. But in the real world, for 2D platformer input, it’s probably just fine…

Attempt #5 : Epic win

So, I decided to give another shot at this after speaking on the bus with a friend about polar coordinates and trigonometry. Turns out there is a more exact solution, and by adding a “inner roundness” parameter you can also tweak how much of the center input will remain circular, while still touching the sides for circumference values.

Here is my final C# function, then I’ll explain how it works :

static Vector2 CircleToSquare(Vector2 point)
	return CircleToSquare(point, 0);
static Vector2 CircleToSquare(Vector2 point, double innerRoundness)
	const float PiOverFour = MathHelper.Pi / 4;

	// Determine the theta angle
	var angle = Math.Atan2(point.Y, point.X) + MathHelper.Pi;

	Vector2 squared;
	// Scale according to which wall we're clamping to
	// X+ wall
	if (angle <= PiOverFour || angle > 7 * PiOverFour)
		squared = point * (float)(1 / Math.Cos(angle));
	// Y+ wall
	else if (angle > PiOverFour && angle <= 3 * PiOverFour)
		squared = point * (float)(1 / Math.Sin(angle));
	// X- wall
	else if (angle > 3 * PiOverFour && angle <= 5 * PiOverFour)
		squared = point * (float)(-1 / Math.Cos(angle));
	// Y- wall
	else if (angle > 5 * PiOverFour && angle <= 7 * PiOverFour)
		squared = point * (float)(-1 / Math.Sin(angle));
	else throw new InvalidOperationException("Invalid angle...?");

	// Early-out for a perfect square output
	if (innerRoundness == 0)
		return squared;

	// Find the inner-roundness scaling factor and LERP
	var length = point.Length();
	var factor = (float) Math.Pow(length, innerRoundness);
	return Vector2.Lerp(point, squared, factor);
Inner roundness = 0

Inner roundness = 1

Inner roundness = 5

There are two concepts here :

  1. The scaling factor can be calculated for every theta angle without any need for interpolation. It happens to be the inverse of the sine or cosine of the angle, depending on which wall you’re trying to clamp to. This gives a perfect square all around.
  2. By interpolating relatively to the length of the input vector, it’s possible to preserve the circular shape in the center and still clamp to the sides for extreme inputs.

I think I’m done with this now! Yay.

A Render/SamplerState Stack and Manager for XNA

I’ve been meaning to make something like this for a long time, and I know that some engines like Xen offer a viable solution already, but I wanted to make my KISS solution that interferes with the standard XNA interface as little as possible.

It’s a pretty lengthy snippet of code, so I won’t paste it all here, but basically there’s three classes : the StateStack, ManagedRenderStates and ManagedSamplerStates.


StateStack is a static extension class to the GraphicsDevice. It could be just another static class, or even a global XNA service, but since it’s going to be used in combination with the GraphicsDevice most of the time, I thought it was logical to use extensions.
You can push, pop and peek the current state, which takes and returns ManagedRenderState objects. It’s very similar to how OpenGL works with the attribute stack (glPushAttrib), except you don’t choose what you push, the entire state gets pushed/popped everytime. I didn’t think isolating states in categories was useful or natural in XNA.
The ResetState and CommitState methods are shorthands to the same methods in ManagedRenderState.

ManagedRenderState is like a wrapper over a RenderState object, such that it has the same interface : a read/write property for each render state and access to the SamplerState collection.
The difference is that it tracks changes to render states as you make them, marks properties as dirty and applies only what’s needed when you decide to Commit. The Reset method refreshes the object with the current state of the GraphicsDevice; this needs to be done at the beginning of every draw call, or everytime you think the state has been tampered with.

ManagedSamplerState does the same thing but for sampler states : addressing, filtering, etc. You don’t call Reset or Commit on it directly, it gets done automatically by its host ManagedRenderState.

I added blending mode and texture filtering helper methods to both State classes, because I end up using that a lot… and with proper state management, you don’t have to worry about redundant assignments! So you can be permissive and set everything you need, every time.


So what does it look like to use it?
The Game class needs to do some initialization and per-draw-call stuff first…

protected override void Initialize()

    // Other stuff

protected override void Draw(GameTime gameTime)


void SetDefaultRenderStates()
    var rs = GraphicsDevice.PeekState();

    rs.DepthBufferEnable = true;

    rs.SamplerStates[0].AddressU = rs.SamplerStates[0].AddressV = TextureAddressMode.Wrap;


Then to use the state stack, you don’t ever touch the GraphicsDevice.RenderState object : always use the Managed equivalent.

// Push a copy of the state on the stack
var rs = GraphicsDevice.PushState();

// Always-on-top, and don't disturb the depth buffer, and no culling
rs.DepthBufferFunction = CompareFunction.Always;
rs.DepthBufferWriteEnable = false;
rs.CullMode = CullMode.None;

// Commit changes

// Draw some stuff

// Pop back to the last state
// We are now back to the last state

Download & Conclusion

I think it makes a lot of sense to stack states in a component system, because each component can push its own local state and work with it, and just dispose it afterwards. Add this to dirty-checking to remove almost all redundant state changes, and you’ve got a very usable system!

The next step is batching draw calls to keep calls that use the same RenderStates together… in my system, this is left to the user’s discretion. Kevin Gadd presented a way of doing this (and many other things including threaded rendering) on his blog.

Another rather important note : ManagedRenderState contains a MaxSamplers constant that you can tweak depending on the number of sampler states you know you’ll use. Leaving it at 16 will make update/reset/refresh operations kind of slow… not sure yet if it’s noticeable.

And because of the immense amount of copy-paste that’s been needed to produce these classes, I can’t promise that it doesn’t have a typo or two… I haven’t tested every single state. But up to now, it works great, and I’ll update it if needed.

StateStack.cs (2 kb – C# class)
ManagedRenderState.cs (30 kb – C# class)
ManagedSamplerState.cs (7 kb – C# class)

Bonuses in the code files : a method to unpack a packed color from uint to Color, and a simplistic object Pool implementation (to eliminate garbage when stacking up states).

Hope you find it useful!

WaitUntil Component

Here’s a little something that I hope to use increasingly in the future : elements of functional programming to facilitate modification of state over time or game loops, without using threads all over the place. It’s nothing new, and there are other solutions (like Nick Gravelyn’s Interpolators and Timers), but I tried to make it as concise and generic as I could.

Here it is, more comments after the listing.

using System;
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.GamerServices;

namespace Foo
    public class WaitUntil : IGameComponent, IUpdateable
        public static Game Game { private get; set; }

        public static void GuideDisappears(Action onValid)
            Game.Components.Add(new WaitUntil(_ => !Guide.IsVisible, onValid));

        public static void TimePassed(float secondsToWait, Action onValid)
            Game.Components.Add(new WaitUntil(waited => (waited as TimeKeeper).Elapsed.TotalSeconds > secondsToWait,
                                              (elapsed, waited) => (waited as TimeKeeper).Elapsed += elapsed,
                                              onValid, new TimeKeeper()));
        class TimeKeeper { public TimeSpan Elapsed; }

        readonly Func<object, bool> condition;
        readonly Action<TimeSpan, object> whileWaiting;
        readonly Action onValid;
        readonly object state;

        WaitUntil(Func<object, bool> condition, Action onValid) : this(condition, ActionHelper.NullAction, onValid) { }
        WaitUntil(Func<object, bool> condition, Action<TimeSpan, object> whileWaiting, Action onValid) : this(condition, whileWaiting, onValid, null) { }
        WaitUntil(Func<object, bool> condition, Action<TimeSpan, object> whileWaiting, Action onValid, object state) 
            this.condition = condition;
            this.whileWaiting = whileWaiting;
            this.onValid = onValid;
            this.state = state;

        public void Update(GameTime gameTime)
            if (condition(state))
                whileWaiting(gameTime.ElapsedGameTime, state);

        #region Stuff we don't care about
        public void Initialize() { }
        public bool Enabled { get { return true; } }
        public event EventHandler EnabledChanged;
        public event EventHandler UpdateOrderChanged;
        public int UpdateOrder { get { return 0; } }

    public static class ActionHelper
        public static void NullAction<T, U>(T t, U u) { }

I ended up using basically a GameComponent, which means I need access to a Game instance to add it and remove it from the component collection. I decided to use a static field (that you assign in the Game’s constructor) to avoid passing it every time. It’s very unlikely that the Game instance will change or be destroyed… and I already static-ified it in my ServiceHelper earlier anyway.

I also wrote a couple (okay, two) static factory methods that are slightly fluent-interface-ey.

// (from the context of your Game class implementation)
    // Say "OK" when the guide stops being visible, like this?
    Components.Add(new WaitUntil(_ => !Guide.IsVisible, () => Console.WriteLine("OK!")));
    // ...or like this!
    WaitUntil.GuideDisappears(() => Console.WriteLine("OK!"));

    // And while we're at it...
void WaitForTwoSeconds()
    Console.WriteLine("Will wait for two seconds...");
    // ...recursive timers!
    WaitUntil.TimePassed(2, WaitForTwoSeconds);

I’ll probably add new factory methods as the needs arise, and make the class overall more useful, but I feel like it’s a good start.
I used the “GuideDisappears” method when a gamer signs out and I want to show a warning message using the Xbox Guide before going back to a sign-up screen… but since signing out is usually performed from the Guide itself, you have to wait for it to close before doing anything. This seemed like the simplest solution, and it works great.

Anaglyph Stereoscopic Rendering in First Person

I decided to finally finish up my analgyph stereoscopy sample, and cut the depth-of-field component that was adding too much complexity for my limited spare time right now.

stereoscopy finalstereoscopy final 2


Binaries : (1.9 Mb – Binaries)
Code : (524 Kb – Source Only, get DLLs for TV3D and SlimDX from the Binaries)


This sample demonstrates a couple of things :

  • Optimized anaglyph filters for red/cyan stereoscopic rendering, to reduce eye-strain by minimizing retinal revialry but still keep some color information. As my source suggests, I’ve also implemented red channel gamma correction in the shader.
  • Auto-focus of both eyes on a focal plane. The camera is in the first person and the two “virtual cameras” act like human eyes, in the manner that they are connected to a single brain that wants to look at a single point. So the center of the screen is assumed to be that focal point, and both eyes will look at it. I used a depth rendering pass to achieve this, and a weighted sum of a certain portion of the screen near the center (this is all tweakable in realtime)

The distance between the eyes is also tweakable, if you want to give yourself a headache.

I wanted to do a stereoscopy sample to show how I did it in Super HYPERCUBE, but I can’t/don’t want to release its source, so I just re-did it properly. Auto-focus is a bonus feature that I wanted to play with; sHC didn’t need that since the focal plane was always the backing wall.

This sample, like all my recent ones, uses the latest version of my components framework. There are some differences between this release and the Stencil Rendering one, but it’s mostly the same. The biggest thing is that base components (Keyboard, Sound, etc.) are not auto-loaded anymore, and must be added in the Core.Initialize() implementation. This way, if I don’t need the sound engine, I just don’t load it… makes more sense.

Hope you like it!

Stencil Rendering

Here’s a little demo to show off a technique that Farbs posted about earlier this week.

yay yay2


Binaries : (3.6 Mb – Binaries)
Code : (1.8 Mb – Source Only, get DLLs for TV3D and SlimDX from the Binaries)


Every frame, a random color from the target image is sampled. This color will be used as a stencil, such that every pixel whole target color is close enough to that stencil’s color will be painted. It’s a constructive painting process; every frame paints a single color, but if you wait long enough in a single spot you’ll end up with the target image.

Actually, the post processing draw code is so concise that I can post it here :

public override void PostDraw()
    targetBuffer.SetSystemMemCopy(true, true); // Lolworkaround

    // 1) Pick a colour from target image
    var pickedColor = Globals.DecodeRGBA(TextureFactory.GetPixel(targetBuffer.GetTexture(), RandomHelper.Random.Next(0, targetBuffer.GetWidth()), RandomHelper.Random.Next(0, targetBuffer.GetHeight())));
    stencilShader.SetEffectParamVector3("stencilColor", new TV_3DVECTOR(pickedColor.r, pickedColor.g, pickedColor.b));
    Screen2DImmediate.Draw_FullscreenQuadWithShader(stencilShader, 0, 0, 1, 1, mainBuffer.GetTexture(), targetBuffer.GetTexture());

“targetBuffer” and “mainBuffer” are just two TVRenderSurfaces as big as the viewport. Since I sample from targetBuffer, it needs to be flagged with “system memory copy”. I thought this would slow things down, but it runs at very interactive framerates (60 and more).

And uh… The “lolworkaround” is a bug in the current build of TV3D. Usually you only need to set this once at initialization time, but a BltFromMainBuffer does not flag the surface as dirty and it prevents updates to the pixels that I sample. Resetting the memory copy mode makes the changes effective. Sylvain tells me it’s fixed in the current development build. :)

A pixel shader does the rest :

	float3 bufferSample = tex2D(MainBufferSampler, IN.TexCoord).rgb;
	float3 targetSample = tex2D(TargetSampler, IN.TexCoord).rgb;

	// 2) Calc difference between current screen and target (per channel subtraction, abs, then accumulate all three into one)
	float sampleDiff = distance(bufferSample, targetSample);

	// 3) As above, but between colour picked in step 1 and target image
	float stencilDiff = distance(stencilColor, targetSample);

	// 4) Where result of step 2 > result of step 3, draw colour picked in step 1
	float4 color;
	if (stencilDiff < sampleDiff)
		color = float4(stencilColor, 1);
		color = float4(bufferSample, 1);

	return color;

I think it’s a lovely effect. It’s very dependent on how colorful and contrasted the scene is, and it works differently for sharply-defined shapes or gradients… And of course camera movement is a big factor. In the video it gets confusing, the effect is more “painterly” if you just rotate the camera in small circles and wait for the effect to accumulate.