Jun 30 2009

Full Xbox 360 Gamepad Support In C#, Without XNA

Published by Renaud Bédard under C#

I think that XNA’s implementation of the XInput API for Xbox 360 controllers is terrific. Simple, complete, ready to use. So when I wanted to add gamepad support (including analog input and vibration) to Super HYPERCUBE, a TV3D 6.5 game, I just added a reference to the XNA Framework assembly and carried on like I did in Fez. But I realize that referencing XNA just for controller support is silly, and I don’t want to assume that it’s installed on clients, nor force-install the redistributable.

So I recently looked for a more proper solution. It looks like MDX 2.0 Beta had support for XInput, but it’s been discontinued in favor of XNA for some time. There is also a handful of managed wrappers around the XInput native DLL that use P/Invokes… wasn’t too keen on that either.

Then I recalled that SlimDX is the community-maintained successor to MDX, and that it’s awesome. And as a matter of fact, it has complete support for XInput! So might as well use that.

But SlimDX is, well, slim. It’s a very thin wrapper and doesn’t do any normalization or dead zone detection for thumbsticks. It’s not much of a bother to do yourself, but I figured I’d post my code as a reference to people who want to use SlimDX’s XInput implementation in the real world.

Details

My dead zone code is based around the “Getting Started With XInput” guide on MSDN, but given the C#3/SlimDX beautifying treatment. :)
I also found that analog triggers on my controllers did not need dead zones, but if you want to use them as binary buttons, you can check against the appropriate SlimDX constant.

My “state class” does not wrap everything that the SlimDX Controller class exposes, like voice support, battery information, etc. So I made the local Controller instance a public readonly member, and you can access it to query whatever other information you need. Or of course you can add the getters/state variables that you need.

Download

GamepadState.cs (C#3 – 5 Kb, SlimDX March 2009 SP1 SDK or later needed)

Just the dead zone code

If you’re only looking for that, here it is :

    var gamepadState = Controller.GetState().Gamepad;
    var leftStick = Normalize(gamepadState.LeftThumbX, gamepadState.LeftThumbY, Gamepad.GamepadLeftThumbDeadZone);
    var rightStick = Normalize(gamepadState.RightThumbX, gamepadState.RightThumbY, Gamepad.GamepadRightThumbDeadZone),
}
 
static Vector2 Normalize(short rawX, short rawY, short threshold)
{
    var value = new Vector2(rawX, rawY);
    var magnitude = value.Length();
    var direction = value / (magnitude == 0 ? 1 : magnitude);
 
    var normalizedMagnitude = 0.0f;
    if (magnitude - threshold > 0)
        normalizedMagnitude = Math.Min((magnitude - threshold) / (short.MaxValue - threshold), 1);
 
    return direction * normalizedMagnitude;
}

2 responses so far

Jun 20 2009

Isotropic Specular Reflection Models Comparison

Published by Renaud Bédard under C#, HLSL, Samples

I’ve decided to repost all my remaining TV3D 6.5 samples to this blog (until I get bored). These are not new, but they were only downloadable from the TV3D forums until now!

This demo (originally released as VB.Net 2005 on Feburary 13th, 2007 here) is a visual and performance comparison (and reference implementation) of five different per-pixel lighting models for isotropic specular reflections :

  • Phong reflection model
  • Blinn-Phong (Blinn D1, Phong) specular distribution
  • Lyon halfway method 1 (for k=2 and D = H* – L)
  • Trowbridge-Reitz (Blinn D3) specular distribution
  • Torrance-Sparrow (Blinn D2, Gaussian) specular distribution

My main goals were to :

  • Make an optimized HLSL implemention of each model that fits in a single Shader Model 2.0 pass and supports 3 lights
  • Evaluate the performance of each model in a multiple light, per-pixel rendering context
  • Determine which model keeps the most numerical precision and does not produce artifacts when used with normal-mapping

Download

IsotropicModels.zip [2.7 Mb] – C#3 (VS.NET 2008, TV3D 6.5 Prerelease .NET DLL Required)

Screenshots

lyon 2lyon 7
lyon 8lyon 9

Details

The HLSL shader supplied with this sample was made to mimic the built-in TV3D offset-bumpmapping shader as closely as possible. As a result, almost all of its effect parameters are mapped to standard semantics. It supports :

  • One colored directional light
  • Two colored point lights in SM2.0, and four in more recent models (SM2a/b and SM3)
  • Support for all types of vertex fog
  • Parallax mapping of texture coordinates using a grayscale heightmap
  • Diffuse mapping with alpha support (a.k.a. texturing)
  • Normal mapping
  • Specular mapping (using the alpha channel of the normalmap)
  • Emissive mapping (colored!)
  • Usage of all material terms (diffuse, ambient, specular, emissive, power and opacity)

There is no support for point light attenuation as this would’ve gone over the 64 instructions limit of the ps_2_0 profile. (also TV3D doesn’t provide semantics for these)
There is no support for spot lights for the same reasons, but I believe spots will be processed as regular point lights, ignoring the specific parameters.

Techniques

With the realtime controls, you can choose from three different techniques : FiveLightsBranching, FiveLights and ThreeLights. On SM2.0 hardware, only the third option will be valid.

The FiveLightsBranching mode uses loops and “if” statements to produce dynamic branching on SM3.0 compatible hardware. This can (but may not) be benificial because only the calculations for enabled lights are performed.
The FiveLights and ThreeLights modes respectively do five and three lights (WHAT YOU SAY !!) but all in a static manner. It does not just unroll the loop! Most of the calculations are done with matrices, which makes it more efficient on most hardware.

To keep the shader “simple” (or to prevent from becoming even more complex…) I decided not to implement a multipass 5 lights technique for SM2.0… sorry!

Blinn vs. Phong

There’s two major categories in the models I tested : the ones that use the halfway vector, and the Phong model that works with the reflected vector. (see the Wiki entry on Blinn-Phong for details on these vectors).

BlinnvsPhong
A directional light reflecting on a surface with a power value of 64

According to a paper from Siggraph 2004 called Experimental Validation of Analytical BRDF Models, the halfway methods generate specular highlights with more realistic shapes than the Phong model. I realized that myself when working on an ocean rendering shader that had a Phong specular reflection, and it was impossible to get a long grazing highlight when the sun was setting.

Normal Mapping Artifacts

One of the things that made me do this whole analysis is that I was dissatisfied with the image quality of Phong and Blinn-Phong when used with a normal map, so per-pixel lighting. I had huge block artifacts on my water surface, and everything with a high enough specular power value and a bumpy surface. So I found about a reformulation of the Blinn-Phong model by Richard F. Lyon, written in 1993 (!) for Apple. (trivia : Mr. Lyon also invented the optical mouse… how awesome is that!)

lyonblinn

This reformulation is interesting because it does not use the specular power literally as an exponentiation, it uses a distance metric and a much lower power value to produce very similar results to the Blinn-Phong model. Using a high specular power (32 or more) hurts floating-point accuracy, even in full-precision mode.

That said, I have seen hardware that do not have this problem. I am starting to think that it may be a driver issue, or something about mobile GPUs… In any case, the safe thing to do is choose the model that never produces artifacts, right?

The Other Blinn Distributions

The two other models I implemented (Trowbridge-Reitz and Torrance-Sparrow) were “ported” from the MATLAB code in Lyon’s reformulation paper. I wanted to test them out to see if they had the same artifact problems, and how different they looked from the classic models.

trowbridge

Trowbridge-Reitz is an interesting model because of how it looks. It’s slower than Blinn-Phong, but it has a distinct smoothness to it. The falloff of its specular highlights is softer than the other models… I’m not sure if it’s more accurate, but it looks pretty. Sadly, it has the same problems with normal mapping.

Torrance-Sparrow is a visual identity to the Blinn-Phong model. It’s the same thing, but slower and more instruction-heavy. It does not even fit in SM2.0 with 3 lights,… So I suggest you disregard it for realtime graphics.

Performance

I found that performance varies a lot depending on which technique you use, which shader model you support, how much your GPU is fillrate-limited instead of arithmetic-limited… So I’ll just say this : the Lyon model looks great, and it’s simple and fast enough to be worth considering. If you don’t experience the artifacts I describe, then the Blinn-Phong model is your best shot, but test Trowbridge-Reitz to see if it’s fast on your hardware.

It’s also worth mentioning that many things could be optimized by factorizing equations into small 1D or 2D textures (or perhaps a normalization cubemap), if your GPU loves pixels and hates instructions. But I don’t believe that the shader can be optimized that much by reorganizing code or removing useless statements. At least not without hurting visual quality.

Component System Updates

This sample contains a major breaking change to my component framework : The Service baseclass is gone. This makes Components able to “be” services (and publish many service interfaces), and allows this sample to have a much simpler class structure… no more state classes! The components just publish whatever data they want via their service interfaces. And with the new Eventful<T> class, it’s really easy to propagate changes from a controller to a view.

No responses yet

Jun 11 2009

Post Processing/Fullscreen Texture Sampling

Published by Renaud Bédard under Samples, TV3D 6.5

I’ve decided to repost all my remaining TV3D 6.5 samples to this blog (until I get bored). These are not new, but they were only downloadable from the TV3D forums until now!

This demo (originally released July 1st 2008 here) is a mixed bag of many techniques I wanted to demonstrate at once. It contains (all of which are described below) :

  • Pixel-perfect sampling of mainbuffer rendersurfaces for post-processing
  • A component/service system very similar to XNA’s and with support for IoC service injection
  • A minimally invasive, memory-conserving workflow for post-processing shaders
  • Many realtime surface downsampling methods (blit, blit 2x, tent, sinc+kaiser and bicubic)
  • An optimized Gaussian blur shader with up to 17 effective taps in a single pass

Download

FullscreenShaders_(final).zip [8.6 Mb] – C#3 (VS.NET 2008, TV3D 6.5 Prerelease .NET DLL Required)

Screenshots

fs1fs2fs3fs4

Dissection

Pixel-Perfect Sampling

There is a well-known oddity in all DirectX versions (I think I’ve read somewhere that DX11 fixes it… amazing!) that when drawing a fullscreen textured quad, the texture coordinates need to be shifted by a half-texel. So that’s why if you use hardware filtering (which is typically enabled by default), all your fullscreen quads are slightly blurred.
But the half texel offset is related to the texel size of the texture you are sampling, as well as the viewport to which you are rendering! And it gets seriously wierd when you’re downsampling (sampling a texture at a lower frequency than its resolution), or when you’re sampling different-sized textures in a single render pass…

So this demo adresses the simple cases. I very recently found a more proper way to address this problem in a 2003 post from Simon Brown. I tested it and it works, but I don’t feel like updating the demo. ^_^

Component System

The component system was re-used in Trouble In Euclidea and Super HYPERCUBE, and I’m currently using it to prototype a culling system that uses hardware occlusion queries efficiently. The version bundled with this demo is slightly out of date, but very functional.
I blogged about the service injection idea a long time ago, if you want to read up on it.

Post-processing that just sits on top

This is how I consider post-processing should be done : using the main buffer directly without writing to a rendersurface to begin with. This way, the natural rendering flow is not disturbed, and post-processing effects are just plugged after the rendering is done.

If you can work directly with the main buffer (no downscaling beforehand), you can grab the mainbuffer onto a temporary rendersurface using BltFromMainBuffer() after all draw calls are performed, and call Draw_FullscreenQuadWithShader() using the temporary rendersurface as a texture. The post-processed result is output right on the main-buffer.
Any number of effects can be chained this way… blit, draw, blit, draw. Since the RS usage only lasts until the fullscreen draw call, you can re-use the same RS over and over again.

If you want to scale the main buffer down before applying your effect (e.g. for performance reasons, or to widen the effect of a blur), then you’ll need to work “one frame late”. I described how this works in this TV3D forum post.

Downsampling Techniques

The techniques I implemented for downsampling are the following :

  • Blit : Simply uses BltFromRenderSurface onto a half- or quarter-sized rendersurface. A single-shot 4x downscale causes sampling issues because it ignores half of the source texels… but it’s fast!
  • Double blit : 4x downscaling via two successive blits, each recieving surface being half as big as the source. It has less artifacts and is still reasonably fast.
  • Bicubic : A more detail-conserving two-pass filter for downsampling that uses 4 linear taps for 2x downsampling, or 8 linear taps for 4x. Works great in 2x, but I’m not sure if the result is accurate in 4x. (reference document, parts of it like the bits about interpolation/upsampling are BS but it worked to an extent)
  • Tent/Triangular/Bilinear 4x : I wasn’t sure of its exact name because it’s shaped in 2D like a tent, in 1D like a triangle, and it’s exactly the same as bilinear filtering… It’s accurate but detail-murdering 4x downsampling. Theoretically, it should produce the same result as a “double blit”, but the tent is a lot more stable, which shows that BlitFromRenderSurface has sampling problems.
  • Sinc with Kaiser window : A silly and time-consuming experiment that pretty much failed. I found out about this filter in a technical column by the Jonathan Blow (of Braid fame), which mentioned that it is the perfect low-pass filter and should be the most detail-conserving downsampling filter. There are very convincing experimental results in part 2 as well, so I gave it a shot. I get a lot of rippling artifacts, it’s way too slow for realtime, but it’s been fun to try out. (reference document)

Fast Gaussian Blur + Metrics

The gaussian blur shader (and its accompanying classes) in this demo an implementation of the stuff I blogged about some months ago : the link between “lost light” in the weights calculation and how similar to a box-filter it becomes. You can change the kernel size dynamically and it’ll tell you how box-similar it is. The calculation for this box-similarity factor is still very arbitrary and you should take it with a grain of salt… but it’s a metric, an indicator.

But there’s something else : hardware filtering! I read about this technique in a GLSL Bloom tutorial by Philip Rideout, and it allows up to 17 effective horizontal and vertical samples in a ps_2_0 (SM2 compatible) pixel shader… resulting in 289 effective samples and a very wide blur! It speeds up 7-tap and 9-tap filters nicely too, by reducing the number of actual samples and instructions.
Phillip’s tutorial contains all the details, but the idea is to sample in-between taps using interpolated weights and achieve the same visual effect even if the sample count is halved. Very ingenious!

 

If you have more questions, I’ll be glad to answer them in comments.
Otherwise I’ll direct you to the original TV3D forum post for info on its development… there’s a link to a simpler earlier version of the demo there, too.

One response so far

May 23 2009

Top Sources of Heap Garbage

Published by Renaud Bédard under C#, XNA

In the last Xbox-related entry I posted, I mentioned how the CLR Profiler can be a very useful tool to know what are the biggest sources of heap garbage in your .NET project. I used it extensively in the past month to optimize my game to run on the Xbox, and here’s a rundown of my biggest programming “mistakes” (or problematic liberties?).

1. LINQ

Even the simplest LINQ queries like :

List<List<Potato>> potatoBags;
foreach (var potato in potatoBags.SelectMany(x => x))
{
  // ...
}

…will cause a noticeable amount of heap garbage. This includes Where clauses, OrderBy clauses, everything! It’s sad because I think that LINQ is a fantastic code-thinning tool, but it’s not an option on limited-memory systems like the Xbox.

That said, if you want to use LINQ at initialization/loading time, feel free to do so. The problems only arise in update/draw calls.

2. Automatic XNB Deserialization

At one point, I got sick or writing ContentTypeWriter and ContentTypeReader classes and started building an XNB automatic serializer and deserializer based on Alexander’s (John Doe?) work on the subject. On Windows, the load times remained the same and it greatly simplified or deleted many of my content pipeline classes.

But on the Xbox the load times were horrible. Even in Release and without the debugger attached, the load times were at least 5x as slow as on my PC. I then discovered that reflection calls generate a lot of heap garbage — and it’s not even clear if memory stays allocated or if it eventually gets compacted by the GC…

So I swallowed my pride and switched back to good old Reader/Writers. But hey, now the load times are super fast.

3. Using classes when you can use structs

Coming from a Java background, I’m very used to classes and I’ll use them for pretty much everything. Even data structures that get created at runtime, because it’s so handy to have references and everyone pointing to the same object…

Turns out using structs has a lot of advantages. Intuitively I thought that the by-copy parameter passing would just make everything slower, but if you keep your structures small enough it has little to no effect on performance. The fact that they reside on the stack and not on the heap makes them a much better option for the Xbox. So datatypes like collision result objects, object identifiers, anything that you need to create often when the game is running should be made into structs.

4. Object pools (are a good thing)

Sometimes you just need to dynamically allocate reference objects in your algorithms. Or even value-types can get allocated on the heap if they’re at class-local scope. But you can minimize the damage by using object pools!

They’re really easy to set up (I found this Ziggyware article to be a good starting point) and they’ll save you heap garbage by preallocating to the number of objects you’ll actually need, and extending the lifetime of objects that would otherwise be disposable trash.

5. Removing objects from a collection that you’re enumerating

I thought I had found a really good way to fix the old problem of “Collection was modified; enumeration operation may not execute” when you remove an object from a collection when you’re foreach’ing on it :

foreach (var potato in potatoBag.ToArray())
{
  if (potato.Expired)
    potatoBag.Remove(potato);
}

It’s pretty cute, no? Very little impact on the iteration code. But it also copies the whole collection to a brand new array everytime you’re enumerating it… :(

What I ended up doing instead to minimize garbage is :

// This is allocated once, at the class-level scope
readonly List<Potato> expiredPotatoes = new List<Potato>();
 
public void Update()
{
  // Standard update
  foreach (var potato in potatoBag)
  {
    if (potato.Expired)
      expiredPotatoes.Add(potato);
  }
  // Removing pass
  foreach (var expired in expiredPotatoes)
    potatoes.Remove(expired);
  expiredPotatoes.Clear();
}

It’s certainly heavier code-wise, but at least it’s clean. And it’s faster too.

6. Enums as Dictionary keys

If you use an enum as the TKey type parameter for a Dictionary<TKey, TValue> object, you’ll have a small amount of garbage generated everytime you access the dictionary. But there’s an easy way around it : you just need to build a Comparer class for the enum type (which is under 10 lines of code) and pass it to the constructor of your dictionary.

Cheers to Nick Gravelyn for pointing out a solution to that problem.

7. Collections should be pre-allocated

When possible, you should use the parameterized constructors of all your Lists, Dictionaries, HashSets and whatever other collection types that you use, such that their backing arrays are pre-allocated to the number of elements that you plan to add to them.

Starting them with the default parameterless constructor will force the collection to grow (using Array.Resize, which trashes the old array and creates a new, bigger one) until you filled it completely.

Conclusion

That’s it for now.
I know, 7 is a terrible number for a “Top N” list, but I can’t think of other major sources of garbage that I’ve encountered. The rest goes down to good programming practices. (don’t instantiate reference types all over the place, etc.)

Hope it helped!

8 responses so far

May 11 2009

A Shared Content Manager for XNA

Published by Renaud Bédard under C#, XNA

In an average-sized XNA game, you’ll end up having many levels using many art assets, with most of them sharing textures and models between each other. Using the standard ContentManager class, the basic approach is to load all of a level’s assets into a single ContentManager, and unload it when switching levels : this way there is no possible memory leak and memory usage is kept to a minimum.

But what about load times? Users usually want level transitions to be as seamless as possible, yet we can’t just pre-load everything, you gotta watch the memory budget…

Sharing is caring

One solution is to preserve shared assets : an asset that is loaded for Level #1 and re-used in Level #2 can be kept in memory instead of being destroyed and reloaded. Memory-wise it’s costless because you were about to reload it anyway; keeping it for a longer time has no negative effect.

A simple way to keep track of shared assets is to use reference counting : increment a counter whenever you ask to load an asset, and flush assets that have 0 references when you unload. But even the almighty Shawn Hargreaves thinks it’s a bad idea…

[...] reference counting sucks for all sorts of reasons I can’t be bothered to go into here. It is better than nothing, but falls short of the automatic, rapid development approach .NET developers have rightly come to expect.

Fair enough, but how about making asset disposal transparent by using the same ContentManager containers with the same public interface, yet use reference counting in the background?

I tried doing exactly that, and had great success with it, so I suggest you take a look at the code below and give it a shot!

public class SharedContentManager : ContentManager
{
    static CommonContentManager Common;
    List<string> loadedAssets;
 
    public SharedContentManager(IServiceProvider serviceProvider, string rootDirectory) 
        : base(serviceProvider, rootDirectory)
    {
        EnsureSharedInitialized();
        loadedAssets = new List<string>();
    }
 
    static void EnsureSharedInitialized() 
    {
        if (Common == null)
            Common = new CommonContentManager(ServiceProvider, RootDirectory);
    }
 
    // This is ripped straight off the ContentManager disassembled source...
    // Wouldn't have to do that if it were protected! :)
    internal static string GetCleanPath(string path)
    {
        // Ugly, boring code that you'll get if you download the codefile
    }
 
    public override T Load<T>(string assetName)
    {
        assetName = GetCleanPath(assetName);
        loadedAssets.Add(assetName);
        return Common.Load<T>(assetName);
    }
 
    public override void Unload()
    {
        if (loadedAssets == null)
            throw new ObjectDisposedException(typeof(SharedContentManager).Name);
 
        Common.Unload(this);
        loadedAssets = null;
 
        base.Unload();
    }
 
    class CommonContentManager : ContentManager
    {
        readonly Dictionary<string, ReferencedAsset> references;
 
        public CommonContentManager(IServiceProvider serviceProvider, string rootDirectory) 
            : base(serviceProvider, rootDirectory)
        {
            references = new Dictionary<string, ReferencedAsset>();
        }
 
 
        public override T Load<T>(string assetName)
        {
            assetName = GetCleanPath(assetName);
 
            ReferencedAsset refAsset;
            if (!references.TryGetValue(assetName, out refAsset))
            {
                refAsset = new ReferencedAsset { Asset = ReadAsset<T>(assetName, null) };
                references.Add(assetName, refAsset);
            }
            refAsset.References++;
 
            return (T) refAsset.Asset;
        }
 
        public void Unload(SharedContentManager container)
        {
            foreach (var assetName in container.loadedAssets)
            {
                var refAsset = references[assetName];
                refAsset.References--;
                if (refAsset.References == 0)
                {
                    if (refAsset.Asset is IDisposable)
                        (refAsset.Asset as IDisposable).Dispose();
                    references.Remove(assetName);
                }
            }
        }
 
        class ReferencedAsset
        {
            public object Asset;
            public int References;
        }
    }
}

Notes

By design, the class assumes that all your content managers will have the same root path and use the same service provider. This version uses the constructor parameters of the first instance for all subsequent instances. It’s kind of redundant to pass those parameters everytime since they aren’t used after the first instance has been created, you can probably simplify and optimize that part (I did otherwise in my project but it’s tied to my engine code).

Content loading is not thread-safe with this method. The version I use in my project again uses a different way to initialize the common content manager and monitors, but I thought it made the implementation too heavy for demonstration… this too would need work if you use threaded loading.

It works if you use forward slashes for paths because of the GetCleanPath method. But fun fact, it treats paths and filenames as case-sensitive so it will reload assets if you change the case between loadings! So be careful with that, or fix it. :P

Usage

Here’s the procedure for level transitions :

// Create a content manager for the next level
var nextLevelCM = new SharedContentManager(Game.Services, Game.Content.RootDirectory);
 
// Load the content for this next level
var fooTexture = nextLevelCM.Load<Texture>("foo");
var barSound = nextLevelCM.Load<SoundEffect>("bar");
 
// Unload the current (old) level's content manager
currentLevelCM.Unload();
 
// Cycle
currentLevelCM = nextLevelCM;

If you unload the last level’s content manager before you load the next level’s content, all the assets will be reloaded, which renders my code useless. Make sure you follow that order!

The code can be downloaded here : SharedContentManager.cs (4 kB, XNA 3.0 / C#3.5)

And that’s it! Hope it works for you!

2 responses so far

May 03 2009

Behind Fez : Trixels (part two)

Published by Renaud Bédard under Fez

I decided that I would write a more detailed article about the rendering module of the Trixels engine. Many things changed implementation-wise since last year, and I feel that now is a good time to go public as it’s probably not going to change much anymore.

I really don’t mind “coming clean” about how things are done, since Fez certainly didn’t invent non-interpolated orthographic voxels or whatever you might call trixels in a more formal language. Trixels are just a technology support for Fez’s art style, not necessarily the best, but one that works and that I’d like to share.

You should probably check the first post I made about trixels to get the basic idea first.

Here’s the trile I’ll dissect for this post, in a perspective view :

trile-dissected-1

Memory representation

Each trile at its creation is a full 16³ cube, without any holes. The editing process consists of carving trixels out of the full shape to get a more detailed object.

The very first version of Fez recorded all the trixels that are present inside a trile, which means an untouched trile would contain a list of all the possible positions inside the trile to say that these positions are filled.

It became obvious early on that most triles have a lot more matter than holes, so the second version recorded the missing trixels instead. But even that became hardly manageable, especially in the text-format/human readable intermediate trileset description files; they were still too immense to edit by hand if needed, and took a while to parse and load because of their size. The memory usage was questionably high too for the little data that was represented.

I tried using octrees, but that kinda failed too. Thankfully Saint on the tigsource forums gave me a better idea; to use missing trixel boxes, so the smallest amount of the largest possible missing trixels cuboids. The good thing about these is that a box is represented by a 3D vector for its size and a 3D point for its position, that’s it!

Here’s a visualization of what these boxes looks like. Adjacent boxes have the same color.

trile-dissected-2

The editor tries to keep these boxes as big as possible and their number as small as possible in realtime while you edit, but my algorithms aren’t perfect yet. A “best effort” scenario is fine though, a couple of superfluous boxes have little effect on memory usage/file size.

Polygonization

The rendering strategy of Fez is to draw the bounding surfaces of each trile as a triangle list and ignore everything that’s inside a trile. To do that, we must first isolate the contiguous surfaces of the trile, and then split it in triangles.

To extract surfaces, I assume that all of the actions from the initial filled trile are incremental. By that I mean that all you can do in Fezzer when sculpting a trile is remove a trixel or add a trixel, and each of these operation creates, destroys or invalidates surfaces; but each is treated separately and acts on the current state.
…I was going to explain the algorithm in detail, but I feel like it’d be just confusing and unnecessary. So, exercise to the reader. :)

Whenever a surface is modified/created, another pass tries to find the smallest amount of the biggest possible rectangles inside that surface. It’s the same thing as with the missing trixel boxes, but in 2D this time (also alot easier to do right!). My algorithm traverses the surface from its center in an outward spiral and marks all the cells that form a rectangle; the remaining cells are traversed later, recursively until the whole surface is covered with rectangles.

Each rectangle is then a quad that is formed of two triangles, which we can render directly. A vertex pool makes sure that there are no duplicate vertices, and maps the indices appropriately.

Below is a visualization of the rectangular surface parts of the same trile, in filled geometry mode and in wireframe.

trile-dissected-3trile-dissected-4

Mass rendering

So that’s a single trile. A level is formed of a ton of triles, we don’t want to draw everything at all times. How do we manage that?
And rendering a lot of small objects in sequence is not very GPU-friendly, so how do we group batches efficiently?

The answer to the first question is efficient culling. Since the world is in essence formed of grid-aligned tiles, it’s easy to find which are visible in the screen or not, and render selectively. But Fez also works in the third dimension, and the depth range can be really big, so we need to find which triles we can skip rendering if they’re behind another trile. Simple enough, traverse to the first visible trile for each screen-space tile, and render this one only. But some triles can be flagged as see-through and let the traversing continue until we hit a non-seethrough trile or the level’s boundaries.

For tilted isometric views, a similar culling algorithm is used, where we try to find the triles with no neighbours on the faces that the camera can see. In perspective view, it’s a lot harder to cull without occlusion queries, which I didn’t want to get into… But 99% of the game is played from an isometric perspective so it’s no big deal.

Here’s a scene from the GDC ‘09 trailer in a 2D view that you’d play in, and how it’s culled. The world extends in the third dimension, but we only need to see its shell!

culling-1culling-2
 

The second thing is batching.

In the very first XNA version of Fez, I tried to call DrawUserIndexedPrimitives repeatedly for each trile in the world and hope for the best. It was unplayable, because as it turns out there is considerable overhead to draw calls and doing fewer, bigger draw calls is the key to 3D performance.

Each level has a trile set that contains a restricted number of different trile templates, and the level indexes integer 3D grid positions with elements of that trile set. Trile instances have a very limited set of properties to themselves (like rotation and offset) but every instance of a template trile shares geometry, texture and collision information. So I felt that geometry instancing was the way to go for batch rendering.

Different hardware supports different flavours of instancing but shader instancing is the common baseline for all Shader Model 2 and above GPUs, so I went with that. My current implementation of SM2 instancing supports 237 instances per batch, and the instance information is stored as vertex shader constants. The number will probably go down if I need to add more information to individual instances, it’s really minimalistic right now. But I found instancing to provide excellent performance on older hardware, current-gen GPUs and consoles, and not too hard to get to work well.

 

That’s it! Hope you enjoyed the tour. :)
Any questions?

14 responses so far

Apr 18 2009

A Fast 2D Texture ContentImporter using DevIL.NET

Published by Renaud Bédard under XNA

As my XNA project grows larger and larger in terms of content, I try to isolate the bottlenecks of content compilation. Because even if theoretically all content builds are incremental, everytime you play with your content pipeline assemblies, everything needs to be rebuilt… and that gets seriously annoying if your build takes more than 5 minutes.

I noticed that the built-in TextureImporter isn’t exactly fast, even for very small textures (under 256×256). Looks like it uses a lot of “legacy” Direct3D code that takes a while to initialize and spends more time doing that than processing the texture. So I tried making my own simplistic TextureImporter to handle 2D textures with standard RGBA colors, but fast.

I started with the built-in System.Drawing.Bitmap class, but it doesn’t support a very wide array of image formats. I then tried using the managed wrapper for FreeImage and Tao.DevIl, both seemed a little backwards or clunky for the small amount of work I needed it to do. Then I gave DevIL.NET a shot and it does exactly what I want : load any image, get a GDI+ Bitmap object. Perfect!

Here’s the code of my FastTextureImporter, which only supports 2D textures, only parses the first mip level, and assumes a RGBA format, which should actually be fine for most textures. It supports the DDS format, which allows much more format options, but should not be used on 3D or Cube textures, and would also lose DXT compression.

using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Xna.Framework.Content.Pipeline;
using Microsoft.Xna.Framework.Content.Pipeline.Graphics;
using Color=Microsoft.Xna.Framework.Graphics.Color;
 
namespace MyContentPipeline.Importers
{
[ContentImporter(".bmp", ".cut", ".dcx", ".dds", ".ico", ".gif", ".jpg", ".lbm", ".lif", ".mdl", ".pcd", ".pcx", ".pic", ".png", ".pnm", ".psd", ".psp", ".raw", ".sgi", ".tga", ".tif", ".wal", ".act", ".pal", 
    DisplayName = "DevIL.NET Texture Importer", DefaultProcessor = "TextureProcessor")]
public class FastTextureImporter : ContentImporter<Texture2DContent>
{
    public override Texture2DContent Import(string filename, ContentImporterContext context)
    {
        var content = new Texture2DContent
        {
            Identity = new ContentIdentity(new FileInfo(filename).FullName, "DevIL.NET Texture Importer")
        };
 
        using (var bitmap = DevIL.DevIL.LoadBitmap(filename))
        {
            var bitmapData = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, bitmap.Height),
                                             ImageLockMode.ReadOnly, bitmap.PixelFormat);
 
            int byteCount = bitmapData.Stride * bitmap.Height;
            var bitmapBytes = new byte[byteCount];
            Marshal.Copy(bitmapData.Scan0, bitmapBytes, 0, byteCount);
 
            bitmap.UnlockBits(bitmapData);
 
            var bitmapContent = new PixelBitmapContent<Color>(bitmap.Width, bitmap.Height);
            bitmapContent.SetPixelData(bitmapBytes);
            content.Mipmaps.Add(bitmapContent);
        }
 
        content.Validate();
        return content;
    }
}
}

I tested it with all the 2D textures in my project, and the build speed difference is really noticeable. It also accepted all my textures, although I don’t use DXT compression nor exotic formats like A8/L8, so I can’t guarantee that it works for everything.

This class should go in a ContentPipelineExtension project which has a reference to the DevIL.NET2 assembly, and the DevIL.dll file in its build path. You can get those on DevIL.NET’s website; even though the last release dates from 2007, it works great.

One response so far

Apr 07 2009

Things you should know before/while making an Xbox XNA game

Published by Renaud Bédard under C#, XNA

So I started toying around (read: developing full-time) with Xbox programming using XNA GS 3.0. In fact I took a big Windows Game project with many satellite Game Libraries, a Content Pipeline Extension, a content editor, an automatic serialization library… and “ported” most of it to Xbox. But since the editor will remain Windows-based, the engine and most of the code needs to stay cross-platform, compatible with both Windows and Xbox.

And I hit a few walls.

I feel like these are things many people working with XNA will encounter. XNA’s been around for a while now, since version 1.0. Many of these things are already widely discussed in the blogosphere and forums. Also, I’m aware that GS 3.1 is around the corner and it’ll address at least the first point of my rant…

Still, here’s a handful of things that surprised or annoyed me in the transition :

1. ContentTypeReaders need to stay out of Content Pipeline Extension project(s)

Suppose you have a pretty big project that has custom datatypes, and those datatypes are compiled to XNB files using custom ContentTypeWriters and then read back using ContentTypeReaders. You usually need a Content Pipeline Extension project for that, and this project would reference your Engine or whatever project owns the datatypes that you want to compile.

Before very recently, I never quite understood why all official samples had the Reader classes in the Game project, while the Processors, Writers and Importers were all in the Content Pipeline Extension project. Why decouple it like that, and why join them with a fully-qualified assembly string in the GetRuntimeReader method of Writer classes? Moreover, putting Readers in my Extension project always worked in my Windows-only solutions, and it all felt nice and clean.

But when doing everything for Xbox and Windows, the reason becomes clear…

The Content Pipeline Extension project is a standard C# project in XNA GS 3.0. Not a “Game Library” project or any other special container. This means that it won’t be duplicated if you do an Xbox version, and it makes sense; you only need to compile content on your Windows machine.

So your Content Pipeline Extension needs to have a reference to your content datatypes, in some Game Library project. Since the Extension is for Windows only, it’d reference the Windows version of that Game Library. And then if your Readers are in the Extension, your Xbox game needs to reference it to load assets… which means the Xbox and Windows versions of your data structure Library project would coexist on the Xbox. This can’t work!

Besides, the Xbox project doesn’t need to access Processors, Importers and Writers. All it needs to be able to read content and then use it. These other content pipeline classes may even use Windows-specific assemblies like GDI+, why not? They’re certainly useful for image processing.

So bottom line, keep Readers in your Game project if it’s a small project, or in a Game Library for both Xbox and Windows. And if your Content Pipeline can reference this Readers-container, then no need for a hardcoded String for GetRuntimeReader, you can just get the assembly-qualified-name from Reflection classes!

2. The Xbox doesn’t like garbage

The first thing I noticed after I got the game running were hiccups in the framerate, every two seconds or so. But the framerate apart from that was a constant 60. I half-expected this,… it’s the garbage collection.

This paper by three people at the FZI Research Center for Information Technology explains it much better than I can, so I’ll just quote them… :

The .NET Framework on PC uses a generational approach, making garbage collections less painful and more resilient to large numbers of objects. With 512 MiB of GDDR3 memory shared between GPU and CPU the Xbox 360 garbage collector can’t afford such luxury.
The garbage collection of the .NET Compact Framework for the Xbox 360 always has to check all objects. Therefore the time a collection takes increases linearly with the number of objects. Further a collection will be triggered whenever 1 MiB of memory has been allocated.

This means you really need to stop carelessly allocating to the heap when doing an Xbox game. There are several “known causes” of heap garbage with the XNA Framework, but it’s easy to start going on a witch-hunt and replacing all foreach(in) statements by plain for(;;) or stuff like that… It’s a much better idea to find out what are the bottlenecks in your application and fix them starting by the bigger ones. You’ll probably end up solving most of the jittering without making your code look like C.

The above paper presents some options for memory profiling like the CLR Profiler and XNA Framework Remote Performance Monitor, both of which I have yet to try, but sounds like excellent free tools to address this issue.

Update : See this post for more information on typical causes of heap garbage.

3. The Compact .NET Framework needs your help

The Xbox .NET implementation is not the full-blown framework, it’s based on a subset called the Compact Framework, which is also used on mobile devices and embedded systems. This comes at a small cost : you have to complete it to fit your needs.

It’s actually pretty cool that LINQ is supported and all 3.0 features work flawlessly. But here’s a short list (off the top of my head) of things I found missing, some important, some easily worked around, all of them at least mildly annoying… :

  • Enum.GetValues(), Enum.GetNames(), Enum.GetName() : These are all missing from the CF. There is an old thread on the XNA forums that proposes alternatives that use Reflection. I found them to be working great.
  • ParameterizedThreadStart : You can’t start a Thread with a context object in the CF. You then need a shared context object in the parent class.
  • Type.GetInterface(string, [bool]) : You can’t query the interface of a type via reflection in the CF… at least not a single one by name. The GetInterfaces() method is supported, so might as well just use that.
  • Math.Log(double, double) : Actually, there is a Log(double) function, but it’s with the natural base. The custom-base one is not supported. Seriously? (and I know, the workaround is a one-liner)
  • HashSet : I love the .NET 3.5 HashSet generic class. It’s really complete, super fast… but the CF doesn’t have it. I ended up faking one with a Dictionary as a backing collection, and rewriting the set operators (UnionWith, IntersectWith, etc.) that I really used.

4. You’ll pretty much need a Content project

I don’t like the fact that in a standard XNA project, the content is compiled at build-time, in the Visual Studio IDE. For many reasons… one of them being that I’m not the one producing the content, the artist does, and he certainly doesn’t want to have Visual Studio installed. Another one being that my Content Processors are super heavy and VS sometimes crashes with a OutOfMemoryError before the build completes.

So what I did is write a content compiling tool that uses the MSBuild API to generate something like the .contentproj (yet simpler) based on the filesystem automatically, and compile it externally without needing Visual Studio. This works really great for Windows, we’ve been using it for months now.

But for Xbox… the deployment process is also tied to Visual Studio. And I’m not expert enough at MSBuild technologies to be able to replicate deployment outside of it. So I ended up making a content project only for the Xbox version of my Game project, and compile it separately when I need to test on XNA Game Studio Connect. This works OK, but I’m still a little unhappy about this whole Visual Studio dependency. I hope they look into it properly in the future.

So I was under the impression that you needed to have a content project for Xbox deployment, but Leaf (first comment) pointed out two ways of handling content compilation outside of Visual Studio. I’m definitely going to use the first one!

And…

These are the big points for now. Stuff I thought about adding : RenderTargets act different on Xbox and PC (but Shawn Hargreaves already blogged extensively on the subject and it’s way better than it used to be), Edit-And-Continue is not supported when debugging on the Xbox (but that would’ve been asking for the moon!), you get different warnings for shader compilation when targeting the Xbox360 platform so you should pay attention to that, etc. etc.

I’m still very much halfway through the conversion process, and I’m still learning, and still discovering oddities. If you have advice or corrections, please let me know through comments! On my part I’ll keep this post updated if I hit another big wall.

3 responses so far

Mar 29 2009

Fez Trailer Zwei

Published by Renaud Bédard under Fez

Here’s the new trailer we (Polytron) released last week at GDC, in case you missed it!

Fun fact about the trailer : it’s all realtime. The mini-levels all link to themselves, but to get maximum smoothness and music sync, we recorded the parts separately and edited them back together.

No responses yet

Mar 15 2009

Last.fm Scrobble Fetcher & Mapper

Published by Renaud Bédard under C#

And now for something completely different!

Downloads

Source & Binaries : scrobblemapper_v1_src.zip (338 kb, C# 3.5, VS.NET 2008 Solution & Project)
Binaries Only : scrobblemapper_v1_bin.zip (266 kb, .NET 3.5 SP1 Framework needed)

Description

The Last.fm Scrobble Fetcher & Mapper does exactly that. It fetches all your scrobbles from your (or anyone’s, really) Last.fm account, assembles them in terms of Play Count and Date Last Played for each music track, and then exports this data to either Windows Media Player 11 or iTunes.

I’ve been working on this application on and off for some time now and I think it’s ready for deployment. I built it because I have quite an extensive music library, I like my playlists smart and automatic, and I have poor luck with music players, hardware failure and vendor lock-in.

It seems that the “Play Count” and “Date Last Played” metadata fields are not part of a file’s ID3 tags, but rather is stored in the player’s local library. This is usually fine, and probably more efficient than writing to files all the time, but it means that if your player’s database gets corrupted (as happened to me with Windows Media Player) or that you decide/are forced to use another player such as iTunes because your iPhone doesn’t want to sync with anything else, then you’re screwed. And same thing of course if you reinstall your machine or get a new one.

I feel that this metadata is important because I like to have automatic “best-of” playlists that are based upon it, and sometimes it’s nice to listen to a comfortable, time-proven playlist.

Thankfully, if like me you’ve been using the Last.fm services since they were called Audioscrobbler, you’ve gathered an impressive amount of playback information in your account over the years. And using their web services, and my application, now you can take it back home!

Technology

I wanted this application to be my cleanest, most error-tolerant and most efficient piece of work yet in application design. I also tried to exploit all C# 3.5 features, having accumulated some months of experience with LINQ, lambda expressions, etc.

Here’s a rundown of its tech features :

  • Multi-threaded

    • Uses the Parallel Extensions for .NET June 2008 CTP and a little home-made framework for progress-reporting asynchronous foreground tasks.
    • Most operations will use multiple cores thanks to the Task Parallel Library’s capabilities, and will seldom lock because of its lockless concurrent data structures.
  • WCF-enabled

    • Uses the Windows Communication Foundation classes in .NET 3.5 to communicate with Last.fm’s RESTful API.
    • All the response data objects are deserialized automatically from XML using the built-in XmlSerializer.
  • Error tolerant

    • Since Last.fm track data will not always match your library’s perfectly, I used the Levenshtein distance string metric and “neutralized” strings (removes all accentuated characters, symbols, whitespace, etc.) to get accurate matches.
    • Will (should!) recover gracefully from errors, both caused by user input or unexpected conditions. It also allows graceful cancelation of long-running operations.

Screenshots

scrobblemapper9 scrobblemapper8scrobblemapper5 scrobblemapper3

Closing notes

- This code uses a Community Technical Preview version of the Parallel Extensions for .NET, and one that is almost a year old… So it’s delivered as-is, and you cannot (and should not) use that library in commercial products. Although I have functionally tested the program pretty extensively, I cannot guarantee it’s not going to corrupt your library or overwrite some information. Use at your own risk!

- I noticed that iTunes checks if the file is writable before setting any metadata concerning it. So you should make your files writables unless you’ll get a ton of errors in the error log after the scrobble-mapping.

- You need to keep the iTunes instance running for the iTunes mapping to work. The COM interface requires it to be open, but you can minimize it to the tray.

- The code is released under the Creative Commons Attribution-Share Alike 2.5 license.

If you encounter anything unusual, if you have ideas for extensions to this program or have comments/questions about how it works, do ask! That said I will probably blog some more about parts of this program, like the WCF usage or the asynchronous task classes. Enjoy!

19 responses so far

Next »