Meditations – April 8th 2019

In March 2018, I attended Train Jam and chatted there with Rami Ismail about a project he was starting up : contemplative, 5-minute games released day-by-day. I thought it was neat, but I didn’t commit to anything since my time for side-projects is super limited.

In late November 2018, I was asked by Jupiter Hadley whether I’d like to contribute to a super secret project she and others were working on… I asked about details, and realized it’s the same thing Rami pitched me months earlier : Meditations! Jupiter explained that I should spend no more than 6 hours on the project, and submit it before December 14th 2018. The timing worked out, so I decided to give it a go.

I picked April 9th as the date (it was moved back 1 day to April 8th after the fact), and started browsing my Google Photos to see what the heck happened on an April 9th in my life… 2013 jumped out as a memorable one. My then-girlfriend (now wife) MC and I went to Nara, Japan where we met bowing deer and strolled the city on a beautiful sunny day. The cherry trees were at the peak of their flowering time, and they shed pink petals everywhere.

Yours truly, making friends with the locals

So I figured I’d do a little vignette game with a tree, falling petals, and heavy emphasis on the soundscape. I wanted it to feel vaguely like Proteus, with a Boards of Canada kind of musical vibe. Other stylistic references I had in mind : Devine Lu Linvega‘s Kanikule, and @ktch0‘s Places.

I asked my friend and colleague Rodrigo Rubilar whether he’d like to do sound design and music. I originally planned to do it myself with my limited audio gear, but I was excited to work with him on a small project like this, and he’d produce quality results 10 times faster than I could! We jammed an afternoon on an ambient track with a few layers, and one-shot sounds for the leaves hitting the ground.

https://twitter.com/renaudbedard/status/1073678256304316416
Rodrigo’s home studio is a goldmine of analog audio gear

At the same time I started hacking at a fractal tree generator in Unity. It’s actually the first time I’d tried to write one, so it was interesting to see how the simplest algorithm could apply to 3D, with some randomness injected into it. I exposed a ton of parameters, and played with them until I was happy with the result.

Fractal tree generation in action

Then I started thinking about the petals, and how I wanted them to move in the air. I started with a basic Unity particle system and wind zones, but I suck at authoring those and I could never get the look I wanted. So I resorted to manually updating the petals with simple physics. But then I wanted a lot of petals… thousands of them. So I started toying around with Unity 2018’s job system.

Optimization ended up being the most interesting engineering aspect of the project for me, and the biggest time-sink. There is no way I respected the development time limit of 6 hours (probably spent more than 40 hours on it), but it was hard to let go. So I’d like to at least document what I ended up doing and why.

If you just want to look at the code, it’s all here.

Jobified update

I ended up with a single job that does most petal-related work :

  • Update physics based on wind parameters
  • Detect ground collisions
  • Detect player interactions (you can raise petals off the ground by walking near them)

I want to update all the active petals every frame, so I schedule the job in the earliest Update callback (you can tweak that using Script Execution Order), and complete it in LateUpdate.

I wanted petals hitting the ground to trigger sounds, so the naive way of making this happen in Unity (and my initial approach) is to have one GameObject per petal, with an AudioSource component. The AudioSources don’t have to be enabled all the time, only when they’re playing a sound, but that does mean that I have to update their Transform so that the sound plays at the right 3D position.

It’s possible to access and update Transforms from within a job by implementing IJobParallelForTransform and passing the transforms to the job using a TransformAccessArray. A major gotcha with the current version of the job system is that only root Transforms will be assigned to different threads. I originally had petals as children of an empty game object just for grouping, but then my job was using a single worker thread. Reading this forum post explained it, but it’s still baffling to me, and makes for a pretty hideous Hierarchy tab.

I ended up not using Transforms for petals at all, but more on this later.

One thing that I found interesting is how one returns data from the job to the game code. For instance, I want to know which petals should trigger a sound by returning a list of petal indices that touched the ground. You can declare a field in the job as NativeQueue<int>.Concurrent (which is part of the Collections package), and enqueue data as the job runs on multiple threads. At the call site, you create a regular NativeQueue<int>, but when you pass it to the job instance, you need to call its .ToConcurrent() method to make it usable. Then, once the job is complete, you can use .TryDequeue() to extract elements from it. Works great!
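To make the pattern concrete without requiring Unity, here is a minimal sketch in plain .NET. It is an analogue, not the job system's API: ConcurrentQueue<int> stands in for NativeQueue<int>.Concurrent, Parallel.For stands in for the worker threads, and the GroundHits class, the heights array and the groundY parameter are all made up for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class GroundHits
{
    // Returns the indices of every petal at or below groundY.
    // Parallel.For plays the role of the job's worker threads here.
    public static List<int> Collect(float[] heights, float groundY)
    {
        var hits = new ConcurrentQueue<int>();

        Parallel.For(0, heights.Length, i =>
        {
            if (heights[i] <= groundY)
                hits.Enqueue(i); // safe to call from several threads at once
        });

        // Parallel.For has returned, so all workers are done: drain the queue.
        var result = new List<int>();
        int index;
        while (hits.TryDequeue(out index))
            result.Add(index);

        // Enqueue order depends on thread scheduling, so sort for a stable result.
        result.Sort();
        return result;
    }
}
```

The shape is the same as in the job version: workers only ever enqueue, and the calling code only dequeues after the parallel work has completed, so no extra locking is needed on either side.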

I enabled Burst on my job very naively, and spent zero time optimizing the generated code. I did take care of the basics : make sure to mark job fields appropriately as [ReadOnly] and [WriteOnly], use the SIMD-capable Unity.Mathematics package whenever possible. In the end, the job runs way faster than I need, so I just moved on.

Burst had a dramatic effect on the job’s performance. Here’s a profile of the same code running on the same system, with Burst compilation toggled off and on. The biggest noticeable difference is that LateUpdate does not need to wait on jobs at all when they are Burst-enabled, which means it runs fully in parallel!

Longest job without Burst enabled : 1.70ms
Longest job with Burst enabled : 0.05ms

Rendering

Note : even though I’m using Unity 2018.3, I did not use the Scriptable Render Pipeline in this project, so the following probably only applies for the legacy rendering pipeline.

I initially had every petal GameObject with a MeshRenderer, a simple quad mesh, and the Unity Standard Shader with instancing turned on. It definitely worked, but Unity spent a lot of time in culling and batching, and I thought I could do better… by short-circuiting it completely.

It might sound like rendering everything all the time is a bad call, but in this game, the worst-case scenario of all petals being visible at once is the very first thing you see. Might as well make it a fast general case!

One way to draw objects manually is to use Command Buffers. By specifying a mesh, a material and a bunch of transformation matrices, you can use the DrawMeshInstanced() method to draw up to 1022 (that number I got empirically, but it appears to be undocumented) instances at once, in a single call. There is no renderer on the game objects themselves, so Unity acts as if they don’t exist, apart from issuing draw commands.

My petal update job’s final step is then to bake the petal’s current position, rotation and scale into a 4×4 transformation matrix, so that I can use them in my command buffers later. But there it gets annoying :

  • DrawMeshInstanced() takes a Matrix4x4[], nothing else, so we’ll have to copy our NativeArray to a managed array.
  • We have to respect the batch size of 1022, so we have to use NativeSlice to take 1022-element slices of our NativeArray.
  • NativeSlice.CopyTo() copies element by element, and it’s very slow.

I found a faster CopyTo implementation on the Unity forums which I slightly modified, and it does perform much better, but it’s a lot of moving data around for no reason. I wish that NativeSlices were directly usable with command buffers.
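The batching itself is simple enough to sketch in plain C#. This is not the CopyToFast() helper from the forums; it uses Array.Copy on managed arrays, and the InstanceBatcher class and its draw callback are hypothetical names. In the real project the callback would wrap a CommandBuffer.DrawMeshInstanced() call taking the buffer and the element count.

```csharp
using System;

public static class InstanceBatcher
{
    // Empirical per-call instance limit quoted in the text above.
    public const int MaxBatchSize = 1022;

    // Copies source into buffer one chunk at a time and calls draw(buffer, count)
    // for each chunk. Returns the number of batches issued.
    public static int ForEachBatch<T>(T[] source, T[] buffer, Action<T[], int> draw)
    {
        if (buffer.Length < MaxBatchSize)
            throw new ArgumentException("buffer must hold at least MaxBatchSize elements");

        int batches = 0;
        for (int offset = 0; offset < source.Length; offset += MaxBatchSize)
        {
            // The last batch is usually smaller than the others.
            int count = Math.Min(MaxBatchSize, source.Length - offset);
            Array.Copy(source, offset, buffer, 0, count);
            draw(buffer, count);
            batches++;
        }
        return batches;
    }
}
```

Reusing a single managed buffer for every batch avoids per-frame allocations; only the element count changes from one call to the next.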

I end up with two very similar command buffers : one for depth rendering inside shadow-maps and for the camera’s depth pass, and one for regular opaque rendering.
The depth rendering command buffer uses the ShadowCaster pass from the Standard shader, and hooks to the AfterShadowMapPass event on lights as well as the AfterDepthTexture event on the camera. I needed the camera depth pass for SSAO using Keijiro Takahashi’s MiniEngineAO Unity implementation.

Triggering audio

As I hinted at earlier, I ended up ditching GameObjects for petals entirely. The only reason I originally kept them around (and updated at all times) was for audio, but it’s a lot of data shuffling just for sporadic sound triggers, so I opted for a simple audio trigger object pool instead.

I pre-initialize 1500 (again, empirical) dummy GameObjects with a disabled AudioSource component that lie dormant in the pool until an audio event happens. At that point, I take one from the pool, enable audio, position it where the impact happened, and play the audio clip. There is no update cost to having them around otherwise, and they are disabled & returned to the pool when the sound playback is complete.
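A pool like this boils down to a stack of idle objects. The sketch below is a generic, Unity-free version under my own assumptions: TriggerPool is a hypothetical name, and in the game T would be whatever wraps the dummy GameObject and its AudioSource.

```csharp
using System;
using System.Collections.Generic;

public sealed class TriggerPool<T>
{
    readonly Stack<T> idle = new Stack<T>();

    // Creates every pooled object up-front, so nothing is allocated at play time.
    public TriggerPool(int capacity, Func<T> create)
    {
        for (int i = 0; i < capacity; i++)
            idle.Push(create());
    }

    public int IdleCount { get { return idle.Count; } }

    // Returns false when every trigger is busy; the caller then skips the sound.
    public bool TryTake(out T item)
    {
        if (idle.Count == 0)
        {
            item = default(T);
            return false;
        }
        item = idle.Pop();
        return true;
    }

    // Call this once playback is complete and the object has been disabled again.
    public void Return(T item)
    {
        idle.Push(item);
    }
}
```

Dropping a sound when the pool is exhausted is one possible policy; with thousands of petals landing at once, a few missing one-shots are unlikely to be noticed.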

Petals are then strictly represented by a data structure that lives in a NativeArray, not by a GameObject. And the update job becomes an IJobParallelFor, mutating elements of that array. Neat!

You may wonder how dramatic a performance difference this makes. So here’s a comparison, taken from the same MacBook, of the CPU time spent only scheduling jobs between the version using Transforms and the version without. Something about how Unity handles transform access from a job definitely incurs sizeable overhead.

Using Transforms (1.79ms spent scheduling jobs)
Without Transforms (0.03ms spent scheduling jobs)

Bending the tree

When I had the petals mostly working well, it still felt strange that the tree didn’t animate at all. I wanted branches to sway in the wind, and I figured I could make that happen easily in its vertex shader.
It turned out to be a lot trickier than I expected, for a few reasons :

  • The tree is generated as a single baked mesh, there is (ironically) no tree structure to it. No great reason for this apart from lack of foresight.
  • The petals aren’t really attached to branches, their position is just set to the endpoint of branches when generating the tree.

To make the smaller branches bend but not the trunk, I set a bendability parameter during generation (how far away from the trunk that vertex is) in the texture coordinates.

A visualization of the bendability parameter for a generation

I then define a bend axis perpendicular to the wind direction, a bend angle dependent on wind force, and a shake factor which is just a bunch of low-frequency and high-frequency sine functions multiplied together, to make it feel a bit more noisy & organic.
These variables are mapped to shader globals, and I used keijiro’s angle-axis to float3x3 HLSL function to apply this in the vertex shader of my tree. So far so good!
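Here is a rough CPU-side sketch of how those three values could be computed, in plain C# with System.Numerics. Everything specific in it is an assumption for illustration: the TreeBend class, the frequencies, the maximum angle and the shake amount are made-up values, not the ones used in the game, and the wind direction is assumed to be horizontal (a vertical wind would give a degenerate axis).

```csharp
using System;
using System.Numerics;

public static class TreeBend
{
    // Axis to rotate around: horizontal and perpendicular to the wind direction.
    public static Vector3 BendAxis(Vector3 windDirection)
    {
        return Vector3.Normalize(Vector3.Cross(Vector3.UnitY, windDirection));
    }

    // One slow and one fast sine multiplied together; always within [-1, 1].
    public static float Shake(float time)
    {
        const float lowFrequency = 0.7f;  // assumed value
        const float highFrequency = 9.0f; // assumed value
        return (float)(Math.Sin(time * lowFrequency) * Math.Sin(time * highFrequency));
    }

    // Angle in radians; bendability is 0 at the trunk and grows towards the branch tips.
    public static float BendAngle(float windForce, float time, float bendability)
    {
        const float maxAngle = 0.35f;    // assumed value
        const float shakeAmount = 0.25f; // assumed value
        return bendability * windForce * maxAngle * (1f + shakeAmount * Shake(time));
    }
}
```

The axis and angle would then be uploaded as shader globals and turned into a rotation matrix in the vertex shader, which is the part the angle-axis function handles.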

Where it got tricky is for petals. I need to differentiate when they’re attached or detached, so I can apply the wind bending selectively. I ended up using a MaterialPropertyBlock to have per-instance data (a single float) encoding this state.

In the vertex shader (which is part of a simple Surface Shader), it took me a minute to understand how to do world-space transformations on vertex data. The simplest way I found was to transform the vertex to world-space, and take it back in object-space when I’m done. Not super efficient… ¯\_(ツ)_/¯

The full vertex shader for petals, color-coded by the lovely Rider

On the application-side, I did use a NativeArray<float> for instance data, but I’m not sure it’s the easiest way to go about it. I don’t actually read from or write to it from within my update job, it’s all happening in the regular Update(). It allowed me, however, to re-use Slice() and CopyToFast() to make 1022-sized chunks for rendering.
One thing to note : unlike with the draw arrays, MaterialPropertyBlocks cannot be reused for multiple batches, so I create all the ones I need up-front and iterate through them.

Another thing I realized is that I need to apply the exact same function I used in the vertex shader to my petals the moment they are detached from the tree. This effectively bakes the tree’s bend at that moment in time into their transform, and avoids them visually warping around as they are detached.

In the end it still looks kinda stiff, but it’s better than the cement tree I started with.

Closing thoughts

I’m super happy with the final result. It’s relaxing, soft, yet sometimes surprisingly intense. It ended up feeling a lot more mournful than I expected, but I’m fine with that. I really enjoy the visual transition from a mess of pink sheets of paper to a menacing, pointy web.
But since everything was done in a rush, there are some things I wish I’d done differently.

In February 2019 I stumbled upon this rather excellent tree by Joe Russ, developer on Jenny LeClue :

https://twitter.com/Mografi_Joe/status/1100507496572223489

It made me realize that my tree shape and animation could be so much better. Having the branch splits be less even, more jagged and well, tree-like, would add a lot. And by having the branches hierarchically laid out, I could have bent the sub-trees and produced a much more convincing wind swaying effect.

Spending so much time on performance optimization made me wonder if doing a whole forest of trees, instead of a single one, would be possible at all. Wouldn’t that be neat? But it brought additional headaches that I didn’t want to deal with.

But there’s something to be said about making a small thing, with a tiny scope, a reasonable amount of polish, and putting it out there. Just the fact that I was excited enough about making this that I convinced Rodrigo to tag along made it worthwhile.
So big thanks to Rami, Jupiter and everyone involved in the Meditations project for making this happen. And thanks as always to my wife MC for humoring me while I spent precious evening & weekend time endlessly debugging my stupid tree. ♡

Addendum : So what about ECS?

You might be wondering why I went straight with the Job System but didn’t use Entities, since they were showcased in the Megacity demo and Unity is really selling it as The Way to do massive amounts of objects with acceptable performance.
I don’t really have a good answer to this, except that I was short on time and was familiar with the idea of data parallelism, but not so much with ECS… and didn’t want to spend time learning how to use ECS properly. And I’m stubborn and will go down a path I’ve set for myself even if it’s a bad idea.

There’s a good chance that using ECS is the best way to do what I was trying to achieve, but I haven’t gone down that path yet. It would be interesting to revisit the project and do it that way; maybe I will!
I’m also curious to hear from people who have tried Unity’s ECS. Would it really make this easier to design? Would the rendering portion be simplified? Let me know!

Malisse

Malisse Logo

Malisse is a game I made at TOJam “Party like it’s 19TOJam9” in 2014 with Devine Lu Linvega, Rekka B., Dom2D and technobeanie as Les Collégiennes, with sound effects by dualryan.

The game is playable in the web player at its itch.io home : http://renaudbedard.itch.io/malisse

It uses Unity and the whole source code and assets are hosted in a public repository on GitHub, and released under a Creative Commons Attribution license.

Malisse screenshot

The gameplay is a two-player cooperative physics sandbox and puzzle game. The objective for both players is to clear a path for Malisse as she walks on a sinuous path in the world through the looking glass…

A bunch of rabbits trail behind her, but they get scared easily! Every time Malisse bumps into an object she cannot climb, a rabbit will run away, but you recover one rabbit for every level cleared. If all the rabbits are gone, Malisse ends up alone and … cries herself to death? That’s the end of that play session, anyway!
Otherwise, the levels are chosen randomly from a pool of 11 hastily-authored levels which vary in difficulty. If you get stuck early on your first attempt, definitely give it another shot since you might find more palatable challenges.

It was an interesting game jamming experience for me in many respects : first time with a 5-person team, first time implementing sprite animations in Unity (using 2D Toolkit), and first time writing a tool for use inside the Unity editor — a spline tool for drawing roads quickly. We also had interesting last-minute collision issues, since we wanted Malisse to be able to climb slopes but didn’t want to resort to rigid bodies to have that done automagically. Spherecasting to the rescue, and lots of debugging! ^_^

If you’re wondering, the music was made during the jam by Devine (aka Aliceffekt), based on “When Johnny Comes Marching Home”, because it just fits marching so damn well.

Other than the web player, you can download the OS X and Windows standalone builds.

Enjoy! Had lots of fun building it. Closing with an outstanding band shot of us by Myriame Pilgrim!

LES COLLÉGIENNES

Ogg streaming using OpenTK and NVorbis

August 18th, 2015 Update

This article could be an interesting reference for people trying to understand how you can submit your own buffers to do streaming audio with OpenAL, but the actual tools I’m using (NVorbis, OpenTK) are outdated and I can’t recommend them anymore.

If you’re looking for a modern C# way of doing the same thing, look at how the Song class is implemented with Ogg Vorbis support in Ethan Lee’s FNA library, using Xiph Vorbisfile and the DynamicSoundEffect API, especially if you’re trying to do this in a MonoGame- or XNA-like environment. It’s much faster, the codebase is cut by half, and there are far fewer threading pitfalls!

Original article follows…


Updated September 7th 2012 : New OggStream class with better support for concurrent stream playback.

I was looking for a suitable replacement for the audio streaming and compression capabilities of XACT when porting an XNA project to MonoGame, and it doesn’t look like there’s a clear winner yet. MonoGame contributors suggested NAudio, but it looks like work needs to be done to make it portable, and the sample code is a mess. FMod EX or competing commercial solutions are an easy but costly choice. So I turned to OpenAL to see if it can be a free and usable solution for streaming compressed audio with some DSP capabilities.
’Twas a bit challenging, but not impossible! :)

Decoding OGGs

Out of the box, OpenAL doesn’t support being fed MP3 or OGG sources. There are extensions for those, but according to one implementation, they’re deprecated. So you need to handle decoding yourself and feed the PCM bitstream to OpenAL.

It sure would be nice to have a purely managed implementation of libVorbis, but it doesn’t exist, so there’s a dozen homemade decoders floating around open source code hubs in various states of workability. I was pointed to NVorbis by TheGrandHero on the TIGSource forums, and I haven’t found a better alternative yet. CsVorbis is another, but it doesn’t support streaming, all the decoding is done up-front, which defeats the purpose. OggSharp is just a fork of CsVorbis with XNA helpers, so nope. TheGrandHero also mentioned trying out DragonOgg but having problems with it.

NVorbis worked like a charm for me, but it’s pretty early and doesn’t support some features like seeking around the stream, so looping or restarting playback requires creating a whole new reader/decoder. I also took some time to optimize the memory usage in my fork of the project.
07/09/2012 Update : Andrew Ward, the author of NVorbis, resolved the memory allocation problems that the version I forked off had, so I pulled the new changes out instead.

Streaming

Once you have some decoded data, you have to make OpenAL stream it. This is sort of tricky but well-documented.

(this image shamelessly stolen from Ben Britten’s blog entry linked above)

The basic idea is the following :

  • Generate one OpenAL source for your sound file, like XACT cues
  • Generate 2 or more OpenAL buffers
  • Fill at least one of those with the first samples of the sound and enqueue it/them to the source
  • Start playback of the source; it’ll play all the buffers associated with it, in order
  • In a background thread :
    • Query the source to know whether buffers have already been processed
    • If so, dequeue those buffers, refill them with fresh data and re-enqueue them

In practice, since it involves threads, it’s a bit more obtuse than the pseudo-code, but OpenAL makes it relatively painless. The trick is to read enough data and often enough to avoid buffer underruns.
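The buffer rotation is easier to follow with the threading and OpenAL calls taken out. The sketch below is a single-threaded simulation in plain C#, not the OggStream class: FakeSource is a made-up stand-in for an OpenAL source whose Tick() plays exactly one queued buffer, and each pass of the while loop stands in for one wake-up of the background thread.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for an OpenAL source: a FIFO of queued buffers, one played per Tick().
public sealed class FakeSource
{
    readonly Queue<ArraySegment<float>> queued = new Queue<ArraySegment<float>>();
    readonly Queue<float[]> processed = new Queue<float[]>();
    public readonly List<float> Played = new List<float>();

    public int ProcessedCount { get { return processed.Count; } }
    public void Enqueue(ArraySegment<float> data) { queued.Enqueue(data); }
    public float[] Unqueue() { return processed.Dequeue(); }

    // Plays one whole buffer; returns false when nothing is queued (underrun or end).
    public bool Tick()
    {
        if (queued.Count == 0) return false;
        var segment = queued.Dequeue();
        for (int i = 0; i < segment.Count; i++)
            Played.Add(segment.Array[segment.Offset + i]);
        processed.Enqueue(segment.Array); // this buffer can now be refilled
        return true;
    }
}

public static class Streamer
{
    public static List<float> Play(float[] sound, int bufferSize, int bufferCount)
    {
        var source = new FakeSource();
        int position = 0;

        // Fill and queue every buffer before playback starts.
        for (int i = 0; i < bufferCount; i++)
            position = Refill(source, new float[bufferSize], sound, position);

        while (source.Tick())
        {
            // Recycle processed buffers: refill with fresh samples and re-enqueue.
            while (source.ProcessedCount > 0)
                position = Refill(source, source.Unqueue(), sound, position);
        }
        return source.Played;
    }

    static int Refill(FakeSource source, float[] buffer, float[] sound, int position)
    {
        int count = Math.Min(buffer.Length, sound.Length - position);
        if (count <= 0) return position; // end of the sound: stop feeding
        Array.Copy(sound, position, buffer, 0, count);
        source.Enqueue(new ArraySegment<float>(buffer, 0, count));
        return position + count;
    }
}
```

In the real thing the source consumes buffers on its own clock, which is why the refill loop has to run often enough, and with large enough buffers, to stay ahead of playback.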

Then, if you want to loop the sound, it’s not as easy as setting the source’s “Looping” parameter to true, because the buffers never contain the full sound file. Instead of no longer feeding the buffers when you hit the end of the Ogg stream, you just start back at the beginning and feed continuously, which has the nice side-effect of being 100% gapless.
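As a sketch of that looping refill, here is a plain C# version that works on an in-memory float array instead of an Ogg decoder. LoopingReader is a hypothetical helper; with NVorbis as described above, the rewind step would mean creating a new reader rather than resetting an index.

```csharp
using System;

public static class LoopingReader
{
    // Fills the whole buffer, rewinding to the start of the sound each time its end
    // is reached. Returns the read position to use for the next call.
    public static int Fill(float[] sound, int position, float[] buffer)
    {
        if (sound.Length == 0)
            throw new ArgumentException("sound must contain at least one sample");

        int written = 0;
        while (written < buffer.Length)
        {
            if (position == sound.Length)
                position = 0; // end of the sound: start over instead of stopping

            int count = Math.Min(buffer.Length - written, sound.Length - position);
            Array.Copy(sound, position, buffer, written, count);
            written += count;
            position += count;
        }
        return position;
    }
}
```

Because the tail of the sound and the start of the next pass end up in the same buffer, the source never sees a boundary, which is where the gapless behaviour comes from.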

Filters and effects

Finally, I wanted to have one fancy effect that XACT provided : low-pass filtering. This is used extensively in FEZ as a gameplay mechanic, so I could hardly live without it in MonoGame ports.

Thankfully, OpenAL Effect Extensions (EFX) provide cross-platform effects including filters, at least in theory. In reality, this depends on whether the driver implementation supports them, and even the Creative reference Windows implementation doesn’t on my system.

I was able to find a software implementation that does though, OpenAL Soft, and it’s cross-platform, so that bodes well.
To override the installed implementation, just supply the software DLL in the application’s directory and voilà. Had no problems with it up to now, performance or otherwise.

Plus, it comes with a console application that outputs which EFX and other extensions are supported in this implementation. This is handy to detect whether the right DLL’s been used, and helped me figure out that the Creative implementation didn’t support any filter. Here’s what it should say :

Sample class

The result of all of this is an OggStream class that is in my fork of NVorbis on GitHub, which you can find here :

Update : Version 2.0 comes with a sample console application which allows you to test and visualize how different streams get buffered and when buffer underruns occur in a nice concise format. I’m really quite happy about it, give it a shot! Here’s how it looks :

Legend of the symbols that this app blurts out :

  • (* means synchronous buffering (Prepare()) has started, and ) means it ended.
  • . means that one buffer has been refilled with fresh samples
  • | means that there are no more samples to consume from the sound file
  • ! means that playback stopped because of a buffer underrun and had to be restarted
  • { and } represent calls to Start() and Stop()
  • [ and ] represent calls to Pause() and Resume()
  • L, F or f and V or v in prefix means respectively that the stream is looping, fading the low-pass filter in/out or fading volume in/out

My code has only been tested on .NET on Windows, but I don’t see why it wouldn’t work in Mono either.
Like all the unlicensed content on this blog, it’s public domain, but attribution is appreciated.

Cubes All The Way Down @ IGS (GDC)

This again?!

I re-did my slides and my talk for the Independent Games Summit of the GDC 2012. It grew from a measly 42 slides to a healthy 62, so there is more content, many more videos, and it incorporates some of the feedback I had about the MIGS version.
Update : it’s on the GDC Vault (no membership required!) if you want to see me give the presentation.

Without further ado, here are the slides in different formats :

It’s Cubes All The Way Down (PDF format)(PDF with Notes)(PPTX format)

And you can download the associated Videos and songs (179Mb!)

A Replacement for Coroutines in Unity + C#

Coroutines are a great idea and super useful, but they’re kind of unwieldy to use in C# and sometimes they just plain don’t work. Or I don’t know how to use them properly. All I know is that they’re more complicated than they need to be, and I remember having problems using them from an Update method.

So I made my own version of Coroutines inspired by the XNA WaitUntil stuff I posted about a long time ago. Here it is!

using System;
using UnityEngine;

// Lives on a throwaway "Waiter" game object and polls its condition every frame.
class ConditionalBehaviour : MonoBehaviour
{
    // Seconds elapsed since this component started updating.
    public float SinceAlive;

    public Action Action;
    public Condition Condition;

    void Update()
    {
        SinceAlive += Time.deltaTime;
        if (Condition(SinceAlive))
        {
            // The action is optional; the condition alone can do all the work.
            if (Action != null) Action();
            Destroy(gameObject);
            Action = null;
            Condition = null;
        }
    }
}

// Returns true once the wait is over; receives the elapsed time in seconds.
public delegate bool Condition(float elapsedSeconds);

public static class Wait
{
    // Evaluates condition every Update until it returns true, then runs action once.
    public static void Until(Condition condition, Action action)
    {
        var go = new GameObject("Waiter");
        var w = go.AddComponent<ConditionalBehaviour>();
        w.Condition = condition;
        w.Action = action;
    }

    // Same as above, without a completion action.
    public static void Until(Condition condition)
    {
        Until(condition, null);
    }
}

Here’s an example of use, straight out of the Volkenessen code (with special guest appearance from my ported easing functions) :

var initialOffset = new Vector3(hitDirection.x * -1, 0, 0);
var origin = armToUse.transform.localPosition;
armToUse.renderer.enabled = true;

Wait.Until(elapsed =>
{
    var step = Easing.EaseOut(1 - Mathf.Clamp01(elapsed / Cooldown), EasingType.Cubic);
    armToUse.transform.localPosition = origin + initialOffset * step;
    return step == 0;
},
() => { armToUse.renderer.enabled = false; });

What’s going on here :

  • You call Wait.Until as a static method and pass it one or two methods (be it lambdas or method references) : The first one is the Condition which gets evaluated every Update until it returns true, and the second gets evaluated when the condition is true (it’s a shorthand, basically)
  • The Wait static class instantiates a “Waiter” game object and hooks a custom script component to it that does the updating and checking stuff
  • The condition gets passed the number of seconds elapsed since the component was created, so you don’t have to keep track of it separately.

I use it for waiting for amounts of time (Wait.Until(elapsed => elapsed > 2, () => { /* Something */ })), interpolate values and do smooth transitions (like the code example above, I animate the player’s arm with it), etc.

I’ll probably keep updating my component as I need more things out of it, but up to now it’s served me well. Hope it helps you too!

Volkenessen – Global Game Jam 2012

Volkenessen is a game I made with Aliceffekt as Les Collégiennes on January 27-29 2012 as part of the 48h Global Game Jam. We actually slept and took the time to eat away from our computer, so based on my estimate we spent at most 30 hours making it!

It’s a two-player, physics-based 2D fighting game. Each player starts with 9 random attached items on his back, and the goal is to strip the other player of his items by beating the crap out of him. When items are removed, they clutter up the playing area, making it even more chaotic and hilarious. The washing machine and sink in the background can also fall and bounce around!

Controls

You need two gamepads to play; there is no keyboard control fallback (yet). So far the Xbox wired and wireless controllers and a generic Logitech gamepad have been tested and work (you can use the Tattiebogle driver to hook up an Xbox controller to a Mac). The controls are pretty exotic. To move around you can press either the D-Pad (or left analog stick) or the face buttons (A/B/X/Y); each face button acts like the D-Pad direction matching its position. As you move, your player will throw a punch, kick or flail his ears, which is what makes you move.

To hit the other player, you need to get close to him by hitting away from him, then hit him by moving away from him. Ramming into the opponent just doesn’t do it, you need to throw punches, and depending on the impact velocity, even that might not be enough. You can throw double-punches to make sure you land a solid hit and take off an item.

Development

It was made in Unity, with me on C# script and Aliceffekt on every asset including music and sound effects. I see it as one of our most successful jam games; it even won the judge award at our local GGJ space, and it was just so much fun to make, test and play.

I was surprised how well the rigid body physics worked out in the game. I had to use continuous physics on the players and tweak the gravity/mass to get the quick & reactive feel we wanted, but the game was basically playable 6 hours in! After that it was all tweaking the controls, adding visual feedback, determining the endgame condition and coercing the GGJ theme onto the game.

I’ll be porting the game to the Arcade Royale in the coming days/weeks, and it should be a blast to play on a real arcade machine :)

Downloads

Windows (32-bit)
Windows (64-bit)
Mac OS X (Universal) 

Enjoy!

Cubes All The Way Down @ MIGS

Back in November 2011, I gave a talk at the Montréal International Game Summit in the Technology track called “Cubes All The Way Down”, where I talked about how FEZ was built, what the big modules are, and the challenges and intricacies of making a tech-heavy indie game from scratch.

It went okay.
I was really stressed, a bit unprepared due to FEZ crunch time, and just generally uncomfortable speaking in front of an audience.
I spoke so fast that I finished 15 minutes early and had 30 minutes for questions, which worked great for me because the relaxed setting of a Q&A session meant better flow and better information delivery. I really liked that part. Also I had friends in the front row that kept asking good questions and were generally supportive, so all in all a good experience. :)

I was asked about giving the slides out, so here they are! Unedited.

It’s Cubes All The Way Down (Powerpoint 2007 PPTX format) (PDF format)

Enjoy!

Encoding boolean flags into a float in HLSL

(this applies to Shader Model 3 and lower)

Hey! I’m still alive!

So, imagine you’re writing a shader instancing shader (sounds redundant, but that’s actually what they are) and you’re trying to pack a lot of data into a float4 or a float4x4 in order to maximize the amount of instances you can render in a single draw call.

My instances had many boolean flags that changed per-instance and that defined how they were lit or rendered. Things like whether or not they are fullbright (100% emissive), texture transform flags (repeating on x or y, more efficient to rebuild the texture matrix than pass it), etc.
Using one float out of your instance data matrix for each boolean is doable, but highly wasteful. A natural way to fit in many flags into an integer is to use a bitfield, but there’s no integer arithmetic in HLSL, and they’re floating point values… how does one proceed?

Here’s how I did it.

Application side

First, this is how I pack my data into floats from the application side (setting the effect parameter) :

int flags = (fullbright ? 1 : 0) | 
	(clampTexture ? 2 : 0) | 
	(xTextureRepeat ? 4 : 0) | 
	(yTextureRepeat ? 8 : 0);

Geometry.Instances[InstanceIndex] = new Matrix(
	p.X, Rotation.X, Scale.X, color.X,
	p.Y, Rotation.Y, Scale.Y, color.Y,
	p.Z, Rotation.Z, Scale.Z, color.Z,
	Animated ? Timing.Step : 0, Rotation.W, flags, Opacity);

Just put an OR operator between the flags you want to set, and keep the flag values powers of two.
Ignore the rest of the matrix contents, they’re just here for show. (in my case : position, rotation, scale, color, opacity, animation frame and the flag collection).

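A note on floating point : in a single-precision floating point number as defined by the IEEE, you’ve got 23 explicitly stored bits for the significand (plus one implicit leading bit), so every integer up to 2^24 is represented exactly. That means you can theoretically put 23 flags in there, or 24 if you count the implicit bit! That’s a lot of data.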
(also, considering the decimal point is floating, you can effectively put much more than 23 bits if some of them are mutually exclusive…!)
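To make the packing and the precision claim concrete, here is a minimal sketch in Python (not the C# used by the application). The constant and function names are hypothetical, and `through_float32` only simulates storing the value in a single-precision matrix cell by round-tripping it through `struct`.

```python
import struct

# Hypothetical flag values mirroring the four flags above; each is a distinct power of two.
FULLBRIGHT, CLAMP_TEXTURE, X_TEXTURE_REPEAT, Y_TEXTURE_REPEAT = 1, 2, 4, 8

def pack_flags(fullbright, clamp_texture, x_repeat, y_repeat):
    """OR the set flags together into one integer."""
    return ((FULLBRIGHT if fullbright else 0)
            | (CLAMP_TEXTURE if clamp_texture else 0)
            | (X_TEXTURE_REPEAT if x_repeat else 0)
            | (Y_TEXTURE_REPEAT if y_repeat else 0))

def through_float32(value):
    """Round-trip a number through a 32-bit float, like storing it in a matrix cell."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]

flags = pack_flags(True, False, True, True)  # 1 | 4 | 8
stored = through_float32(flags)              # survives the trip unchanged

# 23 flags all set at once still fit exactly in a single-precision float.
all_23_flags = (1 << 23) - 1
```

Past 2^24 the round trip starts losing the low bits, which is why the flag count has to stay within the significand.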

Vertex shader

Now in the vertex shader, they get passed to an effect parameter through vertex shader constants, and here’s how the decoding works :

int flags = data[2][3];

bool fullbright = fmod(flags, 2) == 1;
bool clampTexture = fmod(flags, 4) >= 2;
bool xTextureRepeat = fmod(flags, 8) >= 4;
bool yTextureRepeat = fmod(flags, 16) >= 8;

I know my flags reside in the 3rd row, 4th column of my matrix, so I grab ’em from that. Might as well cast them to an integer right now since I won’t be using decimals.

Then I can test for each flag by looking at the remainder of the division by a power of two. There is no integer modulo intrinsic function in HLSL for Shader Model 3 and lower, but the floating-point version works fine.

If I divide a number by two, the remainder will be 1 exactly when its first (least significant) bit is set. Basically, we test if that number is odd or even; odd means the bit is set.

For every other test, we check whether the remainder is greater than or equal to half the divisor. Effectively, we’re masking out the bits above the one we’re testing, then checking whether what remains is large enough to contain the bit we’re looking for. Here’s the test for the 3rd bit (from the LSB), masking with 8 (1000 in binary) and testing against 4 (0100 in binary) :

0000 % 1000 = 0000 // 0 < 4, bit not set
0100 % 1000 = 0100 // 4 >= 4, bit set
1011 % 1000 = 0011 // 3 < 4, bit not set
1110 % 1000 = 0110 // 6 >= 4, bit set
1101 % 1000 = 0101 // 5 >= 4, bit set
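The same decoding can be checked outside of a shader. Below is a small Python sketch of the remainder test, using `math.fmod` as a stand-in for HLSL’s `fmod`; the helper names `has_flag` and `decode` are hypothetical and not part of the shader above.

```python
import math

def has_flag(flags, bit_value):
    """Test one flag using only a floating-point remainder and a comparison.

    bit_value is the flag's power of two (1, 2, 4, 8, ...).
    """
    return math.fmod(flags, bit_value * 2) >= bit_value

def decode(flags):
    """Decode the four flags from a packed value (may be a float)."""
    return {
        "fullbright": has_flag(flags, 1),
        "clampTexture": has_flag(flags, 2),
        "xTextureRepeat": has_flag(flags, 4),
        "yTextureRepeat": has_flag(flags, 8),
    }

# The five rows of the table above, testing the 3rd bit (value 4).
table = [has_flag(n, 4) for n in (0b0000, 0b0100, 0b1011, 0b1110, 0b1101)]
```

For the lowest bit, `fmod(flags, 2) >= 1` gives the same answer as the `== 1` test in the shader, since the remainder can only be 0 or 1 for whole numbers.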

Enjoy!

Last.fm Scrobble Fetcher & Mapper

Update May 15th 2016 : New v3.0 build on the GitHub project page with many fixes and additions.

Update August 26th 2010 : New v2.1 build on Google Code that fixes the “Invalid XML Characters” issue! Try that one if you had errors when fetching scrobbles.

Updated June 26th 2010 : This project is now hosted on Google Code! I’m new to this so bear with me, but I’ll try to make it nice so that people can contribute. (Google Code no longer exists and the project is now on GitHub)

Updated March 5th 2010 : You shouldn’t need to install both iTunes and WMP for it to work, just the one you want to use! Finally.

Original March 15th 2009 post follows.

Downloads

Get the freshest releases (source or binaries) on the GitHub project page.

Description

The Last.fm Scrobble Fetcher & Mapper does exactly that. It fetches all your scrobbles from your (or anyone’s, really) Last.fm account, assembles them in terms of Play Count and Date Last Played for each music track, and then exports this data to either Windows Media Player 11 or iTunes.
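As a rough illustration of the “assembling” step, here is a minimal Python sketch (the application itself is written in C#). It assumes each scrobble has already been fetched and reduced to an (artist, title, played-at) tuple; that shape, the `aggregate` name and the sample data are all made up for the example.

```python
from datetime import datetime

def aggregate(scrobbles):
    """Collapse scrobbles into {(artist, title): (play_count, last_played)}."""
    stats = {}
    for artist, title, played_at in scrobbles:
        key = (artist, title)
        count, last = stats.get(key, (0, played_at))
        # One more play, and keep the most recent timestamp seen so far.
        stats[key] = (count + 1, max(last, played_at))
    return stats

# Made-up sample data, not real scrobbles.
sample = [
    ("Artist A", "Track 1", datetime(2009, 1, 3, 20, 15)),
    ("Artist A", "Track 1", datetime(2009, 2, 10, 9, 0)),
    ("Artist B", "Track 2", datetime(2008, 12, 25, 18, 30)),
]
result = aggregate(sample)
```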

I’ve been working on this application on and off for some time now and I think it’s ready for deployment. I built it because I have quite an extensive music library, I like my playlists smart and automatic, and I have poor luck with music players, hardware failure and vendor lock-in.

It seems that the “Play Count” and “Date Last Played” metadata fields are not part of a file’s ID3 tags, but rather are stored in the player’s local library. This is usually fine, and probably more efficient than writing to files all the time, but it means that if your player’s database gets corrupted (as happened to me with Windows Media Player), or if you decide (or are forced) to use another player such as iTunes because your iPhone doesn’t want to sync with anything else, then you’re screwed. Same thing, of course, if you reinstall your machine or get a new one.

I feel that this metadata is important because I like to have automatic “best-of” playlists that are based upon it, and sometimes it’s nice to listen to a comfortable, time-proven playlist.

Thankfully, if like me you’ve been using the Last.fm services since they were called Audioscrobbler, you’ve gathered an impressive amount of playback information in your account over the years. And using their web services, and my application, now you can take it back home!

Technology

I wanted this application to be my cleanest, most error-tolerant and most efficient piece of work yet in application design. I also tried to exploit all C# 3.0 / .NET 3.5 features, having accumulated some months of experience with LINQ, lambda expressions, etc.

Here’s a rundown of its tech features :

  • Multi-threaded

    • Uses the Parallel Extensions for .NET June 2008 CTP and a little home-made framework for progress-reporting asynchronous foreground tasks.
    • Most operations will use multiple cores thanks to the Task Parallel Library’s capabilities, and will seldom lock because of its lockless concurrent data structures.
  • WCF-enabled

    • Uses the Windows Communication Foundation classes in .NET 3.5 to communicate with Last.fm’s RESTful API.
    • All the response data objects are deserialized automatically from XML using the built-in XmlSerializer.
  • Error tolerant

    • Since Last.fm track data will not always match your library’s perfectly, I used the Levenshtein distance string metric on “neutralized” strings (with all accented characters, symbols, whitespace, etc. removed) to get accurate matches.
    • Will (should!) recover gracefully from errors, whether caused by user input or by unexpected conditions. It also allows graceful cancelation of long-running operations.
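The fuzzy matching idea from the “Error tolerant” point can be sketched as follows, in Python rather than the application’s C#. This is only an approximation of the approach : the exact neutralization rules (here, accents folded to their base letter, non-alphanumeric characters dropped, and lowercasing, which is an added assumption) and the `max_distance` threshold are hypothetical.

```python
import unicodedata

def neutralize(text):
    """Fold accents, drop symbols and whitespace, and lowercase (an assumption)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining accent marks and punctuation are not alphanumeric, so they drop out.
    return "".join(ch for ch in decomposed if ch.isalnum()).lower()

def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def is_match(a, b, max_distance=2):
    """Treat two names as the same track if their neutralized forms are close."""
    return levenshtein(neutralize(a), neutralize(b)) <= max_distance
```

With these rules, “Déjà Vu (Live)” and “Deja Vu - Live” both neutralize to the same string and therefore match with a distance of zero.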

Screenshots

(Four screenshots : scrobblemapper9, scrobblemapper8, scrobblemapper5, scrobblemapper3)

Closing notes

– This code uses a Community Technology Preview version of the Parallel Extensions for .NET, and one that is almost a year old… So it’s delivered as-is, and you cannot (and should not) use that library in commercial products. Although I have functionally tested the program pretty extensively, I cannot guarantee it’s not going to corrupt your library or overwrite some information. Use at your own risk!

– I noticed that iTunes checks if the file is writable before setting any metadata concerning it. So you should make your files writable, otherwise you’ll get a ton of errors in the error log after the scrobble-mapping.

– You need to keep the iTunes instance running for the iTunes mapping to work. The COM interface requires it to be open, but you can minimize it to the tray.

– The code is released under the Creative Commons Attribution-Share Alike 2.5 license.

If you encounter anything unusual, if you have ideas for extensions to this program or have comments/questions about how it works, do ask! That said I will probably blog some more about parts of this program, like the WCF usage or the asynchronous task classes. Enjoy!

Common Kanji Character Ranges for XNA SpriteFont Rendering

Note : This sample is practically useless, because the XNA Localization sample has a much better alternative using the Content Pipeline and character detection from resource files, which works for any language (Chinese, Korean, Japanese…). But I guess if you wanted to get the ranges of common Kanji, here’s how.

While working on Japanese language support in XNA, I realized a couple of things about Japanese writing (some of which may seem obvious, but weren’t for me) :

  • There are two broad character sets : Kana (syllabic) and Kanji (logographic)
  • Kana has two modern subsystems or components, Hiragana and Katakana, each with its own Unicode region, of 92 and 95 different glyphs respectively (187 total)
  • Kanji originate from Chinese “Han” characters, and are stored within the CJK (Chinese, Japanese, Korean) portion of Unicode. But CJK characters don’t uniquely reference Japanese logographs, and that portion contains over 20000 glyphs!
  • There are over 10000 actual Japanese kanji, but only about 2000 that every high-school grade Japanese person should know

In XNA, the SpriteFont class and its associated content pipeline use bitmap fonts internally to cache and render text strings. It becomes obvious that generating a bitmap font with 20000 Han characters would take a very long time, and is also very irresponsible memory usage. Even 10000 characters seems ridiculous.
I wanted to keep using SpriteFonts, so switching to a realtime font rendering option like FreeType was out of the question. So how does one make bitmap fonts usable in Japanese?

While researching the subject, I stumbled upon a whitepaper called “Unicode and Japanese Kanji” by Tony Pottier, in which he discusses how to isolate Japanese Kanji from the CJK characters, and even draws up a list of all 1946 unique characters that are learned in Japanese education up to grade 7 (sorted by Unicode code point, or by learning grade). Even if it’s a large number of glyphs, it’s a lot more reasonable than 10k.

So the only remaining step is to make this table into a list of XML CharacterRegion elements so that we can use them in an XNA SpriteFont declaration.
I made a little C# 3 program that takes a list of Kanji, one per line (sorted by code point), and blurts out the expected XML; it also joins consecutive characters into regions to save space.

using System;
using System.IO;

namespace KanjiFinder
{
    static class Program
    {
        static void Main()
        {
            var output = new StreamWriter("regions.xml");
            var input = File.ReadAllLines("kanjis.txt");
            var writeCount = 0;
            var intervalsCount = 0;

            // Track the current run of consecutive code points as [start, end].
            var start = (int)char.Parse(input[0]);
            var end = start;
            foreach (var line in input)
            {
                var cur = (int)char.Parse(line);
                // A gap between the previous character and this one closes the current run.
                if (cur - end > 1)
                {
                    output.WriteLine(string.Format("<CharacterRegion><Start>&#x{0:X4};</Start><End>&#x{1:X4};</End></CharacterRegion>", start, end));
                    writeCount += end - start + 1;
                    start = cur;
                    intervalsCount++;
                }
                end = cur;
            }
            // Flush the last run, which the loop never closes.
            output.WriteLine(string.Format("<CharacterRegion><Start>&#x{0:X4};</Start><End>&#x{1:X4};</End></CharacterRegion>", start, end));
            writeCount += end - start + 1;
            intervalsCount++;

            output.Close();

            if (writeCount != input.Length)
                throw new InvalidOperationException();
            Console.WriteLine(intervalsCount);
        }
    }
}

Here’s its input kanjis.txt (in Unicode format), and its result is regions.xml.
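For a quick feel of what the region-joining does, here is the same idea as a tiny Python sketch with a worked example. It is an illustration under the assumption that the input is sorted by code point; `merge_regions` and `to_xml` are hypothetical names, and the six code points are arbitrary CJK values rather than an excerpt of kanjis.txt.

```python
def merge_regions(code_points):
    """Merge sorted code points into (start, end) runs of consecutive values."""
    regions = []
    for cp in code_points:
        if regions and cp - regions[-1][1] <= 1:
            # Adjacent to the end of the current run : extend it.
            regions[-1] = (regions[-1][0], cp)
        else:
            # First value, or a gap : open a new run.
            regions.append((cp, cp))
    return regions

def to_xml(region):
    """Format one run as a SpriteFont CharacterRegion element."""
    start, end = region
    return ("<CharacterRegion><Start>&#x{0:04X};</Start>"
            "<End>&#x{1:04X};</End></CharacterRegion>").format(start, end)

# Arbitrary sample : three consecutive values, then two, then a lone one.
sample = [0x4E00, 0x4E01, 0x4E02, 0x4E09, 0x4E0A, 0x4E10]
regions = merge_regions(sample)
lines = [to_xml(r) for r in regions]
```

Six characters collapse into three regions here; the longer the consecutive runs in the list, the bigger the saving.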

I chose to go up to Grade 7, but one may choose to ignore Grade 7 characters and just do 1-6. I don’t know whether Grade 7 characters are useful in game menus and usual dialogue.

All that’s left is to put those regions in a <CharacterRegions> tag inside a .spritefont file, and supply a valid Japanese font! Thankfully, Windows 7 comes with a bundle of these (MS Gothic, Kozuka, Meiryo and Mincho) and the M+ Fonts offer a public domain alternative.

Apart from the Kanji regions list, a couple more regions you’ll probably need :

<!-- Ideographic Symbols and Punctuation -->
<CharacterRegion><Start>&#x3000;</Start><End>&#x303F;</End></CharacterRegion>
      
<!-- Hiragana -->
<CharacterRegion><Start>&#x3040;</Start><End>&#x309F;</End></CharacterRegion>   
      
<!-- Katakana -->
<CharacterRegion><Start>&#x30A0;</Start><End>&#x30FF;</End></CharacterRegion>   
      
<!-- Fullwidth Latin -->
<CharacterRegion><Start>&#xFF01;</Start><End>&#xFFE6;</End></CharacterRegion>   
<!-- and/or the standard Latin set... -->
<CharacterRegion><Start>&#32;</Start><End>&#126;</End></CharacterRegion>

That’s it! Hope it helped.