16-Bit Color Encoding on the GPU

While working on a side project you’ll hear about pretty soon, I’ve been trying to pack color data of little visual importance from 24-bit “Truecolor” R8G8B8 down to 16-bit “Highcolor” R5G6B5. Intuitively, the solution is to take the most significant bits of each component and fit them inside two 8-bit containers using bitwise operations.

The problem is that bit shifts, and bitwise operators in general, are not supported in shaders before SM4.0, and I’m still lagging behind with my video card and OS, so I can’t run SM4.0 shaders yet. And anyway, I assume 95% of the world can’t either.
So the only way to do this is to resort to integer arithmetic (division, multiplication and modulus). And since it took me most of the day to get it working, I thought I’d share my little HLSL snippet with the world.

Update : Now with 232.3% fewer arithmetic instructions!
Update #2 : Added netics’s optimization to the encoding, 3 fewer instructions!

float2 EncodeR5G6B5(float3 rgb24) {
	// scale up to 8-bit
	rgb24 *= 255.0f;

	// remove the 3 LSB of red and blue, and the 2 LSB of green
	int3 rgb16 = rgb24 / int3(8, 4, 8);

	// split the green at bit 3 (we'll keep the 6 bits around the split)
	float greenSplit = rgb16.g / 8.0f;

	// pack it up (capital G's are MSB, the rest are LSB)
	float2 packed;
	packed.x = rgb16.r * 8 + floor(greenSplit);		// rrrrrGGG
	packed.y = frac(greenSplit) * 256 + rgb16.b;		// gggbbbbb

	// scale down and return
	packed /= 255.0f;
	return packed;
}

float3 DecodeR5G6B5(float2 packed) {
	// scale up to 8-bit
	packed *= 255.0f;

	// round and split the packed bits
	float2 split = round(packed) / 8;	// first component at bit 3
	split.y /= 4;				// second component at bit 5

	// unpack (obfuscated yet optimized crap follows)
	float3 rgb16 = 0.0f.rrr;
	rgb16.gb = frac(split) * 256;
	rgb16.rg += floor(split) * 4;
	rgb16.r *= 2;

	// scale down and return
	rgb16 /= 255.0f;
	return rgb16;
}

Update Notes : Now, the first version I had posted here was much more high-level, and used functions like rightShift(x, a) that emulated bitwise operators. The idea was good, and it allowed me to experiment until I got it working, but it was way too complicated and the HLSL compiler just couldn’t optimize it well enough. So I rewrote it.

The new version consumes 28 vs_3_0 instructions to encode, and 11 ps_3_0 instructions to decode including the texture sampling. The old one took 69 and 24 instructions respectively for the exact same result. It’s crazy how optimizable some tasks are.
The big changes were caching divisions in a variable, using floor() or frac() instead of integer arithmetic, packing similarly used data in vectors to group operations, removing all pow() function calls, and overall code tidying. The result is a pretty hard to understand decoding function, but a >200% speed-up totally justifies it.

One more thing I found out while optimizing: it’s just impossible to remove the most significant bits by left-shifting and then right-shifting back into place with integer arithmetic. The reason is that there is no native integer math on GPUs before SM4.0, and even if you can push a number up by 30-something bits, you can’t bring it back down, because the result needs more digits than a float can hold and the low ones get lost. So the natural workaround is to right-shift (divide by 2^x), apply the frac() intrinsic, and left-shift (multiply by 2^x) if necessary to bring it back up.
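As a rough sketch of that workaround, masking and shifting around a split point could look like the following. The helper names KeepLowBits and DropLowBits are made up for this illustration and are not part of the snippet above. It assumes the input is a non-negative whole number stored in a float, and that the divisor is a literal power of two so the division stays exact.

```hlsl
// Hypothetical helpers, not from the original listing.
// x is assumed to hold a non-negative whole number (exact in a float below 2^24),
// pow2 is assumed to be a literal power of two, e.g. 8.0f for a split at bit 3.

// Emulates (x & (pow2 - 1)) : keeps only the bits below the split.
float KeepLowBits(float x, float pow2)
{
	// the divide moves the low bits into the fractional part,
	// frac() throws away everything above the split,
	// and the multiply brings what is left back up
	return frac(x / pow2) * pow2;
}

// Emulates (x >> log2(pow2)) : keeps only the bits above the split.
float DropLowBits(float x, float pow2)
{
	return floor(x / pow2);
}
```

With a packed byte of 203 (11001 011b) and a split at 8, KeepLowBits gives 3 and DropLowBits gives 25, which are the GGG and rrrrr fields of that byte.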

EncodeR5G6B5() takes a float3 and compresses it to a float2, and DecodeR5G6B5() does the inverse. Most of the color information is kept because at most 3 bits per component are stripped, and they’re the least significant ones (worth 1, 2 and 4 out of 255).
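For context, here is one way the two functions might be wired into a pair of pixel shaders. This is only a sketch: the entry point names, the PackedSampler sampler and the extra value stored in the blue channel are all assumptions made for the example, and it presumes EncodeR5G6B5() and DecodeR5G6B5() are defined earlier in the same file, that the render target has 8 bits per channel, and that it is read back with point filtering.

```hlsl
// Hypothetical usage sketch; assumes EncodeR5G6B5() and DecodeR5G6B5()
// from the listing above are defined earlier in the same file.

// assumed to be bound to the 8-bit-per-channel target written by EncodePS,
// with point filtering so packed texels are never mixed together
sampler2D PackedSampler;

// Writes the 16-bit color to R and G, leaving B free for other data.
float4 EncodePS(float3 color : COLOR0, float extra : TEXCOORD0) : COLOR0
{
	float2 rg = EncodeR5G6B5(color);
	return float4(rg, extra, 1.0f);
}

// Reads the packed texel back and rebuilds the color from R and G only.
float4 DecodePS(float2 uv : TEXCOORD0) : COLOR0
{
	float4 texel = tex2D(PackedSampler, uv);
	return float4(DecodeR5G6B5(texel.rg), 1.0f);
}
```

The point of the arrangement is that the blue channel carries something unrelated to color, at the cost of the stripped low bits.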

The encoding logic is the following :

  • Take the float3 (24-bit) color and scale it up to the 0-255 range.
  • Remove the least significant bits (3-2-3) of each component by using 2^x integer division.
  • Shift the 5-bit red component leftmost and place it in the first 8-bit field.
  • Split the 6-bit green component between the two fields; the three least significant bits (LSB) of the first field will hold the component’s three most significant bits (MSB), and the three MSB of the second field will hold the rest (the component’s three LSB). This might sound confusing, but basically we’re filling the holes in sequence.
  • Append the 5-bit blue component to the remaining space; no need to shift, just add it in (the equivalent of a bitwise OR here, since those bits are still empty).
  • Bring the range back to 0-1 with a floating-point division by 255.
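The steps above can be traced by hand on a concrete color. The function below is a hypothetical walkthrough, not part of the original snippet: it starts from the 8-bit color (200, 100, 50) already in the 0-255 range, and the values in the comments were worked out on paper.

```hlsl
// Hypothetical worked trace of the encoding steps for the 8-bit color (200, 100, 50).
float2 EncodeExample()
{
	float3 rgb24 = float3(200, 100, 50);		// already scaled to 0-255

	// drop the 3-2-3 least significant bits
	float3 rgb16 = floor(rgb24 / float3(8, 4, 8));	// (25, 25, 6)

	// split the green (25 = 011001b) at bit 3
	float greenSplit = rgb16.g / 8;			// 3.125 : MSB 011b = 3, LSB 001b = 1

	float2 bytes;
	bytes.x = rgb16.r * 8 + floor(greenSplit);	// 25 * 8 + 3 = 203 (11001 011b)
	bytes.y = frac(greenSplit) * 256 + rgb16.b;	// 0.125 * 256 + 6 = 38 (001 00110b)

	// back to the 0-1 range
	return bytes / 255;
}
```

Decoding the bytes (203, 38) gives back (200, 100, 48): red and green happen to survive intact here, and blue loses its two lowest set bits.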

The decoding logic is, as one would expect, the inverse.
One important thing to mention though is the presence of the round intrinsic function. Without it, for reasons unknown to my sleep-deprived brain, I keep losing random bits. I assume that integer casting (explicit or implicit) in HLSL just drops all decimals, like a floor operation would, and to be consistent we need to round to the nearest integer first.
And of course since we’re dealing with encoded data, any single bit could make a dramatic change!
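One plausible explanation, offered as an assumption rather than something verified on the hardware in question: after the multiplication by 255, a stored byte can come back a hair below its whole value, and the divide-and-floor split then lands in the wrong field. The hypothetical helper below isolates that effect.

```hlsl
// Hypothetical illustration, not from the original listing.
// scaled is assumed to be texel * 255, e.g. 199.99998 for a stored byte of 200.
// Returns the upper 5-bit field computed without (x) and with (y) rounding first.
float2 SplitWithAndWithoutRound(float scaled)
{
	float unrounded = floor(scaled / 8);		// 24 when scaled is 199.99998
	float rounded = floor(round(scaled) / 8);	// 25, the field that was stored
	return float2(unrounded, rounded);
}
```

In the unrounded case the matching frac() is almost 1 instead of 0, so the error does not stay in one field: it also spills into the bits below the split.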

And as a closing note, it doesn’t work very well with FSAA or probably any sort of blending, because those change the intensities by arbitrary factors and will screw up the encoding. I’ve had problems with FSAA, haven’t tried blending yet but it would be expected behaviour.
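To see why mixing packed values goes wrong, here is a small hand-worked case, written as a hypothetical function that assumes DecodeR5G6B5() from the listing above is defined earlier in the same file. It blends two packed pure greens 50/50, which is roughly what a blend or a two-sample resolve would do.

```hlsl
// Hypothetical illustration; assumes DecodeR5G6B5() from the listing above.
float3 BlendedDecodeExample()
{
	float2 a = float2(2, 0) / 255;		// packed form of the 8-bit color (0, 64, 0)
	float2 b = float2(0, 224) / 255;	// packed form of the 8-bit color (0, 28, 0)

	// a 50% mix of the packed bytes gives (1, 112)
	float2 blended = lerp(a, b, 0.5f);

	// decodes to (0, 44, 128) / 255 : the green is close to the true
	// average of 46, but blue shows up although neither input had any
	return DecodeR5G6B5(blended);
}
```

The half step in the lower green bits has nowhere to go but the top of the blue field, so a smooth mix in packed space is not a smooth mix in color space.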


13 thoughts on “16-Bit Color Encoding on the GPU”

  1. ooops. you moved. i posted the same comment on your old blog.

    i have a question.

    if the code below
    int3 rgb16 = rgb24 / 4;
    rgb16.rb /= 2;
    is changed to
    int3 rgb16 = rgb24 / int3(8, 4, 8);

    will it be faster?

  2. @netics :
    I just made a quick test, and yes it does! My original version is 16 arithmetic instructions and yours is 13.
    I’ll update the post with your findings. Thanks!

    I also tried changing the decoding from :
    float2 split = round(packed) / 8;
    split.y /= 4;
    to :
    float2 split = round(packed) / float2(8, 32);
    But for some reason this didn’t change the instruction count. Anyway it only takes 10 arithmetic instructions to decode…

  3. I have a slightly different application, but I think your method is on the right track to solving it. I need to display a 16-bit float texture (luma + alpha) by packing it into two 8-bit color channels.
    Ideally it would yield a coherent image, so the least significant bits would be in the red channel; most, green. This would result in a tone mapping that was black at 0’s and yellow at full scale. With integer formats, the solution is pretty simple (like your solution, accomplish bit shifts with arithmetic), but I’m kind of at a loss as to how to do this with floats. Any idea of how one would do this? Thanks!

  4. your post is awesome!! I was very impressed. It looks good for my recently works.
    sorry but I can’t understand why frac() simulate bitwise. how it works?
    It’s just returns the fractional part. :-)
    Could you coach me plz?
    God bless you.

  5. Hiya! Glad you like it.
    The frac() and floor() intrinsics work because I take a float, divide it by a power of two (which puts the bits higher than the split in the integer part, and the bits lower than the split in the fractional part), and frac/floor act as a bitmask to strip what I want around the split.
    Then I can move it around with power-of-two multiplications if needed.

    I’m not sure why the 7xxx series wouldn’t work, they support SM3.0… Maybe it has to do with half-precision maths? Maybe you can force full-precision when compiling the shader or something?
    Otherwise I’d just say “update your drivers” but you probably already did that…

    Good luck! And happy new year in Korea. :)

  6. Oh! thx u! SAINT Renaud Bédard!!!! I’ll never forget the grace.
    My works have been helped by your exposition.
    I wonder how you know where i am. do you know a korean word?
    Happy new year!!

  7. Hey, I’m trying to do a similar thing in Flash’s AGAL language (for Stage3D).

    I’m a little new at this but while reading your solution, I was curious about one thing. Is there any reasons why you couldn’t pull it off as R8G8B8? Can’t a register’s field hold a plain white value (0xFFFFFF = 16777215) ?

    I’m really interested in your approach though. I might need to do something similar with a higher-level Shader language like FLSL (from Flare3D) or Adobe’s AGALMacroAssembler.

    1. The idea was to reduce the amount of data taken by color information in the shader, so I can use the B channel for something else without doing multiple render targets, or multiple draw passes.
      It’s not usually a good idea because there’s quality loss, but sometimes compression can be a big win!
      I don’t know anything about Stage3D stuff sadly, so I can’t help you there. :) Good luck!

  8. This works great, thanks!!

    Do you know of a solution so that the images don’t have banding seams when using bilinear filtering? It works fine with point filtering, but then of course you don’t get the blending. Example image of the seams here : https://i.ibb.co/mSjT8LP/Screenshot-2020-11-10-044923.jpg

    It is especially noticeable on the red, and surprisingly less noticeable on the green, and not even present on the blue
