Fast .NET Reflection and Serialization

(sorry if you got this twice in your RSS, I hit the “publish” button too early…)

A while ago I decided to make an automatic serializer that works just like the XmlSerializer but for the SDL file format, since I like the simplicity and elegance of this data language. The XmlSerializer also doesn’t work natively with Dictionary objects, and crashes when used with certain visibility combinations and C# 3.0 auto-implemented properties.

Making a serializer for any data language implies heavy use of reflection, both to determine the structure of what you’re reading from or writing to a data file and to invoke the getters/setters of the fields and properties you’re serializing.

Performance considerations

Some reflection operations come at a heavy performance cost. Not all of them though! This 2005 article in MSDN Magazine explains that fetching custom attributes, FieldInfo/PropertyInfo objects, invoking functions/properties and members and creating new instances are the costliest operations. Well that’s a problem, because all of those will be handy when writing our serializer.

The same article continues by ranking method invocation techniques by speed. The speediest techniques are direct calls, virtual method calls and direct delegate use, but those are impossible to use if all you’ve got is a Type and an Object. The next best thing is using a DynamicMethod object, IL emission and a delegate. Having never used IL before, I didn’t grasp all of that, but thankfully there are many other resources concerning the use of DynamicMethod out there.
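To make the "DynamicMethod + IL emission + delegate" idea concrete, here is a minimal sketch of a property getter built that way. The names GetterFactory, CreateGetter and SamplePerson are made up for illustration and are not taken from the sample code; the sketch also assumes an instance property with a public getter.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical type, only here to have something to read from.
public class SamplePerson
{
    public string Name { get; set; }
}

public static class GetterFactory
{
    // Builds a delegate roughly equivalent to:
    //   (object target) => (object)((Owner)target).Property
    public static Func<object, object> CreateGetter(PropertyInfo property)
    {
        MethodInfo getMethod = property.GetGetMethod();
        if (getMethod == null)
            throw new ArgumentException("Property has no public getter.", "property");

        Type owner = property.DeclaringType;
        var method = new DynamicMethod(
            "Get_" + property.Name,
            typeof(object),
            new[] { typeof(object) },
            typeof(GetterFactory).Module,
            true); // skip JIT visibility checks

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        if (owner.IsValueType)
        {
            // A boxed struct is unboxed to a managed pointer, then called directly.
            il.Emit(OpCodes.Unbox, owner);
            il.Emit(OpCodes.Call, getMethod);
        }
        else
        {
            il.Emit(OpCodes.Castclass, owner);
            il.Emit(OpCodes.Callvirt, getMethod);
        }
        // The delegate returns object, so value-typed results must be boxed.
        if (property.PropertyType.IsValueType)
            il.Emit(OpCodes.Box, property.PropertyType);
        il.Emit(OpCodes.Ret);

        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }
}
```

Once created, the delegate is called like any other: `GetterFactory.CreateGetter(typeof(SamplePerson).GetProperty("Name"))(person)`. The expensive part is building it, which is why it should be kept around rather than rebuilt.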

A post on Haibo Luo’s blog from 2005 makes a performance comparison between Activator.CreateInstance() (by the way, doing “new T()” with a generic type parameter that’s constrained as “new()” is the exact same as calling this method) and various other techniques including DynamicMethod and using it as a delegate. This last technique blows the rest out of the water in terms of speed.
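As a rough illustration of the delegate-based alternative to Activator.CreateInstance(), the sketch below emits a parameterless "factory" delegate for a given Type. ConstructorFactory and its method name are hypothetical, and the struct branch is my own assumption about how value types could be handled (it zero-initializes the value and does not run any explicit parameterless struct constructor).

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class ConstructorFactory
{
    // Builds a delegate roughly equivalent to: () => (object)new T()
    public static Func<object> CreateParameterlessConstructor(Type type)
    {
        if (type.IsInterface || type.IsAbstract)
            throw new ArgumentException("Cannot instantiate " + type.FullName, "type");

        var method = new DynamicMethod(
            "Create_" + type.Name,
            typeof(object),
            Type.EmptyTypes,
            typeof(ConstructorFactory).Module,
            true); // skip JIT visibility checks

        ILGenerator il = method.GetILGenerator();
        if (type.IsValueType)
        {
            // Zero-initialize a local of the struct type, then box it.
            LocalBuilder local = il.DeclareLocal(type);
            il.Emit(OpCodes.Ldloca_S, local);
            il.Emit(OpCodes.Initobj, type);
            il.Emit(OpCodes.Ldloc, local);
            il.Emit(OpCodes.Box, type);
        }
        else
        {
            ConstructorInfo ctor = type.GetConstructor(
                BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance,
                null, Type.EmptyTypes, null);
            if (ctor == null)
                throw new ArgumentException("No parameterless constructor on " + type.FullName, "type");
            il.Emit(OpCodes.Newobj, ctor);
        }
        il.Emit(OpCodes.Ret);

        return (Func<object>)method.CreateDelegate(typeof(Func<object>));
    }
}
```

Checking for interfaces, abstract classes and missing constructors up front gives a readable error instead of a null ConstructorInfo reaching the IL generator.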

This GPL library on CodeProject written by Alessandro Febretti provides an excellent dynamic method factory. And this other article on CodeProject goes a bit further and shows how to set/get values on fields, and isolates the boxing in helper functions.

What I ended up doing is taking from all of these examples and correcting the problems outlined in the comments of both CodeProject samples. I then built an IReflectionProvider interface that exposes all these costly operations and which can be implemented three different ways :

  • DirectReflector : Simply via reflection
  • EmitReflector : With IL emission but no caching performed (the DynamicMethods and delegates are rebuilt on each call)
  • CachedReflector : With IL emission and caching (the resulting delegates are created only once, then accessed with a dictionary lookup)
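The difference between the first and third approach can be sketched as follows. This is a heavily reduced illustration, not the code from the download: the single GetValue member and the EmitCount counter are assumptions of mine, the getter only handles properties declared on reference types, and the dictionary is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical, reduced interface: one costly operation only.
public interface IReflectionProvider
{
    object GetValue(PropertyInfo property, object instance);
}

// Plain reflection on every call.
public sealed class DirectReflector : IReflectionProvider
{
    public object GetValue(PropertyInfo property, object instance)
    {
        return property.GetValue(instance, null);
    }
}

// Emits a getter delegate once per property, then reuses it.
public sealed class CachedReflector : IReflectionProvider
{
    private readonly Dictionary<PropertyInfo, Func<object, object>> getters =
        new Dictionary<PropertyInfo, Func<object, object>>();

    // Counts how many times IL was actually emitted (for demonstration).
    public int EmitCount { get; private set; }

    public object GetValue(PropertyInfo property, object instance)
    {
        Func<object, object> getter;
        if (!getters.TryGetValue(property, out getter))
        {
            getter = EmitGetter(property);
            getters.Add(property, getter);
        }
        return getter(instance);
    }

    private Func<object, object> EmitGetter(PropertyInfo property)
    {
        EmitCount++;
        var method = new DynamicMethod(
            "Get_" + property.Name, typeof(object),
            new[] { typeof(object) }, typeof(CachedReflector).Module, true);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Castclass, property.DeclaringType); // reference types only
        il.Emit(OpCodes.Callvirt, property.GetGetMethod(true));
        if (property.PropertyType.IsValueType)
            il.Emit(OpCodes.Box, property.PropertyType);
        il.Emit(OpCodes.Ret);

        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }
}
```

An uncached variant would simply call EmitGetter on every GetValue, paying the IL generation and JIT cost each time.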

I’m aware that the 2nd test case is ridiculous: you should never emit IL and generate methods repeatedly at runtime. But I wanted to outline the importance of caching.

The serializer

When making this sample, I wanted to provide both a fast .NET reflection library and a proper generic implementation of a reflective serializer. But I didn’t want to spend time on string parsing/formatting, which a real serializer needs since it usually outputs a text file or some other data format. So the tradeoff I chose is somewhat unusable in the real world…

It outputs objects that are an attempt at generalizing all .NET objects. There are three main categories :

  • SerializedAtoms are indivisible, single-valued and immutable. All primitive types will serialize to atoms, in addition to strings, enums and nullable types.
  • SerializedCollections are multi-valued object bags that don’t give a specific meaning to keys or indices other than natural ordering. All classes that implement ICollection<T> will serialize into this.
  • SerializedAggregates are multi-valued object maps that use the key or index for identification. Everything that doesn’t fall in the two other categories will serialize to aggregates, so Dictionaries and just about any other class.
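One way to read these three rules is as a classification function over Type. The sketch below is my interpretation, not the sample's actual logic: SerializedKind and the classifier class are hypothetical names, and checking for dictionaries before ICollection<T> is an assumption needed because Dictionary<K,V> also implements ICollection<KeyValuePair<K,V>>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public enum SerializedKind { Atom, Collection, Aggregate }

public static class SerializedKindClassifier
{
    public static SerializedKind Classify(Type type)
    {
        // Atoms: primitives, strings, enums and nullable types.
        if (type.IsPrimitive || type.IsEnum || type == typeof(string)
            || Nullable.GetUnderlyingType(type) != null)
            return SerializedKind.Atom;

        // Dictionaries are keyed maps, so they count as aggregates
        // even though they also implement ICollection<T>.
        if (typeof(IDictionary).IsAssignableFrom(type)
            || Implements(type, typeof(IDictionary<,>)))
            return SerializedKind.Aggregate;

        // Collections: anything implementing ICollection<T>.
        if (Implements(type, typeof(ICollection<>)))
            return SerializedKind.Collection;

        // Everything else is an aggregate of its fields and properties.
        return SerializedKind.Aggregate;
    }

    // True if the type is, or implements, the given open generic interface.
    private static bool Implements(Type type, Type openGeneric)
    {
        if (type.IsGenericType && type.GetGenericTypeDefinition() == openGeneric)
            return true;
        foreach (Type candidate in type.GetInterfaces())
        {
            if (candidate.IsGenericType && candidate.GetGenericTypeDefinition() == openGeneric)
                return true;
        }
        return false;
    }
}
```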

Only atoms contain actual values, and they hold them as an object. There is no string conversion done in the end; it all remains in memory. Serialized objects also retain the name of their host field or dictionary entry if any, and the runtime type if different from the declared one.

To customize the serialization output to an extent, I made a custom attribute called [Serialization] which lets you force an alternate name on a serialized member, mark a member as ignored by the serializer, or mark it as required. I could’ve used “optional” instead, but I find it more logical to skip serialization of all null or default-valued fields.

Just like the XmlSerializer, it only serializes the public instance fields or properties. So unlike the BinaryFormatter (which is deep serialization), my serializer does shallow serialization.
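The attribute and the "public instance members only" rule could be sketched like this. The property names Name, Ignore and Required are guesses at how such an attribute might be shaped (the real one in the download may differ), and SampleMonster and MemberScanner are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical shape of the [Serialization] attribute.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class SerializationAttribute : Attribute
{
    public string Name { get; set; }   // alternate serialized name
    public bool Ignore { get; set; }   // skip this member entirely
    public bool Required { get; set; } // emit even when null/default
}

public class SampleMonster
{
    [Serialization(Name = "hp", Required = true)]
    public int HitPoints;

    [Serialization(Ignore = true)]
    public string DebugLabel { get; set; }

    public string Name { get; set; }

    private int secret = 42; // private, so never visited
}

public static class MemberScanner
{
    // Returns the serialized names of all public instance fields and
    // properties, honoring Ignore and Name on the attribute.
    public static List<string> GetSerializedNames(Type type)
    {
        var names = new List<string>();
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

        foreach (MemberInfo member in type.GetMembers(flags))
        {
            if (member.MemberType != MemberTypes.Field &&
                member.MemberType != MemberTypes.Property)
                continue;

            var attribute = (SerializationAttribute)Attribute.GetCustomAttribute(
                member, typeof(SerializationAttribute));
            if (attribute != null && attribute.Ignore)
                continue;

            bool renamed = attribute != null && attribute.Name != null;
            names.Add(renamed ? attribute.Name : member.Name);
        }
        return names;
    }
}
```

Fetching custom attributes is one of the costly operations mentioned earlier, so in a real serializer the result of a scan like this would be cached per type.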

I have tested the implementation with many (if not all) combinations of value-type/class, serialized object category and visibility, so I can say it’s pretty robust and tolerant on what you feed it.

Results

This is the whole point… how fast does “Fast .NET Reflection” go? Here are the timings for 10 outer loops (so 10 serializer creations) and 100 inner loops (100 serializations per outer loop), which means 1000 serializations of the same complex aggregate object.

Test ‘Standard Reflection’ Started… Completed.
Time Elapsed : 00:00:08.2473666

Test ‘Reflection.Emit + Delegate (No Caching)’ Started… Completed.
Time Elapsed : 00:01:52.4517968

Test ‘DynamicMethod + Delegate, Cached’ Started… Completed.
Time Elapsed : 00:00:00.9970487

Well, I did say that no caching was a very bad idea.

Still, the highlight here is that, running the same serialization code with two different reflection providers, dynamic IL methods with a healthy dose of caching are about eight (8!) times faster than standard reflection.
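For anyone who wants to reproduce this kind of measurement, a timing harness with the same loop structure could look roughly like the following. It is a sketch with hypothetical names (Benchmark, createWork), not the test program from the download: createWork stands in for "create a serializer" and the Action it returns stands in for one serialization.

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Runs createWork once per outer loop, and the Action it returns
    // innerLoops times per outer loop, then reports the total elapsed time.
    public static TimeSpan Run(string name, int outerLoops, int innerLoops, Func<Action> createWork)
    {
        Console.Write("Test '{0}' Started... ", name);
        Stopwatch watch = Stopwatch.StartNew();

        for (int outer = 0; outer < outerLoops; outer++)
        {
            Action work = createWork();
            for (int inner = 0; inner < innerLoops; inner++)
                work();
        }

        watch.Stop();
        Console.WriteLine("Completed.");
        Console.WriteLine("Time Elapsed : {0}", watch.Elapsed);
        return watch.Elapsed;
    }
}
```

Putting the creation step inside the outer loop matters: any cache that lives in the serializer or its reflection provider is rebuilt 10 times, so the cached timing still includes some emission cost.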

Sample code

The code for this sample (C# 3.0, .NET 3.5, VS.NET 2008) can be found here : FastReflection.zip (46 Kb)

Even if you’re not interested in serialization, I suggest you take a look at the EmitHelper class and how it’s used in CachedReflector. All tasks that need Reflection in a time-critical context should use dynamic methods!

8 thoughts on “Fast .NET Reflection and Serialization”

  1. Hi,

    I really love this implementation of the Serialization framework. I do have a question though: Would I somehow be able to use the Fast .Net Serialization example in a Web Service context? I would like to serialize the information but using XML Elements and Attributes, etc… I need this as when I convert my Entity Framework entities to my custom Data Transfer Objects. I send these DTO’s using my web service. I’d like to use the above implementation to transfer my objects. Can anyone help?

    Giovanni

  2. You’re really onto something near to my heart here.

    It irks me to no end that we’ve let serialization contracts become so tightly coupled with their entities. Not a good situation for libraries you don’t control. Ditto for libraries decorated with different serialization attributes to fit different serializers (e.g. Xml for shallow, DataContract for deep), etc. Ditto when you have different serialization needs from the same serializer in different contexts (deep vs. shallow XML). Ditto when entities have auto-managed current-state properties (e.g. a read-only IsDirty instance property that gets set to true in other properties’ setter blocks when the value changes).

    Imagine this though: Forget the decorators; use separate classes to implement something like ITransformationContract <- IHydrationContract <- IEntitySpec. (I used "Hydration" instead of "Data" so as not to confuse with the MS DataContract.)

    You've already nailed the entity spec magnificently – SerializedAtoms, SerializedCollections and SerializedAggregates. IHydrationContract would hold everything needed — given an IEntitySpec instance — to create cacheable getter/setter delegates with desired behaviors. This would include things like binding flags (e.g. set private fields for deep cloning), ignored properties/fields and/or required properties/fields, default values (e.g. Guid.Empty), setter invocation ordinals (e.g. when setters encapsulate order-sensitive side-effects). Lastly, ITransformationContract specifies a wire format given instances of a hydrator and entity.

    From there, it wouldn't take much to encapsulate some of the heavy lifting into helper methods that handle the most common hydration scenarios (e.g. deep exact cloning vs. shallowly copying just the public, writable properties). For wire formats, maybe protobuf, JSON and Xml out of the box.

    Cobble all that together, and finally, developers are freed from choosing between non-injectable declarative hell and custom ISerializable implementations (which is like choosing between skipping dinner or having to eat an entire stale fruitcake).

    Thanks for a great article. This looks like a huge step towards a wide range of fast, flexible, *trivially* customizable serializers!

  3. :o

    You’re very welcome, and this is motivating me to pursue my efforts on serialization! :D

    Lots of good ideas in your post, and I do think that decoupling hydration/serialization contract from the data spec is the way to go… and I’d love to implement a json/xml/sdl/whatever transformer just to show that it works.
    I think a project like that belongs on a collaborative site though, I wouldn’t undertake it alone. My intent is mainly to do a proof-of-concept and high-performance tidbits that you can stick into your own code.

    Anywho, thanks!

  4. Like your article.. still trying to get my head around how this all works. One issue that I am running into:

    I’m trying to deserialize my object from a file. I don’t see an obvious way to do this.

    My serialization code is as follows (calling your class library)

    public void FastSerialize(string fileName)
    {
        System.IO.StreamWriter streamWriter = null;
        try
        {
            var serializer = new Serializer(new CachedReflector());
            SerializedObject temp = serializer.Serialize(this);
            System.IO.FileInfo serializedFile = new System.IO.FileInfo(fileName);
            // Open the file for writing; without this the writer stays null
            streamWriter = serializedFile.CreateText();
            streamWriter.WriteLine(temp.Stringify());
            streamWriter.Close();
        }
        finally
        {
            if (streamWriter != null)
            {
                streamWriter.Dispose();
            }
        }
    }

    public MyObject FastDeserialize(string fileName)
    {
        MyObject result = null;
        System.IO.FileStream file = null;
        System.IO.StreamReader sr = null;
        try
        {
            file = new System.IO.FileStream(fileName, FileMode.Open, FileAccess.Read);
            sr = new System.IO.StreamReader(file);
            string xmlString = sr.ReadToEnd();
            sr.Close();
            file.Close();
            var serializer = new Serializer(new CachedReflector());

            // This is where I am stuck, not clear to me what the next line should be
            return (serializer.Deserialize();
        }
        catch (Exception)
        {
            throw;
        }
    }

    Any help would be greatly appreciated.

  5. Hi Doug,

    Haha, yeah. The problem is that I don’t have a “de-stringify” method. Anyway, the Stringify() method was just for presentation and I don’t think it keeps all the necessary metadata to be a useable string-serialization method.

    My code serializes any object to a SerializedObject, and from there you’d have to write another layer that converts it to string (or XML, or SDL, or what you want) and back.

    The advantage is that you work with something very concrete/concise and that encompasses any data you can make in C# 3.5.

  6. Hey Renaud,

    great work, I’ve been testing your code and ran into the following issue: If you have a generic property defined as an interface (e.g. IDictionary), you will run into an ArgumentNullException in EmitHelper.CreateParameterlessConstructorHandler() on deserialization of a serialized object, since there is no ctr info available.
