danielwertheim


Structurizer improvements

Structurizer turns your object graph into a key-value representation (read more in the previous post). When building the key-value representation it leaves out all null values and keeps the actual values. Each value is contained in a StructureIndex. Since it is generated dynamically, each value is stored as an object. So Structurizer does not convert them into a string representation à la JSON or something similar. But what if it did? I started to attack the IL-code generation to see if my idea could be effective. Instead of representing each value using an object, the idea was to convert all value types to strings to get rid of the boxing and unboxing associated with them.

While doing this I finally got a chance to have a look at BenchmarkDotNet, as a tool for seeing the metrics before and after my changes. Just to get an idea of something similar, I also benchmarked the process of serializing and deserializing the same object graph using Newtonsoft's JSON.Net.
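To make the boxing point concrete, here is a minimal sketch. It is not Structurizer code; the BoxingSketch class and its method names are made up for illustration. It only shows what happens when a value type such as an int is stored in an object slot versus converted to a string up front. Note that the string conversion still allocates (the string itself), so whether it wins overall is something to measure rather than assume.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Structurizer.
public static class BoxingSketch
{
    // Storing an int in an object slot boxes it:
    // the value is copied into a newly allocated heap object.
    public static object AsBoxed(int qty) => qty;

    // Reading the value back requires an unboxing cast.
    public static int Unbox(object boxed) => (int)boxed;

    // Converting to a string up front avoids the box and the later cast,
    // but still allocates the string itself.
    public static string AsString(int qty) =>
        qty.ToString(CultureInfo.InvariantCulture);
}
```

Boxing the same int twice yields two separate heap objects, which is the kind of allocation that shows up in the Gen 0 and Bytes Allocated columns of a memory diagnoser.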

BenchmarkDotNet

BenchmarkDotNet is really well documented and simple to get started with. In my case I was interested in seeing memory info so I installed the BenchmarkDotNet.Diagnostics.Windows package.

install-package BenchmarkDotNet.Diagnostics.Windows

Then I created a small Benchmark scenario.

[MemoryDiagnoser]
public class MeasureOrder {
  private IStructureBuilder _builder;
  private Order _sampleOrder;

  [Params(1, 10, 100, 1000, 10000)]
  public int Count { get; set; }

  [Setup]
  public void Setup() {
    _builder = StructureBuilder.Create(cfg => cfg.Register<Order>());
    _sampleOrder = Order.CreateSample();
  }

  [Benchmark]
  public void Run() {
    for (var i = 0; i < Count; i++)
      _builder.CreateStructure(_sampleOrder);
  }
}

All that is left is to compile and run it. Make sure you compile in Release mode. According to the docs, BenchmarkDotNet will warn you if you don't.

class Program {
  static void Main(string[] args) {
    var summary = BenchmarkRunner.Run<MeasureOrder>();
  }
}

Benchmarking JSON.Net as well

This is a bit like comparing apples with pears, but I can see people questioning the metrics of Structurizer, while at the same time they are using JSON.Net to serialize and deserialize back and forth between objects and JSON.

[MemoryDiagnoser]
public class MeasureJson
{
  private Order _sampleOrder;
  private string _sampleOrderJson;
  private JsonSerializerSettings _settings;

  [Params(1, 10, 100, 1000, 10000)]
  public int Count { get; set; }

  [Setup]
  public void Setup()
  {
    _settings = CreateSettings();

    _sampleOrder = Order.CreateSample();
    _sampleOrderJson = JsonConvert.SerializeObject(
      _sampleOrder,
      _settings);
  }

  private static JsonSerializerSettings CreateSettings()
  {
    var settings = new JsonSerializerSettings
    {
      ContractResolver = new CamelCasePropertyNamesContractResolver(),
      DateFormatHandling = DateFormatHandling.IsoDateFormat,
      Formatting = Formatting.None
    };
    settings.Converters.Add(new StringEnumConverter());

    return settings;
  }

  [Benchmark]
  public void Deserialize()
  {
    for (var i = 0; i < Count; i++)
      JsonConvert.DeserializeObject<Order>(
        _sampleOrderJson,
        _settings);
  }

  [Benchmark]
  public void Serialize()
  {
    for (var i = 0; i < Count; i++)
      JsonConvert.SerializeObject(_sampleOrder, _settings);
  }
}

Benchmark results

I just ran this on one machine/environment. First I ran the benchmark against the old code base and then against the new code base.

Environment

BenchmarkDotNet=v0.10.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4790K CPU 4.00GHz, ProcessorCount=8
Frequency=3906249 Hz, Resolution=256.0001 ns, Timer=TSC
Host Runtime=Clr 4.0.30319.42000, Arch=32-bit RELEASE
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1586.0
Job Runtime(s):
	Clr 4.0.30319.42000, Arch=32-bit RELEASE

Structurizer BEFORE changes

 Method | Count |            Mean |        StdDev |          Median |    Gen 0 | Bytes Allocated/Op |
------- |------ |---------------- |-------------- |---------------- |--------- |------------------- |
    Run |     1 |      28.7449 us |     0.1135 us |      28.7134 us |     0.95 |          21 434,34 |
    Run |    10 |     291.3118 us |     8.3047 us |     286.0621 us |     8.72 |         195 174,19 |
    Run |   100 |   2,934.1353 us |    54.0423 us |   2,907.7632 us |    86.92 |       1 941 835,75 |
    Run |  1000 |  28,811.9646 us |   330.2285 us |  28,755.5290 us |   835.00 |      18 515 322,18 |
    Run | 10000 | 290,485.8957 us | 1,178.4613 us | 290,496.3380 us | 7,842.80 |     172 048 916,40 |

Structurizer AFTER changes

 Method | Count |            Mean |     StdDev |          Median |    Gen 0 | Bytes Allocated/Op |
------- |------ |---------------- |----------- |---------------- |--------- |------------------- |
    Run |     1 |      11.1401 us |  0.0110 us |      11.1377 us |     0.54 |           9 407,66 |
    Run |    10 |     110.7046 us |  0.0955 us |     110.7346 us |     4.78 |          87 130,48 |
    Run |   100 |   1,087.4541 us |  0.7515 us |   1,087.4795 us |    48.28 |         870 739,49 |
    Run |  1000 |  10,898.0924 us | 33.3535 us |  10,897.8268 us |   490.00 |       8 631 232,33 |
    Run | 10000 | 108,300.7339 us | 47.8315 us | 108,294.6048 us | 4,625.00 |      81 464 268,97 |
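The before and after numbers are easier to compare as ratios. This is just arithmetic on figures copied from the two tables above; the SpeedupSketch class is a hypothetical helper, not something from the benchmark project.

```csharp
using System;

// Hypothetical helper for comparing the two tables above.
public static class SpeedupSketch
{
    // How many times larger the "before" figure is than the "after" figure.
    public static double Ratio(double before, double after) => before / after;

    public static void Report()
    {
        // Mean time in us, Count = 1 and Count = 10000.
        Console.WriteLine(Ratio(28.7449, 11.1401));
        Console.WriteLine(Ratio(290485.8957, 108300.7339));

        // Bytes allocated per op, Count = 1 and Count = 10000.
        Console.WriteLine(Ratio(21434.34, 9407.66));
        Console.WriteLine(Ratio(172048916.40, 81464268.97));
    }
}
```

On this one machine that works out to roughly 2.6 to 2.7 times faster, with roughly 2.1 to 2.3 times fewer bytes allocated per operation.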

JSON.Net - Just for fun

Note, again: this was only included to give you an idea of the metrics for something similar that you might already be doing in various places in your code base, namely serializing and deserializing the same object graph using Newtonsoft's JSON.Net.

      Method | Count |            Mean |      StdDev |          Median |    Gen 0 | Bytes Allocated/Op |
------------ |------ |---------------- |------------ |---------------- |--------- |------------------- |
 Deserialize |     1 |      17.4606 us |   0.0574 us |      17.4595 us |     0.39 |           7 798,36 |
   Serialize |     1 |      13.0944 us |   0.0455 us |      13.0878 us |     0.42 |           8 229,88 |
 Deserialize |    10 |     167.2348 us |   0.5212 us |     167.1813 us |     3.65 |          75 018,41 |
   Serialize |    10 |     134.7028 us |   0.2724 us |     134.6870 us |     4.17 |          82 189,98 |
 Deserialize |   100 |   1,664.0922 us |   5.2214 us |   1,663.7990 us |    43.35 |         897 572,65 |
   Serialize |   100 |   1,299.8021 us |   5.5256 us |   1,299.4652 us |    43.05 |         855 575,72 |
 Deserialize |  1000 |  16,609.1412 us |  50.8539 us |  16,617.6777 us |   328.53 |       6 866 461,43 |
   Serialize |  1000 |  12,952.0301 us |  41.9472 us |  12,944.2135 us |   407.87 |       8 225 151,30 |
 Deserialize | 10000 | 167,776.6216 us | 104.4235 us | 167,751.4143 us | 4,032.00 |      80 390 181,95 |
   Serialize | 10000 | 130,453.3570 us |  86.7287 us | 130,467.7205 us | 4,703.00 |      92 042 180,70 |

Sample model

var order = new Order
{
  Id = DateTime.Now.Ticks,
  MerchantId = Guid.NewGuid(),
  OrderNo = "2016-1234",
  CustomerId = Guid.NewGuid(),
  Tags = new[] { "Test1", "Test2", "Gold customer" },
  PlacedAt = DateTime.Now.Subtract(TimeSpan.FromDays(2)),
  Status = OrderStatus.Payed,
  IsShipped = true,
  FreightCost = 33.50M,
  Amount = 1300M,
  Discount = 100,
  AmountToPay = 1233.50M,
  Lines = new List<OrderLine>
  {
    new OrderLine
    {
      ArticleNo = "Article-Line0",
      Qty = 42,
      Props =  new List<Prop>
      {
        new Prop
        {
          Name = "Key-Line0-Item0",
          Value = "Value-Line0-Item0",
        },
        new Prop
        {
          Name = "Key-Line0-Item1",
          Value = "Value-Line0-Item1"
        }
      }
    },
    new OrderLine
    {
      ArticleNo = "Article-Line1",
      Qty = 3,
      Props =  new List<Prop>
      {
        new Prop
        {
          Name = "Key-Line1-Item0",
          Value = "Value-Line1-Item0"
        },
        new Prop
        {
          Name = "Key-Line1-Item1",
          Value = "Value-Line1-Item1"
        }
      }
    }
  }
};

What change made the biggest difference?

The code base that this comes from is a few years old, and in there was a naive optimisation attempt: the StructureBuilder was using Parallel.For to create structures. But this only kicked in if the number of structures was 100 or more (no idea where that number came from).

REMOVED

private IStructure[] CreateStructuresInParallel<T>(T[] items, IStructureSchema schema) where T : class
{
  var structures = new IStructure[items.Length];

  Parallel.For(0, items.Length, i =>
  {
    var itm = items[i];

    structures[i] = new Structure(
      schema.Name,
      CreateIndexes(schema, itm));
  });

  return structures;
}

Parallel.For was also used in the StructureIndexFactory, in which the StructureIndexes are created. So one master parallel flow called into a child parallel flow, which in turn created pre-sized arrays represented by IEnumerable<IStructureIndex>[]. In the end that wasn't returned as is; instead, null values were filtered out and the result was returned as an array:

public IStructureIndex[] CreateIndexes<T>(IStructureSchema structureSchema, T item) where T : class
{
  var indexes = new IEnumerable<IStructureIndex>[structureSchema.IndexAccessors.Count];

  Parallel.For(0, indexes.Length, c =>
  {
    var indexAccessor = structureSchema.IndexAccessors[c];
    var values = indexAccessor.GetValues(item);

    var valuesExists = values != null && values.Count > 0;
    if (!valuesExists)
      return;

    var isCollectionOfValues = indexAccessor.IsEnumerable || indexAccessor.IsElement || values.Count > 1;
    if (!isCollectionOfValues)
      indexes[c] = new[]
      {
        new StructureIndex(indexAccessor.Path, values[0].Path, values[0].Value, indexAccessor.DataType, indexAccessor.DataTypeCode)
      };
    else
    {
      var subIndexes = new IStructureIndex[values.Count];

      Parallel.For(0, subIndexes.Length, subC =>
      {
        if (values[subC] != null && values[subC].Value != null)
          subIndexes[subC] = new StructureIndex(
            indexAccessor.Path,
            values[subC].Path,
            values[subC].Value,
            indexAccessor.DataType,
            indexAccessor.DataTypeCode);
      });
      indexes[c] = subIndexes;
    }
  });

  return indexes
    .Where(i => i != null)
    .SelectMany(i => i)
    .Where(i => i != null)
    .ToArray();
}

For now, this implicit construct has been replaced with a simple List<IStructureIndex>. Populating the values and excluding null values is done within a simple sequential for-loop instead. This was the change that had the biggest impact on both time and memory allocation.
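A rough sketch of what such a sequential version could look like is shown below. This is not the actual Structurizer code: SketchValue, SketchAccessor and SketchIndex are simplified, hypothetical stand-ins for the real types (which also carry data type information), and for simplicity the sketch skips null values in every case. The point is only the shape: one list, plain for-loops, nulls skipped as they are encountered, so there are no intermediate arrays and no LINQ filtering pass afterwards.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for the real Structurizer types.
public sealed class SketchValue
{
    public SketchValue(string path, object value) { Path = path; Value = value; }
    public string Path { get; }
    public object Value { get; }
}

public sealed class SketchAccessor
{
    public SketchAccessor(string name, IReadOnlyList<SketchValue> values)
    {
        Name = name;
        Values = values;
    }
    public string Name { get; }
    // Already extracted values; null or empty when nothing was found.
    public IReadOnlyList<SketchValue> Values { get; }
}

public sealed class SketchIndex
{
    public SketchIndex(string name, string path, object value)
    {
        Name = name;
        Path = path;
        Value = value;
    }
    public string Name { get; }
    public string Path { get; }
    public object Value { get; }
}

public static class SequentialIndexSketch
{
    public static List<SketchIndex> CreateIndexes(IReadOnlyList<SketchAccessor> accessors)
    {
        var indexes = new List<SketchIndex>();

        for (var c = 0; c < accessors.Count; c++)
        {
            var accessor = accessors[c];
            var values = accessor.Values;
            if (values == null || values.Count == 0)
                continue; // nothing to index for this accessor

            for (var v = 0; v < values.Count; v++)
            {
                var value = values[v];
                if (value == null || value.Value == null)
                    continue; // null values are left out right away

                indexes.Add(new SketchIndex(accessor.Name, value.Path, value.Value));
            }
        }

        return indexes;
    }
}
```

Compared with the removed version, nothing is allocated for entries that end up being dropped, and there is no scheduling overhead for what is a very small amount of work per item.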

That's it for now. Will profile and see what more can be done.

//Daniel
