Structurizer turns your object graph into a key-value representation (read more in the previous post). When building the key-value representation it leaves out all null values and keeps the actual values, each of which is contained in a StructureIndex. Since the representation is generated dynamically, each value is stored as an object; Structurizer does not convert values into a string representation à la JSON. But what if it did? I started digging into the IL-code generation to see if the idea would pay off: instead of representing each value as an object, convert all value types to strings and get rid of the boxing and unboxing associated with them. This finally gave me a chance to have a look at BenchmarkDotNet, as a tool for capturing metrics before and after my changes. To get a sense of how this compares to something similar, I also benchmarked serializing and deserializing the same object graph using Newtonsoft's JSON.Net.
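To make the idea concrete, here is a hypothetical sketch (not Structurizer's actual code) of the difference between boxing a value type and converting it to a string up front:

using System.Globalization;

public static class ValueConversionSketch
{
    // Boxing: the int is copied into a new heap object every time it is
    // assigned to an object-typed slot, and must be unboxed to be used again.
    public static object AsBoxedObject(int value)
        => value;

    // String conversion: one string allocation, but no box/unbox round-trip
    // when the value is only ever consumed as text in a key-value index.
    public static string AsInvariantString(int value)
        => value.ToString(CultureInfo.InvariantCulture);
}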
BenchmarkDotNet
BenchmarkDotNet is really well documented and simple to get started with. In my case I was interested in seeing memory info, so I installed the BenchmarkDotNet.Diagnostics.Windows package.
install-package BenchmarkDotNet.Diagnostics.Windows
Then I created a small benchmark scenario.
[MemoryDiagnoser]
public class MeasureOrder
{
    private IStructureBuilder _builder;
    private Order _sampleOrder;

    [Params(1, 10, 100, 1000, 10000)]
    public int Count { get; set; }

    [Setup]
    public void Setup()
    {
        _builder = StructureBuilder.Create(cfg => cfg.Register<Order>());
        _sampleOrder = Order.CreateSample();
    }

    [Benchmark]
    public void Run()
    {
        for (var i = 0; i < Count; i++)
            _builder.CreateStructure(_sampleOrder);
    }
}
All that is left is to compile and execute it. Do ensure that you compile a Release build; according to the docs, BenchmarkDotNet will warn you if you don't.
class Program
{
    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<MeasureOrder>();
    }
}
Benchmarking JSON.Net as well
This is a bit like comparing apples and oranges, but I can see people questioning Structurizer's metrics while at the same time using JSON.Net to serialize and deserialize back and forth between objects and JSON.
[MemoryDiagnoser]
public class MeasureJson
{
    private Order _sampleOrder;
    private string _sampleOrderJson;
    private JsonSerializerSettings _settings;

    [Params(1, 10, 100, 1000, 10000)]
    public int Count { get; set; }

    [Setup]
    public void Setup()
    {
        _settings = CreateSettings();
        _sampleOrder = Order.CreateSample();
        _sampleOrderJson = JsonConvert.SerializeObject(
            _sampleOrder,
            _settings);
    }

    private static JsonSerializerSettings CreateSettings()
    {
        var settings = new JsonSerializerSettings
        {
            ContractResolver = new CamelCasePropertyNamesContractResolver(),
            DateFormatHandling = DateFormatHandling.IsoDateFormat,
            Formatting = Formatting.None
        };
        settings.Converters.Add(new StringEnumConverter());

        return settings;
    }

    [Benchmark]
    public void Deserialize()
    {
        for (var i = 0; i < Count; i++)
            JsonConvert.DeserializeObject<Order>(
                _sampleOrderJson,
                _settings);
    }

    [Benchmark]
    public void Serialize()
    {
        for (var i = 0; i < Count; i++)
            JsonConvert.SerializeObject(_sampleOrder, _settings);
    }
}
Benchmark results
I just ran this on one machine/environment. First I ran the benchmark against the old code base and then against the new code base.
Environment
BenchmarkDotNet=v0.10.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4790K CPU 4.00GHz, ProcessorCount=8
Frequency=3906249 Hz, Resolution=256.0001 ns, Timer=TSC
Host Runtime=Clr 4.0.30319.42000, Arch=32-bit RELEASE
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1586.0
Job Runtime(s):
Clr 4.0.30319.42000, Arch=32-bit RELEASE
Structurizer BEFORE changes
Method | Count | Mean | StdDev | Median | Gen 0 | Bytes Allocated/Op |
------- |------ |---------------- |-------------- |---------------- |--------- |------------------- |
Run | 1 | 28.7449 us | 0.1135 us | 28.7134 us | 0.95 | 21 434,34 |
Run | 10 | 291.3118 us | 8.3047 us | 286.0621 us | 8.72 | 195 174,19 |
Run | 100 | 2,934.1353 us | 54.0423 us | 2,907.7632 us | 86.92 | 1 941 835,75 |
Run | 1000 | 28,811.9646 us | 330.2285 us | 28,755.5290 us | 835.00 | 18 515 322,18 |
Run | 10000 | 290,485.8957 us | 1,178.4613 us | 290,496.3380 us | 7,842.80 | 172 048 916,40 |
Structurizer AFTER changes
Method | Count | Mean | StdDev | Median | Gen 0 | Bytes Allocated/Op |
------- |------ |---------------- |----------- |---------------- |--------- |------------------- |
Run | 1 | 11.1401 us | 0.0110 us | 11.1377 us | 0.54 | 9 407,66 |
Run | 10 | 110.7046 us | 0.0955 us | 110.7346 us | 4.78 | 87 130,48 |
Run | 100 | 1,087.4541 us | 0.7515 us | 1,087.4795 us | 48.28 | 870 739,49 |
Run | 1000 | 10,898.0924 us | 33.3535 us | 10,897.8268 us | 490.00 | 8 631 232,33 |
Run | 10000 | 108,300.7339 us | 47.8315 us | 108,294.6048 us | 4,625.00 | 81 464 268,97 |
JSON.Net - Just for fun
Note: again, this is only included to give you an idea of the metrics for something similar that you might be doing in various places in your code base: serializing and deserializing the same object graph with Newtonsoft's JSON.Net.
Method | Count | Mean | StdDev | Median | Gen 0 | Bytes Allocated/Op |
------------ |------ |---------------- |------------ |---------------- |--------- |------------------- |
Deserialize | 1 | 17.4606 us | 0.0574 us | 17.4595 us | 0.39 | 7 798,36 |
Serialize | 1 | 13.0944 us | 0.0455 us | 13.0878 us | 0.42 | 8 229,88 |
Deserialize | 10 | 167.2348 us | 0.5212 us | 167.1813 us | 3.65 | 75 018,41 |
Serialize | 10 | 134.7028 us | 0.2724 us | 134.6870 us | 4.17 | 82 189,98 |
Deserialize | 100 | 1,664.0922 us | 5.2214 us | 1,663.7990 us | 43.35 | 897 572,65 |
Serialize | 100 | 1,299.8021 us | 5.5256 us | 1,299.4652 us | 43.05 | 855 575,72 |
Deserialize | 1000 | 16,609.1412 us | 50.8539 us | 16,617.6777 us | 328.53 | 6 866 461,43 |
Serialize | 1000 | 12,952.0301 us | 41.9472 us | 12,944.2135 us | 407.87 | 8 225 151,30 |
Deserialize | 10000 | 167,776.6216 us | 104.4235 us | 167,751.4143 us | 4,032.00 | 80 390 181,95 |
Serialize | 10000 | 130,453.3570 us | 86.7287 us | 130,467.7205 us | 4,703.00 | 92 042 180,70 |
Sample model
var order = new Order
{
    Id = DateTime.Now.Ticks,
    MerchantId = Guid.NewGuid(),
    OrderNo = "2016-1234",
    CustomerId = Guid.NewGuid(),
    Tags = new[] { "Test1", "Test2", "Gold customer" },
    PlacedAt = DateTime.Now.Subtract(TimeSpan.FromDays(2)),
    Status = OrderStatus.Payed,
    IsShipped = true,
    FreightCost = 33.50M,
    Amount = 1300M,
    Discount = 100,
    AmountToPay = 1233.50M,
    Lines = new List<OrderLine>
    {
        new OrderLine
        {
            ArticleNo = "Article-Line0",
            Qty = 42,
            Props = new List<Prop>
            {
                new Prop
                {
                    Name = "Key-Line0-Item0",
                    Value = "Value-Line0-Item0"
                },
                new Prop
                {
                    Name = "Key-Line0-Item1",
                    Value = "Value-Line0-Item1"
                }
            }
        },
        new OrderLine
        {
            ArticleNo = "Article-Line1",
            Qty = 3,
            Props = new List<Prop>
            {
                new Prop
                {
                    Name = "Key-Line1-Item0",
                    Value = "Value-Line1-Item0"
                },
                new Prop
                {
                    Name = "Key-Line1-Item1",
                    Value = "Value-Line1-Item1"
                }
            }
        }
    }
};
What change made the biggest difference?
The code base this comes from is a few years old, and it contained a naive optimisation attempt: the StructureBuilder was using Parallel.For to create structures, but this only kicked in if the number of structures was 100 or more (no idea where that number came from); a rough sketch of that dispatch follows.
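Roughly how that threshold dispatch could have looked (a reconstruction for illustration only; CreateStructuresSequentially is a hypothetical name, and the actual removed parallel method is shown below):

public IStructure[] CreateStructures<T>(T[] items, IStructureSchema schema) where T : class
{
    // The parallel path only kicked in at 100 items or more.
    return items.Length < 100
        ? CreateStructuresSequentially(items, schema)
        : CreateStructuresInParallel(items, schema);
}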
REMOVED
private IStructure[] CreateStructuresInParallel<T>(T[] items, IStructureSchema schema) where T : class
{
    var structures = new IStructure[items.Length];

    Parallel.For(0, items.Length, i =>
    {
        var itm = items[i];
        structures[i] = new Structure(
            schema.Name,
            CreateIndexes(schema, itm));
    });

    return structures;
}
Parallel.For was also used in the StructureIndexFactory, in which the StructureIndexes are created. So one master parallel flow was calling into a child parallel flow, which in turn created pre-sized arrays represented by IEnumerable<IStructureIndex>[]. In the end this wasn't returned as-is; instead the null values were filtered out and the result returned as an array:
public IStructureIndex[] CreateIndexes<T>(IStructureSchema structureSchema, T item) where T : class
{
    var indexes = new IEnumerable<IStructureIndex>[structureSchema.IndexAccessors.Count];

    Parallel.For(0, indexes.Length, c =>
    {
        var indexAccessor = structureSchema.IndexAccessors[c];
        var values = indexAccessor.GetValues(item);
        var valuesExists = values != null && values.Count > 0;
        if (!valuesExists)
            return;

        var isCollectionOfValues = indexAccessor.IsEnumerable || indexAccessor.IsElement || values.Count > 1;
        if (!isCollectionOfValues)
            indexes[c] = new[]
            {
                new StructureIndex(indexAccessor.Path, values[0].Path, values[0].Value, indexAccessor.DataType, indexAccessor.DataTypeCode)
            };
        else
        {
            var subIndexes = new IStructureIndex[values.Count];
            Parallel.For(0, subIndexes.Length, subC =>
            {
                if (values[subC] != null && values[subC].Value != null)
                    subIndexes[subC] = new StructureIndex(
                        indexAccessor.Path,
                        values[subC].Path,
                        values[subC].Value,
                        indexAccessor.DataType,
                        indexAccessor.DataTypeCode);
            });
            indexes[c] = subIndexes;
        }
    });

    return indexes
        .Where(i => i != null)
        .SelectMany(i => i)
        .Where(i => i != null)
        .ToArray();
}
For now, this implicit construct has been replaced with a simple List<IStructureIndex>, and both populating the values and excluding the null values are done in a plain sequential for-loop instead. This was the change that had the biggest impact on both time and memory allocation.
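As a simplified sketch of the sequential shape (my reconstruction using the same IStructureSchema and index accessor members seen above, not the exact new source):

public IStructureIndex[] CreateIndexes<T>(IStructureSchema structureSchema, T item) where T : class
{
    // One pre-sized list instead of nested Parallel.For and jagged arrays.
    var indexes = new List<IStructureIndex>(structureSchema.IndexAccessors.Count);

    for (var c = 0; c < structureSchema.IndexAccessors.Count; c++)
    {
        var indexAccessor = structureSchema.IndexAccessors[c];
        var values = indexAccessor.GetValues(item);
        if (values == null || values.Count == 0)
            continue;

        for (var v = 0; v < values.Count; v++)
        {
            // Null values are excluded while populating, so no trailing
            // Where(...)/SelectMany(...) pass is needed.
            if (values[v] == null || values[v].Value == null)
                continue;

            indexes.Add(new StructureIndex(
                indexAccessor.Path,
                values[v].Path,
                values[v].Value,
                indexAccessor.DataType,
                indexAccessor.DataTypeCode));
        }
    }

    return indexes.ToArray();
}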
That's it for now. Will profile and see what more can be done.
//Daniel