danielwertheim

notes from a passionate developer

Disclaimer

This is a personal blog. The opinions expressed here represent my own and not those of my employer, nor current or previous. All content is published "as is", without warranty of any kind and I don't take any responsibility and can't be liable for any claims, damages or other liabilities that might be caused by the content.

C# - Parallel deserialization of JSON stored in database

The scenario

A while back I had to yield entities constructed by deserializing JSON stored in a database. The first solution simply opened a sequential, single-result reader against the database, returning a one-column result set containing JSON. Each string was deserialized to the desired entity and then yield returned. Trying to tweak this, I turned to the Task Parallel Library. The idea was to read from the data reader in a separate task while, at the same time, the main thread deserializes the JSON strings and yields entities, so that deserialization overlaps with reading from the database.
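For reference, the sequential starting point could look roughly like the sketch below. It is only an illustration of the description above, not the original code: the class and method names are made up, and the `deserialize` delegate is a hypothetical stand-in for whatever JSON serializer is in use.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SequentialSketch
{
    // Lazily yields the first column of every row as a string.
    // Assumes a single-column result set where each value is a JSON string.
    public static IEnumerable<string> ReadJson(IDataReader reader)
    {
        while (reader.Read())
            yield return reader.GetString(0);
    }

    // Deserializes each string right after it has been read, one at a time.
    // 'deserialize' is a hypothetical stand-in for the real JSON serializer call.
    public static IEnumerable<T> DeserializeSequentially<T>(
        IEnumerable<string> sourceData, Func<string, T> deserialize)
    {
        foreach (var json in sourceData)
            yield return deserialize(json);
    }
}
```

Here reading and deserializing take turns on one thread: nothing is read from the database while an entity is being deserialized, which is the idle time the parallel version tries to use.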

The solution

First, let's be clear: I'm not saying this is a solution that fits all similar scenarios. In my case, reading the strings from the database was faster than deserializing them, but the difference wasn't too big, so the BlockingCollection never grew large enough to cause much extra memory consumption. This is something you have to test and measure for your needs, because everything depends on the scenario. How big are the JSON strings? How many items are there? What does the infrastructure look like?

But let's put that aside and have a look at the solution. The sourceData below comes from yielding the data reader, which represents a single-column result set where each row holds a JSON string. In a separate task I read from it and add each JSON string to a BlockingCollection, which by default wraps a ConcurrentQueue. At the same time, in the main thread, I take (dequeue) a JSON string from the collection and yield return it deserialized.

When the reading from the database is done, the task marks the collection as complete for adding, and the main thread then deserializes the JSON strings that are still waiting in the collection.

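// Needs System.Collections.Concurrent, System.Collections.Generic
// and System.Threading.Tasks, plus the JSON serializer's namespace.
public IEnumerable<T> DeserializeManyInParallel<T>(
    IEnumerable<string> sourceData) where T : class
{
    using (var q = new BlockingCollection<string>())
    {
        Task task = null;

        try
        {
            // Producer: read the JSON strings on a separate task.
            task = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var json in sourceData)
                        q.Add(json);
                }
                finally
                {
                    // Always signal completion, even if reading fails,
                    // so the consumer below never blocks forever.
                    q.CompleteAdding();
                }
            });

            // Consumer: deserialize on the calling thread
            // while the producer keeps reading.
            foreach (var e in q.GetConsumingEnumerable())
                yield return
                    JsonSerializer.DeserializeFromString<T>(e);
        }
        finally
        {
            if (task != null)
            {
                // Rethrows a producer failure as an AggregateException.
                task.Wait();
                task.Dispose();
            }
        }
    }
}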
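If the reader turns out to be much faster than the deserialization, the unbounded collection can grow large. One possible variation, shown here only as a sketch and not something I measured, is to give the BlockingCollection a bounded capacity so that the producer blocks when the buffer is full. The `boundedCapacity` default, the `deserialize` delegate and the class name are assumptions made for the example, and `Task.Run` requires .NET 4.5 or later. A cancellation token is added because a bounded producer could otherwise stay blocked in `Add` if the caller stops enumerating early.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedSketch
{
    public static IEnumerable<T> DeserializeManyBounded<T>(
        IEnumerable<string> sourceData,
        Func<string, T> deserialize,  // hypothetical stand-in for the JSON serializer
        int boundedCapacity = 1000)   // assumed value; tune it by measuring
    {
        using (var cts = new CancellationTokenSource())
        using (var q = new BlockingCollection<string>(boundedCapacity))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (var json in sourceData)
                        q.Add(json, cts.Token); // blocks while the buffer is full
                }
                finally
                {
                    q.CompleteAdding(); // always release the consumer
                }
            });

            try
            {
                foreach (var json in q.GetConsumingEnumerable())
                    yield return deserialize(json);
            }
            finally
            {
                // Unblock the producer if the caller stopped enumerating early.
                cts.Cancel();
                try
                {
                    producer.Wait();
                }
                catch (AggregateException ex)
                {
                    // Swallow the cancellation we caused; rethrow real failures.
                    ex.Handle(e => e is OperationCanceledException);
                }
            }
        }
    }
}
```

The trade-off is that a small capacity caps memory use but makes the reader wait more often, so this too is something to measure rather than assume.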

Again! Measure, test and try it for your scenarios, before accepting it as a solution.
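A simple way to get a first number is to time a full enumeration with a Stopwatch. The helper below is just a sketch with made-up names; it works on any lazily evaluated sequence, so it can be pointed at both the sequential version and `DeserializeManyInParallel<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class MeasureSketch
{
    // Enumerates the whole sequence and reports how many items
    // it produced and how long the enumeration took.
    public static (int Count, TimeSpan Elapsed) Time<T>(IEnumerable<T> sequence)
    {
        var stopwatch = Stopwatch.StartNew();
        var count = 0;

        foreach (var item in sequence)
            count++;

        stopwatch.Stop();
        return (count, stopwatch.Elapsed);
    }
}
```

Run each variant several times against realistic data and discard the first run, since JIT compilation and database caching can skew a single measurement.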

//Daniel
