notes from a passionate developer

Developer that lives by the mantra "code is meant to be shared".




This is a personal blog. The opinions expressed here are my own and not those of any employer, current or previous. All content is published "as is", without warranty of any kind, and I don't take any responsibility and can't be held liable for any claims, damages or other liabilities that might be caused by the content.

NoSQL and non-explicit schemas do not free you from modelling

Daniel Wertheim

This will be a really short post about the whole idea of flexible schemas in NoSQL land, and how easy it is to store data and let its design evolve over time. “You just throw some JSON in there and then start harvesting the data with some nifty queries.” I’m saying no, you are not freed from thinking about your data model, because its shape will need to vary depending on how you will consume the data. Will it be read or write intensive? What granularity do you need? How will your application consume the data?

I have written about this before. But this time, let’s take MongoDb as an example. I’m by no means a MongoDb expert (I just like this subject), so instead I will point you in the direction of someone who is more experienced with it than I am. In this nice webinar, “Time-Series Data in MongoDb – by Sandeep Parikh”, from around 12min 18s to 13min 32s, a modelling scenario is described: time-series data is stored using “a document per hour by second” approach. One document represents a server’s load at each second during one specific hour. That means one load value is written to the document for each second of the hour. That is one initial write and 3600 updates (or 3599, if the initial write also writes a load value). No problem, right? Well, as it turns out (according to the webinar), since the load values are stored as a nested document:

    server: "server1",
    load: {0: 15, 1: 20, ..., 3598: 45, 3599: 40},
    ts: ......

MongoDb apparently has to walk 3599 steps to update the last second! And you thought you could just throw the data in there?!? Of course there are solutions, and in the webinar the next solution uses a slightly different modelling technique that solves the problem. Watch the webinar, even if you, just like me, aren’t a MongoDb user. The point is: know your data. Know your application and how it will consume the data. And do not be too naive about the platform you have chosen to store your data.
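To make the numbers a bit more concrete, here is a minimal Python sketch. It is not taken from the webinar: the function names are hypothetical, and the cost model (one step per field skipped before reaching the target) is a deliberate simplification of the behaviour described above. The nested "by minute, then by second" layout is my assumption of what a slightly different modelling technique could look like, so treat it as an illustration rather than the webinar's exact solution.

```python
# Simplified, hypothetical cost model: a field lookup inside a nested
# document is treated as a linear scan, costing one step per field skipped.

def flat_steps(second_of_hour: int) -> int:
    """Steps to reach a second in a flat map: load = {0: .., ..., 3599: ..}."""
    return second_of_hour


def nested_steps(second_of_hour: int) -> int:
    """Steps to reach a second when load is nested as {minute: {second: ..}}."""
    minute, second = divmod(second_of_hour, 60)
    return minute + second  # skip `minute` outer fields, then `second` inner fields


def flat_update(second_of_hour: int, value: int) -> dict:
    """Update spec using dot notation for the flat layout."""
    return {"$set": {f"load.{second_of_hour}": value}}


def nested_update(second_of_hour: int, value: int) -> dict:
    """Update spec using dot notation for the assumed nested layout."""
    minute, second = divmod(second_of_hour, 60)
    return {"$set": {f"load.{minute}.{second}": value}}


if __name__ == "__main__":
    last_second = 3599
    print("flat  :", flat_steps(last_second), flat_update(last_second, 40))
    print("nested:", nested_steps(last_second), nested_update(last_second, 40))
```

Under these assumptions, updating the last second costs 3599 steps in the flat layout but only 59 + 59 = 118 in the nested one. The data is the same; only the shape of the document changed, which is exactly why the model still matters.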


