I’ve been discussing the “schemaless” view of NoSQL with some fellow developers and have been tweeting a bit about it, and yesterday I started reading a new book: “NoSQL Distilled – A brief guide to the emerging world of polyglot persistence”, Pramod J. Sadalage & Martin Fowler (2013) (Sadalage & Fowler 2013). So far I really appreciate it, and I’ve extracted two sentences from the book that I think are really informative and clear when it comes to schemas and NoSQL:

“… whenever we write a program that accesses data, that program almost always relies on some form of implicit schema” – (Sadalage & Fowler 2013)

“Essentially, a schemaless database shifts the schema into the application code that accesses it.” – (Sadalage & Fowler 2013)

So, even though you don’t explicitly define a schema in the database (naming columns, putting data-type constraints on them, etc.), you do have an implicit schema, expressed by the dependencies from your application model on your data. You also have the notion of an implicit schema inside your database. Let’s say you are using a data-storage solution that lets you author views as map and reduce functions in JavaScript. How often do you find yourself writing logic in your map functions that ensures the current document is of a certain type and/or has certain fields? If you do that, aren’t you ensuring that the document conforms to a certain schema?

Another example that I think highlights this area is a question that commonly comes up when developers used to RDBMSs and O/RMs switch to NoSQL: they start renaming members in their application model and then find themselves in unknown waters, not knowing how to handle the difference between the implicit data schema and the explicit application model. Should they version their application model or start writing mappers? Should they introduce proxies that wrap the raw data entity? Or is there another solution? Could we introduce meta fields in, e.g., the document (if it’s a document-oriented NoSQL store) and act on this metadata when deserializing the document into our application entities? Or could we just accept that the data is part of our history and that, as long as it is interesting, we cannot change names (history cannot be changed)?

You see, there are questions, and I think they all shed light on the fact that there is an implicit schema. To me, schemaless in NoSQL land means that non-uniform data can be stored within the same container, and that it can diverge over time.
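To make the map-function and renaming examples above concrete, here is a minimal Python sketch. It is not tied to any particular NoSQL store: the documents, the `type` and `schema_version` meta fields, and all function names are hypothetical. `customers_by_name` mimics the guards one tends to write in a JavaScript map function, and `to_customer` shows one of the options raised above, acting on a version meta field when deserializing (often called “upcasting”).

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Tuple

Document = Dict[str, Any]

# Hypothetical documents in one "schemaless" container. The customer shape has
# diverged over time: the member "name" was later renamed to "full_name".
DOCS = [
    {"type": "customer", "name": "Ada Lovelace"},  # written before the rename
    {"type": "customer", "schema_version": 2, "full_name": "Alan Turing"},
    {"type": "order", "total": 42},  # another kind of document, same container
]


def customers_by_name(doc: Document) -> Iterator[Tuple[str, Document]]:
    # Map-style function (a Python stand-in for a JavaScript view). Every guard
    # below is an assumption about the document's shape: an implicit schema.
    if doc.get("type") != "customer":
        return
    name = doc.get("full_name", doc.get("name"))
    if isinstance(name, str):
        yield name, doc


@dataclass
class Customer:
    full_name: str


def _v1_to_v2(doc: Document) -> Document:
    # Upgrade a copy so the stored document itself is left untouched.
    upgraded = dict(doc)
    upgraded["full_name"] = upgraded.pop("name")
    upgraded["schema_version"] = 2
    return upgraded


UPCASTERS = {1: _v1_to_v2}  # keyed by the version each function upgrades from
CURRENT_VERSION = 2


def to_customer(doc: Document) -> Customer:
    # Documents without the meta field are treated as version 1.
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPCASTERS[version](doc)
        version = doc["schema_version"]
    return Customer(full_name=doc["full_name"])


names = [key for doc in DOCS for key, _ in customers_by_name(doc)]
customers = [to_customer(doc) for doc in DOCS if doc.get("type") == "customer"]
print(names)  # ['Ada Lovelace', 'Alan Turing']
print(customers)  # [Customer(full_name='Ada Lovelace'), Customer(full_name='Alan Turing')]
```

Note that both functions encode knowledge of the old and new member names; the schema did not disappear, it moved into the code that reads the data. Mappers or proxies would be alternative places to put the same knowledge.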

//Daniel

P.S. You are most welcome to share your thoughts on this subject.

Category:
Architecture, Design, NoSQL

6 Comments

  1. Hi Daniel, great read as always! You are overlooking a few particularities of a NoSQL store: you could, at any moment, have an unknown schema in both the app code and the data store. Take, for instance, a log aggregation system. Somewhere in there you’d have to write magic fairy glue to pull that out of a relational DB, whereas with a NoSQL solution you could do just that: throw everything into it and decide at a later point how you want to describe said data.

    Just my 2 cents ;)

    • Hi,

      I kind of wrote this piece a bit “angled” to get opinions, and to be clear for everyone who doesn’t know me: I really, really like NoSQL solutions, so this is not me trying to talk them down. With that said, I think you are making a valid point, highlighting the fact that with an RDBMS you need to be explicit upfront. You have to think through what shape your data is in, what constraints it has, etc. But I would say you still have to think about your aggregates of data and their shape in NoSQL land as well. Of course, you don’t have to; you could just store the data for later consumption. But the point is, I don’t think you are freed from thinking about how your data will be consumed at a later point. E.g., is the data one aggregate, or would it be more appropriate to split it up into two or more with links in between? And when you start consuming it, will you only be reading it, or will you always materialize it into a model that has 100% of the members before writing it back? Otherwise, what happens when you put that data back into a NoSQL store that does not support partial updates (some do)? But yes, it’s most certainly easier to put data in and to consume it as well. I just don’t think it frees you from thinking about how it should be consumed. And when you think about this, you are kind of “shaping” the data and giving it an implicit schema, right?

      Cheers,

      //Daniel

  2. I agree with most of your points in that, generally speaking, you are in one way or another constrained by a schema, be it by an upfront design of how the data will be used or by how it is consumed and represented. That said, most, if not all, stores do in fact allow you not only to partially update said data but also to transform it into something else altogether. Take, for instance, map/reduce functions, which most NoSQL solutions offer: you could say they are almost equivalent to a view in a relational database, but there are NoSQL stores that do allow you to patch/transform the data given, for instance, a map/reduce function.

    I hope this throws some light onto what I tried to explain in my previous post.
    <3 Rei

    • And I agree with your points as well, and I find it interesting to hear views, opinions, etc. on the subject of “working with a NoSQL store on a daily basis” :-) I’m not saying that having an implicit schema is a bad thing. And yes, maps let you transform your data to fit your needs, mapping it to a custom view model that you can decide on and create at a later time, but they still operate on the aggregate (root), so it’s not a case of “throw it in there and let’s consume the data however/whenever”.

      Cheers,

      //Daniel

  3. I worked on a project that explicitly didn’t have a schema at the application layer, and as a developer I found it hard to work in an application layer where no schema model was defined. Looking at a document DB to see what the schema is just didn’t feel right. Bugs would show up later in the application layer because the model had no schema validation. I think schemas are necessary, at least at the application layer, because then you know that the model works with your application, and if the schema changes, schema validation can identify the changes so you can update your application. There is a performance cost to schema validation, but I think in the long run it’s worth it.

  4. […] I have written about this before. But this time, let’s take MongoDB as an example. I’m by no means a MongoDB expert (I just like this subject), so instead I will point you in the direction of someone more experienced in it than I am. In this nice webinar, “Time-Series Data in MongoDB” by Sandeep Parikh, a modelling scenario is described from around 12 min 18 s to 13 min 32 s: time-series data is stored using a “document per hour, by second” approach. One document represents a server’s load at each second during one specific hour. That means one load value is written to the document for each second of the hour: one initial write and 3600 updates (minus one, depending on whether the initial write also writes a load value). No problems, right? Well, as it turns out (according to the webinar), since the load values are stored as a nested document: […]
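Daniel’s reply to the first comment asks what happens when data is always materialized into a model before being written back to a store that does not support partial updates. A minimal Python sketch of that failure mode follows; the in-memory dict standing in for the store, the `Customer` model and the `loyalty_points` member are all hypothetical and do not represent any particular product’s API.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict

Document = Dict[str, Any]
Store = Dict[str, Document]


@dataclass
class Customer:
    # The (hypothetical) application model only declares this member.
    full_name: str


def load(doc: Document) -> Customer:
    # Materializing into the model silently drops members it does not declare.
    return Customer(full_name=doc["full_name"])


def replace_document(store: Store, key: str, customer: Customer) -> None:
    # Whole-document write: what you are left with if the store has no partial updates.
    store[key] = asdict(customer)


def partial_update(store: Store, key: str, changes: Document) -> None:
    # Partial update: only the listed members are overwritten; the rest survive.
    store[key] = {**store[key], **changes}


def fresh_store() -> Store:
    # "loyalty_points" was written by someone else and is unknown to Customer.
    return {"customers/1": {"full_name": "Ada Lovelace", "loyalty_points": 120}}


# Read-modify-write through the model, then replace the whole document.
store_a = fresh_store()
customer = load(store_a["customers/1"])
customer.full_name = "Ada King"
replace_document(store_a, "customers/1", customer)

# The same change expressed as a partial update.
store_b = fresh_store()
partial_update(store_b, "customers/1", {"full_name": "Ada King"})

print(store_a["customers/1"])  # {'full_name': 'Ada King'}
print(store_b["customers/1"])  # {'full_name': 'Ada King', 'loyalty_points': 120}
```

Either way the application has to know which members exist: the model decides what survives a whole-document write, and the caller decides which members a partial update touches.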
