notes from a passionate developer

Developer that lives by the mantra "code is meant to be shared".




This is a personal blog. The opinions expressed here are my own and not those of my employers, current or previous. All content is published "as is", without warranty of any kind, and I take no responsibility and can't be held liable for any claims, damages or other liabilities that might be caused by the content.

A word or two about NoSQL and it being schemaless

Daniel Wertheim

I’ve been discussing the “schemaless” view of NoSQL with some fellow developers and have been tweeting a bit about it, and yesterday I started to read a new book: “NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence” by Pramod J. Sadalage & Martin Fowler (Sadalage & Fowler 2013). So far I really appreciate it, and I’ve extracted two sentences from the book that I think are really informative and clear when it comes to schemas and NoSQL:

“… whenever we write a program that accesses data, that program almost always relies on some form of implicit schema” – (Sadalage & Fowler 2013)

“Essentially, a schemaless database shifts the schema into the application code that accesses it.” – (Sadalage & Fowler 2013)

So, even though you don’t explicitly define a schema in the database (naming columns, putting data type constraints on them, etc.), you do have an implicit schema, expressed by the dependencies your application model has on your data. You also have the notion of an implicit schema in your database. Let’s say you are using a data-storage solution that allows you to author views using map and reduce functions written in JavaScript. How often do you find yourself writing logic in your map functions to ensure that the current document is of a certain type and/or has certain fields? If you do that, aren’t you ensuring that the document conforms to a certain schema?

Another example that I think highlights this area is a common question that comes up when developers used to RDBMSs and O/RMs switch to NoSQL. They start renaming members in their application model and then find themselves in unknown waters, not knowing how to handle the difference between the implicit data schema and the explicit application model. Should they version their application model or start writing mappers? Or should they introduce proxies that wrap the raw data entity? Or is there another solution? Could we introduce meta fields in, e.g., the document (if it’s a document-oriented NoSQL store) and act on this metadata when deserializing the document into our application entities? Or could we just accept that the data is part of our history and that, as long as it is of interest, we cannot change names (history cannot be changed)?

You see, there are questions, and I think they all shed light on the fact that there’s an implicit schema. Schemaless in NoSQL land is, for me, the fact that non-uniform data can be stored within the same container, and that it can diverge over time.
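To make two of the ideas above a bit more concrete, here is a minimal sketch. It is written in plain Python rather than the JavaScript a real view engine would expect, and everything in it is an assumption made up for illustration: the document shapes, the `type` and `schema_version` fields, and the names `map_customers_by_email`, `Customer` and `to_customer`. The first function shows how the guards in a map function act as an implicit schema. The second shows one possible way of acting on a meta field when deserializing documents whose member was renamed.

```python
from dataclasses import dataclass
from typing import Any, Iterator


def map_customers_by_email(doc: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Map function for a hypothetical 'customers by email' view."""
    # These guards ARE the implicit schema: only documents of type
    # "customer" that carry a string "email" field get indexed.
    if doc.get("type") != "customer":
        return
    email = doc.get("email")
    if not isinstance(email, str):
        return
    yield email.lower(), doc["_id"]


@dataclass
class Customer:
    id: str
    full_name: str
    email: str


def to_customer(doc: dict[str, Any]) -> Customer:
    """Deserialize a stored document, acting on a hypothetical meta field."""
    # Documents without the meta field are assumed to be version 1.
    version = doc.get("schema_version", 1)
    if version == 1:
        # Version 1 stored the name under "name".
        full_name = doc["name"]
    elif version == 2:
        # Version 2 renamed the member to "full_name".
        full_name = doc["full_name"]
    else:
        raise ValueError(f"unknown schema_version: {version}")
    return Customer(id=doc["_id"], full_name=full_name, email=doc["email"])


# Non-uniform documents living in the same container.
docs = [
    {"_id": "c1", "type": "customer", "name": "Ada", "email": "Ada@example.com"},
    {"_id": "c2", "type": "customer", "schema_version": 2,
     "full_name": "Grace", "email": "grace@example.com"},
    {"_id": "o1", "type": "order", "total": 42},
]

index = [row for doc in docs for row in map_customers_by_email(doc)]
customers = [to_customer(d) for d in docs if d.get("type") == "customer"]
```

Note that the old `c1` document is never rewritten here. The history stays as it was stored, and the knowledge of both shapes lives in the application code, which is exactly where the schema ended up.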


P.S. You are most welcome to share your thoughts on this subject.
