notes from a passionate developer





This is a personal blog. The opinions expressed here are my own and not those of my employer, current or previous. All content is published "as is", without warranty of any kind, and I don't take any responsibility and can't be liable for any claims, damages or other liabilities that might be caused by the content.

The life of an archive worker, Bob, and his day of managing documents

Among NoSQL stores there is a particular family that focuses on storing semi-structured data as ONE unit, called a document. This family is called “document stores”, and a document is most often represented as a JSON blob. When it comes to ACID compliance, it’s “guaranteed” for ONE document: either the store accepts the whole blob or it doesn’t. This design makes it easier to scale, e.g. because locks are held for a shorter time. When scaling horizontally you can still get conflicts, since the cluster can become partitioned, but that’s out of scope for this post. This post is about…hmm…Bob and his documents. Oh, and a bit about the connectedness of documents; that is, a document store’s ability to handle relationships.

Wait, can it?

Picture Bob sitting at a boring grey desk in a super big storage facility filled with archive cabinets. The lighting is bad, and Bob has a freaking head-lamp to be able to do his job. What is Bob’s job? He manages documents. On his desk he keeps the most current documents lying in piles (cache), since he doesn’t want to wander around too much in the moldy facility, searching for yet another document to revise.

Bob handles a particular type of document (collection), the ones that are about “something”. But they all look different, since, well, there’s no real structure (schemaless). Now, to make Bob’s life even more miserable, some documents need to be xeroxed, cut out and glued into one or more other documents (embedding). But the worst days for Bob are when someone decides that a couple of documents should link to each other, preferably across more than two levels (linking, a graph rather than a tree). Luckily, Bob finds two of the five documents on his desk (thank god for the cache, hope it was invalidated correctly); the other three he has to fetch from three different locations. At one of them he needs to use a broken ladder (fragmented index) to climb up to a really high (badly sharded) cabinet.
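If you want to see Bob’s trips to the cabinets in code, here is a minimal Python sketch, assuming a made-up in-memory store where each document is a JSON blob and links are stored as a list of ids in a `refs` field. `ARCHIVE`, `Desk`, `resolve` and the document ids are all hypothetical, not the API of any real document store. Embedded data arrives together with the parent document in one read, while every link costs another lookup unless it is already on the desk (cache).

```python
import json

# Hypothetical in-memory "archive": document id -> JSON blob.
# The ids, fields and the "refs" convention are illustrative only.
ARCHIVE = {
    "invoice-1": json.dumps({
        "type": "invoice",
        # Embedding: the customer data is copied into the document itself.
        "customer": {"name": "Acme", "city": "Springfield"},
        # Linking: only the ids of other documents are stored.
        "refs": ["order-7", "order-8"],
    }),
    "order-7": json.dumps({"type": "order", "refs": ["product-3"]}),
    "order-8": json.dumps({"type": "order", "refs": []}),
    "product-3": json.dumps({"type": "product", "refs": []}),
}


class Desk:
    """Bob's desk: a tiny read-through cache in front of the archive."""

    def __init__(self, archive):
        self.archive = archive
        self.cache = {}
        self.archive_trips = 0  # how often Bob had to walk to a cabinet

    def get(self, doc_id):
        # Only walk to the archive if the document is not already on the desk.
        if doc_id not in self.cache:
            self.archive_trips += 1
            self.cache[doc_id] = json.loads(self.archive[doc_id])
        return self.cache[doc_id]


def resolve(desk, doc_id, seen=None):
    """Follow links depth-first and return every reachable document id.

    Each link is a separate lookup, unlike embedded data, which came along
    with the parent document in a single read.
    """
    if seen is None:
        seen = set()
    if doc_id in seen:  # links form a graph, not a tree, so guard against cycles
        return seen
    seen.add(doc_id)
    for ref in desk.get(doc_id).get("refs", []):
        resolve(desk, ref, seen)
    return seen


if __name__ == "__main__":
    desk = Desk(ARCHIVE)
    print(desk.get("invoice-1")["customer"]["name"])  # embedded data: one read
    print(sorted(resolve(desk, "invoice-1")))  # every linked document
    print("trips to the archive:", desk.archive_trips)
```

Reading the embedded customer name takes one trip; resolving the links from `invoice-1` takes three more, one per linked document. The `seen` set is there because links can form a graph rather than a tree, so a cycle would otherwise send Bob up the ladder forever.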

Back at Bob’s desk, he now needs to add footnotes and references (links/relations) to the documents. The work is easily done. FACEPALM! In one of them, Bob realizes he needs to update the value that is being referenced by the other documents. No worries, right? It was only five documents in total and only a straight path of links. Bring the Tipp-Ex. But Bob just knows he’s in for a treat. Why? Well, this document was being referenced by earlier documents, and some nut made it possible to change the referenced value, since that was the only value checked for duplicates (read: no way to create additional unique constraints). Bob starts his work and, 99% of the way through, some other Bob updates the last document (concurrency). What to do? Bob-1 doesn’t care (lack of e.g. MVCC), so he just overwrites the document. After all, it’s getting close to 17:00 (the dominant hour format) and Bob wants to go home to watch that soccer game and have a pint. In his rush he short-circuits a wall socket and causes a fire. No worries. The majority of the other Bobs had received copies of Bob’s work, reached consensus (quorum) and started to divide his documents up among themselves (rebalancing)…
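Bob-1’s blind overwrite is a classic lost update. Below is a minimal Python sketch of one way a store can refuse it: optimistic concurrency with a per-document revision number. This is related to, but simpler than, full MVCC, since it keeps only the latest version. `DocumentStore`, `ConflictError` and the integer revisions are hypothetical; real stores differ in the details (CouchDB, for instance, requires the current `_rev` when updating a document).

```python
import threading


class ConflictError(Exception):
    """Raised when a write is based on a stale revision of a document."""


class DocumentStore:
    """Hypothetical single-node store with a revision number per document.

    A write replaces the whole document (the all-or-nothing unit) and only
    succeeds if the writer saw the latest revision (optimistic concurrency).
    """

    def __init__(self):
        self._docs = {}  # doc id -> (revision, body)
        self._lock = threading.Lock()  # held only for a single document swap

    def get(self, doc_id):
        """Return (revision, copy of body) for a document."""
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, expected_rev):
        """Replace the document if expected_rev matches; return the new revision.

        Use expected_rev=0 to create a document that does not exist yet.
        """
        with self._lock:
            current_rev = self._docs[doc_id][0] if doc_id in self._docs else 0
            if current_rev != expected_rev:
                raise ConflictError(
                    f"{doc_id}: expected rev {expected_rev}, found {current_rev}"
                )
            self._docs[doc_id] = (current_rev + 1, dict(body))
            return current_rev + 1


if __name__ == "__main__":
    store = DocumentStore()
    store.put("doc-5", {"ref": "A-1"}, expected_rev=0)

    rev_bob1, _ = store.get("doc-5")  # Bob-1 reads revision 1
    rev_bob2, _ = store.get("doc-5")  # another Bob reads revision 1 too

    store.put("doc-5", {"ref": "A-2"}, expected_rev=rev_bob2)  # now revision 2
    try:
        store.put("doc-5", {"ref": "A-3"}, expected_rev=rev_bob1)
    except ConflictError as err:
        print("Bob-1 must re-read first:", err)
```

When both Bobs read revision 1, the first write bumps the document to revision 2 and the second write is rejected instead of silently overwriting it. What Bob-1 does then (re-read and merge, or give up and go watch the game) is up to the application.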

The end!
