danielwertheim

danielwertheim


notes from a passionate developer

Developer that lives by the mantra "code is meant to be shared".

Share


Tags


Disclaimer

This is a personal blog. The opinions expressed here represent my own and not those of my employer, nor current or previous. All content is published "as is", without warranty of any kind and I don't take any responsibility and can't be liable for any claims, damages or other liabilities that might be caused by the content.

Would you expose your DB for querying using SQL?

Daniel WertheimDaniel Wertheim

I'm guessing that 98% would say "NO", 1% would say "yes" due to building some internal APIs or something. The final percent will say "yes" without really knowing what they are doing. But is this something we even have today? I mean who works with pure SQL directly in the year of 2016? Who dares? Don't we all hear the O/RM fetishists shouting:

If you are using ADO.Net you are doing it wrong!

(If you aren't a .Net developer, ADO.Net is the lower parts communicating with the DB).

Or:

Who do you think would write a better query, you or the wizards building the O/RM?

For all you TL,DR hipsters. There isn't one. If there is. It's the last piece in this post (in the bottom).

Hmm, well let me see. I would, at least in some cases. ;-) Seriously. Maybe, maybe not. But it will probably be a waste of resources in me doing so, since the SQL produced by an O/RM will be good enough in 99% of the cases.

How-ever, if you weren't using an O/RM and instead were authoring the query your self, I think chances would be greater that you actually would reflect and pay some attention to the query itself. I know I would. For starters, I would be writing it in a tool dedicated for actually working with the DB, and in there have the possibility of looking at query statistics etc. And if I feel that the query has a certain degree of complexity, I would profile it. The same of course applies when working with an O/RM. You should use a profiler of some sort, preferably highlighting queries and potential weird access paths like N+1.

What makes databases fast (or slow)? Indexes! Missing indexes will hurt reads. To many will hurt writes. Lets say you have an index on a field. And at one point you decide to use some nifty feature in your O/RM, allowing the consumer to author dynamic queries on the fly. You have now put your self in the hands of the O/RM maker and the consumer. What could possibly go wrong with that? Nothing, right? Because we all know that the O/RM makers are SQL gurus and the consumers knows what fields have indexes etc. OK, the last point we realize is not true, so we limit access on what fields are allowed to be queried on. Sweet! Now we are safe right? Well, not if the gurus have made a design choice to e.g. force case insensitivity in your queries (completely screwing RDBMS concepts like collations, which normally is used to define compare strategies) by doing something like:

select foo from bar where UPPER(foo) == ....  

First, why is this bad? Well it will most likely fuck up the generated query plan that the query engine of the RDBMS will use. Fuck up in the sense that functions operating on row-data in the WHERE-clause, makes it not able to use an possible index on foo. Unless you are using a RDBMS that supports creating function-based indexes. Or do a workaround by indexing against a computed column.

Somewhere, the O/RM makers decided. Hey, if people would switch DB, the switch should be seamless and you should not have to care about the underlying RDBMS. OK. How many switches like this have you actually done? I've done one in like 10 years (not counting tests that execute against SQLite). If you do such a switch, you can't just neglect the underlying RDBMS. You own that! And you will invest time in it. At least so that you can merge your data to it. And BTW, good luck with doing that seamlessly. But perhaps we are talking about seamless switches in blank installations. Fair enough. Then I have one question to you. Would you prefer the O/RM to generate SQL that would get non expected query results that you can find in tests when switching engine; or would you rather have a query that generates expected results but at one point will fail and eat up server resources and generate time-outs? This will also for many people first arise when you hit production and have larger amounts of data and then it really starts to show that your query is not using an index as you thought? I for sure, would rather have my tests fail. I would not like to be called up in the middle of the night, trying to find that there's some SQL generated by an O/RM that screws you over so that an index is not used and therefore stuff starts to time-out. Try finding that quickly. Try finding the convention that you hopefully can hook in and then redeploy. How's that for a seamless switch?

So if you are building something that needs to handle more traffic than my useless blog. For god sake. Whenever you have a fishy/complex query, don't blindly trust what ever O/RM you are using for queries. Consider inspecting the generated SQL. Or do load tests. And if exposing some auto query API, consider limiting the query options against it. So that you ensure that only indexed fields are used.

//Daniel

Developer that lives by the mantra "code is meant to be shared".

Comments