Does you data make sense, and are you sure? Weird little things sneak their way into code and when it comes to mission-critical data you are better to be explicit than rely on some agreed-upon understanding. My favourite quote at the moment is from an article written by Steve Jones on SQL Server Central:
“Whatever is not explicitly prohibited is therefore mandatory”
This afternoon I came across a query that turned my head and made me question not only the the design of the query but the integrity of the data. The query in question deleted rows from a table based on a combination of conditions and if you were to ever make sure you got a DML statement correct – this has to be top of the list! After much discussion, we all agreed that the query would only remove the desired rows, but not as a result of sound logic. To illustrate our discussion, we jumped into my test database, WorldOfMayhem, and experimented.
In WorldOfMayhem I have a number of characters, each of which have various levels of experience, gold and a current status, isPlaying.
Let’s say we are interested in all players who are currently playing, isPlaying = ‘Y’, a playerID <= 5, experience > 50or gold < 30. Glancing over our players we can quickly determine we would exclude the first player, and return players 2, 3 and 5. Simple enough right? Or is it? If we write this query in exactly the same way as our problem query, we get some unexpected players returned:
Notice that we get players 2, 3 and 5 – but we also get player 4 and 9! Now we weren’t expecting that, so what is going on? The problem here is the or gold < 30. While we appear to be filtering on playerID and isPlaying, and only selecting those players with experience > 50 or gold < 30, what we have failed to do is group the logic and the final condition, or gold < 30. Oops.
With AND and OR – the order matters. It is tempting to think that the OR somehow binds the two conditions experience and gold, but it doesn’t. In fact it is the reverse, the AND binds together the first three conditions:
select * from players where (playerID <=5 and isPlaying = 'Y' and experience > 50) or gold < 30
What we really wanted to write was:
After much discussion, we determined that the problem query would never delete players 4 and 9 within the actual database, or that it wouldn’t matter that if it did, for the following reasons:
- any player where isPlaying = ‘N’ could be happily deleted (but hold on, this was never the intention)
- and it would never be the case that both experience < 50 AND gold > 30 would be true (but do the columns have a check constraint on them – no!)
In the end the query was correct, the database made sense and everyone was happy. But are you happy with this situation? It left me a little unsettled that a pair of parenthesis could potentially be the difference between good data and nonsensical data. A database is there to support business processes. In my opinion business rules (such as the two reasons above) would be better to be explicitly defined in the data model and queries, particularly DML, should be logically sound.