Why you should be using an Object Database instead of ORM: Part 101-08-2009 7:03 AM permalink
I think traditional relational databases (RDBMS) are generally crappy for as the data store for complex dynamic web sites. But despite the inflammatory nature of that first sentence, I'm going to try not to turn this into a rant. Instead, I'll try to explain the problems I see and why I came to that conclusion.
The problem with RDBMS
First of all, I don't have a problem with RDBMS. I think it has served us very well up to this point and will continue to do so for the forseeable future. It's a perfectly workable solution to the problem of storing and retrieving persistent data.
I don't think anyone will take issue with that argument. So what's wrong with relational databases? The answer has been getting a lot of attention lately so I'll keep it simple and let Wikipedia define it for me. Impedance Mismatch. I pretty much agree most of the points listed, and several of them can be rolled up into one argument. Essentially, RDBMS doesn't fit the object-oriented way we like to use data on the web.
The last sentence is important, because it puts some necessary context on the argument I'm making. I'm gonna go out on a limb and say that the majority of working web professionals have bought into object-oriented programming. That's not to say most people program this way. PHP is still the most popular web language and many people still generate pages in a very top down imperative way. (I'm one of those people who thinks PHP lends itself to this style, but that's a different rant... I mean discussion). Also, some functional languages like Haskell and Clojure are getting attention as being viable for web programming.
But whenever I see examples of people doing anything useful, I see a trend. We like to work with object-like data. More and more websites are being built to provide access to bigger and more complex collections of data. Movies on Netflix, People and their friends on Facebook even boring stuff like all of Physicians in your Insurance network under your employee health benefits plan. So it's more and more essential for programmers to be able to create a good mental model of the data and how to work with it. Creating objects representing these different data collections and then composing them together seems to be the best thing going.
When using an RDBMS, in the simplest form, "object-like" just means lists or arrays that represent rows in the database. This is generally where PHP falls. You use PDO or some equivalent, write your SQL statements and you can access your data like this.
$get_my_employee = 'select * from "Employee"';
$emloyee = $db->query($get_my_employee).fetch(); // simplified on purpose. you get the point
echo $employee["First Name"];
echo $employee["Last Name"];
Not too bad, except for that first bit, but it seems harmless enough so lets ignore it for now. It's just a collection of related data points representing some information on an employee. But things are rarely this simple, and that last line looks suspicious too. Because we're all about normalization and proper organization of our data, we've split all of our Spouses into a separate table and linked them through a foreign key. So making a long story short, that first line isn't so harmless anymore. You have to do your join and specify the fields you want and put your necessary filter clauses on it until you get all your data in one fell swoop.
$get_my_employee = 'select e."First Name", e."Last Name", e."Phone Number", ... s."First Name" from "Employee" e, "Spouse" s where e."SpouseID" = 'schmoopie' and e."SpouseID" = s."ID"...';
I put in ellipses in place of some details to speed things along. But depending on the complexity of your database schema, those ellipses are hiding some headaches. Not to mention the fact that you have two columns with the same name that represent different data points ("First Name"). It's left as an exercize for you to determine which is which by adding even more code. Fun times.
But eventually you end up with one long array with the employee you're interested in along with all the data about their Spouse as well. You completed your task but the journey was a pain, and don't even get me started on debugging and maintaing that long SQL string.
The Problem with ORM
Okay, I think I made the last section a sufficiently irritating explanation of what we already know. I wanted to go through this to drive home a point though. Problems like this are the reason that Object-Relational Mapping (ORM) is all the rage right now.
ORM allows us to take those nasty details and abstract them away. We can get nice simple objects to work with and get back to the real business of making our web page do something cool. More importantly, it allows us to deal with the relationships between our data items in a sane fashion. No more jumping through hoops to get all of our data points into one row. Let's switch to Python/Django just because I haven't worked with any PHP ORMs yet.
employee = Employee.objects.get(id=1)
spouse = employee.spouse
Ah, that's more like it. Even my non-programmer friends (who's eyes glaze over when I talk about code) can piece together what's going on here. And that first line replaces the previous unpleasantness with all the ellipses. So what's my beef with this? On a theoretical level, nothing. I'm a fan of Django precisely for this reason. But from a more practical perspective, this sets off that nervous twitch I get when something is way harder than it needs to be. That RDBMS is still there. Lurking. The ORM evangelists will respond with "yeah, but you don't have to worry about it". First off, I call bullshit on that. It never works out that way. But let's say we're drinking the kool-aid today and I believe that. The spirit of ORM is to literally take something that was designed to work a certain way (RDBMS + SQL), and take lots of time and effort to make it work a different way (Objects + simple filter functions). Wrapping it in layers of abstraction until it doesn't even look like itself. Let's talk about some of the problems.
It's an Abstraction.
When I first started looking at ORM, I thought this was a good idea. Maybe because I was already working with objects as data on a regular basis. One of the platforms I started out on was Zope, which is built on a datastore that represents actual Python objects composed together in a manner of your choosing. It has it's own issues, but in terms of web programming, it felt natural. It took a while before I got "comfortable" with SQL. It has always felt, for lack of a better word icky. The developers of Django's ORM understand this. If you look at the documentation for Django Querying, the api goes through great pains to insure that you can avoid the ickiness. You can represent almost any complex query without writing SQL.
## The example assumes you have a Model representing a Blog and this has a number of Entries related to it.
## To select all blogs that contains entries with "Lennon" in the headline and were published in 2008, we would write:
I'm sure the last of my non-programmer friends have drifted away after that. But if you are a programmer and you're trying to get a handle on how this works, you run into the following bit of nastiness before you even get to this example.
To handle both of these situations, Django has a consistent way of processing filter() and exclude() calls. Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.
That may sound a bit confusing, so hopefully an example will clarify. (the above code example follows)
Even the developers of one of the best schemes out there (in my opinion) have to admit that they've got some crazy juju going on under the hood. This is because they have to take what you're trying to do and put the ick back into it. These simple filters get translated into not so simple SQL. And to top things off, they frequently try to help you along by giving you the equivalent example in SQL. So you frequently end up doing double work.
- Step 1) How would I do this with a complex SQL query?
- Step 2) How do I translate Step 1 to my nifty ORM api so I can get my data as objects?
The complexity of the ORM scheme leaves you wide open for leaky abstraction problems. And often times, when you run into these, you lose a lot of the time you saved by using the abstraction in the first place.
If it doesn't work, you're back where you started
Is this really an acceptable trade-off to be able to work with object-like data? Absolutely. Hence the popularity of Django and Ruby on Rails and Symphony (or whatever php framework is actually in style these days) But to boil it down to the point of this post. Why are we choosing to deal with this impedance mismatch when we don't really have to?
In part 2 to this post, I'll take a shot at talking about object databases and making a case that they should become a more common addition to our web framework architectures.