Relational databases like MySQL, MSSQL, and PostgreSQL are everywhere in today's computing world. Are they always necessary, though? Perhaps a more appropriate question is: Are relational databases the best solution for storing data created with object-oriented programs? Enterprise software developer, consultant, and 2007 JavaOne speaker Ted Neward addressed that very issue in a paper published last year entitled "The Vietnam of Computer Science." This "Vietnam" article was discussed heavily in the Java development community, which raised enough post-publication issues to warrant a follow-up piece. Today the second "Vietnam" article is available on ODBMS.org. In the interview below, Ted and db4objects CEO Christof Wittig discuss the problems presented by object/relational mapping, offer some potential solutions to those problems, and explain why programmers continue to make the same database mistakes despite the quagmire frequently presented by ORM.
Let's clearly define the problem so that readers are up to speed. What's wrong with the state of software development as it pertains to database usage?
Ted Neward: I would say that it's not so much that it's wrong -- we don't have anything truly wrong with software development. The problem that the "Vietnam" piece addresses is that frequently developers were being led into this sense that object/relational mapping tools would somehow completely eliminate object/relational impedance mismatch. Unfortunately that's a false premise -- it's not going to hold up. You can partially eliminate it, but you can't completely eliminate it. Does that mean that all relational databases are evil? No. Does it mean that people are not getting useful things done with object-oriented languages and relational databases? No. What it does mean is that people are frequently falling into a sequence of problems, particularly problems that assume things like, "Hibernate is going to encapsulate completely my access to the relational database." So now you have a complex table query, a complex schema, you have a problem where your DBAs are giving you grief about generating reports from tables that were created using the Hibernate imaging utilities -- whatever. I'm not picking on Hibernate because anything's intrinsically worse about it than other ORMs -- it's just the most popular.
That was basically the point of the original essay. Not that we're all doomed to failure -- just that we're running into the same problems over and over again.
Okay, so what's the ideal solution to the problem as you defined it above?
TN: I don't think there is an ideal solution. I'm actually a big believer in the pattern approach to looking at things. And not so much patterns in the way that a lot of people have envisioned them as code blocks, but patterns as in an analytical approach. Every problem has a solution that, when applied within a given context, yields certain consequences. It's the four-part tuple that makes all the difference in the world.
There are certain situations where a relational database is the ideal solution, there are other situations where an object database is the ideal solution, and there are certain situations where neither one is going to lead you toward your ideal persistence scenario. Unfortunately this is true throughout the technology industry. Whether we're talking about persistence, programming languages, or user interface frameworks, none of them are perfect. None of them are ideal. We tend to reach for that silver bullet, but it doesn't exist, and we keep getting frustrated and upset when we find out that some new technology isn't the silver bullet that makes all of our problems go away.
Do you think that databases in general are overused?
TN: I think developers have gotten to a point where they assume that a database is the only option for storing data. I think it's fair to say that relational databases get overused. I think it's fair to say that relational databases are at times applied in situations where they're overkill, where they're not necessary. At the same time I don't want to imply that relational databases must die. Clearly the IT industry has solidified behind them for the past 30 or 40 years for a good reason -- they do a good job of presenting data in a set-oriented format. The problem is, too many developers walk up to a project and ask themselves not if they should use a relational database, but rather which one they should choose.
Christof Wittig: I find it quite interesting that the discussion is so hotly debated on the server side -- in an audience of developers who write traditional enterprise-type applications where you usually have an abstraction between the database and the application. You have dual ownership -- who owns the code, and who owns the data model? The latter is the DBA, not the developer. So it's quite interesting that this discussion is so prominent in this space when db4o was originally positioned in the embedded space. In the device space and in packaged software and real-time control systems we don't have that discussion because people do not use relational databases there. You don't install a relational database on your cell phone, but people will simply serialize objects, so they will forgo the benefits of a powerful persistence solution. This is speaking of transaction safety, updatability, schema versioning, and so on, which they will get with db4o -- they'd get it with a relational database too, but they cannot deploy it there. So in our space -- in the embedded space -- it's between "write it yourself" or use something like db4o, which fits with that environment, without incurring the cost of object/relational impedance mismatch. You can mitigate that on the server side with object/relational mappers, but you can't meet persistence needs. We find that, for instance, in investment banks, they are using db4o as a cache of objects on their clients' trading desks to reduce network traffic and get lower latency.
I don't know how it is in the rest of the software world, but I've learned as a Web site system administrator that it's a good idea to avoid relational database-driven programs when they aren't absolutely necessary. Nearly every problem I have with my sites in terms of performance tuning, resource usage, maintenance, search engine friendliness, security, portability to other platforms, backups, the frequency of updates, is directly caused by or has its roots in a database like MySQL or PostgreSQL. I've found that a lot of Web applications written in object-oriented languages like PHP frequently use a relational database to store information that would be more appropriately stored in a directory, static HTML files, or even a simple text file. Given these inherent database problems and the complexity of programming for an SQL database, why do you suppose so many developers take this route, even when it's far from the best solution?
TN: Dogma. At the end of the day, it's all dogma. The Java community seems to be more vulnerable to this than other communities that I participate in, but I think it's happened throughout the programming industry as a whole. There's a very dogmatic element that seems to permeate a lot of our thinking. People think that they have to have a database to store stuff, and that creates the kind of problems that you're running into with Web sites.
In some cases people are saying, "Okay, we know our site is going to go international," so they might use a database to store strings that are internationalization-sensitive -- language, currency, time format, and that sort of thing. There it makes sense to put it in an environment where data can be easily retrieved and so forth. Now could you store it in a directory format, use different filenames? Absolutely, there's no argument there. But not particularly for traditional programmers who were practically raised on relational databases who have graduated from college and taken their first job, and they know no other persistence mechanism than a relational database.
The first generation of ODBMS vendors never really achieved much penetration into the enterprise space. As a result, if you took a poll of the attendees of JavaOne and asked them how many have used an object database, I think the number would be statistically insignificant. I'd be willing to venture a guess of less than 1%. There's just a dogma that says, "I must have a relational database as my persistence method."
CW: I can give you some very interesting data on this. Within a year since we started doing business in China, the Chinese community has overtaken the U.S. developer db4o community by far. In China people don't have that legacy -- that dogma -- they don't know the bad history of object databases, so they simply love the benefit to productivity that they get from it. We have a much higher adoption rate there where this dogma is not in place.
Traditionally, programmers have enthusiastically implemented or at least auditioned new languages, techniques, and technologies. The sudden explosion in popularity of Java in the late 90s, and more recently Ruby on Rails and Ajax, is evidence of this. Why haven't software developers embraced object databases as rapidly? Why aren't they more popular if they're a more elegant solution?
CW: You assume that they're not popular. They're not popular with a certain group of people that have been in the database discussion wars of the 1990s -- they still have that dogma. We currently have logged more than 1 million downloads in a short amount of time; we have more than 20,000 registered developers who are intense users of db4o; and then there are commercial implementations with customers like Boeing, Ricoh, car manufacturers, and cell phone makers that use db4o very passionately. So I wouldn't say that they're not liked -- they're just not that hyped. I don't think that hype is a good thing, though, because it tends to be short-lived. Databases are something more conservative. I think that was one other answer to your question. People want to stick around with a database choice for many years -- I think that will make a difference as well.
TN: I'm not sure I can agree completely with everything Christof's said. Most notably, I'll go out on a limb here and say that I don't think object databases are popular at all. Sorry Christof!
CW: <laughs> It's okay.
TN: But Christof comes more from the embedded space, I come more from the enterprise space. In the enterprise space if you walk into a big enterprise consulting job and say you're going to choose an object database, you're liable to get tarred and feathered. That's something I generally try to avoid in my consulting engagements. I think the bigger issue here is that technologies like Ruby on Rails, in many cases they don't have a recent history. Ruby in particular -- the language -- is not the "gateway drug." Rails is -- it gets people into Ruby. Once people go down the Rails path -- really the discussion there is all about productivity. Ruby, in many respects, is the Visual Basic of the Java community. In the .NET world where we have these tools that are geared more toward rapid application development and productivity and so forth, we don't see as much of the Ruby excitement.
So I think what's happened here is, because so many of the enterprise developers do have some history -- or at least have connections with somebody who has some history -- with the object database... if you're a young college intern programmer on his first job, and you scour the Internet looking for the largest possible pool of projects to draw from to make a good impression on your boss and the rest of your team, and you come across db4o and bring it back to them and say, "Wow, you guys have got to check this out -- it could really improve our productivity!" Somebody on that team is likely to say, "Aw, dude, object databases? Boy are you young. Let me tell you how that worked out -- let me tell you about when we tried that back in 1947." So in many cases I don't think the object database even gets a fair shot or trial.
Also, a friend of mine -- Glenn Vanderburg -- does a presentation at a variety of software conferences called "Everything old is new again." In particular he points out that there is sort of a hype curve. We talk about something and think it's really exciting and will change the world and make everyone's lives better, the men will be strong, the women will be beautiful, the children will all be well-behaved. Then people start using that technology and blindly adopting it in places where they very likely shouldn't, they get some spectacular failures, and then the backlash begins. Then people say, "Ah crap, I can't believe we ever thought that this would be at all useful." People start avoiding and badmouthing it. It becomes basically the hip thing to disrespect. EJB for example, or Web services, or whatnot. Then eventually you reach the trough of understanding, where someone says, "Why did that thing fail so miserably?" They start looking at it without the hype surrounding it, and they start to realize that there is a place where it is applicable -- just not in all the places they originally thought. Then they figure out how to use it more successfully. Just about every technology that we see and use today has followed that same curve. Certainly object languages followed that curve, the relational database followed that curve...
CW: Linux!
TN: Well, Linux is definitely one... it's definitely one perspective.
Remember the notion of objects -- Oracle was building the object database, and people snickered and laughed. Sure enough you don't see Oracle talking about objects anymore. OS/2 was going to be the object-oriented operating system, and so on and so forth. And then objects became passe, but then we started thinking about objects in a more cohesive and coherent fashion, and started to use them more appropriately. Certainly the relational database will follow that curve as well. There was a time when people in the IT department would have said that you could put your relational database in when you pry my flat files from my cold, dead fingers. It's that notion, as Christof said, that people are concerned about changing... but also that sense of -- people have to go off and abuse the technology before they figure out how to use it correctly. Believe me, the same thing is going to happen -- if it's not happening already -- with Ruby on Rails. I've talked with a lot of people who are using Groovy or have used Groovy and are shaking their heads when they hear how Ruby on Rails is being applied on certain projects. They know that those are bad uses for the technology, and that they will give Ruby on Rails a bad name. After that point comes the trough of understanding, and people find ways to use Ruby on Rails that are genuinely successful.
As long as there has been the idea of commercial open source software, there have been theories on how to make money from it without violating the spirit of the open source community. Apparently what you're doing with db4objects works well for you. What advice would you give other open source projects or companies with regard to balancing the need for profit with the desire to keep the code open and available for others to use, study, and modify?
CW: With the dual license model that we run, which is the same one that MySQL, Sleepycat, TrollTech, and many others use, I think commercialization and open source code are not a contradiction at all. The source code with db4o comes under the GNU General Public License version 2, which is well-understood and people know how to handle it. If people use the GPL-licensed code and want to redistribute the work, they have to reveal the code as well. It's a tit-for-tat license and people understand and are very comfortable with that. But there are some who say that they like db4o but can't open-source their own work, so they use the alternative commercial license. If you are commercial, we are commercial. If you are open-source, we are as well. That is the basic principle for the dual license model. In addition, db4o has very successfully created a third licensing option which we call the db4o open source compatibility license (dOCL), which is basically an agreement that open source projects can use and redistribute db4o even if they are not under the GPL themselves. So we have basically created a redistribution license for GPL-licensed code without constraining projects like Apache or Eclipse. So people use db4o extensively in a large number of non-GPL open source projects. Our community traction is really large, so as I mentioned -- Fedora Linux, Spring modules, Interface 21 -- build db4o proactively into Spring modules. All this was made possible by a friendly approach to the open source community. So I think it's not a contradiction at all -- it's the best of both worlds, and to the benefit of everybody.
What's the question I didn't ask in this interview, but should have? What should appear below this line?
TN: In terms of the paper itself, hopefully it stands alone in expressing its message. Probably the most hotly debated aspect of it has nothing to do with technology, but more the appropriateness of the analogy. There are a couple of people who have made some very scathing remarks in email and blog posts, saying that nobody ever died from ORM, and so forth. To which my response is that I'm sorry if anybody's offended by the appropriateness of the analogy, but hey -- the war is 35 years over. We should really be at a point where we're moving on. Regardless of whether you had family who perished in Vietnam or whatnot, the appropriateness of the analogy still stands in that... if you read McNamara's memoirs, the United States government very well understood the fact that it was walking into a potential quagmire situation, and still chose to do it believing it could manage it. To me, that's the crux of the whole thing. Many developers walk into the object/relational world thinking that they can manage the potential quagmire that an ORM can present to them, believing that they can manage the dual schema problem, and the network traversal problem. Then they discover much later, after it's far too late for them to pull out and start over, that they really haven't managed it at all and that something is biting them in the ass in a significant way.
This is not intended as a general call to avoid or banish ORM, but to simply be aware of the problems that are intrinsic in ORM, and to deal with it in one of several fashions. It doesn't mean you have to go off and use an object database, and in some cases it means use the ORM for the 80% of cases where it solves your problem, and drop back to traditional SQL for the remaining 20%. But if you really want to close the loop completely, an object database does a great job of doing that. Or we could talk about integrating relational concepts directly into our programming languages, which is where some of the research work is going on now.
Ultimately the paper is simply saying, "Look, there is this problem that all of us keep running into. Let's point out the white elephant in the middle of the room and talk about it. Let's discuss it. Let's look at solutions, of which the object database is an attractive one, and let's go from there."
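To make the "ORM for 80%, SQL for 20%" suggestion above a little more concrete, here is a minimal editorial sketch; it is not code from either interviewee. It assumes Python's built-in sqlite3 module and a hypothetical, hand-rolled OrderMapper class standing in for a real ORM such as Hibernate. The Order class, the orders table, and the totals_by_customer function are likewise invented for illustration. Routine object loads and saves go through the mapper, while the set-oriented report is written directly in SQL.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical domain object used only for this illustration."""
    id: int
    customer: str
    total: float


class OrderMapper:
    """Hypothetical hand-rolled mapper standing in for a real ORM."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
        )

    def save(self, order):
        # The common case: persist one object as one row.
        self.conn.execute(
            "INSERT OR REPLACE INTO orders (id, customer, total) "
            "VALUES (?, ?, ?)",
            (order.id, order.customer, order.total),
        )

    def get(self, order_id):
        # The common case: load one object by its key.
        row = self.conn.execute(
            "SELECT id, customer, total FROM orders WHERE id = ?",
            (order_id,),
        ).fetchone()
        return Order(*row) if row else None


def totals_by_customer(conn):
    """The remaining case: a set-oriented report written directly in SQL."""
    cursor = conn.execute(
        "SELECT customer, SUM(total) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")
mapper = OrderMapper(conn)
for order in (Order(1, "ada", 10.0), Order(2, "ada", 5.0), Order(3, "bob", 7.5)):
    mapper.save(order)
```

Calling mapper.get(2) returns an Order object, while totals_by_customer(conn) returns plain aggregated rows that never pass through the object model. The design choice is simply to keep the reporting query outside the mapper rather than forcing it through object traversal.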
CW: When we read Ted's first "Vietnam" article I thought it was a real tide change. The consequence is that we engaged with each other and set up the second paper. What I thought was very remarkable was that, for the first time, object/relational mappers were not seen as the salvation to all of these problems, but actually on the defensive -- they were a quagmire. They are not a clean solution to a very fundamental problem in the computer industry. I think that is facilitated by the fact that people have converged on Hibernate as the leading object/relational tool. In the years before, you were hopping from one tool to the next, hoping that one would make the pain go away, but it simply didn't because of the underlying structural problems. And now, as we know, Hibernate is sort of the endgame there -- we've found that the problem still does not go away. So I think it's really a tide change -- really a shift in the industry. That doesn't necessarily automatically make object databases popular -- I'm totally aware of that -- but I think it will take out this preconception that object/relational mappers solve the object/relational problem. The problem is still there. It's even labeled by Oracle as the largest single problem in computer science that is still there, that still causes delays in projects and makes them unmaintainable. I think it's time that people started to have a real discussion about this, and that's what we've tried to achieve with the second "Vietnam" paper -- to see how other options and object/relational mappers address those issues.