I've been trying to log some of the things I had to overcome in learning how to leverage an Object-Relational Mapper (OR/M) for my applications' data access. In my first post I discussed the philosophical leap from Data-Centric to Object-Centric application development. I realize this is not a constraint imposed by all OR/M tools, but my favorite, NHibernate, really leans you toward Object-Centric application development. There are some OR/M tools developed by folks who have a more data-centric view of software development and believe that the table schema dictates your domain model. I don't follow that school, but that discussion is beyond this little series' scope.
In my second post I discussed the hurdle of how an OR/M tool deals with Inheritance. Since inheritance is an object-oriented decision and has no relational (read: Database schema) equivalent, we are forced to deal with the impedance mismatch between object- and relational- thinking.
In this part, I'd like to discuss ways to USE an OR/M and how an OR/M fits into the overall architecture of an application. This kind of thing tends to creep outside of OR/M specifics into broader architectural considerations and I will try to narrow my view, but one thing should become obvious when deciding to leverage an OR/M for data access in an application...IT IS VERY IMPORTANT AND ITS ARM OF INFLUENCE IN AN APPLICATION IS SWEEPING.
WHAT HAPPENED TO MY DATA ACCESS LAYER (DAL)?
If you have been slurping off of Microsoft's nipple of architectural guidance you will no doubt have become drunk with their 'n-tier' or '3-tier' suggestions, the majority consisting of a picture that has these little boxes and circles with pretty colors like these:
Next the tutorials will go on to teach you how to write lots of ADO.NET database access code and then return a DataSet either directly to your 'code-behind' pages (in ASP.NET tutorials) or even get a little wild and have you do some bizarre functions like validation within your mysterious 'Business Logic Layer' (BLL). Since a DataSet isn't actually the object it represents, we are then encouraged to write a number of static helper methods to perform operations upon the DataSet to scrub the data before it appears in the (now bloated) Presentation layer.
What was confusing to me about all of this was the idea that this was somehow 'object-oriented' development, but then I found myself writing the same data access code over and over or having to make changes all over the place whenever something like a field definition or rule changed. I thought OOP (Object-Oriented Principles) was supposed to take away all of this duplication of effort? I find myself slipping into the infamous 'DataSets vs. POCO' argument, which has been elaborated upon by better developers than me. Read this Hanselman post for a good (and fun) account of the problem with using DataSets as Domain Entities.
One nice thing about the ADO.NET way of doing things was that it was pretty cut and dried...you have to hand-code all the data access code to hydrate your domain objects and also to persist changes to your database when you need to. Since you wrote all this stuff, or maybe went a little crazy and used the Data Access Application Block (DAAB) to help you, you know exactly how data gets into and out of your classes. As an aside, I won't even get into Unit of Work or the other things we need to consider, to avoid muddying the water.
But now you decided to use this library (dll) that is supposed to do most of this stuff for you (CRUD functions and so on). Sounds great and my tendonitis just got a little easier to bear with, but now how do I get the data into my objects or save them to the database when I'm all done?
One thing you must remember is that the library you have decided to use (NHibernate.dll for example) is your DAL. When you need to load an object from a database, instead of opening a connection and writing all the mapping code to squirt into your object's properties you now have permission to do this:
MyObject myObject = Session.Get<MyObject>(myObjectId);
Huh? That's it? Yes. That's all.
But it now becomes obvious that these kinds of operations (Get, Save, Update, etc.) can be splattered all over my application and that I have simultaneously become 'married' to an external library (in this case, NHibernate.dll). I'll elaborate on the first point in a moment. Regarding the 'marriage' to an external tool like NHibernate, I will say a few things here and elaborate upon them in a future post. There is a disease called 'Not Invented Here' syndrome which stalks developers (especially .NET developers) and which at first seems like a wise condition to be predisposed to, but in the end can drain the life out of a project's momentum. Realize that this disease is a leading cause of Death By Future-Proofing and that we code not only against an API's specific objects (such as the 'ISession' interface in NHibernate) but also philosophically against a tool's (whether homegrown or not) way of operating. What I mean is that code changes will be required anyway if the tool is ever swapped out, even if we build an elaborate way of protecting ourselves from the very tool we chose to implement. But more on all that later on.
It is true that the manner of calling the various CRUD operations on an OR/M can vary widely across an application and could undermine a large benefit of having an OR/M (besides no more ADO.NET code writing)...code reuse. This is where Design Patterns step in to help us establish a consistent and robust way to employ our OR/M. The Domain Driven Design (DDD) camp has given us the Repository pattern. I am no DDD expert (you may have noticed) but I believe strictly speaking you'd incorporate a Repository only for the Aggregates in your Domain, providing functions like Get(), Save(), Update() and PersistAll(). This smells a lot like the Data Access Object (DAO) and in fact you'll find folks intermingling the terms. There is plenty of discussion of this on the forum for domaindrivendesign.org. The only difference I can find is that a Repository is more strictly applied ONLY to 'Aggregate roots', or those objects within a domain that establish a logical boundary, while a DAO may be freely used for any object in the Domain. The semantics of this aren't important here, but rather the simple fact that a single, reusable object is used to employ the OR/M to access the Database.
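To make the 'single, reusable object' idea concrete, here is a minimal sketch of a generic repository over NHibernate. The IRepository<T> and NHibernateRepository<T> names and their member list are my own illustration, not part of NHibernate or of any published implementation, and the sketch assumes the session factory has been configured with a current-session context so that GetCurrentSession() works.

```csharp
using NHibernate;

// Hypothetical contract: the interface name and members are illustrative only.
public interface IRepository<T> where T : class
{
    T Get(object id);
    void Save(T entity);
    void Update(T entity);
    void Delete(T entity);
}

// One generic class serves every mapped type, so no per-entity data access code.
public class NHibernateRepository<T> : IRepository<T> where T : class
{
    private readonly ISessionFactory sessionFactory;

    public NHibernateRepository(ISessionFactory sessionFactory)
    {
        this.sessionFactory = sessionFactory;
    }

    // Assumption: a current-session context is configured for this factory,
    // and the caller (or some infrastructure) manages the transaction.
    private ISession Session
    {
        get { return sessionFactory.GetCurrentSession(); }
    }

    // ISession.Get<T> returns null when no row matches the id.
    public T Get(object id) { return Session.Get<T>(id); }

    public void Save(T entity) { Session.Save(entity); }

    public void Update(T entity) { Session.Update(entity); }

    public void Delete(T entity) { Session.Delete(entity); }
}
```

The rest of the application asks for an IRepository<Customer> (or whatever the entity is) and never touches ISession directly, which is what keeps the CRUD calls from being splattered around.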
Oren's blog at www.ayende.com is a great resource for an implementation of the IRepository<T> object, and Billy McCafferty wrote an article on using NHibernate with a DaoFactory. What is important to note is that this object (whether you call it Repository or DAO) hides the internals of dealing with NHibernate (especially getting the current ISession) and exposes the simple, but rich, methods your Domain needs to perform persistence operations. But as I mentioned before, unless you really have a requirement that forces your entire application to be tool-agnostic, resist the temptation to build an elaborate API that hides your tool's rich resources (such as the ICriteria implementations or HQL in NHibernate and so on).
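As a rough sketch of what 'not hiding the tool' can look like, the repository below accepts NHibernate's own ICriterion objects instead of inventing a query abstraction on top of them. QueryableRepository<T>, FindAll and FindOne are hypothetical names of mine (this is not the implementation from either of the resources above), and the sketch again assumes a configured current-session context.

```csharp
using System.Collections.Generic;
using NHibernate;
using NHibernate.Criterion;

// Hypothetical class: exposes NHibernate's criteria API rather than wrapping it.
public class QueryableRepository<T> where T : class
{
    private readonly ISessionFactory sessionFactory;

    public QueryableRepository(ISessionFactory sessionFactory)
    {
        this.sessionFactory = sessionFactory;
    }

    // Builds an ICriteria for T and applies each restriction the caller passed in.
    private ICriteria BuildCriteria(ICriterion[] criteria)
    {
        // Assumption: a current-session context is configured for this factory.
        ICriteria query = sessionFactory.GetCurrentSession().CreateCriteria<T>();
        foreach (ICriterion criterion in criteria)
        {
            query.Add(criterion);
        }
        return query;
    }

    // Returns every entity matching all of the given restrictions.
    public IList<T> FindAll(params ICriterion[] criteria)
    {
        return BuildCriteria(criteria).List<T>();
    }

    // Returns the single matching entity, or null when there is none.
    public T FindOne(params ICriterion[] criteria)
    {
        return BuildCriteria(criteria).UniqueResult<T>();
    }
}
```

A caller could then write something like customers.FindAll(Restrictions.Eq("Name", "Acme")), where the Customer entity and its Name property are made up for the example. The trade-off is deliberate: callers now reference NHibernate.Criterion, which is the 'marriage' discussed earlier, in exchange for the full querying power of the tool.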
Where you place this object in your application's project structure is largely personal. I don't want any objects in my Presentation layer to perform data operations (such as a call to Repository<T>.Save(object)) and instead have the Presenters always fetch/save from some kind of Service layer, so it might make sense to place this interface in an assembly that isn't referenced by the Presentation project. I have found it easier, though, to simply place this Repository/DAO in my assembly that provides common functions throughout the entire application.
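Here is a small sketch of that layering, with every name (Customer, CustomerService, InMemoryCustomerRepository and the trimmed-down IRepository<T>) invented for illustration. The in-memory repository is only a stand-in so the example runs without a database; in a real application an OR/M-backed implementation would sit behind the same interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain entity.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical, trimmed-down repository contract.
public interface IRepository<T> where T : class
{
    T Get(object id);
    void Save(T entity);
}

// Stand-in for an OR/M-backed repository; keeps entities in a dictionary.
public class InMemoryCustomerRepository : IRepository<Customer>
{
    private readonly Dictionary<object, Customer> store =
        new Dictionary<object, Customer>();

    public Customer Get(object id)
    {
        Customer found;
        return store.TryGetValue(id, out found) ? found : null;
    }

    public void Save(Customer entity)
    {
        store[entity.Id] = entity;
    }
}

// The Service layer is the only thing a Presenter talks to;
// the Presenter never sees the repository.
public class CustomerService
{
    private readonly IRepository<Customer> customers;

    public CustomerService(IRepository<Customer> customers)
    {
        this.customers = customers;
    }

    public void Rename(int customerId, string newName)
    {
        if (string.IsNullOrEmpty(newName))
            throw new ArgumentException("A customer name is required.", "newName");

        Customer customer = customers.Get(customerId);
        if (customer == null)
            throw new InvalidOperationException("No customer with id " + customerId);

        customer.Name = newName;
        customers.Save(customer);
    }
}
```

A Presenter would call service.Rename(1, "New name") and nothing more. Because the service depends only on the interface, swapping the in-memory stand-in for a real repository requires no change to the Service or Presentation code.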
How you retrieve this IRepository/IDao is an important conversation, but one that will be pushed off to a future post I'm afraid. Inversion of Control, Dependency Injection...I just don't have the energy to go into it right now :).
Hopefully, you'll see that your 'DAL', once you employ an OR/M, has become split across your application. No longer is it found in one assembly; its low-level details are zipped up in your OR/M tool's assembly, while you provide the higher-level code to utilize it. How you do this is up to you, but the driving forces are code reuse (you shouldn't have to write a Repository for every object in your Domain) and richness (you should be able to easily extend the Repository and leverage the rich API your OR/M tool provides, such as querying or caching). The beauty is that you are spending time on DOMAIN concerns and not on mundane data access code...