Monday, February 18, 2013

Data Warehouse Architecture

One of the first decisions any Institution will have to make is what Data Warehouse architecture to commit to. I use the word 'commit' with intention. While this is hardly an all or nothing endeavor (few if any organization can claim to have any one single architecture) if you want to arrive somewhere it's important to have that goal in mind because there will be an enormous number of decisions to be made along the way.

Below is an overview of the major approaches to an 'Information Warehouse'. Two cautions, one these are just overviews and two I am taking a liberal view of the term "Data" or "Information" Warehouse (yes, I know many would not consider an ODS a DW, but let's just look at them all for sake of discussion).

1. Kimball
The classic Dimensional Model that most seem to associate with a DW. Here, data are acquired from their source system (ETL) and delivered into a target Dimensional design or Data Warehouse. Within the larger DW exist data marts of subject-are focused data sets which are brought together via the use of conformed dimensions. Our BI tools will then sit on top of the DW.

2. Inmon
Bill Inmon is credited with coining the term Corporate Information Factory and has also been, incorrectly, accused of being opposed to the Dimensional design Kimball is so famous for. He is not. Don't feel bad, for years I too was operating under this false premise assuming he was opposed to the Dimensional Model and advocated reporting from a model in 3NF (3rd normal form) - aka one similar to our source OLTP. We stopped listening at this point I guess.

The Corporate Information Factory does in fact contain data marts in the form of a Dimensional Model, fed from the Enterprise Data Warehouse (EDW),  it's just where (or at what stage in the entire data flow) these data are accessed that differentiates Inman with Kimball. Again, Inman advocates the EDW to be in 3NF and then feed into subject-are focused data marts (in dimensional format) while Kimball will allow the 'EDW' to be in dimensional format and accessed directly - simply 1 less degree of separation from the end user. These differences are likely only ever experienced by our Data Architects and ETL developers.

So why all the controversy? Who knows....

3. Departmental
Almost a natural extension of life in today's organization. Requests to IT are answered with not enough time, resources, or funding to address business needs leads to a proliferation of self-made, self-grown shadow systems that ultimately transform into 'production' ready lifelines that run our companies. 

Because of this, I am seeing a trend, particularly for consultants, away from the Enterprise Data Warehouse (EDW), especially in large organizations, where more departmentally focused information silos are being maintained and funded. Why? High Cost, diverse requirements, lower Total Cost of Ownership (TOC), quicker time to market, to name a few. This will be an interesting trend to watch. Perhaps more evidence towards distributed IT?

4. Operational Data Store (ODS)
The ODS is common in today's world as it's often quickly built and fairly easily administered. A lot of these are simply copies of our ERP's database over to a new reporting instance, perhaps from the nightly backup, with minor enhancements and a BI layer built on top. Viola.

This very well could be an attractive path for many organizations. Why?

Vendor ODS
a) It's often being sold to us. Literally. All of our ERP systems and most 3rd party software packages are shipping, for a small additional cost mind you, with their own pre-built data marts and associated BI layer to have you up and running quickly and with a small IT footprint. (this is why all of the big ERP vendors swallowed up major BI players). Ellucian sells this as part of their Colleague system using Business Objects as the reporting layer.

It's obvious why this is an attractive option. No big DW project, that spans months and years, with lots of resources costing lots of money. In the real world, I've never seen this dream become reality. In short time, our business users are asking more sophisticated questions that can not be handled by the vendor's  ODS, they want to make tweaks to canned reports that either can't be done or then go back into the IT queue (contrary to why we bought this things in the first place), and if those changes can be made we are at risk of losing support by our vendor because we've modified their product. Most, if not all, of these project that I've seen result in reworks of some kind or even a complete redirection.

It's sounds like I am totally opposed to this approach. I'm not. I do think there is a valid use case for a vendor shipped ODS but I would recommend going into it with eyes wide open being aware of its ultimate limitations and perhaps even committing to it sitting alongside a more traditional DW.

In-House ODS
b) A second type of ODS is largely the same as the one I've just described except it is developed and maintained by our own in-house resource. This has the benefit of providing a relatively quickly built reporting environment while also being, hopefully, more flexible and accepting of change because it is internally administered.

If you go the route of the ODS be keenly aware of what you're getting and what you're not. It may largely deliver basic reporting needs to our business but will struggle, if be able to handle at all, more advanced analytical queries such as historical longitudinal questions where a DW and OLTP take vastly different approaches to historical data.

5. Hybrid
This is a particularly interesting, and at times attractive, compromise between the traditional Dimensional Model and the ODS approach. With core, conformed dimensions centrally built and maintained (often the ONLY place a truly centralized, multi sourced Student or Faculty may exist, for example) they can be joined with more traditional operational (ERP, OLTP) table structures.

I use a Hybrid approach quite frequently for prototyping or POC (proof of concept) during projects before converting to a more traditional DW design. Though, I have as well delivered perfectly working models using this approach that are still in production today with happy users. I think this approach becomes more viable and palatable if we have already built our core conformed Dimensions which are then joined to operational data structures for smaller reporting needs.








No comments:

Post a Comment