Monday, February 25, 2013

Fact Explosion Part 1


Some of the most frequent modeling challenges in a Higher Ed Data Warehouse has to do with multivalued dimensions and multivalued attributes. The truth is these are not unique to Higher Ed – they occur in every industry and virtually every data mart I’ve ever worked on. The use cases and solutions are also, technically, industry agnostic. What’s interesting in HE is just how pervasive this is in the Ellucian (Datatel) Colleague student system. But again, the modeling techniques are all well documented and fairly common place for a data architect. What I find a bit more challenging is how we represent this at the BI layer and educate our users on the various approaches, their benefits, and their risks.

Let’s start with a simple example report where we want to see ‘Total Financial Aid Amt’ by Student.

Table 1:
Name
FinancialAidAmt
Catherine
$10,000
Total
$10,000

Ok, as simple as it gets, right? 

What happens if we now want to include a student (repeating, multivalued) attribute, perhaps Major? (If we visualize the a potential star schema it may have a Student Dimension and Financial Aid Fact table. If the Student Dim is at the grain of 1 record per student we then have a nice one-to-many relationship between the Dimension and Fact table. If the Student Dimension is not at that grain, or perhaps joins to an outrigger table, and contains repeating rows for each student to accomodate multiple majors, then we now have a many-to-many from Dimension to Fact).

For example:

Table 2:
Name
Major
FinancialAidAmt
Catherine
English
$10,000
Catherine
History
$10,000
Total

$20,000


By simply introducing a multivaued attribute of student we experience a form of ‘fact explosion’ – incorrectly doubling of our Aid Amt measure – and a major reporting challenge. Our ‘Financial Aid Amt’ is not at the grain of student major (ie can’t break down Aid by major, it’s only at the Student, or Person, level).

Aside from the way I have aggregated the Total in Table 2, there is nothing inherently wrong with these data. There are few absolutes in modeling. If asked how one should handle such a scenario the best answer is always the same, “It depends”. Too often we look for the a cookie cutter solution, the “Best Practice”, but what’s more important is understanding what the goals and requirements are and having the tools, techniques, and knowledge base to apply the correct solution to fit the problem.

At a recent TDWI Conference I attended an advanced modeling session given by Chris Adamson (http://www.oaktonsoftware.com/) and he said something I thought was very well stated. He surveyed the audience and asked how many would consider themselves ETL developers, Business Analysts, DBAs, BI Developers, etc. He intentionally omitted a call for ‘Data Architects’ (DA) stating that one of the primary roles of the DA was to balance the shifting burdens of various technical challenges to the right layer (paraphrasing).

In the above example, there are techniques available to us at both the BI and data layers that we can use to fit the use case/s in our environments. Perhaps we’ll need to utilize several as there is rarely if ever only a single use case to solve for.

I’ll discuss the options in a future blog post.

Thursday, February 21, 2013

Collaborative Requirements

I used to work with a guy that was the Roger Ebert of the Business Requirements Document (BRD) critique. In a sick way I think he really looked forward to receiving them and then proceeded to slowly dissect and disintegrate any life form that contributed to their existence. He was a connoisseur and the rest of us were cheap can drinkers. Every BRD he received fell far short of his expectations.

The truth is, the entire system and process was doomed from the start. He relied on those unfamiliar with the DW space to go talk to users about there "needs" while he waited patiently scheduling cron jobs and scripting his latest stored procedure. A somewhat common scenario in the world of corporate silos that maybe relies to heavily on the BSA and separation between tech team and the business. This is a flawed approach and, thankfully, slowly going by the wayside.

If you're part of building the DW, take part, contribute. Roll your sleeves up and get dirty. Ensure you're invited to those critical meetings. BI doesn't succeed with tech geeks that can't understand the problem their trying to solve. We only exist to support the business and need to remember that and ensure we take part in that process. Don't forget about the first word in Business Intelligence.

If you're role is to develop or architect a solution then don't rely on the person with the BSA title to do this work alone. Collaborate on the effort as a team. This seems like less of a problem in HE versus the corporate world, at least where I am at. Maybe I'm lucky.

We all have skin in the game.


Tuesday, February 19, 2013

BI Portfolio & Standardization

In BI the term 'standardize' is often misused and too narrowly defined to mean ONE single tool or platform or report template. This is a false reality and one our user base will either outright reject or comply with frustration. As technologists, we too often want to become part of the story rather than just providing the tools to tell it. Standardization needs to not be narrowly defined as ONE single thing but a set of things (tools, conventions, templates, processess, etc) that can address each business use case.

Of course there is a fine line we straddle here from being streamlined to fragmentation. Most of us can speak to this in our every day lives as users of iPhone and Android devices. One'e greatest sell is providing a single consistent user experience. The other promotes customization and flexibility. A great debate, albeit out of scope for this post, but something we also face on our DW/BI projects. A standard consistent user experience can happily coexist with the flexibility to customize and adjust to changing business needs, though it certainly will require more thought, discussion, and caution.

This is facing us in my current project. We have purchased SAP Business Objects Edge platform and are building our user base and training offerings on the Webi Rich Client. There is this thought though that "Webi" is now our standard BI tool that everyone will use in all (ok most) situations. This is not reality. Webi does not fit every analytical use case nor should one try to make it do so.

We will aim to standardize at the Portfolio level not at the one Tool level around a robust set of options in:

  • Reporting
  • Ad hoc analysis
  • Data visualization, dashboards, mashups
  • MSFT Office integration

We'll be looking at SAPs offerings in each category against some additional players like QlikView, Tableau and Microsoft PowerPivot and PowerView.  It will raise the great Best-of-Breed vs Integrated Solution debate.

I'll update as we go.


Monday, February 18, 2013

Data Warehouse Architecture

One of the first decisions any Institution will have to make is what Data Warehouse architecture to commit to. I use the word 'commit' with intention. While this is hardly an all or nothing endeavor (few if any organization can claim to have any one single architecture) if you want to arrive somewhere it's important to have that goal in mind because there will be an enormous number of decisions to be made along the way.

Below is an overview of the major approaches to an 'Information Warehouse'. Two cautions, one these are just overviews and two I am taking a liberal view of the term "Data" or "Information" Warehouse (yes, I know many would not consider an ODS a DW, but let's just look at them all for sake of discussion).

1. Kimball
The classic Dimensional Model that most seem to associate with a DW. Here, data are acquired from their source system (ETL) and delivered into a target Dimensional design or Data Warehouse. Within the larger DW exist data marts of subject-are focused data sets which are brought together via the use of conformed dimensions. Our BI tools will then sit on top of the DW.

2. Inmon
Bill Inmon is credited with coining the term Corporate Information Factory and has also been, incorrectly, accused of being opposed to the Dimensional design Kimball is so famous for. He is not. Don't feel bad, for years I too was operating under this false premise assuming he was opposed to the Dimensional Model and advocated reporting from a model in 3NF (3rd normal form) - aka one similar to our source OLTP. We stopped listening at this point I guess.

The Corporate Information Factory does in fact contain data marts in the form of a Dimensional Model, fed from the Enterprise Data Warehouse (EDW),  it's just where (or at what stage in the entire data flow) these data are accessed that differentiates Inman with Kimball. Again, Inman advocates the EDW to be in 3NF and then feed into subject-are focused data marts (in dimensional format) while Kimball will allow the 'EDW' to be in dimensional format and accessed directly - simply 1 less degree of separation from the end user. These differences are likely only ever experienced by our Data Architects and ETL developers.

So why all the controversy? Who knows....

3. Departmental
Almost a natural extension of life in today's organization. Requests to IT are answered with not enough time, resources, or funding to address business needs leads to a proliferation of self-made, self-grown shadow systems that ultimately transform into 'production' ready lifelines that run our companies. 

Because of this, I am seeing a trend, particularly for consultants, away from the Enterprise Data Warehouse (EDW), especially in large organizations, where more departmentally focused information silos are being maintained and funded. Why? High Cost, diverse requirements, lower Total Cost of Ownership (TOC), quicker time to market, to name a few. This will be an interesting trend to watch. Perhaps more evidence towards distributed IT?

4. Operational Data Store (ODS)
The ODS is common in today's world as it's often quickly built and fairly easily administered. A lot of these are simply copies of our ERP's database over to a new reporting instance, perhaps from the nightly backup, with minor enhancements and a BI layer built on top. Viola.

This very well could be an attractive path for many organizations. Why?

Vendor ODS
a) It's often being sold to us. Literally. All of our ERP systems and most 3rd party software packages are shipping, for a small additional cost mind you, with their own pre-built data marts and associated BI layer to have you up and running quickly and with a small IT footprint. (this is why all of the big ERP vendors swallowed up major BI players). Ellucian sells this as part of their Colleague system using Business Objects as the reporting layer.

It's obvious why this is an attractive option. No big DW project, that spans months and years, with lots of resources costing lots of money. In the real world, I've never seen this dream become reality. In short time, our business users are asking more sophisticated questions that can not be handled by the vendor's  ODS, they want to make tweaks to canned reports that either can't be done or then go back into the IT queue (contrary to why we bought this things in the first place), and if those changes can be made we are at risk of losing support by our vendor because we've modified their product. Most, if not all, of these project that I've seen result in reworks of some kind or even a complete redirection.

It's sounds like I am totally opposed to this approach. I'm not. I do think there is a valid use case for a vendor shipped ODS but I would recommend going into it with eyes wide open being aware of its ultimate limitations and perhaps even committing to it sitting alongside a more traditional DW.

In-House ODS
b) A second type of ODS is largely the same as the one I've just described except it is developed and maintained by our own in-house resource. This has the benefit of providing a relatively quickly built reporting environment while also being, hopefully, more flexible and accepting of change because it is internally administered.

If you go the route of the ODS be keenly aware of what you're getting and what you're not. It may largely deliver basic reporting needs to our business but will struggle, if be able to handle at all, more advanced analytical queries such as historical longitudinal questions where a DW and OLTP take vastly different approaches to historical data.

5. Hybrid
This is a particularly interesting, and at times attractive, compromise between the traditional Dimensional Model and the ODS approach. With core, conformed dimensions centrally built and maintained (often the ONLY place a truly centralized, multi sourced Student or Faculty may exist, for example) they can be joined with more traditional operational (ERP, OLTP) table structures.

I use a Hybrid approach quite frequently for prototyping or POC (proof of concept) during projects before converting to a more traditional DW design. Though, I have as well delivered perfectly working models using this approach that are still in production today with happy users. I think this approach becomes more viable and palatable if we have already built our core conformed Dimensions which are then joined to operational data structures for smaller reporting needs.








Friday, February 15, 2013

Iteration 0: Initiation

Project BI Lab initiation. In true Agile spirit.

I've worked in BI and DW all of my career in one form or another. There's a lot to this area. From database development and architecture to ETL design to data architecture and modeling to front end BI development. And I left several things out. To be successful in this space takes a certain amount of luck of which I have been the beneficiary. Luck to have the opportunities to build skills in all these different areas. This didn't happen because I am necessarily good or talented but an opportunist at heart that devoted a great deal of time for many years to learn and grow. The most important and influential years of my career have been spent at the database level. That's where I get most excited. You may notice this bias coming through in my writings. Foundational level design is paramount to all upstream technology.

I've had the privilege of working in this space across several industries - insurance, retail, manufacturing, finance, media, pharmacy - and now in Higher Education. I'd been told how different Higher Education is. It's not. At least not from a technical perspective. The modeling challenges are the same, just the data are different. A NULL is still a NULL and has certain ways of being handled. Multivalued dimensions are handled the same whether in manufacturing or Higher Ed. But where Higher Ed differentiates itself from private industry is in culture. This is important and provides a great opportunity to build a truly sustainable BI program specifically around program level initiatives like Data Governance, building a BI Competency Center or Center of Excellence, strong IT and community partnership, and collaborative strategic planning.

The goal of this blog is simple. To use the experiences of a small New England Liberal Arts College's embarkment into building a Data Warehouse and Business Intelligence Program from ground up as the backdrop for topics that are common across most of these projects with a specific focus to the Higher Ed community. Those topics will span technical to more program and business level.

I hope it provides value to someone, somewhere. And I look forward to, and encourage, mutual growth, sharing, and collaboration both on and offline.