A (business intelligence) platform does not offer true Business Intelligence unless it satisfies all of these criteria:
Breadth: It integrates functions and technologies from across the organization. Truly integrated BI integrates data from every corner of the organization—from operational/transactional systems, multiple databases in different formats, and all contact channels. The information flow can then transcend functional silos, organizational boundaries, computing platforms, and specialized tools.
Depth: It reaches all who need it, in a way that is relevant to them. A true BI solution provides appropriate interfaces and tools for users at different levels of the organization, who have profoundly different needs. The results of analysis should be easily disseminated across all functional areas and organizational levels, so everyone can contribute to the organization’s success.
Completeness: It is a comprehensive, end-to-end platform. Business Intelligence success does not just happen at the application layer. And it is not just query and reporting. It depends on a chain of applications and technologies working together from a common data foundation to create a single, verifiable version of the truth.
Advanced analytics: It delivers predictive insights, not just hindsight. Online Analytical Processing (OLAP) is a valuable part of the picture, but it is not your optimum source of competitive differentiation. Historical query and reporting—what many vendors call “BI”—merely tells you where the organization has been. Going beyond BI requires predictive analytics, such as forecasting, scenario planning, optimization, and risk analysis.
Data quality: It gives applications one validated, verified version of the facts. Data are vital to the decision-making process, and ensuring that you have the right data is imperative. All major information technology (IT) analysts recognize the importance of data quality to the return on BI investment, yet many organizations are restricted by their choice of solution.
Intelligence storage: It meets the information needs of intelligence applications. The data storage platform must be able to draw on information from many sources, prepare it for analysis, and deliver it quickly to the applications and platforms that need it.
Reference: Business Intelligence Competency Centers: A Team Approach to Maximizing Competitive Advantage; Gloria J. Miller; Dagmar Bräutigam; Stefanie V. Gerlach; John Wiley & Sons; 2006
At a high level a generic data warehousing architecture can be said to comprise of the following components:
- Data Staging Layer (DSL)
- Operational Data Store (ODS)
- Data Warehouse (DW, EDW for an Enterprise)
- Data Marts (DM)
Each components functions are follows:
Data Staging Layer
- Decouples extraction processes from data cleansing processes
- Houses the transformation of formats of data from the incoming disparate systems to the single target data warehousing technology platform
- It reduces impact of loads on transaction system
- Maintains a ‘Snapshot’ that allows repeatable / re-startable ETL processes
- Is an area to store data using different load schedules to the DW
- Is not typically available to end users for data consumption
- Acts as an area to store 3rd party data
- Employs data manipulation not appropriate for the DW
- The data in Data Staging Area is: a copy of operational data, deleted after a short period so non-historical, in flat tables atomic in granularity
Operational Data Store (ODS)
ODS is a hybrid of data warehouse and operational systems. Functions of an ODS are:
- Allows OLTP response times, update capabilities and DSS capabilities
- Can be used for operational reporting because data from multiple source systems are integrated at this point
- Operational reporting is typically more structured so dimensional models are not required
- Data in the ODS is: kept for 6 months so non-historical, in flat tables or 3NF, often a copy of operational data, and is atomic in granularity
For architectural purposes, ODS’ are classified as follows:
- Class I – Real-time
- Class II – Two-hour to four-hour
- Class III – Daily
- Class IV -Aggregated data from the data warehouse
Enterprise Data Warehouse (EDW)
An EDW is the central hub of the BI infrastructure in some architectural approaches. The function of the EDW is to be the single point of truth for enterprise. Data in the EDW is:
- cleansed and harmonized during the ETL process
- Historical
- Distributed to various other systems
- Often 3NF but can be dimensional
- Usually atomic but sometimes aggregate
The degree of normalization in the EDW data model is a tradeoff between multiple properties or characteristics of the data warehouse. The following matrix captures the trade-offs:
Data Marts
A Data mart is a departmental, or subject oriented, subset of the EDW. Data in the data mart comes from the EDW if one exists Relationship between EDW and data marts is often called a hub-and-spoke architecture. Data in Data Mart is often aggregated but may be atomic, dimensional, may or may not be historical, and may be volatile.
Data Warehousing Architectural Options
There are three architectural approaches for setting up a flexible, future-oriented data warehouse. This architectural decision is critical since the selection will play a large role in many aspects of the data warehouse such as data modeling methodology uses and physical infrastructure characteristics. The data warehouse architecture will determine the locations of the data warehouses and data marts themselves, and where the control resides. For example, the data can reside in a central location that is managed centrally. Alternatively, the data can reside in distributed local and/or remote locations that are either managed centrally or independently.
At a higher level, the three architectural approaches are:
- Enterprise data warehouse (EDW)
- Dependent data marts
- Independent data marts
These architectural choices do not have to be used exclusively by themselves, thy can be used in combinations as suited to the organizational needs. For example an often used option is to combine the EDW and dependent data marts in the same instance. The data marts are chained to follow in lock step with EDW, rather than directly from the staging area. The grain of the marts are restricted by the lowest grain of the EDW, as is the data latency and recency.
Enterprise Data Warehouse:
An enterprise data warehouse supports a large part of business requirements for a more fully integrated data warehousing environment that has a high degree of data access and usage across departments or lines of business. The data warehouse is designed and constructed based on the needs of the business as a whole. It is a common repository for decision-support data that is available across the entire organization. The term “Enterprise” reflects the scope of data access and usage, not the physical structure.
This type of data warehouse is characterized as having all the data under central management. However, centralization does not necessarily imply that all the data is in one location or in one common systems environment. While the data is centralized, it is logically centralized rather than physically co-located. When this is the case, by design, it then may also be referred to as a hub and spoke implementation. The key point is that the environment is managed as a single integrated entity.
Independent data mart architecture:
An independent data mart architecture, as the name implies, is comprised of standalone data marts that are controlled by particular workgroups, departments, or lines of business. There typically is no connectivity with data marts in other workgroups, departments, or lines of business. Therefore, these data marts do not share any conformed dimensions and conformed facts between them. This is one of the concerns when using an independent data mart. The data in each may be at a different level of currency, and the data definitions may not be consistent – even for data elements with the same name. Independant data marts are primary candidates for data mart consolidation for companies around the world today. The proliferation of such independent data marts has resulted in issues such as:
- Increased hardware and software costs for the numerous data martsIncreased resource requirements for support and maintenance
- Many redundant and inconsistent implementations of the same data
- Development of many extract, transform, and load (ETL) processes
- Lack of a common data model, and common data definitions, leading to inconsistent and inaccurate analyses and reports
- No data integration or consistency across the data marts
- Time spent, and delays encountered, while deciding what data can be used, and for what purpose
- Concern and risk of making decisions based on data that may not be accurate, consistent, or current
- Many heterogeneous hardware platforms and software environments that were implemented, because of cost, available applications, or personal preference, resulting in even more inconsistency and lack of integration.
- Inconsistent reports due to the different levels of data currency stemming from differing update cycles; and worse yet, data from differing data sources
Dependent data mart architecture
An interconnected data mart architecture is basically a distributed implementation. Although separate data marts are implemented in a particular workgroup, department, or line of business, they are integrated, or interconnected, to provide a more global view of the data. These data marts are connected to each other using conformed dimensions and conformed facts. Each of the dependant data marts typically has a common staging area. At the highest level of integration, the combination of all dependent data marts could be thought of as a distributed enterprise data warehouse. In an implementation where the EDW and dependent data mart architectures are combined, the Staging Area is basically replaced by the EDW.