The level of aggregation as well as the granularity of details required from management information reports is not known to absolute certainity at the time of analysis and design of a data warehouse. At the same time, it is not optimal to assume that the data warehouse should hold the most granular data available in the transactional systems. Therefore, the data warehouse must be designed in such a way that it provides optimal support for aggregation on the fly and for navigation through aggregated hierarchies, thus supporting reporting in any desired layout.

While there are sound methods for the analysis and design of ordinary transaction processing  systems, a comparable method for the development of management information systems such as a data warehouse remains not solidly defined. Inmon deals with many phenomena related to data warehouse design, but leaves the ‘how’ of it completely untouched. Schouten’s article (link) on analysis and design of data warehouses is an attempt to devise a proper way of thinking and working to achieve this goal. Schouten outlines two complementary methods for the analysis of data warehouse relations; one simple and the other advanced. The simple method exploits the knowledge contained in an ordinary relational schema. The advanced method is based on the analysis of derivation rules. Subsequently, the design of data warehouses based on these methods is investigated. Special attention has been given to the actuality of data warehouses that contain historical information, to the transitivity of derivations, to the navigation through aggregation hierarchies via so-called drill paths and to the maintenance of various aggregation levels within a single data warehouse relation.

One of the goals of a data warehouse is to provide management information that will be used to make organizational policy decisions. By nature, all management information is derived from operational information, i.e. information created and stored in transactional information management systems. Therefore, it follows that management information is almost always an aggregated, derived form of the operational information. A distinct observation about the difference between analysis and design as applicable to data warehousing systems is as follows: analysis consists of the detection of derivable facts and the applicable derivative rules, while design consists of grouping the derivable facts into well formed relationships in the data warehouse.

Citation: H. Schouten, “Analysis and Design of Data Warehouses”;  Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW’99)

The fundamental architecture of a data warehouse is a key decision that drives all technical and functional aspects of the data warehousing project. Therefore, it is necessary to carefully evaluate all the available implementations options for the architecture before dedicating resources to the project. The architecture of the data warehouse will be defined in the planning stage of the project, and the decision process should involve all the stakeholders (business and technical). While the designing, managing and implementing architecture requires effort, the task can benefit and enhance the impact of business applications by improving availability, performance, scalability and interoperability.

Three practical data warehouse architectural patterns are found to be widespread. The current technology and organizational forms eventually lead to the DW pattern to fall into one of these three, perhaps with minor modifications. Eventually, an organization’s form drives the architecture pattern preference. The three patterns are -
1. Distributed, by moving and translating data between nodes in a network.
2. Centralized (consolidation data warehouse or operational data store (ODS)), leveraging a hub-and-spoke form
3. Federated, which is a system of distributed data warehouses, which, in its more successful implementations, also exploits a data hub.

The above patterns all require implementation of physical data stores (warehouses, marts etc.), and therefore the Extract Transform Load (ETL) layer is the heart of these systems. The alternative to these will be to avoid physical data warehousing altogether and address decision support issues by deriving business intelligence directly from operational, transactional systems. Enterprise Information Integration tool set focuses on this virtual implementation of the data warehouse. It is important to note that the virtual warehouse does not have a graphic representation different from the distributed one. However, no persisting physical implementation exists. Data is repeatedly transformed on-demand instead of being transformed and stored persistently.

In all three instances, the critical path of success lies through the design and implementation of unified and consistent data dimensions. The single most important action a business can take from an architectural point of view is to design consistent and unified definitions of reference data. Thus, the master data (or master reference data) plays an integral role in data warehouse implementations, irrespective of what pattern is chosen. An organization can then implement either federated data marts or a centralized ODS or some combination of the two (such as ODS with downstream, dependent data marts). This also enables management of diverse on-line analytic processing (OLAP) approaches as dependent data marts rather than disconnected and dysfunctional silos. The data store design will be sufficiently robust to support a flexible architecture that will accommodate future business requirements.

Content Protected Using Blog Protector By: PcDrome.