Neither the level of aggregation nor the granularity of detail required from management information reports is known with certainty when a data warehouse is analyzed and designed. At the same time, it is not optimal to assume that the data warehouse should hold the most granular data available in the transactional systems. The data warehouse must therefore be designed to support aggregation on the fly and navigation through aggregation hierarchies, so that reports can be produced in any desired layout.
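
To make "aggregation on the fly" concrete, here is a minimal Python sketch that rolls the same stored rows up to whichever level of a time hierarchy a report asks for. The sample rows, the three-level hierarchy (day, month, year) and the names SALES, TIME_LEVELS and roll_up are assumptions made for illustration only; they do not come from the article discussed here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail rows: (sale_date, product, amount).
SALES = [
    (date(2023, 1, 5), "widget", 100),
    (date(2023, 1, 20), "widget", 50),
    (date(2023, 2, 3), "gadget", 70),
    (date(2024, 2, 3), "widget", 30),
]

# Each level of the assumed time hierarchy maps a date to its group key.
TIME_LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}


def roll_up(rows, level):
    """Sum amounts at the requested level of the time hierarchy."""
    key_of = TIME_LEVELS[level]
    totals = defaultdict(int)
    for sale_date, _product, amount in rows:
        totals[key_of(sale_date)] += amount
    return dict(totals)


if __name__ == "__main__":
    # The level is chosen at report time, not at design time.
    for level in ("year", "month"):
        print(level, roll_up(SALES, level))
```

Because the level is only a parameter, a new report layout needs no change to the stored rows. The trade-off is that every request recomputes the totals from the stored grain.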

While there are sound methods for the analysis and design of ordinary transaction processing systems, no comparably well-defined method exists for the development of management information systems such as a data warehouse. Inmon deals with many phenomena related to data warehouse design, but leaves the ‘how’ of it untouched. Schouten’s article on the analysis and design of data warehouses attempts to devise a proper way of thinking and working to achieve this goal. Schouten outlines two complementary methods for the analysis of data warehouse relations, one simple and one advanced. The simple method exploits the knowledge contained in an ordinary relational schema. The advanced method is based on the analysis of derivation rules. The design of data warehouses based on these methods is then investigated. Special attention is given to the actuality (how current the data is) of data warehouses that contain historical information, to the transitivity of derivations, to navigation through aggregation hierarchies via so-called drill paths, and to the maintenance of several aggregation levels within a single data warehouse relation.
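
The ideas of a drill path and of several aggregation levels kept in one relation can be pictured with a small Python sketch. This is an illustration under stated assumptions, not Schouten's actual design: the row layout (level, key, parent_key, total), the drill path year, month, day, and the names DRILL_PATH, RELATION, drill_down and is_consistent are all hypothetical.

```python
# Assumed drill path, ordered from the coarsest to the finest level.
DRILL_PATH = ["year", "month", "day"]

# One hypothetical warehouse relation holding several aggregation levels.
# Each row: (level, key, parent_key, total).
RELATION = [
    ("year", "2023", None, 220),
    ("month", "2023-01", "2023", 150),
    ("month", "2023-02", "2023", 70),
    ("day", "2023-01-05", "2023-01", 100),
    ("day", "2023-01-20", "2023-01", 50),
    ("day", "2023-02-03", "2023-02", 70),
]


def drill_down(relation, level, key):
    """Return the rows one level below (level, key) on the drill path."""
    position = DRILL_PATH.index(level)
    if position + 1 >= len(DRILL_PATH):
        return []  # already at the finest level
    child_level = DRILL_PATH[position + 1]
    return [row for row in relation if row[0] == child_level and row[2] == key]


def is_consistent(relation):
    """Check that every stored total equals the sum of its children."""
    for level, key, _parent, total in relation:
        children = drill_down(relation, level, key)
        if children and sum(child[3] for child in children) != total:
            return False
    return True


if __name__ == "__main__":
    print(drill_down(RELATION, "year", "2023"))
    print(is_consistent(RELATION))
```

Storing the levels side by side makes navigation a simple lookup, but the totals must be kept in step with one another whenever the finer rows change, which is the maintenance concern raised above.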

One of the goals of a data warehouse is to provide management information that will be used to make organizational policy decisions. By nature, management information is derived from operational information, i.e. information created and stored in transactional information management systems. Management information is therefore almost always an aggregated, derived form of the operational information. This leads to a useful distinction between analysis and design in data warehousing: analysis consists of detecting the derivable facts and the derivation rules that produce them, while design consists of grouping the derivable facts into well-formed relations in the data warehouse.
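
The split between analysis and design described above can be sketched in Python as follows. The sketch assumes one particular reading: a derivation rule is a named aggregate over operational rows, and facts that share the same key are grouped into the same relation. The order rows and the names ORDERS, DERIVATION_RULES, derive and design_relations are hypothetical and are not taken from the cited paper.

```python
from collections import defaultdict

# Hypothetical operational rows from a transactional order system.
ORDERS = [
    {"region": "north", "month": "2023-01", "amount": 100},
    {"region": "north", "month": "2023-01", "amount": 50},
    {"region": "south", "month": "2023-01", "amount": 70},
]

# Analysis (assumed form): each derivable fact is listed with the key it is
# grouped by and the rule that derives it from operational rows.
DERIVATION_RULES = [
    ("revenue", ("region", "month"), lambda rows: sum(r["amount"] for r in rows)),
    ("order_count", ("region", "month"), lambda rows: len(rows)),
    ("total_revenue", ("month",), lambda rows: sum(r["amount"] for r in rows)),
]


def derive(rows, key_columns, rule):
    """Apply one derivation rule to each group of operational rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[column] for column in key_columns)].append(row)
    return {key: rule(members) for key, members in groups.items()}


def design_relations(rows, rules):
    """Design: derived facts that share a key land in the same relation."""
    relations = defaultdict(dict)
    for fact_name, key_columns, rule in rules:
        for key, value in derive(rows, key_columns, rule).items():
            relations[key_columns].setdefault(key, {})[fact_name] = value
    return dict(relations)


if __name__ == "__main__":
    for key_columns, relation in design_relations(ORDERS, DERIVATION_RULES).items():
        print(key_columns, relation)
```

Here the three rules yield two relations: one keyed by region and month that holds both revenue and order_count, and one keyed by month alone that holds total_revenue.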

Citation: H. Schouten, “Analysis and Design of Data Warehouses,” Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW’99).


