The level of aggregation as well as the granularity of details required from management information reports is not known to absolute certainity at the time of analysis and design of a data warehouse. At the same time, it is not optimal to assume that the data warehouse should hold the most granular data available in the transactional systems. Therefore, the data warehouse must be designed in such a way that it provides optimal support for aggregation on the fly and for navigation through aggregated hierarchies, thus supporting reporting in any desired layout.

While there are sound methods for the analysis and design of ordinary transaction processing  systems, a comparable method for the development of management information systems such as a data warehouse remains not solidly defined. Inmon deals with many phenomena related to data warehouse design, but leaves the ‘how’ of it completely untouched. Schouten’s article (link) on analysis and design of data warehouses is an attempt to devise a proper way of thinking and working to achieve this goal. Schouten outlines two complementary methods for the analysis of data warehouse relations; one simple and the other advanced. The simple method exploits the knowledge contained in an ordinary relational schema. The advanced method is based on the analysis of derivation rules. Subsequently, the design of data warehouses based on these methods is investigated. Special attention has been given to the actuality of data warehouses that contain historical information, to the transitivity of derivations, to the navigation through aggregation hierarchies via so-called drill paths and to the maintenance of various aggregation levels within a single data warehouse relation.

One of the goals of a data warehouse is to provide management information that will be used to make organizational policy decisions. By nature, all management information is derived from operational information, i.e. information created and stored in transactional information management systems. Therefore, it follows that management information is almost always an aggregated, derived form of the operational information. A distinct observation about the difference between analysis and design as applicable to data warehousing systems is as follows: analysis consists of the detection of derivable facts and the applicable derivative rules, while design consists of grouping the derivable facts into well formed relationships in the data warehouse.

Citation: H. Schouten, “Analysis and Design of Data Warehouses”;  Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW’99)

There is a continual pressure to reduce information latency times in the quest to keep a finger on the pulse of a business. This pressure stems from the rapid changes in the business landscape that have shaped IT strategy over the past four or five years. The move to real time information delivery demands just-in-time reporting. This demand is not restricted to the operational or transactional reports; the demand has expanded to include the analytics and business intelligence as well. Traditionally, these analytics and business intelligence applications were sourced from data warehouses that would be updated in a nightly batch window. As such, a day’s latency was the norm for analytics, and sufficed to know how the business did at the end of the last business day. However, in today’s world, it is necessary to know how the business is doing every hour. This is especially true for high volume rapid transaction systems such as financials, retail and airlines.

The demand for real time reporting and analytics is translated to stringent requirements on the backed data integration systems. The traditional flow of data to a data warehouse is via the ETL routines. These routines need to be architected to meet the near-real time latency requirements. Some typical challenges in a real time data integration environment are:

  • Necessity to conduct business 24/7 is reducing batch windows. As more business is done across time zones and over the web, many organizations are faced with the problem of shrinking batch windows, making it more difficult for traditional ETL tools to extract data in the short time available.
  • Large volumes of information are difficult to handle in a batch window. As more information is gathered – such as online transaction data, inventory data, and customer information – the effort involved in moving it to the warehouse increases drastically. Many organizations are finding that an eight hour batch window is no longer sufficient for traditional ETL (extract, transform, load) tools to integrate all of the needed data.
  • The need to track all changes for auditing purposes. Organizations need to comply with regulations, which often requires them to continuously track all changes to data and not just the net result of those changes.
  • Growing need to detect and react to business events as they happen. Many organizations are looking for ways to detect business events in production systems and have those events trigger a response in another system. For example, a cell phone company would like to send a text message to a customer running low on minutes asking if him if he would like to purchase more.
  • Organizations want to deploy new applications using data on legacy systems without paying for an increase in workload. Often, legacy systems are already maxed out and new capacity is very expensive. Organizations want new applications on UNIX or Windows to avoid this cost, but integrating the data from those legacy systems without increasing the load on them is a key challenge
  • Increasing need to keep data in sync across the enterprise. Customers want up-to-the-minute access to order, payment and inventory data so they can buy products, pay bills and check delivery status online. Employees need much of the same so they can better service customers and make wise business decisions. To accomplish this, eCommerce data needs to be in sync with business applications and data needs to flow in real-time across the enterprise.

 For more information, the reference for this post is here (Link to IBM)

Content Protected Using Blog Protector By: PcDrome.