It is obvious that there is no organization running without data. The data can be viewed as tangible assets of an organization just as any physical asset. So, they need to be stored and made available to those who need them when they need them. However, the data by themselves are useless. So, they must be put together to produce useful information. In turn, information becomes the basis for relational decision making. To facilitate the decision-making process, a new development of database systems was developed called “data warehouse”.
The data warehouse can be generally described as a decision-support tool that collects its data from operational databases and various external sources, transforms them into information and making that information available to decision-makers (top managers) in a consolidated and consistent manner. (2:64)(4:82) BACKGROUND The data warehouse is not more than a database but separated from other databases like the operational database distributed database and text database. When did management start to utilize this powerful tool and why they seek to use it. The data warehouse has been developed at the beginning of 1980s. However, it was optimize to transform non-organized and lightly summarized data from the operational database into analytical tool that supports intelligent decision-making.
The term DSS (Decision support system) database is used interchangeably with the data warehouse. On the other hand, other names for the operational database are transactional database and production database. WHAT IS A DATA WAREHOUSE? The data warehouse can be very simply defined as an integrated, subject-oriented, time variant and non-volatile database that provides support for decision-making (5:39) (6:19). The following four sections will explain what this definition means.
Integration The data warehouse is a centralized database that integrates data from different sources (6:19) with diverse formats. This integration of the data provides a unified view of the overall organizational situation. Data integration enhances decision-making and helps the manager to better understand the operations of the organization (6:19). Subject-Oriented The data in DSS database are organized to provide answers to questions coming from different areas within the organization. They are arranged by topic such as sales, marketing, finance and so on. The DSS database contains specific subject for each topic like customer, product, region and so on. This form of data organization is different that of more process-oriented of the operational database system. (5:39,43) Time Variant The data warehouse contains historical data over a long time. Those data reflect what happened last week, last month, the past five years and the like. (6:19) Non-Volatile Once the data enter the data warehouse, they are never removed or changed.
Because the data warehouse represents the entire history of the organization, the data from operational database are always added to it. Since DSS data are never deleted and new data are periodically added, the data warehouse is always growing. That’s why the data warehouse must be able to have hardware that supports gigabytes and even terabytes size of databases. (5:43) THE DIFFERENCE BETWEEN OPERATIONAL DATABASE AND THE DATA WAREHOUSE The operational database and the DSS database differ in the roles the do as well as the data characteristics for each one. Main Role The transactional database is optimized to support transactions that represent daily operations (2:67).
For example, during the registration period at KFUPM, each time a student adds, drops courses, or changes sections, he must be accounted for by the operational database system of the university. So, student data and course data are in frequent update mode. On the other hand, the data warehouse is optimized to support data analysis and decision-making (2:64). Basically, it takes the summarized data from the operational database, filters them for analysis and decision making processes (2:64). For instance, the manager of the admission and registration department may ask for the number of students at KFUPM taking ENGL-214 last summer.
The data warehouse answers this query for him. Then, he would take decision whether to increase number of sections of this particular course or not. Operational Data Vs. Warehoused Data Transactional data and DSS data are different in the summarization level, transaction type, query activities and dimensionality. Summarization level The degree to which DSS data are summarized is very high when contrasted with the operational data (5:39). For example, rather than storing thousands of sales transactions for a given store on a given day, the data warehouse might simply store the total number of units sold and the total price during that day.
Then, the store manager may decide whether to continue or discontinue selling or producing such products. Transaction type The operational database and the data warehouse are different in terms of transaction type. Whereas production data are characterized by update transactions, DSS data are mainly characterized by query (read only) transactions (2:67). The DSS data also require periodic update (2:67) to load new data that are summarized from the transactional database as well as other external sources.
Therefore, the warehoused data are historic (2:64) while the operational data represent transactions as they happen. Query activity It is difficult, if not impossible, to optimize a single database for both processing purposes as well as for decision-making needs. For that reason, the data warehouse is optimized for ad hoc (on demand or as needed) complex queries needed by decision-makers (2:67). The production database, on the other hand, is optimized to allow more processing for the repetitive update transactions (2:67). So, it is difficult to get ad hoc queries from that operational database because of the continuously updated transactions.
Dimensionality Dimensionality is the most distinguishing characteristic of the DSS data. The data warehouse is set to provide the larger picture (2:64). In other words, it includes many data dimensions. For instance, a sale manager may ask how many units of product X were sold to customer Y during the last T months (2:66). So, he or she can view the data from three dimensions: product, customer and time. In fact, (s)he could view the data from many dimensions.
This multidimensional view of the data is different from the single view of the operational database. In order to gain the assistance of the data warehouse in the analysis and decision-making process, four main stages are required: storing the data, data extraction & filtering, query tools and presentation tools. Data Store The data store is a repository where the meta data (information about the data) are kept to describe the characteristics of the data in the DSS database. It is also linked with the transactional database so that any modification of the transactional data will be updated in the data warehouse as well. (2:65) Data Extraction & Filtering The data warehouse contains two main types of data: data extracted from the production database and data from external sources (3:32) like stock price indicators. The data extraction & filtering tools are used to extract and validate the data taken from the transactional database and external data sources (3:32).
The warehoused data are not the copy of the operational data. Instead, they are summarized and organized for analysis and query speed. Also, using data from external sources means having to solve data formatting conflicts (1:74). For example, when comparing the GPA (Grade Point Average) of two different universities say KFUPM and KFU (King Faisal University), many inconsistencies must be solved. First, the GPA of KFUPM students are calculated out of four while they are calculated out of five for KFU students. Second, KFUPM uses the standard English date whereas KFU uses the standard Arabic date. And there are many other conflicts that DSS database filtering tools are able to solve them so that data are stored in standard format. Query Tool The data warehouse contains very huge data.
However, the manager of any organization may only need specific portion of those data. As a result, the query tools are used in the DSS database in order to retrieve the appropriate and relevant data (6:20). The manager may then, analyze those data so that he or she comes up with the right decisions that serve the organization (6:20). Presentation Tool The query tools provide some ad hoc queries and repots.
Unfortunately, to use the query tools, the manager has to know the details of the query tools such as SQL (Structured Query Language), QBE (Query By Example) and many others. The presentation tools help the manager to select the most appropriate presentation format like bar graph, maps and summary reports. Of course, one picture is better than one thousand words. (5:43) CONCLUSION The data warehouse is read only database optimized for data analysis, query processing and decision-making process. Not surprisingly, it has become the main data source for modern decision support system during the past few years. Therefore, all organizations should exploit the power of this tool so that their top management could carry out the decision-making process with more confidence to achieve the desired goals.
- Ballou, Donald P., and Giri Kumar Tayi. “Enhancing Data Quality in Data Warehouse Environments.” Communications of the ACM Jan.1999: 73-78. 2. Gould, Lawrence.
- “What You Need to Know About Data Warehousing.” Automotive Manufacturing & Production Jun. 1998: 64-67. 3. Scheuerman, Michael.
- “Planing to Build A Data Warehouse.” Credit Union Magazine Dec. 1998: 32-33. 4. Stephenson, Miles, and Michael McCathren. “Digital Decisions.” Restaurant Hospitality Feb.
- 1999: 82-84. 5. Taylor, Rick. “Knowledge Is Power.” Credit Union Management Jan. 1999: 39, 43.
- 6. Teresko, John. “Information Rich, Knowledge Poor.” Industry Week 1 Feb,1999: 19-24. Bibliography 1.
- Ballou, Donald P., and Giri Kumar Tayi. “Enhancing Data Quality in Data Warehouse Environments.” Communications of the ACM Jan.1999: 73-78. 2. Gould, Lawrence. “What You Need to Know About Data Warehousing.” Automotive Manufacturing & Production Jun. 1998: 64-67.
- 3. Scheuerman, Michael. “Planing to Build A Data Warehouse.” Credit Union Magazine Dec. 1998: 32-33.
- 4. Stephenson, Miles, and Michael McCathren. “Digital Decisions.” Restaurant Hospitality Feb. 1999: 82-84. 5.
- Taylor, Rick. “Knowledge Is Power.” Credit Union Management Jan. 1999: 39, 43. 6. Teresko, John.
- “Information Rich, Knowledge Poor.” Industry Week 1 Feb,1999: 19-24. Technology Essays.