Table of Contents
1. Executive Summary 3
2. Data Warehousing Solutions 4
3. McDonnel Data Warehouse Design 5
3.1. Star Schema 5
3.2. Sales Fact Table 6
3.3. Dimensions 6
3.3.1. Store Dimension 6
3.3.2. Time 7
3.3.3. Customer 7
3.3.4. Product 8
4. Data Warehouse Adaption to Business Problems 9
4.1. Key Customers 9
4.2. Most Profitable Products 9
4.3. Most Profitable Locations 10
4.4. Most Profitable Time Periods 10
5. References 11
Appendix 1 – Data Dictionary 12
Appendix 2 – SQL Code 16
Appendix 3 – Work Breakdown 18
Executive Summary
This report documents a study into Data Warehousing and its application to providing management assistance for the Sales business process for McDonnel Clothing, a clothing retailer based in Melbourne, Australia. The report includes a discussion of the benefits of data warehousing, description of the design of the data warehouse and how it can be used to address the key business issues.
Kimball’s four step approach has been used in the design of the data warehouse and a star schema design has been proposed. Section 3 of the report contains descriptions of the fact and dimension tables used in the star schema. The data dictionary is provided in Appendix 1 in addition to how data will be sourced from existing databases. SQL statements used in the creation of the data warehouse are shown in Appendix 2.
Key Objectives:
Design a data warehouse system that is aligned with deriving insights for decision-making in the business
Outline integration of existing information systems
Describe how the data warehouse can be used to answer key business issues
The existing information systems can easily be integrated into creating the data warehouse. In addition, the following improvements to the existing database systems have been recommended:
customer database system to include a customer version number and a ‘Date of First Order’ attribute
Product order database system to include a product version number
We have found that the proposed star schema design for the data warehouse can be used to answer the key business questions for McDonnel Clothing. It will help managers understand key customers, the most profitable products, store locations and time periods which and make decision regarding the future growth of the company. We recommend the implementation of the data warehouse for McDonnel Clothing to enable future growth.
Data Warehousing Solutions
In the era of competitive business environment, large amounts of data are being generated from each branch of the business. Intellectual capital is the main source of added market value to the company, created over time. Hence, it is crucial to discovery techniques for developing and leveraging intellectual capital. Data warehousing is the best way to leverage intellectual capital in today’s era. It allows the formatting of data from different fragments of business to one unified structure which is easy to understand (Katal, 2013, August).
According to (Inmon, 1993), a data warehouse is a subject-oriented (it analyses a particular subject area), integrated (collection of data from multiple data sources), time-variant (warehouses provide access to more detailed information over a longer period (Zeng, 2003)) and non-volatile (historical data is not altered) collection of data in support of management's decision-making process.
From the above definition, we know that a data warehouse presents complex data in a uniform way; giving a single integrated view of the whole organization, allowing data to be accessed quickly. This makes it easier to be understood by managers by minimizing the number of tables resulting in simplified queries leading to improved decision-making. This is possible because the data warehousing is based on the principle of chunking (human brain has limited capacity to process information) and hierarchical structuring (Moody, 2003).
A common goal for all businesses is to make better decisions in comparison to their competitors. That is only possible if a company has a data warehouse as it provides a standardized data from different sources which reduce the risk of error in interpretation (Herzing University, n.d.), reducing time wasted in accumulating data which leads to quick and accurate decision making, giving the company a competitive edge (Minerra, 2017).
Functions that become streamlined, efficient and qualitatively improved include:
Reporting: managers may analyse data and make reports more easily
Presentation of data: It’s easier to make dashboards for critical management data.
Analysis and Implementation: Drilling down is easier and helps to distinguish problems and areas where reforms have to be implemented
Uniformity in Data Design: In communication, it is not just the content that matters. Equally important is the form. Data Warehouse and Data Warehouse managers can create this uniformity which increases ease of understanding and comfort of familiarity.
Security: it gives an organization a secure way to provide specific data to people who have a legitimate need and to exclude others. (Serra, 2013)
The creation of a data warehouse brings about a new orientation in the organization. It creates a different way of relating to and analysing the business problems. Data Warehouse emerges as a new well-informed member of the organization which pulls all deliberations to an empirical focus. It has an impact on the whole working environment.
McDonnel Data Warehouse Design
This section details the design of Star Schema created for McDonnel Clothing
Star Schema
The following design is based on the process stated by Kimball (Kimball, 2002).
The business process identified for the design of the star schema is Sales. Thus, we have specified the Sales Fact Table at the centre of the star schema – it is clear from the description, that to expand the business, increased understanding of sales, products and customers over different time periods is required.
For this business process, the definition of grain is transactional, capturing a line item on a sales order. That is, we shall use each transaction and the individual line items with those transactions.
The Dimensions chosen are Customer, Store, Product and Time.
In understanding Sales, we have chosen Unit Sales, Revenue and Margin as the business measures. The star schema for the Sales Process is given below.
Sales Fact Table
The Sales Fact Table is defined by the following attributes: Grain, Type of Fact Table and Type of Business Measures.
The grain of the Sales Fact Table is a line item on a Sales Order. The Sales Fact Table is a transactional table, listing the individual line items from sales of products made at a date, by a customer, from a particular store. The facts in the fact table are additive business measures since they can be summed across all dimensions. The business measures are:
Unit Sales for each line item is available from the Sales Order Table in the Sales System as “Qty”.
Revenue (or Dollar Sales) is a calculated value available by multiplying the Unit Sales (as above) by the Sale Price, given in the Sales Order Table of the Sales System.
Margin is a calculated value given as Unit Sales (given above), multiplied by the difference between Sale Price (Sales Order Table in sales System) and the Unit Cost given as ‘Cost per item $’ in the Product Order Table, within McDonnel’s Inventory System. Here, the chosen ‘Cost per item $’ shall be the amount which corresponds to the most recent transaction performed before the sale.
An example row of the Sales Fact table is shown below.
StoreID
ProductID
Prod
Version
TimeID
CustomerID
Cust
Version
Unit
Sales
Revenue
Margin
1
100018
1
12 May 2013
C478Carl
1
3
148.71
203.13
Dimensions
This section discusses each of the dimensions given in the Star Schema.
Store Dimension
A Store dimension has been chosen since the business would like to understand which store locations perform better than others. This is a simple table where a storeID identifies the address and PostCode of the store. Here the storeID is a surrogate key as it has not been included in the original data – it shall allow for efficient querying.
StoreID
Address
PostCode
1
Lygon Street, Carlton
3053
2
South Melbourne
3205
3
Williamstown
3016
Time
The Time dimension has been chosen because the company needs to know which time period is the most profitable. Thus, we have specified the time into year, quarter, month, day of month, and day of the week to better understand the sales performance under different time scales. Time ID is the unique date of that day.
TimeID
Year
Quarter
Month
Day
Day_of_the_week
19 Jan 2013
2013
1
Jan
19
Sat
13 May 2013
2013
2
May
13
Mon
Customer
The Customer dimension has been chosen to record the details of our customers. We decided to have CustomerID, version, names, date of birth, age group and postcode as the basic information. Customer is a Slowly Changing Dimension (SCD) and for this reason version has been included in the Customer Table.
The version points to the different suburb for a particular customer, because each customer may change their residential address over time. As such this is classified as a Type 2 SCD. In this fashion, the PostCode shall be updated at the time of the first order for the new version of the customer. For this design of customer dimension, we would recommend that McDonell update their customer database system to include a customer version number and a date_of_first_order. To start with, all existing customers shall be assigned version 1 and any subsequent change of address would be updated as a new version, together with a corresponding date_of_first_order.
In order to understand how the age of customers have an effect on the business, we have segregated them into various age groups. Age group is a calculated attribute in the Customer dimension, based on the date of birth attribute.
CustomerID
Cust
Version
Name
DOB
Age_Group
PostCode
Date_of_First_Order
C478Carl
1
Brian Jones
10 Jan 1980
31-50
3053
12/5/13
C478Carl
2
Brian Jones
10 Jan 1980
31-50
3000
12/5/14
C478SthMelb
1
Nancy Smith
12 Dec 1959
Over 50
3205
15/5/14
Product
The Product dimension has been chosen to record the details of the products the company has sold. As for the Customer dimension, the Product dimension has been identified as a Type 2 SCD. As the Unit Cost of the product changes over time, the Unit Costs at various time periods can be tracked. This table contains productID, ProdVersion, product description, product group, order date and unit cost. The version shall be revised as the unit cost of a particular product is reordered at a new unit cost.
ProductID
ProdVersion
Description
Group
OrderDate
UnitCost
100011
1
6 Panel Sandwich Peak Cap (HW004) (Navy/Stone)
Headware
19 Jan 2013
6.5
101659
1
Ladies Maitre D Blouse (LL-MA) (White)
Lounge
7 Dec 2013
78
Whilst the value of Margin is calculated using the Unit Cost taken directly from the company’s existing databases, it is believed that tracking the Product Unit Cost over time will allow the company to understand why a product has become less profitable. That is, once the most profitable products have been identified, the manager may use the the Product Version in the Product dimension to further analyse the situation and perhaps change suppliers if suitable.
Data Warehouse Adaption to Business Problems
Based on the proposed data warehouse design, the following business questions can be answered by the managers of McDonnel Clothing.
Key Customers
Combing the customer dimension table and the sales fact table by Customer ID and joining the time dimension table with Time ID and product dimension table with Product ID from our new data warehouse, we can get all the orders and sales for each year, month and quarter for each customer. By summing up the Unit Sales grouped by Customer ID, we can calculate the total unit sales for each customer. And summing up the Revenue grouped by Customer ID, we can get the total dollar sales for each customer. Retrieving the margin from the sales fact table grouped by Customer ID, we can obtain the total margin for each customer. Combined with the time table with matching Time ID, we can view the unit sales, dollar sales and margin for each customer in each year, month and quarter. By sorting, we can show the managers the performance of sales for each customer in different time period and find the most profitable customers.
Combing the postcodes, which are retrieved from customer table, and sorting the margin in the order from largest to smallest, we can get the suburbs in which most profitable customers live. When a customer move his/her address, the customer table will update a new version of that customer without changing Customer ID and with the first order date of the new address. So, every sale will be associated with a particular postcode depending on when the order is being purchased. By comparing the Order Date from sales table with the First Order Date in customer table, we can know that the order made under which version of address.
By subtracting the year of today and year of birth, we can compute the age of each customer. By selecting and grouping the customers with ages less than 30, from 30 to 50 or over 50, we can summarize the unit sales, dollar sales and margins for each age group.
Most Profitable Products
For analysis of products with respect to various time periods, we need to join the sales fact table with the product table; we can do that by using the composite key which includes ProductID and ProdVersion. Similarly, we can join the sales fact table with the time dimension using TimeID. This will provide us with sales data across all products and time which can be used for analysis purpose. Next, we can group the transactions by ProductID and the chosen time interval. McDonnel can choose to group by individual months or years by specifying the month or year in a query. For seasonal analysis, McDonnel can query the data by specifying the months corresponding to the chosen season. For example, if McDonnel wishes to know for the Unit Sales, Revenue, and Margin for the summer season, a query can be run including the months December, January, and February. In the same fashion, September, October, and November can be used to determine the business metrics for Spring. Finally, sorting the sum of margin for each product will help McDonnel Clothes determine the most profitable products in different time periods.
In addition, we have included the unit cost of each product in the product dimension. This will allow the manager to analyse how changes in unit cost of products affect the margin over time. For example, if the margin of a particular product was decreasing over time, it can be compared to the change in unit cost to find out if that was the cause for the decrease in margin.
Most Profitable Locations
By combining the sales fact table and store dimension table using Store ID and joining the time dimension table using Time ID, the sales data for all stores across all time periods can be obtained. The sales data can be grouped by store ID and sorting by attributes such as unit sales, dollar sales and margin for each month can be analysed to determine which store location was the most profitable. Similarly, this analysis can be conducted over different time periods.
Most Profitable Time Periods
By combining the sales fact table and time dimension table using Time ID, the sales data across all time periods can be obtained. The joined table will contain data about the unit sales, dollar sales, and margins for various time periods such as daily, monthly, quarterly and yearly. This will allow the manager to analyse sales data across various time periods. For example, sum of dollar sales grouped by year will show how the total revenue has changed over the years. Also, sales data can be grouped for each day of the week (e.g. Sunday, Monday) to see how the performance varies over different days. This may help McDonell to identify if certain days of the week (e.g. Sundays) are more profitable and extend opening hours of the other stores.
References
1 Key Data. (n.d.). Data Warehouse Definition. Retrieved from https://www.1keydata.com/datawarehousing/data-warehouse-definition.html
Herzing University. (n.d.). What is Data Warehousing and Why Is It Important. Retrieved from Herzing University: https://www.herzing.edu/blog/what-data-warehousing-and-why-it-important
Inmon, W. (1993). Building the Data Warehouse,. New York: A Wiley QED publication.
Katal, A. W. (2013, August). Big data: issues, challenges, tools and good practices. In Contemporary Computing (IC3), Sixth International Conference on (pp. pp. 404-409). IEEE.
Kimball, R. R. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition. New York: John Wiley and Sons, Inc.
Minerra. (2017, August 15). What Is A Data Warehouse And How Does It Deliver Value? Retrieved from https://www.minerra.net/business-analytics/what-data-warehouse-how-deliver-value/
Moody, D. a. (2003). From ER models to dimensional models: bridging the gap between OLTP and OLAP design, Part I. Business Intelligence Journal, 7-24.
Serra, J. (2013, July 18). Why You Need a Data Warehouse. Retrieved from http://www.jamesserra.com/archive/2013/07/why-you-need-a-data-warehouse/
TechAdvisory.org. (2015, March 31). Retrieved from http://www.techadvisory.org/2015/03/benefits-of-data-warehouses-for-business/
Zeng, Y. C. (2003). Enterprise integration with advanced information technologies: ERP and data warehousing. Information Management & Computer Security.