Data integration consists of three processes that integrate data from multiple sources into a data
warehouse: accessing the data, combining different views of the data and capturing changes to
the data. It makes data available to ETL (Extraction, Transformation and Load) tools and, through
the three processes of ETL, to the analysis tools of the data warehousing environment.
2. What is a data warehouse and what are its benefits? Why is Web accessibility important
with a data warehouse?
A data warehouse can be defined as ‘a pool of data produced to support decision
making.’ This focuses on the essentials, leaving out characteristics that may vary from one DW
to another but are not essential to the basic concept.
The same paragraph gives another definition: ‘a subject-oriented, integrated, time-variant,
nonvolatile collection of data in support of management’s decision-making process.’ This
definition adds more specifics, but in every case appropriately: it is hard, if not impossible, to
conceive of a data warehouse that would not be subject-oriented, integrated, etc.
The benefits of a data warehouse are that it provides decision-making information, organized in a
way that facilitates the types of access required for that purpose and supported by a wide range
of software designed to work with it.
Web accessibility of a data warehouse is important because many analysis applications are Web-
based, because users often access data over the Web and because data from the Web may feed
the DW.
(Reviewed from chapter-8, Section 8.2)
3. A data mart can replace a data warehouse or complement it. Compare and discuss these
options.
For a data mart to replace a data warehouse, it must make the data warehouse unnecessary. This
would mean that all the analyses for which the data warehouse would be used can instead be
satisfied by a data mart. If this is so, it can be much less expensive, in terms of
development and computer resources, to use multiple data marts instead of an
overall data warehouse.
In other situations, a data mart can be used for some analyses which would in its absence use the
data warehouse, but not all of them. For those, the smaller data mart is more efficient, quite
possibly enough more efficient to justify the cost of having a data mart in addition to a data
warehouse. Here the data mart complements the data warehouse.
4. Discuss the major drivers and benefits of data warehousing to end users.
Major drivers include:
- Increased competition and pace of business, leading to increased need for good decisions
quickly
- Successful pioneering experiences with data warehouses, leading to their wider user
acceptance
- Decreasing hardware costs, making terabyte databases with masses of historical data
economically feasible for more firms
- Increased availability of software to manage a large data warehouse
- Increased availability of analysis tools, making DWs potentially more useful
- Increased computer literacy of decision makers, making them more likely to use these tools.
Benefits include:
- Easier, faster and more flexible reporting
- Identification of data quality issues and help in resolving them
- Improved ability to respond intelligently to new business opportunities
- Accommodation of standards both globally and locally.
(Reviewed from chapter-8, Section 8.6)
5. List the differences and/or similarities between the roles of a database administrator and a
data warehouse administrator.
Since a data warehouse is a specific type of database designed for a specific application area, a
data warehouse administrator has all the roles of a database administrator, plus others. One new
role is advising on decision support uses of the data warehouse, for which a data warehouse
administrator needs to understand decision-making processes. Beyond that, the issue is more a
need for additional skills in the same roles as a database administrator (e.g., understanding
high-performance hardware to deal with the large size of a data warehouse) than it is one of
additional roles.
(Reviewed from chapter-8, Section 8.8)
6. Describe how data integration can lead to higher levels of data quality.
A question involving the word ‘higher’ requires asking ‘higher than what’? In this case, we can
take it to mean ‘higher than we would have for the same data, but without a formal data
integration process.’
Without a data integration process to combine data in a planned and structured manner, data
might be combined incorrectly. That could lead to misunderstood data (a measurement in meters
taken as being in feet) and to inconsistent data (data from one source applying to calendar
months, data from another to four-week or five-week fiscal months). These are aspects of
low-quality data which can be avoided, or at least reduced, by data integration.
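The two problems named above (mixed units and mixed reporting periods) can be illustrated with a minimal Python sketch. It is not taken from the chapter: the `Measurement` record, the source names, and the period labels are hypothetical, and a real integration process would cover far more cases. The idea is only that a planned integration step converts every value to one declared unit and refuses to combine records whose periods do not match, rather than merging them silently.

```python
from dataclasses import dataclass

# Conversion factors to one canonical unit (meters).
TO_METERS = {"m": 1.0, "ft": 0.3048}


@dataclass(frozen=True)
class Measurement:
    source: str   # hypothetical source-system name
    value: float
    unit: str     # unit as declared by the source
    period: str   # hypothetical label, e.g. "calendar_month" or "fiscal_4_week"


def to_meters(m: Measurement) -> float:
    """Convert one measurement to meters; reject units we do not know."""
    if m.unit not in TO_METERS:
        raise ValueError(f"unknown unit {m.unit!r} from source {m.source!r}")
    return m.value * TO_METERS[m.unit]


def check_same_period(measurements) -> None:
    """Refuse to combine records that describe different kinds of period."""
    kinds = {m.period for m in measurements}
    if len(kinds) > 1:
        raise ValueError(f"inconsistent reporting periods: {sorted(kinds)}")


def integrate(measurements):
    """Return all values in meters after checking period consistency."""
    check_same_period(measurements)
    return [to_meters(m) for m in measurements]


readings = [
    Measurement("plant_a", 10.0, "m", "calendar_month"),
    Measurement("plant_b", 10.0, "ft", "calendar_month"),
]
lengths_m = integrate(readings)  # both values are now expressed in meters
```

Without the conversion step, the second reading would be treated as 10 meters instead of roughly 3 meters, which is the kind of misunderstood data described above.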
7. Compare the Kimball and Inmon approaches toward data warehouse development.
Identify when each one is most effective.
Inmon’s approach starts with an enterprise data warehouse, creating data marts as subsets if
appropriate. It is most effective when there is a recognized need for an EDW, an executive
‘champion’ of the project, and a willingness to invest in a data warehousing infrastructure before
it will show results.
Kimball’s approach starts with data marts, consolidating them into an EDW later if appropriate.
It is most effective when a ‘proof of concept’ implementation is wanted before
embarking on a full-scale EDW project or when a well-defined area with the greatest benefits
can be identified.
8. Discuss security concerns involved in building a data warehouse.
Security and privacy concerns are important in building a data warehouse:
1. Laws and regulations, in the U.S. and elsewhere, require certain safeguards on databases
that contain the type of information typically found in a DW.
2. The large amount of valuable corporate data in a data warehouse can make it an attractive
target.
3. The need to allow a wide variety of unplanned queries in a DW makes it impractical to
restrict end user access to specific carefully constrained screens, one way to limit potential
violations.
9. Investigate current data warehouse development implementation through offshoring.
Write a report about it. In class, debate the issue in terms of the benefits and costs, as well as
social factors.
It is impossible to predict what the debate will bring. A student’s position on this issue is related
to his/her feelings on the relationship of national economies to the global economy. It can be
argued that offshoring improves the global economy while potentially harming one or more of
the national economies involved, such as the student’s own. U.S. students may see primarily the
damage they perceive it does to their national economy (and to their own career prospects), but
students in India may take a different view.