A Data Warehouse (DW) is one of the solutions that support the decision-making process in a business organization. However, it only stores data for managerial purposes and has no intelligent mechanism for decision making. This raises the issue of how to store knowledge in an organization to achieve high-capability decision support. The enterprise-wide information delivery systems provided in a DW can be leveraged and extended to create a Knowledge Warehouse (KW). The KW framework introduced here is an enhanced form of the data warehouse that provides a platform and infrastructure to capture, refine, and store consistent and adequate knowledge along with data, in order to improve decision making in an organization. The primary goal of a KW is to provide the decision-maker with an intelligent analysis platform that enhances all phases of the knowledge management process. The KW architecture will not only facilitate the capturing and coding of knowledge but also enhance the retrieval and sharing of knowledge across the organization.
Enterprises today rely on a set of automated tools for knowledge discovery to gain business insight and intelligence. Many kinds of knowledge discovery tools have been developed to help businesses thrive in today’s competitive markets in the age of information. The main tools for extracting information from these vast amounts of data are automated mining tools. Data Mining (DM) is defined as the automated process of analyzing large databases, usually data warehouses or the internet, to discover new information, hidden patterns, behaviors, and traits, and to predict future trends and forecast possible opportunities. DM uses the data stored in data warehouses for analysis. DM directly affects decision-making, as Decision Support Systems (DSS) become increasingly critical to the daily operation of organizations. Successfully supporting managerial decision-making is critically dependent upon the availability of integrated, high quality information organized and presented in a timely and easily understood manner. Corporate DM faces the challenge of systematic knowledge discovery in large data streams to support managerial decision making, because the application of each DM algorithm requires the data to be present in a mathematically feasible format, which is achieved through Data Pre-Processing (DPP). Therefore, the DPP phase represents a complex prerequisite for DM in the process of Knowledge Discovery in Databases (KDD), aiming to maximize the predictive accuracy of DM.
This chapter gives an overview of data mining and the knowledge warehouse, and focuses on a variety of techniques, approaches, and research areas that are helpful in data mining and knowledge discovery applications.
2.2 Distinction among Data, Information, and Knowledge
There are three important technical terms in the information technology world: data, information, and knowledge. This section distinguishes between these terms by clarifying the meaning of each:
In general, data is the value of an observable, measurable, or calculable attribute. Data consists of the measurement and computerization of daily life, and can be considered the atoms of knowledge. Data refers to what we attempt to gather or measure, such as size, age, and amount. Data provides very little explanation about the subject. In the context of computer systems, a data item is defined as “the smallest unit of named data,” consisting “of any number of bits or bytes” [10, 21].
Information refers to organized and stored data that can be used for answering a specific question. Information is the aggregation and subsequent reduction of data. When data is processed (collected, stored, grouped, analyzed, and interpreted) it becomes information; in other words, information is data extracted, filtered, or formatted in some way, such as averages, trends, and percentages. Data by itself has little purpose and meaning, but information is data that has meaning and purpose. It relates to the process of giving our data direct meaning for our business. For example, when we summarize customer sales amounts and subtract the expenses for serving that customer, we obtain profitability numbers. If we do this for each customer and compare the results, we can see which customers are most profitable. In this way, we turn data into information.
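The profitability example above can be sketched in a few lines of Python. This is only an illustration: the customer names, the amounts, and the function names `profit_by_customer` and `rank_customers` are assumptions made for the example and do not come from any cited source.

```python
from collections import defaultdict

# Hypothetical raw data: (customer, sales amount, expense of serving that sale).
transactions = [
    ("Acme", 1200.0, 700.0),
    ("Acme", 800.0, 500.0),
    ("Birch", 1500.0, 1400.0),
    ("Cedar", 400.0, 150.0),
]


def profit_by_customer(rows):
    """Aggregate raw rows into one profitability figure per customer."""
    totals = defaultdict(float)
    for customer, sales, expenses in rows:
        totals[customer] += sales - expenses
    return dict(totals)


def rank_customers(profits):
    """Order customers from most to least profitable."""
    return sorted(profits, key=profits.get, reverse=True)


profits = profit_by_customer(transactions)
ranking = rank_customers(profits)
print(profits)
print(ranking)
```

The individual rows are data; the per-customer totals and the ranking derived from them are information in the sense used in this section.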
Knowledge can refer to information in use. Knowledge is generated when information is combined with context and experience. Knowledge can be thought of as the set of rules and relationships that enable value to be added and skilled performance to be achieved. Knowledge may consist of work procedures and processes, precedents, details, and conceptual relationships between topics and domains. Knowledge is represented in the form of rules. Knowledge is a higher-level aggregation and interpretation of information [10, 21]. Knowledge is more than an accumulation of facts; it is also the rules and procedures needed to make decisions.
Knowledge is a subset of information that has been extracted, filtered, or formatted in a very special way. More specifically, the information we call knowledge is information that has been subjected to and passed tests of validation, or has been validated by common experience.
Scientific knowledge is a type of knowledge defined as information validated by the rules and tests applied to hypotheses and theories by some scientific community. It is a type of knowledge that depends on the validation rules and tests of the organization that improve organizational performance.
2.3 Data Pre-Processing (DPP)
Data Pre-Processing (DPP), also known as data preparation, comprises those techniques concerned with analyzing raw data so as to yield quality data. It mainly includes data collection and integration, data transformation, data cleaning, data reduction, and data discretization.
DPP is an important step in the data mining process, because if there is much irrelevant and redundant information or noisy and unreliable data, then knowledge discovery during the analysis and training phase will be more difficult. DPP is considered a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. DPP is a proven method of resolving such issues. It prepares raw data for further processing such as data mining. It is used in database-driven applications such as customer relationship management and rule-based applications like neural networks. DPP tasks are divided into data reduction, which aims at decreasing the size of the dataset by means of instance selection and/or feature selection, and data projection, which alters the representation of the data, e.g. mapping continuous variables to categories or encoding nominal attributes.
2.3.1 Importance of Data Pre-processing (Preparation) for Knowledge Discovery
Over the years, there has been significant advancement in data-mining techniques. This advancement has not been matched by similar progress in data pre-processing (preparation). Therefore, there is now a strong need for new techniques and automated tools that can significantly assist us in preparing quality data. Data pre-processing can be more time consuming than data mining, and can present equal, if not greater, challenges. High-performance mining systems require quality data, because quality data yields high-quality patterns. In order to obtain quality data, it is important to apply data pre-processing. The importance of data pre-processing can be summarized in the following three points [23, 15]:
1- Data pre-processing (preparation) generates a dataset smaller than the original one, which can significantly improve the efficiency of data mining. This task includes:
Selecting relevant data: attribute selection (filtering and wrapper methods), removing anomalies, or eliminating duplicate records.
Reducing data: sampling or instance selection.
2- Data pre-processing (preparation) generates quality data, which leads to quality patterns (knowledge). For example, we can:
Recover incomplete data: filling in missing values, or reducing ambiguity.
Purify data: correcting errors, or removing outliers (unusual or exceptional values).
Resolve data conflicts: using domain knowledge or expert decisions to settle discrepancies.
3- Data pre-processing is used because real-world data may be incomplete, noisy, duplicated and inconsistent, which can disguise useful patterns. This is due to the following reasons [23, 13]:
Incomplete data: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data.
Noisy data: containing errors or outliers.
Inconsistent data: containing discrepancies in codes or names.
Duplicated data: containing duplicated records (transactions) or attributes with similar names or contents.
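As a minimal sketch of how the four problems listed above might be handled, the following Python fragment removes a duplicated record, resolves an inconsistent country code, treats an implausible age as missing, and fills missing ages with the mean of the known values. The records, the `COUNTRY_CODES` mapping, the age bound, and the choice of mean imputation are all assumptions made for illustration; real projects choose these according to domain knowledge.

```python
# Hypothetical customer records showing the four problems.
records = [
    {"id": 1, "name": "Ali", "country": "EG", "age": 34},
    {"id": 2, "name": "Sara", "country": "Egypt", "age": None},  # incomplete, inconsistent code
    {"id": 3, "name": "Omar", "country": "EG", "age": 430},      # noisy (implausible age)
    {"id": 4, "name": "Mona", "country": "EG", "age": 26},
    {"id": 1, "name": "Ali", "country": "EG", "age": 34},        # duplicate of id 1
]

COUNTRY_CODES = {"Egypt": "EG"}  # assumed domain knowledge for resolving conflicts
VALID_AGE = range(0, 121)        # assumed plausibility bound


def prepare(rows):
    seen, clean = set(), []
    for row in rows:
        if row["id"] in seen:  # eliminate duplicate records
            continue
        seen.add(row["id"])
        row = dict(row)        # work on a copy, leave the input untouched
        row["country"] = COUNTRY_CODES.get(row["country"], row["country"])
        if row["age"] is not None and row["age"] not in VALID_AGE:
            row["age"] = None  # treat an outlier as a missing value
        clean.append(row)
    # Recover incomplete data: fill missing ages with the mean of the known ages.
    known = [r["age"] for r in clean if r["age"] is not None]
    mean_age = sum(known) / len(known) if known else None
    for r in clean:
        if r["age"] is None:
            r["age"] = mean_age
    return clean


prepared = prepare(records)
for r in prepared:
    print(r)
```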
2.3.2 Categorizing of Data Pre-Processing Techniques
The techniques that clean and prepare data for processing and mining to gain high quality and accurate knowledge can be categorized into the following [14, 8]:
Data Cleaning: Data is cleansed through processes such as filling in missing values, smoothing noisy data, or resolving inconsistencies in the data, which we will explain and apply in chapter three, as shown in figure (2-1).
Data Integration: This involves integrating several databases and files into one unified file or database, as we apply in chapter three. Data with different representations is put together and conflicts within the data are resolved, as shown in figure (2-1).
Data Transformation: Data is normalized, aggregated, and generalized.
Data Reduction: This step aims to present a reduced representation of the data in a data warehouse, which we will study and apply in chapter three. In many cases the amount of data is too large, so it is necessary to reduce the data used without losing any of its predictive accuracy or descriptive ability, as shown in figure (2-1).
Data Discretization: This involves reducing the number of values of a continuous attribute by dividing the range of the attribute into intervals.
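Two of the categories above, transformation and discretization, lend themselves to a short worked sketch. The code below assumes min-max scaling as the normalization method and equal-width intervals as the discretization method; both are common choices but are assumptions here, and the function names are hypothetical.

```python
def min_max_normalize(values):
    """Data transformation: rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Data discretization: map each value to the index of its interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value is placed in the last bin rather than in a bin of its own.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, 30, 40, 60]
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 2)
print(normalized)
print(bins)
```

Here the four ages are reduced to two interval labels, which is the reduction in the number of values that discretization aims for.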
2.4 Data Cleaning (DC) and Knowledge Warehouse (KW)
Data cleaning (DC) tools play an important role in improving data quality (DQ). There are various data cleaning tools, and their functionalities can be classified into declarative data cleaning and the rule-based approach for DC (RBDC). Here we explain the RBDC approach. The RBDC approach uses Business Rules (BR), which are constraints on data contained in business policies, expressed in a formal form in order to allow computerization; the BR are considered a specific part of all domain knowledge (DK). The main challenge in DK is knowledge discovery, which is usually carried out manually by human experts (through interviews or questionnaires) or collected from books and references. The first step in the RBDC approach is the collection of the knowledge from which the rules are derived; the collected and organized knowledge is then translated into rules.
Knowledge, like data, is collected or derived from different sources and may be incomplete, inconsistent, and heterogeneous; therefore, the quality of knowledge and rules needs to be managed to ensure high data cleaning (DC) quality.
The rule base system (RBS) that uses the RBDC approach can be an agent of change to improve data quality. The objective of an RBDC system is to improve the functional capability of data quality management by embedding it with knowledge of the problem area.
RBDC systems have many advantages, which are:
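To make the idea of formalized business rules concrete, the sketch below represents each rule as a named constraint on a record and reports which rules a record violates. It is a toy illustration under stated assumptions, not a description of any particular RBDC tool: the three rules, the field names, and the functions `violations` and `split_by_quality` are all hypothetical.

```python
# Each business rule is a named constraint on a record (hypothetical rules).
RULES = [
    ("age_in_range", lambda r: 0 <= r["age"] <= 120),
    ("email_has_at", lambda r: "@" in r["email"]),
    ("ship_after_order", lambda r: r["ship_day"] >= r["order_day"]),
]


def violations(record, rules=RULES):
    """Return the names of all rules that the record breaks."""
    return [name for name, check in rules if not check(record)]


def split_by_quality(records, rules=RULES):
    """Separate records that satisfy every rule from those needing repair."""
    clean, dirty = [], []
    for record in records:
        broken = violations(record, rules)
        (dirty if broken else clean).append((record, broken))
    return clean, dirty


records = [
    {"age": 30, "email": "a@example.com", "order_day": 3, "ship_day": 5},
    {"age": 150, "email": "b.example.com", "order_day": 9, "ship_day": 4},
]
clean, dirty = split_by_quality(records)
print(len(clean), "clean record(s)")
print(dirty[0][1])
```

Keeping the rules in a list separate from the checking code reflects the point made above: the rules are derived from collected domain knowledge and can be reviewed or updated without changing the cleaning logic.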
1. Quality improvement.
2. Practical knowledge made applicable: Systems can assist experts in decision making even if they have that knowledge to hand; this improves the accuracy and timeliness of the decision made.
3. Infallible and complete.
4. Consistency: Results produced by a knowledge system are consistent throughout its operational lifespan.
5. Updating knowledge.
2.5 Knowledge Management (KM)
Knowledge Management (KM) is the gathering, retaining, and disseminating of intellectual capital (i.e. data, information, and knowledge) to generate a competitive advantage in the market.
KM is a common function in organizations since it can create, store, retrieve, transfer, and reuse knowledge. Nowadays, knowledge has become the key resource for an organization’s economic strength and the main factor for organizations to gain competitive advantages. In addition, KM in organizations is achieved as a result of having sufficient factors such as a learning organization, knowledge sharing intention, team activity, top management commitment, collaboration, and quality of knowledge. KM is the practice of adding actionable value to information by capturing tacit knowledge and converting it to explicit knowledge; by filtering, storing, retrieving, and disseminating explicit knowledge; and by creating and testing new knowledge.
Knowledge Management can benefit a corporation in a number of ways, including:
• Leverage “lessons learned” to lower expenses.
• Share information to generate new ideas and increase revenues or decrease expenses.
• Improve the corporation’s ability to adapt to change and opportunities in the market.
• Foster innovation through the sharing of past solutions and collective ideas.
There are two types of knowledge: explicit knowledge and tacit (implicit) knowledge:
1. Explicit Knowledge: knowledge that can be expressed formally and can be easily communicated or spread throughout an organization. Explicit knowledge is codified in strings of symbols (e.g., words, numbers, and formulas), physical objects (e.g., equipment, documents, and models), rules, routines, or standard operating procedures.
2. Implicit (Tacit) Knowledge: knowledge that is uncodified and difficult to spread. Tacit knowledge is learned through extended periods of experiencing and doing a task. Despite being uncodified, implicit knowledge can be, and regularly is, taught and shared.
Implicit knowledge includes beliefs, perspectives, and mental models. It consists of the subjective expertise, insights, and intuitions that a person develops from having been immersed in an activity or profession for an extended period of time.
Implicit knowledge and explicit knowledge are complementary entities. Through four interaction and conversion processes between implicit and explicit knowledge, four modes of new organizational knowledge are created. These processes are Socialization, Articulation (Externalization), Combination, and Internalization, as shown in figure (2-2).
1- Socialization (Tacit to Tacit Knowledge Sharing):
It is the process of sharing a particular form of implicit knowledge with other forms of implicit knowledge such as experiences, technical skills, and mental models. For example, an apprentice cannot learn a craft alone but only with the assistance of a master, by observing, imitating, and practicing; this represents the mode of sharing tacit knowledge in the business world.
2- Articulation (Tacit to Explicit Knowledge Conversion):
It is also called Externalization. It is the process of converting tacit knowledge to explicit knowledge. It includes specifying the purpose of the decision; for example, the number and locations of warehouses must be known and understood in order to specify supply costs in a new marketing area. It also includes articulating parameters, objective functions, relationships, etc. Articulation may also include knowledge extraction in expert systems. A DSS can enhance the conversion of implicit to explicit knowledge through the specification of mathematical models. In the process of model building, tacit knowledge is required to explicitly specify the objective of the model, the decision variables, and the relative importance of the decision variables; it is also required to explicitly specify the model constraints in terms of the decision variables. This process is done through such exchange mechanisms as conversation and reflection.
3- Combination (Explicit to New Knowledge Conversion):
It is also called Integration. It is the process of combining several types of explicit knowledge into new patterns and new relations. This mode of knowledge conversion involves the use of social processes to combine different bodies of explicit knowledge held by individuals, where individuals exchange and combine knowledge through exchange mechanisms such as meetings and telephone conversations.
4- Internalization (Explicit to Tacit Knowledge Conversion):
It is also called Understanding. It is the process of testing and validating the new relationships in the proper context, thereby converting them into new tacit knowledge. This process bears some similarity to the traditional notion of “learning” and will be referred to here as internalization.
2.6 Knowledge Base Management System (KBMS)
The concept of a Knowledge Base Management System (KBMS) is analogous to that of a Data Base Management System (DBMS), and the concept of a Knowledge Warehouse (KW) is analogous to that of a Data Warehouse (DW). The primary component of the KW architecture is a KBMS that integrates the knowledge base, model base, and analysis tasks. It is implemented in an object-oriented environment. It manages not only data but also all of the objects, object models, process models, case models, object interaction models, and dynamic models used to process the knowledge and to interpret it to produce the knowledge base.
2.7 Data Base Management System (DBMS) in Contrast with Knowledge Base Management System (KBMS):
There are many different definitions of a Data Base Management System (DBMS). According to Patrick O’Neill, it is “a program product for keeping computerized records about an enterprise”, and according to Rumbaugh, Blaha, Premerlani, Eddy, and Lorensen, it is “a computer program for managing a permanent, self-descriptive repository of data”. Combining various aspects of these definitions, a DBMS is a computer program for managing a data repository or Data Base (DB). A specific DBMS application is produced by using a DBMS template (such as Oracle, DB2, etc.) to create, maintain, and enhance it; the application is thus a concrete product of using a particular template or tool to produce an actual database management application.
Moving to the Knowledge Base Management System (KBMS), it is defined as a computer application for managing (creating, enhancing, and maintaining) the Artificial Knowledge Base (AKB), which is the Knowledge Base (KB) itself, just as a DBMS is a computer application for managing a DB. The KBMS is both the Artificial Knowledge Management System (AKMS) and the knowledge warehousing system and, unlike the DBMS, there are no tools analogous to DBMS templates available for developing AKMSs. Since the KBMS is the AKMS, the standard that the AKMS Company is developing for the AKMS is also considered the KBMS standard, and any software tools developed on the basis of the standard will be KBMS tools as well as AKMS tools.
2.8 The Knowledge Management Process Model
The process model associated with knowledge management consists of the following well-defined activities, which are explained below in detail and are represented in figure (2-3):
1- Knowledge Acquisition
During knowledge acquisition, the knowledge engineer captures knowledge from domain experts through interviews, case histories, and other techniques. This knowledge can be represented as rules and heuristics for expert systems. This activity helps to ensure the quality of the data and information used by knowledge workers.
2- Knowledge Refinement
This activity assists in the refinement of data and information into knowledge. During knowledge refinement, the information contained in multiple repositories under multiple heterogeneous representations is classified and indexed, and metadata is created in terms of domain concepts, relationships, and events. In addition, the domain context and domain usage constraints are specified. Data mining and data analysis techniques can be applied to discover patterns in the data and to detect outliers.
3- Knowledge Storage and Retrieval
This activity allows the efficient storage and retrieval of metadata and knowledge. The refined data, metadata, and knowledge are indexed and stored for fast retrieval using multiple criteria, for example, by concept, by keyword, by author, by event type, and by location. Additional summary knowledge may also be added. Access controls and security policies should be put in place to protect the knowledge base and the intellectual property it contains.
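Retrieval by multiple criteria can be illustrated with a small in-memory index. The sketch below is an assumption-laden toy, not a KBMS design: the class `KnowledgeStore`, its methods, and the sample entries are hypothetical, and only three of the criteria mentioned above (concept, keyword, author) are indexed.

```python
from collections import defaultdict


class KnowledgeStore:
    """Toy in-memory store that indexes items by several metadata fields."""

    INDEXED_FIELDS = ("concept", "keyword", "author")

    def __init__(self):
        self.items = []
        self.index = {f: defaultdict(set) for f in self.INDEXED_FIELDS}

    def add(self, content, **metadata):
        """Store an item and register it under each indexed metadata value."""
        item_id = len(self.items)
        self.items.append({"content": content, **metadata})
        for field in self.INDEXED_FIELDS:
            if field in metadata:
                self.index[field][metadata[field]].add(item_id)
        return item_id

    def find(self, **criteria):
        """Return the contents of items matching all criteria (indexed fields only)."""
        if not criteria:
            return []
        id_sets = [self.index[f].get(v, set()) for f, v in criteria.items()]
        matches = set.intersection(*id_sets)
        return [self.items[i]["content"] for i in sorted(matches)]


store = KnowledgeStore()
store.add("Refunds over 500 need manager approval",
          concept="refund", keyword="approval", author="finance")
store.add("Refund FAQ", concept="refund", keyword="faq", author="support")

print(store.find(concept="refund"))
print(store.find(concept="refund", author="support"))
```

Each criterion narrows the result by set intersection, so adding more criteria to a query can only return the same items or fewer.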
4- Knowledge Distribution
Knowledge can be distributed in many ways, for example through a corporate knowledge portal. Electronic messaging may also be used to distribute knowledge in the form of attachments of documents, presentations, etc. Another approach is to have active subscription services whereby agents inform users of relevant information in e-mail messages with hyperlinks to knowledge in the repository.
5- Knowledge Presentation
The knowledge portal may handle knowledge presentation, and the interface may be tailored to the needs and preferences of each individual user. The portal should support user collaboration so as to combine tacit knowledge with explicit knowledge for problem solving.
2.9 Knowledge Component
A Knowledge Component (KC) is a description of a mental structure or process that a learner uses, alone or in combination with other knowledge components, to accomplish steps in a task or a problem. A KC is a generalization of everyday terms like concept, principle, fact, or skill, and cognitive science terms like schema, production rule, misconception, or facet. A KC is either explicit or implicit. When we say a student “has” a knowledge component, it might mean the student can describe it in words (e.g., “Vertical angles are congruent”); this is called an explicit knowledge component, like a fact or principle. Or it might simply mean that the student behaves as described by the knowledge component: when we say that the student “has” the knowledge component “If angle A and B are vertical angles and angle A is X degrees, then angle B is X degrees”, we mean the student will behave in agreement with it although they might not be able to state the rule; this is called an implicit knowledge component, like a skill. A KC relates features to a response, where both the features and the responses can be either external (in the world), like cues in a stimulus and a motor response, or internal (in the mind), like inferred features and a new goal. KCs are either correct or incorrect. They are correct when all of the features are relevant to making the response and none of them are irrelevant. They are incorrect when they include irrelevant features; for example, in geometry, the knowledge component “if angles look equal, then conclude they are equal” is incorrect because it includes an irrelevant feature, “angles look equal”, and is missing a relevant feature like “the angles are at the base of an isosceles triangle”.
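The feature-to-response view of a KC, and the test for correctness described above, can be sketched as a small data structure. This is a simplified illustration: the class `KnowledgeComponent`, the functions `diagnose` and `is_correct`, and the idea of supplying the set of relevant features explicitly are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeComponent:
    features: frozenset  # conditions the learner attends to
    response: str        # what the learner does or concludes


def diagnose(kc, relevant_features):
    """Return (irrelevant features used, relevant features missing)."""
    irrelevant = kc.features - relevant_features
    missing = relevant_features - kc.features
    return irrelevant, missing


def is_correct(kc, relevant_features):
    """A KC is correct when it uses all relevant features and no irrelevant ones."""
    irrelevant, missing = diagnose(kc, relevant_features)
    return not irrelevant and not missing


relevant = frozenset({"the angles are at the base of an isosceles triangle"})
kc_good = KnowledgeComponent(relevant, "conclude the angles are equal")
kc_bad = KnowledgeComponent(frozenset({"angles look equal"}),
                            "conclude the angles are equal")

print(is_correct(kc_good, relevant))
print(diagnose(kc_bad, relevant))
```

For the incorrect geometry KC, `diagnose` reports both problems named in the text: one irrelevant feature is used and one relevant feature is missing.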
2.9.1 Types of Knowledge Components
1- Domain knowledge
• Facts, concepts, principles, rules, procedures, strategies
2- Prerequisite knowledge
• Feature encoding knowledge
3- Integrative knowledge
• Schemas or procedures that connect other KCs.
4- Metacognitive knowledge
• About knowledge, controlling use or acquisition of knowledge.
5- Beliefs & interests
• What one likes, believes
2.9.2 Knowledge Object:
Almost all subject matter content can be represented as entities (things), actions (procedures that can be performed by a student on, to, or with entities or their parts), processes (events that occur, often as a result of some action), and properties (qualitative or quantitative descriptors for entities, actions, or processes). The knowledge required to learn about entities, actions, or processes can be represented by a collection of knowledge components, which we call a knowledge object. A knowledge object is a framework consisting of containers for different kinds of specific information (the knowledge components). Each knowledge component is a container for a specific kind of information about the subject matter being taught. This knowledge framework is the same for a wide variety of different topics within a subject matter or for knowledge in different subject matter domains. Knowledge objects include knowledge components that: (a) name, describe, or illustrate some entity; (b) name, describe, or illustrate the parts of an entity; (c) identify properties of an entity, part, action, or process; (d) identify actions associated with the entity; (e) identify processes that modify the entity; and (f) identify kinds of entities, actions, or processes. Table (2-1) identifies each of these knowledge components.
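Table (2-1): Knowledge components of a knowledge object (partial; earlier rows are missing)
... : process trigger
Process: condition (value of property); consequence (property value changed); process trigger
Kind: definition (list of property values)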
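The container view of a knowledge object can be sketched as a pair of data classes. The field names follow items (a) to (f) in the text and the condition and consequence entries of the table; everything else, including the class names, the `trigger` method, and the kettle example, is a hypothetical illustration rather than a definition taken from the cited source.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    condition: dict    # property values that must hold before the process runs
    consequence: dict  # property values that are changed by the process


@dataclass
class KnowledgeObject:
    name: str
    description: str
    parts: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)
    actions: list = field(default_factory=list)
    processes: list = field(default_factory=list)
    kinds: list = field(default_factory=list)

    def trigger(self, process_name):
        """Run a process if its condition matches the current property values."""
        for p in self.processes:
            if p.name == process_name and all(
                self.properties.get(k) == v for k, v in p.condition.items()
            ):
                self.properties.update(p.consequence)
                return True
        return False


kettle = KnowledgeObject(
    name="kettle",
    description="a vessel for boiling water",
    parts=["lid", "handle"],
    properties={"state": "cold"},
    actions=["switch on"],
    processes=[Process("heat", {"state": "cold"}, {"state": "boiling"})],
)

first = kettle.trigger("heat")
second = kettle.trigger("heat")
print(first, second, kettle.properties)
```

The second call returns `False` because the condition (a cold kettle) no longer holds once the consequence of the first call has changed the property value.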
2.10 Knowledge Quality
There are many definitions of quality, such as “fitness for use”, “fitness for purpose” as defined by ISO (2000), and “conformance to requirements” as defined by Crosby (1979). The international definition of knowledge quality is “the quality of design and quality of the process; it is perception of the value of the suppliers’ work output” as defined by Deming (1940). Knowledge quality is an important factor in the knowledge management process because quality knowledge is useful for purposes such as problem solving, decision support at work, and innovation. Knowledge management performance can be measured by knowledge quality.
There are eight critical dimensions or categories of quality, proposed by Garvin (1987), that can serve as a framework for strategic analysis:
1- Performance: refers to a product’s primary operating characteristics.
2- Features: are usually the secondary aspects of performance, those characteristics that supplement their basic functioning.
3- Reliability: this dimension reflects the probability of a product failing within a specified time period.
4- Conformance: is the degree to which a product’s design and operating characteristics meet established standards.
5- Durability: a measure of product life, durability has both economic and technical dimensions.
6- Serviceability: is the speed, courtesy, competence, and ease of repair. Consumers are concerned not only about a product breaking down but also about the time before service is restored, the timeliness with which service appointments are kept, the nature of dealings with service personnel, and the frequency with which service calls or repairs fail to correct outstanding problems.
7- Aesthetics: is a subjective dimension of quality. It demonstrates how a product looks, feels, sounds, tastes, or smells. It is a matter of personal judgment and a reflection of individual preference. On this dimension of quality it may be difficult to please everyone.
8- Perceived (recognized) quality: consumers do not always have complete information about a product’s or service’s attributes; indirect measures may be their only basis for comparing brands. A product’s durability, for example, can seldom be observed directly; it must be inferred from various tangible and intangible aspects of the product.
2.11 Knowledge Warehouse
A Knowledge Warehouse (KW) can be thought of as an “information repository”. It consists of knowledge components (KCs), which are defined as the smallest level into which knowledge can be decomposed. KCs are cataloged and stored in the knowledge warehouse for reuse through reporting, documentation, execution of the knowledge, or query and reassembly, which are accomplished and organized by instructional designers or technical writers. The idea of a KW is similar to that of a Data Warehouse (DW). Like a DW, the KW provides answers for ad-hoc queries, the knowledge in it can reside in several physical places, and it may be viewed as subject oriented, integrated, time-variant, and supportive of management’s decision making processes. But unlike a DW, it is a combination of volatile and nonvolatile objects and components and, of course, it stores not only data but also information and knowledge [10, 21].
A KW is a component of an enterprise’s knowledge management system. It is the technology used to organize and store knowledge. It also has logical structures, such as computer programs and databases, to store knowledge; these are analogous to the system of tables that implements data storage in the DW. The primary goal of a KW is to provide the knowledge worker with an intelligent analysis platform that enhances all phases of the knowledge management process [5, 19].
The KW can also evolve over time by enhancing the knowledge it contains. It provides the infrastructure needed to capture, cleanse, store, organize, leverage, and disseminate not only data and information but also knowledge.
2.12 Knowledge Discovery Process
Knowledge discovery in databases (KDD) is a rapidly growing field, whose development is driven by strong research interests as well as urgent practical, social, and economic needs. The term KDD is used to denote the overall process of turning low-level data into high-level knowledge. A simple definition of KDD is as follows: Knowledge discovery in databases is “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”.
Knowledge discovery has also been defined as “the non-trivial extraction of implicit, previously unknown and potentially useful information from data”. It is a process in which data mining plays an important role in extracting knowledge from a huge database or data warehouse. Data mining is the core part of the KDD process, as shown in figure (2-4).
Figure (2-4): The typical knowledge discovery process 
The KDD process may consist of the following steps: 1) data integration; 2) data selection and data pre-processing, as explained in section 2.3; 3) data mining, as will be explained in section 2.13; and 4) interpretation and assimilation. Data comes in, possibly from many sources. It is integrated and placed in some common data store. Part of the data in the data store is then selected and pre-processed into a standard format. This ‘prepared data’ is then passed to a data mining algorithm, which produces an output in the form of rules or some other kind of ‘patterns’. These are then interpreted to give new and potentially useful knowledge. Although the data mining algorithms are central to knowledge discovery, they are not the whole story. The pre-processing of the data and the interpretation of the results are both of great importance.
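The four steps above can be sketched as a chain of small functions. This is a deliberately simple illustration: the shopping-basket data, the function names, and the use of pair counting with a minimum support threshold as the "mining algorithm" are all assumptions chosen to keep the example short, not a method prescribed by the text.

```python
from collections import Counter
from itertools import combinations


def integrate(*sources):
    """Step 1: merge records from several sources into one common store."""
    return [record for source in sources for record in source]


def select_and_preprocess(store):
    """Step 2: keep non-empty baskets and put items into a standard format."""
    return [sorted({item.strip().lower() for item in basket})
            for basket in store if basket]


def mine(baskets, min_support):
    """Step 3: count item pairs and keep those seen at least min_support times."""
    counts = Counter(pair for basket in baskets
                     for pair in combinations(basket, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def interpret(patterns):
    """Step 4: turn the discovered patterns into readable statements."""
    return [f"{a} and {b} were bought together {n} times"
            for (a, b), n in sorted(patterns.items())]


# Hypothetical data from two sources, with inconsistent formatting.
shop_a = [["Bread", "milk"], ["bread ", "Milk", "eggs"]]
shop_b = [["milk", "bread"], []]

prepared = select_and_preprocess(integrate(shop_a, shop_b))
knowledge = interpret(mine(prepared, min_support=2))
print(knowledge)
```

Without the pre-processing step, "Bread", "bread " and "bread" would be counted as different items and the pattern would not reach the support threshold, which illustrates why pre-processing matters as much as the mining algorithm itself.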
2.13 Data Mining Technique
Data mining (DM) is one of the important techniques used to discover the knowledge required by an enterprise.
Data mining derives its name from the similarities between searching for valuable information in a large database and mining rocks for a vein of valuable ore. Since mining for gold in rocks is usually called “gold mining” and not “rock mining”, by analogy data mining should have been called “knowledge mining” instead.
Data mining is the knowledge discovery process of analyzing large volumes of data from various perspectives and summarizing it into useful information. Data mining is the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies, models, and significant structures, from large amounts of data stored in databases, data warehouses, or other information repositories [31, 34].
The goal of data mining is to allow a corporation to improve its marketing, sales, and customer support operations through a better understanding of its customers. Data mining transforms data into actionable results. Other similar terms referring to data mining are data dredging, knowledge extraction, and pattern discovery. In order to apply a data mining technique, several processes must be performed, which are shown in figure (2-5).
2.13.1 Data Mining Tasks 