In today's businesses, data is omnipresent (Cukier, 2010). To surmount the challenges posed by the growing volumes of data generated by individuals, groups and organizations, modelers have created and used analytical models to find solutions and patterns that make effective use of this mass influx of data. Selecting the relevant business data from which knowledge is generated can further enhance problem-solving and decision-making by its users. To do so, it is important to know what analytical models are and how humans utilize them.
Humans have systematically created and used quantitative and qualitative mathematical models for about 60 years to optimize processes (Hämäläinen et al., 2013). Owing to the assumption that analytical models provide quicker, better and more rational outcomes, models have been used constantly in many fields for decision-making (Hämäläinen et al., 2013; Luoma, 2016).
Although analytical models are considered to yield rational outcomes, the human aspects of applying them cannot be neglected. Human intervention determines which models are created and selected in specific situations, and this can shape the data and the entire process. It therefore follows that one should examine how a person or an organization uses models in processes to eventually arrive at viable decisions. This has been the focus of Behavioral Operational Research, a field that deserves more attention, as studies in this field have proven to be an important part of understanding the role of humans in processes that lead to model-based decision-making.
The increasing use of technology, and the persistent exposure of terms such as data and governance in the mass media, has stimulated the question of how the elements of humans, data, governance and decisions could be combined. This sets the scope of this Bachelor's thesis. The focus lies on how humans use analytical models to arrive at decisions, which is reflected in the title of this thesis:
Governance of analytical models – A systematic review of how humans can enhance decision-making through the application of models.
3. Literature research methodology
The following section describes the literature research methodology and defines the relevant terminologies identified, namely models, Operational Research (OR), Behavioral Operational Research (BOR), Knowledge Discovery in Databases (KDD) and governance, all of which are associated with humans and analytical models.
To find literature and materials on the initiation and problem formulation, a literature search was conducted at the international and national level through the application of the structured literature research approach provided by Webster and Watson (2002). The objective was to identify relevant sources using keywords in the local library and in digital databases such as Google Scholar, JSTOR and ScienceDirect.
First, this led to chained searches combining keywords such as analytical models, models, humans, behavior, knowledge, decision and governance together with the term operational research, which represents the field of study. The literature found was then screened, but the screening was not limited to titles and abstracts, owing to the limited knowledge and number of publications in this area. All sources identified in this step were therefore considered relevant.
Next, as proposed by Webster and Watson (2002), the citations in these sources were traced to identify prior works that could also be relevant.
Lastly, works citing the relevant sources found in the first step were examined to incorporate scientific work that followed the main literature.
Models: Models are simplifications of reality (Ackoff, 1977). In Operational Research, a field that has existed for over 60 years (Churchman et al., 1957), models should guide people to optimal processes and solutions (Hämäläinen et al., 2013; Hämäläinen & Lahtinen, 2016; O'Keefe, 2016).
Operational Research (OR): OR is often used as a synonym for modeling (Pidd, 2004; Luoma, 2016) and describes the act of creating mathematical and qualitative models (Hämäläinen et al., 2013).
Behavioral Operational Research (BOR): Similar to other fields of academic research, the OR field of study has matured in recent years (Franco & Hämäläinen, 2016b). BOR has become a solid research area within the broader field of OR, focusing on how to “make better use of OR models and to warn of the behavioral issues that need to be taken into account when using models to support decision making or making predictions” (Hämäläinen et al., 2013, p. 624). Research areas such as finance and accounting have seen similar developments, creating new research domains such as behavioral finance and behavioral accounting (Franco & Hämäläinen, 2016a, 2016b; Becker, 2016).
Knowledge Discovery in Databases (KDD): KDD is used as a synonym for the Knowledge Discovery Process (KDP) (Cios et al., 2007). KDD is “concerned with the development of methods and techniques for making sense of data” and defines a process that aims to attain knowledge through the use of analytical models (Fayyad et al., 1996a). Fayyad et al., who are considered the pioneers of KDD (Cios et al., 2007), emphasized that data mining is a step within the KDD that deals with the application of algorithms to obtain patterns (Fayyad et al., 1996a).
Governance: Depending on the time frame, governance has had different connotations. Today, the term has evolved to relate to corporate governance and the notion that individuals in organizations should not make decisions independently (Thomas, 2006; Hufty, 2007).
Section four digs deeper into the existing literature to provide a systematic overview.
4.1 Behavioral Operational Research
In Behavioural operational research: Returning to the roots of the OR profession, the editors Franco & Hämäläinen (2016b) acknowledge that an interest in human interaction with analytical models has existed since the work of Churchman (1970) and Ackoff (1977). Yet it was not until 2013 that Hämäläinen et al. first introduced the term BOR, under which the behavioral factors of humans were to be analyzed in the course of modeling and model-based problem-solving. Debates on best practices for OR models were ongoing, although the human aspect within OR models had previously been rarely explored (Hämäläinen et al., 2013). Their identification of research gaps in BOR led to papers that complemented their initial study. Since then, many researchers have taken an interest in investigating human interaction with models. Franco & Hämäläinen (2016a, 2016b) have in this regard identified three different focuses in the field of BOR. These concepts are:
1. Model Actors
2. Model Praxis
3. Model Methods
It therefore seems a natural next step to examine how the studies that followed the initial gap identification of Hämäläinen et al. (2013) can be categorized into one of the mentioned concepts. This could thoroughly explain how humans interact with models and how this affects their decision-making.
4.1.1 Model actors
The first concept is that humans interact with models as model actors. Model actors are described as those who “design, implement and influence processes” that depend on analytical models (Franco & Hämäläinen, 2016b, p. 792). Here, the emphasis lies on the idea that OR actors are not simply the individuals centered upon the OR-work, but all who are part of, or have interacted with, OR (Franco & Hämäläinen, 2016b). Although Franco & Hämäläinen (2016b) and Hämäläinen et al. (2013) mention that OR actors could potentially be anyone who has interacted with OR activities, including, for instance, problem owners, clients and sponsors, studies have rather concentrated on the actors centered around OR processes, with exceptions such as Becker (2015), who considered the non-expert use of OR in his research. On these grounds, it is reasonable to focus on the actors centered upon the OR-work and how they interact with models.
Papers such as White (2016) differentiate between model-based problem-solving and decision-making by individuals, groups and organizations. Studies that followed Hämäläinen et al. (2013) have also focused on various aspects of BOR and on model actors in their interaction with models, distinguishing individuals (O'Keefe, 2016; Monks et al., 2014), groups (Franco, 2013; Tavella & Franco, 2015; Hämäläinen & Lahtinen, 2016) and organizations (Luoma, 2016; White, 2016). This suggests that it is useful to divide OR actors into these subcategories.
The first OR actors that must be regarded are individuals. Individuals can be distinguished by their role in model creation and model use. Individuals who create models are commonly referred to as “modellers” (O'Keefe, 2016) or “modelers” (Hämäläinen et al., 2013; 2016). They are shaped by their cognitive style and cognitive biases (O'Keefe, 2016).
First introduced in the area of psychology, the term cognitive style denoted individuals' distinctive and selective perceptions of patterns in the world (Klein, 1951) and was later defined in relation to how an individual organizes information (Cronbach, 1960). According to O'Keefe (2016), cognitive style is a fundamental personality trait that is not influenced by situations. Cognitive biases, on the other hand, can be influenced by specific situations and events. Cognitive biases are associated with judgments made under uncertainty (Tversky & Kahneman, 1974) and therefore account for decisions that differ from usual decision-making (O'Keefe, 2016). For this reason, model creators are characterized by their cognitive style, which is eventually reflected in the outcome.
While the cognitive styles of individuals shape outcomes, the level of an individual's participation in model creation can also shape them. Through a case study in laboratory settings, Monks et al. (2014) found that individuals who create models learn from their involvement in model creation and are able to cast doubt on their assumptions. This prompts modelers to find alternative approaches when their models reach their limitations. It is therefore a logical consequence that individuals involved in model creation have a better understanding of the modeling problem than individuals restricted to model use (Monks et al., 2014).
Model users are individuals who use pre-established models. Similar to model creators, O'Keefe (2016) notes that users who are open to considering alternative models perform better than users who search for solution approaches within one model, referred to as “attribute-focused processing” (Bell & O'Keefe, 1995, p. 1022). Yet the emphasis lies on the fact that model use alone causes limitations, as the experience and insights gained through model creation are not available to aid decision-making (Monks et al., 2014). This suggests that humans interact better with models when they thoroughly understand them through their role in model creation.
Next, model actors in groups interact with models to gain knowledge (Tavella & Franco, 2015) and to come to agreements (Rouwette et al., 2011; Hämäläinen & Lahtinen, 2016). Rouwette et al. (2011) examined seven groups in their model-building processes and found that the models affected the behaviors and attitudes of the group members. To explain this, the act of persuasion within groups was closely examined. Most importantly, model-building triggers group interaction, which results in an exchange of ideas among the group members. This participation in model creation eventually acts as a persuasion mechanism, changing members' thoughts to reach unanimous group agreement. This has previously been studied under the concept of groupthink (Janis, 1982). Understanding groupthink makes clear that groups can arrive at decisions that are due not to the model itself but to the interaction of the people who gathered to build it (Hämäläinen & Lahtinen, 2016).
Groups can also reach consensus by utilizing models to tackle group boundaries (Franco, 2013). In his case study, Franco observed that the most common boundaries, namely syntactic, semantic and pragmatic boundaries, occur when groups are collaboratively engaged in problem-solving.
To handle syntactic boundaries, i.e. difficulties in communication, the groups utilized models to create a common language in which their members could share knowledge. Having overcome the syntactic boundary, a group was able to tackle semantic problems. Semantic boundaries are differences in knowledge and specialization among the group members. In this case, models helped to translate the underlying problem, stimulating shared understanding and acknowledgement of the members' interdependencies. The last problem encountered was the pragmatic boundary, which represented the conflicts of interest that emerged after the previously mentioned boundaries had been overcome. The utilization of models for pragmatic boundaries helped in only one of the two workshop cases. Yet in that group, models supported negotiations, alternatives for tackling problems collectively, and the achievement of shared interests (Franco, 2013).
Eventually, studies suggest that model-based problem-solving in groups provides a basis for better decision-making (Franco et al., 2016; White, 2016), as humans individually have limited rationality (Luoma, 2016). It is therefore apparent that groups largely depend on models to find agreement and to act better collectively.
Lastly, humans can utilize models in organizations to support decision-making. Luoma (2016) examines organizations separately from individuals and groups, as he regards them as fundamentally different. The main difference lies in the fact that organizations are not unitary but are coalitions of individuals and groups with diverging interests (Cyert & March, 1963). Therefore, although organizational decision-making provides collective behavior, collective-mind perspectives (White, 2016) and collective rationality (Luoma, 2016), the act itself cannot be carried out as easily as when individuals or groups come to an agreement, owing to conflicts of interest and individual objectives (Cyert & March, 1963).
Another study has shown that the application of models can in fact improve organizational operations (Pidd, 2004). Pidd's study suggests that the utilization of models in organizations hinges on two types of situations that need to be distinguished before decisions are made. Depending on the complexity of the underlying problem, the first situation calls for models for routines and automation, while the other demands human interaction and thinking. Based on this differentiation, Luoma (2016) further examines the difference between these two situations as well as the model applications each requires for decision-making. He identifies two kinds of models, one for routine decision-making and one for problem-solving, to be applied to the two underlying situations.
Models for routine decision-making are found in organizational situations that are recurring. Decision-making procedures and frames are pre-established, so there is no need to constantly reflect upon the problem and the model used in this process. Routines are deemed favorable as they bring stability into the organization (Cyert & March, 1963) and enhance the efficiency, accuracy and predictability of decision-making (Luoma, 2016). Models for routine decision-making are therefore designed for simple recurring decision-making processes within an organization, enhancing its operations. Yet routines may cause limitations. Repeated application of the same models can lead to narrow problem framings and failures, as it prevents the recognition that different models are required as circumstances and situations evolve (Luoma, 2016).
Routine decision-making fails when the circumstances of a situation become novel and invalidate the existing decision-making procedures and frames (Pidd, 2004). According to Pidd, these situations require human interaction instead of automation, which means organizations need alternative models to overcome the problem. For these types of situations, Luoma (2016) uses the term models for problem-solving, which cope with situations demanding human interaction and more cognitive effort. As novel situations require more cognitive effort than routine decision-making, they are susceptible to errors due to the lack of existing experience (Mingers, 2011). This problem becomes serious if the individual users of models for problem-solving within the organization lack the knowledge gained from creating the model (Monks et al., 2014) and thus cannot doubt and challenge its outcome.
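This distinction between routine and problem-solving situations can be made concrete in a few lines of code. The sketch below is purely illustrative (the situation names and the dispatch rule are invented for this example, not taken from Pidd or Luoma): recurring situations are resolved by a pre-established routine model, while novel situations are escalated to human problem-solving.

```python
# Hypothetical dispatcher in the spirit of Pidd's (2004) distinction:
# known, recurring situations use an automated routine model;
# novel situations require human interaction instead.
KNOWN_CASES = {"monthly restock": "order standard quantity"}

def decide(situation):
    if situation in KNOWN_CASES:                  # routine decision-making
        return ("automated", KNOWN_CASES[situation])
    return ("human problem-solving", None)        # novelty: escalate to humans

print(decide("monthly restock"))
print(decide("supplier bankruptcy"))
```

The design point is that the routine path is only valid while the situation matches an existing frame; anything outside the known cases falls through to the human path rather than being forced into a stale routine.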
Although each case and its resolution process have shortcomings, organizations continue to use models either to solve routine decisions or to support their complex problem-solving. Separating the two decision-making processes within the organization can lead to more effective utilization of models (Luoma, 2016). However, choosing the model to apply demands investments of time and resources. Luoma (2016) suggests that the benefits and use cases of models must be evaluated before they are applied.
4.1.2 Model praxis
Model praxis describes where and how models are deployed. As many researchers have hinted in their studies, OR is not just about establishing analytical concepts but about contributing to actual practice (Hämäläinen et al., 2013; Luoma, 2016; White, 2016). Criticism of the actual use of OR exists: Ormerod (2014), in accordance with Pickering (1995), argued that a field of science cannot be analyzed without regard to its practice, and that this applies especially to OR. Ackoff (1977) likewise criticized that treating OR optimization as an objective for organizations leads to a false depiction of reality. These views exemplify the role of practice in this field. For this reason, it is reasonable to review the main papers on how humans use analytical models in the field of OR to solve problems and to support their decisions.
O'Brien (2014) analyzed the use cases of OR in the area of strategy support. Examining practitioners who acted as OR consultants, the study considered four situations and how, in those situations, consultants using OR could support organizations in solving their impending problems.
In the area of environmental modeling, Hämäläinen (2015) has shown how the field uses models to tackle environmental problems and that environmental modeling is susceptible to behavioral factors such as cognitive biases. Keller et al. (2016) depict the possibility of utilizing OR in the field of military stability operations. In their experiment, they found that psychological heuristic models in military operations were able to solve simple, but not complex, decision tasks (Keller et al., 2016). Studies have provided many more cases of model deployment in daily life, such as in health care (Brailsford, 2016) or in finance during merger and acquisition processes (Atkinson & Gary, 2016).
From a different perspective, these forms of organizational model deployment, undertaken to gain knowledge and to decide on the basis of the knowledge gained, are also represented in processes such as the KDD (Fayyad et al., 1996a). The KDD is one of the most prominent use cases for models: this process model has found its way into business areas such as marketing, finance and manufacturing, and it has been constantly enhanced since its introduction through personal and industrial experience (Cios et al., 2007). For this reason, it is important to examine the KDD in more detail for a better understanding of model deployment. The following part therefore focuses on the model methods applied in practice in the data-mining step of the KDD, as these are used in practice, whereas academic methods that do not reflect the real-world problems of organizations might not be adopted (Luoma, 2016).
4.1.3 Model methods
Model methods are the technical resources that OR actors utilize in practice (Franco & Hämäläinen, 2016b). The criticism raised regarding the practice of OR, that models must find their way into practice to solve problems and support decisions, is mirrored in the methods. This can be explained by the fact that traditional OR does not integrate behavioral factors (Mingers, 2011; Hämäläinen et al., 2013). Mingers criticized the ineffectiveness of existing mathematical models in challenging novel and complex situations, referred to as problem-solving processes by Luoma (2016).
Hämäläinen et al. (2013) use accumulation, O'Keefe (2016) uses simulation, and Luoma (2016) mentions methods such as simple optimization, forecasting and data envelopment analysis to solve problems and support decision-making. This thesis takes a closer look at the main model methods used in the data-mining step of the KDD, for they are prevalently deployed in practice.
The main models used in KDD can be categorized according to whether they are applied for descriptive or predictive purposes (Kantardzic, 2011). Descriptive models are applied to find patterns and to describe a chosen data set, while predictive models predict future values from it. The most prominent models initially introduced for KDD processes are classification, regression, clustering, summarization, dependency modeling, and change and deviation detection (Fayyad, 1996). Although Fayyad (1996) notes that the separation between prediction and description is not always clear, these models can, regarding their most common use, all be assigned to one of the two types. Against this backdrop, these model methods will now be illustrated briefly along with whether they are considered descriptive or predictive. According to Fayyad (1996b) and Kantardzic (2011):
1. Classification models are commonly used as predictive learning functions and map data items into predefined classes.
2. Regression models are commonly used as predictive learning functions and map data items to real-valued prediction variables.
3. Clustering models are commonly used for descriptive purposes to identify a finite set of categories or clusters to describe the data.
4. Summarization models are commonly used for descriptive purposes to find a compact description of the data set.
5. Dependency modeling is commonly used for descriptive purposes to find a model that significantly describes the dependencies between variables.
6. Change and deviation detection models are used for both descriptive and predictive purposes to discover the most significant changes in a data set.
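To make the descriptive/predictive distinction concrete, the following toy sketch implements one model of each kind in plain Python: summarization (a compact mean and standard-deviation description of each attribute) and nearest-neighbor classification (a predictive mapping of a new data item to a predefined class). The data set and function names are invented for illustration.

```python
import math
from statistics import mean, stdev

# Illustrative toy data set: (feature 1, feature 2, class label).
data = [
    (1.0, 2.0, "A"), (1.2, 1.8, "A"), (0.9, 2.2, "A"),
    (4.0, 4.5, "B"), (4.2, 4.1, "B"), (3.8, 4.4, "B"),
]

def summarize(rows):
    """Descriptive model: a compact (mean, std) description per attribute."""
    cols = list(zip(*[(r[0], r[1]) for r in rows]))
    return [(mean(c), stdev(c)) for c in cols]

def classify_1nn(rows, point):
    """Predictive model: map a new data item to the class of its nearest neighbor."""
    def dist(r):
        return math.hypot(r[0] - point[0], r[1] - point[1])
    return min(rows, key=dist)[2]

print(summarize(data))                  # describes the existing data set
print(classify_1nn(data, (4.1, 4.3)))   # predicts a class for a new item
```

The summarization output says something about the data already in hand, while the classifier assigns a label to an item not in the data set, which is the essential difference between the two purposes.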
As regards the models used in the KDD process, once the purpose and the model have been selected, the next step is method selection. Here the technical tool, the algorithm that should eventually find patterns in the data during the data-mining step of the KDD, is chosen. Fayyad (1996a) mentions various methods that can be applied for predictive purposes, such as non-linear and nearest-neighbor classifiers for classification.
Studies such as Lessmann et al. (2015) confirm that there is a variety of methods that can be used for classification. In their study, each classifier performed differently on the same credit-scoring data set, some being more appropriate than others. This implies that the choice of model method can vary according to the given problem and according to whether the method is meant to predict or describe the underlying data set. The importance of the choice of models and methods within the steps of a model-based sequence is underlined by Hämäläinen & Lahtinen (2016). They outline how the paths taken, specifically the models chosen, impact outcomes. Iterative sequences such as the KDD can be especially influenced by the choice of models and methods. It is therefore consequential to examine the steps within the KDD to pinpoint which parts of the process are susceptible to model methods and, potentially, human intervention.
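The point that different classifiers perform differently on the same data can be illustrated with a small experiment. The sketch below uses toy data and classifiers invented for this example (not the classifiers or the credit-scoring data of Lessmann et al.): a majority-class baseline and a one-dimensional nearest-neighbor classifier are evaluated on identical data using leave-one-out validation.

```python
# Two toy classifiers evaluated on the same data set with leave-one-out
# validation, illustrating that classifiers can perform differently
# on identical data.
data = [(0.0, "good"), (0.2, "good"), (0.45, "good"),
        (0.55, "bad"), (0.9, "bad"), (1.0, "bad")]

def majority(train, x):
    """Baseline: always predict the most frequent label (ignores x)."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

def nearest_neighbor(train, x):
    """Predict the label of the closest training point."""
    return min(train, key=lambda r: abs(r[0] - x))[1]

def loo_accuracy(classifier):
    """Leave each point out in turn, train on the rest, score the prediction."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += classifier(train, x) == y
    return hits / len(data)

print(loo_accuracy(majority))
print(loo_accuracy(nearest_neighbor))
```

On this balanced toy set the baseline scores poorly (leaving a point out tips the majority toward the opposite class) while the nearest-neighbor classifier does better, mirroring in miniature the finding that method choice matters for a given data set.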
4.2 The Knowledge Discovery in Databases
The extent to which the choice of models can influence outcomes can be seen in the KDD. Each step of the KDD is part of an iterative sequence (Fayyad, 1996) that can lead to different paths and outcomes. Outcomes can be understood as the identified patterns or the knowledge extracted through the KDD process. The KDD is of great importance for this thesis because its processes fundamentally rely on human interaction with data (Brachman & Anand, 1994) and on the application of models, serving as an exemplar of how humans interact with models and enhance decision-making through their application.
For this, two KDD process models are examined in the following. It must be noted that process models are road maps for organizations (Cios et al., 2007) and differ from the models applied in the data-mining step of the KDD process. The first KDD model is that of Fayyad (1996a, 1996b), which has established itself as the most successful research model in academia (Cios et al., 2007). Here, it will be referred to as the Academic model. Another process model that must be regarded is CRISP-DM (Chapman et al., 2000), which is commonly applied in business and reflects the application of KDD in the real world. It will be referred to as the Business model.
4.2.1 Academic model
Figure 4.1: Overview of the KDD Process Model (Fayyad et al., 1996a)
Fayyad et al. (1996a, 1996b) introduced the Academic model of the KDP. The process comprises nine steps to extract knowledge from a data set. According to Fayyad (1996a, 1996b), the process includes the following steps:
1. Development of understanding about the application domain and identification of the goals that are relevant for the user of the knowledge
2. Creation of target data set by selecting subset of relevant variables and data samples
3. Cleaning and preprocessing of the data by removing outliers and missing data
4. Reduction and projection of data by finding relevant attributes through the application of dimensionality reduction and transformation methods
5. Matching goals of the process that were identified in the first step to data-mining models such as classification or clustering, depending on whether the goal is prediction or description of the data
6. Selecting data-mining methods to discover patterns within the data set in the next step
7. Data mining and the discovery of patterns in data to generate representational forms
8. Interpretation of mined patterns and knowledge
9. Utilization of the discovered knowledge by incorporating knowledge into other systems to further employ it with other gained knowledge or documentation and reporting knowledge to the user of the knowledge
The focus within the process model lies in the fact that it is iterative and interactive (Fayyad, 1996a; Brachman & Anand, 1994). Iterative means that each prior step provides the basis for the following step. Interactive means that the process must involve humans to operate. Furthermore, the steps within the process are interdependent, as loops between two steps of the process can be executed, as can be seen in Figure 4.1. Although this model has found acceptance as the standard KDD process model, researchers have criticized it for lacking applicability in business contexts (Cios et al., 2007). For this reason, soon after its initial introduction, business process models followed that complemented the original KDP with industrial experience, such as the CRISP-DM model.
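The iterative character of the process, with each step's output feeding the next, can be sketched as a chain of functions. The data, thresholds and helper names below are hypothetical and cover only steps two, three and five to eight in highly simplified form.

```python
from statistics import median

# Hypothetical raw data: (year, measured value); None marks missing data.
raw = [("2021", 10.0), ("2021", 11.0), ("2021", None),
       ("2021", 250.0), ("2022", 12.0)]

def select_target(data, year):
    """Step 2: create the target data set by selecting a relevant subset."""
    return [v for y, v in data if y == year]

def clean(values):
    """Step 3: remove missing data and, crudely, outliers; the 2x-median
    cut-off is an arbitrary assumption made for this sketch."""
    present = [v for v in values if v is not None]
    med = median(present)
    return [v for v in present if abs(v - med) <= 2 * med]

def mine(values):
    """Steps 5-7: a simple summarization model applied as the data-mining step."""
    return {"count": len(values), "mean": sum(values) / len(values)}

# Each step feeds the next; a human interprets the result (step 8).
pattern = mine(clean(select_target(raw, "2021")))
print(pattern)
```

The interactive aspect is implicit here: a human chose the target year, the outlier rule and the mining model, and a human must judge whether the resulting pattern is knowledge worth acting on.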
4.2.2 Business model
Figure 4.2: CRISP-DM Process Model (Chapman et al., 2000)
CRISP-DM stands for CRoss-Industry Standard Process for Data Mining and was introduced in the late 1990s by the three companies Daimler-Benz (now: DaimlerChrysler), Integral Solutions Ltd. (now: SPSS) and NCR to tackle the increasing demands of data-mining projects. Developed from the experience gained in practice and from real-life problems, the concept of CRISP-DM does not deviate from the initial KDD process, even though CRISP-DM is structured as a six-step process (Chapman et al., 2000):
1. Business understanding, which requires businesses to assess situations, determine business objectives and data-mining goals and to produce a project plan
2. Data understanding, which requires businesses to collect initial data and to describe, explore and verify their quality
3. Data preparation, which requires businesses to select, clean, construct, integrate and format data
4. Modeling, which requires businesses to select modeling methods, to generate a test design, to build and assess a model
5. Evaluation, which requires businesses to evaluate results, review processes and determine the next steps
6. Deployment, which requires businesses to plan deployment, monitoring and maintenance, to produce final reports and to review the entire project
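The cyclical nature of CRISP-DM, in which the evaluation step can send a project back to business understanding before anything is deployed, can be sketched as follows. The control flow and the acceptance criterion are assumptions made for illustration, not part of Chapman et al.'s specification.

```python
# Hypothetical sketch: CRISP-DM as an iterative cycle in which the
# evaluation step decides whether to deploy or to start another pass.
STEPS = ["business understanding", "data understanding", "data preparation",
         "modeling", "evaluation", "deployment"]

def run_project(evaluate):
    """Run CRISP-DM cycles until evaluation approves the results, then deploy."""
    trail, cycle = [], 0
    while True:
        cycle += 1
        trail += STEPS[:5]            # one pass up to and including evaluation
        if evaluate(cycle):           # are the results good enough?
            trail.append(STEPS[5])    # deploy and finish
            return trail

# e.g. the first model is rejected and the second accepted:
trail = run_project(lambda cycle: cycle >= 2)
print(trail)
```

Tracking the trail of visited steps makes the path-dependence argument of the surrounding text tangible: the same six phases can yield different histories, and thus different outcomes, depending on what the evaluation step decides.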
CRISP-DM is important especially with regard to its neutrality towards industries, tools and applications, which possibly accounts for its prominent use in business (Cios et al., 2007). Although refinements of CRISP-DM such as the recent ASUM (IBM Analytics, 2016) have been developed to additionally incorporate agile project management (Ponsard, 2017), CRISP-DM has remained successful up to the recent past (KDNuggets, 2014).
Against this background, there are similarities and differences between the steps of KDD and CRISP-DM. Although the concepts of the KDD steps are in harmony with those of CRISP-DM, the number of steps and their titles differ slightly.
The similarities between the two process models lie in the first two and the last two steps (Mariscal et al., 2010). The first step of the KDD reflects the same characteristics as the Business understanding step of CRISP-DM, evaluating the current situation. The creation of the target data set can be compared to the Data understanding step of the Business model, as both steps encounter the data for the first time. Similarly, Interpretation in the KDP corresponds to the Evaluation step of CRISP-DM, while the final step, the utilization of knowledge, is comparable to the Deployment stage of CRISP-DM in creating documentation and final reports.
The differences are reflected in the remaining steps of the KDP and CRISP-DM. While data cleaning and preprocessing and data reduction are two subsequent steps in the KDP, in CRISP-DM they are merged into Data preparation. Finally, Modeling in CRISP-DM is split across steps five, six and seven of the KDP, namely the selection of models, the selection of data-mining methods and the data-mining activity itself. The similarities and differences are summarized in Table 4.1.
Table 4.1: Similarities and Differences within Steps

KDD step (Academic model)                     CRISP-DM step (Business model)
1. Understanding the application domain       1. Business understanding
2. Creation of the target data set            2. Data understanding
3. Cleaning and preprocessing of the data     3. Data preparation
4. Reduction and projection of the data       3. Data preparation
5. Matching goals to data-mining models       4. Modeling
6. Selection of data-mining methods           4. Modeling
7. Data mining                                4. Modeling
8. Interpretation of mined patterns           5. Evaluation
9. Utilization of the discovered knowledge    6. Deployment
Before moving on to the next part, it is essential to know why both the KDP and CRISP-DM processes were contextualized in this thesis. The reason relates back to how models influence processes and outcomes. This has previously been examined through the notion that processes are path dependent (Hämäläinen & Lahtinen, 2016). According to Hämäläinen & Lahtinen (2016), the steps within a process that are susceptible to path dependence include, for instance, problem structuring, the choice of models and the implementation of results, all of which are steps included in the KDD or CRISP-DM. This raises the assumption that each step within the two process models is a path that is taken.
What is more, when Hämäläinen & Lahtinen (2016) identified the drivers of path dependence, behavior was cited as an example supporting the claim that the paths taken are an accumulation of biases. This further prompts a reconsideration of the importance of human participation in processes. Although the steps within the KDD or CRISP-DM can be influenced by humans, the initial goal of the KDD process and of the subsequent CRISP-DM model was to automate the processes as much as possible (Fayyad, 1996). In this regard, it can be beneficial for humans not to interfere in each step of the process, but to let it be routinized and to act only when problem-solving situations occur, a concept initially introduced by Pidd (2004).
In contrast, Brachman & Anand (1994) regard KDD as a human-centered task, highlighting the fact that humans play a key role in surmounting the problem of large influxes of data.
The data-mining step can be identified as the part of the path most prone to human interaction, as it gives people the freedom to choose from a variety of models and methods, to apply them to data and to utilize the outcomes generated by the models for further decisions.
The following part examines how humans can govern analytical models in this step and thereby support problem-solving and decision-making.
4.3 Governance of analytical models
The previous examination of the process models has shown that the knowledge gained through processes is influenced by the paths taken, which are in turn influenced by humans and their drivers, such as behavior and motivation (Hämäläinen & Lahtinen, 2016). Although Hämäläinen & Lahtinen (2016) suggest peer review as a way to mitigate the risks of path dependence, humans play the key role in all of these process steps (Brachman & Anand, 1994).
The emergence and use of terms such as behavioral data mining (Manca et al., 2015; Kusakabe, 2014) indicate a shift of interest towards the role of humans in data-mining. Humans and their behavior might indeed play an influential role in KDD as a whole, but especially in the data-mining step.
The modeling step of CRISP-DM and its equivalents in KDD deserve closer attention because they were identified as the paths most impacted by human participation. Moreover, returning to Pidd (2004), this step can be regarded as the one most open to novel situations that require human interaction instead of automated routines in organizational problem contexts.
Figure 4.3: Situations for routine use and human interaction (Pidd, 2004)
Given that the modeling step is susceptible to human behavior, people can mitigate risks such as unauthorized changes in data by applying governance concepts (Cheong & Chang, 2007). In the following, data governance is introduced as a way for humans to affect the modeling step, which consists of the selection of models, the selection of methods and the data-mining activity, in order to enhance data quality and decision-making.
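As a sketch of this idea, a simple "governance gate" could check data for unauthorized changes and for completeness before it enters the modeling step. The function names, the checksum approach and the quality rule below are illustrative assumptions, not part of any cited governance framework:

```python
import hashlib
import json

def fingerprint(records: list) -> str:
    """Checksum of a dataset; any change to a record changes the digest."""
    canonical = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def governance_gate(records: list, approved_digest: str,
                    required_fields: set) -> list:
    """Admit data to the modeling step only if it is unchanged and complete."""
    # Detect unauthorized change: the data must match the approved snapshot.
    if fingerprint(records) != approved_digest:
        raise ValueError("unauthorized change in data detected")
    # Simple data-quality rule: every record must carry the required fields.
    incomplete = [r for r in records if not required_fields <= r.keys()]
    if incomplete:
        raise ValueError(f"{len(incomplete)} records fail the quality check")
    return records
```

In use, a data steward would approve a snapshot by storing `fingerprint(data)`, and the modeling step would call `governance_gate(data, digest, {"id", "amount"})` before any model is applied, so that tampered or incomplete data never reaches the data-mining activity.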
There is general consensus that organizations can utilize data governance to improve data quality and thus the processes for decision-making. Studies such as Sarsfield (2009) also consider the increasing severity of regulations such as SOX, Basel I and Basel II as causes for concern in organizational data governance. The recent European General Data Protection Regulation (EU-GDPR) likewise requires organizations to confront their data and is treated as an important issue, as surveys such as the one conducted by Pryor (2017) show. There, 39% of the organizational authorities questioned named regulatory requirements as the key driver of data governance, while 54% were looking for ways to make processes more efficient. This leads to the assumption that in practice there are two basic interests in data governance: the first is the enhancement of data quality and the second is the necessity to comply with regulations (Sarsfield, 2009). The first is the one that matters for human participation in knowledge discovery and decision-making processes.
In a Gartner report, Friedmann (2006) emphasizes how important data quality is for business intelligence activities such as data modeling or data warehousing. As such, the concept of data governance can be applied directly to the aforementioned modeling step of the KDD process model. Through data governance, organizations aim to optimize the quality of the data in IT systems (Otto, 2011), which can then be processed in the data-mining step to find patterns and gain knowledge. Otto (2011) notes that data quality is essential insofar as it is required by automated business processes. This is the case with the KDD process: data-mining requires data governance, and thus human intervention, while the other parts of the process should stay automated.
Despite its appeal, there is no universal definition of data governance in the academic community or in practice (Otto, 2011), as reflected in the many distinct data governance structures and frameworks available (Thomas, 2006; Cheong & Chang, 2007; Wende, 2007; Weber et al., 2009; Panian, 2010; Ladley, 2012). Although all data governance structures vary, the approaches share two main similarities.
First, there is a need for data quality management with defined actors and responsibilities. Proposed roles include data stewards, data quality boards, executive sponsors, an executive steering committee, project teams, data stakeholders and data governance offices (Thomas, 2006; Wende, 2007; Weber et al., 2009; Ladley, 2012). Second, data governance consists of steps defining how it should be applied in practice. The most precise way to deploy data governance is offered by Ladley (2012).
Figure 4.4: The eight phases of data governance (Ladley, 2012)
For each phase of the process, Ladley (2012) outlines considerations and the corresponding activities performed in that phase. Importantly, the data governance process is an iterative cycle that should be executed continuously. It is apparent that data governance has parallels to KDD. Ultimately, data governance, which is meant to enhance data quality during the modeling step of KDD, is itself a model that should be governed by model actors.
One interesting finding concerns the term analytical models. In many papers, analytical models were used to describe mathematical and statistical models, which in the field of OR were simply reduced to models. This caused complications when it became clear that KDD and CRISP-DM were process models containing a modeling step that applies analytical models and methods. This relates directly to the next discussion point, data governance. Data governance intends to maximize data quality (Otto, 2011). For this thesis, KDD presented an opportunity for human intervention in an otherwise routinized process. Data governance, in turn, is a concept that attaches directly to the modeling step of CRISP-DM or KDD. It was notable that while humans should perform the act of data governance, data governance itself could be conceptualized as a process model, as shown by Ladley (2012).
This again highlighted the importance not only of analytical models but of models in general that can be applied to KDD to enhance the deployment of knowledge and, therefore, decisions.
In addition, BOR is still a relatively young field of study. The term BOR as it stands today has only existed since 2013 (Hämäläinen et al., 2013), and the relevant scientific works followed in 2016, three years after its initial introduction. This has led to journal articles that predominantly focus on conceptualizing the framework of BOR and putting matters into theoretical contexts, although explaining human behavior in processes of model creation and usage is more closely related to practice, as criticized by Luoma (2016) and White (2016). Accordingly, many studies identify gaps in the theory of BOR and propose aspects to consider in practice (Hämäläinen et al., 2013; White, 2016; Hämäläinen & Lahtinen, 2016; Becker, 2016), but complementary studies with experiments on human behavior in modeling are missing. This may be due to the recent emergence of BOR as a novel field, but also to the fact that the field can be regarded as complex and at the same time ambiguous. On the one hand, other areas such as Decision Analysis or System Modeling cover issues dealing with humans and models (Hämäläinen et al., 2013; Pidd, 2004). On the other hand, BOR, combining human behavior with models, is a small part of the umbrella term OR. Becker (2015) has attempted to provide a definition of BOR that underlines the complexity of this field.
This thesis examined three different areas of study, namely BOR, KDD and Governance, to provide findings on how humans interact with models, how they gain knowledge through the application of analytical models and how the application can enhance decision-making.
The results show that humans interact with models as individuals, groups and organizations utilizing model methods. Individuals create or use models in problem-solving and decision-making, although model creators were better at understanding problems. Groups interact with models to overcome group boundaries, to act collectively and to find mutual agreements. Organizations utilize models in routine and novel situations.
Knowledge through analytical models can be gained by applying them within the data-mining step of process models such as KDD or CRISP-DM, which have found prominent use in the literature but also in real-life business domains such as marketing, finance and manufacturing. The patterns and knowledge gained from the process were considered a basis for future decision-making.
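A minimal sketch can illustrate what "gaining patterns" in the data-mining step means in a marketing context. The baskets, the support threshold and the function name below are invented for illustration; real data-mining would use dedicated algorithms and far larger data:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions: list, min_support: int) -> dict:
    """Count item pairs that co-occur in at least `min_support` transactions."""
    counts = Counter()
    for items in transactions:
        # Sort so each pair has one canonical ordering before counting.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Invented example data: each set is one customer's shopping basket.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
]
patterns = frequent_pairs(baskets, min_support=2)
# ("bread", "butter") and ("bread", "jam") each co-occur in two baskets.
```

Patterns of this kind are the kind of output the data-mining step produces, which a decision-maker could then use, for example, to decide on product placement or bundled offers.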
Data governance is a concept that optimizes the quality of data flowing into IT-systems and organizations. When data governance is used for the data-mining step of KDD, it is apparent that organizations and humans can enhance decision-making when analytical models process the relevant data.
Hämäläinen & Lahtinen (2016) and Luoma (2016) have pointed out that the outcome differs greatly depending on the choice of model. For this reason, it must be borne in mind that analytical models can enhance decision-making, yet humans must remain flexible and open to multiple model choices (Hämäläinen & Lahtinen, 2016) when it comes to model-supported problem-solving and decision-making.