Data mining is now considered a fundamental marketing and research tool for many modern organisations, both government and commercial. In particular, the analysis of raw data collected through data mining can help organisations build on their knowledge of target groups or individuals by recognising common patterns within larger quantities of data, an approach which has a range of positive applications and outcomes. As such, at face value the concept of data mining is not necessarily a bad thing; rather, it is the way in which the data is collected and recorded, and how it is ultimately used, that is controversial. Consequently, the primary aim of this paper is to explore several of the key arguments commonly put forward against data mining and, in turn, to show how it can impact negatively on today's society.
With regard to data mining, the collection and exploitation of an individual's personal information, and the related intrusion on their privacy, is considered one of the most controversial elements. A key component of data mining is the discreet analysis of the routine behaviours of groups or individuals. The collection of such information allows organisations to detect particular behavioural patterns associated with these groups and thus to predict future behaviour and actions. Accordingly, the potential for the malicious use of such information is immediately evident. For example, data mining is regularly used within the retail sector, including the collection of behavioural data by service providers. In particular, large retail chains place a strong emphasis on consumers' interests when developing marketing strategies and in turn use raw data captured through data mining to determine consumer patterns, for example which products are most or least popular. This allows large retail organisations to better cater for the interests of the customer and consequently to generate greater revenue (McKinsey Global Institute, 2013). The effectiveness of this strategy is supported by Computerworld (2013), which states that the use of “big data can increase profits in the retail sector by a staggering 60%”. In such instances, because the focus is on broad trends within the data, most consumers are unaware of the collection and use of their personal data and, as such, often do not consider it an issue.
In contrast, where the emphasis is taken away from groups and placed instead on individuals, the practice of data mining begins to develop into a serious issue. Certainly, when signing up for services such as online shopping, consumers typically expect that certain sets of information will be gathered and stored. As long as such data capture processes are acknowledged and transparent, consumers generally do not consider this an issue. Likewise, this form of data is often kept in aggregate form and generally not separated into data sets specific to individuals. Rather, what is becoming an increasing issue is the secretive gathering of sensitive information through means that are considered highly unethical and intrusive, such as the involuntary gathering of personal data. One such example of involuntary data capture was conducted by Nordstrom, an upscale retailer which used sensors provided by the analytics vendor Euclid to collect shopping information from customers' smartphones each time they connected to a store's free Wi-Fi service (Computerworld, 2013). Another example of secretive behaviour is the case of the clothing retailer Urban Outfitters, which allegedly violated consumer protection laws by telling shoppers who paid by credit card that they had to provide their ZIP code (post code) and then using that information to obtain the shoppers' addresses (Computerworld, 2013). This type of unethical data gathering, and its linkage to specific individuals, has caused people to question the integrity of data mining as a collection methodology.
Alongside privacy, the accuracy and integrity of collected data is also considered a fundamental concern of data mining, as a whole series of issues can develop without correct data; ultimately, the quality of any data analysis can only be as good as the data being analysed. Consequently, a key implementation challenge for organisations using data mining, and the information derived from it, is being able to synchronise this data with conflicting or redundant data from a range of other sources (Anderson, 20**). According to the Information Systems Audit and Control Association (ISACA, 2013), untrusted data providers could return inaccurate or false results, and with large data sets it would be almost impossible to identify such inconsistencies, leading in turn to potentially catastrophic effects on organisations relying on the data, particularly scientific and financial service providers. This would not necessarily be an issue if data were collected only for essential purposes and not put to irregular uses. However, a recent study conducted by Georgetown University showed that, out of 391 websites surveyed, 92.8% collected personally identifying data. Furthermore, because so many third parties collect data and this information is then combined, it becomes difficult, if not impossible, to determine what is correct and what is inaccurate. As such, it is again necessary to ask whether data mining itself is bad, or whether it is the way in which the information is gathered and used that is bad.
Another significant issue in relation to data mining is an individual organisation's ability to maintain the security of this information. Although companies and organisations store a large amount of information sourced from data mining, adequate security measures are often not put in place, meaning that people's personal data is vulnerable to hackers and others who wish to gain access to it. In relation to these concerns, numerous studies have been conducted on the protection of information gathered through data mining, with a study undertaken by HIMSS finding that the security of data mining is the “most fundamental challenge that consumers will face in the next decade”. As the use of data mining has increased, so has the number of data-theft incidents. One example is Target, where the private and sensitive information of 70 million customers was extracted by hackers (The Washington Post, 2014). Another such theft occurred in 2014, when the online auction site eBay was targeted by hackers, putting the private information of 145 million users at risk (BBC). This theft of people's sensitive information emphasises the risk associated with collecting and storing such a large amount of private information and raises the question of whether data mining is an appropriate way of sourcing information.
In conclusion, although there is a range of positive applications of data collected through data mining (e.g. for healthcare and financial purposes), there are also a number of negative factors, in both collection and application, which overshadow its effectiveness as a means of gathering information. These factors include the integrity of the data and the process by which this information is gathered. Furthermore, the way in which this data is secured is also considered a primary issue associated with data mining. Ultimately, it would appear that the concept of data mining is not in itself necessarily an issue, given its many positive applications; rather, it is a matter of how organisations (primarily third-party service providers) gather, store and analyse this information. Thus, for future advancements in data gathering and information sourcing, it is important that more secure, more transparent and less invasive methods are established so that data mining can be used for the benefit of all consumers.