An Overview of Data Mining and Machine Learning
In recent years, companies have begun aggressively gathering large amounts of data on their user bases, hoping to transform this data into meaningful information that can advance their business. This is where data mining and machine learning come into the picture. The two terms are closely related and often confused with each other, so we will examine the similarities and differences between them and show how each is used in modern technology.
Data mining is the process of extracting previously unknown, potentially useful information from a data set that has not yet been explored. The patterns and correlations discovered in the data are then used to build the training and test sets from which a predictive model for future data is created. Machine learning, on the other hand, is a system that improves its predictive algorithms by learning from training data (and being checked against separate test data), so that it can adapt to new data without being explicitly programmed. The main tasks associated with machine learning include classification and regression.
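To make the idea of training and test sets concrete, here is a minimal sketch, assuming a toy data set of hypothetical (monthly store visits, joined loyalty scheme) pairs. The model is deliberately simple: a single threshold on one feature, learned only from the training set and then scored on the held-out test set. The function names `train_test_split`, `fit_threshold` and `accuracy` are illustrative helpers written here, not a particular library's API.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train):
    """Learn a one-feature threshold rule from the training set only.

    Each row is a (feature_value, label) pair with a boolean label. The model
    predicts True when feature_value >= threshold; we keep the candidate
    threshold that misclassifies the fewest training rows.
    """
    best_threshold, best_errors = None, None
    for t in sorted({x for x, _ in train}):
        errors = sum((x >= t) != label for x, label in train)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def accuracy(threshold, rows):
    """Fraction of rows whose label the threshold rule predicts correctly."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


# Hypothetical data: (monthly store visits, whether the customer joined a loyalty scheme).
rows = [(visits, visits >= 10) for visits in range(20)]
train, test = train_test_split(rows)
threshold = fit_threshold(train)
print("learned threshold:", threshold)
print("training accuracy:", accuracy(threshold, train))
print("test accuracy:", accuracy(threshold, test))
```

The test accuracy is the figure that matters: it estimates how the model will behave on data it has never seen, which is why the test rows are kept out of `fit_threshold`.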
Data mining and machine learning get confused with each other largely because they share similar tasks, namely classification and regression. Classification applies a known structure, such as a fixed set of categories, to new data, while regression finds the model that best fits the data, usually the one with the least error, so that it can predict a numeric value. However, even though both involve classification and regression, data mining is better understood as a foundation for machine learning. Because data mining begins with unexplored data, that data must first be reviewed and extracted by a human to ensure the training and test sets are correct for future models. Machine learning then applies algorithms to the training and test sets built from the mined data, which allows it to learn and adapt to new data. Because the model learns from these human-verified, labeled examples, this is known as supervised learning; once trained, however, it can run on new data without further human interaction. The drawback is that the baseline information must be correct for the predicted output to be accurate.
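Regression's "best fitting model with the least amount of error" can be illustrated with ordinary least squares for a straight line. This is a minimal sketch under the assumption that the data is one input against one output; the advertising-spend and sales figures are made up for illustration, and `fit_line` and `sum_squared_error` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares: the slope and intercept that minimise squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(slope, intercept, xs, ys):
    """Total squared difference between the observed ys and the line's predictions."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: advertising spend (x) against weekly sales (y).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(f"sales ~ {slope:.2f} * spend + {intercept:.2f}")

# Any other line, e.g. one with a slightly steeper slope, has a larger error.
best = sum_squared_error(slope, intercept, xs, ys)
worse = sum_squared_error(slope + 0.1, intercept, xs, ys)
print(best < worse)
```

Least squares is only one way to define "least error"; it is used here because it has a simple closed-form answer.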
One useful data mining technique that does not require machine learning is researching past trends to find correlations. A supermarket, for instance, may analyze the shopping habits of its customers to find out whether certain products are likely to be purchased together. Its marketing team can then use this information to create promotions or adjust product placement for higher visibility, increasing profits. Machine learning is not needed here because the supermarket only wants to discover useful information and trends in data it has not yet explored, rather than to make predictions about new data.
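The supermarket example is usually called market basket analysis, and its basic measures are the support and confidence used in association-rule mining. The sketch below simply counts every pair of items per basket, which is fine for a handful of receipts; it is not the Apriori algorithm, which prunes candidates to scale to larger item sets. The baskets and the `pair_rules` function are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Find item pairs bought together and score them as simple association rules.

    support(A, B)      = share of baskets containing both A and B
    confidence(A -> B) = share of baskets containing A that also contain B
    Only pairs whose support reaches min_support are reported.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        rules.append((a, b, support, together / item_counts[a]))
        rules.append((b, a, support, together / item_counts[b]))
    return rules


# Hypothetical till receipts.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

With these four baskets only bread and butter clear the 50% support bar: every basket with butter also contains bread (confidence 1.00), while two of the three bread baskets contain butter (confidence 0.67). Nothing is being predicted for new customers; the output is simply a summary of past behaviour that the marketing team can act on.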
One example where data mining and machine learning are used together is optical character recognition (OCR). OCR uses data mining by collecting large amounts of data on letters, numbers, and words from many unknown sources and building models that can accurately recognize a given input. Initially, humans verify the accuracy of the training and test sets, and as more data is collected through this verification process, the system's predictions become more accurate. When this is paired with machine learning, the system can learn and adapt whenever it encounters a new letter, number, or word, and still output a valid prediction. As a brief visual example, if we wanted to OCR some writing that looked like “A A A A A”, with each A drawn slightly differently, the system should recognize that they are all still the letter A.
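The “A A A A A” idea can be sketched with a toy 1-nearest-neighbour classifier over tiny bitmaps: each human-verified example is stored, and a new shape gets the label of the stored example it differs from in the fewest pixels. This is an assumption chosen for simplicity, not how production OCR engines work; the 5x5 bitmaps and the `NearestNeighbourOCR` class are hypothetical.

```python
def hamming(a, b):
    """Count the pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


class NearestNeighbourOCR:
    """Toy 1-nearest-neighbour character recogniser (hypothetical, for illustration)."""

    def __init__(self):
        self.examples = []  # (bitmap, label) pairs that a human has verified

    def learn(self, bitmap, label):
        """Store a verified example so later predictions can draw on it."""
        self.examples.append((bitmap, label))

    def predict(self, bitmap):
        """Return the label of the stored example closest to the input bitmap."""
        _, label = min(self.examples, key=lambda example: hamming(example[0], bitmap))
        return label


# 5x5 bitmaps: '#' is ink, '.' is paper.
A_TEMPLATE = ("..#..",
              ".#.#.",
              "#...#",
              "#####",
              "#...#")
L_TEMPLATE = ("#....",
              "#....",
              "#....",
              "#....",
              "#####")
# An 'A' drawn slightly differently, with its crossbar one row higher.
A_VARIANT = ("..#..",
             ".#.#.",
             "#####",
             "#...#",
             "#...#")

ocr = NearestNeighbourOCR()
ocr.learn(A_TEMPLATE, "A")
ocr.learn(L_TEMPLATE, "L")
print(ocr.predict(A_VARIANT))  # A: 6 pixels differ from the A template, 13 from the L
ocr.learn(A_VARIANT, "A")  # once verified, the variant becomes a training example too
```

The final `learn` call mirrors the verification loop described above: each confirmed reading enlarges the training set, so later variants of the same letter have more examples to be matched against.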
As we can see, it is no wonder that data mining and machine learning are often confused with each other: they overlap in similar tasks. It should nevertheless be clear that they are two distinct methodologies. Data mining finds useful information in previously unexplored data that can then be used to make predictions, while machine learning builds on previously known data to make predictions about, and keep learning from, new data.