The healthcare industry is one of the most important industries in the world, and it poses some of the most complicated and challenging problems. The diagnosis of heart disease is a vital but tedious task in medicine. The World Health Organization (WHO), in its Global Atlas on cardiovascular disease (CVD) prevention, reports that CVDs are a leading cause of death, disability and illness worldwide. The WHO has estimated that 12 million deaths worldwide are due to heart disease every year (Soni et al., 2011b). The term heart disease covers multiple diseases of the heart and blood vessels that can impair the normal functioning of the heart. This study deals only with the condition commonly called a “heart attack” and the factors that lead to it. Narrowing of the coronary arteries results in an inadequate supply of oxygen and blood to the heart and leads to coronary heart disease (CHD). A heart attack is caused by the blockage of a coronary artery by a blood clot. An inadequate supply of blood to the heart muscle also results in chest pain. Other factors can cause a heart attack as well.
Detecting heart disease from its various symptoms requires making use of enormous amounts of heart disease data. In healthcare, data mining is a useful way for healthcare applications to convert the huge amount of medical data into useful knowledge. Compared with today's hospital information systems, medical data mining has greater potential for exploring datasets in the medical domain. Hospital information systems face several problems, such as a shortage of doctors who are expert in every subspecialty, a healthcare environment that is ‘information rich’ but ‘knowledge poor’, missing values in medical data, biases and so on (Soni et al., 2011b). An automated system that uses historical heart disease databases would therefore be beneficial in making intelligent clinical decisions, which traditional hospital information systems cannot do. In this paper, two clustering techniques, the K-Means algorithm and the Fuzzy C-Means algorithm, are compared in terms of accuracy for the diagnosis of heart disease.
1.2 Problem Statement
Most hospitals nowadays use some sort of hospital information system to handle and manage their healthcare or patient data. These systems typically generate huge amounts of data in the form of images, text, and charts (V et al., 2012). Regrettably, these data are rarely used in clinical decision making. Clinical decisions are often made by doctors without drawing on the knowledge-rich data in the database. Not all doctors are equally skilled in every subspecialty, and moreover there is a shortage of resource persons in many places (Soni et al., 2011b). An accurate automated medical diagnosis system would therefore be extremely advantageous, as it would enhance medical care and patient safety, reduce errors and biases, improve patient outcomes and reduce costs.
In this paper, we propose an efficient approach and compare two clustering techniques in order to find the more accurate technique for heart disease prediction. The heart disease data are clustered using the K-Means algorithm and the Fuzzy C-Means algorithm. MATLAB is used to compare the accuracy of the two clustering techniques for the diagnosis of heart disease on the same dataset. With clustering, large amounts of heart disease data can be put into groups according to the similarities in the data. For example, a hospital may cluster a large number of patients by selecting attributes such as age, sex, weight, smoking status and pain location as the features of each data point and running the clustering process on the selected data, which partitions the patients into clusters. Each resulting cluster contains patients with similar age, sex, weight, smoking status and pain location. Analyzing each cluster can then give information on the likelihood of heart disease and allows the accuracy of the different techniques to be compared. The clusters help the hospital understand its patients better and thus provide more suitable medical treatment.
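The grouping idea described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in this study (which uses MATLAB): it clusters six hypothetical patient records, described by two made-up features (age and resting blood pressure), with a plain K-Means and a plain Fuzzy C-Means written in standard-library Python. The function names, the data values, the fuzzifier m = 2 and the choice of two clusters are all assumptions made for illustration.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Hard clustering: every point gets exactly one cluster label."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def fuzzy_cmeans(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Soft clustering: every point gets a membership degree in each cluster."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships are recomputed from relative distances to the centers.
        new_u = []
        for p in points:
            d = [max(math.sqrt(dist2(p, ctr)), 1e-12) for ctr in centers]
            new_u.append([1.0 / sum((d[j] / d[q]) ** (2.0 / (m - 1.0)) for q in range(c))
                          for j in range(c)])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Hypothetical records: [age in years, resting blood pressure in mmHg].
patients = [[35, 118], [38, 122], [41, 120], [62, 150], [65, 155], [68, 148]]

km_centers, km_labels = kmeans(patients, k=2)
fcm_centers, memberships = fuzzy_cmeans(patients, c=2)

print("K-Means labels:", km_labels)
for patient, row in zip(patients, memberships):
    print(patient, [round(v, 3) for v in row])
```

K-Means returns a single label per patient, whereas Fuzzy C-Means returns a degree of membership in every cluster, with each row summing to 1. In practice the features would normally be scaled before clustering so that no single attribute dominates the distance.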
1.3 Project Goals
The goals of the study are to analyze, evaluate and compare the K-Means and Fuzzy C-Means clustering techniques in order to identify the more accurate algorithm for heart disease diagnosis, and thus to enhance heart disease medical care systems.
1.4 Objectives of the project
The objectives of the study are:
1. To study the K-Means and Fuzzy C-Means clustering algorithms in order to apply them in heart disease healthcare.
2. To analyze and evaluate the results in order to determine the performance of each algorithm based on the performance measures.
3. To compare the accuracy of the K-Means and Fuzzy C-Means clustering techniques for the diagnosis of heart disease in order to identify the more accurate algorithm for enhancing medical care systems.
1.5 Scope of the project
The scope of this project, which applies K-Means and Fuzzy C-Means clustering and compares their accuracy for the diagnosis of heart disease, is as follows:
1. The problem addressed is to cluster large amounts of heart disease data and to compare the diagnostic accuracy of only two techniques, the Fuzzy C-Means and K-Means algorithms.
2. The heart disease data are taken from a publicly available source, the UCI Machine Learning Repository.
3. The accuracy of the two techniques is analyzed and compared based on the root mean square error (RMSE) and the percentage of correctly classified vectors.
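The two performance measures named in the scope can be illustrated as follows. This is a sketch under stated assumptions: the text does not specify how cluster indices are matched to diagnosis classes, so here each cluster is assumed to take the majority true class of its members, and the RMSE is assumed to be computed between the resulting 0/1 predictions and the true 0/1 labels. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def rmse(predicted, actual):
    """Root mean square error between two equal-length numeric sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def map_clusters_to_classes(cluster_ids, true_classes):
    """Assumption: a cluster stands for the most common true class among its members."""
    mapping = {}
    for cid in set(cluster_ids):
        members = [t for c, t in zip(cluster_ids, true_classes) if c == cid]
        mapping[cid] = Counter(members).most_common(1)[0][0]
    return mapping

def percent_correct(cluster_ids, true_classes):
    """Percentage of vectors whose mapped cluster class equals the true class."""
    mapping = map_clusters_to_classes(cluster_ids, true_classes)
    hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_classes))
    return 100.0 * hits / len(true_classes)

# Toy example: cluster indices from a clustering run and the true diagnoses
# (0 = no heart disease, 1 = heart disease). Values are made up.
cluster_ids = [1, 1, 1, 0, 0, 0, 0, 1]
true_classes = [0, 0, 0, 1, 1, 1, 0, 1]

mapping = map_clusters_to_classes(cluster_ids, true_classes)
predicted = [mapping[c] for c in cluster_ids]

print("percent correct:", percent_correct(cluster_ids, true_classes))  # 75.0
print("RMSE:", rmse(predicted, true_classes))                          # 0.5
```

Because cluster indices are arbitrary, some mapping from clusters to classes is needed before either measure can be computed; the majority-vote mapping used here is only one possible choice.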
1.6 Significance of the project
Huge amounts of data are now available in many areas such as science, business, medical care and sports. To meet the challenge posed by such large amounts of data, data mining techniques such as clustering, fuzzy logic, feature selection and classification have been applied successfully in many real-life areas. This paper focuses mainly on clustering to deal with heart disease patients’ databases. The aim of this research is to compare two clustering techniques (the Fuzzy C-Means and K-Means algorithms) in order to identify the more accurate technique for the diagnosis of heart disease. Using clustering in clinical decisions will also enhance medical care, reduce medical errors and biases, improve patient outcomes and reduce costs. The shortage of doctors may thus be eased, because intelligent clinical decisions can be made based on the historical heart disease database rather than by doctors alone.
Currently, several predictive algorithms are used to predict heart disease, including classification methods such as decision trees, naïve Bayes, neural networks, SVM and k-Nearest Neighbour (Chickinja). However, this paper focuses on clustering because it is an attractive approach for finding similar data and putting them into groups based on their similarities. Clustering is not only suitable for categorizing and organizing data but can also be used for model construction and data compression. For model construction, if a group of similar data can be found, then a model can be built from it. In addition, this research can serve as a foundation for other researchers to find more techniques to enhance medical care.