CHAPTER 5

SIMULATION AND RESULTS

5.1 Simulation Tool (MATLAB)

MATLAB is a powerful computing system for handling the calculations involved in scientific and engineering problems. The name MATLAB stands for MATrix LABoratory, because the system was designed to make matrix computations particularly straightforward [37]. MATLAB program and script files always have the extension ".m". Script files contain a sequence of ordinary MATLAB commands that are executed (in order) when the script is called from within MATLAB. In MATLAB, nearly every data object is assumed to be an array. A good source of information about MATLAB, its creator The MathWorks Inc., and the company's other products is its web page at www.mathworks.com [31]. There are two essential requirements for successful MATLAB programming [37]:

a) We need to learn the exact rules for writing MATLAB statements.

b) We need to develop a logical plan of attack for solving particular problems.

The MATLAB program implements the MATLAB programming language and provides a very extensive library of predefined functions that make technical programming tasks simpler and more efficient.

5.1.1 Advantages of MATLAB

MATLAB [38] has many advantages over conventional computer languages for technical problem solving. Among them are:

1. Ease of use:

MATLAB is an interpreted language, like Basic, and is very easy to use. Programs can be easily written and modified with the built-in integrated development environment and debugged with the MATLAB debugger. Because the language is so easy to use, it is ideal for rapid prototyping of new programs. Many program development tools are provided to make the program easy to use, including an integrated editor/debugger, online documentation and manuals, a workspace browser, and extensive demos.

2. Platform Independence:

MATLAB programs written on any platform will run on all of the other platforms, and data files written on any platform can be read transparently on any other platform. As a result, programs written in MATLAB can migrate to new platforms when the needs of the user change.

3. Predefined functions:

MATLAB has an extensive library of predefined functions that provide tested, prepackaged solutions to many basic technical tasks. Many special-purpose toolboxes are available to solve complex problems in specific areas. Toolboxes are libraries of MATLAB functions used to customize MATLAB for solving a particular class of problem. They are the result of work by some of the world's top researchers in specialized fields and are equivalent to prepackaged "off-the-shelf" software for a particular class of problem. A toolbox is a collection of special files called M-files that extend the functionality of the base program. Such files are called "M-files" because they must have the extension ".m"; this extension is required for the files to be interpreted by MATLAB. Each toolbox is purchased separately. If an evaluation license is requested, the MathWorks sales department requires detailed information about the project for which MATLAB is to be evaluated. Overall, the process of acquiring a license is expensive in terms of both money and time. If granted (which it usually is), the evaluation license is valid for two to four weeks. The various toolboxes include:

a. ANFIS

b. Control Systems

c. Signal Processing

d. Communications

e. System Identification

f. Robust Control

g. Simulink

h. Image Processing

i. Neural Networks

j. Fuzzy Logic

k. Analysis

l. Optimization

m. Spline

n. Symbolic

o. User Interface Utility

4. Device-Independent Plotting:

MATLAB has many built-in plotting and imaging commands. The plots and images can be displayed on any graphical output device supported by the computer on which MATLAB is running. This capability makes MATLAB an outstanding tool for visualizing technical data.

5. Graphical User Interface:

MATLAB includes tools that allow a programmer to interactively construct a graphical user interface (GUI) for his or her own program. With this capability, the programmer can design sophisticated data-analysis programs that can be operated by relatively inexperienced users.

6. MATLAB Compiler:

MATLAB code is interpreted rather than compiled. A separate compiler is available that can compile a MATLAB program into true executable code that runs faster than the interpreted code. It is a good way to convert a prototype MATLAB program into an executable suitable for sale and distribution to users.

MATLAB is an efficient tool for developing applications based on neural networks. It is therefore used in the proposed work for cancer diagnosis and prognosis using a polynomial neural network.

5.1.2 Limitations of MATLAB

The following are some limitations of MATLAB [38]:

1. It is an interpreted language and therefore can execute more slowly than compiled languages.

This problem can be mitigated by properly structuring the MATLAB program and by using the MATLAB compiler to compile the final program before distribution and general use.

2. A full copy of MATLAB is five to ten times more expensive than a conventional C or FORTRAN compiler. There is also an inexpensive student edition of MATLAB, which is a great tool for students; it is essentially identical to the full edition.

5.2 Weka

Weka (Waikato Environment for Knowledge Analysis) is a popular collection of machine learning software written in Java and developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License and is available at http://www.cs.waikato.ac.nz/ml/weka. Weka is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. Its algorithms can be applied directly to a dataset or called from Java code. It includes tools for data preprocessing, classification, regression, clustering, association rules, and visualization, and it is well suited to developing new machine learning schemes. In Weka, a dataset can be preprocessed, fed into a learning scheme, and the resulting classifier and its performance analyzed, all without writing any program code. Data is an integral component of this work [39].

Fig 5.1: Weka GUI

5.2.1 Weka Application Interfaces

• Explorer

– preprocessing, attribute selection, learning, visualization

• Experimenter

– testing and evaluating machine learning algorithms

• Knowledge Flow

– visual design of KDD process

• Simple Command-line

– A simple interface for typing commands

5.2.2 Main features of Weka

• 49 data preprocessing tools

• 76 classification/regression algorithms

• 8 clustering algorithms

• 15 attribute/subset evaluators + 10 search algorithms for feature selection.

• 3 algorithms for finding association rules

• 3 graphical user interfaces

– “The Explorer” (exploratory data analysis)

– “The Experimenter” (experimental environment)

– “The KnowledgeFlow” (new process model inspired interface)

5.2.3 Weka: Download and Installation

• Download Weka (the stable version) from http://www.cs.waikato.ac.nz/ml/weka/

– Choose a self-extracting executable (including Java VM)

– (If you are interested in modifying or extending Weka, there is a developer edition that includes the source code.)

• After the download is finished, run the self-extracting file to install Weka, using the default settings.

5.3 Description of dataset

A detailed description of the dataset used in the proposed research is as follows:

5.3.1 Ovarian Cancer Dataset

This dataset has 216 instances and 15154 attributes, plus the class attribute. Attributes 1 through 15154 describe each instance, and every instance belongs to one of two possible classes: benign or malignant [39]. Women have two ovaries, located in the pelvis, one on each side of the uterus. The ovaries make female hormones and produce eggs. When cancer starts in either ovary, it is called ovarian cancer. Ovarian cancer causes more deaths than any other cancer of the female reproductive system, but when it is found in its early stages, treatment is most effective. Fallopian tube cancer (which starts in the fallopian tube) and primary peritoneal cancer (which starts in the lining that covers the abdomen) are very similar to ovarian cancer: many of the signs and symptoms are the same, and doctors treat these cancers in the same way. Most ovarian cancers (more than 90%) are classified as "epithelial" and are believed to arise from the surface (epithelium) of the ovary. However, some evidence suggests that the fallopian tube may also be the source of some ovarian cancers.

Because the ovaries and fallopian tubes are closely related, it is thought that fallopian tube cancer cells can mimic ovarian cancer [39]. Ovarian cancer generally has a relatively poor prognosis. It is disproportionately deadly because it lacks any clear early detection or screening test, meaning that most cases are not diagnosed until they have reached advanced stages. More than 60% of women presenting with this cancer have stage III or stage IV disease, by which time it has already spread beyond the ovaries. Ovarian cancers shed cells into the naturally occurring fluid within the abdominal cavity. These cells can then implant on other abdominal (peritoneal) structures, including the uterus, the urinary bladder, the bowel, and the peritoneal lining of the bowel wall, forming new tumor growths before the cancer is even suspected.

The five-year survival rate for all stages of ovarian cancer is 47%. For cases where the diagnosis is made early in the disease, while the cancer is still confined to the primary site, the five-year survival rate is 92.7%. Ovarian cancer is the second most common gynecologic cancer and the deadliest in terms of absolute numbers; it caused nearly 14,000 deaths in the USA alone in 2010. While the five-year survival rate for all cancers combined has improved significantly (68% for the general population diagnosed in 2001, compared with 50% in the 1970s), ovarian cancer has a poorer outcome, with a 47% survival rate (compared with 38% in the late 1970s).

TABLE 5.1 A BRIEF DESCRIPTION OF CANCER DATASETS

Dataset name | No. of attributes | No. of instances | No. of classes
Ovarian Cancer | 15154 | 216 | 2 (Benign, Cancer)

For the simulation, the original ovarian cancer dataset was downloaded from the FDA-NCI Clinical Proteomics Program Databank and saved as a file. A brief description of the dataset is given in Table 5.1, and a detailed description is provided in the next section.

Fig. 5.2 Weka Explorer.

Description :- The snapshot above shows the Weka Explorer.

Fig 5.3 Ovarian Cancer data set in Weka

Description :- In the figure above, the ovarian cancer dataset is loaded into the Weka Explorer with 216 instances and 20 attributes.

Fig 5.4 Interquartile Range on Dataset

Description :- In the figure above, the interquartile range filter is applied to the ovarian cancer dataset, which adds two attributes flagging outliers and extreme values. The dataset therefore has 216 instances and 22 attributes (216 × 22).
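The interquartile range step can be illustrated with a short Python sketch. It follows the fences described in Weka's documentation for its InterquartileRange filter, as we understand them: a value is an extreme value if it lies above Q3 + EVF × IQR or below Q1 - EVF × IQR, and an outlier if it lies beyond the OF fences but not the EVF fences, with defaults OF = 3 and EVF = 6. An instance is flagged if any of its attributes is flagged. The function names are hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption that may differ slightly from Weka's own.

```python
import statistics
from typing import List, Tuple


def iqr_flags(rows: List[List[float]], outlier_factor: float = 3.0,
              extreme_factor: float = 6.0) -> List[Tuple[bool, bool]]:
    """Return (is_outlier, is_extreme) for each row (class attribute excluded)."""
    # Per-attribute quartiles and interquartile range.
    fences = []
    for column in zip(*rows):
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        fences.append((q1, q3, q3 - q1))

    flags = []
    for row in rows:
        outlier = extreme = False
        for x, (q1, q3, iqr) in zip(row, fences):
            if x > q3 + extreme_factor * iqr or x < q1 - extreme_factor * iqr:
                extreme = True   # beyond the EVF fences
            elif x > q3 + outlier_factor * iqr or x < q1 - outlier_factor * iqr:
                outlier = True   # beyond the OF fences but inside the EVF fences
        flags.append((outlier, extreme))
    return flags


def add_flag_attributes(rows: List[List[float]]) -> List[list]:
    """Append two yes/no attributes, mimicking how the filter widens the dataset."""
    return [row + ["yes" if o else "no", "yes" if e else "no"]
            for row, (o, e) in zip(rows, iqr_flags(rows))]


# Toy single-attribute data (not from the ovarian dataset): Q1 = 3, Q3 = 7, IQR = 4.
toy = [[v] for v in [1, 2, 3, 4, 5, 6, 7, 25, 100]]
for row in add_flag_attributes(toy):
    print(row)
```

Applied to rows with 20 attributes, `add_flag_attributes` yields 22 columns, matching the 216 × 22 shape reported above.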

Fig 5.5 Extreme values

Description :- The snapshot above shows the extreme values.

Fig 5.6 Remove Extreme Values

Description :- In the figure above, all the extreme values are removed from the dataset, after which the dataset is reduced to 216 × 21.

Fig 5.7 Classifier Window

Description :- The snapshot above shows the classification window.

Fig 5.8 Classifiers

Description :- The snapshot above shows the various classifiers available for classifying the dataset.

Fig 5.9 Naive Bayes Algorithm

Description :- The Naive Bayes algorithm is applied to the ovarian cancer dataset; it correctly classifies 110 of the 216 instances, giving 50.92% accuracy.
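To show what a Naive Bayes classifier computes on numeric attributes such as these, here is a minimal Gaussian Naive Bayes sketch in plain Python. It assumes each attribute is normally distributed within each class and independent of the others given the class; it scores each class by its log prior plus the sum of per-attribute log densities. This is a simplified illustration, not Weka's implementation: Weka's estimator details differ, and the population variance and the `min_var` floor used here are assumptions. All names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, min_var=1e-9):
    """Estimate a class prior and per-attribute (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(y)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(var, min_var)))  # floor avoids division by zero
        model[label] = (prior, stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            # log of the normal density N(x; mean, var)
            score -= 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy two-attribute data, not taken from the ovarian dataset.
X = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0], [5.0, 2.0], [5.2, 3.0], [4.8, 1.0]]
y = ["benign"] * 3 + ["cancer"] * 3
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 10.5]), predict_gaussian_nb(model, [4.9, 2.5]))
```

The independence assumption is what keeps the model cheap; with strongly correlated attributes it can also make the class scores unreliable.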

Fig 5.10 IB1 Algorithm

Description :- The IB1 algorithm is applied to the ovarian cancer dataset; it correctly classifies 170 of the 216 instances, giving 78.70% accuracy.

Fig 5.11 IBk Algorithm

Description :- The IBk algorithm is applied to the ovarian cancer dataset; it correctly classifies 170 of the 216 instances, giving 78.70% accuracy.
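IB1 and IBk are both instance-based (nearest-neighbour) learners: IB1 labels a query with the class of its single closest training instance, and IBk takes a majority vote among the k closest. Weka's IBk uses k = 1 by default. That would make it behave like IB1 and may explain the identical 78.70% figures, although the thesis does not state which k was used. The sketch below is a plain-Python k-NN that assumes min-max-normalised Euclidean distance, similar in spirit to Weka's default distance. With `k=1` it performs the IB1-style rule. The function name and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, row, k=1):
    """Classify `row` by majority vote of its k nearest training instances."""
    # Min-max ranges from the training data, used to scale every attribute to [0, 1].
    ranges = [(min(c), max(c)) for c in zip(*X_train)]

    def norm(r):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, (lo, hi) in zip(r, ranges)]

    target = norm(row)
    # Sort training instances by Euclidean distance to the query.
    dists = sorted((math.dist(norm(r), target), label)
                   for r, label in zip(X_train, y_train))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy one-attribute data: the query 0.31 sits next to a lone "benign" point.
X_train = [[0.0], [0.3], [0.35], [0.4], [1.0]]
y_train = ["cancer", "benign", "cancer", "cancer", "benign"]
print(knn_predict(X_train, y_train, [0.31], k=1))  # nearest neighbour only
print(knn_predict(X_train, y_train, [0.31], k=3))  # vote among three neighbours
```

The toy query shows why k matters. The single nearest neighbour is "benign", but two of the three nearest are "cancer".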

Fig 5.12 K-Star Algorithm

Description :- The K-Star algorithm is applied to the ovarian cancer dataset; it correctly classifies 178 of the 216 instances, giving 82.40% accuracy.

Fig 5.13 LAD Tree Algorithm

Description :- The LAD Tree algorithm is applied to the ovarian cancer dataset; it correctly classifies 170 of the 216 instances, giving 78.70% accuracy.

5.4 Results and Discussion

The evaluation divides the data into two classes: malignant (cancerous) and benign (non-cancerous). The ovarian cancer dataset is used for both training and testing. The results are as follows.

5.4.1 Ovarian Cancer Result

The results are shown in the following tables. First, the number of features in the ovarian cancer dataset is reduced with a genetic algorithm (Table 5.2): the large dataset of size 15154 × 216 is reduced to 20 × 216. The reduced dataset is then loaded into Weka and preprocessed with the interquartile range filter (Table 5.3), and the classification results are given in Table 5.4.

TABLE 5.2 DATASET WITH GENETIC ALGORITHM

Dataset | Attributes | Instances | Classes
Ovarian Cancer | 15154 | 216 | 2 (Benign, Cancer)
Ovarian Cancer (with GA) | 20 | 216 | 2 (Benign, Cancer)
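The text states only that a genetic algorithm (GA) reduced the attributes from 15154 to 20. It does not give the encoding, the operators or the fitness function. The following is therefore a minimal, hypothetical sketch of GA-based feature selection in general, not the method used in this work. Each candidate subset is a bit mask (1 = keep the attribute). Parents are chosen by tournament selection, children are produced by one-point crossover and bit-flip mutation, and elitism carries the best mask forward. The fitness function is a parameter. In practice it would typically be a classifier's accuracy on the selected columns, possibly penalised by subset size. The `toy_fitness` function, the `useful` set and all default parameter values are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = attribute kept, 0 = attribute dropped


def ga_feature_selection(n_features: int,
                         fitness: Callable[[Mask], float],
                         pop_size: int = 20,
                         generations: int = 30,
                         mutation_rate: float = 0.01,
                         seed: int = 0) -> Tuple[Mask, float]:
    """Search bit masks for a high-fitness feature subset; return (mask, fitness)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_fit = None, float("-inf")

    for _ in range(generations):
        scored = [(fitness(m), m) for m in population]
        for fit, m in scored:
            if fit > best_fit:
                best_mask, best_fit = m[:], fit

        def tournament() -> Mask:
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        next_pop = [best_mask[:]]  # elitism: keep the best mask found so far
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop

    return best_mask, best_fit


# Hypothetical fitness: reward three "useful" attributes, penalise subset size.
useful = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in useful) - 0.1 * sum(mask)


mask, fit = ga_feature_selection(10, toy_fitness, generations=20, seed=1)
print(mask, round(fit, 2))
```

For a real run, `n_features` would be the full attribute count and the fitness would be far more expensive to evaluate. That cost is why population size and the number of generations matter in practice.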

TABLE 5.3 AFTER APPLYING INTERQUARTILE RANGE

Dataset | Attributes | Instances | Classes
Ovarian Cancer (GA) | 22 | 216 | 2 (Benign, Cancer)
After removing extreme values | 21 | 216 | 2 (Benign, Cancer)

TABLE 5.4 CLASSIFICATION RESULTS OF VARIOUS ALGORITHMS

Classifier | Correctly classified instances | TP rate | FP rate | Precision | Recall | F-measure | ROC area | Time (sec)
Naive Bayes | 110 (50.92%) | 0.492 | 0.394 | 0.874 | 0.492 | 0.629 | 0.635 | 0.3
IB1 | 170 (78.70%) | 0.847 | 0.545 | 0.896 | 0.847 | 0.871 | 0.651 | 0
IBk | 170 (78.70%) | 0.847 | 0.545 | 0.896 | 0.847 | 0.871 | 0.655 | 0
K-Star | 178 (82.40%) | 0.902 | 0.606 | 0.892 | 0.902 | 0.897 | 0.626 | 0
LAD Tree | 170 (78.70%) | 0.896 | 0.818 | 0.859 | 0.896 | 0.877 | 0.574 | 2
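The metric columns in the classification results table follow standard definitions computed from a confusion matrix. Weka reports them per class and as a weighted average. The table does not say which of these figures it shows. The Python sketch below computes the measures for one class of a two-class problem: TP rate (identical to recall, which is why those two columns match in every row), FP rate, precision, F-measure (the harmonic mean of precision and recall) and accuracy. The confusion-matrix counts in the example are hypothetical, not taken from the thesis results. The final line checks the K-Star row: its reported precision and recall reproduce its reported F-measure.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Metrics for the positive class of a 2x2 confusion matrix."""
    tp_rate = tp / (tp + fn)      # TP rate: share of actual positives found (= recall)
    fp_rate = fp / (fp + tn)      # FP rate: share of actual negatives wrongly flagged
    precision = tp / (tp + fp)    # share of predicted positives that are correct
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure(precision, tp_rate),
    }


# Hypothetical confusion matrix, for illustration only.
m = binary_metrics(tp=8, fn=2, fp=1, tn=9)
print({name: round(value, 3) for name, value in m.items()})

# Consistency check using the K-Star row (precision 0.892, recall 0.902).
print(round(f_measure(0.892, 0.902), 3))  # 0.897
```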

Table 5.4 shows the accuracy of the proposed method with various classification algorithms. According to Table 5.4, K-Star is the best of the algorithms tested in Weka: it correctly classifies 178 of the 216 instances (82.40% accuracy) with a reported model-building time of 0 seconds, while Naive Bayes gives the lowest accuracy (50.92%). The time required to build the model is also an important parameter when comparing classification algorithms; for example, LAD Tree matches IB1 and IBk in accuracy (78.70%) but takes 2 seconds.

Fig 5.14 Accuracy Measure Chart

Description :- The figure above shows the accuracy measure chart of the various classification algorithms applied to the ovarian cancer dataset.
