My task is to produce an output decision based on a dataset that I was given. I am going to use machine learning to experiment with the data using various classifiers and then report on the best-performing approach. This report compares the classifiers and offers an insight into why I have chosen a specific classifier to make an output decision from the dataset.
A classifier is a set of instructions that takes as input data or information about one entity (it could be a picture, a house, a car, a human, an animal, etc.) and outputs a prediction about that entity (a quality, the answer to a binary question, the probability of a value, etc.).
Examples can be:
– input a picture (a matrix of RGB values) and output the probability that there is a dog in the picture,
– input the details of a house and output the most probable price the house will sell for.
Each classifier applies a learning algorithm to identify a model that best fits the relationship between the attribute set and the class label of the data.
The Decision Tree Classifier is a simple and widely used classification technique. It applies a straightforward idea to the classification problem: it poses a series of carefully constructed questions about the attributes of the test record. Each time it receives an answer, a follow-up question is asked until a conclusion about the class label of the record is reached.
So when does it terminate?
1. Either it has divided the data into classes that are pure (each containing members of only a single class), or
2. Some criterion on the classifier's parameters is met.
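The two stopping rules above can be sketched in plain Python. This is a minimal illustration only: the names (`build_tree`, `is_pure`) and the placeholder midpoint split are my own simplifications, not a real split-selection procedure.

```python
# Illustrative sketch of the two stopping rules: purity and parameter criteria.
# Rows are (features, label) pairs; the split itself is a placeholder.

def is_pure(rows):
    """A node is pure when every row carries the same class label."""
    return len({label for _, label in rows}) <= 1

def build_tree(rows, depth=0, max_depth=3, min_samples_split=2):
    # Rule 1: stop when the node is pure (only one class present).
    # Rule 2: stop when a parameter criterion is met (depth limit,
    #         or too few samples left to split).
    if is_pure(rows) or depth >= max_depth or len(rows) < min_samples_split:
        labels = [label for _, label in rows]
        return {"leaf": max(set(labels), key=labels.count)}  # majority class
    mid = len(rows) // 2  # placeholder split, for illustration only
    return {"left": build_tree(rows[:mid], depth + 1, max_depth, min_samples_split),
            "right": build_tree(rows[mid:], depth + 1, max_depth, min_samples_split)}
```

A pure input collapses immediately to a single leaf; a mixed input keeps splitting until one of the two rules fires.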
DECISION TREE PARAMETERS
Criterion is a string and is optional. The default value is “gini”. It is the function used to measure the quality of a split; a good splitting strategy is to choose the split that gives the best information gain.
Splitter is a string and is optional. The default value is “best”. It is the strategy used to choose the split at each node.
Max_depth is an integer or None and is optional. The default value is “None”. This is the maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
Min_samples_split is an integer or float and is optional. The default value is “2”. Ideally, a decision tree stops splitting a working set when it either runs out of features or the working set ends up in a single class. Splitting can be made quicker by tolerating some error through a minimum-split criterion: with this parameter, if the number of items in the working set falls below the specified value, the decision tree classifier stops splitting.
Min_samples_leaf is an integer or float and is optional. The default value is “1”. Float values were recently added for percentages. This is the minimum number of samples required to be at a leaf node: if an integer, min_samples_leaf is taken as the minimum number directly; if a float, min_samples_leaf is a percentage and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
Min_weight_fraction_leaf is a float and is optional. The default value is “0”. This is the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. When sample_weight is not provided, samples have equal weight.
Max_features can be an integer, float, string or None and is optional. The default value is “None”. If None, then max_features=n_features. The search for a split does not stop until at least one valid partition of the node samples is found.
Random_state is an integer, a RandomState instance or None and is optional. The default value is “None”.
Max_leaf_nodes is an integer or None and is optional. The default value is “None”.
Min_impurity_decrease is a float and is optional. The default value is “0”. A node will be split if the split induces a decrease in impurity greater than or equal to this value.
Min_impurity_split is a float: the threshold for early stopping in tree growth. A node will split if its impurity is above the threshold; otherwise it is a leaf.
Class_weight is a dict, a list of dicts, “balanced” or None. The default value is “None”.
Presort is a Boolean and is optional. The default value is “False”. It controls whether to presort the data to speed up the finding of the best splits during fitting. With the default settings of a decision tree on a large dataset, setting this to True may slow down the training process; with a smaller dataset or a restricted depth, it may speed up training.
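To make the default criterion concrete, the Gini impurity behind criterion=“gini” can be computed by hand: it is 1 minus the sum of the squared class proportions in a node. The label values below are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions.
    0.0 means a pure node; higher values mean more class mixing."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0.0; a 50/50 two-class node has impurity 0.5.
print(gini(["benign"] * 4))               # 0.0
print(gini(["benign", "malignant"] * 2))  # 0.5
```

A splitter guided by this criterion prefers the split whose child nodes have the lowest weighted impurity.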
Artificial Neural Networks
Artificial neural networks are relatively crude electronic models of neurons, based on the neural structure of the brain. They process records one at a time and learn by comparing their (initially largely arbitrary) classification of each record with its known true classification. The errors from the initial classification of the first record are fed back into the network and used to modify the network's algorithm for further iterations.
Neurons are organized into layers: input, hidden and output. The input layer is not made up of full neurons; it contains the record values that are fed into the next layer of neurons. A neural network can contain several hidden layers.
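The flow from input layer through hidden layers to output can be sketched in a few lines. The weights and biases below are arbitrary illustration values (not trained ones), and the sigmoid activation is one common choice, not something mandated by the report.

```python
import math

def forward(x, layers):
    """Propagate an input vector through a list of layers.
    Each layer is a (weights, biases) pair; sigmoid is the activation."""
    for weights, biases in layers:
        x = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
             for row, b in zip(weights, biases)]
    return x

# One hidden layer with two neurons feeding a single output neuron.
hidden = ([[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1])
output = ([[1.0, -1.0]], [0.0])
print(forward([1.0, 2.0], [hidden, output]))
```

The input layer is just the raw record values (`[1.0, 2.0]` here); only the hidden and output layers perform computation, which matches the description above.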
ARTIFICIAL NEURAL NETWORKS PARAMETERS
LEARNING RATE
The learning rate parameter represents the speed at which the value of each weight is updated; it can be set between 0 and 1.
NUMBER OF EPOCHS
The neural network needs to iterate several times to achieve higher accuracy. An epoch is one forward pass and one backward pass over all training instances through the network; the number of iterations to be made is determined by the number of epochs.
MOMENTUM
Similar to the learning rate, the momentum parameter is set to make sure that the minimization of the cost function does not get stuck in local minima; the momentum parameter should be between 0 and 1.
NUMBERS OF LAYERS AND NEURONS
These parameters define the number of hidden layers and the numbers of neurons (the ith element represents the number of neurons in the ith hidden layer). When input data is received, the neurons perform calculations to arrive at an outcome.
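The interaction of learning rate, momentum, and epochs can be shown on a toy cost function f(w) = (w − 3)². The hyper-parameter values below are illustrative assumptions, not values taken from this report's experiments.

```python
# Toy gradient descent with momentum on f(w) = (w - 3)^2, whose minimum
# is at w = 3. One loop iteration stands in for one epoch.

def train(learning_rate=0.1, momentum=0.9, epochs=50):
    w, velocity = 0.0, 0.0
    for _ in range(epochs):            # one epoch = forward + backward pass
        grad = 2.0 * (w - 3.0)         # derivative of (w - 3)^2
        velocity = momentum * velocity - learning_rate * grad
        w += velocity                  # momentum smooths the update direction
    return w

print(train())  # converges toward the minimum at w = 3
```

A larger learning rate takes bigger steps (risking divergence), more epochs give the updates more iterations to settle, and the momentum term carries past update directions forward, which is what helps the search roll through shallow local minima.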
The dataset consists of six attributes, all numerical: BI-RADS, Age, Shape, Margin, Density and Severity. It is a dataset for the Breast Imaging Reporting and Data System.
BI-RADS has a label scale of 0-6. The BI-RADS assessment categories are:
♣ 0 - incomplete
♣ 2 - benign findings
♣ 3 - probably benign
♣ 4 - suspicious abnormality
♣ 5 - highly suspicious of malignancy
♣ 6 - known biopsy with proven malignancy
BI-RADS is ordinal. With an ordinal type, it is the order of the values that is important and significant, but the differences between them are not clear. Looking at the categories above, we know that one category ranks higher than another (a 4 is more suspicious than a 2 or a 3), but we do not know and cannot quantify how much higher.
The median of the BI-RADS attribute is 4, which means there is a high proportion of category 4 (suspicious abnormality) in the dataset. The number of missing values is 2.
Age is an attribute that records the age of each patient. The minimum age in this dataset is 18 while the maximum is 96. Age is an integer and the average value is 55. The number of missing values is 5.
Shape is an attribute with a nominal type that refers to the shape of the tumor. A nominal type is a numeric value that names the label uniquely, and we use the mode to summarize it statistically. The shape labels are split into 4 categories:
The mode is 4. The number of missing values is 31.
Margin is an attribute with a nominal type that refers to the margin of the tumor. A nominal type is a numeric value that names the label uniquely, and we use the mode to summarize it statistically. The margin labels are split into 4 categories:
The mode is 1. The number of missing values is 48.
Density is an attribute with an ordinal type, and it refers to how dense the tumor is. With an ordinal type, the order of the values is important and significant, but the differences between them are not clear. We use the median because the mean is not meaningful when the data is not evenly distributed.
The density labels are split into 4 categories:
The median is 4. The number of missing values is 76.
Severity is an attribute with a binominal (two-class) type, and it refers to the classification of the tumor.
The severity labels are split into 2 categories:
The mode result is 0.
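The summary statistics used above (median for ordinal attributes, mode for nominal ones) can be reproduced with Python's statistics module. The column values below are illustrative samples, not the actual dataset.

```python
import statistics

# Illustrative values only, not the real dataset columns.
birads = [4, 4, 5, 3, 4, 2, 5, 4]  # ordinal: summarize with the median
margin = [1, 1, 4, 1, 3, 1, 5, 1]  # nominal: summarize with the mode

print(statistics.median(birads))   # median of the ordinal attribute
print(statistics.mode(margin))     # most frequent nominal label
```

The median respects only the ordering of the values, which is exactly why it suits ordinal attributes like BI-RADS and Density, while the mode needs no ordering at all, which suits nominal attributes like Shape and Margin.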
Classifier for dataset
In this section, a brief listing of the advantages and disadvantages of both algorithms used in this study is provided. Taking these into account, at the end of the section a prediction is given of which algorithm should perform better under the current scenario.
Decision tree:
– Easy to interpret visually when the tree only contains a few levels
– Can easily handle qualitative (categorical) features
– Works well with decision boundaries parallel to the feature axes
– Prone to overfitting
– Possible issues with diagonal decision boundaries
Artificial neural network:
– Can approximate complex situations
– Can achieve really high levels of accuracy
– Training process requires time
– Requires large amounts of data
– Difficult to interpret (black-box model)
According to the pros and cons of each algorithm, there is not enough data to explore the potential of a neural network. The decision tree should get better results because the data provided has few attributes.
Initial observations and experiments with preprocessing
The first observation is the value 55 in the BI-RADS attribute, as can be seen in the image below. The assessment has a label scale of 0-6, so this entry is likely a mistake in the dataset; as it is only one entry, very little data is lost by excluding it.
The second observation is that there is no entry with the value 1. It is possible that some inputs fall outside the valid range, as it is impossible to know whether such values are errors.
I will experiment by deleting all 16 entries containing the value 0 or 6 and compare performance on this dataset against that of the unmodified dataset.
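One way this filtering step might be sketched is below. The row dicts are illustrative, not the actual dataset, and the 1-5 "kept" range reflects the choice above to drop the 0 and 6 entries along with the stray 55.

```python
# Hypothetical pre-processing sketch: keep only records whose BI-RADS
# score falls in the retained 1-5 range, discarding the 0 and 6 entries
# and the out-of-scale 55 in one pass.

def drop_invalid_birads(rows, valid=range(1, 6)):
    """Return only the rows whose BI-RADS score lies in the kept range."""
    return [row for row in rows if row["BI-RADS"] in valid]

rows = [{"BI-RADS": 4}, {"BI-RADS": 55}, {"BI-RADS": 0}, {"BI-RADS": 6}]
print(drop_invalid_birads(rows))  # only the BI-RADS 4 record survives
```

Keeping the filter as a single range test makes the two observations above (the 55 outlier and the excluded 0/6 labels) one uniform rule rather than two separate clean-up passes.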
[Figure: ROC curves, False Positive Rate (%) vs True Positive Rate (%), for Model 1 (after pre-processing) and Model 2 (before pre-processing).]