DESCRIPTION OF APPROACHES
(a) KNN : K-nearest neighbor (KNN) is a basic classification algorithm that can also be used for estimation and prediction. It is based on the principle of instance-based learning, a technique that classifies an unclassified record by comparing it with the most similar records in the training data set, which is kept in memory until classification is complete. Because the k-nearest neighbor algorithm assigns a class on the basis of similarity, data analysts define a distance function, also known as a distance metric, to measure similarity. A distance function is a real-valued function d such that, for any points x, y and z in the space:
I) d(x,y) ≥ 0, and d(x,y) = 0 if and only if x = y {Non-negativity property}
II) d(x,y) = d(y,x) {Commutative (symmetry) property}
III) d(x,z) ≤ d(x,y) + d(y,z) {Triangle inequality property}
The distance d is zero only when the two points coincide. The commutative property states that the distance between two points is the same whether it is measured from x to y or from y to x. The triangle inequality states that the distance between two points can never be shortened by routing through a third point. The most familiar and widely used distance function for points in n-dimensional space is the Euclidean distance, given by:
$d(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where i indexes the n dimensions.
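As a quick check of the formula and of the three properties listed above, here is a minimal Python sketch. The function name `euclidean` and the three sample points are illustrative assumptions, not part of the original description.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length coordinate sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Three arbitrary illustrative points in 2-D space (assumed, not from the text).
x, y, z = (0.0, 0.0), (3.0, 4.0), (6.0, 8.0)

print(euclidean(x, y))                                        # 5.0
print(euclidean(x, x) == 0.0)                                 # zero only for identical points
print(euclidean(x, y) == euclidean(y, x))                     # commutative property
print(euclidean(x, z) <= euclidean(x, y) + euclidean(y, z))   # triangle inequality
```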
KNN is a lazy algorithm: it builds no model in advance and uses the complete training data set during the testing phase. Every training set comprises a set of vectors, each with an associated class label. In the simplest case, the class labels are + (positive class) or – (negative class). The idea in k-nearest neighbor methods is to choose k, the number of neighbors, ranked by the distance function, that influence the classification. With small k (e.g., k = 1), the algorithm simply returns the target value of the nearest observation, which may lead to overfitting: the algorithm tends to memorize the training data set at the expense of generalizability. A small value of k also means that noise has a higher influence on the result. On the other hand, choosing a value of k that is not too small tends to smooth out idiosyncratic behavior learned from the training set. However, if we take this too far and choose a value of k that is too large, locally interesting behavior will be overlooked. A large value also makes the method computationally expensive and defeats the basic philosophy behind KNN, namely that points that are near each other are likely to have similar densities or classes. The data analyst needs to balance these considerations when choosing the value of k. A simple approach is to set k = √n, where n is the number of records in the training data set.
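The effect of k can be illustrated with a small Python sketch of majority-vote KNN. The function `knn_classify`, the toy training set and the query point are assumptions made for illustration; the sketch uses the Euclidean distance via `math.dist` (Python 3.8+).

```python
import math
from collections import Counter

def knn_classify(training, query, k):
    """Classify `query` by majority vote among its k nearest training records.

    `training` is a list of (vector, label) pairs.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training records")
    # Lazy learning: no model is built; every training record is scanned per query.
    neighbours = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data (illustrative only): a '+' cluster containing one noisy '-' record,
# and a distant '-' cluster.
training = [
    ((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((0.8, 1.1), "+"),
    ((1.1, 1.3), "-"),                                   # noise
    ((4.0, 4.0), "-"), ((4.2, 3.9), "-"), ((3.8, 4.1), "-"),
]
query = (1.1, 1.25)

print(knn_classify(training, query, k=1))   # '-': follows the single noisy neighbour
print(knn_classify(training, query, k=3))   # '+': the vote smooths the noise out
print(knn_classify(training, query, k=7))   # '-': k too large, local structure is lost
```

With k = 7 every record votes, so the result is just the overall majority class, regardless of where the query lies.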
(b) GENETIC ALGORITHM : Genetic algorithms (GA) are adaptive methods derived from Darwin's principle of natural selection, which states that only the fittest survive. A GA maintains a population of potential solutions to the problem at hand, referred to as individuals, creatures or phenotypes. GAs belong to the larger class of evolutionary algorithms (EA), which produce optimized solutions by taking inspiration from mechanisms of natural evolution such as mutation, selection, inheritance and crossover. Each candidate solution has a set of properties, its chromosomes or genotype, that can be altered or mutated. Traditionally, candidate solutions are encoded as fixed-length binary strings, i.e., strings of 1's and 0's, although other encodings are possible. Evolution starts from randomly generated individuals and proceeds iteratively; in each generation, the fitness of every individual in the population is evaluated. Fitness is the value of the objective function of the optimization problem being solved. The genetic algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. Candidate solutions can also be represented with variable length, but this complicates crossover; in a fixed-length representation the parts of the candidate solutions are easily aligned, which makes crossover simpler. After individuals with high fitness values have been selected, evolution proceeds through three genetic operators: reproduction, crossover and mutation.
Consider two individuals whose candidate solutions have length 20, with the crossover point after bit 5:
X1= (01001|101100001000101) X2= (11010|011100000010000)
I) Reproduction makes no change to the candidate solutions of the parent population; the offspring population inherits them unchanged. The two resulting offspring are:
X’1= (01001|101100001000101) X’2= (11010|011100000010000)
II) Crossover exchanges the bits of the parents' candidate solutions after the crossover point, and the offspring population inherits the recombined candidate solutions. The two resulting offspring are:
X’1= (01001|011100000010000) X’2= (11010|101100001000101)
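III) Mutation inverts bits of the parents' candidate solutions, and the offspring population inherits the changed candidate solutions. In this example every bit is inverted, whereas in practice each bit is usually flipped only with a small probability. The two resulting offspring are: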
X’1= (10110|010011110111010) X’2= (00101|100011111101111)
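The three operators can be reproduced on the example strings with a short Python sketch. The function names are hypothetical, and `invert_all` deliberately flips every bit to match the worked example rather than the probabilistic mutation normally used.

```python
def reproduce(p1, p2):
    """Reproduction: offspring are unchanged copies of the parents."""
    return p1, p2

def crossover(p1, p2, point):
    """Single-point crossover: swap everything after position `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def invert_all(bits):
    """Mutation as in the worked example: every bit is inverted."""
    return "".join("1" if b == "0" else "0" for b in bits)

# The two parents from the example, written without the '|' marker.
X1 = "01001101100001000101"
X2 = "11010011100000010000"

print(reproduce(X1, X2))
print(crossover(X1, X2, 5))
# ('01001011100000010000', '11010101100001000101')
print(invert_all(X1), invert_all(X2))
# 10110010011110111010 00101100011111101111
```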
Genetic programming, unlike evolutionary programming, explores tree-like representations. Genetic algorithms have several drawbacks: the search becomes complex because the size of the search space grows exponentially when the number of elements exposed to mutation is large, and they are ineffective at problem solving when the only criterion for decision making is the fitness measure.
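The overall loop described in this section (random initial population, fitness evaluation in each generation, selection, crossover, mutation, and the two termination criteria) might be sketched as follows. Several details are assumptions chosen only to make the sketch runnable: the toy "count the 1 bits" fitness function, binary tournament selection, keeping the best individual unchanged (elitism), a per-bit mutation probability `p_mut`, and all parameter defaults.

```python
import random

def one_max(bits):
    """Toy fitness (an assumption for illustration): number of 1 bits."""
    return bits.count("1")

def tournament(population, fitness, rng):
    """Binary tournament: return the fitter of two randomly drawn individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def flip(bit):
    return "1" if bit == "0" else "0"

def run_ga(fitness, length=20, pop_size=30, max_generations=100,
           target=None, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # Evolution starts from randomly generated fixed-length binary strings.
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):          # termination 1: generation limit
        if target is not None and fitness(best) >= target:
            break                             # termination 2: satisfactory fitness
        next_gen = [best]                     # elitism (assumed): keep the best as is
        while len(next_gen) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            point = rng.randrange(1, length)  # single-point crossover
            child = p1[:point] + p2[point:]
            # Mutation: flip each bit independently with probability p_mut.
            child = "".join(flip(b) if rng.random() < p_mut else b for b in child)
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best

best = run_ga(one_max, target=20)
print(best, one_max(best))
```

Because the best individual is carried over unchanged, the best fitness never decreases from one generation to the next in this sketch.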
(c) ANT COLONY OPTIMIZATION : The Ant Colony Optimization (ACO) algorithm is a probabilistic, agent-based technique that simulates the natural behavior of ants, in particular their mechanisms of cooperation and adaptation. It was first proposed by Marco Dorigo in 1992 and reduces a computational problem to finding good paths through a graph, much as ants seek a path between their colony and a food source. The basic idea behind ACO rests on three points:
I) For every problem, a candidate solution is associated with each path that is followed by ants.
II) The amount of pheromone deposited on each path followed by an ant is proportional to the quality of the corresponding candidate solution for the target problem.
III) When an ant has to choose between two or more paths, the path(s) with a larger amount of pheromone have a greater probability of being chosen by the ant.
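Points II and III can be sketched in Python as a pheromone-proportional choice followed by a quality-proportional deposit. The two-path setup, the quality values, the scale factor `q` and the function names are illustrative assumptions.

```python
import random

def choose_path(pheromone, rng):
    """Pick a path index with probability proportional to its pheromone level."""
    return rng.choices(range(len(pheromone)), weights=pheromone, k=1)[0]

def deposit(pheromone, path, quality, q=1.0):
    """Add pheromone in proportion to solution quality (q is a hypothetical scale)."""
    pheromone[path] += q * quality

rng = random.Random(42)
pheromone = [1.0, 1.0]     # two alternative paths, initially equally attractive
quality = [1.0, 0.5]       # illustrative: path 0 yields the better solution

for _ in range(200):       # 200 ants, one after another
    path = choose_path(pheromone, rng)
    deposit(pheromone, path, quality[path])

print(pheromone)
```

Each deposit raises the probability that later ants pick the same path, which is the positive feedback that lets good paths accumulate pheromone.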
In ACO, the appetency of a solution is inversely proportional to the difference in importance between negative and redundant paths, and the concentration is proportional to the number of ants whose appetency is greater than α, where α is defined as m/10 and m is the number of ants in the colony. Every ant whose appetency is greater than α deposits additional pheromone. ACO involves a number of parameters that need to be set appropriately:
I) α, which weighs the relative influence of the pheromone.
II) β, which weighs the influence of the heuristic values in the construction of the ants' solutions and usually takes a value between 2 and 5.
III) ρ, the evaporation rate parameter, where 0 ≤ ρ ≤ 1, which regulates the degree to which the pheromone trails decrease.
IV) σ, the local pheromone (history) coefficient, which controls how much history contributes to a component's probability of selection and is set to 0.1.
V) h, a problem-dependent heuristic function that measures the quality of items that can be added to the current partial solution.
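To make the roles of α, β, ρ and h concrete, the sketch below uses the selection and evaporation rules of the standard Ant System formulation: each candidate is weighted by pheromone^α times heuristic^β, and every trail decays by the factor (1 − ρ). This is an assumption about the intended update rules; the appetency threshold and the coefficient σ are not modelled, and all numeric values are illustrative.

```python
def transition_probabilities(tau, eta, alpha=1.0, beta=2.0):
    """Selection probability of each candidate: tau^alpha * eta^beta, normalised."""
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(tau, eta)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all candidate weights are zero")
    return [w / total for w in weights]

def evaporate(tau, rho):
    """Decay every pheromone trail by the evaporation rate rho (0 <= rho <= 1)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return [(1.0 - rho) * t for t in tau]

tau = [1.0, 2.0, 1.0]   # pheromone on three candidate components (illustrative)
eta = [1.0, 1.0, 2.0]   # heuristic value h of each component (illustrative)

probs = transition_probabilities(tau, eta, alpha=1.0, beta=2.0)
print(probs)                      # weights 1 : 2 : 4, i.e. 1/7, 2/7, 4/7
print(evaporate(tau, rho=0.5))    # [0.5, 1.0, 0.5]
```

With β = 2 the third component, whose heuristic value is doubled, outweighs the second, whose pheromone is doubled, which shows how β shifts the balance toward the heuristic.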