Abstract: Classification is one of the central tasks in data mining: the categorization of data for its most effective and efficient use. In a basic approach to storing data, data can be classified according to its importance or how often it needs to be accessed. The decision tree is one such classification technique: it is used to clarify and find solutions to complex problems, and its structure captures multiple possible solutions and displays them in a simple, easy-to-understand format. Several algorithms exist for classification. In this paper, the tree is constructed using the geometric structure of the data; this builds small decision trees and gives better performance. We then apply the adaptive boosting (AdaBoost) method to the decision tree to improve its accuracy. This paper thus proposes a modified geometric decision tree algorithm that uses a boosting process.

Introduction

With the continuous development of database technology and the extensive application of database management systems, the volume of data stored in databases increases rapidly, and much important information is hidden in these large amounts of data. If this information can be extracted from the database, it can create a great deal of potential profit for companies; the technology of mining information from massive databases is known as data mining. Data mining tools can forecast future trends and activities to support people's decisions. For example, by analyzing the whole database system of a company, data mining tools can answer questions such as "Which customer is most likely to respond to our company's e-mail marketing activities, and why?". Some data mining tools can also resolve traditional problems that used to consume much time, because they can rapidly browse the entire database and find useful information that experts had not noticed. A neural network is a parallel processing network generated by simulating the intuitive, image-based thinking of humans; it is built on research into biological neural networks, according to the features of biological neurons and neural networks, by simplifying, summarizing, and refining. It uses the idea of non-linear mapping, the method of parallel processing, and the structure of the neural network itself to express the associated knowledge of inputs and outputs. Initially, the application of neural networks in data mining was not promising, mainly because neural networks suffer from complex structure, poor interpretability, and long training times.
But their advantages, such as high tolerance of noisy data and low error rates, together with the continuous advance and optimization of network training algorithms, and especially of network pruning and rule-extraction algorithms, have made the application of neural networks in data mining increasingly favoured by the majority of users. In this paper, data mining based on the neural network is researched in detail.

Decision Trees

A decision tree is a classifier expressed as a recursive partition of the instance space. The decision tree consists of nodes that form a rooted tree, that is, a directed tree with a node called the "root" that has no incoming edges. All other nodes have exactly one incoming edge. A node with outgoing edges is called an internal or test node; all other nodes are called leaves (also known as terminal or decision nodes). In a decision tree, each internal node splits the instance space into two or more sub-spaces according to a certain discrete function of the input attribute values. In the simplest and most frequent case, each test considers a single attribute, so that the instance space is partitioned according to that attribute's value; in the case of numeric attributes, the condition refers to a range. Each leaf is assigned to one class representing the most appropriate target value; alternatively, the leaf may hold a probability vector indicating the probability of the target attribute taking each value. Instances are classified by navigating them from the root of the tree down to a leaf, according to the outcomes of the tests along the path. Internal nodes are typically drawn as circles, whereas leaves are drawn as triangles. A decision tree can incorporate both nominal and numeric attributes: each node is labeled with the attribute it tests, and its branches are labeled with the corresponding values. In the case of numeric attributes, decision trees can be geometrically interpreted as a collection of hyperplanes, each orthogonal to one of the axes. Naturally, decision-makers prefer less complex decision trees, since they may be considered more comprehensible; moreover, tree complexity has a crucial effect on accuracy. Tree complexity is explicitly controlled by the stopping criteria used and the pruning method employed.
Usually, tree complexity is measured by one of the following metrics: the total number of nodes, the total number of leaves, the tree depth, or the number of attributes used. Decision tree induction is closely related to rule induction: each path from the root of a decision tree to one of its leaves can be transformed into a rule simply by conjoining the tests along the path to form the antecedent part and taking the leaf's class prediction as the class value. The resulting rule set can then be simplified to improve its comprehensibility to a human user, and possibly its accuracy.
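To illustrate both the leaf-directed classification and the path-to-rule transformation described above, here is a minimal sketch; the class and function names are our own, not from the paper or any particular library:

```python
# Minimal decision tree: classify by root-to-leaf traversal, and turn
# each root-to-leaf path into a rule (conjunction of tests -> class).

class Node:
    def __init__(self, attr=None, threshold=None, left=None, right=None, label=None):
        self.attr, self.threshold = attr, threshold   # test: x[attr] <= threshold
        self.left, self.right = left, right           # child sub-trees
        self.label = label                            # class label if this is a leaf

    def is_leaf(self):
        return self.label is not None

def classify(node, x):
    """Navigate from the root down to a leaf, following the test outcomes."""
    while not node.is_leaf():
        node = node.left if x[node.attr] <= node.threshold else node.right
    return node.label

def extract_rules(node, conditions=()):
    """Each root-to-leaf path becomes one rule: (antecedent tests, class)."""
    if node.is_leaf():
        return [(conditions, node.label)]
    left = extract_rules(node.left, conditions + ((node.attr, "<=", node.threshold),))
    right = extract_rules(node.right, conditions + ((node.attr, ">", node.threshold),))
    return left + right

# A small tree: split on attribute 0 at 5.0, then on attribute 1 at 2.0.
tree = Node(attr=0, threshold=5.0,
            left=Node(label="A"),
            right=Node(attr=1, threshold=2.0,
                       left=Node(label="B"), right=Node(label="C")))

print(classify(tree, [4.0, 0.0]))   # -> A
print(len(extract_rules(tree)))     # -> 3 (one rule per leaf)
```

Each extracted rule's antecedent is exactly the conjunction of tests on the path, matching the rule-induction view above.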

Geometric Decision Tree

The performance of any top-down decision tree algorithm depends on the measure used to rate different hyperplanes at each node. Having a suitable algorithm to find the hyperplane that optimizes the chosen rating function is also important: for all impurity measures, this optimization is difficult because the gradient of the impurity function with respect to the parameters of the hyperplane cannot be computed. Motivated by these considerations, we propose a new criterion function for assessing the suitability of a hyperplane at a node, one that can capture the geometric structure of the class regions; for this criterion function, the optimization problem can also be solved more easily. We first explain our method by considering a two-class problem. Given the set of training patterns at a node, we first find two hyperplanes, one for each class. Each hyperplane is such that it is closest to all patterns of one class and farthest from all patterns of the other class. We call these the clustering hyperplanes (for the two classes). Because of the way they are defined, the clustering hyperplanes capture the dominant linear tendencies in the examples of each class that are useful for discriminating between the classes. Hence, a hyperplane that passes in between them could be good for splitting the feature space. Thus, we take the hyperplane that bisects the angle between the clustering hyperplanes as the split rule at this node. Since, in general, there are two angle bisectors, we choose the bisector that is better according to an impurity measure, namely the Gini index. If the two clustering hyperplanes happen to be parallel to each other, then we take a hyperplane midway between the two as the split rule.
A purely impurity-based split may promote the (average) purity of the child nodes without really simplifying the classification problem, because it does not capture the symmetric distribution of the class regions. In contrast, consider the two clustering hyperplanes for the two classes and the two angle bisectors obtained through our algorithm at the root node of such a problem: choosing either of the angle bisectors as the hyperplane at the root node to split the data results in linearly separable classification problems at both child nodes. Thus, our idea of using the angle bisectors of the two clustering hyperplanes actually captures the right geometry of the classification problem. This is the reason we call our approach the "geometric decision tree" (GDT).
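As a concrete illustration of the angle-bisector construction, here is a minimal sketch of our own (the function name is illustrative; the sum and difference formulas assume each hyperplane's parameters are first scaled to a unit normal):

```python
import numpy as np

def angle_bisectors(w1, b1, w2, b2):
    """Angle bisectors of the hyperplanes w1.x - b1 = 0 and w2.x - b2 = 0.
    Each hyperplane is scaled to a unit normal first, so the two bisectors
    are simply the sum and the difference of the scaled parameters."""
    s1, s2 = np.linalg.norm(w1), np.linalg.norm(w2)
    u1, c1 = w1 / s1, b1 / s1
    u2, c2 = w2 / s2, b2 / s2
    return (u1 + u2, c1 + c2), (u1 - u2, c1 - c2)

# Example: clustering hyperplanes x = 0 and y = 0 in the plane.
(w3, b3), (w4, b4) = angle_bisectors(np.array([1.0, 0.0]), 0.0,
                                     np.array([0.0, 1.0]), 0.0)
# w3 = [1, 1] gives the line x + y = 0 and w4 = [1, -1] gives x - y = 0:
# exactly the two diagonals bisecting the coordinate axes.
```

In the tree-growing loop, each candidate bisector would then be scored with the Gini index and the better one kept as the node's split rule.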

Gini Index:

The Gini index is an impurity-based criterion that measures the divergence between the probability distributions of the target attribute's values. It has been used in various works and is defined as:

Gini(y, S) = 1 - sum_{c_j in dom(y)} ( |sigma_{y=c_j} S| / |S| )^2

where sigma_{y=c_j} S denotes the subset of S whose target attribute y takes the value c_j.
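In code, this definition amounts to a single pass over the class counts (a minimal sketch; the function name is ours):

```python
from collections import Counter

def gini(labels):
    """Gini(y, S) = 1 - sum over classes c_j of (|S_{y=c_j}| / |S|)^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node has Gini 0; an even two-class mix has the maximum value 0.5.
print(gini(["a", "a", "a", "a"]))  # -> 0.0
print(gini(["a", "a", "b", "b"]))  # -> 0.5
```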

Relevant Mathematics

The presented algorithm for learning oblique decision trees is based on the idea of capturing, to some extent, the (linear) geometric structure of the underlying class regions. Given a set of training patterns at a node, the algorithm first finds two clustering hyperplanes, one for each class. Each hyperplane is such that it is closest to all patterns of one class and farthest from all patterns of the other class. This formulation leads to two generalized eigenvalue problems. The mathematical model for the presented system is given here. Identify the input data set S = { S_1, S_2, S_3, S_4, S_5 }, where S is the input data set. Identify the data classes CL = { CL_1, CL_2, CL_3, CL_4, CL_5 }, where CL is a data class.

Process: Given the set of data points at a node, we find hyperplanes each of which lies close to most of the data points of one class, and then find the best hyperplane to split the data. Let C_1 be the set of points of the majority class and C_2 the set of points of the remaining class. Let A be the matrix containing the points of C_1, and B the matrix containing the points of C_2. Let W_1 be the first clustering hyperplane, W_2 the second clustering hyperplane, and W_3 and W_4 the two angle bisectors, such that

W_3 = W_1 + W_2,    W_4 = W_1 - W_2.

Choose the angle bisector having the lesser Gini index G, where
n_+^{tl} is the number of points in matrix A that go to the left child,
n_+^{tr} is the number of points in matrix A that go to the right child,
n_-^{tl} is the number of points in matrix B that go to the left child, and
n_-^{tr} is the number of points in matrix B that go to the right child.

NP-Completeness: NP stands for non-deterministic polynomial time. Problems that may produce different outputs for the same input, depending on the topology used, are treated here as NP-complete, in the sense that we cannot predict in advance which output the system will give for a given input.

Let us consider the problem of classifying m points in the n-dimensional real space R^n.
A word about notation: all vectors are column vectors unless transposed to a row vector by a prime superscript. The scalar product of two vectors x and y in the n-dimensional real space R^n is denoted by x'y, and the two-norm of x by ||x||. For a matrix A in R^(m x n), A_i is the i-th row of A, which is a row vector in R^n. A column vector of ones of arbitrary dimension is denoted by e, and the identity matrix of arbitrary order by I. We are given some training data S, a set of m points of the form

S = { (x_i, y_i) },  x_i in R^n,  y_i in {-1, 1},  i = 1, ..., m,

where y_i is either 1 or -1, indicating the class to which the point x_i belongs, and each x_i is an n-dimensional real vector. Let A in R^(m_1 x n) represent the data set of class -1 and B in R^(m_2 x n) represent the data set of class 1. Then, referring to [3], let

G := [A -e]'[A -e] + delta*I,    H := [B -e]'[B -e].

By solving the generalized eigenvalue equation Gz = lambda*Hz, we get the first clustering hyperplane

W_1 : x'w^1 - b^1 = 0.

Similarly, let

L := [B -e]'[B -e] + delta*I,    M := [A -e]'[A -e].

By solving the generalized eigenvalue equation Lz = lambda*Mz, we get the second clustering hyperplane

W_2 : x'w^2 - b^2 = 0.

Once the clustering hyperplanes are found, the hyperplane associated with the current node is one of the angle bisectors of these two hyperplanes. Let W_3 : x'w^3 - b^3 = 0 and W_4 : x'w^4 - b^4 = 0 be the angle bisectors of x'w^1 - b^1 = 0 and x'w^2 - b^2 = 0. It is easily shown (in the case w^1 != w^2) that these hyperplanes are given by

W_3 = W_1 + W_2,    W_4 = W_1 - W_2.

As mentioned earlier, the angle bisector with lower impurity is chosen, where impurity is measured by the Gini index. Let W_t be a hyperplane used to divide the set of patterns S^t into two parts S_l^t and S_r^t. Let m_{-1}^{tl} and m_1^{tl} denote the numbers of patterns of the two classes in the set S_l^t, and m_{-1}^{tr} and m_1^{tr} the numbers of patterns of the two classes in the set S_r^t. Then the Gini index of hyperplane W_t is given by

Gini(W_t) = (m^{tl}/m^t) [ 1 - (m_1^{tl}/m^{tl})^2 - (m_{-1}^{tl}/m^{tl})^2 ]
          + (m^{tr}/m^t) [ 1 - (m_1^{tr}/m^{tr})^2 - (m_{-1}^{tr}/m^{tr})^2 ].

The algorithm chooses W_3 or W_4 as the split rule for S^t based on which of the two gives the lesser value of this Gini index. When the clustering hyperplanes are parallel (that is, w^1 = w^2), the hyperplane midway between them is chosen as the splitting hyperplane; it is given by

W_3 = (w, b) = ( w^1, (b^1 + b^2)/2 ).

As is easy to see, in this method the optimization problem of finding the best hyperplane at each node is solved exactly, rather than by relying on a search technique based on local perturbations of the hyperplane parameters. The clustering hyperplanes are obtained by solving the generalized eigenvalue problems; afterward, to find the hyperplane at the node, only two hyperplanes need to be compared based on the Gini index.

Contribution Work: details of the real-world data sets used, from the UCI ML Repository, are given below.

Table I

Data Set         Dimension   #Points   #Classes   Class Distribution
Breast-Cancer    10          683       2          444, 239
Liver Disorder   6           345       2          145, 200
Pima Indian      10          768       2          268, 500
Magic            10          6000      2          3000, 3000
Heart            13          270       2          150, 120
Votes            16          232       2          108, 124
Wine             13          178       3          59, 71, 48
Vehicle          18          846       4          199, 217, 218, 212
Balance Scale    4           625       3          49, 288, 288
Waveform 1       21          5000      3          1647, 1696, 1657
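The clustering-hyperplane computation from the Relevant Mathematics section can be sketched numerically. This is a sketch under our own assumptions: we add a small ridge to H as well for invertibility (the formulation above regularizes only G), solve Gz = lambda*Hz by inverting H with numpy, and use an illustrative function name:

```python
import numpy as np

def clustering_hyperplane(A, B, delta=1e-4):
    """Find z = (w, b) for the clustering hyperplane w.x - b = 0 that is
    close to the rows of A and far from the rows of B, by taking the
    eigenvector with the smallest eigenvalue of the generalized problem
    G z = lambda H z, with G = [A -e]'[A -e] + delta*I, H = [B -e]'[B -e]."""
    Ae = np.hstack([A, -np.ones((A.shape[0], 1))])   # [A -e]
    Be = np.hstack([B, -np.ones((B.shape[0], 1))])   # [B -e]
    G = Ae.T @ Ae + delta * np.eye(Ae.shape[1])
    H = Be.T @ Be + delta * np.eye(Be.shape[1])      # ridge on H: our addition
    vals, vecs = np.linalg.eig(np.linalg.solve(H, G))
    z = np.real(vecs[:, np.argmin(np.real(vals))])   # minimizes z'Gz / z'Hz
    return z[:-1], z[-1]                             # hyperplane parameters (w, b)

# Toy data: class -1 lies on the line y = 0, class +1 on the line y = 1.
A = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
w, b = clustering_hyperplane(A, B)
# The recovered hyperplane hugs class A: the residuals |A w - b| are much
# smaller than |B w - b|, as the formulation demands.
```

The second clustering hyperplane is obtained symmetrically by swapping the roles of A and B, after which the two angle bisectors are compared via the Gini index.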

AdaBoost Algorithm

1. Build the distribution D_1, assuming all samples are equally important.
2. For t = 1, ..., T (rounds of boosting):
   - Select the weak classifier h_t with the lowest weighted error eps_t from a group.
   - Check whether the error is larger than 1/2 (yes: terminate; no: go on).
   - Calculate the confidence parameter, the weight of the sub-classifier:
     alpha_t = (1/2) ln( (1 - eps_t) / eps_t ) > 0.
   - Re-weight the data samples to give poorly classified samples an increased weight:
     D_{t+1}(i) = ( D_t(i) / Z_t ) * e^{-alpha_t}  if y_i = h_t(x_i),
     D_{t+1}(i) = ( D_t(i) / Z_t ) * e^{+alpha_t}  if y_i != h_t(x_i),
     where Z_t is the normalization factor.
3. At the end (after the T-th round), the final strong classifier is
   H_final(x) = sgn( sum_t alpha_t h_t(x) ).
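The loop above can be sketched as follows. This is our own illustrative implementation: the termination threshold of 1/2 and the small clamp applied to a zero error are assumptions, not taken from the paper:

```python
import math

def adaboost(samples, labels, weak_learners, T):
    """Weighted-majority vote of weak classifiers, reweighting the samples
    each round so misclassified points get more attention (labels in {-1, +1})."""
    m = len(samples)
    D = [1.0 / m] * m                                  # D_1: uniform weights
    ensemble = []
    for _ in range(T):
        # weak classifier with the lowest weighted error under D_t
        h, eps = min(((h, sum(D[i] for i in range(m) if h(samples[i]) != labels[i]))
                      for h in weak_learners), key=lambda p: p[1])
        if eps >= 0.5:                                  # no better than chance
            break
        eps = max(eps, 1e-12)                           # clamp: avoid log/div by zero
        alpha = 0.5 * math.log((1.0 - eps) / eps)       # confidence weight alpha_t
        ensemble.append((alpha, h))
        # increase the weight of misclassified samples, then normalize by Z_t
        D = [D[i] * math.exp(-alpha * labels[i] * h(samples[i])) for i in range(m)]
        Z = sum(D)
        D = [d / Z for d in D]
    def H(x):                                           # H_final(x) = sgn(sum alpha_t h_t(x))
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return H

# Decision stumps as weak learners on a 1-D toy problem.
stump = lambda t: (lambda x: 1 if x > t else -1)
xs, ys = [-2.0, -1.0, 1.0, 2.0], [-1, -1, 1, 1]
H = adaboost(xs, ys, [stump(-1.5), stump(0.0), stump(1.5)], T=3)
# H classifies the whole toy training set correctly.
```

In the proposed system, the weak learners would be the GDT models, so the boosted ensemble averages several geometrically constructed trees.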

Conclusion

In this paper, a new algorithm is designed that, at each node of the decision tree, finds two clustering hyperplanes and uses an angle bisector as the split rule. Based on this, we create multiple decision trees, from which we obtain an average split value; this increases accuracy in terms of the confusion matrix, precision, and recall.

Future Work

1. Apply gradient boosting instead of adaptive boosting.
2. Introduce a pruning process to remove the excess nodes created in the decision tree, so that the generalization performance of the decision tree can be improved.
3. There are multiple pruning processes available in the literature, so we need to find out which one performs best with GDT.

References:

1. N. Manwani and P. S. Sastry, "Geometric Decision Tree," 2012.
2. Y. Ben-Haim and E. Tom-Tov, "A Streaming Parallel Decision Tree Algorithm," 2010.
3. R. Jin and G. Agrawal, "Communication and Memory Efficient Parallel Decision Tree Construction," 2010.
4. A. Atramentov, H. Leiva, and V. Honavar, "A Multi-relational Decision Tree Learning Algorithm - Implementation and Experiments," 2003.
5. J. W. Grzymała-Busse, Z. S. Hippe, M. Knap, and T. Mroczek, "A New Algorithm for Generation of Decision Trees," 2004.
6. Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, "An Efficient Boosting Algorithm for Combining Preferences," 2003.
7. Y. Freund and R. E. Schapire, "A Short Introduction to Boosting," 1999.
8. J. Xu and H. Li, "AdaRank: A Boosting Algorithm for Information Retrieval," 2007.
9. J. Su and H. Zhang, "A Fast Decision Tree Learning Algorithm," 2006.
10. L. Rokach and O. Maimon, "Top-Down Induction of Decision Trees Classifiers - A Survey," 2005.
11. E. Cantú-Paz and C. Kamath, "Inducing Oblique Decision Trees with Evolutionary Algorithms," 2003.
12. S. Shah and P. S. Sastry, "New Algorithms for Learning and Pruning Oblique Decision Trees," 1999.
13. N. Manwani and P. S. Sastry, "A Geometric Algorithm for Learning Oblique Decision Trees," 2009.

Table II: Results

Acknowledgement:

The proposed system is based on the IEEE Transactions paper titled "Geometric Decision Tree," published in IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, February 2012.

Data Set                 #Classes   Boosting trials   Nodes   GDT model before AdaBoost    GDT model after AdaBoost
                                                              (Training)   (Testing)      (Training)       (Testing)
2x2 Checkerboard         2          9                 52      82.875       80.5           82.875 + 0.625   80.5 + 1.5
4x4 Checkerboard         2          12                62      73.625       71             73.625 + 2.875   71 - 2
Bank Full Balanced       2          7                 62      62.86        62.48          62.86 + 0        62.48 + 0
Cancer                   2          10                62      76.62        77             76.62 + 2.38     77 + 1.5
Checker                  2          10                40      76.62        77             76.62 + 2.38     77 + 1.5
Banknote Authentication  2          7                 16      98.9         98.54          98.9 + 0         98.54 + 0
Iris                     2          7                 24      85.83        86.66          85.83 + 12.5     86.66 + 10
Iris2                    2          7                 24      92.5         90             92.5 + 0         90 + 0
WDBC                     2          10                52      78.46        60.52          78.46 + 0.66     60.52 + 0
Wine                     2          15                30      88.02        69.44          88.02 + 9.16     69.44 + 13.89
