1.1 Introduction
An intrusion detection system (IDS) is a device or software application that monitors network or system activities for malicious activity or policy violations and produces reports to a management station. IDSs come in various forms and approach the goal of detecting suspicious traffic in different ways. There are network-based (NIDS) and host-based (HIDS) intrusion detection systems. A NIDS is a network security system that focuses on attacks that come from inside the network (authorised users). When NIDSs are classified by their system interactivity, there are two types: on-line and off-line. An on-line NIDS deals with the network in real time: it analyses Ethernet packets and applies a set of rules to decide whether or not they constitute an attack. An off-line NIDS deals with stored data and passes it through some processing to decide whether or not it is an attack. Some systems may attempt to stop an intrusion attempt, but this is neither required nor expected of a monitoring system. Intrusion detection and prevention systems (IDPS) are primarily focused on identifying possible incidents, logging information about them, and reporting attempts. In addition, organizations use IDPS for other purposes, such as identifying problems with security policies, documenting existing threats, and deterring individuals from violating security policies. IDSs have become a necessary addition to the security infrastructure of nearly every organization.
An IDS typically records information related to observed events, notifies security administrators of important observed events, and produces reports. Many IDSs can also respond to a detected threat by attempting to prevent it from succeeding. They use several response techniques, which involve the IDPS stopping the attack itself, changing the security environment (for example, reconfiguring a firewall), or changing the attack's content.
There are two types of intrusion detection systems, namely network-based (NIDS) and host-based (HIDS) intrusion detection systems.
1.2 Network Intrusion Detection Systems
Network Intrusion Detection Systems (NIDS) are placed at a strategic point or points within the network to monitor traffic to and from all devices on the network. A NIDS analyses the traffic passing on the entire subnet and matches it against a library of known attacks. Once an attack is identified or abnormal behavior is sensed, an alert is sent to the administrator. An example would be installing a NIDS on the subnet where the firewalls are located in order to see whether someone is trying to break into the firewall. Ideally, one would scan all inbound and outbound traffic; however, doing so might create a bottleneck that would impair the overall speed of the network. OPNET and NetSim are commonly used tools for simulating network intrusion detection systems. NIDSs are also capable of comparing the signatures of similar packets in order to link and drop detected harmful packets whose signature matches the records in the NIDS.
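The matching of traffic against a library of known attacks can be illustrated with a minimal sketch. It assumes a signature is simply a byte pattern searched for in a packet payload; the signature names, patterns, and the `inspect` helper are hypothetical and are not taken from any real rule set or tool.

```python
from dataclasses import dataclass

# Hypothetical signature library: name -> byte pattern (illustrative only).
SIGNATURES = {
    "example-sql-injection": b"' OR 1=1",
    "example-path-traversal": b"../../",
}


@dataclass
class Alert:
    signature: str  # name of the matched signature
    source: str     # address the payload came from


def inspect(payload: bytes, source: str, signatures=SIGNATURES) -> list[Alert]:
    """Return one alert for every signature whose pattern occurs in the payload."""
    return [
        Alert(name, source)
        for name, pattern in signatures.items()
        if pattern in payload
    ]


# A payload containing a known pattern raises an alert; a benign one does not.
suspicious = inspect(b"GET /item.php?id=1' OR 1=1 --", "10.0.0.5")
benign = inspect(b"GET /index.html", "10.0.0.7")
print(suspicious)
print(benign)
```

Real systems use far richer rules (protocol fields, ports, stateful context), so this only shows the basic lookup idea.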
1.3 Host Intrusion Detection Systems
Host Intrusion Detection Systems (HIDS) run on individual hosts or devices in the network. A HIDS monitors the inbound and outbound packets of the device only and alerts the user or administrator if suspicious activity is detected. It takes a snapshot of the existing system files and compares it with the previous snapshot. If critical system files have been modified or deleted, an alert is sent to the administrator to investigate. An example of HIDS usage can be seen on mission-critical machines, which are not expected to change their configuration.
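The snapshot comparison described above can be sketched as follows. This is only an illustration under the assumption that a snapshot is a map from file path to a SHA-256 digest of the file contents; the helper names `snapshot` and `compare` and the temporary files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(paths):
    """Map each existing file to the SHA-256 digest of its contents."""
    return {
        str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()
        for p in paths
        if Path(p).is_file()
    }


def compare(previous, current):
    """Return (modified, deleted) path lists relative to the previous snapshot."""
    modified = sorted(p for p in previous if p in current and current[p] != previous[p])
    deleted = sorted(p for p in previous if p not in current)
    return modified, deleted


# Demonstration on throwaway files standing in for critical system files.
with tempfile.TemporaryDirectory() as tmp:
    passwd = Path(tmp) / "passwd"
    hosts = Path(tmp) / "hosts"
    passwd.write_text("root:x:0:0\n")
    hosts.write_text("127.0.0.1 localhost\n")
    watched = [passwd, hosts]

    baseline = snapshot(watched)
    passwd.write_text("root:x:0:0\nintruder:x:0:0\n")  # simulate tampering
    hosts.unlink()                                      # simulate deletion

    modified, deleted = compare(baseline, snapshot(watched))
    print("modified:", modified)
    print("deleted:", deleted)
```

Any non-empty result would be the trigger for the alert sent to the administrator.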
1.4 K-means Clustering
K-means clustering [11] is one of the simplest unsupervised clustering algorithms. The algorithm takes the input parameter k and partitions a dataset of n objects into k clusters so that intra-cluster similarity is high and inter-cluster similarity is low. K is a positive integer given in advance. K-means clustering takes less time than hierarchical clustering and yields better results.
With the help of clustering, the training dataset is divided into five subsets: four of them correspond to types of intrusion and are called attack datasets, and one contains normal data and is called the normal dataset. The steps of the clustering algorithm are:
1) Define the number of clusters K.
2) Initialize the K cluster centroids. This is done by arbitrarily dividing all objects into K clusters, computing their centroids, and verifying that all centroids are different from each other. Alternatively, the centroids can be initialized to K arbitrarily chosen different objects.
3) Iterate over all objects and compute the distances to the centroids of all clusters. Assign each object to the cluster with the nearest centroid.
4) Recalculate the centroids of the modified clusters.
5) Repeat steps 3 and 4 until the centroids no longer change.
A distance function is required to compute the distance between two objects. The most commonly used distance function is the Euclidean one, which is defined as:
\[ d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \]
where x = (x_1, ..., x_m) and y = (y_1, ..., y_m) are two input vectors with m quantitative features. In the Euclidean distance function, all features contribute equally to the function value. Since different types of features are usually measured with different metrics or on different scales, they must be normalized before the distance is computed.
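The five steps and the Euclidean distance can be put together in a short sketch. It is an illustration, not the implementation used in the experiments: it assumes min-max scaling as the normalization (the text does not name one), initializes the centroids from K randomly sampled objects, and uses a fixed seed and toy data that are purely hypothetical.

```python
import math
import random


def min_max_normalize(rows):
    """Rescale every feature to [0, 1] so that no feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant feature: avoid /0
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def euclidean(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: K arbitrarily chosen objects
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        # Step 5: unchanged assignments mean the centroids will not change either.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: recalculate the centroids (an empty cluster keeps its old one).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids


# Toy data: two features on very different scales, two obvious groups.
raw = [[1.0, 200.0], [1.5, 180.0], [1.2, 220.0],
       [8.0, 900.0], [8.5, 950.0], [9.0, 880.0]]
labels, centroids = k_means(min_max_normalize(raw), k=2)  # step 1: K = 2
print(labels)
```

Without the normalization step the second feature, whose values are two orders of magnitude larger, would decide almost all assignments on its own.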
1.5 SVM Classifier
The SVM classifier [16] produces better results for binary classification than other types of classifier. Here, nonlinear kernel functions are used, and the resulting maximum-margin hyperplane is fitted in a transformed feature space, which for a radial basis kernel is a Hilbert space of infinite dimension.
A Support Vector Machine is a discriminative classifier defined by a separating hyperplane. In other words, given labeled training data, the algorithm outputs an optimal hyperplane that categorizes new examples. The SVM algorithm operates by finding the hyperplane that gives the largest minimum distance to the training examples.
This distance is called the margin in SVM theory. Therefore, the optimal separating hyperplane maximizes the margin of the training data.
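The idea of the largest minimum distance can be written out in the standard hard-margin form. This is the textbook formulation, given here as an assumption since the text does not state it: the training pairs are (x_i, y_i) with labels y_i in {-1, +1}, the hyperplane is described by a weight vector w and an offset b, and the scaling is chosen so that the nearest examples satisfy y_i (w^T x_i + b) = 1. The last line shows the Gaussian radial basis kernel with a width parameter gamma, which is likewise an assumed, commonly used form.

```latex
\begin{aligned}
&\text{separating hyperplane:} && w^{\top} x + b = 0 \\
&\text{distance of the nearest training example:} && \min_i \frac{\lvert w^{\top} x_i + b \rvert}{\lVert w \rVert} = \frac{1}{\lVert w \rVert} \\
&\text{optimal hyperplane:} && \min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^{2}
  \quad \text{subject to} \quad y_i \,(w^{\top} x_i + b) \ge 1, \quad i = 1, \dots, n \\
&\text{radial basis kernel:} && K(x, x') = \exp\!\left(-\gamma \,\lVert x - x' \rVert^{2}\right), \quad \gamma > 0
\end{aligned}
```

Minimizing the norm of w is therefore equivalent to maximizing the distance of the nearest training example to the hyperplane.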
1.6 Testing and Validation
For our experiments we use the KDD CUP 99 dataset. It contains 41 fields as attributes and a 42nd field as the label. In our algorithm we use a selected subset of the features. The 42nd field can be generalized to five classes: Normal, DoS, Probing, U2R, and R2L. The description of KDD CUP 99 used for our method is shown in Table 1 [17]. The performance of each method is measured in terms of Accuracy, Detection Rate, and False Positive Rate using the following expressions:
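\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Detection Rate} = \frac{TP}{TP + FN} \]
\[ \text{False Positive Rate} = \frac{FP}{FP + TN} \]
where
FN is False Negative
TN is True Negative
TP is True Positive
FP is False Positive
Alert types: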
• True Positive: Attack – Alert
• False Positive: No attack – Alert
• False Negative: Attack – No Alert
• True Negative: No attack – No Alert
• True Positive: A legitimate attack that triggers an IDS to produce an alarm.
• False Positive: An event that causes an IDS to produce an alarm when no attack has taken place. [3]
• False Negative: A failure to raise an alarm when an attack has taken place.
• True Negative: An event in which no attack has taken place and no detection is made.
• Noise: Data or interference that can trigger a false positive or obscure a true positive.
The detection rate is the number of attacks detected by the system divided by the number of attacks in the data set. The false positive rate is the number of normal connections that are misclassified as attacks divided by the number of normal connections in the data set.
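The three measures follow directly from the four alert counts, as the sketch below shows. Detection rate and false positive rate are computed exactly as defined in words above; accuracy is assumed to be the usual share of correctly classified connections. The counts in the example are made up for illustration and are not experimental results.

```python
def ids_metrics(tp, tn, fp, fn):
    """Return (accuracy, detection_rate, false_positive_rate).

    Assumes the data set contains at least one attack and one normal connection.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # correctly classified / all
    detection_rate = tp / (tp + fn)             # detected attacks / all attacks
    false_positive_rate = fp / (fp + tn)        # false alarms / all normal
    return accuracy, detection_rate, false_positive_rate


# Illustrative counts only: 100 attacks and 900 normal connections.
acc, dr, fpr = ids_metrics(tp=90, tn=880, fp=20, fn=10)
print(f"accuracy={acc:.3f} detection_rate={dr:.3f} false_positive_rate={fpr:.3f}")
```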
1.7 Fuzzy Logic
Fuzzy logic is a problem-solving control approach that lends itself to implementation in systems such as multichannel PC or workstation-based acquisition and control systems. It can be implemented in hardware, software, or both. It offers a simple way to arrive at a definite decision based on vague, ambiguous, imprecise, noisy, or missing input information.
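How a definite decision can come out of imprecise input may be illustrated with a small sketch. It assumes triangular membership functions, which are one common choice and are not prescribed by the text; the feature (connections per second), the set names, and their breakpoints are hypothetical.

```python
def triangular(x, a, b, c):
    """Membership that rises from 0 at a to 1 at b, then falls back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over "connections per second".
FUZZY_SETS = {
    "medium": (20.0, 50.0, 80.0),
    "high": (50.0, 100.0, 150.0),
}


def classify(rate):
    """Return the memberships and the label with the highest membership."""
    degrees = {name: triangular(rate, *abc) for name, abc in FUZZY_SETS.items()}
    return degrees, max(degrees, key=degrees.get)


# A rate of 65 is partly "medium" and partly "high"; the decision is still crisp.
degrees, decision = classify(65.0)
print(degrees, decision)
```

The input belongs to both sets to some degree, yet the final label is unambiguous.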
1.8 Problem Statement
Intrusion detection faces a number of challenges: it must reliably detect malicious activity in a network, and it must perform efficiently to cope with the large amount of network traffic. Intrusion detection systems are gauged on their detection precision and detection stability.
The majority of existing systems suffer from a low detection rate and a high false alarm rate. A false alarm classifies a normal connection as an attack and therefore obstructs legitimate users' access to network resources.
These problems are due to the sophistication of the attacks and their intended similarity to normal behavior. More intelligence is brought into the IDS by means of machine learning. Theoretically, it is possible for a machine learning algorithm to achieve the best performance by maximizing detection accuracy.
However, this requires infinite training sample sizes. This motivates work towards enhancing detection precision and stability.
Early researchers focused on expert systems and statistical approaches, but the results become worse when large datasets are encountered.
1.9 Objectives
Network security has become key for many financial and business web applications. Intrusion detection is one of the approaches to resolving the problem of network security. The imperfection of intrusion detection systems (IDS) gives data mining an opportunity to make several important contributions to the field of intrusion detection. In recent years, data mining techniques have been used for building IDSs. Here we propose an approach that uses data mining techniques such as neuro-fuzzy models and a radial basis support vector machine (SVM) to help an IDS attain higher detection rates.
The proposed technique has four major steps. First, K-means clustering is used to generate different training subsets. Second, different neuro-fuzzy models are trained on the obtained training subsets. Third, a vector for SVM classification is formed. Finally, classification using the radial SVM is performed to detect whether or not an intrusion has taken place. The experiments use the KDD CUP 1999 dataset.