The rest of the paper describes the combined hierarchical and partitioning hybrid approaches (BIRCH with CLARANS, and CURE with CLARANS) for clustering data streams and finding outliers in them. This is followed by the new enhanced hybrid approach (E-CURE with E-CLARANS) for data stream clustering and outlier detection, discussed in the next section. Two performance factors, clustering accuracy and outlier detection accuracy, are then used for the analysis. Lastly, the conclusion for the proposed techniques is given.
Clustering-based outlier detection in data streams is one of the challenging tasks in data mining. Data mining, in general, deals with the discovery of non-trivial, hidden and interesting knowledge from different types of data. With the development of information technologies, the number of databases, as well as their dimension and complexity, grows rapidly. It is therefore necessary to automate the analysis of large amounts of information, and data stream techniques are used to handle such streams of data. (Diksha Upadhyay, Susheel Jain, Anurag Jain, 2010) discussed data stream mining: because streamed data can be high dimensional, various dimension reduction techniques are applied to it prior to clustering. They also provided a comparative analysis of various stream mining procedures and dimension reduction techniques. The simple and popular k-means clustering algorithm is used on data streams to detect outliers. (Hossein Moradi Koupaie, Suhaimi Ibrahim, Javad Hosseinkhani, 2013) put forth cluster-based outlier detection in data streams. Their method takes the incoming data in a window of specified size and clusters the data in the window with the K-means algorithm. Clusters that are small and far away from the other clusters are reported as online outliers and stored in memory. The outliers of the previous n windows are also added to the current window before the small, distant clusters are identified as outliers.
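The window-based K-means idea described above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function names, the farthest-point initialisation, and the parameters `min_size` and `min_gap` (what counts as a "small" and "far away" cluster) are assumptions made for illustration, and the carry-over of outliers from the previous n windows is left out.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members (keep it if empty).
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

def window_outliers(window, k=3, min_size=2, min_gap=5.0):
    """Report members of clusters that are both small and far from all other clusters.

    min_size and min_gap are hypothetical, user-chosen thresholds.
    """
    centers, clusters = kmeans(window, k)
    outliers = []
    for i, members in enumerate(clusters):
        if not members:
            continue
        # Distance from this cluster's center to the closest other center.
        gap = min(math.dist(centers[i], c) for j, c in enumerate(centers) if j != i)
        if len(members) <= min_size and gap >= min_gap:
            outliers.extend(members)
    return outliers
```

For a window holding two dense groups of four points and one isolated point, only the isolated point ends up in a small, distant cluster and is reported.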
Hierarchical clustering algorithms are used to group the data in data streams. (Shifei Ding, Fulin Wu, Jun Qian, Hongjie Ji, 2013) reviewed typical data stream clustering algorithms proposed in recent years, such as the BIRCH algorithm, the Local Search algorithm, the Stream algorithm and the CluStream algorithm. The authors summarized the latest research achievements in this field and introduced some new strategies to deal with outliers and noisy data.
(Luis Torgo, Carlos Soares, 2010) put forth a methodology for applying hierarchical clustering methods to the task of outlier detection. The methodology was tested on a problem of official statistics data, where the objective is to detect erroneous foreign trade transactions in data collected by the statistics institute (INE). The method is based on the output of agglomerative hierarchical clustering methods. The authors compared their method with the outlier ranking method LOF; it achieved better results on this particular application, and the experimental results are also competitive with previous results on the same data. Finally, the experimental results raise important questions about the method currently followed at INE for items with a small number of transactions.
(Prodip Hore, Lawrence O. Hall, and Dmitry B. Goldgof, 2008) suggested clustering streaming data using soft clustering algorithms. The algorithms must enable cluster centroids to be extracted with weights based on the number of examples partially assigned to them. The authors describe the required modifications for three types of fuzzy clustering algorithms, namely FCM, GK and PCM. The approach assumes knowledge of the number of cluster centers and that most chunks of data are reasonably representative of the class mixture. It is clearly possible to create scenarios where the data is ordered in a strange way and the algorithm will produce unpredictable results. The approach can also be applied to other types of clustering algorithms that result in centroids.
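As a rough illustration of centroids carrying weights based on partially assigned examples, the sketch below computes standard FCM memberships for one chunk and sums them per cluster. It assumes that a centroid's weight is the (optionally point-weighted) sum of the memberships assigned to it; the exact weighting used in the cited work may differ, and the function names are hypothetical.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership u[i][j] of point j in cluster i (fuzzifier m > 1)."""
    u = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        d = [math.dist(x, c) for c in centers]
        if 0.0 in d:
            # The point coincides with a center: full membership in that cluster.
            u[d.index(0.0)][j] = 1.0
            continue
        for i in range(len(centers)):
            u[i][j] = 1.0 / sum((d[i] / dk) ** (2.0 / (m - 1.0)) for dk in d)
    return u

def centroid_weights(u, point_weights=None):
    """Weight of each centroid: sum of memberships, scaled by optional point weights.

    Assumption for illustration: weighted centroids from an earlier chunk can be
    passed on as weighted points when the next chunk is clustered.
    """
    n = len(u[0])
    w = point_weights if point_weights is not None else [1.0] * n
    return [sum(row[j] * w[j] for j in range(n)) for row in u]
```

Because the memberships of each point sum to one, the centroid weights of a chunk add up to the total weight of the points in that chunk, so a weighted centroid can stand in for the examples it summarizes.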
(Sudipto Guha, et al., 2003) discussed a clustering algorithm called CURE, which is also used for detecting outliers. CURE represents each cluster by a fixed number of well-scattered points that are shrunk toward the center of the cluster. Having more than one representative point per cluster allows CURE to adjust well to the geometry of non-spherical shapes, and the shrinking helps to reduce the effects of outliers. To handle large databases, CURE employs a combination of random sampling and partitioning, and the experimental results confirm that the quality of the clusters produced by CURE is much better than that of the clusters found by existing algorithms. Moreover, the authors showed that partitioning and random sampling enable CURE not only to outperform existing algorithms but also to scale well for large databases without sacrificing the quality of the clusters.
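The representative-point step of CURE can be sketched as follows. This is a minimal illustration of the selection and shrinking for a single cluster only, not the full algorithm (no merging, sampling or partitioning); the function name and the parameter names `c` (number of representatives) and `alpha` (shrink fraction) are chosen here for illustration.

```python
import math

def representatives(cluster, c=4, alpha=0.5):
    """Pick up to c well-scattered points and shrink them toward the centroid."""
    dim = len(cluster[0])
    centroid = tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))
    # First scattered point: the one farthest from the centroid.
    scattered = [max(cluster, key=lambda p: math.dist(p, centroid))]
    # Each further point: the one farthest from the points already chosen.
    while len(scattered) < min(c, len(cluster)):
        scattered.append(
            max(cluster, key=lambda p: min(math.dist(p, s) for s in scattered))
        )
    # Shrink every scattered point toward the centroid by the fraction alpha.
    return [
        tuple(s[d] + alpha * (centroid[d] - s[d]) for d in range(dim))
        for s in scattered
    ]
```

With `alpha = 0` the representatives stay on the cluster boundary, and with `alpha = 1` they all collapse onto the centroid. Intermediate values pull extreme points inward, which is why the shrinking dampens the influence of outliers on the distance between clusters.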
(Ren, J. W; Qunhui; Zhang, Jia; Hu, Changzhen, 2009) suggested an approach for heterogeneous data streams that divides the stream into chunks. Each chunk is then clustered, and the corresponding clusters are kept as cluster references. The number of adjacent cluster references and the degree of representation are calculated to create the final outlier references, which contain the potential outliers. The experimental findings showed better scalability and higher detection accuracy. For efficient outlier detection, (S. D. Pachgade, S. S. Dhande, 2012) proposed an algorithm that first groups the data into a number of clusters. Because this reduces the size of the dataset, the computation time is reduced considerably. Using a threshold value supplied by the user, they calculated the outliers in each cluster, and they showed that the hybrid approach takes less computation time while clustering.
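The idea of finding outliers inside already formed clusters with a user-supplied threshold can be sketched as below. This is an assumption-laden illustration rather than the cited algorithm: it supposes that a point is an outlier when its distance to its own cluster centroid exceeds the threshold, and the function name is hypothetical.

```python
import math

def threshold_outliers(clusters, threshold):
    """Flag points lying farther than `threshold` from their cluster centroid."""
    outliers = []
    for members in clusters:
        if not members:
            continue
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        outliers.extend(p for p in members if math.dist(p, centroid) > threshold)
    return outliers
```

Only distances between a point and its own centroid are computed, so the cost grows with the number of points rather than with the number of point pairs.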