A Review of Anomaly Detection Techniques in Online Social Networks
Overview:
In this project, our group surveyed anomaly detection in online social networks. We chose this topic because more and more people are connecting to the internet and using social media to share, communicate, and collaborate. However, the convenience of these platforms also attracts malicious users, who use them for bullying, planning terrorist attacks, and spreading fraudulent information. To prevent these problems, it is important to find techniques that detect such activity so that it can be stopped.
The scope of this project is anomaly detection within an online social network platform. We structured our research paper in three parts:
1. Classify the existing anomalies in social networks.
2. Explore appropriate techniques for anomaly detection and classify these techniques as well.
3. Analyze the challenges of these techniques and the future work that is still needed.
Based on our research, we classified social network anomalies according to three criteria: [1] [10]
1. Nature of anomalies
a. Point anomalies
b. Contextual anomalies
c. Collective anomalies
d. Horizontal anomalies
2. Information available in network/graph structure
a. Dynamic anomalies
b. Static anomalies
c. Labeled anomalies
d. Unlabeled anomalies
3. Behavior
a. White crow anomalies (outliers)
b. Indistinguishable anomalies
After that, we surveyed various approaches to anomaly detection in social networks. Based on the type of method, we divided them into four groups:
1. Classification-based
a. SVM [2]
b. Bayesian [3], [13]
c. Neural network [4]
2. Statistical-based
a. Mixture model [5]
b. Signal processing [14]
c. PCA [15]
3. Clustering-based
a. K-means algorithm [6]
b. Fuzzy C-means and Genetic Algorithm [7]
c. Co-clustering [8]
4. Network-based [9]
Based on the input format, we divided the methods into activity-based and graph-based methods.
Based on the temporal factor, we reviewed methods for anomaly detection in static and dynamic social networks.
Overall, this project was interesting, and we also identified several challenges. The first challenge is the variety of input data: it can be images, text, audio, and so on. This makes it difficult to design an anomaly detector that covers all kinds of anomalies, which is why most methods are specific to one use case and dataset and are not universal. The second challenge is the lack of public, open-source datasets and tools. Because dataset preparation is a key part of the training step, a lack of datasets forces researchers to spend a lot of time collecting data, which is inefficient. Other challenges, such as obtaining ground-truth labels for anomalies and handling noise in the datasets, also need to be resolved.
Contributions:
Contribution 1: SVM-based Method for Social Network Anomaly Detection [2]
I was responsible for the SVM method in the classification-based section. Since I had no prior background in machine learning, I put a lot of effort into studying the basic concepts of SVM so that I could understand the more advanced versions described in other papers. SVM (Support Vector Machine) is an algorithm for constructing a classifier; in its basic form it separates two classes. The key part of the algorithm is to find the decision boundary that separates the classes with the largest margin; this decision boundary is called a hyperplane. SVM is therefore an efficient way to solve binary classification problems, but can we apply it to anomaly detection? The answer is yes, if we can collect enough anomalous data. In real life, however, the data we collect is almost always imbalanced: the positive (normal) samples usually far outnumber the negative (anomalous) ones, so the boundary we find may not be accurate. The standard SVM may therefore not be a good fit for the anomaly detection problem.
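To make the idea of a separating hyperplane concrete, here is a minimal Python sketch that classifies points with a linear decision rule and reports the margin width. The weight vector w and bias b are hypothetical values chosen only for illustration; they are not learned from any dataset used in our report.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Return +1 (positive class) or -1 (negative class)."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 - 3 = 0 (illustrative values only).
w, b = [1.0, 1.0], -3.0

if __name__ == "__main__":
    for point in ([3.0, 3.0], [0.0, 0.0]):
        print(point, predict(w, b, point))
    print("margin width:", margin_width(w))
```

Training an SVM amounts to choosing w and b so that this margin is as large as possible while the training points stay on the correct side.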
Approaches:
To address this issue, I read several related papers. One of them, One-Class SVMs for Document Classification [2], describes two ways to adapt the SVM algorithm: the Schölkopf methodology and the outlier methodology.
Approach one: Schölkopf Methodology
Figure 1: One-class SVM [2]
Schölkopf's methodology, as described in [2], introduces the one-class SVM algorithm. Instead of the usual two classes, only one class of training data is involved. To achieve this, the algorithm maps all the data into a feature space H using a suitable kernel function. In the feature space, the origin is treated as the only member of the second (negative) class [12], and the goal is to find the hyperplane that separates the data vectors from the origin with the largest margin. Finding this hyperplane requires solving the following quadratic programming problem [2]:
\[ \min_{w,\,\xi,\,\rho} \; \frac{1}{2}\|w\|^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho \]
Subject to
\[ (w \cdot \Phi(x_i)) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, l \]
Here l is the number of training points, \Phi is the feature map into H induced by the kernel, \xi_i are slack variables, \rho is the offset of the hyperplane, and \nu \in (0, 1] controls the fraction of training points that may fall on the wrong side of the hyperplane.
After solving the problem above, we get the decision function below:
\[ f(x) = \operatorname{sgn}\big((w \cdot \Phi(x)) - \rho\big) \]
If f(x_i) is positive, the corresponding data point x_i is normal and belongs to the positive class; if f(x_i) is negative, x_i is abnormal and belongs to the negative class.
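The decision rule above can be tried out on toy data. The sketch below is a minimal pure-Python illustration, not the implementation used in [2]. It assumes the standard dual form of the one-class problem (minimise 1/2 * sum_ij alpha_i alpha_j k(x_i, x_j) subject to 0 <= alpha_i <= 1/(nu*l) and sum_i alpha_i = 1), an RBF kernel, and a simple pairwise-update solver. The function names, the toy data, and the values of nu and gamma are all assumptions made for illustration.

```python
import math

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def fit_one_class(X, nu=0.2, gamma=0.5, sweeps=100):
    """Approximately solve the one-class SVM dual by pairwise updates.

    Keeps 0 <= alpha_i <= 1/(nu*n) and sum(alpha) == 1 throughout.
    Returns (alpha, rho).
    """
    n = len(X)
    C = 1.0 / (nu * n)
    K = [[rbf(a, b, gamma) for b in X] for a in X]
    alpha = [1.0 / n] * n  # feasible starting point
    g = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # g = K alpha
    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:
                    continue
                # Shift delta from alpha_j to alpha_i so the sum stays 1,
                # then clip so both stay inside the box [0, C].
                delta = (g[j] - g[i]) / eta
                lo = max(-alpha[i], alpha[j] - C)
                hi = min(C - alpha[i], alpha[j])
                delta = max(lo, min(hi, delta))
                if delta == 0.0:
                    continue
                alpha[i] += delta
                alpha[j] -= delta
                for k in range(n):
                    g[k] += delta * (K[k][i] - K[k][j])
    # rho equals g_i for support vectors strictly inside the box.
    free = [g[i] for i in range(n) if 1e-9 < alpha[i] < C - 1e-9]
    if free:
        rho = sum(free) / len(free)
    else:
        rho = max(g[i] for i in range(n) if alpha[i] > 1e-9)
    return alpha, rho

def decision(x, X, alpha, rho, gamma=0.5):
    """Score sum_i alpha_i k(x_i, x) - rho; negative means anomalous."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X)) - rho

def predict(x, X, alpha, rho, gamma=0.5):
    """Return +1 for normal data and -1 for anomalies."""
    return 1 if decision(x, X, alpha, rho, gamma) >= 0 else -1

# Toy "normal" data clustered near the origin of the input space (made up).
X = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.3, -0.2), (-0.3, -0.1),
     (0.1, -0.3), (-0.2, 0.2), (0.4, 0.3), (-0.4, 0.1), (0.0, 0.4)]
alpha, rho = fit_one_class(X)

if __name__ == "__main__":
    for point in [(0.05, 0.05), (5.0, 5.0)]:
        print(point, decision(point, X, alpha, rho), predict(point, X, alpha, rho))
```

Only normal data is needed for training, which is why this formulation suits the imbalanced setting described earlier. In practice an established library solver would be preferable to this hand-written loop.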
Approach two: Outlier Methodology
This approach [2] is similar to the previous one. The only difference is that the origin is no longer the only member of the negative class: every data point that lies close enough to the origin also belongs to the negative class. We therefore need to choose a distance threshold from the origin. There is no fixed rule for this; in general, we either try different global threshold values or set an individual threshold for each category.
Figure 2: Outlier SVM [2]
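The labeling step of the outlier methodology can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from [2]: it measures closeness to the origin with Euclidean distance (the paper may use a different measure), and the function names, category names, and threshold values are hypothetical.

```python
import math

def distance_from_origin(x):
    """Euclidean norm of a feature vector (an assumed distance measure)."""
    return math.sqrt(sum(v * v for v in x))

def label_vectors(vectors, threshold):
    """Label a vector -1 if it lies within `threshold` of the origin, else +1."""
    return [-1 if distance_from_origin(x) <= threshold else 1 for x in vectors]

def label_by_category(vectors_by_category, thresholds, default_threshold):
    """Apply an individual threshold per category, falling back to a global one."""
    return {
        category: label_vectors(vectors, thresholds.get(category, default_threshold))
        for category, vectors in vectors_by_category.items()
    }

# Global threshold: the origin and a nearby point become negatives.
global_labels = label_vectors([(0.0, 0.0), (0.1, 0.1), (3.0, 4.0)], threshold=0.5)

# Per-category thresholds (hypothetical categories and values).
per_category = label_by_category(
    {"sports": [(3.0, 4.0)], "finance": [(3.0, 4.0)]},
    thresholds={"sports": 1.0, "finance": 6.0},
    default_threshold=0.5,
)

if __name__ == "__main__":
    print(global_labels)
    print(per_category)
```

The resulting labels could then be used to train an ordinary two-class SVM, with the threshold tuned by trying several values as described above.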
Finally, comparing the performance of the two methods, Schölkopf's approach is better in most cases, but the outlier SVM works better for large categories.
Contribution 2: Identify different aspects of the anomaly detection problem
This task was assigned to me in the early phase of the project: I was responsible for identifying the different kinds of anomalies. To do this, I read several related papers, such as An Effective Technique to Identify Anomalous Accounts on Social Networks using Bloom Filter [1] and Anomaly detection in online social networks [11].
After reading these papers, I identified 10 different kinds of anomalies, which are the ones listed in the overview.
Contribution 3: Build the paper table
The paper table can be found on page 48 of the final report. It maps each paper to the corresponding techniques, so that readers can easily see which techniques are used in each paper.
Lessons learned:
I learned a lot from this project. First, this was the first time I worked in a large group, and it was challenging, since most team members came from different countries with different cultures. Luckily, we had a good team leader who managed everything sensibly, so every team member had a task to work on every week.
Second, I realized that establishing a clear research focus is the most important step at the beginning of a research project. A clear and narrow goal helps us go deeper into the details of the selected topic and structure our report.
Third, I realized that communication is a key part of group work. Through communication you can quickly get answers that might take a long time to find on your own, and you may also gain a friendship along the way.
In addition, this was the first time I worked on a project without any prior knowledge of the topic, so I put a lot of effort into studying the basic concepts. I found that reading papers is an efficient way to learn new material. Before I took this course, I thought Stack Overflow was a good place to get answers; now I realize that answers from Stack Overflow may be one-sided and fragmented. The better way is to read papers or even textbooks, since the knowledge in them is more rigorous and comprehensive.
References
[1] Kaur, S. and Kaur, P., 2017. An Effective Technique to Identify Anomalous Accounts on Social Networks using Bloom Filter. International Journal of Computer Applications, 164(11).
[2] Manevitz LM, Yousef M. One-class SVMs for document classification. J Mach Learn Res 2002;2:139–54.
[3] Chen, C.M., Guan, D.J. and Su, Q.K., 2014. Feature set identification for detecting suspicious URLs using Bayesian classification in social networks. Information Sciences, 289, pp.133-147.
[4] Moradi M, Zulkernine M. A neural network based system for intrusion detection and classification of attacks. In: Proceedings of the 2004 IEEE international conference on advances in intelligent systems-theory and applications; 2004.
[5] Zhang, B., Ma, L. and Krishnan, R., 2011. Statistical analysis and anomaly detection of sms social networks.
[6] Pires A, Santos-Pereira C. Using clustering and robust estimators to detect outliers in multivariate data. In: Proceedings of the international conference on robust statistics; 2005.
[7] Akoglu, L., McGlohon, M. and Faloutsos, C., 2009. Anomaly detection in large graphs. CMU-CS-09-173 Technical Report.
[8] Kaufman L, Rousseeuw PJ. Clustering Large Applications (Program CLARA). Find Groups Data: Introduct Cluster Anal 2008:126–63.
[9] Ahmed, M., Mahmood, A.N. and Hu, J., 2016. A survey of network anomaly detection techniques. Journal of Network and Computer Applications, 60, pp.19-31.
[10] Kaur, S. and Kaur, P., 2017. Review of different types of Anomalies and Anomaly detection techniques in Social Networks based on Graphs. International Journal of Computer Trends and Technology (IJCTT), 47(2), pp.116-121. ISSN: 2231-2803.
[11] Savage, D., Zhang, X., Yu, X., Chou, P. and Wang, Q., 2014. Anomaly detection in online social networks. Social Networks, 39, pp.62-70.
[12] Li K-L, Huang H-K, Tian S-F, Xu W. Improving one-class SVM for anomaly detection. Int Conf Mach Learn Cybernetics 2003;5:3077–81.
[13] Wang, A.H., 2010, July. Don't follow me: Spam detection in twitter. In Security and cryptography (SECRYPT), proceedings of the 2010 international conference on (pp. 1-10). IEEE.
[14] Miller, B.A., Bliss, N.T. and Wolfe, P.J., 2010, March. Toward signal processing theory for graphs and non-Euclidean data. In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on (pp. 5414-5417). IEEE.
[15] Viswanath, B., Bashir, M.A., Crovella, M., Guha, S., Gummadi, K.P., Krishnamurthy, B. and Mislove, A., 2014, August. Towards Detecting Anomalous User Behavior in Online Social Networks. In USENIX Security Symposium (pp. 223-238).
[16] Berkhin P. A survey of clustering data mining techniques. Group Multidimens Data 2006:25–71.