Data Mining Techniques Used in Fraud Detection - Explained

Data mining is the process of finding previously unknown, insightful patterns in data that can be used to make decisions. A data mining problem must have a well-defined business objective, and it should not be easily solvable by running queries on a structured table or by using reporting tools. The Oxford English Dictionary [55, p. 562] defines fraud as “wrongful or criminal deception intended to result in financial or personal gain.” Phua et al. [58] describe fraud as leading to the abuse of a profit organization's system without necessarily leading to direct legal consequences. Wang et al. [78, p. 1120] define it as “a deliberate act that is contrary to law, rule, or policy with intent to obtain unauthorized financial benefit.”

Fraud detection is essential for identifying fraud within institutions and preventing its severe consequences. It works by distinguishing fraudulent data from legitimate transactions or data. It is often challenging to establish the authenticity of an application or transaction, so the best available option is to build evidence of fraudulent activity from existing data using mathematical or statistical algorithms.

DATA AND MEASUREMENTS:

Little data is publicly available for studying fraud detection. It is extremely difficult to obtain real data from organizations for legal and competitive reasons. To work around this problem, synthetic data is created as a substitute for the actual data. Barse et al. (2003) propose a five-step synthetic data generation process and support the creation of synthetic data.

An alternative to the above approach is to mine email data for spam. Spam detection closely parallels fraud detection, and the data is available in large quantities. However, email is a large source of unstructured data and requires effective text processing or feature selection.

PERFORMANCE MEASURES:

Fraud detection departments place a monetary value on the results in order to increase the organization's profit. They can define either explicit cost models (Phua et al., 2004; Chan et al., 1999; Fawcett and Provost, 1997) or benefit models (Fan et al., 2004; Wang et al., 2003).

Fraud detection studies that rely mainly on supervised algorithms have moved away from measurements such as true positive rate and accuracy at a given threshold. In fraud detection systems, misclassification costs can differ between examples and can also change over time. A false negative error (a fraud that goes undetected) is usually costlier than a false positive error (a legitimate transaction flagged as fraud).
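
The point about unequal misclassification costs can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the flat investigation cost of 10 per false alarm, the use of the transaction amount as the cost of a missed fraud, and the five transactions themselves. It is not the cost model of any of the cited studies.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)


def total_cost(y_true, y_pred, fn_costs, fp_cost):
    """Sum misclassification costs; label 1 = fraud, 0 = legitimate.

    fn_costs holds one false negative cost per example, so the cost of
    missing a fraud can differ between examples.
    """
    cost = 0.0
    for actual, predicted, fn_cost in zip(y_true, y_pred, fn_costs):
        if actual == 1 and predicted == 0:
            cost += fn_cost   # missed fraud: assume the amount is lost
        elif actual == 0 and predicted == 1:
            cost += fp_cost   # false alarm: assume a fixed investigation cost
    return cost


# Hypothetical transactions (amounts double as false negative costs).
amounts = [120.0, 35.0, 900.0, 60.0, 15.0]
y_true = [0, 0, 1, 0, 1]
model_a = [0, 0, 0, 0, 0]   # never flags anything
model_b = [1, 0, 1, 1, 1]   # catches both frauds, raises two false alarms

for name, preds in (("A", model_a), ("B", model_b)):
    print(name, accuracy(y_true, preds), total_cost(y_true, preds, amounts, fp_cost=10.0))
```

Both hypothetical models have the same accuracy, yet their costs differ widely, which is why a cost or benefit model is more informative than accuracy in this setting.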

Other important performance measures include the speed at which fraud is detected, the range of fraud types that can be detected, and whether detection runs in real time or in batch mode.

METHODOLOGICAL FRAMEWORK FOR RESEARCH:

The methodological framework for research in fraud detection can be divided into three phases:

1. Research definition: In this phase the expected research area, research scope and research goals are determined.

2. Research methodology: In this phase, the criteria for searching, selecting and creating a framework to classify the data are established.

3. Research analysis: In this phase, some conclusions are drawn based on the results and future prospects of the research are identified.

Based on a review of the fraud detection literature, methods and techniques for fraud detection are designed and evaluated.

EXISTING METHODS AND TECHNIQUES:

Many existing fraud detection systems operate by matching new instances that resemble fraud against existing records of fraudulent activity. A notable idea borrowed from spam detection (Fawcett, 2003, p. 144, Figure 5) is to capture the sequential nature of fraud by tracking the frequency and category of activities. The data available for fraud detection has several complicating properties:

1. The volumes of fraudulent and legitimate activity fluctuate independently of each other. Thus, the distributions of the classes may vary over time.

2. Multiple styles of fraud may occur at the same time, and fraud can be seasonal, one-time, or regular.

3. The definition of the legitimate transactions may change over time.
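
The first property in the list above, a class distribution that shifts over time, can be monitored with a very small sketch. The function name `fraud_rate_by_window`, the integer timestamps, and the window size of 10 are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def fraud_rate_by_window(events, window_size):
    """Return {window_index: fraud_rate} for (timestamp, is_fraud) events.

    Timestamps are assumed to be non-negative integers; each event falls
    into the window timestamp // window_size.
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for timestamp, is_fraud in events:
        window = timestamp // window_size
        totals[window] += 1
        frauds[window] += int(is_fraud)
    return {w: frauds[w] / totals[w] for w in sorted(totals)}


# Hypothetical labeled events: (timestamp, 1 if fraud else 0).
events = [(1, 0), (3, 0), (5, 1), (8, 0), (12, 1), (14, 1), (17, 0), (19, 0)]
rates = fraud_rate_by_window(events, window_size=10)
print(rates)
```

A fraud rate that changes noticeably from one window to the next is a sign that a model trained on older data may need to be re-evaluated or retrained.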

CLASSIFICATION OF FRAUD:

Financial fraud is classified broadly into five major categories:

1. Bank fraud:

According to Cornell University Law School (CULS) [7], bank fraud is defined as “whoever knowingly executes, or attempts to execute, a scheme or artifice (1) to defraud a financial institution; or (2) to obtain any of the moneys, funds, credits, assets, securities, or other property owned by, or under the custody or control of, a financial institution, by means of false or fraudulent pretenses, representations, or promises.” Bank fraud includes credit card fraud, mortgage fraud, and money laundering. Credit card fraud is identified by unusual transactions and misuse of the card. Money laundering allows criminals to bring illegally obtained money into the stream of commerce. Mortgage fraud is a material misstatement or misrepresentation made in connection with a mortgage loan.

2. Insurance fraud:

Insurance fraud can occur at various steps of the insurance process, including application, eligibility, rating, billing, and claims. It can be committed by a variety of people involved in the business. Insurance fraud includes healthcare fraud, automobile insurance fraud, and other kinds of fraud.

3. Securities and commodities fraud:

According to another definition by CULS [7], securities fraud includes theft from manipulation of the market, theft from securities accounts, and wire fraud.

4. Computer Intrusion:

Intrusion is defined as the potential possibility of a deliberate unauthorized attempt to access information, manipulate information, or render a system unreliable or unusable [8]. An intrusion can be carried out by a person inside or outside the organization.

5. Telecommunication fraud:

For a network carrier, fraud is expensive in terms of both wasted capacity and lost income. Telecommunication fraud can be broadly classified into two types: subscription fraud and superimposed fraud. Subscription fraud occurs when a service subscription is obtained under a false identity with no intention of paying. Superimposed fraud occurs when a service is used without the necessary authority. Mobile phone cloning, ghosting, and tumbling are some types of superimposed fraud.

DATA MINING TECHNIQUES USED IN FRAUD DETECTION:

The following data mining techniques can be used in fraud detection. A brief description of each is given below.

1. Clustering:

Clustering uses automated techniques to form meaningful groups of objects with similar characteristics. Cluster analysis breaks the data down into related components in such a way that patterns become observable. Bolton & Hand (2002) suggest two clustering techniques for behavioral fraud detection. Peer Group Analysis (PGA) identifies objects that, at a given moment, behave differently from objects they previously resembled. Fraud Analysis (FA) then investigates the cases identified by PGA. Clustering is an unsupervised technique: unlike classification, in which objects are assigned to predefined classes, the groups are discovered from the data itself.
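
The idea behind peer group analysis can be sketched as follows. This is a heavily simplified illustration, not Bolton & Hand's actual procedure: the function name `peer_group_score`, the use of Euclidean distance to choose peers, the choice of k = 3, and the account data are all assumptions.

```python
import math
from statistics import mean, stdev


def peer_group_score(history, current, target, k):
    """Score how far `target` has drifted from its k most similar peers.

    history: {account: [past spending values]} used to pick the peers
    current: {account: current spending value} used to measure the drift
    Requires k >= 2 so that a standard deviation can be computed.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    others = [acc for acc in history if acc != target]
    # Peers are the accounts whose past behavior was closest to the target's.
    peers = sorted(others, key=lambda acc: distance(history[target], history[acc]))[:k]
    peer_values = [current[p] for p in peers]
    spread = stdev(peer_values) or 1.0   # avoid division by zero
    # Standardized distance of the target from its peer group today.
    return abs(current[target] - mean(peer_values)) / spread


# Hypothetical weekly spending: A, B, C and D behaved alike in the past.
history = {
    "A": [100, 110, 105],
    "B": [102, 108, 107],
    "C": [98, 112, 103],
    "D": [101, 109, 104],
    "E": [500, 480, 510],
}
current = {"A": 400, "B": 106, "C": 101, "D": 104, "E": 495}

score_a = peer_group_score(history, current, "A", k=3)
score_b = peer_group_score(history, current, "B", k=3)
print(round(score_a, 1), round(score_b, 1))
```

Account A receives a much larger score than B because its current spending has broken away from the accounts it used to resemble. In practice a threshold on this score would decide which accounts are passed on for investigation.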

2. Neural Networks:

A neural network is a set of interconnected nodes designed to imitate the functioning of the human brain. Each node has a weighted connection to several neighboring nodes, and the network as a whole represents complex mathematical equations, algorithms, and parameters. Neural networks are applied to a broad range of supervised and unsupervised learning applications. They can form models that do not require extensive training, and they can adapt their behavior to a new environment by generalizing from the present one. In this approach, a multi-layer neural network is trained with the backpropagation algorithm. Backpropagation iteratively processes a data set of training tuples, comparing the network's output for each tuple with the known target value. For each tuple, the weights are modified to reduce the error between the output and the target. These modifications are made in the backward direction, that is, from the output layer back through each hidden layer to the first. This algorithm is most appropriate when the results of the model are of high importance.
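
The backward weight updates described above can be made concrete with a minimal sketch of a network with one hidden layer. The class name `TinyMLP`, the sigmoid activation, the squared-error loss, the learning rate, and the four training tuples are all assumptions chosen for illustration; a real fraud model would use an established library and far more data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer and one sigmoid output, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row holds the input weights plus a trailing bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, output

    def train_step(self, x, target, lr):
        hidden, output = self.forward(x)
        # Error term at the output layer (squared error through the sigmoid).
        delta_out = (output - target) * output * (1.0 - output)
        # Propagate the error backward to the hidden layer before any update.
        delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(len(hidden))]
        for j, value in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * value
        for j, row in enumerate(self.w_hidden):
            for i, value in enumerate(x + [1.0]):
                row[i] -= lr * delta_hidden[j] * value

    def loss(self, data):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Hypothetical scaled features per tuple; target 1 = fraud, 0 = legitimate.
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]
net = TinyMLP(n_in=2, n_hidden=3)
loss_before = net.loss(data)
for _ in range(2000):
    for x, target in data:
        net.train_step(x, target, lr=0.5)
loss_after = net.loss(data)
print(loss_before, loss_after)
```

Each call to `train_step` compares the output for one tuple with its known target and then adjusts the output weights and the hidden weights using error terms computed from the output layer backward.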

3. Bayesian Belief Network:

Bayesian belief networks provide a graphical model of relationships from which class membership probabilities are predicted (Han et al., 2000). A Bayesian belief network uses a directed acyclic graph (DAG) to represent a set of random variables and their conditional independencies: the nodes represent the random variables, and the edges represent the dependencies between them. Naïve Bayesian classification assumes that the attributes of an instance are independent, given the target attribute (Feelders et al., 2003). This model is often adopted in corporate fraud detection, credit card fraud detection, and automobile insurance fraud detection.
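
The naïve Bayesian independence assumption can be illustrated with a minimal sketch for categorical attributes. The class name `NaiveBayes`, the use of Laplace (add-one) smoothing, and the two hypothetical attributes (channel and amount band) are assumptions made for this example and are not drawn from the cited works.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[(attribute_index, label)][value] -> count
        self.value_counts = defaultdict(Counter)
        # values[attribute_index] -> set of values seen for that attribute
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def log_posterior(self, row, label):
        """Unnormalized log P(label | row) under the independence assumption."""
        score = math.log(self.class_counts[label] / self.total)
        for i, value in enumerate(row):
            count = self.value_counts[(i, label)][value]
            denom = self.class_counts[label] + len(self.values[i])
            score += math.log((count + 1) / denom)
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda label: self.log_posterior(row, label))


# Hypothetical training data: (channel, amount band) per transaction.
rows = [("online", "high"), ("online", "high"), ("online", "low"), ("store", "low"),
        ("store", "low"), ("store", "high"), ("store", "low"), ("online", "low")]
labels = ["fraud", "fraud", "legit", "legit", "legit", "legit", "legit", "legit"]

model = NaiveBayes().fit(rows, labels)
print(model.predict(("online", "high")), model.predict(("store", "low")))
```

Because the attributes are treated as independent given the class, the posterior is just the class prior multiplied by one conditional probability per attribute, which is what `log_posterior` adds up in log space.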

4. Decision Trees:

Decision trees are machine learning techniques that express the relationship between independent attributes and a dependent attribute in a tree-shaped structure representing a set of decisions (Witten et al., 1999). The leaves of the tree represent predictions, and the branches represent the combinations of feature values that lead to them. Decision trees are used in credit card, automobile insurance, and corporate fraud detection.
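
A small sketch shows how such a tree can be grown and used. The Gini impurity criterion, the depth limit, the function names, and the two hypothetical features (amount and number of transactions in the last hour) are assumptions for illustration; the source does not specify a particular tree algorithm.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[f] <= threshold]
            right = [l for row, l in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    f, threshold = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= threshold]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical data: (amount, transactions in the last hour).
rows = [(20, 1), (35, 2), (50, 1), (900, 1), (40, 8), (1200, 9), (25, 7), (60, 2)]
labels = ["legit", "legit", "legit", "fraud", "fraud", "fraud", "fraud", "legit"]

tree = build_tree(rows, labels)
print(predict(tree, (30, 1)), predict(tree, (1000, 1)), predict(tree, (30, 10)))
```

Each internal node tests one feature against a threshold, and each leaf holds the predicted class, so a prediction can be read off as a short chain of decisions.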

CONCLUSION:

This paper begins with an overview of the concepts of data mining and fraud detection, followed by a discussion of data and its corresponding measurements, performance measures, the framework for research, existing fraud detection techniques, classification of the types of fraud and the different data mining techniques that can be adopted to detect fraud.
