1 | P a g e
1. What is Predictive Analytics?
Consider the power of predictive analytics:
• A Canadian bank uses predictive analytics to increase campaign response rates by 600%, cut
customer acquisition costs in half, and boost campaign ROI by 100%.
• A large state university predicts whether a student will choose to enroll by applying predictive
models to applicant data and admissions history.
• A research group at a leading hospital combined predictive and text analytics to improve its ability
to classify and treat pediatric brain tumors.
• An airline increased revenue and customer satisfaction by better estimating the number of
passengers who won’t show up for a flight. This reduces the number of overbooked flights that
require re-accommodating passengers as well as the number of empty seats.
As these examples attest, predictive analytics can yield a substantial ROI. Predictive analytics can help
companies optimize existing processes, better understand customer behavior, identify unexpected
opportunities, and anticipate problems before they happen.
1.1 High Value, Low Penetration. With such stellar credentials, the perplexing thing about predictive
analytics is why so many organizations have yet to employ it. According to our research, only 21% of
organizations have "fully" or "partially" implemented predictive analytics, while 19% have a project
"under development" and a whopping 61% are still "exploring" the issue or have "no plans." Predictive
analytics is also an arcane set of techniques and technologies that bewilders many business and IT
managers.
1.2 Applications. Predictive analytics can identify the customers most likely to churn next month or to
respond to next week’s direct mail piece. It can also anticipate when factory floor machines are likely to
break down or figure out which customers are likely to default on a bank loan. Today, marketing is the
biggest user of predictive analytics, with cross-selling, campaign management, customer acquisition, and
budgeting and forecasting models at the top of the list, followed by attrition and loyalty applications.
Fig. Among business intelligence disciplines, prediction provides the most business value but is also the most
complex. Each discipline builds on the one below it—these are additive, not exclusive, in practice
1.3 Versus BI Tools. In contrast, other BI technologies—such as query and reporting tools, online
analytical processing (OLAP), dashboards, and scorecards—examine what happened in the past. They are
deductive in nature—that is, business users must have some sense of the patterns and relationships that
exist within the data based on their personal experience. They use query, reporting, and OLAP tools to
explore the data and validate their hypotheses.
Predictive analytics works the opposite way: it is inductive. It doesn’t presume anything about the data.
Rather, predictive analytics lets data lead the way. Predictive analytics employs statistics, machine
learning, neural computing, robotics, computational mathematics, and artificial intelligence techniques to
explore all the data, instead of a narrow subset of it, to ferret out meaningful relationships and patterns.
Predictive analytics is like an "intelligent" robot that rummages through all your data until it finds
something interesting to show you.
1.4 More Than Statistics. It’s also important to note that predictive analytics is more than statistics. Some
even call it statistics on steroids. Linear and logistic regressions—classic statistical techniques—are still
the workhorse of predictive models today, and nearly all analytical modelers use descriptive statistics
(e.g., mean, mode, median, standard deviation, histograms) to understand the nature of the data they want
to analyze.
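The descriptive statistics listed above are straightforward to compute with Python's standard library. The salary figures below are invented purely for illustration:

```python
import statistics

# Hypothetical salary data used to illustrate basic descriptive statistics.
salaries = [32_000, 41_000, 41_000, 55_000, 120_000]

print(statistics.mean(salaries))    # 57800
print(statistics.median(salaries))  # 41000
print(statistics.mode(salaries))    # 41000
print(statistics.stdev(salaries))   # sample standard deviation
```

Note how the large outlier (120,000) pulls the mean well above the median; this is exactly the kind of property a modeler checks before building a model.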
However, advances in computer processing power and database technology have made it possible to
employ a broader class of predictive techniques, such as decision trees, neural networks, genetic
algorithms, support vector machines, and other mathematical algorithms. These new techniques take
advantage of increased computing horsepower to perform complex calculations that often require multiple
passes through the data. They are designed to run against large volumes of data with lots of variables (i.e.,
fields or columns). They also are equipped to handle "noisy" data with various anomalies that may wreak
havoc on traditional models.
1.5 Terminology. Predictive analytics has been around for a long time but has been known by other
names. For much of the past 10 years, most people in commercial industry have used the term "data
mining" to describe the techniques and processes involved in creating predictive models. However, some
software companies—in particular, OLAP vendors—began co-opting the term in the late 1990s, claiming
their tools allow users to "mine" nuggets of valuable information within dimensional databases. To stay
above the fray, academics and researchers have used the term "knowledge discovery."
1.6 Training Models. Supervised learning is the process of creating predictive models using a set of
historical data that contains the results you are trying to predict. For example, if you want to predict which
customers are likely to respond to a new direct mail campaign, you use the results of past campaigns to
"train" a model to identify the characteristics of individuals who responded to that campaign. Supervised
learning approaches include classification, regression, and time-series analysis. Classification techniques
identify which group a new record belongs to (i.e., customer or event) based on its inherent characteristics.
1.7 Unsupervised Learning. In contrast, unsupervised learning does not use previously known results to
train its models. Rather, it uses descriptive statistics to examine the natural patterns and relationships that
occur within the data and does not predict a target value. For example, unsupervised learning techniques
can identify clusters or groups of similar records within a database (i.e., clustering) or relationships among
values in a database (i.e., association). Market basket analysis is a well-known example of an association
technique, while customer segmentation is an example of a clustering technique.
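As a toy sketch of an association technique, the following Python snippet counts how often pairs of items appear together in purchase baskets, the core of market basket analysis. The basket data is invented for the example:

```python
from itertools import combinations
from collections import Counter

def pair_support(baskets):
    """Count how often each pair of items appears together in a basket."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order so counts aggregate correctly
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical transaction data
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "chips"],
]
counts = pair_support(baskets)
print(counts[("beer", "diapers")])  # 2 — the pair co-occurs in two baskets
```

Real association mining (e.g., the Apriori algorithm) also computes confidence and prunes infrequent itemsets, but the co-occurrence count above is the underlying idea.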
2. The Business Value of Predictive Analytics
2.1 Incremental Improvement. Although organizations occasionally make multi-million dollar
discoveries using predictive analytics, these cases are the exception rather than the rule. Organizations that
approach predictive analytics with a "strike-it-rich" mentality will likely become frustrated and give up
before reaping any rewards. The reality is that predictive analytics provides incremental improvement to
existing business processes, not million-dollar discoveries.
"We achieve success in little percentages," says a technical lead for a predictive analytics team in a major
telecommunications firm. She convinced her company several years ago to begin building predictive
models to identify customers who might cancel their wireless phone service. "Our models have
contributed to lowering our churn rate, giving us a competitive advantage."
3. The Process of Predictive Modeling
3.1 Methodologies. Although most experts agree that predictive analytics requires great skill—and some
go so far as to suggest that there is an artistic and highly creative side to creating models—most would
never venture forth without a clear methodology to guide their work, whether explicit or implicit. In fact,
process is so important in the predictive analytics community that in 1996 several industry players created
an industry standard methodology called the Cross Industry Standard Process for Data Mining (CRISP-DM).
3.2 CRISP-DM. Although only 15% of our survey respondents follow CRISP-DM, it embodies a
common-sense approach that is mirrored in other methodologies. "Many people, including myself, adhere
to CRISP-DM without knowing it," says Tom Breur, principal of XLNT Consulting in the Netherlands.
4. Most Processes for Creating Predictive Models Incorporate the Following Steps:
4.1. Defining the Project
Although practitioners don’t spend much time defining business objectives, most agree that this phase
is most critical to success. The purpose of defining project objectives is to discourage analytical
fishing excursions where someone says, "Let's run this data through some predictive algorithms to
see what we get." These projects are doomed to fail.
Collaboration with the Business. Defining a project requires close interaction between the business
and analytic modeler. To create a predictive model, this analyst meets with all relevant groups in the
marketing department who will use or benefit from the model, such as campaign managers and direct
mail specialists, to nail down objectives, timeframes, campaign schedules, customer lists, costs,
processing schedules, how the model will be used, and expected returns.
4.2. Exploring the Data
The data exploration phase is straightforward. Modelers need to find good, clean sources of data since
models are only as good as the data used to create them. Good sources of data have a sufficient number
of records, history, and fields (i.e., variables) so there is a good chance there are patterns and
relationships in the data that have significant business value.
On average, groups pull data from 7.8 data sources to create predictive models. ("High value"
predictive projects pull from 8.6 data sources on average.) However, a quarter of groups (24%) use just
two sources, and 40% use fewer than five sources. Most organizations use a variety of different data
types from which to build analytical models, most prominently transactions (86%), demographics
(69%), and summarized data (68%).
Fig. Based on 149 respondents that have implemented predictive analytics.
4.3. Preparing the Data
Cleaning and Transforming. Once analysts select and examine data, they need to transform it into a
different format so it can be read by an analytical tool. Most analysts dread the data preparation phase,
but understand how critical it is to their success. Preparing data means first cleaning the data of any
errors and then "flattening" it into a single table with dozens, if not hundreds, of columns. During this
process, analysts often reconstitute fields, such as changing a salary field from a continuous variable
(i.e., a numeric field with unlimited values) to a range field (i.e., a field divided into a fixed number of
ranges, such as $0–$20,000, $20,001–$40,000, and so forth), a process known as "binning." From
there, analysts usually perform additional transformations to optimize the data for specific types of
algorithms.
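The binning transformation described above can be sketched in a few lines of Python. This is a simplified illustration with fixed-width bins; real preparation pipelines often use irregular, data-driven bin boundaries:

```python
def bin_salary(salary, width=20_000):
    """Convert a continuous salary value into a labelled fixed-width range."""
    lower = (salary // width) * width          # floor to the nearest bin boundary
    return f"${lower:,.0f}-${lower + width:,.0f}"

print(bin_salary(35_000))  # $20,000-$40,000
print(bin_salary(5_000))   # $0-$20,000
```

After binning, a salary column with unlimited distinct values becomes a small set of categories, which many algorithms (decision trees in particular) handle more robustly.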
4.4. Building Predictive Models
Creating analytic models is both art and science. The basic process involves running one or more
algorithms against a data set with known values for the dependent variable (i.e., what you are trying
to predict). Then, you split the data set in half and use one set to create a training model and the other
set to test the training model.
If you want to predict which customers will churn, you point your algorithm to a database of
customers who have churned in the past 12 months to "train" the model. Then, run the resulting
training model against the other part of the database to see how well it predicts which customers
actually churned. Last, you need to validate the model in real life by testing it against live data.
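The split into a training half and a test half can be sketched as follows. The record layout and the churn rule used to label the toy data are hypothetical, chosen only to make the example self-contained:

```python
import random

def split_train_test(records, train_fraction=0.5, seed=42):
    """Shuffle labelled records and split them into training and test sets."""
    shuffled = records[:]                       # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)       # fixed seed for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records: (features, churned?) — short-tenure customers churn
records = [({"tenure_months": t}, t < 6) for t in range(20)]

train, test = split_train_test(records)
print(len(train), len(test))  # 10 10
```

The model is then fit on `train`, scored on `test`, and finally validated against live data, as described above.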
Iterative Process. As you can imagine, the process of training, testing, and validation is iterative.
This is where the ―art‖ of analytic modeling comes to the forefront. Most analysts identify and test
many combinations of variables to see which have the most impact. Most start the process by using
statistical and OLAP tools to identify significant trends in the data as well as previous analytical work
done internally or by expert consultants. They also may interview business users close to the subject
and rely on their own knowledge of the business to home in on the most important variables to
include in the model.
Selecting Variables. Most analysts can create a good analytic model from scratch in about three
weeks, depending on the scope of the problem and the availability and quality of data. Most start with
a few hundred variables and end up with 20 to 30. This agrees with our survey results showing that a
majority of groups (52%) create new models within "weeks" and another third (34%) create new
models in one to three months. Once a model is created, about half the groups (49%) need only
"hours" or "days" to revise an existing model for use in another application, while another 30% need
"weeks" to revise a model. In addition, about half (47%) of models have a lifespan shorter than a
year, and 16% exist for less than three months.
Fig. How Long Does It Take to Create a New Model from Scratch?
Fig. How Many Variables Do You Use in Your Models?
4.5. Deploying Analytical Models
Focus on Business Outcomes. A predictive model can be accurate but have no value. Predictive
models can fail if either (1) business users ignore their results or (2) their predictions fail to
produce a positive outcome for the business. The classic story about a grocery that discovered a
strong correlation between sales of beer and diapers illustrates the latter situation. Simply identifying
a relationship between beer and diaper sales doesn’t produce a valuable outcome. Business users must
know what to do with the results, and their decision may or may not be favorable to the business.
Fig. What Does Your Group Do with the Models It Creates?
4.6. Managing Models
The last step in the predictive analytics process is to manage predictive models. Model management
helps improve performance, control access, promote reuse, and minimize overhead. Currently, few
organizations are concerned about model management. Most analytical teams are small and projects
are handled by individual modelers, so there is little need for check in/check out and version control.
"We don't have a sophisticated way of keeping track of our models, although our analytical tools
support model management," says one practitioner. She says her four-person team, which generates
about 30 models monthly, maintains analytical models in folders on the server.
Fig. Which Best Describes Your Group’s Approach to Model Management?
5. Advances in Predictive Analytics Software
Analytical software has taken much of the labor, time, and guesswork out of creating sophisticated
analytical models.
5.1. Integrated Analytic Workbenches. Leading vendors of analytical software have introduced in the
past several years robust analytic workbenches that pre-integrate a number of functions and tasks
that analytic modelers previously completed by hand or with different tools. Today, modelers can
purchase a single analytic development environment that supports all six steps in the analytic
development process.
5.2. Graphical Modeling. One major advancement offered by these workbenches is their ability to
graphically model the flow of information and tasks required to create and score analytic models. In
the past, modelers had to hand-code these steps into SQL or a scripting program. "I can't develop
models without the types of analytic tools available today since I don't have programming skills," says
TN Marketing's Siegel. "Today, I can create one hundred little steps in a graphical workflow,
configure each step, and then hit a button to make the program run. The tool builds the programming
logic behind the scenes so I don't have to."
5.3. Automated Testing. Analytic workbenches have also improved developer productivity by
automatically running multiple models and algorithms against a data set and measuring the impacts to
see which provides the best performance. Previously, developers had to spend time testing each type
of model and algorithm separately, effectively limiting the options they could test.
5.4. Client/Server. Today’s analytic workbenches run in a client/server configuration rather than only on
a desktop. A client/server architecture consolidates queries onto the server, reducing what analysts
must download to their desktops to explore data and create analytic models. This reduces network
traffic and redundant queries, which can bog down system performance.
5.5. Text Analytics. Predictive text analytics enables organizations to explore the "unstructured"
information in text in much the same way that predictive analytics explores tabular or "structured"
data. Through text analytics, organizations can uncover hidden patterns, relationships, and trends in
text. As a result, companies gain greater insight from articles, reports, surveys, call center notes, email,
chat sessions, and other types of text documents. Predictive text analytics also allows
organizations to combine structured and unstructured information in the same models or retrieve
documents related to specific KPIs.
5.6. Analytic Data Marts. Along with the client/server workbench, most organizations implement an
analytical data mart to house much of the data that analysts want to analyze. Most organizations
refresh these analytical data marts on a monthly basis so modelers can rerun models on new data.
Having a dedicated environment for predictive modelers further offloads query processing from a
central data warehouse and operational systems, and improves performance across the systems.
6. Machine Learning Methods for Mail Spam Classifier
6.1. Naïve Bayes classifier method
The Naïve Bayes classifier was proposed for spam recognition in 1998. A Bayesian classifier
works on dependent events: the probability of an event occurring in the future can be estimated
from previous occurrences of the same event. Every word has a certain probability of occurring
in spam or ham email, stored in the filter's database. If the combined word probabilities exceed a
certain limit, the filter marks the e-mail as belonging to one category or the other. Here, only two
categories are necessary: spam or ham. Almost all statistics-based spam filters use a Bayesian
probability calculation to combine individual token statistics into an overall score.
The message is considered spam if the overall spamminess product S[M] is larger than the
hamminess product H[M]. The above description is used in the following algorithm:
Stage 1. Training:
    Parse each email into its constituent tokens.
    Generate a probability for each token W:
        S[W] = Cspam(W) / (Cham(W) + Cspam(W))
    Store the spamminess values in a database.
Stage 2. Filtering:
    For each message M:
        While M is not at its end, scan the message for the next token Ti,
        query the database for its spamminess S(Ti), and accumulate the
        message probabilities S[M] and H[M].
    Calculate the overall message filtering indication:
        I[M] = f(S[M], H[M])
    where f is a filter-dependent function, such as:
        I[M] = (1 + S[M] - H[M]) / 2
    If I[M] > threshold, the message is marked as spam;
    otherwise it is marked as non-spam.
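The two stages above can be sketched in Python. This is a minimal illustration rather than a production filter: the whitespace tokenizer, the neutral score of 0.5 for unseen tokens, the toy training messages, and the 0.5 threshold are all assumptions made for the example:

```python
from collections import Counter

def train(spam_msgs, ham_msgs):
    """Stage 1: per-token spamminess S[W] = Cspam(W) / (Cham(W) + Cspam(W))."""
    cspam, cham = Counter(), Counter()
    for m in spam_msgs:
        cspam.update(m.lower().split())
    for m in ham_msgs:
        cham.update(m.lower().split())
    return {w: cspam[w] / (cspam[w] + cham[w])
            for w in set(cspam) | set(cham)}

def classify(message, spamminess, threshold=0.5):
    """Stage 2: accumulate S[M] and H[M] as products of per-token scores,
    then compute the indication I[M] = (1 + S[M] - H[M]) / 2."""
    s_m = h_m = 1.0
    for token in message.lower().split():
        s_w = spamminess.get(token, 0.5)  # unseen tokens are neutral
        s_m *= s_w
        h_m *= 1.0 - s_w
    i_m = (1.0 + s_m - h_m) / 2.0
    return "spam" if i_m > threshold else "ham"

model = train(["win cash now", "cash prize now"],
              ["meeting at noon", "lunch at noon"])
print(classify("win cash prize", model))  # spam
```

Practical Bayesian filters additionally smooth the token probabilities and work in log space to avoid numerical underflow on long messages.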
6.2. K-nearest neighbour classifier method
The k-nearest neighbour (k-NN) classifier is an example-based classifier: the training documents
themselves are used for comparison, rather than an explicit category representation such as the
category profiles used by other classifiers. As such, there is no real training phase. Additionally,
finding the nearest neighbours can be sped up using traditional indexing methods. To decide whether
a message is spam or ham, we look at the classes of the messages that are closest to it. The
comparison between the vectors is a real-time process. This is the idea of the k-nearest neighbour
algorithm:
Stage 1. Training:
Store the training messages.
Stage 2. Filtering:
Given a message x, determine its k nearest neighbours among the messages in the
training set. If there are more spams among these neighbours, classify the given
message as spam; otherwise classify it as ham. An indexing method is used here to reduce
the comparison time, which leads to updating the sample with a complexity of O(m), where m is
the sample size. Because all of the training examples are stored in memory, this technique is also
referred to as a memory-based classifier.
Another problem with the presented algorithm is that there seems to be no parameter we
could tune to reduce the number of false positives. This problem is easily solved by changing the
classification rule to the following l/k-rule:
If l or more messages among the k nearest neighbours of x are spam, classify x as spam,
otherwise classify it as legitimate mail.
The k-nearest neighbour rule has found wide use in general classification tasks. It is also one of
the few universally consistent classification rules.
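The l/k-rule can be sketched in Python. The squared Euclidean distance, the toy two-feature message vectors, and the chosen parameter values are assumptions made for illustration:

```python
def knn_classify(x, training, k=5, l=3):
    """l/k-rule: classify x as spam only if at least l of its k nearest
    neighbours are spam; raising l reduces false positives."""
    def distance(a, b):
        # squared Euclidean distance between feature vectors
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    neighbours = sorted(training, key=lambda item: distance(item[0], x))[:k]
    spam_votes = sum(1 for _, label in neighbours if label == "spam")
    return "spam" if spam_votes >= l else "ham"

# Hypothetical training messages as (feature_vector, label) pairs
training = [((0.9, 0.8), "spam"), ((0.8, 0.9), "spam"), ((0.7, 0.9), "spam"),
            ((0.1, 0.2), "ham"), ((0.2, 0.1), "ham")]

print(knn_classify((0.85, 0.85), training, k=3, l=2))  # spam
```

With l = k the filter only flags a message when every neighbour is spam, the most conservative setting against false positives.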
6.3. Support Vector Machines classifier method
Support Vector Machines are based on the concept of decision planes that define decision
boundaries. A decision plane is one that separates a set of objects having different class
memberships. The SVM modeling algorithm finds an optimal hyperplane with the maximal
margin separating the two classes, which requires solving a constrained optimization
problem.
A k-fold cross validation randomly splits the training dataset into k approximately equal-sized
subsets, leaves out one subset, builds a classifier on the remaining samples, and then
evaluates classification performance on the unused subset. This process is repeated k times, once
for each subset, to obtain the cross validation performance over the whole training dataset. If
the training dataset is large, a small subset can be used for cross validation to decrease
computing costs. The following algorithm can be used in the classification process.
Input:
    sample x to classify;
    training set T = {(x1,y1),(x2,y2),…,(xn,yn)};
    number of nearest neighbours k.
Output:
    decision yp ∈ {-1,1}.
Find the k samples (xi,yi) with minimal values of K(xi,xi) - 2 * K(xi,x).
Train an SVM model on the k selected samples.
Classify x using this model to obtain the result yp.
Return yp.
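The k-fold cross-validation procedure described above can be sketched as follows. The interleaved fold assignment and the toy sample set are assumptions made for simplicity:

```python
def k_fold_splits(samples, k=5):
    """Yield (train, test) pairs for k-fold cross validation: each fold is
    held out once while a classifier is built on the remaining samples."""
    folds = [samples[i::k] for i in range(k)]  # interleaved assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test

samples = list(range(10))  # stand-ins for labelled training samples
for train, test in k_fold_splits(samples, k=5):
    print(len(train), len(test))  # 8 2 on every fold
```

Averaging a classifier's score over the k held-out folds gives the cross-validation performance over the whole training dataset.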
Table 1. Performance of three machine learning algorithms by selecting the top 100 features

Algorithm   Spam Recall (%)   Spam Precision (%)   Accuracy (%)
NB          98.46             99.66                99.76
SVM         95.00             93.12                96.90
KNN         96.92             96.02                96.83
Fig. Spam Recall, Spam Precision and Accuracy curves of three classifiers
7. Conclusion
In this report we reviewed some of the most popular machine learning methods and their
applicability to the problem of spam e-mail classification. Descriptions of the algorithms were
presented, along with a comparison of their performance on the SpamAssassin spam corpus. The
experiment shows very promising results, especially for the algorithms that are not popular in
commercial e-mail filtering packages. Spam recall has the lowest value of the three measures
across the methods, while in terms of accuracy the Naïve Bayes and rough sets methods perform
very satisfyingly compared with the other methods. More research must be done to improve the
performance of Naïve Bayes and k-NN, either through hybrid systems or by resolving the
feature-dependence issue in the Naïve Bayes classifier, or by hybridizing the Immune approach
with rough sets. Finally, hybrid systems appear to be the most efficient way to build a successful
anti-spam filter today.