Business diversification is common among companies that are trying to grow in categories beyond their core business (McKinsey & Company, 2015), and it is not a new strategy for telecommunications companies. Orange, a French telecom giant, ventured into banking in 2017 (Reuters, 2017), and Virgin, initially a record shop, has ventured into telecommunications, transportation and banking, amongst others (Virgin, 2018).
Following an initial overview of the data, two options were considered before making a final recommendation. The first was to find the most valuable customers, based on monetary and demographic variables, and create a new financial product: a bank loan that would help customers who do not own a house to become homeowners. The second was programmatic advertising software. Using all the behavioural and demographic data available, customers who shared similarities would be clustered so that Purple4 could build its own software for creating lookalike audiences, giving advertisers specific audiences that matched their products and targeting strategy. The latter proposal, although interesting, was discarded because of the lack of behavioural data and of a complete understanding of the customers' data usage. Therefore, the aim of this study is to find the most profitable customers in order to offer them a bank loan intended specifically for the housing market.
Homeownership in the United Kingdom has been declining in recent years, accompanied by a fall in the proportion of young households who are owners (Meen, 2012). Willets (2010) also argued that baby-boomers have profited to the disadvantage of younger generations, making it harder for millennials to own a home. A solid rise in mortgage debt has become a global issue, but the UK in particular shows an increased debt-to-GDP ratio, below only those of the US, Denmark, the Netherlands and Iceland (Meen, 2012).
However, UK homeowners are benefiting from rising property prices, which enable them to trade up, purchase second homes or remortgage (Meen, 2012). In 2007, 50 percent of gross income from the housing market came from remortgaging or property purchases by homeowners (Meen, 2012).
There is therefore an important opportunity in the financial sector, both for customers who are willing to begin their journey as first-time homeowners (millennials) and for customers who already own a home and are willing to trade up or buy a second property. It is important to mention that, although such input would be necessary for a business decision, this study will not contain any financial predictions or recommendations on the housing market due to the specificity of the field.
III Methodology
As mentioned above, this study focuses on identifying and clustering the valuable clients who are most eligible for, and most likely to use, a new bank loan offered by Purple4. The data set consists of 4,865 customers and 123 variables, and includes nominal, ordinal and continuous data.
As defined by McDaniel and Gates (2012), a correlation is the degree to which changes in one variable (the dependent variable) are associated with changes in another. A correlation can be positive or negative: if it is positive, an increase in one variable is associated with an increase in the dependent variable; if it is negative, an increase in one is associated with a decrease in the other. As the new product is a bank loan, the first step was to find the variables most strongly associated with the variable “Ever Defaulted On A Bank Loan” in order to screen for a sound credit history, since a negative credit history is one of the main reasons banks deny loans to customers (Money Advice Service, 2018). This makes the variable extremely important when segmenting customers.
A correlation matrix is a table used to display the correlation coefficients between a set of variables (Statistics How To, 2018). It was used to identify the variables with the strongest association with the target variable using the Pearson correlation, one of the most common measures of linear dependence (Ly, Marsman and Wagenmakers, 2017).
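To illustrate the ranking step, the following is a minimal Python sketch of how candidate variables could be ordered by the strength of their Pearson correlation with a binary default indicator. It does not reproduce the analysis carried out in the study: the variable names other than the debt-to-income ratio and all of the values are hypothetical, chosen only to keep the example small.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(variables, target):
    """Sort variables by the absolute value of their correlation with the target."""
    scores = {name: pearson(values, target) for name, values in variables.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data for six customers (illustrative values only).
customers = {
    "debt_to_income": [0.10, 0.45, 0.20, 0.50, 0.15, 0.40],
    "years_employed": [12, 2, 8, 1, 10, 3],
    "household_size": [2, 3, 2, 3, 3, 2],
}
ever_defaulted = [0, 1, 0, 1, 0, 1]  # 1 = has defaulted on a bank loan

ranking = rank_by_correlation(customers, ever_defaulted)
for name, r in ranking:
    print(f"{name:>16}: r = {r:+.3f}")
```

Ranking by absolute value keeps strongly negative associations (here the hypothetical years of employment) alongside strongly positive ones, since both are informative when selecting variables for segmentation.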
As a result, a number of binary, nominal, ordinal and continuous variables showed relevance to the main variable. A cluster analysis was then used to identify people who were similar with regard to the selected variables. As stated by McDaniel and Gates (2012), “the purpose of a cluster analysis is to classify objects or people into some number of mutually exclusive and exhaustive groups so that those within a group are as similar as possible to one another.” The first step was to determine the number of clusters present in the data for the selected variables. An agglomerative hierarchical procedure identifies the number of clusters by merging similar groups until the entire data set becomes one group (Ferreira and Hitchcock, 2009). For the hierarchical clustering procedure in SPSS, the continuous variable “Debt To Income Ratio” was recoded as a binary variable. Folger (2018) noted that a healthy debt-to-income ratio (DTI) for housing loans is below 36%, so all customers with a DTI below 36% were coded as “0”, while all customers above 36% were coded as “1”. As an output, a dendrogram was used to visually identify the number of clusters.
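The recoding and the merging procedure can be sketched in a few lines of Python. This is an illustration only, not the SPSS procedure used in the study: the customer values are hypothetical, average linkage with Euclidean distance is an assumed choice, and placing a DTI of exactly 36% in group “1” is also an assumption, since the text only specifies below and above the threshold.

```python
def recode_dti(dti, threshold=0.36):
    """Binary recode of the debt-to-income ratio: 0 below the threshold, 1 otherwise.
    Treating a DTI of exactly 36% as 1 is an assumption, not stated in the text."""
    return 0 if dti < threshold else 1


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerate(points):
    """Average-linkage agglomerative clustering.
    Returns one (merge_distance, member_indices) pair per merge, in merge order."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [euclidean(points[a], points[b])
                         for a in clusters[i] for b in clusters[j]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        merges.append((d, sorted(merged)))
    return merges


# Hypothetical customers: (ever_defaulted, debt_to_income_ratio).
customers = [(0, 0.12), (0, 0.20), (0, 0.30), (1, 0.41), (1, 0.55), (0, 0.25)]
points = [(defaulted, recode_dti(dti)) for defaulted, dti in customers]

merges = agglomerate(points)
for distance, members in merges:
    print(f"merge at distance {distance:.3f}: customers {members}")
```

The merge distances play the role of the heights in a dendrogram. In this toy data the first four merges happen at distance zero and the last one at a much larger distance, which is the kind of jump that would suggest stopping at two clusters.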
Another clustering method is the k-means algorithm, which Likas, Vlassis and Verbeek (2003) define as “a point-based clustering method that starts with the cluster centres initially placed at arbitrary positions and proceeds by moving at each step the cluster centres in order to minimize the clustering error”. As noted by Steinbach, Karypis and Kumar (2000), agglomerative hierarchical and k-means approaches are sometimes combined so as to “get the best of both worlds”. In order to obtain more precise clusters, k-means was applied to the same variables, using the number of clusters identified on the dendrogram from the hierarchical procedure. Afterwards, a custom table was built in order to locate and filter the target customers in both clusters.
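The two alternating steps of k-means (assigning each point to its nearest centre, then moving each centre to the mean of its points) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure run in the study: the data are hypothetical, the value k = 2 stands in for whatever number the dendrogram suggests, and the centres are initialised at the first k points purely to make the example deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Plain k-means. Returns (labels, centres)."""
    # Assumption: start from the first k points instead of arbitrary positions.
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical customers described by two numeric variables.
points = [(0.10, 1.0), (0.90, 8.0), (0.15, 1.5),
          (0.85, 9.0), (0.20, 1.2), (0.95, 8.5)]

labels, centres = kmeans(points, k=2)
print(labels)
print(centres)
```

In practice the variables would usually be put on a comparable scale before clustering, since k-means relies on distances and a variable with a larger range would otherwise dominate the result.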
After the clusters were established, a second correlation analysis was carried out in order to find the remaining variables most strongly associated with them. For this purpose, cluster membership was recoded as a binary variable, and the newly found variables were used as explanatory variables in this step.
In order to test the outcome of the clustering and its association with the newly added variables, SAP Predictive Analytics was used to perform a logistic regression and analyse the outcome of the proposed model. The cluster variable and the new variables were modified so that they could be used in SAP. Logistic regression is a predictive analysis used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables. It is a common method of testing a model or a hypothesis (Statistics Solutions, 2018).
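As an illustration of what such a model does, the sketch below fits a logistic regression by plain gradient descent in Python. It is a stand-in for explanation only and does not reproduce SAP Predictive Analytics or its fitting method: the two explanatory variables, their values, the learning rate and the number of epochs are all hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict_proba(x, weights, bias):
    """Estimated probability that a customer belongs to the target cluster (1)."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic(rows, targets, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, targets):
            error = predict_proba(x, weights, bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical explanatory variables: a standardised continuous variable
# and a binary flag. The target is the binary cluster membership.
rows = [(-1.5, 0), (-1.0, 0), (-0.5, 1), (0.5, 1), (1.0, 0), (1.5, 1)]
in_target_cluster = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(rows, in_target_cluster)
predictions = [1 if predict_proba(x, weights, bias) >= 0.5 else 0 for x in rows]
print(predictions)
```

The fitted weights indicate the direction in which each explanatory variable moves the predicted probability of belonging to the target cluster, which is the relationship the regression is meant to explain.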
The accuracy of the proposed model is assessed through two measurements, Predictive Power (KI) and Prediction Confidence (KR). According to SAP Documentation (2018), “The predictive power is the percentage of information in the target variable that can be explained by the other variables in the model (the explanatory variables)” and “the prediction confidence indicates the capacity of the model to achieve the same performance when it is applied to a new data set which has the same characteristics as the training data set.” Both values are shown as decimals between 0 and 1, and models with values closer to 1 are considered robust.