%%WJ: Updated - To be reviewed by MB.
Beat Online Fraud: How The Hut Group Defends Millions of Transactions

Fraud, as defined by the Merriam-Webster dictionary, is deceit or trickery, specifically ``intentional perversion of truth in order to induce another to part with something of value or to surrender a legal right''\cite{Ref1}.

Fraud is not a recent crime: the first documented attempt dates back to 300 BC, when Hegestratos attempted to sink a ship deliberately and claim the costs back from his insurance policy\cite{Ref2}.

Fraud has since taken many different forms, but has been especially prominent following the creation of the Internet, which connected over 3.5 billion users in 2016[find ref].

In the United Kingdom alone, retailers sold £133 billion worth of products online in 2016. This growth has been matched by a doubling in cases of online fraud over the last five years, with estimated losses of £261.5 million in 2015\cite{Ref3}.

Fraud is being tackled in the UK, with Financial Fraud Action UK (FFA UK) reporting: ``Financial fraud losses across payment cards, remote banking and cheques totalled £755 million in 2015, an increase of 26 per cent compared to 2014.

Prevented fraud totalled £1.76 billion in 2015. This represents incidents that were detected and prevented by the banks and card companies and is equivalent to £7 in every £10 of attempted fraud being stopped.''[ref]

E-commerce companies are specialist businesses involved in the transfer of information across the Internet. They are most commonly defined as companies that purchase or sell goods and services via electronic channels. [ref]

%http://www.businessnewsdaily.com/4872-what-is-e-commerce.html

Electronic and online fraud is a clear threat to these companies. Not only can it cost them financially, it can also reduce their overall sales if users do not feel safe using their websites. If the company sells a product and a fraud occurs, the company is affected in several ways.

First, the goods themselves are lost to the fraudster. Second, the money received by the company in exchange for the goods is returned to the individual who was the victim of the fraud. Finally, the payment provider may levy a charge on the retailer for handling a fraudulent payment. The benefits of avoiding such transactions, in both reputation and cost, are clear. %of the victim which can be as much as £14 per transaction from payment methods.

%Include full process (if necessary/ further detail required) of chargebacks.

One of the most common types of fraud facing e-commerce companies today is card ID theft, which accounted for 36,318 cases and total costs of £38.2 million in the UK in 2016. [ref] It can be carried out in two ways. The first is application fraud, in which the fraudster opens an account in someone else's name using fake or stolen documents and illegally acquired information. To prove their legitimacy, the fraudster may steal or forge a document such as a utility bill; if the company is unable to identify the forgery, the fraudster can start ordering at their leisure. The second is account takeover, in which a fraudster gathers enough bank details and personal information to contact the bank posing as the victim and convince it to transfer funds out of the account, or to change the account holder's address and send new cards to that address. If successful, this again allows the fraudster to make purchases until the bank or the victim notices.

In e-commerce, the fraudster aims to acquire the goods of a specific company. Because most companies have fraud detection systems in place, either `in-house' (run by the company) or external (run by another company), having another individual's card and payment details is not enough to bypass them. Some examples of behaviour that may indicate a fraudulent transaction are:

\begin{itemize}

\item Shipping and billing addresses do not match

\item Large orders paid with multiple payment cards

\item Orders for multiple copies of the same item in different sizes or colours

\item Multiple orders to one address paid on a number of different cards

\item Multiple orders to multiple addresses paid on a single card

\item Declined purchases followed by smaller value orders

\item Fast shipping regardless of price, especially to overseas addresses

\end{itemize}
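Several of these indicators can be expressed as simple checks over order data. The following is a minimal sketch only: the order fields, helper names, and default limits are hypothetical and are not taken from THG's systems.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical order record used only for this illustration."""
    order_id: str
    card: str               # an opaque card token, never a real card number
    shipping_address: str
    billing_address: str


def _key(address: str) -> str:
    """Crude comparison key: ignore case and surrounding whitespace."""
    return address.strip().lower()


def address_mismatch(order: Order) -> bool:
    """Indicator: shipping and billing addresses do not match."""
    return _key(order.shipping_address) != _key(order.billing_address)


def addresses_with_many_cards(orders, max_cards=2):
    """Indicator: orders to one address paid on more than max_cards cards."""
    cards_by_address = defaultdict(set)
    for order in orders:
        cards_by_address[_key(order.shipping_address)].add(order.card)
    return {addr for addr, cards in cards_by_address.items()
            if len(cards) > max_cards}


def cards_with_many_addresses(orders, max_addresses=2):
    """Indicator: one card used to ship to more than max_addresses addresses."""
    addresses_by_card = defaultdict(set)
    for order in orders:
        addresses_by_card[order.card].add(_key(order.shipping_address))
    return {card for card, addrs in addresses_by_card.items()
            if len(addrs) > max_addresses}
```

Each function flags behaviour for further scoring rather than deciding that an order is fraudulent, since legitimate customers can trigger any one of these indicators.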

%Could go on here and split into different areas if we need to expand further

\subsection{Company Background}

The Hut Group (THG), a rapidly growing e-commerce company, currently operates more than 100 websites representing over 20 different brands. This leaves it susceptible to fraudulent transactions of this type.

THG faces regular fraud attempts: an estimated 1 in every 100 orders is fraudulent, which at the current volume of around 15 million orders per year amounts to approximately 150,000 fraudulent attempts per year. As the company grows and acquires more brands and companies, THG are increasing not only the number of orders but also the number of locations that they sell to, with a large emerging market in Asia. This growth and rising popularity will only attract further fraudulent activity.

The company, along with other online retailers, faces an incredibly busy weekend every November, nicknamed `Black Friday'. During this weekend, large-scale discounts are placed on items to attract customers to the website and encourage purchases. As the event has become well known in recent years, many consumers save money to spend specifically on deals from this weekend. Over the Black Friday weekend in 2016, THG welcomed over 3 million unique users to its sites in a 24-hour period and, at peak times, sold in excess of 3,000 products per minute[ref]. With such a large number of transactions to process, fraudsters see this as an ideal opportunity to bypass the security systems in place, since less time can be spent investigating each order.

%https://www.thehutgroup.com/2016/11/28/hut-group-announces-record-breaking-black-friday-sales-95-sales-growth/

%MB: More general background here I think. Section wants to be around 1 1/2 – 2 pages long.

%WJ: Updated- To be reviewed by MB.

\section{Motivation}

%MB: Where does the model come from here? It isn't previously introduced as far as I could see.

The main purpose of the address-matching model proposed here is to reduce the number of fraudulent transactions that pass through THG's system. If a fraudulent customer passes through the system unnoticed, it can take up to 3 months before THG find out. In some cases it is possible to see where the current rules failed to pick up the fraud and to adapt the weightings in the system so that it does not happen again. If the rules are not changed, the fraudster is able to continue ordering successfully. In this case, an address match would be highly beneficial if it could identify when a known address had been altered. If a past transaction has been identified as fraudulent, either by one of the systems or through a chargeback, the model can attempt to match a new, potentially manipulated address against the known addresses in the fraudulent address database.

As previously stated, the current address match is an address unification. It uses a hard match: following the unification process, a match is returned only if the addresses are identical. Given the number of possible manipulations, we can instead apply a fuzzy match based on similarity. Rather than returning a match or non-match, we assign a score in the range 0--100, with 0 meaning completely different and 100 meaning identical. A hard classification corresponds to treating any score from 0 to 99.9 as a non-match and only 100 as a match.

Fraudulent transactions are separated into two main categories for THG: \nth{1} party fraud and \nth{3} party fraud. \nth{1} party fraud occurs when a customer uses their own card but attempts to commit a fraudulent transaction, for example by claiming that they did not place the order, or that they placed the order but did not receive the items. \nth{3} party fraud occurs when a person's card details are acquired and used to make a purchase without that person's knowledge. To carry out \nth{3} party fraud successfully, a fraudster must disguise the order as a legitimate one by making subtle changes to the information entered at purchase, so that the goods are still received even though the address may be blacklisted by the fraud detection systems. The fact that victims and their payment card issuers still report fraudulent transactions demonstrates that some transactions are still going unnoticed.
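The difference between a hard match and a scored fuzzy match can be sketched as follows. This is an illustration only: it uses the ratio from Python's standard \texttt{difflib} module as a stand-in similarity measure, and the threshold of 85 is an arbitrary assumption, not a value used by THG.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> float:
    """Return a similarity score from 0 (nothing in common) to 100 (identical)."""
    # autojunk=False disables difflib's heuristic for long sequences so that
    # the score depends only on the characters of the two strings.
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def hard_match(a: str, b: str) -> bool:
    """Hard classification: only identical strings (a score of 100) match."""
    return a == b


def fuzzy_match(a: str, b: str, threshold: float = 85.0) -> bool:
    """Fuzzy classification: match when the score reaches the threshold.

    The default threshold is a placeholder and would need to be tuned on data.
    """
    return similarity_score(a, b) >= threshold
```

With these definitions, two spellings of an address that differ by a swapped pair of letters fail the hard match but can still pass the fuzzy match, depending on the threshold chosen.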

\subsection{Current system} %WJ: Diagram of system in the appendix to be included if worthwhile

The current system retrieves information from the customer at the checkout stage. At this stage the customer's payment card is not charged; the details are first checked and the order goes through the fraud detection system before any attempt is made to charge the customer. This ensures that if a transaction is fraudulent, THG has the opportunity to stop it before a payment is processed by the customer's bank. Conversely, if a transaction successfully passes all stages, the order can be processed almost immediately.

The first stage of checking (Checker A) applies a simple set of rules to reduce the number of transactions going through the main checking system (Checker B). In Checker A, a transaction that passes a rule goes straight to processing without completing the remaining checks. Around 40\% of the orders that THG receive pass to processing in this way, whilst the remaining 60\% pass to Checker B. No orders can be rejected or identified as fraud at this stage. Checker B consists of approximately 300 complex rules. Each rule returns a score, with a high score indicating a high-risk order. The scores are summed, and the order is either sent to processing if the total is under a set threshold or sent to the final checking system (Checker C) if it is over the threshold (Find \% of orders that carry onto C). Checker C requires a human reviewer to decide whether to process the order or flag it as fraudulent. The reviewer can see which rules returned high-risk scores in Checker B to aid the final decision.

This system is inefficient for THG for a number of reasons. Checker B contains a number of rules that are not relevant to THG's transactions. For example, ``Bank Postcode details not supplied'' is not useful because THG do not ask customers for bank postcode details. Similarly, ``Non-Domestic card and high value'' is not relevant because a `Domestic card' is defined as a UK card regardless of the customer's country. Since THG sell more outside the UK than inside it, this rule does not help to identify fraudulent transactions.
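The flow through the three checkers can be sketched as below. The rule functions, order fields, and threshold are hypothetical placeholders, and treating a total exactly equal to the threshold as needing review is an assumption, since the description only covers totals under or over it.

```python
from enum import Enum


class Outcome(Enum):
    PROCESS = "process"              # order is released for processing
    MANUAL_REVIEW = "manual_review"  # order is passed to Checker C (a human)


def route_order(order, fast_pass_rules, scoring_rules, threshold):
    """Route one order through Checker A and Checker B.

    fast_pass_rules: callables returning True if the order may skip Checker B.
    scoring_rules:   callables returning a numeric risk score for the order.
    """
    # Checker A: passing any one rule sends the order straight to processing.
    # Checker A can never reject an order or mark it as fraud.
    if any(rule(order) for rule in fast_pass_rules):
        return Outcome.PROCESS

    # Checker B: every rule contributes a score; higher means riskier.
    total = sum(rule(order) for rule in scoring_rules)
    if total < threshold:
        return Outcome.PROCESS

    # Checker C: a human makes the final decision.
    return Outcome.MANUAL_REVIEW
```

In practice Checker B would also need to record the score of each rule, since the human reviewer in Checker C relies on seeing which rules fired.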

%MB: Are you going to expand the 'new system being built' comment? At least alude to the following paragraphs.

%WJ: Yes that would be worthwhile. TBC

These rules are present because THG do not currently use their own fraud detection system and so do not write the rules (a new system is being built). In addition, the large number of transactions passing to Checker C requires a number of staff to process them effectively. Staff need to work 24 hours a day, 7 days a week so that orders can be processed for customers who choose next-day delivery and so that the backlog does not get out of hand. As THG continue to expand, the number of orders reaching Checker C will keep rising unless the fraud detection system changes, and more staff will be needed to check potentially fraudulent transactions.

Because this is a clear problem for THG, they are creating a new system to replace Checker B. Whilst a human checker will still be needed to make the final decision, THG's goal is to significantly reduce the number of non-fraudulent orders that are flagged as high-risk whilst still identifying all fraudulent transactions.

In the new system, rules can be built to identify fraudulent transactions through the address input.

One way that a fraudster can bypass the checking systems is by altering the address used. If a fraudulent transaction is carried out and a chargeback occurs, THG can add the address to an address blacklist, which stores the addresses from all previous fraudulent transactions so that future orders can be checked against it. The fraudster can then place another order to the same address but alter how the address is entered in a number of ways. The alteration causes a different address ID to be created, so the order passes as if to a new address, whilst the delivery company still recognises the same physical address. If this problem is not managed appropriately, it makes a fraudster's job very easy and could cost the company significantly.

Every time we find a fraudulent order, we delete the account, so the fraudster has to constantly create new accounts and fill in the address fields again. That is why we usually see different versions of the same address in cases of fraud. Since the address might be the only link between the orders, the match is important.

After the order is released, the address strings are printed on the label for the courier, so the only restriction on the fraudster is that the address must be understandable by a human. It is, however, still possible to make many alterations to a string whilst keeping it understandable. For example:
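The weakness of an exact address ID comparison can be illustrated with a short sketch. How THG actually unify addresses and derive address IDs is not described here, so the unification steps (lowercasing, removing punctuation, collapsing whitespace) and the use of a hash as the ID are assumptions made purely for illustration, as is the made-up address.

```python
import hashlib
import re


def unify(address: str) -> str:
    """Hypothetical unification: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def address_id(address: str) -> str:
    """Derive an ID from the unified address (an assumed scheme)."""
    return hashlib.sha256(unify(address).encode("utf-8")).hexdigest()


def is_blacklisted(address: str, blacklist: set) -> bool:
    """Hard match: the address is caught only if its ID is already listed."""
    return address_id(address) in blacklist


# A made-up address recorded after a chargeback.
blacklist = {address_id("12 Example Road, Sampletown")}

# Differences in case, spacing, and punctuation are absorbed by unification.
caught = is_blacklisted("12 EXAMPLE ROAD   Sampletown", blacklist)

# Swapping two letters yields a different ID, so the order is not caught,
# even though a courier would still deliver to the same place.
missed = not is_blacklisted("12 Exmaple Road, Sampletown", blacklist)
```

Under these assumptions, any alteration that survives unification produces a new ID, which is the gap a similarity-based comparison aims to close.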

"Ins’t it inertseting taht you can raed tihs steennce aobut adrdseses eevn tohguh the leettsrs aenr't in the crorect oedrr?"

This is an example of typoglycemia, where the human brain is able to read words as intended as long as the first and last letters are in the correct places.[ref??] The more alterations a fraudster makes, however, the higher the risk that the fraudulently obtained goods will not arrive at the correct location, or will be returned to THG by the postal service because the address cannot be interpreted by the deliverer.
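This kind of alteration is easy to generate automatically, which gives a sense of how many variants of one address a fraudster could produce. The sketch below shuffles the inner letters of each word while keeping the first and last letters fixed; the function names and the seed parameter are illustrative choices, not part of any system described here.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the inner letters of a word, keeping first and last in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle in words of three letters or fewer
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every alphabetic word; digits and punctuation are untouched."""
    rng = random.Random(seed)  # seeded so that results are reproducible
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)
```

Applied to an address line, this leaves house numbers and punctuation intact while changing the spelling of street and town names, so a string produced this way would no longer be identical to the original.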

\subsection{Project Goals}

Given the limitations of the current system and the move to a new, more flexible system, the project will look to gain insight into address input on THG's website. It will investigate whether there is reason to create an address comparison rule in the new system. If there is evidence that address comparison can reduce the number of fraudulent transactions that are missed whilst also reducing the number of non-fraudulent transactions registered as fraudulent, then THG can look to implement this rule in the new system.

\subsection{Objectives} %Is it worth mentioning stop-words in this section?

With the general project aim stated, we can describe the objectives of the project in more detail. The first objective is to understand how both legitimate customers and fraudsters write addresses. This will give insight into general address input and whether it can be used effectively to predict fraud. Trends in accidental input errors from legitimate customers with `good' addresses will be compared with trends in deliberate input errors from fraudsters with `bad' addresses. It is necessary to find out whether the way in which fraudsters alter their addresses differs significantly from the way in which legitimate customers accidentally mistype theirs.

Using this information, we want to match identical addresses that are written differently whilst minimising the number of different addresses that are wrongly matched as the same address. Using one or more string distance metrics, we want to match a fraudulent address to one in our database of previously encountered bad addresses. %Machine learning may be a potential solution for combining the results of multiple string distance metrics to identify matches.

Finally, we want to test how successful the address matching is. We define success in this project as detecting more fraudulent orders than before without increasing the number of legitimate orders that are considered fraudulent.
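As one concrete example of a string distance metric, the sketch below implements the Levenshtein (edit) distance and uses it to look up the closest entry in a list of known bad addresses. The choice of metric, the normalisation to a 0 to 100 scale, and the threshold of 80 are assumptions for illustration; the project may settle on different metrics or values.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalised_similarity(a: str, b: str) -> float:
    """Scale the edit distance to a score from 0 to 100 (100 = identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def best_match(candidate: str, bad_addresses, threshold: float = 80.0):
    """Return (address, score) for the closest bad address, or None.

    None is returned when the list is empty or no score reaches the threshold.
    """
    scored = [(known, normalised_similarity(candidate, known))
              for known in bad_addresses]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None
```

Comparing every new address with every stored address grows linearly with the size of the database, so a production version would likely need some form of pre-filtering, for example by postcode.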
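This success criterion can be made precise by counting, for each method, how many fraudulent orders it detects and how many legitimate orders it wrongly flags. The sketch below assumes that each order carries a known label (fraudulent or not) and a flag from each method; the function names are illustrative.

```python
def evaluate(flags, labels):
    """Count detected frauds and false alarms.

    flags[i]  is True if the method flagged order i as fraudulent.
    labels[i] is True if order i really was fraudulent.
    """
    detected = sum(1 for f, l in zip(flags, labels) if f and l)
    false_alarms = sum(1 for f, l in zip(flags, labels) if f and not l)
    return detected, false_alarms


def is_improvement(old_flags, new_flags, labels):
    """Success as defined above: more frauds detected, no more false alarms."""
    old_detected, old_false = evaluate(old_flags, labels)
    new_detected, new_false = evaluate(new_flags, labels)
    return new_detected > old_detected and new_false <= old_false
```

Both methods must be evaluated on the same set of labelled orders for the comparison to be meaningful.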

%WJ: Drop-off time, was not pursued for a number of reasons to discuss. Can leave in and discuss why this wasn't possible or just remove it. May be good to add it as future work.

%Further to matching, we are also interested in investigating drop-off time. With a large database of fraudulent addresses, a matching algorithm can take some time. we will look to investigate whether, based on severity, addresses can be removed from the fraudulent database and assumed as no longer being a problem. This will be measured against the efficiency of the algorithm to continue to successfully match addresses.

\subsection{Research Questions}

Given these aims and objectives, the research questions are as follows:

\begin{enumerate}

\item Are there significant differences in the way that fraudulent and non-fraudulent customers input their address into the system?

\item Can similarity metrics be used to match identical addresses with altered inputs?

\item Is machine learning a valid approach for combining the results from a number of string distance metrics for matching?

\item Is this matching more successful than the current address ID comparison method?

%\item Can the drop-off time be altered based on severity to increase efficiency of the similarity matching?

\end{enumerate}

%MB: Did you want to mention anything about the machine learning approach? We can come back to this when you are happier with the rest of the dissertation

%WJ: Following our discussion on 04/08 I think it would be best to come back to this later when we see how the main sections are shaping up.
