As a kid growing up in the ’90s, one of my favorite things to do was to watch Cartoon Network. There was something fantastic about living in a world so drastically different from my own! Of course, one show I absolutely loved watching was ‘The Jetsons,’ a futuristic animated sitcom revolving around the eponymous family, set in 2062 (not too far away, is it?). I would dream of traveling in space cars, having a robot maid, talking to my own digital diary, and having conversations with my dog…
Flash forward to 2018. While my dog and I still cannot talk, and space cars still aren’t mainstream (hey, Elon Musk!), Rosie the Robot Maid and Didi the Digital Diary have both slowly but surely become a part of most of our lives. And what is the technology at the core of both? Artificial Intelligence.
Artificial Intelligence (AI) has gradually evolved into one of the hot topics of our time. The term and the technology have managed to make their way into nearly every sphere of human life, from mundane activities like cleaning and shopping to more cerebral ones such as finding a cure for cancer. With so much discussion surrounding the topic, and with AI becoming an integral part of our lives, it makes great sense for us to understand what the term (and the myriad terms that are often used along with it) means.
Simply put, artificial intelligence is the intelligence demonstrated by machines. While this is a generalized definition, the intelligence itself includes activities like planning, understanding language, recognizing objects and sounds, learning, and problem solving. A device is deemed to have (artificial) intelligence if it can perceive its environment and take actions that maximize its chance of successfully achieving its goals (David, Alan, Randy, 1998).
We can classify AI into two categories, general and narrow. General AI would have all of the characteristics of human intelligence, including the capacities mentioned above. Narrow AI exhibits some facets of human intelligence and can perform extremely well in those facets; however, it is lacking in other areas. A machine that can only recognize images, but cannot classify them or identify themes, is an example of narrow AI.
As machines become increasingly capable, many applications that originated in AI research are no longer associated with the technology. This phenomenon is known as the ‘AI Effect.’ For instance, Optical Character Recognition, which grew out of AI research on identifying characters, is frequently excluded from “artificial intelligence” because it has become a routine technology.
Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at the highest level in strategic game systems such as chess, driving cars autonomously, intelligent routing in content delivery networks, military simulations, and interpreting complex data, including images and videos. Extensive research into artificial intelligence, and the efforts to make it better, gave rise to concepts like Machine Learning, Deep Learning, Data Mining and Natural Language Processing.
Machine learning has been a fundamental concept of AI research since the field’s inception. Earlier, computers were specifically programmed to carry out certain tasks. They were given parameters and conditions for responding to given situations. These actions were based on the programmer’s response to a given situation and were limited to a finite set of options. This has changed since the advent of machine learning, which gave computers the ability to learn a problem from data and respond effectively. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with reliable performance is difficult. Email filtering, fraud detection, shopping recommendations and shortest-route prediction are some everyday examples of machine learning in use.
Machine learning tasks are typically classified into two broad categories, supervised learning and unsupervised learning, depending on whether a learning signal or feedback is available to the learning system. In supervised learning, the system is presented with example inputs and their desired outputs, and it is trained to learn a rule that maps inputs to outputs. Unsupervised learning, by contrast, is given no labels, leaving the system on its own to find structure in its input. The data that is used to train the system is known as training data.
A core objective of a learner is to generalize from its experience. Generalization in this context is the ability of a learning machine to perform accurately on new, unseen examples or tasks after having experienced a training data set. The training examples come from some generally unknown probability distribution, and the learner has to build a general model of this space that enables it to produce sufficiently accurate predictions in new cases.
Machine learning poses a host of ethical questions. Systems trained on datasets collected with biases may exhibit these biases upon use, thus digitizing cultural prejudices. Responsible collection of data and documentation of the algorithmic rules used by a system are therefore a critical part of machine learning.
Machine learning has been used to make drastic improvements to computer vision, such as the ability of a machine to recognize an object in an image or video. Let us now move on to some of the algorithms employed in machine learning.
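Before looking at the individual algorithms, here is a minimal sketch of supervised learning and generalization as described above. It assumes a made-up email-filtering task in which each email is reduced to two numbers (links and exclamation marks); the data, the feature choice, and the nearest-neighbour rule are all illustrative assumptions rather than anything the essay prescribes.

```python
import math

# Hypothetical labelled training data: (links, exclamation marks) -> label.
TRAIN = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
    ((5, 4), "spam"), ((6, 3), "spam"), ((4, 6), "spam"),
]

# Unseen examples, held back to check how well the learner generalizes.
TEST = [((1, 1), "ham"), ((5, 5), "spam"), ((7, 4), "spam")]


def predict(features, training_data):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training_data, key=lambda example: math.dist(example[0], features))
    return nearest[1]


def accuracy(test_data, training_data):
    """Fraction of unseen examples whose predicted label matches the true label."""
    hits = sum(predict(x, training_data) == y for x, y in test_data)
    return hits / len(test_data)


print(accuracy(TEST, TRAIN))
```

The training pairs play the role of the "example inputs and their desired outputs", and the accuracy on the held-back examples is a simple measure of generalization.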
Decision Tree Learning:
Decision tree learning is one of the most popular predictive modelling approaches used in statistics, data mining and machine learning. It is mainly used for classification problems. This method uses a decision tree to go from observations about an item to conclusions about the item’s target value. A simple example of a decision tree about playing conditions for a golf game is shown below.
The target in the example above is whether to play or not. After going through a range of factors like weather, humidity and rainfall, the final verdict is chosen accordingly. Decision trees can be further classified into two types based on the predicted outcome: classification trees and regression trees. Classification tree analysis is used when the predicted outcome is the class to which the data belongs. The golf game decision tree illustrated above is an example of this type. Regression tree analysis is used when the predicted outcome can be considered a real number, like the price of a property or a patient’s length of stay in a hospital.
There are numerous advantages to using decision trees in machine learning. They are very simple to understand and can be displayed easily. The method requires very little data normalization and can handle qualitative predictors. It performs well with large datasets in reasonable time. It is also close to how humans make decisions, so it is useful in modelling human decision making. Most importantly, it can handle both numerical and categorical data, whereas many other techniques are specialized for one type of variable.
Despite its ability to handle diverse types of data, the method has its share of disadvantages. Decision trees are often less accurate than other techniques. Accuracy can be improved to a certain extent by a method called pruning, which means reducing the size of a tree by removing the branches that are not significant for classification. Pruning also cleans up the tree and makes it more presentable. Decision trees are also not robust: a tiny change in the training data can result in large changes in the tree and in its final output. Overfitting is another problem. Decision tree learners can create over-complex trees that fit the training data closely but do not generalize well to new data. This can be overcome by setting a minimum number of training examples required at each leaf, which prevents the creation of overly complex trees.
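The ideas above (learning a classification tree and limiting its complexity with a minimum number of examples per leaf) can be sketched in plain Python. This is only an illustrative sketch: the play-golf table is made up for the example, Gini impurity is assumed as the splitting criterion since the essay does not name one, and `build_tree`, `predict`, `depth` and `min_samples_leaf` are hypothetical names.

```python
from collections import Counter

FEATURES = ("outlook", "humidity", "windy")

# Illustrative rows: (outlook, humidity, windy, play). Not taken from the essay.
RAW = [
    ("sunny", "high", False, "no"), ("sunny", "high", True, "no"),
    ("overcast", "high", False, "yes"), ("rainy", "high", False, "yes"),
    ("rainy", "normal", False, "yes"), ("rainy", "normal", True, "no"),
    ("overcast", "normal", True, "yes"), ("sunny", "high", False, "no"),
    ("sunny", "normal", False, "yes"), ("rainy", "normal", False, "yes"),
    ("sunny", "normal", True, "yes"), ("overcast", "high", True, "yes"),
    ("overcast", "normal", False, "yes"), ("rainy", "high", True, "no"),
]
ROWS = [(dict(zip(FEATURES, r[:3])), r[3]) for r in RAW]


def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when they are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, features, min_samples_leaf=1):
    """Recursively split on the feature with the lowest weighted impurity."""
    labels = [y for _, y in rows]
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    best = None
    for f in features:
        groups = {}
        for x, y in rows:
            groups.setdefault(x[f], []).append((x, y))
        # Reject splits that would create a leaf with too few training examples.
        if len(groups) < 2 or any(len(g) < min_samples_leaf for g in groups.values()):
            continue
        score = sum(len(g) / len(rows) * gini([y for _, y in g]) for g in groups.values())
        if best is None or score < best[0]:
            best = (score, f, groups)
    if best is None:
        return majority(labels)
    _, f, groups = best
    rest = [g for g in features if g != f]
    children = {v: build_tree(g, rest, min_samples_leaf) for v, g in groups.items()}
    return {"feature": f, "default": majority(labels), "children": children}


def predict(tree, x):
    """Walk from the root to a leaf; unseen values fall back to the majority label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["feature"]], tree["default"])
    return tree


def depth(tree):
    if not isinstance(tree, dict):
        return 0
    return 1 + max(depth(child) for child in tree["children"].values())


full_tree = build_tree(ROWS, FEATURES)
small_tree = build_tree(ROWS, FEATURES, min_samples_leaf=5)
print(depth(full_tree), depth(small_tree))
```

On this toy table, requiring at least five examples per leaf yields a shallower tree than the unrestricted one, which is the trade-off between fitting the training data and keeping the tree simple.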
The next machine learning technique is regression analysis. Regression analysis is a set of statistical processes for estimating the relationships among variables. It can be used, for example, to predict how the relationship between two variables, such as advertising and sales, develops over time, with the regression line drawn from the historical sales data available. The focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Regression analysis is widely used for prediction and forecasting. In machine learning, regression refers specifically to the estimation of continuous response (dependent) variables, as opposed to the discrete response variables used in classification. There are many types of regression analysis; linear regression is the simplest and most common, and it is the one discussed here.
Linear regression can be explained with the following example. Let us assume that a company decides to spend on marketing for a product in four different regions, and it gauges the sales (in thousands of units) of the product as a function of the advertising budgets (in thousands of dollars) for digital and print media. A scatter plot can be drawn using the data in the table, where sales is the dependent variable, gauged against independent variables like the budget spent on digital and print media, to find out which medium contributes to sales.
[Figures: Sales vs. Digital Media plot; Sales vs. Print Media plot]
The Y axis is the dependent variable (sales) for both plots, while the X axis denotes the independent variables (budget for digital and print media). Linear regression is very helpful in forecasting and prediction because it gives reasonably accurate numeric outputs, but it has limitations. It assumes that the independent variables do not influence one another, which means the digital media budget should not influence the print media budget. It is also very sensitive to outliers. Thus, linear regression is often used alongside other methods in forecasting and prediction.
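As a rough illustration of the regression line discussed above, the sketch below fits a straight line by ordinary least squares to one independent variable. The budget and sales figures are invented for the example (the essay's table is not reproduced here), and `fit_line` and `predict_sales` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data for four regions: digital budget (thousands of dollars)
# and sales (thousands of units).
DIGITAL_BUDGET = [10, 20, 30, 40]
SALES = [30, 48, 72, 90]

slope, intercept = fit_line(DIGITAL_BUDGET, SALES)


def predict_sales(budget):
    """Read a forecast off the fitted regression line."""
    return intercept + slope * budget


# Replacing one point with an extreme value shows the sensitivity to outliers.
SALES_WITH_OUTLIER = [30, 48, 72, 200]
outlier_slope, _ = fit_line(DIGITAL_BUDGET, SALES_WITH_OUTLIER)

print(round(slope, 2), round(intercept, 2), round(outlier_slope, 2))
```

A single unusual region more than doubles the fitted slope here, which is why outliers are usually inspected before a regression line is trusted for forecasting.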
Artificial Neural Networks:
This method is different from the other methods because it is not used for prediction or forecasting in such a straightforward manner. Instead, like human and animal brains, it is trained to learn tasks by observing examples, generally without task-specific programming. A very common example is photo tagging on Facebook. At first you tag the pictures yourself, but after several tagged photos the system is trained and can identify your face in pictures uploaded to Facebook. It does this without any prior knowledge about the specific person, such as height, complexion or age. Instead, it evolves its own set of relevant characteristics from the learning material that it processes. An ANN is based on a collection of connected units or nodes called artificial neurons, which are analogous to neurons in an animal brain. Each artificial neuron transmits the information it has processed to the neurons connected to it. Below is an illustration of how artificial neural networks work in a broader sense.
Layers of ANNs
Neural networks are made up of three kinds of layers: input, hidden and output. The input layer is where the training data is given to the system; the data is processed in the hidden layers, and the result is passed to the output layer. Artificial neurons and connections typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Signals travel from the input layer to the output layer, possibly after traversing the layers multiple times. Because of their ability to reproduce and model nonlinear processes, ANNs have found many applications in a wide range of disciplines. Their uses include speech recognition, machine translation, social network filtering, playing board and video games, and medical diagnosis. ANNs have to be used carefully, and their characteristics have to be studied before they are implemented. An overly complex model tends to learn slowly, so the choice of model should be kept simple. ANNs also require significant experimentation to select and tune a training algorithm that performs well on unseen data.
Neural networks offer a number of advantages, including requiring less formal statistical training, the ability to implicitly detect complex nonlinear relationships between dependent and independent variables, the ability to detect all possible interactions between variables, and the availability of multiple training algorithms. A common criticism of neural networks is that they require too much training for real-world operation. Other limitations include a greater computational burden and proneness to overfitting.
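To make the layer structure concrete, here is a small sketch of a signal passing from an input layer through one hidden layer to an output layer. The weights are hand-picked for illustration so that the network computes XOR (a simple nonlinear function); in a real ANN they would be adjusted during training, which this sketch does not attempt. The names `layer`, `forward` and the weight constants are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of its inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights: two hidden neurons, one output neuron.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, round(forward(pair)))
```

No single straight line separates the XOR cases, so the hidden layer is what lets the network model this nonlinear relationship; larger weights strengthen a connection's signal and negative weights suppress it.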
Having understood the fundamental concepts of Artificial Intelligence and the main technology behind it, Machine Learning, it shouldn’t come as a surprise that the worlds of marketing and advertising have come to adopt and adapt to AI and ML. They are no longer limited to human interactions and are seamlessly integrating digital platforms to keep up with changing customer traits.
The natural next question is: how are the technological concepts of Machine Learning and Artificial Intelligence applied to the fields of advertising and marketing?
Despite William Bernbach staunchly claiming that advertising is essentially persuasion and persuasion is art, the line between science and art is slowly fading. With the advent of the World Wide Web, newer avenues have opened up for brands and companies to propagate their message to a larger audience on a larger scale. It also helps that the Big Data boom of 2013 has taken over the business lexicon in nearly every industry; because clearly, everyone wants to know more about who they are catering to in order to cater better and thereby be profitable! A no-brainer, really! As more and more companies turn to analyzing data to enhance their services and offerings, the world of advertising has completely caught on.
What is the connection between big data and machine learning? Simply stated, data is the fodder that feeds the machine. Let us try that definition one more time: a machine learns from the data it is trained on, just like a child learns from the stimuli it is surrounded by. Except here, the stimuli are large beyond imagination, and the learning happens in a fraction of a second.
Data-driven advertising, regardless of the medium, has become the norm. In advertising, the data collected and analyzed may pertain to the customers themselves (shopping habits, advertisement engagement, social media usage, buying habits, preferences, brand engagement etc.), to brands and companies (financial aspects, ad spend, media spend etc.), and to industries (trends, forecasts, participants and so on). With so many people having access to smartphones, computers, and the internet in general, it shouldn’t be surprising that a copious amount of data is being generated. It also pays to note that “Individual differences exist among the members of a group”; this means that every user has his or her own unique tastes, preferences, and ways of engaging with a brand and interpreting ad content. Big data can help marketers and advertisers understand their customer base, classify customers into groups, make sense of their online interactions, gauge their motivations, and identify the underlying patterns. Think of it as something like what the Behavioral Analysis Unit of the FBI does to profile criminals, but with more data and a more diverse group of people!
Gaining all these insights from the digital footprints of consumers helps tremendously in creating ad campaigns that are highly personalized and targeted. This precision targeting can ensure that the ads and ad messages reach only those people who will yield sales, or are likely to yield sales. This saves a ton of money and makes the ad more relevant to those viewing it.
Analyzing user behavior becomes even more important given that, according to Bloomberg research, nearly a quarter of video ads are in actuality viewed by bots, which means that nearly a quarter of your audience is fake! This means that some portion of your digital ad spend is being wasted. User profiling may hence aid in uncovering ad fraud, although it helps to understand that some companies and brands intentionally engage in fake viewing to boost their online ratings (ad impression rates).
All of this is ably supported by predictive analytics, which is exactly what it sounds like. Using a combination of Bayesian statistics, probability theory, and similar mathematical concepts of logic and weights, a theoretical model is constructed. Given relevant data, this model can predict the kinds of behavior that advertisers and marketers want to anticipate. These predictions typically cover who the appropriate consumers are, what they are likely to buy and when, how they are more prone to buying (via which channel), and even why they are buying.
A predictive model takes individual characteristics as input data and churns out a predictive score as output. Predictive analysis also helps in building recommender systems, which harness user data collected from past experiences to recommend newer things. For instance, Amazon’s SIMS widget on product detail pages uses the customer’s personal data to recommend new products, and also uses the data garnered from other users who expressed interest in the same or a similar product to recommend what ‘Customers Also Bought.’ This also helps Amazon calculate the user’s worth.
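The ‘Customers Also Bought’ idea can be sketched with simple co-purchase counting: items that frequently appear in the same basket as a given product are recommended alongside it. This is only a minimal illustration on invented baskets, not a description of how Amazon's system actually works; `BASKETS`, `co_purchase_counts` and `also_bought` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, one set of items per customer order.
BASKETS = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "lantern", "stove"},
    {"sleeping bag", "pillow"},
    {"tent", "sleeping bag", "stove"},
]


def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_bought(item, baskets, top_n=2):
    """Rank other items by co-purchase count; ties are broken alphabetically."""
    counts = co_purchase_counts(baskets).get(item, Counter())
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


print(also_bought("tent", BASKETS))
```

The count attached to each pair acts as a crude predictive score; production recommender systems typically combine many more signals than co-occurrence alone.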
Here, the role of a data analyst is crucial: we can have all the information in the world, but if we don’t know how to leverage the appropriate information at the right time, it is a waste.
Another important application of big data in advertising is ‘Native Advertising.’ Essentially, native advertising means blending ads with the editorial content of the websites where these ads are displayed. As opposed to banner ads and pop-ups, which are intrusive and lead to a bad customer experience, native ads are more inconspicuous and ensure that customers have a seamless, pleasant experience. The placement of these ads is often determined by a machine that has learned the techniques of optimal ad placement from the datasets it was provided with. The advent of programmatic advertising in 2010 only solidified the concept of automatic ad placement.
Programmatic advertising is the buying or selling of ad inventory via an automated, data-driven process. This process depends on variables such as location, search history, user interests etc. Programmatic advertising is perhaps one of the largest applications of Machine Learning and Artificial Intelligence in advertising today. To quote from an interesting Bloomberg article:
“The ideal programmatic transaction works like this: A user clicks on a website and suddenly her Internet address and browsing history are packaged and whisked off to an auction site, where software, on behalf of advertisers, scrutinizes her profile (or an anonymized version of it) and determines whether to bid to place an ad next to that article. Ford Motor could pay to put its ads on websites for car buffs, or, with the help of cookies, track car buffs wherever they may be online. Ford might want to target males age 25-40 for pickup-truck ads, or, better yet, anybody in that age group who’s even read about pickups in the past six months.”
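The transaction described in the quote can be sketched as a tiny auction: each advertiser's software inspects the user's profile, decides whether it is interested, and the highest interested bid wins the ad slot. Everything here is an assumption made for illustration, including the profile fields, the bid amounts, and the highest-bid-wins rule; real ad exchanges use more elaborate auction mechanics.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Bidder:
    name: str
    bid: float                      # hypothetical price offered for the ad slot
    wants: Callable[[dict], bool]   # targeting rule applied to the user profile


def pickup_truck_rule(profile):
    """Target ages 25-40 who are male or who recently read about pickups."""
    in_age_range = 25 <= profile.get("age", 0) <= 40
    interested = profile.get("gender") == "male" or profile.get("read_about_pickups", False)
    return in_age_range and interested


def run_auction(profile, bidders) -> Optional[Bidder]:
    """Return the highest interested bidder, or None if nobody wants the slot."""
    interested = [b for b in bidders if b.wants(profile)]
    return max(interested, key=lambda b: b.bid) if interested else None


BIDDERS = [
    Bidder("truck_ad", 2.50, pickup_truck_rule),
    Bidder("generic_ad", 0.40, lambda profile: True),
]

visitor = {"age": 30, "gender": "female", "read_about_pickups": True}
winner = run_auction(visitor, BIDDERS)
print(winner.name if winner else "no ad")
```

The targeting rule is where the data-driven part lives: richer profiles allow narrower rules, which is what lets an advertiser follow "car buffs wherever they may be online".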
Customization is another application of programmatic advertising that is gaining importance. A notable example of this is the Axe-CUBOCC (Brazil) production ‘Romeo Reboot,’ which is an interpretation of the Shakespearean classic ‘Romeo and Juliet.’ The uniqueness of this campaign is that almost every viewer sees a different story. After suitable research, the agency divided the audience into four segments based on their consumption preferences. Each segment viewed different iterations of the trailer for the cinematic campaign, depending on their profile. Anyone who watched the trailer felt that it had been created exclusively for them, and every experience was almost unique! Plus, all this happened in real time! Eventually bots (one of the most commonly encountered forms of artificial intelligence) will begin to handle payments and campaign management in programmatic advertising, further reducing the need for human involvement.
Eventually, creative narratives will be dynamic and change according to region, customer profile, preferences etc. By using data signals to understand who they are talking to, advertisers can modify ads in real time on points such as price, language and duration without changing the core message. Saatchi & Saatchi in LA, using IBM Watson, drew on one thousand varied interests to help audiences find unusual activities. For example, one ad matched martial arts enthusiasts with barbecue lovers and encouraged them to participate in an extraordinary competition called ‘Taikwan Tenderizer,’ where they used hand-to-hand combat to tenderize meat!
What are chatbots? They can be defined as conversational computer programs. They engage in a dialogue with the customer either through text or through speech. A familiar example is Microsoft’s Clippy. As unhelpful and sometimes annoying as Clippy was, he still spoke to you and tried to help!
As chatbots become more and more commonplace, it is very likely that they will take over the domain of customer service. Bots are more personal, as they speak to you one-on-one, and these interactions are customized to your query. The conversations are quick and happen in real time; it is just like talking to your geeky friend and getting a solution for your printer problems! A Nielsen study commissioned by Facebook reveals that 56% of people would rather chat with customer service reps than talk over the phone. These automated chatbots comprehend human messages through natural language processing, a field closely tied to machine learning. Currently, most chatbots work hand in hand with human customer service agents and are not independent. This is because there is a ton of data, but people (programmers and developers) do not yet know how to use that data in a manner that is efficient and ethical. Look at Microsoft’s Tay, the chatbot that began posting anti-Semitic messages and had to be taken offline within a day of its launch!