Building a movie recommendation system using content based filtering

Recommender systems are information filtering tools that aim to predict the rating or preference a user would give to an item. They have become especially important with the explosive growth of data. They are a useful alternative to search algorithms, helping a user find more personalized items in a large, dynamically generated body of information, based on the user’s profile and previous selections of movies.

These systems are built on top of an algorithm, which is often selected by intuition. When intuition fails, machine learning comes into the picture: the machine is given the ability to learn without being explicitly programmed, by exposing it to new data. An apt algorithm is then selected by comparing the accuracies of different models.

These recommender systems are important not just for service providers but for end users as well. Users have to switch from movie to movie or follow lists on websites like IMDB or Rotten Tomatoes to find movies that they prefer. This process is arduous and not always successful. So, suitable content selection becomes an important task. Recommender systems have emerged as a solution to this problem.

In recent years, recommender systems have become common and are used in various industries, movie recommendation being one of them. Users benefit from recommender systems because they receive relevant and dependable recommendations that match their own preferences. These systems typically produce a list of recommendations through either collaborative filtering or content-based filtering.

Collaborative filtering methods rely on a user-item matrix that records whether users have liked an item or not. These methods make recommendations to a user based on that user's similarity to other users. Content-based filtering, by contrast, works from the description or properties of an item, which gives more personalized results to individual users. We use content-based filtering to build our recommendation system.

PROBLEM STATEMENT AND APPROACH

The research problem we are trying to solve is building a movie recommendation system using content-based filtering. Our system recommends movies to a user by matching the attributes of movies the user has previously liked against the attributes of movies the user has not yet seen.

The model is built on top of a knowledge base which is a graph. The graph contains four types of nodes, i.e., actors, directors, movies and genres. Weights are assigned to these nodes, based on the relations between actors, directors and genres. The MovieLens data is fed into our knowledge base, which is powered by statistical and machine-learning techniques.

The technique we use to train our model is inspired by Bayesian Belief Networks. We tweaked the algorithm so that it works for our model: instead of computing posterior probabilities, we determine the affinities between the nodes. Finally, we use a breadth-first search (BFS) and a greedy search technique to produce a list of movies that could be recommended to the user.

DATASET

Our dataset can be sourced from “hetrec2011-movielens-2k” on grouplens.org. We used five files from the dataset for our project. A detailed description of these files can be found in the table below.

| File name | Number of entries | Description | Attributes |
|---|---|---|---|
| movies_actors.dat | 95321 actors | The main actors of each movie. | movieID, actorID, ranking |
| movies_directors.dat | 4060 directors | The directors of the movies. | movieID, directorID |
| movies_genres.dat | 20 genres | The genres of the movies. | movieID, genre |
| movies.dat | 10197 movies | Information about the movies in the database. | |
| user_ratedmovies.dat | 2113 users | The ratings of the movies provided by each user. | userID, movieID, rating |

To make the data easy to parse, we converted it to JSON format using “datToJson_convert.py” and stored the formatted data in “movies.json” and “likes.json”. The file “movies.json” contains the list of movies and their descriptions, which include the rating, movie id, name, director, actors and genre. The file “likes.json” contains the movie id, user id and the user's rating for the movie.
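The conversion script itself is not shown here, so the following is only a minimal sketch of what such a step could look like. It assumes the .dat files are tab-separated text with a header row; the sample rows, the function names `dat_to_records` and `records_to_likes`, and the JSON field names are all hypothetical.

```python
import csv
import io
import json

# Hypothetical excerpt in the layout assumed for user_ratedmovies.dat:
# a header row followed by tab-separated values.
SAMPLE_DAT = "userID\tmovieID\trating\n75\t3\t1.0\n75\t32\t4.5\n78\t32\t4.0\n"


def dat_to_records(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def records_to_likes(records):
    """Keep the three fields named for likes.json and give them numeric types."""
    return [
        {
            "userID": int(r["userID"]),
            "movieID": int(r["movieID"]),
            "rating": float(r["rating"]),
        }
        for r in records
    ]


likes = records_to_likes(dat_to_records(SAMPLE_DAT))
# This string is what would be written to likes.json.
likes_json = json.dumps(likes, indent=2)
```

Reading from a real file would only replace `io.StringIO(text)` with an opened file handle; the rest of the sketch stays the same.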

KNOWLEDGE REPRESENTATION

Rule-based systems are a way to store and modify knowledge and data so that useful information can be interpreted from it. Most of their applications are in artificial intelligence and research. A very common form of rule base is a knowledge graph, much like the one we have used in our system.

To increase performance, there have been a lot of hybrid systems that use a mix of content based filtering and collaborative filtering techniques. The type of rule base we have used in our system is an external knowledge graph in addition to the content-based recommendation system.

The graph we have built in our system uses a mix of data structures, most of which are in the form of lists. The main data is brought in from “movies.json” and “likes.json” and mapped to four different dictionaries, which are:

– Map a director to a list of his/her associated movies

– Map an actor to a list of his/her associated movies

– Map a genre to a list of its associated movies

– Map a user to a tuple holding a list of all movies and another list of liked movies (the likes mapping)
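The four mappings above could be built along the following lines. This is a sketch under assumptions: the record field names, the sample data, and the `LIKE_THRESHOLD` cut-off are hypothetical, and "all movies" is read here as all movies the user has rated.

```python
from collections import defaultdict

# Hypothetical records standing in for movies.json and likes.json.
movies = [
    {"id": 1, "name": "Movie A", "director": "d1", "actors": ["a1", "a2"], "genres": ["Drama"]},
    {"id": 2, "name": "Movie B", "director": "d1", "actors": ["a2"], "genres": ["Comedy", "Drama"]},
    {"id": 3, "name": "Movie C", "director": "d2", "actors": ["a3"], "genres": ["Comedy"]},
]
likes = [
    {"userID": 10, "movieID": 1, "rating": 4.5},
    {"userID": 10, "movieID": 3, "rating": 2.0},
    {"userID": 11, "movieID": 2, "rating": 4.0},
]
# Assumed cut-off: the text does not say which rating counts as a like.
LIKE_THRESHOLD = 3.5

director_movies = defaultdict(list)
actor_movies = defaultdict(list)
genre_movies = defaultdict(list)
for m in movies:
    director_movies[m["director"]].append(m["id"])
    for actor in m["actors"]:
        actor_movies[actor].append(m["id"])
    for genre in m["genres"]:
        genre_movies[genre].append(m["id"])

# user -> (all rated movies, liked movies)
likes_map = defaultdict(lambda: ([], []))
for row in likes:
    rated, liked = likes_map[row["userID"]]
    rated.append(row["movieID"])
    if row["rating"] >= LIKE_THRESHOLD:
        liked.append(row["movieID"])
```

Dictionaries of lists keep lookups from an entity to its movies cheap, which is what the graph traversal described later relies on.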

Recommendation rule:

suggest(U, M) ← affine(U, E), related(M,E), liked(U, M)

Here, affine() selects the 30 entities with the highest affinity to the user, and related() links a movie to an entity such as an actor, director or genre. User-entity affinities are computed in the training phase.
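As an illustration of the affine() and related() parts of the rule, the sketch below picks the top-k entities for one user and gathers the movies linked to them. The function names, the affinity values and the entity-to-movie table are hypothetical, and the final liked(U, M) condition of the rule is not modelled here.

```python
import heapq


def affine(user_affinity, k=30):
    """Return the k entities with the highest affinity to one user."""
    return heapq.nlargest(k, user_affinity, key=user_affinity.get)


def related_movies(entities, entity_movies):
    """Collect every movie linked to at least one of the given entities."""
    found = set()
    for entity in entities:
        found.update(entity_movies.get(entity, []))
    return found


# Hypothetical affinities of one user to two actors, a director and a genre.
user_affinity = {"a1": 0.50, "a2": 0.25, "d1": 0.75, "Drama": 0.10}
entity_movies = {"a1": [1], "a2": [1, 2], "d1": [1, 2], "Drama": [1, 2, 4]}

top = affine(user_affinity, k=2)
candidates = related_movies(top, entity_movies)
```

With k=2 the two highest-affinity entities are the director "d1" and the actor "a1", so only movies 1 and 2 become candidates.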

MACHINE LEARNING ALGORITHM

The algorithm we have used in our system is a tweaked version of Bayesian Belief Networks, adapted to our dataset. Bayesian Belief Networks are probabilistic directed graphical models in which each node is a hypothesis or a random variable, and the relations between nodes are described by conditional probabilities. Instead of the common approach of using posterior probabilities, we trained the model using affinities between entities. An affinity expresses how closely one entity is related to another. In the following pictorial representation, we show how actors, genres and directors are related to one another.

Affinity can be calculated by multiplying the likeability of both entities involved, as shown:

Likeability(actor) = |liked movies in which the actor has acted| / |total liked movies|

Likeability(director) = |liked movies directed by the director| / |total liked movies|

Affinity(actor, director) = Likeability(actor) * Likeability(director)
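A small worked example may make the formulas concrete. The sketch below computes likeability and affinity for one user; the function names and the movie ids are made up for illustration.

```python
def likeability(entity_movies, liked_movies):
    """Fraction of the user's liked movies that involve the entity."""
    liked = set(liked_movies)
    if not liked:
        return 0.0  # no liked movies, so no evidence either way
    return len(liked & set(entity_movies)) / len(liked)


def affinity(movies_a, movies_b, liked_movies):
    """Product of the two entities' likeabilities for the same user."""
    return likeability(movies_a, liked_movies) * likeability(movies_b, liked_movies)


liked = [1, 2, 3, 4]          # movies the user liked
actor_films = [1, 2, 9]       # the actor appears in 2 of the 4 liked movies
director_films = [2, 3, 4]    # the director made 3 of the 4 liked movies

score = affinity(actor_films, director_films, liked)  # 0.5 * 0.75 = 0.375
```

Here Likeability(actor) = 2/4 = 0.5 and Likeability(director) = 3/4 = 0.75, so Affinity(actor, director) = 0.375.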

FILTER/SEARCH METHOD

We used a graph traversal method that combines breadth-first search with a greedy technique. Using the knowledge graph, we traverse the top 30 directors, actors and genres associated with the user. The Cartesian product of these directors, actors and genres then gives the combinations whose affinities we determine with our Bayesian Belief Network model.

For each director, actor and genre combination, we calculate the sets of movies related to these entities. We select a list of movies for each pair in the combination, i.e. director-actor, director-genre and actor-genre. These lists of movies form our potential movies for recommendation.

The graph traversal may also collect movies that the user has already watched. This subset of watched movies has to be removed from the set of potential movies. Once the potential movies are finalized, we select the top 30 among them with the help of a heap data structure.
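The search described above can be sketched as follows. The text does not state how a movie is scored, so this sketch assumes a movie's score is the sum of the affinities of the entity pairs that produce it; the function name `recommend`, the data, and the assumption that entity keys are unique across directors, actors and genres are all hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import product


def recommend(directors, actors, genres, entity_movies, like, watched, k=30):
    """Rank unwatched movies shared by pairs of the user's top entities.

    like maps each entity to its likeability for this user; the pair
    affinity is the product of the two likeabilities.
    """
    scores = defaultdict(float)
    seen_pairs = set()
    for d, a, g in product(directors, actors, genres):
        for x, y in ((d, a), (d, g), (a, g)):
            if (x, y) in seen_pairs:
                continue  # each pair contributes only once
            seen_pairs.add((x, y))
            shared = set(entity_movies[x]) & set(entity_movies[y])
            for movie in shared - watched:  # drop already watched movies
                scores[movie] += like[x] * like[y]
    # heapq.nlargest keeps only the k best-scoring movies.
    return heapq.nlargest(k, scores, key=scores.get)


entity_movies = {"d1": [1, 2, 5], "a1": [1, 5], "a2": [2, 3], "Drama": [1, 2, 3, 4]}
like = {"d1": 0.5, "a1": 0.5, "a2": 0.25, "Drama": 0.75}
watched = {1}

top_two = recommend(["d1"], ["a1", "a2"], ["Drama"], entity_movies, like, watched, k=2)
```

In this toy example movie 2 is reached through three pairs and ranks first, movie 5 ranks second, and movie 1 never appears because the user has already watched it.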

RESULTS

To test the recommender system, we recommended movies to a user based on their previously liked movies:

1. We randomly selected 40% of the users as test users.

2. For each test user, we removed 40% of the liked movies from the training phase and stored them in an array ‘testLikesMap’.

3. We trained the model and used it to recommend movies for each of the test users.

4. We then counted the recommended movies that also appear among the values stored in ‘testLikesMap’.

5. The accuracy is the number of successful recommendations divided by the number of test users.

Running this procedure, we achieved an average accuracy of 65%.
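The evaluation procedure can be sketched as below. This is not the original test harness: the function names, the fixed random seed, and the reading of a "successful recommendation" as a test user for whom at least one recommended movie appears among the held-out likes are assumptions made for illustration.

```python
import random


def split_likes(likes_by_user, user_frac=0.4, movie_frac=0.4, seed=0):
    """Hold out a share of the liked movies for a random share of the users."""
    rng = random.Random(seed)
    users = sorted(likes_by_user)
    test_users = rng.sample(users, max(1, round(user_frac * len(users))))
    train = {u: list(movies) for u, movies in likes_by_user.items()}
    test_likes_map = {}
    for u in test_users:
        liked = train[u]
        n_held = min(len(liked), max(1, round(movie_frac * len(liked))))
        held = set(rng.sample(liked, n_held))
        test_likes_map[u] = held
        train[u] = [m for m in liked if m not in held]
    return train, test_likes_map


def accuracy(recommend_fn, train, test_likes_map):
    """Share of test users with at least one held-out movie recommended."""
    hits = sum(
        1
        for user, held in test_likes_map.items()
        if held & set(recommend_fn(user, train))
    )
    return hits / len(test_likes_map)


# Hypothetical data: five users who each liked movies 1 to 5.
likes_by_user = {u: [1, 2, 3, 4, 5] for u in range(5)}
train, test_likes_map = split_likes(likes_by_user)

# Two trivial stand-ins for the trained model, used only to exercise the code.
perfect = accuracy(lambda user, data: [1, 2, 3, 4, 5], train, test_likes_map)
useless = accuracy(lambda user, data: [], train, test_likes_map)
```

A recommender that returns every movie scores 1.0 and one that returns nothing scores 0.0, which brackets the values a real model can reach under this definition.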

