Essay: Building a movie recommendation system using content based filtering

Essay details:

  • Published: 15 October 2019*
  • Last Modified: 22 July 2024

Recommender systems are information filtering tools that aim to predict the rating or preference a user would give to an item. They matter especially given the explosive growth of data. They are a useful alternative to search algorithms and help a user find more personalized items in large, dynamically generated collections of information, based on the user’s profile and previous selections of movies.

These systems are built on top of an algorithm, which is selected by intuition. When intuition fails, machine learning comes into the picture: the machine is given the ability to learn without being explicitly programmed, by exposing it to new data. An apt algorithm is then selected by comparing the accuracies of different models.

These recommender systems are important not just for service providers but for end users as well. Without them, users have to switch from movie to movie or follow lists on websites like IMDb or Rotten Tomatoes to find movies they prefer. This process is arduous and not always successful, so suitable content selection becomes an important task. Recommender systems have emerged as a solution to this problem.

In recent years, recommender systems have become common and are used in various industries, movie recommendation being one of them. Users also benefit from recommender systems because of the relevant and dependable recommendations that satisfy their own preferences. These systems typically produce a list of recommendations, either through collaborative filtering or content-based filtering.

Collaborative filtering methods rely on a user-item matrix which represents whether users have liked an item or not. These methods make recommendations to users based on their similarity to other users, while content-based filtering works on the basis of the description or properties of an item. The latter gives more personalized results to individual users. We use content-based filtering to build our recommendation system.

PROBLEM STATEMENT AND APPROACH

The research problem we are trying to solve is building a movie recommendation system using content-based filtering. Our system recommends movies to a user by matching the attributes of movies the user previously liked with the attributes of movies they have not yet seen.

The model is built on top of a knowledge base, which is a graph. The graph contains four types of nodes, i.e., actors, directors, movies and genres. Weights are assigned to these nodes, based on the relations between actors, directors and genres. The MovieLens data is fed into our knowledge base, which is powered by statistical and machine-learning techniques.

The technique we use to train our model is inspired by Bayesian Belief Networks. We tweaked the algorithm to fit our model: instead of computing posterior probabilities, we determine the affinities between the nodes. Finally, we use BFS and a greedy search technique to produce a list of movies that could be recommended to the user.

DATASET

Our dataset is sourced from “hetrec2011-movielens-2k” on grouplens.org. We used 5 files from the dataset for our project. A detailed description of these files can be found in the table below.

File Name            | Number of entries | Description of the File                                                        | Attributes
---------------------+-------------------+--------------------------------------------------------------------------------+--------------------------
movies_actors.dat    | 95321 actors      | This file contains the main actors of the movie.                               | movieID, actorID, ranking
movies_directors.dat | 4060 directors    | This file contains the directors of the movies.                                | movieID, directorID
movies_genres.dat    | 20 genres         | This file contains the genres of the movies.                                   | movieID, genre
movies.dat           | 10197 movies      | This file contains information about the movies of the database.               | id
user_ratedmovies.dat | 2113 users        | This file contains the ratings of the movies provided by each particular user. | userID, movieID, rating

To make the data easy to parse, we converted it to JSON format using “datToJson_convert.py” and stored the formatted data in “movies.json” and “likes.json”. The file “movies.json” contains the list of movies and their descriptions, which include the rating, movie id, name, director, actors and genre. The file “likes.json” contains the movie id, user id and the user rating for the movie.
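The conversion step can be sketched roughly as follows. This is not the actual “datToJson_convert.py”, which is not reproduced in this essay; it assumes the .dat files are tab-separated with a header row naming the attributes from the table above, and the function names are hypothetical.

```python
import csv
import io
import json


def dat_to_records(dat_text, delimiter="\t"):
    """Parse delimited text with a header row into a list of dicts.

    Assumption: the first line names the columns (e.g. userID, movieID, rating).
    """
    reader = csv.DictReader(io.StringIO(dat_text), delimiter=delimiter)
    return [dict(row) for row in reader]


def likes_records(dat_text):
    """Keep the three attributes the essay lists for likes.json, with numeric types."""
    return [
        {
            "userID": int(row["userID"]),
            "movieID": int(row["movieID"]),
            "rating": float(row["rating"]),
        }
        for row in dat_to_records(dat_text)
    ]


# Tiny made-up sample standing in for user_ratedmovies.dat.
sample = "userID\tmovieID\trating\n75\t3\t1.0\n75\t32\t4.5\n"
likes_json = json.dumps(likes_records(sample), indent=2)
```

Writing `likes_json` to a file would give something in the spirit of “likes.json”; the exact field names in the real file may differ.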

KNOWLEDGE REPRESENTATION

Rule-based systems are a way to store and modify knowledge and data in order to interpret information from it in a useful way. Most of their applications are in artificial intelligence and research. A very common form of rule base is a knowledge graph, much like the one we have used in our system.

To increase performance, there have been a lot of hybrid systems that use a mix of content-based filtering and collaborative filtering techniques. The type of rule base we have used in our system is an external knowledge graph in addition to the content-based recommendation system.

The graph we have built in our system uses a mix of data structures, most of which are in the form of lists. The main data is brought in from “movies.json” and “likes.json” and mapped to four different dictionaries, which are:

– Map a director to a list of their associated movies

– Map an actor to a list of their associated movies

– Map a genre to a list of its associated movies

– Likes mapping (a user is mapped to a tuple with a list of all movies and another list of liked movies)
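The four dictionaries above could be built along these lines. This is a minimal sketch: the field names (`id`, `director`, `actors`, `genres`, `userID`, `movieID`, `rating`) and the rating threshold that decides what counts as “liked” are assumptions for illustration, since the essay does not state them.

```python
from collections import defaultdict


def build_entity_maps(movies):
    """Map each director, actor and genre to the list of its movie ids."""
    director_movies = defaultdict(list)
    actor_movies = defaultdict(list)
    genre_movies = defaultdict(list)
    for movie in movies:
        director_movies[movie["director"]].append(movie["id"])
        for actor in movie["actors"]:
            actor_movies[actor].append(movie["id"])
        for genre in movie["genres"]:
            genre_movies[genre].append(movie["id"])
    return director_movies, actor_movies, genre_movies


def build_likes_map(likes, like_threshold=3.0):
    """Map each user to a tuple (all rated movies, liked movies).

    Assumption: a movie counts as liked when its rating is at least
    like_threshold; the essay does not give the actual cut-off.
    """
    likes_map = {}
    for record in likes:
        all_movies, liked = likes_map.setdefault(record["userID"], ([], []))
        all_movies.append(record["movieID"])
        if record["rating"] >= like_threshold:
            liked.append(record["movieID"])
    return likes_map


# Made-up records standing in for movies.json and likes.json.
movies = [
    {"id": 1, "director": "d1", "actors": ["a1", "a2"], "genres": ["Drama"]},
    {"id": 2, "director": "d1", "actors": ["a2"], "genres": ["Drama", "Comedy"]},
]
likes = [
    {"userID": 7, "movieID": 1, "rating": 4.0},
    {"userID": 7, "movieID": 2, "rating": 2.0},
]
director_movies, actor_movies, genre_movies = build_entity_maps(movies)
likes_map = build_likes_map(likes)
```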

Recommendation rule:

suggest(U, M) ← affine(U, E), related(M, E), liked(U, M)

where affine() selects the top 30 entities with the highest affinity to the user. User-entity affinities are computed in the training phase.

MACHINE LEARNING ALGORITHM

The algorithm we have used in our system is a tweaked version of Bayesian Belief Networks, adapted to our dataset. Bayesian Belief Networks are probabilistic directed graphical models in which each node is a hypothesis or a random process. Edges between nodes represent conditional dependencies, quantified by conditional probabilities. Instead of the common approach of using posterior probabilities, we trained the model using affinities between entities. An affinity expresses how closely one entity is related to another. In the following pictorial representation, we show how actors, genres and directors are related to one another.

Affinity can be calculated by multiplying the likeability of both entities involved, as shown:

Likeability(actor) = |liked movies in which actor has acted| / |total liked movies|

Likeability(director) = |liked movies which were directed by said director| / |total liked movies|

Affinity(actor, director) = Likeability(actor) * Likeability(director)
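The two formulas translate directly into code. The sketch below is illustrative only; the function names and the sample movie ids are made up, and returning 0.0 for a user with no liked movies is an assumption about an edge case the essay does not cover.

```python
def likeability(entity_movies, liked_movies):
    """|liked movies associated with the entity| / |total liked movies|."""
    liked = set(liked_movies)
    if not liked:
        return 0.0  # assumption: no liked movies means no evidence
    return len(liked & set(entity_movies)) / len(liked)


def affinity(movies_a, movies_b, liked_movies):
    """Product of the likeabilities of the two entities."""
    return likeability(movies_a, liked_movies) * likeability(movies_b, liked_movies)


liked = [1, 2, 3, 4]        # movies the user liked
actor_films = [1, 2, 9]     # the actor appears in two liked movies
director_films = [2, 3, 4]  # the director made three liked movies

actor_score = likeability(actor_films, liked)        # 2 / 4
director_score = likeability(director_films, liked)  # 3 / 4
pair_affinity = affinity(actor_films, director_films, liked)
```

For this example the actor has likeability 0.5, the director 0.75, and their affinity is 0.375.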

FILTER/SEARCH METHOD

We used a graph traversal method that combines breadth-first search with a greedy technique. From the knowledge graph, we traverse the top 30 directors, actors and genres associated with the user. Afterwards, the Cartesian product of directors, actors and genres is used to determine their affinities, which we calculate with our Bayesian Belief Network model.

From each director, actor and genre combination, we calculate sets of movies related to all three of our entities. We select a list of movies for each combination, i.e. director-actor, director-genre and actor-genre. These lists of movies make up our potential list of movies for recommendation.

Through graph traversal, we may sometimes also collect movies that a user has already watched. This subset of watched movies has to be removed from the set of potential movies. Once we have finalized the potential movies, we select the top 30 among them with the help of a heap data structure.
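The search described in this section could look roughly like the sketch below. Several details are assumptions rather than the essay's stated method: entities are ranked by likeability, a candidate movie is scored by the highest affinity of any entity pair it belongs to, and the expansion goes one hop from the top entities to their movies (a single breadth-first level). All names are hypothetical.

```python
import heapq
from itertools import product


def top_entities(entity_movies, liked, k=30):
    """Greedy step: keep the k entities with the highest likeability."""
    liked = set(liked)

    def score(entity):
        if not liked:
            return 0.0
        return len(liked & set(entity_movies[entity])) / len(liked)

    return heapq.nlargest(k, ((score(e), e) for e in entity_movies))


def recommend(director_movies, actor_movies, genre_movies, seen, liked, k=30):
    """Return up to k unseen movies ranked by their best pair affinity."""
    maps = [director_movies, actor_movies, genre_movies]
    tops = [top_entities(m, liked, k) for m in maps]
    seen = set(seen)
    scores = {}
    # Pairwise combinations: director-actor, director-genre, actor-genre.
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        for (s1, e1), (s2, e2) in product(tops[i], tops[j]):
            pair_affinity = s1 * s2
            if pair_affinity == 0:
                continue
            shared = set(maps[i][e1]) & set(maps[j][e2])
            for movie in shared - seen:  # drop movies already watched
                scores[movie] = max(scores.get(movie, 0.0), pair_affinity)
    # Heap-based selection of the k best-scoring candidates.
    best = heapq.nlargest(k, ((s, m) for m, s in scores.items()))
    return [movie for _, movie in best]


# Made-up example data.
director_movies = {"d1": [1, 2, 3, 6], "d2": [4]}
actor_movies = {"a1": [1, 3, 6], "a2": [4, 5]}
genre_movies = {"Drama": [1, 2, 3, 5], "Comedy": [4, 6]}
recommended = recommend(
    director_movies, actor_movies, genre_movies, seen=[1, 2, 4], liked=[1, 2]
)
```

In this example movie 3 is ranked first (it shares the liked director and the liked genre) and movie 6 second (it shares the liked director and a partly liked actor); movie 5 is never suggested because none of its entity pairs has a non-zero affinity.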

RESULTS

To test the recommender system, we recommended movies to a user based on their previously liked movies:

1. We randomly selected 40% of the users as test users.

2. For each test user, we removed 40% of the liked movies from the training phase and stored them in an array ‘testLikesMap’.

3. We trained the model and used it to recommend movies for each of the test users.

4. We then counted the number of recommended movies in common with the values we stored in ‘testLikesMap’.

5. The accuracy is the number of successful recommendations / number of test users.

In running it, we were able to achieve an average accuracy of 65%.
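The evaluation procedure can be sketched as below. Two points are assumptions: the likes are simplified to a dictionary from user to liked movie ids, and a recommendation counts as “successful” for a test user when at least one held-out movie appears among that user's recommendations. The essay's wording leaves the exact definition open, so the 65% figure should not be read as coming from this code.

```python
import random


def split_likes(likes_map, user_fraction=0.4, like_fraction=0.4, seed=0):
    """Hold out a share of liked movies for a random share of users."""
    rng = random.Random(seed)
    users = sorted(likes_map)
    test_users = rng.sample(users, int(len(users) * user_fraction))
    train = dict(likes_map)
    test_likes_map = {}
    for user in test_users:
        liked = list(likes_map[user])
        held_out = set(rng.sample(liked, int(len(liked) * like_fraction)))
        test_likes_map[user] = held_out
        train[user] = [m for m in liked if m not in held_out]
    return train, test_likes_map


def accuracy(recommendations, test_likes_map):
    """Share of test users with at least one held-out movie recommended."""
    if not test_likes_map:
        return 0.0
    hits = sum(
        1
        for user, held_out in test_likes_map.items()
        if held_out & set(recommendations.get(user, []))
    )
    return hits / len(test_likes_map)


# Five made-up users with five liked movies each.
likes_map = {u: [u * 10 + i for i in range(5)] for u in range(1, 6)}
train, test_likes_map = split_likes(likes_map)

# One of two test users gets a held-out movie back.
example_accuracy = accuracy({1: [10, 11], 2: [12]}, {1: {10}, 2: {99}})
```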

About this essay:

If you use part of this page in your own work, you need to provide a citation, as follows:

Essay Sauce, Building a movie recommendation system using content based filtering. Available from:<https://www.essaysauce.com/computer-science-essays/2017-11-30-1512013581/> [Accessed 11-04-26].

* This essay may have been previously published on EssaySauce.com and/or Essay.uk.com at an earlier date than indicated.