AIML - Recommendation Systems

 Two techniques used

  • Content-based filtering
    • Recommends items whose content (genre, category, description) resembles what the user already likes
    • If User1 and User2 both watch Action & Adventure movies, a new movie watched by User1 is suggested to User2 because it belongs to a category User2 already watches.
    • e.g. Amazon online shopping
  • Collaborative filtering
    • Identifies user behavior and groups users with similar behavior. New movies are suggested based on what the other users in the group watched or rated, irrespective of genre/category
    • e.g. Netflix, Amazon Prime

Movie genres (Action, Comedy, Adventure, Romantic)

Content-based recommendation system design - Movie recommendation system

  • Import data from TMDB (movie details with overview)
  • from sklearn.feature_extraction.text import TfidfVectorizer
    • Remove stop words and special characters; replace NaN values with blanks
    • Call fit_transform on the movies' OVERVIEW field to get a sparse TF-IDF matrix
  • from sklearn.metrics.pairwise import sigmoid_kernel
    • Note: sigmoid_kernel computes tanh(gamma * <x, y> + coef0), so every score lies between -1 and 1
    • Apply sigmoid_kernel to the sparse matrix (pass the same matrix as both arguments to score how each overview relates to every other overview)
  • Build a reverse mapping between movie titles and row indices
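
The design steps above can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the four-row `movies` table stands in for the TMDB data, and the `recommend` helper and its `top_n` parameter are hypothetical names chosen for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import sigmoid_kernel

# Hypothetical stand-in for the TMDB movie table (title + overview).
movies = pd.DataFrame({
    "title": ["Space Raiders", "Galaxy Patrol", "Love in Paris", "Paris Romance"],
    "overview": [
        "A crew of space pirates fights an alien fleet",
        "Space soldiers defend the galaxy from an alien fleet",
        "Two strangers fall in love in Paris",
        None,  # missing overview, replaced with a blank below
    ],
})

# Replace NaN values with blanks so the vectorizer receives only strings.
movies["overview"] = movies["overview"].fillna("")

# Drop English stop words; the default tokenizer already ignores punctuation.
tfidf = TfidfVectorizer(stop_words="english", strip_accents="unicode")
tfidf_matrix = tfidf.fit_transform(movies["overview"])  # sparse matrix

# Same matrix on both sides: sig[i, j] scores overview i against overview j.
sig = sigmoid_kernel(tfidf_matrix, tfidf_matrix)

# Reverse mapping: movie title -> row index.
indices = pd.Series(movies.index, index=movies["title"])

def recommend(title, top_n=2):
    """Return the top_n titles whose overviews score highest against `title`."""
    idx = indices[title]
    scores = sorted(enumerate(sig[idx]), key=lambda pair: pair[1], reverse=True)
    top = [i for i, _ in scores if i != idx][:top_n]
    return movies["title"].iloc[top].tolist()
```

Calling `recommend("Space Raiders", top_n=1)` on this toy table picks the other space movie, since it is the only overview sharing terms with the query.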

Recommender systems

  • Collaborative filtering - Book recommendation system using K nearest neighbors
    • Online shopping - suggestion to buy accessories along with an iPhone
    • Online movies - suggestion based on user ratings
  • Content-based filtering
    • Netflix - recommendation based on movie type

 

  • Weighted average of each movie's average rating
    • W = (R*v + C*m) / (v + m)
      • W - weighted rating
      • R - average rating for the movie (0 to 10)
      • v - number of votes for the movie
      • m - minimum votes required to be in the top 250
      • C - the mean vote across the whole report (all movies)
    • Recommendation based on both weighted average & popularity
      • Import MinMaxScaler from sklearn.preprocessing
      • Transform weighted average & popularity onto a common scale (0-1 by default; feature_range sets another range such as 0-5)
      • Compute a score as the sum of each transformed value * the % weightage given to it
      • Order by score, descending
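
A small sketch of the weighted rating and the blended score, under stated assumptions: the four movies are invented, `m` is taken as the 70th percentile of vote counts (the notes do not say how `m` is chosen), and the 50/50 weightage is only an example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical movie data; column names are assumptions for this sketch.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_average": [8.0, 9.5, 6.0, 7.0],   # R
    "vote_count": [1000, 20, 500, 3000],    # v
    "popularity": [50.0, 5.0, 80.0, 120.0],
})

C = movies["vote_average"].mean()          # mean vote across all movies
m = movies["vote_count"].quantile(0.70)    # assumed cutoff for minimum votes

R = movies["vote_average"]
v = movies["vote_count"]

# W = (R*v + C*m) / (v + m): movies with few votes are pulled toward C.
movies["weighted_average"] = (R * v + C * m) / (v + m)

# Bring both signals onto the same 0-1 scale before mixing them.
scaled = MinMaxScaler().fit_transform(movies[["weighted_average", "popularity"]])

# Example weightage: 50% weighted average, 50% popularity.
movies["score"] = 0.5 * scaled[:, 0] + 0.5 * scaled[:, 1]

ranked = movies.sort_values("score", ascending=False)
```

Movie "B" has the highest raw rating but only 20 votes, so its weighted rating lands close to the overall mean `C` rather than near 9.5.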
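  • Correlation
    • Download movie & user ratings data
    • Get the mean rating per movie, keeping only movies with more than 100 user ratings
    • Create a pivot table named moviemat with user id as the index, title as the columns and rating as the values
    • Correlate moviemat with the Star Wars user-rating column (e.g. with corrwith) to find movies rated similarly to Star Wars.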
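
The correlation steps might look like the sketch below. The ratings are made up, and `min_ratings` is set to 3 only so the toy data passes the filter; the notes use a threshold of 100.

```python
import pandas as pd

# Hypothetical ratings: 4 users each rate the same 3 movies.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "title": ["Star Wars", "Empire", "Titanic"] * 4,
    "rating": [5, 5, 1, 4, 4, 2, 2, 1, 5, 1, 2, 4],
})

min_ratings = 3  # assumption for the toy data; the notes use 100

# Mean rating and number of ratings per movie.
stats = ratings.groupby("title")["rating"].agg(["mean", "count"])

# Rows are users, columns are movie titles, cells are ratings.
moviemat = ratings.pivot_table(index="user_id", columns="title", values="rating")

# Pearson correlation of every movie column with the Star Wars column.
star_wars_ratings = moviemat["Star Wars"]
similar = moviemat.corrwith(star_wars_ratings)

corr = pd.DataFrame({"correlation": similar}).join(stats["count"])
result = (
    corr[corr["count"] > min_ratings]
    .drop(index="Star Wars")                 # a movie always matches itself
    .sort_values("correlation", ascending=False)
)
```

Filtering on the rating count matters because a correlation computed from only a handful of shared raters can be high by chance.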
  • Nearest Neighbours
    • Online shopping - suggestion to buy accessories along with an iPhone
    • Online movies - suggestion based on ratings from other, similar users
    • Download movie & user ratings data
    • Get the mean rating per movie, keeping only movies with more than 100 user ratings
    • Create a pivot table named moviemat from user id & title. kneighbors() compares rows, so each movie needs to be a row (users as columns), with missing ratings filled with 0
    • Import NearestNeighbors from sklearn.neighbors (this is unsupervised ML)
    • Create the model by instantiating NearestNeighbors with metric='cosine' and algorithm='brute' (cosine distance in place of the default Euclidean distance), and call fit() on the pivot table.
    • Call kneighbors(), passing a randomly chosen movie & the number of neighbours needed.
    • The method returns 2 outputs: distances & indices
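
A minimal sketch of the nearest-neighbours steps, assuming a tiny invented ratings table. The `similar_movies` helper is a hypothetical wrapper around `kneighbors()`, not part of scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical ratings; each user rates three of the four movies.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "title": ["Star Wars", "Empire", "Titanic",
              "Star Wars", "Empire", "Notebook",
              "Titanic", "Notebook", "Star Wars",
              "Titanic", "Notebook", "Empire"],
    "rating": [5, 5, 1, 4, 5, 1, 5, 4, 1, 4, 5, 2],
})

# One row per movie, one column per user; unrated cells become 0.
movie_user = ratings.pivot_table(
    index="title", columns="user_id", values="rating"
).fillna(0)

# Brute-force search with cosine distance.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(movie_user.values)

def similar_movies(title, k=2):
    """Return up to k (title, cosine distance) pairs closest to `title`."""
    query = movie_user.loc[[title]].values
    # Ask for one extra neighbour because the query movie matches itself.
    distances, indices = model.kneighbors(query, n_neighbors=k + 1)
    pairs = [
        (movie_user.index[i], float(d))
        for d, i in zip(distances[0], indices[0])
        if movie_user.index[i] != title
    ]
    return pairs[:k]

# Random choice of movie, as in the notes.
random_title = np.random.choice(movie_user.index)
neighbours = similar_movies(random_title, k=2)
```

A smaller cosine distance means the two movies were rated in a more similar pattern across users.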
  • Book recommendation purely on ratings using Pearson correlation
    • For each book, take the average rating & the rating count
    • To ensure statistical significance, users with fewer than 200 ratings (across books) and books with fewer than 100 ratings are excluded.
    • Create a pivot table with user id & book id
    • Correlate the required book's ratings with the pivot table.
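
The exclusion step can be sketched as below. The `filter_ratings` helper, the column names and the toy thresholds (3 and 2 instead of 200 and 100) are assumptions for illustration; the sketch also assumes book counts are taken after the inactive users are removed, which the notes do not specify.

```python
import pandas as pd

def filter_ratings(ratings, min_user_ratings=200, min_book_ratings=100):
    """Keep active users first, then books that still have enough ratings."""
    user_counts = ratings["user_id"].value_counts()
    keep_users = user_counts[user_counts >= min_user_ratings].index
    filtered = ratings[ratings["user_id"].isin(keep_users)]

    book_counts = filtered["book_id"].value_counts()
    keep_books = book_counts[book_counts >= min_book_ratings].index
    return filtered[filtered["book_id"].isin(keep_books)]

# Hypothetical ratings: user 4 rated one book, book "d" has one rating.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    "book_id": ["a", "b", "c", "d", "a", "b", "c", "a", "b", "c", "a"],
    "rating":  [5, 4, 1, 3, 4, 5, 2, 1, 2, 5, 5],
})

# Toy thresholds so the small example keeps some rows.
kept = filter_ratings(ratings, min_user_ratings=3, min_book_ratings=2)

# Pivot: users as rows, books as columns, then correlate with book "a".
pivot = kept.pivot_table(index="user_id", columns="book_id", values="rating")
corr_with_a = pivot.corrwith(pivot["a"]).drop("a").sort_values(ascending=False)
```

Here user 4 and book "d" drop out before the pivot is built, so they cannot distort the correlations.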
  • Book recommendation using collaborative filtering (KNN)
    • For each book, take the average rating & the rating count
    • To ensure statistical significance, users with fewer than 200 ratings (across books) and books with fewer than 100 ratings are excluded.
    • Create a pivot table with user id & book id; for KNN each book needs to be a row, with missing ratings filled with 0
    • Import NearestNeighbors from sklearn.neighbors (this is unsupervised ML)
    • Create the model by instantiating NearestNeighbors with metric='cosine' and algorithm='brute' (cosine distance in place of the default Euclidean distance), and call fit() on the pivot table.
    • Call kneighbors(), passing a randomly chosen book & the number of neighbours needed.
    • The method returns 2 outputs: distances & indices
  • Content based Recommendation system
