Data Sciences Recommendations

Data Analytics, more specifically, prescriptive analytics are recommendations derived from mathematical computations. Recommendations are based on behaviors and needs in a particular context. Context is king in my book, and knowing the context allows systems to better personalize recommendations.

The goal of a recommendation system, in the context of an e-commerce application, is to predict what a customer would be willing to purchase and “suggest” it. Most of the time the suggestion is based on prior purchase patterns or purchase patterns of “similar” users. Another example of recommendation systems is suggesting documents to read based on prior reading patterns, subjects, and topics of interest from “similar” users. Essentially it would predict users’ interest based on subject or topics and then recommend relevant documents or news articles related to these topics to the user.

Recommendation systems use many different components and technologies and can be classified into two types of systems:

  • Content-based systems: Systems that examine features or characteristics of an item recommended.
  • Collaborative filtering systems: Systems that recommend items based on a similarity distance measured between users and/or items usually represented by a [User x Item] matrix.

Cold-Start Problem

One of the most challenging problems for a recommendation system is the cold-start problem. A cold-start problem is when the users have not interacted or purchased any item from the web site. Thus it is difficult to predict how well a visitor will like an item that is not scored. The absence of a rating value makes it sparse. In this case, Collaborative-Filtering algorithms do not work very well with such sparse matrices.

Sparse means that some entries in the matrix are empty. For example for movie ratings sites, the data is represented as a utility matrix. For each user-item pair, there is a value that represents what is known about the rate of preference of that user for that item. The diagram below is an example of a sparse matrix for movie ratings by users.

sparseMatrix

The challenge for recommendation systems is to fill in these blank entires by predicting the ratings based on other latent features or characteristics of the items (or movie). In 2006, Netflix held a competition called Netflix Prize that paid the winner of the best collaborative filtering model. The essence of the solution was a blend of two algorithms. The two techniques were Matrix Factorization and Restricted Boltzmann Machines (RBM). The blend of these two algorithms delivered by the Korbell team won them a $1 million prize.

To solve the cold start problem, you can always survey the users the first time they come to your site and build up an “understanding” of their preference then recommend and capture the feedback in an interactive manner. Yes, that would be ideal, however more and more users, millennials specifically, are less likely to fill your surveys. So the modern solution lies in the rebirth of these techniques blending old and new techniques like Matrix Factorization, Eigenvector, Multi-armed bandit algorithms with RBM and other probabilistic methods.