Matrix Factorization Part III: Production phase

Haneul Kim
Nov 6, 2022


Going from a simple paper implementation to what we need to do to be production ready.

Photo by Haneul Kim

In previous articles we successfully trained user and item embeddings using Matrix Factorization and tested them on our training dataset. We simply predicted scores for (user, item) pairs that already exist in the data. However, we are building a recommender system, which means that given a user, n items must be recommended.

Today we will write code to recommend n items given a user, and also handle cold-start users. The cold-start problem refers to users and items that enter our service with little or no previous data. This is a common problem in recommender systems, and a lot of research aims to recommend good items to cold-start users as well. Some remedies include recommending the most popular items, recommending similar items, and using models that generalize well.
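The "similar items" remedy can reuse the trained item embeddings: if a new user has interacted with even one movie, the movies closest to it in embedding space are natural candidates. Below is a minimal sketch, assuming the item embeddings are a NumPy array with one row per item. `similar_items` is a hypothetical helper rather than part of this series' code, and cosine similarity is just one common choice of similarity measure.

```python
import numpy as np

def similar_items(item_emb: np.ndarray, item_idx: int, k: int = 3) -> np.ndarray:
    """Return row indices of the k items most cosine-similar to item_idx."""
    # Normalize each row so that a dot product equals cosine similarity
    norms = np.linalg.norm(item_emb, axis=1, keepdims=True)
    unit = item_emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit[item_idx]
    sims[item_idx] = -np.inf  # never recommend the seed item itself
    return np.argsort(-sims)[:k]

# Toy item embeddings (2 latent factors); item 1 points almost the same way as item 0
item_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(similar_items(item_emb, item_idx=0, k=2))  # -> [1 2]
```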

Now, let’s recommend some movies to users :)

For each user in the test dataset, we will recommend the top-3 most relevant movies. For cold-start users, we will recommend the 3 most popular movies.

Lines 14~16, 18, and 20 are cold-start users; the others are warm users.
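The overall flow can be sketched as a single dispatch: if the user has a trained embedding, score every movie; otherwise fall back to the popularity ranking. This is a minimal sketch under stated assumptions: the random toy embeddings, `user_to_idx`, `item_ids`, `popular_top` and `recommend` are hypothetical stand-ins for the trained model's state (in the article the popularity ranking lives in `self.mp_df_sorted`).

```python
import numpy as np

# Hypothetical trained state (toy values): embeddings, a userId -> row map,
# the movieId of each item row, and a precomputed popularity ranking.
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(3, 4))    # 3 known users, 4 latent factors
item_emb = rng.normal(size=(10, 4))   # 10 movies
user_to_idx = {101: 0, 102: 1, 103: 2}
item_ids = np.arange(1000, 1010)      # movieId for each item row
popular_top = [1007, 1002, 1005]      # assumed popularity order

def recommend(user_id, k=3):
    """Return k movieIds: personalized for warm users, most popular for cold users."""
    if user_id not in user_to_idx:    # cold-start: no trained embedding exists
        return popular_top[:k]
    u = user_emb[user_to_idx[user_id]]
    scores = item_emb @ u             # predicted relevance of every movie
    top = np.argsort(-scores)[:k]     # most relevant first
    return item_ids[top].tolist()

print(recommend(101))  # warm user: personalized top-3
print(recommend(999))  # cold user -> [1007, 1002, 1005]
```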

When a warm user (not cold-start) enters our service, we look up their trained user embedding and pass it to get_topK() to get recommendations.

  • line 9: replicate user_emb n times, where n is the number of items that can be recommended.
  • line 10: calculate the predicted relevance scores using a vectorized method.
  • lines 13~14: sort the items from most to least relevant, then return them.
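The three steps above can be sketched in NumPy as follows. This is a minimal reconstruction under assumptions, not the article's original code (the bullet line numbers refer to that original): the signature `get_topK(user_emb, item_emb, item_ids, k)` and the use of `np.tile` for the replication step are illustrative choices.

```python
import numpy as np

def get_topK(user_emb, item_emb, item_ids, k=3):
    """Rank every item for one user; return the k most relevant ids and their scores."""
    n_items = item_emb.shape[0]
    # Step 1: replicate the user vector once per item -> shape (n_items, n_factors)
    user_mat = np.tile(user_emb, (n_items, 1))
    # Step 2: vectorized relevance score = row-wise dot product of user and item factors
    scores = np.sum(user_mat * item_emb, axis=1)
    # Step 3: sort from most to least relevant and keep the top k
    order = np.argsort(-scores)[:k]
    return item_ids[order], scores[order]

user = np.array([1.0, 0.5])
items = np.array([[0.2, 0.1], [1.0, 1.0], [0.5, -1.0], [0.9, 0.0]])
ids = np.array([10, 20, 30, 40])
top_ids, top_scores = get_topK(user, items, ids, k=3)
print(top_ids.tolist())  # -> [20, 40, 10]
```

Tiling mirrors the replicate-then-multiply description; `item_emb @ user_emb` computes the same scores without materializing the copy, which saves memory when the catalogue is large.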

Now let's see what happens when a cold-start user enters our service. We've added self.mp_df_sorted, which is created when we fit our model on the training dataset; it is basically a dataframe that represents item popularity.

When a cold-start user enters the service, we simply look up movieIds from this popularity table, which is sorted by highest average rating and by the number of ratings each movie has received.
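A popularity table like `mp_df_sorted` can be built with a pandas group-by at fit time. This is a minimal sketch, assuming a MovieLens-style ratings DataFrame with `userId`, `movieId` and `rating` columns; the toy rows, the `mean`/`count` column names and the helper `recommend_cold` are illustrative, not the article's code.

```python
import pandas as pd

# Toy ratings (hypothetical data)
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3, 4],
    "movieId": [10, 20, 10, 30, 10, 20, 30],
    "rating":  [5.0, 4.0, 4.0, 5.0, 5.0, 3.0, 5.0],
})

# Average rating and number of ratings per movie (index = movieId)
mp_df = ratings.groupby("movieId")["rating"].agg(["mean", "count"])
# Highest average rating first; ties broken by the number of ratings
mp_df_sorted = mp_df.sort_values(["mean", "count"], ascending=False)

def recommend_cold(k=3):
    """Cold-start fallback: the k most popular movieIds."""
    return mp_df_sorted.index[:k].tolist()

print(recommend_cold())  # -> [30, 10, 20]
```

In the toy data, movie 30 ranks first with only two ratings; requiring a minimum number of ratings before a movie enters the table is a common way to keep rarely rated movies from dominating.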

When serving a recommender engine in production, here are a few things that I always consider:

  • latency
  • computing cost
  • memory
  • training time
  • inference time
  • how to handle cold-start users

Possible improvements:

  1. Reduce search space: notice that for each user we need to predict ratings for every movieId, which is very inefficient and slow. One remedy is to cluster the movies and then rank/predict only the movies within the relevant cluster. This is commonly used in production and is referred to as candidate generation and ranking.
  2. Batch inference / Online inference: We've used a simple model, so online inference is feasible. However, companies often use more complex models that take longer to output predictions, and when many users arrive at once such models have a hard time serving all of them in real time. In that case, predictions can be calculated in batches ahead of time so that serving becomes a simple look-up of the top-k items, which saves a lot of time and computing power. Depending on your model, it's okay to fall back to batch inference.
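The candidate-generation idea in point 1 can be sketched with plain k-means over the item embeddings: pick the cluster whose centroid scores highest for the user, then rank only the movies in that cluster. This is a minimal NumPy sketch under assumptions. K-means, the centroid-scoring rule, and the helper names `assign`, `kmeans` and `recommend_from_cluster` are illustrative choices, not the article's method, and a library k-means implementation would normally replace the hand-written loop.

```python
import numpy as np

def assign(x, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of x."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(x, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means over item embeddings; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(x, centroids)
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members):  # leave an empty cluster's centroid where it is
                centroids[c] = members.mean(axis=0)
    return centroids, assign(x, centroids)

def recommend_from_cluster(user_vec, item_emb, centroids, labels, k=3):
    # Candidate generation: the cluster whose centroid scores highest for this user
    best = int(np.argmax(centroids @ user_vec))
    candidates = np.flatnonzero(labels == best)
    # Ranking: score only the candidates instead of the full catalogue
    scores = item_emb[candidates] @ user_vec
    return candidates[np.argsort(-scores)[:k]]

item_emb = np.array([[5.0, 0.0], [5.1, 0.2], [4.9, -0.1], [5.2, 0.1],  # group A
                     [0.0, 5.0], [0.1, 5.2], [-0.2, 4.9]])             # group B
centroids, labels = kmeans(item_emb, n_clusters=2)
user_vec = np.array([1.0, 0.0])
print(recommend_from_cluster(user_vec, item_emb, centroids, labels))  # -> [3 1 0]
```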
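Point 2 can be sketched as an offline job that scores all users against all items with one matrix product, keeps each user's top-k, and stores the result so that online serving becomes a dictionary look-up. This is a minimal sketch under assumptions: the toy embeddings and ids, the `popular` fallback list, and the helpers `batch_topk` and `serve` are hypothetical.

```python
import numpy as np

def batch_topk(user_emb, item_emb, k=3):
    """Offline job: score every (user, item) pair at once; return each user's top-k item rows."""
    scores = user_emb @ item_emb.T                       # shape (n_users, n_items)
    # argpartition moves each row's k highest scores to the front without a full sort
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    # order those k candidates from best to worst
    order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)

# Hypothetical trained state: 2 users, 4 movies, 2 latent factors
user_ids = [101, 102]
item_ids = np.array([10, 20, 30, 40])
user_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
item_emb = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [1.0, 0.0]])

# Batch step (e.g. run on a schedule): precompute a look-up table of recommendations
top_rows = batch_topk(user_emb, item_emb, k=3)
recs = {uid: item_ids[row].tolist() for uid, row in zip(user_ids, top_rows)}

popular = [20, 40, 10]  # hypothetical popularity fallback for cold-start users

def serve(user_id):
    # Online step: no model call, just a dictionary look-up
    return recs.get(user_id, popular)

print(serve(101))  # -> [40, 10, 30]
print(serve(102))  # -> [20, 30, 10]
```

The full score matrix grows with users × items, so for a large catalogue the batch step would be run over chunks of users to bound memory.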


Even though it is important to focus on the recommender model and its performance metrics, if you look from your users' and your company's perspective, there are much more important metrics to consider. It's important for Data Scientists to understand these and apply them so that both the company and its users are satisfied. What I've learned over the past few years is that it's easy to get stuck building a more complex model that only slightly increases performance; you must always step back and think about the users and the company's future.


