Matrix Factorization Part II: SGD, optimized implementation

Haneul Kim
2 min read · Nov 1, 2022
Photo by Haneul Kim

Previously we covered one collaborative filtering method, matrix factorization, in detail, specifically using stochastic gradient descent to learn the embedding vectors. The algorithm itself is important, but how it is implemented is equally important (especially in a business setting) because at the end of the day we need to serve these algorithms to our users. Today we will improve our previous implementation.

The main bottleneck of the previous implementation was the number of for-loops: we needed to loop over each user and, inside that loop, over each of their items. So my main focus was to get rid of the for-loops and, instead of operating row by row, vectorize operations as much as possible.
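For reference, the previous loop-based implementation looked roughly like this. This is a minimal sketch rather than the exact notebook code; the function name, hyperparameters, and the `rating` column are assumptions on my part:

```python
import numpy as np
import pandas as pd

def train_mf_loops(ratings: pd.DataFrame, n_factors=16, lr=0.01,
                   reg=0.02, n_epochs=10, seed=42):
    rng = np.random.default_rng(seed)
    # one embedding vector per user / item, kept in dictionaries
    U = {u: rng.normal(0, 0.1, n_factors) for u in ratings["userId"].unique()}
    V = {i: rng.normal(0, 0.1, n_factors) for i in ratings["movieId"].unique()}

    for _ in range(n_epochs):
        # loop over every (user, item, rating) interaction one at a time
        for row in ratings.itertuples(index=False):
            u, i, r = row.userId, row.movieId, row.rating
            err = r - U[u] @ V[i]             # dot product = predicted score
            # plain SGD update with L2 regularization
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V
```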

This works fine for small datasets; however, even at only 10_000 rows it takes 478s to run 10 epochs, which is not fast enough to be productionized.

The main techniques that improved performance are as follows (both are shown in the sketch right after this list):

  1. map each userId (and movieId) to its learned embedding vector in a vectorized manner, via integer indexing into an embedding matrix instead of a per-row lookup
  2. to get the prediction score of a (user, item) pair we take a dot product, which is simply an element-wise multiplication followed by a sum. Since we work with a DataFrame where each row is one (user, item) interaction, we can get predictions for all interactions at once by applying that element-wise multiplication and row-wise summation to every row simultaneously.
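Put together, the two steps reduce prediction for every row to a couple of array operations. A minimal self-contained sketch; the names `U`, `V`, `user_idx`, and `item_idx` are my own, and in the real code the index arrays come from the re-indexing step below:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 5, 7, 4
U = rng.normal(size=(n_users, n_factors))    # user embedding matrix
V = rng.normal(size=(n_items, n_factors))    # item embedding matrix

# one entry per (user, item) interaction, ids already re-indexed to 0..n-1
user_idx = np.array([0, 0, 3, 4])
item_idx = np.array([2, 5, 1, 6])

# 1. vectorized lookup: map every id to its embedding vector in one shot
user_vecs = U[user_idx]                      # shape (n_interactions, n_factors)
item_vecs = V[item_idx]

# 2. row-wise dot product = element-wise multiply, then sum over the factor axis
preds = (user_vecs * item_vecs).sum(axis=1)  # shape (n_interactions,)
```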

We will process 10_000 rows of data, which looks like this:
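For reference, a sketch of how such a 10_000-row sample could be built, assuming a MovieLens-style ratings file with userId, movieId, and rating columns (the file name is an assumption):

```python
import pandas as pd

# assumption: a MovieLens-style file with userId, movieId, rating columns
ratings = pd.read_csv("ratings.csv")
df = ratings[["userId", "movieId", "rating"]].head(10_000).copy()
print(df.head())
```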

The first thing we need to do is re-index userId and movieId so that they range over [0, number of unique users/items - 1]; this is because we need to look up embedding vectors efficiently by position.
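A minimal sketch of this re-indexing, continuing with the `df` from above; `pd.factorize` assigns each unique id a dense integer code in exactly that range (the `user_idx` / `item_idx` column names are my own):

```python
import pandas as pd

# pd.factorize maps each unique id to a dense code in [0, n_unique - 1]
df["user_idx"], unique_users = pd.factorize(df["userId"])
df["item_idx"], unique_items = pd.factorize(df["movieId"])

n_users, n_items = len(unique_users), len(unique_items)
```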

This re-indexing only took 0.013s.

Now here is the code for the optimized version, which took only 1.332s to train, almost 400x faster.
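The exact code lives in the notebook linked below; what follows is my own reconstruction of the idea as a sketch, assuming the per-interaction updates are replaced by one vectorized, accumulated gradient step over all rows per epoch (the function name and hyperparameters are mine):

```python
import numpy as np
import pandas as pd

def train_mf_vectorized(df: pd.DataFrame, n_users, n_items, n_factors=16,
                        lr=0.01, reg=0.02, n_epochs=10, seed=42):
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 0.1, (n_users, n_factors))   # user embedding matrix
    V = rng.normal(0, 0.1, (n_items, n_factors))   # item embedding matrix

    u_idx = df["user_idx"].to_numpy()
    i_idx = df["item_idx"].to_numpy()
    r = df["rating"].to_numpy()

    for _ in range(n_epochs):
        # vectorized prediction and error for every interaction at once
        err = (r - (U[u_idx] * V[i_idx]).sum(axis=1))[:, None]

        # per-interaction gradients, scattered back to the embedding rows;
        # np.add.at handles user/item ids that appear more than once
        dU = np.zeros_like(U)
        dV = np.zeros_like(V)
        np.add.at(dU, u_idx, err * V[i_idx] - reg * U[u_idx])
        np.add.at(dV, i_idx, err * U[u_idx] - reg * V[i_idx])

        U += lr * dU
        V += lr * dV
    return U, V
```

Note that this sketch applies one aggregated update per epoch instead of one update per interaction, which is what makes the vectorization possible in the first place; `np.add.at` is used because plain fancy-index assignment would silently drop contributions from repeated ids.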

One remaining bottleneck is that the SGD method updates the user/item embedding vectors at every iteration, which means the updates themselves cannot be fully vectorized: the update to user A's embedding vector from one interaction affects the gradient for user A's next interaction. This is why another method, WALS, is often used when parallelization is needed.

This is it! All the code can be found at https://github.com/HaneulKim214/Recommender-System/blob/main/poc_notebooks/matrix_factorization.ipynb, so take a look if you are interested!

Conclusion

By fixing just a few lines of code, we improved training speed by a factor of almost 400x, which is unbelievable; you will probably get promoted a few levels right away. This is the power of vectorization, so always look for ways to vectorize operations! Also take a look at another learning method for matrix factorization called Weighted Alternating Least Squares (WALS).
