
Collaborative Filtering (user-based, item-based) notes

by hajinny 2022. 10. 23.

User-based collaborative filtering

We predict a user's unknown rating from the ratings of other, similar users.

 

Given a rating matrix, we predict user u's rating on item p by taking a weighted sum of the ratings of other users who rated p (either all of them or only the top k users most similar to u). Each weight is the similarity between u and the other user u', computed as the cosine similarity cos(u, u'). As an example, take finding u4's missing rating on item p3.
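A minimal sketch of the user-based prediction, assuming the weighted sum is normalized by the total weight, i.e. r̂(u, p) = Σ sim(u, v)·r(v, p) / Σ |sim(u, v)| over the other users v who rated p. The normalization, the helper names `cosine` and `predict_user_based`, and the toy matrix are assumptions for illustration (the matrix is made up, not the lecture's data); 0 marks a missing rating, as in the similarity functions further down.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two rating vectors; 0 marks an unrated item."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_user_based(R, u, p, k=None):
    """Predict R[u, p] as a similarity-weighted average of other users' ratings of p.

    R: users x items matrix, 0 = missing rating.
    k: if given, only the k users most similar to u (among those who rated p) are used.
    """
    R = np.asarray(R, dtype=float)
    # Other users who actually rated item p
    raters = [v for v in range(R.shape[0]) if v != u and R[v, p] != 0]
    sims = sorted(((cosine(R[u], R[v]), v) for v in raters), reverse=True)  # most similar first
    if k is not None:
        sims = sims[:k]
    weight_sum = sum(abs(s) for s, _ in sims)
    if weight_sum == 0:
        return 0.0  # no usable neighbours
    return sum(s * R[v, p] for s, v in sims) / weight_sum


# Hypothetical toy matrix (NOT the lecture's data): 4 users x 4 items, 0 = unrated
R = [[5, 3, 0, 1],
     [4, 0, 4, 1],
     [1, 1, 5, 5],
     [4, 3, 0, 1]]
print(round(predict_user_based(R, u=3, p=2), 2))  # 4.36
```

Passing `k` restricts the sum to the top k neighbours; leaving it as `None` uses every user who rated p.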

This is where the example came from

Item-based collaborative filtering

We predict user u's rating of item p from the ratings u has given to other items p' (either all of them or only the top k items most similar to p).
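The item-based version mirrors the user-based one, except that similarity is computed between item columns and the ratings being averaged are u's own. This is a sketch under the same assumptions as before (normalized weighted sum, cosine similarity, 0 = missing); `predict_item_based` and the toy matrix are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two rating vectors; 0 marks an unrated entry."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_item_based(R, u, p, k=None):
    """Predict R[u, p] from u's own ratings of the items most similar to p.

    R: users x items matrix, 0 = missing rating.
    Item similarity is cosine similarity between item columns.
    """
    R = np.asarray(R, dtype=float)
    # Other items that user u has rated
    rated = [q for q in range(R.shape[1]) if q != p and R[u, q] != 0]
    sims = sorted(((cosine(R[:, p], R[:, q]), q) for q in rated), reverse=True)
    if k is not None:
        sims = sims[:k]
    weight_sum = sum(abs(s) for s, _ in sims)
    if weight_sum == 0:
        return 0.0  # no usable neighbouring items
    return sum(s * R[u, q] for s, q in sims) / weight_sum


# Same hypothetical toy matrix as in the user-based sketch
R = [[5, 3, 0, 1],
     [4, 0, 4, 1],
     [1, 1, 5, 5],
     [4, 3, 0, 1]]
print(round(predict_item_based(R, u=3, p=2), 2))  # 2.13
```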

 

The following example computes user u4's rating of item p3, based on the lecture example:

 

Lecture example

 

Collaborative filtering with bias terms

Here's a user-based CF with bias terms:
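For reference, one common way to write user-based CF with bias terms, assuming the lecture uses the usual global-effects baseline (the exact form in the lecture may differ). Here μ is the mean of all observed ratings, r̄_u and r̄_i are user u's and item i's mean observed ratings, and N(u;i) is the set of (top k) other users who rated item i:

```latex
b_{ui} = \mu + b_u + b_i, \qquad b_u = \bar{r}_u - \mu, \qquad b_i = \bar{r}_i - \mu

\hat{r}_{ui} = b_{ui} + \frac{\sum_{v \in N(u;i)} \mathrm{sim}(u,v)\,\bigl(r_{vi} - b_{vi}\bigr)}{\sum_{v \in N(u;i)} \lvert \mathrm{sim}(u,v) \rvert}
```

The neighbours contribute only their deviation from their own baseline, so a harsh rater's low score is not mistaken for a dislike of the item.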

For a better picture, this is what's being done:

Here's an example of how you would compute user 4's bias on item 6:
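A sketch of the baseline and the bias-adjusted prediction, assuming the global-effects formulation μ + b_u + b_i with 0 marking a missing rating. The names `baseline` and `predict_with_bias`, the `sim` parameter, and the 4 × 6 matrix are hypothetical (the matrix is invented, not the lecture's); "user 4, item 6" maps to 0-based indices 3 and 5.

```python
import numpy as np


def baseline(R, u, i):
    """Global-effects baseline b_ui = mu + b_u + b_i, with 0 = missing rating.

    mu  : mean of all observed ratings
    b_u : user u's mean observed rating minus mu
    b_i : item i's mean observed rating minus mu
    Assumes user u and item i each have at least one observed rating.
    """
    R = np.asarray(R, dtype=float)
    observed = R != 0
    mu = R[observed].mean()
    b_u = R[u][observed[u]].mean() - mu
    b_i = R[:, i][observed[:, i]].mean() - mu
    return mu + b_u + b_i


def predict_with_bias(R, u, i, sim):
    """r_hat_ui = b_ui + sum_v sim(u, v) * (r_vi - b_vi) / sum_v |sim(u, v)|,
    summed over the other users v who rated item i; `sim` is any similarity function."""
    R = np.asarray(R, dtype=float)
    b_ui = baseline(R, u, i)
    raters = [v for v in range(R.shape[0]) if v != u and R[v, i] != 0]
    weights = [(sim(R[u], R[v]), v) for v in raters]
    total = sum(abs(s) for s, _ in weights)
    if total == 0:
        return b_ui  # no usable neighbours: fall back to the baseline
    offset = sum(s * (R[v, i] - baseline(R, v, i)) for s, v in weights)
    return b_ui + offset / total


# Hypothetical 4 users x 6 items (NOT the lecture's data)
R = [[5, 0, 4, 0, 3, 2],
     [0, 4, 0, 5, 0, 3],
     [2, 0, 3, 0, 4, 0],
     [4, 5, 0, 3, 0, 0]]
print(round(baseline(R, 3, 5), 3))  # 2.885
```

Any similarity function can be passed as `sim`, e.g. the `cosineSim` or `pearsonSim` helpers below.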

 

Note: Cosine similarity & Pearson similarity

The exam might specify Pearson correlation as the similarity measure instead of cosine similarity.

Use this to quickly compute cosine similarity during the exam:

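import numpy as np


def cosineSim(a, b):
  # Cosine similarity of two rating vectors (unrated items entered as 0)
  a = np.asarray(a, dtype=float)
  b = np.asarray(b, dtype=float)
  numerator = a.dot(b)
  denom = np.linalg.norm(a) * np.linalg.norm(b)
  return numerator / denom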
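When the exam asks for many similarities at once, computing all pairs in one step may be quicker than calling `cosineSim` repeatedly. This is a sketch using plain NumPy matrix products; `cosine_sim_matrix` is a hypothetical helper name, and all-zero rows are given similarity 0 rather than raising a division error.

```python
import numpy as np


def cosine_sim_matrix(R):
    """All-pairs cosine similarity between the rows of R (0 = missing rating).

    Pass R for user-user similarities, or R.T for item-item similarities.
    """
    R = np.asarray(R, dtype=float)
    norms = np.linalg.norm(R, axis=1)
    norms[norms == 0] = 1.0  # avoid 0/0 for all-zero rows (their similarities stay 0)
    return (R @ R.T) / np.outer(norms, norms)
```

Entry `[i, j]` of the result is the cosine similarity of rows i and j, so the matrix is symmetric with 1s on the diagonal for non-empty rows.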

Here's Pearson similarity:

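from math import sqrt


def pearsonSim(a, b):
  # Pearson correlation between two rating vectors (0 = unrated item).
  # Each user's mean is taken over all items that user rated;
  # the sums run only over items rated by both users.
  def getAvg(c):
    # Average of the non-zero (rated) entries
    rated = [x for x in c if x != 0]
    return sum(rated) / len(rated) if rated else 0.0

  a_avg = getAvg(a)
  b_avg = getAvg(b)

  numerator = 0.0
  a_denom = 0.0
  b_denom = 0.0
  for i in range(len(a)):
    if a[i] != 0 and b[i] != 0:  # co-rated items only
      numerator += (a[i] - a_avg) * (b[i] - b_avg)
      a_denom += (a[i] - a_avg) ** 2
      b_denom += (b[i] - b_avg) ** 2
  denom = sqrt(a_denom) * sqrt(b_denom)
  # No co-rated items, or no variation on them: treat the similarity as 0
  return numerator / denom if denom != 0 else 0.0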
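A closely related measure is the mean-centered ("centered") cosine: subtract each user's mean from their rated entries, leave unrated entries at 0, then take ordinary cosine similarity. When both users rated exactly the same items it equals Pearson correlation; otherwise its denominator covers every item each user rated, not only the co-rated ones, so it can differ from `pearsonSim`. Whether the exam expects this variant is an assumption; `centered_cosine` is a hypothetical name.

```python
import numpy as np


def centered_cosine(a, b):
    """Cosine similarity after subtracting each vector's mean over its rated
    (non-zero) entries; unrated entries stay 0."""
    def center(x):
        x = np.asarray(x, dtype=float)
        rated = x != 0
        out = np.zeros_like(x)
        if rated.any():
            out[rated] = x[rated] - x[rated].mean()
        return out

    ca, cb = center(a), center(b)
    denom = np.linalg.norm(ca) * np.linalg.norm(cb)
    # A vector with no variation centers to all zeros: similarity treated as 0
    return float(ca @ cb / denom) if denom else 0.0
```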

 

 

Other posts in the '2022 > October 2022' category

Community detection notes  (0) 2022.10.22
Reservoir Sampling and Bloom Filter notes  (0) 2022.10.19
Universal hashing and minhash notes  (0) 2022.10.18