본문 바로가기

2022/October 2022

(4)
Collaborative Filtering (user-based, item-based) notes User-based collaborative filtering We predict unknown rating of a user based on the ratings of other similar users. Given a rating matrix, to predict user u's rating on item p by taking a weighted sum of ratings of other users who rated p (either all other users that rated p or top k similar users who rated p). The weight is similarity between u and the other user u', computed based on cosine si..
Community detection notes Girvan-Newman This approach defines communities by cutting most-traversed edges (until we form k communities). We identify most-traversed edges by counting how many times an edge is traversed as part of shortest path from a node to another node. If the shortest path from i to j covers an edge (a,b), then edge-betweenness increases by 1 due to that path. BUT if there are q shortest paths from i t..
Reservoir Sampling and Bloom Filter notes Reservoir Sampling Reservoir sampling is sampling technique from a stream of elements. It makes sure that the probability of an element remaining in a sampling of size s after encountering m elements from the stream is s/m. Reservoir sampling with sample size of 1 ensures that any element from the stream gets picked into the sample with a probability of 1/m (that is, after encountering m element..
Universal hashing and minhash notes 1. Universal hashing (W2.2) We have numbers from a large set S, which we want to store, query, and remove. Our requirements are: - time complexity of O(1) - space complexity that's less than O(|S|) Deterministic hash functions (maps S -> {0,n-1}, n