Information Retrieval and Filtering
Information retrieval evolved in response to the need to ask questions about a large collection of documents. The content base is static, while the information need (a query) is dynamic, so the effort goes into indexing the content base. A common approach is TF-IDF (term frequency-inverse document frequency), which weights each term by how often it occurs in a document and how rare it is across the collection, and ranks documents against a query using those weights.
Over time, the assumption behind information retrieval was reversed: the information need is largely static, while the content base is dynamic. Information filtering therefore shifts the effort to modeling the user’s needs.
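To make the TF-IDF idea concrete, here is a minimal sketch. It assumes one common weighting (term count divided by document length, times log(N / document frequency)); other variants exist, and the function names `tf_idf` and `rank` and the sample documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def rank(query, docs):
    """Rank document indices by the summed TF-IDF weight of the query terms."""
    weights = tf_idf(docs)
    terms = query.lower().split()
    scores = [sum(w.get(t, 0.0) for t in terms) for w in weights]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical content base.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "recommender systems rank items",
]
best_match = rank("dog", docs)[0]  # index of the document that mentions "dog"
```

The index is built once over the static content base, and each new query only needs a lookup and a sum, which is the trade-off described above.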
Collaborative Filtering
Collaborative filtering emerged in response to the problem that you want really good content, not just content that is on topic. The first efforts were manual, based on the premise that keywords were insufficient. Automated collaborative filtering, the first kind of system to become known as a recommender system, started with the GroupLens project. The premise of GroupLens was that users would rate articles as they read them. Users with similar tastes would be matched to each other, and each user would get personalized predictions of what they would like or dislike.
In the mid to late 1990s, a great deal of work was done in this area, and people took these ideas into commercial practice. Today we see personalized recommender systems deployed almost everywhere.
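The idea of matching users with similar tastes and predicting a rating from their neighbours can be sketched as follows. This is only an illustration in the spirit of the description above, not GroupLens's actual implementation: it assumes Pearson correlation as the similarity measure and a mean-centred weighted average over positively correlated neighbours. The names `pearson` and `predict` and the `ratings` data are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * \
        math.sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Predict a rating from positively correlated users who rated the item."""
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = pearson(mine, theirs)
        if sim <= 0:
            continue  # ignore users with dissimilar or unrelated tastes
        their_mean = sum(theirs.values()) / len(theirs)
        # Use the neighbour's deviation from their own mean, weighted by similarity.
        num += sim * (theirs[item] - their_mean)
        den += sim
    # Fall back to the user's own mean when no neighbour is usable.
    return my_mean + num / den if den else my_mean

# Hypothetical ratings on a 1-5 scale: {user: {article: rating}}.
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 4},
    "cat": {"a": 1, "b": 5, "c": 2, "d": 1},
}
prediction = predict(ratings, "ann", "d")
```

Here ann agrees with bob and disagrees with cat, so only bob's opinion of article "d" contributes to ann's prediction.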
Recommenders
We can define recommenders as tools that help people find worthwhile items. We can break them down by interface:
filtering interface | takes a stream of content and identifies the items you want |
recommendation interface | suggestion list, top-10 list, offers and promotions |
prediction interface | evaluate candidates, predicted rating, etc. |
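The three interfaces can be seen as different views over the same underlying scores. The sketch below assumes the scores already exist (how they are computed is a separate question); the function names, the threshold value and the `scores` data are all hypothetical.

```python
def predict(scores, item):
    """Prediction interface: report the estimated rating for one candidate."""
    return scores.get(item)

def recommend(scores, n=3):
    """Recommendation interface: return the n highest-scored items."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def filter_stream(stream, scores, threshold=3.5):
    """Filtering interface: pass through only items scored at or above threshold."""
    for item in stream:
        if scores.get(item, 0.0) >= threshold:
            yield item

# Hypothetical predicted ratings on a 1-5 scale.
scores = {"news-1": 4.6, "news-2": 2.1, "news-3": 3.9, "news-4": 4.2}

top_two = recommend(scores, n=2)
passed = list(filter_stream(["news-2", "news-3", "news-5"], scores))
```

Items with no score (such as "news-5" here) are treated as below the threshold, which is one possible design choice among several.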
Recommendation Approaches
Non-personalized and stereotyped | something popular, or group preference |
Product association | people who like / buy this also like / buy that |
Content-based | learn what an individual likes and build a profile |
Collaborative | learn what an individual likes and use others’ experience to recommend |
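As an illustration of product association ("people who buy this also buy that"), the sketch below scores item pairs by lift, P(y | x) / P(y), so that universally popular items do not dominate. Lift is one of several possible association measures and is an assumption here; the function names and baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def association_scores(baskets):
    """Return {(x, y): lift} for every ordered pair of co-purchased items."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter()
    for basket in baskets:
        for x, y in combinations(sorted(set(basket)), 2):
            pair_counts[(x, y)] += 1
            pair_counts[(y, x)] += 1
    # lift(x -> y) = P(y | x) / P(y)
    return {
        (x, y): (count / item_counts[x]) / (item_counts[y] / n)
        for (x, y), count in pair_counts.items()
    }

def also_bought(baskets, item, n=3):
    """Items most strongly associated with `item`, best first."""
    scores = association_scores(baskets)
    related = {y: s for (x, y), s in scores.items() if x == item}
    return sorted(related, key=related.get, reverse=True)[:n]

# Hypothetical purchase baskets.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["milk", "cereal"],
    ["bread", "milk"],
]
suggestions = also_bought(baskets, "bread")
```

In this data milk is bought with bread as often as butter is, but milk is popular overall, so butter ranks first for bread buyers.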
Preference and Ratings
Very broadly, we want to learn preferences. What do users do that might tell us something about their preferences? Explicit preferences include ratings, reviews, votes / likes, continuous-scale ratings, pairwise preferences, etc. Ratings are not always accurate, and users’ preferences may change over time. A rating can be given:
- during consumption (rating while experiencing the item),
- some time after consumption (rating from memory), or
- before consumption, for a high-cost, low-volume item (rating an expectation).
Implicit preferences are inferred from users’ actions. The time a user spends on an item correlates reasonably well with their rating. There are also binary actions, including search, click, follow, and purchase.
Predictions and Recommendations
Predictions are estimates of how much you will like an item. Recommendations are suggestions for items you might like; they don’t make the bold statement that predictions make.
 | Predictions | Recommendations |
Pros | quantifies the estimate on some scale, giving you a clear understanding | provides a set of good choices |
Cons | gives you something that can be wrong (falsifiable) | poor suggestions can result in failure to explore |
Explicit predictions or recommendations may make customers feel that they are being pushed and manipulated.
Analytical Framework
This is a framework for analyzing recommender systems in general. It has 8 dimensions:
- Domain – what is being recommended.
- Purpose – sales, information, education, building community
- Recommendation Context – what is the user doing at the time of recommendation?
- Whose Opinions – experts, ordinary folks, or people like you
- Personalization Level
  - non-personalized
  - demographic
  - ephemeral
  - persistent
- Privacy and Trustworthiness
  - personal info revealed, identity, deniability
  - is the recommendation honest or biased?
- Interfaces – prediction, recommendation, filtering; organic or explicit presentation
- Recommendation Algorithms
Recommendation Algorithms
In the basic model of recommenders, there are 3 concepts:
- users – with user attributes (demographics, etc.) and a user model
- items – with item attributes (properties, etc.)
- ratings – the space of ratings is where users meet items.
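One way to picture the three concepts above is as a small data model. The sketch below is an assumption about how they might be represented, not a standard schema: the class names `User`, `Item` and `RatingStore` are hypothetical, and ratings are stored sparsely so that only observed (user, item) cells take up space.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: str
    attributes: dict = field(default_factory=dict)  # e.g. demographics

@dataclass
class Item:
    item_id: str
    attributes: dict = field(default_factory=dict)  # e.g. genre, price

@dataclass
class RatingStore:
    """Sparse ratings: only observed (user, item) cells are stored."""
    cells: dict = field(default_factory=dict)

    def rate(self, user, item, value):
        self.cells[(user.user_id, item.item_id)] = value

    def get(self, user, item):
        # None means the user has not rated this item.
        return self.cells.get((user.user_id, item.item_id))

    def density(self, n_users, n_items):
        """Fraction of the full user-by-item matrix that is filled in."""
        return len(self.cells) / (n_users * n_items)

# Hypothetical usage.
alice = User("u1", {"age_group": "25-34"})
book = Item("i1", {"genre": "mystery"})
store = RatingStore()
store.rate(alice, book, 4.5)
```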
Broadly there are 4 categories of algorithms:
non-personalized summary statistics | simple summary statistics; no model, no user or item attributes |
content-based filtering | builds models from user ratings and item attributes |
collaborative filtering | common core is a sparse matrix of ratings: fill in missing values (predict), select promising cells (recommend) |
others | interactive approaches; hybrids of various techniques |
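To illustrate the content-based category, here is a minimal sketch that builds a user profile from ratings and item attributes and uses it to score unseen items. It assumes a simple additive profile where each attribute accumulates (rating minus a neutral midpoint of 3.0); real systems typically use richer models. All names, the midpoint and the sample data are hypothetical.

```python
from collections import defaultdict

def build_profile(user_ratings, item_attrs, neutral=3.0):
    """Sum (rating - neutral) over each attribute of the items a user rated."""
    profile = defaultdict(float)
    for item, rating in user_ratings.items():
        for attr in item_attrs[item]:
            profile[attr] += rating - neutral
    return dict(profile)

def score(profile, attrs):
    """Score an item as the sum of the profile weights of its attributes."""
    return sum(profile.get(a, 0.0) for a in attrs)

def recommend(user_ratings, item_attrs, n=2):
    """Rank items the user has not rated by their profile score."""
    profile = build_profile(user_ratings, item_attrs)
    unseen = [i for i in item_attrs if i not in user_ratings]
    return sorted(unseen, key=lambda i: score(profile, item_attrs[i]),
                  reverse=True)[:n]

# Hypothetical catalogue and one user's ratings on a 1-5 scale.
item_attrs = {
    "movie-1": {"sci-fi", "horror"},
    "movie-2": {"sci-fi", "noir"},
    "movie-3": {"romance", "comedy"},
    "movie-4": {"sci-fi", "drama"},
    "movie-5": {"romance", "comedy"},
}
user_ratings = {"movie-1": 5, "movie-2": 4, "movie-3": 1}
picks = recommend(user_ratings, item_attrs, n=1)
```

Note that no other user's ratings are involved, which is what separates this from the collaborative row of the table.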
Types of evaluation:
- Accuracy of predictions
- Usefulness of recommendations
- Computational performance
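The first two kinds of evaluation can be sketched with a few standard metrics. The choice here is an assumption: mean absolute error and root mean squared error for prediction accuracy, and precision at k as one rough proxy for the usefulness of a recommendation list. The function names and sample values are hypothetical.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user found relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

# Hypothetical held-out data.
predicted = [4, 3, 5]
actual = [5, 3, 3]
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

error = mae(predicted, actual)
hit_rate = precision_at_k(recommended, relevant, k=3)
```

Computational performance is usually measured separately, for example by timing model building and per-request latency.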
For more on recommender systems, please refer to this wonderful course: https://www.coursera.org/learn/recommender-systems-introduction
I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai