The content-based approaches to recommendation include:

  1. Pure information filtering systems: build profiles of content preference.
  2. Case-based reasoning systems: ask questions and query a database of cases.
  3. Knowledge-based navigation systems: navigate from item to item in a library of examples, without necessarily engineering a set of questions.


Generally, the case-based and knowledge-based approaches are helpful for ephemerally personalized experiences, where preferences are gathered in the moment rather than from a long-term profile. Case-based recommendations are also a lot easier to explain to a user.

Pure information filtering approaches, by contrast, are much more appropriate for a longer-term model of preferences built up over time, like getting news every day. And you still can't really explain them in relation to the user profile, because profiles can get awfully complicated.

Content-based techniques work without a large set of users, but they do need a good set of item data. The challenges with content-based techniques in general are that they:

  1. depend on well-structured attributes that align with preferences,
  2. depend on a reasonable distribution of attributes across items (and vice versa),
  3. are less likely to find surprising connections, and
  4. find it harder to recommend complements than substitutes.

Model Preferences

The notion of stable preferences underlies all recommender systems. But with content-based recommendation, we're going to use content attributes as our measure of preference and stability, for example: color, brand, actors, or the services offered by hotels. All of these are things we could use to model preferences in a way that helps us find products or services people are interested in, or helps a system recommend them. So the key idea is to:

  1. model items according to relevant attributes,
  2. reveal user preferences by attributes, and
  3. match those together to produce recommendations (a minimal sketch follows this list).
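
As a toy illustration of these three steps, here is a minimal Python sketch; the items, attributes, and overlap-counting score are all made up for the example:

```python
from collections import Counter

# Hypothetical items described by content attributes.
items = {
    "hotel_a": {"pool", "free_wifi", "gym"},
    "hotel_b": {"free_wifi", "pet_friendly"},
    "hotel_c": {"pool", "spa", "gym"},
}

# Attributes of items the user has liked become the preference profile.
liked = ["hotel_a"]
profile = Counter(attr for item in liked for attr in items[item])

# Score unseen items by how many preferred attributes they share.
def score(item_attrs, profile):
    return sum(profile[attr] for attr in item_attrs)

for name, attrs in items.items():
    if name not in liked:
        print(name, score(attrs, profile))
```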

Build and Use Models

Vector models are built to represent user preferences and item content in terms of keywords, tags, or other content attributes. Users could explicitly build their own profiles, but we could also implicitly infer profiles from their actions, like reading, clicking, buying, or rating. Mostly what we're looking for is a set of keywords that's descriptive of the items and that seems related to people's preferences; then we simply count the number of times the user chooses items with each keyword, or use a more sophisticated method.
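
A minimal sketch of inferring such a profile from implicit actions, assuming we already have per-item keyword lists; the action log and the action weights are illustrative:

```python
from collections import Counter

# Hypothetical keyword descriptions of items.
item_keywords = {
    "article_1": ["python", "recommender", "tutorial"],
    "article_2": ["politics", "election"],
    "article_3": ["python", "machine_learning"],
}

# Implicit feedback log: (item, action). The action weights are made up.
actions = [("article_1", "click"), ("article_1", "buy"), ("article_3", "read")]
action_weight = {"read": 1.0, "click": 0.5, "buy": 2.0}

# Accumulate keyword weights over everything the user interacted with.
profile = Counter()
for item, action in actions:
    for kw in item_keywords[item]:
        profile[kw] += action_weight[action]

print(profile.most_common())
```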

Once we have this vector of keyword preferences, we could just add up the likes and dislikes over the attributes. But it would be nice if we could also figure out which keywords are more or less relevant. One way to do this is a technique known as Term Frequency Inverse Document Frequency, or TFIDF. Briefly, TFIDF looks at how important and how frequent a term is in the current context: a keyword that appears everywhere tells us little, but one that is much less frequent and very descriptive at the same time is probably a salient feature.



TFIDF Weighting

TFIDF and a related family of weighting functions are used not just for content-based filtering, but also in document retrieval and search engines. The motivation behind TFIDF came from the problem of search. Primitive search engines usually fail because they don't know which of two obvious strategies is better:

  1. return all documents that contain the search terms, or
  2. rank documents higher if the search terms occur more frequently.

At a minimum, a search engine needs to consider two factors:

  1. term frequency may be significant, and
  2. not all terms are equally relevant.

TFIDF is a weighting strategy that comes from the field of information retrieval. Formally, it is defined as:

TFIDF Weight = Term Frequency (TF) * Inverse Document Frequency (IDF)

  1. Term Frequency: the number of occurrences of a term in the document, or the number of occurrences of a keyword or tag in metadata or some other non-document collection.
  2. Inverse Document Frequency: a measure of how few documents contain the term, typically log( num of all docs / num of docs with term ).

The more documents contain the term, the lower the IDF value is. We don't care as much about a term if it appears everywhere.

The TFIDF weight is assigned to a particular search term (or tag). The bigger the weight, the more we factor that term in when deciding whether we have a match.
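
A small sketch of how these weights could be computed over a toy corpus; the documents and the unsmoothed log formula below are illustrative choices:

```python
import math
from collections import Counter

docs = {
    "d1": "cheap hotel cheap room".split(),
    "d2": "luxury hotel spa pool".split(),
    "d3": "cheap flight to paris".split(),
}

n_docs = len(docs)
# Document frequency: in how many documents each term appears.
df = Counter(term for words in docs.values() for term in set(words))

def tfidf(doc_words):
    tf = Counter(doc_words)
    return {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}

print(tfidf(docs["d1"]))
# 'cheap' has TF=2 but appears in 2 of 3 docs, so its IDF is low;
# 'room' appears only here, so it gets the highest weight in this document.
```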

Variants and Alternatives

There are all sorts of tweaks that are done to make TFIDF work better in different domains (a sketch of the common variants follows this list).

  1. In some domains, the term frequency is mostly ignored, or turned into a boolean frequency (occurs above a threshold or not); terms are valuable only to the extent that they're rare.
  2. In cases where term frequencies are large, they often get logarithmically dampened, say log(TF + 1).
  3. In cases where the collections of objects or metadata vary widely in size, you'll often find the term frequency normalized.
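
A sketch of these three variants, assuming we already have raw term counts for a document; the function names and the threshold are mine, not standard library calls:

```python
import math

def boolean_tf(count, threshold=1):
    # Variant 1: ignore magnitude; the term either counts or it does not.
    return 1.0 if count >= threshold else 0.0

def log_tf(count):
    # Variant 2: dampen large term frequencies logarithmically.
    return math.log(count + 1)

def normalized_tf(count, doc_length):
    # Variant 3: normalize by document length so long documents
    # do not dominate simply by repeating terms.
    return count / doc_length if doc_length else 0.0

raw_counts = {"cheap": 7, "hotel": 1}
doc_length = sum(raw_counts.values())
for term, c in raw_counts.items():
    print(term, boolean_tf(c), round(log_tf(c), 2), round(normalized_tf(c, doc_length), 2))
```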

Pros and Cons

TFIDF automatically demotes stop words and common terms, and it attempts to promote core terms over incidental ones. However, TFIDF does not always work. Some documents rarely mention the thing they're actually about: in a legal contract, for example, TFIDF will do fine on the substance of the contract but not on the participants. TFIDF will also fail when people simply ask for bad search terms.

Using TFIDF, in both search engines and content filtering, can actually be harder than simple processing of vectors of terms, because several other factors come into play.

  1. Phrases and n-grams: adjacency is one factor that's often used to deal with terms that occur close to each other.
  2. The occurrence of a term may have different significance depending on where or how it appears: a term in the title may carry more weight than a term in a paragraph (see the sketch after this list).
  3. General authority: search engines like Google use complex strategies to figure out which documents are good ones.
  4. Implied content: links, usage, etc.
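
As an illustration of the second point, here is a minimal sketch, with field names and weights invented for the example, that counts a term more heavily when it appears in the title than in the body:

```python
from collections import Counter

# Hypothetical per-field weights: a title term counts more than a body term.
field_weights = {"title": 3.0, "body": 1.0}

doc = {
    "title": "cheap hotel deals".split(),
    "body": "find a hotel room in paris with breakfast included".split(),
}

weighted_tf = Counter()
for field, words in doc.items():
    for word in words:
        weighted_tf[word] += field_weights[field]

print(weighted_tf["hotel"])  # 4.0: once in the title (3.0) plus once in the body (1.0)
```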


Content Filtering

For content filtering, we want to build a profile of the user's content preferences, most of the time by taking the user's ratings, implicit or explicit, of a set of objects. TFIDF can be used to create the profile of an object (a document): for each term in the document, it gives an indicator of how important that term is as a descriptive term for the document.

By having the TFIDF profile of what a document is about, we have a weighted vector that can be combined with the user's ratings to create a user profile, and that profile can then be matched against future documents.
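
A sketch of that combination, assuming we already have TFIDF vectors for a few rated documents; the example ratings, the midpoint centering, and the rating-weighted sum are illustrative choices rather than the course's prescription:

```python
from collections import defaultdict

# Hypothetical TFIDF vectors for documents the user has rated.
doc_vectors = {
    "d1": {"cheap": 0.81, "hotel": 0.41, "room": 1.10},
    "d2": {"luxury": 1.10, "hotel": 0.41, "spa": 1.10, "pool": 1.10},
}

# Ratings centered on the scale midpoint, so dislikes subtract from the profile.
ratings = {"d1": 5, "d2": 2}
midpoint = 3

profile = defaultdict(float)
for doc, vector in doc_vectors.items():
    weight = ratings[doc] - midpoint
    for term, tfidf in vector.items():
        profile[term] += weight * tfidf

print(dict(profile))
# Terms from the liked document get positive weight,
# terms from the disliked document get negative weight.
```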

Key Concept: Keyword Vector

The universe of possible keywords defines a content space:

  1. Each keyword is a dimension.
  2. Every item has a position in that space, and that position defines a vector.
  3. Every user also has a preference / taste profile, which is also a vector in that space.

The match between user preferences and items is measured by how closely the two vectors align. You may want to limit or collapse the keyword space; a common technique from natural language processing is stemming and stopping. Stemming takes words that have different endings but the same root and collapses them together, so that, say, compute, computer, and computing all become one keyword. Stopping removes stop words: words and phrases so common that they tell us nothing whatsoever about the item.
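
A toy sketch of both ideas; the crude suffix-stripping rule and the tiny stop-word list below are stand-ins for a real stemmer and stop list:

```python
# A tiny, illustrative stop-word list and a very crude suffix-stripping "stemmer".
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to"}

def crude_stem(word):
    for suffix in ("ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def keywords(text):
    words = text.lower().split()
    return [crude_stem(w) for w in words if w not in STOP_WORDS]

print(keywords("Computing and computers to the computer labs"))
# ['comput', 'comput', 'comput', 'lab'] -- the variants collapse to one keyword.
```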

We also need to move to user profiles, and this is where we meet the fundamental limitation of the vector space model: it represents user preference as a single scalar value for each dimension (keyword or term), so it conflates liking with importance (I might like something but not actually care much about it). There are questions you need to ask and decisions you need to make:

  1. How to accumulate profiles?
  2. How to factor in ratings?
  3. How to update profiles as new ratings arrive? (One possible scheme is sketched after this list.)
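
There is no single right answer to these questions, but as one illustration of the third one, here is a sketch of an incremental update that blends each newly rated item into the existing profile with a decay factor; the decay value and the centering of ratings around the scale midpoint are assumptions of this example:

```python
def update_profile(profile, item_vector, rating, midpoint=3.0, decay=0.9):
    """Blend one newly rated item into an existing keyword profile.

    Old preferences are decayed slightly so recent ratings matter more.
    """
    updated = {term: weight * decay for term, weight in profile.items()}
    signed = rating - midpoint  # likes add weight, dislikes subtract it
    for term, tfidf in item_vector.items():
        updated[term] = updated.get(term, 0.0) + signed * tfidf
    return updated

profile = {"hotel": 0.8, "cheap": 1.6}
profile = update_profile(profile, {"luxury": 1.1, "hotel": 0.4}, rating=1)
print(profile)  # 'hotel' is pulled down and 'luxury' goes negative after a bad rating
```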

Computing Predictions

A prediction can be computed from the cosine of the angle between two vectors (the profile vector and the item vector). The cosine, which can be computed as the dot product of the normalized vectors, is a good measure that smoothly scales from a perfect match (angle = 0 degrees) to an exact opposite (angle = 180 degrees). Cosine doesn't actually give you a prediction; it gives you a score between -1 and 1, and there are lots of ways to go from that score to a prediction, for example by mapping the range back onto the user's rating scale.
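
A sketch of that computation, where the linear mapping back onto a 1-to-5 rating scale is just one of the many possible ways to turn the score into a prediction:

```python
import math

def cosine(u, v):
    # Dot product over the union of keywords, divided by the vector norms.
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(profile, item_vector, low=1.0, high=5.0):
    # Map the cosine score from [-1, 1] linearly onto the rating scale.
    score = cosine(profile, item_vector)
    return low + (score + 1.0) / 2.0 * (high - low)

profile = {"cheap": 1.6, "hotel": 0.4, "luxury": -1.1}
item = {"cheap": 0.8, "hotel": 0.4}
print(round(predict_rating(profile, item), 2))
```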

Strengths and Challenges

One of the strengths of this approach is that it depends only on content: you don't need to know anything more than the content attributes and the user's preferences for the items. It gives you an understandable profile, the computation is easy, and it is flexible: you can integrate it with query-based systems or case-based approaches.

However, it is challenging to figure out the right weights and factors; you've got to actually understand your data and how your data relates to preference. It is also a highly simplified model that cannot handle inter-dependencies between attributes, and breaking things down into attributes doesn't always work. Finally, when you recommend based on attributes, you're unlikely to be delightfully surprised, because what the system recommends is pretty much what you've already said you like.



My Certificate

For more on Content-Based Recommenders, please refer to the wonderful course here https://www.coursera.org/learn/recommender-systems-introduction


I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai
