Let us start with the standard setting in collaborative filtering: we are given an incomplete rating matrix. One common task is to fill in the missing entries (rating prediction); another is to produce a ranked list of new items for each user (or a ranked list of new users for each item).
In this post, we propose to view the two tasks slightly differently: we want to recommend a set of items for each user, ideally with a rating for each item. This differs from standard rating prediction in that the set itself is unknown. List ranking is now a by-product.
There are two variants: in the first, we do not need to predict the ratings of the items in the set; in the second, we produce a ranked list of subsets.
Let us start with the first variant, which is applicable in many real-world situations. When you shop at a grocery store for the whole week, you usually end up with a basket full of items of varied quality. Some items are the same product, but in general the items are different, and together they satisfy the family's nutritional needs and quality expectations while staying within the budget. Another situation is a travel package: there are many constraints on routes, airlines, ticket costs and promotions, waiting times, visa applications, business deadlines, transits, and luggage allowances. Yet another example is music consumption: you don’t always want only the best rock pieces; sometimes you want a good mix, even though some pieces are not of very high quality.
Let us assume for now that we deal only with non-repeated items. The problem is now to identify the best set among the many possible sets. In fact, if there are \(N\) items, there are \(2^N - 1\) possible non-empty sets.
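To get a feel for the count above, here is a minimal Python sketch that enumerates every non-empty subset of a tiny catalogue and checks the total against the formula. The item names and the helper `non_empty_subsets` are made up for illustration; they are not part of any recommender library.

```python
from itertools import combinations


def non_empty_subsets(items):
    """Yield every non-empty subset of `items` as a tuple (hypothetical helper)."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


# Hypothetical three-item catalogue, small enough to enumerate by hand.
items = ["milk", "bread", "eggs"]
subsets = list(non_empty_subsets(items))

# The enumeration agrees with the closed-form count 2^N - 1.
assert len(subsets) == 2 ** len(items) - 1

# The count roughly doubles with every extra item.
for n in (10, 20, 30):
    print(n, 2 ** n - 1)
```

Even at 30 items there are already more than a billion candidate sets, so enumerating them all is not an option for a realistic catalogue.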
One simple solution is to first rank all the items and then estimate a threshold above which items are selected as high quality. This will sometimes work if the items are homogeneous, but it will fail if the items are complementary: in the case of shopping baskets, you will end up with high-quality items you don’t need. Even if the items are fairly homogeneous, some items tend to go together more often than they do with other items.
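As a rough illustration of the rank-then-threshold idea, the sketch below sorts one user's predicted ratings and keeps everything at or above a cut-off. The scores, the item names, and the function `threshold_baseline` are all invented for this example, and the threshold is simply fixed by hand rather than estimated.

```python
def threshold_baseline(scores, threshold):
    """Rank items by predicted score and keep those at or above `threshold`."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, score in ranked if score >= threshold]


# Hypothetical predicted ratings for one user on a 1-5 scale.
predicted = {
    "premium pasta": 4.8,
    "artisan pasta": 4.7,
    "organic pasta": 4.6,
    "tomato sauce": 3.9,
    "milk": 3.5,
}

basket = threshold_baseline(predicted, threshold=4.5)
print(basket)  # ['premium pasta', 'artisan pasta', 'organic pasta']
```

The selected basket holds three near-identical products and nothing to go with them, because each item is scored in isolation and the threshold never looks at how the items fit together.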
A better solution should deal with the item set directly.
An immediate question is how we can judge the quality of a set.
(To be continued)