
Those who learn from history are doomed to repeat it

Bias propagation, and the tug-of-war between popular and personal

Yeah, yeah, we know that’s not the original quote. So why are we deliberately misquoting it?

Imagine you’re a marketing executive tasked with running customer outreach campaigns. One year, during the festive season, you’re put in charge of pushing the new ethnic wear line, so you do your research and find that it is most likely to work with women who have shopped at least once in the last 180 days. So you send them a campaign with personalized recommendations and an offer code. It works quite well, you get your well-deserved kudos, and move on to the next challenge.

The next year rolls around and you are asked to do the same thing again. So you dust off the old campaign plan and are ready to send it out when your boss asks you to redo the analysis, because he wonders if customer preferences have changed. Trouble is, your historical data from the previous year only tells you what women who had shopped sometime in the previous six months went on to do; you never targeted anyone else, so you have no data on how other segments would have responded.

If you implement a solution based on your analysis of the data, and assume that your conclusions are good enough to deploy, you run the risk that, in future decision cycles, your data is biased by your past decisions. Call it bias propagation.

In other words, those who learn from history, yada yada yada.

In our earlier article, we spoke about how testing on demand using multi-armed bandits allows us to challenge our assumptions about what customers want, and learn in the process. One important reason why we need to do this is bias propagation.

Zoom out a bit and you’ll notice that the general problem is one of exploration vs exploitation. To what extent should you double down on the insights that historical data gives you, and to what extent should you challenge these insights through exploration and testing?
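
To make the trade-off concrete, here is a minimal sketch of an epsilon-greedy policy, one of the simplest multi-armed bandit strategies. The product names and reward numbers are made up purely for illustration:

```python
import random

def epsilon_greedy_pick(avg_reward_by_product, epsilon=0.1):
    """Mostly exploit the best-known product, but explore a random one
    with probability epsilon, so the data doesn't fossilise around past winners."""
    if random.random() < epsilon:
        # Exploration: try a product regardless of its track record
        return random.choice(list(avg_reward_by_product))
    # Exploitation: go with what the historical data says works best
    return max(avg_reward_by_product, key=avg_reward_by_product.get)

# Hypothetical running averages of conversion rate per product
history = {"ethnic_wear": 0.042, "blue_shirt": 0.051, "polka_dot_shirt": 0.011}
print(epsilon_greedy_pick(history, epsilon=0.2))
```

The higher you set epsilon, the more you pay today (in sub-optimal recommendations) to keep learning for tomorrow.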

Let’s take the example of recommender systems. Statistically speaking, recommending bestsellers to customers sounds like a wise strategy, especially when you have a lot of customers about whom you have very little data. After all, if a lot of people are buying a product, it’s likely to be a good recommendation. However, the risk you run with this is that you keep recommending the same products and, if the nudge is successful, keep selling more of them. And while you’re doing that, you have a catalogue full of products that nobody sees, and nobody buys. So, one of the objectives of a recommender system becomes discoverability – the ability to help customers discover products that suit them, products they might never have seen if all they were shown were the greatest hits.

Now, when some products sell much more than others, they are likely to pop up in recommender results as well. For instance, the Customer Genome algorithm is a lookalike method that recommends what people like a given customer X are buying. If a good-sized proportion of people are buying Product Id 123, then it is likely that people similar to customer X are also buying Product Id 123.  Similarly, the preference vector algorithm blends the customer preference with the crowd preference, in order to account for the fact that we have less data about a good proportion of customers, and therefore, a weaker knowledge of their preferences. This again means that what the crowd buys might dominate what the individual wants.
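
As a rough illustration – not the actual Customer Genome or preference vector implementation, just a sketch of the blending idea – combining an individual’s preference vector with the crowd’s might look like this, with a hypothetical weight alpha deciding how much the crowd matters:

```python
import numpy as np

def blended_preference(user_vec, crowd_vec, alpha=0.6):
    """Blend a (possibly sparse) individual preference vector with the crowd's.
    Lower alpha leans harder on the crowd, which helps for thin-data customers
    but is exactly where popularity can start to drown out the individual."""
    user_vec = np.asarray(user_vec, dtype=float)
    crowd_vec = np.asarray(crowd_vec, dtype=float)
    return alpha * user_vec + (1 - alpha) * crowd_vec

# Toy preference scores over three product categories (numbers made up)
user = [0.9, 0.0, 0.1]   # this customer clearly favours category 1
crowd = [0.2, 0.7, 0.1]  # the crowd mostly buys category 2
print(blended_preference(user, crowd, alpha=0.3))  # a crowd-heavy blend
```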

So, how do we ensure that customers don’t all get recommended the same product?

  1. Shuffling: This is a simple trick that mixes things up a bit within a recommender story. By shuffling the top few recommendations at random, one ensures that even people who have received the same set of top recommendations don’t end up seeing the exact same thing. (See the first sketch after this list.)
  2. Reduce the emphasis on the crowd. For instance, in the preference vector algorithm, reduce the weightage given to crowd preferences. Or, remove the Trending Products Recommender from the Hybridizer so that its output is purely personalized. (Trending can still be used to backfill recommendations in the Stories layer as required.)
  3. Play with the performance metric. Intuitively speaking, any performance metric on recommenders would involve matching what was recommended with what was purchased. However, a blue shirt (which every working-age male in the history of working-age males has possessed at least one of) gets the same weightage as a purple polka-dot shirt. So, the thing to do is to give a greater weightage to those instances where the recommender suggested a product that was rarely bought, but turned out to be perfect for that particular customer. Imagine a sort of discounting factor that is proportional to the overall sales of a product – the more popular a product is, the less it matters if the recommender got it right. (This is rather like the inverse document frequency logic one sees in natural language processing – a word might appear often in a document, but if it appears often in every document, its presence is discounted accordingly.) See the second sketch after this list.
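
Here is the first sketch referenced above: a hypothetical shuffle_top_k helper that randomises the order of the top few recommendations while leaving the rest of the ranking untouched:

```python
import random

def shuffle_top_k(ranked_product_ids, k=5):
    """Shuffle only the top-k recommendations, so customers who share the same
    top picks don't all see them in an identical order."""
    head = ranked_product_ids[:k]
    random.shuffle(head)  # in-place shuffle of the top slice only
    return head + ranked_product_ids[k:]

print(shuffle_top_k([101, 102, 103, 104, 105, 106, 107], k=4))
```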

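And the second sketch: one possible popularity-discounted hit rate, where a correct recommendation of a rarely bought product earns more credit than a correct call on a bestseller. The function name and the toy sales numbers are our own invention; the weighting simply mirrors the inverse document frequency idea mentioned above:

```python
import math

def discounted_hit_rate(recommended, purchased, sales_count, total_sales):
    """Score only the recommendations that were actually purchased, weighting
    each hit by an IDF-style factor: the rarer the product, the bigger the credit."""
    if not recommended:
        return 0.0
    hits = set(recommended) & set(purchased)
    score = sum(math.log(total_sales / max(sales_count[p], 1)) for p in hits)
    return score / len(recommended)

# Toy numbers: the blue shirt is a bestseller, the polka-dot shirt a rare sell
sales = {"blue_shirt": 9000, "polka_dot_shirt": 30}
print(discounted_hit_rate(["polka_dot_shirt"], ["polka_dot_shirt"], sales, 10000))  # big credit
print(discounted_hit_rate(["blue_shirt"], ["blue_shirt"], sales, 10000))            # small credit
```
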
Remember: this sort of exploration comes with a cost – you’ll surface a much wider slice of the catalogue, but you’ll also get plenty of recommendations wrong. (Fun, somewhat-related fact: 45% of all ice cream sold even today is vanilla.) Still, if you wish to market a wide assortment of products and harness the power of personalization, it’s the way to go in the long term.