Intro and praise for Robyn
Meta’s Data Science department has generously developed and open-sourced a Marketing Mix Modeling (MMM) software solution written mostly in R. The software offers several unique and useful features, such as minimizing errors on three distinct measurement dimensions, Geometric and Weibull distribution options for estimating adstocks, and a general focus on leveraging detailed data where possible to reduce differences in human judgment.
Regarding my own background, I’ve managed teams of modelers building Mix Models for many large clients for seven years while at MarketShare/Neustar. My team and I analyzed around $5 billion ad dollars which represented about 40% of MarketShare’s MMM portfolio at that time. More generally I’ve developed marketing models for 30+ years covering areas such as MMM, Prospecting, Cross-Selling, Segmentation, and Product Design.
While Robyn is well thought out, there are several areas where new users are likely to find themselves in a bind. In this article I’ll highlight the most important of the problems I’ve noticed, when each is most likely to occur, and how to mitigate it. I plan to run down the 25 risk areas I’ve observed, in rank order of importance, in a series of articles. The articles will appear every week or two for as long as interest remains high, as indicated by the conversations that result.
How MMM Differs from Most Modeling
When I began mix modeling about 14 years ago, I already had 15+ years of modeling experience and considered myself an expert modeler. Even so, the transition wasn’t without difficulties. Like many people I see beginning mix modeling today, I had habits and assumptions that I had to adapt to the new situation.
Probably the most common marketing models are targeting models used to direct marketing efforts towards receptive prospects/customers. Many modelers build their experience in that domain. Those models are characterized by an abundance of potential predictor variables and the two key performance characteristics are “fitting the data” and “maintaining accuracy when used on new data”. Consequently, a lot of the modeling process is focused on variable selection, and the primary metric of success is the accuracy on the holdout (or testing) data.
Marketing Mix Modeling differs from that in accordance with its different objectives. MMM tries to accurately assess the value of each media channel and optimally allocate the marketing budget over them. That difference means that the accuracy of the “media coefficients” has an equal, or probably greater importance than the prediction accuracy, because the coefficients are what will determine the recommended budget allocations. It also means that the modeler has a lot less flexibility in what data to use, because all the relevant media channels must be estimated to compare their returns.
While building a targeting model it is best practice to delete correlated variables and any predictor that’s not statistically significant, but in a Mix Model you have to keep those variables and try to make fair comparisons between inherently unequal piles of evidence. The modeler will have to deal with many data problems that would have been avoided in other models, such as correlated predictors, non-causal correlation, sparse evidence, and differences in temporal granularity. A good mix model does have to fit the data, but overall fitting alone doesn’t guarantee that each coefficient is objectively accurate, and the coefficient comparisons are what will determine the recommendations and thus the financial efficacy of the MMM project.
How Robyn works, and when it fails
Robyn tries to reduce human bias by estimating more parameters from the data than some other systems. Every MMM estimates the size of media returns, but Robyn goes further to use data, rather than assumptions, to estimate additional parameters that govern the shape of the return curves and the spread of sales after advertising. It also uses Ridge Regression rather than a Bayesian framework to further reduce human bias. Bayesian estimation systems allow the users to insert prior estimates that influence the final estimates, and that can inadvertently add human bias.
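One of the extra parameter families Robyn estimates is the adstock decay rate, which governs how advertising effects carry over into later periods. As a minimal sketch of the geometric option (the function name and decay value below are illustrative, not Robyn’s API):

```python
def geometric_adstock(spend, theta):
    """Carry a fraction `theta` of each period's accumulated
    advertising effect forward into the next period (geometric decay)."""
    adstocked = []
    carryover = 0.0
    for x in spend:
        carryover = x + theta * carryover
        adstocked.append(carryover)
    return adstocked

# A single burst of 100 with theta = 0.5 decays as 100, 50, 25, 12.5, ...
print(geometric_adstock([100, 0, 0, 0], 0.5))
```

Robyn estimates `theta` (and, for the Weibull option, shape and scale parameters) from the data rather than asking the analyst to assume a value, which is exactly where the extra data demand comes from.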
Reducing bias, human or otherwise, is a good goal. To achieve it, though, there must be enough data to estimate the additional parameters. If there isn’t, trying to extract more parameters than the data can support will normally produce worse estimates than using reasonable assumptions and estimating fewer.
Likewise, Bayesian priors add bias in the direction of the priors, whereas Ridge Regression adds bias toward zero coefficient values, implying no media impact. It’s a trade of human bias for systematic bias, and with that reduction in model flexibility come predictable areas of vulnerability.
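The contrast can be made concrete: ridge regression is mathematically the Bayesian MAP estimate under a zero-mean prior, so the same closed-form machinery shrinks toward zero or toward an informative prior mean depending on what you plug in. A small sketch with simulated data (the penalty weight and prior values are illustrative, not anything Robyn exposes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # three "media" predictors
true_beta = np.array([2.0, 1.0, 0.5])     # true channel effects
y = X @ true_beta + rng.normal(scale=0.1, size=100)

lam = 50.0                                # heavy penalty so the shrinkage is visible
I = np.eye(3)

# Ridge: equivalent to a zero-mean Gaussian prior -> coefficients pulled toward 0
ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)

# Same solve with an informative prior mean m -> coefficients pulled toward m
m = np.array([1.8, 0.9, 0.4])             # e.g. ROIs from an earlier study (hypothetical)
bayes = np.linalg.solve(X.T @ X + lam * I, X.T @ y + lam * m)

print(ridge)   # shrunk toward zero relative to true_beta
print(bayes)   # shrunk toward the prior mean instead
```

With a reasonable prior the second estimate lands much closer to the truth; with a bad prior it would land closer to the bad guess. That is the tradeoff in one line of algebra.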
Problematic situations
Robyn’s heavier demands on the data and lack of priors mean that it will struggle and possibly fail to estimate accurately in three common situations:
- Non-causal correlation
- New media channels
- Increasing measurement detail
Endogeneity (non-causal correlation)
Despite the cautionary refrain “correlation does not imply causation”, all models use the past both to predict the future and to make inferences about which advertising caused which sales. The correlation isn’t always reliable, but it is usually the best estimate we have. That said, there are some obvious cases where we know the correlation between media spending and sales will be very high yet clearly not be the cause of the sale. The most blatant example is branded paid search, where the prospect has already typed the company name into the search bar and then receives a paid link to that company’s site. Clearly the prospect was already on their way to the site, so there is a high correlation between the clicks and sales. How much credit the paid link deserves is discussed in other articles by me and others. But what is obvious without further reading is that a large amount of the intent to buy existed prior to the paid link and should not be estimated as being caused by the link.
This situation is problematic for Robyn. Without priors, there is no direct way to correct for this error.
New Media Channels
Marketing is constantly evolving, and companies frequently look for customers via new media channels. By their nature, new channels have fewer weeks of data and probably lower spending levels as well. That’s already a harder measurement situation, and it is further exacerbated by Ridge Regression’s bias toward a zero ROI whenever the ROI is unclear. New channels probably do enjoy less success than tried-and-tested ones, but biasing their measurements toward zero doesn’t give them a fair chance.
Increasing Measurement Detail
Another common situation is for a client to get all the MMM answers, approve of them, and then want more detail. This often results in taking a category of spend and dividing it into two distinct categories. Ideally the historical data should be split too, but often that’s not possible, so the media is simply divided into two new data streams going forward. This is the clearest case where a Bayesian prior would have been useful: the aggregate data was already in the model and being measured, and logically we know the two new data streams should add up to the prior aggregate stream. With no priors available, however, Robyn must now estimate two new data streams with very little data for either one.
Financial Consequences
The financial consequences of each of these errors will clearly vary by the size of the investment and the errors. However, directionally we can intuit the consequences.
Non-causal correlation
Interpreting non-causal correlation as the cause of sales will result in taking advertising investment away from what’s working and putting it into what isn’t. This study, conducted by researchers at eBay, the University of Chicago, and the University of California at Berkeley, shows that, in eBay’s case, branded paid search caused essentially zero incremental sales while being highly correlated with sales. Taking that as the worst-case scenario, the loss to the company would be the amount spent on the ineffective media times the ROI that would have been achieved had those funds been deployed elsewhere. The loss to the CMO would be a loss of trust and a shortened tenure, as increasing the budgets of ineffective media would produce few actual sales, so the marketing promises made would never materialize.
New media channels
Measuring new media channels as less effective than they truly are would hamper growth. Management would get more bad news than was warranted, closing doors that could have helped grow the company. On a personal level, the blame would probably fall on the manager spearheading the new media channel.
Increasing measurement detail
Splitting existing media into multiple categories that are then measured to total less than the original is probably the error most obvious to even a casual observer. Odds are it would simply be interpreted as measurement error and erode confidence in the MMM solution as a whole. But if the deflated measurements were accepted as truth, the most likely result would be to cut the new category of spend that measured as less effective. That’s probably the least consequential of the three negatives listed here, but needlessly cutting a positive-ROI advertising channel is never a good thing.
How to work around the problem
Fortunately, there is a workaround, but before I get to it, I want to make clear that I’m not suggesting Robyn is an ineffective solution generally, nor am I saying that using priors is always advantageous. I am saying that the three problematic situations described above could have benefited from a thoughtfully selected prior based on earlier data, other studies, and so on, and therefore represent three areas of weakness for Robyn.
One way to “insert priors into Robyn” is through its calibration to independent A/B tests. Robyn is designed to minimize three sources of error, and one of those is the distance to A/B test results. In effect, calibration adds a prior to the model, with the prior coming from the test. Whether you set up new tests or create simulated tests based on existing evidence is your choice, but it’s one way to mitigate some known problem areas in Robyn.
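Schematically, calibration works by blending fit error with a penalty for disagreeing with the experiment. The sketch below is a simplification of that idea, not Robyn’s actual objective; the function names, weighting, and error definitions are all illustrative:

```python
import numpy as np

def nrmse(actual, predicted):
    """Normalized root-mean-square prediction error (fit quality)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return rmse / (actual.max() - actual.min())

def calibration_error(model_lift, experiment_lift):
    """Relative gap between the lift the model attributes to a channel
    and the lift an A/B test measured for the same channel and period."""
    return abs(model_lift - experiment_lift) / experiment_lift

def combined_loss(actual, predicted, model_lift, experiment_lift, w=0.5):
    """Weighted blend: candidate models that fit the sales data well but
    contradict the experimental lift are penalized."""
    return ((1 - w) * nrmse(actual, predicted)
            + w * calibration_error(model_lift, experiment_lift))
```

Under an objective like this, a candidate model whose attributed lift matches the experiment scores better, which in effect pulls the channel’s coefficient toward the tested value, much as a prior would.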
MMM Audits
If you’ve made it this far, I hope you feel the article was worth the time you spent reading it. As stated at the outset, I plan to write about the other 24 areas of vulnerability that I’ve noticed in Robyn in the coming weeks as long as interest remains.
If you’d like to find opportunities to improve a mix model you’re working with, contact me at [email protected]. I offer low-priced MMM audits, or I can build you a new mix model from scratch, as you prefer.