So their argument is that these tests don't provide causal inference, because the platforms can target the two A/B test groups differently.
With that in mind, my reading of this is: if you're a researcher trying to say "Advertisement A is more appealing than Advertisement B", you need causal inference and these tests won't give it to you. If you're a marketer just trying to determine which ad spend is more efficient, you don't need that causal inference.
In other words, they are debunking the use of these tools for research studies, not their use as marketing tools.
I think it's somewhat more important than that. Basically the point is that an A/B test tells you "which ad is the more effective spend on this particular platform, with the particular mix of users you selected". If you try to expand the user group after the A/B test is over, or if you take the ads to another platform, you shouldn't expect the results to hold.
So if you're an ad exec, you should make sure you run separate A/B tests for each specific ad channel you intend to spend money on. An A/B test on Facebook does not tell you anything about how well the same ads would do on Google, and even less so on TV (even assuming you are reaching roughly the same type of audience). This happens because Facebook's targeting mechanisms are not the same as Google's, so the two platforms may optimize delivery differently within the same population groups you selected and give you different results. And TV ads are not targeted at all, even if the audience you are reaching is the same.
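To make that concrete, here's a toy simulation (my own sketch with made-up numbers, not from the paper): the arms are split 50/50, but each platform's delivery model decides who inside each arm actually sees the ad, and that alone is enough to flip which creative looks better.

```python
# Toy simulation (made-up numbers): same two creatives, same audience, but the
# platform's delivery model decides who inside each 50/50 arm actually sees the ad.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
young = rng.random(n) < 0.5   # audience: half "young", half "older" users

# True (unobserved) conversion rates per creative and segment.
p_conv = {
    ("A", True): 0.030, ("A", False): 0.010,   # creative A lands with young users
    ("B", True): 0.012, ("B", False): 0.028,   # creative B lands with older users
}

def run_test(p_show_young):
    """Users are randomized 50/50 into arms, but within each arm the platform
    serves the ad preferentially to the segment its delivery model favors."""
    arm_a = rng.random(n) < 0.5
    observed = {}
    for creative, in_arm in (("A", arm_a), ("B", ~arm_a)):
        seg_young = young[in_arm]
        # The platform shows the ad to a biased subset of the arm.
        shown = rng.random(in_arm.sum()) < np.where(seg_young, p_show_young, 1 - p_show_young)
        p = np.where(seg_young[shown], p_conv[(creative, True)], p_conv[(creative, False)])
        observed[creative] = round((rng.random(shown.sum()) < p).mean(), 4)
    return observed

# A platform that skews delivery toward young users vs. one that skews toward older users:
print("skews young:", run_test(0.8))   # creative A looks better here
print("skews older:", run_test(0.2))   # creative B looks better here
```

Same creatives, same nominal randomization, opposite "winners", purely because of who each platform chose to show the ads to.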
True, but any ad exec worth their salt already knows this, if not because of different targeting algorithms, then at least because of different user and intent profiles (e.g. social users are generally younger and lower-intent).
The issue seems to be that the platforms optimize before showing — presumably because they get paid for click-throughs.
Couldn’t they offer an unbiased randomization option with a different payment model (e.g. based on impressions rather than clicks)? That would preserve their revenue and give researchers a good tool.
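Something like this, maybe (purely hypothetical, no platform offers this as far as I know): assignment is a per-user coin flip that ignores the relevance model entirely, and billing is per impression served.

```python
# Hypothetical sketch of an "unbiased randomization" option: arm assignment is a
# deterministic hash of the user id, independent of any predicted click-through
# rate, and the advertiser is billed per impression rather than per click.
import hashlib

CPM = 5.00  # assumed price per 1,000 impressions; made-up number

def assign_arm(user_id: str, experiment_id: str) -> str:
    """Coin-flip assignment that ignores the platform's relevance model."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def bill(impressions_served: int) -> float:
    """Charge for showings, not clicks, so the platform has no incentive
    to steer delivery toward likely clickers."""
    return impressions_served / 1000 * CPM

# Example: every eligible impression goes to whichever creative the hash picks.
print(assign_arm("user-123", "exp-42"), bill(250_000))
```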
There is a section 2.1.3 "Online platform studies versus lift tests" in the article. For the marketing tools purpose, you can use either (or some mixture of both). There are pros and cons to the choice.
> i.e., the inability to attribute user responses to ad creatives versus the platform’s targeting algorithms
Why would people expect to measure just the creative? How good the platform is able to target is part of what one would want to include in the measurement.
The goal of these experiments is to see which creative pushes the metrics the most.
The reason you need this is that the hypothesis is about the creative, not about the creative-plus-platform combination. You'd want the creative choice to come out the same way across platforms.
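E.g., a trivial check once you have separately run tests on each platform (made-up numbers below): does the A-vs-B decision actually replicate?

```python
# Hypothetical check: does the A-vs-B decision replicate across platforms?
# Inputs are per-platform conversion rates from separately run tests (made-up data).
results = {
    "facebook": {"A": 0.026, "B": 0.015},
    "google":   {"A": 0.014, "B": 0.025},
}

winners = {platform: max(rates, key=rates.get) for platform, rates in results.items()}
if len(set(winners.values())) == 1:
    print(f"Creative choice replicates across platforms: {winners}")
else:
    print(f"Creative choice is platform-dependent: {winners}")
```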
I really found this paper interesting and concerning. I don’t work in marketing or run these kinds of studies, but I do work at Qualtrics and have experience with A/B testing in general. For those who work in this space and can relate to this paper, would it be helpful if Qualtrics developed some kind of audit panel in our product to help surface potential platform bias? For example, sample ratio mismatch or metadata balance checks.
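For the sample ratio mismatch piece specifically, a minimal version is just a chi-square goodness-of-fit test on the arm counts against the planned split. A sketch (not existing Qualtrics functionality, just the standard SRM test):

```python
# Minimal sample ratio mismatch (SRM) check: a chi-square goodness-of-fit test
# on the observed arm counts against the intended split. A tiny p-value means
# users were not delivered to the arms in the planned ratio, so something
# upstream (e.g. platform targeting) is interfering with the randomization.
from scipy.stats import chisquare

def srm_check(count_a: int, count_b: int, planned_split=(0.5, 0.5), alpha=0.001):
    total = count_a + count_b
    expected = [total * planned_split[0], total * planned_split[1]]
    _, p_value = chisquare([count_a, count_b], f_exp=expected)
    return p_value, p_value < alpha

# Example: a 50/50 test that came back 50,900 vs 49,100 users.
p, mismatch = srm_check(50_900, 49_100)
print(f"p = {p:.4g}, sample ratio mismatch detected: {mismatch}")
```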
Perhaps something that highlights the limited scope of a test's predictive power? E.g. for a test run on Facebook: "This test is likely to be very useful for another Facebook ad campaign with the same parameters, and at least somewhat useful for a Google ad campaign with equivalent parameters".
While it would be convenient, platforms aren't the same. You can't just assume people will react the same across platforms.