bertil · 2 years ago
The main reason for not using causal inference is not because data scientists don’t know about the different approaches or can’t imagine something equivalent (a lot of reinvention); forecasting is one of the most common tasks, after all.

The main reason is that they generally work for software companies, where it's easier, and less susceptible to analyst influence, to implement the suggested change and test it with a randomized controlled trial. I remember running an analysis that found gender was a significant explanatory factor for behavior on our site; my boss asked (dismissively): what can we do with that information? If there is an assumption about how things work that doesn't translate to a product change, that insight isn't useful; if there is a product intuition, testing the product change itself is key, and there's no reason to delay that.

There are cases where RCTs are hard to organize (for example, multi-sided platform businesses) or changes that can't be tested in isolation (major brand changes). Those tend to benefit from the techniques described there, and they have dedicated teams. But this is a classic case of a complicated tool that doesn't fit most use cases.

riedel · 2 years ago
Actually, causal inference is also really hard to benchmark. My colleague started an effort to actually be able to reproduce and compare results. The algorithms also often do not scale well.

Every time we wanted to use this on real data, it was just a little too much effort, and the results were not conclusive because it is hard to verify huge graphs. My colleague, for example, wanted to apply it to explain risk confounders in investment funds.

I personally also do not like the definition of causality they base it on.

Dzidas · 2 years ago
One way to test this is through a placebo test, where you shift the treatment, such as moving it to an earlier date, which I have seen used successfully in practice. Another approach is to test the sensitivity of each feature, which is often considered more of an art than a science. In practice, I haven't observed much success with this method.
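For anyone who wants to try this, here is a minimal sketch of the placebo idea (shift the treatment to an earlier, fake date and check that no "effect" shows up there); the data, dates, and model below are all made up:

    # Placebo test sketch: refit the same model with a fake, earlier cutoff.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    dates = pd.date_range("2023-01-01", periods=120, freq="D")
    df = pd.DataFrame({
        "date": dates,
        # toy metric: slight upward drift plus noise, no real treatment effect
        "metric": 100 + 0.05 * np.arange(120) + rng.normal(0, 2, 120),
    })

    def post_effect(data, cutoff):
        """Estimate the level shift in `metric` after `cutoff` with simple OLS."""
        d = data.assign(post=(data["date"] >= cutoff).astype(int),
                        t=np.arange(len(data)))
        fit = smf.ols("metric ~ t + post", data=d).fit()
        return fit.params["post"], fit.pvalues["post"]

    real_cutoff = "2023-03-15"     # when the change actually shipped
    placebo_cutoff = "2023-02-15"  # fake date, strictly before the real one

    print("real   :", post_effect(df, real_cutoff))
    # Placebo: use only pre-treatment data so the real change cannot leak in.
    print("placebo:", post_effect(df[df["date"] < real_cutoff], placebo_cutoff))
    # If the placebo "effect" is also significant, a trend or confounder is
    # doing the work, not the treatment.
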
kjkjadksj · 2 years ago
You don't need to look at a graph at all though, right? There are plenty of tests that can help you identify factors that could be significantly affecting your distribution.
esafak · 2 years ago
Go on, please. What definition, and algorithms with scaling problems?
mikpanko · 2 years ago
A/B experiments are definitely the gold standard, as they provide a true causality measurement (if implemented correctly). However, they are often expensive to run: you need to implement the feature in question (which has a less than 50% chance of actually working) and then collect data for 1-4 weeks before being able to make the decision. As a result, only a small number of business decisions today rely on A/B tests. Observational causal inference can help bring causality into many of the remaining decisions, which need to be made more quickly or cheaply.
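To be fair, the expensive parts are building the feature and waiting 1-4 weeks for traffic; the decision step at the end is cheap. A minimal sketch of that last step for a conversion-rate A/B test, with made-up numbers:

    # Two-proportion z-test on conversion counts from an A/B experiment.
    from statsmodels.stats.proportion import proportions_ztest

    conversions = [1210, 1325]   # control, treatment
    visitors = [24500, 24610]

    stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
    print(f"control rate:   {conversions[0] / visitors[0]:.3%}")
    print(f"treatment rate: {conversions[1] / visitors[1]:.3%}")
    print(f"z = {stat:.2f}, p = {p_value:.4f}")  # ship only if the lift is significant
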
jiggawatts · 2 years ago
The “gold standard” has failure modes that seem to be ignored.

E.g.: making UI elements jump around unpredictably after a page load may increase the number of ad clicks simply because users can’t reliably click on what they actually wanted.

I see A/B testing turning into a religion where it can’t be argued with. “The number went up! It must be good!”

lr1970 · 2 years ago
There can be a real ethical dilemma when applying A/B testing in medical setting. Placing someone with an incurable disease in a control group is condemning them to death while in treatment group they might have a chance. On the other hand, without a proper A/B testing methodology the drug efficacy cannot be established. So far no perfect solution to the dilemma has been found.
bertil · 2 years ago
Causal inference is useful, but it's neither quicker nor cheaper.
travisjungroth · 2 years ago
> As a result, only a small number of business decisions today rely on A/B tests.

The default for all code changes at Netflix is they’re A/B tested.

hackernewds · 2 years ago
An expensive test is better than an expensive mistake :) Across the hundreds of decisions made with the inherent biases of the product/biz/ops teams, that directional misalignment can be catastrophic.
Dzidas · 2 years ago
You can apply it to estimate the impact of any business decision if you have data, so it's not only IT companies that can benefit from it. However, the problem arises when the results don't align with the business's expectations. I have firsthand experience with projects being abandoned simply because the results didn't meet expectations.
Anon84 · 2 years ago
For a hands-on introduction to causality, I would recommend “Causal Inference in Python” by M. Facure (https://amzn.to/46byWnl). Well written and to the point.

<ShamelessSelfPromotion> I also have a series of blog posts on the topic: https://github.com/DataForScience/Causality where I work through Pearl's Primer: https://amzn.to/3gsFlkO </ShamelessSelfPromotion>

quotient · 2 years ago
The Facure text is good, can confirm
qmsoqm · 2 years ago
Thanks for the recommendation
hackernewds · 2 years ago
Thank you for sharing
mbowcut2 · 2 years ago
For what it's worth, my undergraduate degree was in Economics with an emphasis in econometrics, and this article touched on probably 80% of the curriculum.

The only problem is by the time I graduated I was somewhat disillusioned with most causal inference methods. It takes a perfect storm natural experiment to get any good results. Plus every 5 years a paper comes out that refutes all previous papers that use whatever method was in vogue at the time.

This article makes me want to get back into this type of thinking though. It’s refreshing after years of reading hand-wavy deep learning papers where SOTA is king and most theoretical thinking seems to occur post hoc, the day of the submission deadline.

mmmmpancakes · 2 years ago
Yeah, the only common theme I see in causal inference research is that every method and analysis eventually succumbs to a more thorough analysis that uncovers serious issues in the assumptions.

Take, for instance, the running example of Catholic schooling's effect on test scores used by the book Counterfactuals and Causal Inference. Subsequent chapters re-treat this example with increasingly sophisticated techniques and more complex assumptions about causal mechanisms, and each time they uncover a flaw in the analysis done with the techniques from previous chapters.

My lesson from this: causal inference outcomes are very dependent on assumptions and methodologies, of which there are many options. This is a great setting for publishing new research, but it's the opposite of what you want in an industry setting, where the bias is (and should be) towards methods that are relatively quick to test, validate, and put in production.

I see researchers in large tech companies pushing for causal methodologies, but I'm not convinced they're doing anything particularly useful since I have yet to see convincing validation on production data of their methods that show they're better than simpler alternatives which will tend to be more robust.

Trombone12 · 2 years ago
> My lesson from this: causal inference outcomes are very dependent on assumptions and methodologies, of which there are many options.

This seems like a natural feature of any sensitive method, not sure why this is something to complain about. If you want your model to always give the answer you expected you don't actually have to bother collecting data in the first place, just write the analysis the way pundits do.

gridland · 2 years ago
Just use propensity scores + IPW and you have the same thing as an RCT. :)
mmmmpancakes · 2 years ago
From my experience, propensity scores + IPW really don't get you far in practice. Propensity scoring models rarely balance all the covariates well (more often, one or two are marginally better and some may be worse than before). On top of that, IPW either assumes you don't have any cases of extreme imbalance, or, if you do, you end up trimming weights to avoid adding additional variance, and in some cases you still add variance even with trimmed weights.
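To make that concrete, here is a minimal propensity score + IPW sketch (with the weight trimming mentioned above) on simulated data; all the numbers are made up, and the balance check at the end is exactly where things tend to fall apart in practice:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)
    n = 5000
    x = rng.normal(size=(n, 3))                        # observed confounders
    p_treat = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
    t = rng.binomial(1, p_treat)                       # treatment assignment
    y = 2.0 * t + x[:, 0] + 0.5 * x[:, 2] + rng.normal(size=n)  # true effect = 2

    # 1. Propensity model
    ps = LogisticRegression(max_iter=1000).fit(x, t).predict_proba(x)[:, 1]

    # 2. Inverse propensity weights, trimmed to tame extreme values
    w = np.where(t == 1, 1 / ps, 1 / (1 - ps))
    w = np.clip(w, None, np.quantile(w, 0.99))         # trim the top 1% of weights

    # 3. Weighted difference in means as the ATE estimate
    ate = (np.average(y[t == 1], weights=w[t == 1])
           - np.average(y[t == 0], weights=w[t == 0]))
    print(f"IPW ATE estimate: {ate:.2f} (true effect: 2.0)")

    # 4. Balance check: weighted covariate means should match across groups;
    #    if they don't, the propensity model hasn't done its job.
    for j in range(x.shape[1]):
        m1 = np.average(x[t == 1, j], weights=w[t == 1])
        m0 = np.average(x[t == 0, j], weights=w[t == 0])
        print(f"x{j}: weighted mean difference = {m1 - m0:+.3f}")
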
hackernewds · 2 years ago
Not necessarily; that only holds if you don't skim over meaningful confounding factors :)
g42gregory · 2 years ago
In the corporate and medical data science fields, people are beginning to accept causal inference. It is difficult, as the subject is still in flux and under development.

I am aware of three reputable causal inference frameworks:

1. Judea Pearl's framework, which dominates in CS and AI circles

2. Neyman-Rubin causal model: https://en.wikipedia.org/wiki/Rubin_causal_model

3. Structural equation modelling: https://en.wikipedia.org/wiki/Structural_equation_modeling

None of them would acknowledge each other, but I believe the underlying methodology is the same/similar. :-)

It's good to see that it is becoming more accepted, especially in Medicine, as it will give more, potentially life-saving, information to make decisions.

In Social Sciences, on the other hand, causal inference is being completely willfully ignored. Why? Causal inference is an obstacle to reaching preconceived conclusions based on pure correlations: something correlates with something, therefore... invest large sums of money, change laws in our favor, etc. This works for both sides. Sadly, I don't think this can be fixed.

huitzitziltzin · 2 years ago
> In Social Sciences, on the other hand, causal inference is being completely willfully ignored. Why? Causal inference is an obstacle to reaching preconceived conclusions based on pure correlations: something correlates with something, therefore... invest large sums of money, change laws in our favor, etc. This works for both sides. Sadly, I don't think this can be fixed.

This remark is totally ignorant of the reality in the social sciences. Certainly in economics (which I know well) this hasn't described the reality of empirical work for more than 30 years. Political Science and Sociology are increasingly concerned with causal methods as well.

Medicine on the other hand is the opposite. Medical journals generally publish correlations when they aren't publishing experiments.

cubefox · 2 years ago
> In Social Sciences, on the other hand, causal inference is being completely willfully ignored.

This conflicts with what the article says:

> Economists and social scientists were among the first to recognize the advantages of these emerging causal inference techniques and incorporated in their research.

rubslopes · 2 years ago
Economist here. Causal inference is more alive than ever, in Economics at least. A publication in an applied top journal practically has to use causal methods.

The DID (difference-in-differences) literature, for instance, has been expanding at the speed of light; it has never been so hard to keep up as it is now.
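For anyone who hasn't seen it, the textbook 2x2 case is tiny; a minimal sketch with simulated data and made-up numbers (the hard part, and what much of the recent literature is about, is staggered adoption and heterogeneous treatment effects):

    # Difference-in-differences via the treated x post interaction term.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 2000
    df = pd.DataFrame({
        "treated": rng.binomial(1, 0.5, n),  # treated-group indicator
        "post": rng.binomial(1, 0.5, n),     # post-period indicator
    })
    # outcome: group gap of 1.0, common time trend of 0.5, true effect of 2.0
    df["y"] = (1.0 * df["treated"] + 0.5 * df["post"]
               + 2.0 * df["treated"] * df["post"] + rng.normal(0, 1, n))

    fit = smf.ols("y ~ treated + post + treated:post", data=df).fit()
    print(fit.params["treated:post"])  # DID estimate of the treatment effect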

pocketsand · 2 years ago
The social sciences haven't ignored causal inference. Perhaps it's not everywhere you'd like to see it, but it's common in quant papers, it's the backbone of econometrics, and you'd probably have trouble finding a single top-ranked PhD program that doesn't provide at least cursory coverage of the methods.
bigfudge · 2 years ago
Pearl's framework isn't really distinct from SEM as I understand it. SEM is really just one tool to achieve the sort of adjustments that Pearl describes to make causal inferences from observational data.
_glass · 2 years ago
Social Scientist here. It is thriving under the name Qualitative comparative analysis for a quarter of a century. This is a good paper for more on the epistemological foundations: https://doi.org/10.1177/1098214016673902
mikpanko · 2 years ago
An important topic. Today most tech companies worship A/B experiments as the main way of being data-driven and bringing causality into decision-making. They deserve to be the gold standard.

However, most experiments are usually expensive: they require investing in building the feature in question and then collecting data for 1-4 weeks before being certain of the effects (plus there are long-term ones to worry about). Some companies report that fewer than 50% of their experiments prove truly impactful (my experience as well). That’s why only a small number of business decisions are made using experiments today.

Observational causal inference offers another approach, trading off full confidence in causality for speed and cost. It has been pretty hard to run correctly so far, so it is not widely adopted. We are working on changing that with Motif Analytics and wrote a post with an in-depth exploration of the problem: https://www.motifanalytics.com/blog/bringing-more-causality-... .

tmoertel · 2 years ago
Interestingly, recent research suggests that you can make better decisions by combining experimental and observational data than by using either alone:

https://ftp.cs.ucla.edu/pub/stat_ser/r513.pdf

> Abstract: Personalized decision making targets the behavior of a specific individual, while population-based decision making concerns a sub-population resembling that individual. This paper clarifies the distinction between the two and explains why the former leads to more informed decisions. We further show that by combining experimental and observational studies we can obtain valuable information about individual behavior and, consequently, improve decisions over those obtained from experimental studies alone.

mwexler · 2 years ago
I 100% agree with this blind spot. Most data science coursework avoids the very thing that makes it a science: explaining what change causes what effect. I've been surprised that, year after year, programs at so many "Schools of Data Science" keep gliding over this area, perhaps alluding to it in an early stats course, if at all.

It's an important part of validating that your data-driven output or decision is actually creating the change you hope for. So many fields either do poor experimentation or none at all; others are prevented from doing the usual "full unrestricted RCT": med and fin svcs and other regulated industries have legal constraints on what they can experiment with, and in other cases data privacy restricts the measures one can take.

I've had many data folks throw up their hands if they can't do a full RCT and instead turn to pre-post comparisons with lots of methodological errors. You can guess how many of those projects end up. (No, not every change needs a full test, and some things are easy to roll back. But think of how many others would have benefitted from some uncertainty reduction.)

Sure, "LLM everything" and "just gbm it!" and "ok, just need a new feature table and I'm done!" are all important and fun parts of a data science day. But if I can't show that a data driven decision or output makes things better, then it's just noise.

Causal modeling gets us there. It improves the impact of ML models that recognize the power of causal interventions, and it gives us evidence that we are helping (or harming).

It's (IMO) necessary, but of course, not sufficient. Lots of other great things are done by ML eng and data scientists and data eng and the rest, having nothing to do with causal inference... But I keep thinking how much better things get when we apply a causal lens to our work.

(And next on my list would be having more data folks understanding slowly changing dimension tables, but this can wait for another time).

rcthompson · 2 years ago
I realize this is nitpicking a minor point in your comment, but I don't agree with your characterization of RCTs in medical research as being primarily constrained by laws and regulations. Any time I've discussed research on human subjects with doctors doing that research, the discussion of what is and is not an acceptable experiment has always been primarily driven by the risks of harm to the people involved in the study. Any time the law comes up, it's usually because the law requires an RCT in a specific setting, as opposed to preventing it (e.g. drug trials). (Of course in the setting of starting a company based on some medical product, the situation may be quite different.)

Biologists, if not data scientists, are used to considering indirect evidence for causality. It's why we sometimes accept studies performed in other organisms as evidence for biology in humans; it's why we sometimes accept research performed on post-mortem human tissue as being representative of the biology of living humans; to name but a few examples. A big part of a compelling high-impact biology (or bioinformatics) paper is often the innovative ways one comes up with to show causality when a direct RCT is not feasible, and papers are frequently rejected because they don't do the follow-up experiments required to show causality.

mwexler · 2 years ago
That's a very fair point. I didn't mean to suggest that harm to the patients or subjects was not the overriding factor, nor that bio, pharma, and other medical fields never do RCTs.

But there are a slew of laws and requirements around _how_ to run an RCT across the world of bio-related work, esp as it becomes a product. From marketing to manufacture to packaging, there are strict limits around where variation is allowed, at least anything involving the FDA in the US. (Some would say too many regs, others say not enough).

And in those cases, having a wider collection of ways to impute cause would be great.

tomrod · 2 years ago
I've been self-learning in the causal inference space for a long time, and model evaluation is a concern for me. My biggest concern is falsification of hypotheses. In ML, you have a clear mechanism to check estimation/prediction through holdout approaches. In classical statistics, you have model metrics that can be used to define reasonable rejection regions for hypothesis tests. But causal inference doesn't seem to have this, outside traditional model fit metrics or ML holdout assessment? So the only way a model is deemed acceptable is by prior biases?

If my understanding is right, this means that each model has to be hand-crafted, adding significant technical debt to complex systems, and we can't get ahead of the assessment. And yet, it's probably the only way forward for viable AI governance.

godelski · 2 years ago
> In ML, you have a clear mechanism to check estimation/prediction through holdout approaches.

To be clear, you can overfit even when your validation loss looks fine. If your train and test data are too similar, then no holdout will help you measure generalization. You have to remember that datasets are proxies for the thing you're actually trying to model; they are not the thing you are modeling itself. You can usually see this when testing on in-class but out-of-train/test-distribution data (e.g. data from someone else).

You have to be careful because there are a lot of small and non-obvious things that can fuck up statistics. There are lots of aggregation "paradoxes" (Simpson's, Berkson's), and all kinds of things that can creep in. This gets more perilous the bigger your model is, too. The story of the Monty Hall problem is a great example of how easy it is to get the wrong answer while it seems like you're doing all the right steps.
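To make the aggregation point concrete, a toy Simpson's paradox example (the classic kidney stone numbers): the treatment that is better within every subgroup looks worse in the aggregate, because severity confounds assignment.

    import pandas as pd

    df = pd.DataFrame({
        "severity":  ["small stones"] * 2 + ["large stones"] * 2,
        "treatment": ["A", "B", "A", "B"],
        "recovered": [81, 234, 192, 55],
        "total":     [87, 270, 263, 80],
    })
    df["rate"] = df["recovered"] / df["total"]
    print(df)  # within each severity level, A has the higher success rate

    agg = df.groupby("treatment")[["recovered", "total"]].sum()
    print(agg["recovered"] / agg["total"])  # aggregated, B looks better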

For the article, the author is far too handwavy about causal inference. The reason we tend not to do it is because it is fucking hard and it scales poorly. Autoregressive models (careful here) and normalizing flows can do causal inference (and causal discovery), fwiw; essentially you need explicit density models with tractable densities (referring to Goodfellow's taxonomy). But things get funky as you add variables, because there are indistinguishable causal graphs (see Hyvärinen and Pajunen). Then there are also the issues with the different types of causality (see Judea Pearl's ladder), and counterfactual inference is FUCKING HARD, but the author just acts like it's no big deal. Then he starts conflating it with weaker forms of causal inference. Correlation is the weakest form of causation, despite our oft-chanted saying that "correlation does not equate to causation" (which is still true; it's just that correlation is in the class, and the saying is really getting at confounding variables). This very much does not scale. Similarly, discovery won't scale, as you have to permute so many variables in the graph. The curse of dimensionality hits causal analysis HARD.

mjburgess · 2 years ago
To be clear, the mechanism for checking ML doesn't really check ML. There's really little value in a confidence interval conditional on the same experimental conditions that produced the dataset on which the model is trained. I'd often say it's actively harmful, since it's mostly misleading.

Insofar as causal inference has no such 'check', it's because there never was any. Causal inference is about dispelling that illusion.

tomrod · 2 years ago
> Insofar as causal inference has no such 'check', it's because there never was any. Causal inference is about dispelling that illusion.

Aye, and that's the issue I'm trying to understand. How do we know whether model 1 or model 2 is more "real" or, for lack of a better term, more useful and reflective of reality?

We can focus on a particular philosophical point, like parsimony / Occam's razor, but as far as I can tell that isn't always sufficient.

There should be some way to determine a model's likelihood of structure beyond "trust me, it works!" If there is, I'm trying to understand it!