Error bar plots showing the point estimate (and its 95% confidence interval) of the effect of DYS on MIL (in litres). On each plot the red horizontal line shows the effect of DYS on MIL.
Assumed DAG under respective model



In epidemiology we define exposure and outcome variables, and interest lies in estimating the strength of the causal effect of exposures on outcomes.
A confounder is a third variable that is associated with both an exposure and an outcome (Greenland and Morgenstern 2001). Controlling for, or conditioning an analysis on, a confounder (using, for example, stratification or regression) removes the bias that the confounder would otherwise introduce into the estimate of the association between the exposure and the outcome.
A collider, on the other hand, is a third variable that is influenced by both an exposure and an outcome. Controlling for, or conditioning an analysis on, a collider (using, for example, selection, stratification or regression) leads to a biased estimate of the association between the exposure and the outcome (Cole et al. 2009). Collider bias is likely to explain some of the paradoxical findings that appear in the medical and veterinary epidemiological literature (Rohrer 2018).
The objective of this web page is to illustrate the effect of conditioning on a collider, based on a realistic example from food animal practice. Our interest is to estimate the effect of an exposure (the presence of dystocia, DYS) on an outcome (first herd test milk yield, MIL) in dairy cattle using a linear regression model. For this example age of the cow (AGE) is a confounder. Whether or not the cow was examined and treated by a veterinarian (VET) at the time of calving is a collider.
Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, Poole C (2009) Illustrating bias due to conditioning on a collider. International Journal of Epidemiology 39: 417-420.
Freedman D (2010) Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Cambridge University Press, London.
Greenland S, Morgenstern H (2001) Confounding in health research. Annual Review of Public Health 22: 189-212.
Luque-Fernandez M, Schomaker M, Redondo-Sanchez D, Jose Sanchez Perez M, Vaidya A, Schnitzer M (2019) Educational Note: Paradoxical collider effect in the analysis of non-communicable disease epidemiological data: a reproducible illustration and web application. International Journal of Epidemiology 48: 640-653. DOI: 10.1093/ije/dyy275.
Pearl J (1995) Causal diagrams for empirical research. Biometrika 82: 669-688.
Rohrer JM (2018) Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science 1: 27-42.
Vanderweele TJ, Vansteelandt S (2009) Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface 2: 457-468.
Weiskopf N, Dorr D, Jackson C, Lehmann H, Thompson C (2023) Healthcare utilization is a collider: an introduction to collider bias in EHR data reuse. Journal of the American Medical Informatics Association 30: 971-977. DOI: 10.1093/jamia/ocad013.
Using an example from food animal practice, we simulate a dataset to demonstrate the incorrect inference that can arise from collider selection bias.
Several factors influence first herd test milk yield (MIL) in dairy cows including the presence of dystocia (DYS) at calving and cow age (AGE).
MIL is positively associated with AGE and negatively associated with DYS. Whether or not a cow is examined and treated by a veterinarian at the time of calving (VET) is positively associated with both DYS and MIL, noting that DYS precedes a veterinary visit.
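The relationships described above can be sketched as a small data-generating function. This is a minimal illustration in Python with NumPy, not the code behind the application. The function name `simulate_herd` and every numeric value except the DYS effect of -3 litres are hypothetical. The direction of the AGE effect on DYS (younger cows calve with more difficulty) is also an assumption, since the text states only that the two are associated.

```python
import numpy as np


def expit(x):
    """Inverse logit: maps log odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate_herd(n=100_000, beta_dys=-3.0, seed=42):
    """Simulate cows under the assumed DAG:
    AGE -> DYS, AGE -> MIL, DYS -> MIL, DYS -> VET, MIL -> VET.
    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Cow age in years (hypothetical distribution, truncated below at 2).
    age = np.clip(rng.normal(5.0, 1.5, n), 2.0, None)
    # Dystocia: assumed to be less likely in older cows.
    dys = rng.binomial(1, expit(-1.5 - 0.3 * (age - 5.0)))
    # Milk yield (litres): rises with AGE, falls by |beta_dys| with DYS.
    mil = 25.0 + 1.0 * (age - 5.0) + beta_dys * dys + rng.normal(0.0, 5.0, n)
    # Veterinary visit: odds rise with DYS (OR 10) and with MIL (OR 1.2 per litre).
    vet = rng.binomial(
        1, expit(-2.2 + np.log(10.0) * dys + np.log(1.2) * (mil - 25.0))
    )
    return {"AGE": age, "DYS": dys, "MIL": mil, "VET": vet}


herd = simulate_herd()
for name, values in herd.items():
    print(f"{name}: mean = {values.mean():.2f}")
```

Changing the intercepts alters the prevalence of DYS and VET, and changing the log odds ratios alters how strongly VET depends on DYS and MIL.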
In this example, AGE confounds the association between DYS and MIL because it is associated with both the exposure (DYS) and the outcome (MIL), and the effect of AGE on MIL and the effect of DYS on MIL occur through two independent pathways. We say that AGE is on 'the back-door path' between DYS and MIL.
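To make the role of AGE concrete, the sketch below compares a crude regression of MIL on DYS with one that also adjusts for AGE. It is written in Python with NumPy and uses hypothetical parameter values. Only the DYS effect of -3 litres comes from the text, and the helper `dys_coefficient` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical data-generating values; AGE affects both DYS and MIL.
age = np.clip(rng.normal(5.0, 1.5, n), 2.0, None)
# Assumption: the log odds of dystocia fall by 0.3 per year of age.
dys = rng.binomial(1, 1.0 / (1.0 + np.exp(1.5 + 0.3 * (age - 5.0))))
mil = 25.0 + 1.0 * (age - 5.0) - 3.0 * dys + rng.normal(0.0, 5.0, n)


def dys_coefficient(outcome, *covariates):
    """Least-squares fit with an intercept; returns the coefficient
    of the first covariate supplied."""
    X = np.column_stack([np.ones(len(outcome)), *covariates])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]


crude = dys_coefficient(mil, dys)          # MIL ~ DYS
adjusted = dys_coefficient(mil, dys, age)  # MIL ~ DYS + AGE
print(f"crude: {crude:.2f}  AGE-adjusted: {adjusted:.2f}  simulated effect: -3.00")
```

Under these assumptions the crude estimate mixes the effect of DYS with the effect of AGE, while adjusting for AGE closes the back-door path and should return a value close to -3.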
VET, on the other hand, is a collider in this example: it is associated with both DYS and MIL. An investigator is unlikely to include VET as an explanatory variable in an analysis because they'll be well aware that VET is a consequence of DYS, not a risk factor for it. What is conceivable is that investigators might include only cows visited by a veterinarian in a study (using, for example, practice records). This restricts the data by VET and leads to, as we demonstrate on this page, collider selection bias.
Simulating a data set based on a DAG is useful for learning about the effect of colliders on inference. Given that we're in the privileged position of knowing the truth, we can make an objective assessment of how well different model formulations approximate this 'truth'. In this example we want to estimate the effect of DYS on MIL. The default effect is -3 litres, which means that a DYS-positive cow produces 3 litres less milk at first herd test than a DYS-negative cow. For each of the model specifications we want to know how close our estimate of β1, the regression coefficient for DYS, is to -3.
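The comparison of model specifications can be sketched as follows. This is an illustrative Python and NumPy example under assumed parameter values, not a reproduction of the application's own code. The four specifications shown, and names such as `fit_beta1` and `TRUE_BETA1`, are chosen here for illustration.

```python
import numpy as np


def expit(x):
    """Inverse logit: maps log odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_beta1(mil, dys, *others):
    """Ordinary least squares of MIL on DYS plus any other covariates;
    returns beta1, the coefficient for DYS."""
    X = np.column_stack([np.ones(len(mil)), dys, *others])
    coef, *_ = np.linalg.lstsq(X, mil, rcond=None)
    return coef[1]


rng = np.random.default_rng(2024)
n = 200_000
TRUE_BETA1 = -3.0  # effect of DYS on MIL, as stated in the text

# Hypothetical data-generating values (assumptions, apart from TRUE_BETA1).
age = np.clip(rng.normal(5.0, 1.5, n), 2.0, None)
dys = rng.binomial(1, expit(-1.5 - 0.3 * (age - 5.0)))
mil = 25.0 + 1.0 * (age - 5.0) + TRUE_BETA1 * dys + rng.normal(0.0, 5.0, n)
vet = rng.binomial(
    1, expit(-2.2 + np.log(10.0) * dys + np.log(1.2) * (mil - 25.0))
)
seen = vet == 1  # cows that would appear in practice records

estimates = {
    "MIL ~ DYS": fit_beta1(mil, dys),
    "MIL ~ DYS + AGE": fit_beta1(mil, dys, age),
    "MIL ~ DYS + AGE + VET": fit_beta1(mil, dys, age, vet),
    "MIL ~ DYS + AGE, VET = 1 only": fit_beta1(mil[seen], dys[seen], age[seen]),
}
for spec, beta1 in estimates.items():
    print(f"{spec:32s} beta1 = {beta1:6.2f}  (bias {beta1 - TRUE_BETA1:+.2f})")
```

With these assumed values, the AGE-adjusted model fitted to all cows should land close to -3, whereas including VET as a covariate or restricting the data to cows seen by a veterinarian should move the estimate away from it. How far it moves depends on the prevalence of VET and on the odds ratios linking VET to DYS and MIL.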
We propose that collider selection bias is important in some situations and not in others. Experiment with changing the prevalence of DYS and VET and each of the odds ratio estimates. What happens to β1 when:
Download the data if you want to repeat the analyses presented in this app using your own statistical software.
Download simulations (.csv)

This application is based on code provided as supplementary material for the article by Luque-Fernandez et al. (2019). It has been adapted to use an animal health example by Mark Stevenson from the Veterinary Epidemiology @ Melbourne group at the Melbourne Veterinary School, University of Melbourne, Parkville 3010, Victoria, Australia. We thank Luque-Fernandez and colleagues for making their code available.