Motivation
In epidemiology we define exposure and outcome variables, and interest lies in estimating the strength of the causal effect of an exposure on an outcome.
A confounder is a third variable that is associated with both an exposure and an outcome (Greenland and Morgenstern 2001), typically because it influences both. Controlling for, or conditioning an analysis on, a confounder (using, for example, stratification or regression) removes the bias that the confounder would otherwise introduce into the estimate of the association between the exposure and the outcome.
A collider, on the other hand, is a third variable that is influenced by both an exposure and an outcome. Controlling for, or conditioning an analysis on, a collider (using, for example, selection, stratification or regression) leads to a biased estimate of the association between the exposure and the outcome (Cole et al. 2009). Collider bias is a likely explanation for some of the paradoxical findings that appear in the medical and veterinary epidemiological literature (Rohrer 2018).
The objective of this web page is to illustrate the effect of conditioning on a collider, using a realistic example from food animal practice. We want to estimate the effect of an exposure (the presence of twins, TWIN) on an outcome (dystocia, DYS) using a binary logistic regression model. In this example, age of the cow (AGE) is a confounder. Whether or not the cow was examined and treated by a veterinarian (VET) at the time of calving is a collider.
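The contrast between adjusting for a confounder and conditioning on a collider can be sketched with a small simulation. The Python code below is a minimal illustration under stated assumptions, not the analysis used on this page: AGE is reduced to a binary variable, every coefficient is made up, and the function names (expit, simulate, mh_odds_ratio) are hypothetical. To keep the code free of external libraries, conditioning is done by stratification with a Mantel-Haenszel pooled odds ratio rather than by fitting a logistic regression model.

```python
import math
import random
from collections import Counter


def expit(x):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed=1):
    """Simulate n calvings as (AGE, TWIN, DYS, VET) tuples.

    All coefficients are illustrative assumptions:
    AGE (1 = older cow) influences TWIN and DYS, so it is a confounder;
    TWIN and DYS both influence VET, so VET is a collider.
    The simulated TWIN-DYS odds ratio, conditional on AGE, is 2.
    """
    rng = random.Random(seed)
    cows = []
    for _ in range(n):
        age = int(rng.random() < 0.5)
        twin = int(rng.random() < expit(-3.0 + 1.0 * age))
        dys = int(rng.random() < expit(-2.0 + math.log(2.0) * twin - 1.0 * age))
        vet = int(rng.random() < expit(-2.0 + 2.0 * twin + 2.0 * dys))
        cows.append((age, twin, dys, vet))
    return cows


def mh_odds_ratio(cows, stratum=lambda cow: 0):
    """Mantel-Haenszel odds ratio for TWIN and DYS, pooled over strata.

    With the default stratum function every cow falls in a single
    stratum, which gives the crude (unadjusted) odds ratio.
    """
    counts = Counter((stratum(cow), cow[1], cow[2]) for cow in cows)
    num = den = 0.0
    for s in {key[0] for key in counts}:
        a = counts[(s, 1, 1)]  # twins, dystocia
        b = counts[(s, 1, 0)]  # twins, no dystocia
        c = counts[(s, 0, 1)]  # singleton, dystocia
        d = counts[(s, 0, 0)]  # singleton, no dystocia
        total = a + b + c + d
        num += a * d / total
        den += b * c / total
    return num / den


cows = simulate(200_000)
or_crude = mh_odds_ratio(cows)                                  # no adjustment
or_age = mh_odds_ratio(cows, lambda cow: cow[0])                # adjust for AGE
or_age_vet = mh_odds_ratio(cows, lambda cow: (cow[0], cow[3]))  # AGE and VET

print(f"Crude OR:                {or_crude:.2f}")
print(f"Adjusted for AGE:        {or_age:.2f}")
print(f"Adjusted for AGE, VET:   {or_age_vet:.2f}")
```

Under these assumed values the AGE-adjusted odds ratio should land close to the simulated value of 2, the crude odds ratio should be smaller because older cows have more twins but less dystocia, and additionally stratifying on VET should pull the estimate below 1, making twins appear protective against dystocia.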
References
Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, Poole C (2009) Illustrating bias due to conditioning on a collider. International Journal of Epidemiology 39: 417-420.
Freedman D (2010) Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Cambridge University Press, London.
Greenland S, Morgenstern H (2001) Confounding in health research. Annual Review of Public Health 22: 189-212.
Luque-Fernandez M, Schomaker M, Redondo-Sanchez D, Sanchez Perez MJ, Vaidya A, Schnitzer M (2019) Educational Note: Paradoxical collider effect in the analysis of non-communicable disease epidemiological data: a reproducible illustration and web application. International Journal of Epidemiology 48: 640-653. DOI: 10.1093/ije/dyy275.
Pearl J (1995) Causal diagrams for empirical research. Biometrika 82: 669-688.
Rohrer JM (2018) Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science 1: 27-42.
VanderWeele TJ, Vansteelandt S (2009) Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface 2: 457-468.
Weiskopf N, Dorr D, Jackson C, Lehmann H, Thompson C (2023) Healthcare utilization is a collider: an introduction to collider bias in EHR data reuse. Journal of the American Medical Informatics Association 30: 971-977. DOI: 10.1093/jamia/ocad013.