Short Course 1

Mediation analysis using R

Room: TBD (Sunday, 8 July 2018 from 9:00 am – 5:00 pm)


  • Theis Lange (University of Copenhagen) Copenhagen, Denmark
  • Stijn Vansteelandt (Ghent University) Ghent, Belgium


In fields spanning drug testing, epidemiology and the social sciences, researchers often face the challenge of decomposing the effect of an exposure into different causal pathways working through defined mediator variables. The goal of such analyses is often to understand the mechanisms of the system or to suggest possible interventions. Over the last two decades, mediation analysis has developed both a theoretical framework for understanding effect decomposition and specific tools for the actual estimation.

One of the most recent developments is natural effect models, which allow causal-inference-based mediation analysis to be understood (and performed) as traditional regression analysis. The associated R package, medflex, makes natural effect models straightforward to use.
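To give a flavour of what such an analysis can look like, here is a minimal sketch using the imputation-based approach in medflex. It assumes the medflex package is installed and uses the UPBdata example dataset that ships with it. The choice of variables (att as exposure, negaff as mediator, UPB as binary outcome, and gender, educ and age as baseline covariates) follows the package's own documented example and is not necessarily the data used in the course.

```r
# Minimal sketch of a natural effect model fit with medflex.
# Assumption: medflex is installed; UPBdata is its bundled example dataset.
library(medflex)
data(UPBdata)

# Step 1: expand the data and impute nested counterfactual outcomes.
# By convention the first predictor is the exposure (att) and the
# second is the mediator (negaff); the rest are baseline covariates.
impData <- neImpute(UPB ~ att + negaff + gender + educ + age,
                    family = binomial("logit"), data = UPBdata)

# Step 2: fit the natural effect model on the expanded data.
# att0 indexes the natural direct effect, att1 the natural indirect effect.
# se = "robust" requests sandwich standard errors instead of the bootstrap.
neMod <- neModel(UPB ~ att0 + att1 + gender + educ + age,
                 family = binomial("logit"), expData = impData,
                 se = "robust")

summary(neMod)
```

In this parameterisation the coefficient of att0 is read as the natural direct effect and the coefficient of att1 as the natural indirect effect, both on the log odds ratio scale and conditional on the covariates.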

In this course we will introduce the modern causal-inference-based understanding of effect decomposition. All theoretical concepts will be set in the context of real-life research problems. The course will enable participants to conduct their own mediation analyses in settings with either single or multiple mediators, as we will carefully explain how to conduct analyses using medflex. The examples in the course will be taken from medical testing, epidemiology and the social sciences. Several shorter exercise sessions throughout the course will ensure that participants actively use the concepts just taught.


  • Module 1a – Theory of counterfactual-based mediation analysis (approx. 1 hour): Here we introduce the definition of counterfactuals by first referring to the simple RCT setting. We will not discuss concepts from causal inference which are not directly relevant to mediation analysis (e.g. IV techniques). The focus is on establishing the theoretical foundation for mediation analysis. As an integral part we also present the problems associated with conducting mediation analyses based on difference-of-coefficient methods and similar (the Baron and Kenny approach). We then progress to definitions of natural effects, identifiability conditions for these and Pearl’s mediation formula. We conclude the module with a discussion of the in-built curse of dimensionality in Pearl’s mediation formula. The format for this and all other a-modules will be a lecture with brief review questions during the presentation.
  • Module 2a – Natural effect models and their estimation (approx. 1 hour): Here we introduce the natural effect models as suggested by Vansteelandt and Lange [REFS 1+2] as a solution to the curse of dimensionality in Pearl’s mediation formula. To ease understanding, this module is restricted to settings with a single mediator and a non-survival outcome. We explain how to estimate natural effect models from a theoretical perspective, as well as how it can be done in a few lines of R code using the medflex package.
  • Module 2b – Two computer exercises on module 2a material (approx. 1 hour): The two exercises will be complete but non-complex mediation analyses based on real-life data. Students are expected to conduct and interpret their own mediation analyses, and to present both code and concluding sentences/presentation for approval. No formal or mandatory hand-in exercises are planned, though.
  • Module 3a – Advanced use of medflex (approx. 30 min.): Here we explain how to use medflex in more complex settings. This will include survival outcomes, multi-dimensional mediators (with unknown internal causal structure) and non-linear effects of mediators. The module contains no new theory but many suggestions and hints on optimal use of medflex.
  • Module 3b – One computer exercise on 3a material (approx. 45 min.): The format will follow the format in module 2b.
  • Module 4a – Causally ordered mediators, theoretical understanding and estimation (approx. 1 hour): Here we explain the challenges of conducting mediation analyses with causally ordered mediators where one attempts to disentangle some or all of the individual pathways. The module will closely follow the methodology presented in Steen (2016) [REF 3].
  • Module 4b – One exercise on 4a material (approx. 1 hour): The format will follow the format in module 2b. Note that the present version of the medflex package does not contain in-built functionality for causally ordered mediators, but an updated version which does is planned well before Summer 2018.
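
For orientation, the quantities named in Modules 1a and 2a can be written out as follows. The notation is an assumption made here for illustration and may differ from the symbols used in the course material: A is the exposure, M the mediator, Y the outcome, C a set of baseline covariates, and Y(a, M(a*)) the nested counterfactual outcome under exposure level a with the mediator set to the value it would take under exposure level a*.

```latex
% Natural direct effect (NDE), natural indirect effect (NIE) and total effect (TE)
% comparing exposure level a with reference level a^*
\begin{align*}
\text{NDE} &= E\{Y(a, M(a^*))\} - E\{Y(a^*, M(a^*))\} \\
\text{NIE} &= E\{Y(a, M(a))\} - E\{Y(a, M(a^*))\} \\
\text{TE}  &= \text{NDE} + \text{NIE} = E\{Y(a, M(a))\} - E\{Y(a^*, M(a^*))\}
\end{align*}

% Pearl's mediation formula for a discrete mediator,
% valid under the identifiability conditions
\[
E\{Y(a, M(a^*)) \mid C = c\}
  = \sum_{m} E(Y \mid A = a, M = m, C = c)\, P(M = m \mid A = a^*, C = c)
\]

% A natural effect model with link function g
\[
g\big[E\{Y(a, M(a^*)) \mid C\}\big]
  = \beta_0 + \beta_1 a + \beta_2 a^* + \beta_3^\top C
\]
```

In the last display, \beta_1 captures the natural direct effect and \beta_2 the natural indirect effect on the scale of the link g, conditional on C. The decomposition can therefore be read off like ordinary regression coefficients, without evaluating the sum over m in the mediation formula.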

Learning Outcomes

At the end of the course, participants should:

  1. Understand the limitations of traditional mediation analyses based on structural equation models.
  2. Understand the meaning of natural direct and indirect effects.
  3. Be able to assess the plausibility of the assumptions required for the identification of natural direct and indirect effects.
  4. Be able to interpret natural effect models.
  5. Be able to conduct mediation analyses based on medflex in R.
  6. Be able to communicate the results and in-built assumptions of a mediation analysis to a non-specialized audience.


This course will proceed relatively slowly and is not highly abstract mathematically. All participants are expected to have ample experience with applying regression-based modeling (multiple linear regression and logistic regression as a minimum). This includes both interpreting effect parameters and actually conducting the analyses in R. Although the course will use concepts from the causal inference literature, we do not expect participants to have had prior exposure to that literature.

About the Instructors

Theis Lange obtained his PhD in Statistics and Econometrics in 2008 from the University of Copenhagen. He is currently Associate Professor at the Department of Biostatistics, University of Copenhagen and visiting professor at the Center for Statistical Science, Peking University. He was among the first to consider natural effects, and mediation analysis in general, for survival outcomes. He was given the Kenneth Rothman Award for the related paper in 2012. His work ranges from purely methodological contributions within mediation analysis to applications of these in many different medical fields; in total he has authored 80 peer-reviewed papers. He has taught courses or course components on causal inference in general and mediation analysis in particular at statistics departments (e.g. University of Copenhagen, September 2016 and Oslo University, November 2016), as well as at applied research institutions (e.g. the Radiation Exposure Research Center in Hiroshima, March 2016 and Novartis, October 2015).

Stijn Vansteelandt obtained a PhD in Mathematics (Statistics) in 2002 at Ghent University. After postdoctoral research at the Department of Biostatistics of the Harvard School of Public Health, he returned to Ghent University in 2004, where he is now Full Professor in the Department of Applied Mathematics, Computer Science and Statistics; he is also Chair of Statistical Methodology at the Department of Epidemiology and Population Health at the London School of Hygiene and Tropical Medicine. Stijn Vansteelandt is a leading expert in causal inference. He has authored over 120 peer-reviewed publications in international journals on a variety of topics in biostatistics, epidemiology and medicine, such as the analysis of longitudinal and clustered data, missing data, mediation and moderation/interaction, instrumental variables, family-based genetic association studies, analysis of outcome-dependent samples and phylogenetic inference. He is Co-editor of Biometrics and has previously served as Associate Editor for the journals Biometrics, Biostatistics, Epidemiology, Epidemiologic Methods and the Journal of Causal Inference. He teaches short courses in causal inference on a yearly basis at Ghent University and the London School of Hygiene and Tropical Medicine and has many years of experience teaching international short courses on mediation and/or causal inference, the two most recent ones in Poland (September 2016) and Norway (June 2016).