Inverse Probability of Treatment Weighting

Because the causal effect of A on Y is identified by adjusting for the confounders \(L_1\) and \(L_2\), we can estimate it by inverse probability of treatment weighting. The procedure has three steps:

  1. Model \(P(A = 1\mid L_1, L_2)\), the conditional probability of treatment given confounders
  2. Predict the conditional probability \(\pi\) of each unit’s observed treatment
    • if A = 1, predict \(\pi = P(A = 1 \mid L_1,L_2)\)
    • if A = 0, predict \(\pi = P(A = 0 \mid L_1,L_2)\)
  3. Aggregate to the average causal effect
    • estimate the expected outcome under treatment \(E(Y^1)\) by a weighted mean of treated units’ outcomes, weighted by \(\frac{1}{\pi}\)
    • estimate the expected outcome under control \(E(Y^0)\) by a weighted mean of untreated units’ outcomes, weighted by \(\frac{1}{\pi}\)
    • estimate the average causal effect by the difference
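Written out, the weighted means in step (3) are normalized (Hájek-style) estimators, since a weighted mean divides by the sum of the weights:

\[
\hat{E}(Y^1) = \frac{\sum_{i:\,A_i = 1} Y_i / \pi_i}{\sum_{i:\,A_i = 1} 1 / \pi_i},
\qquad
\hat{E}(Y^0) = \frac{\sum_{i:\,A_i = 0} Y_i / \pi_i}{\sum_{i:\,A_i = 0} 1 / \pi_i},
\]

and the average causal effect estimate is \(\hat{E}(Y^1) - \hat{E}(Y^0)\).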

The code below assumes you have generated data as on the data page.

1) Model

The code below uses logistic regression to model the conditional probability of treatment.

model <- glm(
  A ~ L1 + L2, 
  data = data, 
  family = binomial
)

Call:
glm(formula = A ~ L1 + L2, family = binomial, data = data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.0740     0.1737 -11.938  < 2e-16 ***
L1            1.0845     0.1639   6.618 3.65e-11 ***
L2            1.1877     0.1548   7.673 1.68e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 474.41  on 499  degrees of freedom
Residual deviance: 354.86  on 497  degrees of freedom
AIC: 360.86

Number of Fisher Scoring iterations: 5

2) Predict

The code below predicts the conditional probability of each unit’s observed treatment value. For treated units this is the propensity score \(P(A = 1 \mid L_1, L_2)\); for untreated units it is the complement.

predicted <- data |>
  # Predict the probabilities that A = 1 and A = 0
  mutate(
    p_A_equals_1 = predict(model, type = "response"),
    p_A_equals_0 = 1 - p_A_equals_1
  ) |>
  # Assign the propensity score based on the observed treatment
  mutate(
    pi = case_when(
      A == 1 ~ p_A_equals_1,
      A == 0 ~ p_A_equals_0
    )
  )
# A tibble: 500 × 10
        L1     L2      Y0    Y1 propensity_score     A       Y p_A_equals_1
     <dbl>  <dbl>   <dbl> <dbl>            <dbl> <int>   <dbl>        <dbl>
1  0.00304  1.03   0.677   1.59          0.276       0  0.677       0.300  
2 -2.35    -1.66  -4.09   -3.53          0.00244     0 -4.09        0.00136
3  0.104   -0.912  0.0659  1.31          0.0569      0  0.0659      0.0455 
# ℹ 497 more rows
# ℹ 2 more variables: p_A_equals_0 <dbl>, pi <dbl>

3) Aggregate

The final step is to aggregate to an average causal effect estimate.

aggregated_Y1 <- predicted |>
  # Restrict to cases with A == 1
  filter(A == 1) |>
  # Calculate the weighted mean outcome
  summarize(estimate = weighted.mean(Y, w = 1 / pi))

aggregated_Y0 <- predicted |>
  # Restrict to cases with A == 0
  filter(A == 0) |>
  # Calculate the weighted mean outcome
  summarize(estimate = weighted.mean(Y, w = 1 / pi))

average_effect_estimate <- aggregated_Y1 - aggregated_Y0
average_effect_estimate

  estimate
1 1.288555

Closing thoughts

Inverse probability of treatment weighting is a powerful strategy because it connects nonparametric causal identification to longstanding ideas from survey sampling, where units are drawn from a population with known probabilities of inclusion. The analogy: outcomes under treatment are “sampled” with inclusion probabilities equal to the (estimated) probability of treatment. Just as a population sample demands careful thought about the probability of sampling, this strategy encourages us to model the probability of each unit receiving its observed treatment.
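The sampling analogy can be made concrete with a toy sketch (everything here is made up for illustration, not taken from the data page): when units are included in a sample with known probabilities, reweighting by the inverse of those probabilities approximately recovers the population mean.

```r
# Toy survey-sampling analogy: inverse-probability weights undo
# unequal inclusion probabilities. All numbers are illustrative.
set.seed(2)
population <- rnorm(1e5, mean = 3)

# Units with larger outcomes are more likely to be sampled
p_include <- plogis(population - 3)
sampled <- runif(1e5) < p_include

# The unweighted sample mean is biased upward ...
mean(population[sampled])

# ... but weighting by 1 / p_include approximately recovers
# the population mean, just as weighting by 1 / pi recovers
# E(Y^1) and E(Y^0) above
weighted.mean(population[sampled], w = 1 / p_include[sampled])
```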

Here are a few things you could try next:

  • replace step (1) with another approach to estimate conditional treatment probabilities, such as a different functional form or a machine learning method
  • evaluate performance over many repeated simulations
  • evaluate performance at different simulated sample sizes
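As one sketch of the first suggestion, step (1) could use a more flexible functional form, such as a logistic regression with an interaction between the confounders. The simulation below is a stand-in for the data page (the sample size and coefficients are assumptions, not the data page's values); only the treatment model changes, while steps (2) and (3) are unchanged.

```r
# Illustrative stand-in for the data page; coefficients are assumptions
set.seed(1)
n  <- 500
L1 <- rnorm(n)
L2 <- rnorm(n)
A  <- rbinom(n, 1, plogis(-2 + L1 + L2))
Y  <- L1 + L2 + A + rnorm(n)
data <- data.frame(A, Y, L1, L2)

# Step (1) with a different functional form:
# main effects plus the L1:L2 interaction
model_flexible <- glm(A ~ L1 * L2, data = data, family = binomial)

# Steps (2) and (3) proceed exactly as before
p_A_equals_1 <- predict(model_flexible, type = "response")
pi <- ifelse(data$A == 1, p_A_equals_1, 1 - p_A_equals_1)
average_effect_estimate <-
  weighted.mean(data$Y[data$A == 1], w = 1 / pi[data$A == 1]) -
  weighted.mean(data$Y[data$A == 0], w = 1 / pi[data$A == 0])
average_effect_estimate
```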