3 Basic functionality

This page illustrates the basic functionality of the pstratreg function. This function conducts parametric principal stratification analysis to estimate average causal effect among the always-valid subgroup whose outcome would exist in either treatment condition.

Jargon? Start with the first page on the goal!

The package automates the process to

  • estimate a mediator model
  • estimate an outcome model
    • allowing heteroskedasticity if needed
  • calculate the conditional probability of being always-valid
  • implement monotonicity assumptions
  • bound estimates using the conditional outcome distribution and the proportion in the always-valid subgroup
  • return estimates, conditional on population subgroups if requested

3.1 Simulate data

We first simulate some data for illustration.

The data has four variables

  • continuous confounder x
  • binary treatment a
  • binary mediator m
  • continuous outcome y
    • y is NA when m = FALSE
library(tidyverse)
library(pstratreg)
data <- pstratreg_sim(n = 100)
#>            x     a    m          y
#> 1 -0.8499974  TRUE TRUE  0.4490180
#> 2  0.3409742 FALSE TRUE -1.3027644
#> 3 -0.7508652 FALSE TRUE -1.6457591
#> 4  0.8077379 FALSE TRUE  0.7376321
#> 5 -0.5110795  TRUE TRUE -0.1220103

3.2 The pstratreg function

The call below runs a principal stratification regression analysis with default options, returning estimates of the average treatment effect among the latent stratum who would have valid outcomes regardless of treatment.

result <- pstratreg(
  formula_y = formula(y ~ x*a),
  formula_m = formula(m ~ x*a),
  data = data,
  treatment_name = "a"
)
#> Effect on mediator, where mediator indicates whether outcome will be valid
#> # A tibble: 1 × 3
#>   mhat1 mhat0 effect_m
#>   <dbl> <dbl>    <dbl>
#> 1 0.891 0.812   0.0795
#> 
#> Effect on outcome among those who would have a valid outcome regardless of treatment
#> # A tibble: 1 × 3
#>   effect_y_lower effect_y_naive effect_y_upper
#>            <dbl>          <dbl>          <dbl>
#> 1          0.639           1.18           1.72

3.3 Positive monotonicity

If you believe that the TRUE value of the treatment never causes the outcome to be undefined, then you might add the monotonicity_positive = TRUE option.

Sometimes, the monotonicity assumption disagrees with the empirical estimates in at least some cases. For example, we assume that treatment never prevents a valid outcome but empirically we estimate that the treatment increases the probability that the treatment increases the value of the mediator in some subgroups. When this happens, the package issues a warning.

Empirical monotonicity violations may be non-troubling; they can occur in estimates due to random chance from sampling variability. Because the user has assumed monotonicity, the package assumes that any violations arise purely from estimation errors. The predicted values of the mediator under each treatment condition in these cases are forced to be equal, at the midpoint of the two predicted values.

Generally, if the warning tells you that monotonicity is violated in only a small percent of cases, it may be warranted to proceed. If monotonicity is empirically violated in many cases, then you may need to rethink this assumption.

result <- pstratreg(
  formula_y = formula(y ~ x*a),
  formula_m = formula(m ~ x*a),
  data = data,
  treatment_name = "a",
  monotonicity_positive = TRUE
)
#> Warning in pstratreg(formula_y = formula(y ~ x * a), formula_m = formula(m ~ : Monotonicity violated in 9 % of cases
#> Forcing mhat1_trunc = mhat0_trunc at midpoint of estimates for those
#> Effect on mediator, where mediator indicates whether outcome will be valid
#> # A tibble: 1 × 3
#>   mhat1 mhat0 effect_m
#>   <dbl> <dbl>    <dbl>
#> 1 0.891 0.812   0.0795
#> 
#> Effect on outcome among those who would have a valid outcome regardless of treatment
#> # A tibble: 1 × 3
#>   effect_y_lower effect_y_naive effect_y_upper
#>            <dbl>          <dbl>          <dbl>
#> 1           1.01           1.18           1.40

3.4 Negative monotonicity

Conversely, you can assume negative monotonicity with monotonicity_negative = TRUE. If you are not sure what negative monotonicity is, see the previous page on the big idea!

In this particular simulation, negative monotonicity does not hold and you see below that the warning appropriately alerts us that monotonicity is frequently empirically violated.

result <- pstratreg(
  formula_y = formula(y ~ x*a),
  formula_m = formula(m ~ x*a),
  data = data,
  treatment_name = "a",
  monotonicity_negative = TRUE
)
#> Warning in pstratreg(formula_y = formula(y ~ x * a), formula_m = formula(m ~ : Monotonicity violated in 91 % of cases
#> Forcing mhat1_trunc = mhat0_trunc at midpoint of estimates for those
#> Effect on mediator, where mediator indicates whether outcome will be valid
#> # A tibble: 1 × 3
#>   mhat1 mhat0 effect_m
#>   <dbl> <dbl>    <dbl>
#> 1 0.891 0.812   0.0795
#> 
#> Effect on outcome among those who would have a valid outcome regardless of treatment
#> # A tibble: 1 × 3
#>   effect_y_lower effect_y_naive effect_y_upper
#>            <dbl>          <dbl>          <dbl>
#> 1           1.16           1.18           1.20

3.5 Aggregate in subgroups

Instead of sample average effect estimates, you might want the estimate within subgroups defined by a grouping variable in the data. The group_vars argument allows you to specify a vector of variable names from data within which to aggregate results.

First we create some groups for illustration

data_with_groups <- data %>%
  mutate(group1 = x < -.5,
         group2 = x > .5)
#>            x     a    m          y group1 group2
#> 1 -0.8499974  TRUE TRUE  0.4490180   TRUE  FALSE
#> 2  0.3409742 FALSE TRUE -1.3027644  FALSE  FALSE
#> 3 -0.7508652 FALSE TRUE -1.6457591   TRUE  FALSE
#> 4  0.8077379 FALSE TRUE  0.7376321  FALSE   TRUE
#> 5 -0.5110795  TRUE TRUE -0.1220103   TRUE  FALSE
#> 6  0.7363362  TRUE TRUE  0.3218909  FALSE   TRUE

and then we apply the function to estimate within those groups.

result <- pstratreg(
  formula_y = formula(y ~ x*a),
  formula_m = formula(m ~ x*a),
  data = data_with_groups,
  treatment_name = "a",
  group_vars = c("group1","group2")
)
#> Effect on mediator, where mediator indicates whether outcome will be valid
#> # A tibble: 3 × 5
#>   group1 group2 mhat1 mhat0 effect_m
#>   <lgl>  <lgl>  <dbl> <dbl>    <dbl>
#> 1 FALSE  FALSE  0.915 0.812  0.104  
#> 2 FALSE  TRUE   0.968 0.843  0.125  
#> 3 TRUE   FALSE  0.770 0.774 -0.00347
#> 
#> Effect on outcome among those who would have a valid outcome regardless of treatment
#> # A tibble: 3 × 5
#>   group1 group2 effect_y_lower effect_y_naive effect_y_upper
#>   <lgl>  <lgl>           <dbl>          <dbl>          <dbl>
#> 1 FALSE  FALSE           0.653           1.18           1.72
#> 2 FALSE  TRUE            0.753           1.13           1.52
#> 3 TRUE   FALSE           0.433           1.23           2.03

3.6 Sample weights

If you have case weights from sampling with unequal probabilities, they can be provided in a vector of length nrow(data) using the weights argument.

Here we generate some hypothetical weights

sim_weights <- runif(nrow(data))

and then call the function. Note that the glm() used internally to estimate logistic regression will create a warning for non-integer #successes in a binomial glm! which is to be expected when weights are used in this function.

result <- pstratreg(
  formula_y = formula(y ~ x*a),
  formula_m = formula(m ~ x*a),
  data = data,
  treatment_name = "a",
  weights = sim_weights
)
#> Warning in eval(family$initialize): non-integer #successes
#> in a binomial glm!
#> Effect on mediator, where mediator indicates whether outcome will be valid
#> # A tibble: 1 × 3
#>   mhat1 mhat0 effect_m
#>   <dbl> <dbl>    <dbl>
#> 1 0.926 0.838   0.0882
#> 
#> Effect on outcome among those who would have a valid outcome regardless of treatment
#> # A tibble: 1 × 3
#>   effect_y_lower effect_y_naive effect_y_upper
#>            <dbl>          <dbl>          <dbl>
#> 1          0.815           1.26           1.71