Simulate Data

This exercise works with simulated samples. Taking the nonparametric estimates from 5 million cases as the truth, you will generate a simulated sample of a much smaller size using the code below.

If you are a Stata user, see the bottom of this page for code. The page mainly supports coding in R.

Prepare the environment by loading the tidyverse package.

library(tidyverse)

The function below simulates a sample of n cases (100 by default).

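simulate <- function(n = 100) {
  # Load the "true" population cells (one row per year, age, and sex)
  read_csv("https://ilundberg.github.io/description/assets/truth.csv") |>
    # Draw n cells with replacement, with probability proportional to weight
    slice_sample(n = n, weight_by = weight, replace = TRUE) |>
    # Draw one log-normal income for each sampled case
    mutate(income = exp(rnorm(n(), meanlog, sdlog))) |>
    # Keep only the variables used in the exercise
    select(year, age, sex, income)
}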
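To see the three steps of the function without downloading anything, here is a minimal sketch on a tiny made-up table. The object names toy_truth and toy_sample and every number in the table are hypothetical stand-ins, not values from truth.csv; only the column names match the real file.

```r
library(tidyverse)

# Hypothetical stand-in for truth.csv: one row per population cell.
# All values here are made up for illustration.
toy_truth <- tibble(
  year    = c(2012, 2012, 2017, 2017),
  age     = c(30, 40, 30, 40),
  sex     = c("female", "male", "female", "male"),
  meanlog = c(10.2, 10.6, 10.4, 10.8),
  sdlog   = c(0.8, 0.9, 0.8, 0.9),
  weight  = c(0.2, 0.3, 0.2, 0.3)
)

set.seed(90095)  # any seed works; it only makes this sketch reproducible

toy_sample <- toy_truth |>
  # Step 1: draw cells with replacement, proportional to weight
  slice_sample(n = 5, weight_by = weight, replace = TRUE) |>
  # Step 2: draw one log-normal income per sampled row
  mutate(income = exp(rnorm(n(), meanlog, sdlog))) |>
  # Step 3: drop the parameters an analyst would not observe
  select(year, age, sex, income)

toy_sample
```

Because rows are drawn with replacement, the same cell can appear more than once, but each copy gets its own income draw.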

We can see how it works below,

simulated <- simulate(n = 100)

Rows: 420 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): sex
dbl (5): year, age, meanlog, sdlog, weight

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

and can print a bit of the output.

simulated |> print(n = 3)

# A tibble: 100 × 4
   year   age sex     income
  <dbl> <dbl> <chr>    <dbl>
1  2017    30 female  28993.
2  2017    41 female  31110.
3  2012    34 male   271444.
# ℹ 97 more rows
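The income column is log-normal by construction, which is why occasional very large values appear next to more typical ones. As a rough sketch of what that implies, the code below uses base R only, with hypothetical parameter values mu and sigma that are not taken from truth.csv. It checks that the median of the simulated incomes is close to exp(mu) and that exp(rnorm(...)) matches the built-in rlnorm() when the seed is the same.

```r
# Hypothetical parameters for one population cell (not from truth.csv)
mu    <- 10.5
sigma <- 0.8
draws <- 1e5

# Same construction as in simulate(): exponentiate a normal draw
set.seed(90095)
income_a <- exp(rnorm(draws, mu, sigma))

# Built-in log-normal generator, started from the same seed
set.seed(90095)
income_b <- rlnorm(draws, meanlog = mu, sdlog = sigma)

# The two constructions should agree up to floating-point error
all.equal(income_a, income_b)

# The median of a log-normal is exp(mu); the mean is pulled up by the right tail
c(median = median(income_a), exp_mu = exp(mu), mean = mean(income_a))
```

The gap between the mean and the median is one reason to consider summarizing income on the log scale or with medians in a small sample.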

Code for Stata users

I am mostly not a Stata user; this code is provided as a secondary teaching aid for people who do not use R. If you are a Stata user, feel free to let me know how to improve it.

set seed 90095

* Load true population data

import delimited https://ilundberg.github.io/description/assets/truth.csv

* Draw a sample of 100 X-values
* This requires two supporting packages; uncomment the lines below to install them once
*ssc install moremata
*ssc install gsample

* Draw the sample
gsample 100 [w = weight]

* Simulate individual income data
gen log_income = meanlog + sdlog * rnormal()
gen income = exp(log_income)

* Convert sex from a string to a labeled numeric variable

encode sex, gen(factorsex)

* Keep variables to work with
keep year age factorsex log_income income
rename factorsex sex