Quite often, errors will arise in the course of simulating or fitting models. These errors can be tricky to debug because they may happen within a simulation function and only in certain cases.
The primary places where errors occur in a simulation are generating the data and fitting the models. Both generate() and fit() have the same primary error-handling options:
.warn_on_error. When TRUE (the default), a warning is given if the simulation produced errors. If you expect errors or have other ways of checking for them, you can set this to FALSE.

.stop_on_error. When TRUE, the simulation stops as soon as an error is encountered. This is useful when running interactively and when errors are unexpected. The default is FALSE.

The default behavior, then, is to warn when there are errors and to store the error messages in a separate column, .sim_error. This gives you the flexibility to continue with the simulation or model-fitting process even if one particular data generation or model fit hits an error.
generate()
Consider the following example:
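(A minimal sketch consistent with the output below; the metaparameter size supplies the sample size for rnorm() and deliberately includes an invalid, negative value.)

library(simpr)

buggy_spec = specify(y = ~ rnorm(size)) %>%
  define(size = c(-10, 10))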
We’d expect this to produce an error when size is negative, since that’s not a valid sample size. What happens when we try to generate data based on the buggy specification?
set.seed(100)
generate_default = buggy_spec %>%
  generate(1)
#> Warning in create_sim_results(specs = specs, x = x[c("meta_info", "specify", :
#> Simulation produced errors. See column '.sim_error'.
The default is to produce a warning but still return the object. This lets us know that simpr hit some issues. We can follow the advice given and take a look at the returned object.
generate_default
#> tibble
#> --------------------------
#> # A tibble: 2 × 5
#> .sim_id size rep sim .sim_error
#> <int> <dbl> <int> <list> <chr>
#> 1 1 -10 1 <NULL> "Error in `map()`:\nℹ In index: 1.\nℹ W…
#> 2 2 10 1 <tibble [10 × 1]> <NA>
The first simulation had an error and thus did not return any simulation data. We can see the error message given directly in the .sim_error column.
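Because the returned object behaves like a regular tibble, one option (a sketch, assuming dplyr is loaded) is to drop the rows that errored before continuing:

library(dplyr)

# Keep only the simulations whose data generation succeeded
generate_ok = generate_default %>%
  filter(is.na(.sim_error))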
We would get the same output without a warning if we set .warn_on_error = FALSE:
set.seed(100)
generate_no_warn = buggy_spec %>%
  generate(1, .warn_on_error = FALSE)
generate_no_warn
#> tibble
#> --------------------------
#> # A tibble: 2 × 5
#> .sim_id size rep sim .sim_error
#> <int> <dbl> <int> <list> <chr>
#> 1 1 -10 1 <NULL> "Error in `map()`:\nℹ In index: 1.\nℹ W…
#> 2 2 10 1 <tibble [10 × 1]> <NA>
Alternatively, we can set .stop_on_error = TRUE to stop the simulation and immediately raise the error. This is often useful during initial development of a simulation, when an error typically means that something is misspecified.
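For example (a sketch; the error raised is whatever the failing generate() step produced, rather than a stored .sim_error):

set.seed(100)
buggy_spec %>%
  generate(1, .stop_on_error = TRUE)
# Execution stops at the first failing simulation and the error is raised
# immediately instead of being stored in .sim_error.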
fit()
If there is already an error in the simulation, this will usually propagate to any model-fitting as well. Recall the output above:
generate_no_warn
#> tibble
#> --------------------------
#> # A tibble: 2 × 5
#> .sim_id size rep sim .sim_error
#> <int> <dbl> <int> <list> <chr>
#> 1 1 -10 1 <NULL> "Error in `map()`:\nℹ In index: 1.\nℹ W…
#> 2 2 10 1 <tibble [10 × 1]> <NA>
Since the simulation with size = -10 has NULL for the generated data, any model-fitting will usually produce errors as well.
fit_propagate = generate_no_warn %>%
  fit(t_test = ~ t.test(y))
#> Warning in fit.simpr_tibble(., t_test = ~t.test(y)): fit() produced errors.
#> See '.fit_error_*' column(s).
fit_propagate
#> tibble
#> --------------------------
#> # A tibble: 2 × 7
#> .sim_id size rep sim .sim_error t_test .fit_error_t_test
#> <int> <dbl> <int> <list> <chr> <list> <chr>
#> 1 1 -10 1 <NULL> "Error in `ma… <NULL> "Error in eval(s…
#> 2 2 10 1 <tibble [10 × 1]> <NA> <htest> <NA>
#>
#> t_test[[1]]
#> --------------------------
#> NULL
Now there are new columns, including one error column for each fit attempted. We can also see a situation where we have multiple errors:
fit_multi_error = generate_no_warn %>%
  fit(t_test = ~ t.test(y),
      chisq = ~ chisq.test(y))
#> Warning in fit.simpr_tibble(., t_test = ~t.test(y), chisq = ~chisq.test(y)):
#> fit() produced errors. See '.fit_error_*' column(s).
The first fit, t_test, failed only on the first row, where size = -10, but the second fit, chisq, was nonsensical for either dataset, and so two .fit_error columns are produced.
fit_multi_error
#> tibble
#> --------------------------
#> # A tibble: 2 × 9
#> .sim_id size rep sim .sim_error t_test .fit_error_t_test chisq
#> <int> <dbl> <int> <list> <chr> <list> <chr> <list>
#> 1 1 -10 1 <NULL> "Error in `map(… <NULL> "Error in eval(s… <NULL>
#> 2 2 10 1 <tibble> <NA> <htest> <NA> <NULL>
#> # ℹ 1 more variable: .fit_error_chisq <chr>
#>
#> t_test[[1]]
#> --------------------------
#> NULL
#>
#> chisq[[1]]
#> --------------------------
#> NULL
Even in this degenerate case, tidy_fits still gathers together whatever valid fits it can find, and reproduces the errors when there is not a valid fit:
fit_multi_error %>%
  tidy_fits
#> # A tibble: 4 × 14
#> .sim_id size rep .sim_error Source .fit_error estimate statistic p.value
#> <int> <dbl> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 -10 1 "Error in `m… t_test "Error in… NA NA NA
#> 2 1 -10 1 "Error in `m… chisq "Error in… NA NA NA
#> 3 2 10 1 <NA> t_test <NA> 0.172 0.826 0.430
#> 4 2 10 1 <NA> chisq "Error in… NA NA NA
#> # ℹ 5 more variables: parameter <dbl>, conf.low <dbl>, conf.high <dbl>,
#> # method <chr>, alternative <chr>
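From here, a convenient pattern (a sketch, assuming dplyr is loaded) is to separate the valid fits from the errored ones:

library(dplyr)

tidied = fit_multi_error %>%
  tidy_fits()

# Rows with a valid fit
ok_fits = tidied %>% filter(is.na(.fit_error))

# Rows where the fit errored, kept for closer inspection
bad_fits = tidied %>% filter(!is.na(.fit_error))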
Sometimes it is difficult to tell what went wrong in a simulation just from the error message. In such cases it is useful to turn to R’s powerful debugging features. See ?browser and the debugging chapter from Advanced R for general information on using R’s debugger.
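For instance, if your specification calls a helper function of your own, you can drop a browser() call into it and step through it interactively the next time it runs (gen_outcome here is a hypothetical user-written function, not part of simpr):

# Hypothetical user-written generating function with a breakpoint
gen_outcome = function(n) {
  browser()  # pauses here so you can inspect n interactively
  rnorm(n)
}

specify(y = ~ gen_outcome(size)) %>%
  define(size = c(-10, 10)) %>%
  generate(1)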
Both generate() and fit() also allow you to set .debug = TRUE to look interactively at what is coming into the simulation. This lets you see what the data look like at each step and figure out what is causing the error.
Let’s use the same example as above:
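# Rerun the buggy generate() call interactively (a sketch of the call)
set.seed(100)
buggy_spec %>%
  generate(1, .debug = TRUE)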
If we run this in RStudio, the debugger view opens on the function being debugged.
This is a behind-the-scenes function created by simpr based on the specification of y, but we can see the rnorm() call that was originally in the specification.
The console, meanwhile, drops us into a Browse prompt. What’s nice is that we can see what the incoming data look like: we can, for instance, type size at the console to see what’s coming in.
We can type "Q" to exit debugging or "n" to move to the next debug step. (If we have a variable called n, we would need to use print(n) to see its value.)
If you aren’t seeing any output when you type commands in the debugger, try running the command sink(), which seems to allow the output to show up in some cases.
This same mechanism is used by fit():
set.seed(100)
specify(y1 = ~ rnorm(10),
        y2 = ~ rnorm(10)) %>%
  generate(2) %>%
  fit(t_test = ~ t.test(y1, y2),
      .debug = TRUE)
This time the function being debugged displays a bit differently.
Our input, t.test(y1, y2), is still present, but it is wrapped inside a call to with(). The input simulation data is stored in the special variable .:
Browse[2]> .
# A tibble: 10 × 2
y1 y2
<dbl> <dbl>
1 0.922 -0.559
2 0.0958 0.827
3 -0.866 0.486
4 0.922 -0.717
5 1.15 -0.375
6 1.71 -0.0771
7 1.44 0.508
8 -0.244 -1.59
9 0.153 0.386
10 -0.294 1.18
We can perform whatever test calculations we like on this sample data to diagnose the issues that are coming up.
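For example, at the Browse prompt we could run a few quick checks by hand (a sketch; output not shown):

Browse[2]> summary(.)
Browse[2]> t.test(.$y1, .$y2)
Browse[2]> sd(.$y1 - .$y2)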
We can type n to move to the next simulation, which has different data:
Browse[2]> n
exiting from: ...furrr_fn(...)
debugging in: ...furrr_fn(...)
debug: with(data = ., expr = t.test(y1, y2))
Browse[2]> .
# A tibble: 10 × 2
y1 y2
<dbl> <dbl>
1 -0.335 0.212
2 1.23 -0.834
3 -0.821 -0.668
4 -0.0373 0.246
5 0.436 -0.0761
6 -1.83 -0.704
7 0.765 0.507
8 0.825 1.07
9 -2.49 2.61
10 1.13 0.261