Addendum: rsample
tidymodels is a fairly new collection of packages that accompanies the tidyverse. The vision, as I understand it, is that tidymodels will eventually replace modelr.
One of the tidymodels packages, rsample, provides some of the same (re)sampling tools that modelr provides. I stuck with modelr in the bootstrapping reading because the functionality provided by rsample is more flexible than we really need, and I didn’t want to obscure the main point with more programming-related technicalities. But if you’d like to understand how rsample works, here are some brief notes to get you going. First of all, install the package:
install.packages("rsample")
Here’s a toy tibble to get things going. It has just one column, containing the numbers 1 through 6.
library(tidyverse)  # provides tibble(), %>% and purrr's as_mapper()
df <- tibble(x = 1:6)
The role of the modelr object type “resample” is played by the rsample object type “rsplit.” To construct an rsplit object, the basic constructor is make_splits. This function takes two arguments: the first is a named list of the form
list(analysis = ..., assessment = ...)
where both ... are integer vectors of row indices, and the second is the tibble.
For example:
library(rsample)
# Row indices for the two groups of the split
indices <- list(analysis = 1:3, assessment = 4:6)
split <- make_splits(indices, df)
This creates an rsplit object named split, which contains (pointers to) rows 1 through 3 in its “analysis group” and (pointers to) rows 4 through 6 in its “assessment group.” As far as we’re concerned in this class, the assessment group is irrelevant. If you run as_tibble(split), you’ll get a tibble that just contains rows 1 through 3.
Note. If you really want to specify which group to make a tibble out of, you can run as_tibble(analysis(split)) and as_tibble(assessment(split)).
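A quick way to check what the two groups hold is to look at them directly. The sketch below is a self-contained version of the example above: it rebuilds the toy tibble and the split under the same names (df, indices, split), and it assumes the tibble and rsample packages are installed. The names analysis_rows and assessment_rows are only illustrative.

```r
library(tibble)
library(rsample)

df <- tibble(x = 1:6)
indices <- list(analysis = 1:3, assessment = 4:6)
split <- make_splits(indices, df)

# analysis() and assessment() each return a data frame holding
# the rows that the rsplit object points to.
analysis_rows <- analysis(split)
assessment_rows <- assessment(split)

analysis_rows$x    # the x values from rows 1 through 3 of df
assessment_rows$x  # the x values from rows 4 through 6 of df
```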
There is no dedicated rsample function that takes a single resample with replacement (i.e., a function analogous to modelr::resample_bootstrap), but we can recreate this functionality as follows:
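# Assumes %>% (magrittr) and as_mapper() (purrr) are loaded, e.g. via library(tidyverse)
resample_bootstrap <- function(df) {
  seq(nrow(df)) %>%
    # draw nrow(df) row indices with replacement
    sample(nrow(df), replace = TRUE) %>%
    # rows that were never drawn form the assessment group
    as_mapper(~ list(analysis = ., assessment = setdiff(seq(nrow(df)), .)))() %>%
    make_splits(df)
}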
as_tibble(resample_bootstrap(df))
There is a built-in rsample function which takes multiple bootstrap resamples, analogous to modelr::bootstrap. It is the function bootstraps (note the extra s in the name).
resamples <- bootstraps(df, times = 100)
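The object returned by bootstraps is itself a tibble with one row per resample: a list-column named splits holding the rsplit objects, and a column named id labelling them. As a minimal sketch, individual resamples can be pulled out and summarised as shown here. The seed, the names first_split and boot_means, and the choice of the mean as the statistic are illustrative assumptions, not something taken from the reading.

```r
library(tibble)
library(rsample)

set.seed(1)  # arbitrary seed, only so the resamples are reproducible
df <- tibble(x = 1:6)
resamples <- bootstraps(df, times = 100)

# Each element of the splits list-column is an rsplit object.
first_split <- resamples$splits[[1]]
nrow(analysis(first_split))  # same number of rows as df

# Compute the mean of x within every bootstrap resample.
boot_means <- vapply(
  resamples$splits,
  function(s) mean(analysis(s)$x),
  numeric(1)
)
length(boot_means)  # one mean per resample
```

Using vapply from base R keeps the sketch free of extra dependencies; purrr::map_dbl would do the same job if the tidyverse is already loaded.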
With this understanding, you should now be able to go through the bootstrapping reading using rsample in place of modelr.