Uses Arrow and dplyr to push filters and column selection down before data is collected into memory.
Usage
cis_read(
fecha_min = NULL,
fecha_max = NULL,
estudios = NULL,
cols = NULL,
keep_core_cols = TRUE,
collect = TRUE
)
Arguments
- fecha_min
Minimum survey date, inclusive. NULL means no lower bound.
- fecha_max
Maximum survey date, inclusive. NULL means no upper bound.
- estudios
Optional vector of study codes.
- cols
Optional tidyselect expression or character vector of columns to select.
- keep_core_cols
If TRUE, always include estudio, fecha, genero, and edad when cols is supplied.
- collect
If TRUE, return a tibble in memory. If FALSE, return a lazy Arrow/dplyr object.
Examples
if (FALSE) { # \dontrun{
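  # All surveys from 2023 onwards
  cis_read(fecha_min = "2023-01-01")

  # A date window with an explicit set of columns
  cis_read(
    fecha_min = "2020-01-01",
    fecha_max = "2024-12-31",
    cols = c("estudio", "fecha", "genero", "edad", "idv", "recuerdo")
  )

  # Column selection with a tidyselect helper
  cis_read(cols = dplyr::starts_with("val_"))

  # Stay lazy, aggregate, and only then collect into memory
  cis_read(fecha_min = "2020-01-01", collect = FALSE) |>
    dplyr::count(estudio) |>
    dplyr::collect()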
} # }
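The pushdown mentioned in the description can be illustrated with a minimal sketch that uses arrow and dplyr directly. It does not show how cis_read() is implemented, which this page does not document. The toy data frame, the temporary path, and the study codes are all hypothetical stand-ins for the real survey files; only the column names estudio, fecha, genero, edad, and idv are taken from this page.

```r
library(arrow)
library(dplyr)

# Hypothetical toy data standing in for the survey files (assumption).
toy <- data.frame(
  estudio = c("3400", "3400", "3420"),
  fecha   = as.Date(c("2019-06-01", "2023-03-15", "2024-01-10")),
  genero  = c("M", "H", "M"),
  edad    = c(34L, 51L, 27L),
  idv     = c("A", "B", "A")
)

# Write the toy data as a Parquet dataset in a fresh temporary directory.
path <- tempfile("cis_toy_")
write_dataset(toy, path, format = "parquet")

# Build a lazy query: nothing is read into memory at this point.
# The filter and the column selection are handed to Arrow.
lazy <- open_dataset(path) |>
  filter(fecha >= as.Date("2020-01-01")) |>
  select(estudio, fecha, edad)

# Only now are the matching rows and selected columns materialised as a tibble.
result <- collect(lazy)
```

Keeping the query lazy until collect() is what lets Arrow skip unneeded columns and rows, which is presumably the reason cis_read() offers collect = FALSE for further dplyr steps before materialising.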