
Uses Arrow and dplyr to push filters and column selection down before data is collected into memory.

Usage

cis_read(
  fecha_min = NULL,
  fecha_max = NULL,
  estudios = NULL,
  cols = NULL,
  keep_core_cols = TRUE,
  collect = TRUE
)

Arguments

fecha_min

Minimum survey date (inclusive). NULL means no lower bound.

fecha_max

Maximum survey date (inclusive). NULL means no upper bound.

estudios

Optional vector of study codes to keep.

cols

Optional tidyselect expression or character vector of columns to select.

keep_core_cols

If TRUE, always include estudio, fecha, genero, and edad when cols is supplied.

collect

If TRUE, return a tibble in memory. If FALSE, return a lazy Arrow/dplyr object.
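
The argument semantics above can be illustrated with a small base R sketch. This is not the package's implementation: read_sketch() is a hypothetical helper, the toy data frame and its values are invented, and the sketch handles only character vectors for cols (not tidyselect expressions). Placing the core columns first is also an assumption; the documentation only says they are always included.

```r
# Toy stand-in for the survey data; all values are invented for illustration.
toy <- data.frame(
  estudio = c("3400", "3401", "3402", "3403"),
  fecha   = as.Date(c("2022-12-31", "2023-01-01", "2023-06-15", "2024-01-01")),
  genero  = c("M", "H", "M", "H"),
  edad    = c(34L, 51L, 27L, 68L),
  idv     = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT the function documented on this page.
read_sketch <- function(data, fecha_min = NULL, fecha_max = NULL,
                        estudios = NULL, cols = NULL, keep_core_cols = TRUE) {
  keep <- rep(TRUE, nrow(data))
  # Date bounds are inclusive; a NULL bound leaves that side open.
  if (!is.null(fecha_min)) keep <- keep & data$fecha >= as.Date(fecha_min)
  if (!is.null(fecha_max)) keep <- keep & data$fecha <= as.Date(fecha_max)
  # A NULL study filter keeps every study.
  if (!is.null(estudios)) keep <- keep & data$estudio %in% estudios
  out <- data[keep, , drop = FALSE]

  if (!is.null(cols)) {
    core <- c("estudio", "fecha", "genero", "edad")
    # Core columns are added only when a selection is supplied
    # and keep_core_cols is TRUE.
    if (keep_core_cols) cols <- union(core, cols)
    out <- out[, cols, drop = FALSE]
  }
  out
}

res <- read_sketch(
  toy,
  fecha_min = "2023-01-01",
  fecha_max = "2023-06-15",
  cols = "idv"
)
print(res)
```

Both boundary dates fall inside the result, and asking only for idv still returns the four core columns alongside it. Passing keep_core_cols = FALSE to the sketch would return idv alone.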

Value

A tibble if collect = TRUE; a lazy Arrow query if collect = FALSE.
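
To make the two return types concrete, here is a minimal sketch using arrow and dplyr directly on an invented in-memory Arrow table. It does not call cis_read(), and the data and study codes are made up. It only shows the general pattern the function is described as relying on: dplyr verbs on an Arrow object build a query that is not evaluated until dplyr::collect() is called.

```r
library(arrow)
library(dplyr)

# Invented toy data; stands in for whatever cis_read() reads from disk.
toy <- arrow::arrow_table(data.frame(
  estudio = c("3400", "3400", "3401"),
  fecha   = as.Date(c("2019-05-01", "2020-03-01", "2021-07-01")),
  edad    = c(40L, 22L, 63L)
))

# Filtering and selecting on an Arrow table only records the query;
# no rows are materialised in R yet.
lazy <- toy |>
  dplyr::filter(fecha >= as.Date("2020-01-01")) |>
  dplyr::select(estudio, fecha)

# collect() runs the query in Arrow and returns a tibble.
in_memory <- dplyr::collect(lazy)

class(lazy)
class(in_memory)
```

With collect = FALSE, further verbs such as dplyr::count() can be chained onto the lazy object so that only the final, smaller result is brought into memory.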

Examples

if (FALSE) { # \dontrun{
cis_read(fecha_min = "2023-01-01")

cis_read(
  fecha_min = "2020-01-01",
  fecha_max = "2024-12-31",
  cols = c("estudio", "fecha", "genero", "edad", "idv", "recuerdo")
)

cis_read(cols = dplyr::starts_with("val_"))

cis_read(fecha_min = "2020-01-01", collect = FALSE) |>
  dplyr::count(estudio) |>
  dplyr::collect()
} # }