cislongitudinal lets you download, cache, update, inspect, and query the longitudinal CIS (Centro de Investigaciones Sociológicas) survey dataset that Spain Electoral Project publishes as a Parquet file. The package stores the data outside your project directory and reads it with Arrow, so filters and column selections can be applied before any data is loaded into memory.

Installation

# install.packages("remotes")
remotes::install_github("hmeleiro/cislongitudinal")

Download the dataset

The local file is stored in:

By default this is the application data directory returned by:

rappdirs::user_data_dir("cislongitudinal", "spainelectoralproject")
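To see what is currently cached, you can list the Parquet files in that directory. This is a minimal sketch using only rappdirs and base R; it assumes the cached file(s) sit directly in that directory with a `.parquet` extension, which is not spelled out above.

```r
# Resolve the default cache directory (same call as above)
cache_dir <- rappdirs::user_data_dir("cislongitudinal", "spainelectoralproject")

# The directory may not exist before the first download
has_cache <- dir.exists(cache_dir)

# Assumption: Parquet files are stored at the top level of cache_dir
parquet_files <- if (has_cache) {
  list.files(cache_dir, pattern = "\\.parquet$", full.names = TRUE)
} else {
  character(0)
}

# Report each file with its size in MB, or say that nothing is cached
if (length(parquet_files) > 0) {
  print(data.frame(
    file    = basename(parquet_files),
    size_mb = round(file.size(parquet_files) / 1024^2, 1)
  ))
} else {
  message("No cached Parquet file found in ", cache_dir)
}
```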

Check the local copy

Update

cis_update() reads the remote manifest and replaces the local Parquet file only after the new file has been downloaded and validated successfully; if either step fails, the existing copy is kept.
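This "replace only after success" behaviour is a common write-to-temp-then-rename pattern. The base R sketch below illustrates the pattern only: `replace_if_valid()` and its `write_new` and `validate` arguments are hypothetical, are not part of cislongitudinal's API, and do not reproduce whatever checks cis_update() actually performs.

```r
# Hypothetical helper (not part of cislongitudinal): write a new file to a
# temporary path, validate it, and only then swap it into place.
replace_if_valid <- function(write_new, dest, validate) {
  # Create the temp file next to `dest` so the rename stays on one filesystem
  tmp <- tempfile(tmpdir = dirname(dest), fileext = ".tmp")
  # Clean up the temp file if it is still around when we exit
  on.exit(if (file.exists(tmp)) unlink(tmp), add = TRUE)

  ok <- tryCatch({
    write_new(tmp)            # e.g. a download step; may throw an error
    isTRUE(validate(tmp))     # any check that must pass before replacing
  }, error = function(e) FALSE)

  if (!ok) {
    return(invisible(FALSE))  # the existing file at `dest` is left untouched
  }
  invisible(file.rename(tmp, dest))
}

# Usage on a throwaway file
dest <- file.path(tempdir(), "demo.txt")
writeLines("old", dest)

# Failing validation: the old content survives
replace_if_valid(function(p) writeLines("new", p), dest, validate = function(p) FALSE)
readLines(dest)  # "old"

# Passing validation: the new file is swapped in
replace_if_valid(function(p) writeLines("new", p), dest, validate = function(p) file.size(p) > 0)
readLines(dest)  # "new"
```

Writing to a temporary file first means an interrupted download never leaves a half-written file where the reader expects a complete one.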

Read data

Filter by date:

df <- cis_read(fecha_min = "2023-01-01")

Filter by date range:

df <- cis_read(
  fecha_min = "2020-01-01",
  fecha_max = "2024-12-31"
)

Filter by study code:

df <- cis_read(estudios = c(3420, 3421, 3422))
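To make the filter arguments concrete, here is a base R sketch of which rows `fecha_min`, `fecha_max`, and `estudios` would select on a toy data frame. It assumes both date bounds are inclusive and that `estudios` matches study codes exactly; `toy` and `filter_like_cis_read()` are hypothetical, the codes and dates are made up, and the real cis_read() applies its filters through Arrow rather than on an in-memory data frame.

```r
# Toy data: made-up study codes and fieldwork dates
toy <- data.frame(
  estudio = c(9001, 9002, 9003, 9004, 9005),
  fecha   = as.Date(c("2019-06-01", "2023-10-03", "2023-11-02",
                      "2023-12-01", "2024-03-05"))
)

# Hypothetical helper mirroring the cis_read() filter arguments.
# Assumptions: bounds are inclusive; NULL means "do not filter on this".
filter_like_cis_read <- function(df, fecha_min = NULL, fecha_max = NULL,
                                 estudios = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(fecha_min)) keep <- keep & df$fecha >= as.Date(fecha_min)
  if (!is.null(fecha_max)) keep <- keep & df$fecha <= as.Date(fecha_max)
  if (!is.null(estudios))  keep <- keep & df$estudio %in% estudios
  df[keep, , drop = FALSE]
}

filter_like_cis_read(toy, fecha_min = "2023-01-01")$estudio
# 9002 9003 9004 9005
filter_like_cis_read(toy, fecha_min = "2020-01-01", fecha_max = "2023-12-31")$estudio
# 9002 9003 9004
filter_like_cis_read(toy, estudios = c(9002, 9003))$estudio
# 9002 9003
```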

Select columns:

df <- cis_read(
  fecha_min = "2023-01-01",
  cols = c("estudio", "fecha", "genero", "edad", "idv", "recuerdo")
)

When keep_core_cols = TRUE, cis_read() always includes the core columns estudio, fecha, genero, and edad, even if they are not listed in cols.
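The rule amounts to a set union of the core columns and the requested ones. A sketch under the assumption that core columns come first and are never duplicated; `resolve_cols()` is a hypothetical name, and the column order that cis_read() actually returns may differ.

```r
core_cols <- c("estudio", "fecha", "genero", "edad")

# Hypothetical helper: which columns would be read for a given `cols`?
resolve_cols <- function(cols, keep_core_cols = TRUE) {
  # union() keeps order of first appearance and drops duplicates
  if (keep_core_cols) union(core_cols, cols) else cols
}

resolve_cols(c("idv", "recuerdo"))
# "estudio" "fecha" "genero" "edad" "idv" "recuerdo"
resolve_cols(c("fecha", "idv"), keep_core_cols = FALSE)
# "fecha" "idv"
```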

Lazy queries

Use collect = FALSE to get a lazy query instead of an in-memory data frame, and call dplyr::collect() when you want the results:

df_lazy <- cis_read(
  fecha_min = "2020-01-01",
  collect = FALSE
)

df_lazy |>
  dplyr::count(estudio) |>
  dplyr::collect()
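Because the package reads the data with Arrow, a lazy result presumably behaves like an Arrow dplyr query: verbs only build up the query, and rows are read when dplyr::collect() runs. The sketch below reproduces that pattern on a small throwaway Parquet file, so it runs without downloading the CIS data. It needs the arrow and dplyr packages, and the table contents are made up.

```r
# Write a small made-up table to a temporary Parquet file
path <- tempfile(fileext = ".parquet")
arrow::write_parquet(
  data.frame(
    estudio = c(9001, 9001, 9002, 9002, 9002),
    fecha   = as.Date(c("2019-06-01", "2019-06-01", "2023-10-03",
                        "2023-10-03", "2023-10-03")),
    edad    = c(34, 51, 22, 67, 45)
  ),
  path
)

# Build a lazy query: filter and count are recorded, not executed
query <- arrow::open_dataset(path) |>
  dplyr::filter(fecha >= as.Date("2023-01-01")) |>
  dplyr::count(estudio)

is.data.frame(query)  # FALSE: still an Arrow query object

# collect() executes the query and returns an in-memory tibble
result <- dplyr::collect(query)
result
```

Only the rows that pass the filter are aggregated and brought into R, which is what makes lazy queries useful on a large file.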

For advanced queries, open the local dataset directly:

cis_open() |>
  dplyr::filter(fecha >= as.Date("2023-01-01")) |>
  dplyr::select(estudio, fecha, genero, edad) |>
  dplyr::collect()

Explore columns and studies