| Title: | Download and Process Brazilian Education Data from INEP |
|---|---|
| Description: | Download and process public education data from INEP (Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira). Provides functions to access microdata from the School Census (Censo Escolar), ENEM (Exame Nacional do Ensino Médio), SAEB (Sistema de Avaliação da Educação Básica), Higher Education Census (Censo da Educação Superior), ENADE (Exame Nacional de Desempenho dos Estudantes), ENCCEJA (Exame Nacional para Certificação de Competências de Jovens e Adultos), IDD (Indicador de Diferença entre os Desempenhos Observado e Esperado), CPC (Conceito Preliminar de Curso), IGC (Índice Geral de Cursos), CAPES graduate education data, FUNDEB (Fundo de Manutenção e Desenvolvimento da Educação Básica), IDEB (Índice de Desenvolvimento da Educação Básica), and other educational datasets. Returns data in tidy format ready for analysis. Data source: INEP Open Data Portal <https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos>. |
| Authors: | Sidney da Silva Pereira Bissoli [aut, cre] (ORCID: <https://orcid.org/0009-0001-0442-3700>) |
| Maintainer: | Sidney da Silva Pereira Bissoli <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-27 11:03:11 UTC |
| Source: | https://github.com/sidneybissoli/educabr |
Returns the years available for a given dataset. On the first call in a session, queries the data source to discover which years are actually available (requires internet). Results are cached for the session. Falls back to a known list if discovery fails.

    available_years(dataset)

| dataset | The dataset name. |
|---|---|

An integer vector of available years.

    ## Not run:
    available_years("enem")
    available_years("enade")
    available_years("fundeb_enrollment")
    ## End(Not run)

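The discover-then-cache behaviour described for available_years() can be pictured with a small base R sketch. This is not educabR's implementation: `.year_cache`, `known_years`, `cached_years()`, and the `discover` argument are hypothetical names, and the fallback list simply reuses the ENEM range (1998-2024) documented for get_enem().

```r
# Hypothetical session cache; none of these names come from educabR.
.year_cache <- new.env(parent = emptyenv())

# Hypothetical fallback list, used only when discovery fails.
known_years <- list(enem = 1998:2024)

cached_years <- function(dataset, discover) {
  # Reuse the result if this dataset was already resolved in this session.
  if (exists(dataset, envir = .year_cache, inherits = FALSE)) {
    return(get(dataset, envir = .year_cache, inherits = FALSE))
  }
  # Try discovery; on any error, fall back to the known list.
  years <- tryCatch(
    as.integer(discover(dataset)),
    error = function(e) as.integer(known_years[[dataset]])
  )
  assign(dataset, years, envir = .year_cache)
  years
}

# Simulate a failed discovery (for example, no internet connection).
failing_discovery <- function(dataset) stop("network unavailable")
years <- cached_years("enem", failing_discovery)
range(years)
```

In this sketch the fallback result is cached as well, so a second call never retries discovery; whether the package behaves the same way is not stated in the manual.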
Removes all cached files from the educabR cache directory.

    clear_cache(dataset = NULL)

| dataset | Optional. A character string specifying which dataset cache to clear. If NULL (the default), all cached files are removed. |
|---|---|

Invisibly returns TRUE if successful.
Other cache functions:
get_cache_dir(),
list_cache(),
set_cache_dir()

    ## Not run:
    # clear all cached data
    clear_cache()
    # clear only ENEM cache
    clear_cache("enem")
    ## End(Not run)

Calculates summary statistics for ENEM scores, optionally grouped by demographic variables.

    enem_summary(data, by = NULL)

| data | A tibble with ENEM data (from get_enem()). |
|---|---|
| by | Optional grouping variable(s) as character vector. |

A tibble with summary statistics for each score area.
Other ENEM functions:
get_enem(),
get_enem_escola(),
get_enem_itens()

    ## Not run:
    enem <- get_enem(2023, n_max = 10000)
    # overall summary
    enem_summary(enem)
    # summary by sex
    enem_summary(enem, by = "tp_sexo")
    ## End(Not run)

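As a rough illustration of the kind of grouped summary enem_summary() describes, the following base R sketch averages score columns by group on invented data. The column names nu_nota_mt and nu_nota_cn are assumed for illustration (the manual only states that score variables start with nu_nota_), and the real function may report more statistics than a mean.

```r
# Invented toy data; real ENEM microdata has many more columns and rows.
enem_toy <- data.frame(
  tp_sexo    = c("F", "F", "M", "M"),
  nu_nota_mt = c(500, 600, 450, 550),
  nu_nota_cn = c(480, 520, 510, 490),
  stringsAsFactors = FALSE
)

# Pick every score column by its documented prefix.
score_cols <- grep("^nu_nota_", names(enem_toy), value = TRUE)

# Mean of each score column within each group.
by_sex <- aggregate(
  enem_toy[score_cols],
  by = list(tp_sexo = enem_toy$tp_sexo),
  FUN = mean
)
by_sex
```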
Returns the current cache directory used by educabR.

    get_cache_dir()

A character string with the path to the cache directory.
Other cache functions:
clear_cache(),
list_cache(),
set_cache_dir()

    get_cache_dir()

Downloads and processes data from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) on Brazilian graduate programs (stricto sensu). Data is retrieved from the CAPES Open Data Portal via the CKAN API.

    get_capes(
      year,
      type = c("programas", "discentes", "docentes", "cursos", "catalogo"),
      n_max = Inf,
      keep_file = TRUE,
      quiet = FALSE
    )

| year | The year of the data (2013-2024). |
|---|---|
| type | The type of data to download. One of: "programas", "discentes", "docentes", "cursos", "catalogo". |
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_file | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

CAPES is the federal agency responsible for evaluating and regulating graduate programs in Brazil. The data covers stricto sensu programs (master's and doctoral).
The data types include:
programas: Program identifiers, area, evaluation scores
discentes: Student enrollment, demographics, scholarship status
docentes: Faculty information, qualifications, employment
cursos: Course details, modality, start dates
catalogo: Catalog of theses and dissertations
Important notes:
Data is sourced from the CAPES Open Data Portal (CKAN), not INEP.
Files are large CSV files. Downloading may take several minutes.
Column names are standardized to lowercase with underscores.
Internet connection is required to discover download URLs via the CKAN API before downloading.
A tibble with CAPES data in tidy format.
https://dadosabertos.capes.gov.br

    ## Not run:
    # get graduate programs for 2023
    programas <- get_capes(2023, type = "programas")
    # get student data for 2022 with limited rows
    discentes <- get_capes(2022, type = "discentes", n_max = 1000)
    ## End(Not run)

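The note that column names are standardized to lowercase with underscores can be illustrated with a short base R helper. This is only a sketch of one plausible rule: `standardize_names()` is a hypothetical function, the input names are invented, and the package's actual cleaning (for instance, how it treats accented characters) is not described in the manual.

```r
# Hypothetical helper: lowercase, then collapse anything that is not
# a-z or 0-9 into a single underscore.
standardize_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  # Drop underscores left at the start or end of a name.
  gsub("^_+|_+$", "", x)
}

# Invented column names for illustration.
raw_names <- c("NU_ANO", "Nome Programa", "CD.AREA-AVALIACAO")
clean_names <- standardize_names(raw_names)
clean_names
```

Note that this simple rule would also turn accented letters into underscores, which a real implementation may prefer to transliterate instead.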
Downloads and processes microdata from the Brazilian School Census (Censo Escolar), conducted annually by INEP. Returns school-level data with information about infrastructure, location, and administrative details.

    get_censo_escolar(
      year,
      file = NULL,
      uf = NULL,
      n_max = Inf,
      keep_zip = TRUE,
      quiet = FALSE
    )

| year | The year of the census (1995-2025). |
|---|---|
| file | Optional. Name (or partial name) of a specific CSV file to load. By default, loads the main school data file. Use list_censo_files() to see the available files. |
| uf | Optional. Filter by state (UF code or abbreviation). |
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_zip | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

The School Census is the main statistical survey on basic education in Brazil. It collects data from all public and private schools offering basic education (early childhood, elementary, and high school).
Important notes:
The microdata contains one row per school (~217,000 schools in 2023).
Column names are standardized to lowercase with underscores.
Use the uf parameter to filter by state for faster processing.
Older years (1995-2006) contain multiple CSV files with different data.
Use list_censo_files() to discover available files, then pass the
desired file name to the file parameter.
A tibble with school data in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/censo-escolar
Other School Census functions:
list_censo_files()

    ## Not run:
    # get schools data for 2023
    escolas <- get_censo_escolar(2023)
    # get schools from Sao Paulo state only
    escolas_sp <- get_censo_escolar(2023, uf = "SP")
    # read only first 1000 rows for exploration
    escolas_sample <- get_censo_escolar(2023, n_max = 1000)
    # list available files for an older year
    list_censo_files(1995)
    # [1] "CENSOESC_1995.CSV" "DADOS_DESP_1995.CSV" "DADOSCURSO_1995.CSV"
    # load a specific file from an older year
    cursos <- get_censo_escolar(1995, file = "DADOSCURSO")
    # 2025: data is split into separate tables
    list_censo_files(2025)
    escolas_2025 <- get_censo_escolar(2025)
    matriculas_2025 <- get_censo_escolar(2025, file = "Matricula")
    docentes_2025 <- get_censo_escolar(2025, file = "Docente")
    turmas_2025 <- get_censo_escolar(2025, file = "Turma")
    ## End(Not run)

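Since the file argument accepts a partial name, it may help to see how such a lookup could work. The sketch below uses the 1995 file listing shown in the examples above; `match_file()` is a hypothetical helper, and the choice to stop on ambiguous or missing matches is an assumption, not documented package behaviour.

```r
# File names taken from the list_censo_files(1995) example output.
files <- c("CENSOESC_1995.CSV", "DADOS_DESP_1995.CSV", "DADOSCURSO_1995.CSV")

# Hypothetical helper: case-insensitive substring match that insists on
# exactly one hit.
match_file <- function(pattern, files) {
  hits <- files[grepl(tolower(pattern), tolower(files), fixed = TRUE)]
  if (length(hits) != 1L) {
    stop("expected exactly one match, found ", length(hits))
  }
  hits
}

chosen <- match_file("dadoscurso", files)
chosen
```

A pattern such as "DADOS" matches two of the three files, so this sketch raises an error rather than guessing.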
Downloads and processes microdata from the Brazilian Higher Education Census (Censo da Educação Superior), conducted annually by INEP. Returns data on institutions, courses, students, or faculty.

    get_censo_superior(
      year,
      type = c("ies", "cursos", "alunos", "docentes"),
      uf = NULL,
      n_max = Inf,
      keep_zip = TRUE,
      quiet = FALSE
    )

| year | The year of the census (2009-2024). |
|---|---|
| type | Type of data to load. Options: "ies", "cursos", "alunos", "docentes". |
| uf | Optional. Filter by state (UF code or abbreviation). |
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_zip | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

The Higher Education Census is the most comprehensive statistical survey on higher education institutions (HEIs) in Brazil. It collects data from all HEIs offering undergraduate and graduate programs.
Data types:
"ies": One row per institution — administrative data, location,
academic organization, funding type.
"cursos": One row per undergraduate course — area of study, modality
(in-person/distance), enrollment counts.
"alunos": One row per student enrollment — demographics, program,
admission type, enrollment status.
"docentes": One row per faculty member — education level, employment
type, teaching regime.
Important notes:
Student files ("alunos") can be very large (several GB).
Use n_max to read a sample first.
Column names are standardized to lowercase with underscores.
Use the uf parameter to filter by state for faster processing.
A tibble with Higher Education Census microdata in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/censo-da-educacao-superior
Other Higher Education Census functions:
list_censo_superior_files()

    ## Not run:
    # get institution data for 2023
    ies <- get_censo_superior(2023)
    # get course data for Sao Paulo
    cursos_sp <- get_censo_superior(2023, type = "cursos", uf = "SP")
    # get a sample of student data
    alunos <- get_censo_superior(2023, type = "alunos", n_max = 10000)
    # get faculty data
    docentes <- get_censo_superior(2023, type = "docentes")
    ## End(Not run)

Downloads and processes CPC data from INEP. The CPC is a quality indicator for undergraduate courses in Brazil, composed of ENADE scores, IDD, faculty qualifications, pedagogical resources, and other institutional factors.

    get_cpc(year, n_max = Inf, keep_file = TRUE, quiet = FALSE)

| year | The year of the indicator (2007-2019, 2021-2023). Note: there is no 2020 edition. Years 2004-2006 used a different indicator ("Conceito Enade"). |
|---|---|
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_file | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

CPC is calculated by INEP as part of the higher education quality assessment system (SINAES). It serves as a preliminary indicator used to determine which courses require on-site evaluation.
The data includes:
Course and institution identifiers
CPC scores (continuous and categorical/faixa)
Component scores (ENADE, IDD, faculty, infrastructure, etc.)
Number of students evaluated
Important notes:
CPC follows ENADE's rotating cycle of course areas, so each year covers a specific set of fields.
There is no 2020 edition (COVID-19 suspension).
Column names are standardized to lowercase with underscores.
Files are in Excel format (xls/xlsx), not CSV.
A tibble with CPC data in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/areas-de-atuacao/pesquisas-estatisticas-e-indicadores/indicadores-de-qualidade-da-educacao-superior
Other CPC/IGC functions:
get_igc()

    ## Not run:
    # get CPC data for 2023
    cpc <- get_cpc(2023)
    # get CPC data for 2021 with limited rows
    cpc_2021 <- get_cpc(2021, n_max = 1000)
    ## End(Not run)

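Because the CPC series skips 2020, a loop over years needs a gap-aware list. A minimal sketch, assuming only the year range documented above; `cpc_years` and `is_valid_cpc_year()` are hypothetical names, and in practice available_years() is the documented way to obtain the list.

```r
# Documented CPC editions: 2007-2019 and 2021-2023 (no 2020 edition).
cpc_years <- setdiff(2007:2023, 2020L)

# Hypothetical check to run before requesting a year.
is_valid_cpc_year <- function(year) year %in% cpc_years

checks <- is_valid_cpc_year(c(2019, 2020, 2021))
checks
```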
Downloads and processes microdata from ENADE, the Brazilian National Student Performance Exam. ENADE evaluates the performance of undergraduate students in higher education.

    get_enade(year, n_max = Inf, keep_zip = TRUE, quiet = FALSE)

| year | The year of the exam (2004-2024). |
|---|---|
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_zip | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

ENADE is conducted annually by INEP and evaluates undergraduate students nearing the end of their programs. Each year, a different set of course areas is assessed on a rotating cycle (typically every 3 years per area).
The microdata includes:
Student performance scores (general and specific knowledge)
Socioeconomic questionnaire responses
Course and institution identifiers
Important notes:
ENADE files can be large (several hundred MB for recent years).
Use n_max to read a sample first for exploration.
Column names are standardized to lowercase with underscores.
Not all course areas are assessed every year due to the rotating cycle.
A tibble with ENADE microdata in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/enade

    ## Not run:
    # get ENADE data for 2023
    enade <- get_enade(2023, n_max = 10000)
    # get full dataset for 2021
    enade_2021 <- get_enade(2021)
    ## End(Not run)

Downloads and processes microdata from ENCCEJA, the Brazilian National Exam for Youth and Adult Education Certification. ENCCEJA assesses competencies of young people and adults who did not complete basic education at the regular age.

    get_encceja(year, n_max = Inf, keep_zip = TRUE, quiet = FALSE)

| year | The year of the exam (2014-2024). |
|---|---|
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_zip | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

ENCCEJA is conducted by INEP and provides certification for elementary and high school equivalency for youth and adults (EJA). The exam covers four knowledge areas:
Natural Sciences (Ciências Naturais)
Mathematics (Matemática)
Portuguese Language (Língua Portuguesa)
Social Sciences (Ciências Humanas)
Important notes:
ENCCEJA files can be large (several hundred MB).
Use n_max to read a sample first for exploration.
Column names are standardized to lowercase with underscores.
A tibble with ENCCEJA microdata in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/encceja

    ## Not run:
    # get ENCCEJA data for 2023
    encceja <- get_encceja(2023, n_max = 10000)
    # get full dataset for 2022
    encceja_2022 <- get_encceja(2022)
    ## End(Not run)

Downloads and processes microdata from ENEM, the Brazilian National High School Exam. ENEM is used for university admissions and as a high school equivalency exam.

    get_enem(
      year,
      type = "participantes",
      n_max = Inf,
      keep_zip = TRUE,
      quiet = FALSE
    )

| year | The year of the exam (1998-2024). |
|---|---|
| type | Type of data to load. Only used for ENEM 2024+, where microdata is split into separate files. Options include "participantes" (the default) and "resultados". |
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_zip | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

ENEM is conducted annually by INEP and is the largest exam in Brazil, with millions of participants. The microdata includes:
Participant demographics (age, sex, race, etc.)
Socioeconomic questionnaire responses
Scores for each test area
Essay scores
School information (when applicable)
Important notes:
ENEM files are very large (several GB when extracted).
Use n_max to read a sample first for exploration.
Column names are standardized to lowercase with underscores.
Score variables start with nu_nota_ prefix.
From 2024 onwards, INEP split the microdata into separate files.
Use the type parameter to choose which file to load.
A tibble with the ENEM microdata in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/enem
Other ENEM functions:
enem_summary(),
get_enem_escola(),
get_enem_itens()

    ## Not run:
    # get a sample of 10000 rows for exploration
    enem_sample <- get_enem(2023, n_max = 10000)
    # get full data (warning: large file)
    enem_2023 <- get_enem(2023)
    # ENEM 2024+: choose data type
    participantes <- get_enem(2024, type = "participantes", n_max = 1000)
    resultados <- get_enem(2024, type = "resultados", n_max = 1000)
    ## End(Not run)

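The advice to read a sample first can be tried without downloading anything. The sketch below writes a tiny delimited file and reads only its first rows with base R, then selects score columns by the nu_nota_ prefix. The semicolon delimiter, the column names, and the values are assumptions made for illustration; how get_enem() applies n_max internally is not described in the manual.

```r
# Write a small stand-in file; real ENEM files are several GB.
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("nu_inscricao;nu_nota_mt;nu_nota_redacao",
    "1;500.5;640",
    "2;610.0;720",
    "3;455.2;580"),
  csv_file
)

# Read only the first two data rows, the same idea as n_max = 2.
sample_rows <- read.csv(csv_file, sep = ";", nrows = 2)

# Score variables share the nu_nota_ prefix, so a pattern selects them all.
score_cols <- grep("^nu_nota_", names(sample_rows), value = TRUE)
colMeans(sample_rows[score_cols])
```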
Downloads and processes ENEM results aggregated by school. This dataset contains average ENEM scores, participation rates, and other indicators for each school in Brazil.

    get_enem_escola(n_max = Inf, keep_zip = TRUE, quiet = FALSE)

| n_max | Maximum number of rows to read. Default is Inf. |
|---|---|
| keep_zip | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

ENEM por Escola is a single bundled dataset covering years 2005 to 2015. It was discontinued by INEP after 2015 and no per-year files exist.
The data includes:
School identification (code, name, municipality, state)
Average ENEM scores by knowledge area
Number of participants and participation rates
School-level indicators
Important notes:
This is a single file covering all years (2005-2015), not per-year.
Column names are standardized to lowercase with underscores.
Data was discontinued after 2015.
A tibble with ENEM by School data in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/enem-por-escola
Other ENEM functions:
enem_summary(),
get_enem(),
get_enem_itens()

    ## Not run:
    # get all ENEM by School data (2005-2015)
    enem_escola <- get_enem_escola()
    # read only first 1000 rows for exploration
    enem_escola_sample <- get_enem_escola(n_max = 1000)
    ## End(Not run)

Downloads and processes ENEM item response (gabarito) data, which contains detailed information about each question.

    get_enem_itens(year, n_max = Inf, keep_zip = TRUE, quiet = FALSE)

| year | The year of the exam (1998-2024). |
|---|---|
| n_max | Maximum number of rows to read. |
| keep_zip | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

A tibble with item response data.
Other ENEM functions:
enem_summary(),
get_enem(),
get_enem_escola()

    ## Not run:
    # get item data for 2023
    itens <- get_enem_itens(2023)
    ## End(Not run)

Downloads and processes FUNDEB resource distribution data from STN (Secretaria do Tesouro Nacional). Each year's Excel file contains multiple sheets with monthly transfer data by state/municipality, broken down by funding source.

    get_fundeb_distribution(
      year,
      uf = NULL,
      source = NULL,
      destination = NULL,
      n_max = Inf,
      keep_file = TRUE,
      quiet = FALSE
    )

| year | The year of the data (2007-2026). |
|---|---|
| uf | Optional. A UF code (e.g., "SP"). |
| source | Optional. The funding source to filter by (e.g., "FPE"). |
| destination | Optional. The transfer destination (e.g., "uf"). Default is NULL. |
| n_max | Maximum number of rows to return. Default is Inf. |
| keep_file | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

FUNDEB (Fundo de Manutenção e Desenvolvimento da Educação Básica e de Valorização dos Profissionais da Educação) is the main funding mechanism for basic education in Brazil.
Each Excel file from STN contains ~20 data sheets named with a prefix
indicating the destination (E_ for states, M_ for municipalities)
and a suffix indicating the funding source (e.g., E_FPE, M_ICMS).
Each sheet contains two tables: the main FUNDEB transfers and a
FUNDEB adjustment table.
Important notes:
Data is sourced from STN (Tesouro Nacional), not INEP.
Files are in Excel format (XLS) — requires the readxl package.
Column names are standardized to lowercase with underscores.
Summary sheets (Resumo, Total, etc.) are automatically excluded.
A tibble in tidy (long) format with columns:
State name
State code (UF)
Date (last day of the month)
Funding source (FPE, FPM, ICMS, etc.)
Transfer destination ("UF" or "Municipio")
Table type ("Fundeb" or "Ajuste Fundeb")
Transfer amount in BRL (numeric)
https://www.tesourotransparente.gov.br
Other FUNDEB functions:
get_fundeb_enrollment()

    ## Not run:
    # get all FUNDEB distribution data for 2023
    dist_2023 <- get_fundeb_distribution(2023)
    # get only FPE transfers to states
    fpe_estados <- get_fundeb_distribution(2023, source = "FPE", destination = "uf")
    # get data for Sao Paulo only
    sp <- get_fundeb_distribution(2023, uf = "SP")
    ## End(Not run)

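The sheet naming scheme described above (an E_ or M_ prefix for the destination, followed by the funding source) lends itself to a short parsing sketch. The sheet vector below is invented, with "Resumo" and "Total" standing in for summary sheets; the actual parsing code in the package is not shown in the manual and may differ.

```r
# Invented sheet names following the documented prefix/suffix scheme.
sheets <- c("E_FPE", "M_ICMS", "Resumo", "Total")

# Keep only data sheets; summary sheets lack the E_/M_ prefix.
data_sheets <- sheets[grepl("^[EM]_", sheets)]

parsed <- data.frame(
  sheet       = data_sheets,
  # E_ marks transfers to states, M_ transfers to municipalities.
  destination = ifelse(substr(data_sheets, 1, 1) == "E", "UF", "Municipio"),
  # Everything after the prefix is the funding source.
  source      = sub("^[EM]_", "", data_sheets),
  stringsAsFactors = FALSE
)
parsed
```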
Downloads and processes FUNDEB enrollment data from FNDE's OData API. These are the enrollment counts considered for FUNDEB funding calculation.

    get_fundeb_enrollment(
      year,
      uf = NULL,
      n_max = Inf,
      keep_file = TRUE,
      quiet = FALSE
    )

| year | The year of the data (2007-2026). |
|---|---|
| uf | Optional. A UF code (e.g., "SP"). |
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_file | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

Enrollment data comes from FNDE (Fundo Nacional de Desenvolvimento da Educação) via its OData API. It includes the number of enrollments considered for FUNDEB funding, broken down by state, municipality, education type, school network, class type, and location.
Important notes:
Data is sourced from FNDE, not INEP.
Requires the jsonlite package.
Results are cached locally as CSV after first download.
Column names are standardized to lowercase with underscores.
When uf is used with a cached file, filtering is done locally.
A tibble with columns:
Census year
State code (UF)
Municipality name
Education network type
Education type description
Teaching type description
Class type description
Class hours type description
Location type description
Number of enrollments
FNDE: https://www.fnde.gov.br
Other FUNDEB functions:
get_fundeb_distribution()

    ## Not run:
    # get FUNDEB enrollment data for 2023
    mat_2023 <- get_fundeb_enrollment(2023)
    # get enrollment data for Sao Paulo only
    mat_sp <- get_fundeb_enrollment(2023, uf = "SP")
    # get enrollment data with limited rows
    mat_sample <- get_fundeb_enrollment(2023, n_max = 1000)
    ## End(Not run)

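The notes above say results are cached locally as CSV and that uf filtering on a cached file happens locally. A minimal sketch of that idea, using a temporary file in place of the cache; the column names uf and matriculas and all values are hypothetical, since the manual does not name the returned columns.

```r
# Stand-in for a cached download; written to a temporary file.
cache_file <- tempfile(fileext = ".csv")
toy <- data.frame(
  uf         = c("SP", "SP", "RJ"),
  matriculas = c(100L, 50L, 70L),
  stringsAsFactors = FALSE
)
write.csv(toy, cache_file, row.names = FALSE)

# Later call: read the cached CSV and filter by state without any request.
cached <- read.csv(cache_file, stringsAsFactors = FALSE)
sp <- cached[cached$uf == "SP", , drop = FALSE]
total_sp <- sum(sp$matriculas)
total_sp
```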
Downloads and processes microdata from IDD, an indicator that measures the value added by an undergraduate course to student performance. It compares ENADE scores with expected performance based on students' prior achievement (ENEM scores at admission).

    get_idd(year, n_max = Inf, keep_zip = TRUE, quiet = FALSE)

| year | The year of the indicator (2014-2019, 2021-2023). Note: there is no 2020 edition. |
|---|---|
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_zip | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

IDD is calculated by INEP as part of the higher education quality assessment system. It complements ENADE by isolating the contribution of the course itself to student learning, controlling for student input quality.
The data includes:
Course and institution identifiers
IDD scores (continuous and categorical)
Number of students considered in the calculation
Related ENADE and ENEM metrics
Important notes:
IDD is published alongside ENADE results, following the same rotating cycle of course areas.
Column names are standardized to lowercase with underscores.
Not all courses have IDD values (minimum sample requirements apply).
A tibble with IDD data in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/idd

    ## Not run:
    # get IDD data for 2023
    idd <- get_idd(2023)
    # get IDD data for 2021 with limited rows
    idd_2021 <- get_idd(2021, n_max = 1000)
    ## End(Not run)

Downloads and processes IDEB data from INEP in tidy (long) format. IDEB is the main indicator of education quality in Brazil, combining student performance (from SAEB) with grade promotion rates.
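
    get_ideb(level, stage, metric, year = NULL, quiet = FALSE)

| level | The geographic level. Required. One of: "escola", "municipio", "brasil", "estado", "regiao". |
|---|---|
| stage | The education stage. Required. One of: "anos_iniciais", "anos_finais", "ensino_medio". |
| metric | The type of data to return. Required. One of: "indicador", "aprovacao", "nota", "meta". |
| year | Optional. Integer vector of IDEB editions to filter (e.g., c(2021, 2023)). Default is NULL. |
| quiet | Logical. Default is FALSE. |
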
IDEB has been calculated every two years since 2005, based on:
Learning: Average scores in Portuguese and Mathematics from SAEB
Flow: Grade promotion rate (inverse of repetition/dropout)
The index ranges from 0 to 10. Brazil's national goal was to reach 6.0 by 2022 (the level of developed countries in PISA).
The function always downloads the most recent IDEB file available from INEP, which contains the full historical series (2005-2023).
A tibble in tidy (long) format. Columns vary by level and metric:
ID columns (vary by level):
escola: uf_sigla, municipio_codigo, municipio_nome, escola_id, escola_nome, rede
municipio: uf_sigla, municipio_codigo, municipio_nome, rede
brasil: rede
estado: uf_nome, uf_sigla, rede
regiao: regiao, rede
Value columns (vary by metric):
indicador: ano, indicador, valor
aprovacao: ano, ano_escolar or serie (ensino_medio), taxa_aprovacao
nota: ano, disciplina, nota
meta: ano, meta
Official IDEB portal: https://www.gov.br/inep/pt-br/areas-de-atuacao/pesquisas-estatisticas-e-indicadores/ideb
Other IDEB functions:
get_ideb_series(),
list_ideb_available()

    ## Not run:
    # school-level IDEB indicators for early elementary
    ideb <- get_ideb("escola", "anos_iniciais", "indicador")
    # municipality-level approval rates, only 2021 and 2023
    aprov <- get_ideb("municipio", "anos_finais", "aprovacao", year = c(2021, 2023))
    # national IDEB targets
    metas <- get_ideb("brasil", "ensino_medio", "meta")
    # state-level SAEB scores
    notas <- get_ideb("estado", "anos_iniciais", "nota")
    # region-level IDEB indicators
    regioes <- get_ideb("regiao", "anos_finais", "indicador")
    ## End(Not run)

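The two IDEB components described above (learning and flow) combine multiplicatively: the index is the standardized SAEB score on a 0-10 scale times the promotion rate expressed as a proportion. A small worked example with invented values; the variable names are illustrative and are not columns returned by get_ideb().

```r
# Invented inputs for a single school or network.
nota_padronizada <- 6.0   # standardized SAEB score, 0-10 scale
taxa_aprovacao   <- 0.90  # promotion rate as a proportion (90%)

# IDEB is the product of learning and flow.
ideb <- nota_padronizada * taxa_aprovacao
round(ideb, 1)
```

With these numbers a strong test score of 6.0 yields an IDEB of about 5.4, which shows how repetition and dropout pull the index below the learning score.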
get_ideb_series() is deprecated because the new get_ideb() already
returns data in long format with all historical editions.
Use get_ideb(level, stage, metric) instead.

    get_ideb_series(
      years = NULL,
      level = c("escola", "municipio"),
      stage = c("anos_iniciais", "anos_finais", "ensino_medio"),
      uf = NULL,
      quiet = FALSE
    )

| years | Vector of years to include (default: all available). |
|---|---|
| level | The aggregation level. |
| stage | The education stage. |
| uf | Optional. Filter by state. |
| quiet | Logical. Default is FALSE. |

A tibble with IDEB data.
Other IDEB functions:
get_ideb(),
list_ideb_available()

    ## Not run:
    # deprecated: use get_ideb() instead
    ideb <- get_ideb("municipio", "anos_iniciais", "indicador")
    ## End(Not run)

Downloads and processes IGC data from INEP. The IGC is a quality indicator for higher education institutions in Brazil, calculated as a weighted average of CPC scores across all evaluated courses plus CAPES scores for graduate programs.

    get_igc(year, n_max = Inf, keep_file = TRUE, quiet = FALSE)

| year | The year of the indicator (2007-2019, 2021-2023). Note: there is no 2020 edition. Years 2004-2006 used a different indicator ("Conceito Enade"). |
|---|---|
| n_max | Maximum number of rows to read. Default is Inf. |
| keep_file | Logical. Default is TRUE. |
| quiet | Logical. Default is FALSE. |

IGC is calculated by INEP as part of the higher education quality assessment system (SINAES). It provides an overall quality measure for institutions, considering both undergraduate and graduate programs.
The data includes:
Institution identifiers (code, name, organization type)
IGC scores (continuous and categorical/faixa)
Number of courses and students considered
Component breakdown (undergraduate CPC average, graduate CAPES scores)
Important notes:
IGC is published annually based on the last three ENADE cycles.
There is no 2020 edition (COVID-19 suspension).
Column names are standardized to lowercase with underscores.
Files are in Excel format (xls/xlsx), except 2007 which is 7z.
A tibble with IGC data in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/areas-de-atuacao/pesquisas-estatisticas-e-indicadores/indicadores-de-qualidade-da-educacao-superior
Other CPC/IGC functions:
get_cpc()

    ## Not run:
    # get IGC data for 2023
    igc <- get_igc(2023)
    # get IGC data for 2021 with limited rows
    igc_2021 <- get_igc(2021, n_max = 1000)
    ## End(Not run)

Downloads and processes microdata from SAEB, the Brazilian Basic Education Assessment System. SAEB evaluates educational quality through student performance assessments in Portuguese and Mathematics.
get_saeb(
  year,
  type = c("aluno", "escola", "diretor", "professor"),
  level = c("fundamental_medio", "educacao_infantil"),
  n_max = Inf,
  keep_zip = TRUE,
  quiet = FALSE
)
year |
The year of the assessment (2011, 2013, 2015, 2017, 2019, 2021, 2023). |
type |
Type of data to load. Options:
|
level |
For 2021 only, SAEB was split into two files:
|
n_max |
Maximum number of rows to read. Default is |
keep_zip |
Logical. If |
quiet |
Logical. If |
SAEB is conducted biennially by INEP and assesses students in grades 5 and 9 of elementary school, and grade 3 of high school. The data includes:
Student performance scores in Portuguese and Mathematics
School infrastructure and management questionnaires
Teacher and principal profiles
Important notes:
SAEB files can be large (several hundred MB).
Use n_max to read a sample first for exploration.
Column names are standardized to lowercase with underscores.
In 2021, INEP split SAEB into two separate downloads (elementary/high school and early childhood). Use the level parameter to choose.
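The notes above suggest two habits: read a small sample first with `n_max`, and set `level` only for the 2021 edition. One way to sketch this in R is a small wrapper that assembles the arguments. `saeb_args()` is a hypothetical helper, not part of educabR, and the default sample size of 1000 rows is an arbitrary choice. How `get_saeb()` itself treats `level` for other years is not stated in this documentation.

```r
# Hypothetical helper (not part of educabR): assemble arguments for get_saeb()
saeb_args <- function(year, type = "aluno", level = NULL, n_max = 1000) {
  args <- list(year = year, type = type, n_max = n_max)
  # `level` is documented as applying to 2021 only, so pass it just for that year
  if (year == 2021 && !is.null(level)) {
    args$level <- level
  }
  args
}

args_2021 <- saeb_args(2021, level = "educacao_infantil")
args_2023 <- saeb_args(2023, level = "educacao_infantil")  # level is dropped

# With educabR installed and an internet connection, the call would be:
# saeb_sample <- do.call(educabR::get_saeb, args_2021)
```

Once the sample looks right, the same arguments can be reused with `n_max = Inf` to read the full file.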
A tibble with SAEB microdata in tidy format.
For detailed information about variables, see INEP's documentation: https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/saeb
## Not run:
# get student results for 2023
saeb <- get_saeb(2023, n_max = 10000)
# get school questionnaire data
saeb_escola <- get_saeb(2023, type = "escola")
# SAEB 2021: early childhood education
saeb_infantil <- get_saeb(2021, level = "educacao_infantil", n_max = 1000)
## End(Not run)
Lists all files currently in the educabR cache.
list_cache(dataset = NULL)
dataset |
Optional. Filter by dataset name. |
A tibble with information about cached files.
Other cache functions:
clear_cache(),
get_cache_dir(),
set_cache_dir()
## Not run:
list_cache()
## End(Not run)
Lists the data files available in a downloaded School Census.
Use this to discover which files are available for a given year,
then pass the desired file name to get_censo_escolar()'s file
parameter.
list_censo_files(year)
year |
The year of the census. |
A character vector of file names found.
Other School Census functions:
get_censo_escolar()
## Not run:
# first download the data
get_censo_escolar(1995)
# then see what files are available
list_censo_files(1995)
# [1] "CENSOESC_1995.CSV" "DADOS_DESP_1995.CSV" "DADOSCURSO_1995.CSV"
# load a specific file
cursos <- get_censo_escolar(1995, file = "DADOSCURSO")
## End(Not run)
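The example above passes the short name "DADOSCURSO" rather than the full file name. A minimal base R sketch of deriving such short names from a listing follows. It uses the 1995 file names shown in the example as literal input, and assumes the `_<year>.CSV` suffix pattern holds, which may not be true for other years.

```r
# File names as shown in the example output for 1995
files <- c("CENSOESC_1995.CSV", "DADOS_DESP_1995.CSV", "DADOSCURSO_1995.CSV")

# Drop the trailing "_<year>.CSV" to get a short name such as "DADOSCURSO"
stems <- sub("_[0-9]{4}\\.CSV$", "", files)
stems

# Pick the course file by pattern rather than by position
course_file <- grep("CURSO", stems, value = TRUE)
course_file
```

Selecting by pattern avoids depending on the order in which the files are listed.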
Lists the data files available in a downloaded Higher Education Census. Useful for exploring the contents of the ZIP file.
list_censo_superior_files(year)
year |
The year of the census. |
A character vector of file names found.
Other Higher Education Census functions:
get_censo_superior()
## Not run:
list_censo_superior_files(2023)
## End(Not run)
Lists the IDEB data combinations available for download.
list_ideb_available()
A tibble with available IDEB datasets (level, stage, metric).
Other IDEB functions:
get_ideb(),
get_ideb_series()
list_ideb_available()
Sets the directory where downloaded files will be cached. This avoids repeated downloads of the same data.
set_cache_dir(path = NULL, persistent = FALSE)
path |
A character string with the path to the cache directory.
If |
persistent |
Logical. If |
Invisibly returns the cache directory path.
Other cache functions:
clear_cache(),
get_cache_dir(),
list_cache()
## Not run:
# set a custom cache directory (use tempdir() in examples)
set_cache_dir(file.path(tempdir(), "educabR_cache"))
## End(Not run)
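A slightly more defensive variant of the example is sketched below. It creates the directory up front and only calls `set_cache_dir()` when the package is available, assuming it is installed under the name educabR. Whether `set_cache_dir()` creates a missing directory on its own is not stated here, so creating it first is a precaution and may be unnecessary.

```r
# Session-scoped cache location under the R temporary directory
cache_path <- file.path(tempdir(), "educabR_cache")

# Create the directory if it does not exist yet (harmless if it already does)
dir.create(cache_path, recursive = TRUE, showWarnings = FALSE)

# Register the directory only when educabR is installed
if (requireNamespace("educabR", quietly = TRUE)) {
  educabR::set_cache_dir(cache_path)
}
```

Because `tempdir()` is removed when the R session ends, this setup suits examples and tests better than long-lived analyses.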