                 Changes in version 1.0.0 (2026-05-27)                  

Breaking changes

  - get_ideb() has a new signature: get_ideb(level, stage, metric, year,
    quiet). The old positional usage get_ideb(year, level, stage) still
    works with a deprecation warning, but the year parameter now filters
    IDEB editions instead of selecting which file to download.
  - get_ideb() now returns data in tidy long format instead of wide
    format. Output columns depend on the metric parameter ("indicador",
    "aprovacao", "nota", "meta").
  - get_ideb() now supports 5 geographic levels: "escola", "municipio",
    "estado", "regiao", and "brasil" (previously only escola and
    municipio).
  - get_ideb() always downloads the most recent IDEB file available,
    which contains the full historical series. The year parameter
    filters editions.
  - get_ideb_series() is deprecated. Use get_ideb(level, stage, metric)
    instead.
  - list_ideb_available() now returns level, stage, and metric columns
    (previously returned year, level, stage).
  - The uf parameter has been removed from get_ideb(). Filter the result
    with dplyr::filter() instead.
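
As a rough illustration of the wide-to-long change, and of filtering
after the fact now that uf is gone, the base-R sketch below reshapes a
mock table. The data and the column names (uf, ano, valor,
vl_indicador_<year>) are assumptions made for the example, not the
package's actual schema; with dplyr the last step would be an
equivalent dplyr::filter() call.

```r
# Mock wide table: one column per IDEB edition (hypothetical names).
wide <- data.frame(
  uf = c("SP", "RJ"),
  vl_indicador_2021 = c(6.1, 5.4),
  vl_indicador_2023 = c(6.3, 5.6),
  stringsAsFactors = FALSE
)

value_cols <- grep("^vl_indicador_", names(wide), value = TRUE)

# Long format: one row per (uf, edition) pair.
long <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(
    uf = wide$uf,
    ano = as.integer(sub("^vl_indicador_", "", col)),
    valor = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# Stand-in for the removed uf argument: filter the result afterwards.
sp_2023 <- long[long$uf == "SP" & long$ano == 2023L, ]
```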

New features

  - download_inep_file() timeout is now configurable via
    options(educabR.download_timeout = N) (seconds; default 600). Raise
    it when downloading large microdata (e.g. ENEM participantes at ~1.6
    GB) over a slow link (issue #7).
  - read_inep_file() now warns before reading files larger than 500 MB
    entirely into memory, suggesting n_max or UF filters to reduce
    memory pressure. Suppressible with quiet = TRUE; all get_*() callers
    propagate their quiet argument (issue #5).
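
A minimal sketch of raising the timeout for one large download and
restoring the previous setting afterwards. Only the option name and
its 600-second default come from the entry above; the value 1800 is an
arbitrary example, and the getOption() call merely mirrors how such an
option is typically read.

```r
# Raise the timeout to 30 minutes; options() returns the old setting.
old <- options(educabR.download_timeout = 1800)

# How code reading the option would typically resolve it, using the
# documented default of 600 seconds as the fallback.
timeout <- getOption("educabR.download_timeout", default = 600)

# ... run the large download here ...

# Restore whatever was set before.
options(old)
```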

Bug fixes

  - get_ideb() no longer consumes several GB of RAM for school-level
    reads (issue #1). The xlsx is now read with column projection: only
    the vl_* columns matching the requested metric (and year, when
    given) are parsed; the others are skipped at the readxl C++ layer.
    INEP's NA tokens ("", "-", "ND") are also passed to read_excel(na =
    ...) so the missing-value strings never get allocated as R character
    vectors. For level = "escola", stage = "anos_iniciais", metric =
    "indicador", this cuts the in-memory result from ~133 MB to ~37 MB
    (4 years) or ~19 MB (1 year), with proportional drops in peak memory
    during reshape.
  - Character columns from Excel readers (read_ideb_excel(),
    read_excel_safe()) and the FUNDEB enrollment OData fetcher
    (fetch_fundeb_enrollment()) are now normalized to UTF-8 NFC,
    matching the behavior already in read_inep_file(). Previously,
    equality comparisons against literals such as filter(rede ==
    "Pública") could silently return zero rows on Windows because the
    source-file encoding produced non-canonical strings. The shared
    helper normalize_utf8_nfc() is now applied at every read entrypoint
    so all four code paths agree. Affects get_ideb(), get_cpc(),
    get_igc(), get_fundeb_enrollment().
  - read_excel_safe() (used by get_cpc() and get_igc()) now passes
    INEP's missing-value tokens ("", "-", "ND", en/em dashes) to
    readxl::read_excel(na = ...) so those cells are loaded as NA instead
    of character strings cleaned up post hoc (issue #4). Previously, a
    column whose first rows were all "-" could be inferred as logical,
    and later numeric values were silently dropped. clean_dash_values()
    remains as a safety net but is now largely redundant for CPC/IGC.
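
Two of the fixes above are easier to see on toy data. The sketch below
first shows why a non-normalized string fails an equality test, then
shows the effect of declaring NA tokens at read time. It uses base
read.csv() as a stand-in for readxl, and stringi for normalization;
both choices are assumptions made for illustration, since the entries
do not say how normalize_utf8_nfc() is implemented.

```r
# 1. Same word, two Unicode forms: visually identical, bytewise different.
composed   <- "P\u00fablica"   # precomposed u-acute (NFC form)
decomposed <- "Pu\u0301blica"  # "u" followed by a combining acute accent
equal_before <- composed == decomposed  # FALSE: a filter would match nothing

# Normalizing to NFC makes the comparison succeed (needs stringi).
equal_after <- NA
if (requireNamespace("stringi", quietly = TRUE)) {
  equal_after <- stringi::stri_trans_nfc(decomposed) == composed
}

# 2. Declaring NA tokens at read time instead of cleaning afterwards.
csv_text <- "escola,nota\nA,-\nB,ND\nC,3.5\n"
post_hoc <- read.csv(text = csv_text, stringsAsFactors = FALSE)
at_read  <- read.csv(text = csv_text, na.strings = c("", "-", "ND"),
                     stringsAsFactors = FALSE)

class(post_hoc$nota)  # "character": the tokens still need cleaning
class(at_read$nota)   # "numeric": the tokens became NA before type guessing
```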

Internal

  - download_inep_file() now verifies downloaded files before caching
    them (issue #3). Three checks run after the bytes hit disk: file
    size against the server's Content-Length (1% tolerance, catches
    truncated downloads), HTML-masquerade detection on the first 64
    bytes (catches INEP maintenance pages served with HTTP 200), and ZIP
    magic-bytes (PK\x03\x04) for .zip destinations (catches proxy
    corruption). On any failure the corrupt file is deleted and the user
    gets a clear error telling them to retry, instead of a cryptic
    readxl / readr failure on the next call.
  - validate_year() now rejects vectors and non-numeric input with a
    clear error pointing at purrr::map_dfr() for multi-year composition
    (issue #2). Previously, passing c(2017, 2019) to any of the 13
    affected getters (get_cpc, get_idd, get_igc, get_capes, get_saeb,
    get_enem, get_enem_itens, get_enade, get_encceja,
    get_fundeb_distribution, get_fundeb_enrollment, get_censo_escolar,
    get_censo_superior) hit either a cryptic length > 1 error (R ≥ 4.2)
    or silently used only the first element (R < 4.2). get_ideb() is
    unaffected — it intentionally accepts year vectors.
  - extract_zip() cleaned up: removed dead if (TRUE) branch and an
    unreachable cli_abort(); the muffle on extraction warnings was
    tightened from the broad erro|error pattern to the two specific
    messages that motivated it (issue #6).
  - RoxygenNote bumped to 8.0.0 and man/*.Rd regenerated; systemfonts
    and textshaping declared in Suggests: to silence the cosmetic R CMD
    check NOTE about packages pulled transitively by pkgdown.
  - R CMD check warnings cleared: em-dashes in cli_abort() message
    strings in R/utils-download.R are now written with Unicode escapes
    (R CMD check expects ASCII-only R code and flags other characters
    in strings; comments are exempt).
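
The three post-download checks described for download_inep_file() can
be sketched in base R roughly as follows. The helper names, the HTML
pattern, and the sample files are hypothetical; this shows the idea
under those assumptions and is not the package's implementation.

```r
# Size check: within 1% of the size announced by the server.
size_matches <- function(path, expected_bytes, tolerance = 0.01) {
  abs(file.size(path) - expected_bytes) <= tolerance * expected_bytes
}

# HTML masquerade: look for an HTML opening in the first 64 bytes.
looks_like_html <- function(path) {
  head_ints <- as.integer(readBin(path, what = "raw", n = 64L))
  # Keep non-NUL ASCII bytes only so rawToChar() and tolower() are safe.
  ascii <- head_ints[head_ints > 0L & head_ints < 128L]
  head_text <- tolower(rawToChar(as.raw(ascii)))
  grepl("<!doctype html|<html", head_text)
}

# ZIP magic bytes: a local file header starts with PK\x03\x04.
has_zip_magic <- function(path) {
  identical(readBin(path, what = "raw", n = 4L),
            as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Try the checks on two small temporary files.
html_path <- tempfile(fileext = ".zip")
writeLines("<!DOCTYPE html><html><body>Em manutencao</body></html>",
           html_path)

zip_path <- tempfile(fileext = ".zip")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00)), zip_path)

checks <- c(
  html_detected = looks_like_html(html_path),
  html_is_zip   = has_zip_magic(html_path),
  zip_is_zip    = has_zip_magic(zip_path),
  zip_size_ok   = size_matches(zip_path, expected_bytes = 7)
)
```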

                 Changes in version 0.9.0 (2026-04-03)                  

Breaking changes

  - available_years() now dynamically discovers available years by
    querying data sources (HEAD requests for INEP, OData queries for
    FNDE). Results are cached per session. Falls back to a hardcoded
    list when offline.
  - available_years() now accepts "fundeb_enrollment" as a separate
    dataset name. Previously, "fundeb" was shared between distribution
    and enrollment.
  - Code columns (CO_*, CD_*) are now read as character instead of
    numeric across all datasets. This prevents loss of leading zeros in
    codes like municipality codes, course codes, and institution codes.
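
To see why code columns are now read as character, the sketch below
reads a mock file twice with base read.csv(): once with type guessing
and once with CO_*/CD_* columns forced to character based on the
header line. The file contents and column names are invented for the
example, and the package's own reader may detect code columns
differently.

```r
# Mock semicolon-delimited file; names and values are made up.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "CO_CURSO;NO_CURSO;QT_VAGAS",
  "00123;Pedagogia;40",
  "04567;Letras;35"
), path)

# Detect code columns from the header line alone.
header <- strsplit(readLines(path, n = 1L), ";", fixed = TRUE)[[1]]
col_classes <- ifelse(grepl("^(CO|CD)_", header), "character", NA_character_)
names(col_classes) <- header

# Type guessing turns the codes into integers and drops leading zeros.
guessed <- read.csv(path, sep = ";")

# Forcing character keeps the codes exactly as written in the file.
as_text <- read.csv(path, sep = ";", colClasses = col_classes)
```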

Bug fixes

  - Fixed get_enade() failing for 9 of 19 available years. INEP uses
    inconsistent URLs for ENADE: _LGPD suffix for 2012-2019, .rar format
    for 2022. Added hardcoded URL map (enade_urls) with all 19 correct
    URLs.
  - Fixed get_fundeb_enrollment() accepting years with no data in the
    FNDE API. The API currently only has data for 2017-2018.
  - Fixed encoding warning ("unable to translate MARCO") on Windows
    caused by Unicode cedilla in FUNDEB month name map.
  - Fixed clear_cache() failing to delete files on Windows when they
    were memory-mapped by readr. Now deletes entire directories and
    warns about locked files.
  - Fixed ZIP extraction fallback not triggering for ENADE 2017
    ("Illegal byte sequence" error was not matched by the encoding
    detection pattern).

New features

  - Added .rar archive extraction support via 7-Zip. find_7z() searches
    common Windows install paths when 7z is not in PATH.
  - Added strip_diacriticals() internal helper for encoding-safe text
    matching.
  - read_inep_file() now auto-detects code columns (CO_*, CD_*) from the
    file header and reads them as character. No user action required.
  - read_ideb_excel() and read_excel_safe() (CPC/IGC) now convert code
    columns to character after reading.
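
The lookup order described for find_7z() (PATH first, then common
install locations) might look roughly like the sketch below. The
function name, the command names, and the candidate paths are
assumptions for illustration; the package's actual search list is not
given in this changelog.

```r
# Hypothetical stand-in for find_7z().
find_7z_sketch <- function(
    commands = c("7z", "7za", "7zz"),
    candidates = c(
      "C:/Program Files/7-Zip/7z.exe",
      "C:/Program Files (x86)/7-Zip/7z.exe"
    )) {
  # 1. Prefer an executable that is already on PATH.
  on_path <- unname(Sys.which(commands))
  on_path <- on_path[nzchar(on_path)]
  if (length(on_path) > 0L) return(on_path[[1]])

  # 2. Otherwise probe the candidate install locations.
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0L) return(found[[1]])

  # 3. Signal "not found" so the caller can raise a helpful error.
  NA_character_
}

seven_zip <- find_7z_sketch()
```

Returning NA_character_ rather than stopping keeps the lookup
side-effect free, leaving the decision about how to report a missing
7-Zip install to the caller.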

                        Changes in version 0.8.0                        

New features

FUNDEB (Fundo de Manutenção e Desenvolvimento da Educação Básica)

  - get_fundeb_distribution(): Download FUNDEB resource distribution
    data (years 2007-2026). Reads all sheets from STN Excel files and
    returns tidy long-format data with monthly transfer amounts by
    state, funding source, destination (states/municipalities), and
    table type (fundeb/adjustment).
  - get_fundeb_enrollment(): Download FUNDEB enrollment data. Fetches
    from FNDE OData API with automatic pagination. Results cached as
    CSV.
  - Filtering parameters for distribution: uf, source (FPE, FPM, ICMS,
    etc.), and destination ("uf" or "municipio").
  - Data sources: Tesouro Transparente
    (https://www.tesourotransparente.gov.br) and FNDE
    (https://www.fnde.gov.br).

                        Changes in version 0.7.0                        

New features

CAPES (Dados Abertos da Pós-Graduação)

  - get_capes(): Download CAPES graduate education data
    (years 2013-2024).
  - Supports 5 data types: programs ("programas"), students
    ("discentes"), faculty ("docentes"), courses ("cursos"), and
    theses/dissertations catalog ("catalogo").
  - Uses CKAN API to dynamically discover download URLs (CAPES URLs
    contain UUIDs).
  - Data source: CAPES Open Data Portal
    (https://dadosabertos.capes.gov.br).

                        Changes in version 0.6.0                        

New features

CPC (Conceito Preliminar de Curso)

  - get_cpc(): Download CPC data (years 2007-2019, 2021-2023; no 2020
    edition).
  - Quality indicator for undergraduate courses, part of SINAES.
  - Files are in Excel format (xls/xlsx) — requires the readxl package.
  - Hardcoded URL map due to completely inconsistent INEP naming
    patterns.

IGC (Índice Geral de Cursos)

  - get_igc(): Download IGC data (years 2007-2019, 2021-2023; no 2020
    edition).
  - Institutional quality indicator based on weighted CPC averages and
    CAPES scores.
  - Files are in Excel format (xls/xlsx), except 2007 which is a 7z
    archive.
  - Hardcoded URL map due to completely inconsistent INEP naming
    patterns.

Shared utilities

  - read_excel_safe(): Internal helper to read Excel files with error
    handling.

                        Changes in version 0.5.0                        

New features

ENEM por Escola (ENEM by School)

  - get_enem_escola(): Download ENEM results aggregated by school
    (2005-2015).
  - Single bundled file covering all years. Discontinued after 2015.

IDD (Indicador de Diferença entre os Desempenhos Observado e Esperado)

  - get_idd(): Download IDD microdata (years 2014-2019, 2021-2023;
    no 2020 edition).
  - Measures the value added by undergraduate courses to student
    performance.
  - Handles both ZIP (2021+) and 7z (2014-2019) archive formats via new
    extract_archive() utility.
  - Automatic delimiter detection and dash-to-NA cleaning.
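
One simple way to implement delimiter detection and dash-to-NA
cleaning is sketched below: count each candidate delimiter in the
header line, then blank out lone dashes before converting to numeric.
The helper names and the candidate list are hypothetical, and the
package's own heuristics may differ.

```r
# Pick the candidate delimiter that occurs most often in the header.
# Falls back to the first candidate when none occurs at all.
detect_delim_sketch <- function(header_line,
                                candidates = c(";", ",", "|", "\t")) {
  counts <- vapply(candidates, function(delim) {
    hits <- gregexpr(delim, header_line, fixed = TRUE)[[1]]
    sum(hits > 0L)
  }, integer(1))
  candidates[[which.max(counts)]]
}

# Turn lone dashes (with optional surrounding spaces) into NA.
dash_to_na_sketch <- function(x) {
  x[!is.na(x) & trimws(x) == "-"] <- NA_character_
  x
}

delim <- detect_delim_sketch("NU_ANO;CO_CURSO;VL_IDD")
notas <- as.numeric(dash_to_na_sketch(c("2.75", "-", " - ", "3.10")))
```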

                        Changes in version 0.4.0                        

New features

ENCCEJA (Exame Nacional para Certificação de Competências de Jovens e Adultos)

  - get_encceja(): Download ENCCEJA microdata (years 2014-2024).
  - Automatic delimiter detection and dash-to-NA cleaning.

                        Changes in version 0.3.0                        

New features

ENADE (Exame Nacional de Desempenho dos Estudantes)

  - get_enade(): Download ENADE microdata.
  - Automatic delimiter detection and dash-to-NA cleaning.

Censo da Educação Superior (Higher Education Census)

  - get_censo_superior(): Download Higher Education Census microdata
    (years 2009-2024).
  - Supports multiple data types: institutions ("ies"), courses
    ("cursos"), students ("alunos"), and faculty ("docentes").
  - list_censo_superior_files(): List available files in a downloaded
    census.
  - UF filtering via the uf parameter.

                        Changes in version 0.2.0                        

New features

SAEB (Sistema de Avaliação da Educação Básica)

  - get_saeb(): Download SAEB microdata
    (years 2011, 2013, 2015, 2017, 2019, 2021, 2023).
  - Supports multiple data types: student results ("aluno"), school
    ("escola"), principal ("diretor"), and teacher ("professor")
    questionnaires.
  - Handles SAEB 2021 special case where INEP split downloads into
    elementary/high school and early childhood education files via the
    level parameter.

Bug fixes

  - Fixed encoding detection on Windows using iconv() instead of
    validEnc().
  - Fixed "Latin-1" encoding name to "latin1" for Windows codepage
    compatibility.
  - Fixed ENEM 2024+ support: new type parameter for split files
    ("participantes", "resultados").
  - Added SAS datetime parsing for Censo Escolar date columns (dt_*).
  - Converted IDEB vl_* columns from character to numeric, handling "-",
    "ND", and comma decimals.

                 Changes in version 0.1.2 (2026-02-19)                  

New features

  - Added post-read data validation for all datasets. Errors on empty or
    corrupted files; warns when expected columns are missing (e.g.,
    score columns for ENEM, UF columns for IDEB/Census) with actionable
    messages.
  - Downloads now show estimated file size before starting (e.g.,
    "downloading 2.3 GB from INEP...") via HTTP HEAD request, with
    graceful fallback if size is unavailable.
  - get_ideb_series() now shows per-year progress indication (e.g.,
    "processing IDEB 2017 (1/4)") and propagates the quiet parameter to
    inner get_ideb() calls.
  - get_enem_itens() now has keep_zip parameter for consistency with
    get_enem() and get_censo_escolar().

Documentation

  - Added English README (README.md) as default; Portuguese version
    renamed to README.pt-br.md with cross-links between both.
  - Fixed @param year ranges in documentation to match
    available_years():
      - get_enem() / get_enem_itens(): 2009-2023 -> 1998-2024
      - get_censo_escolar(): 2007-2024 -> 1995-2024
  - Added @family tags to group related functions in help pages (ENEM,
    IDEB, School Census, cache).
  - Added English vignette (getting-started.Rmd).
  - Fixed Portuguese accents in README.pt-br.md.

Tests

  - Added tests for enem_summary(): statistics calculation, NA handling,
    grouping by variable, and error on missing score columns.
  - Added tests for validate_data(): empty data, few columns, missing
    expected columns per dataset.

CRAN

  - Replaced \donttest with \dontrun in all examples per CRAN request.

                        Changes in version 0.1.1                        

Bug fixes

  - Fixed set_cache_dir() example that created a directory in the user's
    home (~/educabR_cache) during CRAN checks. Now uses tempdir() in
    examples.

                 Changes in version 0.1.0 (2026-02-03)                  

First public release.

New features

IDEB

  - get_ideb(): Download IDEB data (years 2017, 2019, 2021, 2023).
  - get_ideb_series(): Download IDEB historical series across multiple
    years.
  - list_ideb_available(): List available year/stage/level combinations.

ENEM

  - get_enem(): Download ENEM microdata (years 1998-2024).
  - get_enem_itens(): Download ENEM item response data.
  - enem_summary(): Calculate summary statistics for ENEM scores.

School Census

  - get_censo_escolar(): Download School Census microdata
    (years 1995-2024).
  - list_censo_files(): List available files in a downloaded census.

Cache management

  - set_cache_dir(): Set custom cache directory.
  - get_cache_dir(): Get current cache directory.
  - clear_cache(): Clear cached files.
  - list_cache(): List cached files with metadata.

Utilities

  - available_years(): Get available years for each dataset.