Schema Management with apply_projection()
Lawrence Garba
Source: vignettes/apply_projection.Rmd
What is apply_projection()?
When you retrieve data from the CBN API, the result contains every
column the endpoint offers — sometimes more than you need, and almost
always with names that require explanation.
apply_projection() lets you take control of that raw data
by doing three things in a single, auditable step:
- Select — keep only the columns you care about.
- Rename — replace technical column names with human-readable labels.
- Reorder — arrange columns in the sequence that fits your workflow.
Every call is silently recorded in a projection manifest attached to the result as an attribute. This means you can always look back and see exactly what was done, when it was done, and why — which is essential for reproducible research.
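The manifest relies on an ordinary R mechanism: any object can carry extra metadata as attributes. The sketch below uses base R only, with a made-up data frame and a hand-written manifest entry, to show how such an attribute is attached and read back. The entry is an illustration, not output from apply_projection(), and the exact fields the package stores may differ.

```r
# Toy data frame with made-up values (not real CBN data)
df <- data.frame(date = as.Date("2024-01-01"), headline_yoy = 1.5)

# Attach a hand-written manifest: a list with one entry per projection step
attr(df, "projection_manifest") <- list(
  list(
    action    = "projection",
    reason    = "Illustration only",
    timestamp = Sys.time()
  )
)

# The columns are untouched; the metadata travels alongside the data
manifest <- attr(df, "projection_manifest")
length(manifest)       # number of recorded steps
manifest[[1]]$reason   # label of the first step
```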
Input and Output
| Argument | Type | What it does |
|---|---|---|
| .data | data.frame / opennaijR_tbl | Your raw or previously projected dataset |
| cols | character vector | Names of the columns to keep |
| rename | named character vector | New names mapped to old names: c(NewName = "old_name") |
| order | character vector | Final column order (use post-rename names if you renamed) |
| reason | character scalar | Optional label stored in the manifest for audit purposes |
Output: A data.frame of the same class
as the input, containing only the requested columns in the requested
order, with the requested names — plus a
projection_manifest attribute.
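To make this contract concrete, here is a minimal base R sketch of the select, rename, reorder, and record sequence. toy_projection() is a hypothetical stand-in written for this illustration. It is not the package's implementation, and the real function may differ in validation, error messages, and manifest fields.

```r
# Hypothetical stand-in for illustration; NOT opennaijR's implementation
toy_projection <- function(.data, cols = NULL, rename = NULL,
                           order = NULL, reason = NULL) {
  out <- .data

  # 1. Select: keep only the requested columns
  if (!is.null(cols)) {
    out <- out[, cols, drop = FALSE]
  }

  # 2. Rename: names(rename) are the new names, values are the old ones
  if (!is.null(rename)) {
    idx <- match(unname(rename), names(out))
    names(out)[idx] <- names(rename)
  }

  # 3. Reorder: arrange columns using their current (post-rename) names
  if (!is.null(order)) {
    out <- out[, order, drop = FALSE]
  }

  # 4. Record: append one entry to the manifest carried by the input
  entry <- list(action = "projection", reason = reason, timestamp = Sys.time())
  attr(out, "projection_manifest") <-
    c(attr(.data, "projection_manifest"), list(entry))
  out
}

# Made-up values, not real CBN data
toy <- data.frame(
  date         = as.Date(c("2024-01-01", "2024-02-01")),
  headline_yoy = c(1.0, 2.0),
  food_yoy     = c(3.0, 4.0),
  other        = c("a", "b")
)

res <- toy_projection(
  toy,
  cols   = c("date", "headline_yoy", "food_yoy"),
  rename = c(Date = "date", Food = "food_yoy"),
  order  = c("Food", "headline_yoy", "Date"),
  reason = "Toy example"
)
names(res)
```

Passing res back into toy_projection() would add a second manifest entry, which is the same accumulation idea the chaining section describes for the real function.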
1. Selecting Columns
The simplest use case: you only want two of the many columns in the inflation dataset.
library(opennaijR)
infl <- cbn("inflation")
# Keep only the date and headline year-on-year figure
infl_basic <- apply_projection(
infl,
cols = c("date", "headline_yoy")
)
head(infl_basic)
What happened? Every column except date
and headline_yoy was dropped. The data is otherwise
unchanged — same rows, same values, same order.
To keep more columns, just extend the vector:
infl_three_cols <- apply_projection(
infl,
cols = c("date", "headline_yoy", "food_yoy")
)
2. Renaming Columns
Renaming uses a named character vector where the name on the left is what you want the column to be called, and the value on the right is what it is called now.
# Rename three columns; with no cols argument, every column is kept
infl_renamed <- apply_projection(
infl,
rename = c(
Date = "date",
Headline = "headline_yoy",
Food = "food_yoy"
)
)
If you only rename a subset of columns, the rest keep their original names.
Common mistake: Passing an unnamed vector to rename will throw an error.
# This will fail: the vector has no names on the left-hand side
apply_projection(infl, rename = c("headline_yoy"))
#> Error: `rename` must be a named vector: c(new_name = 'old_name')
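A quick pre-flight check can catch this before the call. The helper below is a hypothetical base R sketch (is_valid_rename() is not part of opennaijR). It tests only that the vector is character and that every element has a non-empty name, which is the condition the error message describes. Whether the package itself rejects partially named vectors is an assumption here.

```r
# Hypothetical helper, not part of opennaijR
is_valid_rename <- function(rename) {
  nms <- names(rename)
  is.character(rename) && !is.null(nms) && all(!is.na(nms) & nzchar(nms))
}

is_valid_rename(c(Headline = "headline_yoy"))   # fully named
is_valid_rename(c("headline_yoy"))              # no names at all
is_valid_rename(c(Date = "date", "food_yoy"))   # second element unnamed
```

Only the first call returns TRUE; the other two fail the check.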
3. Selecting and Renaming Together
You can combine cols and rename in one
call. The columns not listed in cols are dropped first, and
then the remaining columns are renamed.
infl_clean <- apply_projection(
infl,
cols = c("date", "headline_yoy", "food_yoy"),
rename = c(
Date = "date",
Headline = "headline_yoy",
Food = "food_yoy"
)
)
head(infl_clean)
#> Date Headline Food
#> 1 2010-01-01 11.80 14.60
#> 2 2010-02-01 14.05 17.02
#> ...
4. Reordering Columns
Use order when the column sequence matters — for
example, a dashboard that expects a specific layout, or a report where
the most important indicator should appear first.
# Put food before headline, even though we selected headline first
infl_reordered <- apply_projection(
infl,
cols = c("date", "headline_yoy", "food_yoy"),
order = c("food_yoy", "headline_yoy", "date")
)
After renaming, use the new names in order, not the original ones:
infl_report <- apply_projection(
infl,
cols = c("date", "headline_yoy", "food_yoy"),
rename = c(Date = "date", Headline = "headline_yoy", Food = "food_yoy"),
order = c("Date", "Headline", "Food")
)
5. Adding a Reason (Audit Trail)
The reason argument does not change the data at all — it
adds a human-readable label to the manifest so you can recall
why a particular projection was applied.
policy_brief <- apply_projection(
infl,
cols = c("date", "headline_yoy", "core_ex_farm_yoy"),
rename = c(Date = "date", Headline = "headline_yoy", Core = "core_ex_farm_yoy"),
reason = "Quarterly macroeconomic policy brief — Q1 2025"
)
Inspect the manifest afterwards:
manifest <- attr(policy_brief, "projection_manifest")
manifest[[1]]$timestamp # When it ran
manifest[[1]]$reason # Your label
manifest[[1]]$cols_kept # Which columns were kept
6. Chaining Projections (Audit-Trail Accumulation)
You can call apply_projection() on the result
of a previous call. Each step appends a new entry to the manifest,
giving you a complete audit trail of every transformation.
# Step 1 — select
step1 <- apply_projection(
infl,
cols = c("date", "headline_yoy", "food_yoy"),
reason = "Initial column selection"
)
# Step 2 — rename (applied to step1's output)
step2 <- apply_projection(
step1,
rename = c(Date = "date", Headline = "headline_yoy", Food = "food_yoy"),
reason = "Standardize names for reporting"
)
# Both steps are recorded
attr(step2, "projection_manifest")
#> [[1]]
#> $action "projection"
#> $reason "Initial column selection"
#> $timestamp ...
#>
#> [[2]]
#> $action "projection"
#> $reason "Standardize names for reporting"
#> $timestamp ...
This is particularly useful when a dataset passes through several analysts or pipeline stages.
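When a manifest grows to several entries, a tabular view can be easier to scan than the nested list. The sketch below is a hypothetical base R helper (manifest_summary() is not part of opennaijR). It assumes each entry has action, reason, and timestamp fields, as in the printout above, and is demonstrated on a hand-built manifest rather than real package output.

```r
# Hypothetical helper, not part of opennaijR
manifest_summary <- function(manifest) {
  data.frame(
    step   = seq_along(manifest),
    action = vapply(manifest, function(e) e$action, character(1)),
    # A missing reason is shown as NA rather than dropped
    reason = vapply(
      manifest,
      function(e) if (is.null(e$reason)) NA_character_ else e$reason,
      character(1)
    ),
    timestamp = vapply(manifest, function(e) format(e$timestamp), character(1)),
    stringsAsFactors = FALSE
  )
}

# Hand-built manifest for illustration only
toy_manifest <- list(
  list(action = "projection", reason = "Initial column selection",
       timestamp = as.POSIXct("2025-01-01 09:00:00", tz = "UTC")),
  list(action = "projection", reason = NULL,
       timestamp = as.POSIXct("2025-01-01 09:05:00", tz = "UTC"))
)

summary_tbl <- manifest_summary(toy_manifest)
summary_tbl[, c("step", "reason")]
```

With a real result, the input would be attr(x, "projection_manifest") instead of the hand-built list, provided the entries carry the assumed fields.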
7. Pipe-Friendly Usage
apply_projection() works naturally in a
|> or %>% pipeline:
library(dplyr)  # only needed for %>%; the native |> pipe requires R 4.1 or later
infl |>
apply_projection(
cols = c("date", "headline_yoy", "food_yoy"),
rename = c(Date = "date", Headline = "headline_yoy", Food = "food_yoy"),
reason = "Pipeline transformation for dashboard"
)
8. Standardizing Schemas Across Datasets
One powerful use of apply_projection() is ensuring that
different datasets share the same column naming convention before
merging or comparing them.
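To see why a shared key name matters, here is a small base R illustration with two made-up data frames whose date columns are named differently. It renames the key by hand instead of calling apply_projection(), so it runs without the package. The column names and values are invented for the example.

```r
# Made-up values for illustration only; not real CBN data
infl_toy <- data.frame(
  date         = as.Date(c("2024-01-01", "2024-02-01")),
  headline_yoy = c(1.0, 2.0)
)
fx_toy <- data.frame(
  ratedate = as.Date(c("2024-01-01", "2024-02-01")),
  rate     = c(10, 20)
)

# Give the key column the same name in both data frames
names(infl_toy)[names(infl_toy) == "date"] <- "Date"
names(fx_toy)[names(fx_toy) == "ratedate"] <- "Date"

# merge() can now join on a single, shared key
macro_toy <- merge(infl_toy, fx_toy, by = "Date")
names(macro_toy)
```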
exchange <- cbn("exchange_rates")
# Give both datasets a common "Date" column name
infl_std <- apply_projection(
infl,
rename = c(Date = "date"),
reason = "Standard schema — macro datasets"
)
exchange_std <- apply_projection(
exchange,
rename = c(Date = "ratedate"),
reason = "Standard schema — macro datasets"
)
# Now both share the same "Date" column and can be merged cleanly
macro <- merge(infl_std, exchange_std, by = "Date")
9. Error Reference
| Situation | Error message |
|---|---|
| Column listed in cols does not exist | Unknown column(s) in projection: <name> |
| rename vector has no names | `rename` must be a named vector: c(new_name = 'old_name') |
| Column listed in order not in the projected set | Error naming the missing column |
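In scripts that run unattended, it can help to check column names up front or to trap the error and report it. The sketch below uses base R only. missing_columns() is a hypothetical helper, the data frame is made up, and the stop() call merely imitates the first message in the table so that the tryCatch() pattern can be shown without the package.

```r
# Hypothetical helper: which requested columns are absent?
missing_columns <- function(.data, cols) {
  setdiff(cols, names(.data))
}

# Made-up values, not real CBN data
toy <- data.frame(date = as.Date("2024-01-01"), headline_yoy = 1.5)
wanted <- c("date", "headline_yoy", "food_yoy")

absent <- missing_columns(toy, wanted)
absent

# Trap an error and turn it into a message instead of halting the script.
# The stop() below imitates a failed projection; it is not the package call.
result <- tryCatch(
  {
    if (length(absent) > 0) {
      stop("Unknown column(s) in projection: ", paste(absent, collapse = ", "))
    }
    toy[, wanted, drop = FALSE]
  },
  error = function(e) {
    message("Projection skipped: ", conditionMessage(e))
    NULL
  }
)
is.null(result)
```

In real use, the body of tryCatch() would hold the apply_projection() call itself, and the handler would decide whether to log, retry with fewer columns, or stop.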
Workflow Position
apply_projection() is the first
transformation you apply after cbn(). Think of it
as setting your schema before any calculations begin. The typical
opennaijR workflow looks like this:
cbn() → apply_projection() → derive_measure() → analysis
Once you have a clean, consistently named dataset, you are ready to
engineer features with derive_measure().