Project spreadsheet-handling

1. Introduction

What This Project Does

spreadsheet-handling is a Python toolkit for roundtrip workflows between structured data and human-editable spreadsheets. A configurable pipeline projects data into spreadsheet form for review and editing, then re-imports it deterministically. Semantics are kept explicit: cross-references, coded cell values, and FK helper columns are all configuration-driven, not inferred from sheet layout.

Two Repositories

The library (spreadsheet-handling) and a demo integration (spreadsheet-handling-demo) are separate repositories. The demo is the main user entry point and contains runnable examples and sample data.

This guide documents the framework surface shipped with a specific library release.

Scope of This Guide

Pipeline workflow overview and starting points
Transformation catalog: step names, parameters, roundtrip guarantees

Demo data, runnable scripts, and integration walkthroughs live in the demo repository.

2. Workflow

The Core Idea

A pipeline transforms data bidirectionally between a normalized relation form (YAML-friendly) and a human-editable spreadsheet. The same configuration drives both directions, so re-import is deterministic.

Practical workflows tend to grow in rings: start with a plain tabular roundtrip (data in → spreadsheet out → back in), then add FK helper columns for display, then introduce cross-table or compact-cell encoding for multi-valued relations. Each ring adds one transform pair from the catalog.

A runnable walkthrough of this spiral is available in the spreadsheet-handling-demo repository.

Pipeline YAML Structure

A pipeline is a YAML list of step entries. Each entry names a registered transform and supplies its parameters inline:

pipeline:
  - step: expand_xref
    matrix: role_permissions
    output: permission_rows
    row_keys: role_id
    drop_empty: true

Available step names, required parameters, and roundtrip guarantees are listed in the Transformation Catalog.

Running a Pipeline

sheets-run is the primary runtime command for executing a documented pipeline. For an ad-hoc pipeline file like the one above, provide the steps file and input/output locations. The pipeline steps transform the loaded frames; the CLI input and output options choose how those frames are read and written:

sheets-run --steps pipeline.yaml \
  --in-kind yaml_dir --in-path data/input \
  --out-kind yaml_dir --out-path build/output

The spreadsheet-handling-demo repository contains the full first-hour walkthrough with checked-in input data, pipeline files, and expected outputs. The user guide focuses on the configuration concepts and transform catalog rather than duplicating that tutorial.

One-Shot Schema Column Maintenance

Use sheets-schema-maintain when you need an explicit, reviewable column schema change over loaded Frames: add a column, drop a column, rename a column, or reorder columns. It is a maintenance command, not a normal pipeline step. Use ordinary roundtrip editing when the change belongs in the spreadsheet payload itself; use schema maintenance when the column contract has changed and you want the tool to check recognized metadata references before writing output.

The command is dry-run by default. In dry-run mode the loaded Frames pass through the orchestrator and then into the output-only discard saver, so no output artifact is written. Add --write to allow a user-selected output write; write mode requires both --out-kind and --out-path.

Every run emits a machine-readable JSON report outside Frames and outside _meta. If --report is omitted, the report is printed to stdout. If --report is supplied, the report is written to that path. Blocking failures stop the write before the saver runs, but the report is still emitted.

Supported operations:

add_column
drop_column
rename_column
reorder_columns

Metadata handling is deliberately conservative. Recognized metadata references are updated, pruned, or blocked according to explicit rules. Ambiguous, malformed, or unsupported structured references block the operation. Unknown plugin-owned metadata is not string-searched or heuristically rewritten; it blocks only when a structured frame/column reference is detected. _meta.derived is non-authoritative and does not drive schema decisions.

Known limitations:

no public sheets-run YAML step and no public pipeline registry step;
no generic migration framework;
no workbook-header inference;
no dynamic projection schema handling;
FK relation drops remain blocked.

Dry-run rename with the report printed to stdout:

sheets-schema-maintain \
  --in-kind json_dir --in-path data/input \
  --op rename_column \
  --frame characters \
  --source-column name \
  --target-column display_name

Write-mode rename from json_dir to json_dir with a report file:

sheets-schema-maintain \
  --in-kind json_dir --in-path data/input \
  --write \
  --out-kind json_dir --out-path build/schema-updated \
  --op rename_column \
  --frame characters \
  --source-column name \
  --target-column display_name \
  --report build/schema-maintenance-report.json

Drop with explicit pruning of supported references:

sheets-schema-maintain \
  --in-kind json_dir --in-path data/input \
  --write \
  --out-kind json_dir --out-path build/schema-updated \
  --op drop_column \
  --frame characters \
  --source-column obsolete_note \
  --prune \
  --report build/drop-obsolete-note-report.json

--prune only removes explicitly supported references. Unsupported structured references still block, including FK relations that touch the dropped column.

3. Transformation Catalog

Reading This Catalog

Step names used in pipeline YAML must match the runtime_name values in pipeline_step_registry.json (under core/registries/), which is the authoritative registry artifact.

Inverse kinds

bijective	Fully reversible. The inverse step reconstructs the original exactly.
conditionally bijective	Reversible when the stated conditions hold (see each entry).
relation-preserving	No data is lost, but column order or frame shape may change.
projection-only	One-way; the inverse step cannot reconstruct the original.
validation-only	Read-only; no frames are changed.
configuration-only	Writes workbook metadata only; no data frames are changed.

Groups

Bijective transforms: Paired expand/contract steps that form the core of the bidirectional workflow.
Supporting steps: FK helpers, validation, configuration, and infrastructure steps listed in a reference table.

Bijective Transforms

These steps come in expand/contract pairs. The expand direction converts a compact or matrix representation into an explicit relation (long form). The contract direction reverses it. Losslessness is conditional — see each entry.

`expand_xref` / `contract_xref`

Category: primitive · Inverse: conditionally bijective

Expand a matrix frame (rows × columns of marker cells) into explicit long-form relation rows, or contract them back. Typical use: a role × permission table where each cell marks whether the role has the permission.

See also: concept doc technical_model/ch04_concepts/crosstab_binary_xref.adoc and compact_transforms.adoc.

expand_xref parameters

Parameter Required Description

Parameter	Required	Description
`matrix`	yes	Name of the input matrix frame.
`output`	yes	Name of the output relation frame to produce.
`row_keys`	yes	Column name (string) or list of column names that form the row identifier.
`value_columns`	no	Columns to expand. Defaults to all non-key columns.
`column_key`	no	Field name in the output relation that stores the column-axis address value (default: `column_key`). This is a field name, not a list of axis values — see `column_keys` on `contract_xref` for that.
`value`	no	Column name in the output holding the cell value (default: `value`).
`drop_empty`	no	If `true`, omit rows where the cell value is empty or blank. Default: `false`. With `base_relation`: a blank in-scope cell removes the matching base tuple. Without `base_relation`: a blank cell either produces a tuple with empty value (`false`) or is omitted (`true`).
`base_relation`	no	Optional name of an existing relation frame for scoped recomposition. When absent, `expand_xref` converts the matrix to a new relation (standard mode). When present, the matrix cells define the update scope and the matching base-relation tuples are replaced or removed; relation rows whose addresses fall outside the matrix scope are preserved and appended to the output. First-run pipelines with no prior relation omit this parameter.
`name`	no	Override the step name shown in logs.

matrix

yes

Name of the input matrix frame.

output

yes

Name of the output relation frame to produce.

row_keys

yes

Column name (string) or list of column names that form the row identifier.

value_columns

Columns to expand. Defaults to all non-key columns.

column_key

Field name in the output relation that stores the column-axis address value (default: column_key). This is a field name, not a list of axis values — see column_keys on contract_xref for that.

value

Column name in the output holding the cell value (default: value).

drop_empty

If true, omit rows where the cell value is empty or blank. Default: false. With base_relation: a blank in-scope cell removes the matching base tuple. Without base_relation: a blank cell either produces a tuple with empty value (false) or is omitted (true).

base_relation

Optional name of an existing relation frame for scoped recomposition. When absent, expand_xref converts the matrix to a new relation (standard mode). When present, the matrix cells define the update scope and the matching base-relation tuples are replaced or removed; relation rows whose addresses fall outside the matrix scope are preserved and appended to the output. First-run pipelines with no prior relation omit this parameter.

name

Override the step name shown in logs.

Losslessness condition: lossless when drop_empty: false, or when the original matrix had no empty cells in the expanded columns.

- step: expand_xref
  matrix: role_permissions
  output: permission_rows
  row_keys: role_id
  drop_empty: true

contract_xref parameters

Parameter Required Description

Parameter	Required	Description
`relation`	yes	Name of the input long-form relation frame.
`output`	yes	Name of the output matrix frame to produce.
`row_keys`	yes	Column name(s) forming the row identifier (must match `expand_xref` config).
`column_key`	no	Field name in the relation that holds the column-axis address values. Defaults to the value stored in meta by the preceding `expand_xref` (or the literal `column_key`). This is a field name, not a list of axis values — see `column_keys` below.
`value`	no	Column in the relation holding cell values. Defaults to meta-persisted value.
`column_keys`	no	Explicit ordered list of column-axis values that become the visible matrix columns. (`column_key` names the field; `column_keys` lists the values in that field.) When supplied together with `dense_axes.columns_from`, both must resolve to the same ordered list — a mismatch fails fast.
`fill_value`	no	Value to place in cells with no matching relation row. Default: empty string.
`dense_axes`	no	Map with optional `rows_from` and/or `columns_from` keys. Each specifies an entity frame and key field whose unique values drive the corresponding axis. When `columns_from` is supplied, the resolved value list must match any explicit `column_keys` exactly (consistency contract, not precedence).
`name`	no	Override the step name shown in logs.

relation

yes

Name of the input long-form relation frame.

output

yes

Name of the output matrix frame to produce.

row_keys

yes

Column name(s) forming the row identifier (must match expand_xref config).

column_key

Field name in the relation that holds the column-axis address values. Defaults to the value stored in meta by the preceding expand_xref (or the literal column_key). This is a field name, not a list of axis values — see column_keys below.

value

Column in the relation holding cell values. Defaults to meta-persisted value.

column_keys

Explicit ordered list of column-axis values that become the visible matrix columns. (column_key names the field; column_keys lists the values in that field.) When supplied together with dense_axes.columns_from, both must resolve to the same ordered list — a mismatch fails fast.

fill_value

Value to place in cells with no matching relation row. Default: empty string.

dense_axes

Map with optional rows_from and/or columns_from keys. Each specifies an entity frame and key field whose unique values drive the corresponding axis. When columns_from is supplied, the resolved value list must match any explicit column_keys exactly (consistency contract, not precedence).

name

Override the step name shown in logs.

Losslessness condition: lossless when relation rows are unique per (row_keys, column_key) combination.

- step: contract_xref
  relation: permission_rows
  output: role_permissions
  row_keys: role_id

column_key vs column_keys

column_key and column_keys look similar but carry different information:

Parameter Meaning

Parameter	Meaning
`column_key`	Field name — the name of the field in the long/relation frame that holds the column-axis address (e.g. `character_name`, `context_id`, `product_id`).
`column_keys`	Value list — an explicit, ordered list of concrete axis values that become the visible matrix columns (e.g. `[Galli, Donzo, Rexi]`).

column_key

Field name — the name of the field in the long/relation frame that holds the column-axis address (e.g. character_name, context_id, product_id).

column_keys

Value list — an explicit, ordered list of concrete axis values that become the visible matrix columns (e.g. [Galli, Donzo, Rexi]).

Example: character_name is the field name; Galli, Donzo, Rexi are the values.

- step: contract_xref
  relation: story_cast_role_named
  output: story_cast_role_matrix
  row_keys: [story_id]
  column_key: character_name      # field name in the relation
  column_keys: [Galli, Donzo, Rexi]   # concrete axis values
  value: value

column_key does not have to be a technical primary key. A configured unique alternate key (such as character_name) is valid when it is declared unique.

Column-axis key sources (contract_xref only):

column_keys only → list used directly.
dense_axes.columns_from only → list resolved from the entity frame at runtime.
Both supplied → must match exactly; a mismatch fails fast (consistency contract, not precedence).
Neither → falls back to column_keys stored in meta from a prior run, then to relation column order.

Sparse-default cells and expand_xref

expand_xref does not interpret sparse-default metadata. If blank cells in the matrix represent a compressed default (introduced by sparse_collapse), run sparse_expand before expand_xref to restore explicit default values. sparse_collapse / sparse_expand are separate transformations with their own registered steps.

`decode_cell_values` / `encode_cell_values`

Category: primitive · Inverse: conditionally bijective

Decode a compact cell string (e.g. "A,B,C") into one row per code, or encode code rows back into a compact cell string. Used inside the compact multiaxis chain or independently for coded single-column cells.

See also: concept doc technical_model/ch04_concepts/compact_transforms.adoc.

decode_cell_values parameters

Parameter Required Description

Parameter	Required	Description
`source`	yes	Name of the input frame containing encoded cell strings.
`output`	yes	Name of the output frame with one row per decoded token.
`value`	no	Column in `source` holding the encoded string. Default: `value`.
`code`	no	Column name in `output` for the decoded token. Default: `code`.
`passthrough_columns`	no	Columns from `source` to copy unchanged into every output row.
`drop_empty`	no	If `true`, omit rows for empty tokens. Default: `false`.
`mode`	no	Decoding mode. `delimited` (default) splits on `delimiter`; `positional` treats each character as a code.
`delimiter`	no	Token separator for `delimited` mode. Default: `,`.
`allowed_codes`	no	Explicit list of valid code strings. Validation fails on unknown codes.
`allowed_tokens`	no	Alias for `allowed_codes`.
`allowed_from_legend`	no	Name of a legend frame from which to derive the allowed-code set.
`normalize_case`	no	`upper`, `lower`, or omit to leave case unchanged.
`strip`	no	If `true`, strip whitespace from each token. Default: `false`.
`name`	no	Override the step name shown in logs.

source

yes

Name of the input frame containing encoded cell strings.

output

yes

Name of the output frame with one row per decoded token.

value

Column in source holding the encoded string. Default: value.

code

Column name in output for the decoded token. Default: code.

passthrough_columns

Columns from source to copy unchanged into every output row.

drop_empty

If true, omit rows for empty tokens. Default: false.

mode

Decoding mode. delimited (default) splits on delimiter; positional treats each character as a code.

delimiter

Token separator for delimited mode. Default: ,.

allowed_codes

Explicit list of valid code strings. Validation fails on unknown codes.

allowed_tokens

Alias for allowed_codes.

allowed_from_legend

Name of a legend frame from which to derive the allowed-code set.

normalize_case

upper, lower, or omit to leave case unchanged.

strip

If true, strip whitespace from each token. Default: false.

name

Override the step name shown in logs.

Losslessness condition: lossless when drop_empty: false and token order is preserved (or canonical_order is set on the encode side).

- step: decode_cell_values
  source: assignments
  output: assignment_codes
  value: codes
  code: code
  delimiter: ","
  passthrough_columns: [employee_id, project_id]

encode_cell_values parameters

Parameter Required Description

Parameter	Required	Description
`source`	yes	Name of the input frame with one row per code.
`output`	yes	Name of the output frame with encoded cell strings.
`group_by`	yes	Column name(s) to group rows before joining into a cell string.
`code`	no	Column containing the token to encode. Default: `code`.
`value`	no	Column name in `output` for the encoded string. Default: `value`.
`mode`	no	Encoding mode. Matches `decode_cell_values` mode.
`delimiter`	no	Separator. Default: `,`.
`canonical_order`	no	Explicit ordered list of codes; determines output token order.
`allowed_codes` / `allowed_tokens`	no	Validation whitelist.
`allowed_from_legend`	no	Legend frame name for allowed-code derivation.
`normalize_case`	no	`upper`, `lower`, or omit.
`strip`	no	Strip whitespace before encoding.
`name`	no	Override the step name shown in logs.

source

yes

Name of the input frame with one row per code.

output

yes

Name of the output frame with encoded cell strings.

group_by

yes

Column name(s) to group rows before joining into a cell string.

code

Column containing the token to encode. Default: code.

value

Column name in output for the encoded string. Default: value.

mode

Encoding mode. Matches decode_cell_values mode.

delimiter

Separator. Default: ,.

canonical_order

Explicit ordered list of codes; determines output token order.

allowed_codes / allowed_tokens

Validation whitelist.

allowed_from_legend

Legend frame name for allowed-code derivation.

normalize_case

upper, lower, or omit.

strip

Strip whitespace before encoding.

name

Override the step name shown in logs.

- step: encode_cell_values
  source: assignment_codes
  output: assignments
  group_by: [employee_id, project_id]
  code: code
  value: codes
  delimiter: ","

`expand_compact_multiaxis` / `contract_compact_multiaxis`

Category: composite · Inverse: conditionally bijective

Compose expand_xref and decode_cell_values (or their inverses) to handle compact multi-axis matrices where each cell encodes multiple coded values. Parameters are a superset of the two constituent steps.

Full parameter documentation planned.

`split_by_discriminator` / `merge_by_discriminator`

Category: primitive · Inverse: conditionally bijective

Split one frame into multiple named frames partitioned by a discriminator column value, recording frame-name → discriminator mapping in meta. merge_by_discriminator reconstructs the original frame from the split parts.

Full parameter documentation planned.

`sparse_collapse` / `sparse_expand`

Category: primitive · Inverse: conditionally bijective

Replace configured default values in editable crosstable cells with a sparse blank placeholder (sparse_collapse), or restore blanks to the configured default (sparse_expand). Used to make sparse matrices more readable in spreadsheet form.

Full parameter documentation planned.

`flatten_headers` / `unflatten_headers`

Category: primitive · Inverse: conditionally bijective

Flatten MultiIndex or tuple-like DataFrame column headers into scalar string labels (flatten_headers), or split scalar string headers back into MultiIndex form (unflatten_headers).

Full parameter documentation planned.

Supporting Steps

The following steps handle FK helpers, validation, configuration, and infrastructure concerns. Full parameter documentation is planned; this table gives the step name, category, inverse kind, and purpose for quick reference.

FK Helpers and Lookup

FK-helper primitives are policy consumers: they read the v2 relation policy under meta.helper_policies.fk (and derived helper provenance under _meta.derived.sheets.*.helper_columns). Either configure_fk_helpers (explicit / manual policy) or infer_fk_relations (heuristic policy) must run before add_fk_helpers, reorder_fk_helpers, validate_fk_helpers, and remove_fk_helpers. Missing policy is a clear error that names the producer step; primitives no longer infer relations from column names. The cell-level id(<target>) convention is the wire format on roundtrip and the default heuristic of infer_fk_relations; it is no longer interpreted by primitives.

Step Category Inverse Purpose

Step	Category	Inverse	Purpose
`infer_fk_relations`	configuration	configuration-only	Heuristic producer: scan data frames and write v2 FK relation policy under `meta.helper_policies.fk` with `produced_by.mode: naming_convention` (currently the only mode). Use it when FK columns follow the `id``(<target>)` convention.
`configure_fk_helpers`	configuration	configuration-only	Explicit / manual producer: take per-target policy from the YAML and write the v2 relation list under `_meta.helper_policies.fk` with `produced_by.mode: explicit`. Use it when you want to spell relations, helper fields, and `target_key` out by hand.
`add_fk_helpers`	primitive	projection-only	Materialize FK helper columns from the v2 relation policy. Requires that `configure_fk_helpers` or `infer_fk_relations` ran earlier. Writes derived helper provenance for later cleanup and validation.
`reorder_fk_helpers`	primitive	relation-preserving	Move FK helper columns next to their FK column using derived helper provenance and the v2 relation policy. Does not consult `FK_PATTERN`.
`validate_fk_helpers`	primitive	validation-only	Validate FK-helper consistency against the v2 relation policy and derived helper provenance (no convention inference).
`remove_fk_helpers`	primitive	projection-only	Remove materialized FK helper columns using the derived provenance written by `add_fk_helpers`. Frames without provenance or explicit policy are an error. Recovery from externally-edited workbooks without provenance is a separate opt-in concern and is not part of this step.
`add_lookup_helpers`	primitive	projection-only	Join a source frame to a lookup frame and materialize helper columns.
`configure_lookup_helpers`	configuration	configuration-only	Resolve lookup-helper policy and persist it as canonical metadata.

infer_fk_relations

configuration

configuration-only

Heuristic producer: scan data frames and write v2 FK relation policy under meta.helper_policies.fk with produced_by.mode: naming_convention (currently the only mode). Use it when FK columns follow the id(<target>) convention.

configure_fk_helpers

configuration

configuration-only

Explicit / manual producer: take per-target policy from the YAML and write the v2 relation list under _meta.helper_policies.fk with produced_by.mode: explicit. Use it when you want to spell relations, helper fields, and target_key out by hand.

add_fk_helpers

primitive

projection-only

Materialize FK helper columns from the v2 relation policy. Requires that configure_fk_helpers or infer_fk_relations ran earlier. Writes derived helper provenance for later cleanup and validation.

reorder_fk_helpers

primitive

relation-preserving

Move FK helper columns next to their FK column using derived helper provenance and the v2 relation policy. Does not consult FK_PATTERN.

validate_fk_helpers

primitive

validation-only

Validate FK-helper consistency against the v2 relation policy and derived helper provenance (no convention inference).

remove_fk_helpers

primitive

projection-only

Remove materialized FK helper columns using the derived provenance written by add_fk_helpers. Frames without provenance or explicit policy are an error. Recovery from externally-edited workbooks without provenance is a separate opt-in concern and is not part of this step.

add_lookup_helpers

primitive

projection-only

Join a source frame to a lookup frame and materialize helper columns.

configure_lookup_helpers

configuration

configuration-only

Resolve lookup-helper policy and persist it as canonical metadata.

Validation

Step Category Inverse Purpose

Step	Category	Inverse	Purpose
`add_validations`	configuration	validation-only	Write validation constraints into workbook metadata for later rendering.
`validate_references`	primitive	validation-only	Validate PK, uniqueness, FK, reference-tuple integrity, and `no_helper_columns` assertion rules against frame data.
`validate_graph`	primitive	validation-only	Validate node/edge networks for endpoint existence and duplicate edge identities.
`validate`	primitive	validation-only	Validate duplicate IDs and unresolved FK references with severity policy. (transitional status — prefer `validate_references`)

add_validations

configuration

validation-only

Write validation constraints into workbook metadata for later rendering.

validate_references

primitive

validation-only

Validate PK, uniqueness, FK, reference-tuple integrity, and no_helper_columns assertion rules against frame data.

validate_graph

primitive

validation-only

Validate node/edge networks for endpoint existence and duplicate edge identities.

validate

primitive

validation-only

Validate duplicate IDs and unresolved FK references with severity policy. (transitional status — prefer validate_references)

add_validations supports explicit column targets and role-based targets. For dynamic workbook views, prefer an explicit frame: when using roles:. The sheet: field names the visible workbook sheet where the validation is rendered. The frame: field names the frame used for column-role resolution. In projected workbook views, this is usually the semantic source frame that carries row-key, xref, helper-column, or other role metadata; it is not necessarily the final display/helper frame used to render the visible sheet. frame: may be omitted when the visible sheet and role-resolution frame are unambiguous, but keeping it explicit is recommended for consumer pipelines that use projected workbook views.

- step: add_validations
  rules:
    - target:
        sheet: story_groups
        frame: story_group_matrix
        roles: [matrix_value]
      rule:
        type: from_legend
        legend: story_group_codes
        include_empty: true

roles: accepts the foundation roles row_identity, display_helper, and matrix_value. It is mutually exclusive with explicit column: or columns: targets. Use roles: [matrix_value] for dynamic matrix value columns that previously required consumer-side fan-out plugins such as add_from_legend_constraints_for_matrix.

Configuration

Step Category Inverse Purpose

Step	Category	Inverse	Purpose
`configure_workbook_view`	configuration	configuration-only	Declare which frames render as workbook sheets (names, order, lifecycle, options).
`apply_workbook_view_sheet_mappings`	primitive	relation-preserving	Re-key visible sheets to logical workbook-view frames on readback using persisted mappings.
`apply_derived_column_policy`	primitive	projection-only	Drop helper/derived columns using registered `_meta.derived` provenance, falling back to durable workbook-view helper declarations (`_meta.sheets.<frame>.helper_columns`) for drop identity after workbook reimport. Pair with `validate_references` / `no_helper_columns` when the pipeline should assert that cleanup happened before writing canonical data.

configure_workbook_view

configuration

configuration-only

Declare which frames render as workbook sheets (names, order, lifecycle, options).

apply_workbook_view_sheet_mappings

primitive

relation-preserving

Re-key visible sheets to logical workbook-view frames on readback using persisted mappings.

apply_derived_column_policy

primitive

projection-only

Drop helper/derived columns using registered _meta.derived provenance, falling back to durable workbook-view helper declarations (_meta.sheets.<frame>.helper_columns) for drop identity after workbook reimport. Pair with validate_references / no_helper_columns when the pipeline should assert that cleanup happened before writing canonical data.

Infrastructure and Output

Step Category Inverse Purpose

Step	Category	Inverse	Purpose
`bootstrap_meta`	infrastructure	—	Merge profile defaults, persisted metadata, and runtime overrides into payload metadata.
`apply_overrides`	infrastructure	—	Load or apply workbook and sheet override metadata.
`extract_frame`	primitive	extraction	Derive a named frame by selecting columns, filtering rows, renaming, adding constants.
`pivot_frame`	primitive	projection-only	Derive a display frame by pivoting long-form rows into configured index and value columns.
`join_frames`	primitive	projection-only	Derive a view frame by joining two frames with configured keys, mode, and projections.
`normalize_resource_overrides`	primitive	projection-only	Normalize long-form resource override tuples according to a default-context and empty-value policy.
`write_structured_yaml`	infrastructure	extraction	Write nested YAML files from tabular frames and record a generated-file report frame.
`write_key_value_resources`	infrastructure	projection-only	Write key-value (`.properties`-style) files partitioned by template column values.
`write_artifact_manifest`	infrastructure	projection-only	Merge writer report frames into a deterministic manifest; optionally compute checksums.
`identity`	infrastructure	bijective	Return the pipeline payload unchanged. Useful as a no-op placeholder.
`plugin`	plugin	—	Run a caller-provided dotted callable as an explicit pipeline extension point.

bootstrap_meta

infrastructure

—

Merge profile defaults, persisted metadata, and runtime overrides into payload metadata.

apply_overrides

infrastructure

—

Load or apply workbook and sheet override metadata.

extract_frame

primitive

extraction

Derive a named frame by selecting columns, filtering rows, renaming, adding constants.

pivot_frame

primitive

projection-only

Derive a display frame by pivoting long-form rows into configured index and value columns.

join_frames

primitive

projection-only

Derive a view frame by joining two frames with configured keys, mode, and projections.

normalize_resource_overrides

primitive

projection-only

Normalize long-form resource override tuples according to a default-context and empty-value policy.

write_structured_yaml

infrastructure

extraction

Write nested YAML files from tabular frames and record a generated-file report frame.

write_key_value_resources

infrastructure

projection-only

Write key-value (.properties-style) files partitioned by template column values.

write_artifact_manifest

infrastructure

projection-only

Merge writer report frames into a deterministic manifest; optionally compute checksums.

identity

infrastructure

bijective

Return the pipeline payload unchanged. Useful as a no-op placeholder.

plugin

plugin

—

Run a caller-provided dotted callable as an explicit pipeline extension point.

Project spreadsheet-handling — User Guide

1. Introduction

What This Project Does

Two Repositories

Scope of This Guide

2. Workflow

The Core Idea

Pipeline YAML Structure

Running a Pipeline

One-Shot Schema Column Maintenance

3. Transformation Catalog

Reading This Catalog

Inverse kinds

Groups

Bijective Transforms

expand_xref / contract_xref

decode_cell_values / encode_cell_values

expand_compact_multiaxis / contract_compact_multiaxis

split_by_discriminator / merge_by_discriminator

sparse_collapse / sparse_expand

flatten_headers / unflatten_headers

Supporting Steps

FK Helpers and Lookup

Validation

Configuration

Infrastructure and Output

Project `spreadsheet-handling` — User Guide

`expand_xref` / `contract_xref`

`decode_cell_values` / `encode_cell_values`

`expand_compact_multiaxis` / `contract_compact_multiaxis`

`split_by_discriminator` / `merge_by_discriminator`

`sparse_collapse` / `sparse_expand`

`flatten_headers` / `unflatten_headers`