1. Introduction
What This Project Does
spreadsheet-handling is a Python toolkit for roundtrip workflows between
structured data and human-editable spreadsheets. A configurable pipeline
projects data into spreadsheet form for review and editing, then re-imports it
deterministically. Semantics are kept explicit: cross-references, coded cell
values, and FK helper columns are all configuration-driven, not inferred from
sheet layout.
Two Repositories
The library (spreadsheet-handling) and a demo integration
(spreadsheet-handling-demo) are separate repositories. The
demo is the main user entry point and contains runnable
examples and sample data.
This guide documents the framework surface shipped with a specific library release.
Scope of This Guide
-
Pipeline workflow overview and starting points
-
Transformation catalog: step names, parameters, roundtrip guarantees
Demo data, runnable scripts, and integration walkthroughs live in the demo repository.
2. Workflow
The Core Idea
A pipeline transforms data bidirectionally between a normalized relation form (YAML-friendly) and a human-editable spreadsheet. The same configuration drives both directions, so re-import is deterministic.
Practical workflows tend to grow in rings: start with a plain tabular roundtrip (data in → spreadsheet out → back in), then add FK helper columns for display, then introduce cross-table or compact-cell encoding for multi-valued relations. Each ring adds one transform pair from the catalog.
A runnable walkthrough of this spiral is available in the spreadsheet-handling-demo repository.
Pipeline YAML Structure
A pipeline is a YAML list of step entries. Each entry names a registered transform and supplies its parameters inline:
pipeline:
- step: expand_xref
matrix: role_permissions
output: permission_rows
row_keys: role_id
drop_empty: true
Available step names, required parameters, and roundtrip guarantees are listed in the Transformation Catalog.
Running a Pipeline
sheets-run is the primary runtime command for executing a documented
pipeline. For an ad-hoc pipeline file like the one above, provide the steps file
and input/output locations. The pipeline steps transform the loaded frames; the
CLI input and output options choose how those frames are read and written:
sheets-run --steps pipeline.yaml \
--in-kind yaml_dir --in-path data/input \
--out-kind yaml_dir --out-path build/output
The spreadsheet-handling-demo repository contains the full first-hour walkthrough with checked-in input data, pipeline files, and expected outputs. The user guide focuses on the configuration concepts and transform catalog rather than duplicating that tutorial.
3. Transformation Catalog
Reading This Catalog
Step names used in pipeline YAML must match the runtime_name values in
pipeline_step_registry.json (under core/registries/), which is the
authoritative registry artifact.
Inverse kinds
| bijective |
Fully reversible. The inverse step reconstructs the original exactly. |
| conditionally bijective |
Reversible when the stated conditions hold (see each entry). |
| relation-preserving |
No data is lost, but column order or frame shape may change. |
| projection-only |
One-way; the inverse step cannot reconstruct the original. |
| validation-only |
Read-only; no frames are changed. |
| configuration-only |
Writes workbook metadata only; no data frames are changed. |
Groups
- Bijective transforms
-
Paired expand/contract steps that form the core of the bidirectional workflow.
- Supporting steps
-
FK helpers, validation, configuration, and infrastructure steps listed in a reference table.
Bijective Transforms
These steps come in expand/contract pairs. The expand direction converts a compact or matrix representation into an explicit relation (long form). The contract direction reverses it. Losslessness is conditional — see each entry.
expand_xref / contract_xref
Category: primitive · Inverse: conditionally bijective
Expand a matrix frame (rows × columns of marker cells) into explicit long-form relation rows, or contract them back. Typical use: a role × permission table where each cell marks whether the role has the permission.
See also: concept doc technical_model/ch04_concepts/crosstab_binary_xref.adoc
and compact_transforms.adoc.
expand_xref parameters
| Parameter | Required | Description |
|---|---|---|
|
yes |
Name of the input matrix frame. |
|
yes |
Name of the output relation frame to produce. |
|
yes |
Column name (string) or list of column names that form the row identifier. |
|
no |
Columns to expand. Defaults to all non-key columns. |
|
no |
Column name in the output holding the original column label (default: |
|
no |
Column name in the output holding the cell value (default: |
|
no |
If |
|
no |
Override the step name shown in logs. |
Losslessness condition: lossless when drop_empty: false, or when the
original matrix had no empty cells in the expanded columns.
- step: expand_xref
matrix: role_permissions
output: permission_rows
row_keys: role_id
drop_empty: true
contract_xref parameters
| Parameter | Required | Description |
|---|---|---|
|
yes |
Name of the input long-form relation frame. |
|
yes |
Name of the output matrix frame to produce. |
|
yes |
Column name(s) forming the row identifier (must match |
|
no |
Column in the relation holding the column labels. Defaults to value persisted
in meta by the preceding |
|
no |
Column in the relation holding cell values. Defaults to meta-persisted value. |
|
no |
Explicit ordered list of output columns to generate (pins column order). |
|
no |
Value to place in cells with no matching relation row. Default: empty string. |
|
no |
Override the step name shown in logs. |
Losslessness condition: lossless when relation rows are unique per (row_keys, column_key) combination.
- step: contract_xref
relation: permission_rows
output: role_permissions
row_keys: role_id
decode_cell_values / encode_cell_values
Category: primitive · Inverse: conditionally bijective
Decode a compact cell string (e.g. "A,B,C") into one row per code, or encode
code rows back into a compact cell string. Used inside the compact multiaxis
chain or independently for coded single-column cells.
See also: concept doc technical_model/ch04_concepts/compact_transforms.adoc.
decode_cell_values parameters
| Parameter | Required | Description |
|---|---|---|
|
yes |
Name of the input frame containing encoded cell strings. |
|
yes |
Name of the output frame with one row per decoded token. |
|
no |
Column in |
|
no |
Column name in |
|
no |
Columns from |
|
no |
If |
|
no |
Decoding mode. |
|
no |
Token separator for |
|
no |
Explicit list of valid code strings. Validation fails on unknown codes. |
|
no |
Alias for |
|
no |
Name of a legend frame from which to derive the allowed-code set. |
|
no |
|
|
no |
If |
|
no |
Override the step name shown in logs. |
Losslessness condition: lossless when drop_empty: false and token order is
preserved (or canonical_order is set on the encode side).
- step: decode_cell_values
source: assignments
output: assignment_codes
value: codes
code: code
delimiter: ","
passthrough_columns: [employee_id, project_id]
encode_cell_values parameters
| Parameter | Required | Description |
|---|---|---|
|
yes |
Name of the input frame with one row per code. |
|
yes |
Name of the output frame with encoded cell strings. |
|
yes |
Column name(s) to group rows before joining into a cell string. |
|
no |
Column containing the token to encode. Default: |
|
no |
Column name in |
|
no |
Encoding mode. Matches |
|
no |
Separator. Default: |
|
no |
Explicit ordered list of codes; determines output token order. |
|
no |
Validation whitelist. |
|
no |
Legend frame name for allowed-code derivation. |
|
no |
|
|
no |
Strip whitespace before encoding. |
|
no |
Override the step name shown in logs. |
- step: encode_cell_values
source: assignment_codes
output: assignments
group_by: [employee_id, project_id]
code: code
value: codes
delimiter: ","
expand_compact_multiaxis / contract_compact_multiaxis
Category: composite · Inverse: conditionally bijective
Compose expand_xref and decode_cell_values (or their inverses) to handle
compact multi-axis matrices where each cell encodes multiple coded values.
Parameters are a superset of the two constituent steps.
Full parameter documentation planned.
split_by_discriminator / merge_by_discriminator
Category: primitive · Inverse: conditionally bijective
Split one frame into multiple named frames partitioned by a discriminator column
value, recording frame-name → discriminator mapping in meta. merge_by_discriminator
reconstructs the original frame from the split parts.
Full parameter documentation planned.
sparse_collapse / sparse_expand
Category: primitive · Inverse: conditionally bijective
Replace configured default values in editable crosstable cells with a sparse
blank placeholder (sparse_collapse), or restore blanks to the configured
default (sparse_expand). Used to make sparse matrices more readable in
spreadsheet form.
Full parameter documentation planned.
flatten_headers / unflatten_headers
Category: primitive · Inverse: conditionally bijective
Flatten MultiIndex or tuple-like DataFrame column headers into scalar string
labels (flatten_headers), or split scalar string headers back into MultiIndex
form (unflatten_headers).
Full parameter documentation planned.
Supporting Steps
The following steps handle FK helpers, validation, configuration, and infrastructure concerns. Full parameter documentation is planned; this table gives the step name, category, inverse kind, and purpose for quick reference.
FK Helpers and Lookup
FK-helper primitives are policy consumers: they read the v2 relation policy
under meta.helper_policies.fk (and derived helper provenance under
_meta.derived.sheets.*.helper_columns). Either configure_fk_helpers
(explicit / manual policy) or infer_fk_relations (heuristic policy) must
run before add_fk_helpers, reorder_fk_helpers, validate_fk_helpers, and
remove_fk_helpers. Missing policy is a clear error that names the producer
step; primitives no longer infer relations from column names. The cell-level
id(<target>) convention is the wire format on roundtrip and the default
heuristic of infer_fk_relations; it is no longer interpreted by primitives.
| Step | Category | Inverse | Purpose |
|---|---|---|---|
|
configuration |
configuration-only |
Heuristic producer: scan data frames and write v2 FK relation policy under
|
|
configuration |
configuration-only |
Explicit / manual producer: take per-target policy from the YAML and write
the v2 relation list under |
|
primitive |
projection-only |
Materialize FK helper columns from the v2 relation policy. Requires that
|
|
primitive |
relation-preserving |
Move FK helper columns next to their FK column using derived helper
provenance and the v2 relation policy. Does not consult |
|
primitive |
validation-only |
Validate FK-helper consistency against the v2 relation policy and derived helper provenance (no convention inference). |
|
primitive |
projection-only |
Remove materialized FK helper columns using the derived provenance written
by |
|
primitive |
projection-only |
Join a source frame to a lookup frame and materialize helper columns. |
|
configuration |
configuration-only |
Resolve lookup-helper policy and persist it as canonical metadata. |
Validation
| Step | Category | Inverse | Purpose |
|---|---|---|---|
|
configuration |
validation-only |
Write validation constraints into workbook metadata for later rendering. |
|
primitive |
validation-only |
Validate PK, uniqueness, FK, reference-tuple integrity, and
|
|
primitive |
validation-only |
Validate node/edge networks for endpoint existence and duplicate edge identities. |
|
primitive |
validation-only |
Validate duplicate IDs and unresolved FK references with severity policy.
(transitional status — prefer |
Configuration
| Step | Category | Inverse | Purpose |
|---|---|---|---|
|
configuration |
configuration-only |
Declare which frames render as workbook sheets (names, order, lifecycle, options). |
|
primitive |
relation-preserving |
Re-key visible sheets to logical workbook-view frames on readback using persisted mappings. |
|
primitive |
projection-only |
Drop helper/derived columns using registered |
Infrastructure and Output
| Step | Category | Inverse | Purpose |
|---|---|---|---|
|
infrastructure |
— |
Merge profile defaults, persisted metadata, and runtime overrides into payload metadata. |
|
infrastructure |
— |
Load or apply workbook and sheet override metadata. |
|
primitive |
extraction |
Derive a named frame by selecting columns, filtering rows, renaming, adding constants. |
|
primitive |
projection-only |
Derive a display frame by pivoting long-form rows into configured index and value columns. |
|
primitive |
projection-only |
Derive a view frame by joining two frames with configured keys, mode, and projections. |
|
primitive |
projection-only |
Normalize long-form resource override tuples according to a default-context and empty-value policy. |
|
infrastructure |
extraction |
Write nested YAML files from tabular frames and record a generated-file report frame. |
|
infrastructure |
projection-only |
Write key-value ( |
|
infrastructure |
projection-only |
Merge writer report frames into a deterministic manifest; optionally compute checksums. |
|
infrastructure |
bijective |
Return the pipeline payload unchanged. Useful as a no-op placeholder. |
|
plugin |
— |
Run a caller-provided dotted callable as an explicit pipeline extension point. |