ADR-0405: Export data contract and schema versioning
The serialized inventory shape every sink and consumer depends on: the
Item row, its ordering, and how the contract is versioned.
Theme: 04 · Export & sinks · Status: Current (schema versioning: Exploring)
Context
Kollect's external value is the exported inventory payload. Portals, SQL queries, Git diffs,
Kafka consumers, and the HTTP API all read this contract — it is the most stability-sensitive surface
in the project, yet it had no ADR. The shape is implemented in internal/collect/store.go but its
guarantees (field set, ordering, null handling, versioning) were never written down.
A data contract must be: explicit, stable-ordered (for diffable Git and golden tests — ADR-0103), bounded (no full payload in etcd status), and versioned so consumers can detect breaking changes.
Decision
Row shape (Item)
One collected resource = one Item (internal/collect/store.go):
    {
      "targetNamespace": "team-a",
      "targetName": "deployments",
      "namespace": "team-a",
      "name": "api",
      "group": "apps",
      "version": "v1",
      "kind": "Deployment",
      "uid": "…",
      "attributes": { "image": "…", "images": ["…"] }
    }
- Identity fields (group/version/kind, namespace, name, uid) locate the source object; targetNamespace/targetName record which KollectTarget produced the row.
- attributes is the profile-defined extraction result (map[string]any); JSONPath [*] yields a JSON array (ADR-0302).
- group is omitempty (core kinds); all other identity fields are always present.
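
To make the row shape concrete, here is a minimal Go sketch of a struct that would serialize to the JSON above. The Go field names and struct tags are inferred from the example and the bullets; they are assumptions, not a copy of the type in internal/collect/store.go, and the `encode` helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item mirrors the documented row shape. Field names and tags are inferred
// from the JSON example, not taken from internal/collect/store.go.
type Item struct {
	TargetNamespace string         `json:"targetNamespace"`
	TargetName      string         `json:"targetName"`
	Namespace       string         `json:"namespace"`
	Name            string         `json:"name"`
	Group           string         `json:"group,omitempty"` // omitted for core kinds
	Version         string         `json:"version"`
	Kind            string         `json:"kind"`
	UID             string         `json:"uid"`
	Attributes      map[string]any `json:"attributes"`
}

// encode (hypothetical helper) serializes one Item. encoding/json emits struct
// fields in declaration order and map keys in sorted order.
func encode(it Item) string {
	b, err := json.Marshal(it)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	core := Item{
		TargetNamespace: "team-a", TargetName: "configmaps",
		Namespace: "team-a", Name: "settings",
		Version: "v1", Kind: "ConfigMap", UID: "uid-1",
		Attributes: map[string]any{"keys": []string{"a", "b"}},
	}
	// Group is empty, so the "group" key does not appear in the output.
	fmt.Println(encode(core))
}
```

Because the identity fields other than group carry no omitempty tag in this sketch, they are always written, even when empty, which matches the "always present" rule.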
Aggregated payload
- Default export = a JSON array of Item for the inventory's scope (MarshalNamespaceJSON).
- HTTP = NamespaceSummary { namespace, itemCount, items }.
- Sink projections derive from this canonical snapshot (ADR-0401): Postgres rows keyed by (inventory_namespace, inventory_name, target_name, source_uid) + cluster; Kafka keyed by {cluster}:{ns}/{name}; Git/object-store as the whole JSON document.
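
The HTTP summary and the Kafka key format can be sketched in Go as follows. This is an illustration only: the struct tags follow the field names listed above, `summarize` and `kafkaKey` are hypothetical helpers, and which namespace and name feed `{ns}/{name}` is not spelled out here, so the parameters are left generic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item is trimmed to two fields for this sketch; the full row is described
// under "Row shape (Item)".
type Item struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// NamespaceSummary follows the documented HTTP shape
// { namespace, itemCount, items }. The Go types are assumptions.
type NamespaceSummary struct {
	Namespace string `json:"namespace"`
	ItemCount int    `json:"itemCount"`
	Items     []Item `json:"items"`
}

// summarize (hypothetical) wraps a slice of items for one namespace and
// derives itemCount from the slice length.
func summarize(namespace string, items []Item) NamespaceSummary {
	return NamespaceSummary{Namespace: namespace, ItemCount: len(items), Items: items}
}

// kafkaKey (hypothetical) renders the documented key format {cluster}:{ns}/{name}.
func kafkaKey(cluster, ns, name string) string {
	return fmt.Sprintf("%s:%s/%s", cluster, ns, name)
}

func main() {
	items := []Item{{Namespace: "team-a", Name: "api"}}
	body, err := json.Marshal(summarize("team-a", items))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println(kafkaKey("prod", "team-a", "api"))
}
```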
Export metadata
Carried alongside the payload (status + sink columns/headers), not inside each row:
schemaVersion (envelope contract version), checksum (SHA-256 of payload — aggregate.ContentHash),
source generation, itemCount, exportedAt, and cluster. These drive debounce/coalesce
(ADR-0305) and staleness detection without bloating rows.
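
As a rough sketch of how this metadata could be modeled and how a checksum comparison might drive debouncing: the struct below uses the field names listed above, but the JSON tags, the Go types, the hex encoding of the digest, and the `contentHash` and `changed` helpers are all assumptions. The real aggregate.ContentHash signature is not shown in this ADR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// ExportMetadata travels next to the payload, not inside each row.
// Tags and types are assumptions based on the field list above.
type ExportMetadata struct {
	SchemaVersion string    `json:"schemaVersion"`
	Checksum      string    `json:"checksum"`
	Generation    int64     `json:"generation"`
	ItemCount     int       `json:"itemCount"`
	ExportedAt    time.Time `json:"exportedAt"`
	Cluster       string    `json:"cluster"`
}

// contentHash (hypothetical stand-in for aggregate.ContentHash) returns the
// hex-encoded SHA-256 digest of the serialized payload.
func contentHash(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

// changed reports whether the payload differs from the last export, which is
// one way a sink could skip writes for identical content.
func changed(prev ExportMetadata, payload []byte) bool {
	return prev.Checksum != contentHash(payload)
}

func main() {
	payload := []byte(`[{"name":"api"}]`)
	meta := ExportMetadata{
		SchemaVersion: "kollect.dev/v1alpha1",
		Checksum:      contentHash(payload),
		Generation:    7,
		ItemCount:     1,
		ExportedAt:    time.Now().UTC(),
		Cluster:       "prod",
	}
	fmt.Println(changed(meta, payload))                       // false: same bytes
	fmt.Println(changed(meta, []byte(`[{"name":"worker"}]`))) // true: content differs
}
```

A checksum comparison like this only works if serialization is deterministic, which is why the ordering rule below is binding.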
Stability rules (binding)
- Deterministic ordering: serialization uses a stable key order so Git diffs and golden tests are reproducible.
- Additive evolution preferred: new attributes/fields are additive; removals/renames are breaking and gated by the API versioning policy (ADR-0206).
- No secrets, ever: redaction happens before export (ADR-0303, ADR-0104).
- Bounded size: payloads over maxExportBytes spill to the object store, never to etcd (ADR-0103).
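
The deterministic-ordering rule can be illustrated with a small Go sketch. It relies on two documented behaviors of encoding/json (struct fields in declaration order, map keys sorted) and adds an explicit sort of the rows. The sort key used here (namespace, kind, name) and the `marshalStable` helper are assumptions chosen for illustration; this ADR does not state the actual row order.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Item is trimmed for this sketch; see "Row shape (Item)" for the full row.
type Item struct {
	Namespace  string         `json:"namespace"`
	Name       string         `json:"name"`
	Kind       string         `json:"kind"`
	Attributes map[string]any `json:"attributes"`
}

// sortKey is an assumed ordering key, not the project's documented one.
func sortKey(it Item) string {
	return it.Namespace + "/" + it.Kind + "/" + it.Name
}

// marshalStable (hypothetical) sorts a copy of the rows and serializes them,
// so the same set of items yields the same bytes regardless of input order.
func marshalStable(items []Item) ([]byte, error) {
	sorted := make([]Item, len(items))
	copy(sorted, items) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sortKey(sorted[i]) < sortKey(sorted[j])
	})
	return json.Marshal(sorted)
}

func main() {
	web := Item{Namespace: "team-a", Name: "web", Kind: "Deployment",
		Attributes: map[string]any{"b": 1, "a": 2}}
	api := Item{Namespace: "team-a", Name: "api", Kind: "Deployment"}

	first, _ := marshalStable([]Item{web, api})
	second, _ := marshalStable([]Item{api, web})
	fmt.Println(string(first) == string(second)) // true
}
```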
Consequences
- Consumers have one documented schema across all sinks.
- Golden/contract tests can assert the shape; breaking it fails CI.
- Consumers can branch on schemaVersion without coupling to CRD API versions.
Implementation status (schemaVersion milestone)
| Export path | schemaVersion envelope | Status |
|---|---|---|
| Kafka EventEnvelope | Yes (internal/sink/kafka/backend.go) | Shipped |
| Inventory / cluster inventory sink export | No: bare []Item JSON array (MarshalNamespaceJSON) | Pre-beta gap |
| Git / Postgres / S3 / GCS object payloads | No: canonical array only | Pre-beta gap |
| Read API HTTP responses | No: NamespaceSummary without envelope | Pre-beta gap; v0.5 gate per ADR-0411 |
Contract value: kollect.dev/v1alpha1 (ADR-0206).
Pre-beta milestone: wrap all sink exports and Read API responses in a versioned envelope
(schemaVersion, items, metadata) so consumers decouple from CRD API versions
(ADR-0206). Until then, schema versioning remains Exploring.
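
Until the milestone closes, a consumer may see either a bare array or an envelope. The Go sketch below shows one way a consumer could tolerate both and branch on schemaVersion. The envelope fields come from the milestone description above; the Go types, the `decode` helper, and the choice to reject unknown versions are assumptions, not shipped behavior.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Envelope is a guess at the planned wrapper (schemaVersion, items, metadata).
// Items stay raw so this sketch does not depend on the Item definition.
type Envelope struct {
	SchemaVersion string            `json:"schemaVersion"`
	Items         []json.RawMessage `json:"items"`
	Metadata      map[string]any    `json:"metadata,omitempty"`
}

// supportedVersion is the contract value named in this ADR.
const supportedVersion = "kollect.dev/v1alpha1"

// decode (hypothetical) accepts a bare JSON array (pre-envelope exports) or a
// versioned envelope, and rejects envelopes with an unknown schemaVersion.
func decode(payload []byte) (version string, items []json.RawMessage, err error) {
	trimmed := bytes.TrimSpace(payload)
	if len(trimmed) > 0 && trimmed[0] == '[' {
		// Bare array: no version information is available.
		err = json.Unmarshal(trimmed, &items)
		return "", items, err
	}
	var env Envelope
	if err = json.Unmarshal(trimmed, &env); err != nil {
		return "", nil, err
	}
	if env.SchemaVersion != supportedVersion {
		return env.SchemaVersion, nil, fmt.Errorf("unsupported schemaVersion %q", env.SchemaVersion)
	}
	return env.SchemaVersion, env.Items, nil
}

func main() {
	bare := []byte(`[{"name":"api"}]`)
	wrapped := []byte(`{"schemaVersion":"kollect.dev/v1alpha1","items":[{"name":"api"}]}`)

	for _, p := range [][]byte{bare, wrapped} {
		v, items, err := decode(p)
		fmt.Printf("version=%q items=%d err=%v\n", v, len(items), err)
	}
}
```

Sniffing the first byte is a transitional convenience; once every export path carries the envelope, a consumer could drop the bare-array branch.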
Open questions
- PARTIAL: Explicit schemaVersion on Kafka event envelopes; inventory and state-sink JSON exports still emit bare arrays (milestone tracked above).
- DECIDED: Attributes stay map[string]any in the contract; stronger typing is a sink-side concern. The Parquet sink promotes a hot-attribute allowlist to typed columns while keeping a JSON attributes column (ADR-0401).
- PARTIAL: OpenAPI extensions (pagination, filters, envelope, exportStatus) tracked in ADR-0411; publish JSON Schema for Item alongside OpenAPI when the envelope milestone closes.