
ADR-0405: Export data contract and schema versioning

The serialized inventory shape every sink and consumer depends on: the Item row, its ordering, and how the contract is versioned.

Theme: 04 · Export & sinks · Status: Current (schema versioning: Exploring)

Context

Kollect's external value is the exported inventory payload. Portals, SQL queries, Git diffs, Kafka consumers, and the HTTP API all read this contract — it is the most stability-sensitive surface in the project, yet it had no ADR. The shape is implemented in internal/collect/store.go but its guarantees (field set, ordering, null handling, versioning) were never written down.

A data contract must be: explicit, stable-ordered (for diffable Git and golden tests — ADR-0103), bounded (no full payload in etcd status), and versioned so consumers can detect breaking changes.

Decision

Row shape (Item)

One collected resource = one Item (internal/collect/store.go):

{
  "targetNamespace": "team-a",
  "targetName": "deployments",
  "namespace": "team-a",
  "name": "api",
  "group": "apps",
  "version": "v1",
  "kind": "Deployment",
  "uid": "…",
  "attributes": { "image": "…", "images": ["…"] }
}
  • Identity fields (group/version/kind, namespace, name, uid) locate the source object; targetNamespace/targetName record which KollectTarget produced the row.
  • attributes is the profile-defined extraction result (map[string]any); JSONPath [*] yields a JSON array (ADR-0302).
  • group is omitempty, so it is dropped for core kinds (whose API group is empty); all other identity fields are always present.

Aggregated payload

  • Default export = a JSON array of Item for the inventory's scope (MarshalNamespaceJSON).
  • HTTP = NamespaceSummary { namespace, itemCount, items }.
  • Sink projections derive from this canonical snapshot (ADR-0401): Postgres rows keyed by (inventory_namespace, inventory_name, target_name, source_uid) + cluster; Kafka keyed by {cluster}:{ns}/{name}; Git/object-store as the whole JSON document.
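The sink keys listed above can be sketched in Go as follows. `pgRowKey`, `kafkaKey`, and `upsert` are hypothetical names, and the in-memory map only stands in for a Postgres table. One assumption is loud here: the ADR does not say whether `{ns}/{name}` in the Kafka key refers to the inventory or the source object, and this sketch assumes the inventory. The design point it illustrates is that cluster is part of the Postgres key, so the same source UID reported by two clusters lands in two rows, while re-exporting the same snapshot replaces rows rather than duplicating them.

```go
package main

import "fmt"

// pgRowKey is a hypothetical Go view of the Postgres row key:
// (inventory_namespace, inventory_name, target_name, source_uid) plus cluster.
type pgRowKey struct {
	Cluster            string
	InventoryNamespace string
	InventoryName      string
	TargetName         string
	SourceUID          string
}

// kafkaKey formats the documented Kafka record key {cluster}:{ns}/{name}.
// Assumption: ns/name are the inventory's namespace and name.
func kafkaKey(cluster, ns, name string) string {
	return fmt.Sprintf("%s:%s/%s", cluster, ns, name)
}

// upsert keeps one row per key, so re-exporting the same snapshot replaces
// rows instead of duplicating them (in-memory stand-in for a table).
func upsert(table map[pgRowKey]map[string]any, k pgRowKey, attrs map[string]any) {
	table[k] = attrs
}

func main() {
	table := map[pgRowKey]map[string]any{}
	k := pgRowKey{"prod-eu", "team-a", "inventory", "deployments", "uid-1"}
	upsert(table, k, map[string]any{"image": "api:1.0"})
	upsert(table, k, map[string]any{"image": "api:1.1"}) // same key: row replaced
	// The same source UID seen from another cluster is a distinct row.
	upsert(table, pgRowKey{"prod-us", "team-a", "inventory", "deployments", "uid-1"}, nil)
	fmt.Println(len(table), kafkaKey("prod-eu", "team-a", "inventory")) // 2 prod-eu:team-a/inventory
}
```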

Export metadata

Carried alongside the payload (status + sink columns/headers), not inside each row: schemaVersion (envelope contract version), checksum (SHA-256 of payload — aggregate.ContentHash), source generation, itemCount, exportedAt, and cluster. These drive debounce/coalesce (ADR-0305) and staleness detection without bloating rows.
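A minimal sketch of how the checksum can drive coalescing, assuming the checksum is a hex-encoded SHA-256 of the exact serialized payload bytes. `ExportMetadata`, `contentHash`, and `changed` are hypothetical names; `contentHash` only stands in for aggregate.ContentHash, whose real signature and encoding may differ. Using "kollect.dev/v1alpha1" as the schemaVersion value is also an assumption based on the contract value below.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// ExportMetadata is a hypothetical shape for the per-export fields carried
// beside the payload (not inside each row).
type ExportMetadata struct {
	SchemaVersion    string
	Checksum         string // hex SHA-256 of the serialized payload
	SourceGeneration int64
	ItemCount        int
	ExportedAt       time.Time
	Cluster          string
}

// contentHash stands in for aggregate.ContentHash: SHA-256 over the exact
// payload bytes, hex-encoded. The real helper may differ.
func contentHash(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

// changed sketches the coalescing check: a payload whose checksum matches the
// last recorded one carries no new content, so the export can be skipped.
func changed(last ExportMetadata, payload []byte) bool {
	return last.Checksum != contentHash(payload)
}

func main() {
	payload := []byte(`[{"name":"api"}]`)
	meta := ExportMetadata{
		SchemaVersion:    "kollect.dev/v1alpha1",
		Checksum:         contentHash(payload),
		SourceGeneration: 7,
		ItemCount:        1,
		ExportedAt:       time.Now().UTC(),
		Cluster:          "prod-eu",
	}
	fmt.Println(changed(meta, payload))                    // false: identical bytes
	fmt.Println(changed(meta, []byte(`[{"name":"web"}]`))) // true
}
```

Because the hash is taken over bytes, this check is only meaningful if serialization is deterministic, which is what stability rule 1 guarantees.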

Stability rules (binding)

  1. Deterministic ordering — stable key order on serialization, so Git diffs and golden tests are reproducible.
  2. Additive evolution preferred — new attributes/fields are additive; removals/renames are breaking and gated by the API versioning policy (ADR-0206).
  3. No secrets, ever — redaction happens before export (ADR-0303, ADR-0104).
  4. Bounded size — spill over maxExportBytes to object store; never to etcd (ADR-0103).
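Rules 1 and 4 can be sketched in Go as follows. Go's encoding/json already emits struct fields in declaration order and sorts map keys, so attribute key order is stable for free. Row order within the array still has to be fixed explicitly, and this sketch assumes a (namespace, name, kind, uid) sort tuple that the ADR does not specify. `marshalCanonical` and `mustSpill` are hypothetical helpers, and the Item is trimmed to the fields the sort uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Item is trimmed to the fields this sketch sorts on.
type Item struct {
	Namespace  string         `json:"namespace"`
	Name       string         `json:"name"`
	Kind       string         `json:"kind"`
	UID        string         `json:"uid"`
	Attributes map[string]any `json:"attributes"`
}

// marshalCanonical serializes a sorted copy of items, leaving the input
// untouched. Map keys inside attributes are sorted by encoding/json itself.
func marshalCanonical(items []Item) ([]byte, error) {
	sorted := append([]Item(nil), items...)
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		if a.Name != b.Name {
			return a.Name < b.Name
		}
		if a.Kind != b.Kind {
			return a.Kind < b.Kind
		}
		return a.UID < b.UID
	})
	return json.Marshal(sorted)
}

// mustSpill reports whether a payload exceeds maxExportBytes and must be
// written to the object store (never to etcd).
func mustSpill(payload []byte, maxExportBytes int) bool {
	return len(payload) > maxExportBytes
}

func main() {
	a := []Item{{Namespace: "b", Name: "x", UID: "2"}, {Namespace: "a", Name: "y", UID: "1"}}
	b := []Item{a[1], a[0]}
	pa, _ := marshalCanonical(a)
	pb, _ := marshalCanonical(b)
	fmt.Println(string(pa) == string(pb), mustSpill(pa, 1<<20)) // true false
}
```

The same input set in any order yields byte-identical output, which is what lets Git diffs and the checksum stay quiet when nothing changed.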

Consequences

  • Consumers have one documented schema across all sinks.
  • Golden/contract tests can assert the shape; breaking it fails CI.
  • Consumers can branch on schemaVersion without coupling to CRD API versions.

Implementation status (schemaVersion milestone)

Export path | schemaVersion envelope | Status
--- | --- | ---
Kafka EventEnvelope | Yes (internal/sink/kafka/backend.go) | Shipped
Inventory / cluster inventory sink export | No: bare []Item JSON array (MarshalNamespaceJSON) | Pre-beta gap
Git / Postgres / S3 / GCS object payloads | No: canonical array only | Pre-beta gap
Read API HTTP responses | No: NamespaceSummary without envelope | Pre-beta gap; v0.5 gate per ADR-0411

Contract value: kollect.dev/v1alpha1 (ADR-0206).

Pre-beta milestone: wrap all sink exports and Read API responses in a versioned envelope (schemaVersion, items, metadata) so consumers decouple from CRD API versions (ADR-0206). Until then, schema versioning remains Exploring.
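During the migration, a consumer may receive either shape. The sketch below decodes both a bare []Item array and an envelope, then branches on schemaVersion. The envelope field layout is not final, so `Envelope`, `decodeExport`, and the `supported` set are hypothetical; treating "kollect.dev/v1alpha1" as the envelope's schemaVersion value is also an assumption based on the contract value above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope is a hypothetical shape for the planned versioned wrapper
// (schemaVersion, items, metadata); field details are not final.
type Envelope struct {
	SchemaVersion string            `json:"schemaVersion"`
	Items         []json.RawMessage `json:"items"`
	Metadata      map[string]any    `json:"metadata,omitempty"`
}

// supported lists schema versions this consumer understands.
var supported = map[string]bool{"kollect.dev/v1alpha1": true}

// decodeExport accepts both the current bare array and the planned envelope,
// so a consumer can ship before the milestone closes.
func decodeExport(data []byte) (Envelope, error) {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 {
		return Envelope{}, errors.New("empty payload")
	}
	if trimmed[0] == '[' {
		// Pre-envelope payload: unversioned, so SchemaVersion stays empty.
		var items []json.RawMessage
		if err := json.Unmarshal(trimmed, &items); err != nil {
			return Envelope{}, err
		}
		return Envelope{Items: items}, nil
	}
	var env Envelope
	if err := json.Unmarshal(trimmed, &env); err != nil {
		return Envelope{}, err
	}
	// Branch on schemaVersion instead of on CRD API versions.
	if !supported[env.SchemaVersion] {
		return Envelope{}, fmt.Errorf("unsupported schemaVersion %q", env.SchemaVersion)
	}
	return env, nil
}

func main() {
	legacy, _ := decodeExport([]byte(`[{"name":"api"}]`))
	wrapped, _ := decodeExport([]byte(`{"schemaVersion":"kollect.dev/v1alpha1","items":[{"name":"api"}]}`))
	fmt.Println(len(legacy.Items), legacy.SchemaVersion == "", wrapped.SchemaVersion)
}
```

Rejecting an unknown schemaVersion outright, rather than guessing, is what lets additive evolution stay safe while a breaking version bump fails loudly on old consumers.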

Open questions

  • PARTIAL: Explicit schemaVersion has shipped on Kafka event envelopes; inventory and state-sink JSON exports still emit bare arrays (milestone tracked above).
  • DECIDED: Attributes stay map[string]any in the contract; stronger typing is a sink-side concern. The Parquet sink promotes a hot-attribute allowlist to typed columns while keeping a JSON attributes column (ADR-0401).
  • PARTIAL: OpenAPI extensions (pagination, filters, envelope, exportStatus) are tracked in ADR-0411; publish a JSON Schema for Item alongside the OpenAPI spec when the envelope milestone closes.