Platform decisions — architecture summary¶
This page summarizes locked platform decisions from the architecture review of 2026-06-05. It is the concise reference for operators, contributors, and architects evaluating or building Kollect. For the full decision log and reasoning, see ADR-0201: CRD model.
Pre-beta API
The public API is v1alpha1 and may change without notice while the project is pre-beta.
Breaking changes are batched deliberately before a future v1beta1 freeze (planned around the
v0.10 presentation gate). Do not build production integrations on field names or behavior
not yet covered by contract tests.
Phases, releases, and API stability¶
Phases describe build order, not release milestones. Work proceeds in the sequence below; individual features may land in parallel, but Kollect is not presented as a general-availability product until the tree reaches beta-quality overall.
The public API is v1alpha1 and may change without notice while the project is pre-beta.
There are no external adopters yet — tenancy, scalability, and correctness take priority over
backward compatibility. Breaking changes are batched deliberately before a future v1beta1 freeze
(planned around the v0.10 presentation gate — see RELEASE.md).
Core decisions¶
Non-negotiables¶
Same-namespace references
profileRef, sinkRefs, and KollectConnectionTest.spec.sinkRef resolve objects in the
same namespace as the referring CR. Cluster reconciled kinds reference namespaced config by
explicit name + namespace (sink refs default to sinkNamespace) (ADR-0208).
- Namespaced default: Profile, Sink, Target, Inventory, Scope in team namespace.
- Cluster reconciled kinds:
KollectClusterTarget(platform cross-namespace collection),KollectClusterInventory,KollectClusterScope. They reference namespacedKollectProfile/ family sinks byname+namespace— no cluster-scoped static config kinds (ADR-0208). - Default install: per-team Helm —
tenantMode: true,watchNamespaces: [team-ns]. - MVP: collect → aggregate → export to Postgres or Kafka for portals/scale; Git is the recommended sink for small single-cluster installs without a database or Kafka broker.
- HTTP inventory: optional debug (
featureGates.inventoryHttp.enabled: false); not MVP; portal read uses sink export or Read API (ADR-0408). KollectConnectionTestCR — implement (ADR-0403).- Shared informer per GVK — one cache per GVK, Targets filter in reconcile.
- Watch labels — support
AllandOptIn; platform central + per-resourcekollect.dev/watch: disabled. - Single operator mode —
mode: single(cluster collect + export); no hub/spoke runtime modes. - No doc-sync in operator (ADR-0702).
Build order¶
- Namespaced Sink + Profile (batch breaking change)
- MVP export path — Deployment profile → Postgres (or Kafka) sink
KollectScopereconciler enforcementKollectConnectionTestCR + keep Sink annotation/spec probes- Export debouncing — per sink ref via
exportMinIntervalprecedence (ADR-0413); inventory default 30s - Argo
Applicationsample + contract test (contract test first; then samples) KollectClusterTarget— API + webhook only until namespaced MVP proven; controller later- Secondary watches — Profile → Targets, family Sink → Inventories (beta requirement)
- Generic CRD sample —
cert-manager.io/Certificate+ contract test - GitLab sink — Phase 2 (custom CA via
tls.caSecretRef; first enterprise presentation path)
Sink taxonomy¶
Sink/transport reframe — ADR-0401.
| Topic | Decision |
|---|---|
| Sink taxonomy | Role-based: snapshot store (Git/Parquet/HTTP) · relational SoR (Postgres) · event emitter (NATS/Kafka) |
| Canonical artifact | In-memory snapshot per Inventory; all sinks are projections |
| Queryable, no DB | Add S3/GCS Parquet sink queryable by DuckDB/Athena; deletes free; 3 store tiers (Git + Parquet + Postgres) |
| Postgres deletes | Add delete reconciliation (diff vs last export) — upsert-only is a bug |
| Event default | NATS JetStream lean default; Kafka/Redpanda enterprise opt-in (one Kafka-API driver) |
| Multi-cluster default | Direct shared-sink fan-in (spec.cluster) per ADR-0501 |
Collection, cluster targets, and enterprise sinks¶
| Topic | Decision |
|---|---|
| Secondary watches | Ship — KollectProfile change enqueues referring Targets; family sink change enqueues Inventories with matching snapshotSinkRefs, databaseSinkRefs, or eventSinkRefs |
| Generic CRD sample | cert-manager.io/Certificate — contract test first, then profile + target + walkthrough |
KollectClusterTarget controller |
Defer — API + webhook + sample only until namespaced e2e solid |
profileRef (cluster target) |
Resolves a namespaced KollectProfile by explicit name + namespace — required at admission, no implicit fallback (ADR-0208) |
namespaceSelector |
Required — webhook rejects empty/missing selector |
| Helm values profile | helm-release-values-redacted + operator scrubKeys[] at extraction |
| GitLab sink | Phase 2 — implement with tls.caSecretRef for internal CA; Git remains small-install default |
Export debouncing and scope¶
| Topic | Decision |
|---|---|
exportMinInterval unset |
CRD default 30s (kubebuilder:default); per-sink overrides on structured sinkRefs[] |
exportMinInterval bypass |
Immediate export on payload checksum change or metadata.generation bump |
| ConnectionTest re-run | Re-probe on any spec change (generation ≠ observedGeneration); TTL restarts after new run |
KollectScope enforcement |
Hard degrade — Degraded + no collect/export; see ADR-0203 example |
KollectClusterInventory |
One CR rolls up all KollectClusterTargets; optional targetRefs for subset / 1:1 |
| ADR micro-decisions (Postgres PK, HTTP paths, Kafka keys, …) | Confirmed — see Implementation defaults below |
Samples, connection tests, and cluster rollup¶
| Topic | Decision |
|---|---|
| Argo contract test | First — golden fixture locks JSONPath + history ordering |
| Argo samples | Profile argo-application-summary + Target team-argo-applications |
KollectConnectionTest GC |
spec.ttlSecondsAfterFinished — default 300s |
| Export debounce | Per Inventory — spec.exportMinInterval, default 30s |
| Cluster rollup | KollectClusterInventory + KollectClusterTarget (no namespaced inventoryRef hack) |
Implementation defaults¶
These defaults apply unless a cited ADR specifies otherwise. Phase 1 rows are expected to ship as written; Phase 2+ rows may require a spike before implementation. Update the cited ADR when the corresponding code merges.
HTTP API and auth (ADR-0103, ADR-0404)¶
| Topic | Default | Phase |
|---|---|---|
| Inventory HTTP path | GET /v1alpha1/inventory; optional GET /v1alpha1/inventory/{namespace}/{name} |
When HTTP enabled (debug; default off) |
| OpenAPI | openapi/v1alpha1/inventory.yaml beside handler |
1 |
| Inventory read SAR | get on kollectinventories in caller namespace; list for index |
1 |
| GitLab sink | type: gitlab backend + custom CA TLS; Phase 2 after Git path proven |
2 |
| Secondary watches | Profile → Targets; family Sink → Inventories | 1 (beta) |
| TokenReview/SAR cache | 30s TTL in-memory per (token hash, verb, resource); flag + disabled for dev |
1 |
maxExportBytes |
Global manager default (~1.5 MiB) + optional KollectInventory.spec.maxExportBytes; webhook rejects override > global cap |
1 |
Sinks and export (ADR-0402, ADR-0602)¶
| Topic | Default | Phase |
|---|---|---|
| SQLite sink | Skip — Postgres testcontainers sufficient | — |
| Postgres upsert PK | (cluster, inventory_namespace, inventory_name, target_name, source_uid) |
1 |
| Kafka message key | {cluster}:{inventory_ns}/{inventory_name} when fleet; else {inventory_ns}/{inventory_name} |
1 |
| Kafka value | JSON row batch + metadata (generation, checksum); at-least-once |
1 |
| Sink error metric | kollect_sink_errors_total{reason} — separate from reconcile counter |
1 |
| Export duration histogram | Default buckets: .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10 seconds |
1 |
| Export debounce | Per sink ref — ref override → sink default → inventory default 30s → scope floor | 1 |
| Connection test TTL | KollectConnectionTest.spec.ttlSecondsAfterFinished — default 300 |
1 |
Extraction and Helm (ADR-0302, ADR-0303)¶
| Topic | Default | Phase |
|---|---|---|
helm: decode |
Prefix helm:release.<field> on attribute path; decoder expands data.release gzip JSON |
2+ |
| Values redaction | Global operator scrubKeys[]; per-attribute redact: true later |
2 |
| JSONPath filter validation | Webhook warn only; reject unsupported filters | 1 warn / 2 reject |
| CEL prefix | Webhook requires cel: prefix on CEL expressions |
1 |
Fleet export (ADR-0501)¶
| Topic | Default | Phase |
|---|---|---|
| Topology | Shared sink fan-in — spec.cluster per installation |
1 |
| Git layout | pathTemplate: clusters/{cluster}/… on snapshot sinks |
1 |
| Delivery semantics | At-least-once export; idempotent merge on (cluster, ns, name, uid) |
1 |
CRD model (ADR-0201)¶
| Topic | Default | Phase |
|---|---|---|
caBundle vs caSecretRef |
Keep both — caSecretRef preferred; caBundle max 64 KiB (webhook) |
1 |
KollectClusterInventory |
One platform rollup CR — aggregates all KollectClusterTargets; optional targetRefs |
3+ |
Documentation status¶
- [x] CR reference guide — scaffold at CR-REFERENCE.md + crds/; per-kind failure-mode detail is tracked in CR-REFERENCE.md.
- [x] Architecture data flows — DATA-FLOWS.md (debounce + collection + scope + connection test).
- [x] JSONPath
[*]wildcard — all array elements; deployment sample updated (ADR-0302).
Completed follow-ups¶
- [x] Argo
Applicationcontract test —internal/collect/argo_application_contract_test.go - [x] Argo samples — profile + target under
config/samples/ - [x]
KollectConnectionTestTTL — API + reconciler GC (ttlSecondsAfterFinished, default 300s) - [x]
exportMinIntervalonKollectInventory— wired; per-sinksinkRefs[]+status.sinkExports[](ADR-0413); manager--export-debounceflag removed
Tenancy and CRD model¶
flowchart TD
subgraph teamNs [Team namespace]
Prof[KollectProfile]
Snap[KollectSnapshotSink]
Db[KollectDatabaseSink]
Ev[KollectEventSink]
Scope[KollectScope]
Tgt[KollectTarget]
Inv[KollectInventory]
ConnTest[KollectConnectionTest]
end
Prof --> Tgt
Scope -.-> Tgt
Scope -.-> Inv
Tgt --> Inv
Inv --> Snap
Inv --> Db
Inv --> Ev
ConnTest -.-> Snap
ConnTest -.-> Db
ConnTest -.-> Ev
| Kind | Scope | Notes |
|---|---|---|
KollectProfile |
Namespace | Same-ns profileRef on Target |
KollectSnapshotSink |
Namespace | Same-ns snapshotSinkRefs on Inventory |
KollectDatabaseSink |
Namespace | Same-ns databaseSinkRefs on Inventory |
KollectEventSink |
Namespace | Same-ns eventSinkRefs on Inventory |
KollectTarget |
Namespace | Default for tenantMode; same-ns profileRef |
KollectClusterTarget |
Cluster | Platform operator; namespaceSelector + namespaced profileRef (name + namespace) |
KollectInventory |
Namespace | Aggregates namespaced targets in namespace |
KollectScope |
Namespace | Webhook + reconciler enforcement |
KollectConnectionTest |
Namespace | One-shot / CI connectivity probes |
KollectClusterInventory |
Cluster | Rollup for cluster targets; namespaced family-sink refs by name + namespace |
KollectClusterScope |
Cluster | Platform policy ceiling |
Reserved CRDs — what they mean¶
Reserved kinds are design placeholders, not promises to ship soon:
| Kind | Intent | When |
|---|---|---|
KollectClusterTarget |
One cluster object collects across namespaces (platform operator) | Shipped — references a namespaced KollectProfile by name + namespace (ADR-0208) |
KollectClusterInventory |
Roll up all namespaces for platform portal | Shipped — namespaced family-sink refs (name + namespace, default sinkNamespace) |
KollectClusterScope |
Cluster-wide policy when namespaced Scope is too weak | Shipped — platform ceiling |
KollectReceiver |
Inbound webhook → trigger export (Flux Receiver) | No webhook trigger requirement yet |
KollectTargetSet |
Generate many Targets (ApplicationSet) | Manual Targets OK for MVP |
~~KollectPublication~~ |
Confluence/doc sync | Rejected — ADR-0702 |
Do not add controllers or document samples for reserved kinds unless an ADR promotes them.
Sinks and portals¶
Sinks are classified by role, not vendor (ADR-0401). The in-memory snapshot per Inventory is canonical; every sink is a projection.
| Role | Backends | When | Deletes |
|---|---|---|---|
| Snapshot store | Git (audit), S3/GCS Parquet (queryable via DuckDB, no DB server), HTTP (debug) | small/medium installs; audit; ad-hoc SQL without a database | free |
| Relational SoR | Postgres | rich portal queries/joins | needs delete reconciliation |
| Event emitter | NATS JetStream (lean default), Kafka/Redpanda (enterprise opt-in) | downstream integration / change feed | tombstone (consumer-owned) |
- Postgres vs Kafka are not twins — Postgres = queryable state; Kafka = event stream (consumer builds its view).
- S3/GCS Parquet is the recommended "queryable inventory without running a database" tier.
- GitLab (Phase 2) — enterprise Git host, internal CA via
tls.caSecretRef. - Fleet fan-in — each operator exports to shared sinks with
spec.cluster(ADR-0501).
Collection engine¶
- One shared informer per GVK (ADR-0301).
- Watch labels (ADR-0205): platform runs
watchMode: All; teams opt out withkollect.dev/watch: disabledon a resource or namespace annotation. - Export debouncing: in-memory store updates on every event; export to sink coalesced per
Inventory via
spec.exportMinInterval(default 30s) + checksum/generation bump for material changes. Not a global manager flag.
Where inventory lives¶
| Store | Content | Survives restart? |
|---|---|---|
| Collect engine / informer cache | Extracted attribute rows | Rebuilt on resync after restart |
KollectInventory.status |
Counts, conditions, last export SHA/ref | Yes (metadata only) |
| Postgres / Kafka sink | Full inventory payload | Yes — system of record |
| Inventory HTTP (optional) | RAM snapshot | No — debug only |
Never persist full payloads in etcd (ADR-0103).
Multi-cluster fleet¶
Topology = direct shared-sink fan-in (ADR-0501): each operator
exports to a shared Postgres/Git/event backend with spec.cluster set; the backend key/PK provides
the merge. See Multi-cluster fleet example.
Helm / chart version sample¶
- Primary:
argoproj.io/v1alpha1Application(not FluxHelmRelease) - Contract test required — validate status field paths
- Secondary: Flux sample may remain; plain Helm Secret deferred (
helm:decode)
Connection test¶
KollectConnectionTestCR — primary for audited/CI/composite probesspec.ttlSecondsAfterFinished— default 300s; delete CR after probe completes- Sink
connectionTest: falsein prod; annotation for quick sink-only retest - See ADR-0403
Still open¶
| Topic | ADR |
|---|---|
| Cross-cluster sink auth (mTLS, workload identity) | Deferred — sink-specific; no hub tier (ADR-0501) |