Best practices
Platform-oriented guidance for designing Kollect scope, sinks, and multi-cluster topology. For install defaults, see Operator manual.
Assumptions
Read "Understand the basics" and "Platform decisions" before changing production values. Sink role taxonomy: ADR-0401.
Pre-beta API
v1alpha1 fields may change until the first release candidate. Check ROADMAP
before fleet rollout.
Scope design
Per-team install (recommended default)
Install one operator per tenant boundary with namespaced RBAC and a restricted informer cache (ADR-0203, ADR-0703 (archived)):
tenantMode: true
watchNamespaces:
  - team-a
mode: single
featureGates:
  inventoryHttp:
    enabled: false
Place KollectProfile, KollectSink, KollectTarget, and KollectInventory in the team
namespace. Portal read paths should use Postgres or Kafka sink export — not the optional HTTP
inventory API in production.
Same-namespace references
Namespaced pipeline objects must reference peers in the same namespace:
| Field | Must resolve in |
|---|---|
| KollectTarget.spec.profileRef | Target namespace |
| KollectInventory.spec.sinkRefs | Inventory namespace |
| KollectConnectionTest.spec.sinkRef | Test namespace |
Cluster-wide rollup uses KollectClusterInventory with spec.sinkNamespace instead.
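As an illustration of the same-namespace rule, the Python sketch below resolves a bare reference name only within the referencing object's namespace. `ObjectKey` and `resolve_ref` are hypothetical names, not part of the Kollect API, and the operator's actual resolution logic may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectKey:
    """Identity of a namespaced object (hypothetical helper type)."""
    namespace: str
    name: str


def resolve_ref(owner_namespace: str, ref_name: str,
                existing: set[ObjectKey]) -> ObjectKey | None:
    """Look up ref_name in the owner's namespace only.

    A same-named object in another namespace is never returned,
    which mirrors the "must resolve in" column of the table above.
    """
    key = ObjectKey(owner_namespace, ref_name)
    return key if key in existing else None
```

With a sink named `pg` present only in `team-a`, a target in `team-a` resolves it, while a target in `team-b` gets `None` and should be treated as misconfigured.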
Watch labels and KollectScope
- Exclude platform namespaces with Target excludedNamespaces or Scope deniedNamespaces; neither requires Namespace patch RBAC (ADR-0207).
- Trivy HIGH-only collection: use resourceRules with a label match or a CEL matchPolicy; see examples/kollecttarget_trivy-high.yaml.
- Run platform targets with watchMode: All; let teams opt out noisy namespaces via kollect.dev/namespace-watch: disabled (ANNOTATIONS-LABELS.md).
- Use watchMode: OptIn only in shared clusters where most tenants should be ignored by default.
- Enforce policy with KollectScope: violations set Degraded=True and block export (hard degrade, not silent skip).
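The interaction between the two watch modes, the opt-out marker, and explicit exclusions can be sketched as a small decision function. This is a minimal Python illustration under stated assumptions: the function name is hypothetical, the marker is modeled as a plain key/value map without distinguishing labels from annotations, and the opt-in value `enabled` is assumed (the text above only documents `disabled`).

```python
from __future__ import annotations

WATCH_KEY = "kollect.dev/namespace-watch"


def is_watched(namespace: str, markers: dict[str, str], watch_mode: str,
               excluded: frozenset[str] = frozenset()) -> bool:
    """Decide whether a namespace is collected (illustrative only)."""
    # Explicit exclusions win regardless of watch mode.
    if namespace in excluded:
        return False
    value = markers.get(WATCH_KEY)
    if watch_mode == "All":
        # Everything is watched unless the team opted out.
        return value != "disabled"
    if watch_mode == "OptIn":
        # Nothing is watched unless opted in ("enabled" is an assumed value).
        return value == "enabled"
    raise ValueError(f"unknown watch mode: {watch_mode!r}")
```

Under `All`, an unmarked namespace is collected and one marked `disabled` is skipped; under `OptIn`, an unmarked namespace is skipped, which is why that mode suits shared clusters where most tenants should be ignored.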
Scope vs Helm watchNamespaces
Helm watchNamespaces limits what the operator informer cache sees. KollectScope limits
what a tenant may configure. Use both for defense in depth.
Sink roles (ADR-0401)
Classify sinks by role, not vendor. The in-memory snapshot per KollectInventory is the
canonical artifact; every sink is a projection of it.
| Role | Backends | Answers | Deletes |
|---|---|---|---|
| Snapshot store | Git, S3/GCS Parquet, HTTP | Current state, written whole each cycle | Free — absent from snapshot = deleted |
| Relational SoR | Postgres | Queryable current state, SQL joins for portals | Requires delete reconciliation |
| Event emitter | NATS JetStream (lean default), Kafka/Redpanda | Change stream for downstream integration | Tombstone (consumer-owned) |
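To make the Deletes column concrete, here is a minimal Python sketch of how a change stream could be derived by diffing two consecutive snapshots. The function name, the event tuple shape, and the key format are assumptions for illustration, not the operator's wire format.

```python
from __future__ import annotations

from typing import Any

Key = tuple[str, str, str]  # (namespace, kind, name), an assumed key shape


def diff_snapshots(previous: dict[Key, Any],
                   current: dict[Key, Any]) -> list[tuple[str, Key, Any]]:
    """Derive change events from two full snapshots.

    New or changed entries become "upsert" events; keys that vanished
    from the current snapshot become "tombstone" events with no payload.
    """
    events: list[tuple[str, Key, Any]] = []
    for key in sorted(current):
        if key not in previous or previous[key] != current[key]:
            events.append(("upsert", key, current[key]))
    for key in sorted(previous.keys() - current.keys()):
        events.append(("tombstone", key, None))
    return events
```

A snapshot store needs none of this, because writing the whole snapshot implicitly drops absent entries. An event emitter has to publish the tombstone explicitly and leave the actual removal to the consumer.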
Choosing a sink
| Need | Prefer |
|---|---|
| Audit trail, GitOps-friendly history | Git snapshot |
| Queryable inventory without running a DB | S3/GCS Parquet snapshot |
| Rich relational portal, BI joins | Postgres SoR |
| Event-driven integrations, fan-out | NATS or Kafka emitter |
Not a sink type
Prometheus metrics come from the operator /metrics endpoint only — not a KollectSink type
(ADR-0601).
Postgres and event emitters
- Postgres must delete (or tombstone) rows for resources no longer in the snapshot; an upsert-only sink accumulates stale rows (ADR-0401).
- Set spec.cluster on sinks in multi-cluster installs so the backend primary key merges rows across clusters.
- Tune per-sink exportMinInterval on structured sinkRefs[] before adding more backends; the default sample is portal Postgres at 30s plus Git audit at 1h (ADR-0413). See Performance tuning.
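The delete-reconciliation requirement can be sketched as an upsert followed by a cluster-scoped cleanup. The Python example below uses an in-memory SQLite database as a stand-in for Postgres; the table name, columns, and composite key `(cluster, namespace, kind, name)` are assumptions for illustration, not the schema Kollect actually writes.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical inventory table keyed by cluster plus object identity."""
    conn.execute(
        "CREATE TABLE inventory ("
        " cluster TEXT NOT NULL, namespace TEXT NOT NULL,"
        " kind TEXT NOT NULL, name TEXT NOT NULL, payload TEXT NOT NULL,"
        " PRIMARY KEY (cluster, namespace, kind, name))"
    )


def export_snapshot(conn: sqlite3.Connection, cluster: str, snapshot: dict) -> None:
    """Upsert one cluster's snapshot, then delete that cluster's stale rows.

    snapshot maps (namespace, kind, name) -> payload string.
    """
    with conn:  # one transaction: readers never see a half-applied export
        conn.executemany(
            "INSERT INTO inventory (cluster, namespace, kind, name, payload)"
            " VALUES (?, ?, ?, ?, ?)"
            " ON CONFLICT (cluster, namespace, kind, name)"
            " DO UPDATE SET payload = excluded.payload",
            [(cluster, ns, kind, name, payload)
             for (ns, kind, name), payload in snapshot.items()],
        )
        stored = conn.execute(
            "SELECT namespace, kind, name FROM inventory WHERE cluster = ?",
            (cluster,),
        ).fetchall()
        # Rows present in the table but absent from the snapshot are stale.
        stale = [(cluster, *key) for key in stored if tuple(key) not in snapshot]
        conn.executemany(
            "DELETE FROM inventory"
            " WHERE cluster = ? AND namespace = ? AND kind = ? AND name = ?",
            stale,
        )
```

Because the cleanup is filtered by cluster, one cluster's export never removes rows written by another, which is the reason a cluster identifier belongs in the key when several operators share one database.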
Connection testing
Keep spec.connectionTest: false in Git-managed manifests. Probe on demand with the
kollect.dev/test-connection annotation or a KollectConnectionTest CR
(Connection test example).
Multi-cluster fleet (shared sink)
Default multi-cluster path: each cluster runs the same single-mode operator and exports to a
shared sink (Postgres, Git, Kafka, NATS) with spec.cluster set (or {cluster} in
pathTemplate). The backend primary key merges rows across clusters — no aggregation tier
inside the operator (ADR-0501,
ADR-0401).
flowchart LR
  subgraph clusters [Per-cluster operators]
    S1[Kollect cluster A]
    S2[Kollect cluster B]
  end
  subgraph backend [Shared backend]
    PG[(Snapshot · Database · Event sinks)]
  end
  S1 -->|export + cluster id| PG
  S2 -->|export + cluster id| PG
Walkthrough: Multi-cluster fleet.
Operational checklist
| Area | Practice |
|---|---|
| Upgrades | Two-step: kubectl apply -f dist/install-crds.yaml, then helm upgrade — never delete CRDs |
| Secrets | Credentials in Kubernetes Secrets only; restrict operator SA Secret access |
| HTTP API | Keep featureGates.inventoryHttp.enabled: false unless debugging |
| Observability | Watch ConnectionVerified, SinkReachable, Synced; alert on Degraded=True |
| Performance | Restrict watchNamespaces, narrow target selectors, raise exportMinInterval under load |
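The effect of raising exportMinInterval can be pictured as a per-sink throttle: a sink is written only when its own minimum interval has elapsed since its last export. The Python sketch below uses the 30s Postgres and 1h Git sample intervals mentioned earlier; the sink names and the function are hypothetical, and the operator's real scheduling may differ.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Sample intervals from this page; the sink names are made up for illustration.
MIN_INTERVALS = {
    "portal-postgres": timedelta(seconds=30),
    "audit-git": timedelta(hours=1),
}


def sinks_due(now: datetime, last_export: dict[str, datetime],
              min_intervals: dict[str, timedelta]) -> list[str]:
    """Return the sinks whose minimum interval has elapsed.

    A sink that has never exported is always due.
    """
    due = []
    for sink, interval in min_intervals.items():
        previous = last_export.get(sink)
        if previous is None or now - previous >= interval:
            due.append(sink)
    return due
```

A longer interval on a slow or expensive backend reduces write load without delaying the faster sinks, since each sink is evaluated independently.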