Home

Kubernetes knows what's running. Kollect makes it a record. Declare what matters in a few CRs and get a durable, always-current inventory wherever your platform needs it — a Git history you can diff, a database your portal can query, an event stream your automation can react to. Start with one Git repo; grow to multi-tenant fan-out across teams without rebuilding anything.


Git-simple to start · platform-grade to grow — kollect.dev/v1alpha1 · event-driven · CRD-native · fleet-ready

Quick start · CR reference

What Kollect does

Kubernetes is the source of truth for what is running; it is a poor system of record for stakeholder inventory. Kollect maintains a read model — live state captured once, then served from export data:

Scope and Target select resources by GVK and namespace; Profile extracts the attributes that matter (CEL or JSONPath); Inventory rolls up matching objects, debounces churn, and exports snapshots to pluggable sinks (Git, object stores, databases, event streams). Every backend sees the same aggregated rows; sinks are interchangeable projections.
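The select-then-extract steps above can be sketched in a few lines. This is only an illustration of the idea, not Kollect's implementation: the function names, the sample objects, and the dotted-path lookup (a simplified stand-in for CEL or JSONPath) are all assumptions made for the example.

```python
from typing import Any


def extract(obj: dict[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'spec.replicas'; return None if a key is missing.

    Hypothetical stand-in for a CEL or JSONPath expression.
    """
    current: Any = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_rows(objects, kind, namespace, attributes):
    """Select objects by kind and namespace, then extract one row per object."""
    rows = []
    for obj in objects:
        meta = obj.get("metadata", {})
        if obj.get("kind") != kind or meta.get("namespace") != namespace:
            continue  # not selected by this (hypothetical) target
        row = {"name": meta.get("name")}
        for column, path in attributes.items():
            row[column] = extract(obj, path)
        rows.append(row)
    # Sort so that repeated exports of the same state produce identical output.
    return sorted(rows, key=lambda r: r["name"] or "")


SAMPLE_OBJECTS = [
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "team-a"}, "spec": {"replicas": 3}},
    {"kind": "Deployment", "metadata": {"name": "api", "namespace": "team-a"}, "spec": {"replicas": 2}},
    {"kind": "Deployment", "metadata": {"name": "db", "namespace": "team-b"}, "spec": {"replicas": 1}},
    {"kind": "Service", "metadata": {"name": "web", "namespace": "team-a"}, "spec": {}},
]

rows = build_rows(
    SAMPLE_OBJECTS,
    kind="Deployment",
    namespace="team-a",
    attributes={"replicas": "spec.replicas", "owner": "metadata.labels.owner"},
)
```

Here `rows` holds two entries (`api` and `web`), with `owner` set to `None` because the sample objects carry no such label. Every sink would receive this same list of rows.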

Inventory is configuration, not code — owned per team in its own namespace.

Pre-beta

APIs and defaults may change until the first release candidate. See the roadmap for current status.

Why Kollect?

Event-driven

Shared informers per GVK — inventory stays current without polling loops (ADR-0301).
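To make "shared informers per GVK" concrete, here is a minimal sketch of the sharing idea: many consumers subscribe, but only one watch is opened per GVK, and each event is pushed to every subscriber. The class and method names are hypothetical, and the counter merely stands in for opening a real watch against the Kubernetes API.

```python
from collections import defaultdict
from typing import Callable

GVK = tuple[str, str, str]  # (group, version, kind)
Handler = Callable[[str, dict], None]


class SharedInformerRegistry:
    """Hypothetical registry: one watch per GVK, any number of subscribers."""

    def __init__(self) -> None:
        self._handlers: dict[GVK, list[Handler]] = defaultdict(list)
        self.watches_started = 0

    def subscribe(self, gvk: GVK, handler: Handler) -> None:
        if gvk not in self._handlers:
            # Stand-in for opening a single watch for this GVK.
            self.watches_started += 1
        self._handlers[gvk].append(handler)

    def dispatch(self, gvk: GVK, event_type: str, obj: dict) -> None:
        """Push one event to every subscriber of the GVK; nobody polls."""
        for handler in self._handlers.get(gvk, []):
            handler(event_type, obj)


registry = SharedInformerRegistry()
seen_by_team_a: list = []
seen_by_team_b: list = []
deployments: GVK = ("apps", "v1", "Deployment")

registry.subscribe(deployments, lambda kind, obj: seen_by_team_a.append((kind, obj["name"])))
registry.subscribe(deployments, lambda kind, obj: seen_by_team_b.append((kind, obj["name"])))
registry.subscribe(("", "v1", "Service"), lambda kind, obj: None)

registry.dispatch(deployments, "MODIFIED", {"name": "web"})
```

Two subscribers to Deployments share a single watch, so `registry.watches_started` is 2 (Deployments and Services), not 3.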

CRD-native

Declare profiles, sinks, targets, and inventory in Kubernetes; GitOps-friendly from day one.

Multi-tenant

KollectScope gates which teams and namespaces can export to which sinks.
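The gating rule can be pictured as a simple allow-list check. The sketch below assumes a scope carries allowed GVKs, namespaces, and sinks; the `Scope` class, its field names, and the sink names are hypothetical and do not reflect the actual KollectScope schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical in-memory view of a scope's allow-lists."""

    allowed_gvks: frozenset[str]
    allowed_namespaces: frozenset[str]
    allowed_sinks: frozenset[str]


def is_export_allowed(scope: Scope, gvk: str, namespace: str, sink: str) -> bool:
    """An export passes only if all three dimensions are allowed."""
    return (
        gvk in scope.allowed_gvks
        and namespace in scope.allowed_namespaces
        and sink in scope.allowed_sinks
    )


team_a_scope = Scope(
    allowed_gvks=frozenset({"apps/v1/Deployment"}),
    allowed_namespaces=frozenset({"team-a"}),
    allowed_sinks=frozenset({"team-a-git"}),
)

allowed = is_export_allowed(team_a_scope, "apps/v1/Deployment", "team-a", "team-a-git")
wrong_sink = is_export_allowed(team_a_scope, "apps/v1/Deployment", "team-a", "shared-postgres")
wrong_namespace = is_export_allowed(team_a_scope, "apps/v1/Deployment", "team-b", "team-a-git")
```

Only the first call returns `True`; a request that strays outside any one allow-list is rejected.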

Fleet-ready

Each cluster runs with `mode: single` and exports to shared sinks with a cluster label (ADR-0501).
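One way to picture why a cluster label lets several clusters share a sink: if the label is part of each row's key, rows from different clusters never collide, so nothing has to merge them centrally. The sketch below uses a plain dictionary as the shared sink; the function names, key layout, and cluster names are assumptions for illustration only.

```python
def label_rows(rows: list[dict], cluster: str) -> list[dict]:
    """Attach the cluster label to every row before export."""
    return [{**row, "cluster": cluster} for row in rows]


def export_to_shared_sink(shared_sink: dict, rows: list[dict]) -> None:
    """Upsert rows keyed by (cluster, namespace, name)."""
    for row in rows:
        key = (row["cluster"], row["namespace"], row["name"])
        shared_sink[key] = row


shared_sink: dict = {}

# Two clusters export a Deployment with the same namespace and name.
export_to_shared_sink(
    shared_sink,
    label_rows([{"namespace": "team-a", "name": "web", "replicas": 3}], "prod-eu"),
)
export_to_shared_sink(
    shared_sink,
    label_rows([{"namespace": "team-a", "name": "web", "replicas": 5}], "prod-us"),
)
```

Both rows survive side by side in `shared_sink`, and each cluster only ever overwrites its own keys.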

How it works

Diagram: a left-to-right operator pipeline running from the Kubernetes API through shared per-GVK informers and an in-memory collect store, then KollectInventory debounce, to fan-out sink projections for Git, GitLab, S3, GCS, Postgres, MongoDB, and Kafka.

The in-memory snapshot per inventory is canonical; every sink is a projection of it — no single backend is privileged. Sink roles (snapshot store, relational store, event emitter) are documented in ADR-0401; reconciliation detail in Architecture and Data flows.
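The debounce-then-project behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the operator's reconciliation logic: the `DebouncedExporter` class, the quiet-period rule, and the callable sinks are hypothetical, and the clock is passed in explicitly so the example stays deterministic.

```python
from typing import Callable

Sink = Callable[[list[dict]], None]


class DebouncedExporter:
    """Hypothetical exporter: hold a canonical snapshot, flush after a quiet period."""

    def __init__(self, quiet_seconds: float, sinks: list[Sink]) -> None:
        self.quiet_seconds = quiet_seconds
        self.sinks = sinks
        self.snapshot: dict[str, dict] = {}  # canonical in-memory state
        self._last_change: float | None = None

    def record(self, key: str, row: dict, now: float) -> None:
        """Apply a change to the snapshot and restart the quiet period."""
        self.snapshot[key] = row
        self._last_change = now

    def flush_if_quiet(self, now: float) -> bool:
        """Export once no change has arrived for quiet_seconds."""
        if self._last_change is None or now - self._last_change < self.quiet_seconds:
            return False
        rows = [self.snapshot[key] for key in sorted(self.snapshot)]
        for sink in self.sinks:
            sink(rows)  # every sink projects the same rows
        self._last_change = None
        return True


git_exports: list = []
db_exports: list = []
exporter = DebouncedExporter(5.0, [git_exports.append, db_exports.append])

exporter.record("web", {"name": "web", "replicas": 2}, now=0.0)
exporter.record("web", {"name": "web", "replicas": 3}, now=1.0)
exporter.record("api", {"name": "api", "replicas": 1}, now=2.0)

flushed_early = exporter.flush_if_quiet(now=3.0)
flushed_late = exporter.flush_if_quiet(now=7.0)
```

Three changes arriving within two seconds produce a single export, and both sinks receive identical rows because each is derived from the same snapshot.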

Supported & planned sinks

Family CRD	`spec.type`	Status
`KollectSnapshotSink`	`git`, `gitlab`, `s3`	Core — production-ready
`KollectSnapshotSink`	`gcs`	Beta — shipped, maturing
`KollectDatabaseSink`	`postgres`	Core
`KollectDatabaseSink`	`mongodb`, `bigquery`	Beta — `bigquery` v0.7.x hardening
`KollectEventSink`	`kafka`, `nats`	Beta — `nats` v0.7.x hardening
`KollectSnapshotSink`	`azureblob`	Planned
Object-store sinks	Parquet layout	Planned — on S3/GCS

Release timing and deferred backends: Roadmap — Supported & planned sinks.

The resource model

A pipeline is just a handful of Kubernetes resources: config you declare (KollectProfile, KollectScope, and the family sinks KollectSnapshotSink, KollectDatabaseSink, and KollectEventSink) and objects the operator reconciles (KollectTarget, KollectInventory). Cluster-scoped KollectCluster* variants add cross-namespace rollup.

flowchart LR
  K8s(["Kubernetes API"]):::api

  subgraph declare["You declare — static config"]
    direction TB
    Profile["<b>KollectProfile</b><br/>what to extract"]
    Scope["<b>KollectScope</b><br/>guardrails"]
    Snap["<b>KollectSnapshotSink</b><br/>snapshot store"]
    Db["<b>KollectDatabaseSink</b><br/>relational SoR"]
    Ev["<b>KollectEventSink</b><br/>event emitter"]
  end

  subgraph run["Operator reconciles"]
    direction TB
    Target["<b>KollectTarget</b><br/>what to watch"]
    Inv["<b>KollectInventory</b><br/>aggregate · debounce · export"]
  end

  subgraph out["Sink projections — choose any"]
    direction TB
    SnapOut["Git · GitLab · S3 · GCS<br/><i>snapshot store</i>"]
    Rel["Postgres · MongoDB<br/><i>relational SoR</i>"]
    EvtOut["Kafka<br/><i>event emitter</i>"]
  end

  K8s -- "informer per GVK" --> Target
  Profile --> Target
  Target --> Inv
  Scope -. gates .-> Target
  Scope -. gates .-> Inv
  Inv --> Snap
  Inv --> Db
  Inv --> Ev
  Snap --> SnapOut
  Db --> Rel
  Ev --> EvtOut

  classDef api fill:#1F2937,stroke:#6B7280,color:#fff;
  classDef config fill:#326CE5,stroke:#1b3a8c,color:#fff;
  classDef work fill:#18B6A3,stroke:#0e6f63,color:#fff;
  classDef proj fill:#7FB3FF,stroke:#326CE5,color:#081A4B;

  class Profile,Scope,Snap,Db,Ev config;
  class Target,Inv work;
  class SnapOut,Rel,EvtOut proj;

Kind	You set	Role
`KollectProfile`	GVK + CEL / JSONPath attributes	What to extract from each object
`KollectTarget`	selectors + `profileRef`	What to watch and collect
`KollectInventory`	family sink refs + cadence	Aggregate, debounce, and export
`KollectSnapshotSink`	type + endpoint + `secretRef`	Snapshot store (Git, GitLab, S3, GCS)
`KollectDatabaseSink`	type + credentials	Relational SoR (Postgres, MongoDB)
`KollectEventSink`	type + brokers	Event emitter (Kafka)
`KollectScope`	allowed GVKs / namespaces / sinks	Guardrails for the team namespace

Full fields: CR reference · model rationale: ADR-0201.

Performance

Kollect is built for large single clusters and multi-cluster fleets, with tested targets documented in ADR-0603: 10,000+ rows validated in nightly load tests, a 100,000-row design target per cluster, and fleet fan-in with no hub merge tier. Tuning knobs are catalogued in the performance guide.

Documentation map

Section	Start here
Getting started	Quick start · Development setup · Examples
Core concepts	CRD model · CR reference · Multi-cluster fleet
Operator manual	Install & ops · Upgrading · Helm values
Performance & ops	Performance tuning · Scaling & fleet · Best practices · Troubleshooting
Background	Prerequisites & basics · Architecture (package graph) · Data flows
Reference	Custom resources · FAQ · ADRs · RFCs
Contributing	Roadmap · Planned features · ADR/RFC process · Release process

Try an example

Deployment inventory → Git / Postgres / Kafka — the end-to-end walkthrough
Postgres state store (relational SoR)
NATS event sink
Helm release inventory (Argo primary; Flux secondary)
Live demo inventory exported to Git — see real output

Go deeper

Platform decisions — the locked design summary
Sink taxonomy: state vs stream — why no backend is privileged
Read-only UI console (frozen preview) — early adopter SPA; program frozen until v0.7.x+
Roadmap — build-order phases and current status