
License: MIT

Kubernetes knows what's running. Kollect makes it a record. Declare what matters in a few CRs and get a durable, always-current inventory wherever your platform needs it — a Git history you can diff, a database your portal can query, an event stream your automation can react to. Start with one Git repo; grow to multi-tenant fan-out across teams without rebuilding anything.

To record the hero demo GIF locally, follow DEMO-GIF-GUIDE.md.

Git-simple to start · platform-grade to grow

kollect.dev/v1alpha1 · event-driven · CRD-native · fleet-ready

Quick start · CR reference

What Kollect does

Kubernetes is the source of truth for what is running; it is a poor system of record for stakeholder inventory. Kollect maintains a read model — live state captured once, then served from export data:

Scope and Target select resources by GVK and namespace; Profile extracts the attributes that matter (CEL or JSONPath); Inventory rolls up matching objects, debounces churn, and exports snapshots to pluggable sinks (Git, object stores, databases, event streams). Every backend sees the same aggregated rows; sinks are interchangeable projections.

Inventory is configuration, not code — owned per team in its own namespace.

Pre-beta

APIs and defaults may change until the first release candidate. See the roadmap for current status.

Why Kollect?

Event-driven

Shared informers per GVK — inventory stays current without polling loops (ADR-0301).

CRD-native

Declare profiles, sinks, targets, and inventory in Kubernetes; GitOps-friendly from day one.

Multi-tenant

KollectScope gates which teams and namespaces can export to which sinks.

Fleet-ready

Each cluster runs mode: single and exports to shared sinks with a cluster label (ADR-0501).

How it works

The operator pipeline runs left to right: the Kubernetes API feeds shared per-GVK informers and an in-memory collect store; KollectInventory debounces changes and fans out to sink projections for Git, GitLab, S3, GCS, Postgres, MongoDB, and Kafka.

The in-memory snapshot per inventory is canonical; every sink is a projection of it, and no single backend is privileged. Sink roles (snapshot store, relational store, event emitter) are documented in ADR-0401; reconciliation details are in Architecture and Data flows.

Supported & planned sinks

| Family CRD | spec.type | Status |
|---|---|---|
| KollectSnapshotSink | git, gitlab, s3 | Core (production-ready) |
| KollectSnapshotSink | gcs | Beta (shipped, maturing) |
| KollectDatabaseSink | postgres | Core |
| KollectDatabaseSink | mongodb, bigquery | Beta (bigquery: v0.7.x hardening) |
| KollectEventSink | kafka, nats | Beta (nats: v0.7.x hardening) |
| KollectSnapshotSink | azureblob | Planned |
| Object-store sinks | Parquet layout | Planned (on S3/GCS) |

Release timing and deferred backends: Roadmap — Supported & planned sinks.

The resource model

A pipeline is just a handful of Kubernetes resources: config you declare (KollectProfile, KollectScope, and the family sinks KollectSnapshotSink, KollectDatabaseSink, and KollectEventSink) and objects the operator reconciles (KollectTarget, KollectInventory). Cluster-scoped KollectCluster* variants add cross-namespace rollup.

```mermaid
flowchart LR
  K8s(["Kubernetes API"]):::api

  subgraph declare["You declare — static config"]
    direction TB
    Profile["<b>KollectProfile</b><br/>what to extract"]
    Scope["<b>KollectScope</b><br/>guardrails"]
    Snap["<b>KollectSnapshotSink</b><br/>snapshot store"]
    Db["<b>KollectDatabaseSink</b><br/>relational SoR"]
    Ev["<b>KollectEventSink</b><br/>event emitter"]
  end

  subgraph run["Operator reconciles"]
    direction TB
    Target["<b>KollectTarget</b><br/>what to watch"]
    Inv["<b>KollectInventory</b><br/>aggregate · debounce · export"]
  end

  subgraph out["Sink projections — choose any"]
    direction TB
    SnapOut["Git · GitLab · S3 · GCS<br/><i>snapshot store</i>"]
    Rel["Postgres · MongoDB<br/><i>relational SoR</i>"]
    EvtOut["Kafka<br/><i>event emitter</i>"]
  end

  K8s -- "informer per GVK" --> Target
  Profile --> Target
  Target --> Inv
  Scope -. gates .-> Target
  Scope -. gates .-> Inv
  Inv --> Snap
  Inv --> Db
  Inv --> Ev
  Snap --> SnapOut
  Db --> Rel
  Ev --> EvtOut

  classDef api fill:#1F2937,stroke:#6B7280,color:#fff;
  classDef config fill:#326CE5,stroke:#1b3a8c,color:#fff;
  classDef work fill:#18B6A3,stroke:#0e6f63,color:#fff;
  classDef proj fill:#7FB3FF,stroke:#326CE5,color:#081A4B;

  class Profile,Scope,Snap,Db,Ev config;
  class Target,Inv work;
  class SnapOut,Rel,EvtOut proj;
```

| Kind | You set | Role |
|---|---|---|
| KollectProfile | GVK + CEL / JSONPath attributes | What to extract from each object |
| KollectTarget | selectors + profileRef | What to watch and collect |
| KollectInventory | family sink refs + cadence | Aggregate, debounce, and export |
| KollectSnapshotSink | type + endpoint + secretRef | Snapshot store (Git, GitLab, S3, GCS) |
| KollectDatabaseSink | type + credentials | Relational SoR (Postgres, MongoDB) |
| KollectEventSink | type + brokers | Event emitter (Kafka) |
| KollectScope | allowed GVKs / namespaces / sinks | Guardrails for the team namespace |

Full fields: CR reference · model rationale: ADR-0201.

Performance

Kollect is built for large single clusters and multi-cluster fleets, with tested targets (ADR-0603): 10,000+ rows validated in nightly load tests, a 100,000-row design target per cluster, and fleet fan-in with no hub merge tier. Tuning knobs are catalogued in the performance guide.

Documentation map

| Section | Start here |
|---|---|
| Getting started | Quick start · Development setup · Examples |
| Core concepts | CRD model · CR reference · Multi-cluster fleet |
| Operator manual | Install & ops · Upgrading · Helm values |
| Performance & ops | Performance tuning · Scaling & fleet · Best practices · Troubleshooting |
| Background | Prerequisites & basics · Architecture (package graph) · Data flows |
| Reference | Custom resources · FAQ · ADRs · RFCs |
| Contributing | Roadmap · Planned features · ADR/RFC process · Release process |
