ADR-0306: Full resource export with path pruning¶
Export the whole target object (minus noise) from a profile, using Argo CD–style path exclusions and the existing redaction stack — without hand-authoring every field.
Theme: 03 · Collection & extraction · Status: Accepted (Phase 1 shipped)
Context¶
KollectProfile today requires an explicit spec.attributes[] list: each row in export is a map of
named JSONPath/CEL extractions (ADR-0302). That model is ideal for
curated inventory (image tags, chart versions, ingress hosts) but painful for:
- Audit / drift snapshots — “give me the Deployment as YAML-ish JSON, minus controller noise.”
- Exploratory profiles — new GVKs where the author does not yet know which fields matter.
- GitOps debugging — compare live
spec+ selectivestatuswithout maintaining 40 paths.
Authors can approximate full export today with a single attribute path: "cel:object", but that
bypasses pruning, size governance, and admission gates. It also embeds identity/metadata the
export contract already carries on the Item envelope (ADR-0405).
Prior art
| Project | Pattern | Relevance |
|---|---|---|
| Argo CD | spec.ignoreDifferences[] with jsonPointers (RFC 6901) and jqPathExpressions |
Familiar “ignore this subtree” UX for GitOps users |
| kubectl | last-applied-configuration, managedFields noise in live objects |
Default exclusions candidates |
| Flux / ESO | Never persist secret bytes in status | Reinforces scrub-at-extraction (ADR-0104) |
| Kollect today | Explicit attributes + scrubKeys / Secret.data webhook guard (ADR-0303) |
Security must apply to full-object mode too |
The design question is not whether to support wide export — some teams need it — but how to make it
safe, bounded, and ergonomic without breaking the Item contract or the explicit-attribute default.
Decision¶
Add an optional spec.export block on KollectProfile / KollectClusterProfile. When
export.mode: Resource, the collector serializes a pruned copy of the informer object into export
instead of (or in addition to) hand-picked attributes.
CRD shape (proposed)¶
apiVersion: kollect.dev/v1alpha1
kind: KollectProfile
metadata:
name: deployment-snapshot
namespace: team-a
annotations:
# Required when targetGVK is Secret or export.mode is Resource on sensitive kinds
kollect.dev/allow-full-resource-export: "true"
spec:
targetGVK:
group: apps
version: v1
kind: Deployment
export:
# Attributes | Resource — default Attributes (current behaviour)
mode: Resource
# Attribute key for the embedded object in Item.attributes (default: resource)
as: resource
# Which top-level object sections to include (default: SpecAndStatus)
include: SpecAndStatus # MetadataOnly | SpecOnly | StatusOnly | SpecAndStatus | All
# Path-based pruning — Argo CD–compatible + Kollect JSONPath
prune:
# Apply built-in noise exclusions (managedFields, last-applied-configuration, …)
defaults: true
# RFC 6901 JSON Pointers — same mental model as Argo CD ignoreDifferences
jsonPointers:
- /metadata/resourceVersion
- /metadata/generation
- /metadata/managedFields
- /metadata/annotations/kubectl.kubernetes.io~1last-applied-configuration
- /metadata/annotations/argocd.argoproj.io~1tracking-id
- /status/conditions
# Optional kubectl/JSONPath exclusions for authors already using profile paths
jsonPaths:
- '$.metadata.labels["pod-template-hash"]'
# Key-name denylist at any depth — merges with operator scrubKeys ([ADR-0303](0303-helm-release-inventory.md))
scrubKeys:
- password
- token
# Advanced: CEL predicates evaluated against object; true => drop matched value
# (Phase 2 — webhook validates compile; Phase 1 omit)
cel:
- 'cel:has(object.status) && has(object.status.observedGeneration)'
# When mode: Attributes (default), attributes[] is required (unchanged).
# When mode: Resource, attributes[] is optional — use for computed fields on top of the blob.
attributes:
- name: containerCount
path: 'cel:size(object.spec.template.spec.containers)'
type: int
OpenAPI sketch (Go types):
type ExportSpec struct {
// +kubebuilder:validation:Enum=Attributes;Resource
// +kubebuilder:default=Attributes
Mode string `json:"mode,omitempty"`
// +kubebuilder:default=resource
As string `json:"as,omitempty"`
// +kubebuilder:validation:Enum=MetadataOnly;SpecOnly;StatusOnly;SpecAndStatus;All
// +kubebuilder:default=SpecAndStatus
Include string `json:"include,omitempty"`
Prune *PruneSpec `json:"prune,omitempty"`
}
type PruneSpec struct {
// +kubebuilder:default=true
Defaults *bool `json:"defaults,omitempty"`
JSONPointers []string `json:"jsonPointers,omitempty"`
JSONPaths []string `json:"jsonPaths,omitempty"`
ScrubKeys []string `json:"scrubKeys,omitempty"`
CEL []string `json:"cel,omitempty"` // Phase 2
}
Semantics¶
| Topic | Rule |
|---|---|
| Default | export omitted ⇒ mode: Attributes; existing profiles unchanged |
| Mutual requirement | mode: Attributes ⇒ spec.attributes non-empty; mode: Resource ⇒ attributes optional |
| Export payload | Pruned object stored under Item.attributes[<as>] (default key resource) |
| Identity dedup | When include contains metadata, strip fields already on Item (uid, namespace, name, GVK) from the embedded copy to avoid triple storage — configurable dedupeIdentity: true (default) |
| Ordering | Deep-sort map keys on serialize so Git diffs stay stable (ADR-0405) |
| Size | Full-object rows count toward maxExportBytes; oversize rows truncate with ErrTerminal + target condition — never spill raw objects to etcd status (ADR-0103) |
| Metrics | spec.metrics[].path may reference <as> only when the metric extractor supports nested JSONPath (Phase 2); Phase 1: metrics disabled or require flat attributes |
Built-in defaults (prune.defaults: true)¶
Applied before user jsonPointers / jsonPaths:
| Pointer | Rationale |
|---|---|
/metadata/managedFields |
SSA noise; huge and unstable |
/metadata/resourceVersion |
Churn on every watch event |
/metadata/generation |
Churn; rarely needed in inventory |
/metadata/annotations/kubectl.kubernetes.io~1last-applied-configuration |
Legacy apply blob |
/metadata/annotations/argocd.argoproj.io~1tracking-id |
Argo tracking noise (optional in preset gitops) |
Not in defaults: /status (often valuable), /metadata/labels, /metadata/annotations (teams prune explicitly).
Future: named presets via prune.preset: kubernetes | gitops | none instead of a boolean.
Security and governance¶
Full-object export inherits the redaction-at-extraction model (ADR-0104):
- Global operator
scrubKeys[]always runs (Phase 2 per ADR-0303). - Profile
prune.scrubKeysmerges with global list (case-insensitive key match at any depth). SecretGVK —mode: Resourcerejected unlesskollect.dev/allow-full-resource-export: "true"on the profile andKollectScope(or cluster policy) allows wide export for that GVK.Secret.data— always scrubbed to{"redacted": true, "reason": "secret-data"}even with opt-in annotation (same as summary profiles).- Helm storage — raw
data.release/ rendered manifest never exported (ADR-0303).
Admission webhook validates:
- RFC 6901 pointer syntax for
jsonPointers - JSONPath parse for
jsonPaths(warn-only Phase 1, reject Phase 2 — ADR-0302) export.asis a valid attribute name and does not collide with explicitattributes[].name- CEL compile for
prune.cel(Phase 2)
Example: Argo CD Application (audit-friendly)¶
apiVersion: kollect.dev/v1alpha1
kind: KollectProfile
metadata:
name: argo-application-snapshot
namespace: platform
spec:
targetGVK:
group: argoproj.io
version: v1alpha1
kind: Application
export:
mode: Resource
as: application
include: SpecAndStatus
prune:
defaults: true
jsonPointers:
- /status/operationState
- /status/reconciledAt
- /status/sync.comparedTo
attributes:
- name: chartVersion
path: '$.status.sync.revision'
type: string
Export row (conceptual):
{
"targetNamespace": "platform",
"targetName": "argo-apps",
"namespace": "argocd",
"name": "guestbook",
"group": "argoproj.io",
"version": "v1alpha1",
"kind": "Application",
"uid": "…",
"attributes": {
"application": {
"apiVersion": "argoproj.io/v1alpha1",
"kind": "Application",
"spec": { "…": "…" },
"status": { "sync": { "status": "Synced" } }
},
"chartVersion": "abc123"
}
}
Alternatives considered¶
| Option | Verdict |
|---|---|
cel:object attribute only |
Too easy to foot-gun; no defaults, no pointer pruning, weak governance |
New top-level Item.resource field |
Cleaner typing but breaks additive-only contract assumptions in early consumers; defer until envelope milestone (ADR-0405) |
attributes: [{ name: "*", path: … }] wildcard |
Magic name; hard to validate and to document in OpenAPI |
| Target-side pruning | Wrong layer — pruning is schema concern on Profile, not collection scope (ADR-0207) |
| jqPathExpressions (Argo) | Useful Phase 2 alias; start with jsonPointers + jsonPaths to reuse existing parsers |
Consequences¶
Positive¶
- One profile covers “export almost everything” with familiar Argo CD pointer syntax.
- Explicit attributes remain the default — no behaviour change for curated inventories.
- Hybrid mode (
Resource+ extra attributes) supports checksums / derived fields without duplicating paths. - Security stays at extraction time; sinks remain secret-free by construction.
Negative¶
- Payload size and Git diff noise increase — authors must opt in and tune
prune. - Deep scrub walks add CPU per object; needs benchmarks in PERFORMANCE.md.
- Nested-object metrics and SQL column promotion need follow-up (ADR-0304, ADR-0401).
Implementation phases¶
- Phase 1 (MVP):
export.mode,as,include,prune.defaults,prune.jsonPointers,prune.jsonPaths,prune.scrubKeys; webhook + unit tests; noprune.cel. - Phase 2:
prune.cel,prune.preset, jqPathExpressions alias, nested metrics paths, scope-levelallowResourceExport. - Phase 3: Optional
Itemenvelope fieldresourceif consumers outgrowattributes.resource.
Open questions¶
- OPEN: Should
include: MetadataOnlybe allowed, or always fold identity intoItemand omit metadata from the blob? - OPEN: Preset name
gitopsvs documenting Argo pointers only in samples? - OPEN: Hybrid default — when both
Resourceandattributesare set, fail if an attribute path reads outside pruned tree? - OPEN: Parquet / Postgres sinks — promote top-level
spec.*columns automatically in Resource mode, or stay JSON blob only?