RFC: Prometheus metrics from collected attribute values¶
Status: Proposed (Exploring) · Author: @konih · Created: 2026-06-06
Spec-only exploration — no implementation in the ADR-0604 ship window. Revisit when Tier C domain metrics (ADR-0604) stabilizes.
Summary¶
Extend operator /metrics export so numeric (and optionally other cardinality-safe) values
extracted via CEL/JSONPath in KollectProfile.spec.attributes can become first-class Prometheus
series — gauges and counters — not only KSM-style aggregated sums over a GVK.
Today ADR-0304 emits kollect_custom_resource_series
as sums across objects matching a profile path. Platform teams also want direct export of scalar
fields (e.g. cert notAfter epoch seconds, replica availableReplicas, Argo health_status as a
bounded enum mapped to numeric codes) without hand-rolling PromQL over inventory export sinks.
Goals / non-goals¶
| Goals | Non-goals |
|---|---|
| Profile-defined gauge/counter from numeric attribute paths | KollectSink.type: prometheus — reaffirm ADR-0601 |
| Clear label vs value rules (what becomes a label key vs the sample value) | Unbounded string labels (pod names, UIDs, free-text messages) |
| Cardinality guardrails aligned with ADR-0603 | Histogram/summary from arbitrary attribute distributions in v1 of this RFC |
| Reuse existing extraction engine (ADR-0302) | A second informer loop or push gateway |
metricsScope parity with ADR-0604 (profile | target) |
Per-object series keyed by resource name/UID |
Background¶
| Artifact | Relevance |
|---|---|
| ADR-0304 | KSM-style sum series from spec.metrics[] |
| ADR-0604 | Tier B/C label keys, metricsScope, scrape-at-spoke |
| ADR-0302 | Attribute extraction paths reused for metric values |
| Planned features — attribute metrics | Backlog entry |
Relationship to ADR-0604 tiers:
| Tier | This RFC |
|---|---|
| A — Operator internals | Out of scope |
| B — Collection boundary | Out of scope (counts/readiness stay as today) |
| C — Domain (KSM-style sums) | Complements — same spec.metrics config surface may grow, or a sibling spec.attributeMetrics[] block |
| C′ — Attribute value export (proposed) | This RFC — export the scalar value of a path per aggregation scope, not only sum/count |
When promoted, Tier C and C′ should share admission webhooks and cardinality budgets documented in PERFORMANCE.md.
Proposal¶
1. Config surface (sketch)¶
Extend KollectProfile / KollectClusterProfile (exact field name TBD — attributeMetrics vs extended
metrics[] with a kind discriminator):
apiVersion: kollect.dev/v1alpha1
kind: KollectProfile
metadata:
name: cert-metrics
spec:
metricsScope: target # ADR-0604 — profile (default) | target
attributeMetrics:
- name: cert_not_after_seconds
attribute: notAfterEpoch # references spec.attributes[].name
type: gauge # gauge | counter (counter only for monotonic counters)
help: "Certificate notAfter as Unix seconds"
labels: # optional — attribute names → Prometheus label keys
- issuer
- isCA
- name: deployment_available_replicas
attribute: availableReplicas
type: gauge
Rules:
attributemust reference an existingspec.attributes[]entry whose extracted type is numeric (int,float) or a bounded enum with an explicit numeric mapping in admission.- Value: for
gauge, emit the extracted number (last-write or sum policy per series kind — see §2). - Labels: only keys listed in
labels[]; each referenced attribute must pass low-cardinality admission (same rules asspec.metrics[].labelsin ADR-0304). metricsScope: whenprofile, aggregate (e.g. sum or max — see open questions) across targets; whentarget, one metric family per target refresh withtarget_namespace,target_namelabels (ADR-0604).
Metric name prefix: kollect_attribute_ or reuse kollect_custom_resource_ with a _value
suffix — pick one at promotion time to avoid colliding with sum series.
2. Label vs value semantics¶
| Extracted shape | Prometheus role | Example |
|---|---|---|
| Single numeric field | Sample value | cert_not_after_seconds 1.734e9 |
| Low-cardinality string enum | Label (if listed) or rejected | phase="Active" → label only when allow-listed |
| High-cardinality string | Forbidden as label; do not emit | pod name, CN, serial number |
| Boolean | Label true/false or numeric 0/1 — one convention locked at promotion |
isCA="true" |
Forbidden by default: name, namespace, uid, resourceVersion as labels — same as ADR-0604
Tier C.
3. Cardinality guardrails¶
| Guardrail | Enforcement |
|---|---|
Max attributeMetrics entries per profile |
Webhook cap (e.g. 20) |
| Max label keys per entry | 5 (align ADR-0304) |
| Series budget | Count toward ADR-0604 soft cap (~10k series / operator) |
| Counter type | Only when attribute is documented monotonic; otherwise gauge |
| Stale series | Clear gauges on object delete / target teardown (same as Tier B) |
4. Emission model¶
- Compute during collection engine refresh (ADR-0301) — no separate scrape of inventory exports.
- Spoke scrape: metrics appear on the local operator
/metrics(ADR-0604 scrape-at-spoke decision); hub does not federate these series in v1. - Optional enum → numeric mapping in CRD for alert-friendly thresholds without string labels.
Alternatives considered¶
| Option | Pros | Cons |
|---|---|---|
Extend spec.metrics[] with aggregation: sum\|value\|max |
One config block | Overloads KSM sum semantics; harder validation |
Separate attributeMetrics[] (this RFC) |
Clear mental model | More CRD surface |
| Prometheus recording rules over inventory export | No operator change | Requires sink scrape; violates ADR-0601 spirit for domain alerts |
| Only sums (status quo ADR-0304) | Simple | Cannot expose per-field scalars (cert expiry timestamp) |
Open questions¶
- Aggregation when
metricsScope: profile: sum, max, min, or avg for gauge values across targets? - Multi-object same label tuple: last-write vs sum vs expose
_countcompanion gauge? - Histogram from numeric attributes: defer or allow fixed-bucket profiles only?
- Unified vs split CRD fields: merge into
spec.metrics[]with aexport: sum\|valueenum at promotion. - Testing: mirror ADR-0604 verification matrix when implementation starts.
Promotion¶
When accepted:
- Update ADR-0304 and/or ADR-0604 with Tier C′ rules.
- Set this RFC to Superseded with links.
- Add CRD docs under
docs/crds/kollectprofile.mdand webhook validation.