Skip to content

Status: Active (v0.1.13 Wave 1) Β· Owner: Storage/Reclamation Β· Version: v0.1.13

Reclamation SLO ContractΒΆ

This document is the operator-facing contract for Moon's storage reclamation subsystem. It defines six SLOs, the INFO reclamation fields used to measure them, alert thresholds that indicate a breach is imminent or in progress, and remediation pointers into the Reclamation Runbook.

Every field listed here is emitted by INFO reclamation in v0.1.13. Fields marked Wave-2 placeholder emit 0 in v0.1.13 and are populated by the Wave-2 autovacuum daemon (v0.1.14). Do not alert on a 0 value from a Wave-2 placeholder field β€” configure alert rules to fire only when the field is > 0 and breaches the threshold.

Related documents:


Quick ReferenceΒΆ

# SLO Name Target Primary INFO Field Alert Threshold Runbook Scenario v0.1.13 Status
1 Bloat dead fraction P99 ≀ 30% reclamation_dead_fraction_max > 0.35 Compaction Not Keeping Up Live
2 WAL size total ≀ 2Γ— --max-wal-size for β‰₯ 60 s reclamation_wal_bytes > 2Γ— configured limit for 60 s Disk Filling Up Live
3 Manifest commit P99 ≀ 5 ms with ≀ 100 K entries reclamation_manifest_tombstones tombstones > 80 K Query Latency Cliffed Live
4 Read amplification vector query fanout P99 ≀ 16 reclamation_read_amp_p99 > 20 Query Latency Cliffed Wave-2 placeholder
5 MVCC pinning oldest snapshot age ≀ --mvcc-old-snapshot-threshold-secs reclamation_mvcc_oldest_snapshot_age_secs > configured threshold OOM / Memory Growth Live
6 Request impact autovacuum overhead P50 ≀ +5%, P99 ≀ +10% reclamation_autovacuum_throttled_due_to_load throttled continuously for > 300 s Compaction Not Keeping Up Wave-2 placeholder

SLO 1 β€” Bloat (dead fraction)ΒΆ

Definition: The fraction of dead (tombstoned, unreferenced) records in the hottest shard's segment store must not exceed 30% at P99 when the ingest rate is at or below the compaction throughput ceiling.

Attribute Value
Target reclamation_dead_fraction_max ≀ 0.30 when ingest ≀ compaction throughput
INFO field reclamation_dead_fraction_max
Alert threshold > 0.35 for 5 consecutive minutes
Escalation threshold > 0.50 for 2 consecutive minutes (compaction critically behind)
Remediation Scenario 4 β€” Compaction Not Keeping Up
v0.1.13 status Live β€” emitted by P10 metrics surface

Why this matters: At dead fraction > 30%, vector query read amplification climbs (more segments must be probed), and graph traversal must skip increasing numbers of dead edges. At > 50%, a compaction backlog has formed and disk growth is unbounded until compaction catches up.

Supporting fields:

reclamation_immutable_segments:<n>      # immutable segment count (compaction input)
reclamation_warm_segments:<n>           # segments in warm tier
reclamation_cold_segments:<n>           # segments in cold tier
reclamation_compaction_pending_bytes:<n> # bytes queued for compaction

Alert rule (Prometheus / VictoriaMetrics):

moon_reclamation_dead_fraction_max > 0.35

If Moon's admin HTTP port is not enabled, poll via:

redis-cli -p 6399 INFO reclamation | grep reclamation_dead_fraction_max

SLO 2 β€” WAL SizeΒΆ

Definition: The combined WAL byte size across all shards must not exceed twice the configured --max-wal-size limit for 60 or more consecutive seconds under any workload.

Attribute Value
Target reclamation_wal_bytes ≀ 2 Γ— --max-wal-size (default 2 Γ— 256 MiB = 512 MiB) for any 60-second window
INFO fields reclamation_wal_bytes, reclamation_wal_segments
Alert threshold reclamation_wal_bytes > 1.5 Γ— --max-wal-size for 30 s
Escalation threshold reclamation_write_stall_active = true
Remediation Scenario 1 β€” Disk Filling Up
v0.1.13 status Live β€” emitted by P10 metrics surface

Config flags involved:

--max-wal-size              default: 256mb    per-shard WAL size cap
--wal-segment-size          default: 16mb     individual WAL segment size
--wal-max-checkpoint-lag-ms default: 10000    max ms before a checkpoint is forced
--disk-free-min-pct         default: 5        write-stall trigger (% free disk)

Supporting fields:

reclamation_disk_free_bytes:<n>          # free disk on persistence partition
reclamation_write_stall_active:<bool>    # true when writes are stalled
reclamation_write_stall_threshold_pct:<f> # --disk-free-min-pct in effect

What write-stall means: When reclamation_write_stall_active is true, write commands return an error (WRITE_STALL disk usage too high) and block new writes until disk is freed. This is intentional: fail-fast is safer than OOM. See Scenario 1.


SLO 3 β€” Manifest Commit LatencyΒΆ

Definition: The manifest commit latency (dual-root atomic swap) must stay below 5 ms at P99 when the combined manifest entry count (active + tombstoned) is at or below 100 000.

Attribute Value
Target Manifest commit P99 ≀ 5 ms with ≀ 100 K entries
INFO fields reclamation_manifest_active, reclamation_manifest_tombstones
Alert threshold reclamation_manifest_tombstones > 80 000
Escalation threshold reclamation_manifest_tombstones > 100 000
Remediation Scenario 2 β€” Query Latency Cliffed
v0.1.13 status Live β€” emitted by P10 metrics surface

Config flags involved:

--manifest-tombstone-retain-epochs   default: 2    epochs before tombstone GC
--manifest-tombstone-retain-secs     default: 300  seconds before tombstone GC

Supporting fields:

reclamation_manifest_active:<n>         # active (live) manifest entries
reclamation_manifest_tombstones:<n>     # pending-GC tombstone entries

Root cause when breached: Tombstones accumulate when segment GC is slower than segment creation. This typically co-occurs with high dead fraction (SLO 1) and an active compaction backlog. Reduce --manifest-tombstone-retain-epochs to 1 as a short-term mitigation.


SLO 4 β€” Read Amplification (Vector Queries)ΒΆ

Definition: The P99 number of segments probed per vector query (HNSW search fanout) must stay at or below 16 with default configuration.

Attribute Value
Target reclamation_read_amp_p99 ≀ 16
INFO fields reclamation_read_amp_p99, reclamation_segment_fanout_p99, reclamation_segment_fanout_p50
Alert threshold reclamation_read_amp_p99 > 20
Escalation threshold reclamation_read_amp_p99 > 32 (2Γ— budget β€” recall degrades measurably)
Remediation Scenario 2 β€” Query Latency Cliffed
v0.1.13 status Wave-2 placeholder β€” emits 0 in v0.1.13; populated by v0.1.14 autovacuum daemon

Wave-2 note: In v0.1.13 reclamation_read_amp_p99 always emits 0. Configure your alert to fire only when the field is > 0 AND > 20. The underlying cause of read amplification growth β€” too many immutable segments β€” is observable now via reclamation_immutable_segments > 16.

Proxy metric available now (v0.1.13):

redis-cli -p 6399 INFO reclamation | grep reclamation_immutable_segments
# Alert when > 16 β€” each immutable segment adds one full probe per FT.SEARCH call

Root cause: Immutable segment count grows when compaction (Wave-2 P2) has not run. Manual FT.COMPACT <index> forces compaction. See Scenario 2.


SLO 5 β€” MVCC PinningΒΆ

Definition: An alarm must fire when the oldest active snapshot is older than the configured threshold (--mvcc-old-snapshot-threshold-secs, default 600 s). The MVCC subsystem must not hold unbounded memory due to a single stuck snapshot.

Attribute Value
Target reclamation_mvcc_oldest_snapshot_age_secs ≀ --mvcc-old-snapshot-threshold-secs (default 600 s)
INFO fields reclamation_mvcc_oldest_snapshot_age_secs, reclamation_mvcc_oldest_snapshot_lag, reclamation_mvcc_active, reclamation_mvcc_committed
Alert threshold reclamation_mvcc_oldest_snapshot_age_secs > configured threshold
Escalation threshold > 2 Γ— threshold (MVCC committed set likely growing unbounded)
Remediation Scenario 3 β€” OOM / Memory Growth
v0.1.13 status Live β€” emitted by P10 metrics surface

Config flags involved:

--mvcc-old-snapshot-threshold-secs   default: 600   seconds before snapshot is "old"
--mvcc-committed-prune-margin        default: 1000  LSN prune margin

Supporting fields:

reclamation_mvcc_committed:<n>              # committed version entries in RoaringTreemap
reclamation_mvcc_active:<n>                # live (open) snapshots
reclamation_mvcc_oldest_snapshot_lag:<n>   # LSN lag of oldest snapshot
reclamation_mvcc_oldest_snapshot_age_secs:<n>
reclamation_mvcc_zombies_swept_total:<n>   # cumulative zombie sweep count
reclamation_delete_pending_visible_lsn:<n> # LSN below which tombstones are safe to GC

What "MVCC pinning" means: A long-running FT.SEARCH or graph query holds an open snapshot. The MVCC manager cannot prune committed versions below that snapshot's LSN. Memory grows proportionally to write throughput Γ— snapshot age. One stuck client can pin hundreds of megabytes. The VACUUM MVCC command (Wave-1 P8) forces a sweep and reports bytes reclaimed.


SLO 6 β€” Request Impact (Autovacuum Overhead)ΒΆ

Definition: When the autovacuum daemon (Wave-2, v0.1.14) is running, the request latency overhead it introduces must not exceed +5% on P50 or +10% on P99 relative to a no-vacuum baseline.

Attribute Value
Target Autovacuum-induced P50 overhead ≀ +5%, P99 ≀ +10%
INFO fields reclamation_autovacuum_throttled_due_to_load, reclamation_autovacuum_last_run_ts, reclamation_autovacuum_segments_compacted_total
Alert threshold reclamation_autovacuum_throttled_due_to_load = 1 continuously for > 300 s
Escalation threshold Throttled for > 1 800 s (30 min) β€” compaction is materially behind and autovacuum cannot catch up
Remediation Scenario 4 β€” Compaction Not Keeping Up
v0.1.13 status Wave-2 placeholder β€” emits 0 in v0.1.13; autovacuum daemon ships in v0.1.14

Wave-2 note: In v0.1.13, autovacuum is not running and all reclamation_autovacuum_* fields emit 0. Compaction is triggered manually via VACUUM / FT.COMPACT. The SLO target applies from v0.1.14 onward. Configure alerts to fire only when reclamation_autovacuum_last_run_ts > 0.


Measuring the SLOsΒΆ

Getting current valuesΒΆ

# Full reclamation section
redis-cli -p 6399 INFO reclamation

# Targeted field check
redis-cli -p 6399 INFO reclamation | grep -E \
  'reclamation_dead_fraction_max|reclamation_wal_bytes|reclamation_manifest_tombstones|reclamation_read_amp_p99|reclamation_mvcc_oldest_snapshot_age_secs|reclamation_autovacuum_throttled_due_to_load'
# Prometheus / VictoriaMetrics alert examples
# Adjust instance label and scrape target to match your deployment.

- alert: MoonBloatHigh
  expr: moon_reclamation_dead_fraction_max > 0.35
  for: 5m
  annotations:
    summary: "Moon dead fraction exceeds 35% β€” compaction backlog forming"
    runbook: "https://your-docs/docs/operations/reclamation-runbook.md#scenario-4"

- alert: MoonWalOversize
  expr: moon_reclamation_wal_bytes > moon_config_max_wal_bytes * 1.5
  for: 30s
  annotations:
    summary: "Moon WAL bytes > 1.5Γ— limit β€” approaching write-stall"
    runbook: "https://your-docs/docs/operations/reclamation-runbook.md#scenario-1"

- alert: MoonManifestTombstoneHigh
  expr: moon_reclamation_manifest_tombstones > 80000
  for: 5m
  annotations:
    summary: "Moon manifest tombstones > 80K β€” manifest GC is behind"
    runbook: "https://your-docs/docs/operations/reclamation-runbook.md#scenario-2"

- alert: MoonImmutableSegmentsHigh
  # Proxy for SLO 4 until v0.1.14 populates reclamation_read_amp_p99
  expr: moon_reclamation_immutable_segments > 16
  for: 2m
  annotations:
    summary: "Moon immutable segment count > 16 β€” FT.SEARCH read amplification elevated"
    runbook: "https://your-docs/docs/operations/reclamation-runbook.md#scenario-2"

- alert: MoonMvccSnapshotOld
  expr: moon_reclamation_mvcc_oldest_snapshot_age_secs > 600
  for: 1m
  annotations:
    summary: "Moon MVCC oldest snapshot > 600 s β€” possible stuck client pinning memory"
    runbook: "https://your-docs/docs/operations/reclamation-runbook.md#scenario-3"

SLO Compliance BoundariesΒΆ

These SLOs apply under the following conditions. Behavior outside these bounds is unspecified:

Condition Assumption
Shard count 1–16 shards
Ingest rate ≀ compaction throughput ceiling (not an overload scenario)
Snapshot age No snapshot older than --mvcc-old-snapshot-threshold-secs
Disk headroom reclamation_disk_free_bytes > 10% of partition (above --disk-free-min-pct)
Platform Linux aarch64 / x86_64, kernel β‰₯ 6.1 (Tier 1/2 per PRODUCTION-CONTRACT.md)

Out-of-scope: macOS (dev-only), tokio runtime (CI parity only), WAL sync mode (appendfsync=always).


VersioningΒΆ

Version Changes
v0.1.13 Initial publication. SLOs 1-3, 5 are live. SLOs 4, 6 are Wave-2 placeholders.
v0.1.14 (planned) SLOs 4, 6 become live when autovacuum daemon and read-amp metrics ship.