Moon Production Contract
Status: provisional. SLO numbers lock in at Phase 97 (Performance SLO Lock-In) and the full checklist locks at Phase 100 (GA Gate).
Last updated: 2026-04-08 (Phase 87 initial publication)
Milestone: v0.1.3 Production Readiness
Architectural baseline: .planning/MOON-DATAFLOW-WALKTHROUGH.md
What this document is
This is the contract every Moon v1.0 user is entitled to and every v0.1.3 phase tests against. If a number appears here without an automated test or benchmark behind it, the number is wrong. Phases 88-100 of the v0.1.3 milestone each tick items off the GA Exit Criteria checklist at the bottom of this document. The checklist is the gate: nothing promotes to v1.0-rc1 until every box is ticked green.
Aspirational numbers do not belong in this document. Provisional numbers do, and they are marked [provisional - Phase N] until the phase that verifies them lands.
Supported Platforms
| Tier | Platform | Runtime | Guarantees |
|---|---|---|---|
| 1 - Primary | Linux aarch64, kernel ≥ 6.1 | monoio + io_uring | Full SLOs in this document |
| 2 - Secondary | Linux x86_64, kernel ≥ 6.1 | monoio + io_uring | Full SLOs contingent on PERF-04 fix (x86_64 monoio accept-loop regression closed in Phase 97) |
| 3 - CI / dev | Linux any arch, any kernel | tokio (`MOON_NO_URING=1`) | Functional correctness only; no SLO commitment |
| Unsupported | macOS native, Windows, WSL1, kernel < 6.1, x86_64 without AVX2, aarch64 without NEON | n/a | Dev-only via OrbStack for macOS; others not built |
Rationale for exclusions:
- macOS native: kqueue integration with the monoio cold tier is not validated; OrbStack VM gives Linux parity on macOS hosts without forking the I/O layer.
- Windows / WSL1: io_uring unavailable; tokio fallback would create a two-tier support story we cannot staff.
- Kernel < 6.1: io_uring feature floor; `MOON_NO_URING=1` works but cannot hit the SLOs below.
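
The tier table can be read as a small decision function. The sketch below is illustrative only: `Tier` and `classify` are hypothetical names, not Moon APIs, and the AVX2/NEON checks from the Unsupported row are deliberately not modeled. The `os` and `arch` strings are assumed to follow `std::env::consts::OS` and `std::env::consts::ARCH`.

```rust
/// Support tiers from the Supported Platforms table (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Tier {
    Primary,     // Tier 1: full SLOs
    Secondary,   // Tier 2: full SLOs, contingent on PERF-04
    CiDev,       // Tier 3: functional correctness only
    Unsupported, // not built, or dev-only via a Linux VM
}

/// Classify a host. `kernel` is (major, minor); `no_uring` mirrors
/// MOON_NO_URING=1. CPU feature checks (AVX2, NEON) are not modeled here.
pub fn classify(os: &str, arch: &str, kernel: (u32, u32), no_uring: bool) -> Tier {
    if os != "linux" {
        return Tier::Unsupported;
    }
    // The tokio fallback runs on any Linux arch and kernel, without SLOs.
    if no_uring {
        return Tier::CiDev;
    }
    // io_uring feature floor: tuples compare lexicographically.
    if kernel < (6, 1) {
        return Tier::Unsupported;
    }
    match arch {
        "aarch64" => Tier::Primary,
        "x86_64" => Tier::Secondary,
        _ => Tier::Unsupported,
    }
}
```

Under these assumptions, a kernel 5.15 host is classified as `CiDev` only when the io_uring path is explicitly disabled, which matches the rationale that the fallback works but carries no SLO commitment.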
Performance SLOs
All numbers are per single node, single process. Pipelining and multi-key commands use separate tables. Reference hardware: aarch64 Apple M-series via OrbStack for development, Google Cloud c3-standard-8 for CI/reference benches.
Single-key, non-pipelined (Tier 1/2)
| Command | Load | p50 | p99 | p99.9 | Verified by |
|---|---|---|---|---|---|
| `GET` | 1 M QPS | ≤ 50 µs | ≤ 500 µs | ≤ 2 ms | PERF-01, PERF-02 [provisional - Phase 97] |
| `SET appendfsync=everysec` | 300 K QPS | ≤ 80 µs | ≤ 800 µs | ≤ 3 ms | PERF-01, PERF-02 [provisional - Phase 97] |
| `SET appendfsync=always` | 50 K QPS | ≤ 500 µs | ≤ 5 ms | ≤ 20 ms | PERF-01, PERF-02 [provisional - Phase 97] |
| `HSET` | 300 K QPS | ≤ 100 µs | ≤ 1 ms | ≤ 4 ms | PERF-01 [provisional - Phase 97] |
| `ZADD` | 200 K QPS | ≤ 150 µs | ≤ 1.5 ms | ≤ 6 ms | PERF-01 [provisional - Phase 97] |
| `LPUSH` | 300 K QPS | ≤ 100 µs | ≤ 1 ms | ≤ 4 ms | PERF-01 [provisional - Phase 97] |
Pipelined
| Command | Pipeline depth | Throughput (absolute) | Ratio vs Redis 8.x | Verified by |
|---|---|---|---|---|
| `GET` | p=16 | ≥ 4 M QPS | ≥ 1.7× | PERF-01 [provisional] |
| `GET` | p=128 | ≥ 5.5 M QPS | ≥ 2.3× | PERF-01 [provisional - x86_64 only until PERF-04] |
| `SET everysec` | p=16 | ≥ 1.9 M QPS | ≥ 1.5× | PERF-01 [provisional] |
| `SET everysec` | p=64 | ≥ 2.2 M QPS | ≥ 1.9× | PERF-01 [provisional] |
Vector search (HNSW + TurboQuant)
| Operation | Dataset | k | Metric | Target | Verified by |
|---|---|---|---|---|---|
| `FT.SEARCH` | 1 M × 768-d | 10 | p99 latency @ 100 QPS | ≤ 5 ms | PERF-01 [provisional - Phase 97] |
| `FT.SEARCH` | 1 M × 768-d | 10 | recall@10 | ≥ 0.92 | Phase 96 benchmark reruns |
| `HSET` (indexed) | 768-d vector | n/a | Throughput | ≥ 30 K inserts/s | Phase 96 benchmark reruns |
What "meets SLO" means operationally
- The number is measured over a 24 h HDR histogram run on reference hardware (`PERF-02`).
- The number is the steady-state observation; the first 5 minutes of warmup are excluded.
- CI Criterion gate (`PERF-01`) blocks any PR that regresses a listed target by > 5 %.
- Failure to hit an SLO after measurement does not cause Moon to abort. It causes the release to be blocked, the SLO to be relaxed, or the regression to be fixed. Users are not surprised at runtime; operators are warned at CI time.
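
The measurement rules above can be sketched as three small functions: warmup exclusion, a percentile read-out, and the 5 % regression check. This is a minimal illustration, not the PERF-01 or PERF-02 tooling. It uses a plain sorted vector with nearest-rank percentiles in place of an HDR histogram, and every name in it is hypothetical.

```rust
/// Drop samples recorded during warmup and return the remaining latencies,
/// sorted ascending. Each sample is (seconds since run start, latency in µs).
pub fn steady_state(samples: &[(u64, u64)], warmup_secs: u64) -> Vec<u64> {
    let mut kept: Vec<u64> = samples
        .iter()
        .filter(|(t, _)| *t >= warmup_secs)
        .map(|(_, latency)| *latency)
        .collect();
    kept.sort_unstable();
    kept
}

/// Nearest-rank percentile over an ascending-sorted slice.
/// Returns None for an empty slice or a percentile outside 0..=100.
pub fn percentile(sorted_us: &[u64], p: f64) -> Option<u64> {
    if sorted_us.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let rank = ((p / 100.0) * sorted_us.len() as f64).ceil() as usize;
    Some(sorted_us[rank.max(1) - 1])
}

/// True when a latency figure exceeds the baseline by more than `tolerance`
/// (0.05 for the 5 % gate). For throughput targets the comparison flips.
pub fn latency_regressed(baseline_us: f64, current_us: f64, tolerance: f64) -> bool {
    current_us > baseline_us * (1.0 + tolerance)
}
```

A gate built this way would call `steady_state(&samples, 300)` to discard the first 5 minutes, read p50/p99/p99.9 with `percentile`, and fail the job when `latency_regressed` returns true for any listed target.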
Durability Modes
Moon ships three fsync modes and a disk-offload cold tier. Each has a documented recovery bound.
| `appendfsync` | Process crash (SIGKILL) | OS crash / power loss | Disk full | RTO for 10 GB |
|---|---|---|---|---|
| `always` | RPO = 0 (all committed writes survive) | RPO = 0 | Graceful OOM error; no silent loss | ≤ 10 s |
| `everysec` (default) | RPO ≤ last buffered batch (~ 1 ms) | RPO ≤ 1 s | Graceful OOM error | ≤ 10 s |
| `no` | RPO = OS flush window | RPO = OS flush window (minutes) | Graceful OOM error | ≤ 10 s |
Enforcement:
- Per-crash-class behavior is proven by the CRASH-01 scripted crash-injection matrix in Phase 94. Every {fsync-mode × failure-class × write-phase} cell must pass in CI.
- Torn writes (partial sector) are detected by CRC32 on WAL v3 records and truncated at the last durable offset (TORN-01 in Phase 94).
- Disk-offload cold tier uses a 6-phase recovery protocol (OFFLOAD-01 in Phase 94) with v2 fallback on corruption.
- Recovery order: RDB snapshot → WAL v3 segments → AOF tail. Proven by integration tests.
- appendfsync=no is documented as cache-mode only; do not use it for primary storage.
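
To make the torn-write rule concrete, here is a minimal sketch of CRC-checked replay that stops at the last intact record. The record layout (`[len: u32 LE][crc32: u32 LE][payload]`) is an assumption made for illustration; the actual WAL v3 format is not specified in this document, and `encode_record` and `last_durable_offset` are hypothetical names.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, else zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Encode one record in the assumed layout: length, checksum, payload.
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Scan records from the start and return the byte offset just past the last
/// record whose header and payload are complete and whose checksum matches.
/// Everything after that offset is treated as torn and would be truncated.
pub fn last_durable_offset(log: &[u8]) -> usize {
    let mut off = 0usize;
    loop {
        let Some(header) = log.get(off..off + 8) else { return off };
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let want = u32::from_le_bytes([header[4], header[5], header[6], header[7]]);
        let Some(end) = (off + 8).checked_add(len) else { return off };
        let Some(payload) = log.get(off + 8..end) else { return off };
        if crc32(payload) != want {
            return off;
        }
        off = end;
    }
}
```

Recovery under this sketch would truncate the file to `last_durable_offset(&bytes)` before replaying, so a partially written tail record never reaches the keyspace.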
Availability & Replication Guarantees
- Single-node availability: graceful shutdown on SIGTERM, drains in-flight requests before close. Recovery < RTO above.
- Async replication (REPLICAOF): eventually consistent. Replication lag is exposed via `/metrics` (Phase 92) and bounded by backlog size and replica ACK cadence.
- PSYNC2 partial resync: on replica reconnect inside the backlog window, no full retransfer. Proven by `REPL-01` (Phase 95).
- Full resync: on reconnect outside the backlog window, the replica rebuilds from the master snapshot. Proven by `REPL-02`.
- Network partition: master continues accepting writes (availability over consistency, Redis semantics). Replica diverges and is reconciled on heal. Proven by `REPL-03`.
- Replica promotion: `REPLICAOF NO ONE` promotes a replica to master. Client reconfiguration is the client's responsibility. Proven by `REPL-06`.
- Cluster mode: not in v0.1.3 scope. Deferred to v0.2+.
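
The partial-versus-full resync rule can be sketched as a single decision on the master side. This is a simplified illustration with hypothetical types (`Backlog`, `Resync`, `decide_resync`); it tracks one replication ID only, whereas Redis PSYNC2 also keeps a secondary ID for replicas that followed a previous master.

```rust
/// Outcome of a replica reconnect (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Resync {
    /// Stream the backlog starting at this offset; no snapshot transfer.
    Partial { from_offset: u64 },
    /// Replica rebuilds from a fresh master snapshot.
    Full,
}

/// Master-side view of the replication backlog (hypothetical type).
pub struct Backlog {
    pub replid: String,
    /// Oldest offset still retained in the backlog.
    pub first_offset: u64,
    /// Offset of the next byte the master will write (exclusive end).
    pub end_offset: u64,
}

/// Partial resync is possible only when the replica shares the master's
/// replication history and asks for an offset still inside the backlog.
pub fn decide_resync(backlog: &Backlog, replica_replid: &str, replica_offset: u64) -> Resync {
    let same_history = backlog.replid == replica_replid;
    let in_window =
        replica_offset >= backlog.first_offset && replica_offset <= backlog.end_offset;
    if same_history && in_window {
        Resync::Partial { from_offset: replica_offset }
    } else {
        Resync::Full
    }
}
```

A replica that reconnects with a matching ID and an offset the backlog still holds gets `Partial`; any mismatch in ID or an offset that has already been evicted falls back to `Full`.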
Security Guarantees
- TLS version floor: TLS 1.3 mandatory when TLS is enabled. TLS 1.2 permitted via explicit opt-in (`--tls-version 1.2`) for legacy clients.
- Cipher allowlist: frozen in code and audited in Phase 98 (`SEC-06`).
- mTLS: supported when `--tls-ca-cert-file` is provided; client cert required for connection. Proven by TLS integration tests.
- ACL enforcement: every command dispatch checks the user's category + key-pattern rules before execution. Proven by `SEC-08` ACL fuzzing (Phase 89).
- Lua sandbox scope: no fs, net, os, debug, or package access. Audited in `SEC-04` (Phase 98). Any sandbox finding is a P0.
- Unsafe code: every `unsafe` block carries a `// SAFETY:` comment. Enforced by `UNSAFE-01` xtask audit in CI.
- CVE disclosure: SECURITY.md (Phase 98 `SEC-07`) documents the 90-day embargo disclosure process and GPG key.
- Supply chain: `cargo audit` + `cargo deny` block PR merges. SBOM (CycloneDX) published per release. Release artifacts signed via cosign with provenance attestation. All enforced in Phase 98 (`SEC-01`, `SEC-02`).
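
As an illustration of the unsafe-code rule, the sketch below flags `unsafe { ... }` blocks that have no `// SAFETY:` comment in the comment lines directly above them. It is a text-level approximation under stated assumptions, not the `UNSAFE-01` xtask: it ignores `unsafe fn` and `unsafe impl`, block comments, and string literals that contain `//`, and both function names are hypothetical.

```rust
/// True when `code` contains the keyword `unsafe` followed by an opening brace.
fn opens_unsafe_block(code: &str) -> bool {
    let mut rest = code;
    while let Some(pos) = rest.find("unsafe") {
        // Reject identifiers that merely end in "unsafe", e.g. `not_unsafe`.
        let boundary_before = rest[..pos]
            .chars()
            .next_back()
            .map_or(true, |c| !(c.is_alphanumeric() || c == '_'));
        let after = &rest[pos + "unsafe".len()..];
        if boundary_before && after.trim_start().starts_with('{') {
            return true;
        }
        rest = after;
    }
    false
}

/// Return 1-based line numbers of `unsafe` blocks with no `// SAFETY:`
/// comment in the contiguous run of comment lines immediately above.
pub fn missing_safety_comments(source: &str) -> Vec<usize> {
    let lines: Vec<&str> = source.lines().collect();
    let mut missing = Vec::new();
    for (i, line) in lines.iter().enumerate() {
        // Only inspect the code part of the line, before any line comment.
        let code = line.split("//").next().unwrap_or("");
        if !opens_unsafe_block(code) {
            continue;
        }
        let documented = lines[..i]
            .iter()
            .rev()
            .map(|l| l.trim())
            .take_while(|l| l.starts_with("//"))
            .any(|l| l.starts_with("// SAFETY:"));
        if !documented {
            missing.push(i + 1);
        }
    }
    missing
}
```

A CI step built on this idea would walk the source tree, call `missing_safety_comments` per file, and fail when any file returns a non-empty list. A production audit would more likely work on the parsed syntax tree than on raw text.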
Out of Scope
Explicitly not promised by this contract:
| Excluded capability | Reason |
|---|---|
| Cluster mode GA | Deferred to v0.2+; Jepsen-grade cluster testing is a milestone on its own |
| Multi-master / active-active | Not in Moon's architectural scope |
| Cross-region replication | Single-datacenter async replication only |
| Redis Modules API | Moon builds features natively; modules conflict with the thread-per-core ownership model |
| Sentinel | Cluster mode (when GA) subsumes HA coordination |
| macOS native runtime | Dev-only via OrbStack VM; kqueue integration with cold tier not validated |
| Windows | io_uring unavailable; not staffed |
| WSL1 | io_uring unavailable |
| Kernel < 6.1 | io_uring feature floor below SLO viability |
| GPU vector acceleration | Feature-gated (gpu-cuda), not on default path |
| DiskANN | Not stabilized in v0.1.3 |
| HexaHNSW GA | Still experimental; recall gains not validated on real datasets |
| Redis Functions (scripting v2) | Deferred; EVAL/EVALSHA covers scripting needs |
| Client-side caching (invalidation tracking) | Deferred to v0.2+ |
GA Exit Criteria Checklist
Every box below must be ticked green before v0.1.3 promotes to v1.0-rc1. Each line links to a REQ-ID in .planning/REQUIREMENTS.md and the phase that closes it.
Production Contract
- `CONTRACT-01`: this document published (Phase 87); you are here
Toolchain
- `RUST-01`: MSRV and CI on Rust 1.94.*, clippy clean on both feature sets, no Criterion regression > 2 % (Phase 88)
Correctness Hardening
- `FUZZ-01`: cargo-fuzz targets for RESP/RDB/WAL/cluster/ACL parsers; 24 h cumulative clean (Phase 89)
- `LOOM-01`: loom model tests for ResponseSlot, SPSC drain+notify, pending-wakers (Phase 89)
- `SEC-08`: ACL glob pattern + key-bypass fuzzing clean (Phase 89)
- `UNSAFE-01`: 100 % `// SAFETY:` coverage, CI-enforced (Phase 90)
- `PANIC-01`: zero `unwrap`/`expect`/`panic` on hot-path modules, module-scoped clippy deny (Phase 90)
- `SEC-05`: `docs/security/unsafe-audit.md` published (Phase 90)
Code Hygiene
- `HYGIENE-01`: all files ≤ 1500 lines per project rule (Phase 91)
- `HYGIENE-02`: unified `ConnectionCore` state machine; three handlers reduced to thin adapters (Phase 91)
- `HYGIENE-03`: `src/lib.rs` `#![allow(...)]` list audited and justified (Phase 91)
Observability
- `METRICS-01`: Prometheus `/metrics` on admin port with full metric set (Phase 92)
- `SLOWLOG-01`: Redis-compatible SLOWLOG commands (Phase 92)
- `HEALTH-01`: `/healthz` + `/readyz` endpoints (Phase 92)
- `TRACE-01`: structured tracing spans with sampling (Phase 92)
- `INFO-01`: INFO parity with Redis 7.x (Phase 92)
- `CONFIG-01`: `moon --check-config` validator (Phase 92)
- `CONFIG-02`: TLS SIGHUP hot-reload (Phase 92)
Durability Proof
- `OFFLOAD-02`: `feat/disk-offload` merged; recovery v3 validated (Phase 93)
- `CRASH-01`: crash-injection matrix green (Phase 94)
- `TORN-01`: torn-write replay clean (Phase 94)
- `OFFLOAD-01`: disk-offload SIGKILL crash test clean (Phase 94)
- `JEPSEN-01`: Jepsen-lite linearizability green (Phase 94)
- `BACKUP-01`: BGSAVE → restore `DEBUG DIGEST` parity; RTO recorded in runbook (Phase 94)
Replication Hardening
- `REPL-01..06`: PSYNC partial + full, partition, kill-restart, lag metric, promotion (Phase 95)
Compatibility Matrix
- `COMPAT-01`: 8-client CI matrix green (Phase 96)
- `COMPAT-02`: vector client smoke tests (Phase 96)
- `COMPAT-03`: `docs/redis-compat.md` published (Phase 96)
- `COMPAT-04`: Redis TCL subset in CI (Phase 96)
Performance SLO Lock-In
- `PERF-01`: Criterion regression gate active in CI (Phase 97)
- `PERF-02`: 24 h HDR histogram rig on reference hardware; numbers above promoted from `[provisional]` (Phase 97)
- `PERF-03`: RSS-per-1M-keys memory gate (Phase 97)
- `PERF-04`: x86_64 monoio accept-loop regression closed (Phase 97)
- `PERF-05`: 7-day soak clean (Phase 97)
Security Hardening
- `SEC-01`: `cargo audit` + `cargo deny` CI blocking (Phase 98)
- `SEC-02`: SBOM + cosign signing (Phase 98)
- `SEC-03`: `docs/THREAT-MODEL.md` published (Phase 98)
- `SEC-04`: `docs/security/lua-sandbox.md` published (Phase 98)
- `SEC-06`: TLS cipher allowlist frozen + cert rotation tested (Phase 98)
- `SEC-07`: `SECURITY.md` disclosure policy published (Phase 98)
Release Engineering
- `REL-01`: `docs/versioning.md` + `MOON_FORMAT_VERSION` (Phase 99)
- `REL-02`: Upgrade/downgrade tests (Phase 99)
- `REL-03`: Artifacts: musl aarch64+x86_64, deb, rpm, Docker, systemd (Phase 99)
- `REL-04`: CHANGELOG CI gate (Phase 99)
- `REL-05`: Operator runbooks (runbooks/) (Phase 99)
- `REL-06`: User docs (getting-started, config, commands, tuning, migration) (Phase 99)
- `REL-07`: Tag-triggered release pipeline (Phase 99)
GA Gate
- `GA-01`: `v0.1.3-rc1` tagged + 4-week RC soak with 0 P0 bugs (Phase 100)
- `GA-02`: Every checkbox above ticked green → `v0.1.3` final → v1.0 candidate (Phase 100)
Revision history
| Date | Change | Phase |
|---|---|---|
| 2026-04-08 | Initial publication: provisional SLO numbers from v0.1.2 benchmark memory; full checklist structure locked | 87 |