Changelog
All notable changes to this project will be documented in this file. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Fixed — SQ8 vector quantization now works across the full FT lifecycle
- FT.CREATE ... QUANTIZATION SQ8 was a declared-but-unimplemented quantizer — vectors fell through to the TQ4 encoder against an empty codebook, producing degenerate codes and random search (recall 0.014; exact-match returned the wrong key) on real embeddings. Implemented a real per-vector affine 8-bit scalar quantizer (dim u8 codes + a (min, scale) trailer) and wired it through the entire segment lifecycle: append, brute-force search, compact, immutable HNSW search, multi-segment search fan-out, segment merge, and persistence reload. Recall on real all-MiniLM-L6-v2 384d embeddings: 0.014 → 0.897, exact-match correct.
- append_transactional() corrupted SQ8 vectors — it wrote the TQ slot layout (padded/2 + 4) into dim + 8 SQ8 slots, corrupting the code stride for every transactionally-inserted or WAL-recovered SQ8 vector. Both append paths now share one encode_sq8_slot() helper.
- SQ8 ignored the index metric — an InnerProduct index ranked non-normalized inputs by magnitude. SQ8 now normalizes for both unit-sphere metrics (Cosine + InnerProduct), matching the rest of the engine; L2 keeps raw vectors for true Euclidean ranking.
- Docs: corrected the ≤384d quantization guidance to recommend SQ8 (the real 8-bit option) instead of the nonexistent "TQ8".
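A minimal sketch of the per-vector affine scalar quantizer described above, assuming the slot is the dim u8 codes followed by min and scale stored as little-endian f32 (8 bytes, hence dim + 8). The names encode_sq8_slot/decode_sq8_slot, the byte order, and the round-to-nearest choice are illustrative assumptions, not the engine's actual code.

```rust
/// Hypothetical SQ8 slot: `dim` u8 codes, then `min` and `scale` as
/// little-endian f32 (8 trailing bytes), so one slot is `dim + 8` bytes.
pub fn encode_sq8_slot(v: &[f32], normalize: bool) -> Vec<u8> {
    // Unit-sphere metrics (Cosine, InnerProduct) quantize the normalized
    // vector; L2 keeps raw values so Euclidean distances are preserved.
    let mut x: Vec<f32> = v.to_vec();
    if normalize {
        let norm = x.iter().map(|a| a * a).sum::<f32>().sqrt();
        if norm > 0.0 {
            x.iter_mut().for_each(|a| *a /= norm);
        }
    }
    let min = x.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Map [min, max] onto 0..=255; a constant vector gets scale 0.
    let scale = if max > min { (max - min) / 255.0 } else { 0.0 };
    let mut slot = Vec::with_capacity(x.len() + 8);
    for &a in &x {
        let code = if scale > 0.0 { ((a - min) / scale).round() } else { 0.0 };
        slot.push(code.clamp(0.0, 255.0) as u8);
    }
    slot.extend_from_slice(&min.to_le_bytes());
    slot.extend_from_slice(&scale.to_le_bytes());
    slot
}

/// Reconstructs approximate f32 values from a slot written above.
pub fn decode_sq8_slot(slot: &[u8], dim: usize) -> Vec<f32> {
    let mut min_b = [0u8; 4];
    let mut scale_b = [0u8; 4];
    min_b.copy_from_slice(&slot[dim..dim + 4]);
    scale_b.copy_from_slice(&slot[dim + 4..dim + 8]);
    let (min, scale) = (f32::from_le_bytes(min_b), f32::from_le_bytes(scale_b));
    slot[..dim].iter().map(|&c| min + c as f32 * scale).collect()
}
```

Storing min and scale per vector is what makes the quantizer affine and per-vector: each vector spends the full 0..=255 code range on its own value spread, at a fixed cost of 8 bytes per slot.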
Changed — Rust SDK released as moondb 0.2.0 on crates.io
moondb crate 0.1.1 → 0.2.0 — publishes the v0.2-era client API that has lived in-tree since the hybrid_search sparse upgrade (f4fcd5a). Breaking: text().hybrid_search() now takes sparse_field: Option<&str> and weights: [f64; 3] (was two-way [f64; 2]) and speaks the PARAMS-based wire format with @-prefixed field refs. Added: Client::connect_with_timeout for bulk-write workloads. Released as 0.2.0, not 0.1.2, because Cargo treats 0.1.x versions as compatible, and the signature change would break published 0.1.x consumers (lunaris-retrieve / lunaris-storage-moon 0.2.1 pin "0.1.1" with the two-way call). Aligns the SDK version with the Moon v0.2.0 server release.
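The 0.2.0-versus-0.1.2 choice follows from Cargo's default (caret) version requirements: for a 0.x version the leftmost non-zero component must match, so a requirement of "0.1.1" accepts 0.1.2 but not 0.2.0. A simplified model of that rule, ignoring pre-release tags and partial requirements such as "0.1"; the function name is hypothetical:

```rust
/// Simplified model of Cargo's caret requirement `req = "a.b.c"`.
/// Ignores pre-release/build metadata and partial requirements.
pub fn caret_matches(req: (u64, u64, u64), candidate: (u64, u64, u64)) -> bool {
    let (major, minor, patch) = req;
    // The candidate must be at least the requested version
    // (tuples compare lexicographically).
    if candidate < req {
        return false;
    }
    if major > 0 {
        candidate.0 == major // ^1.2.3 := >=1.2.3, <2.0.0
    } else if minor > 0 {
        candidate.0 == 0 && candidate.1 == minor // ^0.1.1 := >=0.1.1, <0.2.0
    } else {
        candidate == (0, 0, patch) // ^0.0.3 := exactly 0.0.3
    }
}
```

Under this rule a consumer pinning "0.1.1" would silently resolve to a 0.1.2 carrying the new three-way signature and fail to compile, while 0.2.0 sits outside the requirement and is only picked up by an explicit bump.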
[0.2.0] — 2026-06-06
The v0.2 enterprise beachhead. Built additively on per-shard WAL v3 + the dual-root manifest; no changes to the KV hot path, MVCC, page format, or transaction layer.
Changed — default persistence directory is the platform user-data dir (was: current directory)
- --dir now defaults to the platform user-data directory, created on first run: Linux $XDG_DATA_HOME/moon (or ~/.local/share/moon), macOS ~/Library/Application Support/moon, Windows %LOCALAPPDATA%\moon. Installed binaries no longer litter whatever directory they happen to be started from.
- Back-compat guard: if the startup directory already contains moon persistence data (appendonlydir, shard-0, dump.rdb, replication.state — the pre-v0.2.0 default layout), moon keeps using it and logs a warning, so upgrades never silently boot with an empty keyspace away from their data.
- Explicit --dir <path> / conf dir (including --dir .) opt out of auto-resolution entirely; environments with no HOME / LOCALAPPDATA fall back to the current directory with a warning. Docker (--dir /data) and the systemd package (dir /var/lib/moon) already pass explicit paths and are unaffected.
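A hedged sketch of the resolution order described above: explicit --dir wins, then the legacy-layout guard, then the platform directory, then the current-directory fallback. The os string and the injected env lookup are hypothetical seams added so the logic can be tested without touching the real environment; the actual implementation may resolve these differently.

```rust
use std::path::{Path, PathBuf};

/// Entries whose presence marks a pre-v0.2.0 layout in the startup dir.
const LEGACY_MARKERS: [&str; 4] = ["appendonlydir", "shard-0", "dump.rdb", "replication.state"];

#[derive(Debug, PartialEq)]
pub enum DirChoice {
    Explicit(PathBuf),
    LegacyCwd(PathBuf),   // existing data found: keep it and log a warning
    Platform(PathBuf),
    FallbackCwd(PathBuf), // no HOME / LOCALAPPDATA: warn and use cwd
}

pub fn resolve_dir(
    explicit: Option<&Path>,
    cwd: &Path,
    os: &str,
    env: &dyn Fn(&str) -> Option<String>,
) -> DirChoice {
    // --dir <path> / conf `dir` (including ".") bypasses auto-resolution.
    if let Some(p) = explicit {
        return DirChoice::Explicit(p.to_path_buf());
    }
    if LEGACY_MARKERS.iter().any(|m| cwd.join(m).exists()) {
        return DirChoice::LegacyCwd(cwd.to_path_buf());
    }
    let base = match os {
        "windows" => env("LOCALAPPDATA").map(PathBuf::from),
        "macos" => env("HOME").map(|h| PathBuf::from(h).join("Library/Application Support")),
        _ => env("XDG_DATA_HOME")
            .filter(|s| !s.is_empty())
            .map(PathBuf::from)
            .or_else(|| env("HOME").map(|h| PathBuf::from(h).join(".local/share"))),
    };
    match base {
        Some(b) => DirChoice::Platform(b.join("moon")),
        None => DirChoice::FallbackCwd(cwd.to_path_buf()),
    }
}
```

Checking the legacy markers before the platform directory is what makes the upgrade safe: an existing pre-v0.2.0 data directory always wins over the new default.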
Changed — default shard count is now 1 (was: auto-detect)
- --shards defaults to 1 instead of auto-detecting the CPU count. Single-shard gives the best throughput for non-pipelined workloads (cross-shard SPSC dispatch overhead dominates the cost of local lookups) and a deterministic persistence layout across hosts. --shards 0 remains the explicit auto-detect opt-in; nothing changes for deployments that pass --shards explicitly.
- Upgrade note: deployments that previously relied on the auto-detect default with appendonly yes will refuse to start after upgrading (ERR shard count changed (manifest=N, config=1)) — this is the intended data-loss guard. Start with --shards <N> matching the manifest, or see the new shard-count-change runbook (also linked from the error message itself, now as a full GitHub URL so installed binaries point somewhere reachable).
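A minimal sketch of the new default and the startup guard, assuming the manifest records the shard count it was written with. The function names are hypothetical, and the real error message additionally links the runbook.

```rust
/// New default: one shard unless the operator asks otherwise.
pub const DEFAULT_SHARDS: u32 = 1;

/// `configured == 0` is the explicit auto-detect opt-in; `detected` is the CPU count.
pub fn effective_shards(configured: u32, detected: u32) -> u32 {
    if configured == 0 { detected.max(1) } else { configured }
}

/// Refuse to start when the persisted layout was written with a different
/// shard count; booting anyway would replay data into the wrong shards.
pub fn check_shard_count(manifest: Option<u32>, config: u32) -> Result<(), String> {
    match manifest {
        Some(n) if n != config => Err(format!(
            "ERR shard count changed (manifest={}, config={})",
            n, config
        )),
        _ => Ok(()), // fresh data dir, or counts agree
    }
}
```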
Fixed — release pipeline verification + prerelease safety
- release.yml sign job now publishes Fulcio certificates (*.crt) alongside signatures: keyless cosign verify-blob requires the certificate, so the previous .sig-only output was unverifiable by users.
- Docker job no longer moves ghcr.io/pilotspace/moon:latest on prerelease tags (e.g. v0.2.0-rc.1); only stable releases repoint :latest. The versioned tag is always pushed.
Changed — v0.2.0 release prep
- Crate version bumped 0.1.12 → 0.2.0; INFO server moon_version now reports 0.2.0 (previously released binaries would have self-reported the stale crate version regardless of the git tag).
- release.yml: the homebrew-tap bump job is gated behind the HOMEBREW_TAP_ENABLED repository variable. Homebrew distribution is deferred; v0.2.0 ships via curl install.sh / install.ps1, .deb/.rpm, and Docker. Without the gate the job would hard-fail on every stable tag (missing HOMEBREW_TAP_TOKEN secret). packaging/bump-homebrew.sh and the formula template stay in-tree for later enablement.
Added — Temporal-decay traversal scoring (agent-memory recency)
User-facing recency bias for graph traversal: paths through recently
created edges win over stale ones. The decay engine (scorers, Dijkstra
composite cost) existed but was unreachable — shortestPath() hardcoded
lambda = 0. This wires it end to end.
- GRAPH.QUERY ... --decay <λ> [--time-weight <w>] — per-edge cost becomes |weight| + λ·w·age_seconds for shortestPath(); λ is 1/seconds, strictly validated. Decay off (no flag) keeps exact distance-only behavior: the age term contributes zero to every edge cost, so path choice is identical to pre-decay Moon. Applies to the read-only and GRAPH.PROFILE paths via ExecutionContext (same pattern as VALID_AT); write queries (CREATE/SET/DELETE/MERGE) reject the flag instead of silently ignoring it.
- FT.NAVIGATE ... DECAY <λ> — graph-expanded hits pay λ × age_seconds of their discovery edge on top of the hop penalty (a re-rank of the already-explored expansion, not a steer of the expansion itself); KNN direct hits unaffected.
- Edges stamp created_ms at insert from the shard-cached clock (zero syscall on the insert path). 0 = unknown is decay-neutral, so pre-upgrade edges never look maximally old. Distinct from the user-owned bi-temporal valid_from/valid_to.
- CSR segment format v3 — per-edge created_ms array (parallel to col_indices) survives freeze → disk → mmap → compaction, so decay sees true edge age after segments rotate. v1/v2 files keep loading (empty stamps = neutral); both parsers (heap + mmap zero-copy) are version-gated, plus a new csr_from_bytes fuzz target.
- Fixed (latent durability bug): compact_segments stamped merged segments version: 1 while the serializer always writes 48-byte v2+ NodeMeta records, so a vacuumed segment written to disk misparsed on reload (panic in debug, silent node_meta corruption in release). Merged segments are now v3 and carry per-edge stamps through dedup.
- Docs: guides/temporal.mdx decay section + commands.mdx; script coverage in test-commands.sh (6 DECAY cases) and test-consistency.sh (1/4/12-shard path-flip parity).
- Known gap: WAL-replayed not-yet-frozen edges re-stamp to replay time on restart (newest edges look new; bias direction preserved); CSR-resident edges keep exact age.
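A small sketch of the composite edge cost from the first bullet, fed into a plain O(V²) Dijkstra. The Edge struct, the function names, and the millisecond clock parameter are assumptions for illustration; the test reproduces the path-flip idea (decay off picks the shorter stale route, decay on picks the fresher one), not Moon's actual scripts.

```rust
/// Hypothetical edge record; `created_ms == 0` means "unknown age".
pub struct Edge { pub from: usize, pub to: usize, pub weight: f64, pub created_ms: u64 }

/// Composite cost |weight| + lambda * w * age_seconds; unknown age adds nothing.
pub fn edge_cost(e: &Edge, now_ms: u64, lambda: f64, time_weight: f64) -> f64 {
    let age_s = if e.created_ms == 0 {
        0.0
    } else {
        now_ms.saturating_sub(e.created_ms) as f64 / 1000.0
    };
    e.weight.abs() + lambda * time_weight * age_s
}

/// O(V^2) Dijkstra over the composite cost; returns the node path if reachable.
pub fn shortest_path(n: usize, edges: &[Edge], src: usize, dst: usize,
                     now_ms: u64, lambda: f64, w: f64) -> Option<Vec<usize>> {
    let mut dist = vec![f64::INFINITY; n];
    let mut prev = vec![usize::MAX; n];
    let mut done = vec![false; n];
    dist[src] = 0.0;
    for _ in 0..n {
        // Settle the closest reachable node that is not finished yet.
        let u = (0..n)
            .filter(|&i| !done[i] && dist[i].is_finite())
            .min_by(|&a, &b| dist[a].partial_cmp(&dist[b]).unwrap())?;
        done[u] = true;
        if u == dst {
            break;
        }
        for e in edges.iter().filter(|e| e.from == u) {
            let nd = dist[u] + edge_cost(e, now_ms, lambda, w);
            if nd < dist[e.to] {
                dist[e.to] = nd;
                prev[e.to] = u;
            }
        }
    }
    if !dist[dst].is_finite() {
        return None;
    }
    let mut path = vec![dst];
    let mut cur = dst;
    while cur != src {
        cur = prev[cur];
        path.push(cur);
    }
    path.reverse();
    Some(path)
}
```

With λ = 0 the age term vanishes and the result is the distance-only path, which is the "decay off" guarantee in the first bullet.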
Added — Ship moon as an installable application on macOS, Linux, and Windows
Five-PR milestone making moon installable on all three platforms
(consolidated entry; the sibling PRs carry the skip-changelog label).
- Windows native port — cross-platform positioned-I/O helper (src/util/file_ext.rs; pwrite/pread on unix, seek_write/seek_read loops on Windows), portable RawSocketFd alias with #[cfg(unix)]-gated connection-migration fd ops, compile_error! guard for jemalloc on Windows MSVC (mimalloc fallback), and Windows-only startup warnings (VirtualLock working-set, no memory guardrail). Build: --no-default-features --features runtime-tokio,graph,text-index.
- Graceful shutdown on SIGTERM — ctrlc termination feature with a single centralized handler in main.rs; systemctl stop / launchctl stop now flush AOF before exit (previously only SIGINT was caught).
- Version strings — INFO server reports redis_version:7.4.0 (client feature-gating) plus a new moon_version: field from CARGO_PKG_VERSION; HELLO/LOLWUT report the real moon version (previously hardcoded 0.1.0).
- moon.conf config file — redis-style key value config (moon /etc/moon/moon.conf or --config), CLI flags override the file; commented default shipped as packaging/moon.conf.example and installed to /etc/moon/moon.conf (deb/rpm, config|noreplace).
- Install channels — install.sh (Linux/macOS) and install.ps1 (Windows) one-liner installers with SHA256 verification; Homebrew tap automation (packaging/homebrew/moon.rb.tmpl + bump-homebrew.sh publishing to pilotspace/homebrew-moon on release).
- Release pipeline overhaul — 8-target matrix (linux gnu/musl x86_64/aarch64, macOS arm64/Intel, Windows x86_64 MSVC), all release binaries now include graph, text-index, console (previously missing from releases), prepare-console pnpm build job, checksums + cosign over all artifacts including .deb/.rpm, --prerelease for -rc tags, and a workflow_dispatch dry-run mode.
- Packaging fixes — nfpm license corrected to Apache-2.0, moon.service ExecStart path fixed (/usr/bin/moon), CI gains main-push-only check-windows and check-console jobs plus a macOS Intel cross-build check.
- Console build fix (PR #152) — graph components aligned with the pinned @cosmos.gl/graph 2.6.4 API (setConfigPartial → setConfig, no async ready hook); pnpm run build type-checks again, unblocking the Console Integration workflow and the release prepare-console job.
- RESP3 negotiation fix (PR #153) — the monoio (default) handler now syncs the wire codec when HELLO 3 negotiates RESP3; previously the codec stayed on RESP2 and silently flattened map/set/push frames, breaking RESP3 clients on connect (redis-py 8 defaults to RESP3).
- Windows runtime fixes (PR #154) — first fully green check-windows run: fsync helpers reworked for Windows semantics (no directory handles, writable handle for FlushFileBuffers); unsigned-Instant underflows fixed (autovacuum scheduling, rate-limiter cutoffs, mvcc test anchors); the tokio sharded handler no longer initiates connection migration on non-unix platforms (previously aborted clients mid-session in multishard workloads); GraphManifest::save now propagates parent-dir fsync errors.
- Known limitations (v0.2.0) — Windows binaries are not Authenticode-signed (SmartScreen warning); connection migration is unix-only (connections stay on the originating shard on Windows); Windows service wrapper + MSI deferred.
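The positioned-I/O helper in the Windows port bullet can be sketched on top of the standard library's platform FileExt traits. The function names pwrite_all/pread_exact are hypothetical; the point is that Windows' seek_write/seek_read may transfer fewer bytes than requested and need an explicit loop, while Unix already provides write_all_at/read_exact_at.

```rust
use std::fs::File;
use std::io;

/// Write all of `buf` at `offset` (Unix: pwrite, cursor untouched).
#[cfg(unix)]
pub fn pwrite_all(file: &File, buf: &[u8], offset: u64) -> io::Result<()> {
    use std::os::unix::fs::FileExt;
    file.write_all_at(buf, offset)
}

/// Fill `buf` from `offset` (Unix: pread, cursor untouched).
#[cfg(unix)]
pub fn pread_exact(file: &File, buf: &mut [u8], offset: u64) -> io::Result<()> {
    use std::os::unix::fs::FileExt;
    file.read_exact_at(buf, offset)
}

/// Windows: seek_write can be short, so loop. Unlike pwrite it also moves
/// the file cursor, so callers must not rely on the cursor afterwards.
#[cfg(windows)]
pub fn pwrite_all(file: &File, mut buf: &[u8], mut offset: u64) -> io::Result<()> {
    use std::os::windows::fs::FileExt;
    while !buf.is_empty() {
        match file.seek_write(buf, offset) {
            Ok(0) => return Err(io::Error::new(io::ErrorKind::WriteZero, "seek_write wrote 0 bytes")),
            Ok(n) => { buf = &buf[n..]; offset += n as u64; }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Windows: seek_read can be short too; 0 bytes means end of file.
#[cfg(windows)]
pub fn pread_exact(file: &File, mut buf: &mut [u8], mut offset: u64) -> io::Result<()> {
    use std::os::windows::fs::FileExt;
    while !buf.is_empty() {
        match file.seek_read(buf, offset) {
            Ok(0) => return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "short positioned read")),
            Ok(n) => { let tmp = buf; buf = &mut tmp[n..]; offset += n as u64; }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```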
Fixed — CodeRabbit PR #136 durability follow-ups + decomposition + test isolation (PR #144)
Closes the 8 CodeRabbit findings left open after PR #136, plus two PR #144-review Majors, oversized-file decomposition, and a parallel-test flake. No production hot-path behaviour change.
- Disk-offload spill (data-loss fixes): block instead of dropping spill completions; salvage inline-batch spill failures per-entry rather than wholesale; preserve spill context across tokio connection migration; recover the cold tier under appendonly=no.
- Per-shard AOF rewrite robustness: clear the rewrite flag when fan-out fails partway; roll per-shard rewrite writers back to the committed generation on abort (barrier-before-resume + panic-safe ShardDoneGuard); ack drained AppendSync only after the boundary fsync (issue #140 ordering).
- Async-spill eviction: corrected a stale doc comment, removed the dead remove-first eviction path, and added a regression test locking the fail-safe send-before-remove ordering (a full spill channel keeps the victim resident — no data loss).
- Platform hygiene: gate the migrated-connection spawn fns behind cfg(all(..., unix)) to match their RawFd usage.
- File decomposition (1500-line cap): split aof_manifest.rs (3058 → mod/shard_replay/shard_rewrite) and aof.rs (4379 → mod/pool/writer_task/rewrite); pure code relocation, verified line-exact on both runtimes.
- Test isolation: fixed the VECTOR_INDEXES counter flake — the process-global metrics counter is now guarded by an RwLock (delta-reader tests take write(), mutator tests take read()), making the index-count delta assertions deterministic under the parallel test harness.
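The send-before-remove ordering from the async-spill bullet can be sketched with a bounded std channel standing in for the real spill channel (an assumption; the function name and the String/Vec<u8> types are hypothetical). The victim leaves memory only after the hand-off succeeds, so a full channel leaves it resident instead of losing it.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical eviction step: hand the victim to the spill writer first,
/// and only remove it from the in-memory store once the send succeeded.
pub fn evict_one(
    store: &mut HashMap<String, Vec<u8>>,
    key: &str,
    spill: &SyncSender<(String, Vec<u8>)>,
) -> bool {
    let value = match store.get(key) {
        Some(v) => v.clone(),
        None => return false,
    };
    match spill.try_send((key.to_string(), value)) {
        Ok(()) => {
            store.remove(key); // safe: the spill writer now owns a copy
            true
        }
        // Channel full (or writer gone): keep the victim resident, no data loss.
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}
```

The removed remove-first variant would have taken the value out of the store before try_send, so a Full error would drop it on the floor.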
Persistence — Per-shard AOF migration complete (PR #129)
Closes the P0 multi-shard AOF data-loss bug (~50% loss on SIGKILL with
--shards >= 2 + --appendonly yes) by shipping the full per-shard AOF
architecture (Option B of tmp/rfc-per-shard-aof-v02.md).
- H2 closed — src/main.rs no longer skips multi-shard AOF replay. Each shard owns its own writer task; recovery walks every shard's segment manifest independently. Shard replay is parallel (recovery time does not grow linearly with shard count).
- H1 closed — new AppendSync { bytes, ack } rendezvous variant ensures +OK is on the wire only after fsync ack under appendfsync=always. try_send (everysec/no) paths unchanged.
- --unsafe-multishard-aof deprecated — was the v0.1.13 escape hatch acknowledging the ~50% loss risk. The flag is now a no-op that prints [DEPRECATED] at startup; will be removed in v0.2.0-rc.
- CRASH-01-LITE matrix — 200/200 SIGKILL recoveries across --shards 1/2/4/8 × appendfsync always/everysec. Gated in .github/workflows/integration-tests.yml; run locally with cargo test --release crash_01_lite on the moon-dev OrbStack VM.
- TopLevel-manifest safety guard — Moon refuses to start (exit 2) when it finds an existing v0.1.13-style TopLevel manifest with --shards >= 2, to prevent silent data loss from replaying a non-routed log. Migration via docs/runbooks/multi-shard-aof-rewrite.md Option A.
- INFO persistence — new field aof_backpressure_dropped:<N> exposes per-shard writer drop counts; non-zero indicates the AOF writer is falling behind write throughput.
- Per-shard layout on disk: appendonlydir/shard-{N}/moon.aof.{seq}.base.rdb + moon.aof.{seq}.incr.aof, mirroring the per-shard WAL v3 design.
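The H1 rendezvous can be sketched with std channels: the handler sends the bytes plus a reply channel and emits +OK only after the writer answers, which it does after its fsync. AofMsg, spawn_writer, and handle_write are hypothetical names, and the injected fsync closure stands in for writing and syncing the shard's AOF file.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical writer message. `Append` is the fire-and-forget path
/// (everysec/no); `AppendSync` carries an ack answered after durability.
pub enum AofMsg {
    Append(Vec<u8>),
    AppendSync { bytes: Vec<u8>, ack: Sender<Result<(), String>> },
}

/// Per-shard writer loop; `fsync` writes and syncs the buffered bytes.
pub fn spawn_writer(
    mut fsync: impl FnMut(&[u8]) -> Result<(), String> + Send + 'static,
) -> Sender<AofMsg> {
    let (tx, rx): (Sender<AofMsg>, Receiver<AofMsg>) = channel();
    thread::spawn(move || {
        let mut pending: Vec<u8> = Vec::new();
        for msg in rx {
            match msg {
                AofMsg::Append(b) => pending.extend_from_slice(&b),
                AofMsg::AppendSync { bytes, ack } => {
                    pending.extend_from_slice(&bytes);
                    let res = fsync(&pending); // earlier Appends become durable too
                    pending.clear();
                    let _ = ack.send(res); // the client may have disconnected
                }
            }
        }
    });
    tx
}

/// Handler side under appendfsync=always: reply only after the ack arrives.
pub fn handle_write(writer: &Sender<AofMsg>, cmd: &[u8]) -> &'static str {
    let (ack_tx, ack_rx) = channel();
    if writer.send(AofMsg::AppendSync { bytes: cmd.to_vec(), ack: ack_tx }).is_err() {
        return "-ERR AOF writer unavailable";
    }
    match ack_rx.recv() {
        Ok(Ok(())) => "+OK",
        _ => "-ERR fsync failed",
    }
}
```

Because the channel is FIFO, a sync append also covers any earlier fire-and-forget appends still buffered in the writer.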
Architectural follow-ups parked for v0.2.0:
- Rule 3 (LSN ordering invariant) under per-shard topology — issue #131
- Always-fsync handler-layer integration audit — issue #132
- OrderedAcrossShards merge-replay correctness on large transcripts — issue #133
Per-shard BGREWRITEAOF (step 6 of the migration RFC) is not yet in
this PR; rewriting a per-shard AOF returns ERR BGREWRITEAOF is not yet
supported under per-shard AOF layout. Tracked for v0.2.0.
Headline capabilities landed in alpha:
- Point-in-Time Recovery (PITR) — --recovery-target-lsn / --recovery-target-time restore to any LSN or wall-clock boundary inside the WAL retention window.
- Change Data Capture (CDC) — CDC.READ polling command with Debezium-compatible JSON envelopes, resumable cursors, segment-rotation safety.
- Hash-field TTL — full Valkey 9.0 / 9.1 surface (HEXPIRE / HPEXPIRE / HEXPIREAT / HPEXPIREAT / HEXPIRETIME / HPEXPIRETIME / HTTL / HPTTL / HPERSIST / HGETDEL / HGETEX) with O(1) HGET + HLEN fast path. Three-way benchmark vs Redis 8.0.2 / Valkey 9.1.0 ships in docs/perf/2026-05-27-hash-ttl-3way-bench.md.
- Tier 2 Lane A — SWAPDB, MOVE, COPY ... DB n, CLUSTER REPLICAS/SLAVES, CLUSTER COUNT-FAILURE-REPORTS. All WAL-durable with cross-shard atomic semantics.
- Storage format v1 commitment — RDB v2 + WAL v3 + multi-part AOF manifest grouped under a single --storage-format v1 umbrella with ≥18-month LTS forward-read guarantees.
- Embedded sharded server — server::embedded::run_embedded(config, cancel) exposes the full sharded handler (with TXN.*) to in-process embedders.
What is not yet in alpha: PITR live-snapshot LSN wiring (P3c),
CDC.SUBSCRIBE push channel (C3b), and the multi-shard master PSYNC
deferred from v0.1.10. Tracked in .planning/rfcs/v02-enterprise-architecture.md.
Fixed — Multishard idle-RAM blowup + tokio-Linux serving hang + graph parser DoS (PR #136)
Closes the "multishard RAM zombie": a fresh multishard instance with no
maxmemory could commit multiple GB of RSS while idle, and the tokio
runtime could hang its accept loop on Linux. Three independent root
causes, all fixed and verified in the OrbStack VM under a cgroup memory
cap (idle RSS 3791 MB → 29 MB for a 4-shard no-maxmemory instance;
~45 MB under load):
- PageCache eager pre-allocation. Each shard committed num_frames × PAGE zeroed bytes at construction, sized to 25% of the whole-instance maxmemory (or a large default when unset). Now the page buffers allocate lazily and the frame budget is divided by shard count (per_shard_pagecache_budget); a startup WARN reports the resolved per-shard budget.
- tokio listener bind-race. The central accept listener bound the port without SO_REUSEPORT while per-shard listeners bound with it, producing a bind-order race that could leave the port served by a shard that never received connections. The central listener now also binds SO_REUSEPORT; --per-shard-accept defaults to false under tokio.
- io_uring-under-tokio default-off. The tokio→io_uring bridge floods errors under load. It is now opt-in via MOON_URING=1; tokio shards run plain epoll/kqueue by default. MOON_NO_URING=1 still force-disables io_uring everywhere. The monoio runtime is unaffected (always io_uring unless MOON_NO_URING).
Two durability defects and a parser DoS found during review were fixed in-PR with red/green TDD:
- Manifest compact-reopen failure no longer silently loses commits. manifest.rs compact() reopened self.file after rename(tmp, path); if that reopen failed, later commits silently wrote to an orphaned inode. A needs_reopen flag now reattaches to self.path before any subsequent commit (or fails loudly).
- tokio per-shard AOF writer now latches after a torn write. The tokio per-shard writer lacked the write_error latch present on the single-file and monoio writers; a torn write (header OK, data fails) corrupted the frame stream. It now latches on any write failure and acks WriteFailed.
- Cypher parser stack-overflow DoS bounded. The precedence-climbing expression grammar pushes ~9 stack frames per source nesting level but check_depth() counts only one level per recursion, so the previous limit of 64 overflowed a 2 MiB async-worker stack (SIGABRT) before the guard could fire. DEFAULT_MAX_NESTING_DEPTH is now 32 (~288 worst-case debug frames, ~50% margin); deep nesting returns NestingDepthExceeded gracefully.
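The depth guard can be illustrated with a toy recursive-descent grammar (an assumption; Moon's Cypher grammar is precedence-climbing and far larger): increment and check the depth before recursing, so an over-deep input fails with NestingDepthExceeded instead of exhausting the stack. At ~9 frames per nesting level, the new limit bounds recursion to about 32 × 9 = 288 frames.

```rust
pub const DEFAULT_MAX_NESTING_DEPTH: usize = 32;

#[derive(Debug, PartialEq)]
pub enum ParseError {
    NestingDepthExceeded,
    Unexpected(usize), // byte offset of the offending input
}

/// Toy grammar: expr := '(' expr ')' | digit+. One depth unit per '('.
pub struct Parser<'a> { src: &'a [u8], pos: usize, depth: usize, max_depth: usize }

impl<'a> Parser<'a> {
    pub fn new(src: &'a str, max_depth: usize) -> Self {
        Parser { src: src.as_bytes(), pos: 0, depth: 0, max_depth }
    }

    fn check_depth(&self) -> Result<(), ParseError> {
        if self.depth > self.max_depth { Err(ParseError::NestingDepthExceeded) } else { Ok(()) }
    }

    pub fn expr(&mut self) -> Result<u64, ParseError> {
        if self.src.get(self.pos) == Some(&b'(') {
            self.pos += 1;
            self.depth += 1;
            self.check_depth()?; // fail before recursing any deeper
            let v = self.expr()?;
            if self.src.get(self.pos) != Some(&b')') {
                return Err(ParseError::Unexpected(self.pos));
            }
            self.pos += 1;
            self.depth -= 1;
            Ok(v)
        } else {
            let start = self.pos;
            while self.src.get(self.pos).map_or(false, |c| c.is_ascii_digit()) {
                self.pos += 1;
            }
            if start == self.pos {
                return Err(ParseError::Unexpected(self.pos));
            }
            std::str::from_utf8(&self.src[start..self.pos])
                .unwrap() // only ASCII digits were consumed
                .parse::<u64>()
                .map_err(|_| ParseError::Unexpected(start))
        }
    }
}
```

The limit has to be derived from frames per level, not levels alone: a guard counting one unit per recursion only protects the stack if the per-level frame cost is accounted for when choosing the constant.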
Low-severity durability edge cases filed as follow-ups: #137
(apply_spill_completions failed-commit cold-key window), #138
(do_rewrite_per_shard panic wedges --experimental-per-shard-rewrite),
#139 (multi-DB SELECT >0 cold recovery restores db0 only).
Fixed — maxmemory is now a whole-instance cap across shards (G2)
Behavior change for multishard deployments. Previously each shard
enforced eviction against the full maxmemory, so an N-shard server
tolerated ~N× the configured cap before evicting (a 4-shard server at
--maxmemory 100mb retained ~307 MB / 768K keys vs a 1-shard server's
124 MB / 322K — the "RAM keeps growing in multishard mode" report).
maxmemory is now a true whole-instance cap. Each shard enforces
eviction against maxmemory / num_shards, so aggregate RSS converges on
the configured value regardless of shard count.
- CONFIG GET maxmemory / INFO are unchanged — they report the whole-instance value verbatim (Redis-compatible). Division happens only at enforcement.
- Operators running explicit --maxmemory N --shards M (M>1) now get an effective ceiling of N (not N×M). A startup log line states the resolved per-shard budget so the change is visible: maxmemory <N> bytes is a whole-instance cap; each of <M> shards enforces eviction against a per-shard budget of <N/M> bytes.
- Single-shard servers are byte-for-byte unaffected (num_shards == 1 ⇒ no division).
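Enforcement reduces to one integer division per shard. A minimal sketch with hypothetical function names; the log wording is taken from the entry above.

```rust
/// Per-shard eviction budget derived from the whole-instance cap.
/// One shard means no division, so single-shard behavior is unchanged.
pub fn per_shard_budget(maxmemory: u64, num_shards: u64) -> u64 {
    if num_shards <= 1 { maxmemory } else { maxmemory / num_shards }
}

pub fn startup_log_line(maxmemory: u64, num_shards: u64) -> String {
    format!(
        "maxmemory {} bytes is a whole-instance cap; each of {} shards enforces eviction against a per-shard budget of {} bytes",
        maxmemory,
        num_shards,
        per_shard_budget(maxmemory, num_shards)
    )
}
```

Integer division rounds down, so the shards' combined budget never exceeds the configured cap (it can fall short by at most num_shards - 1 bytes).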
Docs — Hash-field TTL three-way benchmark suite (PR #127)
- scripts/bench-hash-ttl.sh (2-way harness) + scripts/bench-hash-ttl-3way.sh (3-way harness with Valkey 9.1.0). Both use redis-benchmark against concurrent Moon / Redis / Valkey servers on distinct ports with trap-based cleanup, FLUSHALL + re-seed between scenarios, and median-of-3 RPS reporting.
- docs/perf/2026-05-26-hash-ttl-bench.md — pre-fix 2-way baseline that surfaced the two HashWithTtl perf issues fixed in PR #126.
- docs/perf/2026-05-27-hash-ttl-3way-bench.md — headline Moon vs Redis 8.0.2 vs Valkey 9.1.0 comparison across 26 scenarios. Plain HGET p=16 ties both competitors (1.00–1.01×). HEXPIRE-family Moon vs Valkey: 0.90–0.99× across the surface; HGETEX hits parity at 0.99×. Redis 8.x has no HEXPIRE-family — Moon is the only Redis-compatible alternative aside from Valkey.
Performance — HashWithTtl HGET + HLEN O(1) fast path (PR #126)
Resolves the two HashWithTtl perf issues surfaced by the 2026-05-26 bench:
- HGET on HashWithTtl was 39.9% slower than plain Hash at p=16; now +4% faster (within VM measurement noise, i.e. effective parity).
- HLEN on HashWithTtl was 80.5× slower than plain Hash at 1000 fields; now 1.00–1.03× parity (the O(N) live-count scan is fully eliminated when no field has expired).
Two changes shipped together (variant layout forces both at once):
- HashWithTtl.ttls BTreeMap → HashMap. Per-field TTL probe becomes an O(1) HashMap lookup instead of an O(log N) BTreeMap descent. Active-expire iteration doesn't require ordered keys.
- HashWithTtl.min_expiry_ms: u64 cached minimum. Tracks the smallest expiry across all per-field TTLs. Invariant: min_expiry_ms = min(ttls.values()). When cached_now_ms < min_expiry_ms, no field can be expired, so HGET skips the ttls probe and HLEN returns fields.len() directly. The hot path is a single u64 < u64 compare. Invariant maintenance is amortized O(1) (one min(min, ts_ms) per HEXPIRE; conditional recompute on HPERSIST / overwrite / active reap when the removed TTL equalled the min).
No on-disk format change. min_expiry_ms is recomputed at load time from the
existing v2 RDB per-field TTL trailer. 6 new invariant tests cover HEXPIRE /
HPERSIST / HSET-overwrite / active-reap / persistence-decode paths.
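A hedged model of the two changes together, using Vec<u8> in place of Bytes and treating a field as expired once now_ms >= its expiry (an assumption). The struct mirrors the description above, not Moon's actual type; the recompute on raising an existing minimum is an extra step this sketch needs to keep the invariant exact.

```rust
use std::collections::HashMap;

/// Invariant: min_expiry_ms == min(ttls.values()), or u64::MAX when empty.
pub struct HashWithTtl {
    fields: HashMap<Vec<u8>, Vec<u8>>,
    ttls: HashMap<Vec<u8>, u64>,
    min_expiry_ms: u64,
}

impl HashWithTtl {
    pub fn new() -> Self {
        HashWithTtl { fields: HashMap::new(), ttls: HashMap::new(), min_expiry_ms: u64::MAX }
    }

    /// HSET overwrite clears the field's TTL.
    pub fn hset(&mut self, f: &[u8], v: &[u8]) {
        self.fields.insert(f.to_vec(), v.to_vec());
        self.remove_ttl(f);
    }

    pub fn hexpire_at(&mut self, f: &[u8], abs_ms: u64) {
        if !self.fields.contains_key(f) {
            return;
        }
        let old = self.ttls.insert(f.to_vec(), abs_ms);
        if old == Some(self.min_expiry_ms) && abs_ms > self.min_expiry_ms {
            self.recompute_min(); // the previous minimum was raised
        } else {
            self.min_expiry_ms = self.min_expiry_ms.min(abs_ms); // O(1) common case
        }
    }

    pub fn hpersist(&mut self, f: &[u8]) {
        self.remove_ttl(f);
    }

    fn remove_ttl(&mut self, f: &[u8]) {
        // Recompute only when the removed TTL was the cached minimum.
        if self.ttls.remove(f) == Some(self.min_expiry_ms) {
            self.recompute_min();
        }
    }

    fn recompute_min(&mut self) {
        self.min_expiry_ms = self.ttls.values().copied().min().unwrap_or(u64::MAX);
    }

    pub fn hget(&self, f: &[u8], now_ms: u64) -> Option<&[u8]> {
        let v = self.fields.get(f)?.as_slice();
        if now_ms < self.min_expiry_ms {
            return Some(v); // fast path: nothing can have expired yet
        }
        match self.ttls.get(f) {
            Some(&exp) if now_ms >= exp => None,
            _ => Some(v),
        }
    }

    pub fn hlen(&self, now_ms: u64) -> usize {
        if now_ms < self.min_expiry_ms {
            return self.fields.len(); // fast path: no O(N) live-count scan
        }
        self.fields.keys().filter(|k| self.ttls.get(*k).map_or(true, |&e| now_ms < e)).count()
    }
}
```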
Docs — Hash-field TTL audit follow-up (PR #123)
- docs/commands.mdx — Hashes section bumped from "(14)" → "(25)" and now lists all 11 new Valkey 9.0/9.1 commands plus a paragraph on the per-field return convention and the active-expiry downgrade behaviour.
- docs/STORAGE-FORMAT-V1.md §3.2 — added the per-field hash TTL trailer format: every TYPE_HASH body is followed by [ttl_count u32][field_len varint | field_bytes | ttl_ms u64]* in v2 RDB files; v1 readers stop after the hash body.
- src/command/metadata.rs — HEXPIRETIME/HPEXPIRETIME/HTTL/HPTTL PHF flags flipped from R (READONLY) to RF (READONLY|FAST), since per-field TTL lookup is O(1) thanks to the PR #126 fast path. Cosmetic; no behavioural impact.
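A sketch of encoding and decoding that trailer. The entry does not specify byte order or the varint scheme, so little-endian fixed-width integers and unsigned LEB128 for field_len are assumptions here; STORAGE-FORMAT-V1.md §3.2 is the normative layout. Function names are hypothetical.

```rust
/// Append an unsigned LEB128 varint (assumed encoding for field_len).
fn put_varint(out: &mut Vec<u8>, mut v: u64) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80); // continuation bit: more bytes follow
    }
}

fn get_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut v = 0u64;
    for shift in (0..64).step_by(7) {
        let b = *buf.get(*pos)?;
        *pos += 1;
        v |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            return Some(v);
        }
    }
    None // more than 10 bytes: malformed
}

/// [ttl_count u32][field_len varint | field_bytes | ttl_ms u64]*
pub fn encode_ttl_trailer(ttls: &[(&[u8], u64)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(ttls.len() as u32).to_le_bytes());
    for (field, ttl_ms) in ttls {
        put_varint(&mut out, field.len() as u64);
        out.extend_from_slice(field);
        out.extend_from_slice(&ttl_ms.to_le_bytes());
    }
    out
}

/// Returns None on any truncation instead of reading past the buffer.
pub fn decode_ttl_trailer(buf: &[u8]) -> Option<Vec<(Vec<u8>, u64)>> {
    let mut count_bytes = [0u8; 4];
    count_bytes.copy_from_slice(buf.get(0..4)?);
    let count = u32::from_le_bytes(count_bytes);
    let mut pos = 4;
    let mut out = Vec::new();
    for _ in 0..count {
        let len = get_varint(buf, &mut pos)? as usize;
        let field = buf.get(pos..pos.checked_add(len)?)?.to_vec();
        pos += len;
        let mut ttl = [0u8; 8];
        ttl.copy_from_slice(buf.get(pos..pos + 8)?);
        pos += 8;
        out.push((field, u64::from_le_bytes(ttl)));
    }
    Some(out)
}
```

A hash without per-field TTLs still gets the trailer, as a bare ttl_count of 0 (4 bytes), which is what lets a v2 reader parse every TYPE_HASH body the same way.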
Added — HGETDEL / HGETEX atomic compound hash commands (phase 199, issue #110)
Two new Valkey 9.1 atomic compound hash commands:
- HGETDEL key FIELDS numfields field [field ...] — returns the values of the specified fields and deletes them from the hash atomically. Returns a RESP Array with one BulkString(value) per found field and Null for missing fields. If the hash becomes empty after all deletes the key is removed entirely (auto-cleanup).
- HGETEX key [EX s | PX ms | EXAT unix-s | PXAT unix-ms | PERSIST] FIELDS numfields field [field ...] — returns the values of the specified fields and optionally updates (or removes) their per-field TTLs atomically. TTL modes:
  - EX s — set relative expiry in seconds from now.
  - PX ms — set relative expiry in milliseconds from now.
  - EXAT unix-s — set absolute expiry as unix seconds.
  - PXAT unix-ms — set absolute expiry as unix milliseconds.
  - PERSIST — remove any existing per-field TTL.
  - (no mode) — pure read; no TTL change (fast path, zero DB mutation).
TTL changes apply only to live (non-expired) fields. Missing / expired
fields return
Nulland leave TTLs untouched.
Atomicity: per-shard single-threaded execution gives atomicity for free across the entire field list — no client can observe a partial state between reads and deletes/TTL-updates within a single call.
Implementation:
- Two new Database primitives in src/storage/db.rs:
- hash_get_and_delete_field — atomically reads and removes a single
field; handles Hash, HashListpack, and HashWithTtl (also removes TTL
sidecar). Downgrades HashWithTtl → Hash when the last TTL is removed.
- cleanup_empty_hash — removes the key when its hash has become empty;
called once after a HGETDEL/HGETEX field loop.
- HGETDEL handler uses parse_key_and_fields (phase-198 shared parser)
plus a single cleanup_empty_hash call after the field loop.
- HGETEX parser (parse_hgetex_args) scans the optional mode token(s)
before FIELDS with mutual-exclusion enforcement; uses saturating i128
arithmetic + u64 clamp for safe overflow handling (mirrors phase 196).
- Both commands dispatch as writes (WF flags, &mut Database); neither is
added to is_dispatch_read_supported.
- 17 new unit tests: 8 for HGETDEL, 9 for HGETEX.
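A hedged sketch of the HGETEX mode handling described above: scan mode tokens before FIELDS, rejecting a second mode, then convert the mode into an absolute unix-ms expiry with saturating i128 arithmetic clamped into u64. TtlMode, TtlAction, parse_mode, resolve_ttl, and the error strings are hypothetical, not the actual parse_hgetex_args.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TtlMode { Ex(i64), Px(i64), ExAt(i64), PxAt(i64), Persist, NoChange }

#[derive(Debug, PartialEq)]
pub enum TtlAction { SetAbsMs(u64), Persist, Keep }

/// Parse mode tokens up to FIELDS; returns the mode and the FIELDS index
/// (args.len() if FIELDS is absent, which the caller must reject).
pub fn parse_mode(args: &[&str]) -> Result<(TtlMode, usize), String> {
    let mut mode = TtlMode::NoChange;
    let mut i = 0;
    while i < args.len() && !args[i].eq_ignore_ascii_case("FIELDS") {
        if mode != TtlMode::NoChange {
            return Err("ERR mutually exclusive TTL options".into());
        }
        let tok = args[i].to_ascii_uppercase();
        if tok == "PERSIST" {
            mode = TtlMode::Persist;
            i += 1;
            continue;
        }
        let n = args
            .get(i + 1)
            .ok_or("ERR syntax error")?
            .parse::<i64>()
            .map_err(|_| "ERR value is not an integer".to_string())?;
        mode = match tok.as_str() {
            "EX" => TtlMode::Ex(n),
            "PX" => TtlMode::Px(n),
            "EXAT" => TtlMode::ExAt(n),
            "PXAT" => TtlMode::PxAt(n),
            _ => return Err("ERR syntax error".into()),
        };
        i += 2;
    }
    Ok((mode, i))
}

/// i128 cannot overflow for i64 inputs scaled by 1000; the clamp maps
/// negative results to 0 and oversized ones to u64::MAX.
pub fn resolve_ttl(mode: TtlMode, now_ms: u64) -> TtlAction {
    let now = now_ms as i128;
    let abs: i128 = match mode {
        TtlMode::Ex(s) => now.saturating_add((s as i128).saturating_mul(1000)),
        TtlMode::Px(ms) => now.saturating_add(ms as i128),
        TtlMode::ExAt(s) => (s as i128).saturating_mul(1000),
        TtlMode::PxAt(ms) => ms as i128,
        TtlMode::Persist => return TtlAction::Persist,
        TtlMode::NoChange => return TtlAction::Keep,
    };
    TtlAction::SetAbsMs(abs.clamp(0, u64::MAX as i128) as u64)
}
```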
Added — Hash-field TTL read + persist (phase 198, issue #109)
Five new Valkey 9.0 hash-field TTL commands:
- HEXPIRETIME key FIELDS numfields field [field ...] — absolute expiry per field as a unix timestamp in seconds.
- HPEXPIRETIME key FIELDS numfields field [field ...] — absolute expiry per field as a unix timestamp in milliseconds.
- HTTL key FIELDS numfields field [field ...] — remaining TTL per field in seconds; already-expired-but-not-reaped fields return 0.
- HPTTL key FIELDS numfields field [field ...] — remaining TTL per field in milliseconds; same 0 edge-case for expired-but-not-reaped.
- HPERSIST key FIELDS numfields field [field ...] — removes the per-field TTL; downgrades HashWithTtl back to plain Hash when the last TTL is removed (handled by the phase-195 hash_persist_field primitive).
Per-field return codes (Valkey 9.0):
- -2 — field does not exist (or key is missing — not a WRONGTYPE error)
- -1 — field exists but has no TTL
- ≥0 — absolute unix time (HEXPIRETIME/HPEXPIRETIME) or remaining duration (HTTL/HPTTL)
- 1 — TTL successfully removed (HPERSIST only)
WRONGTYPE is returned immediately (before field iteration) when the key holds a non-hash value.
Implementation:
- FieldState tri-state enum and hash_field_state helper (pre-landed in
phase 197) provide the zero-allocation field-state read used by all five
commands.
- parse_key_and_fields shared parser (hash_write.rs) extracts
key FIELDS numfields field [field ...]; reuses SmallVec<[&[u8]; 4]>
to avoid heap allocation for the common ≤4-field case.
- Four read handlers (HEXPIRETIME, HPEXPIRETIME, HTTL, HPTTL) take
&Database; HPERSIST takes &mut Database.
- All five commands are routed through dispatch() (mutable path); the four
read commands are additionally routed through dispatch_read_inner() /
is_dispatch_read_supported() (shared-read path).
- 14 unit tests cover all return-code variants, the already-expired edge
case, WRONGTYPE, missing key, numfields=0, and encoding downgrade.
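The per-field return codes map directly from the tri-state FieldState. A minimal sketch with hypothetical helper names; truncating division for HTTL seconds is an assumption, and the server may round differently.

```rust
/// Mirrors the tri-state read result FieldState::{Missing, NoTtl, Ttl(u64)}.
#[derive(Debug, Clone, Copy)]
pub enum FieldState { Missing, NoTtl, Ttl(u64) }

/// HTTL (unit_ms = 1000) / HPTTL (unit_ms = 1): remaining time per field.
pub fn ttl_reply(state: FieldState, now_ms: u64, unit_ms: u64) -> i64 {
    match state {
        FieldState::Missing => -2,
        FieldState::NoTtl => -1,
        // Expired but not yet reaped: saturate to 0 rather than go negative.
        FieldState::Ttl(abs_ms) => {
            (abs_ms.saturating_sub(now_ms) / unit_ms).min(i64::MAX as u64) as i64
        }
    }
}

/// HEXPIRETIME (unit_ms = 1000) / HPEXPIRETIME (unit_ms = 1): absolute time.
pub fn expiretime_reply(state: FieldState, unit_ms: u64) -> i64 {
    match state {
        FieldState::Missing => -2,
        FieldState::NoTtl => -1,
        FieldState::Ttl(abs_ms) => (abs_ms / unit_ms).min(i64::MAX as u64) as i64,
    }
}
```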
Added — Hash-field active expiration (phase 197, issue #108)
All 9 hash read commands (HGET, HMGET, HGETALL, HEXISTS, HLEN,
HKEYS, HVALS, HSCAN, HRANDFIELD) now respect per-field TTLs set by
the HEXPIRE family. Expired fields are invisible to callers without
requiring the active-expiry tick to have run first (lazy expiry).
Lazy expiry (read path, &Database): HashRef gains a third variant
WithTtl { fields, ttls, now_ms } that filters expired fields on every
field-level operation. The get_hash_ref_if_alive accessor now returns this
variant for HashWithTtl entries instead of falling through to WRONGTYPE.
All 9 read commands on the mutable dispatch path now call get_hash_ref_if_alive
instead of the unfiltered get_hash, so expired fields are never returned.
Active expiry (tick path, &mut Database): the per-shard expire_cycle
gains a second sweep via Database::hashes_with_field_expiry() and
Database::reap_expired_fields_one_hash(). The reaper removes expired
fields from the fields and ttls maps, downgrades the hash back to plain
Hash when the last TTL sidecar entry is drained, and signals KeyDeleted
when all fields expire (the key is then removed by the caller).
The maybe_has_expiring_keys fast-path flag is now cleared only when
both the whole-key sweep and the hash-field sweep return empty, preventing
premature flag-clearing that would have silenced future field reaping.
Complexity change: HLEN is now O(N) for HashWithTtl hashes (counts
only live fields). Plain Hash and HashListpack remain O(1) / O(N)
respectively — no regression.
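A sketch of the reaper's per-hash outcomes: nothing expired, some fields reaped, downgraded to plain Hash when the TTL sidecar drains, or key deleted when every field expired. String stands in for Bytes, expiry is taken as exp <= now_ms (an assumption), and all names are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ReapOutcome {
    Unchanged,
    Reaped(usize),           // still HashWithTtl
    DowngradedToHash(usize), // TTL sidecar drained: back to plain Hash
    KeyDeleted,              // every field expired: caller removes the key
}

/// Remove expired fields from both the field map and the TTL sidecar.
pub fn reap_expired_fields(
    fields: &mut HashMap<String, String>,
    ttls: &mut HashMap<String, u64>,
    now_ms: u64,
) -> ReapOutcome {
    let expired: Vec<String> = ttls
        .iter()
        .filter(|&(_, &exp)| exp <= now_ms)
        .map(|(f, _)| f.clone())
        .collect();
    if expired.is_empty() {
        return ReapOutcome::Unchanged;
    }
    for f in &expired {
        ttls.remove(f);
        fields.remove(f);
    }
    if fields.is_empty() {
        ReapOutcome::KeyDeleted
    } else if ttls.is_empty() {
        ReapOutcome::DowngradedToHash(expired.len())
    } else {
        ReapOutcome::Reaped(expired.len())
    }
}
```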
Added — HEXPIRE-family write commands (phase 196, issue #107)
- HEXPIRE key seconds [NX|XX|GT|LT] FIELDS numfields field [field ...]
- HPEXPIRE key milliseconds [NX|XX|GT|LT] FIELDS numfields field [...]
- HEXPIREAT key unix-seconds [NX|XX|GT|LT] FIELDS numfields field [...]
- HPEXPIREAT key unix-ms [NX|XX|GT|LT] FIELDS numfields field [...]
Per-field return codes match Valkey 9.0: 0 (no such field), 1 (TTL set or
updated), 2 (field expired during this call and was deleted), -2 (NX / XX /
GT / LT condition not met). Wrong-type key returns WRONGTYPE; missing key
returns one 0 per requested field. PHF + dispatch routes the four commands
to hash::{hexpire,hpexpire,hexpireat,hpexpireat}; AOF replay already handled
by the phase-200 intercepts.
Side fix — HashWithTtl-aware hash-write path: HSET, HMSET, HSETNX,
HDEL, HINCRBY, and HINCRBYFLOAT previously returned WRONGTYPE once
HEXPIRE had promoted a hash to the HashWithTtl encoding. Extended
Database::get_or_create_hash / get_or_create_hash_listpack to handle
HashWithTtl (returning the inner fields map / Ok(None) respectively).
Added Database::hash_clear_field_ttls (called by HSET/HMSET on overwrite,
no-op on plain Hash) and Database::hash_delete_field (used by HDEL —
removes from both the fields map and the per-field TTL sidecar, downgrades
back to plain Hash when the last TTL is dropped). HSETNX correctly leaves
existing TTLs intact; HINCRBY / HINCRBYFLOAT preserve TTL on the incremented
field.
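A hedged sketch of the per-field decision behind the HEXPIRE family, returning named outcomes instead of numeric reply codes. It treats a field with no TTL as +∞ for GT/LT (the semantics noted for hash_set_field_ttl in phase 195) and assumes the condition is checked before the past-expiry delete; Cond, SetTtl, and apply_cond are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Cond { Always, Nx, Xx, Gt, Lt }

#[derive(Debug, PartialEq)]
pub enum SetTtl { NoSuchField, Set, DeletedPastExpiry, ConditionNotMet }

/// `current` is the field's existing absolute expiry, if it has one.
pub fn apply_cond(exists: bool, current: Option<u64>, new_abs_ms: u64,
                  now_ms: u64, cond: Cond) -> SetTtl {
    if !exists {
        return SetTtl::NoSuchField;
    }
    let ok = match (cond, current) {
        (Cond::Always, _) => true,
        (Cond::Nx, cur) => cur.is_none(),        // only if no TTL yet
        (Cond::Xx, cur) => cur.is_some(),        // only if a TTL exists
        (Cond::Gt, None) => false,               // nothing exceeds +inf
        (Cond::Gt, Some(cur)) => new_abs_ms > cur,
        (Cond::Lt, None) => true,                // any finite time is below +inf
        (Cond::Lt, Some(cur)) => new_abs_ms < cur,
    };
    if !ok {
        SetTtl::ConditionNotMet
    } else if new_abs_ms <= now_ms {
        SetTtl::DeletedPastExpiry // a past or current time deletes the field
    } else {
        SetTtl::Set
    }
}
```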
Fixed — Test infra: txn_kv_wiring flake diagnosis
test_txn_commit_wal_crash_recovery previously masked moon-server crashes as "Connection refused (os error 61) after 60s" because the spawned moon binary's stdout / stderr were piped to Stdio::null(). Hardened the child-process harness:
- ChildGuard::spawn now redirects child stdout/stderr to moon-phase-{1,2}.log inside the per-test temp dir.
- ChildGuard::poll_exit checks try_wait() inside the connection retry loop — if the child exited, the test fails immediately with the exit status instead of timing out for a useless 60 s.
- ChildGuard::dump_log is called on every failure path (wait_for_server timeout, connect_redis_with_retry timeout, child crash), so the CI log records the actual server output.
- wait_for_server deadline widened from 5 s → 15 s, matching the realistic CI-runner spawn + WAL boot envelope.
- No semantic change for the happy path. Future flakes become diagnosable instead of silently retrying for 60 s.
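The fail-fast retry loop can be sketched with std::process::Child::try_wait between connection attempts. The function signature and the 50 ms poll interval are assumptions; this is not the actual ChildGuard code.

```rust
use std::net::TcpStream;
use std::process::Child;
use std::time::{Duration, Instant};

/// Poll the child between connection attempts so a crashed server fails
/// fast with its exit status instead of burning the whole deadline.
pub fn wait_for_server(child: &mut Child, addr: &str, deadline: Duration) -> Result<TcpStream, String> {
    let start = Instant::now();
    loop {
        if let Some(status) = child.try_wait().map_err(|e| e.to_string())? {
            return Err(format!("server exited early: {}", status));
        }
        match TcpStream::connect(addr) {
            Ok(stream) => return Ok(stream),
            Err(e) if start.elapsed() >= deadline => {
                return Err(format!("timed out after {:?}: {}", deadline, e));
            }
            Err(_) => std::thread::sleep(Duration::from_millis(50)),
        }
    }
}
```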
Docs — Storage Format v1 commitment (phase 192, PR #115)
docs/STORAGE-FORMAT-V1.md — public on-disk format contract for v0.2.x. Documents the WAL v3 / RDB v2 / AOF multi-part sub-formats as a single "storage format v1" umbrella with explicit forward-read, reverse-read, crash-recovery, and migration guarantees through ≥18 months of LTS. Adds cross-reference doc-comments to src/persistence/aof_manifest.rs, snapshot.rs, and wal_v3/segment.rs pointing readers at the canonical on-disk markers. Reserves a --storage-format <v1> CLI flag for the follow-up code PR closing issue #103's second checkbox.
Added — Hash-Field TTL primitive (phase 195, PR #116)
RedisValue::HashWithTtl { fields, ttls }storage variant + borrowedRedisValueRef::HashWithTtlview. Per-field TTL sidecar (BTreeMap<Bytes, u64>) carries absolute unix-ms expiry alongside an unchangedHashMap<Bytes, Bytes>field map.OBJECT ENCODINGreportshashtable.Database::hash_set_field_ttl(key, field, abs_ms, cond)— Valkey 9.0 parity result codes (0 missing / 1 set / 2 deleted-on-set / -2 cond not met). NX / XX / GT / LT semantics matching Valkey (non-volatile is +∞ for GT/LT). Auto-promotesHash/HashListpack→HashWithTtlon first per-field TTL; past-expiry short-circuits to in-place delete with code 2.Database::hash_persist_field(key, field)— clears one field's TTL; downgrades back to plainHashwhen the last TTL is removed.Database::hash_get_field_ttl_ms+Database::hash_field_state— read helpers returning the tri-stateFieldState::{Missing, NoTtl, Ttl(u64)}consumed by phase-198 HTTL / HEXPIRETIME / HPTTL / HPEXPIRETIME read commands.- Additive
HashWithTtlmatch arms incommand::key::should_async_drop,server_admin::estimate_serialized_length,eviction::evict_one_async_spill,tiered::kv_spill,tiered::kv_serde,persistence::rdb::write_entry,persistence::redis_rdb, andpersistence::aof::generate_rewrite_commands. Persistence-side TTL payload lands in PR #117 (phase 200); arms here are TTL-stripping placeholders so HEXPIRE handlers (phase 196) cannot be merged ahead of the persistence wiring.
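This is a minimal sketch of the hash_set_field_ttl decision logic. It rests on several assumptions:
- Fields are byte vectors.
- The caller supplies `now_ms`.
- The NX/XX/GT/LT condition is evaluated before the past-expiry delete.

`TtlHash`, `Cond` and `SetTtlOutcome` are hypothetical names. The outcome enum deliberately leaves out the numeric reply codes, and the sketch does not model the Hash → HashWithTtl promotion.

```rust
use std::collections::{BTreeMap, HashMap};

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Cond {
    Always,
    Nx,
    Xx,
    Gt,
    Lt,
}

#[derive(Debug, PartialEq)]
pub enum SetTtlOutcome {
    Missing,
    Set,
    DeletedOnSet,
    CondNotMet,
}

/// Hypothetical model: a field map plus a sidecar of absolute unix-ms expiries.
#[derive(Default)]
pub struct TtlHash {
    pub fields: HashMap<Vec<u8>, Vec<u8>>,
    pub ttls: BTreeMap<Vec<u8>, u64>,
}

impl TtlHash {
    pub fn set_field_ttl(&mut self, field: &[u8], abs_ms: u64, now_ms: u64, cond: Cond) -> SetTtlOutcome {
        if !self.fields.contains_key(field) {
            return SetTtlOutcome::Missing;
        }
        let current = self.ttls.get(field).copied();
        // A field without a TTL compares as +infinity for GT/LT.
        let effective = current.unwrap_or(u64::MAX);
        let allowed = match cond {
            Cond::Always => true,
            Cond::Nx => current.is_none(),
            Cond::Xx => current.is_some(),
            Cond::Gt => abs_ms > effective,
            Cond::Lt => abs_ms < effective,
        };
        if !allowed {
            return SetTtlOutcome::CondNotMet;
        }
        if abs_ms <= now_ms {
            // Expiry already in the past: delete the field in place.
            self.fields.remove(field);
            self.ttls.remove(field);
            return SetTtlOutcome::DeletedOnSet;
        }
        self.ttls.insert(field.to_vec(), abs_ms);
        SetTtlOutcome::Set
    }

    /// Clears one field's TTL; returns whether a TTL was present.
    pub fn persist_field(&mut self, field: &[u8]) -> bool {
        self.ttls.remove(field).is_some()
    }
}
```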
Added — Hash-Field TTL persistence wiring (phase 200, PR #117)
- RDB v2: bumped RDB_VERSION 1 → 2. A new per-hash trailer [ttl_count u32][field, ttl_ms u64]* follows every TYPE_HASH body (count = 0 for non-TTL hashes). The reader accepts both v1 and v2 files. count_entries_per_db / skip_entry / read_entry_zero_copy / read_entry all plumb a has_hash_ttl_trailer flag, so the format is parsed correctly on both code paths (file load + in-memory bytes).
- Shard RDB V3: SHARD_RDB_VERSION 2 → 3, with the same trailer plumbed through rdb::read_entry. V1 (legacy), V2 (PITR LSN + ts) and V3 (TTL trailer) all load via per-version preamble + min-file-size branching.
- Tiered KV serde: per-hash trailer on disk-offload blobs. Plain Hash + HashListpack serializers now append ttl_count = 0; the HashWithTtl serializer writes the real trailer. The deserializer treats a truncated trailer as zero TTLs, for graceful migration from pre-trailer in-process spill blobs.
- BGREWRITEAOF: the RedisValueRef::HashWithTtl arm emits HSET key f1 v1 f2 v2 ... followed by one HPEXPIREAT key abs_ms FIELDS 1 field per TTL'd field. Per-field framing keeps the replay shim simple (single-field parse).
- Replay shim: CommandReplayEngine::replay_command intercepts HEXPIRE / HPEXPIRE / HEXPIREAT / HPEXPIREAT / HPERSIST (case-insensitive) before command::dispatch and routes them directly to Database::hash_set_field_ttl / hash_persist_field. This bypasses the phase-196 command handlers (which do not yet exist). As a result, a crash-restart restores per-field TTLs from any AOF stream, whether emitted by user-typed HEXPIRE or by BGREWRITEAOF.
- Redis-compat RDB: emits tracing::warn! when dropping per-field TTLs on Redis-compat export. Redis 7.4 hash-field-TTL opcode emission is deferred to a future cross-vendor compat phase.
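A small encoder/decoder illustrates the per-hash trailer layout `[ttl_count u32][field, ttl_ms u64]*`. The entry does not specify byte order or field framing, so this sketch assumes:
- integers are little-endian;
- each field is written as a u32 length prefix followed by its bytes.

It follows the tiered-serde rule of treating a missing or truncated trailer as zero TTLs. All function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Appends `[ttl_count u32][field, ttl_ms u64]*` to `out` (assumed framing).
pub fn encode_ttl_trailer(ttls: &BTreeMap<Vec<u8>, u64>, out: &mut Vec<u8>) {
    out.extend_from_slice(&(ttls.len() as u32).to_le_bytes());
    for (field, ttl_ms) in ttls {
        out.extend_from_slice(&(field.len() as u32).to_le_bytes());
        out.extend_from_slice(field);
        out.extend_from_slice(&ttl_ms.to_le_bytes());
    }
}

/// Splits `n` bytes off the front of `buf`, or returns None if it is too short.
fn take_bytes<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let slice: &'a [u8] = *buf;
    if slice.len() < n {
        return None;
    }
    let (head, tail) = slice.split_at(n);
    *buf = tail;
    Some(head)
}

fn read_u32(buf: &mut &[u8]) -> Option<u32> {
    let b = take_bytes(buf, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn read_u64(buf: &mut &[u8]) -> Option<u64> {
    let mut a = [0u8; 8];
    a.copy_from_slice(take_bytes(buf, 8)?);
    Some(u64::from_le_bytes(a))
}

fn parse_trailer(mut buf: &[u8]) -> Option<BTreeMap<Vec<u8>, u64>> {
    let count = read_u32(&mut buf)?;
    let mut ttls = BTreeMap::new();
    for _ in 0..count {
        let len = read_u32(&mut buf)? as usize;
        let field = take_bytes(&mut buf, len)?.to_vec();
        let ttl_ms = read_u64(&mut buf)?;
        ttls.insert(field, ttl_ms);
    }
    Some(ttls)
}

/// A missing or truncated trailer decodes as zero TTLs (pre-trailer blobs).
pub fn decode_ttl_trailer(buf: &[u8]) -> BTreeMap<Vec<u8>, u64> {
    parse_trailer(buf).unwrap_or_default()
}
```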
Added — Tier 2 Lane A (PR #100)
- T2.1 c381b31: SWAPDB cross-shard atomic swap via ShardMessage::SwapDb; WAL-durable; BGREWRITEAOF concurrency guard; restart-replay test.
- T2.2 4958dc9: MOVE key db with with_two_dbs_locked (lower-index-first lock ordering); WAL-durable; intercepted in all four handler paths.
- T2.3 bbc6117: COPY ... DB n across databases; reuses with_two_dbs_locked; WAL-durable.
- T2.4 f538589: CLUSTER REPLICAS / CLUSTER SLAVES; shared format_node_line(node, self_node_id) helper extracted from CLUSTER NODES.
- T2.5 ebd240a: CLUSTER COUNT-FAILURE-REPORTS; counts non-stale pfail_reports; exposes DEFAULT_NODE_TIMEOUT_MS as pub(crate).
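The lower-index-first ordering used by with_two_dbs_locked can be sketched with `std::sync::Mutex`. The sketch also includes the release-build non-equal-index assert from the review fixes. `Db` is a hypothetical stand-in for a logical database, and the real helper's signature is not shown in this changelog.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for one logical database.
#[derive(Debug, Default)]
pub struct Db {
    pub keys: Vec<(String, String)>,
}

/// Locks databases `a` and `b`, always taking the lower index first, and
/// passes them to `f` in caller order (a, b). A fixed global order means two
/// concurrent MOVEs in opposite directions cannot deadlock.
pub fn with_two_dbs_locked<R>(
    dbs: &[Mutex<Db>],
    a: usize,
    b: usize,
    f: impl FnOnce(&mut Db, &mut Db) -> R,
) -> R {
    // Locking one non-reentrant mutex twice would self-deadlock, so this is
    // checked in release builds too (assert_ne!, not debug_assert_ne!).
    assert_ne!(a, b, "with_two_dbs_locked requires distinct db indices");
    let (lo, hi) = if a < b { (a, b) } else { (b, a) };
    let mut lo_guard = dbs[lo].lock().expect("db mutex poisoned");
    let mut hi_guard = dbs[hi].lock().expect("db mutex poisoned");
    if a < b {
        f(&mut *lo_guard, &mut *hi_guard)
    } else {
        f(&mut *hi_guard, &mut *lo_guard)
    }
}
```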
Fixed
- PERF 608e2d1: collapse the duplicate is_write PHF gate on the MOVE/COPY hot path; restores s=1 SET p=1 throughput (−9.5 % → +0.9 % vs. merge base).
- CR (this PR): SWAPDB now runs after the ACL gate in handler_monoio, closing a runtime-specific authorization bypass.
- CR (this PR): with_two_dbs_locked and ShardDatabases::swap_dbs now hard-assert non-equal indices in release builds, preventing same-index self-deadlock.
- CR (this PR): SWAPDB enforces strict arity (exactly two args) across all three handlers; SWAPDB 0 1 extra is rejected with the canonical wrong-arity error.
- CR (this PR): DashTable::Segment::insert_or_update_at now sets has_non_home_keys = true whenever the chosen free slot is in a non-home group. This fixes a latent miss where find() could not locate a fallback-placed key on subsequent lookups.
- CR (this PR): the local MOVE/COPY AOF append is gated on Frame::Integer(1) (success) rather than !Error. This matches the handler_single behavior and suppresses no-op :0 log entries.
- CR (this PR): MOVE, COPY ... DB n, and SWAPDB are rejected with ERR_TXN_CROSS_SHARD while an active_cross_txn is in flight. Previously the intercepts bypassed undo/intents bookkeeping and escaped TXN.ABORT rollback.
- CR (this PR): spsc_handler MOVE/COPY arms call refresh_now_from_cache on both source and destination DBs before move_core/copy_core. This fixes expired-key visibility skew on the local-write path.
Refactor
- e429b2b: src/storage/dashtable/segment.rs (1587 LOC) split into segment/{mod,find,insert,ops}.rs. This is a mechanical refactor with zero semantic change; it brings all files under the 1500-LOC limit ahead of future hot-path additions.
Added — Point-in-Time Recovery (PITR)
- P0 ac3aa92: WalWriterV3::new() now scans existing .wal segments on open and resumes next_lsn from max_observed_lsn + 1. Fixes a latent durability bug where the LSN reset to 1 on every restart, blocking both PITR and CDC.
- P1 e1e9bda: FileEntry extended with last_modified_lsn (offset 48..56, struct size 48 → 56). Manifest format_version bumped to v2; the backward-compat reader synthesizes last_modified_lsn = created_lsn for v1 entries. New CLI flags --recovery-target-lsn and --recovery-target-time.
- P2 25ece4b: snapshot header bumped to SHARD_RDB_VERSION = 2, embedding last_lsn and created_at_unix_ms. v1 snapshots load with last_lsn = 0 and are conservatively skipped by PITR. Adds the read_snapshot_metadata peek API.
- P3a 048a883: replay_wal_v3_dir_until(stop_at_lsn) and resolve_target_time_to_lsn(), which scans TemporalUpsert / GraphTemporal records for system_from anchors.
- P3b a496413: recover_shard_v3_pitr() honors target_lsn. It skips snapshots whose last_lsn is unknown or past the target, then stops replay at the cutoff without advancing wal_flush_lsn.
See docs/guides/pitr.md for operator usage.
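The P0 and P3b rules can be sketched as three pure functions:
- resume the LSN counter after the highest LSN found on disk;
- pick the newest snapshot whose last_lsn is known and not past the target;
- replay only the WAL records between that snapshot and the target.

`SnapshotMeta` and the function names are hypothetical. Treating the target LSN as inclusive is an assumption.

```rust
/// Hypothetical snapshot header fields; last_lsn == 0 means "unknown" (v1 snapshot).
#[derive(Clone, Debug, PartialEq)]
pub struct SnapshotMeta {
    pub name: String,
    pub last_lsn: u64,
}

/// P0: resume after the highest LSN found on disk instead of restarting at 1.
pub fn resume_next_lsn(observed_lsns: &[u64]) -> u64 {
    observed_lsns.iter().copied().max().map_or(1, |max| max + 1)
}

/// P3b: the newest snapshot whose LSN is known and not past the target.
pub fn pick_snapshot(snaps: &[SnapshotMeta], target_lsn: u64) -> Option<&SnapshotMeta> {
    snaps
        .iter()
        .filter(|s| s.last_lsn != 0 && s.last_lsn <= target_lsn)
        .max_by_key(|s| s.last_lsn)
}

/// WAL records to apply on top of the chosen snapshot (or from the start when
/// none qualifies), stopping at the target LSN (inclusive, by assumption).
pub fn records_to_replay(wal_lsns: &[u64], snapshot: Option<&SnapshotMeta>, target_lsn: u64) -> Vec<u64> {
    let start = snapshot.map_or(0, |s| s.last_lsn);
    wal_lsns
        .iter()
        .copied()
        .filter(|&lsn| lsn > start && lsn <= target_lsn)
        .collect()
}
```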
Added — Change Data Capture (CDC)
- C1 b271e21: WalTailReader with a resumable TailCursor { segment_seq, byte_offset, last_lsn }. Re-stats segment metadata on each call for torn-write safety and auto-advances on segment rotation.
- C2 e97b80d: new src/cdc/ module with a typed CdcEvent enum, a decode_wal_record() translator, and a hand-rolled Debezium-compatible JSON envelope serializer (encode_debezium). Non-UTF-8 keys fall back to {"_b64":"..."}.
- C3 v1 bf4230b: CDC.READ <wal_dir> <from_lsn> [LIMIT N] polling command. Returns the RESP array [next_lsn, env1, env2, ...]; default LIMIT 256, hard ceiling 10 000. The idle response is [from_lsn] (length 1), a stable no-new-data signal.
See docs/guides/cdc.md for consumer integration.
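This sketch shows the CDC.READ paging contract, modeling the RESP reply `[next_lsn, env1, ...]` as a tuple. It assumes:
- events at `lsn >= from_lsn` are returned in LSN order;
- `next_lsn` is one past the last event returned.

`Event` and `cdc_read` are hypothetical names.

```rust
pub const DEFAULT_LIMIT: usize = 256;
pub const MAX_LIMIT: usize = 10_000;

/// Hypothetical decoded change event; `envelope` stands in for the Debezium-style JSON.
#[derive(Clone, Debug, PartialEq)]
pub struct Event {
    pub lsn: u64,
    pub envelope: String,
}

/// Returns (next_lsn, envelopes), i.e. the reply [next_lsn, env1, env2, ...].
pub fn cdc_read(log: &[Event], from_lsn: u64, limit: Option<usize>) -> (u64, Vec<String>) {
    // Default LIMIT 256, clamped to the 10 000 hard ceiling.
    let limit = limit.unwrap_or(DEFAULT_LIMIT).min(MAX_LIMIT);
    let batch: Vec<&Event> = log.iter().filter(|e| e.lsn >= from_lsn).take(limit).collect();
    match batch.last() {
        // Idle: echo from_lsn so the reply is the length-1 array [from_lsn].
        None => (from_lsn, Vec::new()),
        Some(last) => (last.lsn + 1, batch.iter().map(|e| e.envelope.clone()).collect()),
    }
}
```

Under these assumptions, a consumer keeps calling CDC.READ with the returned `next_lsn` and backs off whenever the reply has length 1.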
Added — Embedded sharded server
- PR #95: server::embedded::run_embedded(config, cancel) exposes the full sharded handler (with TXN.* cross-store transactions) to in-process embedders such as helios moon-daemon. The existing run_with_shutdown drives handler_single, which deliberately does not implement TXN.
- Embedded mode skips TLS, console, cluster bus, admin port, and multi-part AOF manifest replay.
- It does include per-shard RDB + WAL recovery, graph/temporal/workspace/MQ WAL replay, SO_REUSEPORT, NUMA pinning, and cancel-driven graceful shutdown.
Fixed — PR #96 test deflake + tokio AOF replay
- main.rs AOF recovery: gated the multi-part AOF manifest replay block behind #[cfg(feature = "runtime-monoio")]. Under tokio, the legacy single-file appendonly.aof is loaded via the v2 recovery chain. The multi-part loader no longer creates an empty manifest at first boot, which used to wipe v2-loaded state on the next restart (every tokio SET was lost on restart).
- tests/txn_kv_wiring.rs: made test_txn_commit_wal_crash_recovery runtime-agnostic by polling for either the monoio multi-part .base.rdb artifact or the tokio single-file appendonly.aof. Added a connect_redis_with_retry helper to bound and retry the post-bind RESP handshake. The handshake was racing the shard accept loop, surfacing as EAGAIN on Linux and ECONNRESET on macOS CI.
Fixed — PR #95 review hardening
- main.rs malloc_conf symbol: replaced the union-based unsafe pun with a #[repr(transparent)] Sync wrapper around a c"..." literal.
- command/server_admin.rs get_vsz_bytes: replaced four unsafe libc::{open,read,close,sysconf} blocks with safe /proc/self/status parsing on the cold MEMORY DOCTOR path.
- main.rs arena scan now uses env::args_os(), so non-UTF-8 argv no longer panics before clap reports the error.
- server/embedded.rs shutdown sequence: cancel → join shard threads → drop the outer aof_tx → join the AOF thread, so the writer never exits while shards are still queuing appends. Thread join panics are now propagated through the function result.
- storage/db.rs: annotated two .expect() calls in Database::set's insert_or_update closures for the hot-path unwrap ratchet.
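The shutdown ordering can be illustrated with `std::sync::mpsc`. The AOF writer drains until every sender is dropped. Joining the shards (which hold sender clones) before dropping the outer sender therefore guarantees that nothing they queued is lost. This is a hypothetical sketch, not the embedded server's code. The shards simply count how many records they sent before seeing the cancel flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Returns (records sent by all shards, records seen by the AOF writer).
pub fn run_and_shutdown(shards: usize, max_appends: usize) -> thread::Result<(usize, usize)> {
    let cancel = Arc::new(AtomicBool::new(false));
    let (aof_tx, aof_rx) = mpsc::channel::<Vec<u8>>();

    // The writer exits only once every Sender (outer + shard clones) is gone.
    let writer = thread::spawn(move || aof_rx.iter().count());

    let workers: Vec<thread::JoinHandle<usize>> = (0..shards)
        .map(|id| {
            let tx = aof_tx.clone();
            let cancel = Arc::clone(&cancel);
            thread::spawn(move || {
                let mut sent = 0;
                while sent < max_appends && !cancel.load(Ordering::Acquire) {
                    tx.send(format!("shard{id}:{sent}").into_bytes()).expect("AOF writer exited early");
                    sent += 1;
                }
                sent
            })
        })
        .collect();

    cancel.store(true, Ordering::Release); // 1. cancel
    let mut total_sent = 0;
    for w in workers {
        total_sent += w.join()?; // 2. join shard threads (drops their senders)
    }
    drop(aof_tx); // 3. drop the outer sender
    let received = writer.join()?; // 4. join the AOF writer
    Ok((total_sent, received))
}
```

The invariant that matters is that the writer's exit condition (all senders dropped) can only become true after every shard has stopped queuing appends.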
Deferred to v0.2 follow-ups
- P3c: wire SnapshotState::set_last_lsn(wal_flush_lsn) into the live persistence tick so freshly written snapshots embed their LSN. Currently PITR falls back to full WAL replay when only v1-shaped snapshots exist.
- C3b: push-based CDC.SUBSCRIBE over RESP3 Push frames, a per-shard subscriber registry hooked into wal_append_and_fanout, and a slow-consumer disconnect policy. Envelope format unchanged.
- Integration suites (tests/pitr_integration.rs, tests/cdc_integration.rs), the scripts/test-pitr-cdc.sh end-to-end smoke test, and the benchmark gates (PITR restart ±10%, CDC ≥100K events/s/shard, write p99 ±5%).
[0.1.12] — 2026-05-12
Performance & memory observability release. 50 commits since v0.1.11, no
public API breaks, no on-disk format change. Validated on OrbStack moon-dev
(2026-05-12) and locally green for both runtime-monoio and
runtime-tokio,jemalloc.
Performance — DashTable hot-path (Phase 189, PERF-07 + PERF-09)
- Pre-sized DashTable. DashTable::with_capacity() plus the new --initial-keyspace-hint <N> flag size the segment array up front, so steady-state operation makes zero split_segment calls. A pre-size invariant test confirms zero splits at 1 M keys. The 27 % CPU spent in split_segment during SET p=16 (PERF-07) is fully eliminated.
- Database::set rewrite. New DashTable::insert_or_update / Segment::insert_or_update_at single-probe helpers replace the previous find + remove + insert triple-probe pattern.
- Segment::find fallback elimination + force-inlined SIMD. The cold "key spilled to non-home group" fallback path is removed once has_non_home_keys is invariant-tracked on insert (the insert_or_update_at change above already maintains the flag). The SIMD probe helpers are #[inline(always)]. PERF-09 attributed 12.65 % of Segment::find self-time to the fallback. The remaining cost is the irreducible per-hit memcmp confirm (threshold amended to <3 %). A 1 M-key correctness gate validates zero false positives/negatives.
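Here is a toy linear-probing table that illustrates the single-probe idea behind insert_or_update. Its upsert walks the probe sequence once, where find + remove + insert would walk it three times. It is not Moon's DashTable, which uses segments and SIMD group probes. `ProbeTable`, its hash and its probe counter are hypothetical, and the table is pre-sized so it never fills.

```rust
/// Toy linear-probing table (hypothetical) with a probe counter.
pub struct ProbeTable {
    slots: Vec<Option<(u64, u64)>>,
    pub probes: usize,
}

impl ProbeTable {
    /// Pre-sizes to twice the next power of two so the table never fills.
    pub fn with_capacity(cap: usize) -> Self {
        Self { slots: vec![None; cap.next_power_of_two().max(8) * 2], probes: 0 }
    }

    /// One walk of the probe sequence: returns the slot holding `key`, or the
    /// first empty slot where it would be inserted.
    fn probe(&mut self, key: u64) -> usize {
        self.probes += 1;
        let mask = self.slots.len() - 1;
        let mut i = (key.wrapping_mul(0x9E37_79B9_7F4A_7C15) as usize) & mask;
        loop {
            match self.slots[i] {
                Some((k, _)) if k != key => i = (i + 1) & mask,
                _ => return i,
            }
        }
    }

    /// Single-probe upsert; returns the previous value, if any.
    pub fn insert_or_update(&mut self, key: u64, value: u64) -> Option<u64> {
        let i = self.probe(key);
        self.slots[i].replace((key, value)).map(|(_, old)| old)
    }

    pub fn get(&mut self, key: u64) -> Option<u64> {
        let i = self.probe(key);
        self.slots[i].map(|(_, v)| v)
    }
}
```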
Performance — Memory observability (Phase 190)
- moon_memory_bytes{kind=…} Prometheus gauge. Seven kind labels: dashtable, hnsw, csr, wal, sealed_replication_backlog, allocator_overhead, and the rolled-up total. Updated on every scrape via a single hook, so the sum reconciles to RSS within the CI tolerance window.
- MEMORY DOCTOR full schema. A multi-line RESP response covering every subsystem, the rolled-up total, and a derived allocator_overhead pseudo-kind (RSS − Σ subsystems). Adds an operator triage signal beyond the legacy single-line summary.
- resident_bytes() trait implemented across Database, DashTable, VectorStore (HNSW + IVF), GraphStore (CSR + SlotMap), WalWriter, ReplicationBacklog (sealed-segment side), and AllocatorOverhead. Zero-allocation, on-demand poll.
- Memory steady-state CI job. scripts/bench-memory-steady-state.sh plus a baseline fixture. The gate was widened to ±10 % on the RSS / Σ ratio after a Linux-CI tolerance pass.
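The allocator_overhead pseudo-kind is a derived value, RSS − Σ subsystems. This sketch assembles the gauge samples under two assumptions:
- `ResidentBytes` is a trait with a single `resident_bytes()` method (a hypothetical mirror of the trait named above);
- total is reported as RSS.

```rust
/// Hypothetical mirror of the per-subsystem resident_bytes() probe.
pub trait ResidentBytes {
    fn resident_bytes(&self) -> u64;
}

/// Test double that reports a fixed size.
pub struct Fixed(pub u64);

impl ResidentBytes for Fixed {
    fn resident_bytes(&self) -> u64 {
        self.0
    }
}

/// (kind, bytes) samples: one per subsystem, then allocator_overhead =
/// RSS − Σ subsystems (saturating at 0), then total = RSS.
pub fn memory_samples(
    rss: u64,
    subsystems: &[(&'static str, &dyn ResidentBytes)],
) -> Vec<(&'static str, u64)> {
    let mut samples: Vec<(&'static str, u64)> = subsystems
        .iter()
        .map(|(kind, probe)| (*kind, probe.resident_bytes()))
        .collect();
    let accounted: u64 = samples.iter().map(|(_, bytes)| bytes).sum();
    samples.push(("allocator_overhead", rss.saturating_sub(accounted)));
    samples.push(("total", rss));
    samples
}
```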
Changed — Allocator UX (Phase 191)
- jemalloc narenas:8 cap with a --memory-arenas-cap <N> CLI override. Caps the per-CPU arena explosion that inflates VSZ on high-core hosts. This is mostly a cosmetic fix on Linux containers, but it produces a meaningfully tighter top/ps reading for operators.
- Tri-state allocator selection. New mimalloc-alt cargo feature alongside the existing jemalloc / mimalloc (fallback) paths; mutually exclusive at compile time. An A/B benchmark script, scripts/bench-allocator-ab.sh, ships with the release.
- docs/OPERATOR-GUIDE.md: Memory Accounting section. Documents the VSZ-vs-RSS distinction, MEMORY DOCTOR field by field, and the --memory-arenas-cap / mimalloc-alt tuning knobs.
Added — Dispatch Observability (Phase 177)
- moon_dispatch_path_total{path=...} Prometheus counter: four-way classification of every command by shard-routing decision:
  - local_inline (SIMD fast path)
  - local (standard local branch)
  - cross_read_fast (RwLock shared-read bypass of SPSC)
  - cross_spsc (deferred cross-shard write via PipelineBatchSlotted)

  The ratio cross_spsc / Σ is the ground-truth signal for dispatch-layer optimization work. Hot-path overhead is zero-allocation (&'static str labels, #[inline] with an early return on !METRICS_INITIALIZED). Verified on macOS + Linux: counter sums match the driven traffic exactly, with no overcount.
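This sketch shows the zero-allocation counter shape:
- static `&'static str` labels;
- a fixed array of atomics indexed by path;
- an early return while metrics are uninitialized.

`DispatchPath`, `record_dispatch` and the array layout are hypothetical. The real exporter wiring is not described here.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Set once the metrics exporter is installed; until then recording is a no-op.
pub static METRICS_INITIALIZED: AtomicBool = AtomicBool::new(false);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DispatchPath {
    LocalInline,
    Local,
    CrossReadFast,
    CrossSpsc,
}

impl DispatchPath {
    pub const ALL: [DispatchPath; 4] = [
        DispatchPath::LocalInline,
        DispatchPath::Local,
        DispatchPath::CrossReadFast,
        DispatchPath::CrossSpsc,
    ];

    /// Static label: nothing is allocated per command.
    pub fn label(self) -> &'static str {
        match self {
            DispatchPath::LocalInline => "local_inline",
            DispatchPath::Local => "local",
            DispatchPath::CrossReadFast => "cross_read_fast",
            DispatchPath::CrossSpsc => "cross_spsc",
        }
    }
}

static COUNTS: [AtomicU64; 4] = [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)];

#[inline]
pub fn record_dispatch(path: DispatchPath) {
    if !METRICS_INITIALIZED.load(Ordering::Relaxed) {
        return; // early return before touching any counter
    }
    COUNTS[path as usize].fetch_add(1, Ordering::Relaxed);
}

pub fn snapshot() -> Vec<(&'static str, u64)> {
    DispatchPath::ALL
        .iter()
        .map(|&p| (p.label(), COUNTS[p as usize].load(Ordering::Relaxed)))
        .collect()
}

/// cross_spsc / Σ, the dispatch-layer optimization signal.
pub fn cross_spsc_ratio() -> f64 {
    let total: u64 = snapshot().iter().map(|(_, n)| n).sum();
    if total == 0 {
        return 0.0;
    }
    COUNTS[DispatchPath::CrossSpsc as usize].load(Ordering::Relaxed) as f64 / total as f64
}
```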
Changed
- text-index is now a default feature. BM25 full-text search (FT.SEARCH BM25 mode), FT.AGGREGATE, and three-way RRF hybrid fusion are included in all standard builds and no longer require --features text-index. To exclude it (e.g. for minimal embedded builds), build with --no-default-features --features runtime-monoio,jemalloc,graph.
Added — SDK Validation
- Python SDK sdk/python/examples/validate.py: end-to-end live validator covering every SDK sub-client:
  - ping, strings, counter;
  - hash, list, set, zset;
  - vector index lifecycle, graph engine;
  - session search, semantic cache;
  - text search (BM25 + aggregate + hybrid);
  - server info.

  Result against Moon with text-index: 114 PASS / 0 FAIL / 0 SKIP. Gracefully skips the text sections when the server is built without text-index.
- Rust SDK sdk/rust/examples/validate.rs: re-validated against a text-index build: 85 PASS / 0 FAIL.
Fixed — Python SDK
- moondb.graph._parse_neighbors: the server returns alternating [edge_map, node_map, ...] entries as flat key-value arrays (b'id', int, b'src', int, b'dst', int, …). The previous parser expected positional [node_id, label, props], which caused an int() on b'id' crash. It now identifies node entries by their labels key and parses them from the flat kv format.
Fixed — CI Hygiene
- tests/pipeline_auto_index.rs: tighten the outer cfg from runtime-tokio to all(runtime-tokio, text-index) so the file compiles to zero tests when text-index is disabled. Previously the file compiled but the FT.SEARCH text fast path was #[cfg]-ed out. As a result, @name:corpus queries fell through to the KNN-only parser and panicked with "invalid KNN query syntax".
- 4 FT unwraps: add inline #[allow(clippy::unwrap_used)] with invariant justifications at these sites:
  - vector_search/ft_text_search.rs: 3 sites inside apply_post_processing, where do_summarize / do_highlight implies the Option is Some;
  - handler_monoio/ft.rs:165: is_text was derived from query_bytes.as_ref().map_or(false, _).

  Restores the audit-unwrap baseline to 0.
Compatibility
- Wire protocol: unchanged. Drop-in replacement for v0.1.11.
- Persistence on-disk format: unchanged.
- Default feature set: text-index is now on by default. Minimal embedded builds need an explicit --no-default-features --features runtime-monoio,jemalloc,graph.
[0.1.11] — 2026-04-27
Hot-path perf release — eliminates two atomic-CAS hot paths in the write
dispatch loop discovered via ARM perf annotate on c4a-16 (GCloud Axion).
Empirically validated on the same hardware: 8-shard SET p=64 c=200
throughput 1.84M → 3.87M RPS (+110%) when run with --disk-offload disable,
or +15% under default flags. No public API change.
Sprint 3.5a and 3.5b from .planning/rfcs/v02-enterprise-architecture.md.
Performance — Sprint 3.5a: Lock-free is_replica mirror
try_enforce_readonly was taking RwLock::try_read() on
Arc<RwLock<ReplicationState>> for every command before dispatch — an
atomic CAS on the per-command hot path. ARM annotate showed mov w8, #0xfffd;
cmp w11, w9 consuming 84% of self-time inside the function (10% of
total CPU on 8-shard SET p=64).
Fix: replace the per-command lock probe with a single
AtomicBool::load(Acquire).
- ReplicationState::is_replica_mirror: Arc<AtomicBool>, a lock-free mirror of role == Replica { .. }. It is kept in sync via the new ReplicationState::set_role(&mut self, role) method, the single owner of the invariant.
- ConnectionContext::is_replica_mirror: Option<Arc<AtomicBool>>, snapshotted from ReplicationState once at connection setup. The per-command try_enforce_readonly is now just an atomic load with no lock acquisition.
- All 6 production rs_guard.role = ... sites in handler_single, handler_monoio/dispatch, and handler_sharded/dispatch migrated to set_role(). Test fixtures in replication/handshake.rs were migrated too, so the invariant holds in test code.
- Round-5 verification on commit 32f48c4 (c4a-16, 8-shard SET p=64 c=200): try_enforce_readonly is now 0% of the profile (down from 10%). Sprint 3 acceptance criterion <1% met.
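This is a minimal sketch of the mirror pattern, assuming a simplified `Role` enum:
- `set_role` is the only writer of both the role and the `AtomicBool`;
- each connection clones the `Arc` once at setup, so the per-command check is a single Acquire load.

Field and method names follow the entry above. The surrounding types and the error string are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};

#[derive(Clone, Debug, PartialEq)]
pub enum Role {
    Master,
    Replica { master_addr: String },
}

pub struct ReplicationState {
    role: Role,
    /// Lock-free mirror of `matches!(role, Role::Replica { .. })`.
    pub is_replica_mirror: Arc<AtomicBool>,
}

impl ReplicationState {
    pub fn new() -> Self {
        Self { role: Role::Master, is_replica_mirror: Arc::new(AtomicBool::new(false)) }
    }

    pub fn role(&self) -> &Role {
        &self.role
    }

    /// Single owner of the invariant: every role change goes through here.
    pub fn set_role(&mut self, role: Role) {
        let is_replica = matches!(role, Role::Replica { .. });
        self.role = role;
        self.is_replica_mirror.store(is_replica, Ordering::Release);
    }
}

pub struct ConnectionContext {
    pub is_replica_mirror: Option<Arc<AtomicBool>>,
}

impl ConnectionContext {
    /// Snapshot the mirror once at connection setup (the only lock taken).
    pub fn new(state: &RwLock<ReplicationState>) -> Self {
        let mirror = state.read().expect("lock poisoned").is_replica_mirror.clone();
        Self { is_replica_mirror: Some(mirror) }
    }

    /// Per-command check: one atomic load, no lock.
    pub fn try_enforce_readonly(&self, is_write: bool) -> Result<(), &'static str> {
        let replica = self.is_replica_mirror.as_ref().map_or(false, |m| m.load(Ordering::Acquire));
        if replica && is_write {
            Err("READONLY You can't write against a read only replica.")
        } else {
            Ok(())
        }
    }
}
```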
Performance — Sprint 3.5b: WAL no-op bypass
wal_append_and_fanout was acquiring a parking_lot::Mutex (replication
backlog) and a std::sync::RwLock (replication state) on every write,
even when no replica was connected and no WAL writer existed. ARM annotate
showed caslb/casab ARM CAS-byte atomics dominating self-time (~21% of
total CPU on 8-shard SET p=64 with --appendonly no and zero replicas).
Fix: hoist a single early return to the top of the function, taken when there is no WAL writer and no replica is connected.
The criterion is fully derivable from existing inputs — no new shared
state. Skips both the backlog Mutex::lock and the repl_state
RwLock::read on the cold path.
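This sketch shows the early-return shape. The entry states the criterion (no WAL writer of either kind and no connected replica). It does not say how replica presence is read without the repl_state lock, so in this sketch:
- `has_replicas` is a hypothetical plain input;
- the writers and the backlog are modeled as byte buffers.

```rust
use std::sync::Mutex;

/// Returns false when the bypass fired and no lock was taken.
pub fn wal_append_and_fanout(
    frame: &[u8],
    wal_writer: Option<&mut Vec<u8>>,
    wal_v3_writer: Option<&mut Vec<u8>>,
    backlog: &Mutex<Vec<Vec<u8>>>,
    has_replicas: bool,
) -> bool {
    // Nothing to persist and nobody to fan out to: skip the backlog mutex
    // (and, in the real function, the replication-state read) entirely.
    if wal_writer.is_none() && wal_v3_writer.is_none() && !has_replicas {
        return false;
    }
    if let Some(w) = wal_writer {
        w.extend_from_slice(frame);
    }
    if let Some(w) = wal_v3_writer {
        w.extend_from_slice(frame);
    }
    if has_replicas {
        backlog.lock().expect("backlog poisoned").push(frame.to_vec());
    }
    true
}
```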
- Round-5 verification with --disk-offload disable (commit 32f48c4): wal_append_and_fanout is now 0.05% of the profile (down from 21%). Sprint 3 acceptance criterion <2% met.
- Operator note: the bypass only fires when (a) --appendonly no, (b) --disk-offload disable, AND (c) no replicas are connected. Default builds with disk-offload on (the production default) keep wal_v3_writer = Some(_), and the function continues to do real WAL v3 work; that path is unaffected.
Throughput Impact
| Workload (8-shard SET p=64 c=200, c4a-16, frame pointers ON) | v0.1.10 | v0.1.11 | Δ |
|---|---|---|---|
| Default flags | 1.84M RPS | 2.11M RPS | +15% |
| --disk-offload disable | ~2.95M RPS (projected) | 3.87M RPS sustained | +31% |
| try_enforce_readonly self-time | 10.0% | 0% | -10pp |
| wal_append_and_fanout self-time | 21.2% | 0.05% (with disk-offload disable) | -21.1pp |
Production builds without frame pointers should clear 5.5–6.5M RPS at the
same flag set. Round-4 baseline data: memory/benchmark_perf_round4_2026_04_27.md.
Round-5 verification data: memory/benchmark_perf_round5_2026_04_27.md.
Tests Added
- replication::state::tests::test_set_role_updates_is_replica_mirror
- replication::state::tests::test_is_replica_mirror_default_false
- shard::spsc_handler::wal_append_tests::test_wal_append_bypass_when_no_writers_no_replicas
- shard::spsc_handler::wal_append_tests::test_wal_append_writes_backlog_when_replicas_present
All 2450 lib tests passing locally (cargo test --no-default-features
--features runtime-tokio,jemalloc --lib); clippy clean (cargo clippy
--no-default-features --features runtime-tokio,jemalloc -- -D warnings).
Drive-by Fixes
- src/shard/mod.rs: pre-existing test compile failure where two drain_spsc_shared call sites passed &mut None for repl_backlog instead of a SharedBacklog (Arc<Mutex<Option<...>>>). Fixed by constructing Arc::new(parking_lot::Mutex::new(None)) in the test fixture. This was blocking cargo test --lib on main, independently of this release.
- src/shard/dispatch.rs: pre-existing clippy doc_lazy_continuation error on the TextSearchPayload doc comment, blocking cargo clippy -- -D warnings on main.
Compatibility
- Wire protocol: unchanged. Drop-in replacement for v0.1.10.
- Public API: only additive (ReplicationState::set_role, ReplicationState::is_replica_mirror field). No breaking changes.
- Persistence on-disk format: unchanged.
- Replication wire format: unchanged.
[0.1.10] — 2026-04-23
Stable replication marker. Single-shard PSYNC2 wired end-to-end and
production-ready for --shards 1 master with any --shards N replica
topology. Multi-shard master PSYNC is scheduled for v0.2 (see
.planning/rfcs/multi-shard-replication-design.md).
- Replication (081c43b): single-shard master PSYNC2 wired end-to-end, REPLCONF validated, and master_link_status reports the actual handshake state instead of the legacy up stub.
- Performance: batch-level eviction gate; try_handle_* paths #[inline]-ed; DashTable carries through the v0.1.10 pre-size groundwork (capacity hint + headroom).
- Docs: BENCHMARK.md §2.7 updated with the 2026-04-22 GCloud re-measurement; v0.1.x replication scope documented under docs/guides/clustering.mdx#replication.
[0.1.9] — 2026-04-19
Lunaris Retriever Gap Closure. Every v0.1.8 client-side fallback in
the Lunaris SDK is now closed so HybridRRFRetriever (dense path),
GraphFirstRetriever, and PathReasoningRetriever run Moon-native.
- Phase 167 CYP-01/02: Cypher CREATE / MERGE writes participate in CrossStoreTxn via record_graph(); TXN.ABORT rolls them back.
- Phase 168 CYP-03/06: coalesce() built-in + single-hop edge-var binding in variable-length EXPAND.
- Phase 169 CYP-04/05: shortestPath() parser + Dijkstra executor bridge with path-variable binding.
- Phase 170 HYB-01/02/04: the FT.SEARCH HYBRID dense stream honours as_of_lsn.
- Phase 171 SCAT-01/02/03: ShardMessage::VectorSearch + FtHybridPayload carry as_of_lsn for multi-shard AS_OF correctness.
- Phase 172 PIPE-01/02/03: pipeline-aware HSET auto-indexing regression guard (3-test suite).
Audit status: PASSED_WITH_DOCUMENTED_DEFERRALS. 15 / 20 requirements fully satisfied; HYB-03 BM25 MVCC deferred and closed in v0.1.10 follow-up (G-1); Phase 173 hygiene HYG-02 handler split RFC'd.
Stats: 6 phases shipped, 17 plans, 27 files changed, +2924 / −376 LOC.
[0.1.8] — 2026-04-18
Added — Cross-Store ACID Transactions (Phases 157, 161-163)
- TXN.BEGIN: Start a cross-store transaction — buffers KV, vector, and graph writes as intents.
- TXN.COMMIT: Commit all changes atomically with a WAL record (XactCommit 0x34) for crash recovery.
- TXN.ABORT: Roll back all changes via undo-log replay with before-images.
- KvWriteIntents: Sparse MVCC side-table for uncommitted KV writes during transactions.
- DeferredHnswInserts: Vector index inserts deferred until commit, avoiding partial graph states.
- UndoLog: SmallVec-based KV rollback with before-image recording for all mutation types.
- WAL transaction records: XactBegin (0x33), XactCommit (0x34), XactAbort (0x37) with crash-recovery replay.
- Mutual exclusion: TXN and MULTI/EXEC cannot be mixed (enforced at the handler level).
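This sketch shows undo-log rollback with before-images. It assumes writes are applied in place and that each write records the value it replaced. Restoring newest-first then returns every key to its pre-transaction state. It models only the UndoLog path, not KvWriteIntents, deferred HNSW inserts or the WAL records. All names are hypothetical.

```rust
use std::collections::HashMap;

/// A key and its value before the write (`None` = the key did not exist).
struct BeforeImage {
    key: String,
    prev: Option<String>,
}

#[derive(Default)]
pub struct Txn {
    undo: Vec<BeforeImage>,
}

impl Txn {
    pub fn set(&mut self, db: &mut HashMap<String, String>, key: &str, val: &str) {
        // HashMap::insert returns the replaced value: that is the before-image.
        let prev = db.insert(key.to_string(), val.to_string());
        self.undo.push(BeforeImage { key: key.to_string(), prev });
    }

    pub fn del(&mut self, db: &mut HashMap<String, String>, key: &str) {
        let prev = db.remove(key);
        self.undo.push(BeforeImage { key: key.to_string(), prev });
    }

    /// TXN.ABORT: restore before-images newest-first.
    pub fn abort(self, db: &mut HashMap<String, String>) {
        for img in self.undo.into_iter().rev() {
            match img.prev {
                Some(v) => {
                    db.insert(img.key, v);
                }
                None => {
                    db.remove(&img.key);
                }
            }
        }
    }

    /// TXN.COMMIT: writes are already in place, so the log is simply dropped.
    pub fn commit(self) {}
}
```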
Added — Bi-Temporal MVCC (Phase 158)
- TEMPORAL.SNAPSHOT_AT: Record wall-clock → WAL LSN binding for point-in-time queries.
- TEMPORAL.INVALIDATE: Set valid_to on a graph entity (NODE or EDGE) for temporal visibility control.
- FT.SEARCH AS_OF: Query vector indexes at a historical timestamp via LSN resolution.
- GRAPH.QUERY VALID_AT: Execute Cypher queries against graph state valid at a specific timestamp.
- TemporalRegistry: BTreeMap-backed wall-clock → LSN mappings with O(log n) range lookups.
- TemporalKvIndex: Sparse versioned KV index with lazy initialization.
- CSR segment format v2: Bi-temporal NodeMeta with valid_from / valid_to fields.
- WAL temporal records: TemporalUpsert (0x35) and GraphTemporal (0x36) for crash recovery.
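A TemporalRegistry-style lookup maps a wall-clock time to the LSN of the latest mark at or before it. A `BTreeMap` range query answers this in O(log n). The sketch also includes a bi-temporal visibility check. Treating `valid_to` as exclusive is an assumption, and the names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical registry: wall-clock ms -> WAL LSN recorded by TEMPORAL.SNAPSHOT_AT.
#[derive(Default)]
pub struct TemporalRegistry {
    marks: BTreeMap<u64, u64>,
}

impl TemporalRegistry {
    pub fn record(&mut self, wall_ms: u64, lsn: u64) {
        self.marks.insert(wall_ms, lsn);
    }

    /// LSN in effect at `wall_ms`: the latest mark at or before it, O(log n).
    pub fn lsn_at(&self, wall_ms: u64) -> Option<u64> {
        self.marks.range(..=wall_ms).next_back().map(|(_, &lsn)| lsn)
    }
}

/// Visible at `t` if valid_from <= t < valid_to (None = still valid).
pub fn visible_at(valid_from: u64, valid_to: Option<u64>, t: u64) -> bool {
    valid_from <= t && valid_to.map_or(true, |end| t < end)
}
```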
Added — Workspace Partitioning (Phase 159)
- WS CREATE: Create a workspace with UUID v7 (time-ordered, 74-bit random). Name max 64 bytes.
- WS DROP: Delete a workspace and its registry entry.
- WS AUTH: Bind a connection to a workspace — all subsequent commands transparently prefixed.
- WS INFO: Return workspace metadata (name, creation timestamp).
- WS LIST: Enumerate all registered workspaces.
- Transparent key rewriting: workspace_rewrite_args() injects a {ws_hex}: hash-tag prefix on key arguments and strips it from responses.
- WorkspaceRegistry: Per-shard metadata with creation timestamps and WAL persistence.
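This sketch shows the key rewriting, assuming the prefix is literally `{ws_hex}:` as written above. It also assumes Moon follows the Redis Cluster hash-tag rule, where only the bytes between the first `{` and the next `}` are hashed. Under that assumption, all keys of one workspace map to the same slot. The function names are hypothetical.

```rust
/// Prefixes a key with the workspace hash tag `{ws_hex}:`.
pub fn ws_key(ws_hex: &str, key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(ws_hex.len() + 3 + key.len());
    out.push(b'{');
    out.extend_from_slice(ws_hex.as_bytes());
    out.extend_from_slice(b"}:");
    out.extend_from_slice(key);
    out
}

/// Strips the prefix from a key in a response; other workspaces' keys yield None.
pub fn strip_ws_key<'a>(ws_hex: &str, key: &'a [u8]) -> Option<&'a [u8]> {
    let rest = key.strip_prefix(b"{")?;
    let rest = rest.strip_prefix(ws_hex.as_bytes())?;
    rest.strip_prefix(b"}:")
}

/// Redis-Cluster-style hash tag: bytes between the first '{' and the next '}'.
pub fn hash_tag(key: &[u8]) -> Option<&[u8]> {
    let open = key.iter().position(|&b| b == b'{')?;
    let len = key[open + 1..].iter().position(|&b| b == b'}')?;
    if len == 0 {
        return None; // "{}" is not a valid tag; the whole key would be hashed
    }
    Some(&key[open + 1..open + 1 + len])
}
```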
Added — Durable Message Queues (Phase 160)
- MQ CREATE: Create a durable queue with MAXDELIVERY (default 3) and DEBOUNCE options.
- MQ PUSH: Enqueue messages with field/value pairs (returns stream ID).
- MQ POP: Claim messages with optional COUNT (defaults to 1). Increments the delivery counter.
- MQ ACK: Acknowledge messages by stream ID.
- MQ DLQLEN: Return dead-letter queue depth.
- MQ TRIGGER: Register debounced trigger callbacks with configurable debounce interval.
- MQ PUBLISH: Transactional enqueue within a TXN block, applied on TXN COMMIT.
- Dead-letter queue: Automatic DLQ at {queue_key}::mq:dlq after exceeding MAXDELIVERY attempts.
- TriggerRegistry: Debounced callback execution via pub/sub publish.
- WAL recovery: replay_mq_wal() with cursor rollback for durable queue state restoration.
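This sketch models MAXDELIVERY and dead-lettering. It assumes that a message already delivered MAXDELIVERY times moves to the DLQ on its next claim, instead of being handed out again. A hypothetical `requeue_unacked` call models redelivery of unacknowledged messages. Persistence, debounce and triggers are omitted.

```rust
use std::collections::VecDeque;

#[derive(Clone, Debug, PartialEq)]
pub struct Msg {
    pub id: u64,
    pub deliveries: u32,
}

/// Hypothetical in-memory model of a durable queue with a DLQ.
pub struct Queue {
    pub max_delivery: u32,
    ready: VecDeque<Msg>,
    pending: Vec<Msg>,
    pub dlq: Vec<Msg>,
}

impl Queue {
    pub fn new(max_delivery: u32) -> Self {
        Self { max_delivery, ready: VecDeque::new(), pending: Vec::new(), dlq: Vec::new() }
    }

    pub fn push(&mut self, id: u64) {
        self.ready.push_back(Msg { id, deliveries: 0 });
    }

    /// Claims up to `count` messages and bumps each delivery counter. A message
    /// already delivered `max_delivery` times goes to the DLQ instead.
    pub fn pop(&mut self, count: usize) -> Vec<u64> {
        let mut claimed = Vec::new();
        while claimed.len() < count {
            let Some(mut msg) = self.ready.pop_front() else { break };
            if msg.deliveries >= self.max_delivery {
                self.dlq.push(msg);
                continue;
            }
            msg.deliveries += 1;
            claimed.push(msg.id);
            self.pending.push(msg);
        }
        claimed
    }

    pub fn ack(&mut self, id: u64) -> bool {
        match self.pending.iter().position(|m| m.id == id) {
            Some(pos) => {
                self.pending.swap_remove(pos);
                true
            }
            None => false,
        }
    }

    /// Hypothetical redelivery trigger: unacked claims return to the ready queue.
    pub fn requeue_unacked(&mut self) {
        self.ready.extend(self.pending.drain(..));
    }
}
```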
Added — Handler Parity
- All TXN/TEMPORAL/MQ/WS commands wired into handler_monoio.rs, handler_sharded.rs, uring_handler.rs, and handler_single.rs.
- 12 KV transaction integration tests, 14 MQ integration tests, workspace cross-shard dispatch tests.
- TEMPORAL/MQ/WS/TXN entries added to test-commands.sh and test-consistency.sh.
[0.1.7] — 2026-04-17
Added — BM25 Full-Text Search Engine (Phases 149-156)
- BM25 inverted index: Full-text search with multi-field boosting and per-field term frequency tracking.
- TEXT field type: Unicode tokenization with stemming and normalization in FT.CREATE.
- TAG field type: Categorical tag filtering with multi-value support (@field:{val1|val2}).
- NUMERIC field type: Range filtering (@field:[min max]) in FT.CREATE and FT.SEARCH.
- FT.AGGREGATE: Aggregation pipeline with GROUPBY/REDUCE, scatter-gather across shards, HLL COUNT_DISTINCT.
- Three-way RRF hybrid fusion: Combines BM25 + dense vector + sparse vector results via Reciprocal Rank Fusion.
- Typo tolerance: FST Levenshtein fuzzy matching (%%term%%) and prefix search (term*).
- HIGHLIGHT/SUMMARIZE: Post-processors for formatting search results with matched term highlighting.
- Multi-shard DFS global IDF: Distributed frequency statistics for accurate BM25 scoring regardless of shard count.
- FT.DROPINDEX DD: Atomic index + document deletion flag — deletes all hash keys matching index prefixes.
- Python SDK text module: client.text.text_search(), client.text.aggregate(), client.text.hybrid_search() with a typed pipeline DSL.
- LangChain/LlamaIndex hybrid adapters: Framework integrations updated for three-way hybrid search.
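Reciprocal Rank Fusion scores each document by summing w_i / (k + rank_i) over the result lists that contain it. This sketch performs weighted three-way fusion. Several details are assumptions: k = 60 is the constant from the original RRF paper, and the per-list weights, 1-based ranks and id tie-break are also not taken from Moon.

```rust
use std::collections::HashMap;

/// Weighted RRF: score(d) = Σ_i w_i / (k + rank_i(d)), 1-based ranks.
/// Documents missing from a list get nothing from that list.
pub fn rrf_fuse(lists: &[(&[&str], f64)], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for (ranking, weight) in lists {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += weight / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().map(|(d, s)| (d.to_string(), s)).collect();
    // Highest score first; ties broken by id for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```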
Statistics
- 8 phases (149-156), 27 plans, 26 requirements, 122 commits.
[0.1.6] — 2026-04-15
Added — AI-Native Data Primitives
- Multi-field vector indexes (FT.CREATE): multiple VECTOR fields per index, per-field segment storage, and field-targeted @field_name syntax in the FT.SEARCH KNN clause.
- Sparse vector module (src/vector/sparse/): inverted index with SparseStore, enabling BM25-style sparse retrieval alongside dense HNSW.
- Hybrid dense+sparse search: SPARSE clause in FT.SEARCH with Reciprocal Rank Fusion (RRF) for combining dense and sparse results.
- Text index (src/vector/text_index.rs): Unicode tokenization pipeline with stemming and normalization, feature-gated under text-index.
- Boolean and geo filter expressions: BoolEq and GeoRadius filter variants with evaluation logic in FilterExpr.
- FT.RECOMMEND: centroid-based recommendation over vector indexes; computes the centroid of the seed vectors and returns its nearest neighbors.
- FT.NAVIGATE: multi-hop knowledge graph navigation from vector search results, bridging vector and graph queries.
- FT.EXPAND: GraphRAG expansion command — traverses graph edges from vector search results to discover related entities, with configurable depth.
- FT.CACHESEARCH: semantic cache-or-search command — returns cached results on similarity hit, falls back to full search on miss.
- FT.CONFIG SET/GET: runtime configuration for per-index knobs (e.g., the AUTOCOMPACT toggle).
- SESSION clause: session-scoped filtering in FT.SEARCH for multi-tenant and agent memory isolation.
- RANGE threshold post-filter: distance threshold filtering in FT.SEARCH results.
- LIMIT pagination: LIMIT offset count support in FT.SEARCH with multi-segment merge.
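The centroid approach behind FT.RECOMMEND can be sketched with brute-force search: average the seed vectors, then rank the other keys by distance to that centroid. The following are all assumptions: L2 distance, excluding the seeds from the results, and the function names. The real command runs against the vector index.

```rust
/// Mean of the seed vectors; None when there are no seeds.
pub fn centroid(seeds: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dim = seeds.first()?.len();
    let mut c = vec![0.0f32; dim];
    for s in seeds {
        assert_eq!(s.len(), dim, "seed dimension mismatch");
        for (acc, x) in c.iter_mut().zip(s) {
            *acc += x;
        }
    }
    let n = seeds.len() as f32;
    c.iter_mut().for_each(|v| *v /= n);
    Some(c)
}

fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Top-k non-seed keys closest to the centroid of the seed keys' vectors.
pub fn recommend<'a>(corpus: &'a [(&'a str, Vec<f32>)], seed_keys: &[&str], k: usize) -> Vec<&'a str> {
    let is_seed = |key: &str| seed_keys.iter().any(|s| *s == key);
    let seeds: Vec<Vec<f32>> = corpus
        .iter()
        .filter(|entry| is_seed(entry.0))
        .map(|entry| entry.1.clone())
        .collect();
    let Some(c) = centroid(&seeds) else {
        return Vec::new();
    };
    let mut scored: Vec<(&'a str, f32)> = corpus
        .iter()
        .filter(|entry| !is_seed(entry.0))
        .map(|entry| (entry.0, l2_squared(&c, &entry.1)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(k).map(|(key, _)| key).collect()
}
```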
Added — Production Infrastructure
- moondb Python SDK (python/moondb/): high-level client with vector, graph, session, cache, and framework integrations (LangChain, LlamaIndex).
- 5 quickstart examples: RAG, semantic cache, GraphRAG, AI agent tools, memory engine, all using real MiniLM embeddings.
- Production deployment guide (docs/production-guide.md): configuration reference, TLS setup, monitoring, ACL, tuning.
- Dockerfile improvements: OCI labels, admin port exposure, production defaults.
- docker-compose.yml: production configuration with resource limits, health checks, ulimits.
- Prometheus metrics: counters and histograms for v0.1.6 commands (FT.RECOMMEND, FT.NAVIGATE, FT.EXPAND, FT.CACHESEARCH, FT.CONFIG).
- OpenTelemetry stubs: otel feature flag with tracing infrastructure for future OTLP export.
- Benchmark script (scripts/bench-v0.1.6.sh): automated benchmarks for new vector search features.
Added — Graph-Vector Integration
- EXPAND GRAPH clause in FT.SEARCH: inline graph expansion during vector search with configurable depth.
- graph_expand.rs bridge module: connects vector search results to the graph traversal engine.
- key_to_node mapping on NamedGraph: enables graph expansion from vector search key hashes.
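This sketch shows depth-limited expansion from vector-search hits using breadth-first search. The graph is a hypothetical adjacency map keyed by node id, standing in for key_to_node plus the graph store. The traversal order and the (node, hops) output format are assumptions.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first expansion from seed nodes, at most `depth` hops out.
/// Returns (node, hops) in discovery order; seeds are at hop 0.
pub fn expand(adj: &HashMap<u64, Vec<u64>>, seeds: &[u64], depth: usize) -> Vec<(u64, usize)> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    for &s in seeds {
        if seen.insert(s) {
            queue.push_back((s, 0));
        }
    }
    let mut out = Vec::new();
    while let Some((node, hops)) = queue.pop_front() {
        out.push((node, hops));
        if hops == depth {
            continue; // do not traverse past the configured depth
        }
        if let Some(neighbors) = adj.get(&node) {
            for &next in neighbors {
                if seen.insert(next) {
                    queue.push_back((next, hops + 1));
                }
            }
        }
    }
    out
}
```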
Fixed — Critical Production Bugs
- Deadlock in cross-shard HSET auto-index: parking_lot::Mutex re-entry in the PipelineBatch and PipelineBatchSlotted handlers. shard_databases.vector_store(shard_id) attempted to re-lock a non-reentrant mutex already held by the caller. Fixed by using the passed-in reference.
- Auto-index HSET inside MULTI/EXEC: vector auto-indexing now works correctly within transactions and pipeline batch paths.
- FT.RECOMMEND filter bug: filter expressions were not applied correctly to recommendation results.
- FT.CONFIG/RECOMMEND/NAVIGATE/EXPAND routing: commands now dispatched correctly in all connection handlers (monoio, tokio, single-threaded).
- Session deduplication: fixed duplicate session entries in search results.
- Inline filter parsing: corrected parsing of filter expressions in inline command mode.
- FT.COMPACT, FT.CONFIG, FT.CACHESEARCH metadata: registered in phf command metadata table for ACL and COMMAND DOCS.
Fixed — CI/Release Pipeline
- nfpm download URL: pinned v2.46.1 with the correct Linux_x86_64 asset name (the old nfpm_linux_amd64.tar.gz returns 404).
- SBOM filenames: cargo-cyclonedx v0.5+ writes .json, not .cdx.json.
- Cosign keyless signing: added the id-token: write permission for Sigstore OIDC (signing was falling back to the interactive device flow).
- Package job race condition: deb/rpm upload now waits for the GitHub release to be created first.
- upload-artifact: upgraded v4 → v7 (Node.js 20 deprecation).
- Test compilation: added a tokio dev-dependency with the process feature for blocking_list_timeout.rs under default (monoio) features.
Changed
- Graph enabled by default: the graph feature is now included in the default feature set.
- Vector index persistence: v2 format with backward-compatible v1 migration.
- Memory engine example: rewritten as 142-line script with real MiniLM embeddings (was 487-line complex agent loop).
- Clippy 1.94 compliance: all warnings resolved for Rust 1.94 MSRV.
Validation
- 2,613+ unit tests pass (release mode, default features).
- 2,139 library tests pass under the runtime-tokio feature set.
- 184 unsafe blocks, all with SAFETY comments (audit pass).
- 0 unannotated unwraps on hot paths (ratchet pass).
- Zero clippy warnings (default + runtime-tokio,jemalloc feature sets).
- cargo fmt --check clean.
- 8 fuzz targets in CI.
- Full release pipeline validated: 6 binary targets, Docker image, deb/rpm packages, SBOMs, cosign signatures.
[0.1.5] — 2026-04-12
Added — Moon Console (Interactive Data Client)
- HTTP/WebSocket gateway (src/admin/):
  - REST endpoints: /api/v1/info, /api/v1/command, /api/v1/keys, /api/v1/key/*, /api/v1/memory/treemap, /api/v1/hnsw/trace;
  - a WebSocket-to-RESP3 bridge at /ws/console;
  - an SSE metrics stream at /sse/metrics (1 Hz);
  - CORS allowlist, per-IP token-bucket rate limit, HMAC-SHA256 Bearer auth;
  - HTTP/2 support and static file serving via rust-embed.
- React 19 console (console/): 7-view SPA (Dashboard, Browser, Console, Vector Explorer, Graph Explorer, Memory, Help) served at /ui/. 50.9 KB gzipped initial bundle (6× under the 300 KB target) via Vite 8 + manual chunk splitting (Three.js/Monaco/Recharts lazy-loaded).
- KV Data Browser: namespace tree, virtual-scrolled key list (TanStack Virtual), type-specific editors for Strings/Hashes/Lists/Sets/Sorted Sets/Streams, TTL display + edit, bulk delete with toasts.
- Query Console: Monaco editor with RESP + Cypher Monarch syntax, 233-command auto-complete, multi-tab, history, Cmd+Enter (current line) / Cmd+Shift+Enter (whole buffer), line-by-line execution for paste safety.
- Vector 3D Explorer: UMAP projection in a web worker, HNSW layer overlay, KNN search with distance rings, lasso selection, Three.js r183 + React Three Fiber.
- Graph 3D Explorer: force-directed layout (d3-force-3d worker), Cypher editor, node/edge property inspector, hybrid query integration.
- Memory view: keyspace treemap (server-aggregated
/api/v1/memory/treemap), slowlog table, command stats. - Built-in Help guide: 427-line Getting Started tutorial with seed examples.
- Core admin commands (
src/command/server_admin.rs): FLUSHALL, FLUSHDB, DBSIZE, DEBUG OBJECT/SLEEP/JMAP, MEMORY USAGE — closing pre-existing dispatch gaps. - Multi-shard SCAN fan-out (
src/admin/scan_fanout.rs): composite cursor{shard_id}:{cursor}so Browser sees unified keyspace. - Frontend test infrastructure: 56 Vitest unit tests + 9 Playwright E2E specs. New
scripts/test-integration.shharness and.github/workflows/console-integration.yml. - Admin-port hardening (
src/admin/{auth,cors,rate_limit,middleware}.rs): Bearer auth, CORS allowlist, per-IP rate limit.
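The composite cursor lets a single SCAN cursor walk every shard in turn. Below is a minimal sketch under three assumptions. First, the cursor string is literally `{shard_id}:{cursor}`. Second, a bare `0` both starts and ends the scan, as in Redis. Third, the fan-out advances to the next shard when the current shard's own cursor wraps to 0. The helper names are hypothetical and are not functions from `src/admin/scan_fanout.rs`.

```rust
// Hypothetical helpers for a composite SCAN cursor of the form "{shard_id}:{cursor}".
// Assumption: a bare "0" both starts a scan and signals that every shard is done.

/// Format a composite cursor for one shard position.
pub fn encode_cursor(shard_id: usize, cursor: u64) -> String {
    format!("{shard_id}:{cursor}")
}

/// Parse a client-supplied cursor; "0" means "start at shard 0".
pub fn decode_cursor(s: &str) -> Option<(usize, u64)> {
    if s == "0" {
        return Some((0, 0));
    }
    let (shard, cursor) = s.split_once(':')?;
    Some((shard.parse().ok()?, cursor.parse().ok()?))
}

/// Compute the cursor to return after scanning `shard_id`, given the
/// shard-local cursor it produced (0 means that shard is exhausted).
pub fn next_cursor(shard_id: usize, shard_next: u64, num_shards: usize) -> String {
    if shard_next != 0 {
        encode_cursor(shard_id, shard_next) // keep scanning this shard
    } else if shard_id + 1 < num_shards {
        encode_cursor(shard_id + 1, 0) // move on to the next shard
    } else {
        "0".to_string() // every shard has been visited
    }
}
```

Keeping the shard id inside the cursor makes the server stateless between SCAN calls: the client simply echoes back whatever cursor it received.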
Fixed
- WebSocket request ID echo: errors now echo back the client's `id`, preventing client-side promise timeouts on malformed input.
- Console type badges: the `execCommand` response was returning the full `{result, type}` envelope; it now unwraps `.result` correctly.
- Multi-line paste in Console: Cmd+Enter now executes the current line only (redis-cli paste behavior). Cmd+Shift+Enter executes the whole buffer line-by-line.
Validation
- 101+ Rust unit/integration tests pass on both `runtime-tokio` and `runtime-monoio`.
- 56 Vitest tests + 9 Playwright specs pass.
- Zero clippy warnings (default + `runtime-tokio,jemalloc` feature sets).
- `cargo fmt --check` clean.
- 8 fuzz targets in CI.
[0.1.4] — 2026-04-11
Added — Graph Engine Integration (v0.1.4, 2026-04-11)
- Property graph engine (`src/graph/`, feature-gated under `graph`): segment-aligned CSR storage with SlotMap generational indices, ArcSwap lock-free reads, Roaring validity bitmaps, and Rabbit Order compaction for cache locality. 8,500+ LOC, 319 tests.
- 12 GRAPH.* commands: CREATE, ADDNODE, ADDEDGE, NEIGHBORS, QUERY, RO_QUERY, EXPLAIN, VSEARCH, HYBRID, INFO, LIST, DELETE — all with RESP3 Map responses and ACL annotations.
- Cypher subset parser: hand-rolled recursive descent with a `logos` lexer, 12 clauses (MATCH/WHERE/RETURN/CREATE/DELETE/SET/MERGE/WITH/UNWIND/CALL/ORDER/LIMIT), parameterized queries (`$param`), nesting depth limit (64), plan caching.
- Hybrid graph+vector queries: graph-filtered vector search, vector-to-graph expansion, vector-guided walk with automatic strategy selection.
- Traversal engine: BFS/DFS/Dijkstra with bounded frontiers (100K cap), temporal decay + distance scoring, segment merge reader across mutable + immutable segments.
- Graph indexes: per-label/type Roaring bitmaps, boomphf minimal perfect hash (~3 bits/key), property B-tree for range queries.
- Cross-shard traversal: scatter-gather via SPSC mesh, graph hash tags for shard co-location, snapshot-LSN forwarding, configurable depth limit.
- Graph MVCC: extends existing TransactionManager with graph write intents, snapshot-isolated multi-hop traversal, bounded epoch hold (30s).
- Graph WAL durability: RESP-encoded graph commands in per-shard WAL, two-pass replay (nodes before edges), CRC32-validated CSR segment persistence.
- Cost-based planner: GraphStats with incremental degree tracking, graph-first vs vector-first strategy selection, P99 hub detection.
- Criterion benchmarks: CSR 1-hop 1.02ns, edge insert 64.8ns, 2-hop BFS 4.99µs, CSR freeze 5.12ms, SIMD cosine 384d 33.9ns.
- Fair comparison benchmark (`tests/graph_bench_compare.rs`): Moon is 2.4x faster than FalkorDB on Cypher MATCH, 19x on native 1-hop, 23x on population.
- New dependencies: `slotmap` 1.x (generational indices), `boomphf` 0.6 (MPH), `logos` 0.14 (Cypher lexer, optional).
Added — High-Impact Redis Command Parity (2026-04-10)
- COPY command — atomic key duplication with DESTINATION, REPLACE options (Redis 6.2+).
- Bit operations — GETBIT, SETBIT, BITCOUNT (byte/bit range modes), BITOP (AND/OR/XOR/NOT), BITPOS (byte/bit range modes) with read-only dispatch variants.
- SORT command — full BY/GET/LIMIT/ALPHA/ASC/DESC/STORE support for lists, sets, and sorted sets.
- Geospatial commands — GEOADD (NX/XX/CH), GEOPOS, GEODIST (M/KM/FT/MI), GEOHASH (11-char base32), GEOSEARCH (FROMLONLAT/FROMMEMBER, BYRADIUS/BYBOX, WITHCOORD/WITHDIST/WITHHASH), GEOSEARCHSTORE.
- CONFIG REWRITE — atomic write of runtime config to `<dir>/moon.conf` (tmpfile + rename). CONFIG RESETSTAT stub.
- CLIENT PAUSE/UNPAUSE — delays command processing with WRITE-only mode support. CLIENT INFO, CLIENT LIST (stub), CLIENT NO-EVICT/NO-TOUCH accepted.
- MEMORY USAGE/DOCTOR/HELP — key memory estimation via `estimate_memory()`.
- Lazyfree threshold — configurable via `CONFIG SET lazyfree-threshold N` (default 64).
- GETBIT/SETBIT metadata — added to PHF command registry.
- GEOADD/GEOSEARCHSTORE — added to AOF write commands test list.
- EXPIREAT/PEXPIREAT — absolute Unix timestamp expiry (seconds/milliseconds).
- EXPIRETIME/PEXPIRETIME — read back absolute expiry timestamp.
- FLUSHDB/FLUSHALL — FLUSHDB clears all keys in the current database; FLUSHALL clears every database.
- TIME — server clock as `[seconds, microseconds]`.
- RANDOMKEY — return a random key from the database.
- TOUCH — refresh LRU/LFU access time without reading value.
- SHUTDOWN — dispatch entry (graceful stop via signal handler).
- BITFIELD — GET/SET/INCRBY with type specifiers (u8/i16/u32/...), OVERFLOW WRAP/SAT/FAIL.
- LCS — Longest Common Subsequence with LEN option.
- XSETID — set stream last-delivered ID without adding entries.
- GEORADIUS/GEORADIUSBYMEMBER — deprecated wrappers translating to GEOSEARCH.
- OBJECT FREQ/IDLETIME/REFCOUNT — LFU counter, idle seconds, reference count introspection.
- LOLWUT — Easter egg returning Moon version.
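GEOHASH's 11-character base32 strings follow the standard geohash scheme. Longitude and latitude are bisected alternately, starting with longitude. Each bisection emits one bit, and each group of 5 bits maps to a character of `0123456789bcdefghjkmnpqrstuvwxyz`. The sketch below encodes a coordinate that way. It is illustrative only and does not model how Moon derives the string from its internal representation.

```rust
const BASE32: &[u8; 32] = b"0123456789bcdefghjkmnpqrstuvwxyz";

/// Standard geohash of (lon, lat) with `len` characters.
pub fn geohash(lon: f64, lat: f64, len: usize) -> String {
    let (mut lon_lo, mut lon_hi) = (-180.0_f64, 180.0_f64);
    let (mut lat_lo, mut lat_hi) = (-90.0_f64, 90.0_f64);
    let mut out = String::with_capacity(len);
    let mut is_lon = true; // bits alternate: longitude, latitude, longitude, ...
    let mut bits = 0u8;
    let mut nbits = 0;
    while out.len() < len {
        let (val, lo, hi) = if is_lon {
            (lon, &mut lon_lo, &mut lon_hi)
        } else {
            (lat, &mut lat_lo, &mut lat_hi)
        };
        // Bisect the current interval and keep the half containing `val`.
        let mid = (*lo + *hi) / 2.0;
        bits <<= 1;
        if val >= mid {
            bits |= 1;
            *lo = mid;
        } else {
            *hi = mid;
        }
        is_lon = !is_lon;
        nbits += 1;
        if nbits == 5 {
            out.push(BASE32[bits as usize] as char); // 5 bits -> one base32 char
            bits = 0;
            nbits = 0;
        }
    }
    out
}
```

Longer prefixes mean smaller cells, so two points that share a long prefix are close together. The reverse does not always hold near cell boundaries.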
Added — Client Connection Security Hardening (2026-04-10)
- `--maxclients` (P0): Connection limit with atomic CAS rejection (default 10000, 0=unlimited). Returns `-ERR max number of clients reached` when exceeded.
- `--timeout` (P0): Client idle timeout in seconds (default 0=disabled). Disconnects idle clients via `tokio::time::timeout` / `monoio::select!`.
- `--tcp-keepalive` (P0): TCP keepalive interval (default 300s, 0=disabled). Sets `SO_KEEPALIVE` + `TCP_KEEPIDLE` on accepted sockets via `socket2`.
- AUTH rate limiting (P0): Per-IP exponential backoff on AUTH failures (100ms base, 10s cap, 60s auto-reset). New module `src/auth_ratelimit.rs`.
- CLIENT LIST / INFO / KILL (P1): Global client registry with Drop-guard deregister. Redis-compatible output format. Kill by ID/ADDR/USER. New module `src/client_registry.rs`.
- CLIENT PAUSE / UNPAUSE (P1): Server-wide pause with ALL/WRITE modes and auto-expiry. New module `src/client_pause.rs`.
- CLIENT NO-EVICT / NO-TOUCH (P1): Accepted stubs for Redis compatibility.
- ACL GENPASS (P1): Cryptographically secure random password generation (1-4096 bits, hex output).
- CONFIG GET/SET support for `maxclients`, `timeout`, `tcp-keepalive` (runtime-mutable).
- Monoio connection tracking: Added missing `record_connection_opened` / `record_connection_closed` calls for an accurate `connected_clients` metric.
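Below is a minimal sketch of per-IP AUTH backoff. It assumes the delay doubles with each consecutive failure starting at 100ms, is capped at 10s, and resets once 60s pass without a failure. `AuthBackoff`, `backoff_delay`, and their methods are hypothetical names, not the API of `src/auth_ratelimit.rs`. Timestamps are passed in explicitly so the logic stays testable.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

const BASE_DELAY: Duration = Duration::from_millis(100);
const MAX_DELAY: Duration = Duration::from_secs(10);
const RESET_AFTER: Duration = Duration::from_secs(60);

/// Delay after the n-th consecutive failure: 100ms, 200ms, 400ms, ... capped at 10s.
pub fn backoff_delay(failures: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow; 2^16 * 100ms is far past the cap.
    let exp = failures.saturating_sub(1).min(16);
    BASE_DELAY.saturating_mul(1u32 << exp).min(MAX_DELAY)
}

struct FailureState {
    failures: u32,
    last_failure: Instant,
}

/// Hypothetical per-IP failure tracker.
#[derive(Default)]
pub struct AuthBackoff {
    per_ip: HashMap<IpAddr, FailureState>,
}

impl AuthBackoff {
    /// Record a failed AUTH from `ip` at `now`; returns how long to delay that client.
    pub fn record_failure(&mut self, ip: IpAddr, now: Instant) -> Duration {
        let state = self
            .per_ip
            .entry(ip)
            .or_insert(FailureState { failures: 0, last_failure: now });
        // Forget old failures once the IP has been quiet for RESET_AFTER.
        if now.saturating_duration_since(state.last_failure) >= RESET_AFTER {
            state.failures = 0;
        }
        state.failures = state.failures.saturating_add(1);
        state.last_failure = now;
        backoff_delay(state.failures)
    }

    /// A successful AUTH clears the IP's history.
    pub fn record_success(&mut self, ip: IpAddr) {
        self.per_ip.remove(&ip);
    }
}
```

Tracking by IP rather than by connection stops an attacker from resetting the penalty simply by reconnecting.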
Fixed — Deep Review Findings (2026-04-11)
- DoS protection: `execute_profile` and `execute_mut` Cypher paths now enforce MAX_HOPS_LIMIT=20 and MAX_RESULT_ROWS=100K (were unbounded).
- WAL correctness: Cypher DELETE passes the actual LSN to `remove_node` / `remove_edge` (was hardcoded to 0).
- GRAPH.DROP metadata: added missing phf dispatch table entry.
- SAFETY comments: added to all 7 unsafe SIMD/mmap functions.
- BFS 30% faster: scratch buffer reuse in SegmentMergeReader, zero-alloc CsrStorage callback, MergedNeighbor derives Copy.
- ParallelBfs: uses a plain HashSet on the sequential path (was a DashSet, paying 64-shard overhead).
- Recovery hardening: CSR manifest path traversal validation, WAL embedding dimension cap (65536), LSN `saturating_add`.
- CI optimized: consolidated 26 jobs → 4 per PR, concurrency groups cancel superseded runs, fixed org runner group for public repos.
Fixed — Wave 0-4 Gap Closure (2026-04-09)
- ZREVRANGEBYSCORE/ZREVRANGEBYLEX correctness bug: fixed a double swap of min/max bounds in `zrange_by_score` and `zrange_by_lex` that caused empty results for finite score ranges (e.g., `ZREVRANGEBYSCORE key 3 1`). Added a finite-range test to `test-commands.sh`.
- INFO command enriched: the Clients section now reports `connected_clients`; the Memory section reports `used_memory` / `used_memory_human` / `used_memory_rss` (from `/proc/self/status`); the Replication section is wired to the actual `ReplicationState` (role, connected_slaves, master_replid, master_repl_offset).
- Tracing spans: added `#[instrument]` to connection handlers (single, monoio), replication master (tokio, monoio), HNSW compaction, and AOF rewrite — 6 new spans.
- Replication lag metric wired: the `moon_replication_lag_bytes` Prometheus gauge is now updated from `get_replication_info()`.
- CI supply chain security: `cargo deny check` + `cargo audit` added to the CI pipeline (`deny.toml` was previously unenforced).
- Release pipeline: aarch64-unknown-linux-gnu build added via `cross` for the primary production target.
- Crash matrix expanded: BGSAVE and BGREWRITEAOF crash cells added (6/7 coverage).
- Compatibility tests expanded: Stream (XADD/XLEN/XRANGE/XTRIM), Lua scripting (EVAL/EVALSHA/SCRIPT), and ACL (WHOAMI/LIST) tests added to `redis_compat.rs`.
Added — Production Contract (Phase 87, 2026-04-08)
- `docs/PRODUCTION-CONTRACT.md` — Moon's v1.0 promises: per-command-class SLOs (provisional until Phase 97), supported platform matrix (Linux aarch64 primary, Linux x86_64 secondary contingent on `PERF-04`, macOS dev-only via OrbStack), durability mode semantics per `appendfsync` × failure class, availability & replication guarantees, security guarantees, an explicit out-of-scope list, and a machine-checkable GA Exit Criteria checklist that every v0.1.3 phase ticks off. This is the contract every downstream hardening phase (88–100) tests against.
- `docs/runbooks/` — stub directory for operator runbooks authored in Phase 99 (REL-05).
Changed — Toolchain Upgrade (Phase 88, 2026-04-08)
- MSRV bumped from Rust 1.85 to 1.94.0. `rust-toolchain.toml` committed so fresh clones auto-install the pinned version; CI workflows (`ci.yml`, `codeql.yml`, `release.yml`) and OrbStack `moon-dev` VM provisioning in `CLAUDE.md` updated. No language/runtime behavior change; downstream phases benefit from new clippy lints and std/compiler improvements. Contributors must run `rustup update` on next pull.
Added — Production Readiness Phases 92-105 (2026-04-09)
- Observability: Prometheus `/metrics` on `--admin-port`, SLOWLOG GET/LEN/RESET/HELP, HEALTHZ + READYZ commands, `/healthz` + `/readyz` HTTP endpoints, INFO extended with Server/Clients/Memory/Stats/CPU sections, `--check-config` flag, per-command latency histograms + connection metrics wired into dispatch.
- Durability proof: Crash-injection test matrix, torn-write WAL v3 tests (CRC32C validated), Jepsen-lite linearizability harness, backup/restore workflow test.
- Replication hardening: PSYNC partial resync, full resync, network partition, kill-restart, replica promotion tests.
- Client compatibility: CI matrix (redis-py, go-redis, jedis, ioredis, node-redis, redis-rs, hiredis), 24 Redis compat tests, vector client smoke script, `docs/redis-compat.md`.
- Performance gates: Criterion regression CI with baseline caching, RSS-per-key memory gate script.
- Security hardening: `deny.toml` (cargo-deny), `SECURITY.md`, `docs/THREAT-MODEL.md`, `docs/security/lua-sandbox.md`, TLS cipher suite freeze.
- Release engineering: `docs/versioning.md`, 6 operator runbooks, CHANGELOG CI gate, user docs (getting-started, configuration, monitoring), release pipeline SHA256 checksums + SBOM + cosign.
[0.1.3] — 2026-04-10
Production-readiness foundation: dispatch hot-path recovery, vector-search 4× QPS + correctness fixes, and the tiered disk-offload landing with 100% crash recovery across 7 persistence configurations. Bundles three work streams originally tracked as separate Unreleased blocks (Apr 6–8).
Dispatch Hot-Path Recovery (2026-04-08)
Pipelined SET +37%, pipelined GET +68% at p=16 after PR #43 regression recovery.
Three targeted perf fixes landed after flamegraph-driven analysis of pipelined SET on aarch64 (OrbStack moon-dev, 1 shard, default config, `redis-benchmark -c 50 -n 3M -P 16 -r 100000 -d 64`):
| Metric | Broken baseline | After T0a+T0b+T0c | Δ |
|---|---|---|---|
| SET p=1 (ratio vs Redis) | 0.99x | 1.12x | +13pp |
| SET p=16 | 1.42M/s | 1.94M/s | +37% |
| SET p=32 | 2.06M/s | 2.26M/s | +10% |
| GET p=16 | 2.40M/s | 4.04M/s | +68% |
| GET p=128 vs Redis | 1.87x | 1.91x | +4pp |
Perf fixes
- T0a — Thread-local cached clock (4041b0d). `Entry::new_*` constructors were calling `SystemTime::now()` / `clock_gettime` on every write, showing up at 10.14% of CPU in the perf profile. Added a thread-local `Cell<u32>` / `Cell<u64>` refreshed once per shard tick (~1 ms) from `CachedClock::update()`. `current_secs` / `current_time_ms` now read the Cell and fall back to the syscall only in tests / cold init. `__kernel_clock_gettime` dropped from 10.14% → 0% of CPU.
- T0b — Hot command dispatch bypasses phf SipHasher (4b0eec3). The command metadata registry is a `phf::Map` keyed by `&'static str` using `SipHasher`, which is cryptographic overkill for a 173-entry ASCII table. Combined `phf::Map::get` + `SipHasher::write` + `hash_one` was ~6% of CPU. Added a direct match path in `command::metadata::lookup`: pack the first ≤8 bytes of the command name as a `u64` with ASCII letters uppercased, then match against 24 hand-picked hot commands (GET/SET/DEL/TTL/MGET/MSET/INCR/DECR/HSET/HGET/HDEL/HLEN/LPOP/RPOP/LLEN/PING/LPUSH/RPUSH/EXPIRE/EXISTS/INCRBY/DECRBY/SELECT/HGETALL). The hot path resolves through a pre-resolved `LazyLock<[&'static CommandMeta; 24]>`: a single array index, no hashing. Cold commands fall through to phf unchanged. Correctness is asserted by the `hot_path_matches_phf_map` test: every hot entry must return the same `&'static` pointer as a direct phf probe, in both upper and lower case.
- T0c — ACL unrestricted-user short-circuit (4603511). Every command executed `check_command_permission` + `check_key_permission` even for the default `on nopass ~* &* +@all` user, burning 2.11% of CPU on lowercasing, `extract_command_keys`, and glob matching. Added a cached `unrestricted: bool` field to `AclUser`, true iff the user is enabled, has `AllAllowed` commands, only `~*` read/write key patterns, and only `*` channel patterns. The three `check_*_permission` methods early-return `None` on `unrestricted` before any allocation or iteration. The cache is recomputed once at the end of `apply_rule` (the single mutation entry point used by ACL SETUSER / LOAD / reset). Correctness is covered by three new tests (`default_user_is_unrestricted`, `restrictions_clear_unrestricted_flag`, `unrestricted_user_passes_all_checks`).
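The T0b lookup can be illustrated with a small sketch. The command name is uppercased and packed into a `u64`, and a `match` on precomputed constants replaces hashing with plain integer comparisons. Only three commands are shown. Returning a slot index stands in for the `&'static CommandMeta` array lookup. Sending names longer than 8 bytes straight to the cold path is an assumption about how the real code treats them.

```rust
/// Pack a command name (at most 8 bytes) into a u64 with ASCII letters uppercased.
/// Longer names return None and take the cold (phf) path in this sketch.
pub fn pack_upper(name: &[u8]) -> Option<u64> {
    if name.is_empty() || name.len() > 8 {
        return None;
    }
    let mut buf = [0u8; 8];
    for (dst, &b) in buf.iter_mut().zip(name) {
        *dst = b.to_ascii_uppercase();
    }
    Some(u64::from_le_bytes(buf))
}

/// Compile-time packing of an already-uppercase name (zero-padded to 8 bytes).
const fn pack_const(name: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    let mut i = 0;
    while i < name.len() {
        buf[i] = name[i];
        i += 1;
    }
    u64::from_le_bytes(buf)
}

const GET: u64 = pack_const(b"GET");
const SET: u64 = pack_const(b"SET");
const HGETALL: u64 = pack_const(b"HGETALL");

/// Hypothetical hot-path lookup: Some(slot) indexes a pre-resolved metadata array,
/// None means "fall through to the phf map".
pub fn hot_slot(name: &[u8]) -> Option<usize> {
    match pack_upper(name)? {
        GET => Some(0),
        SET => Some(1),
        HGETALL => Some(2),
        _ => None,
    }
}
```

Because all hot command names fit in 8 bytes, packing is lossless for them. A miss costs only the comparisons before falling back to the existing map.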
Correctness fix (PR #43 review)
- Inline monoio fast-path restricted to GET (613c164). The previous inline dispatch in `try_inline_dispatch` handled both GET and SET directly against the DashTable, bypassing replica READONLY enforcement, ACL checks, maxmemory eviction, client-side tracking invalidation, keyspace notifications, replication propagation, and blocking-waiter wakeups. Under any of those configurations the inlined SET would silently diverge from the normal path: accepted writes on replicas, ACL-denied clients writing, maxmemory overshoot, stale client-side caches. Fix: the inline path now handles only `*2\r\n$3\r\nGET`; SET and everything else fall through to the full dispatcher where all side effects run.
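On the wire, `GET foo` is the RESP array `*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n`: a two-element array whose first bulk string has length 3. Below is a sketch of the narrowed gate. It assumes a byte-exact prefix check, so in this version a lowercase `get` simply takes the full dispatcher path. The constant and function names are hypothetical.

```rust
/// RESP encoding of a two-element array whose first element is the
/// 3-byte bulk string "GET"; the key follows this prefix.
const INLINE_GET_PREFIX: &[u8] = b"*2\r\n$3\r\nGET\r\n";

/// True only for frames the inline path may serve; every other command
/// (including SET) must go through the full dispatcher.
pub fn eligible_for_inline(frame: &[u8]) -> bool {
    frame.starts_with(INLINE_GET_PREFIX)
}
```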
Cold-tier lock hygiene (PR #43 review)
- Release shard read guard before cold-tier disk read (ff51135). The cold-tier fallback in `server::conn::blocking` previously called `get_cold_value()`, which does a synchronous `std::fs::read()`, while still holding the per-shard read guard, blocking all concurrent operations on that shard during disk I/O. The path is now split: `Database::cold_lookup_location` returns the `(ColdLocation, PathBuf)` under the lock, the guard is dropped, and `cold_read::read_cold_entry_at` performs the disk read unlocked.
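The lock-hygiene pattern has three steps: copy what the I/O needs out of the guarded structure, end the guard's scope, and only then block on disk. Below is a minimal sketch with `std::sync::RwLock`. It simplifies in two ways. A plain key→`PathBuf` map stands in for Moon's `(ColdLocation, PathBuf)` lookup, and the whole file is read rather than an entry at an offset. `ShardDb` and `read_cold_value` are hypothetical names.

```rust
use std::collections::HashMap;
use std::io;
use std::path::PathBuf;
use std::sync::RwLock;

/// Hypothetical stand-in for a shard's database: only the cold index is modeled.
pub struct ShardDb {
    pub cold_index: HashMap<Vec<u8>, PathBuf>,
}

/// Resolve the spill location under the read guard, drop the guard, then read
/// from disk without holding any shard lock.
pub fn read_cold_value(shard: &RwLock<ShardDb>, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
    let path = {
        let db = shard.read().expect("shard lock poisoned");
        match db.cold_index.get(key) {
            Some(path) => path.clone(),
            None => return Ok(None),
        }
    }; // `db` (the read guard) is dropped at the end of this block
    std::fs::read(&path).map(Some)
}
```

Cloning the path is the price of releasing the lock early. That small allocation is negligible next to a disk read.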
Additional PR #43 fixes
- `read_overflow_chain` now bounded at 1000 iterations (cycle guard against corrupted `next_page` links)
- `recovery.rs` FPI replay replaces `.unwrap()` on `try_into()` with explicit byte-array construction (coding-guidelines compliance)
- `bench-production.sh`: fixed unsupported `-t zrangebyscore` (→ `zpopmin`), MSET rps parser for `"MSET (10 keys):"` output, heredoc `$(date)` expansion, and Redis RSS probe (`pgrep` / `/proc` instead of missing `lsof`)
- `bench-cold-tier.sh`: removed stray `&` backgrounding `FT.CREATE`
- `test-recovery-all-cases.sh`: `NoPersistence` case now PASSes at 0 keys
- `benches/resp_parsing.rs`, `benches/get_hotpath.rs`: wrap `Vec<Frame>` in `FrameVec` via `.into()` after the `frame.rs` type change
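The cycle guard amounts to giving the link-following loop a fixed iteration budget. Below is a minimal sketch that assumes an in-memory map of pages and a 1000-page limit. `Page`, `ChainError`, and `read_chain_bounded` are hypothetical stand-ins, not the signature of `read_overflow_chain`.

```rust
use std::collections::HashMap;

/// Upper bound on pages followed; a corrupted `next_page` cycle hits this.
const MAX_CHAIN_PAGES: usize = 1000;

/// Hypothetical overflow page: payload plus link to the next page, if any.
pub struct Page {
    pub data: Vec<u8>,
    pub next_page: Option<u32>,
}

#[derive(Debug, PartialEq)]
pub enum ChainError {
    MissingPage(u32),
    TooLong,
}

/// Concatenate payloads along the chain starting at `start`, giving up after
/// MAX_CHAIN_PAGES pages instead of looping forever on a cycle.
pub fn read_chain_bounded(pages: &HashMap<u32, Page>, start: u32) -> Result<Vec<u8>, ChainError> {
    let mut out = Vec::new();
    let mut next = Some(start);
    for _ in 0..MAX_CHAIN_PAGES {
        let Some(id) = next else {
            return Ok(out); // reached the end of the chain
        };
        let page = pages.get(&id).ok_or(ChainError::MissingPage(id))?;
        out.extend_from_slice(&page.data);
        next = page.next_page;
    }
    // Budget spent: fine only if the last page had no successor.
    match next {
        None => Ok(out),
        Some(_) => Err(ChainError::TooLong),
    }
}
```

A fixed budget detects cycles without a visited set. It also rejects legitimate chains longer than the limit, so the limit has to exceed the largest chain the format can produce.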
All 1872 unit tests pass under `--no-default-features --features runtime-tokio,jemalloc`. Follow-up work (T1 `dispatch_raw` zero-alloc entry point, Tier 2 storage/DashTable optimization, residual ACL SipHash elimination) is captured as todos in `.planning/todos/pending/`.
Vector Search 4× QPS + Correctness (2026-04-07)
4x search QPS, 4.1x lower latency, 2.56x faster than Qdrant on real MiniLM data.
Performance (perf-profiled on GCloud c3-standard-8, Intel Xeon 8481C)
- 8-wide ILP unrolled `dist_bfs_budgeted` subcent path (the real hot loop, 90% of search time per perf profile). Loads 4 code bytes + 1 sign byte per iteration into 8 independent f32 accumulators. Confirmed via objdump: parallel `vaddss` into xmm3-xmm8 (vs. a serial single-xmm0 chain before).
- 4-way unrolled `dist_bfs` non-subcent path with `unsafe` pointer arithmetic.
- Pre-allocated ADC LUT in `SearchScratch` (eliminates a 32-65KB heap allocation per query).
- Hoisted IVF `q_rotated` and `lut_buf` allocation out of the per-segment loop.
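The 8-wide unroll helps because adding into a single accumulator forms a serial dependency chain, while 8 independent accumulators let the CPU overlap the additions. Below is a scalar sketch of the idea on an ADC-style lookup. It assumes one unpacked 4-bit code per dimension and 16 LUT entries per dimension; the real kernel reads packed code bytes plus a sign byte. Both function names are hypothetical.

```rust
/// Serial reference: one accumulator, so every add waits on the previous one.
/// `lut` holds 16 partial distances per dimension; `codes[d]` must be < 16.
pub fn adc_serial(lut: &[f32], codes: &[u8]) -> f32 {
    codes
        .iter()
        .enumerate()
        .map(|(d, &c)| lut[d * 16 + c as usize])
        .sum()
}

/// Same distance with 8 independent accumulators: the 8 adds in each step have
/// no data dependency on each other, so the CPU can execute them in parallel.
pub fn adc_ilp8(lut: &[f32], codes: &[u8]) -> f32 {
    let mut acc = [0.0f32; 8];
    let main = codes.len() / 8 * 8;
    for base in (0..main).step_by(8) {
        for (j, a) in acc.iter_mut().enumerate() {
            let d = base + j;
            *a += lut[d * 16 + codes[d] as usize];
        }
    }
    // Combine the lanes, then handle the leftover dimensions serially.
    let mut sum: f32 = acc.iter().sum();
    for d in main..codes.len() {
        sum += lut[d * 16 + codes[d] as usize];
    }
    sum
}
```

Reordering the additions can change the result in the last bits compared with a serial sum. The test therefore uses integer-valued LUT entries, for which both orders are exact.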
Correctness fixes
- `FT.COMPACT` silent no-op: split `try_compact` (threshold-gated) from `force_compact` (unconditional). Previously `FT.COMPACT` returned OK without compacting when `compact_threshold >= mutable_len`, leaving all vectors in the brute-force O(n) mutable segment.
- `key_hash_to_key` mapping restored (lost in an earlier refactor). `FT.SEARCH` now returns original Redis keys (`doc:N`) instead of `vec:<internal_id>`. Carried through `SearchResult.key_hash` and populated by `remap_to_global_ids`.
- `FT.INFO num_docs` now sums mutable + immutable segments (was 0 after compact).
- Vector index recovery metadata loads without the `--disk-offload` flag (was gated behind `server_config.disk_offload_enabled()`).
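The `FT.COMPACT` fix separates a threshold-gated entry point from an unconditional one. Below is a sketch of that split. `VectorIndex` and its fields are hypothetical simplifications: a compaction here only moves the mutable count into the immutable count. `num_docs` sums both tiers, mirroring the `FT.INFO` fix.

```rust
/// Hypothetical, simplified index state: only segment sizes are tracked.
pub struct VectorIndex {
    pub mutable_len: usize,
    pub immutable_len: usize,
    pub compact_threshold: usize,
}

impl VectorIndex {
    /// Background path: compact only once the mutable segment outgrows the threshold.
    pub fn try_compact(&mut self) -> bool {
        if self.mutable_len <= self.compact_threshold {
            return false;
        }
        self.force_compact();
        true
    }

    /// Explicit FT.COMPACT path: always freeze the mutable segment.
    pub fn force_compact(&mut self) {
        self.immutable_len += self.mutable_len;
        self.mutable_len = 0;
    }

    /// Document count across both tiers (what FT.INFO num_docs should report).
    pub fn num_docs(&self) -> usize {
        self.mutable_len + self.immutable_len
    }
}
```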
Real MiniLM benchmarks (10K vectors, 384d, x86 Xeon 8481C)
| Metric | Mar 31 (M4 Pro) | Apr 7 (Xeon 8481C) | Δ |
|---|---|---|---|
| Recall@10 | 0.9250 | 0.9670 | +4.5% |
| QPS | 1,126 | 1,296 | +15% |
| p50 | 0.878 ms | 0.783 ms | -11% |

| Metric | Moon | Qdrant 1.12 FP32 | Ratio |
|---|---|---|---|
| QPS (10K MiniLM) | 1,296 | 507 | 2.56x |
| p50 | 0.783 ms | 1.79 ms | 2.29x lower |
| Recall@10 | 0.967 | ~0.95 | +1.7pp |
Infrastructure (for future segment merge work)
- `ImmutableSegment::decode_vector` / `iter_live_decoded`
- `MutableSegment::iter_live`
Attempted and reverted
Segment merge on `FT.COMPACT` via TQ4 decode → re-encode. It dropped recall from 0.73 to 0.0005 due to accumulated quantization error across 14 segments. A proper fix requires retaining f32/f16 vectors alongside TQ codes in immutable segments.
Known limitation
TQ4 quantization at 384d with random Gaussian inputs hits a ~0.73 recall floor (curse of dimensionality: all points are nearly equidistant). Real semantic embeddings (clustered) achieve 0.92-0.97 recall with the same code.
Disk Offload & x86_64 Performance (2026-04-06)
Tiered storage, crash recovery, and 2× Redis on x86_64 (Intel Xeon, io_uring).
Added
Disk Offload (Tiered Storage)
- `--disk-offload enable` — evicted keys under maxmemory are spilled to NVMe instead of being deleted.
- Async SpillThread: background pwrite via a dedicated `std::thread` per shard (no event loop blocking).
- Cold read-through: GET transparently reads spilled keys from NVMe DataFiles.
- ColdIndex: in-memory key→file mapping, updated immediately on eviction for consistent reads.
- SpillThread channel capacity: 4096-entry bounded flume channel for burst absorption.
- `--disk-offload-dir`, `--disk-offload-threshold` configuration flags.
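The SpillThread design keeps blocking writes off the event loop. The event loop only sends a request into a bounded channel, and a dedicated `std::thread` performs the write. Below is a dependency-free sketch that uses `std::sync::mpsc::sync_channel` in place of the bounded flume channel. `Spill`, `spawn_spill_thread`, and the `write` callback (standing in for the pwrite into a DataFile) are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical spill request: an evicted key and its serialized value.
pub struct Spill {
    pub key: Vec<u8>,
    pub value: Vec<u8>,
}

/// Spawn a dedicated writer thread fed by a bounded channel of `capacity` slots.
/// `write` stands in for the pwrite into a DataFile.
pub fn spawn_spill_thread<F>(capacity: usize, mut write: F) -> (SyncSender<Spill>, JoinHandle<()>)
where
    F: FnMut(Spill) + Send + 'static,
{
    let (tx, rx) = sync_channel::<Spill>(capacity);
    let handle = thread::spawn(move || {
        // Iteration ends once every sender has been dropped.
        for req in rx {
            write(req);
        }
    });
    (tx, handle)
}
```

With `sync_channel`, `send` blocks once all `capacity` slots are full. An event loop that must never block would use `try_send` and handle the `Full` case instead. The bounded capacity is what absorbs short bursts.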
Crash Recovery
- V3 recovery falls back to appendonly.aof when WAL v3 has 0 commands.
- V2 recovery falls back to appendonly.aof when the shard WAL has 0 commands.
- Automatic `--dir` creation before the AOF writer starts (fixes silent write failure).
- Cold index rebuilt from manifest during v3 recovery.
- Verified: 100% recovery (5000/5000 keys) across 7 persistence configurations after SIGKILL.
Inline GET Optimization
- `read_db` + `get_if_alive` replaces `write_db` + triple-lookup `get()` — single DashTable probe.
- Removed unnecessary write lock for timestamp refresh before inline dispatch.
- Multi-shard inline dispatch: local keys bypass Frame construction via a `key_to_shard()` check.
- Cold storage fallback in `get_readonly` and inline GET dispatch paths.
Changed
- Connection handler eviction uses `try_evict_if_needed_async_spill` when disk offload is enabled.
- `spawn_monoio_connection` passes spill sender, file ID counter, and offload dir to handlers.
- Event loop syncs `next_file_id` between `Rc<Cell<u64>>` (handlers) and a local variable (timer tick).
- Inline dispatch `try_inline_dispatch` takes `now_ms` and `num_shards` parameters.
Fixed
- Data loss under maxmemory: evicted keys were silently deleted instead of spilled to disk (6 bugs).
- Crash recovery restored 0 keys: appendonly.aof was never tried as a fallback source.
- AOF writer silent failure: the `--dir` directory was not created before the AOF writer task started.
- Cold read miss: `get_if_alive` (read path) didn't check cold storage; `get_readonly` returned NULL for spilled keys.
- ColdIndex never initialized: `cold_index` and `cold_shard_dir` were None on all databases at startup.
Performance (GCP c3-standard-8, Intel Xeon 8481C, CPU-pinned)
| Metric | Before | After |
|---|---|---|
| c=1 p=1 GET vs Redis | 0.35x (47K) | 1.0x (47K) — parity |
| c=10 p=64 GET | 2.29M | 4.71M (2.06x Redis) |
| c=50 p=64 GET | 2.36M | 4.81M (2.04x Redis) |
| Disk offload GET overhead | N/A | <1% vs no-persist |
| Recovery (SIGKILL) | 0/5000 | 5000/5000 (100%) |
[0.1.2] - 2026-03-29
Multi-shard scaling milestone. Eliminated negative scaling, achieving 5M GET/s and 2.5M SET/s at 4 shards — both exceeding Redis 8.6.1.
Added
Shared-Read Direct Access (Phase 49)
- `Arc<ShardDatabases>` with `parking_lot::RwLock<Database>` replaces `Rc<RefCell<Vec<Database>>>`.
- Cross-shard read commands (GET, HGET, SCARD, ZRANGE, etc.) bypass SPSC channels entirely via `read_db()` + `dispatch_read()`, reducing cross-shard read latency from ~88μs to ~56ns.
- Local read path uses the shared `read_db()` lock instead of the exclusive `write_db()`, eliminating RwLock contention between shards.
Connection Affinity (Phase 50)
- `AffinityTracker` samples the first 16 commands per connection to detect the dominant shard.
- Lazy FD migration: if ≥60% of keys target a non-local shard, the TCP connection's file descriptor is migrated to the target shard via `ShardMessage::MigrateConnection`.
- `MigratedConnectionState` preserves selected_db, client_name, and protocol_version across migration.
- Graceful fallback: if migration fails, the connection stays on the current shard with shared-read.
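Below is a sketch of the affinity decision. It assumes the decision is made once, after the 16-command sample window, and that "dominant" means the shard with the most hits. With 16 samples, the 60% threshold works out to at least 10 hits. This `AffinityTracker` is a hypothetical reimplementation, not Moon's type.

```rust
/// Number of commands sampled per connection.
const SAMPLE_SIZE: usize = 16;
/// Minimum share (percent) of sampled commands a non-local shard needs.
const MIGRATE_PCT: usize = 60;

/// Hypothetical per-connection tracker.
pub struct AffinityTracker {
    local_shard: usize,
    hits: Vec<usize>,
    sampled: usize,
}

impl AffinityTracker {
    pub fn new(local_shard: usize, num_shards: usize) -> Self {
        Self { local_shard, hits: vec![0; num_shards], sampled: 0 }
    }

    /// Record the target shard of one command. Returns `Some(target)` once, when the
    /// sample window fills and a non-local shard received at least MIGRATE_PCT percent.
    pub fn record(&mut self, shard: usize) -> Option<usize> {
        if self.sampled >= SAMPLE_SIZE {
            return None; // decision already made for this connection
        }
        self.hits[shard] += 1;
        self.sampled += 1;
        if self.sampled < SAMPLE_SIZE {
            return None;
        }
        let (best, &count) = self.hits.iter().enumerate().max_by_key(|&(_, c)| *c)?;
        (best != self.local_shard && count * 100 >= MIGRATE_PCT * SAMPLE_SIZE).then_some(best)
    }
}
```

Deciding once per connection keeps the per-command cost to a counter increment. A connection whose access pattern changes later simply stays where it is, relying on shared-read for remote keys.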
Pre-Allocated Response Slots (Phase 51)
- `ResponseSlotPool` with a lock-free `AtomicU8` state machine for zero-allocation cross-shard write dispatch (Tokio path).
- Eliminates the per-dispatch `channel::oneshot()` heap allocation (~80-120ns savings per cross-shard write).
SO_REUSEPORT Per-Shard Accept (Phase 52)
- Each shard opens its own TCP listener with `SO_REUSEPORT` on Linux via the `socket2` crate.
- Kernel distributes connections across shard listeners using consistent 4-tuple hashing.
- macOS/non-Linux: falls back to single-listener + MPSC round-robin (no behavior change).
jemalloc Production Tuning (Phase 53)
- `malloc_conf` static: `percpu_arena:percpu,background_thread:true,metadata_thp:auto,dirty_decay_ms:5000,muzzy_decay_ms:30000,abort_conf:true`.
- Closes ~50% of the allocation speed gap with mimalloc while retaining jemalloc's superior fragmentation behavior.
New Commands (Phase 55)
- GETRANGE — return substring of stored string value
- SETRANGE — overwrite part of stored string at offset with zero-fill
- SUBSTR — alias for GETRANGE (Redis 1.x compatibility)
Changed
- Custom `AtomicU8` oneshot channel replaced with `flume::bounded(1)` for cross-thread safety on monoio's `!Send` executor.
- `pending_wakers` relay pattern: the event loop locally wakes connection tasks after SPSC processing, bridging monoio's cross-thread waker limitation.
- `write_db()` uses a `try_write()` spin loop instead of a blocking `write()`, preventing OS thread freeze on monoio when cross-shard readers hold locks.
- Benchmark scripts: `scripts/bench-scaling.sh` for the multi-shard test matrix; `scripts/bench-production.sh` updated.
Fixed
- ResponseSlot `UnsafeCell<Option<Waker>>` data race on ARM64: replaced with `AtomicWaker`.
- Local read path took the exclusive write lock (`write_db()`) even for GET: split into `read_db()` + `dispatch_read()`.
- Monoio local write path silently dropped responses (`responses.push(response)` missing after the read/write split), so all write commands (SET, INCR, LPUSH, etc.) hung on monoio.
- Pipeline ordering guard: `!remote_groups.contains_key(&target)` prevents stale reads when a batch has pending writes for the same shard.
Performance
| Metric | Before (v0.1.0) | After (v0.1.2) | Change |
|---|---|---|---|
| Multi-shard GET p=16 | 688K (0.38x Redis) | 1,923K (1.17x Redis) | 2.8x |
| Multi-shard GET p=64 | N/A | 5,002K (1.60x Redis) | New |
| Multi-shard SET p=16 | N/A | 1,515K (1.32x Redis) | New |
| Multi-shard SET p=64 | N/A | 2,500K (1.55x Redis) | New |
| Monoio 1s p=128 GET | 5,407K | 5,005K (1.25x Redis) | Maintained |
| Negative scaling | -25% at 12 shards | Zero at 1-8 shards | Eliminated |
| Command coverage p=1 | Parity | Monoio beats Redis 8/10 | Improved |
[0.1.1] - 2026-03-28
Structural stability milestone. Codebase refactoring for maintainability — no feature changes, no performance changes.
Changed
Error and State Foundations (Phase 44)
- Unified `MoonError` type hierarchy with structured `#[source]` on I/O variants carrying `PathBuf` context.
- `ConnectionContext` struct for connection state (selected_db, authenticated, client_name, protocol_version).
- Criterion benchmark baseline (GET dispatch 69.1ns) to guard against regressions.
Command Metadata Registry (Phase 45)
- `phf` static perfect hash map for O(1) command lookup (112 commands).
- `CommandMeta` struct: name, arity, flags (read/write/fast/admin), key positions, ACL categories.
- `is_write()` classification via const bitflags, replacing duplicated match arms across the codebase.
Persistence Hardening (Phase 46)
- Eliminated server-crashing `unwrap()` calls in WAL, AOF, and RDB persistence code.
- Corruption recovery: WAL uses per-block CRC32 log+skip, AOF seeks to the next RESP `*` marker, RDB breaks on mid-stream corruption.
- `WalWriter` methods remain `std::io::Result` (must-panic on flush failure, as data-loss prevention).
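Resynchronizing after a corrupt AOF record means discarding bytes until one that can start a RESP array. Below is a minimal sketch that assumes the caller reports the byte offset where parsing failed. `resync_to_next_array` is a hypothetical helper.

```rust
/// After a parse failure at byte `pos`, return the offset of the next `*`
/// (a possible RESP array header) strictly after `pos`, or None if none remains.
pub fn resync_to_next_array(buf: &[u8], pos: usize) -> Option<usize> {
    let start = pos.checked_add(1)?;
    let rest = buf.get(start..)?;
    rest.iter().position(|&b| b == b'*').map(|off| start + off)
}
```

A `*` byte inside a bulk payload can produce a false resync point. In that case the parser fails again at that offset and the scan advances past it, so recovery still makes progress. It may skip more bytes than strictly necessary, but it never loops.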
AOF Replay Decoupling (Phase 47)
- `CommandReplayEngine` trait breaks the circular dependency between persistence and command dispatch.
- `StorageEngine` trait boundary for persistence replay and Lua scripting.
- `execute_command()` at command level (not individual get/set methods).
God-File Decomposition (Phase 48)
- `connection.rs` (5,102 lines) → 6 sub-modules in `conn/`: `handler_sharded.rs`, `handler_monoio.rs`, `handler_single.rs`, `shared.rs`, `blocking.rs`, `conn_state.rs`.
- `shard/mod.rs` (2,004 lines) → 6 sub-modules: `event_loop.rs`, `spsc_handler.rs`, `persistence_tick.rs`, `conn_accept.rs`, `timers.rs`, `uring_handler.rs`.
- Module facade pattern with `pub(crate)` re-exports preserving all external import paths.
- No single file exceeds 800 lines.
Added
- Docker: optimized multi-stage build (113MB → 41MB)
- Mintlify documentation site
- Claude Code GitHub workflow for PR reviews
- `scripts/bench-resources.sh` for memory/CPU efficiency benchmarking
[0.1.0] - 2026-03-27
Initial release. A Redis-compatible in-memory data store written in Rust, achieving 1.84-1.99x Redis throughput at 8 shards and 27-35% less memory for 1KB+ values.
Added
Core Data Types (Phases 1-5)
- RESP2 protocol parser and serializer with inline command support
- TCP server with concurrent connections, graceful shutdown, and `redis-cli` compatibility
- String commands: GET, SET, MGET, MSET, INCR/DECR, APPEND, GETEX, GETDEL (17 commands)
- Hash commands: HSET, HGET, HGETALL, HINCRBY, HSCAN (14 commands)
- List commands: LPUSH, RPUSH, LPOP, RPOP, LRANGE, LPOS (12 commands)
- Set commands: SADD, SREM, SINTER, SUNION, SDIFF, SPOP (15 commands)
- Sorted Set commands: ZADD, ZRANGE, ZRANGEBYSCORE, ZINCRBY, ZPOPMIN (18 commands)
- Key management: DEL, EXISTS, EXPIRE, TTL, SCAN, KEYS, RENAME (13 commands)
- Lazy + active key expiration with probabilistic sampling
- RDB persistence with point-in-time snapshots
- AOF persistence with configurable fsync (always/everysec/no)
- Pub/Sub messaging: SUBSCRIBE, PUBLISH, PSUBSCRIBE (4 commands)
- Transactions: MULTI/EXEC/DISCARD with WATCH optimistic locking
- LRU/LFU/random eviction policies with configurable maxmemory
Performance Architecture (Phases 6-15)
- SIMD-accelerated RESP parsing via memchr CRLF scanning and atoi
- CompactValue 16-byte SSO struct with embedded TTL delta
- DashTable segmented hash table with Swiss Table SIMD probing
- Thread-per-core shared-nothing architecture with per-shard event loops
- io_uring networking layer with multishot accept/recv and registered buffers
- Per-shard memory management with jemalloc and bumpalo arenas
- Forkless compartmentalized persistence (no COW memory spike)
- B+ tree sorted sets replacing BTreeMap for cache-friendly access
- Per-connection arena allocation with bumpalo
Protocol & Data Types (Phases 16-18)
- RESP3 protocol: Map, Set, Double, Boolean, VerbatimString, Push frames
- HELLO command for protocol negotiation
- Client-side caching invalidation via Push frames
- Blocking commands: BLPOP, BRPOP, BLMOVE, BZPOPMIN, BZPOPMAX
- Streams data type: XADD, XREAD, XRANGE, XGROUP, XREADGROUP, XACK, XPENDING, XCLAIM, XAUTOCLAIM
Clustering & Replication (Phases 19-20, 26)
- PSYNC2-compatible replication with per-shard WAL streaming
- Partial resync support with replication backlog
- Cluster mode with 16,384 hash slots and gossip protocol
- MOVED/ASK redirections and live slot migration
- Majority consensus failover election with automatic promotion
Scripting & Security (Phases 21-22, 43)
- Lua 5.4 scripting via mlua: EVAL, EVALSHA, SCRIPT LOAD/EXISTS/FLUSH
- Sandboxed Lua VM with Redis API bindings (redis.call, redis.pcall)
- ACL system: per-user command/key/channel permissions
- ACL SETUSER, GETUSER, DELUSER, LIST, WHOAMI, LOG, SAVE, LOAD
- TLS 1.3 via rustls + aws-lc-rs with dual-port support
- mTLS client authentication
- Protected mode (reject non-loopback when no password set)
Optimization (Phases 24-42)
- WAL v2 format: checksums, header, block framing, corruption isolation
- CompactKey SSO: 23-byte inline keys, eliminating heap allocation
- Response buffer pooling and adaptive pipeline batching
- Dual runtime: Tokio (all platforms) + Monoio (Linux io_uring / macOS kqueue)
- Full Monoio migration: channels, TCP, codec, spawn, persistence, replication
- Direct GET serialization bypassing Frame allocation
- Zero-copy argument slicing from parse buffer
- Lock-free oneshot channels (12% CPU reduction vs tokio::oneshot)
- CachedClock timestamp caching (4% throughput gain)
- HeapString for values (eliminates Arc overhead)
- Inline dispatch for single-shard commands
Performance
| Benchmark | Result |
|---|---|
| Peak GET throughput | 3.79M ops/sec (4 shards, p=64) |
| Peak SET with AOF | 2.78M ops/sec (AOF everysec, p=64) |
| vs Redis (pipeline=64) | 3.17x SET, 2.50x GET |
| vs Redis (8 shards, p=16) | 1.84-1.99x |
| vs Redis with AOF | 2.75x (per-shard WAL vs global) |
| Memory (1KB+ values) | 27-35% less than Redis |
| Memory (empty server) | Identical 7.0 MB baseline |
| p50 latency (8 shards) | 0.031ms (Redis: 0.26ms) |
| Data consistency | 132/132 tests pass |
Technical Details
- Language: Rust (stable, edition 2024)
- Lines of code: ~54,000 across 96 files
- Dependencies: tokio, monoio, jemalloc, rustls, mlua, bumpalo, bytes, clap
- Supported platforms: Linux (io_uring via Monoio), macOS (kqueue via Monoio or Tokio)
- Build time: ~50s release build