Benchmarks
Note
Production benchmark numbers are measured on Linux (GCloud c3-standard-8 x86_64 and t2a-standard-8 ARM64); see the canonical BENCHMARK.md for the full report (raw throughput, persistence, vector, graph, latency, variance notes). The detailed per-table figures below come from an Apple M4 Pro (12 cores, 24 GB) development machine and serve as a reference only: absolute numbers differ from the Linux production figures, but the Moon/Redis ratios are representative. All runs co-locate client and server and drive load with redis-benchmark; each memory data point uses a fresh server instance.
Executive summary
Headline figures vs Redis 8.6.1 on the Linux production reference (GCloud c3-standard-8 x86_64, peak throughput):
| Metric | Moon vs Redis | Conditions |
|---|---|---|
| Peak throughput (GET) | 5.11M ops/sec (1.72×) | GCloud x86_64, p=64 |
| Peak throughput (SET) | 3.50M ops/sec (1.92×) | GCloud x86_64, p=64 |
| Peak GET (ARM64) | 3.47M ops/sec (2.20×) | GCloud Neoverse-N1, p=64 |
| Memory (1KB+ values) | 27-35% less | 1-shard, per-key RSS |
| Vector search (384d) | 12.7K QPS | HNSW + TurboQuant, COSINE |
| With AOF persistence | ~1.9× Redis | SET, pipeline=64 |
| Data correctness | 132/132 tests | All types, 1/4/12 shards |
Memory efficiency
Per-key memory (1-shard, string keys)
| Value size | Keys | Redis/key | Moon/key | Winner | Ratio (Redis/Moon) |
|---|---|---|---|---|---|
| 32 B | ~63K | 118 B | 147 B | Redis | 0.80x |
| 256 B | ~63K | 412 B | 407 B | Tied | 1.01x |
| 1,024 B | ~63K | 1,879 B | 1,207 B | Moon | 1.56x |
| 4,096 B | ~63K | 5,131 B | 4,352 B | Moon | 1.18x |
At 1M keys:
| Value size | Redis RSS | Moon RSS | Redis/key | Moon/key | Winner |
|---|---|---|---|---|---|
| 32 B | 78.2 MB | 95.8 MB | 118 B | 147 B | Redis |
| 256 B | 231.5 MB | 234.4 MB | 372 B | 376 B | Tied |
| 1,024 B | 954.2 MB | 703.0 MB | 1,571 B | 1,153 B | Moon |
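The per-key columns in the two tables above can be turned into a "percent less memory" figure with one line of arithmetic. The sketch below is only an illustration: `saving_pct` is a hypothetical helper name, and the inputs are the Redis/key and Moon/key values copied from the tables, not fresh measurements.

```bash
#!/usr/bin/env bash
# Percentage of per-key memory Moon saves relative to Redis.
# Arguments: Redis bytes/key, Moon bytes/key (copied from the tables above).
# A negative result means Moon uses more memory per key than Redis.
saving_pct() {
  awk -v redis="$1" -v moon="$2" 'BEGIN { printf "%.1f", (1 - moon / redis) * 100 }'
}

echo "1,024 B at ~63K keys: $(saving_pct 1879 1207)% less"
echo "1,024 B at 1M keys:   $(saving_pct 1571 1153)% less"
echo "4,096 B at ~63K keys: $(saving_pct 5131 4352)% less"
echo "32 B at ~63K keys:    $(saving_pct 118 147)% less"
```

With these inputs the two 1,024 B rows come out at 35.8% and 26.6%, the 4,096 B row at 15.2%, and the 32 B row is negative because Redis is smaller there.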
Tip
Moon's advantage comes from HeapString(Vec<u8>) (48 bytes overhead) vs Redis's robj + SDS chain (~64-80 bytes overhead). TTL is packed as a 4-byte delta inside CompactEntry at zero extra cost, while Redis allocates a separate 24-byte dictEntry per expiring key.
Baseline RSS
| Server | RSS |
|---|---|
| Redis 8.6.1 | 7.0 MB |
| Moon (1 shard) | 7.0 MB |
| Moon (12 shards) | 15.7 MB |
Throughput
Single-shard SET (pipeline=16, 50 clients)
| Value size | Redis SET/s | Moon SET/s | Ratio (Moon/Redis) |
|---|---|---|---|
| 32 B | 1,298,701 | 1,754,386 | 1.35x |
| 256 B | 1,219,512 | 1,639,344 | 1.34x |
| 1,024 B | 1,010,101 | 1,030,928 | 1.02x |
| 4,096 B | 540,541 | 571,429 | 1.06x |
Multi-shard peak throughput
| Config | Moon | Redis | Ratio (Moon/Redis) |
|---|---|---|---|
| 8-shard GET p=16 c=50 | 2.60M | 1.41M | 1.84x |
| 8-shard SET p=16 c=50 | 2.52M | 1.27M | 1.99x |
| 4-shard GET p=64 c=50 | 3.79M | 2.41M | 1.57x |
CPU efficiency
| Pipeline | Redis CPU | Moon CPU | Redis RPS | Moon RPS | RPS ratio |
|---|---|---|---|---|---|
| p=1 | 97.2% | 91.1% | 169K | 148K | 0.87x |
| p=8 | 100.0% | 3.3% | 1.14M | 1.11M | 0.97x |
| p=16 | 100.0% | 1.9% | 1.95M | 1.97M | 1.01x |
| p=64 | 43.9% | 1.9% | 2.42M | 4.13M | 1.71x |
At pipeline=64, Moon delivers 1.71x the throughput of Redis while using roughly 1/23 of the CPU (1.9% vs 43.9%).
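The p=64 comparison is plain division over the last row of the table. A minimal sketch, assuming the table values as inputs (`ratio` is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# Divide two table values and print the result with two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'
}

# p=64 row: throughput in millions of requests/sec, CPU in percent.
echo "Throughput, Moon / Redis: $(ratio 4.13 2.42)x"
echo "CPU usage, Redis / Moon:  $(ratio 43.9 1.9)x"
```

The first line reproduces the 1.71x RPS ratio in the table; the second gives about 23.11, which the text rounds to 23x.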
Persistence (AOF) performance
| Pipeline | Moon SET/s | vs Redis (no AOF) | vs Redis (AOF everysec) |
|---|---|---|---|
| p=1 | 146K | 0.95x | 0.95x |
| p=8 | 1,117K | 1.68x | 1.68x |
| p=16 | 1,887K | 1.90x | 2.21x |
| p=64 | 2,778K | 1.80x | 2.75x |
Note
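Moon's per-shard WAL avoids the global serialization point that Redis's single AOF file introduces. The advantage grows with pipeline depth: deeper pipelines push more writes through that single point in Redis, while each Moon shard appends to its own WAL, so write capacity scales with the number of shards.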
Latency
| Metric | Redis | Moon | Improvement |
|---|---|---|---|
| p50 latency (8-shard) | 0.26-0.33 ms | 0.031 ms | 8-10x lower |
Multi-core parallelism reduces per-shard queue depth, so the median request sees less waiting time.
Production workload patterns
| Scenario | Description | Moon vs Redis |
|---|---|---|
| Session store | 80% GET / 15% SET, 512B values | 1.24x |
| Rate limiting | INCR with 100-200 clients | 1.15x |
| Leaderboard | ZADD + ZRANGEBYSCORE | 1.06-1.25x |
| App caching | 1KB-4KB values, MSET batch | 1.10-1.27x |
| Job queue | LPUSH/RPOP producer-consumer | 1.06x |
| User profiles | HSET, HGET | 1.10x |
How to reproduce
```bash
# Build with native CPU optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Memory and CPU benchmark
./scripts/bench-resources.sh --shards 1

# Production workload scenarios
./scripts/bench-production.sh --shards 1

# Multi-shard scaling
./scripts/bench-production.sh --shards 4
./scripts/bench-production.sh --shards 8

# Data consistency tests
./scripts/test-consistency.sh --shards 1
./scripts/test-consistency.sh --shards 4
```
Warning
Co-located benchmarks (client and server on the same machine) are conservative: separate-machine benchmarks with dedicated NICs show higher throughput. Always pass -r <num_keys> to redis-benchmark so that requests use random keys drawn from a key space of that size; without it, every request targets the same key. Use redis-benchmark 8.x, which correctly handles \r in progress output.
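As an illustration of the -r advice, the sketch below assembles a single-shard SET run with the pipeline depth and client count used in the tables above. `bench_cmd` is a hypothetical helper, and the host, port, request count, and key-space size are placeholder assumptions rather than the values behind the published numbers. The script only prints the command unless RUN=1 is set.

```bash
#!/usr/bin/env bash
# Build a redis-benchmark SET invocation.
# Arguments: value size in bytes, key-space size passed to -r.
# Host, port and request count are placeholders; adjust to your setup.
bench_cmd() {
  local value_bytes="$1" num_keys="$2"
  echo redis-benchmark -h 127.0.0.1 -p 6379 \
    -t set -n 1000000 -c 50 -P 16 \
    -d "$value_bytes" -r "$num_keys" -q
}

cmd="$(bench_cmd 256 1000000)"
echo "$cmd"

# Dry run by default; set RUN=1 to execute against a running server.
if [ "${RUN:-0}" = "1" ]; then
  $cmd
fi
```

Here -P sets the pipeline depth, -c the number of parallel clients, -d the value size, and -r the size of the random key space. Keeping the command in a function makes it easy to loop over the value sizes from the tables.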