Benchmarks

Note

Production benchmark numbers are measured on Linux (GCloud c3-standard-8 x86_64 / t2a-standard-8 ARM64); see the canonical BENCHMARK.md for the full report (raw throughput, persistence, vector, graph, latency, variance notes). The detailed per-table figures below come from an Apple M4 Pro (12 cores, 24 GB) development machine and serve as a reference: absolute numbers differ from the Linux production figures, but the Moon/Redis ratios are representative. All runs co-locate client and server and drive load with redis-benchmark; each memory data point uses a fresh server instance.

Executive summary

Headline figures vs Redis 8.6.1 on the Linux production reference (GCloud c3-standard-8 x86_64, peak throughput):

| Metric | Moon vs Redis | Conditions |
|---|---|---|
| Peak throughput (GET) | 5.11M ops/sec (1.72×) | GCloud x86_64, p=64 |
| Peak throughput (SET) | 3.50M ops/sec (1.92×) | GCloud x86_64, p=64 |
| Peak GET (ARM64) | 3.47M ops/sec (2.20×) | GCloud Neoverse-N1, p=64 |
| Memory (1KB+ values) | 27-35% less | 1-shard, per-key RSS |
| Vector search (384d) | 12.7K QPS | HNSW + TurboQuant, COSINE |
| With AOF persistence | ~1.9× Redis | SET, pipeline=64 |
| Data correctness | 132/132 tests | All types, 1/4/12 shards |

Memory efficiency

Per-key memory (1-shard, string keys)

| Value size | Keys | Redis/key | Moon/key | Winner | Ratio (Redis/Moon) |
|---|---|---|---|---|---|
| 32 B | ~63K | 118 B | 147 B | Redis | 0.80x |
| 256 B | ~63K | 412 B | 407 B | Tied | 1.01x |
| 1,024 B | ~63K | 1,879 B | 1,207 B | Moon | 1.56x |
| 4,096 B | ~63K | 5,131 B | 4,352 B | Moon | 1.18x |

At 1M keys:

| Value size | Redis RSS | Moon RSS | Redis/key | Moon/key | Winner |
|---|---|---|---|---|---|
| 32 B | 78.2 MB | 95.8 MB | 118 B | 147 B | Redis |
| 256 B | 231.5 MB | 234.4 MB | 372 B | 376 B | Tied |
| 1,024 B | 954.2 MB | 703.0 MB | 1,571 B | 1,153 B | Moon |
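The report does not spell out how the per-key figures are derived from RSS. The sketch below shows one derivation that comes within a byte or two of the 1M-key rows, under assumptions that are not stated in the source: "MB" is read as MiB, the 7.0 MB idle baseline (listed under Baseline RSS) is subtracted, and random key selection over a 1M keyspace is taken to leave roughly 1M × (1 − 1/e) ≈ 632K distinct keys. The function name `per_key_bytes` is hypothetical.

```shell
# per_key_bytes RSS_MB BASELINE_MB REQUESTED_KEYS
# Assumptions (not stated in the report): "MB" means MiB, the idle-server
# baseline is subtracted, and random key sampling leaves about
# REQUESTED_KEYS * (1 - 1/e) distinct keys in the keyspace.
per_key_bytes() {
  awk -v rss="$1" -v base="$2" -v n="$3" 'BEGIN {
    unique = n * (1 - exp(-1))
    printf "%.0f\n", (rss - base) * 1048576 / unique
  }'
}

per_key_bytes 78.2 7.0 1000000    # Redis, 32 B values
per_key_bytes 95.8 7.0 1000000    # Moon (1 shard), 32 B values
```

With the table's inputs the two calls print 118 and 147, matching the 32 B row. If the real methodology counts keys differently, only the `unique` line needs to change.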

Tip

Moon's advantage comes from HeapString(Vec<u8>) (48 bytes overhead) vs Redis's robj + SDS chain (~64-80 bytes overhead). TTL is packed as a 4-byte delta inside CompactEntry at zero extra cost, while Redis allocates a separate 24-byte dictEntry per expiring key.

Baseline RSS

| Server | RSS |
|---|---|
| Redis 8.6.1 | 7.0 MB |
| Moon (1 shard) | 7.0 MB |
| Moon (12 shards) | 15.7 MB |

Throughput

Single-shard SET (pipeline=16, 50 clients)

| Value size | Redis SET/s | Moon SET/s | Ratio (Moon/Redis) |
|---|---|---|---|
| 32 B | 1,298,701 | 1,754,386 | 1.35x |
| 256 B | 1,219,512 | 1,639,344 | 1.34x |
| 1,024 B | 1,010,101 | 1,030,928 | 1.02x |
| 4,096 B | 540,541 | 571,429 | 1.06x |

Multi-shard peak throughput

| Config | Moon (ops/sec) | Redis (ops/sec) | Ratio (Moon/Redis) |
|---|---|---|---|
| 8-shard GET p=16 c=50 | 2.60M | 1.41M | 1.84x |
| 8-shard SET p=16 c=50 | 2.52M | 1.27M | 1.99x |
| 4-shard GET p=64 c=50 | 3.79M | 2.41M | 1.57x |

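CPU efficiency

| Pipeline | Redis CPU | Moon CPU | Redis RPS | Moon RPS | RPS ratio (Moon/Redis) |
|---|---|---|---|---|---|
| p=1 | 97.2% | 91.1% | 169K | 148K | 0.87x |
| p=8 | 100.0% | 3.3% | 1.14M | 1.11M | 0.97x |
| p=16 | 100.0% | 1.9% | 1.95M | 1.97M | 1.01x |
| p=64 | 43.9% | 1.9% | 2.42M | 4.13M | 1.71x |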

At pipeline=64, Moon delivers 1.71x the throughput of Redis (4.13M vs 2.42M RPS) while using about 23x less CPU (1.9% vs 43.9%).
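The pipeline=64 comparison can be re-derived from the CPU table with a few lines of shell. This is a minimal sketch: the helper names `ratio` and `rps_per_cpu` are hypothetical, and "throughput per CPU percent" is an illustrative metric that the report itself does not define.

```shell
# ratio A B: prints A / B to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# rps_per_cpu RPS CPU_PERCENT: requests per second per 1% of CPU.
rps_per_cpu() {
  awk -v rps="$1" -v cpu="$2" 'BEGIN { printf "%.0f\n", rps / cpu }'
}

ratio 43.9 1.9                     # CPU usage, Redis / Moon at p=64
ratio 4130000 2420000              # throughput, Moon / Redis at p=64

moon=$(rps_per_cpu 4130000 1.9)
redis=$(rps_per_cpu 2420000 43.9)
ratio "$moon" "$redis"             # throughput per CPU percent, Moon / Redis
```

The first two calls reproduce the CPU and throughput ratios quoted for pipeline=64 (rounded to one decimal place); the last combines them into a single efficiency figure for the same row.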

Persistence (AOF) performance

| Pipeline | Moon SET/s | vs Redis (no AOF) | vs Redis (AOF everysec) |
|---|---|---|---|
| p=1 | 146K | 0.95x | 0.95x |
| p=8 | 1,117K | 1.68x | 1.68x |
| p=16 | 1,887K | 1.90x | 2.21x |
| p=64 | 2,778K | 1.80x | 2.75x |

Note

Moon's per-shard WAL avoids the global serialization point that Redis's single AOF file introduces. The advantage grows with pipeline depth because per-shard WAL scales linearly with shards.

Latency

| Metric | Redis | Moon | Improvement |
|---|---|---|---|
| p50 latency (8-shard) | 0.26-0.33 ms | 0.031 ms | 8-10x lower |

Multi-core parallelism reduces per-shard queue depth, so the median request sees less waiting time.

Production workload patterns

| Scenario | Description | Moon vs Redis |
|---|---|---|
| Session store | 80% GET / 15% SET, 512B values | 1.24x |
| Rate limiting | INCR with 100-200 clients | 1.15x |
| Leaderboard | ZADD + ZRANGEBYSCORE | 1.06-1.25x |
| App caching | 1KB-4KB values, MSET batch | 1.10-1.27x |
| Job queue | LPUSH/RPOP producer-consumer | 1.06x |
| User profiles | HSET, HGET | 1.10x |

How to reproduce

# Build with native CPU optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Memory and CPU benchmark
./scripts/bench-resources.sh --shards 1

# Production workload scenarios
./scripts/bench-production.sh --shards 1

# Multi-shard scaling
./scripts/bench-production.sh --shards 4
./scripts/bench-production.sh --shards 8

# Data consistency tests
./scripts/test-consistency.sh --shards 1
./scripts/test-consistency.sh --shards 4

Warning

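Co-located benchmarks (client and server on the same machine) are conservative. Separate-machine benchmarks with dedicated NICs show higher throughput. Always pass -r <num_keys> to redis-benchmark so that requests are spread across a random keyspace instead of repeatedly hitting the same key; keys are drawn at random, so not every key in the range is guaranteed to be written. Use redis-benchmark 8.x, which correctly handles \r in progress output.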
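For a single data point outside the bundled scripts, a direct redis-benchmark invocation might look like the sketch below. The flags used (-p, -t, -c, -P, -d, -n, -r, -q) are standard redis-benchmark options; the helper name `bench_cmd`, port 6379, and the specific values (50 clients, pipeline 16, 256-byte values, 1M requests) are assumptions chosen to mirror the table headings, not the exact commands the scripts run.

```shell
# bench_cmd PORT VALUE_SIZE PIPELINE KEYS
# Hypothetical helper: prints the redis-benchmark command for one data point
# (50 clients, SET and GET, random keys drawn from a keyspace of KEYS entries).
bench_cmd() {
  echo "redis-benchmark -p $1 -t set,get -c 50 -P $3 -d $2 -n $4 -r $4 -q"
}

# Port 6379 is an assumption; point it at whichever server is under test.
bench_cmd 6379 256 16 1000000

# To actually run the data point against a live server:
# $(bench_cmd 6379 256 16 1000000)
```

Printing the command first makes it easy to confirm that the Redis and Moon runs use identical parameters before comparing their numbers.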