Monitoring¶
Moon exposes a Prometheus-compatible metrics endpoint on its admin HTTP port. This guide covers enabling the admin port, scraping metrics, and setting up basic alerting.
Enable the admin port¶
Start Moon with the --admin-port flag to expose the HTTP endpoints. The examples in this guide use port 9100, that is, --admin-port 9100.
This serves three endpoints:
| Endpoint | Description |
|---|---|
| GET /metrics | Prometheus metrics in exposition format |
| GET /healthz | Health check -- returns 200 OK when the server is running |
| GET /readyz | Readiness check -- returns 200 OK when the server is accepting commands |
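Verify it is working by requesting one of the endpoints, for example with curl -f http://localhost:9100/healthz, which should return 200 OK.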
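The check can also be scripted. The following is a minimal Python sketch, assuming the admin port is 9100 on 127.0.0.1 as in the rest of this guide. The probe function is a hypothetical helper written for illustration and is not part of Moon.

```python
import urllib.error
import urllib.request


def probe(base_url, paths=("/metrics", "/healthz", "/readyz"), timeout=2.0):
    """Request each admin endpoint and return {path: HTTP status or None}.

    None means the endpoint could not be reached at all (for example,
    connection refused because the admin port is not enabled).
    """
    results = {}
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            # The server answered, but with a non-2xx status.
            results[path] = err.code
        except (urllib.error.URLError, OSError):
            # No HTTP response at all: refused, timed out, or DNS failure.
            results[path] = None
    return results
```

Calling probe("http://127.0.0.1:9100") against a healthy instance should report 200 for all three paths. A None value suggests the admin port is not listening on that address.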
Prometheus configuration¶
Add Moon as a scrape target in your prometheus.yml:
scrape_configs:
  - job_name: "moon"
    scrape_interval: 15s
    static_configs:
      - targets: ["127.0.0.1:9100"]
        labels:
          instance: "moon-primary"
For multiple Moon instances or sharded deployments, list each instance:
scrape_configs:
  - job_name: "moon"
    scrape_interval: 15s
    static_configs:
      - targets:
          - "moon-1:9100"
          - "moon-2:9100"
          - "moon-3:9100"
Key metrics¶
Moon exposes standard Redis-compatible INFO metrics through the Prometheus endpoint. Key metrics to monitor include:
- moon_connected_clients -- current number of connected clients
- moon_used_memory_bytes -- total memory used by the server
- moon_commands_processed_total -- total commands processed (rate = ops/sec)
- moon_keyspace_hits_total -- successful key lookups
- moon_keyspace_misses_total -- failed key lookups (cache miss rate)
- moon_evicted_keys_total -- keys evicted due to maxmemory
- moon_expired_keys_total -- keys removed by expiration
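To make the hit and miss counters concrete, the sketch below parses a scraped /metrics body and derives the lifetime cache hit ratio. It is a simplified illustration, not a full exposition-format parser: it assumes the samples of interest carry no labels, and the function names parse_metrics and hit_ratio are hypothetical.

```python
def parse_metrics(text):
    """Parse unlabeled samples from Prometheus exposition text into a dict.

    Comment lines (# HELP, # TYPE) and labeled samples are skipped;
    this is an assumption made to keep the sketch short.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        # Format: <name> <value> [<timestamp>]
        metrics[parts[0]] = float(parts[1])
    return metrics


def hit_ratio(metrics):
    """Return hits / (hits + misses), or None if no lookups were recorded."""
    hits = metrics.get("moon_keyspace_hits_total", 0.0)
    misses = metrics.get("moon_keyspace_misses_total", 0.0)
    total = hits + misses
    return hits / total if total else None
```

Because both counters only grow over the life of the process, this ratio describes the whole uptime rather than recent traffic.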
Grafana dashboard¶
Import the metrics into Grafana for visualization. A minimal dashboard should include:
- Operations rate -- rate(moon_commands_processed_total[5m])
- Hit rate -- moon_keyspace_hits_total / (moon_keyspace_hits_total + moon_keyspace_misses_total)
- Memory usage -- moon_used_memory_bytes
- Connected clients -- moon_connected_clients
- Eviction rate -- rate(moon_evicted_keys_total[5m])
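The rate() panels above turn ever-growing counters into per-second values. As a rough illustration of the idea, the sketch below computes a per-second rate from two scrapes of the same counter. It is a simplification: Prometheus's rate() works over every sample in the window and extrapolates to the window edges. The per_second_rate name is hypothetical.

```python
def per_second_rate(prev, curr):
    """Approximate a counter's per-second rate between two scrapes.

    prev and curr are (timestamp_seconds, counter_value) pairs.
    """
    (t0, v0), (t1, v1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("curr must be later than prev")
    # A drop in value means the counter was reset (e.g. server restart);
    # count only what accumulated since the reset.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / elapsed
```

For example, two scrapes 15 seconds apart with moon_commands_processed_total at 1000 and then 2500 give 1500 / 15 = 100 commands per second.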
Health check integration¶
Use the /healthz and /readyz endpoints with your orchestrator:
Kubernetes¶
livenessProbe:
  httpGet:
    path: /healthz
    port: 9100
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 9100
  initialDelaySeconds: 5
  periodSeconds: 5
Docker Compose¶
services:
  moon:
    image: moon:latest
    command: ["--port", "6379", "--admin-port", "9100"]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9100/healthz"]
      interval: 10s
      timeout: 5s
      retries: 3
Alerting rules¶
Example Prometheus alerting rules:
groups:
  - name: moon_alerts
    rules:
      - alert: MoonDown
        expr: up{job="moon"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Moon instance {{ $labels.instance }} is down"
      - alert: MoonHighMemory
        expr: moon_used_memory_bytes / moon_maxmemory_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Moon instance {{ $labels.instance }} is above 90% memory"
      - alert: MoonHighEvictionRate
        expr: rate(moon_evicted_keys_total[5m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Moon instance {{ $labels.instance }} is evicting >100 keys/sec"
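Each rule above pairs an expression with a for: duration, so a brief spike does not page anyone. The sketch below is a simplified model of that behaviour, assuming one sample per rule evaluation. The holds_for name is hypothetical, and the real logic runs inside Prometheus at each rule evaluation interval.

```python
def holds_for(samples, threshold, hold_seconds):
    """Return True if value > threshold has held continuously for at
    least hold_seconds, up to and including the most recent sample.

    samples is a time-ordered list of (timestamp_seconds, value) pairs.
    """
    breach_start = None
    last_ts = None
    for ts, value in samples:
        last_ts = ts
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # condition just became true
        else:
            breach_start = None  # any dip restarts the clock
    if breach_start is None:
        return False
    return (last_ts - breach_start) >= hold_seconds
```

With the MoonHighMemory settings (threshold 0.9, hold 300 seconds), a memory ratio that stays at 0.95 for five minutes would fire, while one that dips to 0.5 partway through would not fire until it had stayed above 0.9 for a further five minutes.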