Commit 86317c0

[Docs] update grafana setup guide in production metrics (#5643)
Co-authored-by: NoahM <[email protected]>
1 parent daed453 commit 86317c0

1 file changed

+70
-26
lines changed

docs/references/production_metrics.md

Lines changed: 70 additions & 26 deletions
@@ -127,44 +127,88 @@ sglang:num_queue_reqs{model_name="meta-llama/Llama-3.1-8B-Instruct"} 2826.0
127127

128128
## Setup Guide
129129

130-
To setup a monitoring dashboard, you can use the following docker compose file: [examples/monitoring/docker-compose.yaml](../examples/monitoring/docker-compose.yaml).
130+
This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `examples/monitoring` directory.
131131

132-
Assume you have sglang server running at `localhost:30000`, to start the server, ensure you have `--enable-metrics` flag enabled:
132+
### Prerequisites
133133

134-
```bash
135-
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
136-
--port 30000 --host 0.0.0.0 --enable-metrics
137-
```
138-
139-
To start the monitoring dashboard (prometheus + grafana), cd to `examples/monitoring` and run:
134+
- Docker and Docker Compose installed
135+
- SGLang server running with metrics enabled
140136

141-
```bash
142-
docker compose -f compose.yaml -p monitoring up
143-
```
137+
### Usage
144138

145-
Then you can access the Grafana dashboard at http://localhost:3000.
139+
1. **Start your SGLang server with metrics enabled:**
146140

147-
### Grafana Dashboard
141+
```bash
142+
python -m sglang.launch_server --model-path <your_model_path> --port 30000 --enable-metrics
143+
```
144+
Replace `<your_model_path>` with the actual path to your model (e.g., `meta-llama/Meta-Llama-3.1-8B-Instruct`). Ensure the server is accessible from the monitoring stack (you might need `--host 0.0.0.0` if running in Docker). By default, the metrics endpoint will be available at `http://<sglang_server_host>:30000/metrics`.
148145

149-
In a new Grafana setup, ensure that you have the `Prometheus` data source enabled. To check that, go to `http://localhost:3000/connections/datasources` and ensure that `Prometheus` is enabled.
146+
2. **Navigate to the monitoring example directory:**
147+
```bash
148+
cd examples/monitoring
149+
```
150150

151-
If not, click `Add data source` -> `Prometheus`, set Prometheus URL to `http://localhost:9090`, and click `Save & Test`.
151+
3. **Start the monitoring stack:**
152+
```bash
153+
docker compose up -d
154+
```
155+
This command will start Prometheus and Grafana in the background.
152156

153-
To import the Grafana dashboard, click `+` -> `Import` -> `Upload JSON file` -> `Upload` and select [grafana.json](../examples/monitoring/grafana/dashboards/json/sglang-dashboard.json).
157+
4. **Access the monitoring interfaces:**
158+
* **Grafana:** Open your web browser and go to [http://localhost:3000](http://localhost:3000).
159+
* **Prometheus:** Open your web browser and go to [http://localhost:9090](http://localhost:9090).
154160

155-
### Troubleshooting
161+
5. **Log in to Grafana:**
162+
* Default Username: `admin`
163+
* Default Password: `admin`
164+
You will be prompted to change the password upon your first login.
156165

157-
#### Check if the variables are created
166+
6. **View the Dashboard:**
167+
The SGLang dashboard is pre-configured and should be available automatically. Navigate to `Dashboards` -> `Browse` -> `SGLang Monitoring` folder -> `SGLang Dashboard`.
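
Before opening the dashboard, it can help to confirm that the server from step 1 really exposes metrics. The following is a minimal sketch, not part of the SGLang tooling: it assumes the default `http://localhost:30000/metrics` URL from step 1, and the helper names `parse_metrics` and `fetch_sglang_metrics` are hypothetical. It parses the Prometheus text format and keeps only samples whose name starts with `sglang:`.

```python
import re
from urllib.request import urlopen

# One sample line of the Prometheus text format looks like:
#   metric_name{label="value",...} 123.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP" and "# TYPE" lines carry no sample
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_sglang_metrics(url="http://localhost:30000/metrics", timeout=5):
    """Fetch the endpoint (URL is an assumption) and keep only sglang:* samples."""
    with urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return [sample for sample in parse_metrics(text) if sample[0].startswith("sglang:")]
```

Calling `fetch_sglang_metrics()` while the server is running should return a non-empty list; an empty list or a connection error suggests that `--enable-metrics` is missing or that the host and port differ from the assumed defaults.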
158168

159-
The example dashboard assume you have the following variables avaliable:
160-
- `model_name` (name: `model_name`, label: `model name`, Data source: `Prometheus`, Type: `Label values`)
161-
- `instance` (name: `instance`, label: `instance`, Data source: `Prometheus`, Type: `Label values`)
162-
163-
If you don't have these variables, you can create them manually.
164-
165-
To create a variable, go to dashboard settings, `Variables` -> `New variable`.
169+
### Troubleshooting
166170

167-
You should be able to see the preview the values (e.g. `meta-llama/Llama-3.1-8B-Instruct` for `model_name`).
171+
* **Port Conflicts:** If you encounter errors like "port is already allocated," check if other services (including previous instances of Prometheus/Grafana) are using ports `9090` or `3000`. Use `docker ps` to find running containers and `docker stop <container_id>` to stop them, or use `lsof -i :<port>` to find other processes using the ports. You might need to adjust the ports in the `docker-compose.yaml` file if they permanently conflict with other essential services on your system.
172+
173+
To change Grafana's port (for example, to 3090), edit the `grafana` service in your Docker Compose file using one of the following options.
174+
175+
Option 1: Add `GF_SERVER_HTTP_PORT` to the `environment` section:
176+
```yaml
177+
environment:
178+
- GF_AUTH_ANONYMOUS_ENABLED=true
179+
- GF_SERVER_HTTP_PORT=3090 # <-- Add this line
180+
```
181+
Option 2: Use port mapping:
182+
```yaml
183+
grafana:
184+
image: grafana/grafana:latest
185+
container_name: grafana
186+
ports:
187+
- "3090:3000" # <-- Host:Container port mapping
188+
```
189+
* **Connection Issues:**
190+
* Ensure both Prometheus and Grafana containers are running (`docker ps`).
191+
* Verify the Prometheus data source configuration in Grafana (usually auto-configured via `grafana/datasources/datasource.yaml`). Go to `Connections` -> `Data sources` -> `Prometheus`. The URL should point to the Prometheus service (e.g., `http://prometheus:9090`).
192+
* Confirm that your SGLang server is running and the metrics endpoint (`http://<sglang_server_host>:30000/metrics`) is accessible *from the Prometheus container*. If SGLang is running on your host machine and Prometheus is in Docker, use `host.docker.internal` (on Docker Desktop) or your machine's network IP instead of `localhost` in the `prometheus.yaml` scrape configuration.
193+
* **No Data on Dashboard:**
194+
* Generate some traffic to your SGLang server to produce metrics. For example, run a benchmark:
195+
```bash
196+
python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 100 --random-input 128 --random-output 128
197+
```
198+
* Check the Prometheus UI (`http://localhost:9090`) under `Status` -> `Targets` to see if the SGLang endpoint is being scraped successfully.
199+
* Verify the `model_name` and `instance` labels in your Prometheus metrics match the variables used in the Grafana dashboard. You might need to adjust the Grafana dashboard variables or the labels in your Prometheus configuration.
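
The `Status` -> `Targets` check can also be scripted against Prometheus's HTTP API, which serves the same information at `/api/v1/targets`. The sketch below is illustrative only: it assumes Prometheus is reachable at `http://localhost:9090` as in the steps above, and the helper names `unhealthy_targets` and `fetch_targets` are hypothetical.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload):
    """Return (scrape_url, health, last_error) for each active target that is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("health", "unknown"), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090", timeout=5):
    """Query the Prometheus targets API (base URL is an assumption)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

Running `unhealthy_targets(fetch_targets())` returns an empty list when every scrape target is healthy. A non-empty result includes the last scrape error, which usually indicates whether the target host or port in `prometheus.yaml` needs adjusting.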
200+
201+
### Configuration Files
202+
203+
The monitoring setup is defined by the following files within the `examples/monitoring` directory:
204+
205+
* `docker-compose.yaml`: Defines the Prometheus and Grafana services.
206+
* `prometheus.yaml`: Prometheus configuration, including scrape targets.
207+
* `grafana/datasources/datasource.yaml`: Configures the Prometheus data source for Grafana.
208+
* `grafana/dashboards/config/dashboard.yaml`: Tells Grafana to load dashboards from the specified path.
209+
* `grafana/dashboards/json/sglang-dashboard.json`: The actual Grafana dashboard definition in JSON format.
210+
211+
You can customize the setup by modifying these files. For instance, you might need to update the `static_configs` target in `prometheus.yaml` if your SGLang server runs on a different host or port.
168212

169213
#### Check if the metrics are being collected
170214
