Memory leak: per-request voice tensor reload causes unbounded RSS growth #453
Description
Summary
The generate() method in kokoro_v1.py performs a full voice tensor round-trip on every TTS request: load from .pt file → deserialize to torch.Tensor → serialize back → write to new temp file. This creates ~20-30MB of transient allocations per request that fragment the Python heap, causing RSS to grow monotonically and never shrink.
Reproduction
- Run the CPU container:
    docker run -d --name kokoro-tts -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.0post4
- Monitor RSS:
    docker stats kokoro-tts --no-stream
- Send ~50-100 TTS requests over a few hours.
- Observe RSS climbing from a ~500MB baseline toward multiple GB without ever dropping back.
In our case, RSS reached 7.6GB after 2.5 days of moderate use (~50-100 requests/day), triggering the Linux OOM killer on the host.
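If you would rather log RSS from inside the container than poll docker stats, a small Linux-only helper can sample VmRSS from /proc/<pid>/status. This is a sketch: the function names are hypothetical, and which PID you pass (the server process, presumably) is up to you.

```python
from __future__ import annotations


def parse_vmrss_kib(status_text: str) -> int:
    """Return VmRSS (in KiB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # A typical line looks like "VmRSS:    512000 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def read_rss_kib(pid: int | str = "self") -> int:
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())


def is_monotonic_growth(samples: list[int], tolerance_kib: int = 0) -> bool:
    """True if no sample falls more than tolerance_kib below the one before it."""
    return all(b >= a - tolerance_kib for a, b in zip(samples, samples[1:]))
```

For example, call read_rss_kib(pid) after each batch of requests and feed the collected samples to is_monotonic_growth(); the leak described here shows up as a series that never drops.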
Root Cause
In api/src/inference/kokoro_v1.py, both generate() and generate_from_tokens():
- Call paths.load_voice_tensor(voice_path, device), which reads the entire .pt file into a BytesIO and deserializes it
- Call paths.save_voice_tensor(voice_tensor, temp_path), which serializes the tensor back and writes it to a NEW temp file
- This happens on every request, even when the same voice is used repeatedly
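A minimal sketch of the first suggested fix, assuming the load and save helpers can be passed in as plain callables. VoiceFileCache, make_temp_path and exists_fn are hypothetical names, and the real paths.load_voice_tensor / paths.save_voice_tensor may differ in signature (they may be async, for instance). The cache is keyed on (voice_path, device), holds a bounded LRU of entries, and deletes an evicted voice's temp file.

```python
from __future__ import annotations

import contextlib
import os
from collections import OrderedDict
from typing import Any, Callable


class VoiceFileCache:
    """Hypothetical LRU cache: (voice_path, device) -> (tensor, temp_path)."""

    def __init__(
        self,
        load_fn: Callable[[str, str], Any],    # stands in for paths.load_voice_tensor
        save_fn: Callable[[Any, str], None],   # stands in for paths.save_voice_tensor
        make_temp_path: Callable[[str], str],  # assumed helper returning a new temp path
        max_entries: int = 8,
        exists_fn: Callable[[str], bool] = os.path.exists,
    ) -> None:
        self._load = load_fn
        self._save = save_fn
        self._make_temp_path = make_temp_path
        self._max_entries = max_entries
        self._exists = exists_fn
        # Least recently used entry first, most recently used last.
        self._entries: OrderedDict[tuple[str, str], tuple[Any, str]] = OrderedDict()

    def get(self, voice_path: str, device: str) -> tuple[Any, str]:
        key = (voice_path, device)
        if key in self._entries:
            # Cache hit: no file read, no deserialize/serialize, no new temp file.
            self._entries.move_to_end(key)
            tensor, temp_path = self._entries[key]
            if not self._exists(temp_path):
                # The file was deleted (e.g. by a cleanup job): write it once more.
                self._save(tensor, temp_path)
            return tensor, temp_path

        # Cache miss: do the load/save round trip exactly once for this voice.
        tensor = self._load(voice_path, device)
        temp_path = self._make_temp_path(voice_path)
        self._save(tensor, temp_path)
        self._entries[key] = (tensor, temp_path)

        if len(self._entries) > self._max_entries:
            # Evict the least recently used voice and remove its temp file.
            _, (_, old_path) = self._entries.popitem(last=False)
            with contextlib.suppress(FileNotFoundError):
                os.remove(old_path)
        return tensor, temp_path
```

The exists_fn check matters once temp files live in a directory that a cleanup routine sweeps: if a cached file is deleted, the next hit rewrites it once instead of returning a path that no longer exists. Memory is then bounded by max_entries tensors rather than growing with the request count.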
The temp files (temp_voice_*) are written to Python's tempfile.gettempdir() (system /tmp), NOT the app's configured temp_file_dir, so the app's cleanup_temp_files() never finds or cleans them.
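A sketch of the second suggested fix, assuming temp_file_dir holds the value of settings.temp_file_dir: create temp_voice_* files there with tempfile.mkstemp(dir=...) and let an age-based sweep delete them. Both function names are hypothetical, and the age cutoff is an assumption; the app's actual cleanup_temp_files() may use different rules.

```python
from __future__ import annotations

import os
import tempfile
import time


def make_voice_temp_path(temp_file_dir: str, voice_name: str) -> str:
    """Create an empty temp_voice_* file inside temp_file_dir and return its path."""
    os.makedirs(temp_file_dir, exist_ok=True)
    fd, path = tempfile.mkstemp(
        prefix=f"temp_voice_{voice_name}_", suffix=".pt", dir=temp_file_dir
    )
    os.close(fd)  # the tensor is saved later by path, so release the descriptor now
    return path


def cleanup_voice_temp_files(
    temp_file_dir: str, max_age_seconds: float, now: float | None = None
) -> list[str]:
    """Delete temp_voice_* files older than max_age_seconds; return what was removed."""
    now = time.time() if now is None else now
    removed = []
    with os.scandir(temp_file_dir) as entries:
        for entry in entries:
            if not (entry.is_file() and entry.name.startswith("temp_voice_")):
                continue
            if now - entry.stat().st_mtime > max_age_seconds:
                os.remove(entry.path)
                removed.append(entry.path)
    return removed
```

Keeping these files in the app's own directory means a periodic sweep can bound how many accumulate; files left in the system /tmp are outside the app's cleanup entirely.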
Additionally, AudioService._writers in api/src/services/audio.py is a class-level dict that accumulates StreamingAudioWriter objects on client disconnect or error (the writer key is never removed if is_last_chunk is never reached).
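A minimal sketch of the third suggested fix. AudioServiceSketch and FakeWriter are hypothetical stand-ins for AudioService and StreamingAudioWriter, the fail flag only simulates a mid-stream error, and the real convert_audio() takes different arguments (and may be async). What it illustrates is the shape: a finally block that drops the _writers key both on the last chunk and when an exception escapes.

```python
from __future__ import annotations


class FakeWriter:
    """Stand-in for StreamingAudioWriter: records chunks and whether it was closed."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []
        self.closed = False

    def write_chunk(self, data: bytes) -> bytes:
        self.chunks.append(data)
        return data

    def close(self) -> None:
        self.closed = True


class AudioServiceSketch:
    # Class-level dict shared by every stream, like AudioService._writers.
    _writers: dict[str, FakeWriter] = {}

    @classmethod
    def convert_audio(
        cls, stream_key: str, chunk: bytes, is_last_chunk: bool, fail: bool = False
    ) -> bytes:
        writer = cls._writers.get(stream_key)
        if writer is None:
            writer = cls._writers[stream_key] = FakeWriter()
        done = True  # treat the stream as finished unless this chunk succeeds mid-stream
        try:
            if fail:
                raise RuntimeError("simulated client disconnect")
            out = writer.write_chunk(chunk)
            done = is_last_chunk
            return out
        finally:
            if done:
                # Runs on the last chunk AND when an exception escapes, so no key leaks.
                writer.close()
                cls._writers.pop(stream_key, None)
```

A finally block only covers errors raised while convert_audio() is running. If a client disconnects between chunks and the method is simply never called again, the key still lingers, so the streaming caller would likely need its own cleanup as well (for example, a finally around its chunk loop that pops the key).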
Suggested Fixes
- Cache the voice tensor and temp file path in KokoroV1, and skip the load/save cycle when the same voice is used again
- Use settings.temp_file_dir for all temp files so the cleanup routine can find them
- Add a finally block in AudioService.convert_audio() to remove the writer key on exception
Environment
- Image: ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.0post4
- Host: 31GB RAM, Linux 6.17.0
- Usage pattern: ~50-100 TTS requests/day via local API calls