fix cuda oom because gpu mem is full of fragments in two micro batch … #41

pseudonym65535n · 2025-04-27T12:12:47Z

…decode

Motivation

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

pseudonym65535n added 5 commits April 27, 2025 20:10

fix cuda oom because gpu mem is full of fragments in two micro batch …

cf1de79

…decode

remove annotation

4a02691

remove annotation

b5c40c4

fix clean

e677945

remove annotations

ece72ee

pseudonym65535n merged commit 86f80e0 into epic/two-batch-overlap Apr 29, 2025
1 check failed

pseudonym65535n deleted the fix/two_batch_decode_cuda_oom branch April 29, 2025 12:00