
Conversation

@wuxibin89
Collaborator

Checklist Before Starting

  • Search for similar PR(s).

What does this PR do?

ChatCompletionScheduler interacts with the async rollout server through an OpenAI client and follows OpenAI's tool-calling schema. So any inference framework implementing an OpenAI-compatible server (e.g., vLLM, SGLang) should work with ChatCompletionScheduler.
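
For illustration, here is a minimal sketch (not code from this PR) of the interaction pattern the scheduler relies on: a plain AsyncOpenAI client pointed at any OpenAI-compatible endpoint, with tool calls returned in OpenAI's standard schema. The base_url, model name, and get_weather tool are placeholders, not values from verl.

from openai import AsyncOpenAI

# Placeholder endpoint; any vLLM/SGLang OpenAI-compatible server is addressed the same way.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

# Hypothetical tool definition following OpenAI's tool-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}]

async def rollout_step(model: str, messages: list) -> dict:
    completion = await client.chat.completions.create(model=model, messages=messages, tools=tools)
    # The returned message may carry tool_calls in the standard OpenAI format.
    return completion.choices[0].message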

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

List the specific changes.

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this 

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

Additional Info.

  • Issue Number: Fixes issue # or discussion # if any.
  • Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
  • Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title if it breaks any API.
  • Update the documentation about your changes in the docs.
  • Add CI test(s) if necessary.

@casper-hansen
Contributor

casper-hansen commented Jun 3, 2025

All in all, I think it's a great PR that abstracts part of the chat scheduler away; however, I would recommend keeping the chat scheduler extensible instead of removing that extensibility.

Here is why:

  1. I compute a specific amount of completion tokens left and have to set that in the sampling-parameter kwargs.
  2. I had a case where I needed to override _postprocess to set enable_thinking for the Qwen3 models to work properly (I was getting empty <think></think> in a multi-turn scenario).
  3. I also use self.tokenizer.apply_chat_template extensively to stay within the context length when tools are called many times and return a lot of context (see the sketch after this list). Removing this would reduce usability.
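
A hedged sketch of items 1–3 in practice (the tokenizer name, max_model_len, and helper name are assumptions for illustration, not values from verl or this PR):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")  # assumed model
max_model_len = 32768  # assumed context window

def remaining_completion_tokens(messages) -> int:
    # Tokenize with the chat template to measure the prompt exactly as the server sees it.
    prompt_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        enable_thinking=True,  # item 2: Qwen3 chat templates accept this flag
    )
    return max_model_len - len(prompt_ids)

# Item 1: pass the remaining budget through the sampling kwargs, e.g.
# extra_sampling_params = {"max_tokens": remaining_completion_tokens(messages)}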

Btw, tools in training are also kind of hard to debug. Is there anything that can make this easier?

@wuxibin89
Collaborator Author

@casper-hansen Thanks for your advice. I've added a configurable option completion_callback in verl/trainer/config/ppo_trainer.yaml, which allows users to specify their own custom CompletionCallback implementation.

Users should inherit CompletionCallback and implement two methods:

from typing import Any, Dict, List
from openai.types.chat import ChatCompletion
from verl import DataProto  # CompletionCallback comes from verl's chat scheduler module

class CustomCompletionCallback(CompletionCallback):
    async def __call__(self, messages: List[Dict[str, str]], completions: ChatCompletion, info: Dict[str, Any]):
        # Handle one finished completion, e.g. run tool calls and submit follow-up turns.
        ...

    def postprocess(self, batch: DataProto, batch_conversations: List[List[Dict[str, str]]], n: int) -> DataProto:
        # Assemble the accumulated conversations into a DataProto batch for training.
        ...

For reference, I've added a unit test in test_custom_completion_callback.py that demonstrates how to write a custom CompletionCallback.

@casper-hansen
Contributor

@wuxibin89 thanks for addressing the usage needs. I just looked through the code and confirmed I have access to the tokenizer, sampling parameters, and postprocessing. So this looks good to me!

@vermouth1992 merged commit 457f4d2 into main on Jun 6, 2025
38 checks passed
@vermouth1992 deleted the wuxibin/chat_scheduler branch on June 6, 2025 at 23:47
waleko added a commit to waleko/verl that referenced this pull request Jun 10, 2025