feat: Allow passing tokenizer template values to HuggingFace chat models #31489
base: master
Conversation
This commit introduces two ways for you to customize chat templating when using `ChatHuggingFace`:

1. **Custom Chat Template String**: A new `chat_template` parameter has been added to the `ChatHuggingFace` constructor. This allows you to provide a custom Jinja template string, which will be assigned to `tokenizer.chat_template` after the tokenizer is loaded. This gives you full control over the chat prompt formatting if the default template associated with a model is not suitable, or if you want to experiment with different prompt structures.

2. **Dynamic Template Variables via `**kwargs`**: The `_to_chat_prompt` method in `ChatHuggingFace` (which formats messages using `tokenizer.apply_chat_template`) now accepts arbitrary keyword arguments (`**kwargs`), which are passed directly to `tokenizer.apply_chat_template`. This allows you to define variables in your Jinja chat templates (either the default one or a custom one) and provide values for them dynamically during calls to `invoke`, `stream`, `generate`, etc.

Unit tests have been added to verify both features, including setting custom templates and passing keyword arguments to `apply_chat_template`. Documentation has been updated in the `ChatHuggingFace` class docstring and in the HuggingFace integration notebook (`docs/docs/integrations/chat/huggingface.ipynb`) to explain these features with examples.

This change addresses issue langchain-ai#31470 by providing a flexible way to pass tokenizer template values, interpreted as HuggingFace chat template strings and variables for those Jinja templates.
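A minimal usage sketch of the two additions described above, assuming the `chat_template` constructor parameter and the forwarding of invoke-time keyword arguments to `tokenizer.apply_chat_template` behave as described in this PR; the Jinja template, the `persona` variable, and the model id are illustrative only, not part of the change itself:

```python
from langchain_core.messages import HumanMessage
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

# Illustrative Jinja template that reads a custom `persona` variable in
# addition to the usual loop over `messages`.
custom_template = (
    "{% for message in messages %}"
    "{{ persona }} ({{ message['role'] }}): {{ message['content'] }}\n"
    "{% endfor %}"
)

llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",  # illustrative model id
    task="text-generation",
)

# 1. Custom chat template string: assigned to tokenizer.chat_template
#    after the tokenizer for the model is loaded.
chat = ChatHuggingFace(llm=llm, chat_template=custom_template)

# 2. Dynamic template variables: extra keyword arguments are forwarded to
#    tokenizer.apply_chat_template when the chat prompt is rendered.
response = chat.invoke(
    [HumanMessage(content="What does chat templating do?")],
    persona="helpful assistant",
)
print(response.content)
```

The same keyword-argument path would apply to `stream`, `generate`, and the other call methods: any extra keyword argument is handed through to `tokenizer.apply_chat_template` when the messages are converted to a prompt.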
CodSpeed Walltime Performance Report: Merging #31489 will not alter performance.
CodSpeed Instrumentation Performance Report: Merging #31489 will not alter performance.
Follow-up commits on this PR repeat the description above and add linting and formatting fixes for the `libs/partners/huggingface` directory to align with project standards: E501 (line too long) errors flagged by CI are fixed (including reformatting dictionary comprehensions and other long lines), an invalid `type: ignore[import]` comment in `llms/huggingface_endpoint.py` is corrected, a mypy `AttributeError` in `tests/unit_tests/test_chat_models.py` is resolved, and the `pyupgrade`, `ruff`, `black`, and `isort` checks pass.