-
Notifications
You must be signed in to change notification settings - Fork 646
Delete deprecated ChatDataset and InstructDataset #1781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete deprecated ChatDataset and InstructDataset #1781
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1781
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 01c90db with merge base 60864e3 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
torchtune/docs/source/deep_dives/configs.rst
L122
.. code-block:: python
# Note the API of the tokenizer we specified - we need to pass in a path
def llama2_tokenizer(path: str) -> Llama2Tokenizer:
# Note the API of the dataset we specified - we need to pass in a model tokenizer
# and any optional keyword arguments
def alpaca_dataset(
tokenizer: ModelTokenizer,
train_on_input: bool = True,
max_seq_len: int = 512,
) -> InstructDataset:
from torchtune import config
# Since we've already specified the path in the config, we don't need to pass
# it in
tokenizer = config.instantiate(cfg.tokenizer)
# We pass in the instantiated tokenizer as the first required argument, then
# we change an optional keyword argument
dataset = config.instantiate(
cfg.dataset,
tokenizer,
train_on_input=False,
)
and :class:`~torchtune.datasets.TextCompletionDataset` provide may require you to create your own dataset | ||
class for more flexibility. Let's walk through the :class:`~torchtune.datasets.PreferenceDataset`, | ||
that :class:`~torchtune.datasets.SFTDataset` and :class:`~torchtune.datasets.TextCompletionDataset` provide may require | ||
you to create your own dataset class for more flexibility. Let's walk through the :class:`~torchtune.datasets.PreferenceDataset`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line below still contains a reference to InstructDataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoop
# Issue Closes #2073 # What does this PR do? - Removes the `datasets.rst` from the list of document urls as it no longer exists in torchtune. Referenced PR: pytorch/torchtune#1781 - Added a step to run `uv sync`. Previously, I would get the following error: ``` ➜ llama-stack git:(remove-deprecated-rst) uv venv --python 3.10 source .venv/bin/activate Using CPython 3.10.13 interpreter at: /usr/bin/python3.10 Creating virtual environment at: .venv Activate with: source .venv/bin/activate (llama-stack) ➜ llama-stack git:(remove-deprecated-rst) INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run zsh: llama: command not found... ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan To test: Run through `rag_agent` example in the `detailed_tutorial.md` file. [//]: # (## Documentation)
…ama#2061) # Issue Closes meta-llama#2073 # What does this PR do? - Removes the `datasets.rst` from the list of document urls as it no longer exists in torchtune. Referenced PR: pytorch/torchtune#1781 - Added a step to run `uv sync`. Previously, I would get the following error: ``` ➜ llama-stack git:(remove-deprecated-rst) uv venv --python 3.10 source .venv/bin/activate Using CPython 3.10.13 interpreter at: /usr/bin/python3.10 Creating virtual environment at: .venv Activate with: source .venv/bin/activate (llama-stack) ➜ llama-stack git:(remove-deprecated-rst) INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run zsh: llama: command not found... ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan To test: Run through `rag_agent` example in the `detailed_tutorial.md` file. [//]: # (## Documentation)
Context
What is the purpose of this PR? Is it to
Deleting deprecated classes
ChatDataset
andInstructDataset
, which have now been in the release for a cycle. Closes #1780.Changelog
What are the changes made in this PR?
ChatDataset
andInstructDataset
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
pre-commit install
)pytest tests
pytest tests -m integration_test
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example