Delete deprecated ChatDataset and InstructDataset #1781

joecummings · 2024-10-09T12:01:08Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Deleting deprecated classes ChatDataset and InstructDataset, which have now been in the release for a cycle. Closes #1780.
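
For readers unfamiliar with the deprecate-then-delete cycle described above, here is a minimal sketch of how a class can be marked deprecated for one release before removal. The `deprecated` decorator and `OldDataset` class are hypothetical illustrations built on the standard `warnings` module; they are not taken from the torchtune codebase.

```python
import warnings


def deprecated(msg: str):
    """Hypothetical decorator: emit a FutureWarning every time the class is instantiated."""

    def decorator(cls):
        original_init = cls.__init__

        def wrapped_init(self, *args, **kwargs):
            warnings.warn(
                f"{cls.__name__} is deprecated and will be removed in a future release. {msg}",
                FutureWarning,
                stacklevel=2,
            )
            original_init(self, *args, **kwargs)

        cls.__init__ = wrapped_init
        return cls

    return decorator


# Hypothetical stand-in for a class that is scheduled for deletion.
@deprecated("Use the replacement dataset class instead.")
class OldDataset:
    def __init__(self, rows):
        self.rows = rows


# Capture the warning so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ds = OldDataset([1, 2, 3])
```

Once the warning has shipped in a release, the class and its decorator can be deleted together, which is the step this PR performs.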

Changelog

What are the changes made in this PR?

Delete ChatDataset and InstructDataset
Update references and tests

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
add unit tests for any new functionality
update docstrings for any new or updated methods or classes
run unit tests via pytest tests
run recipe tests via pytest tests -m integration_test
manually run any new or modified recipes with sufficient proof of correctness
include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

I did not change any public API
I have added an example to docs or docstrings

pytorch-bot · 2024-10-09T12:01:12Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1781

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 01c90db with merge base 60864e3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

SalmanMohammadi

torchtune/docs/source/deep_dives/configs.rst L122

.. code-block:: python

    # Note the API of the tokenizer we specified - we need to pass in a path
    def llama2_tokenizer(path: str) -> Llama2Tokenizer:

    # Note the API of the dataset we specified - we need to pass in a model tokenizer
    # and any optional keyword arguments
    def alpaca_dataset(
        tokenizer: ModelTokenizer,
        train_on_input: bool = True,
        max_seq_len: int = 512,
    ) -> InstructDataset:

    from torchtune import config

    # Since we've already specified the path in the config, we don't need to pass
    # it in
    tokenizer = config.instantiate(cfg.tokenizer)
    # We pass in the instantiated tokenizer as the first required argument, then
    # we change an optional keyword argument
    dataset = config.instantiate(
        cfg.dataset,
        tokenizer,
        train_on_input=False,
    )
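
The calling pattern in the snippet above (positional arguments first, then keyword overrides on top of what the config already supplies) can be illustrated with a toy helper. This is only a sketch under stated assumptions: `instantiate_sketch` and `toy_dataset` are hypothetical names, and the merge rule shown (call-time keywords win over config values) is an assumption for illustration, not a description of how `torchtune.config.instantiate` is implemented.

```python
from typing import Any, Callable, Dict


def instantiate_sketch(
    component: Callable[..., Any],
    cfg_kwargs: Dict[str, Any],
    *args: Any,
    **overrides: Any,
) -> Any:
    """Hypothetical helper: call `component` with config kwargs, letting call-time kwargs win."""
    merged = {**cfg_kwargs, **overrides}
    return component(*args, **merged)


# Hypothetical builder with the same argument shape as the docs example.
def toy_dataset(tokenizer, train_on_input=True, max_seq_len=512):
    return {
        "tokenizer": tokenizer,
        "train_on_input": train_on_input,
        "max_seq_len": max_seq_len,
    }


# The "config" supplies max_seq_len; the caller passes the tokenizer positionally
# and overrides one optional keyword argument.
dataset = instantiate_sketch(
    toy_dataset,
    {"max_seq_len": 256},
    "my-tokenizer",
    train_on_input=False,
)
```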

SalmanMohammadi · 2024-10-09T12:09:55Z

docs/source/tutorials/datasets.rst

-and :class:`~torchtune.datasets.TextCompletionDataset` provide may require you to create your own dataset
-class for more flexibility. Let's walk through the :class:`~torchtune.datasets.PreferenceDataset`,
+that :class:`~torchtune.datasets.SFTDataset` and :class:`~torchtune.datasets.TextCompletionDataset` provide may require
+you to create your own dataset class for more flexibility. Let's walk through the :class:`~torchtune.datasets.PreferenceDataset`,


Line below still contains a reference to InstructDataset.
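
Leftover references like this one can be caught mechanically before merging. The following is a minimal sketch, assuming the docs and source files have been read into a mapping of path to text; `find_stale_references` and the sample file name are hypothetical and not part of this PR.

```python
import re
from typing import Dict, List, Sequence, Tuple

# The two class names removed by this PR.
REMOVED_NAMES = ("ChatDataset", "InstructDataset")


def find_stale_references(
    files: Dict[str, str],
    names: Sequence[str] = REMOVED_NAMES,
) -> List[Tuple[str, int, str]]:
    """Return (path, line number, name) for every whole-word mention of a removed name."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append((path, lineno, match.group(1)))
    return hits


# Hypothetical in-memory file contents, used instead of reading from disk.
sample = {
    "docs/example.rst": (
        "Use :class:`~torchtune.datasets.SFTDataset`.\n"
        "See also :class:`~torchtune.datasets.InstructDataset`.\n"
    ),
}
hits = find_stale_references(sample)
```

In practice the mapping would be built by walking the repository, and an empty result would indicate that no references remain.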

# Issue
Closes #2073

# What does this PR do?
- Removes the `datasets.rst` from the list of document urls as it no longer exists in torchtune. Referenced PR: pytorch/torchtune#1781
- Added a step to run `uv sync`. Previously, I would get the following error:

```
➜ llama-stack git:(remove-deprecated-rst) uv venv --python 3.10
source .venv/bin/activate
Using CPython 3.10.13 interpreter at: /usr/bin/python3.10
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate
(llama-stack) ➜ llama-stack git:(remove-deprecated-rst) INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
zsh: llama: command not found...
```

## Test Plan
To test: Run through `rag_agent` example in the `detailed_tutorial.md` file.


Delete deprecated ChatDataset and InstructDataset

03bf740

facebook-github-bot added the CLA Signed label Oct 9, 2024

Update docs

c6c1b7a

joecummings requested review from RdoubleA, ebsmothers and SalmanMohammadi October 9, 2024 12:03

joecummings mentioned this pull request Oct 9, 2024

v0.4.0 release tracker #1747

Closed

34 tasks

SalmanMohammadi approved these changes Oct 9, 2024


Finish removing references to InstructDataset

01c90db

RdoubleA approved these changes Oct 9, 2024


joecummings merged commit 2db53b4 into pytorch:main Oct 9, 2024
17 checks passed

joecummings deleted the remove-chat-and-instruct-ds branch October 9, 2024 15:04

mori360 pushed a commit to mori360/torchtune that referenced this pull request Oct 14, 2024

Delete deprecated ChatDataset and InstructDataset (pytorch#1781)

90338a6

This was referenced Apr 30, 2025

docs: Remove datasets.rst and fix llama-stack build commands meta-llama/llama-stack#2061

Merged

docs: datasets.rst torchtune tutorial is deprecated meta-llama/llama-stack#2073

Closed
