
modules doc updates #1588


Merged

merged 5 commits into pytorch:main from modules-doc-updates on Sep 15, 2024

Conversation

ebsmothers
Contributor

A bunch of miscellaneous changes to torchtune/modules docstrings so that our API docs look a bit nicer. Tbh I probably could have done a lot more here and many of the changes are more surface-level, but every little bit helps.


pytorch-bot bot commented Sep 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1588

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 76782d4 with merge base f6d3a7a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Sep 15, 2024
the cross entropy normally, but upcasting only one chunk at a time saves considerable memory.

The CE and upcasting have to be compiled together for better performance.
When using this class, we recommend using torch.compile only on the method `compute_cross_entropy`.
When using this class, we recommend using :func:`torch.compile` only on the method ``compute_cross_entropy``.
Collaborator

Can you self reference method here?

Contributor Author

Was debating it but honestly it's like 5 lines below and not well-documented anyways so I thought it was overkill
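
For anyone reading along later, the pattern this docstring describes boils down to something like the sketch below: split the logits, upcast one chunk at a time, and compile only the per-chunk loss. This is illustrative only, not torchtune's actual chunked-loss class; the chunk count and ignore index are assumptions.

```python
import torch
import torch.nn.functional as F


def compute_cross_entropy(logits_chunk: torch.Tensor, labels_chunk: torch.Tensor) -> torch.Tensor:
    # Upcast a single chunk to fp32 right before the loss; this is the piece
    # you would wrap with torch.compile so the upcast and CE get fused.
    return F.cross_entropy(
        logits_chunk.float(), labels_chunk, ignore_index=-100, reduction="sum"
    )


def chunked_cross_entropy(logits: torch.Tensor, labels: torch.Tensor, num_chunks: int = 8) -> torch.Tensor:
    # logits: (b, s, v), labels: (b, s). Chunking along the sequence dim means
    # only one chunk is ever held in fp32 at a time, which is the memory win.
    logit_chunks = logits.chunk(num_chunks, dim=1)
    label_chunks = labels.chunk(num_chunks, dim=1)
    total = sum(
        compute_cross_entropy(lc.reshape(-1, lc.size(-1)), tc.reshape(-1))
        for lc, tc in zip(logit_chunks, label_chunks)
    )
    return total / (labels != -100).sum()
```

Per the docstring's recommendation, you would compile only the per-chunk piece, e.g. `compute_cross_entropy = torch.compile(compute_cross_entropy)`, rather than the whole loss.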

0.0 to lr over num_warmup_steps, then decreases to 0.0 on a cosine schedule over
the remaining num_training_steps-num_warmup_steps (assuming num_cycles = 0.5).
0.0 to lr over ``num_warmup_steps``, then decreases to 0.0 on a cosine schedule over
the remaining ``num_training_steps-num_warmup_steps`` (assuming num_cycles = 0.5).
Collaborator

Backticks on num cycles
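
Side note for readers: with ``num_cycles = 0.5`` the schedule being documented reduces to linear warmup followed by a half-cosine decay to zero. A minimal sketch of the multiplier (the helper name is made up, not the torchtune API):

```python
import math


def get_lr_multiplier(step: int, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5) -> float:
    # Linear warmup: multiplier goes 0.0 -> 1.0 over num_warmup_steps
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    # Cosine decay: 1.0 -> 0.0 over the remaining num_training_steps - num_warmup_steps
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
```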

@@ -149,7 +149,7 @@ class FusionEmbedding(nn.Module):
second embedding for the additional tokens. During forward this module routes
the tokens to the appropriate embedding table.

Use this as a drop-in replacement for `nn.Embedding` in your model.
Use this as a drop-in replacement for ``nn.Embedding`` in your model.
Collaborator

Point to :class:`torch.nn.Embedding`?
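
For readers who haven't seen the class: the routing the docstring describes amounts to something like the toy version below. This is an illustrative sketch, not torchtune's FusionEmbedding; in particular, the "extra token ids start at vocab_size" convention is an assumption.

```python
import torch
import torch.nn as nn


class ToyFusionEmbedding(nn.Module):
    """Two embedding tables: one for the base vocab, one for extra fusion tokens.
    The forward signature matches nn.Embedding, so it can be swapped in for it."""

    def __init__(self, vocab_size: int, fusion_vocab_size: int, embed_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fusion_embedding = nn.Embedding(fusion_vocab_size, embed_dim)
        self.vocab_size = vocab_size

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Route ids < vocab_size to the base table, the rest to the fusion table.
        is_fusion = (tokens >= self.vocab_size).unsqueeze(-1)
        base_ids = tokens.clamp(max=self.vocab_size - 1)
        fusion_ids = (tokens - self.vocab_size).clamp(min=0)
        return torch.where(is_fusion, self.fusion_embedding(fusion_ids), self.embedding(base_ids))
```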

@@ -85,16 +85,13 @@ def forward(
If none, assume the index of the token is its position id. Default is None.

Returns:
torch.Tensor: output tensor with RoPE applied
torch.Tensor: output tensor with shape [b, s, n_h, h_d]
Collaborator

Backticks on shape

@@ -31,4 +32,12 @@ def __init__(self, tied_module: nn.Module):
)

def __call__(self, x: torch.tensor) -> torch.tensor:
"""
Args:
x (torch.tensor): Input tensor. Should have shape ``(..., in_dim)``, where ``in_dim``
Collaborator

Why are these typed with lowercase tensor?

Contributor Author

ugh you caught me. idk and idc, I just wanted these changes to be docstring-only

Contributor Author

Leaving it as is for now, we can check with @felipemello1 on whether this was by design or not

Contributor
@felipemello1 felipemello1 Sep 15, 2024

I did it just to see if you guys truly review the PRs, obviously. Congratulations, you passed the test! Wanna update it to upper case?

@@ -28,7 +28,7 @@ class MultiHeadAttention(nn.Module):
Following is an example of MHA, GQA and MQA with num_heads = 4

(credit for the documentation:
https://github.com/Lightning-AI/lit-gpt/blob/main/lit_gpt/config.py).
https://github.com/Lightning-AI/litgpt/blob/eda1aaaf391fd689664f95487ab03dc137e213fd/litgpt/config.py).
Collaborator

Suggested change
https://github.com/Lightning-AI/litgpt/blob/eda1aaaf391fd689664f95487ab03dc137e213fd/litgpt/config.py).
`litgpt.Config <https://github.com/Lightning-AI/litgpt/blob/eda1aaaf391fd689664f95487ab03dc137e213fd/litgpt/config.py>`_).
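
Tangential to the suggestion, but since the docstring example uses num_heads = 4: the MHA/GQA/MQA distinction is just the ratio of query heads to KV heads. A tiny illustration with assumed head counts:

```python
# With num_heads = 4 query heads:
#   MHA: num_kv_heads == num_heads   -> 4 K/V heads, one per query head
#   GQA: 1 < num_kv_heads < num_heads -> e.g. 2 K/V heads, each shared by 2 query heads
#   MQA: num_kv_heads == 1            -> a single K/V head shared by all 4 query heads
num_heads = 4
for name, num_kv_heads in [("MHA", 4), ("GQA", 2), ("MQA", 1)]:
    q_per_kv = num_heads // num_kv_heads
    print(f"{name}: {num_kv_heads} kv head(s), {q_per_kv} query head(s) per kv head")
```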

@@ -25,7 +25,7 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
x (torch.Tensor): Input tensor.

Returns:
torch.Tensor: The normalized output tensor.
torch.Tensor: The normalized output tensor having the same shape as x.
Collaborator

Suggested change
torch.Tensor: The normalized output tensor having the same shape as x.
torch.Tensor: The normalized output tensor having the same shape as ``x``.

@@ -356,8 +356,8 @@ def forward(
KV values for each position.

Returns:
Tensor: output tensor with shape [b x s x v] or a list of layer
output tensors defined by ``output_hidden_states`` with the
Tensor: output tensor with shape [b x s x v] or a list of layer \
Collaborator

Suggested change
Tensor: output tensor with shape [b x s x v] or a list of layer \
Tensor: Output tensor with shape ``[b x s x v]`` or a list of layer \

@@ -27,7 +27,7 @@ def get_reward_penalty_mask(

Args:
padding_masks (torch.Tensor): torch.Tensor where True indicates a padding token in the generated
sequence, and False otherwise. Shape: (b, reponse_len)
sequence, and False otherwise. Shape: (b, response_len)
Collaborator

Suggested change
sequence, and False otherwise. Shape: (b, response_len)
sequence, and False otherwise. Shape: ``(b, response_len)``

@@ -27,7 +27,7 @@ def get_reward_penalty_mask(

Args:
padding_masks (torch.Tensor): torch.Tensor where True indicates a padding token in the generated
sequence, and False otherwise. Shape: (b, reponse_len)
sequence, and False otherwise. Shape: (b, response_len)
seq_lens (torch.Tensor): The length of each generated sequence. Shape: (b,)
Collaborator

Suggested change
seq_lens (torch.Tensor): The length of each generated sequence. Shape: (b,)
seq_lens (torch.Tensor): The length of each generated sequence. Shape: ``(b,)``

@@ -58,8 +58,8 @@ def get_rewards_ppo(

Args:
scores (torch.Tensor): Reward model scores, shape (b,).
logprobs (torch.Tensor): Policy logprobs, shape (b, reponse_len).
ref_logprobs (torch.Tensor): Reference base model, shape (b, reponse_len).
logprobs (torch.Tensor): Policy logprobs, shape (b, response_len).
Collaborator

Suggested change
logprobs (torch.Tensor): Policy logprobs, shape (b, response_len).
logprobs (torch.Tensor): Policy logprobs, shape ``(b, response_len)``.

logprobs (torch.Tensor): Policy logprobs, shape (b, reponse_len).
ref_logprobs (torch.Tensor): Reference base model, shape (b, reponse_len).
logprobs (torch.Tensor): Policy logprobs, shape (b, response_len).
ref_logprobs (torch.Tensor): Reference base model, shape (b, response_len).
Collaborator

Suggested change
ref_logprobs (torch.Tensor): Reference base model, shape (b, response_len).
ref_logprobs (torch.Tensor): Reference base model logprobs, shape ``(b, response_len)``.
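
For anyone connecting these shapes: a typical KL-penalized PPO reward looks roughly like the sketch below. This is a toy version, not necessarily torchtune's exact `get_rewards_ppo`; the `kl_coeff` value and adding the score at the last token are assumptions.

```python
import torch


def toy_rewards(scores: torch.Tensor, logprobs: torch.Tensor, ref_logprobs: torch.Tensor, kl_coeff: float = 0.1):
    # scores: (b,)   logprobs, ref_logprobs: (b, response_len)
    kl = logprobs - ref_logprobs      # per-token KL estimate, (b, response_len)
    rewards = -kl_coeff * kl          # per-token KL penalty
    rewards[:, -1] += scores          # reward model score credited at the final token
    return rewards, kl                # both (b, response_len)
```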

Comment on lines 191 to 192
values (torch.Tensor): The predicted values for each state. Shape: (b, response_len)
rewards (torch.Tensor): The rewards received at each time step. Shape: (b, response_len)
Collaborator

Suggested change
values (torch.Tensor): The predicted values for each state. Shape: (b, response_len)
rewards (torch.Tensor): The rewards received at each time step. Shape: (b, response_len)
values (torch.Tensor): The predicted values for each state. Shape: ``(b, response_len)``
rewards (torch.Tensor): The rewards received at each time step. Shape: ``(b, response_len)``

Comment on lines 199 to 200
- advantages (torch.Tensor): The estimated advantages. Shape: (b, response_len)
- returns (torch.Tensor): The estimated returns. Shape: (b, response_len)
Collaborator

Suggested change
- advantages (torch.Tensor): The estimated advantages. Shape: (b, response_len)
- returns (torch.Tensor): The estimated returns. Shape: (b, response_len)
- advantages (torch.Tensor): The estimated advantages. Shape: ``(b, response_len)``
- returns (torch.Tensor): The estimated returns. Shape: ``(b, response_len)``
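
Similarly, the advantages/returns pair above is typically produced by generalized advantage estimation. A minimal sketch under the stated shapes (the gamma/lmbda defaults are assumptions, not torchtune's):

```python
import torch


def toy_gae(values: torch.Tensor, rewards: torch.Tensor, gamma: float = 1.0, lmbda: float = 0.95):
    # values, rewards: (b, response_len) -> advantages, returns: (b, response_len)
    advantages = torch.zeros_like(rewards)
    last_gae = torch.zeros_like(rewards[:, 0])
    t = rewards.shape[1]
    for i in reversed(range(t)):
        # Bootstrapped TD error, then an exponentially weighted sum backwards in time.
        next_value = values[:, i + 1] if i < t - 1 else torch.zeros_like(last_gae)
        delta = rewards[:, i] + gamma * next_value - values[:, i]
        last_gae = delta + gamma * lmbda * last_gae
        advantages[:, i] = last_gae
    returns = advantages + values
    return advantages, returns
```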

@@ -34,7 +34,7 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
x (torch.Tensor): input tensor to normalize

Returns:
torch.Tensor: The output tensor after applying RMSNorm.
torch.Tensor: The normalized and scaled tensor having the same shape as x.
Collaborator

Suggested change
torch.Tensor: The normalized and scaled tensor having the same shape as x.
torch.Tensor: The normalized and scaled tensor having the same shape as ``x``.
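
For completeness, the "normalized and scaled, same shape as ``x``" behavior being documented is roughly the following. This is a sketch in the input dtype; torchtune's actual RMSNorm may differ (e.g. by upcasting internally), and the eps default here is an assumption.

```python
import torch
import torch.nn as nn


class ToyRMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the last dim, then apply a learned scale.
        # Output has the same shape as x.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.scale
```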

@@ -22,6 +22,6 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
x (torch.Tensor): input tensor to gate

Returns:
torch.Tensor: The output tensor after gating.
torch.Tensor: The output tensor after gating. Has the same shape as x
Collaborator

Suggested change
torch.Tensor: The output tensor after gating. Has the same shape as x
torch.Tensor: The output tensor after gating. Has the same shape as ``x``.

maybe I'm getting carried away

@@ -12,7 +12,8 @@
class TiedLinear:
"""
A tied linear layer, without bias, that shares the same weight as another linear layer.
This is useful for models that use tied weights, such as qwen and gemma.
This is useful for models that use tied weights, such as :func:`~torchtune.models.qwen2_0_5b`,
:func:`~torchtune.models.qwen2_1_5b` and :func:`~torchtune.models.gemma`.
Collaborator

Suggested change
:func:`~torchtune.models.qwen2_1_5b` and :func:`~torchtune.models.gemma`.
:func:`~torchtune.models.qwen2_1_5b` and all of the :func:`~torchtune.models.gemma` models.
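
For anyone skimming the thread, the tied-weight pattern in question is essentially the sketch below (illustrative, not necessarily the exact TiedLinear code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyTiedLinear:
    """A bias-free linear projection that reuses another module's weight,
    e.g. the token embedding, instead of allocating its own."""

    def __init__(self, tied_module: nn.Module):
        if not hasattr(tied_module, "weight"):
            raise AttributeError("tied_module must expose a `weight` attribute")
        self.tied_module = tied_module

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        # (..., in_dim) @ weight.T -> (..., out_dim); no separate weight is stored here.
        return F.linear(x, self.tied_module.weight)
```

Used as an output projection, `ToyTiedLinear(model.tok_embeddings)` would map hidden states of shape ``(..., embed_dim)`` to logits of shape ``(..., vocab_size)`` using the embedding weight.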

@SalmanMohammadi
Collaborator

That was kind of cathartic

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 27.00%. Comparing base (62b0c79) to head (76782d4).
Report is 2 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (62b0c79) and HEAD (76782d4): HEAD has 6 fewer uploads than BASE (7 on BASE vs. 1 on HEAD).
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1588       +/-   ##
===========================================
- Coverage   73.12%   27.00%   -46.13%     
===========================================
  Files         289      290        +1     
  Lines       14175    14252       +77     
===========================================
- Hits        10366     3849     -6517     
- Misses       3809    10403     +6594     

☔ View full report in Codecov by Sentry.

@ebsmothers ebsmothers merged commit 6820089 into pytorch:main Sep 15, 2024
17 checks passed
@ebsmothers ebsmothers deleted the modules-doc-updates branch September 15, 2024 22:46
ebsmothers added a commit that referenced this pull request Sep 16, 2024
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

6 participants