Adding support for nope positional encoding in block overrides. #1794
Conversation
Pull Request Overview
This PR introduces a new attn_config flag, "nope", that disables positional encoding for selected blocks via block overrides, giving users finer per-layer control over positional encoding behavior.
- Added tests to verify attention behavior with the "nope" flag.
- Updated configuration defaults and validation in the MPT configuration to restrict "nope" usage to block overrides.
- Propagated the "nope" flag through attention layer constructors and forward logic to conditionally disable rotary embeddings and alibi slopes (a rough sketch of this gating follows below).
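For context, the sketch below shows how such a per-layer flag could gate rotary embeddings and ALiBi slopes inside an attention module. This is a minimal illustration with hypothetical names (`SketchAttention`, `rotary_emb`, `alibi_slopes` as used here), not the actual code in llmfoundry/models/layers/attention.py.

```python
# Illustrative sketch only: shows how a per-layer "nope" flag could skip
# rotary embeddings and ALiBi slopes. Names and structure are hypothetical
# and do not mirror llmfoundry/models/layers/attention.py.
from typing import Callable, Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchAttention(nn.Module):
    """Toy multi-head attention with a per-layer 'nope' switch."""

    def __init__(self, d_model: int, n_heads: int, nope: bool = False):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.nope = nope  # "no positional encoding" for this block
        self.Wqkv = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(
        self,
        x: torch.Tensor,
        rotary_emb: Optional[Callable] = None,
        alibi_slopes: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        b, s, _ = x.shape
        q, k, v = self.Wqkv(x).chunk(3, dim=-1)
        q = q.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)

        # With nope=True, rotary embeddings are skipped entirely.
        if not self.nope and rotary_emb is not None:
            q, k = rotary_emb(q, k)

        # With nope=True, the ALiBi bias is skipped as well.
        attn_bias = None
        if not self.nope and alibi_slopes is not None:
            pos = torch.arange(s, device=x.device)
            rel = (pos[None, :] - pos[:, None]).to(x.dtype)  # (s, s)
            attn_bias = alibi_slopes.view(-1, 1, 1) * rel    # (n_heads, s, s)

        # Causal masking omitted for brevity.
        out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias)
        out = out.transpose(1, 2).reshape(b, s, -1)
        return self.out_proj(out)
```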
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/models/layers/test_flash_torch.py | Added tests covering cases with "nope" enabled and disabled. |
| llmfoundry/models/utils/config_defaults.py | Added default "nope": False setting in configuration defaults. |
| llmfoundry/models/mpt/configuration_mpt.py | Enforced "nope" usage only in block overrides with validation. |
| llmfoundry/models/layers/attention.py | Propagated "nope" flag across attention layer initialization and forward pass. |
Comments suppressed due to low confidence (3)
llmfoundry/models/layers/attention.py:464
- [nitpick] The parameter name 'nope' is not self-explanatory; consider renaming it to something more descriptive like 'disable_positional_encoding' to improve code clarity.
`nope: bool = False,`
llmfoundry/models/mpt/configuration_mpt.py:214
- [nitpick] The raised error message when 'nope' is specified as a default indicates that it must only be used in block_overrides; please ensure this behavior is clearly documented for users.
`if self.attn_config.get('nope', False):`
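A minimal sketch of the kind of guard being described, assuming a dict-style attn_config; the wording and placement of the real check in configuration_mpt.py may differ:

```python
# Illustrative sketch of a config-time guard; the real check in
# llmfoundry/models/mpt/configuration_mpt.py may use different wording.
def validate_default_attn_config(attn_config: dict) -> None:
    """Reject 'nope' in the model-wide attn_config; it belongs in block_overrides."""
    if attn_config.get('nope', False):
        raise ValueError(
            "'nope' may not be enabled in the default attn_config; "
            'set it per-layer through block_overrides instead.',
        )
```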
tests/models/layers/test_flash_torch.py:821
- [nitpick] Consider adding an explicit assertion verifying that positional encoding is disabled when 'nope' is set to True, to ensure future changes do not unintentionally alter this behavior.
`cfg.nope = True`
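A minimal sketch of the kind of explicit assertion suggested here; it reuses the hypothetical SketchAttention class from the earlier sketch rather than the real llm-foundry layer or the fixtures in tests/models/layers/test_flash_torch.py:

```python
# Illustrative test sketch; relies on the hypothetical SketchAttention above.
import torch


def _fake_rotary(q: torch.Tensor, k: torch.Tensor):
    # Deliberately perturbs q and k so any accidental application is visible.
    return q + 1.0, k + 1.0


def test_nope_disables_rotary():
    torch.manual_seed(0)
    layer = SketchAttention(d_model=64, n_heads=4, nope=True)
    x = torch.randn(2, 8, 64)
    # With nope=True the rotary callable must be ignored, so the output
    # should be identical whether or not rotary embeddings are supplied.
    out_with_rope = layer(x, rotary_emb=_fake_rotary)
    out_without_rope = layer(x, rotary_emb=None)
    torch.testing.assert_close(out_with_rope, out_without_rope)
```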
Please add a PR description with an example of how to use this
Co-authored-by: Vitaliy Chiley <[email protected]>
Allows for certain layers to have no positional embedding, similar to Llama 4. Example usage:
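A plausible configuration sketch, assuming the existing block_overrides format with 'order' and 'overrides' keys; the override name no_pos_block and the other values shown are illustrative and should be checked against the llm-foundry MPT configuration docs:

```python
# Plausible usage sketch; the block_overrides structure ('order'/'overrides')
# and attn_config keys used here are assumptions, not confirmed by this PR.
from llmfoundry.models.mpt.configuration_mpt import MPTConfig

config = MPTConfig(
    d_model=512,
    n_heads=8,
    n_layers=4,
    attn_config={
        'attn_impl': 'flash',
        'rope': True,  # rotary positional encoding as the model-wide default
    },
    block_overrides={
        'order': [
            {'name': 'default', 'repeat': 3},
            {'name': 'no_pos_block'},  # illustrative override name
        ],
        'overrides': {
            'no_pos_block': {
                # This block gets no positional encoding.
                'attn_config': {'nope': True},
            },
        },
    },
)
```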