Releases: mosaicml/llm-foundry
v0.21.0
TLDR
- Torch version has been bumped to 2.7.0
- Added FSDP2 support, enabled via an environment variable: FSDP_VERSION=2 (see the sketch after this list). Currently it only supports pretraining (with meta init). No YAML change is needed to enable FSDP2; attributes that only apply to FSDP1 will be ignored and surfaced as warnings. See the Composer release notes for more details
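A minimal sketch of opting into FSDP2 from a Python launcher. Only the `FSDP_VERSION=2` variable comes from these notes; the script and YAML paths below are illustrative:

```python
# Hypothetical launch snippet: opt into FSDP2 before training starts.
# Only FSDP_VERSION=2 is from the release notes; paths are illustrative.
import os
import subprocess

os.environ["FSDP_VERSION"] = "2"  # unset (or "1") keeps FSDP1 behavior

# Launch a pretraining run exactly as before; no YAML changes required.
subprocess.run(
    ["composer", "scripts/train/train.py", "yamls/pretrain/mpt-125m.yaml"],
    check=True,
)
```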
What's Changed
- Adding support for nope positional encoding in block overrides. by @ShashankMosaicML in #1794
- Bump foundry version to 0.21.0.dev0 by @dakinggg in #1812
- Adding temperature tuning in attention by @ShashankMosaicML in #1793
- Update foundry version in MCLI yamls by @dakinggg in #1813
- Upgrade yapf version by @dakinggg in #1814
- Allow subselecting the appropriate config for llama4 by @dakinggg in #1815
- Change RMSNorm to use PyTorch native implementation by @josejg in #1809
- Update datasets requirement from <3.6,>=3.3.2 to >=3.3.2,<3.7 by @dependabot in #1817
- Bump onnxruntime from 1.19.2 to 1.22.0 by @dependabot in #1819
- Update huggingface-hub[hf_xet] requirement from <0.31,>=0.30.0 to >=0.30.0,<0.32 by @dependabot in #1818
- Deprecate inference API wrappers by @dakinggg in #1821
- Fix Dtensor initialization by @bowenyang008 in #1820
- Update accelerate requirement from <1.7,>=0.25 to >=0.25,<1.8 by @dependabot in #1824
- Bump onnx from 1.17.0 to 1.18.0 by @dependabot in #1823
- Bump docformatter for python3.12 and change blank_line_before_module_docstring = false by @sashaDoubov in #1825
- Delete useless print("here") by @tsebaka in #1826
- Update ci-testing version to latest by @dakinggg in #1827
- Bump coverage[toml] from 7.8.0 to 7.8.2 by @dependabot in #1830
- Configurable shard size by @dakinggg in #1833
- Bump Composer 0.31.0 by @bowenyang008 in #1835
- Fix monolithic checkpointing against composer main by @dakinggg in #1836
- Bump torch version to 2.7 by @bowenyang008 in #1832
- bump huggingface-hub upper bound to 0.33 by @bowenyang008 in #1838
New Contributors
- @bowenyang008 made their first contribution in #1820
- @tsebaka made their first contribution in #1826
Full Changelog: v0.20.0...v0.21.0
v0.20.0
What's Changed
- Bump Dev 0.20.0.dev0 by @KuuCi in #1778
- Bump Example Yamls to use 0.19.0 by @KuuCi in #1779
- Making tokenizers optional in the building of LLMs by @ethantang-db in #1781
- Remove some more calls to HF during CI by @dakinggg in #1780
- Modify validation check for multimodal messages by @adyasha-db in #1787
- Remove all connection to HF in CI by @dakinggg in #1786
- Update transformers requirement from <4.50,>=v4.49.0 to >=v4.49.0,<4.52 by @dependabot in #1788
- Bump einops from 0.8.0 to 0.8.1 by @dependabot in #1776
- Bump gitpython from 3.1.43 to 3.1.44 by @dependabot in #1775
- Update transformers to 4.51 by @dakinggg in #1790
- Update setuptools requirement from <78.0.0 to <80.0.0 by @dependabot in #1796
- Update tiktoken requirement from <0.8.1,>=0.4 to >=0.4,<0.9.1 by @dependabot in #1797
- Update packaging requirement from <25,>=21 to >=21,<26 by @dependabot in #1800
- Update accelerate requirement from <1.4,>=0.25 to >=0.25,<1.7 by @dependabot in #1799
- extended hf_checkpointer for any additional content saving by @ethantang-db in #1792
- Load model only on global rank 0 for mixed init by @dakinggg in #1795
- added attn_implementation for hf_base.py by @ethantang-db in #1801
- Update setuptools requirement from <80.0.0 to <81.0.0 by @dependabot in #1803
- Update datasets requirement from <3.4,>=3.3.2 to >=3.3.2,<3.6 by @dependabot in #1807
- Update grouped-gemm version by @dakinggg in #1810
- Remove some old deprecated code/comments by @dakinggg in #1811
New Contributors
- @ethantang-db made their first contribution in #1781
- @adyasha-db made their first contribution in #1787
Full Changelog: v0.19.0...v0.20.0
v0.19.0
What's New
1. Python 3.12 Bump (#1755)
We've added support for Python 3.12 and deprecated Python 3.9 support.
What's Changed
- Use llmfoundry image instead of pytorch image for gpu tests by @rithwik-db in #1752
- bump dev version to 0.19.0.dev0 by @rithwik-db in #1753
- Bump mcli yaml examples to use 0.18.0 and torch 2.6 by @rithwik-db in #1754
- Fix meta initialization for FSDP training with HF models and TE Layers by @jjuvonen-amd in #1745
- Fix bugs in `llmfoundry/data/text_data.py` by @gsganden in #1760
- Update setuptools requirement from <76.0.0 to <78.0.0 by @dependabot in #1758
- Update README.md by @gsganden in #1721
- Add error handling for general table download errors by @dakinggg in #1761
- modified the packing slightly to enable inheritance by @abaheti95 in #1762
- Remove registration fallback by @dakinggg in #1764
- Move save/load planner creation to after config logging by @dakinggg in #1769
- Bump Python 3.12 by @KuuCi in #1755
- Fix GPU Tests 3.10 by @KuuCi in #1770
- Remove a bunch of repeated calls to HF in the tests by @dakinggg in #1768
- Bump coverage[toml] from 7.6.10 to 7.8.0 by @dependabot in #1767
- Update mlflow requirement from <2.19,>=2.14.1 to >=2.14.1,<2.22 by @dependabot in #1766
- Bump Composer 0.30.0 by @KuuCi in #1772
- Bump streaming 0.12.0 by @KuuCi in #1777
New Contributors
- @jjuvonen-amd made their first contribution in #1745
- @gsganden made their first contribution in #1760
- @abaheti95 made their first contribution in #1762
Full Changelog: v0.18.0...v0.19.0
v0.18.0
What's Changed
- Torch has been bumped to `2.6.0` (in #1740). Sparse support has been disabled in the latest megablocks version (as part of the torch upgrade), and we cascaded those disables to llm-foundry as well (for more details, view the megablocks release).
- `TransformerEngine` has been removed from the `all` dependency group due to version compatibility issues (in #1742). We expect to add it back in a future release.
- Transformers has been bumped to `v4.49.0` (in #1735), which would result in the master weights being `torch.bfloat16` (view huggingface/transformers#36567 for more context). llm-foundry doesn't support master weights in lower precision, so we manually hardcoded this to `torch.float32` when loading in #1734 (see the sketch below).
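The gist of the #1734 workaround, sketched with plain transformers calls (the checkpoint name is a placeholder; the actual llm-foundry code path differs):

```python
# Sketch of the idea in #1734: force master weights to float32 at load time,
# overriding the bfloat16 dtype that transformers >= 4.49 would otherwise use.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # placeholder checkpoint
    torch_dtype=torch.float32,  # keep master weights in full precision
)
assert next(model.parameters()).dtype == torch.float32
```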
Detailed Changes
- remove deprecated param by @bigning in #1727
- Bump TE for FA 2.7.1.post1 bump by @KuuCi in #1730
- Fix dtype issue in transformers by @dakinggg in #1734
- Bump composer to 0.29.0 by @rithwik-db in #1733
- Bump Transformer v4.49.0 by @KuuCi in #1735
- Bump FA2 to 2.7.4.post1 by @KuuCi in #1728
- Comment GHCR Image Upload by @KuuCi in #1739
- Remove TE from all dependency group by @dakinggg in #1742
- Bump torch to 2.6 by @rithwik-db in #1740
- Update Makefile to use WORLD_SIZE by @irenedea in #1751
New Contributors
- @rithwik-db made their first contribution in #1733
Full Changelog: v0.17.1...v0.18.0
v0.17.1
What's New
Datasets version upgrade (#1724)
We've upgraded the Hugging Face datasets library to a version that includes a fix for a common issue where the multiprocessing pool hangs after tokenization or filtering.
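For context, this is the kind of workload that could hang before the fix; a minimal sketch using the standard datasets API (the dataset and tokenizer choices are illustrative):

```python
# Sketch: multiprocess tokenization with datasets, the workload where the
# pool could previously hang after map/filter completed.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer
ds = load_dataset("ag_news", split="train")        # illustrative dataset

tokenized = ds.map(
    lambda batch: tokenizer(batch["text"]),
    batched=True,
    num_proc=8,  # spawns the multiprocessing pool that used to hang on exit
)
```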
What's Changed
- Update accelerate requirement from <1.2,>=0.25 to >=0.25,<1.4 by @dependabot in #1714
- Bump datasets version by @dakinggg in #1724
Full Changelog: v0.17.0...v0.17.1
v0.17.0
What's Changed
- Update mcli examples to use 0.16.0 by @irenedea in #1713
- Refactor HF checkpointer by @milocress in #1690
Previously, MLflow required PEFT models to be specified as a special "flavor" distinct from Transformers models. This workaround is no longer necessary, allowing us to simplify the codepath and cleanly separate uploading HuggingFace checkpoints from registering trained models (a rough sketch of the simplified flow follows below).
- Bump version to 0.18.0.dev by @milocress in #1717
This removes the deprecated `sample_weighing_factor` argument from `mpt` loss calculations.
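A rough sketch of the simplified flow: logging a Transformers model through the single transformers flavor. The model and artifact path are illustrative, and this is not foundry's internal code:

```python
# Sketch: with recent MLflow, a (possibly PEFT-wrapped) model can go through
# the regular transformers flavor instead of a special PEFT-specific one.
import mlflow
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="checkpoint",  # illustrative artifact path
    )
```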
Full Changelog: v0.16.0...v0.17.0
v0.16.0
What's New
Streaming 0.11.0 🚀 (#1711)
We've upgraded streaming to 0.11.0. StreamingDataset can now be used with custom Stream implementations via a registry. See the documentation page for example usage.
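A hypothetical sketch of the registry pattern; the registry import and `register` signature below are assumptions, so consult the streaming 0.11.0 documentation for the real API:

```python
# Hypothetical sketch of registering a custom Stream (names are assumptions;
# see the streaming 0.11.0 docs for the actual registry interface).
from streaming import Stream

class MyFilteringStream(Stream):
    """A custom Stream that could, e.g., filter or reweight shards."""

# Assumed registration hook, modeled on foundry-style registries.
from streaming.base.stream import streams_registry  # assumption
streams_registry.register("my_filtering_stream", func=MyFilteringStream)
```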
What's Changed
- Fix llama3 example yamls by @j316chuck in #1688
- Update example yamls to use newest foundry version by @snarayan21 in #1689
- Update datasets requirement from <2.21,>=2.20.0 to >=2.20.0,<3.2 by @dependabot in #1670
- Catch multiple slashes in source dataset into one slash by @KuuCi in #1697
- Make loaded peft adapters optionally trainable by @snarayan21 in #1701
- Adding preprocessors for QA and messages datasets by @ShashankMosaicML in #1700
- Update pycln by @b-chu in #1704
- Add permission error by @b-chu in #1703
- Update datasets requirement from <3.2,>=2.20.0 to >=2.20.0,<3.3 by @dependabot in #1698
- Bump coverage[toml] from 7.6.4 to 7.6.10 by @dependabot in #1702
- Update mosaicml-streaming to 0.11.0 by @es94129 in #1711
- Bump version to 0.17.0.dev0 by @irenedea in #1712
Full Changelog: v0.15.1...v0.16.0
v0.15.1
What's Changed
- Bump version 0.16.0.dev0 by @j316chuck in #1667
- Update mlflow requirement from <2.18,>=2.14.1 to >=2.14.1,<2.19 by @dependabot in #1673
- Speed up embedding tests by @dakinggg in #1668
- Add mcli yaml version bump by @j316chuck in #1674
- Bump Openai version by @snarayan21 in #1684
- Bump Streaming to v0.10.0 by @snarayan21 in #1685
- Bugfix auto packing with streams + no remote path by @mattyding in #1679
- Bump Composer to v0.28.0 by @snarayan21 in #1687
- Expose `DistributedSampler` RNG seed argument by @janEbert in #1677
- Add llama3 ft example yamls by @j316chuck in #1686
Full Changelog: v0.15.0...v0.15.1
v0.15.0
New Features
Open Source Embedding + Contrastive Code (#1615)
LLM Foundry now supports finetuning embedding models with a contrastive loss. Negative passages for the contrastive loss can either be randomly selected or pre-defined; a generic sketch of the loss follows below. For more information, please view the readme.
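For intuition, here is a minimal in-batch-negatives (InfoNCE-style) contrastive loss in plain PyTorch. This is a generic sketch, not foundry's actual implementation:

```python
# Generic InfoNCE-style contrastive loss with in-batch negatives; a sketch
# for intuition only, not llm-foundry's actual implementation.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb: torch.Tensor,
                     passage_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """query_emb, passage_emb: (batch, dim); row i of each is a positive pair.
    Every other passage in the batch serves as a negative for query i."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                     # (batch, batch) similarities
    labels = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```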
PyTorch 2.5.1 (#1665)
This release updates LLM Foundry to the PyTorch 2.5.1 release, bringing with it support for the new features and optimizations in PyTorch 2.5.1.
Improved error messages (#1657, #1660, #1623, #1625)
Various improved error messages, making debugging user errors more clear.
What's Changed
- Update mcli examples to use 0.14.0 by @irenedea in #1624
- Open Source Embedding + Contrastive Code by @KuuCi in #1615
- Catch delta table not found error by @milocress in #1625
- Add Mlflow 403 PL UserError by @mattyding in #1623
- Catches when data prep cluster fails to start by @milocress in #1628
- Bump mlflow max version by @dakinggg in #1629
- add another cluster connection failure wrapper by @milocress in #1630
- Add MLflow `log_model` option by @nancyhung in #1544
- Move loss generating token counting to the dataloader by @dakinggg in #1632
- Bump databricks-connect from 14.1.0 to 15.4.3 by @dependabot in #1636
- Fix dataset download location by @dakinggg in #1639
- Revert "Bump databricks-connect from 14.1.0 to 15.4.3" by @XiaohanZhangCMU in #1640
- Bump transformers version by @dakinggg in #1631
- Fix gpu tests test_tp_train and test_huggingface_conversion_callback_interval by @irenedea in #1642
- Update datasets requirement from <2.20,>=2.19 to >=2.20.0,<2.21 by @dependabot in #1330
- Add max shard size to transformers save_pretrained by @b-chu in #1648
- Update huggingface-hub requirement from <0.25,>=0.19.0 to >=0.19.0,<0.27 by @dependabot in #1652
- Update accelerate requirement from <0.34,>=0.25 to >=0.25,<1.2 by @dependabot in #1633
- Catch Delta Table Not Found by @KuuCi in #1653
- Add Exception for missing UC column by @milocress in #1654
- Infer step size for Embeddings by @KuuCi in #1647
- Pin FAv2 by @mvpatel2000 in #1656
- Retry catching BlockingIOError by @KuuCi in #1657
- Catch bad data prep by @milocress in #1644
- Update pytest-cov requirement from <6,>=4 to >=4,<7 by @dependabot in #1663
- Bump coverage[toml] from 7.6.1 to 7.6.4 by @dependabot in #1650
- Move transform_model_pre_registration in hf_checkpointer by @irenedea in #1664
- Catch Cluster Permissions Error by @KuuCi in #1660
- Mosaicml version bump by @j316chuck in #1661
- Changes for removing unused terms in CE loss fn by @gupta-abhay in #1643
- Update setuptools requirement from <68.0.0 to <76.0.0 by @dependabot in #1662
- Bump docker version to torch 2.5.1 by @j316chuck in #1665
- Bump ubuntu 22.04 + torch 2.5.1 by @KuuCi in #1666
New Contributors
- @mattyding made their first contribution in #1623
Full Changelog: v0.14.5...v0.15.0