This release:
- ➕ Adds support for ibm-fms 1.1.0
- ➕ Adds support for the latest compiler updates in the newest base image
- ❗ Removes v0 support for text generation
- ⚗️ Adds (very experimental) support for continuous batching mode on spyre hardware
This release is not compatible with vllm==0.9.1, read more details here
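To guard against the incompatible version, a deployment script can check the installed vllm before starting. The following is a minimal sketch, not part of the plugin: the helper names `is_supported_vllm` and `check_installed_vllm` are hypothetical, and it assumes only the exact release 0.9.1 is excluded, as stated above. The plugin's real dependency constraints may be narrower.

```python
from importlib import metadata

# Assumption: only the exact release 0.9.1 is excluded, per the note above.
INCOMPATIBLE_VLLM = {"0.9.1"}


def is_supported_vllm(version: str) -> bool:
    """Return False for vllm versions called out as incompatible."""
    # Drop a local build suffix such as "+cpu" before comparing.
    public = version.split("+", 1)[0]
    return public not in INCOMPATIBLE_VLLM


def check_installed_vllm() -> str:
    """Describe whether the installed vllm (if any) is the excluded version."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed"
    if is_supported_vllm(installed):
        return f"vllm {installed} is not the excluded release"
    return f"vllm {installed} is excluded by this release"


if __name__ == "__main__":
    print(check_installed_vllm())
```

The check only reports the excluded release; it does not verify that other versions are supported.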
What's Changed
- [CI] Don't skip tests when uv.lock is updated by @ckadner in #221
- [CI] Use uv for type-check by @ckadner in #222
- ✨ add top-level spyre version by @prashantgupta24 in #224
- [CB] parametrize example script by @sducouedic in #228
- Clean up examples and PR template by @rafvasq in #227
- 🔥🔧 Remove environment variables specific to hardware conf by @gkumbhat in #229
- [CI] Only build docker image on source changes by @ckadner in #220
- [CB] remove VLLM_SPYRE_RM_PADDED_BLOCKS, enable the feature by default by @yannicks1 in #231
- [do not merge][CB] get number of blocks from compiler mock implementation by @yannicks1 in #205
- Exclude vllm v0.9.1 as an allowed version due to breaking bug by @tjohnson31415 in #232
- 🐛 add initialize_cache for v1 worker by @prashantgupta24 in #237
- [tests][CB][SB] minor refactoring of test by @yannicks1 in #239
- 📝 update deployment examples, add kserve by @joerunde in #226
- [Test] CB rejects requests longer than max length by @rafvasq in #236
- [FIX] lazy import of SpyreCausalLM to avoid issues with pytest-forked by @wallashss in #238
- [docs] add debugging docs by @prashantgupta24 in #235
- Support both paged and non-paged attention by @yannicks1 in #162
- [refact] Remove V0 tests by @wallashss in #241
- 🥅 disable v0 decoders by @joerunde in #242
- 🐛 fix runtime msg by @prashantgupta24 in #244
- 🐛 fixed static batch warmup by @joerunde in #246
- ⬆️ upgrade base image for release by @joerunde in #250
- Remove unused DT_OPT by @joerunde in #251
Full Changelog: v0.3.1...v0.4.0