Releases: flashinfer-ai/flashinfer

v0.2.7.post1

01 Jul 18:14
3fb73b3

What's Changed

New Contributors

Full Changelog: v0.2.7...v0.2.7.post1

v0.2.7

30 Jun 19:39
4d3fb6d

What's Changed

New Contributors

Full Changelog: v0.2.6.post1...v0.2.7

v0.2.6.post1

07 Jun 03:24
bc50f1a

What's Changed

  • [CI] Add x86_64 tag for x86 self-hosted runner by @yongwww in #1126
  • hotfix: fix installation script behavior by @yzh119 in #1125

Full Changelog: v0.2.6...v0.2.6.post1

v0.2.6

06 Jun 19:13
608a343

What's Changed

New Contributors

Full Changelog: v0.2.5...v0.2.6

v0.2.5

04 Apr 00:41
592b110

What's Changed

  • Fix compilation with FP16_QK_REDUCTION enabled. by @diptorupd in #962
  • misc: Use environment variable to control JIT verbose flag by @yzh119 in #981
  • Triton rms_norm kernels by @nandor in #983
  • Allow passing workspace base directory via environment variable by @jsuchome in #973
  • [CHORE] Rename output_emitted_token_num -> output_emitted_draft_token_num by @jon-chuang in #977
  • ci: switch to on-demand instances if spot instance is interrupted by @yzh119 in #987
  • misc: update devcontainer by @yzh119 in #986
  • ci: add torch 2.6+cu126 wheel by @yzh119 in #985
  • misc: fix devcontainer conda path by @yzh119 in #989
  • perf: prefetch page indices for mla kernel by @yzh119 in #991
  • SM-constraint-GEMM by triton persistent kernel by @yyihuang in #982
  • 3rdparty: upgrade cutlass to 3.9 by @yzh119 in #997
  • perf: add -DNDEBUG compilation flag by @yzh119 in #998
  • release: bump version to v0.2.5 by @yzh119 in #999

New Contributors

Full Changelog: v0.2.4...v0.2.5

v0.2.4

29 Mar 05:09
bc81a59

What's Changed

New Contributors

Full Changelog: v0.2.3...v0.2.4

v0.2.3

11 Mar 02:22
fdedc43

Breaking Changes

We changed the interface of the sampling APIs (see #912):

  • The sampling APIs no longer return the success value, which is incompatible with the earlier design.
  • Instead of a uniform tensor, the sampling APIs now accept an optional torch.Generator (https://pytorch.org/docs/stable/generated/torch.Generator.html), to align with the behavior of torch.
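
The shape of this interface change can be illustrated with a minimal pure-Python sketch. This is an analogy only, not flashinfer's implementation or signature: the function name top_p_sample and its parameters are hypothetical, and the standard library's random.Random stands in for torch.Generator. The sketch shows the two points above: the caller passes an optional generator rather than pre-drawn uniform samples, and the function returns only the sample, with no success flag.

```python
import random
from typing import Optional, Sequence


def top_p_sample(probs: Sequence[float], top_p: float,
                 generator: Optional[random.Random] = None) -> int:
    """Hypothetical top-p sampler over a single probability vector.

    `generator` plays the role that torch.Generator plays in the new
    interface; the return value is the sampled index only.
    """
    rng = generator if generator is not None else random.Random()
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # The random draw comes from the generator, not from a
    # caller-supplied tensor of uniform samples.
    u = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if u < acc:
            return i
    return kept[-1]  # guard against floating-point round-off


probs = [0.1, 0.6, 0.3]
# Seeding the generator makes the draw reproducible across calls.
first = top_p_sample(probs, 0.8, random.Random(42))
second = top_p_sample(probs, 0.8, random.Random(42))
print(first == second)
```

Callers that previously unpacked a (samples, success) pair would, under the new interface, bind a single return value; consult the flashinfer documentation for the exact signatures.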

What's Changed

  • release: bump version v0.2.2.post1 by @yzh119 in #902
  • Naive Support for Hopper FP8 Prefill Kernel with Per-Head Quantization by @happierpig in #869
  • bugfix: Fix no return type error by @yzh119 in #904
  • ci: add dockerfile for CI by @yzh119 in #909
  • ci: bugfix on release-ci-docker github action by @yzh119 in #910
  • feat: flashinfer intra-kernel profiler by @yzh119 in #913
  • [Package] Add tvm binding to flashinfer.data when packaging by @MasterJH5574 in #917
  • refactor: move triton dependency to flashinfer.triton by @yzh119 in #918
  • sampling: dual pivot rejection sampling algorithm to improve top-p/top-k sampling efficiency by @yzh119 in #912
  • feat: support non-contiguous input/output in normalization functions by @yzh119 in #921
  • feat: improve sampling algorithm robustness by @yzh119 in #923
  • perf: use max probability instead of 1 as upper bound in top-p/k sampling by @yzh119 in #925
  • fix: add install step of profiler's dependency by @zobinHuang in #929
  • fix: undefined symbol cudaGetDriverEntryPointByVersion with CUDA >= 12.5 by @zobinHuang in #928
  • feat: experimental support of PDL by @yzh119 in #930
  • release: bump version to v0.2.3 by @yzh119 in #932

New Contributors

Full Changelog: v0.2.2.post1...v0.2.3

v0.2.2.post1

27 Feb 06:00

What's Changed

  • bump version to v0.2.2 by @yzh119 in #891
  • perf: fix the performance of second stage of split-k by @yzh119 in #894
  • fix: pin_memory use cpu as default device by @KnowingNothing in #895
  • perf: tweak register amount for producer/consumer in MLA template by @yzh119 in #896
  • perf: fix MLA split-k performance bug by @yzh119 in #898
  • perf: use f16 as split-k partial output data type by @yzh119 in #900
  • perf: tweak the pipeline design of mla kernel by @yzh119 in #901

Full Changelog: v0.2.2...v0.2.2.post1

v0.2.2

23 Feb 22:28

What's Changed

New Contributors

Full Changelog: v0.2.1.post2...v0.2.2

v0.2.1.post2

17 Feb 18:05
8127793

What's Changed

New Contributors

Full Changelog: v0.2.1.post1...v0.2.1.post2