Pull requests: HabanaAI/vllm-hpu-extension

[UX] Warning for exponential bucketing (#262, opened Jul 8, 2025 by adobrzyn)
Remove self_attn & lm_head from Mixtral quant config (#261, opened Jul 7, 2025 by tpawlows)
Update of exponential bs warmup mechanism (#258, opened Jul 4, 2025 by iboiko-habana)
Fix ModuleFusedSDPA graph break (#257, opened Jul 4, 2025 by bkowalskiINTEL)
[SW-233526] Fix runtime dequant for block fp8 (#251, opened Jul 1, 2025 by xuechendi)
Automatization of long context (#248, opened Jun 30, 2025 by iboiko-habana)
Add pre-commit static checks (#247, opened Jun 30, 2025 by kzawora-intel)
Update dependabot.yml (#242, opened Jun 26, 2025 by michalkuligowski)
Update linear.py (#239, opened Jun 25, 2025 by michalkuligowski)
Integrating block_softmax (#238, opened Jun 24, 2025 by ksmusz, draft)
Remove double generate (#229, opened Jun 18, 2025 by adobrzyn)
Exponential bucketing tweaks (#224, opened Jun 13, 2025 by madamczyk-intel)
Find bucket with bmin not divs by step (#212, opened Jun 5, 2025 by adobrzyn)
Add useful internal vllm test (#200, opened May 27, 2025 by nirda7, draft)
fix the issue that bmax not in bucket buffer (#191, opened May 22, 2025 by sywangyi)
Optimized MoE on Gaudi (#159, opened Apr 18, 2025 by gyou2021, draft)