
sampling: Port of Smooth Sampling / Quadratic Sampling support #13441


Closed
wants to merge 24 commits into from


@Silver267 Silver267 commented May 10, 2025

This is a port of @kalomaze's PR #6445 to the refactored sampler structure, since that PR has been inactive since May 2024.

Differences from original PR:

  • Added tests and docs!
  • Webui support.
  • Decoupled smooth sampling from temperature, so that it is a sampler of its own.

Everything should work properly by now, but any help or suggestions are welcome!
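For readers unfamiliar with the technique, here is a minimal sketch of what a standalone smooth (quadratic) sampling step might look like, independent of temperature. It assumes the commonly described quadratic form, where each logit is replaced by the maximum logit minus a scaled squared distance from it. The function names, the `smoothing_factor` parameter name, and the "non-positive factor means disabled" rule are illustrative assumptions, not code taken from this PR.

```python
import math


def smooth_logits(logits, smoothing_factor):
    """Apply an assumed quadratic transform centered on the largest logit.

    Each logit x becomes: max_logit - smoothing_factor * (x - max_logit)**2.
    This is a sketch of the general idea, not this PR's implementation.
    """
    if smoothing_factor <= 0.0:
        # Assumption: a non-positive factor leaves the logits untouched.
        return list(logits)
    max_logit = max(logits)
    return [max_logit - smoothing_factor * (x - max_logit) ** 2 for x in logits]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.5, -1.0]
probs_before = softmax(logits)
probs_after = softmax(smooth_logits(logits, 1.0))
```

Under this assumed form, tokens close to the top logit gain probability relative to the plain softmax while distant tokens are pushed down sharply, and no temperature value is involved in the step.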

@Silver267 Silver267 requested a review from ngxson as a code owner May 10, 2025 20:26
@github-actions github-actions bot added the testing, examples, and server labels May 10, 2025
@ngxson ngxson requested review from ggerganov and removed request for ngxson May 10, 2025 20:27
@Silver267
Author

@ggerganov At this point, everything related to Smooth Sampling should be functionally complete and ready for review! Someone should probably rebuild the npm package for the webui though, since my dev machine runs Debian stable and its npm version is old...

@ggerganov
Member

@kalomaze Are you available to do a review of this PR?

@Silver267
Author

I haven't changed any of the actual sampling logic from the original PR, so maybe we can review the correctness of the interfaces first?

jeffbolznv and others added 12 commits May 14, 2025 14:31

* Update multimodal.md

Minor change to include the huggingface link

* Update docs/multimodal.md

---------

Co-authored-by: Xuan-Son Nguyen

* server: Allow pasting file from clipboard

* server: Prevent default action on file paste

* update build

* format then build combined

---------

Co-authored-by: Xuan Son Nguyen

* webui : use pako for more deterministic gzip compress

* simpler code

* use fflate instead of pako

This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may be from other optimizations like staging
through shared memory, or splitting by rows.

* server : passthrough the /models endpoint during loading

* server : update readme + return json for "meta" field

…nite (ggml-org#13538)

This matches how others do it, but will still avoid the extra
initialization when rope is disabled.

Branch: GraniteFour

Signed-off-by: Gabe Goodhart
@Silver267
Author

Messed up when resolving conflict...

@Silver267 Silver267 closed this May 14, 2025
@github-actions github-actions bot added the documentation, script, Nvidia GPU, Vulkan, python, and ggml labels May 14, 2025