sampling: Port of Smooth Sampling / Quadratic Sampling support #13441
@ggerganov At this point, everything related to Smooth Sampling should be functionally complete and ready for review! Someone should probably rebuild the npm package for the webui, though, since my dev machine runs Debian stable and its npm version is old.
@kalomaze Are you available to do a review of this PR?
I haven't changed any of the actual sampling logic from the original PR, so maybe we can review the correctness of the interfaces first?
* Update multimodal.md
  Minor change to include the huggingface link
* Update docs/multimodal.md
Co-authored-by: Xuan-Son Nguyen <[email protected]>

* server: Allow pasting file from clipboard
* server: Prevent default action on file paste
* update build
* format then build combined
Co-authored-by: Xuan Son Nguyen <[email protected]>

* webui : use pako for more deterministic gzip compress
* simpler code
* use fflate instead of pako
This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more difficult for various reasons so I haven't done it. Performance for this shader is around 2.5x better than for the scalar shader when doing prompt processing. Some of the benefit may be from other optimizations like staging through shared memory, or splitting by rows.
…nite (ggml-org#13538) This matches how others do it, but will still avoid the extra initialization when rope is disabled. Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>
Messed up when resolving conflict...
This is a port of @kalomaze's PR #6445 to the refactored sampler structure, as that PR has been inactive since May 2024.
Differences from original PR:
Everything should work properly by now, but any help or suggestions are welcome!