feat: support ep size < 32 for sgl kernel #4348

Merged
merged 4 commits into from
Mar 13, 2025

Conversation

shuaills
Collaborator

@shuaills shuaills commented Mar 12, 2025

Motivation

Support EP size < 32 for the sgl kernel.
Related PR: #4249

Modifications

Checklist

@shuaills shuaills changed the title support ep size < 32 for sgl kernel feat: support ep size < 32 for sgl kernel Mar 12, 2025
@shuaills
Collaborator Author

shuaills commented Mar 12, 2025

Benchmark results from running: python sgl-kernel/benchmark/bench_moe_align_block_size.py --num_experts 8 --topk 2
We cross-validated the numerical outputs of the SGL, Triton, and VLLM implementations to confirm they are equivalent.

moe-align-block-size-performance:
     num_tokens  num_experts  topk        SGL       Triton        VLLM
0          16.0          8.0   2.0  16.384000    26.624000   15.360000
1          16.0          8.0   4.0  16.384000    26.624000   16.384000
2          16.0          8.0   8.0  16.384000    28.672000   16.384000
3          16.0         32.0   2.0  18.432001    27.648000   18.432001
4          16.0         32.0   4.0  18.432001    28.672000   19.455999
5          16.0         32.0   8.0  18.432001    28.672000   19.455999
6          16.0         64.0   2.0  19.455999    30.719999   23.552001
7          16.0         64.0   4.0  19.455999    30.719999   23.552001
8          16.0         64.0   8.0  20.479999    31.744000   23.552001
9          16.0        128.0   2.0  22.528000    36.864001   38.911998
10         16.0        128.0   4.0  22.528000    36.864001   38.911998
11         16.0        128.0   8.0  22.528000    36.864001   38.911998
12         16.0        256.0   2.0  27.648000    53.247999         inf
13         16.0        256.0   4.0  27.648000    53.247999         inf
14         16.0        256.0   8.0  27.648000    53.247999         inf
15         32.0          8.0   2.0  16.384000    27.648000   16.384000
16         32.0          8.0   4.0  16.384000    28.672000   16.384000
17         32.0          8.0   8.0  16.384000    32.768000   17.408000
18         32.0         32.0   2.0  18.432001    28.672000   19.455999
19         32.0         32.0   4.0  18.432001    29.696001   19.455999
20         32.0         32.0   8.0  19.455999    29.696001   20.479999
21         32.0         64.0   2.0  19.455999    30.719999   23.552001
22         32.0         64.0   4.0  19.455999    31.744000   23.552001
23         32.0         64.0   8.0  20.479999    32.768000   24.576001
24         32.0        128.0   2.0  22.528000    36.864001   38.911998
25         32.0        128.0   4.0  22.528000    36.864001   38.911998
26         32.0        128.0   8.0  22.528000    36.864001   39.935999
27         32.0        256.0   2.0  27.648000    53.247999         inf
28         32.0        256.0   4.0  27.648000    53.247999         inf
29         32.0        256.0   8.0  27.648000    53.247999         inf
30         64.0          8.0   2.0  16.384000    28.672000   16.384000
31         64.0          8.0   4.0  16.384000    32.768000   17.408000
32         64.0          8.0   8.0  17.408000    39.935999   19.455999
33         64.0         32.0   2.0  18.432001    29.696001   19.455999
34         64.0         32.0   4.0  19.455999    29.696001   20.479999
35         64.0         32.0   8.0  19.455999    32.768000   22.528000
36         64.0         64.0   2.0  19.455999    31.744000   23.552001
37         64.0         64.0   4.0  20.479999    32.768000   24.576001
38         64.0         64.0   8.0  20.479999    33.792000   25.599999
39         64.0        128.0   2.0  22.528000    37.888002   38.911998
40         64.0        128.0   4.0  22.528000    37.888002   39.935999
41         64.0        128.0   8.0  22.528000    38.911998   39.935999
42         64.0        256.0   2.0  27.648000    53.247999         inf
43         64.0        256.0   4.0  27.648000    53.247999         inf
44         64.0        256.0   8.0  27.648000    55.296000         inf
45        128.0          8.0   2.0  16.384000    32.768000   17.408000
46        128.0          8.0   4.0  17.408000    39.935999   19.455999
47        128.0          8.0   8.0  17.408000    54.272000   24.576001
48        128.0         32.0   2.0  19.455999    29.696001   20.479999
49        128.0         32.0   4.0  19.455999    32.768000   22.528000
50        128.0         32.0   8.0  19.455999    36.864001   27.648000
51        128.0         64.0   2.0  20.479999    32.768000   24.576001
52        128.0         64.0   4.0  20.479999    34.816001   25.599999
53        128.0         64.0   8.0  21.504000    36.864001   27.648000
54        128.0        128.0   2.0  22.528000    36.864001   39.935999
55        128.0        128.0   4.0  22.528000    38.911998   39.935999
56        128.0        128.0   8.0  23.552001    43.008000   41.983999
57        128.0        256.0   2.0  27.648000    53.247999         inf
58        128.0        256.0   4.0  27.648000    55.296000         inf
59        128.0        256.0   8.0  27.648000    56.320000         inf
60        256.0          8.0   2.0  17.408000    39.935999   19.455999
61        256.0          8.0   4.0  17.408000    54.272000   24.576001
62        256.0          8.0   8.0  17.408000    82.943998   33.792000
63        256.0         32.0   2.0  19.455999    32.768000   22.528000
64        256.0         32.0   4.0  19.455999    36.864001   27.648000
65        256.0         32.0   8.0  19.455999    44.032000   36.864001
66        256.0         64.0   2.0  20.479999    34.816001   25.599999
67        256.0         64.0   4.0  21.504000    36.864001   27.648000
68        256.0         64.0   8.0  21.504000    40.959999   32.768000
69        256.0        128.0   2.0  22.528000    38.911998   39.935999
70        256.0        128.0   4.0  23.552001    43.008000   40.959999
71        256.0        128.0   8.0  23.552001    46.080001   45.056000
72        256.0        256.0   2.0  27.648000    55.296000         inf
73        256.0        256.0   4.0  27.648000    56.320000         inf
74        256.0        256.0   8.0  28.672000    59.392001         inf
75        512.0          8.0   2.0  17.408000    54.272000   24.576001
76        512.0          8.0   4.0  17.408000    82.943998   33.792000
77        512.0          8.0   8.0  20.479999   140.287995   54.272000
78        512.0         32.0   2.0  19.455999    36.864001   27.648000
79        512.0         32.0   4.0  19.455999    43.008000   36.864001
80        512.0         32.0   8.0  20.479999    57.344001   62.463999
81        512.0         64.0   2.0  21.504000    36.864001   27.648000
82        512.0         64.0   4.0  21.504000    40.959999   32.768000
83        512.0         64.0   8.0  22.528000    48.128001   49.152002
84        512.0        128.0   2.0  23.552001    43.008000   41.983999
85        512.0        128.0   4.0  23.552001    46.080001   45.056000
86        512.0        128.0   8.0  24.576001    50.175998   53.247999
87        512.0        256.0   2.0  27.648000    55.296000         inf
88        512.0        256.0   4.0  28.672000    60.416002         inf
89        512.0        256.0   8.0  29.696001    63.487999         inf
90       1024.0          8.0   2.0  17.408000    82.943998   33.792000
91       1024.0          8.0   4.0  20.479999   140.287995   54.272000
92       1024.0          8.0   8.0  25.599999   254.976004   93.184002
93       1024.0         32.0   2.0  19.455999    43.008000   36.864001
94       1024.0         32.0   4.0  20.479999    57.344001   62.463999
95       1024.0         32.0   8.0  23.552001    87.040000  115.712002
96       1024.0         64.0   2.0  21.504000    40.959999   33.792000
97       1024.0         64.0   4.0  22.528000    48.128001   49.152002
98       1024.0         64.0   8.0  25.599999    62.463999   77.823997
99       1024.0        128.0   2.0  23.552001    46.080001   45.056000
100      1024.0        128.0   4.0  24.576001    50.175998   54.272000
101      1024.0        128.0   8.0  27.648000    57.344001   68.608001
102      1024.0        256.0   2.0  28.672000    60.416002         inf
103      1024.0        256.0   4.0  29.696001    63.487999         inf
104      1024.0        256.0   8.0  32.768000    69.632001         inf
105      2048.0          8.0   2.0  20.479999   140.287995   54.272000
106      2048.0          8.0   4.0  25.599999   254.976004   93.184002
107      2048.0          8.0   8.0  34.816001   483.328015  178.176001
108      2048.0         32.0   2.0  20.479999    57.344001   62.463999
109      2048.0         32.0   4.0  23.552001    87.040000  116.736002
110      2048.0         32.0   8.0  30.719999   144.383997  221.184000
111      2048.0         64.0   2.0  22.528000    48.128001   49.152002
112      2048.0         64.0   4.0  25.599999    62.463999   76.800004
113      2048.0         64.0   8.0  33.792000    91.136001  133.120000
114      2048.0        128.0   2.0  24.576001    50.175998   54.272000
115      2048.0        128.0   4.0  27.648000    57.344001   67.584001
116      2048.0        128.0   8.0  34.816001    73.728003   97.280003
117      2048.0        256.0   2.0  29.696001    63.487999         inf
118      2048.0        256.0   4.0  32.768000    69.632001         inf
119      2048.0        256.0   8.0  39.935999    83.967999         inf
120      4096.0          8.0   2.0  25.599999   254.976004   93.184002
121      4096.0          8.0   4.0  34.816001   483.328015  178.176001
122      4096.0          8.0   8.0  51.199999   924.672008  340.992004
123      4096.0         32.0   2.0  23.552001    86.015999  115.712002
124      4096.0         32.0   4.0  30.719999   144.383997  222.207993
125      4096.0         32.0   8.0  54.272000   260.096014  428.032011
126      4096.0         64.0   2.0  25.599999    63.487999   77.823997
127      4096.0         64.0   4.0  33.792000    91.136001  132.095993
128      4096.0         64.0   8.0  59.392001   148.479998  244.736001
129      4096.0        128.0   2.0  27.648000    57.344001   68.608001
130      4096.0        128.0   4.0  34.816001    73.728003   97.280003
131      4096.0        128.0   8.0  57.344001   104.447998  157.695994
132      4096.0        256.0   2.0  32.768000    69.632001         inf
133      4096.0        256.0   4.0  39.935999    83.967999         inf
134      4096.0        256.0   8.0  59.392001    98.304003         inf
135      8192.0          8.0   2.0  34.816001   483.328015  179.199994
136      8192.0          8.0   4.0  51.199999   924.672008  340.992004
137      8192.0          8.0   8.0  84.991999  1827.839971  666.624010
138      8192.0         32.0   2.0  30.719999   144.383997  221.184000
139      8192.0         32.0   4.0  54.272000   260.096014  428.032011
140      8192.0         32.0   8.0  90.112001   490.496010  831.488013
141      8192.0         64.0   2.0  33.792000    91.136001  133.120000
142      8192.0         64.0   4.0  58.368001   148.479998  243.711993
143      8192.0         64.0   8.0  97.280003   264.191985  454.656005
144      8192.0        128.0   2.0  34.816001    73.728003   97.280003
145      8192.0        128.0   4.0  57.344001   104.447998  158.720002
146      8192.0        128.0   8.0  92.160001   163.839996  277.503997
147      8192.0        256.0   2.0  39.935999    83.967999         inf
148      8192.0        256.0   4.0  59.392001    97.280003         inf
149      8192.0        256.0   8.0  91.136001   130.048007         inf
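
To make the table easier to read, here is a minimal sketch that turns a few rows into speedup ratios relative to SGL. The rows are copied from the table above; the helper names (`speedup`, `summarize`) are hypothetical and are not part of the benchmark script. An `inf` entry is treated as "no finite measurement", which is an assumption about how the benchmark reports that case.

```python
import math

# (num_tokens, num_experts, topk, SGL, Triton, VLLM), copied from the table above.
ROWS = [
    (16, 8, 2, 16.384000, 26.624000, 15.360000),
    (4096, 8, 8, 51.199999, 924.672008, 340.992004),
    (8192, 8, 8, 84.991999, 1827.839971, 666.624010),
    (8192, 32, 8, 90.112001, 490.496010, 831.488013),
    (8192, 256, 8, 91.136001, 130.048007, math.inf),
]


def speedup(baseline, sgl):
    """Return baseline / sgl, or None when the baseline has no finite value."""
    if math.isinf(baseline):
        return None
    return baseline / sgl


def summarize(rows):
    """Build one summary dict per row with SGL's speedup over each baseline."""
    summary = []
    for num_tokens, num_experts, topk, sgl, triton, vllm in rows:
        summary.append(
            {
                "num_tokens": num_tokens,
                "num_experts": num_experts,
                "topk": topk,
                "vs_triton": speedup(triton, sgl),
                "vs_vllm": speedup(vllm, sgl),
            }
        )
    return summary


if __name__ == "__main__":
    for entry in summarize(ROWS):
        print(entry)
```

A ratio above 1 means SGL was faster on that row; a ratio below 1 (as in the first row against VLLM) means the baseline was faster.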

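For readers unfamiliar with the operation being benchmarked, the following is a pure-Python sketch of what a "MoE align block size" step is commonly understood to compute: token slots are grouped by expert and each group is padded to a multiple of the block size. The function name `align_block_size_reference`, its return layout, and the choice of padding sentinel are assumptions for illustration; this is not the sgl-kernel, Triton, or VLLM code, and the real kernels may differ in details.

```python
def align_block_size_reference(topk_ids, block_size, num_experts):
    """Reference sketch (assumed semantics, not the actual kernel).

    topk_ids: list of per-token lists of expert ids.
    Returns (sorted_ids, expert_ids, num_tokens_post_pad).
    """
    # Flatten so each (token, k) slot gets one index.
    flat = [expert for row in topk_ids for expert in row]
    pad_id = len(flat)  # assumed sentinel marking padding slots

    # Bucket slot indices by the expert they were routed to.
    buckets = [[] for _ in range(num_experts)]
    for idx, expert in enumerate(flat):
        buckets[expert].append(idx)

    sorted_ids, expert_ids = [], []
    for expert, bucket in enumerate(buckets):
        num_blocks = -(-len(bucket) // block_size)  # ceiling division
        padding = num_blocks * block_size - len(bucket)
        sorted_ids.extend(bucket + [pad_id] * padding)
        expert_ids.extend([expert] * num_blocks)  # one entry per block

    return sorted_ids, expert_ids, len(sorted_ids)


if __name__ == "__main__":
    ids, experts, total = align_block_size_reference(
        [[0, 1], [1, 2], [1, 0]], block_size=2, num_experts=4
    )
    print(ids, experts, total)
```

A reference like this is one way to cross-check implementations: run each on the same `topk_ids` and compare the per-expert groups and the padded total, keeping in mind that the order of slots within an expert's group may legitimately differ between implementations.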
@shuaills
Collaborator Author

@sleepcoo @zhyncs

@sleepcoo sleepcoo self-requested a review March 12, 2025 15:37
Collaborator

@BBuf BBuf left a comment

Good job. LGTM! cc @zhyncs

@zhyncs zhyncs merged commit 817d437 into sgl-project:main Mar 13, 2025
3 of 5 checks passed
hebiao064 pushed a commit to hebiao064/sglang that referenced this pull request Mar 13, 2025
3 participants