This simulator estimates the potential performance of DeepSeek V3/R1 on NVIDIA Hopper-architecture GPUs. To use this tool on other hardware platforms, the corresponding GEMM/MLA kernels must be implemented for that platform.
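To make that porting requirement concrete, here is a minimal sketch of what such a kernel backend might look like. The class and method names (`HardwareBackend`, `gemm_latency_us`, `mla_latency_us`) are hypothetical and are not part of this repository; the assumption is that measured GEMM and MLA kernel latencies feed the performance model.

```python
from abc import ABC, abstractmethod

# Hypothetical backend interface; names are illustrative only. The idea is that
# a non-Hopper platform supplies its own GEMM and MLA kernel benchmarks, whose
# measured latencies (in microseconds) feed the performance estimates.
class HardwareBackend(ABC):
    @abstractmethod
    def gemm_latency_us(self, m: int, n: int, k: int) -> float:
        """Measured latency of an (m, n, k) GEMM on this platform."""

    @abstractmethod
    def mla_latency_us(self, batch_size: int, kv_len: int) -> float:
        """Measured latency of an MLA decode attention kernel on this platform."""
```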
More information about this tool is available in my Chinese blog posts:
- DeepSeek V3/R1 Inference Efficiency Analysis (1): Some Back-of-the-Envelope Estimates of the DeepSeek V3/R1 Decoding Throughput Ceiling (DeepSeek V3/R1 推理效率分析(1):关于DeepSeek V3/R1 Decoding吞吐极限的一些不负责任估计)
- DeepSeek V3/R1 Inference Efficiency Analysis (2): A Reverse-Engineering Analysis of DeepSeek's Full-Scale Deployment (DeepSeek V3/R1 推理效率分析(2):DeepSeek 满血版逆向工程分析)
- DeepSeek V3/R1 Inference Efficiency Analysis (3): A Discussion of Generalized Decode Configurations (DeepSeek V3/R1 推理效率分析(3):Decode 配置泛化讨论)
Following DeepGEMM, this tool has the same requirements (a minimal environment-check sketch follows this list):
- Hopper-architecture GPUs (sm_90a must be supported)
- Python 3.8 or above
- CUDA 12.3 or above
- PyTorch 2.1 or above
- CUTLASS 3.6 or above (can be cloned as a Git submodule)
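The snippet below is a minimal environment check against the requirements above; it only uses standard PyTorch calls, and the version thresholds are the ones listed in this section.

```python
import sys
import torch

# Check the interpreter and PyTorch versions listed above.
assert sys.version_info >= (3, 8), "Python 3.8 or above is required"
torch_ver = tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert torch_ver >= (2, 1), "PyTorch 2.1 or above is required"

# CUDA toolkit version that PyTorch was built against (should be 12.3 or above).
print("CUDA version:", torch.version.cuda)

# Hopper GPUs report compute capability 9.0 (sm_90a kernels target this).
major, minor = torch.cuda.get_device_capability(0)
assert (major, minor) >= (9, 0), "A Hopper (sm_90a) GPU is required"
print("GPU:", torch.cuda.get_device_name(0))
```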
# PyTorch installation omitted; install a build matching your CUDA version
# Install FlashMLA
git clone --recursive https://github.com/deepseek-ai/FlashMLA.git
cd FlashMLA && python setup.py install && cd ..
# Install DeepGEMM
git clone --recursive https://github.com/deepseek-ai/DeepGEMM.git
cd DeepGEMM && python setup.py install && cd ..
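After installation, a quick import check can confirm that both extensions built correctly. The module names `flash_mla` and `deep_gemm` are assumptions based on the package names and may differ between releases.

```python
# Sanity check after installation; the module names flash_mla and deep_gemm
# are assumed and may differ across releases of the two packages.
import importlib

for name in ("torch", "flash_mla", "deep_gemm"):
    module = importlib.import_module(name)
    print(f"{name}: {getattr(module, '__version__', 'imported OK')}")
```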
- H800 80G (tested)
- H20 96G (tested)
- Other Hopper-architecture GPUs are expected to work
- Attention DP, MoE EP
- Attention TP+DP, MoE EP
- Two-microbatch overlapping (DeepSeek official)
- Single-batch compute-communication overlapping (a hypothetical configuration sketch of these modes follows this list)
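The following is a minimal sketch of how these parallelism and overlapping modes could be expressed as a simulator configuration; the class and field names are hypothetical and do not correspond to this repository's actual API.

```python
from dataclasses import dataclass

# Hypothetical configuration object; field names are illustrative only and
# do not correspond to this repository's actual API.
@dataclass
class DecodeConfig:
    attention_tp: int = 1        # tensor parallel degree for attention (1 = pure DP)
    attention_dp: int = 8        # data parallel degree for attention
    moe_ep: int = 8              # expert parallel degree for MoE layers
    num_microbatches: int = 2    # 2 = two-microbatch overlapping (DeepSeek official)
    overlap_comm: bool = True    # single-batch compute-communication overlapping

# Example: Attention TP+DP with MoE EP and two-microbatch overlapping.
cfg = DecodeConfig(attention_tp=2, attention_dp=4, moe_ep=8, num_microbatches=2)
print(cfg)
```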
This code repository is released under the MIT License.
@misc{deepseek_simulator,
  title={DeepSeek-Simulator: A Test-Based Performance Simulator for DeepSeek V3/R1},
  author={Han Shen},
  year={2025},
  publisher={GitHub},
  howpublished={\url{https://github.com/shenh10/DeepSeek_Simulator.git}},
}