DeepSeek Simulator

This simulator estimates the performance that DeepSeek V3/R1 could reach on various NVIDIA Hopper GPUs. To use the tool on other hardware platforms, those platforms must provide their own GEMM and MLA kernel implementations.

More information about this tool is available in my blog post (in Chinese):

Installation

Requirements

The requirements follow those of DeepGEMM:

  • Hopper architecture GPUs, sm_90a must be supported
  • Python 3.8 or above
  • CUDA 12.3 or above
  • PyTorch 2.1 or above
  • CUTLASS 3.6 or above (can be cloned as a Git submodule)

# PyTorch installation is omitted here

# Install FlashMLA
git clone --recursive https://github.com/deepseek-ai/FlashMLA.git
(cd FlashMLA && python setup.py install)

# Install DeepGEMM
git clone --recursive https://github.com/deepseek-ai/DeepGEMM.git
(cd DeepGEMM && python setup.py install)
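
Before running the simulator, it can help to confirm that the environment meets the requirements listed above. The following is a minimal sketch rather than part of this repository: the helper names `parse_version`, `check_requirements`, and `probe_environment` are hypothetical. The only PyTorch calls it uses are `torch.cuda.is_available()`, `torch.version.cuda`, `torch.__version__`, and `torch.cuda.get_device_capability()`. CUTLASS is not checked.

```python
import sys

# Minimum versions copied from the Requirements list.
MIN_PYTHON = (3, 8)
MIN_CUDA = (12, 3)
MIN_TORCH = (2, 1)
REQUIRED_CAPABILITY = (9, 0)  # Hopper (sm_90a)


def parse_version(text):
    """Turn a string such as '2.1.0+cu121' into (2, 1); only major.minor is kept."""
    parts = text.split("+")[0].split(".")
    return tuple(int(p) for p in parts[:2])


def check_requirements(python, cuda, torch_version, capability):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python[:2]) < MIN_PYTHON:
        problems.append("Python 3.8 or above is required")
    if tuple(cuda) < MIN_CUDA:
        problems.append("CUDA 12.3 or above is required")
    if tuple(torch_version) < MIN_TORCH:
        problems.append("PyTorch 2.1 or above is required")
    if tuple(capability) != REQUIRED_CAPABILITY:
        problems.append("a Hopper GPU (compute capability 9.0) is required")
    return problems


def probe_environment():
    """Collect the real values from the running system (needs PyTorch)."""
    import torch  # imported lazily so the pure checks above work without it

    if not torch.cuda.is_available():
        return ["no CUDA device is visible to PyTorch"]
    return check_requirements(
        python=sys.version_info,
        cuda=parse_version(torch.version.cuda),
        torch_version=parse_version(torch.__version__),
        capability=torch.cuda.get_device_capability(),
    )
```

Calling `probe_environment()` on the target machine returns the list of unmet requirements, so an empty list suggests the FlashMLA and DeepGEMM builds have a suitable base to run on.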

Features

Hardware Supported

  • H800 80G (tested)
  • H20 96G (tested)
  • Other Hopper GPUs should also work

Parallel Method:

  • Attention DP, MoE EP
  • Attention TP+DP, MoE EP
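
As a rough illustration of what the two layouts mean for a single deployment, the sketch below derives the attention data-parallel degree and the number of routed experts per GPU from a world size. It is not taken from this repository: the `ParallelLayout` class is hypothetical, it assumes the expert-parallel group spans all GPUs, and it ignores redundant or shared experts. The count of 256 routed experts per MoE layer is the published DeepSeek V3 configuration.

```python
from dataclasses import dataclass

NUM_ROUTED_EXPERTS = 256  # routed experts per MoE layer in DeepSeek V3/R1


@dataclass
class ParallelLayout:
    """Hypothetical description of one deployment (not the simulator's API)."""

    world_size: int        # total number of GPUs
    attention_tp: int = 1  # 1 means pure attention DP; >1 means TP+DP

    def __post_init__(self):
        if self.world_size % self.attention_tp != 0:
            raise ValueError("world_size must be divisible by attention_tp")
        if NUM_ROUTED_EXPERTS % self.world_size != 0:
            raise ValueError("routed experts must divide evenly across GPUs")

    @property
    def attention_dp(self):
        # Each TP group of `attention_tp` GPUs acts as one data-parallel replica.
        return self.world_size // self.attention_tp

    @property
    def experts_per_gpu(self):
        # Assumption: MoE EP spans every GPU, so experts are split evenly.
        return NUM_ROUTED_EXPERTS // self.world_size


dp_only = ParallelLayout(world_size=32)
tp_dp = ParallelLayout(world_size=32, attention_tp=4)
print(dp_only.attention_dp, dp_only.experts_per_gpu)  # 32 8
print(tp_dp.attention_dp, tp_dp.experts_per_gpu)      # 8 8
```

In both layouts the MoE side is unchanged. Only the way attention work is split across GPUs differs.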

Overlap Method:

  • two-microbatch overlapping (DeepSeek Official)
  • single-batch compute-communication overlapping
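
To give an intuition for why two-microbatch overlapping helps, here is a toy timing model. It is an idealized sketch under stated assumptions, not the simulator's actual method: each stage is a compute step followed by a communication step, compute and communication use separate resources that each serve one microbatch at a time, and the function names are hypothetical. Single-batch compute-communication overlapping is not modeled.

```python
def sequential_time(compute, comm, microbatches=2):
    """Baseline: every compute and communication step runs back to back."""
    return microbatches * (sum(compute) + sum(comm))


def two_microbatch_time(compute, comm):
    """Greedy schedule for two microbatches that alternate on each stage.

    While one microbatch communicates, the other may compute, so the two
    resources are busy at the same time whenever dependencies allow it.
    """
    compute_free = 0.0   # time at which the compute resource becomes idle
    comm_free = 0.0      # time at which the communication resource becomes idle
    ready = [0.0, 0.0]   # time at which each microbatch finished its last stage

    for c, m in zip(compute, comm):
        for mb in (0, 1):
            start = max(compute_free, ready[mb])
            compute_free = start + c
            comm_start = max(comm_free, compute_free)
            comm_free = comm_start + m
            ready[mb] = comm_free
    return max(ready)


# Two stages, each with 4 time units of compute and 2 of communication.
print(sequential_time([4, 4], [2, 2]))      # 24
print(two_microbatch_time([4, 4], [2, 2]))  # 18.0
```

In this model the saving grows as compute and communication times become more balanced, since more of the communication can be hidden behind the other microbatch's compute.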

Results

H800

  • H800 80G with two-microbatch overlapping (figure: H800_two_microbatch_overlapping_results)
  • H800 80G with single-batch compute-communication overlapping (figure: H800_single_batch_comp_comm_overlapping_results)

H20

  • H20 96G with two-microbatch overlapping (figure: H20_two_microbatch_overlapping_results)
  • H20 96G with single-batch compute-communication overlapping (figure: H20_single_batch_comp_comm_overlapping_results)

License

This code repository is released under the MIT License.

Citation

@misc{deepseek_simulator,
      title={DeepSeek-Simulator: A test-based Performance Simulator for DeepSeek V3/R1}, 
      author={Han Shen},
      year={2025},
      publisher = {GitHub},
      howpublished = {\url{https://github.com/shenh10/DeepSeek_Simulator.git}},
}
