👆 touchnet is heavily inspired by torchtitan. Both are clean, minimal codebases for large-scale LLM training using native PyTorch. What differentiates 👆 touchnet from torchtitan is its focus on multimodal LLM training, which requires special data pipelines and model structures. Please note that 👆 touchnet is currently in a pre-release state and under extensive development.

Our guiding principles when building 👆 touchnet are:
- ⚡️ Blazing-fast checkpointable data loader with modular preprocessing and fully random access for large-scale multimodal data
  - [New Storage Format] optimized for random access on sequentially saved tar files
  - Efficient [Sequence Packing] powered by [Flex Attention]
- 🤗 Native integration with `transformers` models while getting rid of structured trainer classes (e.g., [PyTorch-Lightning] or [HuggingFace Trainer])
  - Only reuse model definitions in `transformers` and leave other parts untouched
  - Entire training logic exposed in a single file, [touchnet/bin/train.py]; everything is under your control
- 🛠️ Built-in profilers (CPU/GPU/memory) with flight recorder diagnostics.
  - [Nsys-like Profiler] to get optimization recommendations
  - [Memory Monitor] to debug OOM errors and improve memory usage
- 🎯 N-D parallelism enabled through PyTorch native API and minimal lines of model code changes.
- ✨ Intuitive API design for rapid adoption & customization in minutes.
  - Supported tasks: [text/pretrain], [audio/pretrain], [audio/sft/asr], more tasks coming soon
  - Supported models: [Llama], [LlamaForASR], more models coming soon
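The random-access idea behind the storage format above can be illustrated with Python's stdlib `tarfile`: one sequential pass builds a byte-offset index, after which any member can be fetched with a single seek instead of scanning the archive. This is a minimal sketch of the general technique, not touchnet's actual `TouchDataset` layout.

```python
import io
import tarfile

# Build a small tar archive in memory with three "samples".
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, payload in [("a.txt", b"sample-A"),
                          ("b.txt", b"sample-B"),
                          ("c.txt", b"sample-C")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# One sequential pass builds an index: member name -> (data offset, size).
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    index = {m.name: (m.offset_data, m.size) for m in tar.getmembers()}

def read_sample(raw, name):
    """Random access: seek straight to the payload, no archive scan."""
    offset, size = index[name]
    raw.seek(offset)
    return raw.read(size)

print(read_sample(buf, "b.txt"))  # b'sample-B'
```

The index itself is tiny (a name plus two integers per sample), so it can be saved alongside the tar and loaded instantly, which is what makes sequentially written shards cheap to read in random order.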
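Sequence packing concatenates variable-length samples into one long sequence and restricts attention so each token only attends within its own sample. Below is a torch-free sketch of that mask rule; with FlexAttention the same predicate would be supplied as a `mask_mod` function to `create_block_mask`. The helper names are illustrative, not touchnet's API.

```python
def doc_ids(lengths):
    """Map each token position in the packed sequence to its sample id."""
    ids = []
    for doc, n in enumerate(lengths):
        ids.extend([doc] * n)
    return ids

def packed_causal_mask(lengths):
    """mask[q][k] is True where query q may attend to key k:
    same sample AND causal (k <= q)."""
    ids = doc_ids(lengths)
    n = len(ids)
    return [[ids[q] == ids[k] and k <= q for k in range(n)] for q in range(n)]

# Two samples of lengths 2 and 3 packed into one sequence of 5 tokens.
mask = packed_causal_mask([2, 3])
print(mask[1])  # token 1 (sample 0): [True, True, False, False, False]
print(mask[3])  # token 3 (sample 1): [False, False, True, True, False]
```

Because the predicate is block-sparse, FlexAttention can skip entire masked-out tiles, which is what makes packing cheaper than padding every sample to the maximum length.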
touchnet_glance2.mp4
Loss, Accuracy, Memory, Throughput, TFLOPs, and MFU are logged to both stdout and TensorBoard.
touchnet_tb2.mp4
Detailed CPU/GPU profiles can be visualized in TensorBoard. Enjoy your optimization journey~
touchynet_mem.mp4
Memory profiling identifies GPU memory allocation patterns to guide tuning strategies.
Here is an end-to-end workflow for a training job in 👆 TouchNet:
- stage-1: Download the dataset. We use the `load_dataset` API in HuggingFace `datasets` to download specific data.
- stage-2: Convert the dataset to `TouchDataset` format. See [touchnet/bin/make_data.py]
- stage-3: (optional) Convert an hf-format ckpt to a torch distributed ckpt. See [touchnet/bin/convert_hf_to_dcp.py]
- stage-4: Start training, either from scratch or from the pretrained ckpt converted in stage-3. See [touchnet/bin/train.py]
- stage-5: Convert the torch distributed ckpt back to hf-format and enjoy the HuggingFace ecosystem for inference and deployment. See [touchnet/bin/convert_dcp_to_hf.py]
For a more concrete example running those stages one by one, see [examples/audio/sft/asr/aishell/run.sh]
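The five stages above can be sketched as one script in the style of the example run.sh. The script entry points come from this repo; everything else (flags, paths, GPU count) is an illustrative placeholder, so check each script's actual arguments before running.

```shell
# stage-1: download data with HuggingFace datasets' load_dataset (Python API)

# stage-2: convert the downloaded dataset to TouchDataset format
python touchnet/bin/make_data.py

# stage-3 (optional): HuggingFace checkpoint -> torch distributed checkpoint
python touchnet/bin/convert_hf_to_dcp.py

# stage-4: launch training (e.g., 8 GPUs on one node via torchrun)
torchrun --nproc_per_node=8 touchnet/bin/train.py

# stage-5: torch distributed checkpoint -> HuggingFace format
python touchnet/bin/convert_dcp_to_hf.py
```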
- support audio/sft/tts
- support MoE
- support vision/pretrain vision/sft
- support text/sft
# NOTE(xcsong): Ensure that the Linux system's glibc version is greater than or equal to 2.17 (see `ldd --version`)
# (for example, Ubuntu 22.04 and later versions).
conda create -n touchnet python=3.10
conda activate touchnet
conda install -c conda-forge sox ffmpeg -y
# install cuda12.6.3 + cudnn9.5.1.17; be sure to change `prefix` to your path.
bash install_cuda_cudnn.sh
# install the most recent PyTorch to use the latest parallelism features; torch>=2.7.0 recommended
pip install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126 --force-reinstall
pip install .
@misc{touchnet,
title={TouchNet: A PyTorch native N-D parallel library for large-scale multimodal LLM (text/audio) training},
author={Xingchen Song},
year={2025},
url={https://github.com/xingchensong/TouchNet},
}
- This repo is heavily inspired by torchtitan, and we borrowed a lot of code from it.
- This repo also benefits from Megatron-LM, WeNet, and flame.

Thanks for their wonderful work.