by zechenzhangAGI
Expert guidance for distributed training with DeepSpeed: ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8 mixed precision, 1-bit Adam, and sparse attention.
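As a flavor of what this guidance covers, here is a minimal, hypothetical DeepSpeed configuration sketch combining two of the listed features (ZeRO stage 2 and BF16 mixed precision). The keys follow DeepSpeed's JSON config schema; the specific values are illustrative assumptions, not recommendations from this skill.

```python
# Hypothetical DeepSpeed config sketch: ZeRO stage 2 + BF16.
# Values (batch size, flags) are illustrative, not prescriptive.
ds_config = {
    "train_batch_size": 32,
    "bf16": {"enabled": True},          # BF16 mixed precision
    "zero_optimization": {
        "stage": 2,                     # shard optimizer states + gradients
        "overlap_comm": True,           # overlap comm with backward pass
        "contiguous_gradients": True,   # reduce memory fragmentation
    },
}

# This dict would typically be passed to deepspeed.initialize(...)
# as the `config` argument, alongside the model and its parameters.
print(ds_config["zero_optimization"]["stage"])
```

ZeRO stage 2 shards optimizer states and gradients across data-parallel ranks; stage 3 additionally shards the parameters themselves, trading communication for memory.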