Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention
Rating: 8.1
Installs: 0
Category: Machine Learning
This skill provides comprehensive DeepSpeed distributed training expertise, with strong documentation on ZeRO optimization stages, pipeline parallelism, mixed precision training, and advanced features such as 1-bit Adam and sparse attention. The description clearly identifies when to use the skill (distributed training scenarios) and what capabilities it offers. Task knowledge is excellent, with detailed configuration examples, API patterns, and code snippets covering DeepNVMe I/O, MoE models, curriculum learning, and profiling tools. The structure is logical, with a clear index of reference files organized by topic. The skill offers meaningful value by consolidating complex distributed training knowledge that a CLI agent would otherwise have to piece together through extensive documentation traversal at a high token cost.
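For a concrete sense of the configuration patterns the skill documents, here is a minimal sketch of a DeepSpeed training setup using ZeRO stage 2 with BF16 mixed precision. The toy model, hyperparameters, and config values are illustrative assumptions, not taken from the skill itself; the config keys follow DeepSpeed's public JSON config schema.

```python
# Minimal sketch: DeepSpeed engine with ZeRO stage 2 + BF16.
# Model and hyperparameters are placeholders, not from the skill.
import torch
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "bf16": {"enabled": True},           # BF16 mixed precision
    "zero_optimization": {
        "stage": 2,                      # shard optimizer state + gradients
        "overlap_comm": True,            # overlap reduction with backward
        "contiguous_gradients": True,
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(1024, 1024)      # placeholder model

# deepspeed.initialize wraps the model in an engine that manages
# sharding, mixed precision, and gradient accumulation.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(4, 1024, device=model_engine.device, dtype=torch.bfloat16)
loss = model_engine(x).float().pow(2).mean()
model_engine.backward(loss)              # engine-managed backward pass
model_engine.step()                      # optimizer step + gradient zeroing
```

A script like this is normally launched through the DeepSpeed CLI (e.g. `deepspeed train.py`), which sets up the distributed environment across the available GPUs.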