Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.
Rating: 7.6
Installs: 0
Category: Machine Learning
Excellent skill that clearly demonstrates how to add distributed training to PyTorch scripts with minimal code changes. The description accurately captures the '4 lines' value proposition and unified API benefits. Task knowledge is comprehensive with 5 detailed workflows covering the most common use cases (single→multi GPU, mixed precision, DeepSpeed, FSDP, gradient accumulation), complete with working code examples. Structure is clean with a logical progression from quick start to advanced topics, and references to separate files for specialized content (Megatron, custom plugins, performance) keep the main document focused. The skill provides high novelty by drastically reducing the complexity and token count needed for distributed training setup, converting what would typically require extensive CLI configuration and PyTorch distributed knowledge into a simple 4-line code change. Minor room for improvement: it could explicitly show a complete end-to-end example with data loading and model definition, and the CLI invocation patterns in the description could be slightly more prominent. Overall, this is a very strong skill that would save significant effort and tokens for any agent attempting distributed training tasks.
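To make the '4 lines' claim concrete, here is a minimal sketch of the kind of change the description refers to, using Accelerate's public API (`Accelerator`, `prepare`, `backward`); the toy model, optimizer, data, and hyperparameters are illustrative placeholders, not part of the skill itself.

```python
# Minimal sketch of the "4 lines" integration described above.
# The toy model, data, and hyperparameters are placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

model = nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(
    TensorDataset(torch.randn(64, 128), torch.randint(0, 10, (64,))),
    batch_size=8,
)

accelerator = Accelerator()                          # 1. create the accelerator
model, optimizer, dataloader = accelerator.prepare(  # 2. wrap the training objects
    model, optimizer, dataloader
)

for inputs, targets in dataloader:                   # no manual .to(device) calls needed
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)                       # 3. replace loss.backward()
    optimizer.step()
```

The 'single launch command' in the description refers to the Accelerate CLI: `accelerate config` builds a machine configuration interactively, and `accelerate launch train.py` runs the script across the configured devices (`train.py` is just an illustrative name here).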
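The mixed-precision and gradient-accumulation workflows mentioned in the review follow the same pattern. The sketch below combines both under stated assumptions: the `mixed_precision` and `gradient_accumulation_steps` arguments and the `accumulate()` context manager are part of Accelerate's public API, while the model, data, and specific values are placeholders.

```python
# Hedged sketch of the mixed-precision + gradient-accumulation workflows;
# model/optimizer/data are toy placeholders, argument values are illustrative.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# bf16 autocast plus 4-step gradient accumulation
accelerator = Accelerator(mixed_precision="bf16", gradient_accumulation_steps=4)

model = nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(
    TensorDataset(torch.randn(64, 128), torch.randint(0, 10, (64,))),
    batch_size=8,
)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # accumulate() tracks micro-batches; the prepared optimizer only applies
    # the step when gradients are synced (every 4th micro-batch here)
    with accelerator.accumulate(model):
        loss = nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```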