High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and DeepSpeed ZeRO-3; roughly 2× faster than DeepSpeedChat thanks to its distributed architecture and GPU resource sharing.
Rating: 8.1
Installs: 0
Category: AI & LLM
Excellent skill for high-performance RLHF training. The description clearly identifies when to use OpenRLHF (large models at 7B-70B+, Ray+vLLM acceleration, distributed training), and SKILL.md provides comprehensive runnable commands for all major workflows (PPO, GRPO, DPO, full pipeline). Task knowledge is strong, with concrete examples, hyperparameters, and troubleshooting for common issues (OOM, training instability, slow generation). Structure is good, with logical sections and references to external files for advanced topics. Novelty is high: orchestrating distributed RLHF with Ray, vLLM, and multiple algorithms is complex and token-intensive for a CLI agent working alone. Minor improvement areas: more explicit parameter explanations and a clearer decision tree for algorithm selection would help, but overall this is a highly useful skill that meaningfully reduces the complexity of deploying production RLHF training.
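For context, here is a hedged sketch of what one such runnable command looks like: a single-node Ray PPO launch adapted from OpenRLHF's public example scripts. The model names, GPU counts, and batch sizes are illustrative placeholders, and flag names can vary between OpenRLHF versions, so verify against `python3 -m openrlhf.cli.train_ppo_ray --help` for your install.

```bash
# Illustrative single-node PPO launch, adapted from OpenRLHF's example scripts.
# GPU counts, batch sizes, and model names are placeholders; verify all flags
# against your installed version before running.

# Start a local Ray head node (the job API is served on port 8265).
ray start --head --node-ip-address 0.0.0.0

ray job submit --address="http://127.0.0.1:8265" \
  -- python3 -m openrlhf.cli.train_ppo_ray \
  --actor_num_nodes 1 --actor_num_gpus_per_node 2 \
  --critic_num_nodes 1 --critic_num_gpus_per_node 2 \
  --ref_num_nodes 1 --ref_num_gpus_per_node 2 \
  --reward_num_nodes 1 --reward_num_gpus_per_node 2 \
  --vllm_num_engines 2 --vllm_tensor_parallel_size 1 \
  --pretrain OpenRLHF/Llama-3-8b-sft-mixture \
  --reward_pretrain OpenRLHF/Llama-3-8b-rm-mixture \
  --prompt_data OpenRLHF/prompt-collection-v0.1 \
  --input_key context_messages --apply_chat_template \
  --zero_stage 3 --bf16 --flash_attn --gradient_checkpointing \
  --micro_rollout_batch_size 16 --rollout_batch_size 256 \
  --micro_train_batch_size 4 --train_batch_size 128 \
  --actor_learning_rate 5e-7 --critic_learning_rate 9e-6 \
  --init_kl_coef 0.01 --normalize_reward \
  --save_path ./checkpoint/llama3-8b-ppo
```

The per-role node/GPU flags are what give OpenRLHF its distributed character: actor, critic, reference, and reward models are placed on separate GPU groups while dedicated vLLM engines handle rollout generation. When GPUs are scarce, the project's colocation options trade that isolation for a smaller footprint.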