openrlhf-training

8.1 · by davila7

78 Favorites · 435 Upvotes · 0 Downvotes

High-performance RLHF framework with Ray + vLLM acceleration. Use it for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3, it runs roughly 2× faster than DeepSpeed-Chat thanks to its distributed architecture and GPU resource sharing.
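For orientation, here is a minimal sketch of launching a distributed PPO run through OpenRLHF's Ray entrypoint (`openrlhf.cli.train_ppo_ray`). Flag names follow the upstream OpenRLHF README, but exact names and defaults vary by version; the model, dataset, and batch-size values are illustrative placeholders, not recommendations.

```python
# Sketch: launch OpenRLHF's Ray-based PPO trainer from Python.
# Assumes a running Ray cluster and the openrlhf package installed;
# model/dataset names are placeholders, and flag names follow the
# upstream README (verify against your installed version).
import subprocess

cmd = [
    "python3", "-m", "openrlhf.cli.train_ppo_ray",
    # Ray placement: one node, 8 GPUs each for actor/critic/ref/reward
    "--actor_num_nodes", "1", "--actor_num_gpus_per_node", "8",
    "--critic_num_nodes", "1", "--critic_num_gpus_per_node", "8",
    "--ref_num_nodes", "1", "--ref_num_gpus_per_node", "8",
    "--reward_num_nodes", "1", "--reward_num_gpus_per_node", "8",
    # vLLM engines accelerate rollout generation
    "--vllm_num_engines", "2", "--vllm_tensor_parallel_size", "2",
    # Models and prompts (placeholders)
    "--pretrain", "OpenRLHF/Llama-3-8b-sft-mixture",
    "--reward_pretrain", "OpenRLHF/Llama-3-8b-rm-mixture",
    "--prompt_data", "OpenRLHF/prompt-collection-v0.1",
    # ZeRO-3 sharding and mixed precision
    "--zero_stage", "3", "--bf16",
    "--micro_train_batch_size", "8", "--train_batch_size", "128",
    "--micro_rollout_batch_size", "16", "--rollout_batch_size", "1024",
    "--actor_learning_rate", "5e-7", "--critic_learning_rate", "9e-6",
    "--init_kl_coef", "0.01",
    "--save_path", "./checkpoint/llama3-8b-rlhf",
]
subprocess.run(cmd, check=True)
```

Per the upstream examples, switching from PPO to GRPO is largely a matter of passing a group-normalized advantage estimator (e.g. `--advantage_estimator group_norm`) and dropping the critic placement, though the exact flag depends on the OpenRLHF version.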

Tags: RLHF

Rating: 8.1 · Installs: 0 · Category: AI & LLM

Quick Review

Excellent skill for high-performance RLHF training. The description clearly identifies when to use OpenRLHF (large models at 7B-70B+, Ray + vLLM acceleration, distributed training), and SKILL.md provides comprehensive, runnable commands for all major workflows (PPO, GRPO, DPO, and the full pipeline). Task knowledge is strong, with concrete examples, hyperparameters, and troubleshooting for common issues (OOM, training instability, slow generation). The structure is good, with logical sections and references to external files for advanced topics. Novelty is high: orchestrating distributed RLHF with Ray, vLLM, and multiple algorithms is complex and token-intensive for a CLI agent alone. Minor improvement areas: more explicit parameter explanations and clearer decision trees for algorithm selection. Overall, this is a highly useful skill that meaningfully reduces the complexity of deploying production RLHF training.
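As a point of reference for the "runnable commands" the review praises, a DPO run is a single-module launch rather than a Ray job. The sketch below assumes OpenRLHF's `openrlhf.cli.train_dpo` entrypoint invoked through the DeepSpeed launcher, with placeholder model and dataset names; check flag names against your installed version.

```python
# Sketch: offline DPO training via OpenRLHF's train_dpo module,
# launched through the deepspeed CLI. Dataset/model names and
# preference keys are illustrative; flags follow the upstream README.
import subprocess

cmd = [
    "deepspeed", "--module", "openrlhf.cli.train_dpo",
    "--pretrain", "OpenRLHF/Llama-3-8b-sft-mixture",  # SFT starting point
    "--dataset", "OpenRLHF/preference_dataset_mixture2_and_safe_pku",
    "--chosen_key", "chosen", "--rejected_key", "rejected",
    "--apply_chat_template",
    "--beta", "0.1",            # DPO temperature
    "--learning_rate", "5e-7",
    "--zero_stage", "3", "--bf16",
    "--micro_train_batch_size", "1", "--train_batch_size", "128",
    "--max_len", "8192", "--max_epochs", "1",
    "--flash_attn", "--gradient_checkpointing",
    "--save_path", "./checkpoint/llama3-8b-dpo",
]
subprocess.run(cmd, check=True)
```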

LLM Signals

Description coverage: 9
Task knowledge: 9
Structure: 8
Novelty: 8

GitHub Signals

Stars: 18,073
Forks: 1,635
Open issues: 132
Contributors: 71
Last commit: 0 days ago


Publisher

davila7 (Skill Author)

Related Skills

rag-architect by Jeffallan (rating 7.0)
prompt-engineer by Jeffallan (rating 7.0)
fine-tuning-expert by Jeffallan (rating 6.4)
mcp-developer by Jeffallan (rating 6.4)