Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better reported performance (+6.4 points on AlpacaEval 2.0); because no reference model is needed, training is simpler and more memory-efficient than DPO. Use for preference alignment when you want simpler, faster training than DPO/PPO.
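For context, a minimal sketch of the SimPO objective as described in the paper: the implicit reward is the length-normalized (average) log-likelihood of a response, compared between chosen and rejected completions with a target margin, so no reference model is involved. Function names, the example batch values, and the default `beta`/`gamma` here are illustrative assumptions, not taken from this skill's configs.

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=0.5):
    """SimPO loss: length-normalized log-likelihood margin, reference-free."""
    # Average (per-token) log-probability serves as the implicit reward,
    # which is what removes the need for a frozen reference model.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Bradley-Terry style objective with a target reward margin gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()

# Toy usage: summed token log-probs and lengths for two preference pairs.
loss = simpo_loss(
    chosen_logps=torch.tensor([-42.0, -37.5]),
    rejected_logps=torch.tensor([-55.0, -60.2]),
    chosen_lens=torch.tensor([30.0, 25.0]),
    rejected_lens=torch.tensor([33.0, 40.0]),
)
```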
Rating: 7.0 · Installs: 0 · Category: Machine Learning
Excellent skill for SimPO training with comprehensive coverage. The description clearly explains when to use SimPO vs alternatives (DPO/PPO). Task knowledge is strong with 3 concrete workflows (base model, instruct fine-tuning, reasoning tasks), complete configs, and troubleshooting for common issues (loss divergence, OOM, capability forgetting). Structure is clean with concise main doc and references for advanced topics. Novelty is good—SimPO is a specialized method that would be token-intensive for a CLI agent to implement from scratch, though the underlying concept is a known academic technique. Minor deduction on novelty as preference optimization is an established task category, but the reference-free approach and specific hyperparameter guidance add meaningful value.