TacoSkill LAB
© 2026 TacoSkill LAB

simpo-training

by davila7

57 Favorites
448 Upvotes
0 Downvotes

Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO that reports better performance (+6.4 points on AlpacaEval 2.0). Because no reference model is needed, training is more efficient than with DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.

preference-optimization

Rating: 8.1

Installs: 0

Category: AI & LLM

Quick Review

Excellent skill for SimPO training with comprehensive coverage. The description clearly conveys when to use SimPO vs alternatives (DPO/PPO). Task knowledge is strong with three concrete workflows (base model, instruct fine-tuning, reasoning tasks), complete installation steps, and practical troubleshooting for common issues like loss divergence and OOM errors. Structure is good with logical sections and references to separate files for deep-dive topics (loss functions, hyperparameters, datasets). Novelty is moderate-to-high: while a CLI agent could theoretically configure training, the skill consolidates model-specific learning rates, beta/gamma tuning, and workflow patterns that would require significant trial-and-error or research. The reference-free advantage and comparative guidance (when to use vs DPO/PPO/GRPO) add meaningful decision-making value. Minor room for improvement: could include a decision tree diagram for algorithm selection or more explicit evaluation/validation steps post-training.
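
The reference-free objective and the beta/gamma knobs mentioned above can be illustrated with a minimal sketch of the SimPO loss for a single preference pair. It is not taken from the skill itself: the function names `simpo_loss` and `_softplus` are hypothetical, the default values for `beta` and `gamma` are placeholders rather than recommended settings, and the inputs are assumed to be per-token log-probabilities of each response under the policy being trained. Note that no reference-model log-probabilities appear anywhere in the computation.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def simpo_loss(chosen_logps, rejected_logps, beta=2.0, gamma=1.0):
    """SimPO loss for one (chosen, rejected) pair.

    chosen_logps / rejected_logps: per-token log-probs of each response
    under the policy (assumed inputs; no reference model is involved).
    beta scales the implicit reward, gamma is the target reward margin.
    The defaults are illustrative placeholders only.
    """
    # Implicit reward: length-normalized (average) log-prob, scaled by beta.
    reward_chosen = beta * sum(chosen_logps) / len(chosen_logps)
    reward_rejected = beta * sum(rejected_logps) / len(rejected_logps)

    # The chosen response should beat the rejected one by at least gamma.
    margin = reward_chosen - reward_rejected - gamma

    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return _softplus(-margin)


if __name__ == "__main__":
    good = [-0.1, -0.2]        # high-likelihood response
    bad = [-1.0, -1.5, -2.0]   # low-likelihood response
    print("preferred pair ordered correctly:", simpo_loss(good, bad))
    print("preferred pair ordered wrongly:  ", simpo_loss(bad, good))
```

Averaging over tokens, rather than summing, keeps longer responses from being rewarded or penalized for length alone. In this formulation, raising `gamma` demands a larger gap between chosen and rejected responses before the loss approaches zero.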

LLM Signals

Description coverage: 9
Task knowledge: 8
Structure: 8
Novelty: 7

GitHub Signals

18,073
1,635
132
71
Last commit 0 days ago

Publisher

davila7

Skill Author

Related Skills

rag-architect, by Jeffallan, 7.0

prompt-engineer, by Jeffallan, 7.0

fine-tuning-expert, by Jeffallan, 6.4

mcp-developer, by Jeffallan, 6.4