
simpo-training

7.0

by zechenzhangAGI

176 Favorites · 182 Upvotes · 0 Downvotes

Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model is needed, so training is more efficient than DPO. Use for preference alignment when you want simpler, faster training than DPO/PPO.
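
For context, the objective the description refers to is compact enough to show directly. The sketch below is a minimal, illustrative PyTorch rendering of the published SimPO loss (length-normalized log-probability rewards plus a target reward margin); it is not code taken from this skill, and the default beta and gamma values are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lengths, rejected_lengths,
               beta=2.0, gamma=0.5):
    """Minimal SimPO loss sketch.

    chosen_logps / rejected_logps: summed token log-probs of each response
    under the policy being trained (no reference model is involved).
    chosen_lengths / rejected_lengths: response lengths in tokens, used for
    length normalization. beta scales the implicit reward; gamma is the
    target reward margin. Defaults here are illustrative assumptions.
    """
    # Length-normalized implicit rewards, computed from the policy alone.
    chosen_rewards = beta * chosen_logps / chosen_lengths
    rejected_rewards = beta * rejected_logps / rejected_lengths

    # Bradley-Terry style objective with a target margin gamma.
    logits = chosen_rewards - rejected_rewards - gamma
    return -F.logsigmoid(logits).mean()
```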

llm-training

Rating: 7.0
Installs: 0
Category: Machine Learning

Quick Review

Excellent skill for SimPO training with comprehensive coverage. The description clearly explains when to use SimPO vs alternatives (DPO/PPO). Task knowledge is strong with 3 concrete workflows (base model, instruct fine-tuning, reasoning tasks), complete configs, and troubleshooting for common issues (loss divergence, OOM, capability forgetting). Structure is clean with concise main doc and references for advanced topics. Novelty is good—SimPO is a specialized method that would be token-intensive for a CLI agent to implement from scratch, though the underlying concept is a known academic technique. Minor deduction on novelty as preference optimization is an established task category, but the reference-free approach and specific hyperparameter guidance add meaningful value.
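
To make the review's point about configs and hyperparameter guidance concrete, here is one possible setup outside the skill itself: recent versions of Hugging Face TRL expose a SimPO-style loss through CPOTrainer. The snippet below is a hedged sketch under that assumption; the model, dataset, and hyperparameter values (beta, gamma, learning rate) are illustrative choices not taken from the skill, and argument names may differ across TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

# Illustrative choices; substitute your own model and preference dataset.
model_name = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# In recent TRL releases, SimPO is reached through the CPO trainer:
# loss_type="simpo" selects the length-normalized margin loss and
# cpo_alpha=0.0 drops the extra SFT term, leaving pure SimPO.
config = CPOConfig(
    output_dir="simpo-output",
    loss_type="simpo",
    cpo_alpha=0.0,
    beta=2.0,          # reward scale; illustrative value
    simpo_gamma=0.5,   # target reward margin; illustrative value
    learning_rate=1e-6,
)

trainer = CPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # "tokenizer=" in older TRL versions
)
trainer.train()
```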

LLM Signals

Description coverage: 9
Task knowledge: 9
Structure: 9
Novelty: 7

GitHub Signals

891
74
19
2
Last commit 0 days ago


Publisher

zechenzhangAGI

Skill Author

Related Skills

ml-pipeline by Jeffallan (6.4)
sparse-autoencoder-training by zechenzhangAGI (7.6)
huggingface-accelerate by zechenzhangAGI (7.6)
moe-training by zechenzhangAGI (7.6)