Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies.
Rating: 8.1
Installs: 0
Category: Machine Learning
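As a minimal sketch of the temperature-scaling and soft-target techniques listed in the description above (the full SKILL.md is not reproduced here), the following assumes PyTorch and a shared teacher/student vocabulary; the function names and the `alpha`/`temperature` defaults are illustrative, not taken from the skill itself.

```python
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Hinton-style distillation: match softened teacher and student distributions.

    Logits are shaped (batch, vocab) or (batch * seq_len, vocab). The T**2
    factor keeps gradient magnitudes comparable across temperatures.
    """
    log_q_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Forward KL(teacher || student) over the temperature-softened distributions.
    kld = F.kl_div(log_q_student, p_teacher, reduction="batchmean")
    return kld * (temperature ** 2)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Blend the soft-target term with ordinary cross-entropy on hard labels."""
    hard = F.cross_entropy(student_logits, labels)
    soft = soft_target_kd_loss(student_logits, teacher_logits, temperature)
    return alpha * soft + (1.0 - alpha) * hard
```

In a training loop, `teacher_logits` would typically be computed under `torch.no_grad()` so that only the student receives gradients.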
Excellent skill for knowledge distillation with comprehensive coverage of techniques (temperature scaling, soft targets, reverse KLD, MiniLLM). The description clearly articulates when to use this skill, and SKILL.md provides complete, production-ready code including custom trainers, loss functions, and multi-teacher strategies. Structure is logical with clear progression from quick start to advanced techniques. Task knowledge is strong with multiple implementation approaches and best practices. Novelty is good—distillation requires specialized understanding of loss functions, temperature scaling trade-offs, and forward vs reverse KLD that would be token-intensive for a CLI agent to derive. The skill meaningfully reduces complexity for model compression tasks. Minor improvement: could benefit from more explicit metrics/benchmarks for evaluating distillation success, but overall this is a high-quality, immediately actionable skill.
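To make the forward vs reverse KLD trade-off mentioned in the review concrete, here is a hedged token-level sketch (PyTorch assumed; function names are illustrative, and teacher logits are again expected to come from a `torch.no_grad()` pass). The actual MiniLLM recipe optimizes reverse KLD at the sequence level with policy-gradient-style updates, which this per-token version omits.

```python
import torch.nn.functional as F

def forward_kld(student_logits, teacher_logits):
    # KL(teacher || student): "mode-covering" -- the student is pushed to put
    # probability mass everywhere the teacher does, even on modes it fits poorly.
    log_q = F.log_softmax(student_logits, dim=-1)
    p = F.softmax(teacher_logits, dim=-1)
    return F.kl_div(log_q, p, reduction="batchmean")

def reverse_kld(student_logits, teacher_logits):
    # KL(student || teacher): "mode-seeking" -- the student concentrates on
    # high-probability teacher modes, the behaviour MiniLLM-style training favours.
    log_q = F.log_softmax(student_logits, dim=-1)
    log_p = F.log_softmax(teacher_logits, dim=-1)
    q = log_q.exp()
    return (q * (log_q - log_p)).sum(dim=-1).mean()
```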