Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language models.
Rating: 8.7
Installs: 0
Category: Machine Learning
Exceptional skill for training and analyzing Sparse Autoencoders using SAELens. The description clearly communicates when to use this skill: discovering interpretable features, analyzing superposition, and studying monosemantic representations. Task knowledge is comprehensive, covering three complete workflows (loading pre-trained SAEs, training custom SAEs, and feature analysis/steering), each with executable code, hyperparameter guidance, evaluation metrics, and troubleshooting. The structure is excellent, with clear sections, comparison tables, and external references. Novelty is high: SAE training is a specialized mechanistic interpretability technique that requires significant domain expertise, hyperparameter tuning knowledge, and an understanding of concepts such as polysemanticity and superposition, all of which would be expensive for a CLI agent to derive independently. One minor improvement is possible: a brief upfront decision tree for choosing between workflows or architectures.
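For readers unfamiliar with what "decomposing activations into features" means in practice, the following is a minimal sketch of a sparse autoencoder forward pass and loss in plain Python. It is not SAELens code and does not reflect the skill's actual workflows. It assumes the common textbook formulation (ReLU encoder, linear decoder, mean squared reconstruction error plus an L1 sparsity penalty), and every name in it (`sae_forward`, `sae_loss`, `l1_coeff`, the weight layout) is hypothetical and chosen only for illustration.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector into sparse features, then reconstruct it.

    Assumed (hypothetical) shapes: W_enc is d_sae x d_model,
    W_dec is d_model x d_sae, with d_sae usually larger than d_model.
    """
    # Subtract the decoder bias so the encoder sees centered activations.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    # ReLU keeps only positively activated features, which yields sparsity.
    features = [max(0.0, p) for p in pre_acts]
    # Linear decoder maps the sparse features back to activation space.
    recon = [r + b for r, b in zip(matvec(W_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, l1_coeff):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1


# Tiny hand-checkable example: 2-dimensional activation, 3 features.
x = [2.0, -3.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]

features, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
active = sum(1 for f in features if f > 0)  # L0: number of active features
loss = sae_loss(x, features, recon, l1_coeff=0.1)
print(features, recon, active, loss)
```

In this toy case only the first feature fires, so the reconstruction recovers the first coordinate and misses the second. The L1 coefficient controls the trade-off between reconstruction fidelity and how few features are active, and it is one of the hyperparameters that SAE training typically requires tuning.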