Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. Use when reverse-engineering model algorithms, studying attention patterns, or performing activation patching experiments.
Rating: 7.6 · Installs: 0 · Category: AI & LLM
Excellent mechanistic interpretability skill with comprehensive workflows for activation patching, circuit analysis, and induction head detection. The SKILL.md provides clear, executable code examples and troubleshooting guidance that would enable a CLI agent to perform complex interpretability research. Structure is well-organized with three detailed workflows, common pitfalls section, and proper delegation to reference files. The skill addresses a genuinely complex domain where a naive CLI agent would struggle significantly with the nuances of hook management, tokenization gotchas, and causal tracing experiments. Minor deductions: description could be slightly more explicit about when to invoke each workflow, and novelty is somewhat limited by the fact that this wraps an existing well-documented library (though the curated workflows and pitfall warnings add substantial value).
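The hook management and activation patching the review refers to both rest on the same mechanism: wrapping each intermediate activation in an identity "hook point" that callers can observe (to cache) or overwrite (to patch). Below is a minimal pure-Python sketch of that pattern under stated assumptions — `HookPoint`, `TinyModel`, and the hook names are illustrative stand-ins, not TransformerLens's actual classes or internals:

```python
from typing import Callable, Dict, List


class HookPoint:
    """Identity wrapper: lets callers observe or replace the value
    flowing through it (the core idea behind TransformerLens hooks)."""

    def __init__(self, name: str):
        self.name = name
        self.hooks: List[Callable] = []

    def __call__(self, value):
        for hook in self.hooks:
            out = hook(value, self)
            if out is not None:
                value = out  # a hook may replace the activation
        return value


class TinyModel:
    """Two 'layers' (simple arithmetic here) wrapped in hook points,
    mimicking how a hooked model exposes every intermediate."""

    def __init__(self):
        self.hook_mid = HookPoint("blocks.0.hook_mid")
        self.hook_out = HookPoint("blocks.1.hook_out")

    def forward(self, x: float) -> float:
        mid = self.hook_mid(x * 2)     # "layer 0"
        return self.hook_out(mid + 3)  # "layer 1"

    def run_with_cache(self, x: float):
        """Run forward while caching every hooked activation,
        then clean up the hooks (the management step the review
        warns a naive agent would get wrong)."""
        cache: Dict[str, float] = {}

        def save(value, hook):
            cache[hook.name] = value  # observe, don't modify

        for hp in (self.hook_mid, self.hook_out):
            hp.hooks.append(save)
        try:
            out = self.forward(x)
        finally:
            for hp in (self.hook_mid, self.hook_out):
                hp.hooks.remove(save)
        return out, cache


model = TinyModel()
clean_out, cache = model.run_with_cache(5)  # mid=10, out=13

# Activation patching: rerun on a "corrupted" input (0) but splice in
# the clean mid-layer activation, isolating the later layer's effect.
def patch(value, hook):
    return cache["blocks.0.hook_mid"]

model.hook_mid.hooks.append(patch)
patched_out = model.forward(0)  # mid forced to 10, so output is 13
```

In real TransformerLens code the same shape appears as `run_with_cache` and `run_with_hooks` on a `HookedTransformer`; the sketch only illustrates why hooks must be removed after use and how a cached clean activation is substituted during a corrupted run.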