OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
Rating: 8.1
Installs: 0
Category: AI & LLM
Excellent, comprehensive skill for Whisper speech recognition. The description clearly conveys the capabilities (multilingual ASR, transcription, translation) that let a CLI agent invoke it appropriately. Task knowledge is thorough, with complete code examples for basic transcription, model selection, batch processing, CLI usage, and integration patterns. The structure is logical with clear sections, though SKILL.md is somewhat lengthy and could offload some reference tables to separate files. Novelty is moderate: Whisper setup and parameter tuning can be complex, but basic transcription is relatively straightforward for an LLM agent, so the token savings are meaningful but not exceptional. The skill excels at actionable guidance (model recommendations, best practices, performance metrics) and covers edge cases well (GPU acceleration, streaming, integration). One minor improvement: the languages.md reference file appears unused in the main document and should be either linked or removed.
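For readers who want a concrete picture of the model selection and transcription workflow described above, here is a minimal sketch. It assumes the open-source `openai-whisper` Python package is installed; it is not taken from SKILL.md. The helper names `pick_model` and `transcribe_file` and the parameter-budget approach are hypothetical, introduced only for illustration. The parameter counts follow the package README (the description above quotes the 39M and 1550M endpoints), and the sketch skips `turbo` for translation because the README notes that model is not trained for that task.

```python
# Approximate parameter counts in millions, as listed in the openai-whisper README.
MODEL_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "turbo": 809,
    "large": 1550,
}


def pick_model(max_params_m: int, for_translation: bool = False) -> str:
    """Hypothetical helper: return the largest model that fits a parameter budget."""
    candidates = [
        (params, name)
        for name, params in MODEL_PARAMS_M.items()
        if params <= max_params_m
        # Assumption from the README: turbo is not trained for translation.
        and not (for_translation and name == "turbo")
    ]
    if not candidates:
        raise ValueError(f"no Whisper model fits within {max_params_m}M parameters")
    return max(candidates)[1]


def transcribe_file(path: str, max_params_m: int = 244, translate: bool = False):
    """Hypothetical wrapper: transcribe (or translate to English) one audio file."""
    import whisper  # imported lazily so pick_model works without the package

    model = whisper.load_model(pick_model(max_params_m, for_translation=translate))
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)
    # The result dict includes the full text and the detected language code.
    return result["text"], result["language"]
```

Calling `transcribe_file("episode.mp3")` (a placeholder file name) would load the `small` model under the default budget and return the transcript together with the detected language. Larger budgets trade speed and memory for accuracy.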