OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
Rating: 7.0
Installs: 0
Category: AI & LLM
Excellent skill with comprehensive coverage of Whisper's capabilities. The description clearly explains when to use it, and SKILL.md provides detailed code examples, model selection guidance, and practical usage patterns. It shows strong task knowledge, with working code for transcription, translation, batch processing, and CLI usage, and it is well structured, with logical sections and a clear progression from basics to advanced topics. Novelty is moderate to good: although Whisper is available through a simple pip install and CLI, this skill consolidates best practices, model selection logic, GPU optimization, and integration patterns that a CLI agent would otherwise spend significant tokens discovering on its own. The references/languages.md file is appropriately separated. One minor improvement would be more explicit decision trees for model selection based on constraints.
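As an illustration of what a constraint-based decision rule for model selection might look like, here is a minimal sketch. It is not taken from SKILL.md. The function name `pick_model` and the `MODEL_TABLE` layout are hypothetical, and the VRAM figures are the approximate values given in the openai/whisper README, so treat them as rough guidance rather than measured requirements. The table covers five sizes and can be extended as needed.

```python
from typing import Optional

# (name, parameters in millions, approximate VRAM in GB), smallest to largest.
# VRAM values are approximate README figures, assumed here for illustration.
MODEL_TABLE = [
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("large", 1550, 10),
]


def pick_model(vram_gb: float, english_only: bool = False) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits the budget.

    Returns None when even the smallest model does not fit.
    """
    best = None
    for name, _params_m, need_gb in MODEL_TABLE:
        if need_gb <= vram_gb:
            best = name  # table is ordered, so later matches are larger
    if best is None:
        return None
    # English-only ".en" variants exist for every size except large.
    if english_only and best != "large":
        best += ".en"
    return best


if __name__ == "__main__":
    for budget in (0.5, 1, 4, 16):
        print(budget, "GB ->", pick_model(budget))
```

If the `openai-whisper` package is installed, the returned name could then be passed to `whisper.load_model(name)`. A fuller decision tree would probably also weigh latency and accuracy targets, which this sketch leaves out.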