Configures and runs LLM evaluation using Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing llm-rubric for LLM-as-judge, or managing few-shot examples in prompts. Triggers on keywords like "promptfoo", "eval", "LLM evaluation", "prompt testing", or "model comparison".
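For reference, a minimal promptfooconfig.yaml of the kind this skill helps produce might look roughly like the sketch below. The provider, model, prompt text, rubric wording, and assertion file path are illustrative assumptions, not part of the skill itself.

```yaml
# Minimal sketch of a promptfooconfig.yaml (illustrative values only)
description: Two-sentence summarization eval
prompts:
  - "Summarize the following text in at most two sentences:\n\n{{text}}"
providers:
  - openai:gpt-4o-mini        # assumed model; substitute your own provider
tests:
  - vars:
      text: "Promptfoo is an open-source framework for testing LLM prompts."
    assert:
      - type: contains
        value: Promptfoo
      - type: llm-rubric
        value: The summary is faithful to the source text and stays within two sentences.
      - type: python
        value: file://assertions/length_check.py   # custom assertion, sketched further below
```

Running `promptfoo eval` against a config like this executes the tests; swapping the provider for `echo` (which simply returns the rendered prompt as the output) is the preview trick noted in the review below for inspecting few-shot prompts without spending model calls.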
Rating: 6.8
Installs: 0
Category: AI & LLM
Excellent skill for LLM evaluation with Promptfoo. The description clearly identifies when to use this skill (prompt testing, eval configs, custom assertions, llm-rubric, few-shot examples). SKILL.md is comprehensive and well-structured, covering configuration, prompt formats, test cases, Python assertions, LLM-as-judge, and troubleshooting. It includes practical examples such as the echo provider for preview mode, few-shot patterns, and long-text handling. The real-world example and reference to an actual project add credibility. The structure is logical with clear sections, though the document is lengthy and could move more content into separate files. Novelty is solid: setting up Promptfoo evaluations with custom Python assertions and complex few-shot patterns would consume many tokens for a CLI agent to figure out independently. Minor deductions apply to structure, for keeping all content in SKILL.md rather than splitting advanced patterns into separate guides, and to novelty, since the core Promptfoo workflow, while useful, is fairly straightforward once learned.
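As a companion to the config sketch above, a custom Python assertion for Promptfoo is a file that defines a get_assert(output, context) function and returns a boolean, a numeric score, or a grading dict. The file name and the two-sentence limit below are assumptions carried over from that sketch, not something the skill prescribes.

```python
# assertions/length_check.py — hypothetical assertion referenced in the config sketch.
# Promptfoo loads this file and calls get_assert(output, context) for each test case.

def get_assert(output: str, context) -> dict:
    """Pass only if the model output is at most two sentences long."""
    normalized = output.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    passed = len(sentences) <= 2
    return {
        "pass": passed,
        "score": 1.0 if passed else 0.0,
        "reason": f"Output has {len(sentences)} sentence(s); limit is 2.",
    }
```

Returning the dict form rather than a bare boolean keeps the reason visible in the eval report, which is useful when comparing models or prompt variants.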
