OpenAI's CLIP connects vision and language, enabling zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use it for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.
Rating: 8.1 · Installs: 0 · Category: Machine Learning
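For orientation, here is a minimal zero-shot classification sketch using the public Hugging Face `transformers` API, not the skill's own scripts; the checkpoint name and `photo.jpg` path are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# openai/clip-vit-base-patch32 is a common public checkpoint (assumption,
# not necessarily what the skill uses).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical local file
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax over the
# candidate labels yields zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```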
Excellent skill documentation for CLIP with comprehensive coverage of use cases, clear code examples, and practical guidance. The description accurately conveys capabilities for zero-shot classification, image-text matching, and semantic search. Task knowledge is strong with working examples for all major use cases including batch processing and vector database integration. Structure is clean with good organization, though the applications.md reference file appears unused. Novelty is moderate-to-good: while CLIP operations are relatively straightforward, the skill provides valuable prebuilt patterns for semantic search, content moderation, and batch processing that would otherwise require significant token overhead for an agent to construct. Best practices and performance benchmarks add practical value. Minor improvements could include more explicit CLI invocation patterns and clearer integration points with the referenced applications.md file.
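The semantic-search and vector-database pattern the review credits typically reduces to embedding images and text queries into the same space and ranking by cosine similarity. The sketch below assumes the Hugging Face `transformers` API and placeholder file paths; it is not the skill's actual batch-processing script.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Embed a small batch of images (paths are placeholders).
images = [Image.open(p) for p in ["img1.jpg", "img2.jpg"]]
image_inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
    image_emb = model.get_image_features(**image_inputs)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)  # L2-normalize

# Embed a text query the same way.
text_inputs = processor(text=["a dog playing in snow"],
                        return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# With normalized vectors, the dot product is cosine similarity; these
# embeddings can be stored directly in a vector database for retrieval.
scores = (image_emb @ text_emb.T).squeeze(-1)
print(scores.tolist())
```

Normalizing before storage means the vector store only needs inner-product search, which most vector databases support natively.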