OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.
Rating: 7.0 · Installs: 0 · Category: Machine Learning
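To make the zero-shot classification use case concrete, here is a minimal sketch assuming the Hugging Face transformers port of CLIP and the openai/clip-vit-base-patch32 checkpoint; the image path and candidate labels are illustrative placeholders, not taken from the skill itself.

```python
# Minimal zero-shot image classification sketch with CLIP via transformers.
# Checkpoint, image path, and labels are assumptions for illustration.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical local file
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# logits_per_image holds image-text similarity scores, one row per image;
# softmax over the label axis turns them into classification probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```

No fine-tuning is involved: the labels are ordinary text prompts, which is what makes the classifier reconfigurable at inference time.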
Excellent skill for CLIP, with comprehensive coverage of use cases, clear code examples, and practical guidance. The description clearly communicates when to use CLIP (zero-shot classification, image-text matching, semantic search). Task knowledge is strong, with working code for all major use cases, including batch processing and vector database integration. The structure is clean, with logical sections, though the applications.md reference is not heavily leveraged in the main document. Novelty is good: CLIP reduces the token cost of vision-language tasks that would otherwise require extensive prompting or tool chaining with a CLI agent alone. Minor improvement areas: more specific guidance on prompt engineering for better zero-shot results, and deeper integration examples drawing on the references folder content.
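On the prompt-engineering point above, one common technique is prompt ensembling: embed each class name under several templates and average the unit-normalized text embeddings. The sketch below assumes the same transformers checkpoint as above; the templates and class names are hypothetical.

```python
# Prompt-ensembling sketch: averaging text embeddings over several templates
# usually beats a single bare class name for zero-shot accuracy.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

templates = ["a photo of a {}.", "a blurry photo of a {}.", "a close-up photo of a {}."]
classes = ["cat", "dog"]  # hypothetical label set

with torch.no_grad():
    class_embeds = []
    for name in classes:
        prompts = [t.format(name) for t in templates]
        tokens = tokenizer(prompts, padding=True, return_tensors="pt")
        embeds = model.get_text_features(**tokens)
        embeds = embeds / embeds.norm(dim=-1, keepdim=True)  # unit-normalize
        class_embeds.append(embeds.mean(dim=0))              # ensemble average
    text_matrix = torch.stack(class_embeds)  # one row per class
```

Image embeddings from model.get_image_features can then be unit-normalized and compared against text_matrix by cosine similarity; the same normalized vectors are also what a vector-database integration would index.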