TacoSkill LAB
© 2026 TacoSkill LAB

nemo-curator

by davila7

179 Favorites
291 Upvotes
0 Downvotes

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

data curation

Rating: 8.1
Installs: 0
Category: AI & LLM

Quick Review

Excellent skill documentation for GPU-accelerated LLM data curation. The description clearly communicates when to use NeMo Curator versus alternatives, and the SKILL.md provides comprehensive code examples covering all major operations (filtering, deduplication, PII redaction, multi-modal processing). Task knowledge is strong, with concrete pipelines, performance benchmarks, and real-world cost comparisons (89% savings demonstrated). Structure is good, with a logical progression from basics to advanced patterns, though the single file is somewhat lengthy. Novelty is solid: the skill addresses a computationally expensive problem (data curation at TB scale) where GPU acceleration provides 10-16× speedups, meaningfully reducing both time and cost compared to CPU approaches or manual CLI operations. Minor improvements could include more modular organization and a clearer separation of quickstart and advanced patterns.
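
To illustrate what the fuzzy deduplication named in the description means in practice, here is a minimal pure-Python sketch of MinHash-based near-duplicate removal. It does not use NeMo Curator's API, which this page does not show; the function names (`shingles`, `minhash_signature`, `fuzzy_dedup`), the 3-word shingle size, the 64 hash seeds, and the 0.7 threshold are all assumptions chosen for illustration. It runs on CPU and compares documents pairwise, so it conveys the idea only, not a GPU-scale implementation.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a document (case-insensitive)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(text, num_hashes=64, k=3):
    """Build a MinHash signature: one minimum hash value per seeded hash."""
    doc_shingles = shingles(text, k)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in doc_shingles
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def fuzzy_dedup(docs, threshold=0.7):
    """Keep the first document of each near-duplicate group (O(n^2) sketch)."""
    kept, kept_sigs = [], []
    for doc in docs:
        sig = minhash_signature(doc)
        if all(estimate_similarity(sig, other) < threshold for other in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept


if __name__ == "__main__":
    corpus = [
        "GPU accelerated data curation for large language models",
        "gpu accelerated   data curation for large language models",
        "an unrelated sentence about cooking pasta at home",
    ]
    for doc in fuzzy_dedup(corpus):
        print(doc)
```

Production systems typically avoid the pairwise comparison by bucketing signatures with locality-sensitive hashing, so that only documents sharing a bucket are compared.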

LLM Signals

Description coverage: 9
Task knowledge: 9
Structure: 8
Novelty: 7

GitHub Signals

18,073
1,635
132
71
Last commit 0 days ago

Publisher

davila7

Skill Author

Related Skills

rag-architect (Jeffallan): 7.0
prompt-engineer (Jeffallan): 7.0
fine-tuning-expert (Jeffallan): 6.4
mcp-developer (Jeffallan): 6.4