llama-cpp

8.1

by davila7

197 Favorites
310 Upvotes
0 Downvotes

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
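As a rough illustration of what the skill covers, here is a minimal sketch using the llama-cpp-python bindings (one common way to drive llama.cpp); the model path and prompt are placeholders, not taken from the skill itself:

```python
# Minimal sketch, assuming llama-cpp-python is installed
# (pip install llama-cpp-python) and a quantized GGUF model is on disk.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-7b.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=2048,       # context window
    n_gpu_layers=-1,  # offload all layers to the GPU backend the library
                      # was built with (Metal/CUDA/ROCm); 0 = pure CPU
)

out = llm(
    "Q: What is GGUF? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```

The same `n_gpu_layers` knob applies whether the build targets Apple Silicon (Metal), NVIDIA (CUDA), or AMD (ROCm), since the backend is chosen at compile time.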

Tag: llm-inference

Rating: 8.1
Installs: 0
Category: AI & LLM

Quick Review

Excellent skill for CPU/non-NVIDIA LLM inference. The description clearly states when to use llama.cpp vs alternatives (TensorRT-LLM, vLLM), making it easy for a CLI agent to decide. Task knowledge is comprehensive with concrete installation, quantization, and hardware acceleration commands. Structure is good with a well-organized main file and logical reference separation. Novelty is solid: while CLI agents could eventually figure out llama.cpp setup, this skill consolidates platform-specific commands (Metal/CUDA/ROCm), quantization trade-offs, and performance benchmarks that would require significant exploration and token usage to discover independently. Minor deduction on novelty because basic llama.cpp usage is relatively straightforward once discovered.
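To make the quantization trade-off concrete, here is a back-of-the-envelope memory estimate; the bits-per-weight figures are approximate values commonly quoted for llama.cpp quant formats, not numbers taken from this skill, and real file sizes also include embeddings and metadata.

```python
# Rough weight-memory estimate for a 7B-parameter model at several
# GGUF quantization levels (approximate bits per weight; the KV cache
# and activations add to this at run time).
PARAMS = 7e9

bits_per_weight = {
    "F16": 16.0,     # unquantized half precision
    "Q8_0": 8.5,     # 8-bit blocks with per-block scales
    "Q4_K_M": 4.85,  # popular quality/size middle ground
    "Q2_K": 2.6,     # aggressive, noticeable quality loss
}

for name, bpw in bits_per_weight.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name:>7}: ~{gib:.1f} GiB")
# ~13.0 GiB at F16 vs ~4.0 GiB at Q4_K_M: roughly a 3x reduction,
# which is what lets a 7B model fit in 8 GB of RAM on consumer hardware.
```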

LLM Signals

Description coverage: 9
Task knowledge: 9
Structure: 8
Novelty: 7

GitHub Signals

18,073
1,635
132
71
Last commit: today


Publisher

davila7 (Skill Author)

Related Skills

rag-architect by Jeffallan (7.0)
prompt-engineer by Jeffallan (7.0)
fine-tuning-expert by Jeffallan (6.4)
mcp-developer by Jeffallan (6.4)