TacoSkill LAB
© 2026 TacoSkill LAB

sglang

7.6

by zechenzhangAGI

140 Favorites
320 Upvotes
0 Downvotes

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
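The prefix-sharing idea behind RadixAttention can be illustrated with a toy sketch. This is pure Python for illustration, not SGLang internals: integer token IDs and a plain trie stand in for real tokenizer output and cached KV state.

```python
# Illustrative sketch (not SGLang internals): a radix-style prefix cache
# lets requests that share a prompt prefix reuse cached computation.
class RadixCache:
    """Maps token sequences into a simple trie of nested dicts."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Record a processed token sequence in the cache."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            matched += 1
        return matched

cache = RadixCache()
system_prompt = [1, 2, 3, 4]            # shared system/prompt tokens
cache.insert(system_prompt + [10, 11])  # first request fills the cache

# A second request with the same system prompt skips recomputing it;
# only the new suffix [20, 21] would need prefill.
hit = cache.match_prefix(system_prompt + [20, 21])
print(hit)  # 4 cached tokens reused
```

In the real system the trie nodes hold KV-cache tensors and an eviction policy, but the matching logic follows the same shape: the longer the shared prefix across requests, the less prefill work per request.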

llm-serving

Rating: 7.6

Installs: 0

Category: AI & LLM

Quick Review

Exceptional skill for SGLang serving with comprehensive coverage of structured generation, RadixAttention caching, and production deployment. The description clearly identifies when to use SGLang vs. alternatives (vLLM, TensorRT-LLM). Provides complete code examples for JSON/regex/grammar outputs, agent workflows, multi-turn chats, and OpenAI-compatible API usage. Includes performance benchmarks showing 5-10× speedups. Structure is excellent with concise SKILL.md and detailed references. High novelty: RadixAttention prefix caching and constrained decoding are complex features that would require many tokens for a CLI agent to implement. Minor improvement possible: could add error handling patterns and edge cases.
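As a taste of the constrained decoding the review describes, here is a minimal sketch of the request payload an SGLang server's native `/generate` endpoint accepts for regex-constrained output. Field names follow SGLang's HTTP API docs; verify them against your server version, and note the prompt and pattern are illustrative placeholders.

```python
# Hedged sketch: build a regex-constrained request for an SGLang
# server's native /generate endpoint (check field names against the
# SGLang version you run).
import json

def regex_request(prompt: str, pattern: str, max_new_tokens: int = 64) -> dict:
    """Build a /generate payload whose output must match a regex."""
    return {
        "text": prompt,
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "regex": pattern,  # decoder only emits tokens matching this
        },
    }

payload = regex_request(
    "Q: What is the IP address of localhost? A:",
    r"(\d{1,3}\.)\d{1,3}\.\d{1,3}\.\d{1,3}",
)
print(json.dumps(payload, indent=2))

# Sending it requires a running server, e.g. (not executed here):
#   urllib.request.urlopen("http://localhost:30000/generate",
#                          data=json.dumps(payload).encode())
```

Because the constraint lives in `sampling_params`, the same pattern works whether the request comes from the native API or is mapped through the OpenAI-compatible endpoint the review mentions.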

LLM Signals

Description coverage: 9
Task knowledge: 10
Structure: 9
Novelty: 9

GitHub Signals

891
74
19
2
Last commit: today

Publisher

zechenzhangAGI

Skill Author




Related Skills

rag-architect by Jeffallan (7.0)
prompt-engineer by Jeffallan (7.0)
fine-tuning-expert by Jeffallan (6.4)
mcp-developer by Jeffallan (6.4)