Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use it for JSON/regex outputs, constrained decoding, and agentic workflows with tool calls, or when you need inference up to 5× faster than vLLM on workloads with heavy prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
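As a concrete illustration of the structured-output use case named above, here is a minimal sketch of JSON-constrained decoding against an SGLang server's OpenAI-compatible endpoint. It assumes a server already launched with "python -m sglang.launch_server --model-path <model> --port 30000"; the port, model name, schema, and prompt are illustrative assumptions, not taken from the skill itself.

# Minimal sketch: JSON-constrained decoding via SGLang's OpenAI-compatible API.
# Assumes a running SGLang server on localhost:30000; names below are illustrative.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Schema the server enforces during decoding, so the reply is guaranteed to
# parse as JSON matching it (constrained decoding, not post-hoc validation).
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["name", "population"],
}

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Give one fact card for Paris."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "city_info", "schema": schema},
    },
)

print(json.loads(response.choices[0].message.content))

Because repeated requests like this share their prompt prefix, RadixAttention can reuse cached KV entries across calls, which is where the prefix-sharing speedups cited in the description come from.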
Rating: 7.6 · Installs: 0 · Category: AI & LLM
Exceptional skill for SGLang serving, with comprehensive coverage of structured generation, RadixAttention caching, and production deployment. The description clearly identifies when to use SGLang versus alternatives (vLLM, TensorRT-LLM), and the skill provides complete code examples for JSON/regex/grammar outputs, agent workflows, multi-turn chats, and OpenAI-compatible API usage, along with performance benchmarks showing 5-10× speedups. The structure is excellent, with a concise SKILL.md and detailed reference files. High novelty: RadixAttention prefix caching and constrained decoding are complex features that would cost a CLI agent many tokens to implement from scratch. Minor improvement possible: it could add error-handling patterns and edge-case coverage.