Lessons from Building Claude Code: How Skills Are Changing AI Engineering
Thariq Shihipar breaks down how Claude Code's skill system works, and what it reveals about where AI-assisted engineering is heading.
Thariq Shihipar from Anthropic walks through the architecture behind Claude Code's skill system — and it's a more substantive engineering talk than the title suggests.
Skills as Environments, Not Macros
The key mental model shift: skills in Claude Code aren't text expansions or prompt templates. They're isolated execution environments with their own tool access, context, and failure modes. A skill can invoke other tools, maintain intermediate state, and return structured output that the parent agent consumes.
This is a meaningful architectural distinction. Most people building on top of LLMs treat prompts as the primary unit of composition. The Claude Code team treats environments as the primary unit.
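To make the "environment, not macro" idea concrete, here is a minimal Python sketch. It is not Claude Code's actual API: SkillEnvironment, SkillResult, and count_todos are hypothetical names invented for illustration. It only shows the three properties described above: scoped tool access, intermediate state, and structured output for the parent to consume.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SkillResult:
    """Structured output handed back to the parent agent (hypothetical shape)."""
    ok: bool
    output: dict[str, Any]
    trace: list[str] = field(default_factory=list)


@dataclass
class SkillEnvironment:
    """Hypothetical isolated environment: its own tools, state, and call trace."""
    name: str
    tools: dict[str, Callable[..., Any]]
    state: dict[str, Any] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)

    def call(self, tool_name: str, *args: Any) -> Any:
        # A skill can only reach tools that were placed in its environment.
        if tool_name not in self.tools:
            raise PermissionError(f"{self.name} has no access to tool {tool_name!r}")
        self.trace.append(tool_name)
        return self.tools[tool_name](*args)


def count_todos(env: SkillEnvironment, paths: list[str]) -> SkillResult:
    """Example skill: counts TODO markers, keeping per-file counts as state."""
    for path in paths:
        text = env.call("read_file", path)
        env.state[path] = text.count("TODO")
    total = sum(env.state.values())
    return SkillResult(
        ok=True,
        output={"total": total, "per_file": dict(env.state)},
        trace=list(env.trace),
    )


# In-memory stand-in for a file system, so the sketch runs anywhere.
files = {"a.py": "# TODO one\n# TODO two\n", "b.py": "print('done')\n"}
env = SkillEnvironment(name="count_todos", tools={"read_file": files.__getitem__})
result = count_todos(env, list(files))
```

The parent never sees the skill's intermediate state directly. It receives result.output and result.trace, which is what separates this from a prompt template expanded inline.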
The Nine Skill Types
The taxonomy Thariq presents maps to different trust levels and execution patterns:
- Read-only skills: can read files, search code, fetch URLs — no side effects
- Write skills: can modify files, create branches, run tests
- Composite skills: orchestrate multiple sub-skills with shared context
- Background skills: fire-and-forget for long-running tasks
- Verification skills: validate that a previous action had the intended effect
- Research skills: web search + synthesis, designed to minimize hallucination on lookup tasks
- Setup skills: configure environments before main task execution
- Teardown skills: cleanup and checkpointing
- Human-in-the-loop skills: pause and surface decisions that require explicit approval
The read/write boundary is enforced at the environment level, not just by convention. Write skills require explicit permission grants; read-only skills can be trusted in broader contexts.
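One way to picture enforcement at the environment level rather than by convention is a capability check in the object the skill uses for all file access. This is a sketch under assumptions: Capability and Workspace are hypothetical names, and the talk summary does not describe how Claude Code implements its permission grants.

```python
from enum import Enum, auto


class Capability(Enum):
    READ = auto()
    WRITE = auto()


class Workspace:
    """Hypothetical file access layer that checks grants on every operation."""

    def __init__(self, files, granted):
        self._files = dict(files)
        self._granted = frozenset(granted)

    def read(self, path):
        self._require(Capability.READ)
        return self._files[path]

    def write(self, path, text):
        self._require(Capability.WRITE)
        self._files[path] = text

    def _require(self, capability):
        # The check lives in the environment, so a skill cannot skip it.
        if capability not in self._granted:
            raise PermissionError(f"{capability.name} was not granted to this skill")


# A read-only skill receives a workspace without the WRITE grant.
read_only = Workspace({"README": "hello"}, granted={Capability.READ})

# A write skill needs the grant to be passed explicitly.
read_write = Workspace({"README": "hello"}, granted={Capability.READ, Capability.WRITE})
read_write.write("README", "hello, world")
```

Because the skill only ever holds a Workspace, a read-only skill that attempts a write fails with PermissionError regardless of what its instructions say.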
Progressive Disclosure
One of the more interesting design choices: skills expose a minimal interface by default. The caller specifies a goal; the skill handles the how. Over-specification (telling the skill exactly how to accomplish the task) tends to make skills brittle.
This mirrors a pattern in good API design: expose behavior, not implementation. The skill contract is: given this intent, produce this result. The implementation is internal.
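The "intent in, result out" contract can be sketched as a small interface. The names here (Intent, Outcome, Skill, FindSymbol) are assumptions made for illustration, not part of Claude Code. The caller states what it wants, and the choice between search strategies stays inside the skill.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Intent:
    goal: str    # what the caller wants, in plain language
    target: str  # the subject of the goal, here a symbol name


@dataclass(frozen=True)
class Outcome:
    summary: str
    artifacts: tuple[str, ...] = ()


class Skill(Protocol):
    """The whole public contract: given this intent, produce this result."""

    def run(self, intent: Intent) -> Outcome: ...


class FindSymbol:
    """Hypothetical skill; how it searches is an internal detail."""

    def __init__(self, sources: dict[str, str]) -> None:
        self._sources = sources

    def run(self, intent: Intent) -> Outcome:
        # Strategy selection is hidden from the caller: try exact
        # definitions first, then fall back to plain mentions.
        hits = self._definitions(intent.target) or self._mentions(intent.target)
        summary = f"{intent.target}: {len(hits)} file(s) matched"
        return Outcome(summary=summary, artifacts=tuple(hits))

    def _definitions(self, symbol: str) -> list[str]:
        needle = f"def {symbol}("
        return sorted(p for p, text in self._sources.items() if needle in text)

    def _mentions(self, symbol: str) -> list[str]:
        return sorted(p for p, text in self._sources.items() if symbol in text)


sources = {
    "config.py": "def parse_config(path):\n    return {}\n",
    "main.py": "from config import parse_config\n",
}
skill: Skill = FindSymbol(sources)
defined = skill.run(Intent(goal="find where this is defined", target="parse_config"))
mentioned = skill.run(Intent(goal="find where this is used", target="config"))
```

If the caller had instead passed a flag such as "use exact-definition search", the fallback path could not be added or changed later without breaking callers, which is the brittleness that over-specification introduces.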
Failure-Driven Design
For practitioners building their own skill-like systems, the most useful part of the talk is this: the team spent more time designing failure modes than success paths.
What happens when a skill partially completes? When it encounters something unexpected? When it would need to make a destructive decision? The Claude Code answer is: surface the decision rather than make it. A skill that fails loudly and traceably is more useful than one that handles edge cases silently and incorrectly.
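"Surface the decision rather than make it" can be sketched as a skill that returns an explicit decision request instead of guessing. Everything below is a hypothetical illustration (NeedsDecision, Done, and cleanup_branches are invented names, and nothing is actually deleted). It does the safe part of the work, then reports what is complete and what is waiting on approval.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Done:
    completed: list[str]


@dataclass
class NeedsDecision:
    """Partial result: records what finished and what is blocked on a human."""
    question: str
    options: tuple[str, ...]
    completed: list[str]
    pending: list[str]


def cleanup_branches(
    branches: dict[str, bool],
    approved: frozenset[str] = frozenset(),
) -> Done | NeedsDecision:
    """Simulated cleanup. `branches` maps branch name to whether it is merged.

    Merged branches are safe to remove. Removing an unmerged branch is
    destructive, so it happens only if the caller has approved it by name.
    """
    completed: list[str] = []
    pending: list[str] = []
    for name, merged in branches.items():
        if merged or name in approved:
            completed.append(name)  # stand-in for the real deletion
        else:
            pending.append(name)

    if pending:
        # Fail loudly and traceably instead of silently picking an answer.
        return NeedsDecision(
            question="Delete unmerged branches? Their commits may be lost.",
            options=("delete", "keep"),
            completed=completed,
            pending=pending,
        )
    return Done(completed=completed)


branches = {"feature-a": True, "experiment": False}
first = cleanup_branches(branches)
second = cleanup_branches(branches, approved=frozenset({"experiment"}))
```

The first call returns a NeedsDecision listing "experiment" as pending, so the parent agent or a human can answer the question and call again. Only the second call, with explicit approval, returns Done.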
For production AI systems in regulated environments, this framing matters. The question isn't just "does the skill work?" but "when it fails, does it fail in a way I can understand and recover from?"