Lessons from Building Claude Code: How Skills Are Changing AI Engineering

2026-03-23

Thariq Shihipar breaks down how Claude Code's skill system works - and what it reveals about where AI-assisted engineering is heading.

Thariq Shihipar from Anthropic walks through the architecture behind Claude Code's skill system - and it's a more substantive engineering talk than the title suggests.

Skills as Environments, Not Macros

The key mental model shift: skills in Claude Code aren't text expansions or prompt templates. They're isolated execution environments with their own tool access, context, and failure modes. A skill can invoke other tools, maintain intermediate state, and return structured output that the parent agent consumes.

This is a meaningful architectural distinction. Most people building on top of LLMs treat prompts as the primary unit of composition. The Claude Code team treats environments as the primary unit.
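To make the distinction concrete, here is a minimal sketch of what "environment as the unit of composition" could look like. It is not Claude Code's actual implementation; `SkillEnvironment`, `SkillResult`, and `count_todos` are hypothetical names invented for illustration, and the in-memory file dictionary stands in for a real filesystem.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SkillResult:
    """Structured output handed back to the parent agent (hypothetical shape)."""
    ok: bool
    output: Dict[str, Any]
    trace: List[str] = field(default_factory=list)


@dataclass
class SkillEnvironment:
    """Hypothetical isolated environment: its own tools, state, and trace."""
    name: str
    tools: Dict[str, Callable[..., Any]]
    state: Dict[str, Any] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)

    def call(self, tool_name: str, *args: Any) -> Any:
        # A tool that was never granted to this environment cannot be called.
        if tool_name not in self.tools:
            raise PermissionError(f"{self.name} has no access to {tool_name!r}")
        self.trace.append(tool_name)
        return self.tools[tool_name](*args)


def count_todos(env: SkillEnvironment, paths: List[str]) -> SkillResult:
    """Example skill: uses a tool, keeps intermediate state, returns structure."""
    for path in paths:
        text = env.call("read_file", path)
        env.state[path] = text.count("TODO")  # intermediate state per file
    return SkillResult(
        ok=True,
        output={"todo_count": sum(env.state.values()), "per_file": dict(env.state)},
        trace=list(env.trace),
    )


# Stand-in for a filesystem so the sketch runs anywhere.
fake_files = {"a.py": "# TODO: fix\nx = 1\n", "b.py": "# TODO one\n# TODO two\n"}


def read_file(path: str) -> str:
    return fake_files[path]


env = SkillEnvironment(name="count_todos", tools={"read_file": read_file})
result = count_todos(env, ["a.py", "b.py"])
print(result.output)
```

The parent agent only ever sees `result`. The tool calls, the per-file counts, and the trace all live inside the environment, which is the property a prompt template does not give you.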

The Nine Skill Types

The taxonomy Thariq presents maps to different trust levels and execution patterns:

Read-only skills: can read files, search code, fetch URLs - no side effects
Write skills: can modify files, create branches, run tests
Composite skills: orchestrate multiple sub-skills with shared context
Background skills: fire-and-forget for long-running tasks
Verification skills: validate that a previous action had the intended effect
Research skills: web search + synthesis, designed to minimize hallucination on lookup tasks
Setup skills: configure environments before main task execution
Teardown skills: cleanup and checkpointing
Human-in-the-loop skills: pause and surface decisions that require explicit approval

The read/write boundary is enforced at the environment level, not just by convention. Write skills require explicit permission grants; read-only skills can be trusted in broader contexts.
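One way to picture environment-level enforcement: a skill's toolset is built from the capabilities it was granted, so a read-only skill never receives a write tool in the first place. The sketch below is an assumption about how such a boundary might be modeled, not the mechanism described in the talk; `Capability`, `TOOL_REGISTRY`, and `build_toolset` are hypothetical names.

```python
from enum import Flag, auto
from typing import Callable, Dict, Tuple


class Capability(Flag):
    READ = auto()
    WRITE = auto()


# In-memory stand-in for a workspace.
store: Dict[str, str] = {"README.md": "hello"}


def read_file(path: str) -> str:
    return store[path]


def write_file(path: str, text: str) -> None:
    store[path] = text


# Hypothetical registry: tool name -> (capability it requires, function).
TOOL_REGISTRY: Dict[str, Tuple[Capability, Callable]] = {
    "read_file": (Capability.READ, read_file),
    "write_file": (Capability.WRITE, write_file),
}


def build_toolset(granted: Capability) -> Dict[str, Callable]:
    """Return only the tools whose required capability was granted."""
    return {
        name: fn
        for name, (needed, fn) in TOOL_REGISTRY.items()
        if needed in granted
    }


read_only = build_toolset(Capability.READ)
read_write = build_toolset(Capability.READ | Capability.WRITE)

print(sorted(read_only))
print(sorted(read_write))
```

Because the write tool is absent from `read_only` rather than merely discouraged, a read-only skill cannot produce side effects even if its instructions go wrong.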

Progressive Disclosure

One of the more interesting design choices: skills expose a minimal interface by default. The caller specifies a goal; the skill handles the how. Over-specification (telling the skill exactly how to accomplish the task) tends to make skills brittle.

This mirrors a pattern in good API design: expose behavior, not implementation. The skill contract is: given this intent, produce this result. The implementation is internal.
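A small sketch of that contract, under the assumption that intent and result are plain typed values. `FindUsages`, `Usages`, `UsageSkill`, and `RegexUsageSkill` are hypothetical names; the caller states what it wants and never chooses between regex, AST walking, or an index.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class FindUsages:
    """Intent: what the caller wants, with no hint about how to do it."""
    symbol: str


@dataclass(frozen=True)
class Usages:
    """Result: locations formatted as 'path:line'."""
    locations: List[str]


class UsageSkill(Protocol):
    """The whole public contract: given this intent, produce this result."""
    def run(self, intent: FindUsages) -> Usages: ...


class RegexUsageSkill:
    """One possible implementation; it could be swapped without touching callers."""

    def __init__(self, files: Dict[str, str]) -> None:
        self._files = files

    def run(self, intent: FindUsages) -> Usages:
        pattern = re.compile(rf"\b{re.escape(intent.symbol)}\b")
        hits: List[str] = []
        for path in sorted(self._files):
            lines = self._files[path].splitlines()
            for lineno, line in enumerate(lines, start=1):
                if pattern.search(line):
                    hits.append(f"{path}:{lineno}")
        return Usages(hits)


files = {
    "app.py": "total = add(1, 2)\nprint(total)\n",
    "lib.py": "def add(a, b):\n    return a + b\n",
}
skill: UsageSkill = RegexUsageSkill(files)
found = skill.run(FindUsages("add"))
print(found.locations)
```

If the caller had instead passed a regex, every future implementation would be stuck honoring it. Keeping the interface at the level of intent is what leaves room to change the how.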

Failure-Driven Design

The part of the talk that's most useful for practitioners building their own skill-like systems: the team spent more time designing failure modes than success paths.

What happens when a skill partially completes? When it encounters something unexpected? When it would need to make a destructive decision? The Claude Code answer is: surface the decision rather than make it. A skill that fails loudly and traceably is more useful than one that handles edge cases silently and incorrectly.

For production AI systems in regulated environments, this framing matters. The question isn't just "does the skill work?" but "when it fails, does it fail in a way I can understand and recover from?"
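"Surface the decision rather than make it" can be sketched as a result type that records what was completed, what was left alone, and the question the caller has to answer. This is an illustrative assumption, not the talk's code; `Status`, `Report`, and `clean_build_dir` are hypothetical, and deletion is simulated by recording names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import AbstractSet, List, Optional


class Status(Enum):
    DONE = "done"
    NEEDS_DECISION = "needs_decision"


@dataclass
class Report:
    status: Status
    completed: List[str] = field(default_factory=list)  # work actually done
    pending: List[str] = field(default_factory=list)    # work left untouched
    question: Optional[str] = None                      # decision for the caller


def clean_build_dir(
    entries: List[str], approved: AbstractSet[str] = frozenset()
) -> Report:
    """Remove known temp files; surface anything unexpected instead of guessing."""
    report = Report(Status.DONE)
    for name in entries:
        if name.endswith(".tmp") or name in approved:
            report.completed.append(name)  # stand-in for the real deletion
        else:
            report.pending.append(name)    # destructive and unexpected: do not act
    if report.pending:
        report.status = Status.NEEDS_DECISION
        report.question = f"Delete unexpected entries? {report.pending}"
    return report


first = clean_build_dir(["a.tmp", "notes.md", "b.tmp"])
print(first.status, first.completed, first.question)

# After an explicit approval, the caller retries only the pending work.
second = clean_build_dir(first.pending, approved={"notes.md"})
print(second.status, second.completed)
```

The partial completion is visible in `completed`, the skipped work in `pending`, and nothing destructive happens without an explicit approval, so a failure leaves a record you can read and resume from.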
