A Skill Isn't a Prompt. It's Documentation.

Updates on structured prompting

Jun 15, 2026

A few weeks ago I posted an upgrade to the prompt framework I’ve been teaching for two years.

Tasks → Skills
Context → Knowledge
Content → Materials

The symmetry is nice, but it doesn’t quite work out in reality.

I’d been describing a skill as the evolution of the task, but a task was a one-shot instruction. A skill is a reusable document an agent invokes across situations.

True, as far as it goes. But then I opened one of my own skill files and started labeling what was actually in it … and the task turned out to be the smallest part of the document.

Though this is a narrow analogy, I think this is where “writing for machines” is going as AI workflows become both more complex and more accessible.

I’ll be presenting on this soon with Scott Abel (with a recording available).

➡️ Check it out here.

The same structure that makes content reusable makes it testable.

What’s actually inside a skill

Here is a quick review if you haven’t been tracking this.

In December, Anthropic released Agent Skills as an open standard, and other platforms picked it up, which means its pretty close to a universal format.

Simply put, skills are text files that AI models reference for tasks: YAML frontmatter carrying machine-readable metadata, Markdown instructions for the model, and whatever scripts and reference assets the work requires.

➡️ You can see one of my examples here.

Read that back as a content professional. A file with metadata and managed assets.

Sound familiar? It should. That’s documentation.

Here’s what I found when I labeled my own weekly-class-plan skill, the file I use to build the weekly plans I send students.

Five information types. One document. The task was in there, but it arrived bundled with the concepts, references, and principles the model needs to execute without me supervising every step.

That is what a skill actually is. Not an evolved prompt, but a complete piece of structured documentation, written for an audience that happens to be a machine.

Why the typing matters more now than it did last year

You might see this as a cute exercise … except that the newer models keep locating failure exactly where structure is missing.

For a sense of scale, consider AgentIF, a 2025 benchmark that tested how well leading models followed long, real-world agentic instructions. These are the extended system prompts and tool specs that skills are made of.

The best models at the time perfectly followed fewer than a third of those instructions, and performance fell off a cliff once the instructions ran long. The research methodology has weak spots worth their own post, and the frontier has moved several generations since, so I’d treat the number as illustrative rather than proof.

But the basic problem still exists.

A good example is Claude’s new releases that come with a warning: The model now takes your instructions literally, where earlier models interpreted them loosely or quietly skipped parts. Re-tune your prompts before you upgrade.

The best models at the time perfectly followed fewer than a third of those instructions, and performance fell off a cliff once the instructions ran long.

This means your context, prompts, and skills need to be more precise, not less.

Better instruction following means the model does exactly what you wrote, including the parts you didn’t mean to write. The reliability problem is shifting from “the model can’t follow” to “the model follows precisely, and you didn’t say what you thought you said.”

Now picture an untyped skill file through that lens. Everything written as a directive. A conceptual explanation bleeding into the middle of a procedure. Gaps the model quietly fills from training data instead of from your intent.

I’ve started calling these failure modes type collapse, type contamination, and missing types. The better the model gets at following instructions, the more visible they become, because the model stops covering for them.

The fix isn’t a better prompt, but better context … the kind content professionals and many writers have practiced for decades.

One type per block.
Prerequisites before steps.
Facts stated as facts.
Guidance scoped and bounded.

The model responds to those boundaries because human writing has always carried them, and the model was shaped on human writing. That’s becoming even more important.

The part I’m saving for the webinar

There’s a second payoff to typed content that matters more than output quality. Typing tells you how to evaluate the output.

Reference and Task content is largely verifiable. The URL matches or it doesn’t. The steps complete or they don’t.

Concept and Principle content takes judgment. Is the explanation accurate, is the guidance scoped correctly.

The evaluation research has converged on exactly this split, pairing probablistic checks with rubric review, and typed content tells you which mode applies to which block.

The same structure that makes content reusable makes it testable. That’s one thing I’ll be talking about in my next Content Wrangler webinar.

The session is Structured Prompting 2.0: What the Research Actually Says, hosted by Scott Abel on June 16 at 1:30 PM ET. If you’ve missed it, there will be a recording.

If you or your team is building skills, agent instructions, or AI knowledge bases right now, this is the hour that connects what you already know how to do to the artifacts that suddenly need it.

Thanks for reading Cyborgs Writing! This post is public so feel free to share it.

Discussion about this post

Ready for more?