
Do Prompts Really Need Markup?

Deep Reading, Episode 8

If you’ve taken any course on prompt design, including mine, you’ve probably been told to use markup in some way.

This might be markdown, XML, or, in my case, semantic tags.

➡️ See this free lesson from my Writing with Machines course to learn more.

These function as labels, for both machines and humans, that help organize your prompt into sections, for example [ROLE], [CONTEXT], [TASK].
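To make that concrete, here is a minimal sketch of what a section-labeled prompt might look like. The section names come from above; the wording inside each section is my own illustration, not from any course material:

```python
# A hypothetical prompt using bracketed semantic tags to separate sections.
prompt = """\
[ROLE]
You are an experienced technical editor.

[CONTEXT]
The draft below is a blog post aimed at content professionals.

[TASK]
Rewrite the draft for clarity and keep it under 500 words.
"""

# The tags are labels, not magic: they just make each section easy to find,
# for you and for the model.
sections = [line for line in prompt.splitlines() if line.startswith("[")]
print(sections)  # ['[ROLE]', '[CONTEXT]', '[TASK]']
```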

I’ve taught this. I still use tags in my own work when creating reusable prompts.

… And I get asked constantly whether they’re actually necessary anymore, especially now that models keep getting more capable.

I’m Lance Cummings. And welcome to my intermittent (or aspirationally biweekly) podcast that explores deep research on AI and writing.

That question got me digging into recent research on prompt structure and performance, and what I found reframes the conversation a bit.

The tags aren’t really the point. Specificity is the point. The tags just help us get there.

But as we move from prompt design into what Anthropic now calls context engineering, tags may matter more than you think. Just not for the reasons you’d expect.

The Real Question

Here’s what most people mean when they ask about semantic tags.

Does the AI actually perform better when I label parts of my prompts? Does [GOAL] do something that “I want to …” doesn’t?

We have good research on this now. Sclar and colleagues at ICLR 2024 tested how formatting preserves meaning across 53 tasks and found that formatting alone could swing accuracy dramatically, but the best format for one model wasn’t the best for another.

Different models often prefer different structures. This is why you should test your prompts as a team … and not just go by gut.

But if you’re looking for a formatting rule that works everywhere, there isn’t one. That’s a dead end.

But the research did find something that works everywhere, and it’s not about format at all.

It’s All About Specificity

Pecher and colleagues published a study in February 2025 investigating why small changes to prompts produce wildly different outputs. They traced most of it back to a single cause: prompt underspecification.

Not format. Not tags.

The prompts that produced erratic results were prompts that didn’t clearly describe the task, the constraints, or the expected output. Well-specified prompts suffered dramatically less from sensitivity, regardless of formatting choices.

Think of it like giving directions.

“Go to the store” is underspecified. You might end up at a grocery store, a hardware store, or a convenience store three blocks away.

But with “drive to the Harris Teeter on College Road, pick up two pounds of ground beef from the butcher counter, and use the self-checkout,” the format barely matters. The task is embedded in the sentence itself and will constrain the output whether you text it, email it, or scribble it on a sticky note.

This maps directly onto the three-component model I teach: Task, Context, Content.

Those three categories were never about the brackets. They were about forcing you to answer three separate questions:

  • What do I want the AI to do?

  • What does it need to know about the situation?

  • And what source material should it work with?

The tags were one way to organize those answers. A useful way. But the answers themselves are what drive performance.
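One way to force yourself to answer all three questions is to refuse to send a prompt until every component is filled in. Here is a sketch of that idea; the function name and example wording are mine, not part of the three-component model itself:

```python
def build_prompt(task: str, context: str, content: str) -> str:
    """Assemble a prompt from Task, Context, and Content.

    Raises if any component is blank, catching underspecification
    before the model ever sees the prompt.
    """
    for name, value in [("task", task), ("context", context), ("content", content)]:
        if not value.strip():
            raise ValueError(f"Underspecified prompt: missing {name}")
    return f"Task: {task}\n\nContext: {context}\n\nContent:\n{content}"

prompt = build_prompt(
    task="Summarize the ticket in two sentences for a weekly report.",
    context="The audience is a non-technical operations manager.",
    content="Ticket #4521: Checkout page times out for carts over 20 items.",
)
```

Whether the three answers end up wrapped in brackets, XML, or plain paragraphs is secondary; the check is that all three exist.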

What About New Reasoning Models?

Now, I should complicate this, because the landscape has shifted.

Tam and colleagues showed at EMNLP 2024 that forcing structured output formats significantly degraded reasoning.

Imagine you ask a colleague to analyze a customer support problem and give you their recommendation.

Normally, they’d read through the tickets, notice some patterns, and reason their way to a conclusion.

Now imagine instead you hand them a form — fill in the “Recommendation” field first, then the “Reasoning” field.

That’s essentially what happened when models were forced to produce structured output like JSON or XML. The model placed the answer before the reasoning, skipping the step where it works through the problem.

Their solution was a two-step approach: reason in natural language first, then convert to structured format.
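That two-step approach can be sketched in a few lines. Here `ask_model` is a placeholder standing in for whatever model call you actually use; the canned responses only exist so the example runs on its own:

```python
def ask_model(prompt: str) -> str:
    """Placeholder for a real model call; returns canned answers for the demo."""
    if "JSON" in prompt:
        return '{"recommendation": "add self-service password reset"}'
    return ("Most tickets are password resets, so support load is driven by "
            "login friction. Recommendation: add self-service password reset.")

# Step 1: let the model reason freely in natural language.
analysis = ask_model("Read these support tickets and reason your way "
                     "to a recommendation: ...")

# Step 2: only after the reasoning is done, convert it to a structured format.
structured = ask_model("Convert this analysis to JSON with a "
                       f"'recommendation' key:\n{analysis}")
```

The point of the split is ordering: the format constraint is applied to finished reasoning, not imposed while the reasoning is still happening.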

Here’s what’s changed since then, though.


Reasoning models now reason internally before generating output. Claude 4 models use what Anthropic calls “extended thinking.” They work through the problem behind the scenes, then produce the response. The model handles that “reason first, format second” step on its own.

Does that make the finding obsolete? Not entirely.

For content professionals working with structured authoring like DITA, XML schemas, and technical documentation, the principle still holds for how you write your prompts.

You’ll get better content by describing what you want in natural language and letting the model generate the substance, rather than forcing a rigid format from the start.

The reasoning models are better at this than their predecessors, but the content still benefits from clear, natural-language instructions.

Structure your context, not your commands.

From Prompt Engineering to Context Engineering

And that phrase — structure your context — is where tags become more important, not less.

In September 2025, Anthropic published a piece on what they call “context engineering.” Building with language models is becoming less about finding the right words for your prompts and more about curating the right configuration of context.

Context here means the full set of information the model sees at any given moment: your prompt, yes, but also tools, documents, conversation history, reference material, and system instructions.

This is where my own practice has evolved. I actually use more XML-style tags now than I did a year ago. Not fewer.

This is for two reasons.

First, I work primarily in Claude, and Anthropic still explicitly recommends XML tags in their current documentation for Opus 4.6 and Sonnet 4.6.

They’re clear that there are no magic tag names — <instructions> doesn’t outperform <my_rules> — but XML as a delimiter system helps Claude parse complex prompts. That’s a model-specific advantage, not a universal rule.


Second, most of what I’m putting into prompts these days isn’t instructions. It’s content.

Course materials, style guides, reference documents, background research.

When you’re loading a context window with thousands of tokens of source material, tags become boundaries between what the AI should read and what it should do. They’re separating content from instruction, not labeling instruction blocks.
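Separating content from instruction might look like this. The tag names below are arbitrary, which is exactly the point; what matters is that the boundaries between reading material and the actual instruction are unambiguous:

```python
# Illustrative reference material; in practice this might be thousands of
# tokens of style guides, research, or course documents.
style_guide = "Use sentence case for headings. Avoid jargon."
draft = "IMPORTANT UPDATES To Our API"

# Tags mark what the model should read (reference material) versus
# what it should do (the single instruction at the end).
context = f"""\
<style_guide>
{style_guide}
</style_guide>

<draft>
{draft}
</draft>

Revise the draft so it follows the style guide."""
```

Nothing inside the tags is a command; the only instruction sits outside them, after all the content the model is meant to consult.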

That’s really the move from prompt engineering to context engineering in practice. You’re no longer crafting a single message. You’re designing an information environment.

And tags — whatever flavor you prefer — become the architecture of that environment.

Takeaways for Writers and Content Professionals

Here are three guidelines going forward.

  1. Keep the categories, hold the brackets loosely. Task, Context, and Content remain the most research-supported way to organize what you give an AI. Whether you wrap them in XML, use markdown headers, or write clear paragraphs matters far less than whether you’ve actually specified all three.

  2. Use tags to structure your context, not just your prompts. As your AI workflows grow beyond single prompts, tags become architecture. They’re a coordination tool for humans and a parsing tool for the model. That value only increases as the information environment gets more complex.

  3. Let the model reason naturally, then apply structure. If your final output needs to follow a structured format, describe your intent in natural language first. Reasoning models handle this better than ever, but the content still benefits from natural-language instructions over rigid format constraints up front.

The real lesson here isn’t about brackets or XML. It’s that we’ve moved past single-prompt optimization.

Context engineering means designing information environments, and the tools we use to organize those environments matter more now than they did when all we had was a chat box and a one-shot prompt.

If someone on your team is wrestling with whether tags still matter, share this episode. The answer is more interesting than a simple yes or no.

If you want to go deeper on building the kind of systematic prompt and context frameworks we talked about today, that's exactly what my course Writing with Machines covers. It's designed for content professionals who want a repeatable process, not a collection of tips.

I’m Lance Cummings. Until next time … keep prompting … or engineering that context!

Paid subscribers to Writing with Machines get access as part of their subscription.
