Every Doc Makes a Promise
Understanding the ethos of documentation in an AI world

I’ll be honest with you … I came to this book review feeling a little behind.
Agentic AI has been moving fast, and for most of it I’ve watched from a comfortable distance.
I’m skeptical of a lot of the use cases, and I’ve made no secret of that. Can I really trust an agent to do the complex work I’d want it to do? I’ve not been that sure.
But I think we’ve come to a point where everyone needs to reckon with agents one way or another, and Manny Silva’s Docs as Tests & AI made me think about that. Not because he evangelizes agents, but because he explains how to trust them.
That’s really what this book is about.
Not necessarily how to build agents, but how to verify they actually do what you need them to do, and why your documentation is the place where that trust either gets built or falls apart.
Docs as tests is a philosophy, not just a method
What strikes me most about this book is that Silva isn’t offering a workflow checklist. He’s arguing for a way of thinking about documentation.
Every document that describes how a product behaves is making a testable claim. When it says “click this button and this happens,” that’s a promise. The question Docs as Tests asks and answers is whether your documentation keeps its promises.
Rhetoricians will recognize this as a question of ethos. Not the reduced version where ethos means credentials, but the deeper sense: credibility earned through consistent, verifiable action.
Silva never uses the word, but the entire book is an argument for how to build machine ethos, or the kind of trustworthiness that holds up not just when a human reads your docs, but when an AI system consumes them and generates answers for thousands of users.
One inaccurate paragraph, confidently retrieved and served by a chatbot, is a very different problem than one frustrated user who bounces to support. The stakes have changed, and Docs as Tests takes those stakes seriously.
Ethos is too often reduced to “credibility” and left at that. For Aristotle, one of the first to articulate this idea, ethos can’t simply be asserted. It has to be demonstrated through the work itself.
A document that merely claims accuracy isn’t credible; a document that demonstrably maps to reality is.
Technical communication scholars have made a version of this argument for decades. Documentation functions as a form of institutional ethos, a sustained social contract between a product and its users.
Silva’s framework makes that contract testable. He’s not adding rhetoric to documentation theory; he’s building the verification infrastructure that rhetoric always assumed someone was running.
Three layers in one book
What makes this book practical rather than theoretical is how Silva structures it. Each section works at three levels:
First, he lays out the conceptual foundation,
then illustrates it through a practitioner named Vanessa, who is working through the same problems in a real documentation context, and
finally closes with exercises you can run yourself.
I’ll be transparent about my own limitations here. Some of the exercises require comfort with the terminal, YAML, and basic scripting. I got through them, but I needed to lean on AI to troubleshoot a few mistakes I didn’t fully understand.
If you’re a technical writer who works comfortably with dev tooling, this will feel intuitive. If you’re coming from a less technical background, the exercises are still worth doing — just plan to go slower, and don’t skip the Vanessa scenarios, which give you the shape of the work even when the code feels unfamiliar.
The inventory of documents that Silva lays out in Part Three was probably the part I found most useful. To run a trustworthy agentic workflow, you need:
a project description,
agent definitions,
orchestration patterns that make the workflow legible to every agent involved,
task skills written as reusable prompts, and
plans with explicit acceptance criteria.
That’s documentation. And reading through it, I found myself thinking about portfolios.
Right now I ask students to document a chatbot assessment, build a structured knowledge piece, and analyze a workflow. That’s a solid foundation.
But a more advanced version of that portfolio could be the full documentation set for a working agent: the project file, the skill definitions, the plan and specs.
It would demonstrate not just that a student can use AI tools, but that they understand the system well enough to govern one.
Manny’s book is the clearest map I’ve seen of what those documents actually need to contain.
Understanding deterministic vs probabilistic
The core technical distinction in the book is between deterministic testing and probabilistic testing.
Deterministic tests produce the same result every time. They’re binary, reliable, automatable.
Probabilistic tests use AI to interpret content, which means the result can vary between runs. Same input, different output.
That might sound like a limitation, and in some ways it is. But what Silva’s really mapping is the difference between things that can be verified objectively and things that require interpretation — and interpretation, as any writing teacher knows, is where human judgment lives.
When we mistake a probabilistic check for a deterministic one, we get what Silva calls a false signal.
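A minimal Python sketch can make the distinction concrete. Nothing in it comes from the book: `product_click_save` is a hypothetical stand-in for a product, and `probabilistic_check` only simulates an AI evaluator with a random number and an assumed `agree_rate`, where a real setup would call a model.

```python
import random


def product_click_save(state: dict) -> dict:
    """Hypothetical product behavior: clicking Save marks the document saved."""
    return {**state, "saved": True}


def deterministic_check() -> bool:
    """Doc claim: 'Click Save and the document is saved.'

    Checked directly against the (stand-in) product, so the verdict
    is the same on every run.
    """
    return product_click_save({"saved": False})["saved"] is True


def probabilistic_check(rng: random.Random, agree_rate: float) -> bool:
    """Simulated AI judgment of the same claim.

    agree_rate is an assumed stand-in for how often an evaluator
    endorses the claim; the verdict can differ between runs.
    """
    return rng.random() < agree_rate


if __name__ == "__main__":
    rng = random.Random()
    print("deterministic:", [deterministic_check() for _ in range(5)])
    print("probabilistic:", [probabilistic_check(rng, 0.8) for _ in range(5)])
```

Reading any single probabilistic run as a pass or a fail is the false signal: the next run may disagree.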
His practical solution is a hierarchy of trust. At the base are ungrounded assertions, or documentation that simply hasn’t been tested.
Above that is grounded probabilistic testing, where AI evaluation is constrained by explicit criteria, run multiple times, and treated as reconnaissance rather than verdict.
At the top is deterministic testing, or binary checks against a live product.
The goal is to migrate upward wherever possible, while being honest about what each level can and can’t tell you.
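The middle layer can be sketched the same way. This is an illustration under stated assumptions, not Silva’s implementation: the names `TrustLevel`, `pass_rates`, and `flag_for_review` are invented here, and `keyword_evaluator` is a trivial stand-in for an AI evaluator so the example stays runnable without a model.

```python
from enum import IntEnum
from typing import Callable


class TrustLevel(IntEnum):
    """The three levels described above, ordered so higher means more trust."""
    UNGROUNDED = 0
    GROUNDED_PROBABILISTIC = 1
    DETERMINISTIC = 2


def keyword_evaluator(text: str, criterion: str) -> bool:
    """Hypothetical stand-in for an AI evaluator: a plain keyword match."""
    return criterion.lower() in text.lower()


def pass_rates(
    evaluate: Callable[[str, str], bool],
    text: str,
    criteria: list[str],
    runs: int = 5,
) -> dict[str, float]:
    """Run the evaluator several times per explicit criterion.

    Returns a pass rate per criterion rather than a single verdict.
    """
    rates = {}
    for criterion in criteria:
        passes = sum(evaluate(text, criterion) for _ in range(runs))
        rates[criterion] = passes / runs
    return rates


def flag_for_review(rates: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Treat results as reconnaissance: list criteria a human should look at."""
    return [criterion for criterion, rate in rates.items() if rate < threshold]


if __name__ == "__main__":
    rates = pass_rates(keyword_evaluator, "Click Save to store the file.", ["save", "export"])
    print(rates)
    print("needs a human look:", flag_for_review(rates))
```

The design choice worth noticing is that `flag_for_review` returns a list of things to investigate, not a pass or fail. A criterion that can be checked against the live product is a candidate to move up to `TrustLevel.DETERMINISTIC`.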
An LLM isn’t evaluating your product. It’s generating what it expects your product to be, based on the patterns of its training. That’s why any automated documentation system needs testing … not as a quality-control afterthought, but as the mechanism that keeps the content grounded in reality.
Why technical writers are still necessary
The third part of the book, “Teaching Agents Your Process,” was where I learned the most. Silva walks through what it actually takes to run an agentic documentation workflow:
project descriptions,
agent definitions,
orchestration patterns,
task skills, and
plans and specifications.
Every one of these is a document. Every one of them shapes how an agent operates and whether it stays within the bounds you intend. Writing them well and maintaining them accurately is a technical writing problem.
This is an argument I’ve made from the rhetoric side, and it’s gratifying to see it come from the practitioner side too. The work that gets automated is first-draft production.
The work that doesn’t is design, information architecture, judgment about what content actually needs to exist and in what form.
Technical writers who understand that distinction become more valuable in agentic systems, not less. The documentation that governs those systems requires exactly the expertise you already have.
The classical term for this kind of judgment is phronesis, Aristotle’s word for practical wisdom: the capacity to discern what the right action is in a specific situation, with specific constraints, for a specific audience.
Phronesis isn’t a skill you can look up or a rule you can apply consistently across cases. It’s cultivated through experience, through having stakes in an outcome, through the kind of situational reading that comes from being genuinely accountable to the people you’re writing for.
I’ve been thinking about phronesis lately as the human quality that agentic AI most needs and structurally cannot have.
An agent can execute a well-documented workflow with high fidelity. It cannot tell you when the workflow is wrong for this situation. Or when the document that passes every test still misleads the user who reads it at 11pm trying to fix a production problem.
That gap is where technical writers operate, and it’s the gap that Silva’s human oversight checkpoints are designed to protect.
Who should read this
If you work in technical documentation, content strategy, or documentation engineering, this book belongs in your hands now. It gives you a coherent framework for thinking about documentation quality in AI pipelines, with concrete methods for testing and validating what you build.
If you’re an educator or solo creator who, like me, isn’t running enterprise documentation workflows, it’s still worth your time. Docs as Tests & AI is one of the clearest explanations of what agentic AI actually involves, what the components are, and where human judgment remains irreplaceable. That’s useful regardless of whether you’re ready to build the system yourself.
I’m not ready to build all of it myself. But I understand it now in a way I didn’t before, and I’m starting to see where pieces of it apply to my own work. That’s about as good an endorsement as I can give.


