
Documenting an agent is not governing it

By GovCompass.ai · Last verified June 2026. Agentic governance is moving fast; this maps the documentation-versus-governance gap onto the EU AI Act and the GovCompass-7.

Most organisations can describe their AI agents in detail: the architecture, the tools, the memory, the base model, the benchmarks, the internal safety testing. What far fewer can do is govern what those agents decide once they are running. Documentation answers the question "what is this agent and what can it do?" Governance answers a harder one: "can we trust the decisions it makes over time?" The two are routinely confused, and an agent that is thoroughly documented but ungoverned is exactly the kind of system that passes every review and then fails in production.

This is part of the Agentic AI element of the GovCompass-7.

The comfortable half of the problem

There is a great deal of information available about what AI agents are. Vendors describe their planning and reasoning capabilities, their memory, their tool integrations, and their base models. Teams produce model cards, benchmark results, and reports from internal safety testing and red teaming. Deployment is documented: the interfaces, the APIs, the hosting, the access controls.

This is the comfortable half of the problem, and organisations do it well because the information is abundant and the questions are static. What is the agent built on? What can it do? How is it deployed? These are answerable at a point in time, and once answered they tend to stay answered until the next version. Documentation of this kind is necessary. For high-risk systems the EU AI Act requires much of it: technical documentation under Article 11, record-keeping under Article 12. An organisation that cannot describe its agents has a more basic problem than governance.

But describing an agent is not the same as governing it, and the gap between the two is where most agentic risk lives.

The harder half

Governance answers a different question, and it is not a point-in-time question. It asks whether the decisions the agent makes can be trusted as they accumulate, change, and compound over the agent's operating life. This is the half that documentation does not reach, and it has three parts.

Decision governance. An agent does not just have capabilities; it exercises decision authority. The governance questions are about that authority, not the capability. How much decision authority does this agent hold, and who delegated it? When does its goal need to be revalidated, because the goal it was given six months ago may no longer be the goal the organisation wants it pursuing? Are the policies it operates under expressed in a form the agent actually follows at runtime, rather than written in a document no running system reads? Documentation describes what the agent can decide. Decision governance constrains what it is allowed to decide and keeps that constraint current.
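As a minimal sketch of what "a form the agent actually follows at runtime" could look like, the Python below encodes a delegation of authority and checks each proposed decision against it. The authority levels, the `Delegation` fields, the 90-day revalidation interval, and the example values are all assumptions made for illustration; they are not drawn from the EU AI Act or the GovCompass-7.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import IntEnum


class Authority(IntEnum):
    """Hypothetical decision-authority levels, lowest to highest."""
    RECOMMEND = 1         # agent proposes, a human decides
    ACT_WITH_REVIEW = 2   # agent acts, a human reviews afterwards
    ACT_UNREVIEWED = 3    # agent decides and acts without review


@dataclass(frozen=True)
class Delegation:
    """Who delegated how much authority to an agent, and for which goal."""
    agent_id: str
    delegated_by: str            # the named person who granted the authority
    max_authority: Authority
    goal: str
    goal_validated_on: date
    revalidate_every: timedelta  # how long a goal may go without review


def authorise(d: Delegation, requested: Authority, today: date) -> tuple[bool, str]:
    """Check one proposed decision against the delegation at runtime."""
    if today - d.goal_validated_on > d.revalidate_every:
        return False, "goal revalidation overdue"
    if requested > d.max_authority:
        return False, "exceeds delegated authority"
    return True, "within delegation"


# Illustrative values only.
delegation = Delegation(
    agent_id="refund-agent",
    delegated_by="head-of-customer-operations",
    max_authority=Authority.ACT_WITH_REVIEW,
    goal="resolve low-value refund requests",
    goal_validated_on=date(2026, 1, 10),
    revalidate_every=timedelta(days=90),
)
```

The design choice worth noting is that a stale goal blocks every decision, including low-authority ones: the constraint stays current only if letting it lapse has a visible cost.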

Runtime governance. A model card describes the agent as it was at the moment of testing. But an agent in production drifts: its inputs change, its memory accumulates, its behaviour shifts. Governance has to operate at runtime, not only at design time. Dynamic risk management adjusts as the agent's behaviour and context change. Runtime assurance and drift detection surface the moment the agent starts behaving outside its expected envelope. State and memory governance keeps the agent's accumulated context from quietly corrupting its future decisions. None of this appears in documentation, because documentation is static and the risk is dynamic.
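One simple way to make "outside its expected envelope" measurable is to compare the mix of actions an agent took recently with the mix observed during testing. The sketch below uses total variation distance for that comparison; the action labels and the 0.2 threshold are assumptions chosen for illustration, and a real deployment would likely monitor more signals than action frequency.

```python
from collections import Counter


def action_shares(actions: list[str]) -> dict[str, float]:
    """Turn a list of action labels into each label's share of the total."""
    counts = Counter(actions)
    total = len(actions)
    return {action: n / total for action, n in counts.items()}


def total_variation(baseline: dict[str, float], recent: dict[str, float]) -> float:
    """Total variation distance between two distributions (0 = same, 1 = disjoint)."""
    labels = set(baseline) | set(recent)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - recent.get(k, 0.0)) for k in labels)


def outside_envelope(baseline_actions: list[str],
                     recent_actions: list[str],
                     threshold: float = 0.2) -> bool:
    """Flag drift when the recent action mix departs from the tested baseline."""
    if not baseline_actions or not recent_actions:
        raise ValueError("both action lists must be non-empty")
    distance = total_variation(action_shares(baseline_actions),
                               action_shares(recent_actions))
    return distance > threshold


# Illustrative data: the agent has started issuing refunds it never issued in testing.
tested = ["approve"] * 80 + ["escalate"] * 20
production = ["approve"] * 50 + ["escalate"] * 20 + ["refund"] * 30
drifted = outside_envelope(tested, production)
```

A check like this says only that behaviour has moved, not whether the movement is harmful; deciding that still needs a person or a further control.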

Accountability and trust. Documentation can show that an agent was tested. It cannot, on its own, establish that a specific decision the agent made in production can be explained, that an identifiable person is answerable for it, that the organisation's reliance on the agent is calibrated to its actual reliability, and that the agent is governed across its whole lifecycle from design to retirement. These are the trust questions, and they are answered by governance operating continuously, not by a document produced once.
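Decision-level accountability can be sketched as a record written for every production decision, which is rejected if it lacks a named owner or a rationale. The field names and the JSON-lines output below are assumptions for illustration, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    """One production decision, with who answers for it and why it was made."""
    decision_id: str
    agent_id: str
    answerable_owner: str  # a named person or role, never the agent itself
    action: str
    rationale: str         # the decision-level explanation
    recorded_at: str


def record_decision(decision_id: str, agent_id: str, answerable_owner: str,
                    action: str, rationale: str) -> str:
    """Validate and serialise one decision as a single JSON line."""
    if not answerable_owner.strip() or answerable_owner == agent_id:
        raise ValueError("a named human owner is required")
    if not rationale.strip():
        raise ValueError("a decision-level rationale is required")
    record = DecisionRecord(
        decision_id=decision_id,
        agent_id=agent_id,
        answerable_owner=answerable_owner,
        action=action,
        rationale=rationale,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only.
line = record_decision("d-001", "refund-agent", "jane.doe",
                       "refund_issued", "item arrived damaged; within policy limit")
```

Refusing to record a decision that names the agent as its own owner mirrors the principle that accountability rests with a person or organisation, not with the system.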

The governance shift

The move from documenting models to governing autonomous decisions is a shift along several axes at once, and naming them makes the gap concrete.

Traditional AI governance governs models; agent governance governs the autonomous decisions those models drive. Traditional governance relies on static controls set at deployment; agent governance needs dynamic controls that adjust at runtime. Traditional governance produces point-in-time assurance, a snapshot that was true when the assessment was done; agent governance needs continuous assurance, because the thing being assured keeps changing. Traditional governance assumes human execution, a person acting on the model's output; agent governance has to account for autonomous execution, where the agent acts without that human step. Traditional governance produces compliance documentation; agent governance has to produce regulatory evidence, which is documentation that demonstrates the controls actually operated, not merely that they were designed. And traditional governance provides model oversight; agent governance requires decision oversight, supervision of the choices the agent makes rather than the model that makes them.

Each of these is a move from something static and describable to something dynamic and governed. An organisation that has done the traditional half of each pair and believes it has done the agent half has the exact gap this article is about.
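The difference between compliance documentation and regulatory evidence can be made concrete with a small check: list the controls that were designed, then look in the runtime log for proof that each one actually ran. The control names and the event shape below are hypothetical; the sketch assumes each logged event carries a `control` key.

```python
def controls_without_evidence(designed: set[str],
                              runtime_events: list[dict]) -> list[str]:
    """Return designed controls that have no recorded execution in the log."""
    operated = {event["control"] for event in runtime_events if "control" in event}
    return sorted(designed - operated)


# Illustrative values only.
designed_controls = {"spend_limit", "human_escalation", "drift_alert"}
events = [
    {"control": "spend_limit", "outcome": "blocked"},
    {"control": "spend_limit", "outcome": "passed"},
    {"control": "human_escalation", "outcome": "passed"},
]
unevidenced = controls_without_evidence(designed_controls, events)
```

In this example `drift_alert` exists on paper but has never been recorded as running, which is the situation a documentation-only review would miss.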

Why the confusion is dangerous

The danger is that documentation creates the appearance of governance. An agent with a thorough model card, clean benchmarks, and a documented deployment looks governed. It has passed the reviews that ask "what is this agent and what can it do?" But those reviews do not ask whether the decisions it makes next month, on inputs it has not yet seen, after its memory has accumulated, can be trusted. The agent that fails in production is frequently the one that was best documented, because the documentation gave everyone confidence to widen its autonomy without building the runtime governance that wider autonomy requires.

This is the same pattern that appears across responsible AI: a control designed once and never operated, a pillar that looks compliant on paper and fails silently in production. With agents the pattern is sharper, because the gap between what the documentation describes and what the agent does grows every day the agent runs.

What to do about it

The practical response is to treat documentation and governance as two separate deliverables and to require both. Documentation answers what the agent is; require it, because the EU AI Act does and because you cannot govern what you cannot describe. But do not let a complete set of documentation stand in for governance.

For each agent, in addition to the documentation, establish the three governance capabilities that documentation does not provide. Decision governance: who holds the agent's decision authority, when its goals are revalidated, and how its policies are enforced at runtime. Runtime governance: dynamic risk management, drift detection, and memory governance that operate while the agent runs. Accountability: decision-level explainability, a named answerable owner, calibrated reliance, and lifecycle governance from design to retirement.

These map onto the GovCompass pillars that autonomy stresses most: accountability, transparency and explainability, and the integrating agentic element that binds them. The test of whether you have governed an agent, rather than merely documented it, is simple. Documentation lets you answer what the agent is and can do. Governance lets you answer whether you can trust the decisions it is making right now, and whether you would know if you could not.
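Treating documentation and governance as separate deliverables can be sketched as a per-agent gap check. The item names below paraphrase the capabilities listed above and are assumptions for illustration, not an official GovCompass checklist.

```python
# Hypothetical deliverables and the items each one requires.
REQUIRED: dict[str, list[str]] = {
    "documentation": ["technical_documentation", "record_keeping"],
    "decision_governance": ["authority_holder", "goal_revalidation",
                            "runtime_policy_enforcement"],
    "runtime_governance": ["dynamic_risk_management", "drift_detection",
                           "memory_governance"],
    "accountability": ["decision_explainability", "named_owner",
                       "calibrated_reliance", "lifecycle_governance"],
}


def governance_gaps(evidence: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per deliverable, the required items the agent cannot yet show."""
    gaps: dict[str, list[str]] = {}
    for deliverable, items in REQUIRED.items():
        have = evidence.get(deliverable, set())
        missing = [item for item in items if item not in have]
        if missing:
            gaps[deliverable] = missing
    return gaps


def is_governed(evidence: dict[str, set[str]]) -> bool:
    """An agent counts as governed only when no deliverable has gaps."""
    return not governance_gaps(evidence)


# Illustrative case: an agent that is fully documented and nothing else.
documented_only = {"documentation": {"technical_documentation", "record_keeping"}}
gaps = governance_gaps(documented_only)
```

The fully documented agent in this example has no documentation gaps and still fails `is_governed`, which is the article's point expressed as a check.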


Legal references: Art. 11, Art. 12
