Element 4 of the GovCompass-7

Security & robustness

The system resists deliberate attempts to manipulate, degrade, extract from, or subvert it.

What it means

Security and robustness is the property that an AI system resists deliberate attempts to manipulate, degrade, extract from, or subvert it, and continues to operate correctly under adversarial conditions. The EU AI Act addresses this in Art. 15, which places cybersecurity and robustness alongside accuracy as a requirement for high-risk systems. This element is distinct from general information security because AI systems present an attack surface that conventional systems do not: the model itself, and the data it was trained on, become targets.

The threat catalogue is specific to AI. Adversarial examples are inputs crafted to cause misclassification while appearing normal to a human. Data poisoning corrupts the training data so the model learns an attacker-chosen behaviour. Model inversion and membership inference extract information about the training data from the model's outputs. Model extraction reconstructs a proprietary model by querying it. Prompt injection, for systems built on language models, subverts the system's instructions through crafted input. Each of these has a corresponding defensive posture, and the control set has to cover the ones relevant to the deployment.
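
To make the first of these threats concrete, the following minimal sketch shows an adversarial example against a toy linear classifier. Everything in it is an illustrative assumption rather than part of the framework: the weights, the input, the `epsilon` budget, and the helper names `score`, `predict`, and `perturb` are all made up. The perturbation moves each feature by at most `epsilon` in the direction that lowers the score, in the spirit of a sign-of-gradient attack; real attacks on deep models are more involved.

```python
# Toy linear classifier: score = w . x + b, class 1 if score > 0, else class 0.
# Weights, bias, and input are hypothetical values chosen for illustration.

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def perturb(w, x, epsilon):
    """Shift every feature by at most epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is w,
    so stepping against sign(w) is the most damaging bounded change.
    """
    return [xi - epsilon * sign(wi) for wi, xi in zip(w, x)]

w = [0.5, -0.25, 1.0, -0.75]
b = 0.0
x = [0.5, 0.5, 0.25, 0.25]

epsilon = 0.1  # maximum change allowed per feature (assumed budget)
x_adv = perturb(w, x, epsilon)

original_class = predict(w, b, x)         # class of the clean input
adversarial_class = predict(w, b, x_adv)  # class after the small perturbation
largest_change = max(abs(a - c) for a, c in zip(x_adv, x))
```

In this sketch no feature moves by more than 0.1, yet the predicted class changes, which is the defining trait of an adversarial example: a small, targeted change with an outsized effect on the output.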

Why it matters

Security and robustness is the element that underpins all the others, which is why its failure is so consequential. A model that can be manipulated through adversarial input cannot be relied upon to be fair, safe, or transparent, because an attacker can defeat each of those properties on demand. A poisoned training set undermines every downstream control, because the model's behaviour has been compromised at its foundation. A successful extraction attack turns a proprietary asset into a competitor's starting point.

Governing security and robustness

The controls extend conventional security engineering to cover the model and its data as first-class assets, and they treat adversarial conditions as expected rather than exceptional.

Control layer	Control
Preventive	Threat-model the AI system specifically, covering adversarial input, poisoning, extraction, and inversion as relevant to the architecture. Validate and sanitise inputs to constrain the attack surface. Control and verify the provenance of training data to resist poisoning. Apply access controls to the model, its weights, and its training data as sensitive assets. For systems built on language models, isolate untrusted input from system instructions to resist prompt injection.
Detective	Log and monitor inference requests for patterns consistent with extraction or adversarial probing, such as anomalous query volume or systematic input perturbation. Conduct adversarial testing and red-teaming on a defined schedule. Monitor for unexpected shifts in model behaviour that could indicate a compromise.
Corrective	Maintain an incident-response capability that covers AI-specific attacks, including the ability to roll back to a known-good model version. Define in advance the response to a confirmed poisoning, which requires retraining from verified data. Report security incidents that meet the serious-incident reporting threshold of Art. 73 of the EU AI Act, and feed the attack pattern back into the threat model and the preventive controls.
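
As one illustration of the detective layer, the sketch below flags clients whose inference-request volume within a sliding time window exceeds a threshold, which is one possible signal of extraction or adversarial probing. The class name `QueryVolumeMonitor`, the client identifiers, the 60-second window, and the threshold of 5 requests are hypothetical values chosen for the example; a real deployment would tune them against observed traffic and would likely combine volume with other signals, such as systematic input perturbation.

```python
from collections import defaultdict, deque

class QueryVolumeMonitor:
    """Flag clients whose request count in a sliding time window exceeds a threshold.

    Assumes timestamps (in seconds) arrive in non-decreasing order per client.
    """

    def __init__(self, window_seconds, max_requests):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._timestamps = defaultdict(deque)  # client_id -> recent request times

    def record(self, client_id, timestamp):
        """Record one inference request; return True if the client is over the threshold."""
        window = self._timestamps[client_id]
        window.append(timestamp)
        # Discard requests that have aged out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests

monitor = QueryVolumeMonitor(window_seconds=60, max_requests=5)

# A client spreading five requests over 100 seconds stays under the threshold.
normal_flags = [monitor.record("client-a", t) for t in (0, 20, 40, 70, 100)]

# A client sending ten requests in ten seconds is flagged from the sixth request on.
burst_flags = [monitor.record("client-b", t) for t in range(10)]
```

A flag here is only a prompt for investigation, not proof of an attack; the response to a confirmed incident belongs to the corrective layer.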