AI systems are being embedded into every layer of enterprise infrastructure, from endpoint detection tools to cloud-native application logic. That shift creates a new category of attack surface. Traditional security models were not built to account for machine learning pipelines, foundation models, or the probabilistic outputs that define how AI systems behave. You need a different mental model for what “securing AI” actually means in 2026.
This guide covers the full threat surface: adversarial attacks on ML models, prompt injection against LLMs, data poisoning in training pipelines, synthetic media abuse, and the governance frameworks your organisation needs to manage it all. Whether you are defending AI systems your team built or assessing third-party AI tools your business relies on, the attack patterns and countermeasures here apply directly.
Table of Contents
- The AI Threat Surface in 2026
- Adversarial Machine Learning Attacks
- Prompt Injection: The SQL Injection of the AI Era
- Data Poisoning and Training Pipeline Attacks
- Deepfakes as a Security Threat
- LLM Vulnerabilities and Exploitation Patterns
- AI Red Teaming: Testing Before Attackers Do
- AI Governance and Organisational Controls
- Using AI to Defend Against AI Threats
- Frequently Asked Questions
The AI Threat Surface in 2026
The AI threat surface has three distinct layers, and most organisations only think about one of them.
The first layer is AI as a target. Your machine learning models, training data, inference APIs, and the pipelines that feed them are all attackable assets. An adversary who corrupts your fraud detection model or extracts your proprietary training data has caused real damage without ever touching your traditional IT infrastructure.
The second layer is AI as a weapon. Attackers are using generative AI to produce more convincing phishing emails, synthesise realistic voice and video, automate vulnerability discovery, and accelerate exploit development. The barrier to entry for sophisticated attacks has dropped significantly. AI-powered phishing attacks now bypass traditional filters that rely on pattern matching because the content is genuinely varied and contextually accurate.
The third layer is AI as an amplifier of existing risk. AI systems often have broad access to data and APIs because they need it to function. A compromised or manipulated AI system can do more damage faster than a human attacker with the same access level, because it operates at machine speed without hesitation or fatigue.
Understanding which layer applies to your organisation shapes your defence priorities. A company deploying a customer-facing LLM chatbot faces immediate prompt injection risk. A company using ML for internal fraud detection faces training data integrity risk. A company relying heavily on video conferencing faces deepfake-driven social engineering risk. The specifics differ; the need for a deliberate AI security posture does not.
Adversarial Machine Learning Attacks
Adversarial ML attacks manipulate the inputs or training process of machine learning models to cause incorrect outputs. The field has been active in academic research for a decade, but 2025 and 2026 have brought these attacks into production environments at a scale that demands practitioner attention.
Evasion Attacks
Evasion attacks modify inputs at inference time so a model produces the wrong output. The classic example is an image with imperceptible pixel noise that causes a vision model to misclassify it with high confidence. In security contexts, evasion attacks target malware classifiers, network intrusion detection systems, and spam filters. The attacker does not need direct access to the model; black-box evasion attacks work by probing the API and observing outputs until an adversarial input is found.
Effective defences include adversarial training (incorporating adversarial examples into the training set so the model learns to handle them), input preprocessing pipelines that strip or smooth perturbations before the model sees them, ensemble methods requiring multiple independent models to agree on a classification, and certified robustness techniques for environments where formal correctness guarantees matter.
Model Inversion and Extraction
Model extraction attacks query a model repeatedly to reconstruct its architecture or its training data. This carries intellectual property implications when a competitor reconstructs your proprietary model, and privacy implications when sensitive training data is recovered. Model inversion attacks specifically reconstruct training records from model outputs, a serious concern for healthcare and financial ML systems where training data contains personally identifiable information.
Defending against extraction requires rate limiting on inference APIs, differential privacy techniques applied during training, calibrated output perturbation that adds noise to predictions without significantly degrading utility, and monitoring systems that flag systematic probing behaviour patterns.
Membership Inference
Membership inference attacks determine whether a specific record was included in a model’s training set. Under UK GDPR and the Data Protection Act 2018, confirming that a person’s health record was used to train your model constitutes a potential data breach even when no raw data was exfiltrated. The attack exploits the fact that models are statistically more confident on examples seen during training than on unseen data. Differential privacy provides the strongest theoretical protection; regularisation, confidence thresholding, and restricted API access provide practical mitigations for deployed systems.
Prompt Injection: The SQL Injection of the AI Era
Prompt injection is currently the most practically exploitable vulnerability class in deployed LLM systems. The analogy to SQL injection is apt: just as SQL injection exploits the failure to separate data from instructions in database queries, prompt injection exploits the failure to separate user input from system instructions in LLM contexts. The root cause in both cases is the same, treating untrusted input as trusted instruction.
An LLM receives a system prompt from the application developer and user input from the end user. The model has no cryptographic or structural way to distinguish between the two. A user who crafts input that overrides or redirects the system prompt can cause the model to behave in ways the developer never intended, including revealing confidential system prompt contents, bypassing access controls, or taking unauthorised actions via connected tools.
For a detailed technical breakdown of how these attacks are constructed and what makes them difficult to patch architecturally, see the full analysis of prompt injection attacks and how hackers manipulate AI systems.
Direct vs. Indirect Prompt Injection
Direct prompt injection involves the user crafting malicious input directly in the conversation interface. Indirect prompt injection is considerably more dangerous: the attacker embeds malicious instructions in external content that the LLM is asked to retrieve and process, such as a web page, email, document, or database record. When the model processes that content, it also executes the injected instructions, with the original user unaware anything has occurred. An AI email assistant that reads and summarises your inbox is vulnerable to indirect injection from any malicious email in that inbox.
Mitigations That Work in Practice
No current mitigation completely eliminates prompt injection risk against production LLM architectures. The most effective approach is privilege minimisation: if the LLM cannot take consequential actions without a human confirmation step, the blast radius of a successful injection is bounded. Structural separators between system prompts and user input help at the margins. Input and output classifiers that flag known injection patterns add a detection layer. Sandboxed execution environments for any actions the LLM takes are non-negotiable for production deployments handling sensitive functions.
Data Poisoning and Training Pipeline Attacks
Data poisoning attacks corrupt the training data used to build a model, causing it to learn incorrect associations, degraded performance on specific inputs, or hidden backdoor behaviours. The attack surface covers the data collection pipeline (web scraping, third-party data purchases), the labelling process (crowdsourced annotation platforms), and fine-tuning datasets provided by users or sourced from external repositories.
Backdoor Attacks
A backdoor attack embeds a hidden trigger in the training data. The model behaves normally on clean inputs but produces attacker-controlled outputs whenever a specific trigger pattern appears. In a content moderation context, a backdoor might cause the model to approve content containing a specific steganographic marker regardless of how harmful the content otherwise is. These attacks are difficult to detect because the model passes all standard benchmarks on clean test sets; the malicious behaviour is invisible until the trigger fires.
Effective defences include training data provenance tracking, automated sanitisation pipelines that scan for anomalous label distributions, neural cleanse and similar backdoor scanning tools applied at evaluation time, and adversarial training techniques that build resistance to trigger-based manipulation into the model.
Supply Chain Risk in AI
Most organisations training or fine-tuning models use pre-trained base models from public repositories. A compromised base model, whether through a malicious submission or a hijacked publisher account, introduces vulnerabilities before you write a single line of your own training code. Treat model provenance the same way you treat software dependency provenance: verify checksums, prefer models from publishers with documented security programmes and training data transparency, and audit model behaviour on adversarial test sets before production integration.
Federated Learning Vulnerabilities
Federated learning distributes model training across nodes without centralising raw data, which is legitimate privacy architecture. It also creates a different attack surface: malicious participants can submit manipulated gradient updates that poison the global model. Byzantine-robust aggregation algorithms such as Krum, coordinate-wise median, and FLTrust provide resistance to a bounded fraction of malicious participants, but they are not standard in most federated learning frameworks by default and require deliberate implementation decisions.
Deepfakes as a Security Threat
Deepfake technology has crossed the threshold from technically impressive novelty to practical attack tool. The cost of generating convincing synthetic video and audio has dropped to where it is accessible to moderately resourced threat actors, not exclusively nation-state programmes. Corporate security teams need to treat synthetic media as a genuine attack vector.
Security-relevant deepfake use cases include CEO fraud (synthetic video or audio of a senior executive authorising a wire transfer or instructing staff to bypass security controls), identity verification bypass (synthetic face or voice defeating biometric authentication), and targeted social engineering against specific individuals.
Technical detection tools use frequency domain analysis, facial landmark inconsistencies, physiological signal analysis, and temporal coherence checks to flag synthetic media. For a current evaluation of which tools perform reliably under real-world conditions, see the review of the best deepfake detection tools that actually work.
Process Controls as Primary Defence
Technical detection is imperfect and will remain so as generation quality improves. Process controls that do not depend on the authenticity of audio or video are more durable. Requiring that any financial transaction above a defined threshold be confirmed via a separate, pre-established communication channel eliminates most deepfake-driven CEO fraud scenarios regardless of how convincing the synthetic media is. Multi-person authorisation for high-value actions adds a layer deepfakes alone cannot defeat; the attacker would need to compromise multiple independent channels simultaneously.
LLM Vulnerabilities and Exploitation Patterns
Large language models have a vulnerability profile that maps imperfectly to traditional application security frameworks. The OWASP Top 10 for LLMs, updated in 2025, provides a useful taxonomy. Leading risk categories in production deployments include prompt injection, insecure output handling, training data poisoning, model denial of service via computationally expensive inputs, supply chain vulnerabilities in model and plugin ecosystems, and excessive agency granted to AI agents.
For a thorough technical treatment of each vulnerability class with real exploitation scenarios, the dedicated analysis of LLM security vulnerabilities and how large language models get exploited covers the full OWASP taxonomy with remediation guidance.
Insecure Output Handling
LLMs produce text. When that text is passed to downstream systems that interpret it as executable code, a database query, or a shell command without sanitisation, you have introduced an injection risk through AI output. An LLM-generated SQL snippet inserted into a query builder without parameterisation carries the same risk as a user-supplied SQL snippet. The vulnerability is easy to miss in code review because developers frequently trust LLM output more than they trust raw user input, and the AI involvement creates a false sense of safety.
Excessive Agency
Excessive agency means giving an LLM-based agent more permissions, access scope, or capability than it needs to complete its intended function. An AI coding assistant with write access to your entire production repository is a broader attack surface than one restricted to a sandboxed branch. An AI customer service agent that can issue refunds, modify accounts, and close tickets is a richer target than one that can only retrieve information. Apply least privilege to AI agents as rigorously as you apply it to human users and service accounts.
AI Red Teaming: Testing Before Attackers Do
AI red teaming is the practice of systematically attempting to cause AI systems to produce harmful, incorrect, or unintended outputs before those systems reach production. It borrows methodology from traditional penetration testing but requires additional skills: understanding model behaviour under adversarial inputs, prompt engineering for manipulation, and familiarity with the specific failure modes of the architecture under test.
Microsoft, Google, Anthropic, and OpenAI all maintain dedicated AI red teams and have published methodology guidance. The NIST AI Risk Management Framework (AI RMF 1.0) and the UK AI Safety Institute have both formalised expectations for structured AI evaluation before deployment in high-stakes contexts.
Our coverage of AI model safety, alignment, and red teaming guardrails connects formal alignment research to practical security testing methodology.
Structuring an AI Red Team Exercise
An effective exercise starts with a defined threat model: who is the attacker, what is their objective, what access level are you assuming? For a customer-facing LLM chatbot, the threat model typically includes external users attempting jailbreaks, competitors probing for system prompt disclosure, and automated bots testing for injection vectors. For an internal AI agent with access to business systems, the threat model expands to insider threats and supply chain attacks against AI infrastructure.
Test cases should cover all known vulnerability categories from the OWASP LLM Top 10, plus open-ended adversarial creativity. The output should be a prioritised finding list with reproduction steps and severity ratings, not a general risk statement. Findings should map directly to the remediation owners who can act on them.
Continuous Testing in CI/CD
AI systems change over time. Model updates, new integrations, prompt changes, and plugin additions all create new attack surface. Red teaming as a one-time pre-deployment activity misses the ongoing risk. Build automated adversarial test suites into your CI/CD pipeline for AI systems, with human red team reviews at major milestones and at least quarterly for production systems processing sensitive data or taking consequential actions.
AI Governance and Organisational Controls
Technical controls cannot manage AI security risk in isolation. You need governance structures that define accountability for AI security decisions, assessment requirements before deploying AI systems, and incident response processes specific to AI-related events. Without governance, technical controls exist in a vacuum and erode under operational pressure.
EU AI Act and UK AI Governance in 2026
The EU AI Act applies to UK organisations deploying AI systems affecting EU residents. High-risk AI systems, including those used in employment decisions, credit assessment, biometric identification, and critical infrastructure management, face mandatory conformity assessments, transparency requirements, and ongoing monitoring obligations. Even where you have no direct legal obligation under the Act, its risk categorisation framework provides a defensible baseline for internal governance.
The UK approach uses lighter-touch sector-specific guidance from the FCA, ICO, and industry regulators, supplemented by AI Safety Institute evaluation frameworks. The ICO has published specific guidance on AI and data protection covering lawful basis for training, automated decision-making rights under Article 22, and data protection impact assessment requirements for high-risk AI processing. Our analysis of AI data privacy concerns and what happens inside AI models covers the technical and regulatory dimensions together.
AI Risk Register
Every AI system your organisation uses or develops should have a risk register entry covering: the system’s purpose and data inputs, its risk category, the technical controls in place, the responsible owner, testing status including red team findings, and the scheduled review date. When an AI-related incident occurs, a current AI risk register is the difference between a structured response and a chaotic scramble to establish which systems are affected and who owns them.
Third-Party AI Supplier Risk
Most organisations consume AI capabilities through third-party APIs and products. That does not transfer security responsibility; it changes its character. Your supplier risk management programme needs AI-specific questions: what training data does the vendor use and how is it quality-controlled, how are model updates tested before deployment, what data retention and isolation controls apply to your prompts and outputs, and what is their incident response SLA for AI-specific events?
Using AI to Defend Against AI Threats
The security industry spent 2023 and 2024 focused primarily on how AI enables attackers. The defensive side has caught up. AI-powered defensive tools are now mature enough to provide measurable improvements in detection coverage, investigation speed, and analyst productivity. The organisations getting the most value have integrated AI into security operations deliberately, not as a vendor checkbox but as a genuine capability investment.
Behavioural detection models that identify anomalous patterns in user and entity activity have improved substantially. They catch lateral movement, data exfiltration patterns, and credential abuse scenarios that rule-based SIEM alerts miss because the behaviour is statistically unusual rather than matching a known signature. Modern UEBA platforms explain their detections in terms that security analysts can evaluate, rather than producing opaque anomaly scores that require a data scientist to interpret.
For a grounded assessment of where machine learning actually improves detection performance versus where vendor claims outpace reality, see the analysis of how AI detects threats faster than humans in cybersecurity operations.
AI-powered vulnerability management platforms correlate CVE data, asset inventory, network topology, and observed exploitation activity to produce prioritised remediation recommendations. The improvement over CVSS-score-only prioritisation is material: these systems account for active exploitation in the wild, your specific network exposure, and existing compensating controls. The result is a patch list ordered by actual risk to your environment.
Understanding offensive AI tools being used against organisations is prerequisite knowledge for defensive investment decisions. The detailed breakdown of how hackers use AI offensive tools with real attack examples gives you the attacker’s perspective on the same technology your defenders are using.
Frequently Asked Questions
What is the biggest AI security threat facing UK organisations right now?
Prompt injection against deployed LLM systems is the most practically exploitable threat in 2026, because many organisations have integrated LLMs into critical workflows without treating them as a security boundary. AI-powered phishing is a close second, operating at scale and already bypassing filters trained on older attack patterns. Both require immediate attention regardless of sector or size.
Does the EU AI Act apply to UK businesses after Brexit?
Yes, if your AI systems process data belonging to EU residents or make decisions affecting people in the EU, the Act applies. UK-only operations fall under UK GDPR and ICO guidance rather than the EU AI Act directly, but the risk categorisation frameworks are closely aligned and provide a sound governance baseline regardless of your specific regulatory exposure.
How do you defend against data poisoning when using third-party training datasets?
Start with provenance verification: use data from sources with documented collection and quality control processes. Apply statistical anomaly detection to training datasets before use. Use differential privacy during fine-tuning to limit individual example influence. For base models from public repositories, verify checksums and prefer publishers with documented training data lineage and active security programmes.
What is AI red teaming and how does it differ from traditional penetration testing?
AI red teaming tests whether AI systems can be manipulated to produce harmful or unintended outputs, rather than finding software vulnerabilities. Required skills overlap with traditional pen testing but extend to prompt engineering, adversarial input crafting, and model-specific failure mode analysis. NIST’s AI RMF and the UK AI Safety Institute both publish structured methodology guidance.
How should organisations handle deepfake fraud attempts?
Process controls are more durable than technical detection. Require out-of-band verification for any high-value action requested via video or voice call, using a pre-established channel rather than responding to an inbound contact. No financial transaction or security exception should be authorised based solely on audio or video confirmation, regardless of how convincing it appears.
What governance documents should exist before deploying an AI system?
At minimum: a data protection impact assessment under UK GDPR for high-risk processing, a risk register entry covering data inputs and controls, documented testing results including red team findings, and a supplier security assessment for any third-party AI API dependencies. Regulated sector organisations should additionally check their sector regulator’s specific AI guidance for any additional requirements.