AI Security Risks: Every Threat Vector You Need to Understand in 2025

Photo of author

By James Harrington

AI Security Risks: The Threat Landscape You Cannot Ignore

If you deploy AI systems without understanding their attack surface, you are exposed. AI security risks span data poisoning, model theft, prompt injection, adversarial inputs, and supply chain compromise. This guide maps every threat vector, references real CVEs, and gives you actionable mitigations aligned with OWASP and NIST frameworks.

Why AI Security Risks Are Accelerating in 2025

The rapid adoption of large language models and generative AI has outpaced security controls. According to NIST AI 100-2 (Adversarial Machine Learning: A Taxonomy and Terminology), AI systems face unique threats that traditional cybersecurity frameworks do not fully address. The OWASP Top 10 for LLM Applications (2025 edition) now catalogues risks from prompt injection to insecure output handling, reflecting how quickly the threat surface has expanded.

You are dealing with systems that learn from data, respond to natural language, and operate with degrees of autonomy. Each of those properties introduces distinct LLM security vulnerabilities that attackers actively exploit.

The Complete AI Threat Vector Map

Understanding each threat vector is the first step toward building resilient AI systems. The table below categorises the primary AI security risks by type, severity, and recommended mitigation.

Threat Vector Description Severity Mitigation
Prompt Injection (Direct) Attacker crafts input that overrides system instructions, causing unintended model behaviour. Ref: OWASP LLM01. Critical Input validation, instruction hierarchy, output filtering
Indirect Prompt Injection Malicious payloads embedded in external data sources (websites, documents) that the model retrieves and executes. Critical Sandboxed retrieval, content sanitisation, trust boundaries
Data Poisoning Adversary corrupts training data to introduce backdoors or degrade model accuracy. Ref: CVE-2024-3660 (Keras Lambda layer arbitrary code execution). High Data provenance tracking, anomaly detection on training sets, NIST AI RMF data integrity controls
Model Theft / Extraction Attackers use repeated API queries to replicate model weights or decision boundaries. Ref: OWASP LLM10 (Model Theft). High Rate limiting, query auditing, watermarking, differential privacy
Adversarial Examples Subtle input perturbations that cause misclassification. Well-documented in computer vision (e.g., adversarial patches on stop signs). High Adversarial training, input preprocessing, ensemble verification
Insecure Output Handling Model output passed to downstream systems without sanitisation, enabling XSS, SSRF, or code execution. Ref: OWASP LLM02. High Output encoding, strict typing, least-privilege execution contexts
Supply Chain Compromise Malicious models, poisoned datasets, or backdoored libraries distributed through public repositories. Ref: CVE-2024-5480 (PyTorch model loading RCE). High Model signing, dependency scanning, SBOM for ML pipelines
Training Data Leakage Model memorises and reproduces sensitive training data (PII, credentials, proprietary content). Ref: OWASP LLM06. Medium Differential privacy, membership inference testing, data minimisation
Denial of Service (Model DoS) Crafted inputs that consume excessive compute, increasing latency or crashing inference endpoints. Medium Input length limits, token budgets, auto-scaling with circuit breakers
Excessive Agency AI agents granted overly broad permissions execute harmful actions autonomously. Ref: OWASP LLM08. Medium Least-privilege access, human-in-the-loop controls, action allow-listing

Prompt Injection: The Most Exploited AI Attack Vector

Prompt injection remains the single most exploited vulnerability in LLM-powered applications. You need to understand both direct and indirect variants. In a direct prompt injection attack, the attacker manipulates the user-facing input to override system-level instructions. In indirect injection, the payload hides inside data the model retrieves from external sources, such as web pages, emails, or database records.

Real-world examples include the 2023 Bing Chat indirect injection demonstrated by security researcher Johann Rehberger, where hidden instructions on web pages caused the chatbot to exfiltrate conversation data. The OWASP LLM Top 10 lists prompt injection as LLM01, its highest-priority risk.

Defending Against Prompt Injection

You should implement a layered defence. First, separate system instructions from user input using clear delimiters and instruction hierarchy. Second, apply input validation that detects known injection patterns. Third, filter model outputs before passing them to downstream systems. Fourth, use canary tokens to detect when system prompts have been overridden. No single control is sufficient. You need defence in depth.

Data Poisoning and Supply Chain Attacks on AI Models

Data poisoning targets the training phase. If an attacker can inject malicious samples into your training dataset, they can introduce backdoors that activate on specific triggers while the model performs normally on standard inputs. The TrojAI programme run by IARPA specifically researches detection methods for these backdoor attacks.

Supply chain risks have escalated as organisations rely on pre-trained models from Hugging Face, PyTorch Hub, and similar repositories. CVE-2024-5480 demonstrated how loading a malicious PyTorch model could achieve remote code execution. You must treat model files with the same suspicion as executable code. Implement model signing, verify checksums, and maintain a software bill of materials (SBOM) that includes your ML dependencies.

AI Model Safety: What NIST and OWASP Recommend

When you evaluate AI model safety, two frameworks provide the most actionable guidance. The NIST AI Risk Management Framework (AI RMF 1.0) structures AI risk into four functions: Govern, Map, Measure, and Manage. It emphasises that AI risks differ from conventional software risks because they include emergent behaviours, opacity, and societal impact.

The OWASP Top 10 for LLM Applications provides a developer-focused list. Its 2025 revision covers prompt injection (LLM01), insecure output handling (LLM02), training data poisoning (LLM03), model denial of service (LLM04), supply chain vulnerabilities (LLM05), sensitive information disclosure (LLM06), insecure plugin design (LLM07), excessive agency (LLM08), overreliance (LLM09), and model theft (LLM10). You should map your AI deployments against both frameworks.

Building a Practical AI Security Programme

Start with an AI asset inventory. Catalogue every model, dataset, API endpoint, and plugin in your environment. Then conduct threat modelling specific to AI, using STRIDE adapted for ML systems or Microsoft’s Counterfit framework for adversarial testing. Integrate AI-specific checks into your CI/CD pipeline: scan for adversarial robustness, test for data leakage, and validate output handling. Assign ownership of AI risk to a named individual or team, not just your general security function.

Adversarial Machine Learning: Beyond Theory

Adversarial machine learning has moved from academic research to practical exploitation. Attackers use techniques like Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) to generate inputs that fool classifiers. In computer vision, adversarial patches printed on physical objects can cause misclassification at scale. In NLP, character-level perturbations or homoglyph substitutions bypass content filters and toxicity detectors.

You should incorporate adversarial testing into your model evaluation pipeline. Tools like IBM Adversarial Robustness Toolbox (ART) and Microsoft Counterfit provide automated adversarial attack simulations. Regularly test your models against known attack techniques and track robustness metrics alongside accuracy metrics.

Securing the AI Inference Pipeline

Your inference pipeline is a live attack surface. Every API call is an opportunity for extraction, injection, or denial of service. Implement strict rate limiting per user and per session. Log all inputs and outputs for anomaly detection. Use guardrails libraries (such as NVIDIA NeMo Guardrails or Guardrails AI) to enforce output constraints. Deploy model endpoints behind authentication, and never expose raw model access to untrusted users.

Monitor for model drift and performance degradation, which may indicate ongoing data poisoning or adversarial campaigns. Set alerting thresholds on confidence scores, output distributions, and error rates.

Frequently Asked Questions

What are the most critical AI security risks for enterprises in 2025?

Prompt injection (both direct and indirect) and supply chain compromise rank as the most critical AI security risks. Prompt injection is widely exploitable with minimal skill, while supply chain attacks on model repositories can compromise entire AI pipelines. OWASP rates both as top-priority threats requiring immediate mitigation controls.

How does data poisoning differ from adversarial attacks on AI models?

Data poisoning targets the training phase by corrupting the dataset, embedding backdoors or degrading accuracy over time. Adversarial attacks target the inference phase by crafting inputs that exploit learned model weaknesses. Both manipulate model behaviour, but poisoning requires access to training data while adversarial attacks only need query access to the deployed model.

Which frameworks should you use to assess AI security risks?

Use NIST AI RMF 1.0 for enterprise-level AI risk governance across the Govern, Map, Measure, and Manage functions. Use the OWASP Top 10 for LLM Applications for application-layer security specific to large language models. Together, they cover strategic risk management and tactical vulnerability mitigation for AI systems.