You will have a security incident. The question is whether your organisation will respond to it with a coordinated process that limits damage and meets legal obligations, or with an improvised scramble that makes everything worse. Incident response planning is the work you do before something goes wrong so that when it does, the decisions are already made and the people involved know their roles.
This guide covers the complete incident response lifecycle for UK organisations: the NIST framework that structures the process, the practical mechanics of each phase, the tooling that makes response faster and more consistent, tabletop exercises that validate your plan before you need it, and the UK regulatory obligations that govern breach notification to the ICO and affected individuals.
Table of Contents
- Why Incident Response Planning Is Not Optional
- The NIST Incident Response Framework
- Phase 1: Preparation
- Phase 2: Detection and Analysis
- Phase 3: Containment
- Phase 4: Eradication
- Phase 5: Recovery
- Phase 6: Post-Incident Review
- SIEM and SOAR: Accelerating Detection and Response
- Tabletop Exercises and Plan Validation
- UK Regulatory Requirements: ICO and Breach Notification
- Communication During an Incident
- Frequently Asked Questions
Why Incident Response Planning Is Not Optional
The average time to identify a breach is still measured in months, not hours. The average time to contain it, once identified, extends the exposure further. During that window, attackers establish persistence, move laterally, exfiltrate data, and in ransomware scenarios, stage their encryption payload. The damage is largely done before the incident response process begins.
Organisations with mature incident response capabilities consistently show better outcomes on the metrics that matter: faster detection (reducing the dwell time during which attackers operate undetected), faster containment (reducing the blast radius once detection occurs), lower total breach cost, and cleaner regulatory interactions (because notifications are accurate and timely rather than incomplete and late).
The incident types UK organisations face in 2026 span ransomware deployments that encrypt business-critical systems and demand payment for restoration, business email compromise that redirects financial transactions, supply chain attacks that use trusted software updates to install backdoors, data theft campaigns targeting customer and employee records for downstream fraud, and denial-of-service attacks against public-facing infrastructure. Your incident response plan needs to address the scenarios relevant to your threat profile, not just theoretical attack categories.
Understanding the specific attack categories your plan must address is a prerequisite for building a realistic response. The detailed analysis of ransomware types and how they operate covers the operational mechanics that shape containment and recovery decisions. The overview of brute force attack patterns covers the credential-based attack scenarios that frequently precede more serious incidents.
The NIST Incident Response Framework
NIST Special Publication 800-61 Revision 2 (Computer Security Incident Handling Guide) defines the incident response lifecycle that most enterprise IR programmes are built on. The framework organises the process into four phases: Preparation, Detection and Analysis, Containment/Eradication/Recovery, and Post-Incident Activity. This guide treats containment, eradication, and recovery as separate phases, which is why the sections below are numbered one to six. The phases are sequential but not strictly linear: you may cycle back from recovery to containment if new indicators of compromise are discovered.
The framework is technology-agnostic and scale-agnostic: it applies equally to a small security team at a mid-sized organisation and a large SOC at a multinational. The mechanics differ but the process structure is the same. NIST 800-61 is not the only framework; SANS also publishes widely used IR guidance, and ISO/IEC 27035 provides a formal international standard for incident management. The practical differences between them are smaller than the shared structure.
Phase 1: Preparation
Preparation is the phase that determines whether your response to an incident is competent or chaotic. It covers everything you do before an incident occurs: defining what constitutes an incident, building and testing your incident response plan, assembling and training your incident response team, deploying the tools you will need during a response, and establishing the relationships with legal, PR, and regulators that you will need to activate quickly.
Defining Severity Levels
Not every security event is a major incident. You need a classification scheme that allows responders to quickly determine the appropriate response level without convening a committee. A practical approach is four severity levels: P1 (critical, active business impact or confirmed data breach requiring immediate executive involvement), P2 (high, significant system compromise or imminent breach risk requiring urgent response), P3 (medium, contained compromise with limited impact requiring next-business-day response), and P4 (low, suspicious activity requiring investigation but no immediate action). Define the criteria for each level specifically enough that responders can classify consistently without ambiguity.
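One way to keep classification consistent is to encode the criteria as an ordered set of checks. The sketch below is illustrative only: the field names on `EventFacts` are assumptions standing in for whatever criteria your organisation defines, and the mapping simply mirrors the four levels described above.

```python
from dataclasses import dataclass


@dataclass
class EventFacts:
    """Facts a responder can establish quickly (all field names are hypothetical)."""
    confirmed_data_breach: bool = False
    active_business_impact: bool = False
    significant_system_compromise: bool = False
    imminent_breach_risk: bool = False
    contained_limited_compromise: bool = False


def classify_severity(facts: EventFacts) -> str:
    """Return P1 to P4, checking the most severe criteria first."""
    if facts.confirmed_data_breach or facts.active_business_impact:
        return "P1"  # critical: immediate executive involvement
    if facts.significant_system_compromise or facts.imminent_breach_risk:
        return "P2"  # high: urgent response
    if facts.contained_limited_compromise:
        return "P3"  # medium: next-business-day response
    return "P4"      # low: investigate, no immediate action
```

Checking the most severe criteria first means an event that satisfies several levels is always classified at the highest one, which is usually the safer default.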
The Incident Response Team
An effective IR team covers several distinct functions: the incident commander who makes the decisions and communicates the status, the technical lead who directs the hands-on investigation and containment, communications and legal representatives who manage internal and external communication, and subject matter experts who can be called in based on the incident type (cloud team for cloud incidents, network team for network incidents). Define roles in advance, assign backups for each role, and ensure everyone on the team has read and understands the plan.
Retaining an external IR firm on a pre-negotiated retainer is standard practice for organisations without large in-house security teams. The retainer agreement should define the engagement terms, response time commitments, and escalation process in advance rather than negotiating under pressure during an active incident. Mandiant, CrowdStrike, Secureworks, and NCC Group all offer retainer-based IR services with UK coverage.
Asset Inventory and Critical System Mapping
You cannot protect what you do not know you have. A current, accurate asset inventory is foundational to incident response: it tells you which systems are affected, which systems are at risk, and which systems are business-critical and therefore prioritised for recovery. Map your critical systems explicitly, including their dependencies, data flows, recovery time objectives, and recovery point objectives. This mapping should be stored offline or in a resilient location that an attacker who has compromised your primary systems cannot access.
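A critical system map can be as simple as a list of records with dependencies and recovery objectives. The following sketch assumes a hypothetical inventory (the system names and RTO/RPO figures are invented) and shows how such a map could be turned into a recovery order in which dependencies come first and, among systems that are ready to restore, the shortest recovery time objective wins.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class CriticalSystem:
    name: str
    rto_hours: float  # recovery time objective
    rpo_hours: float  # recovery point objective
    depends_on: list = field(default_factory=list)


def recovery_order(systems):
    """Order systems so dependencies are restored first, shortest RTO first
    within each group of systems whose dependencies are already restored.
    Assumes every name in depends_on is itself in the inventory."""
    by_name = {s.name: s for s in systems}
    sorter = TopologicalSorter({s.name: s.depends_on for s in systems})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda n: (by_name[n].rto_hours, n))
        for name in ready:
            order.append(name)
            sorter.done(name)
    return order


# Hypothetical inventory for illustration only.
INVENTORY = [
    CriticalSystem("identity", rto_hours=2, rpo_hours=1),
    CriticalSystem("dns", rto_hours=1, rpo_hours=24),
    CriticalSystem("database", rto_hours=4, rpo_hours=1, depends_on=["identity"]),
    CriticalSystem("web-portal", rto_hours=1, rpo_hours=4, depends_on=["database", "identity"]),
    CriticalSystem("reporting", rto_hours=24, rpo_hours=24, depends_on=["database"]),
]
```

Whatever form the map takes, the output of a calculation like this belongs in the offline copy, since it is exactly what you will need when the primary systems are unavailable.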
Phase 2: Detection and Analysis
Detection is the process of identifying that an incident has occurred or is occurring. Analysis is the process of understanding what happened, which systems are affected, and what the attacker has done. Together they define what you are responding to, which determines everything about how you respond.
Detection Sources
Security incidents are detected through multiple channels: automated alerts from SIEM systems, endpoint detection and response (EDR) tools, and network monitoring; helpdesk reports from users experiencing anomalous system behaviour; alerts from cloud provider security services; threat intelligence feeds identifying indicators of compromise matching your environment; and in some cases, external notification from law enforcement, partner organisations, or security researchers.
Detection quality is uneven across these sources. Automated alerts generate high volumes with significant false positive rates that require analyst triage. User reports are often the first indication of ransomware or account compromise but rely on users recognising and reporting anomalous behaviour. External notifications are reliable when they arrive but may come well after the attacker has established persistence. Effective detection requires all of these sources working in combination, not any single one in isolation.
Initial Analysis and Scoping
When a potential incident is identified, the first analytical task is determining whether it is genuine and understanding its initial scope. This involves reviewing available logs, looking for the indicators of compromise associated with the reported activity, mapping the timeline of events as far back as the available data allows, and identifying which systems and accounts show signs of involvement. The goal is not a complete forensic analysis at this stage; it is establishing enough understanding to make good containment decisions.
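The scoping steps above can be sketched as a small function over normalised log events. The event schema here (`timestamp`, `host`, `account`, `indicator`) is an assumption for illustration; real log sources will need parsing and normalisation before anything like this applies.

```python
from datetime import datetime


def scope_incident(events, iocs):
    """Filter events that match known indicators of compromise, order them
    into a timeline, and list the hosts and accounts involved."""
    matched = [e for e in events if e["indicator"] in iocs]
    matched.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))
    return {
        "timeline": matched,
        "earliest": matched[0]["timestamp"] if matched else None,
        "hosts": sorted({e["host"] for e in matched}),
        "accounts": sorted({e["account"] for e in matched}),
    }


# Hypothetical events and indicators, using a documentation-range IP address.
EVENTS = [
    {"timestamp": "2026-03-02T10:05:00+00:00", "host": "fs01", "account": "svc-backup", "indicator": "203.0.113.7"},
    {"timestamp": "2026-03-01T23:41:00+00:00", "host": "ws17", "account": "jsmith", "indicator": "203.0.113.7"},
    {"timestamp": "2026-03-02T09:00:00+00:00", "host": "ws22", "account": "akhan", "indicator": "198.51.100.1"},
    {"timestamp": "2026-03-02T11:30:00+00:00", "host": "fs01", "account": "jsmith", "indicator": "bad.example.net"},
]
IOCS = {"203.0.113.7", "bad.example.net"}
```

The earliest matching event is only a lower bound on when the compromise began: it reflects how far back the available logs go, not necessarily when the attacker arrived.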
Log availability is a persistent problem in incident response. Organisations that have not invested in centralised logging and appropriate log retention frequently find that the evidence needed to understand an incident is not available because logs were overwritten or never collected. Security logging requirements should be defined in advance: which systems log to a central SIEM, what events are logged, and how long logs are retained. The guidance on staying safe from rootkits and on Linux distribution security against malware covers the host-level logging and integrity mechanisms that feed incident response detection.
Phase 3: Containment
Containment stops the incident from getting worse while you work on eradication and recovery. It is the decision point that involves the most difficult trade-offs in incident response: containment actions that are more aggressive (taking systems offline, blocking network segments) are more effective at stopping the incident but cause more operational disruption. Less aggressive containment (monitoring without blocking, isolating only confirmed compromised systems) is less disruptive but allows the attacker to continue operating.
Short-Term Containment
Short-term containment is the immediate action taken to stop the bleeding: isolating confirmed compromised systems from the network, revoking credentials that are known or suspected to be compromised, blocking specific network indicators (malicious IP addresses, domains) at the firewall or DNS level, and disabling specific user accounts. These actions should be reversible and documented; you need to be able to undo them cleanly once eradication is complete.
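Keeping containment reversible and documented is easier if every action is recorded together with its rollback step at the moment it is taken. This is a minimal sketch of such a log; the class and field names are hypothetical, and in practice the record would live in your ticketing or case management system rather than in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ContainmentAction:
    action: str
    target: str
    performed_by: str
    rollback: str  # how to undo this action once eradication is complete
    performed_at: datetime
    reverted_at: Optional[datetime] = None


class ContainmentLog:
    def __init__(self):
        self.actions = []

    def record(self, action, target, performed_by, rollback):
        """Log an action with a UTC timestamp and its rollback instruction."""
        entry = ContainmentAction(action, target, performed_by, rollback,
                                  datetime.now(timezone.utc))
        self.actions.append(entry)
        return entry

    def mark_reverted(self, entry):
        entry.reverted_at = datetime.now(timezone.utc)

    def outstanding(self):
        """Actions still in force, i.e. what must be undone at closure."""
        return [a for a in self.actions if a.reverted_at is None]
```

A typical use would be `log.record("isolate host", "ws17", "analyst-1", "re-enable switch port")`, followed by a review of `log.outstanding()` before the incident is closed.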
Evidence Preservation
Before performing any remediation actions on compromised systems, preserve the forensic evidence. Take memory dumps of running systems (memory is volatile and its contents are lost when a system is powered off or restarted). Preserve disk images before wiping and reimaging. Capture network traffic logs. Document the state of the system before any changes are made. Evidence preservation matters both for understanding the incident and for regulatory and legal purposes: if you need to demonstrate to the ICO or in legal proceedings what happened and when, you need preserved evidence to support that account.
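Preserved evidence is more useful if you can later show it has not changed since collection. A common approach is to record a cryptographic hash of each image or dump at the point of acquisition. The sketch below uses SHA-256 from the Python standard library; the record fields and function names are assumptions for illustration, not a forensic standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def hash_evidence(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks so
    large disk images and memory dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path, collected_by):
    """Build a simple chain-of-custody entry for one evidence file."""
    evidence = Path(path)
    return {
        "file": evidence.name,
        "size_bytes": evidence.stat().st_size,
        "sha256": hash_evidence(evidence),
        "collected_by": collected_by,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Re-running `hash_evidence` on the same file at a later date and comparing the digest with the stored record gives a quick integrity check before the evidence is relied on.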
Longer-Term Containment
Longer-term containment involves establishing a stable, isolated environment for affected systems that allows business operations to continue while eradication is completed. This might involve rebuilding systems on clean infrastructure, restoring from verified clean backups to temporary environments, or temporarily running degraded services. The objective is to buy time for thorough eradication without maintaining extended disruption to the organisation.
Phase 4: Eradication
Eradication removes the attacker and every trace of their foothold (malware, backdoors, rogue accounts, and other persistence mechanisms) from your environment. This is the phase most organisations rush, and rushing is why many are re-compromised: they remove the obvious malware and restore services without completing a thorough investigation of how the attacker gained access, how far they moved, and what persistence mechanisms they established.
Root Cause Analysis
Before eradication can be complete, you need to understand the root cause: how did the attacker gain initial access? A phishing email that delivered a malicious attachment, a vulnerable internet-facing application, a compromised third-party credential, a supply chain attack? Without understanding the initial access vector, you cannot be confident you have closed it. If the attacker entered through a VPN credential stuffing attack and you do not reset all VPN credentials and enable MFA, they can re-enter the same way after you have cleaned up.
For server-level hardening and configuration controls that prevent common initial access vectors, the guidance on securing SSH servers covers the authentication and access control configurations that prevent the credential-based attacks most commonly used for initial access to Linux systems.
Removing Persistence Mechanisms
Sophisticated attackers establish multiple persistence mechanisms so that removing the obvious one does not eject them from the environment. Common mechanisms include scheduled tasks, registry run keys, service installations, web shells on internet-facing systems, and modified system binaries. A thorough eradication process requires reviewing all persistence locations systematically, not just the ones identified in the initial investigation. Autoruns (for Windows) provides a comprehensive view of persistence locations. On Linux systems, review cron jobs, systemd units, SSH authorised keys, and modified binaries for all accounts including service accounts.
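For the Linux review, one quick triage aid is to list files in common persistence locations that changed after the suspected start of the intrusion. The sketch below is a starting point under stated assumptions: the path list is illustrative and not exhaustive (per-user `authorized_keys` files and user crontabs need enumerating separately), and modification times can be forged by an attacker, so an empty result proves nothing.

```python
from datetime import datetime, timezone
from pathlib import Path

# Illustrative starting list only; extend it for your distributions.
LINUX_PERSISTENCE_PATHS = [
    "/etc/crontab",
    "/etc/cron.d",
    "/var/spool/cron",
    "/etc/systemd/system",
    "/root/.ssh/authorized_keys",
]


def modified_since(paths, since):
    """Return (mtime, path) pairs for files under the given paths whose
    modification time is at or after `since`, oldest first."""
    cutoff = since.timestamp()
    hits = []
    for raw in paths:
        root = Path(raw)
        if not root.exists():
            continue
        candidates = [root] if root.is_file() else [p for p in root.rglob("*") if p.is_file()]
        for candidate in candidates:
            mtime = candidate.stat().st_mtime
            if mtime >= cutoff:
                hits.append((datetime.fromtimestamp(mtime, tz=timezone.utc), str(candidate)))
    return sorted(hits)
```

Run against a forensic copy or a mounted image where possible, so that the review itself does not alter the system under investigation.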
Phase 5: Recovery
Recovery restores affected systems to normal operation in a verified clean state. The sequence matters: returning systems to production before eradication is confirmed risks re-infecting a clean environment from a remaining attacker foothold.
Restoration Sequencing
Restore in order of business criticality, but verify cleanliness before each restoration. For systems being restored from backup, confirm that the backup pre-dates the initial compromise, not just the detection. Attackers often establish persistence weeks or months before executing their end objective; a backup taken after compromise contains the attacker’s tools. Use a known-clean backup or rebuild from a verified clean image with your configuration applied, rather than relying on a backup whose integrity you cannot confirm.
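The difference between choosing a backup relative to detection and relative to initial compromise is easy to show with dates. In this sketch the backup schedule and incident timestamps are invented for illustration.

```python
from datetime import datetime


def latest_clean_backup(backup_times, initial_compromise):
    """Return the most recent backup taken strictly before the initial
    compromise, or None if every backup post-dates it."""
    clean = [t for t in backup_times if t < initial_compromise]
    return max(clean) if clean else None


# Hypothetical weekly backups and incident timeline.
BACKUPS = [
    datetime(2026, 1, 4),
    datetime(2026, 1, 11),
    datetime(2026, 1, 18),
    datetime(2026, 1, 25),
]
INITIAL_COMPROMISE = datetime(2026, 1, 9, 22, 40)
DETECTED = datetime(2026, 1, 27, 8, 15)
```

Here `latest_clean_backup(BACKUPS, INITIAL_COMPROMISE)` selects the 4 January backup, whereas filtering on `DETECTED` would select 25 January, more than two weeks after the attacker arrived. A backup that pre-dates the compromise is only a candidate; its integrity still needs verifying before restoration.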
Enhanced Monitoring During Recovery
Maintain heightened monitoring for at least 30 days after recovery. Watch specifically for re-emergence of the indicators of compromise identified during the incident, anomalous authentication patterns from accounts that were affected, and network traffic to or from the infrastructure used by the attacker. Re-compromise often manifests first as subtle reuse of the attacker’s known tactics rather than an entirely fresh attack chain.
Phase 6: Post-Incident Review
The post-incident review, sometimes called a lessons-learned or after-action review, is the process of extracting maximum value from the experience of responding to an incident. It is not a blame exercise; it is a structured analysis of what happened, how the response performed against the plan, and what changes to process, technology, or training would improve future outcomes.
Review Structure
Conduct the review within two weeks of incident closure while details are fresh. Invite all participants in the response, including representatives from business units affected, not just the security team. Review the timeline: when was the incident detected relative to when it began, what was the time to containment, what was the time to recovery? Assess each phase of the response against your plan: did detection mechanisms work as expected, were containment decisions sound, did eradication miss anything that required a second pass?
Improvement Tracking
The review generates improvement actions. Track them, assign owners, and set completion dates. Without a tracking mechanism, the insights from the review evaporate and the same failures recur in the next incident. Common improvement categories include: deploying additional logging that was identified as missing during the investigation, updating the IR plan based on gaps found during response, implementing additional technical controls that would have prevented or limited the incident, and updating training to address knowledge gaps observed during response.
SIEM and SOAR: Accelerating Detection and Response
Security Information and Event Management (SIEM) platforms aggregate logs and events from across your environment, correlate them against detection rules, and generate alerts for analyst investigation. Security Orchestration, Automation, and Response (SOAR) platforms automate the repetitive tasks in incident response workflows, from triaging alerts to executing containment actions, freeing analysts for the higher-judgement tasks that automation cannot handle.
SIEM Platform Selection
The major SIEM platforms in enterprise use are Microsoft Sentinel (cloud-native, strong Azure integration, consumption-based pricing that scales with data volume), Splunk Enterprise Security (mature, highly customisable, significant operational complexity), IBM QRadar (strong compliance reporting, on-premises or cloud deployment), LogRhythm (mid-market focus, integrated SOAR capabilities), and Elastic Security (open-source foundation, attractive for organisations with in-house engineering capability).
SIEM value is determined almost entirely by the quality of the detection content (rules and correlation logic) and the quality of the log sources feeding it. A SIEM with excellent detection content and poor log coverage misses significant attack activity. A SIEM with comprehensive log coverage and poor detection content drowns analysts in noise. The deployment is the easy part; the ongoing tuning and detection engineering is where the real work is.
SOAR for Response Automation
SOAR platforms such as Palo Alto Networks Cortex XSOAR, Splunk SOAR, ServiceNow Security Operations, and Microsoft Sentinel Automation (using Logic Apps) can automate the initial triage steps for common alert types: enriching an alert with threat intelligence context, looking up whether an IP address has been seen before, pulling the full account activity for a flagged user, and initiating a containment action such as disabling a user account or blocking an IP address after analyst approval.
Start automation with high-confidence, low-blast-radius actions. Automated account lockout when a user exceeds a specified number of failed authentications is a safe starting point. Automated firewall rule changes based on alert correlation require more caution: the blast radius of a false positive is higher, and there is real value in having a human analyst review the specific context before approving the action.
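The distinction between automatic and approval-gated actions can be sketched in a few lines. This is not the API of any SOAR product: the class, the action names, and the thresholds are all assumptions, and the sliding-window counter assumes failures arrive in time order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical policy: only low-blast-radius actions run without approval.
AUTO_APPROVED_ACTIONS = {"lock_account"}


class FailedAuthMonitor:
    """Flags an account once it exceeds max_failures within the window."""

    def __init__(self, max_failures=5, window=timedelta(minutes=10)):
        self.max_failures = max_failures
        self.window = window
        self._failures = defaultdict(deque)

    def record_failure(self, account, when):
        """Record one failed authentication; return True if the account
        has now exceeded the threshold inside the sliding window."""
        recent = self._failures[account]
        recent.append(when)
        # Drop failures that have aged out of the window.
        while when - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_failures


def may_execute(action, analyst_approved=False):
    """Auto-approved actions always run; anything else needs an analyst."""
    return action in AUTO_APPROVED_ACTIONS or analyst_approved
```

With this policy, `may_execute("lock_account")` is allowed immediately, while `may_execute("block_ip_range")` stays blocked until an analyst has reviewed the context and passed `analyst_approved=True`.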
Tabletop Exercises and Plan Validation
A tabletop exercise is a facilitated discussion-based simulation of an incident scenario. It walks the IR team through a specific scenario (a ransomware deployment, a data breach, a cloud account compromise) and tests how the team would apply the IR plan to that scenario. The value is identifying gaps in the plan, unclear decision authorities, communication failures, and missing capabilities before you face them in a real incident.
Designing Effective Scenarios
The most valuable scenarios are based on realistic threat actor behaviour relevant to your sector. For a UK financial services organisation, that might be a business email compromise followed by fraudulent payment instruction. For a healthcare organisation, it might be a ransomware deployment targeting patient record systems with a UK GDPR notification obligation. For a technology company, it might be a software supply chain compromise via a dependency confusion attack.
Build inject points into the scenario to test specific decision points: what do you do when the legal team confirms there is a notification obligation, what do you do when you discover the backup environment has also been compromised, what do you do when the attacker begins leaking data publicly while you are still in containment? These inject points test the depth of your plan rather than just the outline.
Full Simulation Exercises
Tabletop exercises test decision-making but not technical execution. Full simulation exercises, sometimes called red team/blue team exercises, test the entire response stack including detection capability, tool proficiency, and actual containment mechanics. These are more expensive and operationally intensive but provide a much more comprehensive assessment of response capability. Plan for at least one full simulation exercise annually for high-risk or regulated environments.
UK Regulatory Requirements: ICO and Breach Notification
UK GDPR Article 33 requires that personal data breaches be notified to the ICO within 72 hours of the organisation becoming aware of a breach, unless the breach is unlikely to result in a risk to the rights and freedoms of natural persons. Article 34 requires notification to affected individuals when the breach is likely to result in a high risk to their rights and freedoms. Both notification obligations can be triggered by an incident that compromises personal data, regardless of whether the organisation was at fault.
What Constitutes a Reportable Breach
A personal data breach under UK GDPR means any breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to personal data. This covers not just data theft but also ransomware that encrypts personal data without exfiltrating it (destruction/loss), accidentally emailing personal data to the wrong recipient (unauthorised disclosure), and a third-party processor suffering a breach affecting your data (you must still notify as the data controller).
The ICO publishes decision trees and guidance on assessing whether a specific incident meets the reporting threshold. The practical challenge is making this assessment quickly enough to meet the 72-hour window. Your IR plan should include a standard initial assessment checklist that the privacy team or DPO can apply immediately when an incident involves personal data, so the notification decision is structured rather than ad hoc.
The 72-Hour Clock
The 72-hour notification window starts from when the organisation becomes “aware” of a breach, not from when the breach occurred. In practice, the ICO interprets “aware” as when the organisation has a reasonable degree of certainty that a breach has occurred, not when every detail is known. You do not need to delay notification until the investigation is complete; the ICO expects a phased notification where an initial report is submitted within 72 hours and supplementary information is provided as the investigation progresses.
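Because the clock runs from the moment of awareness, it helps to record that timestamp explicitly and compute the deadline from it. A minimal sketch, assuming you log the awareness time as a timezone-aware datetime (the function names are illustrative):

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def ico_deadline(aware_at):
    """Return the latest time for the initial notification, measured
    from when the organisation became aware of the breach."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at, now):
    """Hours left before the deadline (negative once it has passed)."""
    return (ico_deadline(aware_at) - now).total_seconds() / 3600
```

For example, awareness at 14:30 UTC on 2 March 2026 gives a deadline of 14:30 UTC on 5 March 2026. Requiring a timezone-aware value avoids ambiguity when responders are working across time zones or a clock change.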
Submit an initial notification to the ICO even if you do not yet know the full scope. The report should cover: the nature of the breach and approximate number of data subjects and records affected, the categories of personal data involved, the likely consequences of the breach, and the measures taken or proposed to address it. Update the report as further information becomes available.
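The items listed above can be kept as a simple structure so that it is obvious what is still unknown when the initial report goes in. The field names below are hypothetical and do not correspond to the ICO's own reporting form.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class InitialBreachReport:
    """Content of an initial notification (field names are illustrative)."""
    nature_of_breach: Optional[str] = None
    approx_data_subjects: Optional[int] = None
    approx_records: Optional[int] = None
    data_categories: Optional[list] = None
    likely_consequences: Optional[str] = None
    measures_taken_or_proposed: Optional[str] = None

    def missing(self):
        """Names of fields not yet filled in, to be supplied in follow-ups."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, "", [])]
```

An incomplete report is still worth submitting within the window; `missing()` then doubles as the list of items to supply in supplementary updates.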
NIS Regulations
The Network and Information Systems (NIS) Regulations 2018 apply to operators of essential services (energy, transport, water, health, digital infrastructure) and relevant digital service providers. They require organisations to take appropriate security measures, implement business continuity management, and notify the relevant competent authority of incidents that have a significant impact on the continuity of essential services. The Regulations require notification without undue delay and in any event no later than 72 hours after the organisation becomes aware of the incident; reporting thresholds and routes are set by each sector's competent authority, so confirm them in advance.
Communication During an Incident
Communication failures are one of the most common ways that incident response deteriorates from a managed process to a public relations crisis. Poor communication takes several forms: internal communications that are too sparse (people do not know what is happening or what they should do), internal communications that are too broad (information about an ongoing incident reaching people who do not need it and potentially the attacker themselves), and external communications that are premature, inaccurate, or inconsistent.
Internal Communication
Establish a secure communication channel for the IR team that is separate from your normal communication infrastructure. If the incident involves a compromised email system or collaboration platform, communications about the incident in that same platform are visible to the attacker. A pre-configured out-of-band channel (a Signal group, a dedicated conference bridge, a separate collaboration tenant) should be established in advance as part of preparation, not improvised during an incident.
Create a communication rhythm: regular brief updates at defined intervals to stakeholders who need awareness without detail, and a more granular information flow within the technical response team. Appoint a single person as the communications coordinator to ensure messages are consistent and that nothing goes out that has not been reviewed.
External Communication
External communication during an incident should involve legal counsel before anything goes to customers, media, regulators, or partners. Statements made during an active incident can create legal liability if they are inaccurate or are later contradicted by the investigation findings. The general principle is to communicate promptly with those who have a legal right to notification (ICO, affected individuals) and carefully with everyone else. Pre-draft communication templates for common incident scenarios as part of your preparation so you are not writing under pressure during an active incident.
Frequently Asked Questions
How long should an incident response plan document be?
A plan that is too long will not be read and will not be followed under pressure. The core IR plan should be concise enough to be usable in an incident, typically 10-20 pages covering roles, severity levels, the response process by phase, key contacts, and decision criteria. Detailed playbooks for specific incident types (ransomware, data breach, account compromise) can be separate documents referenced from the core plan, each covering the specific steps for that scenario without cluttering the main framework.
What is the difference between an IR plan and a disaster recovery plan?
An incident response plan addresses security incidents: the process of detecting, containing, eradicating, and recovering from a security event. A disaster recovery plan addresses recovery from any significant disruption to business operations, including non-security events like infrastructure failures, natural disasters, and utility outages. They overlap in the recovery phase and often reference each other; most organisations maintain both as distinct documents with clear linkages between them.
We are a small organisation. Do we need a formal IR plan?
Yes, sized appropriately. A small organisation cannot maintain a full-time IR team, but it can define who makes decisions during an incident, document the contact details for an external IR firm on retainer, have a process for assessing ICO notification obligations, and maintain an offline copy of critical system information. A one-page decision tree and a contacts list is better than nothing, and nothing is what most small organisations have until they face an incident without it.
How do we know if our incident response plan is actually effective?
Test it. Tabletop exercises expose gaps in decision-making and communication. Full simulation exercises expose gaps in technical detection and response capability. The metrics that indicate IR effectiveness are mean time to detect (how long from incident start to detection), mean time to contain (how long from detection to containment), and breach cost. Track these across incidents and simulations to see whether capability is improving over time.
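The two time-based metrics are straightforward to compute once each incident or simulation has consistent timestamps. The record layout below (`started`, `detected`, `contained`) and the sample data are assumptions for illustration.

```python
from datetime import datetime
from statistics import mean


def mean_hours(intervals):
    """Mean duration in hours across (start, end) datetime pairs."""
    return mean((end - start).total_seconds() for start, end in intervals) / 3600


def response_metrics(incidents):
    """Mean time to detect and mean time to contain, in hours."""
    return {
        "mttd_hours": mean_hours((i["started"], i["detected"]) for i in incidents),
        "mttc_hours": mean_hours((i["detected"], i["contained"]) for i in incidents),
    }


# Hypothetical incident records.
INCIDENTS = [
    {"started": datetime(2026, 1, 1, 0, 0), "detected": datetime(2026, 1, 3, 0, 0),
     "contained": datetime(2026, 1, 3, 6, 0)},
    {"started": datetime(2026, 2, 1, 0, 0), "detected": datetime(2026, 2, 2, 0, 0),
     "contained": datetime(2026, 2, 2, 12, 0)},
]
```

For the sample data, detection took 48 and 24 hours and containment took 6 and 12 hours, giving a mean time to detect of 36 hours and a mean time to contain of 9 hours. With only a handful of incidents a mean is easily skewed by one outlier, so read the trend alongside the individual figures.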
When are we legally required to notify the ICO of an incident?
You must notify the ICO within 72 hours of becoming aware of a personal data breach, unless the breach is unlikely to result in a risk to the rights and freedoms of individuals. This threshold is lower than many organisations assume: most ransomware incidents involving personal data, most data theft incidents, and most cases of personal data sent to wrong recipients meet this threshold. When in doubt, notify: the ICO views proactive notification positively, and failing to notify when required is treated more seriously than notifying when the incident turns out to be below the threshold.
What should we do in the first hour of a confirmed security incident?
Activate your IR team and establish your out-of-band communication channel. Preserve forensic evidence on affected systems before taking any remediation actions. Classify the severity level and confirm the incident commander. Begin logging all response actions with timestamps. Alert legal counsel if personal data appears to be involved. Initiate initial technical analysis to scope which systems are affected. Do not shut down affected systems without considering the forensic implications; take memory dumps first from any system that may hold valuable volatile evidence.