ICS/OT Incident Response: Why Your IT Runbook Won't Save the Plant

Your IT incident response playbook says “isolate the affected host.” Good advice in the data centre. Catastrophic advice on a plant floor where the affected host is running the HMI that the operator is currently using to keep a reactor from overpressurising.

ICS/OT incident response is not a flavour of IT IR. It is a different discipline, with different priorities, different actions, and different people in the room.

The priorities are inverted

IT incident response, simplified: Confidentiality, then Integrity, then Availability. Protect the data first. Keep systems trustworthy. Availability matters, but you would rather a system be down than leaking.

OT incident response, simplified: Safety, then Availability, then Integrity, then Confidentiality. A human life comes first. Keeping the process running safely comes second. Data confidentiality is a distant fourth.

This is not pedantic. It dictates every decision in the first hour of an incident.
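To make the inversion concrete, here is a minimal sketch that picks between two candidate actions by checking which priorities each one harms, most important first. The action names and their "harms" labels are hypothetical, chosen only for illustration; a real decision needs process engineers in the room, not a lookup table.

```python
# Priority orderings as described above, most important first.
OT_ORDER = ["safety", "availability", "integrity", "confidentiality"]
IT_ORDER = ["confidentiality", "integrity", "availability"]


def pick_action(options, order):
    """Return the option whose harms land on the least important priorities."""

    def cost(option):
        # One flag per priority, most important first. Tuples compare
        # left to right, so harming a higher priority always costs more.
        return tuple(1 if p in option["harms"] else 0 for p in order)

    return min(options, key=cost)


# Hypothetical options and harm labels (assumptions for illustration).
options = [
    {"name": "isolate HMI now", "harms": {"safety", "availability"}},
    {"name": "monitor and restrict remote access", "harms": {"confidentiality"}},
]

print(pick_action(options, IT_ORDER)["name"])
print(pick_action(options, OT_ORDER)["name"])
```

Same incident, same two options, and the two orderings disagree: the IT ordering prefers isolation, the OT ordering prefers to keep the operator's view of the process alive.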

The six things an OT IR plan has to answer that an IT plan doesn’t

1. Who has authority to take the plant down?

Not the SOC analyst. Not the CISO. Usually the site manager or the duty operations manager, and even they often need approval from operations leadership. Your plan must name the actual human, their backup, and the escalation path at 2 AM.

2. What is the safe fallback state for each process?

A chemical plant has a safe shutdown procedure. A water utility has a manual override. A power station has a black-start procedure. Your IR plan must know which fallback applies to which scenario, and it must be written by people who understand the process, not just the network.

3. Can you still run without the compromised system?

Some plants can hand-operate for a few hours. Some cannot. Your IR plan must know, in advance, which systems have manual fallback and which do not. If your SCADA is compromised and there is no manual fallback, “isolate the host” means “stop production.”
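One way to record this in advance is a small inventory that states, per system, whether manual fallback exists and roughly for how long. The sketch below is illustrative only: the system names and hours are invented, and the real values have to come from operations staff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlSystem:
    name: str
    # Hours the process can be hand-operated; None means no manual fallback.
    manual_fallback_hours: Optional[float]


# Hypothetical inventory entries (assumptions, not real plant data).
INVENTORY = {
    "scada-server": ControlSystem("scada-server", None),
    "packaging-hmi": ControlSystem("packaging-hmi", 4.0),
}


def isolation_consequence(system_name: str) -> str:
    """Describe what isolating a system means in operations terms."""
    system = INVENTORY[system_name]
    if system.manual_fallback_hours is None:
        return f"{system.name}: no manual fallback, isolating it stops production"
    hours = f"{system.manual_fallback_hours:g}"
    return f"{system.name}: manual operation possible for about {hours} hours"


for name in INVENTORY:
    print(isolation_consequence(name))
```

The point of writing it down is that the answer is looked up during the incident, not debated.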

4. Who do you have to tell, and when?

Regulators (CERT-In, CISA, NIS2 competent authorities). Insurers. Customers whose deliveries depend on your plant. The local authority if there is public safety impact. Each has a clock. The IR plan must list them all with the trigger conditions.
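A notification list with clocks can be kept as data and turned into concrete deadlines the moment an incident is declared. In the sketch below the recipients, trigger names, and time windows are placeholders, not the actual obligations of any regulator or contract; replace them with the values that apply to your sites.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Notification:
    recipient: str
    trigger: str       # condition that starts the clock
    window: timedelta  # time allowed after detection


# Placeholder entries: every window here is an assumption for illustration.
CHECKLIST = [
    Notification("national regulator", "confirmed incident", timedelta(hours=24)),
    Notification("insurer", "confirmed incident", timedelta(hours=48)),
    Notification("local authority", "public safety impact", timedelta(hours=1)),
]


def deadlines(detected_at, active_triggers):
    """Return (recipient, due time) pairs for active triggers, soonest first."""
    due = [
        (n.recipient, detected_at + n.window)
        for n in CHECKLIST
        if n.trigger in active_triggers
    ]
    return sorted(due, key=lambda item: item[1])


detected = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
for recipient, due_at in deadlines(detected, {"confirmed incident"}):
    print(recipient, due_at.isoformat())
```

Sorting by due time gives the shift supervisor a single ordered list of who to call next.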

5. What evidence can you collect without making things worse?

In IT, you take a memory dump. In OT, you probably cannot. The HMI you would dump is controlling a live process, and you are not rebooting it in the middle of an incident. Evidence collection in OT is constrained by operational tempo. Plan what you can collect, how, and by whom.

6. How do you know it is really over?

IT incidents end when the threat actor is evicted. OT incidents end when the threat is evicted and the process is back to a known-good operating state, which is often much later. Your IR plan must define “recovered” in operations terms, not just cyber terms.
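The difference between the two definitions of "over" can be written down as two checks. The criteria below are example fields, not an authoritative list; each plant would define its own with operations leadership.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStatus:
    # Example criteria only (assumptions for illustration).
    threat_evicted: bool
    controller_logic_verified: bool
    process_in_normal_operating_range: bool
    operations_signoff: bool


def cyber_recovered(status: RecoveryStatus) -> bool:
    """The IT-style definition: the threat actor is out."""
    return status.threat_evicted


def ot_recovered(status: RecoveryStatus) -> bool:
    """The OT-style definition: eviction plus a known-good operating state."""
    return all(
        [
            status.threat_evicted,
            status.controller_logic_verified,
            status.process_in_normal_operating_range,
            status.operations_signoff,
        ]
    )


status = RecoveryStatus(
    threat_evicted=True,
    controller_logic_verified=True,
    process_in_normal_operating_range=False,
    operations_signoff=False,
)
print(cyber_recovered(status), ot_recovered(status))
```

In this example the cyber check passes while the operations check does not, which is exactly the gap the plan has to cover.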

Three lessons from real incidents

Lesson 1: The people who respond are not the people who wrote the plan. The shift supervisor at 2 AM on a Sunday is the one executing your runbook. Write it for them. Use plain language. Assume they have five minutes to find what they need.

Lesson 2: Communication is the failure mode, not the technology. In every post-incident review I have been part of, the technical response was mostly fine. The communication (between IT and OT, between site and HQ, between company and vendor) is what broke. Build drills around communication, not just technical containment.

Lesson 3: The plan rots if you do not exercise it. An OT IR plan that has not been tabletopped in 12 months is approximately useless. Run a tabletop every six months. Include the shift supervisors. Include the vendor. Include the regulator if they will come.
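The exercise cadence in Lesson 3 is easy to track mechanically. This is a small sketch using the six-month and twelve-month thresholds from the text; the status labels and the day counts used to approximate those periods are my own assumptions.

```python
from datetime import date

SIX_MONTHS_DAYS = 182      # approximation of six months
TWELVE_MONTHS_DAYS = 365   # approximation of twelve months


def tabletop_status(last_exercise: date, today: date) -> str:
    """Classify a plan by how long ago it was last tabletopped."""
    age_days = (today - last_exercise).days
    if age_days > TWELVE_MONTHS_DAYS:
        return "stale"    # not exercised in over a year
    if age_days > SIX_MONTHS_DAYS:
        return "overdue"  # missed the six-month cadence
    return "current"


print(tabletop_status(date(2024, 1, 1), date(2024, 9, 1)))
```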

A minimal viable OT IR plan looks like this

If you have nothing today, start here:

A one-page summary naming the incident commander, the authority to stop production, the call tree, and the top three scenarios (ransomware on the business network, compromise of an engineering workstation, unusual behaviour on the control network).
A scenario-specific runbook for each of those three, under five pages each.
A regulatory notification checklist with the clocks and the recipients.
A quarterly tabletop calendar.

That is the floor. Above this, you add depth. Below this, you do not really have an IR plan; you have a document.

RelyBlue runs OT incident response planning engagements and facilitates tabletop exercises (TTX) that include plant-side staff, IT, and vendors. Talk to us if your current plan has not been exercised this year.