I spend a lot of time with operations leaders who have excellent visibility … but disappointing outcomes. The pattern is that we invest heavily in detection, correlation and dashboards, then depend on human heroics. It’s the smoke-alarm-without-a-sprinkler problem.
From a CIO’s perspective, the competitive metric is no longer just mean time to detect (MTTD). It’s become mean time to contain (MTTC), the time between the first credible signal and a concrete action that limits damage. Shrinking MTTC requires something many organizations lack: a secure, closed-loop operating model that can detect, decide and act.
Today, I’m sharing the playbook I’ve used to get to that cybersecurity model. It’s grounded in infrastructure reality and designed to align IT and security rather than put them at odds.
1. Know what’s on your wire (discovery is a security control)
Containment is impossible if I don’t know what I’m containing. Continuous discovery is integral to cybersecurity because it’s a frontline control, not mere administrative housekeeping. New hosts, services, ephemeral workloads and other “unknown unknowns” are often where trouble starts.
I treat surprise assets and configuration drift as incident classes in their own right. When something appears that the CMDB or inventory can’t account for, my default is to segment or disable it until it’s validated. Ownership mapping matters here: every discovered item should resolve to a responsible team and a lifecycle policy. If discovery is stale, then containment is a gamble.
Questions to ask:
- What shows up in our environment each week that didn’t exist last week?
- Who owns it? What policy does it violate if it’s untagged, unpatched or misconfigured?
- What’s the default action if ownership is unclear?
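The default-action principle above can be sketched as a reconciliation pass: anything discovery sees that the inventory can't account for gets contained first and validated later. The asset names, owner mappings and the "segment" default below are illustrative assumptions, not a specific product's API.

```python
# Hypothetical sketch: reconcile newly observed assets against an inventory.
OBSERVED = {"web-01", "db-02", "tmp-runner-9f3"}          # from discovery scans
INVENTORY = {"web-01": "app-team", "db-02": "data-team"}  # CMDB: asset -> owner

def triage_discovery(observed, inventory, default_action="segment"):
    """Return a per-asset verdict: known assets resolve to a responsible
    owner; surprise assets get the default containment action until
    someone validates them."""
    verdicts = {}
    for asset in sorted(observed):
        owner = inventory.get(asset)
        if owner:
            verdicts[asset] = ("validated", owner)
        else:
            # Unknown ownership: contain first, ask questions after.
            verdicts[asset] = (default_action, None)
    return verdicts

verdicts = triage_discovery(OBSERVED, INVENTORY)
```

The point of the sketch is the asymmetry: a known asset resolves to a team, while an unaccounted-for asset never sits in a gray zone waiting for a human to notice it.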
2. Decide fast, but safely (triage patterns and thresholds)
The second gap I see is decision latency. Teams tend to wait for a perfect picture before they act. They don’t need perfection to contain, though; they need credible signals plus clear thresholds.
I codify decision logic using a simple rubric:
- Auto when risk is high and reversibility is easy (e.g., revoke tokens, move to quarantine VLAN).
- Approve when the signal is strong, but business impact could be material; a human clicks once to authorize a predefined action.
- Ask when context is missing; the system assembles the case and routes it to the right owner fast.
Under the hood, this blends signal, identity, recent change and business criticality. I also use time windows (after-hours policies can be stricter) and blast-radius caps (limit actions to a segment, account or service instance first).
This is what I mean by human-on-the-loop: automation proposes and executes within guardrails while humans steer exceptions.
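The auto/approve/ask rubric reduces to a small decision function. The risk levels and the stricter after-hours behavior below are illustrative thresholds I'd expect each organization to tune, not fixed values from any standard.

```python
def decide(risk, reversible, context_complete, after_hours=False):
    """Map a signal to one of three dispositions: 'auto' (execute within
    guardrails), 'approve' (one-click human authorization), or 'ask'
    (assemble the case and route to the owner).
    Thresholds here are illustrative assumptions."""
    if not context_complete:
        return "ask"          # missing context: route to the right owner
    # After-hours policy is stricter: auto-contain on medium risk too.
    if reversible and (risk == "high" or (after_hours and risk == "medium")):
        return "auto"
    return "approve"          # strong signal, material impact: human clicks once

assert decide("high", reversible=True, context_complete=True) == "auto"
assert decide("high", reversible=False, context_complete=True) == "approve"
```

Keeping the logic this small is deliberate: the rubric should be reviewable by both IT and security in one sitting.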
Questions to ask:
- Which corroborating signals justify auto-containment?
- Where should after-hours policy differ from business hours?
- How do we cap impact if a decision is wrong?
3. Do the thing (trustworthy, reversible containment)
Detection and triage only matter if they lead to action we can trust. I prioritize reversible steps that reliably reduce risk:
- Network: shift endpoint to a quarantine VLAN; throttle or block suspicious egress.
- Endpoint/server: isolate host, stop a specific process, revert a config to last known good.
- Identity: invalidate tokens, rotate keys, force MFA or session kill.
- Cloud/IaC: auto-rollback drifted resources, reapply baseline policy, detach overly permissive roles.
Every action needs baked-in guardrails: canaries, progressive rollout, health checks, automatic rollback and a kill switch. Every action must leave a complete audit trail (who/what/when/why) so security, compliance and post-incident review have evidence.
Principle: auto-contain only what you can quickly reverse.
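The guardrails above can be made concrete in a thin wrapper: every action carries its own rollback, writes a who/what/when/why audit entry, and respects a global kill switch. This is a minimal sketch under assumed names (`contain`, `AUDIT_LOG`), not any vendor's interface.

```python
import datetime

AUDIT_LOG = []       # who/what/when/why evidence trail for review
KILL_SWITCH = False  # global off-switch for all automated containment

def contain(action, target, reason, rollback):
    """Execute a containment action, pairing it with a rollback callable
    and an audit entry; refuse to act if the kill switch is engaged."""
    if KILL_SWITCH:
        return {"status": "skipped", "reason": "kill switch engaged"}
    AUDIT_LOG.append({
        "who": "automation",
        "what": f"{action}:{target}",
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "why": reason,
        "rollback": rollback,  # callable that reverses this exact action
    })
    return {"status": "contained", "rollback": rollback}

# Usage: quarantine a host with a known-good reverse step baked in.
result = contain(
    "quarantine_vlan", "host-42",
    reason="EDR isolation signal corroborated by egress spike",
    rollback=lambda: "host-42 returned to production VLAN",
)
```

The design choice that matters is that an action without a rollback simply cannot be expressed; reversibility is enforced by the call signature, not by policy documents.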
4. Two Week 1 playbooks (high confidence, high payoff)
At this stage, I pick two plays that are both impactful and easy to reverse:
Credential misuse containment
- Trigger: abnormal login pattern (impossible travel, atypical device, repeated failures, then success).
- Action: revoke active tokens, require step-up authentication on next attempt, notify user and security.
- Guardrails: cap to user/account scope; automatic rollback if user verifies legitimate activity.
Suspicious egress throttle
- Trigger: sudden data egress spike or new destination.
- Action: rate-limit or block egress for the source, isolate the host or VPC segment, alert security.
- Guardrails: progressive exposure (start with throttle), auto-revert if a scheduled job or backup is confirmed.
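The egress play's progressive-exposure guardrail can be sketched as a tiered response: stand down for a confirmed job, block only on an extreme spike, throttle on a moderate one. The 3x and 10x multipliers are illustrative assumptions to tune against your baselines.

```python
def egress_response(mb_per_min, baseline_mb, confirmed_job=False):
    """Progressive exposure for the suspicious-egress play.
    Multipliers (3x throttle, 10x block) are illustrative thresholds."""
    if confirmed_job:
        return "revert"      # scheduled backup confirmed: auto-revert limits
    if mb_per_min > 10 * baseline_mb:
        return "block"       # extreme spike: stop egress, isolate the source
    if mb_per_min > 3 * baseline_mb:
        return "throttle"    # moderate spike: rate-limit and alert security
    return "monitor"
```

Starting with a throttle rather than a block keeps a false positive survivable: a legitimate transfer slows down instead of failing outright while a human confirms.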
These tactics cut MTTC dramatically with minimal business disruption. In the long term, they create organizational muscle memory for how automated containment feels when it’s done right.
5. Make IT and security a single operating model
I’ve seen these tactics fail when IT and security operate on different clocks. The fix is a joint operating model:
- A shared taxonomy (severity, asset classes, incident types) and a unified queue where automated actions, approvals and evidence coexist.
- Security curates signals and policy; IT encodes infrastructure actions and guardrails.
- Pre-approved change windows for auto-fixes; clear break-glass paths for exceptions.
- Communication triggers for business owners, legal, compliance and other stakeholders, decided in advance rather than improvised mid-incident.
The point of this strategy is to agree on who decides what, with which safeguards and at what speed.
6. Metrics that matter to the business
I shift reporting from volume to velocity and safety:
- MTTC (mean time to contain): detect to first containment action.
- Percentage auto-contained: incidents that required no human touch to contain.
- False positive rate and auto-rollback rate: speed is only valuable if safety holds.
- Repeat rate: how often the same class of incident escapes containment.
- Time to restore normal: because containment isn’t the finish line.
Executives care about downtime exposure, blast radius and auditability. These metrics translate technical action into business risk.
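The two headline metrics are simple to compute once incidents carry detection and containment timestamps. The records below are fabricated for illustration; the calculation is the point.

```python
from statistics import mean

# Illustrative incident records: detection and containment times in
# minutes from first signal, plus whether containment was fully automated.
incidents = [
    {"detected": 0, "contained": 12, "auto": True},
    {"detected": 0, "contained": 45, "auto": False},
    {"detected": 0, "contained": 8,  "auto": True},
]

# MTTC: mean gap between detection and the first containment action.
mttc = mean(i["contained"] - i["detected"] for i in incidents)

# Percentage auto-contained: incidents needing no human touch to contain.
pct_auto = 100 * sum(i["auto"] for i in incidents) / len(incidents)
```

Reporting both together is deliberate: MTTC alone rewards heroics, while the auto-contained percentage shows whether the speed is repeatable.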
7. Anti-patterns to avoid
A few traps I watch for:
- Automating without discovery. If ownership is murky, even “smart” actions will bite you.
- Alert-only culture. If every response ends in a ticket, you’re leaving MTTC to chance.
- Unbounded scripts. No blast-radius caps, no rollback, no kill switch: no thank you.
- Skipping learning. If we don’t tune thresholds and guardrails after incidents, we’ll repeat mistakes.
- One-tool thinking. This is an operating model; any mix of tools can implement it if the rules are clear.
8. A 30/60/90-day rollout
When one of my clients asks where to begin, I suggest a tight timeline:
- Days 1–30: turn on continuous discovery; document ownership; choose two high-confidence triggers; define guardrails (caps, rollback, canaries).
- Days 31–60: pilot the two playbooks in one network segment or app area; measure MTTC, false positives and auto-rollback; tune thresholds.
- Days 61–90: expand coverage; add a third play (e.g., cloud drift auto-rollback); publish an executive dashboard focused on MTTC and percentage auto-contained.
Small, safe wins create trust. Trust creates the appetite for broader automation.
That’s the journey.
Detection is only the starting point
Detection is table stakes in IT. What differentiates resilient organizations is what they can safely do in the first five minutes. You don’t need perfect systems to act; you need known-good actions with known-good guardrails. Start with two containments you can reverse, measure MTTC relentlessly and scale from there. Security will thank you and, more importantly, the entire business will feel the difference.
This article is published as part of the Foundry Expert Contributor Network.

