- First, confirm your fears. You don't want to be the boy who cried wolf; you need a firm basis for believing that someone has hacked (or is hacking) your network. At this stage, don't worry about mitigation. Just make sure you have enough evidence to show that malicious activity really is going on.
- Document everything. Start a timestamped log (a text file or handwritten notes) and gather your evidence nondestructively. Throughout the following steps, a timestamped record of who was doing what will matter for many reasons. When it’s all done, you’ll need a record of what was damaged, what was changed, and what was offline and for how long. The records will also be valuable in any legal action and when planning for your next incident.
- Assemble the team. By which I really mean: get your boss on the horn. At this point you want only the boss; don't involve coworkers. Let your boss know what's going on and what your evidence is. Emanate calm and rationality here! No one is helped by a sense of fear, excitement or, dare I say it, panic. Suggest that the boss may want to involve legal, public relations, HR, line-of-business stakeholders, and other IT staff, but let the boss decide and make those calls. Let him know that you'll take no action other than diagnostics and getting your notes in order while he assembles the team.
- Make preliminary decisions. Now that the team is together, you’re going to need to present the information you’ve gathered so far. Then your team has a few relatively simple decisions to make:
  - Who has ultimate authority? This is not the time to have everyone off playing hero in an uncoordinated response. You need a ‘fire chief’, and the chief needs to know what everyone is doing.
  - Priorities. Which is most important: fixing the damage, closing the hole, or preserving evidence so you can prosecute the attacker? You’re aiming for a ranked priority list here. A bit of advice: do what you can to make sure that blame is de-prioritized. Let every member of the team know that this is most definitely not the time to even think about finger-pointing!
  - Who will work on what?
  - How often should the response team get together to assess status and make decisions?
  - Communication plan. IT workers need to be able to concentrate; their effectiveness drops sharply if they have to answer a ringing phone every five minutes. Work out a quick plan to avoid that, and put one person in charge of passing information to whoever else needs it. Let him be the guy whose phone rings every five minutes.
  - Work plan. Whoever is in charge will need to marshal forces carefully. In an event like this, people tend to overwork themselves, which can lead to mistakes, or leave only a skeleton crew available tomorrow when the next problem happens. Be sure to rotate people so that everyone gets some rest.
- Assess damage (and damage potential). Now, with the team assembled and preliminary decisions made, it’s time to reassess the damage done and the potential for more damage. Your team may need to investigate further at this point, or it may already have enough information to proceed to the next step.
- Stop the spread. Consider whether you need to shut down servers, services, network links, etc. Maybe a firewall rule change is enough. Think about how to make the smallest (and least destructive) changes that have the greatest impact on slowing the growth of the problem.
- Notification plan. You’re probably going to have to tell your users about an outage. But what about customers, business partners, law enforcement? Consider carefully the phrasing and timing of such notifications. This is a management task.
- Remediate. OK, now that you understand the damage and you’ve stopped the spread, it’s time to consider the cleanup plan. This will depend heavily on the priorities set in the “Make preliminary decisions” step: do you need to preserve evidence for later legal or disciplinary action? Which systems need to be cleaned, secured, and brought back online first? What can wait?
- Post-Mortem. The post-mortem is often skipped, as exhausted people take their much-deserved rest and then return to all the tasks they deferred whilst working on the outage. But impress on your boss the need to spend a couple of hours critiquing the incident, running down loose ends, and so on. It’ll be worth it, though you may never be able to measure that worth, because you’ll never know about the problems your post-mortem successfully prevents!
Even if you can't get the post-mortem going, this is the best time to hit up your boss for funding the protective measures you wish you'd had before. Could be a new backup system, better network partitioning via VLANs - whatever. The iron is hottest about a week after the event. Strike it!
To me, most of the above seems like just plain common sense. But, having been through more than a few SHTF scenarios, I am always amazed by the number of people who, in their zeal to fix the problem, take leave of common sense, or do something which looks sensible from their POV but proves wasteful or even counterproductive when you step back and take the wide angle view.
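If you keep the log from the "Document everything" step as a text file rather than handwritten notes, a few lines of script can take care of the timestamps for you. What follows is a minimal Python sketch, not a forensic tool: the function names `log_entry` and `sha256_of`, the tab-separated layout, and the file, person, and host names in the demo are all hypothetical choices. Each entry records a UTC timestamp, who did what, and optionally a SHA-256 hash of a collected evidence file, so you can later show that the copy you analyzed is the one you gathered.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks.

    The file is opened read-only, so its contents are not changed.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_entry(log_path: Path, who: str, what: str,
              evidence: Optional[Path] = None) -> str:
    """Append one tab-separated, UTC-timestamped line to the incident log.

    Hypothetical layout: timestamp, person, action, and (optionally) the
    evidence file's hash and path. Returns the line that was written.
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp}\t{who}\t{what}"
    if evidence is not None:
        line += f"\tsha256={sha256_of(evidence)}\t{evidence}"
    # Append mode: earlier entries are never overwritten.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(line + "\n")
    return line


if __name__ == "__main__":
    # Demo in a throwaway directory with made-up names ("alice", "web01");
    # in a real incident, point log_path at your own log file and
    # evidence at the copies you actually collected.
    workdir = Path(tempfile.mkdtemp())
    sample = workdir / "auth.log.copy"
    sample.write_text("placeholder evidence\n", encoding="utf-8")
    print(log_entry(workdir / "incident.log", "alice",
                    "copied auth log from web01", sample))
```

Recording timestamps in UTC avoids confusion when team members, or the systems whose logs you are correlating, sit in different time zones.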