This process describes the way in which [Org Name] receives and responds to computer security incidents. This process covers how incidents are assigned, analysed, managed, escalated, closed and reviewed for lessons learned.
Whenever an incident is received, an incident handler is responsible for providing an initial response and ensuring the incident is followed through. This first response should be provided as soon as possible and must always happen within [time defined by the DSHCS's service level agreement].
The case owner is responsible for the analysis of and response to the incident. The criteria for choosing the case owner should include:
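The first-response requirement can be checked mechanically. The following is a minimal sketch, not part of the help desk's tooling: the two-hour limit is a placeholder assumption (the real value comes from the service level agreement), and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder assumption: the real limit is defined by the service level agreement.
FIRST_RESPONSE_SLA = timedelta(hours=2)


def first_response_deadline(received_at: datetime,
                            sla: timedelta = FIRST_RESPONSE_SLA) -> datetime:
    """Latest moment at which the initial response may be sent."""
    return received_at + sla


def is_first_response_overdue(received_at: datetime,
                              now: datetime,
                              responded_at: Optional[datetime] = None,
                              sla: timedelta = FIRST_RESPONSE_SLA) -> bool:
    """True if the response was late, or is still missing after the deadline."""
    deadline = first_response_deadline(received_at, sla)
    if responded_at is not None:
        return responded_at > deadline
    return now > deadline


received = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
overdue = is_first_response_overdue(received, now=received + timedelta(hours=3))
```

Using timezone-aware timestamps avoids ambiguity when handlers and beneficiaries are in different time zones.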
- Case priority.
- Language of the case / languages spoken by the incident handlers.
- Incident handlers' case loads.
- Geographic location of the beneficiary / time zone.
- Skill set required to resolve the incident.
Shift leaders are responsible for balancing the workload within and across offices. If necessary, an incident handler can request that a case be owned by a different person if the new case owner is better suited to deal with the ongoing incident. This should be done in agreement with both the current and the future owner. If necessary, the beneficiary should also be informed of the change of ownership.
Case priority refers to a value assigned to each case. Priorities help case owners and, more generally, the DSHCS team manager allocate the right amount of resources to each case. They also define the order in which cases should be resolved. Priority reflects the organisational response required for each request.
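The selection criteria above can be combined into a simple suitability score to support, not replace, the shift leader's judgement. This is a sketch under stated assumptions: the `Handler` and `Case` types, the weights, and the rule that case load counts double for high-priority cases are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Handler:
    name: str
    languages: set[str]
    skills: set[str]
    utc_offset: int   # whole hours from UTC
    open_cases: int   # current case load


@dataclass
class Case:
    language: str
    required_skills: set[str]
    beneficiary_utc_offset: int
    high_priority: bool


def hours_apart(a: int, b: int) -> int:
    """Distance between two UTC offsets, wrapping around the clock."""
    diff = abs(a - b) % 24
    return min(diff, 24 - diff)


def suitability(handler: Handler, case: Case) -> float:
    """Higher is better. All weights are illustrative assumptions."""
    score = 0.0
    if case.language in handler.languages:
        score += 3.0
    score += 2.0 * len(case.required_skills & handler.skills)
    score -= 0.25 * hours_apart(handler.utc_offset, case.beneficiary_utc_offset)
    # Assumption: a busy handler is a worse fit for a high-priority case.
    load_weight = 1.0 if case.high_priority else 0.5
    score -= load_weight * handler.open_cases
    return score


def suggest_owner(handlers: list[Handler], case: Case) -> Handler:
    """Return the handler with the highest suitability score."""
    return max(handlers, key=lambda h: suitability(h, case))
```

For example, `suggest_owner(team, case)` returns the best-scoring member of a list of `Handler` objects. The result is only a suggestion, since any change of ownership still requires the agreement of the handlers involved.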
Among the variables involved in prioritisation, impact and urgency are the most relevant.
Where the beneficiary is in danger, physically or digitally, and the consequences of not acting are severe, the incident handler should address the case by weighing the possible consequences and effects of both the issue and the proposed solution. There are three categories of case impact: high, medium and low.
To establish a case's impact, a guide table is presented below:

| Impact | Criteria |
|---|---|
| High (H) | Someone has been or is at risk of being injured |
| | The beneficiary is dealing with a reactive/dangerous situation |
| | There is a high risk of sensitive information being compromised |
| | Personal information of several beneficiaries is likely to be compromised |
| | The damage to the help desk's reputation is likely to be high if the situation isn't handled properly |
| | The beneficiary might be a high-profile person |
| Medium (M) | The consequences caused by the incident can be defined as an intermediate value between low and high |
| Low (L) | The case aims at preventing a future security incident for the organisation |
| | The consequences of the help desk's advice do not translate into an imminent physical damage to the beneficiary |

Sometimes cases might impact the help desk and its reputation. These cases need to be handled with special care, involving the management team in their resolution.
Urgency is defined as the amount of delay that can be tolerated and how quickly a solution is needed. Cases can be classified as highly urgent, moderately urgent and not urgent. This will depend on various factors, including the timelines involved and the level of threat if action is not taken within a certain time frame.
To establish case priority, incident handlers should also consider what the beneficiary mentions when opening the case. Sometimes beneficiaries specify that the case is urgent for a particular reason. To establish case urgency, a guide table is presented below:

| Urgency | Criteria |
|---|---|
| High (H) | The consequences caused by the incident increase rapidly over time |
| | A minor incident can be prevented from becoming a major incident by acting immediately |
| | The case was opened in a reactive manner, by a beneficiary seeking immediate assistance |
| | The incident is still ongoing, for example a DDoS attack or a data breach in progress |
| Medium (M) | The consequences caused by the incident increase slowly over time |
| Low (L) | The consequences of not solving the case do not increase over time |
| | The case aims at preventing a future security incident for the organisation |

By combining the above-mentioned factors (urgency and impact) the incident handler can assess the corresponding case priority. This priority is listed in the table below:
NOTE: If there is any doubt about the urgency or impact of a case, it is always best to err on the side of caution and not to take any risks.
As a rule, if a case has a higher priority, it also has a significant impact for the help desk and the beneficiary. Note that no matter what priority the case has, if the incident handler is unsure about the advice they should give they must request support from colleagues.
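Combining impact and urgency into a priority is a simple table lookup. The sketch below illustrates the mechanism only: the `PRIORITY` mapping is an illustrative assumption, not the help desk's actual priority table, and the names `Level`, `cautious` and `priority` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


L, M, H = Level.LOW, Level.MEDIUM, Level.HIGH

# Illustrative assumption only: keys are (impact, urgency), values are priority.
PRIORITY = {
    (H, H): H, (H, M): H, (H, L): M,
    (M, H): H, (M, M): M, (M, L): L,
    (L, H): M, (L, M): L, (L, L): L,
}


def cautious(*candidates: Level) -> Level:
    """When unsure between several levels, err on the side of caution."""
    return max(candidates)


def priority(impact: Level, urgency: Level) -> Level:
    """Look up the case priority for a given impact and urgency."""
    return PRIORITY[(impact, urgency)]


# A handler unsure whether urgency is medium or high assumes high.
result = priority(Level.MEDIUM, cautious(Level.MEDIUM, Level.HIGH))
```

Keeping the mapping as explicit data rather than a formula makes it easy to replace with the organisation's real table.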
The following is our incident response workflow. This provides a general overview of how digital security incidents should be managed. It doesn't provide advice on how to tackle specific incidents. For specific advice on how to manage different types of incidents, please refer to our procedural documentation.
The incident response life cycle followed by the help desk is based on: Paul Cichonski, Tom Millar, Tim Grance, Karen Scarfone, Computer Security Incident Handling Guide, NIST, 2012. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf.
When an organisation reaches out to our DSHCS asking for preventative support, they are often preparing and adjusting their practices to prevent digital security incidents from taking place. This is the ideal scenario, where we help our beneficiaries mitigate the risk of compromise of their security or data.
On some occasions the beneficiary asks us to investigate a potential or attempted incident. These reports do not include clear evidence that an incident has already happened, and thus require an initial investigation to verify the report and confirm whether an incident has taken place. This phase is called detection, and the handler may require more evidence for their investigation until it is clear to them whether the event indeed compromised the beneficiary's security or was just an event without consequences. Our procedural documentation should help the handler understand what evidence or information is helpful to investigate different types of incidents. These types of cases will normally be of medium or high urgency, especially while it is still being determined whether an incident took place.
However, beneficiaries often reach out to our help desk when an incident has already taken place. This means there have been one or more actions that intentionally harm or attempt to harm the beneficiary's system, network or data. Examples include system unavailability, data leaks, device seizure and account compromise. Therefore, we normally prioritise the containment of that incident to stop harm from spreading to other parties.
Actions that we often take to contain incidents include: requesting an online platform to suspend a compromised account, isolating a compromised system from the network, suspending a defaced website, removing leaked data, etc. Containment should be quick in most cases and it should be prioritised. In some cases, we will rely on the beneficiary to remove their system from the network or isolate it while we provide technical instructions via remote communications. Urgency will normally be high in cases where we are trying to contain an incident that is taking place.
When containment is achieved, it is often important for the case owner to dedicate some time to analysing the root cause of the issue. This normally leads to an investigation that takes place under the detection and analysis phase. Depending on the category of the case, additional evidence could be requested from the beneficiary to perform the analysis, such as the source of a recently received malicious email, the link to download a malicious app, screenshots of antivirus alerts, etc. The goal here is to determine if there are any additional actions that should be taken to ensure that the recovery from the incident is substantial. Again, the handler should refer to our procedural documentation to know what other information can be helpful to perform their analysis.
Eradication consists of cleaning any compromised systems to ensure a substantial recovery. It could be as simple as installing and running an antivirus application, or in other cases it could require a fresh system install. In all cases, however, it is essential to know and document what, if anything, the attacker left behind, and to clean it. This record is useful for monitoring the attacker's possible comeback and for looking for similar attacks against other systems or beneficiaries. For systems that cannot be reinstalled or reset to factory settings, the attacker's artefacts identified in the analysis phase can be removed manually, for example startup or cron tasks that relaunch a backdoor.
In cases such as DDoS attacks, eradication is not possible: the attacker does not leave any artefacts, and the source of the attack is so distributed that taking down every implicated host is neither possible nor reasonable. However, where the attacker's infrastructure is not distributed, taking down this malicious infrastructure should be considered part of the eradication phase. Reporting an account that is leaking information or suspending an email address that is sending phishing emails could also be considered eradication.
Eradication usually should not be urgent, as the threat should already be contained at this stage. However, if the system continues to be live or connected (according to the owner's preference) and the analysis has discovered artefacts that allow the attacker to return soon or immediately, the urgency of the case should be marked as high.
The distinction between the recovery and eradication phases is not always clear, as you can end up recovering from the incident just by eradicating the attacker's artefacts: for example, removing the hacker's email address or phone number from the account and replacing it with legitimate ones, or using an antivirus to clean a non-persistent worm. However, to ensure recovery it is important, in some cases, to monitor for any comeback from the attacker. This task should be considered especially important when we discover that the attack is highly targeted and the threat is persistent. In these cases, attackers won't hesitate to attack again using the same vulnerability or weakness, or by looking for other ones. Depending on how feasible it is to do so, you can help a beneficiary monitor for any of the artefacts that have already been found and removed in the eradication phase.
Post-incident activity consists of preventative work that can be performed following the attack. It could be training, system hardening, penetration testing, or a security audit and assessment of the beneficiary's organisation, among other things. However, some of this preventative work could be done earlier in the incident's life to ensure a substantial recovery too. For example, a beneficiary whose account has been hacked could be assisted in creating a new email address, protected by a strong password and two-factor authentication, to be able to recover their account. In cases of system compromise, a vulnerability scan could be conducted on the system to close any vulnerability that allows further attacks before the system is back in production. In harassment cases, open source intelligence research could help a victim recover from a previous harassment attack by identifying any information available online that could be used by the attacker again. Post-incident activity cases that are required to recover from the incident should never be marked as low urgency cases!
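The urgency guidance given for each phase can be summarised as a minimum urgency per phase. The following sketch encodes only the rules stated in this workflow. The type and function names are hypothetical, and phases with no stated guidance (preparation and recovery) are assumed to have no minimum above low.

```python
from enum import Enum, IntEnum


class Urgency(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Phase(Enum):
    PREPARATION = "preparation"
    DETECTION_AND_ANALYSIS = "detection and analysis"
    CONTAINMENT = "containment"
    ERADICATION = "eradication"
    RECOVERY = "recovery"
    POST_INCIDENT = "post-incident activity"


def minimum_urgency(phase: Phase, *,
                    attacker_can_return: bool = False,
                    required_for_recovery: bool = False) -> Urgency:
    """Lowest urgency a case in this phase should normally be given."""
    if phase is Phase.CONTAINMENT:
        # Containing an ongoing incident is normally highly urgent.
        return Urgency.HIGH
    if phase is Phase.DETECTION_AND_ANALYSIS:
        # Normally medium or high while the incident is unconfirmed.
        return Urgency.MEDIUM
    if phase is Phase.ERADICATION and attacker_can_return:
        # Live system with artefacts that let the attacker come back.
        return Urgency.HIGH
    if phase is Phase.POST_INCIDENT and required_for_recovery:
        # Work needed for recovery should never be marked low.
        return Urgency.MEDIUM
    return Urgency.LOW


floor = minimum_urgency(Phase.ERADICATION, attacker_can_return=True)
```

The function returns a floor rather than a final value, so the handler can still raise the urgency based on the specifics of the case.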
The following notes should be considered when responding to requests:
After a case is received, in addition to the automatic reply sent by the ticketing system, the operator on shift should personally answer the case requester, explaining that they will be in charge of the case and making themselves available for any issue that arises.
Vetting a new beneficiary could take some time. While this process is taking place, the owner of the case should begin working on the solution. Until the beneficiary is vetted, extra care should be taken as to what information is shared and what actions are taken, since we haven't yet confirmed the link of trust.
When looking for solutions for cases, please always consider the following suggestions:
- Look for related procedural documentation.
- Escalate to other colleagues and/or consider reaching out to partner organisations for specific cases.
- Consider reaching out to CiviCERT.
- If you reach a dead end, always escalate and discuss the case with your manager.
When closing a case the incident handler should record the reason for closing it. The possible options are:
- Successfully Solved: Case goal was successfully completed.
- Customer unresponsiveness: The beneficiary was not responsive after several
- Future Improvement: Case goal was not fully completed and further actions will be carried out in the future.
- Unsuccessful solution: Case goal was not met.
- Customer Request: Customer explicitly requested the closure of the case.
- Internal Case Cancellation: Case was cancelled after request from internal team member.
A case should only be closed due to a lack of response from the beneficiary if that unresponsiveness stops us from meeting the goal of responding to the incident. If the incident handler has met the requirements to complete the case, then the case should be labelled as successfully solved, regardless of whether we hear back from the beneficiary or not.
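The closing reasons and the rule for unresponsive beneficiaries can be expressed as follows. This is a minimal sketch: the enum values mirror the options listed above, while the `CloseReason` type and the helper function name are hypothetical.

```python
from enum import Enum


class CloseReason(Enum):
    SUCCESSFULLY_SOLVED = "Successfully Solved"
    CUSTOMER_UNRESPONSIVENESS = "Customer unresponsiveness"
    FUTURE_IMPROVEMENT = "Future Improvement"
    UNSUCCESSFUL_SOLUTION = "Unsuccessful solution"
    CUSTOMER_REQUEST = "Customer Request"
    INTERNAL_CASE_CANCELLATION = "Internal Case Cancellation"


def reason_when_beneficiary_silent(goal_met: bool) -> CloseReason:
    """Pick the closing reason for a case whose beneficiary stopped replying.

    If the handler already met the case goal, the case counts as solved
    even without a final confirmation from the beneficiary.
    """
    if goal_met:
        return CloseReason.SUCCESSFULLY_SOLVED
    return CloseReason.CUSTOMER_UNRESPONSIVENESS


reason = reason_when_beneficiary_silent(goal_met=True)
```

Recording the reason as a fixed set of values rather than free text keeps closure statistics consistent across handlers.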