11 Things You Need to Create a Cyber Recovery Plan
Cyber Recovery Planning is critical to the survival of your organization in the event of a cyber attack.
According to Gartner, companies that pay the ransom recover on average only 61% of the hacker-encrypted data, and only 8% recover all of it. In fact, organizations that pay a ransom to cyber-criminals following a cyber-attack are highly likely to suffer a subsequent attack.
Well-practiced and tested Cyber Recovery Plans (CRP) can dramatically speed up the recovery process and increase the likelihood of success. The nature of attacks and their impacts vary and are constantly evolving. The most common ransomware attacks extort victims into paying a fee, usually in cryptocurrency, to decrypt systems and files, restore account-based access to resources, and prevent the release of exfiltrated data to the public.
Here, we’re focusing on the “Recovery” aspect of a Cyber Incident, not so much the incident response, detection, eradication, containment, communications, forensics, lessons learned or other aspects of a comprehensive Cyber Incident Response Plan.
Assuming you have been successfully attacked (meaning your preventative security measures failed to keep attackers out) you will either have to pay a ransom (with no guarantees - you are dealing with criminals after all) or recover your systems and data. Being well prepared to deal with an attack is key to survival, and will provide significant benefits in a number of areas:
The likelihood of a successful recovery
Increasing the speed of recovery
Reducing the amount of data you may lose
Reducing the likelihood of re-infection
Reducing the overall cost impact of the incident (avoid ransom, back to business quicker, fewer resources required to recover)
Following are 11 things you need to know and / or do to create a viable Cyber Recovery Plan.
1. Evolve your thinking about cyber recovery vs. disaster recovery
If you have existing disaster recovery plans, they will likely not be sufficient to recover from ransomware or other cyber events.
Disaster Recovery Plans (DRP) are good at dealing with natural disasters, physical damage, and major service outages. Typically, you would have one DRP for each physical data center. A DRP can achieve aggressive recovery times, including sub-minute recovery point objectives (RPO) and “near zero” recovery time objectives (RTO), depending on the technology and tools deployed. Often, disasters necessitate an “all or nothing” recovery, which makes the decision to act easier and the recovery scenarios more obvious.
Cyber Recovery Plans (CRP) address incidents such as cyberattacks (ransomware, privileged account exploits, DDoS attacks, etc.), data breaches, and other cybersecurity incidents. The approaches bad actors use are varied and numerous, but the result is usually a loss of access to production systems and data. Whereas a DRP can provide quick RTO and tight RPO timelines, cyber recovery is messy and complicated, so it is often measured in days, weeks, or even months. This greatly increases the importance of a Business Continuity Plan!
The impacted systems or “blast radius” of cyber attacks can also vary greatly, from a few systems to all systems and data; they can be isolated within a subnet, or span many networks and data centers. This requires teams to be dynamic and well practiced with the tools and techniques required to detect/identify, contain/control, eradicate the source, and recover. Further, plans should include the means to quickly and securely make required tools available.
The physical systems and / or cloud constructs that make up production environments are still intact, but highly compromised. Which can you trust? Blind restoration from backups is possible, but not recommended, as it is difficult to determine the time of breach and whether a backup is also infected. The typical cyber recovery approach includes restoration using Clean Rooms in an Isolated Recovery Environment, where the recovered data is validated to be “clean” by a forensics expert. Depending on the nature of the attack, it may be necessary to recover with backups from multiple different recovery points. Cloud computing offers significant benefits to this approach, as “spinning up” new isolated environments and clean rooms can be quick and automated. In non-cloud environments, one needs a plan for what to keep for longer-term forensics on the attack and what to wipe for clean room purposes. Note that virtual clusters can also become compromised - vCenter has had a few zero-day exploits in the past year.
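To make the cost of falling back to an older recovery point concrete, here is a minimal sketch in Python. It assumes each backup carries a timestamp and a clean / not-clean verdict from clean room validation; the `RecoveryPoint` class, its fields, and the sample dates are hypothetical and do not come from any particular backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RecoveryPoint:
    taken_at: datetime
    validated_clean: bool  # hypothetical verdict from clean room validation


def latest_clean_point(points: list[RecoveryPoint]) -> Optional[RecoveryPoint]:
    """Return the most recent recovery point validated as clean, or None."""
    clean = [p for p in points if p.validated_clean]
    return max(clean, key=lambda p: p.taken_at, default=None)


def data_loss_window(incident_time: datetime, point: RecoveryPoint) -> timedelta:
    """Data written between the chosen recovery point and the incident is lost."""
    return incident_time - point.taken_at


# Illustrative data only: the two most recent nightly backups are infected.
incident = datetime(2024, 3, 10, 9, 0)
points = [
    RecoveryPoint(datetime(2024, 3, 7, 2, 0), True),
    RecoveryPoint(datetime(2024, 3, 8, 2, 0), True),
    RecoveryPoint(datetime(2024, 3, 9, 2, 0), False),
    RecoveryPoint(datetime(2024, 3, 10, 2, 0), False),
]
chosen = latest_clean_point(points)
if chosen is not None:
    print("Restore from:", chosen.taken_at)
    print("Data loss window:", data_loss_window(incident, chosen))
```

In this example the nightly schedule suggests a data loss window of about a day, but because the two newest backups are infected, the effective window grows to 2 days and 7 hours. This is one reason cyber recovery timelines differ so much from DRP targets.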
Most importantly, whether it is CRP or DRP, preparation and testing are critical to success.
2. Maintain a constant state of readiness and preparedness
A key issue with recovery capabilities (CRP or DRP) is keeping the technology, processes and people constantly ready. Often, implementation and planning are done in the form of projects, having a start and an end - however, organizations must remain vigilant to ensure recovery capabilities remain viable and sustainable. Following are some key things to consider:
Keep current with the latest cyber threats and appropriate tools and techniques to thwart them - change in this space is rapid!
Rigorous change management - any changes made in production could negatively impact recovery capabilities.
Keep technology current - perform updates, patches, refreshes as required.
Regular testing and exercising - to catch issues, validate capabilities, keep staff sharp, update processes.
Consider and include cyber recovery in new project architecture and design - this will be more effective / efficient than retrofitting it into a launched production application.
Modernize Governance - ensure that policies, standards, and controls are adhered to and measured via automation from an auto-discovered CMDB. Dashboard the results to keep teams from drifting away from being compliant.
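As a rough illustration of measuring controls automatically from CMDB data, the sketch below evaluates a few example controls against an exported list of systems and computes a compliance rate that a dashboard could display. The record fields, control names, and thresholds are all assumptions made for this example; a real CMDB export and control set will look different.

```python
from typing import Callable

# Hypothetical CMDB export: one record per discovered system.
cmdb = [
    {"name": "erp-db", "backup_immutable": True, "days_since_patch": 12, "mfa_on_admin": True},
    {"name": "file-srv", "backup_immutable": False, "days_since_patch": 45, "mfa_on_admin": True},
    {"name": "web-01", "backup_immutable": True, "days_since_patch": 90, "mfa_on_admin": False},
]

# Hypothetical controls: each maps a system record to pass (True) or fail (False).
controls: dict[str, Callable[[dict], bool]] = {
    "immutable-backup": lambda s: s["backup_immutable"],
    "patched-within-60-days": lambda s: s["days_since_patch"] <= 60,
    "admin-mfa": lambda s: s["mfa_on_admin"],
}


def evaluate(systems: list[dict], checks: dict[str, Callable[[dict], bool]]) -> dict[str, list[str]]:
    """Return, for each control, the names of the systems that fail it."""
    return {
        control_id: [s["name"] for s in systems if not check(s)]
        for control_id, check in checks.items()
    }


def compliance_rate(systems: list[dict], checks: dict[str, Callable[[dict], bool]]) -> float:
    """Fraction of (system, control) pairs that pass; 1.0 when there is nothing to check."""
    total = len(systems) * len(checks)
    failed = sum(len(names) for names in evaluate(systems, checks).values())
    return (total - failed) / total if total else 1.0


for control_id, failing in evaluate(cmdb, controls).items():
    print(control_id, "->", failing or "all compliant")
print(f"Overall compliance: {compliance_rate(cmdb, controls):.0%}")
```

Run on a schedule against an auto-discovered inventory, a check like this turns policy adherence into a number that can be tracked over time.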
3. Training and skills development of key personnel
In cybersecurity, change is constant and rapid. Keep your staff well trained and practiced at all times in the tools and techniques required to execute a cyber recovery. Testing and exercising are an excellent way to keep teams sharp and help ensure that they can execute under the pressure of an actual event. Make sure you have more than one person available to perform key steps and retain expertise - a cyber recovery can take days, weeks, or even months, so burnout is a factor. Many companies engage third-party cybersecurity firms to assist in the event of an incident - although this is a viable strategy, the skill set of the individuals assigned, the quality of services, the cost, and availability at the time of the incident all present risks. Also, the unique characteristics of each organization's IT (infrastructure, applications, data, process, business, etc.) are important to understand, and third parties cannot be expected to know them.
4. Regular testing and exercising of recovery plans, tools, and techniques
Unlike DRPs, which can be fairly static in nature, CRPs need to address a much wider variety of incident types, impacts, blast radii, and recovery scenarios. DRP testing can lead to one or a few refined recovery plans that can be executed similarly each time - the goals of testing are largely about ensuring the processes, technology and techniques function to achieve a defined outcome. For CRP, although those same goals can exist, it is less likely that a very specific scenario and blast radius can be predicted. Identity systems (Active Directory) and the systems that control IT (vCenter, key vaults, Certificate Authorities, etc.) will be the high-impact targets. Hence, for CRP, a key goal should be to get your teams proficient and confident with the tools, processes and sequences they may utilize in a recovery - such that they are ready for just about any circumstance and can adjust as required.
Consider the following analogy:
DRP is like a band that constantly practices / plays the same song over and over - very quickly, each musician would know their part well. Even novice musicians can learn a few simple songs.
CRP is like a band that takes requests from the audience - they never know exactly what they will be asked to play. This would require significantly more skilled, competent, and confident musicians.
In conclusion, CRP requires teams to be highly proficient in the tools and techniques required to address a wide variety of incidents, so they can quickly and effectively make decisions, adjust plans, and act in real time.
5. Know what you have
The challenges that asset management presents to most organizations are not trivial - the sheer quantity of “things” that IT is required to manage continues to grow and overwhelm, whether we are talking about systems, storage, applications or data. Having an accurate representation of your IT environment is critical to cyber recovery, including:
IT Infrastructure architecture and design
Data centres
Networks
Systems (physical, cloud, VMs, appliances, etc.)
Storage (physical, cloud, etc.)
Accounts & access management / RBAC
Applications and their infrastructure mapping
Data
Categorization
Criticality
Security classification
Location
Backups
Retention
Immutability
Location of copies
Air-gap vaults
There are many tools available to perform discovery and to populate and maintain a current CMDB. Establishing and tracking relationships between entities is also important for understanding dependencies and priorities. It is critical to know what you have to be able to properly plan and execute cyber recoveries. Of special note is the difficulty most organizations have in understanding and managing their data - ultimately, data is the most valued asset in most organizations, and the least understood. Data management and governance should be a top priority in your organization!
Ensure that key artifacts that will help you rebuild / recover are also protected and available in the event of a cyber attack. Similar to protecting backups, critical documentation (recovery plans, IT architecture and blueprints, legal documents, etc.) should be kept in an immutable, air-gapped or otherwise protected location for availability / use in the event of a cyber attack.
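One way to put an inventory like the one above to work is to record each data asset's criticality and backup attributes, then query for protection gaps. The sketch below is a hypothetical data model, not a CMDB schema: the class names, fields, the 30-day retention floor, and the sample assets are assumptions chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Criticality(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class BackupPolicy:
    retention_days: int
    immutable: bool
    air_gapped: bool


@dataclass(frozen=True)
class DataAsset:
    name: str
    criticality: Criticality
    location: str
    backup: Optional[BackupPolicy]  # None means no backup is recorded


def protection_gaps(assets: list[DataAsset], min_retention_days: int = 30) -> dict[str, list[str]]:
    """List what is missing for each high-criticality asset (assumed policy)."""
    gaps: dict[str, list[str]] = {}
    for asset in assets:
        if asset.criticality != Criticality.HIGH:
            continue
        problems = []
        if asset.backup is None:
            problems.append("no backup")
        else:
            if not asset.backup.immutable:
                problems.append("not immutable")
            if not asset.backup.air_gapped:
                problems.append("not air-gapped")
            if asset.backup.retention_days < min_retention_days:
                problems.append(f"retention below {min_retention_days} days")
        if problems:
            gaps[asset.name] = problems
    return gaps


# Illustrative inventory only.
assets = [
    DataAsset("customer-db", Criticality.HIGH, "dc-east", BackupPolicy(90, True, True)),
    DataAsset("finance-share", Criticality.HIGH, "cloud", BackupPolicy(14, True, False)),
    DataAsset("hr-records", Criticality.HIGH, "dc-east", None),
    DataAsset("test-data", Criticality.LOW, "cloud", None),
]
for name, problems in protection_gaps(assets).items():
    print(name, "->", ", ".join(problems))
```

Even a simple report like this shows which critical data could not be recovered with confidence today.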
6. Design and protect your backups - specifically to address cyber attacks
More and more, bad actors are targeting backup data and systems as part of their attack to compromise your ability to recover. This gives them more power and leverage to collect ransoms and achieve their goals. There are a number of tools and techniques you can use to protect your backups and ensure they are available and viable in the event of a cyber attack, including:
Immutable backups: effectively WORM (Write Once Read Many), technology to prevent ransomware / bad actors from encrypting or deleting backup data. This is accomplished via settings in cloud storage accounts and/or specific vendor storage arrays.
Air-gapped backups: backups are stored offline or in a physically isolated environment from the production network.
Vault proximity: keep the air-gapped, immutable, and encrypted backup vault close to the Isolated Recovery Environment (IRE) to avoid network transmission delays when full recoveries of data into clean rooms are required. DR requires copies across sites, while CR speed requires local network speeds.
Multi-Factor Authentication (MFA): prevent unauthorized access to your backup schedules and retained data. Accounts used to manage backups must be closely controlled and monitored.
Network segmentation / Zero Trust Architecture: restrict unauthorized access and lateral movement of bad actors / ransomware by segmenting backup infrastructure from production environments and deploying a zero trust architecture.
Anomaly detection: some products can identify traces of a ransomware attack by comparing daily backups to those of previous days. This adds a detection capability alongside firewall intrusion protection and endpoint XDR solutions, both of which have to deal with real-time data.
Testing, training, and awareness of backups & restores.
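The anomaly detection idea in the list above can be sketched as follows. This is a simplified heuristic, not how any particular backup product works: it flags a backup when most files changed since the previous one and most of the changed files look like random data (high Shannon entropy), which is typical of encrypted content. The thresholds and the in-memory file maps are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for uniform content, up to 8.0 for random-looking data."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_anomalous(
    prev: dict[str, bytes],
    curr: dict[str, bytes],
    max_changed: float = 0.5,        # assumed threshold
    entropy_threshold: float = 7.0,  # assumed threshold
) -> bool:
    """Flag a backup when most files changed AND most changed files look encrypted."""
    changed = [content for path, content in curr.items() if prev.get(path) != content]
    if not changed or len(changed) / len(curr) <= max_changed:
        return False
    high_entropy = sum(1 for c in changed if shannon_entropy(c) >= entropy_threshold)
    return high_entropy / len(changed) > 0.5


# Illustrative data: yesterday's backup, a normal day, and a mass-encryption day.
yesterday = {
    "a.txt": b"hello world " * 100,
    "b.txt": b"quarterly report " * 100,
    "c.txt": b"meeting notes " * 100,
    "d.txt": b"customer list " * 100,
}
normal_day = dict(yesterday, **{"a.txt": b"hello world, edited " * 100})
encrypted_day = {path: bytes(range(256)) * 8 for path in yesterday}

print("normal day flagged:", looks_anomalous(yesterday, normal_day))
print("encrypted day flagged:", looks_anomalous(yesterday, encrypted_day))
```

Compressed or already-encrypted files also have high entropy, so a heuristic like this can produce false positives and would be only one signal among several.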
7. Define and document cyber recovery plans for your most likely scenarios
Although the exact nature, impact, and blast radius of a future cyber attack can’t be known, CRPs should be created for your most likely threats and scenarios. CRPs will be supporting documents to your Cyber Incident Response Plans (CIRP) - again, a CRP will focus on the recovery / restore aspects of an event, should the CIRP lead you to decide to execute a CRP. Some processes / procedures can take the form of Standard Operating Procedures, enabling a more modular approach to executing technical recoveries and related actions. CRPs can include:
Quarantine / secure copies of infected systems / data for future forensics
Use an Immutable Backup Vault (IBV) that was air-gapped from production and anomaly scanned to help identify available recovery points that are free from malware
Clean Room / Isolated Recovery Environment setup to enable forensics on a per application basis to validate clean systems and data
Enable / provide access to the tools required to perform a recovery / restore / forensics validation
Prioritize sequence of recovery / restore activities
Promote release for production use
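Prioritizing the sequence of recovery activities depends on knowing what each system needs before it can come back. The sketch below uses Python's standard `graphlib.TopologicalSorter` to group systems into recovery "waves" that respect dependencies, ordering each wave by business priority. The dependency map and priority values are made-up examples; in practice they would come from your CMDB and business impact analysis.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be restored before it.
depends_on = {
    "active-directory": set(),
    "dns": {"active-directory"},
    "database": {"active-directory", "dns"},
    "erp-app": {"database"},
    "intranet": {"dns"},
}

# Hypothetical business priority: lower number means restore sooner within a wave.
priority = {"active-directory": 1, "dns": 1, "database": 1, "erp-app": 1, "intranet": 3}


def recovery_waves(deps: dict[str, set[str]], prio: dict[str, int]) -> list[list[str]]:
    """Group systems into waves; everything in a wave can be restored in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda name: (prio.get(name, 99), name))
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(recovery_waves(depends_on, priority), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

With this sample data, identity comes first, then DNS, then the database alongside the lower-priority intranet, and finally the ERP application. A circular dependency raises an error, which is itself useful to discover during planning rather than during an incident.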
8. Create the capabilities to establish Clean Rooms in Isolated Recovery Environments (IRE)
An important aspect of CRPs is the ability to create and access Clean Rooms in IREs. These are network environments fully isolated from production, used to test and to perform recoveries. Keep in mind that they will still require access by your forensics provider and by security toolsets / clouds such as XDR and vulnerability scanning. IRE clean rooms provide a number of key benefits, including:
Environment to perform regular testing and exercising
VPN and identity access that are separate from production, including for your forensics provider
A target environment to recover into (outside of production) - often referred to as Clean Room bubbles.
In cloud, the ability to quickly “spin up” numerous isolated environments to perform numerous simultaneous recoveries to help identify a viable backup (e.g. when ransomware was present in the most recent backups)
Clean Rooms / IREs are especially effective in cloud environments, where they can be created beforehand at minimal cost (as no running VMs would be necessary), or built on demand as recovery targets or test environments using infrastructure as code (Terraform, CloudFormation, ARM, etc.). This also greatly simplifies and reduces testing efforts, allowing for more frequent testing and exercising. They are also valuable for change management testing.
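Because an IRE still needs a few outbound connections (forensics provider, security tooling), it is worth checking that none of its allowed network rules can reach production. The following sketch uses Python's standard `ipaddress` module to flag egress rules whose destination overlaps a production range. The rule labels, address ranges, and the idea of exporting rules as (label, CIDR) pairs are assumptions for this example.

```python
import ipaddress

# Hypothetical production address ranges.
PRODUCTION_CIDRS = ["10.0.0.0/16", "10.1.0.0/16"]

# Hypothetical egress rules exported from the IRE definition: (label, destination CIDR).
ire_egress_rules = [
    ("forensics-vpn", "203.0.113.10/32"),
    ("xdr-cloud", "198.51.100.0/24"),
    ("legacy-file-share", "10.1.4.20/32"),
]


def rules_reaching_production(
    egress_rules: list[tuple[str, str]],
    production_cidrs: list[str],
) -> list[str]:
    """Return labels of rules whose destination overlaps any production range."""
    production = [ipaddress.ip_network(cidr) for cidr in production_cidrs]
    return [
        label
        for label, destination in egress_rules
        if any(ipaddress.ip_network(destination).overlaps(net) for net in production)
    ]


violations = rules_reaching_production(ire_egress_rules, PRODUCTION_CIDRS)
print("Rules that break isolation:", violations or "none")
```

A check like this could run every time the environment definition changes, so that isolation is verified before the environment is needed rather than assumed.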
9. Ensure and prepare for the tools you will require to perform a cyber recovery
Having the tools necessary to perform a recovery early in the process is key to reducing the time to recover and increasing the likelihood of success. Including the critical tools as part of your Clean Room / IRE can ensure they will be available when your production environment is inaccessible. If you had to recover all the tools first, before actually being able to start system and data recovery, your recovery time would increase greatly. This may mean some extra costs, as hardware, licenses, cloud resources, etc. may have to be procured. There are techniques to minimize those potential costs. Some of the tools that you would need available immediately, regardless of the nature of the incident, include:
Incident Response & Orchestration tools
Backup and recovery solutions: this would include the backup and recovery tool systems and software (core backup applications, media agent servers that perform restoration), backup media (disk, cloud, other), immutable backups, etc.
Security Information & Event Management (SIEM)
Forensic Analysis Tools
Update and Patch Management
Endpoint Detection & Response (EDR / XDR)
Ransomware Decryption Tools
Malware Analysis & Threat Intelligence
Access to key documentation repositories or services (recovery plans, run books, legal artifacts, architecture / blueprints, CMDB, etc.)
10. Leverage new tools & techniques to modernize governance and keep CRPs and infrastructure viable over time
IT environments constantly change. Ensuring changes do not negatively affect your resilience and ability to recover in the event of an incident is crucial. Not only technology, but also processes and procedures can be rendered ineffective without proper change management and governance oversight. The sheer quantity of “things” that exist in an IT environment / business presents significant challenges to governance. Maintaining an accurate inventory of applications, systems, services, data, configurations and their inter-dependencies is difficult and rarely done well.
Leveraging modern tools and techniques to track, report and alert in real time is possible…
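As one hedged example of tracking change automatically, the sketch below fingerprints each system's configuration and compares it against an approved baseline to report drift. It is a generic illustration using only the Python standard library; the configuration records and system names are invented, and real tooling would pull configurations from discovery or infrastructure APIs.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Stable SHA-256 hash of a configuration, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(baseline: dict[str, str], current: dict[str, dict]) -> dict[str, list[str]]:
    """Compare current configurations against approved baseline fingerprints."""
    return {
        "changed": sorted(
            name for name in current
            if name in baseline and fingerprint(current[name]) != baseline[name]
        ),
        "untracked": sorted(name for name in current if name not in baseline),
        "missing": sorted(name for name in baseline if name not in current),
    }


# Hypothetical approved configurations, captured at the last review.
approved = {
    "fw-edge": {"allow_ssh": False, "ports": [443]},
    "backup-srv": {"immutable": True},
    "vault": {"air_gapped": True},
}
baseline = {name: fingerprint(cfg) for name, cfg in approved.items()}

# Hypothetical current state: one change, one new system, one system gone.
current = {
    "fw-edge": {"ports": [443], "allow_ssh": True},
    "backup-srv": {"immutable": True},
    "new-vm": {"ports": [22]},
}
print(detect_drift(baseline, current))
```

Each category is a prompt for a question: was the change approved, is the new system covered by a recovery plan, and is the missing system decommissioned or simply unreachable?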
11. Get going on cyber recovery preparations before it's too late!
According to IDC's report on why cyber-resilience is a top concern for C-suites, cyber-resilience, and ransomware attacks in particular, tops the list of C-suite concerns. Digital businesses are characterized by data growth; high interconnectedness of devices, people, applications, data, and networks; and cloud migration, all of which increase vulnerability to cyberattacks. Ransomware attacks have multiplied exponentially, and now generative AI threatens to enable more sophisticated attacks. Taken together, savvy organizations need to plan for "when it happens" scenarios rather than "if it happens". IDC believes that without holistic recovery infrastructure and recovery testing strategies, cyber-resilience plans are half-baked.
Plan your recovery before it's too late.
thinkwise.cloud