Resilience & Recovery

Cyber Resilience: “The ability to anticipate, withstand, recover from, and adapt to adverse conditions, stresses, attacks, or compromises on systems that use or are enabled by cyber resources.”

Source: NIST SP 800-172

Thinkwise specializes in architecting and implementing the tools, techniques and processes that ensure your critical infrastructure, services, systems and data meet your availability and recovery needs.

In today’s reality of constant ransomware attacks and increasing damage from climate change, addressing uptime, availability, Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) is paramount. We provide this expertise for today’s highly distributed cloud and hybrid environments.

Thinkwise understands, by having worked with hundreds of clients over the years, that this is a challenging area for most organizations. The expertise to design, build, operate, and sustain viable resiliency and recovery capabilities is typically not available in-house. That is where Thinkwise can help!

Cyber Recovery (CR) and Disaster Recovery (DR) Solutions:

Cyber Recovery Strategy

The threat of ransomware posed by cybercriminals has never been greater, and continues to grow. Regardless of the cyber security framework or “shift left” best practices organizations have adopted, the question is no longer “if” you’ll be attacked but “when”, and how prepared you’ll be to quickly recover critical data from backup, your Last Line of Defense. Most frameworks focus on prevention and detection and treat recovery from backup as a mere line item, without providing the key guidance required to ensure such recovery will be viable.

Many tools and techniques are available in the market today, each with its own strengths and weaknesses. Backup and recovery is a complex and nuanced domain that is often not well understood within organizations, and it is becoming more complex with the vast distribution, sheer volume and diversity of systems, applications, and data. For cyber events, it is important to have a “clean room” or isolated recovery environment (IRE) as part of the recovery process, as well as a process to promote recovered data back into production quickly while quarantining affected systems for further forensics.

Thinkwise advisors have decades of experience in backup & recovery, and have deepened that knowledge to address today’s ransomware reality. Our experience shows there is a widespread lack of understanding of the underlying realities, risks, and opportunities of hybrid (cloud and on-premises) backup & recovery, and of the processes and procedures required when attacks take place. We can help ensure that your recovery strategies for ransomware will actually work when you need them to.


Cyber & Disaster Recovery Strategy

A Disaster Recovery Strategy focuses on the IT technical aspects of recovery and resiliency, along with the processes and procedures necessary to sustain these capabilities over time. Maintaining a constant state of readiness to act on a DR event is core to a viable disaster recovery posture. This practice ensures that the services, systems and data necessary to support the organization’s business continuity plan will be available whenever required.

The rapid emergence and adoption of cloud computing, combined with traditional on-premises computing, presents both new challenges and new opportunities for disaster recovery, on top of the traditional causes of outages such as operational errors, hardware failures, and site, network and power outages. The relative newness of cloud means that new technologies, skills, processes and techniques are needed to build, maintain or enhance DR capabilities. Our experience in both traditional data center and cloud-based disaster recovery helps ensure you are ready to address the old and new threats that IT and the business face, in an effective, practical, cost-efficient and sustainable manner.

Cloud offers the opportunity to make core critical infrastructure resilient across availability zones and regions, so that RTO and RPO can be significantly reduced or eliminated for the time-sensitive and mission-critical applications that depend on it. Thinkwise can help you select and implement the right technologies and techniques to meet these critical business requirements. See some examples below:

Disaster Recovery Strategy & Resilience

  • Implement data recovery measures such as backups and snapshots to ensure the recoverability of critical data. This includes defining backup schedules, retention policies, and testing procedures to verify the recoverability of backed-up data. Tools, techniques and processes that safeguard against ransomware, rogue employees, and other cyber threats include immutable storage, air-gapped backups, and Privileged Access Management (PAM), among others.

  • Cloud solution providers offer a variety of resiliency tools and built-in capabilities to their clients: most have multiple data centres organized as availability zones and regions, interconnected with high-speed networks to enable synchronous replication of data across sites. However, it is important to fully understand which cloud services are resilient and which are your responsibility to protect. You, the client, are responsible for understanding any limitations and implementing your own safeguards and recovery techniques for your data, especially in IaaS and PaaS situations.

    Thinkwise can help you navigate the often overlooked complexities of Cloud computing services, and select and implement the right services, tools, and techniques to protect your business.

  • High Availability (HA), whether in on-premises or cloud deployments, aims to eliminate single points of failure (SPOF) for components and systems within a site, for example via redundancy, clustering, and load balancing. HA alone does NOT address site outages that call for DR, and hence should be made resilient across sites (see Resilient Availability).

  • Replication is about synchronizing data to one or more targets (typically another site or cloud). Replication enables quick recovery times and tight data loss windows (from zero data loss to seconds, minutes, or hours). For ransomware scenarios there are a few considerations to take into account: air-gapped controls, clean room flexibility and reliance on Active Directory are all of concern.

    There are numerous replication methods available and Thinkwise can help you select and implement the right solution(s) for your needs. Some examples include:

    • Database log shipping

    • Snapshot replication

    • Transactional replication

    • Host based replication

    • Hypervisor-based replication

    • Continuous Data Protection (CDP)

    • Storage staged replication

  • There are a number of techniques that can be leveraged to make your applications more resilient to outages of networks, sites, or cloud availability zones or regions, including:

    • Globally load balancing “like” systems or containers so they remain running

    • Stretch clustering such that at least one side keeps running (for example, SQL Server Always On or Oracle Data Guard across two or more locations)

    • Leveraging a PaaS service that includes built-in cross site service & data replication
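To make the data loss windows discussed above concrete, here is a minimal sketch of checking a set of recovery points (backups, snapshots or replication checkpoints) against an RPO. The function names, timestamps and the 4-hour RPO are illustrative assumptions, not part of any particular backup product.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(recovery_points: list[datetime], now: datetime) -> timedelta:
    """Largest gap between consecutive recovery points, including the gap
    from the newest recovery point up to `now`."""
    if not recovery_points:
        raise ValueError("no recovery points available")
    points = sorted(recovery_points) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(recovery_points: list[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if the worst-case data loss in the covered window is within the RPO."""
    return worst_case_data_loss(recovery_points, now) <= rpo


# Hypothetical example: snapshots every 4 hours, with one missed run.
now = datetime(2024, 1, 2, 12, 0)
snapshots = [
    datetime(2024, 1, 1, 20, 0),
    datetime(2024, 1, 2, 0, 0),
    datetime(2024, 1, 2, 8, 0),  # the 04:00 snapshot is missing
]

print(worst_case_data_loss(snapshots, now))           # 8:00:00
print(meets_rpo(snapshots, now, timedelta(hours=4)))  # False
```

A check like this only measures the spacing of recovery points. It says nothing about whether each point can actually be restored, which still has to be verified by test restores.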

Cyber Recovery Plan & Disaster Recovery Plan Implementation

Cyber Recovery (CR) & Disaster Recovery (DR) share similar technologies and tools yet require unique and separate solutions. Our services include a range of activities aimed at safeguarding organizations’ IT infrastructure, data, and operations in the event of a cyber / ransomware attack, disaster, or other disruptive event. Establishing a constant state of readiness across people, process, technology, and techniques is key to sustaining critical recovery capabilities over time.

Below are common focus areas that make up comprehensive cyber and disaster recovery programs, and can be performed separately or as part of full program implementations:

Cyber Recovery & Disaster Recovery Solution Implementations

  • Cyber recovery plans require steps beyond those in traditional disaster recovery plans. In addition to recovering systems and data, the planning and actions that need to take place include:

    • Forensic investigation

    • Attack isolation

    • Malware eradication

    • Vulnerability patching

    Specialized skills and tools will also be needed. Immutable backups and clean rooms / isolated recovery environments are key.

    It is important to note that cyber recoveries will take much longer than a typical disaster recovery, with average recovery times in the range of 3 to 4 weeks.

  • Disaster recovery planning has evolved significantly over the years, largely due to cloud computing. With decades of experience in DR, we leverage modern tools to create viable DR solutions for Cloud, on-premises, and Hybrid environments.

    Systems, services, and data are often widely distributed across technologies and geographies. Often, multiple DR Plans are necessary to prepare for a variety of scenarios and blast radii.

    Being properly prepared is the key to surviving a disaster, whatever form it may take.

  • Conduct an assessment of your current infrastructure, applications, and data to identify critical assets and their recovery requirements. Based on this assessment, a comprehensive disaster recovery solution approach and plan is developed, outlining the steps needed for a viable and sustainable disaster recovery program in support of business continuity.

  • Recommend and select appropriate disaster recovery technologies, tools and techniques to satisfy your specific requirements, budget, and recovery objectives. This may involve implementing backup and recovery, snapshots and data replication solutions, cloud-based disaster recovery services, or hybrid approaches tailored to your needs.

  • Set up and configure the defined infrastructure for disaster recovery, including backup servers, storage systems, network connectivity, and failover mechanisms. This may involve deploying redundant systems, implementing virtualization technologies, and establishing replication and synchronization processes. This is all performed via a mature “do no harm” methodology to safeguard production services and data at all times.

  • Implement data recovery measures such as backups, snapshots, replication and data encryption to ensure the integrity and availability of critical data. This includes defining backup schedules, retention policies, and testing procedures to verify the recoverability of backup data. It also includes setting up tools, techniques and processes to safeguard against ransomware, rogue employees, and other cyber threats, such as immutable storage, air-gapped backup, and Privileged Access Management (PAM).

  • Modern monitoring and management tools and techniques that continuously watch over the health and performance of the disaster recovery infrastructure are critical. These include real-time alerting, performance and health monitoring, capacity planning, and most critically, change management. Uncontrolled change and drift affecting DR are the main reasons DR capabilities fail to perform in the event of an actual disaster.

  • Creating and maintaining documentation of disaster recovery plans, procedures, and configurations is key to ensuring all stakeholders understand their roles and responsibilities around disaster recovery. These artifacts support training and knowledge transfer to IT staff and key personnel so they are prepared to execute the recovery plan effectively. In addition to paper and traditional Microsoft Office-based plans (Word, Excel, PowerPoint), modern DR artifacts can be created in searchable wikis, in dedicated tools, and in the form of automation scripts such as Terraform and PowerShell.

    Maintaining copies of key artifacts outside of the infrastructure they are meant to protect is critical!

  • Conducting regular disaster recovery tests and exercises is critical to sustaining viable DR capabilities: not just to validate the effectiveness of the technology and the recovery plans and to identify gaps or shortcomings, but also to ensure your teams remain in a constant state of readiness and proficiency with the tools and techniques. This may involve performing technical failover drills, testing recovery procedures, running table-top exercises, and assessing the impact on business operations.

  • Regularly reviewing and optimizing the disaster recovery solution and its artifacts is needed to adapt to evolving threats, changes in infrastructure, and business requirements. This includes technology updates and patching, updating recovery plans, incorporating lessons learned from tests and exercises, and leveraging new technologies and best practices to enhance resilience and recovery capabilities.

    It is important to note that change is also one of the main contributors to “breaking” DR capabilities. Changes to production often warrant a similar change to DR for it to remain viable; hence, this consideration must be addressed as part of regular change management.
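As a rough illustration of how production changes that were never mirrored to DR can be surfaced, the sketch below compares two configuration snapshots and reports every setting that differs. The setting names and values are hypothetical. In practice they would come from whatever inventory or infrastructure-as-code source an organization already uses.

```python
from typing import Optional


def find_drift(
    production: dict[str, str], dr: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return settings whose values differ between production and DR.

    Each entry maps a setting name to (production_value, dr_value);
    a setting missing on one side is reported as None for that side.
    """
    drift = {}
    for key in sorted(production.keys() | dr.keys()):
        prod_value = production.get(key)
        dr_value = dr.get(key)
        if prod_value != dr_value:
            drift[key] = (prod_value, dr_value)
    return drift


# Hypothetical configuration snapshots for one application.
production = {"db_version": "15.4", "vm_size": "8vCPU", "tls_cert_expiry": "2025-06-01"}
dr_site = {"db_version": "15.2", "vm_size": "8vCPU"}

for setting, (prod_value, dr_value) in find_drift(production, dr_site).items():
    print(f"{setting}: production={prod_value} dr={dr_value}")
```

Running a comparison like this as part of every production change request is one way to keep the DR consideration inside regular change management rather than leaving it to be discovered during a test.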

Cyber Recovery and Disaster Recovery tests & exercises are key to sustaining the effectiveness of technology, artifacts, procedures, and people skills, and ensuring the readiness of an organization to act when a major incident occurs. Further, most organizations, and often their clients and partners, require audit evidence of such tests as proof of preparedness and compliance.

Thinkwise advisors and consultants have planned, executed and participated in hundreds of cyber and disaster recovery tests for clients throughout North America. From table top exercises to full technology recovery, we help our clients establish testing and exercise programs to keep their capabilities and team readiness viable over time.

Compliance is a key aspect of testing, and the creation, collection and reporting of DR test evidence can be complicated and time consuming. We have developed and implemented methods to automate key aspects of compliance, improving the outcomes and significantly reducing the time and effort to perform compliance work.

Thinkwise has experience across the DR compliance spectrum, including such areas as:

Cyber / Disaster Recovery Testing & Exercises

  • Cyber Recovery requires many additional skills, tools, and techniques to ensure a constant state of readiness. Proficiency with tools and techniques helps increase the likelihood of success and can dramatically reduce recovery times.

    In addition to the actual restore / recover processes, forensics, creation of clean rooms / Isolated Recovery Environments, containment, promotion to production, and other activities must be tested and well practiced.

    We help clients establish testing methods during solution architecture and design, plan tests, and support test execution, from technology and process testing to table top exercises.

  • Disaster Recovery testing should always be performed under a “do no harm” mandate. Testing of technologies can introduce significant risk to production if not done correctly. A variety of techniques, depending on the recovery technology deployed, can limit or eliminate risks while still performing valid and viable testing. Cloud technology offers numerous techniques to enable a target recovery environment that closely simulates production, making testing and exercising as “lifelike” as possible, yet reducing or eliminating risk.

    We help clients establish testing methods during solution architecture and design, plan tests, and support test execution, from technology and process testing to table top exercises.

  • A viable DR testing & exercising program includes providing and managing evidence of test executions and results. Thinkwise has helped clients dramatically reduce efforts around test evidence and reporting through a variety of techniques including the use of cloud native features and automation.

  • Testing and exercising is a great way to ramp up the skills of new and seasoned staff alike. The need for highly skilled professionals in the disaster recovery field far outpaces the availability of viable candidates to fill the required roles. In today’s job market, most organizations need to “build” these specialized skills in-house as opposed to relying on hiring. Thinkwise, along with a viable testing and exercise program, can quickly ramp up your teams. DR requires not just skills but the right mindset - we help foster this through the mentoring approach of our senior advisors and consultants.

  • The ultimate goal of a DR Program is to ensure recovery in the event of an actual incident. Testing & exercising are key to achieving this goal. Thinkwise works closely with our clients to develop programs not only to validate technology and procedures, but also to build, mentor and teach teams to always be ready to recover.

    Testing identifies gaps and issues with DR technologies and procedures. It identifies issues that may evolve due to changes in production environments that affect the ability to use existing DR tools, techniques and plans.

    With the fast rate of change and growth in technology, frequent testing increases the skills and abilities of teams to react to unexpected events if needed.

    These are critical success factors for any DR program, and ones that are often overlooked and underserved.
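The test evidence mentioned above can be captured in a machine-readable form at the moment a test completes. The following is only a sketch under stated assumptions: the field names, the system name and the 2-hour RTO are hypothetical, and the SHA-256 digest merely helps detect later edits to a record; it is not a substitute for signed or immutable storage.

```python
import hashlib
import json
from datetime import datetime, timedelta


def build_evidence(system: str, started: datetime, recovered: datetime, rto: timedelta) -> dict:
    """Build a DR test evidence record comparing measured recovery time to the RTO."""
    elapsed = recovered - started
    record = {
        "system": system,
        "test_started": started.isoformat(),
        "service_recovered": recovered.isoformat(),
        "recovery_minutes": elapsed.total_seconds() / 60,
        "rto_minutes": rto.total_seconds() / 60,
        "rto_met": elapsed <= rto,
    }
    # Digest over a canonical (sorted-key) JSON form of the record body.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(canonical).hexdigest()
    return record


def verify_evidence(record: dict) -> bool:
    """Recompute the digest and confirm the record body has not been altered."""
    body = {key: value for key, value in record.items() if key != "sha256"}
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest() == record.get("sha256")


# Hypothetical failover drill: 90 minutes to recover against a 2-hour RTO.
evidence = build_evidence(
    system="orders-db",
    started=datetime(2024, 3, 5, 9, 0),
    recovered=datetime(2024, 3, 5, 10, 30),
    rto=timedelta(hours=2),
)
print(json.dumps(evidence, indent=2))
print(verify_evidence(evidence))  # True
```

Records in this shape can be collected per test and summarized for auditors without manual re-keying of results.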

The threat of ransomware posed by cybercriminals has never been greater, and continues to grow. It is not enough to establish a Cyber Recovery Strategy and deploy a CR solution. It must be well practiced and exercised to ensure plans, processes and knowledge are well understood by your teams when the heat of the moment arrives. After all, a successful attack has most likely compromised your data and identity systems and taken down your business, and hence requires recovery from your Last Line of Defense backups.

For cyber events, this means practicing how to re-establish identity and access systems quickly, so that data can be recovered from multiple recovery points into a “clean room” or isolated recovery environment (IRE). It includes establishing isolated remote access for your cyber security partner to perform forensic analysis and help determine which data should be recovered and cleansed. Lastly, it means understanding the process for promoting the selected recovery data, most likely into new virtual networks, accounts/subscriptions, identity systems, and so on. This will be complicated and time consuming, so practicing to shorten its duration is key to getting your business back online as quickly as possible and avoiding costly ransom payments.

Thinkwise advisors and consultants can help you establish the cyber recovery steps and guide/train on how to exercise preparedness for cyber events. From table top exercises to full technology recovery, we help our clients establish testing and exercise programs to keep their capabilities and team readiness viable over time.

Compliance is a key aspect of testing, and the creation, collection and reporting of CR test evidence can be complicated and time consuming. We have developed and implemented methods to automate key aspects of compliance, improving the outcomes and significantly reducing the time and effort to perform compliance work.

Ransomware Preparedness

The exponential growth of systems and data in most organizations means that conventional Microsoft Office tools like Excel, Word and PowerPoint are not enough to provide DR governance. Our clients are dealing with hundreds to thousands of systems and petabytes of storage, and the number of applications, artifacts, reports, tools, and services IT supports is typically in the hundreds or thousands too. The sheer volume of “things” that need to be included, change controlled, released and maintained for DR governance purposes demands modern tools and techniques. Advanced tools, automation and orchestration are keys to success, and cloud is an excellent platform to enable these capabilities.

Modern DR and CR Governance

  • Thinkwise has helped clients dramatically improve GRC, moving from the traditional “trust” approach of querying their tech teams against a checklist of compliance controls to a zero-trust, real-time, automated governance process. This involves automated discovery of virtual and physical assets, using tags as barcodes for inventory control, and BCM metadata and tracking to create compelling reports and dashboards. These reports and dashboards can include both real-time and historic views that drive the tech teams toward the correct and desired behaviors for DR and CR sustainability.

  • Managing and addressing changes to production and DR environments is a must to sustain DR readiness and viability over time. Considering and including DR in any net new projects is also important - it is much more effective and cost-efficient to design DR capabilities into new systems than to retrofit DR solutions to existing ones.

    One of the most observed issues our clients face is what we call drift: planned or unplanned changes to your IT environment that can negatively affect adjacent and/or dependent systems and services. Ensuring that drift is constantly managed and controlled is crucial to sustaining viable resilience and disaster recovery capabilities over time. We help our clients implement tools and techniques to protect against risk of decay to their DR readiness.

  • Optimize cloud spending and prevent cost overruns through effective resource management and real-time cost monitoring and alerting. Techniques such as tagging and applying cloud cost alerts can provide transparency and protection.

  • It is important to create and implement practical policies with traceable standards to maintain and report on compliance. Leveraging modern tools and techniques can significantly improve your compliance posture while simplifying and reducing efforts.

    It is also important that your employees, partners, and contractors understand and adhere to your policies & standards. Thinkwise writes and structures policies in an easy to understand, traceable and structured manner, further enabling compliance.

  • Identify and mitigate risks associated with data breaches, service outages, and vendor lock-in. A clear understanding of cloud service contracts and service levels is key to managing your risks and implementing the tools and techniques to protect your organization.

  • Ensure that sensitive data stored in the cloud, even during DR testing and practicing, remains secure and compliant with legal, regulatory, and business requirements. Automate compliance reporting using data, tagging, and other techniques to reduce effort and time and to increase reliability and transparency.
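The idea of using tags as barcodes for automated governance can be sketched as a simple compliance check over an asset inventory. The required tag names and the asset names below are assumptions chosen for illustration. A real implementation would read the inventory from a cloud provider's discovery or tagging service rather than a hard-coded dictionary.

```python
# Hypothetical set of tags every asset must carry for DR/CR governance.
REQUIRED_TAGS = {"owner", "dr_tier", "rto_hours", "rpo_hours"}


def missing_tags(assets: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each non-compliant asset to the required tags it lacks or leaves empty."""
    report = {}
    for name, tags in assets.items():
        missing = sorted(tag for tag in REQUIRED_TAGS if not tags.get(tag, "").strip())
        if missing:
            report[name] = missing
    return report


def compliance_percent(assets: dict[str, dict[str, str]]) -> float:
    """Percentage of assets carrying every required tag (100.0 for an empty inventory)."""
    if not assets:
        return 100.0
    compliant = len(assets) - len(missing_tags(assets))
    return 100.0 * compliant / len(assets)


# Hypothetical inventory, as it might be returned by an automated discovery job.
assets = {
    "vm-payroll-01": {"owner": "finance-it", "dr_tier": "1", "rto_hours": "4", "rpo_hours": "1"},
    "vm-intranet-02": {"owner": "corp-it", "dr_tier": "3"},
    "db-orders-01": {"owner": "", "dr_tier": "1", "rto_hours": "2", "rpo_hours": "0"},
}

print(missing_tags(assets))
# {'vm-intranet-02': ['rpo_hours', 'rto_hours'], 'db-orders-01': ['owner']}
print(round(compliance_percent(assets), 1))  # 33.3
```

Feeding a report like this into a dashboard gives teams a real-time view of which assets fall outside DR governance, instead of relying on periodic checklist reviews.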

Thinkwise 

We make organizations better with positive change in people, processes and technology.