How to Plan Disaster Recovery on Oracle Cloud

May 27, 2021

According to Gartner, selecting a disaster recovery option is a complex decision, and it's easy to pick the wrong solution. Enterprises should be clear about their requirements and wary of shortcuts. You might be hesitant to rely 100% on the public cloud for your E-Business Suite, but you can move your disaster recovery and non-production workloads to Oracle Cloud Infrastructure (OCI) with relatively little effort.

Begin by dipping your toes into the cloud. Creating a backup of your source Oracle EBS instance is the first step of a lift-and-shift migration to OCI.

Although this process is intended primarily for on-premises instances, in certain cases you can also run the Oracle E-Business Suite Cloud Backup Module to perform a lift and shift when the source environment is already in Oracle Cloud Infrastructure, optionally using OCI database services.

The primary objective of the following architectures is to build disaster recovery (DR) into your deployment, so that if an unforeseen event forces you to fail over, E-Business Suite stays up and running.

Outcomes these architectures can provide:

DR within a single region (across availability domains, or ADs)

  • Active-Active components across ADs
  • Active-Passive components across ADs
  • Regional subnets across ADs
  • Load-balancing across ADs
  • Storage synchronization across ADs
  • Database DR across ADs

DR across multiple regions

  • Application replication between regions
  • Storage replication between regions
  • Asynchronous cross-region copy of Object Storage datasets
  • Cross-region backup copy for block volumes
  • Database protection between regions

Identifying the Right Disaster Recovery Strategy

To select a solution, you can focus on just two parameters: data loss and downtime.

Start by thinking about the two extremes. First: can the application tolerate hours or days of lost data AND an uncertain recovery time (hours or days, at least)? If so, you just need basic backup to the cloud. Almost every application in your environment needs at least this basic level of protection.

If an application could tolerate even more data loss and downtime than that, why not just turn the system off now and save the cost?

Next, consider the other extreme. Does your application need to be back online within 30 minutes of a site outage, including the time it takes to decide to fail over? Do you need something close to zero downtime? If you need that, or you can accept no more than a few seconds of data loss (effectively zero data loss after a site-wide outage), then you want the Active / Active solution. This comes with more cost and effort, but if you need it, Oracle can deliver it.

Most applications fall into the middle ground: they are critical enough to deserve some protection, but may not need zero downtime and zero data loss. For these, data loss is measured in seconds, and one more question decides the design: how much downtime can you accept? If you must be back online in less than 4 hours, you want an Active / Standby solution. If you can tolerate a recovery time of 4 to 24 hours, you can use a Pilot Light solution.
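The decision questions above can be sketched as a small function. This is a minimal illustration, not an Oracle tool: the name `select_dr_strategy` is hypothetical, and the numeric thresholds (for example, treating 5 seconds as "effectively zero" data loss, or 1 hour as the start of "hours of lost data") are assumptions to replace with your own objectives.

```python
# Illustrative thresholds based on the rules of thumb above (assumptions; tune for your systems).
NEAR_ZERO_DATA_LOSS_S = 5    # "a few seconds" of loss, treated as effectively zero
FAST_RECOVERY_H = 0.5        # back online in under 30 minutes, including decision time
STANDBY_RECOVERY_H = 4       # must be back online in under 4 hours
BACKUP_DATA_LOSS_S = 3600    # "hours or days" of lost data: assume 1 hour or more
BACKUP_RECOVERY_H = 24       # "uncertain recovery time": assume a day or more is acceptable


def select_dr_strategy(max_data_loss_s: float, max_downtime_h: float) -> str:
    """Map tolerated data loss (RPO, seconds) and downtime (RTO, hours) to a DR pattern."""
    # Extreme 1: hours/days of lost data AND a long, uncertain recovery are acceptable.
    if max_data_loss_s >= BACKUP_DATA_LOSS_S and max_downtime_h >= BACKUP_RECOVERY_H:
        return "Backup"
    # Extreme 2: back online within 30 minutes, or effectively zero data loss.
    if max_downtime_h < FAST_RECOVERY_H or max_data_loss_s <= NEAR_ZERO_DATA_LOSS_S:
        return "Active / Active"
    # Middle ground: data loss measured in seconds; tolerated downtime decides.
    if max_downtime_h < STANDBY_RECOVERY_H:
        return "Active / Standby"
    return "Pilot Light"


print(select_dr_strategy(max_data_loss_s=30, max_downtime_h=12))  # Pilot Light
```

Because every rule is an explicit comparison, the thresholds are easy to revisit as your RPO and RTO targets change.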

Oracle Cloud Disaster Recovery

When building the DR solution, there is a range of price/performance trade-offs, and we can provide solutions across that whole range.

The two metrics that matter most are data loss, expressed as the Recovery Point Objective (RPO), and downtime, expressed as the Recovery Time Objective (RTO). As always, both must be balanced against cost and complexity.

a) At the entry level, we simply back up the data to the cloud. This gets the data and application configuration off-site, so at least there is a starting point to recover from. It is a very basic offering: you lose all data since the last backup (perhaps 24 hours of data loss), and recovery takes the time needed to restore systems from backup plus any time required to reconfigure them to run in the cloud. This is the minimum effort: no system should be without at least this level of protection, but most systems need something better.

b) Next comes the Pilot Light. In this solution we upgrade database protection to real-time replication, which brings RPO (data loss) down to just a few seconds. But we still rely on backup and recovery for the application servers, so recovery remains long while servers are restored and reconfigured. This solution suits cases where you want to minimize data loss and keep costs down, and can tolerate a relatively long downtime, such as 24 hours or more.

c) The next step up (Active / Standby) is to configure standby servers that match our on-premises application tiers. When we need to fail over, instead of waiting through a long restore, everything is ready to go and just needs to be switched into production. This brings recovery time down to minutes, on top of the already low data loss.

d) At the highest level, we can build an Active / Active solution that gives you zero downtime and zero data loss even in a regional disaster. Not every application needs this level of protection, but if a few minutes of downtime or a few seconds of data loss would be costly enough, we can deliver it.

e) Most customers select one of the middle options (Pilot Light or Active / Standby), depending on their tolerance for downtime when recovering from a disaster.

f) And of course, we recommend that every application get at least backup-level protection. Without it, you are effectively signing up for unlimited data loss and unlimited downtime after a regional disaster. Very few people would knowingly accept that.

Notable point: There is a range of protection options that trade off performance against cost. Active / Passive lets you recover with very low downtime and data loss. Backup is the minimum for all applications.
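To see where each tier's RPO and RTO come from, the sketch below adds up their components. It is a back-of-the-envelope model under stated assumptions, not a sizing tool: the helper names are hypothetical, and the durations you pass in (backup interval, restore time, reconfiguration time, decision time, replication lag) must come from your own measurements.

```python
from dataclasses import dataclass


@dataclass
class TierEstimate:
    name: str
    worst_case_rpo_h: float  # hours of data that could be lost
    estimated_rto_h: float   # hours until users can reconnect


def backup_tier(backup_interval_h: float, restore_h: float,
                reconfigure_h: float, decision_h: float = 0.0) -> TierEstimate:
    # Everything written since the last backup is lost: worst case is one full interval.
    # Recovery = decide to fail over + restore from backup + reconfigure for the cloud.
    return TierEstimate("Backup", backup_interval_h,
                        decision_h + restore_h + reconfigure_h)


def pilot_light_tier(replication_lag_s: float, app_restore_h: float,
                     reconfigure_h: float, decision_h: float = 0.0) -> TierEstimate:
    # Database is replicated in real time, so data loss is roughly the replication lag.
    # Application servers still come from backup, so the long restore stays in the RTO.
    return TierEstimate("Pilot Light", replication_lag_s / 3600,
                        decision_h + app_restore_h + reconfigure_h)


def active_standby_tier(replication_lag_s: float, switchover_h: float,
                        decision_h: float = 0.0) -> TierEstimate:
    # Standby servers already exist: recovery is deciding and switching them into production.
    return TierEstimate("Active / Standby", replication_lag_s / 3600,
                        decision_h + switchover_h)


# Hypothetical inputs, for illustration only.
for tier in (backup_tier(24, 6, 4, 1),
             pilot_light_tier(5, 6, 4, 1),
             active_standby_tier(5, 0.25, 0.5)):
    print(f"{tier.name}: RPO <= {tier.worst_case_rpo_h:.3f} h, RTO ~ {tier.estimated_rto_h:.2f} h")
```

With these hypothetical inputs, `backup_tier(24, 6, 4, 1)` gives a worst-case RPO of 24 hours and an RTO of 11 hours, while the standby tier cuts RTO to 0.75 hours. An Active / Active deployment aims to drive both figures toward zero.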

Disaster Recovery Across Multiple Regions

You can achieve true DR across multiple regions to cover the unlikely event that an entire region goes down. This reference architecture covers the most robust case, with supported services clustered across ADs within the primary region, but disaster recovery can also be achieved across regions that have a single AD. This is important to note because most new OCI regions are launching as single-AD regions.

Active-active components across ADs: Clustering of supported services across ADs provides protection from an AD failure.

Active-passive components across regions: If you run the application servers active-passive across regions, use rsync to keep the passive servers synchronized with the active ones.

VCN peering across regions: VCNs in different regions can be connected within a tenancy or even across tenancies. Traffic between regions travels over Oracle's internal backbone.

Storage synchronization across regions: Block volume backups can be copied between regions using the Console, CLI, SDKs, or REST APIs. Copying block volume backups to another region at regular intervals makes it easier to rebuild applications and data in the destination region if a region-wide disaster occurs in the source region. It also makes it easy to migrate and expand applications to another region. With Object Storage cross-region copy, you can asynchronously copy objects between buckets in the same region or to buckets in other regions.

Database DR across regions: Whether you use Data Guard or Active Data Guard depends on your use case and database edition. Active Data Guard requires Enterprise Edition Extreme Performance.
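The replication steps above can be scripted. The sketch below only builds the command lines and prints them (a dry run by default); it assumes `rsync` over SSH between application servers and the OCI CLI's `oci bv backup copy` and `oci os object copy` commands. Hosts, paths, OCIDs, regions, and bucket names are hypothetical, and your environment may need extra flags (for example, a destination namespace or KMS key), so check each command's `--help` before running it.

```python
import shlex
import subprocess


def rsync_app_tier_cmd(src_dir: str, standby_host: str, dest_dir: str) -> list[str]:
    """Mirror an application-tier directory from the active server to a passive one."""
    # -a keeps permissions, timestamps and symlinks; -z compresses data in transit;
    # --delete removes files on the standby that were deleted on the primary.
    # The trailing "/" on the source copies the directory's contents, not the directory.
    return ["rsync", "-az", "--delete", "-e", "ssh",
            src_dir.rstrip("/") + "/", f"{standby_host}:{dest_dir}"]


def copy_volume_backup_cmd(volume_backup_id: str, destination_region: str) -> list[str]:
    """OCI CLI call that copies a block volume backup to another region."""
    return ["oci", "bv", "backup", "copy",
            "--volume-backup-id", volume_backup_id,
            "--destination-region", destination_region]


def copy_object_cmd(bucket: str, object_name: str,
                    destination_region: str, destination_bucket: str) -> list[str]:
    """OCI CLI call that starts an asynchronous cross-region Object Storage copy."""
    return ["oci", "os", "object", "copy",
            "--bucket-name", bucket,
            "--source-object-name", object_name,
            "--destination-region", destination_region,
            "--destination-bucket", destination_bucket]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical paths, hosts, OCIDs and bucket names.
run(rsync_app_tier_cmd("/u01/ebs/apps", "ebs-standby-1", "/u01/ebs/apps"))
run(copy_volume_backup_cmd("ocid1.volumebackup.oc1..example", "us-ashburn-1"))
run(copy_object_cmd("ebs-backups", "apps_tier.tgz", "us-ashburn-1", "ebs-backups-dr"))
```

Running these on a schedule (for example, from cron) is one way to get the regular-interval copies described above.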

Disaster Recovery: On-Prem to Oracle Cloud Infrastructure

1. Replicate production environment to OCI

2. Set sync policy

  • Configurable policy: hourly, daily, weekly, or per a defined schedule
  • Multiple policies can be configured and applied
  • Automatic sync and alerts

3. Provisioning options

  • Pre-provision VMs (hot standby)
  • Dynamically provision VMs, syncing to storage only (low cost)
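The sync-policy step can be modelled in a few lines. This is a hypothetical sketch of the scheduling and alerting logic, not the interface of any particular DR product: `SyncPolicy`, `needs_alert`, and the 1.5x grace factor are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SyncPolicy:
    name: str
    interval: timedelta  # hourly, daily, weekly, or a custom schedule


HOURLY = SyncPolicy("hourly", timedelta(hours=1))
DAILY = SyncPolicy("daily", timedelta(days=1))
WEEKLY = SyncPolicy("weekly", timedelta(weeks=1))


def next_sync(last_sync: datetime, policy: SyncPolicy) -> datetime:
    """When the policy should fire next."""
    return last_sync + policy.interval


def effective_interval(policies: list[SyncPolicy]) -> timedelta:
    """With several policies on one server, the most frequent one bounds data loss."""
    return min(p.interval for p in policies)


def needs_alert(last_success: datetime, now: datetime,
                policies: list[SyncPolicy], grace: float = 1.5) -> bool:
    """Alert if no sync has succeeded for longer than the tightest interval times a grace factor."""
    return now - last_success > effective_interval(policies) * grace


start = datetime(2021, 5, 27, 9, 0)
policies = [HOURLY, DAILY]
print(next_sync(start, DAILY))                                   # 2021-05-28 09:00:00
print(needs_alert(start, start + timedelta(hours=2), policies))  # True
```

The effective interval also serves as a rough worst-case RPO for that server, plus however long a sync takes to complete.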

IT Convergence can provide a complete disaster recovery and backup platform that extends across physical and virtual environments. Multiple RPO/RTO options give enterprises control over availability versus cost, so critical applications recover quickly and secondary applications recover in the most cost-effective timeframe.

Let's look at how it works. On the left is your on-premises estate, including your Oracle EBS applications; on the right is OCI. Connecting the two is middleware: automation software that runs its own small server in Oracle Cloud and knows how to set up, operate, and manage disaster recovery between your on-premises data center and Oracle Cloud Infrastructure.

It detects the configuration of your production servers, recreates them in the cloud, migrates the databases, files, and so on to the cloud, and keeps them continuously updated. Then, either on demand or after a failure, you can restart your E-Business Suite applications in the cloud and let your users reconnect, so you're back in business with minimal disruption.
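The workflow just described can be pictured as a simple state machine. The sketch below is a hypothetical model of what such middleware does, with made-up state and event names; it does not reflect the API of any specific product.

```python
from enum import Enum, auto


class DRState(Enum):
    DISCOVERING = auto()   # detect the configuration of production servers
    REPLICATING = auto()   # recreate servers in OCI, migrate databases and files
    PROTECTED = auto()     # continuous updates keep the cloud copy current
    FAILED_OVER = auto()   # EBS restarted in OCI; users reconnect there


# Allowed events per state for a hypothetical DR orchestrator.
TRANSITIONS: dict[DRState, dict[str, DRState]] = {
    DRState.DISCOVERING: {"config_captured": DRState.REPLICATING},
    DRState.REPLICATING: {"initial_sync_done": DRState.PROTECTED},
    DRState.PROTECTED: {
        "incremental_sync": DRState.PROTECTED,      # keep updating the copy
        "failover_requested": DRState.FAILED_OVER,  # on-demand failover (e.g. a DR test)
        "primary_failed": DRState.FAILED_OVER,      # failover after an outage
    },
    DRState.FAILED_OVER: {},
}


def advance(state: DRState, event: str) -> DRState:
    """Apply an event, rejecting any that the current state does not allow."""
    try:
        return TRANSITIONS[state][event]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None


state = DRState.DISCOVERING
for event in ("config_captured", "initial_sync_done", "incremental_sync", "primary_failed"):
    state = advance(state, event)
print(state.name)  # FAILED_OVER
```

Making the allowed transitions explicit means an out-of-order action, such as failing over before the initial sync has finished, is rejected instead of silently attempted.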

You can also choose from a list of certified Cloud MSEs who will manage the entire process on your behalf. Certified Oracle Cloud MSEs have proven expertise, tools, and processes to build, deploy, run, and manage Oracle and non-Oracle workloads on Oracle Cloud Platform, all under a single contract and a single point of contact.

These Cloud MSEs can have the environment up and running, including a full DR test of the EBS environment in OCI, within a specified timeframe (30-45 days*). For a leading US eyewear company, we set up a multi-region configuration covering three disaster recovery scenarios in less than six weeks. You can read the case study here.

Once OCI is validated as a viable, high-performance DR site, switch the primary over to OCI and set up DR in a second availability domain within OCI, as discussed previously.

High Availability & Disaster Recovery Deployment choices for EBS on OCI

Oracle Cloud Infrastructure provides flexibility in designing your Oracle E-Business Suite deployment. You have the option to deploy your EBS system in a single availability domain, across multiple availability domains, or in multiple regions depending on your business requirements.

  • For high availability, deploy your Oracle E-Business Suite in a single availability domain with multiple application instances. If one instance fails, the other instances can continue processing your requests.
  • To further enhance availability, you can use multiple availability domains. If one domain fails, you can still access the application instances in another domain.
  • For disaster recovery purposes, you can set up a second site in a different region using the multiple regions architecture. This is similar to the multiple availability domain architecture, but resources are created in a different region.
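How the three deployment choices contain a failure can be illustrated with a simplified routing function. This is a conceptual sketch only: `AppInstance`, `route`, and the region and AD names are hypothetical, and in practice failover between regions is a deliberate switchover (database role change, DNS update) rather than a per-request decision.

```python
from dataclasses import dataclass


@dataclass
class AppInstance:
    name: str
    region: str
    availability_domain: str
    healthy: bool


def route(instances: list[AppInstance], primary_region: str) -> list[AppInstance]:
    """Return the instances that should serve users."""
    # Within the primary region, any healthy instance in any AD keeps serving
    # (covers single-instance failures and, with multiple ADs, an AD outage).
    primary = [i for i in instances if i.region == primary_region and i.healthy]
    if primary:
        return primary
    # The whole primary region is down: fall back to healthy instances in the DR region.
    return [i for i in instances if i.region != primary_region and i.healthy]


fleet = [
    AppInstance("app1", "us-phoenix-1", "AD-1", healthy=False),
    AppInstance("app2", "us-phoenix-1", "AD-2", healthy=True),
    AppInstance("dr1", "us-ashburn-1", "AD-1", healthy=True),
]
print([i.name for i in route(fleet, "us-phoenix-1")])  # ['app2']
fleet[1].healthy = False
print([i.name for i in route(fleet, "us-phoenix-1")])  # ['dr1']
```

Each deployment choice widens the failure the system can absorb: an instance, then an AD, then a whole region.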

Configuration Options for Oracle E-Business Suite with Disaster Recovery

This option is a variant of the multiple node architecture. It consists of the same components; however, the database is deployed to Oracle's Platform as a Service (PaaS).

For the Oracle E-Business Suite database, you may subscribe to:

  1. DBCS (Single Instance),
  2. DB Systems (Single Instance), or
  3. Exadata DB System.

You may provision a multi-node Oracle E-Business Suite environment or perform a lift and shift of an existing on-premises Oracle E-Business Suite Release 12.2 or 12.1.3 environment. Using the lift-and-shift automation speeds up migration, reduces risk, and shortens the time to project completion.

Just as you may use Oracle Real Application Clusters (RAC) for high availability and Data Guard for disaster recovery on-premises, you can use them in Oracle Cloud.

Importance of Disaster Recovery Consultants

Disaster recovery is an essential part of any business strategy, especially during a cloud migration. Cloud providers offer some disaster recovery capabilities, but these may not be sufficient for every business. Moreover, even with the best disaster recovery plan in place, disasters can still strike unexpectedly, leading to data loss, downtime, and other costly consequences.

This is where expert consultants come into play. These consultants can help businesses develop comprehensive cloud disaster recovery plans that are tailored to their specific needs and provide the expertise needed to navigate any challenges that arise.

Benefits of Cloud Disaster Recovery Consultants

Customized Plans

Disaster recovery consultants can help businesses develop customized disaster recovery plans that are tailored to their specific needs. These plans take into account the unique risks and challenges facing the business and provide a roadmap for mitigating those risks.

Expertise and Knowledge

Disaster recovery consultants have the expertise and knowledge needed to navigate the complexities of disaster recovery. They understand the different types of disasters that can occur and how to prepare for them, as well as the best practices for minimizing downtime and data loss.

Testing and Validation

Disaster recovery consultants can help businesses test and validate their disaster recovery plans to ensure they are effective. Regular testing is essential to ensure that the plan is up-to-date and that all systems and procedures are functioning correctly.

Rapid Response

In the event of a disaster, disaster recovery consultants can provide rapid response services to help businesses get back up and running as quickly as possible. They can help businesses identify the root cause of the disaster, implement recovery procedures, and restore data and systems.

Cost Savings

While disaster recovery services may seem like an additional expense, they can actually save businesses money in the long run. A well-designed plan can minimize downtime and data loss, reducing the impact on the business and its customers.

Conclusion

Creating DR for mission-critical applications involves several key stakeholders with specialized skill sets. We highly recommend leveraging certified Oracle Cloud MSE partners to avoid roadblocks in your cloud migration journey and to keep your migration costs and timelines from growing.
