In a full production test, complete failover was achieved in just a quarter of the Client’s stated recovery time objective (RTO)
A replica of key on-premise servers can be brought online ‘in the Cloud’ with data continuously synchronised in near real-time
The Client’s data is now replicated to the Cloud, providing a complete failover architecture without requiring a significant investment in duplicate hardware on premise
Manufacturing, Meat processing
The Client is a large Australian agricultural processor supplying premium produce to domestic and export markets.
As one of the largest processors of its type in Australia, the Client prepares, processes, and packages its products and distributes them via wholesale, retail, and eCommerce platforms.
Downtime is very costly and every minute counts when operating a 24/7 manufacturing facility with hundreds of employees. Processing large quantities of product per day, the Client’s production is 100% dependent on its automation functioning. A loss of IT systems would see production stop and the hundreds of staff on site unable to work.
Further, an IT disaster event could cause a loss of important customer and processing data, loss of revenue, and, potentially, jeopardise product safety and damage reputation, as demonstrated when the Client’s biggest competitor suffered a devastating prolonged outage as a consequence of a ransomware attack.
The Client operated a classic disaster recovery strategy where they backed up to disk and removable media nightly. The existing recovery strategy required traditional ‘bare metal’ restoration processes that placed the business at significant risk, with recovery time likely to be measured in days, rather than the Recovery Time Objective (RTO) of four hours. Depending on the type and severity of the event, the Recovery Point Objective (RPO), the last point in time to which data could be restored, could potentially be 24 hours or more.
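To illustrate why nightly backups imply an RPO of 24 hours or more, here is a minimal Python sketch. The timestamps and the function name are hypothetical and chosen only for illustration; they do not come from the Client’s environment.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_backup: datetime, failure_time: datetime) -> timedelta:
    """Return how much data (measured in time) is lost after a failure.

    Everything written after the last usable backup cannot be restored.
    """
    if failure_time < last_good_backup:
        raise ValueError("failure_time must not precede last_good_backup")
    return failure_time - last_good_backup


# Hypothetical example: the backup ran at 23:00 and the failure
# occurred at 22:30 the following night, just before the next backup.
last_backup = datetime(2021, 3, 1, 23, 0)
failure = datetime(2021, 3, 2, 22, 30)
print(data_loss_window(last_backup, failure))  # 23:30:00

# If that night's backup had also been unusable, the window grows
# by a further 24 hours, which is the "or more" case.
print(data_loss_window(last_backup - timedelta(days=1), failure))
```

With a nightly schedule, the loss window approaches 24 hours just before each backup runs, and exceeds it whenever the most recent backup cannot be restored.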
The business risk associated with the length of time required for bare metal recovery, and the possibility of losing hours’ worth of data, was no longer considered acceptable by the Client. They recognised that completely duplicating their physical assets and systems on-site would be extremely costly and mean greater system complexity and administration overhead. The complex nature of any traditional backup system, with many moving parts and interdependencies, requires specialist IT staff to continuously monitor and remediate faults.
Instead, the Client wondered if they could achieve a better disaster recovery ROI by replicating key applications and data to the Cloud. The Client wanted to achieve an RTO measured in hours rather than days, and to have the ability to restore to any point in time in the previous 30 days. AWS CloudEndure Disaster Recovery, with near-continuous replication, straightforward failover (and failback) and a simple management interface, ticked all of the boxes.
Our Client’s IT team identified that specialist knowledge was needed to plan and implement the connection between their existing on-premise systems and the AWS Cloud. They turned to Clevvi for help in setting up the AWS CloudEndure Cloud-based disaster recovery system.
We started the process by analysing the Client’s existing systems and workflows. By deep diving into their servers and workloads, we gained a thorough understanding of their network and which endpoints needed to be replicated to CloudEndure.
We found that the Client operates around 80 servers, with endpoints spread across a complex internal fibre network. Clevvi needed to ensure that if the Client had to recover their operations, all computers and handheld devices (such as warehouse scanner guns) could talk to the disaster recovery replicas in the AWS Cloud.
We took the time to understand the Client’s risk profile in relation to their IT systems to ensure that we could recommend which servers should be prioritised in the Cloud disaster recovery environment. We then evaluated the workloads of the chosen servers to confirm if CloudEndure was the best AWS product to enable the disaster recovery solution.
For two of the workloads, Active Directory and Windows File Sharing, we identified that native replication was the best solution, meaning that replication is handled by each workload’s own built-in mechanism rather than by CloudEndure.
We also identified that for Microsoft SQL Server, the Client’s objectives would be best met using AlwaysOn Availability Groups. We recommended using CloudEndure as an immediate solution and implementing availability groups in a future phase.
The Client’s firewall devices were legacy products and Clevvi recognised that the Cloud disaster recovery solution would be more resilient if firewalls were upgraded to modern UTM devices. We presented the business case to the Client and then assisted with acquiring, installing, and integrating those devices into the network.
Clevvi first helped the Client to establish their AWS account. AWS Control Tower was used to provision the AWS environment and apply AWS’ strongly recommended high-level rules (called “guardrails”).
We then created a private, secure connection between the Client’s on-premise networks and the AWS Cloud. This was implemented and secured using modern UTMs, with physical devices at the Client’s sites and a Cloud UTM in AWS.
CloudEndure Disaster Recovery was set up so that the Client’s most critical databases, systems and applications - including Microsoft Dynamics AX ERP application servers, Microsoft SQL Servers, Microsoft Remote Desktop Servers, and various industry-specific production and MES systems - were replicated to AWS. This process was relatively simple, requiring only that a small agent be installed on the production servers.
Clevvi’s engineers then configured native replication of the Active Directory and Windows File Sharing workloads using AWS EC2 instances. Through CloudEndure, 14 of the Client’s servers are now continuously replicated to the Cloud.
With the Client’s systems continuously replicated in their AWS account, kept up-to-date with all application changes, and ready to run, the final step was testing. Clevvi initially carried out basic ‘smoke testing’, running a test instance of each server in the AWS Cloud isolated from the production network. The business then proceeded to a full end-to-end test, verifying that scanning guns in the warehouse could successfully manipulate inventory records in the Cloud.
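A basic smoke test of this kind can be as simple as confirming that each recovered server accepts connections on the port its clients use. The sketch below is an illustration only, not the tooling used in the project; the server names and addresses are hypothetical placeholders, and the ports are the standard defaults for SQL Server (1433) and Remote Desktop (3389).

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def smoke_test(endpoints: dict) -> dict:
    """Check every named endpoint and return a name -> reachable mapping."""
    return {name: port_reachable(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    # Hypothetical recovery-site endpoints (documentation-only addresses).
    recovery_endpoints = {
        "sql-replica": ("192.0.2.10", 1433),
        "rds-replica": ("192.0.2.20", 3389),
    }
    for name, ok in smoke_test(recovery_endpoints).items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

A port check only shows that a service is listening. The end-to-end test described above, in which a scanning gun updates an inventory record, is what confirms that the application itself works.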
Clevvi’s engineers customised a failover playbook for the Client’s IT team. We provided training on how to recover from a disaster and fail over to the Cloud in the event that their primary data centre goes offline. This enabled the Client’s IT team to readily achieve recovery and easily conduct frequent drills without impacting the business. The project was officially completed when we successfully demonstrated that scanning guns in the warehouse could continue to be used without any reconfiguration on the factory floor, achieved well within the Client’s RTO.
The technical competency of the Clevvi team and our approach of analysing and understanding workloads first meant that we could design a Cloud disaster recovery strategy for the Client that utilised optimal replication methods.
Our wide breadth of technical experience in local area networking, internet and wide area networking, Cloud computing, and application level engineering/support was essential in being able to design a holistic solution.
Clevvi’s relationship with AWS was a major contributing factor as to why this project was a success. Our team’s extensive knowledge and experience in cyber security was particularly relevant as the Client’s decision to update their disaster recovery system was rooted in concern around the consequences of a cyber attack.
We tested the system by coordinating a complete disaster recovery trial and database shutdown with their IT department. Throughout the test, the Client was able to continue scanning products on the floor, and we achieved a recovery time 75% shorter than their RTO. Their objective was four hours; we achieved recovery within one.
Owing to continuous replication through AWS, the Client can now launch their replica servers at any time without needing to wait for them to be updated. Due to CloudEndure’s continuous operation, the Client’s RPO improved from 24 hours (using their traditional nightly backups) to under 10 minutes.
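The recovery figures quoted here can be checked with simple arithmetic. The short Python sketch below uses only the numbers stated in this case study; the function name is ours and purely illustrative.

```python
def reduction_pct(baseline_minutes: float, achieved_minutes: float) -> float:
    """Percentage reduction of achieved_minutes relative to baseline_minutes."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return (baseline_minutes - achieved_minutes) / baseline_minutes * 100


# RTO: objective of four hours, recovery achieved within one hour.
rto_reduction = reduction_pct(4 * 60, 1 * 60)
print(f"Recovery time vs RTO: {rto_reduction:.0f}% shorter")

# RPO: from 24 hours (nightly backups) to under 10 minutes.
# Ten minutes is the upper bound, so this is the minimum reduction.
rpo_reduction = reduction_pct(24 * 60, 10)
print(f"RPO reduction: at least {rpo_reduction:.1f}%")
```

One hour against a four-hour objective is a quarter of the RTO, or 75% shorter. Moving from a 24-hour to a sub-10-minute RPO shrinks the potential data-loss window by more than 99%.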
Should the Client’s systems go down, by using a simple playbook that their IT team can execute, they are now able to recover their systems in minutes, with less likelihood of human error and improved reliability.
For their IT team, the ability to protect enterprise applications and databases with a single tool has saved significant time and made the Client’s disaster recovery strategy far more robust.
The Client was able to achieve a completely redundant set of systems with a moderate increase in operating expenses, rather than having to make a significant capital expense in, and maintain on an ongoing basis, double the amount of server hardware.