Disasters can strike any business. They can inflict serious financial losses and even shut an organization down for good. Surviving one requires planning.
Disasters aren’t limited to obvious physical events such as fires, floods, and theft of equipment. Massive data loss can be just as harmful, whether it comes from a hardware failure that wipes out a disk or from malware that does the same thing.
Ransomware has become a major risk in the past few years. It encrypts the data on computers and on any backups it can reach, then demands a payment for the key to restore the files. Making the payment doesn’t always get the files back.
Most small businesses have no disaster recovery plan. Nationwide Insurance reports that only 18 percent of companies in the United States with fewer than 50 employees have such a plan. The report estimates that 25 percent of businesses don’t reopen after a major disaster.
Small businesses are at greater risk, since the loss of one server won’t wipe out a huge enterprise but could ruin a small shop. Having a plan significantly improves the chances of survival.
Insurance can cover the replacement cost of lost physical assets, but money won’t get back lost data. Replacing a computer requires setting it up with the software and data from the old machine, and this is very hard if you’re caught unprepared.
The good news is that there are cost-effective ways for small businesses to protect their data from catastrophic loss and get running again quickly.
How prepared do you need to be?
A data recovery plan needs to match a business’s needs. A shop with a few employees can’t do everything, but it’s better to have some contingency plans than none.
The two big issues are how quickly you can get back to normal operation and how much data you can afford to lose.
A small business usually has more leeway on resuming operations than a huge one. If a big site like Amazon or eBay went down for an hour, it would be national news, costing the company millions and making a huge number of customers and vendors unhappy. If a small operation goes down for an hour, the consequences usually aren’t too serious. It can probably endure a couple of days of downtime, provided the people affected know you’re working on the problem and will be back up soon.
The amount of data loss is more critical. If you lose everything recorded in the hour before the disaster, you’ve likely lost orders, payments, or other transactions. Reconstructing the bookkeeping is painful, and claims of unfulfilled orders are bad for business.
The technical terms for quantifying these two factors are Recovery Point Objective (RPO) and Recovery Time Objective (RTO). The Disaster Recovery Blog has a good explanation of them. Briefly:
- The RPO is the longest span of time for which you can tolerate losing data. If your RPO is one hour, then your data should be backed up at least every hour.
- The RTO is the longest time that’s acceptable for restoring operations.
Disasters are messy, so you can never be sure how long recovery will take, but you have to set a goal. If you decide that four hours is the longest downtime your business can endure, that’s your RTO, and the recovery plan needs to have a high chance of getting the system running again within that time.
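To make the two targets concrete, here is a minimal Python sketch that checks a plan against them. The function names and all the figures are hypothetical, chosen only for illustration; your own estimates will differ.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is the time between two backups."""
    return backup_interval <= rpo


def meets_rto(step_estimates: list[timedelta], rto: timedelta) -> bool:
    """Assumes recovery steps run one after another, so durations add up."""
    return sum(step_estimates, timedelta()) <= rto


# Hypothetical targets and estimates for a small shop.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)
steps = [
    timedelta(minutes=30),  # get a replacement server
    timedelta(hours=2),     # restore the system image
    timedelta(minutes=45),  # restore the latest file backup
]

print("RPO met:", meets_rpo(timedelta(minutes=30), rpo))
print("RTO met:", meets_rto(steps, rto))
```

The estimates are guesses until a recovery test replaces them with measured times, so it’s wise to leave a margin.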
Devising a recovery plan
A small business needs a recovery plan at a reasonable cost. Fortunately, there are a number of low-cost options. A plan has two main requirements:
- All important data should be backed up offsite and updated frequently. At least one backup has to be offsite, since a disaster is likely to take out local copies.
- There should be an efficient way to bring up a system which is equivalent to the one that failed. It needs to run the same software and have the most recent backup data.
Your business most likely uses a hosted service or remote server. Keeping your servers in your office space is too risky. It’s a maintenance burden even if nothing catastrophic happens, and it makes picking up the pieces hard if a disaster wrecks the office.
A tech-oriented business often runs a dedicated server at a datacenter. Other businesses are more likely to use a hosting plan, which significantly reduces the burden of preparing for disasters.
The hosting plan might give your business part of a specific machine, sharing its capacity with other customers while keeping your accounts and data private. That’s called shared hosting. Alternatively, you might have cloud hosting, where you run on a virtual machine that isn’t tied to a specific piece of hardware. Cloud hosting is the safest from disasters, since it’s not dependent on any one machine. If you have a good cloud host that has multiple datacenters, disaster recovery for the server is built in.
As long as your server uses a good datacenter, it’s much safer from physical disasters than the average office. Make sure it does frequent backups to a different location from the server.
Even if the server is remote, your business probably has in-office computers that are essential to its operation. You might have desktop machines, and your phone system may depend on the data network. They also have to be part of the recovery plan.
If a desktop holds important information which isn’t stored elsewhere, it has to be backed up. Restoring its files quickly is less urgent than server recovery, so an automated cloud backup will protect them nicely. It’s important, though, not to let a desktop machine be the only place that holds information which is essential to recovery.
Many reliable services are available for offsite file backup. That’s a good start, but disaster recovery requires more. It’s necessary to get the server back to a state with the same operating system, software, and database. Rebuilding a system from scratch takes too long, and the chances of missing something or having compatibility issues are high.
The best way to ensure restoration of the whole system is to save a system image. This is a copy of the whole drive, including the operating system, software, user accounts, data files, and everything else that’s relevant. It can be restored to any compatible hardware. A system image can be many gigabytes in size, so you need enough offsite storage to hold it. Restoring it could take a significant chunk of time.
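A rough calculation shows why restoring an image can eat into your RTO. The sketch below estimates transfer time from image size and link speed. The function name, the 200 GB image, the 100 Mbit/s link, and the 80 percent efficiency factor are all assumptions for illustration, not measurements.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move size_gb over a link of link_mbps.

    efficiency is an assumed fraction of the nominal bandwidth that the
    transfer actually achieves. Decimal units: 1 GB = 8000 megabits.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical case: a 200 GB image over a 100 Mbit/s connection.
print(f"{transfer_hours(200, 100):.1f} hours")
```

Under these assumptions the download alone takes more than five hours, which would already miss a four-hour RTO. Keeping the image at the same provider that will host the replacement server is one way to shorten this step.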
Since a system image is huge, you can’t keep it updated to the minute. It just needs to be recent enough to hold the current software and configuration. It’s supplemented by incremental file backup, which regularly copies files that have changed to offsite storage. Most backup software is smart enough to copy only the parts of a file that have changed, so it can keep large databases backed up without falling behind. The time between updates needs to be no longer than your RPO.
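The idea behind incremental backup can be sketched in a few lines of Python. This is a simplified illustration, not how any particular backup product works: it compares whole files by hash rather than copying only the changed parts, and it uses a local folder to stand in for offsite storage. The function names and the `seen` dictionary are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hash of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_backup(source: Path, dest: Path, seen: dict[str, str]) -> list[str]:
    """Copy files under source whose content changed since the previous run.

    seen maps relative paths to the digest recorded last time and is
    updated in place. Returns the relative paths that were copied.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source).as_posix()
        digest = file_digest(path)
        if seen.get(rel) != digest:
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copies content and timestamps
            seen[rel] = digest
            copied.append(rel)
    return copied
```

The first run copies everything; later runs copy only what changed. A real tool would also persist the `seen` state between runs and handle deleted files, which this sketch leaves out.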
Once you’ve set up a backup plan, it will run automatically. However, you need to make periodic spot checks to be sure it’s still working. Pay attention to notices from your backup provider. If it discontinues your service, tells you you’ve reached your storage limit, or reports that it can’t run a backup, you need to fix the problem immediately.
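A spot check can be partly automated. The sketch below assumes your backups land in a folder you can inspect and treats the newest file’s modification time as the time of the last backup. Both are assumptions, and so are the function names. With a hosted backup service you would rely on its own reports instead.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_seconds(backup_dir: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of the most recently modified file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in backup_dir.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def backup_is_fresh(backup_dir: Path, rpo_seconds: float, now: Optional[float] = None) -> bool:
    """True when the newest backup file is no older than the RPO."""
    age = newest_backup_age_seconds(backup_dir, now)
    return age is not None and age <= rpo_seconds
```

An empty folder counts as stale, so a backup job that silently stopped writing files shows up as a failure rather than passing unnoticed.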
With desktop machines, file backup may be sufficient. Getting key documents back matters, but it probably doesn’t matter if the system configuration is exactly the same as before.
Having a complete server backup is useful only if you can get it running on some hardware within your RTO. If your existing computer has suffered data loss due to software issues (e.g., ransomware), you may be able to wipe the disk clean and restore to the same hardware. Make sure you’ve eliminated the cause and it isn’t still lurking anywhere on your network, or you could get an immediate recurrence.
The first step is to restore the system image and make sure it works. Then you can restore your file backup on top of that. If all goes well, you’ll be back to a running system.
If server hardware is broken or missing because of the disaster, that’s a bigger problem. As mentioned before, the server really shouldn’t be on your premises, but if it is, you have to get compatible replacement equipment before doing the restoration. If you own the equipment at the datacenter, the situation may be much the same.
If you lease hardware at a datacenter, moving your lease to another machine shouldn’t be hard, unless the whole center has failed. If it has, you’ll need to find another location to set up operations, but that should be a rare occurrence.
If you use shared or cloud hosting, the situation is easier. Your hosting service should be able to set up another host quickly and restore the system image to it.
If any amount of downtime is too much, there’s an option that allows very fast recovery. It’s called cloud failover. With this approach, you have a cloud server in reserve that can have the system image and latest files loaded onto it. You can be running again within seconds, though perhaps with less computing power than usual.
The tricky part is making it look like the same server on the internet. Your public server, wherever it is, identifies itself to the internet by its IP address, which in the common IPv4 form is a sequence of four numbers such as 126.96.36.199. A server can’t just declare its own IP address to the internet but has to get it from its upstream provider.
The provider is what actually handles the failover. Both your regular and failover servers need to connect to the net through it, and it needs to be set up so it will switch your IP address to the failover when the normal server stops responding. It’s an extra complication and cost, but it almost completely eliminates downtime. It could be the right choice for a small business that needs to provide continuous service.
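The switching itself is the provider’s job, but the decision rule behind it is easy to illustrate. The Python sketch below triggers a failover only after several health checks in a row have failed, so that one dropped check doesn’t cause a needless switch. The function name and the threshold of three are assumptions; a real provider may use a different rule.

```python
from typing import Iterable


def should_fail_over(check_results: Iterable[bool], threshold: int = 3) -> bool:
    """Return True once `threshold` consecutive health checks have failed.

    check_results yields True for a healthy response and False for a failure.
    """
    consecutive_failures = 0
    for ok in check_results:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return True
    return False


# One blip followed by recovery does not trigger a switch...
print(should_fail_over([True, False, True, True]))
# ...but a sustained outage does.
print(should_fail_over([True, False, False, False]))
```

A lower threshold reacts faster but raises the risk of switching on a momentary network hiccup.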
A plan that hasn’t been tested doesn’t inspire much confidence. It’s hard to do a full end-to-end test of a recovery plan, but whatever testing is feasible helps. An annual test could reveal serious omissions that need fixing.
You can run a full restoration to an alternate system without overwriting your main server or bringing the test server publicly online. Leasing a server or renting a virtual server in the cloud for a day is cheap.
Test the restored system to make sure users can log in and that all the functionality is there. Keep a log of who did what, what issues arose, and how long it took to bring up a working server. If the attempt failed, figure out what the plan left out and make corrections. When finished, shut the test system down so it can’t leave any security holes open.
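The log doesn’t need to be elaborate. As one possible shape for it, here is a small Python sketch that records who did what, how long each step took, and any issue that came up, then compares the total against the RTO. The class names, field names, and sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class DrillStep:
    who: str
    action: str
    duration: timedelta
    issue: str = ""  # left empty when the step went smoothly


@dataclass
class DrillLog:
    rto: timedelta
    steps: list[DrillStep] = field(default_factory=list)

    def record(self, who: str, action: str, duration: timedelta, issue: str = "") -> None:
        self.steps.append(DrillStep(who, action, duration, issue))

    def total_time(self) -> timedelta:
        return sum((s.duration for s in self.steps), timedelta())

    def issues(self) -> list[str]:
        return [f"{s.action}: {s.issue}" for s in self.steps if s.issue]

    def within_rto(self) -> bool:
        return self.total_time() <= self.rto


# Hypothetical drill entries.
log = DrillLog(rto=timedelta(hours=4))
log.record("Alex", "restore system image", timedelta(hours=2))
log.record("Sam", "restore file backup", timedelta(minutes=45),
           issue="database password missing from written procedure")
print(log.total_time(), log.within_rto(), log.issues())
```

The list of issues becomes the to-do list for correcting the plan before the next test.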
So far this discussion has been all about computers, system images, and backups. However, there’s more to disaster recovery than restoring the contents of a disk drive. The human element is crucial.
The first thing is to know that a disaster has happened. Someone should be responsible for monitoring the system, so that downtime doesn’t go unnoticed for hours. When the system is down, the next step is to notify the appropriate people, including the boss and the people who maintain the server.
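Monitoring can start with something very simple. The Python sketch below tries to open a TCP connection to the server and calls a notification function if that fails. The function names are hypothetical, and `notify` stands in for whatever channel you use (email, text message, chat), which is not shown here. A successful connection only proves the port is open, not that the application behind it works.

```python
import socket
from typing import Callable


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_and_alert(host: str, port: int, notify: Callable[[str], None],
                    timeout: float = 3.0) -> bool:
    """Probe the server once and call notify with a message if it is down."""
    up = is_reachable(host, port, timeout)
    if not up:
        notify(f"{host}:{port} is not responding")
    return up
```

Run a check like this on a schedule from a machine outside your own network, since a monitor that shares the server’s fate can’t report the outage.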
If the downtime will be more than a few minutes, you need to let affected parties know what’s happening. If all they know is that it isn’t responding, they have no idea what to expect. The recovery plan should include a strategy for notifying people.
Access to the system is ultimately in people’s hands. Natural disasters often put key people out of reach for a long time. If the only person with essential information becomes unavailable, whether temporarily or permanently, recovery may be difficult or impossible.
- System passwords need to be secure, but they shouldn’t be in one person’s exclusive possession. There needs to be a contingency plan, such as keeping them in a locked safe.
- If the one person who understands the system quits or becomes incapacitated, that can be a disaster all by itself. Procedures for running and restoring the system should be written down so that someone else can pick up the job.
- Storing the information only on the server invites trouble if the server goes down. Paper copies in a locked, fireproof box are safer. Storing them on a secure, independent site is another approach. Make sure that more than one person has the key or password.
- All employees should have copies of the information about their responsibilities in an emergency, and management should have their contact information.
It isn’t necessary to rely just on employees for recovery planning. Hiring a managed services company can be a cost-effective approach. A business with just a few employees can’t normally keep an IT manager, but a managed service provider can help it to deal not just with disasters but with the more routine complications of running a server.
A service company with a good record can help to set up a disaster recovery plan and provide emergency support. The service level agreement (SLA) needs to spell out the expected services and say how quickly the company will respond. An SLA that covers only normal business hours is less expensive; whether it’s sufficient depends on your business.
Everyone with IT operations is vulnerable to disasters, and small businesses have an especially poor chance of surviving one if they aren’t prepared. Every business needs to develop a disaster recovery plan that fits its level of risk. Several approaches can improve the odds without costing a fortune. Offsite backups and the ability to reconstruct failed servers will let a business keep running, even when things go badly wrong. Even a partial plan is better than none.
- The U.S. government’s Ready site offers recommendations for an IT disaster recovery plan.
- Small Business Computing discusses small companies’ needs for a recovery plan and suggests some affordable solutions.
- Network World discusses recovery options for smaller companies, including cloud services and virtualization, as well as the value of social media in informing affected people.