IT Disaster Recovery Plan Template for Small Business
On a Friday evening, a ransomware attack encrypts every file on a 60-person manufacturing company's network. The file server, the ERP system, email, the CRM, and 14 years of engineering drawings are all locked behind a ransom demand of 15 Bitcoin. The owner calls the IT manager, who has been there for three years but has never written down what to do in a situation like this. The backup system exists, but nobody has tested a restore in over a year. The IT manager discovers that the backup drive was connected to the network and was encrypted along with everything else. Monday morning, 60 employees have nothing to work with. The company pays the ransom. The decryptor works on some files but corrupts others. Full recovery takes 23 days. The total cost - including ransom, lost revenue, overtime labor, emergency consulting, and customer penalties for late deliveries - reaches $1.4 million.
A disaster recovery plan would not have prevented the attack. But it would have reduced recovery time from 23 days to 48 hours, eliminated the ransom payment, and cut the total cost by over 90%. The plan does not need to be complex. It needs to exist, it needs to be documented, and it needs to be tested.
This article provides a complete disaster recovery plan framework that any small business IT team can adapt. It covers every section your plan needs, explains why each section matters, and gives you a practical template to fill in with your own systems and procedures.
Section 1: Business Impact Analysis
Before you can plan recovery, you need to understand what you are recovering and how urgently each system needs to be restored. A Business Impact Analysis (BIA) answers two critical questions for every system in your organization.
Recovery Time Objective (RTO)
RTO defines the maximum amount of time a system can be down before the impact becomes unacceptable. For each system, ask: "If this system goes offline right now, how long before the business suffers serious harm?" The answer varies dramatically by system.
- Tier 1 - Critical (RTO: 1-4 hours). Systems whose outage immediately stops revenue generation or creates safety or legal risk. Examples: e-commerce platform, payment processing, production control systems, patient records in healthcare, email for customer-facing teams.
- Tier 2 - Important (RTO: 4-24 hours). Systems that significantly degrade operations when unavailable but do not immediately stop the business. Examples: CRM, internal file shares, project management tools, phone systems, internal communication platforms.
- Tier 3 - Standard (RTO: 1-7 days). Systems that are used daily but whose absence can be worked around temporarily. Examples: HR systems, internal wikis, development environments, non-critical internal tools.
- Tier 4 - Deferrable (RTO: 7+ days). Systems that can wait until everything else is restored. Examples: archival systems, reporting dashboards, training platforms.
Recovery Point Objective (RPO)
RPO defines the maximum amount of data loss that is acceptable, measured in time. If your RPO for the accounting system is 4 hours, that means you can afford to lose up to 4 hours of accounting data. Everything entered in the last 4 hours before the disaster would need to be re-entered manually. RPO directly determines your backup frequency.
- Near-zero RPO (continuous). Transaction systems, databases where every record matters - use synchronous replication or continuous data protection. Examples: payment systems, order databases.
- 1-4 hour RPO. Systems where losing a few hours of work is tolerable but a full day is not. Use backup snapshots every 1-4 hours. Examples: CRM data, shared file systems with active work.
- 24-hour RPO. Systems where daily backups are sufficient. Losing a day of data is inconvenient but recoverable. Examples: email (if using a cloud provider with built-in retention), internal documents, HR records.
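The RPO-to-backup-frequency relationship above reduces to a one-line check: a schedule meets an RPO only if the newest backup is never older than the RPO window. A minimal sketch, where the timestamps and the 4-hour accounting RPO are illustrative:

```python
from datetime import datetime, timedelta

def meets_rpo(last_backup: datetime, now: datetime, rpo_hours: float) -> bool:
    """True if the newest backup is recent enough to satisfy the RPO,
    i.e. the data you would lose fits inside the RPO window."""
    return now - last_backup <= timedelta(hours=rpo_hours)

# Hypothetical example: accounting system with a 4-hour RPO.
last = datetime(2026, 3, 6, 9, 0)
assert meets_rpo(last, datetime(2026, 3, 6, 12, 30), rpo_hours=4)      # 3.5 h old: OK
assert not meets_rpo(last, datetime(2026, 3, 6, 14, 0), rpo_hours=4)   # 5 h old: RPO violated
```

The same check, run against your backup software's job history, makes a useful monitoring alert: page the on-call person whenever any Tier 1 system's newest backup exceeds its RPO.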
Section 2: Backup Strategy - The 3-2-1-1 Rule
The traditional 3-2-1 backup rule says: maintain 3 copies of your data, on 2 different types of media, with 1 copy offsite. In 2026, with ransomware specifically targeting backup systems, the rule has evolved to 3-2-1-1: add 1 immutable copy that cannot be modified or deleted, even by an administrator.
Implementing 3-2-1-1 for SMBs
- Copy 1: Production data. This is your live data in your applications, databases, and file systems. It is not a backup - it is what you are protecting.
- Copy 2: Local backup. A backup stored on local infrastructure (NAS, dedicated backup server, or local disk array) that enables fast restoration. Local backups should be on a separate network segment from production systems with dedicated credentials that are not shared with any production accounts. Ransomware that compromises an admin account should not be able to reach the backup system.
- Copy 3: Offsite backup. A backup stored in a geographically separate location. Cloud storage (AWS S3, Azure Blob, Google Cloud Storage, or Backblaze B2) is the most practical option for SMBs. For organizations in a single region, the offsite backup should be in a different geographic region to protect against natural disasters.
- Copy 4: Immutable backup. A backup that cannot be modified or deleted for a defined retention period, even by someone with administrative credentials. AWS S3 Object Lock, Azure Immutable Blob Storage, and Backblaze B2 all support immutability. Set a retention period that matches your longest recovery scenario (typically 30-90 days). This is your last line of defense against ransomware that specifically targets and deletes backups.
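The four copies above can be audited mechanically against the 3-2-1-1 rule. This is an illustrative sketch, not a vendor tool; the media labels and the example inventory are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    media: str        # e.g. "disk", "nas", "object-storage", "tape"
    offsite: bool
    immutable: bool

def satisfies_3_2_1_1(copies: list[BackupCopy]) -> bool:
    """3 copies total (production counts as one), on 2 media types,
    at least 1 offsite, at least 1 immutable."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
        and any(c.immutable for c in copies)
    )

# Hypothetical SMB inventory.
inventory = [
    BackupCopy("disk", offsite=False, immutable=False),          # production data
    BackupCopy("nas", offsite=False, immutable=False),           # local backup
    BackupCopy("object-storage", offsite=True, immutable=True),  # cloud copy with Object Lock
]
print(satisfies_3_2_1_1(inventory))  # -> True
```

Running a check like this per system during the annual plan review catches the common failure mode where a new application was deployed with only a local backup.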
What to Back Up
Not everything needs the same backup treatment. Align your backup strategy with your BIA tiers.
- Databases and transaction systems. Continuous or hourly backups with point-in-time recovery capability. Use native database backup tools (pg_dump for PostgreSQL, mysqldump or Percona XtraBackup for MySQL, SQL Server backup to Azure Blob). Test restoration monthly.
- File servers and shared drives. Daily incremental backups with weekly full backups. Retention of 30 days for daily backups and 12 months for monthly snapshots. Use versioning so that individual files can be restored from any point in the retention window.
- Email and cloud applications. If you use Google Workspace or Microsoft 365, your email provider maintains redundant copies. However, these are not true backups - a deleted mailbox can typically be recovered only within a limited window, roughly 25-30 days depending on the provider and settings. Use a third-party backup solution (Backupify, Spanning, or Veeam Backup for Microsoft 365) for long-term email retention.
- System configurations. Back up firewall rules, switch configurations, server build documents, application configurations, and DNS records. These are often overlooked until a disaster occurs and the IT team realizes they cannot rebuild the environment because they do not know how it was configured. Store configurations as code in a version-controlled repository.
- SaaS application data. Data in SaaS platforms (CRM records, project management data, accounting entries) lives on the vendor's infrastructure. Most SaaS vendors do not provide backup or recovery services beyond their standard retention policies. For business-critical SaaS applications, use the vendor's API to export data regularly or use a SaaS backup tool.
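The file-server retention policy described above (30 days of dailies, 12 months of monthly snapshots) can be sketched as a pruning rule over snapshot dates. This is a simplified illustration of the logic - real backup tools implement their own retention engines - and the cutoffs are the article's example values:

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshots: list[date], today: date) -> set[date]:
    """Keep every daily snapshot from the last 30 days, plus the first
    snapshot of each month from the last 12 months."""
    keep = {s for s in snapshots if today - s <= timedelta(days=30)}
    monthly_cutoff = today - timedelta(days=365)
    firsts: dict[tuple[int, int], date] = {}
    for s in sorted(snapshots):
        if s >= monthly_cutoff:
            firsts.setdefault((s.year, s.month), s)  # earliest snapshot that month
    keep.update(firsts.values())
    return keep

# Hypothetical example: mid-May daily falls outside both windows.
kept = snapshots_to_keep(
    [date(2026, 6, 25), date(2026, 5, 1), date(2026, 5, 15), date(2026, 4, 1)],
    today=date(2026, 6, 30),
)
print(sorted(kept))  # 2026-04-01, 2026-05-01, 2026-06-25
```

Whatever tool you use, the point is that retention should be a written rule you can verify, not whatever the backup appliance happened to default to.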
Section 3: Communication Plan
During a disaster, communication failures cause as much damage as the technical failure itself. Employees do not know what is happening. Customers cannot reach anyone. Vendors are sending invoices to a dead email system. A communication plan ensures that the right people know what is happening, what to do, and when to expect resolution.
Internal Communication
- Emergency contact list. Maintain a printed and digital (stored outside your primary systems) list of every person who needs to be contacted during a disaster. Include personal phone numbers, personal email addresses, and messaging app contacts. This list should include: IT team, executive leadership, department heads, key vendors, and your cyber insurance carrier's incident hotline.
- Communication chain. Define who calls whom. The IT lead notifies the CEO and department heads. Department heads notify their teams. This prevents the IT team from spending the first hour of a disaster fielding phone calls instead of working on recovery.
- Out-of-band communication channel. If your email and Slack are down, how does the team communicate? Designate a backup: a WhatsApp group, a Signal group, or a pre-configured Teams channel on a separate tenant. Set this up before you need it and verify that all key personnel have access.
- Status update cadence. During active recovery, provide status updates every 2 hours to internal stakeholders. Updates should include: what is currently down, what has been restored, estimated time to next milestone, and what employees should do in the meantime (work from home, use personal email, take PTO).
External Communication
- Customer notification templates. Pre-write email templates for common scenarios: service outage, data breach requiring notification, extended maintenance. The middle of a disaster is not the time to compose customer communications from scratch. Templates should be reviewed by legal and stored in an accessible location outside your primary email system.
- Vendor notification. Notify critical vendors within 4 hours if a disaster affects your ability to receive deliveries, process invoices, or fulfill contractual obligations. This prevents cascading failures in your supply chain.
- Regulatory notification. If the disaster involves a data breach, regulatory notification requirements begin immediately. HIPAA requires notification within 60 days. GDPR requires notification to the supervisory authority within 72 hours. Your DR plan should include the specific notification requirements for your industry and a pre-identified contact at your legal counsel who handles breach notifications.
Section 4: Ransomware-Specific Response
Ransomware is now the most likely disaster scenario for small businesses. Your DR plan needs a specific section addressing ransomware response because the correct actions differ significantly from other disaster types.
- Do not pay the ransom as a first response. Contact your cyber insurance carrier and legal counsel before making any payment decision. Many insurers have negotiators who can reduce ransom demands by 40-60%. In some jurisdictions, paying ransoms to sanctioned entities is illegal. Insurance may cover the ransom payment if recovery from backups is not feasible, but only if you follow their incident response process.
- Isolate before investigating. Disconnect all network connections immediately - internet, internal network, VPN. Ransomware often has a propagation component that continues encrypting reachable systems while you are investigating. Minutes matter.
- Identify the ransomware strain. Upload a ransom note or encrypted file sample to ID Ransomware (a legitimate identification service). Some ransomware strains have known decryptors available for free. Check the No More Ransom project before considering payment.
- Assess backup integrity before restoration. Verify that your backups are clean and not encrypted or corrupted. Ransomware sometimes lies dormant for weeks before activating, meaning recent backups may contain the ransomware payload. Test restoration in an isolated environment before connecting restored systems to the production network.
- Rebuild, do not decrypt. Even if you obtain a decryptor (through payment or a free tool), rebuilding systems from known-good backups is faster and more reliable. Decryptors do not always work perfectly, and the attacker may have installed additional backdoors that a decryptor will not remove. Use the disaster as an opportunity to rebuild with improved security configurations.
Section 5: Recovery Procedures
Document step-by-step recovery procedures for each Tier 1 and Tier 2 system. These procedures should be detailed enough that someone other than the primary IT person can execute them. The primary IT person might be on vacation, might be the person who caused the disaster, or might be unavailable for any number of reasons.
What Each Procedure Should Include
- System name and description
- BIA tier, RTO, and RPO
- Dependencies (what other systems must be running before this one can be restored)
- Backup location and access credentials (stored securely outside the primary password manager)
- Step-by-step restoration instructions with screenshots or command examples
- Verification steps - how to confirm the system is functioning correctly after restoration
- Responsible person and backup person
- Last tested date and result
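One way to keep procedure records consistent is a simple required-fields check that runs during the quarterly review. The field names below mirror the checklist above; the example values are hypothetical:

```python
# Fields every recovery procedure record must carry (from the checklist above).
REQUIRED_FIELDS = {
    "system", "description", "tier", "rto_hours", "rpo_hours",
    "dependencies", "backup_location", "steps", "verification",
    "owner", "backup_owner", "last_tested", "last_test_result",
}

def missing_fields(procedure: dict) -> set[str]:
    """Return the checklist fields this procedure record is missing."""
    return REQUIRED_FIELDS - procedure.keys()

# Hypothetical record for an ERP system.
procedure = {
    "system": "ERP",
    "description": "Core ERP application and database",
    "tier": 1,
    "rto_hours": 4,
    "rpo_hours": 1,
    "dependencies": ["network", "active-directory", "erp-database"],
    "backup_location": "offsite object storage (credentials in secure store)",
    "steps": [
        "Provision replacement VM from the build document",
        "Restore latest database snapshot",
        "Restore application configuration from the config repository",
    ],
    "verification": ["Log in as a test user", "Open a recent sales order"],
    "owner": "IT lead",
    "backup_owner": "Senior sysadmin",
    "last_tested": "2026-01-15",
    "last_test_result": "pass",
}
print(missing_fields(procedure))  # -> set()
```

Storing procedures as structured records (YAML, JSON, or a wiki template) rather than free-form prose makes gaps like a missing backup owner or a stale last-tested date easy to find automatically.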
Recovery Order Template
Systems should be restored in dependency order, not importance order. A Tier 1 application that depends on a Tier 2 database must wait until the database is restored. Here is a typical recovery sequence.
- Network infrastructure. DNS, DHCP, firewall, VPN. Without network connectivity, nothing else can be restored.
- Identity and authentication. Active Directory, identity provider, MFA. Without authentication, nobody can log in to restored systems.
- Databases. Restore database servers and verify data integrity before bringing application servers online.
- Core business applications. ERP, CRM, email, file services - restore in BIA tier order.
- Communication systems. Phone, internal messaging, video conferencing.
- Secondary systems. Reporting, analytics, internal tools, development environments.
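"Dependency order, not importance order" is exactly a topological sort. Python's standard-library graphlib can compute a valid restore sequence from a dependency map; the systems and dependency edges below are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be running first.
deps = {
    "dns-dhcp-firewall": [],
    "active-directory": ["dns-dhcp-firewall"],
    "erp-database": ["dns-dhcp-firewall", "active-directory"],
    "erp": ["erp-database", "active-directory"],
    "file-services": ["active-directory"],
    "reporting": ["erp-database"],
}

# static_order() yields each system only after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
```

Encoding the dependencies once, during planning, means nobody has to reason them out from memory at 2 a.m.; it also surfaces circular dependencies (graphlib raises CycleError) before they surface themselves mid-recovery.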
Section 6: Testing Your Plan
An untested disaster recovery plan is a hypothesis, not a plan. Testing reveals gaps that documentation cannot: corrupted backups, missing credentials, procedures that assume dependencies that no longer exist, and recovery times that exceed your RTO by hours.
Testing Levels
- Backup verification (monthly). Verify that backups completed successfully and are not corrupted. Restore a random sample of files or database records and verify their integrity. This takes 30 minutes and should be a standing monthly task.
- Tabletop exercise (quarterly). Gather the IT team and key stakeholders. Present a disaster scenario (ransomware attack, cloud provider outage, office fire) and walk through the plan step by step. At each step, ask: do we have the information we need? Do we have access to the systems referenced in the plan? Are the people assigned to each role still in those roles? Document gaps and fix them within two weeks.
- Partial recovery test (semi-annually). Select one Tier 1 system and perform an actual restore from backup to a test environment. Measure the actual recovery time and compare it to your RTO. If actual recovery takes 6 hours and your RTO is 2 hours, you need better backup infrastructure or a revised RTO. This is the test that reveals whether your plan actually works.
- Full recovery test (annually). Simulate a complete disaster and recover all Tier 1 and Tier 2 systems from backup. This is disruptive and requires planning, but it is the only way to verify that your full recovery sequence works end-to-end. Schedule it during a planned maintenance window and communicate it to the organization in advance.
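The monthly backup-verification task above (restore a sample, verify its integrity) is easiest to automate with checksums recorded at backup time. A minimal sketch; the manifest format is an assumption for illustration, not a standard:

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 fingerprint, recorded in a manifest when the backup is taken."""
    return hashlib.sha256(data).hexdigest()

def verify_restored(samples: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the names of restored sample files whose checksum does not
    match the manifest - i.e. files that are corrupted or were tampered with."""
    return [
        name for name, data in samples.items()
        if checksum(data) != manifest.get(name)
    ]

# Hypothetical run: one clean file, one corrupted file.
manifest = {"drawings.dwg": checksum(b"original bytes"),
            "orders.db": checksum(b"db contents")}
restored = {"drawings.dwg": b"original bytes",
            "orders.db": b"corrupted contents"}
print(verify_restored(restored, manifest))  # -> ['orders.db']
```

A checksum mismatch on a restored sample is exactly the signal the monthly task exists to catch: the backup job reported success, but the data it wrote is not the data you will get back.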
Section 7: Plan Maintenance
A disaster recovery plan that was written 18 months ago and never updated is almost as dangerous as no plan at all. Systems change, people leave, vendors are replaced, and the plan becomes fiction.
- Review after every infrastructure change. New server, new SaaS application, new office, new ISP - any infrastructure change should trigger a review of the affected sections of the DR plan.
- Review after every personnel change. When someone named in the plan leaves or changes roles, update the plan immediately. Do not wait for the quarterly review.
- Full review annually. Once per year, review the entire plan from start to finish. Verify all contact information, access credentials, backup locations, and recovery procedures. Update the BIA if business priorities have changed.
- Store the plan accessibly. The plan must be accessible during a disaster. If it lives only on the file server that just got encrypted, it is useless. Maintain copies in at least three locations: printed copies in a locked cabinet at the office, a copy in the IT lead's home, and a copy in cloud storage that is separate from your primary cloud environment (a personal Google Drive or a dedicated DR documentation bucket).
Disaster recovery planning is not exciting work. It does not generate revenue, it does not ship features, and it sits unused until the worst day of your business. But when that day comes - and statistics say it will - the difference between a company that recovers in 48 hours and one that spends three weeks in chaos is a plan that was written, tested, and maintained. Start with your BIA. Get your backups right. Test quarterly. The rest will follow.