16 database recovery strategy mistakes

High Availability and Disaster Recovery statistics

Statistics from the “2018 State of Resilience” report by Syncsort, a leader in big data software solutions, highlight the increasing pressure IT leaders feel to ensure security, high availability and disaster recovery. Having surveyed 5,632 IT professionals globally on data protection strategies, the report shows that a staggering number of companies are behind the curve on adequate HA and DR strategies.

  • Business continuity/high availability (47%) is IT’s top concern, while security (49%) is the top initiative companies will pursue in the next 24 months.
  • Nearly two-thirds of companies perform security audits on their systems, but the most common schedule is annual (39%).
  • IT leaders are entrusting critical applications to the cloud, but with concerns: 43% identify cloud as their top security challenge for the coming year, followed by sophistication of attacks (37%) and ransomware (35%).
  • Availability is the top measure of IT performance. IT is not only measured on functional IT disciplines, but also on impact to customers and business users. Availability/uptime led at 67%, followed by application performance (49%) and customer satisfaction (47%).
  • Nearly half of businesses experienced a failure requiring a high availability/disaster recovery solution to resume operations. 35% lost a few minutes to an hour of data, 28% lost a few hours and 31% lost a day or more. Only half of businesses are meeting their recovery time objective (RTO) and, despite known risks, 85% of respondents had no recovery plan or were less than 100% confident in their plan.
  • Migrations are a weak point in business resilience. While many companies are undergoing migrations to upgrade outdated technology (68%), improve performance (50%) and consolidate servers (42%), 42% have experienced a migration failure and 68% reported their systems were down between one and 48 hours during their last migration.
  • 53% of companies have multiple databases and share data to improve business intelligence, largely through scripting (42%), followed by backup/restore/snapshot processes and FTP/SCP/file transfer at 38% each.

“IT leaders are under immense pressure to provide an enterprise infrastructure that can sustain severe threats and secure vital information while enabling data accessibility and business intelligence,” said Terry Plath, Vice President, Global Services, Syncsort. “Business resilience requires the right mix of planning and technology, and this survey did a thorough job of uncovering how businesses are tackling this increasingly complex and multi-faceted challenge.”


Backup mistakes to avoid

There are many mistakes and oversights made in creating backups; here are 16 of the most common we have encountered:

1. Don’t back up a virtual machine through the guest OS

OS-level backups work, but they consume the VM’s resources instead of leveraging the virtualization layer. Backups run at the virtualization layer create block-level image copies rather than OS-level file copies.

2. VM snapshots are not backups

VM snapshots preserve the state of a VM from the point in time when the snapshot was taken. Additionally, multiple snapshots can be created to provide more than one restore point to choose from. While this can be useful in certain situations, it should never be used as a primary method for backing up VMs.

One problem with VM snapshots is that, once you revert to a previous snapshot, you can’t go back to the present. The current state of your VM is lost, and you can only revert to earlier snapshots. Snapshots are also not useful for restoring individual files because they only bring a whole VM image back to a previous state. Snapshots are useful as a secondary method for short-term or ad hoc protection when you may need to permanently revert to a previous state, such as when applying patches or upgrading applications.

3. Make sure you are quiescing properly

Most virtualization backup applications back up at the image level and are not aware of what is going on inside the guest OS. Before you start backing up VMs, you need to ensure they are quiesced so they are in a consistent state to be backed up. If you don’t quiesce them, you risk having data that is not in a state to be restored properly. The quiesce operation is handled inside the guest OS, and for Windows VMs, Volume Shadow Copy Service (VSS) handles this.
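
For example, when scripting snapshots against vSphere, the quiesce flag can be set explicitly. A minimal sketch using the pyVmomi SDK, with a placeholder vCenter host, credentials and VM name:

```python
# Minimal sketch: requesting a quiesced snapshot via pyVmomi (vSphere SDK).
# The host, credentials and VM name below are placeholders, not real values.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; verify certificates in production
si = SmartConnect(host="vcenter.example.com", user="backup-svc",
                  pwd="********", sslContext=ctx)

# Find the VM by name (simplified lookup for illustration).
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "app-db-01")

# quiesce=True asks VMware Tools (VSS on Windows guests) to flush guest I/O
# so the snapshot is application-consistent; memory=False skips RAM state.
task = vm.CreateSnapshot_Task(name="pre-backup",
                              description="quiesced for image-level backup",
                              memory=False, quiesce=True)
Disconnect(si)
```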

4. Deleting the previous backup copy before a new backup copy is created

This mistake is most common among newcomers who do not realize that the main purpose of a database backup is not just to create a copy of the database, but to keep the downtime of the information system (of which the database is a critical part) as short as possible.

As a result, the system remains unprotected from the moment the latest backup copy is deleted to the moment the new one is created, because the database has no backup copy at all during this period.
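
A safer rotation creates and sanity-checks the new copy before touching the old ones. A minimal sketch, assuming a PostgreSQL database and hypothetical backup paths:

```python
# Minimal sketch of a safe backup rotation: the previous copies are only
# removed after the new one exists and passes a basic sanity check.
# The paths, database name and retention count are placeholder assumptions.
import datetime
import pathlib
import subprocess

backup_dir = pathlib.Path("/var/backups/appdb")
new_file = backup_dir / f"appdb-{datetime.date.today():%Y%m%d}.dump"

# 1. Create the new backup first.
subprocess.run(["pg_dump", "--format=custom", "--file", str(new_file),
                "appdb"], check=True)

# 2. Sanity-check it (non-empty is the bare minimum; real validation
#    would attempt a test restore).
if new_file.stat().st_size == 0:
    raise RuntimeError("new backup is empty; keeping previous copies")

# 3. Only now delete older copies, keeping the most recent N.
keep = 7
dumps = sorted(backup_dir.glob("appdb-*.dump"), reverse=True)
for old in dumps[keep:]:
    old.unlink()
```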

5. Overwriting an existing database while restoring it from a backup copy

If the backup copy has not been verified and turns out to be corrupted, you will have neither the previous copy of the database nor a valid backup copy.
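
One way to avoid this is to restore into a scratch database first and verify it before touching the live one. A minimal sketch, again assuming PostgreSQL tools and placeholder names:

```python
# Minimal sketch: restore into a throwaway database first, verify it,
# and only then replace the live one. The dump path and database names
# are placeholders.
import subprocess

dump = "/var/backups/appdb/appdb-20240101.dump"

# 1. Restore the backup into a scratch database.
subprocess.run(["createdb", "appdb_verify"], check=True)
subprocess.run(["pg_restore", "--dbname", "appdb_verify", dump], check=True)

# 2. Run whatever verification queries matter for your data here,
#    e.g. row counts on critical tables, before touching the live DB.

# 3. Clean up; the live database was never at risk.
subprocess.run(["dropdb", "appdb_verify"], check=True)
```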

6. Using one-step backup/restore without an intermediate backup file

Standard input/output streams make it possible to stream a backup directly into a database restore, so that no intermediate backup file is created. This is convenient for routine maintenance and for running a test restore, but if a serious disk failure occurs during the backup/restore process, the initial database may become damaged while no new database has yet been created.
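
The contrast is easy to see in code. A sketch of both patterns, with placeholder database names:

```python
# Sketch contrasting the risky one-step stream with the safer two-step
# approach. The database names and paths are placeholders.
import subprocess

# Risky: pg_dump piped straight into psql. If the source disk fails
# mid-stream, there is no backup file and possibly no usable target:
#   pg_dump appdb | psql appdb_copy

# Safer: write the backup file first, then restore from it.
subprocess.run(["pg_dump", "--format=custom",
                "--file", "/var/backups/appdb.dump", "appdb"], check=True)
subprocess.run(["pg_restore", "--dbname", "appdb_copy",
                "/var/backups/appdb.dump"], check=True)
```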

7. Storing backup copies and the database on the same physical device

With so many levels of virtualization and abstraction in modern systems, it is quite possible for the database and its backups to end up on the same underlying storage system, so physical separation should be verified explicitly.

8. Not monitoring the successful completion of the backup process

It is important to know that your backups have completed successfully through some kind of monitoring or notification system. At the very least, include a manual check in your daily checklist, though it is easy to set up email notifications or other automated monitors. Without this, how would you know whether your backup has succeeded?
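
A simple automated notification can be wired around the backup command itself. A minimal sketch, with a placeholder SMTP host and addresses:

```python
# Minimal sketch: email a success/failure notification after a backup job.
# The SMTP host, addresses and backup command are placeholder assumptions.
import smtplib
import subprocess
from email.message import EmailMessage

result = subprocess.run(["pg_dump", "--format=custom",
                         "--file", "/var/backups/appdb.dump", "appdb"],
                        capture_output=True, text=True)

msg = EmailMessage()
msg["From"] = "backups@example.com"
msg["To"] = "dba-team@example.com"
msg["Subject"] = "Backup OK" if result.returncode == 0 else "Backup FAILED"
msg.set_content(result.stderr or "completed without warnings")

with smtplib.SMTP("mail.example.com") as smtp:
    smtp.send_message(msg)
```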

9. Not validating backups

A backup successfully creating a file does not mean that the backup is valid and can be used to restore from. Validation can be automated and will ensure that the backups are not corrupt and can be read.
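
Validation can be as light as checking that the backup file is readable by the restore tool. A minimal sketch for a custom-format PostgreSQL dump, with a placeholder path (a full test restore is stronger still):

```python
# Minimal sketch: verify a custom-format PostgreSQL dump is readable by
# listing its table of contents. The dump path is a placeholder.
import subprocess

dump = "/var/backups/appdb.dump"
check = subprocess.run(["pg_restore", "--list", dump],
                       capture_output=True, text=True)
if check.returncode != 0:
    raise RuntimeError(f"backup {dump} failed validation:\n{check.stderr}")
print(f"{dump}: TOC readable, {len(check.stdout.splitlines())} entries")
```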

10. Not running database health checks

Most database management systems have some kind of health check facility which may detect issues that are not captured by the backup method. There is little point in backing up a corrupt database.
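
For instance, a consistency check can gate the backup job. A minimal sketch using MySQL’s mysqlcheck utility (credentials assumed to come from the usual option files):

```python
# Minimal sketch: run a consistency check before backing up, here with
# MySQL's mysqlcheck. Credentials are assumed to come from option files.
import subprocess

health = subprocess.run(["mysqlcheck", "--all-databases", "--check"],
                        capture_output=True, text=True)
if health.returncode != 0 or "error" in health.stdout.lower():
    raise RuntimeError("health check failed; skipping backup:\n"
                       + health.stdout)
# ...proceed with the backup only after a clean check...
```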

11. Not monitoring free space for backups

It is quite easy to run out of space if you do not plan enough overhead. Backup sizes can change from day to day depending on the amount of activity and changes to the data. As you will be ensuring that you have a successful new backup before deleting the old ones, there must always be enough overhead for your largest expected backup size.
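
A pre-flight free-space check is cheap insurance. A minimal sketch that sizes the required overhead from recent backup history, with placeholder paths and margin:

```python
# Minimal sketch: refuse to start a backup unless free space comfortably
# exceeds the largest recent backup. The path and 50% margin are
# placeholder assumptions.
import pathlib
import shutil

backup_dir = pathlib.Path("/var/backups/appdb")
free = shutil.disk_usage(backup_dir).free

# Size the overhead from history: largest recent dump plus a 50% margin.
recent = [f.stat().st_size for f in backup_dir.glob("appdb-*.dump")]
needed = int(max(recent, default=0) * 1.5)

if free < needed:
    raise RuntimeError(f"only {free} bytes free, need {needed}; "
                       "not starting backup")
```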

12. Not monitoring the time it takes to create a backup

The length of time a backup takes to complete is a great indicator of the health of your system and the effects of scheduled tasks. The best backup window may change due to operational reasons or other related events, and a backup that takes a very long time can undermine your backup and recovery strategy. For example, a daily full backup that ran in 3 hours a few years ago but now takes 25 hours to complete because of data growth clearly requires a rethink!
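
Capturing the duration is a one-line addition to any backup script. A minimal sketch with a placeholder command and threshold:

```python
# Minimal sketch: time the backup and flag runs that exceed a threshold.
# The backup command and 4-hour limit are placeholder assumptions.
import subprocess
import time

limit_seconds = 4 * 3600
start = time.monotonic()
subprocess.run(["pg_dump", "--format=custom",
                "--file", "/var/backups/appdb.dump", "appdb"], check=True)
elapsed = time.monotonic() - start

print(f"backup took {elapsed / 60:.1f} minutes")
if elapsed > limit_seconds:
    # Hook in your alerting here; a creeping duration is an early
    # warning that the backup strategy needs a rethink.
    print("WARNING: backup exceeded its expected window")
```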

13. Backing up the database while operating system updates are being applied

This is a problem more often seen on Microsoft Windows servers, but other operating systems are affected too. OS updates may affect how long your backups take if run concurrently, but worse still is when the OS update reboots the server during your backup, making the backup invalid. Especially in large IT environments with different teams, make sure that updates and backups are coordinated.

14. Backing up the database with file or virtual machine backup tools while the database server is running

Databases are complex systems with many moving parts, not a simple file system. If you plan on using external backup tools, ensure that the database is in a state compatible with that style of backup. This may require the database to be put into backup mode, or even shut down, for the duration of the backup.
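
With PostgreSQL, for example, the server can be put into backup mode around the external copy. A minimal sketch using psycopg2; pg_backup_start/pg_backup_stop are the PostgreSQL 15+ function names (older versions use pg_start_backup/pg_stop_backup), and the connection details are placeholders:

```python
# Minimal sketch: put PostgreSQL into backup mode around a file-level
# or VM-level copy. pg_backup_start/pg_backup_stop are PostgreSQL 15+
# names; the connection string is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=backup_svc")
conn.autocommit = True
cur = conn.cursor()

cur.execute("SELECT pg_backup_start('file-level-copy');")
try:
    # ...run the external file/VM backup tool here...
    pass
finally:
    # Always end backup mode, even if the external tool fails.
    cur.execute("SELECT pg_backup_stop();")
    conn.close()
```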

15. Replacing backup with replication

Replicated databases are great for improving availability and reducing or preventing data loss, but they are quite different from a backup. If a table is dropped in the primary database, it is also dropped in the replica. Likewise, corruption on the primary may have unexpected effects on the replicated data. A good high availability and disaster recovery (HADR) model will provide multiple database instances and replicated data, but it still needs frequent backups.

16. Not encrypting backups

After making sure that your database is secure and protected, don’t undo that work by leaving backup files unprotected, especially when they are stored in the cloud or offsite.
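
Encryption can be bolted onto the end of any backup pipeline. A minimal sketch using the Python cryptography package; key handling here is deliberately simplistic, and in practice the key belongs in a secrets manager, never alongside the backups:

```python
# Minimal sketch: encrypt a finished backup file with a symmetric key
# using the 'cryptography' package. The paths are placeholders, and the
# key must be stored securely elsewhere (never next to the backups).
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store this in a secrets manager
fernet = Fernet(key)

# Note: this reads the whole file into memory, which is fine for a
# sketch but large dumps would be encrypted in chunks or via a stream.
with open("/var/backups/appdb.dump", "rb") as f:
    ciphertext = fernet.encrypt(f.read())

with open("/var/backups/appdb.dump.enc", "wb") as f:
    f.write(ciphertext)
```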
