Mark (aka Storagezilla) Twomey wrote in his blog last week about copies and backups, and concluded that point-in-time copies are backups.
Hrmm. As I have said before, I am not sure that I agree with this.
I have discussed the issue here with W. Curtis Preston too, and I think there is a disconnect happening somewhere. I am fundamentally uneasy with the idea of calling a copy a backup. It seems to me that a copy is a necessary part of a backup, but not sufficient.
Would a tar ball be a backup? A gzip?
Again, I think the answer is, at best, “sort of”.
And why do I differentiate between a copy and the tar ball?
Because I think there is a difference not only between copies and backups, but also between a backup and a backup and recovery system.
So let's start with fundamentals.
SNIA has this to say regarding the definition of a backup:
backup 1. [Data Recovery] A collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible; also called a backup copy. 2. [Data Recovery] The act of creating a backup. See archive.
(By way of an aside: does this mean a “backup” to flash drives is not a backup?)
SNIA also notes that: “To be useful for recovery, a backup must be made by copying the source data image when it is in a consistent state.”
Which is a pretty important qualifier. Does a copy always do this? Well, no. There are a lot of ways of making a copy that do not ensure consistency of the data set. So if, and only if, the copy is done in such a way as to capture a consistent state of (uncorrupted) data, you may have a backup.
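To make the consistency point a little more concrete, here is a minimal sketch in Python of a copy that at least notices when the source changed underneath it. The function names (`sha256_of`, `copy_if_stable`) are my own, purely for illustration. It assumes a single ordinary file, and it says nothing about application-level consistency (a database with transactions in flight would need quiescing or a snapshot instead).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_if_stable(source: Path, target: Path) -> bool:
    """Copy source to target and report whether the copy looks consistent.

    Hypothetical helper: returns True only if the source hashed the same
    before and after the copy, and the target matches that hash.
    """
    before = sha256_of(source)
    shutil.copy2(source, target)  # copies contents and file metadata
    after = sha256_of(source)
    copied = sha256_of(target)
    return before == after == copied
```

A `False` result here means the copy should not be trusted as a backup, and the copy would need to be retried or taken from a frozen image of the data.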
But that backup may or may not be “good” in the qualitative sense. If it is not generated as part of a repeatable, supportable, manageable, scalable process, then its utility as a process is not very good. What is fine for one system is not so good when it needs to be done for 1000 systems. What is OK when done by one sys admin may not be so good if that sys admin is away and it needs to be done by somebody else. What is acceptable to that sys admin may not be acceptable to the audit committee that wants to see an audit trail, reports, and proof of backup. And so on.
So a tar ball may be a backup, but it is not an acceptable part of a backup and recovery system, in my opinion.
And I think this is where some amount of subjectivity creeps into the conversation. Because we have moved from having a very concrete definition of backup to a more subjective discussion of a backup and recovery system. Both of the last two words necessitate this: recovery and system. Recovery, because we then have to ask some harder questions. Should it be on the same disk array (presumably not, or not always)? Should it have the same permissions and access levels as the source data (again, presumably no)? And what is the likelihood that the data has been captured in such a way as to ensure recovery without corruption?
“System” also implies subjectivity, around the quality and repeatability of the process. It means the data must be retrievable by the right, authorized people in a timely fashion (as specified in an SLA, probably) and we must have some means of verifying that the backup of a given system or data set is complete.
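As one illustration of what “some means of verifying” might look like, here is a small Python sketch that records a checksum manifest of the source data and then checks a backup directory against it. The names `build_manifest` and `verify_backup` are hypothetical, and this is only an assumption about one reasonable approach, not a description of how any particular backup product does it.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest.

    Reads each file fully into memory, which is fine for a sketch
    but not for very large files.
    """
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def verify_backup(manifest: dict[str, str], backup_root: Path) -> list[str]:
    """Return a list of problems; an empty list means the backup is complete."""
    actual = build_manifest(backup_root)
    problems = []
    for name, digest in manifest.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != digest:
            problems.append(f"mismatch: {name}")
    return problems
```

The list of problems (or its absence) is also the kind of thing that could feed the reports and proof of backup an audit committee would ask for.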
In the end, I think we end up with:
- A definition for backup
- A definition for backup and recovery process or system
- A set of mandatory requirements for a backup (included in the definition) and a set of desirable requirements (not included in the definition)
- A set of mandatory requirements for a backup and recovery system (included in the definition) and a set of desirable requirements (not included in the definition)
And it would be nice if we could define a backup and recovery system to everybody’s satisfaction as we have defined backup. Pending further discussion here, that may be the subject of a future post!
Source: The Backup Blog