Backup & Recovery

  • 1.  CBT backup size versus guest-OS modified file size

    Posted May 29, 2015 01:37 PM

    Hi!

    I administer a vSphere 5.5 environment using IBM Tivoli Storage Manager for Virtual Environments (TSM-VE) as our backup solution. TSM-VE uses CBT to make incremental-forever backups of our VMs, meaning that it only backs up the entire VM once; all subsequent backups are incrementals. We back up our VMs once a day.

    I'm currently investigating why certain Windows VMs in our vSphere environment generate huge incremental backups. During my investigation I hit on something I don't understand.

    As a quick-and-dirty test, I did a file scan on a VM, to list all files that had been modified on the VM during the last 24 hours. I then added up the total file size of all those modified files. Then, I compared this combined file size with the size of the incremental TSM-VE backup for that day. I repeated this test on a number of VMs, smaller as well as larger ones.
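    A minimal sketch of the kind of scan described above, assuming it is run inside the guest against a directory tree or drive root. The function name modified_bytes and its parameters are illustrative and not part of TSM-VE or any VMware tool. It only sees files that still exist and whose modification time falls inside the window, and it counts each file's full size.

```python
import os
import time

def modified_bytes(root, window_seconds=24 * 3600, now=None):
    """Return (file_count, total_bytes) for files under root whose
    modification time falls within the last window_seconds."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is not accessible; skip it
            if st.st_mtime >= cutoff:
                count += 1
                total += st.st_size  # whole file size, not bytes actually written
    return count, total
```

    For example, modified_bytes("D:\\") would return the number of recently modified files on that drive and their combined size in bytes.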

    I had expected the combined file size to be much larger than the size of the incremental backup. After all, the fact that a file is modified does not mean that the entire file has changed (meaning that all CBT blocks need to be backed up). I expected a CBT backup to be more efficient, size-wise, than an incremental file-based backup.

    Instead, I found that on all VMs the incremental TSM-VE backup was consistently 1.5 to 2 times the combined size of the modified files, exactly the opposite of what I expected.

    I've tried to think of a few things that could cause this discrepancy.

    1) In-guest disk defrag. This would change the blocks without changing the files, messing up the way CBT works. However, there are no scheduled or unscheduled defrags on our VMs.

    2) The files on the VM are smaller than the CBT blocks. That could cause a small file to mark a larger CBT block as changed. However, as I understand it, CBT blocks are usually quite small (and not the same as VMFS blocks)

    What am I missing here? Is there some other process that changes the VMDK storage blocks of my Windows VMs without changing the actual files? Is my quick-and-dirty file scan too simplistic? I really hope that someone can explain this to me, thanks!



  • 2.  RE: CBT backup size versus guest-OS modified file size

    Posted May 29, 2015 02:38 PM

    I think this kind of scan is too simplistic, yes. If you want accurate numbers, then you should run a binary diff of two complete disk images.
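    A rough sketch of such a block-level diff, assuming two raw disk images (for example, flat copies taken 24 hours apart) are available as ordinary files. The function name and the 64 KiB default block size are illustrative choices; the block size CBT actually uses may differ.

```python
def changed_blocks(path_a, path_b, block_size=64 * 1024):
    """Compare two image files block by block.

    Returns (changed, total) block counts. A block that exists in only
    one image (because the files differ in length) counts as changed.
    """
    changed = 0
    total = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both images exhausted
            total += 1
            if a != b:
                changed += 1
    return changed, total
```

    Multiplying the changed count by the block size gives a figure that can be compared directly with the incremental backup size.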

    A few reasons I can think of from the top of my head why you're seeing something like this:

    - Temporary files that were created and deleted again within the past 24h. Naturally, your "files modified within the last 24h" scan can't find files that have already been deleted

    - Pagefile/swap partition

    - This might depend on the filesystem/OS, but a process can open a file for writing and write changes for a long time, while the modified date is only updated when the file handle is closed or the process ends (updating the modified date on every write IO while the process is still writing would be very inefficient)

    - Filesystem metadata-operations that don't reflect in file properties

    - Filesystem cluster size - This depends on the application, but if for example your filesystem cluster size is 64KB and you change a single bit in a 1KB file, then this might trigger (64*1024)/512 = 128 physical disk sectors to be written. With a smaller cluster size, fewer blocks will be written

    - Transparent filesystem compression or such
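    To illustrate the granularity effect from the cluster-size point, here is a small sketch that models change tracking on fixed-size blocks. The workload (1,000 scattered 1 KB writes) and the 64 KiB block size are hypothetical; the function tracked_bytes is an illustration, not how CBT or NTFS is actually implemented.

```python
def tracked_bytes(writes, block_size):
    """Bytes flagged as changed when tracking works on fixed-size blocks.

    writes: iterable of (offset, length) tuples in bytes.
    """
    dirty = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        dirty.update(range(first, last + 1))  # every block the write touches
    return len(dirty) * block_size

# The cluster example from the post: sectors spanned by one 64 KB cluster.
sectors_per_cluster = (64 * 1024) // 512

# Hypothetical workload: 1,000 writes of 1 KB each, spaced 1 MiB apart.
writes = [(i * 2**20, 1024) for i in range(1000)]
written = sum(length for _, length in writes)
flagged = tracked_bytes(writes, block_size=64 * 1024)
print(sectors_per_cluster, written, flagged, flagged / written)
```

    In this model each small write dirties a whole block, so the flagged total grows with the number of distinct blocks touched rather than with the number of bytes written.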



  • 3.  RE: CBT backup size versus guest-OS modified file size

    Posted Jun 01, 2015 12:20 PM

    Thanks MKguy, you're making some good points.

    - Deleted files - I admit I hadn't thought of deleted files. However, deleted files usually don't actually cause many blocks to change, because the files are only made unavailable to the file system; the actual disk space is usually not overwritten.

    - Pagefile - not relevant here because I've included the swap file in my combined modified file totals.

    - We don't use any kind of filesystem compression

    - Filesystem cluster size - good one but then I would only expect this phenomenon on servers with millions of tiny files, instead of all of them.

    It still seems like a huge discrepancy to me... for example, we have a very busy server with 120GB of used space, running a data warehouse application. Purely based on modified files, this server generates around 23GB of changed data per day. The daily incremental CBT backup however is around 58GB...



  • 4.  RE: CBT backup size versus guest-OS modified file size

    Posted Jun 05, 2015 10:11 AM

    - Deleted files - I admit I hadn't thought of deleted files. However, deleted files usually don't actually cause many blocks to change, because the files are only made unavailable to the file system; the actual disk space is usually not overwritten.

    I'm not referring to the deletion process itself; I mean the data that was written to those files after the last backup and before they were deleted.

    Btw, not sure if it will make any difference but on my Windows Templates I usually disable the "update last file access time" attribute:

    fsutil behavior set disablelastaccess 1

    (the Linux equivalent is mounting with the noatime option)

    To analyze the effectiveness of CBT, you can test the following procedure: (I would test it myself but I don't have access to backup systems)

    1. Create a test VM or clone an existing one where you're experiencing the issue (and disconnect network)

    2. While the VM is powered-off, make a full backup and a subsequent incremental backup

    3. Boot a live Linux image in the VM (make sure the VM really doesn't boot its local OS first)

    4. Mount the VM's filesystem into the live Linux (writable, most modern Distributions can mount NTFS writable just fine)

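    5. Write a deterministic amount of random raw data to the disk, for example do something like this to write 100MiB of raw random data and flush the filesystem buffers (using 1MiB blocks, because a single very large read from /dev/urandom can return fewer bytes than requested and leave the file shorter than intended):

    #  dd if=/dev/urandom of=/mnt/MyVMFilesystem/testfile.bin bs=1M count=100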

    #  sync

    6. Unmount the VM filesystem

    7. Make an incremental backup and check the backup size

    Another general thing to check is whether the VMDK format (thin vs. thick) makes a difference.



  • 5.  RE: CBT backup size versus guest-OS modified file size

    Posted Jun 04, 2015 10:57 AM

    I'm also experiencing this issue but on a much worse scale.

    A large (2TB) volume containing users' home directories is backed up with daily incrementals.

    Routinely, the backup is at least 10x the size of the modified or created files since last backup. This is slowly filling our backup storage.

    For example, over a 24h period there was about 2GB changed or created files. The CBT backup was 23GB.

    The CBT block size is 512k.

    I can only guess what generates this amount of data. Main suspects are:

    • Windows Group Policies pushing out 20 files (favorites and Word templates) to every connected client (300 on average) every 90 minutes.
      This effectively creates a new file and deletes the old one.
      After a full workday this is about 20x300x5 = 30,000 changed blocks. At 512k per block, this alone would be about 15GB, with no "new" files
    • Microsoft Sync Center synchronizing offline files, I can see this generates a lot of created temporary files.
      In my case, about 230 per sync even with no files changed. I don't know how often this runs, but its impact is potentially huge.
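    The Group Policy estimate above can be checked with a few lines of arithmetic. This is a worst-case sketch: it assumes every pushed file lands in its own 512 KiB block and that no two files share a block, which is an upper bound rather than a measurement. The variable names are illustrative.

```python
files_per_push = 20
clients = 300
pushes_per_day = 5           # roughly one push every 90 minutes over a workday
block_size = 512 * 1024      # 512 KiB, the block size stated in the post

blocks = files_per_push * clients * pushes_per_day
worst_case_bytes = blocks * block_size

print(blocks)                              # 30000
print(round(worst_case_bytes / 2**30, 1))  # 14.6 (GiB)
print(round(worst_case_bytes / 10**9, 1))  # 15.7 (decimal GB)
```

    Either unit gives a figure in the region of 15GB, so the estimate is at least consistent with the observed 23GB incremental if the other suspects contribute as well.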

    Currently I'm trying to diagnose by turning off some of these features and watching how it impacts backup sizes.



  • 6.  RE: CBT backup size versus guest-OS modified file size

    Posted Jun 11, 2015 07:47 PM

    I'm not familiar with how TSM works.  If it's similar to Networker and CommVault, a file index is created during each incremental backup and is included in the incremental backup size.  With Networker legacy VADP backups in particular, this index can be multiple gigabytes.

    Could a file index explain the difference between total changed file size and TSM incremental backup size?

    Avamar/VDP do incremental-forever backups.  They generate a small amount of "meta data" for each backup even if no files change.  In my experience this is less than a file index with VM backups, but it would still account for something.

    Does TSM save "meta data" on each incremental backup?  Would this account for the difference?