Effects of using thin provisioning in a Linux / VMware / deduplication environment
Published on: 25 November 2020 | Posted by: Usha Devel
Hard fact, in case you didn’t know: STORAGE IS NOT CHEAP!
Storage is not cheap… especially datacenter-grade storage. So every kilobyte you save is a win! If your main storage solution is a fast one (all-flash) with several degrees of redundancy, then “cost per byte” is something constantly blinking in your mind.
If your storage solution also does some kind of de-duplication, then the less wasted data you have, the better de-duplication ratio you will get.
So now: how do we clean up the waste and keep our storage in shape? First things first: eliminate the fat!
FAT is unhealthy: For humans, and for Linux systems!
Typically the fat comes from the way a standard operating system stores data inside a filesystem and, especially, from how that data is managed when a delete operation happens. The first thing to understand is this: operating systems are lazy… a file deletion merely “eliminates” the presence of a file, but does not necessarily “clear” all the bytes it used. Normally, when a file is “deleted”, some “ghost” data remains. That data can eventually get partly overwritten or fully removed, but none of this is automated at all… we sysadmins are not that lucky!
This is especially true when the operating system is presented with a regular disk (spinning, non-flash, non-SSD). Some tools can get rid of the “deleted” data (basically by cleaning the free space) but again: this is not automated at all – it requires someone, or something, to do the task with special tools! And those tools cost computing power.
So spinners are fat-storage devices… thanks for ruining my day! But what about newer SSDs?
Enter the SSD: most modern SSDs solve the fat-excess problem by providing an operation that actually clears the waste: TRIM. But what is TRIM? TRIM (when active) provides a way to mark blocks that are being deleted, so the disk will eventually “really delete” them and keep the entire disk optimal. This improves performance and keeps the free space really free. So the fat is really gone… or is it? Well, it depends!
The operating system needs to be capable of using TRIM!
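A quick way to see whether Linux believes a disk honors TRIM is the discard report from `lsblk` (this is plain util-linux, nothing VMware-specific):

```shell
# Ask the kernel which block devices advertise discard/TRIM support.
# Non-zero DISC-GRAN and DISC-MAX columns mean the device accepts TRIM.
lsblk --discard
```

On a device without TRIM support (a classic spinner), both columns read 0.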
BEHOLD! The power of discard and fstrim!
In Linux systems you can perform TRIM operations by mounting the filesystem with the “discard” option. Also, most modern Linux distributions include the “fstrim” utility (and service), which can be enabled so the OS “trims” the filesystems from time to time and cleans up whatever was left untouched by the “discard” option. Basically: discard and fstrim are FAT killers.
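As a sketch, mounting with discard is a per-filesystem flag in /etc/fstab (the device names and the ext4 filesystem below are assumptions – adjust them to your actual layout):

```
# /etc/fstab -- add "discard" to enable online TRIM on delete
/dev/sda1   /      ext4   defaults,discard   0 1
/dev/sda2   none   swap   sw,discard         0 0
```

A remount (`mount -o remount /`) or a reboot is needed for the option to take effect; for swap, a reboot or a swapoff/swapon cycle.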
With the “discard” option, the TRIM operation is performed online when possible (so the “delete” sends the TRIM command to the SSD), but it can also be combined with the fstrim service (or by calling fstrim through crontab).
Now, this all sounds very nice provided the machine is a physical one directly connected to an SSD, but what happens if our environment is virtual (say… running on top of VMware)?
Then we have a solution, of course: THIN PROVISIONING!
FAT provisioning and THIN provisioning!
Well… there is no such thing as “FAT” provisioning in VMware. The real concepts are thick provisioning (in the form of two options) and thin provisioning. The actual options are:
- Eager Zeroed (thick)
- Lazy Zeroed (thick)
I won’t describe in detail what eager or lazy zeroed does, but basically both are thick provisioning (FAAAATTTTT). If you create a VM with a 20GB disk, a 20GB-long vmdk file will be created. On the thin-provisioning side, if you create a VM with a 20GB disk, the vmdk file starts smaller and only grows as actual data is written. In other words, if your VM uses 5GB out of the 20GB assigned, the VM size will really be 5GB… but beware: this is not always true!
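You can mimic the thin-vmdk behavior on any Linux box with a sparse file – the file below is purely a demo stand-in, not a real vmdk:

```shell
# A sparse file behaves like a thin disk: huge apparent size, few real blocks.
truncate -s 20G demo-thin.img   # "provision" 20GB without allocating blocks
ls -lh demo-thin.img            # apparent size: 20G
du -h demo-thin.img             # blocks actually allocated: close to zero
rm -f demo-thin.img             # clean up the demo file
```

Just like a thin vmdk, the apparent size (what the guest sees) and the allocated size (what the datastore pays for) are two different numbers.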
Bad news: THIN can become FAT! Good news: we can make our VMs work out and reduce the FAT!
What happens if the OS on top of the VM doesn’t know that the underlying storage is thin? Bad news: the free space will contain deleted entries that never get cleaned, and eventually your “apparently used 5GB” will grow to the full 20GB assigned. Of course this doesn’t harm the OS at all… most modern filesystems will just re-use the deleted space (this is the “eventually overwritten” we mentioned before), but the underlying storage will see 20GB of used (or wasted) space when only 5GB are really in use! This is WASTE, and a typical example of “how my thin storage became fatty”.
So what’s the solution here? Easy: ensure the OS understands that the storage is “thin” and acts accordingly!
How do we do that? First: most virtualization solutions solve this by presenting the “thin storage” as SSD, so the OS sees the disks as if they were true SSDs. In the case of Linux, this allows the kernel to enable TRIM and lets the disks/partitions (even the swap) be mounted with the discard option. Then… our 5GB will really be 5GB, and whatever space is clean will really be clean/zeroed space.
So the strategy here is ensuring our VMs use the discard option (for every partition, and also the swap), plus the fstrim utility. If you want to be thorough, crontab the fstrim command to run every night, so whatever blocks were not trimmed by discard (online TRIM) get “trimmed” overnight (please schedule this at low-utilization hours, as fstrim can cause some extra disk I/O).
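The nightly run can be a one-line crontab entry (the 03:30 schedule and the fstrim path are assumptions; many distros ship an equivalent fstrim.timer systemd unit you can enable instead):

```
# root crontab: trim all mounted filesystems every night at 03:30
30 3 * * * /usr/sbin/fstrim --all
```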
In other words: let our VMs work out daily so the unhealthy FAT is kept at bay! (No more “gordo” VMs here, please.)
Note that the actual vmdk size will also be kept “thinner” when the trim/discard operations are performed, so the impact on the underlying datastore will be noticeably positive.
OK, OK, you sold me! Thin provisioning and discard/trim are good things… now, how does this impact de-duplication?
In order to understand the benefits of thin/discard/fstrim for dedupe operations, let’s first look at the bad scenario: FAT (thick) provisioning with no discard/fstrim at all!
When a “FAT VM” sees its disk as a “spinner” (the thick-provisioning scenario), there is no immediate erasing of “deleted” files. As mentioned before: the entries are deleted, but the actual data stays there until it gets overwritten (and that is still not “deleted” bytes – just overwritten bytes).
Let’s see an example: a 20GB “FAT VM” with some heavy disk activity will eventually use all 20GB (or nearly), then drop back – due to file deletions – to, say, 10GB… but the free space is filled with non-zero bytes. If the underlying datastore supports any form of de-duplication, it will take “all” non-zero bytes into account as deduplicable data, so the “deleted files” (which are not really there but linger like ghosts) will impact the deduplication negatively. If you have an integrated backup system (like Veeam, for example) the same effect will be present: you’ll be backing up (and deduplicating) the data really present inside the VM (the 10GB) plus whatever ghosts still haunt it (the free space filled with non-zero bytes). In other words: you are wasting your resources on FAT!
The only solution (an imperfect one) for this scenario is to wipe the free space (fill it with zeros) so the dedupe is impacted positively. Sadly this is costly to do (in terms of system administration) and expensive in computational resources. It also doesn’t reduce the VM size (the vmdk file size).
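A hedged sketch of that zero-fill using dd (TARGET and its temp-dir default are assumptions for a dry run; on a real VM you would point TARGET at the filesystem’s mountpoint and drop the count= cap so dd runs until the disk is full – at low-utilization hours, of course):

```shell
# Fill free space with zeros so a dedupe engine can collapse it, then
# delete the filler file to hand the space back to the filesystem.
TARGET="${TARGET:-$(mktemp -d)}"    # demo default: a temp dir, not a real mountpoint
dd if=/dev/zero of="$TARGET/zerofill.tmp" bs=1M count=16 2>/dev/null  # count= caps the demo
sync                                # push the zeros out to disk
rm -f "$TARGET/zerofill.tmp"        # reclaim the space
echo "zero-filled and cleaned $TARGET"
```

Note the imperfection mentioned above: the zeros dedupe well, but the vmdk stays at whatever size it grew to.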
Now consider the good scenario: VMs with thin provisioning and discard/fstrim support. The VM space that gets deduplicated (and backed up) is the “really used” space. There is no free space full of ghost files, and nothing needs to be filled with zeros. The vmdks are also smaller, so it is faster to dedupe and back up an entire VM. In other words: there is no FAT to worry about!
The daily workout enforced on your once-fat VMs will result in a thinner and more agile environment: much better deduplication ratios, much shorter backup and restore times, and less space wasted on fatty bytes.
So… this all sounds very nice, but… where’s the catch? There is ALWAYS a catch!
There is a catch indeed, and it comes from the most dangerous factor inside a datacenter: you (yes, the human reading this)! You are the most dangerous thing on earth and a danger to every datacenter in the world! But why?
Using thin provisioning opens a window for an undesirable effect: datastore oversubscription. You may feel the urgent need to oversubscribe your storage: please DO NOT! In a thin-provisioning scenario it is VITAL that the provisioned space NEVER exceeds the available space inside a datastore (even if you feel you are a statistics genius). Don’t fool yourself! If you oversubscribe your storage, it will eventually come back to you with bad consequences, like a nuclear boomerang with embedded poisoned razors! If your datastore is, say, 20 terabytes, don’t add more than that in “provisioned” space (even if you see the datastore at very low utilization percentages): the sum of all your virtual hard disks should not exceed those 20TB. Keep your nose clean, sysadmin!
So the catch here is feeling over-confident that we can provision more space than we actually have because the magic of trim/discard will keep things thin. Don’t fall for that. Period!
Remember: TRIM will only help you get rid of the wasted space; it won’t help at all with the “really used” space. If your VMs get near 90% usage “all at the same time” and you oversubscribed your storage, prepare yourself for some sleep deprivation (or for losing your job)!
Perfect, I got it: thin provisioning, discard/fstrim and no oversubscription. So what’s my plan, in a few lines?
Assuming part of your infrastructure is already on thick provisioning, this is what you need to do:
- First, ensure your VMs are configured with discard/fstrim even if they have not yet been moved to thin provisioning. Make a plan to migrate the VMs and reboot them so they see the change. In VMware, the only way to change a VM from thick to thin is a storage migration (choosing thin provisioning for the destination disk). Then, for Linux to see the changes, you’ll need to reboot. This requires some planning ahead: don’t make your customers mad with unplanned reboots.
- Ensure that you and your teammates deploy new VMs using thin provisioning. If you use templates for new VMs, create new templates with the fstrim/discard options already configured. If not, ensure your procedures include the new configuration for mounting disks with discard and enabling fstrim.
- Keep track of the provisioned space vs. the total datastore space. DO NOT oversubscribe. Repeat with me at least one thousand times: I will not oversubscribe my storage!
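The bookkeeping can be as simple as comparing two numbers. This sketch uses made-up capacity and provisioned figures (both are assumptions; pull the real ones from your vSphere client or reporting tool):

```shell
# Refuse-to-oversubscribe check: provisioned space must stay under capacity.
CAPACITY_GB=20480      # assumed: a 20TB datastore
PROVISIONED_GB=18200   # assumed: the sum of every virtual disk placed on it
if [ "$PROVISIONED_GB" -gt "$CAPACITY_GB" ]; then
    echo "OVERSUBSCRIBED: ${PROVISIONED_GB}GB provisioned > ${CAPACITY_GB}GB capacity"
else
    echo "OK: ${PROVISIONED_GB}GB provisioned of ${CAPACITY_GB}GB capacity"
fi
```

Run it (or its equivalent in your monitoring system) before every new disk is provisioned, not after.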
And finally, monitor the changes: watch how your available space and deduplication factor improve over time. You’ll only see the improvements if you properly monitor your platform.