One of the features I was recently excited about in the release of Windows Server 2012 is Data Deduplication. I decided to kick the tires on it in my lab environment, on a Windows Server 2012 system that is both a Hyper-V server and the host of my Virtual Machine Manager library. The screenshot below shows my three-drive configuration: a C drive for the OS, a V drive for virtuals, and a Y drive for software media and templates. The goal is to enable data deduplication on the V and Y drives of this server.
Adding the Data Deduplication functionality in Windows Server 2012
Let’s add the data deduplication feature (a walkthrough is available at http://blog.powerbiz.net.au/features/data-deduplication-in-windows-server-2012/). Add Data Deduplication, which is part of the File and Storage Services server role:
Installation documentation is available at http://technet.microsoft.com/en-us/library/hh831700.aspx, http://technet.microsoft.com/en-us/library/hh831434.aspx, or http://blogs.technet.com/b/filecab/archive/2012/05/21/introduction-to-data-deduplication-in-windows-server-2012.aspx.
Determining the benefit of deduplicating a drive:
This feature provides a program called DDPEVAL.EXE (in c:\windows\system32) which estimates the disk space that data deduplication would save. The example below shows the results when it was run against my 3.63 TB Y drive, shown at the top of this blog post. On a volume this large the evaluation can take a significant amount of time; in my lab environment, with relatively slow drives, it took a couple of hours. Results shown below: (45% estimated space savings percentage with no compression is impressive!)
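To put the 45% estimate in absolute terms, a quick back-of-the-envelope calculation helps. The sketch below is plain Python arithmetic, not DDPEVAL output. It assumes 1 TB = 1024 GB and that the percentage applies to the space currently used on the volume (3.63 TB total with 831 GB free, the starting figures for the Y drive quoted later in this post). The function name is just an illustrative helper.

```python
# Rough estimate of what a 45% DDPEVAL savings figure means in GB.
# Assumptions (not from DDPEVAL itself): 1 TB = 1024 GB, and the
# percentage applies to the space currently used on the volume.
GB_PER_TB = 1024


def estimated_savings_gb(total_tb, free_gb, savings_pct):
    """Return (used_gb, saved_gb) for a volume and an estimated savings %."""
    used_gb = total_tb * GB_PER_TB - free_gb
    saved_gb = used_gb * savings_pct / 100
    return used_gb, saved_gb


used, saved = estimated_savings_gb(total_tb=3.63, free_gb=831, savings_pct=45)
print(f"Used: {used:.0f} GB, estimated savings: {saved:.0f} GB ({saved / GB_PER_TB:.2f} TB)")
```

Under those assumptions the estimate works out to roughly 1.27 TB of reclaimed space on the Y drive.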
Activating Data Deduplication on a volume:
To configure data deduplication on a volume, open the server manager, File and Storage Services, Volumes.
Right-click on the volume that you want to configure this for and choose the “Configure Data Deduplication” option.
Enable data deduplication, and configure how old files need to be before they can be deduplicated.
File extensions can be excluded, and a schedule can be configured for when to deduplicate this volume, as shown below.
Note: Per http://technet.microsoft.com/en-us/library/hh831700.aspx, if you enable throughput optimization the optimization job will use up to 50% of the system’s memory, which would probably not be a good idea on a Hyper-V server whose memory is already heavily committed to virtual machines.
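I did all of the above through Server Manager, but the same steps can also be scripted with the Deduplication PowerShell cmdlets that ship with Windows Server 2012. The following is a minimal sketch rather than the procedure used in this post: a small Python helper that builds the PowerShell commands and only prints them by default. The helper names, the V: volume and the 5-day file age are illustrative assumptions; check the cmdlet documentation before running anything against a real server.

```python
# Builds (and optionally runs) PowerShell commands that mirror the GUI steps.
# The cmdlets belong to the Windows Server 2012 Deduplication module; the
# helper names, volume letter and file age below are illustrative only.
import subprocess


def dedup_commands(volume, min_file_age_days):
    """Return the PowerShell commands to install and enable dedup on a volume."""
    return [
        "Install-WindowsFeature -Name FS-Data-Deduplication",
        f"Enable-DedupVolume -Volume {volume}",
        f"Set-DedupVolume -Volume {volume} -MinimumFileAgeDays {min_file_age_days}",
        f"Get-DedupVolume -Volume {volume}",
    ]


def run_commands(commands):
    """Run each command with powershell.exe (Windows only, elevated prompt)."""
    for command in commands:
        subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command", command], check=True
        )


commands = dedup_commands("V:", min_file_age_days=5)
for command in commands:
    print(command)
# Uncomment on the target server to actually apply the settings:
# run_commands(commands)
```

Printing first and running second is deliberate: it lets you review exactly what would be applied to the volume before anything changes.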
Once data deduplication has been activated for the volume, the deduplication rate and deduplication savings fields are populated as shown below:
If a volume is not supported for data deduplication the option is grayed out as shown below where the C drive is not allowed (system or boot volume):
What does it run?
Using Resource Monitor, we can see the ddpeval program and the files it is accessing while it assesses the benefit of deduplicating a volume.
Where can’t data deduplication be used?
What we can’t use dedup on: (subset from http://technet.microsoft.com/en-us/library/hh831700.aspx)
What are the expected disk space savings?
The following is a great sample of what we should expect to save in disk space, from http://blogs.technet.com/b/filecab/archive/2012/05/21/introduction-to-data-deduplication-in-windows-server-2012.aspx.
So what about the results in my lab environment?
Let’s see what it says will be gained on my Y drive which stores my software media.
Results one day later:
After one day (and re-opening Server Manager), the results were as shown below:
Testing my V drive – 932 GB with 555 GB free – now has 626 GB free (71 GB additional space).
Testing my Y drive – 3.63 TB with 831 GB free – now has 919 GB free (88 GB additional space).
Results four days later:
After four days (and re-opening Server Manager), the results were as shown below:
Testing my V drive – 932 GB with 555 GB free – now has 634 GB free (93.3 GB additional space).
Testing my Y drive – 3.63 TB with 831 GB free – now has 1.08 TB free (285 GB additional space).
Results one week later:
After one week (and re-opening Server Manager), the results were as shown below:
Testing my V drive – 932 GB with 555 GB free – now has 680 GB free (143 GB additional space).
Testing my Y drive – 3.63 TB with 831 GB free – now has 2.42 TB free (1.63 TB additional space).
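As a sanity check on the three checkpoints above, the free-space gain can be recomputed from the before and after figures. The sketch below is plain arithmetic on the numbers quoted in this post, assuming 1 TB = 1024 GB. The one-day figures match the simple difference exactly, while the later "additional space" figures come out somewhat higher than the difference in free space. One possible explanation (an assumption on my part, not something verified here) is that the quoted figure is the deduplication savings value reported by Server Manager, which does not have to equal the change in free space if other data was written to the drive in the meantime.

```python
# Free-space gain per checkpoint, computed from the figures quoted above.
# Assumption: 1 TB = 1024 GB. Quoted savings are converted to GB.
GB_PER_TB = 1024
BASELINE_FREE_GB = {"V": 555, "Y": 831}

# (checkpoint, drive) -> (free space in GB, additional space quoted in GB)
checkpoints = {
    ("1 day", "V"): (626, 71),
    ("1 day", "Y"): (919, 88),
    ("4 days", "V"): (634, 93.3),
    ("4 days", "Y"): (1.08 * GB_PER_TB, 285),
    ("1 week", "V"): (680, 143),
    ("1 week", "Y"): (2.42 * GB_PER_TB, 1.63 * GB_PER_TB),
}


def free_space_gain(drive, free_gb):
    """Difference between current free space and the starting free space."""
    return free_gb - BASELINE_FREE_GB[drive]


for (when, drive), (free_gb, quoted_gb) in checkpoints.items():
    gain = free_space_gain(drive, free_gb)
    print(f"{when:>6} {drive}: free-space gain {gain:7.1f} GB, quoted {quoted_gb:7.1f} GB")
```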
Results after leaving the system online with deduplication configured:
Several months later I returned to this blog post and found the following results:
Testing my V drive – showed that it had a 43% deduplication rate and had gained 417 GB of additional space through deduplication.
Testing my Y drive – showed that it had a 69% deduplication rate and gained 1.69 TB of additional space through deduplication!
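The rate and savings figures can be related to each other with a little arithmetic. The sketch below assumes the deduplication rate is the saved space divided by the logical (pre-deduplication) size of the data on the volume, which is my understanding of how the savings rate is defined and should be treated as an assumption. It also assumes 1 TB = 1024 GB. Under those assumptions it back-calculates how much logical data each drive holds and how much physical space that data occupies after deduplication.

```python
# Back-of-the-envelope: implied logical data size from rate and savings.
# Assumption: rate = saved / logical size, and 1 TB = 1024 GB.
GB_PER_TB = 1024


def implied_sizes_gb(saved_gb, rate_pct):
    """Return (logical_gb, physical_gb) implied by a savings figure and rate."""
    logical_gb = saved_gb / (rate_pct / 100)
    physical_gb = logical_gb - saved_gb
    return logical_gb, physical_gb


drives = {
    "V": (417, 43),               # 417 GB saved at a 43% rate
    "Y": (1.69 * GB_PER_TB, 69),  # 1.69 TB saved at a 69% rate
}

for name, (saved_gb, rate_pct) in drives.items():
    logical, physical = implied_sizes_gb(saved_gb, rate_pct)
    print(f"{name}: ~{logical:.0f} GB of data stored in ~{physical:.0f} GB on disk")
```

If the assumption holds, the V drive would be holding slightly more logical data than its 932 GB capacity, which is exactly the kind of result deduplication is meant to deliver.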
Or, as before and after screenshots:
Before:
After:
Summary:
I am extremely impressed with the disk savings seen on the Hyper-V servers in my lab. I have since activated this on all of my Hyper-V servers for the drives where I store my virtuals.
There are several types of drives (including the system or boot volume) that dedup cannot be used on.
It can take a significant amount of time to complete the full deduplication process, and there needs to be sufficient disk space and memory available on the host to effectively deduplicate the volume.
Resources and links:
- Adam Rafels provided a great writeup on this, which I wish I had found before starting mine. It is available at: https://www.catapultsystems.com/arafels/archive/2012/08/09/windows-server-2012-data-deduplication.aspx
Additional recommended links:
- http://blogs.technet.com/b/filecab/archive/2012/05/21/introduction-to-data-deduplication-in-windows-server-2012.aspx
- http://vmfocus.com/2012/10/23/implementing-testing-windows-server-2012-deduplication/
- http://blog.powerbiz.net.au/features/data-deduplication-in-windows-server-2012/
- http://technet.microsoft.com/en-us/library/hh831700.aspx
- http://derek858.blogspot.com/2012/10/windows-server-2012-file-share.html
- http://msdn.microsoft.com/en-us/library/windows/desktop/hh769303(v=vs.85).aspx