As part of the Operating System management packs, Operations Manager raises warning-level alerts, which affect the health state of a server, when drives on that server are heavily fragmented. For good discussions of these alerts and their history, please see the following articles:
- Kevin Holman’s discussion on this functionality: http://blogs.technet.com/b/kevinholman/archive/2009/09/28/new-base-os-mp-6-0-6667-0-adds-file-fragmentation-monitor-to-all-logical-disks.aspx
- Logical Disk Fragmentation Level is High ReSearch This! KB Part 1: http://www.systemcentercentral.com/BlogDetails/tabid/143/IndexID/67143/Default.aspx
- Logical Disk Fragmentation Level is High ReSearch This! KB Part 2: http://www.systemcentercentral.com/tabid/145/indexId/82727/Default.aspx
I agree with the logic behind these monitors and with having them impact the state of the server. The difficulty is that this monitor generates a lot of alerts and changes the health state of most of the servers in the environment to warning. As a result, it is most often turned off to “blissfully ignore” these situations (well said, Ian!). There are also challenges in how this monitor works, which I have excerpted from an earlier post below:
“I would like to be able to enable this recovery for all servers in the environment (and exclude specific ones); however, there are a few major challenges with this approach as of this version of the management pack:
- IT organizations are extremely hesitant to automatically defragment systems because it can slow down the performance of the server. If we could add functionality which monitors the actual defragmentation process (for example, with a counter) and alerts in case of an issue, this risk could be mitigated and most of the hesitancy about automatically defragmenting these systems would go away.
- On the weekly defragmentation, the state needs to be reset to green and then recalculated to determine what the actual level of fragmentation is. Otherwise this situation can occur: my drive is highly fragmented and goes to warning, I defragment it on Tuesday, but by Friday it’s back over 10%, so the state stays in warning. Since the state has not changed, it will not fire a recovery for this drive. Additionally, if I run the task or the recovery, the state of the entity should be reset to green for the same reason. I know that I defragmented it, and I hope that it’s now actually defragmented, but if it’s not, I want OpsMgr to run the defragment again when the state changes from green to yellow.
- There should be another task available which would re-run the assessment of the fragmentation level so that we could get updated metrics for what the actual fragmentation was at that point in time (for example, if I just defragmented a drive, I would like to be able to know remotely that it’s now green and that the fragmentation level is now 5% or whatever it is).
- The level of fragmentation should be stored in a performance counter so this can be trended.”
Some updates on the points above: for the first point, defragmentation automatically stops itself after one hour, which helps to minimize the risk of impacting production systems. The second item is the subject of this blog entry. I still believe the third and fourth items would be beneficial next steps for the management pack.
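To make the second item concrete, here is a small toy model in Python. It is not OpsMgr code; the DiskMonitor class, the readings, and the idea of a single 10% threshold are assumptions made purely for illustration. It shows why a recovery that only fires on a green-to-yellow state change never fires again for a drive that stays in warning, and how resetting the state to green before the next assessment restores the cycle.

```python
from dataclasses import dataclass, field

THRESHOLD = 10.0  # percent; the 10% warning level mentioned in the post


@dataclass
class DiskMonitor:
    """Toy model of a state-change-driven monitor (illustrative, not OpsMgr code)."""
    state: str = "green"
    recoveries: list = field(default_factory=list)

    def assess(self, day: str, fragmentation: float) -> None:
        new_state = "yellow" if fragmentation > THRESHOLD else "green"
        # The recovery fires only on a green -> yellow transition.
        if self.state == "green" and new_state == "yellow":
            self.recoveries.append(day)
        self.state = new_state

    def reset(self) -> None:
        """Stand-in for the health state reset: force the state back to green."""
        self.state = "green"


def run(readings, reset_before=()):
    """Assess each reading in order, resetting first on the listed days."""
    monitor = DiskMonitor()
    for day, fragmentation in readings:
        if day in reset_before:
            monitor.reset()
        monitor.assess(day, fragmentation)
    return monitor.recoveries


# Hypothetical weekly readings for a drive that is over 10% at every assessment.
READINGS = [("week1", 25.0), ("week2", 14.0), ("week3", 18.0)]

if __name__ == "__main__":
    print("without reset:", run(READINGS))
    print("with reset:   ", run(READINGS, reset_before={"week2", "week3"}))
```

Without a reset, the recovery fires in the first week only; with a reset before each later assessment, it fires every week the drive is still fragmented.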
To reset the health state of only the logical disks on these servers, we created a batch file containing the following:
GreenMachine -reset -w -class "Windows Server 2008 Logical Disk"
GreenMachine -reset -w -class "Windows Server 2003 Logical Disk"
GreenMachine -reset -w -class "Windows Server 2000 Logical Disk"
[Screenshot: successful execution of the GreenMachine class-level reset]
The batch script resets the health state of these warning conditions for each type of Windows Server logical disk. When the states are reset, the corresponding (Logical Disk Fragmentation Level is High) alerts close automatically. The batch file can then be scheduled to run daily (or once a week) after an automated recovery has occurred. Putting these together, we get an automated process that looks like this:
- Sunday 3:00 am: The OpsMgr management pack detects a high fragmentation level on a logical disk, generates the alert and changes the health state of the server to warning.
- Sunday just after 3:00 am: Because the state has changed from green to yellow and an override is in place to enable the defragmentation recovery, the recovery runs and defragments the drive.
- Sunday around 4:00 am: The recovery has either completed or been stopped (defragmentation stops itself after one hour).
- Monday at 2:00 am: The GreenMachine batch script runs and resets the fragmentation-level health state of the logical disks to green. This resets the cycle and allows the drives to be re-evaluated to determine whether they are highly fragmented.
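The Monday reset step could also be driven from a small script instead of a batch file. The following is a minimal sketch in Python, assuming that GreenMachine is on the PATH and that the dashes in the commands shown earlier are plain ASCII hyphens; the function names and the dry_run switch are my own and are not part of GreenMachine.

```python
import subprocess

# Class names exactly as they appear in the batch file in this post.
LOGICAL_DISK_CLASSES = [
    "Windows Server 2008 Logical Disk",
    "Windows Server 2003 Logical Disk",
    "Windows Server 2000 Logical Disk",
]


def build_reset_command(class_name, executable="GreenMachine"):
    """Return the argument list for one class-level reset.

    Assumes the executable is on the PATH and that the options are
    written with ASCII hyphens.
    """
    return [executable, "-reset", "-w", "-class", class_name]


def reset_all(classes=LOGICAL_DISK_CLASSES, dry_run=True):
    """Run (or, with dry_run=True, only collect) the reset for every class."""
    commands = [build_reset_command(name) for name in classes]
    if not dry_run:
        for command in commands:
            # check=True raises an error at the first reset that fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the command lines without executing anything.
    for command in reset_all(dry_run=True):
        print(subprocess.list2cmdline(command))
```

With dry_run=True the script only prints the three command lines, which is a safe way to check them before scheduling a real run (dry_run=False) for Monday at 2:00 am with whatever scheduler is already in use.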
This process requires GreenMachineR2 v1.04.1 (which is available for download here) since it uses the -class option.
Summary: Automating the defragmentation of drives in Operations Manager is viable with the right process!