One of the common approaches I use when tuning OpsMgr is to look at which alerts occur most often, either by alert volume or by repeat count (the screenshot below is from the Service Manager dashboard customized for OpsMgr).
Using the graph on the left, it’s easy to see that the “OleDB: Results Error” alert accounts for the largest portion of the alerts by repeat count in the environment. For background, the repeat count is not shown in the default views. To add it, go to “Personalize View” and add Repeat Count (shown below).
This provides a view which shows how long the alert has existed and how many times it has repeated. This particular rule creates an alert based upon an event in the event log.
It’s often useful to know how often an alert is repeating in order to track down what is causing it. As an example, a script may be set to run every 5 minutes which in turn writes an event to the event log. We cannot tell from this rule how often that happens, because the rule is not running on a schedule; it is only looking for an event in the event log. Instead, the interval can be estimated by dividing the age of the alert in minutes by its repeat count.
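For this alert, which was 61 days, 22 hours and 58 minutes old with a repeat count of 22024:
((61 days * 24 * 60) + (22 hours * 60) + (58 minutes)) / 22024 RepeatCount
(87840 + 1320 + 58) / 22024 = 89218 / 22024 = 4.05
Or in general:
((days * 24 * 60) + (hours * 60) + (minutes)) / RepeatCount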
In this case the alert was caused by an OleDB check gone bad which was executing approximately every 4 minutes.
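The same arithmetic can be sketched in a few lines of Python. This is only an illustration and not anything that ships with OpsMgr: the function name `minutes_between_repeats` and its parameters are made up for this example, and the age values are typed in by hand from the alert view. Note that hours are converted at 60 minutes per hour.

```python
def minutes_between_repeats(days, hours, minutes, repeat_count):
    """Estimate the average number of minutes between repeats of an alert.

    days, hours, minutes: age of the alert as read from the alert view.
    repeat_count: value of the Repeat Count column for the same alert.
    (Hypothetical helper for illustration; not an OpsMgr API.)
    """
    if repeat_count <= 0:
        raise ValueError("repeat_count must be greater than zero")
    # Convert the whole alert age to minutes: 24 * 60 per day, 60 per hour.
    total_minutes = (days * 24 * 60) + (hours * 60) + minutes
    return total_minutes / repeat_count


# The alert from this post: 61 days, 22 hours, 58 minutes old, 22024 repeats.
interval = minutes_between_repeats(61, 22, 58, 22024)
print(f"Roughly one repeat every {interval:.2f} minutes")
```

For the numbers in this post the result rounds to about 4 minutes, which is what points at something running on a roughly 4 minute schedule.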
Summary: Trying to determine how often a rule is causing a repeat to occur? Add the repeat count to the alert view, and divide the number of minutes since the alert was generated by the repeat count.