One agent in an environment I was working with was reporting that DNS had been shut down on the box, and we also saw an alert that DNS resolution was slow. The context of the DNS resolution alert made it apparent that something was wrong: it reported 0 for both BestTime and WorstTime and no values for BestHost and WorstHost, as shown below on the alert context tab.

image

We logged in and verified that DNS was running, even though the alerts indicated that it had been shut down on the box. Reviewing the events for the agent, we found a number of 1163 events being written from the Health Service State; these were the cause of the alert that the DNS service was not running on this box.

As a rule I haven’t seen OpsMgr generate fictional alerts, but this was looking like one. This is what we were seeing:

Alert: DNS is not running

Reality: DNS is running

Once we dug into the alert context described above (0 for BestTime and WorstTime, and blank values for BestHost and WorstHost), we were able to track this back to a failure of the script that checks the service.

image
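The pattern we keyed on can be written down as a small check. This is only a sketch: it assumes the alert context has already been pulled into a plain dictionary keyed by the property names seen on the alert context tab, and the function name looks_like_failed_probe is made up for illustration. It is not part of OpsMgr.

```python
def looks_like_failed_probe(context):
    """Return True when the alert context matches the pattern from this case:
    zero for both timing fields and nothing in either host field.

    `context` is assumed to be a dict built from the alert context tab;
    how it gets extracted from OpsMgr is outside this sketch.
    """
    # Missing or empty timing values are treated the same as 0.
    times_zero = all(
        float(context.get(key) or 0) == 0 for key in ("BestTime", "WorstTime")
    )
    # Missing, None, or whitespace-only host values all count as blank.
    hosts_blank = all(
        not str(context.get(key) or "").strip() for key in ("BestHost", "WorstHost")
    )
    return times_zero and hosts_blank
```

A context that passes this check suggests the monitoring script never produced real measurements, which is a different problem from DNS actually being slow or stopped.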

After further debugging we came to the assumption that this was caused by corruption of the OpsMgr agent’s Health Service State folder. To resolve this we stopped the OpsMgr service, renamed the Health Service State folder, and restarted the service so that the folder would be rebuilt. After a short period of time the agent started reporting correct data, including the information shown below: the DNS servers used and their response times. This validated our assumption that the alert was caused by corruption in the contents of the Health Service State folder.

image
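The stop, rename, restart sequence could be scripted roughly as follows. Treat this as a sketch under assumptions rather than the exact procedure we ran: the service name HealthService and the use of net stop / net start are assumptions about the agent install, the folder path must be supplied by you because it depends on the agent version, and reset_health_state is a hypothetical helper name. It needs to run elevated on the agent machine.

```python
import subprocess
import time
from pathlib import Path


def reset_health_state(state_dir, service="HealthService", run=subprocess.run):
    """Stop the agent service, rename the state folder, start the service again.

    state_dir: path to the agent's "Health Service State" folder (assumed;
               check where your agent version keeps it).
    service:   Windows service name of the agent (assumed to be HealthService).
    run:       command runner, injectable so the logic can be dry-run.
    Returns the path of the renamed folder, or None if the folder was missing.
    """
    state_dir = Path(state_dir)
    run(["net", "stop", service], check=True)
    backup = None
    try:
        if state_dir.exists():
            # Rename instead of delete so the old content can still be inspected.
            stamp = time.strftime("%Y%m%d-%H%M%S")
            backup = state_dir.with_name(f"{state_dir.name}.old-{stamp}")
            state_dir.rename(backup)
    finally:
        # Always try to bring the agent back, even if the rename failed.
        run(["net", "start", service], check=True)
    return backup
```

Renaming rather than deleting keeps the old folder around in case you want to look at it later; once the agent is reporting correctly again the renamed copy can be removed.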

Summary: Corruption of the content in the Health Service State folder can result in strange alerts like those shown above. To identify situations like this, look for strange alerts that do not reflect the reality of the situation and are not occurring on other agents that perform the same function in the environment (such as another DC in the same location or, in the case above, another DNS server in the same location).
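The peer comparison described in the summary can be sketched as a simple set difference. This assumes you have already exported the active alert names per agent into a dictionary by whatever means you prefer; the function name and the data shape are illustrative only.

```python
def alerts_unique_to_agent(alerts_by_agent, agent, peers):
    """Return the alert names raised on `agent` but on none of its `peers`.

    alerts_by_agent: dict mapping agent name -> iterable of alert names
                     (assumed to have been exported from OpsMgr separately).
    peers:           agents performing the same function, for example the
                     other DNS servers in the same location.
    """
    own = set(alerts_by_agent.get(agent, ()))
    seen_on_peers = set()
    for peer in peers:
        seen_on_peers.update(alerts_by_agent.get(peer, ()))
    # Alerts that only this agent reports are the ones worth a second look.
    return sorted(own - seen_on_peers)
```

An alert that shows up here is not proof of a corrupt state folder, but combined with a manual check that the service is actually healthy it is a reasonable prompt to suspect the agent rather than the workload.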