Recently a customer contacted me for input on an issue in their environment. They have a rather large OpsMgr 2012 R2 Management Group, with several dozen Gateway Servers that are responsible for monitoring OpsMgr agents in isolated environments (DMZ, customer site, etc.). The problem was fairly simple: when an agent that reports to a Gateway Server fails to report a heartbeat and the alert is raised, OpsMgr’s default action is to attempt to PING the agent from the Management Server. While this may work in some environments, it will not work in this one due to network restrictions; remote agents can only see their primary and failover Gateway Servers, and cannot see any Management Servers. So, when the Heartbeat alert is raised, the PING immediately fails, even though you can manually PING the agent from the Gateway Server.
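To make the failure mode concrete, here is a minimal sketch of the network restriction described above. It is only a toy model, not anything OpsMgr runs: the host names and the REACHABLE set are hypothetical, and "reachable" simply means "ICMP is allowed from the first host to the second".

```python
# Toy model of the network described above (all host names are hypothetical).
# Each pair means "the first host is allowed to PING the second".
REACHABLE = {
    ("ms01.corp.example", "gw01.dmz.example"),     # Management Server -> Gateway
    ("gw01.dmz.example", "agent01.dmz.example"),   # Gateway -> agent
}


def is_reachable(source, target):
    """Return True if the model allows a PING from source to target."""
    return (source, target) in REACHABLE


def diagnostic_result(source, agent):
    """Mimic the outcome of a ping diagnostic fired from the given source."""
    return "success" if is_reachable(source, agent) else "failed"


# Default behavior: the PING is fired from the Management Server.
default_outcome = diagnostic_result("ms01.corp.example", "agent01.dmz.example")
# Desired behavior: the PING is fired from the Gateway Server.
gateway_outcome = diagnostic_result("gw01.dmz.example", "agent01.dmz.example")

print(default_outcome, gateway_outcome)
```

In this model the diagnostic fails from the Management Server and succeeds from the Gateway Server, which is the situation the rest of this post works around.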
So…what to do? In my search for an answer, I came across this article which describes how to override the diagnostic for a health service heartbeat failure in OpsMgr 2007. That looked easy enough, so I attempted to configure this in my lab; no luck. As you probably know, things have changed between OpsMgr 2007 and 2012. Thankfully, I had some time to document the workaround and share it below.
The first thing you need to do is simple: figure out which Gateway Server you want the PING diagnostic step to fire from (e.g. Server.Domain.Com).
Next, from the OpsMgr console’s Monitoring View, click on Discovered Inventory.
Now, change the Discovered Inventory target type by clicking “Change Target Type” in the Tasks bar.
We will change the target type to “Health Service Watcher Group (Agent)”, then click OK. You may need to select “View All Targets” to find this object.
You should now see one object: the Health Service Watcher Group (Agent).
From here, click on the object and then click on the HEALTH EXPLORER in the Tasks pane.
From the HEALTH EXPLORER, click the X to remove the filter and show all monitors.
From here, expand Availability and then find the Availability object for the agent you wish to configure the override against. Once you find that, highlight the Health Service Heartbeat Failure monitor.
With the Health Service Heartbeat Failure monitor highlighted, click on Overrides (upper left), then Override Diagnostic, then Ping Computer on Heartbeat Failure, and finally choose the option for the object COMPUTERNAME.DOMAIN.COM.
From here, override the value for Source Computer with the FQDN of the Gateway Server that you want the PING to fire from when this object reports a Heartbeat Failure. (You can also create the override for a group if you have a large number of agents that report to the same Gateway Server.)
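If you go the group route, it helps to know up front which agents belong with which Gateway Server. The sketch below is one way to plan that outside of OpsMgr. It assumes you already have a list of (agent FQDN, primary Gateway FQDN) pairs from somewhere, such as a manual export; the list format, the function name and the host names are all hypothetical, and nothing here talks to OpsMgr.

```python
from collections import defaultdict


def agents_by_gateway(inventory):
    """Group agent FQDNs by the Gateway Server they report to.

    inventory is an iterable of (agent_fqdn, gateway_fqdn) pairs.
    Names are lower-cased so the same host is not counted twice
    because of inconsistent capitalization.
    """
    groups = defaultdict(list)
    for agent_fqdn, gateway_fqdn in inventory:
        groups[gateway_fqdn.lower()].append(agent_fqdn.lower())
    return {gateway: sorted(agents) for gateway, agents in groups.items()}


# Hypothetical sample data; replace with your own agent list.
inventory = [
    ("AGENT01.dmz.example", "GW01.dmz.example"),
    ("agent02.dmz.example", "gw01.dmz.example"),
    ("agent03.site.example", "gw02.site.example"),
]

plan = agents_by_gateway(inventory)
for gateway, agents in sorted(plan.items()):
    print(f"Source Computer = {gateway}: {len(agents)} agent(s)")
    for agent in agents:
        print(f"  {agent}")
```

Each key in plan is a candidate Source Computer value, and each list is the membership of the group that the corresponding override would target.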
Once you have changed the value, save this override to the appropriate Management Pack. This is an OpsMgr object, so you should save it to the same MP that houses your other OpsMgr overrides (do not save it to the Windows OS, Default MP, IIS, BeanSpy, etc.).
Now, provided the Gateway Server can in fact PING the agent, and you entered the correct FQDN for the SOURCE of the PING, you should be all set.
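A quick way to confirm connectivity before relying on the override is to run a ping check on the Gateway Server itself. The following is a minimal sketch that shells out to the operating system's ping command; it is not what the OpsMgr diagnostic runs internally. The function names and the agent name are hypothetical. One caveat: on Windows, ping can exit with code 0 even when the reply is "Destination host unreachable", so treat a True result as a hint and check the actual output if in doubt.

```python
import platform
import subprocess


def build_ping_command(target, count=2, system=None):
    """Build the ping command line for the current (or given) OS.

    Windows uses -n for the echo count; Linux and macOS use -c.
    """
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), target]


def can_ping(target, count=2):
    """Return True if the ping command exits with code 0."""
    try:
        result = subprocess.run(
            build_ping_command(target, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is not available, or it hung longer than the timeout.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical agent name; run this on the Gateway Server.
    agent = "agent01.dmz.example"
    state = "reachable" if can_ping(agent) else "NOT reachable"
    print(f"{agent} is {state} from this host")
```

If the agent is not reachable from the Gateway Server here, the overridden diagnostic will fail in the same way the default one did.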
…hopefully this will be easier to configure in future versions of OpsMgr, or at least the default will be changed to PING from whatever the agent reports to, regardless of whether it's a Management Server or a Gateway.