[Update 11/29/2017: This blog post series has been superseded by a solution built to visualize server and client information which is available at: http://blogs.catapultsystems.com/cfuller/archive/2017/11/28/updating-the-server-and-client-performance-solution-to-the-new-query-language/. Please note that query examples in this deprecated blog post are for the old query language and will not work in the current query language.]
The first part of this blog series showed how Microsoft has built a notification for which OMS agents are offline into their pre-built reports. The second blog post provided a query for OMS agents that have not reported into OMS in the last hour. This blog post shows an example of how to send a notification when a specific agent is not reporting into OMS.
Querying for an OMS agent which has not had specific events in their event log:
As part of my approach to monitoring Windows Media Center with OMS, I wanted to add an email notification if my Media Center system was no longer functional or able to report to OMS.
For history, Tao did something similar to this for management servers in his self-maintenance management pack available at http://blog.tyang.org/2015/09/16/opsmgr-self-maintenance-management-pack-2-5-0-0/. In my example I am looking to provide a notification when a particular agent is no longer functional or if the agent loses connectivity to OMS (or the internet). Doing this in OMS required the following steps:
- Collect Operations Manager events
- Enable the alerting preview
- Using the log search functionality
- Creating an alert rule
- Creating a dashboard
- Viewing sample email alerts
Collect Operations Manager events:
Operations Manager provides a pre-built event which is logged in the Operations Manager log every 15 minutes if an agent is reporting correctly. To collect this information, log in to OMS (www.microsoft.com/oms) and go to the Data tab under Settings. Add the Operations Manager log to collection as shown below (in this example you could restrict collection to only Information events if you want to minimize the data collected by the agents).
Use the save option on the bottom to commit the addition of the Operations Manager log (note the purple line next to Operations Manager which indicates that the change has not been saved yet).
Enable the alerting preview
As of 1/13/2016 the alerting functionality is in a preview state. If Alerting is not already enabled, enable it in Settings / Preview Features (details are available at: http://blogs.technet.com/b/momteam/archive/2015/12/02/announcing-the-oms-alerting-public-preview.aspx).
Using the log search functionality
Now that we are collecting the appropriate events we can use the log search to find the correct event. It’s easy to start on this by filtering to the Operations Manager event log entries using a query like this:
Type=Event (EventLog="Operations Manager")
For our heartbeat type alert we want event ID # 6022 so that updates our query to the following:
Type=Event (EventLog="Operations Manager") (EventID=6022)
Finally, we want to restrict this query to a specific system and to the last one hour of data which updates the query to the following:
Type=Event (EventLog="Operations Manager") (EventID=6022) TimeGenerated>NOW-1HOURS Computer="<ComputerName>"
(The time restriction idea came from: https://technet.microsoft.com/en-us/library/mt484120.aspx)
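If you need the same query for more than one system, the pieces above can be assembled programmatically. The following is only a minimal sketch: `build_heartbeat_query` is a hypothetical helper (not part of OMS or any Microsoft library) that simply produces the legacy search string shown above, which you would then paste into the log search box. The computer name "Mine02" is used purely as sample input.

```python
def build_heartbeat_query(computer_name: str, event_id: int = 6022, hours: int = 1) -> str:
    """Assemble the legacy OMS search string used in this post.

    Hypothetical helper for illustration only: it formats text and does
    not call OMS in any way.
    """
    return (
        'Type=Event (EventLog="Operations Manager") '
        f"(EventID={event_id}) "
        f"TimeGenerated>NOW-{hours}HOURS "
        f'Computer="{computer_name}"'
    )


# Example: the heartbeat query for a system named Mine02
print(build_heartbeat_query("Mine02"))
```

Keeping the event ID and time window as parameters makes it easy to reuse the same pattern for other periodic events, though this post only relies on event 6022.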
This event should occur every 15 minutes for a functional Operations Manager agent. A successful heartbeat looks like this:
We save this query next using the Save button which was recently moved to the top left corner of the UI:
Creating an alert rule
Next we add an alert rule that notifies us if there have been no successful heartbeat events within a 35 minute timeframe, by checking for the condition of less than 1 result occurring in the last 35 minutes. This condition is checked every 15 minutes, and if the condition is met an email notification is sent.
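To make the alert condition concrete, here is a rough sketch of the logic the rule applies on each check. This is my own illustration and not how OMS implements alerting internally; `should_alert` and the sample timestamps are assumptions made for the example. Since the heartbeat is expected every 15 minutes, a 35 minute window should normally contain at least two events, so a single late event does not trip the alert.

```python
from datetime import datetime, timedelta


def should_alert(heartbeats, now, window_minutes=35, threshold=1):
    """Return True when fewer than `threshold` heartbeats fall in the window.

    Illustrative only: `heartbeats` is a list of datetimes standing in for
    the TimeGenerated values of event 6022 for one computer.
    """
    window_start = now - timedelta(minutes=window_minutes)
    recent = [t for t in heartbeats if window_start < t <= now]
    return len(recent) < threshold


now = datetime(2016, 1, 13, 12, 0)

# Healthy agent: events 5, 20, 35 and 50 minutes ago
healthy = [now - timedelta(minutes=m) for m in (5, 20, 35, 50)]

# Silent agent: the most recent event was 40 minutes ago
silent = [now - timedelta(minutes=m) for m in (40, 55)]

print(should_alert(healthy, now))  # False
print(should_alert(silent, now))   # True
```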
Creating a dashboard
We can also set this up on a dashboard where it goes red if OMS isn’t receiving this type of event. You can create your own dashboard by going to the My Dashboard option from the top level of OMS and adding the query that we saved above with the following visualization. By choosing to highlight when the value is less than 1 we only go to a red state when no heartbeats have occurred during that time interval.
The number displayed on the dashboard should have a value between 1 and 4. One of the interesting features of the dashboard solutions in OMS is that they are easy to access on the mobile edition as well. The example below shows the two different query-based approaches to providing heartbeats that I am using in this blog series when both are in a healthy condition (the top system has 3 heartbeats in the last hour, the bottom shows that there are no systems which OMS is monitoring which have not sent any data in the last hour).
In the example below, the heartbeat monitor for one system is reporting as healthy (at least 1 for the count of heartbeats). The bottom shows a second system called "Mine02" which is currently not reporting into OMS.
Viewing sample email alerts:
An example of these alerts is shown below:
It is important to note that this approach continues to send a notification every 15 minutes for as long as the system is not reporting to OMS.
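The repeat behavior can be reasoned about with a small simulation. This is a sketch under my own assumptions (a check every 15 minutes, a 35 minute window, and a made-up last heartbeat time); `notification_times` is a hypothetical function, not an OMS feature.

```python
from datetime import datetime, timedelta


def notification_times(last_heartbeat, first_check, checks,
                       interval_minutes=15, window_minutes=35):
    """List the check times at which an email would be sent.

    Illustrative only: assumes no heartbeat arrives after `last_heartbeat`
    and that the rule fires whenever the window contains no events.
    """
    fired = []
    for i in range(checks):
        check_time = first_check + timedelta(minutes=i * interval_minutes)
        if check_time - last_heartbeat >= timedelta(minutes=window_minutes):
            fired.append(check_time)
    return fired


last = datetime(2016, 1, 13, 12, 0)          # last heartbeat seen
first_check = datetime(2016, 1, 13, 12, 10)  # first rule evaluation after it

times = notification_times(last, first_check, checks=6)
print([t.strftime("%H:%M") for t in times])
# ['12:40', '12:55', '13:10', '13:25']
```

In this sample the first email arrives 40 minutes after the last heartbeat, and another follows on every subsequent check until the agent starts reporting again.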
Update: A comment on a previous blog post described an approach of logging a particular event on each system every two minutes and alerting if those "heartbeat" type events were not occurring. This is similar in approach to what I am discussing in this blog post, but it is a good variation to consider if you need a more real-time approach to heartbeat checking for multiple systems.
Summary: This requires that the agent is reporting not only to OMS but also to Operations Manager, so it only works on agents which are multihomed to OMS and OpsMgr or in environments where Operations Manager is integrated with OMS. This approach will provide a constant notification (every 15 minutes) for as long as the particular system continues not to report to OMS.
The final blog post in this series will discuss the various challenges I have seen with each of these three approaches.