What if you have a geographically dispersed environment that uses redundant WAN links to provide connectivity between locations, and you want to monitor these links with Operations Manager? You are using Operations Manager 2007 to monitor the environment and you want to know whether a remote location is up or down, and to be alerted when specific links go offline. A while back I discussed the concept of using TCP Port monitors to provide rudimentary network monitoring (up/down and response time), which is available at http://cameronfuller.spaces.live.com/blog/cns!A231E4EB0417CB76!993.entry. With TCP Port monitors we can determine whether a router is up or down and how quickly it responds to the test. However, this gets more complex when you have redundant links. For our example, we want an alert if we lose either link to the remote location, but we want a critical alert if we lose all connectivity to the remote location. This type of situation is an example of where distributed applications can be extremely useful.
To provide redundancy in case a single watcher node failed, we configured two different watcher nodes for each network link we were monitoring. Each watcher node ran a TCP port monitor against the IP address of the router or switch on port 22. We used the Distributed Application Designer to model this network configuration as shown below:
We configured each link to provide a warning alert if both of its watcher nodes reported an error. We used the Rollup Algorithm of Best Health State, so that a single watcher node experiencing an issue is not enough to indicate that the network link is down.
We configured the top level of the distributed application to provide a critical alert if both network links are experiencing issues (again using the Rollup Algorithm of Best Health State).
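The two rollup levels described above can be sketched in a few lines of Python. This is only an illustration of the best-health-state logic, not how Operations Manager implements it; the `Health` enum, the function names, the link names, and the "Austin-Denver" label are all made up for this example.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Health(IntEnum):
    # Lower value = better health, so min() picks the best state.
    HEALTHY = 0
    ERROR = 1


def best_health_state(states: List[Health]) -> Health:
    """Roll up to the best member state: unhealthy only if ALL members are."""
    return min(states)


def evaluate(links: Dict[str, List[bool]],
             app_name: str = "Austin-Denver") -> List[Tuple[str, str]]:
    """links maps a link name to its watcher results (True = port answered).

    Returns (severity, source) alerts: a warning per link that every watcher
    reports down, and a critical alert if every link is down.
    """
    alerts = []
    link_states = []
    for name, watcher_results in links.items():
        watchers = [Health.HEALTHY if ok else Health.ERROR
                    for ok in watcher_results]
        state = best_health_state(watchers)
        link_states.append(state)
        if state is Health.ERROR:
            alerts.append(("warning", name))
    if best_health_state(link_states) is Health.ERROR:
        alerts.append(("critical", app_name))
    return alerts
```

With one watcher failing on one link, `evaluate({"link-a": [True, False], "link-b": [True, True]})` returns no alerts. The warning only appears once both watchers for a link fail, and the critical alert only once both links are in that state.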
To provide a high-level overview of the state of the entire network, we use Savision Live Maps to show a map of the locations, including the network link which we created between Austin and Denver.
To find the IP addresses for the remote network devices, we performed a traceroute to a server in the remote location. From that traceroute, we took the IP address listed immediately before the server’s IP address and tested it via telnet to port 22 to see if it was a networking device. If it was not, we moved back one IP address on the traceroute and repeated the test until we found the appropriate network device.
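The walk back through the traceroute can also be scripted rather than done by hand with telnet. The following is a rough sketch under a few assumptions: the hop list is supplied by you (it does not run traceroute itself), the function names are hypothetical, and an answer on port 22 is treated as "this is the networking device" exactly as in the manual test above, which is a heuristic rather than proof.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple


def tcp_probe(host: str, port: int = 22,
              timeout: float = 3.0) -> Tuple[bool, Optional[float]]:
    """Try a TCP connection; return (reachable, seconds taken to connect)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, None


def find_network_device(
    hops: List[str],
    is_device: Callable[[str], bool] = lambda ip: tcp_probe(ip)[0],
) -> Optional[str]:
    """hops: traceroute IPs in order, with the target server as the last entry.

    Starts at the hop just before the server and moves back one hop at a
    time until a hop answers the probe. Returns None if no hop answers.
    """
    for ip in reversed(hops[:-1]):
        if is_device(ip):
            return ip
    return None
```

For example, `find_network_device(["10.0.0.1", "10.1.0.1", "10.2.0.1", "10.2.0.50"])` (placeholder addresses) would probe 10.2.0.1 first, then 10.1.0.1, and so on. The `is_device` parameter is there so the check can be swapped out or faked without touching the network.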
Summary: By combining tools built into Operations Manager (TCP Port monitors and Distributed Applications) with third-party tools such as Savision Live Maps, we can effectively model even redundant network connectivity between locations and provide alerts on both the loss of a single link and the complete loss of connectivity between locations.