You know it’s been a bad couple of days when, as an Operations Manager administrator, you find yourself looking back fondly on the good ol’ times when your environment only crashed with a “Root Management Server Unavailable” error… This one definitely fits that description. It started with one symptom: the inability to add web monitors to the environment. After digging in for a while, we found that the actual list of issues was significantly longer, including:

  • Could not add web monitors (not displaying as monitored once they were created)
  • Could not reset the health state of anything on the RMS
  • Could not add agents (they would appear displayed as not monitored)
  • The OpsMgr Configuration file was not updating on the RMS
  • The Management pack folder on the RMS was not listing all management packs; only a few of the total were appearing. On the MS servers in the environment, the full list of management packs was appearing.

The key on this one was a 21906 warning message that was occurring semi-frequently in the Operations Manager event log on the RMS. The following is the actual error content:

The request to synchronize state for OpsMgr Health Service identified by "2f6ee74a-14aa-97f0-477a-551cefb3b18a" failed due to the following exception "Microsoft.EnterpriseManagement.Common.DataAccessLayerException: Invalid column name Ratio_471F3B6E_B3BE_C1B9_23B7_392E47EB6BC2 for query MTV_SelectProperty_471f3b6e-b3be-c1b9-23b7-392e47eb6bc2.

   at Microsoft.EnterpriseManagement.Mom.DataAccess.QueryDefinition.GetColumnDefinitionBySourceColumnName(String sourceColumnName, Int32 resultSetIndex)

   at Microsoft.Mom.ConfigService.OpsMgrDataAccess.ConfigurationDataAccessor.QueryInstanceProperties(ReadOnlyCollection1 instances)

   at Microsoft.Mom.ConfigService.DataAccess.DatabaseAccessor.QueryInstanceProperties(ReadOnlyCollection1 instances)

   at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.ConfigurationItems.Instances.CollectPublicProperties(ReadOnlyCollection1 identities, IConfigurationDataAccessor dataAccessor)

   at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.ConfigurationItems.ConfigurationItemCollection2.CollectPublicProperties(IConfigurationDataAccessor dataAccessor)

   at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.ConfigurationItems..ctor(StateContext stateContext, IConfigurationDataAccessor dataAccessor)

   at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.CreateResponse(Managers managers)

   at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.Managers.Synchronize(OnDoSynchronizedWork onDoSynchronizedWork)

   at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.Execute(Managers managers)

   at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.Run(Guid source, String cookie, Managers managers, IConfigurationDataAccessor dataAccessor, Stream stream, IConnection connection)".

Using the following link (http://systemcentersolutions.wordpress.com/2009/11/12/troubleshooting-21023-events/) I was able to determine that one of the management packs (which will remain nameless) was actually causing all of these problems.

I then used the following SQL query to identify the management pack that was causing the problem (the ManagedTypeId is the second GUID in the original error, which appears with underscores in the column name and with hyphens in the query name):

SELECT mp.ManagementPackId, mp.MPName,
mp.MPFriendlyName, mt.TypeName, mt.ManagedTypeTableName
FROM ManagementPack mp
LEFT JOIN ManagedType mt ON mt.ManagementPackId = mp.ManagementPackId
WHERE mt.ManagedTypeId = '471f3b6e-b3be-c1b9-23b7-392e47eb6bc2'
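
If you would rather not pick the GUID out of the event text by eye, the small Python sketch below does it for you. It is only an illustration: the function names are hypothetical, and it assumes that the ManagedTypeId is always the second distinct GUID in the 21906 message, as it was in our case. It prints the same query as above with the GUID filled in.

```python
import re

# A GUID written with hyphens (query name) or underscores (column name).
# The lookbehind stops a match from starting in the middle of a longer hex run.
GUID_RE = re.compile(
    r"(?<![0-9A-Fa-f])[0-9A-Fa-f]{8}(?:[-_][0-9A-Fa-f]{4}){3}[-_][0-9A-Fa-f]{12}"
)

QUERY_TEMPLATE = """SELECT mp.ManagementPackId, mp.MPName,
mp.MPFriendlyName, mt.TypeName, mt.ManagedTypeTableName
FROM ManagementPack mp
LEFT JOIN ManagedType mt ON mt.ManagementPackId = mp.ManagementPackId
WHERE mt.ManagedTypeId = '{guid}'"""


def extract_guids(event_text):
    """Return the distinct GUIDs in the text, lowercase and hyphenated, in order."""
    guids = []
    for match in GUID_RE.finditer(event_text):
        guid = match.group(0).replace("_", "-").lower()
        if guid not in guids:
            guids.append(guid)
    return guids


def managed_type_id(event_text):
    """Assumption: the second distinct GUID in the event is the ManagedTypeId."""
    guids = extract_guids(event_text)
    if len(guids) < 2:
        raise ValueError("expected at least two GUIDs in the event text")
    return guids[1]


def build_query(event_text):
    """Fill the lookup query with the GUID (already validated by the regex)."""
    return QUERY_TEMPLATE.format(guid=managed_type_id(event_text))


if __name__ == "__main__":
    # Excerpt of the 21906 event shown earlier in this post.
    event = (
        'The request to synchronize state for OpsMgr Health Service identified by '
        '"2f6ee74a-14aa-97f0-477a-551cefb3b18a" failed due to the following exception '
        '"Microsoft.EnterpriseManagement.Common.DataAccessLayerException: Invalid column '
        'name Ratio_471F3B6E_B3BE_C1B9_23B7_392E47EB6BC2 for query '
        'MTV_SelectProperty_471f3b6e-b3be-c1b9-23b7-392e47eb6bc2.'
    )
    print(build_query(event))
```

Paste the printed query into a query window against the Operations Manager database. If your event has a different layout, check the extracted GUID by hand before trusting the result.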

Once this management pack was removed from the environment, everything listed at the start of this blog entry began to function correctly again.

Summary: Having strange issues on your RMS? Can’t add agents or web monitors? Check for a 21906 warning in the Operations Manager log on the RMS and, if it’s there, translate the GUID to identify which management pack may be causing the issue.