Hyper-V 3.0: Unable to contact cluster service on ‘cluster’

November 26th, 2012 | Azure, Cloud, Windows

Hi There,

I recently spent some time upgrading a WS2012 Hyper-V 3.0 cluster from RC to RTM. This was a four-node cluster, and my process was to evict a node, rebuild it, and then add the node back into the cluster. All went relatively smoothly until I wanted to add the cluster back into SCVMM.

I ran the Add Host wizard to re-add the cluster and received the following error:

Add-SCVMHostCluster : Unable to contact cluster service on <clusterName>

I started hitting the logs on the VMM server and could see the following event in the System log:

DCOM was unable to communicate with the computer <ClusterName> using any of the configured protocols

I could also see similar DCOM-related errors on each of the four nodes.

I started by checking that the service account used for vmmservice.exe on the VMM server was a member of the local Administrators group on all nodes in the cluster. I then checked that the DCOM permissions allowed remote access via compmgmt. This all checked out fine.

I then noticed another sporadic error in the System log on the VMM server:

The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server <serverName>$. The target name used was RPCSS/<ClusterName>
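
KRB_AP_ERR_MODIFIED generally means the server that received the ticket could not decrypt it, which often points at a stale or mismatched computer account password. When sifting through many of these events, it can help to pull out the responding server and the target name and compare them. The following is only a rough sketch in Python: the regular expression is based purely on the message text quoted above, and the sample names (hv-node1, hv-cluster) are hypothetical.

```python
import re

# Pattern derived from the event text quoted in this post; real events may
# carry extra trailing text, so this is an assumption rather than a spec.
PATTERN = re.compile(
    r"KRB_AP_ERR_MODIFIED error from the server (?P<server>\S+?)\$?\.\s+"
    r"The target name used was (?P<spn>\S+)"
)


def parse_kerberos_event(message):
    """Return the responding server and the target SPN parts, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    # Drop a trailing full stop, then split "service/host" on the first "/".
    service, _, host = match.group("spn").rstrip(".").partition("/")
    return {"server": match.group("server"), "service": service, "host": host}


if __name__ == "__main__":
    # Hypothetical sample: node and cluster names are placeholders.
    sample = (
        "The Kerberos client received a KRB_AP_ERR_MODIFIED error from the "
        "server hv-node1$. The target name used was RPCSS/hv-cluster"
    )
    print(parse_kerberos_event(sample))
```

If the server and the host in the target name differ, as they did here, the ticket was requested for one name but answered by another account, which is a hint to look at the computer accounts involved.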

The serverName was the cluster group owner. I then realized I had not reset the computer accounts when rejoining the reinstalled nodes to the domain. The next logical step was to reset the computer account of every node, which I did by running the following command (on a domain member, as a domain administrator or with delegated permissions):

netdom reset <machineName> /domain:<domainName>
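
With four nodes to reset, it can be handy to generate the commands rather than type each one. Below is a minimal sketch in Python, assuming hypothetical node and domain names; it only prints the commands by default and runs them only if dry_run is set to False on a Windows machine where netdom is available.

```python
import subprocess

# Hypothetical names for illustration; replace with your own nodes and domain.
NODES = ["hv-node1", "hv-node2", "hv-node3", "hv-node4"]
DOMAIN = "example.local"


def netdom_reset_command(machine, domain):
    """Build the argument list for: netdom reset <machine> /domain:<domain>."""
    return ["netdom", "reset", machine, f"/domain:{domain}"]


def reset_all(nodes, domain, dry_run=True):
    """Print each reset command; execute it only when dry_run is False."""
    for node in nodes:
        cmd = netdom_reset_command(node, domain)
        print(" ".join(cmd))
        if not dry_run:
            # Needs Windows, netdom on the PATH, and sufficient domain rights.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    reset_all(NODES, DOMAIN)
```

Keeping the dry run as the default makes it easy to review the exact commands before anything touches the domain.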

Reviewing the event log, the Kerberos error messages now seemed to have disappeared; however, I was still unable to add the cluster to VMM.

Given that the error message was specific about VMM being unable to contact the cluster service, I suspected that, because of the errors between the nodes and the domain, the cluster virtual object had not registered correctly. This is fixed by opening Failover Cluster Manager, selecting the cluster name, right-clicking the cluster object in the middle pane and choosing ‘Repair’. I then took the name and IP offline and brought them back online. Checking ADSI Edit, I could see that the cluster object had been updated (which it had not been since it was created).

I rebooted the VMM instance and attempted the cluster add again and it now worked 🙂

Thanks

Steve