It’s a topic that SharePoint Administrators have raised with me many times, and for some, it seems to be somewhat of a dark art. Occasionally, this is simply down to a misunderstanding of the technology, or not having the confidence to dive in and pick it apart. So, today I will cover what are possibly the two most prevalent signs that your Distributed Cache is unhealthy, what to look for, and how to fix it. Chances are that you’ve got a message in the Health Analyzer and you’re reading this because Bing (other search engines are available… but we don’t talk about them!) brought you here.
The first thing to understand is that the Distributed Cache service for SharePoint actually runs under the Windows Service called AppFabric Caching Service. It stands to reason, then, that each SharePoint server you have configured for Distributed Cache has the AppFabric service running and set to Automatic start-up. Golden rule: do not attempt to manage or recover Distributed Cache / AppFabric from Windows Services or from the AppFabric utility. Great tip, but you probably already knew this and are now thinking “how do I start looking at my Distributed Cache then?”. Well, it’s quite easy: open a Management Shell session (with elevated privileges of course) and run the following cmdlet, replacing anything in [square brackets] with the relevant values for your environment:
- Get-CacheHostConfig -ComputerName [ComputerName] -CachePort 22233
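If you would rather have the check fail loudly instead of scrolling past an error, the same cmdlet can be wrapped in a try/catch. This is only a sketch: the host name is a made-up example, and the call to Use-CacheCluster (the AppFabric cmdlet that sets the cluster context for the session) is an assumption about what your shell needs; on some farms it may be unnecessary.

```powershell
# Sketch only: run from an elevated Management Shell on a cache server.
$computerName = 'Server1.contoso.com'   # hypothetical host name, replace with yours
$cachePort    = 22233

# Assumption: the session needs its cluster context set before other cache cmdlets run.
Use-CacheCluster

try {
    # -ErrorAction Stop turns any non-terminating error into one the catch block sees.
    Get-CacheHostConfig -ComputerName $computerName -CachePort $cachePort -ErrorAction Stop
}
catch {
    # Reports whatever the cmdlet raised, e.g. the host not being present in the cluster.
    Write-Warning "Could not read cache host config for ${computerName}:${cachePort}: $($_.Exception.Message)"
}
```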
If everything is well, some configuration details will be returned (including things like Cluster Port, Cache Port, Arbitration Ports, Service Name, IsLeadHost, and so on…). However, if you receive a message stating “Specified host is not present in cluster”, one of two things has occurred: either you misspelt the ComputerName (or port), or the computer is in fact not present in the cluster. The latter is the most common scenario in my experience. Re-registering a computer is simple enough: from that Management Shell, execute the following cmdlet:
- Register-CacheHost -Provider [ProviderName] -ConnectionString [ConnectionString] -Account "NT Authority\Network Service" -CachePort 22233 -ClusterPort 22234 -ArbitrationPort 22235 -ReplicationPort 22236 -HostName [ComputerName]
Don’t know the Provider and ConnectionString parameters? Don’t panic, they are located in the HKLM registry hive under Software\Microsoft\AppFabric\v1.0\Configuration, so nab them from a Distributed Cache server that is healthy. Alternatively, you could check the DistributedCacheService.exe.config file located in the C:\Program Files\AppFabric v1.0 directory, again from a known-working Distributed Cache server. Consultant tip: include this file (and possibly even the registry key, for belt-and-braces) in your backup regime. You could change the Account parameter, but that’s up to you and environment specific. Assuming Register-CacheHost returned no errors, run the Get-CacheHost cmdlet to list the hosts in the cluster.
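For the registry lookup mentioned above, something along these lines should do, run on a known-healthy Distributed Cache server. Treat it as a sketch: the key path is the one given in this post, and the value names Provider and ConnectionString are my assumption that they match the cmdlet's parameter names, so check what is actually under the key on your server.

```powershell
# Sketch only: run on a known-healthy Distributed Cache server.
# Key path as given in the post; the value names below are assumed.
$key = 'HKLM:\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration'

# Fail with a clear error if the key is missing rather than returning nothing.
$config = Get-ItemProperty -Path $key -ErrorAction Stop

# Show only the two values needed for Register-CacheHost.
$config | Select-Object Provider, ConnectionString | Format-List
```

Copy the two values into the Register-CacheHost call on the server you are re-registering.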
Get-CacheHost returns the cluster along with host names, service status, version, and so on. Now for the second most common problem I’ve seen: one or more of the hosts show as “Down” or “Starting”:
HostName : CachePort Service Name Service Status Version Info
Server1.contoso.com:22233 AppFabricCachingService UP 3 [3,3][1,3]
Server2.contoso.com:22233 AppFabricCachingService DOWN 3 [3,3][1,3]
The first thing you should attempt (and please don’t dive in at the deep end as many other blogs suggest) is this cmdlet, to get things started:
- Start-CacheHost -ComputerName [ComputerName] -CachePort 22233
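With more than a couple of cache servers, picking out the unhealthy hosts by eye gets tedious, so here is a small helper as a sketch. Select-UnhealthyCacheHost is a name I have made up for illustration, and it assumes the objects coming down the pipeline expose HostName, PortNo and Status properties, which is how the Get-CacheHost output appears to be shaped; verify that with Get-Member on your own farm first.

```powershell
# Hypothetical helper: passes through any host whose status is not 'Up'.
# Assumption: each input object has HostName, PortNo and Status properties.
function Select-UnhealthyCacheHost {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $CacheHost
    )
    process {
        # Compare as a string; -ne is case-insensitive, so 'UP' and 'Up' both count as healthy.
        if ("$($CacheHost.Status)" -ne 'Up') { $CacheHost }
    }
}

# Usage against a live cluster (elevated Management Shell), left commented out:
# Get-CacheHost | Select-UnhealthyCacheHost | ForEach-Object {
#     Start-CacheHost -ComputerName $_.HostName -CachePort $_.PortNo
# }
```

Keeping the filter separate from the start call means you can run the first half on its own and see what would be touched before starting anything.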
If that doesn’t work, you may need to consider re-importing the cluster configuration. This is an easy, but slightly lengthy, process, so I will post it separately later this week. EDIT: You can check it out here: Export/Import Distributed Cache Configuration in SharePoint 2013
Hope this helps our fellow SharePointeers out there!