This was an interesting one! Picture the scenario: I am miles from home and very much looking forward to a holiday in the sun at the end of the week, so I really want to get the customer's Office 365 single sign-on solution deployed and tested with no issues so that I can switch off for a week.
All of our prerequisites were in place and we had a plan we were all happy with, so we were ready to go!
One of our first tasks was installing the first Windows Server 2012 R2 ADFS server. After a successful installation we decided to test it by browsing to the ADFS sign-in page to make sure it was behaving the way we expected.
We simply click the Sign in button on the form that appears and, if everything is OK, we should see a response saying “You are signed in”. Happy days :)
In our case we were getting an error back saying “An error has occurred”. Not good.
After checking the basics, such as proxy exceptions and forcing AD replication, we started to dig into Event Viewer to see what was being logged. Every time we attempted to sign in on ADFS we saw the same two errors logged: Event ID 365 and Event ID 111. :(
Looking into the text of the 365 error we could see this at the top of the error:
System.TypeInitializationException: The type initializer for 'Microsoft.IdentityServer.Service.SecurityTokenService.MSISSecurityTokenService' threw an exception. ---> System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.DeviceRegistration.ADAdapter.ADStore.GetDnsHostNameFromNtdsSettingDN(IDRServerContext context, String distinguishedName)
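The important statement is “Object reference not set to an instance of an object”. The method name in the stack trace, GetDnsHostNameFromNtdsSettingDN, suggests that ADFS was looking up a domain controller's DNS host name in Active Directory at the time. In other words, it is not getting the response it expects back from Active Directory, and it has run home to momma with this generic error.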
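To make that reading of the error repeatable, here is a minimal Python sketch that pulls the innermost exception and the first stack frame out of the event text. The helper name summarise_exception and the EVENT_TEXT constant are my own for illustration; nothing here is part of ADFS, and it only assumes the usual .NET layout where nested exceptions are joined by an arrow.

```python
import re

# Event text as logged in the 365 error (copied from the event above).
EVENT_TEXT = (
    "System.TypeInitializationException: The type initializer for "
    "'Microsoft.IdentityServer.Service.SecurityTokenService.MSISSecurityTokenService' "
    "threw an exception. ---> System.NullReferenceException: "
    "Object reference not set to an instance of an object.\n"
    "   at Microsoft.DeviceRegistration.ADAdapter.ADStore."
    "GetDnsHostNameFromNtdsSettingDN(IDRServerContext context, String distinguishedName)"
)


def summarise_exception(text):
    """Return (innermost exception type, its message, first stack frame)."""
    lines = text.strip().splitlines()
    # Nested exceptions are joined by an arrow; the last segment is the root cause.
    # The character class also tolerates an arrow mangled into a long dash.
    chain = re.split(r"\s*[-\u2014]+>\s*", lines[0])
    exc_type, _, message = chain[-1].partition(": ")
    # Stack frames are the lines that start with "at ".
    frames = [ln.strip()[3:] for ln in lines[1:] if ln.strip().startswith("at ")]
    first_frame = frames[0] if frames else None
    return exc_type, message, first_frame


exc_type, message, frame = summarise_exception(EVENT_TEXT)
print(exc_type)
print(frame)
```

The outer TypeInitializationException is just the wrapper; the innermost exception and the frame it was thrown from are what point at the Active Directory lookup.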
We raised the case with Microsoft Office 365 support, who escalated it to an identity specialist. After checking the ADFS deployment thoroughly and deciding it was not the cause of the issue, the specialist got us to take both Fiddler and Netmon traces. These were uploaded to Microsoft for analysis, but time was pressing, so I decided to look at the Netmon traces myself while awaiting the results from Microsoft support.
I am no expert in Netmon traces and rarely have to look at them, if at all, but a quick search on the Internet told me how to apply basic filters, and I started from the end of the log and worked my way back. In the traces I saw a block of responses which, when you looked deeper into the packets using Netmon, turned out to be LDAP lookups of the Active Directory sites.
We asked the infrastructure team who looked after the DCs (as we had no access to them) to see if they could spot anything obvious. At first it seemed that AD was functional, but a quick look at one of the domain controllers in the affected sites showed that the site itself was not defined in any Active Directory site connections, and Event Viewer was showing many errors about this. This was just an oversight when creating the new Active Directory deployment. After adding the site in and forcing replication, ADFS sprang into life and worked as expected.
The moral of the story is that before installing ADFS into an environment you should perform a basic Active Directory health check to make sure there are no underlying errors with AD, so you can avoid these issues when deploying ADFS. If you are coming across these errors after installing ADFS, check your replication status and domain controller health in Active Directory, as more than likely this is the cause of your issue.
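The check that would have caught this can be sketched in a few lines of Python. I am assuming here that "site connections" means AD site links, and that you have already exported the site names and each site link's member sites (for example with the Get-ADReplicationSite and Get-ADReplicationSiteLink PowerShell cmdlets). The function name and the sample topology below are made up for illustration.

```python
def sites_without_link(sites, site_links):
    """Return the sites that are not a member of any site link, sorted by name.

    sites: iterable of site names.
    site_links: mapping of site link name -> iterable of member site names.
    """
    linked = set()
    for members in site_links.values():
        # Compare case-insensitively, since AD names are not case-sensitive.
        linked.update(name.lower() for name in members)
    return sorted(s for s in sites if s.lower() not in linked)


# Hypothetical topology: "Branch-02" was created but never added to a link.
sites = ["HQ", "Branch-01", "Branch-02"]
site_links = {"DEFAULTIPSITELINK": ["HQ", "Branch-01"]}
print(sites_without_link(sites, site_links))  # prints ['Branch-02']
```

Any site that comes back from this check has no replication path to the rest of the forest, which matches what we saw here.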
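As a starting point for that kind of replication check, here is a rough Python sketch that reads the CSV output of repadmin /showrepl * /csv and lists the replication links reporting failures. The column names are what I understand that output to contain, so treat them as an assumption and check them against your own output. The SAMPLE data, including the domain controller names, is invented purely for illustration.

```python
import csv
import io

# Invented sample in the shape I expect from `repadmin /showrepl * /csv`.
SAMPLE = """showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,HQ,DC01,"DC=example,DC=local",Branch-01,DC02,RPC,0,0,2015-01-01 10:00:00,0
showrepl_INFO,HQ,DC01,"DC=example,DC=local",Branch-02,DC03,RPC,12,2015-01-01 09:45:00,2014-12-30 08:00:00,1722
"""


def failing_links(showrepl_csv):
    """Return the rows whose 'Number of Failures' column is greater than zero."""
    reader = csv.DictReader(io.StringIO(showrepl_csv))
    failures = []
    for row in reader:
        count = (row.get("Number of Failures") or "0").strip()
        # Skip rows where the column is missing or not a plain number.
        if count.isdigit() and int(count) > 0:
            failures.append(row)
    return failures


for row in failing_links(SAMPLE):
    print(row["Source DSA"], "->", row["Destination DSA"],
          "failures:", row["Number of Failures"])
```

On a real domain controller you would feed in the actual command output instead of SAMPLE, and run dcdiag alongside it for the wider domain controller health picture.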
We at risual have an Active Directory health check service that we can perform which produces a report on any remediation or recommendations for the Active Directory deployment.
Hope this helps.