Domain Controller decommissioning – a new problem (part 2)

In the previous blog I outlined an issue we came across with DirectAccess (DA) when decommissioning a customer’s final legacy domain controllers after a domain upgrade. To summarize part 1: our own DA testing had worked, but the next morning, when live users logged on via DA, 90% of clients could connect and access corporate systems, while the other 10% could connect but not access anything.

My chain of thought at the time was as follows. If there had been a server-side, firewall or network issue, my gut said DA would probably have failed at the connection stage for everyone, so the cause was more likely a difference on the client. As the problem DA clients could connect but not reach any resource, it was possibly DNS related. We collected DirectAccess logs from a working DA client and from a client that could connect but not reach resources, and compared them.

Comparing the two DirectAccess logs side by side, something interesting appeared: in the DirectAccess Policy-ClientToInfra section of the log there were “endpoints”, which are the IP addresses of the domain controllers DA uses.

The working client had an up-to-date list of endpoints that included the new Windows Server 2016 domain controller IP addresses.
The failed client still had a list of IP addresses that contained only the old DCs (DNS servers) and none of the new ones; the old DCs were by now either decommissioned or offline thanks to the shutdown test.
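The comparison itself is just a set difference between the two endpoint lists. Here is a minimal sketch in Python, assuming the endpoint addresses have already been copied out of each log by hand. The function name `compare_endpoints` and the addresses are made up for illustration (they come from the 192.0.2.0/24 documentation range) and are not taken from the customer’s environment.

```python
import ipaddress


def compare_endpoints(working, failing):
    """Return the endpoints that appear on only one of the two clients."""
    # Parsing with ipaddress validates each entry and normalises its format,
    # so the same address written two ways still compares as equal.
    w = {ipaddress.ip_address(a) for a in working}
    f = {ipaddress.ip_address(a) for a in failing}
    return {
        "only_on_working": sorted(str(a) for a in w - f),
        "only_on_failing": sorted(str(a) for a in f - w),
    }


# Hypothetical endpoint lists copied from the two logs.
working_client = ["192.0.2.11", "192.0.2.21", "192.0.2.22"]
failing_client = ["192.0.2.10", "192.0.2.11"]

result = compare_endpoints(working_client, failing_client)
print(result)
```

Anything listed under `only_on_working` is an endpoint the failing client has never been told about, which in our case would be the new DCs.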

We double-checked on the new domain controllers that were online that the DA client GPOs were consistent across all DCs and showed an up-to-date list of endpoints, which they did.

So the issue appeared to be that a handful of clients had not run a gpupdate since the new DCs were added. They were therefore “orphaned”: their DA client GPO still pointed to the old DCs, which had been either fully decommissioned or temporarily shut down.

The laptops we had used to test DA in the office that evening had been on the network and refreshing Group Policy regularly, which explains why our DA testing had all worked fine: they had an up-to-date endpoint list.

To fix this, we brought the old domain controllers back up. This allowed DA to function properly on a failed DA client, which could now access resources once connected. We then forced a gpupdate on the same machine and observed the DA client GPO update to match the version on the DCs, with the up-to-date endpoint IP addresses.

The main reason this happened was probably that we added the new domain controllers and began decommissioning the old ones within a fairly short space of time, during which some clients had not connected to the network. If the new 2016 DCs and the old ones had coexisted for longer, there would have been a greater chance that gpupdates would have happened and that all clients would have picked up the most up-to-date DA client settings.

If we had fully decommissioned all the old DCs rather than temporarily shutting down the last two, the orphaned DA clients would have had to come into the office and run a gpupdate on the office network before they could use DA again.

As it was, we kept the old DCs online for a while; several VIP staff were out and about and we did not want to risk giving them connection problems. We ended up waiting another few weeks before the final decommissioning of the old domain controllers, and in that time the local team ensured that all clients had updated their DA client GPO.
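A check like the one the local team carried out could be sketched roughly as follows. This is an illustration only, not the process actually used: it assumes you already have some inventory mapping each client to the endpoint list in its DA client GPO, and the name `clients_at_risk`, the machine names and the addresses are all hypothetical.

```python
import ipaddress


def clients_at_risk(client_endpoints, dcs_to_retire):
    """Return names of clients whose every endpoint is a DC due for retirement."""
    retiring = {ipaddress.ip_address(a) for a in dcs_to_retire}
    at_risk = []
    for name, endpoints in client_endpoints.items():
        addrs = {ipaddress.ip_address(a) for a in endpoints}
        # A client with no endpoints, or only retiring ones, would be orphaned.
        if addrs <= retiring:
            at_risk.append(name)
    return sorted(at_risk)


# Hypothetical inventory: client name -> endpoints in its DA client GPO.
inventory = {
    "LAPTOP-A": ["192.0.2.21", "192.0.2.22"],  # new DCs only
    "LAPTOP-B": ["192.0.2.10", "192.0.2.11"],  # old DCs only
    "LAPTOP-C": ["192.0.2.11", "192.0.2.21"],  # mixed
}
old_dcs = ["192.0.2.10", "192.0.2.11"]

print(clients_at_risk(inventory, old_dcs))  # ['LAPTOP-B']
```

Any client this returns would need a gpupdate while the old DCs are still reachable, before the final decommissioning goes ahead.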

About the author

risual