I have done a few domain upgrades over the years, but during a recent project in which we introduced new Windows Server 2016 domain controllers (DCs) for a client, I discovered something I had never come across before.
At a high level, the normal things I look out for when decommissioning old domain controllers are as follows:
- Ensure your Flexible Single Master Operations (FSMO) roles have been moved to a new DC or DCs.
- Configure the new PDC emulator to point to an appropriate time source.
- Account for manually created replication objects in Active Directory Sites and Services; ideally, the Knowledge Consistency Checker (KCC) should be managing all replication.
- Account for manually allocated IP bridgeheads in Active Directory Sites and Services; again, ideally let the Intersite Topology Generator (ISTG) manage these.
- Ensure that DNS server settings on servers with static IPs, on other devices, and in DHCP scopes have all been updated to point to the DNS service on the new domain controllers before decommissioning the old ones.
- Network monitor – change any services or systems that point to the old domain controllers to point to new domain controllers; for example, applications pointing directly at a DC to perform LDAP queries.
- DFS-N – We commonly see old domain controllers hosting DFS namespaces. These namespaces need to be updated to use servers that will still exist, or you could end up unable to access the namespace.
- Exchange – In very old versions of Exchange you could manually set which global catalog servers Exchange used (not best practice). I have fallen foul of this when removing DCs and killed Exchange as a result. This setting should be set to automatic in Exchange.
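As a rough illustration of the DNS item in the list above, here is a minimal Python sketch that flags any DNS setting still pointing at an old DC. It assumes you have already exported your static server settings, device settings, and DHCP scope options into a simple mapping. The inventory names and the IP addresses (taken from the 192.0.2.0/24 documentation range) are entirely hypothetical and only there to show the idea.

```python
from ipaddress import ip_address

# Hypothetical DNS addresses of the old DCs (documentation range, not real values).
OLD_DC_DNS = {ip_address("192.0.2.10"), ip_address("192.0.2.11")}

# Hypothetical inventory: where a DNS setting lives -> DNS servers configured there.
inventory = {
    "server:app01 (static)": ["192.0.2.10", "192.0.2.20"],
    "server:sql01 (static)": ["192.0.2.20", "192.0.2.21"],
    "dhcp-scope:office-lan": ["192.0.2.11", "192.0.2.10"],
    "device:printer-3f": ["192.0.2.21"],
}


def find_stale_dns(inventory, old_servers):
    """Return {source: [old DNS servers still configured]} for entries needing an update."""
    stale = {}
    for source, servers in inventory.items():
        # Keep only the configured servers that match an old DC address.
        hits = [s for s in servers if ip_address(s) in old_servers]
        if hits:
            stale[source] = hits
    return stale


stale = find_stale_dns(inventory, OLD_DC_DNS)
for source, servers in sorted(stale.items()):
    print(f"{source}: still points at {', '.join(servers)}")
```

Anything the script reports would need updating before the old DCs are switched off. In practice the hard part is building a complete inventory in the first place, which this sketch takes as given.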
With all of these checked off, things were going well. We had successfully installed new domain controllers and decommissioned several old ones, and were down to the last two old DCs to be removed. To assess the impact of finally removing them, we did a shutdown test on both. That evening, while they were offline, we tested everything we could, and all systems seemed to be working. We opted to leave these old DCs offline whilst the general user population started working the next morning, to be doubly sure of the impact.
The next morning, everything looked good, BUT out of 70 or so DirectAccess (DA) users, 6 or 7 could connect but not access any resources.
- But we had tested DirectAccess in the evening…
- And it had worked…!
- And 60+ users were working fine on DirectAccess at the same time as the handful of failures…?
To find out how we resolved this, look out for part 2!