Domain Controller in USN Rollback

USN Rollback?  WTF is That?

So, you rolled into work this morning, got your coffee, and settled in at your desk.  It’s a nice Friday morning and the weekend is looking great.  Life is good.  You didn’t even have to do the domain controller patching last night.  Your junior admin did.

Then…

You open your email… and find a half-dozen tickets, all reporting that several newly provisioned users created earlier this morning can’t log in.  You think to yourself, “No biggie, I’ll just force replication and move on.”  Easy-peasy, right?

Wrong.

After forcing replication among your Active Directory domain controllers several times, you realize that the new accounts refuse to show up on any DC except the one your helpdesk typically connects to when provisioning users.  So, you think to yourself, “Is this how my day is going to go?”

Yes.  Yes it is.

You start troubleshooting on the domain controller that the helpdesk uses and find that the Netlogon service is paused.  Well, that’s weird.  Being the highly skilled troubleshooter you are, you opt NOT to restart Netlogon and instead dig into the event logs for more information, only to find some event ID 2013 entries along with replication errors 8456 and 8457 floating around.  Not good.

After some further research, you suspect that your domain controller is in USN Rollback mode.

WTF is USN Rollback?

USN Rollback means that your day just got complicated.

USN Rollback is what happens when a domain controller’s update sequence number (USN), the counter it stamps on every change it makes, moves backward.  Active Directory treats that as a sign the DC could replicate stale info throughout AD, so it protects the rest of the directory by disabling replication to and from the affected DC.  For example, if you take a snapshot of a domain controller VM, patch the VM, and then roll back to the snapshot because of issues with patching, the domain controller is likely to go into USN Rollback, especially if any AD replication occurred before you rolled back to the snapshot.
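To see why a rolled-back DC can’t simply keep replicating, here is a toy Python model.  It is a simplified sketch, not how AD is implemented: it assumes one DC and one replication partner, and it collapses AD’s per-partition high-watermarks and up-to-dateness vectors into a single watermark per invocation ID.  The class and method names (ToyDC, ToyPartner, pull_from, sees_rollback) are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyDC:
    invocation_id: str
    usn: int = 0                                  # highest USN this DC has issued
    changes: dict = field(default_factory=dict)   # USN -> name of the object written

    def write(self, obj):
        """Make an originating change and stamp it with the next USN."""
        self.usn += 1
        self.changes[self.usn] = obj


@dataclass
class ToyPartner:
    # Per source invocation ID: the highest source USN already pulled.
    watermark: dict = field(default_factory=dict)
    received: list = field(default_factory=list)

    def pull_from(self, src):
        """Request only changes stamped with a USN above the stored watermark."""
        seen = self.watermark.get(src.invocation_id, 0)
        new = [obj for usn, obj in sorted(src.changes.items()) if usn > seen]
        self.received.extend(new)
        self.watermark[src.invocation_id] = max(seen, src.usn)
        return new

    def sees_rollback(self, src):
        """Roughly the mismatch AD checks for: a partner has seen USNs the source no longer has."""
        return self.watermark.get(src.invocation_id, 0) > src.usn


dc = ToyDC(invocation_id="A")
partner = ToyPartner()

dc.write("user-before-snapshot")   # USN 1
snapshot = copy.deepcopy(dc)       # VM snapshot taken before patching
dc.write("patch-night-change")     # USN 2
partner.pull_from(dc)              # partner's watermark for "A" is now 2

dc = snapshot                      # snapshot restored: USN drops back to 1,
                                   # but the invocation ID is still "A"
print(partner.sees_rollback(dc))   # True: real AD would quarantine the DC here

# Without that protection, new work silently fails to replicate:
dc.write("friday-morning-user")    # reuses USN 2, already "seen" by the partner
print(partner.pull_from(dc))       # []
```

Once the counter goes backward, new changes reuse USNs that partners believe they already have, so those changes are never requested.  Quarantining the DC turns that silent divergence into a loud failure.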

Guess what your junior admin did last night…

Your Windows Server 2008 R2 domain controller now refuses to replicate to or from any other domain controller, and any changes made on the affected DC are stuck on that DC and will not be replicated to the rest of Active Directory.  Running repadmin /replsummary shows that replication is failing to and from the affected domain controller.  This explains why the new users cannot log in.
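If you check replication from a script, a small Python helper can flag the two errors that typically accompany this situation: 8456 (the source server is currently rejecting replication requests) and 8457 (the destination server is currently rejecting replication requests).  This is a sketch: it assumes repadmin is on the PATH of a Windows machine and that repadmin prints the numeric error code in parentheses, which is worth confirming against your own output.  The function names are hypothetical.

```python
import re
import subprocess
import sys

# Replication error codes reported when a DC has disabled replication.
QUARANTINE_ERRORS = {
    8456: "The source server is currently rejecting replication requests.",
    8457: "The destination server is currently rejecting replication requests.",
}
# Assumption: repadmin prints the numeric error code in parentheses, e.g. "(8456)".
_ERROR_CODE = re.compile(r"\((\d+)\)")


def find_quarantine_errors(report):
    """Return (line, code) pairs for report lines that mention 8456 or 8457."""
    hits = []
    for line in report.splitlines():
        for match in _ERROR_CODE.finditer(line):
            code = int(match.group(1))
            if code in QUARANTINE_ERRORS:
                hits.append((line.strip(), code))
    return hits


def run_replsummary():
    """Run repadmin /replsummary and return its text output (Windows with AD DS tools)."""
    result = subprocess.run(
        ["repadmin", "/replsummary"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__" and sys.platform == "win32":
    for line, code in find_quarantine_errors(run_replsummary()):
        print(f"{code}: {QUARANTINE_ERRORS[code]}\n    {line}")
```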

To confirm your suspicions, you open Regedit on the affected DC and browse to the following path:

HKLM\System\CurrentControlSet\Services\NTDS\Parameters

While in there, you see a value named “DSA Not Writable” with data of 4.  Now you know for certain that your Active Directory domain controller is in USN Rollback.  So, now what?  Well, fix it!  You have system state backups for the DC in question, don’t you?  If so, it’s time to break them out so you can do a non-authoritative restore on the DC and re-enable replication.
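The same registry check can be scripted with Python’s standard winreg module.  This is a minimal sketch that only runs on Windows; the helper names are hypothetical, and it treats a missing key or value as “flag not set.”

```python
import sys

# Registry location named in the text (under HKEY_LOCAL_MACHINE).
NTDS_PARAMETERS = r"System\CurrentControlSet\Services\NTDS\Parameters"
DSA_NOT_WRITABLE = "DSA Not Writable"
USN_ROLLBACK = 4  # the data value described above as indicating USN Rollback


def indicates_usn_rollback(value):
    """True when the 'DSA Not Writable' data equals 4."""
    return value == USN_ROLLBACK


def read_dsa_not_writable():
    """Return the 'DSA Not Writable' DWORD, or None if it is absent.

    Windows only: winreg is not available on other platforms.
    """
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NTDS_PARAMETERS) as key:
            data, _value_type = winreg.QueryValueEx(key, DSA_NOT_WRITABLE)
            return data
    except FileNotFoundError:
        # The key or the value does not exist (e.g. not a DC, or no flag set).
        return None


if __name__ == "__main__" and sys.platform == "win32":
    data = read_dsa_not_writable()
    print(f"{DSA_NOT_WRITABLE} = {data!r}; USN Rollback: {indicates_usn_rollback(data)}")
```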

Although USN Rollback isn’t terribly difficult to recover from (if you are prepared), it DOES become a bit more of a headache if:

  1. You are doing what lots of companies do and multi-tasking your domain controllers (running other services on them)
  2. You have no good system state backups
  3. You have no idea what the DSRM password is (has anyone EVER actually documented this??)
  4. You are dealing with any combination of items 1, 2, and 3

In your case, you realize that your affected domain controller also hosts your internal Certificate Authority.  Peachy!  Oh, you also have no idea what the DSRM password is for the affected DC.  It was never documented.  Yikes.

Without the DSRM password, a non-authoritative restore is out of the question.  That would have been the easiest solution.  As such, you are now left with only one choice – demote the affected domain controller and then re-promote it.  Your problem, however, is that you cannot demote an Active Directory domain controller if Certificate Services is installed.  You have to uninstall Certificate Services first.  Ugh.

Although it sounds scary on the surface, uninstalling Certificate Services isn’t too terrible, provided you aren’t doing anything crazy.  Microsoft has a great document that covers the process of moving your CA to another server, which, by the way, is what I recommend in this particular case.  If you have to pull Certificate Services off of your domain controller anyway in order to demote and re-promote it, you might as well move the CA to its own server as a matter of best practice.

After migrating your Certificate Authority to a new server, it’s time to get the DC fixed.

Since you are still using an older OS (Windows Server 2008 R2), go ahead and launch dcpromo /forceremoval from a command prompt.  You have to use the /forceremoval switch because the affected DC cannot replicate out, so a regular dcpromo is not going to let you demote the domain controller gracefully.

During the forced demotion, you are going to see a few prompts that you have to respond to.  The two you care most about are the DNS cleanup and whether or not this is the last DC in the forest.  You DO want to remove the DNS zones from this DC.  However, do NOT tell it that this is the last domain controller.  Bad things will happen.  Once the forced demotion is complete, the server will reboot and no longer be part of the domain.  Shut it down.

At this point, go into Active Directory Users and Computers on a healthy DC and delete the computer object for your demoted DC from the Domain Controllers OU.  Next, go into AD Sites and Services, expand the demoted server, and delete the NTDS Settings object under it.  Once you’ve done that, delete the server object itself from AD Sites and Services.  After these steps, your AD should be clean of any remaining metadata referencing the demoted domain controller.
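If you want to double-check the cleanup by distinguished name (for example, in ADSI Edit or an LDAP tool), the three objects above follow a predictable pattern.  The Python sketch below builds them, assuming a single-domain forest (so the Configuration partition sits under the same domain DN) and that the DC’s computer account lives in the default Domain Controllers OU.  The DC, site, and domain names in the usage line are hypothetical.

```python
def dc_metadata_dns(dc_name, site, domain_fqdn):
    """Distinguished names of the objects removed during metadata cleanup.

    Assumptions: single-domain forest (Configuration partition under the same
    domain DN) and the computer account in the default Domain Controllers OU.
    Keys are returned in the order the cleanup steps delete them.
    """
    domain_dn = ",".join(f"DC={label}" for label in domain_fqdn.split("."))
    server_dn = f"CN={dc_name},CN=Servers,CN={site},CN=Sites,CN=Configuration,{domain_dn}"
    return {
        "computer": f"CN={dc_name},OU=Domain Controllers,{domain_dn}",
        "ntds_settings": f"CN=NTDS Settings,{server_dn}",
        "server": server_dn,
    }


# Hypothetical names; substitute your own DC, site, and domain.
for role, dn in dc_metadata_dns("DC01", "Default-First-Site-Name", "corp.example.com").items():
    print(f"{role:14} {dn}")
```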

Now that the metadata is cleaned up, you can go ahead and turn the demoted DC back on, re-join it to the domain, and re-promote it to a DC.  Once you’ve done that, you can use repadmin /replsummary and repadmin /showreps to confirm replication to/from it is working again.  You can also refresh AD Sites and Services to confirm that the Active Directory KCC is rebuilding replication links to/from the newly-promoted domain controller.

Once you’ve confirmed that replication is working again and that the DC is no longer in USN Rollback, you can go back to having a good day – after, of course, recreating the six user accounts that were created on the affected DC while it was in USN Rollback since those changes were lost when you performed the forced demotion.
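Working out which accounts to recreate is easier if you export the account names from the affected DC and from a healthy DC before running the forced demotion (how you export them is up to you and not shown here).  Given those two lists, the comparison is a simple case-insensitive difference, as in this Python sketch; the function name and sample account names are hypothetical.

```python
def accounts_to_recreate(affected_dc_accounts, healthy_dc_accounts):
    """Account names found on the affected DC but not on a healthy DC.

    Comparison is case-insensitive, since sAMAccountName values are.
    """
    healthy = {name.casefold() for name in healthy_dc_accounts}
    return sorted(name for name in affected_dc_accounts if name.casefold() not in healthy)


# Hypothetical exports captured before the forced demotion.
affected = ["jsmith", "adoe", "new.hire1", "new.hire2"]
healthy = ["JSmith", "adoe"]
print(accounts_to_recreate(affected, healthy))  # ['new.hire1', 'new.hire2']
```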

Folks, there are a few key takeaways from this scenario:

  • Stop sharing your DCs with other services
  • VM Snapshots are bad news when it comes to DCs
  • ALWAYS record the DSRM password when you promote a DC (even though nobody else does)

This Active Directory administrator’s life became significantly more difficult on a beautiful Friday morning because some simple best practices were never followed.  Had they been, this entire exercise would have been reduced to a simple non-authoritative restore of the affected DC that would have taken no more than an hour.  Instead, the process required significantly more effort, lots of hand-wringing, and a wasted Saturday.

This has been an Active Directory Public Service Announcement.


Thomas Mitchell

Tom is a 20+ year veteran of the IT industry and carries numerous Microsoft certifications, including the MCSE: Cloud Platform and Infrastructure certification. A Subject Matter Expert in Active Directory and Microsoft Exchange, Tom also possesses expert-level knowledge in several other IT disciplines, including Azure, Storage, and O365/Exchange Online. You can find Tom at his website, on LinkedIn, or on Facebook. Need to reach him by phone? Call 484-334-2790.