Rebuild NTFS, Fix STOP 0x00000071:
SESSION5_INITIALIZATION_FAILED

The short version: Go to the Advanced Boot Options Menu and choose "Last Known Good Configuration" and it will probably boot up (we hope). You need to understand, though, that something has corrupted your SYSTEM registry hive at some point, and this error comes from damage to the part of it required to boot, so run some memory and hard drive tests. If the damage is already done, you may want to plan to reinstall, or maybe even try the risky filesystem repair operation we walk you through at the end of this article.

We recently ran into this nasty BSOD on an XP machine which was once resurrected from a dead machine that had lots of bad capacitors on the motherboard. We performed a couple of disk checks in the past due to filesystem damage from said capacitors failing over time and mangling things here and there, and those had been just fine. However, there came a little snag recently: we have this machine run an automated Linux-based backup system (which is based on the freely available Tritech Service System) on a schedule overnight. The backup system connects to a server and uses "rsync" to back up the files to that server, and it's designed to NOT reboot the machine if something goes wrong so that the business can report to us that their backup system isn't operating properly.

Sure enough, the backup system eventually stopped rebooting the machine after completing. On the server, we could see that backups were finishing, so we scheduled a manual backup session with the personnel. rsync reported that some files could not be copied, and that concerned us, so we had them schedule a disk check. The disk check ran and rebooted the system...and we were greeted, in normal AND safe mode, with STOP 0x00000071: SESSION5_INITIALIZATION_FAILED. Ugh!

The only solutions offered by various websites that address this BSOD are to either boot to Last Known Good Configuration or do a clean reinstall of Windows. Unfortunately, our experience with this particular machine reinforces those conclusions, but we'd like to explain exactly what is going on so that the more technically inclined can learn from our experience. That has to begin with a brief discussion of the Last Known Good Configuration boot option, what it does, and why it "fixes" the problem. (Just to be clear, the workings of Last Known Good described here are observations and may not be 100% correct as far as how it operates, so take the general concepts away but don't trust the explanations as if every detail is correct.)

Windows has at least two complete copies of the system configuration (which includes drivers, services, and critical low-level configuration options such as paging files) at any given time. If you use the hardware profile feature of Windows, there will be additional configurations for each extra hardware profile. One copy is the "selected" or "active" copy, while the other is the Last Known Good copy. If you make a system-wide configuration change, it doesn't get copied over from the active copy to the Last Known Good copy until you've successfully rebooted the system completely. In other words, it worked, so it's known to be good, and it becomes the new Last Known Good Configuration (wow, something that makes sense!).

What causes the SESSION5_INITIALIZATION_FAILED problem is damage to the active system configuration profile in the registry, which tells the system how to boot up properly. This can happen in a number of ways. In our customer's case, the registry was probably damaged because of filesystem damage. Our customer's computer is a newer Vista computer with a hard drive that was pulled from an older XP system because 100% of the older computer's motherboard capacitors had blown out, mangling the filesystem and various files along the way. NTFS filesystem damage appears to be the number one cause of STOP 0x71 errors: CHKDSK is frequently the trigger, and the BSOD then sets off a "CHKDSK loop" where the system alternates between automatic disk checks and STOP 0x71 BSODs. When we first encountered the devil that is 0x71, we didn't manage to figure it out, and all signs pointed to a reinstall, but we've since found a possible (albeit very experimental) alternate solution.

Enter the fantastic live Linux known as the Tritech Service System. Once again, the custom Linux distribution from c02ware saves the day. In short, the customer is 700+ miles away, and this system had TSS as a boot option, so we had someone at their location boot TSS and remotely link us in by SSH. Once at a TSS prompt, we were able to mount their main server via CIFS, dump an exact image of the hard drive AND a complete copy of all files to it, download a compacted and compressed hard drive image of a 100% empty NTFS, unpack it, resize it, dump the files back to it, reboot into Windows, and fix the attributes and permissions that are inevitably lost during this process.

"Holy cow!" you might be thinking. We're going to explain, here and now in this very post, exactly how to do this clean NTFS rebuild thing to eliminate all possibility of unfixable filesystem damage, all with the Tritech Service System. Learning Linux command line stuff is beyond the scope of this article. You MUST have an external hard drive, preferably NTFS formatted, before attempting this.

WARNING: THE DIRECTIONS BELOW ARE NOT GUARANTEED TO WORK, THE PROCEDURE IS COMPLICATED, AND IF YOU AREN'T COMFORTABLE DOING IT, DON'T. If you find errors or problems, please leave a comment and we will review your concern as quickly as possible. DO NOT COMMENT JUST BECAUSE IT DOESN'T WORK UNLESS YOU CAN PROVIDE INFORMATION TO HELP FIX THE PROBLEM. ONCE AGAIN, THIS IS RISKY. YOU HAVE BEEN WARNED.

You need to boot Windows using Last Known Good Configuration, aborting any disk check that may attempt to run, and make sure you get into Windows completely and shut down cleanly. If you don't get that far, you're wasting your time and you've got no options but to do a clean reinstall anyway. The backup directions below could help you get your files if you're unable to boot, even if you have no choice but to reinstall.

First, you have to get a good NTFS disk image somehow, before you can even THINK of doing this. The easiest way to do this is to take any hard drive with Windows XP installed to the first and ONLY partition (no Dell diagnostics, no other partitions at all) and clone it to your own. This assumes your Windows OS is also installed this way, and performing this operation if Windows is installed to a partition other than the first is beyond the scope of this article. Additionally, the disk you're cloning from needs to be the same size or smaller than the one which will be cloned over.

Once you have a drive with an undamaged NTFS on it in your possession, you need to back up your files using Linux, and you need to understand that you could potentially be forced to do a reinstall if this doesn't work. There is no turning back once you start cloning or decompressing a disk image to your hard drive! BE PREPARED FOR THIS TO FAIL.

Boot TSS from a burned CD or USB flash drive, logging in as "root" with no password. Go ahead and plug in your external hard drive once the system starts to boot up. If you're not using TSS "base" then you'll need to right-click on the wallpaper and run "rxvt" at the top of the menu to get a terminal to work with. Feel free to maximize it to make life easier if you wish. Once you've got a terminal (with the root@tss:~ # prompt) you're ready to start working.

Step 1: What are my drives called?

To figure out where your internal and external hard drive are located, you need to look through a file called /proc/partitions, find the devices that look like they're the same size as your own, and write down which is which. Here's an example:

root@tss:~ # cat /proc/partitions
major minor  #blocks  name

 3     0   39062500 hda
 3     1   39061488 hda1
 22     0   39082680 hdc
 22     1   39082648 hdc1
 9     0   39061376 md0

Yours will vary; in this case, there are two internal hard drives, but let's assume you have "sda" and "sdb" instead, and that both are 40GB (as seen above, the sizes will be a little lower than the advertised capacity; 39062500 is a 40GB drive, while a 500GB drive might be in the high 400s.) "sda" is almost always the first internal hard drive. Anything after that (sdb, sdc, sdg, whatever) tends to be an external hard drive, a flash drive, or a card reader. Note which one looks like it's the internal and which is your backup drive before moving on.
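If matching raw block counts to advertised drive sizes by eye is awkward, a small helper can do the conversion. This is only a sketch: "list_sizes" is a made-up name, and it assumes the #blocks column is in 1 KiB units (which is how /proc/partitions reports sizes) and that an awk is available on the system you booted.

```shell
# Print each entry of a /proc/partitions-style listing together with its
# size in decimal gigabytes (the unit drive makers advertise).
# "list_sizes" is a hypothetical helper name; it reads the listing on stdin.
list_sizes() {
    # Only data rows start with a numeric "major" field and have 4 columns;
    # this skips the header line and the blank line below it.
    awk '$1 ~ /^[0-9]+$/ && NF == 4 {
        printf "%s %.1f GB\n", $4, $3 * 1024 / 1000000000
    }'
}

# Usage on a live system:
#   list_sizes < /proc/partitions
```

With the example listing above, the "hda" row comes out as 40.0 GB, which matches the 40GB drive described in the text.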

Step 2: Copy all the files on the disk to an external hard drive

You need to mount both drives, and copy the internal's contents to a folder on the external. Modify the session below to match your internal and external device names (change "sdb1" to "sdg1" if your external is "sdg"). Ignore any "cannot read file" errors, as they are usually caused by the damage you're trying to fix. Learn by example (remember "sda" is the internal, "sdb" is the external):

root@tss:~ # cd /mnt
root@tss:/mnt # mount.ntfs /dev/sda1 sda1
root@tss:/mnt # mount.ntfs /dev/sdb1 sdb1
root@tss:/mnt # mkdir sdb1/backup
root@tss:/mnt # cd sda1
root@tss:/mnt/sda1 # cp -a * ../sdb1/backup/
root@tss:/mnt/sda1 # cd ..
root@tss:/mnt # umount sda1

You will not be shown any progress indicators, and the screen will surely go blank during this process (it's okay, don't panic! You can tap a key like Control or Shift to bring the screen back up if it blanks!). The copy is complete when the prompt "root@tss:/mnt/sda1 #" returns after the cp command. If you see lots of error lines that start with a large decimal number in square brackets (e.g. [245.20312349]), then one or both of the hard drives may have problems, and you should abort immediately (use Control-C to abort the copy if it's still in progress, then type "poweroff" at a prompt to shut down).
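Since cp reports nothing on success, one rough sanity check is to compare how many regular files exist on each side. This is a sketch only, not part of the original procedure: "count_files" is a made-up helper, it has to be run while both drives are still mounted (that is, before the "umount sda1" line), and the two counts may legitimately differ by the number of files cp could not read.

```shell
# Count the regular files underneath a directory.
# "count_files" is a hypothetical helper name. File names that contain
# newlines would be miscounted; that is ignored here for simplicity.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage, from /mnt with both drives mounted as in the session above:
#   count_files sda1
#   count_files sdb1/backup
```

A large gap between the two numbers suggests the copy stopped early and is worth investigating before you go any further.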

Step 3: Clone another Windows installation, get your files back.

This part is difficult, because it involves getting a hard drive from another machine and temporarily attaching it to the faulty one. The easiest way to do this is with a super handy $20-$25 USB to IDE/SATA cable kit, but using part of a compatible drive enclosure will also work in a pinch. The process will be faster if you hook the drive up to an internal drive interface; how to do that is also beyond the scope of this article, but if you can pull it off, it will save some time. Assuming you've used a USB drive cable kit and it's already been plugged in and works correctly, we'll need to know what drive name was assigned to it. Run "cat /proc/partitions" again and note the new drive's name; we'll assume the drive we're cloning from is "sdc". ntfsclone output is lengthy, and so is omitted from the procedure below. Before cloning anything, though, you need to figure out if you're going to have a geometry mismatch.

root@tss:/mnt/ # dd if=/dev/sda1 bs=512 count=1 | hexdump -C | head -n 2
1+0 records in
1+0 records out
512 bytes (512 B) copied, 1.25e-05 s, 41 MB/s
00000000  eb 52 90 4e 54 46 53 20  20 20 20 00 02 08 00 00  |.R.NTFS    .....|
00000010  00 00 00 00 00 f8 00 00  3f 00 ff 00 3f 00 00 00  |........?...?...|
root@tss:/mnt/ # dd if=/dev/sdc1 bs=512 count=1 | hexdump -C | head -n 2
1+0 records in
1+0 records out
512 bytes (512 B) copied, 1.25e-05 s, 41 MB/s
00000000  eb 52 90 4e 54 46 53 20  20 20 20 00 02 08 00 00  |.R.NTFS    .....|
00000010  00 00 00 00 00 f8 00 00  3f 00 f0 00 3f 00 00 00  |........?...?...|
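Rather than comparing the two dumps by eye, the one byte that matters can be pulled out directly. In the dumps above it sits at offset 0x1a (decimal 26) of the partition's first sector, which is the number-of-heads field of the NTFS boot sector. The sketch below assumes dd, od and tr are available; "heads_byte" and "same_geometry" are made-up helper names.

```shell
# Print the byte at offset 0x1a (decimal 26) of a device or image file
# as two hex digits. "heads_byte" is a hypothetical helper name.
heads_byte() {
    dd if="$1" bs=1 skip=26 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Succeed (exit status 0) only if both devices report the same byte.
# "same_geometry" is also a hypothetical helper name.
same_geometry() {
    [ "$(heads_byte "$1")" = "$(heads_byte "$2")" ]
}

# Usage (read-only, nothing is written to either drive):
#   heads_byte /dev/sda1
#   heads_byte /dev/sdc1
#   same_geometry /dev/sda1 /dev/sdc1 && echo "match" || echo "MISMATCH"
```

For the two dumps shown above, this would print ff for sda1 and f0 for sdc1, so same_geometry would report a mismatch.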

The byte you need to compare is the one shown as XX in the pattern "3f 00 XX 00 3f" on the second line of each dump (it sits at offset 0x1a and holds the number of heads in the drive geometry). The first display shows ff, the second shows f0. They must be the same. If you have a mismatch as shown above, it's guaranteed to not work and you should stop now (type poweroff). An advanced computer tech can fix this geometry-related problem with hexedit (and later with the expert mode of fdisk), but most people will not be able to do it safely. If they're both identical, you're clear to keep going, using this external drive as your source:

root@tss:/mnt/ # dd if=/dev/sdc of=/dev/sda bs=512 count=1000
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied
root@tss:/mnt # ntfsclone -O /dev/sda1 /dev/sdc1
(if it works, a percentage will run to 100 and success
will be indicated.  Do not proceed if you don't receive
indication that the cloning process succeeded!)

Step 4: Fix the size of everything

You'll need to redo the partition size to fit the entire disk, or you won't be able to use 100% of its capacity. This part may be hard to follow, so here's the short way to do this: run "fdisk /dev/sda" and hit the following keys: d [enter] n [enter] p [enter] 1 [enter] [enter] [enter] t [enter] 7 [enter] a [enter] 1 [enter] w [enter].

root@tss:/mnt # fdisk /dev/sda
The number of cylinders for this disk is set to 2000.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): d
Selected partition 1

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-2000, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-2000, default 2000):
Using default value 2000

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 7
Changed system type of partition 1 to 7 (HPFS/NTFS)

Command (m for help): a
Partition number (1-4): 1

Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
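The same keystrokes can be generated by a small function and piped into fdisk instead of being typed. Treat this strictly as a sketch: "fdisk_keys" is a made-up name, and it assumes your fdisk asks exactly the questions shown in the session above, in the same order, and that the disk holds a single partition. Other fdisk versions prompt differently, in which case typing the answers by hand is safer.

```shell
# Emit the answers used in the fdisk session above, one per line.
# "fdisk_keys" is a hypothetical helper name.
fdisk_keys() {
    # d         delete the only partition
    # n, p, 1   create a new primary partition number 1
    # '', ''    accept the default first and last cylinder
    # t, 7      set the partition type to 7 (HPFS/NTFS)
    # a, 1      mark partition 1 as bootable
    # w         write the table and exit
    printf '%s\n' d n p 1 '' '' t 7 a 1 w
}

# Usage (DESTRUCTIVE; double-check the device name first):
#   fdisk_keys | fdisk /dev/sda
```

Running fdisk_keys on its own just prints the eleven answers, so you can check them against the session before piping them anywhere.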

Step 5: Resize the filesystem

root@tss:/mnt # yes | ntfsresize -f /dev/sda1
(long output omitted, look for a success message at the end)
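ntfsresize also has a --no-action option that goes through the motions without writing to the disk, so a cautious variation is to do a test run first and only then run the real command shown above. This is a sketch and not part of the procedure we actually performed; "resize_dry_run" is a made-up wrapper name.

```shell
# Ask ntfsresize to simulate the resize without writing anything.
# "resize_dry_run" is a hypothetical wrapper name; --no-action and
# --force are the long forms of ntfsresize's -n and -f options.
resize_dry_run() {
    ntfsresize --no-action --force "$1"
}

# Usage:
#   resize_dry_run /dev/sda1
```

If the test run reports a problem, stop there rather than running the real resize.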

Step 6: Copy your stuff back to the internal drive and reboot

root@tss:/mnt/ # mount.ntfs /dev/sda1 sda1
root@tss:/mnt/ # cd sda1
root@tss:/mnt/sda1 # find -type l -delete
root@tss:/mnt/sda1 # rm -rf *
root@tss:/mnt/sda1 # cp -a ../sdb1/backup/* .
root@tss:/mnt/sda1 # cd ..
root@tss:/mnt/ # umount sda1 sdb1
root@tss:/mnt/ # sync
root@tss:/mnt/ # reboot -f

Step 7: Fix some miscellaneous things in Windows

Once you've booted back into Windows, you'll be greeted by some instances of Notepad with a file "desktop.ini" opened, which you should close, as well as a window prompting you to restart due to having found new hardware, which you should not allow to reboot the system. Go to Start > Run... and type "cmd" without quotes, and hit [enter]. Then, type the following commands at the "C:\whatever\here>" command prompt:

C:\Documents and Settings\Owner>cd \
C:\>attrib +s +h +r boot.ini
C:\>attrib +s +h /s desktop.ini
C:\>attrib +s +h /s folder.htt
C:\>attrib +s +h +r ntldr
C:\>attrib +s +h +r ntdetect.com
C:\>exit

This hides the desktop.ini files, which restores their function, and also hides other critical system files which should absolutely not be visible. Reboot now, and the system should be ready to go.

File security is reset such that everyone who can log on has full access to all files. Most people won't care, but you may need to manually change security settings on files if you're concerned about this.