Recovering Truncated or Corrupt Tar Archives
I was unfortunate enough to use resize2fs to resize an ext3 partition. The result, which at first appeared OK, was a corrupt filesystem.
Using SSHFS, I mapped ~martin on Andrew’s laptop to /media/sshfs on mine. I then told tar to make an archive of what was on my partition, and save it to the SSHFS. It errored out midway because of the severely corrupted filesystem, but I didn’t think it wouldn’t present a problem because the files in tars are simply stored back to back - it is a very basic archive tool.
After running badblocks, which was all clear, then formatting the partition to ext3 again, I attempted to extract the tar. GNU tar consumed CPU cycles for ages without writing any files, then complained about corruption. I assume it was walking the tar to check it wasn’t corrupt. This, as you can imagine, is not very handy. I perused the manpage looking for a switch that would disable the scan and just extract what it can, upto the corruption, but I could not see such a switch
I found a website that suggested finding where the last intact file in the tar ended, and feeding tar just that part of the file. They supplied a perl script to do the job.
The problem is that this perl script was unbelievably slow, and had probably only been tested on small files, not my 30GB tar, so Andrew wrote this python script to do the same thing, using information from Wikipedia and from BestSolution’s perl script. Andrew’s version started searching from the end of the tar, not the beginning, because we knew that my tar was corrupt at the end.
After a few minutes of watching it make a non-corrupt tar, we realised we were going to have to load the whole tar from the disk twice - once to repair it, and again to actually extract it. Surely a better way would be to just start extracting the truncated one, and give up when we get to the truncated file. Andrew’s Aerauntar does just that.
Usage:
python aerauntar.py archivename destinationfolder
So the moral of the story is not to tar backups from corrupt filesystems, always pay attention to errors and never assume that a program will operate as you think it will.
