SMEServer LVM Recovery
Contents
- 1 LVM2 Recovery (SMEServer 8)
- 1.1 Document Format
- 1.2 Recovery Target
- 1.3 Knoppix to the Rescue
- 1.4 Taking Remote Control
- 1.5 Find Disk Name and Status
- 1.6 Locate Device Boot Errors
- 1.7 SMEServer Volume Group Info
- 1.8 Configure and Start Disk Admin (mdadm) and RAID Daemons
- 1.9 LVM2 Availability
- 1.10 Disk Admin Status
- 1.11 No Backup of LVM Metadata and Yet...
- 1.12 Onward LVM Daemon
- 1.13 Prepare the Mount Point
- 1.14 Discover LVM Mount Error
- 1.15 Get Disk Block Size
- 1.16 Recover LVM SuperBlock
- 1.17 Mount the Corrected LVM
- 1.18 SMEServer Data Info for recovery
- 1.19 Usage in other Appliance Distributions
- 2 References
LVM2 Recovery (SMEServer 8)
Document Format
In the code boxes below:
All comments will start with //. All commands are prefixed with # (except for the contents of the main file). The ones without any prefix are the responses.
Recovery Target
This recovery was done on an 80 GB IDE hard disk in a P3 1U server with 512 MB SDRAM running SMEServer v8 beta 3. The server crashed due to hard disk failure on 21st April, 2010 in Singapore while I was in India.
I did not attach the faulty hard disk as a slave in a working SMEServer for data recovery because the Volume Group would be named main on both disks, which would need further surgical intervention and would risk corrupting a known good install. SMEServer 8.x (and possibly later v7.x releases) installs its storage on LVM2 over software RAID by default, and runs the RAID1 arrays in degraded single-disk mode when a second disk is absent.
Knoppix to the Rescue
My associate in Singapore downloaded the Knoppix v6.2.1 ISO and burnt it to a CD. He then booted the server from the Knoppix LiveCD with the crashed hard disk (still spinning) in situ. The Knoppix system was given the IP 192.168.1.10.
Opening up a command prompt on the Knoppix GUI, he issued the following commands:
# sudo su -
# passwd root
// Enter root as the new password when prompted.
root
# /etc/init.d/ssh start
SSH requires a password for any remote user, hence the password root was set for the root user above. This is highly insecure and not normally recommended, but on a protected LAN for recovery purposes it was acceptable.
Taking Remote Control
On a Windows machine (it could equally have been a Linux console) on the same LAN segment, with the IP 192.168.1.20 and VNC forwarded to me in India, I opened PuTTY and SSHed into the Knoppix system at 192.168.1.10, logging in as root with the password root set above.
Find Disk Name and Status
Check the boot messages for the hard disk's device name (generally also logged in /var/log/messages):
# dmesg | grep sda
156299375 512 - 80 gb - 74 gib
sda: sda1 sda2
Knoppix refers to the hard disk as sda, while SMEServer referred to it as hda (as an IDE hard disk).
We now make sure that udev has recognised the hard disk:
# udevadm info --query=all --name=/dev/sda
P: /devices/pci0000:00/0000:00:1f.1/host0/target0:0:0/0:0:0:0/block/sda
N: sda
W: 36
S: block/8:0
S: disk/by-id/scsi-SATA_ST380011A_5JV5BNC8
S: disk/by-id/ata-ST380011A_5JV5BNC8
S: disk/by-path/pci-0000:00:1f.1-scsi-0:0:0:0
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1f.1/host0/target0:0:0/0:0:0:0/block/sda
E: MAJOR=8
E: MINOR=0
E: DEVNAME=/dev/sda
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: ID_ATA=1
E: ID_TYPE=disk
E: ID_BUS=ata
E: ID_MODEL=ST380011A
E: ID_MODEL_ENC=ST380011A\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_REVISION=3.06
E: ID_SERIAL=ST380011A_5JV5BNC8
E: ID_SERIAL_SHORT=5JV5BNC8
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_ATA_FEATURE_SET_HPA=1
E: ID_ATA_FEATURE_SET_HPA_ENABLED=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_SECURITY=1
E: ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=0
E: ID_ATA_FEATURE_SET_SECURITY_FROZEN=1
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_SCSI_COMPAT=SATA_ST380011A_5JV5BNC8
E: ID_PATH=pci-0000:00:1f.1-scsi-0:0:0:0
E: DKD_PRESENTATION_NOPOLICY=0
E: DKD_MEDIA_AVAILABLE=1
E: DKD_PARTITION_TABLE=1
E: DKD_PARTITION_TABLE_SCHEME=mbr
E: DKD_ATA_SMART_IS_AVAILABLE=1
E: DEVLINKS=/dev/block/8:0 /dev/disk/by-id/scsi-SATA_ST380011A_5JV5BNC8 /dev/disk/by-id/ata-ST380011A_5JV5BNC8 /dev/disk/by-path/pci-0000:00:1f.1-scsi-0:0:0:0
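When more than one disk is attached, it can help to pull just the model or serial number out of that listing and compare it with the label on the drive. The following is a minimal sketch, assuming the "E: KEY=VALUE" layout shown above. The helper name udev_prop is hypothetical, and since this is a standalone shell script, # starts a comment here rather than marking a root prompt.

```shell
# udev_prop KEY: read `udevadm info --query=all` output on stdin and print
# the value of the first "E: KEY=value" line (helper name is illustrative).
udev_prop() {
    awk -v k="E: $1=" 'index($0, k) == 1 { print substr($0, length(k) + 1); exit }'
}

# Demo on two lines copied from the listing above.
printf '%s\n' 'E: ID_MODEL=ST380011A' 'E: ID_SERIAL_SHORT=5JV5BNC8' | udev_prop ID_SERIAL_SHORT
```

On the rescue system itself the same helper could be fed directly, for example with udevadm info --query=all --name=/dev/sda | udev_prop ID_SERIAL_SHORT.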
Locate Device Boot Errors
Hard disk error information was culled out of the dmesg command's output.
Do not get distracted by dmesg errors due to lack of media in the CDROM drive such as:
[ 24.317523] sr 1:0:0:0: [sr0] Unhandled sense code
[ 24.317531] sr 1:0:0:0: [sr0] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 24.317539] sr 1:0:0:0: [sr0] Sense Key : Medium Error [current]
[ 24.317549] sr 1:0:0:0: [sr0] Add. Sense: Unrecovered read error
[ 24.317560] sr 1:0:0:0: [sr0] CDB: Read(10): 28 00 00 05 61 1c 00 00 01 00
[ 24.317577] end_request: I/O error, dev sr0, sector 1410160
[ 24.317587] Buffer I/O error on device sr0, logical block 352540
The errors we are interested in are:
[ 2559.293548] sd 0:0:0:0: [sda] Unhandled sense code
[ 2559.293553] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2559.293561] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
[ 2559.293571] Descriptor sense data with sense descriptors (in hex):
[ 2559.293576] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 2559.293593] 00 03 cd cf
[ 2559.293601] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
[ 2559.293614] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 03 cd ad 00 01 00 00
[ 2559.293631] end_request: I/O error, dev sda, sector 249295
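To separate the hard disk's errors from the CDROM noise, the kernel log can be filtered for one device and reduced to the failing sector numbers. This is a minimal sketch, assuming the "end_request: I/O error, dev X, sector N" wording seen in the messages above. The helper name failed_sectors is hypothetical, and in this standalone script # begins a comment.

```shell
# failed_sectors DEV: read kernel log text on stdin and print each distinct
# sector number reported in "I/O error, dev DEV, sector N" lines.
failed_sectors() {
    sed -n "s/.*I\/O error, dev $1, sector \([0-9][0-9]*\).*/\1/p" | sort -n -u
}

# Demo on the two end_request lines quoted in this section.
failed_sectors sda <<'EOF'
[ 24.317577] end_request: I/O error, dev sr0, sector 1410160
[ 2559.293631] end_request: I/O error, dev sda, sector 249295
EOF
```

On the live system the input would come from dmesg, as in dmesg | failed_sectors sda. A long list of distinct sectors suggests copying the data off before attempting any repair.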
SMEServer Volume Group Info
The volume group in the SMEServer to be recovered is referred to as main. The block device configuration is stored in /etc/mdadm/mdadm.conf in Knoppix while it is stored in /etc/mdadm.conf in the SMEServer.
Configure and Start Disk Admin (mdadm) and RAID Daemons
Scan the partitions for RAID arrays and append the result to the mdadm config file:
# mdadm --examine --scan /dev/sda*
ARRAY /dev/md2 UUID=e2fe405e:8ec62819:1b6664b2:beb205b3
ARRAY /dev/md1 UUID=fc6886dc:e37cd2be:e425d83a:c48045a0
# mdadm --examine --scan /dev/sda* >> /etc/mdadm/mdadm.conf
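Because >> appends unconditionally, repeating the scan would leave duplicate ARRAY lines in the config file. The following is a minimal sketch of an append that skips lines already present. The function name append_new_arrays is hypothetical, and # marks a comment in this standalone script.

```shell
# append_new_arrays CONF: read "ARRAY ..." lines on stdin and append to CONF
# only those not already present, so a repeated scan adds no duplicates.
append_new_arrays() {
    conf=$1
    touch "$conf"
    while IFS= read -r line; do
        case $line in
            ARRAY*) grep -qxF -- "$line" "$conf" || printf '%s\n' "$line" >> "$conf" ;;
        esac
    done
}

# Demo against a scratch file: feeding the same line twice stores it once.
demo_conf=$(mktemp)
printf '%s\n' 'ARRAY /dev/md2 UUID=e2fe405e:8ec62819:1b6664b2:beb205b3' | append_new_arrays "$demo_conf"
printf '%s\n' 'ARRAY /dev/md2 UUID=e2fe405e:8ec62819:1b6664b2:beb205b3' | append_new_arrays "$demo_conf"
cat "$demo_conf"
rm -f "$demo_conf"
```

On the rescue system it would be used as mdadm --examine --scan /dev/sda* | append_new_arrays /etc/mdadm/mdadm.conf.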
Whether we have RAID or not, the following commands are essential now:
# /etc/init.d/mdadm start
# /etc/init.d/mdadm-raid start
LVM2 Availability
Knoppix has the lvm2 daemon installed and started by default. If you use a Knoppix version earlier than 5.0.1, the following will be needed:
# modprobe dm-mod
# apt-get update
# apt-get install lvm-common lvm2
// If you needed the above steps, remember that the tools are in /lib/lvm-200
// but if you don't want to type the complete path, you may link them into /usr/sbin.
# lndir /lib/lvm-200/ /usr/sbin/
Disk Admin Status
Now we check that the md arrays are running and get their status:
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
md2 : active raid1 sda2[0]
      78043648 blocks [2/1] [U_]
unused devices: <none>
In the SMEServer, md1 is the /boot partition and md2 is the LVM Physical Volume holding the Volume Group main. That group contains the Logical Volumes root and swap, mapped as /dev/main/root and /dev/main/swap and symlinked to /dev/mapper/main-root and /dev/mapper/main-swap respectively; all of them reside on the /dev/sda physical device. The /dev/mapper/control node is also present.
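The [U_] marker in the status above means one of the two RAID1 members is missing, which is expected on a single-disk install. The following is a minimal sketch that lists arrays in that state, assuming the two-line-per-array mdstat layout shown above. The helper name degraded_arrays is hypothetical, and # begins a comment in this standalone script.

```shell
# degraded_arrays: read /proc/mdstat text on stdin and print the name of
# every array whose member map (e.g. [U_]) contains a missing slot "_".
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /blocks/ && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    '
}

# Demo on the status recorded above.
degraded_arrays <<'EOF'
Personalities : [raid1]
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
md2 : active raid1 sda2[0]
      78043648 blocks [2/1] [U_]
unused devices: <none>
EOF
```

On the live system the input would be the file itself, as in degraded_arrays < /proc/mdstat.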
No Backup of LVM Metadata and Yet...
The server admin had not preserved a copy of the SMEServer's /etc/lvm/backup/main file, which contains the Volume Group's metadata. However, as good design sense would have it, the creators of LVM2 store copies of this metadata as plain text at the start of the Physical Volume, within the first 128 KB or so (it depends on your block size). We now look for the metadata with:
# dd if=/dev/sda2 bs=512 count=255 skip=1 of=/tmp/sda2.txt
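The dump written by dd is mostly binary padding around the plain-text metadata. The following is a minimal sketch for viewing only the readable part, assuming a tr that accepts octal ranges (GNU and BusyBox do). The helper name printable_text is hypothetical, and # begins a comment in this standalone script.

```shell
# printable_text FILE: turn every byte that is not tab, newline or printable
# ASCII into a newline, then drop the empty lines this leaves behind.
printable_text() {
    LC_ALL=C tr -c '\11\12\40-\176' '\n' < "$1" | sed '/^$/d'
}

# Demo on a scratch file that mixes NUL padding with metadata-like text.
demo_dump=$(mktemp)
printf 'main {\nseqno = 3\n\0\0\0}\n\0' > "$demo_dump"
printable_text "$demo_dump"
rm -f "$demo_dump"
```

On the rescue system this would be run as printable_text /tmp/sda2.txt | less. The metadata area may hold more than one generation of the text; in that case the copy with the highest seqno is the current one.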
We now create the SMEServer's metadata file (named main, to be placed in the /etc/lvm/backup folder) from the info in the sda2.txt file above, and we get something like:
# Generated by LVM2 version 2.02.32-RHEL5 (2008-03-04): Sat Jun 6 03:00:06 2009
contents = "Text Format Volume Group"
version = 1
description = "Recovered using Knoppix"
creation_host = "localhost.localdomain" # Linux localhost.localdomain 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:33:52 EDT 2008 i686
creation_time = 1244257206 # Sat Jun 6 03:00:06 2009
main {
    id = "ROZZEI-bVsY-dtER-DbpA-qCLp-q6E5-Zl6DbE"
    seqno = 3
    status = ["RESIZEABLE", "READ", "WRITE"]
    flags = []
    extent_size = 65536 # 32 Megabytes
    max_lv = 0
    max_pv = 0
    physical_volumes {
        pv0 {
            id = "9ed9O6-2Deq-vGFT-XFCL-OFG2-2k6x-p9XNJx"
            device = "/dev/md2" # Hint only
            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 156087296 # 76.1681 Gigabytes
            pe_start = 384
            pe_count = 2381 # 76.1562 Gigabytes
        }
    }
    logical_volumes {
        root {
            id = "teCkdT-14JW-wLKd-w7YU-LM06-4DUQ-sJIZgK"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            segment_count = 1
            segment1 {
                start_extent = 0
                extent_count = 2349 # 76.1562 Gigabytes
                type = "striped"
                stripe_count = 1 # linear
                stripes = [
                    "pv0", 0
                ]
            }
        }
        swap {
            id = "mOscy0-XODU-D5ts-7stc-yukW-blyG-NmADyV"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            segment_count = 1
            segment1 {
                start_extent = 0
                extent_count = 32 # 1024 Megabytes
                type = "striped"
                stripe_count = 1 # linear
                stripes = [
                    "pv0", 2349
                ]
            }
        }
    }
}
We see above that the last 1 GB of the VG main is allocated to the swap LV, while all the space before it is allocated to the root LV.
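The sizes can be cross-checked from the extent counts alone, since extent_size is given in 512-byte sectors. The following is a worked sketch using only figures from the metadata listing; the variable names are illustrative, and # begins a comment in this standalone script. The results (75168 MiB is about 73.41 GiB) agree with what lvscan reports further down, whereas the "76.1562 Gigabytes" comments in the listing do not, and appear to have been carried over from the template file that was edited.

```shell
sector_bytes=512
extent_size=65536      # sectors per extent, from the metadata
root_extents=2349      # extent_count of the root LV
swap_extents=32        # extent_count of the swap LV
pe_count=2381          # physical extents on pv0

extent_mib=$((extent_size * sector_bytes / 1024 / 1024))
echo "extent size: ${extent_mib} MiB"
echo "root: $((root_extents * extent_mib)) MiB"
echo "swap: $((swap_extents * extent_mib)) MiB"
echo "extents used: $((root_extents + swap_extents)) of ${pe_count}"
```

The swap segment starts at physical extent 2349, immediately after the 2349 extents of root, so the two volumes together account for all 2381 extents.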
# mkdir /etc/lvm/backup
# nano /etc/lvm/backup/main
// Copy / Paste the info for the Volume Group's metadata extracted from the /tmp/sda2.txt file
// Alternatively, prepare the file from Desktop machine and upload it to the said folder.
// To preserve the indentation, I edited a known good SMEServer's main file with the parameters obtained above.
Onward LVM Daemon
To ensure a clean detection of the LVMs, I decided to restart the lvm2 daemon (I only found references to the lvm daemon on the net)
# /etc/init.d/lvm2 stop
# /etc/init.d/lvm2 start
Scan for Volume Groups:
# vgscan
Reading all physical volumes. This may take a while...
Found volume group "main" using metadata type lvm2
Scan for Physical Volumes:
# pvscan
PV /dev/md2 VG main lvm2 [74.41 GB / 0 free]
Total: 1 [74.41 GB] / in use: 1 [74.41 GB] / in no VG: 0 [0 ]
Activate the Volume Group main:
# vgchange main -a y
2 logical volume(s) in volume group "main" now active
Scan for Logical Volumes:
# lvscan
ACTIVE '/dev/main/root' [73.41 GB] inherit
ACTIVE '/dev/main/swap' [1.00 GB] inherit
Display the detected Volume Groups' info:
# vgdisplay
--- Volume group ---
VG Name               main
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  3
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                2
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               74.41 GiB
PE Size               32.00 MiB
Total PE              2381
Alloc PE / Size       2381 / 74.41 GiB
Free PE / Size        0 / 0
VG UUID               ROZZEI-bVsY-dtER-DbpA-qCLp-q6E5-Zl6DbE
Prepare the Mount Point
Make a mount folder:
# mkdir /mnt/data
Discover LVM Mount Error
Mount the root LV in the VG main:
# mount /dev/main/root /mnt/data/
The mount failed because the superblock of the filesystem on the root LV was corrupted, as seen from:
// A read-only check of the filesystem's integrity.
# fsck.ext3 -n /dev/main/root
fsck.ext3: Too many levels of symbolic links while trying to open /dev/vg0/home
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
Get Disk Block Size
The block size for the device md2 is taken as 4096 bytes (4 KB), matching the units that fdisk reports for it below; 4 KB is also the block size that mke2fs picks by default for a filesystem of this size:
# fdisk -l
Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device     Boot  Start  End   Blocks    Id  System
/dev/sda1  *     1      13    104391    fd  Linux raid autodetect
/dev/sda2        14     9729  78043770  fd  Linux raid autodetect
Disk /dev/md2: 79.9 GB, 79916695552 bytes
2 heads, 4 sectors/track, 19510912 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md2 doesn't contain a valid partition table
Disk /dev/md1: 106 MB, 106823680 bytes
2 heads, 4 sectors/track, 26080 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md1 doesn't contain a valid partition table
All references to sda* in the commands executed in Knoppix above will be hda* (if IDE) when executed in SMEServer.
Recover LVM SuperBlock
The man page for e2fsck for the -b option states:
-b superblock
Instead of using the normal superblock, use an alternative superblock specified by superblock. This option is normally used when the primary superblock has been corrupted. The location of the backup superblock is dependent on the filesystem's blocksize. For filesystems with 1k blocksizes, a backup superblock can be found at block 8193; for filesystems with 2k blocksizes, at block 16384; and for 4k blocksizes, at block 32768. Additional backup superblocks can be determined by using the mke2fs program using the -n option to print out where the superblocks were created. The -b option to mke2fs, which specifies blocksize of the filesystem must be specified in order for the superblock locations that are printed out to be accurate. If an alternative superblock is specified and the filesystem is not opened read-only, e2fsck will make sure that the primary superblock is updated appropriately upon completion of the filesystem check.
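The block numbers quoted in the man page follow from the ext2/ext3 layout: each block group spans 8 × blocksize blocks, and with the default sparse_super feature the backups sit in group 1 and in groups that are powers of 3, 5 and 7. The following is a minimal sketch under those assumptions. The function names are hypothetical, and # begins a comment in this standalone script. The 19243008 in the demo is the 2349 extents of the root LV expressed in 4 KB blocks.

```shell
# is_power_of N BASE: succeed if N equals BASE^k for some k >= 1.
is_power_of() {
    pw_n=$1
    if [ "$pw_n" -lt "$2" ]; then return 1; fi
    while [ $((pw_n % $2)) -eq 0 ]; do pw_n=$((pw_n / $2)); done
    [ "$pw_n" -eq 1 ]
}

# backup_superblocks BLOCK_SIZE TOTAL_BLOCKS: print the block number of each
# backup superblock, assuming the default sparse_super layout.
backup_superblocks() {
    bs=$1
    per_group=$((bs * 8))      # one bitmap block tracks 8 * bs blocks
    first=0
    if [ "$bs" -eq 1024 ]; then first=1; fi   # 1 KB filesystems start at block 1
    groups=$(( ($2 - first + per_group - 1) / per_group ))
    g=1
    while [ "$g" -lt "$groups" ]; do
        if [ "$g" -eq 1 ] || is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7; then
            echo $((g * per_group + first))
        fi
        g=$((g + 1))
    done
}

# Demo: candidate backups for a 4 KB filesystem the size of the root LV.
backup_superblocks 4096 19243008
```

The first value printed is 32768; the later ones are further candidates should that copy also turn out to be unreadable. The output of mke2fs -n, as the man page suggests, remains the authoritative list.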
Since our block size is 4 KB, the first backup superblock is at block 32768. We now recover with:
# e2fsck -b 32768 /dev/main/root
e2fsck will report several errors where inodes are corrupt or data cannot be wholly read (short read). Answer y (yes) to all questions, watching for messages stating that a file has been deleted or cleared due to corruption. Some recovered fragments will end up in the lost+found folder.
Once e2fsck has brought the primary superblock back in sync with the actual state of the filesystem, we are good to go.
Mount the Corrected LVM
Mount the root LV in the VG main:
# mount /dev/main/root /mnt/data/
Voilà! We can now sftp into /mnt/data/ and access / recover all the files needed.
SMEServer Data Info for recovery
For an SMEServer, the MySQL data is now at '/mnt/data/var/lib/mysql'. On restoring to a fresh install, the user database folders and their contents must all be chowned to mysql:mysql; the folders should be chmoded to 700 and the files inside them to 660. It is best not to restore the information_schema, test and mysql databases; recreate the users and privileges afresh instead, as the versions in a fresh, later install are likely to be in a better known state. Do a complete repair of all databases after restoration so that all indexes are rebuilt.
The SMEServer's ibays also need to be recovered and restored. They will now be at '/mnt/data/home/e-smith/files/ibays/'.
In SMEServer 8 beta 5, for internet resolvable domains, PHP scripts that connect to the host 'localhost' need to use '127.0.0.1' instead, as otherwise the local socket file will be assumed.
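The ownership and mode changes for the restored databases can be applied in one pass. The following is a minimal sketch, assuming the databases have already been copied into the new server's data directory. The function name fix_mysql_perms and the demo names shop and orders.MYD are hypothetical, and # begins a comment in this standalone script. The three system databases are skipped, in line with the advice not to restore them.

```shell
# fix_mysql_perms DATADIR: set mode 700 on each user database folder and 660
# on the files inside it, skipping information_schema, test and mysql.
fix_mysql_perms() {
    datadir=$1
    for db in "$datadir"/*/; do
        [ -d "$db" ] || continue
        case $(basename "$db") in
            information_schema|test|mysql) continue ;;
        esac
        chmod 700 "$db"
        find "$db" -type f -exec chmod 660 {} +
        # chown needs root and an existing mysql account; skip it otherwise.
        if [ "$(id -u)" -eq 0 ] && id mysql >/dev/null 2>&1; then
            chown -R mysql:mysql "$db"
        fi
    done
}

# Demo in a scratch directory with one made-up database.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/shop"
touch "$demo_dir/shop/orders.MYD"
fix_mysql_perms "$demo_dir"
ls -l "$demo_dir/shop"
rm -rf "$demo_dir"
```

On the freshly installed server this would be run as root, for example as fix_mysql_perms /var/lib/mysql, before repairing the databases.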
Usage in other Appliance Distributions
With small variations in config file locations, this method can be used to recover LVMs in
References
- Block Sizes
- LVM2 fsck issues - 1, 2
- system-config-lvm
- Bad Magic Number - 1, 2
- e2fsck man page
- Mounting LVM2
- LVM HOWTO dated 2006
- Logical Volume Management
- LVM in Rescue Mode
- LVM RAID
- System recovery with Knoppix
- Knoppix SSH Password
- Mounting LVM1
- Auto Mount USB with udev
- RAID1 LVM Knoppix Recovery
- SMEServer LVM recovery
- Access Fedora LVM from Ubuntu
- Backup / Restore LVM Metadata