Sunday, June 10, 2007

bad pbr sig


I had posted about this error earlier, then deleted the post because I thought it was stupid. After I removed it, I noticed a lot of people searching Google for it, so I am posting my solution to help out.

I have come across this problem only on Sun hardware, but it may occur on others. I encountered it running CentOS Linux, using GRUB.

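This error, Bad PBR Sig, means that the boot record the machine tried to boot from has become borked: the boot code did not find a valid signature on it. (PBR is usually read as partition boot record, the boot sector at the start of a partition.) This normally happens when you are installing an OS on a machine that the OS is not familiar with, and the installer writes the record to the wrong place.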
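For what it's worth, a boot sector is supposed to end with the two signature bytes 0x55 0xAA at offsets 510 and 511, and that is the signature being checked. Here is a minimal sketch for looking at it by hand. The function name has_boot_sig is made up for illustration, and the device in the usage line is just the one from my setup:

```shell
# has_boot_sig FILE_OR_DEVICE
# Succeeds (exit 0) if bytes 510-511 of the first sector are 55 aa.
# Hypothetical helper name; it only reads, it never writes to the disk.
has_boot_sig() {
    # Pull out the two bytes at offset 510 and print them as hex.
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example (needs root to read a raw disk or partition):
# has_boot_sig /dev/sdy && echo "signature ok" || echo "bad signature"
```

A good signature only tells you the sector looks like a boot sector, not that the boot code in it is the right one.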

In my case, I was installing a Thumper (Sun X4500). My version of Linux doesn't see all 48 hard drives in it, so the install process is pretty lengthy. First you install an OS to /dev/sda, then, once it's installed, you have to copy the OS to /dev/sdy, which is the first bootable disk on the system. In doing so, if grub writes its boot record to /dev/sda, you won't be able to boot, because the BIOS doesn't see /dev/sda as bootable, just sdy and sdac. I think this is due to the layout of the disks and their position on the SCSI channel.

Anyways, the way I fixed it was to use grub to write out the correct record to the correct drive.

You need to write to /boot/grub/device.map, and tell grub which hard drive is which. Mine looked like this:

(fd0) /dev/fd0
(hd0) /dev/sda
(hd1) /dev/sdy
(hd2) /dev/sdac
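
Since it is easy to mix up which grub name goes with which disk, here is a small sketch that looks a name up in the map file. The helper name grub_to_dev is hypothetical; it assumes the two-column layout shown above:

```shell
# grub_to_dev MAPFILE GRUBNAME
# Prints the Linux device that a grub name such as "(hd1)" maps to.
# Hypothetical helper; assumes "NAME DEVICE" pairs, one per line.
grub_to_dev() {
    awk -v name="$2" '$1 == name { print $2 }' "$1"
}

# Example:
# grub_to_dev /boot/grub/device.map "(hd1)"
```

With the map above, looking up "(hd1)" prints /dev/sdy.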


I installed the OS first to hd0, moved it over to hd1, and mirrored it to hd2. I will be booting from a mirrored boot partition.

In grub, you set everything up like this. Type grub at a shell prompt, then enter the following at the grub> prompt:

grub> device (hd1) /dev/sdy
grub> root (hd1,0)
grub> setup (hd1)
grub> device (hd2) /dev/sdac
grub> root (hd2,0)
grub> setup (hd2)


This will mark the correct disks with boot records.
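
If you would rather not type the grub commands by hand, they can be generated and piped in. This is only a sketch: grub_setup_cmds is a made-up helper name, it assumes the boot partition is partition 0 on each disk as in my setup, and the commented-out line assumes GRUB legacy's --batch option:

```shell
# grub_setup_cmds GRUBNAME DEVICE
# Prints the three grub commands used above for one disk.
# Hypothetical helper; assumes the boot partition is (GRUBNAME,0).
grub_setup_cmds() {
    printf 'device (%s) %s\n' "$1" "$2"
    printf 'root (%s,0)\n' "$1"
    printf 'setup (%s)\n' "$1"
}

# To actually run them (as root, GRUB legacy):
# { grub_setup_cmds hd1 /dev/sdy; grub_setup_cmds hd2 /dev/sdac; echo quit; } | grub --batch
```

Run the function on its own first and read what it prints before piping anything into grub.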

Then make sure you boot from the correct drive in grub.conf:

title CentOS-4 x86_64 (2.6.9-42.ELsmp)
root (hd1,0)
kernel /vmlinuz-2.6.9-42.ELsmp ro root=/dev/md1 rhgb quiet
initrd /initrd-2.6.9-42.ELsmp.img
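
An easy mistake here is leaving a root line that points at a disk you never ran setup on. A rough sketch for listing which grub disks the root lines refer to; conf_roots is a hypothetical helper name, and the path in the usage line is an assumption about where your grub.conf lives:

```shell
# conf_roots GRUBCONF
# Prints the grub disk, e.g. "(hd1)", named by each "root (hdN,M)" line.
# Hypothetical helper; kernel lines containing root=... are not matched.
conf_roots() {
    sed -n 's/^[[:space:]]*root[[:space:]]*(\(hd[0-9]*\),[0-9]*).*/(\1)/p' "$1"
}

# Example:
# conf_roots /boot/grub/grub.conf
```

For the entry above this prints (hd1), which should be one of the disks you ran setup on.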


That's it.

3 comments:

Anonymous said...

Thanks for the info.

I don't quite understand how you enabled the mirror between /dev/sdy and /dev/sdac after you copied sda to sdy.
I mean, assuming you used dd to copy an installed OS from sda over to sdy, you should have gotten all the partitions of sda on sdy, so how do you turn sdaX partitions into mdX devices?

Cheers,
Michael

Tex Swiss said...

well, in my case, I create an md device as a broken raid 1 mirror, like this:

echo "yes"| /sbin/mdadm -C /dev/md0 -l 1 -n 2 missing /dev/sdy1

then, after reboot, you try something like this to add another to it:

mdadm --manage /dev/md0 --add /dev/sdac1
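
to see whether a mirror is still running on one leg, you can look for an underscore in the status brackets in /proc/mdstat. a rough sketch (degraded_arrays is a made-up name, and it assumes the usual [UU] / [_U] status format):

```shell
# degraded_arrays [MDSTAT_FILE]
# Prints the name of every md array whose status shows a missing member,
# e.g. [_U] or [U_]. Hypothetical helper; defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }      # remember the current array
        /\[[U_]*_[U_]*\]/ { print name }     # status line with a missing disk
    ' "${1:-/proc/mdstat}"
}

# Example:
# degraded_arrays
```

no output means every array has all of its members.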

hope that helps. thanks for reading!

Anonymous said...

Thanks so much for decoding the mysterious red message, which looks a lot more like a terrible hardware failure of some sort that has rendered the machine incapable of saying anything more descriptive.

Another possible cause is that the BIOS is simply trying to boot from the wrong drive.

An X4200 I have periodically decides to switch boot devices completely on its own. Usually this results in a blinking cursor on a blank screen where it should be booting Solaris. Today it did the bad pbr sig instead for the first time. Must be something new in the boot sector on the RAID array used for storage.

So one might want to verify that the boot device is indeed correct before pursuing other possibilities.