Recently, we acquired a two-processor BL860c i2 blade for inclusion in our development cluster. We justified this by pointing out that the development cluster (which also doubles as the QA/Test cluster) had nowhere near enough CPU power to run our full overnight processing stream. As our overnight run consists of (nearly) identical streams of parallel processing, we had been testing small subsets and hoping that the operative word "nearly" would not bite us. The addition of the i2 will allow a full run in a timely fashion (read "less than 8 hours").
In addition, we recently had an incident where the management processor of our single BL860c in development went nuts, resulting in a 24+ hour outage while various hardware swaps were tried to rectify the problem (the outage was that long because someone dropped the ball when it came to backing up the MP settings). It was pointed out to management that they had a development team and a number of testers from the business all waiting for the recovery of this machine, and that adding a second Itanium-based machine would remove this single point of failure.
There were certainly some "interesting" configuration exercises required to get the new machine up and running the way we wanted it.
The first issue we ran into was that someone who shall remain nameless had inserted a half-height blade in a slot with a dual slot divider in it. This is a big no-no: the divider has to come out to create a full-height slot, and it cannot be removed without first shutting down and removing the half-height blade. Fortunately, the blade was part of a VMware farm, and we moved the virtual machines on it somewhere else so we could shut down and relocate the half-height blade.
The second issue was that I believed our networking guys when they told me that no Fibre Channel zone changes would be required for this machine. I wasted half a day racking my brains for reasons the Fibre Channel disks were not visible until I decided to look at the zoning information myself. And what do you know? Zoning changes were required. This issue is really a problem with the way we manage our networks. It's all outsourced, and getting information (or access to look at information yourself) can sometimes be problematic.
The next issue was that the instructions for establishing bootstrap records in the EFI partition when using an unpatched version of OpenVMS 8.4 are bogus. You are instructed to boot from the installation media (I use vMedia as all the boxes are in a remote location), install VMS, and allow the installation to write a boot record. All this works, but you cannot use the boot records to actually boot the operating system. Eventually I used the bcfg boot add EFI command to add a record that would actually boot. The bug that was causing this is fixed by applying the latest update patch.
I also discovered during this exercise that HP have switched over to delivering licenses via the network. Easy for HP, frustrating for the customer. The most frustrating bit is that you require the exact email address that was specified on the order to get the licenses. If you don't know it, or have to go looking for it, you can waste significant time.
And finally (if the above wasn't enough), the Smart Array P410i RAID controller that runs the local disks in the blade appeared to be working, and I could even see the block level devices at the EFI console, but VMS could not see them. If you've read this far, your reward is a ton of technical info telling you how to correct this issue.
When I started to troubleshoot this, I looked at what the console was telling me, and the first thing I noted was that the RAID controller was in "HBA mode". I thought "great, just what I want. I just want both physical disks to appear as DKA devices, and forget about RAID". Little did I know that OpenVMS only supports Smart Array P410i controllers in RAID Mode. At the time I wrote that link, Google returned one result. The only mention I'm aware of concerning this restriction is in the UPDATE V0200 release notes, and to me at least, requiring RAID to access locally connected disks is counterintuitive.
So, how do we get the controller into RAID mode and configured correctly? Prepare for some hack-fu.
First, we need to locate a utility called saupdate, which is normally used to flash new versions of firmware to the array controller. Usually, I'd just link to it, but HP is not making things easy.
Next, we have to get saupdate.efi onto the EFI FAT partition on a disk we can access from the EFI shell of the target machine. Transfer saupdate.efi to the target machine in binary; let's put it in SYS$MANAGER: for this example. Then use SYS$SYSTEM:EFI$CP.EXE to write saupdate.efi to the EFI partition on the system disk. See Accessing the EFI FAT Boot Partition at Hoff's site for full details.
This is certainly the crucial bit to get right, as you can screw your EFI partition with the wrong command. Here's a step-by-step assuming the partition you want to update is on your system disk:
$ dir sys$manager:saupdate.efi
Directory SYS$COMMON:[SYSMGR]
SAUPDATE.EFI;1
Total of 1 file.
$ mc efi$cp
EFI$CP> mount sys$common:[sys$ldr]sys$efi.sys/device=ivms:/over=id
EFI$CP> copy/binary sys$manager:saupdate.efi ivms:\EFI\VMS\UPDATE\
EFI$CP> dismount ivms:
EFI$CP> exit
$
Next, reset the target machine so that the EFI partition is rescanned (I'm assuming you know how to do that :-)
Once you have the machine back at the EFI shell prompt, change directory to where you placed the saupdate.efi image. For this example, we'll assume it's on the first file system:
Shell> fs0:
fs0:> cd EFI\VMS\UPDATE
fs0:\EFI\VMS\UPDATE> saupdate set_mode all raid
[output suppressed telling you that doing this will corrupt existing data etc.]
Next, you need to define logical units on the RAID controller. In my case I want the two physicals as two JBODs so I can shadow them with HBVS. The EFI commands here involve discovering the handle where the array's driver is mapped, and the controller number:
Shell> drivers
            T   D
D           Y C I
R           P F A
V  VERSION  E G G #D #C DRIVER NAME             IMAGE NAME
== ======== = = = == == ======================= ===================
[output suppressed]
A4 00000312 B X X  1  2 Smart Array SAS Driver v 3.12 MemoryMapped(0xB,0xB)
[output suppressed]
Shell>
We need the information in the first column. In this case, "A4". With this, we can find the controller number:
Shell> drvcfg
Configurable Components
[output suppressed]
Drv[A4] Ctr[A3] Lang[eng]
[output suppressed]
We match "A4" against the "Drv" column, and we find our controller number is "A3". Armed with this information, we can finally fire up the RAID controller's configuration utility:
Shell> drvcfg -s a4 a3
From here, the utility is menu driven and obvious. Eventually I had two logical drives configured, one for each physical.
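If you capture the console output to a text file (say, from a serial console log), the handle lookup described above could be scripted. What follows is only a rough sketch in Python, not something I ran on the blade: the function names are made up for this example, and it assumes the drivers and drvcfg output looks like the excerpts shown in this post (handle in the first column, "Drv[..] Ctr[..]" pairs, with "Ctrl" also accepted in case your firmware spells it that way).

```python
import re

HEX = r"[0-9A-Fa-f]+"


def find_driver_handle(drivers_output, name_fragment="Smart Array"):
    """Return the first-column handle of the first driver line that
    mentions name_fragment, or None if no such line exists."""
    for line in drivers_output.splitlines():
        fields = line.split()
        if name_fragment in line and fields and re.fullmatch(HEX, fields[0]):
            return fields[0]
    return None


def find_controller_handle(drvcfg_output, driver_handle):
    """Return the controller handle that drvcfg lists next to
    driver_handle, or None if the driver is not listed."""
    pattern = re.compile(r"Drv\[(" + HEX + r")\]\s+Ctrl?\[(" + HEX + r")\]")
    for line in drvcfg_output.splitlines():
        match = pattern.search(line)
        if match and match.group(1).upper() == driver_handle.upper():
            return match.group(2)
    return None


def build_drvcfg_command(drivers_output, drvcfg_output):
    """Combine both lookups into the 'drvcfg -s <drv> <ctrl>' command."""
    drv = find_driver_handle(drivers_output)
    if drv is None:
        raise ValueError("no Smart Array driver found in 'drivers' output")
    ctrl = find_controller_handle(drvcfg_output, drv)
    if ctrl is None:
        raise ValueError("driver %s not found in 'drvcfg' output" % drv)
    return "drvcfg -s %s %s" % (drv.lower(), ctrl.lower())


if __name__ == "__main__":
    # Sample lines copied from the console excerpts in this post.
    drivers_text = (
        "A4 00000312 B X X 1 2 Smart Array SAS Driver v 3.12 "
        "MemoryMapped(0xB,0xB)"
    )
    drvcfg_text = "Drv[A4] Ctr[A3] Lang[eng]"
    print(build_drvcfg_command(drivers_text, drvcfg_text))
```

Fed the two excerpts from this post, it reproduces the drvcfg -s a4 a3 command used above. The handles will differ from machine to machine, so treat the script as a convenience for reading the output rather than something to trust blindly.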
Hoff and I had an IRC conversation where we sorted this out, and he also wrote it up on his website as "SmartArray P410i Controller and OpenVMS?". Thanks for the input, Hoff!
Posted at October 6, 2011 5:39 PM