I've spent the last few days in New York doing our annual disaster recovery test.
This year, due to some intense pressure on the applications side (translation: they have higher-priority things to do), we had no applications team involvement in the test.
This made my part of the test quite simple: I restored only my smallest cluster rather than all four of the critical clusters.
The big hitch this year was that, despite careful preparation by Computer Operations, a canister failed to ship to the remote site. Of course, this contained my documentation, so I had to do the recovery without any instructions.
Fortunately, I know the systems really well, and this was not much of a problem. However, if it had been one of the two huge production clusters, I think I would have been struggling.
Even without the documentation, I discovered some things on the cluster that can be improved. For example, the command procedure that mounts the disks is not on the system disk. This meant that I had to restore a second disk before I could figure out the logical-to-physical mapping of the disks. I took a guess as to which physical disk to restore it to, but I guessed wrong and ended up restoring the disk twice.
Other than that, and the network being delayed (because of the same missing canister), the test was 100% successful.
Posted at November 19, 2003 3:46 PM