As you know, I recently accepted permanent employment as an OpenVMS Systems Specialist (yeah, yeah, I know I have to do Un*x/Windoze server support too). As the technical support team is rather thin on the ground at the moment, now that some positions have become vacant, I was asked if I was contactable over the holidays between Christmas and the New Year. Well, I could hardly say "no".
Needless to say, I was called multiple times about various application issues that occurred after a blade enclosure was power cycled (eek) to clear a stuck mezzanine card. Of course, Murphy's Law dictated that half the production cluster was in that enclosure.
On investigating the startup issue, it became apparent that the startup on one of the two nodes that rebooted had gone into some sort of infinite loop. On tracking down the problem, I discovered a wonderful piece of DCL that, if run by multiple nodes at the same time, would undoubtedly suffer temporary filename collisions. The command procedure created a non-unique temp file, and then deleted all versions of it.
Of course, the temp file was being used as input to sysman via the following horrible hack:
$ temp_file = "sys$manager:temp.tmp"
$ define/nolog sys$input 'temp_file'
$ mc sysman
$ deassign sys$input
$ delete/nolog 'temp_file';*
Erk. What happens if the file's not there? You guessed it: sysman goes into an infinite loop requesting a command with an alphabetic first character! What evidently happened in the startup was that one node deleted the file while the other was still creating it. Bad, bad, bad.
The proper way to do this is of course to use the f$unique() lexical function to ensure you have no temp file name collisions. In addition, let's use sysman's @ command to input the temp file:
$ temp_file = "sys$manager:" + f$unique() + ".tmp"
$ mc sysman @'temp_file'
$ delete/nolog 'temp_file';*
Even if the temp file doesn't exist, you at least don't get an infinite loop; you just get a nice %RMS-E-FNF, file not found message.
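The snippets above don't show how the temp file gets its contents. As a minimal sketch, assuming the procedure writes the SYSMAN commands itself, the whole pattern might look like the following. The logical name `cmd` and the SYSMAN commands written to the file (`set environment/cluster`, `do show time`) are hypothetical placeholders, not the commands from the original startup procedure. The `f$search` check is an extra guard so sysman is only invoked when the command file actually exists.

```dcl
$ ! Hypothetical example: the SYSMAN commands written below are placeholders.
$ ! F$UNIQUE() returns a string suitable for a file name that is unique
$ ! across the cluster, so two nodes running this concurrently cannot collide.
$ temp_file = "sys$manager:" + f$unique() + ".tmp"
$ open/write cmd 'temp_file'
$ write cmd "set environment/cluster"
$ write cmd "do show time"
$ close cmd
$ ! Only run SYSMAN if the command file was actually created.
$ if f$search(temp_file) .nes. ""
$ then
$     mc sysman @'temp_file'
$     delete/nolog 'temp_file';*
$ endif
```

Because each node generates its own file name, one node's `delete` can no longer remove a file another node is still writing or reading.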