This time the crash was on a development system. A little back story: the system this occurred on runs on a blade system, and the management processor in the blade system had been acting flaky. HP decided to swap out the blade, but the engineer on site forgot to back up the NVRAM before swapping out the old one, and we ended up with an unbootable box. After much stuffing around, we managed to boot the box in a non-standard way. A day later, we saw this crash.
Crashdump Summary Information:
------------------------------
Crash Time: 12-AUG-2011 13:18:50.79
Bugcheck Type: PGFIPLHI, Pagefault with IPL too high
Node: xxxxxx (Cluster)
CPU Type: HP BL860c (1.59GHz/9.0MB)
VMS Version: V8.3-1H1
Current Process: NULL
Current Image: <not available>
Failing PC: FFFFFFFF.800F9ED1 EXE$PAL_REMQUEQ_C+001C1
Failing PS: 00000000.00000608
Module: SYSTEM_PRIMITIVES_MIN (Link Date/Time: 2-JUL-2010 17:35:27.47)
Offset: 000C9ED1
Boot Time: 10-AUG-2011 15:26:14.00
System Uptime: 1 21:52:36.79
Crash/Primary CPU: 0./0.
System/CPU Type: 4020
Saved Processes: 394
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 16383 MByte (134742016 PFNs, discontiguous memory)
Dumpfile Pagelets: 3497462 blocks
Dump Flags: olddump,writecomp,errlogcomp
Dump Type: compressed,selective,dosd,shared_mem
EXE$GL_FLAGS: poolpging,init,bugdump,tbchk
Paging Files: 2 Pagefiles and 1 Swapfile installed
Stack Pointers:
KSP = FFFFFFFF.B34C57D0 ESP = FFFFFFFF.B281B000 SSP = FFFFFFFF.B280F000
USP = FFFFFFFF.B280F000
General Registers:
R0 = 00000000.00000000 GP = FFFFFFFF.AD4D8600 R2 = FFFFFFFF.AD04CAE8
R3 = 80000000.00000006 R4 = FFFFFFFF.70A94000 R5 = 00000000.00000000
R6 = FFFFFFFF.B34C5800 R7 = 00000000.00000006 R8 = 00000000.00000207
R9 = 00000000.00000009 R10 = 00000000.00000001 R11 = 00000000.00000000
SP = 00000000.00000000 TP = 00000000.00000000 R14 = FFFFFFFF.FFFFFFFC
R15 = FFFFFFFF.AD1E7F88 R16 = FFFFFFFF.80000CE0 R17 = 00000000.0000035C
R18 = 00000000.00000002 R19 = 00000000.00000000 R20 = 00000000.00000000
R21 = 00000000.7FFF0278 R22 = 00000000.00000358 R23 = FFFFFFFF.B34C58C0
R24 = 00000000.00000600 AI = 00000000.00000006 RA = FFFFFFFF.B34C57E8
PV = FFFFFFFF.B34C57E0 R28 = 00000000.00000000 FP = FFFFFFFF.B34C57D0
R30 = FFFFFFFF.81977C10 R31 = 30000000.00000000
Pagefault Information:
Faulting Virtual Address FFFFFFFF.70A94000
Memory Management Flags 00000000.00000000 Read Data Fault
Exception Frame:
Exception taken at IP FFFFFFFF.800F9ED0, slot 01 from Kernel mode
Trap Type 00000009 (Translation not valid fault)
IVT Offset 00000800 (Data TLB Fault)
Control Registers:
CR0 Default Control Register (DCR) 00000000.00007F00
CR16 Processor Status Register (IPSR) 00001210.08022030
CR17 Interrupt Status Register (ISR) 00000A04.00000000
CR19 Instruction Pointer (IIP) FFFFFFFF.800F9ED0
CR20 Faulting Address (IFA) FFFFFFFF.70A94000
CR21 TLB Insertion Register (ITIR) 00000000.00000334
CR22 Instruction Previous Address (IIPA) FFFFFFFF.800F9ED0
CR23 Function State (IFS) 80000000.00000006
CR24 Instruction immediate (IIM) 00000000.00000000
CR25 VHPT Hash Address (IHA) FFFFFFFF.7FFF2920
Application Registers:
AR16 Register Stack Config Reg (RSC) 00000000.00000003
AR17 Backing Store Pointer (BSP) FFFFFFFF.70A88328
AR18 Backing Store for Mem Store (BSPSTORE) FFFFFFFF.70A88168
AR19 RSE NaT Collection Register (RNAT) 00000000.00000000
AR32 Compare/Exchange Comp Value Reg (CCV) FFFFFFFF.00000000
AR36 User NaT Collection Register (UNAT) 00000000.00000000
AR64 Previous Function State (PFS) 00000000.00000C9F
AR65 Loop Count Register (LC) 00000000.00000000
AR66 Epilog Count Register (EC) 00000000.00000000
Processor Status Register (IPSR):
AC = 0 MFL= 1 MFH= 1 IC = 1 I = 0 DT = 1
DFL= 0 DFH= 0 RT = 1 CPL= 0 IT = 1 MC = 0 RI = 1
Interrupt Status Register (ISR):
Code 00000000 X = 0 W = 0 R = 1 NA = 0 SP = 0
RS = 0 IR = 0 NI = 0 SO = 0 EI = 1 ED = 1
Branch Registers:
B0 FFFFFFFF.819BAF90
B1 00000000.00000000
B2 00000000.00000000
B3 00000000.00000000
B4 00000000.00000000
B5 00000000.00000000
B6 FFFFFFFF.800F9D20
B7 FFFFFFFF.81977C10
Floating Point Registers: FPSR 0009804C.8A70033F
F6 00000000.0001003E.00000000.00016A76
F7 00000000.0001003E.00000000.00000407
F8 00000000.0001003E.00000000.0000005A
F9 00000000.0001003E.0000E7F9.999A6494
F10 00000000.0001003E.00000000.00016A76
F11 00000000.0001003E.00000000.A3D70A3E
Miscellaneous Registers:
Interrupt Priority Level (IPL) 00000006
Stack Align 000002D0
NaT Mask 0000
PPrev Mode 00
Previous Stack 00
Interrupt Depth 03
Preds 40000000.0001F059
Nats 00000000.00000000
Context 40000000.0001F20B
General Registers:
R0 00000000.00000000 GP FFFFFFFF.AD3E5C00 R2 00000000.0000023C
R3 FFFFFFFF.88C75574 R4 00000000.7FF43B20 R5 00000000.7FF43B40
R6 0009804C.0270033F R7 00000000.0000003E R8 00000000.00000006
R9 FFFFFFFF.88C75570 R10 00000000.00000001 R11 00000000.00000001
SP FFFFFFFF.B34C5AD0 TP 00000000.00000000 R14 00000000.00000006
R15 FFFFFFFF.AD735790 R16 FFFFF804.09C00A00 R17 00000000.00000000
R18 00000000.00000000 R19 FFFFFFFF.800F9D10 R20 FFFFFFFF.70A94000
R21 00000000.00000000 R22 FFFFFFFF.00000000 R23 00000000.00000000
R24 00000000.00000000 R25 00000000.00000001 R26 00000000.00000000
R27 00000000.00000003 R28 FFFFFFFF.893A0074 R29 FFFFFFFF.B34C5AD0
R30 FFFFFFFF.89645C30 R31 00000000.00000000
System Registers:
Page Table Base Register (PTBR) 00000000.00000000
Processor Base Register (PRBR) FFFFFFFF.88050000
Privileged Context Block Base (PCBB) FFFFFFFF.88050080
System Control Block Base (SCBB) 6D6D6D6D.6D6D6D6D
Software Interrupt Summary Register (SISR) 00000000.00000000
Address Space Number (ASN) 00000000.00000000
AST Summary / AST Enable (ASTSR_ASTEN) 00000000.00000000
Floating-Point Enable (FEN) 00000000.00000001
Interrupt Priority Level (IPL) 00000000.00000006
Machine Check Error Summary (MCES) 00000000.00000000
Virtual Page Table Base Register (VPTB) 00000000.00000000
Failing Instruction:
EXE$PAL_REMQUEQ_C+001C1: ld8 r30 = [r20], 008
Instruction Stream (last 20 instructions):
EXE$PAL_REMQUEQ_C+00170: nop.m 000000
EXE$PAL_REMQUEQ_C+00171: cmp.eq p0, p6 = r14, r0
EXE$PAL_REMQUEQ_C+00172: (p6) br.cond.spnt.few 1FFF400
EXE$PAL_REMQUEQ_C+00180: tak r22 = r30 ;;
EXE$PAL_REMQUEQ_C+00181: nop.m 000000
EXE$PAL_REMQUEQ_C+00182: cmp.eq p6, p0 = 01, r22 ;;
EXE$PAL_REMQUEQ_C+00190: nop.m 000000
EXE$PAL_REMQUEQ_C+00191: (p6) mov r17 = r30
EXE$PAL_REMQUEQ_C+00192: (p6) br.cond.spnt.few 00000D0
EXE$PAL_REMQUEQ_C+001A0: probe.w r22 = r30, r31 ;;
EXE$PAL_REMQUEQ_C+001A1: nop.m 000000
EXE$PAL_REMQUEQ_C+001A2: cmp.eq p6, p0 = r22, r0 ;;
EXE$PAL_REMQUEQ_C+001B0: (p6) mov r17 = r30
EXE$PAL_REMQUEQ_C+001B1: (p6) br.cond.spnt.few 1FFF330
EXE$PAL_REMQUEQ_C+001B2: br.few 0000030
EXE$PAL_REMQUEQ_C+001C0: rsm 004000
EXE$PAL_REMQUEQ_C+001C1: ld8 r30 = [r20], 008
EXE$PAL_REMQUEQ_C+001C2: nop.i 000000 ;;
EXE$PAL_REMQUEQ_C+001D0: ld8 r10 = [r20], 1F8
EXE$PAL_REMQUEQ_C+001D1: nop.f 000000
EXE$PAL_REMQUEQ_C+001D2: nop.i 000000
EXE$PAL_REMQUEQ_C+001E0: add r14 = 0008, r30 ;;
EXE$PAL_REMQUEQ_C+001E1: st8 [r14] = r10
EXE$PAL_REMQUEQ_C+001E2: nop.i 000000
EXE$PAL_REMQUEQ_C+001F0: st8 [r10] = r30
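Reading the instruction stream from the faulting ld8 onward, the code looks like a standard removal from an absolute, doubly linked quadword queue: fetch the entry's forward link from [r20] and its backward link from [r20+8], then store the backward link into the successor (add r14 = 0008, r30; st8 [r14] = r10) and the forward link into the predecessor (st8 [r10] = r30). R20 holds FFFFFFFF.70A94000, the same value as the faulting virtual address, so the very first load through the entry pointer failed. The Python sketch below models that sequence against a toy address space with 8 KB pages. The `Memory` class, the `PageFault` exception and the `remque` helper are hypothetical names for illustration only; this is not the real EXE$PAL_REMQUEQ code, which does extra work visible earlier in the listing (such as the probe.w access check) that is not modelled here.

```python
PAGE_SIZE = 8192  # 8 KB pages, as reported in the dump header


class PageFault(Exception):
    """Raised when a load or store touches an address with no valid mapping."""


class Memory:
    """Toy 64-bit address space: quadwords keyed by address, pages mapped explicitly."""

    def __init__(self):
        self.valid_pages = set()
        self.quads = {}

    def map_page(self, va):
        self.valid_pages.add(va // PAGE_SIZE)

    def _check(self, va, access):
        # No valid mapping for this page: the hardware would take a
        # translation fault here (Data TLB Fault in the dump).
        if va // PAGE_SIZE not in self.valid_pages:
            raise PageFault(f"{access} fault at {va:016X}")

    def ld8(self, va):
        self._check(va, "read")
        return self.quads.get(va, 0)

    def st8(self, va, value):
        self._check(va, "write")
        self.quads[va] = value


def remque(mem, entry):
    """Unlink 'entry' from an absolute queue (flink at +0, blink at +8)."""
    flink = mem.ld8(entry)        # ld8 r30 = [r20], 008
    blink = mem.ld8(entry + 8)    # ld8 r10 = [r20], 1F8 (r20 is now entry+8)
    mem.st8(flink + 8, blink)     # add r14 = 0008, r30 ;; st8 [r14] = r10
    mem.st8(blink, flink)         # st8 [r10] = r30
    return entry


if __name__ == "__main__":
    mem = Memory()
    HEAD, ENTRY = 0x2000, 0x2010  # hypothetical addresses on one mapped page
    mem.map_page(HEAD)
    # Build a one-element queue: HEAD <-> ENTRY.
    mem.st8(HEAD, ENTRY)
    mem.st8(HEAD + 8, ENTRY)
    mem.st8(ENTRY, HEAD)
    mem.st8(ENTRY + 8, HEAD)

    remque(mem, ENTRY)
    print("empty after removal:", mem.ld8(HEAD) == HEAD == mem.ld8(HEAD + 8))

    # A corrupt entry pointer (the value in R20) faults on the first load.
    try:
        remque(mem, 0xFFFFFFFF70A94000)
    except PageFault as exc:
        print(exc)  # read fault at FFFFFFFF70A94000
```

In the real crash the fault was taken at IPL 6 (see the Miscellaneous Registers section), where the pager cannot run, so instead of resolving the fault the system bugchecked with PGFIPLHI.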
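SDA's decode of the IPSR and ISR fields can be reproduced by hand, which is a handy cross-check when reading a dump. The Python sketch below assumes the Itanium bit layouts for these registers (for example ISR.r at bit 34, ISR.ei at bits 41-42, PSR.i at bit 14, PSR.ri at bits 41-42) and lists only the fields SDA prints; `parse_quad` and `decode` are illustrative helpers, not part of any VMS tool.

```python
# (lsb, width) for each field, per the Itanium PSR and ISR layouts.
# Only the fields that SDA prints in the dump are listed.
PSR_FIELDS = {
    "AC": (3, 1), "MFL": (4, 1), "MFH": (5, 1), "IC": (13, 1), "I": (14, 1),
    "DT": (17, 1), "DFL": (18, 1), "DFH": (19, 1), "RT": (27, 1),
    "CPL": (32, 2), "MC": (35, 1), "IT": (36, 1), "RI": (41, 2),
}

ISR_FIELDS = {
    "Code": (0, 16), "X": (32, 1), "W": (33, 1), "R": (34, 1), "NA": (35, 1),
    "SP": (36, 1), "RS": (37, 1), "IR": (38, 1), "NI": (39, 1), "SO": (40, 1),
    "EI": (41, 2), "ED": (43, 1),
}


def parse_quad(text):
    """Turn SDA's 'HHHHHHHH.LLLLLLLL' notation into an integer."""
    return int(text.replace(".", ""), 16)


def decode(value, fields):
    """Extract each (lsb, width) bit field from a 64-bit register value."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in fields.items()}


ipsr = decode(parse_quad("00001210.08022030"), PSR_FIELDS)
isr = decode(parse_quad("00000A04.00000000"), ISR_FIELDS)
print("IPSR:", ipsr)
print("ISR: ", isr)
```

Applied to the dump's values, it gives R=1 (a read access), EI=1 (the excepting slot, matching "slot 01" in the exception frame) and ED=1 for the ISR, and I=0, IC=1, RI=1 for the IPSR. PSR.i being clear means external interrupts were disabled when the fault was taken, consistent with the rsm 004000 (clear PSR bit 14) immediately before the faulting instruction.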
Posted at August 15, 2011 5:59 PM
This crash footprint is known to HP. The crash happens when the Network File System (NFS) accesses a corrupt TCPIP queue header. The problem is resolved by installing HP-I64VMS-TCPIP-V0506-9ECO5-1.
Posted by: Brodders at August 16, 2011 7:01 PM
Thanks John,
A patching we will go.
Posted by: Jim Duff at August 16, 2011 9:37 PM