patch-2.4.10 linux/Documentation/nmi_watchdog.txt

diff -u --recursive --new-file v2.4.9/linux/Documentation/nmi_watchdog.txt linux/Documentation/nmi_watchdog.txt
@@ -1,19 +1,27 @@
 
-Is your SMP system locking up unpredictably? No keyboard activity, just
+Is your ix86 system locking up unpredictably? No keyboard activity, just
 a frustrating complete hard lockup? Do you want to help us debugging
 such lockups? If all yes then this document is definitely for you.
 
-on Intel SMP hardware there is a feature that enables us to generate
-'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt - these get
-executed even if the system is otherwise locked up hard) This can be
-used to debug hard kernel lockups. By executing periodic NMI interrupts,
-the kernel can monitor whether any CPU has locked up, and print out
-debugging messages if so.  You can enable/disable the NMI watchdog at boot
-time with the 'nmi_watchdog=1' boot parameter. Eg. the relevant
-lilo.conf entry:
+On Intel and similar ix86 type hardware there is a feature that enables
+us to generate 'watchdog NMI interrupts'.  (NMI: Non Maskable Interrupt
+which get executed even if the system is otherwise locked up hard).
+This can be used to debug hard kernel lockups.  By executing periodic
+NMI interrupts, the kernel can monitor whether any CPU has locked up,
+and print out debugging messages if so.  You must enable the NMI
+watchdog at boot time with the 'nmi_watchdog=n' boot parameter.  Eg.
+the relevant lilo.conf entry:
 
         append="nmi_watchdog=1"
 
+For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1.
+For UP machines without an IO-APIC use nmi_watchdog=2, this only works
+for some processor types.  If in doubt, boot with nmi_watchdog=1 and
+check the NMI count in /proc/interrupts; if the count is zero then
+reboot with nmi_watchdog=2 and check the NMI count.  If it is still
+zero then log a problem, you probably have a processor that needs to be
+added to the nmi code.
+
 A 'lockup' is the following scenario: if any CPU in the system does not
 execute the period local timer interrupt for more than 5 seconds, then
 the NMI handler generates an oops and kills the process. This
@@ -24,8 +32,9 @@
 cannot even accept NMI interrupts, or the crash has made the kernel
 unable to print messages.
 
-NOTE: currently the NMI-oopser is enabled unconditionally on x86 SMP
-boxes.
+NOTE: starting with 2.4.2-ac18 the NMI-oopser is disabled by default,
+you have to enable it with a boot time parameter.  Prior to 2.4.2-ac18
+the NMI-oopser is enabled unconditionally on x86 SMP boxes.
 
 [ feel free to send bug reports, suggestions and patches to
   Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
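
The "if in doubt" procedure added by this patch (boot with nmi_watchdog=1,
check the NMI count in /proc/interrupts, retry with nmi_watchdog=2 if it is
zero) can be sketched as a small script. This is only an illustration and is
not part of the patch: the function name nmi_count is hypothetical, and the
sketch assumes the NMI row is labelled "NMI:" and is followed by one integer
counter per CPU.

```python
def nmi_count(interrupts_text):
    """Return the total NMI count summed over all CPUs.

    Returns None when the text has no row labelled "NMI" (assumed label).
    """
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "NMI":
            total = 0
            for field in rest.split():
                # Counters come first; stop at any trailing description text.
                if not field.isdigit():
                    break
                total += int(field)
            return total
    return None


# Typical use on a running system (path taken from the patched text):
#     with open("/proc/interrupts") as f:
#         count = nmi_count(f.read())
# A count of 0 after booting with nmi_watchdog=1 suggests rebooting with
# nmi_watchdog=2 and checking again, as the document describes.
```

A result that stays at zero under both settings corresponds to the case where
the document asks for a problem report.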
