Machine Check Exception




A Machine Check Exception (MCE) occurs when a computer's processor detects an unrecoverable hardware error.


COMMON CAUSES


  • Overclocking

  • Inadequate cooling, leading to overheating

  • Poorly fitted heatsink/fans

  • System bus errors

  • Memory errors that may include parity or Error Correction Code (ECC) problems

  • Cache errors in the processor or hardware

  • Translation Lookaside Buffer (TLB) errors in the processor

  • Other CPU-vendor specific detected hardware problems

  • Vendor-specific detected hardware problems
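
Several of the causes above (bus, memory, cache and TLB errors) are reported by the processor as distinct bit patterns in the low 16 bits of a machine check status register. The following is a minimal sketch of how such a code might be sorted into those categories. It assumes the "compound error code" layout that Intel documents for its machine check architecture; other vendors and processor generations may differ, and the function name and category strings are made up for illustration.

```python
# Sketch only: the bit patterns below follow Intel's documented layout for
# the low 16 bits of a machine check status register. Other vendors may
# encode errors differently, so check the vendor manual before relying on it.

F_BIT = 0x1000  # correction-report filtering flag; not part of the category


def classify_mca_error_code(code: int) -> str:
    """Return a coarse category name for a 16-bit MCA error code (hypothetical helper)."""
    code &= 0xFFFF   # keep only the low 16 bits
    code &= ~F_BIT   # ignore the filtering flag
    if code == 0:
        return "no error"
    if code & 0xE000:                # top bits set: not a pattern handled here
        return "unrecognized"
    if code & 0x0800:                # 0000 1PPT RRRR IILL
        return "bus/interconnect error"
    if (code & 0xFF00) == 0x0100:    # 0000 0001 RRRR TTLL
        return "cache hierarchy error"
    if (code & 0xFF80) == 0x0080:    # 0000 0000 1MMM CCCC
        return "memory controller error"
    if (code & 0xFFF0) == 0x0010:    # 0000 0000 0001 TTLL
        return "TLB error"
    if (code & 0xFFFC) == 0x000C:    # 0000 0000 0000 11LL
        return "generic cache hierarchy error"
    return "other or vendor-specific error"
```

Anything the sketch does not recognize falls through to the last category, which matches the vendor-specific entries in the list above.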



RARE CAUSES


  • Software performing reads or writes to nonexistent memory regions

  • Certain sequences of operations may trigger CPU errata



EXTREMELY RARE CAUSES


  • Cosmic rays causing bits to randomly flip



DECODING MCES


Ultimately, you should contact your hardware vendor to understand the significance of an MCE. However, some utilities can help you get started.

mcelog

ftp://ftp.x86-64.org/pub/linux/tools/mcelog/

On machines with x86-64 processors you can use mcelog, a utility that decodes
the binary machine check events generated by the x86-64 kernel. It should run
as a regular cron job on any x86-64 machine. It can also decode the machine
check panics that the kernel reports for fatal hardware errors.
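
To give a rough idea of what decoding a machine check event involves, here is a minimal sketch that splits a 64-bit status value into its main flags. It is not how mcelog itself is implemented. The bit positions follow Intel's documented layout for the machine check status registers, which is an assumption here (other vendors may differ), and the class name, function name and sample value are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineCheckStatus:
    """Main fields of a 64-bit machine check status value (illustrative only)."""
    valid: bool            # bit 63: the register holds a logged error
    overflow: bool         # bit 62: an earlier error was overwritten
    uncorrected: bool      # bit 61: the hardware could not correct the error
    enabled: bool          # bit 60: reporting was enabled for this error
    misc_valid: bool       # bit 59: the companion MISC register is meaningful
    addr_valid: bool       # bit 58: the companion ADDR register is meaningful
    context_corrupt: bool  # bit 57: processor state may be corrupted
    model_specific_code: int  # bits 31:16
    mca_error_code: int       # bits 15:0


def decode_status(status: int) -> MachineCheckStatus:
    """Split a raw status value into flags and codes (hypothetical helper)."""
    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return MachineCheckStatus(
        valid=bit(63),
        overflow=bit(62),
        uncorrected=bit(61),
        enabled=bit(60),
        misc_valid=bit(59),
        addr_valid=bit(58),
        context_corrupt=bit(57),
        model_specific_code=(status >> 16) & 0xFFFF,
        mca_error_code=status & 0xFFFF,
    )
```

For example, calling decode_status(0xBE00000000800135) on that made-up value reports a valid, uncorrected error with corrupted processor context. Interpreting the two code fields still requires the vendor's documentation.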

parsemce

http://www.codemonkey.org.uk/cruft/parsemce.c/parsemce.c

parsemce, written by Dave Jones for Red Hat, decodes MCEs from AMD K7 processors on Linux.