EDAC, sb_edac: Fix reporting for patrol scrubber errors
author Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Mon, 10 Sep 2018 21:11:45 +0000 (14:11 -0700)
committer Borislav Petkov <bp@suse.de>
Tue, 11 Sep 2018 09:09:54 +0000 (11:09 +0200)
commit 8489b17ce29d9a35a36c08bbea93cdce4c98a6ad
tree 752d188b55996a1182fc587ab1399a71539924c7
parent dcc960b225ceb2bd66c45e0845d03e577f7010f9

sb_edac sometimes reports the wrong DIMM for a memory error found by
the patrol scrubber. That is because the hardware provides only a 4KB
page-aligned address for the error case.

This means that the EDAC driver will point at the DIMM matching offset
0x0 in the 4KB page, but because of interleaving across channels and
ranks, the actual DIMM involved may be different if the error is on some
other cache line within the page.
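
To illustrate the effect described above, here is a minimal sketch that is not
taken from the driver. It assumes 64-byte cache lines and a simple rotating
interleave across channels; real address decoding on these platforms is more
involved. The names channel_for_addr() and channel_from_page_aligned() are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SHIFT 6                  /* 64-byte cache lines (assumed) */
#define PAGE_MASK_4K     (~(uint64_t)0xfff) /* clears the offset within a 4KB page */

/* Hypothetical interleave: consecutive cache lines rotate across channels. */
static unsigned int channel_for_addr(uint64_t addr, unsigned int num_channels)
{
	assert(num_channels > 0);
	return (unsigned int)((addr >> CACHE_LINE_SHIFT) % num_channels);
}

/* What a decoder computes when only the page-aligned address is available. */
static unsigned int channel_from_page_aligned(uint64_t addr,
					      unsigned int num_channels)
{
	return channel_for_addr(addr & PAGE_MASK_4K, num_channels);
}
```

Under these assumptions, an error in the second cache line of a page belongs
to a different channel than the one computed from the page-aligned address,
so the reported DIMM can be wrong.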

Therefore, reconstruct the socket/iMC/channel information from the "mce"
structure passed to the EDAC driver. The DIMM cannot be determined, so
pass "dimm=-1" to the EDAC core. It will report that all the DIMMs on
that channel may be affected.
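
The approach above might be sketched roughly as follows. This is an
illustration, not the code in drivers/edac/sb_edac.c: struct mce_sketch,
struct err_location, bank_to_imc() and locate_scrub_error() are hypothetical
names, and the bank-to-controller mapping is an assumption that varies by CPU
generation. The sketch also assumes that the channel sits in the low four bits
of the machine-check status error code, with 0xf meaning "not specified".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few "mce" fields used in this sketch. */
struct mce_sketch {
	uint64_t status;   /* machine-check status value */
	uint8_t  bank;     /* machine-check bank that logged the error */
	uint8_t  socketid; /* socket that reported the error */
};

struct err_location {
	int socket;
	int imc;     /* integrated memory controller */
	int channel;
	int dimm;    /* -1 means "unknown, any DIMM on the channel" */
};

/* Assumed mapping, for illustration only: banks 7 and 8 are iMC 0 and 1. */
static int bank_to_imc(uint8_t bank)
{
	switch (bank) {
	case 7:
		return 0;
	case 8:
		return 1;
	default:
		return -1;
	}
}

/* Fill in *loc from the record; returns 0 on success, -1 if undecodable. */
static int locate_scrub_error(const struct mce_sketch *m,
			      struct err_location *loc)
{
	int imc, channel;

	assert(m && loc);
	imc = bank_to_imc(m->bank);
	channel = (int)(m->status & 0xf);
	if (imc < 0 || channel == 0xf)
		return -1;

	loc->socket = m->socketid;
	loc->imc = imc;
	loc->channel = channel;
	loc->dimm = -1; /* the DIMM cannot be determined from this data */
	return 0;
}
```

A caller would pass the resulting location, with dimm set to -1, to the
reporting layer so that every DIMM on the channel is flagged as possibly
affected.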

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180907230828.13901-3-tony.luck@intel.com
[ Improve comments on the functions to convert bank number
  to memory controller number. Minor cleanup to commit message. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message more. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
drivers/edac/sb_edac.c