mm/hwpoison: introduce per-memory_block hwpoison counter
author Naoya Horiguchi <naoya.horiguchi@nec.com>
Mon, 24 Oct 2022 06:20:12 +0000 (15:20 +0900)
committer Andrew Morton <akpm@linux-foundation.org>
Wed, 9 Nov 2022 01:37:22 +0000 (17:37 -0800)
commit 5033091de814ab4b5623faed2755f3064e19e2d2
tree 7edde57024f06ca1015a0cc59e0173d848d04ade
parent a46c9304b4bbf1b164154976cbb7e648980c7b5b

Currently, the PageHWPoison flag does not behave well across memory
hotremove/hotplug.  Any data field in struct page is unreliable once the
associated memory is offlined, and the current mechanism can't tell
whether a memory block is being onlined because a new memory device was
installed or because a previously failed offline operation is being undone.
When the block contains hwpoisoned memory in particular, it's unclear what
the best option is.

So introduce a new mechanism to make struct memory_block remember that a
memory block has hwpoisoned memory inside it.  And make any online event
fail if the onlining memory block contains hwpoison.  struct memory_block
is freed and reallocated over ACPI-based hotremove/hotplug, but not over
sysfs-based hotremove/hotplug.  So the new counter can distinguish these
cases.
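
The behaviour described above can be illustrated with a small user-space
model.  This is only a sketch, not the kernel code from this patch: the
names struct model_memory_block, model_block_init(), model_poison_inc(),
model_poison_sub(), model_block_online() and model_block_offline() are
hypothetical, and C11 atomics stand in for the kernel's atomic types.
Re-running model_block_init() models the struct being freed and
reallocated (the ACPI-based case); skipping it models sysfs-based
offline/online, where the counter survives.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* fallback for libcs that do not define it */
#endif

/* Hypothetical stand-in for struct memory_block; not the kernel layout. */
struct model_memory_block {
	atomic_long nr_hwpoison;	/* hwpoisoned pages recorded for this block */
	int online;
};

/* Models allocation of a fresh block object: the counter starts at zero. */
static void model_block_init(struct model_memory_block *mem)
{
	atomic_init(&mem->nr_hwpoison, 0);
	mem->online = 0;
}

/* Record one newly hwpoisoned page inside the block. */
static void model_poison_inc(struct model_memory_block *mem)
{
	atomic_fetch_add(&mem->nr_hwpoison, 1);
}

/* Drop n recorded pages, e.g. after they have been unpoisoned. */
static void model_poison_sub(struct model_memory_block *mem, long n)
{
	long old = atomic_fetch_sub(&mem->nr_hwpoison, n);

	assert(old >= n);	/* the counter must never go negative */
}

/* Offlining keeps the object, and therefore the counter, intact. */
static void model_block_offline(struct model_memory_block *mem)
{
	mem->online = 0;
}

/* Onlining is refused while the block still records hwpoisoned pages. */
static int model_block_online(struct model_memory_block *mem)
{
	if (atomic_load(&mem->nr_hwpoison) != 0)
		return -EHWPOISON;
	mem->online = 1;
	return 0;
}
```

In this model, an offline followed by an online on the same object fails
while the counter is non-zero, whereas onlining a freshly initialised
object succeeds.  That is the distinction the commit message draws between
the sysfs-based and ACPI-based paths.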

Link: https://lkml.kernel.org/r/20221024062012.1520887-5-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drivers/base/memory.c
include/linux/memory.h
include/linux/mm.h
mm/internal.h
mm/memory-failure.c
mm/sparse.c