From: Longlong Xia <xialonglong@kylinos.cn>

When a hardware memory error occurs on a KSM page, the current
behavior is to kill all processes mapping that page. This can be
overly aggressive when KSM has multiple duplicate pages in a chain
where other duplicates are still healthy.

This patch introduces a recovery mechanism that attempts to migrate
mappings from the failing page to another healthy duplicate within
the same chain before resorting to killing processes.

The recovery process works as follows:
1. When a memory failure is detected on a KSM page, identify whether
   the failing node is part of a chain (has duplicates). (Maybe add a
   dup_head field to struct stable_node to record the head node, which
   would avoid searching the whole stable tree; or find the head node
   some other way.)
2. Search for another healthy duplicate page within the same chain.
3. For each process mapping the failing page:
   - Update the PTE to point to the healthy duplicate page. (Maybe
     reuse replace_page, or split replace_page into smaller functions
     and use the common part.)
   - Migrate the rmap_item to the new stable node.
4. If all migrations succeed, remove the failing node from the chain.
5. Only kill processes if recovery is impossible or fails.

The original idea came from Naoya Horiguchi:
https://lore.kernel.org/all/20230331054243.GB1435482@hori.linux.bs1.fc.nec.co.jp/

I tested it with /sys/kernel/debug/hwpoison/corrupt-pfn in
qemu-x86_64. Here are my test steps and results:

1. Allocate 1024 pages with the same content and enable KSM to merge
   them. After merging (the same phy_addr is only printed once):
   a. virtual addr = 0x7e4c68a00000 phy_addr = 0x10e802000
   b. virtual addr = 0x7e4c68b2c000 phy_addr = 0x10e902000
   c. virtual addr = 0x7e4c68c26000 phy_addr = 0x10ea02000
   d. virtual addr = 0x7e4c68d20000 phy_addr = 0x10eb02000

2. echo 0x10e802 > /sys/kernel/debug/hwpoison/corrupt-pfn
   a. virtual addr = 0x7e4c68a00000 phy_addr = 0x10eb02000
   b. virtual addr = 0x7e4c68b2c000 phy_addr = 0x10e902000
   c. virtual addr = 0x7e4c68c26000 phy_addr = 0x10ea02000
   d. virtual addr = 0x7e4c68d20000 phy_addr = 0x10eb02000 (shared with a)

3. echo 0x10eb02 > /sys/kernel/debug/hwpoison/corrupt-pfn
   a. virtual addr = 0x7e4c68a00000 phy_addr = 0x10ea02000
   b. virtual addr = 0x7e4c68b2c000 phy_addr = 0x10e902000
   c. virtual addr = 0x7e4c68c26000 phy_addr = 0x10ea02000 (shared with a)
   d. virtual addr = 0x7e4c68c58000 phy_addr = 0x10ea02000 (shared with a)

4. echo 0x10ea02 > /sys/kernel/debug/hwpoison/corrupt-pfn
   a. virtual addr = 0x7e4c68a00000 phy_addr = 0x10e902000
   b. virtual addr = 0x7e4c68a32000 phy_addr = 0x10e902000 (shared with a)
   c. virtual addr = 0x7e4c68a64000 phy_addr = 0x10e902000 (shared with a)
   d. virtual addr = 0x7e4c68a96000 phy_addr = 0x10e902000 (shared with a)

5. echo 0x10e902 > /sys/kernel/debug/hwpoison/corrupt-pfn
   MCE: Killing ksm_test:531 due to hardware memory corruption fault at 7e4c68a00000

kernel log:

Injecting memory failure at pfn 0x10e802
Memory failure: 0x10e802: recovery action for dirty LRU page: Recovered
Injecting memory failure at pfn 0x10eb02
Memory failure: 0x10eb02: recovery action for dirty LRU page: Recovered
Injecting memory failure at pfn 0x10ea02
Memory failure: 0x10ea02: recovery action for dirty LRU page: Recovered
Injecting memory failure at pfn 0x10e902
Memory failure: 0x10e902: recovery action for dirty LRU page: Recovered
MCE: Killing ksm_test:531 due to hardware memory corruption fault at 7e4c68a00000

Thanks for review and comments!

Longlong Xia (1):
  mm/ksm: Add recovery mechanism for memory failures

 mm/ksm.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 183 insertions(+)

-- 
2.43.0
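The recovery flow above can be sketched in userspace terms. This is a
minimal simulation of the chain bookkeeping only, not the actual
mm/ksm.c implementation: the names (ksm_dup, find_healthy_dup,
handle_memory_failure) and the flat-array "chain" are illustrative
stand-ins for the kernel's stable-node chain, rmap walks, and PTE
updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DUPS 4

/* Illustrative stand-in for one duplicate page in a stable-node chain. */
struct ksm_dup {
	int nmapped;     /* processes whose PTEs point at this page */
	bool poisoned;   /* hardware memory error reported */
	bool in_chain;   /* still linked into the chain */
};

/* Step 2: search the chain for a healthy duplicate other than the failing one. */
static struct ksm_dup *find_healthy_dup(struct ksm_dup *chain, int n,
					struct ksm_dup *failing)
{
	for (int i = 0; i < n; i++)
		if (chain[i].in_chain && !chain[i].poisoned &&
		    &chain[i] != failing)
			return &chain[i];
	return NULL;
}

/*
 * Steps 1 and 3-5: on failure, move every mapping of the failing page
 * over to a healthy duplicate and unlink the failing node.  Returns the
 * number of processes that would have to be killed (0 on full recovery).
 */
static int handle_memory_failure(struct ksm_dup *chain, int n,
				 struct ksm_dup *failing)
{
	struct ksm_dup *healthy;

	failing->poisoned = true;
	healthy = find_healthy_dup(chain, n, failing);
	if (!healthy)
		return failing->nmapped;	/* step 5: recovery impossible */

	healthy->nmapped += failing->nmapped;	/* step 3: remap users */
	failing->nmapped = 0;
	failing->in_chain = false;		/* step 4: drop failing node */
	return 0;
}
```

Poisoning the duplicates one by one, as in the test log above, shows
the same shape of result: the first failures are absorbed by remapping,
and only when the last duplicate fails do the remaining mappers get
killed.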
On 09.10.25 09:00, Longlong Xia wrote:
> From: Longlong Xia <xialonglong@kylinos.cn>
> 
> When a hardware memory error occurs on a KSM page, the current
> behavior is to kill all processes mapping that page. This can
> be overly aggressive when KSM has multiple duplicate pages in
> a chain where other duplicates are still healthy.
> 
> This patch introduces a recovery mechanism that attempts to migrate
> mappings from the failing page to another healthy duplicate within
> the same chain before resorting to killing processes.

An alternative could be to allocate a new page and effectively migrate
from the old (degraded) page to the new page by copying page content
from one of the healthy duplicates. That would keep the #mappings per
page in the chain balanced.

> 
> The recovery process works as follows:
> 1. When a memory failure is detected on a KSM page, identify whether
>    the failing node is part of a chain (has duplicates). (Maybe add a
>    dup_head field to struct stable_node to record the head node, or
>    find the head node some other way.)
> 2. Search for another healthy duplicate page within the same chain.
> 3. For each process mapping the failing page:
>    - Update the PTE to point to the healthy duplicate page. (Maybe
>      reuse replace_page, or split replace_page into smaller functions
>      and use the common part.)
>    - Migrate the rmap_item to the new stable node.
> 4. If all migrations succeed, remove the failing node from the chain.
> 5. Only kill processes if recovery is impossible or fails.

Does not sound too crazy. But how realistic do we consider that in
practice? We need quite a bunch of processes to dedup the same page to
end up getting duplicates in the chain IIRC. So isn't this rather an
improvement only for less likely scenarios in practice?

-- 
Cheers

David / dhildenb
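The alternative suggested in the reply can be sketched in the same
userspace terms. Again a hedged illustration, not kernel code: the
function name migrate_to_new_page and the plain malloc/memcpy stand in
for page allocation, content copy from a healthy duplicate, and the
PTE rewrites the kernel would actually perform.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/*
 * Alternative recovery: instead of piling the failing page's mappings
 * onto an existing duplicate, allocate a fresh page, copy the content
 * from any healthy duplicate (every duplicate in the chain holds the
 * same content), and point the failing page's users at the new copy.
 * Returns the replacement page, or NULL if allocation fails.
 */
static unsigned char *migrate_to_new_page(const unsigned char *healthy_dup)
{
	unsigned char *newpage = malloc(PAGE_SIZE);

	if (!newpage)
		return NULL;
	memcpy(newpage, healthy_dup, PAGE_SIZE);
	/*
	 * In the kernel, each user's PTE would be rewritten here to point
	 * at newpage; no existing duplicate absorbs extra mappings, so the
	 * per-page mapping counts in the chain stay balanced.
	 */
	return newpage;
}
```

The trade-off versus the posted approach is an extra page allocation
and copy on the failure path in exchange for keeping the chain's
mapping distribution unchanged.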