arch/x86/kernel/e820.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-)
The current implementation of e820__register_nosave_regions suffers from
multiple serious issues:
- The end of last region is tracked by PFN, causing it to find holes
that aren't there if two consecutive subpage regions are present
- The nosave PFN ranges derived from holes are rounded out (instead of
rounded in) which makes it inconsistent with how explicitly reserved
regions are handled
Fix this by:
- Treating reserved regions as if they were holes, to ensure consistent
handling (rounding out nosave PFN ranges is more correct as the
kernel does not use partial pages)
- Tracking the end of the last RAM region by address instead of pages
to detect holes more precisely
Cc: stable@vger.kernel.org
Fixes: e5540f875404 ("x86/boot/e820: Consolidate 'struct e820_entry *entry' local variable names")
Reported-by: Roberto Ricci <io@r-ricci.it>
Closes: https://lore.kernel.org/all/Z4WFjBVHpndct7br@desktop0a/
Signed-off-by: Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz>
---
The issue of the kernel failing to resume from hibernation after
kexec_load() is used is likely due to kexec-tools passing in a different
e820 memory map from the one provided by system firmware, causing the
e820 consistency check to fail. That issue is not addressed in this
patch and will need to be fixed in kexec-tools instead.
Changes in v3:
- Round out nosave PFN ranges since the kernel does not use partial
pages
- Link to v2: https://lore.kernel.org/r/20250405-fix-e820-nosave-v2-1-d40dbe457c95@qtmlabs.xyz
Changes in v2:
- Updated author details
- Rewrote commit message
- Link to v1: https://lore.kernel.org/r/20250405-fix-e820-nosave-v1-1-162633199548@qtmlabs.xyz
---
arch/x86/kernel/e820.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0749cc3c23844eeb36820705687e08bbf7..9d8dd8deb2a702bd34b961ca4f5eba8a8d9643d0 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
{
int i;
- unsigned long pfn = 0;
+ u64 last_addr = 0;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
- if (pfn < PFN_UP(entry->addr))
- register_nosave_region(pfn, PFN_UP(entry->addr));
-
- pfn = PFN_DOWN(entry->addr + entry->size);
-
if (entry->type != E820_TYPE_RAM)
- register_nosave_region(PFN_UP(entry->addr), pfn);
+ continue;
- if (pfn >= limit_pfn)
- break;
+ if (last_addr < entry->addr)
+ register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+ last_addr = entry->addr + entry->size;
}
+
+ register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
}
#ifdef CONFIG_ACPI
---
base-commit: a8662bcd2ff152bfbc751cab20f33053d74d0963
change-id: 20250405-fix-e820-nosave-f5cb985a9a61
Best regards,
--
Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz>
* Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz> wrote: > The current implementation of e820__register_nosave_regions suffers from > multiple serious issues: > - The end of last region is tracked by PFN, causing it to find holes > that aren't there if two consecutive subpage regions are present > - The nosave PFN ranges derived from holes are rounded out (instead of > rounded in) which makes it inconsistent with how explicitly reserved > regions are handled > > Fix this by: > - Treating reserved regions as if they were holes, to ensure consistent > handling (rounding out nosave PFN ranges is more correct as the > kernel does not use partial pages) > - Tracking the end of the last RAM region by address instead of pages > to detect holes more precisely > > Cc: stable@vger.kernel.org > Fixes: e5540f875404 ("x86/boot/e820: Consolidate 'struct e820_entry *entry' local variable names") So why is this SHA1 indicated as the root cause? AFAICS that commit does nothing but cleanups, so it cannot cause such regressions. Thanks, Ingo
* Ingo Molnar <mingo@kernel.org> wrote: > > * Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz> wrote: > > > The current implementation of e820__register_nosave_regions suffers from > > multiple serious issues: > > - The end of last region is tracked by PFN, causing it to find holes > > that aren't there if two consecutive subpage regions are present > > - The nosave PFN ranges derived from holes are rounded out (instead of > > rounded in) which makes it inconsistent with how explicitly reserved > > regions are handled > > > > Fix this by: > > - Treating reserved regions as if they were holes, to ensure consistent > > handling (rounding out nosave PFN ranges is more correct as the > > kernel does not use partial pages) > > - Tracking the end of the last RAM region by address instead of pages > > to detect holes more precisely > > > > Cc: stable@vger.kernel.org > > Fixes: e5540f875404 ("x86/boot/e820: Consolidate 'struct e820_entry *entry' local variable names") > > So why is this SHA1 indicated as the root cause? AFAICS that commit > does nothing but cleanups, so it cannot cause such regressions. BTW.: A) "It was the first random commit that seemed related, sry" B) "It's a 15 years old bug, but I wanted to indicate a fresh, 8-year old bug to get this into -stable. Busted!" ... are perfectly fine answers in my book. :-) I'm glad about the fixes, I'm just curious how the Fixes tag came about. Thanks, Ingo
On 4/7/25 01:36, Ingo Molnar wrote: > * Ingo Molnar<mingo@kernel.org> wrote: > >> * Myrrh Periwinkle<myrrhperiwinkle@qtmlabs.xyz> wrote: >> >>> The current implementation of e820__register_nosave_regions suffers from >>> multiple serious issues: >>> - The end of last region is tracked by PFN, causing it to find holes >>> that aren't there if two consecutive subpage regions are present >>> - The nosave PFN ranges derived from holes are rounded out (instead of >>> rounded in) which makes it inconsistent with how explicitly reserved >>> regions are handled >>> >>> Fix this by: >>> - Treating reserved regions as if they were holes, to ensure consistent >>> handling (rounding out nosave PFN ranges is more correct as the >>> kernel does not use partial pages) >>> - Tracking the end of the last RAM region by address instead of pages >>> to detect holes more precisely >>> >>> Cc:stable@vger.kernel.org >>> Fixes: e5540f875404 ("x86/boot/e820: Consolidate 'struct e820_entry *entry' local variable names") >> So why is this SHA1 indicated as the root cause? AFAICS that commit >> does nothing but cleanups, so it cannot cause such regressions. > BTW.: > > A) "It was the first random commit that seemed related, sry" > B) "It's a 15 years old bug, but I wanted to indicate a fresh, 8-year old bug to get this into -stable. Busted!" You got me :) How did you know that this is a 15 years old bug? (although I didn't think the age of the bug a patch fixes would affect its chances of getting to -stable) This specific revision was picked since it's the latest one that this patch can be straightforwardly applied to (there is a (trivial) merge conflict with -stable, though). Later, I managed to track the buggy logic back to 1c10070a55a3 ("i386: do not restore reserved memory after hibernation"), which I believe is the very first occurrence of this bug. If you prefer, I can send a v4 with a more correct Fixes: tag (or feel free to do so yourself when applying this patch). > ... are perfectly fine answers in my book. :-) > > I'm glad about the fixes, I'm just curious how the Fixes tag came about. > > Thanks, > > Ingo Regards, Myrrh
* Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz> wrote: > On 4/7/25 01:36, Ingo Molnar wrote: > > > * Ingo Molnar<mingo@kernel.org> wrote: > > > > > * Myrrh Periwinkle<myrrhperiwinkle@qtmlabs.xyz> wrote: > > > > > > > The current implementation of e820__register_nosave_regions suffers from > > > > multiple serious issues: > > > > - The end of last region is tracked by PFN, causing it to find holes > > > > that aren't there if two consecutive subpage regions are present > > > > - The nosave PFN ranges derived from holes are rounded out (instead of > > > > rounded in) which makes it inconsistent with how explicitly reserved > > > > regions are handled > > > > > > > > Fix this by: > > > > - Treating reserved regions as if they were holes, to ensure consistent > > > > handling (rounding out nosave PFN ranges is more correct as the > > > > kernel does not use partial pages) > > > > - Tracking the end of the last RAM region by address instead of pages > > > > to detect holes more precisely > > > > > > > > Cc:stable@vger.kernel.org > > > > Fixes: e5540f875404 ("x86/boot/e820: Consolidate 'struct e820_entry *entry' local variable names") > > > So why is this SHA1 indicated as the root cause? AFAICS that commit > > > does nothing but cleanups, so it cannot cause such regressions. > > BTW.: > > > > A) "It was the first random commit that seemed related, sry" > > B) "It's a 15 years old bug, but I wanted to indicate a fresh, 8-year old bug to get this into -stable. Busted!" > > You got me :) How did you know that this is a 15 years old bug? Call it a 'regression radar' that every kernel maintainer develops after their first 20 years or so - each bug has a distinct feeling to them, and this one felt genuinely *ancient*. > [...] (although I didn't think the age of the bug a patch fixes would > affect its chances of getting to -stable) Yeah, it doesn't really affect its -stable elibility much once we move outside the ~6-12 months window that upstream recognizes as a semi-recent regression - it was mostly my lame attempt at deadpan humor, trying to play off 15 year old bugs against 8 year old bugs as if 8 years old bugs were fresh. Yeah, I know, it's not funny even to me anymore, I'm weird that way. ;-) > This specific revision was picked since it's the latest one that this > patch can be straightforwardly applied to (there is a (trivial) merge > conflict with -stable, though). Yeah. So in the x86/urgent commit I've tagged the other commit you pinpointed in your followup mail: Fixes: e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64") Just to give backporters *some* chance at fixing this ancient bug in older kernels, if they really want to. Thanks, Ingo
On 4/7/25 07:53, Myrrh Periwinkle wrote: > On 4/7/25 01:36, Ingo Molnar wrote: > >> * Ingo Molnar<mingo@kernel.org> wrote: >> >>> * Myrrh Periwinkle<myrrhperiwinkle@qtmlabs.xyz> wrote: >>> >>>> The current implementation of e820__register_nosave_regions suffers >>>> from >>>> multiple serious issues: >>>> - The end of last region is tracked by PFN, causing it to find holes >>>> that aren't there if two consecutive subpage regions are present >>>> - The nosave PFN ranges derived from holes are rounded out >>>> (instead of >>>> rounded in) which makes it inconsistent with how explicitly >>>> reserved >>>> regions are handled >>>> >>>> Fix this by: >>>> - Treating reserved regions as if they were holes, to ensure >>>> consistent >>>> handling (rounding out nosave PFN ranges is more correct as the >>>> kernel does not use partial pages) >>>> - Tracking the end of the last RAM region by address instead of >>>> pages >>>> to detect holes more precisely >>>> >>>> Cc:stable@vger.kernel.org >>>> Fixes: e5540f875404 ("x86/boot/e820: Consolidate 'struct e820_entry >>>> *entry' local variable names") >>> So why is this SHA1 indicated as the root cause? AFAICS that commit >>> does nothing but cleanups, so it cannot cause such regressions. >> BTW.: >> >> A) "It was the first random commit that seemed related, sry" >> B) "It's a 15 years old bug, but I wanted to indicate a fresh, >> 8-year old bug to get this into -stable. Busted!" > > You got me :) How did you know that this is a 15 years old bug? > (although I didn't think the age of the bug a patch fixes would affect > its chances of getting to -stable) > > This specific revision was picked since it's the latest one that this > patch can be straightforwardly applied to (there is a (trivial) merge > conflict with -stable, though). > > Later, I managed to track the buggy logic back to 1c10070a55a3 ("i386: > do not restore reserved memory after hibernation"), which I believe is > the very first occurrence of this bug. If you prefer, I can send a v4 > with a more correct Fixes: tag (or feel free to do so yourself when > applying this patch). I did some more digging and it seems like the buggy logic actually appeared all the way back in e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64") back when x86_64 was a separate port, which was copied later by the i386 port in the commit I mentioned above, which would make this a 19 year old bug instead of 15. > >> ... are perfectly fine answers in my book. :-) >> >> I'm glad about the fixes, I'm just curious how the Fixes tag came about. >> >> Thanks, >> >> Ingo > > Regards, > > Myrrh >
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: f2f29da9f0d4367f6ff35e0d9d021257bb53e273
Gitweb: https://git.kernel.org/tip/f2f29da9f0d4367f6ff35e0d9d021257bb53e273
Author: Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz>
AuthorDate: Sun, 06 Apr 2025 11:45:22 +07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 07 Apr 2025 19:20:08 +02:00
x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions()
While debugging kexec/hibernation hangs and crashes, it turned out that
the current implementation of e820__register_nosave_regions() suffers from
multiple serious issues:
- The end of last region is tracked by PFN, causing it to find holes
that aren't there if two consecutive subpage regions are present
- The nosave PFN ranges derived from holes are rounded out (instead of
rounded in) which makes it inconsistent with how explicitly reserved
regions are handled
Fix this by:
- Treating reserved regions as if they were holes, to ensure consistent
handling (rounding out nosave PFN ranges is more correct as the
kernel does not use partial pages)
- Tracking the end of the last RAM region by address instead of pages
to detect holes more precisely
These bugs appear to have been introduced about ~18 years ago with the very
first version of e820_mark_nosave_regions(), and its flawed assumptions were
carried forward uninterrupted through various waves of rewrites and renames.
[ mingo: Added Git archeology details, for kicks and giggles. ]
Fixes: e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64")
Reported-by: Roberto Ricci <io@r-ricci.it>
Tested-by: Roberto Ricci <io@r-ricci.it>
Signed-off-by: Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250406-fix-e820-nosave-v3-1-f3787bc1ee1d@qtmlabs.xyz
Closes: https://lore.kernel.org/all/Z4WFjBVHpndct7br@desktop0a/
---
arch/x86/kernel/e820.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0..9d8dd8d 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
{
int i;
- unsigned long pfn = 0;
+ u64 last_addr = 0;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
- if (pfn < PFN_UP(entry->addr))
- register_nosave_region(pfn, PFN_UP(entry->addr));
-
- pfn = PFN_DOWN(entry->addr + entry->size);
-
if (entry->type != E820_TYPE_RAM)
- register_nosave_region(PFN_UP(entry->addr), pfn);
+ continue;
- if (pfn >= limit_pfn)
- break;
+ if (last_addr < entry->addr)
+ register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+ last_addr = entry->addr + entry->size;
}
+
+ register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
}
#ifdef CONFIG_ACPI
© 2016 - 2025 Red Hat, Inc.