Kexec from a kernel with 5-level page tables to one with 4-level page
tables is broken because the target kernel does not properly mask bits
above the physical address width. The issue is triggered by
_PAGE_BIT_NOPTISHADOW, which uses _PAGE_BIT_SOFTW5 (bit 58).

The ideal fix would be to mask the upper bits properly in all kernels.
However, this is not feasible due to:
- The logistical challenge of patching all older kernels in production
- The patch not being applicable for live patching

Instead, move _PAGE_BIT_NOPTISHADOW to use _PAGE_BIT_SOFTW1 (bit 9),
which is already masked by older kernels using PAGE_MASK. This is safe
because the other users of _PAGE_BIT_SOFTW1 (_PAGE_BIT_SPECIAL and
_PAGE_BIT_CPA_TEST) are only used for leaf entries, while
_PAGE_BIT_NOPTISHADOW is used for PGD and P4D entries only.
Fixes: d0ceea662d45 ("x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
---
arch/x86/include/asm/pgtable_types.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 2ec250ba467e2..616e928d87973 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -29,6 +29,8 @@
#define _PAGE_BIT_PKEY_BIT3 62 /* Protection Keys, bit 4/4 */
#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */
+/* _PAGE_BIT_SPECIAL and _PAGE_BIT_CPA_TEST only used for leaf entries */
+#define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW1
#define _PAGE_BIT_SPECIAL _PAGE_BIT_SOFTW1
#define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1
#define _PAGE_BIT_UFFD_WP _PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */
@@ -37,11 +39,9 @@
#ifdef CONFIG_X86_64
#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */
-#define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW5 /* No PTI shadow (root PGD) */
#else
/* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */
#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit (leaf) */
-#define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW2 /* No PTI shadow (root PGD) */
#endif
/* If _PAGE_BIT_PRESENT is clear, we use these: */
--
2.47.3
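
The masking argument in the commit message can be illustrated with a small
standalone sketch. It assumes 4 KiB pages and a consumer that recovers the
table address with a PAGE_MASK-style mask only. The SKETCH_* names and both
helpers are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, assuming x86-64 with 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))
#define SKETCH_BIT_SOFTW1 9	/* below PAGE_SHIFT, cleared by the mask */
#define SKETCH_BIT_SOFTW5 58	/* above the address bits, not cleared */

/* Hypothetical helper: combine a page-aligned address with one flag bit. */
static inline uint64_t sketch_make_entry(uint64_t phys, unsigned int flag_bit)
{
	return phys | (UINT64_C(1) << flag_bit);
}

/*
 * Hypothetical helper: recover the address the way a consumer that only
 * applies a PAGE_MASK-style mask would. Only the low 12 bits are cleared.
 */
static inline uint64_t sketch_entry_to_phys(uint64_t entry)
{
	return entry & SKETCH_PAGE_MASK;
}
```

With these definitions, an entry tagged with bit 9 decodes back to the
original address, while an entry tagged with bit 58 decodes to an address
that still has bit 58 set.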

On 10/22/25 15:06, Usama Arif wrote:
> Instead, move _PAGE_BIT_NOPTISHADOW to use _PAGE_BIT_SOFTW1 (bit 9),

Wait a sec, though...

This isn't necessary once the previous 2 patches are applied, right?

On 23/10/2025 00:35, Dave Hansen wrote:
> On 10/22/25 15:06, Usama Arif wrote:
>> Instead, move _PAGE_BIT_NOPTISHADOW to use _PAGE_BIT_SOFTW1 (bit 9),
>
> Wait a sec, though...
>
> This isn't necessary once the previous 2 patches are applied, right?

In kexec if the target kernels have patch 1 and 2, then this patch
is not needed. Unfortunately, patches 1 and 2 are not livepatchable.
Also backporting patches 1 and 2 to all previous kernels running in
production in a large fleet is not very scalable.

So if we want to run a kernel with 5-level page tables in production
(with the ability to kexec into a 4-level kernel that doesn't have the
first 2 patches), then this patch would solve the problem.

i.e. patches 1 and 2 solve the problem from the target kernel's
perspective, patch 3 solves it from the source kernel (if the target
kernel doesn't have patches 1 and 2 applied).

I mentioned this in the commit message as:
"
- The logistical challenge of patching all older kernels in production
- The patch not being applicable for live patching
"

I can try and make the commit message clearer in the next revision.

On 10/22/25 16:58, Usama Arif wrote:
>> This isn't necessary once the previous 2 patches are applied, right?
> In kexec if the target kernels have patch 1 and 2, then this patch
> is not needed. Unfortunately, patches 1 and 2 are not livepatchable.
> Also backporting patches 1 and 2 to all previous kernels running in
> production in a large fleet is not very scalable.

I don't think I've ever been asked to apply a patch to make livepatching
easier. I'm not sure that's something we want to pollute mainline with.

On Thu, Oct 23, 2025 at 07:05:24AM -0700, Dave Hansen wrote:
> On 10/22/25 16:58, Usama Arif wrote:
> >> This isn't necessary once the previous 2 patches are applied, right?
> > In kexec if the target kernels have patch 1 and 2, then this patch
> > is not needed. Unfortunately, patches 1 and 2 are not livepatchable.
> > Also backporting patches 1 and 2 to all previous kernels running in
> > production in a large fleet is not very scalable.
>
> I don't think I've ever been asked to apply a patch to make livepatching
> easier. I'm not sure that's something we want to pollute mainline with.

It is not about assisting livepatching.

Machines in our fleet may switch between kernel versions using kexec. We
recently introduced a kernel in the fleet that enables 5-level paging.
Kexecing into an older kernel requires switching from 5- to 4-level
paging, which is broken because the target kernel doesn't expect
_PAGE_NOPTISHADOW.

The first two patches fix the problem for the target kernel. If we only
apply them upstream, we would need to backport them to all kernels we use
to address the problem.

The last patch allows us to only update the kernel that has 5-level
paging enabled, making it much easier logistically.

The fix seems trivial, and I don't see any downsides.

Ultimately, it helps with interoperability between different kernel
versions and/or configurations.

-- 
Kiryl Shutsemau / Kirill A. Shutemov

On 10/23/25 07:24, Kiryl Shutsemau wrote:
> The last patch allows us to only update the kernel that has 5-level
> paging enabled, making it much easier logistically.
>
> The fix seems trivial, and I don't see any downsides.

What I'm hearing is: Please change mainline so $COMPANY can do fewer
backports.

Yeah, it's pretty trivial. But I'm worried about the precedent, and I'm
worried that the change doesn't do a thing for mainline. It's pure
churn. Churn has inherent downsides.

I'd urge you to kick this out of the series and focus on the bug fixes
that are unambiguously good for everyone. Let's have a nice big flamewar
in another thread.

On 23/10/2025 16:12, Dave Hansen wrote:
> On 10/23/25 07:24, Kiryl Shutsemau wrote:
>> The last patch allows us to only update the kernel that has 5-level
>> paging enabled, making it much easier logistically.
>>
>> The fix seems trivial, and I don't see any downsides.
>
> What I'm hearing is: Please change mainline so $COMPANY can do fewer
> backports.
>

Not at all! Very happy to do the backports (will probably end up doing
them anyway). They apply very cleanly and are easy to do.

The issue is trying to deploy a kernel with 5-level page tables. This
problem would be encountered by anyone who has a medium to large number
of machines to manage. Kiryl made a good point about crash kernels, but
medium to large fleets are also very dynamic. Old kernels remain for some
time for a variety of reasons. And once you have to kexec into an older
kernel that doesn't have patches 1 and 2, it just doesn't work.

The only reason I mentioned live-patch is because that is the only way I
know of that can be used to fix a problem like this without patch 3. But
even if the patches were live patchable, not everyone uses it.

It would be nice to have patch 3 in upstream, as I would imagine it would
make life easier for a lot of people when they upgrade their kernel past
6.15 (when the defconfig option to switch to 4 level was removed). We know
of the problem, so we can mitigate it, but I would imagine a lot of people
won't. The bug was found when we tried upgrading to 6.16, and kexec was
breaking when downgrading. It took quite a while to find the bug as prints
don't work in this part of the code, so I think this patch might just save
others the trouble of going through the whole debugging process.

If there is a strong preference to drop patch 3, I will remove it in the
next revision.

On Thu, Oct 23, 2025 at 08:12:32AM -0700, Dave Hansen wrote:
> On 10/23/25 07:24, Kiryl Shutsemau wrote:
> > The last patch allows us to only update the kernel that has 5-level
> > paging enabled, making it much easier logistically.
> >
> > The fix seems trivial, and I don't see any downsides.
>
> What I'm hearing is: Please change mainline so $COMPANY can do fewer
> backports.

Or you can read it as: without the fix 5-level paging deployment is
harder.

One other point is that crashkernels tend to be older and update less
frequently than the main kernel. And one would only discover that
crashdump doesn't work when the crash happens.

> Yeah, it's pretty trivial. But I'm worried about the precedent, and I'm
> worried that the change doesn't do a thing for mainline. It's pure
> churn. Churn has inherent downsides.

You don't consider kexec to older kernels useful for mainline?

> I'd urge you to kick this out of the series and focus on the bug fixes
> that are unambiguously good for everyone. Let's have a nice big flamewar
> in another thread.

Oh, well... Okay.

-- 
Kiryl Shutsemau / Kirill A. Shutemov