[v1] x86: Fix kexec 5-level to 4-level paging transition

[PATCH 3/3] x86/mm: Move _PAGE_BIT_NOPTISHADOW from bit 58 to bit 9

Posted by Usama Arif 3 months, 2 weeks ago

Kexec from a kernel with 5-level page tables to one with 4-level page
tables is broken because bits above the physical address width are not
properly masked by the target kernel. This issue was particularly triggered
by _PAGE_BIT_NOPTISHADOW, which uses _PAGE_BIT_SOFTW5 (bit 58).

The ideal fix would be to mask the upper bits properly in all kernels.
However, this is not feasible due to:
- The logistical challenge of patching all older kernels in production
- The patch not being applicable for live patching

Instead, move _PAGE_BIT_NOPTISHADOW to use _PAGE_BIT_SOFTW1 (bit 9),
which is already masked by older kernels using PAGE_MASK. This is safe
as the other users of _PAGE_BIT_SOFTW1 (_PAGE_BIT_SPECIAL and
_PAGE_BIT_CPA_TEST) are only used for leaf entries, while
_PAGE_BIT_NOPTISHADOW is used for PGD and P4D entries only.

Fixes: d0ceea662d45 ("x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
---
 arch/x86/include/asm/pgtable_types.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 2ec250ba467e2..616e928d87973 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -29,6 +29,8 @@
 #define _PAGE_BIT_PKEY_BIT3	62	/* Protection Keys, bit 4/4 */
 #define _PAGE_BIT_NX		63	/* No execute: only valid after cpuid check */
 
+/* _PAGE_BIT_SPECIAL and _PAGE_BIT_CPA_TEST only used for leaf entries */
+#define _PAGE_BIT_NOPTISHADOW	_PAGE_BIT_SOFTW1
 #define _PAGE_BIT_SPECIAL	_PAGE_BIT_SOFTW1
 #define _PAGE_BIT_CPA_TEST	_PAGE_BIT_SOFTW1
 #define _PAGE_BIT_UFFD_WP	_PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */
@@ -37,11 +39,9 @@
 
 #ifdef CONFIG_X86_64
 #define _PAGE_BIT_SAVED_DIRTY	_PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */
-#define _PAGE_BIT_NOPTISHADOW	_PAGE_BIT_SOFTW5 /* No PTI shadow (root PGD) */
 #else
 /* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */
 #define _PAGE_BIT_SAVED_DIRTY	_PAGE_BIT_SOFTW2 /* Saved Dirty bit (leaf) */
-#define _PAGE_BIT_NOPTISHADOW	_PAGE_BIT_SOFTW2 /* No PTI shadow (root PGD) */
 #endif
 
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
-- 
2.47.3

Re: [PATCH 3/3] x86/mm: Move _PAGE_BIT_NOPTISHADOW from bit 58 to bit 9

Posted by Dave Hansen 3 months, 2 weeks ago

On 10/22/25 15:06, Usama Arif wrote:
> Instead, move _PAGE_BIT_NOPTISHADOW to use _PAGE_BIT_SOFTW1 (bit 9),

Wait a sec, though...

This isn't necessary once the previous 2 patches are applied, right?

Re: [PATCH 3/3] x86/mm: Move _PAGE_BIT_NOPTISHADOW from bit 58 to bit 9

Posted by Usama Arif 3 months, 2 weeks ago

On 23/10/2025 00:35, Dave Hansen wrote:
> On 10/22/25 15:06, Usama Arif wrote:
>> Instead, move _PAGE_BIT_NOPTISHADOW to use _PAGE_BIT_SOFTW1 (bit 9),
> 
> Wait a sec, though...
> 
> This isn't necessary once the previous 2 patches are applied, right?

In kexec if the target kernels have patch 1 and 2, then this patch
is not needed. Unfortunately, patches 1 and 2 are not livepatchable.
Also backporting patches 1 and 2 to all previous kernels running in
production in a large fleet is not very scalable.

So if we want to run a kernel with 5-level page tables in production
(with the ability to kexec into a 4-level kernel that doesn't have the first
2 patches), then this patch would solve the problem. That is, patches 1 and 2
solve the problem from the target kernel's perspective, while patch 3 solves
it from the source kernel's side (if the target kernel doesn't have patches 1
and 2 applied).

I mentioned this in the commit message as:

"
- The logistical challenge of patching all older kernels in production
- The patch not being applicable for live patching
"

I can try and make the commit message clearer in the next revision.

Re: [PATCH 3/3] x86/mm: Move _PAGE_BIT_NOPTISHADOW from bit 58 to bit 9

Posted by Dave Hansen 3 months, 2 weeks ago

On 10/22/25 16:58, Usama Arif wrote:
>> This isn't necessary once the previous 2 patches are applied, right?
> In kexec if the target kernels have patch 1 and 2, then this patch
> is not needed. Unfortunately, patches 1 and 2 are not livepatchable.
> Also backporting patches 1 and 2 to all previous kernels running in
> production in a large fleet is not very scalable.

I don't think I've ever been asked to apply a patch to make livepatching
easier. I'm not sure that's something we want to pollute mainline with.

Re: [PATCH 3/3] x86/mm: Move _PAGE_BIT_NOPTISHADOW from bit 58 to bit 9

Posted by Kiryl Shutsemau 3 months, 2 weeks ago

On Thu, Oct 23, 2025 at 07:05:24AM -0700, Dave Hansen wrote:
> On 10/22/25 16:58, Usama Arif wrote:
> >> This isn't necessary once the previous 2 patches are applied, right?
> > In kexec if the target kernels have patch 1 and 2, then this patch
> > is not needed. Unfortunately, patches 1 and 2 are not livepatchable.
> > Also backporting patches 1 and 2 to all previous kernels running in
> > production in a large fleet is not very scalable.
> 
> I don't think I've ever been asked to apply a patch to make livepatching
> easier. I'm not sure that's something we want to pollute mainline with.

It is not about assisting livepatching.

Machines in our fleet may switch between kernel versions using kexec.

We recently introduced a kernel in the fleet that enables 5-level
paging.

Kexecing into an older kernel requires switching from 5- to 4-level
paging, which is broken because the target kernel doesn't expect
_PAGE_NOPTISHADOW.

The first two patches fix the problem for the target kernel. If we only
apply them upstream, we would need to backport them to all kernels we
use to address the problem.

The last patch allows us to only update the kernel that has 5-level
paging enabled, making it much easier logistically.

The fix seems trivial, and I don't see any downsides.

Ultimately, it helps with interoperability between different kernel
versions and/or configurations.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

Re: [PATCH 3/3] x86/mm: Move _PAGE_BIT_NOPTISHADOW from bit 58 to bit 9

Posted by Dave Hansen 3 months, 2 weeks ago

On 10/23/25 07:24, Kiryl Shutsemau wrote:
> The last patch allows us to only update the kernel that has 5-level
> paging enabled, making it much easier logistically.
> 
> The fix seems trivial, and I don't see any downsides.

What I'm hearing is: Please change mainline so $COMPANY can do fewer
backports.

Yeah, it's pretty trivial. But I'm worried about the precedent, and I'm
worried that the change doesn't do a thing for mainline. It's pure
churn. Churn has inherent downsides.

I'd urge you to kick this out of the series and focus on the bug fixes
that are unambiguously good for everyone. Let's have a nice big flamewar
in another thread.

Re: [PATCH 3/3] x86/mm: Move _PAGE_BIT_NOPTISHADOW from bit 58 to bit 9

Posted by Usama Arif 3 months, 2 weeks ago

On 23/10/2025 16:12, Dave Hansen wrote:
> On 10/23/25 07:24, Kiryl Shutsemau wrote:
>> The last patch allows us to only update the kernel that has 5-level
>> paging enabled, making it much easier logistically.
>>
>> The fix seems trivial, and I don't see any downsides.
> 
> What I'm hearing is: Please change mainline so $COMPANY can do fewer
> backports.
> 

Not at all! Very happy to do the backports (we will probably end up doing them anyway).
They apply very cleanly and are easy to do.

The issue is trying to deploy a kernel with 5-level page tables. This problem would be
encountered by anyone who has a medium to large number of machines to manage.
Kiryl made a good point about crash kernels, but medium to large fleets are also very
dynamic. Old kernels remain for some time for a variety of reasons. And once you have
to kexec into an older kernel that doesn't have patches 1 and 2, it just doesn't work.

The only reason I mentioned live-patch is because that is the only way I know of that can
be used to fix a problem like this without patch 3. But even if they were live patchable,
not everyone uses it.

It would be nice to have patch 3 in upstream, as I would imagine it would make
life easier for a lot of people when they upgrade their kernel past 6.15 (when the defconfig
option to switch to 4 level was removed). We know of the problem, so we can mitigate it,
but I would imagine a lot of people won't. The bug was found when we tried upgrading
to 6.16, and kexec was breaking when downgrading. It took quite a while to find the bug
as prints don't work in this part of the code, so I think this patch might just save others
the trouble of going through the whole debugging process. 

If there is a strong preference to drop patch 3, I will remove it in the next revision.

Re: [PATCH 3/3] x86/mm: Move _PAGE_BIT_NOPTISHADOW from bit 58 to bit 9

Posted by Kiryl Shutsemau 3 months, 2 weeks ago

On Thu, Oct 23, 2025 at 08:12:32AM -0700, Dave Hansen wrote:
> On 10/23/25 07:24, Kiryl Shutsemau wrote:
> > The last patch allows us to only update the kernel that has 5-level
> > paging enabled, making it much easier logistically.
> > 
> > The fix seems trivial, and I don't see any downsides.
> 
> What I'm hearing is: Please change mainline so $COMPANY can do fewer
> backports.

Or you can read it as: without the fix 5-level paging deployment is
harder.

One other point is that crashkernels tend to be older and are updated less
frequently than the main kernel. And one would only discover that
crashdump doesn't work when the crash happens.

> Yeah, it's pretty trivial. But I'm worried about the precedent, and I'm
> worried that the change doesn't do a thing for mainline. It's pure
> churn. Churn has inherent downsides.

You don't consider kexec to older kernels useful for mainline?

> I'd urge you to kick this out of the series and focus on the bug fixes
> that are unambiguously good for everyone. Let's have a nice big flamewar
> in another thread.

Oh, well... Okay.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov