[RFC] Restore PKRU to user-defined value after signal handling

Aruna Ramakrishna posted 1 patch 2 weeks, 3 days ago
[RFC] Restore PKRU to user-defined value after signal handling
Posted by Aruna Ramakrishna 2 weeks, 3 days ago
Hello,

The following commit, which is part of 6.12 rc, does not work consistently on systems with AMD processors vs. Intel:

70044df250d0:     x86/pkeys: Update PKRU to enable all pkeys before XSAVE

This zeroes out the pkeys in handle_signal() by calling sig_prepare_pkru():

/*
 * Enable all pkeys temporarily, so as to ensure that both the current
 * execution stack as well as the alternate signal stack are writeable.
 * The application can use any of the available pkeys to protect the
 * alternate signal stack, and we don't know which one it is, so enable
 * all. The PKRU register will be reset to init_pkru later in the flow,
 * in fpu__clear_user_states(), and it is the application's responsibility
 * to enable the appropriate pkey as the first step in the signal handler
 * so that the handler does not segfault.
 */
static inline u32 sig_prepare_pkru(void)
{
        u32 orig_pkru = read_pkru();

        write_pkru(0);
        return orig_pkru;
}

The write_pkru(0) call seems to set xinuse[9] to 0 on systems with AMD CPUs (but not Intel), which means the user-defined PKRU value overwritten in the sigframe (in update_pkru_in_sigframe()) is not restored by XRSTOR and the PKRU value stays at 0 when it returns back to userspace. Which is unexpected.

AMD:

$ ./handler-pkru
startup pkru = 0x55555554
changed in main thread pkru = 0xfffffff0
received signal 10
in signal handler pkru = 0x55555554
after usr1 signal pkru = 0x00000000

…

xcr0 207
xcr0 AND xinuse 202
writing pkru 0
xcr0 207
xcr0 AND xinuse 2


Intel:

$ ./handler-pkru
startup pkru = 0x55555554
changed in main thread pkru = 0xfffffff0
received signal 10
in signal handler pkru = 0x55555554
after usr1 signal pkru = 0xfffffff0

…

xcr0 2E7
xcr0 AND xinuse 2A2
writing pkru 0
xcr0 2E7
xcr0 AND xinuse 2A2

From the Intel manual:

“
13.6 PROCESSOR TRACKING OF XSAVE-MANAGED STATE

The following notation describes the state of the init and modified optimizations:
• XINUSE denotes the state-component bitmap corresponding to the init optimization. If XINUSE[i] = 0, state component i is known to be in its initial configuration; otherwise XINUSE[i] = 1. It is possible for XINUSE[i] to be 1 even when state component i is in its initial configuration. On a processor that does not support the init optimization, XINUSE[i] is always 1 for every value of i.
...

• PKRU state. PKRU state is in its initial configuration if the value of the PKRU is 0.
...

13.8.1 Standard Form of XRSTOR

XRSTOR updates state component i based on the value of bit i in the XSTATE_BV field of the XSAVE header:
• If XSTATE_BV[i] = 0, the state component is set to its initial configuration. Section 13.6 specifies the initial configuration of each state component.
The initial configuration of state component 1 pertains only to the XMM registers and not to MXCSR. See below for the treatment of MXCSR
• If XSTATE_BV[i] = 1, the state component is loaded with data from the XSAVE area. See Section 13.5 for specifics for each state component and for details regarding mode-specific operation and operation determined by instruction prefixes. See Section 13.13 for details regarding faults caused by memory accesses.
“

The line “PKRU state is in its initial configuration if the value of the PKRU is 0” seems to imply that when the PKRU register is set to 0, xinuse[9] is also automatically set to 0 and that is expected behavior, which causes XRSTOR to not load the register value from XSAVE area. But we do not want xinuse[9] to be set to 0 here, as we want the PKRU value to be correctly restored from the sigframe - otherwise it becomes a security issue.

I’m not really sure of the correct way to reset xinuse[9] to 1 (after wrpkru(0)) - but something like this seems to work (thanks to Rudi Horn for both finding the issue and suggesting this patch):

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 1065ab995305..701a163f0ac5 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -68,9 +68,35 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
  */
 static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
 {
+       int err = 0;
+
        if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
                return 0;
-       return __put_user(pkru, (unsigned int __user *)get_xsave_addr_user(buf, XFEATURE_PKRU));
+
+       if (pkru != 0) {
+               err = __put_user(pkru,
+                                (unsigned int __user *)get_xsave_addr_user(
+                                        buf, XFEATURE_PKRU));
+               u64 xfeatures;
+               u64 __user *xfeaturesp = &buf->header.xfeatures;
+
+               err |= __get_user(xfeatures, xfeaturesp);
+
+               /*
+                * On some systems, when PKRU is set to 0, the corresponding
+                * XINUSE bit is also zeroed out, which causes XRSTOR to not
+                * load the register value from XSAVE area. Which means the
+                * PKRU value that was updated on the sigframe will be
+                * effectively discarded.
+                *
+                * Mark PKRU as in use so that it is restored correctly.
+                */
+               if (!err & !(xfeatures & XFEATURE_MASK_PKRU)) {
+                       xfeatures |= XFEATURE_MASK_PKRU;
+                       err |= __put_user(xfeatures, xfeaturesp);
+               }
+       }
 }

I’ve tested a version of this patch on both AMD and Intel systems and it works.

Please let me know if this is acceptable, or if there’s a better way to do this.

Thanks,
Aruna









Re: [RFC] Restore PKRU to user-defined value after signal handling
Posted by Dave Hansen 2 weeks, 3 days ago
On 11/6/24 10:33, Aruna Ramakrishna wrote:
>  static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
>  {
> +       int err = 0;
> +
>         if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
>                 return 0;
> -       return __put_user(pkru, (unsigned int __user *)get_xsave_addr_user(buf, XFEATURE_PKRU));

Let me try to summarize that whole email:

The existing code updates the PKRU value in the XSAVE buffer.  But it
does not update ->xfeatures[PKRU].  If ->xfeatures[PKRU]==0, then XRSTOR
will ignore the data that __put_user() put in place.

How does ->xfeatures[PKRU] end up set to 0?  On AMD, a WRPKRU(0) sets
PKRU=0 *and* XINUSE[PKRU]=0.  Intel doesn't do that.  Either behavior is
architecturally permitted.

Did I miss anything?

But the suggested fix is just beyond hideous.  Can't we just use the
mask that xsave_to_user_sigframe() generated instead of reading it back
out of userspace three seconds after it is written?

static inline int update_pkru_in_sigframe(..., u32 mask)
{
	u32 xinuse;
	int err;

        if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
                return 0;

	/* Ensure XRSTOR picks up the new PKRU value from the buffer: */
	xinuse = (mask & xfeatures_in_use()) | XFEATURE_MASK_PKRU;

	err =  __put_user(xinuse, &buf->header.xfeatures);
	if (err)
		return err;

        return ... existing code here;
}

This probably means moving update_pkru_in_sigframe() to the end of
xsave_to_user_sigframe() instead of calling it after, though.

But either way, this is all horrific.  It's yet another reason that the
XSAVE architecture complexity hurts more than it helps.  We want PKRU
written out here, dammit.  We shouldn't have to ask the hardware to
write it out, and _then_ go back and do it ourselves.
Re: [RFC] Restore PKRU to user-defined value after signal handling
Posted by Aruna Ramakrishna 2 weeks, 3 days ago
> On Nov 6, 2024, at 11:27 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 11/6/24 10:33, Aruna Ramakrishna wrote:
>> static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
>> {
>> +       int err = 0;
>> +
>>        if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
>>                return 0;
>> -       return __put_user(pkru, (unsigned int __user *)get_xsave_addr_user(buf, XFEATURE_PKRU));
> 
> Let me try to summarize that whole email:
> 
> The existing code updates the PKRU value in the XSAVE buffer.  But it
> does not update ->xfeatures[PKRU].  If ->xfeatures[PKRU]==0, then XRSTOR
> will ignore the data that __put_user() put in place.
> 
> How does ->xfeatures[PKRU] end up set to 0?  On AMD, a WRPKRU(0) sets
> PKRU=0 *and* XINUSE[PKRU]=0.  Intel doesn't do that.  Either behavior is
> architecturally permitted.
> 
> Did I miss anything?

Nope, this is correct.

> 
> But the suggested fix is just beyond hideous.  Can't we just use the
> mask that xsave_to_user_sigframe() generated instead of reading it back
> out of userspace three seconds after it is written?
> 
> static inline int update_pkru_in_sigframe(..., u32 mask)
> {
> u32 xinuse;
> int err;
> 
>        if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
>                return 0;
> 
> /* Ensure XRSTOR picks up the new PKRU value from the buffer: */
> xinuse = (mask & xfeatures_in_use()) | XFEATURE_MASK_PKRU;
> 
> err =  __put_user(xinuse, &buf->header.xfeatures);
> if (err)
> return err;
> 
>        return ... existing code here;
> }

Ah, I missed xfeatures_in_use(). This is a better implementation.

> 
> This probably means moving update_pkru_in_sigframe() to the end of
> xsave_to_user_sigframe() instead of calling it after, though.
> 

I do not understand why it has to be moved. Would you mind explaining?

Thank you for your feedback, I’ll redo the patch and test again.

Thanks,
Aruna

> But either way, this is all horrific.  It's yet another reason that the
> XSAVE architecture complexity hurts more than it helps.  We want PKRU
> written out here, dammit.  We shouldn't have to ask the hardware to
> write it out, and _then_ go back and do it ourselves.




Re: [RFC] Restore PKRU to user-defined value after signal handling
Posted by Dave Hansen 2 weeks, 3 days ago
On 11/6/24 11:40, Aruna Ramakrishna wrote:
> I do not understand why it has to be moved. Would you mind explaining?

You need to know what XSTATE_BV value got written by XSAVE.  That's
dependent on: XINUSE and RFBM.

RFBM is 'mask' in xsave_to_user_sigframe().

So you can either completely regenerate 'mask' in
update_pkru_in_sigframe() or you can just pass 'mask' in.