From: Kan Liang <kan.liang@linux.intel.com>
There is a hardware feature (Intel PEBS XMMs group), which can handle
XSAVE "snapshots" from random code running. This just provides another
XSAVE data source at a random time.
Add an interface to retrieve the actual register contents when the NMI
hit. The interface is different from the other interfaces of FPU. The
other mechanisms that deal with xstate try to get something coherent.
But this interface is *in*coherent. There's no telling what was in the
registers when a NMI hits. It writes whatever was in the registers when
the NMI hit. It's the invoker's responsibility to make sure the contents
are properly filtered before exposing them to the end user.
The support of the supervisor state components is required. The
compacted storage format is preferred. So the XSAVES is used.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/include/asm/fpu/xstate.h | 1 +
arch/x86/kernel/fpu/xstate.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index b308a76afbb7..0c8b9251c29f 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -107,6 +107,7 @@ int xfeature_size(int xfeature_nr);
void xsaves(struct xregs_state *xsave, u64 mask);
void xrstors(struct xregs_state *xsave, u64 mask);
+void xsaves_nmi(struct xregs_state *xsave, u64 mask);
int xfd_enable_feature(u64 xfd_err);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9aa9ac8399ae..8602683fcb12 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1448,6 +1448,36 @@ void xrstors(struct xregs_state *xstate, u64 mask)
WARN_ON_ONCE(err);
}
+/**
+ * xsaves_nmi - Save selected components to a kernel xstate buffer in NMI
+ * @xstate: Pointer to the buffer
+ * @mask: Feature mask to select the components to save
+ *
+ * The @xstate buffer must be 64 byte aligned.
+ *
+ * Caution: The interface is different from the other interfaces of FPU.
+ * The other mechanisms that deal with xstate try to get something coherent.
+ * But this interface is *in*coherent. There's no telling what was in the
+ * registers when a NMI hits. It writes whatever was in the registers when
+ * the NMI hit.
+ * The only user for the interface is perf_event. There is already a
+ * hardware feature (See Intel PEBS XMMs group), which can handle XSAVE
+ * "snapshots" from random code running. This just provides another XSAVE
+ * data source at a random time.
+ * This function can only be invoked in an NMI. It returns the *ACTUAL*
+ * register contents when the NMI hit.
+ */
+void xsaves_nmi(struct xregs_state *xstate, u64 mask)
+{
+ int err;
+
+ if (!in_nmi())
+ return;
+
+ XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
+ WARN_ON_ONCE(err);
+}
+
#if IS_ENABLED(CONFIG_KVM)
void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int xfeature)
{
--
2.38.1
On 6/26/2025 12:56 PM, kan.liang@linux.intel.com wrote:
<snip>> Add an interface to retrieve the actual register contents when
the NMI
> hit. The interface is different from the other interfaces of FPU. The
> other mechanisms that deal with xstate try to get something coherent.
> But this interface is *in*coherent. There's no telling what was in the
> registers when a NMI hits. It writes whatever was in the registers when
> the NMI hit. It's the invoker's responsibility to make sure the contents
> are properly filtered before exposing them to the end user.
<snip>
>
> +/**
> + * xsaves_nmi - Save selected components to a kernel xstate buffer in NMI
> + * @xstate: Pointer to the buffer
> + * @mask: Feature mask to select the components to save
> + *
> + * The @xstate buffer must be 64 byte aligned.
> + *
> + * Caution: The interface is different from the other interfaces of FPU.
> + * The other mechanisms that deal with xstate try to get something coherent.
> + * But this interface is *in*coherent. There's no telling what was in the
> + * registers when a NMI hits. It writes whatever was in the registers when
> + * the NMI hit.
> + * The only user for the interface is perf_event. There is already a
> + * hardware feature (See Intel PEBS XMMs group), which can handle XSAVE
> + * "snapshots" from random code running. This just provides another XSAVE
> + * data source at a random time.
> + * This function can only be invoked in an NMI. It returns the *ACTUAL*
> + * register contents when the NMI hit.
> + */
> +void xsaves_nmi(struct xregs_state *xstate, u64 mask)
> +{
> + int err;
> +
> + if (!in_nmi())
> + return;
> +
> + XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
> + WARN_ON_ONCE(err);
> +}
> +
There are xsaves()/xrstors() functions, already narrowed to the
"independent" feature set only. So, adding a new xsaves_yyy() variant
for a different use case -- without renaming the existing helpers to
something like xsaves_xxx() -- would make the naming scheme appear
inconsistent at a glance.
But looking back at history:
1. These helpers were established with "independent" in the name (though
they were initially described as for “dynamic” features):
copy_kernel_to_independent_supervisor()/
copy_independent_supervisor_to_kernel()
2. Later, Thomas reworked them, renaming and simplifying them to
xsaves()/xrstors(), and adding a refactored validator:
validate_xsaves_xrstors() [1]. At that point, their usage was
*relaxed* and not strictly limited to independent features.
3. Subsequently, in preparation for dynamic feature support, the helpers
were restricted again to independent features only [2]. This involved
renaming and enforcing stricter validation via
validate_independent_components().
Given that, rather than introducing a new wrapper for every additional
use case, another option could be to retain xsaves() naming but modestly
expand its scope. That would mean to add another allowance: features in
tightly constrained contexts (e.g., NMI). Perhaps, this approach can
keep the API simple while still expanding usage.
[1] a75c52896b6d ("x86/fpu/xstate: Sanitize handling of independent
features")
[2] f5daf836f292 ("x86/fpu: Restrict xsaves()/xrstors() to independent
states")
On 2025-07-01 5:18 p.m., Chang S. Bae wrote:
> On 6/26/2025 12:56 PM, kan.liang@linux.intel.com wrote:
> <snip>> Add an interface to retrieve the actual register contents when
> the NMI
>> hit. The interface is different from the other interfaces of FPU. The
>> other mechanisms that deal with xstate try to get something coherent.
>> But this interface is *in*coherent. There's no telling what was in the
>> registers when a NMI hits. It writes whatever was in the registers when
>> the NMI hit. It's the invoker's responsibility to make sure the contents
>> are properly filtered before exposing them to the end user.
>
> <snip>
>
>> +/**
>> + * xsaves_nmi - Save selected components to a kernel xstate buffer in
>> NMI
>> + * @xstate: Pointer to the buffer
>> + * @mask: Feature mask to select the components to save
>> + *
>> + * The @xstate buffer must be 64 byte aligned.
>> + *
>> + * Caution: The interface is different from the other interfaces of FPU.
>> + * The other mechanisms that deal with xstate try to get something
>> coherent.
>> + * But this interface is *in*coherent. There's no telling what was in
>> the
>> + * registers when a NMI hits. It writes whatever was in the registers
>> when
>> + * the NMI hit.
>> + * The only user for the interface is perf_event. There is already a
>> + * hardware feature (See Intel PEBS XMMs group), which can handle XSAVE
>> + * "snapshots" from random code running. This just provides another
>> XSAVE
>> + * data source at a random time.
>> + * This function can only be invoked in an NMI. It returns the *ACTUAL*
>> + * register contents when the NMI hit.
>> + */
>> +void xsaves_nmi(struct xregs_state *xstate, u64 mask)
>> +{
>> + int err;
>> +
>> + if (!in_nmi())
>> + return;
>> +
>> + XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
>> + WARN_ON_ONCE(err);
>> +}
>> +
> There are xsaves()/xrstors() functions, already narrowed to the
> "independent" feature set only. So, adding a new xsaves_yyy() variant
> for a different use case -- without renaming the existing helpers to
> something like xsaves_xxx() -- would make the naming scheme appear
> inconsistent at a glance.
>
> But looking back at history:
>
> 1. These helpers were established with "independent" in the name (though
> they were initially described as for “dynamic” features):
> copy_kernel_to_independent_supervisor()/
> copy_independent_supervisor_to_kernel()
>
> 2. Later, Thomas reworked them, renaming and simplifying them to
> xsaves()/xrstors(), and adding a refactored validator:
> validate_xsaves_xrstors() [1]. At that point, their usage was
> *relaxed* and not strictly limited to independent features.
>
> 3. Subsequently, in preparation for dynamic feature support, the helpers
> were restricted again to independent features only [2]. This involved
> renaming and enforcing stricter validation via
> validate_independent_components().
>
> Given that, rather than introducing a new wrapper for every additional
> use case, another option could be to retain xsaves() naming but modestly
> expand its scope. That would mean to add another allowance: features in
> tightly constrained contexts (e.g., NMI). Perhaps, this approach can
> keep the API simple while still expanding usage.
>
So we need to add a parameter, e.g., nmi. For the NMI case, the limit to
the independent features should be removed, right?
void xsaves(struct xregs_state *xsave, u64 mask, bool nmi);
The only user for the xsaves is LBR. It should not be a problem to
update the interface.
But perf only needs the xsaves for the SIMD and other registers. So the
parameter will only be added for the xsaves(). The xsaves()/xrstore()
will not be symmetrical anymore. I'm not sure if it's a problem.
Dave? Thomas? Any comments?
Should we extend the existing xsaves() interface or adding a new
xsaves_nmi() interface for the perf usage?
Thanks,
Kan
> [1] a75c52896b6d ("x86/fpu/xstate: Sanitize handling of independent
> features")
> [2] f5daf836f292 ("x86/fpu: Restrict xsaves()/xrstors() to independent
> states")
>
>
On 7/7/25 11:12, Liang, Kan wrote: > Should we extend the existing xsaves() interface or adding a new > xsaves_nmi() interface for the perf usage? I think we should just add a new one. The perf usage is totally unique.
© 2016 - 2026 Red Hat, Inc.