Hi all,
This patch series introduces support for Intel's Advanced Performance
Extensions (APX). The goal is to collect early feedback on the approach
taken for APX. Below is a brief overview of the feature and key
considerations.
== Introduction ==
APX introduces a new set of general-purpose registers designed to improve
performance. Currently, these registers are expected to be used primarily
by userspace applications, with no intended use in kernel mode. More
details on its use cases can be found in the published documentation [1].
== Points to Consider ==
In terms of kernel support, some aspects need attention:
* New Register State
APX register state is managed by the XSAVE instruction set, with its
state component number assigned to 19, following XTILEDATA.
* XSAVE Buffer Offset
- In the compacted format (used for the in-kernel buffer), the APX
state will appear at a later position in the buffer.
- In the non-compacted format (used for signal, ptrace, and KVM ABIs),
APX is assigned a lower offset, occupying the space previously
reserved for the deprecated MPX state.
This should not bring any ABI changes. In the extended register state
area, each state's size and offset are determined dynamically via
CPUID.
* Kernel Assumptions:
The kernel generally assumes that higher-numbered components have
higher offsets, as discussed in the following section. Although MPX
feature usage support was removed [2], its state components (#3 and #4)
remain supported.
== Areas for Adjustment ==
With that in mind, here are a few key areas that need to be addressed to
properly handle the APX state. While these appear to be the main areas
at this stage, other issues may still surface elsewhere.
1. Feature Conflict
While a valid CPU should not expose both MPX and APX, a broken or
misconfigured CPU could erroneously expose support for both. This hard
conflict should be avoided up front.
2. XSAVE Format Conversion
APX introduces an offset anomaly that requires special handling when
converting between compacted and non-compacted layouts. The kernel
relies on XSAVE instructions for context switching and signal
handling. However, xstate copy functions must correctly translate the
XSAVE format while ensuring sequential memory access, as enforced by
struct membuf.
3. XSAVE Size Calculation
The kernel calculates XSAVE buffer sizes based on the assumption that
the highest-numbered feature appears at the end. If APX is the last
bit set in the feature mask, the existing logic in
xstate_calculate_size() miscalculates the buffer size.
4. Offset Sanity Check
The kernel's boot-time sanity check in setup_xstate_cache() assumes
offsets increase with feature numbers. The APX state will conflict
with this assumption.
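To make the size-calculation issue (point 3) concrete, here is a minimal userspace sketch. This is not kernel code; the offsets and sizes are illustrative, loosely modeled on AVX at offset 576 and an APX-like component 19 reusing the old MPX region below the AMX-like components. The robust version takes the maximum of offset+size over all enabled features instead of assuming the highest-numbered feature ends the buffer:

```c
#include <assert.h>
#include <stdint.h>

#define NFEAT 20

/*
 * Illustrative per-component layout (non-compacted format). Component 19
 * (APX-like) sits at a LOWER offset than component 17 (AMX-like).
 */
static const unsigned int xoffset[NFEAT] = {
	[2]  = 576,	/* AVX-like */
	[19] = 960,	/* APX-like: below the AMX-like offset */
	[17] = 2752,	/* XTILECFG-like */
};
static const unsigned int xsize[NFEAT] = {
	[2]  = 256,
	[19] = 128,
	[17] = 64,
};

/* Broken assumption: the buffer ends with the highest-numbered feature. */
static unsigned int size_naive(uint64_t mask)
{
	unsigned int last = 0;

	for (int i = 0; i < NFEAT; i++)
		if (mask & (1ULL << i))
			last = i;
	return xoffset[last] + xsize[last];
}

/* Robust: the buffer ends wherever any enabled component ends. */
static unsigned int size_robust(uint64_t mask)
{
	unsigned int end = 0;

	for (int i = 0; i < NFEAT; i++) {
		if (!(mask & (1ULL << i)))
			continue;
		unsigned int e = xoffset[i] + xsize[i];
		if (e > end)
			end = e;
	}
	return end;
}
```

With only component 2 enabled both agree, but once bit 19 is the highest set bit the naive size undershoots the real buffer end.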
== Approaches ==
Two approaches were considered:
Option 1, Consider APX as a one-off exception
Initially, I thought this approach would result in fewer changes to the
xstate code. However, it makes the code less comprehensible (or more
complicated) and introduces a risk: if another feature similar to APX
comes up, adding yet another exception would be inefficient and messy.
Option 2, Handle out-of-order offsets in general
Rather than treating APX as an exception, this approach adapts the
kernel to accommodate out-of-order offsets. By introducing a feature
order table and an accompanying macro, the traversal logic is
encapsulated cleanly, simplifying related code. This makes the kernel
more resilient to future features with non-sequential offsets.
== Series Summary ==
This patch set addresses the above issues before enabling APX support.
The chosen approach (Option 2) adjusts the kernel to handle out-of-order
offsets. Here’s a breakdown of the patches:
* PART1: XSTATE Code Adjustment
- PATCH1: Clean up xstate enabling messages.
- PATCH2: Introduce the feature order table.
- PATCH3: Remove the offset sanity check, as the checked rule is no
longer true.
- PATCH4: Adjust XSAVE size calculation.
- PATCH5: Modify the xstate copy function.
* PART2: APX Enabling
- PATCH6: Remove MPX support.
- PATCH7: Enumerate the APX CPUID feature bit.
- PATCH8: Update xstate definitions to include APX.
- PATCH9: Ensure MPX and APX are mutually exclusive.
- PATCH10: Register APX in the supported xstate list.
- PATCH11: Add a self-test case for APX.
== Testing ==
The first part is agnostic to the new feature, and the changes should
preserve the same functionality for the currently supported features. The
xstate tests -- covering context switching and ABI compatibility for
signal and ptrace -- were executed to ensure there is no regression.
PATCH11 applies to the same test set and builds on the xstate selftest
rework [3]. Since no hardware implementation is available at this time,
an internal Intel emulator was primarily used to verify the test cases.
---
The patches are based on the x86/fpu branch [4], where the selftest
rework has landed (Thanks, Ingo!). The series can be found here:
git://github.com/intel/apx.git apx_rfc-v1
Thanks,
Chang
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
[2] Commit 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
[3] https://lore.kernel.org/lkml/20250226010731.2456-1-chang.seok.bae@intel.com/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/fpu
Chang S. Bae (11):
x86/fpu/xstate: Simplify print_xstate_features()
x86/fpu/xstate: Introduce xstate order table and accessor macro
x86/fpu/xstate: Remove xstate offset check
x86/fpu/xstate: Adjust XSAVE buffer size calculation
x86/fpu/xstate: Adjust xstate copying logic for user ABI
x86/fpu/mpx: Remove MPX xstate component support
x86/cpufeatures: Add X86_FEATURE_APX
x86/fpu/apx: Define APX state component
x86/fpu/apx: Disallow conflicting MPX presence
x86/fpu/apx: Enable APX state support
selftests/x86/apx: Add APX test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/fpu/types.h | 9 ++
arch/x86/include/asm/fpu/xstate.h | 5 +-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/fpu/xstate.c | 131 ++++++++++++++++-----------
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/apx.c | 10 ++
tools/testing/selftests/x86/xstate.c | 3 +-
tools/testing/selftests/x86/xstate.h | 1 +
10 files changed, 108 insertions(+), 57 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
base-commit: bd64e9d6aafd12e5437685d2a06360f86418d277
--
2.45.2
Hi all,
This is a revision of the previous posting [1]. The primary goal at this
stage is to present the scope of the code changes and still gather early
feedback on the approach to enabling this feature for userspace.
== Changes in this revision ==
A major update in this version is that the xstate ordering table is now
dynamically populated at boot time, thanks to Dave’s suggestion. Each
system will now generate its own table based on the available xfeature
set.
As a result, the MPX removal patch has been dropped from this series.
Previously, a conflict arose when listing both MPX and APX in a static
table. With the new dynamic approach, each feature can be enabled
exclusively without conflict. Patch 7 ensures their mutual exclusivity,
meaning the table will now reflect either one or neither of them.
Removing MPX entirely also seemed premature, given previous discussions
on the list [4]. For guest compatibility, setting MPX bits in XCR0 is
still expected on older systems.
Additional changes since the last posting:
* Complete MPX feature bits for enforcing APX exclusivity (Dave).
* Remove the first patch, as merged into the tip tree:
69a2fdf44604 ("x86/fpu/xstate: Simplify print_xstate_features()")
Below is the original cover letter with minor updates:
== Introduction ==
APX introduces a new set of general-purpose registers designed to improve
performance. More details on its use cases can be found in the published
documentation [2].
This series primarily assumes userspace-only usage, though in-kernel use
cases remain open for consideration.
== Points to Consider ==
For the kernel to enable APX, a few aspects need attention:
* New Register State
APX register state is managed by the XSAVE instruction set, with its
state component number assigned to 19, following XTILEDATA.
* XSAVE Buffer Offset
- In the compacted format (used for the in-kernel buffer), the APX
state will appear at a later position in the buffer.
- In the non-compacted format (used for signal, ptrace, and KVM ABIs),
APX is assigned a lower offset, occupying the space previously
reserved for the deprecated MPX state.
This should not bring any ABI changes. In the extended register state
area, each state's size and offset are determined dynamically via
CPUID.
* Kernel Assumptions:
The kernel generally assumes that higher-numbered components have
higher offsets, as discussed in the following section.
== Areas for Adjustment ==
With that in mind, here are a few key areas that need to be addressed to
properly handle the APX state. While these appear to be the main areas
at this stage, other issues may still surface elsewhere.
1. Feature Conflict
While a valid CPU should not expose both MPX and APX, a broken or
misconfigured CPU could erroneously expose support for both. This hard
conflict should be avoided up front.
2. XSAVE Format Conversion
APX introduces an offset anomaly that requires special handling when
converting between compacted and non-compacted layouts. The kernel
relies on XSAVE instructions for context switching and signal
handling. However, xstate copy functions must correctly translate the
XSAVE format while ensuring sequential memory access, as enforced by
struct membuf.
3. XSAVE Size Calculation
The kernel calculates XSAVE buffer sizes based on the assumption that
the highest-numbered feature appears at the end. If APX is the last
bit set in the feature mask, the existing logic in
xstate_calculate_size() miscalculates the buffer size.
4. Offset Sanity Check
The kernel's boot-time sanity check in setup_xstate_cache() assumes
offsets increase with feature numbers. The APX state will conflict
with this assumption.
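To illustrate the conversion constraint in point 2, here is a rough userspace sketch (hypothetical three-component layout, not the kernel's actual copy code): a membuf-style destination only permits sequential writes, so the source components must be visited in destination-offset order, zero-filling any gaps along the way:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative layout in which the component with the highest number
 * (19) has the LOWEST destination offset, like the APX anomaly.
 */
struct comp { unsigned int nr, dst_off, size; };
static const struct comp comps[] = {
	{ .nr = 2,  .dst_off = 64, .size = 8 },
	{ .nr = 19, .dst_off = 16, .size = 8 },	/* APX-like anomaly */
	{ .nr = 17, .dst_off = 40, .size = 8 },
};

/* Minimal stand-in for the kernel's struct membuf: sequential-only. */
struct membuf { uint8_t *p; size_t left; };

static void membuf_write(struct membuf *m, const void *src, size_t n)
{
	assert(n <= m->left);	/* an overflow here would be a bug */
	memcpy(m->p, src, n);
	m->p += n;
	m->left -= n;
}

static void membuf_zero(struct membuf *m, size_t n)
{
	assert(n <= m->left);
	memset(m->p, 0, n);
	m->p += n;
	m->left -= n;
}

/*
 * Copy components into a non-compacted buffer through a sequential
 * membuf: iterate in destination-offset order, padding the gaps.
 */
static void copy_in_offset_order(struct membuf *m, size_t total)
{
	/* offset-sorted traversal order, precomputed: comps 19, 17, 2 */
	static const int order[] = { 1, 2, 0 };
	size_t pos = 0;

	for (size_t k = 0; k < 3; k++) {
		const struct comp *c = &comps[order[k]];
		uint8_t payload[8];

		membuf_zero(m, c->dst_off - pos);	/* pad the gap */
		memset(payload, c->nr, sizeof(payload));/* fake state data */
		membuf_write(m, payload, c->size);
		pos = c->dst_off + c->size;
	}
	membuf_zero(m, total - pos);
}
```

Visiting the components in feature-number order instead (2, 17, 19) would require writing offset 16 after offset 64, which a sequential membuf cannot do.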
== Approaches ==
Two approaches were considered:
Option 1, Consider APX as a one-off exception
Initially, I thought this approach would result in fewer changes to the
xstate code. However, it makes the code less comprehensible (or more
complicated) and introduces a risk: if another feature similar to APX
comes up, adding yet another exception would be inefficient and messy.
Option 2, Handle out-of-order offsets in general
Rather than treating APX as an exception, this approach adapts the
kernel to accommodate out-of-order offsets. By introducing a feature
order table and an accompanying macro, the traversal logic is
encapsulated cleanly, simplifying related code. This makes the kernel
more resilient to future features with non-sequential offsets.
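The Option 2 idea can be sketched in userspace as follows (illustrative only; the in-kernel version later in this series uses sort() and an accessor macro, and the offsets here are loosely modeled, not authoritative): build an array of enabled feature numbers, sort it by offset once at "boot", and then always traverse through that table:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NFEAT 20

/*
 * Illustrative non-compacted offsets; component 19 (APX-like) lands
 * below components 17/18. Zero means "not enumerated".
 */
static unsigned int xoffset[NFEAT] = {
	[2] = 576, [17] = 2752, [18] = 2816, [19] = 960,
};

static unsigned int order[NFEAT];
static unsigned int norder;

/* Compare two feature numbers by their buffer offsets. */
static int cmp_offset(const void *a, const void *b)
{
	return (int)xoffset[*(const unsigned int *)a] -
	       (int)xoffset[*(const unsigned int *)b];
}

/* Populate the order table from the enabled-feature mask, then sort. */
static void setup_order(uint64_t mask)
{
	norder = 0;
	for (unsigned int f = 2; f < NFEAT; f++)
		if (mask & (1ULL << f))
			order[norder++] = f;
	qsort(order, norder, sizeof(order[0]), cmp_offset);
}
```

After setup, any code that needs offset-order traversal simply walks order[0..norder), so the "higher number means higher offset" assumption never has to hold.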
== Series Summary ==
This patch set addresses the above issues before enabling APX support.
The chosen approach (Option 2) adjusts the kernel to handle out-of-order
offsets. Here’s a breakdown of the patches:
* PART1: XSTATE Code Adjustment
- PATCH1: Remove the offset sanity check.
- PATCH2: Introduce the feature order table.
- PATCH3: Adjust XSAVE size calculation.
- PATCH4: Modify the xstate copy function.
* PART2: APX Enabling
- PATCH5: Enumerate the APX CPUID feature bit.
- PATCH6: Update xstate definitions to include APX.
- PATCH7: Ensure MPX and APX are mutually exclusive.
- PATCH8: Register APX in the supported xstate list.
- PATCH9: Add a self-test case for APX.
== Testing ==
The first part is agnostic to APX and should preserve the same
functionality for the currently supported features. The xstate tests --
covering context switching and ABI compatibility for signal and ptrace --
were executed to ensure there is no regression.
Now, with dynamic population of the ordering table, MPX and APX states
can be separately enabled in different systems:
* APX: PATCH9 applies to the same test set and builds on the xstate
selftest rework [3]. Since no hardware implementation is available at
this time, an internal Intel emulator was primarily used to verify the
test cases.
* MPX exposure to a KVM guest was also validated on a Skylake machine.
---
The patches are based on the tip/master branch. The series can be found
here:
git://github.com/intel/apx.git apx_rfc-v2
Thanks,
Chang
[1] https://lore.kernel.org/lkml/20250227184502.10288-1-chang.seok.bae@intel.com/
[2] https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
[3] https://lore.kernel.org/lkml/20250226010731.2456-1-chang.seok.bae@intel.com/
[4] https://lore.kernel.org/lkml/547a1203-0339-7ad2-9505-eab027046298@intel.com/
Chang S. Bae (9):
x86/fpu/xstate: Remove xstate offset check
x86/fpu/xstate: Introduce xfeature order table and accessor macro
x86/fpu/xstate: Adjust XSAVE buffer size calculation
x86/fpu/xstate: Adjust xstate copying logic for user ABI
x86/cpufeatures: Add X86_FEATURE_APX
x86/fpu/apx: Define APX state component
x86/fpu/apx: Disallow conflicting MPX presence
x86/fpu/apx: Enable APX state support
selftests/x86/apx: Add APX test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/fpu/types.h | 9 +++
arch/x86/include/asm/fpu/xstate.h | 3 +-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/fpu/xstate.c | 115 +++++++++++++++++++--------
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/apx.c | 10 +++
tools/testing/selftests/x86/xstate.c | 3 +-
tools/testing/selftests/x86/xstate.h | 2 +
10 files changed, 113 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
base-commit: 758ea9705c51865858c612f591c6e6950dcafccf
--
2.45.2
* Chang S. Bae <chang.seok.bae@intel.com> wrote:

> Chang S. Bae (9):
>   x86/fpu/xstate: Remove xstate offset check
>   x86/fpu/xstate: Introduce xfeature order table and accessor macro
>   x86/fpu/xstate: Adjust XSAVE buffer size calculation
>   x86/fpu/xstate: Adjust xstate copying logic for user ABI

So I've applied these first 4 as preparatory patches to tip:x86/fpu,
let's get these tested a bit broadly.

>   x86/cpufeatures: Add X86_FEATURE_APX
>   x86/fpu/apx: Define APX state component
>   x86/fpu/apx: Disallow conflicting MPX presence
>   x86/fpu/apx: Enable APX state support
>   selftests/x86/apx: Add APX test

Planning to apply these remaining 5 as well in a couple of days, if all
goes well with the others.

Thanks,

	Ingo
Traditionally, new xstate components have been assigned sequentially,
aligning feature numbers with their offsets in the XSAVE buffer. However,
this ordering is not architecturally mandated in the non-compacted
format, where a component's offset may not correspond to its feature
number.
The kernel caches CPUID-reported xstate component details, including size
and offset in the non-compacted format. As part of this process, a sanity
check is also conducted to ensure alignment between feature numbers and
offsets.
This check was likely intended as a general guideline rather than a
strict requirement. Upcoming changes will support out-of-order offsets.
Remove the check, as it has become obsolete.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a41d1610d8b..542c6981180d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -216,9 +216,6 @@ static bool xfeature_enabled(enum xfeature xfeature)
static void __init setup_xstate_cache(void)
{
u32 eax, ebx, ecx, edx, i;
- /* start at the beginning of the "extended state" */
- unsigned int last_good_offset = offsetof(struct xregs_state,
- extended_state_area);
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -246,16 +243,6 @@ static void __init setup_xstate_cache(void)
continue;
xstate_offsets[i] = ebx;
-
- /*
- * In our xstate size checks, we assume that the highest-numbered
- * xstate feature has the highest offset in the buffer. Ensure
- * it does.
- */
- WARN_ONCE(last_good_offset > xstate_offsets[i],
- "x86/fpu: misordered xstate at %d\n", last_good_offset);
-
- last_good_offset = xstate_offsets[i];
}
}
--
2.45.2
The following commit has been merged into the x86/merge branch of tip:
Commit-ID: 031b33ef1a6a1129a1a02a16b89608ded2eff9be
Gitweb: https://git.kernel.org/tip/031b33ef1a6a1129a1a02a16b89608ded2eff9be
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:52 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 14 Apr 2025 08:18:29 +02:00
x86/fpu/xstate: Remove xstate offset check
Traditionally, new xstate components have been assigned sequentially,
aligning feature numbers with their offsets in the XSAVE buffer. However,
this ordering is not architecturally mandated in the non-compacted
format, where a component's offset may not correspond to its feature
number.
The kernel caches CPUID-reported xstate component details, including size
and offset in the non-compacted format. As part of this process, a sanity
check is also conducted to ensure alignment between feature numbers and
offsets.
This check was likely intended as a general guideline rather than a
strict requirement. Upcoming changes will support out-of-order offsets.
Remove the check, as it has become obsolete.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-2-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a41d16..542c698 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -216,9 +216,6 @@ static bool xfeature_enabled(enum xfeature xfeature)
static void __init setup_xstate_cache(void)
{
u32 eax, ebx, ecx, edx, i;
- /* start at the beginning of the "extended state" */
- unsigned int last_good_offset = offsetof(struct xregs_state,
- extended_state_area);
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -246,16 +243,6 @@ static void __init setup_xstate_cache(void)
continue;
xstate_offsets[i] = ebx;
-
- /*
- * In our xstate size checks, we assume that the highest-numbered
- * xstate feature has the highest offset in the buffer. Ensure
- * it does.
- */
- WARN_ONCE(last_good_offset > xstate_offsets[i],
- "x86/fpu: misordered xstate at %d\n", last_good_offset);
-
- last_good_offset = xstate_offsets[i];
}
}
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 937a3c877f7c84103891a2319e24626bf3a4d8a5
Gitweb: https://git.kernel.org/tip/937a3c877f7c84103891a2319e24626bf3a4d8a5
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:52 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 25 Mar 2025 11:02:20 +01:00
x86/fpu/xstate: Remove xstate offset check
Traditionally, new xstate components have been assigned sequentially,
aligning feature numbers with their offsets in the XSAVE buffer. However,
this ordering is not architecturally mandated in the non-compacted
format, where a component's offset may not correspond to its feature
number.
The kernel caches CPUID-reported xstate component details, including size
and offset in the non-compacted format. As part of this process, a sanity
check is also conducted to ensure alignment between feature numbers and
offsets.
This check was likely intended as a general guideline rather than a
strict requirement. Upcoming changes will support out-of-order offsets.
Remove the check, as it has become obsolete.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-2-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a41d16..542c698 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -216,9 +216,6 @@ static bool xfeature_enabled(enum xfeature xfeature)
static void __init setup_xstate_cache(void)
{
u32 eax, ebx, ecx, edx, i;
- /* start at the beginning of the "extended state" */
- unsigned int last_good_offset = offsetof(struct xregs_state,
- extended_state_area);
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -246,16 +243,6 @@ static void __init setup_xstate_cache(void)
continue;
xstate_offsets[i] = ebx;
-
- /*
- * In our xstate size checks, we assume that the highest-numbered
- * xstate feature has the highest offset in the buffer. Ensure
- * it does.
- */
- WARN_ONCE(last_good_offset > xstate_offsets[i],
- "x86/fpu: misordered xstate at %d\n", last_good_offset);
-
- last_good_offset = xstate_offsets[i];
}
}
The kernel has largely assumed that higher xstate component numbers
correspond to later offsets in the buffer. However, this assumption no
longer holds for the non-compacted format, where a newer state component
may have a lower offset.
When iterating over xstate components in offset order, using the feature
number as an index may be misleading. At the same time, the CPU exposes
each component’s size and offset based on its feature number, making it a
key for state information.
To provide flexibility in handling xstate ordering, introduce a mapping
table: feature order -> feature number. The table is dynamically
populated based on the CPU-exposed features and is sorted in offset order
at boot time.
Additionally, add an accessor macro to facilitate sequential traversal of
xstate components based on their actual buffer positions, given a feature
bitmask. This accessor macro will be particularly useful for computing
custom non-compacted format sizes and iterating over xstate offsets in
non-compacted buffers.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/all/7fa02be2-0884-4702-ae73-a3620938161b@intel.com
---
RFC-V1 -> RFC-V2: Populate the order table dynamically (Dave).
This introduction lays the groundwork for handling APX, which is assigned
feature number 19 but appears immediately after FEATURE_YMM on
APX-enabled systems. Older CPUs, such as Skylake systems, previously used
this region for MPX.
Later in this series, APX and MPX will be explicitly marked as mutually
exclusive to prevent conflicts. Since the xfeature order table is
dynamically populated at boot, it will reflect the correct feature set
for each system configuration.
---
arch/x86/kernel/fpu/xstate.c | 58 +++++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 542c6981180d..1e22103a8e17 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -14,6 +14,7 @@
#include <linux/proc_fs.h>
#include <linux/vmalloc.h>
#include <linux/coredump.h>
+#include <linux/sort.h>
#include <asm/fpu/api.h>
#include <asm/fpu/regset.h>
@@ -88,6 +89,31 @@ static unsigned int xstate_sizes[XFEATURE_MAX] __ro_after_init =
{ [ 0 ... XFEATURE_MAX - 1] = -1};
static unsigned int xstate_flags[XFEATURE_MAX] __ro_after_init;
+/*
+ * Ordering of xstate components in uncompacted format: The xfeature
+ * number does not necessarily indicate its position in the XSAVE buffer.
+ * This array defines the traversal order of xstate features.
+ */
+static unsigned int xfeature_uncompact_order[XFEATURE_MAX] __ro_after_init =
+ { [ 0 ... XFEATURE_MAX - 1] = -1};
+
+static inline unsigned int next_xfeature_order(unsigned int i, u64 mask)
+{
+ for (; xfeature_uncompact_order[i] != -1; i++) {
+ if (mask & BIT_ULL(xfeature_uncompact_order[i]))
+ break;
+ }
+
+ return i;
+}
+
+/* Iterate xstate features in uncompacted order: */
+#define for_each_extended_xfeature_in_order(i, mask) \
+ for (i = 0; \
+ i = next_xfeature_order(i, mask), \
+ xfeature_uncompact_order[i] != -1; \
+ i++)
+
#define XSTATE_FLAG_SUPERVISOR BIT(0)
#define XSTATE_FLAG_ALIGNED64 BIT(1)
@@ -209,13 +235,20 @@ static bool xfeature_enabled(enum xfeature xfeature)
return fpu_kernel_cfg.max_features & BIT_ULL(xfeature);
}
+static int compare_xstate_offsets(const void *xfeature1, const void *xfeature2)
+{
+ return xstate_offsets[*(unsigned int *)xfeature1] -
+ xstate_offsets[*(unsigned int *)xfeature2];
+}
+
/*
* Record the offsets and sizes of various xstates contained
- * in the XSAVE state memory layout.
+ * in the XSAVE state memory layout. Also, create an ordered
+ * list of xfeatures for handling out-of-order offsets.
*/
static void __init setup_xstate_cache(void)
{
- u32 eax, ebx, ecx, edx, i;
+ u32 eax, ebx, ecx, edx, xfeature, i = 0;
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -229,21 +262,30 @@ static void __init setup_xstate_cache(void)
xstate_sizes[XFEATURE_SSE] = sizeof_field(struct fxregs_state,
xmm_space);
- for_each_extended_xfeature(i, fpu_kernel_cfg.max_features) {
- cpuid_count(CPUID_LEAF_XSTATE, i, &eax, &ebx, &ecx, &edx);
+ for_each_extended_xfeature(xfeature, fpu_kernel_cfg.max_features) {
+ cpuid_count(CPUID_LEAF_XSTATE, xfeature, &eax, &ebx, &ecx, &edx);
- xstate_sizes[i] = eax;
- xstate_flags[i] = ecx;
+ xstate_sizes[xfeature] = eax;
+ xstate_flags[xfeature] = ecx;
/*
* If an xfeature is supervisor state, the offset in EBX is
* invalid, leave it to -1.
*/
- if (xfeature_is_supervisor(i))
+ if (xfeature_is_supervisor(xfeature))
continue;
- xstate_offsets[i] = ebx;
+ xstate_offsets[xfeature] = ebx;
+
+ /* Populate the list of xfeatures before sorting */
+ xfeature_uncompact_order[i++] = xfeature;
}
+
+ /*
+ * Sort xfeatures by their offsets to support out-of-order
+ * offsets in the uncompacted format.
+ */
+ sort(xfeature_uncompact_order, i, sizeof(unsigned int), compare_xstate_offsets, NULL);
}
/*
--
2.45.2
The following commit has been merged into the x86/merge branch of tip:
Commit-ID: 15d51a2f6f3f7057311d6e37d5190d58597a94a9
Gitweb: https://git.kernel.org/tip/15d51a2f6f3f7057311d6e37d5190d58597a94a9
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:53 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 14 Apr 2025 08:18:29 +02:00
x86/fpu/xstate: Introduce xfeature order table and accessor macro
The kernel has largely assumed that higher xstate component numbers
correspond to later offsets in the buffer. However, this assumption no
longer holds for the non-compacted format, where a newer state component
may have a lower offset.
When iterating over xstate components in offset order, using the feature
number as an index may be misleading. At the same time, the CPU exposes
each component’s size and offset based on its feature number, making it a
key for state information.
To provide flexibility in handling xstate ordering, introduce a mapping
table: feature order -> feature number. The table is dynamically
populated based on the CPU-exposed features and is sorted in offset order
at boot time.
Additionally, add an accessor macro to facilitate sequential traversal of
xstate components based on their actual buffer positions, given a feature
bitmask. This accessor macro will be particularly useful for computing
custom non-compacted format sizes and iterating over xstate offsets in
non-compacted buffers.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-3-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 58 ++++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 542c698..1e22103 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -14,6 +14,7 @@
#include <linux/proc_fs.h>
#include <linux/vmalloc.h>
#include <linux/coredump.h>
+#include <linux/sort.h>
#include <asm/fpu/api.h>
#include <asm/fpu/regset.h>
@@ -88,6 +89,31 @@ static unsigned int xstate_sizes[XFEATURE_MAX] __ro_after_init =
{ [ 0 ... XFEATURE_MAX - 1] = -1};
static unsigned int xstate_flags[XFEATURE_MAX] __ro_after_init;
+/*
+ * Ordering of xstate components in uncompacted format: The xfeature
+ * number does not necessarily indicate its position in the XSAVE buffer.
+ * This array defines the traversal order of xstate features.
+ */
+static unsigned int xfeature_uncompact_order[XFEATURE_MAX] __ro_after_init =
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: b03c328e9711b4af8e4a433a1c4dad38f2e160a2
Gitweb: https://git.kernel.org/tip/b03c328e9711b4af8e4a433a1c4dad38f2e160a2
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:53 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 25 Mar 2025 11:02:20 +01:00
x86/fpu/xstate: Introduce xfeature order table and accessor macro
The kernel has largely assumed that higher xstate component numbers
correspond to later offsets in the buffer. However, this assumption no
longer holds for the non-compacted format, where a newer state component
may have a lower offset.
When iterating over xstate components in offset order, the feature number
is no longer a reliable loop index. At the same time, the CPU enumerates
each component's size and offset by feature number, which therefore
remains the key for looking up state information.
To provide flexibility in handling xstate ordering, introduce a mapping
table: feature order -> feature number. The table is dynamically
populated based on the CPU-exposed features and is sorted in offset order
at boot time.
Additionally, add an accessor macro to facilitate sequential traversal of
xstate components based on their actual buffer positions, given a feature
bitmask. This accessor macro will be particularly useful for computing
custom non-compacted format sizes and iterating over xstate offsets in
non-compacted buffers.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-3-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 58 ++++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 542c698..1e22103 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -14,6 +14,7 @@
#include <linux/proc_fs.h>
#include <linux/vmalloc.h>
#include <linux/coredump.h>
+#include <linux/sort.h>
#include <asm/fpu/api.h>
#include <asm/fpu/regset.h>
@@ -88,6 +89,31 @@ static unsigned int xstate_sizes[XFEATURE_MAX] __ro_after_init =
{ [ 0 ... XFEATURE_MAX - 1] = -1};
static unsigned int xstate_flags[XFEATURE_MAX] __ro_after_init;
+/*
+ * Ordering of xstate components in uncompacted format: The xfeature
+ * number does not necessarily indicate its position in the XSAVE buffer.
+ * This array defines the traversal order of xstate features.
+ */
+static unsigned int xfeature_uncompact_order[XFEATURE_MAX] __ro_after_init =
+ { [ 0 ... XFEATURE_MAX - 1] = -1};
+
+static inline unsigned int next_xfeature_order(unsigned int i, u64 mask)
+{
+ for (; xfeature_uncompact_order[i] != -1; i++) {
+ if (mask & BIT_ULL(xfeature_uncompact_order[i]))
+ break;
+ }
+
+ return i;
+}
+
+/* Iterate xstate features in uncompacted order: */
+#define for_each_extended_xfeature_in_order(i, mask) \
+ for (i = 0; \
+ i = next_xfeature_order(i, mask), \
+ xfeature_uncompact_order[i] != -1; \
+ i++)
+
#define XSTATE_FLAG_SUPERVISOR BIT(0)
#define XSTATE_FLAG_ALIGNED64 BIT(1)
@@ -209,13 +235,20 @@ static bool xfeature_enabled(enum xfeature xfeature)
return fpu_kernel_cfg.max_features & BIT_ULL(xfeature);
}
+static int compare_xstate_offsets(const void *xfeature1, const void *xfeature2)
+{
+ return xstate_offsets[*(unsigned int *)xfeature1] -
+ xstate_offsets[*(unsigned int *)xfeature2];
+}
+
/*
* Record the offsets and sizes of various xstates contained
- * in the XSAVE state memory layout.
+ * in the XSAVE state memory layout. Also, create an ordered
+ * list of xfeatures for handling out-of-order offsets.
*/
static void __init setup_xstate_cache(void)
{
- u32 eax, ebx, ecx, edx, i;
+ u32 eax, ebx, ecx, edx, xfeature, i = 0;
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -229,21 +262,30 @@ static void __init setup_xstate_cache(void)
xstate_sizes[XFEATURE_SSE] = sizeof_field(struct fxregs_state,
xmm_space);
- for_each_extended_xfeature(i, fpu_kernel_cfg.max_features) {
- cpuid_count(CPUID_LEAF_XSTATE, i, &eax, &ebx, &ecx, &edx);
+ for_each_extended_xfeature(xfeature, fpu_kernel_cfg.max_features) {
+ cpuid_count(CPUID_LEAF_XSTATE, xfeature, &eax, &ebx, &ecx, &edx);
- xstate_sizes[i] = eax;
- xstate_flags[i] = ecx;
+ xstate_sizes[xfeature] = eax;
+ xstate_flags[xfeature] = ecx;
/*
* If an xfeature is supervisor state, the offset in EBX is
* invalid, leave it to -1.
*/
- if (xfeature_is_supervisor(i))
+ if (xfeature_is_supervisor(xfeature))
continue;
- xstate_offsets[i] = ebx;
+ xstate_offsets[xfeature] = ebx;
+
+ /* Populate the list of xfeatures before sorting */
+ xfeature_uncompact_order[i++] = xfeature;
}
+
+ /*
+ * Sort xfeatures by their offsets to support out-of-order
+ * offsets in the uncompacted format.
+ */
+ sort(xfeature_uncompact_order, i, sizeof(unsigned int), compare_xstate_offsets, NULL);
}
/*
The current xstate size calculation assumes that the highest-numbered
xstate feature has the highest offset in the buffer, determining the size
based on the topmost bit in the feature mask. However, this assumption is
not architecturally guaranteed -- higher-numbered features may have lower
offsets.
With the introduction of the xfeature order table and its helper macro,
xstate components can now be traversed in their positional order. Update
the non-compacted format handling to iterate through the table to
determine the last-positioned feature. Then, set the offset accordingly.
Since size calculation primarily occurs during initialization or in
non-critical paths, looping to find the last feature is not expected to
have a meaningful performance impact.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1e22103a8e17..93f94013b094 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -581,13 +581,20 @@ static bool __init check_xstate_against_struct(int nr)
static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
{
unsigned int topmost = fls64(xfeatures) - 1;
- unsigned int offset = xstate_offsets[topmost];
+ unsigned int offset, i;
if (topmost <= XFEATURE_SSE)
return sizeof(struct xregs_state);
- if (compacted)
+ if (compacted) {
offset = xfeature_get_offset(xfeatures, topmost);
+ } else {
+ /* Walk through the xfeature order to pick the last */
+ for_each_extended_xfeature_in_order(i, xfeatures)
+ topmost = xfeature_uncompact_order[i];
+ offset = xstate_offsets[topmost];
+ }
+
return offset + xstate_sizes[topmost];
}
--
2.45.2
== Background ==
As feature positions in the userspace XSAVE buffer do not always align
with their feature numbers, the XSAVE format conversion needs to be
reconsidered to align with the revised xstate size calculation logic.
* For signal handling, XSAVE and XRSTOR are used directly to save and
restore extended registers.
* For ptrace, KVM, and signal returns (for 32-bit frame), the kernel
copies data between its internal buffer and the userspace XSAVE buffer.
If memcpy() were used for these cases, existing offset helpers — such
as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to
handle the format conversion.
== Problem ==
When copying data from the compacted in-kernel buffer to the
non-compacted userspace buffer, the function follows the
user_regset_get2_fn() prototype. This means it utilizes struct membuf
helpers for the destination buffer. As defined in regset.h, these helpers
update the memory pointer during the copy process, enforcing sequential
writes within the loop.
Since xstate components are processed sequentially, any component whose
buffer position does not follow its feature-number order would be written
at the wrong offset.
== Solution ==
Replace for_each_extended_xfeature() with the newly introduced
for_each_extended_xfeature_in_order(). This macro ensures xstate
components are handled in the correct order based on their actual
positions in the destination buffer, rather than their feature numbers.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 93f94013b094..46c45e2f2a5a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1107,10 +1107,9 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
struct xregs_state *xinit = &init_fpstate.regs.xsave;
struct xregs_state *xsave = &fpstate->regs.xsave;
+ unsigned int zerofrom, i, xfeature;
struct xstate_header header;
- unsigned int zerofrom;
u64 mask;
- int i;
memset(&header, 0, sizeof(header));
header.xfeatures = xsave->header.xfeatures;
@@ -1179,15 +1178,16 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
*/
mask = header.xfeatures;
- for_each_extended_xfeature(i, mask) {
+ for_each_extended_xfeature_in_order(i, mask) {
+ xfeature = xfeature_uncompact_order[i];
/*
* If there was a feature or alignment gap, zero the space
* in the destination buffer.
*/
- if (zerofrom < xstate_offsets[i])
- membuf_zero(&to, xstate_offsets[i] - zerofrom);
+ if (zerofrom < xstate_offsets[xfeature])
+ membuf_zero(&to, xstate_offsets[xfeature] - zerofrom);
- if (i == XFEATURE_PKRU) {
+ if (xfeature == XFEATURE_PKRU) {
struct pkru_state pkru = {0};
/*
* PKRU is not necessarily up to date in the
@@ -1197,14 +1197,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
membuf_write(&to, &pkru, sizeof(pkru));
} else {
membuf_write(&to,
- __raw_xsave_addr(xsave, i),
- xstate_sizes[i]);
+ __raw_xsave_addr(xsave, xfeature),
+ xstate_sizes[xfeature]);
}
/*
* Keep track of the last copied state in the non-compacted
* target buffer for gap zeroing.
*/
- zerofrom = xstate_offsets[i] + xstate_sizes[i];
+ zerofrom = xstate_offsets[xfeature] + xstate_sizes[xfeature];
}
out:
--
2.45.2
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated in CPUID leaf
0x7, subleaf 1 (EDX, bit 21). Since no feature word has been allocated
for that CPUID output yet, place the flag in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Allocating a new feature word for this bit seems excessive at this stage,
given that no other imminent features are known that would fill it.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8b7cf13e0acb..51178d4a6308 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -479,6 +479,7 @@
#define X86_FEATURE_AMD_FAST_CPPC (21*32 + 5) /* Fast CPPC */
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
+#define X86_FEATURE_APX (21*32 + 8) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index a2fbea0be535..d5e5013e0e9f 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{ X86_FEATURE_FRED, X86_FEATURE_LKGS },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{}
};
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..6c40d5af8479 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -54,6 +54,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
{ X86_FEATURE_AMD_HETEROGENEOUS_CORES, CPUID_EAX, 30, 0x80000026, 0 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ 0, 0, 0, 0, 0 }
};
--
2.45.2
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated in CPUID leaf
0x7, subleaf 1 (EDX, bit 21). Since no feature word has been allocated
for that CPUID output yet, place the flag in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Rebased onto v6.15-rc1 with commit:
968e9bc4cef8 ("x86: move ZMM exclusion list into CPU feature flag")
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..5445937eff3d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_APX (21*32 + 9) /* Advanced #Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 94c062cddfa4..7c1268138a7a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -88,6 +88,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{ X86_FEATURE_FRED, X86_FEATURE_LKGS },
{ X86_FEATURE_SPEC_CTRL_SSBD, X86_FEATURE_SPEC_CTRL },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{}
};
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..6c40d5af8479 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -54,6 +54,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
{ X86_FEATURE_AMD_HETEROGENEOUS_CORES, CPUID_EAX, 30, 0x80000026, 0 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ 0, 0, 0, 0, 0 }
};
--
2.45.2
On 4/11/2025 9:12 AM, Chang S. Bae wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 6c2c152d8a67..5445937eff3d 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -481,6 +481,7 @@
> #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
> #define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
> #define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
> +#define X86_FEATURE_APX (21*32 + 9) /* Advanced #Performance Extensions */
>
Is the '#' before 'Performance' intentional? The previous version didn't
seem to have it.
> /*
> * BUG word(s)
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index 94c062cddfa4..7c1268138a7a 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -88,6 +88,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
> { X86_FEATURE_FRED, X86_FEATURE_LKGS },
> { X86_FEATURE_SPEC_CTRL_SSBD, X86_FEATURE_SPEC_CTRL },
> + { X86_FEATURE_APX, X86_FEATURE_XSAVE },
I don't think this table follows any specific logic, but recent patches
have tried grouping by similar features or features with similar
dependencies.
I don't have a preference, but should this be inserted closer to other
XSAVE dependent features?
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{ X86_FEATURE_CMOV, X86_FEATURE_FXSR },
> {}
> };
>
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 16f3ca30626a..6c40d5af8479 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -54,6 +54,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> { X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
> { X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
> { X86_FEATURE_AMD_HETEROGENEOUS_CORES, CPUID_EAX, 30, 0x80000026, 0 },
> + { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
There is a comment on top of that table that says:
"Please keep the leaf sorted by cpuid_bit.level for faster search."
Based on that, APX should be inserted here:
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+{ X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
> { 0, 0, 0, 0, 0 }
> };
>
On 4/11/2025 9:54 AM, Sohil Mehta wrote:
> On 4/11/2025 9:12 AM, Chang S. Bae wrote:
>
>> +#define X86_FEATURE_APX (21*32 + 9) /* Advanced #Performance Extensions */
>
> Is the '#' before 'Performance' intentional? The previous version didn't
> seem to have it.
Oh no — I only meant to update the bit position. My bad!
> I don't think this table follows any specific logic, but recent patches
> have tried grouping by similar features or features with similar
> dependencies.
>
> I don't have a preference, but should this be inserted closer to other
> XSAVE dependent features?
>
> { X86_FEATURE_PKU, X86_FEATURE_XSAVE },
> { X86_FEATURE_MPX, X86_FEATURE_XSAVE },
> { X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
> + { X86_FEATURE_APX, X86_FEATURE_XSAVE },
> { X86_FEATURE_CMOV, X86_FEATURE_FXSR },
> There is a comment on top of that table that says:
> "Please keep the leaf sorted by cpuid_bit.level for faster search."
>
> Based on that, APX should be inserted here:
>
> { X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
> +{ X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
> { X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },

Yeah, both suggestions make sense. Thanks for pointing them out!
I've attached the patch revision.
Thanks,
Chang
From a3a68edf35c43162a57dad80ea2eecc9c1101246 Mon Sep 17 00:00:00 2001
From: "Chang S. Bae" <chang.seok.bae@intel.com>
Date: Thu, 13 Feb 2025 23:00:28 -0800
Subject: [PATCH RFC v2b 5/9] x86/cpufeatures: Add X86_FEATURE_APX
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated in CPUID leaf 0x7,
sub-leaf 1, EDX bit 21. Since this CPUID sub-leaf has no allocated feature
word yet, place the flag in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC-V2a -> RFC-V2b: Fix the typo and reorder the APX entries (Sohil)
RFC-V2 -> RFC-V2a: Rebased onto v6.15-rc1; resolve conflict with commit
968e9bc4cef8 ("x86: move ZMM exclusion list into CPU feature flag")
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..eb73f3f0ec70 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_APX (21*32 + 9) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 94c062cddfa4..46efcbd6afa4 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -28,6 +28,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{ X86_FEATURE_CMOV, X86_FEATURE_FXSR },
{ X86_FEATURE_MMX, X86_FEATURE_FXSR },
{ X86_FEATURE_MMXEXT, X86_FEATURE_MMX },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..ffb80b5ad97f 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
{ X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
--
2.45.2
On 4/11/2025 11:23 AM, Chang S. Bae wrote:
> I've attached the patch revision.
>
LGTM,
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
* Sohil Mehta <sohil.mehta@intel.com> wrote:

> On 4/11/2025 11:23 AM, Chang S. Bae wrote:
>
> > I've attached the patch revision.
> >
>
> LGTM,
> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

Chang, mind sending a series of the latest version of all the pending
APX patches you have at the moment (and any other pending FPU patches
you may have), with Reviewed-by tags rolled in, etc., on top of:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/fpu

Thanks,
Ingo
On 4/12/2025 1:43 AM, Ingo Molnar wrote:
>
> Chang, mind sending a series of the latest version of all the pending
> APX patches you have at the moment (and any other pending FPU patches
> you may have), with Reviewed-by tags rolled in, etc.
Hi Ingo,
Here’s the updated patch set following up on the previous APX series [1],
along with a collection of additional FPU-related cleanups and
improvements that were previously posted or discussed.
The series is organized into two parts:
1. APX Enabling (PATCH 1–5)
These patches complete the APX bring-up. After laying the groundwork,
this portion finalizes the enablement:
* Patches 1, 2, and 4 are typical xfeature plumbing.
* Patch 3 handles the MPX conflict -- an unexpected hardware issue.
* Patch 5 adds a test case.
2. Miscellaneous FPU Code Improvements (PATCH 6–10)
This batch includes various standalone improvements:
* Patch 6: Centralizes the XSAVE disablement message
* Patches 7-8: Simplify the PKRU update in XSTATE_BV on sigframe
* Patch 9: Removes the unused mxcsr_feature_mask export
* Patch 10: Renames fpu_reset_fpregs() for clarity
Each patch includes context and links to earlier discussions or
revisions.
As you noted in [2], the series is based on the x86/fpu branch and is
available at:
git://github.com/intel/apx.git apx-and-misc
[1]: https://lore.kernel.org/lkml/20250320234301.8342-1-chang.seok.bae@intel.com
[2]: https://lore.kernel.org/lkml/Z_zGCCNE_Qt3IlMZ@gmail.com
Thanks,
Chang
Chang S. Bae (10):
x86/cpufeatures: Add X86_FEATURE_APX
x86/fpu/apx: Define APX state component
x86/fpu/apx: Disallow conflicting MPX presence
x86/fpu/apx: Enable APX state support
selftests/x86/apx: Add APX test
x86/fpu: Log XSAVE disablement consistently
x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE
x86/pkeys: Simplify PKRU update in signal frame
x86/fpu: Remove export of mxcsr_feature_mask
x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs()
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/fpu/core.c | 6 +++---
arch/x86/kernel/fpu/init.c | 1 -
arch/x86/kernel/fpu/signal.c | 11 +----------
arch/x86/kernel/fpu/xstate.c | 23 ++++++++++++++++++++---
arch/x86/kernel/fpu/xstate.h | 21 +++++++++++++++------
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 2 ++
14 files changed, 69 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
base-commit: e3a52b67f54aa36ab21265eeea016460b5fe1c46
--
2.45.2
* Chang S. Bae <chang.seok.bae@intel.com> wrote:

> On 4/12/2025 1:43 AM, Ingo Molnar wrote:
> >
> > Chang, mind sending a series of the latest version of all the pending
> > APX patches you have at the moment (and any other pending FPU patches
> > you may have), with Reviewed-by tags rolled in, etc.
>
> Hi Ingo,
>
> Here’s the updated patch set following up on the previous APX series [1],
> along with a collection of additional FPU-related cleanups and
> improvements that were previously posted or discussed.
>
> The series is organized into two parts:
>
> 1. APX Enabling (PATCH 1–5)
>
>    These patches complete the APX bring-up. After laying the groundwork,
>    this portion finalizes the enablement:
>
>    * Patches 1, 2, and 4 are typical xfeature plumbing.
>    * Patch 3 handles MPX conflict -- unexpected hardware issue
>    * Patch 5 adds a test case.
>
> 2. Miscellaneous FPU Code Improvements (PATCH 6–10)
>
>    This batch includes various standalone improvements:
>
>    * Patch 6: Centralizes the XSAVE disablement message
>    * Patches 7-8: Simplifies PKRU update in XSTATE_BV on sigframe
>    * Patch 9: Removes the unused mxcsr_feature_mask export
>    * Patch 10: Renames fpu_reset_fpregs() for clarity
>
> Each patch includes context and links to earlier discussions or
> revisions.
>
> 14 files changed, 69 insertions(+), 26 deletions(-)

Applied to tip:x86/fpu, thanks!

Note that I've merged the currently pending tip:x86/cpu bits into
tip:x86/fpu before applying these patches, to resolve a conflict with
<asm/cpufeatures.h>, and re-formatted the new X86_FEATURE_APX line to
have the canonical format.

Ingo
On 4/16/2025 1:07 AM, Ingo Molnar wrote:
>
> > 14 files changed, 69 insertions(+), 26 deletions(-)
>
> Applied to tip:x86/fpu, thanks!
>
> Note that I've merged the currently pending tip:x86/cpu bits into
> tip:x86/fpu before applying these patches, to resolve a conflict with
> <asm/cpufeatures.h>, and re-formatted the new X86_FEATURE_APX line to
> have the canonical format.

Appreciate!
Chang
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated in CPUID leaf 0x7,
sub-leaf 1, EDX bit 21. Since this CPUID sub-leaf has no allocated feature
word yet, place the flag in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-6-chang.seok.bae@intel.com/
* Rebase onto v6.15-rc1; resolve conflict with commit:
968e9bc4cef8 ("x86: move ZMM exclusion list into CPU feature flag")
* Reorder the APX entries (Sohil), and include his review tag
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..eb73f3f0ec70 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_APX (21*32 + 9) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index a2fbea0be535..72f4fb66ac20 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -28,6 +28,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{ X86_FEATURE_CMOV, X86_FEATURE_FXSR },
{ X86_FEATURE_MMX, X86_FEATURE_FXSR },
{ X86_FEATURE_MMXEXT, X86_FEATURE_MMX },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..ffb80b5ad97f 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
{ X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
--
2.45.2
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: b02dc185ee86836cf1d8a37b81349374e4018ee0
Gitweb: https://git.kernel.org/tip/b02dc185ee86836cf1d8a37b81349374e4018ee0
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:51 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:13 +02:00
x86/cpufeatures: Add X86_FEATURE_APX
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated in CPUID leaf 0x7,
sub-leaf 1, EDX bit 21. Since this CPUID sub-leaf has no allocated feature
word yet, place the flag in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-2-chang.seok.bae@intel.com
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index bc81b9d..478ab36 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HTR_CORES (21*32+ 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32+ 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32+ 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_APX (21*32+ 9) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 94c062c..46efcbd 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -28,6 +28,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{ X86_FEATURE_CMOV, X86_FEATURE_FXSR },
{ X86_FEATURE_MMX, X86_FEATURE_FXSR },
{ X86_FEATURE_MMXEXT, X86_FEATURE_MMX },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index c75c57b..dbf6d71 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
{ X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
Advanced Performance Extensions (APX) is associated with a new state
component number 19. To support saving and restoring of the corresponding
registers via the XSAVE mechanism, introduce the component definition
along with the necessary sanity checks.
Define the new component number, state name, and the register data
type. Then, extend the size checker to validate the register data type
and explicitly list the APX feature flag as a dependency for the new
component in xsave_cpuid_features[].
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-7-chang.seok.bae@intel.com/
* Move the check to be grouped with other XCHECK_SZ() entries (Sohil)
* Massage the changelog (Sohil)
* Add Sohil's tag
---
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/kernel/fpu/xstate.c | 3 +++
2 files changed, 12 insertions(+)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862bf230..97310df3ea13 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -125,6 +125,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILE_CFG,
XFEATURE_XTILE_DATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -145,6 +146,7 @@ enum xfeature {
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
+#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
@@ -303,6 +305,13 @@ struct xtile_data {
struct reg_1024_byte tmm;
} __packed;
+/*
+ * State component 19: 8B extended general purpose register.
+ */
+struct apx_state {
+ u64 egpr[16];
+} __packed;
+
/*
* State component 10 is supervisor state used for context-switching the
* PASID state.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index a288597065fd..dfd07af10037 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -63,6 +63,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
@@ -81,6 +82,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
+ [XFEATURE_APX] = X86_FEATURE_APX,
};
static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -569,6 +571,7 @@ static bool __init check_xstate_against_struct(int nr)
case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
+ case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
--
2.45.2
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: bd0b10b795c5c4c587e83c0498251356874c655c
Gitweb: https://git.kernel.org/tip/bd0b10b795c5c4c587e83c0498251356874c655c
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:52 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:14 +02:00
x86/fpu/apx: Define APX state component
Advanced Performance Extensions (APX) is associated with a new state
component number 19. To support saving and restoring of the corresponding
registers via the XSAVE mechanism, introduce the component definition
along with the necessary sanity checks.
Define the new component number, state name, and the register data
type. Then, extend the size checker to validate the register data type
and explicitly list the APX feature flag as a dependency for the new
component in xsave_cpuid_features[].
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-3-chang.seok.bae@intel.com
---
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/kernel/fpu/xstate.c | 3 +++
2 files changed, 12 insertions(+)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862..97310df 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -125,6 +125,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILE_CFG,
XFEATURE_XTILE_DATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -145,6 +146,7 @@ enum xfeature {
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
+#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
@@ -304,6 +306,13 @@ struct xtile_data {
} __packed;
/*
+ * State component 19: 8B extended general purpose register.
+ */
+struct apx_state {
+ u64 egpr[16];
+} __packed;
+
+/*
* State component 10 is supervisor state used for context-switching the
* PASID state.
*/
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index a288597..dfd07af 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -63,6 +63,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
@@ -81,6 +82,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
+ [XFEATURE_APX] = X86_FEATURE_APX,
};
static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -569,6 +571,7 @@ static bool __init check_xstate_against_struct(int nr)
case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
+ case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
XSTATE components are architecturally independent. There is no rule
requiring their offsets in the non-compacted format to be strictly
ascending or mutually non-overlapping. However, in practice, such
overlaps have not occurred -- until now.
APX is introduced as xstate component 19, following AMX. In the
non-compacted XSAVE format, its offset overlaps with the space previously
occupied by the now-deprecated MPX feature:
45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
To prevent conflicts, the kernel must ensure that the CPU never exposes
both features at the same time. If it does, the hardware is unreliable,
and XSAVE should be disabled entirely as a precautionary measure.
Add a sanity check to detect this condition and disable XSAVE if an
invalid hardware configuration is identified.
Note: MPX state components remain enabled on legacy systems solely for
KVM guest support.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-8-chang.seok.bae@intel.com/
* Add background in the changelog (Sohil/Dave)
* Clarify XSAVE disablement (Sohil).
* Collect review tag
The related warning message will be moved to the XSAVE-disabling function
in patch 6, per Dave’s suggestion.
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index dfd07af10037..14f5c1bb2080 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -814,6 +814,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
+ fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) {
+ /*
+ * This is a problematic CPU configuration where two
+ * conflicting state components are both enumerated.
+ */
+ pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
+ fpu_kernel_cfg.max_features);
+ goto out_disable;
+ }
+
fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
XFEATURE_MASK_INDEPENDENT;
--
2.45.2
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: ea68e39190cff86f457bd286c70b535e2a99a94d
Gitweb: https://git.kernel.org/tip/ea68e39190cff86f457bd286c70b535e2a99a94d
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:53 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:14 +02:00
x86/fpu/apx: Disallow conflicting MPX presence
XSTATE components are architecturally independent. There is no rule
requiring their offsets in the non-compacted format to be strictly
ascending or mutually non-overlapping. However, in practice, such
overlaps have not occurred -- until now.
APX is introduced as xstate component 19, following AMX. In the
non-compacted XSAVE format, its offset overlaps with the space previously
occupied by the now-deprecated MPX feature:
45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
To prevent conflicts, the kernel must ensure that the CPU never exposes
both features at the same time. If it does, the hardware is unreliable,
and XSAVE should be disabled entirely as a precautionary measure.
Add a sanity check to detect this condition and disable XSAVE if an
invalid hardware configuration is identified.
Note: MPX state components remain enabled on legacy systems solely for
KVM guest support.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-4-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index dfd07af..14f5c1b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -814,6 +814,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
+ fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) {
+ /*
+ * This is a problematic CPU configuration where two
+ * conflicting state components are both enumerated.
+ */
+ pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
+ fpu_kernel_cfg.max_features);
+ goto out_disable;
+ }
+
fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
XFEATURE_MASK_INDEPENDENT;
With APX now secured against a conflicting MPX presence, it is ready to be enabled.
Include APX in the enabled xfeature set.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-9-chang.seok.bae@intel.com/
* Include review tag
---
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/fpu/xstate.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7980c5..b308a76afbb7 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -32,7 +32,8 @@
XFEATURE_MASK_PKRU | \
XFEATURE_MASK_BNDREGS | \
XFEATURE_MASK_BNDCSR | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* Features which are restored when returning to user space.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 14f5c1bb2080..2ac1fc182273 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -371,7 +371,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* setup the xstate image representing the init state
--
2.45.2
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 50c5b071e2833d2b61e3774cd792620311df157c
Gitweb: https://git.kernel.org/tip/50c5b071e2833d2b61e3774cd792620311df157c
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:54 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:14 +02:00
x86/fpu/apx: Enable APX state support
With APX now guarded against a conflicting MPX enumeration, it is ready
to be enabled. Include APX in the enabled xfeature set.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-5-chang.seok.bae@intel.com
---
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/fpu/xstate.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7..b308a76 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -32,7 +32,8 @@
XFEATURE_MASK_PKRU | \
XFEATURE_MASK_BNDREGS | \
XFEATURE_MASK_BNDCSR | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* Features which are restored when returning to user space.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 14f5c1b..2ac1fc1 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -371,7 +371,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* setup the xstate image representing the init state
The xstate test framework assumes that a tested state component may hold
arbitrary data, which is true of APX's extended general-purpose
registers. This allows the new userspace feature to be exercised with
the common test code.
Invoke the test entry function from apx.c after enumerating the
state component and adding it to the support list.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-10-chang.seok.bae@intel.com/
* Add review tag
Some might suggest merging this standalone test with the other tests. I
encountered a similar situation when posting the selftest rework:
https://lore.kernel.org/lkml/20250226010731.2456-10-chang.seok.bae@intel.com/
so it may be worth restating the consideration here:
Alternatively, this invocation could be placed directly in
xstate.c::main(). However, the current test file naming convention,
which clearly identifies the tested area, seems reasonable. Adding apx.c
aligns well with that convention.
---
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 2 ++
4 files changed, 16 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 28422c32cc8f..f703fcfe9f7c 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -19,7 +19,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
- corrupt_xstate_header amx lam test_shadow_stack avx
+ corrupt_xstate_header amx lam test_shadow_stack avx apx
# Some selftests require 32bit support enabled also on 64bit systems
TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
@@ -136,3 +136,4 @@ $(OUTPUT)/nx_stack_64: CFLAGS += -Wl,-z,noexecstack
$(OUTPUT)/avx_64: CFLAGS += -mno-avx -mno-avx512f
$(OUTPUT)/amx_64: EXTRA_FILES += xstate.c
$(OUTPUT)/avx_64: EXTRA_FILES += xstate.c
+$(OUTPUT)/apx_64: EXTRA_FILES += xstate.c
diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
new file mode 100644
index 000000000000..d9c8d41b8c5a
--- /dev/null
+++ b/tools/testing/selftests/x86/apx.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include "xstate.h"
+
+int main(void)
+{
+ test_xstate(XFEATURE_APX);
+}
diff --git a/tools/testing/selftests/x86/xstate.c b/tools/testing/selftests/x86/xstate.c
index 23c1d6c964ea..97fe4bd8bc77 100644
--- a/tools/testing/selftests/x86/xstate.c
+++ b/tools/testing/selftests/x86/xstate.c
@@ -31,7 +31,8 @@
(1 << XFEATURE_OPMASK) | \
(1 << XFEATURE_ZMM_Hi256) | \
(1 << XFEATURE_Hi16_ZMM) | \
- (1 << XFEATURE_XTILEDATA))
+ (1 << XFEATURE_XTILEDATA) | \
+ (1 << XFEATURE_APX))
static inline uint64_t xgetbv(uint32_t index)
{
diff --git a/tools/testing/selftests/x86/xstate.h b/tools/testing/selftests/x86/xstate.h
index 42af36ec852f..e91e3092b5d2 100644
--- a/tools/testing/selftests/x86/xstate.h
+++ b/tools/testing/selftests/x86/xstate.h
@@ -33,6 +33,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILECFG,
XFEATURE_XTILEDATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -59,6 +60,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
--
2.45.2
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: ab6f87ddd0c6d3fb114cdf897eb9839cbd429439
Gitweb: https://git.kernel.org/tip/ab6f87ddd0c6d3fb114cdf897eb9839cbd429439
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:55 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:14 +02:00
selftests/x86/apx: Add APX test
The xstate test framework assumes that a tested state component may hold
arbitrary data, which is true of APX's extended general-purpose
registers. This allows the new userspace feature to be exercised with
the common test code.
Invoke the test entry function from apx.c after enumerating the
state component and adding it to the support list.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-6-chang.seok.bae@intel.com
---
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 2 ++
4 files changed, 16 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 28422c3..f703fcf 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -19,7 +19,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
- corrupt_xstate_header amx lam test_shadow_stack avx
+ corrupt_xstate_header amx lam test_shadow_stack avx apx
# Some selftests require 32bit support enabled also on 64bit systems
TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
@@ -136,3 +136,4 @@ $(OUTPUT)/nx_stack_64: CFLAGS += -Wl,-z,noexecstack
$(OUTPUT)/avx_64: CFLAGS += -mno-avx -mno-avx512f
$(OUTPUT)/amx_64: EXTRA_FILES += xstate.c
$(OUTPUT)/avx_64: EXTRA_FILES += xstate.c
+$(OUTPUT)/apx_64: EXTRA_FILES += xstate.c
diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
new file mode 100644
index 0000000..d9c8d41
--- /dev/null
+++ b/tools/testing/selftests/x86/apx.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include "xstate.h"
+
+int main(void)
+{
+ test_xstate(XFEATURE_APX);
+}
diff --git a/tools/testing/selftests/x86/xstate.c b/tools/testing/selftests/x86/xstate.c
index 23c1d6c..97fe4bd 100644
--- a/tools/testing/selftests/x86/xstate.c
+++ b/tools/testing/selftests/x86/xstate.c
@@ -31,7 +31,8 @@
(1 << XFEATURE_OPMASK) | \
(1 << XFEATURE_ZMM_Hi256) | \
(1 << XFEATURE_Hi16_ZMM) | \
- (1 << XFEATURE_XTILEDATA))
+ (1 << XFEATURE_XTILEDATA) | \
+ (1 << XFEATURE_APX))
static inline uint64_t xgetbv(uint32_t index)
{
diff --git a/tools/testing/selftests/x86/xstate.h b/tools/testing/selftests/x86/xstate.h
index 42af36e..e91e309 100644
--- a/tools/testing/selftests/x86/xstate.h
+++ b/tools/testing/selftests/x86/xstate.h
@@ -33,6 +33,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILECFG,
XFEATURE_XTILEDATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -59,6 +60,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
Not all paths that lead to fpu__init_disable_system_xstate() currently
emit a message indicating that XSAVE has been disabled. Move the print
statement into the function to ensure the message is printed in all cases.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/lkml/d6d19e39-2749-4d45-aeab-a209a0ecba17@intel.com
---
New patch for following up patch 3.
---
arch/x86/kernel/fpu/xstate.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2ac1fc182273..8b14c9d3a1df 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -751,6 +751,8 @@ static int __init init_xstate_size(void)
*/
static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
{
+ pr_info("x86/fpu: XSAVE disabled\n");
+
fpu_kernel_cfg.max_features = 0;
cr4_clear_bits(X86_CR4_OSXSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
@@ -821,7 +823,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
* This is a problematic CPU configuration where two
* conflicting state components are both enumerated.
*/
- pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
+ pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx.\n",
fpu_kernel_cfg.max_features);
goto out_disable;
}
@@ -900,7 +902,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
init_fpstate.xfeatures = fpu_kernel_cfg.default_features;
if (init_fpstate.size > sizeof(init_fpstate.regs)) {
- pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
+ pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d)\n",
sizeof(init_fpstate.regs), init_fpstate.size);
goto out_disable;
}
@@ -912,7 +914,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
* xfeatures mask.
*/
if (xfeatures != fpu_kernel_cfg.max_features) {
- pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init, disabling XSAVE\n",
+ pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init\n",
xfeatures, fpu_kernel_cfg.max_features);
goto out_disable;
}
--
2.45.2
On 4/15/2025 7:16 PM, Chang S. Bae wrote:
> Not all paths that lead to fpu__init_disable_system_xstate() currently
> emit a message indicating that XSAVE has been disabled. Move the print
> statement into the function to ensure the message in all cases.
>
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Link: https://lore.kernel.org/lkml/d6d19e39-2749-4d45-aeab-a209a0ecba17@intel.com
> ---
> New patch for following up patch 3.
> ---
> arch/x86/kernel/fpu/xstate.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 2ac1fc182273..8b14c9d3a1df 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -751,6 +751,8 @@ static int __init init_xstate_size(void)
> */
> static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
> {
> + pr_info("x86/fpu: XSAVE disabled\n");
> +
There is a mix of pr_info(), pr_warn() and pr_err() to log these related
messages. Would it be useful to make the log level consistent in this
patch or a follow-up?
I think the new "XSAVE disabled" print should be at least a pr_warn().
> fpu_kernel_cfg.max_features = 0;
> cr4_clear_bits(X86_CR4_OSXSAVE);
> setup_clear_cpu_cap(X86_FEATURE_XSAVE);
> @@ -821,7 +823,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> * This is a problematic CPU configuration where two
> * conflicting state components are both enumerated.
> */
> - pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
> + pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx.\n",
> fpu_kernel_cfg.max_features);
> goto out_disable;
> }
> @@ -900,7 +902,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> init_fpstate.xfeatures = fpu_kernel_cfg.default_features;
>
> if (init_fpstate.size > sizeof(init_fpstate.regs)) {
> - pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
> + pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d)\n",
> sizeof(init_fpstate.regs), init_fpstate.size);
> goto out_disable;
> }
> @@ -912,7 +914,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> * xfeatures mask.
> */
> if (xfeatures != fpu_kernel_cfg.max_features) {
> - pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init, disabling XSAVE\n",
> + pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init\n",
> xfeatures, fpu_kernel_cfg.max_features);
> goto out_disable;
> }
On 4/16/2025 9:56 AM, Sohil Mehta wrote:
> On 4/15/2025 7:16 PM, Chang S. Bae wrote:
>>
>> static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
>> {
>> + pr_info("x86/fpu: XSAVE disabled\n");
>> +
>
> There is a mix of pr_info(), pr_warn() and pr_err() to log these related
> messages. Would it be useful to make the log level consistent in this
> patch or a follow-up?
>
> I think new the "XSAVE disabled" print should be a pr_warn() at least.
I think pr_info() makes sense here, as it aligns with this hunk:
static __init void disable_smp(void)
{
pr_info("SMP disabled\n");
disable_ioapic_support();
topology_reset_possible_cpus_up();
cpumask_set_cpu(0, topology_sibling_cpumask(0));
cpumask_set_cpu(0, topology_core_cpumask(0));
cpumask_set_cpu(0, topology_die_cpumask(0));
}
If you strongly feel that pr_warn() is more appropriate, then it would
make sense to update all related messages consistently. But honestly, I
don’t think it’s a big deal either way.
On 4/16/2025 10:03 AM, Chang S. Bae wrote:
> If you strongly feel that pr_warn() is more appropriate, then it would
> make sense to update all related messages consistently. But honestly, I
> don’t think it’s a big deal either way.
I don't have a strong preference.
But, for someone who is using log levels to reduce the kernel log
details, the below fpu-specific messages probably don't seem to be in
priority order.
pr_info("x86/fpu: XSAVE disabled\n");
pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d)\n",
pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features:
0x%llx.\n",
pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during
init\n",
pr_err("x86/fpu: FP/SSE not present amongst the CPU's xstate features:
0x%llx.\n",
"XSAVE disabled" seems most important, with the rest just being internal
details. Maybe it's just me... Feel free to ignore unless someone else
chimes in.
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 39cd7fad39ce2ffbeb21939c805ee55f3ec808d4
Gitweb: https://git.kernel.org/tip/39cd7fad39ce2ffbeb21939c805ee55f3ec808d4
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:56 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:15 +02:00
x86/fpu: Log XSAVE disablement consistently
Not all paths that lead to fpu__init_disable_system_xstate() currently
emit a message indicating that XSAVE has been disabled. Move the print
statement into the function to ensure the message is printed in all cases.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250416021720.12305-7-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2ac1fc1..8b14c9d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -751,6 +751,8 @@ static int __init init_xstate_size(void)
*/
static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
{
+ pr_info("x86/fpu: XSAVE disabled\n");
+
fpu_kernel_cfg.max_features = 0;
cr4_clear_bits(X86_CR4_OSXSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
@@ -821,7 +823,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
* This is a problematic CPU configuration where two
* conflicting state components are both enumerated.
*/
- pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
+ pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx.\n",
fpu_kernel_cfg.max_features);
goto out_disable;
}
@@ -900,7 +902,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
init_fpstate.xfeatures = fpu_kernel_cfg.default_features;
if (init_fpstate.size > sizeof(init_fpstate.regs)) {
- pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
+ pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d)\n",
sizeof(init_fpstate.regs), init_fpstate.size);
goto out_disable;
}
@@ -912,7 +914,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
* xfeatures mask.
*/
if (xfeatures != fpu_kernel_cfg.max_features) {
- pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init, disabling XSAVE\n",
+ pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init\n",
xfeatures, fpu_kernel_cfg.max_features);
goto out_disable;
}
Currently, when saving register state to the signal frame, the legacy feature
bits are always set in xregs_state->header->xfeatures. This code sequence
can be generalized for reuse in similar cases.
Refactor the logic to ensure a consistent approach across similar usages.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250214010607.7067-2-chang.seok.bae@intel.com/
* No change
This patch and the next were previously posted together. I consider this
refactoring a meaningful step toward decoupling PKRU from an
unnecessary dependency on XGETBV(1).
---
arch/x86/kernel/fpu/signal.c | 11 +----------
arch/x86/kernel/fpu/xstate.h | 12 ++++++++++++
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index b8b4fa9c2d04..c3ec2512f2bb 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -114,7 +114,6 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
{
struct xregs_state __user *x = buf;
struct _fpx_sw_bytes sw_bytes = {};
- u32 xfeatures;
int err;
/* Setup the bytes not touched by the [f]xsave and reserved for SW. */
@@ -127,12 +126,6 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
err |= __put_user(FP_XSTATE_MAGIC2,
(__u32 __user *)(buf + fpstate->user_size));
- /*
- * Read the xfeatures which we copied (directly from the cpu or
- * from the state in task struct) to the user buffers.
- */
- err |= __get_user(xfeatures, (__u32 __user *)&x->header.xfeatures);
-
/*
* For legacy compatible, we always set FP/SSE bits in the bit
* vector while saving the state to the user context. This will
@@ -144,9 +137,7 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
* header as well as change any contents in the memory layout.
* xrestore as part of sigreturn will capture all the changes.
*/
- xfeatures |= XFEATURE_MASK_FPSSE;
-
- err |= __put_user(xfeatures, (__u32 __user *)&x->header.xfeatures);
+ err |= set_xfeature_in_sigframe(x, XFEATURE_MASK_FPSSE);
return !err;
}
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 9a3a8ccf13bf..aadf02aed071 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -69,6 +69,18 @@ static inline u64 xfeatures_mask_independent(void)
return fpu_kernel_cfg.independent_features;
}
+static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64 mask)
+{
+ u64 xfeatures;
+ int err;
+
+ /* Read the xfeatures value already saved in the user buffer */
+ err = __get_user(xfeatures, &xbuf->header.xfeatures);
+ xfeatures |= mask;
+ err |= __put_user(xfeatures, &xbuf->header.xfeatures);
+ return err;
+}
+
/*
* Update the value of PKRU register that was already pushed onto the signal frame.
*/
--
2.45.2
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> +static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64 mask)
> +{
> + u64 xfeatures;
> + int err;
> +
> + /* Read the xfeatures value already saved in the user buffer */
> + err = __get_user(xfeatures, &xbuf->header.xfeatures);
> + xfeatures |= mask;
> + err |= __put_user(xfeatures, &xbuf->header.xfeatures);
> + return err;
> +}
For future reference, please put an extra newline before 'return'
statements, so that the code looks more 'balanced':
> +{
> + u64 xfeatures;
> + int err;
> +
> + /* Read the xfeatures value already saved in the user buffer */
> + err = __get_user(xfeatures, &xbuf->header.xfeatures);
> + xfeatures |= mask;
> + err |= __put_user(xfeatures, &xbuf->header.xfeatures);
> +
> + return err;
> +}
I've done that for this patch, so no need to resend it.
Thanks,
Ingo
On 4/16/2025 1:05 AM, Ingo Molnar wrote:
>
> * Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
>> +static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64 mask)
>> +{
>> + u64 xfeatures;
>> + int err;
>> +
>> + /* Read the xfeatures value already saved in the user buffer */
>> + err = __get_user(xfeatures, &xbuf->header.xfeatures);
>> + xfeatures |= mask;
>> + err |= __put_user(xfeatures, &xbuf->header.xfeatures);
>> + return err;
>> +}
>
> For future reference, please put an extra newline before 'return'
> statements, so that the code looks more 'balanced':
>
>> +{
>> + u64 xfeatures;
>> + int err;
>> +
>> + /* Read the xfeatures value already saved in the user buffer */
>> + err = __get_user(xfeatures, &xbuf->header.xfeatures);
>> + xfeatures |= mask;
>> + err |= __put_user(xfeatures, &xbuf->header.xfeatures);
>> +
>> + return err;
>> +}
I see. Thanks for the note.
> I've done that for this patch, so no need to resend it.
Thanks!
Chang
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 64e54461ab6e8524a8de4e63b7d1a3e4481b5cf3
Gitweb: https://git.kernel.org/tip/64e54461ab6e8524a8de4e63b7d1a3e4481b5cf3
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:57 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 10:01:00 +02:00
x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE
Currently, when saving register state to the signal frame, the legacy feature
bits are always set in xregs_state->header->xfeatures. This code sequence
can be generalized for reuse in similar cases.
Refactor the logic to ensure a consistent approach across similar usages.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-8-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/signal.c | 11 +----------
arch/x86/kernel/fpu/xstate.h | 13 +++++++++++++
2 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index b8b4fa9..c3ec251 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -114,7 +114,6 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
{
struct xregs_state __user *x = buf;
struct _fpx_sw_bytes sw_bytes = {};
- u32 xfeatures;
int err;
/* Setup the bytes not touched by the [f]xsave and reserved for SW. */
@@ -128,12 +127,6 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
(__u32 __user *)(buf + fpstate->user_size));
/*
- * Read the xfeatures which we copied (directly from the cpu or
- * from the state in task struct) to the user buffers.
- */
- err |= __get_user(xfeatures, (__u32 __user *)&x->header.xfeatures);
-
- /*
* For legacy compatible, we always set FP/SSE bits in the bit
* vector while saving the state to the user context. This will
* enable us capturing any changes(during sigreturn) to
@@ -144,9 +137,7 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
* header as well as change any contents in the memory layout.
* xrestore as part of sigreturn will capture all the changes.
*/
- xfeatures |= XFEATURE_MASK_FPSSE;
-
- err |= __put_user(xfeatures, (__u32 __user *)&x->header.xfeatures);
+ err |= set_xfeature_in_sigframe(x, XFEATURE_MASK_FPSSE);
return !err;
}
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 9a3a8cc..4231e44 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -69,6 +69,19 @@ static inline u64 xfeatures_mask_independent(void)
return fpu_kernel_cfg.independent_features;
}
+static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64 mask)
+{
+ u64 xfeatures;
+ int err;
+
+ /* Read the xfeatures value already saved in the user buffer */
+ err = __get_user(xfeatures, &xbuf->header.xfeatures);
+ xfeatures |= mask;
+ err |= __put_user(xfeatures, &xbuf->header.xfeatures);
+
+ return err;
+}
+
/*
* Update the value of PKRU register that was already pushed onto the signal frame.
*/
The signal delivery logic was modified to always set the PKRU bit in
xregs_state->header->xfeatures by this commit:
ae6012d72fa6 ("x86/pkeys: Ensure updated PKRU value is XRSTOR'd")
However, the change derives the bitmask value using XGETBV(1), rather
than simply updating the buffer that already holds the value. Thus, this
approach induces an unnecessary dependency on XGETBV1 for PKRU handling.
Eliminate the dependency by using the established helper function.
Subsequently, remove the now-unused 'mask' argument.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250214010607.7067-3-chang.seok.bae@intel.com/
* Massage the changelog
Additional Context:
Previously, the concern was raised about environments where XGETBV1 is
unavailable — such as in some virtual machines:
https://lore.kernel.org/lkml/20250102075419.2559-1-TonyWWang-oc@zhaoxin.com
That patch tried to sidestep the problem by skipping PKRU updates
entirely when XGETBV1 is inaccessible. However, this assumed the
dependency was necessary, which isn’t the case.
---
arch/x86/kernel/fpu/xstate.h | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index aadf02aed071..a6d987c16293 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -84,18 +84,15 @@ static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64
/*
* Update the value of PKRU register that was already pushed onto the signal frame.
*/
-static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u64 mask, u32 pkru)
+static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
{
- u64 xstate_bv;
int err;
if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
return 0;
/* Mark PKRU as in-use so that it is restored correctly. */
- xstate_bv = (mask & xfeatures_in_use()) | XFEATURE_MASK_PKRU;
-
- err = __put_user(xstate_bv, &buf->header.xfeatures);
+ err = set_xfeature_in_sigframe(buf, XFEATURE_MASK_PKRU);
if (err)
return err;
@@ -319,7 +316,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf, u32 pkr
clac();
if (!err)
- err = update_pkru_in_sigframe(buf, mask, pkru);
+ err = update_pkru_in_sigframe(buf, pkru);
return err;
}
--
2.45.2
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: d1e420772cd1eb0afe5858619c73ce36f3e781a1
Gitweb: https://git.kernel.org/tip/d1e420772cd1eb0afe5858619c73ce36f3e781a1
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:58 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 10:01:03 +02:00
x86/pkeys: Simplify PKRU update in signal frame
The signal delivery logic was modified to always set the PKRU bit in
xregs_state->header->xfeatures by this commit:
ae6012d72fa6 ("x86/pkeys: Ensure updated PKRU value is XRSTOR'd")
However, the change derives the bitmask value using XGETBV(1), rather
than simply updating the buffer that already holds the value. Thus, this
approach induces an unnecessary dependency on XGETBV1 for PKRU handling.
Eliminate the dependency by using the established helper function.
Subsequently, remove the now-unused 'mask' argument.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20250416021720.12305-9-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.h | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 4231e44..a0256ef 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -85,18 +85,15 @@ static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64
/*
* Update the value of PKRU register that was already pushed onto the signal frame.
*/
-static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u64 mask, u32 pkru)
+static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
{
- u64 xstate_bv;
int err;
if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
return 0;
/* Mark PKRU as in-use so that it is restored correctly. */
- xstate_bv = (mask & xfeatures_in_use()) | XFEATURE_MASK_PKRU;
-
- err = __put_user(xstate_bv, &buf->header.xfeatures);
+ err = set_xfeature_in_sigframe(buf, XFEATURE_MASK_PKRU);
if (err)
return err;
@@ -320,7 +317,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf, u32 pkr
clac();
if (!err)
- err = update_pkru_in_sigframe(buf, mask, pkru);
+ err = update_pkru_in_sigframe(buf, pkru);
return err;
}
The variable was previously referenced in KVM code but the last usage was
removed by:
ea4d6938d4c0 ("x86/fpu: Replace KVMs home brewed FPU copy from user")
Remove its export symbol.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/d143cc4c-8f8e-48e5-87f1-dded3272433a@suse.com
* Note the commit that removed its usage (Nikolay)
* Include review tag
Apologies -- given the review tag, I should have followed up on this
earlier.
---
arch/x86/kernel/fpu/init.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 16b6611634c3..2d9b5e677559 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -100,7 +100,6 @@ static void __init fpu__init_system_early_generic(void)
* Boot time FPU feature detection code:
*/
unsigned int mxcsr_feature_mask __ro_after_init = 0xffffffffu;
-EXPORT_SYMBOL_GPL(mxcsr_feature_mask);
static void __init fpu__init_system_mxcsr(void)
{
--
2.45.2
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 70fe4a0266ef156f3a49071da0d9ea6af0f49c44
Gitweb: https://git.kernel.org/tip/70fe4a0266ef156f3a49071da0d9ea6af0f49c44
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:59 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 10:01:03 +02:00
x86/fpu: Remove export of mxcsr_feature_mask
The variable was previously referenced in KVM code but the last usage was
removed by:
ea4d6938d4c0 ("x86/fpu: Replace KVMs home brewed FPU copy from user")
Remove its export symbol.
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-10-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/init.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 16b6611..2d9b5e6 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -100,7 +100,6 @@ static void __init fpu__init_system_early_generic(void)
* Boot time FPU feature detection code:
*/
unsigned int mxcsr_feature_mask __ro_after_init = 0xffffffffu;
-EXPORT_SYMBOL_GPL(mxcsr_feature_mask);
static void __init fpu__init_system_mxcsr(void)
{
The original function name came from an overly compressed form of
'fpstate_regs' by commit:
e61d6310a0f8 ("x86/fpu: Reset permission and fpstate on exec()")
However, the term 'fpregs' typically refers to physical FPU registers. In
contrast, this function copies the init values to fpu->fpstate->regs, not
hardware registers.
Rename the function to better reflect what it actually does.
No functional change.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20240530192739.172566-2-chang.seok.bae@intel.com
* Refine the changelog
* Add a note referencing the original naming decision.
This patch was originally submitted as part of the In-Field Scan driver
series. Although that series has since been dropped, I believe this
rename still serves as a useful cleanup, correcting a naming choice that
I previously made :(.
---
arch/x86/kernel/fpu/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3a19877a314e..8d674435f173 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -733,7 +733,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
/*
* Reset current->fpu memory state to the init values.
*/
-static void fpu_reset_fpregs(void)
+static void fpu_reset_fpstate_regs(void)
{
struct fpu *fpu = x86_task_fpu(current);
@@ -768,7 +768,7 @@ void fpu__clear_user_states(struct fpu *fpu)
fpregs_lock();
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
- fpu_reset_fpregs();
+ fpu_reset_fpstate_regs();
fpregs_unlock();
return;
}
@@ -798,7 +798,7 @@ void fpu__clear_user_states(struct fpu *fpu)
void fpu_flush_thread(void)
{
fpstate_reset(x86_task_fpu(current));
- fpu_reset_fpregs();
+ fpu_reset_fpstate_regs();
}
/*
* Load FPU context before returning to userspace.
--
2.45.2
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: de8304c319bc020ef79d109909ad40e944d82c82
Gitweb: https://git.kernel.org/tip/de8304c319bc020ef79d109909ad40e944d82c82
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:17:00 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 10:01:03 +02:00
x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs()
The original function name came from an overly compressed form of
'fpstate_regs' by commit:
e61d6310a0f8 ("x86/fpu: Reset permission and fpstate on exec()")
However, the term 'fpregs' typically refers to physical FPU registers. In
contrast, this function copies the init values to fpu->fpstate->regs, not
hardware registers.
Rename the function to better reflect what it actually does.
No functional change.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-11-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3a19877..8d67443 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -733,7 +733,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
/*
* Reset current->fpu memory state to the init values.
*/
-static void fpu_reset_fpregs(void)
+static void fpu_reset_fpstate_regs(void)
{
struct fpu *fpu = x86_task_fpu(current);
@@ -768,7 +768,7 @@ void fpu__clear_user_states(struct fpu *fpu)
fpregs_lock();
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
- fpu_reset_fpregs();
+ fpu_reset_fpstate_regs();
fpregs_unlock();
return;
}
@@ -798,7 +798,7 @@ void fpu__clear_user_states(struct fpu *fpu)
void fpu_flush_thread(void)
{
fpstate_reset(x86_task_fpu(current));
- fpu_reset_fpregs();
+ fpu_reset_fpstate_regs();
}
/*
* Load FPU context before returning to userspace.
* Ingo Molnar <mingo@kernel.org> wrote:
>
> * Sohil Mehta <sohil.mehta@intel.com> wrote:
>
> > On 4/11/2025 11:23 AM, Chang S. Bae wrote:
> >
> > > I've attached the patch revision.
> >
> > LGTM,
> >
> > Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
>
> Chang, mind sending a series of the latest version of all the pending
> APX patches you have at the moment (and any other pending FPU patches
> you may have), with Reviewed-by tags rolled in, etc., on top of:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/fpu

Note that this is now all in tip:x86/fpu, alongside with a rebased
version of Chang S. Bae's preparatory patches for Intel APX:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/fpu

I think we could now start merging the rest of the APX patches, for
v6.16 upstreaming.

Thanks,

	Ingo
On 4/14/2025 1:23 AM, Ingo Molnar wrote:
>
>> Chang, mind sending a series of the latest version of all the pending
>> APX patches you have at the moment (and any other pending FPU patches
>> you may have), with Reviewed-by tags rolled in, etc., on top of:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/fpu
>
> Note that this is now all in tip:x86/fpu, alongside with a rebased
> version of Chang S. Bae's preparatory patches for Intel APX:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/fpu
>
> I think we could now start merging the rest of the APX patches, for
> v6.16 upstreaming.

Thanks for the update!

Finalizing a few minor changes based on the feedback so far, and also
refreshing a few other FPU patches that are worth bundling up. I'll
post them in the next day or two.

Thanks,
Chang
On 4/14/2025 1:23 AM, Ingo Molnar wrote:
> I think we could now start merging the rest of the APX patches, for
> v6.16 upstreaming.
>

Chang, Ingo,

The series looks good to me. Functionally, the only change I have
requested is in patch 7/9 to add a "disabling XSAVE" message to the
error print.

The rest of the comments are mainly about clarifications in the commit
logs. Feel free to ignore them if they seem redundant.
On 4/14/25 10:28, Sohil Mehta wrote:
> Functionally, the only change I have requested is in patch 7/9 to add a
> "disabling XSAVE" message to the error print.

That's probably a good idea to stick in:

	fpu__init_disable_system_xstate()

for all of the "out_disable" cases to use, no?
On 4/14/2025 10:32 AM, Dave Hansen wrote:
> On 4/14/25 10:28, Sohil Mehta wrote:
>> Functionally, the only change I have requested is in patch 7/9 to add a
>> "disabling XSAVE" message to the error print.
>
> That's probably a good idea to stick in:
>
> 	fpu__init_disable_system_xstate()
>
> for all of the "out_disable" cases to use, no?

That was my initial inclination as well. My suggestion was mainly to
keep it consistent. But looking more closely, there is a mismatch
already. So either of the options works for me.

"x86/fpu: FP/SSE not present amongst the CPU's xstate features: 0x%llx."

"x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE"

"x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init,
disabling XSAVE"
On 4/14/2025 10:45 AM, Sohil Mehta wrote:
>
> That way my initial inclination as well. My suggestion was mainly to
> keep it consistent. But looking more closely, there is mismatch already.
> So either of the options work for me.
>
> "x86/fpu: FP/SSE not present amongst the CPU's xstate features: 0x%llx."
>
> "x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE"
>
> "x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init,
> disabling XSAVE"
And you also left this comment in patch 7:
> It might be useful to add a "disabling XSAVE" print at the end of this
> statement, like the other error messages in the same function.
So it sounds like you were suggesting something along the lines of:
"x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx,
disabling XSAVE"
If so, I see that as an improvement to the error message rather than a
functional change. But I agree it reads better and more in line with the
other cases.
Thanks,
Chang
On 4/14/2025 11:02 AM, Chang S. Bae wrote:
> On 4/14/2025 10:45 AM, Sohil Mehta wrote:
>>
>> That way my initial inclination as well. My suggestion was mainly to
>> keep it consistent. But looking more closely, there is mismatch already.
>> So either of the options work for me.
>>
>> "x86/fpu: FP/SSE not present amongst the CPU's xstate features: 0x%llx."
>>
>> "x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE"
>>
>> "x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init,
>> disabling XSAVE"
>
> And you also left this comment in patch 7:
>
> > It might be useful to add a "disabling XSAVE" print at the end of this
> > statement, like the other error messages in the same function.
>
> So it sounds like you were suggesting something along the lines of:
> "x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx,
> disabling XSAVE"
>

Yes, this is what I meant. But Dave's suggestion to add a common print
in fpu__init_disable_system_xstate() is fine as well.

> If so, I see that as an improvement to the error message rather than a
> functional change.

Maybe functional is too strong. I meant a minor code/message change.

> But I agree it reads better and more in line with the
> other cases.
>
The Advanced Performance Extensions (APX) feature flag was previously
defined. This feature is associated with a new state component number 19.
To support APX, it is essential to define this xstate component and
implement the necessary sanity checks.
Define the new component number, state name, and its register data
type. Then, extend the size checker to validate the register data type
and explicitly set the APX feature flag as a dependency for the new
component in xsave_cpuid_features[].
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC-V1 -> RFC-V2:
* Remove the ordering table change, as it is now dynamically populated.
---
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/kernel/fpu/xstate.c | 3 +++
2 files changed, 12 insertions(+)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862bf230..97310df3ea13 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -125,6 +125,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILE_CFG,
XFEATURE_XTILE_DATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -145,6 +146,7 @@ enum xfeature {
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
+#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
@@ -303,6 +305,13 @@ struct xtile_data {
struct reg_1024_byte tmm;
} __packed;
+/*
+ * State component 19: 8B extended general purpose register.
+ */
+struct apx_state {
+ u64 egpr[16];
+} __packed;
+
/*
* State component 10 is supervisor state used for context-switching the
* PASID state.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 46c45e2f2a5a..2a270683a762 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -63,6 +63,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
@@ -81,6 +82,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
+ [XFEATURE_APX] = X86_FEATURE_APX,
};
static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -570,6 +572,7 @@ static bool __init check_xstate_against_struct(int nr)
case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+ case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
return false;
--
2.45.2
Overall, the patch looks good to me.
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Some minor nits below.
On 3/20/2025 4:42 PM, Chang S. Bae wrote:
> The Advanced Performance Extensions (APX) feature flag was previously
> defined.
It's unnecessary to say this. You could directly start with: Advanced
Performance Extensions (APX) is associated with...
> This feature is associated with a new state component number 19.
> To support APX, it is essential to define this xstate component and
> implement the necessary sanity checks.
>
It might be more precise to say what support is being added.
Maybe something like,
During context switch, to support saving and restoring of APX registers
using XSAVE, it is essential...
> Define the new component number, state name, and those register data
> type. Then, extend the size checker to validate the register data type
> and explicitly set the APX feature flag as a dependency for the new
> component in xsave_cpuid_features[].
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> RFC-V1 -> RFC-V2:
> * Remove the ordering table change, as it is now dynamically populated.
> ---
> arch/x86/include/asm/fpu/types.h | 9 +++++++++
> arch/x86/kernel/fpu/xstate.c | 3 +++
> 2 files changed, 12 insertions(+)
>
> diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
> index de16862bf230..97310df3ea13 100644
> --- a/arch/x86/include/asm/fpu/types.h
> +++ b/arch/x86/include/asm/fpu/types.h
> @@ -125,6 +125,7 @@ enum xfeature {
> XFEATURE_RSRVD_COMP_16,
> XFEATURE_XTILE_CFG,
> XFEATURE_XTILE_DATA,
> + XFEATURE_APX,
>
> XFEATURE_MAX,
> };
> @@ -145,6 +146,7 @@ enum xfeature {
> #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
> #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
> #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
> +#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
>
> #define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
> #define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
> @@ -303,6 +305,13 @@ struct xtile_data {
> struct reg_1024_byte tmm;
> } __packed;
>
> +/*
> + * State component 19: 8B extended general purpose register.
> + */
> +struct apx_state {
> + u64 egpr[16];
> +} __packed;
> +
The below comment makes it seem the APX component is inserted out of
order. But, it's the PASID component that is actually out of order.
I'll send a cleanup patch separately. It could be applied before or
after this series.
> /*
> * State component 10 is supervisor state used for context-switching the
> * PASID state.
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 46c45e2f2a5a..2a270683a762 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -63,6 +63,7 @@ static const char *xfeature_names[] =
> "unknown xstate feature",
> "AMX Tile config",
> "AMX Tile data",
> + "APX registers",
> "unknown xstate feature",
> };
>
> @@ -81,6 +82,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
> [XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
> [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
> [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
> + [XFEATURE_APX] = X86_FEATURE_APX,
> };
>
> static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
> @@ -570,6 +572,7 @@ static bool __init check_xstate_against_struct(int nr)
> case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
> case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
> case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
> + case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
Can we insert the new APX case before XFEATURE_XTILE_DATA? These cases
are not really in numerical order. That way, the switch case is more
consistent and easier to read.
> default:
> XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
> return false;
APX is introduced as xstate component 19, following AMX. However, in the
non-compacted format, its offset overlaps with the space previously
occupied by the now-deprecated MPX:
45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
To prevent conflicts, the kernel must ensure the CPU never exposes both
features at the same time. If it does, the hardware is unreliable. In
that case, XSAVE should be disabled entirely as a precautionary measure.
Add a sanity check to detect this condition and disable XSAVE if an
invalid hardware configuration is identified.
Note: MPX state components remain enabled on legacy systems solely for
KVM guest support.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2a270683a762..0d68d5c4bc48 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -814,6 +814,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
+ fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) {
+ /*
+ * This is a problematic CPU configuration where two
+ * conflicting state components are both enumerated.
+ */
+ pr_err("x86/fpu: both APX and MPX present in the CPU's xstate features: 0x%llx.\n",
+ fpu_kernel_cfg.max_features);
+ goto out_disable;
+ }
+
fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
XFEATURE_MASK_INDEPENDENT;
--
2.45.2
On 3/20/2025 4:42 PM, Chang S. Bae wrote:
> APX is introduced as xstate component 19, following AMX. However, in the
> non-compacted format, its offset overlaps with the space previously
> occupied by the now-deprecated MPX:
>
> 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
>
Would it be useful to describe why and how this is possible?
My knowledge on this is fairly limited; is it because the size of the APX
register state is the same or less than the legacy MPX state?
Also, what other options does the kernel have to handle this?
There was an earlier email from Dave that describes the trade-offs.
Would it be useful to insert some form of summary here or in a different
patch?
https://lore.kernel.org/lkml/674db309-e206-4bb4-bf99-b3c39bff7973@intel.com/
> To prevent conflicts, the kernel must ensure the CPU never expose both
> features at the same time. If so, it indicates unreliable hardware. In
> such cases, XSAVE should be disabled entirely as a precautionary measure.
>
> Add a sanity check to detect this condition and disable XSAVE if an
> invalid hardware configuration is identified.
>
> Note: MPX state components remain enabled on legacy systems solely for
> KVM guest support.
>
I didn't understand this properly. Can you please explain more? Is there
an impact or restriction that this imposes on a combination of legacy
and modern components?
Old KVM — new Guest kernel
or
New KVM — Old guest kernel
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 2a270683a762..0d68d5c4bc48 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -814,6 +814,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> goto out_disable;
> }
>
> + if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
> + fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) {
> + /*
> + * This is a problematic CPU configuration where two
> + * conflicting state components are both enumerated.
> + */
> + pr_err("x86/fpu: both APX and MPX present in the CPU's xstate features: 0x%llx.\n",
> + fpu_kernel_cfg.max_features);
s/both/Both
For the user, this statement doesn't necessarily seem like an error
message, nor does it say what action the kernel takes.
IIUC, there is no other print that says the kernel is disabling XSAVE
once it jumps to out_disable?
It might be useful to add a "disabling XSAVE" print at the end of this
statement, like the other error messages in the same function.
With this change,
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
> + goto out_disable;
> + }
> +
> fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
> XFEATURE_MASK_INDEPENDENT;
>
On 4/14/25 10:09, Sohil Mehta wrote:
> On 3/20/2025 4:42 PM, Chang S. Bae wrote:
>> APX is introduced as xstate component 19, following AMX. However, in the
>> non-compacted format, its offset overlaps with the space previously
>> occupied by the now-deprecated MPX:
>>
>> 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
>>
> Would it be useful to describe why and how this is possible?
Every XSAVE state component is architecturally independent. There's no
architectural rule out there that says that state component offsets in
the non-compacted format must always go up or that they can't overlap.
But, there's never been an overlap like this before.
So this is technically allowed, but it's weird and it breaks some
internal assumptions that the kernel has.
Does that summarize it OK?
> My knowledge on this fairly limited, is it because the size of the APX
> register state is the same or less than the legacy MPX state?
Well, that was one reason the MPX location was _chosen_ for APX. Had APX
been bigger, the MPX location would obviously not have been an
option. Had it been way smaller, using the MPX location would be OK.
> Also, what other options does the kernel have to handle this?
The kernel could _probably_ just disable the MPX states and favor using
APX. But, honestly, if this ever happens it's going to be because some
silly VMM is enumerating all the features it knows of and doesn't have
logic to make APX and MPX mutually exclusive.
In that case, we want the VMM to get fixed pronto, so papering over the
issue is not the best idea. Spitting out a warning and disabling XSAVE
but still booting sounds like a good balance of being annoying, but not
being _as_ annoying as a BUG_ON() or breaking the boot.
With APX secured against conflicting MPX, it is now ready to be enabled.
Include APX in the enabled xfeature set.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/fpu/xstate.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7980c5..b308a76afbb7 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -32,7 +32,8 @@
XFEATURE_MASK_PKRU | \
XFEATURE_MASK_BNDREGS | \
XFEATURE_MASK_BNDCSR | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* Features which are restored when returning to user space.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0d68d5c4bc48..a5e3954dc834 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -371,7 +371,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* setup the xstate image representing the init state
--
2.45.2
On 3/20/2025 4:42 PM, Chang S. Bae wrote:
> With securing APX against conflicting MPX, it is now ready to be enabled.
> Include APX in the enabled xfeature set.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
>  arch/x86/include/asm/fpu/xstate.h | 3 ++-
>  arch/x86/kernel/fpu/xstate.c | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 7f39fe7980c5..b308a76afbb7 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -32,7 +32,8 @@
>  	XFEATURE_MASK_PKRU | \
>  	XFEATURE_MASK_BNDREGS | \
>  	XFEATURE_MASK_BNDCSR | \
> -	XFEATURE_MASK_XTILE)
> +	XFEATURE_MASK_XTILE | \
> +	XFEATURE_MASK_APX)
>
>  /*
>  * Features which are restored when returning to user space.
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 0d68d5c4bc48..a5e3954dc834 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -371,7 +371,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
>  	XFEATURE_MASK_BNDCSR | \
>  	XFEATURE_MASK_PASID | \
>  	XFEATURE_MASK_CET_USER | \
> -	XFEATURE_MASK_XTILE)
> +	XFEATURE_MASK_XTILE | \
> +	XFEATURE_MASK_APX)
>
>  /*
>  * setup the xstate image representing the init state
The extended general-purpose registers for APX may contain arbitrary
data, matching the assumption the xstate testing framework already makes
about register state. This allows the new userspace feature to be tested
with the common test code.

Invoke the test entry function from apx.c after enumerating the
state component and adding it to the support list.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
The patch depends on the selftest xstate series, which was just merged
into the x86/fpu branch as of now:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/fpu
Here is the original series posting for the record:
https://lore.kernel.org/lkml/20250226010731.2456-1-chang.seok.bae@intel.com/
---
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 2 ++
4 files changed, 16 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 28422c32cc8f..f703fcfe9f7c 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -19,7 +19,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
- corrupt_xstate_header amx lam test_shadow_stack avx
+ corrupt_xstate_header amx lam test_shadow_stack avx apx
# Some selftests require 32bit support enabled also on 64bit systems
TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
@@ -136,3 +136,4 @@ $(OUTPUT)/nx_stack_64: CFLAGS += -Wl,-z,noexecstack
$(OUTPUT)/avx_64: CFLAGS += -mno-avx -mno-avx512f
$(OUTPUT)/amx_64: EXTRA_FILES += xstate.c
$(OUTPUT)/avx_64: EXTRA_FILES += xstate.c
+$(OUTPUT)/apx_64: EXTRA_FILES += xstate.c
diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
new file mode 100644
index 000000000000..d9c8d41b8c5a
--- /dev/null
+++ b/tools/testing/selftests/x86/apx.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include "xstate.h"
+
+int main(void)
+{
+ test_xstate(XFEATURE_APX);
+}
diff --git a/tools/testing/selftests/x86/xstate.c b/tools/testing/selftests/x86/xstate.c
index 23c1d6c964ea..97fe4bd8bc77 100644
--- a/tools/testing/selftests/x86/xstate.c
+++ b/tools/testing/selftests/x86/xstate.c
@@ -31,7 +31,8 @@
(1 << XFEATURE_OPMASK) | \
(1 << XFEATURE_ZMM_Hi256) | \
(1 << XFEATURE_Hi16_ZMM) | \
- (1 << XFEATURE_XTILEDATA))
+ (1 << XFEATURE_XTILEDATA) | \
+ (1 << XFEATURE_APX))
static inline uint64_t xgetbv(uint32_t index)
{
diff --git a/tools/testing/selftests/x86/xstate.h b/tools/testing/selftests/x86/xstate.h
index 42af36ec852f..e91e3092b5d2 100644
--- a/tools/testing/selftests/x86/xstate.h
+++ b/tools/testing/selftests/x86/xstate.h
@@ -33,6 +33,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILECFG,
XFEATURE_XTILEDATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -59,6 +60,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
--
2.45.2
On 3/20/2025 4:43 PM, Chang S. Bae wrote:
> The extended general-purpose registers for APX may contain random data,
> which is currently assumed by the xstate testing framework. This allows
> the testing of the new userspace feature using the common test code.
>
> Invoke the test entry function from apx.c after enumerating the
> state component and adding it to the support list
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
AVX and APX have very similar single-line testcases. If we end up adding
more, it might be useful to consider merging them into a common file.
> diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
> new file mode 100644
> index 000000000000..d9c8d41b8c5a
> --- /dev/null
> +++ b/tools/testing/selftests/x86/apx.c
> @@ -0,0 +1,10 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +
> +#include "xstate.h"
> +
> +int main(void)
> +{
> + test_xstate(XFEATURE_APX);
> +}
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> == Introduction ==
>
> APX introduces a new set of general-purpose registers designed to improve
> performance. Currently, these registers are expected to be used primarily
> by userspace applications, with no intended use in kernel mode. More
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> details on its use cases can be found in the published documentation [1].
I strongly suspect this won't remain so, unless there's some horrible
ISA limitation or other quirk that makes APX unsuitable for kernel use.
The moment one of the mainstream compilers adds support for it we'll
want to enable it for the kernel too. Since the new GP registers are
caller-saved they are fairly easy to use as scratch registers to reduce
stack pressure, which is very much present on x86-64, especially when
the frame pointer is in use.
So we need to keep that in mind when shaping these patches.
Unless I'm missing something that is. :-)
Thanks,
Ingo
On 2/27/25 11:15, Ingo Molnar wrote:
>> by userspace applications, with no intended use in kernel mode. More
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> details on its use cases can be found in the published documentation [1].
> I strongly suspect this won't remain so, unless there's some horrible
> ISA limitation or other quirk that makes APX unsuitable for kernel use.

I think Chang was trying to say that this series is completely focused
on userspace APX and basically ignores the idea of using it in the kernel.

That said, we honestly talk all the time about using it in the kernel. A
kernel compiled to use APX everywhere obviously won't run on older CPUs
but it _should_ have some really nice advantages. I completely expect
the kernel to be able to be compiled to use APX at _some_ point.
On 2/27/2025 11:36 AM, Dave Hansen wrote:
> On 2/27/25 11:15, Ingo Molnar wrote:
>>> by userspace applications, with no intended use in kernel mode. More
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> details on its use cases can be found in the published documentation [1].
>> I strongly suspect this won't remain so, unless there's some horrible
>> ISA limitation or other quirk that makes APX unsuitable for kernel use.
>
> I think Chang was trying to say that this series is completely focused
> on userspace APX and basically ignores the idea of using it in the kernel.
>
> That said, we honestly talk all the time about using it in the kernel. A
> kernel compiled to use APX everywhere obviously won't run on older CPUs
> but it _should_ have some really nice advantages. I completely expect
> the kernel to be able to be compiled to use APX at _some_ point.

Yes, exactly. My primary intent was to clarify that this posting is for
supporting userspace APX usage. At the same time, I don't want this to
imply little or no potential for in-kernel use.

Thanks,
Chang