[v2] x86/its: use Sapphire Rapids+ feature to opt out

[PATCH v2] x86/its: use Sapphire Rapids+ feature to opt out

Posted by Jon Kohler 3 months, 3 weeks ago

A VMM may not expose ITS_NO or BHI_CTRL, so guests cannot rely on those
bits to determine whether they might be migrated to ITS-affected
hardware. Rather than depending on a control that may be absent, detect
ITS-unaffected hosts via a CPU feature that is exclusive to Sapphire
Rapids and newer processors.

Use X86_FEATURE_BUS_LOCK_DETECT as the canary: it is present on
Sapphire Rapids+ parts and provides a reliable indicator that the guest
won't be moved to ITS-affected hardware. This avoids false negatives
caused by VMMs that omit ITS_NO or BHI_CTRL. For example, QEMU added
bhi-ctrl only in v9.2.0 [1], well after adding the Sapphire Rapids
model in v8.0.0 [2].

[1] 10eaf9c0fb7 ("target/i386: Add more features enumerated by CPUID.7.2.EDX")
[2] 7eb061b06e9 ("i386: Add new CPU model SapphireRapids")

Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Fixes: 159013a7ca18 ("x86/its: Enumerate Indirect Target Selection (ITS) bug")
Signed-off-by: Jon Kohler <jon@nutanix.com>

---
v1->v2: Fix logic typo and checkpatch warning for Fixes line.

 .../admin-guide/hw-vuln/indirect-target-selection.rst       | 5 +++--
 arch/x86/kernel/cpu/common.c                                | 6 ++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/indirect-target-selection.rst b/Documentation/admin-guide/hw-vuln/indirect-target-selection.rst
index d9ca64108d23..3cfe4b9f9bd0 100644
--- a/Documentation/admin-guide/hw-vuln/indirect-target-selection.rst
+++ b/Documentation/admin-guide/hw-vuln/indirect-target-selection.rst
@@ -98,8 +98,9 @@ Mitigation in guests
 ^^^^^^^^^^^^^^^^^^^^
 All guests deploy ITS mitigation by default, irrespective of eIBRS enumeration
 and Family/Model of the guest. This is because eIBRS feature could be hidden
-from a guest. One exception to this is when a guest enumerates BHI_DIS_S, which
-indicates that the guest is running on an unaffected host.
+from a guest. One exception to this is when a guest enumerates BHI_DIS_S or
+BUS_LOCK_DETECT, either of which indicates that the guest is running on an
+unaffected host and would not be migratable to an affected host.
 
 To prevent guests from unnecessarily deploying the mitigation on unaffected
 platforms, Intel has defined ITS_NO bit(62) in MSR IA32_ARCH_CAPABILITIES. When
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c7d3512914ca..60fbfeba92e9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1361,9 +1361,11 @@ static bool __init vulnerable_to_its(u64 x86_arch_cap_msr)
 	/*
 	 * If a VMM did not expose ITS_NO, assume that a guest could
 	 * be running on a vulnerable hardware or may migrate to such
-	 * hardware.
+	 * hardware, except in the situation where the guest is presented
+	 * with a feature that only exists in non-vulnerable hardware.
 	 */
-	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) &&
+	    !boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
 		return true;
 
 	if (cpu_matches(cpu_vuln_blacklist, ITS))
-- 
2.43.0

Re: [PATCH v2] x86/its: use Sapphire Rapids+ feature to opt out

Posted by Dave Hansen 3 months, 3 weeks ago

On 10/16/25 18:18, Jon Kohler wrote:
> +	 * hardware, except in the situation where the guest is presented
> +	 * with a feature that only exists in non-vulnerable hardware.
>  	 */
> -	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> +	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) &&
> +	    !boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
>  		return true;

This seems like a hack in its purest form. Even worse, it's an
_uncommented_ hack.

This is _literally_ what ITS_NO is for.

So it's a pretty strong NAK from me on this one. No thanks. If you think
this is useful, it's a great thing to carry in a local kernel fork, but
it has no place in mainline.

Re: [PATCH v2] x86/its: use Sapphire Rapids+ feature to opt out

Posted by Jon Kohler 3 months, 3 weeks ago

> On Oct 17, 2025, at 12:12 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 10/16/25 18:18, Jon Kohler wrote:
>> + * hardware, except in the situation where the guest is presented
>> + * with a feature that only exists in non-vulnerable hardware.
>> */
>> - if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR) &&
>> +    !boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
>> return true;
> 
> This seems like a hack in its purest form. Even worse, it's an
> _uncommented_ hack.

Thanks for the review and comments, Dave.

Yes, it is a hack, and I could do a better job on this; I’ve proposed
another pass at the bottom. See below for more detail. I’m
hoping we can work on something better before we
completely put this out to pasture.

> This is _literally_ what ITS_NO is for.

Not quite, as ITS_NO is for the VMM to drive the opt-out workflow.
Same with BHI_CTRL; however, I’ll explain below why this is a problem
for distributions and guests.

> So it's a pretty strong NAK from me on this one. No thanks. If you think
> this is useful, it's a great thing to carry in a local kernel fork, but
> it has no place in mainline.

I understand why you’d NAK this revision of the patch, but I’d love
to have a slightly longer discussion on what we could do to solve
the problem driving this commit.

This isn’t for our products/kernels, but rather guest kernels
from distributions that run on our (or anyone else’s) virtualization
products. I’ll admit I could improve the commit message to reflect
the driver for this. That’s what I get for working late :) My apologies.

Here’s the deal:
With ITS on SPR, we see up to a ~3x regression in SAP’s
PBOffline benchmark tool in a metric that they call ‘cputime’. From
the end user’s perspective, this happens out of nowhere when they
update to the ITS-enabled version of the SLES kernel.

That benchmark tracks a range of metrics, including the cumulative
time spent in all calls in their ‘indexserver’ process. The idea is
that they want to track both database / app response time as well
as the associated cost on the system.

The problem is that a guest kernel cannot control the VMM
configuration, which is what the original ITS commit points out,
and the end user will automatically see this regression when they
deploy/update their kernel on a VMM that may not have ITS_NO.

I am going to send patches for QEMU to add ITS_NO today, but
that doesn’t help anyone in this situation, who will hit this regression
on hardware that Intel has documented as unimpacted.

Now, the counter to that is that we’re also looking at BHI_CTRL
in the kernel code, but as the commit msg noted, that didn’t appear
in QEMU until 9.2, which is still fairly recent code. Even
then, it would still have to be configured as part of the virt stack
and isn’t “automatic” when just booting an SPR model VM on an
SPR++ host with the fixed-up QEMU.

The entire point (at least as far as I can tell from the docs and
original commit) of enabling the mitigation by default is the migration
pool scenario that Intel has documented, where just looking at
eIBRS enablement isn’t sufficient: a guest with *only* eIBRS, even
when started on SPR, could be configured in such a way that it
doesn’t have any SPR++ features, and could then be migrated to an
impacted (e.g. ICX) host at a later point.

Distros can accomplish the exact same thing in the guest, without
VMM modifications, by simply looking at a feature that is exclusive to
SPR++ and knowing that any sane VMM would not (or could not)
allow a guest with higher-level features active to migrate to a
lower-level host.
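
The migration argument above amounts to a subset check on feature
sets. The following is a minimal user-space sketch of that idea, not
code from any VMM: the feature bits and the can_migrate() helper are
hypothetical names chosen for illustration, and real VMMs compare far
richer CPU models.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical feature bits, for illustration only. These are not
 * real CPUID bit positions.
 */
enum {
	FEAT_EIBRS           = 1u << 0,
	FEAT_BUS_LOCK_DETECT = 1u << 1,
};

/*
 * A destination host is acceptable only if it offers every feature
 * that was advertised to the guest. Any guest-visible feature missing
 * on the host blocks the migration.
 */
bool can_migrate(uint32_t guest_features, uint32_t host_features)
{
	return (guest_features & ~host_features) == 0;
}
```

Under this model, a guest shown FEAT_EIBRS | FEAT_BUS_LOCK_DETECT
cannot land on a host offering only FEAT_EIBRS, while a guest shown
only FEAT_EIBRS can, which is the case the default-on mitigation is
meant to cover.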

All that said, that is not what indirect-target-selection.rst says.
The docs say that the reason why this is on by default is:
	All guests deploy ITS mitigation by default, irrespective of
	eIBRS enumeration and Family/Model of the guest. This is
	because eIBRS feature could be hidden from a guest.

Using that documentation to improve my approach, how about
this instead, where A) we have better code comments and B) we
also check eIBRS enablement? 

static bool __init vulnerable_to_its(u64 x86_arch_cap_msr)
{
...
	/*
	 * Some hypervisors do not expose ITS_NO or BHI_CTRL to guests.
	 * We can nevertheless infer that the underlying CPU is unaffected
	 * by checking for other features that only exist on unaffected
	 * hardware and by requiring that eIBRS is presented to the guest.
	 * If these conditions are met, the hypervisor cannot migrate the
	 * guest to vulnerable hardware without changing the advertised
	 * feature set. Use bus lock detection (introduced on Sapphire
	 * Rapids) as such a proxy feature. This is an intentional
	 * workaround for non-upgraded hypervisors to avoid unnecessary
	 * performance regressions on systems that are not vulnerable.
	 */
	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) &&
	    (x86_arch_cap_msr & ARCH_CAP_IBRS_ALL) &&
	    boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
		return false;

	/*
	 * If a VMM did not expose ITS_NO and does not expose eIBRS or
	 * other immunity bits, assume that a guest could be running on
	 * vulnerable hardware or may migrate to such hardware.
	 */
	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
		return true;
...
}
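
For readers who want to check the truth table, here is a simplified
user-space model of the decision described in the comment above. It is
a sketch under stated assumptions, not kernel code: struct guest_view
and model_vulnerable_to_its() are hypothetical names, the vendor and
BHI_CTRL checks are omitted, and the CPU model blacklist match is
reduced to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in IA32_ARCH_CAPABILITIES. */
#define MODEL_ARCH_CAP_IBRS_ALL (1ULL << 1)   /* eIBRS (IBRS_ALL) */
#define MODEL_ARCH_CAP_ITS_NO   (1ULL << 62)  /* ITS_NO, bit 62 per the ITS doc */

/* Hypothetical summary of what the guest kernel can observe. */
struct guest_view {
	uint64_t arch_cap_msr;     /* value of IA32_ARCH_CAPABILITIES */
	bool hypervisor;           /* X86_FEATURE_HYPERVISOR */
	bool bus_lock_detect;      /* X86_FEATURE_BUS_LOCK_DETECT */
	bool model_in_blacklist;   /* stand-in for cpu_matches(..., ITS) */
};

bool model_vulnerable_to_its(const struct guest_view *g)
{
	/* An explicit immunity bit wins over everything else. */
	if (g->arch_cap_msr & MODEL_ARCH_CAP_ITS_NO)
		return false;

	/* Proposed opt-out: guest sees eIBRS plus an SPR-or-newer feature. */
	if (g->hypervisor &&
	    (g->arch_cap_msr & MODEL_ARCH_CAP_IBRS_ALL) &&
	    g->bus_lock_detect)
		return false;

	/* Otherwise any guest is assumed vulnerable (it may migrate). */
	if (g->hypervisor)
		return true;

	/* Bare metal: fall back to the CPU model list. */
	return g->model_in_blacklist;
}
```

In this model a guest that sees eIBRS and bus lock detection is treated
as unaffected even without ITS_NO, while a guest missing either one
still gets the mitigation.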

Re: [PATCH v2] x86/its: use Sapphire Rapids+ feature to opt out

Posted by Dave Hansen 3 months, 3 weeks ago

On 10/17/25 05:21, Jon Kohler wrote:
> Using that documentation to improve my approach, how about
> this instead, where A) we have better code comments and B) we
> also check eIBRS enablement? 

No thanks.

Sounds like you are doing the right thing and fixing the hypervisor
that's not exposing the thing that it should.