[PATCH v2] x86/split_lock: Handle unexpected split lock as fatal

Xiaoyao Li posted 1 patch 1 month ago
Posted by Xiaoyao Li 1 month ago
The kernel can receive #AC fault on split lock access even when
X86_FEATURE_SPLIT_LOCK_DETECT is not enumerated. For example, this can
occur with a TDX guest running under a Linux host with split lock
detection enabled.

The default "warning" mode of handling user split lock depends on being
able to temporarily disable detection to recover from the split lock event.
However, when X86_FEATURE_SPLIT_LOCK_DETECT is not enumerated, the MSR
or the bit that disables detection is normally not accessible. This means
the feature cannot be disabled and the "warning" mode will not work. The
"fatal" mode, however, can still work properly.

Handle split locks as fatal in such cases.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v2:
- handle it generally instead of special-casing TDX guest; [Kiryl]

v1: https://lore.kernel.org/all/20251126100205.1729391-1-xiaoyao.li@intel.com/

Notes:
- Kiryl suggested to check cpu_model_supports_sld in v1, while this v2
  checks X86_FEATURE_SPLIT_LOCK_DETECT. Because I found the latter can
  cover the purpose of the former in current kernel. I also sent a patch
  separately to clean up cpu_model_supports_sld. [1]

- Patch 2 of v1 is dropped in this v2, since v2 is not TDX specific
  anymore. If anyone has interest on whether to enhance sld_state_show()
  to call out the potential unexpected #AC behavior for TDX guest or
  for more general cases, we can discuss here.

[1] https://lore.kernel.org/all/20251218080044.2615106-1-xiaoyao.li@intel.com/
---
 arch/x86/kernel/cpu/bus_lock.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bus_lock.c b/arch/x86/kernel/cpu/bus_lock.c
index dbc99a47be45..68721f1a2cb0 100644
--- a/arch/x86/kernel/cpu/bus_lock.c
+++ b/arch/x86/kernel/cpu/bus_lock.c
@@ -316,9 +316,25 @@ void bus_lock_init(void)
 	wrmsrq(MSR_IA32_DEBUGCTLMSR, val);
 }
 
+static bool split_lock_fatal(void)
+{
+	/*
+	 * If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT
+	 * the kernel cannot handle it by disabling the detection. Treat it as
+	 * fatal regardless of the sld_state.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT))
+		return true;
+
+	if (sld_state == sld_fatal)
+		return true;
+
+	return false;
+}
+
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
-	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+	if ((regs->flags & X86_EFLAGS_AC) || split_lock_fatal())
 		return false;
 	split_lock_warn(regs->ip);
 	return true;
-- 
2.43.0
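
The decision the patch makes can be sketched as a small userspace model.
This is only an illustration, not kernel code: the model_* names, the
struct, and the enum are hypothetical stand-ins for
cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT), regs->flags and
sld_state as they appear in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the sld_state values relevant to a user #AC. */
enum model_sld_state { MODEL_SLD_WARN, MODEL_SLD_FATAL };

struct model_ctx {
	bool feature_enumerated;	/* models cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT) */
	bool eflags_ac;			/* models regs->flags & X86_EFLAGS_AC */
	enum model_sld_state state;	/* models sld_state */
};

/* Mirrors split_lock_fatal() from the patch: no enumeration means fatal. */
static bool model_split_lock_fatal(const struct model_ctx *c)
{
	assert(c != NULL);
	if (!c->feature_enumerated)
		return true;
	return c->state == MODEL_SLD_FATAL;
}

/*
 * Mirrors handle_user_split_lock() from the patch.
 * true:  warn and let the task continue.
 * false: not handled here, the caller treats the #AC as fatal.
 */
static bool model_handle_user_split_lock(const struct model_ctx *c)
{
	assert(c != NULL);
	if (c->eflags_ac || model_split_lock_fatal(c))
		return false;
	return true;
}
```

In this model, a context with feature_enumerated == false is never
handled in "warn" style, whatever the state field says, which is the
behavior change the changelog describes.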
Re: [PATCH v2] x86/split_lock: Handle unexpected split lock as fatal
Posted by Dave Hansen 1 month ago
On 1/7/26 05:49, Xiaoyao Li wrote:
> +	/*
> +	 * If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT
> +	 * the kernel cannot handle it by disabling the detection. Treat it as
> +	 * fatal regardless of the sld_state.
> +	 */
> +	if (!cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT))
> +		return true;

If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT, that
sounds more like a naughty hypervisor or buggy CPU that deserves a
BUG_ON() rather than a situation where the kernel wants to move merrily
along.

This also needs an explanation in the changelog about _why_
X86_FEATURE_SPLIT_LOCK_DETECT isn't set and can't be set. It needs to
explain why enumeration is not present *AND* is impossible to add.
Re: [PATCH v2] x86/split_lock: Handle unexpected split lock as fatal
Posted by Xiaoyao Li 1 month ago
On 1/7/2026 11:19 PM, Dave Hansen wrote:
> On 1/7/26 05:49, Xiaoyao Li wrote:
>> +	/*
>> +	 * If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT
>> +	 * the kernel cannot handle it by disabling the detection. Treat it as
>> +	 * fatal regardless of the sld_state.
>> +	 */
>> +	if (!cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT))
>> +		return true;
> 
> If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT, that
> sounds more like a naughty hypervisor or buggy CPU that deserves a
> BUG_ON() rather than a situation where the kernel wants to move merrily
> along.

Yes. Such behavior is non-architectural.
1) If it happens on bare metal, the CPU is broken.
2) If it happens in guest, the hypervisor does something wrong.

> This also needs an explanation in the changelog about _why_
> X86_FEATURE_SPLIT_LOCK_DETECT isn't set and can't be set. It needs to
> explain why enumeration is not present *AND* is impossible to add.

The only case I know of where such non-architectural behavior can happen
is a TDX guest. It is a virtualization case, and
X86_FEATURE_SPLIT_LOCK_DETECT cannot be virtualized in a sane manner
because MSR_TEST_CTRL is a per-core scope MSR. Enumerating
X86_FEATURE_SPLIT_LOCK_DETECT to a guest means the guest is able to
enable/disable the feature freely on its own. However, on an HT system,
if the guest disables the feature for its vCPU, it also disables the
feature for the sibling CPU on the same core, where host processes
or other VMs might run. Even on a non-HT system, allowing the guest to
disable the feature would defeat the host's purpose of not allowing any
split lock when the host is set to fatal mode.
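
The per-core scope problem described above can be illustrated with a toy
model. Everything here is an assumption made for illustration: the
topology (2 cores with 2 hyperthreads each), the model_* names, and the
bit position are hypothetical and do not come from the kernel or the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical topology: 4 logical CPUs, 2 hyperthreads per core. */
#define MODEL_NR_CPUS		4
#define MODEL_THREADS_PER_CORE	2
/* Illustrative enable bit; the position is an assumption of this model. */
#define MODEL_SLD_ENABLE	(UINT64_C(1) << 29)

/* One register per core, shared by both hyperthreads of that core. */
static uint64_t model_test_ctrl[MODEL_NR_CPUS / MODEL_THREADS_PER_CORE];

static int model_core_of(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return cpu / MODEL_THREADS_PER_CORE;
}

/* A write from one logical CPU changes the state seen by its sibling. */
static void model_set_sld(int cpu, bool on)
{
	uint64_t *msr = &model_test_ctrl[model_core_of(cpu)];

	if (on)
		*msr |= MODEL_SLD_ENABLE;
	else
		*msr &= ~MODEL_SLD_ENABLE;
}

static bool model_sld_enabled(int cpu)
{
	return (model_test_ctrl[model_core_of(cpu)] & MODEL_SLD_ENABLE) != 0;
}
```

If a guest vCPU running on logical CPU 1 clears the bit, logical CPU 0
loses detection as well, while the other core is unaffected.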

On the other hand, the question can be "why does the kernel get #AC on a
split lock if the feature is not available, and can this be fixed so
that it does not get #AC?" For this question,

1) if it happens on bare metal, the CPU is broken. The kernel cannot fix it.

2) if it happens in a guest, it must be that the hypervisor enables the
feature in the hardware MSR while the guest is running. To fix it, the
hypervisor can intercept the #AC and handle it itself instead of letting
the #AC be delivered to the guest. This is what KVM already does for
normal guests. However, for a TDX guest, KVM cannot intercept #AC. It
needs changes in the TDX module to provide such an ability.
Re: [PATCH v2] x86/split_lock: Handle unexpected split lock as fatal
Posted by Edgecombe, Rick P 1 month ago
On Wed, 2026-01-07 at 07:19 -0800, Dave Hansen wrote:
> On 1/7/26 05:49, Xiaoyao Li wrote:
> > +	/*
> > +	 * If #AC occurs on split lock without
> > X86_FEATURE_SPLIT_LOCK_DETECT
> > +	 * the kernel cannot handle it by disabling the detection.
> > Treat it as
> > +	 * fatal regardless of the sld_state.
> > +	 */
> > +	if (!cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT))
> > +		return true;
> 
> If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT,
> that sounds more like a naughty hypervisor or buggy CPU that deserves
> a BUG_ON() rather than a situation where the kernel wants to move
> merrily along.

Can you clarify your feelings on BUG_ON()'s? I was under the impression
that new ones were basically banned, and we should WARN() here to try
to keep running.

Unless we could claim that continuing would destroy something or other
situation a user would never want.
Re: [PATCH v2] x86/split_lock: Handle unexpected split lock as fatal
Posted by Dave Hansen 1 month ago
On 1/7/26 07:24, Edgecombe, Rick P wrote:
>> If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT,
>> that sounds more like a naughty hypervisor or buggy CPU that deserves
>> a BUG_ON() rather than a situation where the kernel wants to move
>> merrily along.
> Can you clarify your feelings on BUG_ON()'s? I was under the impression
> that new ones were basically banned, and we should WARN() here to try
> to keep running.
> 
> Unless we could claim that continuing would destroy something or other
> situation a user would never want.

I'm conflicted about BUG_ON() here. It's a pretty nasty thing to be
sending exceptions that the kernel doesn't expect. x86 exception
handling is "fun" and has lots of sharp edges. There are absolutely
windows where the kernel can not recover from exceptions if they happen
in there. The real question is why the kernel should even try to
recover if it's faced with a borderline malicious hypervisor or CPU so
buggy it's throwing unexpected exceptions.

On the other hand, in practice, this particular code path is from
userspace and a BUG_ON() is an instant DoS.

Balancing all that, a WARN_ON_ONCE() with panic_on_warn=1 is probably
the best course of action here.

But I still want to hear more about why the enumeration is broken and
can't be fixed.
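
The "warn once, optionally panic" behavior suggested above can be
sketched in userspace. This is a simplified model under stated
assumptions: model_warn_on_once(), model_panic_on_warn and
model_panicked are hypothetical names, and the kernel's real
WARN_ON_ONCE() and panic_on_warn handling do considerably more than
this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-call-site state, standing in for the "once" flag. */
struct model_warn_site {
	bool already_warned;
	unsigned int reports;	/* number of reports actually emitted */
};

/* Models the panic_on_warn=1 setting; a flag instead of a real panic. */
static bool model_panic_on_warn;
static bool model_panicked;

/*
 * Returns the condition every time, but reports it only on the first
 * hit for a given site. With model_panic_on_warn set, the first report
 * also marks the system as panicked.
 */
static bool model_warn_on_once(struct model_warn_site *site, bool cond)
{
	assert(site != NULL);
	if (cond && !site->already_warned) {
		site->already_warned = true;
		site->reports++;
		fprintf(stderr, "model: unexpected #AC without split lock enumeration\n");
		if (model_panic_on_warn)
			model_panicked = true;
	}
	return cond;
}
```

In this model the condition is still returned on every call, so the
caller can keep treating the split lock as fatal for the task after the
first report. Because the condition is reached from userspace, the first
hit with model_panic_on_warn set is enough to mark the model as
panicked.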
Re: [PATCH v2] x86/split_lock: Handle unexpected split lock as fatal
Posted by Xiaoyao Li 1 month ago
On 1/8/2026 12:06 AM, Dave Hansen wrote:
> On 1/7/26 07:24, Edgecombe, Rick P wrote:
>>> If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT,
>>> that sounds more like a naughty hypervisor or buggy CPU that deserves
>>> a BUG_ON() rather than a situation where the kernel wants to move
>>> merrily along.
>> Can you clarify your feelings on BUG_ON()'s? I was under the impression
>> that new ones were basically banned, and we should WARN() here to try
>> to keep running.
>>
>> Unless we could claim that continuing would destroy something or other
>> situation a user would never want.
> 
> I'm conflicted about BUG_ON() here. It's a pretty nasty thing to be
> sending exceptions that the kernel doesn't expect. x86 exception
> handling is "fun" and has lots of sharp edges. There are absolutely
> windows where the kernel can not recover from exceptions if they happen
> in there. The real question is why the kernel should even try to
> recover if it's faced with a borderline malicious hypervisor or CPU so
> buggy it's throwing unexpected exceptions.
> 
> On the other hand, in practice, this particular code path is from
> userspace and a BUG_ON() is an instant DoS.
> 
> Balancing all that, a WARN_ON_ONCE() with panic_on_warn=1 is probably
> the best course of action here.

Given that the WARN_ON_ONCE() is 100% triggerable in a TDX guest with a
default host (CONFIG_X86_BUS_LOCK_DETECT=y && sld_state != sld_off), is
it OK to add it?

> But I still want to hear more about why the enumeration is broken and
> can't be fixed.

Please see my reply to your original question.
Re: [PATCH v2] x86/split_lock: Handle unexpected split lock as fatal
Posted by Kiryl Shutsemau 1 month ago
On Wed, Jan 07, 2026 at 09:49:55PM +0800, Xiaoyao Li wrote:
> The kernel can receive #AC fault on split lock access even when
> X86_FEATURE_SPLIT_LOCK_DETECT is not enumerated. For example, this can
> occur with a TDX guest running under a Linux host with split lock
> detection enabled.
> 
> The default "warning" mode of handling user split lock depends on being
> able to temporarily disable detection to recover from the split lock event.
> However, when X86_FEATURE_SPLIT_LOCK_DETECT is not enumerated, the MSR
> or the bit that disables detection is normally not accessible. This means
> the feature cannot be disabled and the "warning" mode will not work. The
> "fatal" mode, however, can still work properly.
> 
> Handle split locks as fatal in such cases.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> Changes in v2:
> - handle it generally instead of special-casing TDX guest; [Kiryl]
> 
> v1: https://lore.kernel.org/all/20251126100205.1729391-1-xiaoyao.li@intel.com/
> 
> Notes:
> - Kiryl suggested to check cpu_model_supports_sld in v1, while this v2
>   checks X86_FEATURE_SPLIT_LOCK_DETECT. Because I found the latter can
>   cover the purpose of the former in current kernel. I also sent a patch
>   separately to clean up cpu_model_supports_sld. [1]
> 
> - Patch 2 of v1 is dropped in this v2, since v2 is not TDX specific
>   anymore. If anyone has interest on whether to enhance sld_state_show()
>   to call out the potential unexpected #AC behavior for TDX guest or
>   for more general cases, we can discuss here.
> 
> [1] https://lore.kernel.org/all/20251218080044.2615106-1-xiaoyao.li@intel.com/
> ---
>  arch/x86/kernel/cpu/bus_lock.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/bus_lock.c b/arch/x86/kernel/cpu/bus_lock.c
> index dbc99a47be45..68721f1a2cb0 100644
> --- a/arch/x86/kernel/cpu/bus_lock.c
> +++ b/arch/x86/kernel/cpu/bus_lock.c
> @@ -316,9 +316,25 @@ void bus_lock_init(void)
>  	wrmsrq(MSR_IA32_DEBUGCTLMSR, val);
>  }
>  
> +static bool split_lock_fatal(void)
> +{
> +	/*
> +	 * If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT
> +	 * the kernel cannot handle it by disabling the detection. Treat it as
> +	 * fatal regardless of the sld_state.
> +	 */

Nit: maybe some context here on why it is even possible.

Otherwise:

Acked-by: Kiryl Shutsemau <kas@kernel.org>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov