The kernel can receive #AC fault on split lock access even when
X86_FEATURE_SPLIT_LOCK_DETECT is not enumerated. For example, this can
occur with a TDX guest running under a Linux host with split lock
detection enabled.
The default "warning" mode of handling user split lock depends on being
able to temporarily disable detection to recover from the split lock event.
However, when X86_FEATURE_SPLIT_LOCK_DETECT is not enumerated, the MSR
or the bit that disables detection is normally not accessible. This means
the feature cannot be disabled and the "warning" mode will not work. The
"fatal" mode, however, can still work properly.
Handle split locks as fatal in such cases.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v2:
- handle it generally instead of special-casing TDX guest; [Kiryl]
v1: https://lore.kernel.org/all/20251126100205.1729391-1-xiaoyao.li@intel.com/
Notes:
- Kiryl suggested to check cpu_model_supports_sld in v1, while this v2
checks X86_FEATURE_SPLIT_LOCK_DETECT. Because I found the latter can
cover the purpose of the former in current kernel. I also sent a patch
separately to clean up cpu_model_supports_sld. [1]
- Patch 2 of v1 is dropped in this v2, since v2 is not TDX specific
anymore. If anyone has interest on whether to enhance sld_state_show()
to call out the potential unexpected #AC behavior for TDX guest or
for more general cases, we can discuss here.
[1] https://lore.kernel.org/all/20251218080044.2615106-1-xiaoyao.li@intel.com/
---
arch/x86/kernel/cpu/bus_lock.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/bus_lock.c b/arch/x86/kernel/cpu/bus_lock.c
index dbc99a47be45..68721f1a2cb0 100644
--- a/arch/x86/kernel/cpu/bus_lock.c
+++ b/arch/x86/kernel/cpu/bus_lock.c
@@ -316,9 +316,25 @@ void bus_lock_init(void)
 	wrmsrq(MSR_IA32_DEBUGCTLMSR, val);
 }
 
+static bool split_lock_fatal(void)
+{
+	/*
+	 * If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT
+	 * the kernel cannot handle it by disabling the detection. Treat it as
+	 * fatal regardless of the sld_state.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT))
+		return true;
+
+	if (sld_state == sld_fatal)
+		return true;
+
+	return false;
+}
+
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
-	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+	if ((regs->flags & X86_EFLAGS_AC) || split_lock_fatal())
 		return false;
 	split_lock_warn(regs->ip);
 	return true;
--
2.43.0
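The decision this patch introduces can be exercised outside the kernel with a small user-space model. The sketch below is only an illustration: `struct model_cpu`, `enum model_sld_state` and the `model_*` functions are hypothetical stand-ins invented here, not kernel definitions. The `sld_enumerated` field plays the role of cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT) and `eflags_ac` plays the role of (regs->flags & X86_EFLAGS_AC).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; these are not the kernel's types. */
enum model_sld_state { MODEL_SLD_OFF, MODEL_SLD_WARN, MODEL_SLD_FATAL };

struct model_cpu {
	bool sld_enumerated;		/* models cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT) */
	enum model_sld_state sld_state;	/* models the global sld_state */
};

/* Same two checks as split_lock_fatal() in the patch. */
static bool model_split_lock_fatal(const struct model_cpu *cpu)
{
	assert(cpu != NULL);

	/* Detection cannot be turned off, so "warn" mode cannot recover. */
	if (!cpu->sld_enumerated)
		return true;

	return cpu->sld_state == MODEL_SLD_FATAL;
}

/*
 * Same shape as handle_user_split_lock(): returns true when the split lock
 * is handled by warning, false when the caller should treat it as fatal.
 * eflags_ac models (regs->flags & X86_EFLAGS_AC).
 */
static bool model_handle_user_split_lock(const struct model_cpu *cpu,
					 bool eflags_ac)
{
	if (eflags_ac || model_split_lock_fatal(cpu))
		return false;

	/* The real function calls split_lock_warn(regs->ip) here. */
	return true;
}
```

With this model, a CPU that does not enumerate the feature takes the fatal path even when the configured state is "warn", which is the behavior change the patch describes. A CPU that does enumerate it behaves as before.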
On 1/7/26 05:49, Xiaoyao Li wrote:
> +	/*
> +	 * If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT
> +	 * the kernel cannot handle it by disabling the detection. Treat it as
> +	 * fatal regardless of the sld_state.
> +	 */
> +	if (!cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT))
> +		return true;

If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT, that
sounds more like a naughty hypervisor or buggy CPU that deserves a
BUG_ON() rather than a situation where the kernel wants to move merrily
along.

This also needs an explanation in the changelog about _why_
X86_FEATURE_SPLIT_LOCK_DETECT isn't set and can't be set. It needs to
explain why enumeration is not present *AND* is impossible to add.
On 1/7/2026 11:19 PM, Dave Hansen wrote:
> On 1/7/26 05:49, Xiaoyao Li wrote:
>> +	/*
>> +	 * If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT
>> +	 * the kernel cannot handle it by disabling the detection. Treat it as
>> +	 * fatal regardless of the sld_state.
>> +	 */
>> +	if (!cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT))
>> +		return true;
>
> If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT, that
> sounds more like a naughty hypervisor or buggy CPU that deserves a
> BUG_ON() rather than a situation where the kernel wants to move merrily
> along.

Yes. Such behavior is non-architectural.

1) If it happens on bare metal, the CPU is broken.
2) If it happens in a guest, the hypervisor is doing something wrong.

> This also needs an explanation in the changelog about _why_
> X86_FEATURE_SPLIT_LOCK_DETECT isn't set and can't be set. It needs to
> explain why enumeration is not present *AND* is impossible to add.

The only case I know of where such non-architectural behavior can happen
is a TDX guest. It's a virtualization case, and
X86_FEATURE_SPLIT_LOCK_DETECT cannot be virtualized in a sane manner
because MSR_TEST_CTRL is a per-core scope MSR. Enumerating
X86_FEATURE_SPLIT_LOCK_DETECT to a guest means the guest is able to
enable/disable the feature freely on its own. However, on an HT system,
if the guest disables the feature for its vcpu, it will also disable the
feature for the sibling CPU on the same core, where host processes or
other VMs might run. Even on a non-HT system, allowing the guest to
disable the feature would violate the host's intent of not allowing any
split locks when the host is set to fatal mode.

On the other hand, the question could be: "Why do we get #AC on a split
lock if the feature is not available, and can it be fixed so that we do
not get #AC?"

For this question:

1) If it happens on bare metal, the CPU is broken. The kernel cannot fix
   it.

2) If it happens in a guest, it must be that the hypervisor enables the
   feature in the hardware MSR while the guest is running. To fix it,
   the hypervisor can intercept the #AC and handle it itself instead of
   letting the #AC be delivered to the guest. This is what KVM already
   does for normal guests. However, for a TDX guest, KVM cannot
   intercept #AC. It needs changes in the TDX module to provide such an
   ability.
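The per-core scope argument above can be made concrete with a toy model. This is a sketch under stated assumptions, not how hardware or KVM is implemented: `struct model_core`, `struct model_thread` and the `model_*` helpers are hypothetical names, and the bit position used for the split lock detect control is assumed for illustration. The point it shows is that two hardware threads on one core share a single register, so a write from the thread running a guest vcpu changes what the sibling thread observes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position for the split lock detect control; illustrative only. */
#define MODEL_TEST_CTRL_SLD (UINT64_C(1) << 29)

/* One register instance per physical core, as with a core-scope MSR. */
struct model_core {
	uint64_t test_ctrl;
};

/* A hardware thread only holds a reference to its core's register. */
struct model_thread {
	struct model_core *core;
};

static void model_write_test_ctrl(struct model_thread *t, uint64_t val)
{
	assert(t != NULL && t->core != NULL);
	t->core->test_ctrl = val;	/* visible to every thread on this core */
}

static bool model_sld_enabled(const struct model_thread *t)
{
	assert(t != NULL && t->core != NULL);
	return (t->core->test_ctrl & MODEL_TEST_CTRL_SLD) != 0;
}
```

If one `model_thread` stands for the thread running a guest vcpu and another for its sibling running host code, clearing the bit through the first also clears it for the second. That is the cross-thread side effect that makes passing the control through to a guest problematic.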
On Wed, 2026-01-07 at 07:19 -0800, Dave Hansen wrote:
> On 1/7/26 05:49, Xiaoyao Li wrote:
> > +	/*
> > +	 * If #AC occurs on split lock without
> > X86_FEATURE_SPLIT_LOCK_DETECT
> > +	 * the kernel cannot handle it by disabling the detection.
> > Treat it as
> > +	 * fatal regardless of the sld_state.
> > +	 */
> > +	if (!cpu_feature_enabled(X86_FEATURE_SPLIT_LOCK_DETECT))
> > +		return true;
>
> If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT,
> that sounds more like a naughty hypervisor or buggy CPU that deserves
> a BUG_ON() rather than a situation where the kernel wants to move
> merrily along.

Can you clarify your feelings on BUG_ON()'s? I was under the impression
that new ones were basically banned, and we should WARN() here to try
to keep running.

Unless we could claim that continuing would destroy something or other
situation a user would never want.
On 1/7/26 07:24, Edgecombe, Rick P wrote:
>> If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT,
>> that sounds more like a naughty hypervisor or buggy CPU that deserves
>> a BUG_ON() rather than a situation where the kernel wants to move
>> merrily along.
> Can you clarify your feelings on BUG_ON()'s? I was under the impression
> that new ones were basically banned, and we should WARN() here to try
> to keep running.
>
> Unless we could claim that continuing would destroy something or other
> situation a user would never want.

I'm conflicted about BUG_ON() here. It's a pretty nasty thing to be
sending exceptions that the kernel doesn't expect. x86 exception
handling is "fun" and has lots of sharp edges. There are absolutely
windows where the kernel can not recover from exceptions if they happen
in there. The real question is why the kernel should even try to
recover if it's faced with a borderline malicious hypervisor or CPU so
buggy it's throwing unexpected exceptions.

On the other hand, in practice, this particular code path is from
userspace and a BUG_ON() is an instant DoS.

Balancing all that, a WARN_ON_ONCE() with panic_on_warn=1 is probably
the best course of action here.

But I still want to hear more about why the enumeration is broken and
can't be fixed.
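One possible shape of the WARN_ON_ONCE() idea, modeled in user space, is sketched below. It is not part of the posted patch and nobody in the thread proposed this exact code. `model_warn_on_once()`, `model_warnings_emitted` and `model_split_lock_fatal()` are hypothetical names. The model relies only on two properties of a warn-once check: it reports at most one time, and it returns the tested condition every time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how many reports the model has printed so far. */
static unsigned int model_warnings_emitted;

/* Models a warn-once check: reports at most once, always returns cond. */
static bool model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warnings_emitted++;
		fprintf(stderr, "model: #AC on split lock without enumeration\n");
	}
	return cond;
}

/*
 * Hypothetical variant of split_lock_fatal(): still fatal when the feature
 * is not enumerated, but the unexpected #AC is also reported once.
 */
static bool model_split_lock_fatal(bool sld_enumerated, bool state_is_fatal)
{
	if (model_warn_on_once(!sld_enumerated))
		return true;

	return state_is_fatal;
}
```

In this model, repeated split locks on a CPU without the enumeration all take the fatal path but produce a single report. On a kernel booted with panic_on_warn=1, the first such report would instead stop the system, which is the trade-off being weighed in this thread.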
On 1/8/2026 12:06 AM, Dave Hansen wrote:
> On 1/7/26 07:24, Edgecombe, Rick P wrote:
>>> If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT,
>>> that sounds more like a naughty hypervisor or buggy CPU that deserves
>>> a BUG_ON() rather than a situation where the kernel wants to move
>>> merrily along.
>> Can you clarify your feelings on BUG_ON()'s? I was under the impression
>> that new ones were basically banned, and we should WARN() here to try
>> to keep running.
>>
>> Unless we could claim that continuing would destroy something or other
>> situation a user would never want.
>
> I'm conflicted about BUG_ON() here. It's a pretty nasty thing to be
> sending exceptions that the kernel doesn't expect. x86 exception
> handling is "fun" and has lots of sharp edges. There are absolutely
> windows where the kernel can not recover from exceptions if they happen
> in there. The real question is why the kernel should even try to
> recover if it's faced with a borderline malicious hypervisor or CPU so
> buggy it's throwing unexpected exceptions.
>
> On the other hand, in practice, this particular code path is from
> userspace and a BUG_ON() is an instant DoS.
>
> Balancing all that, a WARN_ON_ONCE() with panic_on_warn=1 is probably
> the best course of action here.

Given that WARN_ON_ONCE() is 100% triggerable in a TDX guest with a
default host (CONFIG_X86_BUS_LOCK_DETECT=y && sld_state != sld_off), is
it OK to add it?

> But I still want to hear more about why the enumeration is broken and
> can't be fixed.

Please see my reply to your original ask.
On Wed, Jan 07, 2026 at 09:49:55PM +0800, Xiaoyao Li wrote:
> The kernel can receive #AC fault on split lock access even when
> X86_FEATURE_SPLIT_LOCK_DETECT is not enumerated. For example, this can
> occur with a TDX guest running under a Linux host with split lock
> detection enabled.
>
> The default "warning" mode of handling user split lock depends on being
> able to temporarily disable detection to recover from the split lock event.
> However, when X86_FEATURE_SPLIT_LOCK_DETECT is not enumerated, the MSR
> or the bit that disables detection is normally not accessible. This means
> the feature cannot be disabled and the "warning" mode will not work. The
> "fatal" mode, however, can still work properly.
>
> Handle split locks as fatal in such cases.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> Changes in v2:
> - handle it generally instead of special-casing TDX guest; [Kiryl]
>
> v1: https://lore.kernel.org/all/20251126100205.1729391-1-xiaoyao.li@intel.com/
>
> Notes:
> - Kiryl suggested to check cpu_model_supports_sld in v1, while this v2
> checks X86_FEATURE_SPLIT_LOCK_DETECT. Because I found the latter can
> cover the purpose of the former in current kernel. I also sent a patch
> separately to clean up cpu_model_supports_sld. [1]
>
> - Patch 2 of v1 is dropped in this v2, since v2 is not TDX specific
> anymore. If anyone has interest on whether to enhance sld_state_show()
> to call out the potential unexpected #AC behavior for TDX guest or
> for more general cases, we can discuss here.
>
> [1] https://lore.kernel.org/all/20251218080044.2615106-1-xiaoyao.li@intel.com/
> ---
> arch/x86/kernel/cpu/bus_lock.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/bus_lock.c b/arch/x86/kernel/cpu/bus_lock.c
> index dbc99a47be45..68721f1a2cb0 100644
> --- a/arch/x86/kernel/cpu/bus_lock.c
> +++ b/arch/x86/kernel/cpu/bus_lock.c
> @@ -316,9 +316,25 @@ void bus_lock_init(void)
> wrmsrq(MSR_IA32_DEBUGCTLMSR, val);
> }
>
> +static bool split_lock_fatal(void)
> +{
> + /*
> + * If #AC occurs on split lock without X86_FEATURE_SPLIT_LOCK_DETECT
> + * the kernel cannot handle it by disabling the detection. Treat it as
> + * fatal regardless of the sld_state.
> + */
Nit: maybe some context here on why it is even possible.
Otherwise:
Acked-by: Kiryl Shutsemau <kas@kernel.org>
--
Kiryl Shutsemau / Kirill A. Shutemov