[PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression

Guilherme G. Piccoli posted 1 patch 1 year, 6 months ago
.../admin-guide/kernel-parameters.txt         | 11 ++++++-
Documentation/x86/buslock.rst                 | 19 +++++++++++
arch/x86/Kconfig                              | 10 ++++++
arch/x86/kernel/cpu/intel.c                   | 33 +++++++++++++------
4 files changed, 62 insertions(+), 11 deletions(-)
[PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Guilherme G. Piccoli 1 year, 6 months ago
Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;
basically, not only it shows the warn message, but also intentionally
introduces a slowdown (through sleeping plus serialization mechanism)
on such task. Based on discussions in [0], seems the warning alone
wasn't enough motivation for userspace developers to fix their
applications.

Happens that originally the proposal in [0] was to add a new mode
which would warns + slowdown the "split locking" task, keeping the
old warn mode untouched. In the end, that idea was discarded and
the regular/default "warn" mode now slowdowns the applications. This
is quite aggressive with regards proprietary/legacy programs that
basically are unable to properly run in kernel with this change.
While it is understandable that a malicious application could try
a DoS by split locking, it seems unacceptable to regress old/proprietary
userspace programs through a default configuration that previously
worked. An example of such breakage was reported in [1].

So let's revamp the idea of having another option/mode for the split
lock detector, which is hereby called "seq" (based on the original
"sequential" naming in [0]). Also introduces a Kconfig option to give
the option of Linux vendors have a choice what mode should be their
default. While at it, fix/improve the documentation about bus locking.

[0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/

[1] https://github.com/doitsujin/dxvk/issues/2938

Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Paul Gofman <pgofman@codeweavers.com>
Cc: Pavel Machek <pavel@denx.de>
Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com>
Cc: Tony Luck <tony.luck@intel.com>
Tested-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---


Hi Tony / Thomas, it was reported that some games cannot run anymore
due to the split lock detector change (as you can see above, in [1]).

This really "flirts" with userspace breakage / policy enforcement
from the kernel - I agree with the DoS point, but changing it to
have a new default behavior preventing the execution of these
applications with enough performance seems quite a hammer.

The proposal here is to get back to the old idea from Tony, having
another mode for the split lock detector, and a Kconfig to allow
distros choose the policy they wanna take by default. I've set
default in kernel to be the harmless old "warn" mode, we could
discuss of course what should be kernel default. But I think it's
important to give at least the choice for the distros that don't
want to opt-in this aggressive behavior.

Reviews / comments are very appreciated as usual!
Thanks in advance,

Guilherme


 .../admin-guide/kernel-parameters.txt         | 11 ++++++-
 Documentation/x86/buslock.rst                 | 19 +++++++++++
 arch/x86/Kconfig                              | 10 ++++++
 arch/x86/kernel/cpu/intel.c                   | 33 +++++++++++++------
 4 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 426fa892d311..a00113629f8f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5790,7 +5790,9 @@
 			instructions that access data across cache line
 			boundaries will result in an alignment check exception
 			for split lock detection or a debug exception for
-			bus lock detection.
+			bus lock detection. The default is set by the kernel
+			config X86_SPLIT_LOCK_MODE; defconfig sets it to warn.
+
 
 			off	- not enabled
 
@@ -5802,6 +5804,13 @@
 				  behavior is by #AC if both features are
 				  enabled in hardware.
 
+			seq	- kernel will emit the same rate-limited warns
+				  than the "warn" mode, but will also introduce
+				  an intentional delay (through serialization
+				  and a small sleeping) in such processes, in
+				  order to call the attention of users/devs to
+				  properly fix their applications.
+
 			fatal	- the kernel will send SIGBUS to applications
 				  that trigger the #AC exception or the #DB
 				  exception. Default behavior is by #AC if
diff --git a/Documentation/x86/buslock.rst b/Documentation/x86/buslock.rst
index 7c051e714943..f3cf353d2028 100644
--- a/Documentation/x86/buslock.rst
+++ b/Documentation/x86/buslock.rst
@@ -58,6 +58,17 @@ parameter "split_lock_detect". Here is a summary of different options:
 |		   |When both features are	|			|
 |		   |supported, warn in #AC	|			|
 +------------------+----------------------------+-----------------------+
+|seq		   |Kernel OOPs			|Warn once per task and |
+|		   |Same output as the "warn"	|and continues to run.  |
+|		   |option above, but also	|			|
+|		   |intentionally introduce a	|			|
+|		   |delay by serializing and	|			|
+|		   |quickly sleeping - this	|			|
+|		   |greatly affects application	|			|
+|		   |performance but may improve	|			|
+|		   |overall system perf in the	|			|
+|		   |case of many split locks.	|			|
++------------------+----------------------------+-----------------------+
 |fatal		   |Kernel OOPs			|Send SIGBUS to user.	|
 |		   |Send SIGBUS to user		|			|
 |		   |When both features are	|			|
@@ -103,6 +114,14 @@ warn
 A warning is emitted when a bus lock is detected which allows to identify
 the offending application. This is the default behavior.
 
+seq
+----
+
+A warning is emitted when a bus lock is detected which allows to identify
+the offending application, but also introduces a delay by serializing and
+quickly sleeping - this greatly affects application performance but might
+improve overall system performance in case of many unligned accesses.
+
 fatal
 -----
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f9920f1341c8..e44779756461 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -398,6 +398,16 @@ config CC_HAS_SANE_STACKPROTECTOR
 	  the compiler produces broken code or if it does not let us control
 	  the segment on 32-bit kernels.
 
+config X86_SPLIT_LOCK_MODE
+	int "Default mode of split lock detector for supporting CPUs (0/off - 3/fatal)"
+	range 0 3
+	default 1
+	help
+	  Select the default mode for split lock detector on supporting CPUs.
+	  Possible modes are: 0 ("off"), 1 ("warn"), 2 ("seq") and 3 ("fatal");
+	  details are well described in the kernel parameters documentation.
+	  Users can always override this using the cmdline "split_lock_detect=".
+
 menu "Processor type and features"
 
 config SMP
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2d7ea5480ec3..9f6157afbb56 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -44,6 +44,7 @@
 enum split_lock_detect_state {
 	sld_off = 0,
 	sld_warn,
+	sld_seq,
 	sld_fatal,
 	sld_ratelimit,
 };
@@ -1028,6 +1029,7 @@ static const struct {
 } sld_options[] __initconst = {
 	{ "off",	sld_off   },
 	{ "warn",	sld_warn  },
+	{ "seq",	sld_seq	},
 	{ "fatal",	sld_fatal },
 	{ "ratelimit:", sld_ratelimit },
 };
@@ -1075,7 +1077,7 @@ static bool split_lock_verify_msr(bool on)
 
 static void __init sld_state_setup(void)
 {
-	enum split_lock_detect_state state = sld_warn;
+	enum split_lock_detect_state state = CONFIG_X86_SPLIT_LOCK_MODE;
 	char arg[20];
 	int i, ret;
 
@@ -1149,7 +1151,9 @@ static void split_lock_init(void)
 static void __split_lock_reenable(struct work_struct *work)
 {
 	sld_update_msr(true);
-	up(&buslock_sem);
+
+	if (sld_state == sld_seq)
+		up(&buslock_sem);
 }
 
 /*
@@ -1180,12 +1184,15 @@ static void split_lock_warn(unsigned long ip)
 				    current->comm, current->pid, ip);
 	current->reported_split_lock = 1;
 
-	/* misery factor #1, sleep 10ms before trying to execute split lock */
-	if (msleep_interruptible(10) > 0)
-		return;
-	/* Misery factor #2, only allow one buslocked disabled core at a time */
-	if (down_interruptible(&buslock_sem) == -EINTR)
-		return;
+	if (sld_state == sld_seq) {
+		/* misery factor #1, sleep 10ms before trying to execute split lock */
+		if (msleep_interruptible(10) > 0)
+			return;
+		/* Misery factor #2, only allow one buslocked disabled core at a time */
+		if (down_interruptible(&buslock_sem) == -EINTR)
+			return;
+	}
+
 	cpu = get_cpu();
 	schedule_delayed_work_on(cpu, &split_lock_reenable, 2);
 
@@ -1196,7 +1203,7 @@ static void split_lock_warn(unsigned long ip)
 
 bool handle_guest_split_lock(unsigned long ip)
 {
-	if (sld_state == sld_warn) {
+	if (sld_state == sld_warn || sld_state == sld_seq) {
 		split_lock_warn(ip);
 		return true;
 	}
@@ -1256,6 +1263,7 @@ void handle_bus_lock(struct pt_regs *regs)
 		/* Warn on the bus lock. */
 		fallthrough;
 	case sld_warn:
+	case sld_seq:
 		pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address: 0x%lx\n",
 				    current->comm, current->pid, regs->ip);
 		break;
@@ -1335,8 +1343,13 @@ static void sld_state_show(void)
 		pr_info("disabled\n");
 		break;
 	case sld_warn:
+	case sld_seq:
 		if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
-			pr_info("#AC: crashing the kernel on kernel split_locks and warning on user-space split_locks\n");
+			if (sld_state == sld_warn)
+				pr_info("#AC: crashing the kernel on kernel split_locks and warning on user-space split_locks\n");
+			else
+				pr_info("#AC: crashing the kernel on kernel split_locks and intentionally slowdown the task on user-space split_locks\n");
+
 			if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 					      "x86/splitlock", NULL, splitlock_cpu_offline) < 0)
 				pr_warn("No splitlock CPU offline handler\n");
-- 
2.37.3
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Dave Hansen 1 year, 6 months ago
I really don't like the idea of *both* a new boot parameter and a new
Kconfig option.

The warning mode worked as intended in this case because it got a user
to file a bug and that bug report made it back to us.  It's kinda funny
to respond to that report by reducing the misery.

On the other hand, all the report resulted in was finger-pointing at a
binary Windows applications that neither we nor probably anybody else
can do anything about.

It boils down to either:
 * The misery is good and we keep it as-is, or
 * The misery is bad and we kill it

My gut says we should keep the warnings and kill the misery.  The folks
who are going to be able to fix the issues are probably also the ones
looking at dmesg and don't need the extra hint from the misery.  The
folks running Windows games don't look at dmesg and just want to play
their game without misery.

The other option is to wait and see if there's any kind of pattern with
these reports.
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Zebediah Figura 1 year, 6 months ago
On 9/28/22 16:50, Dave Hansen wrote:
> I really don't like the idea of *both* a new boot parameter and a new
> Kconfig option.
> 
> The warning mode worked as intended in this case because it got a user
> to file a bug and that bug report made it back to us.  It's kinda funny
> to respond to that report by reducing the misery.
> 
> On the other hand, all the report resulted in was finger-pointing at a
> binary Windows applications that neither we nor probably anybody else
> can do anything about.
> 
> It boils down to either:
>   * The misery is good and we keep it as-is, or
>   * The misery is bad and we kill it
> 
> My gut says we should keep the warnings and kill the misery.  The folks
> who are going to be able to fix the issues are probably also the ones
> looking at dmesg and don't need the extra hint from the misery.  The
> folks running Windows games don't look at dmesg and just want to play
> their game without misery.

This seems like a reasonable position, but on the other hand...

> The other option is to wait and see if there's any kind of pattern with
> these reports.

...if the pattern ends up being "closed-source Windows software", as is 
not unlikely, then maybe this is something that could be made into a 
personality, that emulators like Wine could enable? I don't know how 
distasteful such optional workarounds are in general, but this would at 
least allow the original warning to continue working as intended for 
software that can be fixed.

ἔρρωσθε,
Zeb
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Guilherme G. Piccoli 1 year, 6 months ago
On 28/09/2022 18:50, Dave Hansen wrote:
>[...]
> It boils down to either:
>  * The misery is good and we keep it as-is, or
>  * The misery is bad and we kill it
> 
> My gut says we should keep the warnings and kill the misery.  The folks
> who are going to be able to fix the issues are probably also the ones
> looking at dmesg and don't need the extra hint from the misery.  The
> folks running Windows games don't look at dmesg and just want to play
> their game without misery.
> 

Hi Dave, thanks for your response. I really appreciated your reasoning,
and I think it's a good argument. In the end, adding misery would harm
the users that are unlikely to be able to fix (or at least, fix quickly)
the split lock situation, like games or legacy/proprietary code.

I have a revert removing the misery ready and tested, let me know if I
should submit it.

Cheers,


Guilherme
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Dave Hansen 1 year, 6 months ago
On 9/29/22 07:57, Guilherme G. Piccoli wrote:
> On 28/09/2022 18:50, Dave Hansen wrote:
>> [...]
>> It boils down to either:
>>  * The misery is good and we keep it as-is, or
>>  * The misery is bad and we kill it
>>
>> My gut says we should keep the warnings and kill the misery.  The folks
>> who are going to be able to fix the issues are probably also the ones
>> looking at dmesg and don't need the extra hint from the misery.  The
>> folks running Windows games don't look at dmesg and just want to play
>> their game without misery.
>>
> Hi Dave, thanks for your response. I really appreciated your reasoning,
> and I think it's a good argument. In the end, adding misery would harm
> the users that are unlikely to be able to fix (or at least, fix quickly)
> the split lock situation, like games or legacy/proprietary code.
> 
> I have a revert removing the misery ready and tested, let me know if I
> should submit it.

I'm a bit of a late arrival to the split lock party, so I'm a bit
hesitant to merge any changes immediately.

How about we give it a few weeks and see if the current behavior impacts
anyone else?  Maybe the best route will be more clear then.
RE: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Luck, Tony 1 year, 6 months ago
>> I have a revert removing the misery ready and tested, let me know if I
>> should submit it.
>
> I'm a bit of a late arrival to the split lock party, so I'm a bit
> hesitant to merge any changes immediately.
>
> How about we give it a few weeks and see if the current behavior impacts
> anyone else?  Maybe the best route will be more clear then.

Applying "misery" to the processes that are executing split-lock flows saves
the rest of the system from a different level of misery (for the duration of the
split lock other logical CPUs and I/O devices have access to memory blocked).

So the "misery" serves a very useful purpose on multi-user systems.

Maybe the decision of which mode to use could be dynamic based on
number of online CPUs? Laptops/desktops with low counts (<50???)
could just "warn", while servers could default to the "seq" mode.

Or perhaps there is some other heuristic to distinguish single-user
systems where the split-locks are not causing pain to other users?

-Tony
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Lukas Straub 1 year, 5 months ago
On Thu, 29 Sep 2022 15:37:55 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:

> >> I have a revert removing the misery ready and tested, let me know if I
> >> should submit it.  
> >
> > I'm a bit of a late arrival to the split lock party, so I'm a bit
> > hesitant to merge any changes immediately.
> >
> > How about we give it a few weeks and see if the current behavior impacts
> > anyone else?  Maybe the best route will be more clear then.  
> 
> Applying "misery" to the processes that are executing split-lock flows saves
> the rest of the system from a different level of misery (for the duration of the
> split lock other logical CPUs and I/O devices have access to memory blocked).
> 
> So the "misery" serves a very useful purpose on multi-user systems.

Hello Everyone,
How about the following: The kernel traps the split-lock, but instead
of punishing the process artificially it emulates it in a different way
that won't harm the system as a whole. Of course this still will be
slower than a non-split-lock but surely won't take 10ms.

For example, you could emulate the instruction without atomic semantics,
but protected under a single global mutex for all split-lock operations
instead. This is how atomics are done on on alpha AFAIK, which doesn't
have atomic instructions.

This is not as simple as the current solution. But I see the current
solution more like a quick and dirty workaround for this security/DoS
issue, until a proper solution (like my proposal) is implemented.

Regards,
Lukas Straub

> Maybe the decision of which mode to use could be dynamic based on
> number of online CPUs? Laptops/desktops with low counts (<50???)
> could just "warn", while servers could default to the "seq" mode.
> 
> Or perhaps there is some other heuristic to distinguish single-user
> systems where the split-locks are not causing pain to other users?
> 
> -Tony



-- 

Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Guilherme G. Piccoli 1 year, 6 months ago
On 29/09/2022 12:17, Dave Hansen wrote:
> [...]
>> Hi Dave, thanks for your response. I really appreciated your reasoning,
>> and I think it's a good argument. In the end, adding misery would harm
>> the users that are unlikely to be able to fix (or at least, fix quickly)
>> the split lock situation, like games or legacy/proprietary code.
>>
>> I have a revert removing the misery ready and tested, let me know if I
>> should submit it.
> 
> I'm a bit of a late arrival to the split lock party, so I'm a bit
> hesitant to merge any changes immediately.
> 

OK, agree that we could indeed gather more opinions, like Thomas' and
other interested parties. But...


> How about we give it a few weeks and see if the current behavior impacts
> anyone else?  Maybe the best route will be more clear then.

...I disagree in just letting it fly for weeks with all players of God
of War 2 running modern Intel chips unable to play in 5.19+ because of
this change.

Certainly we have more games/applications that are impacted, I just
don't think we should wait on having 3 userspace breakages reported, for
example, to take an action - why should gamers live with this for an
arbitrary amount of time, until others report more issues?

Cheers,


Guilherme
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Dave Hansen 1 year, 6 months ago
On 9/29/22 08:30, Guilherme G. Piccoli wrote:
>> How about we give it a few weeks and see if the current behavior impacts
>> anyone else?  Maybe the best route will be more clear then.
> ...I disagree in just letting it fly for weeks with all players of God
> of War 2 running modern Intel chips unable to play in 5.19+ because of
> this change.

Let's be precise here, though.  It isn't that folks can't play.  It's
that we *intentionally* put something in place that kept them from
playing.  They can play just fine after disabling split lock detection.

> Certainly we have more games/applications that are impacted, I just 
> don't think we should wait on having 3 userspace breakages reported,
> for example, to take an action - why should gamers live with this for
> an arbitrary amount of time, until others report more issues?
They don't have to live with it.  They can turn it off.  That's why the
command-line disable is there.

The real question in my head is whether the misery is intentional or
not.  Is breaking games what folks _intended_ with
split_lock_detect=warn?  Or, is this a more severe penalty than we
expected and maybe we should back off for the default?
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Paul Gofman 1 year, 6 months ago
On 9/29/22 11:26, Dave Hansen wrote:
> Let's be precise here, though. It isn't that folks can't play. It's
> that we *intentionally* put something in place that kept them from
> playing.  They can play just fine after disabling split lock detection.

I guess that the statement that they can play is arguable. To do it, the 
player (even capabale and willing to go as far as tweaking kernel 
options, which alone may be a showstopper for the game to be called 
playable) should get aware that the issue they are having can be solved 
this way. IMO such level of involvement effectively means that a user 
who is not technically advanced and not perceiving troubleshooting Linux 
specific issues as a part of game play just can't play it.
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Guilherme G. Piccoli 1 year, 6 months ago
Two responses in one:

First, I like Tony's idea - heuristic based on CPU number seems to make
sense. This number could be purely based on CPU number with a hardcoded
value, or based in a Kconfig (after N CPUs, add misery).


Now, second part below, inline:

On 29/09/2022 13:26, Dave Hansen wrote:
> [...]
> Let's be precise here, though.  It isn't that folks can't play.  It's
> that we *intentionally* put something in place that kept them from
> playing.  They can play just fine after disabling split lock detection.
>

Agreed with your wording, would just maybe "s/put something/added a
policy" heheh


>> [...]
> They don't have to live with it.  They can turn it off.  That's why the
> command-line disable is there.
> 

If they know how to do it, right? I'd rather have it as a new
non-default mechanism or at least, a Kconfig option in which gamer
distros could tune for their users, so we're not requiring them to tune
x86-specific parameters to play some game.
RE: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Luck, Tony 1 year, 6 months ago
> So let's revamp the idea of having another option/mode for the split
> lock detector, which is hereby called "seq" (based on the original
> "sequential" naming in [0]). Also introduces a Kconfig option to give
> the option of Linux vendors have a choice what mode should be their
> default. While at it, fix/improve the documentation about bus locking.
>
> [1] https://github.com/doitsujin/dxvk/issues/2938

Why not just use the workaround suggested in that bug report:

   "so manual switching from default setting to split_lock_detect=off helps as workaround here"

If you add this extra mode, I'm going to argue that the kernel default
should be "seq" rather than "warn". So these game players will need
to add a split_lock_detect=off (or warn) option.

Has a bug report been filed against the God Of War game? Probably worth fixing,
the performance penalty for split lock is only going to get worse as numbers of
cores keeps increasing.

-Tony
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Pavel Machek 1 year, 6 months ago
On Wed 2022-09-28 20:24:23, Luck, Tony wrote:
> > So let's revamp the idea of having another option/mode for the split
> > lock detector, which is hereby called "seq" (based on the original
> > "sequential" naming in [0]). Also introduces a Kconfig option to give
> > the option of Linux vendors have a choice what mode should be their
> > default. While at it, fix/improve the documentation about bus locking.
> >
> > [1] https://github.com/doitsujin/dxvk/issues/2938
> 
> Why not just use the workaround suggested in that bug report:
> 
>    "so manual switching from default setting to split_lock_detect=off helps as workaround here"
> 
> If you add this extra mode, I'm going to argue that the kernel default
> should be "seq" rather than "warn". So these game players will need
> to add a split_lock_detect=off (or warn) option.

Kernel should not cause userland regressions, and this is one. That
should make it pretty clear what the solution is.

And no, I don't like CONFIG option, either.

Best regards,
								Pavel
-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Guilherme G. Piccoli 1 year, 6 months ago
On 06/10/2022 06:04, Pavel Machek wrote:
> [...]
>> Why not just use the workaround suggested in that bug report:
>>
>>    "so manual switching from default setting to split_lock_detect=off helps as workaround here"
>>
>> If you add this extra mode, I'm going to argue that the kernel default
>> should be "seq" rather than "warn". So these game players will need
>> to add a split_lock_detect=off (or warn) option.
> 
> Kernel should not cause userland regressions, and this is one. That
> should make it pretty clear what the solution is.
> 
> And no, I don't like CONFIG option, either.

Thanks for your opinion Pavel! Good thing is that seems everybody at
least understands the problem exists and is affecting userspace.

Now, about the action we should take: what if we go with a revert in
this "misery factor" based on Dave's reasoning? I have a patch ready and
tested, let me know otherwise or I'll submit it as a V2 by the end of
this week.

Cheers,


Guilherme
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Guilherme G. Piccoli 1 year, 6 months ago
On 28/09/2022 17:24, Luck, Tony wrote:
> [...] 
> Why not just use the workaround suggested in that bug report:
> 
>    "so manual switching from default setting to split_lock_detect=off helps as workaround here"
> 
> If you add this extra mode, I'm going to argue that the kernel default
> should be "seq" rather than "warn". So these game players will need
> to add a split_lock_detect=off (or warn) option.
> 

Hi Tony, thanks for your response. The workaround is the way to
circumvent the issue for now, but not all users want (or know how) to
deal with the kernel parameters. If a distro wants to default to show a
warning only, but don't break performance so hard, this would be
currently impossible.

I understand that maybe the kernel default could be seq, I could
re-submit - not a deal breaker, at least distros are given a way to
collect split lock reports without breaking!

The main/big issues here are two: defaulting to the disruptive behavior
(with no way of building a kernel not defaulting to that without
patching), and not having a way to warn about split locking without
breaking the performance, hence the new mode "seq".

> Has a bug report been filed against the God Of War game? Probably worth fixing,
> the performance penalty for split lock is only going to get worse as numbers of
> cores keeps increasing.
> 

Definitely worth fixing! And the plan is always to report it -
meanwhile, we'd like to have all users of old/proprietary software/games
to continue using the kernel the way it was before :-)

Regarding God of War specifically, I'm not sure but I can verify - the
devels should be made aware, for sure.

Cheers,


Guilherme
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Thomas Gleixner 1 year, 6 months ago
On Wed, Sep 28 2022 at 17:56, Guilherme G. Piccoli wrote:
> On 28/09/2022 17:24, Luck, Tony wrote:
>> [...] 
>> Why not just use the workaround suggested in that bug report:
>> 
>>    "so manual switching from default setting to split_lock_detect=off helps as workaround here"
>> 
>> If you add this extra mode, I'm going to argue that the kernel default
>> should be "seq" rather than "warn". So these game players will need
>> to add a split_lock_detect=off (or warn) option.
>> 
>
> Hi Tony, thanks for your response. The workaround is the way to
> circumvent the issue for now, but not all users want (or know how) to
> deal with the kernel parameters. If a distro wants to default to show a
> warning only, but don't break performance so hard, this would be
> currently impossible.

That Kconfig knob is patently bad. The only sane choice for a generic
distro kernel is to slow down the offenders simply because split lock is
a trivial unpriviledged DoS. Run a split locker in a tight loop and
watch your shiny new multicore system degrading into a machine from the
80s. So unless the distro provides a "special broken games" kernel the
users will still need to fiddle with the command line.

Attack vector prevention has precedence over broken applications. That's what
command line options or sysctls are for.

> The main/big issues here are two: defaulting to the disruptive behavior
> (with no way of building a kernel not defaulting to that without
> patching), and not having a way to warn about split locking without
> breaking the performance, hence the new mode "seq".

Which is a misnomer and tells absolutely nothing. If we add a new
parameter then we name it something like "mitigate" and make it the
default.

But a way better solution is to add a sysctl knob which allows to
disable the slowdown mechanics and that allows distros to give the user
an trivial knob in the GUI to switch to "I don't care. My broken game is
more important!" mode, while still maintaining the only sensible default
of preventing damage for the general use case of the generic distro
kernel.

Thanks,

        tglx
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Guilherme G. Piccoli 1 year, 6 months ago
On 06/10/2022 17:15, Thomas Gleixner wrote:
> [...]
> 
> But a way better solution is to add a sysctl knob which allows to
> disable the slowdown mechanics and that allows distros to give the user
> an trivial knob in the GUI to switch to "I don't care. My broken game is
> more important!" mode, while still maintaining the only sensible default
> of preventing damage for the general use case of the generic distro
> kernel.

OK, if nobody opposes I'll work on that sysctl approach then.
Cheers,


Guilherme
Re: [PATCH] x86/split_lock: Restore warn mode (and add a new one) to avoid userspace regression
Posted by Joshua Ashton 1 year, 6 months ago

On 9/28/22 17:54, Luck, Tony wrote:
>> So let's revamp the idea of having another option/mode for the split
>> lock detector, which is hereby called "seq" (based on the original
>> "sequential" naming in [0]). Also introduces a Kconfig option to give
>> the option of Linux vendors have a choice what mode should be their
>> default. While at it, fix/improve the documentation about bus locking.
>>
>> [1] https://github.com/doitsujin/dxvk/issues/2938
> 
> Why not just use the workaround suggested in that bug report:
> 
>     "so manual switching from default setting to split_lock_detect=off helps as workaround here"
> 
> If you add this extra mode, I'm going to argue that the kernel default
> should be "seq" rather than "warn". So these game players will need
> to add a split_lock_detect=off (or warn) option.
> 
> Has a bug report been filed against the God Of War game? Probably worth fixing,
> the performance penalty for split lock is only going to get worse as numbers of
> cores keeps increasing.

It's not just about God of War specifically.
There are many old titles that will never, ever, get updated to fix this 
problem.
These titles worked perfectly fine and were performant before.

I completely understand that split-locks are bad, and developers should 
fix their code, but applications paying penalties like this really 
doesn't make sense for the Linux desktop.

If server-oriented distributions want to set their default to `seq`, 
then that's completely fine, and I would completely encourage them doing 
that -- but I really don't think the Linux kernel should be essentially 
breaking compatibility with these applications by default and throwing 
desktop Linux usecases under the bus.

- Joshie 🐸✨

> 
> -Tony
>