[PATCH] lkdtm/bugs: add test for hung smp_call_function_single()

Mark Rutland posted 1 patch 4 weeks, 1 day ago
There is a newer version of this series
drivers/misc/lkdtm/bugs.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
[PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
Posted by Mark Rutland 4 weeks, 1 day ago
The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
smp_call_function*() calls (e.g. when the target CPU gets stuck within
the callback function). Testing this option requires triggering such
hangs.

This patch adds an lkdtm test with a hung smp_call_function_single()
callbac, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
of the hung target CPU).

On arm64 using pseudo-NMI, this looks like:

| # mount -t debugfs none /sys/kernel/debug/
| # echo CSDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry CSDLOCKUP
| smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5000001136 ns for CPU#01 __lkdtm_CSDLOCKUP+0x0/0x8(0x0).
| smp:     csd: CSD lock (#1) handling this request.
| Sending NMI from CPU 0 to CPUs 1:
| NMI backtrace for cpu 1
| CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.9.0-rc4-00001-gda84b9dede43 #7
| Hardware name: linux,dummy-virt (DT)
| pstate: 60401005 (nZCv daif +PAN -UAO -TCO -DIT +SSBS BTYPE=--)
| pc : __lkdtm_CSDLOCKUP+0x0/0x8
| lr : __flush_smp_call_function_queue+0x1b0/0x290
| sp : ffff80008000bf30
| pmr_save: 00000060
| x29: ffff80008000bf30 x28: fff00000c02dc500 x27: 0000000000000000
| x26: 0000000000000000 x25: fff00000c02dc500 x24: ffffa41b939aa140
| x23: ffffa41b939aa140 x22: 0000000000000000 x21: ffff80008066bc40
| x20: 0000000000000000 x19: 0000000000000000 x18: fff05be56bd37000
| x17: fff05be56bd07000 x16: ffff800080008000 x15: 00005b132023e6fd
| x14: 00005aeabb53d8c3 x13: 000000000000032e x12: 0000000000000001
| x11: 0000000000000040 x10: fff00000c003d0a8 x9 : fff00000c003d0a0
| x8 : fff00000c0400270 x7 : 0000000000000000 x6 : ffffa41b9251b810
| x5 : 0000000000000000 x4 : fff05be56bd07000 x3 : ffff80008000bf30
| x2 : fff05be56bd07000 x1 : ffffa41b939aa140 x0 : 0000000000000000
| Call trace:
|  __lkdtm_CSDLOCKUP+0x0/0x8
|  generic_smp_call_function_single_interrupt+0x14/0x20
|  ipi_handler+0xb8/0x178
|  handle_percpu_devid_irq+0x84/0x130
|  generic_handle_domain_irq+0x2c/0x44
|  gic_handle_irq+0x118/0x240
|  call_on_irq_stack+0x24/0x4c
|  do_interrupt_handler+0x80/0x84
|  el1_interrupt+0x44/0xc0
|  el1h_64_irq_handler+0x18/0x24
|  el1h_64_irq+0x78/0x7c
|  default_idle_call+0x40/0x60
|  do_idle+0x23c/0x2d0
|  cpu_startup_entry+0x38/0x3c
|  secondary_start_kernel+0x148/0x180
|  __secondary_switched+0xb8/0xbc
| CPU: 0 PID: 143 Comm: sh Not tainted 6.9.0-rc4-00001-gda84b9dede43 #7
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x90/0xe8
|  show_stack+0x18/0x24
|  dump_stack_lvl+0xac/0xe8
|  dump_stack+0x18/0x24
|  csd_lock_wait_toolong+0x268/0x338
|  smp_call_function_single+0x1dc/0x2f0
|  lkdtm_CSDLOCKUP+0xcc/0xfc
|  lkdtm_do_action+0x1c/0x38
|  direct_entry+0xbc/0x14c
|  full_proxy_write+0x60/0xb4
|  vfs_write+0xd0/0x35c
|  ksys_write+0x70/0x104
|  __arm64_sys_write+0x1c/0x28
|  invoke_syscall+0x48/0x114
|  el0_svc_common.constprop.0+0x40/0xe0
|  do_el0_svc+0x1c/0x28
|  el0_svc+0x38/0x108
|  el0t_64_sync_handler+0x120/0x12c
|  el0t_64_sync+0x1a4/0x1a8
| smp: csd: Continued non-responsive CSD lock (#1) on CPU#0, waiting 10000001888 ns for CPU#01 __lkdtm_CSDLOCKUP+0x0/0x8(0x0).
| smp:     csd: CSD lock (#1) handling this request.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
---
 drivers/misc/lkdtm/bugs.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

I wrote this because I needed to guide someone through debugging a hung
smp_call_function() call, and I needed examples with/without an NMI
backtrace. It seems like it'd be useful for testing the CSD lockup
detector and NMI backtrace code in future.

I'm not sure about the CSDLOCKUP name, but everything else I tried
didn't seem great either:

* IPILOCKUP sounds like it's testing IPIs generally
* SMPCALLLOCKUP and similar look weirdly long
* SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP

... and I'm happy to defer to Kees for the naming. ;)

Mark.

diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
index 5178c02b21eba..47cd1be09ac1f 100644
--- a/drivers/misc/lkdtm/bugs.c
+++ b/drivers/misc/lkdtm/bugs.c
@@ -286,6 +286,35 @@ static void lkdtm_HARDLOCKUP(void)
 		cpu_relax();
 }
 
+static void __lkdtm_CSDLOCKUP(void *unused)
+{
+	for (;;)
+		cpu_relax();
+}
+
+static void lkdtm_CSDLOCKUP(void)
+{
+	unsigned int cpu, target;
+
+	cpus_read_lock();
+
+	cpu = get_cpu();
+	target = cpumask_any_but(cpu_online_mask, cpu);
+
+	if (target >= nr_cpu_ids) {
+		pr_err("FAIL: no other online CPUs\n");
+		goto out_put_cpus;
+	}
+
+	smp_call_function_single(target, __lkdtm_CSDLOCKUP, NULL, 1);
+
+	pr_err("FAIL: did not hang\n");
+
+out_put_cpus:
+	put_cpu();
+	cpus_read_unlock();
+}
+
 static void lkdtm_SPINLOCKUP(void)
 {
 	/* Must be called twice to trigger. */
@@ -680,6 +709,7 @@ static struct crashtype crashtypes[] = {
 	CRASHTYPE(UNALIGNED_LOAD_STORE_WRITE),
 	CRASHTYPE(SOFTLOCKUP),
 	CRASHTYPE(HARDLOCKUP),
+	CRASHTYPE(CSDLOCKUP),
 	CRASHTYPE(SPINLOCKUP),
 	CRASHTYPE(HUNG_TASK),
 	CRASHTYPE(OVERFLOW_SIGNED),
-- 
2.30.2
Re: [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
Posted by Paul E. McKenney 3 weeks, 3 days ago
On Fri, Apr 19, 2024 at 11:34:52AM +0100, Mark Rutland wrote:
> The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
> smp_call_function*() calls (e.g. when the target CPU gets stuck within
> the callback function). Testing this option requires triggering such
> hangs.
> 
> This patch adds an lkdtm test with a hung smp_call_function_single()
> callbac, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
> backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
> of the hung target CPU).
> 
> On arm64 using pseudo-NMI, this looks like:
> 
> | # mount -t debugfs none /sys/kernel/debug/
> | # echo CSDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> | lkdtm: Performing direct entry CSDLOCKUP
> | smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5000001136 ns for CPU#01 __lkdtm_CSDLOCKUP+0x0/0x8(0x0).
> | smp:     csd: CSD lock (#1) handling this request.
> | Sending NMI from CPU 0 to CPUs 1:
> | NMI backtrace for cpu 1
> | CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.9.0-rc4-00001-gda84b9dede43 #7
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 60401005 (nZCv daif +PAN -UAO -TCO -DIT +SSBS BTYPE=--)
> | pc : __lkdtm_CSDLOCKUP+0x0/0x8
> | lr : __flush_smp_call_function_queue+0x1b0/0x290
> | sp : ffff80008000bf30
> | pmr_save: 00000060
> | x29: ffff80008000bf30 x28: fff00000c02dc500 x27: 0000000000000000
> | x26: 0000000000000000 x25: fff00000c02dc500 x24: ffffa41b939aa140
> | x23: ffffa41b939aa140 x22: 0000000000000000 x21: ffff80008066bc40
> | x20: 0000000000000000 x19: 0000000000000000 x18: fff05be56bd37000
> | x17: fff05be56bd07000 x16: ffff800080008000 x15: 00005b132023e6fd
> | x14: 00005aeabb53d8c3 x13: 000000000000032e x12: 0000000000000001
> | x11: 0000000000000040 x10: fff00000c003d0a8 x9 : fff00000c003d0a0
> | x8 : fff00000c0400270 x7 : 0000000000000000 x6 : ffffa41b9251b810
> | x5 : 0000000000000000 x4 : fff05be56bd07000 x3 : ffff80008000bf30
> | x2 : fff05be56bd07000 x1 : ffffa41b939aa140 x0 : 0000000000000000
> | Call trace:
> |  __lkdtm_CSDLOCKUP+0x0/0x8
> |  generic_smp_call_function_single_interrupt+0x14/0x20
> |  ipi_handler+0xb8/0x178
> |  handle_percpu_devid_irq+0x84/0x130
> |  generic_handle_domain_irq+0x2c/0x44
> |  gic_handle_irq+0x118/0x240
> |  call_on_irq_stack+0x24/0x4c
> |  do_interrupt_handler+0x80/0x84
> |  el1_interrupt+0x44/0xc0
> |  el1h_64_irq_handler+0x18/0x24
> |  el1h_64_irq+0x78/0x7c
> |  default_idle_call+0x40/0x60
> |  do_idle+0x23c/0x2d0
> |  cpu_startup_entry+0x38/0x3c
> |  secondary_start_kernel+0x148/0x180
> |  __secondary_switched+0xb8/0xbc
> | CPU: 0 PID: 143 Comm: sh Not tainted 6.9.0-rc4-00001-gda84b9dede43 #7
> | Hardware name: linux,dummy-virt (DT)
> | Call trace:
> |  dump_backtrace+0x90/0xe8
> |  show_stack+0x18/0x24
> |  dump_stack_lvl+0xac/0xe8
> |  dump_stack+0x18/0x24
> |  csd_lock_wait_toolong+0x268/0x338
> |  smp_call_function_single+0x1dc/0x2f0
> |  lkdtm_CSDLOCKUP+0xcc/0xfc
> |  lkdtm_do_action+0x1c/0x38
> |  direct_entry+0xbc/0x14c
> |  full_proxy_write+0x60/0xb4
> |  vfs_write+0xd0/0x35c
> |  ksys_write+0x70/0x104
> |  __arm64_sys_write+0x1c/0x28
> |  invoke_syscall+0x48/0x114
> |  el0_svc_common.constprop.0+0x40/0xe0
> |  do_el0_svc+0x1c/0x28
> |  el0_svc+0x38/0x108
> |  el0t_64_sync_handler+0x120/0x12c
> |  el0t_64_sync+0x1a4/0x1a8
> | smp: csd: Continued non-responsive CSD lock (#1) on CPU#0, waiting 10000001888 ns for CPU#01 __lkdtm_CSDLOCKUP+0x0/0x8(0x0).
> | smp:     csd: CSD lock (#1) handling this request.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  drivers/misc/lkdtm/bugs.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> I wrote this because I needed to guide someone through debugging a hung
> smp_call_function() call, and I needed examples with/without an NMI
> backtrace. It seems like it'd be useful for testing the CSD lockup
> detector and NMI backtrace code in future.
> 
> I'm not sure about the CSDLOCKUP name, but everything else I tried
> didn't seem great either:
> 
> * IPILOCKUP sounds like it's testing IPIs generally
> * SMPCALLLOCKUP and similar look weirdly long
> * SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP
> 
> ... and I'm happy to defer to Kees for the naming. ;)
> 
> Mark.
> 
> diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
> index 5178c02b21eba..47cd1be09ac1f 100644
> --- a/drivers/misc/lkdtm/bugs.c
> +++ b/drivers/misc/lkdtm/bugs.c
> @@ -286,6 +286,35 @@ static void lkdtm_HARDLOCKUP(void)
>  		cpu_relax();
>  }
>  
> +static void __lkdtm_CSDLOCKUP(void *unused)
> +{
> +	for (;;)
> +		cpu_relax();
> +}
> +
> +static void lkdtm_CSDLOCKUP(void)
> +{
> +	unsigned int cpu, target;
> +
> +	cpus_read_lock();
> +
> +	cpu = get_cpu();
> +	target = cpumask_any_but(cpu_online_mask, cpu);
> +
> +	if (target >= nr_cpu_ids) {
> +		pr_err("FAIL: no other online CPUs\n");
> +		goto out_put_cpus;
> +	}
> +
> +	smp_call_function_single(target, __lkdtm_CSDLOCKUP, NULL, 1);
> +
> +	pr_err("FAIL: did not hang\n");
> +
> +out_put_cpus:
> +	put_cpu();
> +	cpus_read_unlock();
> +}
> +
>  static void lkdtm_SPINLOCKUP(void)
>  {
>  	/* Must be called twice to trigger. */
> @@ -680,6 +709,7 @@ static struct crashtype crashtypes[] = {
>  	CRASHTYPE(UNALIGNED_LOAD_STORE_WRITE),
>  	CRASHTYPE(SOFTLOCKUP),
>  	CRASHTYPE(HARDLOCKUP),
> +	CRASHTYPE(CSDLOCKUP),
>  	CRASHTYPE(SPINLOCKUP),
>  	CRASHTYPE(HUNG_TASK),
>  	CRASHTYPE(OVERFLOW_SIGNED),
> -- 
> 2.30.2
>
Re: [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
Posted by Kees Cook 4 weeks, 1 day ago
On Fri, Apr 19, 2024 at 11:34:52AM +0100, Mark Rutland wrote:
> The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
> smp_call_function*() calls (e.g. when the target CPU gets stuck within
> the callback function). Testing this option requires triggering such
> hangs.
> 
> This patch adds an lkdtm test with a hung smp_call_function_single()
> callbac, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
> backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
> of the hung target CPU).
> 
> On arm64 using pseudo-NMI, this looks like:
> 
> | # mount -t debugfs none /sys/kernel/debug/
> | # echo CSDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> | lkdtm: Performing direct entry CSDLOCKUP
> | smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5000001136 ns for CPU#01 __lkdtm_CSDLOCKUP+0x0/0x8(0x0).
> | smp:     csd: CSD lock (#1) handling this request.
> | Sending NMI from CPU 0 to CPUs 1:
> | NMI backtrace for cpu 1
> | CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.9.0-rc4-00001-gda84b9dede43 #7
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 60401005 (nZCv daif +PAN -UAO -TCO -DIT +SSBS BTYPE=--)
> | pc : __lkdtm_CSDLOCKUP+0x0/0x8
> | lr : __flush_smp_call_function_queue+0x1b0/0x290
> | sp : ffff80008000bf30
> | pmr_save: 00000060
> | x29: ffff80008000bf30 x28: fff00000c02dc500 x27: 0000000000000000
> | x26: 0000000000000000 x25: fff00000c02dc500 x24: ffffa41b939aa140
> | x23: ffffa41b939aa140 x22: 0000000000000000 x21: ffff80008066bc40
> | x20: 0000000000000000 x19: 0000000000000000 x18: fff05be56bd37000
> | x17: fff05be56bd07000 x16: ffff800080008000 x15: 00005b132023e6fd
> | x14: 00005aeabb53d8c3 x13: 000000000000032e x12: 0000000000000001
> | x11: 0000000000000040 x10: fff00000c003d0a8 x9 : fff00000c003d0a0
> | x8 : fff00000c0400270 x7 : 0000000000000000 x6 : ffffa41b9251b810
> | x5 : 0000000000000000 x4 : fff05be56bd07000 x3 : ffff80008000bf30
> | x2 : fff05be56bd07000 x1 : ffffa41b939aa140 x0 : 0000000000000000
> | Call trace:
> |  __lkdtm_CSDLOCKUP+0x0/0x8
> |  generic_smp_call_function_single_interrupt+0x14/0x20
> |  ipi_handler+0xb8/0x178
> |  handle_percpu_devid_irq+0x84/0x130
> |  generic_handle_domain_irq+0x2c/0x44
> |  gic_handle_irq+0x118/0x240
> |  call_on_irq_stack+0x24/0x4c
> |  do_interrupt_handler+0x80/0x84
> |  el1_interrupt+0x44/0xc0
> |  el1h_64_irq_handler+0x18/0x24
> |  el1h_64_irq+0x78/0x7c
> |  default_idle_call+0x40/0x60
> |  do_idle+0x23c/0x2d0
> |  cpu_startup_entry+0x38/0x3c
> |  secondary_start_kernel+0x148/0x180
> |  __secondary_switched+0xb8/0xbc
> | CPU: 0 PID: 143 Comm: sh Not tainted 6.9.0-rc4-00001-gda84b9dede43 #7
> | Hardware name: linux,dummy-virt (DT)
> | Call trace:
> |  dump_backtrace+0x90/0xe8
> |  show_stack+0x18/0x24
> |  dump_stack_lvl+0xac/0xe8
> |  dump_stack+0x18/0x24
> |  csd_lock_wait_toolong+0x268/0x338
> |  smp_call_function_single+0x1dc/0x2f0
> |  lkdtm_CSDLOCKUP+0xcc/0xfc
> |  lkdtm_do_action+0x1c/0x38
> |  direct_entry+0xbc/0x14c
> |  full_proxy_write+0x60/0xb4
> |  vfs_write+0xd0/0x35c
> |  ksys_write+0x70/0x104
> |  __arm64_sys_write+0x1c/0x28
> |  invoke_syscall+0x48/0x114
> |  el0_svc_common.constprop.0+0x40/0xe0
> |  do_el0_svc+0x1c/0x28
> |  el0_svc+0x38/0x108
> |  el0t_64_sync_handler+0x120/0x12c
> |  el0t_64_sync+0x1a4/0x1a8
> | smp: csd: Continued non-responsive CSD lock (#1) on CPU#0, waiting 10000001888 ns for CPU#01 __lkdtm_CSDLOCKUP+0x0/0x8(0x0).
> | smp:     csd: CSD lock (#1) handling this request.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> ---
>  drivers/misc/lkdtm/bugs.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> I wrote this because I needed to guide someone through debugging a hung
> smp_call_function() call, and I needed examples with/without an NMI
> backtrace. It seems like it'd be useful for testing the CSD lockup
> detector and NMI backtrace code in future.

Like the other lockup detector, I suspect we should skip it by default
in the selftests? Something like this:

diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
index 368973f05250..32baddc2c85d 100644
--- a/tools/testing/selftests/lkdtm/tests.txt
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -31,6 +31,7 @@ SLAB_FREE_CROSS
 SLAB_FREE_PAGE
 #SOFTLOCKUP Hangs the system
 #HARDLOCKUP Hangs the system
+#CSDLOCKUP Hangs the system
 #SPINLOCKUP Hangs the system
 #HUNG_TASK Hangs the system
 EXEC_DATA

> 
> I'm not sure about the CSDLOCKUP name, but everything else I tried
> didn't seem great either:
> 
> * IPILOCKUP sounds like it's testing IPIs generally
> * SMPCALLLOCKUP and similar look weirdly long
> * SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP
> 
> ... and I'm happy to defer to Kees for the naming. ;)

It looks like it's only useful with CSD lockup detector? If that's true,
sure, this name is fine.

> [...]
> diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
> index 5178c02b21eba..47cd1be09ac1f 100644
> --- a/drivers/misc/lkdtm/bugs.c
> +++ b/drivers/misc/lkdtm/bugs.c
> @@ -286,6 +286,35 @@ static void lkdtm_HARDLOCKUP(void)
>  		cpu_relax();
>  }
>  
> +static void __lkdtm_CSDLOCKUP(void *unused)
> +{
> +	for (;;)
> +		cpu_relax();
> +}
> +
> +static void lkdtm_CSDLOCKUP(void)
> +{
> +	unsigned int cpu, target;
> +
> +	cpus_read_lock();
> +
> +	cpu = get_cpu();
> +	target = cpumask_any_but(cpu_online_mask, cpu);
> +
> +	if (target >= nr_cpu_ids) {
> +		pr_err("FAIL: no other online CPUs\n");
> +		goto out_put_cpus;
> +	}
> +
> +	smp_call_function_single(target, __lkdtm_CSDLOCKUP, NULL, 1);
> +
> +	pr_err("FAIL: did not hang\n");
> +
> +out_put_cpus:
> +	put_cpu();
> +	cpus_read_unlock();
> +}
> +
>  static void lkdtm_SPINLOCKUP(void)
>  {
>  	/* Must be called twice to trigger. */
> @@ -680,6 +709,7 @@ static struct crashtype crashtypes[] = {
>  	CRASHTYPE(UNALIGNED_LOAD_STORE_WRITE),
>  	CRASHTYPE(SOFTLOCKUP),
>  	CRASHTYPE(HARDLOCKUP),
> +	CRASHTYPE(CSDLOCKUP),
>  	CRASHTYPE(SPINLOCKUP),
>  	CRASHTYPE(HUNG_TASK),
>  	CRASHTYPE(OVERFLOW_SIGNED),
> -- 
> 2.30.2

Seems good to me.

-- 
Kees Cook
Re: [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
Posted by Mark Rutland 3 weeks, 4 days ago
On Fri, Apr 19, 2024 at 02:53:59PM -0700, Kees Cook wrote:
> On Fri, Apr 19, 2024 at 11:34:52AM +0100, Mark Rutland wrote:
> > The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
> > smp_call_function*() calls (e.g. when the target CPU gets stuck within
> > the callback function). Testing this option requires triggering such
> > hangs.
> > 
> > This patch adds an lkdtm test with a hung smp_call_function_single()
> > callbac, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
> > backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
> > of the hung target CPU).

[...]

> > I wrote this because I needed to guide someone through debugging a hung
> > smp_call_function() call, and I needed examples with/without an NMI
> > backtrace. It seems like it'd be useful for testing the CSD lockup
> > detector and NMI backtrace code in future.
> 
> Like the other lockup detector, I suspect we should skip it by default
> in the selftests? Something like this:
> 
> diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
> index 368973f05250..32baddc2c85d 100644
> --- a/tools/testing/selftests/lkdtm/tests.txt
> +++ b/tools/testing/selftests/lkdtm/tests.txt
> @@ -31,6 +31,7 @@ SLAB_FREE_CROSS
>  SLAB_FREE_PAGE
>  #SOFTLOCKUP Hangs the system
>  #HARDLOCKUP Hangs the system
> +#CSDLOCKUP Hangs the system
>  #SPINLOCKUP Hangs the system
>  #HUNG_TASK Hangs the system
>  EXEC_DATA

Ah, I wasn't ware of that file, yes.

> > I'm not sure about the CSDLOCKUP name, but everything else I tried
> > didn't seem great either:
> > 
> > * IPILOCKUP sounds like it's testing IPIs generally
> > * SMPCALLLOCKUP and similar look weirdly long
> > * SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP
> > 
> > ... and I'm happy to defer to Kees for the naming. ;)
> 
> It looks like it's only useful with CSD lockup detector? If that's true,
> sure, this name is fine.

I think it's also useful for testing other things (e.g. RCU stall detection),
so how about we go with SMP_CALL_LOCKUP, as that says what the test does rather
than what specifically it can be used to test?

Mark.
Re: [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
Posted by Kees Cook 3 weeks, 4 days ago
On Tue, Apr 23, 2024 at 10:47:29AM +0100, Mark Rutland wrote:
> On Fri, Apr 19, 2024 at 02:53:59PM -0700, Kees Cook wrote:
> > On Fri, Apr 19, 2024 at 11:34:52AM +0100, Mark Rutland wrote:
> > > The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
> > > smp_call_function*() calls (e.g. when the target CPU gets stuck within
> > > the callback function). Testing this option requires triggering such
> > > hangs.
> > > 
> > > This patch adds an lkdtm test with a hung smp_call_function_single()
> > > callbac, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
> > > backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
> > > of the hung target CPU).
> 
> [...]
> 
> > > I wrote this because I needed to guide someone through debugging a hung
> > > smp_call_function() call, and I needed examples with/without an NMI
> > > backtrace. It seems like it'd be useful for testing the CSD lockup
> > > detector and NMI backtrace code in future.
> > 
> > Like the other lockup detector, I suspect we should skip it by default
> > in the selftests? Something like this:
> > 
> > diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
> > index 368973f05250..32baddc2c85d 100644
> > --- a/tools/testing/selftests/lkdtm/tests.txt
> > +++ b/tools/testing/selftests/lkdtm/tests.txt
> > @@ -31,6 +31,7 @@ SLAB_FREE_CROSS
> >  SLAB_FREE_PAGE
> >  #SOFTLOCKUP Hangs the system
> >  #HARDLOCKUP Hangs the system
> > +#CSDLOCKUP Hangs the system
> >  #SPINLOCKUP Hangs the system
> >  #HUNG_TASK Hangs the system
> >  EXEC_DATA
> 
> Ah, I wasn't ware of that file, yes.
> 
> > > I'm not sure about the CSDLOCKUP name, but everything else I tried
> > > didn't seem great either:
> > > 
> > > * IPILOCKUP sounds like it's testing IPIs generally
> > > * SMPCALLLOCKUP and similar look weirdly long
> > > * SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP
> > > 
> > > ... and I'm happy to defer to Kees for the naming. ;)
> > 
> > It looks like it's only useful with CSD lockup detector? If that's true,
> > sure, this name is fine.
> 
> I think it's also useful for testing other things (e.g. RCU stall detection),
> so how about we go with SMP_CALL_LOCKUP, as that says what the test does rather
> than what specifically it can be used to test?

Yeah, that works for me. Thanks!

-Kees

-- 
Kees Cook