[PATCH] dma-buf/sw-sync: Hide the feature by default

Janusz Krzysztofik posted 1 patch 1 week, 2 days ago
drivers/dma-buf/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Posted by Janusz Krzysztofik 1 week, 2 days ago
When multiple fences of an sw_sync timeline are signaled via
sw_sync_ioctl_inc(), we now disable interrupts and keep them disabled
while signaling all requested fences of the timeline in a loop.  Since
user space may set up an arbitrarily long timeline of fences with
arbitrarily expensive callbacks added to each fence, we may end up running
with interrupts disabled for too long, longer than the NMI watchdog limit.
That potentially risky scenario has been demonstrated on Intel DRM CI
trybot[1], on a low-end machine fi-pnv-d510, with one of the new IGT
subtests that tried to reimplement wait_* test cases of a dma_fence_chain
selftest in user space.
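
To make the concern above concrete, here is a minimal user-space model of it, not kernel code.  All names (model_fence, model_timeline, model_timeline_inc, cb_cost) are hypothetical and exist only for illustration; the real sw_sync code differs in locking and data structures.  The sketch only shows that the work done in one uninterrupted section grows with both the number of fences and the cost of their callbacks, neither of which is bounded from user space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model, not kernel code. */
struct model_fence {
    unsigned int seqno;     /* timeline value at which the fence signals */
    unsigned int cb_cost;   /* abstract cost of the callbacks attached to it */
    int signaled;
};

struct model_timeline {
    unsigned int value;
    struct model_fence *fences;
    size_t count;
};

/*
 * Advance the timeline by inc and signal every fence that became ready.
 * Returns the total callback cost accumulated inside the section that, in
 * the scenario described above, would run with interrupts disabled.
 */
static unsigned long model_timeline_inc(struct model_timeline *tl,
                                        unsigned int inc)
{
    unsigned long irq_off_work = 0;

    tl->value += inc;
    /* interrupts would be disabled here */
    for (size_t i = 0; i < tl->count; i++) {
        struct model_fence *f = &tl->fences[i];

        if (!f->signaled && f->seqno <= tl->value) {
            f->signaled = 1;
            irq_off_work += f->cb_cost; /* callbacks run in the same section */
        }
    }
    /* interrupts would be re-enabled here */
    return irq_off_work;
}

/* Example: 1000 fences with cost 5 each, all signaled by one increment. */
static void model_demo(void)
{
    static struct model_fence fences[1000];
    struct model_timeline tl = { 0, fences, 1000 };

    for (size_t i = 0; i < 1000; i++) {
        fences[i].seqno = (unsigned int)i + 1;
        fences[i].cb_cost = 5;
        fences[i].signaled = 0;
    }
    assert(model_timeline_inc(&tl, 1000) == 5000);
    assert(model_timeline_inc(&tl, 1) == 0); /* nothing left to signal */
}
```

In this model a single increment by N pays for all N callbacks at once, which is the shape of the problem: the length of the critical section is chosen by whoever builds the timeline.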

[141.993704] [IGT] syncobj_timeline: starting subtest stress-enable-all-signal-all-forward
[164.964389] watchdog: CPU3: Watchdog detected hard LOCKUP on cpu 3
[164.964407] Modules linked in: snd_hda_codec_alc662 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore i915 prime_numbers ttm drm_buddy drm_display_helper cec rc_core i2c_algo_bit video wmi overlay at24 ppdev gpio_ich binfmt_misc nls_iso8859_1 coretemp i2c_i801 i2c_mux i2c_smbus r8169 lpc_ich realtek parport_pc parport nvme_fabrics dm_multipath fuse msr efi_pstore nfnetlink autofs4
[164.964569] irq event stamp: 1002206
[164.964575] hardirqs last  enabled at (1002205): [<ffffffff82898ac7>] _raw_spin_unlock_irq+0x27/0x70
[164.964599] hardirqs last disabled at (1002206): [<ffffffff8287d021>] sysvec_irq_work+0x11/0xc0
[164.964616] softirqs last  enabled at (1002138): [<ffffffff81341bc5>] fpu_clone+0xb5/0x270
[164.964631] softirqs last disabled at (1002136): [<ffffffff81341b97>] fpu_clone+0x87/0x270
[164.964650] CPU: 3 UID: 0 PID: 1515 Comm: syncobj_timelin Tainted: G     U              6.17.0-rc6-Trybot_154715v1-gc1b827f32471+ #1 PREEMPT(voluntary)
[164.964662] Tainted: [U]=USER
[164.964665] Hardware name:  /D510MO, BIOS MOPNV10J.86A.0311.2010.0802.2346 08/02/2010
[164.964669] RIP: 0010:lock_release+0x13d/0x2a0
[164.964680] Code: c2 01 48 8d 4d c8 44 89 f6 4c 89 ef e8 bc fc ff ff 0b 05 96 ca 42 06 0f 84 fc 00 00 00 b8 ff ff ff ff 65 0f c1 05 0b 71 a9 02 <83> f8 01 0f 85 2f 01 00 00 48 f7 45 c0 00 02 00 00 74 06 fb 0f 1f
[164.964686] RSP: 0018:ffffc90000170e70 EFLAGS: 00000057
[164.964693] RAX: 0000000000000001 RBX: ffffffff83595520 RCX: 0000000000000000
[164.964698] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[164.964701] RBP: ffffc90000170eb0 R08: 0000000000000000 R09: 0000000000000000
[164.964706] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8226a948
[164.964710] R13: ffff88802423b340 R14: 0000000000000001 R15: ffff88802423c238
[164.964714] FS:  0000729f4d972940(0000) GS:ffff8880f8e77000(0000) knlGS:0000000000000000
[164.964720] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[164.964725] CR2: 0000729f4d92e720 CR3: 000000003afe4000 CR4: 00000000000006f0
[164.964729] Call Trace:
[164.964734]  <IRQ>
[164.964750]  dma_fence_chain_get_prev+0x13d/0x240
[164.964769]  dma_fence_chain_walk+0xbd/0x200
[164.964784]  dma_fence_chain_enable_signaling+0xb2/0x280
[164.964803]  dma_fence_chain_irq_work+0x1b/0x80
[164.964816]  irq_work_single+0x75/0xa0
[164.964834]  irq_work_run_list+0x33/0x60
[164.964846]  irq_work_run+0x18/0x40
[164.964856]  __sysvec_irq_work+0x35/0x170
[164.964868]  sysvec_irq_work+0x9b/0xc0
[164.964879]  </IRQ>
[164.964882]  <TASK>
[164.964890]  asm_sysvec_irq_work+0x1b/0x20
[164.964900] RIP: 0010:_raw_spin_unlock_irq+0x2d/0x70
[164.964907] Code: 00 00 55 48 89 e5 53 48 89 fb 48 83 c7 18 48 8b 75 08 e8 06 63 bf fe 48 89 df e8 be 98 bf fe e8 59 ee d3 fe fb 0f 1f 44 00 00 <65> ff 0d 5c 85 68 01 74 14 48 8b 5d f8 c9 31 c0 31 d2 31 c9 31 f6
[164.964913] RSP: 0018:ffffc9000070fca0 EFLAGS: 00000246
[164.964919] RAX: 0000000000000000 RBX: ffff88800c2d8b10 RCX: 0000000000000000
[164.964923] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[164.964927] RBP: ffffc9000070fca8 R08: 0000000000000000 R09: 0000000000000000
[164.964931] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88800c2d8ac0
[164.964934] R13: ffffc9000070fcc8 R14: ffff88800c2d8ac0 R15: 00000000ffffffff
[164.964967]  sync_timeline_signal+0x153/0x2c0
[164.964989]  sw_sync_ioctl+0x98/0x580
[164.965017]  __x64_sys_ioctl+0xa2/0x100
[164.965034]  x64_sys_call+0x1226/0x2680
[164.965046]  do_syscall_64+0x93/0x980
[164.965057]  ? do_syscall_64+0x1b7/0x980
[164.965070]  ? lock_release+0xce/0x2a0
[164.965082]  ? __might_fault+0x53/0xb0
[164.965096]  ? __might_fault+0x89/0xb0
[164.965104]  ? __might_fault+0x53/0xb0
[164.965116]  ? _copy_to_user+0x53/0x70
[164.965131]  ? __x64_sys_rt_sigprocmask+0x8f/0xe0
[164.965152]  ? do_syscall_64+0x1b7/0x980
[164.965169]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[164.965176] RIP: 0033:0x729f4fb24ded
[164.965188] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[164.965193] RSP: 002b:00007ffdc36220e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[164.965200] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 0000729f4fb24ded
[164.965205] RDX: 00007ffdc3622174 RSI: 0000000040045701 RDI: 0000000000000007
[164.965209] RBP: 00007ffdc3622130 R08: 0000000000000000 R09: 0000000000000000
[164.965213] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdc3622174
[164.965217] R13: 0000000040045701 R14: 0000000000000007 R15: 0000000000000003
[164.965248]  </TASK>
[166.952984] perf: interrupt took too long (11861 > 6217), lowering kernel.perf_event_max_sample_rate to 16000
[166.953134] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 13036276804 wd_nsec: 13036274445
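
A note for readers decoding the trace: ORIG_RAX 0x10 in the user-space register dump is syscall 16, which is ioctl on x86-64, and RSI (also R13) holds the request value 0x40045701.  The sketch below splits that value using the generic Linux ioctl number layout (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); the layout is an assumption that holds for x86-64, and the helper names are made up for illustration.  The result, a write-direction request of type 'W', number 1, size 4, matches _IOW('W', 1, __u32), which as far as I can tell is how drivers/dma-buf/sw_sync.c defines SW_SYNC_IOC_INC.  That last point is my reading of the source, not something stated in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Linux ioctl request number (asm-generic layout assumed). */
struct ioctl_fields {
    unsigned int dir;   /* 0 = none, 1 = write (user to kernel), 2 = read */
    unsigned int size;  /* size of the argument in bytes */
    char type;          /* "magic" character of the driver */
    unsigned int nr;    /* command number within the driver */
};

/* Hypothetical helper: split a request value into its four fields. */
static struct ioctl_fields decode_ioctl(uint32_t req)
{
    struct ioctl_fields f;

    f.nr   = req & 0xff;
    f.type = (char)((req >> 8) & 0xff);
    f.size = (req >> 16) & 0x3fff;
    f.dir  = (req >> 30) & 0x3;
    return f;
}

/* Check the value taken from RSI in the trace above. */
static void check_trace_request(void)
{
    struct ioctl_fields f = decode_ioctl(0x40045701u);

    assert(f.dir == 1);     /* write direction */
    assert(f.size == 4);    /* sizeof(__u32) */
    assert(f.type == 'W');
    assert(f.nr == 1);
}
```

This is consistent with the kernel side of the trace, where the syscall lands in sw_sync_ioctl() and then sync_timeline_signal().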

As explained by Christian König[2], "The purpose of the sw-sync is to
test what happens if drivers exposing dma-fences doesn't behave well.  So
being able to trigger the NMI watchdog for example is part of why that
functionality exists in the first place. ... You can actually use the
functionality to intentionally deadlock drivers and even the core memory
management."

Let the feature show up only if EXPERT is selected.

[1] https://patchwork.freedesktop.org/series/154715/
[2] https://patchwork.freedesktop.org/patch/675579/#comment_1239269

Fixes: 35538d7822e86 ("dma-buf/sw_sync: de-stage SW_SYNC")
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/dma-buf/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
index b46eb8a552d7b..e726948b64f67 100644
--- a/drivers/dma-buf/Kconfig
+++ b/drivers/dma-buf/Kconfig
@@ -18,7 +18,7 @@ config SYNC_FILE
 	  Documentation/driver-api/sync_file.rst.
 
 config SW_SYNC
-	bool "Sync File Validation Framework"
+	bool "Sync File Validation Framework" if EXPERT
 	default n
 	depends on SYNC_FILE
 	depends on DEBUG_FS
-- 
2.51.0

Re: [PATCH] dma-buf/sw-sync: Hide the feature by default
Posted by Christian König 1 week, 2 days ago
On 22.09.25 15:24, Janusz Krzysztofik wrote:
> When multiple fences of an sw_sync timeline are signaled via
> sw_sync_ioctl_inc(), we now disable interrupts and keep them disabled
> while signaling all requested fences of the timeline in a loop.  Since
> user space may set up an arbitrarily long timeline of fences with
> arbitrarily expensive callbacks added to each fence, we may end up running
> with interrupts disabled for too long, longer than the NMI watchdog limit.
> That potentially risky scenario has been demonstrated on Intel DRM CI
> trybot[1], on a low-end machine fi-pnv-d510, with one of the new IGT
> subtests that tried to reimplement wait_* test cases of a dma_fence_chain
> selftest in user space.
> 
> As explained by Christian König[2], "The purpose of the sw-sync is to
> test what happens if drivers exposing dma-fences doesn't behave well.  So
> being able to trigger the NMI watchdog for example is part of why that
> functionality exists in the first place. ... You can actually use the
> functionality to intentionally deadlock drivers and even the core memory
> management."
> 
> Let the feature show up only if EXPERT is selected.
> 
> [1] https://patchwork.freedesktop.org/series/154715/
> [2] https://patchwork.freedesktop.org/patch/675579/#comment_1239269
> 
> Fixes: 35538d7822e86 ("dma-buf/sw_sync: de-stage SW_SYNC")
> Cc: Christian König <christian.koenig@amd.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

Good idea. We have previously discussed tainting the kernel if sw_sync is used, but this is also clearly a step in the right direction.

Reviewed-by: Christian König <christian.koenig@amd.com>

Regards,
Christian.
