[PATCH] scsi: storvsc: Fix scheduling while atomic on PREEMPT_RT

Jan Kiszka posted 1 patch 1 week, 2 days ago
drivers/scsi/storvsc_drv.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Posted by Jan Kiszka 1 week, 2 days ago
From: Jan Kiszka <jan.kiszka@siemens.com>

This resolves the following splat and lock-up when running with PREEMPT_RT
enabled on Hyper-V:

[  415.140818] BUG: scheduling while atomic: stress-ng-iomix/1048/0x00000002
[  415.140822] INFO: lockdep is turned off.
[  415.140823] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel aesni_intel rapl binfmt_misc nls_ascii nls_cp437 vfat fat snd_pcm hyperv_drm snd_timer drm_client_lib drm_shmem_helper snd sg soundcore drm_kms_helper pcspkr hv_balloon hv_utils evdev joydev drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs autofs4 ext4 crc16 mbcache jbd2 sr_mod sd_mod cdrom hv_storvsc serio_raw hid_generic scsi_transport_fc hid_hyperv scsi_mod hid hv_netvsc hyperv_keyboard scsi_common
[  415.140846] Preemption disabled at:
[  415.140847] [<ffffffffc0656171>] storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc]
[  415.140854] CPU: 8 UID: 0 PID: 1048 Comm: stress-ng-iomix Not tainted 6.19.0-rc7 #30 PREEMPT_{RT,(full)}
[  415.140856] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/04/2024
[  415.140857] Call Trace:
[  415.140861]  <TASK>
[  415.140861]  ? storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc]
[  415.140863]  dump_stack_lvl+0x91/0xb0
[  415.140870]  __schedule_bug+0x9c/0xc0
[  415.140875]  __schedule+0xdf6/0x1300
[  415.140877]  ? rtlock_slowlock_locked+0x56c/0x1980
[  415.140879]  ? rcu_is_watching+0x12/0x60
[  415.140883]  schedule_rtlock+0x21/0x40
[  415.140885]  rtlock_slowlock_locked+0x502/0x1980
[  415.140891]  rt_spin_lock+0x89/0x1e0
[  415.140893]  hv_ringbuffer_write+0x87/0x2a0
[  415.140899]  vmbus_sendpacket_mpb_desc+0xb6/0xe0
[  415.140900]  ? rcu_is_watching+0x12/0x60
[  415.140902]  storvsc_queuecommand+0x669/0xbe0 [hv_storvsc]
[  415.140904]  ? HARDIRQ_verbose+0x10/0x10
[  415.140908]  ? __rq_qos_issue+0x28/0x40
[  415.140911]  scsi_queue_rq+0x760/0xd80 [scsi_mod]
[  415.140926]  __blk_mq_issue_directly+0x4a/0xc0
[  415.140928]  blk_mq_issue_direct+0x87/0x2b0
[  415.140931]  blk_mq_dispatch_queue_requests+0x120/0x440
[  415.140933]  blk_mq_flush_plug_list+0x7a/0x1a0
[  415.140935]  __blk_flush_plug+0xf4/0x150
[  415.140940]  __submit_bio+0x2b2/0x5c0
[  415.140944]  ? submit_bio_noacct_nocheck+0x272/0x360
[  415.140946]  submit_bio_noacct_nocheck+0x272/0x360
[  415.140951]  ext4_read_bh_lock+0x3e/0x60 [ext4]
[  415.140995]  ext4_block_write_begin+0x396/0x650 [ext4]
[  415.141018]  ? __pfx_ext4_da_get_block_prep+0x10/0x10 [ext4]
[  415.141038]  ext4_da_write_begin+0x1c4/0x350 [ext4]
[  415.141060]  generic_perform_write+0x14e/0x2c0
[  415.141065]  ext4_buffered_write_iter+0x6b/0x120 [ext4]
[  415.141083]  vfs_write+0x2ca/0x570
[  415.141087]  ksys_write+0x76/0xf0
[  415.141089]  do_syscall_64+0x99/0x1490
[  415.141093]  ? rcu_is_watching+0x12/0x60
[  415.141095]  ? finish_task_switch.isra.0+0xdf/0x3d0
[  415.141097]  ? rcu_is_watching+0x12/0x60
[  415.141098]  ? lock_release+0x1f0/0x2a0
[  415.141100]  ? rcu_is_watching+0x12/0x60
[  415.141101]  ? finish_task_switch.isra.0+0xe4/0x3d0
[  415.141103]  ? rcu_is_watching+0x12/0x60
[  415.141104]  ? __schedule+0xb34/0x1300
[  415.141106]  ? hrtimer_try_to_cancel+0x1d/0x170
[  415.141109]  ? do_nanosleep+0x8b/0x160
[  415.141111]  ? hrtimer_nanosleep+0x89/0x100
[  415.141114]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  415.141116]  ? xfd_validate_state+0x26/0x90
[  415.141118]  ? rcu_is_watching+0x12/0x60
[  415.141120]  ? do_syscall_64+0x1e0/0x1490
[  415.141121]  ? do_syscall_64+0x1e0/0x1490
[  415.141123]  ? rcu_is_watching+0x12/0x60
[  415.141124]  ? do_syscall_64+0x1e0/0x1490
[  415.141125]  ? do_syscall_64+0x1e0/0x1490
[  415.141127]  ? irqentry_exit+0x140/0x7e0
[  415.141129]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

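get_cpu() disables preemption, but the spinlock taken by
hv_ringbuffer_write() is converted to an rt-mutex under PREEMPT_RT and may
therefore sleep. Use migrate_disable() instead: the task stays on the CPU
whose number is passed to storvsc_do_io() but remains preemptible.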

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

This is likely just the tip of the iceberg (see [1] in particular), but if
nobody starts addressing it, it will keep crashing ships, even if those
are only on test cruises. We are fully aware that Hyper-V provides no RT
guarantees for guests. A pragmatic alternative would be a simple

config HYPERV
    depends on !PREEMPT_RT

Please share your thoughts on whether this fix is worth it, or whether we
should stop chasing the splats that show up after it. As a potential next
step, we are considering threading some of the hv platform IRQs under
PREEMPT_RT.

TIA!

[1] https://lore.kernel.org/all/20230809-b4-rt_preempt-fix-v1-0-7283bbdc8b14@gmail.com/
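
For readers less familiar with the difference between the two pinning
primitives, here is a minimal user-space sketch of the reasoning in the
commit message. It is a toy model, not kernel code: every toy_* name is
hypothetical, and the "lock" only checks whether the caller would be
allowed to sleep, assuming the contended slow path of a PREEMPT_RT
spinlock has to schedule.

```c
/* Toy user-space model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_task {
	int preempt_count;      /* > 0: preemption disabled (atomic context) */
	int migration_disabled; /* > 0: task is pinned to its current CPU */
	int cpu;                /* CPU the task currently runs on */
	int splats;             /* "scheduling while atomic" reports so far */
};

/* Models get_cpu(): disable preemption, then return the current CPU. */
static int toy_get_cpu(struct toy_task *t)
{
	t->preempt_count++;
	return t->cpu;
}

/* Models put_cpu(): re-enable preemption; calls must be balanced. */
static void toy_put_cpu(struct toy_task *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

/* Models migrate_disable()/migrate_enable(): pin the task, stay preemptible. */
static void toy_migrate_disable(struct toy_task *t) { t->migration_disabled++; }
static void toy_migrate_enable(struct toy_task *t)
{
	assert(t->migration_disabled > 0);
	t->migration_disabled--;
}

/*
 * Models the contended slow path of a spinlock_t on PREEMPT_RT, which has
 * to sleep. Sleeping while preemption is disabled is the reported bug.
 * Returns true if the caller was allowed to sleep.
 */
static bool toy_rt_spin_lock_contended(struct toy_task *t)
{
	if (t->preempt_count > 0) {
		t->splats++;
		fprintf(stderr, "BUG: scheduling while atomic (toy model)\n");
		return false;
	}
	return true; /* a migration-disabled task may still sleep */
}

/* Old pattern: CPU number obtained via get_cpu(), lock taken while atomic. */
static int toy_submit_with_get_cpu(struct toy_task *t)
{
	int cpu = toy_get_cpu(t);

	(void)cpu; /* would be handed to the submission path */
	toy_rt_spin_lock_contended(t);
	toy_put_cpu(t);
	return t->splats;
}

/* New pattern: pin the task only, so the CPU number stays valid. */
static int toy_submit_with_migrate_disable(struct toy_task *t)
{
	int cpu;

	toy_migrate_disable(t);
	cpu = t->cpu; /* stable while migration is disabled */
	(void)cpu;
	toy_rt_spin_lock_contended(t);
	toy_migrate_enable(t);
	return t->splats;
}
```

In this model the get_cpu() variant records one splat per contended
submission, while the migrate_disable() variant records none, which is
the behaviour the patch is after.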

 drivers/scsi/storvsc_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index b43d876747b7..68c837146b9e 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1855,8 +1855,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	cmd_request->payload_sz = payload_sz;
 
 	/* Invokes the vsc to start an IO */
-	ret = storvsc_do_io(dev, cmd_request, get_cpu());
-	put_cpu();
+	migrate_disable();
+	ret = storvsc_do_io(dev, cmd_request, smp_processor_id());
+	migrate_enable();
 
 	if (ret)
 		scsi_dma_unmap(scmnd);
-- 
2.51.0
Re: [PATCH] scsi: storvsc: Fix scheduling while atomic on PREEMPT_RT
Posted by Bezdeka, Florian 2 days, 14 hours ago
On Thu, 2026-01-29 at 15:30 +0100, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This resolves the following splat and lock-up when running with PREEMPT_RT
> enabled on Hyper-V:
> 
> [  415.140818] BUG: scheduling while atomic: stress-ng-iomix/1048/0x00000002
> [  415.140822] INFO: lockdep is turned off.
> [  415.140823] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel aesni_intel rapl binfmt_misc nls_ascii nls_cp437 vfat fat snd_pcm hyperv_drm snd_timer drm_client_lib drm_shmem_helper snd sg soundcore drm_kms_helper pcspkr hv_balloon hv_utils evdev joydev drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs autofs4 ext4 crc16 mbcache jbd2 sr_mod sd_mod cdrom hv_storvsc serio_raw hid_generic scsi_transport_fc hid_hyperv scsi_mod hid hv_netvsc hyperv_keyboard scsi_common
> [  415.140846] Preemption disabled at:
> [  415.140847] [<ffffffffc0656171>] storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc]
> [  415.140854] CPU: 8 UID: 0 PID: 1048 Comm: stress-ng-iomix Not tainted 6.19.0-rc7 #30 PREEMPT_{RT,(full)}
> [  415.140856] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/04/2024
> [  415.140857] Call Trace:
> [  415.140861]  <TASK>
> [  415.140861]  ? storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc]
> [  415.140863]  dump_stack_lvl+0x91/0xb0
> [  415.140870]  __schedule_bug+0x9c/0xc0
> [  415.140875]  __schedule+0xdf6/0x1300
> [  415.140877]  ? rtlock_slowlock_locked+0x56c/0x1980
> [  415.140879]  ? rcu_is_watching+0x12/0x60
> [  415.140883]  schedule_rtlock+0x21/0x40
> [  415.140885]  rtlock_slowlock_locked+0x502/0x1980
> [  415.140891]  rt_spin_lock+0x89/0x1e0
> [  415.140893]  hv_ringbuffer_write+0x87/0x2a0
> [  415.140899]  vmbus_sendpacket_mpb_desc+0xb6/0xe0
> [  415.140900]  ? rcu_is_watching+0x12/0x60
> [  415.140902]  storvsc_queuecommand+0x669/0xbe0 [hv_storvsc]
> [  415.140904]  ? HARDIRQ_verbose+0x10/0x10
> [  415.140908]  ? __rq_qos_issue+0x28/0x40
> [  415.140911]  scsi_queue_rq+0x760/0xd80 [scsi_mod]
> [  415.140926]  __blk_mq_issue_directly+0x4a/0xc0
> [  415.140928]  blk_mq_issue_direct+0x87/0x2b0
> [  415.140931]  blk_mq_dispatch_queue_requests+0x120/0x440
> [  415.140933]  blk_mq_flush_plug_list+0x7a/0x1a0
> [  415.140935]  __blk_flush_plug+0xf4/0x150
> [  415.140940]  __submit_bio+0x2b2/0x5c0
> [  415.140944]  ? submit_bio_noacct_nocheck+0x272/0x360
> [  415.140946]  submit_bio_noacct_nocheck+0x272/0x360
> [  415.140951]  ext4_read_bh_lock+0x3e/0x60 [ext4]
> [  415.140995]  ext4_block_write_begin+0x396/0x650 [ext4]
> [  415.141018]  ? __pfx_ext4_da_get_block_prep+0x10/0x10 [ext4]
> [  415.141038]  ext4_da_write_begin+0x1c4/0x350 [ext4]
> [  415.141060]  generic_perform_write+0x14e/0x2c0
> [  415.141065]  ext4_buffered_write_iter+0x6b/0x120 [ext4]
> [  415.141083]  vfs_write+0x2ca/0x570
> [  415.141087]  ksys_write+0x76/0xf0
> [  415.141089]  do_syscall_64+0x99/0x1490
> [  415.141093]  ? rcu_is_watching+0x12/0x60
> [  415.141095]  ? finish_task_switch.isra.0+0xdf/0x3d0
> [  415.141097]  ? rcu_is_watching+0x12/0x60
> [  415.141098]  ? lock_release+0x1f0/0x2a0
> [  415.141100]  ? rcu_is_watching+0x12/0x60
> [  415.141101]  ? finish_task_switch.isra.0+0xe4/0x3d0
> [  415.141103]  ? rcu_is_watching+0x12/0x60
> [  415.141104]  ? __schedule+0xb34/0x1300
> [  415.141106]  ? hrtimer_try_to_cancel+0x1d/0x170
> [  415.141109]  ? do_nanosleep+0x8b/0x160
> [  415.141111]  ? hrtimer_nanosleep+0x89/0x100
> [  415.141114]  ? __pfx_hrtimer_wakeup+0x10/0x10
> [  415.141116]  ? xfd_validate_state+0x26/0x90
> [  415.141118]  ? rcu_is_watching+0x12/0x60
> [  415.141120]  ? do_syscall_64+0x1e0/0x1490
> [  415.141121]  ? do_syscall_64+0x1e0/0x1490
> [  415.141123]  ? rcu_is_watching+0x12/0x60
> [  415.141124]  ? do_syscall_64+0x1e0/0x1490
> [  415.141125]  ? do_syscall_64+0x1e0/0x1490
> [  415.141127]  ? irqentry_exit+0x140/0x7e0
> [  415.141129]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> get_cpu() disables preemption while the spinlock hv_ringbuffer_write is
> using is converted to an rt-mutex under PREEMPT_RT.

Tested-by: Florian Bezdeka <florian.bezdeka@siemens.com>

> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---

This patch survived a 24h stress test with CONFIG_PREEMPT_RT enabled and
heavy load applied to the system.

Without this patch, on the very same system configuration, the system
locks up within 2 minutes.

Best regards,
Florian

-- 
Siemens AG, Foundational Technologies
Linux Expert Center