[RFC v5 0/7] Add packed format to shadow virtqueue

Posted by Sahil Siddiq 5 days, 15 hours ago
Hi,

I managed to fix a few issues while testing this patch series.
There is still one issue that I am unable to resolve. I thought
I would send this patch series for review in case I have missed
something.

The issue is that this patch series does not work every time. When
it does work, I am able to ping L0 from L2 and vice versa via the
packed SVQ.

When this doesn't work, both VMs throw a "Destination Host
Unreachable" error. This is sometimes (not always) accompanied
by the following kernel error, thrown by the L2 kernel:

virtio_net virtio1: output.0:id 1 is not a head!

The error is not always thrown, but when it is, the id varies.
It is invariably followed by a soft lockup:

[  284.662292] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [swapper/1:0]
[  284.662292] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class vfg
[  284.662292] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.7-200.fc39.x86_64 #1
[  284.662292] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  284.662292] RIP: 0010:virtqueue_enable_cb_delayed+0x115/0x150
[  284.662292] Code: 44 77 04 0f ae f0 48 8b 42 70 0f b7 40 02 66 2b 42 50 66 39 c1 0f 93 c0 c3 cc cc cc cc 66 87 44 77 04 eb e2 f0 83 44 24 fc 00 <e9> 5a f1
[  284.662292] RSP: 0018:ffffb8f000100cb0 EFLAGS: 00000246
[  284.662292] RAX: 0000000000000000 RBX: ffff96f20204d800 RCX: ffff96f206f5e000
[  284.662292] RDX: ffff96f2054fd900 RSI: ffffb8f000100c7c RDI: ffff96f2054fd900
[  284.662292] RBP: ffff96f2078bb000 R08: 0000000000000001 R09: 0000000000000001
[  284.662292] R10: ffff96f2078bb000 R11: 0000000000000005 R12: ffff96f207bb4a00
[  284.662292] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96f20452fd00
[  284.662292] FS:  0000000000000000(0000) GS:ffff96f27bc80000(0000) knlGS:0000000000000000
[  284.662292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  284.662292] CR2: 00007f2a9ca191e8 CR3: 0000000136422003 CR4: 0000000000770ef0
[  284.662292] PKRU: 55555554
[  284.662292] Call Trace:
[  284.662292]  <IRQ>
[  284.662292]  ? watchdog_timer_fn+0x1e6/0x270
[  284.662292]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  284.662292]  ? __hrtimer_run_queues+0x10f/0x2b0
[  284.662292]  ? hrtimer_interrupt+0xf8/0x230
[  284.662292]  ? __sysvec_apic_timer_interrupt+0x4d/0x140
[  284.662292]  ? sysvec_apic_timer_interrupt+0x39/0x90
[  284.662292]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  284.662292]  ? virtqueue_enable_cb_delayed+0x115/0x150
[  284.662292]  start_xmit+0x2a6/0x4f0 [virtio_net]
[  284.662292]  ? netif_skb_features+0x98/0x300
[  284.662292]  dev_hard_start_xmit+0x61/0x1d0
[  284.662292]  sch_direct_xmit+0xa4/0x390
[  284.662292]  __dev_queue_xmit+0x84f/0xdc0
[  284.662292]  ? nf_hook_slow+0x42/0xf0
[  284.662292]  ip_finish_output2+0x2b8/0x580
[  284.662292]  igmp_ifc_timer_expire+0x1d5/0x430
[  284.662292]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
[  284.662292]  call_timer_fn+0x21/0x130
[  284.662292]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
[  284.662292]  __run_timers+0x21f/0x2b0
[  284.662292]  run_timer_softirq+0x1d/0x40
[  284.662292]  __do_softirq+0xc9/0x2c8
[  284.662292]  __irq_exit_rcu+0xa6/0xc0
[  284.662292]  sysvec_apic_timer_interrupt+0x72/0x90
[  284.662292]  </IRQ>
[  284.662292]  <TASK>
[  284.662292]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  284.662292] RIP: 0010:pv_native_safe_halt+0xf/0x20
[  284.662292] Code: 22 d7 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 53 75 3f 00 fb f4 <c3> cc c0
[  284.662292] RSP: 0018:ffffb8f0000b3ed8 EFLAGS: 00000212
[  284.662292] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
[  284.662292] RDX: 4000000000000000 RSI: 0000000000000083 RDI: 00000000000289ec
[  284.662292] RBP: ffff96f200810000 R08: 0000000000000000 R09: 0000000000000001
[  284.662292] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  284.662292] R13: 0000000000000000 R14: ffff96f200810000 R15: 0000000000000000
[  284.662292]  default_idle+0x9/0x20
[  284.662292]  default_idle_call+0x2c/0xe0
[  284.662292]  do_idle+0x226/0x270
[  284.662292]  cpu_startup_entry+0x2a/0x30
[  284.662292]  start_secondary+0x11e/0x140
[  284.662292]  secondary_startup_64_no_verify+0x184/0x18b
[  284.662292]  </TASK>

The soft lockup seems to happen in
drivers/net/virtio_net.c:start_xmit() [1].

I don't think the issue is in the kernel: I haven't seen any
problems when testing my changes with split vqs. Only packed vqs
are affected.

L0 kernel version: 6.12.13-1-lts

QEMU command to boot L1:

$ sudo ./qemu/build/qemu-system-x86_64 \
-enable-kvm \
-drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
-net nic,model=virtio \
-net user,hostfwd=tcp::2222-:22 \
-device intel-iommu,snoop-control=on \
-device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,mq=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off,packed=on,event_idx=off,bus=pcie.0,addr=0x4 \
-netdev tap,id=net0,script=no,downscript=no,vhost=off \
-nographic \
-m 8G \
-smp 4 \
-M q35 \
-cpu host 2>&1 | tee vm.log

L1 kernel version: 6.8.5-201.fc39.x86_64

I have been following the "Hands on vDPA - Part 2" blog
to set up the environment in L1 [2].

QEMU command to boot L2:

# ./qemu/build/qemu-system-x86_64 \
-nographic \
-m 4G \
-enable-kvm \
-M q35 \
-drive file=//root/L2.qcow2,media=disk,if=virtio \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0 \
-device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off,event_idx=off,packed=on,bus=pcie.0,addr=0x7 \
-smp 4 \
-cpu host \
2>&1 | tee vm.log

L2 kernel version: 6.8.7-200.fc39.x86_64

I confirmed that packed vqs are enabled in L2 by checking feature
bit 34 (VIRTIO_F_RING_PACKED, i.e., character 35 of the sysfs
features bitmap):

# cut -c35 /sys/devices/pci0000\:00/0000\:00\:07.0/virtio1/features 
1

I may be wrong, but I think the issue in my implementation might be
related to:

1. incorrect endianness conversions.
2. the implementation of "vhost_svq_more_used_packed" in commit #5.
3. the implementation of "vhost_svq_(en|dis)able_notification" in
   commit #5 (see the sketch below).
4. something else?
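
For reference, here is a minimal sketch of the semantics I believe
suspects 2 and 3 must implement, based on the VIRTIO 1.1 packed ring
layout. The SVQ field names below are illustrative, not necessarily
the ones used in this series:

/*
 * A descriptor at last_used_idx is used iff its AVAIL and USED flag
 * bits are equal to each other and to the used wrap counter.
 */
static bool vhost_svq_more_used_packed(const VhostShadowVirtqueue *svq)
{
    uint16_t flags = le16_to_cpu(
        svq->vring_packed.vring.desc[svq->last_used_idx].flags);
    bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
    bool used = flags & (1 << VRING_PACKED_DESC_F_USED);

    return avail == used && used == svq->vring_packed.used_wrap_counter;
}

/*
 * Enabling notifications publishes the ENABLE event flag and then
 * re-checks for used descriptors that raced with the write. Returns
 * true if polling can stop, mirroring the split implementation.
 */
static bool vhost_svq_enable_notification(VhostShadowVirtqueue *svq)
{
    svq->vring_packed.vring.driver->flags =
        cpu_to_le16(VRING_PACKED_EVENT_FLAG_ENABLE);
    smp_mb(); /* order the flag write before the re-check */
    return !vhost_svq_more_used_packed(svq);
}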

Thanks,
Sahil

[1] https://github.com/torvalds/linux/blob/master/drivers/net/virtio_net.c#L3245
[2] https://www.redhat.com/en/blog/hands-vdpa-what-do-you-do-when-you-aint-got-hardware-part-2

Sahil Siddiq (7):
  vhost: Refactor vhost_svq_add_split
  vhost: Data structure changes to support packed vqs
  vhost: Forward descriptors to device via packed SVQ
  vdpa: Allocate memory for SVQ and map them to vdpa
  vhost: Forward descriptors to guest via packed vqs
  vhost: Validate transport device features for packed vqs
  vdpa: Support setting vring_base for packed SVQ

 hw/virtio/vhost-shadow-virtqueue.c | 396 ++++++++++++++++++++++-------
 hw/virtio/vhost-shadow-virtqueue.h |  88 ++++---
 hw/virtio/vhost-vdpa.c             |  52 +++-
 3 files changed, 404 insertions(+), 132 deletions(-)

-- 
2.48.1
Re: [RFC v5 0/7] Add packed format to shadow virtqueue
Posted by Eugenio Perez Martin 3 days, 22 hours ago
On Mon, Mar 24, 2025 at 2:59 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>
> Hi,
>
> I managed to fix a few issues while testing this patch series.
> There is still one issue that I am unable to resolve. I thought
> I would send this patch series for review in case I have missed
> something.
>
> The issue is that this patch series does not work every time. When
> it does work, I am able to ping L0 from L2 and vice versa via the
> packed SVQ.
>

So we're on a very good track then!

> When this doesn't work, both VMs throw a "Destination Host
> Unreachable" error. This is sometimes (not always) accompanied
> by the following kernel error, thrown by the L2 kernel:
>
> virtio_net virtio1: output.0:id 1 is not a head!
>

How many packets have been sent or received before hitting this? If
the answer is "the vq size", maybe there is a bug in the code that
handles the wraparound of the packed vq, where the used and avail
flags need to be flipped. You can count the packets in the SVQ code.
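
Something along these lines, modeled on the Linux driver's
virtqueue_add_packed() (the SVQ field names are illustrative):

static void vhost_svq_advance_avail_packed(VhostShadowVirtqueue *svq,
                                           uint16_t ndescs)
{
    uint16_t i = svq->vring_packed.next_avail_idx + ndescs;

    if (i >= svq->vring_packed.vring.num) {
        i -= svq->vring_packed.vring.num;
        /*
         * New lap: flip the wrap counter and the polarity of the
         * AVAIL/USED bits written into fresh descriptors.
         */
        svq->vring_packed.avail_wrap_counter ^= 1;
        svq->vring_packed.avail_used_flags ^=
            1 << VRING_PACKED_DESC_F_AVAIL | 1 << VRING_PACKED_DESC_F_USED;
    }
    svq->vring_packed.next_avail_idx = i;
}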

> The error is not always thrown, but when it is, the id varies.
> It is invariably followed by a soft lockup:
>
> [  284.662292] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [swapper/1:0]
> [  284.662292] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class vfg
> [  284.662292] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.7-200.fc39.x86_64 #1
> [  284.662292] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [  284.662292] RIP: 0010:virtqueue_enable_cb_delayed+0x115/0x150
> [  284.662292] Code: 44 77 04 0f ae f0 48 8b 42 70 0f b7 40 02 66 2b 42 50 66 39 c1 0f 93 c0 c3 cc cc cc cc 66 87 44 77 04 eb e2 f0 83 44 24 fc 00 <e9> 5a f1
> [  284.662292] RSP: 0018:ffffb8f000100cb0 EFLAGS: 00000246
> [  284.662292] RAX: 0000000000000000 RBX: ffff96f20204d800 RCX: ffff96f206f5e000
> [  284.662292] RDX: ffff96f2054fd900 RSI: ffffb8f000100c7c RDI: ffff96f2054fd900
> [  284.662292] RBP: ffff96f2078bb000 R08: 0000000000000001 R09: 0000000000000001
> [  284.662292] R10: ffff96f2078bb000 R11: 0000000000000005 R12: ffff96f207bb4a00
> [  284.662292] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96f20452fd00
> [  284.662292] FS:  0000000000000000(0000) GS:ffff96f27bc80000(0000) knlGS:0000000000000000
> [  284.662292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  284.662292] CR2: 00007f2a9ca191e8 CR3: 0000000136422003 CR4: 0000000000770ef0
> [  284.662292] PKRU: 55555554
> [  284.662292] Call Trace:
> [  284.662292]  <IRQ>
> [  284.662292]  ? watchdog_timer_fn+0x1e6/0x270
> [  284.662292]  ? __pfx_watchdog_timer_fn+0x10/0x10
> [  284.662292]  ? __hrtimer_run_queues+0x10f/0x2b0
> [  284.662292]  ? hrtimer_interrupt+0xf8/0x230
> [  284.662292]  ? __sysvec_apic_timer_interrupt+0x4d/0x140
> [  284.662292]  ? sysvec_apic_timer_interrupt+0x39/0x90
> [  284.662292]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [  284.662292]  ? virtqueue_enable_cb_delayed+0x115/0x150
> [  284.662292]  start_xmit+0x2a6/0x4f0 [virtio_net]
> [  284.662292]  ? netif_skb_features+0x98/0x300
> [  284.662292]  dev_hard_start_xmit+0x61/0x1d0
> [  284.662292]  sch_direct_xmit+0xa4/0x390
> [  284.662292]  __dev_queue_xmit+0x84f/0xdc0
> [  284.662292]  ? nf_hook_slow+0x42/0xf0
> [  284.662292]  ip_finish_output2+0x2b8/0x580
> [  284.662292]  igmp_ifc_timer_expire+0x1d5/0x430
> [  284.662292]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
> [  284.662292]  call_timer_fn+0x21/0x130
> [  284.662292]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
> [  284.662292]  __run_timers+0x21f/0x2b0
> [  284.662292]  run_timer_softirq+0x1d/0x40
> [  284.662292]  __do_softirq+0xc9/0x2c8
> [  284.662292]  __irq_exit_rcu+0xa6/0xc0
> [  284.662292]  sysvec_apic_timer_interrupt+0x72/0x90
> [  284.662292]  </IRQ>
> [  284.662292]  <TASK>
> [  284.662292]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [  284.662292] RIP: 0010:pv_native_safe_halt+0xf/0x20
> [  284.662292] Code: 22 d7 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 53 75 3f 00 fb f4 <c3> cc c0
> [  284.662292] RSP: 0018:ffffb8f0000b3ed8 EFLAGS: 00000212
> [  284.662292] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
> [  284.662292] RDX: 4000000000000000 RSI: 0000000000000083 RDI: 00000000000289ec
> [  284.662292] RBP: ffff96f200810000 R08: 0000000000000000 R09: 0000000000000001
> [  284.662292] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [  284.662292] R13: 0000000000000000 R14: ffff96f200810000 R15: 0000000000000000
> [  284.662292]  default_idle+0x9/0x20
> [  284.662292]  default_idle_call+0x2c/0xe0
> [  284.662292]  do_idle+0x226/0x270
> [  284.662292]  cpu_startup_entry+0x2a/0x30
> [  284.662292]  start_secondary+0x11e/0x140
> [  284.662292]  secondary_startup_64_no_verify+0x184/0x18b
> [  284.662292]  </TASK>
>
> The soft lockup seems to happen in
> drivers/net/virtio_net.c:start_xmit() [1].
>

Maybe it gets stuck in the do {} while (...
!virtqueue_enable_cb_delayed()) loop? You can add a printk at the
return of virtqueue_enable_cb_delayed() and check whether its rate
matches the speed at which you're sending or receiving pings. For
example, if ping fires once per second, you should not see many
traces.
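
Something like this, for example. The body mirrors the upstream
function as far as I remember, so double-check it against your tree;
only the pr_info_ratelimited() line is new:

bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
{
    struct vring_virtqueue *vq = to_vvq(_vq);
    bool ret;

    if (vq->event_triggered)
        vq->event_triggered = false;

    ret = vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
                            virtqueue_enable_cb_delayed_split(_vq);
    /* With a 1s ping you should see roughly one trace per second. */
    pr_info_ratelimited("%s: enable_cb_delayed -> %d\n", _vq->name, ret);
    return ret;
}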

If that doesn't reveal anything, I'd try never disabling
notifications, both in the kernel and in the SVQ, and check whether
that works.
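
On the SVQ side, a quick hack for that experiment could be to make the
disable path keep notifications enabled (field names illustrative
again):

static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
{
    /* Debug experiment: never suppress device notifications. */
    svq->vring_packed.vring.driver->flags =
        cpu_to_le16(VRING_PACKED_EVENT_FLAG_ENABLE);
}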

> I don't think the issue is in the kernel: I haven't seen any
> problems when testing my changes with split vqs. Only packed vqs
> are affected.
>
> L0 kernel version: 6.12.13-1-lts
>
> QEMU command to boot L1:
>
> $ sudo ./qemu/build/qemu-system-x86_64 \
> -enable-kvm \
> -drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
> -net nic,model=virtio \
> -net user,hostfwd=tcp::2222-:22 \
> -device intel-iommu,snoop-control=on \
> -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,mq=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off,packed=on,event_idx=off,bus=pcie.0,addr=0x4 \
> -netdev tap,id=net0,script=no,downscript=no,vhost=off \
> -nographic \
> -m 8G \
> -smp 4 \
> -M q35 \
> -cpu host 2>&1 | tee vm.log
>
> L1 kernel version: 6.8.5-201.fc39.x86_64
>
> I have been following the "Hands on vDPA - Part 2" blog
> to set up the environment in L1 [2].
>
> QEMU command to boot L2:
>
> # ./qemu/build/qemu-system-x86_64 \
> -nographic \
> -m 4G \
> -enable-kvm \
> -M q35 \
> -drive file=//root/L2.qcow2,media=disk,if=virtio \
> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0 \
> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off,event_idx=off,packed=on,bus=pcie.0,addr=0x7 \
> -smp 4 \
> -cpu host \
> 2>&1 | tee vm.log
>
> L2 kernel version: 6.8.7-200.fc39.x86_64
>
> I confirmed that packed vqs are enabled in L2 by checking feature
> bit 34 (VIRTIO_F_RING_PACKED, i.e., character 35 of the sysfs
> features bitmap):
>
> # cut -c35 /sys/devices/pci0000\:00/0000\:00\:07.0/virtio1/features
> 1
>
> I may be wrong, but I think the issue in my implementation might be
> related to:
>
> 1. incorrect endianness conversions.
> 2. the implementation of "vhost_svq_more_used_packed" in commit #5.
> 3. the implementation of "vhost_svq_(en|dis)able_notification" in
>    commit #5 (see the sketch below).
> 4. something else?
>

I think 1 is unlikely. I'd go with 2 and 3.

Let me know if the proposed changes work!

> Thanks,
> Sahil
>
> [1] https://github.com/torvalds/linux/blob/master/drivers/net/virtio_net.c#L3245
> [2] https://www.redhat.com/en/blog/hands-vdpa-what-do-you-do-when-you-aint-got-hardware-part-2
>
> Sahil Siddiq (7):
>   vhost: Refactor vhost_svq_add_split
>   vhost: Data structure changes to support packed vqs
>   vhost: Forward descriptors to device via packed SVQ
>   vdpa: Allocate memory for SVQ and map them to vdpa
>   vhost: Forward descriptors to guest via packed vqs
>   vhost: Validate transport device features for packed vqs
>   vdpa: Support setting vring_base for packed SVQ
>
>  hw/virtio/vhost-shadow-virtqueue.c | 396 ++++++++++++++++++++++-------
>  hw/virtio/vhost-shadow-virtqueue.h |  88 ++++---
>  hw/virtio/vhost-vdpa.c             |  52 +++-
>  3 files changed, 404 insertions(+), 132 deletions(-)
>
> --
> 2.48.1
>