On a guest where an NVIDIA GH100 card is passed through from the host,
the guest hangs when compiling 'cuda-samples', as reported by Julia.

  host$ lspci | grep GH100
  0009:01:00.0 3D controller: NVIDIA Corporation GH100 [GH200 120GB / 480GB] (rev a1)

  host$ /home/sandbox/gavin/qemu.main/build/qemu-system-aarch64 -accel kvm \
        -machine virt,gic-version=host,ras=on,highmem-mmio-size=4T \
        -cpu host -smp cpus=32 -m size=8G \
        -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=d0 \
        -device virtio-blk-pci,id=vb0,bus=pcie.0,drive=d0,num-queues=4 \
        -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.1.0

  guest$ cd cuda-samples/build
  guest$ make -j 20 clean
  guest$ make -j 20
    :
  [ 54%] Linking CUDA executable graphMemoryNodes
  [ 54%] Built target graphMemoryNodes
  <no more output afterwards, guest becomes frozen here>

  guest$ qemu-system-aarch64: virtio: bogus descriptor or out of resources
  [ 555.814025] virtio_blk virtio0: [vda] new size: 268435456 512-byte logical blocks (137 GB/128 GiB)

When the GPU driver (NVIDIA open driver) is loaded during guest boot,
the memory blocks residing in the PCI BAR can be presented to the guest
through memory hot-add. While cuda-samples is being built, page cache
can be allocated from the hot-added memory blocks. That page cache is
then sent to QEMU's virtio-blk device as part of a DMA request. A bounce
buffer is used to accommodate the request because the corresponding
memory region (MemoryRegion) is a RAM device region in QEMU. In this
specific case, memory_access_is_direct() returns false in the path where
the DMA request is handled.

  QEMU
  ====
  virtio_blk_handle_output
  virtio_blk_handle_vq
  virtio_blk_get_request
  virtqueue_pop
  virtqueue_split_pop
  virtqueue_map_desc
  address_space_map
  memory_access_is_direct                   # Return false
  memory_region_supports_direct_access

  (qemu) info mtree
    :
  memory-region: pci_bridge_pci
    0000000000000000-ffffffffffffffff (prio 0, container): pci_bridge_pci
      0000042000000000-0000043fffffffff (prio 1, i/o): 0009:01:00.0 base BAR 4
        0000042000000000-0000043fffffffff (prio 0, i/o): 0009:01:00.0 BAR 4
          0000042000000000-000004379fffffff (prio 0, ramd): 0009:01:00.0 BAR 4 mmaps[0]

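
The decision described above can be illustrated with a small standalone
model. This is only a sketch under stated assumptions: 'region_kind',
'supports_direct_access' and 'map_len' are hypothetical names invented
for illustration, not QEMU's types or functions, and the model ignores
locking, alignment and everything else the real mapping path handles.
It only shows why a request that targets a RAM device region is capped
by the remaining bounce buffer budget while a request to plain RAM is
not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified region kinds; not QEMU's MemoryRegion. */
enum region_kind { REGION_RAM, REGION_RAM_DEVICE, REGION_MMIO };

/* Assumption for this model: only plain RAM can be mapped directly. */
static bool supports_direct_access(enum region_kind kind)
{
    return kind == REGION_RAM;
}

/*
 * Return how many bytes a single map call can cover.
 *
 * A direct mapping covers the whole request. A bounced mapping is
 * capped by what is left of the bounce buffer budget; 0 means that
 * nothing could be mapped.
 */
static size_t map_len(enum region_kind kind, size_t len,
                      size_t bounce_limit, size_t bounce_in_use)
{
    size_t avail;

    if (supports_direct_access(kind)) {
        return len;
    }

    if (bounce_in_use >= bounce_limit) {
        return 0;
    }

    avail = bounce_limit - bounce_in_use;
    return len < avail ? len : avail;
}
```

In this model, a 64KB request to a RAM device region with a 4096-byte
limit maps at most 4096 bytes, and maps nothing once the budget is
already in use.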
By default, the maximum bounce buffer size is only 4096 bytes, which is
smaller than a single page when the guest page size is 64KB. Address the
issue by inheriting the customized maximum bounce buffer size of the
virtio bus's parent, set through property 'x-max-bounce-buffer-size',
when that size is larger than the current one. With this applied, the
guest no longer hangs with
'-device virtio-blk-pci,...,x-max-bounce-buffer-size=268435456'.
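
The inheritance rule can be sketched in isolation as follows. This is a
minimal model, not the patched code: 'struct dma_space' and
'inherit_bounce_limit' are hypothetical names, with the struct standing
in for the single AddressSpace field that matters here. The limit is
only ever raised, so a parent with a smaller or missing customization
leaves the device's limit unchanged.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one field of interest. */
struct dma_space {
    size_t max_bounce_buffer_size;
};

/*
 * Raise the device's bounce buffer limit to the parent's limit when
 * the parent's limit is larger. A NULL parent means there is nothing
 * to inherit.
 */
static void inherit_bounce_limit(struct dma_space *dev,
                                 const struct dma_space *parent)
{
    if (parent &&
        parent->max_bounce_buffer_size > dev->max_bounce_buffer_size) {
        dev->max_bounce_buffer_size = parent->max_bounce_buffer_size;
    }
}
```

With a device limit of 4096 and a parent limit of 268435456, the device
ends up at 268435456, which is enough to bounce a 64KB guest page.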
Reported-by: Julia Graham <jugraham@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
hw/virtio/virtio-bus.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index cef944e015..e0933823f3 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -42,6 +42,7 @@ do { printf("virtio_bus: " fmt , ## __VA_ARGS__); } while (0)
/* A VirtIODevice is being plugged */
void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
{
+ AddressSpace *as;
DeviceState *qdev = DEVICE(vdev);
BusState *qbus = BUS(qdev_get_parent_bus(qdev));
VirtioBusState *bus = VIRTIO_BUS(qbus);
@@ -100,6 +101,19 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
return;
}
}
+ } else {
+ /*
+ * The maximal bounce buffer size of the virtio bus's parent may
+ * have been customized by property 'x-max-bounce-buffer-size'.
+ * Let's inherit the customized size if it's larger than the
+ * current one.
+ */
+ as = klass->get_dma_as ? klass->get_dma_as(qbus->parent) : NULL;
+ if (as) {
+ vdev->dma_as->max_bounce_buffer_size = MAX(
+ vdev->dma_as->max_bounce_buffer_size,
+ as->max_bounce_buffer_size);
+ }
}
}
--
2.54.0