system/physmem.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
virtqueue_map_desc() is called with values of sz exceeding that may exceed
TARGET_PAGE_SIZE. sz = 0x2800 has been observed.
We only support a single bounce buffer. We have to avoid
virtqueue_map_desc() calling address_space_map() multiple times. Otherwise
we see an error
qemu: virtio: bogus descriptor or out of resources
Increase the minimum size of the bounce buffer to 0x10000 which matches
the largest value of TARGET_PAGE_SIZE for all architectures.
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
---
v2:
remove unrelated change
---
system/physmem.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/system/physmem.c b/system/physmem.c
index e3ebc19eef..3c82da1c86 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as,
*plen = 0;
return NULL;
}
- /* Avoid unbounded allocations */
- l = MIN(l, TARGET_PAGE_SIZE);
+ /*
+ * There is only one bounce buffer. The largest occuring value of
+ * parameter sz of virtqueue_map_desc() must fit into the bounce
+ * buffer.
+ */
+ l = MIN(l, 0x10000);
bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
bounce.addr = addr;
bounce.len = l;
--
2.43.0
Hi Heinrich, On 28/2/24 13:59, Heinrich Schuchardt wrote: > virtqueue_map_desc() is called with values of sz exceeding that may exceed > TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > > We only support a single bounce buffer. We have to avoid > virtqueue_map_desc() calling address_space_map() multiple times. Otherwise > we see an error > > qemu: virtio: bogus descriptor or out of resources > > Increase the minimum size of the bounce buffer to 0x10000 which matches > the largest value of TARGET_PAGE_SIZE for all architectures. > > Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > --- > v2: > remove unrelated change > --- > system/physmem.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/system/physmem.c b/system/physmem.c > index e3ebc19eef..3c82da1c86 100644 > --- a/system/physmem.c > +++ b/system/physmem.c > @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > *plen = 0; > return NULL; > } > - /* Avoid unbounded allocations */ > - l = MIN(l, TARGET_PAGE_SIZE); > + /* > + * There is only one bounce buffer. The largest occuring value of > + * parameter sz of virtqueue_map_desc() must fit into the bounce > + * buffer. > + */ > + l = MIN(l, 0x10000); Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or TARGETS_BIGGEST_PAGE_SIZE? Then along: QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l); > bounce.addr = addr; > bounce.len = l;
On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > Hi Heinrich, > > On 28/2/24 13:59, Heinrich Schuchardt wrote: >> virtqueue_map_desc() is called with values of sz exceeding that may >> exceed >> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. >> >> We only support a single bounce buffer. We have to avoid >> virtqueue_map_desc() calling address_space_map() multiple times. >> Otherwise >> we see an error >> >> qemu: virtio: bogus descriptor or out of resources >> >> Increase the minimum size of the bounce buffer to 0x10000 which matches >> the largest value of TARGET_PAGE_SIZE for all architectures. >> >> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> >> --- >> v2: >> remove unrelated change >> --- >> system/physmem.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/system/physmem.c b/system/physmem.c >> index e3ebc19eef..3c82da1c86 100644 >> --- a/system/physmem.c >> +++ b/system/physmem.c >> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, >> *plen = 0; >> return NULL; >> } >> - /* Avoid unbounded allocations */ >> - l = MIN(l, TARGET_PAGE_SIZE); >> + /* >> + * There is only one bounce buffer. The largest occuring >> value of >> + * parameter sz of virtqueue_map_desc() must fit into the bounce >> + * buffer. >> + */ >> + l = MIN(l, 0x10000); > > Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > TARGETS_BIGGEST_PAGE_SIZE? > > Then along: > QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); Thank you Philippe for reviewing. TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the page size. How about MIN_BOUNCE_BUFFER_SIZE? Is include/exec/memory.h the right include for the constant? I don't think that TARGET_PAGE_SIZE has any relevance for setting the bounce buffer size. I only mentioned it to say that we are not decreasing the value on any existing architecture. I don't know why TARGET_PAGE_SIZE ever got into this piece of code. e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a reason for this choice. Maybe Paolo remembers. Best regards Heinrich > >> bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l); >> bounce.addr = addr; >> bounce.len = l; >
On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > > Hi Heinrich, > > > > On 28/2/24 13:59, Heinrich Schuchardt wrote: > >> virtqueue_map_desc() is called with values of sz exceeding that may > >> exceed > >> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > >> > >> We only support a single bounce buffer. We have to avoid > >> virtqueue_map_desc() calling address_space_map() multiple times. > >> Otherwise > >> we see an error > >> > >> qemu: virtio: bogus descriptor or out of resources > >> > >> Increase the minimum size of the bounce buffer to 0x10000 which matches > >> the largest value of TARGET_PAGE_SIZE for all architectures. > >> > >> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > >> --- > >> v2: > >> remove unrelated change > >> --- > >> system/physmem.c | 8 ++++++-- > >> 1 file changed, 6 insertions(+), 2 deletions(-) > >> > >> diff --git a/system/physmem.c b/system/physmem.c > >> index e3ebc19eef..3c82da1c86 100644 > >> --- a/system/physmem.c > >> +++ b/system/physmem.c > >> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > >> *plen = 0; > >> return NULL; > >> } > >> - /* Avoid unbounded allocations */ > >> - l = MIN(l, TARGET_PAGE_SIZE); > >> + /* > >> + * There is only one bounce buffer. The largest occuring > >> value of > >> + * parameter sz of virtqueue_map_desc() must fit into the bounce > >> + * buffer. > >> + */ > >> + l = MIN(l, 0x10000); > > > > Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > > TARGETS_BIGGEST_PAGE_SIZE? > > > > Then along: > > QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > > Thank you Philippe for reviewing. > > TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the > page size. > How about MIN_BOUNCE_BUFFER_SIZE? > Is include/exec/memory.h the right include for the constant? > > I don't think that TARGET_PAGE_SIZE has any relevance for setting the > bounce buffer size. I only mentioned it to say that we are not > decreasing the value on any existing architecture. > > I don't know why TARGET_PAGE_SIZE ever got into this piece of code. > e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a > reason for this choice. Maybe Paolo remembers. The limitation to a page dates back to commit 6d16c2f88f2a in 2009, which was the first implementation of this function. I don't think there's a particular reason for that value beyond that it was probably a convenient value that was assumed to be likely "big enough". I think the idea with this bounce-buffer has always been that this isn't really a code path we expected to end up in very often -- it's supposed to be for when devices are doing DMA, which they will typically be doing to memory (backed by host RAM), not devices (backed by MMIO and needing a bounce buffer). So the whole mechanism is a bit "last fallback to stop things breaking entirely". The address_space_map() API says that it's allowed to return a subset of the range you ask for, so if the virtio code doesn't cope with the minimum being set to TARGET_PAGE_SIZE then either we need to fix that virtio code or we need to change the API of this function. (But I think you will also get a reduced range if you try to use it across a boundary between normal host-memory-backed RAM and a device MemoryRegion.) thanks -- PMM
On 28.02.24 19:39, Peter Maydell wrote: > On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: >> >> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: >>> Hi Heinrich, >>> >>> On 28/2/24 13:59, Heinrich Schuchardt wrote: >>>> virtqueue_map_desc() is called with values of sz exceeding that may >>>> exceed >>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. >>>> >>>> We only support a single bounce buffer. We have to avoid >>>> virtqueue_map_desc() calling address_space_map() multiple times. >>>> Otherwise >>>> we see an error >>>> >>>> qemu: virtio: bogus descriptor or out of resources >>>> >>>> Increase the minimum size of the bounce buffer to 0x10000 which matches >>>> the largest value of TARGET_PAGE_SIZE for all architectures. >>>> >>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> >>>> --- >>>> v2: >>>> remove unrelated change >>>> --- >>>> system/physmem.c | 8 ++++++-- >>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/system/physmem.c b/system/physmem.c >>>> index e3ebc19eef..3c82da1c86 100644 >>>> --- a/system/physmem.c >>>> +++ b/system/physmem.c >>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, >>>> *plen = 0; >>>> return NULL; >>>> } >>>> - /* Avoid unbounded allocations */ >>>> - l = MIN(l, TARGET_PAGE_SIZE); >>>> + /* >>>> + * There is only one bounce buffer. The largest occuring >>>> value of >>>> + * parameter sz of virtqueue_map_desc() must fit into the bounce >>>> + * buffer. >>>> + */ >>>> + l = MIN(l, 0x10000); >>> >>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or >>> TARGETS_BIGGEST_PAGE_SIZE? >>> >>> Then along: >>> QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); >> >> Thank you Philippe for reviewing. >> >> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the >> page size. >> How about MIN_BOUNCE_BUFFER_SIZE? >> Is include/exec/memory.h the right include for the constant? >> >> I don't think that TARGET_PAGE_SIZE has any relevance for setting the >> bounce buffer size. I only mentioned it to say that we are not >> decreasing the value on any existing architecture. >> >> I don't know why TARGET_PAGE_SIZE ever got into this piece of code. >> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a >> reason for this choice. Maybe Paolo remembers. > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > which was the first implementation of this function. I don't think > there's a particular reason for that value beyond that it was > probably a convenient value that was assumed to be likely "big enough". > > I think the idea with this bounce-buffer has always been that this > isn't really a code path we expected to end up in very often -- > it's supposed to be for when devices are doing DMA, which they > will typically be doing to memory (backed by host RAM), not > devices (backed by MMIO and needing a bounce buffer). So the > whole mechanism is a bit "last fallback to stop things breaking > entirely". > > The address_space_map() API says that it's allowed to return > a subset of the range you ask for, so if the virtio code doesn't > cope with the minimum being set to TARGET_PAGE_SIZE then either > we need to fix that virtio code or we need to change the API > of this function. (But I think you will also get a reduced > range if you try to use it across a boundary between normal > host-memory-backed RAM and a device MemoryRegion.) If we allow a bounce buffer only to be used once (via the in_use flag), why do we allow only a single bounce buffer? Could address_space_map() allocate a new bounce buffer on every call and address_space_unmap() deallocate it? Isn't the design with a single bounce buffer bound to fail with a multi-threaded client as collision can be expected? Best regards Heinrich
On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > On 28.02.24 19:39, Peter Maydell wrote: > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > which was the first implementation of this function. I don't think > > there's a particular reason for that value beyond that it was > > probably a convenient value that was assumed to be likely "big enough". > > > > I think the idea with this bounce-buffer has always been that this > > isn't really a code path we expected to end up in very often -- > > it's supposed to be for when devices are doing DMA, which they > > will typically be doing to memory (backed by host RAM), not > > devices (backed by MMIO and needing a bounce buffer). So the > > whole mechanism is a bit "last fallback to stop things breaking > > entirely". > > > > The address_space_map() API says that it's allowed to return > > a subset of the range you ask for, so if the virtio code doesn't > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > we need to fix that virtio code or we need to change the API > > of this function. (But I think you will also get a reduced > > range if you try to use it across a boundary between normal > > host-memory-backed RAM and a device MemoryRegion.) > > If we allow a bounce buffer only to be used once (via the in_use flag), > why do we allow only a single bounce buffer? > > Could address_space_map() allocate a new bounce buffer on every call and > address_space_unmap() deallocate it? > > Isn't the design with a single bounce buffer bound to fail with a > multi-threaded client as collision can be expected? Yeah, I don't suppose multi-threaded was particularly expected. Again, this is really a "handle the case where the guest does something silly" setup, which is why only one bounce buffer. Why is your guest ending up in the bounce-buffer path? -- PMM
On Thu, 29 Feb 2024 09:38:29 +0000 Peter Maydell <peter.maydell@linaro.org> wrote: > On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: > > > > On 28.02.24 19:39, Peter Maydell wrote: > > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > > which was the first implementation of this function. I don't think > > > there's a particular reason for that value beyond that it was > > > probably a convenient value that was assumed to be likely "big enough". > > > > > > I think the idea with this bounce-buffer has always been that this > > > isn't really a code path we expected to end up in very often -- > > > it's supposed to be for when devices are doing DMA, which they > > > will typically be doing to memory (backed by host RAM), not > > > devices (backed by MMIO and needing a bounce buffer). So the > > > whole mechanism is a bit "last fallback to stop things breaking > > > entirely". > > > > > > The address_space_map() API says that it's allowed to return > > > a subset of the range you ask for, so if the virtio code doesn't > > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > > we need to fix that virtio code or we need to change the API > > > of this function. (But I think you will also get a reduced > > > range if you try to use it across a boundary between normal > > > host-memory-backed RAM and a device MemoryRegion.) > > > > If we allow a bounce buffer only to be used once (via the in_use flag), > > why do we allow only a single bounce buffer? > > > > Could address_space_map() allocate a new bounce buffer on every call and > > address_space_unmap() deallocate it? > > > > Isn't the design with a single bounce buffer bound to fail with a > > multi-threaded client as collision can be expected? > > Yeah, I don't suppose multi-threaded was particularly expected. > Again, this is really a "handle the case where the guest does > something silly" setup, which is why only one bounce buffer. > > Why is your guest ending up in the bounce-buffer path? Happens for me with emulated CXL memory. I think the case I saw was split descriptors in virtio via address space caches https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L4043 One bounce buffer is in use for the outer loop and another for the descriptors it is pointing to. Matthias' series makes this work fine. I need to circle back and check how big a cache this needs. I'm carrying a silly size because of the side effect of the address space bug here https://lore.kernel.org/qemu-devel/20240215142817.1904-1-Jonathan.Cameron@huawei.com/#t and can probably set it to much less than my currently 1GiB. Jonathan > > -- PMM >
On Thu, 29 Feb 2024 at 10:59, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On Thu, 29 Feb 2024 09:38:29 +0000 > Peter Maydell <peter.maydell@linaro.org> wrote: > > > On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt > > <heinrich.schuchardt@canonical.com> wrote: > > > > > > On 28.02.24 19:39, Peter Maydell wrote: > > > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > > > which was the first implementation of this function. I don't think > > > > there's a particular reason for that value beyond that it was > > > > probably a convenient value that was assumed to be likely "big enough". > > > > > > > > I think the idea with this bounce-buffer has always been that this > > > > isn't really a code path we expected to end up in very often -- > > > > it's supposed to be for when devices are doing DMA, which they > > > > will typically be doing to memory (backed by host RAM), not > > > > devices (backed by MMIO and needing a bounce buffer). So the > > > > whole mechanism is a bit "last fallback to stop things breaking > > > > entirely". > > > > > > > > The address_space_map() API says that it's allowed to return > > > > a subset of the range you ask for, so if the virtio code doesn't > > > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > > > we need to fix that virtio code or we need to change the API > > > > of this function. (But I think you will also get a reduced > > > > range if you try to use it across a boundary between normal > > > > host-memory-backed RAM and a device MemoryRegion.) > > > > > > If we allow a bounce buffer only to be used once (via the in_use flag), > > > why do we allow only a single bounce buffer? > > > > > > Could address_space_map() allocate a new bounce buffer on every call and > > > address_space_unmap() deallocate it? > > > > > > Isn't the design with a single bounce buffer bound to fail with a > > > multi-threaded client as collision can be expected? > > > > Yeah, I don't suppose multi-threaded was particularly expected. > > Again, this is really a "handle the case where the guest does > > something silly" setup, which is why only one bounce buffer. > > > > Why is your guest ending up in the bounce-buffer path? > > Happens for me with emulated CXL memory. Can we put that in the "something silly" bucket? :-) But yes, I'm not surprised that CXL runs into this. Heinrich, are you doing CXL testing, or is this some other workload? > I think the case I saw > was split descriptors in virtio via address space caches > https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L4043 > > One bounce buffer is in use for the outer loop and another for the descriptors > it is pointing to. Mmm. The other assumption made in the design of the address_space_map() API I think was that it was unlikely that a device would be trying to do two DMA operations simultaneously. This is clearly not true in practice. We definitely need to fix one end or other of this API. (I'm not sure why the bounce-buffer limit ought to be per-AddressSpace: is that just done in Matthias' series so that we can attach an x-thingy property to the individual PCI device?) -- PMM
On Thu, Feb 29, 2024 at 12:12 PM Peter Maydell <peter.maydell@linaro.org> wrote: > > On Thu, 29 Feb 2024 at 10:59, Jonathan Cameron > <Jonathan.Cameron@huawei.com> wrote: > > > > On Thu, 29 Feb 2024 09:38:29 +0000 > > Peter Maydell <peter.maydell@linaro.org> wrote: > > > > > On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt > > > <heinrich.schuchardt@canonical.com> wrote: > > > > > > > > On 28.02.24 19:39, Peter Maydell wrote: > > > > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > > > > which was the first implementation of this function. I don't think > > > > > there's a particular reason for that value beyond that it was > > > > > probably a convenient value that was assumed to be likely "big enough". > > > > > > > > > > I think the idea with this bounce-buffer has always been that this > > > > > isn't really a code path we expected to end up in very often -- > > > > > it's supposed to be for when devices are doing DMA, which they > > > > > will typically be doing to memory (backed by host RAM), not > > > > > devices (backed by MMIO and needing a bounce buffer). So the > > > > > whole mechanism is a bit "last fallback to stop things breaking > > > > > entirely". > > > > > > > > > > The address_space_map() API says that it's allowed to return > > > > > a subset of the range you ask for, so if the virtio code doesn't > > > > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > > > > we need to fix that virtio code or we need to change the API > > > > > of this function. (But I think you will also get a reduced > > > > > range if you try to use it across a boundary between normal > > > > > host-memory-backed RAM and a device MemoryRegion.) > > > > > > > > If we allow a bounce buffer only to be used once (via the in_use flag), > > > > why do we allow only a single bounce buffer? > > > > > > > > Could address_space_map() allocate a new bounce buffer on every call and > > > > address_space_unmap() deallocate it? > > > > > > > > Isn't the design with a single bounce buffer bound to fail with a > > > > multi-threaded client as collision can be expected? > > > > > > Yeah, I don't suppose multi-threaded was particularly expected. > > > Again, this is really a "handle the case where the guest does > > > something silly" setup, which is why only one bounce buffer. > > > > > > Why is your guest ending up in the bounce-buffer path? > > > > Happens for me with emulated CXL memory. > > Can we put that in the "something silly" bucket? :-) > But yes, I'm not surprised that CXL runs into this. Heinrich, > are you doing CXL testing, or is this some other workload? > > > I think the case I saw > > was split descriptors in virtio via address space caches > > https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L4043 > > > > One bounce buffer is in use for the outer loop and another for the descriptors > > it is pointing to. > > Mmm. The other assumption made in the design of the address_space_map() > API I think was that it was unlikely that a device would be trying > to do two DMA operations simultaneously. This is clearly not > true in practice. We definitely need to fix one end or other of > this API. > > (I'm not sure why the bounce-buffer limit ought to be per-AddressSpace: > is that just done in Matthias' series so that we can attach an > x-thingy property to the individual PCI device?) Yes, that's the result of review feedback to the early iterations of my series. Specifically, (1) a limit is needed to prevent rogue guests from hogging unlimited amounts of memory and (2) global parameters are frowned upon. Setting a suitable limit is much more practical when targeted at a given device/driver combination.
On 29.02.24 12:11, Peter Maydell wrote: > On Thu, 29 Feb 2024 at 10:59, Jonathan Cameron > <Jonathan.Cameron@huawei.com> wrote: >> >> On Thu, 29 Feb 2024 09:38:29 +0000 >> Peter Maydell <peter.maydell@linaro.org> wrote: >> >>> On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt >>> <heinrich.schuchardt@canonical.com> wrote: >>>> >>>> On 28.02.24 19:39, Peter Maydell wrote: >>>>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009, >>>>> which was the first implementation of this function. I don't think >>>>> there's a particular reason for that value beyond that it was >>>>> probably a convenient value that was assumed to be likely "big enough". >>>>> >>>>> I think the idea with this bounce-buffer has always been that this >>>>> isn't really a code path we expected to end up in very often -- >>>>> it's supposed to be for when devices are doing DMA, which they >>>>> will typically be doing to memory (backed by host RAM), not >>>>> devices (backed by MMIO and needing a bounce buffer). So the >>>>> whole mechanism is a bit "last fallback to stop things breaking >>>>> entirely". >>>>> >>>>> The address_space_map() API says that it's allowed to return >>>>> a subset of the range you ask for, so if the virtio code doesn't >>>>> cope with the minimum being set to TARGET_PAGE_SIZE then either >>>>> we need to fix that virtio code or we need to change the API >>>>> of this function. (But I think you will also get a reduced >>>>> range if you try to use it across a boundary between normal >>>>> host-memory-backed RAM and a device MemoryRegion.) >>>> >>>> If we allow a bounce buffer only to be used once (via the in_use flag), >>>> why do we allow only a single bounce buffer? >>>> >>>> Could address_space_map() allocate a new bounce buffer on every call and >>>> address_space_unmap() deallocate it? >>>> >>>> Isn't the design with a single bounce buffer bound to fail with a >>>> multi-threaded client as collision can be expected? >>> >>> Yeah, I don't suppose multi-threaded was particularly expected. >>> Again, this is really a "handle the case where the guest does >>> something silly" setup, which is why only one bounce buffer. >>> >>> Why is your guest ending up in the bounce-buffer path? >> >> Happens for me with emulated CXL memory. > > Can we put that in the "something silly" bucket? :-) > But yes, I'm not surprised that CXL runs into this. Heinrich, > are you doing CXL testing, or is this some other workload? I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: qemu-system-riscv64 \ -M virt,acpi=off -accel tcg -m 4096 \ -serial mon:stdio \ -device virtio-gpu-pci \ -device qemu-xhci \ -device usb-kbd \ -drive if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ -drive file=sct.img,format=raw,if=virtio \ -device virtio-net-device,netdev=net0 \ -netdev user,id=net0 This does not invoke any CXL related stuff. Best regards Heinrich > >> I think the case I saw >> was split descriptors in virtio via address space caches >> https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L4043 >> >> One bounce buffer is in use for the outer loop and another for the descriptors >> it is pointing to. > > Mmm. The other assumption made in the design of the address_space_map() > API I think was that it was unlikely that a device would be trying > to do two DMA operations simultaneously. This is clearly not > true in practice. We definitely need to fix one end or other of > this API. > > (I'm not sure why the bounce-buffer limit ought to be per-AddressSpace: > is that just done in Matthias' series so that we can attach an > x-thingy property to the individual PCI device?) > > -- PMM
On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > But yes, I'm not surprised that CXL runs into this. Heinrich, > > are you doing CXL testing, or is this some other workload? > > I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: > > qemu-system-riscv64 \ > -M virt,acpi=off -accel tcg -m 4096 \ > -serial mon:stdio \ > -device virtio-gpu-pci \ > -device qemu-xhci \ > -device usb-kbd \ > -drive > if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ > -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ > -drive file=sct.img,format=raw,if=virtio \ > -device virtio-net-device,netdev=net0 \ > -netdev user,id=net0 > > This does not invoke any CXL related stuff. Hmm, that doesn't seem like it ought to be running into this. What underlying memory region is the guest trying to do the virtio queue access to? -- PMM
On 29.02.24 13:34, Peter Maydell wrote: > On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: >>> But yes, I'm not surprised that CXL runs into this. Heinrich, >>> are you doing CXL testing, or is this some other workload? >> >> I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: >> >> qemu-system-riscv64 \ >> -M virt,acpi=off -accel tcg -m 4096 \ >> -serial mon:stdio \ >> -device virtio-gpu-pci \ >> -device qemu-xhci \ >> -device usb-kbd \ >> -drive >> if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ >> -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ >> -drive file=sct.img,format=raw,if=virtio \ >> -device virtio-net-device,netdev=net0 \ >> -netdev user,id=net0 >> >> This does not invoke any CXL related stuff. > > Hmm, that doesn't seem like it ought to be running into this. > What underlying memory region is the guest trying to do > the virtio queue access to? > > -- PMM The error occurs while the SCT is executing function BBTestReadBlocksConformanceAutoTest (https://github.com/tianocore/edk2-test/blob/cabb98d44be94e7547605435a0be7c4946d10f8b/uefi-sct/SctPkg/TestCase/UEFI/EFI/Protocol/BlockIo/BlackBoxTest/BlockIoBBTestConformance.c#L45) This code is accessing the drive defined as -drive file=sct.img,format=raw,if=virtio . In the conformance test correct error handling for invalid parameters of the UEFI block IO protocol is tested. This includes calling the UEFI API to * read with incorrectly aligned buffers * read with invalid LBA parameter * read with buffer size not being a multiple of the sector size In all these cases the UEFI API implemented by EDK II is expected to return an error. Best regards Heinrich
On Thu, 29 Feb 2024 at 14:17, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > On 29.02.24 13:34, Peter Maydell wrote: > > On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt > > <heinrich.schuchardt@canonical.com> wrote: > >>> But yes, I'm not surprised that CXL runs into this. Heinrich, > >>> are you doing CXL testing, or is this some other workload? > >> > >> I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: > >> > >> qemu-system-riscv64 \ > >> -M virt,acpi=off -accel tcg -m 4096 \ > >> -serial mon:stdio \ > >> -device virtio-gpu-pci \ > >> -device qemu-xhci \ > >> -device usb-kbd \ > >> -drive > >> if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ > >> -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ > >> -drive file=sct.img,format=raw,if=virtio \ > >> -device virtio-net-device,netdev=net0 \ > >> -netdev user,id=net0 > >> > >> This does not invoke any CXL related stuff. > > > > Hmm, that doesn't seem like it ought to be running into this. > > What underlying memory region is the guest trying to do > > the virtio queue access to? > The error occurs while the SCT is executing function > BBTestReadBlocksConformanceAutoTest > (https://github.com/tianocore/edk2-test/blob/cabb98d44be94e7547605435a0be7c4946d10f8b/uefi-sct/SctPkg/TestCase/UEFI/EFI/Protocol/BlockIo/BlackBoxTest/BlockIoBBTestConformance.c#L45) > > This code is accessing the drive defined as > -drive file=sct.img,format=raw,if=virtio . > > In the conformance test correct error handling for invalid parameters of > the UEFI block IO protocol is tested. This includes calling the UEFI API to > > * read with incorrectly aligned buffers > * read with invalid LBA parameter > * read with buffer size not being a multiple of the sector size > > In all these cases the UEFI API implemented by EDK II is expected to > return an error. OK, but where is it actually reading/writing to ? Even if the buffer isn't aligned right it ought not to be in a non-host-ram-backed memory region, I would have thought. -- PMM
On Thu, Feb 29, 2024 at 1:35 PM Peter Maydell <peter.maydell@linaro.org> wrote: > > On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: > > > But yes, I'm not surprised that CXL runs into this. Heinrich, > > > are you doing CXL testing, or is this some other workload? > > > > I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: > > > > qemu-system-riscv64 \ > > -M virt,acpi=off -accel tcg -m 4096 \ > > -serial mon:stdio \ > > -device virtio-gpu-pci \ > > -device qemu-xhci \ > > -device usb-kbd \ > > -drive > > if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ > > -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ > > -drive file=sct.img,format=raw,if=virtio \ > > -device virtio-net-device,netdev=net0 \ > > -netdev user,id=net0 > > > > This does not invoke any CXL related stuff. > > Hmm, that doesn't seem like it ought to be running into this. > What underlying memory region is the guest trying to do > the virtio queue access to? FWIW, I have seen multiple bounce buffer usage with the generic net TX path as well as the XHCI controller, so it might be either of these. Bounce buffering should only take place when the memory region can't be accessed directly though - I don't see why that's the case for the given command line.
On Thu, 29 Feb 2024 at 12:52, Mattias Nissler <mnissler@rivosinc.com> wrote: > > On Thu, Feb 29, 2024 at 1:35 PM Peter Maydell <peter.maydell@linaro.org> wrote: > > > > On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt > > <heinrich.schuchardt@canonical.com> wrote: > > > > But yes, I'm not surprised that CXL runs into this. Heinrich, > > > > are you doing CXL testing, or is this some other workload? > > > > > > I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: > > > > > > qemu-system-riscv64 \ > > > -M virt,acpi=off -accel tcg -m 4096 \ > > > -serial mon:stdio \ > > > -device virtio-gpu-pci \ > > > -device qemu-xhci \ > > > -device usb-kbd \ > > > -drive > > > if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ > > > -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ > > > -drive file=sct.img,format=raw,if=virtio \ > > > -device virtio-net-device,netdev=net0 \ > > > -netdev user,id=net0 > > > > > > This does not invoke any CXL related stuff. > > > > Hmm, that doesn't seem like it ought to be running into this. > > What underlying memory region is the guest trying to do > > the virtio queue access to? > > FWIW, I have seen multiple bounce buffer usage with the generic net TX > path as well as the XHCI controller, so it might be either of these. > Bounce buffering should only take place when the memory region can't > be accessed directly though - I don't see why that's the case for the > given command line. Yeah, exactly -- the only thing I can see there where a reasonable guest program ought to be trying to do DMA to/from non-host-RAM would be if (a) it's reading data from directly out of the pflash image and (b) the flash device has got itself into programming mode somehow. But that doesn't happen for the other architectures that have pflash for boot firmware as far as I know. The bounce buffering stuff does need to be fixed but it's also worth looking at why we're hitting the code path at all here, because even if it works the performance is terrible (doubly so if you try to execute code out of that MR). -- PMM
On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote: > On 28.02.24 19:39, Peter Maydell wrote: > > On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt > > <heinrich.schuchardt@canonical.com> wrote: > > > > > > On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > > > > Hi Heinrich, > > > > > > > > On 28/2/24 13:59, Heinrich Schuchardt wrote: > > > > > virtqueue_map_desc() is called with values of sz exceeding that may > > > > > exceed > > > > > TARGET_PAGE_SIZE. sz = 0x2800 has been observed. Pure (and can also be stupid) question: why virtqueue_map_desc() would map to !direct mem? Shouldn't those buffers normally allocated from guest RAM? > > > > > > > > > > We only support a single bounce buffer. We have to avoid > > > > > virtqueue_map_desc() calling address_space_map() multiple times. > > > > > Otherwise > > > > > we see an error > > > > > > > > > > qemu: virtio: bogus descriptor or out of resources > > > > > > > > > > Increase the minimum size of the bounce buffer to 0x10000 which matches > > > > > the largest value of TARGET_PAGE_SIZE for all architectures. > > > > > > > > > > Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > > > > > --- > > > > > v2: > > > > > remove unrelated change > > > > > --- > > > > > system/physmem.c | 8 ++++++-- > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/system/physmem.c b/system/physmem.c > > > > > index e3ebc19eef..3c82da1c86 100644 > > > > > --- a/system/physmem.c > > > > > +++ b/system/physmem.c > > > > > @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > > > > > *plen = 0; > > > > > return NULL; > > > > > } > > > > > - /* Avoid unbounded allocations */ > > > > > - l = MIN(l, TARGET_PAGE_SIZE); > > > > > + /* > > > > > + * There is only one bounce buffer. The largest occuring > > > > > value of > > > > > + * parameter sz of virtqueue_map_desc() must fit into the bounce > > > > > + * buffer. > > > > > + */ > > > > > + l = MIN(l, 0x10000); > > > > > > > > Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > > > > TARGETS_BIGGEST_PAGE_SIZE? > > > > > > > > Then along: > > > > QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > > > > > > Thank you Philippe for reviewing. > > > > > > TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the > > > page size. > > > How about MIN_BOUNCE_BUFFER_SIZE? > > > Is include/exec/memory.h the right include for the constant? > > > > > > I don't think that TARGET_PAGE_SIZE has any relevance for setting the > > > bounce buffer size. I only mentioned it to say that we are not > > > decreasing the value on any existing architecture. > > > > > > I don't know why TARGET_PAGE_SIZE ever got into this piece of code. > > > e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a > > > reason for this choice. Maybe Paolo remembers. > > > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > which was the first implementation of this function. I don't think > > there's a particular reason for that value beyond that it was > > probably a convenient value that was assumed to be likely "big enough". > > > > I think the idea with this bounce-buffer has always been that this > > isn't really a code path we expected to end up in very often -- > > it's supposed to be for when devices are doing DMA, which they > > will typically be doing to memory (backed by host RAM), not > > devices (backed by MMIO and needing a bounce buffer). So the > > whole mechanism is a bit "last fallback to stop things breaking > > entirely". > > > > The address_space_map() API says that it's allowed to return > > a subset of the range you ask for, so if the virtio code doesn't > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > we need to fix that virtio code or we need to change the API > > of this function. (But I think you will also get a reduced > > range if you try to use it across a boundary between normal > > host-memory-backed RAM and a device MemoryRegion.) > > If we allow a bounce buffer only to be used once (via the in_use flag), why > do we allow only a single bounce buffer? > > Could address_space_map() allocate a new bounce buffer on every call and > address_space_unmap() deallocate it? > > Isn't the design with a single bounce buffer bound to fail with a > multi-threaded client as collision can be expected? See: https://lore.kernel.org/r/20240212080617.2559498-1-mnissler@rivosinc.com For some reason that series didn't land, but it seems to be helpful in this case too if e.g. there can be multiple of such devices. Thanks, -- Peter Xu
On 29.02.24 02:11, Peter Xu wrote: > On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote: >> On 28.02.24 19:39, Peter Maydell wrote: >>> On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt >>> <heinrich.schuchardt@canonical.com> wrote: >>>> >>>> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: >>>>> Hi Heinrich, >>>>> >>>>> On 28/2/24 13:59, Heinrich Schuchardt wrote: >>>>>> virtqueue_map_desc() is called with values of sz exceeding that may >>>>>> exceed >>>>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > > Pure (and can also be stupid) question: why virtqueue_map_desc() would map > to !direct mem? Shouldn't those buffers normally allocated from guest RAM? > >>>>>> >>>>>> We only support a single bounce buffer. We have to avoid >>>>>> virtqueue_map_desc() calling address_space_map() multiple times. >>>>>> Otherwise >>>>>> we see an error >>>>>> >>>>>> qemu: virtio: bogus descriptor or out of resources >>>>>> >>>>>> Increase the minimum size of the bounce buffer to 0x10000 which matches >>>>>> the largest value of TARGET_PAGE_SIZE for all architectures. >>>>>> >>>>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> >>>>>> --- >>>>>> v2: >>>>>> remove unrelated change >>>>>> --- >>>>>> system/physmem.c | 8 ++++++-- >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/system/physmem.c b/system/physmem.c >>>>>> index e3ebc19eef..3c82da1c86 100644 >>>>>> --- a/system/physmem.c >>>>>> +++ b/system/physmem.c >>>>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, >>>>>> *plen = 0; >>>>>> return NULL; >>>>>> } >>>>>> - /* Avoid unbounded allocations */ >>>>>> - l = MIN(l, TARGET_PAGE_SIZE); >>>>>> + /* >>>>>> + * There is only one bounce buffer. The largest occuring >>>>>> value of >>>>>> + * parameter sz of virtqueue_map_desc() must fit into the bounce >>>>>> + * buffer. >>>>>> + */ >>>>>> + l = MIN(l, 0x10000); >>>>> >>>>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or >>>>> TARGETS_BIGGEST_PAGE_SIZE? >>>>> >>>>> Then along: >>>>> QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); >>>> >>>> Thank you Philippe for reviewing. >>>> >>>> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the >>>> page size. >>>> How about MIN_BOUNCE_BUFFER_SIZE? >>>> Is include/exec/memory.h the right include for the constant? >>>> >>>> I don't think that TARGET_PAGE_SIZE has any relevance for setting the >>>> bounce buffer size. I only mentioned it to say that we are not >>>> decreasing the value on any existing architecture. >>>> >>>> I don't know why TARGET_PAGE_SIZE ever got into this piece of code. >>>> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a >>>> reason for this choice. Maybe Paolo remembers. >>> >>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009, >>> which was the first implementation of this function. I don't think >>> there's a particular reason for that value beyond that it was >>> probably a convenient value that was assumed to be likely "big enough". >>> >>> I think the idea with this bounce-buffer has always been that this >>> isn't really a code path we expected to end up in very often -- >>> it's supposed to be for when devices are doing DMA, which they >>> will typically be doing to memory (backed by host RAM), not >>> devices (backed by MMIO and needing a bounce buffer). So the >>> whole mechanism is a bit "last fallback to stop things breaking >>> entirely". >>> >>> The address_space_map() API says that it's allowed to return >>> a subset of the range you ask for, so if the virtio code doesn't >>> cope with the minimum being set to TARGET_PAGE_SIZE then either >>> we need to fix that virtio code or we need to change the API >>> of this function. (But I think you will also get a reduced >>> range if you try to use it across a boundary between normal >>> host-memory-backed RAM and a device MemoryRegion.) >> >> If we allow a bounce buffer only to be used once (via the in_use flag), why >> do we allow only a single bounce buffer? >> >> Could address_space_map() allocate a new bounce buffer on every call and >> address_space_unmap() deallocate it? >> >> Isn't the design with a single bounce buffer bound to fail with a >> multi-threaded client as collision can be expected? > > See: > > https://lore.kernel.org/r/20240212080617.2559498-1-mnissler@rivosinc.com > > For some reason that series didn't land, but it seems to be helpful in this > case too if e.g. there can be multiple of such devices. > > Thanks, > Hello Peter Xu, thanks for pointing to your series. What I like about it is that it removes the limit of a single bounce buffer per AddressSpace. Unfortunately it does not solve my problem. You limit the sum of all of the allocations for a single AddressSpcace to DEFAULT_MAX_BOUNCE_BUFFER_SIZE = 4096 which is too small for my use case. Why do we need a limit? Why is it so tiny? Best regards Heinrich
On Thu, 29 Feb 2024 11:22:24 +0100 Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > On 29.02.24 02:11, Peter Xu wrote: > > On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote: > >> On 28.02.24 19:39, Peter Maydell wrote: > >>> On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt > >>> <heinrich.schuchardt@canonical.com> wrote: > >>>> > >>>> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > >>>>> Hi Heinrich, > >>>>> > >>>>> On 28/2/24 13:59, Heinrich Schuchardt wrote: > >>>>>> virtqueue_map_desc() is called with values of sz exceeding that may > >>>>>> exceed > >>>>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > > > > Pure (and can also be stupid) question: why virtqueue_map_desc() would map > > to !direct mem? Shouldn't those buffers normally allocated from guest RAM? > > > >>>>>> > >>>>>> We only support a single bounce buffer. We have to avoid > >>>>>> virtqueue_map_desc() calling address_space_map() multiple times. > >>>>>> Otherwise > >>>>>> we see an error > >>>>>> > >>>>>> qemu: virtio: bogus descriptor or out of resources > >>>>>> > >>>>>> Increase the minimum size of the bounce buffer to 0x10000 which matches > >>>>>> the largest value of TARGET_PAGE_SIZE for all architectures. > >>>>>> > >>>>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > >>>>>> --- > >>>>>> v2: > >>>>>> remove unrelated change > >>>>>> --- > >>>>>> system/physmem.c | 8 ++++++-- > >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/system/physmem.c b/system/physmem.c > >>>>>> index e3ebc19eef..3c82da1c86 100644 > >>>>>> --- a/system/physmem.c > >>>>>> +++ b/system/physmem.c > >>>>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > >>>>>> *plen = 0; > >>>>>> return NULL; > >>>>>> } > >>>>>> - /* Avoid unbounded allocations */ > >>>>>> - l = MIN(l, TARGET_PAGE_SIZE); > >>>>>> + /* > >>>>>> + * There is only one bounce buffer. The largest occuring > >>>>>> value of > >>>>>> + * parameter sz of virtqueue_map_desc() must fit into the bounce > >>>>>> + * buffer. > >>>>>> + */ > >>>>>> + l = MIN(l, 0x10000); > >>>>> > >>>>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > >>>>> TARGETS_BIGGEST_PAGE_SIZE? > >>>>> > >>>>> Then along: > >>>>> QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > >>>> > >>>> Thank you Philippe for reviewing. > >>>> > >>>> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the > >>>> page size. > >>>> How about MIN_BOUNCE_BUFFER_SIZE? > >>>> Is include/exec/memory.h the right include for the constant? > >>>> > >>>> I don't think that TARGET_PAGE_SIZE has any relevance for setting the > >>>> bounce buffer size. I only mentioned it to say that we are not > >>>> decreasing the value on any existing architecture. > >>>> > >>>> I don't know why TARGET_PAGE_SIZE ever got into this piece of code. > >>>> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a > >>>> reason for this choice. Maybe Paolo remembers. > >>> > >>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > >>> which was the first implementation of this function. I don't think > >>> there's a particular reason for that value beyond that it was > >>> probably a convenient value that was assumed to be likely "big enough". > >>> > >>> I think the idea with this bounce-buffer has always been that this > >>> isn't really a code path we expected to end up in very often -- > >>> it's supposed to be for when devices are doing DMA, which they > >>> will typically be doing to memory (backed by host RAM), not > >>> devices (backed by MMIO and needing a bounce buffer). So the > >>> whole mechanism is a bit "last fallback to stop things breaking > >>> entirely". > >>> > >>> The address_space_map() API says that it's allowed to return > >>> a subset of the range you ask for, so if the virtio code doesn't > >>> cope with the minimum being set to TARGET_PAGE_SIZE then either > >>> we need to fix that virtio code or we need to change the API > >>> of this function. (But I think you will also get a reduced > >>> range if you try to use it across a boundary between normal > >>> host-memory-backed RAM and a device MemoryRegion.) > >> > >> If we allow a bounce buffer only to be used once (via the in_use flag), why > >> do we allow only a single bounce buffer? > >> > >> Could address_space_map() allocate a new bounce buffer on every call and > >> address_space_unmap() deallocate it? > >> > >> Isn't the design with a single bounce buffer bound to fail with a > >> multi-threaded client as collision can be expected? > > > > See: > > > > https://lore.kernel.org/r/20240212080617.2559498-1-mnissler@rivosinc.com > > > > For some reason that series didn't land, but it seems to be helpful in this > > case too if e.g. there can be multiple of such devices. > > > > Thanks, > > > > Hello Peter Xu, > > thanks for pointing to your series. What I like about it is that it > removes the limit of a single bounce buffer per AddressSpace. > > Unfortunately it does not solve my problem. You limit the sum of all of > the allocations for a single AddressSpcace to > DEFAULT_MAX_BOUNCE_BUFFER_SIZE = 4096 which is too small for my use case. Agreed. For CXL memory (which indeed hits this with virtio-blk-pci for example) I'm carrying a far from subtle patch [1] on top of the series linked above (which I would very much like to land asap!) Note I also need: https://lore.kernel.org/qemu-devel/20240215142817.1904-1-Jonathan.Cameron@huawei.com/#t [PATCH 0/3] physmem: Fix MemoryRegion for second access to cached MMIO Address Space otherwise the second 8 byte read comes from the wrong address space. That used to accidentally trigger very large reads hence the rather excessive size set below. I haven't checked yet if a smaller size is fine as obviously something as hacky as the below isn't going anywhere! Perhaps we need a rather more global version of x-max-bounce-buffers from Mattias' series? [1] diff --git a/system/physmem.c b/system/physmem.c index 0a838f1b33..10e9d8da86 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2554,6 +2554,7 @@ static void memory_map_init(void) memory_region_init(system_memory, NULL, "system", UINT64_MAX); address_space_init(&address_space_memory, system_memory, "memory"); + address_space_memory.max_bounce_buffer_size = 1024 * 1024 * 1024; system_io = g_malloc(sizeof(*system_io)); memory_region_init_io(system_io, NULL, &unassigned_io_ops, NULL, "io", 65536); > > Why do we need a limit? > Why is it so tiny? > > Best regards > > Heinrich > > > >
On Thu, Feb 29, 2024 at 11:22 AM Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > On 29.02.24 02:11, Peter Xu wrote: > > On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote: > >> On 28.02.24 19:39, Peter Maydell wrote: > >>> On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt > >>> <heinrich.schuchardt@canonical.com> wrote: > >>>> > >>>> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > >>>>> Hi Heinrich, > >>>>> > >>>>> On 28/2/24 13:59, Heinrich Schuchardt wrote: > >>>>>> virtqueue_map_desc() is called with values of sz exceeding that may > >>>>>> exceed > >>>>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > > > > Pure (and can also be stupid) question: why virtqueue_map_desc() would map > > to !direct mem? Shouldn't those buffers normally allocated from guest RAM? > > > >>>>>> > >>>>>> We only support a single bounce buffer. We have to avoid > >>>>>> virtqueue_map_desc() calling address_space_map() multiple times. > >>>>>> Otherwise > >>>>>> we see an error > >>>>>> > >>>>>> qemu: virtio: bogus descriptor or out of resources > >>>>>> > >>>>>> Increase the minimum size of the bounce buffer to 0x10000 which matches > >>>>>> the largest value of TARGET_PAGE_SIZE for all architectures. > >>>>>> > >>>>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > >>>>>> --- > >>>>>> v2: > >>>>>> remove unrelated change > >>>>>> --- > >>>>>> system/physmem.c | 8 ++++++-- > >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/system/physmem.c b/system/physmem.c > >>>>>> index e3ebc19eef..3c82da1c86 100644 > >>>>>> --- a/system/physmem.c > >>>>>> +++ b/system/physmem.c > >>>>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > >>>>>> *plen = 0; > >>>>>> return NULL; > >>>>>> } > >>>>>> - /* Avoid unbounded allocations */ > >>>>>> - l = MIN(l, TARGET_PAGE_SIZE); > >>>>>> + /* > >>>>>> + * There is only one bounce buffer. The largest occuring > >>>>>> value of > >>>>>> + * parameter sz of virtqueue_map_desc() must fit into the bounce > >>>>>> + * buffer. > >>>>>> + */ > >>>>>> + l = MIN(l, 0x10000); > >>>>> > >>>>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > >>>>> TARGETS_BIGGEST_PAGE_SIZE? > >>>>> > >>>>> Then along: > >>>>> QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > >>>> > >>>> Thank you Philippe for reviewing. > >>>> > >>>> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the > >>>> page size. > >>>> How about MIN_BOUNCE_BUFFER_SIZE? > >>>> Is include/exec/memory.h the right include for the constant? > >>>> > >>>> I don't think that TARGET_PAGE_SIZE has any relevance for setting the > >>>> bounce buffer size. I only mentioned it to say that we are not > >>>> decreasing the value on any existing architecture. > >>>> > >>>> I don't know why TARGET_PAGE_SIZE ever got into this piece of code. > >>>> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a > >>>> reason for this choice. Maybe Paolo remembers. > >>> > >>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > >>> which was the first implementation of this function. I don't think > >>> there's a particular reason for that value beyond that it was > >>> probably a convenient value that was assumed to be likely "big enough". > >>> > >>> I think the idea with this bounce-buffer has always been that this > >>> isn't really a code path we expected to end up in very often -- > >>> it's supposed to be for when devices are doing DMA, which they > >>> will typically be doing to memory (backed by host RAM), not > >>> devices (backed by MMIO and needing a bounce buffer). So the > >>> whole mechanism is a bit "last fallback to stop things breaking > >>> entirely". > >>> > >>> The address_space_map() API says that it's allowed to return > >>> a subset of the range you ask for, so if the virtio code doesn't > >>> cope with the minimum being set to TARGET_PAGE_SIZE then either > >>> we need to fix that virtio code or we need to change the API > >>> of this function. (But I think you will also get a reduced > >>> range if you try to use it across a boundary between normal > >>> host-memory-backed RAM and a device MemoryRegion.) > >> > >> If we allow a bounce buffer only to be used once (via the in_use flag), why > >> do we allow only a single bounce buffer? > >> > >> Could address_space_map() allocate a new bounce buffer on every call and > >> address_space_unmap() deallocate it? > >> > >> Isn't the design with a single bounce buffer bound to fail with a > >> multi-threaded client as collision can be expected? > > > > See: > > > > https://lore.kernel.org/r/20240212080617.2559498-1-mnissler@rivosinc.com > > > > For some reason that series didn't land, but it seems to be helpful in this > > case too if e.g. there can be multiple of such devices. > > > > Thanks, > > > > Hello Peter Xu, > > thanks for pointing to your series. What I like about it is that it > removes the limit of a single bounce buffer per AddressSpace. > > Unfortunately it does not solve my problem. You limit the sum of all of > the allocations for a single AddressSpcace to > DEFAULT_MAX_BOUNCE_BUFFER_SIZE = 4096 which is too small for my use case. Note that the limit is configured for address spaces attached to PCI devices with a parameter. > > Why do we need a limit? The rationale is to prevent a guest from allocating unlimited amounts of host memory. > Why is it so tiny? Nobody has come up with a good way to determine a "sufficient" amount that works with all use cases, while at the same time addressing the concern due to malicious guest behavior mentioned above. (Note that I'm merely reciting previous conversations as the author of the series Peter pointed you at) > > Best regards > > Heinrich > > >
© 2016 - 2024 Red Hat, Inc.