[Qemu-devel] [PATCH] vfio: If DMA map returns ENOMEM wait and try again

Sergio Lopez posted 1 patch 7 years ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20170403085822.13863-1-slp@redhat.com
Posted by Sergio Lopez 7 years ago
When quickly unmapping and mapping memory regions (as may happen in
address_space_update_topology), if running with a finite
RLIMIT_MEMLOCK, the kernel may return ENOMEM for a map request
because the previous unmap has been processed, but not accounted yet.

This should probably be fixed in the kernel by ensuring deterministic
behavior for VFIO map and unmap operations. Until then, this patch
works around the issue by waiting 10ms and trying again.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/vfio/common.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f3ba9b9..db41fa5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -228,17 +228,32 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }
 
-    /*
-     * Try the mapping, if it fails with EBUSY, unmap the region and try
-     * again.  This shouldn't be necessary, but we sometimes see it in
-     * the VGA ROM space.
-     */
-    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
-         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
         return 0;
     }
 
+    if (errno == ENOMEM) {
+        /*
+         * When quickly unmapping and mapping ranges, the kernel may
+         * return ENOMEM for a map request because the previous unmap
+         * has not been accounted yet. Wait a bit and try again.
+         */
+        usleep(10 * 1000);
+        if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
+            return 0;
+        }
+    } else if (errno == EBUSY) {
+        /*
+         * If mapping fails with EBUSY, unmap the region and try again.
+         * This shouldn't be necessary, but we sometimes see it in the
+         * VGA ROM space.
+         */
+        if (vfio_dma_unmap(container, iova, size) == 0 &&
+            ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
+            return 0;
+        }
+    }
+
     error_report("VFIO_MAP_DMA: %d", -errno);
     return -errno;
 }
-- 
2.9.3
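
The failure mode described in the commit message can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: the
helper names (memlock_is_limited, memlock_would_fit, read_memlock_limit) are
hypothetical, and the byte-based check is a simplification of whatever
accounting the kernel actually performs. It shows why a map request of the
same size as a just-unmapped region can be rejected while the old pages are
still counted against RLIMIT_MEMLOCK.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Reads the current locked-memory limit; returns 0 on success, -1 with
 * errno set, exactly as getrlimit(2) does. */
static int read_memlock_limit(struct rlimit *rl)
{
    return getrlimit(RLIMIT_MEMLOCK, rl);
}

/* True when RLIMIT_MEMLOCK is finite, the only case the patch concerns. */
static bool memlock_is_limited(const struct rlimit *rl)
{
    return rl->rlim_cur != RLIM_INFINITY;
}

/*
 * Simplified model (an assumption, not kernel code): a request for `bytes`
 * fits only if it stays under the soft limit on top of what is currently
 * accounted as locked. If a previous unmap has been processed but its
 * pages are still included in `accounted`, a same-sized map is rejected.
 */
static bool memlock_would_fit(const struct rlimit *rl, rlim_t accounted,
                              rlim_t bytes)
{
    if (!memlock_is_limited(rl)) {
        return true;
    }
    return accounted <= rl->rlim_cur && bytes <= rl->rlim_cur - accounted;
}
```

With a 64 KiB limit, memlock_would_fit(&rl, 0, 65536) holds, while
memlock_would_fit(&rl, 65536, 65536) does not: that second call corresponds
to remapping before the earlier unmap has been accounted.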


Re: [Qemu-devel] [PATCH] vfio: If DMA map returns ENOMEM wait and try again
Posted by Alex Williamson 7 years ago
On Mon,  3 Apr 2017 10:58:22 +0200
Sergio Lopez <slp@redhat.com> wrote:

> When quickly unmapping and mapping memory regions (as may happen in
> address_space_update_topology), if running with a finite
> RLIMIT_MEMLOCK, the kernel may return ENOMEM for a map request
> because the previous unmap has been processed, but not accounted yet.
> 
> This should probably be fixed in the kernel by ensuring deterministic
> behavior for VFIO map and unmap operations. Until then, this patch
> works around the issue by waiting 10ms and trying again.

I think we need to know what that kernel fix is before adding arbitrary
delays and retries in userspace code (Do we know why 10ms works?  Is
it too long/short?).  I think I have a test program that reproduces
this: I set up vfio and allocate two 4k buffers, one for mapping through
vfio and one for mlocking.  I clone(2) the process with CLONE_VM and the
clone loops doing mlock/munlock while the main thread does map/unmap.
This fails in a fraction of a second, while running either loop
independently works well.  Still investigating.  Thanks,

Alex
 


Re: [Qemu-devel] [PATCH] vfio: If DMA map returns ENOMEM wait and try again
Posted by Sergio Lopez Pascual 7 years ago
On Mon, Apr 3, 2017 at 5:40 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Mon,  3 Apr 2017 10:58:22 +0200
> Sergio Lopez <slp@redhat.com> wrote:
>
> > When quickly unmapping and mapping memory regions (as may happen in
> > address_space_update_topology), if running with a finite
> > RLIMIT_MEMLOCK, the kernel may return ENOMEM for a map request
> > because the previous unmap has been processed, but not accounted yet.
> >
> > This should probably be fixed in the kernel by ensuring deterministic
> > behavior for VFIO map and unmap operations. Until then, this patch
> > works around the issue by waiting 10ms and trying again.
>
> I think we need to know what that kernel fix is before adding arbitrary
> delays and retries in userspace code (Do we know why 10ms works?  Is
> it too long/short?).

AFAIK, from userspace we can't know when a given work item scheduled in
a kernel workqueue has completed. Calling usleep ensures the
process will yield, and 10ms looks like enough time for plenty of
context switches, but I agree with you that it's pretty arbitrary.

On the other hand, this code is only reached in a pretty exceptional
situation, which is not relevant from a performance point of view, and
there's already a workaround for a non-deterministic EBUSY while
mapping VGA ROM space.

There's the option of leaving this as is and waiting for a fix in the
kernel, but I think it'd be a good idea to work around the issue for
older kernels too.

Sergio.
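
One way to make the delay less arbitrary than a single fixed 10ms sleep is a
bounded retry with a growing delay. The following is a minimal sketch, not
what the patch does: map_retry_enomem, map_op_fn, struct fake_map and
fake_map_op are hypothetical names, the doubling backoff is an assumption,
and the callback stands in for the ioctl(container->fd, VFIO_IOMMU_MAP_DMA,
&map) call so that the logic can be exercised without a VFIO device.

```c
#include <errno.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, -1 with errno set on
 * failure, mirroring the ioctl(2) convention. */
typedef int (*map_op_fn)(void *opaque);

/*
 * Call op() until it succeeds, fails with something other than ENOMEM,
 * or max_retries retries have been used. The delay starts at delay_ns
 * and doubles after each failed attempt. Returns 0 or a negative errno.
 */
static int map_retry_enomem(map_op_fn op, void *opaque,
                            unsigned max_retries, long delay_ns)
{
    unsigned attempt;

    for (attempt = 0; ; attempt++) {
        int err;

        if (op(opaque) == 0) {
            return 0;
        }
        err = errno;    /* save it: nanosleep() may overwrite errno */
        if (err != ENOMEM || attempt == max_retries) {
            return -err;
        }
        struct timespec ts = {
            .tv_sec = delay_ns / 1000000000L,
            .tv_nsec = delay_ns % 1000000000L,
        };
        nanosleep(&ts, NULL);
        delay_ns *= 2;
    }
}

/* Test stand-in for the kernel: fails `failures` times with `fail_errno`,
 * then succeeds. `calls` counts how often it was invoked. */
struct fake_map {
    int failures;
    int fail_errno;
    int calls;
};

static int fake_map_op(void *opaque)
{
    struct fake_map *f = opaque;

    f->calls++;
    if (f->failures > 0) {
        f->failures--;
        errno = f->fail_errno;
        return -1;
    }
    return 0;
}
```

With a fake that fails twice with ENOMEM, map_retry_enomem(fake_map_op, &f,
3, 1000) succeeds on the third call. A failure with EBUSY is returned
immediately, leaving the existing unmap-and-retry path to handle it. This
still does not answer how long the kernel needs, so it only bounds the wait
rather than making it deterministic.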