When quickly unmapping and mapping memory regions (as may happen in
address_space_update_topology), if running with a non-unlimited
RLIMIT_MEMLOCK, the kernel may return ENOMEM for a map request
because the previous unmap has been processed, but not yet accounted.

This should probably be fixed in the kernel, ensuring deterministic
behavior for VFIO map and unmap operations. Until then, this patch
works around the issue by waiting 10ms and trying again.
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
hw/vfio/common.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f3ba9b9..db41fa5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -228,17 +228,32 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
}
- /*
- * Try the mapping, if it fails with EBUSY, unmap the region and try
- * again. This shouldn't be necessary, but we sometimes see it in
- * the VGA ROM space.
- */
- if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
- (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
- ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
+ if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
return 0;
}
+ if (errno == ENOMEM) {
+ /*
+ * When quickly unmapping and mapping ranges, the kernel may
+ * return ENOMEM for a map request because the previous unmap
+ * has not been accounted yet. Wait a bit and try again.
+ */
+ usleep(10 * 1000);
+ if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
+ return 0;
+ }
+ } else if (errno == EBUSY) {
+ /*
+ * If mapping fails with EBUSY, unmap the region and try again.
+ * This shouldn't be necessary, but we sometimes see it in the
+ * VGA ROM space.
+ */
+ if (vfio_dma_unmap(container, iova, size) == 0 &&
+ ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
+ return 0;
+ }
+ }
+
error_report("VFIO_MAP_DMA: %d", -errno);
return -errno;
}
--
2.9.3
On Mon, 3 Apr 2017 10:58:22 +0200
Sergio Lopez <slp@redhat.com> wrote:

> When quickly unmapping and mapping memory regions (as may happen in
> address_space_update_topology), if running with a non-unlimited
> RLIMIT_MEMLOCK, the kernel may return ENOMEM for a map request
> because the previous unmap has been processed, but not yet accounted.
>
> Probably this should be fixed in the kernel ensuring a deterministic
> behavior for VFIO map and unmap operations. Until then, this works
> around the issue, waiting 10ms and trying again.

I think we need to know what that kernel fix is before adding arbitrary
delays and retries in userspace code (Do we know why 10ms works? Is it
too long or too short?).

I think I have a test program that reproduces this: I set up vfio and
allocate two 4k buffers, one for mapping through vfio and one for
mlocking. I clone(2) the process with CLONE_VM and the clone loops
doing mlock/munlock while the main thread does map/unmap. This fails in
a fraction of a second, while running either loop independently works
fine. Still investigating. Thanks,

Alex
On Mon, Apr 3, 2017 at 5:40 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> I think we need to know what that kernel fix is before adding arbitrary
> delays and retries in userspace code (Do we know why 10ms works? Is it
> too long or too short?).

AFAIK from userspace we can't know when a certain work item scheduled on
a kernel workqueue has completed. Calling usleep ensures the process
will yield, and 10ms looks like enough time for a full round of context
switches, but I agree with you that it's pretty arbitrary.

On the other hand, this code is only reached in a pretty exceptional
situation, which is not relevant from a performance point of view, and
there's already a workaround for a non-deterministic EBUSY while mapping
the VGA ROM space.

There's the option of leaving this as is and waiting for a fix in the
kernel, but I think it'd be a good idea to work around the issue for
older kernels too.

Sergio.