Let's query the maximum number of DMA mappings by querying the available
mappings when creating the container.
In addition, count the number of DMA mappings and warn when we would
exceed it. This is a preparation for RamDiscardMgr which might
create quite some DMA mappings over time, and we at least want to warn
early that the QEMU setup might be problematic. Use "reserved"
terminology, so we can use this to reserve mappings before they are
actually created.
Note: don't reserve vIOMMU DMA mappings - using the vIOMMU region size
divided by the mapping page size might be a bad indication of what will
happen in practice - we might end up warning all the time.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
hw/vfio/common.c | 34 ++++++++++++++++++++++++++++++++++
include/hw/vfio/vfio-common.h | 2 ++
2 files changed, 36 insertions(+)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6ff1daa763..5ad88d476f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -288,6 +288,26 @@ const MemoryRegionOps vfio_region_ops = {
},
};
+static void vfio_container_dma_reserve(VFIOContainer *container,
+ unsigned long dma_mappings)
+{
+ bool warned = container->dma_reserved > container->dma_max;
+
+ container->dma_reserved += dma_mappings;
+ if (!warned && container->dma_max &&
+ container->dma_reserved > container->dma_max) {
+ warn_report("%s: possibly running out of DMA mappings. "
+ " Maximum number of DMA mappings: %d", __func__,
+ container->dma_max);
+ }
+}
+
+static void vfio_container_dma_unreserve(VFIOContainer *container,
+ unsigned long dma_mappings)
+{
+ container->dma_reserved -= dma_mappings;
+}
+
/*
* Device state interfaces
*/
@@ -835,6 +855,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
}
}
+ /* We'll need one DMA mapping. */
+ vfio_container_dma_reserve(container, 1);
+
ret = vfio_dma_map(container, iova, int128_get64(llsize),
vaddr, section->readonly);
if (ret) {
@@ -879,6 +902,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
MemoryRegionSection *section)
{
VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+ bool unreserve_on_unmap = true;
hwaddr iova, end;
Int128 llend, llsize;
int ret;
@@ -919,6 +943,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
* based IOMMU where a big unmap flattens a large range of IO-PTEs.
* That may not be true for all IOMMU types.
*/
+ unreserve_on_unmap = false;
}
iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
@@ -970,6 +995,11 @@ static void vfio_listener_region_del(MemoryListener *listener,
"0x%"HWADDR_PRIx") = %d (%m)",
container, iova, int128_get64(llsize), ret);
}
+
+ /* We previously reserved one DMA mapping. */
+ if (unreserve_on_unmap) {
+ vfio_container_dma_unreserve(container, 1);
+ }
}
memory_region_unref(section->mr);
@@ -1735,6 +1765,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
container->fd = fd;
container->error = NULL;
container->dirty_pages_supported = false;
+ container->dma_max = 0;
QLIST_INIT(&container->giommu_list);
QLIST_INIT(&container->hostwin_list);
@@ -1765,7 +1796,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
container->pgsizes = info->iova_pgsizes;
+ /* The default in the kernel ("dma_entry_limit") is 65535. */
+ container->dma_max = 65535;
if (!ret) {
+ vfio_get_info_dma_avail(info, &container->dma_max);
vfio_get_iommu_info_migration(container, info);
}
g_free(info);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 6141162d7a..fed0e85f66 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -88,6 +88,8 @@ typedef struct VFIOContainer {
uint64_t dirty_pgsizes;
uint64_t max_dirty_bitmap_size;
unsigned long pgsizes;
+ unsigned int dma_max;
+ unsigned long dma_reserved;
QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
QLIST_HEAD(, VFIOGroup) group_list;
--
2.29.2
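
For reference, vfio_get_info_dma_avail() used in the last common.c hunk is
introduced by an earlier patch in this series. A minimal sketch of such a
helper, assuming the VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL capability from
linux/vfio.h and QEMU's existing vfio_get_iommu_info_cap() capability-chain
walker, could look like the following (illustrative, not necessarily the
exact code from the series):

static bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
                                    unsigned int *avail)
{
    struct vfio_info_cap_header *hdr;
    struct vfio_iommu_type1_info_dma_avail *cap;

    /* If the capability is not advertised, assume no mapping limit. */
    hdr = vfio_get_iommu_info_cap(info, VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL);
    if (!hdr) {
        return false;
    }

    /* Report how many DMA mappings the type1 IOMMU still has available. */
    cap = (struct vfio_iommu_type1_info_dma_avail *)hdr;
    if (avail) {
        *avail = cap->avail;
    }
    return true;
}

With that, the caller above can keep the kernel default of 65535 as a
fallback and only overwrite dma_max when the kernel actually reports a
value.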
[...]
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
On Wed, 16 Dec 2020 15:11:54 +0100
David Hildenbrand <david@redhat.com> wrote:
> Let's query the maximum number of DMA mappings by querying the available
> mappings when creating the container.
>
> In addition, count the number of DMA mappings and warn when we would
> exceed it. This is a preparation for RamDiscardMgr which might
> create quite some DMA mappings over time, and we at least want to warn
> early that the QEMU setup might be problematic. Use "reserved"
> terminology, so we can use this to reserve mappings before they are
> actually created.
This terminology doesn't make much sense to me, we're not actually
performing any kind of reservation.
> Note: don't reserve vIOMMU DMA mappings - using the vIOMMU region size
> divided by the mapping page size might be a bad indication of what will
> happen in practice - we might end up warning all the time.
This suggests we're not really tracking DMA "reservations" at all.
Would something like dma_regions_mappings be a more appropriate
identifier for the thing you're trying to count? We might as well also
keep a counter for dma_iommu_mappings where the sum of those two should
stay below dma_max_mappings.
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Auger Eric <eric.auger@redhat.com>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: teawater <teawaterz@linux.alibaba.com>
> Cc: Marek Kedzierski <mkedzier@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> hw/vfio/common.c | 34 ++++++++++++++++++++++++++++++++++
> include/hw/vfio/vfio-common.h | 2 ++
> 2 files changed, 36 insertions(+)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6ff1daa763..5ad88d476f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -288,6 +288,26 @@ const MemoryRegionOps vfio_region_ops = {
> },
> };
>
> +static void vfio_container_dma_reserve(VFIOContainer *container,
> + unsigned long dma_mappings)
> +{
> + bool warned = container->dma_reserved > container->dma_max;
> +
> + container->dma_reserved += dma_mappings;
> + if (!warned && container->dma_max &&
> + container->dma_reserved > container->dma_max) {
> + warn_report("%s: possibly running out of DMA mappings. "
> + " Maximum number of DMA mappings: %d", __func__,
> + container->dma_max);
If we kept track of all the mappings we could predict better than
"possibly". Tracing support to track a high water mark might be useful
too.
> + }
> +}
> +
> +static void vfio_container_dma_unreserve(VFIOContainer *container,
> + unsigned long dma_mappings)
> +{
> + container->dma_reserved -= dma_mappings;
> +}
> +
> /*
> * Device state interfaces
> */
> @@ -835,6 +855,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
> }
> }
>
> + /* We'll need one DMA mapping. */
> + vfio_container_dma_reserve(container, 1);
> +
> ret = vfio_dma_map(container, iova, int128_get64(llsize),
> vaddr, section->readonly);
> if (ret) {
> @@ -879,6 +902,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
> MemoryRegionSection *section)
> {
> VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> + bool unreserve_on_unmap = true;
> hwaddr iova, end;
> Int128 llend, llsize;
> int ret;
> @@ -919,6 +943,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
> * based IOMMU where a big unmap flattens a large range of IO-PTEs.
> * That may not be true for all IOMMU types.
> */
> + unreserve_on_unmap = false;
> }
>
> iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> @@ -970,6 +995,11 @@ static void vfio_listener_region_del(MemoryListener *listener,
> "0x%"HWADDR_PRIx") = %d (%m)",
> container, iova, int128_get64(llsize), ret);
> }
> +
> + /* We previously reserved one DMA mapping. */
> + if (unreserve_on_unmap) {
> + vfio_container_dma_unreserve(container, 1);
> + }
> }
>
> memory_region_unref(section->mr);
> @@ -1735,6 +1765,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> container->fd = fd;
> container->error = NULL;
> container->dirty_pages_supported = false;
> + container->dma_max = 0;
> QLIST_INIT(&container->giommu_list);
> QLIST_INIT(&container->hostwin_list);
>
> @@ -1765,7 +1796,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
> container->pgsizes = info->iova_pgsizes;
>
> + /* The default in the kernel ("dma_entry_limit") is 65535. */
> + container->dma_max = 65535;
> if (!ret) {
> + vfio_get_info_dma_avail(info, &container->dma_max);
> vfio_get_iommu_info_migration(container, info);
> }
> g_free(info);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 6141162d7a..fed0e85f66 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -88,6 +88,8 @@ typedef struct VFIOContainer {
> uint64_t dirty_pgsizes;
> uint64_t max_dirty_bitmap_size;
> unsigned long pgsizes;
> + unsigned int dma_max;
> + unsigned long dma_reserved;
If dma_max is unsigned int, why do we need an unsigned long to track
how many are in use? Thanks,
Alex
> QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
> QLIST_HEAD(, VFIOGroup) group_list;
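
Alex's suggestion above about tracing a high-water mark could be wired up
via QEMU's tracing infrastructure. A rough sketch, where the
dma_high_watermark field and the trace event are hypothetical additions (a
matching line would also be needed in hw/vfio/trace-events):

#include "trace.h"

/* Hypothetical: track and trace the peak number of reserved mappings. */
static void vfio_container_dma_update_watermark(VFIOContainer *container)
{
    if (container->dma_reserved > container->dma_high_watermark) {
        container->dma_high_watermark = container->dma_reserved;
        /*
         * Assumes a trace event declared in hw/vfio/trace-events, e.g.:
         * vfio_dma_high_watermark(unsigned long reserved, unsigned int max)
         */
        trace_vfio_dma_high_watermark(container->dma_reserved,
                                      container->dma_max);
    }
}

Calling this from vfio_container_dma_reserve() right after the counter is
bumped would expose peak demand without changing the warning logic.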
On 17.12.20 18:55, Alex Williamson wrote:
> On Wed, 16 Dec 2020 15:11:54 +0100
> David Hildenbrand <david@redhat.com> wrote:
>
>> Let's query the maximum number of DMA mappings by querying the available
>> mappings when creating the container.
>>
>> In addition, count the number of DMA mappings and warn when we would
>> exceed it. This is a preparation for RamDiscardMgr which might
>> create quite some DMA mappings over time, and we at least want to warn
>> early that the QEMU setup might be problematic. Use "reserved"
>> terminology, so we can use this to reserve mappings before they are
>> actually created.
>
> This terminology doesn't make much sense to me, we're not actually
> performing any kind of reservation.
I see you spotted the second user which actually performs reservations.
>
>> Note: don't reserve vIOMMU DMA mappings - using the vIOMMU region size
>> divided by the mapping page size might be a bad indication of what will
>> happen in practice - we might end up warning all the time.
>
> This suggests we're not really tracking DMA "reservations" at all.
> Would something like dma_regions_mappings be a more appropriate
> identifier for the thing you're trying to count? We might as well also
Right now I want to count
- Mappings we know we will definitely have (counted in this patch)
- Mappings we know we could eventually have later (RamDiscardMgr)
> keep a counter for dma_iommu_mappings where the sum of those two should
> stay below dma_max_mappings.
We could, however, tracking active IOMMU mappings when removing a memory
region from the address space isn't easily possible - we do a single
vfio_dma_unmap() which might span multiple mappings. Same applies to
RamDiscardMgr. Hard to count how many mappings we actually *currently*
have using that approach.
>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Auger Eric <eric.auger@redhat.com>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: teawater <teawaterz@linux.alibaba.com>
>> Cc: Marek Kedzierski <mkedzier@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> hw/vfio/common.c | 34 ++++++++++++++++++++++++++++++++++
>> include/hw/vfio/vfio-common.h | 2 ++
>> 2 files changed, 36 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 6ff1daa763..5ad88d476f 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -288,6 +288,26 @@ const MemoryRegionOps vfio_region_ops = {
>> },
>> };
>>
>> +static void vfio_container_dma_reserve(VFIOContainer *container,
>> + unsigned long dma_mappings)
>> +{
>> + bool warned = container->dma_reserved > container->dma_max;
>> +
>> + container->dma_reserved += dma_mappings;
>> + if (!warned && container->dma_max &&
>> + container->dma_reserved > container->dma_max) {
>> + warn_report("%s: possibly running out of DMA mappings. "
>> + " Maximum number of DMA mappings: %d", __func__,
>> + container->dma_max);
>
> If we kept track of all the mappings we could predict better than
> "possibly". Tracing support to track a high water mark might be useful
> too.
It's an early warning for reservations e.g., on fundamental setup issues
with virtio-mem.
[...]
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 6141162d7a..fed0e85f66 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -88,6 +88,8 @@ typedef struct VFIOContainer {
>> uint64_t dirty_pgsizes;
>> uint64_t max_dirty_bitmap_size;
>> unsigned long pgsizes;
>> + unsigned int dma_max;
>> + unsigned long dma_reserved;
>
> If dma_max is unsigned int, why do we need an unsigned long to track
> how many are in use? Thanks,
My thinking was that some vfio IOMMU types don't have such a limit
(dma_max == -1) and could allow for more. But thinking again, such a
huge number of mappings is highly unrealistic :)
--
Thanks,
David / dhildenb
On Thu, 17 Dec 2020 20:04:28 +0100
David Hildenbrand <david@redhat.com> wrote:
> On 17.12.20 18:55, Alex Williamson wrote:
> > On Wed, 16 Dec 2020 15:11:54 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >
> >> Let's query the maximum number of DMA mappings by querying the available
> >> mappings when creating the container.
> >>
> >> In addition, count the number of DMA mappings and warn when we would
> >> exceed it. This is a preparation for RamDiscardMgr which might
> >> create quite some DMA mappings over time, and we at least want to warn
> >> early that the QEMU setup might be problematic. Use "reserved"
> >> terminology, so we can use this to reserve mappings before they are
> >> actually created.
> >
> > This terminology doesn't make much sense to me, we're not actually
> > performing any kind of reservation.
>
> I see you spotted the second user which actually performs reservations.
>
> >
> >> Note: don't reserve vIOMMU DMA mappings - using the vIOMMU region size
> >> divided by the mapping page size might be a bad indication of what will
> >> happen in practice - we might end up warning all the time.
> >
> > This suggests we're not really tracking DMA "reservations" at all.
> > Would something like dma_regions_mappings be a more appropriate
> > identifier for the thing you're trying to count? We might as well also
>
> Right now I want to count
> - Mappings we know we will definitely have (counted in this patch)
> - Mappings we know we could eventually have later (RamDiscardMgr)
>
> > keep a counter for dma_iommu_mappings where the sum of those two should
> > stay below dma_max_mappings.
>
> We could, however, tracking active IOMMU mappings when removing a memory
> region from the address space isn't easily possible - we do a single
> vfio_dma_unmap() which might span multiple mappings. Same applies to
> RamDiscardMgr. Hard to count how many mappings we actually *currently*
> have using that approach.
It's actually easy for the RamDiscardMgr regions, the unmap ioctl
returns the total size of the unmapped extents. Therefore since we
only map granule sized extents, simple math should tell us how many
entries we freed. OTOH, if there are other ways that we unmap multiple
mappings where we don't have those semantics, then it would be
prohibitive.
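
A sketch of the accounting Alex describes, assuming the kernel's type1
backend updates the size field of struct vfio_iommu_type1_dma_unmap to the
number of bytes it actually unmapped, and that every mapping in the range
was created at a single granule size (the helper name is made up for
illustration):

static int vfio_dma_unmap_counted(VFIOContainer *container, hwaddr iova,
                                  hwaddr size, hwaddr granule,
                                  unsigned long *entries_freed)
{
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .flags = 0,
        .iova = iova,
        .size = size,
    };

    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
        return -errno;
    }

    /* The kernel reports the total bytes unmapped across all extents. */
    *entries_freed = unmap.size / granule;
    return 0;
}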
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> Cc: Igor Mammedov <imammedo@redhat.com>
> >> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Cc: Auger Eric <eric.auger@redhat.com>
> >> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> >> Cc: teawater <teawaterz@linux.alibaba.com>
> >> Cc: Marek Kedzierski <mkedzier@redhat.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >> hw/vfio/common.c | 34 ++++++++++++++++++++++++++++++++++
> >> include/hw/vfio/vfio-common.h | 2 ++
> >> 2 files changed, 36 insertions(+)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 6ff1daa763..5ad88d476f 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -288,6 +288,26 @@ const MemoryRegionOps vfio_region_ops = {
> >> },
> >> };
> >>
> >> +static void vfio_container_dma_reserve(VFIOContainer *container,
> >> + unsigned long dma_mappings)
> >> +{
> >> + bool warned = container->dma_reserved > container->dma_max;
> >> +
> >> + container->dma_reserved += dma_mappings;
> >> + if (!warned && container->dma_max &&
> >> + container->dma_reserved > container->dma_max) {
> >> + warn_report("%s: possibly running out of DMA mappings. "
> >> + " Maximum number of DMA mappings: %d", __func__,
> >> + container->dma_max);
> >
> > If we kept track of all the mappings we could predict better than
> > "possibly". Tracing support to track a high water mark might be useful
> > too.
>
> It's an early warning for reservations e.g., on fundamental setup issues
> with virtio-mem.
>
> [...]
>
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index 6141162d7a..fed0e85f66 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -88,6 +88,8 @@ typedef struct VFIOContainer {
> >> uint64_t dirty_pgsizes;
> >> uint64_t max_dirty_bitmap_size;
> >> unsigned long pgsizes;
> >> + unsigned int dma_max;
> >> + unsigned long dma_reserved;
> >
> > If dma_max is unsigned int, why do we need an unsigned long to track
> > how many are in use? Thanks,
>
> My thinking was that some vfio IOMMU types don't have such a limit
> (dma_max == -1) and could allow for more. But thinking again, such a
> huge number of mappings is highly unrealistic :)
Yeah, that's why it's only an unsigned int from the kernel, even with
4K mappings we could handle 16TB worth of individual 4K mappings.
Thanks,
Alex
On 17.12.20 20:47, Alex Williamson wrote:
[...]
> It's actually easy for the RamDiscardMgr regions, the unmap ioctl
> returns the total size of the unmapped extents. Therefore since we
> only map granule sized extents, simple math should tell us how many
> entries we freed. OTOH, if there are other ways that we unmap multiple
> mappings where we don't have those semantics, then it would be
> prohibitive.

So, I decided to not track the number of current mappings for now, but
instead use your suggestion to sanity-check via

1. Consulting kvm_get_max_memslots() to guess how many DMA mappings we
   might have in the worst case apart from the one via RamDiscardMgr.

2. Calculating the maximum number of DMA mappings that could be consumed
   by RamDiscardMgr in the given setup by looking at all entries in the
   vrdl list.

Looks much cleaner now.

--
Thanks,

David / dhildenb
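
A rough sketch of that sanity check (field names such as vrdl_list, size
and granularity mirror what the RamDiscardMgr integration is expected to
add and are assumptions here, as is the fallback memslot count for non-KVM
setups):

static void vfio_dma_mapping_sanity_check(VFIOContainer *container)
{
    unsigned long max_memslots = kvm_enabled() ? kvm_get_max_memslots() : 512;
    unsigned long vrdl_mappings = 0, vrdl_count = 0;
    VFIORamDiscardListener *vrdl;

    /* Worst case: every RamDiscardMgr granule becomes a separate mapping. */
    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
        vrdl_mappings += vrdl->size / vrdl->granularity;
        vrdl_count++;
    }

    /*
     * Memslot-backed sections need one mapping each, except the ones
     * already accounted for via their RamDiscardMgr.
     */
    if (container->dma_max &&
        vrdl_mappings + max_memslots - vrdl_count > container->dma_max) {
        warn_report("%s: possibly running out of DMA mappings. "
                    "Maximum number of DMA mappings: %u", __func__,
                    container->dma_max);
    }
}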
> On 17.12.2020 at 20:04, David Hildenbrand <david@redhat.com> wrote:
> [...]
> We could, however, tracking active IOMMU mappings when removing a memory
> region from the address space isn't easily possible - we do a single
> vfio_dma_unmap() which might span multiple mappings. Same applies to
> RamDiscardMgr. Hard to count how many mappings we actually *currently*
> have using that approach.

Thinking about it, might actually be possible when tracking active
mappings per iommu / ram discard mgr as well. Will have a look in the
new year - thanks.