target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

[PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Vivek Kasireddy 2 years ago

Recent updates in OVMF and Seabios have resulted in MMIO regions
being placed at the upper end of the physical address space. As a
result, when a Host device is assigned to the Guest via VFIO, the
following mapping failures occur when VFIO tries to map the MMIO
regions of the device:
VFIO_MAP_DMA failed: Invalid argument
vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000, 0x7f98ac400000) = -22 (Invalid argument)

The above failures are mainly seen on some Intel platforms where
the physical address width is larger than the Host's IOMMU
address width. In these cases, VFIO fails to map the MMIO regions
because the IOVAs would be larger than the IOMMU aperture regions.

Therefore, one way to solve this problem would be to ensure that
cpu->phys_bits = <IOMMU phys_bits>
This can be done by parsing the IOMMU caps value from sysfs and
extracting the address width and using it to override the
phys_bits value as shown in this patch.

Previous attempt at solving this issue in OVMF:
https://edk2.groups.io/g/devel/topic/102359124

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cédric Le Goater <clg@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>

---
v2:
- Replace the term passthrough with assigned (Laszlo)
- Update the commit message to note that both OVMF and Seabios
  guests are affected (Cédric)
- Update the subject to indicate what is done in the patch
---
 target/i386/host-cpu.c | 61 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index 92ecb7254b..5c9fcd7dc2 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -12,6 +12,8 @@
 #include "host-cpu.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "qemu/config-file.h"
+#include "qemu/option.h"
 #include "sysemu/sysemu.h"
 
 /* Note: Only safe for use on x86(-64) hosts */
@@ -51,11 +53,58 @@ static void host_cpu_enable_cpu_pm(X86CPU *cpu)
     env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
 }
 
+static int intel_iommu_check(void *opaque, QemuOpts *opts, Error **errp)
+{
+    g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
+    const char *driver = qemu_opt_get(opts, "driver");
+    const char *device = qemu_opt_get(opts, "host");
+    uint32_t *iommu_phys_bits = opaque;
+    struct stat st;
+    uint64_t iommu_caps;
+
+    /*
+     * Check if the user requested VFIO device assignment. We don't have
+     * to limit phys_bits if there are no valid assigned devices.
+     */
+    if (g_strcmp0(driver, "vfio-pci") || !device) {
+        return 0;
+    }
+
+    dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
+    if (stat(dev_path, &st) < 0) {
+        return 0;
+    }
+
+    iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path);
+    if (stat(iommu_path, &st) < 0) {
+        return 0;
+    }
+
+    if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
+        if (sscanf(caps, "%lx", &iommu_caps) != 1) {
+            return 0;
+        }
+        *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
+    }
+
+    return 0;
+}
+
+static uint32_t host_iommu_phys_bits(void)
+{
+    uint32_t iommu_phys_bits = 0;
+
+    qemu_opts_foreach(qemu_find_opts("device"),
+                      intel_iommu_check, &iommu_phys_bits, NULL);
+    return iommu_phys_bits;
+}
+
 static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
 {
     uint32_t host_phys_bits = host_cpu_phys_bits();
+    uint32_t iommu_phys_bits = host_iommu_phys_bits();
     uint32_t phys_bits = cpu->phys_bits;
-    static bool warned;
+    static bool warned, warned2;
 
     /*
      * Print a warning if the user set it to a value that's not the
@@ -78,6 +127,16 @@ static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
         }
     }
 
+    if (iommu_phys_bits && phys_bits > iommu_phys_bits) {
+        phys_bits = iommu_phys_bits;
+        if (!warned2) {
+            warn_report("Using physical bits (%u)"
+                        " to prevent VFIO mapping failures",
+                        iommu_phys_bits);
+            warned2 = true;
+        }
+    }
+
     return phys_bits;
 }
 
-- 
2.39.2

Re: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Eric Auger 2 years ago

Hi Vivek,

On 1/18/24 20:20, Vivek Kasireddy wrote:
> Recent updates in OVMF and Seabios have resulted in MMIO regions
> being placed at the upper end of the physical address space. As a
> result, when a Host device is assigned to the Guest via VFIO, the
> following mapping failures occur when VFIO tries to map the MMIO
> regions of the device:
> VFIO_MAP_DMA failed: Invalid argument
> vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000, 0x7f98ac400000) = -22 (Invalid argument)
> 
> The above failures are mainly seen on some Intel platforms where
> the physical address width is larger than the Host's IOMMU
> address width. In these cases, VFIO fails to map the MMIO regions
> because the IOVAs would be larger than the IOMMU aperture regions.
> 
> Therefore, one way to solve this problem would be to ensure that
> cpu->phys_bits = <IOMMU phys_bits>
> This can be done by parsing the IOMMU caps value from sysfs and
> extracting the address width and using it to override the
> phys_bits value as shown in this patch.
> 
> Previous attempt at solving this issue in OVMF:
> https://edk2.groups.io/g/devel/topic/102359124
> 
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Cédric Le Goater <clg@redhat.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Dongwon Kim <dongwon.kim@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> Tested-by: Yanghang Liu <yanghliu@redhat.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> 
> ---
> v2:
> - Replace the term passthrough with assigned (Laszlo)
> - Update the commit message to note that both OVMF and Seabios
>   guests are affected (Cédric)
> - Update the subject to indicate what is done in the patch
> ---
>  target/i386/host-cpu.c | 61 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
> index 92ecb7254b..5c9fcd7dc2 100644
> --- a/target/i386/host-cpu.c
> +++ b/target/i386/host-cpu.c
> @@ -12,6 +12,8 @@
>  #include "host-cpu.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
> +#include "qemu/config-file.h"
> +#include "qemu/option.h"
>  #include "sysemu/sysemu.h"
>  
>  /* Note: Only safe for use on x86(-64) hosts */
> @@ -51,11 +53,58 @@ static void host_cpu_enable_cpu_pm(X86CPU *cpu)
>      env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>  }
>  
> +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error **errp)
> +{
> +    g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
> +    const char *driver = qemu_opt_get(opts, "driver");
> +    const char *device = qemu_opt_get(opts, "host");
> +    uint32_t *iommu_phys_bits = opaque;
> +    struct stat st;
> +    uint64_t iommu_caps;
> +
> +    /*
> +     * Check if the user requested VFIO device assignment. We don't have
> +     * to limit phys_bits if there are no valid assigned devices.
> +     */
> +    if (g_strcmp0(driver, "vfio-pci") || !device) {
> +        return 0;
> +    }
> +
> +    dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
> +    if (stat(dev_path, &st) < 0) {
> +        return 0;
> +    }
> +
> +    iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path);
> +    if (stat(iommu_path, &st) < 0) {
> +        return 0;
> +    }
> +
> +    if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
> +        if (sscanf(caps, "%lx", &iommu_caps) != 1) {
> +            return 0;
> +        }
> +        *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static uint32_t host_iommu_phys_bits(void)
> +{
> +    uint32_t iommu_phys_bits = 0;
> +
> +    qemu_opts_foreach(qemu_find_opts("device"),
> +                      intel_iommu_check, &iommu_phys_bits, NULL);
> +    return iommu_phys_bits;
> +}
> +
>  static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>  {
>      uint32_t host_phys_bits = host_cpu_phys_bits();
> +    uint32_t iommu_phys_bits = host_iommu_phys_bits();
>      uint32_t phys_bits = cpu->phys_bits;
> -    static bool warned;
> +    static bool warned, warned2;
>  
>      /*
>       * Print a warning if the user set it to a value that's not the
> @@ -78,6 +127,16 @@ static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>          }
>      }
>  
> +    if (iommu_phys_bits && phys_bits > iommu_phys_bits) {
> +        phys_bits = iommu_phys_bits;
are you allowed to change the host cpu characteristics without taking
care of compats for migration?

I don't know anything about OVMF but why isn't the "cap" not directly
read in the FW stack?

Thanks

Eric
> +        if (!warned2) {
> +            warn_report("Using physical bits (%u)"
> +                        " to prevent VFIO mapping failures",
> +                        iommu_phys_bits);
> +            warned2 = true;
> +        }
> +    }
> +
>      return phys_bits;
>  }
>

Re: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Alex Williamson 2 years ago

On Thu, 25 Jan 2024 09:18:02 +0100
Eric Auger <eauger@redhat.com> wrote:

> Hi Vivek,
> 
> On 1/18/24 20:20, Vivek Kasireddy wrote:
> > Recent updates in OVMF and Seabios have resulted in MMIO regions
> > being placed at the upper end of the physical address space. As a
> > result, when a Host device is assigned to the Guest via VFIO, the
> > following mapping failures occur when VFIO tries to map the MMIO
> > regions of the device:
> > VFIO_MAP_DMA failed: Invalid argument
> > vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000, 0x7f98ac400000) = -22 (Invalid argument)
> > 
> > The above failures are mainly seen on some Intel platforms where
> > the physical address width is larger than the Host's IOMMU
> > address width. In these cases, VFIO fails to map the MMIO regions
> > because the IOVAs would be larger than the IOMMU aperture regions.
> > 
> > Therefore, one way to solve this problem would be to ensure that
> > cpu->phys_bits = <IOMMU phys_bits>
> > This can be done by parsing the IOMMU caps value from sysfs and
> > extracting the address width and using it to override the
> > phys_bits value as shown in this patch.
> > 
> > Previous attempt at solving this issue in OVMF:
> > https://edk2.groups.io/g/devel/topic/102359124
> > 
> > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Cédric Le Goater <clg@redhat.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Cc: Dongwon Kim <dongwon.kim@intel.com>
> > Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> > Tested-by: Yanghang Liu <yanghliu@redhat.com>
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > 
> > ---
> > v2:
> > - Replace the term passthrough with assigned (Laszlo)
> > - Update the commit message to note that both OVMF and Seabios
> >   guests are affected (Cédric)
> > - Update the subject to indicate what is done in the patch
> > ---
> >  target/i386/host-cpu.c | 61 +++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 60 insertions(+), 1 deletion(-)
> > 
> > diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
> > index 92ecb7254b..5c9fcd7dc2 100644
> > --- a/target/i386/host-cpu.c
> > +++ b/target/i386/host-cpu.c
> > @@ -12,6 +12,8 @@
> >  #include "host-cpu.h"
> >  #include "qapi/error.h"
> >  #include "qemu/error-report.h"
> > +#include "qemu/config-file.h"
> > +#include "qemu/option.h"
> >  #include "sysemu/sysemu.h"
> >  
> >  /* Note: Only safe for use on x86(-64) hosts */
> > @@ -51,11 +53,58 @@ static void host_cpu_enable_cpu_pm(X86CPU *cpu)
> >      env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> >  }
> >  
> > +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error **errp)
> > +{
> > +    g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
> > +    const char *driver = qemu_opt_get(opts, "driver");
> > +    const char *device = qemu_opt_get(opts, "host");
> > +    uint32_t *iommu_phys_bits = opaque;
> > +    struct stat st;
> > +    uint64_t iommu_caps;
> > +
> > +    /*
> > +     * Check if the user requested VFIO device assignment. We don't have
> > +     * to limit phys_bits if there are no valid assigned devices.
> > +     */
> > +    if (g_strcmp0(driver, "vfio-pci") || !device) {
> > +        return 0;
> > +    }
> > +
> > +    dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
> > +    if (stat(dev_path, &st) < 0) {
> > +        return 0;
> > +    }
> > +
> > +    iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path);
> > +    if (stat(iommu_path, &st) < 0) {
> > +        return 0;
> > +    }
> > +
> > +    if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
> > +        if (sscanf(caps, "%lx", &iommu_caps) != 1) {
> > +            return 0;
> > +        }
> > +        *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static uint32_t host_iommu_phys_bits(void)
> > +{
> > +    uint32_t iommu_phys_bits = 0;
> > +
> > +    qemu_opts_foreach(qemu_find_opts("device"),
> > +                      intel_iommu_check, &iommu_phys_bits, NULL);
> > +    return iommu_phys_bits;
> > +}
> > +
> >  static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
> >  {
> >      uint32_t host_phys_bits = host_cpu_phys_bits();
> > +    uint32_t iommu_phys_bits = host_iommu_phys_bits();
> >      uint32_t phys_bits = cpu->phys_bits;
> > -    static bool warned;
> > +    static bool warned, warned2;
> >  
> >      /*
> >       * Print a warning if the user set it to a value that's not the
> > @@ -78,6 +127,16 @@ static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
> >          }
> >      }
> >  
> > +    if (iommu_phys_bits && phys_bits > iommu_phys_bits) {
> > +        phys_bits = iommu_phys_bits;  
> are you allowed to change the host cpu characteristics without taking
> care of compats for migration?

Not only is migration an issue, but so is hotplug.  Anything that
aligns the VM configuration to the host is going to have migration
issues and anything that does so conditionally based on the current set
of attached devices will have hotplug issues.

It'd be more prudent to find the least common denominator under
/sys/class/iommu/dmar*/intel-iommu/cap regardless of attached devices,
but it still affects migration compatibility between hosts.

Also note that vfio-pci can specify the device with either host= or
sysfsdev= and with vfio cdev support and iommufd the file descriptor
for the vfio device might be passed via SCM rights, so we may not have
a reference to it in sysfs.  The above appears to only work with the
host= device specification.  Thanks,

Alex

RE: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Tian, Kevin 2 years ago

> From: Alex Williamson
> Sent: Friday, January 26, 2024 5:20 AM
> 
> On Thu, 25 Jan 2024 09:18:02 +0100
> Eric Auger <eauger@redhat.com> wrote:
> 
> > Hi Vivek,
> >
> > On 1/18/24 20:20, Vivek Kasireddy wrote:
> > >
> > > +    if (iommu_phys_bits && phys_bits > iommu_phys_bits) {
> > > +        phys_bits = iommu_phys_bits;
> > are you allowed to change the host cpu characteristics without taking
> > care of compats for migration?
> 
> Not only is migration an issue, but so is hotplug.  Anything that
> aligns the VM configuration to the host is going to have migration
> issues and anything that does so conditionally based on the current set
> of attached devices will have hotplug issues.

Does it make more sense to print out a warning so it's admin to figure
out the proper phys_bits workable with the iommu restriction, instead of
forcing/hard-coding any value implicitly?

> 
> It'd be more prudent to find the least common denominator under
> /sys/class/iommu/dmar*/intel-iommu/cap regardless of attached devices,
> but it still affects migration compatibility between hosts.
> 
> Also note that vfio-pci can specify the device with either host= or
> sysfsdev= and with vfio cdev support and iommufd the file descriptor
> for the vfio device might be passed via SCM rights, so we may not have
> a reference to it in sysfs.  The above appears to only work with the
> host= device specification.  Thanks,
> 

iommufd supports a IOMMUFD_CMD_GET_HW_INFO cmd which can
also report cap/ecap on intel iommu.

RE: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Kasireddy, Vivek 2 years ago

Hi Alex, Eric, 

> > > Recent updates in OVMF and Seabios have resulted in MMIO regions
> > > being placed at the upper end of the physical address space. As a
> > > result, when a Host device is assigned to the Guest via VFIO, the
> > > following mapping failures occur when VFIO tries to map the MMIO
> > > regions of the device:
> > > VFIO_MAP_DMA failed: Invalid argument
> > > vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000,
> 0x7f98ac400000) = -22 (Invalid argument)
> > >
> > > The above failures are mainly seen on some Intel platforms where
> > > the physical address width is larger than the Host's IOMMU
> > > address width. In these cases, VFIO fails to map the MMIO regions
> > > because the IOVAs would be larger than the IOMMU aperture regions.
> > >
> > > Therefore, one way to solve this problem would be to ensure that
> > > cpu->phys_bits = <IOMMU phys_bits>
> > > This can be done by parsing the IOMMU caps value from sysfs and
> > > extracting the address width and using it to override the
> > > phys_bits value as shown in this patch.
> > >
> > > Previous attempt at solving this issue in OVMF:
> > > https://edk2.groups.io/g/devel/topic/102359124
> > >
> > > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > > Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > Cc: Cédric Le Goater <clg@redhat.com>
> > > Cc: Laszlo Ersek <lersek@redhat.com>
> > > Cc: Dongwon Kim <dongwon.kim@intel.com>
> > > Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> > > Tested-by: Yanghang Liu <yanghliu@redhat.com>
> > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > >
> > > ---
> > > v2:
> > > - Replace the term passthrough with assigned (Laszlo)
> > > - Update the commit message to note that both OVMF and Seabios
> > >   guests are affected (Cédric)
> > > - Update the subject to indicate what is done in the patch
> > > ---
> > >  target/i386/host-cpu.c | 61
> +++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 60 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
> > > index 92ecb7254b..5c9fcd7dc2 100644
> > > --- a/target/i386/host-cpu.c
> > > +++ b/target/i386/host-cpu.c
> > > @@ -12,6 +12,8 @@
> > >  #include "host-cpu.h"
> > >  #include "qapi/error.h"
> > >  #include "qemu/error-report.h"
> > > +#include "qemu/config-file.h"
> > > +#include "qemu/option.h"
> > >  #include "sysemu/sysemu.h"
> > >
> > >  /* Note: Only safe for use on x86(-64) hosts */
> > > @@ -51,11 +53,58 @@ static void host_cpu_enable_cpu_pm(X86CPU
> *cpu)
> > >      env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > >  }
> > >
> > > +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error
> **errp)
> > > +{
> > > +    g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps =
> NULL;
> > > +    const char *driver = qemu_opt_get(opts, "driver");
> > > +    const char *device = qemu_opt_get(opts, "host");
> > > +    uint32_t *iommu_phys_bits = opaque;
> > > +    struct stat st;
> > > +    uint64_t iommu_caps;
> > > +
> > > +    /*
> > > +     * Check if the user requested VFIO device assignment. We don't have
> > > +     * to limit phys_bits if there are no valid assigned devices.
> > > +     */
> > > +    if (g_strcmp0(driver, "vfio-pci") || !device) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
> > > +    if (stat(dev_path, &st) < 0) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap",
> dev_path);
> > > +    if (stat(iommu_path, &st) < 0) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
> > > +        if (sscanf(caps, "%lx", &iommu_caps) != 1) {
> > > +            return 0;
> > > +        }
> > > +        *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static uint32_t host_iommu_phys_bits(void)
> > > +{
> > > +    uint32_t iommu_phys_bits = 0;
> > > +
> > > +    qemu_opts_foreach(qemu_find_opts("device"),
> > > +                      intel_iommu_check, &iommu_phys_bits, NULL);
> > > +    return iommu_phys_bits;
> > > +}
> > > +
> > >  static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
> > >  {
> > >      uint32_t host_phys_bits = host_cpu_phys_bits();
> > > +    uint32_t iommu_phys_bits = host_iommu_phys_bits();
> > >      uint32_t phys_bits = cpu->phys_bits;
> > > -    static bool warned;
> > > +    static bool warned, warned2;
> > >
> > >      /*
> > >       * Print a warning if the user set it to a value that's not the
> > > @@ -78,6 +127,16 @@ static uint32_t
> host_cpu_adjust_phys_bits(X86CPU *cpu)
> > >          }
> > >      }
> > >
> > > +    if (iommu_phys_bits && phys_bits > iommu_phys_bits) {
> > > +        phys_bits = iommu_phys_bits;
> > are you allowed to change the host cpu characteristics without taking
> > care of compats for migration?
> 
> Not only is migration an issue, but so is hotplug.  Anything that
> aligns the VM configuration to the host is going to have migration
> issues and anything that does so conditionally based on the current set
> of attached devices will have hotplug issues.
> 
> It'd be more prudent to find the least common denominator under
> /sys/class/iommu/dmar*/intel-iommu/cap regardless of attached devices,
> but it still affects migration compatibility between hosts.
> 
> Also note that vfio-pci can specify the device with either host= or
> sysfsdev= and with vfio cdev support and iommufd the file descriptor
> for the vfio device might be passed via SCM rights, so we may not have
> a reference to it in sysfs.  The above appears to only work with the
> host= device specification.  Thanks,
To address both migration and hotplug use-cases, the only option I can
see is to just hardcode phys_bits to 39 (smallest iommu aw seen on recent
Intel h/w) if an Intel iommu is detected.
Or, instead of doing this automatically with this patch, I am thinking of just
printing a warning that, in order to prevent VFIO mapping failures, the user
needs to add the following when an Intel iommu is detected while launching
Qemu:
-cpu host,host-phys-bits=on,host-phys-bits-limit=39 

Since neither of these options are great, I am wondering if there is a more
elegant way to solve this problem that you can recommend.

Thanks,
Vivek

> 
> Alex

Re: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Cédric Le Goater 2 years ago

On 1/18/24 20:20, Vivek Kasireddy wrote:
> Recent updates in OVMF and Seabios have resulted in MMIO regions
> being placed at the upper end of the physical address space. As a
> result, when a Host device is assigned to the Guest via VFIO, the
> following mapping failures occur when VFIO tries to map the MMIO
> regions of the device:
> VFIO_MAP_DMA failed: Invalid argument
> vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000, 0x7f98ac400000) = -22 (Invalid argument)
> 
> The above failures are mainly seen on some Intel platforms where
> the physical address width is larger than the Host's IOMMU
> address width. In these cases, VFIO fails to map the MMIO regions
> because the IOVAs would be larger than the IOMMU aperture regions.
> 
> Therefore, one way to solve this problem would be to ensure that
> cpu->phys_bits = <IOMMU phys_bits>
> This can be done by parsing the IOMMU caps value from sysfs and
> extracting the address width and using it to override the
> phys_bits value as shown in this patch.
> 
> Previous attempt at solving this issue in OVMF:
> https://edk2.groups.io/g/devel/topic/102359124
> 
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Cédric Le Goater <clg@redhat.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Dongwon Kim <dongwon.kim@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> Tested-by: Yanghang Liu <yanghliu@redhat.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> 
> ---
> v2:
> - Replace the term passthrough with assigned (Laszlo)
> - Update the commit message to note that both OVMF and Seabios
>    guests are affected (Cédric)
> - Update the subject to indicate what is done in the patch
> ---
>   target/i386/host-cpu.c | 61 +++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
> index 92ecb7254b..5c9fcd7dc2 100644
> --- a/target/i386/host-cpu.c
> +++ b/target/i386/host-cpu.c
> @@ -12,6 +12,8 @@
>   #include "host-cpu.h"
>   #include "qapi/error.h"
>   #include "qemu/error-report.h"
> +#include "qemu/config-file.h"
> +#include "qemu/option.h"
>   #include "sysemu/sysemu.h"
>   
>   /* Note: Only safe for use on x86(-64) hosts */
> @@ -51,11 +53,58 @@ static void host_cpu_enable_cpu_pm(X86CPU *cpu)
>       env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>   }
>   
> +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error **errp)
> +{
> +    g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
> +    const char *driver = qemu_opt_get(opts, "driver");
> +    const char *device = qemu_opt_get(opts, "host");
> +    uint32_t *iommu_phys_bits = opaque;
> +    struct stat st;
> +    uint64_t iommu_caps;
> +
> +    /*
> +     * Check if the user requested VFIO device assignment. We don't have
> +     * to limit phys_bits if there are no valid assigned devices.
> +     */
> +    if (g_strcmp0(driver, "vfio-pci") || !device) {
> +        return 0;
> +    }
> +
> +    dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
> +    if (stat(dev_path, &st) < 0) {
> +        return 0;
> +    }
> +
> +    iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path);
> +    if (stat(iommu_path, &st) < 0) {
> +        return 0;
> +    }
> +
> +    if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
> +        if (sscanf(caps, "%lx", &iommu_caps) != 1) {

nit. This should use a PRIx64 define.

> +            return 0;
> +        }
> +        *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;

Please use 0x3fULL and this could be a macro in include/hw/i386/intel_iommu.h

Thanks,

C.



> +    }
> +
> +    return 0;
> +}
> +
> +static uint32_t host_iommu_phys_bits(void)
> +{
> +    uint32_t iommu_phys_bits = 0;
> +
> +    qemu_opts_foreach(qemu_find_opts("device"),
> +                      intel_iommu_check, &iommu_phys_bits, NULL);
> +    return iommu_phys_bits;
> +}
> +
>   static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>   {
>       uint32_t host_phys_bits = host_cpu_phys_bits();
> +    uint32_t iommu_phys_bits = host_iommu_phys_bits();
>       uint32_t phys_bits = cpu->phys_bits;
> -    static bool warned;
> +    static bool warned, warned2;
>   
>       /*
>        * Print a warning if the user set it to a value that's not the
> @@ -78,6 +127,16 @@ static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>           }
>       }
>   
> +    if (iommu_phys_bits && phys_bits > iommu_phys_bits) {
> +        phys_bits = iommu_phys_bits;
> +        if (!warned2) {
> +            warn_report("Using physical bits (%u)"
> +                        " to prevent VFIO mapping failures",
> +                        iommu_phys_bits);
> +            warned2 = true;
> +        }
> +    }
> +
>       return phys_bits;
>   }
>

Re: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Philippe Mathieu-Daudé 2 years ago

On 24/1/24 12:53, Cédric Le Goater wrote:
> On 1/18/24 20:20, Vivek Kasireddy wrote:
>> Recent updates in OVMF and Seabios have resulted in MMIO regions
>> being placed at the upper end of the physical address space. As a
>> result, when a Host device is assigned to the Guest via VFIO, the
>> following mapping failures occur when VFIO tries to map the MMIO
>> regions of the device:
>> VFIO_MAP_DMA failed: Invalid argument
>> vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000, 
>> 0x7f98ac400000) = -22 (Invalid argument)
>>
>> The above failures are mainly seen on some Intel platforms where
>> the physical address width is larger than the Host's IOMMU
>> address width. In these cases, VFIO fails to map the MMIO regions
>> because the IOVAs would be larger than the IOMMU aperture regions.
>>
>> Therefore, one way to solve this problem would be to ensure that
>> cpu->phys_bits = <IOMMU phys_bits>
>> This can be done by parsing the IOMMU caps value from sysfs and
>> extracting the address width and using it to override the
>> phys_bits value as shown in this patch.
>>
>> Previous attempt at solving this issue in OVMF:
>> https://edk2.groups.io/g/devel/topic/102359124
>>
>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Cc: Cédric Le Goater <clg@redhat.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Dongwon Kim <dongwon.kim@intel.com>
>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>> Tested-by: Yanghang Liu <yanghliu@redhat.com>
>> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>>
>> ---
>> v2:
>> - Replace the term passthrough with assigned (Laszlo)
>> - Update the commit message to note that both OVMF and Seabios
>>    guests are affected (Cédric)
>> - Update the subject to indicate what is done in the patch
>> ---
>>   target/i386/host-cpu.c | 61 +++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 60 insertions(+), 1 deletion(-)


>> +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error **errp)
>> +{
>> +    g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
>> +    const char *driver = qemu_opt_get(opts, "driver");
>> +    const char *device = qemu_opt_get(opts, "host");
>> +    uint32_t *iommu_phys_bits = opaque;
>> +    struct stat st;
>> +    uint64_t iommu_caps;
>> +
>> +    /*
>> +     * Check if the user requested VFIO device assignment. We don't have
>> +     * to limit phys_bits if there are no valid assigned devices.
>> +     */
>> +    if (g_strcmp0(driver, "vfio-pci") || !device) {
>> +        return 0;
>> +    }
>> +
>> +    dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
>> +    if (stat(dev_path, &st) < 0) {
>> +        return 0;
>> +    }
>> +
>> +    iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path);
>> +    if (stat(iommu_path, &st) < 0) {
>> +        return 0;
>> +    }
>> +
>> +    if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
>> +        if (sscanf(caps, "%lx", &iommu_caps) != 1) {
> 
> nit. This should use a PRIx64 define.
> 
>> +            return 0;
>> +        }
>> +        *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
> 
> Please use 0x3fULL

or:

            *iommu_phys_bits = 1 + extract32(iommu_caps, 16, 6);

> and this could be a macro in 
> include/hw/i386/intel_iommu.h
> 
> Thanks,
> 
> C.

Re: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Laszlo Ersek 2 years ago

On 1/24/24 13:58, Philippe Mathieu-Daudé wrote:
> On 24/1/24 12:53, Cédric Le Goater wrote:
>> On 1/18/24 20:20, Vivek Kasireddy wrote:
>>> Recent updates in OVMF and Seabios have resulted in MMIO regions
>>> being placed at the upper end of the physical address space. As a
>>> result, when a Host device is assigned to the Guest via VFIO, the
>>> following mapping failures occur when VFIO tries to map the MMIO
>>> regions of the device:
>>> VFIO_MAP_DMA failed: Invalid argument
>>> vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000,
>>> 0x7f98ac400000) = -22 (Invalid argument)
>>>
>>> The above failures are mainly seen on some Intel platforms where
>>> the physical address width is larger than the Host's IOMMU
>>> address width. In these cases, VFIO fails to map the MMIO regions
>>> because the IOVAs would be larger than the IOMMU aperture regions.
>>>
>>> Therefore, one way to solve this problem would be to ensure that
>>> cpu->phys_bits = <IOMMU phys_bits>
>>> This can be done by parsing the IOMMU caps value from sysfs and
>>> extracting the address width and using it to override the
>>> phys_bits value as shown in this patch.
>>>
>>> Previous attempt at solving this issue in OVMF:
>>> https://edk2.groups.io/g/devel/topic/102359124
>>>
>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
>>> Cc: Alex Williamson <alex.williamson@redhat.com>
>>> Cc: Cédric Le Goater <clg@redhat.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Cc: Dongwon Kim <dongwon.kim@intel.com>
>>> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>>> Tested-by: Yanghang Liu <yanghliu@redhat.com>
>>> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>>>
>>> ---
>>> v2:
>>> - Replace the term passthrough with assigned (Laszlo)
>>> - Update the commit message to note that both OVMF and Seabios
>>>    guests are affected (Cédric)
>>> - Update the subject to indicate what is done in the patch
>>> ---
>>>   target/i386/host-cpu.c | 61 +++++++++++++++++++++++++++++++++++++++++-
>>>   1 file changed, 60 insertions(+), 1 deletion(-)
> 
> 
>>> +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error
>>> **errp)
>>> +{
>>> +    g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
>>> +    const char *driver = qemu_opt_get(opts, "driver");
>>> +    const char *device = qemu_opt_get(opts, "host");
>>> +    uint32_t *iommu_phys_bits = opaque;
>>> +    struct stat st;
>>> +    uint64_t iommu_caps;
>>> +
>>> +    /*
>>> +     * Check if the user requested VFIO device assignment. We don't
>>> have
>>> +     * to limit phys_bits if there are no valid assigned devices.
>>> +     */
>>> +    if (g_strcmp0(driver, "vfio-pci") || !device) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
>>> +    if (stat(dev_path, &st) < 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path);
>>> +    if (stat(iommu_path, &st) < 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
>>> +        if (sscanf(caps, "%lx", &iommu_caps) != 1) {
>>
>> nit. This should use a PRIx64 define.
>>
>>> +            return 0;
>>> +        }
>>> +        *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
>>
>> Please use 0x3fULL
> 
> or:
> 
>            *iommu_phys_bits = 1 + extract32(iommu_caps, 16, 6);

Huh, interesting; I've never seen this recommended before, even though
it comes from a very old commit -- 84988cf910a6 ("bitops.h: Add
functions to extract and deposit bitfields", 2012-07-07).

I thought only edk2 had BitFieldRead32() :)

Laszlo

Re: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

Posted by Laszlo Ersek 2 years ago

On 1/18/24 20:20, Vivek Kasireddy wrote:
> Recent updates in OVMF and Seabios have resulted in MMIO regions
> being placed at the upper end of the physical address space. As a
> result, when a Host device is assigned to the Guest via VFIO, the
> following mapping failures occur when VFIO tries to map the MMIO
> regions of the device:
> VFIO_MAP_DMA failed: Invalid argument
> vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000, 0x7f98ac400000) = -22 (Invalid argument)
> 
> The above failures are mainly seen on some Intel platforms where
> the physical address width is larger than the Host's IOMMU
> address width. In these cases, VFIO fails to map the MMIO regions
> because the IOVAs would be larger than the IOMMU aperture regions.
> 
> Therefore, one way to solve this problem would be to ensure that
> cpu->phys_bits = <IOMMU phys_bits>
> This can be done by parsing the IOMMU caps value from sysfs and
> extracting the address width and using it to override the
> phys_bits value as shown in this patch.
> 
> Previous attempt at solving this issue in OVMF:
> https://edk2.groups.io/g/devel/topic/102359124
> 
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Cédric Le Goater <clg@redhat.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Dongwon Kim <dongwon.kim@intel.com>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> Tested-by: Yanghang Liu <yanghliu@redhat.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> 
> ---
> v2:
> - Replace the term passthrough with assigned (Laszlo)

v1 of the patch was posted in last November; I've now re-read my
(superficial) comments from back then.

Acked-by: Laszlo Ersek <lersek@redhat.com>