[RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding

Shameer Kolothum via posted 5 patches 1 year, 3 months ago
From: Eric Auger <eric.auger@redhat.com>

To handle SMMUv3 nested stage support, it is practical to
expose reserved memory regions (RMRs) to the guest,
covering the IOVAs used by the host kernel to map
physical MSI doorbells.

Those IOVAs belong to [0x8000000, 0x8100000), matching
the MSI_IOVA_BASE and MSI_IOVA_LENGTH definitions in the
kernel arm-smmu-v3 driver. This is the window used to
allocate IOVAs matching physical MSI doorbells.

With those RMRs, the guest is forced to use a flat mapping
for this range. Hence the assigned device is programmed
with one IOVA from this range. Stage 1, owned by the guest,
has a flat mapping for this IOVA. Stage 2, owned by the VMM,
then enforces a mapping from this IOVA to the physical
MSI doorbell.

The creation of those RMR nodes is only relevant if a
nested-stage SMMU is in use, along with VFIO. As VFIO
devices can be hotplugged, all RMRs need to be created in
advance. Hence this patch introduces a new arm virt
"nested-smmuv3" iommu type.

The ARM DEN 0049E.b IORT specification also mandates that
when RMRs are present, the OS must preserve the PCIe
configuration performed by the boot firmware. So along with
the RMR IORT nodes, a _DSM function #5, as defined by the
PCI Firmware Specification Revision 3.3, chapter 4.6.5,
is added to the PCIe host bridge and PCIe expander bridge
objects.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/virt-acpi-build.c | 77 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 68 insertions(+), 9 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index ec4cdfb2d7..f327ca59ec 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -132,6 +132,14 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
         .bus    = vms->bus,
     };
 
+    /*
+     * Nested SMMU requires RMRs for MSI 1-1 mapping, which
+     * require a _DSM for Preserving PCI Boot Configurations
+     */
+    if (vms->iommu == VIRT_IOMMU_SMMUV3_NESTED) {
+        cfg.preserve_config = true;
+    }
+
     if (vms->highmem_mmio) {
         cfg.mmio64 = memmap[VIRT_HIGH_PCIE_MMIO];
     }
@@ -216,16 +224,16 @@ static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms)
  *
  * Note that @id_count gets internally subtracted by one, following the spec.
  */
-static void build_iort_id_mapping(GArray *table_data, uint32_t input_base,
-                                  uint32_t id_count, uint32_t out_ref)
+static void
+build_iort_id_mapping(GArray *table_data, uint32_t input_base,
+                      uint32_t id_count, uint32_t out_ref, uint32_t flags)
 {
     build_append_int_noprefix(table_data, input_base, 4); /* Input base */
     /* Number of IDs - The number of IDs in the range minus one */
     build_append_int_noprefix(table_data, id_count - 1, 4);
     build_append_int_noprefix(table_data, input_base, 4); /* Output base */
     build_append_int_noprefix(table_data, out_ref, 4); /* Output Reference */
-    /* Flags */
-    build_append_int_noprefix(table_data, 0 /* Single mapping (disabled) */, 4);
+    build_append_int_noprefix(table_data, flags, 4); /* Flags */
 }
 
 struct AcpiIortIdMapping {
@@ -267,6 +275,50 @@ static int iort_idmap_compare(gconstpointer a, gconstpointer b)
     return idmap_a->input_base - idmap_b->input_base;
 }
 
+static void
+build_iort_rmr_nodes(GArray *table_data, GArray *smmu_idmaps,
+                     size_t *smmu_offset, uint32_t *id)
+{
+    AcpiIortIdMapping *range;
+    int i;
+
+    for (i = 0; i < smmu_idmaps->len; i++) {
+        range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
+        int bdf = range->input_base;
+
+        /* Table 18 Reserved Memory Range Node */
+
+        build_append_int_noprefix(table_data, 6 /* RMR */, 1); /* Type */
+        /* Length */
+        build_append_int_noprefix(table_data, 28 + ID_MAPPING_ENTRY_SIZE + 20, 2);
+        build_append_int_noprefix(table_data, 3, 1); /* Revision */
+        build_append_int_noprefix(table_data, *id, 4); /* Identifier */
+        /* Number of ID mappings */
+        build_append_int_noprefix(table_data, 1, 4);
+        /* Reference to ID Array */
+        build_append_int_noprefix(table_data, 28, 4);
+
+        /* RMR specific data */
+
+        /* Flags */
+        build_append_int_noprefix(table_data, 0 /* Disallow remapping */, 4);
+        /* Number of Memory Range Descriptors */
+        build_append_int_noprefix(table_data, 1, 4);
+        /* Reference to Memory Range Descriptors */
+        build_append_int_noprefix(table_data, 28 + ID_MAPPING_ENTRY_SIZE, 4);
+        build_iort_id_mapping(table_data, bdf, range->id_count, smmu_offset[i], 1);
+
+        /* Table 19 Memory Range Descriptor */
+
+        /* Physical Range offset */
+        build_append_int_noprefix(table_data, 0x8000000, 8);
+        /* Physical Range length */
+        build_append_int_noprefix(table_data, 0x100000, 8);
+        build_append_int_noprefix(table_data, 0, 4); /* Reserved */
+        *id += 1;
+    }
+}
+
 /*
  * Input Output Remapping Table (IORT)
  * Conforms to "IO Remapping Table System Software on ARM Platforms",
@@ -284,7 +336,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
     GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
 
-    AcpiTable table = { .sig = "IORT", .rev = 3, .oem_id = vms->oem_id,
+    AcpiTable table = { .sig = "IORT", .rev = 5, .oem_id = vms->oem_id,
                         .oem_table_id = vms->oem_table_id };
     /* Table 2 The IORT */
     acpi_table_begin(&table, table_data);
@@ -325,6 +377,9 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
             }
 
             next_range.input_base = idmap->input_base + idmap->id_count;
+            if (vms->iommu == VIRT_IOMMU_SMMUV3_NESTED) {
+                nb_nodes++; /* RMR node per SMMU */
+            }
         }
 
         /* Append the last RC -> ITS ID mapping */
@@ -386,7 +441,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         build_append_int_noprefix(table_data, 0, 4);
 
         /* output IORT node is the ITS group node (the first node) */
-        build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET);
+        build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET, 0);
     }
 
     /* Table 17 Root Complex Node */
@@ -427,7 +482,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
             range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
             /* output IORT node is the smmuv3 node */
             build_iort_id_mapping(table_data, range->input_base,
-                                  range->id_count, smmu_offset[i]);
+                                  range->id_count, smmu_offset[i], 0);
         }
 
         /* bypassed RIDs connect to ITS group node directly: RC -> ITS */
@@ -435,11 +490,15 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
             range = &g_array_index(its_idmaps, AcpiIortIdMapping, i);
             /* output IORT node is the ITS group node (the first node) */
             build_iort_id_mapping(table_data, range->input_base,
-                                  range->id_count, IORT_NODE_OFFSET);
+                                  range->id_count, IORT_NODE_OFFSET, 0);
         }
     } else {
         /* output IORT node is the ITS group node (the first node) */
-        build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET);
+        build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET, 0);
+    }
+
+    if (vms->iommu == VIRT_IOMMU_SMMUV3_NESTED) {
+        build_iort_rmr_nodes(table_data, smmu_idmaps, smmu_offset, &id);
     }
 
     acpi_table_end(linker, &table);
-- 
2.34.1
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Nicolin Chen 1 year, 1 month ago
Hi Eric/Don/Shameer,

On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
> Those IOVAs belong to [0x8000000, 0x8100000] matching
> MSI_IOVA_BASE and MSI_IOVA_LENGTH definitions in kernel
> arm-smmu-v3 driver. This is the window used to allocate
> IOVAs matching physical MSI doorbells.
[...]
> +static void
> +build_iort_rmr_nodes(GArray *table_data, GArray *smmu_idmaps,
> +                     size_t *smmu_offset, uint32_t *id)
> +{
[...]
> +        /* Physical Range offset */
> +        build_append_int_noprefix(table_data, 0x8000000, 8);
> +        /* Physical Range length */
> +        build_append_int_noprefix(table_data, 0x100000, 8);
> +        build_append_int_noprefix(table_data, 0, 4); /* Reserved */
> +        *id += 1;
> +    }

Jason made some kernel patches for iommufd to do MSI mappings:
https://github.com/jgunthorpe/linux/commits/for-nicolin/
It addresses Robin's remark against a get_msi_mapping_domain API,
so that we could likely support an RMR solution as well, if a VMM
chooses to use it vs. a future non-RMR one (mapping the vITS page).

Yet, here we seem to be missing a pathway between the VMM and the
kernel to agree on the MSI window decided by the kernel, as this
patch hard-codes a [0x8000000, 0x8100000) range.

Though I am aware that the sysfs node
"/sys/devices/pci000x/000x/iommu_group/reserved_regions" exposes
the MSI window, it's probably better to have a new iommufd uAPI
to expose the range, so a nested domain eventually might be able
to choose between a RMR flow and a non-RMR flow.

I have been going through the structures between QEMU's SMMU code
and virt/virt-acpi-build code, yet I am having a hard time figuring
out a way to forward the MSI window from the SMMU code to the IORT,
especially after this series changes the "smmu" instance creation
from virt code to a "-device" string. Any thoughts?

Thanks
Nicolin
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Jason Gunthorpe 1 year, 1 month ago
On Tue, Dec 10, 2024 at 03:01:48PM -0800, Nicolin Chen wrote:

> Yet, here we seem to be missing a pathway between VMM and kernel
> to agree on the MSI window decided by the kernel, as this patch
> does the hard coding for a [0x8000000, 0x8100000) range.

I would ideally turn it around and provide that range information to
the kernel and totally ignore the SW_MSI reserved region once
userspace provides it.

The SW_MSI range then becomes something just used "by default".

Haven't thought about exactly which ioctl could do
this.. SET_OPTION(SW_MSI) on the idevice perhaps?

It seems pretty simple to do?

We will eventually need a way for userspace to disable SW_MSI entirely
anyhow.

> I have been going through the structures between QEMU's SMMU code
> and virt/virt-acpi-build code, yet having a hard time to figure
> out a way to forward the MSI window from the SMMU code to IORT,
> especially after this series changes the "smmu" instance creation
> from virt code to "-device" string. Any thought?

You probably have to solve this eventually because when the kernel
supports a non-RMR path the IORT code will need to not create the RMR
too.

Using RMR, or not, and the address to put the SW_MSI, is probably part
of the global machine configuration in qemu.

Jason
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Nicolin Chen 1 year, 1 month ago
On Tue, Dec 10, 2024 at 08:48:21PM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 10, 2024 at 03:01:48PM -0800, Nicolin Chen wrote:
> 
> > Yet, here we seem to be missing a pathway between VMM and kernel
> > to agree on the MSI window decided by the kernel, as this patch
> > does the hard coding for a [0x8000000, 0x8100000) range.
> 
> I would ideally turn it around and provide that range information to
> the kernel and totally ignore the SW_MSI reserved region once
> userspace provides it.

Hmm.. that sounds like a uAPI for vITS range..but yes..

> The SW_MSI range then becomes something just used "by default".
>
> Haven't thought about exactly which ioctl could do
> this.. SET_OPTION(SW_MSI) on the idevice perhaps?
> 
> It seems pretty simple to do?

That looks like a good interface, given that we are already
making sw_msi_list per ictx.

So, VMM can GET_OPTION(SW_MSI) for msi_base to extract the
info from kernel. Likely need a second call for its length?
Since IOMMU_OPTION only supports one val64 input or output.

> We will eventually need a way for userspace to disable SW_MSI entirely
> anyhow.

> > I have been going through the structures between QEMU's SMMU code
> > and virt/virt-acpi-build code, yet having a hard time to figure
> > out a way to forward the MSI window from the SMMU code to IORT,
> > especially after this series changes the "smmu" instance creation
> > from virt code to "-device" string. Any thought?
> 
> You probably have to solve this eventually because when the kernel
> supports a non-RMR path the IORT code will need to not create the RMR
> too.
> 
> Using RMR, or not, and the address to put the SW_MSI, is probably part
> of the global machine configuration in qemu.

Yes, either vITS or RMR range is in the global machine code.
So, likely it's not ideal to go with HWPTs.

Thanks!
Nicolin
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Jason Gunthorpe 1 year, 1 month ago
On Tue, Dec 10, 2024 at 05:28:17PM -0800, Nicolin Chen wrote:
> > I would ideally turn it around and provide that range information to
> > the kernel and totally ignore the SW_MSI reserved region once
> > userspace provides it.
> 
> Hmm.. that sounds like a uAPI for vITS range..but yes..

It controls the window that the kernel uses to dynamically map the ITS
pages.

> So, VMM can GET_OPTION(SW_MSI) for msi_base to extract the
> info from kernel. Likely need a second call for its length?
> Since IOMMU_OPTION only supports one val64 input or output.

No, just forget about the kernel's SW_MSI region. The VMM uses this
API and overrides it and iommufd completely ignores SW_MSI.

There is nothing special about the range hard coded into the smmu
driver.

Jason
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Nicolin Chen 1 year, 1 month ago
On Wed, Dec 11, 2024 at 09:11:12AM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 10, 2024 at 05:28:17PM -0800, Nicolin Chen wrote:
> > > I would ideally turn it around and provide that range information to
> > > the kernel and totally ignore the SW_MSI reserved region once
> > > userspace provides it.
> > 
> > Hmm.. that sounds like a uAPI for vITS range..but yes..
> 
> It controls the window that the kernel uses to dynamically map the ITS
> pages.

Can we use SET_OPTION for vITS mapping (non-RMR solution) too? 

> > So, VMM can GET_OPTION(SW_MSI) for msi_base to extract the
> > info from kernel. Likely need a second call for its length?
> > Since IOMMU_OPTION only supports one val64 input or output.
> 
> No, just forget about the kernel's SW_MSI region. The VMM uses this
> API and overrides it and iommufd completely ignores SW_MSI.
> 
> There is nothing special about the range hard coded into the smmu
> driver.

OK. We will have SET_OPTION(IOMMU_OPTION_SW_MSI_START) and
SET_OPTION(IOMMU_OPTION_SW_MSI_LAST).

I think we will need some validation of the range too, although
iommufd doesn't have the information about the underlying ITS
driver: what if user space sets the range to a page size, while the
ITS driver requires multiple pages?

Thanks
Nicolin
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Jason Gunthorpe 1 year, 1 month ago
On Wed, Dec 11, 2024 at 09:20:20AM -0800, Nicolin Chen wrote:
> On Wed, Dec 11, 2024 at 09:11:12AM -0400, Jason Gunthorpe wrote:
> > On Tue, Dec 10, 2024 at 05:28:17PM -0800, Nicolin Chen wrote:
> > > > I would ideally turn it around and provide that range information to
> > > > the kernel and totally ignore the SW_MSI reserved region once
> > > > userspace provides it.
> > > 
> > > Hmm.. that sounds like a uAPI for vITS range..but yes..
> > 
> > It controls the window that the kernel uses to dynamically map the ITS
> > pages.
> 
> Can we use SET_OPTION for vITS mapping (non-RMR solution) too? 

There are two parts to the vITS flow:

 1) mapping the physical ITS page - I expect this to go through
    IOMMUFD_CMD_IOAS_MAP_FILE
 2) Conveying the MSI addr per-irq - this doesn't feel like set_option
    is quite the right fit since it is an array of msis

> 
> > > So, VMM can GET_OPTION(SW_MSI) for msi_base to extract the
> > > info from kernel. Likely need a second call for its length?
> > > Since IOMMU_OPTION only supports one val64 input or output.
> > 
> > No, just forget about the kernel's SW_MSI region. The VMM uses this
> > API and overrides it and iommufd completely ignores SW_MSI.
> > 
> > There is nothing special about the range hard coded into the smmu
> > driver.
> 
> OK. We will have SET_OPTION(IOMMU_OPTION_SW_MSI_START) and
> SET_OPTION(IOMMU_OPTION_SW_MSI_LAST).

Maybe length, but yes

> I think we will need some validation to the range too, although
> iommufd doesn't have the information about the underlying ITS
> driver: what if user space sets range to a page size, while the
> ITS driver requires multiple pages?

Ideally the kernel would detect and fail IRQ setup in these cases.

I suggest enforcing a minimal range of something XXM big at least,
then it won't happen.

Jason
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Nicolin Chen 1 year, 2 months ago
On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> To handle SMMUv3 nested stage support it is practical to
> expose the guest with reserved memory regions (RMRs)
> covering the IOVAs used by the host kernel to map
> physical MSI doorbells.

There has been an ongoing solution for MSI alternative:
https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/

So, I think we should keep this patch out of this series and
instead put it on top of the testing branch.

Thanks
Nicolin
RE: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Shameerali Kolothum Thodi via 1 year, 2 months ago

> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, November 13, 2024 6:31 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions
> to handle MSI nested binding
> 
> On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
> > From: Eric Auger <eric.auger@redhat.com>
> >
> > To handle SMMUv3 nested stage support it is practical to expose the
> > guest with reserved memory regions (RMRs) covering the IOVAs used by
> > the host kernel to map physical MSI doorbells.
> 
> There has been an ongoing solution for MSI alternative:
> https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
> 
> So, I think we should keep this patch out of this series, instead put it on top
> of the testing branch.

Yes. I think then we can support the DT solution as well.

On that MSI RFC above, have you seen Eric's earlier/initial proposal to bind the guest MSI in
nested cases? IIRC, it provided an IOCTL and then created a mapping in the host.

I think this is the latest on that.
https://lore.kernel.org/linux-iommu/20210411114659.15051-4-eric.auger@redhat.com/

But not sure, why we then moved to RMR approach. Eric?

Thanks,
Shameer
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Eric Auger 1 year, 2 months ago
Hi Shameer,

On 11/14/24 09:48, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Wednesday, November 13, 2024 6:31 PM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions
>> to handle MSI nested binding
>>
>> On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
>>> From: Eric Auger <eric.auger@redhat.com>
>>>
>>> To handle SMMUv3 nested stage support it is practical to expose the
>>> guest with reserved memory regions (RMRs) covering the IOVAs used by
>>> the host kernel to map physical MSI doorbells.
>> There has been an ongoing solution for MSI alternative:
>> https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
>>
>> So, I think we should keep this patch out of this series, instead put it on top
>> of the testing branch.
> Yes. I think then we can support DT solution as well. 
>
> On that MSI RFC above, have you seen Eric's earlier/initial proposal to bind the Guest MSI in
> nested cases. IIRC, it was providing an IOCTL and then creating a mapping in the host.
>
> I think this is the latest on that.
> https://lore.kernel.org/linux-iommu/20210411114659.15051-4-eric.auger@redhat.com/
yes this is the latest before I stopped my VFIO integration efforts.
>
> But not sure, why we then moved to RMR approach. Eric?

This was indeed the 1st integration approach. Using RMR instead was
suggested by Jean-Philippe, and I considered it simpler (because the
previous approach needed the SET_MSI_BINDING ioctl), so I changed the
approach.

Thanks

Eric
>
> Thanks,
> Shameer
>
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
Posted by Nicolin Chen 1 year, 2 months ago
On Thu, Nov 14, 2024 at 11:41:58AM +0100, Eric Auger wrote:
> Hi Shameer,
> 
> On 11/14/24 09:48, Shameerali Kolothum Thodi wrote:
> >
> >> -----Original Message-----
> >> From: Nicolin Chen <nicolinc@nvidia.com>
> >> Sent: Wednesday, November 13, 2024 6:31 PM
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> >> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> >> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> >> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> >> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> >> Jonathan Cameron <jonathan.cameron@huawei.com>;
> >> zhangfei.gao@linaro.org
> >> Subject: Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions
> >> to handle MSI nested binding
> >>
> >> On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
> >>> From: Eric Auger <eric.auger@redhat.com>
> >>>
> >>> To handle SMMUv3 nested stage support it is practical to expose the
> >>> guest with reserved memory regions (RMRs) covering the IOVAs used by
> >>> the host kernel to map physical MSI doorbells.
> >> There has been an ongoing solution for MSI alternative:
> >> https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
> >>
> >> So, I think we should keep this patch out of this series, instead put it on top
> >> of the testing branch.
> > Yes. I think then we can support DT solution as well. 
> >
> > On that MSI RFC above, have you seen Eric's earlier/initial proposal to bind the Guest MSI in
> > nested cases. IIRC, it was providing an IOCTL and then creating a mapping in the host.
> >
> > I think this is the latest on that.
> > https://lore.kernel.org/linux-iommu/20210411114659.15051-4-eric.auger@redhat.com/
> yes this is the latest before I stopped my VFIO integration efforts.
> >
> > But not sure, why we then moved to RMR approach. Eric?
> 
> This was indeed the 1st integration approach. Using RMR instead was
> suggested by Jean-Philippe and I considered it as simpler (because we
> needed the SET_MSI_BINDING iotcl) so I changed the approach.

Oh, I didn't realize Eric had this..

Now, Robin wanted it back (in iommufd though), against the RMR :-/

Nicolin