[PATCH for-4.18 v2] tools/light: Revoke permissions when a PCI detach for HVM domain

Julien Grall posted 1 patch 1 year ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20230915125204.22719-1-julien@xen.org
tools/libs/light/libxl_pci.c | 137 ++++++++++++++++++++---------------
1 file changed, 77 insertions(+), 60 deletions(-)
[PATCH for-4.18 v2] tools/light: Revoke permissions when a PCI detach for HVM domain
Posted by Julien Grall 1 year ago
From: Julien Grall <jgrall@amazon.com>

Currently, libxl will grant IOMEM, I/O port and IRQ permissions when
a PCI is attached (see pci_add_dm_done()) for all domain types. However,
the permissions are only revoked for non-HVM domain (see do_pci_remove()).

This means that HVM domains will be left with extra permissions. While
this look bad on the paper, the IRQ permissions should be revoked
when the Device Model call xc_physdev_unmap_pirq() and such domain
cannot directly mapped I/O port and IOMEM regions. Instead, this has to
be done by a Device Model.

The Device Model can only run in dom0 or PV stubdomain (upstream libxl
doesn't have support for HVM/PVH stubdomain).

For PV/PVH stubdomain, the permission are properly revoked, so there is
no security concern.

This leaves dom0. There are two cases:
  1) Privileged: Anyone gaining access to the Device Model would already
     have large control on the host.
  2) Deprivileged: PCI passthrough require PHYSDEV operations which
     are not accessible when the Device Model is restricted.

So overall, it is believed that the extra permissions cannot be exploited.

Rework the code so the permissions are all removed for HVM domains.
This needs to happen after the QEMU has detached the device. So
the revocation is now moved to pci_remove_detached().

Also add a comment on top of the error message when the PIRQ cannot
be unbind to explain this could be a spurious error as QEMU may have
already done it.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

    Changes since v1:
        * Move the code to revoke in pci_remove_detached()
        * Add a comment on top of the PIRQ unbind error path
        * Use goto to deal with errors.
---
 tools/libs/light/libxl_pci.c | 137 ++++++++++++++++++++---------------
 1 file changed, 77 insertions(+), 60 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 7f5f170e6eb0..96cb4da0794e 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1968,7 +1968,6 @@ static void do_pci_remove(libxl__egc *egc, pci_remove_state *prs)
         goto out_fail;
     }
 
-    rc = ERROR_FAIL;
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
         prs->hvm = true;
         switch (libxl__device_model_version_running(gc, domid)) {
@@ -1989,65 +1988,7 @@ static void do_pci_remove(libxl__egc *egc, pci_remove_state *prs)
             rc = ERROR_INVAL;
             goto out_fail;
         }
-    } else {
-        char *sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/resource", pci->domain,
-                                     pci->bus, pci->dev, pci->func);
-        FILE *f = fopen(sysfs_path, "r");
-        unsigned int start = 0, end = 0, flags = 0, size = 0;
-        int irq = 0;
-        int i;
-
-        if (f == NULL) {
-            LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-            goto skip1;
-        }
-        for (i = 0; i < PROC_PCI_NUM_RESOURCES; i++) {
-            if (fscanf(f, "0x%x 0x%x 0x%x\n", &start, &end, &flags) != 3)
-                continue;
-            size = end - start + 1;
-            if (start) {
-                if (flags & PCI_BAR_IO) {
-                    rc = xc_domain_ioport_permission(ctx->xch, domid, start, size, 0);
-                    if (rc < 0)
-                        LOGED(ERROR, domid,
-                              "xc_domain_ioport_permission error 0x%x/0x%x",
-                              start,
-                              size);
-                } else {
-                    rc = xc_domain_iomem_permission(ctx->xch, domid, start>>XC_PAGE_SHIFT,
-                                                    (size+(XC_PAGE_SIZE-1))>>XC_PAGE_SHIFT, 0);
-                    if (rc < 0)
-                        LOGED(ERROR, domid,
-                              "xc_domain_iomem_permission error 0x%x/0x%x",
-                              start,
-                              size);
-                }
-            }
-        }
-        fclose(f);
-skip1:
-        if (!pci_supp_legacy_irq())
-            goto skip_irq;
-        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
-                               pci->bus, pci->dev, pci->func);
-        f = fopen(sysfs_path, "r");
-        if (f == NULL) {
-            LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-            goto skip_irq;
-        }
-        if ((fscanf(f, "%u", &irq) == 1) && irq) {
-            rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
-            if (rc < 0) {
-                LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
-            }
-            rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
-            if (rc < 0) {
-                LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
-            }
-        }
-        fclose(f);
     }
-skip_irq:
     rc = 0;
 out_fail:
     pci_remove_detached(egc, prs, rc); /* must be last */
@@ -2226,7 +2167,11 @@ static void pci_remove_detached(libxl__egc *egc,
                                 int rc)
 {
     STATE_AO_GC(prs->aodev->ao);
-    int stubdomid = 0;
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    unsigned int start = 0, end = 0, flags = 0, size = 0;
+    int  irq = 0, i, stubdomid = 0;
+    const char *sysfs_path;
+    FILE *f;
     uint32_t domainid = prs->domid;
     bool isstubdom;
 
@@ -2242,6 +2187,78 @@ static void pci_remove_detached(libxl__egc *egc,
     if (rc && !prs->force)
         goto out;
 
+    /* Revoke the permissions */
+    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/resource",
+                           pci->domain, pci->bus, pci->dev, pci->func);
+
+    f = fopen(sysfs_path, "r");
+    if (f == NULL) {
+        LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
+        goto skip_bar;
+    }
+
+    for (i = 0; i < PROC_PCI_NUM_RESOURCES; i++) {
+        if (fscanf(f, "0x%x 0x%x 0x%x\n", &start, &end, &flags) != 3)
+            continue;
+        size = end - start + 1;
+        if (start) {
+            if (flags & PCI_BAR_IO) {
+                rc = xc_domain_ioport_permission(ctx->xch, domid, start,
+                                                 size, 0);
+                if (rc < 0)
+                    LOGED(ERROR, domid,
+                          "xc_domain_ioport_permission error 0x%x/0x%x",
+                          start,
+                          size);
+            } else {
+                rc = xc_domain_iomem_permission(ctx->xch, domid,
+                                                start >> XC_PAGE_SHIFT,
+                                                (size + (XC_PAGE_SIZE - 1)) >> XC_PAGE_SHIFT,
+                                                0);
+                if (rc < 0)
+                    LOGED(ERROR, domid,
+                          "xc_domain_iomem_permission error 0x%x/0x%x",
+                          start,
+                          size);
+            }
+        }
+    }
+    fclose(f);
+
+skip_bar:
+    if (!pci_supp_legacy_irq())
+        goto skip_legacy_irq;
+
+    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+                           pci->bus, pci->dev, pci->func);
+
+    f = fopen(sysfs_path, "r");
+    if (f == NULL) {
+        LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
+        goto skip_legacy_irq;
+    }
+
+    if ((fscanf(f, "%u", &irq) == 1) && irq) {
+        rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
+        if (rc < 0) {
+            /*
+             * QEMU may have already unmapped the IRQ. So the error
+             * may be spurious. For now, still print an error message as
+             * it is not easy to distinguished between valid and
+             * spurious error.
+             */
+            LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
+        }
+        rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
+        if (rc < 0) {
+            LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
+        }
+    }
+
+    fclose(f);
+
+skip_legacy_irq:
+
     isstubdom = libxl_is_stubdom(CTX, domid, &domainid);
 
     /* don't do multiple resets while some functions are still passed through */
-- 
2.40.1
Re: [PATCH for-4.18 v2] tools/light: Revoke permissions when a PCI detach for HVM domain
Posted by Anthony PERARD 12 months ago
On Fri, Sep 15, 2023 at 01:52:04PM +0100, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, libxl will grant IOMEM, I/O port and IRQ permissions when
> a PCI is attached (see pci_add_dm_done()) for all domain types. However,
> the permissions are only revoked for non-HVM domain (see do_pci_remove()).
> 
> This means that HVM domains will be left with extra permissions. While
> this look bad on the paper, the IRQ permissions should be revoked
> when the Device Model call xc_physdev_unmap_pirq() and such domain
> cannot directly mapped I/O port and IOMEM regions. Instead, this has to
> be done by a Device Model.
> 
> The Device Model can only run in dom0 or PV stubdomain (upstream libxl
> doesn't have support for HVM/PVH stubdomain).
> 
> For PV/PVH stubdomain, the permission are properly revoked, so there is
> no security concern.
> 
> This leaves dom0. There are two cases:
>   1) Privileged: Anyone gaining access to the Device Model would already
>      have large control on the host.
>   2) Deprivileged: PCI passthrough require PHYSDEV operations which
>      are not accessible when the Device Model is restricted.
> 
> So overall, it is believed that the extra permissions cannot be exploited.
> 
> Rework the code so the permissions are all removed for HVM domains.
> This needs to happen after the QEMU has detached the device. So
> the revocation is now moved to pci_remove_detached().
> 
> Also add a comment on top of the error message when the PIRQ cannot
> be unbind to explain this could be a spurious error as QEMU may have
> already done it.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ---
> 
>     Changes since v1:
>         * Move the code to revoke in pci_remove_detached()
>         * Add a comment on top of the PIRQ unbind error path
>         * Use goto to deal with errors.

Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD
Re: [PATCH for-4.18 v2] tools/light: Revoke permissions when a PCI detach for HVM domain
Posted by Henry Wang 12 months ago
Hi Julien,

> On Sep 15, 2023, at 20:52, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, libxl will grant IOMEM, I/O port and IRQ permissions when
> a PCI is attached (see pci_add_dm_done()) for all domain types. However,
> the permissions are only revoked for non-HVM domain (see do_pci_remove()).
> 
> This means that HVM domains will be left with extra permissions. While
> this look bad on the paper, the IRQ permissions should be revoked
> when the Device Model call xc_physdev_unmap_pirq() and such domain
> cannot directly mapped I/O port and IOMEM regions. Instead, this has to
> be done by a Device Model.
> 
> The Device Model can only run in dom0 or PV stubdomain (upstream libxl
> doesn't have support for HVM/PVH stubdomain).
> 
> For PV/PVH stubdomain, the permission are properly revoked, so there is
> no security concern.
> 
> This leaves dom0. There are two cases:
>  1) Privileged: Anyone gaining access to the Device Model would already
>     have large control on the host.
>  2) Deprivileged: PCI passthrough require PHYSDEV operations which
>     are not accessible when the Device Model is restricted.
> 
> So overall, it is believed that the extra permissions cannot be exploited.
> 
> Rework the code so the permissions are all removed for HVM domains.
> This needs to happen after the QEMU has detached the device. So
> the revocation is now moved to pci_remove_detached().
> 
> Also add a comment on top of the error message when the PIRQ cannot
> be unbind to explain this could be a spurious error as QEMU may have
> already done it.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

As in discussion in v1, it is agreed that this patch should be included in
4.18, although technically my release-ack tag should be effective after
code freeze, I am still providing the tag to avoid possible confusion:

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry
Re: [PATCH for-4.18 v2] tools/light: Revoke permissions when a PCI detach for HVM domain
Posted by Julien Grall 11 months, 4 weeks ago
Hi,

On 16/09/2023 01:11, Henry Wang wrote:
>> On Sep 15, 2023, at 20:52, Julien Grall <julien@xen.org> wrote:
>>
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Currently, libxl will grant IOMEM, I/O port and IRQ permissions when
>> a PCI is attached (see pci_add_dm_done()) for all domain types. However,
>> the permissions are only revoked for non-HVM domain (see do_pci_remove()).
>>
>> This means that HVM domains will be left with extra permissions. While
>> this look bad on the paper, the IRQ permissions should be revoked
>> when the Device Model call xc_physdev_unmap_pirq() and such domain
>> cannot directly mapped I/O port and IOMEM regions. Instead, this has to
>> be done by a Device Model.
>>
>> The Device Model can only run in dom0 or PV stubdomain (upstream libxl
>> doesn't have support for HVM/PVH stubdomain).
>>
>> For PV/PVH stubdomain, the permission are properly revoked, so there is
>> no security concern.
>>
>> This leaves dom0. There are two cases:
>>   1) Privileged: Anyone gaining access to the Device Model would already
>>      have large control on the host.
>>   2) Deprivileged: PCI passthrough require PHYSDEV operations which
>>      are not accessible when the Device Model is restricted.
>>
>> So overall, it is believed that the extra permissions cannot be exploited.
>>
>> Rework the code so the permissions are all removed for HVM domains.
>> This needs to happen after the QEMU has detached the device. So
>> the revocation is now moved to pci_remove_detached().
>>
>> Also add a comment on top of the error message when the PIRQ cannot
>> be unbind to explain this could be a spurious error as QEMU may have
>> already done it.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> As in discussion in v1, it is agreed that this patch should be included in
> 4.18, although technically my release-ack tag should be effective after
> code freeze, I am still providing the tag to avoid possible confusion:
> 
> Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Thanks. I have committed the patch with Anthony's reviewed-by tag.

Cheers,

-- 
Julien Grall