[PATCH v9 34/37] hw/pci: Add helper to insert PCIe extended capability at a fixed offset

Shameer Kolothum posted 37 patches 2 weeks ago
Maintainers: Yi Liu <yi.l.liu@intel.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>, Paolo Bonzini <pbonzini@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, "Michael S. Tsirkin" <mst@redhat.com>, Igor Mammedov <imammedo@redhat.com>, Ani Sinha <anisinha@redhat.com>, Shannon Zhao <shannon.zhaosl@gmail.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>
[PATCH v9 34/37] hw/pci: Add helper to insert PCIe extended capability at a fixed offset
Posted by Shameer Kolothum 2 weeks ago
Add pcie_insert_capability(), a helper to insert a PCIe extended
capability into an existing extended capability list at a caller
specified offset.

Unlike pcie_add_capability(), which always appends a capability to the
end of the list, this helper preserves the existing list ordering while
allowing insertion at an arbitrary offset.

The helper only validates that the insertion does not overwrite an
existing PCIe extended capability header, since corrupting a header
would break the extended capability linked list. Validation of overlaps
with other configuration space registers or capability-specific
register blocks is left to the caller.

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
 hw/pci/pcie.c         | 69 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/pci/pcie.h |  2 ++
 2 files changed, 71 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index b302de6419..aa9024e532 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1050,6 +1050,75 @@ static void pcie_ext_cap_set_next(PCIDevice *dev, uint16_t pos, uint16_t next)
     pci_set_long(dev->config + pos, header);
 }
 
+/*
+ * Insert a PCIe extended capability at a given offset.
+ *
+ * This helper only validates that the insertion does not overwrite an
+ * existing PCIe extended capability header, as corrupting a header would
+ * break the extended capability linked list.
+ *
+ * The caller must ensure that (offset, size) does not overlap with other
+ * registers or capability-specific register blocks. Overlaps with
+ * capability-specific registers are not checked and are considered a
+ * user-controlled override.
+ *
+ * Note: Best effort helper. The PCIe spec does not require extended
+ * capabilities to be ordered, but most devices use a forward-linked list.
+ * Devices that do not consistently use a forward-linked list may cause
+ * insertion to fail.
+ */
+bool pcie_insert_capability(PCIDevice *dev, uint16_t cap_id, uint8_t cap_ver,
+                            uint16_t offset, uint16_t size)
+{
+    uint16_t pos = PCI_CONFIG_SPACE_SIZE, prev = 0;
+    uint32_t header;
+
+    assert(pci_is_express(dev));
+
+    if (!QEMU_IS_ALIGNED(offset, PCI_EXT_CAP_ALIGN) ||
+        size < 8 ||
+        offset < PCI_CONFIG_SPACE_SIZE ||
+        offset >= PCIE_CONFIG_SPACE_SIZE ||
+        offset + size > PCIE_CONFIG_SPACE_SIZE) {
+        return false;
+    }
+
+    header = pci_get_long(dev->config + pos);
+    if (!header) {
+        /* No extended capability present, insertion must be at the ECAP head */
+        if (offset != pos) {
+            return false;
+        }
+        pci_set_long(dev->config + pos, PCI_EXT_CAP(cap_id, cap_ver, 0));
+        goto out;
+    }
+
+    while (header && pos && offset >= pos) {
+        uint16_t next = PCI_EXT_CAP_NEXT(header);
+
+        /* Reject insertion inside an existing ECAP header (4 bytes) */
+        if (offset < pos + PCI_EXT_CAP_ALIGN) {
+            return false;
+        }
+
+        prev = pos;
+        pos = next;
+        header = pos ? pci_get_long(dev->config + pos) : 0;
+    }
+
+    pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, pos));
+    if (prev) {
+        pcie_ext_cap_set_next(dev, prev, offset);
+    }
+
+out:
+    /* Make capability read-only by default */
+    memset(dev->wmask + offset, 0, size);
+    memset(dev->w1cmask + offset, 0, size);
+    /* Check capability by default */
+    memset(dev->cmask + offset, 0xFF, size);
+    return true;
+}
 /*
  * Caller must supply valid (offset, size) such that the range wouldn't
  * overlap with other capability or other registers.
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index c880ae1e04..d68bfa6257 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -133,6 +133,8 @@ uint16_t pcie_find_capability(PCIDevice *dev, uint16_t cap_id);
 void pcie_add_capability(PCIDevice *dev,
                          uint16_t cap_id, uint8_t cap_ver,
                          uint16_t offset, uint16_t size);
+bool pcie_insert_capability(PCIDevice *dev, uint16_t cap_id, uint8_t cap_ver,
+                            uint16_t offset, uint16_t size);
 void pcie_sync_bridge_lnk(PCIDevice *dev);
 
 void pcie_acs_init(PCIDevice *dev, uint16_t offset);
-- 
2.43.0
Re: [PATCH v9 34/37] hw/pci: Add helper to insert PCIe extended capability at a fixed offset
Posted by Yi Liu 2 weeks ago
On 2026/1/26 18:43, Shameer Kolothum wrote:
> Add pcie_insert_capability(), a helper to insert a PCIe extended
> capability into an existing extended capability list at a caller
> specified offset.
> 
> Unlike pcie_add_capability(), which always appends a capability to the
> end of the list, this helper preserves the existing list ordering while
> allowing insertion at an arbitrary offset.
> 
> The helper only validates that the insertion does not overwrite an
> existing PCIe extended capability header, since corrupting a header
> would break the extended capability linked list. Validation of overlaps
> with other configuration space registers or capability-specific
> register blocks is left to the caller.
> 
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> ---
>   hw/pci/pcie.c         | 69 +++++++++++++++++++++++++++++++++++++++++++
>   include/hw/pci/pcie.h |  2 ++
>   2 files changed, 71 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index b302de6419..aa9024e532 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -1050,6 +1050,75 @@ static void pcie_ext_cap_set_next(PCIDevice *dev, uint16_t pos, uint16_t next)
>       pci_set_long(dev->config + pos, header);
>   }
>   
> +/*
> + * Insert a PCIe extended capability at a given offset.
> + *
> + * This helper only validates that the insertion does not overwrite an
> + * existing PCIe extended capability header, as corrupting a header would
> + * break the extended capability linked list.
> + *
> + * The caller must ensure that (offset, size) does not overlap with other
> + * registers or capability-specific register blocks. Overlaps with
> + * capability-specific registers are not checked and are considered a
> + * user-controlled override.
> + *
> + * Note: Best effort helper. The PCIe spec does not require extended
> + * capabilities to be ordered, but most devices use a forward-linked list.
> + * Devices that do not consistently use a forward-linked list may cause
> + * insertion to fail.
> + */
> +bool pcie_insert_capability(PCIDevice *dev, uint16_t cap_id, uint8_t cap_ver,
> +                            uint16_t offset, uint16_t size)
> +{
> +    uint16_t pos = PCI_CONFIG_SPACE_SIZE, prev = 0;
> +    uint32_t header;
> +
> +    assert(pci_is_express(dev));
> +
> +    if (!QEMU_IS_ALIGNED(offset, PCI_EXT_CAP_ALIGN) ||
> +        size < 8 ||
> +        offset < PCI_CONFIG_SPACE_SIZE ||
> +        offset >= PCIE_CONFIG_SPACE_SIZE ||
> +        offset + size > PCIE_CONFIG_SPACE_SIZE) {
> +        return false;
> +    }
> +
> +    header = pci_get_long(dev->config + pos);
> +    if (!header) {
> +        /* No extended capability present, insertion must be at the ECAP head */
> +        if (offset != pos) {
> +            return false;
> +        }
> +        pci_set_long(dev->config + pos, PCI_EXT_CAP(cap_id, cap_ver, 0));
> +        goto out;
> +    }
> +
> +    while (header && pos && offset >= pos) {
> +        uint16_t next = PCI_EXT_CAP_NEXT(header);
> +
> +        /* Reject insertion inside an existing ECAP header (4 bytes) */
> +        if (offset < pos + PCI_EXT_CAP_ALIGN) {
> +            return false;
> +        }

TBH. I was expecting to see a table that is similar with 
pci_ext_cap_length[][1], and a helper to walk the ext cap list
to figure out the spared pos. But it might be over-enginering as
we rely more on user to give an offset that is for sure no conflict
with existing ecaps nor hidden registers. Could you also add a comment
in the code to note it?

[1] 
https://github.com/torvalds/linux/blob/63804fed149a6750ffd28610c5c1c98cce6bd377/drivers/vfio/pci/vfio_pci_config.c#L71

With the above nit, LGTM.

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> +        prev = pos;
> +        pos = next;
> +        header = pos ? pci_get_long(dev->config + pos) : 0;
> +    }
> +
> +    pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, pos));
> +    if (prev) {
> +        pcie_ext_cap_set_next(dev, prev, offset);
> +    }
> +
> +out:
> +    /* Make capability read-only by default */
> +    memset(dev->wmask + offset, 0, size);
> +    memset(dev->w1cmask + offset, 0, size);
> +    /* Check capability by default */
> +    memset(dev->cmask + offset, 0xFF, size);
> +    return true;
> +}
>   /*
>    * Caller must supply valid (offset, size) such that the range wouldn't
>    * overlap with other capability or other registers.
> diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> index c880ae1e04..d68bfa6257 100644
> --- a/include/hw/pci/pcie.h
> +++ b/include/hw/pci/pcie.h
> @@ -133,6 +133,8 @@ uint16_t pcie_find_capability(PCIDevice *dev, uint16_t cap_id);
>   void pcie_add_capability(PCIDevice *dev,
>                            uint16_t cap_id, uint8_t cap_ver,
>                            uint16_t offset, uint16_t size);
> +bool pcie_insert_capability(PCIDevice *dev, uint16_t cap_id, uint8_t cap_ver,
> +                            uint16_t offset, uint16_t size);
>   void pcie_sync_bridge_lnk(PCIDevice *dev);
>   
>   void pcie_acs_init(PCIDevice *dev, uint16_t offset);
RE: [PATCH v9 34/37] hw/pci: Add helper to insert PCIe extended capability at a fixed offset
Posted by Shameer Kolothum 2 weeks ago
Hi Yi,

> -----Original Message-----
> From: Yi Liu <yi.l.liu@intel.com>
> Sent: 26 January 2026 12:08
> To: Shameer Kolothum <skolothumtho@nvidia.com>; qemu-
> arm@nongnu.org; qemu-devel@nongnu.org
> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> <jgg@nvidia.com>; Nicolin Chen <nicolinc@nvidia.com>; ddutile@redhat.com;
> berrange@redhat.com; clg@redhat.com; alex@shazbot.org; Nathan Chen
> <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> smostafa@google.com; wangzhou1@hisilicon.com;
> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; Krishnakant Jaju
> <kjaju@nvidia.com>; Michael S . Tsirkin <mst@redhat.com>
> Subject: Re: [PATCH v9 34/37] hw/pci: Add helper to insert PCIe extended
> capability at a fixed offset
> 
> External email: Use caution opening links or attachments
> 
> 
> On 2026/1/26 18:43, Shameer Kolothum wrote:
> > Add pcie_insert_capability(), a helper to insert a PCIe extended
> > capability into an existing extended capability list at a caller
> > specified offset.
> >
> > Unlike pcie_add_capability(), which always appends a capability to the
> > end of the list, this helper preserves the existing list ordering while
> > allowing insertion at an arbitrary offset.
> >
> > The helper only validates that the insertion does not overwrite an
> > existing PCIe extended capability header, since corrupting a header
> > would break the extended capability linked list. Validation of overlaps
> > with other configuration space registers or capability-specific
> > register blocks is left to the caller.
> >
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > Tested-by: Eric Auger <eric.auger@redhat.com>
> > Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> > ---
> >   hw/pci/pcie.c         | 69
> +++++++++++++++++++++++++++++++++++++++++++
> >   include/hw/pci/pcie.h |  2 ++
> >   2 files changed, 71 insertions(+)
> >
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index b302de6419..aa9024e532 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -1050,6 +1050,75 @@ static void pcie_ext_cap_set_next(PCIDevice
> *dev, uint16_t pos, uint16_t next)
> >       pci_set_long(dev->config + pos, header);
> >   }
> >
> > +/*
> > + * Insert a PCIe extended capability at a given offset.
> > + *
> > + * This helper only validates that the insertion does not overwrite an
> > + * existing PCIe extended capability header, as corrupting a header would
> > + * break the extended capability linked list.
> > + *
> > + * The caller must ensure that (offset, size) does not overlap with other
> > + * registers or capability-specific register blocks. Overlaps with
> > + * capability-specific registers are not checked and are considered a
> > + * user-controlled override.
> > + *
> > + * Note: Best effort helper. The PCIe spec does not require extended
> > + * capabilities to be ordered, but most devices use a forward-linked list.
> > + * Devices that do not consistently use a forward-linked list may cause
> > + * insertion to fail.
> > + */
> > +bool pcie_insert_capability(PCIDevice *dev, uint16_t cap_id, uint8_t
> cap_ver,
> > +                            uint16_t offset, uint16_t size)
> > +{
> > +    uint16_t pos = PCI_CONFIG_SPACE_SIZE, prev = 0;
> > +    uint32_t header;
> > +
> > +    assert(pci_is_express(dev));
> > +
> > +    if (!QEMU_IS_ALIGNED(offset, PCI_EXT_CAP_ALIGN) ||
> > +        size < 8 ||
> > +        offset < PCI_CONFIG_SPACE_SIZE ||
> > +        offset >= PCIE_CONFIG_SPACE_SIZE ||
> > +        offset + size > PCIE_CONFIG_SPACE_SIZE) {
> > +        return false;
> > +    }
> > +
> > +    header = pci_get_long(dev->config + pos);
> > +    if (!header) {
> > +        /* No extended capability present, insertion must be at the ECAP head
> */
> > +        if (offset != pos) {
> > +            return false;
> > +        }
> > +        pci_set_long(dev->config + pos, PCI_EXT_CAP(cap_id, cap_ver, 0));
> > +        goto out;
> > +    }
> > +
> > +    while (header && pos && offset >= pos) {
> > +        uint16_t next = PCI_EXT_CAP_NEXT(header);
> > +
> > +        /* Reject insertion inside an existing ECAP header (4 bytes) */
> > +        if (offset < pos + PCI_EXT_CAP_ALIGN) {
> > +            return false;
> > +        }
> 
> TBH. I was expecting to see a table that is similar with
> pci_ext_cap_length[][1], and a helper to walk the ext cap list
> to figure out the spared pos. But it might be over-enginering as
> we rely more on user to give an offset that is for sure no conflict
> with existing ecaps nor hidden registers. Could you also add a comment
> in the code to note it?
> 
> [1]
> https://github.com/torvalds/linux/blob/63804fed149a6750ffd28610c5c1c9
> 8cce6bd377/drivers/vfio/pci/vfio_pci_config.c#L71
> 
> With the above nit, LGTM.
> 
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>

Thanks.

Yes, the helper depends on the caller to provide a suitable offset, since
it's not generally possible to validate this due to hidden registers and
device specific capability layouts, and this is already noted in the commit
log and function comment.

If you had a specific wording in mind that would make this clearer, please
let me know and I can incorporate it if this requires a respin.

Shameer
 
Re: [PATCH v9 34/37] hw/pci: Add helper to insert PCIe extended capability at a fixed offset
Posted by Yi Liu 1 week, 6 days ago
On 2026/1/26 22:17, Shameer Kolothum wrote:
> Hi Yi,
> 

>>> +/*
>>> + * Insert a PCIe extended capability at a given offset.
>>> + *
>>> + * This helper only validates that the insertion does not overwrite an
>>> + * existing PCIe extended capability header, as corrupting a header would
>>> + * break the extended capability linked list.
>>> + *
>>> + * The caller must ensure that (offset, size) does not overlap with other
>>> + * registers or capability-specific register blocks. Overlaps with
>>> + * capability-specific registers are not checked and are considered a
>>> + * user-controlled override.
>>> + *
>>> + * Note: Best effort helper. The PCIe spec does not require extended
>>> + * capabilities to be ordered, but most devices use a forward-linked list.
>>> + * Devices that do not consistently use a forward-linked list may cause
>>> + * insertion to fail.
>>> + */
>>> +bool pcie_insert_capability(PCIDevice *dev, uint16_t cap_id, uint8_t
>> cap_ver,
>>> +                            uint16_t offset, uint16_t size)
>>> +{
>>> +    uint16_t pos = PCI_CONFIG_SPACE_SIZE, prev = 0;
>>> +    uint32_t header;
>>> +
>>> +    assert(pci_is_express(dev));
>>> +
>>> +    if (!QEMU_IS_ALIGNED(offset, PCI_EXT_CAP_ALIGN) ||
>>> +        size < 8 ||
>>> +        offset < PCI_CONFIG_SPACE_SIZE ||
>>> +        offset >= PCIE_CONFIG_SPACE_SIZE ||
>>> +        offset + size > PCIE_CONFIG_SPACE_SIZE) {
>>> +        return false;
>>> +    }
>>> +
>>> +    header = pci_get_long(dev->config + pos);
>>> +    if (!header) {
>>> +        /* No extended capability present, insertion must be at the ECAP head
>> */
>>> +        if (offset != pos) {
>>> +            return false;
>>> +        }
>>> +        pci_set_long(dev->config + pos, PCI_EXT_CAP(cap_id, cap_ver, 0));
>>> +        goto out;
>>> +    }
>>> +
>>> +    while (header && pos && offset >= pos) {
>>> +        uint16_t next = PCI_EXT_CAP_NEXT(header);
>>> +
>>> +        /* Reject insertion inside an existing ECAP header (4 bytes) */
>>> +        if (offset < pos + PCI_EXT_CAP_ALIGN) {
>>> +            return false;
>>> +        }
>>
>> TBH. I was expecting to see a table that is similar with
>> pci_ext_cap_length[][1], and a helper to walk the ext cap list
>> to figure out the spared pos. But it might be over-enginering as
>> we rely more on user to give an offset that is for sure no conflict
>> with existing ecaps nor hidden registers. Could you also add a comment
>> in the code to note it?
>>
>> [1]
>> https://github.com/torvalds/linux/blob/63804fed149a6750ffd28610c5c1c9
>> 8cce6bd377/drivers/vfio/pci/vfio_pci_config.c#L71
>>
>> With the above nit, LGTM.
>>
>> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
> 
> Thanks.
> 
> Yes, the helper depends on the caller to provide a suitable offset, since
> it's not generally possible to validate this due to hidden registers and
> device specific capability layouts, and this is already noted in the commit
> log and function comment.
> 
> If you had a specific wording in mind that would make this clearer, please
> let me know and I can incorporate it if this requires a respin.

no respin needed. Just noticed the comment in the function header is
enough. :)