On 2026/1/26 18:43, Shameer Kolothum wrote:
> Add pcie_insert_capability(), a helper to insert a PCIe extended
> capability into an existing extended capability list at a caller
> specified offset.
>
> Unlike pcie_add_capability(), which always appends a capability to the
> end of the list, this helper preserves the existing list ordering while
> allowing insertion at an arbitrary offset.
>
> The helper only validates that the insertion does not overwrite an
> existing PCIe extended capability header, since corrupting a header
> would break the extended capability linked list. Validation of overlaps
> with other configuration space registers or capability-specific
> register blocks is left to the caller.
>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> ---
> hw/pci/pcie.c | 69 +++++++++++++++++++++++++++++++++++++++++++
> include/hw/pci/pcie.h | 2 ++
> 2 files changed, 71 insertions(+)
>
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index b302de6419..aa9024e532 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -1050,6 +1050,75 @@ static void pcie_ext_cap_set_next(PCIDevice *dev, uint16_t pos, uint16_t next)
> pci_set_long(dev->config + pos, header);
> }
>
> +/*
> + * Insert a PCIe extended capability at a given offset.
> + *
> + * This helper only validates that the insertion does not overwrite an
> + * existing PCIe extended capability header, as corrupting a header would
> + * break the extended capability linked list.
> + *
> + * The caller must ensure that (offset, size) does not overlap with other
> + * registers or capability-specific register blocks. Overlaps with
> + * capability-specific registers are not checked and are considered a
> + * user-controlled override.
> + *
> + * Note: Best effort helper. The PCIe spec does not require extended
> + * capabilities to be ordered, but most devices use a forward-linked list.
> + * Devices that do not consistently use a forward-linked list may cause
> + * insertion to fail.
> + */
> +bool pcie_insert_capability(PCIDevice *dev, uint16_t cap_id, uint8_t cap_ver,
> + uint16_t offset, uint16_t size)
> +{
> + uint16_t pos = PCI_CONFIG_SPACE_SIZE, prev = 0;
> + uint32_t header;
> +
> + assert(pci_is_express(dev));
> +
> + if (!QEMU_IS_ALIGNED(offset, PCI_EXT_CAP_ALIGN) ||
> + size < 8 ||
> + offset < PCI_CONFIG_SPACE_SIZE ||
> + offset >= PCIE_CONFIG_SPACE_SIZE ||
> + offset + size > PCIE_CONFIG_SPACE_SIZE) {
> + return false;
> + }
> +
> + header = pci_get_long(dev->config + pos);
> + if (!header) {
> + /* No extended capability present, insertion must be at the ECAP head */
> + if (offset != pos) {
> + return false;
> + }
> + pci_set_long(dev->config + pos, PCI_EXT_CAP(cap_id, cap_ver, 0));
> + goto out;
> + }
> +
> + while (header && pos && offset >= pos) {
> + uint16_t next = PCI_EXT_CAP_NEXT(header);
> +
> + /* Reject insertion inside an existing ECAP header (4 bytes) */
> + if (offset < pos + PCI_EXT_CAP_ALIGN) {
> + return false;
> + }
TBH. I was expecting to see a table that is similar with
pci_ext_cap_length[][1], and a helper to walk the ext cap list
to figure out the spared pos. But it might be over-enginering as
we rely more on user to give an offset that is for sure no conflict
with existing ecaps nor hidden registers. Could you also add a comment
in the code to note it?
[1]
https://github.com/torvalds/linux/blob/63804fed149a6750ffd28610c5c1c98cce6bd377/drivers/vfio/pci/vfio_pci_config.c#L71
With the above nit, LGTM.
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
> + prev = pos;
> + pos = next;
> + header = pos ? pci_get_long(dev->config + pos) : 0;
> + }
> +
> + pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, pos));
> + if (prev) {
> + pcie_ext_cap_set_next(dev, prev, offset);
> + }
> +
> +out:
> + /* Make capability read-only by default */
> + memset(dev->wmask + offset, 0, size);
> + memset(dev->w1cmask + offset, 0, size);
> + /* Check capability by default */
> + memset(dev->cmask + offset, 0xFF, size);
> + return true;
> +}
> /*
> * Caller must supply valid (offset, size) such that the range wouldn't
> * overlap with other capability or other registers.
> diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> index c880ae1e04..d68bfa6257 100644
> --- a/include/hw/pci/pcie.h
> +++ b/include/hw/pci/pcie.h
> @@ -133,6 +133,8 @@ uint16_t pcie_find_capability(PCIDevice *dev, uint16_t cap_id);
> void pcie_add_capability(PCIDevice *dev,
> uint16_t cap_id, uint8_t cap_ver,
> uint16_t offset, uint16_t size);
> +bool pcie_insert_capability(PCIDevice *dev, uint16_t cap_id, uint8_t cap_ver,
> + uint16_t offset, uint16_t size);
> void pcie_sync_bridge_lnk(PCIDevice *dev);
>
> void pcie_acs_init(PCIDevice *dev, uint16_t offset);