[PATCH v2] x86/msi: fix locking for SR-IOV devices

Stewart Hildebrand posted 1 patch 3 months, 2 weeks ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20240807052011.582099-1-stewart.hildebrand@amd.com
There is a newer version of this series
[PATCH v2] x86/msi: fix locking for SR-IOV devices
Posted by Stewart Hildebrand 3 months, 2 weeks ago
In commit 4f78438b45e2 ("vpci: use per-domain PCI lock to protect vpci
structure") a lock moved from allocate_and_map_msi_pirq() to the caller
and changed from pcidevs_lock() to read_lock(&d->pci_lock). However, one
call path wasn't updated to reflect the change, leading to a failed
assertion observed under the following conditions:

* PV dom0
* Debug build (debug=y) of Xen
* There is an SR-IOV device in the system with one or more VFs enabled
* Dom0 has loaded the driver for the VF and enabled MSI-X

(XEN) Assertion 'd || pcidevs_locked()' failed at drivers/passthrough/pci.c:535
(XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Not tainted ]----
...
(XEN) Xen call trace:
(XEN)    [<ffff82d040284da8>] R pci_get_pdev+0x4c/0xab
(XEN)    [<ffff82d040344f5c>] F arch/x86/msi.c#read_pci_mem_bar+0x58/0x272
(XEN)    [<ffff82d04034530e>] F arch/x86/msi.c#msix_capability_init+0x198/0x755
(XEN)    [<ffff82d040345dad>] F arch/x86/msi.c#__pci_enable_msix+0x82/0xe8
(XEN)    [<ffff82d0403463e5>] F pci_enable_msi+0x3f/0x78
(XEN)    [<ffff82d04034be2b>] F map_domain_pirq+0x2a4/0x6dc
(XEN)    [<ffff82d04034d4d5>] F allocate_and_map_msi_pirq+0x103/0x262
(XEN)    [<ffff82d04035da5d>] F physdev_map_pirq+0x210/0x259
(XEN)    [<ffff82d04035e798>] F do_physdev_op+0x9c3/0x1454
(XEN)    [<ffff82d040329475>] F pv_hypercall+0x5ac/0x6af
(XEN)    [<ffff82d0402012d3>] F lstar_enter+0x143/0x150

In read_pci_mem_bar(), the VF obtains the pdev pointer for its
associated PF to access the vf_rlen array. This array is initialized in
pci_add_device() and doesn't change afterwards. Copy the array to the
pdev of the VF, and remove the troublesome call to pci_get_pdev().

Fixes: 4f78438b45e2 ("vpci: use per-domain PCI lock to protect vpci structure")
Reported-by: Teddy Astie <teddy.astie@vates.tech>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
Candidate for backport to 4.19

v1->v2:
* remove call to pci_get_pdev()
---
 xen/arch/x86/msi.c            | 23 ++++++++++-------------
 xen/drivers/passthrough/pci.c |  7 +++++++
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 0c97fbb3fc03..5b28fb19b78f 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -662,7 +662,8 @@ static int msi_capability_init(struct pci_dev *dev,
     return 0;
 }
 
-static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
+static u64 read_pci_mem_bar(struct pci_dev *pdev, u16 seg, u8 bus, u8 slot,
+                            u8 func, u8 bir, int vf)
 {
     u8 limit;
     u32 addr, base = PCI_BASE_ADDRESS_0;
@@ -670,19 +671,15 @@ static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
 
     if ( vf >= 0 )
     {
-        struct pci_dev *pdev = pci_get_pdev(NULL,
-                                            PCI_SBDF(seg, bus, slot, func));
+        pci_sbdf_t pf_sbdf = PCI_SBDF(seg, bus, slot, func);
         unsigned int pos;
         uint16_t ctrl, num_vf, offset, stride;
 
-        if ( !pdev )
-            return 0;
-
-        pos = pci_find_ext_capability(pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
-        ctrl = pci_conf_read16(pdev->sbdf, pos + PCI_SRIOV_CTRL);
-        num_vf = pci_conf_read16(pdev->sbdf, pos + PCI_SRIOV_NUM_VF);
-        offset = pci_conf_read16(pdev->sbdf, pos + PCI_SRIOV_VF_OFFSET);
-        stride = pci_conf_read16(pdev->sbdf, pos + PCI_SRIOV_VF_STRIDE);
+        pos = pci_find_ext_capability(pf_sbdf, PCI_EXT_CAP_ID_SRIOV);
+        ctrl = pci_conf_read16(pf_sbdf, pos + PCI_SRIOV_CTRL);
+        num_vf = pci_conf_read16(pf_sbdf, pos + PCI_SRIOV_NUM_VF);
+        offset = pci_conf_read16(pf_sbdf, pos + PCI_SRIOV_VF_OFFSET);
+        stride = pci_conf_read16(pf_sbdf, pos + PCI_SRIOV_VF_STRIDE);
 
         if ( !pos ||
              !(ctrl & PCI_SRIOV_CTRL_VFE) ||
@@ -829,7 +826,7 @@ static int msix_capability_init(struct pci_dev *dev,
             vf = dev->sbdf.bdf;
         }
 
-        table_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
+        table_paddr = read_pci_mem_bar(dev, seg, pbus, pslot, pfunc, bir, vf);
         WARN_ON(msi && msi->table_base != table_paddr);
         if ( !table_paddr )
         {
@@ -852,7 +849,7 @@ static int msix_capability_init(struct pci_dev *dev,
 
         pba_offset = pci_conf_read32(dev->sbdf, msix_pba_offset_reg(pos));
         bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
-        pba_paddr = read_pci_mem_bar(seg, pbus, pslot, pfunc, bir, vf);
+        pba_paddr = read_pci_mem_bar(dev, seg, pbus, pslot, pfunc, bir, vf);
         WARN_ON(!pba_paddr);
         pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 5a446d3dcee0..3a6a6abe205e 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -654,6 +654,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
     const char *type;
     int ret;
     bool pf_is_extfn = false;
+    uint64_t vf_rlen[6] = { 0 };
 
     if ( !info )
         type = "device";
@@ -664,7 +665,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
                             PCI_SBDF(seg, info->physfn.bus,
                                      info->physfn.devfn));
         if ( pdev )
+        {
             pf_is_extfn = pdev->info.is_extfn;
+            memcpy(vf_rlen, pdev->vf_rlen, sizeof(pdev->vf_rlen));
+        }
         pcidevs_unlock();
         if ( !pdev )
             pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
@@ -700,7 +704,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
          * extended function.
          */
         if ( pdev->info.is_virtfn )
+        {
             pdev->info.is_extfn = pf_is_extfn;
+            memcpy(pdev->vf_rlen, vf_rlen, sizeof(pdev->vf_rlen));
+        }
     }
 
     if ( !pdev->info.is_virtfn && !pdev->vf_rlen[0] )

base-commit: 6b9b96ddebf269579730ff2a65f324505bc2aba9
-- 
2.46.0
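
For readers following the patch: read_pci_mem_bar() needs vf_rlen[] because a VF's BAR address is derived from the PF's SR-IOV capability, not read from the VF's own config space. Below is a minimal sketch of that arithmetic, assuming the usual SR-IOV routing-ID rule (the first VF sits at PF RID + First VF Offset, later ones follow at multiples of VF Stride). The helper name vf_bar_displacement(), its parameters, and SRIOV_NUM_BARS are hypothetical and are not taken from msi.c.

```c
#include <stdint.h>

#define SRIOV_NUM_BARS 6 /* assumption: mirrors PCI_SRIOV_NUM_BARS */

/*
 * Hypothetical helper, not the Xen implementation.
 * On success returns 0 and stores in *disp the byte offset of this VF's
 * BAR window relative to the PF's VF BAR base. Returns -1 if the inputs
 * do not describe a valid VF of this PF.
 */
int vf_bar_displacement(uint16_t pf_bdf, uint16_t vf_bdf,
                        uint16_t offset, uint16_t stride, uint16_t num_vf,
                        const uint64_t vf_rlen[SRIOV_NUM_BARS],
                        unsigned int bir, uint64_t *disp)
{
    int idx;

    /* Reject configurations in which no VF BAR can be located. */
    if ( !num_vf || !offset || (num_vf > 1 && !stride) ||
         bir >= SRIOV_NUM_BARS || !vf_rlen[bir] )
        return -1;

    /* Distance of the VF's routing ID from the first VF's routing ID. */
    idx = (int)vf_bdf - ((int)pf_bdf + offset);
    if ( idx < 0 )
        return -1;

    /* Convert the routing-ID distance into a zero-based VF index. */
    if ( stride )
    {
        if ( idx % stride )
            return -1;
        idx /= stride;
    }

    if ( idx >= num_vf )
        return -1;

    /* Each VF owns one vf_rlen[bir]-sized slice of the VF BAR. */
    *disp = (uint64_t)idx * vf_rlen[bir];
    return 0;
}
```

With a PF at BDF 0x0100, First VF Offset 0x10, VF Stride 2 and a per-VF size of 0x4000 for BAR 0, the VF at BDF 0x0114 has index 2 and a displacement of 0x8000. The per-VF size is the only input that cannot be recomputed cheaply from config space, which is why the patch has to carry vf_rlen[] somewhere reachable without pci_get_pdev().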
Re: [PATCH v2] x86/msi: fix locking for SR-IOV devices
Posted by Jan Beulich 3 months, 2 weeks ago
On 07.08.2024 07:20, Stewart Hildebrand wrote:
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -662,7 +662,8 @@ static int msi_capability_init(struct pci_dev *dev,
>      return 0;
>  }
>  
> -static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
> +static u64 read_pci_mem_bar(struct pci_dev *pdev, u16 seg, u8 bus, u8 slot,
> +                            u8 func, u8 bir, int vf)
>  {

First I thought this was a leftover from the earlier version. But you need
it for accessing the vf_rlen[] field. Yet that's properly misleading,
especially when considering that the fix also wants backporting. What pdev
represents here changes. I think you want to pass in just vf_rlen (if we
really want to go this route; I'm a little wary of this repurposing of the
field, albeit I see no real technical issue).

Of course there's a BUILD_BUG_ON() which we need to get creative with, in
order to not outright drop it (see also below).

> @@ -670,19 +671,15 @@ static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
>  
>      if ( vf >= 0 )
>      {
> -        struct pci_dev *pdev = pci_get_pdev(NULL,
> -                                            PCI_SBDF(seg, bus, slot, func));
> +        pci_sbdf_t pf_sbdf = PCI_SBDF(seg, bus, slot, func);

I think this wants naming just "sbdf" and moving to function scope. There
are more places in the function which, in a subsequent change, could also
benefit from this new local variable.

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -654,6 +654,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>      const char *type;
>      int ret;
>      bool pf_is_extfn = false;
> +    uint64_t vf_rlen[6] = { 0 };

The type of this variable needs to be tied to that of the struct field
you copy to/from. Otherwise, if the struct field changes type ...

> @@ -664,7 +665,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>                              PCI_SBDF(seg, info->physfn.bus,
>                                       info->physfn.devfn));
>          if ( pdev )
> +        {
>              pf_is_extfn = pdev->info.is_extfn;
> +            memcpy(vf_rlen, pdev->vf_rlen, sizeof(pdev->vf_rlen));

... there'll be nothing for the compiler to tell us. Taken together with
the BUILD_BUG_ON() related remark further up, I think you want to
introduce a typedef and/or struct here to make things properly typesafe
(as then you can avoid the use of memcpy()).

Seeing the conditional we're in, what if we take ...

> +        }
>          pcidevs_unlock();
>          if ( !pdev )
>              pci_add_device(seg, info->physfn.bus, info->physfn.devfn,

... this fallback path?

> @@ -700,7 +704,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>           * extended function.
>           */
>          if ( pdev->info.is_virtfn )
> +        {
>              pdev->info.is_extfn = pf_is_extfn;
> +            memcpy(pdev->vf_rlen, vf_rlen, sizeof(pdev->vf_rlen));
> +        }
>      }

Similarly here - what if the enclosing if()'s condition is false? Even
if these cases couldn't be properly taken care of, they'd at least need
discussing in the description. In this context note how in a subsequent
invocation of pci_add_device() for the PF the missing data in vf_rlen[]
would actually be populated into the placeholder struct that the
fallback invocation of pci_add_device() would have created. Yet the
previously created VF's struct wouldn't be updated (afaict). This was,
iirc, the main reason to always consult the PF's ->vf_rlen[].

An alternative approach might be to add a link from VF to PF, while
making sure that the PF struct won't be de-allocated until all its VFs
have gone away. That would then also allow to eliminate the problematic
pci_get_pdev().

Jan
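
The type-safety point above can be illustrated with a small sketch: if the array is wrapped in a struct, the local copy takes the field's own type, plain assignment replaces memcpy(), and a later change of the field's type is caught by the compiler. This is only an illustration under assumptions: struct vf_rlen, struct dev (a cut-down stand-in for struct pci_dev), inherit_vf_rlen() and SRIOV_NUM_BARS are hypothetical names, and static_assert() stands in for BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

#define SRIOV_NUM_BARS 6 /* assumption: mirrors PCI_SRIOV_NUM_BARS */

/* Hypothetical wrapper type: wrapping the array makes it assignable. */
struct vf_rlen {
    uint64_t len[SRIOV_NUM_BARS];
};

/* Cut-down stand-in for struct pci_dev. */
struct dev {
    struct vf_rlen vf_rlen;
};

/* Compile-time check in the spirit of BUILD_BUG_ON(). */
static_assert(sizeof(((struct vf_rlen *)0)->len) /
              sizeof(((struct vf_rlen *)0)->len[0]) == SRIOV_NUM_BARS,
              "vf_rlen must have one entry per SR-IOV BAR");

/* Copy the PF's per-VF BAR sizes to a VF without memcpy(). */
void inherit_vf_rlen(struct dev *vf, const struct dev *pf)
{
    /* The local has exactly the field's type, so the two cannot drift. */
    struct vf_rlen tmp = pf->vf_rlen;

    vf->vf_rlen = tmp;
}
```

If the element type of len[] were changed later, both the local and the field would change together, and any caller still passing a bare uint64_t array would fail to compile.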
Re: [PATCH v2] x86/msi: fix locking for SR-IOV devices
Posted by Stewart Hildebrand 3 months, 2 weeks ago
Hi Jan,

Thanks for the feedback.

On 8/7/24 11:21, Jan Beulich wrote:
> On 07.08.2024 07:20, Stewart Hildebrand wrote:
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -662,7 +662,8 @@ static int msi_capability_init(struct pci_dev *dev,
>>      return 0;
>>  }
>>  
>> -static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
>> +static u64 read_pci_mem_bar(struct pci_dev *pdev, u16 seg, u8 bus, u8 slot,
>> +                            u8 func, u8 bir, int vf)
>>  {
> 
> First I thought this was a leftover from the earlier version. But you need
> it for accessing the vf_rlen[] field. Yet that's properly misleading,
> especially when considering that the fix also wants backporting. What pdev
> represents here changes. I think you want to pass in just vf_rlen (if we
> really want to go this route; I'm a little wary of this repurposing of the
> field, albeit I see no real technical issue).

I like your idea below of using a struct, so I'll pass a pointer to the
new struct.

> Of course there's a BUILD_BUG_ON() which we need to get creative with, in
> order to not outright drop it (see also below).

I suppose this BUILD_BUG_ON() is redundant with the one in
pci_add_device()...

>> @@ -670,19 +671,15 @@ static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
>>  
>>      if ( vf >= 0 )
>>      {
>> -        struct pci_dev *pdev = pci_get_pdev(NULL,
>> -                                            PCI_SBDF(seg, bus, slot, func));
>> +        pci_sbdf_t pf_sbdf = PCI_SBDF(seg, bus, slot, func);
> 
> I think this wants naming just "sbdf" and moving to function scope. There
> are more places in the function which, in a subsequent change, could also
> benefit from this new local variable.

Will do.

>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -654,6 +654,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>      const char *type;
>>      int ret;
>>      bool pf_is_extfn = false;
>> +    uint64_t vf_rlen[6] = { 0 };
> 
> The type of this variable needs to be tied to that of the struct field
> you copy to/from. Otherwise, if the struct field changes type ...
> 
>> @@ -664,7 +665,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>                              PCI_SBDF(seg, info->physfn.bus,
>>                                       info->physfn.devfn));
>>          if ( pdev )
>> +        {
>>              pf_is_extfn = pdev->info.is_extfn;
>> +            memcpy(vf_rlen, pdev->vf_rlen, sizeof(pdev->vf_rlen));
> 
> ... there'll be nothing for the compiler to tell us. Taken together with
> the BUILD_BUG_ON() related remark further up, I think you want to
> introduce a typedef and/or struct here to make things properly typesafe
> (as then you can avoid the use of memcpy()).

Here's what I'm thinking:

struct vf_info {
    uint64_t vf_rlen[PCI_SRIOV_NUM_BARS];
    unsigned int refcnt;
};

struct pci_dev {
    ...
    struct vf_info *vf_info;
    ...
};

> Seeing the conditional we're in, what if we take ...
> 
>> +        }
>>          pcidevs_unlock();
>>          if ( !pdev )
>>              pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
> 
> ... this fallback path?

It seems we need another call to pci_get_pdev() here to obtain a
reference to the newly allocated vf_info from the PF's pdev.

>> @@ -700,7 +704,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>           * extended function.
>>           */
>>          if ( pdev->info.is_virtfn )
>> +        {
>>              pdev->info.is_extfn = pf_is_extfn;
>> +            memcpy(pdev->vf_rlen, vf_rlen, sizeof(pdev->vf_rlen));
>> +        }
>>      }
> 
> Similarly here - what if the enclosing if()'s condition is false? Even
> if these cases couldn't be properly taken care of, they'd at least need
> discussing in the description. In this context note how in a subsequent
> invocation of pci_add_device() for the PF the missing data in vf_rlen[]
> would actually be populated into the placeholder struct that the
> fallback invocation of pci_add_device() would have created. Yet the
> previously created VF's struct wouldn't be updated (afaict). This was,
> iirc, the main reason to always consult the PF's ->vf_rlen[].

Right. If info is NULL, either it's a PF in the fallback case, or the
toolstack invoked PHYSDEVOP_manage_pci_add, in which case we treat it as
a PF or non-SR-IOV device. Using PHYSDEVOP_manage_pci_add for a VF is
not a case we handle. We only know if it's a VF if the toolstack has
told us so.

> An alternative approach might be to add a link from VF to PF, while
> making sure that the PF struct won't be de-allocated until all its VFs
> have gone away. That would then also allow to eliminate the problematic
> pci_get_pdev().

I think I can add a link to a new reference-counted struct with just the
info needed (see the proposed struct above).
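
A minimal sketch of how the reference-counted struct proposed above might be handled, assuming the PF holds the initial reference and each VF takes one more. The helper names vf_info_alloc(), vf_info_get() and vf_info_put() are hypothetical, calloc()/free() stand in for the hypervisor's allocator, and locking is left out entirely.

```c
#include <stdint.h>
#include <stdlib.h>

#define SRIOV_NUM_BARS 6 /* assumption: mirrors PCI_SRIOV_NUM_BARS */

struct vf_info {
    uint64_t vf_rlen[SRIOV_NUM_BARS];
    unsigned int refcnt;
};

/* Allocate a zeroed vf_info holding one reference (the PF's). */
struct vf_info *vf_info_alloc(void)
{
    struct vf_info *info = calloc(1, sizeof(*info));

    if ( info )
        info->refcnt = 1;

    return info;
}

/* Take an additional reference, e.g. when a VF is added. */
struct vf_info *vf_info_get(struct vf_info *info)
{
    info->refcnt++;
    return info;
}

/* Drop a reference; returns 1 if this was the last one and info was freed. */
int vf_info_put(struct vf_info *info)
{
    if ( --info->refcnt )
        return 0;

    free(info);
    return 1;
}
```

With this scheme the toolstack could remove the PF and its VFs in any order: whichever device drops the last reference frees the shared data.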
Re: [PATCH v2] x86/msi: fix locking for SR-IOV devices
Posted by Jan Beulich 3 months, 2 weeks ago
On 09.08.2024 06:09, Stewart Hildebrand wrote:
> On 8/7/24 11:21, Jan Beulich wrote:
>> On 07.08.2024 07:20, Stewart Hildebrand wrote:
>>> --- a/xen/arch/x86/msi.c
>>> +++ b/xen/arch/x86/msi.c
>>> @@ -662,7 +662,8 @@ static int msi_capability_init(struct pci_dev *dev,
>>>      return 0;
>>>  }
>>>  
>>> -static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
>>> +static u64 read_pci_mem_bar(struct pci_dev *pdev, u16 seg, u8 bus, u8 slot,
>>> +                            u8 func, u8 bir, int vf)
>>>  {
>>
>> First I thought this was a leftover from the earlier version. But you need
>> it for accessing the vf_rlen[] field. Yet that's properly misleading,
>> especially when considering that the fix also wants backporting. What pdev
>> represents here changes. I think you want to pass in just vf_rlen (if we
>> really want to go this route; I'm a little wary of this repurposing of the
>> field, albeit I see no real technical issue).
> 
> I like your idea below of using a struct, so I'll pass a pointer to the
> new struct.
> 
>> Of course there's a BUILD_BUG_ON() which we need to get creative with, in
>> order to not outright drop it (see also below).
> 
> I suppose this BUILD_BUG_ON() is redundant with the one in
> pci_add_device()...

"Redundant" can be positive or negative. Your response sounds as if you
thought one could be dropped, yet I think we want them in both places.

>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -654,6 +654,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>      const char *type;
>>>      int ret;
>>>      bool pf_is_extfn = false;
>>> +    uint64_t vf_rlen[6] = { 0 };
>>
>> The type of this variable needs to be tied to that of the struct field
>> you copy to/from. Otherwise, if the struct field changes type ...
>>
>>> @@ -664,7 +665,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>                              PCI_SBDF(seg, info->physfn.bus,
>>>                                       info->physfn.devfn));
>>>          if ( pdev )
>>> +        {
>>>              pf_is_extfn = pdev->info.is_extfn;
>>> +            memcpy(vf_rlen, pdev->vf_rlen, sizeof(pdev->vf_rlen));
>>
>> ... there'll be nothing for the compiler to tell us. Taken together with
>> the BUILD_BUG_ON() related remark further up, I think you want to
>> introduce a typedef and/or struct here to make things properly typesafe
>> (as then you can avoid the use of memcpy()).
> 
> Here's what I'm thinking:
> 
> struct vf_info {
>     uint64_t vf_rlen[PCI_SRIOV_NUM_BARS];
>     unsigned int refcnt;
> };
> 
> struct pci_dev {
>     ...
>     struct vf_info *vf_info;
>     ...
> };

I don't (yet) see the need for refcnt, and I also don't see why it wouldn't
continue to be embedded in struct pci_dev. Specifically ...

>> An alternative approach might be to add a link from VF to PF, while
>> making sure that the PF struct won't be de-allocated until all its VFs
>> have gone away. That would then also allow to eliminate the problematic
>> pci_get_pdev().
> 
> I think I can add a link to a new reference-counted struct with just the
> info needed (see the proposed struct above).

... I think having a link from VF to its PF may turn out helpful in the
future for other purposes, too.

Jan
Re: [PATCH v2] x86/msi: fix locking for SR-IOV devices
Posted by Stewart Hildebrand 3 months, 1 week ago
On 8/9/24 09:05, Jan Beulich wrote:
> On 09.08.2024 06:09, Stewart Hildebrand wrote:
>> On 8/7/24 11:21, Jan Beulich wrote:
>>> On 07.08.2024 07:20, Stewart Hildebrand wrote:
>>>> --- a/xen/arch/x86/msi.c
>>>> +++ b/xen/arch/x86/msi.c
>>>> @@ -662,7 +662,8 @@ static int msi_capability_init(struct pci_dev *dev,
>>>>      return 0;
>>>>  }
>>>>  
>>>> -static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
>>>> +static u64 read_pci_mem_bar(struct pci_dev *pdev, u16 seg, u8 bus, u8 slot,
>>>> +                            u8 func, u8 bir, int vf)
>>>>  {
>>>
>>> First I thought this was a leftover from the earlier version. But you need
>>> it for accessing the vf_rlen[] field. Yet that's properly misleading,
>>> especially when considering that the fix also wants backporting. What pdev
>>> represents here changes. I think you want to pass in just vf_rlen (if we
>>> really want to go this route; I'm a little wary of this repurposing of the
>>> field, albeit I see no real technical issue).
>>
>> I like your idea below of using a struct, so I'll pass a pointer to the
>> new struct.
>>
>>> Of course there's a BUILD_BUG_ON() which we need to get creative with, in
>>> order to not outright drop it (see also below).
>>
>> I suppose this BUILD_BUG_ON() is redundant with the one in
>> pci_add_device()...
> 
> "Redundant" can be positive or negative. Your response sounds as if you
> thought one could be dropped, yet I think we want them in both places.
> 
>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -654,6 +654,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>>      const char *type;
>>>>      int ret;
>>>>      bool pf_is_extfn = false;
>>>> +    uint64_t vf_rlen[6] = { 0 };
>>>
>>> The type of this variable needs to be tied to that of the struct field
>>> you copy to/from. Otherwise, if the struct field changes type ...
>>>
>>>> @@ -664,7 +665,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>>                              PCI_SBDF(seg, info->physfn.bus,
>>>>                                       info->physfn.devfn));
>>>>          if ( pdev )
>>>> +        {
>>>>              pf_is_extfn = pdev->info.is_extfn;
>>>> +            memcpy(vf_rlen, pdev->vf_rlen, sizeof(pdev->vf_rlen));
>>>
>>> ... there'll be nothing for the compiler to tell us. Taken together with
>>> the BUILD_BUG_ON() related remark further up, I think you want to
>>> introduce a typedef and/or struct here to make things properly typesafe
>>> (as then you can avoid the use of memcpy()).
>>
>> Here's what I'm thinking:
>>
>> struct vf_info {
>>     uint64_t vf_rlen[PCI_SRIOV_NUM_BARS];
>>     unsigned int refcnt;
>> };
>>
>> struct pci_dev {
>>     ...
>>     struct vf_info *vf_info;
>>     ...
>> };
> 
> I don't (yet) see the need for refcnt, and I also don't see why it wouldn't
> continue to be embedded in struct pci_dev. Specifically ...
> 
>>> An alternative approach might be to add a link from VF to PF, while
>>> making sure that the PF struct won't be de-allocated until all its VFs
>>> have gone away. That would then also allow to eliminate the problematic
>>> pci_get_pdev().
>>
>> I think I can add a link to a new reference-counted struct with just the
>> info needed (see the proposed struct above).
> 
> ... I think having a link from VF to its PF may turn out helpful in the
> future for other purposes, too.

Continue to embed in struct pci_dev: okay.

Link from VF to PF: assuming you mean a link to the PF's
struct pci_dev *, okay.

Ensuring the PF's struct pci_dev * won't be de-allocated until the VFs
are gone: I don't think we want to impose any sort of ordering on
whether the toolstack removes VFs or PFs first. So, if not reference
counting (i.e. how many VFs are referring back to the PF), how else
would we make sure that the PF struct won't be de-allocated until all
its VFs have gone away?
Re: [PATCH v2] x86/msi: fix locking for SR-IOV devices
Posted by Jan Beulich 3 months, 1 week ago
On 09.08.2024 17:02, Stewart Hildebrand wrote:
> On 8/9/24 09:05, Jan Beulich wrote:
>> On 09.08.2024 06:09, Stewart Hildebrand wrote:
>>> On 8/7/24 11:21, Jan Beulich wrote:
>>>> On 07.08.2024 07:20, Stewart Hildebrand wrote:
>>>>> --- a/xen/arch/x86/msi.c
>>>>> +++ b/xen/arch/x86/msi.c
>>>>> @@ -662,7 +662,8 @@ static int msi_capability_init(struct pci_dev *dev,
>>>>>      return 0;
>>>>>  }
>>>>>  
>>>>> -static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf)
>>>>> +static u64 read_pci_mem_bar(struct pci_dev *pdev, u16 seg, u8 bus, u8 slot,
>>>>> +                            u8 func, u8 bir, int vf)
>>>>>  {
>>>>
>>>> First I thought this was a leftover from the earlier version. But you need
>>>> it for accessing the vf_rlen[] field. Yet that's properly misleading,
>>>> especially when considering that the fix also wants backporting. What pdev
>>>> represents here changes. I think you want to pass in just vf_rlen (if we
>>>> really want to go this route; I'm a little wary of this repurposing of the
>>>> field, albeit I see no real technical issue).
>>>
>>> I like your idea below of using a struct, so I'll pass a pointer to the
>>> new struct.
>>>
>>>> Of course there's a BUILD_BUG_ON() which we need to get creative with, in
>>>> order to not outright drop it (see also below).
>>>
>>> I suppose this BUILD_BUG_ON() is redundant with the one in
>>> pci_add_device()...
>>
>> "Redundant" can be positive or negative. Your response sounds as if you
>> thought one could be dropped, yet I think we want them in both places.
>>
>>>>> --- a/xen/drivers/passthrough/pci.c
>>>>> +++ b/xen/drivers/passthrough/pci.c
>>>>> @@ -654,6 +654,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>>>      const char *type;
>>>>>      int ret;
>>>>>      bool pf_is_extfn = false;
>>>>> +    uint64_t vf_rlen[6] = { 0 };
>>>>
>>>> The type of this variable needs to be tied to that of the struct field
>>>> you copy to/from. Otherwise, if the struct field changes type ...
>>>>
>>>>> @@ -664,7 +665,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>>>                              PCI_SBDF(seg, info->physfn.bus,
>>>>>                                       info->physfn.devfn));
>>>>>          if ( pdev )
>>>>> +        {
>>>>>              pf_is_extfn = pdev->info.is_extfn;
>>>>> +            memcpy(vf_rlen, pdev->vf_rlen, sizeof(pdev->vf_rlen));
>>>>
>>>> ... there'll be nothing for the compiler to tell us. Taken together with
>>>> the BUILD_BUG_ON() related remark further up, I think you want to
>>>> introduce a typedef and/or struct here to make things properly typesafe
>>>> (as then you can avoid the use of memcpy()).
>>>
>>> Here's what I'm thinking:
>>>
>>> struct vf_info {
>>>     uint64_t vf_rlen[PCI_SRIOV_NUM_BARS];
>>>     unsigned int refcnt;
>>> };
>>>
>>> struct pci_dev {
>>>     ...
>>>     struct vf_info *vf_info;
>>>     ...
>>> };
>>
>> I don't (yet) see the need for refcnt, and I also don't see why it wouldn't
>> continue to be embedded in struct pci_dev. Specifically ...
>>
>>>> An alternative approach might be to add a link from VF to PF, while
>>>> making sure that the PF struct won't be de-allocated until all its VFs
>>>> have gone away. That would then also allow to eliminate the problematic
>>>> pci_get_pdev().
>>>
>>> I think I can add a link to a new reference-counted struct with just the
>>> info needed (see the proposed struct above).
>>
>> ... I think having a link from VF to its PF may turn out helpful in the
>> future for other purposes, too.
> 
> Continue to embed in struct pci_dev: okay.
> 
> Link from VF to PF: assuming you mean a link to the PF's
> struct pci_dev *, okay.
> 
> Ensuring the PF's struct pci_dev * won't be de-allocated until the VFs
> are gone: I don't think we want to impose any sort of ordering on
> whether the toolstack removes VFs or PFs first. So, if not reference
> counting (i.e. how many VFs are referring back to the PF), how else
> would we make sure that the PF struct won't be de-allocated until all
> its VFs have gone away?

Have the PF have a linked list of its VFs, and de-allocate the PF struct
only when that list is empty. (When putting in a VF->PF link, I was
taking it as obvious that then we also want a link the other way around,
i.e. a linked list attached to the PF's struct.) For non-PF devices that
list (if we need to instantiate it in all cases in the first place) will
always be empty.

Jan
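
The list-based lifetime rule described above might look roughly like the sketch below. It is one possible reading, not code from the thread: struct dev is a cut-down stand-in for struct pci_dev, the list is hand-rolled and singly linked purely to keep the example self-contained, the "removed" flag for deferring a PF's de-allocation is an assumption, and a "freed" flag stands in for actually releasing memory. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev {
    struct dev *pf;      /* VF only: link to the owning PF */
    struct dev *vf_head; /* PF only: first VF on the PF's list */
    struct dev *vf_next; /* VF only: next VF on the PF's list */
    bool removed;        /* PF only: removal requested while VFs remain */
    bool freed;          /* stand-in for releasing the struct */
};

/* Attach a VF to its PF, setting up both directions of the link. */
void dev_link_vf(struct dev *pf, struct dev *vf)
{
    vf->pf = pf;
    vf->vf_next = pf->vf_head;
    pf->vf_head = vf;
}

static void dev_free(struct dev *d)
{
    d->freed = true;
}

/* Remove a device; a PF is only freed once its VF list is empty. */
void dev_remove(struct dev *d)
{
    if ( d->pf )
    {
        struct dev *pf = d->pf, **pp;

        /* Unlink this VF from the PF's list. */
        for ( pp = &pf->vf_head; *pp; pp = &(*pp)->vf_next )
            if ( *pp == d )
            {
                *pp = d->vf_next;
                break;
            }

        d->pf = NULL;
        d->vf_next = NULL;
        dev_free(d);

        /* The last VF is gone and the PF was already removed: free it now. */
        if ( pf->removed && !pf->vf_head )
            dev_free(pf);

        return;
    }

    if ( d->vf_head )
    {
        /* PF with live VFs: remember the request, keep the struct. */
        d->removed = true;
        return;
    }

    /* Non-PF device, or PF whose list is empty. */
    dev_free(d);
}
```

For devices that are neither PF nor VF the list stays empty, so dev_remove() frees them immediately, and no removal ordering is imposed on the toolstack.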