[RFC Patch v1 0/3] Fix using wrong GPF DVSEC location issue

Li Ming posted 3 patches 9 months ago
There is a newer version of this series
drivers/cxl/core/core.h |  2 +-
drivers/cxl/core/pci.c  | 27 ++++++++++++++++-----------
drivers/cxl/core/port.c |  2 +-
drivers/cxl/cxl.h       |  6 +++---
drivers/cxl/pmem.c      |  2 +-
5 files changed, 22 insertions(+), 17 deletions(-)
[RFC Patch v1 0/3] Fix using wrong GPF DVSEC location issue
Posted by Li Ming 9 months ago
During review all new patches in branch cxl/next. I noticed there may be
a problem in below commit.

commit a52b6a2c1c99 ("cxl/pci: Support Global Persistent Flush (GPF)")

There is a new field gpf_dvsec in struct cxl_port to cache GPF DVSEC for
Port(DVSEC ID 04h) location. When the first EP attaching to a cxl port,
it will trigger locating GPF DVSEC on the cxl dport which the first EP
is under, then the location is cached in port->gpf_dvsec. So if another
EP under another dport is attaching, it will reuse the value of
port->gpf_dvsec as GPF DVSEC location for this another dport. The
problem is the cached location may be a wrong location for other dports
of the port.

Per Table 8-2 in CXL r3.2 section 8.1.1 and CXL r3.2 section 8.1.6, Each
CXL Downstream switch ports and CXL root ports have their own GPF DVSEC
for Port(DVSEC ID 04h). So my understanding is that CXL subsystem should
locate GPF DVSEC for Port for each dport rather than using the cached
location in CXL port.

But I am not sure if all dports under a same port will have same
configuration space layout, if yes, that will not be a problem. If I am
wrong, please let me know, thanks.

base-commit: 3b5d43245f0a56390baaa670e1b6d898772266b3 cxl/next

Li Ming (3):
  cxl/core: Fix caching dport GPF DVSEC issue
  cxl/pci: Update Port GPF timeout only when the first EP attaching
  cxl/pci: Drop the parameter is_port of cxl_gpf_get_dvsec()

 drivers/cxl/core/core.h |  2 +-
 drivers/cxl/core/pci.c  | 27 ++++++++++++++++-----------
 drivers/cxl/core/port.c |  2 +-
 drivers/cxl/cxl.h       |  6 +++---
 drivers/cxl/pmem.c      |  2 +-
 5 files changed, 22 insertions(+), 17 deletions(-)

-- 
2.34.1
Re: [RFC Patch v1 0/3] Fix using wrong GPF DVSEC location issue
Posted by Dan Williams 9 months ago
Li Ming wrote:
> During review all new patches in branch cxl/next. I noticed there may be
> a problem in below commit.
> 
> commit a52b6a2c1c99 ("cxl/pci: Support Global Persistent Flush (GPF)")
> 
> There is a new field gpf_dvsec in struct cxl_port to cache GPF DVSEC for
> Port(DVSEC ID 04h) location. When the first EP attaching to a cxl port,
> it will trigger locating GPF DVSEC on the cxl dport which the first EP
> is under, then the location is cached in port->gpf_dvsec. So if another
> EP under another dport is attaching, it will reuse the value of
> port->gpf_dvsec as GPF DVSEC location for this another dport. The
> problem is the cached location may be a wrong location for other dports
> of the port.
> 
> Per Table 8-2 in CXL r3.2 section 8.1.1 and CXL r3.2 section 8.1.6, Each
> CXL Downstream switch ports and CXL root ports have their own GPF DVSEC
> for Port(DVSEC ID 04h). So my understanding is that CXL subsystem should
> locate GPF DVSEC for Port for each dport rather than using the cached
> location in CXL port.
> 
> But I am not sure if all dports under a same port will have same
> configuration space layout, if yes, that will not be a problem. If I am
> wrong, please let me know, thanks.
> 
> base-commit: 3b5d43245f0a56390baaa670e1b6d898772266b3 cxl/next
> 
> Li Ming (3):
>   cxl/core: Fix caching dport GPF DVSEC issue
>   cxl/pci: Update Port GPF timeout only when the first EP attaching
>   cxl/pci: Drop the parameter is_port of cxl_gpf_get_dvsec()

These look good. I doubt a device would build different offsets
per-dport, but the real risk is more in confusing future code readers
that dports are uniform.

For the series,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

A Fixes tag seems reasonable.
Re: [RFC Patch v1 0/3] Fix using wrong GPF DVSEC location issue
Posted by Davidlohr Bueso 9 months ago
On Wed, 19 Mar 2025, Li Ming wrote:

>But I am not sure if all dports under a same port will have same
>configuration space layout, if yes, that will not be a problem. If I am
>wrong, please let me know, thanks.

Yes, when caching the dvsec was suggested, it was my assumption that the
config space would be the same.

Thanks,
Davidlohr
Re: [RFC Patch v1 0/3] Fix using wrong GPF DVSEC location issue
Posted by Davidlohr Bueso 9 months ago
On Thu, 20 Mar 2025, Davidlohr Bueso wrote:

>On Wed, 19 Mar 2025, Li Ming wrote:
>
>>But I am not sure if all dports under a same port will have same
>>configuration space layout, if yes, that will not be a problem. If I am
>>wrong, please let me know, thanks.
>
>Yes, when caching the dvsec was suggested, it was my assumption that the
>config space would be the same.

Ultimately I don't know what the expectation is here, but your updates
do allow more flexibility from vendors, I guess(?). It's a bit late
in the cycle, unfortunately, so if these are to go in for v6.15, they
would be considered a fix imo, otherwise perhaps they are wanted for
v6.16 or not at all (patch 3 does look useful regardless)?

Based on some of the topologies listed in qemu, I did some testing (and
this was also why the same dvsec config layout) and see things working as
expected.

https://www.qemu.org/docs/master/system/devices/cxl.html#example-command-lines

(i) 2 direct-attached Type3 devices:
-next
[    3.938238] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    3.939323] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    4.003074] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:01.0: Port GPF phase 1 timeout: 20 secs
[    4.003676] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:01.0: Port GPF phase 2 timeout: 20 secs
-next+fix
[    3.969841] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    3.970957] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    3.971622] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:01.0: Port GPF phase 1 timeout: 20 secs
[    3.972664] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:01.0: Port GPF phase 2 timeout: 20 secs

(ii) 2 CXL host bridges. Each host bridge has 2 CXL Root Ports, with the CXL Type3 device directly attached:
-next
[    6.182333] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:de:00.0: Port GPF phase 1 timeout: 20 secs
[    6.182352] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:de:00.0: Port GPF phase 2 timeout: 20 secs
[    6.183938] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:01.0: Port GPF phase 1 timeout: 20 secs
[    6.183955] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:01.0: Port GPF phase 2 timeout: 20 secs
[    6.204324] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    6.205407] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    6.210006] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:de:01.0: Port GPF phase 1 timeout: 20 secs
[    6.210921] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:de:01.0: Port GPF phase 2 timeout: 20 secs
-next+fix
[    6.153093] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:01.0: Port GPF phase 1 timeout: 20 secs
[    6.153107] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:01.0: Port GPF phase 2 timeout: 20 secs
[    6.154170] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:de:01.0: Port GPF phase 1 timeout: 20 secs
[    6.154187] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:de:01.0: Port GPF phase 2 timeout: 20 secs
[    6.195279] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:de:00.0: Port GPF phase 1 timeout: 20 secs
[    6.195859] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:de:00.0: Port GPF phase 2 timeout: 20 secs
[    6.255782] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    6.257152] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs

(iii) 4 Type3 devices below a CXL Switch:
-next
[    3.940200] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:01.0: Port GPF phase 1 timeout: 20 secs
[    3.940218] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:01.0: Port GPF phase 2 timeout: 20 secs
[    3.940231] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:00.0: Port GPF phase 1 timeout: 20 secs
[    3.940245] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:00.0: Port GPF phase 2 timeout: 20 secs
[    3.940340] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    3.940350] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    3.948114] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:02.0: Port GPF phase 1 timeout: 20 secs
[    3.949203] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:02.0: Port GPF phase 2 timeout: 20 secs
[    3.997620] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:03.0: Port GPF phase 1 timeout: 20 secs
[    3.997641] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:03.0: Port GPF phase 2 timeout: 20 secs
-next+fix
[    3.983632] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:01.0: Port GPF phase 1 timeout: 20 secs
[    3.983649] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:01.0: Port GPF phase 2 timeout: 20 secs
[    3.984525] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    3.984539] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    3.984667] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:03.0: Port GPF phase 1 timeout: 20 secs
[    3.984692] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:03.0: Port GPF phase 2 timeout: 20 secs
[    3.985074] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:02.0: Port GPF phase 1 timeout: 20 secs
[    3.985090] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:02.0: Port GPF phase 2 timeout: 20 secs
[    3.988954] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:00.0: Port GPF phase 1 timeout: 20 secs
[    3.990464] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:00.0: Port GPF phase 2 timeout: 20 secs

Thanks,
Davidlohr
Re: [RFC Patch v1 0/3] Fix using wrong GPF DVSEC location issue
Posted by Li Ming 9 months ago
On 3/21/2025 11:59 AM, Davidlohr Bueso wrote:
> On Thu, 20 Mar 2025, Davidlohr Bueso wrote:
>
>> On Wed, 19 Mar 2025, Li Ming wrote:
>>
>>> But I am not sure if all dports under a same port will have same
>>> configuration space layout, if yes, that will not be a problem. If I am
>>> wrong, please let me know, thanks.
>>
>> Yes, when caching the dvsec was suggested, it was my assumption that the
>> config space would be the same.
>
> Ultimately I don't know what the expectation is here, but your updates
> do allow more flexibility from vendors, I guess(?). It's a bit late
> in the cycle, unfortunately, so if these are to go in for v6.15, they
> would be considered a fix imo, otherwise perhaps they are wanted for
> v6.16 or not at all (patch 3 does look useful regardless)?

My understanding is that the expectation of the patchset is to avoid using a wrong GPF DVSEC in case of dports under a same port have different config space layout. And I think the change is more closely to the description of CXL spec.

If the case(dports under a same port have different config space layout) would not happen, maybe add a comment in cxl_gpf_port_setup() is another option.

Yes, if patch 1 & 2 are considered to be merged, they are worth a fix tag. And patch 3 is an obvious cleanup change.

>
> Based on some of the topologies listed in qemu, I did some testing (and
> this was also why the same dvsec config layout) and see things working as
> expected.

Thanks for testing.


Ming

[snip]
Re: [RFC Patch v1 0/3] Fix using wrong GPF DVSEC location issue
Posted by Jonathan Cameron 9 months ago
On Fri, 21 Mar 2025 14:55:42 +0800
Li Ming <ming.li@zohomail.com> wrote:

> On 3/21/2025 11:59 AM, Davidlohr Bueso wrote:
> > On Thu, 20 Mar 2025, Davidlohr Bueso wrote:
> >  
> >> On Wed, 19 Mar 2025, Li Ming wrote:
> >>  
> >>> But I am not sure if all dports under a same port will have same
> >>> configuration space layout, if yes, that will not be a problem. If I am
> >>> wrong, please let me know, thanks.  
> >>
> >> Yes, when caching the dvsec was suggested, it was my assumption that the
> >> config space would be the same.  
> >
> > Ultimately I don't know what the expectation is here, but your updates
> > do allow more flexibility from vendors, I guess(?). It's a bit late
> > in the cycle, unfortunately, so if these are to go in for v6.15, they
> > would be considered a fix imo, otherwise perhaps they are wanted for
> > v6.16 or not at all (patch 3 does look useful regardless)?  
> 
> My understanding is that the expectation of the patchset is to avoid using a wrong GPF DVSEC in case of dports under a same port have different config space layout. And I think the change is more closely to the description of CXL spec.
> 
> If the case(dports under a same port have different config space layout) would not happen, maybe add a comment in cxl_gpf_port_setup() is another option.
> 
> Yes, if patch 1 & 2 are considered to be merged, they are worth a fix tag. And patch 3 is an obvious cleanup change.

I think they can indeed have different layout (in theory).
Seems moderately unlikely to occur in real devices, but you never know.

So I think a fixes tag would be valid.

Jonathan

> 
> >
> > Based on some of the topologies listed in qemu, I did some testing (and
> > this was also why the same dvsec config layout) and see things working as
> > expected.  
> 
> Thanks for testing.
> 
> 
> Ming
> 
> [snip]
>