[PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets

smadhavan@nvidia.com posted 5 patches 1 month ago
[PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
Posted by smadhavan@nvidia.com 1 month ago
From: Srirangan Madhavan <smadhavan@nvidia.com>

CXL devices can lose their DVSEC configuration and HDM decoder programming
after several reset methods (whenever the link is disabled and re-enabled).
This means a device that was fully configured, with DVSEC control/range
registers set and HDM decoders committed, loses that state after reset. In
cases where these are programmed by firmware, downstream drivers are unable
to re-initialize the device because CXL memory ranges are no longer mapped.

This series adds CXL state save/restore logic to the PCI core so
that DVSEC and HDM decoder state is preserved across any PCI reset
path that calls pci_save_state() / pci_restore_state(), for a CXL-capable device.
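
As a rough illustration of the save/restore idea (not the code in this
series), the user-space sketch below models config space as a byte array,
snapshots a DVSEC region before a simulated reset, and writes it back
afterwards. The struct names, the buffer layout, and the helper functions are
all assumptions made for the example; a real implementation would restore only
the writable control and range registers and would have to respect lock bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: 4 KiB of PCIe config space held in memory. */
#define CFG_SPACE_SIZE 4096

struct fake_pci_dev {
	uint8_t cfg[CFG_SPACE_SIZE];
};

/* Hypothetical save buffer: where the DVSEC lives plus a raw snapshot. */
struct dvsec_save_buf {
	uint16_t pos;      /* config-space offset of the DVSEC */
	uint16_t len;      /* number of bytes saved */
	uint8_t data[64];  /* raw register snapshot */
	int valid;         /* nonzero once a snapshot has been taken */
};

/* Snapshot len bytes at pos. Returns 0 on success, -1 if it does not fit. */
static int dvsec_save(const struct fake_pci_dev *dev,
		      struct dvsec_save_buf *buf, uint16_t pos, uint16_t len)
{
	if (len > sizeof(buf->data) || (uint32_t)pos + len > CFG_SPACE_SIZE)
		return -1;
	memcpy(buf->data, &dev->cfg[pos], len);
	buf->pos = pos;
	buf->len = len;
	buf->valid = 1;
	return 0;
}

/* Simulated reset: the device forgets everything in the region. */
static void fake_reset(struct fake_pci_dev *dev, uint16_t pos, uint16_t len)
{
	assert((uint32_t)pos + len <= CFG_SPACE_SIZE);
	memset(&dev->cfg[pos], 0, len);
}

/* Write the snapshot back. Returns -1 if nothing was ever saved. */
static int dvsec_restore(struct fake_pci_dev *dev,
			 const struct dvsec_save_buf *buf)
{
	if (!buf->valid)
		return -1;
	memcpy(&dev->cfg[buf->pos], buf->data, buf->len);
	return 0;
}
```

Calling dvsec_save() before fake_reset() and dvsec_restore() after it brings
the region back to its pre-reset contents, which is the behaviour the series
aims for on real hardware.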

HDM decoder defines and the cxl_register_map infrastructure are moved from
internal CXL driver headers to a new public include/cxl/pci.h, allowing
drivers/pci/cxl.c to use them.
This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
changes on top of his.

These patches were previously part of the CXL reset series and have been
split out [2] to allow independent review and merging. Review feedback on
the save/restore portions from v4 has been addressed.

Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
before reset and restored after, with decoder commit confirmed via the
COMMITTED status bit. Type-3 device testing is in progress.
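
For readers unfamiliar with the commit handshake, the following user-space
sketch shows the general shape of "set COMMIT, then poll for COMMITTED". The
bit positions follow the HDM Decoder Control register layout (the kernel's CXL
headers define them as CXL_HDM_DECODER0_CTRL_COMMIT and friends); the fake
decoder model, the poll bound, and the return codes are assumptions for the
example and are not taken from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the HDM Decoder n Control register. */
#define HDM_CTRL_COMMIT        (1u << 9)
#define HDM_CTRL_COMMITTED     (1u << 10)
#define HDM_CTRL_COMMIT_ERROR  (1u << 11)

/*
 * Hypothetical register model: once COMMIT is set, the device reports
 * COMMITTED (or COMMIT_ERROR) only after a few reads, mimicking latency.
 */
struct fake_decoder {
	uint32_t ctrl;
	int polls_until_done;
	int fail;              /* nonzero: report COMMIT_ERROR instead */
};

static uint32_t fake_read_ctrl(struct fake_decoder *d)
{
	if ((d->ctrl & HDM_CTRL_COMMIT) && d->polls_until_done-- <= 0)
		d->ctrl |= d->fail ? HDM_CTRL_COMMIT_ERROR : HDM_CTRL_COMMITTED;
	return d->ctrl;
}

/*
 * Set COMMIT, then poll the control register.
 * Returns 0 on COMMITTED, -1 on COMMIT_ERROR, -2 if max_polls reads
 * pass without either status bit being set.
 */
static int decoder_commit(struct fake_decoder *d, int max_polls)
{
	assert(max_polls > 0);
	d->ctrl |= HDM_CTRL_COMMIT;
	for (int i = 0; i < max_polls; i++) {
		uint32_t v = fake_read_ctrl(d);

		if (v & HDM_CTRL_COMMIT_ERROR)
			return -1;
		if (v & HDM_CTRL_COMMITTED)
			return 0;
	}
	return -2;
}
```

A restore path would call something like decoder_commit() once per decoder
after the base and size registers are rewritten, and treat any nonzero return
as a failed restore.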

This series is based on v7.0-rc1.

[1] https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/
[2] https://lore.kernel.org/linux-cxl/aa8d4f6a-e7bd-4a20-8d34-4376ea314b8f@intel.com/T/#m825c6bdd1934022123807e86d235358a63b08dbc

Srirangan Madhavan (5):
  PCI: Add CXL DVSEC control, lock, and range register definitions
  cxl: Move HDM decoder and register map definitions to
    include/cxl/pci.h
  PCI: Add virtual extended cap save buffer for CXL state
  PCI: Add cxl DVSEC state save/restore across resets
  PCI/CXL: Add HDM decoder state save/restore

 drivers/cxl/cxl.h             | 107 +-------
 drivers/cxl/cxlpci.h          |  10 -
 drivers/pci/Kconfig           |   4 +
 drivers/pci/Makefile          |   1 +
 drivers/pci/cxl.c             | 468 ++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c             |  23 ++
 drivers/pci/pci.h             |  18 ++
 include/cxl/pci.h             | 129 ++++++++++
 include/uapi/linux/pci_regs.h |   6 +
 9 files changed, 650 insertions(+), 116 deletions(-)
 create mode 100644 drivers/pci/cxl.c
 create mode 100644 include/cxl/pci.h

base-commit: 6de23f81a5e0
--
2.43.0

Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
Posted by Dan Williams 4 weeks, 1 day ago
smadhavan@ wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> CXL devices could lose their DVSEC configuration and HDM decoder programming
> after multiple reset methods (whenever link disable/enable). This means a
> device that was fully configured — with DVSEC control/range registers set
> and HDM decoders committed — loses that state after reset. In cases where
> these are programmed by firmware, downstream drivers are unable to re-initialize
> the device because CXL memory ranges are no longer mapped.
> 
> This series adds CXL state save/restore logic to the PCI core so
> that DVSEC and HDM decoder state is preserved across any PCI reset
> path that calls pci_save_state() / pci_restore_state(), for a CXL capable device.

The PCI core has no business learning CXL core internals.

For example, I have been pushing the CXL port protocol error handling
series to minimally involve the PCI core. Just enough enabling to
forward AER events, but otherwise PCI core stays blissfully unaware of
CXL details. The alternative is maintenance burden to the
PCI core that I expect is best to avoid.

> HDM decoder defines and the cxl_register_map infrastructure are moved from
> internal CXL driver headers to a new public include/cxl/pci.h, allowing
> drivers/pci/cxl.c to use them.
> This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
> minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
> changes on top of his.

I think we need to evaluate where things stand after both the CXL port
error handling series and the CXL accelerator base series have landed.
Not that they are functionally dependent on each other, but there is
a review backlog that needs to clear, and those establish the precedent
about where CXL functionality lands between PCI core, CXL core, and CXL
enlightened drivers.

> These patches were previously part of the CXL reset series and have been
> split out [2] to allow independent review and merging. Review feedback on
> the save/restore portions from v4 has been addressed.
> 
> Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
> before reset and restored after, with decoder commit confirmed via the
> COMMITTED status bit. Type-3 device testing is in progress.

It is a memory hot plug event. An accelerator driver can coordinate
quiescing CXL.mem over events like reset; a memory expander driver
cannot. The PCI core cannot manage memory hot plug. It is the wrong place
to enable this specific CXL reset because PCI core has no idea about the
suitability of reset at any given point of time.

Now, the secondary bus reset enabling for the CXL did end up with
changes to the PCI core:

53c49b6e6dd2 PCI/CXL: Add 'cxl_bus' reset method for devices below CXL Ports

...but only to disambiguate that hardware may be blocking secondary bus
reset by default. However, as the cxl_reset_done() handler shows, there
is zero coordination. One might get lucky and be able to see those
dev_crit() messages before the kernel crashes in the memory expander
case.
Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
Posted by Alex Williamson 4 weeks, 1 day ago
Hey Dan,

On Tue, 10 Mar 2026 14:39:25 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> smadhavan@ wrote:
> > From: Srirangan Madhavan <smadhavan@nvidia.com>
> > 
> > CXL devices could lose their DVSEC configuration and HDM decoder programming
> > after multiple reset methods (whenever link disable/enable). This means a
> > device that was fully configured — with DVSEC control/range registers set
> > and HDM decoders committed — loses that state after reset. In cases where
> > these are programmed by firmware, downstream drivers are unable to re-initialize
> > the device because CXL memory ranges are no longer mapped.
> > 
> > This series adds CXL state save/restore logic to the PCI core so
> > that DVSEC and HDM decoder state is preserved across any PCI reset
> > path that calls pci_save_state() / pci_restore_state(), for a CXL capable device.  
> 
> The PCI core has no business learning CXL core internals.
> 
> For example, I have been pushing the CXL port protocol error handling
> series to minimally involve the PCI core. Just enough enabling to
> forward AER events, but otherwise PCI core stays blissfully unaware of
> CXL details. The alternative is maintenance burden to the
> PCI core that I expect is best to avoid.
> 
> > HDM decoder defines and the cxl_register_map infrastructure are moved from
> > internal CXL driver headers to a new public include/cxl/pci.h, allowing
> > drivers/pci/cxl.c to use them.
> > This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
> > minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
> > changes on top of his.  
> 
> I think we need to evaluate where things stand after both the CXL port
> error handling series and the CXL accelerator base series have landed.
> Not that they are functionally dependent on each other, but there is
> a review backlog that needs to clear, and those establish the precedent
> about where CXL functionality lands between PCI core, CXL core, and CXL
> enlightened drivers.
> 
> > These patches were previously part of the CXL reset series and have been
> > split out [2] to allow independent review and merging. Review feedback on
> > the save/restore portions from v4 has been addressed.
> > 
> > Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
> > before reset and restored after, with decoder commit confirmed via the
> > COMMITTED status bit. Type-3 device testing is in progress.  
> 
> It is a memory hot plug event. An accelerator driver can coordinate
> quiescing CXL.mem over events like reset; a memory expander driver
> cannot. The PCI core cannot manage memory hot plug. It is the wrong place
> to enable this specific CXL reset because PCI core has no idea about the
> suitability of reset at any given point of time.
> 
> Now, the secondary bus reset enabling for the CXL did end up with
> changes to the PCI core:
> 
> 53c49b6e6dd2 PCI/CXL: Add 'cxl_bus' reset method for devices below CXL Ports
> 
> ...but only to disambiguate that hardware may be blocking secondary bus
> reset by default. However, as the cxl_reset_done() handler shows, there
> is zero coordination. One might get lucky and be able to see those
> dev_crit() messages before the kernel crashes in the memory expander
> case.

A constraint here is that CXL_BUS can be modular while PCI is builtin,
but reset is initiated through PCI and drivers like vfio-pci already
manage an opaque blob of PCI device state that can be pushed back into
the device to restore it between use cases.  If PCI is not enlightened
about CXL state to some extent, how does this work?
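
To make the opaque-blob point concrete, here is a minimal user-space sketch of
a saved-state blob built from per-capability save buffers. It is only a model:
the struct layouts and both helper functions are hypothetical and are not the
PCI core's actual data structures. The capability IDs in the usage note (0x02
for Virtual Channel, 0x23 for DVSEC) are the PCIe extended capability IDs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-capability save buffer, keyed by capability ID. */
struct cap_save {
	uint16_t cap_id;
	uint16_t size;
	uint8_t data[16];
};

/* Hypothetical opaque blob: a count followed by the save buffers. */
struct saved_state {
	uint32_t nr;
	struct cap_save caps[];
};

/* Pack nr save buffers into one heap-allocated blob owned by the caller. */
static struct saved_state *store_saved_state(const struct cap_save *caps,
					     uint32_t nr)
{
	struct saved_state *s = malloc(sizeof(*s) + nr * sizeof(*caps));

	if (!s)
		return NULL;
	s->nr = nr;
	if (nr)
		memcpy(s->caps, caps, nr * sizeof(*caps));
	return s;
}

/* Look up the buffer for cap_id in a blob; NULL if it was never saved. */
static const struct cap_save *find_saved_cap(const struct saved_state *s,
					     uint16_t cap_id)
{
	assert(s != NULL);
	for (uint32_t i = 0; i < s->nr; i++)
		if (s->caps[i].cap_id == cap_id)
			return &s->caps[i];
	return NULL;
}
```

In this model, a blob built only from a Virtual Channel buffer (ID 0x02)
returns NULL for a DVSEC lookup (ID 0x23): whoever later pushes the blob back
into the device can restore only what the saving side knew how to capture.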

PCI core has already been enlightened about things like virtual-channel
that it doesn't otherwise touch in order to be able to save and restore
firmware initiated configurations.  I think there are aspects of that
sort of thing here as well.  Thanks,

Alex
Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
Posted by Dan Williams 4 weeks, 1 day ago
Alex Williamson wrote:
[..]
> A constraint here is that CXL_BUS can be modular while PCI is builtin,
> but reset is initiated through PCI and drivers like vfio-pci already
> manage an opaque blob of PCI device state that can be pushed back into
> the device to restore it between use cases.  If PCI is not enlightened
> about CXL state to some extent, how does this work?

My expectation is that "vfio-cxl" is responsible. Similar to vfio-pci
that builds on PCI core functionality for assigning a device, a vfio-cxl
driver would build on CXL.

Specifically, register generic 'struct cxl_memdev' and/or 'struct
cxl_cachdev' objects with the CXL core, like any other accelerator
driver, and coordinate various levels of reset on those objects, not the
'struct pci_dev'.

> PCI core has already been enlightened about things like virtual-channel
> that it doesn't otherwise touch in order to be able to save and restore
> firmware initiated configurations.  I think there are aspects of that
> sort of thing here as well.  Thanks,

I am willing to hear more and admit I am not familiar with the details
of virtual-channel that make it both amenable to vfio-pci management and
similar to CXL. CXL needs to consider MM, cross-device dependencies, and
decoder topology management that is more dynamic than what happens for
PCI resources.

The CXL accelerator series is currently contending with being able to
restore device configuration after reset. I expect vfio-cxl to build on
that, not push CXL flows into the PCI core.
Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
Posted by Jonathan Cameron 4 weeks ago
On Fri, 6 Mar 2026 08:00:14 +0000
smadhavan@nvidia.com wrote:

> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> CXL devices could lose their DVSEC configuration and HDM decoder programming
> after multiple reset methods (whenever link disable/enable). This means a
> device that was fully configured — with DVSEC control/range registers set
> and HDM decoders committed — loses that state after reset. In cases where
> these are programmed by firmware, downstream drivers are unable to re-initialize
> the device because CXL memory ranges are no longer mapped.

Hi Srirangan,

Firstly this might be because I'm behind on patch review and there is
a lot going on right now!  So this might be addressed in a different series.

I'd like to understand the whole use case + flow here.  In general I think
we have a problem if a driver is relying on the BIOS having set up the
decoders and simply doesn't function if the BIOS didn't do it (and
that applies in this reset case as well). For starters, no hotplug.
Anyhow, that's a different issue, so we can leave that for now.

I'm thinking the reset flow is a good deal more complex than simply
putting the BIOS-programmed values back.  In some cases that might
be a very bad idea as autonomous traffic can hit the type 2 device
the moment these decoders are enabled and I'm guessing that may be
before the device has fully recovered. There are very few spec rules
about this that I can recall. On the setup path the BIOS presumably
got the device into a state where enabling such traffic was fine
and hopefully the driver bind doesn't break that state.

I think you are restoring CXL.mem as well so that gate isn't
going to save us.  Note it would be good to document what is restored and
why more clearly.  Sure we can figure it out from the code, but
a document might make life easier.

A device might handle this mess for us, but I doubt that this is universal.
For type 3 devices, I'm not sure what we want to do on reset in general.

Anyhow, this is really a request for a more detailed description of the
expected reset flow that goes into what the spec constrains and what
it doesn't.  Probably something worthy of going in Documentation.

Thanks,

Jonathan

> 
> This series adds CXL state save/restore logic to the PCI core so
> that DVSEC and HDM decoder state is preserved across any PCI reset
> path that calls pci_save_state() / pci_restore_state(), for a CXL capable device.
> 
> HDM decoder defines and the cxl_register_map infrastructure are moved from
> internal CXL driver headers to a new public include/cxl/pci.h, allowing
> drivers/pci/cxl.c to use them.
> This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
> minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
> changes on top of his.
> 
> These patches were previously part of the CXL reset series and have been
> split out [2] to allow independent review and merging. Review feedback on
> the save/restore portions from v4 has been addressed.
> 
> Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
> before reset and restored after, with decoder commit confirmed via the
> COMMITTED status bit. Type-3 device testing is in progress.
> 
> This series is based on v7.0-rc1.
> 
> [1] https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/
> [2] https://lore.kernel.org/linux-cxl/aa8d4f6a-e7bd-4a20-8d34-4376ea314b8f@intel.com/T/#m825c6bdd1934022123807e86d235358a63b08dbc
> 
> Srirangan Madhavan (5):
>   PCI: Add CXL DVSEC control, lock, and range register definitions
>   cxl: Move HDM decoder and register map definitions to
>     include/cxl/pci.h
>   PCI: Add virtual extended cap save buffer for CXL state
>   PCI: Add cxl DVSEC state save/restore across resets
>   PCI/CXL: Add HDM decoder state save/restore
> 
>  drivers/cxl/cxl.h             | 107 +-------
>  drivers/cxl/cxlpci.h          |  10 -
>  drivers/pci/Kconfig           |   4 +
>  drivers/pci/Makefile          |   1 +
>  drivers/pci/cxl.c             | 468 ++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.c             |  23 ++
>  drivers/pci/pci.h             |  18 ++
>  include/cxl/pci.h             | 129 ++++++++++
>  include/uapi/linux/pci_regs.h |   6 +
>  9 files changed, 650 insertions(+), 116 deletions(-)
>  create mode 100644 drivers/pci/cxl.c
>  create mode 100644 include/cxl/pci.h
> 
> base-commit: 6de23f81a5e0
> --
> 2.43.0
> 
>