Add the ability to manage SOFT RESERVE iomem resources prior to them being
added to the iomem resource tree. This allows drivers, such as CXL, to
remove any pieces of the SOFT RESERVE resource that intersect with created
CXL regions.
The current approach of leaving the SOFT RESERVE resources as is can cause
failures during hotplug of devices, such as CXL, because the resource is
not available for reuse after teardown of the device.
The approach is to add SOFT RESERVE resources to a separate tree during
boot. This allows any drivers to update the SOFT RESERVE resources before
they are merged into the iomem resource tree. In addition a notifier chain
is added so that drivers can be notified when these SOFT RESERVE resources
are added to the iomem resource tree.
The CXL driver is modified to use a worker thread that waits for the CXL
PCI and CXL mem drivers to be loaded and for their probe routine to
complete. Then the driver walks through any created CXL regions to trim any
intersections with SOFT RESERVE resources in the iomem tree.
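The ordering described above (wait for both drivers, only then trim) can be illustrated with a small user-space sketch. This is not the code from the series: the struct, the polling loop, and the callback are hypothetical stand-ins, and a kernel worker would sleep or block instead of calling a step function. It only shows the gating logic, with both readiness flags starting out false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the state that cxl_pci_loaded() and
 * cxl_mem_active() report. Both flags start out false.
 */
struct cxl_ready_state {
	bool pci_loaded;
	bool mem_active;
};

/* Trimming may only start once both drivers have finished probing. */
static bool cxl_drivers_ready(const struct cxl_ready_state *s)
{
	return s->pci_loaded && s->mem_active;
}

/* Callback type that simulates driver probe progress between polls. */
typedef void (*probe_step_fn)(struct cxl_ready_state *s, unsigned int poll);

/*
 * Poll until both drivers are ready or max_polls is exhausted.
 * Returns true if the drivers became ready, false on timeout.
 */
static bool wait_for_cxl_drivers(struct cxl_ready_state *s,
				 unsigned int max_polls, probe_step_fn step)
{
	assert(s != NULL);

	for (unsigned int poll = 0; poll < max_polls; poll++) {
		if (cxl_drivers_ready(s))
			return true;
		if (step)
			step(s, poll);
	}
	return cxl_drivers_ready(s);
}

/* Example progress: cxl_pci finishes on poll 1, cxl_mem on poll 3. */
static void example_probe_progress(struct cxl_ready_state *s,
				   unsigned int poll)
{
	if (poll == 1)
		s->pci_loaded = true;
	if (poll == 3)
		s->mem_active = true;
}
```

With `example_probe_progress`, a budget of three polls times out because only the PCI flag has been set by then, while a budget of five polls succeeds.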
The dax driver uses the new soft reserve notifier chain so it can consume
any remaining SOFT RESERVES once they're added to the iomem tree.
The following scenarios have been tested:
Example 1: Exact alignment, soft reserved is a child of the region
|---------- "Soft Reserved" -----------|
|-------------- "Region #" ------------|
Before:
1050000000-304fffffff : CXL Window 0
  1050000000-304fffffff : region0
    1050000000-304fffffff : Soft Reserved
      1080000000-2fffffffff : dax0.0
        1080000000-2fffffffff : System RAM (kmem)
After:
1050000000-304fffffff : CXL Window 0
  1050000000-304fffffff : region1
    1080000000-2fffffffff : dax0.0
      1080000000-2fffffffff : System RAM (kmem)
Example 2: Start and/or end aligned and soft reserved spans multiple
regions
|----------- "Soft Reserved" -----------|
|-------- "Region #" -------|
or
|----------- "Soft Reserved" -----------|
            |-------- "Region #" -------|
Before:
850000000-684fffffff : Soft Reserved
  850000000-284fffffff : CXL Window 0
    850000000-284fffffff : region3
      850000000-284fffffff : dax0.0
        850000000-284fffffff : System RAM (kmem)
  2850000000-484fffffff : CXL Window 1
    2850000000-484fffffff : region4
      2850000000-484fffffff : dax1.0
        2850000000-484fffffff : System RAM (kmem)
  4850000000-684fffffff : CXL Window 2
    4850000000-684fffffff : region5
      4850000000-684fffffff : dax2.0
        4850000000-684fffffff : System RAM (kmem)
After:
850000000-284fffffff : CXL Window 0
  850000000-284fffffff : region3
    850000000-284fffffff : dax0.0
      850000000-284fffffff : System RAM (kmem)
2850000000-484fffffff : CXL Window 1
  2850000000-484fffffff : region4
    2850000000-484fffffff : dax1.0
      2850000000-484fffffff : System RAM (kmem)
4850000000-684fffffff : CXL Window 2
  4850000000-684fffffff : region5
    4850000000-684fffffff : dax2.0
      4850000000-684fffffff : System RAM (kmem)
Example 3: No alignment
|---------- "Soft Reserved" ----------|
        |---- "Region #" ----|
Before:
00000000-3050000ffd : Soft Reserved
  ..
  ..
  1050000000-304fffffff : CXL Window 0
    1050000000-304fffffff : region1
      1080000000-2fffffffff : dax0.0
        1080000000-2fffffffff : System RAM (kmem)
After:
00000000-104fffffff : Soft Reserved
  ..
  ..
1050000000-304fffffff : CXL Window 0
  1050000000-304fffffff : region1
    1080000000-2fffffffff : dax0.0
      1080000000-2fffffffff : System RAM (kmem)
3050000000-3050000ffd : Soft Reserved
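The three scenarios above come down to the same range arithmetic: remove the region from the Soft Reserved range and keep whatever is left below and above it. The following is a minimal user-space sketch of that arithmetic, not the code from the series; `struct range64` and `trim_soft_reserved()` are hypothetical names used only for illustration, and ranges are inclusive as in /proc/iomem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, like one line of /proc/iomem. */
struct range64 {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper (not from the patch set): remove 'region' from the
 * soft reserved range 'sr' and store the leftover pieces in 'out'.
 * Returns the number of leftover pieces: 0, 1 or 2.
 */
static size_t trim_soft_reserved(struct range64 sr, struct range64 region,
				 struct range64 out[2])
{
	size_t n = 0;

	assert(sr.start <= sr.end && region.start <= region.end);

	/* No overlap: the soft reserved range is kept whole. */
	if (region.end < sr.start || region.start > sr.end) {
		out[n++] = sr;
		return n;
	}

	/* Piece below the region (absent when the starts are aligned). */
	if (sr.start < region.start)
		out[n++] = (struct range64){ sr.start, region.start - 1 };

	/* Piece above the region (absent when the ends are aligned). */
	if (sr.end > region.end)
		out[n++] = (struct range64){ region.end + 1, sr.end };

	return n;
}
```

For Example 1 the helper returns no leftover pieces, for a start-aligned region as in Example 2 it returns one, and for Example 3 it returns the two ranges 00000000-104fffffff and 3050000000-3050000ffd shown in the "After" listing.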
v4 updates:
- Split first patch into 4 smaller patches.
- Correct the logic for cxl_pci_loaded() and cxl_mem_active() to return
  false by default instead of true.
- Clean up cxl_wait_for_pci_mem() to remove config checks for cxl_pci
  and cxl_mem.
- Fix multiple bugs and build issues, including correcting
  walk_iomem_resc_desc() and the alignment calculations.
v3 updates:
- Remove the srmem resource tree from kernel/resource.c; it is no longer
  needed in the current implementation. All SOFT RESERVE resources are
  now put on the iomem resource tree.
- Remove the no longer needed SOFT_RESERVED_MANAGED kernel config option.
- Add the 'nid' parameter back to hmem_register_resource().
- Remove the no longer used soft reserve notification chain (introduced
in v2). The dax driver is now notified of SOFT RESERVED resources by
the CXL driver.
v2 updates:
- Add config option SOFT_RESERVE_MANAGED to control use of the
separate srmem resource tree at boot.
- Only add SOFT RESERVE resources to the soft reserve tree during
boot, they go to the iomem resource tree after boot.
- Remove the resource trimming code in the previous patch to re-use
the existing code in kernel/resource.c
- Add functionality for the cxl acpi driver to wait for the cxl PCI
  and mem drivers to load.
Smita Koralahalli (7):
cxl/region: Avoid null pointer dereference in is_cxl_region()
cxl/core: Remove CONFIG_CXL_SUSPEND and always build suspend.o
cxl/pci: Add pci_loaded tracking to mark PCI driver readiness
cxl/acpi: Add background worker to wait for cxl_pci and cxl_mem probe
cxl/region: Introduce SOFT RESERVED resource removal on region
teardown
dax/hmem: Save the DAX HMEM platform device pointer
cxl/dax: Defer DAX consumption of SOFT RESERVED resources until after
CXL region creation
drivers/cxl/Kconfig | 4 -
drivers/cxl/acpi.c | 25 ++++++
drivers/cxl/core/Makefile | 2 +-
drivers/cxl/core/region.c | 163 ++++++++++++++++++++++++++++++++++++-
drivers/cxl/core/suspend.c | 34 +++++++-
drivers/cxl/cxl.h | 7 ++
drivers/cxl/cxlmem.h | 9 --
drivers/cxl/cxlpci.h | 1 +
drivers/cxl/pci.c | 2 +
drivers/dax/hmem/device.c | 47 +++++------
drivers/dax/hmem/hmem.c | 10 ++-
include/linux/dax.h | 11 ++-
include/linux/pm.h | 7 --
13 files changed, 270 insertions(+), 52 deletions(-)
--
2.17.1
Smita,
Thanks for your awesome work. I just tested the scenarios you listed, and they work as expected. Thanks again.
(Minor comments inlined)
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
To the CXL community,
The scenarios mentioned here essentially cover what a correct firmware may provide. However,
I would like to discuss one more scenario that I can simulate with a modified QEMU:
The E820 exposes a SOFT RESERVED region which is the same as a CFMW, but the HDM decoders are not committed. This means no region will be auto-created during boot.
As an example, after boot, the iomem tree is as follows:
1050000000-304fffffff : CXL Window 0
1050000000-304fffffff : Soft Reserved
<No region>
In this case, the SOFT RESERVED resource is not trimmed, so the end-user cannot create a new region.
My question is: Is this scenario a problem? If it is, should we fix it in this patchset or create a new patch?
On 04/06/2025 06:19, Smita Koralahalli wrote:
> Add the ability to manage SOFT RESERVE iomem resources prior to them being
> added to the iomem resource tree. This allows drivers, such as CXL, to
> remove any pieces of the SOFT RESERVE resource that intersect with created
> CXL regions.
>
> The current approach of leaving the SOFT RESERVE resources as is can cause
> failures during hotplug of devices, such as CXL, because the resource is
> not available for reuse after teardown of the device.
>
> The approach is to add SOFT RESERVE resources to a separate tree during
> boot.
No special tree at all since V3
> This allows any drivers to update the SOFT RESERVE resources before
> they are merged into the iomem resource tree. In addition a notifier chain
> is added so that drivers can be notified when these SOFT RESERVE resources
> are added to the ioeme resource tree.
>
> The CXL driver is modified to use a worker thread that waits for the CXL
> PCI and CXL mem drivers to be loaded and for their probe routine to
> complete. Then the driver walks through any created CXL regions to trim any
> intersections with SOFT RESERVE resources in the iomem tree.
>
> The dax driver uses the new soft reserve notifier chain so it can consume
> any remaining SOFT RESERVES once they're added to the iomem tree.
>
> The following scenarios have been tested:
>
> Example 1: Exact alignment, soft reserved is a child of the region
>
> |---------- "Soft Reserved" -----------|
> |-------------- "Region #" ------------|
>
> Before:
> 1050000000-304fffffff : CXL Window 0
> 1050000000-304fffffff : region0
> 1050000000-304fffffff : Soft Reserved
> 1080000000-2fffffffff : dax0.0
BTW, I'm curious how to set up a dax with an address range different from its corresponding region.
> 1080000000-2fffffffff : System RAM (kmem)
>
> After:
> 1050000000-304fffffff : CXL Window 0
> 1050000000-304fffffff : region1
> 1080000000-2fffffffff : dax0.0
> 1080000000-2fffffffff : System RAM (kmem)
>
> Example 2: Start and/or end aligned and soft reserved spans multiple
> regions
Tested
>
> |----------- "Soft Reserved" -----------|
> |-------- "Region #" -------|
> or
> |----------- "Soft Reserved" -----------|
> |-------- "Region #" -------|
Typo? should be:
|----------- "Soft Reserved" -----------|
            |-------- "Region #" -------|
>
> Example 3: No alignment
> |---------- "Soft Reserved" ----------|
> |---- "Region #" ----|
Tested.
Thanks
Zhijian
Hi Zhijian,

Thanks for testing my patches.

On 6/4/2025 1:43 AM, Zhijian Li (Fujitsu) wrote:
> Smita,
>
> Thanks for your awesome work. I just tested the scenarios you listed, and they work as expected. Thanks again.
> (Minor comments inlined)
>
> Tested-by: Li Zhijian <lizhijian@fujitsu.com>
>
>
> To the CXL community,
>
> The scenarios mentioned here essentially cover what a correct firmware may provide. However,
> I would like to discuss one more scenario that I can simulate with a modified QEMU:
> The E820 exposes a SOFT RESERVED region which is the same as a CFMW, but the HDM decoders are not committed. This means no region will be auto-created during boot.
>
> As an example, after boot, the iomem tree is as follows:
> 1050000000-304fffffff : CXL Window 0
> 1050000000-304fffffff : Soft Reserved
> <No region>
>
> In this case, the SOFT RESERVED resource is not trimmed, so the end-user cannot create a new region.
> My question is: Is this scenario a problem? If it is, should we fix it in this patchset or create a new patch?
>

I believe firmware should handle this correctly by ensuring that any exposed SOFT RESERVED ranges correspond to committed HDM decoders and result in region creation.

That said, I’d be interested in hearing what the rest of the community thinks.

>
>
>
> On 04/06/2025 06:19, Smita Koralahalli wrote:
>> Add the ability to manage SOFT RESERVE iomem resources prior to them being
>> added to the iomem resource tree. This allows drivers, such as CXL, to
>> remove any pieces of the SOFT RESERVE resource that intersect with created
>> CXL regions.
>>
>> The current approach of leaving the SOFT RESERVE resources as is can cause
>> failures during hotplug of devices, such as CXL, because the resource is
>> not available for reuse after teardown of the device.
>>
>> The approach is to add SOFT RESERVE resources to a separate tree during
>> boot.
>
> No special tree at all since V3

Will make changes. I overlooked the cover letter.

>
>
>> This allows any drivers to update the SOFT RESERVE resources before
>> they are merged into the iomem resource tree. In addition a notifier chain
>> is added so that drivers can be notified when these SOFT RESERVE resources
>> are added to the ioeme resource tree.
>>
>> The CXL driver is modified to use a worker thread that waits for the CXL
>> PCI and CXL mem drivers to be loaded and for their probe routine to
>> complete. Then the driver walks through any created CXL regions to trim any
>> intersections with SOFT RESERVE resources in the iomem tree.
>>
>> The dax driver uses the new soft reserve notifier chain so it can consume
>> any remaining SOFT RESERVES once they're added to the iomem tree.
>>
>> The following scenarios have been tested:
>>
>> Example 1: Exact alignment, soft reserved is a child of the region
>>
>> |---------- "Soft Reserved" -----------|
>> |-------------- "Region #" ------------|
>>
>> Before:
>> 1050000000-304fffffff : CXL Window 0
>> 1050000000-304fffffff : region0
>> 1050000000-304fffffff : Soft Reserved
>> 1080000000-2fffffffff : dax0.0
>
> BTW, I'm curious how to set up a dax with an address range different from its corresponding region.

Hmm, this configuration was provided directly by our BIOS. The DAX device was mapped to a subset of the region's address space as part of the platform's firmware setup, so I did not explicitly configure it.

>
>
>> 1080000000-2fffffffff : System RAM (kmem)
>>
>> After:
>> 1050000000-304fffffff : CXL Window 0
>> 1050000000-304fffffff : region1
>> 1080000000-2fffffffff : dax0.0
>> 1080000000-2fffffffff : System RAM (kmem)
>>
>> Example 2: Start and/or end aligned and soft reserved spans multiple
>> regions
>
> Tested
>
>>
>> |----------- "Soft Reserved" -----------|
>> |-------- "Region #" -------|
>> or
>> |----------- "Soft Reserved" -----------|
>> |-------- "Region #" -------|
>
> Typo? should be:
> |----------- "Soft Reserved" -----------|
>             |-------- "Region #" -------|

Yeah, Will fix.

>
>>
>> Example 3: No alignment
>> |---------- "Soft Reserved" ----------|
>> |---- "Region #" ----|
>
> Tested.
>
>
> Thanks
> Zhijian

Thanks
Smita
On 05/06/2025 02:59, Koralahalli Channabasappa, Smita wrote:
>>
>>
>> To the CXL community,
>>
>> The scenarios mentioned here essentially cover what a correct firmware may provide. However,
>> I would like to discuss one more scenario that I can simulate with a modified QEMU:
>> The E820 exposes a SOFT RESERVED region which is the same as a CFMW, but the HDM decoders are not committed. This means no region will be auto-created during boot.
>>
>> As an example, after boot, the iomem tree is as follows:
>> 1050000000-304fffffff : CXL Window 0
>> 1050000000-304fffffff : Soft Reserved
>> <No region>
>>
>> In this case, the SOFT RESERVED resource is not trimmed, so the end-user cannot create a new region.
>> My question is: Is this scenario a problem? If it is, should we fix it in this patchset or create a new patch?
>>
>
> I believe firmware should handle this correctly by ensuring that any exposed SOFT RESERVED ranges correspond to committed HDM decoders and result in region creation.
>
> That said, I’d be interested in hearing what the rest of the community thinks.

After several days, we still haven't heard other significant opinions. I'm fine with keeping the current case coverage.
If the case I described becomes common in the future, we can revisit it then.

Thanks
Zhijian