[PATCH v4 5/9] cxl/region, dax/hmem: Arbitrate Soft Reserved ownership with cxl_regions_fully_map()

Smita Koralahalli posted 9 patches 2 months, 3 weeks ago
Introduce cxl_regions_fully_map() to check whether CXL regions form a
single contiguous, non-overlapping cover of a given Soft Reserved range.

Use this helper to decide whether Soft Reserved memory overlapping CXL
regions should be owned by CXL or registered by HMEM.

If the span is fully covered by CXL regions, treat the Soft Reserved
range as owned by CXL and have HMEM skip registration. Otherwise, let
HMEM claim the range and register the corresponding devdax for it.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/cxl/core/region.c | 80 +++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h         |  6 +++
 drivers/dax/hmem/hmem.c   | 14 ++++++-
 3 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b06fee1978ba..94dbbd6b5513 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3749,6 +3749,86 @@ static int cxl_region_debugfs_poison_clear(void *data, u64 offset)
 DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
 			 cxl_region_debugfs_poison_clear, "%llx\n");
 
+static struct cxl_region *
+cxlr_overlapping_range(struct device *dev, resource_size_t s, resource_size_t e)
+{
+	struct cxl_region *cxlr;
+	struct resource *r;
+
+	if (!is_cxl_region(dev))
+		return NULL;
+
+	cxlr = to_cxl_region(dev);
+	r = cxlr->params.res;
+	if (!r)
+		return NULL;
+
+	if (r->start > e || r->end < s)
+		return NULL;
+
+	return cxlr;
+}
+
+struct cxl_range_ctx {
+	resource_size_t start;
+	resource_size_t end;
+	resource_size_t pos;
+	resource_size_t map_end;
+	bool found;
+};
+
+static int cxl_region_map_cb(struct device *dev, void *data)
+{
+	struct cxl_range_ctx *ctx = data;
+	struct cxl_region *cxlr;
+	struct resource *r;
+
+	cxlr = cxlr_overlapping_range(dev, ctx->pos, ctx->end);
+	if (!cxlr)
+		return 0;
+
+	r = cxlr->params.res;
+	if (r->start != ctx->pos)
+		return 0;
+
+	if (!ctx->found) {
+		ctx->found = true;
+		ctx->map_end = r->end;
+		return 0;
+	}
+
+	return 1;
+}
+
+bool cxl_regions_fully_map(resource_size_t start, resource_size_t end)
+{
+	resource_size_t pos = start;
+	int rc;
+
+	while (pos <= end) {
+		struct cxl_range_ctx ctx = {
+			.start = start,
+			.end = end,
+			.pos = pos,
+			.found = false,
+		};
+
+		rc = bus_for_each_dev(&cxl_bus_type, NULL, &ctx,
+				      cxl_region_map_cb);
+
+		if (rc || !ctx.found || ctx.map_end > end)
+			return false;
+
+		if (ctx.map_end == end)
+			break;
+
+		pos = ctx.map_end + 1;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(cxl_regions_fully_map);
+
 static int cxl_region_can_probe(struct cxl_region *cxlr)
 {
 	struct cxl_region_params *p = &cxlr->params;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 231ddccf8977..af78c9fd37f2 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -877,6 +877,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
 int cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
 struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
 u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
+bool cxl_regions_fully_map(resource_size_t start, resource_size_t end);
 #else
 static inline bool is_cxl_pmem_region(struct device *dev)
 {
@@ -899,6 +900,11 @@ static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
 {
 	return 0;
 }
+static inline bool cxl_regions_fully_map(resource_size_t start,
+					 resource_size_t end)
+{
+	return false;
+}
 #endif
 
 void cxl_endpoint_parse_cdat(struct cxl_port *port);
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index f70a0688bd11..db4c46337ac3 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -3,6 +3,8 @@
 #include <linux/memregion.h>
 #include <linux/module.h>
 #include <linux/dax.h>
+
+#include "../../cxl/cxl.h"
 #include "../bus.h"
 
 static bool region_idle;
@@ -150,7 +152,17 @@ static int hmem_register_device(struct device *host, int target_nid,
 static int handle_deferred_cxl(struct device *host, int target_nid,
 			       const struct resource *res)
 {
-	/* TODO: Handle region assembly failures */
+	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			      IORES_DESC_CXL) != REGION_DISJOINT) {
+
+		if (cxl_regions_fully_map(res->start, res->end))
+			dax_cxl_mode = DAX_CXL_MODE_DROP;
+		else
+			dax_cxl_mode = DAX_CXL_MODE_REGISTER;
+
+		hmem_register_device(host, target_nid, res);
+	}
+
 	return 0;
 }
 
-- 
2.17.1
Re: [PATCH v4 5/9] cxl/region, dax/hmem: Arbitrate Soft Reserved ownership with cxl_regions_fully_map()
Posted by dan.j.williams@intel.com 2 months, 1 week ago
Smita Koralahalli wrote:
> Introduce cxl_regions_fully_map() to check whether CXL regions form a
> single contiguous, non-overlapping cover of a given Soft Reserved range.
> 
> Use this helper to decide whether Soft Reserved memory overlapping CXL
> regions should be owned by CXL or registered by HMEM.
> 
> If the span is fully covered by CXL regions, treat the Soft Reserved
> range as owned by CXL and have HMEM skip registration. Else, let HMEM
> claim the range and register the corresponding devdax for it.

This all feels a bit too custom when helpers like resource_contains()
exist.

Also remember that the default list of soft-reserved ranges that dax
grabs is filtered by the ACPI HMAT. So while there is a chance that one
EFI memory map entry spans multiple CXL regions, there is a lower chance
that a single ACPI HMAT range spans multiple CXL regions.

I think it is fair for Linux to be simple and require that an algorithm
of:

cxl_contains_soft_reserve()
    for_each_cxl_intersecting_hmem_resource()
        found = false
        for_each_region()
           if (resource_contains(cxl_region_resource, hmem_resource))
               found = true
        if (!found)
            return false
    return true

...should be good enough, otherwise fallback to pure hmem operation, and
do not worry about the corner cases.

If Linux really needs to understand that ACPI HMAT ranges may span
multiple CXL regions then I would want to understand more what is
driving that configuration.

Btw, I do not see a:

    guard(rwsem_read)(&cxl_rwsem.region)

...anywhere in the proposed patch. That needs to be held to be sure the
region's resource settings are not changed out from underneath you. This
should probably also be checking that the region is in the commit state
because it may still be racing against regions under creation post
wait_for_device_probe().

>  void cxl_endpoint_parse_cdat(struct cxl_port *port);
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index f70a0688bd11..db4c46337ac3 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,6 +3,8 @@
>  #include <linux/memregion.h>
>  #include <linux/module.h>
>  #include <linux/dax.h>
> +
> +#include "../../cxl/cxl.h"
>  #include "../bus.h"
>  
>  static bool region_idle;
> @@ -150,7 +152,17 @@ static int hmem_register_device(struct device *host, int target_nid,
>  static int handle_deferred_cxl(struct device *host, int target_nid,
>  			       const struct resource *res)
>  {
> -	/* TODO: Handle region assembly failures */
> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +
> +		if (cxl_regions_fully_map(res->start, res->end))
> +			dax_cxl_mode = DAX_CXL_MODE_DROP;
> +		else
> +			dax_cxl_mode = DAX_CXL_MODE_REGISTER;
> +
> +		hmem_register_device(host, target_nid, res);
> +	}
> +

I think there is enough content to just create the new
cxl_contains_soft_reserve() ABI, and then hookup handle_deferred_cxl in
a follow-on patch.
Re: [PATCH v4 5/9] cxl/region, dax/hmem: Arbitrate Soft Reserved ownership with cxl_regions_fully_map()
Posted by Koralahalli Channabasappa, Smita 1 month, 4 weeks ago
On 12/2/2025 7:50 PM, dan.j.williams@intel.com wrote:
> Smita Koralahalli wrote:
>> Introduce cxl_regions_fully_map() to check whether CXL regions form a
>> single contiguous, non-overlapping cover of a given Soft Reserved range.
>>
>> Use this helper to decide whether Soft Reserved memory overlapping CXL
>> regions should be owned by CXL or registered by HMEM.
>>
>> If the span is fully covered by CXL regions, treat the Soft Reserved
>> range as owned by CXL and have HMEM skip registration. Else, let HMEM
>> claim the range and register the corresponding devdax for it.
> 
> This all feels a bit too custom when helpers like resource_contains()
> exist.
> 
> Also remember that the default list of soft-reserved ranges that dax
> grabs is filtered by the ACPI HMAT. So while there is a chance that one
> EFI memory map entry spans multiple CXL regions, there is a lower chance
> that a single ACPI HMAT range spans multiple CXL regions.
> 
> I think it is fair for Linux to be simple and require that an algorithm
> of:
> 
> cxl_contains_soft_reserve()
>      for_each_cxl_intersecting_hmem_resource()
>          found = false
>          for_each_region()
>             if (resource_contains(cxl_region_resource, hmem_resource))
>                 found = true
>          if (!found)
>              return false
>      return true
> 
> ...should be good enough, otherwise fallback to pure hmem operation, and
> do not worry about the corner cases.
> 
> If Linux really needs to understand that ACPI HMAT ranges may span
> multiple CXL regions then I would want to understand more what is
> driving that configuration.

I was trying to handle a case like Tomasz's setup in [2], where a single 
Soft Reserved span and CFMWS cover two CXL regions:

kernel: [    0.000000][    T0] BIOS-e820: [mem 
0x0000000a90000000-0x0000000c8fffffff] soft reserved

a90000000-c8fffffff : CXL Window 0
   a90000000-b8fffffff : region1
   b90000000-c8fffffff : region0

…so I ended up with the more generic cxl_regions_fully_map() walker. I
missed the detail that the HMAT-filtered Soft Reserved ranges we
actually act on are much less likely to span multiple regions, and on
AMD platforms we effectively have a 1:1 mapping. I'm fine with
simplifying this per your suggestion.

> 
> Btw, I do not see a:
> 
>      guard(rwsem_read)(&cxl_rwsem.region)
> 
> ...anywhere in the proposed patch. That needs to be held be sure the
> region's resource settings are not changed out from underneath you. This
> should probably also be checking that the region is in the commit state
> because it may still be racing regions under creation post
> wait_for_device_probe().

Sure, I will add this.

> 
>>   void cxl_endpoint_parse_cdat(struct cxl_port *port);
>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>> index f70a0688bd11..db4c46337ac3 100644
>> --- a/drivers/dax/hmem/hmem.c
>> +++ b/drivers/dax/hmem/hmem.c
>> @@ -3,6 +3,8 @@
>>   #include <linux/memregion.h>
>>   #include <linux/module.h>
>>   #include <linux/dax.h>
>> +
>> +#include "../../cxl/cxl.h"
>>   #include "../bus.h"
>>   
>>   static bool region_idle;
>> @@ -150,7 +152,17 @@ static int hmem_register_device(struct device *host, int target_nid,
>>   static int handle_deferred_cxl(struct device *host, int target_nid,
>>   			       const struct resource *res)
>>   {
>> -	/* TODO: Handle region assembly failures */
>> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> +			      IORES_DESC_CXL) != REGION_DISJOINT) {
>> +
>> +		if (cxl_regions_fully_map(res->start, res->end))
>> +			dax_cxl_mode = DAX_CXL_MODE_DROP;
>> +		else
>> +			dax_cxl_mode = DAX_CXL_MODE_REGISTER;
>> +
>> +		hmem_register_device(host, target_nid, res);
>> +	}
>> +
> 
> I think there is enough content to just create the new
> cxl_contains_soft_reserve() ABI, and then hookup handle_deferred_cxl in
> a follow-on patch.

Okay.

Thanks
Smita