drivers/cxl/core/region.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
cxl_region_remove_target() leaves a NULL pointer in the slot of the
removable endpoint decoder in p->targets array. However, p->targets
array replies on p->nr_targets to determine validity, which means when
p->nr_targets == p->interleave_ways, driver assumes all elements from
index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
violates this assumption and causes the driver to treat a NULL pointer
as a valid endpoint decoder.
To fix this issue, when a endpoint decoder is removed by
cxl_region_remove_target(), always swap the last valid endpoint decoder
pointer into the slot of removal endpoint decoder to ensure all pointers
before p->targets[p->nr_targets] are valid.
Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
Suggested-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Li Ming <ming.li@zohomail.com>
---
drivers/cxl/core/region.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e90c024c8036..54018db87a4c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
p->nr_targets--;
cxled->state = CXL_DECODER_STATE_AUTO;
cxled->pos = -1;
- p->targets[i] = NULL;
+
+ /*
+ * Swap the last valid target into the slot to
+ * ensure no invalid target in p->nr_targets range.
+ * The targets array will be re-sorted during the
+ * last endpoint decoder attaching again.
+ */
+ p->targets[i] = p->targets[p->nr_targets];
+ p->targets[p->nr_targets] = NULL;
return 1;
}
---
base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
change-id: 20260530-fix_null_in_targets_array-124303a8ba0f
Best regards,
--
Li Ming <ming.li@zohomail.com>
On Sat, May 30, 2026 at 12:24:40PM +0800, Li Ming wrote:
> cxl_region_remove_target() leaves a NULL pointer in the slot of the
> removable endpoint decoder in p->targets array. However, p->targets
> array replies on p->nr_targets to determine validity, which means when
> p->nr_targets == p->interleave_ways, driver assumes all elements from
> index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
> violates this assumption and causes the driver to treat a NULL pointer
> as a valid endpoint decoder.
>
> To fix this issue, when a endpoint decoder is removed by
> cxl_region_remove_target(), always swap the last valid endpoint decoder
> pointer into the slot of removal endpoint decoder to ensure all pointers
> before p->targets[p->nr_targets] are valid.
>
> Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
> Suggested-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Li Ming <ming.li@zohomail.com>
> ---
> drivers/cxl/core/region.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e90c024c8036..54018db87a4c 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
> p->nr_targets--;
> cxled->state = CXL_DECODER_STATE_AUTO;
> cxled->pos = -1;
> - p->targets[i] = NULL;
> +
> + /*
> + * Swap the last valid target into the slot to
> + * ensure no invalid target in p->nr_targets range.
> + * The targets array will be re-sorted during the
> + * last endpoint decoder attaching again.
> + */
> + p->targets[i] = p->targets[p->nr_targets];
> + p->targets[p->nr_targets] = NULL;
>
> return 1;
> }
Hi Ming,
I'm replying to top post here, but I have read the Sashiko response
and your response to that.
I'm offering review on the target list holes, but deferring on the
issue with cxl_rr_free_decoder because I think it's a narrow window
and it would not belong in *this* patch. (and maybe I'm running
out of steam too ;))
For the target list holes. I think there may be a single change
that can fix both the site you've fixed in this patch, and the
decoder detach site that Sashiko calls out.
Rather than add compaction at each removal site, make the AUTO
'appender' insert the decoder in the first free slot instead of
blindly at p->targets[p->nr_targets].
/* Use first free slot. Do not assume nr_targets is dense */
for (pos = 0; pos < p->interleave_ways; pos++)
if (!p->targets[pos])
break;
...
p->targets[pos] = cxled;
cxled->pos = pos;
My reasoning, that you'll need to prove -
- Removal only leaves a hole. The damage happens later in the appender
Fix the appender and the hole becomes harmless no matter who created it.
- It covers the cancel-auto site because a hole left by the staging
cancel is skipped by the next append, so the swap-compaction in this
patch is no longer needed.
- It covers the decoder detach site similarly (per Sashiko). The NULL
left by __cxl_decoder_detach() is filled on re-attach instead of being
appended past. Your reproduce seems like it would verify that.
- It needs no manual-vs-auto special case. An AUTO region has exactly
interleave_ways slots and interleave_ways members, so first-free-slot
keeps the array dense whenever it is full. The manual path is
untouched.
I'd probably rename to something like:
cxl/region: Fill first free targets[] slot during auto-discovery
BTW the fixes tag is not the OOB fix but is the original commit,
the same one you used in the OOB fix. 87805c32e6ad
-- Alison
>
> ---
> base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
> change-id: 20260530-fix_null_in_targets_array-124303a8ba0f
>
> Best regards,
> --
> Li Ming <ming.li@zohomail.com>
>
>
在 2026/6/4 06:40, Alison Schofield 写道:
> On Sat, May 30, 2026 at 12:24:40PM +0800, Li Ming wrote:
>> cxl_region_remove_target() leaves a NULL pointer in the slot of the
>> removable endpoint decoder in p->targets array. However, p->targets
>> array replies on p->nr_targets to determine validity, which means when
>> p->nr_targets == p->interleave_ways, driver assumes all elements from
>> index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
>> violates this assumption and causes the driver to treat a NULL pointer
>> as a valid endpoint decoder.
>>
>> To fix this issue, when a endpoint decoder is removed by
>> cxl_region_remove_target(), always swap the last valid endpoint decoder
>> pointer into the slot of removal endpoint decoder to ensure all pointers
>> before p->targets[p->nr_targets] are valid.
>>
>> Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
>> Suggested-by: Alison Schofield <alison.schofield@intel.com>
>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>> ---
>> drivers/cxl/core/region.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index e90c024c8036..54018db87a4c 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
>> p->nr_targets--;
>> cxled->state = CXL_DECODER_STATE_AUTO;
>> cxled->pos = -1;
>> - p->targets[i] = NULL;
>> +
>> + /*
>> + * Swap the last valid target into the slot to
>> + * ensure no invalid target in p->nr_targets range.
>> + * The targets array will be re-sorted during the
>> + * last endpoint decoder attaching again.
>> + */
>> + p->targets[i] = p->targets[p->nr_targets];
>> + p->targets[p->nr_targets] = NULL;
>>
>> return 1;
>> }
> Hi Ming,
>
> I'm replying to top post here, but I have read the Sashiko response
> and your response to that.
>
> I'm offering review on the target list holes, but deferring on the
> issue with cxl_rr_free_decoder because I think it's a narrow window
> and it would not belong in *this* patch. (and maybe I'm running
> out of steam too ;))
>
> For the target list holes. I think there may be a single change
> that can fix both the site you've fixed in this patch, and the
> decoder detach site that Sashiko calls out.
>
> Rather than add compaction at each removal site, make the AUTO
> 'appender' insert the decoder in the first free slot instead of
> blindly at p->targets[p->nr_targets].
>
> /* Use first free slot. Do not assume nr_targets is dense */
> for (pos = 0; pos < p->interleave_ways; pos++)
> if (!p->targets[pos])
> break;
> ...
> p->targets[pos] = cxled;
> cxled->pos = pos;
Yes, I think it can solve the problem, I will take a try.
>
>
> My reasoning, that you'll need to prove -
>
> - Removal only leaves a hole. The damage happens later in the appender
> Fix the appender and the hole becomes harmless no matter who created it.
>
> - It covers the cancel-auto site because a hole left by the staging
> cancel is skipped by the next append, so the swap-compaction in this
> patch is no longer needed.
Yes, if we use finding the first free slot for a decoder attachment, we
will not need swap-compaction. But we should reuse p->interleave_ways
instead of p->nr_targets for walking p->targets array in
cxl_region_remove_target(), consecutive detach will have problem without
that. Like this:
p->nr_targets = 4
[-1, 1, 2, 3] # __cxl_decoder_detach() called for pos 1, remove it.
p->nr_targets = 3
[-1, NULL, 2, 3] # __cxl_decoder_detach() called for pos 3, but it have
no chance to be accessed by cxl_region_remove_target().
Maybe Dave could drop the OOB fix, I will send out a patchset that
includes both the "insert decoder in the first free slot" and the OOB fix.
>
> - It covers the decoder detach site similarly (per Sashiko). The NULL
> left by __cxl_decoder_detach() is filled on re-attach instead of being
> appended past. Your reproduce seems like it would verify that.
>
> - It needs no manual-vs-auto special case. An AUTO region has exactly
> interleave_ways slots and interleave_ways members, so first-free-slot
> keeps the array dense whenever it is full. The manual path is
> untouched.
>
> I'd probably rename to something like:
> cxl/region: Fill first free targets[] slot during auto-discovery
Will do
>
> BTW the fixes tag is not the OOB fix but is the original commit,
> the same one you used in the OOB fix. 87805c32e6ad
Sure, Will fix it, thanks.
Ming
>
> -- Alison
>
>
>> ---
>> base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
>> change-id: 20260530-fix_null_in_targets_array-124303a8ba0f
>>
>> Best regards,
>> --
>> Li Ming <ming.li@zohomail.com>
>>
>>
On 6/4/26 6:28 AM, Li Ming wrote:
>
> 在 2026/6/4 06:40, Alison Schofield 写道:
>> On Sat, May 30, 2026 at 12:24:40PM +0800, Li Ming wrote:
>>> cxl_region_remove_target() leaves a NULL pointer in the slot of the
>>> removable endpoint decoder in p->targets array. However, p->targets
>>> array replies on p->nr_targets to determine validity, which means when
>>> p->nr_targets == p->interleave_ways, driver assumes all elements from
>>> index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
>>> violates this assumption and causes the driver to treat a NULL pointer
>>> as a valid endpoint decoder.
>>>
>>> To fix this issue, when a endpoint decoder is removed by
>>> cxl_region_remove_target(), always swap the last valid endpoint decoder
>>> pointer into the slot of removal endpoint decoder to ensure all pointers
>>> before p->targets[p->nr_targets] are valid.
>>>
>>> Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
>>> Suggested-by: Alison Schofield <alison.schofield@intel.com>
>>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>>> ---
>>> drivers/cxl/core/region.c | 10 +++++++++-
>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index e90c024c8036..54018db87a4c 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
>>> p->nr_targets--;
>>> cxled->state = CXL_DECODER_STATE_AUTO;
>>> cxled->pos = -1;
>>> - p->targets[i] = NULL;
>>> +
>>> + /*
>>> + * Swap the last valid target into the slot to
>>> + * ensure no invalid target in p->nr_targets range.
>>> + * The targets array will be re-sorted during the
>>> + * last endpoint decoder attaching again.
>>> + */
>>> + p->targets[i] = p->targets[p->nr_targets];
>>> + p->targets[p->nr_targets] = NULL;
>>> return 1;
>>> }
>> Hi Ming,
>>
>> I'm replying to top post here, but I have read the Sashiko response
>> and your response to that.
>>
>> I'm offering review on the target list holes, but deferring on the
>> issue with cxl_rr_free_decoder because I think it's a narrow window
>> and it would not belong in *this* patch. (and maybe I'm running
>> out of steam too ;))
>>
>> For the target list holes. I think there may be a single change
>> that can fix both the site you've fixed in this patch, and the
>> decoder detach site that Sashiko calls out.
>>
>> Rather than add compaction at each removal site, make the AUTO
>> 'appender' insert the decoder in the first free slot instead of
>> blindly at p->targets[p->nr_targets].
>>
>> /* Use first free slot. Do not assume nr_targets is dense */
>> for (pos = 0; pos < p->interleave_ways; pos++)
>> if (!p->targets[pos])
>> break;
>> ...
>> p->targets[pos] = cxled;
>> cxled->pos = pos;
> Yes, I think it can solve the problem, I will take a try.
>>
>>
>> My reasoning, that you'll need to prove -
>>
>> - Removal only leaves a hole. The damage happens later in the appender
>> Fix the appender and the hole becomes harmless no matter who created it.
>>
>> - It covers the cancel-auto site because a hole left by the staging
>> cancel is skipped by the next append, so the swap-compaction in this
>> patch is no longer needed.
>
> Yes, if we use finding the first free slot for a decoder attachment, we will not need swap-compaction. But we should reuse p->interleave_ways instead of p->nr_targets for walking p->targets array in cxl_region_remove_target(), consecutive detach will have problem without that. Like this:
>
> p->nr_targets = 4
>
> [-1, 1, 2, 3] # __cxl_decoder_detach() called for pos 1, remove it.
>
>
> p->nr_targets = 3
>
> [-1, NULL, 2, 3] # __cxl_decoder_detach() called for pos 3, but it have no chance to be accessed by cxl_region_remove_target().
>
>
> Maybe Dave could drop the OOB fix, I will send out a patchset that includes both the "insert decoder in the first free slot" and the OOB fix.
>
OOB fix dropped from cxl/next
DJ
>
>>
>> - It covers the decoder detach site similarly (per Sashiko). The NULL
>> left by __cxl_decoder_detach() is filled on re-attach instead of being
>> appended past. Your reproduce seems like it would verify that.
>>
>> - It needs no manual-vs-auto special case. An AUTO region has exactly
>> interleave_ways slots and interleave_ways members, so first-free-slot
>> keeps the array dense whenever it is full. The manual path is
>> untouched.
>>
>> I'd probably rename to something like:
>> cxl/region: Fill first free targets[] slot during auto-discovery
> Will do
>>
>> BTW the fixes tag is not the OOB fix but is the original commit,
>> the same one you used in the OOB fix. 87805c32e6ad
>
> Sure, Will fix it, thanks.
>
>
> Ming
>
>>
>> -- Alison
>>
>>
>>> ---
>>> base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
>>> change-id: 20260530-fix_null_in_targets_array-124303a8ba0f
>>>
>>> Best regards,
>>> --
>>> Li Ming <ming.li@zohomail.com>
>>>
>>>
© 2016 - 2026 Red Hat, Inc.