drivers/cxl/core/hdm.c | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-)
After destroy-region, cxl_region_decode_reset() clears the HDM decoder
registers (base/size/commit). If the memdev is subsequently bounced
(disable/enable), port probe re-evaluates decoder capability via
should_emulate_decoders().
The existing logic checks each decoder's COMMITTED bit. Since those bits
are cleared by region teardown, should_emulate_decoders() incorrectly
falls back to DVSEC range emulation, even though HDM capability is still
present.
DVSEC fallback marks the endpoint decoder as AUTO, which triggers
cxl_add_to_region() -> construct_region(). That path copies the default
interleave_granularity (4096) into the region parameters. The resulting
spurious autodiscovered region consumes the CFMWS HPA space and causes a
subsequent create-region to fail in hpa_alloc().
Use the global CXL_HDM_DECODER_ENABLE bit instead of per-decoder COMMITTED
bits to detect HDM capability. If the HDM decoder block is enabled, zeroed
registers indicate teardown, not absence of HDM support. This prevents the
unintended DVSEC fallback and subsequent region creation failure.
Based on cxl/fixes.
base-commit: 8441c7d3bd6c5a52ab2ecf77e43a5bf262004f5c
Fixes: 52cc48ad2a76 ("cxl/hdm: Limit emulation to the number of range registers")
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
drivers/cxl/core/hdm.c | 25 +++++++++----------------
1 file changed, 9 insertions(+), 16 deletions(-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index eb5a3a7640c6..a0718cbcc355 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -94,7 +94,6 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
struct cxl_hdm *cxlhdm;
void __iomem *hdm;
u32 ctrl;
- int i;
if (!info)
return false;
@@ -113,22 +112,16 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
return false;
/*
- * If any decoders are committed already, there should not be any
- * emulated DVSEC decoders.
+ * If HDM decoders are globally enabled, do not fall back to DVSEC
+ * range emulation. Zeroed decoder registers after region teardown
+ * do not imply absence of HDM capability.
+ *
+ * Falling back to DVSEC here would treat the decoder as AUTO and
+ * may incorrectly latch default interleave settings.
*/
- for (i = 0; i < cxlhdm->decoder_count; i++) {
- ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
- dev_dbg(&info->port->dev,
- "decoder%d.%d: committed: %ld base: %#x_%.8x size: %#x_%.8x\n",
- info->port->id, i,
- FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl),
- readl(hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i)),
- readl(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(i)),
- readl(hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i)),
- readl(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i)));
- if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl))
- return false;
- }
+ ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+ if (ctrl & CXL_HDM_DECODER_ENABLE)
+ return false;
return true;
}
--
2.17.1
Smita Koralahalli wrote: > After destroy-region, cxl_region_decode_reset() clears the HDM decoder > registers (base/size/commit). If the memdev is subsequently bounced > (disable/enable), port probe re-evaluates decoder capability via > should_emulate_decoders(). I do not think this bug is limited to "after destroy region". Simply, if the driver sees that the HDM capability is enabled before the driver loads it indicates that whomever left that configuration also intended for DVSEC range registers to be ignored. > The existing logic checks each decoder's COMMITTED bit. Since those bits > are cleared by region teardown, should_emulate_decoders() incorrectly > falls back to DVSEC range emulation, even though HDM capability is still > present. This is 2 separate bugs, right? Bug 1 destroying the register configuration of auto-assembled regions, fixed by your pending patch. Bug 2 destroying auto-assembled regions does not clean up DVSEC range registers making it look like those are set when HDM decode capability is disabled. That unintentionally / falsely mimics CXL 1.1 platform firmware behavior triggering should_emulate_decoders(). > DVSEC fallback marks the endpoint decoder as AUTO, which triggers > cxl_add_to_region() -> construct_region(). That path copies the default > interleave_granularity (4096) into the region parameters. The resulting > spurious autodiscovered region consumes the CFMWS HPA space and causes a > subsequent create-region to fail in hpa_alloc(). I do not think this part is relevant to the fix, right? Once DVSEC is being falsely used the rest is just knock-on-effects. > Use the global CXL_HDM_DECODER_ENABLE bit instead of per-decoder COMMITTED > bits to detect HDM capability. If the HDM decoder block is enabled, ...if the HDM block is enabled the state of the register is irrelevant. > registers indicate teardown, not absence of HDM support. This prevents the > unintended DVSEC fallback and subsequent region creation failure. I think this wants 2 patches. This one you have here to always trust HDM decoder on setup, and another patch to teardown non-auto should_emulate_decoders() DVSEC configurations. Those combined with your "do not teardown auto regions" should squash this crop of setup bugs.
Hi Dan, On 3/10/2026 8:22 PM, Dan Williams wrote: > Smita Koralahalli wrote: >> After destroy-region, cxl_region_decode_reset() clears the HDM decoder >> registers (base/size/commit). If the memdev is subsequently bounced >> (disable/enable), port probe re-evaluates decoder capability via >> should_emulate_decoders(). > > I do not think this bug is limited to "after destroy region". Simply, if > the driver sees that the HDM capability is enabled before the driver > loads it indicates that whomever left that configuration also intended > for DVSEC range registers to be ignored. > >> The existing logic checks each decoder's COMMITTED bit. Since those bits >> are cleared by region teardown, should_emulate_decoders() incorrectly >> falls back to DVSEC range emulation, even though HDM capability is still >> present. > > This is 2 separate bugs, right? Bug 1 destroying the register > configuration of auto-assembled regions, fixed by your pending patch. > Bug 2 destroying auto-assembled regions does not clean up DVSEC range > registers making it look like those are set when HDM decode capability > is disabled. That unintentionally / falsely mimics CXL 1.1 platform > firmware behavior triggering should_emulate_decoders(). > >> DVSEC fallback marks the endpoint decoder as AUTO, which triggers >> cxl_add_to_region() -> construct_region(). That path copies the default >> interleave_granularity (4096) into the region parameters. The resulting >> spurious autodiscovered region consumes the CFMWS HPA space and causes a >> subsequent create-region to fail in hpa_alloc(). > > I do not think this part is relevant to the fix, right? Once DVSEC is > being falsely used the rest is just knock-on-effects. > >> Use the global CXL_HDM_DECODER_ENABLE bit instead of per-decoder COMMITTED >> bits to detect HDM capability. If the HDM decoder block is enabled, > > ...if the HDM block is enabled the state of the register is irrelevant. > >> registers indicate teardown, not absence of HDM support. This prevents the >> unintended DVSEC fallback and subsequent region creation failure. > > I think this wants 2 patches. This one you have here to always trust HDM > decoder on setup, and another patch to teardown non-auto > should_emulate_decoders() DVSEC configurations. Those combined with your > "do not teardown auto regions" should squash this crop of setup bugs. Looking at this more closely for patch 2, if I'm not wrong, I think the problem is that cxl_setup_hdm_decoder_from_dvsec() is not setting up the interleave_granularity on the decoder. It inherits the default PAGE_SIZE (4096) from cxl_decoder_init(), and construct_region() then copies that stale value into the region params. So even in a DVSEC emulation case, the IG would be whatever cxl_decoder_init() defaulted to, not what the actual configuration is right? Is the second patch about fixing cxl_setup_hdm_decoder_from_dvsec() to properly set the interleave granularity? Please correct me if my understanding is wrong here. Thanks, Smita
Koralahalli Channabasappa, Smita wrote: [..] > > "do not teardown auto regions" should squash this crop of setup bugs. > > Looking at this more closely for patch 2, if I'm not wrong, I think the > problem is that cxl_setup_hdm_decoder_from_dvsec() is not setting up the > interleave_granularity on the decoder. It inherits the default PAGE_SIZE > (4096) from cxl_decoder_init(), and construct_region() then copies that > stale value into the region params. > > So even in a DVSEC emulation case, the IG would be whatever > cxl_decoder_init() defaulted to, not what the actual configuration is right? Remember that CXL 1.1 DVSEC range registers do not support interleave. Interleaving was a CXL 2.0 feature. So in the case where HDM decoders are actually disabled and DVSEC Range registers are sufficient, the bug is in cxl_port_setup_targets(). Specifically, when interleave_ways == 1 then interleave_granularity does not matter. It is ok for CFMWS and the region granularity numbers to mismatch in that case. So I would not expect that ig mismatch matters after: ce32b0c9c522 cxl: core/region - ignore interleave granularity when ways=1 ...did that miss an additional location where we are doing granularity checks? > Is the second patch about fixing cxl_setup_hdm_decoder_from_dvsec() to > properly set the interleave granularity? I would say just clean up the changelog per the feedback and we can start a separate thread to fixup DVSEC support. With the suggested changelog updates you can also add: Reviewed-by: Dan Williams <dan.j.williams@intel.com>
On 3/12/2026 12:42 PM, Dan Williams wrote: > Koralahalli Channabasappa, Smita wrote: > [..] >>> "do not teardown auto regions" should squash this crop of setup bugs. >> >> Looking at this more closely for patch 2, if I'm not wrong, I think the >> problem is that cxl_setup_hdm_decoder_from_dvsec() is not setting up the >> interleave_granularity on the decoder. It inherits the default PAGE_SIZE >> (4096) from cxl_decoder_init(), and construct_region() then copies that >> stale value into the region params. >> >> So even in a DVSEC emulation case, the IG would be whatever >> cxl_decoder_init() defaulted to, not what the actual configuration is right? > > Remember that CXL 1.1 DVSEC range registers do not support interleave. > Interleaving was a CXL 2.0 feature. > > So in the case where HDM decoders are actually disabled and DVSEC Range > registers are sufficient, the bug is in cxl_port_setup_targets(). > Specifically, when interleave_ways == 1 then interleave_granularity does > not matter. It is ok for CFMWS and the region granularity numbers to > mismatch in that case. > > So I would not expect that ig mismatch matters after: > > ce32b0c9c522 cxl: core/region - ignore interleave granularity when ways=1 > > ...did that miss an additional location where we are doing granularity > checks? What's actually triggering my error is likely the disabled decoder flag. The log shows: [] cxl region0: pci0000:e0:port1 cxl_port_setup_targets expected iw: 1 ig: 4096 [mem 0x850000000-0x284fffffff flags 0x200] [] cxl region0: pci0000:e0:port1 cxl_port_setup_targets got iw: 1 ig: 256 state: disabled 0x850000000:0x284fffffff The decoder is in disabled state because cxl_decoder_reset() clears CXL_DECODER_F_ENABLE during region teardown. So ce32b0c9c522 is handling the IG issue correctly and Im probably hitting failing condition with the disabled flag. I will spend some more time confirming this. > >> Is the second patch about fixing cxl_setup_hdm_decoder_from_dvsec() to >> properly set the interleave granularity? > > I would say just clean up the changelog per the feedback and we can > start a separate thread to fixup DVSEC support. > > With the suggested changelog updates you can also add: > > Reviewed-by: Dan Williams <dan.j.williams@intel.com> Sure thanks on this. I will send out v2 just with these changes and rewriting the commit message. Thanks Smita
On Thu, Feb 12, 2026 at 10:38:00PM +0000, Smita Koralahalli wrote:
> After destroy-region, cxl_region_decode_reset() clears the HDM decoder
> registers (base/size/commit). If the memdev is subsequently bounced
> (disable/enable), port probe re-evaluates decoder capability via
> should_emulate_decoders().
>
> The existing logic checks each decoder's COMMITTED bit. Since those bits
> are cleared by region teardown, should_emulate_decoders() incorrectly
> falls back to DVSEC range emulation, even though HDM capability is still
> present.
>
> DVSEC fallback marks the endpoint decoder as AUTO, which triggers
> cxl_add_to_region() -> construct_region(). That path copies the default
> interleave_granularity (4096) into the region parameters. The resulting
> spurious autodiscovered region consumes the CFMWS HPA space and causes a
> subsequent create-region to fail in hpa_alloc().
>
> Use the global CXL_HDM_DECODER_ENABLE bit instead of per-decoder COMMITTED
> bits to detect HDM capability. If the HDM decoder block is enabled, zeroed
> registers indicate teardown, not absence of HDM support. This prevents the
> unintended DVSEC fallback and subsequent region creation failure.
Calling attention to this one again. I'm debugging another issue in
this space, failing to rediscover a BIOS region on acpi unbind/bind.
Can we consider this one for a 7.0 fixes pull request?
I think it is important.
Alison
>
> Based on cxl/fixes.
> base-commit: 8441c7d3bd6c5a52ab2ecf77e43a5bf262004f5c
>
> Fixes: 52cc48ad2a76 ("cxl/hdm: Limit emulation to the number of range registers")
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/cxl/core/hdm.c | 25 +++++++++----------------
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index eb5a3a7640c6..a0718cbcc355 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -94,7 +94,6 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> struct cxl_hdm *cxlhdm;
> void __iomem *hdm;
> u32 ctrl;
> - int i;
>
> if (!info)
> return false;
> @@ -113,22 +112,16 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> return false;
>
> /*
> - * If any decoders are committed already, there should not be any
> - * emulated DVSEC decoders.
> + * If HDM decoders are globally enabled, do not fall back to DVSEC
> + * range emulation. Zeroed decoder registers after region teardown
> + * do not imply absence of HDM capability.
> + *
> + * Falling back to DVSEC here would treat the decoder as AUTO and
> + * may incorrectly latch default interleave settings.
> */
> - for (i = 0; i < cxlhdm->decoder_count; i++) {
> - ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
> - dev_dbg(&info->port->dev,
> - "decoder%d.%d: committed: %ld base: %#x_%.8x size: %#x_%.8x\n",
> - info->port->id, i,
> - FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl),
> - readl(hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i)));
> - if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl))
> - return false;
> - }
> + ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
> + if (ctrl & CXL_HDM_DECODER_ENABLE)
> + return false;
>
> return true;
> }
> --
> 2.17.1
>
On 2/12/26 3:38 PM, Smita Koralahalli wrote:
> After destroy-region, cxl_region_decode_reset() clears the HDM decoder
> registers (base/size/commit). If the memdev is subsequently bounced
> (disable/enable), port probe re-evaluates decoder capability via
> should_emulate_decoders().
>
> The existing logic checks each decoder's COMMITTED bit. Since those bits
> are cleared by region teardown, should_emulate_decoders() incorrectly
> falls back to DVSEC range emulation, even though HDM capability is still
> present.
>
> DVSEC fallback marks the endpoint decoder as AUTO, which triggers
> cxl_add_to_region() -> construct_region(). That path copies the default
> interleave_granularity (4096) into the region parameters. The resulting
> spurious autodiscovered region consumes the CFMWS HPA space and causes a
> subsequent create-region to fail in hpa_alloc().
>
> Use the global CXL_HDM_DECODER_ENABLE bit instead of per-decoder COMMITTED
> bits to detect HDM capability. If the HDM decoder block is enabled, zeroed
> registers indicate teardown, not absence of HDM support. This prevents the
> unintended DVSEC fallback and subsequent region creation failure.
>
> Based on cxl/fixes.
> base-commit: 8441c7d3bd6c5a52ab2ecf77e43a5bf262004f5c
This block should go under the '---'
DJ
>
> Fixes: 52cc48ad2a76 ("cxl/hdm: Limit emulation to the number of range registers")
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/cxl/core/hdm.c | 25 +++++++++----------------
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index eb5a3a7640c6..a0718cbcc355 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -94,7 +94,6 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> struct cxl_hdm *cxlhdm;
> void __iomem *hdm;
> u32 ctrl;
> - int i;
>
> if (!info)
> return false;
> @@ -113,22 +112,16 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> return false;
>
> /*
> - * If any decoders are committed already, there should not be any
> - * emulated DVSEC decoders.
> + * If HDM decoders are globally enabled, do not fall back to DVSEC
> + * range emulation. Zeroed decoder registers after region teardown
> + * do not imply absence of HDM capability.
> + *
> + * Falling back to DVSEC here would treat the decoder as AUTO and
> + * may incorrectly latch default interleave settings.
> */
> - for (i = 0; i < cxlhdm->decoder_count; i++) {
> - ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
> - dev_dbg(&info->port->dev,
> - "decoder%d.%d: committed: %ld base: %#x_%.8x size: %#x_%.8x\n",
> - info->port->id, i,
> - FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl),
> - readl(hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i)));
> - if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl))
> - return false;
> - }
> + ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
> + if (ctrl & CXL_HDM_DECODER_ENABLE)
> + return false;
>
> return true;
> }
On Thu, Feb 12, 2026 at 10:38:00PM +0000, Smita Koralahalli wrote:
> After destroy-region, cxl_region_decode_reset() clears the HDM decoder
> registers (base/size/commit). If the memdev is subsequently bounced
> (disable/enable), port probe re-evaluates decoder capability via
> should_emulate_decoders().
>
> The existing logic checks each decoder's COMMITTED bit. Since those bits
> are cleared by region teardown, should_emulate_decoders() incorrectly
> falls back to DVSEC range emulation, even though HDM capability is still
> present.
>
> DVSEC fallback marks the endpoint decoder as AUTO, which triggers
> cxl_add_to_region() -> construct_region(). That path copies the default
> interleave_granularity (4096) into the region parameters. The resulting
> spurious autodiscovered region consumes the CFMWS HPA space and causes a
> subsequent create-region to fail in hpa_alloc().
>
> Use the global CXL_HDM_DECODER_ENABLE bit instead of per-decoder COMMITTED
> bits to detect HDM capability. If the HDM decoder block is enabled, zeroed
> registers indicate teardown, not absence of HDM support. This prevents the
> unintended DVSEC fallback and subsequent region creation failure.
Nice find Smita!
I tried this out following your recipe w an auto region:
disable/destroy the region then disable/enable one memdev.
There was a problem before the patch that went away after the patch,
but the signature was different. In my case, the endpoint tried and
failed to begin construction on a new auto region yet still blocked
recreation of the original region because it consumed an endpoint
decoder, not the HPA space.
Memdev enable led to this:
[] cxl_pci 0000:da:00.0: mem3:decoder12.0 no CXL window for range 0x0:0x1fffffffff
which comes from cxl_add_to_region()->get_cxl_root_decoder()
And then the original region fails to recreate with:
cxl region: cxl_memdev_find_decoder: could not get a free decoder for mem3
Did you see any of that or totally different messaging?
It would be nice to confirm any varietals here and add useful signatures
to the commit log to help with searches.
-- Alison
>
> Based on cxl/fixes.
> base-commit: 8441c7d3bd6c5a52ab2ecf77e43a5bf262004f5c
>
> Fixes: 52cc48ad2a76 ("cxl/hdm: Limit emulation to the number of range registers")
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/cxl/core/hdm.c | 25 +++++++++----------------
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index eb5a3a7640c6..a0718cbcc355 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -94,7 +94,6 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> struct cxl_hdm *cxlhdm;
> void __iomem *hdm;
> u32 ctrl;
> - int i;
>
> if (!info)
> return false;
> @@ -113,22 +112,16 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> return false;
>
> /*
> - * If any decoders are committed already, there should not be any
> - * emulated DVSEC decoders.
> + * If HDM decoders are globally enabled, do not fall back to DVSEC
> + * range emulation. Zeroed decoder registers after region teardown
> + * do not imply absence of HDM capability.
> + *
> + * Falling back to DVSEC here would treat the decoder as AUTO and
> + * may incorrectly latch default interleave settings.
> */
> - for (i = 0; i < cxlhdm->decoder_count; i++) {
> - ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
> - dev_dbg(&info->port->dev,
> - "decoder%d.%d: committed: %ld base: %#x_%.8x size: %#x_%.8x\n",
> - info->port->id, i,
> - FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl),
> - readl(hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i)),
> - readl(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i)));
> - if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl))
> - return false;
> - }
> + ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
> + if (ctrl & CXL_HDM_DECODER_ENABLE)
> + return false;
>
> return true;
> }
> --
> 2.17.1
>
On 2/12/2026 8:32 PM, Alison Schofield wrote:
> On Thu, Feb 12, 2026 at 10:38:00PM +0000, Smita Koralahalli wrote:
>> After destroy-region, cxl_region_decode_reset() clears the HDM decoder
>> registers (base/size/commit). If the memdev is subsequently bounced
>> (disable/enable), port probe re-evaluates decoder capability via
>> should_emulate_decoders().
>>
>> The existing logic checks each decoder's COMMITTED bit. Since those bits
>> are cleared by region teardown, should_emulate_decoders() incorrectly
>> falls back to DVSEC range emulation, even though HDM capability is still
>> present.
>>
>> DVSEC fallback marks the endpoint decoder as AUTO, which triggers
>> cxl_add_to_region() -> construct_region(). That path copies the default
>> interleave_granularity (4096) into the region parameters. The resulting
>> spurious autodiscovered region consumes the CFMWS HPA space and causes a
>> subsequent create-region to fail in hpa_alloc().
>>
>> Use the global CXL_HDM_DECODER_ENABLE bit instead of per-decoder COMMITTED
>> bits to detect HDM capability. If the HDM decoder block is enabled, zeroed
>> registers indicate teardown, not absence of HDM support. This prevents the
>> unintended DVSEC fallback and subsequent region creation failure.
>
> Nice find Smita!
>
> I tried this out following your recipe w an auto region:
> disable/destroy the region then disable/enable one memdev.
>
> There was a problem before the patch that went away after the patch,
> but the signature was different. In my case, the endpoint tried and
> failed to begin construction on a new auto region yet still blocked
> recreation of the original region because it consumed an endpoint
> decoder, not the HPA space.
>
> Memdev enable led to this:
> [] cxl_pci 0000:da:00.0: mem3:decoder12.0 no CXL window for range 0x0:0x1fffffffff
> which comes from cxl_add_to_region()->get_cxl_root_decoder()
>
> And then the original region fails to recreate with:
> cxl region: cxl_memdev_find_decoder: could not get a free decoder for mem3
>
> Did you see any of that or totally different messaging?
>
> It would be nice to confirm any varietals here and add useful signatures
> to the commit log to help with searches.
>
> -- Alison
>
Thanks for testing! My error signature is different from yours.
In my case, the DVSEC fallback triggers construct_region() which latches
the default IG (4096) into the region params. This causes an IG mismatch
during target setup.
After cxl enable-memdev:
[] should_emulate_decoders: cxl_port endpoint6: decoder6.0: committed: 0
base: 0x0_00000000 size: 0x0_00000000
[] devm_cxl_setup_hdm: cxl_port endpoint6: Fallback map 1 range register
[] add_hdm_decoder: cxl_mem mem1: decoder6.0 added to endpoint6
[] devm_cxl_add_region: cxl_acpi ACPI0017:00: decoder0.0: created region0
[] __construct_region: cxl_pci 0000:e1:00.0: mem1:decoder6.0:
__construct_region region0 res: [mem 0x850000000-0x284fffffff flags
0x200] iw: 1 ig: 4096
[] cxl region0: pci0000:e0:port1 cxl_port_setup_targets expected iw: 1
ig: 4096 [mem 0x850000000-0x284fffffff flags 0x200]
[] cxl region0: pci0000:e0:port1 cxl_port_setup_targets got iw: 1 ig:
256 state: disabled 0x850000000:0x284fffffff
[] cxl_port endpoint6: failed to attach decoder6.0 to region0: -6
The spurious auto-discovered region then consumes the CFMWS HPA space,
so the subsequent create-region fails with:
[] devm_cxl_add_region: cxl_acpi ACPI0017:00: decoder0.0: created region4
[] alloc_hpa: cxl region4: HPA allocation error (-34) for
size:0x0000002000000000 in CXL Window 0 [mem 0x850000000-0x284fffffff
flags 0x200]
Both go away with the global CXL_HDM_DECODER_ENABLE check.
I will add the signatures to the commit message in v2.
Thanks
Smita
>>
>> Based on cxl/fixes.
>> base-commit: 8441c7d3bd6c5a52ab2ecf77e43a5bf262004f5c
>>
>> Fixes: 52cc48ad2a76 ("cxl/hdm: Limit emulation to the number of range registers")
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>> drivers/cxl/core/hdm.c | 25 +++++++++----------------
>> 1 file changed, 9 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>> index eb5a3a7640c6..a0718cbcc355 100644
>> --- a/drivers/cxl/core/hdm.c
>> +++ b/drivers/cxl/core/hdm.c
>> @@ -94,7 +94,6 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
>> struct cxl_hdm *cxlhdm;
>> void __iomem *hdm;
>> u32 ctrl;
>> - int i;
>>
>> if (!info)
>> return false;
>> @@ -113,22 +112,16 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
>> return false;
>>
>> /*
>> - * If any decoders are committed already, there should not be any
>> - * emulated DVSEC decoders.
>> + * If HDM decoders are globally enabled, do not fall back to DVSEC
>> + * range emulation. Zeroed decoder registers after region teardown
>> + * do not imply absence of HDM capability.
>> + *
>> + * Falling back to DVSEC here would treat the decoder as AUTO and
>> + * may incorrectly latch default interleave settings.
>> */
>> - for (i = 0; i < cxlhdm->decoder_count; i++) {
>> - ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
>> - dev_dbg(&info->port->dev,
>> - "decoder%d.%d: committed: %ld base: %#x_%.8x size: %#x_%.8x\n",
>> - info->port->id, i,
>> - FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl),
>> - readl(hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i)),
>> - readl(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(i)),
>> - readl(hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i)),
>> - readl(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i)));
>> - if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl))
>> - return false;
>> - }
>> + ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
>> + if (ctrl & CXL_HDM_DECODER_ENABLE)
>> + return false;
>>
>> return true;
>> }
>> --
>> 2.17.1
>>
© 2016 - 2026 Red Hat, Inc.