[PATCH v2] cxl: core/region - ignore interleave granularity when ways=1

Gregory Price posted 1 patch 1 day, 13 hours ago
drivers/cxl/core/region.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH v2] cxl: core/region - ignore interleave granularity when ways=1
Posted by Gregory Price 1 day, 13 hours ago
When validating decoder IW/IG when setting up regions, the granularity
is irrelevant when iw=1 - all accesses will always route to the only
target anyway - so all ig values are "correct". Loosen the requirement
that `ig = (parent_iw * parent_ig)` when iw=1.

On some Zen5 platforms, the platform BIOS specifies a 256-byte
interleave granularity window for host bridges when there is only
one target downstream.  This leads to Linux rejecting the configuration
of a region with a x2 root with two x1 hostbridges.

Decoder Programming:
   root - iw:2 ig:256
   hb1  - iw:1 ig:256  (Linux expects 512)
   hb2  - iw:1 ig:256  (Linux expects 512)
   ep1  - iw:2 ig:256
   ep2  - iw:2 ig:256

This change allows all decoders downstream of a passthrough decoder to
also be configured as passthrough (iw:1 ig:X), but still disallows
downstream decoders from applying subsequent interleaves.

e.g. in the above example if there was another decoder south of hb1
attempting to interleave 2 endpoints - Linux would enforce hb1.ig=512
because the southern decoder would have iw:2 and require ig=pig*piw.

Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/cxl/core/region.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 04bc6cad092c..dec262eadf9a 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1553,7 +1553,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
 
 	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
 		if (cxld->interleave_ways != iw ||
-		    cxld->interleave_granularity != ig ||
+		    (iw > 1 && cxld->interleave_granularity != ig) ||
 		    cxled->spa_range.start != p->res->start ||
 		    cxled->spa_range.end != p->res->end ||
 		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
-- 
2.47.1
Re: [PATCH v2] cxl: core/region - ignore interleave granularity when ways=1
Posted by Dan Williams 1 day, 13 hours ago
Gregory Price wrote:
> When validating decoder IW/IG when setting up regions, the granularity
> is irrelevant when iw=1 - all accesses will always route to the only
> target anyway - so all ig values are "correct". Loosen the requirement
> that `ig = (parent_iw * parent_ig)` when iw=1.
> 
> On some Zen5 platforms, the platform BIOS specifies a 256-byte
> interleave granularity window for host bridges when there is only
> one target downstream.  This leads to Linux rejecting the configuration
> of a region with a x2 root with two x1 hostbridges.
> 
> Decoder Programming:
>    root - iw:2 ig:256
>    hb1  - iw:1 ig:256  (Linux expects 512)
>    hb2  - iw:1 ig:256  (Linux expects 512)
>    ep1  - iw:2 ig:256
>    ep2  - iw:2 ig:256
> 
> This change allows all decoders downstream of a passthrough decoder to
> also be configured as passthrough (iw:1 ig:X), but still disallows
> downstream decoders from applying subsequent interleaves.
> 
> e.g. in the above example if there was another decoder south of hb1
> attempting to interleave 2 endpoints - Linux would enforce hb1.ig=512
> because the southern decoder would have iw:2 and require ig=pig*piw.
> 
> Signed-off-by: Gregory Price <gourry@gourry.net>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>

Thanks for the respin!

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Re: [PATCH v2] cxl: core/region - ignore interleave granularity when ways=1
Posted by Zhijian Li (Fujitsu) 1 day, 12 hours ago
Hi Gregory and CXL community
Cc Goto-san

Wow, our platform has encountered a similar issue, and I am intending to consult
the community regarding this matter.

I drafted similar patch locally, but I wonder if we should "ignore" the IG
or "program" the IG to the decoder.

Let me post the mail(question) from my drafts in your thread(I hope you I hope you won't mind).
======================================
[Question] granularity is a don't care if not interleaving?
I saw this sentence " granularity is a don't care if not interleaving" in this
patch "[ndctl PATCH 2/6] cxl/list: Add interleave parameters to decoder listings" [1]

This reminds me that our platform programed an unmatched interleave_granularity in HDM decoders
between endpoint and the host-bridge, see below:

                               CXL  Root
             CFMW0           /           \      CFMW1
            decoder0.0                        decoder0.1
       128 GiB       IW:1  IG:256           IW:1  IG:256      128 GiB
                              \           /
                               Host Bridge
                              /           \
                     decoder5.0           decoder5.1
                  IW:1  IG:256            IW:1  IG:256
	      /                                  \
          Endpoint9                           Endpoint10
               |                                   |
          decoder9.0                           decoder10.0
         IW:1 IG:1024                          IW:1 IG:1024

With this setup, the Linux kernel attempts to create regions for Endpoint9 and Endpoint10
but fails because the endpoint decoders’ interleave granularity (IG=1024) does not
match their parent decoders’ IG (256). Ideally, the endpoint decoders are expected to be
configured for IG=256.

Currently, we learned that we have only special handling for the root decoders [2][3].

My question are:
Q1: whether "granularity is a don't care if not interleaving" is applied to
all HDM decoders(including root decoder and HDM decoder)

In current cxl cli , it will not show any interleave_granularity at all when ways==1(no-interleaving)
$ cxl list -PDE | grep granularity  # show nothing when ways==1

Per the CXL Spec r3.1
IG: "The number of consecutive bytes that are assigned to each target in the Target List."
Q2: Does this imply a configuration where the number of ways>1?

Q3: Does the IG also represent the device's capabilities? When programming, should one also
consider whether the device supports it?


If "granularity is a don't care if not interleaving" is true, how about below changes

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 75cd5dbb41e4..647fe2ce18ca 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1435,6 +1435,11 @@ static int cxl_port_setup_targets(struct cxl_port *port,
+               if (cxld->interleave_ways == 1 && cxld->interleave_granularity != ig) {
+                       cxld->interleave_granularity = ig;
+                       /* program HDM decoder */
+                       ...
+               }
                 if (cxld->interleave_ways != iw ||
                     cxld->interleave_granularity != ig ||
                     cxld->hpa_range.start != p->res->start ||



[1] https://lore.kernel.org/all/165973188300.1528532.222988685552982872.stgit@dwillia2-xfh.jf.intel.com/
[2] https://lore.kernel.org/all/165853776917.2430596.16823264262010844458.stgit@dwillia2-xfh.jf.intel.com/
[3] https://lore.kernel.org/all/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in/

Thanks
Zhijian

On 03/04/2025 07:25, Gregory Price wrote:
> When validating decoder IW/IG when setting up regions, the granularity
> is irrelevant when iw=1 - all accesses will always route to the only
> target anyway - so all ig values are "correct". Loosen the requirement
> that `ig = (parent_iw * parent_ig)` when iw=1.
> 
> On some Zen5 platforms, the platform BIOS specifies a 256-byte
> interleave granularity window for host bridges when there is only
> one target downstream.  This leads to Linux rejecting the configuration
> of a region with a x2 root with two x1 hostbridges.
> 
> Decoder Programming:
>     root - iw:2 ig:256
>     hb1  - iw:1 ig:256  (Linux expects 512)
>     hb2  - iw:1 ig:256  (Linux expects 512)
>     ep1  - iw:2 ig:256
>     ep2  - iw:2 ig:256
> 
> This change allows all decoders downstream of a passthrough decoder to
> also be configured as passthrough (iw:1 ig:X), but still disallows
> downstream decoders from applying subsequent interleaves.
> 
> e.g. in the above example if there was another decoder south of hb1
> attempting to interleave 2 endpoints - Linux would enforce hb1.ig=512
> because the southern decoder would have iw:2 and require ig=pig*piw.
> 
> Signed-off-by: Gregory Price <gourry@gourry.net>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>   drivers/cxl/core/region.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 04bc6cad092c..dec262eadf9a 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1553,7 +1553,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>   
>   	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
>   		if (cxld->interleave_ways != iw ||
> -		    cxld->interleave_granularity != ig ||
> +		    (iw > 1 && cxld->interleave_granularity != ig) ||
>   		    cxled->spa_range.start != p->res->start ||
>   		    cxled->spa_range.end != p->res->end ||
>   		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
Re: [PATCH v2] cxl: core/region - ignore interleave granularity when ways=1
Posted by Dan Williams 1 day, 8 hours ago
Zhijian Li (Fujitsu) wrote:
> Hi Gregory and CXL community
> Cc Goto-san
> 
> Wow, our platform has encountered a similar issue, and I am intending to consult
> the community regarding this matter.
> 
> I drafted similar patch locally, but I wonder if we should "ignore" the IG
> or "program" the IG to the decoder.
> 
> Let me post the mail(question) from my drafts in your thread(I hope you I hope you won't mind).
> ======================================
> [Question] granularity is a don't care if not interleaving?
> I saw this sentence " granularity is a don't care if not interleaving" in this
> patch "[ndctl PATCH 2/6] cxl/list: Add interleave parameters to decoder listings" [1]
> 
> This reminds me that our platform programed an unmatched interleave_granularity in HDM decoders
> between endpoint and the host-bridge, see below:
> 
>                                CXL  Root
>              CFMW0           /           \      CFMW1
>             decoder0.0                        decoder0.1
>        128 GiB       IW:1  IG:256           IW:1  IG:256      128 GiB
>                               \           /
>                                Host Bridge
>                               /           \
>                      decoder5.0           decoder5.1
>                   IW:1  IG:256            IW:1  IG:256
> 	      /                                  \
>           Endpoint9                           Endpoint10
>                |                                   |
>           decoder9.0                           decoder10.0
>          IW:1 IG:1024                          IW:1 IG:1024

Why 1024? Yes, the value does not matter, but attempting 1024 feels more
like a unit test than a production use case.

> 
> With this setup, the Linux kernel attempts to create regions for Endpoint9 and Endpoint10
> but fails because the endpoint decoders’ interleave granularity (IG=1024) does not
> match their parent decoders’ IG (256). Ideally, the endpoint decoders are expected to be
> configured for IG=256.
> 
> Currently, we learned that we have only special handling for the root decoders [2][3].
> 
> My question are:
> Q1: whether "granularity is a don't care if not interleaving" is applied to
> all HDM decoders(including root decoder and HDM decoder)

All decoders.

> In current cxl cli , it will not show any interleave_granularity at all when ways==1(no-interleaving)
> $ cxl list -PDE | grep granularity  # show nothing when ways==1

Right, because the value theoretically has no functional impact in the
ways==1 case. However, it errantly ends up having practical impact in
these corners cases where code performs granularity comparisons without
considering that ways may be 1.

> Per the CXL Spec r3.1
> IG: "The number of consecutive bytes that are assigned to each target in the Target List."
> Q2: Does this imply a configuration where the number of ways>1?

Right, the granularity is the boundary at which the decoder switches to
the next target in the target list. When ways=1 granularity can be
infinity or zero by that definition.

> Q3: Does the IG also represent the device's capabilities? When programming, should one also
> consider whether the device supports it?

Yes, see bits [9:8] in the CXL HDM Decoder Capability Register (CXL 3.2
8.2.4.20.1). So even though the math should not matter, I would still
expect the driver to try to be careful to make sure that IG+8 is less
than the address-bit max.

> If "granularity is a don't care if not interleaving" is true, how about below changes

Part of me says, "yes, that should be ok", another part of me says "what
is the practical benefit of allowing any granularity to be specified?".

So the fix from Gregory is limited to the case of "whoops, the platform
BIOS thought this was a good idea even though it does not matter in
practice, teach Linux to be lenient in this case.".

The proposal to accept that in all case allows user-created regions to
have odd large granularity sizes in the iw=1 case, and I am skeptical
it is worth supporting that now.
Re: [PATCH v2] cxl: core/region - ignore interleave granularity when ways=1
Posted by Zhijian Li (Fujitsu) 1 day, 4 hours ago
Hi Dan

I am grateful for your prompt response.


On 03/04/2025 12:42, Dan Williams wrote:
> Zhijian Li (Fujitsu) wrote:
>> Hi Gregory and CXL community
>> Cc Goto-san
>>
>>                 |                                   |
>>            decoder9.0                           decoder10.0
>>           IW:1 IG:1024                          IW:1 IG:1024
> 
> Why 1024? Yes, the value does not matter, but attempting 1024 feels more
> like a unit test than a production use case.

I am uncertain, it appears to be this way when we get the device.
I presume it should not be a side effect in no-interleaving case.


Thank you for your answers to these questions. Your reply has truly
cleared up my confusion. Once again, thank you!

Thanks
Zhijian

>>
>> My question are:
>> Q1: whether "granularity is a don't care if not interleaving" is applied to
>> all HDM decoders(including root decoder and HDM decoder)
> 
> All decoders.> 
>> In current cxl cli , it will not show any interleave_granularity at all when ways==1(no-interleaving)
>> $ cxl list -PDE | grep granularity  # show nothing when ways==1
> 
> Right, because the value theoretically has no functional impact in the
> ways==1 case. However, it errantly ends up having practical impact in
> these corners cases where code performs granularity comparisons without
> considering that ways may be 1.
> 
>> Per the CXL Spec r3.1
>> IG: "The number of consecutive bytes that are assigned to each target in the Target List."
>> Q2: Does this imply a configuration where the number of ways>1?
> 
> Right, the granularity is the boundary at which the decoder switches to
> the next target in the target list. When ways=1 granularity can be
> infinity or zero by that definition.
> 
>> Q3: Does the IG also represent the device's capabilities? When programming, should one also
>> consider whether the device supports it?
> 
> Yes, see bits [9:8] in the CXL HDM Decoder Capability Register (CXL 3.2
> 8.2.4.20.1). So even though the math should not matter, I would still
> expect the driver to try to be careful to make sure that IG+8 is less
> than the address-bit max.

> 
>> If "granularity is a don't care if not interleaving" is true, how about below changes
> 
> Part of me says, "yes, that should be ok", another part of me says "what
> is the practical benefit of allowing any granularity to be specified?".
> 
> So the fix from Gregory is limited to the case of "whoops, the platform
> BIOS thought this was a good idea even though it does not matter in
> practice, teach Linux to be lenient in this case.".
> 
> The proposal to accept that in all case allows user-created regions to
> have odd large granularity sizes in the iw=1 case, and I am skeptical
> it is worth supporting that now.