Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
instructed program_hpx_type2() to set the RCB in an endpoint,
although it's RC did not have the RCB bit set.
e42010d8207f fixed that by qualifying the setting of the RCB in the
endpoint with the RC supporting an 128 byte RCB.
In retrospect, the program_hpx_type2() should only modify the AER
bits, and stay away from fiddling with the Link Control Register.
Hence, we explicitly program the RCB from pci_configure_device().
According to PCIe r7.0, sec 7.5.3.7, RCB is only valid for Root Ports
(where it is Read-Only), Bridges, and Endpoints. The bit is 'RsvdP'
for Virtual Functions. Hence, for other cases than Bridges and Physical
Endpoints, we bail out early from pci_configure_rcb().
If the Root Port's RCB cannot be determined, we do nothing.
If RCB is set in the Root Port and not in the device, we set it. If it
is set in the device but not in the Root Port, we print an info
message and reset it.
Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
Note, that the current duplication of pcie_root_rcb_set() will be
removed in the next commit.
v2 -> v3:
* Qualified the device types more strictly
* s/pcie_root_rcb_set/pcie_read_root_rcb/ and changed signature
* Do nothing if the RP's RCB cannot be determined
* Reset the device's RCB if not set in the RP
---
drivers/pci/probe.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 41183aed8f5d9..7165ac4065c97 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2410,6 +2410,58 @@ static void pci_configure_serr(struct pci_dev *dev)
}
}
+static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
+{
+ struct pci_dev *rp = pcie_find_root_port(dev);
+ u16 lnkctl;
+
+ if (!rp)
+ return false;
+
+ pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
+
+ *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
+ return true;
+}
+
+static void pci_configure_rcb(struct pci_dev *dev)
+{
+ u16 lnkctl;
+ bool rcb;
+
+ /*
+ * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
+ * Ports (where it is read-only), Endpoints, and Bridges. It
+ * may only be set for Endpoints and Bridges if it is set in
+ * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
+ * Functions. If the Root Port's RCB cannot be determined, we
+ * bail out.
+ */
+ if (!pci_is_pcie(dev) ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
+ dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
+ return;
+
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+ if (rcb) {
+ if (lnkctl & PCI_EXP_LNKCTL_RCB)
+ return;
+
+ lnkctl |= PCI_EXP_LNKCTL_RCB;
+ } else {
+ if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
+ return;
+
+ pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
+ lnkctl &= ~PCI_EXP_LNKCTL_RCB;
+ }
+
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
+}
+
static void pci_configure_device(struct pci_dev *dev)
{
pci_configure_mps(dev);
@@ -2419,6 +2471,7 @@ static void pci_configure_device(struct pci_dev *dev)
pci_configure_aspm_l1ss(dev);
pci_configure_eetlp_prefix(dev);
pci_configure_serr(dev);
+ pci_configure_rcb(dev);
pci_acpi_program_hp_params(dev);
}
--
2.43.5
On Thu, Jan 22, 2026 at 02:09:53PM +0100, Håkon Bugge wrote:
> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> instructed program_hpx_type2() to set the RCB in an endpoint,
> although it's RC did not have the RCB bit set.
> ...
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> + if (rcb) {
> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
> + return;
> +
> + lnkctl |= PCI_EXP_LNKCTL_RCB;
> + } else {
> + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> + return;
> +
> + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
I know I suggested all this code and the message, but I'm not sure
it's worth it. If the device doesn't work, that will be obvious
anyway, so this all feels over-engineered.
> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> + }
> +
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> +}
What do you think about this?
PCI: Initialize RCB from pci_configure_device()
Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root
Port supports it (_HPX)") worked around a bogus _HPX type 2 record, which
caused program_hpx_type2() to set the RCB in an endpoint even though the
Root Port did not have the RCB bit set.
e42010d8207f fixed that by setting the RCB in the endpoint only when it was
set in the Root Port.
In retrospect, program_hpx_type2() is intended for AER-related settings,
and the RCB should be configured elsewhere so it doesn't depend on the
presence or contents of an _HPX record.
Explicitly program the RCB from pci_configure_device() so it matches the
Root Port's RCB. The Root Port may not be visible to virtualized guests;
in that case, leave RCB alone.
Fixes: e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 41183aed8f5d..8571c7c6e1a0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2410,6 +2410,37 @@ static void pci_configure_serr(struct pci_dev *dev)
}
}
+static void pci_configure_rcb(struct pci_dev *dev)
+{
+ struct pci_dev *rp;
+ u16 rp_lnkctl;
+
+ /*
+ * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
+ * (where it is read-only), Endpoints, and Bridges. It may only be
+ * set for Endpoints and Bridges if it is set in the Root Port. For
+ * Endpoints, it is 'RsvdP' for Virtual Functions.
+ */
+ if (!pci_is_pcie(dev) ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
+ dev->is_virtfn)
+ return;
+
+ /* Root Port often not visible to virtualized guests */
+ rp = pcie_find_root_port(dev);
+ if (!rp)
+ return;
+
+ pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &rp_lnkctl);
+ pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
+ PCI_EXP_LNKCTL_RCB,
+ (rp_lnkctl & PCI_EXP_LNKCTL_RCB) ?
+ PCI_EXP_LNKCTL_RCB : 0);
+}
+
static void pci_configure_device(struct pci_dev *dev)
{
pci_configure_mps(dev);
@@ -2419,6 +2450,7 @@ static void pci_configure_device(struct pci_dev *dev)
pci_configure_aspm_l1ss(dev);
pci_configure_eetlp_prefix(dev);
pci_configure_serr(dev);
+ pci_configure_rcb(dev);
pci_acpi_program_hp_params(dev);
}
> On Thu, Jan 22, 2026 at 02:09:53PM +0100, Håkon Bugge wrote:
>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>> instructed program_hpx_type2() to set the RCB in an endpoint,
>> although it's RC did not have the RCB bit set.
>> ...
>>
>> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>> + if (rcb) {
>> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
>> + return;
>> +
>> + lnkctl |= PCI_EXP_LNKCTL_RCB;
>> + } else {
>> + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>> + return;
>> +
>> + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>
> I know I suggested all this code and the message, but I'm not sure
> it's worth it. If the device doesn't work,
I see your point. If the situation my commit is fixing has been there, the system would have failed, and a fix to the firmware must have been applied. Hence, so need to fix it in the OS.
> that will be obvious
> anyway, so this all feels over-engineered.
>
>> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>> + }
>> +
>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>> +}
>
> What do you think about this?
>
> PCI: Initialize RCB from pci_configure_device()
>
> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root
> Port supports it (_HPX)") worked around a bogus _HPX type 2 record, which
> caused program_hpx_type2() to set the RCB in an endpoint even though the
> Root Port did not have the RCB bit set.
>
> e42010d8207f fixed that by setting the RCB in the endpoint only when it was
> set in the Root Port.
>
> In retrospect, program_hpx_type2() is intended for AER-related settings,
> and the RCB should be configured elsewhere so it doesn't depend on the
> presence or contents of an _HPX record.
>
> Explicitly program the RCB from pci_configure_device() so it matches the
> Root Port's RCB. The Root Port may not be visible to virtualized guests;
> in that case, leave RCB alone.
>
> Fixes: e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
Crisp and clear. For this and the other commit, is it OK that I add you as a co-developer? Aka:
Co-developed-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
?
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 41183aed8f5d..8571c7c6e1a0 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2410,6 +2410,37 @@ static void pci_configure_serr(struct pci_dev *dev)
> }
> }
>
> +static void pci_configure_rcb(struct pci_dev *dev)
> +{
> + struct pci_dev *rp;
> + u16 rp_lnkctl;
> +
> + /*
> + * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
> + * (where it is read-only), Endpoints, and Bridges. It may only be
> + * set for Endpoints and Bridges if it is set in the Root Port. For
> + * Endpoints, it is 'RsvdP' for Virtual Functions.
> + */
> + if (!pci_is_pcie(dev) ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
> + dev->is_virtfn)
> + return;
> +
> + /* Root Port often not visible to virtualized guests */
> + rp = pcie_find_root_port(dev);
> + if (!rp)
> + return;
> +
> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &rp_lnkctl);
> + pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> + PCI_EXP_LNKCTL_RCB,
> + (rp_lnkctl & PCI_EXP_LNKCTL_RCB) ?
> + PCI_EXP_LNKCTL_RCB : 0);
Looks good to me! This will enforce the locked flavour of pcie_capability_clear_and_set_word(). Is that an overkill?
Again, thank for the effort you put into this, ,Bjorn!
Thxs, Håkon
> +}
> +
> static void pci_configure_device(struct pci_dev *dev)
> {
> pci_configure_mps(dev);
> @@ -2419,6 +2450,7 @@ static void pci_configure_device(struct pci_dev *dev)
> pci_configure_aspm_l1ss(dev);
> pci_configure_eetlp_prefix(dev);
> pci_configure_serr(dev);
> + pci_configure_rcb(dev);
>
> pci_acpi_program_hp_params(dev);
> }
On Wed, Jan 28, 2026 at 05:08:23PM +0000, Haakon Bugge wrote:
> > On Thu, Jan 22, 2026 at 02:09:53PM +0100, Håkon Bugge wrote:
> >> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> >> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> >> instructed program_hpx_type2() to set the RCB in an endpoint,
> >> although it's RC did not have the RCB bit set.
> >> ...
> For this and the other commit, is it OK that I add
> you as a co-developer? Aka:
>
> Co-developed-by: Bjorn Helgaas <helgaas@kernel.org>
> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
No need for co-developed-by, IMO this is just part of normal patch
iteration. And I'll add my Signed-off-by when merging the patches.
> > + pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> > + PCI_EXP_LNKCTL_RCB,
> > + (rp_lnkctl & PCI_EXP_LNKCTL_RCB) ?
> > + PCI_EXP_LNKCTL_RCB : 0);
>
> Looks good to me! This will enforce the locked flavour of
> pcie_capability_clear_and_set_word(). Is that an overkill?
Probably no need for locking in this instance, but it's a per-device
lock so there won't be any contention anyway.
Bjorn
On Thu, 2026-01-22 at 14:09 +0100, Håkon Bugge wrote:
> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> instructed program_hpx_type2() to set the RCB in an endpoint,
> although it's RC did not have the RCB bit set.
>
--- snip ---
>
> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
> +{
> + struct pci_dev *rp = pcie_find_root_port(dev);
> + u16 lnkctl;
> +
> + if (!rp)
> + return false;
> +
> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +
> + *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> + return true;
> +}
In principle this is one of these things where platforms like s390
where the root port is hidden (only s390?) might want a hook to use
firmware supplied information to determine if the hidden root port
supports the setting. I wonder if this would make sense as a __weak
pcibios_read_rcb_supported() function. Not saying we need this now,
just thinking out loud.
> +
> +static void pci_configure_rcb(struct pci_dev *dev)
> +{
> + u16 lnkctl;
> + bool rcb;
> +
> + /*
> + * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
> + * Ports (where it is read-only), Endpoints, and Bridges. It
> + * may only be set for Endpoints and Bridges if it is set in
> + * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
> + * Functions. If the Root Port's RCB cannot be determined, we
> + * bail out.
> + */
> + if (!pci_is_pcie(dev) ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
> + dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
> + return;
This solves the concern Bjorn had for hidden root ports in VMs and I
confirmed that the check correctly bails on s390 even for PFs. Thanks!
> +
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> + if (rcb) {
> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
> + return;
> +
> + lnkctl |= PCI_EXP_LNKCTL_RCB;
> + } else {
> + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> + return;
> +
> + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
The FW_INFO in here seems to be a remnant from ACPI code. As far as I
know this isn't usually used in core PCI code and seems conceptually
misleading to me since this doesn't come out of ACPI or other firmware
code.
> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> + }
> +
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
I do see Ilpo's point about pcie_capability_clear_and_set_word() and
agree that it would look cleaner. For good measure I tested with that
variant too and do prefer it, especially since it also gets rid of the
lnkctl variable. On ther other hand the message might help identify
weird firmware behavior. So no strong preference from my side.
> +}
> +
> static void pci_configure_device(struct pci_dev *dev)
> {
> pci_configure_mps(dev);
> @@ -2419,6 +2471,7 @@ static void pci_configure_device(struct pci_dev *dev)
> pci_configure_aspm_l1ss(dev);
> pci_configure_eetlp_prefix(dev);
> pci_configure_serr(dev);
> + pci_configure_rcb(dev);
>
> pci_acpi_program_hp_params(dev);
> }
I tested that this patch series does not create problems on s390 and
would leave the RCB setting as is if our firmware set it. Since the
second patch doesn't touch code that is build on s390 I think the
Tested-by only makes sense for this one.
So feel free to add my:
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Furthermore with either the FW_INFO dropped or Ilpo's suggestion
incorporated feel free to also add:
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> On Thu, 2026-01-22 at 14:09 +0100, Håkon Bugge wrote:
>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>> instructed program_hpx_type2() to set the RCB in an endpoint,
>> although it's RC did not have the RCB bit set.
>>
>> --- snip ---
>>
>> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
>> +{
>> + struct pci_dev *rp = pcie_find_root_port(dev);
>> + u16 lnkctl;
>> +
>> + if (!rp)
>> + return false;
>> +
>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>> +
>> + *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>> + return true;
>> +}
>
> In principle this is one of these things where platforms like s390
> where the root port is hidden (only s390?) might want a hook to use
> firmware supplied information to determine if the hidden root port
> supports the setting. I wonder if this would make sense as a __weak
> pcibios_read_rcb_supported() function. Not saying we need this now,
> just thinking out loud.
That may be a good idea. But I am not quite sure how we can find the Root Port from an "orphan" bridge or endpoint through the pci_bios_read set of interfaces.
>> +
>> +static void pci_configure_rcb(struct pci_dev *dev)
>> +{
>> + u16 lnkctl;
>> + bool rcb;
>> +
>> + /*
>> + * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
>> + * Ports (where it is read-only), Endpoints, and Bridges. It
>> + * may only be set for Endpoints and Bridges if it is set in
>> + * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
>> + * Functions. If the Root Port's RCB cannot be determined, we
>> + * bail out.
>> + */
>> + if (!pci_is_pcie(dev) ||
>> + pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>> + pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
>> + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>> + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
>> + dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
>> + return;
>
> This solves the concern Bjorn had for hidden root ports in VMs and I
> confirmed that the check correctly bails on s390 even for PFs. Thanks!
Thanks for confirming!
>> +
>> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>> + if (rcb) {
>> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
>> + return;
>> +
>> + lnkctl |= PCI_EXP_LNKCTL_RCB;
>> + } else {
>> + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>> + return;
>> +
>> + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>
> The FW_INFO in here seems to be a remnant from ACPI code. As far as I
> know this isn't usually used in core PCI code and seems conceptually
> misleading to me since this doesn't come out of ACPI or other firmware
> code.
I humbly disagree. As per PCIe r7.0, sec 7.5.3.7: "Default value of this bit is 0b". So, if we find it set, and it is not set in the Root Port, to me, it must be a firmware bug. And that is exactly what FW_INFO is intended for: "Introduce FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs" (commit a0ad05c75aa3).
>> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>> + }
>> +
>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>
> I do see Ilpo's point about pcie_capability_clear_and_set_word() and
> agree that it would look cleaner. For good measure I tested with that
> variant too and do prefer it, especially since it also gets rid of the
> lnkctl variable. On ther other hand the message might help identify
> weird firmware behavior. So no strong preference from my side.
OK.
>> +}
>> +
>> static void pci_configure_device(struct pci_dev *dev)
>> {
>> pci_configure_mps(dev);
>> @@ -2419,6 +2471,7 @@ static void pci_configure_device(struct pci_dev *dev)
>> pci_configure_aspm_l1ss(dev);
>> pci_configure_eetlp_prefix(dev);
>> pci_configure_serr(dev);
>> + pci_configure_rcb(dev);
>>
>> pci_acpi_program_hp_params(dev);
>> }
>
> I tested that this patch series does not create problems on s390 and
> would leave the RCB setting as is if our firmware set it. Since the
> second patch doesn't touch code that is build on s390 I think the
> Tested-by only makes sense for this one.
>
> So feel free to add my:
>
> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Thanks a lot for the testing!
> Furthermore with either the FW_INFO dropped or Ilpo's suggestion
> incorporated feel free to also add:
>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Thanks for the conditional r-b. I'll like Bjorn to chime in here, as he introduced FW_INFO and did not object to the existing (non Ilpo variant).
Thxs, Håkon
On Fri, 2026-01-23 at 17:38 +0000, Haakon Bugge wrote:
> > > On Thu, 2026-01-22 at 14:09 +0100, Håkon Bugge wrote:
> > > Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> > > Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> > > instructed program_hpx_type2() to set the RCB in an endpoint,
> > > although it's RC did not have the RCB bit set.
> > >
> > > --- snip ---
> > >
> > > +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
> > > +{
> > > + struct pci_dev *rp = pcie_find_root_port(dev);
> > > + u16 lnkctl;
> > > +
> > > + if (!rp)
> > > + return false;
> > > +
> > > + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> > > +
> > > + *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> > > + return true;
> > > +}
> >
> > In principle this is one of these things where platforms like s390
> > where the root port is hidden (only s390?) might want a hook to use
> > firmware supplied information to determine if the hidden root port
> > supports the setting. I wonder if this would make sense as a __weak
> > pcibios_read_rcb_supported() function. Not saying we need this now,
> > just thinking out loud.
>
> That may be a good idea. But I am not quite sure how we can find the Root Port from an "orphan" bridge or endpoint through the pci_bios_read set of interfaces.
I only meant to have it as a function in a similar manner to e.g.
pcibios_enable_device() that we could override. But I don't think that
needs or even should be part of this patch as there wouldn't be a user
of the override yet.
>
> > > +
> > > +static void pci_configure_rcb(struct pci_dev *dev)
>
--- snip ---
> > > +
> > > + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> > > + if (rcb) {
> > > + if (lnkctl & PCI_EXP_LNKCTL_RCB)
> > > + return;
> > > +
> > > + lnkctl |= PCI_EXP_LNKCTL_RCB;
> > > + } else {
> > > + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> > > + return;
> > > +
> > > + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> >
> > The FW_INFO in here seems to be a remnant from ACPI code. As far as I
> > know this isn't usually used in core PCI code and seems conceptually
> > misleading to me since this doesn't come out of ACPI or other firmware
> > code.
>
> I humbly disagree. As per PCIe r7.0, sec 7.5.3.7: "Default value of this bit is 0b". So, if we find it set, and it is not set in the Root Port, to me, it must be a firmware bug. And that is exactly what FW_INFO is intended for: "Introduce FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs" (commit a0ad05c75aa3).
Ah thanks for the pointer, I wasn't aware of this explicit "for
reporting FW issues" use. Reading commit a0ad05c75aa3 ("Introduce
FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs")
now I agree this makes sense after all.
>
> > > + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> > > + }
> > > +
> > > + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> >
--- snip ---
> >
> > I tested that this patch series does not create problems on s390 and
> > would leave the RCB setting as is if our firmware set it. Since the
> > second patch doesn't touch code that is build on s390 I think the
> > Tested-by only makes sense for this one.
> >
> > So feel free to add my:
> >
> > Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>
> Thanks a lot for the testing!
>
> > Furthermore with either the FW_INFO dropped or Ilpo's suggestion
> > incorporated feel free to also add:
> >
> > Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>
> Thanks for the conditional r-b. I'll like Bjorn to chime in here, as he introduced FW_INFO and did not object to the existing (non Ilpo variant).
As stated above with the additional explanation this makes sense to me
now so no need for the conditional anymore either ;)
Thanks,
Niklas
> On 23 Jan 2026, at 19:54, Niklas Schnelle <schnelle@linux.ibm.com> wrote:
>
> On Fri, 2026-01-23 at 17:38 +0000, Haakon Bugge wrote:
>>>> On Thu, 2026-01-22 at 14:09 +0100, Håkon Bugge wrote:
>>>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>>>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>>>> instructed program_hpx_type2() to set the RCB in an endpoint,
>>>> although it's RC did not have the RCB bit set.
>>>>
>>>> --- snip ---
>>>>
>>>> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
>>>> +{
>>>> + struct pci_dev *rp = pcie_find_root_port(dev);
>>>> + u16 lnkctl;
>>>> +
>>>> + if (!rp)
>>>> + return false;
>>>> +
>>>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>>>> +
>>>> + *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>>>> + return true;
>>>> +}
>>>
>>> In principle this is one of these things where platforms like s390
>>> where the root port is hidden (only s390?) might want a hook to use
>>> firmware supplied information to determine if the hidden root port
>>> supports the setting. I wonder if this would make sense as a __weak
>>> pcibios_read_rcb_supported() function. Not saying we need this now,
>>> just thinking out loud.
>>
>> That may be a good idea. But I am not quite sure how we can find the Root Port from an "orphan" bridge or endpoint through the pci_bios_read set of interfaces.
>
> I only meant to have it as a function in a similar manner to e.g.
> pcibios_enable_device() that we could override. But I don't think that
> needs or even should be part of this patch as there wouldn't be a user
> of the override yet.
Understood.
[snip]
>>> The FW_INFO in here seems to be a remnant from ACPI code. As far as I
>>> know this isn't usually used in core PCI code and seems conceptually
>>> misleading to me since this doesn't come out of ACPI or other firmware
>>> code.
>>
>> I humbly disagree. As per PCIe r7.0, sec 7.5.3.7: "Default value of this bit is 0b". So, if we find it set, and it is not set in the Root Port, to me, it must be a firmware bug. And that is exactly what FW_INFO is intended for: "Introduce FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs" (commit a0ad05c75aa3).
>
> Ah thanks for the pointer, I wasn't aware of this explicit "for
> reporting FW issues" use. Reading commit a0ad05c75aa3 ("Introduce
> FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs")
> now I agree this makes sense after all.
I must admit, I wasn't either, before I read that commit :-)
>>
>>>> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>>>> + }
>>>> +
>>>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>>>
> --- snip ---
>>>
>>> I tested that this patch series does not create problems on s390 and
>>> would leave the RCB setting as is if our firmware set it. Since the
>>> second patch doesn't touch code that is build on s390 I think the
>>> Tested-by only makes sense for this one.
>>>
>>> So feel free to add my:
>>>
>>> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>
>> Thanks a lot for the testing!
>>
>>> Furthermore with either the FW_INFO dropped or Ilpo's suggestion
>>> incorporated feel free to also add:
>>>
>>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>
>> Thanks for the conditional r-b. I'll like Bjorn to chime in here, as he introduced FW_INFO and did not object to the existing (non Ilpo variant).
>
> As stated above with the additional explanation this makes sense to me
> now so no need for the conditional anymore either ;)
Thanks for both your t-b and r-b on this commit!
Thxs, Håkon
On Thu, 22 Jan 2026, Håkon Bugge wrote:
> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> instructed program_hpx_type2() to set the RCB in an endpoint,
> although it's RC did not have the RCB bit set.
>
> e42010d8207f fixed that by qualifying the setting of the RCB in the
> endpoint with the RC supporting an 128 byte RCB.
>
> In retrospect, the program_hpx_type2() should only modify the AER
> bits, and stay away from fiddling with the Link Control Register.
>
> Hence, we explicitly program the RCB from pci_configure_device().
>
> According to PCIe r7.0, sec 7.5.3.7, RCB is only valid for Root Ports
> (where it is Read-Only), Bridges, and Endpoints. The bit is 'RsvdP'
> for Virtual Functions. Hence, for other cases than Bridges and Physical
> Endpoints, we bail out early from pci_configure_rcb().
>
> If the Root Port's RCB cannot be determined, we do nothing.
>
> If RCB is set in the Root Port and not in the device, we set it. If it
> is set in the device but not in the Root Port, we print an info
> message and reset it.
>
> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>
> ---
>
> Note, that the current duplication of pcie_root_rcb_set() will be
> removed in the next commit.
>
> v2 -> v3:
> * Qualified the device types more strictly
> * s/pcie_root_rcb_set/pcie_read_root_rcb/ and changed signature
> * Do nothing if the RP's RCB cannot be determined
> * Reset the device's RCB if not set in the RP
> ---
> drivers/pci/probe.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 41183aed8f5d9..7165ac4065c97 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2410,6 +2410,58 @@ static void pci_configure_serr(struct pci_dev *dev)
> }
> }
>
> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
> +{
> + struct pci_dev *rp = pcie_find_root_port(dev);
> + u16 lnkctl;
> +
> + if (!rp)
> + return false;
> +
> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +
> + *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> + return true;
> +}
> +
> +static void pci_configure_rcb(struct pci_dev *dev)
> +{
> + u16 lnkctl;
> + bool rcb;
> +
> + /*
> + * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
> + * Ports (where it is read-only), Endpoints, and Bridges. It
> + * may only be set for Endpoints and Bridges if it is set in
> + * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
> + * Functions. If the Root Port's RCB cannot be determined, we
> + * bail out.
> + */
> + if (!pci_is_pcie(dev) ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
> + dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
> + return;
> +
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> + if (rcb) {
> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
> + return;
> +
> + lnkctl |= PCI_EXP_LNKCTL_RCB;
> + } else {
> + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> + return;
> +
> + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> + }
> +
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
So this sequence is effectively implementing this simple statement:
pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
PCI_EXP_LNKCTL_RCB,
rcb ? PCI_EXP_LNKCTL_RCB : 0);
+ the print.
Is there a good reason why you want to avoid the write by using early
returns?
I also wonder if those clear & set & clean_and_set interfaces should
implement the write avoidance if it's an useful thing (callers should be
checked they're not used for RW1C bits if that's implemented though).
--
i.
On Thu, Jan 22, 2026 at 03:45:58PM +0200, Ilpo Järvinen wrote:
> On Thu, 22 Jan 2026, Håkon Bugge wrote:
>
> > Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> > Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> > instructed program_hpx_type2() to set the RCB in an endpoint,
> > although it's RC did not have the RCB bit set.
> > + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> > + if (rcb) {
> > + if (lnkctl & PCI_EXP_LNKCTL_RCB)
> > + return;
> > +
> > + lnkctl |= PCI_EXP_LNKCTL_RCB;
> > + } else {
> > + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> > + return;
> > +
> > + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> > + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> > + }
> > +
> > + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>
> So this sequence is effectively implementing this simple statement:
>
> pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> PCI_EXP_LNKCTL_RCB,
> rcb ? PCI_EXP_LNKCTL_RCB : 0);
>
> + the print.
>
> Is there a good reason why you want to avoid the write by using early
> returns?
Good question, pcie_capability_clear_and_set_word() is much more
readable.
> I also wonder if those clear & set & clean_and_set interfaces should
> implement the write avoidance if it's an useful thing (callers should be
> checked they're not used for RW1C bits if that's implemented though).
> On 22 Jan 2026, at 14:45, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
>
> On Thu, 22 Jan 2026, Håkon Bugge wrote:
>
>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>> instructed program_hpx_type2() to set the RCB in an endpoint,
>> although it's RC did not have the RCB bit set.
>>
>> e42010d8207f fixed that by qualifying the setting of the RCB in the
>> endpoint with the RC supporting an 128 byte RCB.
>>
>> In retrospect, the program_hpx_type2() should only modify the AER
>> bits, and stay away from fiddling with the Link Control Register.
>>
>> Hence, we explicitly program the RCB from pci_configure_device().
>>
>> According to PCIe r7.0, sec 7.5.3.7, RCB is only valid for Root Ports
>> (where it is Read-Only), Bridges, and Endpoints. The bit is 'RsvdP'
>> for Virtual Functions. Hence, for other cases than Bridges and Physical
>> Endpoints, we bail out early from pci_configure_rcb().
>>
>> If the Root Port's RCB cannot be determined, we do nothing.
>>
>> If RCB is set in the Root Port and not in the device, we set it. If it
>> is set in the device but not in the Root Port, we print an info
>> message and reset it.
>>
>> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>
>> ---
>>
>> Note, that the current duplication of pcie_root_rcb_set() will be
>> removed in the next commit.
>>
>> v2 -> v3:
>> * Qualified the device types more strictly
>> * s/pcie_root_rcb_set/pcie_read_root_rcb/ and changed signature
>> * Do nothing if the RP's RCB cannot be determined
>> * Reset the device's RCB if not set in the RP
>> ---
>> drivers/pci/probe.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 53 insertions(+)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 41183aed8f5d9..7165ac4065c97 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2410,6 +2410,58 @@ static void pci_configure_serr(struct pci_dev *dev)
>> }
>> }
>>
>> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
>> +{
>> + struct pci_dev *rp = pcie_find_root_port(dev);
>> + u16 lnkctl;
>> +
>> + if (!rp)
>> + return false;
>> +
>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>> +
>> + *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>> + return true;
>> +}
>> +
>> +static void pci_configure_rcb(struct pci_dev *dev)
>> +{
>> + u16 lnkctl;
>> + bool rcb;
>> +
>> + /*
>> + * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
>> + * Ports (where it is read-only), Endpoints, and Bridges. It
>> + * may only be set for Endpoints and Bridges if it is set in
>> + * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
>> + * Functions. If the Root Port's RCB cannot be determined, we
>> + * bail out.
>> + */
>> + if (!pci_is_pcie(dev) ||
>> + pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>> + pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
>> + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>> + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
>> + dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
>> + return;
>> +
>> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>> + if (rcb) {
>> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
>> + return;
>> +
>> + lnkctl |= PCI_EXP_LNKCTL_RCB;
>> + } else {
>> + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>> + return;
>> +
>> + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>> + }
>> +
>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>
> So this sequence is effectively implementing this simple statement:
>
> pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> PCI_EXP_LNKCTL_RCB,
> rcb ? PCI_EXP_LNKCTL_RCB : 0);
>
> + the print.
Well, not exactly, as there will be no writes unless required. This was discussed here [1].
> Is there a good reason why you want to avoid the write by using early
> returns?
No other reasons but performance.
> I also wonder if those clear & set & clean_and_set interfaces should
> implement the write avoidance if it's an useful thing (callers should be
> checked they're not used for RW1C bits if that's implemented though).
That may be a good idea, but for sure, outside the scope of this series.
Thxs, Håkon
[1] https://lore.kernel.org/linux-pci/ECE29E32-7925-44C3-BAAA-B16003E9E997@oracle.com/
© 2016 - 2026 Red Hat, Inc.