presence detection
The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system
was observed to incorrectly assert the Presence Detect Set bit in its
capabilities when tested on a Raptor Computing Systems Blackbird system,
resulting in the hot insert path never attempting a rescan of the bus
and any downstream devices not being re-detected.
Work around this by additionally checking whether the PCIe data link is
active or not when performing presence detection on downstream switches'
ports, similar to the pciehp_hpc.c driver.
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
---
drivers/pci/hotplug/pnv_php.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index aec0a6d594ac..bac8af3df41a 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -391,6 +391,20 @@ static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
return 0;
}
+static int pcie_check_link_active(struct pci_dev *pdev)
+{
+ u16 lnk_status;
+ int ret;
+
+ ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+ if (ret == PCIBIOS_DEVICE_NOT_FOUND || PCI_POSSIBLE_ERROR(lnk_status))
+ return -ENODEV;
+
+ ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
+
+ return ret;
+}
+
static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
{
struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
@@ -403,6 +417,19 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
*/
ret = pnv_pci_get_presence_state(php_slot->id, &presence);
if (ret >= 0) {
+ if (pci_pcie_type(php_slot->pdev) == PCI_EXP_TYPE_DOWNSTREAM &&
+ presence == OPAL_PCI_SLOT_EMPTY) {
+ /*
+ * Similar to pciehp_hpc, check whether the Link Active
+ * bit is set to account for broken downstream bridges
+ * that don't properly assert Presence Detect State, as
+ * was observed on the Microsemi Switchtec PM8533 PFX
+ * [11f8:8533].
+ */
+ if (pcie_check_link_active(php_slot->pdev) > 0)
+ presence = OPAL_PCI_SLOT_PRESENT;
+ }
+
*state = presence;
ret = 0;
} else {
--
2.39.5
[+cc Lukas, pciehp expert] On Wed, Jun 18, 2025 at 11:56:54AM -0500, Timothy Pearson wrote: > presence detection (subject/commit wrapping seems to be on all of these patches) > The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system > was observed to incorrectly assert the Presence Detect Set bit in its > capabilities when tested on a Raptor Computing Systems Blackbird system, > resulting in the hot insert path never attempting a rescan of the bus > and any downstream devices not being re-detected. Seems like this switch supports standard PCIe hotplug? Quite a bit of this driver looks similar to things in pciehp. Is there some reason we can't use pciehp directly? Maybe pciehp could work if there were hooks for the PPC-specific bits? > Work around this by additionally checking whether the PCIe data link is > active or not when performing presence detection on downstream switches' > ports, similar to the pciehp_hpc.c driver. > > Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> > Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com> > --- > drivers/pci/hotplug/pnv_php.c | 27 +++++++++++++++++++++++++++ > 1 file changed, 27 insertions(+) > > diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c > index aec0a6d594ac..bac8af3df41a 100644 > --- a/drivers/pci/hotplug/pnv_php.c > +++ b/drivers/pci/hotplug/pnv_php.c > @@ -391,6 +391,20 @@ static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state) > return 0; > } > > +static int pcie_check_link_active(struct pci_dev *pdev) > +{ > + u16 lnk_status; > + int ret; > + > + ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); > + if (ret == PCIBIOS_DEVICE_NOT_FOUND || PCI_POSSIBLE_ERROR(lnk_status)) > + return -ENODEV; > + > + ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA); > + > + return ret; > +} > + > static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state) > { > struct pnv_php_slot *php_slot = to_pnv_php_slot(slot); > @@ -403,6 +417,19 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state) > */ > ret = pnv_pci_get_presence_state(php_slot->id, &presence); > if (ret >= 0) { > + if (pci_pcie_type(php_slot->pdev) == PCI_EXP_TYPE_DOWNSTREAM && > + presence == OPAL_PCI_SLOT_EMPTY) { > + /* > + * Similar to pciehp_hpc, check whether the Link Active > + * bit is set to account for broken downstream bridges > + * that don't properly assert Presence Detect State, as > + * was observed on the Microsemi Switchtec PM8533 PFX > + * [11f8:8533]. > + */ > + if (pcie_check_link_active(php_slot->pdev) > 0) > + presence = OPAL_PCI_SLOT_PRESENT; > + } > + > *state = presence; > ret = 0; > } else { > -- > 2.39.5
----- Original Message ----- > From: "Bjorn Helgaas" <helgaas@kernel.org> > To: "Timothy Pearson" <tpearson@raptorengineering.com> > Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci" > <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>, > "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas" > <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>, "Lukas Wunner" <lukas@wunner.de> > Sent: Wednesday, June 18, 2025 2:44:00 PM > Subject: Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken > [+cc Lukas, pciehp expert] > > On Wed, Jun 18, 2025 at 11:56:54AM -0500, Timothy Pearson wrote: >> presence detection > > (subject/commit wrapping seems to be on all of these patches) > >> The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system >> was observed to incorrectly assert the Presence Detect Set bit in its >> capabilities when tested on a Raptor Computing Systems Blackbird system, >> resulting in the hot insert path never attempting a rescan of the bus >> and any downstream devices not being re-detected. > > Seems like this switch supports standard PCIe hotplug? Quite a bit of > this driver looks similar to things in pciehp. Is there some reason > we can't use pciehp directly? Maybe pciehp could work if there were > hooks for the PPC-specific bits? While that is a good long term goal that Raptor is willing to work toward, it is non-trivial and will require buy-in from other stakeholders (e.g. IBM). If practical, I'd like to get this series merged first, to fix the broken hotplug on our hardware that is deployed worldwide, then in parallel see what can be done to merge PowerNV support into pciehp. Would that work?
On Wed, Jun 18, 2025 at 02:50:04PM -0500, Timothy Pearson wrote: > ----- Original Message ----- > > From: "Bjorn Helgaas" <helgaas@kernel.org> > > To: "Timothy Pearson" <tpearson@raptorengineering.com> > > Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci" > > <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>, > > "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas" > > <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>, "Lukas Wunner" <lukas@wunner.de> > > Sent: Wednesday, June 18, 2025 2:44:00 PM > > Subject: Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken > > > [+cc Lukas, pciehp expert] > > > > On Wed, Jun 18, 2025 at 11:56:54AM -0500, Timothy Pearson wrote: > >> presence detection > > > > (subject/commit wrapping seems to be on all of these patches) > > > >> The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system > >> was observed to incorrectly assert the Presence Detect Set bit in its > >> capabilities when tested on a Raptor Computing Systems Blackbird system, > >> resulting in the hot insert path never attempting a rescan of the bus > >> and any downstream devices not being re-detected. > > > > Seems like this switch supports standard PCIe hotplug? Quite a bit of > > this driver looks similar to things in pciehp. Is there some reason > > we can't use pciehp directly? Maybe pciehp could work if there were > > hooks for the PPC-specific bits? > > While that is a good long term goal that Raptor is willing to work > toward, it is non-trivial and will require buy-in from other > stakeholders (e.g. IBM). If practical, I'd like to get this series > merged first, to fix the broken hotplug on our hardware that is > deployed worldwide, then in parallel see what can be done to merge > PowerNV support into pciehp. Would that work? Yeah, it wouldn't make sense to switch horses at this stage. I guess I was triggered by this patch, which seems to be a workaround for a defect in a device that is probably also used on non-PPC systems, and pciehp would need a similar workaround. But I guess you go on to say that pciehp already does something similar, so it guess it's already covered. Bjorn
----- Original Message ----- > From: "Bjorn Helgaas" <helgaas@kernel.org> > To: "Timothy Pearson" <tpearson@raptorengineering.com> > Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci" > <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>, > "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas" > <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>, "Lukas Wunner" <lukas@wunner.de> > Sent: Wednesday, June 18, 2025 3:17:22 PM > Subject: Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken > On Wed, Jun 18, 2025 at 02:50:04PM -0500, Timothy Pearson wrote: >> ----- Original Message ----- >> > From: "Bjorn Helgaas" <helgaas@kernel.org> >> > To: "Timothy Pearson" <tpearson@raptorengineering.com> >> > Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" >> > <linux-kernel@vger.kernel.org>, "linux-pci" >> > <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, >> > "Michael Ellerman" <mpe@ellerman.id.au>, >> > "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" >> > <naveen@kernel.org>, "Bjorn Helgaas" >> > <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>, >> > "Lukas Wunner" <lukas@wunner.de> >> > Sent: Wednesday, June 18, 2025 2:44:00 PM >> > Subject: Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with >> > broken >> >> > [+cc Lukas, pciehp expert] >> > >> > On Wed, Jun 18, 2025 at 11:56:54AM -0500, Timothy Pearson wrote: >> >> presence detection >> > >> > (subject/commit wrapping seems to be on all of these patches) >> > >> >> The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system >> >> was observed to incorrectly assert the Presence Detect Set bit in its >> >> capabilities when tested on a Raptor Computing Systems Blackbird system, >> >> resulting in the hot insert path never attempting a rescan of the bus >> >> and any downstream devices not being re-detected. >> > >> > Seems like this switch supports standard PCIe hotplug? Quite a bit of >> > this driver looks similar to things in pciehp. Is there some reason >> > we can't use pciehp directly? Maybe pciehp could work if there were >> > hooks for the PPC-specific bits? >> >> While that is a good long term goal that Raptor is willing to work >> toward, it is non-trivial and will require buy-in from other >> stakeholders (e.g. IBM). If practical, I'd like to get this series >> merged first, to fix the broken hotplug on our hardware that is >> deployed worldwide, then in parallel see what can be done to merge >> PowerNV support into pciehp. Would that work? > > Yeah, it wouldn't make sense to switch horses at this stage. > > I guess I was triggered by this patch, which seems to be a workaround > for a defect in a device that is probably also used on non-PPC > systems, and pciehp would need a similar workaround. But I guess you > go on to say that pciehp already does something similar, so it guess > it's already covered. No problem, I completely understand. To be perfectly frank the existing code quality in this driver (and the associated EEH driver) is not the best, and it's been a frustrating experience trying to hack it into semi-stable operation. I would vastly prefer to rewrite / integrate into the pciehp driver, and we have plans to do so, but that will take an unacceptable amount of time vs. trying to fix up the existing driver as a stopgap. As you mentioned, pciehp already has this fix, so we just have to deal with the duplicated code until we (Raptor) figures out how to merge PowerNV support into pciehp.
On Thu, Jun 19, 2025 at 02:29:33PM -0500, Timothy Pearson wrote: > To be perfectly frank the existing code quality in this driver > (and the associated EEH driver) is not the best, and it's been > a frustrating experience trying to hack it into semi-stable > operation. > > I would vastly prefer to rewrite / integrate into the pciehp driver, > and we have plans to do so, but that will take an unacceptable amount > of time vs. trying to fix up the existing driver as a stopgap. > > As you mentioned, pciehp already has this fix, so we just have to > deal with the duplicated code until we (Raptor) figures out how to > merge PowerNV support into pciehp. I don't know how much PCIe hotplug on PowerNV differs from native, spec-compliant PCIe hotplug. If the differences are vast (and I get the feeling they might be if I read terms like "PHB" and "EEH unfreeze", which sound completely foreign to me), it might be easier to refactor pnv_php.c and copy patterns or code from pciehp, than to integrate the functionality from pnv_php.c into pciehp. pciehp does carry some historic baggage of its own (such as poll mode), which you may not want to deal with on PowerNV. One thing I don't quite understand is, it sounds like you've attached a PCIe switch to a Root Port and the hotplug ports are on the PCIe switch. Aren't those hotplug ports just bog-standard ones that can be driven by pciehp? My expectation would have been that a PowerNV-specific hotplug driver would only be necessary for hotplug-capable Root Ports. Thanks, Lukas
----- Original Message ----- > From: "Lukas Wunner" <lukas@wunner.de> > To: "Timothy Pearson" <tpearson@raptorengineering.com> > Cc: "Bjorn Helgaas" <helgaas@kernel.org>, "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" > <linux-kernel@vger.kernel.org>, "linux-pci" <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, > "Michael Ellerman" <mpe@ellerman.id.au>, "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" > <naveen@kernel.org>, "Bjorn Helgaas" <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com> > Sent: Friday, June 20, 2025 2:52:55 AM > Subject: Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken > On Thu, Jun 19, 2025 at 02:29:33PM -0500, Timothy Pearson wrote: >> To be perfectly frank the existing code quality in this driver >> (and the associated EEH driver) is not the best, and it's been >> a frustrating experience trying to hack it into semi-stable >> operation. >> >> I would vastly prefer to rewrite / integrate into the pciehp driver, >> and we have plans to do so, but that will take an unacceptable amount >> of time vs. trying to fix up the existing driver as a stopgap. >> >> As you mentioned, pciehp already has this fix, so we just have to >> deal with the duplicated code until we (Raptor) figures out how to >> merge PowerNV support into pciehp. > > I don't know how much PCIe hotplug on PowerNV differs from native, > spec-compliant PCIe hotplug. If the differences are vast (and I > get the feeling they might be if I read terms like "PHB" and > "EEH unfreeze", which sound completely foreign to me), it might > be easier to refactor pnv_php.c and copy patterns or code from > pciehp, than to integrate the functionality from pnv_php.c into > pciehp. The differences at the root level (PHB) are quite significant -- the controller is more advanced in many ways than standard PCIe root complexes [1] -- and those differences need very special handling. Once we are looking at devices downstream of the root complex, standard PCIe hotplug and AER specifications apply, so we're in a strange spot of really wanting to use pciehp (to handle all nested downstream bridges), but pciehp still needs to understand how to deal with our unqiue root complex. One idea I had was to use the existing modularity of pciehp's source and add a new pciehp_powernv.c file with all of our root complex handling methods. We could then #ifdef swap the assocated root complex calls to that external file when compiled in PowerNV mode. > pciehp does carry some historic baggage of its own (such as poll mode), > which you may not want to deal with on PowerNV. At the root level we probably don't care about polling mode, but in terms of nested downtream bridges polling may still be desireable. I don't have a strong opinion either way. > One thing I don't quite understand is, it sounds like you've > attached a PCIe switch to a Root Port and the hotplug ports > are on the PCIe switch. Aren't those hotplug ports just > bog-standard ones that can be driven by pciehp? My expectation > would have been that a PowerNV-specific hotplug driver would > only be necessary for hotplug-capable Root Ports. That's the problem -- the pciehp driver assumes x86 root ports, and the powernv driver ends up duplicating (badly) parts of the pciehp functionality for downstream bridges. That's one reason I'd like to abstract the root port handling in pciehp and eventually move the PowerNV root port handling into that module. > Thanks, > > Lukas [1] Among other interesting differences, it is both capable of and has been tested to properly block and report all invalid accesses from a PCIe device to system memory. This breaks assumptions in many PCIe device drivers, but is also a significant security advantage. EEH freeze is effectively this security mechanism kicking in -- on detecting an invalid access, the PHB itself will block the access and put the PE into a frozen state where no PCIe traffic is permitted. It simultaneously rases an error to the host (colloquially referred to as EEH) and punts the decision making on whether to reset or resume the device to the kernel itself. We've caught various cards doing fun things like reading host RAM or trying to write to 0x0 via this mechanism, one of the most egregious was a Chinese telecom card that apparently tries to reset the host with a write to low memory on detecting an interrupt servicing delay (!).
On Fri, Jun 20, 2025 at 11:45:45AM -0500, Timothy Pearson wrote: > From: "Lukas Wunner" <lukas@wunner.de> > > I don't know how much PCIe hotplug on PowerNV differs from native, > > spec-compliant PCIe hotplug. If the differences are vast (and I > > get the feeling they might be if I read terms like "PHB" and > > "EEH unfreeze", which sound completely foreign to me), it might > > be easier to refactor pnv_php.c and copy patterns or code from > > pciehp, than to integrate the functionality from pnv_php.c into > > pciehp. > > The differences at the root level (PHB) are quite significant -- the > controller is more advanced in many ways than standard PCIe root > complexes [1] -- and those differences need very special handling. > Once we are looking at devices downstream of the root complex, > standard PCIe hotplug and AER specifications apply, so we're in a > strange spot of really wanting to use pciehp (to handle all nested > downstream bridges), but pciehp still needs to understand how to deal > with our unqiue root complex. > > One idea I had was to use the existing modularity of pciehp's source > and add a new pciehp_powernv.c file with all of our root complex > handling methods. We could then #ifdef swap the assocated root complex > calls to that external file when compiled in PowerNV mode. We've traditionally dealt with such issues by inserting pcibios_*() hooks in generic code, with a __weak implementation (which is usually an empty stub) and a custom implementation in arch/powerpc/. The linker then chooses the custom implementation over the __weak one. You can find the existing hooks with: git grep "__weak .*pcibios" -- drivers/pci git grep pcibios -- arch/powerpc An alternative method is to add a callback to struct pci_host_bridge. > > One thing I don't quite understand is, it sounds like you've > > attached a PCIe switch to a Root Port and the hotplug ports > > are on the PCIe switch. Aren't those hotplug ports just > > bog-standard ones that can be driven by pciehp? My expectation > > would have been that a PowerNV-specific hotplug driver would > > only be necessary for hotplug-capable Root Ports. > > That's the problem -- the pciehp driver assumes x86 root ports, Nah, there's nothing x86-specific about it. pciehp is used on all kinds of arches, it's just an implementation of the PCIe Base Spec. > and the powernv driver ends up duplicating (badly) parts of the pciehp > functionality for downstream bridges. That's one reason I'd like to > abstract the root port handling in pciehp and eventually move the > PowerNV root port handling into that module. Sounds like you're currently describing the Switch Downstream Ports with an "ibm,ioda-phb2" compatible string in the devicetree? That would feel wrong since they're not host bridges. > [1] Among other interesting differences, it is both capable of and has > been tested to properly block and report all invalid accesses from a > PCIe device to system memory. This breaks assumptions in many PCIe > device drivers, but is also a significant security advantage. > EEH freeze is effectively this security mechanism kicking in -- on > detecting an invalid access, the PHB itself will block the access and > put the PE into a frozen state where no PCIe traffic is permitted. That sounds somewhat similar to what the PCIe Access Control Services Capability provides. I think at least some IOMMUs support raising an exception on illegal accesses, so the EEH functionality is probably not *that* special. Thanks, Lukas
© 2016 - 2025 Red Hat, Inc.