[PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and

Timothy Pearson posted 6 patches 3 months, 3 weeks ago
There is a newer version of this series
[PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and
Posted by Timothy Pearson 3 months, 3 weeks ago
 recovery

The existing PowerNV hotplug code did not handle suprise plug events
correctly, leading to a complete failure of the hotplug system after
device removal and a required reboot to detect new devices.

This comes down to two issues:
1.) When a device is suprise removed, oftentimes the bridge upstream
    port will cause a PE freeze on the PHB.  If this freeze is not
    cleared, the MSI interrupts from the bridge hotplug notification
    logic will not be received by the kernel, stalling all plug events
    on all slots associated with the PE.
2.) When a device is removed from a slot, regardless of suprise or
    programmatic removal, the associated PHB/PE ls left frozen.
    If this freeze is not cleared via a fundamental reset, skiboot
    is unable to clear the freeze and cannot retrain / rescan the
    slot.  This also requires a reboot to clear the freeze and redetect
    the device in the slot.

Issue the appropriate unfreeze and rescan commands on hotplug events,
and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
---
 arch/powerpc/kernel/pci-hotplug.c |  3 ++
 drivers/pci/hotplug/pnv_php.c     | 53 ++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 9ea74973d78d..6f444d0822d8 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -141,6 +141,9 @@ void pci_hp_add_devices(struct pci_bus *bus)
 	struct pci_controller *phb;
 	struct device_node *dn = pci_bus_to_OF_node(bus);
 
+	if (!dn)
+		return;
+
 	phb = pci_bus_to_host(bus);
 
 	mode = PCI_PROBE_NORMAL;
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index bac8af3df41a..0ceb4a2c3c79 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -10,6 +10,7 @@
 #include <linux/libfdt.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/delay.h>
 #include <linux/pci_hotplug.h>
 #include <linux/of_fdt.h>
 
@@ -474,7 +475,7 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
 	struct hotplug_slot *slot = &php_slot->slot;
 	uint8_t presence = OPAL_PCI_SLOT_EMPTY;
 	uint8_t power_status = OPAL_PCI_SLOT_POWER_ON;
-	int ret;
+	int ret, i;
 
 	/* Check if the slot has been configured */
 	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
@@ -532,6 +533,27 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
 
 	/* Power is off, turn it on and then scan the slot */
 	ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+	if (ret) {
+		SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible frozen PHB", ret);
+		SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot activation\n");
+		for (i = 0; i < 3; i++) {
+			/* Slot activation failed, PHB may be fenced from a prior device failure
+			 * Use the OPAL fundamental reset call to both try a device reset and clear
+			 * any potentially active PHB fence / freeze
+			 */
+			SLOT_WARN(php_slot, "Try %d...\n", i + 1);
+			pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
+			msleep(250);
+			pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
+
+			ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+			if (!ret)
+				break;
+		}
+
+		if (i >= 3)
+			SLOT_WARN(php_slot, "Failed to bring slot online, aborting!\n");
+	}
 	if (ret)
 		return ret;
 
@@ -841,12 +863,41 @@ static void pnv_php_event_handler(struct work_struct *work)
 	struct pnv_php_event *event =
 		container_of(work, struct pnv_php_event, work);
 	struct pnv_php_slot *php_slot = event->php_slot;
+	struct pci_dev *pdev = php_slot->pdev;
+	struct eeh_dev *edev;
+	struct eeh_pe *pe;
+	int i, rc;
 
 	if (event->added)
 		pnv_php_enable_slot(&php_slot->slot);
 	else
 		pnv_php_disable_slot(&php_slot->slot);
 
+	if (!event->added) {
+		/* When a device is surprise removed from a downstream bridge slot, the upstream bridge port
+		 * can still end up frozen due to related EEH events, which will in turn block the MSI interrupts
+		 * for slot hotplug detection.  Detect and thaw any frozen upstream PE after slot deactivation...
+		 */
+		edev = pci_dev_to_eeh_dev(pdev);
+		pe = edev ? edev->pe : NULL;
+		rc = eeh_pe_get_state(pe);
+		if ((rc == -ENODEV) || (rc == -ENOENT)) {
+			SLOT_WARN(php_slot, "Upstream bridge PE state unknown, hotplug detect may fail\n");
+		}
+		else {
+			if (pe->state & EEH_PE_ISOLATED) {
+				SLOT_WARN(php_slot, "Upstream bridge PE %02x frozen, thawing...\n", pe->addr);
+				for (i = 0; i < 3; i++)
+					if (!eeh_unfreeze_pe(pe))
+						break;
+				if (i >= 3)
+					SLOT_WARN(php_slot, "Unable to thaw PE %02x, hotplug detect will fail!\n", pe->addr);
+				else
+					SLOT_WARN(php_slot, "PE %02x thawed successfully\n", pe->addr);
+			}
+		}
+	}
+
 	kfree(event);
 }
 
-- 
2.39.5
Re: [PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and
Posted by Bjorn Helgaas 3 months, 3 weeks ago
On Wed, Jun 18, 2025 at 11:58:23AM -0500, Timothy Pearson wrote:
>  recovery

Same weird subject/commit wrapping.

> The existing PowerNV hotplug code did not handle suprise plug events
> correctly, leading to a complete failure of the hotplug system after
> device removal and a required reboot to detect new devices.

s/suprise/surprise/ (also below)

> This comes down to two issues:
> 1.) When a device is suprise removed, oftentimes the bridge upstream
>     port will cause a PE freeze on the PHB.  If this freeze is not
>     cleared, the MSI interrupts from the bridge hotplug notification
>     logic will not be received by the kernel, stalling all plug events
>     on all slots associated with the PE.

I guess you mean the bridge *downstream* port that leads to the slot?

> 2.) When a device is removed from a slot, regardless of suprise or
>     programmatic removal, the associated PHB/PE ls left frozen.
>     If this freeze is not cleared via a fundamental reset, skiboot
>     is unable to clear the freeze and cannot retrain / rescan the
>     slot.  This also requires a reboot to clear the freeze and redetect
>     the device in the slot.
> 
> Issue the appropriate unfreeze and rescan commands on hotplug events,
> and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.
> 
> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
> ---
>  arch/powerpc/kernel/pci-hotplug.c |  3 ++
>  drivers/pci/hotplug/pnv_php.c     | 53 ++++++++++++++++++++++++++++++-
>  2 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 9ea74973d78d..6f444d0822d8 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -141,6 +141,9 @@ void pci_hp_add_devices(struct pci_bus *bus)
>  	struct pci_controller *phb;
>  	struct device_node *dn = pci_bus_to_OF_node(bus);
>  
> +	if (!dn)
> +		return;
> +
>  	phb = pci_bus_to_host(bus);
>  
>  	mode = PCI_PROBE_NORMAL;
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index bac8af3df41a..0ceb4a2c3c79 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -10,6 +10,7 @@
>  #include <linux/libfdt.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>  #include <linux/pci_hotplug.h>
>  #include <linux/of_fdt.h>
>  
> @@ -474,7 +475,7 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>  	struct hotplug_slot *slot = &php_slot->slot;
>  	uint8_t presence = OPAL_PCI_SLOT_EMPTY;
>  	uint8_t power_status = OPAL_PCI_SLOT_POWER_ON;
> -	int ret;
> +	int ret, i;
>  
>  	/* Check if the slot has been configured */
>  	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
> @@ -532,6 +533,27 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>  
>  	/* Power is off, turn it on and then scan the slot */
>  	ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
> +	if (ret) {
> +		SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible frozen PHB", ret);
> +		SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot activation\n");
> +		for (i = 0; i < 3; i++) {
> +			/* Slot activation failed, PHB may be fenced from a prior device failure
> +			 * Use the OPAL fundamental reset call to both try a device reset and clear
> +			 * any potentially active PHB fence / freeze
> +			 */
> +			SLOT_WARN(php_slot, "Try %d...\n", i + 1);
> +			pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
> +			msleep(250);

What is the source of the 250 value?  Is there a spec you can cite for
this?  Maybe add a #define if it makes sense?

> +			pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
> +
> +			ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);

Wrap the comment and non-printk lines to fit in 80 columns like the
rest of the file.  Preserve the messages as-is so grep finds them
easily.

Usual multi-line comment style is:

  /*
   * Text ...
   */

Possibly factor this warn/reset code into a helper function to
unclutter pnv_php_enable()?

> +			if (!ret)
> +				break;
> +		}
> +
> +		if (i >= 3)
> +			SLOT_WARN(php_slot, "Failed to bring slot online, aborting!\n");
> +	}
>  	if (ret)
>  		return ret;
>  
> @@ -841,12 +863,41 @@ static void pnv_php_event_handler(struct work_struct *work)
>  	struct pnv_php_event *event =
>  		container_of(work, struct pnv_php_event, work);
>  	struct pnv_php_slot *php_slot = event->php_slot;
> +	struct pci_dev *pdev = php_slot->pdev;
> +	struct eeh_dev *edev;
> +	struct eeh_pe *pe;
> +	int i, rc;
>  
>  	if (event->added)
>  		pnv_php_enable_slot(&php_slot->slot);
>  	else
>  		pnv_php_disable_slot(&php_slot->slot);
>  
> +	if (!event->added) {
> +		/* When a device is surprise removed from a downstream bridge slot, the upstream bridge port
> +		 * can still end up frozen due to related EEH events, which will in turn block the MSI interrupts
> +		 * for slot hotplug detection.  Detect and thaw any frozen upstream PE after slot deactivation...
> +		 */

Restyle and wrap comment.

s/upstream bridge port/bridge downstream port/ to avoid confusion.

> +		edev = pci_dev_to_eeh_dev(pdev);
> +		pe = edev ? edev->pe : NULL;
> +		rc = eeh_pe_get_state(pe);
> +		if ((rc == -ENODEV) || (rc == -ENOENT)) {
> +			SLOT_WARN(php_slot, "Upstream bridge PE state unknown, hotplug detect may fail\n");
> +		}
> +		else {
> +			if (pe->state & EEH_PE_ISOLATED) {
> +				SLOT_WARN(php_slot, "Upstream bridge PE %02x frozen, thawing...\n", pe->addr);
> +				for (i = 0; i < 3; i++)
> +					if (!eeh_unfreeze_pe(pe))
> +						break;
> +				if (i >= 3)
> +					SLOT_WARN(php_slot, "Unable to thaw PE %02x, hotplug detect will fail!\n", pe->addr);
> +				else
> +					SLOT_WARN(php_slot, "PE %02x thawed successfully\n", pe->addr);
> +			}
> +		}
> +	}

Possibly factor this out, too.  Then pnv_php_event_handler() could
look simpler:

  if (event->added) {
    pnv_php_enable_slot(&php_slot->slot);
  } else {
    pnv_php_disable_slot(&php_slot->slot);
    <new helper to check for surprise removal>
  }

  kfree(event);

>  	kfree(event);
>  }
>  
> -- 
> 2.39.5
Re: [PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and
Posted by Timothy Pearson 3 months, 3 weeks ago

----- Original Message -----
> From: "Bjorn Helgaas" <helgaas@kernel.org>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
> <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
> "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
> <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>
> Sent: Wednesday, June 18, 2025 2:15:30 PM
> Subject: Re: [PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and

> On Wed, Jun 18, 2025 at 11:58:23AM -0500, Timothy Pearson wrote:
>>  recovery
> 
> Same weird subject/commit wrapping.
> 
>> The existing PowerNV hotplug code did not handle suprise plug events
>> correctly, leading to a complete failure of the hotplug system after
>> device removal and a required reboot to detect new devices.
> 
> s/suprise/surprise/ (also below)
> 
>> This comes down to two issues:
>> 1.) When a device is suprise removed, oftentimes the bridge upstream
>>     port will cause a PE freeze on the PHB.  If this freeze is not
>>     cleared, the MSI interrupts from the bridge hotplug notification
>>     logic will not be received by the kernel, stalling all plug events
>>     on all slots associated with the PE.
> 
> I guess you mean the bridge *downstream* port that leads to the slot?

No, the upstream port leading to the PHB.  If it was just the downstream port, we'd still be receiving MSI interrupts from the bridge; the upstream PE also ends up frozen.  My best guess is that the downstream error is propagated upward by the PHB, and the hardware freezes the PE to avoid data corruption until the software (in this case the hotplug driver) can analyze the fault, see that it is expected, and thaw the PE.

>> 2.) When a device is removed from a slot, regardless of suprise or
>>     programmatic removal, the associated PHB/PE ls left frozen.
>>     If this freeze is not cleared via a fundamental reset, skiboot
>>     is unable to clear the freeze and cannot retrain / rescan the
>>     slot.  This also requires a reboot to clear the freeze and redetect
>>     the device in the slot.
>> 
>> Issue the appropriate unfreeze and rescan commands on hotplug events,
>> and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.
>> 
>> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
>> ---
>>  arch/powerpc/kernel/pci-hotplug.c |  3 ++
>>  drivers/pci/hotplug/pnv_php.c     | 53 ++++++++++++++++++++++++++++++-
>>  2 files changed, 55 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kernel/pci-hotplug.c
>> b/arch/powerpc/kernel/pci-hotplug.c
>> index 9ea74973d78d..6f444d0822d8 100644
>> --- a/arch/powerpc/kernel/pci-hotplug.c
>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>> @@ -141,6 +141,9 @@ void pci_hp_add_devices(struct pci_bus *bus)
>>  	struct pci_controller *phb;
>>  	struct device_node *dn = pci_bus_to_OF_node(bus);
>>  
>> +	if (!dn)
>> +		return;
>> +
>>  	phb = pci_bus_to_host(bus);
>>  
>>  	mode = PCI_PROBE_NORMAL;
>> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>> index bac8af3df41a..0ceb4a2c3c79 100644
>> --- a/drivers/pci/hotplug/pnv_php.c
>> +++ b/drivers/pci/hotplug/pnv_php.c
>> @@ -10,6 +10,7 @@
>>  #include <linux/libfdt.h>
>>  #include <linux/module.h>
>>  #include <linux/pci.h>
>> +#include <linux/delay.h>
>>  #include <linux/pci_hotplug.h>
>>  #include <linux/of_fdt.h>
>>  
>> @@ -474,7 +475,7 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot,
>> bool rescan)
>>  	struct hotplug_slot *slot = &php_slot->slot;
>>  	uint8_t presence = OPAL_PCI_SLOT_EMPTY;
>>  	uint8_t power_status = OPAL_PCI_SLOT_POWER_ON;
>> -	int ret;
>> +	int ret, i;
>>  
>>  	/* Check if the slot has been configured */
>>  	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
>> @@ -532,6 +533,27 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot,
>> bool rescan)
>>  
>>  	/* Power is off, turn it on and then scan the slot */
>>  	ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
>> +	if (ret) {
>> +		SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible
>> frozen PHB", ret);
>> +		SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot
>> activation\n");
>> +		for (i = 0; i < 3; i++) {
>> +			/* Slot activation failed, PHB may be fenced from a prior device failure
>> +			 * Use the OPAL fundamental reset call to both try a device reset and clear
>> +			 * any potentially active PHB fence / freeze
>> +			 */
>> +			SLOT_WARN(php_slot, "Try %d...\n", i + 1);
>> +			pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
>> +			msleep(250);
> 
> What is the source of the 250 value?  Is there a spec you can cite for
> this?  Maybe add a #define if it makes sense?

It's a magic number used elsewhere in the tree, specifically by the EEH driver which also issues fundamental resets as part of its logic.  Unfortunately I don't know the origin of the magic number, and from my understanding of the PCIe specification it's probably non-critical -- long enough to be reasonably sure that the PCIe device has seen the reset strobe, but short enough not to stall the operation for an obnoxiously long interval.

>> +			pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
>> +
>> +			ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
> 
> Wrap the comment and non-printk lines to fit in 80 columns like the
> rest of the file.  Preserve the messages as-is so grep finds them
> easily.
> 
> Usual multi-line comment style is:
> 
>  /*
>   * Text ...
>   */

Will do.

> Possibly factor this warn/reset code into a helper function to
> unclutter pnv_php_enable()?

Sure, will do.

>> +			if (!ret)
>> +				break;
>> +		}
>> +
>> +		if (i >= 3)
>> +			SLOT_WARN(php_slot, "Failed to bring slot online, aborting!\n");
>> +	}
>>  	if (ret)
>>  		return ret;
>>  
>> @@ -841,12 +863,41 @@ static void pnv_php_event_handler(struct work_struct
>> *work)
>>  	struct pnv_php_event *event =
>>  		container_of(work, struct pnv_php_event, work);
>>  	struct pnv_php_slot *php_slot = event->php_slot;
>> +	struct pci_dev *pdev = php_slot->pdev;
>> +	struct eeh_dev *edev;
>> +	struct eeh_pe *pe;
>> +	int i, rc;
>>  
>>  	if (event->added)
>>  		pnv_php_enable_slot(&php_slot->slot);
>>  	else
>>  		pnv_php_disable_slot(&php_slot->slot);
>>  
>> +	if (!event->added) {
>> +		/* When a device is surprise removed from a downstream bridge slot, the
>> upstream bridge port
>> +		 * can still end up frozen due to related EEH events, which will in turn
>> block the MSI interrupts
>> +		 * for slot hotplug detection.  Detect and thaw any frozen upstream PE after
>> slot deactivation...
>> +		 */
> 
> Restyle and wrap comment.
> 
> s/upstream bridge port/bridge downstream port/ to avoid confusion.

It's the upstream port of the bridge (downstream slot of the PHB itself) that ends up as the frozen PE.

>> +		edev = pci_dev_to_eeh_dev(pdev);
>> +		pe = edev ? edev->pe : NULL;
>> +		rc = eeh_pe_get_state(pe);
>> +		if ((rc == -ENODEV) || (rc == -ENOENT)) {
>> +			SLOT_WARN(php_slot, "Upstream bridge PE state unknown, hotplug detect may
>> fail\n");
>> +		}
>> +		else {
>> +			if (pe->state & EEH_PE_ISOLATED) {
>> +				SLOT_WARN(php_slot, "Upstream bridge PE %02x frozen, thawing...\n",
>> pe->addr);
>> +				for (i = 0; i < 3; i++)
>> +					if (!eeh_unfreeze_pe(pe))
>> +						break;
>> +				if (i >= 3)
>> +					SLOT_WARN(php_slot, "Unable to thaw PE %02x, hotplug detect will fail!\n",
>> pe->addr);
>> +				else
>> +					SLOT_WARN(php_slot, "PE %02x thawed successfully\n", pe->addr);
>> +			}
>> +		}
>> +	}
> 
> Possibly factor this out, too.  Then pnv_php_event_handler() could
> look simpler:
> 
>  if (event->added) {
>    pnv_php_enable_slot(&php_slot->slot);
>  } else {
>    pnv_php_disable_slot(&php_slot->slot);
>    <new helper to check for surprise removal>
>  }

Fair enough, will do.

>  kfree(event);
> 
>>  	kfree(event);
>>  }
>>  
>> --
> > 2.39.5