Date: Wed, 18 Jun 2025 11:58:23 -0500 (CDT)
From: Timothy Pearson
To: Timothy Pearson
Cc: linuxppc-dev, linux-kernel, linux-pci, Madhavan Srinivasan, Michael Ellerman, christophe leroy, Naveen N Rao, Bjorn Helgaas, Shawn Anastasio
Message-ID: <317515920.1310655.1750265903281.JavaMail.zimbra@raptorengineeringinc.com>
In-Reply-To: <581463409.1310624.1750265668004.JavaMail.zimbra@raptorengineeringinc.com>
References: <581463409.1310624.1750265668004.JavaMail.zimbra@raptorengineeringinc.com>
Subject: [PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and recovery
X-Mailing-List: linux-kernel@vger.kernel.org

The existing PowerNV hotplug code did not handle surprise plug events
correctly, leading to a complete failure of the hotplug system after
device removal and a required reboot to detect new devices.

This comes down to two issues:
1.) When a device is surprise removed, oftentimes the bridge upstream
    port will cause a PE freeze on the PHB.  If this freeze is not
    cleared, the MSI interrupts from the bridge hotplug notification
    logic will not be received by the kernel, stalling all plug events
    on all slots associated with the PE.
2.) When a device is removed from a slot, regardless of surprise or
    programmatic removal, the associated PHB/PE is left frozen.
    If this freeze is not cleared via a fundamental reset, skiboot
    is unable to clear the freeze and cannot retrain / rescan the
    slot.  This also requires a reboot to clear the freeze and redetect
    the device in the slot.

Issue the appropriate unfreeze and rescan commands on hotplug events,
and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.

Signed-off-by: Timothy Pearson
---
 arch/powerpc/kernel/pci-hotplug.c |  3 ++
 drivers/pci/hotplug/pnv_php.c     | 53 ++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 9ea74973d78d..6f444d0822d8 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -141,6 +141,9 @@ void pci_hp_add_devices(struct pci_bus *bus)
 	struct pci_controller *phb;
 	struct device_node *dn = pci_bus_to_OF_node(bus);
 
+	if (!dn)
+		return;
+
 	phb = pci_bus_to_host(bus);
 
 	mode = PCI_PROBE_NORMAL;
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index bac8af3df41a..0ceb4a2c3c79 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -474,7 +475,7 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
 	struct hotplug_slot *slot = &php_slot->slot;
 	uint8_t presence = OPAL_PCI_SLOT_EMPTY;
 	uint8_t power_status = OPAL_PCI_SLOT_POWER_ON;
-	int ret;
+	int ret, i;
 
 	/* Check if the slot has been configured */
 	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
@@ -532,6 +533,27 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
 
 	/* Power is off, turn it on and then scan the slot */
 	ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+	if (ret) {
+		SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible frozen PHB", ret);
+		SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot activation\n");
+		for (i = 0; i < 3; i++) {
+			/* Slot activation failed, PHB may be fenced from a prior device failure
+			 * Use the OPAL fundamental reset call to both try a device reset and clear
+			 * any potentially active PHB fence / freeze
+			 */
+			SLOT_WARN(php_slot, "Try %d...\n", i + 1);
+			pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
+			msleep(250);
+			pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
+
+			ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+			if (!ret)
+				break;
+		}
+
+		if (i >= 3)
+			SLOT_WARN(php_slot, "Failed to bring slot online, aborting!\n");
+	}
 	if (ret)
 		return ret;
 
@@ -841,12 +863,41 @@ static void pnv_php_event_handler(struct work_struct *work)
 	struct pnv_php_event *event =
 		container_of(work, struct pnv_php_event, work);
 	struct pnv_php_slot *php_slot = event->php_slot;
+	struct pci_dev *pdev = php_slot->pdev;
+	struct eeh_dev *edev;
+	struct eeh_pe *pe;
+	int i, rc;
 
 	if (event->added)
 		pnv_php_enable_slot(&php_slot->slot);
 	else
 		pnv_php_disable_slot(&php_slot->slot);
 
+	if (!event->added) {
+		/* When a device is surprise removed from a downstream bridge slot, the upstream bridge port
+		 * can still end up frozen due to related EEH events, which will in turn block the MSI interrupts
+		 * for slot hotplug detection.  Detect and thaw any frozen upstream PE after slot deactivation...
+		 */
+		edev = pci_dev_to_eeh_dev(pdev);
+		pe = edev ? edev->pe : NULL;
+		rc = eeh_pe_get_state(pe);
+		if ((rc == -ENODEV) || (rc == -ENOENT)) {
+			SLOT_WARN(php_slot, "Upstream bridge PE state unknown, hotplug detect may fail\n");
+		}
+		else {
+			if (pe->state & EEH_PE_ISOLATED) {
+				SLOT_WARN(php_slot, "Upstream bridge PE %02x frozen, thawing...\n", pe->addr);
+				for (i = 0; i < 3; i++)
+					if (!eeh_unfreeze_pe(pe))
+						break;
+				if (i >= 3)
+					SLOT_WARN(php_slot, "Unable to thaw PE %02x, hotplug detect will fail!\n", pe->addr);
+				else
+					SLOT_WARN(php_slot, "PE %02x thawed successfully\n", pe->addr);
+			}
+		}
+	}
+
 	kfree(event);
 }
 
-- 
2.39.5
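
For anyone reviewing the control flow rather than the OPAL/EEH details, the bounded reset-and-retry pattern added to pnv_php_enable() can be modelled in user space roughly as below. This is only a sketch under stated assumptions: fake_slot, fake_reset(), fake_power_on() and power_on_with_retry() are hypothetical stand-ins invented for illustration, not kernel or skiboot interfaces, and the real code also sleeps 250 ms between asserting and deasserting the reset.

```c
#define MAX_ATTEMPTS 3 /* same bound the patch hard-codes in its loops */

/* Hypothetical model of a slot that only powers on after enough resets. */
struct fake_slot {
	int resets_needed; /* resets required before power-on succeeds */
	int resets_done;   /* resets issued so far */
};

/* Stands in for the warm reset / delay / deassert sequence in the patch. */
static void fake_reset(struct fake_slot *s)
{
	s->resets_done++;
}

/* Stands in for the slot power-on call: 0 on success, negative on failure. */
static int fake_power_on(const struct fake_slot *s)
{
	return (s->resets_done >= s->resets_needed) ? 0 : -1;
}

/*
 * Try to power the slot on. If that fails, reset and retry up to
 * MAX_ATTEMPTS times. Returns 0 on success, otherwise the last error.
 */
static int power_on_with_retry(struct fake_slot *s)
{
	int i;
	int ret = fake_power_on(s);

	if (ret) {
		for (i = 0; i < MAX_ATTEMPTS; i++) {
			fake_reset(s);
			ret = fake_power_on(s);
			if (!ret)
				break;
		}
	}

	return ret;
}
```

A slot that needs two resets comes up on the second retry, while one that needs more than three is reported as failed after exactly three resets, which mirrors the "Failed to bring slot online, aborting!" path in the patch.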