From nobody Tue Oct 7 01:58:30 2025 Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1399924DCEC; Tue, 15 Jul 2025 21:36:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=23.155.224.40 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615373; cv=none; b=TgtWhvDkQyg39OVS7+J1Fj/SAhHKw7bTvLeuQ+fyxMNx1kGZFqaq2zv490ZBF1pfDGHkoUXbbcuPtpopOKFY8hfTzUZHyPazBz7EfF0/OJYuP9CNMpsN504CO8Vw6OKGMgjJlpipInM+LNkgXBrMyGdN08PXKXkz7lWmLJdgGrg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615373; c=relaxed/simple; bh=cooPlAxjCzZeTlVy0Pk0w5jF9nmdibY3JwXPQ8KXd30=; h=Date:From:To:Cc:Message-ID:In-Reply-To:References:Subject: MIME-Version:Content-Type; b=Nr7WQm4z1wPKH16Rugq+In2OFZXPsMJCtiNUh3pDotAVpdCga764EnSHAEfUPMkrCIBP/Hr+xXWNdJTJnnWYtd0Rsi96oPqm6KpQghgCwj6kSb5QPbScuz8uHfbMH75C1cWeJkuOPtJCLcmNdFXPureIQkrbBtiPeUuQYyX4rjA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com; spf=pass smtp.mailfrom=raptorengineering.com; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b=cpAhd5YB; arc=none smtp.client-ip=23.155.224.40 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b="cpAhd5YB" Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 1F6A8828832E; Tue, 15 Jul 2025 16:36:10 -0500 (CDT) Received: from mail.rptsys.com ([127.0.0.1]) by localhost 
(vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id A1hSnC2-OLw1; Tue, 15 Jul 2025 16:36:08 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 39FCD8288591; Tue, 15 Jul 2025 16:36:08 -0500 (CDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 39FCD8288591 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1752615368; bh=fkGDCdBQx8jgAUOYnnBb2gotGMbzdN6Ov873STtMQvs=; h=Date:From:To:Message-ID:MIME-Version; b=cpAhd5YBB01yCpsWaZdCkhcsVPE1PAhHGANTJ4adXp4AyGK9JWDYkZmbNiZrKYswb G/wMZghOhryG1xlygGR9RzeCID8vyKTKuSYwRobbufezIqE4P3saEtG16Xg7HY+b4r v4LIbhNikk5xUp0TwRMxDcIxecWFw97DZ2QQ6dcM= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id xX_LOiZ2uwh4; Tue, 15 Jul 2025 16:36:08 -0500 (CDT) Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id ECC71828832E; Tue, 15 Jul 2025 16:36:07 -0500 (CDT) Date: Tue, 15 Jul 2025 16:36:07 -0500 (CDT) From: Timothy Pearson To: Timothy Pearson Cc: linuxppc-dev , linux-kernel , linux-pci , Madhavan Srinivasan , Michael Ellerman , christophe leroy , Naveen N Rao , Bjorn Helgaas , Shawn Anastasio Message-ID: <2013845045.1359852.1752615367790.JavaMail.zimbra@raptorengineeringinc.com> In-Reply-To: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com> References: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com> Subject: [PATCH v3 1/6] PCI: pnv_php: Properly clean up allocated IRQs on unplug Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC138 (Linux)/8.5.0_GA_3042) Thread-Topic: pnv_php: Properly clean up allocated IRQs on 
 unplug
Thread-Index: XyF2OaMn/3q+H+nwsGaxXLVF4U4PF4ckTqwo
Content-Type: text/plain; charset="utf-8"

In cases where the root of a nested PCIe bridge configuration is
unplugged, the pnv_php driver would leak the allocated IRQ resources for
the child bridges' hotplug event notifications, resulting in a panic.

Fix this by walking all child buses and deallocating all of their IRQ
resources before calling pci_hp_remove_devices.

Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so
that it is only destroyed in pnv_php_free_slot, instead of
pnv_php_disable_irq. This is required since pnv_php_disable_irq will now
be called by workers triggered by hot unplug interrupts, so the
workqueue needs to stay allocated.

The abridged kernel panic that occurs without this patch is as follows:

  WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292 msi_device_data_release+0x6c/0x9c
  CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2
  Call Trace:
   msi_device_data_release+0x34/0x9c (unreliable)
   release_nodes+0x64/0x13c
   devres_release_all+0xc0/0x140
   device_del+0x2d4/0x46c
   pci_destroy_dev+0x5c/0x194
   pci_hp_remove_devices+0x90/0x128
   pci_hp_remove_devices+0x44/0x128
   pnv_php_disable_slot+0x54/0xd4
   power_write_file+0xf8/0x18c
   pci_slot_attr_store+0x40/0x5c
   sysfs_kf_write+0x64/0x78
   kernfs_fop_write_iter+0x1b0/0x290
   vfs_write+0x3bc/0x50c
   ksys_write+0x84/0x140
   system_call_exception+0x124/0x230
   system_call_vectored_common+0x15c/0x2ec

Signed-off-by: Shawn Anastasio
Signed-off-by: Timothy Pearson
Acked-by: Bjorn Helgaas
Tested-by: Ganesh Goudar
---
 drivers/pci/hotplug/pnv_php.c | 94 ++++++++++++++++++++++++++++-------
 1 file changed, 75 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 573a41869c15..aec0a6d594ac 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -3,6 +3,7 @@
  * PCI Hotplug Driver for PowerPC PowerNV platform.
  *
  * Copyright Gavin Shan, IBM Corporation 2016.
+ * Copyright (C) 2025 Raptor Engineering, LLC
  */
 
 #include
@@ -36,8 +37,10 @@ static void pnv_php_register(struct device_node *dn);
 static void pnv_php_unregister_one(struct device_node *dn);
 static void pnv_php_unregister(struct device_node *dn);
 
+static void pnv_php_enable_irq(struct pnv_php_slot *php_slot);
+
 static void pnv_php_disable_irq(struct pnv_php_slot *php_slot,
-				bool disable_device)
+				bool disable_device, bool disable_msi)
 {
 	struct pci_dev *pdev = php_slot->pdev;
 	u16 ctrl;
@@ -53,19 +56,15 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot,
 		php_slot->irq = 0;
 	}
 
-	if (php_slot->wq) {
-		destroy_workqueue(php_slot->wq);
-		php_slot->wq = NULL;
-	}
-
-	if (disable_device) {
+	if (disable_device || disable_msi) {
 		if (pdev->msix_enabled)
 			pci_disable_msix(pdev);
 		else if (pdev->msi_enabled)
 			pci_disable_msi(pdev);
+	}
 
+	if (disable_device)
 		pci_disable_device(pdev);
-	}
 }
 
 static void pnv_php_free_slot(struct kref *kref)
@@ -74,7 +73,8 @@ static void pnv_php_free_slot(struct kref *kref)
 					struct pnv_php_slot, kref);
 
 	WARN_ON(!list_empty(&php_slot->children));
-	pnv_php_disable_irq(php_slot, false);
+	pnv_php_disable_irq(php_slot, false, false);
+	destroy_workqueue(php_slot->wq);
 	kfree(php_slot->name);
 	kfree(php_slot);
 }
@@ -561,8 +561,57 @@ static int pnv_php_reset_slot(struct hotplug_slot *slot, bool probe)
 static int pnv_php_enable_slot(struct hotplug_slot *slot)
 {
 	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
+	u32 prop32;
+	int ret;
+
+	ret = pnv_php_enable(php_slot, true);
+	if (ret)
+		return ret;
+
+	/* (Re-)enable interrupt if the slot supports surprise hotplug */
+	ret = of_property_read_u32(php_slot->dn, "ibm,slot-surprise-pluggable", &prop32);
+	if (!ret && prop32)
+		pnv_php_enable_irq(php_slot);
+
+	return 0;
+}
+
+/**
+ * Disable any hotplug interrupts for all slots on the provided bus, as well as
+ * all downstream slots in preparation for a hot unplug.
+ */
+static int pnv_php_disable_all_irqs(struct pci_bus *bus)
+{
+	struct pci_bus *child_bus;
+	struct pci_slot *cur_slot;
+
+	/* First go down child busses */
+	list_for_each_entry(child_bus, &bus->children, node)
+		pnv_php_disable_all_irqs(child_bus);
+
+	/* Disable IRQs for all pnv_php slots on this bus */
+	list_for_each_entry(cur_slot, &bus->slots, list) {
+		struct pnv_php_slot *php_slot = to_pnv_php_slot(cur_slot->hotplug);
+
+		pnv_php_disable_irq(php_slot, false, true);
+	}
 
-	return pnv_php_enable(php_slot, true);
+	return 0;
+}
+
+/**
+ * Disable any hotplug interrupts for all downstream slots on the provided bus in
+ * preparation for a hot unplug.
+ */
+static int pnv_php_disable_all_downstream_irqs(struct pci_bus *bus)
+{
+	struct pci_bus *child_bus;
+
+	/* Go down child busses, recursively deactivating their IRQs */
+	list_for_each_entry(child_bus, &bus->children, node)
+		pnv_php_disable_all_irqs(child_bus);
+
+	return 0;
 }
 
 static int pnv_php_disable_slot(struct hotplug_slot *slot)
@@ -579,6 +628,12 @@ static int pnv_php_disable_slot(struct hotplug_slot *slot)
 	    php_slot->state != PNV_PHP_STATE_REGISTERED)
 		return 0;
 
+	/* Free all IRQ resources from all child slots before remove.
+	 * Note that we do not disable the root slot IRQ here as that
+	 * would also deactivate the slot hot (re)plug interrupt!
+	 */
+	pnv_php_disable_all_downstream_irqs(php_slot->bus);
+
 	/* Remove all devices behind the slot */
 	pci_lock_rescan_remove();
 	pci_hp_remove_devices(php_slot->bus);
@@ -647,6 +702,15 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
 		return NULL;
 	}
 
+	/* Allocate workqueue for this slot's interrupt handling */
+	php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name);
+	if (!php_slot->wq) {
+		SLOT_WARN(php_slot, "Cannot alloc workqueue\n");
+		kfree(php_slot->name);
+		kfree(php_slot);
+		return NULL;
+	}
+
 	if (dn->child && PCI_DN(dn->child))
 		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
 	else
@@ -843,14 +907,6 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq)
 	u16 sts, ctrl;
 	int ret;
 
-	/* Allocate workqueue */
-	php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name);
-	if (!php_slot->wq) {
-		SLOT_WARN(php_slot, "Cannot alloc workqueue\n");
-		pnv_php_disable_irq(php_slot, true);
-		return;
-	}
-
 	/* Check PDC (Presence Detection Change) is broken or not */
 	ret = of_property_read_u32(php_slot->dn, "ibm,slot-broken-pdc",
 				   &broken_pdc);
@@ -869,7 +925,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq)
 	ret = request_irq(irq, pnv_php_interrupt, IRQF_SHARED,
 			  php_slot->name, php_slot);
 	if (ret) {
-		pnv_php_disable_irq(php_slot, true);
+		pnv_php_disable_irq(php_slot, true, true);
 		SLOT_WARN(php_slot, "Error %d enabling IRQ %d\n", ret, irq);
 		return;
 	}
-- 
2.39.5

From nobody Tue Oct 7 01:58:30 2025
Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CC94F1C4A24; Tue, 15 Jul 2025 21:36:56 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=23.155.224.40
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615418; cv=none;
b=lfoz/QGtJ7y8+03s2Bg9iBChDyd5RT2Urw6gdf7EJ3S0A4YaRztPO7KiauIlps+QA3tqD/86cvSTnuojYyRQeVa/q/in67IId72gA59fO9oscNfMPSjR0ZZvqG3c0Euf1L/7xWKsYPe0x6Mmt/zTUimTpK1f1N1/fwcdfOYVSQ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615418; c=relaxed/simple; bh=crhUl8K7NfSIhNH3G1cHvCpoEuha4kT7cS8OI37RL0s=; h=Date:From:To:Cc:Message-ID:In-Reply-To:References:Subject: MIME-Version:Content-Type; b=ju4qb7dZkoRZmYYdMn7INkdMG/FNIFG/Vrfz67S3byt+cAvfLoj6rOShYgPv7LbNbVo5W+Xs1DfBGKAYB1CtlxstpMC0swvmwxTvsv/IcjFqHmaIQ0kCzODB3o/kjgqYKqGzU+2a41H3sEpVd81IkMUXflXQbZeaJewgdrI4//E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com; spf=pass smtp.mailfrom=raptorengineering.com; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b=Nep9x+2P; arc=none smtp.client-ip=23.155.224.40 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b="Nep9x+2P" Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 253FD8287698; Tue, 15 Jul 2025 16:36:56 -0500 (CDT) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 6SdXHPDblhg1; Tue, 15 Jul 2025 16:36:55 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 641068287FFD; Tue, 15 Jul 2025 16:36:55 -0500 (CDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 641068287FFD DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1752615415; 
 bh=LaLmDzHUyKChIzl5rHPnEoGKej3I2INTlfN2Cmf7iEc=;
 h=Date:From:To:Message-ID:MIME-Version;
 b=Nep9x+2PP5OT7QM2AGqVAkHugYtjVvmWYI1Qj333bXkUvyWlST5kZQsoX7dYAl+Gv
 FAOTzeAo76T3YqpyfkG/9WeA+byKHMQIAA/+8Mt3FqW7HR6NykU2CsAUQpfLGL8RJw
 N5Q7387kx7wnA7hBZRwqTYvMvFE0HAmZN5B9tkb8=
X-Virus-Scanned: amavisd-new at rptsys.com
Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id yxtRlKp8tukF; Tue, 15 Jul 2025 16:36:55 -0500 (CDT)
Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 328838287698; Tue, 15 Jul 2025 16:36:55 -0500 (CDT)
Date: Tue, 15 Jul 2025 16:36:55 -0500 (CDT)
From: Timothy Pearson
To: Timothy Pearson
Cc: linuxppc-dev, linux-kernel, linux-pci, Madhavan Srinivasan, Michael Ellerman, christophe leroy, Naveen N Rao, Bjorn Helgaas, Shawn Anastasio
Message-ID: <505981576.1359853.1752615415117.JavaMail.zimbra@raptorengineeringinc.com>
In-Reply-To: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com>
References: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com>
Subject: [PATCH v3 2/6] PCI: pnv_php: Work around switches with broken presence detection
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC138 (Linux)/8.5.0_GA_3042)
Thread-Topic: pnv_php: Work around switches with broken presence detection
Thread-Index: XyF2OaMn/3q+H+nwsGaxXLVF4U4PF4ZEJFKH
Content-Type: text/plain; charset="utf-8"

The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system
was observed to incorrectly assert the Presence Detect State bit in its
capabilities when tested on a Raptor Computing Systems Blackbird system,
resulting in the hot insert path never attempting a rescan of the bus
and any downstream devices not being re-detected.
Work around this by additionally checking whether the PCIe data link is
active or not when performing presence detection on downstream switches'
ports, similar to the pciehp_hpc.c driver.

Signed-off-by: Shawn Anastasio
Signed-off-by: Timothy Pearson
Acked-by: Bjorn Helgaas
Tested-by: Ganesh Goudar
---
 drivers/pci/hotplug/pnv_php.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index aec0a6d594ac..bac8af3df41a 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -391,6 +391,20 @@ static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
 	return 0;
 }
 
+static int pcie_check_link_active(struct pci_dev *pdev)
+{
+	u16 lnk_status;
+	int ret;
+
+	ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+	if (ret == PCIBIOS_DEVICE_NOT_FOUND || PCI_POSSIBLE_ERROR(lnk_status))
+		return -ENODEV;
+
+	ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
+
+	return ret;
+}
+
 static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
 {
 	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
@@ -403,6 +417,19 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
 	 */
 	ret = pnv_pci_get_presence_state(php_slot->id, &presence);
 	if (ret >= 0) {
+		if (pci_pcie_type(php_slot->pdev) == PCI_EXP_TYPE_DOWNSTREAM &&
+			presence == OPAL_PCI_SLOT_EMPTY) {
+			/*
+			 * Similar to pciehp_hpc, check whether the Link Active
+			 * bit is set to account for broken downstream bridges
+			 * that don't properly assert Presence Detect State, as
+			 * was observed on the Microsemi Switchtec PM8533 PFX
+			 * [11f8:8533].
+			 */
+			if (pcie_check_link_active(php_slot->pdev) > 0)
+				presence = OPAL_PCI_SLOT_PRESENT;
+		}
+
 		*state = presence;
 		ret = 0;
 	} else {
-- 
2.39.5

From nobody Tue Oct 7 01:58:30 2025
Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09DD41C4A24; Tue, 15 Jul 2025 21:37:36 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=23.155.224.40
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615457; cv=none; b=nSrHYUPdDa6oLcUBcxIZb1gZt/+o/jBaIrbuo/ccBJIiNGDs+4xA7HkQ6RXt/F4o0o/kMGJMYCneHgzgkkNzX7//e589mOyl4yIVyYXeFKRUT48HWl65R0IqDhnpo2FgWL/k81X5ZzNMt8RbpfjQCwGomlWyoXH9rtRfW77cVWA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615457; c=relaxed/simple; bh=gzsomBt1JFxUGu+9kwXv3urclaJVHAULxqDXl7LM9wM=; h=Date:From:To:Cc:Message-ID:In-Reply-To:References:Subject: MIME-Version:Content-Type; b=qKoie3wY0CTY4qc0QUweFAV9ld4vesBYbAhqONMjznJeXNdv/woxR6ExcyqhsYxneCPjZM5IWTzZMbkYAJn9YfmaDasBBPg2BbOT38lo0KRR+9CwDZrMuBjTkhHaw1CTgBSGhDUKbCLo5nCVnAlmbj+2a+h2cy+TJFKt79NOXgI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com; spf=pass smtp.mailfrom=raptorengineering.com; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b=BixKi5jV; arc=none smtp.client-ip=23.155.224.40
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=raptorengineering.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b="BixKi5jV"
Received: from localhost (localhost [127.0.0.1])
by mail.rptsys.com (Postfix) with ESMTP id 8008B8287698; Tue, 15 Jul 2025 16:37:35 -0500 (CDT) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id DXqvs6LDPYCI; Tue, 15 Jul 2025 16:37:35 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id E43148288591; Tue, 15 Jul 2025 16:37:34 -0500 (CDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com E43148288591 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1752615454; bh=a/RHOZtuKI3IFnHAVZbUtLVWSIeWz5dK8J8PMYUdjSM=; h=Date:From:To:Message-ID:MIME-Version; b=BixKi5jVzM1CraSKJsc/j0p65Q52gyeo3hTO/e4RST7uJhulhOPSkAzgY8mRW6vA6 vrsf58GgVMF8udf1XFP+aOqazqL/HsM3bZiTasLSxvthaY/FcScEKMms6dhypiuSxs e3AJxjNziEmTIAPWG13wC8PSWiCGbb6zd1YBUPSQ= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id jUiqRAhxzMqk; Tue, 15 Jul 2025 16:37:34 -0500 (CDT) Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id AB7398287698; Tue, 15 Jul 2025 16:37:34 -0500 (CDT) Date: Tue, 15 Jul 2025 16:37:34 -0500 (CDT) From: Timothy Pearson To: Timothy Pearson Cc: linuxppc-dev , linux-kernel , linux-pci , Madhavan Srinivasan , Michael Ellerman , christophe leroy , Naveen N Rao , Bjorn Helgaas , Shawn Anastasio Message-ID: <1778535414.1359858.1752615454618.JavaMail.zimbra@raptorengineeringinc.com> In-Reply-To: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com> References: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com> Subject: [PATCH v3 3/6] powerpc/eeh: Export eeh_unfreeze_pe() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC138 (Linux)/8.5.0_GA_3042)
Thread-Topic: powerpc/eeh: Export eeh_unfreeze_pe()
Thread-Index: XyF2OaMn/3q+H+nwsGaxXLVF4U4PF4wBZI2J
Content-Type: text/plain; charset="utf-8"

The PowerNV hotplug driver needs to be able to clear any frozen PE(s)
on the PHB after surprise removal of a downstream device.

Export the eeh_unfreeze_pe() symbol to allow implementation of this
functionality in the pnv_php module.

Signed-off-by: Timothy Pearson
Acked-by: Bjorn Helgaas
Tested-by: Ganesh Goudar
---
 arch/powerpc/kernel/eeh.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ca7f7bb2b478..2b5f3323e107 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1139,6 +1139,7 @@ int eeh_unfreeze_pe(struct eeh_pe *pe)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(eeh_unfreeze_pe);
 
 
 static struct pci_device_id eeh_reset_ids[] = {
-- 
2.39.5

From nobody Tue Oct 7 01:58:30 2025
Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F1231C4A24; Tue, 15 Jul 2025 21:38:25 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=23.155.224.40
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615507; cv=none; b=rSgPA7jQiaC8vXF2QNMwfHFwCmyU62cPSLocFB5p0k/q8xFzFQMyDoZRIibfJzAOt+brY1dqdSVj87fe8/07zoW4CuA/+n0uuhC2HOfDpomw0B5T9shuxBc50ewB9mDHzbKm3XDGmEtYkh2xst9ztJ9ZYNOIwX1nLurS5m9QoMs=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615507; c=relaxed/simple; bh=U5Lk+VkVJJP5V8zUZPjwKlzGXzRmt+MR/UQ4a+BmKuk=; h=Date:From:To:Cc:Message-ID:In-Reply-To:References:Subject: MIME-Version:Content-Type;
b=n2VvfqtFfBFYl2n2VcQ9BTMEY0UuEsNipuAnRknHqOnsc67S3hFfKT95V6/z+AOGciD+B7X6CAJXsbGSOxBZKRmPOu+k3itDixPTsFe3/D9kSbWU2SuQgrYR9q10/gfz0Z3SprTxXgWwFW/BtjSHYfnVpd3ljpqdj5Fc3ermm1o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com; spf=pass smtp.mailfrom=raptorengineering.com; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b=jIxJcX2W; arc=none smtp.client-ip=23.155.224.40 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b="jIxJcX2W" Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id C69F88287698; Tue, 15 Jul 2025 16:38:24 -0500 (CDT) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id TrGGvDmz4rZm; Tue, 15 Jul 2025 16:38:23 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 7D1738288591; Tue, 15 Jul 2025 16:38:23 -0500 (CDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 7D1738288591 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1752615503; bh=AzvxEw34wp1D5TRYWoqBxaNBDOvxPv48PtS4dQyEJ24=; h=Date:From:To:Message-ID:MIME-Version; b=jIxJcX2WRxl4nSywEZGn+vl4wE3YHhYFtu7Gm/FpZepSdPbebbPDrqn78SArsroiq X6EkBK7CYwQT+kYvpMCgue+tLqU5RPET4BMP5gquftgPIicx/L3ssx2BRXAuZc2yhV FH6YaDkAOLnhlrJTYEmzxNaJszDMTHJSsLq63j+k= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id dNDVnC83S7-x; Tue, 15 
 Jul 2025 16:38:23 -0500 (CDT)
Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 3E77D8287698; Tue, 15 Jul 2025 16:38:23 -0500 (CDT)
Date: Tue, 15 Jul 2025 16:38:23 -0500 (CDT)
From: Timothy Pearson
To: Timothy Pearson
Cc: linuxppc-dev, linux-kernel, linux-pci, Madhavan Srinivasan, Michael Ellerman, christophe leroy, Naveen N Rao, Bjorn Helgaas, Shawn Anastasio
Message-ID: <1334208367.1359861.1752615503144.JavaMail.zimbra@raptorengineeringinc.com>
In-Reply-To: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com>
References: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com>
Subject: [PATCH v3 4/6] powerpc/eeh: Make EEH driver device hotplug safe
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC138 (Linux)/8.5.0_GA_3042)
Thread-Topic: powerpc/eeh: Make EEH driver device hotplug safe
Thread-Index: XyF2OaMn/3q+H+nwsGaxXLVF4U4PF0HA+s8u
Content-Type: text/plain; charset="utf-8"

Multiple race conditions existed between the PCIe hotplug driver and the
EEH driver, leading to a variety of kernel oopses of the same general
nature:

A second class of oops is also seen when the underlying bus disappears
during device recovery.

Refactor the EEH module to be PCI rescan and remove safe. Also clean up
a few minor formatting / readability issues.
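
For readers following along, the locking rule described above can be
sketched in userspace. This is a toy model only and is not part of the
patch: a pthread mutex stands in for the PCI rescan/remove lock, and
every toy_* name is a hypothetical stand-in rather than a kernel API.
The point it illustrates is that the handler takes the lock before it
looks up the bus, treats a missing bus as a normal outcome, and drops
the lock on every exit path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy model only: every toy_* name is hypothetical, not a kernel API. */
struct toy_bus {
	int ndevices;
};

/* Stands in for the PCI rescan/remove lock. */
static pthread_mutex_t toy_rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/* A concurrent hot unplug may clear this pointer while holding the lock. */
static struct toy_bus *toy_current_bus;

/* Simulated hot unplug: the bus disappears under the lock. */
static void toy_unplug(void)
{
	pthread_mutex_lock(&toy_rescan_remove_lock);
	toy_current_bus = NULL;
	pthread_mutex_unlock(&toy_rescan_remove_lock);
}

/*
 * Recovery handler: take the lock first, look the bus up second, and
 * drop the lock on every exit path. Returns the number of devices
 * removed, or -1 if the bus had already disappeared.
 */
static int toy_handle_event(void)
{
	struct toy_bus *bus;
	int removed;

	pthread_mutex_lock(&toy_rescan_remove_lock);

	bus = toy_current_bus;
	if (!bus) {
		pthread_mutex_unlock(&toy_rescan_remove_lock);
		return -1;
	}

	assert(bus->ndevices >= 0);
	removed = bus->ndevices;
	bus->ndevices = 0;

	pthread_mutex_unlock(&toy_rescan_remove_lock);
	return removed;
}
```

Because the lookup happens under the lock, a handler that runs after
toy_unplug() sees NULL and backs out instead of touching a stale
pointer.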

Signed-off-by: Timothy Pearson
Acked-by: Bjorn Helgaas
Tested-by: Ganesh Goudar
---
 arch/powerpc/kernel/eeh_driver.c | 48 +++++++++++++++++++++-----------
 arch/powerpc/kernel/eeh_pe.c     | 10 ++++---
 2 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 7efe04c68f0f..dd50de91c438 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -257,13 +257,12 @@ static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn,
 	struct pci_driver *driver;
 	enum pci_ers_result new_result;
 
-	pci_lock_rescan_remove();
 	pdev = edev->pdev;
 	if (pdev)
 		get_device(&pdev->dev);
-	pci_unlock_rescan_remove();
 	if (!pdev) {
 		eeh_edev_info(edev, "no device");
+		*result = PCI_ERS_RESULT_DISCONNECT;
 		return;
 	}
 	device_lock(&pdev->dev);
@@ -304,8 +303,9 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root,
 	struct eeh_dev *edev, *tmp;
 
 	pr_info("EEH: Beginning: '%s'\n", name);
-	eeh_for_each_pe(root, pe) eeh_pe_for_each_dev(pe, edev, tmp)
-		eeh_pe_report_edev(edev, fn, result);
+	eeh_for_each_pe(root, pe)
+		eeh_pe_for_each_dev(pe, edev, tmp)
+			eeh_pe_report_edev(edev, fn, result);
 	if (result)
 		pr_info("EEH: Finished:'%s' with aggregate recovery state:'%s'\n",
 			name, pci_ers_result_name(*result));
@@ -383,6 +383,8 @@ static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
 	if (!edev)
 		return;
 
+	pci_lock_rescan_remove();
+
 	/*
 	 * The content in the config space isn't saved because
 	 * the blocked config space on some adapters. We have
@@ -393,14 +395,19 @@ static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
 		if (list_is_last(&edev->entry, &edev->pe->edevs))
 			eeh_pe_restore_bars(edev->pe);
 
+		pci_unlock_rescan_remove();
 		return;
 	}
 
 	pdev = eeh_dev_to_pci_dev(edev);
-	if (!pdev)
+	if (!pdev) {
+		pci_unlock_rescan_remove();
 		return;
+	}
 
 	pci_restore_state(pdev);
+
+	pci_unlock_rescan_remove();
 }
 
 /**
@@ -647,9 +654,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	if (any_passed || driver_eeh_aware || (pe->type & EEH_PE_VF)) {
 		eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data);
 	} else {
-		pci_lock_rescan_remove();
 		pci_hp_remove_devices(bus);
-		pci_unlock_rescan_remove();
 	}
 
 	/*
@@ -665,8 +670,6 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	if (rc)
 		return rc;
 
-	pci_lock_rescan_remove();
-
 	/* Restore PE */
 	eeh_ops->configure_bridge(pe);
 	eeh_pe_restore_bars(pe);
@@ -674,7 +677,6 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	/* Clear frozen state */
 	rc = eeh_clear_pe_frozen_state(pe, false);
 	if (rc) {
-		pci_unlock_rescan_remove();
 		return rc;
 	}
 
@@ -709,7 +711,6 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	pe->tstamp = tstamp;
 	pe->freeze_count = cnt;
 
-	pci_unlock_rescan_remove();
 	return 0;
 }
 
@@ -843,10 +844,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		{LIST_HEAD_INIT(rmv_data.removed_vf_list), 0};
 	int devices = 0;
 
+	pci_lock_rescan_remove();
+
 	bus = eeh_pe_bus_get(pe);
 	if (!bus) {
 		pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
 			__func__, pe->phb->global_number, pe->addr);
+		pci_unlock_rescan_remove();
 		return;
 	}
 
@@ -1094,10 +1098,15 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
 		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 
-		pci_lock_rescan_remove();
-		pci_hp_remove_devices(bus);
-		pci_unlock_rescan_remove();
+		bus = eeh_pe_bus_get(pe);
+		if (bus)
+			pci_hp_remove_devices(bus);
+		else
+			pr_err("%s: PCI bus for PHB#%x-PE#%x disappeared\n",
+				__func__, pe->phb->global_number, pe->addr);
+
 		/* The passed PE should no longer be used */
+		pci_unlock_rescan_remove();
 		return;
 	}
 
@@ -1114,6 +1123,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			eeh_clear_slot_attention(edev->pdev);
 
 	eeh_pe_state_clear(pe, EEH_PE_RECOVERING, true);
+
+	pci_unlock_rescan_remove();
 }
 
 /**
@@ -1132,6 +1143,7 @@ void eeh_handle_special_event(void)
 	unsigned long flags;
 	int rc;
 
+	pci_lock_rescan_remove();
 
 	do {
 		rc = eeh_ops->next_error(&pe);
@@ -1171,10 +1183,12 @@ void eeh_handle_special_event(void)
 
 			break;
 		case EEH_NEXT_ERR_NONE:
+			pci_unlock_rescan_remove();
 			return;
 		default:
 			pr_warn("%s: Invalid value %d from next_error()\n",
 				__func__, rc);
+			pci_unlock_rescan_remove();
 			return;
 		}
 
@@ -1186,7 +1200,9 @@ void eeh_handle_special_event(void)
 		if (rc == EEH_NEXT_ERR_FROZEN_PE ||
 		    rc == EEH_NEXT_ERR_FENCED_PHB) {
 			eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
+			pci_unlock_rescan_remove();
 			eeh_handle_normal_event(pe);
+			pci_lock_rescan_remove();
 		} else {
 			eeh_for_each_pe(pe, tmp_pe)
 				eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
@@ -1199,7 +1215,6 @@ void eeh_handle_special_event(void)
 				eeh_report_failure, NULL);
 			eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 
-			pci_lock_rescan_remove();
 			list_for_each_entry(hose, &hose_list, list_node) {
 				phb_pe = eeh_phb_pe_get(hose);
 				if (!phb_pe ||
@@ -1218,7 +1233,6 @@ void eeh_handle_special_event(void)
 				}
 				pci_hp_remove_devices(bus);
 			}
-			pci_unlock_rescan_remove();
 		}
 
 		/*
@@ -1228,4 +1242,6 @@ void eeh_handle_special_event(void)
 		if (rc == EEH_NEXT_ERR_DEAD_IOC)
 			break;
 	} while (rc != EEH_NEXT_ERR_NONE);
+
+	pci_unlock_rescan_remove();
 }
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index d283d281d28e..e740101fadf3 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -671,10 +671,12 @@ static void eeh_bridge_check_link(struct eeh_dev *edev)
 	eeh_ops->write_config(edev, cap + PCI_EXP_LNKCTL, 2, val);
 
 	/* Check link */
-	if (!edev->pdev->link_active_reporting) {
-		eeh_edev_dbg(edev, "No link reporting capability\n");
-		msleep(1000);
-		return;
+	if (edev->pdev) {
+		if (!edev->pdev->link_active_reporting) {
+			eeh_edev_dbg(edev, "No link reporting capability\n");
+			msleep(1000);
+			return;
+		}
 	}
 
 	/* Wait the link is up until timeout (5s) */
-- 
2.39.5

From nobody Tue Oct 7 01:58:30 2025
Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A1511DC9B8; Tue, 15 Jul 2025 21:39:09 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=23.155.224.40
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615551; cv=none; b=aPpHfpZu9UHes/Hg9XqrPWEzw6BDOob/1nXzVrjUq+7ySsbfqr9UhfKAmAOQsJWq/8alH8/LJ/2BU/TjgOMysL/i5zQq5vXdga7ksYYQV9OhYYcbtMX8TMMr/+G8+hvPJeroWstDEumbTem9AB1ZL5aBfPUwGZ0s+Ut93KvUuwA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615551; c=relaxed/simple; bh=UIpyUk1ArThj4w4ht64zUSfWFZzv5915C1NSvpvwuxA=; h=Date:From:To:Cc:Message-ID:In-Reply-To:References:Subject: MIME-Version:Content-Type; b=cdokaV5887LiGzjNM7ve3Lunb7FCnHVJ7CWbWRzPp4ZhhfI4gaiOxUDlT+A4ayAExHfFA0n04mHblFAT13w8HlhGGXvqivEkY/ywgPZd/sUtfOsU4rIxL/Z+5tWdhiA/WlvN8qfL6Gq4y3x1/mrKOuaCeHTwW8sVlgfJuyUsy/E=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com; spf=pass smtp.mailfrom=raptorengineering.com; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b=NqKmlqCb; arc=none smtp.client-ip=23.155.224.40
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none)
header.from=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b="NqKmlqCb" Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 902FC8287698; Tue, 15 Jul 2025 16:39:08 -0500 (CDT) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 5U_tYvSKFoQ6; Tue, 15 Jul 2025 16:39:07 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 47CD68288591; Tue, 15 Jul 2025 16:39:07 -0500 (CDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 47CD68288591 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1752615547; bh=8PjzjruFthJ/9WkYWAPNc3CMtV9pv0s1uiS/VXTCoIY=; h=Date:From:To:Message-ID:MIME-Version; b=NqKmlqCb9OR06Ks97mDo2JSnewgMuR2fKo0w3RKxNZNf1DqsvtQIgGruixPvIsNYd VSStFlkxfJqR0Y0qAoDiJOaCyGeHRa7y8k4CC77fjM710BMfVRd0J3h+s25/gtH8Pb gvjUKpnCcUWccv+eRoC3I4C6HNGeanl7DZpPfn/0= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id gc4u1bx2RXq7; Tue, 15 Jul 2025 16:39:07 -0500 (CDT) Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 118AF8287698; Tue, 15 Jul 2025 16:39:07 -0500 (CDT) Date: Tue, 15 Jul 2025 16:39:06 -0500 (CDT) From: Timothy Pearson To: Timothy Pearson Cc: linuxppc-dev , linux-kernel , linux-pci , Madhavan Srinivasan , Michael Ellerman , christophe leroy , Naveen N Rao , Bjorn Helgaas , Shawn Anastasio Message-ID: <171044224.1359864.1752615546988.JavaMail.zimbra@raptorengineeringinc.com> In-Reply-To: 
<1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com>
References: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com>
Subject: [PATCH v3 5/6] PCI: pnv_php: Fix surprise plug detection and recovery
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC138 (Linux)/8.5.0_GA_3042)
Thread-Topic: pnv_php: Fix surprise plug detection and recovery
Thread-Index: XyF2OaMn/3q+H+nwsGaxXLVF4U4PFwXmHrh/
Content-Type: text/plain; charset="utf-8"

The existing PowerNV hotplug code did not handle surprise plug events
correctly, leading to a complete failure of the hotplug system after
device removal and a required reboot to detect new devices.

This comes down to two issues:

 1.) When a device is surprise removed, oftentimes the bridge upstream
     port will cause a PE freeze on the PHB. If this freeze is not
     cleared, the MSI interrupts from the bridge hotplug notification
     logic will not be received by the kernel, stalling all plug events
     on all slots associated with the PE.

 2.) When a device is removed from a slot, regardless of surprise or
     programmatic removal, the associated PHB/PE is left frozen.
     If this freeze is not cleared via a fundamental reset, skiboot
     is unable to clear the freeze and cannot retrain / rescan the
     slot. This also requires a reboot to clear the freeze and
     redetect the device in the slot.

Issue the appropriate unfreeze and rescan commands on hotplug events,
and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.

Signed-off-by: Timothy Pearson Acked-by: Bjorn Helgaas Tested-by: Ganesh Goudar --- arch/powerpc/kernel/pci-hotplug.c | 3 + drivers/pci/hotplug/pnv_php.c | 108 +++++++++++++++++++++++++++++- 2 files changed, 108 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-ho= tplug.c index 9ea74973d78d..6f444d0822d8 100644 --- a/arch/powerpc/kernel/pci-hotplug.c +++ b/arch/powerpc/kernel/pci-hotplug.c @@ -141,6 +141,9 @@ void pci_hp_add_devices(struct pci_bus *bus) struct pci_controller *phb; struct device_node *dn =3D pci_bus_to_OF_node(bus); =20 + if (!dn) + return; + phb =3D pci_bus_to_host(bus); =20 mode =3D PCI_PROBE_NORMAL; diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index bac8af3df41a..3533f7f23b71 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -4,12 +4,14 @@ * * Copyright Gavin Shan, IBM Corporation 2016. * Copyright (C) 2025 Raptor Engineering, LLC + * Copyright (C) 2025 Raptor Computing Systems, LLC */ =20 #include #include #include #include +#include #include #include =20 @@ -469,6 +471,59 @@ static int pnv_php_set_attention_state(struct hotplug_= slot *slot, u8 state) return 0; } =20 +static int pnv_php_activate_slot(struct pnv_php_slot *php_slot, + struct hotplug_slot *slot) +{ + int ret, i; + + /* + * Issue initial slot activation command to firmware + * + * Firmware will power slot on, attempt to train the link, and discover a= ny downstream devices + * If this process fails, firmware will return an error code and an inval= id device tree + * Failure can be caused for multiple reasons, including a faulty downstr= eam device, + * poor connection to the downstream device, or a previously latched PHB = fence. + * On failure, issue fundamental reset up to three times before aborting. 
+ */ + ret =3D pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON); + if (ret) { + SLOT_WARN( + php_slot, + "PCI slot activation failed with error code %d, possible frozen PHB", + ret); + SLOT_WARN( + php_slot, + "Attempting complete PHB reset before retrying slot activation\n"); + for (i =3D 0; i < 3; i++) { + /* + * Slot activation failed, PHB may be fenced from a + * prior device failure. + * + * Use the OPAL fundamental reset call to both try a + * device reset and clear any potentially active PHB + * fence / freeze. + */ + SLOT_WARN(php_slot, "Try %d...\n", i + 1); + pci_set_pcie_reset_state(php_slot->pdev, + pcie_warm_reset); + msleep(250); + pci_set_pcie_reset_state(php_slot->pdev, + pcie_deassert_reset); + + ret =3D pnv_php_set_slot_power_state( + slot, OPAL_PCI_SLOT_POWER_ON); + if (!ret) + break; + } + + if (i >=3D 3) + SLOT_WARN(php_slot, + "Failed to bring slot online, aborting!\n"); + } + + return ret; +} + static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan) { struct hotplug_slot *slot =3D &php_slot->slot; @@ -531,7 +586,7 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot= , bool rescan) goto scan; =20 /* Power is off, turn it on and then scan the slot */ - ret =3D pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON); + ret =3D pnv_php_activate_slot(php_slot, slot); if (ret) return ret; =20 @@ -836,16 +891,63 @@ static int pnv_php_enable_msix(struct pnv_php_slot *p= hp_slot) return entry.vector; } =20 +static void +pnv_php_detect_clear_suprise_removal_freeze(struct pnv_php_slot *php_slot) +{ + struct pci_dev *pdev =3D php_slot->pdev; + struct eeh_dev *edev; + struct eeh_pe *pe; + int i, rc; + + /* + * When a device is surprise removed from a downstream bridge slot, + * the upstream bridge port can still end up frozen due to related EEH + * events, which will in turn block the MSI interrupts for slot hotplug + * detection. + * + * Detect and thaw any frozen upstream PE after slot deactivation... 
+ */ + edev =3D pci_dev_to_eeh_dev(pdev); + pe =3D edev ? edev->pe : NULL; + rc =3D eeh_pe_get_state(pe); + if ((rc =3D=3D -ENODEV) || (rc =3D=3D -ENOENT)) { + SLOT_WARN( + php_slot, + "Upstream bridge PE state unknown, hotplug detect may fail\n"); + } else { + if (pe->state & EEH_PE_ISOLATED) { + SLOT_WARN( + php_slot, + "Upstream bridge PE %02x frozen, thawing...\n", + pe->addr); + for (i =3D 0; i < 3; i++) + if (!eeh_unfreeze_pe(pe)) + break; + if (i >=3D 3) + SLOT_WARN( + php_slot, + "Unable to thaw PE %02x, hotplug detect will fail!\n", + pe->addr); + else + SLOT_WARN(php_slot, + "PE %02x thawed successfully\n", + pe->addr); + } + } +} + static void pnv_php_event_handler(struct work_struct *work) { struct pnv_php_event *event =3D container_of(work, struct pnv_php_event, work); struct pnv_php_slot *php_slot =3D event->php_slot; =20 - if (event->added) + if (event->added) { pnv_php_enable_slot(&php_slot->slot); - else + } else { pnv_php_disable_slot(&php_slot->slot); + pnv_php_detect_clear_suprise_removal_freeze(php_slot); + } =20 kfree(event); } --=20 2.39.5 From nobody Tue Oct 7 01:58:30 2025 Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 276E91C4A24; Tue, 15 Jul 2025 21:39:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=23.155.224.40 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615587; cv=none; b=X5PBiwdNSsCOVjLq9I3FlnKkoeVzx+1mFCZrAwg+duGxAhLTCMTsnfBgJLdOGmsBwHuKkJw2vj37HblvLm5hoL7P3BVr97y1XSKdEzWMTEo0B2kxUCv6Do7+Y4prvANoL+tF/kOnB90LbkyoBIaEiDTaXQ+KFzqEX6QPxBL26QE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752615587; c=relaxed/simple; bh=Lhc/UTs9aXQrD4O3R05g/Q2eQFWdW6oU8etPZNg2KyM=; h=Date:From:To:Cc:Message-ID:In-Reply-To:References:Subject: 
MIME-Version:Content-Type; b=bDnVERirGSVaIrJlJD6aV9iC0/YC0EGV4pCy/PWVl+Qjz4pN6XVIOvoNWZzSiG6GYZ57Qmq5AYBEKTs1cVcoqyEPBaDNnifxywoerNo9IY4cSuZ09m2sxqUE+B7ueaYkPGvuXEGAGlTT0nLN4MwANoWyDifQvD8r6p6Kb9sICqA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com; spf=pass smtp.mailfrom=raptorengineering.com; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b=D3iAdAgV; arc=none smtp.client-ip=23.155.224.40 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=raptorengineering.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=raptorengineering.com header.i=@raptorengineering.com header.b="D3iAdAgV" Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 57CE08286D62; Tue, 15 Jul 2025 16:39:43 -0500 (CDT) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id tmyS4NZMKEi3; Tue, 15 Jul 2025 16:39:42 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 5087B8287698; Tue, 15 Jul 2025 16:39:42 -0500 (CDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 5087B8287698 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1752615582; bh=R/JTJdtwW+QkveuipqtnZ2LcFEWOkXpaagYwjf7bvsU=; h=Date:From:To:Message-ID:MIME-Version; b=D3iAdAgVu/nXdic/B0Uspr+WTRVXdaQpR4f6o+2w4RVT6N/HZoVhElCyeQ8BlqXcW 29fuWOnScL8FRtP3mVPwyBGfmxY3fcBB5kC8gADmcvoOr0s6URKYGJcrkPAZyM81kv pSqrk6Dsk01Muk1HWcjk7wIJlQ21qpkt0ZThnhgg= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP 
id 1e5Fx11LJPWI; Tue, 15 Jul 2025 16:39:42 -0500 (CDT)
Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 173C08286D62; Tue, 15 Jul 2025 16:39:42 -0500 (CDT)
Date: Tue, 15 Jul 2025 16:39:42 -0500 (CDT)
From: Timothy Pearson
To: Timothy Pearson
Cc: linuxppc-dev , linux-kernel , linux-pci , Madhavan Srinivasan , Michael Ellerman , christophe leroy , Naveen N Rao , Bjorn Helgaas , Shawn Anastasio
Message-ID: <1210309411.1359866.1752615582001.JavaMail.zimbra@raptorengineeringinc.com>
In-Reply-To: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com>
References: <1268570622.1359844.1752615109932.JavaMail.zimbra@raptorengineeringinc.com>
Subject: [PATCH v3 6/6] PCI: pnv_php: Enable third attention indicator state
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC138 (Linux)/8.5.0_GA_3042)
Thread-Topic: pnv_php: Enable third attention indicator state
Thread-Index: XyF2OaMn/3q+H+nwsGaxXLVF4U4PFzXTlJEB
Content-Type: text/plain; charset="utf-8"

The PCIe specification allows three attention indicator states: on,
off, and blink. Enable all three states instead of basic on / off
control.

This changes the userspace API to match the behavior of pcihp.

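For illustration only (not part of the patch): a minimal userspace
sketch of how the three states could map onto the Attention Indicator
Control field of the Slot Control register. The register constants are
copied from include/uapi/linux/pci_regs.h. field_prep16() and
attention_bits() are hypothetical helper names, standing in for the
kernel's FIELD_PREP() and for the branch this patch changes in
pnv_php_set_attention_state().

```c
#include <assert.h>
#include <stdint.h>

/* Slot Control bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCTL_AIC            0x00c0 /* Attention Indicator Control */
#define PCI_EXP_SLTCTL_ATTN_IND_ON    0x0040 /* Attention Indicator on */
#define PCI_EXP_SLTCTL_ATTN_IND_BLINK 0x0080 /* Attention Indicator blinking */
#define PCI_EXP_SLTCTL_ATTN_IND_OFF   0x00c0 /* Attention Indicator off */

/* Hypothetical userspace stand-in for the kernel's FIELD_PREP() macro */
static uint16_t field_prep16(uint16_t mask, uint16_t val)
{
	assert(mask != 0);
	/* (mask & -mask) isolates the lowest set bit of the field mask */
	return (uint16_t)((val * (mask & -mask)) & mask);
}

/* Hypothetical helper mirroring the patched if/else on "state" */
static uint16_t attention_bits(uint8_t state)
{
	if (state)
		return field_prep16(PCI_EXP_SLTCTL_AIC, state);
	return PCI_EXP_SLTCTL_ATTN_IND_OFF;
}
```

Under these assumptions, attention_bits(1) yields
PCI_EXP_SLTCTL_ATTN_IND_ON, attention_bits(2) yields
PCI_EXP_SLTCTL_ATTN_IND_BLINK, and attention_bits(0) yields
PCI_EXP_SLTCTL_ATTN_IND_OFF.
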
Signed-off-by: Timothy Pearson Acked-by: Bjorn Helgaas Tested-by: Ganesh Goudar --- drivers/pci/hotplug/pnv_php.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index 3533f7f23b71..c65460ced862 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -441,10 +441,23 @@ static int pnv_php_get_adapter_state(struct hotplug_s= lot *slot, u8 *state) return ret; } =20 +static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8 = *state) +{ + struct pnv_php_slot *php_slot =3D to_pnv_php_slot(slot); + struct pci_dev *bridge =3D php_slot->pdev; + u16 status; + + pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status); + *state =3D (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6; + return 0; +} + + static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *stat= e) { struct pnv_php_slot *php_slot =3D to_pnv_php_slot(slot); =20 + pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state); *state =3D php_slot->attention_state; return 0; } @@ -462,7 +475,7 @@ static int pnv_php_set_attention_state(struct hotplug_s= lot *slot, u8 state) mask =3D PCI_EXP_SLTCTL_AIC; =20 if (state) - new =3D PCI_EXP_SLTCTL_ATTN_IND_ON; + new =3D FIELD_PREP(PCI_EXP_SLTCTL_AIC, state); else new =3D PCI_EXP_SLTCTL_ATTN_IND_OFF; =20 --=20 2.39.5