From: Yury Murashka
Date: Mon, 18 May 2026 14:23:36 +0100
Subject: [PATCH] PCI/AER: Clear non-fatal errors on AER recovery failure
To: bhelgaas@google.com, mahesh@linux.ibm.com
Cc: oohall@gmail.com, corbet@lwn.net, skhan@linuxfoundation.org,
 linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 Yury Murashka
X-Mailing-List: linux-kernel@vger.kernel.org

pci_aer_clear_nonfatal_status() is not called when AER recovery fails. If a
new AER error is subsequently reported, the AER driver calls
find_source_device() to find the source of the error.
It rescans the whole bus and picks the first device reporting an AER error.
Because the previous error was never cleared, the error is attributed to the
wrong device and AER recovery is started for the wrong device.

Add a kernel boot parameter pci=aer_clear_on_recovery_failure to clear AER
error status even when recovery fails, preventing stale errors from causing
incorrect device identification on subsequent AER events.

Signed-off-by: Yury Murashka
---
 Documentation/admin-guide/kernel-parameters.txt |  5 +++++
 drivers/pci/pci.c                               |  2 ++
 drivers/pci/pci.h                               |  2 ++
 drivers/pci/pcie/err.c                          | 13 +++++++++++++
 4 files changed, 22 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4d0f545fb..5a9e266f5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5301,6 +5301,11 @@ Kernel parameters
 		nomio		[S390] Do not use MIO instructions.
 		norid		[S390] ignore the RID field and force use of
 				one PCI domain per PCI function
+		aer_clear_on_recovery_failure
+				[PCIE] If the PCIEAER kernel config parameter is
+				enabled, this kernel boot option can be used to
+				enable AER errors cleanup even if error recovery
+				failed.
 		notph		[PCIE] If the PCIE_TPH kernel config parameter is
 				enabled, this kernel boot option can be used to
 				disable PCIe TLP Processing Hints support
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d34266651..701459c62 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6769,6 +6769,8 @@ static int __init pci_setup(char *str)
 				disable_acs_redir_param = str + 18;
 			} else if (!strncmp(str, "config_acs=", 11)) {
 				config_acs_param = str + 11;
+			} else if (!strncmp(str, "aer_clear_on_recovery_failure", 29)) {
+				pci_enable_aer_clear_on_recovery_failure();
 			} else {
 				pr_err("PCI: Unknown option `%s'\n", str);
 			}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4a14f88e5..093a7c896 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1292,6 +1292,7 @@ int pci_aer_clear_status(struct pci_dev *dev);
 int pci_aer_raw_clear_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
+void pci_enable_aer_clear_on_recovery_failure(void);
 #else
 static inline void pci_no_aer(void) { }
 static inline void pci_aer_init(struct pci_dev *d) { }
@@ -1301,6 +1302,7 @@ static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline void pci_save_aer_state(struct pci_dev *dev) { }
 static inline void pci_restore_aer_state(struct pci_dev *dev) { }
+static inline void pci_enable_aer_clear_on_recovery_failure(void) { }
 #endif
 
 #ifdef CONFIG_ACPI
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index bebe4bc11..29d655a34 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -21,6 +21,13 @@
 #include "portdrv.h"
 #include "../pci.h"
 
+static int enable_aer_clear_on_recovery_failure;
+
+void pci_enable_aer_clear_on_recovery_failure(void)
+{
+	enable_aer_clear_on_recovery_failure = 1;
+}
+
 static pci_ers_result_t merge_result(enum pci_ers_result orig,
 				  enum pci_ers_result new)
 {
@@ -289,6 +296,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	return status;
 
 failed:
+	if (enable_aer_clear_on_recovery_failure &&
+	    (host->native_aer || pcie_ports_native)) {
+		pcie_clear_device_status(dev);
+		pci_aer_clear_nonfatal_status(dev);
+	}
+
 	pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
 
 	pci_walk_bridge(bridge, report_perm_failure_detected, NULL);
-- 
2.51.0
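
For illustration only, the misattribution described in the commit message
can be modelled in user space. The sketch below is not kernel code and
touches no PCI registers. struct model_dev, model_find_source(),
model_recovery_failed() and model_second_error_source() are hypothetical
names introduced here, and the scan is simplified to "first device with a
status bit still set", which is how the commit message characterises
find_source_device().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; not the kernel's struct pci_dev. */
struct model_dev {
	const char *name;
	bool nonfatal_status;	/* models an uncleared non-fatal AER status */
};

/* Simplified scan: return the first device whose status is still set. */
static const struct model_dev *model_find_source(const struct model_dev *bus,
						 size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (bus[i].nonfatal_status)
			return &bus[i];
	return NULL;
}

/* Models the failed-recovery path; clears status only when the option is on. */
static void model_recovery_failed(struct model_dev *dev, bool clear_on_failure)
{
	assert(dev != NULL);
	if (clear_on_failure)
		dev->nonfatal_status = false;
}

/*
 * dev0 reports an error and its recovery fails; dev1 then reports a new,
 * unrelated error. Returns the index of the device the scan blames for
 * the second error, or -1 if no device has status set.
 */
static int model_second_error_source(bool clear_on_failure)
{
	struct model_dev bus[] = {
		{ "dev0", false },
		{ "dev1", false },
	};
	const struct model_dev *src;

	bus[0].nonfatal_status = true;		/* first error on dev0 */
	model_recovery_failed(&bus[0], clear_on_failure);

	bus[1].nonfatal_status = true;		/* second error on dev1 */
	src = model_find_source(bus, sizeof(bus) / sizeof(bus[0]));
	return src ? (int)(src - bus) : -1;
}
```

In this model, model_second_error_source(false) blames dev0 (index 0)
because its stale status is found first, while
model_second_error_source(true) reaches dev1 (index 1), the device that
actually reported the second error.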