From nobody Fri Dec 19 16:06:57 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0066BC53210 for ; Tue, 3 Jan 2023 16:55:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237834AbjACQzz (ORCPT ); Tue, 3 Jan 2023 11:55:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38146 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233476AbjACQzu (ORCPT ); Tue, 3 Jan 2023 11:55:50 -0500 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71A8EB30; Tue, 3 Jan 2023 08:55:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1672764949; x=1704300949; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=lCf3CkygM0RRymks3SyRZKTWmUhd9Lv3qTfO+5Y+XeQ=; b=jGzcuX5FopK9dGp4YeznBafHxbzwqgqlguEZ2IDRTrRb6zUinI0GZZRs aDbBXAIYs3ChsK/GE2BkqYZ5XDeCGHTUuocdx7+qTdqD6GAHDml5zLcdO NYyaRlw4vpeqDBdoxwBnhLWODSjjEK7HuO8bSDSn9jtRkoDLTvBr74uDO 6U2U/uRj1+Lia+tqRtNNWhkHmRrifsqrstiShdyYJmfxo67cRbg5uOrpV qVLoFJZIry2ek8ucx0hhkKHpf7G8E2apshL5xGS4iNGA4LFVH0/lblknu kQ/nhDcM5MmsWJ00+co9SadLaHOxaJqDxmV/fKeR2QzqcVMnZcec6+tA0 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10579"; a="301385066" X-IronPort-AV: E=Sophos;i="5.96,297,1665471600"; d="scan'208";a="301385066" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Jan 2023 08:55:48 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10579"; a="656820695" X-IronPort-AV: E=Sophos;i="5.96,297,1665471600"; d="scan'208";a="656820695" Received: from unknown (HELO rajath-NUC10i7FNH..) ([10.223.165.88]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Jan 2023 08:55:46 -0800 From: Rajat Khandelwal To: ruscur@russell.cc, oohall@gmail.com, bhelgaas@google.com Cc: linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, rajat.khandelwal@intel.com, Rajat Khandelwal Subject: [PATCH] PCI/AER: Rate limit the reporting of the correctable errors Date: Tue, 3 Jan 2023 22:25:48 +0530 Message-Id: <20230103165548.570377-1-rajat.khandelwal@linux.intel.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" There are many instances where correctable errors tend to inundate the message buffer. We observe such instances during thunderbolt PCIe tunneling. It's true that they are mitigated by the hardware and are non-fatal but we shouldn't be spamming the logs with such correctable errors as it confuses other kernel developers less familiar with PCI errors, support staff, and users who happen to look at the logs, hence rate limit them. A typical example log inside an HP TBT4 dock: [54912.661142] pcieport 0000:00:07.0: AER: Multiple Corrected error receive= d: 0000:2b:00.0 [54912.661194] igc 0000:2b:00.0: PCIe Bus Error: severity=3DCorrected, type= =3DData Link Layer, (Transmitter ID) [54912.661203] igc 0000:2b:00.0: device [8086:5502] error status/mask=3D0= 0001100/00002000 [54912.661211] igc 0000:2b:00.0: [ 8] Rollover [54912.661219] igc 0000:2b:00.0: [12] Timeout [54982.838760] pcieport 0000:00:07.0: AER: Corrected error received: 0000:2= b:00.0 [54982.838798] igc 0000:2b:00.0: PCIe Bus Error: severity=3DCorrected, type= =3DData Link Layer, (Transmitter ID) [54982.838808] igc 0000:2b:00.0: device [8086:5502] error status/mask=3D0= 0001000/00002000 [54982.838817] igc 0000:2b:00.0: [12] Timeout This gets repeated continuously, thus inundating the buffer. Signed-off-by: Rajat Khandelwal --- drivers/pci/pcie/aer.c | 54 +++++++++++++++++++++++++++--------------- include/linux/pci.h | 3 +++ 2 files changed, 38 insertions(+), 19 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index e2d8a74f83c3..7ae6761a8e59 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -684,23 +684,24 @@ static void __aer_print_error(struct pci_dev *dev, { const char **strings; unsigned long status =3D info->status & ~info->mask; - const char *level, *errmsg; + const char *errmsg; int i; =20 - if (info->severity =3D=3D AER_CORRECTABLE) { + if (info->severity =3D=3D AER_CORRECTABLE) strings =3D aer_correctable_error_string; - level =3D KERN_WARNING; - } else { + else strings =3D aer_uncorrectable_error_string; - level =3D KERN_ERR; - } =20 for_each_set_bit(i, &status, 32) { errmsg =3D strings[i]; if (!errmsg) errmsg =3D "Unknown Error Bit"; =20 - pci_printk(level, dev, " [%2d] %-22s%s\n", i, errmsg, + if (info->severity =3D=3D AER_CORRECTABLE) + pci_warn_ratelimited(dev, " [%2d] %-22s%s\n", i, errmsg, + info->first_error =3D=3D i ? " (First)" : ""); + else + pci_err(dev, " [%2d] %-22s%s\n", i, errmsg, info->first_error =3D=3D i ? " (First)" : ""); } pci_dev_aer_stats_incr(dev, info); @@ -710,7 +711,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_er= r_info *info) { int layer, agent; int id =3D ((dev->bus->number << 8) | dev->devfn); - const char *level; =20 if (!info->status) { pci_err(dev, "PCIe Bus Error: severity=3D%s, type=3DInaccessible, (Unreg= istered Agent ID)\n", @@ -721,14 +718,21 @@ void aer_print_error(struct pci_dev *dev, struct aer_= err_info *info) layer =3D AER_GET_LAYER_ERROR(info->severity, info->status); agent =3D AER_GET_AGENT(info->severity, info->status); =20 - level =3D (info->severity =3D=3D AER_CORRECTABLE) ? KERN_WARNING : KERN_E= RR; + if (info->severity =3D=3D AER_CORRECTABLE) { + pci_warn_ratelimited(dev, "PCIe Bus Error: severity=3D%s, type=3D%s, (%s= )\n", + aer_error_severity_string[info->severity], + aer_error_layer[layer], aer_agent_string[agent]); =20 - pci_printk(level, dev, "PCIe Bus Error: severity=3D%s, type=3D%s, (%s)\n", - aer_error_severity_string[info->severity], - aer_error_layer[layer], aer_agent_string[agent]); + pci_warn_ratelimited(dev, " device [%04x:%04x] error status/mask=3D%08x= /%08x\n", + dev->vendor, dev->device, info->status, info->mask); + } else { + pci_err(dev, "PCIe Bus Error: severity=3D%s, type=3D%s, (%s)\n", + aer_error_severity_string[info->severity], + aer_error_layer[layer], aer_agent_string[agent]); =20 - pci_printk(level, dev, " device [%04x:%04x] error status/mask=3D%08x/%08= x\n", - dev->vendor, dev->device, info->status, info->mask); + pci_err(dev, " device [%04x:%04x] error status/mask=3D%08x/%08x\n", + dev->vendor, dev->device, info->status, info->mask); + } =20 __aer_print_error(dev, info); =20 @@ -748,11 +755,19 @@ static void aer_print_port_info(struct pci_dev *dev, = struct aer_err_info *info) u8 bus =3D info->id >> 8; u8 devfn =3D info->id & 0xff; =20 - pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n", - info->multi_error_valid ? "Multiple " : "", - aer_error_severity_string[info->severity], - pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), - PCI_FUNC(devfn)); + if (info->severity =3D=3D AER_CORRECTABLE) + pci_info_ratelimited(dev, "%s%s error received: %04x:%02x:%02x.%d\n", + info->multi_error_valid ? "Multiple " : "", + aer_error_severity_string[info->severity], + pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), + PCI_FUNC(devfn)); + else + pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n", + info->multi_error_valid ? "Multiple " : "", + aer_error_severity_string[info->severity], + pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), + PCI_FUNC(devfn)); + } =20 #ifdef CONFIG_ACPI_APEI_PCIEAER diff --git a/include/linux/pci.h b/include/linux/pci.h index 060af91bafcd..d9434bae10c8 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2491,6 +2491,9 @@ void pci_uevent_ers(struct pci_dev *pdev, enum pci_e= rs_result err_type); #define pci_info_ratelimited(pdev, fmt, arg...) \ dev_info_ratelimited(&(pdev)->dev, fmt, ##arg) =20 +#define pci_warn_ratelimited(pdev, fmt, arg...) \ + dev_warn_ratelimited(&(pdev)->dev, fmt, ##arg) + #define pci_WARN(pdev, condition, fmt, arg...) \ WARN(condition, "%s %s: " fmt, \ dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg) --=20 2.34.1