From: Tariq Toukan
To: Jiri Pirko, Jiri Pirko, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn, "David S. Miller"
CC: Donald Hunter, Jonathan Corbet, Brett Creeley, Michael Chan, Pavan Chebbi, "Cai Huoqing", Tony Nguyen, Przemek Kitszel, Sunil Goutham, Linu Cherian, Geetha sowjanya, Jerin Jacob, hariprasad, Subbaraya Sundeep, Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, Ido Schimmel, Petr Machata, Manish Chopra, "Gal Pressman", Dragos Tatulea, "Shahar Shitrit"
Subject: [PATCH net-next V3 1/5] devlink: Move graceful period parameter to reporter ops
Date: Wed, 13 Aug 2025 21:55:45 +0300
Message-ID: <1755111349-416632-2-git-send-email-tariqt@nvidia.com>
In-Reply-To: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
References: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Shahar Shitrit

Move the default graceful period from a parameter of
devlink_health_reporter_create() to a field in the
devlink_health_reporter_ops structure.

This change improves consistency, as the graceful period is inherently
tied to the reporter's behavior and recovery policy. It simplifies the
signature of devlink_health_reporter_create() and its internal helper
functions. It also centralizes the reporter configuration in the ops
structure, laying the groundwork for a downstream patch that will
introduce a devlink health reporter error burst period attribute whose
default value will similarly be provided by the driver via the ops
structure.

Signed-off-by: Shahar Shitrit
Reviewed-by: Jiri Pirko
Reviewed-by: Carolina Jubran
Signed-off-by: Tariq Toukan
---
 drivers/net/ethernet/amd/pds_core/main.c      |  2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_devlink.c |  2 +-
 .../net/ethernet/huawei/hinic/hinic_devlink.c | 10 +++--
 .../net/ethernet/intel/ice/devlink/health.c   |  3 +-
 .../marvell/octeontx2/af/rvu_devlink.c        | 32 +++++++++++----
 .../mellanox/mlx5/core/diag/reporter_vnic.c   |  2 +-
 .../mellanox/mlx5/core/en/reporter_rx.c       | 10 +++--
 .../mellanox/mlx5/core/en/reporter_tx.c       | 10 +++--
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  | 41 +++++++++++--------
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_devlink.c | 10 +++--
 drivers/net/netdevsim/health.c                |  4 +-
 include/net/devlink.h                         | 11 +++--
 net/devlink/health.c                          | 28 +++++--------
 15 files changed, 98 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c
index 9b81e1c260c2..c7a2eff57632 100644
--- a/drivers/net/ethernet/amd/pds_core/main.c
+++ b/drivers/net/ethernet/amd/pds_core/main.c
@@ -280,7 +280,7 @@ static int pdsc_init_pf(struct pdsc *pdsc)
 		goto err_out_del_dev;
 	}
 
-	hr = devl_health_reporter_create(dl, &pdsc_fw_reporter_ops, 0, pdsc);
+	hr = devl_health_reporter_create(dl, &pdsc_fw_reporter_ops, pdsc);
 	if (IS_ERR(hr)) {
 		devl_unlock(dl);
 		dev_warn(pdsc->dev, "Failed to create fw reporter: %pe\n", hr);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 4c4581b0342e..43fb75806cd6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -220,7 +220,7 @@ __bnxt_dl_reporter_create(struct bnxt *bp,
 {
 	struct devlink_health_reporter *reporter;
 
-	reporter = devlink_health_reporter_create(bp->dl, ops, 0, bp);
+	reporter = devlink_health_reporter_create(bp->dl, ops, bp);
 	if (IS_ERR(reporter)) {
 		netdev_warn(bp->dev, "Failed to create %s health reporter, rc = %ld\n",
 			    ops->name, PTR_ERR(reporter));
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_devlink.c b/drivers/net/ethernet/huawei/hinic/hinic_devlink.c
index 03e42512a2d5..300bc267a259 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_devlink.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_devlink.c
@@ -443,8 +443,9 @@ int hinic_health_reporters_create(struct hinic_devlink_priv *priv)
 	struct devlink *devlink = priv_to_devlink(priv);
 
 	priv->hw_fault_reporter =
-		devlink_health_reporter_create(devlink, &hinic_hw_fault_reporter_ops,
-					       0, priv);
+		devlink_health_reporter_create(devlink,
+					       &hinic_hw_fault_reporter_ops,
+					       priv);
 	if (IS_ERR(priv->hw_fault_reporter)) {
 		dev_warn(&priv->hwdev->hwif->pdev->dev, "Failed to create hw fault reporter, err: %ld\n",
 			 PTR_ERR(priv->hw_fault_reporter));
@@ -452,8 +453,9 @@ int hinic_health_reporters_create(struct hinic_devlink_priv *priv)
 	}
 
 	priv->fw_fault_reporter =
-		devlink_health_reporter_create(devlink, &hinic_fw_fault_reporter_ops,
-					       0, priv);
+		devlink_health_reporter_create(devlink,
+					       &hinic_fw_fault_reporter_ops,
+					       priv);
 	if (IS_ERR(priv->fw_fault_reporter)) {
 		dev_warn(&priv->hwdev->hwif->pdev->dev, "Failed to create fw fault reporter, err: %ld\n",
 			 PTR_ERR(priv->fw_fault_reporter));
diff --git a/drivers/net/ethernet/intel/ice/devlink/health.c b/drivers/net/ethernet/intel/ice/devlink/health.c
index ab519c0f28bf..8e9a8a8178d4 100644
--- a/drivers/net/ethernet/intel/ice/devlink/health.c
+++ b/drivers/net/ethernet/intel/ice/devlink/health.c
@@ -450,9 +450,8 @@ ice_init_devlink_rep(struct ice_pf *pf,
 {
 	struct devlink *devlink = priv_to_devlink(pf);
 	struct devlink_health_reporter *rep;
-	const u64 graceful_period = 0;
 
-	rep = devl_health_reporter_create(devlink, ops, graceful_period, pf);
+	rep = devl_health_reporter_create(devlink, ops, pf);
 	if (IS_ERR(rep)) {
 		struct device *dev = ice_pf_to_dev(pf);
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 27c3a2daaaa9..3735372539bd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -505,7 +505,9 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
 
 	rvu_reporters->nix_event_ctx = nix_event_context;
 	rvu_reporters->rvu_hw_nix_intr_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_intr_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_nix_intr_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_nix_intr_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_nix_intr reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_nix_intr_reporter));
@@ -513,7 +515,9 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
 	}
 
 	rvu_reporters->rvu_hw_nix_gen_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_gen_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_nix_gen_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_nix_gen_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_nix_gen reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_nix_gen_reporter));
@@ -521,7 +525,9 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
 	}
 
 	rvu_reporters->rvu_hw_nix_err_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_err_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_nix_err_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_nix_err_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_nix_err reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_nix_err_reporter));
@@ -529,7 +535,9 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
 	}
 
 	rvu_reporters->rvu_hw_nix_ras_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_ras_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_nix_ras_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_nix_ras_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_nix_ras reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_nix_ras_reporter));
@@ -1051,7 +1059,9 @@ static int rvu_npa_register_reporters(struct rvu_devlink *rvu_dl)
 
 	rvu_reporters->npa_event_ctx = npa_event_context;
 	rvu_reporters->rvu_hw_npa_intr_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_intr_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_npa_intr_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_npa_intr_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_npa_intr reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_npa_intr_reporter));
@@ -1059,7 +1069,9 @@ static int rvu_npa_register_reporters(struct rvu_devlink *rvu_dl)
 	}
 
 	rvu_reporters->rvu_hw_npa_gen_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_gen_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_npa_gen_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_npa_gen_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_npa_gen reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_npa_gen_reporter));
@@ -1067,7 +1079,9 @@ static int rvu_npa_register_reporters(struct rvu_devlink *rvu_dl)
 	}
 
 	rvu_reporters->rvu_hw_npa_err_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_err_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_npa_err_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_npa_err_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_npa_err reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_npa_err_reporter));
@@ -1075,7 +1089,9 @@ static int rvu_npa_register_reporters(struct rvu_devlink *rvu_dl)
 	}
 
 	rvu_reporters->rvu_hw_npa_ras_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_ras_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_npa_ras_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_npa_ras_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_npa_ras reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_npa_ras_reporter));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
index 86253a89c24c..878f9b46bf18 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
@@ -133,7 +133,7 @@ void mlx5_reporter_vnic_create(struct mlx5_core_dev *dev)
 	health->vnic_reporter =
 		devlink_health_reporter_create(devlink,
 					       &mlx5_reporter_vnic_ops,
-					       0, dev);
+					       dev);
 	if (IS_ERR(health->vnic_reporter))
 		mlx5_core_warn(dev,
 			       "Failed to create vnic reporter, err = %ld\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 16c44d628eda..1b9ea72abc5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -651,22 +651,24 @@ void mlx5e_reporter_icosq_resume_recovery(struct mlx5e_channel *c)
 	mutex_unlock(&c->icosq_recovery_lock);
 }
 
+#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
+
 static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
 	.name = "rx",
 	.recover = mlx5e_rx_reporter_recover,
 	.diagnose = mlx5e_rx_reporter_diagnose,
 	.dump = mlx5e_rx_reporter_dump,
+	.default_graceful_period = MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
 };
 
-#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
-
 void mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
 {
+	struct devlink_port *port = priv->netdev->devlink_port;
 	struct devlink_health_reporter *reporter;
 
-	reporter = devlink_port_health_reporter_create(priv->netdev->devlink_port,
+	reporter = devlink_port_health_reporter_create(port,
 						       &mlx5_rx_reporter_ops,
-						       MLX5E_REPORTER_RX_GRACEFUL_PERIOD, priv);
+						       priv);
 	if (IS_ERR(reporter)) {
 		netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
 			    PTR_ERR(reporter));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 85d5cb39b107..7a4a77f6fe6a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -539,22 +539,24 @@ void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq)
 	mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
 }
 
+#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+
 static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
 	.name = "tx",
 	.recover = mlx5e_tx_reporter_recover,
 	.diagnose = mlx5e_tx_reporter_diagnose,
 	.dump = mlx5e_tx_reporter_dump,
+	.default_graceful_period = MLX5_REPORTER_TX_GRACEFUL_PERIOD,
 };
 
-#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
-
 void mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
 {
+	struct devlink_port *port = priv->netdev->devlink_port;
 	struct devlink_health_reporter *reporter;
 
-	reporter = devlink_port_health_reporter_create(priv->netdev->devlink_port,
+	reporter = devlink_port_health_reporter_create(port,
 						       &mlx5_tx_reporter_ops,
-						       MLX5_REPORTER_TX_GRACEFUL_PERIOD, priv);
+						       priv);
 	if (IS_ERR(reporter)) {
 		netdev_warn(priv->netdev,
 			    "Failed to create tx reporter, err = %ld\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 63a7a788fb0d..b231e7855bca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1447,7 +1447,7 @@ static void mlx5e_rep_vnic_reporter_create(struct mlx5e_priv *priv,
 
 	reporter = devl_port_health_reporter_create(dl_port,
 						    &mlx5_rep_vnic_reporter_ops,
-						    0, rpriv);
+						    rpriv);
 	if (IS_ERR(reporter)) {
 		mlx5_core_err(priv->mdev,
 			      "Failed to create representor vnic reporter, err = %ld\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index cf7a1edd0530..6959fea03443 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -669,54 +669,61 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
 	}
 }
 
+#define MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD 180000
+#define MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD 60000
+#define MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD 30000
+#define MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD \
+	MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD
+
+static const
+struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ecpf_ops = {
+	.name = "fw_fatal",
+	.recover = mlx5_fw_fatal_reporter_recover,
+	.dump = mlx5_fw_fatal_reporter_dump,
+	.default_graceful_period =
+		MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD,
+};
+
 static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_pf_ops = {
 	.name = "fw_fatal",
 	.recover = mlx5_fw_fatal_reporter_recover,
 	.dump = mlx5_fw_fatal_reporter_dump,
+	.default_graceful_period = MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD,
 };
 
 static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
 	.name = "fw_fatal",
 	.recover = mlx5_fw_fatal_reporter_recover,
+	.default_graceful_period =
+		MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD,
 };
 
-#define MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD 180000
-#define MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD 60000
-#define MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD 30000
-#define MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD
-
 void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 {
 	const struct devlink_health_reporter_ops *fw_fatal_ops;
 	struct mlx5_core_health *health = &dev->priv.health;
 	const struct devlink_health_reporter_ops *fw_ops;
 	struct devlink *devlink = priv_to_devlink(dev);
-	u64 grace_period;
 
-	fw_fatal_ops = &mlx5_fw_fatal_reporter_pf_ops;
 	fw_ops = &mlx5_fw_reporter_pf_ops;
 	if (mlx5_core_is_ecpf(dev)) {
-		grace_period = MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD;
+		fw_fatal_ops = &mlx5_fw_fatal_reporter_ecpf_ops;
 	} else if (mlx5_core_is_pf(dev)) {
-		grace_period = MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD;
+		fw_fatal_ops = &mlx5_fw_fatal_reporter_pf_ops;
 	} else {
 		/* VF or SF */
-		grace_period = MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD;
 		fw_fatal_ops = &mlx5_fw_fatal_reporter_ops;
 		fw_ops = &mlx5_fw_reporter_ops;
 	}
 
-	health->fw_reporter =
-		devl_health_reporter_create(devlink, fw_ops, 0, dev);
+	health->fw_reporter = devl_health_reporter_create(devlink, fw_ops, dev);
 	if (IS_ERR(health->fw_reporter))
 		mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n",
 			       PTR_ERR(health->fw_reporter));
 
-	health->fw_fatal_reporter =
-		devl_health_reporter_create(devlink,
-					    fw_fatal_ops,
-					    grace_period,
-					    dev);
+	health->fw_fatal_reporter = devl_health_reporter_create(devlink,
+								fw_fatal_ops,
+								dev);
 	if (IS_ERR(health->fw_fatal_reporter))
 		mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %ld\n",
 			       PTR_ERR(health->fw_fatal_reporter));
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 2bb2b77351bd..980f3223f124 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2043,7 +2043,7 @@ static int mlxsw_core_health_init(struct mlxsw_core *mlxsw_core)
 		return 0;
 
 	fw_fatal = devl_health_reporter_create(devlink, &mlxsw_core_health_fw_fatal_ops,
-					       0, mlxsw_core);
+					       mlxsw_core);
 	if (IS_ERR(fw_fatal)) {
 		dev_err(mlxsw_core->bus_info->dev, "Failed to create fw fatal reporter");
 		return PTR_ERR(fw_fatal);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c
index 1adc7fbb3f2f..d000ed734c7c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c
@@ -87,20 +87,22 @@ qed_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
 	return 0;
 }
 
+#define QED_REPORTER_FW_GRACEFUL_PERIOD 0
+
 static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = {
 	.name = "fw_fatal",
 	.recover = qed_fw_fatal_reporter_recover,
 	.dump = qed_fw_fatal_reporter_dump,
+	.default_graceful_period = QED_REPORTER_FW_GRACEFUL_PERIOD,
 };
 
-#define QED_REPORTER_FW_GRACEFUL_PERIOD 0
-
 void qed_fw_reporters_create(struct devlink *devlink)
 {
 	struct qed_devlink *dl = devlink_priv(devlink);
 
-	dl->fw_reporter = devlink_health_reporter_create(devlink, &qed_fw_fatal_reporter_ops,
-							 QED_REPORTER_FW_GRACEFUL_PERIOD, dl);
+	dl->fw_reporter =
+		devlink_health_reporter_create(devlink,
+					       &qed_fw_fatal_reporter_ops, dl);
 	if (IS_ERR(dl->fw_reporter)) {
 		DP_NOTICE(dl->cdev, "Failed to create fw reporter, err = %ld\n",
 			  PTR_ERR(dl->fw_reporter));
diff --git a/drivers/net/netdevsim/health.c b/drivers/net/netdevsim/health.c
index 688f05316b5e..3bd0e7a489c3 100644
--- a/drivers/net/netdevsim/health.c
+++ b/drivers/net/netdevsim/health.c
@@ -183,14 +183,14 @@ int nsim_dev_health_init(struct nsim_dev *nsim_dev, struct devlink *devlink)
 	health->empty_reporter =
 		devl_health_reporter_create(devlink,
 					    &nsim_dev_empty_reporter_ops,
-					    0, health);
+					    health);
 	if (IS_ERR(health->empty_reporter))
 		return PTR_ERR(health->empty_reporter);
 
 	health->dummy_reporter =
 		devl_health_reporter_create(devlink,
 					    &nsim_dev_dummy_reporter_ops,
-					    0, health);
+					    health);
 	if (IS_ERR(health->dummy_reporter)) {
 		err = PTR_ERR(health->dummy_reporter);
 		goto err_empty_reporter_destroy;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 93640a29427c..a65aa24e8df4 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -742,6 +742,8 @@ enum devlink_health_reporter_state {
  *        if priv_ctx is NULL, run a full dump
  * @diagnose: callback to diagnose the current status
  * @test: callback to trigger a test event
+ * @default_graceful_period: default min time (in msec)
+	between recovery attempts
  */
 
 struct devlink_health_reporter_ops {
@@ -756,6 +758,7 @@ struct devlink_health_reporter_ops {
 			struct netlink_ext_ack *extack);
 	int (*test)(struct devlink_health_reporter *reporter,
 		    struct netlink_ext_ack *extack);
+	u64 default_graceful_period;
 };
 
 /**
@@ -1924,22 +1927,22 @@ void devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
 struct devlink_health_reporter *
 devl_port_health_reporter_create(struct devlink_port *port,
 				 const struct devlink_health_reporter_ops *ops,
-				 u64 graceful_period, void *priv);
+				 void *priv);
 
 struct devlink_health_reporter *
 devlink_port_health_reporter_create(struct devlink_port *port,
 				    const struct devlink_health_reporter_ops *ops,
-				    u64 graceful_period, void *priv);
+				    void *priv);
 
 struct devlink_health_reporter *
 devl_health_reporter_create(struct devlink *devlink,
 			    const struct devlink_health_reporter_ops *ops,
-			    u64 graceful_period, void *priv);
+			    void *priv);
 
 struct devlink_health_reporter *
 devlink_health_reporter_create(struct devlink *devlink,
 			       const struct devlink_health_reporter_ops *ops,
-			       u64 graceful_period, void *priv);
+			       void *priv);
 
 void
 devl_health_reporter_destroy(struct devlink_health_reporter *reporter);
diff --git a/net/devlink/health.c b/net/devlink/health.c
index b3ce8ecbb7fb..ba144b7426fa 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -108,11 +108,11 @@ devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port,
 static struct devlink_health_reporter *
 __devlink_health_reporter_create(struct devlink *devlink,
 				 const struct devlink_health_reporter_ops *ops,
-				 u64 graceful_period, void *priv)
+				 void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
-	if (WARN_ON(graceful_period && !ops->recover))
+	if (WARN_ON(ops->default_graceful_period && !ops->recover))
 		return ERR_PTR(-EINVAL);
 
 	reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
@@ -122,7 +122,7 @@ __devlink_health_reporter_create(struct devlink *devlink,
 	reporter->priv = priv;
 	reporter->ops = ops;
 	reporter->devlink = devlink;
-	reporter->graceful_period = graceful_period;
+	reporter->graceful_period = ops->default_graceful_period;
 	reporter->auto_recover = !!ops->recover;
 	reporter->auto_dump = !!ops->dump;
 	return reporter;
@@ -134,13 +134,12 @@ __devlink_health_reporter_create(struct devlink *devlink,
  *
  * @port: devlink_port to which health reports will relate
  * @ops: devlink health reporter ops
- * @graceful_period: min time (in msec) between recovery attempts
  * @priv: driver priv pointer
  */
 struct devlink_health_reporter *
 devl_port_health_reporter_create(struct devlink_port *port,
 				 const struct devlink_health_reporter_ops *ops,
-				 u64 graceful_period, void *priv)
+				 void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
@@ -150,8 +149,7 @@ devl_port_health_reporter_create(struct devlink_port *port,
 						   ops->name))
 		return ERR_PTR(-EEXIST);
 
-	reporter = __devlink_health_reporter_create(port->devlink, ops,
-						    graceful_period, priv);
+	reporter = __devlink_health_reporter_create(port->devlink, ops, priv);
 	if (IS_ERR(reporter))
 		return reporter;
 
@@ -164,14 +162,13 @@ EXPORT_SYMBOL_GPL(devl_port_health_reporter_create);
 struct devlink_health_reporter *
 devlink_port_health_reporter_create(struct devlink_port *port,
 				    const struct devlink_health_reporter_ops *ops,
-				    u64 graceful_period, void *priv)
+				    void *priv)
 {
 	struct devlink_health_reporter *reporter;
 	struct devlink *devlink = port->devlink;
 
 	devl_lock(devlink);
-	reporter = devl_port_health_reporter_create(port, ops,
-						    graceful_period, priv);
+	reporter = devl_port_health_reporter_create(port, ops, priv);
 	devl_unlock(devlink);
 	return reporter;
 }
@@ -182,13 +179,12 @@ EXPORT_SYMBOL_GPL(devlink_port_health_reporter_create);
  *
  * @devlink: devlink instance which the health reports will relate
  * @ops: devlink health reporter ops
- * @graceful_period: min time (in msec) between recovery attempts
  * @priv: driver priv pointer
  */
 struct devlink_health_reporter *
 devl_health_reporter_create(struct devlink *devlink,
 			    const struct devlink_health_reporter_ops *ops,
-			    u64 graceful_period, void *priv)
+			    void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
@@ -197,8 +193,7 @@ devl_health_reporter_create(struct devlink *devlink,
 	if (devlink_health_reporter_find_by_name(devlink, ops->name))
 		return ERR_PTR(-EEXIST);
 
-	reporter = __devlink_health_reporter_create(devlink, ops,
-						    graceful_period, priv);
+	reporter = __devlink_health_reporter_create(devlink, ops, priv);
 	if (IS_ERR(reporter))
 		return reporter;
 
@@ -210,13 +205,12 @@ EXPORT_SYMBOL_GPL(devl_health_reporter_create);
 struct devlink_health_reporter *
 devlink_health_reporter_create(struct devlink *devlink,
 			       const struct devlink_health_reporter_ops *ops,
-			       u64 graceful_period, void *priv)
+			       void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
 	devl_lock(devlink);
-	reporter = devl_health_reporter_create(devlink, ops,
-					       graceful_period, priv);
+	reporter = devl_health_reporter_create(devlink, ops, priv);
 	devl_unlock(devlink);
 	return reporter;
 }
-- 
2.31.1
From: Tariq Toukan
To: Jiri Pirko , Jiri Pirko , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Andrew Lunn , "David S. Miller"
CC: Donald Hunter , Jonathan Corbet , Brett Creeley , Michael Chan , Pavan Chebbi , "Cai Huoqing" , Tony Nguyen , Przemek Kitszel , Sunil Goutham , Linu Cherian , Geetha sowjanya , Jerin Jacob , hariprasad , Subbaraya Sundeep , Saeed Mahameed , Leon Romanovsky , Tariq Toukan , Mark Bloch , Ido Schimmel , Petr Machata , Manish Chopra , , , , , , "Gal Pressman" , Dragos Tatulea , "Shahar Shitrit"
Subject: [PATCH net-next V3 2/5] devlink: Move health reporter recovery abort logic to a separate function
Date: Wed, 13 Aug 2025 21:55:46 +0300
Message-ID: <1755111349-416632-3-git-send-email-tariqt@nvidia.com>
In-Reply-To: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
References: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
From: Shahar Shitrit

Extract the health reporter recovery abort logic into a separate
function devlink_health_recover_abort().

The function encapsulates the conditions for aborting recovery. When
auto-recovery is disabled, it returns false (no abort). Otherwise,
recovery is aborted:
- When the previous error wasn't recovered
- When within the grace period after the last recovery

Signed-off-by: Shahar Shitrit
Reviewed-by: Carolina Jubran
Reviewed-by: Jiri Pirko
Signed-off-by: Tariq Toukan
---
 net/devlink/health.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/net/devlink/health.c b/net/devlink/health.c
index ba144b7426fa..9d0d4a9face7 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -586,12 +586,33 @@ static int devlink_health_do_dump(struct devlink_health_reporter *reporter,
 	return err;
 }
 
+static bool
+devlink_health_recover_abort(struct devlink_health_reporter *reporter,
+			     enum devlink_health_reporter_state prev_state)
+{
+	unsigned long recover_ts_threshold;
+
+	if (!reporter->auto_recover)
+		return false;
+
+	/* abort if the previous error wasn't recovered */
+	if (prev_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY)
+		return true;
+
+	recover_ts_threshold = reporter->last_recovery_ts +
+			       msecs_to_jiffies(reporter->graceful_period);
+	if (reporter->last_recovery_ts && reporter->recovery_count &&
+	    time_is_after_jiffies(recover_ts_threshold))
+		return true;
+
+	return false;
+}
+
 int devlink_health_report(struct devlink_health_reporter *reporter,
 			  const char *msg, void *priv_ctx)
 {
 	enum devlink_health_reporter_state prev_health_state;
 	struct devlink *devlink = reporter->devlink;
-	unsigned long recover_ts_threshold;
 	int ret;
 
 	/* write a log message of the current error */
@@ -602,13 +623,7 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
 	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
 	devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER);
 
-	/* abort if the previous error wasn't recovered */
-	recover_ts_threshold = reporter->last_recovery_ts +
-			       msecs_to_jiffies(reporter->graceful_period);
-	if (reporter->auto_recover &&
-	    (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
-	     (reporter->last_recovery_ts && reporter->recovery_count &&
-	      time_is_after_jiffies(recover_ts_threshold)))) {
+	if (devlink_health_recover_abort(reporter, prev_health_state)) {
 		trace_devlink_health_recover_aborted(devlink,
 						     reporter->ops->name,
 						     reporter->health_state,
-- 
2.31.1

From nobody Sat Oct 4 17:30:19 2025
From: Tariq Toukan
To: Jiri Pirko , Jiri Pirko , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Andrew Lunn , "David S. Miller"
CC: Donald Hunter , Jonathan Corbet , Brett Creeley , Michael Chan , Pavan Chebbi , "Cai Huoqing" , Tony Nguyen , Przemek Kitszel , Sunil Goutham , Linu Cherian , Geetha sowjanya , Jerin Jacob , hariprasad , Subbaraya Sundeep , Saeed Mahameed , Leon Romanovsky , Tariq Toukan , Mark Bloch , Ido Schimmel , Petr Machata , Manish Chopra , , , , , , "Gal Pressman" , Dragos Tatulea , "Shahar Shitrit"
Subject: [PATCH net-next V3 3/5] devlink: Introduce error burst period for health reporter
Date: Wed, 13 Aug 2025 21:55:47 +0300
Message-ID: <1755111349-416632-4-git-send-email-tariqt@nvidia.com>
In-Reply-To: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
References: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
From: Shahar Shitrit

Currently, the devlink health reporter starts the grace period
immediately after handling an error, blocking any further recoveries
until it finishes.

However, when a single root cause triggers multiple errors in a short
time frame, it is desirable to treat them as a single burst of errors
and allow their recoveries. This avoids prematurely blocking subsequent
related errors and reduces the risk of inconsistent or incomplete error
handling.

To address this, introduce a configurable error burst period for the
devlink health reporter. Start this period when the first error is
handled, and allow recovery attempts for errors reported during this
window. Once the error burst period expires, begin the grace period,
which blocks further recoveries until it concludes.

Timeline summary:

 ----|--------|------------------------------/----------------------/--
 error is  error is  error burst period            grace period
 reported recovered (recoveries allowed)       (recoveries blocked)

For calculating the error burst period duration, use the same
last_recovery_ts as the grace period. Update it on recovery only when
the error burst period is inactive (either disabled or at the first
error).

This patch implements the framework for the error burst period and
effectively sets its value to 0 at reporter creation, so the current
behavior remains unchanged, which ensures backward compatibility.

A downstream patch will make the error burst period configurable.
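To make the interaction between the two windows concrete, here is a small user-space model of the abort decision as it stands after this patch. It is a sketch under stated assumptions, not kernel code: it replaces jiffies with a plain millisecond counter passed in as now_ms, it ignores the jiffies wraparound that time_is_after_jiffies() handles in the kernel, and the names model_reporter, model_burst_active(), model_recovery_done() and model_recover_abort() are hypothetical stand-ins for the reporter fields and helpers changed in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the relevant fields of
 * struct devlink_health_reporter. All times are plain milliseconds
 * (an assumption; the kernel stores jiffies). */
struct model_reporter {
	uint64_t graceful_period;    /* ms during which recoveries are blocked */
	uint64_t error_burst_period; /* ms during which recoveries are allowed */
	uint64_t last_recovery_ts;   /* 0 means "no recovery yet" */
	uint64_t recovery_count;
	bool auto_recover;
};

/* Models devlink_health_reporter_burst_period_active(): the burst
 * window is [last_recovery_ts, last_recovery_ts + error_burst_period). */
static bool model_burst_active(const struct model_reporter *r, uint64_t now_ms)
{
	return now_ms < r->last_recovery_ts + r->error_burst_period;
}

/* Models devlink_health_reporter_recovery_done(): the timestamp is only
 * refreshed outside the burst window, so it marks the first recovery of
 * a burst rather than the most recent one. */
static void model_recovery_done(struct model_reporter *r, uint64_t now_ms)
{
	r->recovery_count++;
	if (!model_burst_active(r, now_ms))
		r->last_recovery_ts = now_ms;
}

/* Models devlink_health_recover_abort(); prev_healthy stands in for
 * prev_state == DEVLINK_HEALTH_REPORTER_STATE_HEALTHY. */
static bool model_recover_abort(const struct model_reporter *r,
				bool prev_healthy, uint64_t now_ms)
{
	assert(r != NULL);

	if (!r->auto_recover)
		return false;  /* never reported as an abort */
	if (!prev_healthy)
		return true;   /* previous error was never recovered */
	if (model_burst_active(r, now_ms))
		return false;  /* still inside the error burst window */

	/* The grace period starts where the burst period ends. */
	return r->last_recovery_ts && r->recovery_count &&
	       now_ms < r->last_recovery_ts + r->error_burst_period +
			r->graceful_period;
}
```

For example, with a 1000 ms burst period and a 5000 ms grace period, a first recovery at t = 10000 ms lets a second error at t = 10500 ms recover without moving last_recovery_ts. An error at t = 12000 ms is aborted, and recoveries become possible again from t = 16000 ms. Setting error_burst_period to 0 reduces the model to the pre-patch check, where the same error at t = 10500 ms would be aborted.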
Signed-off-by: Shahar Shitrit
Reviewed-by: Carolina Jubran
Reviewed-by: Jiri Pirko
Signed-off-by: Tariq Toukan
---
 include/net/devlink.h |  4 ++++
 net/devlink/health.c  | 22 +++++++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index a65aa24e8df4..0c7b41cbb0bd 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -744,6 +744,9 @@ enum devlink_health_reporter_state {
  * @test: callback to trigger a test event
  * @default_graceful_period: default min time (in msec)
 			     between recovery attempts
+ * @default_error_burst_period: default time (in msec) for
+ *				error recoveries before
+ *				starting the grace period
  */
 
 struct devlink_health_reporter_ops {
@@ -759,6 +762,7 @@ struct devlink_health_reporter_ops {
 	int (*test)(struct devlink_health_reporter *reporter,
 		    struct netlink_ext_ack *extack);
 	u64 default_graceful_period;
+	u64 default_error_burst_period;
 };
 
 /**
diff --git a/net/devlink/health.c b/net/devlink/health.c
index 9d0d4a9face7..c4a028e37277 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -60,6 +60,7 @@ struct devlink_health_reporter {
 	struct devlink_port *devlink_port;
 	struct devlink_fmsg *dump_fmsg;
 	u64 graceful_period;
+	u64 error_burst_period;
 	bool auto_recover;
 	bool auto_dump;
 	u8 health_state;
@@ -123,6 +124,7 @@ __devlink_health_reporter_create(struct devlink *devlink,
 	reporter->ops = ops;
 	reporter->devlink = devlink;
 	reporter->graceful_period = ops->default_graceful_period;
+	reporter->error_burst_period = ops->default_error_burst_period;
 	reporter->auto_recover = !!ops->recover;
 	reporter->auto_dump = !!ops->dump;
 	return reporter;
@@ -508,11 +510,25 @@ static void devlink_recover_notify(struct devlink_health_reporter *reporter,
 	devlink_nl_notify_send_desc(devlink, msg, &desc);
 }
 
+static bool
+devlink_health_reporter_burst_period_active(struct devlink_health_reporter *reporter)
+{
+	unsigned long burst_threshold = reporter->last_recovery_ts +
+		msecs_to_jiffies(reporter->error_burst_period);
+
+	return time_is_after_jiffies(burst_threshold);
+}
+
 void
 devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter)
 {
 	reporter->recovery_count++;
-	reporter->last_recovery_ts = jiffies;
+	if (!devlink_health_reporter_burst_period_active(reporter))
+		/* When error burst period is set, last_recovery_ts marks the
+		 * first recovery within the burst period, not necessarily the
+		 * last one.
+		 */
+		reporter->last_recovery_ts = jiffies;
 }
 EXPORT_SYMBOL_GPL(devlink_health_reporter_recovery_done);
 
@@ -599,7 +615,11 @@ devlink_health_recover_abort(struct devlink_health_reporter *reporter,
 	if (prev_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY)
 		return true;
 
+	if (devlink_health_reporter_burst_period_active(reporter))
+		return false;
+
 	recover_ts_threshold = reporter->last_recovery_ts +
+			       msecs_to_jiffies(reporter->error_burst_period) +
 			       msecs_to_jiffies(reporter->graceful_period);
 	if (reporter->last_recovery_ts && reporter->recovery_count &&
 	    time_is_after_jiffies(recover_ts_threshold))
-- 
2.31.1

From nobody Sat Oct 4 17:30:19 2025
From: Tariq Toukan
To: Jiri Pirko , Jiri Pirko , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Andrew Lunn , "David S. Miller"
CC: Donald Hunter , Jonathan Corbet , Brett Creeley , Michael Chan , Pavan Chebbi , "Cai Huoqing" , Tony Nguyen , Przemek Kitszel , Sunil Goutham , Linu Cherian , Geetha sowjanya , Jerin Jacob , hariprasad , Subbaraya Sundeep , Saeed Mahameed , Leon Romanovsky , Tariq Toukan , Mark Bloch , Ido Schimmel , Petr Machata , Manish Chopra , , , , , , "Gal Pressman" , Dragos Tatulea , "Shahar Shitrit"
Subject: [PATCH net-next V3 4/5] devlink: Make health reporter error burst period configurable
Date: Wed, 13 Aug 2025 21:55:48 +0300
Message-ID: <1755111349-416632-5-git-send-email-tariqt@nvidia.com>
In-Reply-To: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
References: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
From: Shahar Shitrit

Enable configuration of the error burst period: a time window, starting
from the first error recovery, during which the reporter allows a
recovery attempt for every reported error.

This is helpful when a single underlying issue causes multiple errors:
the burst period delays the start of the grace period, leaving enough
time to recover all of the related errors. For example, if multiple TX
queues time out simultaneously, a sufficiently long error burst period
allows every affected TX queue to be recovered within that window.
Without it, only the first TX queue that reports a timeout is recovered,
and the remaining TX queues are blocked once the grace period begins.
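The interaction between the two windows can be sketched as a small
userspace model. This is a simplified sketch under stated assumptions,
not the devlink implementation: struct reporter_model, model_set() and
model_may_recover() are hypothetical names, times are plain millisecond
counters rather than jiffies, and the gating rule (the grace period
starts when the burst window opened by the first recovery closes) is an
assumption drawn from the description above. model_set() mirrors the
validation this patch adds to devlink_nl_health_reporter_set_doit(),
leaving out the -EOPNOTSUPP check for reporters without a recover
callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of one health reporter; times are in ms. */
struct reporter_model {
	uint64_t graceful_period;    /* 0 disables the grace period */
	uint64_t error_burst_period; /* 0 disables the burst window */
	bool recovered_before;       /* has any recovery happened yet? */
	uint64_t burst_start;        /* time the current burst window opened */
};

/*
 * Mirrors the checks added to devlink_nl_health_reporter_set_doit():
 * a NULL pointer means "attribute not present". Clearing the grace
 * period also clears the burst period, and a non-zero burst period is
 * rejected while no grace period is configured.
 */
static int model_set(struct reporter_model *r, const uint64_t *grace,
		     const uint64_t *burst)
{
	assert(r != NULL);

	if (grace) {
		r->graceful_period = *grace;
		if (!r->graceful_period)
			r->error_burst_period = 0;
	}
	if (burst) {
		if (!r->graceful_period && *burst)
			return -EINVAL;
		r->error_burst_period = *burst;
	}
	return 0;
}

/*
 * Assumed gating rule (hypothetical): the first recovery opens a burst
 * window in which every reported error may recover; the grace period
 * starts when the window closes and blocks recovery until it ends.
 */
static bool model_may_recover(struct reporter_model *r, uint64_t now)
{
	uint64_t burst_end, grace_end;

	if (!r->recovered_before) {
		r->recovered_before = true;
		r->burst_start = now;
		return true;
	}

	burst_end = r->burst_start + r->error_burst_period;
	grace_end = burst_end + r->graceful_period;

	if (now < burst_end)
		return true;		/* inside the error burst window */
	if (now < grace_end)
		return false;		/* grace period: recovery blocked */

	r->burst_start = now;		/* grace over: open a new window */
	return true;
}
```

For example, with both periods set to 500 ms and a first error at
1000 ms, errors at 1000 ms and 1499 ms may recover, errors from 1500 ms
to 1999 ms are blocked by the grace period, and an error at 2000 ms
opens a new burst window.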
Configuration example:
$ devlink health set pci/0000:00:09.0 reporter tx error_burst_period 500

Configuration example with ynl:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/devlink.yaml \
 --do health-reporter-set --json '{
  "bus-name": "auxiliary",
  "dev-name": "mlx5_core.eth.0",
  "port-index": 65535,
  "health-reporter-name": "tx",
  "health-reporter-error-burst-period": 500
}'

Signed-off-by: Shahar Shitrit
Reviewed-by: Carolina Jubran
Reviewed-by: Jiri Pirko
Signed-off-by: Tariq Toukan
---
 Documentation/netlink/specs/devlink.yaml      |  6 ++++
 .../networking/devlink/devlink-health.rst     |  2 +-
 include/uapi/linux/devlink.h                  |  2 ++
 net/devlink/health.c                          | 30 +++++++++++++++++--
 net/devlink/netlink_gen.c                     |  5 ++--
 5 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index bb87111d5e16..0e81640dd3b2 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -853,6 +853,9 @@ attribute-sets:
         type: nest
         multi-attr: true
         nested-attributes: dl-rate-tc-bws
+      -
+        name: health-reporter-error-burst-period
+        type: u64
   -
     name: dl-dev-stats
     subset-of: devlink
@@ -1216,6 +1219,8 @@ attribute-sets:
         name: health-reporter-dump-ts-ns
       -
         name: health-reporter-auto-dump
+      -
+        name: health-reporter-error-burst-period

   -
     name: dl-attr-stats
@@ -1961,6 +1966,7 @@ operations:
             - health-reporter-graceful-period
             - health-reporter-auto-recover
             - health-reporter-auto-dump
+            - health-reporter-error-burst-period

     -
       name: health-reporter-recover
diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst
index e0b8cfed610a..2279a4370003 100644
--- a/Documentation/networking/devlink/devlink-health.rst
+++ b/Documentation/networking/devlink/devlink-health.rst
@@ -50,7 +50,7 @@ Once an error is reported, devlink health will perform the following actions:
   * Auto recovery attempt is being done. Depends on:

     - Auto-recovery configuration
-    - Grace period vs. time passed since last recover
+    - Grace period (and error burst period) vs. time passed since last recover

 Devlink formatted message
 =========================
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9fcb25a0f447..458915c22990 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -636,6 +636,8 @@ enum devlink_attr {

 	DEVLINK_ATTR_RATE_TC_BWS,		/* nested */

+	DEVLINK_ATTR_HEALTH_REPORTER_ERR_BURST_PERIOD,	/* u64 */
+
 	/* Add new attributes above here, update the spec in
 	 * Documentation/netlink/specs/devlink.yaml and re-generate
 	 * net/devlink/netlink_gen.c.
diff --git a/net/devlink/health.c b/net/devlink/health.c
index c4a028e37277..d01eb4eaf89c 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -113,7 +113,9 @@ __devlink_health_reporter_create(struct devlink *devlink,
 {
 	struct devlink_health_reporter *reporter;

-	if (WARN_ON(ops->default_graceful_period && !ops->recover))
+	if (WARN_ON(ops->default_error_burst_period &&
+		    !ops->default_graceful_period) ||
+	    WARN_ON(ops->default_graceful_period && !ops->recover))
 		return ERR_PTR(-EINVAL);

 	reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
@@ -293,6 +295,11 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
 	    devlink_nl_put_u64(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
 			       reporter->graceful_period))
 		goto reporter_nest_cancel;
+	if (reporter->ops->recover &&
+	    devlink_nl_put_u64(msg,
+			       DEVLINK_ATTR_HEALTH_REPORTER_ERR_BURST_PERIOD,
+			       reporter->error_burst_period))
+		goto reporter_nest_cancel;
 	if (reporter->ops->recover &&
 	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
 		       reporter->auto_recover))
@@ -458,16 +465,33 @@ int devlink_nl_health_reporter_set_doit(struct sk_buff *skb,

 	if (!reporter->ops->recover &&
 	    (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] ||
-	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]))
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] ||
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_ERR_BURST_PERIOD]))
 		return -EOPNOTSUPP;

 	if (!reporter->ops->dump &&
 	    info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP])
 		return -EOPNOTSUPP;

-	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) {
 		reporter->graceful_period =
 			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]);
+		if (!reporter->graceful_period)
+			reporter->error_burst_period = 0;
+	}
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_ERR_BURST_PERIOD]) {
+		u64 burst_period =
+			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_ERR_BURST_PERIOD]);
+
+		if (!reporter->graceful_period && burst_period) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "Cannot set error burst period without a grace period.");
+			return -EINVAL;
+		}
+
+		reporter->error_burst_period = burst_period;
+	}

 	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])
 		reporter->auto_recover =
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index d97c326a9045..a02da5a0002f 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -389,7 +389,7 @@ static const struct nla_policy devlink_health_reporter_get_dump_nl_policy[DEVLIN
 };

 /* DEVLINK_CMD_HEALTH_REPORTER_SET - do */
-static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP + 1] = {
+static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATTR_HEALTH_REPORTER_ERR_BURST_PERIOD + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32, },
@@ -397,6 +397,7 @@ static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATT
 	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64, },
 	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8, },
 	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8, },
+	[DEVLINK_ATTR_HEALTH_REPORTER_ERR_BURST_PERIOD] = { .type = NLA_U64, },
 };

 /* DEVLINK_CMD_HEALTH_REPORTER_RECOVER - do */
@@ -1032,7 +1033,7 @@ const struct genl_split_ops devlink_nl_ops[74] = {
 		.doit		= devlink_nl_health_reporter_set_doit,
 		.post_doit	= devlink_nl_post_doit,
 		.policy		= devlink_health_reporter_set_nl_policy,
-		.maxattr	= DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
+		.maxattr	= DEVLINK_ATTR_HEALTH_REPORTER_ERR_BURST_PERIOD,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 	{
-- 
2.31.1

From nobody Sat Oct 4 17:30:19 2025
From: Tariq Toukan
To: Jiri Pirko, Jiri Pirko, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn, "David S. Miller"
CC: Donald Hunter, Jonathan Corbet, Brett Creeley, Michael Chan, Pavan Chebbi, "Cai Huoqing", Tony Nguyen, Przemek Kitszel, Sunil Goutham, Linu Cherian, Geetha sowjanya, Jerin Jacob, hariprasad, Subbaraya Sundeep, Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, Ido Schimmel, Petr Machata, Manish Chopra, "Gal Pressman", Dragos Tatulea, "Shahar Shitrit"
Subject: [PATCH net-next V3 5/5] net/mlx5e: Set default error burst period for TX and RX reporters
Date: Wed, 13 Aug 2025 21:55:49 +0300
Message-ID: <1755111349-416632-6-git-send-email-tariqt@nvidia.com>
In-Reply-To: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
References: <1755111349-416632-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
From: Shahar Shitrit

System errors can sometimes cause multiple errors to be reported to the
TX reporter at the same time. For instance, lost interrupts may cause
several SQs to time out simultaneously. When dev_watchdog notifies the
driver of this, the driver iterates over all SQs and triggers recovery
for the timed-out ones via the TX health reporter. However, the grace
period allows only one recovery at a time, so only the first SQ is
recovered while the others remain blocked. Since no further recoveries
are allowed during the grace period, subsequent errors cause the
reporter to enter an ERROR state, requiring manual intervention.

To address this, set the TX reporter's default error burst period to
0.5 seconds (500 ms). This allows the reporter to detect and handle all
timed-out SQs within this window before the grace period begins.

To account for the possibility of a similar issue in the RX reporter,
configure a default error burst period for it as well.

Additionally, while here, align the TX definition prefix with the RX one
(MLX5E_), as these definitions are used only in the EN driver.
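The effect of a 500 ms default on the lost-interrupt scenario above can
be illustrated with a toy simulation. It is a hedged sketch under stated
assumptions, not driver or devlink code: may_recover() and
recovered_sqs() are hypothetical helpers, each timed-out SQ is assumed
to be reported step_ms after the previous one during a single
dev_watchdog pass, and the grace period is assumed to start when the
burst window closes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified gate (times in ms): the first recovery opens
 * an error burst window; the grace period starts when that window closes
 * and blocks recovery until it ends. This is not the devlink code.
 */
static bool may_recover(uint64_t now, bool *opened, uint64_t *burst_start,
			uint64_t burst_ms, uint64_t grace_ms)
{
	assert(opened && burst_start);

	if (!*opened || now >= *burst_start + burst_ms + grace_ms) {
		/* First error, or grace period over: open a new window. */
		*opened = true;
		*burst_start = now;
		return true;
	}
	/* Allowed only while still inside the burst window. */
	return now < *burst_start + burst_ms;
}

/*
 * Hypothetical dev_watchdog pass: n_sqs timed-out SQs are reported one
 * every step_ms starting at t = 0. Returns how many may recover.
 */
static unsigned int recovered_sqs(unsigned int n_sqs, uint64_t step_ms,
				  uint64_t burst_ms, uint64_t grace_ms)
{
	bool opened = false;
	uint64_t burst_start = 0;
	unsigned int ok = 0;

	for (unsigned int i = 0; i < n_sqs; i++)
		if (may_recover(i * step_ms, &opened, &burst_start,
				burst_ms, grace_ms))
			ok++;
	return ok;
}
```

With eight SQs reported 1 ms apart and a 500 ms grace period,
recovered_sqs(8, 1, 0, 500) returns 1 (only the first SQ recovers),
while recovered_sqs(8, 1, 500, 500) returns 8.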
Signed-off-by: Shahar Shitrit
Reviewed-by: Carolina Jubran
Reviewed-by: Jiri Pirko
Signed-off-by: Tariq Toukan
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 7 +++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 1b9ea72abc5a..0e861ae362bc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -652,6 +652,7 @@ void mlx5e_reporter_icosq_resume_recovery(struct mlx5e_channel *c)
 }

 #define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_RX_ERROR_BURST_PERIOD 500

 static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
 	.name = "rx",
@@ -659,6 +660,7 @@ static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
 	.diagnose = mlx5e_rx_reporter_diagnose,
 	.dump = mlx5e_rx_reporter_dump,
 	.default_graceful_period = MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
+	.default_error_burst_period = MLX5E_REPORTER_RX_ERROR_BURST_PERIOD,
 };

 void mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 7a4a77f6fe6a..7813f18e7dfe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -539,14 +539,17 @@ void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq)
 	mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
 }

-#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_TX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_TX_ERROR_BURST_PERIOD 500

 static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
 	.name = "tx",
 	.recover = mlx5e_tx_reporter_recover,
 	.diagnose = mlx5e_tx_reporter_diagnose,
 	.dump = mlx5e_tx_reporter_dump,
-	.default_graceful_period = MLX5_REPORTER_TX_GRACEFUL_PERIOD,
+	.default_graceful_period = MLX5E_REPORTER_TX_GRACEFUL_PERIOD,
+	.default_error_burst_period =
+		MLX5E_REPORTER_TX_ERROR_BURST_PERIOD,
 };

 void mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
-- 
2.31.1