From nobody Tue Dec 16 15:29:11 2025
From: Tariq Toukan
To: "David S. Miller", Jakub Kicinski, Paolo Abeni, Eric Dumazet, "Andrew Lunn"
CC: Jiri Pirko, Gal Pressman, "Leon Romanovsky", Donald Hunter, "Jiri Pirko", Jonathan Corbet, Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Moshe Shemesh, Mark Bloch, Carolina Jubran, Cosmin Ratiu
Subject: [PATCH net-next V8 1/5] devlink: Extend devlink rate API with traffic classes bandwidth management
Date: Tue, 6 May 2025 14:26:39 +0300
Message-ID: <1746530803-450152-2-git-send-email-tariqt@nvidia.com>
X-Mailer: git-send-email 2.8.0
In-Reply-To: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
References: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Carolina Jubran

Introduce support for specifying bandwidth proportions between traffic
classes (TC) in the devlink-rate API. This new option allows users to
allocate bandwidth across multiple traffic classes in a single command.

This feature provides more granular control over traffic management,
especially for scenarios requiring Enhanced Transmission Selection.
Users can now define a specific bandwidth share for each traffic class,
such as allocating 20% for TC0 (TCP/UDP) and 80% for TC5 (RoCE).

Example:

DEV=pci/0000:08:00.0

$ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \
      tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0

$ devlink port function rate set $DEV/vfs_group \
      tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0

Example usage with ynl:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
    --do rate-set --json '{
        "bus-name": "pci",
        "dev-name": "0000:08:00.0",
        "port-index": 1,
        "rate-tc-bws": [
            {"rate-tc-index": 0, "rate-tc-bw": 50},
            {"rate-tc-index": 1, "rate-tc-bw": 50},
            {"rate-tc-index": 2, "rate-tc-bw": 0},
            {"rate-tc-index": 3, "rate-tc-bw": 0},
            {"rate-tc-index": 4, "rate-tc-bw": 0},
            {"rate-tc-index": 5, "rate-tc-bw": 0},
            {"rate-tc-index": 6, "rate-tc-bw": 0},
            {"rate-tc-index": 7, "rate-tc-bw": 0}
        ]
    }'

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
    --do rate-get --json '{
        "bus-name": "pci",
        "dev-name": "0000:08:00.0",
        "port-index": 1
    }'

output for rate-get:

{'bus-name': 'pci',
 'dev-name': '0000:08:00.0',
 'port-index': 1,
 'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0},
                 {'rate-tc-bw': 50, 'rate-tc-index': 1},
                 {'rate-tc-bw': 0, 'rate-tc-index': 2},
                 {'rate-tc-bw': 0, 'rate-tc-index': 3},
                 {'rate-tc-bw': 0, 'rate-tc-index': 4},
                 {'rate-tc-bw': 0, 'rate-tc-index': 5},
                 {'rate-tc-bw': 0, 'rate-tc-index': 6},
                 {'rate-tc-bw': 0, 'rate-tc-index': 7}],
 'rate-tx-max': 0,
 'rate-tx-priority': 0,
 'rate-tx-share': 0,
 'rate-tx-weight': 0,
 'rate-type': 'leaf'}

Signed-off-by: Carolina Jubran
Reviewed-by: Cosmin Ratiu
Reviewed-by: Jiri Pirko
Signed-off-by: Tariq Toukan
---
 Documentation/netlink/specs/devlink.yaml      |  36 ++++-
 .../networking/devlink/devlink-port.rst       |   7 +
 include/net/devlink.h                         |   9 ++
 include/uapi/linux/devlink.h                  |   4 +
 net/devlink/netlink_gen.c                     |  16 ++-
 net/devlink/netlink_gen.h                     |   2 +
 net/devlink/rate.c                            | 127 ++++++++++++++++++
 7 files changed, 196 insertions(+), 5 deletions(-)

diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index bd9726269b4f..64b6aaa02047 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -202,6 +202,11 @@ definitions:
         name: exception
       -
         name: control
+  -
+    name: devlink-rate-tc-index-max
+    header: net/devlink.h
+    type: const
+    value: 7
 
 attribute-sets:
   -
@@ -820,7 +825,26 @@ attribute-sets:
       -
         name: region-direct
         type: flag
-
+      -
+        name: rate-tc-bws
+        type: nest
+        multi-attr: true
+        nested-attributes: dl-rate-tc-bws
+      -
+        name: rate-tc-index
+        type: u8
+        checks:
+          min: 0
+          max: devlink-rate-tc-index-max
+      -
+        name: rate-tc-bw
+        type: u32
+        doc: |
+          Specifies the bandwidth allocation for the Traffic Class as a
+          percentage.
+        checks:
+          min: 0
+          max: 100
   -
     name: dl-dev-stats
     subset-of: devlink
@@ -1225,6 +1249,14 @@ attribute-sets:
       -
         name: flash
         type: flag
+  -
+    name: dl-rate-tc-bws
+    subset-of: devlink
+    attributes:
+      -
+        name: rate-tc-index
+      -
+        name: rate-tc-bw
 
 operations:
   enum-model: directional
@@ -2150,6 +2182,7 @@ operations:
             - rate-tx-priority
             - rate-tx-weight
             - rate-parent-node-name
+            - rate-tc-bws
 
     -
       name: rate-new
@@ -2170,6 +2203,7 @@ operations:
             - rate-tx-priority
             - rate-tx-weight
             - rate-parent-node-name
+            - rate-tc-bws
 
     -
       name: rate-del
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 9d22d41a7cd1..bc3b41ac2d51 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -418,6 +418,13 @@ API allows to configure following rate object's parameters:
   to all node children limits. ``tx_max`` is an upper limit for children.
   ``tx_share`` is a total bandwidth distributed among children.
 
+``tc_bw``
+  Allow users to set the bandwidth allocation per traffic class on rate
+  objects. This enables fine-grained QoS configurations by assigning specific
+  bandwidth percentages to different traffic classes. When applied to a
+  non-leaf node, tc_bw determines how bandwidth is shared among its child
+  elements.
+
 ``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
 nodes with the same priority form a WFQ subgroup in the sibling group
 and arbitration among them is based on assigned weights.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index b8783126c1ed..1b7fa11b5841 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 struct devlink;
 struct devlink_linecard;
@@ -99,6 +100,8 @@ struct devlink_port_attrs {
         };
 };
 
+#define DEVLINK_RATE_TC_INDEX_MAX (IEEE_8021QAZ_MAX_TCS - 1)
+
 struct devlink_rate {
         struct list_head list;
         enum devlink_rate_type type;
@@ -118,6 +121,8 @@ struct devlink_rate {
 
         u32 tx_priority;
         u32 tx_weight;
+
+        u32 tc_bw[IEEE_8021QAZ_MAX_TCS];
 };
 
 struct devlink_port {
@@ -1482,6 +1487,8 @@ struct devlink_ops {
                                        u32 tx_priority, struct netlink_ext_ack *extack);
         int (*rate_leaf_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv,
                                        u32 tx_weight, struct netlink_ext_ack *extack);
+        int (*rate_leaf_tc_bw_set)(struct devlink_rate *devlink_rate, void *priv,
+                                   u32 *tc_bw, struct netlink_ext_ack *extack);
         int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
                                       u64 tx_share, struct netlink_ext_ack *extack);
         int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
@@ -1490,6 +1497,8 @@ struct devlink_ops {
                                        u32 tx_priority, struct netlink_ext_ack *extack);
         int (*rate_node_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv,
                                        u32 tx_weight, struct netlink_ext_ack *extack);
+        int (*rate_node_tc_bw_set)(struct devlink_rate *devlink_rate, void *priv,
+                                   u32 *tc_bw, struct netlink_ext_ack *extack);
         int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
                              struct netlink_ext_ack *extack);
         int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9401aa343673..b3b538c67c34 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -614,6 +614,10 @@ enum devlink_attr {
 
         DEVLINK_ATTR_REGION_DIRECT, /* flag */
 
+        DEVLINK_ATTR_RATE_TC_BWS, /* nested */
+        DEVLINK_ATTR_RATE_TC_INDEX, /* u8 */
+        DEVLINK_ATTR_RATE_TC_BW, /* u32 */
+
         /* Add new attributes above here, update the spec in
          * Documentation/netlink/specs/devlink.yaml and re-generate
          * net/devlink/netlink_gen.c.
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index f9786d51f68f..186f31522af0 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -9,6 +9,7 @@
 #include "netlink_gen.h"
 
 #include
+#include
 
 /* Common nested types */
 const struct nla_policy devlink_dl_port_function_nl_policy[DEVLINK_PORT_FN_ATTR_CAPS + 1] = {
@@ -18,6 +19,11 @@ const struct nla_policy devlink_dl_port_function_nl_policy[DEVLINK_PORT_FN_ATTR_
         [DEVLINK_PORT_FN_ATTR_CAPS] = NLA_POLICY_BITFIELD32(15),
 };
 
+const struct nla_policy devlink_dl_rate_tc_bws_nl_policy[DEVLINK_ATTR_RATE_TC_BW + 1] = {
+        [DEVLINK_ATTR_RATE_TC_INDEX] = NLA_POLICY_RANGE(NLA_U8, 0, DEVLINK_RATE_TC_INDEX_MAX),
+        [DEVLINK_ATTR_RATE_TC_BW] = NLA_POLICY_RANGE(NLA_U32, 0, 100),
+};
+
 const struct nla_policy devlink_dl_selftest_id_nl_policy[DEVLINK_ATTR_SELFTEST_ID_FLASH + 1] = {
         [DEVLINK_ATTR_SELFTEST_ID_FLASH] = { .type = NLA_FLAG, },
 };
@@ -496,7 +502,7 @@ static const struct nla_policy devlink_rate_get_dump_nl_policy[DEVLINK_ATTR_DEV_
 };
 
 /* DEVLINK_CMD_RATE_SET - do */
-static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TX_WEIGHT + 1] = {
+static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TC_BWS + 1] = {
         [DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
         [DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
         [DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
@@ -505,10 +511,11 @@ static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TX_W
         [DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32, },
         [DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32, },
         [DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING, },
+        [DEVLINK_ATTR_RATE_TC_BWS] = NLA_POLICY_NESTED(devlink_dl_rate_tc_bws_nl_policy),
 };
 
 /* DEVLINK_CMD_RATE_NEW - do */
-static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TX_WEIGHT + 1] = {
+static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TC_BWS + 1] = {
         [DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
         [DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
         [DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
@@ -517,6 +524,7 @@ static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TX_W
         [DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32, },
         [DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32, },
         [DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING, },
+        [DEVLINK_ATTR_RATE_TC_BWS] = NLA_POLICY_NESTED(devlink_dl_rate_tc_bws_nl_policy),
 };
 
 /* DEVLINK_CMD_RATE_DEL - do */
@@ -1164,7 +1172,7 @@ const struct genl_split_ops devlink_nl_ops[74] = {
                 .doit = devlink_nl_rate_set_doit,
                 .post_doit = devlink_nl_post_doit,
                 .policy = devlink_rate_set_nl_policy,
-                .maxattr = DEVLINK_ATTR_RATE_TX_WEIGHT,
+                .maxattr = DEVLINK_ATTR_RATE_TC_BWS,
                 .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
         },
         {
@@ -1174,7 +1182,7 @@ const struct genl_split_ops devlink_nl_ops[74] = {
                 .doit = devlink_nl_rate_new_doit,
                 .post_doit = devlink_nl_post_doit,
                 .policy = devlink_rate_new_nl_policy,
-                .maxattr = DEVLINK_ATTR_RATE_TX_WEIGHT,
+                .maxattr = DEVLINK_ATTR_RATE_TC_BWS,
                 .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
         },
         {
diff --git a/net/devlink/netlink_gen.h b/net/devlink/netlink_gen.h
index 8f2bd50ddf5e..e3558cf89be4 100644
--- a/net/devlink/netlink_gen.h
+++ b/net/devlink/netlink_gen.h
@@ -10,9 +10,11 @@
 #include
 
 #include
+#include
 
 /* Common nested types */
 extern const struct nla_policy devlink_dl_port_function_nl_policy[DEVLINK_PORT_FN_ATTR_CAPS + 1];
+extern const struct nla_policy devlink_dl_rate_tc_bws_nl_policy[DEVLINK_ATTR_RATE_TC_BW + 1];
 extern const struct nla_policy devlink_dl_selftest_id_nl_policy[DEVLINK_ATTR_SELFTEST_ID_FLASH + 1];
 
 /* Ops table for devlink */
diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 8828ffaf6cbc..aff5682aead7 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -80,6 +80,29 @@ devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info)
         return ERR_PTR(-EINVAL);
 }
 
+static int devlink_rate_put_tc_bws(struct sk_buff *msg, u32 *tc_bw)
+{
+        struct nlattr *nla_tc_bw;
+        int i;
+
+        for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+                nla_tc_bw = nla_nest_start(msg, DEVLINK_ATTR_RATE_TC_BWS);
+                if (!nla_tc_bw)
+                        return -EMSGSIZE;
+
+                if (nla_put_u8(msg, DEVLINK_ATTR_RATE_TC_INDEX, i) ||
+                    nla_put_u32(msg, DEVLINK_ATTR_RATE_TC_BW, tc_bw[i]))
+                        goto nla_put_failure;
+
+                nla_nest_end(msg, nla_tc_bw);
+        }
+        return 0;
+
+nla_put_failure:
+        nla_nest_cancel(msg, nla_tc_bw);
+        return -EMSGSIZE;
+}
+
 static int devlink_nl_rate_fill(struct sk_buff *msg,
                                 struct devlink_rate *devlink_rate,
                                 enum devlink_command cmd, u32 portid, u32 seq,
@@ -129,6 +152,9 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
                            devlink_rate->parent->name))
                 goto nla_put_failure;
 
+        if (devlink_rate_put_tc_bws(msg, devlink_rate->tc_bw))
+                goto nla_put_failure;
+
         genlmsg_end(msg, hdr);
         return 0;
 
@@ -316,6 +342,89 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
         return 0;
 }
 
+static int devlink_nl_rate_tc_bw_parse(struct nlattr *parent_nest, u32 *tc_bw,
+                                       unsigned long *bitmap, struct netlink_ext_ack *extack)
+{
+        struct nlattr *tb[DEVLINK_ATTR_MAX + 1];
+        u8 tc_index;
+
+        nla_parse_nested(tb, DEVLINK_ATTR_MAX, parent_nest, devlink_dl_rate_tc_bws_nl_policy,
+                         extack);
+        if (!tb[DEVLINK_ATTR_RATE_TC_INDEX]) {
+                NL_SET_ERR_ATTR_MISS(extack, parent_nest, DEVLINK_ATTR_RATE_TC_INDEX);
+                return -EINVAL;
+        }
+
+        tc_index = nla_get_u8(tb[DEVLINK_ATTR_RATE_TC_INDEX]);
+
+        if (!tb[DEVLINK_ATTR_RATE_TC_BW]) {
+                NL_SET_ERR_ATTR_MISS(extack, parent_nest, DEVLINK_ATTR_RATE_TC_BW);
+                return -EINVAL;
+        }
+
+        if (test_and_set_bit(tc_index, bitmap)) {
+                NL_SET_ERR_MSG_FMT(extack, "Duplicate traffic class index specified (%u)",
+                                   tc_index);
+                return -EINVAL;
+        }
+
+        tc_bw[tc_index] = nla_get_u32(tb[DEVLINK_ATTR_RATE_TC_BW]);
+
+        return 0;
+}
+
+static int devlink_nl_rate_tc_bw_set(struct devlink_rate *devlink_rate,
+                                     struct genl_info *info)
+{
+        DECLARE_BITMAP(bitmap, IEEE_8021QAZ_MAX_TCS) = {};
+        struct devlink *devlink = devlink_rate->devlink;
+        const struct devlink_ops *ops = devlink->ops;
+        int rem, err = -EOPNOTSUPP, i, total = 0;
+        u32 tc_bw[IEEE_8021QAZ_MAX_TCS] = {};
+        struct nlattr *attr;
+
+        nla_for_each_attr(attr, genlmsg_data(info->genlhdr),
+                          genlmsg_len(info->genlhdr), rem) {
+                if (nla_type(attr) == DEVLINK_ATTR_RATE_TC_BWS) {
+                        err = devlink_nl_rate_tc_bw_parse(attr, tc_bw, bitmap, info->extack);
+                        if (err)
+                                return err;
+                }
+        }
+
+        for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+                if (!test_bit(i, bitmap)) {
+                        NL_SET_ERR_MSG_FMT(info->extack,
+                                           "Bandwidth values must be specified for all %u traffic classes",
+                                           IEEE_8021QAZ_MAX_TCS);
+                        return -EINVAL;
+                }
+
+                total += tc_bw[i];
+        }
+
+        if (total && total != 100) {
+                NL_SET_ERR_MSG_FMT(info->extack,
+                                   "Sum of all traffic class bandwidth values must be 100, got %u",
+                                   total);
+                return -EINVAL;
+        }
+
+        if (devlink_rate_is_leaf(devlink_rate))
+                err = ops->rate_leaf_tc_bw_set(devlink_rate, devlink_rate->priv, tc_bw,
+                                               info->extack);
+        else if (devlink_rate_is_node(devlink_rate))
+                err = ops->rate_node_tc_bw_set(devlink_rate, devlink_rate->priv, tc_bw,
+                                               info->extack);
+
+        if (err)
+                return err;
+
+        memcpy(devlink_rate->tc_bw, tc_bw, sizeof(tc_bw));
+
+        return 0;
+}
+
 static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
                                const struct devlink_ops *ops,
                                struct genl_info *info)
@@ -388,6 +497,12 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
                         return err;
         }
 
+        if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) {
+                err = devlink_nl_rate_tc_bw_set(devlink_rate, info);
+                if (err)
+                        return err;
+        }
+
         return 0;
 }
 
@@ -423,6 +538,12 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
                                             "TX weight set isn't supported for the leafs");
                         return false;
                 }
+                if (attrs[DEVLINK_ATTR_RATE_TC_BWS] && !ops->rate_leaf_tc_bw_set) {
+                        NL_SET_ERR_MSG_ATTR(info->extack,
+                                            attrs[DEVLINK_ATTR_RATE_TC_BWS],
+                                            "TC bandwidth set isn't supported for the leafs");
+                        return false;
+                }
         } else if (type == DEVLINK_RATE_TYPE_NODE) {
                 if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) {
                         NL_SET_ERR_MSG(info->extack, "TX share set isn't supported for the nodes");
@@ -449,6 +570,12 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
                                             "TX weight set isn't supported for the nodes");
                         return false;
                 }
+                if (attrs[DEVLINK_ATTR_RATE_TC_BWS] && !ops->rate_node_tc_bw_set) {
+                        NL_SET_ERR_MSG_ATTR(info->extack,
+                                            attrs[DEVLINK_ATTR_RATE_TC_BWS],
+                                            "TC bandwidth set isn't supported for the nodes");
+                        return false;
+                }
         } else {
                 WARN(1, "Unknown type of rate object");
                 return false;
-- 
2.31.1

From nobody Tue Dec 16 15:29:11 2025
From: Tariq Toukan
To: "David S. Miller", Jakub Kicinski, Paolo Abeni, Eric Dumazet, "Andrew Lunn"
CC: Jiri Pirko, Gal Pressman, "Leon Romanovsky", Donald Hunter, "Jiri Pirko", Jonathan Corbet, Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Moshe Shemesh, Mark Bloch, Carolina Jubran, Cosmin Ratiu
Subject: [PATCH net-next V8 2/5] net/mlx5: Add no-op implementation for setting tc-bw on rate objects
Date: Tue, 6 May 2025 14:26:40 +0300
Message-ID: <1746530803-450152-3-git-send-email-tariqt@nvidia.com>
X-Mailer: git-send-email 2.8.0
In-Reply-To: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
References: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
From: Carolina Jubran

Introduce `mlx5_esw_devlink_rate_node_tc_bw_set()` and
`mlx5_esw_devlink_rate_leaf_tc_bw_set()` with no-op logic.

Future patches will add support for setting traffic class bandwidth on
rate objects.

Signed-off-by: Carolina Jubran
Reviewed-by: Cosmin Ratiu
Signed-off-by: Tariq Toukan
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  2 ++
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 20 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/esw/qos.h |  8 ++++++++
 3 files changed, 30 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 73cd74644378..47d3acd011cf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -323,6 +323,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
 	.eswitch_encap_mode_get = mlx5_devlink_eswitch_encap_mode_get,
 	.rate_leaf_tx_share_set = mlx5_esw_devlink_rate_leaf_tx_share_set,
 	.rate_leaf_tx_max_set = mlx5_esw_devlink_rate_leaf_tx_max_set,
+	.rate_leaf_tc_bw_set = mlx5_esw_devlink_rate_leaf_tc_bw_set,
+	.rate_node_tc_bw_set = mlx5_esw_devlink_rate_node_tc_bw_set,
 	.rate_node_tx_share_set = mlx5_esw_devlink_rate_node_tx_share_set,
 	.rate_node_tx_max_set = mlx5_esw_devlink_rate_node_tx_max_set,
 	.rate_node_new = mlx5_esw_devlink_rate_node_new,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index b6ae384396b3..ec706e9352e1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -906,6 +906,26 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *
 	return err;
 }
 
+int
+mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf,
+				     void *priv,
+				     u32 *tc_bw,
+				     struct netlink_ext_ack *extack)
+{
+	NL_SET_ERR_MSG_MOD(extack,
+			   "TC bandwidth shares are not supported on leafs");
+	return -EOPNOTSUPP;
+}
+
+int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node,
+					 void *priv,
+					 u32 *tc_bw,
+					 struct netlink_ext_ack *extack)
+{
+	NL_SET_ERR_MSG_MOD(extack,
+			   "TC bandwidth shares are not supported on nodes");
+	return -EOPNOTSUPP;
+}
+
 int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv,
 					    u64 tx_share, struct netlink_ext_ack *extack)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
index ed40ec8f027e..0a50982b0e27 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
@@ -21,6 +21,14 @@ int mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void
 					    u64 tx_share, struct netlink_ext_ack *extack);
 int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *priv,
 					  u64 tx_max, struct netlink_ext_ack *extack);
+int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_node,
+					 void *priv,
+					 u32 *tc_bw,
+					 struct netlink_ext_ack *extack);
+int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node,
+					 void *priv,
+					 u32 *tc_bw,
+					 struct netlink_ext_ack *extack);
 int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv,
 					    u64 tx_share, struct netlink_ext_ack *extack);
 int mlx5_esw_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node, void *priv,
-- 
2.31.1

From nobody Tue Dec 16 15:29:11 2025
From: Tariq Toukan
To: "David S. Miller", Jakub Kicinski, Paolo Abeni, Eric Dumazet, "Andrew Lunn"
CC: Jiri Pirko, Gal Pressman, "Leon Romanovsky", Donald Hunter, Jonathan Corbet, Saeed Mahameed, Tariq Toukan, Moshe Shemesh, Mark Bloch, Carolina Jubran, Cosmin Ratiu
Subject: [PATCH net-next V8 3/5] net/mlx5: Add support for setting tc-bw on nodes
Date: Tue, 6 May 2025 14:26:41 +0300
Message-ID: <1746530803-450152-4-git-send-email-tariqt@nvidia.com>
X-Mailer: git-send-email 2.8.0
In-Reply-To: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
References: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
From: Carolina Jubran

Introduce support for enabling and disabling Traffic Class (TC)
arbitration for existing devlink rate nodes. This patch adds support
for a new scheduling node type, `SCHED_NODE_TYPE_TC_ARBITER_TSAR`.

Key changes include:

- New helper functions for transitioning existing rate nodes to TC
  arbiter nodes and vice versa. These functions handle the allocation
  of TC arbiter nodes, copying of child nodes, and restoring vport QoS
  settings when TC arbitration is disabled.

- Implementation of `mlx5_esw_devlink_rate_node_tc_bw_set()` to manage
  tc-bw configuration on nodes.

- Introduced stubs for `esw_qos_tc_arbiter_scheduling_setup()` and
  `esw_qos_tc_arbiter_scheduling_teardown()`, which will be extended in
  future patches to provide full support for tc-bw on devlink rate
  objects.

- Validation functions for tc-bw settings, allowing graceful handling
  of unsupported traffic class bandwidth configurations.

- Updated `__esw_qos_alloc_node()` to insert the new node into the
  parent's children list only if the parent is not NULL. For the root
  TSAR, the new node is inserted directly after the allocation call.

- Don't allow `tc-bw` configuration for nodes containing non-leaf
  children.

This patch lays the groundwork for future support for configuring tc-bw
on devlink rate nodes. Although the infrastructure is in place, full
support for tc-bw is not yet implemented; attempts to set tc-bw on
nodes will return `-EOPNOTSUPP`. No functional changes are introduced
at this stage.

Signed-off-by: Carolina Jubran
Reviewed-by: Cosmin Ratiu
Signed-off-by: Tariq Toukan
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 313 +++++++++++++++++-
 1 file changed, 304 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index ec706e9352e1..9a92121cb4cb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -64,11 +64,13 @@ static void esw_qos_domain_release(struct mlx5_eswitch *esw)
 enum sched_node_type {
 	SCHED_NODE_TYPE_VPORTS_TSAR,
 	SCHED_NODE_TYPE_VPORT,
+	SCHED_NODE_TYPE_TC_ARBITER_TSAR,
 };
 
 static const char * const sched_node_type_str[] = {
 	[SCHED_NODE_TYPE_VPORTS_TSAR] = "vports TSAR",
 	[SCHED_NODE_TYPE_VPORT] = "vport",
+	[SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR",
 };
 
 struct mlx5_esw_sched_node {
@@ -106,6 +108,13 @@ static void esw_qos_node_attach_to_parent(struct mlx5_esw_sched_node *node)
 	}
 }
 
+static int esw_qos_num_tcs(struct mlx5_core_dev *dev)
+{
+	int num_tcs = mlx5_max_tc(dev) + 1;
+
+	return num_tcs < IEEE_8021QAZ_MAX_TCS ? num_tcs : IEEE_8021QAZ_MAX_TCS;
+}
+
 static void
 esw_qos_node_set_parent(struct mlx5_esw_sched_node *node, struct mlx5_esw_sched_node *parent)
 {
@@ -116,6 +125,27 @@ esw_qos_node_set_parent(struct mlx5_esw_sched_node *node, struct mlx5_esw_sched_
 	esw_qos_node_attach_to_parent(node);
 }
 
+static void esw_qos_nodes_set_parent(struct list_head *nodes,
+				     struct mlx5_esw_sched_node *parent)
+{
+	struct mlx5_esw_sched_node *node, *tmp;
+
+	list_for_each_entry_safe(node, tmp, nodes, entry) {
+		esw_qos_node_set_parent(node, parent);
+		if (!list_empty(&node->children) &&
+		    parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+			struct mlx5_esw_sched_node *child;
+
+			list_for_each_entry(child, &node->children, entry) {
+				struct mlx5_vport *vport = child->vport;
+
+				if (vport)
+					vport->qos.sched_node->parent = parent;
+			}
+		}
+	}
+}
+
 void mlx5_esw_qos_vport_qos_free(struct mlx5_vport *vport)
 {
 	kfree(vport->qos.sched_node);
@@ -141,16 +171,24 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport)
 
 static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op)
 {
-	if (node->vport) {
+	switch (node->type) {
+	case SCHED_NODE_TYPE_VPORT:
 		esw_warn(node->esw->dev,
 			 "E-Switch %s %s scheduling element failed (vport=%d,err=%d)\n",
 			 op, sched_node_type_str[node->type], node->vport->vport, err);
-		return;
+		break;
+	case SCHED_NODE_TYPE_TC_ARBITER_TSAR:
+	case SCHED_NODE_TYPE_VPORTS_TSAR:
+		esw_warn(node->esw->dev,
+			 "E-Switch %s %s scheduling element failed (err=%d)\n",
+			 op, sched_node_type_str[node->type], err);
+		break;
+	default:
+		esw_warn(node->esw->dev,
+			 "E-Switch %s scheduling element failed (err=%d)\n",
+			 op, err);
+		break;
 	}
-
-	esw_warn(node->esw->dev,
-		 "E-Switch %s %s scheduling element failed (err=%d)\n",
-		 op, sched_node_type_str[node->type], err);
 }
 
 static int esw_qos_node_create_sched_element(struct mlx5_esw_sched_node *node, void *ctx,
@@ -388,6 +426,14 @@ __esw_qos_alloc_node(struct mlx5_eswitch *esw, u32 tsar_ix, enum sched_node_type
 	node->parent = parent;
 	INIT_LIST_HEAD(&node->children);
 	esw_qos_node_attach_to_parent(node);
+	if (!parent) {
+		/* The caller is responsible for inserting the node into the
+		 * parent list if necessary. This function can also be used with
+		 * a NULL parent, which doesn't necessarily indicate that it
+		 * refers to the root scheduling element.
+		 */
+		list_del_init(&node->entry);
+	}
 
 	return node;
 }
@@ -426,6 +472,7 @@ __esw_qos_create_vports_sched_node(struct mlx5_eswitch *esw, struct mlx5_esw_sch
 		goto err_alloc_node;
 	}
 
+	list_add_tail(&node->entry, &esw->qos.domain->nodes);
 	esw_qos_normalize_min_rate(esw, NULL, extack);
 	trace_mlx5_esw_node_qos_create(esw->dev, node, node->ix);
 
@@ -498,6 +545,9 @@ static int esw_qos_create(struct mlx5_eswitch *esw, struct netlink_ext_ack *exta
 						  SCHED_NODE_TYPE_VPORTS_TSAR,
 						  NULL))
 			esw->qos.node0 = ERR_PTR(-ENOMEM);
+		else
+			list_add_tail(&esw->qos.node0->entry,
+				      &esw->qos.domain->nodes);
 	}
 	if (IS_ERR(esw->qos.node0)) {
 		err = PTR_ERR(esw->qos.node0);
@@ -555,6 +605,18 @@ static void esw_qos_put(struct mlx5_eswitch *esw)
 		esw_qos_destroy(esw);
 }
 
+static void
+esw_qos_tc_arbiter_scheduling_teardown(struct mlx5_esw_sched_node *node,
+				       struct netlink_ext_ack *extack)
+{}
+
+static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node,
+					       struct netlink_ext_ack *extack)
+{
+	NL_SET_ERR_MSG_MOD(extack, "TC arbiter elements are not supported.");
+	return -EOPNOTSUPP;
+}
+
 static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack)
 {
 	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
@@ -723,6 +785,195 @@ static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw
 	return err;
 }
 
+static void
+esw_qos_switch_vport_tcs_to_vport(struct mlx5_esw_sched_node *tc_arbiter_node,
+				  struct mlx5_esw_sched_node *node,
+				  struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vports_tc_node, *vport_tc_node, *tmp;
+
+	vports_tc_node = list_first_entry(&tc_arbiter_node->children,
+					  struct mlx5_esw_sched_node,
+					  entry);
+
+	list_for_each_entry_safe(vport_tc_node, tmp, &vports_tc_node->children,
+				 entry)
+		esw_qos_vport_update_parent(vport_tc_node->vport, node, extack);
+}
+
+static int esw_qos_switch_tc_arbiter_node_to_vports(
+	struct mlx5_esw_sched_node *tc_arbiter_node,
+	struct mlx5_esw_sched_node *node,
+	struct netlink_ext_ack *extack)
+{
+	u32 parent_tsar_ix = node->parent ?
+			     node->parent->ix : node->esw->qos.root_tsar_ix;
+	int err;
+
+	err = esw_qos_create_node_sched_elem(node->esw->dev, parent_tsar_ix,
+					     node->max_rate, node->bw_share,
+					     &node->ix);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Failed to create scheduling element for vports node when disabling vports TC QoS");
+		return err;
+	}
+
+	node->type = SCHED_NODE_TYPE_VPORTS_TSAR;
+
+	/* Disable TC QoS for vports in the arbiter node. */
+	esw_qos_switch_vport_tcs_to_vport(tc_arbiter_node, node, extack);
+
+	return 0;
+}
+
+static int esw_qos_switch_vports_node_to_tc_arbiter(
+	struct mlx5_esw_sched_node *node,
+	struct mlx5_esw_sched_node *tc_arbiter_node,
+	struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node, *tmp;
+	struct mlx5_vport *vport;
+	int err;
+
+	/* Enable TC QoS for each vport in the node. */
+	list_for_each_entry_safe(vport_node, tmp, &node->children, entry) {
+		vport = vport_node->vport;
+		err = esw_qos_vport_update_parent(vport, tc_arbiter_node,
+						  extack);
+		if (err)
+			goto err_out;
+	}
+
+	/* Destroy the current vports node TSAR. */
+	err = mlx5_destroy_scheduling_element_cmd(node->esw->dev,
+						  SCHEDULING_HIERARCHY_E_SWITCH,
+						  node->ix);
+	if (err)
+		goto err_out;
+
+	return 0;
+err_out:
+	/* Restore vports back into the node if an error occurs. */
+	esw_qos_switch_vport_tcs_to_vport(tc_arbiter_node, node, NULL);
+
+	return err;
+}
+
+static struct mlx5_esw_sched_node *
+esw_qos_move_node(struct mlx5_esw_sched_node *curr_node)
+{
+	struct mlx5_esw_sched_node *new_node;
+
+	new_node = __esw_qos_alloc_node(curr_node->esw, curr_node->ix,
+					curr_node->type, NULL);
+	if (!IS_ERR(new_node))
+		esw_qos_nodes_set_parent(&curr_node->children, new_node);
+
+	return new_node;
+}
+
+static int esw_qos_node_disable_tc_arbitration(struct mlx5_esw_sched_node *node,
+					       struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *curr_node;
+	int err;
+
+	if (node->type != SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+		return 0;
+
+	/* Allocate a new rate node to hold the current state, which will allow
+	 * for restoring the vports back to this node after disabling TC
+	 * arbitration.
+	 */
+	curr_node = esw_qos_move_node(node);
+	if (IS_ERR(curr_node)) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed setting up vports node");
+		return PTR_ERR(curr_node);
+	}
+
+	/* Disable TC QoS for all vports, and assign them back to the node. */
+	err = esw_qos_switch_tc_arbiter_node_to_vports(curr_node, node, extack);
+	if (err)
+		goto err_out;
+
+	/* Clean up the TC arbiter node after disabling TC QoS for vports. */
+	esw_qos_tc_arbiter_scheduling_teardown(curr_node, extack);
+	goto out;
+err_out:
+	esw_qos_nodes_set_parent(&curr_node->children, node);
+out:
+	__esw_qos_free_node(curr_node);
+	return err;
+}
+
+static int esw_qos_node_enable_tc_arbitration(struct mlx5_esw_sched_node *node,
+					      struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *curr_node, *child;
+	int err, new_level, max_level;
+
+	if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+		return 0;
+
+	/* Increase the hierarchy level by one to account for the additional
+	 * vports TC scheduling node, and verify that the new level does not
+	 * exceed the maximum allowed depth.
+	 */
+	new_level = node->level + 1;
+	max_level = 1 << MLX5_CAP_QOS(node->esw->dev, log_esw_max_sched_depth);
+	if (new_level > max_level) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "TC arbitration on nodes is not supported beyond max scheduling depth");
+		return -EOPNOTSUPP;
+	}
+
+	/* Ensure the node does not contain non-leaf children before assigning
+	 * TC bandwidth.
+	 */
+	if (!list_empty(&node->children)) {
+		list_for_each_entry(child, &node->children, entry) {
+			if (!child->vport) {
+				NL_SET_ERR_MSG_MOD(extack,
+						   "Cannot configure TC bandwidth on a node with non-leaf children");
+				return -EOPNOTSUPP;
+			}
+		}
+	}
+
+	/* Allocate a new node that will store the information of the current
+	 * node. This will be used later to restore the node if necessary.
+	 */
+	curr_node = esw_qos_move_node(node);
+	if (IS_ERR(curr_node)) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed setting up node TC QoS");
+		return PTR_ERR(curr_node);
+	}
+
+	/* Initialize the TC arbiter node for QoS management.
+	 * This step prepares the node for handling Traffic Class arbitration.
+	 */
+	err = esw_qos_tc_arbiter_scheduling_setup(node, extack);
+	if (err)
+		goto err_setup;
+
+	/* Enable TC QoS for each vport within the current node. */
+	err = esw_qos_switch_vports_node_to_tc_arbiter(curr_node, node, extack);
+	if (err)
+		goto err_switch_vports;
+	goto out;
+
+err_switch_vports:
+	esw_qos_tc_arbiter_scheduling_teardown(node, NULL);
+	node->ix = curr_node->ix;
+	node->type = curr_node->type;
+err_setup:
+	esw_qos_nodes_set_parent(&curr_node->children, node);
out:
+	__esw_qos_free_node(curr_node);
+	return err;
+}
+
 static u32 mlx5_esw_qos_lag_link_speed_get_locked(struct mlx5_core_dev *mdev)
 {
 	struct ethtool_link_ksettings lksettings;
@@ -848,6 +1099,31 @@ static int esw_qos_devlink_rate_to_mbps(struct mlx5_core_dev *mdev, const char *
 	return 0;
 }
 
+static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw,
+					       u32 *tc_bw)
+{
+	int i, num_tcs = esw_qos_num_tcs(esw->dev);
+
+	for (i = num_tcs; i < IEEE_8021QAZ_MAX_TCS; i++) {
+		if (tc_bw[i])
+			return false;
+	}
+
+	return true;
+}
+
+static bool esw_qos_tc_bw_disabled(u32 *tc_bw)
+{
+	int i;
+
+	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+		if (tc_bw[i])
+			return false;
+	}
+
+	return true;
+}
+
 int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
 {
 	if (esw->qos.domain)
@@ -921,9 +1197,28 @@ int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node,
 					 u32 *tc_bw,
 					 struct netlink_ext_ack *extack)
 {
-	NL_SET_ERR_MSG_MOD(extack,
-			   "TC bandwidth shares are not supported on nodes");
-	return -EOPNOTSUPP;
+	struct mlx5_esw_sched_node *node = priv;
+	struct mlx5_eswitch *esw = node->esw;
+	bool disable;
+	int err;
+
+	if (!esw_qos_validate_unsupported_tc_bw(esw, tc_bw)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "E-Switch traffic classes number is not supported");
+		return -EOPNOTSUPP;
+	}
+
+	disable = esw_qos_tc_bw_disabled(tc_bw);
+	esw_qos_lock(esw);
+	if (disable) {
+		err = esw_qos_node_disable_tc_arbitration(node, extack);
+		goto unlock;
+	}
+
+	err = esw_qos_node_enable_tc_arbitration(node, extack);
+unlock:
+	esw_qos_unlock(esw);
+	return err;
 }
 
 int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv,
-- 
2.31.1

From nobody Tue Dec 16 15:29:11 2025
From: Tariq Toukan
To: "David S. Miller", Jakub Kicinski, Paolo Abeni, Eric Dumazet, "Andrew Lunn"
CC: Jiri Pirko, Gal Pressman, "Leon Romanovsky", Donald Hunter, "Jiri Pirko", Jonathan Corbet, Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Moshe Shemesh, Mark Bloch, Carolina Jubran, Cosmin Ratiu
Subject: [PATCH net-next V8 4/5] net/mlx5: Add traffic class scheduling support for vport QoS
Date: Tue, 6 May 2025 14:26:42 +0300
Message-ID: <1746530803-450152-5-git-send-email-tariqt@nvidia.com>
X-Mailer: git-send-email 2.8.0
In-Reply-To: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
References: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Carolina Jubran

Introduce support for traffic class (TC) scheduling on vports by
allowing a vport to own multiple TC scheduling nodes. This patch enables
more granular control of QoS by defining three distinct QoS states for
vports, each providing unique scheduling behavior:

1. Regular QoS: the `sched_node` represents the vport directly, handling
   QoS as a single scheduling entity.
2. TC QoS on the vport: the `sched_node` acts as a TC arbiter, enabling
   TC scheduling directly on the vport.
3. TC QoS on the parent node: the `sched_node` functions as a rate
   limiter, with TC arbitration enabled at the parent level, associating
   multiple scheduling nodes with each vport.

Key changes include:

- Added support for two new scheduling element types: vport traffic
  class and rate limiter.
- New helper functions for creating, destroying, and restoring vport TC
  scheduling nodes, handling transitions between regular QoS and TC
  arbitration states.
- Updated `esw_qos_vport_enable()` and `esw_qos_vport_disable()` to
  support both regular QoS and TC arbitration states, ensuring
  consistent transitions between scheduling modes.
- Introduced a `sched_nodes` array under `vport->qos` to store multiple
  TC scheduling nodes per vport, enabling finer control over per-TC QoS.
- Enhanced `esw_qos_vport_update_parent()` to handle transitions between
  the three QoS states based on the current and new parent node types.

This patch lays the groundwork for future support for configuring tc-bw
on vports. Although the infrastructure is in place, full support for
tc-bw is not yet implemented; attempts to set tc-bw on vports return
`-EOPNOTSUPP`. No functional changes are introduced at this stage.
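[Editor's note: the three-state transition rule described above can be
sketched as a small standalone C function. The enum values mirror the
patch, but `pick_vport_type()` itself is a simplified illustration of
the rule in `esw_qos_vport_update_parent()`, not the kernel code.]

```c
/* Simplified stand-ins for the patch's scheduling node types. */
enum sched_node_type {
	SCHED_NODE_TYPE_VPORTS_TSAR,
	SCHED_NODE_TYPE_VPORT,
	SCHED_NODE_TYPE_TC_ARBITER_TSAR,
	SCHED_NODE_TYPE_RATE_LIMITER,
	SCHED_NODE_TYPE_VPORT_TC,
};

/* Pick the vport node type for a new parent: a TC arbiter parent forces
 * the vport into rate-limiter mode (TC QoS on the parent); leaving a TC
 * arbiter parent falls back to a regular vport node; otherwise the
 * vport keeps its current type.
 */
static enum sched_node_type
pick_vport_type(enum sched_node_type parent_type,
		enum sched_node_type curr_parent_type,
		enum sched_node_type curr_type)
{
	if (parent_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
		return SCHED_NODE_TYPE_RATE_LIMITER;
	if (curr_parent_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
		return SCHED_NODE_TYPE_VPORT;
	return curr_type;
}
```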
Signed-off-by: Carolina Jubran
Reviewed-by: Cosmin Ratiu
Signed-off-by: Tariq Toukan
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 438 ++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  14 +-
 2 files changed, 422 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 9a92121cb4cb..8893aaf32724 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -65,12 +65,16 @@ enum sched_node_type {
 	SCHED_NODE_TYPE_VPORTS_TSAR,
 	SCHED_NODE_TYPE_VPORT,
 	SCHED_NODE_TYPE_TC_ARBITER_TSAR,
+	SCHED_NODE_TYPE_RATE_LIMITER,
+	SCHED_NODE_TYPE_VPORT_TC,
 };
 
 static const char * const sched_node_type_str[] = {
 	[SCHED_NODE_TYPE_VPORTS_TSAR] = "vports TSAR",
 	[SCHED_NODE_TYPE_VPORT] = "vport",
 	[SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR",
+	[SCHED_NODE_TYPE_RATE_LIMITER] = "Rate Limiter",
+	[SCHED_NODE_TYPE_VPORT_TC] = "vport TC",
 };
 
 struct mlx5_esw_sched_node {
@@ -94,6 +98,8 @@ struct mlx5_esw_sched_node {
 	struct mlx5_vport *vport;
 	/* Level in the hierarchy. The root node level is 1. */
 	u8 level;
+	/* Valid only when this node represents a traffic class. */
+	u8 tc;
 };
 
 static void esw_qos_node_attach_to_parent(struct mlx5_esw_sched_node *node)
@@ -148,6 +154,15 @@ static void esw_qos_nodes_set_parent(struct list_head *nodes,
 
 void mlx5_esw_qos_vport_qos_free(struct mlx5_vport *vport)
 {
+	if (vport->qos.sched_nodes) {
+		int num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev);
+		int i;
+
+		for (i = 0; i < num_tcs; i++)
+			kfree(vport->qos.sched_nodes[i]);
+		kfree(vport->qos.sched_nodes);
+	}
+
 	kfree(vport->qos.sched_node);
 	memset(&vport->qos, 0, sizeof(vport->qos));
 }
@@ -172,11 +187,19 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport)
 static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op)
 {
 	switch (node->type) {
+	case SCHED_NODE_TYPE_VPORT_TC:
+		esw_warn(node->esw->dev,
+			 "E-Switch %s %s scheduling element failed (vport=%d,tc=%d,err=%d)\n",
+			 op,
+			 sched_node_type_str[node->type],
+			 node->vport->vport, node->tc, err);
+		break;
 	case SCHED_NODE_TYPE_VPORT:
 		esw_warn(node->esw->dev,
 			 "E-Switch %s %s scheduling element failed (vport=%d,err=%d)\n",
 			 op, sched_node_type_str[node->type], node->vport->vport, err);
 		break;
+	case SCHED_NODE_TYPE_RATE_LIMITER:
 	case SCHED_NODE_TYPE_TC_ARBITER_TSAR:
 	case SCHED_NODE_TYPE_VPORTS_TSAR:
 		esw_warn(node->esw->dev,
@@ -271,6 +294,24 @@ static int esw_qos_sched_elem_config(struct mlx5_esw_sched_node *node, u32 max_r
 	return 0;
 }
 
+static int esw_qos_create_rate_limit_element(struct mlx5_esw_sched_node *node,
+					     struct netlink_ext_ack *extack)
+{
+	u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+
+	if (!mlx5_qos_element_type_supported(
+			node->esw->dev,
+			SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT,
+			SCHEDULING_HIERARCHY_E_SWITCH))
+		return -EOPNOTSUPP;
+
+	MLX5_SET(scheduling_context, sched_ctx, max_average_bw, node->max_rate);
+	MLX5_SET(scheduling_context, sched_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT);
+
+	return esw_qos_node_create_sched_element(node, sched_ctx, extack);
+}
+
 static u32 esw_qos_calculate_min_rate_divider(struct mlx5_eswitch *esw,
 					      struct mlx5_esw_sched_node *parent)
 {
@@ -388,28 +429,64 @@ esw_qos_create_node_sched_elem(struct mlx5_core_dev *dev, u32 parent_element_id,
 				tsar_ix);
 }
 
-static int esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_node,
-					      struct netlink_ext_ack *extack)
+static int
+esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_node,
+				   struct netlink_ext_ack *extack)
 {
 	u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
 	struct mlx5_core_dev *dev = vport_node->esw->dev;
 	void *attr;
 
-	if (!mlx5_qos_element_type_supported(dev,
-					     SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT,
-					     SCHEDULING_HIERARCHY_E_SWITCH))
+	if (!mlx5_qos_element_type_supported(
+			dev,
+			SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT,
+			SCHEDULING_HIERARCHY_E_SWITCH))
 		return -EOPNOTSUPP;
 
 	MLX5_SET(scheduling_context, sched_ctx, element_type,
 		 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT);
 	attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes);
 	MLX5_SET(vport_element, attr, vport_number, vport_node->vport->vport);
-	MLX5_SET(scheduling_context, sched_ctx, parent_element_id, vport_node->parent->ix);
-	MLX5_SET(scheduling_context, sched_ctx, max_average_bw, vport_node->max_rate);
+	MLX5_SET(scheduling_context, sched_ctx, parent_element_id,
+		 vport_node->parent->ix);
+	MLX5_SET(scheduling_context, sched_ctx, max_average_bw,
+		 vport_node->max_rate);
 
 	return esw_qos_node_create_sched_element(vport_node, sched_ctx, extack);
 }
 
+static int
+esw_qos_vport_tc_create_sched_element(struct mlx5_esw_sched_node *vport_tc_node,
+				      u32 rate_limit_elem_ix,
+				      struct netlink_ext_ack *extack)
+{
+	u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+	struct mlx5_core_dev *dev = vport_tc_node->esw->dev;
+	void *attr;
+
+	if (!mlx5_qos_element_type_supported(
+			dev,
+			SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC,
+			SCHEDULING_HIERARCHY_E_SWITCH))
+		return -EOPNOTSUPP;
+
+	MLX5_SET(scheduling_context, sched_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC);
+	attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes);
+	MLX5_SET(vport_tc_element, attr, vport_number,
+		 vport_tc_node->vport->vport);
+	MLX5_SET(vport_tc_element, attr, traffic_class, vport_tc_node->tc);
+	MLX5_SET(scheduling_context, sched_ctx, max_bw_obj_id,
+		 rate_limit_elem_ix);
+	MLX5_SET(scheduling_context, sched_ctx, parent_element_id,
+		 vport_tc_node->parent->ix);
+	MLX5_SET(scheduling_context, sched_ctx, bw_share,
+		 vport_tc_node->bw_share);
+
+	return esw_qos_node_create_sched_element(vport_tc_node, sched_ctx,
+						 extack);
+}
+
 static struct mlx5_esw_sched_node *
 __esw_qos_alloc_node(struct mlx5_eswitch *esw, u32 tsar_ix, enum sched_node_type type,
 		     struct mlx5_esw_sched_node *parent)
@@ -617,12 +694,202 @@ static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node,
 	return -EOPNOTSUPP;
 }
 
+static int
+esw_qos_create_vport_tc_sched_node(struct mlx5_vport *vport,
+				   u32 rate_limit_elem_ix,
+				   struct mlx5_esw_sched_node *vports_tc_node,
+				   struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	struct mlx5_esw_sched_node *vport_tc_node;
+	u8 tc = vports_tc_node->tc;
+	int err;
+
+	vport_tc_node = __esw_qos_alloc_node(vport_node->esw, 0,
+					     SCHED_NODE_TYPE_VPORT_TC,
+					     vports_tc_node);
+	if (!vport_tc_node)
+		return -ENOMEM;
+
+	vport_tc_node->min_rate = vport_node->min_rate;
+	vport_tc_node->tc = tc;
+	vport_tc_node->vport = vport;
+	err = esw_qos_vport_tc_create_sched_element(vport_tc_node,
+						    rate_limit_elem_ix,
+						    extack);
+	if (err)
+		goto err_out;
+
+	vport->qos.sched_nodes[tc] = vport_tc_node;
+
+	return 0;
+err_out:
+	__esw_qos_free_node(vport_tc_node);
+	return err;
+}
+
+static void
+esw_qos_destroy_vport_tc_sched_elements(struct mlx5_vport *vport,
+					struct netlink_ext_ack *extack)
+{
+	int i, num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev);
+
+	for (i = 0; i < num_tcs; i++) {
+		if (vport->qos.sched_nodes[i]) {
+			__esw_qos_destroy_node(vport->qos.sched_nodes[i],
+					       extack);
+		}
+	}
+
+	kfree(vport->qos.sched_nodes);
+	vport->qos.sched_nodes = NULL;
+}
+
+static int
+esw_qos_create_vport_tc_sched_elements(struct mlx5_vport *vport,
+				       enum sched_node_type type,
+				       struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	struct mlx5_esw_sched_node *tc_arbiter_node, *vports_tc_node;
+	int err, num_tcs = esw_qos_num_tcs(vport_node->esw->dev);
+	u32 rate_limit_elem_ix;
+
+	vport->qos.sched_nodes = kcalloc(num_tcs,
+					 sizeof(struct mlx5_esw_sched_node *),
+					 GFP_KERNEL);
+	if (!vport->qos.sched_nodes) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Allocating the vport TC scheduling elements failed.");
+		return -ENOMEM;
+	}
+
+	rate_limit_elem_ix = type == SCHED_NODE_TYPE_RATE_LIMITER ?
+			     vport_node->ix : 0;
+	tc_arbiter_node = type == SCHED_NODE_TYPE_RATE_LIMITER ?
+			  vport_node->parent : vport_node;
+	list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) {
+		err = esw_qos_create_vport_tc_sched_node(vport,
+							 rate_limit_elem_ix,
+							 vports_tc_node,
+							 extack);
+		if (err)
+			goto err_create_vport_tc;
+	}
+
+	return 0;
+
+err_create_vport_tc:
+	esw_qos_destroy_vport_tc_sched_elements(vport, NULL);
+
+	return err;
+}
+
+static int
+esw_qos_vport_tc_enable(struct mlx5_vport *vport, enum sched_node_type type,
+			struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	int err, new_level, max_level;
+
+	if (type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+		/* Increase the parent's level by 2 to account for both the
+		 * TC arbiter and the vports TC scheduling element.
+		 */
+		new_level = vport_node->parent->level + 2;
+		max_level = 1 << MLX5_CAP_QOS(vport_node->esw->dev,
					      log_esw_max_sched_depth);
+		if (new_level > max_level) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "TC arbitration on leafs is not supported beyond max scheduling depth");
+			return -EOPNOTSUPP;
+		}
+	}
+
+	esw_assert_qos_lock_held(vport->dev->priv.eswitch);
+
+	if (type == SCHED_NODE_TYPE_RATE_LIMITER)
+		err = esw_qos_create_rate_limit_element(vport_node, extack);
+	else
+		err = esw_qos_tc_arbiter_scheduling_setup(vport_node, extack);
+	if (err)
+		return err;
+
+	/* Rate limiters impact multiple nodes not directly connected to them
+	 * and are not direct members of the QoS hierarchy.
+	 * Unlink it from the parent to reflect that.
+	 */
+	if (type == SCHED_NODE_TYPE_RATE_LIMITER) {
+		list_del_init(&vport_node->entry);
+		vport_node->level = 0;
+	}
+
+	err = esw_qos_create_vport_tc_sched_elements(vport, type, extack);
+	if (err)
+		goto err_sched_nodes;
+
+	return 0;
+
+err_sched_nodes:
+	if (type == SCHED_NODE_TYPE_RATE_LIMITER) {
+		esw_qos_node_destroy_sched_element(vport_node, NULL);
+		list_add_tail(&vport_node->entry,
+			      &vport_node->parent->children);
+		vport_node->level = vport_node->parent->level + 1;
+	} else {
+		esw_qos_tc_arbiter_scheduling_teardown(vport_node, NULL);
+	}
+	return err;
+}
+
+static void esw_qos_vport_tc_disable(struct mlx5_vport *vport,
+				     struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	enum sched_node_type curr_type = vport_node->type;
+
+	esw_qos_destroy_vport_tc_sched_elements(vport, extack);
+
+	if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER)
+		esw_qos_node_destroy_sched_element(vport_node, extack);
+	else
+		esw_qos_tc_arbiter_scheduling_teardown(vport_node, extack);
+}
+
+static int esw_qos_set_vport_tcs_min_rate(struct mlx5_vport *vport,
+					  u32 min_rate,
+					  struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	int err, i, num_tcs = esw_qos_num_tcs(vport_node->esw->dev);
+
+	for (i = 0; i < num_tcs; i++) {
+		err = esw_qos_set_node_min_rate(vport->qos.sched_nodes[i],
+						min_rate, extack);
+		if (err)
+			goto err_out;
+	}
+	vport_node->min_rate = min_rate;
+
+	return 0;
+err_out:
+	for (--i; i >= 0; i--) {
+		esw_qos_set_node_min_rate(vport->qos.sched_nodes[i],
+					  vport_node->min_rate, extack);
+	}
+	return err;
+}
+
 static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack)
 {
 	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
 	struct mlx5_esw_sched_node *parent = vport_node->parent;
+	enum sched_node_type curr_type = vport_node->type;
 
-	esw_qos_node_destroy_sched_element(vport_node, extack);
+	if (curr_type == SCHED_NODE_TYPE_VPORT)
+		esw_qos_node_destroy_sched_element(vport_node, extack);
+	else
+		esw_qos_vport_tc_disable(vport, extack);
 
 	vport_node->bw_share = 0;
 	list_del_init(&vport_node->entry);
@@ -631,7 +898,9 @@ static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_a
 	trace_mlx5_esw_vport_qos_destroy(vport_node->esw->dev, vport);
 }
 
-static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent,
+static int esw_qos_vport_enable(struct mlx5_vport *vport,
+				enum sched_node_type type,
+				struct mlx5_esw_sched_node *parent,
 				struct netlink_ext_ack *extack)
 {
 	int err;
@@ -639,10 +908,16 @@ static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_
 	esw_assert_qos_lock_held(vport->dev->priv.eswitch);
 
 	esw_qos_node_set_parent(vport->qos.sched_node, parent);
-	err = esw_qos_vport_create_sched_element(vport->qos.sched_node, extack);
+	if (type == SCHED_NODE_TYPE_VPORT) {
+		err = esw_qos_vport_create_sched_element(vport->qos.sched_node,
+							 extack);
+	} else {
+		err = esw_qos_vport_tc_enable(vport, type, extack);
+	}
 	if (err)
 		return err;
 
+	vport->qos.sched_node->type = type;
 	esw_qos_normalize_min_rate(parent->esw, parent, extack);
 	trace_mlx5_esw_vport_qos_create(vport->dev, vport,
 					vport->qos.sched_node->max_rate,
@@ -673,9 +948,8 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t
 	sched_node->min_rate = min_rate;
 	sched_node->vport = vport;
 	vport->qos.sched_node = sched_node;
-	err = esw_qos_vport_enable(vport, parent, extack);
+	err = esw_qos_vport_enable(vport, type, parent, extack);
 	if (err) {
-		__esw_qos_free_node(sched_node);
 		esw_qos_put(esw);
 		vport->qos.sched_node = NULL;
 	}
@@ -728,6 +1002,8 @@ static int mlx5_esw_qos_set_vport_min_rate(struct mlx5_vport *vport, u32 min_rat
 	if (!vport_node)
 		return mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, NULL, 0, min_rate, extack);
+	else if (vport_node->type == SCHED_NODE_TYPE_RATE_LIMITER)
+		return esw_qos_set_vport_tcs_min_rate(vport, min_rate, extack);
 	else
 		return esw_qos_set_node_min_rate(vport_node, min_rate, extack);
 }
@@ -760,12 +1036,60 @@ bool mlx5_esw_qos_get_vport_rate(struct mlx5_vport *vport, u32 *max_rate, u32 *m
 	return enabled;
 }
 
+static int esw_qos_vport_tc_check_type(enum sched_node_type curr_type,
+				       enum sched_node_type new_type,
+				       struct netlink_ext_ack *extack)
+{
+	if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR &&
+	    new_type == SCHED_NODE_TYPE_RATE_LIMITER) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Cannot switch from vport-level TC arbitration to node-level TC arbitration");
+		return -EOPNOTSUPP;
+	}
+
+	if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER &&
+	    new_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Cannot switch from node-level TC arbitration to vport-level TC arbitration");
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int esw_qos_vport_update(struct mlx5_vport *vport,
+				enum sched_node_type type,
+				struct mlx5_esw_sched_node *parent,
+				struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent;
+	enum sched_node_type curr_type = vport->qos.sched_node->type;
+	int err;
+
+	esw_assert_qos_lock_held(vport->dev->priv.eswitch);
+	parent = parent ?: curr_parent;
+	if (curr_type == type && curr_parent == parent)
+		return 0;
+
+	err = esw_qos_vport_tc_check_type(curr_type, type, extack);
+	if (err)
+		return err;
+
+	esw_qos_vport_disable(vport, extack);
+
+	err = esw_qos_vport_enable(vport, type, parent, extack);
+	if (err)
+		esw_qos_vport_enable(vport, curr_type, curr_parent, NULL);
+
+	return err;
+}
+
 static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent,
 				       struct netlink_ext_ack *extack)
 {
 	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
 	struct mlx5_esw_sched_node *curr_parent;
-	int err;
+	enum sched_node_type type;
 
 	esw_assert_qos_lock_held(esw);
 	curr_parent = vport->qos.sched_node->parent;
@@ -773,16 +1097,17 @@ static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw
 	if (curr_parent == parent)
 		return 0;
 
-	esw_qos_vport_disable(vport, extack);
-
-	err = esw_qos_vport_enable(vport, parent, extack);
-	if (err) {
-		if (esw_qos_vport_enable(vport, curr_parent, NULL))
-			esw_warn(parent->esw->dev, "vport restore QoS failed (vport=%d)\n",
-				 vport->vport);
-	}
+	/* Set vport QoS type based on parent node type if different from
+	 * default QoS; otherwise, use the vport's current QoS type.
+	 */
+	if (parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+		type = SCHED_NODE_TYPE_RATE_LIMITER;
+	else if (curr_parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+		type = SCHED_NODE_TYPE_VPORT;
+	else
+		type = vport->qos.sched_node->type;
 
-	return err;
+	return esw_qos_vport_update(vport, type, parent, extack);
 }
 
 static void
@@ -1112,6 +1437,16 @@ static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw,
 	return true;
 }
 
+static bool esw_qos_vport_validate_unsupported_tc_bw(struct mlx5_vport *vport,
+						     u32 *tc_bw)
+{
+	struct mlx5_eswitch *esw = vport->qos.sched_node ?
+				   vport->qos.sched_node->parent->esw :
+				   vport->dev->priv.eswitch;
+
+	return esw_qos_validate_unsupported_tc_bw(esw, tc_bw);
+}
+
 static bool esw_qos_tc_bw_disabled(u32 *tc_bw)
 {
 	int i;
@@ -1187,9 +1522,50 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf,
 					 u32 *tc_bw,
 					 struct netlink_ext_ack *extack)
 {
-	NL_SET_ERR_MSG_MOD(extack,
-			   "TC bandwidth shares are not supported on leafs");
-	return -EOPNOTSUPP;
+	struct mlx5_esw_sched_node *vport_node;
+	struct mlx5_vport *vport = priv;
+	struct mlx5_eswitch *esw;
+	bool disable;
+	int err = 0;
+
+	esw = vport->dev->priv.eswitch;
+	if (!mlx5_esw_allowed(esw))
+		return -EPERM;
+
+	disable = esw_qos_tc_bw_disabled(tc_bw);
+	esw_qos_lock(esw);
+
+	if (!esw_qos_vport_validate_unsupported_tc_bw(vport, tc_bw)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "E-Switch traffic classes number is not supported");
+		err = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	vport_node = vport->qos.sched_node;
+	if (disable && !vport_node)
+		goto unlock;
+
+	if (disable) {
+		if (vport_node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+			err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_VPORT,
+						   NULL, extack);
+		goto unlock;
+	}
+
+	if (!vport_node) {
+		err = mlx5_esw_qos_vport_enable(vport,
+						SCHED_NODE_TYPE_TC_ARBITER_TSAR,
+						NULL, 0, 0, extack);
+		vport_node = vport->qos.sched_node;
+	} else {
+		err = esw_qos_vport_update(vport,
+					   SCHED_NODE_TYPE_TC_ARBITER_TSAR,
+					   NULL, extack);
+	}
+unlock:
+	esw_qos_unlock(esw);
+	return err;
 }
 
 int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node,
@@ -1311,10 +1687,16 @@ int mlx5_esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_s
 	}
 
 	esw_qos_lock(esw);
-	if (!vport->qos.sched_node && parent)
-		err = mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, parent, 0, 0, extack);
-	else if (vport->qos.sched_node)
+	if (!vport->qos.sched_node && parent) {
+		enum sched_node_type type;
+
+		type = parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR ?
+		       SCHED_NODE_TYPE_RATE_LIMITER : SCHED_NODE_TYPE_VPORT;
+		err = mlx5_esw_qos_vport_enable(vport, type, parent, 0, 0,
+						extack);
+	} else if (vport->qos.sched_node) {
 		err = esw_qos_vport_update_parent(vport, parent, extack);
+	}
 	esw_qos_unlock(esw);
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 8573d36785f4..d59fdcb29cb8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -212,10 +212,20 @@ struct mlx5_vport {
 
 	struct mlx5_vport_info info;
 
-	/* Protected with the E-Switch qos domain lock. */
+	/* Protected with the E-Switch qos domain lock. The Vport QoS can
+	 * either be disabled (sched_node is NULL) or in one of three states:
+	 * 1. Regular QoS (sched_node is a vport node).
+	 * 2. TC QoS enabled on the vport (sched_node is a TC arbiter).
+	 * 3. TC QoS enabled on the vport's parent node
+	 *    (sched_node is a rate limit node).
+	 * When TC is enabled in either mode, the vport owns vport TC scheduling
+	 * nodes.
+	 */
 	struct {
-		/* Vport scheduling element node. */
+		/* Vport scheduling node. */
 		struct mlx5_esw_sched_node *sched_node;
+		/* Array of vport traffic class scheduling nodes. */
+		struct mlx5_esw_sched_node **sched_nodes;
 	} qos;
 
 	u16 vport;
-- 
2.31.1

From nobody Tue Dec 16 15:29:11 2025
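[Editor's note: `esw_qos_set_vport_tcs_min_rate()` in the patch above
uses an apply-or-roll-back loop over the per-TC nodes. The sketch below
models that pattern outside the kernel; `set_rate()` and the per-node
capacity check are hypothetical stand-ins for
`esw_qos_set_node_min_rate()` and its failure modes, not the driver's
API.]

```c
/* Stand-in for esw_qos_set_node_min_rate(): fails when the requested
 * rate exceeds the node's (hypothetical) capacity.
 */
static int set_rate(unsigned int *node, unsigned int cap, unsigned int rate)
{
	if (rate > cap)
		return -1;
	*node = rate;
	return 0;
}

/* Apply new_rate to every per-TC node; on failure, walk back over the
 * nodes already updated and restore old_rate, mirroring the patch's
 * err_out rollback loop.
 */
static int set_tcs_min_rate(unsigned int *nodes, const unsigned int *caps,
			    int num_tcs, unsigned int old_rate,
			    unsigned int new_rate)
{
	int err, i;

	for (i = 0; i < num_tcs; i++) {
		err = set_rate(&nodes[i], caps[i], new_rate);
		if (err)
			goto err_out;
	}
	return 0;

err_out:
	/* Roll back only the entries touched before the failure. */
	for (--i; i >= 0; i--)
		set_rate(&nodes[i], caps[i], old_rate);
	return err;
}
```

The key detail, as in the patch, is that the rollback loop starts at
`--i` so the failing entry itself is never "restored".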
b=K86lklMQjmp/M4Q/WhnWRowC3B0U3+5ri9xpS1+0/LOLC9Yqbc9BWy4FTL43F7PpHyKquzUQlpYXoEK49KeTKsp8mF52CLDeIBfNnhHvaemI79SNEcG6Qfwo/VPbXeSWrjZrUE6pSs8Om18pCak8iO5Gbr/5xtzPdV3SdY7OWbB4dMTy4Ltpbvo2uRPWdBI1Jv2hqDfP5FhzHuLNvl1dYjefuUm6Hym9jsiqN2rpQ5lsiUInTh1k1DzVuMZii+6nPv1zjBQgepXvtDacaf4cx8TYw39lE2/bmfCTh4wtfgS4vwib242ET8GLkkw4PUP5z7HCteeGGwObwBnzsi0Xhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=HxB5w+JLi0kgwlejSF/7asaTy44ctL1dcvR3ldNYdiY=; b=ExlB8ITHB9LrMwBs0e7x1csK8Vuk1GXTJk5R3vkzImLqlKvINYr5YUgGRgzPqW8I7ayYW91ZqE2AibJPPQfvAGlQM5qEzlVJyFFX+6b03q+AVNdmOJt4UNAAotsqMYjFC0jAcO2rD8fPrLWkluz0xg05x3klmJEXOlpC29J7HMJLyVmrD/1Yqos14DlzYO78NwYqGnyrbZGnLKytB5ySwoeSmKIZGkzaYFkbWK9lvmaWILQ65ggGUb4u9HqHh6fHfcmPEGrHIUFxNgfBm1O47ulgktnOPn627B4HqyYB4ZIbDxlFFZ5V/cewG38Wx/EfeD4k0OZl1Sj0XSpcgOt37w== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.233) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=HxB5w+JLi0kgwlejSF/7asaTy44ctL1dcvR3ldNYdiY=; b=Kw+7oiiYduXN36WkEOOyJzXes/TZm70tDmvlWDiRF2BDjHfWrp2dQ3dgUL9nXYb4F9PjLyMOCECMuedtriRYAPPnpmJltwkPlR606czGnz+94NW8kago+YajuTOC+3HpEZMbiTx+4HH/yBU9mrPUP2DXCBN4P9U0mPPZlvWPDPYtRgqk0eskovet2DEG3ZQ+1l/RdXeUqVgNgq/m9H2oEgDFwDG2uo0USOspjNGCU770YK0mw8IBi6WyK386wZzPXaVQV5M8poR4vI3pt2d/VwQ6RddPkeTLz6eBJbDJNxmvvviOyukRA07SV8LLsCUZWOoA1rFs+lI9eNt0WyIvmA== Received: from SJ0PR05CA0082.namprd05.prod.outlook.com (2603:10b6:a03:332::27) by IA1PR12MB7662.namprd12.prod.outlook.com (2603:10b6:208:425::20) 
From: Tariq Toukan
To: "David S. Miller", Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn
CC: Jiri Pirko, Gal Pressman, Leon Romanovsky, Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Moshe Shemesh, Mark Bloch, Carolina Jubran, Cosmin Ratiu
Subject: [PATCH net-next V8 5/5] net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw
Date: Tue, 6 May 2025 14:26:43 +0300
Message-ID: <1746530803-450152-6-git-send-email-tariqt@nvidia.com>
X-Mailer: git-send-email 2.8.0
In-Reply-To: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
References: <1746530803-450152-1-git-send-email-tariqt@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Carolina Jubran

Introduce support for managing Traffic Class (TC) arbiter nodes and the associated vports TC nodes within the E-Switch QoS hierarchy. This patch adds the new scheduling node type, `SCHED_NODE_TYPE_VPORTS_TC_TSAR`, and implements full support for setting tc-bw on both vports and nodes.

Key changes include:

- Introduced the new scheduling node type, `SCHED_NODE_TYPE_VPORTS_TC_TSAR`, for managing vports within a TC arbiter node.
- Added helper functions for creating and destroying vports TC nodes under the TC arbiter.
- Updated the minimum-rate normalization function to skip nodes of type `SCHED_NODE_TYPE_VPORTS_TC_TSAR`. Vports TC TSARs have bandwidth shares configured on them but no minimum rates, so their `min_rate` cannot be normalized.
- Implemented `esw_qos_tc_arbiter_scheduling_setup()` and `esw_qos_tc_arbiter_scheduling_teardown()` for initializing and cleaning up TC arbiter scheduling elements. These functions now fully support tc-bw configuration on TC arbiter nodes.
- Added `esw_qos_tc_arbiter_get_bw_shares()` and `esw_qos_set_tc_arbiter_bw_shares()` to handle the setting of bandwidth shares for vports traffic class TSARs.
- Refactored `mlx5_esw_devlink_rate_node_tc_bw_set()` and `mlx5_esw_devlink_rate_leaf_tc_bw_set()` to fully support configuring tc-bw on devlink rate nodes and vports, respectively.
- Refactored `mlx5_esw_qos_node_update_parent()` so that tc-bw configuration remains compatible with setting a parent on a rate node, preserving level hierarchy functionality.
Signed-off-by: Carolina Jubran
Reviewed-by: Cosmin Ratiu
Signed-off-by: Tariq Toukan
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 264 +++++++++++++++++-
 1 file changed, 257 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 8893aaf32724..fac5058025ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -67,6 +67,7 @@ enum sched_node_type {
 	SCHED_NODE_TYPE_TC_ARBITER_TSAR,
 	SCHED_NODE_TYPE_RATE_LIMITER,
 	SCHED_NODE_TYPE_VPORT_TC,
+	SCHED_NODE_TYPE_VPORTS_TC_TSAR,
 };
 
 static const char * const sched_node_type_str[] = {
@@ -75,6 +76,7 @@ static const char * const sched_node_type_str[] = {
 	[SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR",
 	[SCHED_NODE_TYPE_RATE_LIMITER] = "Rate Limiter",
 	[SCHED_NODE_TYPE_VPORT_TC] = "vport TC",
+	[SCHED_NODE_TYPE_VPORTS_TC_TSAR] = "vports TC TSAR",
 };
 
 struct mlx5_esw_sched_node {
@@ -187,6 +189,11 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport)
 static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op)
 {
 	switch (node->type) {
+	case SCHED_NODE_TYPE_VPORTS_TC_TSAR:
+		esw_warn(node->esw->dev,
+			 "E-Switch %s %s scheduling element failed (tc=%d,err=%d)\n",
+			 op, sched_node_type_str[node->type], node->tc, err);
+		break;
 	case SCHED_NODE_TYPE_VPORT_TC:
 		esw_warn(node->esw->dev,
 			 "E-Switch %s %s scheduling element failed (vport=%d,tc=%d,err=%d)\n",
@@ -376,7 +383,13 @@ static void esw_qos_normalize_min_rate(struct mlx5_eswitch *esw,
 		if (node->esw != esw || node->ix == esw->qos.root_tsar_ix)
 			continue;
 
-		esw_qos_update_sched_node_bw_share(node, divider, extack);
+		/* Vports TC TSARs don't have a minimum rate configured,
+		 * so there's no need to update the bw_share on them.
+		 */
+		if (node->type != SCHED_NODE_TYPE_VPORTS_TC_TSAR) {
+			esw_qos_update_sched_node_bw_share(node, divider,
+							   extack);
+		}
 
 		if (list_empty(&node->children))
 			continue;
@@ -527,6 +540,144 @@ static void esw_qos_destroy_node(struct mlx5_esw_sched_node *node, struct netlin
 	__esw_qos_free_node(node);
 }
 
+static int esw_qos_create_vports_tc_node(struct mlx5_esw_sched_node *parent,
+					 u8 tc, struct netlink_ext_ack *extack)
+{
+	u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+	struct mlx5_core_dev *dev = parent->esw->dev;
+	struct mlx5_esw_sched_node *vports_tc_node;
+	void *attr;
+	int err;
+
+	if (!mlx5_qos_element_type_supported(
+		dev,
+		SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR,
+		SCHEDULING_HIERARCHY_E_SWITCH) ||
+	    !mlx5_qos_tsar_type_supported(dev,
+					  TSAR_ELEMENT_TSAR_TYPE_DWRR,
+					  SCHEDULING_HIERARCHY_E_SWITCH))
+		return -EOPNOTSUPP;
+
+	vports_tc_node = __esw_qos_alloc_node(parent->esw, 0,
+					      SCHED_NODE_TYPE_VPORTS_TC_TSAR,
+					      parent);
+	if (!vports_tc_node) {
+		NL_SET_ERR_MSG_MOD(extack, "E-Switch alloc node failed");
+		esw_warn(dev, "Failed to alloc vports TC node (tc=%d)\n", tc);
+		return -ENOMEM;
+	}
+
+	attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes);
+	MLX5_SET(tsar_element, attr, tsar_type, TSAR_ELEMENT_TSAR_TYPE_DWRR);
+	MLX5_SET(tsar_element, attr, traffic_class, tc);
+	MLX5_SET(scheduling_context, tsar_ctx, parent_element_id, parent->ix);
+	MLX5_SET(scheduling_context, tsar_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR);
+
+	err = esw_qos_node_create_sched_element(vports_tc_node, tsar_ctx,
+						extack);
+	if (err)
+		goto err_create_sched_element;
+
+	vports_tc_node->tc = tc;
+
+	return 0;
+
+err_create_sched_element:
+	__esw_qos_free_node(vports_tc_node);
+	return err;
+}
+
+static void
+esw_qos_tc_arbiter_get_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node,
+				 u32 *tc_bw)
+{
+	struct mlx5_esw_sched_node *vports_tc_node;
+
+	list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry)
+		tc_bw[vports_tc_node->tc] = vports_tc_node->bw_share;
+}
+
+static void
+esw_qos_set_tc_arbiter_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node,
+				 u32 *tc_bw, struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vports_tc_node;
+
+	list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) {
+		u32 bw_share;
+		u8 tc;
+
+		tc = vports_tc_node->tc;
+		bw_share = tc_bw[tc] ?: MLX5_MIN_BW_SHARE;
+		esw_qos_sched_elem_config(vports_tc_node, 0, bw_share, extack);
+	}
+}
+
+static void
+esw_qos_destroy_vports_tc_nodes(struct mlx5_esw_sched_node *tc_arbiter_node,
+				struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vports_tc_node, *tmp;
+
+	list_for_each_entry_safe(vports_tc_node, tmp,
+				 &tc_arbiter_node->children, entry)
+		esw_qos_destroy_node(vports_tc_node, extack);
+}
+
+static int
+esw_qos_create_vports_tc_nodes(struct mlx5_esw_sched_node *tc_arbiter_node,
+			       struct netlink_ext_ack *extack)
+{
+	struct mlx5_eswitch *esw = tc_arbiter_node->esw;
+	int err, i, num_tcs = esw_qos_num_tcs(esw->dev);
+
+	for (i = 0; i < num_tcs; i++) {
+		err = esw_qos_create_vports_tc_node(tc_arbiter_node, i, extack);
+		if (err)
+			goto err_tc_node_create;
+	}
+
+	return 0;
+
+err_tc_node_create:
+	esw_qos_destroy_vports_tc_nodes(tc_arbiter_node, NULL);
+	return err;
+}
+
+static int esw_qos_create_tc_arbiter_sched_elem(
+		struct mlx5_esw_sched_node *tc_arbiter_node,
+		struct netlink_ext_ack *extack)
+{
+	u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+	u32 tsar_parent_ix;
+	void *attr;
+
+	if (!mlx5_qos_tsar_type_supported(tc_arbiter_node->esw->dev,
+					  TSAR_ELEMENT_TSAR_TYPE_TC_ARB,
+					  SCHEDULING_HIERARCHY_E_SWITCH)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "E-Switch TC Arbiter scheduling element is not supported");
+		return -EOPNOTSUPP;
+	}
+
+	attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes);
+	MLX5_SET(tsar_element, attr, tsar_type, TSAR_ELEMENT_TSAR_TYPE_TC_ARB);
+	tsar_parent_ix = tc_arbiter_node->parent ? tc_arbiter_node->parent->ix :
+			 tc_arbiter_node->esw->qos.root_tsar_ix;
+	MLX5_SET(scheduling_context, tsar_ctx, parent_element_id,
+		 tsar_parent_ix);
+	MLX5_SET(scheduling_context, tsar_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR);
+	MLX5_SET(scheduling_context, tsar_ctx, max_average_bw,
+		 tc_arbiter_node->max_rate);
+	MLX5_SET(scheduling_context, tsar_ctx, bw_share,
+		 tc_arbiter_node->bw_share);
+
+	return esw_qos_node_create_sched_element(tc_arbiter_node, tsar_ctx,
+						 extack);
+}
+
 static struct mlx5_esw_sched_node *
 __esw_qos_create_vports_sched_node(struct mlx5_eswitch *esw, struct mlx5_esw_sched_node *parent,
 				   struct netlink_ext_ack *extack)
@@ -591,6 +742,9 @@ static void __esw_qos_destroy_node(struct mlx5_esw_sched_node *node, struct netl
 {
 	struct mlx5_eswitch *esw = node->esw;
 
+	if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+		esw_qos_destroy_vports_tc_nodes(node, extack);
+
 	trace_mlx5_esw_node_qos_destroy(esw->dev, node, node->ix);
 	esw_qos_destroy_node(node, extack);
 	esw_qos_normalize_min_rate(esw, NULL, extack);
@@ -685,13 +839,38 @@ static void esw_qos_put(struct mlx5_eswitch *esw)
 static void
 esw_qos_tc_arbiter_scheduling_teardown(struct mlx5_esw_sched_node *node,
 				       struct netlink_ext_ack *extack)
-{}
+{
+	/* Clean up all Vports TC nodes within the TC arbiter node. */
+	esw_qos_destroy_vports_tc_nodes(node, extack);
+	/* Destroy the scheduling element for the TC arbiter node itself. */
+	esw_qos_node_destroy_sched_element(node, extack);
+}
 
 static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node,
 					       struct netlink_ext_ack *extack)
 {
-	NL_SET_ERR_MSG_MOD(extack, "TC arbiter elements are not supported.");
-	return -EOPNOTSUPP;
+	u32 curr_ix = node->ix;
+	int err;
+
+	err = esw_qos_create_tc_arbiter_sched_elem(node, extack);
+	if (err)
+		return err;
+	/* Initialize the vports TC nodes within created TC arbiter TSAR. */
+	err = esw_qos_create_vports_tc_nodes(node, extack);
+	if (err)
+		goto err_vports_tc_nodes;
+
+	node->type = SCHED_NODE_TYPE_TC_ARBITER_TSAR;
+
+	return 0;
+
+err_vports_tc_nodes:
+	/* If initialization fails, clean up the scheduling element
+	 * for the TC arbiter node.
+	 */
+	esw_qos_node_destroy_sched_element(node, NULL);
+	node->ix = curr_ix;
+	return err;
 }
 
 static int
@@ -1064,6 +1243,7 @@ static int esw_qos_vport_update(struct mlx5_vport *vport,
 {
 	struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent;
 	enum sched_node_type curr_type = vport->qos.sched_node->type;
+	u32 curr_tc_bw[IEEE_8021QAZ_MAX_TCS] = {0};
 	int err;
 
 	esw_assert_qos_lock_held(vport->dev->priv.eswitch);
@@ -1075,11 +1255,23 @@ static int esw_qos_vport_update(struct mlx5_vport *vport,
 	if (err)
 		return err;
 
+	if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) {
+		esw_qos_tc_arbiter_get_bw_shares(vport->qos.sched_node,
+						 curr_tc_bw);
+	}
+
 	esw_qos_vport_disable(vport, extack);
 
 	err = esw_qos_vport_enable(vport, type, parent, extack);
-	if (err)
+	if (err) {
 		esw_qos_vport_enable(vport, curr_type, curr_parent, NULL);
+		extack = NULL;
+	}
+
+	if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) {
+		esw_qos_set_tc_arbiter_bw_shares(vport->qos.sched_node,
+						 curr_tc_bw, extack);
+	}
 
 	return err;
 }
@@ -1563,6 +1755,8 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf,
 						  SCHED_NODE_TYPE_TC_ARBITER_TSAR,
 						  NULL, extack);
 	}
+	if (!err)
+		esw_qos_set_tc_arbiter_bw_shares(vport_node, tc_bw, extack);
 unlock:
 	esw_qos_unlock(esw);
 	return err;
@@ -1592,6 +1786,8 @@ int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node,
 	}
 
 	err = esw_qos_node_enable_tc_arbitration(node, extack);
+	if (!err)
+		esw_qos_set_tc_arbiter_bw_shares(node, tc_bw, extack);
 unlock:
 	esw_qos_unlock(esw);
 	return err;
@@ -1716,6 +1912,15 @@ int mlx5_esw_devlink_rate_leaf_parent_set(struct devlink_rate *devlink_rate,
 	return mlx5_esw_qos_vport_update_parent(vport, node, extack);
 }
 
+static bool esw_qos_is_node_empty(struct mlx5_esw_sched_node *node)
+{
+	return list_empty(&node->children) ||
+	       (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR &&
+		esw_qos_is_node_empty(
+			list_first_entry(&node->children,
+					 struct mlx5_esw_sched_node, entry)));
+}
+
 static int
 mlx5_esw_qos_node_validate_set_parent(struct mlx5_esw_sched_node *node,
 				      struct mlx5_esw_sched_node *parent,
@@ -1729,13 +1934,26 @@ mlx5_esw_qos_node_validate_set_parent(struct mlx5_esw_sched_node *node,
 		return -EOPNOTSUPP;
 	}
 
-	if (!list_empty(&node->children)) {
+	if (!esw_qos_is_node_empty(node)) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Cannot reassign a node that contains rate objects");
 		return -EOPNOTSUPP;
 	}
 
+	if (parent && parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Cannot attach a node to a parent with TC bandwidth configured");
+		return -EOPNOTSUPP;
+	}
+
 	new_level = parent ? parent->level + 1 : 2;
+	if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+		/* Increase by one to account for the vports TC scheduling
+		 * element.
+		 */
+		new_level += 1;
+	}
+
 	max_level = 1 << MLX5_CAP_QOS(node->esw->dev, log_esw_max_sched_depth);
 	if (new_level > max_level) {
 		NL_SET_ERR_MSG_MOD(extack,
@@ -1746,6 +1964,32 @@ mlx5_esw_qos_node_validate_set_parent(struct mlx5_esw_sched_node *node,
 	return 0;
 }
 
+static int
+esw_qos_tc_arbiter_node_update_parent(struct mlx5_esw_sched_node *node,
+				      struct mlx5_esw_sched_node *parent,
+				      struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *curr_parent = node->parent;
+	u32 curr_tc_bw[IEEE_8021QAZ_MAX_TCS] = {0};
+	struct mlx5_eswitch *esw = node->esw;
+	int err;
+
+	esw_qos_tc_arbiter_get_bw_shares(node, curr_tc_bw);
+	esw_qos_tc_arbiter_scheduling_teardown(node, extack);
+	esw_qos_node_set_parent(node, parent);
+	err = esw_qos_tc_arbiter_scheduling_setup(node, extack);
+	if (err) {
+		esw_qos_node_set_parent(node, curr_parent);
+		if (esw_qos_tc_arbiter_scheduling_setup(node, extack)) {
+			esw_warn(esw->dev, "Node restore QoS failed\n");
+			return err;
+		}
+	}
+	esw_qos_set_tc_arbiter_bw_shares(node, curr_tc_bw, extack);
+
+	return err;
+}
+
 static int esw_qos_vports_node_update_parent(struct mlx5_esw_sched_node *node,
 					     struct mlx5_esw_sched_node *parent,
 					     struct netlink_ext_ack *extack)
@@ -1791,7 +2035,13 @@ static int mlx5_esw_qos_node_update_parent(struct mlx5_esw_sched_node *node,
 
 	esw_qos_lock(esw);
 	curr_parent = node->parent;
-	err = esw_qos_vports_node_update_parent(node, parent, extack);
+	if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+		err = esw_qos_tc_arbiter_node_update_parent(node, parent,
+							    extack);
+	} else {
+		err = esw_qos_vports_node_update_parent(node, parent, extack);
+	}
+
 	if (err)
 		goto out;
 
-- 
2.31.1