From: Chuang Wang
Cc: Chuang Wang, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Simon Horman, Stanislav Fomichev, Kuniyuki Iwashima,
    Samiullah Khawaja, Hangbin Liu, Neal Cardwell, Shakeel Butt,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v6] net: reduce RFS/ARFS flow updates by checking LLC affinity
Date: Tue, 2 Jun 2026 19:53:10 +0800
Message-ID: <20260602115323.3502-1-nashuiliang@gmail.com>

The current implementation of rps_record_sock_flow() updates the flow
table every time a socket is processed on a different CPU. In high-load
scenarios, especially with Accelerated RFS (ARFS), this triggers
frequent flow steering updates via ndo_rx_flow_steer.

For drivers like mlx5 that implement hardware flow steering, these
constant updates lead to significant contention on internal driver locks
(e.g., arfs_lock). This contention often becomes a performance
bottleneck that outweighs the steering benefits.

This patch introduces a cache-aware update strategy: the flow record is
only updated if the flow migrates across Last Level Cache (LLC)
boundaries. This minimizes expensive hardware reconfigurations while
preserving cache locality for the application.

A new sysctl, net.core.rps_feat_llc_affinity, is added to toggle this
feature.

Additionally, export sock_rps_record_flow_hash() and sock_rps_record_flow().
This resolves a symbol visibility compilation error triggered by 'tun'
using sock_rps_record_flow_hash() in tun_flow_update() when CONFIG_TUN
is built as a module. The same logic is applied to SCTP, allowing it to
use sock_rps_record_flow() safely when built as a module.

Performance Test Results:

The patch was tested in a K8s environment (AMD CPU 128*2, 16-core Pod
with CPU pinning, mlx5 NIC) using brpc[1] echo_server and rpc_press.

rpc_press Commands:

  for i in {1..8}; do
    ./rpc_press -proto=./echo.proto \
      -method=example.EchoService.Echo \
      -server=:8000 -input='{"message":"hello"}' \
      -qps=0 -thread_num=512 -connection_type=pooled &
  done

Monitor mlx5e_rx_flow_steer frequency:

  /usr/share/bcc/tools/funccount -i 1 mlx5e_rx_flow_steer

Frequency of mlx5e_rx_flow_steer (via funccount[2]):

  Before: ~335,000 counts/sec
  After:  ~23,000 counts/sec (reduced by ~93%)

System Metrics (after enabling rps_feat_llc_affinity):

  CPU Utilization: 38% -> 32%
  CPU PSI (Pressure Stall Information): 20% -> 10%

These results demonstrate that filtering updates by LLC affinity
significantly reduces driver lock contention and improves overall CPU
efficiency under heavy network load.

[1] https://github.com/apache/brpc/
[2] https://github.com/iovisor/bcc/blob/master/tools/funccount.py

Signed-off-by: Chuang Wang
---
v5 -> v6:
 - remove the multi-check 'old_val == new_val' by Xuan Zhuo
 - fix 'modpost: "sock_rps_record_flow_hash" [drivers/net/tun.ko] undefined!'
   by kernel test robot
 - fix 'tcp.c:(.text+0x3e90): undefined reference to `sock_rps_record_flow''
   by kernel test robot
v4 -> v5: fix 'modpost: "rps_llc_check" [net/sctp/sctp.ko] undefined!'
  by kernel test robot
v3 -> v4: add rps_llc_check by Eric Dumazet
v2 -> v3: patch net -> net-next by Jakub Kicinski
v1 -> v2: add rps_feat_llc_affinity; add brpc tests

 include/net/rps.h          | 28 +++++----------
 net/core/dev.c             | 73 ++++++++++++++++++++++++++++++++++++++
 net/core/sysctl_net_core.c | 35 ++++++++++++++++++
 3 files changed, 116 insertions(+), 20 deletions(-)

diff --git a/include/net/rps.h b/include/net/rps.h
index e33c6a2fa8bb..6dacf0888a6c 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -12,6 +12,7 @@
 
 extern struct static_key_false rps_needed;
 extern struct static_key_false rfs_needed;
+extern struct static_key_false rps_feat_llc_affinity;
 
 /*
  * This structure holds an RPS map which can be of variable length. The
@@ -55,11 +56,14 @@ struct rps_sock_flow_table {
 
 #define RPS_NO_CPU 0xffff
 
+bool rps_llc_check(u32 old_val, u32 new_val);
+
 static inline void rps_record_sock_flow(rps_tag_ptr tag_ptr, u32 hash)
 {
 	unsigned int index = hash & rps_tag_to_mask(tag_ptr);
 	u32 val = hash & ~net_hotdata.rps_cpu_mask;
 	struct rps_sock_flow_table *table;
+	u32 old_val;
 
 	/* We only give a hint, preemption can change CPU under us */
 	val |= raw_smp_processor_id();
@@ -68,7 +72,8 @@ static inline void rps_record_sock_flow(rps_tag_ptr tag_ptr, u32 hash)
 	/* The following WRITE_ONCE() is paired with the READ_ONCE()
 	 * here, and another one in get_rps_cpu().
 	 */
-	if (READ_ONCE(table[index].ent) != val)
+	old_val = READ_ONCE(table[index].ent);
+	if (old_val != val && rps_llc_check(old_val, val))
 		WRITE_ONCE(table[index].ent, val);
 }
 
@@ -136,25 +141,8 @@ static inline bool rfs_is_needed(void)
 #endif
 }
 
-static inline void sock_rps_record_flow_hash(__u32 hash)
-{
-#ifdef CONFIG_RPS
-	if (!rfs_is_needed())
-		return;
-
-	_sock_rps_record_flow_hash(hash);
-#endif
-}
-
-static inline void sock_rps_record_flow(const struct sock *sk)
-{
-#ifdef CONFIG_RPS
-	if (!rfs_is_needed())
-		return;
-
-	_sock_rps_record_flow(sk);
-#endif
-}
+void sock_rps_record_flow_hash(__u32 hash);
+void sock_rps_record_flow(const struct sock *sk);
 
 static inline void sock_rps_delete_flow(const struct sock *sk)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 26ac8eb9b259..53bad3c801dc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4997,6 +4997,8 @@ struct static_key_false rps_needed __read_mostly;
 EXPORT_SYMBOL(rps_needed);
 struct static_key_false rfs_needed __read_mostly;
 EXPORT_SYMBOL(rfs_needed);
+struct static_key_false rps_feat_llc_affinity __read_mostly;
+EXPORT_SYMBOL(rps_feat_llc_affinity);
 
 static u32 rfs_slot(u32 hash, rps_tag_ptr tag_ptr)
 {
@@ -5208,6 +5210,55 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	return cpu;
 }
 
+/**
+ * rps_llc_check - determine if RPS flow table should be updated.
+ * @old_val: previous flow record value.
+ * @new_val: target flow record value.
+ *
+ * Return: true if the record needs an update, false otherwise.
+ */
+bool rps_llc_check(u32 old_val, u32 new_val)
+{
+	u32 old_cpu = old_val & net_hotdata.rps_cpu_mask;
+	u32 new_cpu = new_val & net_hotdata.rps_cpu_mask;
+
+	/*
+	 * RPS LLC Affinity Feature:
+	 * Reduce RFS/ARFS flow updates by checking LLC affinity.
+	 *
+	 * Frequent flow table updates can trigger constant hardware steering
+	 * reconfigurations (e.g., ndo_rx_flow_steer), leading to significant
+	 * contention on driver internal locks (like mlx5's arfs_lock).
+	 *
+	 * This strategy only updates the flow record if it migrates across LLC
+	 * boundaries. This minimizes expensive hardware updates while preserving
+	 * cache locality for the application.
+	 */
+	if (static_branch_unlikely(&rps_feat_llc_affinity)) {
+		/* Force update if the recorded CPU is invalid or has gone offline */
+		if (old_cpu >= nr_cpu_ids || !cpu_active(old_cpu))
+			return true;
+
+		/*
+		 * Force an update if the current task is no longer permitted
+		 * to run on the old_cpu.
+		 */
+		if (!cpumask_test_cpu(old_cpu, current->cpus_ptr))
+			return true;
+
+		/*
+		 * If CPUs do not share a cache, allow the update to prevent
+		 * expensive remote memory accesses and cache misses.
+		 */
+		if (!cpus_share_cache(old_cpu, new_cpu))
+			return true;
+
+		return false;
+	}
+
+	return true;
+}
+
 #ifdef CONFIG_RFS_ACCEL
 
 /**
@@ -5263,6 +5314,28 @@ static void rps_trigger_softirq(void *data)
 
 #endif /* CONFIG_RPS */
 
+void sock_rps_record_flow_hash(__u32 hash)
+{
+#ifdef CONFIG_RPS
+	if (!rfs_is_needed())
+		return;
+
+	_sock_rps_record_flow_hash(hash);
+#endif
+}
+EXPORT_SYMBOL(sock_rps_record_flow_hash);
+
+void sock_rps_record_flow(const struct sock *sk)
+{
+#ifdef CONFIG_RPS
+	if (!rfs_is_needed())
+		return;
+
+	_sock_rps_record_flow(sk);
+#endif
+}
+EXPORT_SYMBOL(sock_rps_record_flow);
+
 /* Called from hardirq (IPI) context */
 static void trigger_rx_softirq(void *data)
 {
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b508618bfc12..b6d4ebcbb6a6 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -210,6 +210,33 @@ static int rps_sock_flow_sysctl(const struct ctl_table *table, int write,
 	kvfree_rcu_mightsleep(tofree);
 	return ret;
 }
+
+static int rps_feat_llc_affinity_sysctl(const struct ctl_table *table, int write,
+					void *buffer, size_t *lenp, loff_t *ppos)
+{
+	u8 curr_state;
+	int ret;
+	const struct ctl_table tmp = {
+		.data = &curr_state,
+		.maxlen = sizeof(curr_state),
+		.mode = table->mode,
+		.extra1 = table->extra1,
+		.extra2 = table->extra2
+	};
+
+	curr_state = static_branch_unlikely(&rps_feat_llc_affinity) ? 1 : 0;
+
+	ret = proc_dou8vec_minmax(&tmp, write, buffer, lenp, ppos);
+	if (write && ret == 0) {
+		if (curr_state && !static_branch_unlikely(&rps_feat_llc_affinity))
+			static_branch_enable(&rps_feat_llc_affinity);
+		else if (!curr_state && static_branch_unlikely(&rps_feat_llc_affinity))
+			static_branch_disable(&rps_feat_llc_affinity);
+	}
+
+	return ret;
+}
+
 #endif /* CONFIG_RPS */
 
 #ifdef CONFIG_NET_FLOW_LIMIT
@@ -554,6 +581,14 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= rps_sock_flow_sysctl
 	},
+	{
+		.procname	= "rps_feat_llc_affinity",
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= rps_feat_llc_affinity_sysctl,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE
+	},
 #endif
 #ifdef CONFIG_NET_FLOW_LIMIT
 	{
-- 
2.47.3
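
The update decision made by rps_llc_check() can be modelled in userspace
to make the intended behaviour easier to reason about. What follows is a
minimal sketch, not the kernel code: struct cpu_info, model_topology and
model_llc_check() are hypothetical names introduced only for this
illustration. They stand in for cpu_active(), the task's cpus_ptr mask
and cpus_share_cache(). The sketch assumes a flow entry keeps the CPU
number in the bits covered by the CPU mask and hash bits above it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU description used only by this model. */
struct cpu_info {
	bool active;	/* stands in for cpu_active(cpu) */
	bool allowed;	/* stands in for cpumask_test_cpu(cpu, current->cpus_ptr) */
	int llc_id;	/* equal llc_id stands in for cpus_share_cache() */
};

/* Toy topology (an assumption): CPUs 0-1 share one LLC, CPUs 2-3 another,
 * and CPU 3 is marked inactive.
 */
static const struct cpu_info model_topology[4] = {
	{ .active = true,  .allowed = true, .llc_id = 0 },
	{ .active = true,  .allowed = true, .llc_id = 0 },
	{ .active = true,  .allowed = true, .llc_id = 1 },
	{ .active = false, .allowed = true, .llc_id = 1 },
};

/* Return true when the flow entry should be rewritten. */
static bool model_llc_check(const struct cpu_info *cpus, uint32_t nr_cpus,
			    uint32_t cpu_mask, bool feature_on,
			    uint32_t old_val, uint32_t new_val)
{
	uint32_t old_cpu = old_val & cpu_mask;
	uint32_t new_cpu = new_val & cpu_mask;

	assert(cpus != NULL);

	/* Feature disabled: every CPU change is recorded. */
	if (!feature_on)
		return true;

	/* Recorded CPU is out of range or no longer active. */
	if (old_cpu >= nr_cpus || !cpus[old_cpu].active)
		return true;

	/* The task may no longer run on the recorded CPU. */
	if (!cpus[old_cpu].allowed)
		return true;

	/* Model-only guard; the kernel takes new_cpu from the running CPU. */
	if (new_cpu >= nr_cpus)
		return true;

	/* Rewrite only when the flow crosses an LLC boundary. */
	return cpus[old_cpu].llc_id != cpus[new_cpu].llc_id;
}
```

With this toy topology, a flow moving from CPU 0 to CPU 1 keeps its
existing entry, while a move from CPU 0 to CPU 2 rewrites it, which is
the case that would go on to reach ndo_rx_flow_steer.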