[PATCH net-next] netpoll: optimize struct layout for cache efficiency

Breno Leitao posted 1 patch 8 months, 3 weeks ago
include/linux/netpoll.h | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
[PATCH net-next] netpoll: optimize struct layout for cache efficiency
Posted by Breno Leitao 8 months, 3 weeks ago
The struct netpoll serves two distinct purposes: it contains
configuration data needed only during setup (in netpoll_setup()), and
runtime data that's accessed on every packet transmission (in
netpoll_send_udp()).

Currently, this structure spans three cache lines with suboptimal
organization, where frequently accessed fields are mixed with rarely
accessed ones.

This commit reorganizes the structure to place all runtime fields used
during packet transmission together in the first cache line, while
moving the setup-only configuration fields to subsequent cache lines.
This approach follows the principle of placing hot fields together for
better cache locality during the performance-critical path.

The restructuring also eliminates structural inefficiencies, reducing
the number of holes. This provides a more compact memory layout while
maintaining the same functionality, resulting in better cache
utilization and potentially improves performance during packet
transmission operations.

  -   /* sum members: 137, holes: 3, sum holes: 7 */
  +   /* sum members: 137, holes: 1, sum holes: 3 */

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/netpoll.h | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 0477208ed9ffa..a8de41d84be52 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -24,7 +24,16 @@ union inet_addr {
 
 struct netpoll {
 	struct net_device *dev;
+	u16 local_port, remote_port;
 	netdevice_tracker dev_tracker;
+	union inet_addr local_ip, remote_ip;
+	bool ipv6;
+
+	/* Hot fields above */
+
+	const char *name;
+	struct sk_buff_head skb_pool;
+	struct work_struct refill_wq;
 	/*
 	 * Either dev_name or dev_mac can be used to specify the local
 	 * interface - dev_name is used if it is a nonempty string, else
@@ -32,14 +41,7 @@ struct netpoll {
 	 */
 	char dev_name[IFNAMSIZ];
 	u8 dev_mac[ETH_ALEN];
-	const char *name;
-
-	union inet_addr local_ip, remote_ip;
-	bool ipv6;
-	u16 local_port, remote_port;
 	u8 remote_mac[ETH_ALEN];
-	struct sk_buff_head skb_pool;
-	struct work_struct refill_wq;
 };
 
 struct netpoll_info {

---
base-commit: bfc17c1658353f22843c7c13e27c2d31950f1887
change-id: 20250324-netpoll_structstruct-6ff56d1cc4d8

Best regards,
-- 
Breno Leitao <leitao@debian.org>
Re: [PATCH net-next] netpoll: optimize struct layout for cache efficiency
Posted by Jakub Kicinski 8 months, 3 weeks ago
On Mon, 24 Mar 2025 05:29:13 -0700 Breno Leitao wrote:
> The struct netpoll serves two distinct purposes: it contains
> configuration data needed only during setup (in netpoll_setup()), and
> runtime data that's accessed on every packet transmission (in
> netpoll_send_udp()).
> 
> Currently, this structure spans three cache lines with suboptimal
> organization, where frequently accessed fields are mixed with rarely
> accessed ones.
> 
> This commit reorganizes the structure to place all runtime fields used
> during packet transmission together in the first cache line, while
> moving the setup-only configuration fields to subsequent cache lines.
> This approach follows the principle of placing hot fields together for
> better cache locality during the performance-critical path.
> 
> The restructuring also eliminates structural inefficiencies, reducing
> the number of holes. This provides a more compact memory layout while
> maintaining the same functionality, resulting in better cache
> utilization and potentially improves performance during packet
> transmission operations.

Netpoll shouldn't send too many packets, "not too many" for networking
means >100kpps. So I don't think the hot / close split matters?

> 
>   -   /* sum members: 137, holes: 3, sum holes: 7 */
>   +   /* sum members: 137, holes: 1, sum holes: 3 */
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  include/linux/netpoll.h | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
> index 0477208ed9ffa..a8de41d84be52 100644
> --- a/include/linux/netpoll.h
> +++ b/include/linux/netpoll.h
> @@ -24,7 +24,16 @@ union inet_addr {
>  
>  struct netpoll {
>  	struct net_device *dev;
> +	u16 local_port, remote_port;
>  	netdevice_tracker dev_tracker;

It's a little odd to leave the tracker in hot data, if you do it
should at least be adjacent to the pointer it tracks?

> +	union inet_addr local_ip, remote_ip;
> +	bool ipv6;
-- 
pw-bot: cr
Re: [PATCH net-next] netpoll: optimize struct layout for cache efficiency
Posted by Breno Leitao 8 months, 3 weeks ago
On Tue, Mar 25, 2025 at 08:48:38AM -0700, Jakub Kicinski wrote:
> On Mon, 24 Mar 2025 05:29:13 -0700 Breno Leitao wrote:
> > The struct netpoll serves two distinct purposes: it contains
> > configuration data needed only during setup (in netpoll_setup()), and
> > runtime data that's accessed on every packet transmission (in
> > netpoll_send_udp()).
> > 
> > Currently, this structure spans three cache lines with suboptimal
> > organization, where frequently accessed fields are mixed with rarely
> > accessed ones.
> > 
> > This commit reorganizes the structure to place all runtime fields used
> > during packet transmission together in the first cache line, while
> > moving the setup-only configuration fields to subsequent cache lines.
> > This approach follows the principle of placing hot fields together for
> > better cache locality during the performance-critical path.
> > 
> > The restructuring also eliminates structural inefficiencies, reducing
> > the number of holes. This provides a more compact memory layout while
> > maintaining the same functionality, resulting in better cache
> > utilization and potentially improves performance during packet
> > transmission operations.
> 
> Netpoll shouldn't send too many packets, "not too many" for networking
> means >100kpps. So I don't think the hot / close split matters?

I see your point. The gain is going to be marginal given the frequency
this netpoll is supposed to be called, for sure.

On the other side, I think this is still better than the current state,
given: 

 * it has no adverse effect
 * potential marginal performance win
 * structure packing, potentially saving 2 bytes.

> >   -   /* sum members: 137, holes: 3, sum holes: 7 */
> >   +   /* sum members: 137, holes: 1, sum holes: 3 */
> > 
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >  include/linux/netpoll.h | 16 +++++++++-------
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
> > index 0477208ed9ffa..a8de41d84be52 100644
> > --- a/include/linux/netpoll.h
> > +++ b/include/linux/netpoll.h
> > @@ -24,7 +24,16 @@ union inet_addr {
> >  
> >  struct netpoll {
> >  	struct net_device *dev;
> > +	u16 local_port, remote_port;
> >  	netdevice_tracker dev_tracker;
> 
> It's a little odd to leave the tracker in hot data, if you do it
> should at least be adjacent to the pointer it tracks?

Double-checking this better, netdevice_tracker is NOT on the hot path.
It is only used on the setup functions.

If you think this is not a total waste of time, I will send a v2 moving
it to the bottom.

Thanks for your review,
--breno
Re: [PATCH net-next] netpoll: optimize struct layout for cache efficiency
Posted by Jakub Kicinski 8 months, 3 weeks ago
On Tue, 25 Mar 2025 09:50:21 -0700 Breno Leitao wrote:
> > It's a little odd to leave the tracker in hot data, if you do it
> > should at least be adjacent to the pointer it tracks?  
> 
> Double-checking this better, netdevice_tracker is NOT on the hot path.
> It is only used on the setup functions.
> 
> If you think this is not a total waste of time, I will send a v2 moving
> it to the bottom.

I'd keep it next to dev. A bit of packing to save space is fine, 
but let's prioritize readability during the reshuffle.
Note that net-next is closed for the MW.