[PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()

Breno Leitao posted 1 patch 1 month, 3 weeks ago
include/net/route.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
[PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
Posted by Breno Leitao 1 month, 3 weeks ago
Branch annotation traces from approximately 200 IPv6-enabled hosts
revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
mispredicted. Given the increasing prevalence of IPv6 in modern networks,
this commit adjusts the function to favor the IPv6 path.

Swap the order of the conditional statements and move the 'likely'
annotation to the IPv6 case. This change aims to improve performance in
IPv6-dominant environments by reducing branch mispredictions.

This optimization aligns with the trend of IPv6 becoming the default IP
version in many deployments, and should benefit modern network
configurations.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/net/route.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 1789f1e6640b..b90b7b1effb8 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
 	struct net_device *dev = rt->dst.dev;
 	struct neighbour *neigh;
 
-	if (likely(rt->rt_gw_family == AF_INET)) {
-		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
-	} else if (rt->rt_gw_family == AF_INET6) {
+	if (likely(rt->rt_gw_family == AF_INET6)) {
 		neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
 		*is_v6gw = true;
+	} else if (rt->rt_gw_family == AF_INET) {
+		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
 	} else {
 		neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
 	}
-- 
2.43.5
Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
Posted by David Ahern 1 month, 3 weeks ago
On 10/4/24 10:27 AM, Breno Leitao wrote:
> Branch annotation traces from approximately 200 IPv6-enabled hosts
> revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> this commit adjusts the function to favor the IPv6 path.
> 
> Swap the order of the conditional statements and move the 'likely'
> annotation to the IPv6 case. This change aims to improve performance in
> IPv6-dominant environments by reducing branch mispredictions.
> 
> This optimization aligns with the trend of IPv6 becoming the default IP
> version in many deployments, and should benefit modern network
> configurations.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  include/net/route.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/route.h b/include/net/route.h
> index 1789f1e6640b..b90b7b1effb8 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
>  	struct net_device *dev = rt->dst.dev;
>  	struct neighbour *neigh;
>  
> -	if (likely(rt->rt_gw_family == AF_INET)) {
> -		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> -	} else if (rt->rt_gw_family == AF_INET6) {
> +	if (likely(rt->rt_gw_family == AF_INET6)) {
>  		neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
>  		*is_v6gw = true;
> +	} else if (rt->rt_gw_family == AF_INET) {
> +		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
>  	} else {
>  		neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
>  	}

This is an IPv4 function allowing support for IPv6 addresses as a
nexthop. It is appropriate for IPv4 family checks to be first.
Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
Posted by Breno Leitao 1 month, 3 weeks ago
Hello David,

On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> On 10/4/24 10:27 AM, Breno Leitao wrote:
> > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > this commit adjusts the function to favor the IPv6 path.
> > 
> > Swap the order of the conditional statements and move the 'likely'
> > annotation to the IPv6 case. This change aims to improve performance in
> > IPv6-dominant environments by reducing branch mispredictions.
> > 
> > This optimization aligns with the trend of IPv6 becoming the default IP
> > version in many deployments, and should benefit modern network
> > configurations.
> > 
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >  include/net/route.h | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/net/route.h b/include/net/route.h
> > index 1789f1e6640b..b90b7b1effb8 100644
> > --- a/include/net/route.h
> > +++ b/include/net/route.h
> > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> >  	struct net_device *dev = rt->dst.dev;
> >  	struct neighbour *neigh;
> >  
> > -	if (likely(rt->rt_gw_family == AF_INET)) {
> > -		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > -	} else if (rt->rt_gw_family == AF_INET6) {
> > +	if (likely(rt->rt_gw_family == AF_INET6)) {
> >  		neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> >  		*is_v6gw = true;
> > +	} else if (rt->rt_gw_family == AF_INET) {
> > +		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> >  	} else {
> >  		neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> >  	}
> 
> This is an IPv4 function allowing support for IPv6 addresses as a
> nexthop. It is appropriate for IPv4 family checks to be first.

Right. In which case is this called on IPv6 only systems?

On my IPv6-only 200 systems, the annotated branch predictor is showing
it is mispredicted 100% of the time.

Thanks for the review
--breno
Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
Posted by Paolo Abeni 1 month, 2 weeks ago
On 10/4/24 19:37, Breno Leitao wrote:
> On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
>> On 10/4/24 10:27 AM, Breno Leitao wrote:
>>> Branch annotation traces from approximately 200 IPv6-enabled hosts
>>> revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
>>> mispredicted. Given the increasing prevalence of IPv6 in modern networks,
>>> this commit adjusts the function to favor the IPv6 path.
>>>
>>> Swap the order of the conditional statements and move the 'likely'
>>> annotation to the IPv6 case. This change aims to improve performance in
>>> IPv6-dominant environments by reducing branch mispredictions.
>>>
>>> This optimization aligns with the trend of IPv6 becoming the default IP
>>> version in many deployments, and should benefit modern network
>>> configurations.
>>>
>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>> ---
>>>   include/net/route.h | 6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/net/route.h b/include/net/route.h
>>> index 1789f1e6640b..b90b7b1effb8 100644
>>> --- a/include/net/route.h
>>> +++ b/include/net/route.h
>>> @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
>>>   	struct net_device *dev = rt->dst.dev;
>>>   	struct neighbour *neigh;
>>>   
>>> -	if (likely(rt->rt_gw_family == AF_INET)) {
>>> -		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
>>> -	} else if (rt->rt_gw_family == AF_INET6) {
>>> +	if (likely(rt->rt_gw_family == AF_INET6)) {
>>>   		neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
>>>   		*is_v6gw = true;
>>> +	} else if (rt->rt_gw_family == AF_INET) {
>>> +		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
>>>   	} else {
>>>   		neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
>>>   	}
>>
>> This is an IPv4 function allowing support for IPv6 addresses as a
>> nexthop. It is appropriate for IPv4 family checks to be first.
> 
> Right. In which case is this called on IPv6 only systems?
> 
> On my IPv6-only 200 systems, the annotated branch predictor is showing
> it is mispredicted 100% of the time.

perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag; 
perf script

should give you an hint.

Cheers,

Paolo
Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
Posted by Breno Leitao 1 month, 2 weeks ago
Hello Paolo,

On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> On 10/4/24 19:37, Breno Leitao wrote:
> > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > this commit adjusts the function to favor the IPv6 path.
> > > > 
> > > > Swap the order of the conditional statements and move the 'likely'
> > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > IPv6-dominant environments by reducing branch mispredictions.
> > > > 
> > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > version in many deployments, and should benefit modern network
> > > > configurations.
> > > > 
> > > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > > ---
> > > >   include/net/route.h | 6 +++---
> > > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > --- a/include/net/route.h
> > > > +++ b/include/net/route.h
> > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > >   	struct net_device *dev = rt->dst.dev;
> > > >   	struct neighbour *neigh;
> > > > -	if (likely(rt->rt_gw_family == AF_INET)) {
> > > > -		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > -	} else if (rt->rt_gw_family == AF_INET6) {
> > > > +	if (likely(rt->rt_gw_family == AF_INET6)) {
> > > >   		neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > >   		*is_v6gw = true;
> > > > +	} else if (rt->rt_gw_family == AF_INET) {
> > > > +		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > >   	} else {
> > > >   		neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > >   	}
> > > 
> > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > nexthop. It is appropriate for IPv4 family checks to be first.
> > 
> > Right. In which case is this called on IPv6 only systems?
> > 
> > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > it is mispredicted 100% of the time.
> 
> perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> perf script
> 
> should give you an hint.

Thanks. That proved to be very useful.

As I said above, all the hosts I have a webserver running, I see this
that likely mispredicted. Same for this server:

	# cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
	 correct incorrect  %        Function                  File              Line
	       0    17127 100 ip_neigh_for_gw                route.h              393

It is mostly coming from ip_finish_output2() and tcp_v4. Important to
say that these machine has no IPv4 configured, except 127.0.0.1
(localhost).

Output of `perf script`:

	curl 3284017 [020] 342043.646674: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3284017 [020] 342043.646720: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3286356 [026] 342055.690384: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

	curl 3288713 [032] 342103.631991: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3288713 [032] 342103.632039: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3289126 [021] 342115.725482: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

	curl 3291018 [030] 342163.627633: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3291018 [030] 342163.627673: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3291256 [031] 342175.683527: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

	curl 3293421 [025] 342223.618198: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3293421 [025] 342223.618239: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3293659 [034] 342235.695019: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

	curl 3295399 [012] 342283.632642: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3295399 [012] 342283.632691: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3295746 [001] 342295.712436: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

	curl 3298603 [020] 342343.608814: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3298603 [020] 342343.608858: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3299252 [032] 342355.693816: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

	curl 3303255 [033] 342403.616685: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3303255 [033] 342403.616729: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3304989 [011] 342415.740580: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

	curl 3312952 [035] 342463.633808: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3312952 [035] 342463.633859: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3314546 [032] 342475.766762: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)

	curl 3321983 [006] 342523.654221: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	curl 3321983 [006] 342523.654262: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_unicast_reply+0x3fc 
		 tcp_v4_send_reset+0x668 
		 tcp_v4_rcv+0xcdf 
		 ip_protocol_deliver_rcu+0x10e 
		 ip_local_deliver_finish+0x97 
		 ip_local_deliver+0x43 
		 ip_rcv+0x35 
		 process_backlog+0x1b8 
		 __napi_poll+0x30 
		 net_rx_action+0x180 
		 __kprobes_text_end+0xf2 
		 __local_bh_enable_ip+0xeb 
		 __dev_queue_xmit+0xc18 
		 ip_finish_output2+0x63a 
		 ip_output+0x73 
		 __ip_queue_xmit+0x504 
		 __tcp_transmit_skb+0xcfe 
		 tcp_connect+0xa1d 
		 tcp_v4_connect+0x463 
		 __inet_stream_connect+0x5b 
		 inet_stream_connect+0x36 
		 __sys_connect+0x8d 
		 __x64_sys_connect+0x16 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_connect+0x4b (/usr/lib64/libc.so.6)
			       0 [unknown] ([unknown])

	isc-net-0000 3323932 [025] 342535.718587: probe:ip_neigh_for_gw: ()
		 ip_finish_output2+0x150 
		 ip_output+0x73 
		 ip_send_skb+0x15 
		 udp_send_skb+0xd2 
		 udp_sendmsg+0xaa7 
		 __x64_sys_sendmsg+0x338 
		 do_syscall_64+0xc2 
		 entry_SYSCALL_64_after_hwframe+0x4b 
		    __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
Posted by Eric Dumazet 1 month, 2 weeks ago
On Tue, Oct 8, 2024 at 4:07 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Paolo,
>
> On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> > On 10/4/24 19:37, Breno Leitao wrote:
> > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > > this commit adjusts the function to favor the IPv6 path.
> > > > >
> > > > > Swap the order of the conditional statements and move the 'likely'
> > > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > > IPv6-dominant environments by reducing branch mispredictions.
> > > > >
> > > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > > version in many deployments, and should benefit modern network
> > > > > configurations.
> > > > >
> > > > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > > > ---
> > > > >   include/net/route.h | 6 +++---
> > > > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > > --- a/include/net/route.h
> > > > > +++ b/include/net/route.h
> > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > > >         struct net_device *dev = rt->dst.dev;
> > > > >         struct neighbour *neigh;
> > > > > -       if (likely(rt->rt_gw_family == AF_INET)) {
> > > > > -               neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > -       } else if (rt->rt_gw_family == AF_INET6) {
> > > > > +       if (likely(rt->rt_gw_family == AF_INET6)) {
> > > > >                 neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > > >                 *is_v6gw = true;
> > > > > +       } else if (rt->rt_gw_family == AF_INET) {
> > > > > +               neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > >         } else {
> > > > >                 neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > > >         }
> > > >
> > > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > > nexthop. It is appropriate for IPv4 family checks to be first.
> > >
> > > Right. In which case is this called on IPv6 only systems?
> > >
> > > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > > it is mispredicted 100% of the time.
> >
> > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> > perf script
> >
> > should give you an hint.
>
> Thanks. That proved to be very useful.
>
> As I said above, all the hosts I have a webserver running, I see this
> that likely mispredicted. Same for this server:
>
>         # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
>          correct incorrect  %        Function                  File              Line
>                0    17127 100 ip_neigh_for_gw                route.h              393
>
> It is mostly coming from ip_finish_output2() and tcp_v4. Important to
> say that these machine has no IPv4 configured, except 127.0.0.1
> (localhost).

Now run the experiment on a typical server using IPv4 ?

I would advise removing the likely() if it really bothers you.
(I doubt this has any impact)

But assuming everything is IPv6 is too soon.

There are more obvious changes like :

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index b6e7d4921309741193a8c096aeb278255ec56794..445f4fe712603e8c14b1006ad4cbaac278bae4ea
100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -462,7 +462,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff
*skb, struct net *net)
        /* When the interface is in promisc. mode, drop all the crap
         * that it receives, do not try to analyse it.
         */
-       if (skb->pkt_type == PACKET_OTHERHOST) {
+       if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
                dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
                drop_reason = SKB_DROP_REASON_OTHERHOST;
                goto drop;
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 70c0e16c0ae6837d1c64d0036829c8b61799578b..3d0797afa499fa880eb5452a0dea8a23505b3e60
100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -153,7 +153,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff
*skb, struct net_device *dev,
        u32 pkt_len;
        struct inet6_dev *idev;

-       if (skb->pkt_type == PACKET_OTHERHOST) {
+       if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
                dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
                kfree_skb_reason(skb, SKB_DROP_REASON_OTHERHOST);
                return NULL;
Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
Posted by Breno Leitao 1 month, 2 weeks ago
Hello Eric,

On Tue, Oct 08, 2024 at 04:15:37PM +0200, Eric Dumazet wrote:
> On Tue, Oct 8, 2024 at 4:07 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > Hello Paolo,
> >
> > On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> > > On 10/4/24 19:37, Breno Leitao wrote:
> > > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > > > this commit adjusts the function to favor the IPv6 path.
> > > > > >
> > > > > > Swap the order of the conditional statements and move the 'likely'
> > > > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > > > IPv6-dominant environments by reducing branch mispredictions.
> > > > > >
> > > > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > > > version in many deployments, and should benefit modern network
> > > > > > configurations.
> > > > > >
> > > > > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > > > > ---
> > > > > >   include/net/route.h | 6 +++---
> > > > > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > > > --- a/include/net/route.h
> > > > > > +++ b/include/net/route.h
> > > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > > > >         struct net_device *dev = rt->dst.dev;
> > > > > >         struct neighbour *neigh;
> > > > > > -       if (likely(rt->rt_gw_family == AF_INET)) {
> > > > > > -               neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > > -       } else if (rt->rt_gw_family == AF_INET6) {
> > > > > > +       if (likely(rt->rt_gw_family == AF_INET6)) {
> > > > > >                 neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > > > >                 *is_v6gw = true;
> > > > > > +       } else if (rt->rt_gw_family == AF_INET) {
> > > > > > +               neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > >         } else {
> > > > > >                 neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > > > >         }
> > > > >
> > > > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > > > nexthop. It is appropriate for IPv4 family checks to be first.
> > > >
> > > > Right. In which case is this called on IPv6 only systems?
> > > >
> > > > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > > > it is mispredicted 100% of the time.
> > >
> > > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> > > perf script
> > >
> > > should give you an hint.
> >
> > Thanks. That proved to be very useful.
> >
> > As I said above, all the hosts I have a webserver running, I see this
> > that likely mispredicted. Same for this server:
> >
> >         # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
> >          correct incorrect  %        Function                  File              Line
> >                0    17127 100 ip_neigh_for_gw                route.h              393
> >
> > It is mostly coming from ip_finish_output2() and tcp_v4. Important to
> > say that these machine has no IPv4 configured, except 127.0.0.1
> > (localhost).
> 
> Now run the experiment on a typical server using IPv4 ?
> 
> I would advise removing the likely() if it really bothers you.
> (I doubt this has any impact)

Thanks. I am mostly concerned about likely/unlikely() that are wrong
100% the time when running production workloads in modern hardware.

I just got a few hundreds host to able to run annotated
branches enabled, and I am looking on how it can help us to understand
our code flow better.

Regarding performance impact, I agree with you that performance is
minimal (at least on x86). On other architectures, such as powerpc
things can be more evident, given that the branch hint is encoded in the
instruction itself, so, the hardware knows where to predict.

That said, I think the best approach is just to remove the likely() from
that code path.

> But assuming everything is IPv6 is too soon.
> 
> There are more obvious changes like :
> 
> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index b6e7d4921309741193a8c096aeb278255ec56794..445f4fe712603e8c14b1006ad4cbaac278bae4ea
> 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
> @@ -462,7 +462,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff
> *skb, struct net *net)
>         /* When the interface is in promisc. mode, drop all the crap
>          * that it receives, do not try to analyse it.
>          */
> -       if (skb->pkt_type == PACKET_OTHERHOST) {
> +       if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
>                 dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
>                 drop_reason = SKB_DROP_REASON_OTHERHOST;
>                 goto drop;
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 70c0e16c0ae6837d1c64d0036829c8b61799578b..3d0797afa499fa880eb5452a0dea8a23505b3e60
> 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -153,7 +153,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff
> *skb, struct net_device *dev,
>         u32 pkt_len;
>         struct inet6_dev *idev;
> 
> -       if (skb->pkt_type == PACKET_OTHERHOST) {
> +       if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
>                 dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
>                 kfree_skb_reason(skb, SKB_DROP_REASON_OTHERHOST);
>                 return NULL;

Agree, that would be obvious changes also.