From nobody Thu Jan 1 08:55:45 2026
Date: Mon, 23 Oct 2023 19:35:34 -0700
From: Yan Zhai
To: netdev@vger.kernel.org
Cc: "David S. Miller", David Ahern, Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Aya Levin, Tariq Toukan, linux-kernel@vger.kernel.org,
 kernel-team@cloudflare.com, Florian Westphal, Willem de Bruijn,
 Alexander H Duyck
Subject: [PATCH v4 net-next 1/3] ipv6: drop feature RTAX_FEATURE_ALLFRAG
X-Mailing-List: linux-kernel@vger.kernel.org

RTAX_FEATURE_ALLFRAG was added before the first git commit:

https://www.mail-archive.com/bk-commits-head@vger.kernel.org/msg03399.html

The feature would send packets to the fragmentation path if a box
receives a PMTU value of less than 1280 bytes.
However, since commit 9d289715eb5c ("ipv6: stop sending PTB packets for
MTU < 1280"), such messages are simply discarded. The feature flag is
not supported by the iproute2 utility either. In theory one can still
set it with a direct netlink message, but that is not ideal because the
feature was based on obsolete guidance in RFC-2460 (replaced by
RFC-8200).

The feature currently always tests false, so remove the related code or
mark it as unused.

Signed-off-by: Yan Zhai
Reviewed-by: Eric Dumazet
Reviewed-by: Florian Westphal
---
V3 -> V4: cleaned up all RTAX_FEATURE_ALLFRAG code, rather than just
dropping the check at IPv6 output.
---
 include/net/dst.h                  |  7 -------
 include/net/inet_connection_sock.h |  1 -
 include/net/inet_sock.h            |  2 +-
 include/uapi/linux/rtnetlink.h     |  2 +-
 net/ipv4/tcp_output.c              | 20 +-------------------
 net/ipv6/ip6_output.c              | 15 ++-------------
 net/ipv6/tcp_ipv6.c                |  1 -
 net/ipv6/xfrm6_output.c            |  2 +-
 net/mptcp/subflow.c                |  1 -
 9 files changed, 6 insertions(+), 45 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index f8b8599a0600..f5dfc8fb7b37 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -222,13 +222,6 @@ static inline unsigned long dst_metric_rtt(const struct dst_entry *dst, int metr
 	return msecs_to_jiffies(dst_metric(dst, metric));
 }
 
-static inline u32
-dst_allfrag(const struct dst_entry *dst)
-{
-	int ret = dst_feature(dst, RTAX_FEATURE_ALLFRAG);
-	return ret;
-}
-
 static inline int
 dst_metric_locked(const struct dst_entry *dst, int metric)
 {
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 086d1193c9ef..d0a2f827d5f2 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -44,7 +44,6 @@ struct inet_connection_sock_af_ops {
 				      struct request_sock *req_unhash,
 				      bool *own_req);
 	u16	    net_header_len;
-	u16	    net_frag_header_len;
 	u16	    sockaddr_len;
 	int	    (*setsockopt)(struct sock *sk, int level, int optname,
 				  sockptr_t optval, unsigned int optlen);
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 98e11958cdff..dedbc757b688 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -244,7 +244,7 @@ struct inet_sock {
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
-#define IPCORK_ALLFRAG	2	/* always fragment (for ipv6 for now) */
+#define IPCORK_ALLFRAG	2	/* (unused) always fragment (for ipv6 for now) */
 
 enum {
 	INET_FLAGS_PKTINFO	= 0,
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index aa2482a0614a..3b687d20c9ed 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -505,7 +505,7 @@ enum {
 #define RTAX_FEATURE_ECN		(1 << 0)
 #define RTAX_FEATURE_SACK		(1 << 1) /* unused */
 #define RTAX_FEATURE_TIMESTAMP		(1 << 2) /* unused */
-#define RTAX_FEATURE_ALLFRAG		(1 << 3)
+#define RTAX_FEATURE_ALLFRAG		(1 << 3) /* unused */
 #define RTAX_FEATURE_TCP_USEC_TS	(1 << 4)
 
 #define RTAX_FEATURE_MASK	(RTAX_FEATURE_ECN |		\
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2866ccbccde0..ca4d7594efd4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1698,14 +1698,6 @@ static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu)
 	 */
 	mss_now = pmtu - icsk->icsk_af_ops->net_header_len - sizeof(struct tcphdr);
 
-	/* IPv6 adds a frag_hdr in case RTAX_FEATURE_ALLFRAG is set */
-	if (icsk->icsk_af_ops->net_frag_header_len) {
-		const struct dst_entry *dst = __sk_dst_get(sk);
-
-		if (dst && dst_allfrag(dst))
-			mss_now -= icsk->icsk_af_ops->net_frag_header_len;
-	}
-
 	/* Clamp it (mss_clamp does not include tcp options) */
 	if (mss_now > tp->rx_opt.mss_clamp)
 		mss_now = tp->rx_opt.mss_clamp;
@@ -1733,21 +1725,11 @@ int tcp_mss_to_mtu(struct sock *sk, int mss)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
-	int mtu;
 
-	mtu = mss +
+	return mss +
 	      tp->tcp_header_len +
 	      icsk->icsk_ext_hdr_len +
 	      icsk->icsk_af_ops->net_header_len;
-
-	/* IPv6 adds a frag_hdr in case RTAX_FEATURE_ALLFRAG is set */
-	if (icsk->icsk_af_ops->net_frag_header_len) {
-		const struct dst_entry *dst = __sk_dst_get(sk);
-
-		if (dst && dst_allfrag(dst))
-			mtu += icsk->icsk_af_ops->net_frag_header_len;
-	}
-	return mtu;
 }
 EXPORT_SYMBOL(tcp_mss_to_mtu);
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3c7de89d6755..86efd901ee5a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -191,7 +191,6 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
 		return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
 
 	if ((skb->len > mtu && !skb_is_gso(skb)) ||
-	    dst_allfrag(skb_dst(skb)) ||
 	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
 		return ip6_fragment(net, sk, skb, ip6_finish_output2);
 	else
@@ -1017,9 +1016,6 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 	return err;
 
 fail_toobig:
-	if (skb->sk && dst_allfrag(skb_dst(skb)))
-		sk_gso_disable(skb->sk);
-
 	icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 	err = -EMSGSIZE;
 
@@ -1384,10 +1380,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	cork->base.mark = ipc6->sockc.mark;
 	sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
 
-	if (dst_allfrag(xfrm_dst_path(&rt->dst)))
-		cork->base.flags |= IPCORK_ALLFRAG;
 	cork->base.length = 0;
-
 	cork->base.transmit_time = ipc6->sockc.transmit_time;
 
 	return 0;
@@ -1444,8 +1437,6 @@ static int __ip6_append_data(struct sock *sk,
 
 	headersize = sizeof(struct ipv6hdr) +
 		     (opt ? opt->opt_flen + opt->opt_nflen : 0) +
-		     (dst_allfrag(&rt->dst) ?
-		      sizeof(struct frag_hdr) : 0) +
 		     rt->rt6i_nfheader_len;
 
 	if (mtu <= fragheaderlen ||
@@ -1555,7 +1546,7 @@ static int __ip6_append_data(struct sock *sk,
 
 	while (length > 0) {
 		/* Check if the remaining data fits into current packet. */
-		copy = (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - skb->len;
+		copy = (cork->length <= mtu ? mtu : maxfraglen) - skb->len;
 		if (copy < length)
 			copy = maxfraglen - skb->len;
 
@@ -1586,7 +1577,7 @@ static int __ip6_append_data(struct sock *sk,
 			 */
 			datalen = length + fraggap;
 
-			if (datalen > (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - fragheaderlen)
+			if (datalen > (cork->length <= mtu ? mtu : maxfraglen) - fragheaderlen)
 				datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len;
 			fraglen = datalen + fragheaderlen;
 			pagedlen = 0;
@@ -1835,7 +1826,6 @@ static void ip6_cork_steal_dst(struct sk_buff *skb, struct inet_cork_full *cork)
 	struct dst_entry *dst = cork->base.dst;
 
 	cork->base.dst = NULL;
-	cork->base.flags &= ~IPCORK_ALLFRAG;
 	skb_dst_set(skb, dst);
 }
 
@@ -1856,7 +1846,6 @@ static void ip6_cork_release(struct inet_cork_full *cork,
 	if (cork->base.dst) {
 		dst_release(cork->base.dst);
 		cork->base.dst = NULL;
-		cork->base.flags &= ~IPCORK_ALLFRAG;
 	}
 }
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0c8a14ba104f..dc27988512a6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1895,7 +1895,6 @@ const struct inet_connection_sock_af_ops ipv6_specific = {
 	.conn_request	   = tcp_v6_conn_request,
 	.syn_recv_sock	   = tcp_v6_syn_recv_sock,
 	.net_header_len	   = sizeof(struct ipv6hdr),
-	.net_frag_header_len = sizeof(struct frag_hdr),
 	.setsockopt	   = ipv6_setsockopt,
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index ad07904642ca..5f7b1fdbffe6 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -95,7 +95,7 @@ static int __xfrm6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		return -EMSGSIZE;
 	}
 
-	if (toobig || dst_allfrag(skb_dst(skb)))
+	if (toobig)
 		return ip6_fragment(net, sk, skb,
 				    __xfrm6_output_finish);
 
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 9c1f8d1d63d2..7064543b534d 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -2044,7 +2044,6 @@ void __init mptcp_subflow_init(void)
 	subflow_v6m_specific.send_check = ipv4_specific.send_check;
 	subflow_v6m_specific.net_header_len = ipv4_specific.net_header_len;
 	subflow_v6m_specific.mtu_reduced = ipv4_specific.mtu_reduced;
-	subflow_v6m_specific.net_frag_header_len = 0;
 	subflow_v6m_specific.rebuild_header = subflow_rebuild_header;
 
 	tcpv6_prot_override = tcpv6_prot;
-- 
2.30.2

From nobody Thu Jan 1 08:55:45 2026
Date: Mon, 23 Oct 2023 19:35:35 -0700
From: Yan Zhai
To: netdev@vger.kernel.org
Cc: "David S. Miller", David Ahern, Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Aya Levin, Tariq Toukan, linux-kernel@vger.kernel.org,
 kernel-team@cloudflare.com, Florian Westphal, Willem de Bruijn,
 Alexander H Duyck
Subject: [PATCH v4 net-next 2/3] ipv6: refactor ip6_finish_output for GSO handling
Message-ID: <489a6b97c123700de4d28df86a95e79471cfe12b.1698114636.git.yan@cloudflare.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Separate the handling of GSO and non-GSO packets to make the logic
cleaner. For GSO packets, the frag_max_size check can be omitted because
it is only useful for packets defragmented by netfilter hooks. Neither
local output nor the GRO logic produces GSO packets when defragmentation
is needed. This also mirrors what the IPv4 code does.

Suggested-by: Florian Westphal
Signed-off-by: Yan Zhai
Reviewed-by: Willem de Bruijn
---
 net/ipv6/ip6_output.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 86efd901ee5a..4010dd97aaf8 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -172,6 +172,16 @@ ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
 	return ret;
 }
 
+static int ip6_finish_output_gso(struct net *net, struct sock *sk,
+				 struct sk_buff *skb, unsigned int mtu)
+{
+	if (!(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
+	    !skb_gso_validate_network_len(skb, mtu))
+		return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
+
+	return ip6_finish_output2(net, sk, skb);
+}
+
 static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	unsigned int mtu;
@@ -185,16 +195,14 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
 #endif
 
 	mtu = ip6_skb_dst_mtu(skb);
-	if (skb_is_gso(skb) &&
-	    !(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
-	    !skb_gso_validate_network_len(skb, mtu))
-		return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
+	if (skb_is_gso(skb))
+		return ip6_finish_output_gso(net, sk, skb, mtu);
 
-	if ((skb->len > mtu && !skb_is_gso(skb)) ||
+	if (skb->len > mtu ||
 	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
 		return ip6_fragment(net, sk, skb, ip6_finish_output2);
-	else
-		return ip6_finish_output2(net, sk, skb);
+
+	return ip6_finish_output2(net, sk, skb);
 }
 
 static int ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb)
-- 
2.30.2

From nobody Thu Jan 1 08:55:45 2026
Date: Mon, 23 Oct 2023 19:35:37 -0700
From: Yan Zhai
To: netdev@vger.kernel.org
Cc: "David S. Miller", David Ahern, Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Aya Levin, Tariq Toukan, linux-kernel@vger.kernel.org,
 kernel-team@cloudflare.com, Florian Westphal, Willem de Bruijn,
 Alexander H Duyck
Subject: [PATCH v4 net-next 3/3] ipv6: avoid atomic fragment on GSO packets
Message-ID: <6b2347a888c8b2d8f259dbb4662c4995ba9a505e.1698114636.git.yan@cloudflare.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When the IPv6 stack outputs a GSO packet whose gso_size is larger than
the dst MTU, all of its segments are fragmented. However, a GSO packet
can have a trailing segment whose actual size is smaller than both
gso_size and the MTU, and fragmenting it produces an "atomic fragment".
Atomic fragments are considered harmful in RFC-8021. An existing report
from APNIC also shows that atomic fragments are more likely to be
dropped, even though adding the fragment header is equivalent to a
no-op [1].

Add an extra check in the GSO slow output path. For each segment from
the original over-sized packet, if it fits within the path MTU, then
avoid generating an atomic fragment.

Link: https://www.potaroo.net/presentations/2022-03-01-ipv6-frag.pdf [1]
Fixes: b210de4f8c97 ("net: ipv6: Validate GSO SKB before finish IPv6 processing")
Reported-by: David Wragg
Signed-off-by: Yan Zhai
---
 net/ipv6/ip6_output.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 4010dd97aaf8..a722a43dd668 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -164,7 +164,13 @@ ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
 		int err;
 
 		skb_mark_not_on_list(segs);
-		err = ip6_fragment(net, sk, segs, ip6_finish_output2);
+		/* Last GSO segment can be smaller than gso_size (and MTU).
+		 * Adding a fragment header would produce an "atomic fragment",
+		 * which is considered harmful (RFC-8021). Avoid that.
+		 */
+		err = segs->len > mtu ?
+			ip6_fragment(net, sk, segs, ip6_finish_output2) :
+			ip6_finish_output2(net, sk, segs);
 		if (err && ret == 0)
 			ret = err;
 	}
-- 
2.30.2
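
For readers who want to see the per-segment decision from patch 3/3 in
isolation, here is a minimal user-space sketch. It is not kernel code
and is not part of the series. The names seg_action, classify_segment
and count_fragmented are hypothetical, and seg_len is assumed to be the
full network-layer length of one segment, standing in for segs->len.
Only the comparison itself is taken from the patch.

```c
#include <stddef.h>

/* Hypothetical user-space model of the check in patch 3/3; not kernel code. */
enum seg_action {
	SEG_SEND_AS_IS,	/* stands in for calling ip6_finish_output2() directly */
	SEG_FRAGMENT	/* stands in for calling ip6_fragment() */
};

/*
 * Same comparison as the patch: only a segment longer than the MTU goes
 * to the fragmentation path. A segment that already fits is sent
 * unchanged, so no fragment header is added and no atomic fragment
 * results.
 */
static enum seg_action classify_segment(unsigned int seg_len, unsigned int mtu)
{
	return seg_len > mtu ? SEG_FRAGMENT : SEG_SEND_AS_IS;
}

/* Count how many of the n segments would be handed to the fragmenter. */
static size_t count_fragmented(const unsigned int *seg_lens, size_t n,
			       unsigned int mtu)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (classify_segment(seg_lens[i], mtu) == SEG_FRAGMENT)
			count++;
	return count;
}
```

With illustrative lengths of 1500, 1500, 1500 and 700 bytes and an MTU
of 1280, the three full-size segments are fragmented and the short
trailing segment is sent as is. Before this patch the slow path passed
every segment to ip6_fragment().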