From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-107.mailbox.org (mout-b-107.mailbox.org [195.10.208.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 456FB3DFC66; Thu, 19 Mar 2026 15:19:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933592; cv=none; b=Pr1l+Msm2ixeqzUrDICkaa41DHnp1Y/PflwNFAvzo0HgoMuQl28me/JBR/uywCNbwpw+hC7gmjzjfEl0ynpr3O3CI0Suq2iITdtjKkgxXMrLDGzn4T5BgJ1CFwJwf3pHr65243kwjtAhTkdB5J9LvrberEYB/BHqeD2TwGfT6/I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933592; c=relaxed/simple; bh=wuZQ9vTbz8UTXtHV3/zPGSw68fibhsPS+1RYice2eLU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=bqE4+DktGTKRxqajlJD3+j4pGH/rYiHHuVA5af3l3RTYQ/eEpUqO1QpzqSBQUiUBxI8ADZp34sqovOi2yxsA0REc+a0nwkKu1MEiIVIxN2Fi19CjLcu9ztHU/dBkq1q971mYeLMMnGFyxJ8g3EodjNHrMs3yj2RvqzFQ1p0p+0w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=MwSlKnYW; arc=none smtp.client-ip=195.10.208.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="MwSlKnYW" Received: from smtp102.mailbox.org (smtp102.mailbox.org [10.196.197.102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-107.mailbox.org 
(Postfix) with ESMTPS id 4fc8MT0fdwzDs2m; Thu, 19 Mar 2026 16:12:53 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933173; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VRiQAbmOZNBrKYT2ZwQeh5CazEoSH0MkUOjWXENOVH4=; b=MwSlKnYWblA6TOpAuxURZTRB9f1ESXp+itO/3t5phIySsfR1GDQHK1CzYKVtSCIgExodBn RkOqp3nx4I4vr66H0nlSd1WAD3dMcvD83M3H2GRzhBU3VmymFaYd5taQThTWJOqOLoYKUr MZuugdo43ER7Q8Bf+U08scfgv8WpXLIeCN3/NRiZJmyqxlEDuX9E/UrvBPW7fU3Ef67htF RT9v28l4E1jQrZCKv77qe+ZUHeXRMI67k93+of4w6L3HCkERNi6HIGQYAgb1Ejz7e6ngCB QcnycTM1WOEV2TIZ5gFgHqcZcCy9q6hrkFwGUUswkd3uyjsu8IC92C7DPTCPXA== From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 01/15] drivers/net: add ipxlat netdevice skeleton and build plumbing Date: Thu, 19 Mar 2026 16:12:10 +0100 Message-ID: <20260319151230.655687-2-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable ipxlat is a virtual netdevice implementing stateless IPv4/IPv6 translation (SIIT). The translation model follows RFC 7915 behavior and RFC 6052 address embedding rules. The netdevice form is intentional: it provides per-instance lifecycle, MTU/statistics semantics and explicit routing integration, so translated traffic can be steered through a dedicated device and configured per namespace. 
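As an illustration of the RFC 6052 embedding mentioned above, here is a minimal userspace sketch for the /96 case only, where the IPv4 address occupies the last 32 bits of the IPv6 address. The function name demo_embed_v4_in_prefix96 and the byte-array interface are hypothetical and are not part of this series; the driver itself works on struct in6_addr and also handles shorter prefix lengths.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not driver code: build an IPv6 address from a /96
 * prefix and an IPv4 address. Both inputs are in network byte order.
 * Only the first 12 bytes (96 bits) of the prefix are used.
 */
void demo_embed_v4_in_prefix96(const uint8_t prefix[16], const uint8_t v4[4],
			       uint8_t out[16])
{
	memcpy(out, prefix, 12);   /* bits 0..95: translation prefix */
	memcpy(out + 12, v4, 4);   /* bits 96..127: embedded IPv4 address */
}
```

With the well-known prefix 64:ff9b::/96 and the IPv4 address 192.0.2.33, the last four bytes of the result are c0 00 02 21, which matches the 64:ff9b::192.0.2.33 example given in RFC 6052.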
This series targets ipxlat as a reusable kernel building block for SIIT deployments and for NAT64-style setups when combined with existing nftables rules in userspace policy. This first patch introduces only the driver scaffolding: - drivers/net/ipxlat/ directory and build integration - Kconfig/Makefile entries - basic private structures and defaults - rtnl_link_ops and netdevice skeleton needed to create/register links No translation logic is added in this patch yet. Follow-up patches add packet validation, transport/ICMP translation, error handling, fragmentation handling, generic netlink control plane, selftests and documentation. Signed-off-by: Ralf Lici --- drivers/net/Kconfig | 13 ++++ drivers/net/Makefile | 1 + drivers/net/ipxlat/Makefile | 7 ++ drivers/net/ipxlat/ipxlpriv.h | 53 +++++++++++++ drivers/net/ipxlat/main.c | 137 ++++++++++++++++++++++++++++++++++ drivers/net/ipxlat/main.h | 27 +++++++ 6 files changed, 238 insertions(+) create mode 100644 drivers/net/ipxlat/Makefile create mode 100644 drivers/net/ipxlat/ipxlpriv.h create mode 100644 drivers/net/ipxlat/main.c create mode 100644 drivers/net/ipxlat/main.h diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index b2fd90466bab..a3b28f294d95 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -117,6 +117,19 @@ config OVPN This module enhances the performance of the OpenVPN userspace software by offloading the data channel processing to kernelspace. =20 +config IPXLAT + tristate "IPv6<>IPv4 packet translation virtual device (SIIT)" + depends on NET && INET && IPV6 + help + Virtual network device driver for Stateless IP/ICMP Packet + Translation (RFC 7915). Useful for IPv6 focused networks. + Particularly NAT64, SIIT-DC, 464XLAT network architectures. + + See also . + + To compile this driver as a module, choose M here: the module will be + called ipxlat. 
+ config EQUALIZER tristate "EQL (serial line load balancing) support" help diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 5b01215f6829..4f982c9e6585 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -24,6 +24,7 @@ obj-$(CONFIG_NET) +=3D loopback.o obj-$(CONFIG_NETDEV_LEGACY_INIT) +=3D Space.o obj-$(CONFIG_NETCONSOLE) +=3D netconsole.o obj-$(CONFIG_NETKIT) +=3D netkit.o +obj-$(CONFIG_IPXLAT) +=3D ipxlat/ obj-y +=3D phy/ obj-y +=3D pse-pd/ obj-y +=3D mdio/ diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile new file mode 100644 index 000000000000..bd48c2700bf5 --- /dev/null +++ b/drivers/net/ipxlat/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + +obj-$(CONFIG_IPXLAT) :=3D ipxlat.o + +ipxlat-objs +=3D main.o diff --git a/drivers/net/ipxlat/ipxlpriv.h b/drivers/net/ipxlat/ipxlpriv.h new file mode 100644 index 000000000000..5027d8377bdd --- /dev/null +++ b/drivers/net/ipxlat/ipxlpriv.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_IPXLPRIV_H_ +#define _NET_IPXLAT_IPXLPRIV_H_ + +#include +#include +#include + +/** + * struct ipv6_prefix - IPv6 prefix definition + * @addr: prefix address (host bits may be non-zero) + * @len: prefix length in bits + */ +struct ipv6_prefix { + struct in6_addr addr; + u8 len; +}; + +/** + * struct ipxlat_priv - private state stored in netdev priv area + * @dev: owning netdevice + * @xlat_prefix6: RFC 6052 prefix used for stateless v4<->v6 mapping + * @lowest_ipv6_mtu: LIM threshold used by 4->6 pre-fragment planning + * @cfg_lock: serializes control-plane updates + * 
@gro_cells: receive-side reinjection queue used by forward path + * + * Datapath reads config without taking @cfg_lock to keep per-packet overh= ead + * low. Writers serialize updates under @cfg_lock. During reconfiguration, + * readers may transiently observe mixed old/new values; this may cause a = small + * number of drops and is an accepted tradeoff for a lightweight datapath. + */ +struct ipxlat_priv { + struct net_device *dev; + struct ipv6_prefix xlat_prefix6; + u32 lowest_ipv6_mtu; + /* serializes control-plane updates */ + struct mutex cfg_lock; + struct gro_cells gro_cells; +}; + +#endif /* _NET_IPXLAT_IPXLPRIV_H_ */ diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c new file mode 100644 index 000000000000..26b7f5b6ff20 --- /dev/null +++ b/drivers/net/ipxlat/main.c @@ -0,0 +1,137 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#include + +#include + +#include "ipxlpriv.h" +#include "main.h" + +MODULE_AUTHOR("Alberto Leiva Popper "); +MODULE_AUTHOR("Antonio Quartulli "); +MODULE_AUTHOR("Daniel Gr=C3=B6ber "); +MODULE_AUTHOR("Ralf Lici "); +MODULE_DESCRIPTION("IPv6<>IPv4 translation virtual netdev support (SIIT)"); +MODULE_LICENSE("GPL"); + +static int ipxlat_dev_init(struct net_device *dev) +{ + struct ipxlat_priv *ipxlat =3D netdev_priv(dev); + int err; + + ipxlat->dev =3D dev; + /* default xlat-prefix6 is 64:ff9b::/96 */ + ipxlat->xlat_prefix6.addr.s6_addr32[0] =3D htonl(0x0064ff9b); + ipxlat->xlat_prefix6.addr.s6_addr32[1] =3D 0; + ipxlat->xlat_prefix6.addr.s6_addr32[2] =3D 0; + ipxlat->xlat_prefix6.addr.s6_addr32[3] =3D 0; + ipxlat->xlat_prefix6.len =3D 96; + ipxlat->lowest_ipv6_mtu =3D 1280; + mutex_init(&ipxlat->cfg_lock); + + err =3D 
gro_cells_init(&ipxlat->gro_cells, dev); + if (unlikely(err)) + return err; + + return 0; +} + +static void ipxlat_dev_uninit(struct net_device *dev) +{ + struct ipxlat_priv *ipxlat =3D netdev_priv(dev); + + gro_cells_destroy(&ipxlat->gro_cells); +} + +static netdev_tx_t ipxlat_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + dev_dstats_tx_dropped(dev); + kfree_skb(skb); + return NETDEV_TX_OK; +} + +static const struct net_device_ops ipxlat_netdev_ops =3D { + .ndo_init =3D ipxlat_dev_init, + .ndo_uninit =3D ipxlat_dev_uninit, + .ndo_start_xmit =3D ipxlat_start_xmit, +}; + +static const struct device_type ipxlat_type =3D { + .name =3D "ipxlat", +}; + +static void ipxlat_setup(struct net_device *dev) +{ + const netdev_features_t feat =3D NETIF_F_SG | NETIF_F_FRAGLIST | + NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | + NETIF_F_GSO_SOFTWARE; + + dev->type =3D ARPHRD_NONE; + dev->flags =3D IFF_NOARP; + dev->priv_flags |=3D IFF_NO_QUEUE; + dev->hard_header_len =3D 0; + dev->addr_len =3D 0; + + dev->lltx =3D true; + dev->features |=3D feat; + dev->hw_features |=3D feat; + dev->hw_enc_features |=3D feat; + + dev->netdev_ops =3D &ipxlat_netdev_ops; + dev->needs_free_netdev =3D true; + dev->pcpu_stat_type =3D NETDEV_PCPU_STAT_DSTATS; + dev->max_mtu =3D IP_MAX_MTU - sizeof(struct ipv6hdr) - + sizeof(struct iphdr); + dev->min_mtu =3D IPV6_MIN_MTU; + dev->mtu =3D ETH_DATA_LEN; + + /* keep skb->dst up to ndo_start_xmit so ICMP error emission can + * reuse routing metadata from ingress when available + */ + netif_keep_dst(dev); + + SET_NETDEV_DEVTYPE(dev, &ipxlat_type); +} + +static struct rtnl_link_ops ipxlat_link_ops =3D { + .kind =3D "ipxlat", + .priv_size =3D sizeof(struct ipxlat_priv), + .setup =3D ipxlat_setup, +}; + +bool ipxlat_dev_is_valid(const struct net_device *dev) +{ + return dev->rtnl_link_ops =3D=3D &ipxlat_link_ops; +} + +static int __init ipxlat_init(void) +{ + int err; + + err =3D rtnl_link_register(&ipxlat_link_ops); + if (err) { + pr_err("ipxlat: failed to register 
rtnl link ops: %d\n", err); + return err; + } + + return 0; +} + +static void __exit ipxlat_exit(void) +{ + rtnl_link_unregister(&ipxlat_link_ops); +} + +module_init(ipxlat_init); +module_exit(ipxlat_exit); diff --git a/drivers/net/ipxlat/main.h b/drivers/net/ipxlat/main.h new file mode 100644 index 000000000000..fb78f910b2e2 --- /dev/null +++ b/drivers/net/ipxlat/main.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_MAIN_H_ +#define _NET_IPXLAT_MAIN_H_ + +#include + +/** + * ipxlat_dev_is_valid - tell whether a netdev is an ipxlat interface + * @dev: netdevice to inspect + * + * Return: true if @dev was created with ipxlat link ops. + */ +bool ipxlat_dev_is_valid(const struct net_device *dev); + +#endif /* _NET_IPXLAT_MAIN_H_ */ --=20 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-210.mailbox.org (mout-b-210.mailbox.org [195.10.208.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B2F413D3481; Thu, 19 Mar 2026 15:22:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.40 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933750; cv=none; b=BXLSkawsdvPHXOQrzo5qe+FzwNLi/7081ptoWxe4+k5JerefiURxa5xIy8w1hUuiNhAIrAD0OCrIEFDw//eZLZ90rgkBRmZutcNeOniTgbLYh33gxO9xpJJdE9w+/xjGZrNgB7c2RplBOiHwQVoJ5FkExuKV7viUL+IakV2Ujcs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933750; c=relaxed/simple; bh=iTO4e6EY6LzuRBmdnlWNtywEwzjsLcLwOY1vsh6w6X8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: 
MIME-Version:Content-Type; b=ndRFJARN/7DgjyNpCFvW5ou4Pcy1vcpYxBmX/5BH04qtgTx3KUrkh+vLUXjJwfAgLZds7dtoo6RB6CS7Y8h+SnRVsOgFGMK80rkUnpDeMYDH3I08IIJE++wiH6kE2jp6/LIe5KAyVWbv/tdeANQditzuUUonfnEt+iDeNrsa5lA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=Afz5ifqR; arc=none smtp.client-ip=195.10.208.40 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="Afz5ifqR" Received: from smtp102.mailbox.org (smtp102.mailbox.org [10.196.197.102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-210.mailbox.org (Postfix) with ESMTPS id 4fc8MY5hxkzDqrQ; Thu, 19 Mar 2026 16:12:57 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933177; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0icos2ZBtUrsLwF58QCrTkTKUVqnIISU3RcxMFUjpAk=; b=Afz5ifqR/m3cvYhcAV+HV4VQFRBdnEgoxSlYhlEr/XS8DszzK3uqn6OeG3FT1rsJgWFDvJ PvlHyxjyrZmf5kate0iMR0mTVI8cZOZSFjY1K3+y7pOWW+bOEBSlJ/SeJfV2f/RNNvlpGj fHelUymu7cCSQ41qd2KR37p8hKj5BwTGkDRzLllSXuSTjaOcYVYPl+sQohM2dQWuROrK0f HpIUAAJ+MiXtii4JI9rH08xHqbKHgDwjLeP74tnP5COxxZj8gpMMG6fP1OpmQGuIwCXZ6M hh5uTD85JKWDOKOpZhgcDNA4niIreOLkdfnhH9U7YPD3i9fzTk3K978ucpxCrQ== From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici 
, Antonio Quartulli , Andrew Lunn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 02/15] ipxlat: add RFC 6052 address conversion helpers Date: Thu, 19 Mar 2026 16:12:11 +0100 Message-ID: <20260319151230.655687-3-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Introduce IPv4/IPv6 stateless address mapping helpers used by the translation pipeline. Add the core 4<->6 conversion routines, including RFC 6052 prefix embedding/extraction and the RFC 6791 fallback source selection logic used by ICMP translation paths. Signed-off-by: Ralf Lici --- drivers/net/ipxlat/Makefile | 1 + drivers/net/ipxlat/address.c | 132 +++++++++++++++++++++++++++++++++++ drivers/net/ipxlat/address.h | 59 ++++++++++++++++ 3 files changed, 192 insertions(+) create mode 100644 drivers/net/ipxlat/address.c create mode 100644 drivers/net/ipxlat/address.h diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile index bd48c2700bf5..b6367dedd78e 100644 --- a/drivers/net/ipxlat/Makefile +++ b/drivers/net/ipxlat/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_IPXLAT) :=3D ipxlat.o =20 ipxlat-objs +=3D main.o +ipxlat-objs +=3D address.o diff --git a/drivers/net/ipxlat/address.c b/drivers/net/ipxlat/address.c new file mode 100644 index 000000000000..d1a2b7d1768f --- /dev/null +++ b/drivers/net/ipxlat/address.c @@ -0,0 +1,132 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel 
Gr=C3=B6ber + * Ralf Lici + */ + +#include "address.h" + +static bool ipxlat_prefix6_contains(const struct ipv6_prefix *prefix, + const struct in6_addr *addr) +{ + return ipv6_prefix_equal(&prefix->addr, addr, prefix->len); +} + +static __be32 ipxlat_64_extract_addr(const struct in6_addr *src, + unsigned int q1, unsigned int q2, + unsigned int q3, unsigned int q4) +{ + q1 =3D src->s6_addr[q1]; + q2 =3D src->s6_addr[q2]; + q3 =3D src->s6_addr[q3]; + q4 =3D src->s6_addr[q4]; + return htonl((q1 << 24) | (q2 << 16) | (q3 << 8) | q4); +} + +static void ipxlat_46_embed_addr(__be32 __src, struct in6_addr *dst, + unsigned int q1, unsigned int q2, + unsigned int q3, unsigned int q4) +{ + u32 src =3D ntohl(__src); + + dst->s6_addr[q1] =3D ((src >> 24) & 0xFF); + dst->s6_addr[q2] =3D ((src >> 16) & 0xFF); + dst->s6_addr[q3] =3D ((src >> 8) & 0xFF); + dst->s6_addr[q4] =3D ((src) & 0xFF); +} + +void ipxlat_46_convert_addr(const struct ipv6_prefix *xlat_prefix6, + __be32 addr4, struct in6_addr *addr6) +{ + *addr6 =3D xlat_prefix6->addr; + + switch (xlat_prefix6->len) { + case 96: + addr6->s6_addr32[3] =3D addr4; + return; + case 64: + ipxlat_46_embed_addr(addr4, addr6, 9, 10, 11, 12); + return; + case 56: + ipxlat_46_embed_addr(addr4, addr6, 7, 9, 10, 11); + return; + case 48: + ipxlat_46_embed_addr(addr4, addr6, 6, 7, 9, 10); + return; + case 40: + ipxlat_46_embed_addr(addr4, addr6, 5, 6, 7, 9); + return; + case 32: + addr6->s6_addr32[1] =3D addr4; + return; + } + + DEBUG_NET_WARN_ON_ONCE(1); +} + +int ipxlat_64_convert_addrs(const struct ipv6_prefix *xlat_prefix6, + const struct ipv6hdr *hdr6, bool icmp_err, + __be32 *src, __be32 *dst) +{ + bool src_ok; + + src_ok =3D ipxlat_prefix6_contains(xlat_prefix6, &hdr6->saddr); + if (unlikely(!src_ok && !icmp_err)) + return -EINVAL; + if (unlikely(!ipxlat_prefix6_contains(xlat_prefix6, &hdr6->daddr))) + return -EINVAL; + + switch (xlat_prefix6->len) { + case 96: + if (likely(src_ok)) + *src =3D hdr6->saddr.s6_addr32[3]; + *dst =3D 
hdr6->daddr.s6_addr32[3]; + break; + case 64: + if (likely(src_ok)) + *src =3D ipxlat_64_extract_addr(&hdr6->saddr, 9, 10, 11, + 12); + *dst =3D ipxlat_64_extract_addr(&hdr6->daddr, 9, 10, 11, 12); + break; + case 56: + if (likely(src_ok)) + *src =3D ipxlat_64_extract_addr(&hdr6->saddr, 7, + 9, 10, 11); + *dst =3D ipxlat_64_extract_addr(&hdr6->daddr, 7, 9, 10, 11); + break; + case 48: + if (likely(src_ok)) + *src =3D ipxlat_64_extract_addr(&hdr6->saddr, 6, + 7, 9, 10); + *dst =3D ipxlat_64_extract_addr(&hdr6->daddr, 6, 7, 9, 10); + break; + case 40: + if (likely(src_ok)) + *src =3D ipxlat_64_extract_addr(&hdr6->saddr, 5, 6, 7, 9); + *dst =3D ipxlat_64_extract_addr(&hdr6->daddr, 5, 6, 7, 9); + break; + case 32: + if (likely(src_ok)) + *src =3D hdr6->saddr.s6_addr32[1]; + *dst =3D hdr6->daddr.s6_addr32[1]; + break; + default: + DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + } + + /* keep 6->4 ICMP error translation functional even when the ICMPv6 + * source is not xlat_prefix6-mapped (for example, stack-generated PTB) + */ + if (unlikely(!src_ok)) + *src =3D htonl(INADDR_DUMMY); + + return 0; +} diff --git a/drivers/net/ipxlat/address.h b/drivers/net/ipxlat/address.h new file mode 100644 index 000000000000..4283fdddac56 --- /dev/null +++ b/drivers/net/ipxlat/address.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_ADDRESS_H_ +#define _NET_IPXLAT_ADDRESS_H_ + +#include +#include + +#include "ipxlpriv.h" + +/** + * ipxlat_46_convert_addr - translate one IPv4 address into RFC 6052 IPv6 = form + * @xlat_prefix6: configured RFC 6052 prefix + * @addr4: IPv4 address to convert + * @addr6: output IPv6 address + */ +void 
ipxlat_46_convert_addr(const struct ipv6_prefix *xlat_prefix6, + __be32 addr4, struct in6_addr *addr6); + +/** + * ipxlat_64_convert_addrs - translate outer IPv6 endpoints into IPv4 pair + * @xlat_prefix6: configured RFC 6052 prefix + * @hdr6: source IPv6 header + * @icmp_err: source packet is ICMPv6 error + * @src: output IPv4 source address + * @dst: output IPv4 destination address + * + * Return: 0 on success, negative errno on non-translatable addresses. + */ +int ipxlat_64_convert_addrs(const struct ipv6_prefix *xlat_prefix6, + const struct ipv6hdr *hdr6, bool icmp_err, + __be32 *src, __be32 *dst); + +/** + * ipxlat_46_convert_addrs - translate outer IPv4 endpoints into IPv6 pair + * @xlat_prefix6: configured RFC 6052 prefix + * @iph4: source IPv4 header + * @iph6: output IPv6 header (only saddr/daddr are updated) + */ +static inline void +ipxlat_46_convert_addrs(const struct ipv6_prefix *xlat_prefix6, + const struct iphdr *iph4, struct ipv6hdr *iph6) +{ + ipxlat_46_convert_addr(xlat_prefix6, iph4->saddr, &iph6->saddr); + ipxlat_46_convert_addr(xlat_prefix6, iph4->daddr, &iph6->daddr); +} + +#endif /* _NET_IPXLAT_ADDRESS_H_ */ --=20 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-112.mailbox.org (mout-b-112.mailbox.org [195.10.208.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E103A2DBF78; Thu, 19 Mar 2026 15:13:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933191; cv=none; b=qepQ1+D6xeY7euh89n1pX29RPLRff2VKVgWnjTvSuf8UV7m1VgmCiKm1vlnsZTXIRvyaNz2C+Cmz3hHUubPEeYziZBs9r7Xb4XKuALmQ7gNcHLkGgQsa48myTNvAb/iiv2yWVY6UGKDf7E8pHilHpeGz+egh9WlQ1NyIUzc2NZ4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933191; c=relaxed/simple; 
bh=2NYDwy/gV7dbV10YVs0Lb78B4af4Gw0dgptem2nEozY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=I31K/OkKG4beUf0xzqHY7LHVxCU7Ri7jCTiKl79zTLBd7gXI2Br5e/RgpymwwuCZqYi5+l43qPoREPfUA5d7lOuP0hPY3sqsch1L0v5Zx4vxRqBDwl4k+/K34RFCwSiMIRqKrwIC/MDTvdiuNqRo8qEzhBKPu+c3fSaxSf8/4Lg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=p/84I8wn; arc=none smtp.client-ip=195.10.208.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="p/84I8wn" Received: from smtp102.mailbox.org (smtp102.mailbox.org [IPv6:2001:67c:2050:b231:465::102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-112.mailbox.org (Postfix) with ESMTPS id 4fc8Mg1NJmzDv9S; Thu, 19 Mar 2026 16:13:03 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933183; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SndRzQCr+0TV28vMI6neq7lH/2aOabTy4OqIGVFvHHA=; b=p/84I8wnabOFnqFtdXpkV2zqcfTMlzwFVjMKv8Nxy5UQEvdcoaNwSDOTNcCX9lfqKIVMoD WIjpckbaij8ZxffwcbSG/gL/fMQOMxIAtCfB8uy/ir0TP6jvJXSlVfC0RYW70Rwvhh8ded ecAMacrvoX3LXpd2bE91C8u7nVGdZmzt9G1H9UoHr6daPDrnOjXgqJiw6Vr/jsxiNYsrsJ XKw3a5fhHS0f/3ekIbVC4UjAQGS1GY3zlRLdpsC6GmODIMepbTFx4PmjeycKnbAtMRSnh+ 
9evkI7lXYJmoxCqg3H7SOPGCP+g+9FeaqNivThMcyG6OT0LaGBAOygiVAKOlbw== Authentication-Results: outgoing_mbo_mout; dkim=none; spf=pass (outgoing_mbo_mout: domain of ralf@mandelbit.com designates 2001:67c:2050:b231:465::102 as permitted sender) smtp.mailfrom=ralf@mandelbit.com From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 03/15] ipxlat: add packet metadata control block helpers Date: Thu, 19 Mar 2026 16:12:12 +0100 Message-ID: <20260319151230.655687-4-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4fc8Mg1NJmzDv9S Add the per-skb control-block layout and the shared packet helper routines used by the translation stages. This introduces common metadata bookkeeping (offset rebasing and invariant checks) plus protocol and fragment helper utilities. 
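The offset rebasing described above can be sketched in userspace C as follows. This is a simplified illustration under stated assumptions, not the code added by this patch: struct demo_cb and demo_cb_rebase are hypothetical names, only two of the cached offsets are modelled, and the sketch validates every new offset before writing any of them (including an upper bound for the 16-bit fields), which is a choice made here for clarity.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the per-skb metadata. */
struct demo_cb {
	uint16_t l4_off;      /* offset of the outer L4 header */
	uint16_t payload_off; /* offset of the outer payload */
};

/* Shift all cached offsets by the same signed delta. Returns 0 on success
 * or -EINVAL if any shifted offset would leave the 0..UINT16_MAX range;
 * in the failure case the control block is left unmodified.
 */
int demo_cb_rebase(struct demo_cb *cb, int delta)
{
	int l4 = cb->l4_off + delta;
	int payload = cb->payload_off + delta;

	if (l4 < 0 || payload < 0 || l4 > UINT16_MAX || payload > UINT16_MAX)
		return -EINVAL;

	cb->l4_off = (uint16_t)l4;
	cb->payload_off = (uint16_t)payload;
	return 0;
}
```

For example, replacing a 20-byte IPv4 header with a 40-byte IPv6 header corresponds to a delta of +20, so an L4 offset of 20 becomes 40; the reverse direction uses a delta of -20.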
Signed-off-by: Ralf Lici --- drivers/net/ipxlat/Makefile | 1 + drivers/net/ipxlat/packet.c | 99 +++++++++++++++++++++ drivers/net/ipxlat/packet.h | 166 ++++++++++++++++++++++++++++++++++++ 3 files changed, 266 insertions(+) create mode 100644 drivers/net/ipxlat/packet.c create mode 100644 drivers/net/ipxlat/packet.h diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile index b6367dedd78e..90dbc0489fa2 100644 --- a/drivers/net/ipxlat/Makefile +++ b/drivers/net/ipxlat/Makefile @@ -6,3 +6,4 @@ obj-$(CONFIG_IPXLAT) :=3D ipxlat.o =20 ipxlat-objs +=3D main.o ipxlat-objs +=3D address.o +ipxlat-objs +=3D packet.o diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c new file mode 100644 index 000000000000..f82c375255f3 --- /dev/null +++ b/drivers/net/ipxlat/packet.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#include "packet.h" + +/* Shift cached skb cb offsets by the L3 header delta after in-place rewri= te. + * + * Translation may replace only the outer L3 header size (4->6 or 6->4), w= hile + * cached offsets were computed before rewrite. Rebasing applies the same = delta + * to all cached absolute offsets so they still point to the same logical + * fields in the modified skb. + * + * This helper only guards against underflow (< 0). Relative ordering chec= ks + * are done by ipxlat_cb_offsets_valid. 
+ */ +int ipxlat_cb_rebase_offsets(struct ipxlat_cb *cb, int delta) +{ + int off; + + off =3D cb->l4_off + delta; + if (unlikely(off < 0)) + return -EINVAL; + cb->l4_off =3D off; + + off =3D cb->payload_off + delta; + if (unlikely(off < 0)) + return -EINVAL; + cb->payload_off =3D off; + + if (unlikely(cb->is_icmp_err)) { + off =3D cb->inner_l3_offset + delta; + if (unlikely(off < 0)) + return -EINVAL; + cb->inner_l3_offset =3D off; + + off =3D cb->inner_l4_offset + delta; + if (unlikely(off < 0)) + return -EINVAL; + cb->inner_l4_offset =3D off; + + if (cb->inner_fragh_off) { + off =3D cb->inner_fragh_off + delta; + if (unlikely(off < 0)) + return -EINVAL; + cb->inner_fragh_off =3D off; + } + } + + return 0; +} + +#ifdef CONFIG_DEBUG_NET +/* Verify ordering/range relations between cached skb cb offsets. + * + * Unlike ipxlat_cb_rebase_offsets, this checks structural invariants: + * l4 <=3D payload, inner_l3 >=3D payload, inner_l3 <=3D inner_l4, and fra= gment + * header (when present) located inside inner L3 area before inner L4. 
+ */ +bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb) +{ + if (unlikely(cb->payload_off < cb->l4_off)) + return false; + + if (unlikely(cb->is_icmp_err)) { + if (unlikely(cb->inner_l3_offset < cb->payload_off)) + return false; + if (unlikely(cb->inner_l4_offset < cb->inner_l3_offset)) + return false; + if (unlikely(cb->inner_fragh_off && + cb->inner_fragh_off < cb->inner_l3_offset)) + return false; + if (unlikely(cb->inner_fragh_off && + cb->inner_fragh_off >=3D cb->inner_l4_offset)) + return false; + } + + return true; +} +#endif + +int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxl, struct sk_buff *skb) +{ + return -EOPNOTSUPP; +} + +int ipxlat_v6_validate_skb(struct sk_buff *skb) +{ + return -EOPNOTSUPP; +} diff --git a/drivers/net/ipxlat/packet.h b/drivers/net/ipxlat/packet.h new file mode 100644 index 000000000000..f39c25987940 --- /dev/null +++ b/drivers/net/ipxlat/packet.h @@ -0,0 +1,166 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_PACKET_H_ +#define _NET_IPXLAT_PACKET_H_ + +#include + +#include "ipxlpriv.h" + +/** + * struct ipxlat_cb - per-skb parser and control metadata stored in skb->cb + * @l4_off: outer L4 header offset + * @payload_off: outer payload offset + * @fragh_off: outer IPv6 Fragment Header offset, or 0 if absent + * @inner_l3_offset: quoted inner L3 offset for ICMP errors + * @inner_l4_offset: quoted inner L4 offset for ICMP errors + * @inner_fragh_off: quoted inner IPv6 Fragment Header offset, or 0 + * @udp_zero_csum_len: outer UDP length used for 4->6 checksum synthesis + * @frag_max_size: pre-fragment payload cap for ip_do_fragment + * @l4_proto: outer L4 protocol (or nexthdr for IPv6) + * @inner_l4_proto: 
quoted inner L4 protocol + * @l3_hdr_len: outer L3 header length including extension headers + * @inner_l3_hdr_len: quoted inner L3 header length + * @is_icmp_err: packet is ICMP error and carries quoted inner packet + * @emit_icmp_err: datapath must emit translator-generated ICMP on drop + * @icmp_err: ICMP type/code/info cached for deferred emission + * @icmp_err.type: ICMP type to emit + * @icmp_err.code: ICMP code to emit + * @icmp_err.info: ICMP auxiliary info (e.g. pointer/MTU) + */ +struct ipxlat_cb { + u16 l4_off; + u16 payload_off; + u16 fragh_off; + u16 inner_l3_offset; + u16 inner_l4_offset; + u16 inner_fragh_off; + /* L4 span length (UDP header + payload) for outer IPv4 UDP packets + * arriving with checksum 0. + */ + u16 udp_zero_csum_len; + u16 frag_max_size; + u8 l4_proto; + u8 inner_l4_proto; + u8 l3_hdr_len; + u8 inner_l3_hdr_len; + bool is_icmp_err; + bool emit_icmp_err; + struct { + u8 type; + u8 code; + u32 info; + } icmp_err; +}; + +/** + * ipxlat_skb_cb - return ipxlat private control block in skb->cb + * @skb: skb carrying ipxlat metadata + * + * Return: pointer to &struct ipxlat_cb stored in the control buffer of @s= kb. + */ +static inline struct ipxlat_cb *ipxlat_skb_cb(const struct sk_buff *skb) +{ + BUILD_BUG_ON(sizeof(struct ipxlat_cb) > sizeof(skb->cb)); + return (struct ipxlat_cb *)(skb->cb); +} + +static inline unsigned int ipxlat_skb_datagram_len(const struct sk_buff *s= kb) +{ + return skb->len - skb_transport_offset(skb); +} + +static inline u8 ipxlat_get_ipv6_tclass(const struct ipv6hdr *hdr) +{ + return (hdr->priority << 4) | (hdr->flow_lbl[0] >> 4); +} + +static inline u16 ipxlat_get_frag6_offset(const struct frag_hdr *hdr) +{ + return be16_to_cpu(hdr->frag_off) & 0xFFF8U; +} + +static inline u16 ipxlat_get_frag4_offset(const struct iphdr *hdr) +{ + return (be16_to_cpu(hdr->frag_off) & IP_OFFSET) << 3; +} + +static inline bool ipxlat_is_first_frag6(const struct frag_hdr *hdr) +{ + return hdr ? 
(ipxlat_get_frag6_offset(hdr) == 0) : true; +} + +static inline bool ipxlat_is_first_frag4(const struct iphdr *hdr) +{ + return !(hdr->frag_off & htons(IP_OFFSET)); +} + +static inline __be16 ipxlat_build_frag6_offset(u16 frag_offset, bool mf) +{ + return cpu_to_be16((frag_offset & 0xFFF8U) | mf); +} + +static inline __be16 +ipxlat_build_frag4_offset(bool df, bool mf, u16 frag_offset) +{ + return cpu_to_be16((df ? (1U << 14) : 0) | (mf ? (1U << 13) : 0) | + (frag_offset >> 3)); +} + +/** + * ipxlat_cb_rebase_offsets - shift cached cb offsets after skb relayout + * @cb: parsed packet metadata + * @delta: signed byte delta applied to cached offsets + * + * Return: 0 on success, negative errno if rebased offsets would underflow. + */ +int ipxlat_cb_rebase_offsets(struct ipxlat_cb *cb, int delta); +#ifdef CONFIG_DEBUG_NET +/** + * ipxlat_cb_offsets_valid - validate monotonicity and bounds of cb offsets + * @cb: parsed packet metadata + * + * Return: true if cached offsets are internally consistent. + */ +bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb); +#else +static inline bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb) +{ + return true; +} +#endif + +/** + * ipxlat_v4_validate_skb - validate and summarize IPv4 packet into skb->cb + * @ipxlat: translator private context + * @skb: packet to validate + * + * Populates &struct ipxlat_cb and may mark translator-generated ICMP action on + * failure paths. + * + * Return: 0 on success, negative errno on validation failure. + */ +int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb); + +/** + * ipxlat_v6_validate_skb - validate and summarize IPv6 packet into skb->cb + * @skb: packet to validate + * + * Populates &struct ipxlat_cb for subsequent 6->4 translation. + * + * Return: 0 on success, negative errno on validation failure. 
+ */ +int ipxlat_v6_validate_skb(struct sk_buff *skb); + +#endif /* _NET_IPXLAT_PACKET_H_ */ -- 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-107.mailbox.org (mout-b-107.mailbox.org [195.10.208.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C1FBB3DE437; Thu, 19 Mar 2026 15:13:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933195; cv=none; b=KpySPH7sgfJXjdVTb33uPl+xxu9yMf0SHdVGdGk1R9q5UbgpMv6IrMr2Sn31p+arR97jIsFREUtwYOiM2G1JfQhk23xBOl6eux6vS/B5zlz1gXPHI0ViHMflhiYGzD82FaNwwXOcbPQcohsAO0fSeE7HEo1M3b1EWmpeqdanV2w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933195; c=relaxed/simple; bh=D1vNkKfupMoP2d5SF28Np/g0vRw1AZ3D7zDLIYeLhq8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZRrX+Aklkr4lNuBM+rHm0TM2tPcRDvPvO6P1A6pc2cbE8hWT8OdAARw8nbLf5t7eP07iNWobFY2SGnz8Je5epSf5U2aPsTkTqXusEEV8ISvImuGl4a1V17yfRBMjMJoUDZIHis1p2Nts7euTF9CvgGMWFNwG3HAT/dRcze58e4s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=0LmAj+Se; arc=none smtp.client-ip=195.10.208.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="0LmAj+Se" Received: from smtp102.mailbox.org (smtp102.mailbox.org [IPv6:2001:67c:2050:b231:465::102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 
server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-107.mailbox.org (Postfix) with ESMTPS id 4fc8Mn2m7yzDs2b; Thu, 19 Mar 2026 16:13:09 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933189; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ELi3LIGFSDx2zQa6RJTz+X8P1Q4yynMqgkJVxUcqTz8=; b=0LmAj+SeaWqLrSHgbG9+zW77XSC2QtpijJMiM7HKx93qGtu6efE65uDBsTuZCSfw7j/AbZ TQfBpbWH6tSzNeAd7lRtru/YnNUuParoFV/caFY2i/uP/dwYB3aWUHYt821mJ3LLvYWRp0 eQxnKCWOVffk10fnM24JxiMT8d9gUIc20ePZduhOJGqLJLZ5uLxdo/NS+OolUi0vBghvon kENFxzKSgWf0HrPnfDC+VkjGDVEQI7XU5DDTDNAvCYHjloTxS4pjoC1wt5eHe1aN/Bc4B7 Ypj54dAJBCATF8fqXIJ4Jq4GESr+vlg9CbzbWtg3LR5+5qqm5jKR/IVqEP27Sg== Authentication-Results: outgoing_mbo_mout; dkim=none; spf=pass (outgoing_mbo_mout: domain of ralf@mandelbit.com designates 2001:67c:2050:b231:465::102 as permitted sender) smtp.mailfrom=ralf@mandelbit.com From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 04/15] ipxlat: add IPv4 packet validation path Date: Thu, 19 Mar 2026 16:12:13 +0100 Message-ID: <20260319151230.655687-5-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4fc8Mn2m7yzDs2b Content-Type: text/plain; charset="utf-8" Implement IPv4 packet parsing and validation, including option inspection, fragment-sensitive L4 checks, and UDP checksum-zero handling consistent with translator constraints. The parser populates skb control-block metadata consumed by translation and marks RFC-driven drop reasons for later action handling. Signed-off-by: Ralf Lici --- drivers/net/ipxlat/packet.c | 312 +++++++++++++++++++++++++++++++++++- 1 file changed, 310 insertions(+), 2 deletions(-) diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c index f82c375255f3..0cc619dca147 100644 --- a/drivers/net/ipxlat/packet.c +++ b/drivers/net/ipxlat/packet.c @@ -11,6 +11,8 @@ * Ralf Lici */ +#include + #include "packet.h" /* Shift cached skb cb offsets by the L3 header delta after in-place rewrite. @@ -88,9 +90,315 @@ bool ipxlat_cb_offsets_valid(const struct ipxlat_cb *cb) } #endif -int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxl, struct sk_buff *skb) +static bool ipxlat_v4_validate_addr(__be32 addr4) { - return -EOPNOTSUPP; + return !(ipv4_is_zeronet(addr4) || ipv4_is_loopback(addr4) || + ipv4_is_multicast(addr4) || ipv4_is_lbcast(addr4)); +} + +/* RFC 7915 Section 4.1 requires ignoring IPv4 options unless an unexpired + * LSRR/SSRR is present, in which case we must send ICMPv4 SR_FAILED.
+ * We intentionally treat malformed option encoding as invalid input and + * drop early instead of continuing translation. + */ +static int ipxlat_v4_srr_check(struct sk_buff *skb, const struct iphdr *hdr) +{ + const u8 *opt, *end; + u8 type, len, ptr; + + if (likely(hdr->ihl <= 5)) + return 0; + + opt = (const u8 *)(hdr + 1); + end = (const u8 *)hdr + (hdr->ihl << 2); + + while (opt < end) { + type = opt[0]; + if (type == IPOPT_END) + return 0; + if (type == IPOPT_NOOP) { + opt++; + continue; + } + + if (unlikely(end - opt < 2)) + return -EINVAL; + + len = opt[1]; + if (unlikely(len < 2 || opt + len > end)) + return -EINVAL; + + if (type == IPOPT_LSRR || type == IPOPT_SSRR) { + if (unlikely(len < 3)) + return -EINVAL; + + /* points to the beginning of the next IP addr */ + ptr = opt[2]; + if (unlikely(ptr < 4)) + return -EINVAL; + if (unlikely(ptr > len)) + return 0; + if (unlikely(ptr > len - 3)) + return -EINVAL; + + return -EINVAL; + } + + opt += len; + } + + return 0; +} + +static int ipxlat_v4_pull_l3(struct sk_buff *skb, unsigned int l3_offset, + bool inner) +{ + const struct iphdr *iph; + unsigned int tot_len; + int l3_len; + + if (unlikely(!pskb_may_pull(skb, l3_offset + sizeof(*iph)))) + return -EINVAL; + + iph = (const struct iphdr *)(skb->data + l3_offset); + if (unlikely(iph->version != 4 || iph->ihl < 5)) + return -EINVAL; + + l3_len = iph->ihl << 2; + /* For inner packets use ntohs(iph->tot_len) instead of iph_totlen. + * If inner iph->tot_len is zero, iph_totlen would fall back to outer + * GSO metadata, which is unrelated to quoted inner packet length. + */ + tot_len = unlikely(inner) ? 
ntohs(iph->tot_len) : iph_totlen(skb, iph); + if (unlikely(tot_len < l3_len)) + return -EINVAL; + + if (unlikely(!pskb_may_pull(skb, l3_offset + l3_len))) + return -EINVAL; + + return l3_len; +} + +static int ipxlat_v4_pull_l4(struct sk_buff *skb, unsigned int l4_offset, + u8 l4_proto, bool *is_icmp_err) +{ + struct icmphdr *icmp; + struct udphdr *udp; + struct tcphdr *tcp; + + *is_icmp_err = false; + + switch (l4_proto) { + case IPPROTO_TCP: + if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*tcp)))) + return -EINVAL; + + tcp = (struct tcphdr *)(skb->data + l4_offset); + if (unlikely(tcp->doff < 5)) + return -EINVAL; + + return __tcp_hdrlen(tcp); + case IPPROTO_UDP: + if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*udp)))) + return -EINVAL; + + udp = (struct udphdr *)(skb->data + l4_offset); + if (unlikely(ntohs(udp->len) < sizeof(*udp))) + return -EINVAL; + + return sizeof(struct udphdr); + case IPPROTO_ICMP: + if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*icmp)))) + return -EINVAL; + + icmp = (struct icmphdr *)(skb->data + l4_offset); + *is_icmp_err = icmp_is_err(icmp->type); + return sizeof(struct icmphdr); + default: + return 0; + } +} + +static int ipxlat_v4_pull_icmp_inner(struct sk_buff *skb, + unsigned int inner_l3_off) +{ + struct ipxlat_cb *cb = ipxlat_skb_cb(skb); + const struct iphdr *inner_l3_hdr; + unsigned int inner_l4_off; + int inner_l3_len, err; + bool is_icmp_err; + + inner_l3_len = ipxlat_v4_pull_l3(skb, inner_l3_off, true); + if (unlikely(inner_l3_len < 0)) + return inner_l3_len; + inner_l3_hdr = (const struct iphdr *)(skb->data + inner_l3_off); + + /* accept non-first quoted fragments: only inner L3 is translatable */ + inner_l4_off = inner_l3_off + inner_l3_len; + cb->inner_l3_offset = inner_l3_off; + cb->inner_l3_hdr_len = inner_l3_len; + cb->inner_l4_offset = inner_l4_off; + + if (unlikely(!ipxlat_is_first_frag4(inner_l3_hdr))) + return 0; + + err = ipxlat_v4_pull_l4(skb, inner_l4_off, 
inner_l3_hdr->protocol, + &is_icmp_err); + if (unlikely(err < 0)) + return err; + if (unlikely(is_icmp_err)) + return -EINVAL; + + return 0; +} + +static int ipxlat_v4_pull_hdrs(struct sk_buff *skb) +{ + const unsigned int l3_off = skb_network_offset(skb); + struct ipxlat_cb *cb = ipxlat_skb_cb(skb); + int err, l3_len, l4_len = 0; + const struct iphdr *l3_hdr; + + /* parse IPv4 header and get its full length including options */ + l3_len = ipxlat_v4_pull_l3(skb, l3_off, false); + if (unlikely(l3_len < 0)) + return l3_len; + l3_hdr = ip_hdr(skb); + + if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->daddr))) + return -EINVAL; + + /* RFC 7915 Section 4.1 */ + if (unlikely(ipxlat_v4_srr_check(skb, l3_hdr))) + return -EINVAL; + if (unlikely(l3_hdr->ttl <= 1)) + return -EINVAL; + + /* RFC 7915 Section 1.2: + * Fragmented ICMP/ICMPv6 packets will not be translated by IP/ICMP + * translators. + */ + if (unlikely(l3_hdr->protocol == IPPROTO_ICMP && + ip_is_fragment(l3_hdr))) + return -EINVAL; + + cb->l3_hdr_len = l3_len; + cb->l4_proto = l3_hdr->protocol; + cb->l4_off = l3_off + l3_len; + cb->payload_off = cb->l4_off; + cb->is_icmp_err = false; + + /* only non fragmented packets or first fragments have transport hdrs */ + if (unlikely(!ipxlat_is_first_frag4(l3_hdr))) { + if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->saddr))) + return -EINVAL; + return 0; + } + + l4_len = ipxlat_v4_pull_l4(skb, cb->l4_off, l3_hdr->protocol, + &cb->is_icmp_err); + if (unlikely(l4_len < 0)) + return l4_len; + + /* RFC 7915 Section 4.1: + * Illegal IPv4 sources are accepted only for ICMPv4 error translation. 
+ */ + if (unlikely(!ipxlat_v4_validate_addr(l3_hdr->saddr) && + !cb->is_icmp_err)) + return -EINVAL; + + cb->payload_off = cb->l4_off + l4_len; + + if (unlikely(cb->is_icmp_err)) { + /* validate the quoted packet in an ICMP error */ + err = ipxlat_v4_pull_icmp_inner(skb, cb->payload_off); + if (unlikely(err)) + return err; + } + + return 0; +} + +static int ipxlat_v4_validate_icmp_csum(const struct sk_buff *skb) +{ + __sum16 csum; + + /* skip when checksum is not software-owned */ + if (skb->ip_summed != CHECKSUM_NONE) + return 0; + + /* compute checksum over ICMP header and payload, then fold to 16-bit + * Internet checksum to validate it + */ + csum = csum_fold(skb_checksum(skb, skb_transport_offset(skb), + ipxlat_skb_datagram_len(skb), 0)); + return unlikely(csum) ? -EINVAL : 0; +} + +/** + * ipxlat_v4_validate_skb - validate IPv4 input and fill parser metadata in cb + * @ipxlat: translator private context + * @skb: packet to validate + * + * Ensures required headers are present/consistent and stores parsed offsets + * into &struct ipxlat_cb for the translation path. + * + * Return: 0 on success, negative errno on validation failure. + */ +int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb) +{ + struct ipxlat_cb *cb = ipxlat_skb_cb(skb); + struct iphdr *l3_hdr; + struct udphdr *udph; + int err; + + if (unlikely(skb_shared(skb))) + return -EINVAL; + + err = ipxlat_v4_pull_hdrs(skb); + if (unlikely(err)) + return err; + + skb_set_transport_header(skb, cb->l4_off); + + if (unlikely(cb->is_icmp_err)) { + if (unlikely(cb->l4_proto != IPPROTO_ICMP)) { + DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + } + + /* Translation path recomputes ICMPv6 checksum from scratch. + * Validate here so a corrupted ICMPv4 error is not converted + * into a translated packet with a valid checksum. 
+ */ + return ipxlat_v4_validate_icmp_csum(skb); + } + + l3_hdr = ip_hdr(skb); + if (likely(cb->l4_proto != IPPROTO_UDP)) + return 0; + if (unlikely(!ipxlat_is_first_frag4(l3_hdr))) + return 0; + + udph = udp_hdr(skb); + if (likely(udph->check != 0)) + return 0; + + /* We are in the path where L4 header is present (unfragmented packets + * or first fragments) and is UDP. + * Fragmented checksum-less IPv4 UDP is rejected because 4->6 cannot + * reliably translate it. + */ + if (unlikely(ip_is_fragment(l3_hdr))) + return -EINVAL; + + /* udph->len bounds the span used to compute replacement checksum */ + if (unlikely(ntohs(udph->len) > skb->len - cb->l4_off)) + return -EINVAL; + + cb->udp_zero_csum_len = ntohs(udph->len); + + return 0; } int ipxlat_v6_validate_skb(struct sk_buff *skb) -- 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-112.mailbox.org (mout-b-112.mailbox.org [195.10.208.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DCC93E2768; Thu, 19 Mar 2026 15:13:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933199; cv=none; b=f6hbYRNS3wZrYXkZz5aL9ULBAgHLjwV1vbOQAMExSdiXTgDQatePsNlVl6sCzhkk8jz2pTnn2mnIhNUH5KVgDYIQPC6zcFHpdk5X+kRmlxMJtWh8UihDh3uQ4Laf+D7AMtVFIMk+R0TST9LkmbEIQgKu9Acx5NWcfjoWxgjul+A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933199; c=relaxed/simple; bh=JwGy+BPlkGjZjKA3BTlYKwhIK/1wQWC18opsy5kgiCU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kZyGy4tuUOmWCP8B1FuwetUmKeiGTcdEgFXrd/O1x7+Pi+Tpg+Lpy5ihvN5Wu2jKWJ6ZY8MtlK9QmwKnB53r4qED5eQvsYU2o1OxL8pOuFNJID0ZgLHgh1MP0I61N8n/67OWBUswEvnZWpJdHuJUCytBabm10VJFeNpuTfXhWRI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) 
header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=lay8uwxM; arc=none smtp.client-ip=195.10.208.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="lay8uwxM" Received: from smtp102.mailbox.org (smtp102.mailbox.org [IPv6:2001:67c:2050:b231:465::102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-112.mailbox.org (Postfix) with ESMTPS id 4fc8Mt0rsQzDvNX; Thu, 19 Mar 2026 16:13:14 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933194; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hNd5T3aNWF08LM/DvKx32Cly2ZAmObchJq+lKFbTppc=; b=lay8uwxMQHB4PkU6Z8gyledt+NXkDLsV8oGNCOlt6DqJbiIvoe1Y7gFfy7bmO9h8B6WxPC l9s02dX2P99EhHMnXbcSIIhrpQAUZAiSbWoC4NZA9Vx132BsXb4stauOoXtQ175RgKFMan UscRPaX67hSa+NIJuhIJVZ25maXUMJ1kc2MqvAiGBYODKJSI+rjtPX+wO+1zLB9v5SPiBM 7YOZPIEWnf+jcAISQzBF/Qym5KZr6ShEpG9ckYs5LuM9xYvF4t7iREHzQS271oru8n+ere HgpArXuIlUCPQmGcBh3PBpEQ7m6QfrFbZx3vH2tJSZ6xmgJCEMmNoZKoEmBHYg== Authentication-Results: outgoing_mbo_mout; dkim=none; spf=pass (outgoing_mbo_mout: domain of ralf@mandelbit.com designates 2001:67c:2050:b231:465::102 as permitted sender) smtp.mailfrom=ralf@mandelbit.com From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 05/15] ipxlat: add IPv6 packet validation path Date: Thu, 19 Mar 2026 16:12:14 +0100 Message-ID: <20260319151230.655687-6-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4fc8Mt0rsQzDvNX Content-Type: text/plain; charset="utf-8" Implement IPv6 packet parsing and validation, including extension header traversal, fragment-header constraints, and ICMPv6 checksum handling for informational/error traffic. The parser fills skb control-block metadata for 6->4 translation and quoted-inner packet handling. Signed-off-by: Ralf Lici --- drivers/net/ipxlat/packet.c | 326 +++++++++++++++++++++++++++++++++++- 1 file changed, 325 insertions(+), 1 deletion(-) diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c index 0cc619dca147..b9a9af1b3adb 100644 --- a/drivers/net/ipxlat/packet.c +++ b/drivers/net/ipxlat/packet.c @@ -401,7 +401,331 @@ int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb) return 0; } +static bool ipxlat_v6_validate_saddr(const struct in6_addr *addr6) +{ + return !(ipv6_addr_any(addr6) || ipv6_addr_loopback(addr6) || + ipv6_addr_is_multicast(addr6)); +} + +static int ipxlat_v6_pull_l4(struct sk_buff *skb, unsigned int l4_offset, + u8 l4_proto, bool *is_icmp_err) +{ + struct icmp6hdr *icmp; + struct udphdr *udp; + struct tcphdr *tcp; + + *is_icmp_err = false; + + switch (l4_proto) { + case NEXTHDR_TCP: + if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*tcp)))) + return -EINVAL; + tcp = (struct tcphdr *)(skb->data + l4_offset); + return __tcp_hdrlen(tcp); + case NEXTHDR_UDP: + if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*udp)))) + 
return -EINVAL; + udp = (struct udphdr *)(skb->data + l4_offset); + if (unlikely(ntohs(udp->len) < sizeof(*udp))) + return -EINVAL; + return sizeof(struct udphdr); + case NEXTHDR_ICMP: + if (unlikely(!pskb_may_pull(skb, l4_offset + sizeof(*icmp)))) + return -EINVAL; + icmp = (struct icmp6hdr *)(skb->data + l4_offset); + *is_icmp_err = icmpv6_is_err(icmp->icmp6_type); + return sizeof(struct icmp6hdr); + default: + return 0; + } +} + +/* Basic IPv6 header walk: parse only the packet starting at l3_offset. + * It does not inspect quoted inner packets carried by ICMP errors. + */ +static int ipxlat_v6_walk_hdrs(struct sk_buff *skb, unsigned int l3_offset, + u8 *l4_proto, unsigned int *fhdr_offset, + unsigned int *l4_offset, bool *has_l4) +{ + unsigned int frag_hdr_off, l4hdr_off; + struct frag_hdr *frag; + struct ipv6hdr *ip6; + bool first_frag; + int err; + + /* cannot use default getter because this function is used both for + * outer and inner packets + */ + ip6 = (struct ipv6hdr *)(skb->data + l3_offset); + + /* if present, locate Fragment Header first because it affects + * whether transport headers are available + */ + frag_hdr_off = l3_offset; + err = ipv6_find_hdr(skb, &frag_hdr_off, NEXTHDR_FRAGMENT, NULL, NULL); + if (unlikely(err < 0 && err != -ENOENT)) + return -EINVAL; + + *has_l4 = true; + *fhdr_offset = 0; + if (unlikely(err == NEXTHDR_FRAGMENT)) { + if (unlikely(!pskb_may_pull(skb, frag_hdr_off + sizeof(*frag)))) + return -EINVAL; + frag = (struct frag_hdr *)(skb->data + frag_hdr_off); + + /* remember Fragment Header offset for downstream logic */ + *fhdr_offset = frag_hdr_off; + first_frag = ipxlat_is_first_frag6(frag); + + /* ipv6 forbids chaining FHs */ + if (unlikely(frag->nexthdr == NEXTHDR_FRAGMENT)) + return -EINVAL; + + /* RFC 7915 Section 5.1.1 does not support extension headers + * after FH (except NEXTHDR_NONE) + */ + if (unlikely(ipv6_ext_hdr(frag->nexthdr) && + frag->nexthdr != NEXTHDR_NONE)) + return 
-EPROTONOSUPPORT; + + /* non-first fragments do not carry a full transport header */ + if (!first_frag) { + *l4_proto = frag->nexthdr; + /* first byte after FH is fragment payload, + * not L4 header + */ + *l4_offset = frag_hdr_off + sizeof(struct frag_hdr); + *has_l4 = false; + return 0; + } + } + + /* walk extension headers to terminal protocol and compute offsets used + * by validation/translation + */ + l4hdr_off = l3_offset; + err = ipv6_find_hdr(skb, &l4hdr_off, -1, NULL, NULL); + if (unlikely(err < 0)) + return -EINVAL; + + *l4_proto = err; + *l4_offset = l4hdr_off; + return 0; +} + +/* RFC 7915 Section 5.1 says a Routing Header with Segments Left != 0 + * must not be translated. We detect it by asking ipv6_find_hdr not to + * skip RH, then emit ICMPv6 Parameter Problem pointing to segments_left. + */ +static int ipxlat_v6_check_rh(struct sk_buff *skb) +{ + unsigned int rh_off; + int flags, nexthdr; + + rh_off = 0; + flags = IP6_FH_F_SKIP_RH; + nexthdr = ipv6_find_hdr(skb, &rh_off, NEXTHDR_ROUTING, NULL, &flags); + if (unlikely(nexthdr < 0 && nexthdr != -ENOENT)) + return -EINVAL; + if (likely(nexthdr != NEXTHDR_ROUTING)) + return 0; + + return -EINVAL; +} + +static int ipxlat_v6_pull_outer_l3(struct sk_buff *skb) +{ + const unsigned int l3_off = skb_network_offset(skb); + struct ipv6hdr *l3_hdr; + + if (unlikely(!pskb_may_pull(skb, l3_off + sizeof(*l3_hdr)))) + return -EINVAL; + l3_hdr = ipv6_hdr(skb); + + /* translator does not support jumbograms; payload_len must match skb */ + if (unlikely(l3_hdr->version != 6 || + skb->len != sizeof(*l3_hdr) + + be16_to_cpu(l3_hdr->payload_len) || + !ipxlat_v6_validate_saddr(&l3_hdr->saddr))) + return -EINVAL; + + if (unlikely(l3_hdr->hop_limit <= 1)) + return -EINVAL; + + return 0; +} + +static int ipxlat_v6_pull_icmp_inner(struct sk_buff *skb, + unsigned int outer_payload_off) +{ + unsigned int inner_fhdr_off, inner_l4_off; + struct ipxlat_cb *cb = ipxlat_skb_cb(skb); + struct 
ipv6hdr *inner_ip6; + bool has_l4, is_icmp_err; + u8 inner_l4_proto; + int err; + + if (unlikely(!pskb_may_pull(skb, + outer_payload_off + sizeof(*inner_ip6)))) + return -EINVAL; + + inner_ip6 = (struct ipv6hdr *)(skb->data + outer_payload_off); + if (unlikely(inner_ip6->version != 6)) + return -EINVAL; + + err = ipxlat_v6_walk_hdrs(skb, outer_payload_off, &inner_l4_proto, + &inner_fhdr_off, &inner_l4_off, &has_l4); + if (unlikely(err)) + return err; + + cb->inner_l3_offset = outer_payload_off; + cb->inner_l4_offset = inner_l4_off; + cb->inner_fragh_off = inner_fhdr_off; + cb->inner_l4_proto = inner_l4_proto; + + if (likely(has_l4)) { + err = ipxlat_v6_pull_l4(skb, inner_l4_off, inner_l4_proto, + &is_icmp_err); + if (unlikely(err < 0)) + return err; + if (unlikely(is_icmp_err)) + return -EINVAL; + } + + return 0; +} + +static int ipxlat_v6_pull_hdrs(struct sk_buff *skb) +{ + const unsigned int l3_off = skb_network_offset(skb); + unsigned int fragh_off, l4_off, payload_off; + struct ipxlat_cb *cb = ipxlat_skb_cb(skb); + int l3_len, l4_len, err; + struct frag_hdr *frag; + bool has_l4; + u8 l4_proto; + + /* parse IPv6 base header and perform basic structural checks */ + err = ipxlat_v6_pull_outer_l3(skb); + if (unlikely(err)) + return err; + + /* walk extension/fragment headers and locate the transport header */ + err = ipxlat_v6_walk_hdrs(skb, l3_off, &l4_proto, &fragh_off, &l4_off, + &has_l4); + /* -EPROTONOSUPPORT means packet layout is syntactically valid but + * unsupported by our RFC 7915 path + */ + if (unlikely(err == -EPROTONOSUPPORT)) + return -EINVAL; + if (unlikely(err)) + return err; + + l3_len = l4_off - l3_off; + payload_off = l4_off; + + if (likely(has_l4)) { + l4_len = ipxlat_v6_pull_l4(skb, l4_off, l4_proto, + &cb->is_icmp_err); + if (unlikely(l4_len < 0)) + return l4_len; + payload_off += l4_len; + } + + /* RFC 7915 Section 5.1 */ + err = ipxlat_v6_check_rh(skb); + if (unlikely(err)) + return err; + + if 
(unlikely(l4_proto == NEXTHDR_ICMP)) { + /* A stateless translator cannot reliably translate ICMP + * checksum across real IPv6 fragments, so fragmented ICMP is + * dropped. A Fragment Header alone, however, is not enough to + * decide: so-called atomic fragments (offset=0, M=0) carry a + * Fragment Header but are not actually fragmented. + */ + if (unlikely(fragh_off)) { + if (unlikely(!pskb_may_pull(skb, + fragh_off + sizeof(*frag)))) + return -EINVAL; + + frag = (struct frag_hdr *)(skb->data + fragh_off); + if (unlikely(ipxlat_get_frag6_offset(frag) || + (be16_to_cpu(frag->frag_off) & IP6_MF))) + return -EINVAL; + } + + if (unlikely(cb->is_icmp_err)) { + /* validate the quoted packet in an ICMP error */ + err = ipxlat_v6_pull_icmp_inner(skb, payload_off); + if (unlikely(err)) + return err; + } + } + + cb->l4_proto = l4_proto; + cb->l4_off = l4_off; + cb->fragh_off = fragh_off; + cb->payload_off = payload_off; + cb->l3_hdr_len = l3_len; + + return 0; +} + +static int ipxlat_v6_validate_icmp_csum(const struct sk_buff *skb) +{ + struct ipv6hdr *iph6; + unsigned int len; + __sum16 csum; + + if (skb->ip_summed != CHECKSUM_NONE) + return 0; + + iph6 = ipv6_hdr(skb); + len = ipxlat_skb_datagram_len(skb); + csum = csum_ipv6_magic(&iph6->saddr, &iph6->daddr, len, NEXTHDR_ICMP, + skb_checksum(skb, skb_transport_offset(skb), len, + 0)); + + return unlikely(csum) ? -EINVAL : 0; +} + +/** + * ipxlat_v6_validate_skb - validate IPv6 input and fill parser metadata in cb + * @skb: packet to validate + * + * Ensures required headers are present/consistent and stores parsed offsets + * into &struct ipxlat_cb for the translation path. + * + * Return: 0 on success, negative errno on validation failure. 
+ */ int ipxlat_v6_validate_skb(struct sk_buff *skb) { - return -EOPNOTSUPP; + struct ipxlat_cb *cb = ipxlat_skb_cb(skb); + int err; + + if (unlikely(skb_shared(skb))) + return -EINVAL; + + err = ipxlat_v6_pull_hdrs(skb); + if (unlikely(err)) + return err; + + skb_set_transport_header(skb, cb->l4_off); + + if (unlikely(cb->is_icmp_err)) { + if (unlikely(cb->l4_proto != NEXTHDR_ICMP)) { + DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + } + + /* The translated ICMPv4 checksum is recomputed from scratch, + * so reject bad ICMPv6 error checksums before conversion. + */ + err = ipxlat_v6_validate_icmp_csum(skb); + if (unlikely(err)) + return err; + } + + return 0; } -- 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-203.mailbox.org (mout-b-203.mailbox.org [195.10.208.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 573FC3DD52A; Thu, 19 Mar 2026 15:20:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933613; cv=none; b=f/8muDkM1COxmVmRmDgApT/vXbSzv+Pk440VMm9BUFbjKxubRs3w55oS7gwsvapE8E3YKS7BqdAdXth4DPJOzlaCcKOo1wBD66rMxEnZveP3jbp72f5VFdWpCYMfhAMidnWsUxa7cCYWjwu1rkf+sMs67VwMAeX59YE9+iw0NcE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933613; c=relaxed/simple; bh=cWaW2hJHEcFomAfoes8Q2TkBrG47FpwnOq1LEspkmBU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=cJfm8VPsBZqbTRNuTC0eJbuX2m4hspxRmEOreKnsWpq7I6HruR/7X6s2NfZ54jbnitt3HR54YF8M5ZX/MkXo0Hcp0zAn1NaCkCCV3U9Qynnx9qSuHppSwBNvGJnfiUFO/cUi36Ci+BPF5og9qQO1iiHDz3SW2/mb7d2cfpMf1Ig= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com 
header.i=@mandelbit.com header.b=S9ZWqsKa; arc=none smtp.client-ip=195.10.208.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="S9ZWqsKa" Received: from smtp102.mailbox.org (smtp102.mailbox.org [10.196.197.102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-203.mailbox.org (Postfix) with ESMTPS id 4fc8My6FJVz9xB0; Thu, 19 Mar 2026 16:13:18 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933198; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0mkPPwP2E8HVu98S0M2DSu5QibU7KhMmUvb3wWQT4RE=; b=S9ZWqsKahzaZ8gYBVvFBcoh4GNrqfpB5K63u6XVJr3gFZZ8+a9HHZmsfmHsolnmDqQH9Eu 6YDTHbPQpQSpvIyDvngp2LqdtdO0OY5qKiweHciyKUwiGa64q2YzUKuylYd6c0PiKfEnjH by8BdewRcZh0BffHBOKlTtDWDcmj4abep46C5F4A408oouiMSNMewoIm5N5BSrrYNMavEw Uy99of3scZMPwPnmtfE3GutqlBoMnd6yigYjgOtEEepwhCLEyj3hPZhWFC/EovV4Nhgc1N dizS/Rq1NTig145YpjDmKAwQPn8n/DBQkNIm4XYROBBQMqsd+2wD9hT5LakQSg== From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 06/15] ipxlat: add transport checksum and offload helpers Date: Thu, 19 Mar 2026 16:12:15 +0100 Message-ID: <20260319151230.655687-7-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Add shared transport-layer helpers for checksum manipulation and offload metadata normalization across family translation. This introduces incremental and full checksum utilities plus generic ICMP relayout/offload finalization routines reused by later 4->6 and 6->4 transport translation paths. Signed-off-by: Ralf Lici --- drivers/net/ipxlat/transport.c | 146 +++++++++++++++++++++++++++++++++ drivers/net/ipxlat/transport.h | 83 +++++++++++++++++++ 2 files changed, 229 insertions(+) create mode 100644 drivers/net/ipxlat/transport.c create mode 100644 drivers/net/ipxlat/transport.h diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c new file mode 100644 index 000000000000..cd786ce84adc --- /dev/null +++ b/drivers/net/ipxlat/transport.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gröber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gröber + * Ralf Lici + */ + +#include +#include +#include +#include + +#include "packet.h" +#include "transport.h" + +/* set CHECKSUM_PARTIAL metadata for transport checksum completion */ +int ipxlat_set_partial_csum(struct sk_buff *skb, u16 csum_offset) +{ + if (likely(skb_partial_csum_set(skb, 
skb_transport_offset(skb), + csum_offset))) + return 0; + return -EINVAL; +} + +static __wsum ipxlat_pseudohdr6_csum(const struct ipv6hdr *hdr) +{ + return ~csum_unfold(csum_ipv6_magic(&hdr->saddr, &hdr->daddr, 0, 0, 0)); +} + +static __wsum ipxlat_pseudohdr4_csum(const struct iphdr *hdr) +{ + return csum_tcpudp_nofold(hdr->saddr, hdr->daddr, 0, 0, 0); +} + +static __sum16 ipxlat_46_update_csum(__sum16 csum16, + const struct iphdr *in_ip4, + const void *in_l4_hdr, + const struct ipv6hdr *out_ip6, + const void *out_l4_hdr, size_t l4_hdr_len) +{ + __wsum csum; + + csum =3D ~csum_unfold(csum16); + + /* replace pseudohdr and L4 header contributions, payload unchanged */ + csum =3D csum_sub(csum, ipxlat_pseudohdr4_csum(in_ip4)); + csum =3D csum_sub(csum, csum_partial(in_l4_hdr, l4_hdr_len, 0)); + csum =3D csum_add(csum, ipxlat_pseudohdr6_csum(out_ip6)); + csum =3D csum_add(csum, csum_partial(out_l4_hdr, l4_hdr_len, 0)); + return csum_fold(csum); +} + +static __sum16 ipxlat_64_update_csum(__sum16 csum16, + const struct ipv6hdr *in_ip6, + const void *in_l4_hdr, + size_t in_l4_hdr_len, + const struct iphdr *out_ip4, + const void *out_l4_hdr, + size_t out_l4_hdr_len) +{ + __wsum csum; + + csum =3D ~csum_unfold(csum16); + + /* only address terms matter because L4 length/proto are unchanged */ + csum =3D csum_sub(csum, ipxlat_pseudohdr6_csum(in_ip6)); + csum =3D csum_sub(csum, csum_partial(in_l4_hdr, in_l4_hdr_len, 0)); + + csum =3D csum_add(csum, ipxlat_pseudohdr4_csum(out_ip4)); + csum =3D csum_add(csum, csum_partial(out_l4_hdr, out_l4_hdr_len, 0)); + + return csum_fold(csum); +} + +__sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr, + const struct in6_addr *daddr, + const struct sk_buff *skb, unsigned int l4_off, + unsigned int l4_len, u8 proto) +{ + return csum_ipv6_magic(saddr, daddr, l4_len, proto, + skb_checksum(skb, l4_off, l4_len, 0)); +} + +/* Normalize checksum/offload metadata after address-family translation. 
+ * + * Translation changes protocol family but keeps transport payload semanti= cs + * intact, so TCP GSO only needs type remap (gso_from -> gso_to), while IC= MP + * must clear stale GSO state because there is no ICMP GSO transform here. + * + * This mirrors forwarding expectations: reject LRO on xmit and clear hash + * when tuple semantics may have changed (fragments and non-TCP/UDP). + */ +int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_frag= ment, + u32 gso_from, u32 gso_to) +{ + struct skb_shared_info *shinfo; + + if (unlikely(skb->ip_summed =3D=3D CHECKSUM_COMPLETE)) + skb->ip_summed =3D CHECKSUM_NONE; + + if (!skb_is_gso(skb)) + goto out_hash; + + /* align with forwarding paths that reject LRO skbs before xmit */ + if (unlikely(skb_warn_if_lro(skb))) + return -EINVAL; + + shinfo =3D skb_shinfo(skb); + switch (l4_proto) { + case IPPROTO_TCP: + /* segment payload size is unchanged by address-family + * translation so there's no need to touch gso_size + */ + if (shinfo->gso_type & gso_from) { + shinfo->gso_type &=3D ~gso_from; + shinfo->gso_type |=3D gso_to; + } else if (unlikely(!(shinfo->gso_type & gso_to))) { + return -EOPNOTSUPP; + } + break; + case IPPROTO_UDP: + break; + case IPPROTO_ICMP: + /* for ICMP there is no GSO transform; clear stale offload + * metadata so the stack treats it as a normal frame + */ + skb_gso_reset(skb); + break; + default: + return -EPROTONOSUPPORT; + } + +out_hash: + if (unlikely(is_fragment || + (l4_proto !=3D IPPROTO_TCP && l4_proto !=3D IPPROTO_UDP))) + skb_clear_hash(skb); + else + skb_clear_hash_if_not_l4(skb); + return 0; +} diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h new file mode 100644 index 000000000000..bd228aecfb3b --- /dev/null +++ b/drivers/net/ipxlat/transport.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 
2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_TRANSPORT_H_ +#define _NET_IPXLAT_TRANSPORT_H_ + +#include +#include +#include + +/** + * ipxlat_l4_min_len - minimum transport header size for protocol + * @protocol: transport protocol identifier + * + * Return: minimum header length for @protocol, or 0 when unsupported. + */ +static inline unsigned int ipxlat_l4_min_len(u8 protocol) +{ + switch (protocol) { + case IPPROTO_TCP: + return sizeof(struct tcphdr); + case IPPROTO_UDP: + return sizeof(struct udphdr); + case IPPROTO_ICMP: + return sizeof(struct icmphdr); + default: + return 0; + } +} + +/** + * ipxlat_set_partial_csum - program CHECKSUM_PARTIAL metadata on skb + * @skb: packet with transport checksum field + * @csum_offset: offset of checksum field within transport header + * + * Return: 0 on success, negative errno on invalid skb state. + */ +int ipxlat_set_partial_csum(struct sk_buff *skb, u16 csum_offset); + +/** + * ipxlat_l4_csum_ipv6 - compute full L4 checksum with IPv6 pseudo-header + * @saddr: IPv6 source address + * @daddr: IPv6 destination address + * @skb: packet buffer + * @l4_off: transport header offset + * @l4_len: transport span (header + payload) + * @proto: transport protocol + * + * Return: folded checksum value covering pseudo-header and transport payl= oad. 
+ */ +__sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr, + const struct in6_addr *daddr, + const struct sk_buff *skb, unsigned int l4_off, + unsigned int l4_len, u8 proto); + +/** + * ipxlat_finalize_offload - normalize checksum/GSO metadata after transla= tion + * @skb: translated packet + * @l4_proto: resulting transport protocol + * @is_fragment: resulting packet is fragmented + * @gso_from: input TCP GSO type bit + * @gso_to: output TCP GSO type bit + * + * Converts TCP GSO family bits and clears stale checksum/hash state when + * offload metadata cannot be preserved across address-family translation. + * + * Return: 0 on success, negative errno on unsupported/offload-incompatible + * input. + */ +int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_frag= ment, + u32 gso_from, u32 gso_to); + +#endif /* _NET_IPXLAT_TRANSPORT_H_ */ --=20 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-210.mailbox.org (mout-b-210.mailbox.org [195.10.208.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 987CE3D3481; Thu, 19 Mar 2026 15:13:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.40 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933208; cv=none; b=a7C4MA6bbMTi2PJ5T5HIVbGwnFSbraAlx3XZMZ5T/9bztWsfJQAK5cqsWB8CFuyIxDd42w6u/a5US457ZTxxgZkH++DzucXmQV0xdmKS2pRo17rqHQRBWqK4LCZ8baTMUMIIzLCTg+Y9hADZzEb7fB8BF2ksItuSKjhvfK3b9+Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933208; c=relaxed/simple; bh=AdceW1FN4eDUaFa0FYDNgm9vSomij039gshdCZPh0EI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ov8mGX2UApPSCYIyQmUBh4z9/4MX/EsLspr7XyXx6z6kJeJPq+nlGRvyDp8iLb7kljdNJvZB9ttQ71tw5eSacDScktGRUMpKnw7KVnVh87i9D827sVLEzMV8yueX43NH36y++/P1WatBagw5n1DNUet99QaNmzdiDaqy0hLRWkY= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=joSMu+tX; arc=none smtp.client-ip=195.10.208.40 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="joSMu+tX" Received: from smtp102.mailbox.org (smtp102.mailbox.org [IPv6:2001:67c:2050:b231:465::102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-210.mailbox.org (Postfix) with ESMTPS id 4fc8N34JYkzDx5w; Thu, 19 Mar 2026 16:13:23 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933203; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XZR5uoZ2ZSYxknGF6muIkL03VVRbFhAx1x3IXKaFjIM=; b=joSMu+tX1NoUzw/SJBK1PfA8vYLdRUN54+uWHZJwchWVl8xedbKgqywpjvCuixhrJPB137 OGjVV/6v5lwRQM8ttZcZ++vAFFNw1Uqp8nV5ugU/w0WzInbsybEK9zyHhB4NjyEOeN1Prn eR65FTxDhy7UGoJTH+fkeBFpBKgYqsjY/djP8fHb5etKuXVqYfMEHDqgT9kfikgyvtWfbK WasWHJW2SsB5Eq7Ny5f1Cq8yaDG1YhBSh9yuoxJMMY3w82Rx9Wj8GlvTYxQX48/SAagrJG p2kLAGWhBxAiEdhifJpMfz/fmAMpKHG4x3eLOXstXk+Ifiazidne3ps6AShDFw== Authentication-Results: outgoing_mbo_mout; dkim=none; spf=pass (outgoing_mbo_mout: domain of ralf@mandelbit.com designates 2001:67c:2050:b231:465::102 as permitted sender) smtp.mailfrom=ralf@mandelbit.com From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , 
Antonio Quartulli , Andrew Lunn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 07/15] ipxlat: add 4to6 and 6to4 TCP/UDP translation helpers Date: Thu, 19 Mar 2026 16:12:16 +0100 Message-ID: <20260319151230.655687-8-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4fc8N34JYkzDx5w Content-Type: text/plain; charset="utf-8" Add protocol-specific transport translation entry points for both address-family directions. This wires checksum adjustment for outer and quoted-inner TCP/UDP headers and provides the transport routines consumed by the translation engine. Signed-off-by: Ralf Lici --- drivers/net/ipxlat/transport.c | 194 +++++++++++++++++++++++++++++++++ drivers/net/ipxlat/transport.h | 20 ++++ 2 files changed, 214 insertions(+) diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c index cd786ce84adc..3aa00c635916 100644 --- a/drivers/net/ipxlat/transport.c +++ b/drivers/net/ipxlat/transport.c @@ -144,3 +144,197 @@ int ipxlat_finalize_offload(struct sk_buff *skb, u8 l= 4_proto, bool is_fragment, skb_clear_hash_if_not_l4(skb); return 0; } + +int ipxlat_46_outer_tcp(struct sk_buff *skb, const struct iphdr *in4) +{ + const struct ipv6hdr *iph6 =3D ipv6_hdr(skb); + struct tcphdr *tcp_new =3D tcp_hdr(skb); + struct tcphdr tcp_old; + __sum16 csum16; + + /* CHECKSUM_PARTIAL keeps a pseudohdr seed in check, not a final + * transport checksum. For 4->6, we only re-seed it with IPv6 pseudohdr + * data and keep completion deferred to offload. 
+ */ + if (skb->ip_summed =3D=3D CHECKSUM_PARTIAL) { + tcp_new->check =3D ~tcp_v6_check(ipxlat_skb_datagram_len(skb), + &iph6->saddr, &iph6->daddr, 0); + return ipxlat_set_partial_csum(skb, + offsetof(struct tcphdr, check)); + } + + /* zeroing check in old/new headers avoids double-accounting it */ + csum16 =3D tcp_new->check; + tcp_old =3D *tcp_new; + tcp_old.check =3D 0; + tcp_new->check =3D 0; + tcp_new->check =3D ipxlat_46_update_csum(csum16, in4, + &tcp_old, iph6, tcp_new, + sizeof(*tcp_new)); + skb->ip_summed =3D CHECKSUM_NONE; + return 0; +} + +int ipxlat_46_outer_udp(struct sk_buff *skb, const struct iphdr *in4) +{ + const struct ipxlat_cb *cb =3D ipxlat_skb_cb(skb); + const struct ipv6hdr *iph6 =3D ipv6_hdr(skb); + struct udphdr *udp_new =3D udp_hdr(skb); + struct udphdr udp_old; + __sum16 csum16; + + /* outer path enforces UDP zero-checksum policy in validation */ + if (skb->ip_summed =3D=3D CHECKSUM_PARTIAL && likely(udp_new->check !=3D = 0)) { + udp_new->check =3D ~udp_v6_check(ipxlat_skb_datagram_len(skb), + &iph6->saddr, &iph6->daddr, 0); + return ipxlat_set_partial_csum(skb, + offsetof(struct udphdr, check)); + } + + /* incoming UDP IPv4 has no checksum (legal in IPv4, not in IPv6) */ + if (unlikely(udp_new->check =3D=3D 0)) { + if (unlikely(!cb->udp_zero_csum_len)) + return -EINVAL; + + udp_new->check =3D + ipxlat_l4_csum_ipv6(&iph6->saddr, &iph6->daddr, skb, + skb_transport_offset(skb), + cb->udp_zero_csum_len, IPPROTO_UDP); + /* 0x0000 on wire means "no checksum"; preserve computed zero */ + if (udp_new->check =3D=3D 0) + udp_new->check =3D CSUM_MANGLED_0; + skb->ip_summed =3D CHECKSUM_NONE; + return 0; + } + + csum16 =3D udp_new->check; + udp_old =3D *udp_new; + udp_old.check =3D 0; + udp_new->check =3D 0; + udp_new->check =3D ipxlat_46_update_csum(csum16, in4, + &udp_old, iph6, udp_new, + sizeof(*udp_new)); + skb->ip_summed =3D CHECKSUM_NONE; + return 0; +} + +int ipxlat_46_inner_tcp(struct sk_buff *skb, const struct iphdr *in4, + const struct 
ipv6hdr *iph6, struct tcphdr *tcp_new) +{ + struct tcphdr tcp_old; + __sum16 csum16; + + csum16 =3D tcp_new->check; + tcp_old =3D *tcp_new; + tcp_old.check =3D 0; + tcp_new->check =3D 0; + tcp_new->check =3D ipxlat_46_update_csum(csum16, in4, &tcp_old, iph6, + tcp_new, sizeof(*tcp_new)); + return 0; +} + +int ipxlat_46_inner_udp(struct sk_buff *skb, const struct iphdr *in4, + const struct ipv6hdr *iph6, struct udphdr *udp_new) +{ + struct udphdr udp_old; + __sum16 csum16; + + if (unlikely(udp_new->check =3D=3D 0)) + return 0; + + csum16 =3D udp_new->check; + udp_old =3D *udp_new; + udp_old.check =3D 0; + udp_new->check =3D 0; + udp_new->check =3D ipxlat_46_update_csum(csum16, in4, &udp_old, iph6, + udp_new, sizeof(*udp_new)); + return 0; +} + +int ipxlat_64_outer_tcp(struct sk_buff *skb, const struct ipv6hdr *in6) +{ + struct tcphdr tcp_old, *tcp_new; + __sum16 csum16; + + tcp_new =3D tcp_hdr(skb); + + if (skb->ip_summed =3D=3D CHECKSUM_PARTIAL) { + tcp_new->check =3D ~tcp_v4_check(ipxlat_skb_datagram_len(skb), + ip_hdr(skb)->saddr, + ip_hdr(skb)->daddr, 0); + return ipxlat_set_partial_csum(skb, + offsetof(struct tcphdr, check)); + } + + csum16 =3D tcp_new->check; + tcp_old =3D *tcp_new; + tcp_old.check =3D 0; + tcp_new->check =3D 0; + tcp_new->check =3D ipxlat_64_update_csum(csum16, in6, &tcp_old, + sizeof(tcp_old), ip_hdr(skb), + tcp_new, sizeof(*tcp_new)); + skb->ip_summed =3D CHECKSUM_NONE; + return 0; +} + +int ipxlat_64_outer_udp(struct sk_buff *skb, const struct ipv6hdr *in6) +{ + struct udphdr udp_old, *udp_new; + __sum16 csum16; + + udp_new =3D udp_hdr(skb); + + if (skb->ip_summed =3D=3D CHECKSUM_PARTIAL) { + udp_new->check =3D ~udp_v4_check(ipxlat_skb_datagram_len(skb), + ip_hdr(skb)->saddr, + ip_hdr(skb)->daddr, 0); + return ipxlat_set_partial_csum(skb, + offsetof(struct udphdr, check)); + } + + csum16 =3D udp_new->check; + udp_old =3D *udp_new; + udp_old.check =3D 0; + udp_new->check =3D 0; + udp_new->check =3D ipxlat_64_update_csum(csum16, in6, 
&udp_old, + sizeof(udp_old), ip_hdr(skb), + udp_new, sizeof(*udp_new)); + if (udp_new->check =3D=3D 0) + udp_new->check =3D CSUM_MANGLED_0; + skb->ip_summed =3D CHECKSUM_NONE; + return 0; +} + +int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6, + const struct iphdr *out4, struct tcphdr *tcp_new) +{ + struct tcphdr tcp_old; + __sum16 csum16; + + csum16 =3D tcp_new->check; + tcp_old =3D *tcp_new; + tcp_old.check =3D 0; + tcp_new->check =3D 0; + tcp_new->check =3D ipxlat_64_update_csum(csum16, in6, &tcp_old, + sizeof(tcp_old), out4, tcp_new, + sizeof(*tcp_new)); + return 0; +} + +int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6, + const struct iphdr *out4, struct udphdr *udp_new) +{ + struct udphdr udp_old; + __sum16 csum16; + + csum16 =3D udp_new->check; + udp_old =3D *udp_new; + udp_old.check =3D 0; + udp_new->check =3D 0; + udp_new->check =3D ipxlat_64_update_csum(csum16, in6, &udp_old, + sizeof(udp_old), out4, udp_new, + sizeof(*udp_new)); + if (udp_new->check =3D=3D 0) + udp_new->check =3D CSUM_MANGLED_0; + return 0; +} diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h index bd228aecfb3b..9b6fe422b01f 100644 --- a/drivers/net/ipxlat/transport.h +++ b/drivers/net/ipxlat/transport.h @@ -80,4 +80,24 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr, int ipxlat_finalize_offload(struct sk_buff *skb, u8 l4_proto, bool is_frag= ment, u32 gso_from, u32 gso_to); =20 +/* outer transport translation helpers (packet L3 already translated) */ +int ipxlat_46_outer_tcp(struct sk_buff *skb, const struct iphdr *in4); +int ipxlat_46_outer_udp(struct sk_buff *skb, const struct iphdr *in4); + +/* quoted-inner transport translation helpers for ICMP error payloads */ +int ipxlat_46_inner_tcp(struct sk_buff *skb, const struct iphdr *in4, + const struct ipv6hdr *iph6, struct tcphdr *tcp_new); +int ipxlat_46_inner_udp(struct sk_buff *skb, const struct iphdr *in4, + const struct ipv6hdr *iph6, struct udphdr 
*udp_new); + +/* outer transport translation helpers (packet L3 already translated) */ +int ipxlat_64_outer_tcp(struct sk_buff *skb, const struct ipv6hdr *in6); +int ipxlat_64_outer_udp(struct sk_buff *skb, const struct ipv6hdr *in6); + +/* quoted-inner transport translation helpers for ICMP error payloads */ +int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6, + const struct iphdr *out4, struct tcphdr *tcp_new); +int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6, + const struct iphdr *out4, struct udphdr *udp_new); + #endif /* _NET_IPXLAT_TRANSPORT_H_ */ --=20 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-201.mailbox.org (mout-b-201.mailbox.org [195.10.208.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8577A3E1212; Thu, 19 Mar 2026 15:20:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.61 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933632; cv=none; b=Sj2/qpzK9th8LQxFsJVh2QEyATUvdpFic0iKdMytvQ61tqj35VkwdfkL9BCfhXirKdTTeTsbY2rwg/MSoV5vlJFNad8eUc146bAsekk4BjFuRNvtAg+WoFSiAxXJNfYuPDKWcnm1kyyJvvGQhiUStnwNblREN4oi3ElyhONV3+U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933632; c=relaxed/simple; bh=FwOP6Y/r6cUNiycZePr8tA9psg2KP7NGp0NJhxq+fo0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=d89pwx8qpKr5gaQ+X/r4TuUt8LQ7uEaqlYaOSAx20Tp4kLTmBIfUxAVKuMW3jYb44uSaj2AgpRhXx8KxSaPdG7GMC/64gewOleK9mcvS4B9+U7CQdTI4qKDmQRnoWcpfCS+i5mof7+jEi94QkiUYyC9HCQjJ0JGY/DlibaRWhQo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=yGRMOjHo; arc=none 
smtp.client-ip=195.10.208.61 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="yGRMOjHo" Received: from smtp102.mailbox.org (smtp102.mailbox.org [10.196.197.102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-201.mailbox.org (Postfix) with ESMTPS id 4fc8N93hc4zDs24; Thu, 19 Mar 2026 16:13:29 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933209; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6HCdS8enImfVbeBMGZCgmGJPoYxliDdYncuxGpUTcFA=; b=yGRMOjHopOG68LpT6zhbwRr44VdpaZNuPkv/Qvdmz5tri+fCh0aFRWT/rHffbp0OGzT+/a 99C/Q9i2J+IQXgMCZ3lpXx+2oMceK9jxfQsiYk0YlnMkc1eOMdYaAz1gEqiPRGr+YJhkqg DILTkjvZe1lQyHd4jXmvfVD02S2emavzVojiP0g/3NATmIjBQ4bnnQkZf422+hn08TcXrk UoHnGqsk4hV2cfGpGKcAgyI1CCNXQXLcH25onPEJT367qabTvwGkWlQKOky16FQRiHY2k5 PbMNY9oyjLVlNRSAj0mwDdQHKqFVgg/Cnskh7dAZK6OcWWImTX1WgaGbw/DBhg== From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core Date: Thu, 19 Mar 2026 16:12:17 +0100 Message-ID: <20260319151230.655687-9-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable This commit introduces the core start_xmit processing flow: validate, select action, translate, and forward. It centralizes action resolution in the dispatch layer and keeps per-direction translation logic separate from device glue. The result is a single data-path entry point with explicit control over drop/forward/emit behavior. Signed-off-by: Ralf Lici --- drivers/net/ipxlat/Makefile | 4 + drivers/net/ipxlat/dispatch.c | 104 +++++++++++++++ drivers/net/ipxlat/dispatch.h | 71 +++++++++++ drivers/net/ipxlat/main.c | 6 +- drivers/net/ipxlat/packet.c | 1 + drivers/net/ipxlat/translate_46.c | 198 +++++++++++++++++++++++++++++ drivers/net/ipxlat/translate_46.h | 73 +++++++++++ drivers/net/ipxlat/translate_64.c | 205 ++++++++++++++++++++++++++++++ drivers/net/ipxlat/translate_64.h | 56 ++++++++ drivers/net/ipxlat/transport.c | 11 ++ drivers/net/ipxlat/transport.h | 5 + 11 files changed, 732 insertions(+), 2 deletions(-) create mode 100644 drivers/net/ipxlat/dispatch.c create mode 100644 drivers/net/ipxlat/dispatch.h create mode 100644 drivers/net/ipxlat/translate_46.c create mode 100644 drivers/net/ipxlat/translate_46.h create mode 100644 drivers/net/ipxlat/translate_64.c create mode 100644 drivers/net/ipxlat/translate_64.h diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile index 90dbc0489fa2..d7b7097aee5f 100644 --- a/drivers/net/ipxlat/Makefile +++ 
b/drivers/net/ipxlat/Makefile @@ -7,3 +7,7 @@ obj-$(CONFIG_IPXLAT) :=3D ipxlat.o ipxlat-objs +=3D main.o ipxlat-objs +=3D address.o ipxlat-objs +=3D packet.o +ipxlat-objs +=3D transport.o +ipxlat-objs +=3D dispatch.o +ipxlat-objs +=3D translate_46.o +ipxlat-objs +=3D translate_64.o diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c new file mode 100644 index 000000000000..133d30859f49 --- /dev/null +++ b/drivers/net/ipxlat/dispatch.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#include + +#include "dispatch.h" +#include "packet.h" +#include "translate_46.h" +#include "translate_64.h" + +static enum ipxlat_action +ipxlat_resolve_failed_action(const struct sk_buff *skb) +{ + return IPXLAT_ACT_DROP; +} + +enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat, + struct sk_buff *skb) +{ + const u16 proto =3D ntohs(skb->protocol); + + memset(skb->cb, 0, sizeof(struct ipxlat_cb)); + + if (proto =3D=3D ETH_P_IPV6) { + if (unlikely(ipxlat_v6_validate_skb(skb)) || + unlikely(ipxlat_64_translate(ipxlat, skb))) + return ipxlat_resolve_failed_action(skb); + + return IPXLAT_ACT_FWD; + } else if (likely(proto =3D=3D ETH_P_IP)) { + if (unlikely(ipxlat_v4_validate_skb(ipxlat, skb))) + return ipxlat_resolve_failed_action(skb); + + if (unlikely(ipxlat_46_translate(ipxlat, skb))) + return ipxlat_resolve_failed_action(skb); + + return IPXLAT_ACT_FWD; + } + + return IPXLAT_ACT_DROP; +} + +/* mark current skb as drop-with-icmp and cache type/code/info for dispatc= h */ +void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info) +{ + struct ipxlat_cb *cb =3D ipxlat_skb_cb(skb); + + cb->emit_icmp_err =3D true; + cb->icmp_err.type 
=3D type; + cb->icmp_err.code =3D code; + cb->icmp_err.info =3D info; +} + +static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff = *skb) +{ + const unsigned int len =3D skb->len; + int err; + + /* reinject as a fresh packet with scrubbed metadata */ + skb_set_queue_mapping(skb, 0); + skb_scrub_packet(skb, false); + + err =3D gro_cells_receive(&ipxlat->gro_cells, skb); + if (likely(err =3D=3D NET_RX_SUCCESS)) + dev_dstats_rx_add(ipxlat->dev, len); + /* on failure gro_cells updates rx drop stats internally */ +} + +int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb, + bool allow_pre_frag) +{ + enum ipxlat_action action; + int err =3D -EINVAL; + + (void)allow_pre_frag; + + action =3D ipxlat_translate(ipxlat, skb); + switch (action) { + case IPXLAT_ACT_FWD: + dev_dstats_tx_add(ipxlat->dev, skb->len); + ipxlat_forward_pkt(ipxlat, skb); + return 0; + case IPXLAT_ACT_DROP: + goto drop_free; + default: + DEBUG_NET_WARN_ON_ONCE(1); + goto drop_free; + } + +drop_free: + dev_dstats_tx_dropped(ipxlat->dev); + kfree_skb(skb); + return err; +} diff --git a/drivers/net/ipxlat/dispatch.h b/drivers/net/ipxlat/dispatch.h new file mode 100644 index 000000000000..fa6fafea656b --- /dev/null +++ b/drivers/net/ipxlat/dispatch.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_DISPATCH_H_ +#define _NET_IPXLAT_DISPATCH_H_ + +#include "ipxlpriv.h" + +struct sk_buff; + +/** + * enum ipxlat_action - result of packet translation dispatch + * @IPXLAT_ACT_DROP: drop the packet + * @IPXLAT_ACT_FWD: packet translated and ready for forward reinjection + * @IPXLAT_ACT_PRE_FRAG: packet must be fragmented before 4->6 translation + * 
@IPXLAT_ACT_ICMP_ERR: drop packet and emit translator-generated ICMP er= ror + */ +enum ipxlat_action { + IPXLAT_ACT_DROP, + IPXLAT_ACT_FWD, + IPXLAT_ACT_PRE_FRAG, + IPXLAT_ACT_ICMP_ERR, +}; + +/** + * ipxlat_mark_icmp_drop - cache translator-generated ICMP action in skb cb + * @skb: packet being rejected + * @type: ICMP type to emit + * @code: ICMP code to emit + * @info: ICMP auxiliary info (pointer/MTU), host-endian + * + * This does not emit immediately; dispatch consumes the mark later and se= nds + * the ICMP error through the appropriate address family path. + */ +void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info= ); + +/** + * ipxlat_translate - validate/translate one packet and return next action + * @ipxlat: translator private context + * @skb: packet to process + * + * Return: one of &enum ipxlat_action. + */ +enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat, + struct sk_buff *skb); + +/** + * ipxlat_process_skb - top-level packet handler for ndo_start_xmit/reinje= ction + * @ipxlat: translator private context + * @skb: packet to process + * @allow_pre_frag: allow 4->6 pre-fragment action for this invocation + * + * The function always consumes @skb directly or through fragmentation + * callback/reinjection paths. + * + * Return: 0 on success, negative errno on processing failure. 
+ */ +int ipxlat_process_skb(struct ipxlat_priv *ipxlat, struct sk_buff *skb, + bool allow_pre_frag); + +#endif /* _NET_IPXLAT_DISPATCH_H_ */ diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c index 26b7f5b6ff20..a1b4bcd39478 100644 --- a/drivers/net/ipxlat/main.c +++ b/drivers/net/ipxlat/main.c @@ -15,6 +15,7 @@ =20 #include =20 +#include "dispatch.h" #include "ipxlpriv.h" #include "main.h" =20 @@ -56,8 +57,9 @@ static void ipxlat_dev_uninit(struct net_device *dev) =20 static int ipxlat_start_xmit(struct sk_buff *skb, struct net_device *dev) { - dev_dstats_tx_dropped(dev); - kfree_skb(skb); + struct ipxlat_priv *ipxlat =3D netdev_priv(dev); + + ipxlat_process_skb(ipxlat, skb, true); return NETDEV_TX_OK; } =20 diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c index b9a9af1b3adb..b37a3e55aff8 100644 --- a/drivers/net/ipxlat/packet.c +++ b/drivers/net/ipxlat/packet.c @@ -13,6 +13,7 @@ =20 #include =20 +#include "dispatch.h" #include "packet.h" =20 /* Shift cached skb cb offsets by the L3 header delta after in-place rewri= te. diff --git a/drivers/net/ipxlat/translate_46.c b/drivers/net/ipxlat/transla= te_46.c new file mode 100644 index 000000000000..aec8500db2c2 --- /dev/null +++ b/drivers/net/ipxlat/translate_46.c @@ -0,0 +1,198 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#include + +#include "address.h" +#include "packet.h" +#include "transport.h" +#include "translate_46.h" + +u8 ipxlat_46_map_proto_to_nexthdr(u8 protocol) +{ + return (protocol =3D=3D IPPROTO_ICMP) ? 
NEXTHDR_ICMP : protocol; +} + +void ipxlat_46_build_frag_hdr(struct frag_hdr *fh6, const struct iphdr *hd= r4, + u8 l4_proto) +{ + fh6->nexthdr =3D ipxlat_46_map_proto_to_nexthdr(l4_proto); + fh6->reserved =3D 0; + fh6->frag_off =3D + ipxlat_build_frag6_offset(ipxlat_get_frag4_offset(hdr4), + !!(be16_to_cpu(hdr4->frag_off) & + IP_MF)); + fh6->identification =3D cpu_to_be32(be16_to_cpu(hdr4->id)); +} + +void ipxlat_46_build_l3(struct ipv6hdr *iph6, const struct iphdr *iph4, + unsigned int payload_len, u8 nexthdr, u8 hop_limit) +{ + iph6->version =3D 6; + iph6->priority =3D iph4->tos >> 4; + iph6->flow_lbl[0] =3D (iph4->tos & 0x0F) << 4; + iph6->flow_lbl[1] =3D 0; + iph6->flow_lbl[2] =3D 0; + iph6->payload_len =3D htons(payload_len); + iph6->nexthdr =3D nexthdr; + iph6->hop_limit =3D hop_limit; +} + +/* Lookup post-translation IPv6 PMTU for 4->6 output decisions. + * Falls back to translator MTU on routing failures and clamps route MTU + * against translator egress MTU. + */ +unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat, + const struct sk_buff *skb, + const struct iphdr *in4) +{ + unsigned int mtu6, dev_mtu; + struct flowi6 fl6 =3D {}; + struct dst_entry *dst; + + dev_mtu =3D READ_ONCE(ipxlat->dev->mtu); + + ipxlat_46_convert_addr(&ipxlat->xlat_prefix6, in4->saddr, + &fl6.saddr); + ipxlat_46_convert_addr(&ipxlat->xlat_prefix6, in4->daddr, + &fl6.daddr); + fl6.flowi6_mark =3D skb->mark; + + dst =3D ip6_route_output(dev_net(ipxlat->dev), NULL, &fl6); + if (unlikely(dst->error)) { + mtu6 =3D dev_mtu; + goto out; + } + + /* Route lookup can return a very large MTU (eg, local/loopback style + * routes) that does not reflect the translator egress constraint. + * Clamp with the translator device MTU so DF decisions are stable and + * pre-fragment planning never targets packets larger than what this + * interface can hand to the next stages. 
+ */ + mtu6 =3D min_t(unsigned int, dst_mtu(dst), dev_mtu); + +out: + dst_release(dst); + return mtu6; +} + +/** + * ipxlat_46_translate - translate one validated packet from IPv4 to IPv6 + * @ipxlat: translator private context + * @skb: packet to translate + * + * Rewrites outer L3 in place, rebases cached offsets and translates L4 on + * first fragments only. + * + * Return: 0 on success, negative errno on translation failure. + */ +int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb) +{ + unsigned int min_l4_len, old_l3_len, new_l3_len; + struct ipxlat_cb *cb =3D ipxlat_skb_cb(skb); + const struct iphdr outer4 =3D *ip_hdr(skb); + const u8 in_l4_proto =3D cb->l4_proto; + bool has_frag, first_frag; + struct frag_hdr *fh6; + struct ipv6hdr *iph6; + int l3_delta, err; + u8 out_l4_proto; + + /* snapshot the original IPv4 header fields before skb layout changes */ + has_frag =3D ip_is_fragment(&outer4); + first_frag =3D ipxlat_is_first_frag4(&outer4); + out_l4_proto =3D ipxlat_46_map_proto_to_nexthdr(in_l4_proto); + + old_l3_len =3D cb->l3_hdr_len; + new_l3_len =3D sizeof(struct ipv6hdr) + + (has_frag ? 
sizeof(struct frag_hdr) : 0); + l3_delta =3D (int)new_l3_len - (int)old_l3_len; + + /* make room for the new hdrs */ + if (unlikely(skb_cow_head(skb, max_t(int, 0, l3_delta)))) + return -ENOMEM; + + /* replace outer L3 area: drop IPv4 hdr, reserve IPv6(+Frag) hdr */ + skb_pull(skb, old_l3_len); + skb_push(skb, new_l3_len); + skb_reset_network_header(skb); + skb_set_transport_header(skb, new_l3_len); + skb->protocol =3D htons(ETH_P_IPV6); + + /* build outer IPv6 base hdr from translated IPv4 fields */ + iph6 =3D ipv6_hdr(skb); + ipxlat_46_build_l3(iph6, &outer4, skb->len - sizeof(*iph6), + out_l4_proto, outer4.ttl - 1); + + /* translate IPv4 endpoints into IPv6 addresses using xlat_prefix6 */ + ipxlat_46_convert_addrs(&ipxlat->xlat_prefix6, &outer4, iph6); + + /* add IPv6 fragment hdr when the IPv4 packet carried fragmentation */ + if (unlikely(has_frag)) { + iph6->nexthdr =3D NEXTHDR_FRAGMENT; + + fh6 =3D (struct frag_hdr *)(iph6 + 1); + ipxlat_46_build_frag_hdr(fh6, &outer4, in_l4_proto); + cb->fragh_off =3D sizeof(struct ipv6hdr); + } + + /* Rebase cached offsets after L3 size delta. + * For outer 4->6 translation this should not underflow: cached offsets + * were built from l3_off + ip4_len(+...) and delta =3D ip6_len - ip4_len, + * so ip4_len cancels out after rebasing. A failure here means internal + * metadata inconsistency, not a packet validation outcome. 
+ */ + err =3D ipxlat_cb_rebase_offsets(cb, l3_delta); + if (unlikely(err)) { + DEBUG_NET_WARN_ON_ONCE(1); + return err; + } + + cb->l3_hdr_len =3D new_l3_len; + cb->l4_proto =3D out_l4_proto; + DEBUG_NET_WARN_ON_ONCE(!ipxlat_cb_offsets_valid(cb)); + + /* non-first fragments have no transport header to translate */ + if (unlikely(!first_frag)) + goto out; + + /* ensure transport bytes are writable before L4 csum/proto rewrites */ + min_l4_len =3D ipxlat_l4_min_len(in_l4_proto); + if (unlikely(skb_ensure_writable(skb, skb_transport_offset(skb) + + min_l4_len))) + return -ENOMEM; + + /* translate transport hdr and pseudohdr dependent checksums */ + switch (in_l4_proto) { + case IPPROTO_TCP: + err =3D ipxlat_46_outer_tcp(skb, &outer4); + break; + case IPPROTO_UDP: + err =3D ipxlat_46_outer_udp(skb, &outer4); + break; + case IPPROTO_ICMP: + err =3D ipxlat_46_icmp(ipxlat, skb); + break; + default: + err =3D 0; + break; + } + if (unlikely(err)) + return err; + +out: + /* normalize checksum/offload metadata for the translated frame */ + return ipxlat_finalize_offload(skb, in_l4_proto, has_frag, + SKB_GSO_TCPV4, SKB_GSO_TCPV6); +} diff --git a/drivers/net/ipxlat/translate_46.h b/drivers/net/ipxlat/transla= te_46.h new file mode 100644 index 000000000000..75def10d0cad --- /dev/null +++ b/drivers/net/ipxlat/translate_46.h @@ -0,0 +1,73 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_TRANSLATE_46_H_ +#define _NET_IPXLAT_TRANSLATE_46_H_ + +#include "ipxlpriv.h" + +struct iphdr; +struct ipv6hdr; +struct frag_hdr; +struct sk_buff; + +/** + * ipxlat_46_map_proto_to_nexthdr - map IPv4 L4 protocol to IPv6 nexthdr + * @protocol: IPv4 L4 protocol + * + * Return: IPv6 
next-header value corresponding to @protocol. + */ +u8 ipxlat_46_map_proto_to_nexthdr(u8 protocol); + +/** + * ipxlat_46_build_frag_hdr - build IPv6 Fragment Header from IPv4 fragmen= t info + * @fh6: output IPv6 fragment header + * @hdr4: source IPv4 header + * @l4_proto: original IPv4 L4 protocol + */ +void ipxlat_46_build_frag_hdr(struct frag_hdr *fh6, const struct iphdr *hd= r4, + u8 l4_proto); + +/** + * ipxlat_46_build_l3 - build translated outer IPv6 header from IPv4 metad= ata + * @iph6: output IPv6 header + * @iph4: source IPv4 header + * @payload_len: IPv6 payload length + * @nexthdr: resulting IPv6 nexthdr + * @hop_limit: resulting IPv6 hop limit + */ +void ipxlat_46_build_l3(struct ipv6hdr *iph6, const struct iphdr *iph4, + unsigned int payload_len, u8 nexthdr, u8 hop_limit); + +/** + * ipxlat_46_lookup_pmtu6 - lookup post-translation IPv6 PMTU for a 4->6 p= acket + * @ipxlat: translator private context + * @skb: packet being translated + * @in4: source IPv4 header snapshot + * + * Return: effective PMTU clamped against translator device MTU. + */ +unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat, + const struct sk_buff *skb, + const struct iphdr *in4); + +/** + * ipxlat_46_translate - translate outer packet from IPv4 to IPv6 in place + * @ipxlat: translator private context + * @skb: packet to translate + * + * Return: 0 on success, negative errno on translation failure. 
+ */ +int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb); + +#endif /* _NET_IPXLAT_TRANSLATE_46_H_ */ diff --git a/drivers/net/ipxlat/translate_64.c b/drivers/net/ipxlat/transla= te_64.c new file mode 100644 index 000000000000..50a95fb75f9d --- /dev/null +++ b/drivers/net/ipxlat/translate_64.c @@ -0,0 +1,205 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#include +#include + +#include "translate_64.h" +#include "address.h" +#include "packet.h" +#include "transport.h" + +u8 ipxlat_64_map_nexthdr_proto(u8 nexthdr) +{ + return (nexthdr =3D=3D NEXTHDR_ICMP) ? IPPROTO_ICMP : nexthdr; +} + +void ipxlat_64_build_l3(struct iphdr *iph4, const struct ipv6hdr *iph6, + unsigned int tot_len, __be16 frag_off, u8 protocol, + __be32 saddr, __be32 daddr, u8 ttl, __be16 id) +{ + iph4->version =3D 4; + iph4->ihl =3D 5; + iph4->tos =3D ipxlat_get_ipv6_tclass(iph6); + iph4->tot_len =3D cpu_to_be16(tot_len); + iph4->frag_off =3D frag_off; + iph4->ttl =3D ttl; + iph4->protocol =3D protocol; + iph4->saddr =3D saddr; + iph4->daddr =3D daddr; + iph4->id =3D id; + iph4->check =3D 0; + iph4->check =3D ip_fast_csum(iph4, iph4->ihl); +} + +static __be16 ipxlat_64_build_frag_off(const struct sk_buff *skb, + const struct frag_hdr *frag6, + u8 l4_proto) +{ + bool df, mf, over_mtu; + u16 frag_offset; + + /* preserve real IPv6 fragmentation state with a Fragment Header */ + if (frag6) { + mf =3D !!(be16_to_cpu(frag6->frag_off) & IP6_MF); + frag_offset =3D ipxlat_get_frag6_offset(frag6); + return ipxlat_build_frag4_offset(false, mf, frag_offset); + } + + /* frag_list implies segmented payload emitted as fragments */ + if (skb_has_frag_list(skb)) + return 
ipxlat_build_frag4_offset(false, false, 0); + + if (skb_is_gso(skb)) { + /* GSO frames are one datagram here; set DF only for TCP + * when later segmentation exceeds IPv6 minimum MTU + */ + df =3D (l4_proto =3D=3D IPPROTO_TCP) && + (ipxlat_skb_cb(skb)->payload_off + + skb_shinfo(skb)->gso_size > + (IPV6_MIN_MTU - sizeof(struct iphdr))); + return ipxlat_build_frag4_offset(df, false, 0); + } + + over_mtu =3D skb->len > (IPV6_MIN_MTU - sizeof(struct iphdr)); + return ipxlat_build_frag4_offset(over_mtu, false, 0); +} + +/** + * ipxlat_64_translate - translate one validated packet from IPv6 to IPv4 + * @ipxlat: translator private context + * @skb: packet to translate + * + * Rewrites outer L3 in place, rebases cached offsets and translates L4 on + * first fragments only. + * + * Return: 0 on success, negative errno on translation failure. + */ +int ipxlat_64_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb) +{ + unsigned int min_l4_len, old_l3_len, new_l3_len; + struct ipxlat_cb *cb =3D ipxlat_skb_cb(skb); + struct ipv6hdr outer6 =3D *ipv6_hdr(skb); + bool is_icmp_err, has_frag, first_frag; + u8 in_l4_proto, out_l4_proto; + struct frag_hdr frag_copy; + struct frag_hdr *frag6; + __be32 saddr, daddr; + __be16 frag_off, id; + struct iphdr *iph4; + int l3_delta, err; + + /* snapshot original outer IPv6 fields before L3 rewrite */ + frag6 =3D cb->fragh_off ? (struct frag_hdr *)(skb->data + cb->fragh_off) : + NULL; + has_frag =3D !!frag6; + in_l4_proto =3D cb->l4_proto; + is_icmp_err =3D cb->is_icmp_err; + out_l4_proto =3D ipxlat_64_map_nexthdr_proto(in_l4_proto); + + old_l3_len =3D cb->l3_hdr_len; + new_l3_len =3D sizeof(struct iphdr); + l3_delta =3D (int)new_l3_len - (int)old_l3_len; + + if (unlikely(has_frag)) + frag_copy =3D *frag6; + first_frag =3D ipxlat_is_first_frag6(has_frag ? 
&frag_copy : NULL); + + if (unlikely(is_icmp_err)) { + if (unlikely(in_l4_proto !=3D NEXTHDR_ICMP)) + return -EINVAL; + } + + /* derive translated IPv4 endpoints */ + err =3D ipxlat_64_convert_addrs(&ipxlat->xlat_prefix6, &outer6, + is_icmp_err, &saddr, &daddr); + if (unlikely(err)) + return err; + + /* replace outer IPv6 hdr with IPv4 hdr in-place */ + skb_pull(skb, old_l3_len); + skb_push(skb, new_l3_len); + skb_reset_network_header(skb); + skb_set_transport_header(skb, new_l3_len); + skb->protocol =3D htons(ETH_P_IP); + + /* Rebase cached offsets after L3 size delta. + * For outer 6->4 translation this should not underflow: cached offsets + * were built from l3_off + ip6_len (+ ...), and + * delta =3D sizeof(struct iphdr) - ip6_len, so ip6_len cancels out after + * rebasing. A failure here means internal metadata inconsistency, not + * a packet validation outcome. + */ + err =3D ipxlat_cb_rebase_offsets(cb, l3_delta); + if (unlikely(err)) { + DEBUG_NET_WARN_ON_ONCE(1); + return err; + } + + cb->l3_hdr_len =3D sizeof(struct iphdr); + cb->fragh_off =3D 0; + cb->l4_proto =3D out_l4_proto; + DEBUG_NET_WARN_ON_ONCE(!ipxlat_cb_offsets_valid(cb)); + + /* build outer IPv4 base hdr from translated IPv6 fields */ + iph4 =3D ip_hdr(skb); + frag_off =3D ipxlat_64_build_frag_off(skb, has_frag ? &frag_copy : NULL, + out_l4_proto); + /* when source had Fragment Header we preserve its identification; + * otherwise allocate a fresh IPv4 ID for the translated packet + */ + id =3D has_frag ? 
cpu_to_be16(be32_to_cpu(frag_copy.identification)) : 0; + ipxlat_64_build_l3(iph4, &outer6, skb->len, frag_off, + out_l4_proto, saddr, daddr, + outer6.hop_limit - 1, id); + + if (likely(!has_frag)) { + iph4->id =3D 0; + __ip_select_ident(dev_net(ipxlat->dev), iph4, 1); + iph4->check =3D 0; + iph4->check =3D ip_fast_csum(iph4, iph4->ihl); + } + + /* non-first fragments have no transport header to translate */ + if (unlikely(!first_frag)) + goto out; + + /* ensure transport bytes are writable before L4 csum/proto rewrites */ + min_l4_len =3D ipxlat_l4_min_len(out_l4_proto); + if (unlikely(skb_ensure_writable(skb, skb_transport_offset(skb) + + min_l4_len))) + return -ENOMEM; + + /* translate transport hdr and pseudohdr dependent checksums */ + switch (out_l4_proto) { + case IPPROTO_TCP: + err =3D ipxlat_64_outer_tcp(skb, &outer6); + break; + case IPPROTO_UDP: + err =3D ipxlat_64_outer_udp(skb, &outer6); + break; + case IPPROTO_ICMP: + err =3D ipxlat_64_icmp(ipxlat, skb, &outer6); + break; + default: + err =3D 0; + break; + } + if (unlikely(err)) + return err; + +out: + /* normalize checksum/offload metadata for the translated frame */ + return ipxlat_finalize_offload(skb, out_l4_proto, ip_is_fragment(iph4), + SKB_GSO_TCPV6, SKB_GSO_TCPV4); +} diff --git a/drivers/net/ipxlat/translate_64.h b/drivers/net/ipxlat/transla= te_64.h new file mode 100644 index 000000000000..269d1955944f --- /dev/null +++ b/drivers/net/ipxlat/translate_64.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2024- Alberto Leiva Popper + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Alberto Leiva Popper + * Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_TRANSLATE_64_H_ +#define _NET_IPXLAT_TRANSLATE_64_H_ + +#include "ipxlpriv.h" + +struct sk_buff; +struct iphdr; +struct ipv6hdr; + +/** + * ipxlat_64_build_l3 - 
build translated outer IPv4 header from IPv6 metad= ata + * @iph4: output IPv4 header + * @iph6: source IPv6 header + * @tot_len: resulting IPv4 total length + * @frag_off: resulting IPv4 fragment offset/flags + * @protocol: resulting IPv4 L4 protocol + * @saddr: resulting IPv4 source address + * @daddr: resulting IPv4 destination address + * @ttl: resulting IPv4 TTL + * @id: resulting IPv4 identification field + */ +void ipxlat_64_build_l3(struct iphdr *iph4, const struct ipv6hdr *iph6, + unsigned int tot_len, __be16 frag_off, u8 protocol, + __be32 saddr, __be32 daddr, u8 ttl, __be16 id); + +/** + * ipxlat_64_translate - translate outer packet from IPv6 to IPv4 in place + * @ipxlat: translator private context + * @skb: packet to translate + * + * Return: 0 on success, negative errno on translation failure. + */ +int ipxlat_64_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb); + +/** + * ipxlat_64_map_nexthdr_proto - map IPv6 nexthdr to IPv4 L4 protocol + * @nexthdr: IPv6 next-header value + * + * Return: IPv4 protocol value corresponding to @nexthdr. 
+ */ +u8 ipxlat_64_map_nexthdr_proto(u8 nexthdr); + +#endif /* _NET_IPXLAT_TRANSLATE_64_H_ */ diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c index 3aa00c635916..78548d0b8c22 100644 --- a/drivers/net/ipxlat/transport.c +++ b/drivers/net/ipxlat/transport.c @@ -338,3 +338,14 @@ int ipxlat_64_inner_udp(struct sk_buff *skb, const str= uct ipv6hdr *in6, udp_new->check =3D CSUM_MANGLED_0; return 0; } + +int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb) +{ + return -EPROTONOSUPPORT; +} + +int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb, + const struct ipv6hdr *outer6) +{ + return -EPROTONOSUPPORT; +} diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h index 9b6fe422b01f..0e69b98eafd0 100644 --- a/drivers/net/ipxlat/transport.h +++ b/drivers/net/ipxlat/transport.h @@ -100,4 +100,9 @@ int ipxlat_64_inner_tcp(struct sk_buff *skb, const stru= ct ipv6hdr *in6, int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6, const struct iphdr *out4, struct udphdr *udp_new); =20 +/* temporary ICMP stubs until ICMP translation support is introduced */ +int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb); +int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb, + const struct ipv6hdr *outer6); + #endif /* _NET_IPXLAT_TRANSPORT_H_ */ --=20 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-201.mailbox.org (mout-b-201.mailbox.org [195.10.208.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 858113E556B; Thu, 19 Mar 2026 15:20:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.61 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933631; cv=none; 
b=muisLNLfLkgGSC1/yfE13o7sWxUWImFkbzjqLifRQBVwMZj6ukpLLGQx0Yf/NDDVMjEVaGfsjlQ2bFpFAUylfWhfpWna8vG0Y/as4QXRQp47S7ofiES2le5gbwOwh7woaravIwnJAAt2e4JdBjPH+qC1jPczxu2Gngdpug6nj8c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933631; c=relaxed/simple; bh=HFsB2eky2Tk0cvZa+RJn225M3EBHBhiR9X02au8fvgE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=eN/CTLTSkDNl+SL1XaL31lUnSJkBJWZB5wuKerA67rKrOjrJSsEozRVc7/v9PZZT3YZVJZQ4JlXwfTr66iecn1n6yzEsCb7h6q5yeZ31hlJbW7SEsQ+MQout+Fl7Emx+Yt1r5xXlihWZ+c6h9EV290GuMl/fBgLmARmQQdgm/Kc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=hJFg6k/E; arc=none smtp.client-ip=195.10.208.61 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="hJFg6k/E" Received: from smtp102.mailbox.org (smtp102.mailbox.org [IPv6:2001:67c:2050:b231:465::102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-201.mailbox.org (Postfix) with ESMTPS id 4fc8NG0dZrzDs2P; Thu, 19 Mar 2026 16:13:34 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933214; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lZmf3XMXGG3SmE3J2wOJSJ7B3hQ2E0d5WPu7AoMMTbo=; 
b=hJFg6k/EGLrXPGrEpxuSoGOJxC3USyR+lkRouQbilasWQzkvxxIVqHc0m8Xhf+LGDo+nrJ Cno4OLiZL/z47DJ5WDSAXAmmvdgy0+94kQZUOoNUgr+9EjdAx/NEMexUPiDwqIWi8SvtCS vBvVxg2O9PLnOzZ+2nc8/bnNs4r23RUsaenZKJGiCT8YS1YUrb2TF9IdH/pyVjRCKiJOPX UKdd342kqV8nK9tG/UAj9xPjxSYIcElK3HCLW4v07IPjLopAaONxiV0JVew+B9k7DAyFWC xBxwycdt8WcmGKWS59GGtAU3aZ/WEzyvF53hZZpyqA+zqYfOC04buZjA0mSIyA== Authentication-Results: outgoing_mbo_mout; dkim=none; spf=pass (outgoing_mbo_mout: domain of ralf@mandelbit.com designates 2001:67c:2050:b231:465::102 as permitted sender) smtp.mailfrom=ralf@mandelbit.com From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 09/15] ipxlat: emit translator-generated ICMP errors on drop Date: Thu, 19 Mar 2026 16:12:18 +0100 Message-ID: <20260319151230.655687-10-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4fc8NG0dZrzDs2P Content-Type: text/plain; charset="utf-8" When validation or policy requires dropping a packet and generating an ICMP error, route that failure through explicit ICMP emission paths so the sender can be notified where appropriate. This commit adds translator-originated error generation for both directions and integrates it into dispatch action handling without changing normal forwarding behavior. 
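As a rough userspace illustration only (not part of this patch), the mark-then-resolve flow described above can be sketched as follows. All model_* names are hypothetical stand-ins for the driver's control block and action enum, and the sketch assumes that marking a drop also sets the emit_icmp_err flag that the resolver reads. The values 11/0 are the ICMPv4 Time Exceeded type and TTL-exceeded code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's action enum and skb metadata. */
enum model_action {
	MODEL_ACT_FWD,
	MODEL_ACT_DROP,
	MODEL_ACT_ICMP_ERR,
};

struct model_cb {
	bool emit_icmp_err;	/* set when a drop should notify the sender */
	struct {
		uint8_t type;
		uint8_t code;
		uint32_t info;
	} icmp_err;
};

/* Cache the ICMP error to emit later (assumed role of ipxlat_mark_icmp_drop). */
static void model_mark_icmp_drop(struct model_cb *cb, uint8_t type,
				 uint8_t code, uint32_t info)
{
	assert(cb);
	cb->emit_icmp_err = true;
	cb->icmp_err.type = type;
	cb->icmp_err.code = code;
	cb->icmp_err.info = info;
}

/* Silent rejection: a too-short packet is dropped without marking. */
static int model_validate_len(const struct model_cb *cb, unsigned int len)
{
	(void)cb;
	return len < 20 ? -1 : 0;
}

/* Notifying rejection: expired TTL marks Time Exceeded (type 11, code 0). */
static int model_validate_ttl(struct model_cb *cb, uint8_t ttl)
{
	if (ttl <= 1) {
		model_mark_icmp_drop(cb, 11, 0, 0);
		return -1;
	}
	return 0;
}

/* Map a validation failure to the dispatch action. */
static enum model_action model_resolve_failed_action(const struct model_cb *cb)
{
	return cb->emit_icmp_err ? MODEL_ACT_ICMP_ERR : MODEL_ACT_DROP;
}

static enum model_action model_translate(struct model_cb *cb, unsigned int len,
					 uint8_t ttl)
{
	if (model_validate_len(cb, len) || model_validate_ttl(cb, ttl))
		return model_resolve_failed_action(cb);
	return MODEL_ACT_FWD;
}
```

The point of the split is that validation code only records what should be reported, while the dispatcher alone decides whether a failure becomes a silent drop or an ICMP emission, so the forwarding path is untouched.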
Signed-off-by: Ralf Lici --- drivers/net/ipxlat/dispatch.c | 66 ++++++++++++++++++++++++++++++++++- drivers/net/ipxlat/dispatch.h | 7 ++++ drivers/net/ipxlat/packet.c | 25 ++++++++++--- 3 files changed, 92 insertions(+), 6 deletions(-) diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c index 133d30859f49..b8b9b930b04c 100644 --- a/drivers/net/ipxlat/dispatch.c +++ b/drivers/net/ipxlat/dispatch.c @@ -11,7 +11,12 @@ * Ralf Lici */ =20 +#include +#include +#include #include +#include +#include =20 #include "dispatch.h" #include "packet.h" @@ -21,7 +26,8 @@ static enum ipxlat_action ipxlat_resolve_failed_action(const struct sk_buff *skb) { - return IPXLAT_ACT_DROP; + return ipxlat_skb_cb(skb)->emit_icmp_err ? IPXLAT_ACT_ICMP_ERR : + IPXLAT_ACT_DROP; } =20 enum ipxlat_action ipxlat_translate(struct ipxlat_priv *ipxlat, @@ -61,6 +67,59 @@ void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type,= u8 code, u32 info) cb->icmp_err.info =3D info; } =20 +static void ipxlat_46_emit_icmp_err(struct ipxlat_priv *ipxlat, + struct sk_buff *inner) +{ + struct ipxlat_cb *cb =3D ipxlat_skb_cb(inner); + const struct iphdr *iph =3D ip_hdr(inner); + struct inet_skb_parm param =3D {}; + + /* build route metadata on demand when the packet has no dst */ + if (unlikely(!skb_dst(inner))) { + const int reason =3D ip_route_input_noref(inner, iph->daddr, + iph->saddr, + ip4h_dscp(iph), + inner->dev); + + if (unlikely(reason)) { + netdev_dbg(ipxlat->dev, + "icmp4 emit: route build failed reason=3D%d\n", + reason); + return; + } + } + + /* emit the ICMPv4 error */ + __icmp_send(inner, cb->icmp_err.type, cb->icmp_err.code, + htonl(cb->icmp_err.info), &param); +} + +static void ipxlat_64_emit_icmp_err(struct sk_buff *inner) +{ + struct ipxlat_cb *cb =3D ipxlat_skb_cb(inner); + struct inet6_skb_parm param =3D {}; + + /* emit the ICMPv6 error */ + icmp6_send(inner, cb->icmp_err.type, cb->icmp_err.code, + cb->icmp_err.info, NULL, &param); +} + +/* emit translator-generated ICMP errors 
for packets rejected by RFC rules= */ +void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *in= ner) +{ + switch (ntohs(inner->protocol)) { + case ETH_P_IPV6: + ipxlat_64_emit_icmp_err(inner); + return; + case ETH_P_IP: + ipxlat_46_emit_icmp_err(ipxlat, inner); + return; + default: + DEBUG_NET_WARN_ON_ONCE(1); + return; + } +} + static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff = *skb) { const unsigned int len =3D skb->len; @@ -90,6 +149,11 @@ int ipxlat_process_skb(struct ipxlat_priv *ipxlat, stru= ct sk_buff *skb, dev_dstats_tx_add(ipxlat->dev, skb->len); ipxlat_forward_pkt(ipxlat, skb); return 0; + case IPXLAT_ACT_ICMP_ERR: + dev_dstats_tx_dropped(ipxlat->dev); + ipxlat_emit_icmp_error(ipxlat, skb); + consume_skb(skb); + return 0; case IPXLAT_ACT_DROP: goto drop_free; default: diff --git a/drivers/net/ipxlat/dispatch.h b/drivers/net/ipxlat/dispatch.h index fa6fafea656b..73acd831b6cf 100644 --- a/drivers/net/ipxlat/dispatch.h +++ b/drivers/net/ipxlat/dispatch.h @@ -44,6 +44,13 @@ enum ipxlat_action { */ void ipxlat_mark_icmp_drop(struct sk_buff *skb, u8 type, u8 code, u32 info= ); =20 +/** + * ipxlat_emit_icmp_error - emit cached translator-generated ICMP error + * @ipxlat: translator private context + * @inner: offending packet used as quoted payload + */ +void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff *in= ner); + /** * ipxlat_translate - validate/translate one packet and return next action * @ipxlat: translator private context diff --git a/drivers/net/ipxlat/packet.c b/drivers/net/ipxlat/packet.c index b37a3e55aff8..758b72bdc6f1 100644 --- a/drivers/net/ipxlat/packet.c +++ b/drivers/net/ipxlat/packet.c @@ -142,6 +142,8 @@ static int ipxlat_v4_srr_check(struct sk_buff *skb, con= st struct iphdr *hdr) if (unlikely(ptr > len - 3)) return -EINVAL; =20 + ipxlat_mark_icmp_drop(skb, ICMP_DEST_UNREACH, + ICMP_SR_FAILED, 0); return -EINVAL; } =20 @@ -272,8 +274,10 @@ static int 
ipxlat_v4_pull_hdrs(struct sk_buff *skb) /* RFC 7915 Section 4.1 */ if (unlikely(ipxlat_v4_srr_check(skb, l3_hdr))) return -EINVAL; - if (unlikely(l3_hdr->ttl <=3D 1)) + if (unlikely(l3_hdr->ttl <=3D 1)) { + ipxlat_mark_icmp_drop(skb, ICMP_TIME_EXCEEDED, ICMP_EXC_TTL, 0); return -EINVAL; + } =20 /* RFC 7915 Section 1.2: * Fragmented ICMP/ICMPv6 packets will not be translated by IP/ICMP @@ -390,8 +394,11 @@ int ipxlat_v4_validate_skb(struct ipxlat_priv *ipxlat,= struct sk_buff *skb) * Fragmented checksum-less IPv4 UDP is rejected because 4->6 cannot * reliably translate it. */ - if (unlikely(ip_is_fragment(l3_hdr))) + if (unlikely(ip_is_fragment(l3_hdr))) { + ipxlat_mark_icmp_drop(skb, ICMP_DEST_UNREACH, ICMP_PKT_FILTERED, + 0); return -EINVAL; + } =20 /* udph->len bounds the span used to compute replacement checksum */ if (unlikely(ntohs(udph->len) > skb->len - cb->l4_off)) @@ -520,7 +527,7 @@ static int ipxlat_v6_walk_hdrs(struct sk_buff *skb, uns= igned int l3_offset, */ static int ipxlat_v6_check_rh(struct sk_buff *skb) { - unsigned int rh_off; + unsigned int rh_off, pointer; int flags, nexthdr; =20 rh_off =3D 0; @@ -531,6 +538,8 @@ static int ipxlat_v6_check_rh(struct sk_buff *skb) if (likely(nexthdr !=3D NEXTHDR_ROUTING)) return 0; =20 + pointer =3D rh_off + offsetof(struct ipv6_rt_hdr, segments_left); + ipxlat_mark_icmp_drop(skb, ICMPV6_PARAMPROB, ICMPV6_HDR_FIELD, pointer); return -EINVAL; } =20 @@ -550,8 +559,11 @@ static int ipxlat_v6_pull_outer_l3(struct sk_buff *skb) !ipxlat_v6_validate_saddr(&l3_hdr->saddr))) return -EINVAL; =20 - if (unlikely(l3_hdr->hop_limit <=3D 1)) + if (unlikely(l3_hdr->hop_limit <=3D 1)) { + ipxlat_mark_icmp_drop(skb, ICMPV6_TIME_EXCEED, + ICMPV6_EXC_HOPLIMIT, 0); return -EINVAL; + } =20 return 0; } @@ -617,8 +629,11 @@ static int ipxlat_v6_pull_hdrs(struct sk_buff *skb) /* -EPROTONOSUPPORT means packet layout is syntactically valid but * unsupported by our RFC 7915 path */ - if (unlikely(err =3D=3D -EPROTONOSUPPORT)) + if 
(unlikely(err =3D=3D -EPROTONOSUPPORT)) { + ipxlat_mark_icmp_drop(skb, ICMPV6_DEST_UNREACH, + ICMPV6_ADM_PROHIBITED, 0); return -EINVAL; + } if (unlikely(err)) return err; =20 --=20 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-110.mailbox.org (mout-b-110.mailbox.org [195.10.208.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C095A3DEFEE; Thu, 19 Mar 2026 15:18:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933538; cv=none; b=BWA9HOFj914CWY+Ywpb9RjSgb/R5xIvzTr0hKKjqI6aVsvFsCHL4dXWVmtaz95/A9A6YWCuHGDlrpFU0ctPfoZX+38F5f2OhEyoVgU8z3fdwlWIBwSJv718P56nqQqFsUuhhfzFe1+S2JdCpXoDSjJEJO48aP2BLgeV0KkhKck0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933538; c=relaxed/simple; bh=p/IricehhSgBDg3JPp7uZ1Tlpjhpc28qmp7zqTB4g9k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JldgbUeRmwi8jZGb4o1u9JXatNER3gx5UkrG5Q2fCbtu5D+EMrPMB6f5Za00jsvWTYn5tZUfc0j9O7GruiQrU1e17zkzYIlpAWzurJqdoS1ERE46bkzLX6vLX6awW5P2oVM+7iRcwWVH7RK0PDEPZKaHiW/G5Y8yizUWBDwA84k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=X9fxtJcq; arc=none smtp.client-ip=195.10.208.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="X9fxtJcq" Received: from smtp102.mailbox.org (smtp102.mailbox.org [IPv6:2001:67c:2050:b231:465::102]) (using 
TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-110.mailbox.org (Postfix) with ESMTPS id 4fc8NM4m33zB0t7; Thu, 19 Mar 2026 16:13:39 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933219; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sMX1i4BMieIT4ri4r8wi6e4mrWEXh5MS4f00MKXKxeI=; b=X9fxtJcqIBd3hDAWB5sbebpd1+OTNBnSz9INQRX8nStPA73NSRT36WjOgU8Cp6w/X0WamF g9qy/oZ7EbeqRico7oD8ZDBeWR+9dj+Nfpc1e4iwSNLlIgbfuqARQozuGRtG8Bh1nU9uIv t5qywwn7o/4VQpi91JRIAZvGN5xOjVhT/OdfamaV5OR5qJeu7OGRgQkxIP2pZWD6qpOr38 R/stHfntR6eA+Yq+KSgch+qmlIMvJALkKcqc/ER5TZ+d0sGrgxbNIaIQyZ2hPzJTWk2hIn KgmaUs2qae0eaaR9xY7/EV2h/KSso4e5BUjM1ijdFQKkG3GTqdQFwilnB9ZR3g== Authentication-Results: outgoing_mbo_mout; dkim=none; spf=pass (outgoing_mbo_mout: domain of ralf@mandelbit.com designates 2001:67c:2050:b231:465::102 as permitted sender) smtp.mailfrom=ralf@mandelbit.com From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org Subject: [RFC net-next 10/15] ipxlat: add 4to6 pre-fragmentation path Date: Thu, 19 Mar 2026 16:12:19 +0100 Message-ID: <20260319151230.655687-11-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4fc8NM4m33zB0t7 Content-Type: text/plain; charset="utf-8" RFC 7915 requires handling packets that would exceed the translated IPv6 size constraints. Add a pre-fragmentation planning/action path that invokes kernel fragmentation helpers before translation, carries fragment size through skb metadata, and then reinjects fragments into the normal translation path. Signed-off-by: Ralf Lici --- drivers/net/ipxlat/dispatch.c | 99 ++++++++++++++++++++++++++++++- drivers/net/ipxlat/translate_46.c | 59 +++++++++++++++++- drivers/net/ipxlat/translate_46.h | 11 ++++ 3 files changed, 166 insertions(+), 3 deletions(-) diff --git a/drivers/net/ipxlat/dispatch.c b/drivers/net/ipxlat/dispatch.c index b8b9b930b04c..b58191d4b2c9 100644 --- a/drivers/net/ipxlat/dispatch.c +++ b/drivers/net/ipxlat/dispatch.c @@ -47,6 +47,16 @@ enum ipxlat_action ipxlat_translate(struct ipxlat_priv *= ipxlat, if (unlikely(ipxlat_v4_validate_skb(ipxlat, skb))) return ipxlat_resolve_failed_action(skb); =20 + /* 4->6 prefrag plan stores per-skb frag_max_size + * when the packet must be split before translation + * (DF clear and translated size + * above PMTU/threshold). 
+ */ + if (unlikely(ipxlat_46_plan_prefrag(ipxlat, skb))) + return ipxlat_resolve_failed_action(skb); + if (unlikely(ipxlat_skb_cb(skb)->frag_max_size)) + return IPXLAT_ACT_PRE_FRAG; + if (unlikely(ipxlat_46_translate(ipxlat, skb))) return ipxlat_resolve_failed_action(skb); =20 @@ -120,6 +130,76 @@ void ipxlat_emit_icmp_error(struct ipxlat_priv *ipxlat= , struct sk_buff *inner) } } =20 +static unsigned int ipxlat_frag_dst_get_mtu(const struct dst_entry *dst) +{ + return READ_ONCE(dst->dev->mtu); +} + +static struct dst_ops ipxlat_frag_dst_ops =3D { + .family =3D AF_UNSPEC, + .mtu =3D ipxlat_frag_dst_get_mtu, +}; + +/** + * ipxlat_46_frag_output - reinject one fragment produced by ip_do_fragment + * @net: network namespace of the transmitter + * @sk: originating socket + * @skb: fragment to reinject + * + * This callback mirrors ndo_start_xmit processing but runs with + * pre-fragmentation disabled to prevent recursive pre-fragment loops. + * + * Return: 0 on success, negative errno on processing failure. + */ +static int ipxlat_46_frag_output(struct net *net, struct sock *sk, + struct sk_buff *skb) +{ + struct ipxlat_priv *ipxlat =3D netdev_priv(skb->dev); + + return ipxlat_process_skb(ipxlat, skb, false); +} + +/** + * ipxlat_46_fragment_pkt - fragment oversized 4->6 input before translati= on + * @ipxlat: translator private context + * @skb: original packet to fragment + * @frag_max_size: per-fragment payload cap for ip_do_fragment + * + * Installs a temporary synthetic dst so ip_do_fragment can read MTU and t= hen + * reinjects each produced fragment back into ipxlat through + * ipxlat_46_frag_output. + * + * Return: 0 on success, negative errno on fragmentation failure. 
+ */ +static int ipxlat_46_fragment_pkt(struct ipxlat_priv *ipxlat, + struct sk_buff *skb, u16 frag_max_size) +{ + const unsigned long orig_dst =3D skb->_skb_refdst; + struct rtable ipxlat_rt =3D {}; + int err; + + /* ip_do_fragment needs a dst object to query mtu */ + dst_init(&ipxlat_rt.dst, &ipxlat_frag_dst_ops, NULL, DST_OBSOLETE_NONE, + DST_NOCOUNT); + + /* use translator netdev as mtu source for the temporary dst */ + ipxlat_rt.dst.dev =3D ipxlat->dev; + + /* setup the skb for fragmentation */ + skb_dst_set_noref(skb, &ipxlat_rt.dst); + memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); + IPCB(skb)->frag_max_size =3D frag_max_size; + + /* fragment and reinject each frag in the translator */ + err =3D ip_do_fragment(dev_net(ipxlat->dev), skb->sk, skb, + ipxlat_46_frag_output); + + /* drop original dst ref replaced by the synthetic NOREF dst */ + refdst_drop(orig_dst); + + return err; +} + static void ipxlat_forward_pkt(struct ipxlat_priv *ipxlat, struct sk_buff = *skb) { const unsigned int len =3D skb->len; @@ -141,14 +221,29 @@ int ipxlat_process_skb(struct ipxlat_priv *ipxlat, st= ruct sk_buff *skb, enum ipxlat_action action; int err =3D -EINVAL; =20 - (void)allow_pre_frag; - action =3D ipxlat_translate(ipxlat, skb); switch (action) { case IPXLAT_ACT_FWD: dev_dstats_tx_add(ipxlat->dev, skb->len); ipxlat_forward_pkt(ipxlat, skb); return 0; + case IPXLAT_ACT_PRE_FRAG: + /* prefrag is allowed only once to avoid unbounded loops */ + if (unlikely(!allow_pre_frag)) { + err =3D -ELOOP; + goto drop_free; + } + + /* fragment first, then reinject each fragment through + * ipxlat_process_skb via ipxlat_46_frag_output + */ + err =3D ipxlat_46_fragment_pkt(ipxlat, skb, + ipxlat_skb_cb(skb)->frag_max_size); + /* fragment path already consumed/freed skb */ + skb =3D NULL; + if (unlikely(err)) + goto drop_free; + return 0; case IPXLAT_ACT_ICMP_ERR: dev_dstats_tx_dropped(ipxlat->dev); ipxlat_emit_icmp_error(ipxlat, skb); diff --git a/drivers/net/ipxlat/translate_46.c 
b/drivers/net/ipxlat/translate_46.c
index aec8500db2c2..0b79ca07c771 100644
--- a/drivers/net/ipxlat/translate_46.c
+++ b/drivers/net/ipxlat/translate_46.c
@@ -87,6 +87,63 @@ unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
 	return mtu6;
 }
 
+/**
+ * ipxlat_46_plan_prefrag - plan pre-translation IPv4 fragmentation for 4->6
+ * @ipxlat: translator private context
+ * @skb: packet being translated
+ *
+ * Decides whether packet exceeds PMTU/LIM thresholds and, when needed, stores
+ * per-skb fragmentation cap in cb->frag_max_size for later ip_do_fragment.
+ *
+ * Return: 0 on success, negative errno on policy/validation failure.
+ */
+int ipxlat_46_plan_prefrag(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
+{
+	unsigned int pkt_len6, pmtu6, threshold6, frag_max_size, pkt_len4,
+		old_l3_len, new_l3_len;
+	struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+	const struct iphdr *in4 = ip_hdr(skb);
+	int l3_delta, frag_l3_delta;
+
+	if (unlikely(cb->frag_max_size)) {
+		DEBUG_NET_WARN_ON_ONCE(1);
+		cb->frag_max_size = 0;
+	}
+
+	pkt_len4 = iph_totlen(skb, in4);
+	old_l3_len = cb->l3_hdr_len;
+	new_l3_len = sizeof(struct ipv6hdr) +
+		     (ip_is_fragment(in4) ? sizeof(struct frag_hdr) : 0);
+	l3_delta = (int)new_l3_len - (int)old_l3_len;
+	pkt_len6 = pkt_len4 + l3_delta;
+
+	pmtu6 = ipxlat_46_lookup_pmtu6(ipxlat, skb, in4);
+	threshold6 = min(pmtu6, READ_ONCE(ipxlat->lowest_ipv6_mtu));
+
+	if (likely(pkt_len6 <= threshold6))
+		return 0;
+
+	/* df packets are never locally pre-fragmented */
+	if (likely(be16_to_cpu(in4->frag_off) & IP_DF)) {
+		/* Let the IPv6 forwarding path raise PTB when needed and rely
+		 * on the reverse 6->4 ICMP translation path for feedback.
+		 */
+		return 0;
+	}
+
+	/* df not set: we can fragment */
+
+	frag_l3_delta =
+		(int)(sizeof(struct ipv6hdr) + sizeof(struct frag_hdr)) -
+		(int)old_l3_len;
+	frag_max_size = threshold6 - frag_l3_delta;
+	/* store per-skb prefrag cap: ipxlat_46_fragment_pkt will copy it into
+	 * IPCB(skb)->frag_max_size before calling ip_do_fragment
+	 */
+	cb->frag_max_size = min_t(unsigned int, frag_max_size, IP_MAX_MTU);
+	return 0;
+}
+
 /**
  * ipxlat_46_translate - translate one validated packet from IPv4 to IPv6
  * @ipxlat: translator private context
@@ -182,7 +239,7 @@ int ipxlat_46_translate(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
 		err = ipxlat_46_outer_udp(skb, &outer4);
 		break;
 	case IPPROTO_ICMP:
-		err = ipxlat_46_icmp(ipxlat, skb);
+		err = -EPROTONOSUPPORT;
 		break;
 	default:
 		err = 0;
diff --git a/drivers/net/ipxlat/translate_46.h b/drivers/net/ipxlat/translate_46.h
index 75def10d0cad..6ba409c94185 100644
--- a/drivers/net/ipxlat/translate_46.h
+++ b/drivers/net/ipxlat/translate_46.h
@@ -61,6 +61,17 @@ unsigned int ipxlat_46_lookup_pmtu6(struct ipxlat_priv *ipxlat,
 				    const struct sk_buff *skb,
 				    const struct iphdr *in4);
 
+/**
+ * ipxlat_46_plan_prefrag - decide whether IPv4 packet must be pre-fragmented
+ * @ipxlat: translator private context
+ * @skb: packet being translated
+ *
+ * Sets cb->frag_max_size when pre-fragmentation is required.
+ *
+ * Return: 0 on success, negative errno on policy/validation failure.
+ */
+int ipxlat_46_plan_prefrag(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
+
 /**
  * ipxlat_46_translate - translate outer packet from IPv4 to IPv6 in place
  * @ipxlat: translator private context
-- 
2.53.0

From nobody Mon Apr 6 10:42:06 2026
Received: from mout-b-112.mailbox.org (mout-b-112.mailbox.org [195.10.208.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 682B272622; Thu, 19 Mar 2026 15:13:48 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.42
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933231; cv=none; b=P2jZOjVYw3uS8JpwMVsn0Hk7caLGb3p0nMRAqBacojDQYSyVkpQPGcx83yTQFvm2oU900I0zshhCnPTLmXNYnoqiOt8hJuFz5f8L+//tOzPWICPRaAoibvdVl9ryuHSWXqiN9WTnBTvbgLjAlRBA7qHB6blP3y7yYWlVJZYIvWM=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933231; c=relaxed/simple; bh=YdGU6iQSB6J1VJnMwbYn+xQQAwK+qB4z+3mEGB8ehFU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=oWRiuTgk7OFT5iA6dxlPa9q4VOqt7QBVlYbLlsuvFMWJ/d1gM3Gw7EjAKujJgRoRIi17AEPwOlgXCHBQ9zNtqtBEuBbsTPorMQGxH5xXXAFzocqZV0f0LbvqRW0kr+SIDUKKzt/V6+2zHkCAV0Qv+uHldk1+kliOdY53ACZKot0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=CcIZynlt; arc=none smtp.client-ip=195.10.208.42
Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="CcIZynlt"
Received: from smtp102.mailbox.org (smtp102.mailbox.org
[IPv6:2001:67c:2050:b231:465::102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-112.mailbox.org (Postfix) with ESMTPS id 4fc8NT3JspzDvP8; Thu, 19 Mar 2026 16:13:45 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933225; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=j3X36v8Qu47aU8G+k1e+nD4qx4C1jckCtR5SYpVmAAM=; b=CcIZynlt99MXoP0o+PubWaZlzVOHM49Z4XxdmiYG/BI1AbRk28ZGfNZU+l0itBkqEJFGHN 6Qld/EIR3gIlXe56Ad7BU7RrQxEyh93Yur1wtvFEhzaXKb5R9m3V9Y02Yk120ft2AAsgd9 4CpYpoasUO9T2VUMLwFL1CfrUT3pwBGn/kHckAzvUXQ664Wf6z8mXDZECJzv7Vvhqd1Dxb /c8SUO98THmUzvLoZfUAIEasFswv0L1jC3OQKJzIIGYxvPzaWljCxJuAb5/cnBTeuJPNSc D8tcUjjxON5/TpFFut/v98UVcEuzdF+i0Kg3qA7yzx1wbDnPpKxUCdE2ygIovQ== Authentication-Results: outgoing_mbo_mout; dkim=none; spf=pass (outgoing_mbo_mout: domain of ralf@mandelbit.com designates 2001:67c:2050:b231:465::102 as permitted sender) smtp.mailfrom=ralf@mandelbit.com From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org
Subject: [RFC net-next 11/15] ipxlat: add ICMP informational translation paths
Date: Thu, 19 Mar 2026 16:12:20 +0100
Message-ID: <20260319151230.655687-12-ralf@mandelbit.com>
In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com>
References: <20260319151230.655687-1-ralf@mandelbit.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Queue-Id: 4fc8NT3JspzDvP8

Add ICMP informational message translation for both 4->6 and 6->4 paths
and wire the new ICMP translation units into the engine. This introduces
the protocol mapping and checksum update logic for echo request/reply
traffic, while ICMP error quoted-inner translation is added in a
follow-up commit.

Signed-off-by: Ralf Lici
---
 drivers/net/ipxlat/Makefile       |  2 +
 drivers/net/ipxlat/icmp.h         | 43 ++++++++++++++
 drivers/net/ipxlat/icmp_46.c      | 95 +++++++++++++++++++++++++++++++
 drivers/net/ipxlat/icmp_64.c      | 92 ++++++++++++++++++++++++++++++
 drivers/net/ipxlat/translate_64.c |  1 +
 drivers/net/ipxlat/transport.c    | 11 ----
 drivers/net/ipxlat/transport.h    |  5 --
 7 files changed, 233 insertions(+), 16 deletions(-)
 create mode 100644 drivers/net/ipxlat/icmp.h
 create mode 100644 drivers/net/ipxlat/icmp_46.c
 create mode 100644 drivers/net/ipxlat/icmp_64.c

diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile
index d7b7097aee5f..2ded504902e3 100644
--- a/drivers/net/ipxlat/Makefile
+++ b/drivers/net/ipxlat/Makefile
@@ -11,3 +11,5 @@ ipxlat-objs += transport.o
 ipxlat-objs += dispatch.o
 ipxlat-objs += translate_46.o
 ipxlat-objs += translate_64.o
+ipxlat-objs += icmp_46.o
+ipxlat-objs += icmp_64.o
diff --git a/drivers/net/ipxlat/icmp.h b/drivers/net/ipxlat/icmp.h
new file mode 100644
index 000000000000..52d681787d6a
--- /dev/null
+++
b/drivers/net/ipxlat/icmp.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber
+ *
+ * Author:	Alberto Leiva Popper
+ *		Antonio Quartulli
+ *		Daniel Gröber
+ *		Ralf Lici
+ */
+
+#ifndef _NET_IPXLAT_ICMP_H_
+#define _NET_IPXLAT_ICMP_H_
+
+#include
+
+#include "ipxlpriv.h"
+
+/**
+ * ipxlat_46_icmp - translate ICMP informational payload
+ *	after outer 4->6 rewrite
+ * @ipxl: translator private context
+ * @skb: packet carrying ICMPv4 transport payload
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb);
+
+/**
+ * ipxlat_64_icmp - translate ICMP informational payload
+ *	after outer 6->4 rewrite
+ * @ipxlat: translator private context
+ * @skb: packet carrying ICMPv6 transport payload
+ * @in6: snapshot of original outer IPv6 header
+ *
+ * Return: 0 on success, negative errno on translation failure.
+ */
+int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
+		   const struct ipv6hdr *in6);
+
+#endif /* _NET_IPXLAT_ICMP_H_ */
diff --git a/drivers/net/ipxlat/icmp_46.c b/drivers/net/ipxlat/icmp_46.c
new file mode 100644
index 000000000000..ad907f60416c
--- /dev/null
+++ b/drivers/net/ipxlat/icmp_46.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber
+ *
+ * Author:	Alberto Leiva Popper
+ *		Antonio Quartulli
+ *		Daniel Gröber
+ *		Ralf Lici
+ */
+
+#include
+#include
+
+#include "icmp.h"
+#include "packet.h"
+#include "transport.h"
+
+static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
+					     struct icmp6hdr *out)
+{
+	switch (in->type) {
+	case ICMP_ECHO:
+		out->icmp6_type = ICMPV6_ECHO_REQUEST;
+		out->icmp6_code = 0;
+		out->icmp6_identifier = in->un.echo.id;
+		out->icmp6_sequence = in->un.echo.sequence;
+		return 0;
+	case ICMP_ECHOREPLY:
+		out->icmp6_type = ICMPV6_ECHO_REPLY;
+		out->icmp6_code = 0;
+		out->icmp6_identifier = in->un.echo.id;
+		out->icmp6_sequence = in->un.echo.sequence;
+		return 0;
+	}
+
+	return -EPROTONOSUPPORT;
+}
+
+static void ipxlat_46_icmp_info_update_csum(const struct icmphdr *icmp4,
+					    struct icmp6hdr *icmp6,
+					    const struct ipv6hdr *ip6,
+					    const struct sk_buff *skb,
+					    unsigned int l4_off)
+{
+	struct icmp6hdr icmp6_zero;
+	struct icmphdr icmp4_zero;
+	__wsum csum;
+
+	icmp4_zero = *icmp4;
+	icmp4_zero.checksum = 0;
+	icmp6_zero = *icmp6;
+	icmp6_zero.icmp6_cksum = 0;
+	csum = ~csum_unfold(icmp4->checksum);
+	csum = csum_sub(csum, csum_partial(&icmp4_zero, sizeof(icmp4_zero), 0));
+	csum = csum_add(csum, csum_partial(&icmp6_zero, sizeof(icmp6_zero), 0));
+	icmp6->icmp6_cksum = csum_ipv6_magic(&ip6->saddr, &ip6->daddr,
+					     skb->len - l4_off,
+					     IPPROTO_ICMPV6, csum);
+}
+
+static
int ipxlat_46_icmp_info_outer(struct sk_buff *skb)
+{
+	const unsigned int l4_off = skb_transport_offset(skb);
+	const struct icmphdr icmp4 = *icmp_hdr(skb);
+	const struct ipv6hdr *ip6 = ipv6_hdr(skb);
+	struct icmp6hdr *icmp6 = icmp6_hdr(skb);
+	int err;
+
+	err = ipxlat_46_map_icmp_info_type_code(&icmp4, icmp6);
+	if (unlikely(err))
+		return -EINVAL;
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		icmp6->icmp6_cksum = ~csum_ipv6_magic(&ip6->saddr, &ip6->daddr,
+						      skb->len - l4_off,
+						      IPPROTO_ICMPV6, 0);
+		return ipxlat_set_partial_csum(skb, offsetof(struct icmp6hdr,
+							     icmp6_cksum));
+	}
+
+	ipxlat_46_icmp_info_update_csum(&icmp4, icmp6, ip6, skb, l4_off);
+	skb->ip_summed = CHECKSUM_NONE;
+	return 0;
+}
+
+int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+{
+	if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
+		return -EPROTONOSUPPORT;
+
+	return ipxlat_46_icmp_info_outer(skb);
+}
diff --git a/drivers/net/ipxlat/icmp_64.c b/drivers/net/ipxlat/icmp_64.c
new file mode 100644
index 000000000000..6b11aa638068
--- /dev/null
+++ b/drivers/net/ipxlat/icmp_64.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver
+ *
+ * Copyright (C) 2024- Alberto Leiva Popper
+ * Copyright (C) 2026- Mandelbit SRL
+ * Copyright (C) 2026- Daniel Gröber
+ *
+ * Author:	Alberto Leiva Popper
+ *		Antonio Quartulli
+ *		Daniel Gröber
+ *		Ralf Lici
+ */
+
+#include
+
+#include "icmp.h"
+#include "packet.h"
+#include "transport.h"
+
+static int ipxlat_64_map_icmp_info_type_code(const struct icmp6hdr *in,
+					     struct icmphdr *out)
+{
+	switch (in->icmp6_type) {
+	case ICMPV6_ECHO_REQUEST:
+		out->type = ICMP_ECHO;
+		out->code = 0;
+		out->un.echo.id = in->icmp6_identifier;
+		out->un.echo.sequence = in->icmp6_sequence;
+		return 0;
+	case ICMPV6_ECHO_REPLY:
+		out->type = ICMP_ECHOREPLY;
+		out->code = 0;
+		out->un.echo.id = in->icmp6_identifier;
+		out->un.echo.sequence =
in->icmp6_sequence;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static __sum16 ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6,
+						const struct icmp6hdr *in_icmp6,
+						const struct icmphdr *out_icmp4,
+						unsigned int l4_len)
+{
+	struct icmp6hdr icmp6_zero;
+	struct icmphdr icmp4_zero;
+	__wsum csum, tmp;
+
+	icmp6_zero = *in_icmp6;
+	icmp6_zero.icmp6_cksum = 0;
+	icmp4_zero = *out_icmp4;
+	icmp4_zero.checksum = 0;
+
+	csum = ~csum_unfold(in_icmp6->icmp6_cksum);
+	tmp = ~csum_unfold(csum_ipv6_magic(&in6->saddr, &in6->daddr, l4_len,
+					   NEXTHDR_ICMP, 0));
+	csum = csum_sub(csum, tmp);
+	csum = csum_sub(csum, csum_partial(&icmp6_zero, sizeof(icmp6_zero), 0));
+	csum = csum_add(csum, csum_partial(&icmp4_zero, sizeof(icmp4_zero), 0));
+	return csum_fold(csum);
+}
+
+static int ipxlat_64_icmp_info(struct sk_buff *skb, const struct ipv6hdr *in6)
+{
+	struct icmp6hdr ic6_copy, *ic6;
+	struct icmphdr *ic4;
+	int err;
+
+	ic6 = icmp6_hdr(skb);
+	ic6_copy = *ic6;
+
+	ic4 = (struct icmphdr *)(skb->data + skb_transport_offset(skb));
+	err = ipxlat_64_map_icmp_info_type_code(&ic6_copy, ic4);
+	if (unlikely(err))
+		return err;
+
+	ic4->checksum =
+		ipxlat_64_compute_icmp_info_csum(in6, &ic6_copy, ic4,
+						 ipxlat_skb_datagram_len(skb));
+	skb->ip_summed = CHECKSUM_NONE;
+	return 0;
+}
+
+int ipxlat_64_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb,
+		   const struct ipv6hdr *in6)
+{
+	if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err))
+		return -EPROTONOSUPPORT;
+
+	return ipxlat_64_icmp_info(skb, in6);
+}
diff --git a/drivers/net/ipxlat/translate_64.c b/drivers/net/ipxlat/translate_64.c
index 50a95fb75f9d..412d29214a43 100644
--- a/drivers/net/ipxlat/translate_64.c
+++ b/drivers/net/ipxlat/translate_64.c
@@ -16,6 +16,7 @@
 
 #include "translate_64.h"
 #include "address.h"
+#include "icmp.h"
 #include "packet.h"
 #include "transport.h"
 
diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c
index
78548d0b8c22..3aa00c635916 100644
--- a/drivers/net/ipxlat/transport.c
+++ b/drivers/net/ipxlat/transport.c
@@ -338,14 +338,3 @@ int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
 	udp_new->check = CSUM_MANGLED_0;
 	return 0;
 }
-
-int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb)
-{
-	return -EPROTONOSUPPORT;
-}
-
-int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
-		   const struct ipv6hdr *outer6)
-{
-	return -EPROTONOSUPPORT;
-}
diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h
index 0e69b98eafd0..9b6fe422b01f 100644
--- a/drivers/net/ipxlat/transport.h
+++ b/drivers/net/ipxlat/transport.h
@@ -100,9 +100,4 @@ int ipxlat_64_inner_tcp(struct sk_buff *skb, const struct ipv6hdr *in6,
 int ipxlat_64_inner_udp(struct sk_buff *skb, const struct ipv6hdr *in6,
 			const struct iphdr *out4, struct udphdr *udp_new);
 
-/* temporary ICMP stubs until ICMP translation support is introduced */
-int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
-int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
-		   const struct ipv6hdr *outer6);
-
 #endif /* _NET_IPXLAT_TRANSPORT_H_ */
-- 
2.53.0

From nobody Mon Apr 6 10:42:06 2026
Received: from mout-b-107.mailbox.org (mout-b-107.mailbox.org [195.10.208.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F1B9A3DBD4C; Thu, 19 Mar 2026 15:13:53 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.47
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933236; cv=none; b=P07N0oqqzY13GIZJ4E2Pz7krifON4YVrmDks395NNVUA1CB5YU3UB2d5Zq4wdZ5M7QaBfcgHWxgd2tujBNMDmmnVs0iY6xXcabYosL2izpqGYKTupIb/Swt3FqkNlevBVQlgV28NeSDd5sRKcwN2jeaTVYJ3yq5llJKBv8XZkIE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933236; c=relaxed/simple;
bh=iv/RoZG2i0Vfix/pw0Pn5MBTtScs4OUE8WnIBTXtYW0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Z4d3gbZfQFeS0kwVuQe7aYlsAwcVSZ2YNHa8OlkFGM4RNe7G4qSrUt9+oywMMMQC0384Wdrq1B6IDzbrGke95JBEMfGNrCx9lnPW9M7DwuGiNGLED3tXo4YEMskZv5aWEj/5Is3K/P2p5CmgXkIz1iOjEuUWP8SLdZ1laAHEOhU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=s07lKALh; arc=none smtp.client-ip=195.10.208.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="s07lKALh" Received: from smtp102.mailbox.org (smtp102.mailbox.org [10.196.197.102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-107.mailbox.org (Postfix) with ESMTPS id 4fc8NZ5XYMzDs35; Thu, 19 Mar 2026 16:13:50 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933230; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BdVIX3OEz97Rndlxx28vLkvE2hbfipn+YTLQYxmB624=; b=s07lKALh/eJ7qxH1/6ap4ZTVqx+NCt5z7y8QD84eYiuKyb/QsDK2CKmjLPgAW4teW0stgo 21CxSMPuDLnYw2RaDk96ETwqMGrpUW3tX7PBDZr5md6+kt8Zi4RxN94dOZQ5RNeaVlYdnr E5gjWm7YQ2TfJj/ZLGdDwNMOPlDpQm4wRnZUOHp+9lbaipUuyqsrV/0ZmyAupwxHu7RSOl R/TpoTs+W36uhAzy7ozY6RLCSbwHbm7IASBJIK//3me5yZ6J1PdZsyfH9lf5SclV6IAzHd LRVUIi0axJga3p1M22cBc4+qu4W5NOQSHeCt2L/sfQ8rJU1VedlmpSrIrrdq+A== From: Ralf Lici To: 
netdev@vger.kernel.org
Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Andrew Lunn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-kernel@vger.kernel.org
Subject: [RFC net-next 12/15] ipxlat: add ICMP error translation and quoted-inner handling
Date: Thu, 19 Mar 2026 16:12:21 +0100
Message-ID: <20260319151230.655687-13-ralf@mandelbit.com>
In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com>
References: <20260319151230.655687-1-ralf@mandelbit.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Extend ICMP translation with error-path support for both directions,
including quoted-inner packet rewriting and RFC 4884 extension
relayout/squeeze logic. This adds the ICMP type/code/error-field
mappings, inner L3/L4 rewrite paths, and final checksum handling
required for translator ICMP error processing.

Signed-off-by: Ralf Lici
---
 drivers/net/ipxlat/icmp.h      |  14 +-
 drivers/net/ipxlat/icmp_46.c   | 467 ++++++++++++++++++++++++++++++++-
 drivers/net/ipxlat/icmp_64.c   | 453 +++++++++++++++++++++++++++++++-
 drivers/net/ipxlat/transport.c |  61 +++++
 drivers/net/ipxlat/transport.h |  19 ++
 5 files changed, 996 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ipxlat/icmp.h b/drivers/net/ipxlat/icmp.h
index 52d681787d6a..71bd7e20af91 100644
--- a/drivers/net/ipxlat/icmp.h
+++ b/drivers/net/ipxlat/icmp.h
@@ -19,22 +19,24 @@
 #include "ipxlpriv.h"
 
 /**
- * ipxlat_46_icmp - translate ICMP informational payload
- *	after outer 4->6 rewrite
- * @ipxl: translator private context
+ * ipxlat_46_icmp - translate ICMP payload after outer 4->6 L3 rewrite
+ * @ipxlat: translator private context
  * @skb: packet carrying ICMPv4 transport payload
  *
+ * Handles both ICMP info translation and ICMP error quoted-inner rewriting.
+ *
+ * Return: 0 on success, negative errno on translation failure.
  */
-int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb);
+int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb);
 
 /**
- * ipxlat_64_icmp - translate ICMP informational payload
- *	after outer 6->4 rewrite
+ * ipxlat_64_icmp - translate ICMP payload after outer 6->4 L3 rewrite
  * @ipxlat: translator private context
  * @skb: packet carrying ICMPv6 transport payload
  * @in6: snapshot of original outer IPv6 header
  *
+ * Handles both ICMP info translation and ICMP error quoted-inner rewriting.
+ *
+ * Return: 0 on success, negative errno on translation failure.
  */
 int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb,
diff --git a/drivers/net/ipxlat/icmp_46.c b/drivers/net/ipxlat/icmp_46.c
index ad907f60416c..41a91d4bc3dc 100644
--- a/drivers/net/ipxlat/icmp_46.c
+++ b/drivers/net/ipxlat/icmp_46.c
@@ -11,13 +11,49 @@
  *		Ralf Lici
  */
 
-#include
-#include
-
+#include "address.h"
 #include "icmp.h"
 #include "packet.h"
+#include "translate_46.h"
 #include "transport.h"
 
+#define IPXLAT_ICMP4_PP_CODE_PTR 0
+#define IPXLAT_ICMP4_PP_CODE_BADLEN 2
+
+/* RFC 7915 Section 4.2, Figure 3 */
+static const u8 ipxlat_46_icmp_param_prob_map[] = { 0, 1, 4, 4, 0xff,
+						    0xff, 0xff, 0xff, 7, 6,
+						    0xff, 0xff, 8, 8, 8,
+						    8, 24, 24, 24, 24 };
+
+/* RFC 1191 plateau table used when ICMPv4 FRAG_NEEDED reports MTU=0 */
+static const u16 ipxlat_46_mtu_plateaus[] = {
+	65535, 32000, 17914, 8166, 4352, 2002, 1492,
+};
+
+static u8 ipxlat_icmp4_get_param_ptr(const struct icmphdr *ic4)
+{
+	return ntohl(ic4->un.gateway) >> 24;
+}
+
+static int ipxlat_46_map_icmp_param_prob(const struct icmphdr *in,
+					 struct icmp6hdr *out)
+{
+	u8 ptr;
+
+	if (unlikely(in->code != IPXLAT_ICMP4_PP_CODE_PTR &&
+		     in->code != IPXLAT_ICMP4_PP_CODE_BADLEN))
+		return -EPROTONOSUPPORT;
+
+	ptr = ipxlat_icmp4_get_param_ptr(in);
+	if (unlikely(ptr >= ARRAY_SIZE(ipxlat_46_icmp_param_prob_map) ||
+
ipxlat_46_icmp_param_prob_map[ptr] == 0xff))
+		return -EPROTONOSUPPORT;
+
+	out->icmp6_pointer = cpu_to_be32(ipxlat_46_icmp_param_prob_map[ptr]);
+	return 0;
+}
+
 static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
 					     struct icmp6hdr *out)
 {
@@ -39,6 +75,165 @@ static int ipxlat_46_map_icmp_info_type_code(const struct icmphdr *in,
 	return -EPROTONOSUPPORT;
 }
 
+static __be32 ipxlat_46_compute_icmp_mtu6(unsigned int pkt_mtu,
+					  unsigned int nexthop6mtu,
+					  unsigned int nexthop4mtu,
+					  u16 tot_len_field)
+{
+	unsigned int i;
+	u32 result;
+
+	/* RFC 7915 Section 4.2:
+	 * If the IPv4 router set the MTU field to zero, then the translator
+	 * MUST use the plateau values specified in RFC 1191 to determine a
+	 * likely path MTU and include that path MTU in the ICMPv6 packet.
+	 */
+	if (unlikely(pkt_mtu == 0)) {
+		for (i = 0; i < ARRAY_SIZE(ipxlat_46_mtu_plateaus); i++) {
+			if (ipxlat_46_mtu_plateaus[i] < tot_len_field) {
+				pkt_mtu = ipxlat_46_mtu_plateaus[i];
+				break;
+			}
+		}
+	}
+
+	/* RFC 7915 Section 4.2:
+	 * max(1280, min(pkt_mtu + 20, mtu6_nexthop, mtu4_nexthop + 20))
+	 *
+	 * pkt_mtu + 20 converts ICMPv4-reported MTU to IPv6 context.
+	 * mtu6_nexthop and mtu4_nexthop + 20 clamp to local next-hop limits.
+	 * max(..., 1280) enforces IPv6 minimum MTU.
+	 */
+	result = min(pkt_mtu + 20, min(nexthop6mtu, nexthop4mtu + 20));
+	if (result < IPV6_MIN_MTU)
+		result = IPV6_MIN_MTU;
+
+	return cpu_to_be32(result);
+}
+
+static int ipxlat_46_build_icmp_dest_unreach(struct ipxlat_priv *ipxlat,
+					     struct sk_buff *skb,
+					     const struct icmphdr *in,
+					     struct icmp6hdr *out,
+					     const struct iphdr *inner4)
+{
+	unsigned int inner4_tot_len, in_frag_mtu, in_mtu, out_mtu;
+
+	switch (in->code) {
+	case ICMP_NET_UNREACH:
+	case ICMP_HOST_UNREACH:
+	case ICMP_SR_FAILED:
+	case ICMP_NET_UNKNOWN:
+	case ICMP_HOST_UNKNOWN:
+	case ICMP_HOST_ISOLATED:
+	case ICMP_NET_UNR_TOS:
+	case ICMP_HOST_UNR_TOS:
+	case ICMP_PORT_UNREACH:
+	case ICMP_NET_ANO:
+	case ICMP_HOST_ANO:
+	case ICMP_PKT_FILTERED:
+	case ICMP_PREC_CUTOFF:
+		out->icmp6_unused = 0;
+		return 0;
+	case ICMP_PROT_UNREACH:
+		out->icmp6_pointer =
+			cpu_to_be32(offsetof(struct ipv6hdr, nexthdr));
+		return 0;
+	case ICMP_FRAG_NEEDED:
+		in_frag_mtu = be16_to_cpu(in->un.frag.mtu);
+		inner4_tot_len = be16_to_cpu(inner4->tot_len);
+		in_mtu = READ_ONCE(ipxlat->dev->mtu);
+		out_mtu = ipxlat_46_lookup_pmtu6(ipxlat, skb, inner4);
+
+		out->icmp6_mtu =
+			ipxlat_46_compute_icmp_mtu6(in_frag_mtu, out_mtu,
+						    in_mtu, inner4_tot_len);
+		return 0;
+	}
+
+	return -EPROTONOSUPPORT;
+}
+
+static int ipxlat_46_map_icmp_type_code(struct ipxlat_priv *ipxlat,
+					struct sk_buff *skb,
+					const struct icmphdr *in,
+					struct icmp6hdr *out,
+					const struct iphdr *inner4,
+					bool *ie_forbidden)
+{
+	int err;
+
+	*ie_forbidden = false;
+
+	switch (in->type) {
+	case ICMP_ECHO:
+	case ICMP_ECHOREPLY:
+		return ipxlat_46_map_icmp_info_type_code(in, out);
+	case ICMP_DEST_UNREACH:
+		switch (in->code) {
+		case ICMP_NET_UNREACH:
+		case ICMP_HOST_UNREACH:
+		case ICMP_SR_FAILED:
+		case ICMP_NET_UNKNOWN:
+		case ICMP_HOST_UNKNOWN:
+		case ICMP_HOST_ISOLATED:
+		case ICMP_NET_UNR_TOS:
+		case ICMP_HOST_UNR_TOS:
+			out->icmp6_type = ICMPV6_DEST_UNREACH;
+			out->icmp6_code = ICMPV6_NOROUTE;
+			break;
+		case
ICMP_PROT_UNREACH:
+			out->icmp6_type = ICMPV6_PARAMPROB;
+			out->icmp6_code = ICMPV6_UNK_NEXTHDR;
+			*ie_forbidden = true;
+			break;
+		case ICMP_PORT_UNREACH:
+			out->icmp6_type = ICMPV6_DEST_UNREACH;
+			out->icmp6_code = ICMPV6_PORT_UNREACH;
+			break;
+		case ICMP_FRAG_NEEDED:
+			out->icmp6_type = ICMPV6_PKT_TOOBIG;
+			out->icmp6_code = 0;
+			*ie_forbidden = true;
+			break;
+		case ICMP_NET_ANO:
+		case ICMP_HOST_ANO:
+		case ICMP_PKT_FILTERED:
+		case ICMP_PREC_CUTOFF:
+			out->icmp6_type = ICMPV6_DEST_UNREACH;
+			out->icmp6_code = ICMPV6_ADM_PROHIBITED;
+			break;
+		default:
+			return -EPROTONOSUPPORT;
+		}
+		return ipxlat_46_build_icmp_dest_unreach(ipxlat,
+							 skb, in, out,
+							 inner4);
+	case ICMP_TIME_EXCEEDED:
+		out->icmp6_type = ICMPV6_TIME_EXCEED;
+		out->icmp6_code = in->code;
+		out->icmp6_unused = 0;
+		return 0;
+	case ICMP_PARAMETERPROB:
+		out->icmp6_type = ICMPV6_PARAMPROB;
+		*ie_forbidden = true;
+		switch (in->code) {
+		case IPXLAT_ICMP4_PP_CODE_PTR:
+		case IPXLAT_ICMP4_PP_CODE_BADLEN:
+			out->icmp6_code = ICMPV6_HDR_FIELD;
+			break;
+		default:
+			return -EPROTONOSUPPORT;
+		}
+		err = ipxlat_46_map_icmp_param_prob(in, out);
+		if (unlikely(err))
+			return err;
+		return 0;
+	}
+
+	return -EPROTONOSUPPORT;
+}
+
 static void ipxlat_46_icmp_info_update_csum(const struct icmphdr *icmp4,
 					    struct icmp6hdr *icmp6,
 					    const struct ipv6hdr *ip6,
@@ -86,10 +281,272 @@ static int ipxlat_46_icmp_info_outer(struct sk_buff *skb)
 	return 0;
 }
 
-int ipxlat_46_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb)
+static int ipxlat_46_icmp_info_inner(struct sk_buff *skb,
+				     unsigned int inner_l4_off,
+				     const struct ipv6hdr *inner6)
+{
+	struct icmp6hdr *icmp6;
+	struct icmphdr icmp4;
+	int err;
+
+	/* inner header alignment is not guaranteed */
+	memcpy(&icmp4, skb->data + inner_l4_off, sizeof(icmp4));
+	icmp6 = (struct icmp6hdr *)(skb->data + inner_l4_off);
+
+	err = ipxlat_46_map_icmp_info_type_code(&icmp4, icmp6);
+	if (unlikely(err))
+		return -EINVAL;
+
+
ipxlat_46_icmp_info_update_csum(&icmp4, icmp6, inner6, skb,
+					inner_l4_off);
+	return 0;
+}
+
+static int ipxlat_46_icmp_inner_l4(struct sk_buff *skb,
+				   unsigned int inner_l4_off,
+				   const struct iphdr *inner4,
+				   const struct ipv6hdr *inner6)
+{
+	struct tcphdr *tcp;
+	struct udphdr *udp;
+
+	switch (inner4->protocol) {
+	case IPPROTO_TCP:
+		tcp = (struct tcphdr *)(skb->data + inner_l4_off);
+		return ipxlat_46_inner_tcp(skb, inner4, inner6, tcp);
+	case IPPROTO_UDP:
+		udp = (struct udphdr *)(skb->data + inner_l4_off);
+		return ipxlat_46_inner_udp(skb, inner4, inner6, udp);
+	case IPPROTO_ICMP:
+		return ipxlat_46_icmp_info_inner(skb, inner_l4_off, inner6);
+	default:
+		return 0;
+	}
+}
+
+static int ipxlat_46_icmp_inner(struct ipxlat_priv *ipxlat,
+				struct sk_buff *skb, struct iphdr *inner4,
+				int *inner_delta)
+{
+	unsigned int inner_l3_len, inner_l3_off, inner_l4_off, old_prefix,
+		new_prefix, inner_tot_len, inner_l3_payload, inner_l4_payload;
+	const unsigned int outer_l3_len = skb_transport_offset(skb);
+	const struct ipxlat_cb *cb = ipxlat_skb_cb(skb);
+	struct ipv6hdr outer_ip6_copy, *inner_ip6;
+	struct frag_hdr *fh6;
+	u8 next_hdr;
+	bool has_inner_frag;
+
+	inner_l3_off = cb->inner_l3_offset;
+	inner_l4_off = cb->inner_l4_offset;
+
+	/* inner header alignment is not guaranteed */
+	memcpy(inner4, skb->data + inner_l3_off, sizeof(*inner4));
+	inner_l3_len = inner4->ihl << 2;
+	has_inner_frag = ip_is_fragment(inner4);
+
+	/* save outer IPv6 hdr because pull+push destroys that hdr region */
+	outer_ip6_copy = *ipv6_hdr(skb);
+
+	old_prefix = inner_l3_off + inner_l3_len;
+	new_prefix = inner_l3_off + sizeof(struct ipv6hdr) +
+		     (has_inner_frag ?
sizeof(struct frag_hdr) : 0);
+	*inner_delta = (int)new_prefix - (int)old_prefix;
+
+	if (unlikely(skb_cow_head(skb, max_t(int, 0, *inner_delta))))
+		return -ENOMEM;
+
+	skb_pull(skb, old_prefix);
+	skb_push(skb, new_prefix);
+	/* outer 4->6 path already set header offsets, but inner relayout
+	 * pulls/pushes change skb->data placement. Reinitialize outer header
+	 * offsets so ip{,v6}_hdr/icmp{,6}_hdr and skb_transport_offset keep
+	 * pointing to the outer packet.
+	 */
+	skb_reset_network_header(skb);
+	skb_set_transport_header(skb, outer_l3_len);
+
+	*ipv6_hdr(skb) = outer_ip6_copy;
+	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+
+	inner_ip6 = (struct ipv6hdr *)(skb->data + inner_l3_off);
+	/* use quoted IPv4 total-length, not skb->len:
+	 * skb->len also includes ICMP extension bytes at the end, which are
+	 * not part of the quoted inner IP datagram length.
+	 */
+	inner_tot_len = ntohs(inner4->tot_len);
+	if (unlikely(inner_tot_len < inner_l3_len))
+		return -EINVAL;
+
+	inner_l3_payload = inner_tot_len - inner_l3_len +
+			   (has_inner_frag ? sizeof(struct frag_hdr) : 0);
+	if (has_inner_frag)
+		next_hdr = NEXTHDR_FRAGMENT;
+	else
+		next_hdr = ipxlat_46_map_proto_to_nexthdr(inner4->protocol);
+
+	ipxlat_46_build_l3(inner_ip6, inner4, inner_l3_payload, next_hdr,
+			   inner4->ttl);
+
+	ipxlat_46_convert_addrs(&ipxlat->xlat_prefix6, inner4, inner_ip6);
+
+	if (unlikely(has_inner_frag)) {
+		fh6 = (struct frag_hdr *)(inner_ip6 + 1);
+		ipxlat_46_build_frag_hdr(fh6, inner4, inner4->protocol);
+	}
+
+	if (unlikely(!ipxlat_is_first_frag4(inner4)))
+		return 0;
+
+	inner_l4_payload = new_prefix + ipxlat_l4_min_len(inner4->protocol);
+	if (unlikely(skb_ensure_writable(skb, inner_l4_payload)))
+		return -ENOMEM;
+
+	return ipxlat_46_icmp_inner_l4(skb, new_prefix, inner4, inner_ip6);
+}
+
+/* Adjust ICMP error quoted-datagram/extensions after inner 4->6 translation.
+ * The inner rewrite changes quoted datagram length; this helper recomputes + * RFC 4884 delimiter/padding, preserves extensions only when allowed, and + * enforces IPv6 minimum-MTU packet size constraints. + */ +static int ipxlat_46_icmp_squeeze_ext(struct sk_buff *skb, + unsigned int icmp4_ipl, int inner_delta, + bool ie_forbidden) +{ + unsigned int icmp6_iel_in, icmp6_iel_out, max_iel, outer_hdrs_len, + out_pad, payload_len, icmp6_ipl_out_bytes, pkt_len_cap; + unsigned int icmp6_ipl_out =3D 0; + int icmp6_ipl_in_bytes, err; + struct icmp6hdr *ic6; + struct ipv6hdr *iph6; + + /* icmp4_ipl marks where quoted datagram ends and extension area starts + */ + if (likely(!icmp4_ipl)) + goto no_extensions; + + outer_hdrs_len =3D skb_transport_offset(skb) + sizeof(struct icmp6hdr); + payload_len =3D skb->len - outer_hdrs_len; + icmp6_ipl_in_bytes =3D icmp4_ipl + inner_delta; + if (unlikely(icmp6_ipl_in_bytes < 0 || + icmp6_ipl_in_bytes > payload_len)) + return -EINVAL; + + if (likely(icmp6_ipl_in_bytes =3D=3D payload_len)) + goto no_extensions; + + icmp6_iel_in =3D payload_len - icmp6_ipl_in_bytes; + max_iel =3D IPV6_MIN_MTU - (outer_hdrs_len + ICMP_EXT_ORIG_DGRAM_MIN_LEN); + + if (unlikely(ie_forbidden || icmp6_iel_in > max_iel)) { + pkt_len_cap =3D min_t(unsigned int, skb->len - icmp6_iel_in, + IPV6_MIN_MTU); + icmp6_ipl_out_bytes =3D pkt_len_cap - outer_hdrs_len; + out_pad =3D 0; + icmp6_iel_out =3D 0; + icmp6_ipl_out =3D 0; + } else { + pkt_len_cap =3D min_t(unsigned int, skb->len, IPV6_MIN_MTU); + icmp6_ipl_out_bytes =3D + round_down(pkt_len_cap - icmp6_iel_in - outer_hdrs_len, + sizeof(u64)); + out_pad =3D max_t(unsigned int, ICMP_EXT_ORIG_DGRAM_MIN_LEN, + icmp6_ipl_out_bytes) - + icmp6_ipl_out_bytes; + icmp6_iel_out =3D icmp6_iel_in; + icmp6_ipl_out =3D (icmp6_ipl_out_bytes + out_pad) >> 3; + } + + /* if no extension bytes are copied and no pad is written, relayout only + * trims/updates lengths and does not require full data writability + */ + if 
(unlikely(icmp6_iel_out || out_pad)) { + err =3D skb_ensure_writable(skb, skb->len); + if (unlikely(err)) + return err; + } + + err =3D ipxlat_icmp_relayout(skb, outer_hdrs_len, icmp6_ipl_in_bytes, + icmp6_iel_in, icmp6_ipl_out_bytes, out_pad, + icmp6_iel_out); + if (unlikely(err)) + return err; + + iph6 =3D ipv6_hdr(skb); + iph6->payload_len =3D htons(skb->len - sizeof(*iph6)); + +no_extensions: + if (unlikely(skb->len > IPV6_MIN_MTU)) { + err =3D pskb_trim(skb, IPV6_MIN_MTU); + if (unlikely(err)) + return err; + + iph6 =3D ipv6_hdr(skb); + iph6->payload_len =3D htons(skb->len - sizeof(*iph6)); + } + + ic6 =3D icmp6_hdr(skb); + ic6->icmp6_datagram_len =3D icmp6_ipl_out; + return 0; +} + +/** + * ipxlat_46_icmp_error - translate ICMPv4 error payload to ICMPv6 error f= orm + * @ipxlat: translator private context + * @skb: packet carrying outer ICMPv4 error + * + * Rewrites the quoted inner datagram in place, maps type/code/fields and + * adjusts RFC 4884 datagram/extension layout before recomputing outer che= cksum. + * + * Return: 0 on success, negative errno on translation failure. 
+ */ +static int ipxlat_46_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff= *skb) +{ + const struct ipxlat_cb *cb =3D ipxlat_skb_cb(skb); + const struct icmphdr icmp4 =3D *icmp_hdr(skb); + struct iphdr inner4_ip; + int inner_delta, err; + bool ie_forbidden; + + if (unlikely(!(cb->is_icmp_err))) { + DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + } + + /* translate quoted inner packet headers */ + err =3D ipxlat_46_icmp_inner(ipxlat, skb, &inner4_ip, &inner_delta); + if (unlikely(err)) + return err; + + err =3D ipxlat_46_map_icmp_type_code(ipxlat, skb, &icmp4, icmp6_hdr(skb), + &inner4_ip, &ie_forbidden); + if (unlikely(err)) + return err; + + err =3D ipxlat_46_icmp_squeeze_ext(skb, icmp4.un.reserved[1] << 2, + inner_delta, ie_forbidden); + if (unlikely(err)) + return err; + + /* error path rewrites quoted packet bytes/lengths, so use full + * checksum recomputation instead of incremental update + */ + icmp6_hdr(skb)->icmp6_cksum =3D 0; + icmp6_hdr(skb)->icmp6_cksum =3D + ipxlat_l4_csum_ipv6(&ipv6_hdr(skb)->saddr, + &ipv6_hdr(skb)->daddr, skb, + skb_transport_offset(skb), + ipxlat_skb_datagram_len(skb), + IPPROTO_ICMPV6); + skb->ip_summed =3D CHECKSUM_NONE; + return 0; +} + +int ipxlat_46_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb) { if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err)) - return -EPROTONOSUPPORT; + return ipxlat_46_icmp_error(ipxlat, skb); =20 return ipxlat_46_icmp_info_outer(skb); } diff --git a/drivers/net/ipxlat/icmp_64.c b/drivers/net/ipxlat/icmp_64.c index 6b11aa638068..18583620a09a 100644 --- a/drivers/net/ipxlat/icmp_64.c +++ b/drivers/net/ipxlat/icmp_64.c @@ -11,12 +11,38 @@ * Ralf Lici */ =20 -#include +#include =20 +#include "address.h" #include "icmp.h" #include "packet.h" +#include "translate_64.h" #include "transport.h" =20 +#define IPXLAT_ICMP4_ERROR_MAX_LEN 576U + +/* RFC 7915 Section 5.2, Figure 4 */ +static const u8 ipxlat_64_icmp_param_prob_map[] =3D { + 0, 1, 0xff, 0xff, 2, 2, 9, 8, 12, 12, 12, 12, 12, 12, + 12, 12, 12, 
12, 12, 12, 12, 12, 12, 12, 16, 16, 16, 16, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, +}; + +static int ipxlat_64_map_icmp_param_prob(u32 ptr6, u32 *ptr4) +{ + if (unlikely(ptr6 >=3D ARRAY_SIZE(ipxlat_64_icmp_param_prob_map) || + ipxlat_64_icmp_param_prob_map[ptr6] =3D=3D 0xff)) + return -EPROTONOSUPPORT; + + *ptr4 =3D ipxlat_64_icmp_param_prob_map[ptr6]; + return 0; +} + +static void ipxlat_icmp4_set_param_ptr(struct icmphdr *ic4, u8 ptr) +{ + ic4->un.gateway =3D htonl((u32)ptr << 24); +} + static int ipxlat_64_map_icmp_info_type_code(const struct icmp6hdr *in, struct icmphdr *out) { @@ -38,10 +64,119 @@ static int ipxlat_64_map_icmp_info_type_code(const str= uct icmp6hdr *in, } } =20 -static __sum16 ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6, - const struct icmp6hdr *in_icmp6, - const struct icmphdr *out_icmp4, - unsigned int l4_len) +/* Lookup post-translation IPv4 PMTU for ICMPv6 PTB -> ICMPv4 FRAG_NEEDED. + * Falls back to translator MTU on routing failures and clamps route MTU + * against translator egress MTU. 
+ */ +static unsigned int ipxlat_64_lookup_pmtu4(struct ipxlat_priv *ipxlat, + const struct sk_buff *skb) +{ + const struct iphdr *iph4; + struct flowi4 fl4 =3D {}; + unsigned int dev_mtu; + struct rtable *rt; + unsigned int mtu4; + + dev_mtu =3D READ_ONCE(ipxlat->dev->mtu); + iph4 =3D ip_hdr(skb); + + fl4.daddr =3D iph4->daddr; + fl4.saddr =3D iph4->saddr; + fl4.flowi4_mark =3D skb->mark; + fl4.flowi4_proto =3D IPPROTO_ICMP; + + rt =3D ip_route_output_key(dev_net(ipxlat->dev), &fl4); + if (IS_ERR(rt)) + return dev_mtu; + + /* clamp against translator MTU to avoid oversized local PMTU */ + mtu4 =3D min_t(unsigned int, dst_mtu(&rt->dst), dev_mtu); + ip_rt_put(rt); + + return mtu4; +} + +static int ipxlat_64_build_icmp4_errhdr(struct ipxlat_priv *ipxlat, + struct sk_buff *skb, + const struct icmp6hdr *ic6, + struct icmphdr *ic4, bool *ie_forbidden) +{ + unsigned int in_mtu, out_mtu; + u32 ptr6, ptr4; + int err; + + switch (ic6->icmp6_type) { + case ICMPV6_DEST_UNREACH: + ic4->type =3D ICMP_DEST_UNREACH; + switch (ic6->icmp6_code) { + case ICMPV6_NOROUTE: + case ICMPV6_NOT_NEIGHBOUR: + case ICMPV6_ADDR_UNREACH: + ic4->code =3D ICMP_HOST_UNREACH; + break; + case ICMPV6_ADM_PROHIBITED: + ic4->code =3D ICMP_HOST_ANO; + break; + case ICMPV6_PORT_UNREACH: + ic4->code =3D ICMP_PORT_UNREACH; + break; + default: + return -EINVAL; + } + ic4->un.gateway =3D 0; + *ie_forbidden =3D false; + return 0; + case ICMPV6_TIME_EXCEED: + ic4->type =3D ICMP_TIME_EXCEEDED; + ic4->code =3D ic6->icmp6_code; + ic4->un.gateway =3D 0; + *ie_forbidden =3D false; + return 0; + case ICMPV6_PKT_TOOBIG: + ic4->type =3D ICMP_DEST_UNREACH; + ic4->code =3D ICMP_FRAG_NEEDED; + ic4->un.frag.__unused =3D 0; + in_mtu =3D ipxlat_64_lookup_pmtu4(ipxlat, skb); + out_mtu =3D READ_ONCE(ipxlat->dev->mtu); + /* RFC 7915 Section 5.2: + * min((PTB_mtu - 20), mtu4_nexthop, (mtu6_nexthop - 20)) + */ + ic4->un.frag.mtu =3D + cpu_to_be16(min3(be32_to_cpu(ic6->icmp6_mtu) - 20, + in_mtu, out_mtu - 20)); + *ie_forbidden 
=3D true; + return 0; + case ICMPV6_PARAMPROB: + ptr6 =3D be32_to_cpu(ic6->icmp6_dataun.un_data32[0]); + switch (ic6->icmp6_code) { + case ICMPV6_HDR_FIELD: + ic4->type =3D ICMP_PARAMETERPROB; + ic4->code =3D 0; + err =3D ipxlat_64_map_icmp_param_prob(ptr6, &ptr4); + if (unlikely(err)) + return err; + ipxlat_icmp4_set_param_ptr(ic4, ptr4); + break; + case ICMPV6_UNK_NEXTHDR: + ic4->type =3D ICMP_DEST_UNREACH; + ic4->code =3D ICMP_PROT_UNREACH; + ic4->un.gateway =3D 0; + break; + default: + return -EINVAL; + } + *ie_forbidden =3D true; + return 0; + default: + return -EINVAL; + } +} + +static __sum16 +ipxlat_64_compute_icmp_info_csum(const struct ipv6hdr *in6, + const struct icmp6hdr *in_icmp6, + const struct icmphdr *out_icmp4, + unsigned int l4_len) { struct icmp6hdr icmp6_zero; struct icmphdr icmp4_zero; @@ -82,11 +217,315 @@ static int ipxlat_64_icmp_info(struct sk_buff *skb, c= onst struct ipv6hdr *in6) return 0; } =20 -int ipxlat_64_icmp(struct ipxlat_priv *ipxl, struct sk_buff *skb, +static int ipxlat_64_icmp_inner_info(struct sk_buff *skb, + unsigned int inner_l4_off) +{ + struct icmphdr *ic4; + struct icmp6hdr ic6; + int err; + + /* inner header alignment is not guaranteed */ + memcpy(&ic6, skb->data + inner_l4_off, sizeof(ic6)); + ic4 =3D (struct icmphdr *)(skb->data + inner_l4_off); + err =3D ipxlat_64_map_icmp_info_type_code(&ic6, ic4); + if (unlikely(err)) + return err; + + ic4->checksum =3D 0; + ic4->checksum =3D csum_fold(skb_checksum(skb, inner_l4_off, + skb->len - inner_l4_off, 0)); + skb->ip_summed =3D CHECKSUM_NONE; + return 0; +} + +static int ipxlat_64_icmp_inner_l4(struct sk_buff *skb, + unsigned int inner_l4_off, + const struct iphdr *inner4, + const struct ipv6hdr *inner6) +{ + struct tcphdr *tcp; + struct udphdr *udp; + + switch (inner4->protocol) { + case IPPROTO_TCP: + tcp =3D (struct tcphdr *)(skb->data + inner_l4_off); + return ipxlat_64_inner_tcp(skb, inner6, inner4, tcp); + case IPPROTO_UDP: + udp =3D (struct udphdr *)(skb->data + 
inner_l4_off); + return ipxlat_64_inner_udp(skb, inner6, inner4, udp); + case IPPROTO_ICMP: + return ipxlat_64_icmp_inner_info(skb, inner_l4_off); + default: + return 0; + } +} + +static int ipxlat_64_icmp_inner(struct ipxlat_priv *ipxlat, struct sk_buff= *skb, + int *inner_delta) +{ + unsigned int old_prefix, new_prefix, inner_l3_len, inner_tot_len, + inner_l4_payload, outer_prefix, inner_l3_off, inner_l4_old_off; + const unsigned int outer_l3_len =3D skb_transport_offset(skb); + const struct ipxlat_cb *cb =3D ipxlat_skb_cb(skb); + const struct iphdr outer4_copy =3D *ip_hdr(skb); + bool has_inner_frag, first_inner_frag, mf, df; + struct frag_hdr inner_fragh; + struct ipv6hdr inner6; + struct iphdr *inner4; + __be32 saddr, daddr; + u16 frag_off; + u8 inner_l4_proto; + __be16 frag_id; + int err; + + inner_l3_off =3D cb->inner_l3_offset; + inner_l4_old_off =3D cb->inner_l4_offset; + inner_l3_len =3D inner_l4_old_off - inner_l3_off; + outer_prefix =3D inner_l3_off; + + inner_l4_proto =3D ipxlat_64_map_nexthdr_proto(cb->inner_l4_proto); + has_inner_frag =3D !!cb->inner_fragh_off; + + /* inner header alignment is not guaranteed */ + memcpy(&inner6, skb->data + outer_prefix, sizeof(inner6)); + + first_inner_frag =3D true; + if (unlikely(has_inner_frag)) { + memcpy(&inner_fragh, skb->data + cb->inner_fragh_off, + sizeof(inner_fragh)); + first_inner_frag =3D ipxlat_is_first_frag6(&inner_fragh); + } + + err =3D ipxlat_64_convert_addrs(&ipxlat->xlat_prefix6, &inner6, false, + &saddr, &daddr); + if (unlikely(err)) + return err; + + old_prefix =3D outer_prefix + inner_l3_len; + new_prefix =3D outer_prefix + sizeof(struct iphdr); + *inner_delta =3D (int)new_prefix - (int)old_prefix; + + /* unlike 46, inner 6->4 always shrinks quoted L3 size */ + skb_pull(skb, old_prefix); + skb_push(skb, new_prefix); + /* outer 6->4 translation already set network/transport headers, but + * inner relayout pulls/pushes again and changes skb->data placement. 
+ * Reinitialize outer header offsets so ip{,v6}_hdr/icmp{,6}_hdr and + * skb_transport_offset keep pointing to the outer packet. + */ + skb_reset_network_header(skb); + skb_set_transport_header(skb, outer_l3_len); + + *ip_hdr(skb) =3D outer4_copy; + + inner4 =3D (struct iphdr *)(skb->data + outer_prefix); + inner_tot_len =3D ntohs(inner6.payload_len) + sizeof(inner6) - + inner_l3_len + sizeof(struct iphdr); + /* RFC 7915 Section 5.1 */ + if (likely(!has_inner_frag)) { + df =3D inner_tot_len > (IPV6_MIN_MTU - sizeof(struct iphdr)); + inner4->frag_off =3D ipxlat_build_frag4_offset(df, false, 0); + } else { + mf =3D !!(be16_to_cpu(inner_fragh.frag_off) & IP6_MF); + frag_off =3D ipxlat_get_frag6_offset(&inner_fragh); + inner4->frag_off =3D + ipxlat_build_frag4_offset(false, mf, frag_off); + } + + /* keep low 16 bits of IPv6 Fragment ID as numeric value, then re-encode + * to network-order IPv4 ID + */ + frag_id =3D has_inner_frag ? + cpu_to_be16(be32_to_cpu(inner_fragh.identification)) : + 0; + ipxlat_64_build_l3(inner4, &inner6, inner_tot_len, inner4->frag_off, + inner_l4_proto, saddr, daddr, inner6.hop_limit, + frag_id); + + if (likely(!has_inner_frag)) { + inner4->id =3D 0; + __ip_select_ident(dev_net(ipxlat->dev), inner4, 1); + inner4->check =3D 0; + inner4->check =3D ip_fast_csum(inner4, inner4->ihl); + } + + if (unlikely(!first_inner_frag)) + return 0; + + inner_l4_payload =3D new_prefix + ipxlat_l4_min_len(inner4->protocol); + if (unlikely(skb_ensure_writable(skb, inner_l4_payload))) + return -ENOMEM; + + return ipxlat_64_icmp_inner_l4(skb, new_prefix, inner4, &inner6); +} + +/* Rebuild ICMPv4 quoted-datagram/extensions after inner 6->4 translation. + * + * The inner rewrite changes the quoted datagram length. This helper updat= es + * the RFC 4884 delimiter/padding and extension bytes, then enforces the + * IPv4 ICMP error size cap. 
+ * + * This is intentionally not a mirror of ipxlat_46_icmp_squeeze_ext: + * - 4->6 always writes icmp6_datagram_len (either computed or 0). + * - 6->4 updates ICMPv4 datagram-length only when extensions are allowed. + * Some mapped ICMPv6 errors set ie_forbidden, and in that case we keep = the + * ICMPv4 header semantics for that type/code and only relayout/trim pay= load. + */ +static int ipxlat_64_squeeze_icmp_ext(struct sk_buff *skb, + unsigned int icmp6_ipl, int inner_delta, + bool ie_forbidden) +{ + unsigned int outer_hdrs_len, payload_len, icmp4_iel_in, icmp4_iel_out; + unsigned int out_pad, max_iel, pkt_len_cap, icmp4_ipl_out_bytes; + unsigned int icmp4_ipl_out =3D 0, icmp4_ipl_in_bytes; + unsigned int new_tot_len; + int icmp4_ipl_in, err; + struct icmphdr *ic4; + struct iphdr *iph4; + + if (likely(!icmp6_ipl)) + goto finalize; + + outer_hdrs_len =3D skb_transport_offset(skb) + sizeof(struct icmphdr); + if (unlikely(skb->len < outer_hdrs_len)) + return -EINVAL; + + payload_len =3D skb->len - outer_hdrs_len; + icmp4_ipl_in =3D (int)icmp6_ipl + inner_delta; + if (unlikely(icmp4_ipl_in < 0)) + return -EINVAL; + icmp4_ipl_in_bytes =3D icmp4_ipl_in; + if (unlikely(icmp4_ipl_in_bytes > payload_len)) + return -EINVAL; + + if (likely(icmp4_ipl_in_bytes =3D=3D payload_len)) + goto finalize; + + icmp4_iel_in =3D payload_len - icmp4_ipl_in_bytes; + max_iel =3D IPXLAT_ICMP4_ERROR_MAX_LEN - + (outer_hdrs_len + ICMP_EXT_ORIG_DGRAM_MIN_LEN); + + if (unlikely(ie_forbidden)) { + icmp4_ipl_out_bytes =3D icmp4_ipl_in_bytes; + out_pad =3D 0; + icmp4_iel_out =3D 0; + } else if (unlikely(icmp4_iel_in > max_iel)) { + pkt_len_cap =3D min_t(unsigned int, skb->len - icmp4_iel_in, + IPXLAT_ICMP4_ERROR_MAX_LEN); + icmp4_ipl_out_bytes =3D pkt_len_cap - outer_hdrs_len; + out_pad =3D 0; + icmp4_iel_out =3D 0; + icmp4_ipl_out =3D 0; + } else { + pkt_len_cap =3D min_t(unsigned int, skb->len, + IPXLAT_ICMP4_ERROR_MAX_LEN); + icmp4_ipl_out_bytes =3D + round_down(pkt_len_cap - icmp4_iel_in - 
outer_hdrs_len, + sizeof(u32)); + out_pad =3D max_t(unsigned int, ICMP_EXT_ORIG_DGRAM_MIN_LEN, + icmp4_ipl_out_bytes) - + icmp4_ipl_out_bytes; + icmp4_iel_out =3D icmp4_iel_in; + /* RFC 4884 field is in 32-bit units for ICMPv4 errors */ + icmp4_ipl_out =3D (icmp4_ipl_out_bytes + out_pad) >> 2; + } + + /* if no extension bytes are copied and no pad is written, relayout only + * trims/updates lengths and does not require full data writability + */ + if (unlikely(icmp4_iel_out || out_pad)) { + err =3D skb_ensure_writable(skb, skb->len); + if (unlikely(err)) + return err; + } + + err =3D ipxlat_icmp_relayout(skb, outer_hdrs_len, icmp4_ipl_in_bytes, + icmp4_iel_in, icmp4_ipl_out_bytes, out_pad, + icmp4_iel_out); + if (unlikely(err)) + return err; + +finalize: + if (!ie_forbidden) { + ic4 =3D icmp_hdr(skb); + ic4->un.reserved[1] =3D icmp4_ipl_out; + } + + if (unlikely(skb->len > IPXLAT_ICMP4_ERROR_MAX_LEN)) { + err =3D pskb_trim(skb, IPXLAT_ICMP4_ERROR_MAX_LEN); + if (unlikely(err)) + return err; + } + + iph4 =3D ip_hdr(skb); + new_tot_len =3D skb->len; + if (unlikely(be16_to_cpu(iph4->tot_len) !=3D new_tot_len)) { + iph4->tot_len =3D cpu_to_be16(new_tot_len); + /* relayout/trim may invalidate precomputed DF decision */ + iph4->frag_off &=3D cpu_to_be16(~IP_DF); + iph4->check =3D 0; + iph4->check =3D ip_fast_csum(iph4, iph4->ihl); + } + + return 0; +} + +/** + * ipxlat_64_icmp_error - translate ICMPv6 error payload to ICMPv4 error f= orm + * @ipxlat: translator private context + * @skb: packet carrying outer ICMPv6 error + * + * Rewrites the quoted inner datagram in place, maps type/code/fields and + * adjusts RFC 4884 datagram/extension layout before recomputing outer che= cksum. + * + * Return: 0 on success, negative errno on translation failure. 
+ */ +static int ipxlat_64_icmp_error(struct ipxlat_priv *ipxlat, struct sk_buff= *skb) +{ + const struct ipxlat_cb *cb =3D ipxlat_skb_cb(skb); + const struct icmp6hdr ic6 =3D *icmp6_hdr(skb); + unsigned int icmp6_ipl; + int inner_delta, err; + struct icmphdr *ic4; + bool ie_forbidden; + + if (unlikely(!(cb->is_icmp_err))) { + DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + } + + /* translate quoted inner packet headers */ + err =3D ipxlat_64_icmp_inner(ipxlat, skb, &inner_delta); + if (unlikely(err)) + return err; + + /* build outer ICMPv4 error header after inner relayout */ + ic4 =3D (struct icmphdr *)(skb->data + skb_transport_offset(skb)); + err =3D ipxlat_64_build_icmp4_errhdr(ipxlat, skb, &ic6, ic4, + &ie_forbidden); + if (unlikely(err)) + return err; + + icmp6_ipl =3D ic6.icmp6_datagram_len << 3; + err =3D ipxlat_64_squeeze_icmp_ext(skb, icmp6_ipl, inner_delta, + ie_forbidden); + if (unlikely(err)) + return err; + + /* recompute whole ICMPv4 checksum after error-path relayout */ + ic4->checksum =3D 0; + ic4->checksum =3D csum_fold(skb_checksum(skb, skb_transport_offset(skb), + ipxlat_skb_datagram_len(skb), + 0)); + skb->ip_summed =3D CHECKSUM_NONE; + return 0; +} + +int ipxlat_64_icmp(struct ipxlat_priv *ipxlat, struct sk_buff *skb, const struct ipv6hdr *in6) { if (unlikely(ipxlat_skb_cb(skb)->is_icmp_err)) - return -EPROTONOSUPPORT; + return ipxlat_64_icmp_error(ipxlat, skb); =20 return ipxlat_64_icmp_info(skb, in6); } diff --git a/drivers/net/ipxlat/transport.c b/drivers/net/ipxlat/transport.c index 3aa00c635916..82aedfb0ee48 100644 --- a/drivers/net/ipxlat/transport.c +++ b/drivers/net/ipxlat/transport.c @@ -87,6 +87,67 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr, skb_checksum(skb, l4_off, l4_len, 0)); } =20 +static int ipxlat_ensure_tailroom(struct sk_buff *skb, const unsigned int = grow) +{ + int err; + + if (!grow || skb_tailroom(skb) >=3D grow) + return 0; + + /* tail growth may reallocate backing storage and move skb data */ + err =3D 
pskb_expand_head(skb, 0, grow - skb_tailroom(skb), GFP_ATOMIC); + if (unlikely(err)) + return err; + + return 0; +} + +/* Rewrite quoted datagram layout after inner translation in ICMP errors. + * + * Caller provides old/new quoted lengths and extension lengths; this help= er + * only does byte moves/padding/trim while preserving extension bytes at t= he + * end of the packet when present + */ +int ipxlat_icmp_relayout(struct sk_buff *skb, unsigned int outer_len, + unsigned int in_ipl, unsigned int in_iel, + unsigned int out_ipl, unsigned int out_pad, + unsigned int out_iel) +{ + const unsigned int in_ie_off =3D outer_len + in_ipl, old_len =3D skb->len; + const unsigned int new_len =3D outer_len + out_ipl + out_pad + out_iel; + const unsigned int out_ie_off =3D outer_len + out_ipl + out_pad; + unsigned int grow =3D 0; + int err; + + /* new_len > old_len here means "we need extra bytes on top of + * already-translated length", mainly due to padding/layout decisions + * while keeping extensions + */ + if (unlikely(new_len > old_len)) { + grow =3D new_len - old_len; + + err =3D ipxlat_ensure_tailroom(skb, grow); + if (unlikely(err)) + return err; + + __skb_put(skb, grow); + } + + if (unlikely(out_iel)) + memmove(skb->data + out_ie_off, skb->data + in_ie_off, out_iel); + + if (unlikely(out_pad)) + memset(skb->data + outer_len + out_ipl, 0, out_pad); + + if (unlikely(new_len < old_len)) { + err =3D pskb_trim(skb, new_len); + if (unlikely(err)) + return err; + } + + return 0; +} + /* Normalize checksum/offload metadata after address-family translation.
* * Translation changes protocol family but keeps transport payload semanti= cs diff --git a/drivers/net/ipxlat/transport.h b/drivers/net/ipxlat/transport.h index 9b6fe422b01f..09f522696eea 100644 --- a/drivers/net/ipxlat/transport.h +++ b/drivers/net/ipxlat/transport.h @@ -63,6 +63,25 @@ __sum16 ipxlat_l4_csum_ipv6(const struct in6_addr *saddr, const struct sk_buff *skb, unsigned int l4_off, unsigned int l4_len, u8 proto); =20 +/** + * ipxlat_icmp_relayout - resize quoted ICMP payload/extensions in place + * @skb: packet buffer + * @outer_len: offset to quoted datagram start + * @in_ipl: input datagram payload length + * @in_iel: input extension length + * @out_ipl: output datagram payload length + * @out_pad: output pad bytes between datagram and extensions + * @out_iel: output extension length + * + * This helper may move payload bytes and adjust skb tail length. + * + * Return: 0 on success, negative errno on resize/memory failures. + */ +int ipxlat_icmp_relayout(struct sk_buff *skb, unsigned int outer_len, + unsigned int in_ipl, unsigned int in_iel, + unsigned int out_ipl, unsigned int out_pad, + unsigned int out_iel); + /** * ipxlat_finalize_offload - normalize checksum/GSO metadata after transla= tion * @skb: translated packet --=20 2.53.0
From nobody Mon Apr 6 10:42:06 2026
From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , Donald Hunter , Jakub Kicinski , "David S. Miller" , Eric Dumazet , Paolo Abeni , Simon Horman , Andrew Lunn , linux-kernel@vger.kernel.org Subject: [RFC net-next 13/15] ipxlat: add netlink control plane and uapi Date: Thu, 19 Mar 2026 16:12:22 +0100 Message-ID: <20260319151230.655687-14-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable
Expose runtime configuration through netlink with validated set/get/dump operations and generated policy glue from the YAML spec. The API configures the translator prefix and MTU threshold used by the data path.
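For illustration only, the attribute limits declared in the YAML spec (a 16-byte binary prefix, prefix-len of at most 96, lowest-ipv6-mtu of at least 1280) can be restated as a small user-space check. This is a sketch and not part of the patch: struct sketch_cfg, sketch_cfg_valid() and the SKETCH_* names are hypothetical, and in the kernel these bounds are enforced by the generated nla_policy tables rather than by hand-written code.

```c
/* Illustrative user-space sketch only; not part of this patch. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Limits copied from Documentation/netlink/specs/ipxlat.yaml. */
#define SKETCH_PREFIX6_BYTES       16U   /* pool.prefix: exact-len 16 */
#define SKETCH_PREFIX6_MAX_LEN     96U   /* pool.prefix-len: max 96 */
#define SKETCH_LOWEST_IPV6_MTU_MIN 1280U /* cfg.lowest-ipv6-mtu: min 1280 */

/* Hypothetical container for the fields of one dev-set request. */
struct sketch_cfg {
	size_t   prefix_bytes;    /* size of the binary prefix attribute */
	uint8_t  prefix_len;      /* translation prefix length in bits */
	uint32_t lowest_ipv6_mtu; /* MTU threshold used by the data path */
};

/* Return true when the request stays within the limits listed above. */
static bool sketch_cfg_valid(const struct sketch_cfg *cfg)
{
	if (cfg->prefix_bytes != SKETCH_PREFIX6_BYTES)
		return false;
	if (cfg->prefix_len > SKETCH_PREFIX6_MAX_LEN)
		return false;
	if (cfg->lowest_ipv6_mtu < SKETCH_LOWEST_IPV6_MTU_MIN)
		return false;
	return true;
}
```

The sketch covers only the bounds stated in the spec. Whether the dev-set handler applies further checks on the prefix length is not visible from the spec alone, so none are assumed here.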
Signed-off-by: Ralf Lici --- Documentation/netlink/specs/ipxlat.yaml | 97 +++++++ drivers/net/ipxlat/Makefile | 2 + drivers/net/ipxlat/main.c | 9 + drivers/net/ipxlat/netlink-gen.c | 71 +++++ drivers/net/ipxlat/netlink-gen.h | 31 +++ drivers/net/ipxlat/netlink.c | 348 ++++++++++++++++++++++++ drivers/net/ipxlat/netlink.h | 27 ++ drivers/net/ipxlat/translate_46.c | 3 +- include/uapi/linux/ipxlat.h | 48 ++++ 9 files changed, 635 insertions(+), 1 deletion(-) create mode 100644 Documentation/netlink/specs/ipxlat.yaml create mode 100644 drivers/net/ipxlat/netlink-gen.c create mode 100644 drivers/net/ipxlat/netlink-gen.h create mode 100644 drivers/net/ipxlat/netlink.c create mode 100644 drivers/net/ipxlat/netlink.h create mode 100644 include/uapi/linux/ipxlat.h diff --git a/Documentation/netlink/specs/ipxlat.yaml b/Documentation/netlin= k/specs/ipxlat.yaml new file mode 100644 index 000000000000..d0df5ef16e04 --- /dev/null +++ b/Documentation/netlink/specs/ipxlat.yaml @@ -0,0 +1,97 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Cla= use) +# +# Copyright (C) 2026- Mandelbit SRL +# +# Author: Antonio Quartulli +# Ralf Lici +# +--- +name: ipxlat +protocol: genetlink +doc: Netlink protocol to control IPXLAT (SIIT) network devices. + +definitions: + - + type: const + name: xlat-prefix6-max-prefix-len + value: 96 + doc: Maximum prefix length accepted for xlat-prefix6. + +attribute-sets: + - + name: pool + attributes: + - + name: prefix + type: binary + checks: + exact-len: 16 + - + name: prefix-len + type: u8 + checks: + max: xlat-prefix6-max-prefix-len + - + name: cfg + attributes: + - + name: xlat-prefix6 + type: nest + doc: IPv6 translation prefix. + nested-attributes: pool + - + name: lowest-ipv6-mtu + type: u32 + checks: + min: 1280 + - + name: dev + attributes: + - + name: ifindex + type: u32 + doc: Index of the ipxlat interface to operate on. + - + name: netnsid + type: s32 + doc: ID of the netns the device lives in. 
+ - + name: config + type: nest + doc: Ipxlat device configuration. + nested-attributes: cfg + +operations: + list: + - + name: dev-get + attribute-set: dev + doc: Get / dump configuration of ipxlat devices. + do: + pre: ipxlat-nl-pre-doit + post: ipxlat-nl-post-doit + request: + attributes: + - ifindex + reply: &dev-all + attributes: + - ifindex + - netnsid + - config + dump: + reply: *dev-all + + - + name: dev-set + doc: Set configuration of an ipxlat device. + attribute-set: dev + flags: [admin-perm] + do: + request: + attributes: + - ifindex + - config + reply: + attributes: [] + pre: ipxlat-nl-pre-doit + post: ipxlat-nl-post-doit diff --git a/drivers/net/ipxlat/Makefile b/drivers/net/ipxlat/Makefile index 2ded504902e3..b906d5698351 100644 --- a/drivers/net/ipxlat/Makefile +++ b/drivers/net/ipxlat/Makefile @@ -13,3 +13,5 @@ ipxlat-objs +=3D translate_46.o ipxlat-objs +=3D translate_64.o ipxlat-objs +=3D icmp_46.o ipxlat-objs +=3D icmp_64.o +ipxlat-objs +=3D netlink.o +ipxlat-objs +=3D netlink-gen.o diff --git a/drivers/net/ipxlat/main.c b/drivers/net/ipxlat/main.c index a1b4bcd39478..bef67ed634b6 100644 --- a/drivers/net/ipxlat/main.c +++ b/drivers/net/ipxlat/main.c @@ -18,6 +18,7 @@ #include "dispatch.h" #include "ipxlpriv.h" #include "main.h" +#include "netlink.h" =20 MODULE_AUTHOR("Alberto Leiva Popper "); MODULE_AUTHOR("Antonio Quartulli "); @@ -127,11 +128,19 @@ static int __init ipxlat_init(void) return err; } =20 + err =3D ipxlat_nl_register(); + if (err) { + pr_err("ipxlat: failed to register netlink family: %d\n", err); + rtnl_link_unregister(&ipxlat_link_ops); + return err; + } + return 0; } =20 static void __exit ipxlat_exit(void) { + ipxlat_nl_unregister(); rtnl_link_unregister(&ipxlat_link_ops); } =20 diff --git a/drivers/net/ipxlat/netlink-gen.c b/drivers/net/ipxlat/netlink-= gen.c new file mode 100644 index 000000000000..e2cfaa6bb4dc --- /dev/null +++ b/drivers/net/ipxlat/netlink-gen.c @@ -0,0 +1,71 @@ +// SPDX-License-Identifier: ((GPL-2.0 WITH 
Linux-syscall-note) OR BSD-3-Cl= ause) +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/ipxlat.yaml */ +/* YNL-GEN kernel source */ +/* To regenerate run: tools/net/ynl/ynl-regen.sh */ + +#include +#include + +#include "netlink-gen.h" + +#include + +/* Common nested types */ +const struct nla_policy ipxlat_cfg_nl_policy[IPXLAT_A_CFG_LOWEST_IPV6_MTU = + 1] =3D { + [IPXLAT_A_CFG_XLAT_PREFIX6] =3D NLA_POLICY_NESTED(ipxlat_pool_nl_policy), + [IPXLAT_A_CFG_LOWEST_IPV6_MTU] =3D NLA_POLICY_MIN(NLA_U32, 1280), +}; + +const struct nla_policy ipxlat_pool_nl_policy[IPXLAT_A_POOL_PREFIX_LEN + 1= ] =3D { + [IPXLAT_A_POOL_PREFIX] =3D NLA_POLICY_EXACT_LEN(16), + [IPXLAT_A_POOL_PREFIX_LEN] =3D NLA_POLICY_MAX(NLA_U8, IPXLAT_XLAT_PREFIX6= _MAX_PREFIX_LEN), +}; + +/* IPXLAT_CMD_DEV_GET - do */ +static const struct nla_policy ipxlat_dev_get_nl_policy[IPXLAT_A_DEV_IFIND= EX + 1] =3D { + [IPXLAT_A_DEV_IFINDEX] =3D { .type =3D NLA_U32, }, +}; + +/* IPXLAT_CMD_DEV_SET - do */ +static const struct nla_policy ipxlat_dev_set_nl_policy[IPXLAT_A_DEV_CONFI= G + 1] =3D { + [IPXLAT_A_DEV_IFINDEX] =3D { .type =3D NLA_U32, }, + [IPXLAT_A_DEV_CONFIG] =3D NLA_POLICY_NESTED(ipxlat_cfg_nl_policy), +}; + +/* Ops table for ipxlat */ +static const struct genl_split_ops ipxlat_nl_ops[] =3D { + { + .cmd =3D IPXLAT_CMD_DEV_GET, + .pre_doit =3D ipxlat_nl_pre_doit, + .doit =3D ipxlat_nl_dev_get_doit, + .post_doit =3D ipxlat_nl_post_doit, + .policy =3D ipxlat_dev_get_nl_policy, + .maxattr =3D IPXLAT_A_DEV_IFINDEX, + .flags =3D GENL_CMD_CAP_DO, + }, + { + .cmd =3D IPXLAT_CMD_DEV_GET, + .dumpit =3D ipxlat_nl_dev_get_dumpit, + .flags =3D GENL_CMD_CAP_DUMP, + }, + { + .cmd =3D IPXLAT_CMD_DEV_SET, + .pre_doit =3D ipxlat_nl_pre_doit, + .doit =3D ipxlat_nl_dev_set_doit, + .post_doit =3D ipxlat_nl_post_doit, + .policy =3D ipxlat_dev_set_nl_policy, + .maxattr =3D IPXLAT_A_DEV_CONFIG, + .flags =3D GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, +}; + +struct genl_family ipxlat_nl_family 
__ro_after_init =3D { + .name =3D IPXLAT_FAMILY_NAME, + .version =3D IPXLAT_FAMILY_VERSION, + .netnsok =3D true, + .parallel_ops =3D true, + .module =3D THIS_MODULE, + .split_ops =3D ipxlat_nl_ops, + .n_split_ops =3D ARRAY_SIZE(ipxlat_nl_ops), +}; diff --git a/drivers/net/ipxlat/netlink-gen.h b/drivers/net/ipxlat/netlink-= gen.h new file mode 100644 index 000000000000..2a766d05e0b4 --- /dev/null +++ b/drivers/net/ipxlat/netlink-gen.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Cl= ause) */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/ipxlat.yaml */ +/* YNL-GEN kernel header */ +/* To regenerate run: tools/net/ynl/ynl-regen.sh */ + +#ifndef _LINUX_IPXLAT_GEN_H +#define _LINUX_IPXLAT_GEN_H + +#include +#include + +#include + +/* Common nested types */ +extern const struct nla_policy ipxlat_cfg_nl_policy[IPXLAT_A_CFG_LOWEST_IP= V6_MTU + 1]; +extern const struct nla_policy ipxlat_pool_nl_policy[IPXLAT_A_POOL_PREFIX_= LEN + 1]; + +int ipxlat_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *s= kb, + struct genl_info *info); +void +ipxlat_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, + struct genl_info *info); + +int ipxlat_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info); +int ipxlat_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback = *cb); +int ipxlat_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info); + +extern struct genl_family ipxlat_nl_family; + +#endif /* _LINUX_IPXLAT_GEN_H */ diff --git a/drivers/net/ipxlat/netlink.c b/drivers/net/ipxlat/netlink.c new file mode 100644 index 000000000000..02d097726f22 --- /dev/null +++ b/drivers/net/ipxlat/netlink.c @@ -0,0 +1,348 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Antonio Quartulli + * Daniel Gr=C3=B6ber 
+ * Ralf Lici + */ + +#include +#include + +#include + +#include "netlink.h" +#include "main.h" +#include "netlink-gen.h" +#include "ipxlpriv.h" + +MODULE_ALIAS_GENL_FAMILY(IPXLAT_FAMILY_NAME); + +struct ipxlat_nl_info_ctx { + struct ipxlat_priv *ipxlat; + netdevice_tracker tracker; +}; + +struct ipxlat_nl_dump_ctx { + unsigned long last_ifindex; +}; + +/** + * ipxlat_get_from_attrs - retrieve ipxlat private data for target netdev + * @net: network namespace where to look for the interface + * @info: generic netlink info from the user request + * @tracker: tracker object to be used for the netdev reference acquisition + * + * Return: the ipxlat private data, if found, or an error otherwise + */ +static struct ipxlat_priv *ipxlat_get_from_attrs(struct net *net, + struct genl_info *info, + netdevice_tracker *tracker) +{ + struct ipxlat_priv *ipxlat; + struct net_device *dev; + int ifindex; + + if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_IFINDEX)) + return ERR_PTR(-EINVAL); + ifindex =3D nla_get_u32(info->attrs[IPXLAT_A_DEV_IFINDEX]); + + rcu_read_lock(); + dev =3D dev_get_by_index_rcu(net, ifindex); + if (!dev) { + rcu_read_unlock(); + NL_SET_ERR_MSG_MOD(info->extack, + "ifindex does not match any interface"); + return ERR_PTR(-ENODEV); + } + + if (!ipxlat_dev_is_valid(dev)) { + rcu_read_unlock(); + NL_SET_ERR_MSG_MOD(info->extack, + "specified interface is not ipxlat"); + NL_SET_BAD_ATTR(info->extack, + info->attrs[IPXLAT_A_DEV_IFINDEX]); + return ERR_PTR(-EINVAL); + } + + ipxlat =3D netdev_priv(dev); + netdev_hold(dev, tracker, GFP_ATOMIC); + rcu_read_unlock(); + + return ipxlat; +} + +int ipxlat_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *s= kb, + struct genl_info *info) +{ + struct ipxlat_nl_info_ctx *ctx =3D (struct ipxlat_nl_info_ctx *)info->ctx; + struct ipxlat_priv *ipxlat; + + BUILD_BUG_ON(sizeof(*ctx) > sizeof(info->ctx)); + + ipxlat =3D ipxlat_get_from_attrs(genl_info_net(info), info, + &ctx->tracker); + if (IS_ERR(ipxlat)) + return 
PTR_ERR(ipxlat); + + ctx->ipxlat =3D ipxlat; + return 0; +} + +void ipxlat_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff = *skb, + struct genl_info *info) +{ + struct ipxlat_nl_info_ctx *ctx =3D (struct ipxlat_nl_info_ctx *)info->ctx; + + if (ctx->ipxlat) + netdev_put(ctx->ipxlat->dev, &ctx->tracker); +} + +static int ipxlat_nl_send_dev(struct sk_buff *skb, struct ipxlat_priv *ipx= lat, + struct net *src_net, const u32 portid, + const u32 seq, int flags) +{ + struct nlattr *attr_cfg, *attr_pool; + struct ipv6_prefix xlat_prefix6; + int id, ret =3D -EMSGSIZE; + u32 lowest_ipv6_mtu; + void *hdr; + + /* snapshot settings under lock so userspace sees a coherent state */ + mutex_lock(&ipxlat->cfg_lock); + xlat_prefix6 =3D ipxlat->xlat_prefix6; + lowest_ipv6_mtu =3D ipxlat->lowest_ipv6_mtu; + mutex_unlock(&ipxlat->cfg_lock); + + hdr =3D genlmsg_put(skb, portid, seq, &ipxlat_nl_family, flags, + IPXLAT_CMD_DEV_GET); + if (!hdr) + return -ENOBUFS; + + if (nla_put_u32(skb, IPXLAT_A_DEV_IFINDEX, ipxlat->dev->ifindex)) + goto err; + + if (!net_eq(src_net, dev_net(ipxlat->dev))) { + id =3D peernet2id_alloc(src_net, dev_net(ipxlat->dev), + GFP_ATOMIC); + if (id < 0) { + ret =3D id; + goto err; + } + if (nla_put_s32(skb, IPXLAT_A_DEV_NETNSID, id)) + goto err; + } + + attr_cfg =3D nla_nest_start(skb, IPXLAT_A_DEV_CONFIG); + if (!attr_cfg) + goto err; + + attr_pool =3D nla_nest_start(skb, IPXLAT_A_CFG_XLAT_PREFIX6); + if (!attr_pool) + goto err; + + if (nla_put_in6_addr(skb, IPXLAT_A_POOL_PREFIX, &xlat_prefix6.addr) || + nla_put_u8(skb, IPXLAT_A_POOL_PREFIX_LEN, xlat_prefix6.len)) + goto err; + + nla_nest_end(skb, attr_pool); + + if (nla_put_u32(skb, IPXLAT_A_CFG_LOWEST_IPV6_MTU, lowest_ipv6_mtu)) + goto err; + + nla_nest_end(skb, attr_cfg); + genlmsg_end(skb, hdr); + + return 0; +err: + genlmsg_cancel(skb, hdr); + return ret; +} + +int ipxlat_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct ipxlat_nl_info_ctx *ctx =3D (struct 
ipxlat_nl_info_ctx *)info->ctx; + struct sk_buff *reply; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_IFINDEX)) + return -EINVAL; + + reply =3D nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!reply) + return -ENOMEM; + + ret =3D ipxlat_nl_send_dev(reply, ctx->ipxlat, genl_info_net(info), + info->snd_portid, info->snd_seq, 0); + if (ret < 0) { + nlmsg_free(reply); + return ret; + } + + return genlmsg_reply(reply, info); +} + +int ipxlat_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback = *cb) +{ + struct ipxlat_nl_dump_ctx *state =3D (struct ipxlat_nl_dump_ctx *)cb->ctx; + struct net *net =3D sock_net(cb->skb->sk); + netdevice_tracker tracker; + struct net_device *dev; + int ret; + + rcu_read_lock(); + for_each_netdev_dump(net, dev, state->last_ifindex) { + if (!ipxlat_dev_is_valid(dev)) + continue; + + netdev_hold(dev, &tracker, GFP_ATOMIC); + rcu_read_unlock(); + + ret =3D ipxlat_nl_send_dev(skb, netdev_priv(dev), net, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI); + + rcu_read_lock(); + netdev_put(dev, &tracker); + + if (ret < 0) { + if (skb->len > 0) + break; + rcu_read_unlock(); + return ret; + } + } + rcu_read_unlock(); + return skb->len; +} + +static int ipxlat_nl_validate_xlat_prefix6(const struct ipv6_prefix *prefi= x, + struct netlink_ext_ack *extack) +{ + if (prefix->len !=3D 32 && prefix->len !=3D 40 && prefix->len !=3D 48 && + prefix->len !=3D 56 && prefix->len !=3D 64 && prefix->len !=3D 96) { + NL_SET_ERR_MSG_FMT_MOD(extack, + "unsupported RFC 6052 prefix length: %u", + prefix->len); + return -EINVAL; + } + + return 0; +} + +static int ipxlat_nl_parse_xlat_prefix6(struct nlattr *attr, + struct ipv6_prefix *xlat_prefix6, + struct netlink_ext_ack *extack) +{ + struct nlattr *attrs_pool[IPXLAT_A_POOL_MAX + 1]; + struct ipv6_prefix new_xlat_prefix6; + int ret; + + new_xlat_prefix6 =3D *xlat_prefix6; + + ret =3D nla_parse_nested(attrs_pool, IPXLAT_A_POOL_MAX, attr, + ipxlat_pool_nl_policy, extack); + if (ret) 
+ return ret; + + if (!attrs_pool[IPXLAT_A_POOL_PREFIX] && + !attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]) { + NL_SET_ERR_MSG_MOD(extack, "xlat-prefix6 update is empty"); + return -EINVAL; + } + + if (attrs_pool[IPXLAT_A_POOL_PREFIX]) + new_xlat_prefix6.addr =3D + nla_get_in6_addr(attrs_pool[IPXLAT_A_POOL_PREFIX]); + if (attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]) + new_xlat_prefix6.len =3D + nla_get_u8(attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]); + + ret =3D ipxlat_nl_validate_xlat_prefix6(&new_xlat_prefix6, extack); + if (ret) { + if (attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]) + NL_SET_BAD_ATTR(extack, + attrs_pool[IPXLAT_A_POOL_PREFIX_LEN]); + else + NL_SET_BAD_ATTR(extack, + attrs_pool[IPXLAT_A_POOL_PREFIX]); + return ret; + } + + *xlat_prefix6 =3D new_xlat_prefix6; + return 0; +} + +int ipxlat_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct ipxlat_nl_info_ctx *ctx =3D (struct ipxlat_nl_info_ctx *)info->ctx; + struct nlattr *attrs[IPXLAT_A_CFG_MAX + 1]; + struct nlattr *xlat_prefix6_attr; + struct ipv6_prefix xlat_prefix6; + u32 lowest_ipv6_mtu; + int ret =3D 0; + + if (GENL_REQ_ATTR_CHECK(info, IPXLAT_A_DEV_CONFIG)) + return -EINVAL; + + ret =3D nla_parse_nested(attrs, IPXLAT_A_CFG_MAX, + info->attrs[IPXLAT_A_DEV_CONFIG], + ipxlat_cfg_nl_policy, info->extack); + if (ret) + return ret; + + if (!attrs[IPXLAT_A_CFG_XLAT_PREFIX6] && + !attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]) { + NL_SET_ERR_MSG_MOD(info->extack, "config update is empty"); + return -EINVAL; + } + xlat_prefix6_attr =3D attrs[IPXLAT_A_CFG_XLAT_PREFIX6]; + + mutex_lock(&ctx->ipxlat->cfg_lock); + + /* Stage updates that can fail before writing device state. + * This keeps dev-set all-or-nothing and avoids partial commits when + * xlat-prefix parsing/validation fails. 
+ */ + if (xlat_prefix6_attr) { + xlat_prefix6 =3D ctx->ipxlat->xlat_prefix6; + ret =3D ipxlat_nl_parse_xlat_prefix6(xlat_prefix6_attr, + &xlat_prefix6, + info->extack); + if (ret) + goto out_unlock; + } + + if (xlat_prefix6_attr) + ctx->ipxlat->xlat_prefix6 =3D xlat_prefix6; + if (attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]) { + lowest_ipv6_mtu =3D + nla_get_u32(attrs[IPXLAT_A_CFG_LOWEST_IPV6_MTU]); + WRITE_ONCE(ctx->ipxlat->lowest_ipv6_mtu, lowest_ipv6_mtu); + } + +out_unlock: + mutex_unlock(&ctx->ipxlat->cfg_lock); + return ret; +} + +/** + * ipxlat_nl_register - perform any needed registration in the netlink sub= system + * + * Return: 0 on success, a negative error code otherwise + */ +int __init ipxlat_nl_register(void) +{ + return genl_register_family(&ipxlat_nl_family); +} + +/** + * ipxlat_nl_unregister - undo any module wide netlink registration + */ +void ipxlat_nl_unregister(void) +{ + genl_unregister_family(&ipxlat_nl_family); +} diff --git a/drivers/net/ipxlat/netlink.h b/drivers/net/ipxlat/netlink.h new file mode 100644 index 000000000000..1ea292ad9964 --- /dev/null +++ b/drivers/net/ipxlat/netlink.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#ifndef _NET_IPXLAT_NETLINK_H_ +#define _NET_IPXLAT_NETLINK_H_ + +/** + * ipxlat_nl_register - register ipxlat generic-netlink family + * + * Return: 0 on success, negative errno on registration failure. 
+ */ +int ipxlat_nl_register(void); + +/** + * ipxlat_nl_unregister - unregister ipxlat generic-netlink family + */ +void ipxlat_nl_unregister(void); + +#endif /* _NET_IPXLAT_NETLINK_H_ */ diff --git a/drivers/net/ipxlat/translate_46.c b/drivers/net/ipxlat/transla= te_46.c index 0b79ca07c771..d625dc85576b 100644 --- a/drivers/net/ipxlat/translate_46.c +++ b/drivers/net/ipxlat/translate_46.c @@ -14,6 +14,7 @@ #include =20 #include "address.h" +#include "icmp.h" #include "packet.h" #include "transport.h" #include "translate_46.h" @@ -239,7 +240,7 @@ int ipxlat_46_translate(struct ipxlat_priv *ipxlat, str= uct sk_buff *skb) err =3D ipxlat_46_outer_udp(skb, &outer4); break; case IPPROTO_ICMP: - err =3D -EPROTONOSUPPORT; + err =3D ipxlat_46_icmp(ipxlat, skb); break; default: err =3D 0; diff --git a/include/uapi/linux/ipxlat.h b/include/uapi/linux/ipxlat.h new file mode 100644 index 000000000000..f8db3df3f9e8 --- /dev/null +++ b/include/uapi/linux/ipxlat.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Cl= ause) */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/ipxlat.yaml */ +/* YNL-GEN uapi header */ +/* To regenerate run: tools/net/ynl/ynl-regen.sh */ + +#ifndef _UAPI_LINUX_IPXLAT_H +#define _UAPI_LINUX_IPXLAT_H + +#define IPXLAT_FAMILY_NAME "ipxlat" +#define IPXLAT_FAMILY_VERSION 1 + +#define IPXLAT_XLAT_PREFIX6_MAX_PREFIX_LEN 96 + +enum { + IPXLAT_A_POOL_PREFIX =3D 1, + IPXLAT_A_POOL_PREFIX_LEN, + + __IPXLAT_A_POOL_MAX, + IPXLAT_A_POOL_MAX =3D (__IPXLAT_A_POOL_MAX - 1) +}; + +enum { + IPXLAT_A_CFG_XLAT_PREFIX6 =3D 1, + IPXLAT_A_CFG_LOWEST_IPV6_MTU, + + __IPXLAT_A_CFG_MAX, + IPXLAT_A_CFG_MAX =3D (__IPXLAT_A_CFG_MAX - 1) +}; + +enum { + IPXLAT_A_DEV_IFINDEX =3D 1, + IPXLAT_A_DEV_NETNSID, + IPXLAT_A_DEV_CONFIG, + + __IPXLAT_A_DEV_MAX, + IPXLAT_A_DEV_MAX =3D (__IPXLAT_A_DEV_MAX - 1) +}; + +enum { + IPXLAT_CMD_DEV_GET =3D 1, + IPXLAT_CMD_DEV_SET, + + __IPXLAT_CMD_MAX, + IPXLAT_CMD_MAX =3D 
(__IPXLAT_CMD_MAX - 1) +}; + +#endif /* _UAPI_LINUX_IPXLAT_H */ --=20 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-206.mailbox.org (mout-b-206.mailbox.org [195.10.208.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7AFC3DC4B1; Thu, 19 Mar 2026 15:19:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933588; cv=none; b=PP8aO6shN7F4IQYldClRtSx5OnzaOjpeDQyFJQmuLIdsll9ZuB4uIni4boGAVhuyC575VGzDVp3tqX5JecytMPyk2c/aPfvHOnkGSuL8Eqqes5xPphWbYZEsiPS2Dmvl9fkbyWrC6H0p8S7X3pS7CAPBmvShhbi4ueTQw++oOWw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933588; c=relaxed/simple; bh=MsKo4Zoo02SIR8owpdwiyrumQgy7wYQYv3uePrWS/tI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=DukSQvCHD7+Kl5jPsxHxvZJYUE+D1I6xxHk/lVuoCnLchVB+yNL3hbnWa5xMBYufR7LfRp80q6EFgbs1skqh+Tv+/UAKOKhbcyibyiQDYqjNJE49d01TzPhNeQc8UDnHisAOO1dJ/HXJnAVlpWK5MYKpWfN7yKnaZBNQeayeme8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=nr18a5m0; arc=none smtp.client-ip=195.10.208.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="nr18a5m0" Received: from smtp102.mailbox.org (smtp102.mailbox.org [10.196.197.102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) 
server-digest SHA256) (No client certificate requested) by mout-b-206.mailbox.org (Postfix) with ESMTPS id 4fc8Nr1fGFz9xrR; Thu, 19 Mar 2026 16:14:04 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933244; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FxqOcZZ5MjGkDOCx3+Lq7zc+VMke3dRIIT4JhJAl1YI=; b=nr18a5m0SewJuROb8YETETTBY8sg4B482rIvyaEHXt5gw+hFc4pbIAyjzB86yCKrjQBpDe TiN5WDn8eXvG635ub3dUZ/RJ+A36tszTYfvx/FG0X+3Z7U0CiVefN2aAN0Dy009KFiJRh6 cwB0t0XJpUkYw1EcXPpuyyCUgTlKWHu2QMqcfQ+8lcgtfPwHQFHABkH/lX4Vkc57FvnHmw b16VwlcQTslfbZZqWHu1V231xymHbIhHSsrB+cOHl20kMsvZQK0wug5eLnyvRqxue+c8Gz 2PM60kxNy04u6fIJuHhPtrc0VPB5msE47HDgJ4NJJLpo2mTavcP40qgEGXLtEw== From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Ralf Lici , Antonio Quartulli , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , Shuah Khan , linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [RFC net-next 14/15] selftests: net: add ipxlat coverage Date: Thu, 19 Mar 2026 16:12:23 +0100 Message-ID: <20260319151230.655687-15-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Add selftests for ipxlat data plane behavior and control-plane setup. The tests build an isolated netns topology, configure ipxlat through YNL, and exercise core traffic classes (TCP, UDP, ICMP info/error, and fragment-related paths). 
This provides reproducible end-to-end coverage for the translation pipeline and basic regression protection for future changes. Signed-off-by: Ralf Lici --- tools/testing/selftests/net/ipxlat/.gitignore | 1 + tools/testing/selftests/net/ipxlat/Makefile | 25 ++ .../selftests/net/ipxlat/ipxlat_data.sh | 70 +++++ .../selftests/net/ipxlat/ipxlat_frag.sh | 70 +++++ .../selftests/net/ipxlat/ipxlat_icmp_err.sh | 54 ++++ .../selftests/net/ipxlat/ipxlat_lib.sh | 273 ++++++++++++++++++ .../net/ipxlat/ipxlat_udp4_zero_csum_send.c | 119 ++++++++ 7 files changed, 612 insertions(+) create mode 100644 tools/testing/selftests/net/ipxlat/.gitignore create mode 100644 tools/testing/selftests/net/ipxlat/Makefile create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_data.sh create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_frag.sh create mode 100755 tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh create mode 100644 tools/testing/selftests/net/ipxlat/ipxlat_lib.sh create mode 100644 tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csu= m_send.c diff --git a/tools/testing/selftests/net/ipxlat/.gitignore b/tools/testing/= selftests/net/ipxlat/.gitignore new file mode 100644 index 000000000000..43bd01d8a84b --- /dev/null +++ b/tools/testing/selftests/net/ipxlat/.gitignore @@ -0,0 +1 @@ +ipxlat_udp4_zero_csum_send diff --git a/tools/testing/selftests/net/ipxlat/Makefile b/tools/testing/se= lftests/net/ipxlat/Makefile new file mode 100644 index 000000000000..cca588945e48 --- /dev/null +++ b/tools/testing/selftests/net/ipxlat/Makefile @@ -0,0 +1,25 @@ +# SPDX-License-Identifier: GPL-2.0 +# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver +# +# Copyright (C) 2026- Mandelbit SRL +# Copyright (C) 2026- Daniel Gr=C3=B6ber +# +# Author: Antonio Quartulli +# Daniel Gr=C3=B6ber +# Ralf Lici + +TEST_PROGS :=3D \ + ipxlat_data.sh \ + ipxlat_frag.sh \ + ipxlat_icmp_err.sh \ +# end of TEST_PROGS + +TEST_FILES :=3D \ + ipxlat_lib.sh \ +# end of TEST_FILES + 
+TEST_GEN_FILES :=3D \ + ipxlat_udp4_zero_csum_send \ +# end of TEST_GEN_FILES + +include ../../lib.mk diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_data.sh b/tools/test= ing/selftests/net/ipxlat/ipxlat_data.sh new file mode 100755 index 000000000000..101e0a65f0a9 --- /dev/null +++ b/tools/testing/selftests/net/ipxlat/ipxlat_data.sh @@ -0,0 +1,70 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver +# +# Copyright (C) 2026- Mandelbit SRL +# Copyright (C) 2026- Daniel Gr=C3=B6ber +# +# Author: Antonio Quartulli +# Daniel Gr=C3=B6ber +# Ralf Lici + +set -o pipefail + +SCRIPT_DIR=3D$(dirname "$(readlink -f "$0")") +source "$SCRIPT_DIR/ipxlat_lib.sh" + +trap ipxlat_cleanup EXIT + +ipxlat_setup_env + +# Send ICMP Echo and verify we receive a reply back + +RET=3D0 +ip netns exec "$NS4" ping -c 2 -W 2 "$IPXLAT_V4_REMOTE" >/dev/null 2>&1 +check_err $? "ping 4->6 failed" +log_test "icmp-info 4->6" + +RET=3D0 +ip netns exec "$NS6" ping -6 -c 2 -W 2 -I "$IPXLAT_V6_NS6_SRC" \ + "$IPXLAT_V6_NS4" >/dev/null 2>&1 +check_err $? "ping 6->4 failed" +log_test "icmp-info 6->4" + +# Run a TCP data transfer over the translator path + +RET=3D0 +ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5201 -n 256K +check_err $? "tcp 4->6 failed" +log_test "tcp 4->6" + +RET=3D0 +ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5201 \ + -B "$IPXLAT_V6_NS6_SRC" -n 256K +check_err $? "tcp 6->4 failed" +log_test "tcp 6->4" + +# Run UDP traffic to verify UDP translation and delivery + +RET=3D0 +ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5202 -u -b 5M -t 1 +check_err $? "udp 4->6 failed" +log_test "udp 4->6" + +RET=3D0 +ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5202 \ + -B "$IPXLAT_V6_NS6_SRC" -u -b 5M -t 1 +check_err $? "udp 6->4 failed" +log_test "udp 6->4" + +# Send one IPv4 UDP packet with checksum=3D0 and verify 4->6 translation. 
+ +RET=3D0 +ipxlat_capture_pkts "$NS6" \ + "ip6 and udp and dst host $IPXLAT_V6_REMOTE and dst port 5555" 1 3 \ + ip netns exec "$NS4" "$SCRIPT_DIR/ipxlat_udp4_zero_csum_send" \ + "$IPXLAT_NS4_ADDR" "$IPXLAT_V4_REMOTE" 5555 +check_err $? "udp checksum-zero 4->6 failed" +log_test "udp checksum-zero 4->6" + +exit "$EXIT_STATUS" diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_frag.sh b/tools/test= ing/selftests/net/ipxlat/ipxlat_frag.sh new file mode 100755 index 000000000000..26ed351cd263 --- /dev/null +++ b/tools/testing/selftests/net/ipxlat/ipxlat_frag.sh @@ -0,0 +1,70 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver +# +# Copyright (C) 2026- Mandelbit SRL +# Copyright (C) 2026- Daniel Gr=C3=B6ber +# +# Author: Antonio Quartulli +# Daniel Gr=C3=B6ber +# Ralf Lici + +set -o pipefail + +SCRIPT_DIR=3D$(dirname "$(readlink -f "$0")") +source "$SCRIPT_DIR/ipxlat_lib.sh" + +trap ipxlat_cleanup EXIT + +ipxlat_setup_env + +# Exercise large TCP flow on 4->6 path to cover pre-fragmentation behavior +RET=3D0 +ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5301 -n 8M +check_err $? "large tcp 4->6 failed" +log_test "large tcp 4->6" + +# Exercise large UDP flow on 4->6 path to cover pre-fragmentation behavior +RET=3D0 +ipxlat_run_iperf "$NS6" "$NS4" "$IPXLAT_V4_REMOTE" 5302 -u -b 20M -t 2 -l = 1400 +check_err $? "large udp 4->6 failed" +log_test "large udp 4->6" + +# Exercise large TCP flow on 6->4 path to cover +# fragmentation-sensitive translation +RET=3D0 +ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5303 \ + -B "$IPXLAT_V6_NS6_SRC" -n 8M +check_err $? "large tcp 6->4 failed" +log_test "large tcp 6->4" + +# Exercise large UDP flow on 6->4 path to cover +# fragmentation-sensitive translation +RET=3D0 +ipxlat_run_iperf "$NS4" "$NS6" "$IPXLAT_V6_NS4" 5304 \ + -B "$IPXLAT_V6_NS6_SRC" -u -b 20M -t 2 -l 1400 +check_err $? 
"large udp 6->4 failed" +log_test "large udp 6->4" + +# Send oversized IPv4 ICMP Echo with DF disabled (source fragmentation all= owed) +# and verify translator drops fragmented ICMPv4 input (no translated ICMPv6 +# Echo seen in NS6) +RET=3D0 +ipxlat_capture_pkts "$NS6" "icmp6 and ip6[40] =3D=3D 128" 0 5 \ + ip netns exec "$NS4" bash -c \ + "ping -M \"dont\" -s 2000 -c 1 -W 1 \"$IPXLAT_V4_REMOTE\" \ + >/dev/null 2>&1 || test \$? -eq 1" +check_err $? "fragmented icmp 4->6 should be dropped" +log_test "drop fragmented icmp 4->6" + +# Send oversized IPv6 ICMP echo request and verify translator drops fragme= nted +# ICMPv6 input (no translated ICMPv4 Echo seen in NS4) +RET=3D0 +ipxlat_capture_pkts "$NS4" "icmp and icmp[0] =3D=3D 8" 0 5 \ + ip netns exec "$NS6" bash -c \ + "ping -6 -s 2000 -c 1 -W 1 -I \"$IPXLAT_V6_NS6_SRC\" \ + \"$IPXLAT_V6_NS4\" >/dev/null 2>&1 || test \$? -eq 1" +check_err $? "fragmented icmp 6->4 should be dropped" +log_test "drop fragmented icmp 6->4" + +exit "$EXIT_STATUS" diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh b/tools/= testing/selftests/net/ipxlat/ipxlat_icmp_err.sh new file mode 100755 index 000000000000..946584b55895 --- /dev/null +++ b/tools/testing/selftests/net/ipxlat/ipxlat_icmp_err.sh @@ -0,0 +1,54 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver +# +# Copyright (C) 2026- Mandelbit SRL +# Copyright (C) 2026- Daniel Gr=C3=B6ber +# +# Author: Antonio Quartulli +# Daniel Gr=C3=B6ber +# Ralf Lici + +set -o pipefail + +SCRIPT_DIR=3D$(dirname "$(readlink -f "$0")") +source "$SCRIPT_DIR/ipxlat_lib.sh" + +trap ipxlat_cleanup EXIT + +ipxlat_setup_env + +# Trigger UDP to a closed port from NS4 and capture translated +# ICMPv4 Port Unreachable +RET=3D0 +ipxlat_capture_pkts "$NS4" "icmp and icmp[0] =3D=3D 3 and icmp[1] =3D=3D 3= " 1 3 \ + ip netns exec "$NS4" bash -c \ + "echo x > /dev/udp/$IPXLAT_V4_REMOTE/9 || true" +check_err $? 
"icmp-error 4->6 not observed" +log_test "icmp-error xlate 4->6" + +# Trigger UDP to a closed port from NS6 and capture translated +# ICMPv6 Port Unreachable +RET=3D0 +ipxlat_capture_pkts "$NS6" "icmp6 and ip6[40] =3D=3D 1 and ip6[41] =3D=3D = 4" 1 3 \ + ip netns exec "$NS6" bash -c \ + "echo x > /dev/udp/$IPXLAT_V6_NS4/9 || true" +check_err $? "icmp-error 6->4 not observed" +log_test "icmp-error xlate 6->4" + +# Send oversized DF IPv4 packet and verify local ICMPv4 +# Fragmentation Needed emission +sysctl -qw net.ipv4.conf.ipxl0.accept_local=3D1 +sysctl -qw net.ipv4.conf.all.rp_filter=3D0 +sysctl -qw net.ipv4.conf.default.rp_filter=3D0 +sysctl -qw net.ipv4.conf.ipxl0.rp_filter=3D0 +sleep 2 +RET=3D0 +ipxlat_capture_pkts "$NS4" "icmp and icmp[0] =3D=3D 3 and icmp[1] =3D=3D 4= " 1 3 \ + ip netns exec "$NS4" bash -c \ + "ping -M \"do\" -s 1300 -c 1 -W 1 \"$IPXLAT_V4_REMOTE\" \ + >/dev/null 2>&1 || test \$? -eq 1" +check_err $? "icmpv4 frag-needed emission not observed" +log_test "icmpv4 frag-needed emission" + +exit "$EXIT_STATUS" diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_lib.sh b/tools/testi= ng/selftests/net/ipxlat/ipxlat_lib.sh new file mode 100644 index 000000000000..e27683f280d4 --- /dev/null +++ b/tools/testing/selftests/net/ipxlat/ipxlat_lib.sh @@ -0,0 +1,273 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver +# +# Copyright (C) 2026- Mandelbit SRL +# Copyright (C) 2026- Daniel Gr=C3=B6ber +# +# Author: Antonio Quartulli +# Daniel Gr=C3=B6ber +# Ralf Lici + +set -o pipefail + +IPXLAT_TEST_DIR=3D$(dirname "$(readlink -f "${BASH_SOURCE[0]}")") +source "$IPXLAT_TEST_DIR/../lib.sh" + +KDIR=3D${KDIR:-$(readlink -f "$IPXLAT_TEST_DIR/../../../../../")} +YNL_CLI=3D"$KDIR/tools/net/ynl/pyynl/cli.py" +YNL_SPEC=3D"$KDIR/Documentation/netlink/specs/ipxlat.yaml" +IPXLAT_IPERF_TIMEOUT=3D${IPXLAT_IPERF_TIMEOUT:-10} + +IPXLAT_TRANSLATOR_DEV=3Dipxl0 +IPXLAT_VETH4_HOST=3Dveth4r 
+IPXLAT_VETH4_NS=3Dveth4n +IPXLAT_VETH6_HOST=3Dveth6r +IPXLAT_VETH6_NS=3Dveth6n + +IPXLAT_XLAT_PREFIX6=3D2001:db8:100:: +IPXLAT_XLAT_PREFIX6_LEN=3D40 +IPXLAT_XLAT_PREFIX6_HEX=3D20010db8010000000000000000000000 +IPXLAT_LOWEST_IPV6_MTU=3D1280 + +IPXLAT_HOST4_ADDR=3D198.51.100.1 +IPXLAT_HOST6_ADDR=3D2001:db8:1::1 + +IPXLAT_NS4_ADDR=3D198.51.100.2 +IPXLAT_NS6_ADDR=3D2001:db8:1::2 +export IPXLAT_V4_REMOTE=3D192.0.2.33 + +IPXLAT_V6_REMOTE=3D2001:db8:1c0:2:21:: +IPXLAT_V6_NS4=3D2001:db8:1c6:3364:2:: +IPXLAT_V6_NS6_SRC=3D2001:db8:1c0:2:2:: + +NS4=3D"" +NS6=3D"" + +ipxlat_ynl() +{ + python3 "$YNL_CLI" --spec "$YNL_SPEC" "$@" +} + +ipxlat_build_dev_set_json() +{ + local ifindex=3D"$1" + + jq -cn \ + --argjson ifindex "$ifindex" \ + --arg prefix "$IPXLAT_XLAT_PREFIX6_HEX" \ + --argjson prefix_len "$IPXLAT_XLAT_PREFIX6_LEN" \ + --argjson lowest_ipv6_mtu "$IPXLAT_LOWEST_IPV6_MTU" \ + '{ + ifindex: $ifindex, + config: { + "xlat-prefix6": { + prefix: $prefix, + "prefix-len": $prefix_len + }, + "lowest-ipv6-mtu": $lowest_ipv6_mtu + } + }' +} + +ipxlat_require_root() +{ + if [[ $(id -u) -ne 0 ]]; then + echo "ipxlat selftests need root; skipping" + exit "$ksft_skip" + fi +} + +ipxlat_require_tools() +{ + if [[ ! -f "$YNL_CLI" || ! 
-f "$YNL_SPEC" ]]; then + log_test_skip "ipxlat netlink spec/ynl not found" + exit "$ksft_skip" + fi + + for tool in ip python3 ping iperf3 tcpdump timeout jq; do + require_command "$tool" + done +} + +ipxlat_cleanup() +{ + cleanup_ns "${NS4:-}" "${NS6:-}" || true + ip link del "$IPXLAT_TRANSLATOR_DEV" 2>/dev/null || true + ip link del "$IPXLAT_VETH4_HOST" 2>/dev/null || true + ip link del "$IPXLAT_VETH6_HOST" 2>/dev/null || true +} + +# Test topology: +# +# host namespace: +# - owns ipxlat dev `ipxl0` +# - has veth peers `veth4r` and `veth6r` +# - routes IPv4 test prefix (192.0.2.0/24) to ipxl0 (v4 network steering= rule) +# - routes xlat-prefix6 prefix (2001:db8:100::/40) out to NS6 side +# - routes mapped NS4 IPv6 identity (2001:db8:1c6:3364:2::/128) to ipxl0 +# so NS6->NS4 traffic enters 6->4 translation +# +# NS4: +# - IPv4-only endpoint: 198.51.100.2/24 on veth4n +# - default route via host 198.51.100.1 (veth4r) +# - sends traffic to 192.0.2.33 (translated by ipxl0 to IPv6) +# +# NS6: +# - IPv6 endpoint: 2001:db8:1::2/64 on veth6n +# - also owns mapped addresses used by tests: +# 2001:db8:1c0:2:21:: (maps to 192.0.2.33) +# 2001:db8:1c0:2:2:: (maps to 192.0.2.2, used as explicit src +# since we have multiple v6 addresses) +# - route to mapped NS4 IPv6 address is pinned via host: +# 2001:db8:1c6:3364:2::/128 +# This keeps the 6->4 test path deterministic. +# +# ipxlat config under test: +# - xlat-prefix6 =3D 2001:db8:100::/40 +# - lowest-ipv6-mtu =3D 1280 +ipxlat_configure_topology() +{ + local ifindex + local dev_set_json + + if ! ip link add "$IPXLAT_TRANSLATOR_DEV" type ipxlat; then + echo "ipxlat link kind unavailable; skipping" + exit "$ksft_skip" + fi + ip link set "$IPXLAT_TRANSLATOR_DEV" up + ifindex=3D$(cat /sys/class/net/"$IPXLAT_TRANSLATOR_DEV"/ifindex) + dev_set_json=3D$(ipxlat_build_dev_set_json "$ifindex") + + if ! 
ipxlat_ynl --do dev-set --json "$dev_set_json" >/dev/null; then + echo "ipxlat dev-set failed" + exit "$ksft_fail" + fi + + setup_ns NS4 NS6 || exit "$ksft_skip" + + ip link add "$IPXLAT_VETH4_HOST" type veth peer name "$IPXLAT_VETH4_NS" + ip link add "$IPXLAT_VETH6_HOST" type veth peer name "$IPXLAT_VETH6_NS" + ip link set "$IPXLAT_VETH4_NS" netns "$NS4" + ip link set "$IPXLAT_VETH6_NS" netns "$NS6" + + ip addr add "$IPXLAT_HOST4_ADDR/24" dev "$IPXLAT_VETH4_HOST" + ip -6 addr add "$IPXLAT_HOST6_ADDR/64" dev "$IPXLAT_VETH6_HOST" + ip link set "$IPXLAT_VETH4_HOST" up + ip link set "$IPXLAT_VETH6_HOST" up + + ip netns exec "$NS4" ip addr add "$IPXLAT_NS4_ADDR/24" \ + dev "$IPXLAT_VETH4_NS" + ip netns exec "$NS4" ip link set "$IPXLAT_VETH4_NS" up + ip netns exec "$NS4" ip route add default via "$IPXLAT_HOST4_ADDR" + + ip netns exec "$NS6" ip -6 addr add "$IPXLAT_NS6_ADDR/64" \ + dev "$IPXLAT_VETH6_NS" + ip netns exec "$NS6" ip -6 addr add "$IPXLAT_V6_REMOTE/128" \ + dev "$IPXLAT_VETH6_NS" + ip netns exec "$NS6" ip -6 addr add "$IPXLAT_V6_NS6_SRC/128" \ + dev "$IPXLAT_VETH6_NS" + ip netns exec "$NS6" ip link set "$IPXLAT_VETH6_NS" up + ip netns exec "$NS6" ip -6 route add default via "$IPXLAT_HOST6_ADDR" + ip netns exec "$NS6" ip -6 route replace "$IPXLAT_V6_NS4/128" \ + via "$IPXLAT_HOST6_ADDR" + sleep 2 + + sysctl -qw net.ipv4.ip_forward=3D1 + sysctl -qw net.ipv6.conf.all.forwarding=3D1 + + # 4->6 steering rule + ip route replace 192.0.2.0/24 dev "$IPXLAT_TRANSLATOR_DEV" + # Post-translation egress: + # IPv6 destinations in xlat-prefix6 leave toward NS6. 
+ ip -6 route replace "$IPXLAT_XLAT_PREFIX6/$IPXLAT_XLAT_PREFIX6_LEN" \ + dev "$IPXLAT_VETH6_HOST" + # 6->4 steering rule + ip -6 route replace "$IPXLAT_V6_NS4/128" dev "$IPXLAT_TRANSLATOR_DEV" + + ip link set "$IPXLAT_VETH6_HOST" mtu 1280 + ip netns exec "$NS6" ip link set "$IPXLAT_VETH6_NS" mtu 1280 +} + +ipxlat_setup_env() +{ + ipxlat_require_root + ipxlat_require_tools + ipxlat_cleanup + + ipxlat_configure_topology +} + +ipxlat_run_iperf() +{ + local srv_ns=3D"$1" + local cli_ns=3D"$2" + local dst=3D"$3" + local port=3D"$4" + local -a args=3D() + local client_rc + local server_rc + local spid + local idx + + for ((idx =3D 5; idx <=3D $#; idx++)); do + args+=3D("${!idx}") + done + + ip netns exec "$srv_ns" timeout "$IPXLAT_IPERF_TIMEOUT" \ + iperf3 -s -1 -p "$port" >/dev/null 2>&1 & + spid=3D$! + sleep 0.2 + + ip netns exec "$cli_ns" timeout "$IPXLAT_IPERF_TIMEOUT" \ + iperf3 -c "$dst" -p "$port" "${args[@]}" >/dev/null 2>&1 + + client_rc=3D$? + if [[ $client_rc -ne 0 ]]; then + kill "$spid" >/dev/null 2>&1 || true + fi + + wait "$spid" >/dev/null 2>&1 + server_rc=3D$? + + ((client_rc !=3D 0)) && return "$client_rc" + return "$server_rc" +} + +ipxlat_capture_pkts() +{ + local ns=3D"$1" + local filter=3D"$2" + local expect_pkts=3D"$3" + local timeout_s=3D"$4" + local cap_goal + local cap_pid + local rc + local trigger_rc + + shift 4 + + cap_goal=3D1 + [[ $expect_pkts -gt 0 ]] && cap_goal=3D$expect_pkts + + ip netns exec "$ns" timeout "$timeout_s" \ + tcpdump -nni any -c "$cap_goal" \ + "$filter" >/dev/null 2>&1 & + cap_pid=3D$! + sleep 0.2 + + "$@" + trigger_rc=3D$? + wait "$cap_pid" >/dev/null 2>&1 + rc=3D$? 
+ + if [[ $trigger_rc -ne 0 ]]; then + return "$trigger_rc" + fi + + if [[ $expect_pkts -eq 0 ]]; then + [[ $rc -eq 124 ]] + else + [[ $rc -eq 0 ]] + fi +} diff --git a/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.= c b/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c new file mode 100644 index 000000000000..ef9f07f8d699 --- /dev/null +++ b/tools/testing/selftests/net/ipxlat/ipxlat_udp4_zero_csum_send.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0 +/* IPXLAT - Stateless IP/ICMP Translation (SIIT) virtual device driver + * + * Copyright (C) 2026- Mandelbit SRL + * Copyright (C) 2026- Daniel Gr=C3=B6ber + * + * Author: Antonio Quartulli + * Daniel Gr=C3=B6ber + * Ralf Lici + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static uint16_t iphdr_csum(const void *buf, size_t len) +{ + const uint16_t *p =3D buf; + uint32_t sum =3D 0; + + while (len > 1) { + sum +=3D *p++; + len -=3D 2; + } + if (len) + sum +=3D *(const uint8_t *)p; + + while (sum >> 16) + sum =3D (sum & 0xffff) + (sum >> 16); + + return (uint16_t)~sum; +} + +int main(int argc, char **argv) +{ + static const char payload[] =3D "ipxlat-zero-udp-csum"; + struct sockaddr_in dst =3D {}; + struct { + struct iphdr ip; + struct udphdr udp; + char payload[sizeof(payload)]; + } pkt =3D {}; + in_addr_t saddr, daddr; + unsigned long dport_ul; + socklen_t dst_len; + ssize_t n; + int one =3D 1; + int fd; + + if (argc !=3D 4) { + fprintf(stderr, "usage: %s \n", argv[0]); + return 2; + } + + if (!inet_pton(AF_INET, argv[1], &saddr) || + !inet_pton(AF_INET, argv[2], &daddr)) { + fprintf(stderr, "invalid IPv4 address\n"); + return 2; + } + + errno =3D 0; + dport_ul =3D strtoul(argv[3], NULL, 10); + if (errno || dport_ul > 65535) { + fprintf(stderr, "invalid UDP port\n"); + return 2; + } + + fd =3D socket(AF_INET, SOCK_RAW, IPPROTO_RAW); + if (fd < 0) { + perror("socket"); + return 1; + } + + if (setsockopt(fd, IPPROTO_IP, 
IP_HDRINCL, &one, sizeof(one)) < 0) { + perror("setsockopt(IP_HDRINCL)"); + close(fd); + return 1; + } + + pkt.ip.version =3D 4; + pkt.ip.ihl =3D 5; + pkt.ip.ttl =3D 64; + pkt.ip.protocol =3D IPPROTO_UDP; + pkt.ip.tot_len =3D htons(sizeof(pkt)); + pkt.ip.id =3D htons(1); + pkt.ip.frag_off =3D 0; + pkt.ip.saddr =3D saddr; + pkt.ip.daddr =3D daddr; + pkt.ip.check =3D iphdr_csum(&pkt.ip, sizeof(pkt.ip)); + + pkt.udp.source =3D htons(4242); + pkt.udp.dest =3D htons((uint16_t)dport_ul); + pkt.udp.len =3D htons(sizeof(pkt.udp) + sizeof(payload)); + pkt.udp.check =3D 0; + + memcpy(pkt.payload, payload, sizeof(payload)); + + dst.sin_family =3D AF_INET; + dst.sin_port =3D pkt.udp.dest; + dst.sin_addr.s_addr =3D daddr; + dst_len =3D sizeof(dst); + + n =3D sendto(fd, &pkt, sizeof(pkt), 0, (struct sockaddr *)&dst, dst_len); + if (n !=3D (ssize_t)sizeof(pkt)) { + perror("sendto"); + close(fd); + return 1; + } + + close(fd); + return 0; +} --=20 2.53.0 From nobody Mon Apr 6 10:42:06 2026 Received: from mout-b-203.mailbox.org (mout-b-203.mailbox.org [195.10.208.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4FE7B3A7581; Thu, 19 Mar 2026 15:14:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.10.208.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933257; cv=none; b=jz4uxL0sRo0LSK/SDhhKTilPzdcXjI0vfAMzSZ5DgJze/RQeOksCbY851yIAxaQixgn959L0aU/pbbHtHqJHn2h1VTjb2iZkmtdQL2igEXSfOiunjixmhBoCG5YUp5nWMirmL57uti8HtE3Z6qs8itQhY5Ajzm6YaZ4r8nsxWi0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773933257; c=relaxed/simple; bh=n8R0lxQXS/WRXKTyJKSYL9wuO+24uiQFtgi6nZnJAK0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=i4UyGRsm6kyNRxUsKAKKHYj7PcCbaXtWpTnsh+LoEejLIJOwLvNxWyQApDjFC/F57ZKKbJ7coRRMaA5bfEUP9haIfF2zrfoMSPi1XztrDm3bMNlSwe7rK4bFYcuIo3VHraz2VaSN5m39B4IrO81CSA5VcExB3i3hTcb2TIi7ODY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com; spf=pass smtp.mailfrom=mandelbit.com; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b=eXj+SHQW; arc=none smtp.client-ip=195.10.208.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=mandelbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=mandelbit.com header.i=@mandelbit.com header.b="eXj+SHQW" Received: from smtp102.mailbox.org (smtp102.mailbox.org [10.196.197.102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-b-203.mailbox.org (Postfix) with ESMTPS id 4fc8P035MVz9xDr; Thu, 19 Mar 2026 16:14:12 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandelbit.com; s=MBO0001; t=1773933252; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=S7WhGtrDIV2TD9WUU8l0miYi1zOi2sYafJlFAbioGLo=; b=eXj+SHQWkzay4JN2jAsFgEcsoQ6qBvwrc+fMFsOvi57VNcHgsLtzEwkW4S/hlQr/p1E8so b0GYT8qyij5Xyb4/XwU54hkh/SHYay95u+LvQ+z2kUFx5tGX2fS333VM6VSjJ80YEzxITE iYE8No1V8Aplil1Fs80lxtFGa0FL2mNbj8TkRMJqC6/HIo+6Yj9RhjTxzAllFOFLAD+jXe nEegOeRm3LLn7XNtTeDWzSw/CIzl/3/vZ8OHR2h7XgvQowXhe3rxtV5mk40z1QEMCTKBp5 j/cuBk8EArFEx7/U5hZTR1qyE3WDCm37rB9mRo2q7Zh/vCTZduCJMjGMSo/nbw== From: Ralf Lici To: netdev@vger.kernel.org Cc: =?UTF-8?q?Daniel=20Gr=C3=B6ber?= , Antonio Quartulli , Ralf Lici , 
"David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , Jonathan Corbet , Shuah Khan , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide Date: Thu, 19 Mar 2026 16:12:24 +0100 Message-ID: <20260319151230.655687-16-ralf@mandelbit.com> In-Reply-To: <20260319151230.655687-1-ralf@mandelbit.com> References: <20260319151230.655687-1-ralf@mandelbit.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Daniel Gr=C3=B6ber Add user and reviewer documentation for the ipxlat virtual netdevice in Documentation/networking/ipxlat.rst. The document describes the datapath model, stateless IPv4/IPv6 address translation rules, ICMP handling, control-plane configuration, and test topology assumptions. It also records the intended runtime configuration contract and current behavior limits so deployment expectations are clear. Signed-off-by: Daniel Gr=C3=B6ber Signed-off-by: Ralf Lici --- Documentation/networking/ipxlat.rst | 190 ++++++++++++++++++++++++++++ 1 file changed, 190 insertions(+) create mode 100644 Documentation/networking/ipxlat.rst diff --git a/Documentation/networking/ipxlat.rst b/Documentation/networking= /ipxlat.rst new file mode 100644 index 000000000000..5a0ad02c05be --- /dev/null +++ b/Documentation/networking/ipxlat.rst @@ -0,0 +1,190 @@ +.. SPDX-License-Identifier: GPL-2.0+ +.. 
Copyright (C) 2026 Daniel Gr=C3=B6ber + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +IPXLAT - IPv6<>IPv4 IP/ICMP Translation (SIIT) +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +ipxlat (``CONFIG_IPXLAT=3Dy``) provides a virtual netdevice implementing +stateless IP packet translation between IP versions 6 and 4. This is a +building block for establishing layer 3 connectivity between otherwise +uncommunicative IPv6-only and/or IPv4-only networks. + + +Creation and Configuration Parameters +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +An ipxlat netdevice can be created and configured using YNL like so:: + + $ ip link add siit0 type ipxlat + + $ IID=3D$(cat /sys/class/net/siit0/ifindex) + + $ ADDR_HEX=3D$(python3 -c 'import ipaddress,sys; \ + print(ipaddress.IPv6Address(sys.argv[1]).packed.hex())' \ + 64:ff9b:: | tee /dev/stderr) + 0064ff9b000000000000000000000000 + + $ ./tools/net/ynl/pyynl/cli.py --family ipxlat --do dev-set --json \ + '{"ifindex": '$IID', "config": {"xlat-prefix6": "'$ADDR_HEX'", "prefix= -len": 96}}' + +(TODO: Once implemented) An ipxlat netdevice can be configured using +iproute2:: + + $ ip link add siit0 type ipxlat [ OPTIONS ] + + # where OPTIONS can include (TODO: iproute2 patch): + # + # prefix ADDR (default 64:ff9b::/96) + # + # lowest-ipv6-mtu MTU (default 1280) + + +Introduction to Packet-level IPv6<>IPv4 Translation +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D + +Translatable packets delivered into an ipxlat device as either of the IP +protocol versions loop back as the other.
Untranslatable packets are +rejected with ICMP errors of the same IP version as appropriate or dropped +silently if required by RFC-SIIT_. + +.. _RFC-SIIT: https://datatracker.ietf.org/doc/html/rfc7915 + +Supported upper layer protocols (TCP/UDP/ICMP) have their checksums +recomputed as needed as part of translation. Unsupported IP protocols +(IPPROTO\_*) are passed through unmodified. This will make them fail at the +receiver except in special cases. + +Differences in IP-layer semantics are handled using several +different strategies. Here we give only a high-level summary of the +areas of most friction: + Fragmentation approach, Path MTU Discovery (PMTUD), IP Options and Exten= sion + Headers. + +**Fragmentation Approach** (v4: on-path vs v6: end-to-end) is smoothed ove= r by: + | 4->6: Fragmenting (DF=3D0) IPv4 packets when needed. See "lowest-ipv6-m= tu". + | 6->4: Using on-path frag. down the line for v4 pkts smaller than 1260. + The details are tedious; see RFC-SIIT_. + +**PMTUD** is maintained by recalculating advised MTU values in ICMP +PKT_TOO_BIG and FRAG_NEEDED messages as they're being translated, taking +into account the necessary header re-sizing and post-translation nexthop +MTU in the main routing table. + +**IP Options and IPv6 Extension Headers** except the Fragment Header are +dropped or ignored except where more specific behaviour is specified in +RFC-SIIT_. + + +Address Translation +------------------- + +The ipxlat address translation algorithm is stateless. Per RFC-ADDR_, all +possible IPv4 addresses are mapped one-to-one into the translation prefix, +optionally including a non-standard "suffix". See `RFC-ADDR Section 2.2 +`_. + +.. _RFC-ADDR: https://datatracker.ietf.org/doc/html/rfc6052 + +IPv6 addresses outside this prefix are rejected with ICMPv6 errors, with +the notable exception of ICMPv6 errors originating from untranslatable +source addresses.
These are instead translated to be sourced from the IPv4 Dummy +Address ``192.0.0.8`` (per I-D-dummy_) to maintain IPv4 traceroute +visibility. + +.. _I-D-dummy: + https://datatracker.ietf.org/doc/draft-ietf-v6ops-icmpext-xlat-v6only-s= ource/ + +In a basic bidirectional 6<>4 connectivity scenario this means IPv6 hosts +must be addressed wholly from inside the translation prefix and per +RFC-ADDR_. Plain SLAAC is not sufficient here; static addressing or +DHCPv6 is needed, unless we introduce statefulness (RFC-NAT64_) into +the mix. See below. + +.. _RFC-NAT64: https://datatracker.ietf.org/doc/html/rfc6146 + + +Stateful Translation (NAT64) +---------------------------- + +NAT64 has several drawbacks; it is necessary only when your control +over IPv4 or IPv6 addressing of hosts is limited. + +Using nftables we can turn a system into a stateful translator. For example +to make the IPv4 internet reachable to an IPv6-only LAN having this system +as its default route, further assuming we have an IPv4 default route and +``192.0.2.1/32`` is routed to this system:: + + $ ip link add siit0 type ipxlat + $ ip link set dev siit0 up + $ ip route add 192.0.2.1/32 dev siit0 + $ ip -6 route add 64:ff9b::/96 dev siit0 + $ sysctl -w net.ipv4.conf.all.forwarding=3D1 + $ sysctl -w net.ipv6.conf.all.forwarding=3D1 + $ nft -f- <`_, + ipxlat SHOULD drop UDPv4 zero checksum packets, yet we chose to always + recalculate checksums for unfragmented packets. + + If you want your translator to follow the SHOULD, add a netfilter rule + dropping such packets. For example using ``nft(8)`` syntax:: + + nft add rule ip filter postrouting -- oifkind ipxlat udp checksum 0 lo= g drop + +- Per `RFC 6146 + `_, + Fragmented UDPv4 zero checksum recalculation by reassembly is not + supported. + +- I-D-dummy_: Adding a Node Identity Object for IPv4-side traceroute + disambiguation is not yet supported. --=20 2.53.0
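The one-to-one mapping described under "Address Translation" in the guide above can be illustrated for the default /96 prefix with a short Python sketch. It is not taken from the driver: the helper names embed_ipv4 and extract_ipv4 are hypothetical, and the sketch assumes only the RFC 6052 /96 layout, where the IPv4 address occupies the low 32 bits and there is no suffix.

```python
import ipaddress


def embed_ipv4(prefix: str, v4: str) -> ipaddress.IPv6Address:
    """Map an IPv4 address into a /96 translation prefix (RFC 6052 layout).

    Hypothetical helper for illustration; other prefix lengths are rejected.
    """
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    # With a /96 prefix the IPv4 address is simply the low 32 bits.
    low32 = int(ipaddress.IPv4Address(v4))
    return ipaddress.IPv6Address(int(net.network_address) | low32)


def extract_ipv4(prefix: str, v6: str) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address from an address inside the prefix."""
    net = ipaddress.IPv6Network(prefix)
    addr = ipaddress.IPv6Address(v6)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    if addr not in net:
        # Addresses outside the prefix have no stateless IPv4 equivalent.
        raise ValueError("address is outside the translation prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    prefix = "64:ff9b::/96"
    mapped = embed_ipv4(prefix, "192.0.2.1")
    print(mapped)
    print(extract_ipv4(prefix, str(mapped)))
    # Hex form of the prefix, as passed to YNL in the guide's ADDR_HEX step.
    print(ipaddress.IPv6Network(prefix).network_address.packed.hex())
```

Because the mapping is a pure function of the prefix and the address, no per-flow state is needed, which is why IPv6 hosts must be numbered from inside the prefix unless NAT64 is added in front of the translator.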