Currently, there is no way to disable mlx5 vxlan offloading if vxlan
is enabled. We've (possibly) seen some minor udp rr and udp stream
regressions when enabling vxlan, and want a way to disable this
offloading. Also, coupling vxlan offloading with vxlan enablement
generally limits the flexibility of vxlan setups.
Add a new config option for mlx5 vxlan offloading specifically, so
that users can use vxlan without automatically opting in to the
offloading.
To keep the same behavior as before, the new config option is enabled
by default if vxlan is enabled.
Signed-off-by: Marc Harvey <marcharvey@google.com>
---
drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 11 +++++++++++
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h | 2 +-
3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 3c3e84100d5a..d2e091bdbafc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -218,3 +218,14 @@ config MLX5_EN_PSP
interfaces to PSP Stack which supports PSP crypto offload.
If unsure, say Y.
+
+config MLX5_VXLAN
+	bool "Mellanox Technologies vxlan offloading"
+	depends on VXLAN
+	depends on MLX5_CORE_EN
+	default y
+	help
+	  mlx5 device offload support for vxlan. Makes the mlx5 driver always
+	  attempt to initialize device handling of vxlan packets.
+
+	  If unsure, say Y.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index d39fe9c4a87c..6f2cc5414d07 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -86,7 +86,7 @@ mlx5_core-$(CONFIG_MLX5_BRIDGE) += esw/bridge.o esw/bridge_mcast.o esw/bridge
mlx5_core-$(CONFIG_HWMON) += hwmon.o
mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
-ifneq ($(CONFIG_VXLAN),)
+ifneq ($(CONFIG_MLX5_VXLAN),)
mlx5_core-y += lib/vxlan.o
endif
mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h
index 34ef662da35e..67d0c126c2ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h
@@ -50,7 +50,7 @@ static inline bool mlx5_vxlan_allowed(struct mlx5_vxlan *vxlan)
return !IS_ERR_OR_NULL(vxlan);
}
-#if IS_ENABLED(CONFIG_VXLAN)
+#if IS_ENABLED(CONFIG_MLX5_VXLAN)
struct mlx5_vxlan *mlx5_vxlan_create(struct mlx5_core_dev *mdev);
void mlx5_vxlan_destroy(struct mlx5_vxlan *vxlan);
int mlx5_vxlan_add_port(struct mlx5_vxlan *vxlan, u16 port);
---
base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
change-id: 20260427-mlx5_vxlan-715699b8bbea
Best regards,
--
Marc Harvey <marcharvey@google.com>
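
For readers unfamiliar with the pattern the vxlan.h hunk relies on, here is a minimal user-space sketch of config-gated stubs. It is an illustration only, not the driver's code: `SKETCH_MLX5_VXLAN` stands in for `IS_ENABLED(CONFIG_MLX5_VXLAN)`, every `sketch_*` name is hypothetical, and the disabled `create` returns NULL as a simplification of whatever the real stub returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_MLX5_VXLAN).
 * 0 models a build with the proposed option turned off. */
#ifndef SKETCH_MLX5_VXLAN
#define SKETCH_MLX5_VXLAN 0
#endif

struct sketch_vxlan {
	int nports;
};

#if SKETCH_MLX5_VXLAN
/* Option enabled: real (here: toy) implementations are compiled in. */
static struct sketch_vxlan sketch_state;

static inline struct sketch_vxlan *sketch_vxlan_create(void)
{
	sketch_state.nports = 0;
	return &sketch_state;
}

static inline int sketch_vxlan_add_port(struct sketch_vxlan *v,
					unsigned short port)
{
	assert(v != NULL);
	(void)port;
	v->nports++;
	return 0;
}
#else
/* Option disabled: callers still compile, but every call is a no-op
 * that reports "not supported". */
static inline struct sketch_vxlan *sketch_vxlan_create(void)
{
	return NULL;
}

static inline int sketch_vxlan_add_port(struct sketch_vxlan *v,
					unsigned short port)
{
	(void)v;
	(void)port;
	return -EOPNOTSUPP;
}
#endif

/* Simplified analogue of mlx5_vxlan_allowed(): offload is usable only
 * if creation produced a valid object. */
static inline bool sketch_vxlan_allowed(const struct sketch_vxlan *v)
{
	return v != NULL;
}
```

With the macro left at 0, `sketch_vxlan_allowed(sketch_vxlan_create())` is false and callers take their non-offload path without any `#ifdef` of their own.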
On Tue, 28 Apr 2026 22:44:34 +0000 Marc Harvey wrote:
> Currently, there is no way to disable mlx5 vxlan offloading if vxlan
> is enabled. We've (possibly) seen some minor udp rr and udp stream
> regressions when enabling vxlan, and want a way to disable this
> offloading. Also coupling vxlan offloading with vxlan enablement
> generally limits the flexibility of vxlan setups.
>
> Add a new config option for mlx5 vxlan offloading specifically, so
> that users can use vxlan without automatically opting in to the
> offloading.
>
> To keep the same behavior as before, the new config option is enabled
> by default if vxlan is enabled.

Can we delay init of whatever makes the device slow down until the
first vxlan port is registered? A kconfig level optimization of this
sort will have rather limited applicability.
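
The deferred-init idea suggested above can be sketched roughly as follows. This is a user-space toy under stated assumptions, not mlx5 code: the `sketch_*` names are hypothetical, and "hardware init" is reduced to a flag and a counter so the control flow is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state; hw_inits counts how often the
 * (assumed expensive) device setup actually ran. */
struct sketch_tunnel_state {
	bool hw_ready;
	unsigned int hw_inits;
	unsigned int nports;
};

static void sketch_hw_init(struct sketch_tunnel_state *s)
{
	s->hw_ready = true;
	s->hw_inits++;
}

static void sketch_hw_teardown(struct sketch_tunnel_state *s)
{
	s->hw_ready = false;
}

/* Register a tunnel port; device setup happens only on first use. */
int sketch_add_port(struct sketch_tunnel_state *s, uint16_t port)
{
	assert(s != NULL);
	(void)port;
	if (!s->hw_ready)
		sketch_hw_init(s);
	s->nports++;
	return 0;
}

/* Unregister a tunnel port; tear down once the last port is gone. */
int sketch_del_port(struct sketch_tunnel_state *s, uint16_t port)
{
	assert(s != NULL);
	(void)port;
	if (s->nports == 0)
		return -1;
	if (--s->nports == 0)
		sketch_hw_teardown(s);
	return 0;
}
```

A device that never sees a vxlan port never pays for the setup, which is the property the suggestion is after. Whether mlx5 hardware allows this is a separate question that the rest of the thread addresses.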
On Tue, Apr 28, 2026 at 6:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 28 Apr 2026 22:44:34 +0000 Marc Harvey wrote:
> > Currently, there is no way to disable mlx5 vxlan offloading if vxlan
> > is enabled. We've (possibly) seen some minor udp rr and udp stream
> > regressions when enabling vxlan, and want a way to disable this
> > offloading. Also coupling vxlan offloading with vxlan enablement
> > generally limits the flexibility of vxlan setups.
> >
> > Add a new config option for mlx5 vxlan offloading specifically, so
> > that users can use vxlan without automatically opting in to the
> > offloading.
> >
> > To keep the same behavior as before, the new config option is enabled
> > by default if vxlan is enabled.
>
> Can we delay init of whatever makes the device slow down until the
> first vxlan port is registered? A kconfig level optimization of this
> sort will have rather limited applicability.

There would still be the problem of wanting to use vxlan without vxlan
offload. Agree that a kconfig might not be ideal, but it is currently
guarded by a kconfig that offers no choice to opt out.
On Wed, 29 Apr 2026 17:46:36 -0700 Marc Harvey wrote:
> On Tue, Apr 28, 2026 at 6:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 28 Apr 2026 22:44:34 +0000 Marc Harvey wrote:
> > > Currently, there is no way to disable mlx5 vxlan offloading if vxlan
> > > is enabled. We've (possibly) seen some minor udp rr and udp stream
> > > regressions when enabling vxlan, and want a way to disable this
> > > offloading. Also coupling vxlan offloading with vxlan enablement
> > > generally limits the flexibility of vxlan setups.
> > >
> > > Add a new config option for mlx5 vxlan offloading specifically, so
> > > that users can use vxlan without automatically opting in to the
> > > offloading.
> > >
> > > To keep the same behavior as before, the new config option is enabled
> > > by default if vxlan is enabled.
> >
> > Can we delay init of whatever makes the device slow down until the
> > first vxlan port is registered? A kconfig level optimization of this
> > sort will have rather limited applicability.
>
> There would still be the problem of wanting to use vxlan without vxlan
> offload. Agree that a kconfig might not be ideal, but it is currently
> guarded by a kconfig that offers no choice to opt out.

Are you aware of NETIF_F_RX_UDP_TUNNEL_PORT ?
I haven't checked it does exactly what we need, but I recall there was
an ethtool feature for this..
On Wed, Apr 29, 2026 at 7:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Are you aware of NETIF_F_RX_UDP_TUNNEL_PORT ?
> I haven't checked it does exactly what we need, but I recall there was
> an ethtool feature for this..

Thanks, I didn't know about that feature and mlx5 uses it. However,
mlx5 unconditionally sets the `UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN`
flag, which excludes port 4789 from the entire UDP tunnel core offload
management (see `__udp_tunnel_nic_add_port()`).

So using ethtool to disable `NETIF_F_RX_UDP_TUNNEL_PORT` will not
disable vxlan offload for port 4789.

I think a better approach would be to just remove this static
automatic offloading for port 4789, mlx5 is the only driver using
`UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN` anyway. However, there might
be a reason for this, such as some supported hardware offloading vxlan
on port 4789 by default even without commands from the driver.

If mlx5 continues to use the `UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN`
flag, then some change is required to fully disable vxlan offloading.
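
The interaction described in the message above can be modelled with a small user-space toy. It is a deliberately simplified sketch of the behaviour as stated in this thread, not the kernel's `__udp_tunnel_nic_add_port()`: all `SKETCH_*` and `sketch_*` names are hypothetical, and the boolean `rx_udp_tunnel_port` only stands in for the `NETIF_F_RX_UDP_TUNNEL_PORT` feature bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IANA_VXLAN_PORT        4789u
#define SKETCH_FLAG_STATIC_IANA_VXLAN (1u << 0)
#define SKETCH_TABLE_SIZE             4u

/* Toy NIC: a flag word, the ethtool-controlled feature bit, and the
 * port table that the (modelled) core manages. */
struct sketch_nic {
	unsigned int flags;
	bool rx_udp_tunnel_port;
	uint16_t table[SKETCH_TABLE_SIZE];
	unsigned int used;
};

/* Returns true if the port was placed in the core-managed table. */
bool sketch_core_add_port(struct sketch_nic *nic, uint16_t port)
{
	assert(nic != NULL);
	/* The static port is assumed to be handled by the device and
	 * never enters the managed table. */
	if ((nic->flags & SKETCH_FLAG_STATIC_IANA_VXLAN) &&
	    port == SKETCH_IANA_VXLAN_PORT)
		return false;
	if (!nic->rx_udp_tunnel_port || nic->used == SKETCH_TABLE_SIZE)
		return false;
	nic->table[nic->used++] = port;
	return true;
}

/* Is traffic to this UDP port treated as an offloaded tunnel? */
bool sketch_port_offloaded(const struct sketch_nic *nic, uint16_t port)
{
	assert(nic != NULL);
	/* Static port: offloaded regardless of the feature bit. */
	if ((nic->flags & SKETCH_FLAG_STATIC_IANA_VXLAN) &&
	    port == SKETCH_IANA_VXLAN_PORT)
		return true;
	if (!nic->rx_udp_tunnel_port)
		return false;
	for (unsigned int i = 0; i < nic->used; i++)
		if (nic->table[i] == port)
			return true;
	return false;
}
```

In this model, clearing `rx_udp_tunnel_port` stops offload for managed ports such as 8472 but leaves 4789 offloaded, which is the gap the message points out.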
On Mon, 4 May 2026 15:44:26 -0700 Marc Harvey wrote:
> > Are you aware of NETIF_F_RX_UDP_TUNNEL_PORT ?
> > I haven't checked it does exactly what we need, but I recall there was
> > an ethtool feature for this..
>
> Thanks, I didn't know about that feature and mlx5 uses it. However,
> mlx5 unconditionally sets the `UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN`
> flag, which excludes port 4789 from the entire UDP tunnel core offload
> management (see `__udp_tunnel_nic_add_port()`).
>
> So using ethtool to disable `NETIF_F_RX_UDP_TUNNEL_PORT` will not
> disable vxlan offload for port 4789.
>
> I think a better approach would be to just remove this static
> automatic offloading for port 4789, mlx5 is the only driver using
> `UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN` anyway. However, there might
> be a reason for this, such as some supported hardware offloading vxlan
> on port 4789 by default even without commands from the driver.
>
> If mlx5 continues to use the `UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN`
> flag, then some change is required to fully disable vxlan offloading.

Sorry, I don't know mlx5 very well. Sounds like you have to talk
to nVidia or/and run some experiments. The current patch is a no-go.
Hi Marc,

On 05/05/2026 4:10, Jakub Kicinski wrote:
> On Mon, 4 May 2026 15:44:26 -0700 Marc Harvey wrote:
>>> Are you aware of NETIF_F_RX_UDP_TUNNEL_PORT ?
>>> I haven't checked it does exactly what we need, but I recall there was
>>> an ethtool feature for this..
>>
>> Thanks, I didn't know about that feature and mlx5 uses it. However,
>> mlx5 unconditionally sets the `UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN`
>> flag, which excludes port 4789 from the entire UDP tunnel core offload
>> management (see `__udp_tunnel_nic_add_port()`).
>>
>> So using ethtool to disable `NETIF_F_RX_UDP_TUNNEL_PORT` will not
>> disable vxlan offload for port 4789.
>>
>> I think a better approach would be to just remove this static
>> automatic offloading for port 4789, mlx5 is the only driver using
>> `UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN` anyway. However, there might
>> be a reason for this, such as some supported hardware offloading vxlan
>> on port 4789 by default even without commands from the driver.
>>
>> If mlx5 continues to use the `UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN`
>> flag, then some change is required to fully disable vxlan offloading.
>
> Sorry, I don't know mlx5 very well. Sounds like you have to talk
> to nVidia or/and run some experiments. The current patch is a no-go.
>

The hardware offloads 4789 by default, hence the
UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN, you cannot simply remove it.

Have you tried disabling tx-udp_tnl-segmentation through ethtool?
On Tue, May 5, 2026 at 1:21 AM Gal Pressman <gal@nvidia.com> wrote:
>
> Hi Marc,
>
> On 05/05/2026 4:10, Jakub Kicinski wrote:
> >
> > Sorry, I don't know mlx5 very well. Sounds like you have to talk
> > to nVidia or/and run some experiments. The current patch is a no-go.
> >
>
> The hardware offloads 4789 by default, hence the
> UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN, you cannot simply remove it.
>
> Have you tried disabling tx-udp_tnl-segmentation through ethtool?

Thanks for the responses.

Gal, by "the hardware offloads 4789 by default," does that mean the
hardware offloads 4789 even without an explicit command from the
driver? I ask because mlx5 does send a command to offload 4789, but
perhaps this command is redundant and only for bookkeeping purposes.

I haven't tried changing the offload related ethtool parameters, and I
will, but I suspect it won't help in this specific case since the
perceived regression involved non-tunneled traffic.

If the hardware indeed offloads 4789 autonomously, then the perceived
regression we saw might not have been real or related to enabling
vxlan.
On 13/05/2026 19:20, Marc Harvey wrote:
> On Tue, May 5, 2026 at 1:21 AM Gal Pressman <gal@nvidia.com> wrote:
>>
>> Hi Marc,
>>
>> On 05/05/2026 4:10, Jakub Kicinski wrote:
>>>
>>> Sorry, I don't know mlx5 very well. Sounds like you have to talk
>>> to nVidia or/and run some experiments. The current patch is a no-go.
>>>
>>
>> The hardware offloads 4789 by default, hence the
>> UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN, you cannot simply remove it.
>>
>> Have you tried disabling tx-udp_tnl-segmentation through ethtool?
>
> Thanks for the responses.
>
> Gal, by "the hardware offloads 4789 by default," does that mean the
> hardware offloads 4789 even without an explicit command from the
> driver? I ask because mlx5 does send a command to offload 4789, but
> perhaps this command is redundant and only for bookkeeping purposes.

Yes, we add the port so features_check flow would see this port is
offloaded.

>
> I haven't tried changing the offload related ethtool parameters, and I
> will, but I suspect it won't help in this specific case since the
> perceived regression involved non-tunneled traffic.

What is the purpose of this patch then?

>
> If the hardware indeed offloads 4789 autonomously, then the perceived
> regression we saw might not have been real or related to enabling
> vxlan.

Can you elaborate more on the regression?
What is the issue? When did it start happening?

Maybe you're experiencing this?
https://lore.kernel.org/all/6c3fb15e-711d-4b8d-b152-e03d9b05293f@linux.dev/