From nobody Mon May 25 00:08:04 2026
From: Jihong Min
To: netdev@vger.kernel.org
Cc: Jay Vosburgh, Andrew Lunn, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Simon Horman, Steffen Klassert, Herbert Xu,
 linux-kernel@vger.kernel.org, Jihong Min
Subject: [PATCH RFC net-next 1/4] xfrm: add a lower-device offload handle resolver
Date: Wed, 20 May 2026 17:10:01 +0900
Message-ID: <20260520081004.2232091-2-hurryman2212@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260520081004.2232091-1-hurryman2212@gmail.com>
References: <20260520081004.2232091-1-hurryman2212@gmail.com>

An upper device can own an XFRM offload state while the selected datapath
device is one of its lower devices. A single xso.offload_handle is not
enough for that case because each lower device may return a different
hardware handle for the same state.

Add an optional xfrmdev_ops resolver and a lower-driver opt-in flag so
helper-aware lower drivers can resolve the handle for the lower device they
are transmitting or receiving on. Keep the direct-device path as the fast
path and clear upper private state when device offload state is freed.
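
The lookup order described above can be modeled in user space. The sketch below is illustrative only: the `model_*` types and functions are hypothetical stand-ins, not kernel definitions, and they omit READ_ONCE(), RCU, and everything else the real helper has to deal with. It only shows the order in which the sources of a handle are consulted: the direct device first, then the owner's resolver, then the single real_dev fallback.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev;
struct model_state;

/* Hypothetical callback type standing in for xdo_dev_state_lower_handle(). */
typedef unsigned long (*model_lower_handle_fn)(struct model_dev *owner,
					       struct model_state *x,
					       struct model_dev *lower);

/* Simplified device: only the field this lookup needs. */
struct model_dev {
	model_lower_handle_fn lower_handle;	/* NULL on lower devices */
};

/* Simplified stand-in for the offload part of an XFRM state. */
struct model_state {
	struct model_dev *dev;		/* device that owns the state */
	struct model_dev *real_dev;	/* single lower device, if any */
	unsigned long offload_handle;	/* handle valid for dev or real_dev */
};

/* Return the handle that "lower" should use for x, or 0 if there is none. */
unsigned long model_lower_handle(struct model_state *x, struct model_dev *lower)
{
	assert(x != NULL);

	if (!x->dev || !lower)
		return 0;

	/* Fast path: the state was installed directly on this device. */
	if (x->dev == lower)
		return x->offload_handle;

	/* The owner keeps a per-lower map: let it resolve the handle. */
	if (x->dev->lower_handle)
		return x->dev->lower_handle(x->dev, x, lower);

	/* Owner without a resolver: fall back to the single real_dev. */
	if (x->real_dev == lower)
		return x->offload_handle;

	return 0;
}

/* Two toy lower devices and a fixed two-entry map, for illustration only. */
struct model_dev model_lower_a, model_lower_b;

unsigned long model_upper_resolve(struct model_dev *owner,
				  struct model_state *x,
				  struct model_dev *lower)
{
	(void)owner;
	(void)x;

	if (lower == &model_lower_a)
		return 0x11;
	if (lower == &model_lower_b)
		return 0x22;
	return 0;
}
```

A caller on the datapath of a lower device would pass its own device as `lower` and treat a return value of 0 as "no handle for this device", which is why a valid handle must be nonzero in this model.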

Assisted-by: Codex:gpt-5.5
Signed-off-by: Jihong Min
---
 include/linux/netdevice.h | 27 ++++++++++++++++++++++
 include/net/xfrm.h        | 48 +++++++++++++++++++++++++++++++++++++--
 net/xfrm/xfrm_state.c     |  1 +
 3 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0e1e581efc5a..b4e844e90db8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1033,6 +1033,16 @@ struct netdev_bpf {
 #define XDP_WAKEUP_TX		(1 << 1)
 
 #ifdef CONFIG_XFRM_OFFLOAD
+/*
+ * xfrmdev_ops.flags values.
+ *
+ * XFRMDEV_OPS_F_LOWER_HANDLE marks a lower driver whose datapath gets XFRM
+ * hardware handles with xfrm_dev_state_lower_handle(). This is required when
+ * the XFRM state is owned by an upper device because xso.offload_handle may
+ * not contain the handle for the current lower device.
+ */
+#define XFRMDEV_OPS_F_LOWER_HANDLE	BIT(0)
+
 struct xfrmdev_ops {
 	int	(*xdo_dev_state_add)(struct net_device *dev,
 				     struct xfrm_state *x,
@@ -1048,6 +1058,23 @@ struct xfrmdev_ops {
 	int	(*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
 	void	(*xdo_dev_policy_delete) (struct xfrm_policy *x);
 	void	(*xdo_dev_policy_free) (struct xfrm_policy *x);
+	/*
+	 * Resolve the offload handle for lower_dev when this upper device
+	 * owns the XFRM state. This belongs in xfrmdev_ops because the
+	 * resolver is an XFRM offload operation of the device that owns the
+	 * state. Keeping the dispatch here avoids a bonding-specific dependency
+	 * in the XFRM helper.
+	 *
+	 * Upper devices like bonding may implement this callback when they
+	 * keep the lower-device handle mapping. Lower devices must leave it
+	 * NULL because they do not own that map. Lower drivers advertise
+	 * that their datapath calls the resolver with
+	 * XFRMDEV_OPS_F_LOWER_HANDLE instead.
+	 */
+	unsigned long (*xdo_dev_state_lower_handle)(struct net_device *dev,
+						    struct xfrm_state *x,
+						    struct net_device *lower_dev);
+	u32	flags;
 };
 #endif
 
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 10d3edde6b2f..b61e2c023eb4 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -162,6 +162,10 @@ struct xfrm_dev_offload {
 	 */
 	struct net_device	*real_dev;
 	unsigned long		offload_handle;
+	/* Private state owned by dev in this structure when that device is an
+	 * upper device. Lower drivers must not use this directly.
+	 */
+	void __rcu		*upper_priv;
 	u8			dir : 2;
 	u8			type : 2;
 	u8			flags : 2;
@@ -1700,6 +1704,37 @@ struct xfrm_state *xfrm_state_lookup_byspi(struct net *net, __be32 spi,
 int xfrm_state_check_expire(struct xfrm_state *x);
 void xfrm_state_update_stats(struct net *net);
 #ifdef CONFIG_XFRM_OFFLOAD
+/*
+ * Return the hardware offload handle lower_dev should use for x. States
+ * installed directly on lower_dev use xso.offload_handle. States owned by an
+ * upper device are resolved through the owner's xdo_dev_state_lower_handle().
+ * Bonding uses that callback for replicated XFRM states because it installs the
+ * state on each slave and keeps the per-slave hardware handles internally.
+ */
+static inline unsigned long
+xfrm_dev_state_lower_handle(struct xfrm_state *x, struct net_device *lower_dev)
+{
+	struct xfrm_dev_offload *xdo = &x->xso;
+	struct net_device *real_dev = READ_ONCE(xdo->real_dev);
+	struct net_device *dev = READ_ONCE(xdo->dev);
+	unsigned long offload_handle = READ_ONCE(xdo->offload_handle);
+
+	if (!dev || !lower_dev)
+		return 0;
+
+	if (dev == lower_dev)
+		return offload_handle;
+
+	if (dev->xfrmdev_ops && dev->xfrmdev_ops->xdo_dev_state_lower_handle)
+		return dev->xfrmdev_ops->xdo_dev_state_lower_handle(dev, x,
+								     lower_dev);
+
+	if (real_dev == lower_dev)
+		return offload_handle;
+
+	return 0;
+}
+
 static inline void xfrm_dev_state_update_stats(struct xfrm_state *x)
 {
 	struct xfrm_dev_offload *xdo = &x->xso;
@@ -1711,6 +1746,12 @@ static inline void xfrm_dev_state_update_stats(struct xfrm_state *x)
 
 }
 #else
+static inline unsigned long
+xfrm_dev_state_lower_handle(struct xfrm_state *x, struct net_device *lower_dev)
+{
+	return 0;
+}
+
 static inline void xfrm_dev_state_update_stats(struct xfrm_state *x) {}
 #endif
 void xfrm_state_insert(struct xfrm_state *x);
@@ -2089,15 +2130,18 @@ static inline void xfrm_dev_state_advance_esn(struct xfrm_state *x)
 static inline bool xfrm_dst_offload_ok(struct dst_entry *dst)
 {
 	struct xfrm_state *x = dst->xfrm;
+	bool has_offload_state;
 	struct xfrm_dst *xdst;
 
 	if (!x || !x->type_offload)
 		return false;
 
 	xdst = (struct xfrm_dst *) dst;
-	if (!x->xso.offload_handle && !xdst->child->xfrm)
+	has_offload_state = x->xso.offload_handle ||
+			    rcu_access_pointer(x->xso.upper_priv);
+	if (!has_offload_state && !xdst->child->xfrm)
 		return true;
-	if (x->xso.offload_handle && (x->xso.dev == xfrm_dst_path(dst)->dev) &&
+	if (has_offload_state && (x->xso.dev == xfrm_dst_path(dst)->dev) &&
 	    !xdst->child->xfrm)
 		return true;
 
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 686014d39429..584f913751bf 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -791,6 +791,7 @@ void xfrm_dev_state_free(struct xfrm_state *x)
 		if (dev->xfrmdev_ops->xdo_dev_state_free)
 			dev->xfrmdev_ops->xdo_dev_state_free(dev, x);
 		WRITE_ONCE(xso->dev, NULL);
+		RCU_INIT_POINTER(xso->upper_priv, NULL);
 		xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 		netdev_put(dev, &xso->dev_tracker);
 	}
-- 
2.53.0

From nobody Mon May 25 00:08:04 2026
From: Jihong Min
To: netdev@vger.kernel.org
Cc: Jay Vosburgh, Andrew Lunn, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Simon Horman, Steffen Klassert, Herbert Xu,
 linux-kernel@vger.kernel.org, Jihong Min
Subject: [PATCH RFC net-next 2/4] bonding: replicate XFRM offload state across LAG slaves
Date: Wed, 20 May 2026 17:10:02 +0900
Message-ID: <20260520081004.2232091-3-hurryman2212@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260520081004.2232091-1-hurryman2212@gmail.com>
References: <20260520081004.2232091-1-hurryman2212@gmail.com>

LAG bonds need to install the same IPsec/XFRM state on every eligible lower
device, but each lower device may return a different hardware handle. Add a
replicated bonding-private XFRM state object that stores per-lower-device
instances and handles.

Use the replicated model for 802.3ad and balance-xor with layer3+4 hashing.
Install the state on every eligible running slave, capture each lower
handle, and roll back in reverse order on failure.
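
The install-then-roll-back pattern described above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the bonding code: all `toy_*` names are hypothetical, locking and RCU are left out, and the sketch collects and installs in a single pass where the patch first builds the instance list and then installs. A handle of 0 is treated as "install failed", mirroring the -EINVAL path in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_LOWERS 8

/* Hypothetical lower device: next_handle == 0 simulates a failed install. */
struct toy_lower {
	unsigned long next_handle;	/* handle returned by a successful install */
	int installed;			/* 1 while the state is installed */
	int running;			/* only running lowers get a replica */
};

/* One replica: which lower it lives on and the handle that lower returned. */
struct toy_replica {
	struct toy_lower *lower;
	unsigned long handle;
};

struct toy_replicated_state {
	struct toy_replica inst[TOY_MAX_LOWERS];
	size_t count;
};

unsigned long toy_install(struct toy_lower *l)
{
	if (!l->next_handle)
		return 0;
	l->installed = 1;
	return l->next_handle;
}

void toy_remove(struct toy_lower *l)
{
	l->installed = 0;
}

/*
 * Install the state on every running lower and record each handle.
 * On failure, undo the installs in reverse order and leave st empty.
 */
int toy_replicate(struct toy_replicated_state *st,
		  struct toy_lower *lowers, size_t n)
{
	size_t i;

	assert(n <= TOY_MAX_LOWERS);
	st->count = 0;

	for (i = 0; i < n; i++) {
		unsigned long handle;

		if (!lowers[i].running)
			continue;

		handle = toy_install(&lowers[i]);
		if (!handle)
			goto rollback;

		st->inst[st->count].lower = &lowers[i];
		st->inst[st->count].handle = handle;
		st->count++;
	}

	/* No running lower at all: nothing was installed. */
	return st->count ? 0 : -ENODEV;

rollback:
	while (st->count > 0) {
		st->count--;
		toy_remove(st->inst[st->count].lower);
		st->inst[st->count].handle = 0;
	}
	return -EINVAL;
}
```

Undoing in reverse order means the most recently installed replica is removed first, so a partially replicated state is never left visible on some lowers after the function reports failure.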

Keep active-backup on the existing single-lower path and expose a bonding
resolver for lower drivers that call xfrm_dev_state_lower_handle().

Assisted-by: Codex:gpt-5.5
Signed-off-by: Jihong Min
---
 drivers/net/bonding/bond_main.c | 578 +++++++++++++++++++++++++++++++-
 include/net/bonding.h           |  29 +-
 2 files changed, 595 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index af82a3df2c5d..66435de852e9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -455,6 +455,432 @@ static struct net_device *bond_ipsec_dev(struct xfrm_state *xs)
 	return slave->dev;
 }
 
+static void bond_ipsec_inst_rcu_free(struct rcu_head *rcu)
+{
+	struct bond_ipsec_inst *inst;
+
+	inst = container_of(rcu, struct bond_ipsec_inst, rcu);
+	netdev_put(inst->real_dev, &inst->dev_tracker);
+	kfree(inst);
+}
+
+static void bond_ipsec_rcu_free(struct rcu_head *rcu)
+{
+	struct bond_ipsec *ipsec;
+
+	ipsec = container_of(rcu, struct bond_ipsec, rcu);
+	kfree(ipsec);
+}
+
+static bool bond_ipsec_slave_has_xfrm_ops(struct net_device *real_dev)
+{
+	const struct xfrmdev_ops *ops;
+
+	if (!real_dev || netif_is_bond_master(real_dev))
+		return false;
+
+	ops = real_dev->xfrmdev_ops;
+	if (!ops)
+		return false;
+
+	return ops->xdo_dev_state_add && ops->xdo_dev_state_delete;
+}
+
+static bool bond_ipsec_lag_slave_has_ops(struct net_device *real_dev)
+{
+	return bond_ipsec_slave_has_xfrm_ops(real_dev) &&
+	       real_dev->xfrmdev_ops->flags & XFRMDEV_OPS_F_LOWER_HANDLE;
+}
+
+static bool bond_ipsec_lag_slave_ok(struct net_device *real_dev)
+{
+	return (real_dev->features & NETIF_F_HW_ESP) &&
+	       bond_ipsec_lag_slave_has_ops(real_dev);
+}
+
+static void bond_ipsec_lag_free_instances(struct bond_ipsec *ipsec)
+{
+	struct bond_ipsec_inst *inst, *tmp;
+
+	list_for_each_entry_safe(inst, tmp, &ipsec->inst_list, list) {
+		list_del_rcu(&inst->list);
+		call_rcu(&inst->rcu, bond_ipsec_inst_rcu_free);
+	}
+}
+
+static void bond_ipsec_lag_call_inst(struct xfrm_state *xs,
+				     struct bond_ipsec_inst *inst,
+				     bool delete_state,
+				     bool free_state)
+{
+	unsigned long bond_handle = xs->xso.offload_handle;
+	struct net_device *bond_real_dev = xs->xso.real_dev;
+	const struct xfrmdev_ops *ops = inst->real_dev->xfrmdev_ops;
+
+	if (!inst->lower_handle)
+		return;
+
+	if (!ops)
+		return;
+
+	xs->xso.real_dev = inst->real_dev;
+	xs->xso.offload_handle = inst->lower_handle;
+	if (delete_state) {
+		WRITE_ONCE(inst->added, false);
+		if (!inst->deleted && ops->xdo_dev_state_delete) {
+			ops->xdo_dev_state_delete(inst->real_dev, xs);
+			xs->xso.offload_handle = inst->lower_handle;
+			inst->deleted = true;
+		}
+	}
+	if (free_state && ops->xdo_dev_state_free)
+		ops->xdo_dev_state_free(inst->real_dev, xs);
+	if (free_state)
+		inst->lower_handle = 0;
+
+	xs->xso.real_dev = bond_real_dev;
+	xs->xso.offload_handle = bond_handle;
+}
+
+static void bond_ipsec_lag_call_state(struct xfrm_state *xs,
+				      struct bond_ipsec *ipsec,
+				      bool delete_state,
+				      bool free_state)
+{
+	struct bond_ipsec_inst *inst;
+
+	list_for_each_entry_reverse(inst, &ipsec->inst_list, list) {
+		bond_ipsec_lag_call_inst(xs, inst, delete_state, free_state);
+	}
+}
+
+static int bond_ipsec_lag_add_inst(struct xfrm_state *xs,
+				   struct bond_ipsec_inst *inst,
+				   struct netlink_ext_ack *extack)
+{
+	unsigned long bond_handle = xs->xso.offload_handle;
+	struct net_device *bond_real_dev = xs->xso.real_dev;
+	const struct xfrmdev_ops *ops;
+	int err;
+
+	if (!bond_ipsec_lag_slave_ok(inst->real_dev))
+		return -EOPNOTSUPP;
+
+	ops = inst->real_dev->xfrmdev_ops;
+	xs->xso.real_dev = inst->real_dev;
+	xs->xso.offload_handle = 0;
+	err = ops->xdo_dev_state_add(inst->real_dev, xs, extack);
+	if (err)
+		goto out;
+
+	inst->lower_handle = xs->xso.offload_handle;
+	if (!inst->lower_handle) {
+		err = -EINVAL;
+		NL_SET_ERR_MSG_MOD(extack, "Slave did not return an IPsec offload handle");
+		if (ops->xdo_dev_state_delete)
+			ops->xdo_dev_state_delete(inst->real_dev, xs);
+		if (ops->xdo_dev_state_free)
+			ops->xdo_dev_state_free(inst->real_dev, xs);
+		goto out;
+	}
+
+	inst->deleted = false;
+	inst->added = true;
+
+out:
+	xs->xso.real_dev = bond_real_dev;
+	xs->xso.offload_handle = bond_handle;
+	return err;
+}
+
+static int bond_ipsec_lag_add_sa(struct net_device *bond_dev,
+				 struct xfrm_state *xs,
+				 struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct bond_ipsec_inst *inst;
+	struct bond_ipsec *ipsec;
+	struct list_head *iter;
+	struct slave *slave;
+	int err = 0;
+	int count = 0;
+
+	if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		NL_SET_ERR_MSG_MOD(extack, "LAG supports only XFRM crypto offload");
+		return -EOPNOTSUPP;
+	}
+
+	if (xs->props.flags & XFRM_STATE_ESN) {
+		NL_SET_ERR_MSG_MOD(extack, "LAG does not support XFRM ESN offload");
+		return -EOPNOTSUPP;
+	}
+
+	ipsec = kmalloc_obj(*ipsec);
+	if (!ipsec)
+		return -ENOMEM;
+
+	ipsec->xs = xs;
+	ipsec->replicated = true;
+	INIT_LIST_HEAD(&ipsec->list);
+	INIT_LIST_HEAD(&ipsec->inst_list);
+
+	/* Serialize with slave down/remove and LAG eligibility changes so they
+	 * cannot miss lower SAs installed before this state is published.
+	 */
+	mutex_lock(&bond->ipsec_lock);
+	if (bond->ipsec_lag_blocked) {
+		err = -EAGAIN;
+		NL_SET_ERR_MSG_MOD(extack, "Bond LAG XFRM state add is blocked");
+		goto err_free_unlock;
+	}
+	if (!(bond_dev->features & NETIF_F_HW_ESP)) {
+		err = -EOPNOTSUPP;
+		NL_SET_ERR_MSG_MOD(extack, "Bond IPsec offload is disabled");
+		goto err_free_unlock;
+	}
+	if (!bond_mode_can_use_lag_xfrm(bond)) {
+		err = -EAGAIN;
+		NL_SET_ERR_MSG_MOD(extack, "Bond LAG XFRM eligibility changed");
+		goto err_free_unlock;
+	}
+	rcu_read_lock();
+	bond_for_each_slave_rcu(bond, slave, iter) {
+		struct net_device *real_dev = slave->dev;
+
+		if (!netif_running(real_dev))
+			continue;
+
+		if (!bond_ipsec_lag_slave_ok(real_dev)) {
+			err = -EOPNOTSUPP;
+			break;
+		}
+
+		inst = kzalloc_obj(*inst, GFP_ATOMIC);
+		if (!inst) {
+			err = -ENOMEM;
+			break;
+		}
+
+		inst->real_dev = real_dev;
+		netdev_hold(real_dev, &inst->dev_tracker, GFP_ATOMIC);
+		list_add_tail(&inst->list, &ipsec->inst_list);
+		count++;
+	}
+	rcu_read_unlock();
+
+	if (!err && !count)
+		err = -ENODEV;
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			NL_SET_ERR_MSG_MOD(extack, "Not all slaves support IPsec offload");
+		goto err_free_unlock;
+	}
+
+	list_for_each_entry(inst, &ipsec->inst_list, list) {
+		err = bond_ipsec_lag_add_inst(xs, inst, extack);
+		if (err)
+			goto err_delete;
+	}
+
+	xs->xso.real_dev = NULL;
+	xs->xso.offload_handle = 0;
+	if (!bond_mode_can_use_lag_xfrm(bond)) {
+		err = -EAGAIN;
+		NL_SET_ERR_MSG_MOD(extack, "Bond LAG XFRM eligibility changed");
+		goto err_delete;
+	}
+	rcu_assign_pointer(xs->xso.upper_priv, ipsec);
+	list_add(&ipsec->list, &bond->ipsec_list);
+	mutex_unlock(&bond->ipsec_lock);
+
+	return 0;
+
+err_delete:
+	bond_ipsec_lag_call_state(xs, ipsec, true, true);
+	xs->xso.real_dev = NULL;
+	xs->xso.offload_handle = 0;
+	RCU_INIT_POINTER(xs->xso.upper_priv, NULL);
+err_free_unlock:
+	mutex_unlock(&bond->ipsec_lock);
+	bond_ipsec_lag_free_instances(ipsec);
+	kfree(ipsec);
+	return err;
+}
+ +static void bond_ipsec_lag_flush_pending(struct bonding *bond) +{ + struct bond_ipsec *ipsec, *tmp; + + /* Caller must hold ipsec_lock to serialize with LAG SA add. */ + list_for_each_entry_safe(ipsec, tmp, &bond->ipsec_list, list) { + struct xfrm_dev_offload *xso; + struct xfrm_state *xs; + struct net *net; + bool pending; + + if (!ipsec->replicated) + continue; + + xs =3D ipsec->xs; + net =3D xs_net(xs); + spin_lock_bh(&net->xfrm.xfrm_state_lock); + pending =3D hlist_unhashed(&xs->bydst) && + xs->km.state !=3D XFRM_STATE_DEAD; + spin_unlock_bh(&net->xfrm.xfrm_state_lock); + if (!pending) + continue; + + xso =3D &xs->xso; + list_del(&ipsec->list); + RCU_INIT_POINTER(xso->upper_priv, NULL); + bond_ipsec_lag_call_state(xs, ipsec, true, true); + bond_ipsec_lag_free_instances(ipsec); + call_rcu(&ipsec->rcu, bond_ipsec_rcu_free); + + xso->real_dev =3D NULL; + xso->offload_handle =3D 0; + if (xso->dev =3D=3D bond->dev) { + WRITE_ONCE(xso->dev, NULL); + xso->dir =3D 0; + xso->type =3D XFRM_DEV_OFFLOAD_UNSPECIFIED; + netdev_put(bond->dev, &xso->dev_tracker); + xfrm_unset_type_offload(xs); + } + } +} + +void bond_ipsec_lag_begin_flush(struct bonding *bond) +{ + mutex_lock(&bond->ipsec_lock); + bond->ipsec_lag_blocked =3D true; + bond_ipsec_lag_flush_pending(bond); + mutex_unlock(&bond->ipsec_lock); +} + +void bond_ipsec_lag_end_flush(struct bonding *bond) +{ + mutex_lock(&bond->ipsec_lock); + bond->ipsec_lag_blocked =3D false; + mutex_unlock(&bond->ipsec_lock); +} + +static void bond_ipsec_lag_remove_slave(struct bonding *bond, + struct net_device *real_dev) +{ + struct bond_ipsec_inst *inst, *tmp; + struct bond_ipsec *ipsec; + bool removed =3D false; + + if (!bond_mode_can_use_lag_xfrm(bond)) + return; + + mutex_lock(&bond->ipsec_lock); + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (!ipsec->replicated) + continue; + + list_for_each_entry(inst, &ipsec->inst_list, list) { + if (inst->real_dev !=3D real_dev) + continue; + + WRITE_ONCE(inst->added, false); + 
removed =3D true; + } + } + if (!removed) + goto out; + + synchronize_net(); + + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (!ipsec->replicated) + continue; + + list_for_each_entry_safe(inst, tmp, &ipsec->inst_list, list) { + if (inst->real_dev !=3D real_dev) + continue; + + bond_ipsec_lag_call_inst(ipsec->xs, inst, true, true); + list_del_rcu(&inst->list); + call_rcu(&inst->rcu, bond_ipsec_inst_rcu_free); + } + } +out: + mutex_unlock(&bond->ipsec_lock); +} + +static int bond_ipsec_lag_add_slave(struct bonding *bond, + struct slave *slave, + struct netlink_ext_ack *extack) +{ + struct net_device *real_dev =3D slave->dev; + struct bond_ipsec_inst *inst; + struct bond_ipsec *ipsec; + bool have_states =3D false; + bool slave_ok; + int err =3D 0; + + if (!bond_mode_can_use_lag_xfrm(bond) || !netif_running(real_dev)) + return 0; + + slave_ok =3D bond_ipsec_lag_slave_ok(real_dev); + + mutex_lock(&bond->ipsec_lock); + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + bool found =3D false; + + if (!ipsec->replicated) + continue; + have_states =3D true; + + if (ipsec->xs->km.state =3D=3D XFRM_STATE_DEAD) + continue; + + if (!slave_ok) { + err =3D -EOPNOTSUPP; + break; + } + + list_for_each_entry(inst, &ipsec->inst_list, list) { + if (inst->real_dev =3D=3D real_dev) { + found =3D true; + break; + } + } + if (found) + continue; + + inst =3D kzalloc_obj(*inst, GFP_KERNEL); + if (!inst) { + err =3D -ENOMEM; + break; + } + + inst->real_dev =3D real_dev; + netdev_hold(real_dev, &inst->dev_tracker, GFP_KERNEL); + + err =3D bond_ipsec_lag_add_inst(ipsec->xs, inst, extack); + if (err) { + netdev_put(real_dev, &inst->dev_tracker); + kfree(inst); + break; + } + + list_add_tail_rcu(&inst->list, &ipsec->inst_list); + } + mutex_unlock(&bond->ipsec_lock); + + if (err && have_states) { + slave_warn(bond->dev, real_dev, + "failed to replicate IPsec SA, flushing bond states\n"); + bond_ipsec_lag_begin_flush(bond); + xfrm_dev_state_flush(dev_net(bond->dev), bond->dev, 
true); + bond_ipsec_lag_end_flush(bond); + } + + return err; +} + /** * bond_ipsec_add_sa - program device with a security association * @bond_dev: pointer to the bond net device @@ -475,8 +901,15 @@ static int bond_ipsec_add_sa(struct net_device *bond_d= ev, if (!bond_dev) return -EINVAL; =20 - rcu_read_lock(); bond =3D netdev_priv(bond_dev); + if (bond_mode_can_use_lag_xfrm(bond)) + return bond_ipsec_lag_add_sa(bond_dev, xs, extack); + if (BOND_MODE(bond) !=3D BOND_MODE_ACTIVEBACKUP) { + NL_SET_ERR_MSG_MOD(extack, "Bond mode does not support IPsec offload"); + return -EOPNOTSUPP; + } + + rcu_read_lock(); slave =3D rcu_dereference(bond->curr_active_slave); real_dev =3D slave ? slave->dev : NULL; netdev_hold(real_dev, &tracker, GFP_ATOMIC); @@ -504,7 +937,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_de= v, if (!err) { xs->xso.real_dev =3D real_dev; ipsec->xs =3D xs; + ipsec->replicated =3D false; INIT_LIST_HEAD(&ipsec->list); + INIT_LIST_HEAD(&ipsec->inst_list); mutex_lock(&bond->ipsec_lock); list_add(&ipsec->list, &bond->ipsec_list); mutex_unlock(&bond->ipsec_lock); @@ -523,6 +958,9 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) struct bond_ipsec *ipsec; struct slave *slave; =20 + if (BOND_MODE(bond) !=3D BOND_MODE_ACTIVEBACKUP) + return; + slave =3D rtnl_dereference(bond->curr_active_slave); real_dev =3D slave ? 
slave->dev : NULL; if (!real_dev) @@ -540,6 +978,9 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) } =20 list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (ipsec->replicated) + continue; + /* If new state is added before ipsec_lock acquired */ if (ipsec->xs->xso.real_dev =3D=3D real_dev) continue; @@ -568,6 +1009,19 @@ static void bond_ipsec_add_sa_all(struct bonding *bon= d) mutex_unlock(&bond->ipsec_lock); } =20 +static struct bond_ipsec *bond_ipsec_find(struct bonding *bond, + struct xfrm_state *xs) +{ + struct bond_ipsec *ipsec; + + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (ipsec->xs =3D=3D xs) + return ipsec; + } + + return NULL; +} + /** * bond_ipsec_del_sa - clear out this specific SA * @bond_dev: pointer to the bond net device @@ -577,8 +1031,24 @@ static void bond_ipsec_del_sa(struct net_device *bond= _dev, struct xfrm_state *xs) { struct net_device *real_dev; + struct bond_ipsec *ipsec; + struct bonding *bond; + + if (!bond_dev) + return; + + bond =3D netdev_priv(bond_dev); =20 - if (!bond_dev || !xs->xso.real_dev) + mutex_lock(&bond->ipsec_lock); + ipsec =3D bond_ipsec_find(bond, xs); + if (ipsec && ipsec->replicated) { + bond_ipsec_lag_call_state(xs, ipsec, true, false); + mutex_unlock(&bond->ipsec_lock); + return; + } + mutex_unlock(&bond->ipsec_lock); + + if (!xs->xso.real_dev) return; =20 real_dev =3D xs->xso.real_dev; @@ -600,6 +1070,9 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) struct bond_ipsec *ipsec; struct slave *slave; =20 + if (BOND_MODE(bond) !=3D BOND_MODE_ACTIVEBACKUP) + return; + slave =3D rtnl_dereference(bond->curr_active_slave); real_dev =3D slave ? 
slave->dev : NULL; if (!real_dev) @@ -607,6 +1080,9 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) =20 mutex_lock(&bond->ipsec_lock); list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (ipsec->replicated) + continue; + if (!ipsec->xs->xso.real_dev) continue; =20 @@ -647,23 +1123,33 @@ static void bond_ipsec_free_sa(struct net_device *bo= nd_dev, bond =3D netdev_priv(bond_dev); =20 mutex_lock(&bond->ipsec_lock); - if (!xs->xso.real_dev) + ipsec =3D bond_ipsec_find(bond, xs); + if (ipsec && ipsec->replicated) { + list_del(&ipsec->list); + RCU_INIT_POINTER(xs->xso.upper_priv, NULL); + bond_ipsec_lag_call_state(xs, ipsec, false, true); + bond_ipsec_lag_free_instances(ipsec); + call_rcu(&ipsec->rcu, bond_ipsec_rcu_free); + xs->xso.real_dev =3D NULL; + xs->xso.offload_handle =3D 0; goto out; + } =20 real_dev =3D xs->xso.real_dev; + if (!real_dev) + goto free_ipsec; =20 xs->xso.real_dev =3D NULL; if (real_dev->xfrmdev_ops && real_dev->xfrmdev_ops->xdo_dev_state_free) real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs); -out: - list_for_each_entry(ipsec, &bond->ipsec_list, list) { - if (ipsec->xs =3D=3D xs) { - list_del(&ipsec->list); - kfree(ipsec); - break; - } + +free_ipsec: + if (ipsec) { + list_del(&ipsec->list); + kfree(ipsec); } +out: mutex_unlock(&bond->ipsec_lock); } =20 @@ -674,7 +1160,17 @@ static void bond_ipsec_free_sa(struct net_device *bon= d_dev, **/ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *= xs) { + struct net_device *bond_dev =3D xs->xso.dev; struct net_device *real_dev; + struct bonding *bond; + + if (!bond_dev) + return false; + + bond =3D netdev_priv(bond_dev); + if (bond_mode_can_use_lag_xfrm(bond)) + return xs->xso.type =3D=3D XFRM_DEV_OFFLOAD_CRYPTO && + rcu_access_pointer(xs->xso.upper_priv); =20 rcu_read_lock(); real_dev =3D bond_ipsec_dev(xs); @@ -735,6 +1231,47 @@ static void bond_xfrm_update_stats(struct xfrm_state = *xs) rcu_read_unlock(); } =20 +/* + * xdo_dev_state_lower_handle 
implementation for bond-owned XFRM states.
+ * lower_dev is the slave selected by the lower driver datapath. Replicated LAG
+ * state is resolved from the bond private instance list. Single-lower
+ * active-backup state is resolved from xso.real_dev/offload_handle here because
+ * xfrm_dev_state_lower_handle() delegates all bond-owned lookups to bonding.
+ */
+static unsigned long bond_ipsec_lower_handle(struct net_device *bond_dev,
+					     struct xfrm_state *xs,
+					     struct net_device *lower_dev)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct bond_ipsec_inst *inst;
+	struct bond_ipsec *ipsec;
+	unsigned long handle = 0;
+
+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) {
+		struct net_device *real_dev = READ_ONCE(xs->xso.real_dev);
+
+		return real_dev == lower_dev ? READ_ONCE(xs->xso.offload_handle) : 0;
+	}
+	if (!bond_mode_can_use_lag_xfrm(bond))
+		return 0;
+
+	rcu_read_lock();
+	ipsec = rcu_dereference(xs->xso.upper_priv);
+	if (!ipsec || !ipsec->replicated || ipsec->xs != xs)
+		goto out;
+
+	list_for_each_entry_rcu(inst, &ipsec->inst_list, list) {
+		if (READ_ONCE(inst->added) && inst->real_dev == lower_dev) {
+			handle = inst->lower_handle;
+			break;
+		}
+	}
+
+out:
+	rcu_read_unlock();
+	return handle;
+}
+
 static const struct xfrmdev_ops bond_xfrmdev_ops = {
 	.xdo_dev_state_add = bond_ipsec_add_sa,
 	.xdo_dev_state_delete = bond_ipsec_del_sa,
@@ -742,7 +1279,25 @@ static const struct xfrmdev_ops bond_xfrmdev_ops = {
 	.xdo_dev_offload_ok = bond_ipsec_offload_ok,
 	.xdo_dev_state_advance_esn = bond_advance_esn_state,
 	.xdo_dev_state_update_stats = bond_xfrm_update_stats,
+	.xdo_dev_state_lower_handle = bond_ipsec_lower_handle,
 };
+#else
+static void bond_ipsec_lag_remove_slave(struct bonding *bond,
+					struct net_device *real_dev)
+{
+}
+
+static int bond_ipsec_lag_add_slave(struct bonding *bond,
+				    struct slave *slave,
+				    struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static void bond_sync_slave_xfrm_features(struct
bonding *bond,
+					  struct slave *slave)
+{
+}
 #endif /* CONFIG_XFRM_OFFLOAD */
 
 /*------------------------------- Link status -------------------------------*/
@@ -6006,10 +6561,11 @@ void bond_setup(struct net_device *bond_dev)
 	bond_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 
 #ifdef CONFIG_XFRM_OFFLOAD
-	/* set up xfrm device ops (only supported in active-backup right now) */
+	/* set up xfrm device ops */
 	bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
 	INIT_LIST_HEAD(&bond->ipsec_list);
 	mutex_init(&bond->ipsec_lock);
+	bond->ipsec_lag_blocked = false;
 #endif /* CONFIG_XFRM_OFFLOAD */
 
 	/* don't acquire bond device's netif_tx_lock when transmitting */
diff --git a/include/net/bonding.h b/include/net/bonding.h
index edd1942dcd73..a581252b5b06 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -203,9 +203,24 @@ struct bond_up_slave {
  */
 #define BOND_LINK_NOCHANGE -1
 
+/* XFRM offload state tracked by bonding for one xfrm_state. */
 struct bond_ipsec {
 	struct list_head list;
 	struct xfrm_state *xs;
+	struct list_head inst_list;
+	struct rcu_head rcu;
+	bool replicated;
+};
+
+/* Per-lower-device instance of a replicated LAG XFRM state.
*/
+struct bond_ipsec_inst {
+	struct list_head list;
+	struct net_device *real_dev;
+	netdevice_tracker dev_tracker;
+	unsigned long lower_handle;
+	struct rcu_head rcu;
+	bool added;
+	bool deleted;
 };
 
 /*
@@ -259,8 +274,9 @@ struct bonding {
 	struct rtnl_link_stats64 bond_stats;
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct list_head ipsec_list;
-	/* protecting ipsec_list */
+	/* protecting ipsec_list and ipsec_lag_blocked */
 	struct mutex ipsec_lock;
+	bool ipsec_lag_blocked;
 #endif /* CONFIG_XFRM_OFFLOAD */
 	struct bpf_prog *xdp_prog;
 };
@@ -325,6 +341,13 @@ static inline bool bond_mode_can_use_xmit_hash(const struct bonding *bond)
 		BOND_MODE(bond) == BOND_MODE_ALB);
 }
 
+static inline bool bond_mode_can_use_lag_xfrm(const struct bonding *bond)
+{
+	return (BOND_MODE(bond) == BOND_MODE_8023AD ||
+		BOND_MODE(bond) == BOND_MODE_XOR) &&
+	       bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
+}
+
 static inline bool bond_mode_uses_xmit_hash(const struct bonding *bond)
 {
 	return (BOND_MODE(bond) == BOND_MODE_8023AD ||
@@ -712,6 +735,10 @@ void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
 void bond_peer_notify_work_rearm(struct bonding *bond, unsigned long delay);
 void bond_work_init_all(struct bonding *bond);
 void bond_work_cancel_all(struct bonding *bond);
+#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
+void bond_ipsec_lag_begin_flush(struct bonding *bond);
+void bond_ipsec_lag_end_flush(struct bonding *bond);
+#endif
 
 #ifdef CONFIG_PROC_FS
 void bond_create_proc_entry(struct bonding *bond);
-- 
2.53.0

From nobody Mon May 25 00:08:04 2026
From: Jihong Min
To: netdev@vger.kernel.org
Cc: Jay Vosburgh, Andrew Lunn, "David S.
Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Steffen Klassert, Herbert Xu, linux-kernel@vger.kernel.org, Jihong Min
Subject: [PATCH RFC net-next 3/4] bonding: expose user-controlled IPsec features for LAG
Date: Wed, 20 May 2026 17:10:03 +0900
Message-ID: <20260520081004.2232091-4-hurryman2212@gmail.com>
In-Reply-To: <20260520081004.2232091-1-hurryman2212@gmail.com>
References: <20260520081004.2232091-1-hurryman2212@gmail.com>

Expose LAG IPsec offload as user-controlled bonding features instead of
enabling it by default. Keep the existing active-backup default behavior,
but make newly eligible LAG bonds start with ESP/XFRM features explicitly
disabled so users opt in with ethtool.

Let 802.3ad and balance-xor with layer3+4 advertise the intersection of
XFRM features across running eligible slaves, with supported features shown
as mutable off features rather than fixed-off capabilities.

Propagate mutable XFRM feature requests to running lower devices, verify
that requested features are actually enabled, and roll lower devices back
if propagation fails. Disable dependent ESP checksum and segmentation
features when HW ESP is not available.
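
As a rough illustration of the feature-intersection rule described above,
here is a small userspace model. It is only a sketch: the F_* flags,
struct lower, and lag_xfrm_features() are hypothetical stand-ins invented
for this example, not kernel symbols or kernel bit values. The real logic
is the bond_fix_features() change in this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for NETIF_F_HW_ESP and friends (not kernel values). */
#define F_HW_ESP         (1u << 0)
#define F_HW_ESP_TX_CSUM (1u << 1)
#define F_GSO_ESP        (1u << 2)
#define F_XFRM_ALL       (F_HW_ESP | F_HW_ESP_TX_CSUM | F_GSO_ESP)

/* Hypothetical model of one lower device. */
struct lower {
	bool running;         /* models netif_running() */
	bool has_ops;         /* models "lower driver has the needed xfrm ops" */
	uint32_t features;    /* features currently enabled */
	uint32_t hw_features; /* features the user may toggle */
};

/* Return the XFRM feature bits the modelled bond may advertise for @mask. */
static uint32_t lag_xfrm_features(const struct lower *lowers, size_t n,
				  uint32_t mask)
{
	uint32_t common = F_XFRM_ALL;
	size_t counted = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t usable;

		if (!lowers[i].running)
			continue; /* only running lowers take part */

		/* Already enabled, or could be enabled on request. */
		usable = (lowers[i].features |
			  (lowers[i].hw_features & mask)) & F_XFRM_ALL;

		/* One lower without HW ESP or ops disables everything. */
		if (!(usable & F_HW_ESP) || !lowers[i].has_ops)
			return 0;

		common &= usable;
		counted++;
	}

	if (!counted)
		return 0; /* no running lower, nothing to advertise */

	common &= mask;

	/* Checksum and GSO offload for ESP depend on HW ESP itself. */
	if (!(common & F_HW_ESP))
		common &= ~(F_HW_ESP_TX_CSUM | F_GSO_ESP);

	return common;
}
```

In this model a lower that is down is skipped, while a running lower that
cannot do HW ESP clears the whole set, which mirrors how lag_xfrm_ok and
lag_xfrm_slaves are used in the hunk below.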

Assisted-by: Codex:gpt-5.5
Signed-off-by: Jihong Min
---
 drivers/net/bonding/bond_main.c    | 232 +++++++++++++++++++++++++++++
 drivers/net/bonding/bond_options.c |   2 +-
 2 files changed, 233 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 66435de852e9..d81dae5a1902 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2048,6 +2048,13 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
 	struct list_head *iter;
 	netdev_features_t mask;
 	struct slave *slave;
+#ifdef CONFIG_XFRM_OFFLOAD
+	netdev_features_t lag_xfrm_features = BOND_XFRM_FEATURES;
+	bool ab_xfrm = BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP;
+	bool lag_xfrm_ok = true;
+	bool lag_xfrm = bond_mode_can_use_lag_xfrm(bond);
+	int lag_xfrm_slaves = 0;
+#endif /* CONFIG_XFRM_OFFLOAD */
 
 	mask = features;
 	features = netdev_base_features(features);
@@ -2056,12 +2063,234 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
 		features = netdev_increment_features(features,
 						     slave->dev->features,
 						     mask);
+#ifdef CONFIG_XFRM_OFFLOAD
+		if (lag_xfrm && (mask & BOND_XFRM_FEATURES) &&
+		    netif_running(slave->dev)) {
+			netdev_features_t slave_xfrm_features;
+			netdev_features_t slave_xfrm_enableable;
+			netdev_features_t missing;
+
+			slave_xfrm_features = slave->dev->features &
+					      BOND_XFRM_FEATURES;
+			slave_xfrm_enableable = slave->dev->hw_features &
+						mask & BOND_XFRM_FEATURES;
+			slave_xfrm_features |= slave_xfrm_enableable;
+			missing = (BOND_XFRM_FEATURES & mask) &
+				  ~slave_xfrm_features;
+			if (missing)
+				slave_dbg(dev, slave->dev,
+					  "missing LAG XFRM feature(s) %pNF\n",
+					  &missing);
+			lag_xfrm_features &= slave_xfrm_features;
+
+			if (!(slave_xfrm_features & NETIF_F_HW_ESP) ||
+			    !bond_ipsec_lag_slave_has_ops(slave->dev)) {
+				slave_dbg(dev, slave->dev,
+					  "missing LAG XFRM offload ops\n");
+				lag_xfrm_ok = false;
+			}
+			lag_xfrm_slaves++;
+		}
+#endif /*
CONFIG_XFRM_OFFLOAD */
 	}
 	features = netdev_add_tso_features(features, mask);
 
+#ifdef CONFIG_XFRM_OFFLOAD
+	if (!ab_xfrm && !lag_xfrm)
+		features &= ~BOND_XFRM_FEATURES;
+	else if (lag_xfrm && (!lag_xfrm_ok || !lag_xfrm_slaves))
+		features &= ~BOND_XFRM_FEATURES;
+	else if (lag_xfrm)
+		features = (features & ~BOND_XFRM_FEATURES) |
+			   (lag_xfrm_features & mask);
+	if (!(features & NETIF_F_HW_ESP))
+		features &= ~(NETIF_F_HW_ESP_TX_CSUM | NETIF_F_GSO_ESP);
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 	return features;
 }
 
+#ifdef CONFIG_XFRM_OFFLOAD
+static int bond_set_slave_xfrm_features(struct bonding *bond,
+					struct slave *slave,
+					netdev_features_t features)
+{
+	struct net_device *real_dev = slave->dev;
+	netdev_features_t xfrm_features;
+	netdev_features_t mutable;
+	bool notifier_ctx;
+	int err = 0;
+
+	mutable = real_dev->hw_features & BOND_XFRM_FEATURES;
+	if (!mutable)
+		return 0;
+
+	xfrm_features = features & BOND_XFRM_FEATURES;
+
+	notifier_ctx = bond->notifier_ctx;
+	bond->notifier_ctx = true;
+	netdev_lock_ops(real_dev);
+	real_dev->wanted_features &= ~mutable;
+	real_dev->wanted_features |= xfrm_features & mutable;
+	err = __netdev_update_features(real_dev);
+	if (err)
+		netdev_features_change(real_dev);
+	netdev_unlock_ops(real_dev);
+	bond->notifier_ctx = notifier_ctx;
+
+	return err < 0 ?
err : 0;
+}
+
+static void bond_restore_slave_xfrm_features(struct bonding *bond,
+					     netdev_features_t features)
+{
+	struct list_head *iter;
+	struct slave *slave;
+
+	bond_for_each_slave(bond, slave, iter) {
+		if (!netif_running(slave->dev))
+			continue;
+
+		bond_set_slave_xfrm_features(bond, slave, features);
+	}
+}
+
+static void bond_sync_slave_xfrm_features(struct bonding *bond,
+					  struct slave *slave)
+{
+	netdev_features_t requested = bond->dev->wanted_features;
+	netdev_features_t old_wanted = slave->dev->wanted_features;
+	netdev_features_t available;
+	netdev_features_t missing;
+	int err;
+
+	if (!bond_mode_can_use_lag_xfrm(bond))
+		return;
+
+	if (!netif_running(slave->dev))
+		return;
+
+	requested &= BOND_XFRM_FEATURES;
+	if (!requested)
+		return;
+
+	available = slave->dev->features | slave->dev->hw_features;
+	missing = requested & ~available;
+	if ((requested & NETIF_F_HW_ESP) &&
+	    !bond_ipsec_lag_slave_has_ops(slave->dev))
+		missing |= NETIF_F_HW_ESP;
+	if (missing)
+		goto disable_missing;
+
+	err = bond_set_slave_xfrm_features(bond, slave, requested);
+	missing = requested & ~slave->dev->features;
+	if (err && !missing)
+		missing = requested;
+	if (!missing)
+		return;
+
+	bond_set_slave_xfrm_features(bond, slave, old_wanted);
+
+disable_missing:
+	if (missing & NETIF_F_HW_ESP)
+		missing |= BOND_XFRM_FEATURES;
+	slave_warn(bond->dev, slave->dev,
+		   "disabling XFRM feature(s) %pNF after slave enable failed\n",
+		   &missing);
+	bond->dev->wanted_features &= ~missing;
+}
+
+static int bond_set_features(struct net_device *dev, netdev_features_t features)
+{
+	struct bonding *bond = netdev_priv(dev);
+	netdev_features_t changed;
+	netdev_features_t enabled;
+	struct list_head *iter;
+	struct slave *slave;
+	int err = 0;
+
+	if (!bond_mode_can_use_lag_xfrm(bond))
+		return 0;
+
+	changed = (dev->features ^ features) & BOND_XFRM_FEATURES;
+	if (!changed)
+		return 0;
+
+	enabled = features & BOND_XFRM_FEATURES;
+	if (enabled) {
+
		int targets = 0;
+
+		bond_for_each_slave(bond, slave, iter) {
+			netdev_features_t available;
+			netdev_features_t missing;
+
+			if (!netif_running(slave->dev))
+				continue;
+
+			available = slave->dev->features | slave->dev->hw_features;
+			missing = enabled & ~available;
+			if ((enabled & NETIF_F_HW_ESP) &&
+			    !bond_ipsec_lag_slave_has_ops(slave->dev))
+				missing |= NETIF_F_HW_ESP;
+			if (missing) {
+				slave_warn(dev, slave->dev,
+					   "missing XFRM feature(s) %pNF\n",
+					   &missing);
+				return -EOPNOTSUPP;
+			}
+			targets++;
+		}
+		if (!targets)
+			return -EOPNOTSUPP;
+	}
+
+	if ((dev->features & NETIF_F_HW_ESP) &&
+	    !(features & NETIF_F_HW_ESP)) {
+		bond_ipsec_lag_begin_flush(bond);
+		xfrm_dev_state_flush(dev_net(dev), dev, true);
+	}
+
+	bond_for_each_slave(bond, slave, iter) {
+		if (!netif_running(slave->dev))
+			continue;
+
+		err = bond_set_slave_xfrm_features(bond, slave, features);
+		if (err)
+			break;
+	}
+	if (err) {
+		bond_restore_slave_xfrm_features(bond, dev->features);
+		if ((dev->features & NETIF_F_HW_ESP) &&
+		    !(features & NETIF_F_HW_ESP))
+			bond_ipsec_lag_end_flush(bond);
+		return err;
+	}
+
+	bond_for_each_slave(bond, slave, iter) {
+		netdev_features_t missing = enabled & ~slave->dev->features;
+
+		if (!netif_running(slave->dev))
+			continue;
+
+		if (missing) {
+			slave_warn(dev, slave->dev,
+				   "failed to enable XFRM feature(s) %pNF\n",
+				   &missing);
+			bond_restore_slave_xfrm_features(bond, dev->features);
+			if ((dev->features & NETIF_F_HW_ESP) &&
+			    !(features & NETIF_F_HW_ESP))
+				bond_ipsec_lag_end_flush(bond);
+			return -EOPNOTSUPP;
+		}
+	}
+
+	if (features & NETIF_F_HW_ESP)
+		bond_ipsec_lag_end_flush(bond);
+
+	return 0;
+}
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 static int bond_header_create(struct sk_buff *skb, struct net_device *bond_dev,
 			      unsigned short type, const void *daddr,
 			      const void *saddr, unsigned int len)
@@ -6510,6 +6739,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_add_slave		= bond_enslave,
 	.ndo_del_slave		= bond_release,
 	.ndo_fix_features	= bond_fix_features,
+#ifdef CONFIG_XFRM_OFFLOAD
+	.ndo_set_features	= bond_set_features,
+#endif /* CONFIG_XFRM_OFFLOAD */
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 7380cc4ee75a..634b42c0d8e9 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -885,7 +885,7 @@ static bool bond_set_xfrm_features(struct bonding *bond)
 
 	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
 		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-	else
+	else if (!bond_mode_can_use_lag_xfrm(bond))
 		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
 
 	return true;
-- 
2.53.0

From nobody Mon May 25 00:08:04 2026
From: Jihong Min
To: netdev@vger.kernel.org
Cc: Jay Vosburgh, Andrew Lunn, "David S.
Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Steffen Klassert, Herbert Xu, linux-kernel@vger.kernel.org, Jihong Min
Subject: [PATCH RFC net-next 4/4] bonding: handle replicated IPsec SAs across LAG changes
Date: Wed, 20 May 2026 17:10:04 +0900
Message-ID: <20260520081004.2232091-5-hurryman2212@gmail.com>
In-Reply-To: <20260520081004.2232091-1-hurryman2212@gmail.com>
References: <20260520081004.2232091-1-hurryman2212@gmail.com>

Keep replicated bonding IPsec state consistent as the LAG changes. Add
newly usable slaves to existing replicated states, remove only the
departing lower instance on down/remove, and update the usable slave array
before hiding lower handles.

Flush bond-owned XFRM offload state when mode or hash policy leaves the LAG
XFRM eligible configuration. Block new LAG offload adds while pending
replicated states are cleaned up and the XFRM table is flushed.
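
The flush condition described above can be sketched as a small userspace
model. This is an illustration only: enum mode, enum policy, struct cfg,
can_use_lag_xfrm(), and must_flush() are hypothetical names made up for
this example. The eligibility test is meant to mirror
bond_mode_can_use_lag_xfrm() from this series, and the old/new comparison
mirrors what the option setters in the diff below do around the
bond_ipsec_lag_begin_flush()/bond_ipsec_lag_end_flush() pair.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for BOND_MODE_* and BOND_XMIT_POLICY_* values. */
enum mode { MODE_ACTIVEBACKUP, MODE_XOR, MODE_8023AD, MODE_OTHER };
enum policy { POLICY_LAYER2, POLICY_LAYER34 };

/* Hypothetical snapshot of the two options that decide eligibility. */
struct cfg {
	enum mode mode;
	enum policy policy;
};

/* Eligible only for 802.3ad or balance-xor combined with layer3+4 hashing. */
static bool can_use_lag_xfrm(struct cfg c)
{
	return (c.mode == MODE_8023AD || c.mode == MODE_XOR) &&
	       c.policy == POLICY_LAYER34;
}

/*
 * A flush is needed only when the configuration was eligible before the
 * option change and is no longer eligible afterwards.
 */
static bool must_flush(struct cfg old_cfg, struct cfg new_cfg)
{
	return can_use_lag_xfrm(old_cfg) && !can_use_lag_xfrm(new_cfg);
}
```

Moving between two eligible configurations (for example balance-xor to
802.3ad, both with layer3+4) does not trigger a flush in this model, and
neither does entering an eligible configuration from an ineligible one.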

Assisted-by: Codex:gpt-5.5
Signed-off-by: Jihong Min
---
 drivers/net/bonding/bond_main.c    | 45 ++++++++++++++++++++---
 drivers/net/bonding/bond_options.c | 57 ++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d81dae5a1902..0243950c2fa6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3104,6 +3104,18 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		bpf_prog_inc(bond->xdp_prog);
 	}
 
+#ifdef CONFIG_XFRM_OFFLOAD
+	if ((bond_dev->wanted_features & BOND_XFRM_FEATURES) &&
+	    bond_mode_can_use_lag_xfrm(bond)) {
+		bond_sync_slave_xfrm_features(bond, new_slave);
+		bond->notifier_ctx = true;
+		netdev_compute_master_upper_features(bond->dev, true);
+		bond->notifier_ctx = false;
+	}
+#endif /* CONFIG_XFRM_OFFLOAD */
+
+	bond_ipsec_lag_add_slave(bond, new_slave, extack);
+
 	/* broadcast mode uses the all_slaves to loop through slaves. */
 	if (bond_mode_can_use_xmit_hash(bond) ||
 	    BOND_MODE(bond) == BOND_MODE_BROADCAST)
@@ -3222,6 +3234,9 @@ static int __bond_release_one(struct net_device *bond_dev,
 	}
 
 	bond_set_slave_inactive_flags(slave, BOND_SLAVE_NOTIFY_NOW);
+	if (bond_mode_can_use_xmit_hash(bond) ||
+	    BOND_MODE(bond) == BOND_MODE_BROADCAST)
+		bond_update_slave_arr(bond, slave);
 
 	bond_sysfs_slave_del(slave);
 
@@ -3239,8 +3254,10 @@ static int __bond_release_one(struct net_device *bond_dev,
 		slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
 	}
 
-	/* unregister rx_handler early so bond_handle_frame wouldn't be called
-	 * for this slave anymore.
+	bond_ipsec_lag_remove_slave(bond, slave_dev);
+
+	/* unregister rx_handler after lower IPsec state is gone so RX cannot
+	 * bypass the bond while a bond-owned SA is still installed.
 	 */
 	netdev_rx_handler_unregister(slave_dev);
 
@@ -4758,8 +4775,13 @@ static int bond_slave_netdev_event(unsigned long event,
 
 		if (BOND_MODE(bond) == BOND_MODE_8023AD)
 			bond_3ad_adapter_speed_duplex_changed(slave);
-		fallthrough;
-	case NETDEV_DOWN:
+		bond_sync_slave_xfrm_features(bond, slave);
+		if (bond_mode_can_use_lag_xfrm(bond)) {
+			bond->notifier_ctx = true;
+			netdev_compute_master_upper_features(bond->dev, true);
+			bond->notifier_ctx = false;
+		}
+		bond_ipsec_lag_add_slave(bond, slave, NULL);
 		/* Refresh slave-array if applicable!
 		 * If the setup does not use miimon or arpmon (mode-specific!),
 		 * then these events will not cause the slave-array to be
@@ -4771,6 +4793,19 @@ static int bond_slave_netdev_event(unsigned long event,
 		if (bond_mode_can_use_xmit_hash(bond))
 			bond_update_slave_arr(bond, NULL);
 		break;
+	case NETDEV_DOWN:
+		/* Refresh slave-array before deleting IPsec state so no new
+		 * TX path picks this slave after its offload handle is hidden.
+		 */
+		if (bond_mode_can_use_xmit_hash(bond))
+			bond_update_slave_arr(bond, slave);
+		bond_ipsec_lag_remove_slave(bond, slave_dev);
+		if (bond_mode_can_use_lag_xfrm(bond)) {
+			bond->notifier_ctx = true;
+			netdev_compute_master_upper_features(bond->dev, true);
+			bond->notifier_ctx = false;
+		}
+		break;
 	case NETDEV_CHANGEMTU:
 		/* TODO: Should slaves be allowed to
 		 * independently alter their MTU?
For
@@ -4809,10 +4844,12 @@ static int bond_slave_netdev_event(unsigned long event,
 		break;
 	case NETDEV_FEAT_CHANGE:
 		if (!bond->notifier_ctx) {
+			bond_sync_slave_xfrm_features(bond, slave);
 			bond->notifier_ctx = true;
 			netdev_compute_master_upper_features(bond->dev, true);
 			bond->notifier_ctx = false;
 		}
+		bond_ipsec_lag_add_slave(bond, slave, NULL);
 		break;
 	case NETDEV_RESEND_IGMP:
 		/* Propagate to master device */
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 634b42c0d8e9..ee3ffc698d7d 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -17,6 +17,7 @@
 
 #include
 #include
+#include
 
 static int bond_option_active_slave_set(struct bonding *bond,
 					const struct bond_opt_value *newval);
@@ -894,6 +895,13 @@ static bool bond_set_xfrm_features(struct bonding *bond)
 static int bond_option_mode_set(struct bonding *bond,
 				const struct bond_opt_value *newval)
 {
+#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
+	bool old_ab_xfrm = BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP;
+	bool old_lag_xfrm = bond_mode_can_use_lag_xfrm(bond);
+	bool new_lag_xfrm;
+	bool flush_lag_xfrm = false;
+#endif
+
 	if (bond->xdp_prog && !bond_xdp_check(bond, newval->value))
 		return -EOPNOTSUPP;
 
@@ -918,8 +926,26 @@ static int bond_option_mode_set(struct bonding *bond,
 
 	/* don't cache arp_validate between modes */
 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
+
 	bond->params.mode = newval->value;
 
+#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
+	new_lag_xfrm = bond_mode_can_use_lag_xfrm(bond);
+	if (old_ab_xfrm && new_lag_xfrm)
+		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
+	if (old_lag_xfrm && !new_lag_xfrm) {
+		bond_ipsec_lag_begin_flush(bond);
+		flush_lag_xfrm = true;
+	}
+
+	if (flush_lag_xfrm) {
+		if (bond->dev->reg_state == NETREG_REGISTERED)
+			xfrm_dev_state_flush(dev_net(bond->dev), bond->dev,
+					     true);
+		bond_ipsec_lag_end_flush(bond);
+	}
+#endif
+
 	/* When changing mode, the bond
device is down, we may reduce
 	 * the bond_bcast_neigh_enabled in bond_close() if broadcast_neighbor
 	 * enabled in 8023ad mode. Therefore, only clear broadcast_neighbor
@@ -1575,12 +1601,43 @@ static int bond_option_fail_over_mac_set(struct bonding *bond,
 static int bond_option_xmit_hash_policy_set(struct bonding *bond,
 					    const struct bond_opt_value *newval)
 {
+#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
+	bool old_lag_xfrm = bond_mode_can_use_lag_xfrm(bond);
+	bool new_lag_xfrm;
+	bool flush_lag_xfrm = false;
+#endif
+
 	if (bond->xdp_prog && !__bond_xdp_check(BOND_MODE(bond), newval->value))
 		return -EOPNOTSUPP;
 	netdev_dbg(bond->dev, "Setting xmit hash policy to %s (%llu)\n",
 		   newval->string, newval->value);
+
 	bond->params.xmit_policy = newval->value;
 
+#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
+	new_lag_xfrm = bond_mode_can_use_lag_xfrm(bond);
+	if (old_lag_xfrm && !new_lag_xfrm) {
+		bond_ipsec_lag_begin_flush(bond);
+		flush_lag_xfrm = true;
+	}
+
+	if (flush_lag_xfrm) {
+		if (bond->dev->reg_state == NETREG_REGISTERED)
+			xfrm_dev_state_flush(dev_net(bond->dev), bond->dev,
+					     true);
+		bond_ipsec_lag_end_flush(bond);
+	}
+#endif
+
+	if (bond->dev->reg_state == NETREG_REGISTERED) {
+		bool update = false;
+
+		update |= bond_set_xfrm_features(bond);
+
+		if (update)
+			netdev_update_features(bond->dev);
+	}
+
 	return 0;
 }
 
-- 
2.53.0