Subject: [PATCH 6.1.y] bpf: support deferring bpf_link dealloc to after RCU grace period
Date: Mon, 12 May 2025 11:18:47 +0800
Message-ID: <20250512031847.3331135-1-jianqi.ren.cn@windriver.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Andrii Nakryiko

commit 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce upstream.

For some program types, the BPF link is passed as a "context" that those BPF programs can use to look up additional information. E.g., for multi-kprobes and multi-uprobes, the link is used to fetch BPF cookie values.

Because of this runtime dependency, when the bpf_link refcnt drops to zero, there could still be active BPF programs running and accessing link data.

This patch adds generic support for deferring the bpf_link dealloc callback until after an RCU grace period (GP), if requested. This is done by exposing two different deallocation callbacks, one synchronous and one deferred. If the deferred one is provided, bpf_link_free() will schedule the dealloc_deferred() callback to run after an RCU GP.

BPF uses two flavors of RCU: the "classic" non-sleepable one and RCU tasks trace. The latter is used when sleepable BPF programs are used. bpf_link_free() accommodates that by checking the underlying BPF program's sleepable flag: for a non-sleepable program it goes through a normal RCU GP only; for a sleepable program it goes through an RCU tasks trace GP *and* then a normal RCU GP (taking into account the rcu_trace_implies_rcu_gp() optimization).

We use this for multi-kprobe and multi-uprobe links, which dereference the link while the program runs. We also proactively switch the raw_tp link to the deferred dealloc callback, as upcoming changes in the bpf-next tree expose raw_tp link data (specifically, the cookie value) to BPF programs at runtime as well.
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Reported-by: syzbot+981935d9485a560bfbcb@syzkaller.appspotmail.com
Reported-by: syzbot+2cb5a6c573e98db598cc@syzkaller.appspotmail.com
Reported-by: syzbot+62d8b26793e8a2bd0516@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko
Acked-by: Jiri Olsa
Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov
[fixed conflicts due to missing commits 89ae89f53d20 ("bpf: Add multi uprobe link")]
Signed-off-by: Jianqi Ren
Signed-off-by: He Zhe
---
Verified the build test
---
 include/linux/bpf.h      | 16 +++++++++++++++-
 kernel/bpf/syscall.c     | 35 ++++++++++++++++++++++++++++++++---
 kernel/trace/bpf_trace.c |  2 +-
 3 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e9c1338851e3..1cf8c7037289 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1293,12 +1293,26 @@ struct bpf_link {
 	enum bpf_link_type type;
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
-	struct work_struct work;
+	/* rcu is used before freeing, work can be used to schedule that
+	 * RCU-based freeing before that, so they never overlap
+	 */
+	union {
+		struct rcu_head rcu;
+		struct work_struct work;
+	};
 };
 
 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
+	/* deallocate link resources callback, called without RCU grace period
+	 * waiting
+	 */
 	void (*dealloc)(struct bpf_link *link);
+	/* deallocate link resources callback, called after RCU grace period;
+	 * if underlying BPF program is sleepable we go through tasks trace
+	 * RCU GP and then "classic" RCU GP
+	 */
+	void (*dealloc_deferred)(struct bpf_link *link);
 	int (*detach)(struct bpf_link *link);
 	int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
 			   struct bpf_prog *old_prog);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 27fdf1b2fc46..1cc9b28b065a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2750,17 +2750,46 @@ void bpf_link_inc(struct bpf_link *link)
 	atomic64_inc(&link->refcnt);
 }
 
+static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
+{
+	struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
+
+	/* free bpf_link and its containing memory */
+	link->ops->dealloc_deferred(link);
+}
+
+static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
+{
+	if (rcu_trace_implies_rcu_gp())
+		bpf_link_defer_dealloc_rcu_gp(rcu);
+	else
+		call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
+}
+
 /* bpf_link_free is guaranteed to be called from process context */
 static void bpf_link_free(struct bpf_link *link)
 {
+	bool sleepable = false;
+
 	bpf_link_free_id(link->id);
 	if (link->prog) {
+		sleepable = link->prog->aux->sleepable;
 		/* detach BPF program, clean up used resources */
 		link->ops->release(link);
 		bpf_prog_put(link->prog);
 	}
-	/* free bpf_link and its containing memory */
-	link->ops->dealloc(link);
+	if (link->ops->dealloc_deferred) {
+		/* schedule BPF link deallocation; if underlying BPF program
+		 * is sleepable, we need to first wait for RCU tasks trace
+		 * sync, then go through "classic" RCU grace period
+		 */
+		if (sleepable)
+			call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
+		else
+			call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
+	}
+	if (link->ops->dealloc)
+		link->ops->dealloc(link);
 }
 
 static void bpf_link_put_deferred(struct work_struct *work)
@@ -3246,7 +3275,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
 
 static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.release = bpf_raw_tp_link_release,
-	.dealloc = bpf_raw_tp_link_dealloc,
+	.dealloc_deferred = bpf_raw_tp_link_dealloc,
 	.show_fdinfo = bpf_raw_tp_link_show_fdinfo,
 	.fill_link_info = bpf_raw_tp_link_fill_link_info,
 };
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7254c808b27c..989b6843069e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2564,7 +2564,7 @@ static void bpf_kprobe_multi_link_dealloc(struct bpf_link *link)
 
 static const struct bpf_link_ops bpf_kprobe_multi_link_lops = {
 	.release = bpf_kprobe_multi_link_release,
-	.dealloc = bpf_kprobe_multi_link_dealloc,
+	.dealloc_deferred = bpf_kprobe_multi_link_dealloc,
 };
 
 static void bpf_kprobe_multi_cookie_swap(void *a, void *b, int size, const void *priv)
-- 
2.34.1