From nobody Tue Jun 30 17:45:56 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15FFCC4332F for ; Wed, 12 Jan 2022 19:25:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344239AbiALTZz (ORCPT ); Wed, 12 Jan 2022 14:25:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60112 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242903AbiALTZx (ORCPT ); Wed, 12 Jan 2022 14:25:53 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85269C061748 for ; Wed, 12 Jan 2022 11:25:53 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id d127-20020a256885000000b006112fd779ddso6365630ybc.14 for ; Wed, 12 Jan 2022 11:25:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=XfvrIOiHNbk7sHm262Jen6JEO4ueUlQOWGJuAR4JJ6g=; b=BpGGHxCNjsjfxj+LWSId8tSqHJeUJNR4TF3w1wU9fZYnoFLi/zV2VEamgGffGHQoNi ZjqsDcYPUHVCPJYmQfkf8ymz0kwqN0MH7IaG4kniTfxwhWy6C+Asdt0Q/DGRQMBO7NP5 H6055Gt8NOzOtbWN3M/p/y7WqE26cloLI/gZjzOUbNup/fnVmC2kNvfcajV6E6MtdlSY 0o+bfp9a1wxet08J7x/wkAl0AiRjbqczzMHC9sg523txMx3gVPbfYmr1AXi0ly80QopQ 3xqlrGj1u8VM1IfMflWJMHJnHRPWBk2TioFZhQrrHZBpb/qE0JTAwU+9iK9YNFGsD5Cu 64Bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=XfvrIOiHNbk7sHm262Jen6JEO4ueUlQOWGJuAR4JJ6g=; b=x42ychQotKk+0RgKBNluvpCjVCUpllATe6MAI1fgytIGzbJtD4TR5lRrUylT5NCOM/ j6VN/NmS2o5HSeFCnwEWI99ANya9gix+89JkH7y8Nkk7J3TQ1chMN17rpdb1V+63z7eq 
/9d39WKjnS3j3sOdjP2qgWxJJFpcf19cm3pttKlVsMmyedZvEuvMfkoySghys7orHb/u 4TuqqC++gQ4VpuKbTRsLfcSPxahEQM7GZa/5xmLMPD2IQNHVGydzwA7TX5oBNpl0vXgP qUDH6C5/Z5DTAmL0ELo+sNp/F0G3DBnCTjsrBx/OFfwVrFEECYJLvggw3HA1moX9wzLG YXGg== X-Gm-Message-State: AOAM530yo3OsF8tJ0i/aDUyS6r3Mkt7+5K4LLbgBFqhiVCusR96WCvQP SF4G3Jm3brVBRYI90p1EFOiNTVcAGEo= X-Google-Smtp-Source: ABdhPJz86scxJ+AvkKMvPVkmkCpbPPw8niBdO3dU6BUotXtFY+cJv17D3xQZlutDj1uKtNW31JLV8KXQc90= X-Received: from haoluo.svl.corp.google.com ([2620:15c:2cd:202:ddf2:9aea:6994:df79]) (user=haoluo job=sendgmr) by 2002:a05:6902:152:: with SMTP id p18mr1503182ybh.85.1642015552792; Wed, 12 Jan 2022 11:25:52 -0800 (PST) Date: Wed, 12 Jan 2022 11:25:40 -0800 In-Reply-To: <20220112192547.3054575-1-haoluo@google.com> Message-Id: <20220112192547.3054575-2-haoluo@google.com> Mime-Version: 1.0 References: <20220112192547.3054575-1-haoluo@google.com> X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog Subject: [PATCH RESEND RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system. From: Hao Luo To: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann Cc: Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Shakeel Butt , Joe@google.com, Burton@google.com, jevburton.kernel@gmail.com, Tejun Heo , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Luo Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce a new API called bpf_watch_inode() to watch the destruction of an inode and calls a registered callback function. With the help of this new API, one can implement pinning bpf objects in a non-bpf file system such as sockfs. The ability of pinning bpf objects in an external file system has potential uses: for example, allow using bpf programs to customize file behaviors, as we can see in the following patches. 
Extending the pinning logic in bpf_obj_do_pin() to associate bpf objects
with inodes of another file system is relatively straightforward. The
challenge is how to notify the bpf object when the associated inode is
gone, so that the object's refcnt can be decremented at that time. Bpffs
uses the .free_inode() callback in super_operations to drop the object's
refcnt. But not every file system implements .free_inode(), and inserting
a bpf notification into every target file system would be too intrusive.

Thanks to fsnotify, there is a generic callback in VFS that can be used
to deliver the events of an inode. bpf_watch_inode() is implemented on
top of that. bpf_watch_inode() allows the caller to pass in a callback
(for example, one that decrements an object's refcnt), which will be
called when the inode is about to be freed. So typically, one can expose
bpf objects to other file systems in the following steps:

1. extend bpf_obj_do_pin() to create a new entry in the target file
   system.
2. call bpf_watch_inode() to register the bpf object put operation, to
   be run at the destruction of the newly created inode.

Of course, on a system with no fsnotify support, pinning bpf objects in
a non-bpf file system will not be available.

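The two steps above can be modelled in user space. The following is only a
sketch of the callback-on-destruction pattern, not kernel code: every
mock_* name, put_object() and demo_pin_then_destroy() are hypothetical
stand-ins invented for illustration, and the real implementation relies on
fsnotify marks rather than a hand-rolled list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct mock_object {
	int refcnt;
};

struct mock_notify_ops {
	void (*free_inode)(void *object, void *priv);
};

struct mock_mark {
	const struct mock_notify_ops *ops;
	void *object;
	void *priv;
	struct mock_mark *next;
};

struct mock_inode {
	void *i_private;		/* object associated with this inode */
	struct mock_mark *marks;	/* watchers to notify on destruction */
};

/* Models bpf_watch_inode(): register a callback for inode destruction. */
static int mock_watch_inode(struct mock_inode *inode,
			    const struct mock_notify_ops *ops, void *priv)
{
	struct mock_mark *mark = malloc(sizeof(*mark));

	if (!mark)
		return -1;
	mark->ops = ops;
	mark->object = inode->i_private;
	mark->priv = priv;
	mark->next = inode->marks;
	inode->marks = mark;
	return 0;
}

/* Models the freeing path: run every callback, then free the marks. */
static void mock_destroy_inode(struct mock_inode *inode)
{
	struct mock_mark *mark = inode->marks;

	while (mark) {
		struct mock_mark *next = mark->next;

		if (mark->ops && mark->ops->free_inode)
			mark->ops->free_inode(mark->object, mark->priv);
		free(mark);
		mark = next;
	}
	inode->marks = NULL;
}

/* Callback that drops the reference taken when the entry was created. */
static void put_object(void *object, void *priv)
{
	(void)priv;
	((struct mock_object *)object)->refcnt--;
}

static const struct mock_notify_ops put_ops = {
	.free_inode = put_object,
};

/* Step 1: create the entry and take a reference. Step 2: watch it. */
static int demo_pin_then_destroy(void)
{
	struct mock_object obj = { .refcnt = 1 };
	struct mock_inode inode = { .i_private = &obj, .marks = NULL };

	obj.refcnt++;			/* reference held by the new entry */
	if (mock_watch_inode(&inode, &put_ops, NULL))
		return -1;
	assert(obj.refcnt == 2);
	mock_destroy_inode(&inode);	/* callback drops that reference */
	return obj.refcnt;
}
```

In this model the object ends with the same refcnt it started with, which
is the property the watch callback is meant to guarantee.
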
Signed-off-by: Hao Luo --- kernel/bpf/inode.c | 118 ++++++++++++++++++++++++++++++++++++++++----- kernel/bpf/inode.h | 33 +++++++++++++ 2 files changed, 140 insertions(+), 11 deletions(-) create mode 100644 kernel/bpf/inode.h diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 80da1db47c68..b4066dd986a8 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -16,18 +16,13 @@ #include #include #include +#include #include #include #include #include #include "preload/bpf_preload.h" - -enum bpf_type { - BPF_TYPE_UNSPEC =3D 0, - BPF_TYPE_PROG, - BPF_TYPE_MAP, - BPF_TYPE_LINK, -}; +#include "inode.h" =20 static void *bpf_any_get(void *raw, enum bpf_type type) { @@ -67,6 +62,95 @@ static void bpf_any_put(void *raw, enum bpf_type type) } } =20 +#ifdef CONFIG_FSNOTIFY +/* Notification mechanism based on fsnotify, used in bpf to watch the + * destruction of an inode. This inode could an inode in bpffs or any + * other file system. + */ + +struct notify_mark { + struct fsnotify_mark fsn_mark; + const struct notify_ops *ops; + void *object; + enum bpf_type type; + void *priv; +}; + +struct fsnotify_group *bpf_notify_group; +struct kmem_cache *bpf_notify_mark_cachep __read_mostly; + +/* Handler for any inode event. */ +int handle_inode_event(struct fsnotify_mark *mark, u32 mask, + struct inode *inode, struct inode *dir, + const struct qstr *file_name, u32 cookie) +{ + return 0; +} + +/* Handler for freeing marks. This is called when the watched inode is bei= ng + * freed. 
+ */ +static void notify_freeing_mark(struct fsnotify_mark *mark, struct fsnotif= y_group *group) +{ + struct notify_mark *b_mark; + + b_mark =3D container_of(mark, struct notify_mark, fsn_mark); + + if (b_mark->ops && b_mark->ops->free_inode) + b_mark->ops->free_inode(b_mark->object, b_mark->type, b_mark->priv); +} + +static void notify_free_mark(struct fsnotify_mark *mark) +{ + struct notify_mark *b_mark; + + b_mark =3D container_of(mark, struct notify_mark, fsn_mark); + + kmem_cache_free(bpf_notify_mark_cachep, b_mark); +} + +struct fsnotify_ops bpf_notify_ops =3D { + .handle_inode_event =3D handle_inode_event, + .freeing_mark =3D notify_freeing_mark, + .free_mark =3D notify_free_mark, +}; + +static int bpf_inode_type(const struct inode *inode, enum bpf_type *type); + +/* Watch the destruction of an inode and calls the callbacks in the given + * notify_ops. + */ +int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops, voi= d *priv) +{ + enum bpf_type type; + struct notify_mark *b_mark; + int ret; + + if (IS_ERR(bpf_notify_group) || unlikely(!bpf_notify_mark_cachep)) + return -ENOMEM; + + b_mark =3D kmem_cache_alloc(bpf_notify_mark_cachep, GFP_KERNEL_ACCOUNT); + if (unlikely(!b_mark)) + return -ENOMEM; + + fsnotify_init_mark(&b_mark->fsn_mark, bpf_notify_group); + b_mark->ops =3D ops; + b_mark->priv =3D priv; + b_mark->object =3D inode->i_private; + bpf_inode_type(inode, &type); + b_mark->type =3D type; + + ret =3D fsnotify_add_inode_mark(&b_mark->fsn_mark, inode, + /*allow_dups=3D*/1); + + fsnotify_put_mark(&b_mark->fsn_mark); /* match get in fsnotify_init_mark = */ + + return ret; +} +#endif + +/* bpffs */ + static void *bpf_fd_probe_obj(u32 ufd, enum bpf_type *type) { void *raw; @@ -435,11 +519,15 @@ static int bpf_iter_link_pin_kernel(struct dentry *pa= rent, return ret; } =20 +static bool dentry_is_bpf_dir(struct dentry *dentry) +{ + return d_inode(dentry)->i_op =3D=3D &bpf_dir_iops; +} + static int bpf_obj_do_pin(const char __user *pathname, 
void *raw, enum bpf_type type) { struct dentry *dentry; - struct inode *dir; struct path path; umode_t mode; int ret; @@ -454,8 +542,7 @@ static int bpf_obj_do_pin(const char __user *pathname, = void *raw, if (ret) goto out; =20 - dir =3D d_inode(path.dentry); - if (dir->i_op !=3D &bpf_dir_iops) { + if (!dentry_is_bpf_dir(path.dentry)) { ret =3D -EPERM; goto out; } @@ -821,8 +908,17 @@ static int __init bpf_init(void) return ret; =20 ret =3D register_filesystem(&bpf_fs_type); - if (ret) + if (ret) { sysfs_remove_mount_point(fs_kobj, "bpf"); + return ret; + } + +#ifdef CONFIG_FSNOTIFY + bpf_notify_mark_cachep =3D KMEM_CACHE(notify_mark, 0); + bpf_notify_group =3D fsnotify_alloc_group(&bpf_notify_ops); + if (IS_ERR(bpf_notify_group) || !bpf_notify_mark_cachep) + pr_warn("Failed to initialize bpf_notify system, user can not pin object= s outside bpffs.\n"); +#endif =20 return ret; } diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h new file mode 100644 index 000000000000..3f53a4542028 --- /dev/null +++ b/kernel/bpf/inode.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) */ +/* Copyright (c) 2022 Google + */ +#ifndef __BPF_INODE_H_ +#define __BPF_INODE_H_ + +enum bpf_type { + BPF_TYPE_UNSPEC =3D 0, + BPF_TYPE_PROG, + BPF_TYPE_MAP, + BPF_TYPE_LINK, +}; + +struct notify_ops { + void (*free_inode)(void *object, enum bpf_type type, void *priv); +}; + +#ifdef CONFIG_FSNOTIFY +/* Watch the destruction of an inode and calls the callbacks in the given + * notify_ops. 
+ */ +int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops, + void *priv); +#else +static inline +int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops, + void *priv) +{ + return -EPERM; +} +#endif // CONFIG_FSNOTIFY + +#endif // __BPF_INODE_H_ --=20 2.34.1.448.ga2b2bfdf31-goog From nobody Tue Jun 30 17:45:56 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1262FC433EF for ; Wed, 12 Jan 2022 19:26:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344977AbiALT0E (ORCPT ); Wed, 12 Jan 2022 14:26:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344405AbiALTZ4 (ORCPT ); Wed, 12 Jan 2022 14:25:56 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B843BC061751 for ; Wed, 12 Jan 2022 11:25:55 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id s7-20020a5b0447000000b005fb83901511so6407548ybp.11 for ; Wed, 12 Jan 2022 11:25:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=WLIXWpZVKEATWJu5eTPegEpeW0twWlWVehQQ8+8FOxE=; b=IATn1j8XXNh5QnUCS84GAaGXz+wGJO5e1Sv0M/GDGavPxGQnUbuD3E9d46xu+SBdJt hLrvNtx2BAddqjacdEzxKiOkYqyCQWz2Y4JY4YBKylNriDljP0qiXBp8kWi2f5Zt7ZcX 0+r8UDoY5AMVstj2BMzBYFrRXeJqzwHcb5W+85FZUJh7U36MQvl+FewJUUdpovMVA26h 9rn9wkzFeiqkIWXQca+35HSWXqs3b4NXmtbs+3TSn2oUomsSQF/2PWBBb7MvDAJp6pKa ISHNbEFp0JA6XuzkMvPT4x9SH/yt1kQ+0OBWBbnKMXdKTUi736hh2+ap23pSr8VUxQsZ qqRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=WLIXWpZVKEATWJu5eTPegEpeW0twWlWVehQQ8+8FOxE=; b=wztfiBExS5ihQMczEH/1r1GQ9GoqAtIkBZoE0bdd1dMG7a3sK6GEgpQdWEgqz8wSsl oBW2f599OW8R4mRK5IhUw6lem5SWVnOD5KUdq/jFapADZyZq/zrp6DlM2n3iJe/PJYcu lATJEKKk0BzqPD/ZWLWJ43kN7irF6aNicbqCRhcrEbSbGFjda/cT4vqETuJoZPRpCbqm ZrAUDD5LpDsRpK3rn5KKzk2nSuRuhF9/pbEub+85tol4b1rg2DqZQW+Rs02q7jnc8Xu3 Tz8Ty1CViviY2plCsHe0oYOWbK4bmI7VK2NffrNahecExHNfUp3yG7Ll9qnRw/T3Z6ip 137Q== X-Gm-Message-State: AOAM533ZlrL7lPx9sgSdYk7Jje2pwOD9n6gGE0GL46TghzGqTYUDCP08 5MOrb1UCOKyQ9Vo2zpOx65kOwIEhUyE= X-Google-Smtp-Source: ABdhPJwMOcii6IXxp52+PEu8f3M6n+TRdKrFngmJTjckpFj8F483qBybtGPwQ7pD7cij3ywVHVjKm4YD6kY= X-Received: from haoluo.svl.corp.google.com ([2620:15c:2cd:202:ddf2:9aea:6994:df79]) (user=haoluo job=sendgmr) by 2002:a5b:30e:: with SMTP id j14mr1720721ybp.60.1642015554999; Wed, 12 Jan 2022 11:25:54 -0800 (PST) Date: Wed, 12 Jan 2022 11:25:41 -0800 In-Reply-To: <20220112192547.3054575-1-haoluo@google.com> Message-Id: <20220112192547.3054575-3-haoluo@google.com> Mime-Version: 1.0 References: <20220112192547.3054575-1-haoluo@google.com> X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog Subject: [PATCH RESEND RFC bpf-next v1 2/8] bpf: Record back pointer to the inode in bpffs From: Hao Luo To: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann Cc: Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Shakeel Butt , Joe@google.com, Burton@google.com, jevburton.kernel@gmail.com, Tejun Heo , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Luo Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When an object is pinned in bpffs, record the bpffs inode in the object. The previous patch introduced bpf_watch_inode(), which can also be used to watch the bpffs inode. 
This capability will be used in the following patches to expose bpf objects to file systems where the nodes in the file system are not backed by an inode. Signed-off-by: Hao Luo --- include/linux/bpf.h | 5 +++- kernel/bpf/inode.c | 60 ++++++++++++++++++++++++++++++++++++++++++++- kernel/bpf/inode.h | 9 +++++++ 3 files changed, 72 insertions(+), 2 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6e947cd91152..2ec693c3d6f6 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -184,7 +184,8 @@ struct bpf_map { char name[BPF_OBJ_NAME_LEN]; bool bypass_spec_v1; bool frozen; /* write-once; write-protected by freeze_mutex */ - /* 14 bytes hole */ + struct inode *backing_inode; /* back pointer to the inode in bpffs */ + /* 6 bytes hole */ =20 /* The 3rd and 4th cacheline with misc members to avoid false sharing * particularly with refcounting. @@ -991,6 +992,7 @@ struct bpf_prog_aux { struct work_struct work; struct rcu_head rcu; }; + struct inode *backing_inode; /* back pointer to the inode in bpffs */ }; =20 struct bpf_array_aux { @@ -1018,6 +1020,7 @@ struct bpf_link { const struct bpf_link_ops *ops; struct bpf_prog *prog; struct work_struct work; + struct inode *backing_inode; /* back pointer to the inode in bpffs */ }; =20 struct bpf_link_ops { diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index b4066dd986a8..9ba10912cbf8 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -226,6 +226,57 @@ static int bpf_inode_type(const struct inode *inode, e= num bpf_type *type) return 0; } =20 +/* Conditionally set an object's backing inode. 
*/ +static void cond_set_backing_inode(void *obj, enum bpf_type type, + struct inode *old, struct inode *new) +{ + struct inode **ptr; + + if (type =3D=3D BPF_TYPE_PROG) { + struct bpf_prog *prog =3D obj; + ptr =3D &prog->aux->backing_inode; + } else if (type =3D=3D BPF_TYPE_MAP) { + struct bpf_map *map =3D obj; + ptr =3D &map->backing_inode; + } else if (type =3D=3D BPF_TYPE_LINK) { + struct bpf_link *link =3D obj; + ptr =3D &link->backing_inode; + } else { + return; + } + + if (*ptr =3D=3D old) + *ptr =3D new; +} + +struct inode *get_backing_inode(void *obj, enum bpf_type type) +{ + struct inode *inode =3D NULL; + + if (type =3D=3D BPF_TYPE_PROG) { + struct bpf_prog *prog =3D obj; + inode =3D prog->aux->backing_inode; + } else if (type =3D=3D BPF_TYPE_MAP) { + struct bpf_map *map =3D obj; + inode =3D map->backing_inode; + } else if (type =3D=3D BPF_TYPE_LINK) { + struct bpf_link *link =3D obj; + inode =3D link->backing_inode; + } + + if (!inode) + return NULL; + + spin_lock(&inode->i_lock); + if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) { + spin_unlock(&inode->i_lock); + return NULL; + } + __iget(inode); + spin_unlock(&inode->i_lock); + return inode; +} + static void bpf_dentry_finalize(struct dentry *dentry, struct inode *inode, struct inode *dir) { @@ -418,6 +469,8 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t= mode, void *raw, { struct inode *dir =3D dentry->d_parent->d_inode; struct inode *inode =3D bpf_get_inode(dir->i_sb, dir, mode); + enum bpf_type type; + if (IS_ERR(inode)) return PTR_ERR(inode); =20 @@ -425,6 +478,9 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t= mode, void *raw, inode->i_fop =3D fops; inode->i_private =3D raw; =20 + if (!bpf_inode_type(inode, &type)) + cond_set_backing_inode(raw, type, NULL, inode); + bpf_dentry_finalize(dentry, inode, dir); return 0; } @@ -703,8 +759,10 @@ static void bpf_free_inode(struct inode *inode) =20 if (S_ISLNK(inode->i_mode)) kfree(inode->i_link); - if 
(!bpf_inode_type(inode, &type)) + if (!bpf_inode_type(inode, &type)) { + cond_set_backing_inode(inode->i_private, type, inode, NULL); bpf_any_put(inode->i_private, type); + } free_inode_nonrcu(inode); } =20 diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h index 3f53a4542028..e7fe8137be80 100644 --- a/kernel/bpf/inode.h +++ b/kernel/bpf/inode.h @@ -30,4 +30,13 @@ int bpf_watch_inode(struct inode *inode, const struct no= tify_ops *ops, } #endif // CONFIG_FSNOTIFY =20 +/* Get the backing inode of a bpf object. When an object is pinned in bpf + * file system, an inode is associated with the object. This function retu= rns + * that inode. + * + * On success, the inode is returned with refcnt incremented. + * On failure, NULL is returned. + */ +struct inode *get_backing_inode(void *obj, enum bpf_type); + #endif // __BPF_INODE_H_ --=20 2.34.1.448.ga2b2bfdf31-goog From nobody Tue Jun 30 17:45:56 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47ADCC433EF for ; Wed, 12 Jan 2022 19:26:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345125AbiALT00 (ORCPT ); Wed, 12 Jan 2022 14:26:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344475AbiALTZ6 (ORCPT ); Wed, 12 Jan 2022 14:25:58 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB272C061757 for ; Wed, 12 Jan 2022 11:25:57 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id v48-20020a25abb3000000b006113ce63ed8so6295107ybi.22 for ; Wed, 12 Jan 2022 11:25:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; 
h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=LRC0uuW3m4fIg7GXmCy19/8orrmcRboRMIkgNo7+8QE=; b=BDQlkIvGJNrnOdBSxgvKFbzGLDHbyHHmxAuDROjE7i4p66m8UjN70MDMXFevKm3xj7 bMpc6PAuyYclO9cySCH7WogBbaVySkoZOR6d3+diP64RFrpGKhzOhIMwdi46zmegrPrh 0yXscwuAOsr8Caqk39ADU5s5lYWWIeWSOBxF4pB69qPuhejcqHtfOnpVVwz+HDaQaziV dEzM++JZApxAgECZjA1lCOVvzZoFFpnfS8vKv9A9bUSpMrBcekYFsNh4HSMg+kN2cQgL dnZBRklVoZptesOGHzgzW236UKpGlTOM30z/KXetspZUBfCxyvUq2e/wobcOvK26qOjN RuZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=LRC0uuW3m4fIg7GXmCy19/8orrmcRboRMIkgNo7+8QE=; b=JKipIAL+gWmW6ERhuoAc8I+lO4fXSoz6pX7SP5Gq/CE9gMR7IDhywbkWor0za3xg0i XhvSEm6QqiJ2enXgdLHMbzYyFkIB2Ct00mHroi2WS3c5EWacVZLzdAEQZ78Nx2hbgXch VNCNjrm1fJzAGHA21DZC9TMqkhdGocvIzlIZMQMQHSw6HPskXh9TdzidWGaQMKdbIcy4 HRpTBwIoXfDd5Q+FzohrVgi+8eI9Gp1IVQBZjf/1bHKrcfXUJo4+qGl7N7CVNCmwXnha 63zw7EW1oWIo+bkcenNYSn/j5aNwQq2S2WeW78ldgvl7aRqGJaqWUBT1Wn15xmnG78jp 3Pfw== X-Gm-Message-State: AOAM532DfPPMTXyAvr+0HktUWY9neYAwWr5h/oDJBomo545lA/WnL5pH 0VMOp8Yl1jdgHu8BNfbyNGMxXTo98NE= X-Google-Smtp-Source: ABdhPJwj2GOVpNkvwBmPDtA+eRlkg506Io+Vs6y7rJccAqLJtpUN5FZy4qeHEWl7hyQsWmuv/ScEEIbL6Yg= X-Received: from haoluo.svl.corp.google.com ([2620:15c:2cd:202:ddf2:9aea:6994:df79]) (user=haoluo job=sendgmr) by 2002:a25:b906:: with SMTP id x6mr1664253ybj.372.1642015557103; Wed, 12 Jan 2022 11:25:57 -0800 (PST) Date: Wed, 12 Jan 2022 11:25:42 -0800 In-Reply-To: <20220112192547.3054575-1-haoluo@google.com> Message-Id: <20220112192547.3054575-4-haoluo@google.com> Mime-Version: 1.0 References: <20220112192547.3054575-1-haoluo@google.com> X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog Subject: [PATCH RESEND RFC bpf-next v1 3/8] bpf: Expose bpf object in kernfs From: Hao Luo To: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann Cc: Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Shakeel Butt , 
Joe@google.com, Burton@google.com, jevburton.kernel@gmail.com, Tejun Heo , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Luo Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

This patch extends bpf_obj_do_pin() to allow creating a new entry in
kernfs which references a bpf object. Unlike pinning objects in bpffs,
the created kernfs node does not hold an extra reference to the object,
because kernfs by itself doesn't have a notification mechanism to put
the object when the kernfs node is gone. Therefore this patch is not
"pinning" the object, but rather "exposing" the object in kernfs. The
lifetime of the created kernfs node depends on the lifetime of the bpf
object, not the other way around.

More specifically, we allow a bpf object to be exposed to kernfs only
after it becomes "persistent" by being pinned in bpffs. So the lifetime
of the created kernfs node is tied to the bpffs inode. When the object
is unpinned from bpffs, the kernfs nodes exposing the bpf object will be
removed automatically. The removal uses the bpf_watch_inode() interface
introduced in the previous patches. Because the kernfs nodes do not hold
extra references to the object, we can remove the nodes at any time
without worrying about leaking references.

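The reference accounting described above can be illustrated with a small
user-space model. This is a sketch under stated assumptions, not the
kernel code: struct mock_obj, enum mock_target and mock_obj_pin() are
hypothetical names, and the model folds the "drop the reference on
failure" step, which the caller of bpf_obj_do_pin() performs, into one
function.

```c
#include <errno.h>

/* Hypothetical user-space stand-in for a bpf object. */
struct mock_obj {
	int refcnt;
	int pinned_in_bpffs;	/* models a non-NULL backing inode */
};

enum mock_target {
	MOCK_TARGET_BPFFS,
	MOCK_TARGET_KERNFS,
	MOCK_TARGET_OTHER,
};

/*
 * Models one pin request. A reference is taken up front (as the fd
 * lookup does); whether it is kept depends on the target file system.
 */
static int mock_obj_pin(struct mock_obj *obj, enum mock_target target)
{
	obj->refcnt++;

	switch (target) {
	case MOCK_TARGET_BPFFS:
		/* Pinning: the bpffs inode keeps the reference. */
		obj->pinned_in_bpffs = 1;
		return 0;
	case MOCK_TARGET_KERNFS:
		if (!obj->pinned_in_bpffs) {
			/* No backing inode to tie the node's lifetime to. */
			obj->refcnt--;
			return -ENXIO;
		}
		/* Exposing: the kernfs node holds no reference. */
		obj->refcnt--;
		return 0;
	default:
		obj->refcnt--;
		return -EPERM;
	}
}
```

In this model only the bpffs pin changes the object's refcnt, so removing
a kernfs node never needs a matching put.
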
Signed-off-by: Hao Luo --- kernel/bpf/Makefile | 2 +- kernel/bpf/inode.c | 43 +++++++++++++------- kernel/bpf/inode.h | 11 ++++- kernel/bpf/kernfs_node.c | 87 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 126 insertions(+), 17 deletions(-) create mode 100644 kernel/bpf/kernfs_node.c diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index c1a9be6a4b9f..b1abf0d94b5b 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -8,7 +8,7 @@ CFLAGS_core.o +=3D $(call cc-disable-warning, override-init= ) $(cflags-nogcse-yy) =20 obj-$(CONFIG_BPF_SYSCALL) +=3D syscall.o verifier.o inode.o helpers.o tnum= .o bpf_iter.o map_iter.o task_iter.o prog_iter.o obj-$(CONFIG_BPF_SYSCALL) +=3D hashtab.o arraymap.o percpu_freelist.o bpf_= lru_list.o lpm_trie.o map_in_map.o bloom_filter.o -obj-$(CONFIG_BPF_SYSCALL) +=3D local_storage.o queue_stack_maps.o ringbuf.o +obj-$(CONFIG_BPF_SYSCALL) +=3D local_storage.o queue_stack_maps.o ringbuf.= o kernfs_node.o obj-$(CONFIG_BPF_SYSCALL) +=3D bpf_local_storage.o bpf_task_storage.o obj-${CONFIG_BPF_LSM} +=3D bpf_inode_storage.o obj-$(CONFIG_BPF_SYSCALL) +=3D disasm.o diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 9ba10912cbf8..7e93e477b57c 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -580,6 +580,21 @@ static bool dentry_is_bpf_dir(struct dentry *dentry) return d_inode(dentry)->i_op =3D=3D &bpf_dir_iops; } =20 +static int bpf_obj_do_pin_generic(struct dentry *dentry, umode_t mode, + void *obj, enum bpf_type type) +{ + switch (type) { + case BPF_TYPE_PROG: + return vfs_mkobj(dentry, mode, bpf_mkprog, obj); + case BPF_TYPE_MAP: + return vfs_mkobj(dentry, mode, bpf_mkmap, obj); + case BPF_TYPE_LINK: + return vfs_mkobj(dentry, mode, bpf_mklink, obj); + default: + return -EPERM; + } +} + static int bpf_obj_do_pin(const char __user *pathname, void *raw, enum bpf_type type) { @@ -598,22 +613,20 @@ static int bpf_obj_do_pin(const char __user *pathname= , void *raw, if (ret) goto out; =20 - if 
(!dentry_is_bpf_dir(path.dentry)) { - ret =3D -EPERM; - goto out; - } + if (dentry_is_kernfs_dir(path.dentry)) { + ret =3D bpf_obj_do_pin_kernfs(dentry, mode, raw, type); =20 - switch (type) { - case BPF_TYPE_PROG: - ret =3D vfs_mkobj(dentry, mode, bpf_mkprog, raw); - break; - case BPF_TYPE_MAP: - ret =3D vfs_mkobj(dentry, mode, bpf_mkmap, raw); - break; - case BPF_TYPE_LINK: - ret =3D vfs_mkobj(dentry, mode, bpf_mklink, raw); - break; - default: + /* Match bpf_fd_probe_obj(). bpf objects exposed to kernfs + * do not hold an active reference. The lifetime of the + * created kernfs node is tied to an inode in bpffs. So the + * kernfs node gets destroyed automatically when the object + * is unpinned from bpffs. + */ + if (ret =3D=3D 0) + bpf_any_put(raw, type); + } else if (dentry_is_bpf_dir(path.dentry)) { + ret =3D bpf_obj_do_pin_generic(dentry, mode, raw, type); + } else { ret =3D -EPERM; } out: diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h index e7fe8137be80..c12d385a3e2a 100644 --- a/kernel/bpf/inode.h +++ b/kernel/bpf/inode.h @@ -4,8 +4,10 @@ #ifndef __BPF_INODE_H_ #define __BPF_INODE_H_ =20 +#include + enum bpf_type { - BPF_TYPE_UNSPEC =3D 0, + BPF_TYPE_UNSPEC =3D 0, BPF_TYPE_PROG, BPF_TYPE_MAP, BPF_TYPE_LINK, @@ -39,4 +41,11 @@ int bpf_watch_inode(struct inode *inode, const struct no= tify_ops *ops, */ struct inode *get_backing_inode(void *obj, enum bpf_type); =20 +/* Test whether a given dentry is a kernfs entry. */ +bool dentry_is_kernfs_dir(struct dentry *dentry); + +/* Expose bpf object to kernfs. Requires dentry to be in kernfs. */ +int bpf_obj_do_pin_kernfs(struct dentry *dentry, umode_t mode, void *obj, + enum bpf_type type); + #endif // __BPF_INODE_H_ diff --git a/kernel/bpf/kernfs_node.c b/kernel/bpf/kernfs_node.c new file mode 100644 index 000000000000..c1c45f7b948b --- /dev/null +++ b/kernel/bpf/kernfs_node.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Expose eBPF objects in kernfs file system. 
+ */ + +#include +#include +#include "inode.h" + +/* file_operations for kernfs file system */ + +/* Handler when the watched inode is freed. */ +static void kn_watch_free_inode(void *obj, enum bpf_type type, void *kn) +{ + kernfs_remove(kn); + + /* match get in bpf_obj_do_pin_kernfs */ + kernfs_put(kn); +} + +static const struct notify_ops notify_ops =3D { + .free_inode =3D kn_watch_free_inode, +}; + +/* Kernfs file operations for bpf created files. */ +static const struct kernfs_ops bpf_generic_ops =3D { +}; + +/* Test whether a given dentry is a kernfs entry. */ +bool dentry_is_kernfs_dir(struct dentry *dentry) +{ + return kernfs_node_from_dentry(dentry) !=3D NULL; +} + +/* Expose bpf object to kernfs. Requires dentry to exist in kernfs. */ +int bpf_obj_do_pin_kernfs(struct dentry *dentry, umode_t mode, void *obj, + enum bpf_type type) +{ + struct dentry *parent_dentry; + struct super_block *sb; + struct kernfs_node *parent_kn, *kn; + struct kernfs_root *root; + const struct kernfs_ops *ops; + struct inode *inode; + int ret; + + sb =3D dentry->d_sb; + root =3D kernfs_root_from_sb(sb); + if (!root) /* Not a kernfs file system. */ + return -EPERM; + + parent_dentry =3D dentry->d_parent; + parent_kn =3D kernfs_node_from_dentry(parent_dentry); + if (WARN_ON(!parent_kn)) + return -EPERM; + + inode =3D get_backing_inode(obj, type); + if (!inode) + return -ENXIO; + + ops =3D &bpf_generic_ops; + kn =3D __kernfs_create_file(parent_kn, dentry->d_iname, mode, + GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, + 0, ops, inode, NULL, NULL); + if (IS_ERR(kn)) { + iput(inode); + return PTR_ERR(kn); + } + + /* hold an active kn by bpffs inode. */ + kernfs_get(kn); + + /* Watch the backing inode of the object in bpffs. When the backing + * inode is freed, the created kernfs entry will be removed as well. 
+ */ + ret =3D bpf_watch_inode(inode, &notify_ops, kn); + if (ret) { + kernfs_put(kn); + kernfs_remove(kn); + iput(inode); + return ret; + } + + kernfs_activate(kn); + iput(inode); + return 0; +} --=20 2.34.1.448.ga2b2bfdf31-goog From nobody Tue Jun 30 17:45:56 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F37FBC433EF for ; Wed, 12 Jan 2022 19:26:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345452AbiALT0q (ORCPT ); Wed, 12 Jan 2022 14:26:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344337AbiALT0A (ORCPT ); Wed, 12 Jan 2022 14:26:00 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15614C061748 for ; Wed, 12 Jan 2022 11:26:00 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id i81-20020a253b54000000b00611b1da1f8fso1550754yba.5 for ; Wed, 12 Jan 2022 11:26:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=sXsPIbfQ05P0repzOjAvu6+3E/heYEca8fbLMQGQFvE=; b=OVkw/OOQFATmzm1SJea+QKvbMcvbrUZuyRpLKQiqbaK5v2Tnvmu+BZ0ots7zJCdUHU z8ccxQc7x69F99FJgAIRsTcsBG1ISQK7GavxRUeCpdJp7zfJW1nCzASnhVjx4Ng0xmIe C44vlOxJa6I+pJxIfsk1hiZt+lu6sBOAUI3WPPBvbN6UiS7GG3CdDzAQ0O1c8uvnAXai O6Hd4heEx0/z5vJzDL1M5KdGMZttUn5ReRHQMQmSA+7ykquom4vhDJv0DStOsLAorR2N csWjLeU6++lgIkjcyWgPfzztPqg29O6fOEN4A+Bxq1bMisAZj/obE5R8IY0i+SKCxSbH WzTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; 
bh=sXsPIbfQ05P0repzOjAvu6+3E/heYEca8fbLMQGQFvE=; b=y7uGJ38ja31wxc0UWPgOxG7nF1qzgnay4qjoRB0rUmFoiIIhU4SqPTZ8pLWgL6eQlc uhdAnq704iEV6udi0hyD+RvWtxuIauUWhaEuDZ0lzDJ9eh15q4gLtCGcWl0nbT8jtXi/ PtaQQM7gJ6cPxXjEk18WFgp9DpVCHWfvv2tUQuXrh1ighdNjmck2bN1uGSB/NzoVd3rs EjFUnl3VXJg18+pNwXb6j8OzsoeVlOW7fLpL3l7kJsKHpMDx3Z4A7JI8HjNIW/Tvig9h iwcyYPjQgFT+zhKEm/9EMbYb+oqEYrTdC+li8ZOP6/1yKQV32Ta+1hb1X+TIFjK1k2VH 70/A== X-Gm-Message-State: AOAM530FLKPhMRDVQyx8qphrYeSQaUQ9rLb9+2QAYfCy11F3ZSodBjG6 0K8gDjPV7e5BlQkBaZeYYygVJpWvjZg= X-Google-Smtp-Source: ABdhPJxYm5u9VpP7OA8U6j16gepHc9xNcbC+H/aOGc5oUb0spGUr2hCdgYEzAcb2fZU4jMdkNLKi5RoNLG4= X-Received: from haoluo.svl.corp.google.com ([2620:15c:2cd:202:ddf2:9aea:6994:df79]) (user=haoluo job=sendgmr) by 2002:a25:ac24:: with SMTP id w36mr656053ybi.610.1642015559343; Wed, 12 Jan 2022 11:25:59 -0800 (PST) Date: Wed, 12 Jan 2022 11:25:43 -0800 In-Reply-To: <20220112192547.3054575-1-haoluo@google.com> Message-Id: <20220112192547.3054575-5-haoluo@google.com> Mime-Version: 1.0 References: <20220112192547.3054575-1-haoluo@google.com> X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog Subject: [PATCH RESEND RFC bpf-next v1 4/8] bpf: Support removing kernfs entries From: Hao Luo To: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann Cc: Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Shakeel Butt , Joe@google.com, Burton@google.com, jevburton.kernel@gmail.com, Tejun Heo , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Luo Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When a bpf object has been exposed in kernfs, there should be a way to remove it. Kernfs doesn't implement unlink, therefore one can not remove the entry in a normal way. To remove the file, we can allow writing a special command to the new entry, which can trigger a remove_self() for removal. 
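From user space, the removal command described above would amount to a
single write to the entry. The following is a minimal sketch that assumes
the entry accepts the string "rm", as this patch proposes; the helper name
request_entry_removal() is hypothetical, and any path passed to it depends
on where the object was exposed.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write the removal command to an exposed entry.
 * Returns 0 on success and -1 on failure (errno is set by open/write).
 */
static int request_entry_removal(const char *path)
{
	static const char cmd[] = "rm";
	size_t len = strlen(cmd);
	ssize_t written;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	written = write(fd, cmd, len);
	close(fd);

	return (written == (ssize_t)len) ? 0 : -1;
}
```

Because the handler in this patch compares the buffer with sysfs_streq(),
a trailing newline would also be accepted, so a shell redirect of
"echo rm" into the entry should behave the same way.
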
So far there are two ways to remove an entry that is created by pinning bpf objects in kernfs: 1. unpin the object from bpffs. 2. write a special command to the kernfs entry. Signed-off-by: Hao Luo --- kernel/bpf/kernfs_node.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/kernel/bpf/kernfs_node.c b/kernel/bpf/kernfs_node.c index c1c45f7b948b..3d331d8357db 100644 --- a/kernel/bpf/kernfs_node.c +++ b/kernel/bpf/kernfs_node.c @@ -9,6 +9,9 @@ =20 /* file_operations for kernfs file system */ =20 +/* Command for removing a kernfs entry */ +#define REMOVE_CMD "rm" + /* Handler when the watched inode is freed. */ static void kn_watch_free_inode(void *obj, enum bpf_type type, void *kn) { @@ -22,8 +25,27 @@ static const struct notify_ops notify_ops =3D { .free_inode =3D kn_watch_free_inode, }; =20 +static ssize_t bpf_generic_write(struct kernfs_open_file *of, char *buf, + size_t bytes, loff_t off) +{ + if (sysfs_streq(buf, REMOVE_CMD)) { + kernfs_remove_self(of->kn); + return bytes; + } + + return -EINVAL; +} + +static ssize_t bpf_generic_read(struct kernfs_open_file *of, char *buf, + size_t bytes, loff_t off) +{ + return -EIO; +} + /* Kernfs file operations for bpf created files. */ static const struct kernfs_ops bpf_generic_ops =3D { + .write =3D bpf_generic_write, + .read =3D bpf_generic_read, }; =20 /* Test whether a given dentry is a kernfs entry. 
*/ --=20 2.34.1.448.ga2b2bfdf31-goog From nobody Tue Jun 30 17:45:56 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E94DC433FE for ; Wed, 12 Jan 2022 19:27:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344979AbiALT1H (ORCPT ); Wed, 12 Jan 2022 14:27:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344974AbiALT0E (ORCPT ); Wed, 12 Jan 2022 14:26:04 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 47374C06175D for ; Wed, 12 Jan 2022 11:26:02 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id i81-20020a253b54000000b00611b1da1f8fso1550934yba.5 for ; Wed, 12 Jan 2022 11:26:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Z0MPDtD7Ryvxqv7atBRsBBIy5nWrE/2KAVK5YgOTQ1o=; b=Y/rDKGvA1zU/2l2jSB1FQkz0ORNXDNmIo4kXk6glG+tVWdLtlP7bkyGRwfJL3X+s0e 3taEToryp1xi/VxVb1mx1v76iHXKy5dRuDWSG8d16cXbyFFAxLKGfS49DbpkATcwjKga v80aZGBttHMQj4OZbOaU2Kj1emczuwnE85Q5EJGcm6ixK8tgMWz1uMFjf1A5D9auyS8e IjsJcUyE8PvetyY2/xDAqvhnDXcMqKyBKahPl9ojlR9je3h+4wdr9EhIX3dEnmANtjiS aaPWs/R4tOy/DVYxgcU+++9wXhpJeLQxqPEoNPG0XHsNxa1aQMGHAelZxNijFQvJiQJq aH/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Z0MPDtD7Ryvxqv7atBRsBBIy5nWrE/2KAVK5YgOTQ1o=; b=XANzmZM3NW9LL2GWDtN1Qz17nSJ//Ico7I/mmNWRb0V8BqvALSzDDocqYuavP8cBcA d8Ts7LiiMDVE29uvnK/NVqonJVlQMKliCAU54ewgBjWy6tEK6+K7gqhrPh0LKZnPMj3W 
kE/PImuPQ3+VpWKoLBSkcjM4sGy+FVFuvQxGks2pNALe9Fc7ocAm02jbLOBTYUi24f/B d3tTGfZosHtP0KCDdUiug0A1xt4uz9/cp156N9ol29Wa3DMl3CLzJrcR+MgiXVwbRf6k /lioWwkt7XOkc8j/A634FrJtezev2B+AxRFArYtr4oEmA3G1a8t6qu10gEQmdrvBljBJ Olfw== X-Gm-Message-State: AOAM530Qqqg5WDW9vUDjfknafj4VUU9N7bA5AcizNE/f9QYKQNCnccGc YxefoGHSFliLVdDvPZcqlwvhw5Q9fYc= X-Google-Smtp-Source: ABdhPJyYp//SvbflxSGl2XZ8nQnWCuFrItRN1OUlt9FX91g8kQyoZ+0I+ftRW3j9y7YA4OJ1yZ+2/T7y72E= X-Received: from haoluo.svl.corp.google.com ([2620:15c:2cd:202:ddf2:9aea:6994:df79]) (user=haoluo job=sendgmr) by 2002:a25:34d2:: with SMTP id b201mr1585039yba.324.1642015561516; Wed, 12 Jan 2022 11:26:01 -0800 (PST) Date: Wed, 12 Jan 2022 11:25:44 -0800 In-Reply-To: <20220112192547.3054575-1-haoluo@google.com> Message-Id: <20220112192547.3054575-6-haoluo@google.com> Mime-Version: 1.0 References: <20220112192547.3054575-1-haoluo@google.com> X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog Subject: [PATCH RESEND RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view. From: Hao Luo To: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann Cc: Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Shakeel Butt , Joe@google.com, Burton@google.com, jevburton.kernel@gmail.com, Tejun Heo , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Luo Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce a new program type called "bpf_view", which can be used to print out a kernel object's state to a seq file. So the signature of this program consists of two parameters: a seq file and a kernel object. Currently only 'struct cgroup' is supported. The following patches will introduce a call site for this program type and allow users to customize the format of printing out the state of kernel objects to userspace. 
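For illustration only (not part of this patch): a rough userspace model of the name matching that bpf_view_prog_supported() performs at load time. The attach function name must start with "bpf_view_" and the remainder must name a registered target. The names view_targets and view_target_supported() are hypothetical, and the btf_id caching done by the kernel code is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VIEW_FUNC_PREFIX "bpf_view_"	/* mirrors BPF_VIEW_FUNC_PREFIX */

/* Targets known to this model; the patch registers only "cgroup". */
static const char *const view_targets[] = { "cgroup" };

/* Return true if attach_fname is "bpf_view_" followed by a known target. */
static bool view_target_supported(const char *attach_fname)
{
	size_t prefix_len = strlen(VIEW_FUNC_PREFIX);
	size_t i;

	if (strncmp(attach_fname, VIEW_FUNC_PREFIX, prefix_len))
		return false;

	for (i = 0; i < sizeof(view_targets) / sizeof(view_targets[0]); i++)
		if (!strcmp(attach_fname + prefix_len, view_targets[i]))
			return true;

	return false;
}
```

Under this model "bpf_view_cgroup" is accepted, while "bpf_iter_task" (wrong prefix) and "bpf_view_task" (unregistered target) are rejected.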
Signed-off-by: Hao Luo --- include/linux/bpf.h | 4 + include/uapi/linux/bpf.h | 2 + kernel/bpf/Makefile | 2 +- kernel/bpf/bpf_view.c | 179 +++++++++++++++++++++++++++++++++ kernel/bpf/bpf_view.h | 24 +++++ kernel/bpf/syscall.c | 3 + kernel/bpf/verifier.c | 6 ++ kernel/trace/bpf_trace.c | 12 ++- tools/include/uapi/linux/bpf.h | 2 + 9 files changed, 230 insertions(+), 4 deletions(-) create mode 100644 kernel/bpf/bpf_view.c create mode 100644 kernel/bpf/bpf_view.h diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2ec693c3d6f6..16f582dfff7e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1622,6 +1622,10 @@ void bpf_iter_map_show_fdinfo(const struct bpf_iter_= aux_info *aux, int bpf_iter_map_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info); =20 +bool bpf_view_prog_supported(struct bpf_prog *prog); +int bpf_view_link_attach(const union bpf_attr *attr, bpfptr_t uattr, + struct bpf_prog *prog); + int map_set_for_each_callback_args(struct bpf_verifier_env *env, struct bpf_func_state *caller, struct bpf_func_state *callee); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b0383d371b9a..efa0f21d13ba 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -982,6 +982,7 @@ enum bpf_attach_type { BPF_MODIFY_RETURN, BPF_LSM_MAC, BPF_TRACE_ITER, + BPF_TRACE_VIEW, BPF_CGROUP_INET4_GETPEERNAME, BPF_CGROUP_INET6_GETPEERNAME, BPF_CGROUP_INET4_GETSOCKNAME, @@ -1009,6 +1010,7 @@ enum bpf_link_type { BPF_LINK_TYPE_NETNS =3D 5, BPF_LINK_TYPE_XDP =3D 6, BPF_LINK_TYPE_PERF_EVENT =3D 7, + BPF_LINK_TYPE_VIEW =3D 8, =20 MAX_BPF_LINK_TYPE, }; diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index b1abf0d94b5b..c662734d83c5 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -8,7 +8,7 @@ CFLAGS_core.o +=3D $(call cc-disable-warning, override-init= ) $(cflags-nogcse-yy) =20 obj-$(CONFIG_BPF_SYSCALL) +=3D syscall.o verifier.o inode.o helpers.o tnum= .o bpf_iter.o map_iter.o task_iter.o 
prog_iter.o obj-$(CONFIG_BPF_SYSCALL) +=3D hashtab.o arraymap.o percpu_freelist.o bpf_= lru_list.o lpm_trie.o map_in_map.o bloom_filter.o -obj-$(CONFIG_BPF_SYSCALL) +=3D local_storage.o queue_stack_maps.o ringbuf.= o kernfs_node.o +obj-$(CONFIG_BPF_SYSCALL) +=3D local_storage.o queue_stack_maps.o ringbuf.= o kernfs_node.o bpf_view.o obj-$(CONFIG_BPF_SYSCALL) +=3D bpf_local_storage.o bpf_task_storage.o obj-${CONFIG_BPF_LSM} +=3D bpf_inode_storage.o obj-$(CONFIG_BPF_SYSCALL) +=3D disasm.o diff --git a/kernel/bpf/bpf_view.c b/kernel/bpf/bpf_view.c new file mode 100644 index 000000000000..967a9240bab4 --- /dev/null +++ b/kernel/bpf/bpf_view.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include +#include "bpf_view.h" + +static struct list_head targets =3D LIST_HEAD_INIT(targets); + +/* bpf_view_link operations */ + +struct bpf_view_target_info { + struct list_head list; + const char *target; + u32 ctx_arg_info_size; + struct bpf_ctx_arg_aux ctx_arg_info[BPF_VIEW_CTX_ARG_MAX]; + u32 btf_id; +}; + +struct bpf_view_link { + struct bpf_link link; + struct bpf_view_target_info *tinfo; +}; + +static void bpf_view_link_release(struct bpf_link *link) +{ +} + +static void bpf_view_link_dealloc(struct bpf_link *link) +{ + struct bpf_view_link *view_link =3D + container_of(link, struct bpf_view_link, link); + kfree(view_link); +} + +static void bpf_view_link_show_fdinfo(const struct bpf_link *link, + struct seq_file *seq) +{ + struct bpf_view_link *view_link =3D + container_of(link, struct bpf_view_link, link); + + seq_printf(seq, "attach_target:\t%s\n", view_link->tinfo->target); +} + +static const struct bpf_link_ops bpf_view_link_lops =3D { + .release =3D bpf_view_link_release, + .dealloc =3D bpf_view_link_dealloc, + .show_fdinfo =3D bpf_view_link_show_fdinfo, +}; + +bool bpf_link_is_view(struct bpf_link *link) +{ + return link->ops =3D=3D &bpf_view_link_lops; +} + +int bpf_view_link_attach(const union bpf_attr *attr, bpfptr_t 
uattr, + struct bpf_prog *prog) +{ + struct bpf_link_primer link_primer; + struct bpf_view_target_info *tinfo; + struct bpf_view_link *link; + u32 prog_btf_id; + bool existed =3D false; + int err; + + prog_btf_id =3D prog->aux->attach_btf_id; + list_for_each_entry(tinfo, &targets, list) { + if (tinfo->btf_id =3D=3D prog_btf_id) { + existed =3D true; + break; + } + } + if (!existed) + return -ENOENT; + + link =3D kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN); + if (!link) + return -ENOMEM; + + bpf_link_init(&link->link, BPF_LINK_TYPE_VIEW, &bpf_view_link_lops, prog); + link->tinfo =3D tinfo; + err =3D bpf_link_prime(&link->link, &link_primer); + if (err) { + kfree(link); + return err; + } + + return bpf_link_settle(&link_primer); +} + +int run_view_prog(struct bpf_prog *prog, void *ctx) +{ + int ret; + + rcu_read_lock(); + migrate_disable(); + ret =3D bpf_prog_run(prog, ctx); + migrate_enable(); + rcu_read_unlock(); + + return ret; +} + +bool bpf_view_prog_supported(struct bpf_prog *prog) +{ + const char *attach_fname =3D prog->aux->attach_func_name; + const char *prefix =3D BPF_VIEW_FUNC_PREFIX; + u32 prog_btf_id =3D prog->aux->attach_btf_id; + struct bpf_view_target_info *tinfo; + int prefix_len =3D strlen(prefix); + bool supported =3D false; + + if (strncmp(attach_fname, prefix, prefix_len)) + return false; + + list_for_each_entry(tinfo, &targets, list) { + if (tinfo->btf_id && tinfo->btf_id =3D=3D prog_btf_id) { + supported =3D true; + break; + } + if (!strcmp(attach_fname + prefix_len, tinfo->target)) { + tinfo->btf_id =3D prog->aux->attach_btf_id; + supported =3D true; + break; + } + } + if (supported) { + prog->aux->ctx_arg_info_size =3D tinfo->ctx_arg_info_size; + prog->aux->ctx_arg_info =3D tinfo->ctx_arg_info; + } + return supported; +} + +/* Generate BTF_IDs */ +BTF_ID_LIST(bpf_view_btf_ids) +BTF_ID(struct, seq_file) +BTF_ID(struct, cgroup) + +/* Index of bpf_view_btf_ids */ +enum { + BTF_ID_SEQ_FILE =3D 0, + BTF_ID_CGROUP, +}; + +static void 
register_bpf_view_target(struct bpf_view_target_info *target, + int idx[BPF_VIEW_CTX_ARG_MAX]) +{ + int i; + + for (i =3D 0; i < target->ctx_arg_info_size; ++i) + target->ctx_arg_info[i].btf_id =3D bpf_view_btf_ids[idx[i]]; + + INIT_LIST_HEAD(&target->list); + list_add(&target->list, &targets); +} + +DEFINE_BPF_VIEW_FUNC(cgroup, struct seq_file *seq, struct cgroup *cgroup) + +static struct bpf_view_target_info cgroup_view_tinfo =3D { + .target =3D "cgroup", + .ctx_arg_info_size =3D 2, + .ctx_arg_info =3D { + { offsetof(struct bpf_view_cgroup_ctx, seq), PTR_TO_BTF_ID }, + { offsetof(struct bpf_view_cgroup_ctx, cgroup), PTR_TO_BTF_ID }, + }, + .btf_id =3D 0, +}; + +static int __init bpf_view_init(void) +{ + int cgroup_view_idx[BPF_VIEW_CTX_ARG_MAX] =3D { + BTF_ID_SEQ_FILE, BTF_ID_CGROUP }; + + register_bpf_view_target(&cgroup_view_tinfo, cgroup_view_idx); + + return 0; +} +late_initcall(bpf_view_init); + diff --git a/kernel/bpf/bpf_view.h b/kernel/bpf/bpf_view.h new file mode 100644 index 000000000000..1a1110a5727f --- /dev/null +++ b/kernel/bpf/bpf_view.h @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef _BPF_VIEW_H_ +#define _BPF_VIEW_H_ + +#include + +#define BPF_VIEW_FUNC_PREFIX "bpf_view_" +#define DEFINE_BPF_VIEW_FUNC(target, args...) 
\ + extern int bpf_view_ ## target(args); \ + int __init bpf_view_ ## target(args) { return 0; } + +#define BPF_VIEW_CTX_ARG_MAX 2 + +struct bpf_view_cgroup_ctx { + __bpf_md_ptr(struct seq_file *, seq); + __bpf_md_ptr(struct cgroup *, cgroup); +}; + +bool bpf_link_is_view(struct bpf_link *link); + +/* Run a bpf_view program */ +int run_view_prog(struct bpf_prog *prog, void *ctx); + +#endif // _BPF_VIEW_H_ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index fa4505f9b611..32ac84d3ac0b 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3175,6 +3175,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_= type) case BPF_CGROUP_SETSOCKOPT: return BPF_PROG_TYPE_CGROUP_SOCKOPT; case BPF_TRACE_ITER: + case BPF_TRACE_VIEW: return BPF_PROG_TYPE_TRACING; case BPF_SK_LOOKUP: return BPF_PROG_TYPE_SK_LOOKUP; @@ -4235,6 +4236,8 @@ static int tracing_bpf_link_attach(const union bpf_at= tr *attr, bpfptr_t uattr, =20 if (prog->expected_attach_type =3D=3D BPF_TRACE_ITER) return bpf_iter_link_attach(attr, uattr, prog); + else if (prog->expected_attach_type =3D=3D BPF_TRACE_VIEW) + return bpf_view_link_attach(attr, uattr, prog); else if (prog->type =3D=3D BPF_PROG_TYPE_EXT) return bpf_tracing_prog_attach(prog, attr->link_create.target_fd, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index bfb45381fb3f..ce7816519c93 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9770,6 +9770,7 @@ static int check_return_code(struct bpf_verifier_env = *env) case BPF_MODIFY_RETURN: return 0; case BPF_TRACE_ITER: + case BPF_TRACE_VIEW: break; default: return -ENOTSUPP; @@ -13971,6 +13972,7 @@ int bpf_check_attach_target(struct bpf_verifier_log= *log, =20 break; case BPF_TRACE_ITER: + case BPF_TRACE_VIEW: if (!btf_type_is_func(t)) { bpf_log(log, "attach_btf_id %u is not a function\n", btf_id); @@ -14147,6 +14149,10 @@ static int check_attach_btf_id(struct bpf_verifier= _env *env) if (!bpf_iter_prog_supported(prog)) return -EINVAL; return 0; + } 
else if (prog->expected_attach_type =3D=3D BPF_TRACE_VIEW) { + if (!bpf_view_prog_supported(prog)) + return -EINVAL; + return 0; } =20 if (prog->type =3D=3D BPF_PROG_TYPE_LSM) { diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 21aa30644219..9413b5af6e2c 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1630,6 +1630,12 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, con= st struct bpf_prog *prog) } } =20 +static inline bool prog_support_seq_helpers(const struct bpf_prog *prog) +{ + return prog->expected_attach_type =3D=3D BPF_TRACE_ITER || + prog->expected_attach_type =3D=3D BPF_TRACE_VIEW; +} + const struct bpf_func_proto * tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *p= rog) { @@ -1663,15 +1669,15 @@ tracing_prog_func_proto(enum bpf_func_id func_id, c= onst struct bpf_prog *prog) return &bpf_get_socket_ptr_cookie_proto; #endif case BPF_FUNC_seq_printf: - return prog->expected_attach_type =3D=3D BPF_TRACE_ITER ? + return prog_support_seq_helpers(prog) ? &bpf_seq_printf_proto : NULL; case BPF_FUNC_seq_write: - return prog->expected_attach_type =3D=3D BPF_TRACE_ITER ? + return prog_support_seq_helpers(prog) ? &bpf_seq_write_proto : NULL; case BPF_FUNC_seq_printf_btf: - return prog->expected_attach_type =3D=3D BPF_TRACE_ITER ? + return prog_support_seq_helpers(prog) ? 
&bpf_seq_printf_btf_proto : NULL; case BPF_FUNC_d_path: diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index b0383d371b9a..efa0f21d13ba 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -982,6 +982,7 @@ enum bpf_attach_type { BPF_MODIFY_RETURN, BPF_LSM_MAC, BPF_TRACE_ITER, + BPF_TRACE_VIEW, BPF_CGROUP_INET4_GETPEERNAME, BPF_CGROUP_INET6_GETPEERNAME, BPF_CGROUP_INET4_GETSOCKNAME, @@ -1009,6 +1010,7 @@ enum bpf_link_type { BPF_LINK_TYPE_NETNS =3D 5, BPF_LINK_TYPE_XDP =3D 6, BPF_LINK_TYPE_PERF_EVENT =3D 7, + BPF_LINK_TYPE_VIEW =3D 8, =20 MAX_BPF_LINK_TYPE, }; --=20 2.34.1.448.ga2b2bfdf31-goog From nobody Tue Jun 30 17:45:56 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17537C433F5 for ; Wed, 12 Jan 2022 19:27:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345135AbiALT1Z (ORCPT ); Wed, 12 Jan 2022 14:27:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60150 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345148AbiALT0a (ORCPT ); Wed, 12 Jan 2022 14:26:30 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0FB68C06175E for ; Wed, 12 Jan 2022 11:26:05 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id y10-20020a2586ca000000b006116aaeeee6so6228651ybm.21 for ; Wed, 12 Jan 2022 11:26:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=bFDryb9jb8NTEZANKT/JVNtftTgumdYcR8KASxgiLpk=; b=ZGXT1UObup073swIqmO5+1QRXtl9hmA/6NTt0hfO179rFjFvCw8fqGDyKY+tb7Fvja 
Aa2bVZRYiHFHSDw9aLiHJfvYqKowJfz7NyG28e7krCflCqlayrV475tQlF+tqyXyQBf7 uCUSYpL6/BK5oT4iRcVeMB2QEm4FksCN3L7obXYzpe8xUjvicHFkhatArw5O/MrHjeHl CNuVabGmHM6mvE/pkR/tFg3cAqrUnvhsW3qdv+UB0l20//61MyAwswJjujSG7mVLDxGR vDo98OFhtrlwOXwQdsBRJ8Q5gm5fGBPzehZCeIW5LxHZSL8Qiz9h3blFu84b6S2V9bvp n9gQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=bFDryb9jb8NTEZANKT/JVNtftTgumdYcR8KASxgiLpk=; b=qHc0517wd0iV8GFS7QnCyRSt8N8ur5g3Y6D9Hqlfu/Llona0c4e8lVUJZyQPNy1quk 21tkSdoxII7fzl7H16c8tk4IAlF5ic5Bcyo2xW9OqMTw7GJ3lIl1h3tX+dyGvpLDILyN wb7b5TUdrv1SkxY92Y5XjIdOq+Xgsu3ctj1fgqqyNftZCwJKsxOpLPsNPRKJ9SfnDYzt 6q9Ve/zBd/eJE/C3lmCnVGAwpM3hKw8xTJU2gWe1tKlHF1nH4jpSy4MWlNzvvWXjSQHT zWGBt052RTIfcut7QXcaeTxaDjOC2+6TiRHXVVGJgMEcmF6BPJgIGnUkQH2IthVw9F82 ja8w== X-Gm-Message-State: AOAM530tNuyb/Le4bfsKRqCqjUDaR8QFMe8W/o1FC9Qt7yBuNYHj8+Uy K7NJyLHY37/ua/6Gv/gYynpYI4eviZA= X-Google-Smtp-Source: ABdhPJyMMvFLrHlA70ZdzGQYlaAvxkUsxzB1wsmsKlFxF+l/nhGDrTOdSFvhYWSKqtZGin3dFOLepDLpF2g= X-Received: from haoluo.svl.corp.google.com ([2620:15c:2cd:202:ddf2:9aea:6994:df79]) (user=haoluo job=sendgmr) by 2002:a5b:352:: with SMTP id q18mr1490351ybp.23.1642015564256; Wed, 12 Jan 2022 11:26:04 -0800 (PST) Date: Wed, 12 Jan 2022 11:25:45 -0800 In-Reply-To: <20220112192547.3054575-1-haoluo@google.com> Message-Id: <20220112192547.3054575-7-haoluo@google.com> Mime-Version: 1.0 References: <20220112192547.3054575-1-haoluo@google.com> X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog Subject: [PATCH RESEND RFC bpf-next v1 6/8] libbpf: Support of bpf_view prog type. 
From: Hao Luo To: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann Cc: Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Shakeel Butt , Joe@google.com, Burton@google.com, jevburton.kernel@gmail.com, Tejun Heo , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Luo Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The previous patch introduced a new program type bpf_view. This patch adds support for bpf_view in libbpf. Signed-off-by: Hao Luo --- tools/lib/bpf/libbpf.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 7f10dd501a52..0d458e34d82c 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -8570,6 +8570,7 @@ static struct bpf_link *attach_raw_tp(const struct bp= f_program *prog, long cooki static struct bpf_link *attach_trace(const struct bpf_program *prog, long = cookie); static struct bpf_link *attach_lsm(const struct bpf_program *prog, long co= okie); static struct bpf_link *attach_iter(const struct bpf_program *prog, long c= ookie); +static struct bpf_link *attach_view(const struct bpf_program *prog, long c= ookie); =20 static const struct bpf_sec_def section_defs[] =3D { SEC_DEF("socket", SOCKET_FILTER, 0, SEC_NONE | SEC_SLOPPY_PFX), @@ -8599,6 +8600,7 @@ static const struct bpf_sec_def section_defs[] =3D { SEC_DEF("lsm/", LSM, BPF_LSM_MAC, SEC_ATTACH_BTF, attach_lsm), SEC_DEF("lsm.s/", LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, atta= ch_lsm), SEC_DEF("iter/", TRACING, BPF_TRACE_ITER, SEC_ATTACH_BTF, attach_iter), + SEC_DEF("view/", TRACING, BPF_TRACE_VIEW, SEC_ATTACH_BTF, attach_view), SEC_DEF("syscall", SYSCALL, 0, SEC_SLEEPABLE), SEC_DEF("xdp_devmap/", XDP, BPF_XDP_DEVMAP, SEC_ATTACHABLE), SEC_DEF("xdp_cpumap/", XDP, BPF_XDP_CPUMAP, SEC_ATTACHABLE), @@ -8896,6 +8898,7 @@ static int bpf_object__collect_st_ops_relos(struct bp= f_object
*obj, #define BTF_TRACE_PREFIX "btf_trace_" #define BTF_LSM_PREFIX "bpf_lsm_" #define BTF_ITER_PREFIX "bpf_iter_" +#define BTF_VIEW_PREFIX "bpf_view_" #define BTF_MAX_NAME_SIZE 128 =20 void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type, @@ -8914,6 +8917,10 @@ void btf_get_kernel_prefix_kind(enum bpf_attach_type= attach_type, *prefix =3D BTF_ITER_PREFIX; *kind =3D BTF_KIND_FUNC; break; + case BPF_TRACE_VIEW: + *prefix =3D BTF_VIEW_PREFIX; + *kind =3D BTF_KIND_FUNC; + break; default: *prefix =3D ""; *kind =3D BTF_KIND_FUNC; @@ -10575,6 +10582,20 @@ struct bpf_link *bpf_program__attach_freplace(cons= t struct bpf_program *prog, } } =20 +static struct bpf_link *attach_view(const struct bpf_program *prog, long c= ookie) +{ + const char *target_name; + const char *prefix =3D "view/"; + int btf_id; + + target_name =3D prog->sec_name + strlen(prefix); + btf_id =3D libbpf_find_vmlinux_btf_id(target_name, BPF_TRACE_VIEW); + if (btf_id < 0) + return libbpf_err_ptr(btf_id); + + return bpf_program__attach_fd(prog, 0, btf_id, "view"); +} + struct bpf_link * bpf_program__attach_iter(const struct bpf_program *prog, const struct bpf_iter_attach_opts *opts) --=20 2.34.1.448.ga2b2bfdf31-goog From nobody Tue Jun 30 17:45:56 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 452ABC433EF for ; Wed, 12 Jan 2022 19:27:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345187AbiALT1l (ORCPT ); Wed, 12 Jan 2022 14:27:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345164AbiALT0c (ORCPT ); Wed, 12 Jan 2022 14:26:32 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E1D6C061201 for ; Wed, 12 Jan 2022 11:26:07 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id a84-20020a251a57000000b0061171f19f8dso6368113yba.13 for ; Wed, 12 Jan 2022 11:26:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=yDXL8UUzK/9+buL6gkbXy9spq4LQS1BDsh/qrssDu7Q=; b=RF11mGZx2zr+S6RJLpOtEnxIWCNKMsMa12V1GuPNVcWvrLZvemgyDY/sSPkFnYOf+D 6TkEJKWH0W6R4Tnd7+dZBv/ZDktlsWiVNxKVTtjsH2tjEYJXpFtFkoNSMPwYhtiOcJRD 5lWWYmPkXwuSnh/qEUkzhAoVPpvrVW6MOp6iJXm95a/Tw3elLkr5jhEcJusq41vbLl6X J/+WC/5aVpuPk6ZIAjndPoWba80QD0wDZxW0jlVcBp6K1UWEA1RL9yzIte2PHJUa6P9A HhtTiDiFp7zwezlHQ2jl90giKPao9u/qfJc2eRg0EzSiWYbiSUHbGZRfIPBDYO2vp0n9 /dTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=yDXL8UUzK/9+buL6gkbXy9spq4LQS1BDsh/qrssDu7Q=; b=rlEYPr9cFZzT37KCq31ybtwwjcAtD3OkJLDlrYafEkb9DgqYQs1dbWfppHXqYyfzGz Min+338NvWmzPCnB85BfMFhdzJGlzwcfiYoGiciTqKO6G6hVDU4HwWJ94Djasl+8y2U3 6vopjocGCYC30XPcxgedSo5GZcaVNcFI22pL5xp3p78XOkaznpE+JoDi999flcfLL2o+ c1jX1ej9MVqXiomcBfbFn7RGPqCnwx3BLeOJe1lCmYv2scG2repAcBcS3b3ETf7sHFL7 IAmXwOc8DYgGHk01NVJfJz0u6w0xx9Gv2Vv8BqLg/BSP4wAC7Gu/CH5iOWVkbWvnQdMd pkQg== X-Gm-Message-State: AOAM532IttLcIxuqUisWu7CMW13OrlYSlRbOuL308g21RBa2h0XK5LU6 +HtQcgc++5dneZX4jSZJuWzeVY+XpyY= X-Google-Smtp-Source: ABdhPJyheBqhOzTcvV+uhHfkduy3S5cLxMWybXi9ki2uwDuLhYbJe68pxiDg5RFWa1AsJ9LAQi/k8ditmTk= X-Received: from haoluo.svl.corp.google.com ([2620:15c:2cd:202:ddf2:9aea:6994:df79]) (user=haoluo job=sendgmr) by 2002:a25:c3c4:: with SMTP id t187mr1409170ybf.634.1642015566506; Wed, 12 Jan 2022 11:26:06 -0800 (PST) Date: Wed, 12 Jan 2022 11:25:46 -0800 In-Reply-To: <20220112192547.3054575-1-haoluo@google.com> Message-Id: <20220112192547.3054575-8-haoluo@google.com> Mime-Version: 1.0 
References: <20220112192547.3054575-1-haoluo@google.com> X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog Subject: [PATCH RESEND RFC bpf-next v1 7/8] bpf: Add seq_show operation for bpf in cgroupfs From: Hao Luo To: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann Cc: Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Shakeel Butt , Joe@google.com, Burton@google.com, jevburton.kernel@gmail.com, Tejun Heo , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Luo Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Previous patches allow exposing bpf objects in the kernfs file system. They allow creating file entries in kernfs, which can reference bpf objects. The referenced bpf objects can be used to customize the new entry's file operations. In particular, this patch introduces one concrete use case of this feature. It implements the .seq_show file operation for the cgroup file system. The seq_show handler takes the bpf object and uses it to format its output seq file. The bpf object needs to be a link to the newly introduced "bpf_view" program type.
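For illustration only (not part of this patch): the kernel side selects the cgroup-specific kernfs_ops by comparing the superblock magic against CGROUP_SUPER_MAGIC and CGROUP2_SUPER_MAGIC. A userspace caller could apply the same test before pinning, as sketched below. The helper names magic_is_cgroup() and path_is_cgroupfs() are hypothetical; the fallback constants are the values from linux/magic.h.

```c
#include <stdbool.h>
#include <sys/vfs.h>

#ifndef CGROUP_SUPER_MAGIC
#define CGROUP_SUPER_MAGIC	0x27e0eb
#endif
#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC	0x63677270
#endif

/* Same comparison as bpf_kernfs_ops() in the patch, on a raw magic value. */
static bool magic_is_cgroup(unsigned long magic)
{
	return magic == CGROUP_SUPER_MAGIC || magic == CGROUP2_SUPER_MAGIC;
}

/* Returns 1 if path is on cgroupfs (v1 or v2), 0 if not, -1 on error. */
static int path_is_cgroupfs(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) < 0)
		return -1;

	return magic_is_cgroup((unsigned long)st.f_type) ? 1 : 0;
}
```

Entries pinned on any other kernfs-backed file system would get the generic ops, which implement only the "rm" write handler and a read that fails with -EIO.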
Signed-off-by: Hao Luo --- kernel/bpf/bpf_view.c | 11 ++++++++ kernel/bpf/bpf_view.h | 1 + kernel/bpf/inode.c | 4 +-- kernel/bpf/inode.h | 3 +++ kernel/bpf/kernfs_node.c | 58 +++++++++++++++++++++++++++++++++++++++- 5 files changed, 73 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/bpf_view.c b/kernel/bpf/bpf_view.c index 967a9240bab4..8f035c5a9b6a 100644 --- a/kernel/bpf/bpf_view.c +++ b/kernel/bpf/bpf_view.c @@ -166,6 +166,17 @@ static struct bpf_view_target_info cgroup_view_tinfo = =3D { .btf_id =3D 0, }; =20 +bool bpf_link_is_cgroup_view(struct bpf_link *link) +{ + struct bpf_view_link *view_link; + + if (!bpf_link_is_view(link)) + return false; + + view_link =3D container_of(link, struct bpf_view_link, link); + return view_link->tinfo =3D=3D &cgroup_view_tinfo; +} + static int __init bpf_view_init(void) { int cgroup_view_idx[BPF_VIEW_CTX_ARG_MAX] =3D { diff --git a/kernel/bpf/bpf_view.h b/kernel/bpf/bpf_view.h index 1a1110a5727f..a02564e529cb 100644 --- a/kernel/bpf/bpf_view.h +++ b/kernel/bpf/bpf_view.h @@ -17,6 +17,7 @@ struct bpf_view_cgroup_ctx { }; =20 bool bpf_link_is_view(struct bpf_link *link); +bool bpf_link_is_cgroup_view(struct bpf_link *link); =20 /* Run a bpf_view program */ int run_view_prog(struct bpf_prog *prog, void *ctx); diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 7e93e477b57c..1ae4a7b8c732 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -115,8 +115,6 @@ struct fsnotify_ops bpf_notify_ops =3D { .free_mark =3D notify_free_mark, }; =20 -static int bpf_inode_type(const struct inode *inode, enum bpf_type *type); - /* Watch the destruction of an inode and calls the callbacks in the given * notify_ops. 
*/ @@ -211,7 +209,7 @@ static struct inode *bpf_get_inode(struct super_block *= sb, return inode; } =20 -static int bpf_inode_type(const struct inode *inode, enum bpf_type *type) +int bpf_inode_type(const struct inode *inode, enum bpf_type *type) { *type =3D BPF_TYPE_UNSPEC; if (inode->i_op =3D=3D &bpf_prog_iops) diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h index c12d385a3e2a..dea78341549b 100644 --- a/kernel/bpf/inode.h +++ b/kernel/bpf/inode.h @@ -17,6 +17,9 @@ struct notify_ops { void (*free_inode)(void *object, enum bpf_type type, void *priv); }; =20 +/* Get the type of bpf object from bpffs inode. */ +int bpf_inode_type(const struct inode *inode, enum bpf_type *type); + #ifdef CONFIG_FSNOTIFY /* Watch the destruction of an inode and calls the callbacks in the given * notify_ops. diff --git a/kernel/bpf/kernfs_node.c b/kernel/bpf/kernfs_node.c index 3d331d8357db..7b58bfc1951e 100644 --- a/kernel/bpf/kernfs_node.c +++ b/kernel/bpf/kernfs_node.c @@ -3,15 +3,33 @@ * Expose eBPF objects in kernfs file system. */ =20 +#include #include #include +#include +#include +#include #include "inode.h" +#include "bpf_view.h" =20 /* file_operations for kernfs file system */ =20 /* Command for removing a kernfs entry */ #define REMOVE_CMD "rm" =20 +static const struct kernfs_ops bpf_generic_ops; +static const struct kernfs_ops bpf_cgroup_ops; + +/* Choose the right kernfs_ops for different kernfs. */ +static const struct kernfs_ops *bpf_kernfs_ops(struct super_block *sb) +{ + if (sb->s_magic =3D=3D CGROUP_SUPER_MAGIC || + sb->s_magic =3D=3D CGROUP2_SUPER_MAGIC) + return &bpf_cgroup_ops; + + return &bpf_generic_ops; +} + /* Handler when the watched inode is freed. 
*/ static void kn_watch_free_inode(void *obj, enum bpf_type type, void *kn) { @@ -80,7 +98,7 @@ int bpf_obj_do_pin_kernfs(struct dentry *dentry, umode_t = mode, void *obj, if (!inode) return -ENXIO; =20 - ops =3D &bpf_generic_ops; + ops =3D bpf_kernfs_ops(sb); kn =3D __kernfs_create_file(parent_kn, dentry->d_iname, mode, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0, ops, inode, NULL, NULL); @@ -107,3 +125,41 @@ int bpf_obj_do_pin_kernfs(struct dentry *dentry, umode= _t mode, void *obj, iput(inode); return 0; } + +/* file_operations for cgroup file system */ +static int bpf_cgroup_seq_show(struct seq_file *seq, void *v) +{ + struct bpf_view_cgroup_ctx ctx; + struct kernfs_open_file *of; + struct kernfs_node *kn; + struct cgroup *cgroup; + struct inode *inode; + struct bpf_link *link; + enum bpf_type type; + + of =3D seq->private; + kn =3D of->kn; + cgroup =3D kn->parent->priv; + + inode =3D kn->priv; + if (bpf_inode_type(inode, &type)) + return -ENXIO; + + if (type !=3D BPF_TYPE_LINK) + return -EACCES; + + link =3D inode->i_private; + if (!bpf_link_is_cgroup_view(link)) + return -EACCES; + + ctx.seq =3D seq; + ctx.cgroup =3D cgroup; + + return run_view_prog(link->prog, &ctx); +} + +static const struct kernfs_ops bpf_cgroup_ops =3D { + .seq_show =3D bpf_cgroup_seq_show, + .write =3D bpf_generic_write, + .read =3D bpf_generic_read, +}; --=20 2.34.1.448.ga2b2bfdf31-goog From nobody Tue Jun 30 17:45:56 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2F9BC433EF for ; Wed, 12 Jan 2022 19:28:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356810AbiALT2A (ORCPT ); Wed, 12 Jan 2022 14:28:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60162 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S1345309AbiALT0h (ORCPT ); Wed, 12 Jan 2022 14:26:37 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 430FFC06118C for ; Wed, 12 Jan 2022 11:26:09 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id q185-20020a25d9c2000000b00611ae9c8773so2546072ybg.18 for ; Wed, 12 Jan 2022 11:26:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=VsFFxxwANtIrhwRpJUE2VY/wPWYvmyT5gfi+R44pAG0=; b=ENdmOdatY4zEiRnYojm/ITEfSN1H8OKJs2sqMObJUrQuyAnQL29NnS6KfPbj2g5Qnd YJ+SI7OApM3JzYQWWA+m4+OkEkHa/ifyxZz7TRpORAoFVUh+f19+ll3idvlJaWwGiz0x RVymw0RpjRv0pKuGBsvbyaeCwNn6/X1qgBlbISYnKEkSNkqUOS8ytGWS6J/F5bujUfv7 ZsmSPauNSBF8Yp7zauBJ52KmJM9Hw3zNQxAkmXOPFpVWGa8yGzEIKyXYD3slkNmAU5BR GA5MN9JR+rZw2Tv4z/zRY11br063GYBf0z3PDIiQfW3F/AdoLSjSnfI1SdcWK7ORR5EH bdbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=VsFFxxwANtIrhwRpJUE2VY/wPWYvmyT5gfi+R44pAG0=; b=nZ8yMIZp7ujE3xTNW4R+m5HaixIx28wXE90dIDVV38OcN4SJhiQrilgpn4aZvnQz/h szs42r8BU+1DKGSga5RK/diHUwBZJ8GQZ6jrSCkcd1ksiBzhsucEZNbpBCY6jQ1DmKWu SN0eT/saWq9OZ6gjakFwfDvF9T4Nc3bug00toXHB8SlIp8/eJVmlTpnYPSPPtyZa0U/U pUS0OV2GHGzLbPBeoVof5Q/18nKYqYMTnx8uRpV/hCWpgawbUl5ERFrTMWgDFhRztcUS MH+F8Bjk4wup9fIE2f6hGK3HPO0ZtF8WHOc2U07nY4wxwGqUNJB3AKJPJZKWkHe/58vX /sVQ== X-Gm-Message-State: AOAM531k2z0rCur7YCD22rGFvWY8J2vSITim1596rMbqvNujC+Gbq17S eFHtcm4FEV8iSrgn8bjyDPakyWXxYUg= X-Google-Smtp-Source: ABdhPJz8qpgAQmgwYbd9zW1gXEAdsK0dh9ecxoggPUKYUrNtALfc2eMBc/Rst1BQvOSd8nJ964YzuHP40Lo= X-Received: from haoluo.svl.corp.google.com ([2620:15c:2cd:202:ddf2:9aea:6994:df79]) (user=haoluo job=sendgmr) by 2002:a25:248a:: with SMTP id k132mr1659355ybk.282.1642015568531; Wed, 12 Jan 2022 11:26:08 -0800 (PST) Date: Wed, 
12 Jan 2022 11:25:47 -0800
In-Reply-To: <20220112192547.3054575-1-haoluo@google.com>
Message-Id: <20220112192547.3054575-9-haoluo@google.com>
Mime-Version: 1.0
References: <20220112192547.3054575-1-haoluo@google.com>
X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog
Subject: [PATCH RESEND RFC bpf-next v1 8/8] selftests/bpf: Test exposing bpf objects in kernfs
From: Hao Luo
To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Shakeel Butt, Joe@google.com, Burton@google.com, jevburton.kernel@gmail.com, Tejun Heo, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Luo
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add selftests for exposing bpf objects in kernfs. Basically the added
test tests two functionalities:

  1. the ability to expose generic bpf objects in kernfs.
  2. the ability to expose bpf_view programs to cgroup file system and
     read from the created cgroupfs entry.
The test assumes cgroup v2 is mounted at /sys/fs/cgroup/ and bpffs is
mounted at /sys/fs/bpf/

Signed-off-by: Hao Luo
---
 .../selftests/bpf/prog_tests/pinning_kernfs.c | 245 ++++++++++++++++++
 .../selftests/bpf/progs/pinning_kernfs.c      |  72 +++++
 2 files changed, 317 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
 create mode 100644 tools/testing/selftests/bpf/progs/pinning_kernfs.c

diff --git a/tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c b/tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
new file mode 100644
index 000000000000..aa702d05bf25
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include "pinning_kernfs.skel.h"
+
+/* remove pinned object from kernfs */
+static void do_unpin(const char *kernfs_path, const char *msg)
+{
+	struct stat statbuf = {};
+	const char cmd[] = "rm";
+	int fd;
+
+	fd = open(kernfs_path, O_WRONLY);
+	if (fd < 0)
+		return;
+	ASSERT_GE(write(fd, cmd, sizeof(cmd)), 0, "fail_unpin_cgroup_entry");
+	close(fd);
+
+	ASSERT_ERR(stat(kernfs_path, &statbuf), msg);
+}
+
+static void do_pin(int fd, const char *pinpath)
+{
+	struct stat statbuf = {};
+
+	if (!ASSERT_OK(bpf_obj_pin(fd, pinpath), "bpf_obj_pin"))
+		return;
+
+	ASSERT_OK(stat(pinpath, &statbuf), "stat");
+}
+
+static void check_pinning(const char *bpffs_rootpath,
+			  const char *kernfs_rootpath)
+{
+	const char msg[] = "xxx";
+	char buf[8];
+	struct pinning_kernfs *skel;
+	struct bpf_link *link;
+	int prog_fd, map_fd, link_fd;
+	char bpffs_path[64];
+	char kernfs_path[64];
+	struct stat statbuf = {};
+	int err, fd;
+
+	skel = pinning_kernfs__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "pinning_kernfs__open_and_load"))
+		return;
+
+	snprintf(kernfs_path, 64, "%s/bpf_obj", kernfs_rootpath);
+	snprintf(bpffs_path, 64, "%s/bpf_obj", bpffs_rootpath);
+
+	prog_fd = bpf_program__fd(skel->progs.wait_show);
+
+	/* test 1:
+	 *
+	 * - expose object in kernfs without pinning in bpffs in the first place.
+	 */
+	ASSERT_ERR(bpf_obj_pin(prog_fd, kernfs_path), "pin_kernfs_first");
+
+	/* test 2:
+	 *
+	 * - expose bpf prog in kernfs.
+	 * - read/write the newly created kernfs entry.
+	 */
+	do_pin(prog_fd, bpffs_path);
+	do_pin(prog_fd, kernfs_path);
+	fd = open(kernfs_path, O_RDWR);
+	err = read(fd, buf, sizeof(buf));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_read"))
+		goto out;
+
+	err = write(fd, msg, sizeof(msg));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_write"))
+		goto out;
+
+	close(fd);
+	do_unpin(kernfs_path, "kernfs_unpin_prog");
+	ASSERT_OK(unlink(bpffs_path), "bpffs_unlink_prog");
+
+	/* test 3:
+	 *
+	 * - expose bpf map in kernfs.
+	 * - read/write the newly created kernfs entry.
+	 */
+	map_fd = bpf_map__fd(skel->maps.wait_map);
+	do_pin(map_fd, bpffs_path);
+	do_pin(map_fd, kernfs_path);
+	fd = open(kernfs_path, O_RDWR);
+	err = read(fd, buf, sizeof(buf));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_read"))
+		goto out;
+
+	err = write(fd, msg, sizeof(msg));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_write"))
+		goto out;
+
+	close(fd);
+	do_unpin(kernfs_path, "kernfs_unpin_map");
+	ASSERT_OK(unlink(bpffs_path), "bpffs_unlink_map");
+
+	/* test 4:
+	 *
+	 * - expose bpf link in kernfs.
+	 * - read/write the newly created kernfs entry.
+	 * - removing bpffs entry also removes kernfs entries.
+	 */
+	link = bpf_program__attach(skel->progs.wait_record);
+	link_fd = bpf_link__fd(link);
+	do_pin(link_fd, bpffs_path);
+	do_pin(link_fd, kernfs_path);
+	fd = open(kernfs_path, O_RDWR);
+	err = read(fd, buf, sizeof(buf));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_read"))
+		goto destroy_link;
+
+	err = write(fd, msg, sizeof(msg));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_write"))
+		goto destroy_link;
+
+	ASSERT_OK(unlink(bpffs_path), "bpffs_unlink_link");
+	ASSERT_ERR(stat(kernfs_path, &statbuf), "unpin_bpffs_first");
+
+	/* cleanup */
+destroy_link:
+	bpf_link__destroy(link);
+out:
+	close(fd);
+	pinning_kernfs__destroy(skel);
+}
+
+static void spin_on_cpu(int seconds)
+{
+	time_t start, now;
+
+	start = time(NULL);
+	do {
+		now = time(NULL);
+	} while (now - start < seconds);
+}
+
+static void do_work(const char *cgroup)
+{
+	int cpu = 0, pid;
+	char cmd[128];
+
+	/* make cgroup threaded */
+	snprintf(cmd, 128, "echo threaded > %s/cgroup.type", cgroup);
+	system(cmd);
+
+	pid = fork();
+	if (pid == 0) {
+		/* attach to cgroup */
+		snprintf(cmd, 128, "echo %d > %s/cgroup.procs", getpid(), cgroup);
+		system(cmd);
+
+		/* pin process to target cpu */
+		snprintf(cmd, 128, "taskset -pc %d %d", cpu, getpid());
+		system(cmd);
+
+		spin_on_cpu(3); /* spin on cpu for 3 seconds */
+		exit(0);
+	}
+
+	/* pin process to target cpu */
+	snprintf(cmd, 128, "taskset -pc %d %d", cpu, getpid());
+	system(cmd);
+
+	spin_on_cpu(3); /* spin on cpu for 3 seconds */
+	wait(NULL);
+}
+
+void read_from_file(const char *path)
+{
+	int id = 0, lat;
+	char buf[64];
+	int fd;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return;
+	ASSERT_GE(read(fd, buf, sizeof(buf)), 0, "fail_read_cgroup_entry");
+	ASSERT_EQ(sscanf(buf, "%d %d", &id, &lat), 2, "unexpected_seq_show_output");
+	close(fd);
+}
+
+static void check_cgroup_seq_show(const char *bpffs_dir,
+				  const char *cgroup_dir)
+{
+	struct pinning_kernfs *skel;
+	char bpffs_path[64];
+	char cgroup_path[64];
+	int fd;
+
+	skel = pinning_kernfs__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "pinning_kernfs__open_and_load"))
+		return;
+
+	pinning_kernfs__attach(skel);
+
+	snprintf(bpffs_path, 64, "%s/bpf_obj", bpffs_dir);
+	snprintf(cgroup_path, 64, "%s/bpf_obj", cgroup_dir);
+
+	/* generate wait events for the cgroup */
+	do_work(cgroup_dir);
+
+	/* expose wait_show prog to cgroupfs */
+	fd = bpf_link__fd(skel->links.wait_show);
+	ASSERT_OK(bpf_obj_pin(fd, bpffs_path), "pin_bpffs");
+	ASSERT_OK(bpf_obj_pin(fd, cgroup_path), "pin_cgroupfs");
+
+	/* read from cgroupfs and check results */
+	read_from_file(cgroup_path);
+
+	/* cleanup */
+	do_unpin(cgroup_path, "cgroup_unpin_seq_show");
+	ASSERT_OK(unlink(bpffs_path), "bpffs_unlink_seq_show");
+
+	pinning_kernfs__destroy(skel);
+}
+
+void test_pinning_kernfs(void)
+{
+	char kernfs_tmpl[] = "/sys/fs/cgroup/bpf_pinning_test_XXXXXX";
+	char bpffs_tmpl[] = "/sys/fs/bpf/pinning_test_XXXXXX";
+	char *kernfs_rootpath, *bpffs_rootpath;
+
+	kernfs_rootpath = mkdtemp(kernfs_tmpl);
+	bpffs_rootpath = mkdtemp(bpffs_tmpl);
+
+	/* check pinning map, prog and link in kernfs */
+	if (test__start_subtest("pinning"))
+		check_pinning(bpffs_rootpath, kernfs_rootpath);
+
+	/* check cgroup seq_show implemented using bpf */
+	if (test__start_subtest("cgroup_seq_show"))
+		check_cgroup_seq_show(bpffs_rootpath, kernfs_rootpath);
+
+	rmdir(kernfs_rootpath);
+	rmdir(bpffs_rootpath);
+}
diff --git a/tools/testing/selftests/bpf/progs/pinning_kernfs.c b/tools/testing/selftests/bpf/progs/pinning_kernfs.c
new file mode 100644
index 000000000000..ca03a9443794
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/pinning_kernfs.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Google */
+
+#include "vmlinux.h"
+#include
+#include
+
+struct bpf_map_def SEC("maps") wait_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(__u64),
+	.value_size = sizeof(__u64),
+	.max_entries = 65532,
+};
+
+/* task_group() from kernel/sched/sched.h */
+static struct task_group *task_group(struct task_struct *p)
+{
+	return p->sched_task_group;
+}
+
+static struct cgroup *task_cgroup(struct task_struct *p)
+{
+	struct task_group *tg;
+
+	tg = task_group(p);
+	return tg->css.cgroup;
+}
+
+/* cgroup_id() from linux/cgroup.h */
+static __u64 cgroup_id(const struct cgroup *cgroup)
+{
+	return cgroup->kn->id;
+}
+
+SEC("tp_btf/sched_stat_wait")
+int BPF_PROG(wait_record, struct task_struct *p, __u64 delta)
+{
+	struct cgroup *cgrp;
+	__u64 *wait_ns;
+	__u64 id;
+
+	cgrp = task_cgroup(p);
+	if (!cgrp)
+		return 0;
+
+	id = cgroup_id(cgrp);
+	wait_ns = bpf_map_lookup_elem(&wait_map, &id);
+
+	/* record the max wait latency seen so far */
+	if (!wait_ns)
+		bpf_map_update_elem(&wait_map, &id, &delta, BPF_NOEXIST);
+	else if (*wait_ns < delta)
+		*wait_ns = delta;
+	return 0;
+}
+
+SEC("view/cgroup")
+int BPF_PROG(wait_show, struct seq_file *seq, struct cgroup *cgroup)
+{
+	__u64 id, *value;
+
+	id = cgroup_id(cgroup);
+	value = bpf_map_lookup_elem(&wait_map, &id);
+
+	if (value)
+		BPF_SEQ_PRINTF(seq, "%llu %llu\n", id, *value);
+	else
+		BPF_SEQ_PRINTF(seq, "%llu 0\n", id);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1.448.ga2b2bfdf31-goog
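
Note for readers of this series: the selftest removes a kernfs entry by
writing the "rm" command (REMOVE_CMD in kernel/bpf/kernfs_node.c) to it,
rather than by calling unlink(). A minimal user-space sketch of that step
follows. It is not part of the patch: kernfs_send_cmd() and kernfs_unpin()
are hypothetical helper names, and the sketch assumes, as do_unpin() in the
selftest does, that the command is written with its terminating NUL.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not in the patch): write a command string to a file. */
static int kernfs_send_cmd(const char *path, const char *cmd)
{
	size_t len = strlen(cmd) + 1;	/* include the NUL, like sizeof(cmd) in do_unpin() */
	ssize_t n;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, cmd, len);
	close(fd);

	/* Treat a short write as a failure. */
	return n == (ssize_t)len ? 0 : -1;
}

/* Hypothetical wrapper: ask the kernel to remove a pinned kernfs entry. */
static int kernfs_unpin(const char *path)
{
	return kernfs_send_cmd(path, "rm");
}
```

A caller would then check, as the selftest does, that stat() on the path
fails afterwards. Returning an error on a short write is a choice made for
this sketch; the selftest itself only asserts that write() did not fail.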