From: Jianyun Gao
To: linux-kernel@vger.kernel.org
Cc: Jianyun Gao, Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
    Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
    John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
    bpf@vger.kernel.org (open list:BPF [LIBRARY] (libbpf))
Subject: [PATCH v2 5/5] libbpf: Add doxygen documentation for btf/iter etc. in bpf.h
Date: Fri, 31 Oct 2025 15:59:07 +0800
Message-Id: <20251031075908.1472249-6-jianyungao89@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20251031075908.1472249-1-jianyungao89@gmail.com>
References: <20251031075908.1472249-1-jianyungao89@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Add doxygen comment blocks for remaining helpers (btf/iter etc.) in
tools/lib/bpf/bpf.h. These doc comments are for:

-libbpf_set_memlock_rlim()
-bpf_btf_load()
-bpf_iter_create()
-bpf_btf_get_next_id()
-bpf_btf_get_fd_by_id()
-bpf_btf_get_fd_by_id_opts()
-bpf_raw_tracepoint_open_opts()
-bpf_raw_tracepoint_open()
-bpf_task_fd_query()

Signed-off-by: Jianyun Gao
---
v1->v2:
- Fixed compilation error caused by embedded literal "/*" inside a
  comment (rephrased/escaped).
- Fixed the non-ASCII characters in this patch.
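
For anyone who wants to try the read loop described in the
bpf_iter_create() comment without setting up a BPF iterator, here is a
minimal sketch in plain C. It is not part of this patch or of libbpf:
drain_fd() is a hypothetical helper name, and it works on any readable
descriptor (a pipe is enough for testing). It assumes a blocking
descriptor, so EAGAIN is not handled.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libbpf API): copy everything readable from
 * fd to out. Returns the total number of bytes copied, or -errno on
 * failure. Short reads are expected, so the loop runs until read()
 * reports EOF by returning 0.
 */
static long drain_fd(int fd, FILE *out)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -errno;		/* real failure */
		}
		if (n == 0)			/* EOF: nothing more to read */
			return total;
		if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
			return -EIO;		/* could not write the output */
		total += n;
	}
}
```

With a real iterator, the descriptor returned by bpf_iter_create(link_fd)
would be passed as fd and closed by the caller afterwards.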
The v1 is here:
https://lore.kernel.org/lkml/20251031032627.1414462-6-jianyungao89@gmail.com/

 tools/lib/bpf/bpf.h | 745 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 740 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index a0cebda09e16..6ef1ea7921c4 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -34,7 +34,61 @@
 #ifdef __cplusplus
 extern "C" {
 #endif
-
+/**
+ * @brief Adjust process RLIMIT_MEMLOCK to facilitate loading BPF objects.
+ *
+ * libbpf_set_memlock_rlim() raises (or lowers) the calling process's
+ * RLIMIT_MEMLOCK soft and hard limits to at least the number of bytes
+ * specified by memlock_bytes. BPF map and program creation can require
+ * locking kernel/user pages; if RLIMIT_MEMLOCK is too low the kernel
+ * will fail operations with EPERM/ENOMEM. This helper provides a
+ * convenient way to pre-allocate sufficient memlock quota.
+ *
+ * Semantics:
+ *   - If current (soft or hard) RLIMIT_MEMLOCK is already >= memlock_bytes,
+ *     the limit is left unchanged and the function succeeds.
+ *   - Otherwise, the function attempts to set both soft and hard limits
+ *     to memlock_bytes using setrlimit(RLIMIT_MEMLOCK, ...).
+ *   - On systems enforcing privilege constraints, increasing the hard
+ *     limit may require CAP_SYS_RESOURCE; lack of privilege yields failure.
+ *
+ * Typical usage (before loading large maps/programs):
+ *     size_t needed = 128ul * 1024 * 1024; // 128 MB
+ *     if (libbpf_set_memlock_rlim(needed) < 0) {
+ *         // handle error (e.g., fall back to smaller maps or abort)
+ *     }
+ *
+ * Choosing a value:
+ *   - Sum anticipated sizes of maps (key_size + value_size) * max_entries
+ *     plus overhead. Add headroom for verifier, BTF, and future growth.
+ *   - Large per-CPU maps multiply value storage by number of CPUs.
+ *   - Overestimating is usually harmless (within administrative policy).
+ *
+ * Concurrency & scope:
+ *   - Affects only the calling process's RLIMIT_MEMLOCK.
+ *   - Child processes inherit the adjusted limits after fork/exec.
+ *
+ * Security / privileges:
+ *   - Increasing the hard limit above the current maximum may require
+ *     CAP_SYS_RESOURCE or appropriate PAM/ulimit configuration.
+ *   - Without sufficient privilege, the call fails with -errno (often -EPERM).
+ *
+ * @param memlock_bytes Desired minimum RLIMIT_MEMLOCK (in bytes). If zero,
+ *                      the function is a no-op (always succeeds).
+ *
+ * @return 0 on success;
+ *         < 0 negative error code (libbpf style == -errno) on failure:
+ *           - -EINVAL: Invalid argument (e.g., internal conversion issues).
+ *           - -EPERM / -EACCES: Insufficient privilege to raise hard limit.
+ *           - -ENOMEM: Rare failure allocating internal structures.
+ *           - Other -errno codes propagated from setrlimit().
+ *
+ * Failure handling:
+ *   - A failure means RLIMIT_MEMLOCK is unchanged; subsequent BPF map/program
+ *     loads may still succeed if existing limit is adequate.
+ *   - Check current limits manually (getrlimit) if precise sizing is critical.
+ *
+ */
 LIBBPF_API int libbpf_set_memlock_rlim(size_t memlock_bytes);
 
 struct bpf_map_create_opts {
@@ -295,7 +349,104 @@ struct bpf_btf_load_opts {
 	size_t :0;
 };
 #define bpf_btf_load_opts__last_field token_fd
-
+/**
+ * @brief Load a BTF (BPF Type Format) blob into the kernel and obtain a BTF object FD.
+ *
+ * bpf_btf_load() wraps the BPF_BTF_LOAD command of the bpf(2) syscall. It validates
+ * and registers the BTF metadata described by @p btf_data so that subsequently loaded
+ * BPF programs and maps can reference rich type information (for CO-RE relocations,
+ * pretty printing, introspection, etc.).
+ *
+ * Typical usage:
+ *     // Prepare optional verifier/logging buffer (only if you want kernel diagnostics)
+ *     char log_buf[1 << 20] = {};
+ *     struct bpf_btf_load_opts opts = {
+ *         .sz = sizeof(opts),
+ *         .log_buf = log_buf,
+ *         .log_size = sizeof(log_buf),
+ *         .log_level = 1, // >0 to request kernel parsing/validation log
+ *     };
+ *     int btf_fd = bpf_btf_load(btf_blob_ptr, btf_blob_size, &opts);
+ *     if (btf_fd < 0) {
+ *         // Inspect errno; if opts.log_buf was provided, it may contain details.
+ *     } else {
+ *         // Use btf_fd (e.g. pass to bpf_prog_load() via prog_btf_fd, or query info).
+ *     }
+ *
+ * Input expectations:
+ *   - @p btf_data must point to a complete, well-formed BTF buffer starting with
+ *     struct btf_header followed by the type section and string section.
+ *   - @p btf_size is the total size in bytes of that buffer.
+ *   - Endianness must match the running kernel; cross-endian BTF is rejected.
+ *   - Types must obey kernel constraints (e.g., no unsupported kinds, valid string
+ *     offsets, canonical integer encodings, no dangling references).
+ *
+ * Logging (opts->log_*):
+ *   - If @p opts is non-NULL and opts->log_level > 0, the kernel may emit a textual
+ *     parse/validation log into opts->log_buf (up to opts->log_size - 1 bytes, with
+ *     trailing '\0').
+ *   - On supported kernels, opts->log_true_size is updated to reflect the full (untruncated)
+ *     length of the internal log; if larger than log_size, the log was truncated.
+ *   - If the kernel does not support returning true size, log_true_size remains equal
+ *     to the original log_size value or zero.
+ *
+ * Privileges & security:
+ *   - CAP_BPF and/or CAP_SYS_ADMIN may be required depending on kernel configuration,
+ *     LSM policy, and lockdown mode. Lack of privilege yields -EPERM / -EACCES.
+ *   - In delegated environments, opts->token_fd (if available and supported) can grant
+ *     scoped permission to load BTF without full global capabilities.
+ *
+ * Memory and lifetime:
+ *   - On success a file descriptor (>= 0) referencing the in-kernel BTF object is returned.
+ *     Close it with close() when no longer needed.
+ *   - The kernel makes its own copy of the supplied BTF blob; the caller can free or reuse
+ *     @p btf_data immediately after the call returns.
+ *   - BTF objects can be queried via bpf_btf_get_info_by_fd() and referenced by programs
+ *     (prog_btf_fd) or maps for type information.
+ *
+ * Concurrency & races:
+ *   - Loading is independent; multiple BTF objects may coexist.
+ *   - There is no automatic deduplication across separate loads (except any internal
+ *     kernel optimizations); user space manages uniqueness/pinning if desired.
+ *
+ * Validation tips:
+ *   - Use bpftool btf dump to sanity-check a blob before loading.
+ *   - Keep string table minimal; excessive strings inflate memory and may hit limits.
+ *   - Ensure all referenced type IDs exist and form a closed, acyclic graph (except
+ *     for permitted self-references in struct/union definitions).
+ *
+ * After loading:
+ *   - Pass the returned FD as prog_btf_fd when loading programs that rely on CO-RE
+ *     relocations or need BTF type validation.
+ *   - Optionally pin the BTF object with bpf_obj_pin() for persistence across process
+ *     lifetimes.
+ *   - Query metadata (e.g., number of types, string section size) with bpf_btf_get_info_by_fd().
+ *
+ * @param btf_data Pointer to the raw in-memory BTF blob.
+ * @param btf_size Size (in bytes) of the BTF blob pointed to by @p btf_data.
+ * @param opts     Optional pointer to a bpf_btf_load_opts struct. May be NULL.
+ *                 Must set opts->sz = sizeof(*opts) when non-NULL. Fields:
+ *                   - log_buf / log_size / log_level: Request and store kernel
+ *                     validation log (see Logging).
+ *                   - log_true_size: Updated by kernel on success (if supported).
+ *                   - btf_flags: Reserved for future extensions (must be 0 unless documented).
+ *                   - token_fd: Delegated permission token (0 or -1 if unused).
+ *
+ * @return
+ *   >= 0 : File descriptor referencing the loaded BTF object.
+ *   < 0  : Negative error code (see Error handling).
+ *
+ * Error handling (negative return codes == -errno style):
+ *   - -EINVAL: Malformed BTF (bad header, section sizes, invalid type graph, bad string
+ *              offsets, unsupported features), opts->sz mismatch, bad flags.
+ *   - -EFAULT: @p btf_data or opts->log_buf points to unreadable/writable memory.
+ *   - -ENOMEM: Kernel failed to allocate memory for internal BTF representation.
+ *   - -EPERM / -EACCES: Insufficient privileges or blocked by security policy.
+ *   - -E2BIG: Exceeds kernel size/complexity limits (e.g., too many types or strings).
+ *   - -ENOTSUP / -EOPNOTSUPP: Kernel lacks support for a feature used in the blob (rare).
+ *   - Other negative codes may be propagated from the underlying syscall.
+ *
+ */
 LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
 			    struct bpf_btf_load_opts *opts);
 
@@ -1840,7 +1991,84 @@ struct bpf_link_update_opts {
  */
 LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
 			       const struct bpf_link_update_opts *opts);
-
+/**
+ * @brief Create a user space iterator stream FD from an existing BPF iterator link.
+ *
+ * bpf_iter_create() wraps the kernel's BPF_ITER_CREATE command. Given a BPF
+ * link FD (@p link_fd) that represents an attached BPF iterator program
+ * (i.e., a program of type BPF_PROG_TYPE_TRACING with an iterator attach
+ * type such as BPF_TRACE_ITER), this function returns a new file descriptor
+ * from which user space can sequentially read the iterator's textual or
+ * binary output.
+ *
+ * Reading the returned FD:
+ *   - Use read(), pread(), or a buffered I/O layer to consume iterator data.
+ *   - Each read() returns zero (EOF) when the iterator has completed producing
+ *     all elements; close the FD afterward.
+ *   - Short reads are normal; loop until EOF or error.
+ *
+ * Lifetime & ownership:
+ *   - Success returns a new FD; caller owns it and must close() when finished.
+ *   - Closing the iterator FD does NOT destroy the underlying link or program.
+ *   - You can create multiple iterator FDs from the same link concurrently;
+ *     each is an independent traversal.
+ *
+ * Typical usage:
+ *     int link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_ITER, &opts);
+ *     if (link_fd < 0) { // handle error }
+ *     int iter_fd = bpf_iter_create(link_fd);
+ *     if (iter_fd < 0) { // handle error }
+ *     char buf[4096];
+ *     for (;;) {
+ *         ssize_t n = read(iter_fd, buf, sizeof(buf));
+ *         if (n < 0) {
+ *             if (errno == EINTR) continue;
+ *             perror("read iter");
+ *             break;
+ *         }
+ *         if (n == 0) // end of iteration
+ *             break;
+ *         fwrite(buf, 1, n, stdout);
+ *     }
+ *     close(iter_fd);
+ *
+ * Concurrency & races:
+ *   - Safe to call concurrently from multiple threads; each iterator FD
+ *     represents its own walk.
+ *   - Underlying kernel objects (maps, tasks, etc.) may change while iterating;
+ *     output is a best-effort snapshot, not a stable, atomic view.
+ *
+ * Performance considerations:
+ *   - Large buffers (e.g., 16-64 KiB) reduce syscall overhead for high-volume
+ *     iterators.
+ *   - For blocking behavior, select()/poll()/epoll() can be used; EOF is
+ *     indicated by read() returning 0.
+ *
+ * Security & privileges:
+ *   - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration,
+ *     lockdown mode, and LSM policy governing the iterator target.
+ *
+ * @param link_fd File descriptor of a BPF link representing an attached iterator program.
+ *
+ * @return >= 0: Iterator stream file descriptor to read from.
+ *         < 0 : Negative error code (libbpf style, == -errno) on failure.
+ *
+ *
+ * Error handling (negative libbpf-style return value == -errno):
+ *   - -EBADF: @p link_fd is not a valid open FD.
+ *   - -EINVAL: @p link_fd does not refer to an iterator-capable BPF link, or
+ *              unsupported combination for the running kernel.
+ *   - -EPERM / -EACCES: Insufficient privileges / blocked by security policy.
+ *   - -EOPNOTSUPP / -ENOTSUP: Kernel lacks iterator creation support for this link.
+ *   - -ENOMEM: Kernel could not allocate internal data structures.
+ *   - Other -errno codes may be propagated from the underlying bpf() syscall.
+ *
+ * Robustness tips:
+ *   - Verify the program was attached with the correct iterator attach type.
+ *   - Treat a 0-length read as normal completion, not an error.
+ *   - Always handle transient read() failures (EINTR, EAGAIN if non-blocking).
+ *
+ */
 LIBBPF_API int bpf_iter_create(int link_fd);
 
 struct bpf_prog_test_run_attr {
@@ -1953,6 +2181,68 @@ LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
  */
 LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
 
+/**
+ * @brief Retrieve the next existing BTF object ID after a given starting ID.
+ *
+ * This helper wraps the kernel's BPF_BTF_GET_NEXT_ID command and enumerates
+ * in-kernel BTF (BPF Type Format) objects in strictly ascending order of
+ * their kernel-assigned IDs. It is typically used to iterate all currently
+ * loaded BTF objects (e.g., vmlinux BTF, module BTFs, user-loaded BTF blobs).
+ *
+ * Enumeration pattern:
+ *   1. Initialize start_id to 0 to obtain the first (lowest) existing BTF ID.
+ *   2. On success, *next_id is set to the first BTF ID strictly greater than start_id.
+ *   3. Use the returned *next_id as the new start_id in a subsequent call.
+ *   4. Repeat until the function returns -ENOENT, which signals there is no
+ *      BTF object with ID greater than start_id (end of iteration).
+ *
+ * Concurrency & races:
+ *   - BTF objects can be loaded or unloaded concurrently with enumeration.
+ *     An ID retrieved in one call may become invalid (object unloaded) before
+ *     you convert it to a file descriptor with bpf_btf_get_fd_by_id().
+ *   - Enumeration does not provide a stable snapshot. Newly loaded BTFs may
+ *     appear after you've passed their predecessor ID.
+ *
+ * Lifetime & validity:
+ *   - IDs are monotonically increasing and effectively never wrap in normal
+ *     operation.
+ *   - Successfully retrieving an ID does NOT pin the corresponding BTF object.
+ *     Obtain a file descriptor immediately if you need to interact with it.
+ *
+ * Typical usage:
+ *     __u32 id = 0, next;
+ *     while (bpf_btf_get_next_id(id, &next) == 0) {
+ *         int btf_fd = bpf_btf_get_fd_by_id(next);
+ *         if (btf_fd >= 0) {
+ *             // Inspect/query BTF (e.g. bpf_btf_get_info_by_fd()).
+ *             close(btf_fd);
+ *         }
+ *         id = next;
+ *     }
+ *     // Loop ends when bpf_btf_get_next_id() returns -ENOENT.
+ *
+ * @param start_id
+ *        Starting point for the search. The helper finds the first BTF ID
+ *        strictly greater than start_id. Use 0 to begin enumeration.
+ * @param next_id
+ *        Pointer to a __u32 that receives the next BTF ID on success.
+ *        Must not be NULL.
+ *
+ * @return
+ *   0 on success (next_id populated);
+ *   -ENOENT if there is no BTF ID greater than start_id (normal end of iteration);
+ *   -EINVAL if next_id is NULL or arguments are otherwise invalid;
+ *   -EPERM / -EACCES if denied by security policy or lacking required privileges;
+ *   Other negative libbpf-style codes (-errno) on transient or system failures.
+ *
+ * Error handling notes:
+ *   - Treat -ENOENT as normal termination, not an exceptional error.
+ *   - For other failures, errno is set to the underlying cause.
+ *
+ * Follow-up:
+ *   - Convert retrieved IDs to FDs with bpf_btf_get_fd_by_id() to inspect
+ *     metadata or pin the BTF object.
+ */
 LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id);
 /**
  * @brief Retrieve the next existing BPF link ID after a given starting ID.
@@ -2227,9 +2517,171 @@ LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
  */
 LIBBPF_API int bpf_map_get_fd_by_id_opts(__u32 id,
 				const struct bpf_get_fd_by_id_opts *opts);
-
+/**
+ * @brief Obtain a file descriptor for an existing in-kernel BTF (BPF Type Format)
+ *        object given its kernel-assigned ID.
+ *
+ * bpf_btf_get_fd_by_id() wraps the BPF_BTF_GET_FD_BY_ID command of the bpf(2)
+ * syscall. Each loaded BTF object (vmlinux BTF, kernel module BTF, or user-supplied
+ * BTF blob loaded via BPF_BTF_LOAD) has a monotonically increasing, unique ID.
+ * This helper converts that stable ID into a process-local file descriptor
+ * suitable for introspection (e.g., via bpf_btf_get_info_by_fd()), pinning
+ * (bpf_obj_pin()), or reuse when loading BPF programs/maps that reference types
+ * from this BTF.
+ *
+ * Typical enumeration + open pattern:
+ *     __u32 id = 0, next;
+ *     while (bpf_btf_get_next_id(id, &next) == 0) {
+ *         int btf_fd = bpf_btf_get_fd_by_id(next);
+ *         if (btf_fd >= 0) {
+ *             // inspect with bpf_btf_get_info_by_fd(btf_fd, ...)
+ *             close(btf_fd);
+ *         }
+ *         id = next;
+ *     }
+ *     // Loop ends when bpf_btf_get_next_id() returns -ENOENT.
+ *
+ * Concurrency & races:
+ *   - A BTF object may be unloaded (e.g., module removal) between discovering
+ *     its ID and calling this function; in that case the call fails with -ENOENT.
+ *   - Successfully obtaining a file descriptor does not prevent later unloading
+ *     by other processes; subsequent operations on the FD can still fail.
+ *
+ * Lifetime & ownership:
+ *   - On success the caller owns the returned descriptor and must close() it
+ *     when no longer needed.
+ *   - Closing the FD does not destroy the underlying BTF object if other
+ *     references (FDs or pinned bpffs paths) remain.
+ *
+ * Privileges / security:
+ *   - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration,
+ *     LSM policies, or lockdown mode. Lack of privilege yields -EPERM / -EACCES.
+ *   - Access can also be restricted by namespace or cgroup-based security policies.
+ *
+ * Use cases:
+ *   - Retrieve BTF metadata (type counts, string section size, specific type
+ *     definitions) via bpf_btf_get_info_by_fd().
+ *   - Pass the FD as prog_btf_fd when loading eBPF programs needing CO-RE or
+ *     type validation.
+ *   - Pin the BTF object for persistence across process lifetimes.
+ *
+ * @param id
+ *        Kernel-assigned unique (non-zero) BTF object ID. Typically obtained via
+ *        bpf_btf_get_next_id() or from a prior info query. Must be > 0.
+ *
+ * @return
+ *   >= 0 : File descriptor referencing the BTF object (caller must close()).
+ *   < 0  : Negative libbpf-style error code (== -errno):
+ *            - -ENOENT : No BTF object with this ID (unloaded or never existed).
+ *            - -EPERM / -EACCES : Insufficient privileges / blocked by policy.
+ *            - -EINVAL : Invalid ID (e.g., 0) or kernel rejected the request.
+ *            - -ENOMEM : Kernel memory/resource exhaustion.
+ *            - Other negative values: Propagated syscall failures.
+ *
+ * Error handling notes:
+ *   - Treat -ENOENT as a normal race outcome if objects can disappear.
+ *   - Always close the returned FD to avoid resource leaks.
+ *
+ * Thread safety:
+ *   - Safe to call concurrently; each successful invocation yields an independent FD.
+ *
+ * Forward compatibility:
+ *   - ID space is monotonic; practical wraparound is not expected.
+ *   - Future kernels may add additional validation or permission gating; handle
+ *     new -errno codes conservatively.
+ */
 LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
 
+/**
+ * @brief Obtain a file descriptor for an existing in-kernel BTF (BPF Type Format)
+ *        object by its kernel-assigned ID, with extended open options.
+ *
+ * bpf_btf_get_fd_by_id_opts() is an extended variant of bpf_btf_get_fd_by_id().
+ * It wraps the BPF_BTF_GET_FD_BY_ID command of the bpf(2) syscall and converts
+ * a stable, monotonically increasing BTF object ID (@p id) into a process-local
+ * file descriptor, honoring optional attributes supplied via @p opts.
+ *
+ * A BTF object represents a loaded collection of type metadata (vmlinux BTF,
+ * kernel module BTF, or user-supplied BTF blob). Programs and maps can refer
+ * to these types for CO-RE relocations, verification, and introspection.
+ *
+ * Typical enumeration + open pattern:
+ *     __u32 cur = 0, next;
+ *     while (bpf_btf_get_next_id(cur, &next) == 0) {
+ *         struct bpf_get_fd_by_id_opts o = {
+ *             .sz = sizeof(o),
+ *             .open_flags = 0,
+ *             .token_fd = -1,
+ *         };
+ *         int btf_fd = bpf_btf_get_fd_by_id_opts(next, &o);
+ *         if (btf_fd >= 0) {
+ *             // use btf_fd (e.g. bpf_btf_get_info_by_fd())
+ *             close(btf_fd);
+ *         }
+ *         cur = next;
+ *     }
+ *     // Loop ends when bpf_btf_get_next_id() returns -ENOENT.
+ *
+ * Initialization & @p opts usage:
+ *   - @p opts may be NULL for default behavior (equivalent to zeroed fields).
+ *   - If @p opts is non-NULL, opts->sz MUST be set to sizeof(*opts); mismatch
+ *     yields -EINVAL.
+ *   - opts->open_flags:
+ *       Reserved for future kernel extensions; pass 0 unless a documented flag
+ *       is supported. Unsupported bits => -EINVAL.
+ *   - opts->token_fd:
+ *       Optional BPF token FD enabling delegated (restricted) permissions. Set
+ *       to -1 or 0 if unused. Provides a way to open BTF objects without full
+ *       CAP_BPF/CAP_SYS_ADMIN in controlled environments.
+ *
+ * Concurrency & races:
+ *   - A BTF object can be unloaded (e.g., module removal) after ID discovery
+ *     but before this call; expect -ENOENT in such races.
+ *   - Successfully obtaining a file descriptor does not guarantee the object
+ *     will remain available for its entire lifetime (it could still be removed
+ *     depending on kernel policies), so subsequent operations may fail.
+ *
+ * Lifetime & ownership:
+ *   - On success you own the returned FD and must close() it when done.
+ *   - Closing the FD does not destroy the BTF object if other references (FDs,
+ *     pinned bpffs entries) remain.
+ *   - You may pin the BTF object via bpf_obj_pin() for persistence.
+ *
+ * Security / privileges:
+ *   - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration,
+ *     LSM policy, and lockdown mode.
+ *   - Access via a token_fd is subject to token scope; insufficient rights yield
+ *     -EPERM / -EACCES.
+ *
+ * Use cases:
+ *   - Retrieve type information with bpf_btf_get_info_by_fd().
+ *   - Supply prog_btf_fd when loading eBPF programs needing CO-RE relocations.
+ *   - Enumerate and manage user-loaded or kernel-provided BTF datasets.
+ *
+ * Robustness tips:
+ *   - Treat -ENOENT as a normal race when enumerating dynamic BTF objects.
+ *   - Always zero-initialize opts before setting recognized fields:
+ *       struct bpf_get_fd_by_id_opts o = {};
+ *       o.sz = sizeof(o);
+ *   - Avoid non-zero open_flags until documented; future kernels may add semantic
+ *     modifiers (e.g., restricted viewing modes).
+ *
+ * @param id   Kernel-assigned unique BTF object ID (> 0).
+ * @param opts Optional pointer to struct bpf_get_fd_by_id_opts controlling open
+ *             behavior; may be NULL for defaults.
+ *
+ * @return >= 0: File descriptor referencing the BTF object (caller must close()).
+ *         < 0 : Negative error code (libbpf style == -errno) on failure.
+ *
+ * Error handling (negative return values are libbpf-style == -errno):
+ *   - -ENOENT: No BTF object with @p id (unloaded or never existed).
+ *   - -EINVAL: Invalid @p id (e.g., 0), malformed @p opts (bad sz), or unsupported
+ *              open_flags bits.
+ *   - -EPERM / -EACCES: Insufficient privileges or blocked by security policy.
+ *   - -ENOMEM: Kernel resource allocation failure.
+ *   - Other -errno codes may be propagated from underlying syscall failures.
+ *
+ */
 LIBBPF_API int bpf_btf_get_fd_by_id_opts(__u32 id,
 				const struct bpf_get_fd_by_id_opts *opts);
 /**
@@ -2650,11 +3102,294 @@ struct bpf_raw_tp_opts {
 	size_t :0;
 };
 #define bpf_raw_tp_opts__last_field cookie
-
+/**
+ * @brief Attach a loaded BPF program to a raw tracepoint using extended options.
+ *
+ * bpf_raw_tracepoint_open_opts() wraps the BPF_RAW_TRACEPOINT_OPEN command and
+ * creates a persistent attachment of @p prog_fd to the raw tracepoint named in
+ * @p opts->tp_name. On success it returns a file descriptor representing the
+ * attachment. Closing that FD detaches the program from the tracepoint.
+ *
+ * Compared to bpf_raw_tracepoint_open(), this variant allows passing a user
+ * cookie (opts->cookie) and provides forward/backward compatibility via the
+ * @p opts->sz field.
+ *
+ * Typical usage:
+ *     struct bpf_raw_tp_opts ropts = {
+ *         .sz = sizeof(ropts),
+ *         .tp_name = "sched_switch", // raw tracepoint name (no "tracepoint/" prefix)
+ *         .cookie = 0xdeadbeef,      // optional user cookie (visible to program)
+ *     };
+ *     int tp_fd = bpf_raw_tracepoint_open_opts(prog_fd, &ropts);
+ *     if (tp_fd < 0) {
+ *         // handle error (inspect errno or negative return value)
+ *     }
+ *     // ... use attachment; close(tp_fd) to detach when done.
+ *
+ * Tracepoint name:
+ *   - Use the raw tracepoint identifier as exposed under
+ *     /sys/kernel/debug/tracing/events/ without category prefixes. For raw
+ *     tracepoints this is typically the internal kernel name (e.g., "sched_switch").
+ *   - Passing NULL or an empty string fails with -EINVAL.
+ *
+ * Cookie:
+ *   - opts->cookie (if non-zero) becomes available to the attached program via
+ *     bpf_get_attach_cookie() helper (where supported).
+ *   - Set to 0 if you don't need a cookie; kernel treats it as absent.
+ *
+ * Structure initialization:
+ *   - opts MUST NOT be NULL.
+ *   - Zero-initialize the struct, then set:
+ *       opts->sz = sizeof(struct bpf_raw_tp_opts);
+ *       opts->tp_name = "";
+ *       opts->cookie = ;
+ *   - Unrecognized future fields must remain zero for compatibility.
+ *
+ * Lifetime & detachment:
+ *   - The returned FD solely controls the attachment lifetime. Closing it
+ *     detaches the program.
+ *   - The program FD @p prog_fd may be closed independently after successful
+ *     attachment; the link remains active until the tracepoint FD is closed.
+ *
+ * Concurrency:
+ *   - Multiple programs can attach to the same raw tracepoint (each gets its
+ *     own FD).
+ *   - Attaching/detaching is atomic from the program's perspective; events
+ *     arriving after success will invoke the program.
+ *
+ * Privileges:
+ *   - Typically requires CAP_BPF and/or CAP_SYS_ADMIN depending on kernel
+ *     configuration, LSM policy, and lockdown mode.
+ *
+ * Performance considerations:
+ *   - Raw tracepoints invoke programs on every event occurrence; ensure program
+ *     logic is efficient to avoid noticeable system overhead.
+ *
+ * @param prog_fd
+ *   File descriptor of a previously loaded BPF program (bpf_prog_load()) that
+ *   is compatible with raw tracepoint attachment (e.g., program type
+ *   BPF_PROG_TYPE_RAW_TRACEPOINT or suitable tracing type).
+ *
+ * @param opts
+ *   Pointer to an initialized bpf_raw_tp_opts structure describing the target
+ *   tracepoint and optional cookie. Must not be NULL. opts->sz must equal
+ *   sizeof(struct bpf_raw_tp_opts).
+ *
+ * @return
+ *   >= 0 : File descriptor representing the attachment (close to detach).
+ *   < 0  : Negative libbpf-style error code (== -errno) on failure:
+ *            - -EINVAL  : Bad prog_fd, malformed opts (sz mismatch, NULL tp_name),
+ *                         unsupported program type, or kernel lacks raw TP support.
+ *            - -EPERM/-EACCES : Insufficient privileges or blocked by security policy.
+ *            - -ENOENT  : Tracepoint name not found / not supported by current kernel.
+ *            - -EBADF   : Invalid prog_fd.
+ *     - -ENOMEM : Kernel memory/resource exhaustion.
+ *     - -EOPNOTSUPP/-ENOTSUP : Raw tracepoint attachment not supported.
+ *     - Other -errno codes may be propagated from the underlying syscall.
+ *
+ * Error handling:
+ *   - Inspect the negative return value or errno for diagnostics.
+ *   - Treat -ENOENT as "tracepoint unavailable" (kernel config or version gap).
+ *
+ * After attachment:
+ *   - Optionally pin the FD (bpf_obj_pin()) if you need persistence.
+ *   - Use bpf_obj_get_info_by_fd() to query attachment metadata if supported.
+ */
 LIBBPF_API int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts);
 
+/**
+ * @brief Attach a loaded BPF program to a raw tracepoint (legacy/simple API).
+ *
+ * bpf_raw_tracepoint_open() is a convenience wrapper that issues the
+ * BPF_RAW_TRACEPOINT_OPEN command to attach the BPF program referenced
+ * by @p prog_fd to the raw tracepoint named @p name. On success it returns
+ * a file descriptor representing the attachment; closing that FD detaches
+ * the program from the tracepoint.
+ *
+ * Compared to bpf_raw_tracepoint_open_opts(), this legacy interface
+ * provides no ability to specify an attach cookie or future extension
+ * fields. For new code prefer bpf_raw_tracepoint_open_opts() to enable
+ * forward/backward compatible option passing.
+ *
+ * Tracepoint name:
+ *   - @p name must be a non-NULL, null-terminated string identifying a
+ *     raw tracepoint (e.g. "sched_switch").
+ *   - Pass the raw kernel tracepoint identifier without any category
+ *     prefix (do not include "tracepoint/" or directory components).
+ *   - If the tracepoint is not available (kernel config/version) the
+ *     call fails with -ENOENT.
+ *
+ * Program requirements:
+ *   - @p prog_fd must refer to a loaded BPF program of a type compatible
+ *     with raw tracepoint attachment (e.g., BPF_PROG_TYPE_RAW_TRACEPOINT
+ *     or an allowed tracing program type accepted by the kernel).
+ *   - The program may be safely closed after a successful attachment;
+ *     the returned FD controls the lifetime of the link.
+ *
+ * Lifetime & detachment:
+ *   - Each successful call creates a distinct attachment with its own FD.
+ *   - Closing the returned FD immediately detaches the program from the
+ *     tracepoint.
+ *   - The returned FD can be pinned (bpf_obj_pin()) for persistence.
+ *
+ * Concurrency:
+ *   - Multiple programs can be attached to the same raw tracepoint.
+ *   - Attach/detach operations are atomic; events after success invoke
+ *     the program until its FD is closed.
+ *
+ * Privileges & security:
+ *   - Typically requires CAP_BPF and/or CAP_SYS_ADMIN depending on
+ *     kernel configuration, LSM, and lockdown mode.
+ *   - Insufficient privilege yields -EPERM / -EACCES.
+ *
+ * Performance considerations:
+ *   - Raw tracepoints can be very frequent; ensure attached program
+ *     logic is efficient to avoid noticeable overhead.
+ *
+ * @param name Null-terminated raw tracepoint name (e.g. "sched_switch").
+ * @param prog_fd File descriptor of a loaded, compatible BPF program.
+ *
+ * @return >= 0 : Attachment file descriptor (close to detach).
+ *         < 0 : Negative error code (libbpf style == -errno) on failure.
+ *
+ * Error handling (negative libbpf-style return value == -errno):
+ *   - -EINVAL : Invalid @p prog_fd, NULL/empty @p name, incompatible program type.
+ *   - -ENOENT : Tracepoint not found / unsupported by current kernel.
+ *   - -EPERM/-EACCES : Insufficient privileges or blocked by security policy.
+ *   - -EBADF : @p prog_fd is not a valid file descriptor.
+ *   - -ENOMEM : Kernel memory/resource exhaustion.
+ *   - -EOPNOTSUPP/-ENOTSUP : Raw tracepoints unsupported by the kernel.
+ *   - Other negative codes may be propagated from the underlying syscall.
+ *
+ * Best practices:
+ *   - Prefer bpf_raw_tracepoint_open_opts() for new development to
+ *     gain cookie support and extensibility.
+ *   - Immediately check the return value; do not rely solely on errno.
+ *   - Pin the attachment if you need persistence across process lifetimes.
+ *
+ */
 LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
 
+/**
+ * @brief Query metadata about a file descriptor in another task (process) that
+ *        is associated with a BPF tracing/perf event and (optionally) an
+ *        attached BPF program.
+ *
+ * This helper wraps the kernel's BPF_TASK_FD_QUERY command. It inspects the
+ * file descriptor number @p fd that belongs to the task identified by @p pid
+ * and, if that FD represents a perf event or similar tracing attachment, it
+ * returns descriptive information about:
+ *   - The attached BPF program (its kernel program ID).
+ *   - The nature/type of the FD (tracepoint, raw_tracepoint, kprobe, uprobe, etc.).
+ *   - Target symbol/address/offset data for kprobe/uprobes.
+ *   - A human-readable identifier (tracepoint name, kprobe function name,
+ *     uprobe file path), copied into @p buf when provided.
+ *
+ * Typical use cases:
+ *   - Introspecting perf event FDs opened by another process to discover
+ *     which BPF program is attached.
+ *   - Enumerating and characterizing dynamically created kprobes or uprobes
+ *     (e.g., by observability agents).
+ *   - Building higher-level tooling that correlates program IDs with their
+ *     originating probe specifications.
+ *
+ * Usage pattern:
+ *   char info[256];
+ *   __u32 info_len = sizeof(info);
+ *   __u32 prog_id = 0, fd_type = 0;
+ *   __u64 probe_off = 0, probe_addr = 0;
+ *   int err = bpf_task_fd_query(target_pid, target_fd, 0,
+ *                               info, &info_len,
+ *                               &prog_id, &fd_type,
+ *                               &probe_off, &probe_addr);
+ *   if (err == 0) {
+ *       // info[] now holds a NUL-terminated identifier (if available)
+ *       // info_len == actual length (including terminating '\0')
+ *       // fd_type enumerates one of BPF_FD_TYPE_* values
+ *       // prog_id is the kernel-assigned BPF program ID (0 if none)
+ *       // probe_off / probe_addr describe offsets/addresses for kprobe/uprobe
+ *   } else if (err == -ENOSPC) {
+ *       // info_len contains required size; allocate larger buffer and retry
+ *   }
+ *
+ * Buffer semantics (@p buf / @p buf_len):
+ *   - On input @p *buf_len must hold the capacity (in bytes) of @p buf.
+ *   - If @p buf is large enough, the kernel copies a NUL-terminated string
+ *     (tracepoint name, kprobe symbol, uprobe path, etc.) and updates
+ *     @p *buf_len with the actual string length (including the NUL).
+ *   - If @p buf is too small, the call fails with -ENOSPC and sets
+ *     @p *buf_len to the required length; reallocate and retry.
+ *   - If a textual identifier is not applicable (or unavailable), the kernel
+ *     may set @p *buf_len to 0 (and leave @p buf untouched).
+ *   - Passing @p buf == NULL is allowed only if @p buf_len is non-NULL and
+ *     points to 0; otherwise -EINVAL is returned.
+ *
+ * Output parameters:
+ *   - @p prog_id: Set to the kernel BPF program ID attached to the perf event
+ *     FD (0 if no BPF program is attached).
+ *   - @p fd_type: Set to one of the BPF_FD_TYPE_* enum values describing the
+ *     FD (e.g., BPF_FD_TYPE_TRACEPOINT, BPF_FD_TYPE_KPROBE, BPF_FD_TYPE_UPROBE,
+ *     BPF_FD_TYPE_RAW_TRACEPOINT). Use this to disambiguate interpretation of
+ *     other outputs.
+ *   - @p probe_offset: For kprobe/uprobes, the offset within the symbol or
+ *     mapped file that was requested when the probe was created.
+ *   - @p probe_addr: For kprobes, the resolved kernel address of the probed
+ *     symbol/instruction; for uprobes may be 0 or implementation-dependent.
+ *   - Any output pointer may be NULL if the caller is not interested in that
+ *     field (it will simply be skipped).
+ *
+ * Privileges & access control:
+ *   - Querying another task's file descriptor typically requires sufficient
+ *     permissions (ptrace-like restrictions, CAP_BPF / CAP_SYS_ADMIN, and/or
+ *     LSM allowances). Lack of privilege yields -EPERM / -EACCES.
+ *   - The target task must exist and the FD must be valid at query time.
+ *
+ * Concurrency / races:
+ *   - The target process may close or replace its FD concurrently; the query
+ *     can fail with -EBADF or -ENOENT in such races.
+ *   - Retrieved metadata is a point-in-time snapshot and can become stale
+ *     immediately after return.
+ *
+ * @param pid          PID of the target task whose file descriptor table should be queried.
+ *                     Use the numeric PID (thread group leader or specific thread PID);
+ *                     passing 0 is typically invalid (returns -EINVAL).
+ * @param fd           File descriptor number as seen from inside the task identified by @p pid.
+ * @param flags        Query modifier flags. Must be 0 on current kernels; non-zero
+ *                     (unsupported) bits return -EINVAL.
+ * @param buf          Optional user buffer to receive a NUL-terminated identifier string
+ *                     (tracepoint name, kprobe symbol, uprobe path). Can be NULL if
+ *                     @p buf_len points to 0.
+ * @param buf_len      In/out pointer to buffer length. On input: capacity of @p buf.
+ *                     On success: actual length copied (including terminating NUL).
+ *                     On -ENOSPC: required length (caller should reallocate and retry).
+ * @param prog_id      Optional output pointer receiving the attached BPF program ID (0 if none).
+ * @param fd_type      Optional output pointer receiving one of BPF_FD_TYPE_* constants identifying FD type.
+ * @param probe_offset Optional output pointer receiving the probe offset (for kprobe/uprobe types).
+ * @param probe_addr   Optional output pointer receiving resolved kernel address (kprobe) or relevant mapping address.
+ *
+ * @return 0 on success;
+ *         Negative libbpf-style error code (< 0) on failure:
+ *           - -EINVAL : Invalid arguments (bad pid/fd, unsupported flags, inconsistent buf/buf_len).
+ *           - -ENOENT : Task, file descriptor, or associated probe/program not found.
+ *           - -EBADF : Bad file descriptor in target task at time of query.
+ *           - -ENOSPC : @p buf too small; @p *buf_len updated with required size.
+ *           - -EPERM / -EACCES : Insufficient privileges or access denied by security policy.
+ *           - -EFAULT : User memory (buf or buf_len or an output pointer) not accessible.
+ *           - -ENOMEM : Temporary kernel memory/resource exhaustion.
+ *           - Other -errno codes may be propagated from the underlying syscall.
+ *
+ * Best practices:
+ *   - Initialize *buf_len with the size of your buffer; handle -ENOSPC by allocating
+ *     a larger buffer using the returned required length.
+ *   - Check @p fd_type first to interpret @p probe_offset / @p probe_addr meaningfully.
+ *   - Treat -ENOENT and -EBADF as normal race outcomes in dynamic environments.
+ *   - Avoid querying extremely frequently in production paths; this is introspective
+ *     debug/management tooling, not a fast data path primitive.
+ *
+ * Thread safety:
+ *   - This helper is thread-safe; multiple threads can query different (or the same)
+ *     tasks concurrently. Returned data structures are per-call (no shared state).
+ */
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
				 __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
				 __u64 *probe_offset, __u64 *probe_addr);
-- 
2.34.1
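
The buffer semantics documented for bpf_task_fd_query() in this patch (pass the capacity in *buf_len, retry on -ENOSPC with the reported length) can be sketched as a small grow-and-retry helper. This is a minimal sketch and not part of the patch: task_fd_query_fn, struct task_fd_info, query_task_fd(), fake_query() and the size limits are hypothetical names and values introduced for illustration. It assumes *buf_len is updated on -ENOSPC as the comment describes, and fake_query() stands in for the real call so the example runs without libbpf or privileges.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function-pointer type mirroring the bpf_task_fd_query()
 * prototype in this patch; uint32_t/uint64_t stand in for __u32/__u64. */
typedef int (*task_fd_query_fn)(int pid, int fd, uint32_t flags,
				char *buf, uint32_t *buf_len,
				uint32_t *prog_id, uint32_t *fd_type,
				uint64_t *probe_offset, uint64_t *probe_addr);

/* Hypothetical result holder; name is heap-allocated, freed by the caller. */
struct task_fd_info {
	char *name;
	uint32_t prog_id;
	uint32_t fd_type;
	uint64_t probe_offset;
	uint64_t probe_addr;
};

/* Arbitrary illustration values, not taken from libbpf or the kernel. */
enum { INITIAL_CAP = 16, MAX_CAP = 64 * 1024, MAX_TRIES = 4 };

/* Query; on -ENOSPC grow the buffer to the reported length and retry. */
static int query_task_fd(task_fd_query_fn query, int pid, int fd,
			 struct task_fd_info *out)
{
	uint32_t cap = INITIAL_CAP;
	char *buf = NULL;
	int err = -ENOSPC;

	assert(query && out);
	out->name = NULL;
	for (int tries = 0; tries < MAX_TRIES; tries++) {
		uint32_t len = cap; /* in: capacity, out: reported length */
		char *tmp = realloc(buf, cap);

		if (!tmp) {
			err = -ENOMEM;
			break;
		}
		buf = tmp;
		buf[0] = '\0';
		/* All output pointers are non-NULL: the conservative choice. */
		err = query(pid, fd, 0, buf, &len, &out->prog_id, &out->fd_type,
			    &out->probe_offset, &out->probe_addr);
		if (err == 0) {
			out->name = buf;
			return 0;
		}
		if (err != -ENOSPC || len >= MAX_CAP)
			break;
		/* One spare byte in case the reported length excludes the NUL. */
		cap = (len > cap ? len : 2 * cap) + 1;
	}
	free(buf);
	return err;
}

/* Stand-in for bpf_task_fd_query(); follows the documented -ENOSPC contract. */
static int fake_query(int pid, int fd, uint32_t flags,
		      char *buf, uint32_t *buf_len,
		      uint32_t *prog_id, uint32_t *fd_type,
		      uint64_t *probe_offset, uint64_t *probe_addr)
{
	static const char name[] = "sched_process_exec_stand_in";
	uint32_t cap = *buf_len;

	(void)pid; (void)fd; (void)flags;
	*buf_len = sizeof(name); /* required length, including the NUL */
	*prog_id = 42;
	*fd_type = 0;
	*probe_offset = 0;
	*probe_addr = 0;
	if (cap < sizeof(name))
		return -ENOSPC;
	memcpy(buf, name, sizeof(name));
	return 0;
}
```

With libbpf available, bpf_task_fd_query could presumably be passed as the query argument in place of fake_query(), provided __u32/__u64 have the same representation as uint32_t/uint64_t on the target.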