From: Pingfan Liu
To: kexec@lists.infradead.org
Cc: Pingfan Liu, "David S. Miller", Alexei Starovoitov, Daniel Borkmann,
 John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
 Song Liu, Yonghong Song, Jeremy Linton, Catalin Marinas, Will Deacon,
 Ard Biesheuvel, Simon Horman, Gerd Hoffmann, Vitaly Kuznetsov,
 Philipp Rudo, Viktor Malik, Jan Hendrik Farr, Baoquan He, Dave Young,
 Andrew Morton, bpf@vger.kernel.org, systemd-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Subject: [PATCHv6 04/13] kexec_file: Use bpf-prog to decompose image
Date: Mon, 19 Jan 2026 11:24:15 +0800
Message-ID: <20260119032424.10781-5-piliu@redhat.com>
In-Reply-To: <20260119032424.10781-1-piliu@redhat.com>
References: <20260119032424.10781-1-piliu@redhat.com>

As UEFI becomes more widespread, several architectures support booting
a PE-format kernel image directly. However, the internal layout of these
PE images varies, which would otherwise require a separate parser for
each format.

This patch (together with the rest of this series) introduces a common
skeleton shared by all parsers and leaves the format-specific parsing to
a bpf-prog, so that the kernel code can remain relatively stable.

Historically, the syscall

	SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
			unsigned long, cmdline_len, const char __user *, cmdline_ptr,
			unsigned long, flags)

follows the kernel boot protocol: a bootable kernel, an initramfs, and a
cmdline.

However, UKI images challenge this traditional model, because the image
itself contains the kernel, initrd, and cmdline. To be compatible with
both the old and new models, kexec_file_load can be reorganized into two
stages. In the first stage, "decompose_kexec_image()" breaks the
passed-in image down into the components required by the kernel boot
protocol.
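
As an illustration of the first stage, here is a minimal user-space
sketch, not kernel code. It assumes a hypothetical "WRAP" container
layout in place of PE/UKI and plain C parsing in place of the bpf-prog;
struct boot_components, is_container(), parse_one_layer(), decompose()
and wrap() are made-up names for illustration only. Each loop iteration
stands in for one bpf-prog run: it splits the current layer into kernel,
initrd and cmdline, then continues with the inner kernel while that is
still a container, the way zboot can be nested inside a UKI on arm64.

```c
/*
 * Illustrative sketch only: a hypothetical "WRAP" container stands in for
 * PE/UKI, and plain C parsing stands in for the bpf-prog.
 * Layout: "WRAP" | u32 kernel_sz | u32 initrd_sz | u32 cmdline_sz |
 *         kernel | initrd | cmdline   (lengths in host byte order)
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WRAP_MAGIC "WRAP"
#define WRAP_HDR_SZ (4 + 3 * sizeof(uint32_t))

/* Components required by the kernel boot protocol. */
struct boot_components {
	const char *kernel;
	size_t kernel_sz;
	const char *initrd;
	size_t initrd_sz;
	const char *cmdline;
	size_t cmdline_sz;
};

/* Plays the role of file_has_bpf_section(): is there a layer to unwrap? */
static int is_container(const char *buf, size_t sz)
{
	return sz >= WRAP_HDR_SZ && memcmp(buf, WRAP_MAGIC, 4) == 0;
}

/* Plays the role of one bpf-prog run: split one layer into components. */
static int parse_one_layer(const char *buf, size_t sz,
			   struct boot_components *c)
{
	uint32_t len[3];
	size_t off = WRAP_HDR_SZ;

	memcpy(len, buf + 4, sizeof(len));
	if ((uint64_t)len[0] + len[1] + len[2] > sz - off)
		return -1;	/* declared sizes overrun the buffer */

	c->kernel = buf + off;
	c->kernel_sz = len[0];
	off += len[0];
	/* Sketch policy: an inner layer that carries no initrd/cmdline
	 * keeps the ones found in an outer layer. */
	if (len[1]) {
		c->initrd = buf + off;
		c->initrd_sz = len[1];
	}
	off += len[1];
	if (len[2]) {
		c->cmdline = buf + off;
		c->cmdline_sz = len[2];
	}
	return 0;
}

/* Stage one: peel nested layers until a bare kernel remains. */
static int decompose(const char *buf, size_t sz, struct boot_components *c)
{
	*c = (struct boot_components){ 0 };
	c->kernel = buf;	/* a non-container image passes through as-is */
	c->kernel_sz = sz;

	while (is_container(c->kernel, c->kernel_sz)) {
		if (parse_one_layer(c->kernel, c->kernel_sz, c))
			return -1;
	}
	return 0;
}

/* Helper: write one layer into out (caller sizes it); returns bytes written. */
static size_t wrap(char *out, const char *kernel, uint32_t kernel_sz,
		   const char *initrd, uint32_t initrd_sz,
		   const char *cmdline, uint32_t cmdline_sz)
{
	uint32_t len[3] = { kernel_sz, initrd_sz, cmdline_sz };
	size_t off = WRAP_HDR_SZ;

	memcpy(out, WRAP_MAGIC, 4);
	memcpy(out + 4, len, sizeof(len));
	memcpy(out + off, kernel, kernel_sz);
	off += kernel_sz;
	memcpy(out + off, initrd, initrd_sz);
	off += initrd_sz;
	memcpy(out + off, cmdline, cmdline_sz);
	return off + cmdline_sz;
}
```

For example, an outer layer that carries an initrd and a cmdline and
wraps an inner layer holding only a kernel decomposes to the innermost
kernel plus the outer initrd and cmdline, while a buffer without the
magic passes through unchanged as the kernel, as in the traditional
model.
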
In the second stage, the traditional image loader
"arch_kexec_kernel_image_load()" prepares the switch to the next kernel.

The decomposition stage can be nested. In each round, BPF bytecode is
extracted from the '.bpf' section of the current PE file and used to
parse that file. If the data section of the PE file contains another PE
file, the round is repeated on the inner file. This is designed to
handle the zboot format embedded in a UKI on the arm64 platform.

This patch contains some placeholder functions. They take effect once
the kexec BPF light skeleton and the BPF helpers are introduced.

Signed-off-by: Pingfan Liu
Cc: Baoquan He
Cc: Dave Young
Cc: Andrew Morton
Cc: Philipp Rudo
To: kexec@lists.infradead.org
---
 kernel/Kconfig.kexec      |   8 ++
 kernel/Makefile           |   2 +-
 kernel/kexec_bpf_loader.c | 161 ++++++++++++++++++++++++++++++++++++++
 kernel/kexec_file.c       |   9 ++-
 kernel/kexec_internal.h   |   1 +
 5 files changed, 179 insertions(+), 2 deletions(-)
 create mode 100644 kernel/kexec_bpf_loader.c

diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 15632358bcf71..0c5d619820bcd 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -46,6 +46,14 @@ config KEXEC_FILE
 	  for kernel and initramfs as opposed to list of segments as
 	  accepted by kexec system call.
 
+config KEXEC_BPF
+	bool "Enable bpf-prog to parse the kexec image"
+	depends on KEXEC_FILE
+	depends on DEBUG_INFO_BTF && BPF_SYSCALL
+	help
+	  Run the bpf-prog carried in the .bpf section of a kexec image file.
+	  It parses the image and helps the kernel set up the boot protocol.
+
 config KEXEC_SIG
 	bool "Verify kernel signature during kexec_file_load() syscall"
 	depends on ARCH_SUPPORTS_KEXEC_SIG
diff --git a/kernel/Makefile b/kernel/Makefile
index f9e85c4a0622b..05177a867690d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -83,7 +83,7 @@ obj-$(CONFIG_CRASH_DUMP_KUNIT_TEST) += crash_core_test.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
-obj-$(CONFIG_KEXEC_BPF) += kexec_uefi_app.o
+obj-$(CONFIG_KEXEC_BPF) += kexec_bpf_loader.o kexec_uefi_app.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup/
diff --git a/kernel/kexec_bpf_loader.c b/kernel/kexec_bpf_loader.c
new file mode 100644
index 0000000000000..dc59e1389da94
--- /dev/null
+++ b/kernel/kexec_bpf_loader.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kexec image bpf section helpers
+ *
+ * Copyright (C) 2025, 2026 Red Hat, Inc
+ */
+
+#define pr_fmt(fmt) "kexec_file(Image): " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "kexec_internal.h"
+
+/* Load an ELF */
+static int arm_bpf_prog(char *bpf_elf, unsigned long sz)
+{
+	return 0;
+}
+
+static void disarm_bpf_prog(void)
+{
+}
+
+struct kexec_context {
+	bool kdump;
+	char *kernel;
+	int kernel_sz;
+	char *initrd;
+	int initrd_sz;
+	char *cmdline;
+	int cmdline_sz;
+};
+
+void kexec_image_parser_anchor(struct kexec_context *context,
+		unsigned long parser_id);
+
+/*
+ * optimize("O0") prevents inlining and compiler constant propagation.
+ *
+ * Let bpf be the program context pointer so that it will not be spilled onto
+ * the stack.
+ */
+__attribute__((used, optimize("O0"))) void kexec_image_parser_anchor(
+		struct kexec_context *context,
+		unsigned long parser_id)
+{
+	/*
+	 * Give kexec_image_parser_anchor distinct code so that the linker
+	 * cannot fold it with another function via Identical Code Folding (ICF).
+	 */
+	volatile int dummy = 0;
+
+	dummy += 1;
+}
+
+
+BTF_KFUNCS_START(kexec_modify_return_ids)
+BTF_ID_FLAGS(func, kexec_image_parser_anchor, KF_SLEEPABLE)
+BTF_KFUNCS_END(kexec_modify_return_ids)
+
+static const struct btf_kfunc_id_set kexec_modify_return_set = {
+	.owner = THIS_MODULE,
+	.set = &kexec_modify_return_ids,
+};
+
+static int __init kexec_bpf_prog_run_init(void)
+{
+	return register_btf_fmodret_id_set(&kexec_modify_return_set);
+}
+late_initcall(kexec_bpf_prog_run_init);
+
+static int kexec_buff_parser(struct bpf_parser_context *parser)
+{
+	return 0;
+}
+
+/* At present, only PE format files with a .bpf section are supported */
+#define file_has_bpf_section	pe_has_bpf_section
+#define file_get_section	pe_get_section
+
+int decompose_kexec_image(struct kimage *image, int extended_fd)
+{
+	struct kexec_context context = { 0 };
+	struct bpf_parser_context *bpf;
+	unsigned long kernel_sz, bpf_sz;
+	char *kernel_start, *bpf_start;
+	int ret = 0;
+
+	if (image->type != KEXEC_TYPE_CRASH)
+		context.kdump = false;
+	else
+		context.kdump = true;
+
+	kernel_start = image->kernel_buf;
+	kernel_sz = image->kernel_buf_len;
+
+	while (file_has_bpf_section(kernel_start, kernel_sz)) {
+
+		bpf = alloc_bpf_parser_context(kexec_buff_parser, &context);
+		if (!bpf)
+			return -ENOMEM;
+		file_get_section((const char *)kernel_start, ".bpf", &bpf_start, &bpf_sz);
+		if (bpf_sz) {
+			/* load and attach bpf-prog */
+			ret = arm_bpf_prog(bpf_start, bpf_sz);
+			if (ret) {
+				put_bpf_parser_context(bpf);
+				pr_err("Failed to load .bpf section\n");
+				goto err;
+			}
+		}
+		context.kernel = kernel_start;
+		context.kernel_sz = kernel_sz;
+		/* The bpf-prog attached via fentry handles the above buffers. */
+		kexec_image_parser_anchor(&context, (unsigned long)bpf);
+
+		/*
+		 * Containers may be nested and are unfolded one by one.
+		 * The previous bpf-prog prepares 'kernel', 'initrd' and
+		 * 'cmdline' for the next phase by calling kexec_buff_parser().
+		 */
+		kernel_start = context.kernel;
+		kernel_sz = context.kernel_sz;
+
+		/*
+		 * Detach the current bpf-prog from its attachment points.
+		 */
+		disarm_bpf_prog();
+		put_bpf_parser_context(bpf);
+	}
+
+	/*
+	 * The image's kernel_buf, initrd_buf and cmdline_buf are already set;
+	 * update them to the decomposed content.
+	 */
+	image->kernel_buf = context.kernel;
+	image->kernel_buf_len = context.kernel_sz;
+	image->initrd_buf = context.initrd;
+	image->initrd_buf_len = context.initrd_sz;
+	image->cmdline_buf = context.cmdline;
+	image->cmdline_buf_len = context.cmdline_sz;
+
+	return 0;
+err:
+	vfree(context.kernel);
+	vfree(context.initrd);
+	vfree(context.cmdline);
+	return ret;
+}
+
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 0222d17072d40..f9674bb5bd8db 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -238,7 +238,14 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
 		goto out;
 #endif
 
-	/* Call arch image probe handlers */
+	if (IS_ENABLED(CONFIG_KEXEC_BPF))
+		decompose_kexec_image(image, initrd_fd);
+
+	/*
+	 * From this point on, the kexec subsystem handles the kernel boot protocol.
+	 *
+	 * Call arch image probe handlers
+	 */
 	ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
 					    image->kernel_buf_len);
 	if (ret)
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 8e5e5c1237732..ee01d0c8bb377 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -39,6 +39,7 @@ extern size_t kexec_purgatory_size;
 extern bool pe_has_bpf_section(const char *file_buf, unsigned long pe_sz);
 extern int pe_get_section(const char *file_buf, const char *sect_name,
 			  char **sect_start, unsigned long *sect_sz);
+extern int decompose_kexec_image(struct kimage *image, int extended_fd);
 #else /* CONFIG_KEXEC_FILE */
 static inline void kimage_file_post_load_cleanup(struct kimage *image) { }
 #endif /* CONFIG_KEXEC_FILE */
-- 
2.49.0