From: Pingfan Liu
To: kexec@lists.infradead.org
Cc: Pingfan Liu, "David S.
 Miller", Alexei Starovoitov, Daniel Borkmann, John Fastabend,
 Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
 Yonghong Song, Jeremy Linton, Catalin Marinas, Will Deacon,
 Ard Biesheuvel, Simon Horman, Gerd Hoffmann, Vitaly Kuznetsov,
 Philipp Rudo, Viktor Malik, Jan Hendrik Farr, Baoquan He, Dave Young,
 Andrew Morton, bpf@vger.kernel.org, systemd-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Subject: [PATCHv6 06/13] kexec_file: Implement decompress method for parser
Date: Mon, 19 Jan 2026 11:24:17 +0800
Message-ID: <20260119032424.10781-7-piliu@redhat.com>
In-Reply-To: <20260119032424.10781-1-piliu@redhat.com>
References: <20260119032424.10781-1-piliu@redhat.com>

On arm64, there is no boot-time decompression for the kernel image.
Therefore, when a compressed kernel image is loaded, it must be
decompressed. It is impractical to implement the complex decompression
methods in BPF bytecode. However, decompression routines exist in the
kernel. This patch bridges the compressed data with the kernel's
decompression methods.

Signed-off-by: Pingfan Liu
Cc: Baoquan He
Cc: Dave Young
Cc: Andrew Morton
Cc: Philipp Rudo
To: kexec@lists.infradead.org
---
 kernel/Kconfig.kexec      |   2 +-
 kernel/kexec_bpf_loader.c | 203 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 204 insertions(+), 1 deletion(-)

diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 0c5d619820bcd..dbfdf34a78aa0 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -49,7 +49,7 @@ config KEXEC_FILE
 config KEXEC_BPF
 	bool "Enable bpf-prog to parse the kexec image"
 	depends on KEXEC_FILE
-	depends on DEBUG_INFO_BTF && BPF_SYSCALL
+	depends on DEBUG_INFO_BTF && BPF_SYSCALL && KEEP_DECOMPRESSOR
 	help
 	  This is a feature to run bpf section inside a kexec image file, which
 	  parses the image properly and help kernel set up kexec boot protocol
diff --git a/kernel/kexec_bpf_loader.c b/kernel/kexec_bpf_loader.c
index dc59e1389da94..bd6a47fc53ed3 100644
--- a/kernel/kexec_bpf_loader.c
+++ b/kernel/kexec_bpf_loader.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include "kexec_internal.h"
 
 /* Load a ELF */
@@ -80,8 +81,210 @@ static int __init kexec_bpf_prog_run_init(void)
 }
 late_initcall(kexec_bpf_prog_run_init);
 
+#define KEXEC_BPF_CMD_DECOMPRESS	0x1
+
+#define KEXEC_BPF_SUBCMD_KERNEL		0x1
+#define KEXEC_BPF_SUBCMD_INITRD		0x2
+#define KEXEC_BPF_SUBCMD_CMDLINE	0x3
+
+struct cmd_hdr {
+	uint16_t cmd;
+	uint16_t subcmd;
+	uint32_t payload_len;
+} __packed;
+
+
+/* Max decompressed size is capped at 512M */
+#define MAX_UNCOMPRESSED_BUF_SIZE	(1 << 29)
+#define CHUNK_SIZE	(1 << 23)
+
+struct decompress_mem_allocator {
+	void *chunk_start;
+	unsigned int chunk_size;
+	void *chunk_cur;
+	unsigned int next_idx;
+	char **chunk_base_addr;
+};
+
+/*
+ * This global allocator for decompression is protected by kexec lock.
+ */
+static struct decompress_mem_allocator dcmpr_allocator;
+
+/*
+ * Set up an active chunk to hold partial decompressed data.
+ */
+static char *allocate_chunk_memory(void)
+{
+	struct decompress_mem_allocator *a = &dcmpr_allocator;
+	char *p;
+
+	if (unlikely((a->next_idx * a->chunk_size >= MAX_UNCOMPRESSED_BUF_SIZE)))
+		return NULL;
+
+	p = __vmalloc(a->chunk_size, GFP_KERNEL | __GFP_ACCOUNT);
+	if (!p)
+		return NULL;
+	a->chunk_base_addr[a->next_idx++] = p;
+	a->chunk_start = a->chunk_cur = p;
+
+	return p;
+}
+
+static int merge_decompressed_data(struct decompress_mem_allocator *a,
+		char **out, unsigned int *size)
+{
+	unsigned int last_chunk_sz = a->chunk_cur - a->chunk_start;
+	unsigned long total_sz;
+	char *dst, *cur_dst;
+	int i;
+
+	total_sz = (a->next_idx - 1) * a->chunk_size + last_chunk_sz;
+	cur_dst = dst = __vmalloc(total_sz, GFP_KERNEL | __GFP_ACCOUNT);
+	if (!dst)
+		return -ENOMEM;
+
+	for (i = 0; i < a->next_idx - 1; i++) {
+		memcpy(cur_dst, a->chunk_base_addr[i], a->chunk_size);
+		cur_dst += a->chunk_size;
+		vfree(a->chunk_base_addr[i]);
+	}
+
+	memcpy(cur_dst, a->chunk_base_addr[i], last_chunk_sz);
+	vfree(a->chunk_base_addr[i]);
+	*out = dst;
+	*size = total_sz;
+
+	return 0;
+}
+
+static int decompress_mem_allocator_init(
+	struct decompress_mem_allocator *a,
+	unsigned int chunk_size)
+{
+	unsigned long sz = (MAX_UNCOMPRESSED_BUF_SIZE / chunk_size) * sizeof(void *);
+	char *buf;
+
+	a->chunk_base_addr = __vmalloc(sz, GFP_KERNEL | __GFP_ACCOUNT);
+	if (!a->chunk_base_addr)
+		return -ENOMEM;
+
+	/* Pre-allocate the memory for the first chunk */
+	buf = __vmalloc(chunk_size, GFP_KERNEL | __GFP_ACCOUNT);
+	if (!buf) {
+		vfree(a->chunk_base_addr);
+		return -ENOMEM;
+	}
+	a->chunk_base_addr[0] = buf;
+	a->chunk_start = a->chunk_cur = buf;
+	a->chunk_size = chunk_size;
+	a->next_idx = 1;
+	return 0;
+}
+
+static void decompress_mem_allocator_fini(struct decompress_mem_allocator *a)
+{
+	vfree(a->chunk_base_addr);
+}
+
+/*
+ * This is a callback for decompress_fn.
+ *
+ * It copies the partial decompressed content in [buf, buf + len) to dst. If the
+ * active chunk is not large enough, retire it and activate a new chunk to hold
+ * the remaining data.
+ */
+static long flush(void *buf, unsigned long len)
+{
+	struct decompress_mem_allocator *a = &dcmpr_allocator;
+	long free, copied = 0;
+
+	if (unlikely(len > a->chunk_size)) {
+		pr_info("Chunk size is too small to hold decompressed data\n");
+		return -1;
+	}
+	free = a->chunk_start + a->chunk_size - a->chunk_cur;
+	BUG_ON(free < 0);
+	if (free < len) {
+		memcpy(a->chunk_cur, buf, free);
+		copied += free;
+		a->chunk_cur += free;
+		buf += free;
+		len -= free;
+		a->chunk_start = a->chunk_cur = allocate_chunk_memory();
+		if (unlikely(!a->chunk_start)) {
+			pr_info("Decompression runs out of memory\n");
+			return -1;
+		}
+	}
+	memcpy(a->chunk_cur, buf, len);
+	copied += len;
+	a->chunk_cur += len;
+	return copied;
+}
+
+static int parser_cmd_decompress(char *compressed_data, int image_gz_sz,
+		char **out_buf, int *out_sz, struct kexec_context *ctx)
+{
+	struct decompress_mem_allocator *a = &dcmpr_allocator;
+	decompress_fn decompressor;
+	const char *name;
+	int ret;
+
+	decompress_mem_allocator_init(a, CHUNK_SIZE);
+	decompressor = decompress_method(compressed_data, image_gz_sz, &name);
+	if (!decompressor) {
+		pr_err("Can not find decompress method\n");
+		return -1;
+	}
+	pr_debug("Find decompressing method: %s, compressed sz:0x%x\n",
+		 name, image_gz_sz);
+	ret = decompressor(compressed_data, image_gz_sz, NULL, flush,
+			   NULL, NULL, NULL);
+	if (!!ret)
+		goto err;
+	ret = merge_decompressed_data(a, out_buf, out_sz);
+
+err:
+	decompress_mem_allocator_fini(a);
+
+	return ret;
+}
+
 static int kexec_buff_parser(struct bpf_parser_context *parser)
 {
+	struct bpf_parser_buf *pbuf = parser->buf;
+	struct kexec_context *ctx = (struct kexec_context *)parser->data;
+	struct cmd_hdr *cmd = (struct cmd_hdr *)pbuf->buf;
+	char *decompressed_buf, *buf, *p;
+	int decompressed_sz, ret;
+
+	buf = pbuf->buf + sizeof(struct cmd_hdr);
+	if (cmd->payload_len + sizeof(struct cmd_hdr) > pbuf->size) {
+		pr_info("Invalid payload size:0x%x, while buffer size:0x%x\n",
+			cmd->payload_len, pbuf->size);
+		return -EINVAL;
+	}
+	switch (cmd->cmd) {
+	case KEXEC_BPF_CMD_DECOMPRESS:
+		ret = parser_cmd_decompress(buf, cmd->payload_len, &decompressed_buf,
+					&decompressed_sz, ctx);
+		if (!ret) {
+			switch (cmd->subcmd) {
+			case KEXEC_BPF_SUBCMD_KERNEL:
+				vfree(ctx->kernel);
+				ctx->kernel = decompressed_buf;
+				ctx->kernel_sz = decompressed_sz;
+				break;
+			default:
+				break;
+			}
+		}
+		break;
+	default:
+		break;
+	}
+
 	return 0;
 }
 
-- 
2.49.0
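
For readers following the allocator logic in this patch, here is a minimal
user-space sketch of the same idea: a flush-style callback appends
decompressed output to fixed-size chunks, and a final merge copies the chunks
into one contiguous buffer. This is not the patch's code. The names
chunk_sink, sink_init, sink_flush, sink_merge and sink_destroy are
hypothetical, malloc() stands in for __vmalloc(), the sizes are shrunk for
illustration, and the sink is passed explicitly rather than kept in a global
(the patch's flush() takes only a buffer and a length, so it reaches its
allocator through a global instead). Unlike the patch's flush(), this sketch
loops, so a single call may span more than one chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sizes only; the patch uses 8 MiB chunks and a 512 MiB cap. */
#define SKETCH_CHUNK_SIZE 8u
#define SKETCH_MAX_TOTAL  64u
#define SKETCH_MAX_CHUNKS (SKETCH_MAX_TOTAL / SKETCH_CHUNK_SIZE)

/* Hypothetical stand-in for the patch's decompress_mem_allocator. */
struct chunk_sink {
	char *chunks[SKETCH_MAX_CHUNKS]; /* base address of every chunk */
	size_t nr_chunks;                /* chunks allocated so far */
	size_t used_in_last;             /* bytes filled in the newest chunk */
};

/* Allocate the first chunk up front, as the patch's init helper does. */
static int sink_init(struct chunk_sink *s)
{
	memset(s, 0, sizeof(*s));
	s->chunks[0] = malloc(SKETCH_CHUNK_SIZE);
	if (!s->chunks[0])
		return -1;
	s->nr_chunks = 1;
	return 0;
}

/* Free every chunk; usable after success or failure. */
static void sink_destroy(struct chunk_sink *s)
{
	for (size_t i = 0; i < s->nr_chunks; i++)
		free(s->chunks[i]);
	s->nr_chunks = 0;
	s->used_in_last = 0;
}

/*
 * Flush-style callback: append [buf, buf + len) to the sink, opening new
 * chunks as needed. Returns the number of bytes consumed, or -1 on error.
 */
static long sink_flush(struct chunk_sink *s, const void *buf, size_t len)
{
	const char *src = buf;
	size_t left = len;

	assert(s->nr_chunks >= 1 && s->used_in_last <= SKETCH_CHUNK_SIZE);
	while (left) {
		size_t room = SKETCH_CHUNK_SIZE - s->used_in_last;

		if (!room) {
			if (s->nr_chunks == SKETCH_MAX_CHUNKS)
				return -1; /* output cap reached */
			s->chunks[s->nr_chunks] = malloc(SKETCH_CHUNK_SIZE);
			if (!s->chunks[s->nr_chunks])
				return -1; /* out of memory */
			s->nr_chunks++;
			s->used_in_last = 0;
			room = SKETCH_CHUNK_SIZE;
		}
		size_t n = left < room ? left : room;
		memcpy(s->chunks[s->nr_chunks - 1] + s->used_in_last, src, n);
		s->used_in_last += n;
		src += n;
		left -= n;
	}
	return (long)len;
}

/* Copy all chunks into one contiguous buffer; the caller frees *out. */
static int sink_merge(const struct chunk_sink *s, char **out, size_t *size)
{
	size_t full, total;
	char *dst;

	assert(s->nr_chunks >= 1);
	full = s->nr_chunks - 1; /* every chunk but the last is full */
	total = full * SKETCH_CHUNK_SIZE + s->used_in_last;
	dst = malloc(total ? total : 1);
	if (!dst)
		return -1;
	for (size_t i = 0; i < full; i++)
		memcpy(dst + i * SKETCH_CHUNK_SIZE, s->chunks[i], SKETCH_CHUNK_SIZE);
	memcpy(dst + full * SKETCH_CHUNK_SIZE, s->chunks[full], s->used_in_last);
	*out = dst;
	*size = total;
	return 0;
}
```

A caller would run sink_init(), feed output through sink_flush() as it is
produced, call sink_merge() once on success, and call sink_destroy() on every
path so that chunks are released even when decompression fails partway.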