From: Pingfan Liu
To: kexec@lists.infradead.org
Cc: Pingfan Liu, Ard Biesheuvel, Jan Hendrik Farr, Philipp Rudo,
 Lennart Poettering, Jarkko Sakkinen, Eric Biederman, Baoquan He,
 Dave Young, Mark Rutland, Will Deacon, Catalin Marinas,
 linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFCv2 5/9] kexec: Introduce kexec_pe_image to parse and load PE file
Date: Mon, 19 Aug 2024 22:53:38 +0800
Message-ID: <20240819145417.23367-6-piliu@redhat.com>
In-Reply-To: <20240819145417.23367-1-piliu@redhat.com>
References: <20240819145417.23367-1-piliu@redhat.com>

As UEFI becomes popular, a few architectures support booting a PE format
kernel image directly, and the internals of the PE image may differ.

This patch introduces a new kexec_file_ops implementation, named
pe_image_ops, which prepares a UEFI environment for the trampoline code,
the 'efi emulator'.

pe_image_ops treats the efi emulator and its input parameters,
'efi_emulator_param', as two additional kexec_segments. It constructs
efi_memory_desc_t[] and encodes the EFI runtime service info inside the
parameter buffer. Finally, it asks the architecture's page table routine
to set up an identity map for all memory used by the 'efi emulator'.

To do:
This is a POC version. At present it targets arm64; later it needs an
abstraction to cope with x86.

Signed-off-by: Pingfan Liu
Cc: Baoquan He
Cc: Dave Young
Cc: Eric Biederman
Cc: Ard Biesheuvel
Cc: linux-kernel@vger.kernel.org
To: kexec@lists.infradead.org
---
 include/linux/kexec.h   |   1 +
 kernel/Makefile         |   1 +
 kernel/kexec_pe_image.c | 503 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 505 insertions(+)
 create mode 100644 kernel/kexec_pe_image.c

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index cff6b6869498b..57b98bcaa5228 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -385,6 +385,7 @@ static inline int machine_kexec_post_load(struct kimage *image) { return 0; }
 
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
+extern const struct kexec_file_ops pe_image_ops;
 
 bool kexec_load_permitted(int kexec_image_type);
 
diff --git a/kernel/Makefile b/kernel/Makefile
index 3c13240dfc9f0..f14d78b03fd0f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -74,6 +74,7 @@ obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
 obj-$(CONFIG_CRASH_DUMP) += crash_core.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
+obj-$(CONFIG_ARCH_SELECTS_KEXEC_PEIMAGE) += kexec_pe_image.o
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
diff --git a/kernel/kexec_pe_image.c b/kernel/kexec_pe_image.c
new file mode 100644
index 0000000000000..d14b8a5f69a99
--- /dev/null
+++ b/kernel/kexec_pe_image.c
@@ -0,0 +1,503 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kexec image loader
+
+ * Copyright (C) 2024 Red Hat, Inc
+ * Author: Pingfan Liu
+ */
+
+#define pr_fmt(fmt)	"kexec_file(Image): " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * The UEFI Terse Executable (TE) image has MZ header.
+*/
+static int pe_image_probe(const char *kernel_buf, unsigned long kernel_len)
+{
+	struct mz_hdr *mz = (struct mz_hdr *)kernel_buf;
+	struct pe_hdr *pe;
+
+	if (mz->magic != MZ_MAGIC)
+		return -1;
+	pe = (struct pe_hdr *)(kernel_buf + mz->peaddr);
+	if (pe->magic != PE_MAGIC)
+		return -1;
+
+	return 0;
+}
+
+/*
+ * EFI runtime code and data are marked with EFI_RUNTIME_SERVICES_CODE
+ * or EFI_RUNTIME_SERVICES_DATA memory descriptors. They should be mapped when
+ * building the mapping for the emulator. Here we just record the entries for those regions.
+ */
+static void build_rt_info(struct efi_rt_info *rt)
+{
+	unsigned int desc_size = efi.memmap.desc_size;
+	int i, cnt;
+
+	i = cnt = 0;
+	if (desc_size) {
+		cnt = (efi.memmap.map_end - efi.memmap.map) / desc_size;
+		/* The emulator heap memory splits the contiguous memory into three chunks */
+		cnt += 2;
+	}
+
+	/*
+	 * This virtual address is in UEFI address space, and the mapping is recorded
+	 * in efi_memory_desc_t[] and passed to reboot kernel.
+	 *
+	 * EFI stub may use runtime service, so its mapping should be set up
+	 */
+	rt->runtime = efi.runtime;
+	rt->runtime_version = efi.runtime_version;
+	rt->runtime_supported_mask = efi.runtime_supported_mask;
+	/*
+	 * Reconstruct the persistent systab's tables, which are recorded in efi.
+	 */
+	if (efi.acpi != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.acpi;
+		rt->systab_tables[i].guid = ACPI_TABLE_GUID;
+		i++;
+	}
+	if (efi.acpi20 != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.acpi20;
+		rt->systab_tables[i].guid = ACPI_20_TABLE_GUID;
+		i++;
+	}
+	if (efi.smbios != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.smbios;
+		rt->systab_tables[i].guid = SMBIOS_TABLE_GUID;
+		i++;
+	}
+	if (efi.smbios3 != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.smbios3;
+		rt->systab_tables[i].guid = SMBIOS3_TABLE_GUID;
+		i++;
+	}
+	if (efi.esrt != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.esrt;
+		rt->systab_tables[i].guid = EFI_SYSTEM_RESOURCE_TABLE_GUID;
+		i++;
+	}
+	if (efi.tpm_log != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.tpm_log;
+		rt->systab_tables[i].guid = LINUX_EFI_TPM_EVENT_LOG_GUID;
+		i++;
+	}
+	if (efi.tpm_final_log != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.tpm_final_log;
+		rt->systab_tables[i].guid = EFI_TCG2_FINAL_EVENTS_TABLE_GUID;
+		i++;
+	}
+#ifdef CONFIG_LOAD_UEFI_KEYS
+	if (efi.mokvar_table != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.mokvar_table;
+		rt->systab_tables[i].guid = LINUX_EFI_MOK_VARIABLE_TABLE_GUID;
+		i++;
+	}
+#endif
+#ifdef CONFIG_EFI_COCO_SECRET
+	if (efi.coco_secret != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.coco_secret;
+		rt->systab_tables[i].guid = LINUX_EFI_COCO_SECRET_AREA_GUID;
+		i++;
+	}
+#endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	if (efi.unaccepted != EFI_INVALID_TABLE_ADDR) {
+		rt->systab_tables[i].table = (void *)efi.unaccepted;
+		rt->systab_tables[i].guid = LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID;
+		i++;
+	}
+#endif
+	if (!desc_size) {
+		//todo, if noefi boot, LINUX_EFI_MEMRESERVE_TABLE_GUID should be used to pass IMA
+	}
+
+	rt->systab_nr_tables = i;
+
+	rt->memmap.map_size = desc_size * cnt;
+	/* In case of non-EFI booting, these two fields will be faked later */
+	rt->memmap.desc_size = efi.memmap.desc_size;
+	rt->memmap.desc_ver = efi.memmap.desc_version;
+}
+
+static int create_md_from_res(struct resource *res, void *data)
+{
+	struct efi_emulator_param *param = (struct efi_emulator_param *)data;
+	struct efi_boot_memmap *memmap = &param->rt_info.memmap;
+	efi_memory_desc_t *dst_md;
+
+	dst_md = (void *)memmap->map + memmap->map_size;
+
+	/* Split res into three chunks */
+	if ((res->start <= param->mempool_start) &&
+		res->end > (param->mempool_start + param->mempool_sz)) {
+
+		dst_md->phys_addr = res->start;
+		dst_md->num_pages = (param->mempool_start - res->start) >> EFI_PAGE_SHIFT;
+		/* Pretend that it is occupied */
+		dst_md->type = EFI_BOOT_SERVICES_DATA;
+		dst_md->attribute = EFI_MEMORY_WB;
+		dst_md = (void *)dst_md + memmap->desc_size;
+
+		dst_md->phys_addr = param->mempool_start;
+		dst_md->num_pages = param->mempool_sz >> EFI_PAGE_SHIFT;
+		/* Confine memory footprint inside this region before exit boot service */
+		dst_md->type = EFI_CONVENTIONAL_MEMORY;
+		dst_md->attribute = EFI_MEMORY_WB;
+		dst_md = (void *)dst_md + memmap->desc_size;
+
+		dst_md->phys_addr = param->mempool_start + param->mempool_sz;
+		dst_md->num_pages = (res->end - dst_md->phys_addr) >> EFI_PAGE_SHIFT;
+		/* Pretend that it is occupied */
+		dst_md->type = EFI_BOOT_SERVICES_DATA;
+		dst_md->attribute = EFI_MEMORY_WB;
+
+		memmap->map_size += 3 * (memmap->desc_size);
+	} else {
+		dst_md->phys_addr = res->start;
+		dst_md->num_pages = (res->end - res->start) >> EFI_PAGE_SHIFT;
+		/* Pretend that it is occupied */
+		dst_md->type = EFI_BOOT_SERVICES_DATA;
+		dst_md->attribute = EFI_MEMORY_WB;
+
+		memmap->map_size += memmap->desc_size;
+	}
+
+	return 0;
+}
+
+static efi_memory_desc_t *conclude_md(efi_memory_desc_t *dst_md,
+	unsigned int desc_size, unsigned long pool_start, unsigned long num_pages)
+{
+	if (dst_md->num_pages == 0)
+		return dst_md;
+
+	/* Split into three sections */
+	if (dst_md->phys_addr <= pool_start && dst_md->num_pages >= num_pages) {
+		u64 virt_base, phys_base, left_pages, attribute;
+		u32 type;
+
+		type = dst_md->type;
+		attribute = dst_md->attribute;
+		phys_base = dst_md->phys_addr;
+		/*
+		 * After SetVirtualAddressMap, the mapping is installed into
+		 * firmware and can not be changed. The second kernel should
+		 * be aware of this info
+		 */
+		virt_base = dst_md->virt_addr;
+		left_pages = dst_md->num_pages;
+
+		dst_md->num_pages = (pool_start - phys_base) >> EFI_PAGE_SHIFT;
+		/* Pretend that it is occupied until exit boot service */
+		dst_md->type = EFI_BOOT_SERVICES_DATA;
+		left_pages -= dst_md->num_pages;
+
+		dst_md = (void *)dst_md + desc_size;
+		dst_md->phys_addr = pool_start;
+		dst_md->virt_addr = virt_base + (dst_md->phys_addr - phys_base);
+		dst_md->num_pages = num_pages;
+		dst_md->attribute = attribute;
+		/* Confine the memory footprint on it, which has page table mapping */
+		dst_md->type = EFI_CONVENTIONAL_MEMORY;
+		left_pages -= dst_md->num_pages;
+
+		dst_md = (void *)dst_md + desc_size;
+		dst_md->phys_addr = pool_start + (num_pages << EFI_PAGE_SHIFT);
+		dst_md->virt_addr = virt_base + (dst_md->phys_addr - phys_base);
+		dst_md->num_pages = left_pages;
+		dst_md->attribute = attribute;
+		/* Pretend that it is occupied until exit boot service */
+		dst_md->type = EFI_BOOT_SERVICES_DATA;
+	}
+
+	dst_md = (void *)dst_md + desc_size;
+
+	return dst_md;
+}
+
+static void create_md_from_efi(efi_memory_desc_t *dst, unsigned int desc_sz,
+		struct efi_emulator_param *param)
+{
+	efi_memory_desc_t *md, *dst_md, *prev_md = NULL;
+	unsigned long pool_start, num_pages;
+
+	dst_md = dst;
+	pool_start = param->mempool_start;
+	num_pages = param->mempool_sz >> EFI_PAGE_SHIFT;
+	for_each_efi_memory_desc(md) {
+		switch (md->type) {
+		// todo, how to ensure kexec dst avoid the EFI_RUNTIME_SERVICES_CODE etc
+		case EFI_RUNTIME_SERVICES_CODE:
+		case EFI_RUNTIME_SERVICES_DATA:
+		case EFI_RESERVED_TYPE:
+		case EFI_UNUSABLE_MEMORY:
+		case EFI_ACPI_RECLAIM_MEMORY:
+			//test whether the current dst_md covers the heap, if yes, split it into three
+			dst_md = conclude_md(dst_md, desc_sz, pool_start, num_pages);
+			dst_md->phys_addr = md->phys_addr;
+			dst_md->virt_addr = md->virt_addr;
+			dst_md->num_pages = md->num_pages;
+			dst_md->type = md->type;
+			dst_md->attribute = md->attribute;
+			dst_md = conclude_md(dst_md, desc_sz, pool_start, num_pages);
+			break;
+
+		default:
+			if (dst_md->num_pages == 0) {
+				dst_md->phys_addr = md->phys_addr;
+				dst_md->virt_addr = md->virt_addr;
+				dst_md->num_pages = md->num_pages;
+				/*
+				 * Pretend as boot service data to prevent allocation
+				 * from it before efi_exit_boot_services.
+				 */
+				dst_md->type = EFI_BOOT_SERVICES_DATA;
+				dst_md->attribute = md->attribute;
+			} else {
+				/* merge */
+				if (prev_md && prev_md->attribute == md->attribute &&
+					(md->phys_addr - prev_md->phys_addr) >> EFI_PAGE_SHIFT == prev_md->num_pages) {
+					dst_md->num_pages += md->num_pages;
+				} else {
+					dst_md = conclude_md(dst_md, desc_sz, pool_start, num_pages);
+					dst_md->phys_addr = md->phys_addr;
+					dst_md->virt_addr = md->virt_addr;
+					dst_md->num_pages = md->num_pages;
+					/* Pretend as boot service data */
+					dst_md->type = EFI_BOOT_SERVICES_DATA;
+					dst_md->attribute = md->attribute;
+				}
+			}
+			break;
+		}
+
+		prev_md = md;
+	}
+	dst_md = conclude_md(dst_md, desc_sz, pool_start, num_pages);
+	param->rt_info.memmap.map_size = (void *)dst_md - (void *)dst;
+}
+
+/*
+ * All efi runtime information should be passed to kexec reboot kernel. Record
+ * them in scratch. These information should be in EFI_RUNTIME_SERVICES md, so
+ * copying index is enough. Later the mapping for that md will be set up in
+ * arch_emulator_prepare_pgtable().
+ *
+ * If the system is not booted by efi, fake one.
+ */
+static void encode_efi_runtime_info(struct efi_emulator_param *param, struct kimage *image)
+{
+	struct efi_rt_info *rt = &param->rt_info;
+	efi_memory_desc_t *dst_md;
+	unsigned int sz;
+
+	build_rt_info(rt);
+	sz = efi.memmap.map_end - efi.memmap.map;
+	/* booted by efi firmware */
+	if (sz) {
+		unsigned long desc_size = efi.memmap.desc_size;
+
+		dst_md = rt->memmap.map;
+		/*
+		 * It has no point to pass efi.memmap directly to the reboot kernel since
+		 * EFI_BOOT_SERVICES_DATA etc has changed. But EFI_RUNTIME_SERVICES_DATA
+		 * etc should be paid attention to.
+		 */
+		create_md_from_efi(dst_md, desc_size, param);
+	} else {
+		param->noefi_boot = true;
+		/*
+		 * Kernel is booted by non-EFI loader, which parses the PE format. In
+		 * kexec case, faking memory descriptor so that efi-stub can self-parse.
+		 * But there is no efi runtime service, and the kernel cmdline should
+		 * have 'noefi'
+		 */
+		rt->memmap.desc_size = sizeof(efi_memory_desc_t);
+		rt->memmap.desc_ver = EFI_MEMORY_DESCRIPTOR_VERSION;
+		walk_system_ram_res(0, ULONG_MAX, param, create_md_from_res);
+
+		/*
+		 * Besides, efi stub removes two dt node '/memreserve', one for initrd
+		 * and the other for IMA. EFI_RESERVED_TYPE can not serve that purpose.
+		 * That should be handled by LINUX_EFI_MEMRESERVE_TABLE_GUID
+		 */
+	}
+}
+
+extern phys_addr_t arch_emulator_prepare_pgtable(struct kimage *kimage,
+		struct efi_emulator_param *param);
+
+static void detect_earlycon(struct efi_emulator_param *param)
+{
+	//todo, autodetect the console register base and size
+	// aarch64 MACHINE_VIRT_UART_BASE
+	size_t sz = 15 < strlen("amba-pl011") ? 15 : strlen("amba-pl011");
+
+	memcpy(param->earlycon_name, "amba-pl011", sz);
+	param->earlycon_reg_base = 0x9000000;
+	param->earlycon_reg_sz = PAGE_SIZE;
+	param->print_enabled = true;
+}
+
+static phys_addr_t emulator_prepare_pgtable(struct kimage *kimage,
+		struct efi_emulator_param *param)
+{
+
+	detect_earlycon(param);
+
+	return arch_emulator_prepare_pgtable(kimage, param);
+}
+
+/* param, stack, and basic heap */
+#define EMULATOR_STACK_SIZE	(1 << 17)
+#define EMULATOR_PARAM_SIZE	(1 << 20)
+
+static void *pe_image_load(struct kimage *image,
+				char *kernel, unsigned long kernel_len,
+				char *initrd, unsigned long initrd_len,
+				char *cmdline, unsigned long cmdline_len)
+{
+	struct kexec_segment *emulator_seg, *param_seg, *kernel_segment;
+	struct efi_emulator_param *param;
+	struct page *param_pages;
+	struct kexec_buf kbuf;
+	unsigned long kseg_num;
+	int ret;
+
+	image->is_pe = true;
+	/* Do not parse image format, just load it. */
+	kbuf.image = image;
+	kbuf.buf_min = 0;
+	kbuf.buf_max = ULONG_MAX;
+	kbuf.top_down = false;
+	kbuf.buffer = kernel;
+	kbuf.bufsz = kernel_len;
+	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+	kbuf.memsz = kernel_len;
+	kbuf.buf_align = MIN_KIMG_ALIGN;
+
+	kseg_num = image->nr_segments;
+	/*
+	 * The location of the kernel segment may make it impossible to satisfy
+	 * the other segment requirements, so we try repeatedly to find a
+	 * location that will work.
+	 */
+	while ((ret = kexec_add_buffer(&kbuf)) == 0) {
+		/* Try to load additional data */
+		kernel_segment = &image->segment[kseg_num];
+		ret = load_other_segments(image, kernel_segment->mem,
+					  kernel_segment->memsz, initrd,
+					  initrd_len, cmdline);
+		if (!ret)
+			break;
+
+		/*
+		 * We couldn't find space for the other segments; erase the
+		 * kernel segment and try the next available hole.
+		 */
+		image->nr_segments -= 1;
+		kbuf.buf_min = kernel_segment->mem + kernel_segment->memsz;
+		kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+	}
+
+	if (ret) {
+		pr_err("Could not find any suitable kernel location!\n");
+		return ERR_PTR(ret);
+	}
+	kernel_segment = &image->segment[kseg_num];
+
+	/* Load EFI emulator */
+	emulator_seg = &image->segment[image->nr_segments];
+	kbuf.buffer = _efi_emulator_start;
+	kbuf.bufsz = _efi_emulator_end - _efi_emulator_start;
+	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+	kbuf.memsz = kbuf.bufsz;
+	kbuf.buf_align = PAGE_SIZE;
+	ret = kexec_add_buffer(&kbuf);
+	if (ret)
+		return ERR_PTR(ret);
+
+	/*
+	 * Prepare param and memory for emulator. One page for param,
+	 * rear page for stack and the rest for runtime heap.
+	 */
+	param_seg = &image->segment[image->nr_segments];
+	//to do, zero page is not required in kimage_entry_t
+	// and free them
+	param_pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, 2);
+	param = page_to_virt(param_pages);
+
+	/* This chunk of information will be copied to the KEXEC SOURCE PAGE */
+	/* emulator loaded address */
+	param->load_address = emulator_seg->mem;
+	/* PE file payload at the beginning of this RAM range */
+	param->kernel_img_start = kernel_segment->mem;
+	param->kernel_img_sz = kernel_segment->memsz;
+	/* uefi protocol needs it */
+	param->dtb = image->arch.dtb_mem;
+	param->sz_in_byte = utf8s_to_utf16s(cmdline, cmdline_len,
+			UTF16_LITTLE_ENDIAN, param->cmdline, 512);
+	param->sz_in_byte *= sizeof(wchar_t);
+	kbuf.buffer = param;
+	/*
+	 * One page for param, one page for stack, the rest for heap, which should
+	 * also have enough room for kernel and its decompression (256MB)
+	 */
+	kbuf.bufsz = EMULATOR_PARAM_SIZE + kernel_segment->memsz + (1 << 28);
+	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+	kbuf.memsz = kbuf.bufsz;
+	kbuf.buf_align = PAGE_SIZE;
+	ret = kexec_add_buffer(&kbuf);
+	if (ret)
+		return ERR_PTR(ret);
+
+	/* This info is formed after param segment is built */
+	/* stack size equals EMULATOR_STACK_SIZE - sizeof(*param) */
+	param->sp = param_seg->mem + EMULATOR_STACK_SIZE;
+	/* heap information */
+	param->mempool_start = param->sp;
+	param->mempool_sz = kbuf.bufsz - EMULATOR_STACK_SIZE;
+
+	param->pgd_root = emulator_prepare_pgtable(image, param);
+	/* For the time being, reset routine will turn off mmu */
+	param->mmu_on = false;
+	encode_efi_runtime_info(param, image);
+
+	/* relocate.s jumps to it */
+	image->start = emulator_seg->mem;
+	/* the 3rd input param for restart() */
+	image->arch.param_mem = param_seg->mem;
+	image->arch.dtb_mem = image->arch.param_mem;
+
+	kexec_dprintk("Loaded emulator at 0x%lx bufsz=0x%lx\n",
+		      emulator_seg->mem, emulator_seg->memsz);
+	kexec_dprintk("Loaded param blob at 0x%lx bufsz=0x%lx, sp=0x%lx\n",
+		      param_seg->mem, param_seg->memsz, param->sp);
+	kexec_dprintk("pgd_root:0x%llx\n", param->pgd_root);
+
+	return NULL;
+}
+
+const struct kexec_file_ops pe_image_ops = {
+	.probe = pe_image_probe,
+	.load = pe_image_load,
+#ifdef CONFIG_KEXEC_IMAGE_VERIFY_SIG
+	.verify_sig = kexec_kernel_verify_pe_sig,
+#endif
+};
-- 
2.41.0
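
A note on pe_image_probe(): it follows mz->peaddr without consulting
kernel_len. The following is a minimal userspace sketch, not the patch's
code, of how the same MZ/PE magic check could be bounded by the buffer
length. The helper names (rd16, rd32, pe_probe_checked) and the
MZ_PEADDR_OFF constant are assumptions made for illustration; the magic
values are meant to mirror MZ_MAGIC and PE_MAGIC from include/linux/pe.h.

```c
#include <stddef.h>
#include <stdint.h>

#define MZ_MAGIC	0x5a4dU		/* "MZ" */
#define PE_MAGIC	0x00004550U	/* "PE\0\0" */
#define MZ_PEADDR_OFF	0x3c		/* assumed offset of the PE header pointer */

/* Read little-endian fields byte by byte, independent of host endianness. */
static uint16_t rd16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: 0 if buf looks like an MZ/PE image, -1 otherwise. */
static int pe_probe_checked(const char *buf, size_t len)
{
	const unsigned char *p = (const unsigned char *)buf;
	uint32_t peaddr;

	/* The MZ header must be long enough to hold the PE header pointer. */
	if (len < MZ_PEADDR_OFF + 4)
		return -1;
	if (rd16(p) != MZ_MAGIC)
		return -1;

	/* The 4-byte PE magic must lie entirely inside the buffer. */
	peaddr = rd32(p + MZ_PEADDR_OFF);
	if (peaddr > len || len - peaddr < 4)
		return -1;
	if (rd32(p + peaddr) != PE_MAGIC)
		return -1;

	return 0;
}
```

Whether the in-kernel probe needs the same checks depends on what the
kexec_file_load() path guarantees about kernel_len before .probe runs;
the sketch only shows the shape such a check could take.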