From nobody Thu Oct  2 07:44:01 2025
From: Cong Wang
To: linux-kernel@vger.kernel.org
Cc: pasha.tatashin@soleen.com, Cong Wang, Andrew Morton, Baoquan He,
	Alexander Graf, Mike Rapoport, Changyuan Lyu,
	kexec@lists.infradead.org, linux-mm@kvack.org
Subject: [RFC Patch 1/7] kexec: Introduce multikernel support via kexec
Date: Thu, 18 Sep 2025 15:26:00 -0700
Message-Id: <20250918222607.186488-2-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
References: <20250918222607.186488-1-xiyou.wangcong@gmail.com>

From: Cong Wang

This patch extends the kexec subsystem to support multikernel
functionality, allowing different kernel instances to be loaded and
executed on specific CPUs. The implementation introduces:

- New KEXEC_TYPE_MULTIKERNEL type and KEXEC_MULTIKERNEL flag
- multikernel_kick_ap() function for CPU-specific kernel booting
- LINUX_REBOOT_CMD_MULTIKERNEL reboot command with CPU parameter
- Specialized segment loading for multikernel images using memremap
- Integration with existing kexec infrastructure while bypassing
  standard machine_kexec_prepare() to avoid resets

The multikernel_kexec() function validates CPU availability and uses
the existing kexec image start address to boot the target CPU with
a different kernel instance.

This enables heterogeneous computing scenarios where different CPUs can
run specialized kernel variants.
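
For illustration only, and not part of the posted patch: a minimal
userspace sketch of how the new reboot command might be invoked once an
image has been loaded with KEXEC_MULTIKERNEL. The command value
0x4D4B4C49 and the single-field argument layout mirror the kernel-side
definitions in this patch. The helper name multikernel_boot_cpu() and
the userspace copy of struct multikernel_boot_args are assumptions,
since the patch defines the struct only in kernel/reboot.c and not in a
uapi header.

```c
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/reboot.h>

/* Value taken from this patch; stock uapi headers do not define it. */
#ifndef LINUX_REBOOT_CMD_MULTIKERNEL
#define LINUX_REBOOT_CMD_MULTIKERNEL 0x4D4B4C49
#endif

/*
 * Hypothetical userspace mirror of the kernel-side
 * struct multikernel_boot_args added to kernel/reboot.c.
 */
struct multikernel_boot_args {
	int cpu;
};

/*
 * Hypothetical helper: ask the running kernel to start the previously
 * loaded multikernel image on @cpu.
 * Returns 0 on success, -1 with errno set on failure.
 */
int multikernel_boot_cpu(int cpu)
{
	struct multikernel_boot_args args = { .cpu = cpu };

	return (int)syscall(SYS_reboot, LINUX_REBOOT_MAGIC1,
			    LINUX_REBOOT_MAGIC2,
			    LINUX_REBOOT_CMD_MULTIKERNEL, &args);
}
```

A caller would presumably first load the image with kexec_load(2)
passing KEXEC_MULTIKERNEL, make sure the target CPU is offline in the
running kernel (multikernel_kexec() returns -EBUSY for an online CPU and
-EINVAL when no image is loaded), and then call
multikernel_boot_cpu(cpu). As with other reboot(2) commands,
CAP_SYS_BOOT is required.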
Signed-off-by: Cong Wang
---
 arch/x86/include/asm/smp.h  |   1 +
 arch/x86/kernel/smpboot.c   | 104 +++++++++++++++++++++++++++
 include/linux/kexec.h       |   6 +-
 include/uapi/linux/kexec.h  |   1 +
 include/uapi/linux/reboot.h |   2 +-
 kernel/kexec.c              |  41 ++++++++++-
 kernel/kexec_core.c         | 135 ++++++++++++++++++++++++++++++++++++
 kernel/reboot.c             |  10 +++
 8 files changed, 294 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 22bfebe6776d..1a59fd0de759 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -107,6 +107,7 @@ void native_smp_prepare_cpus(unsigned int max_cpus);
 void native_smp_cpus_done(unsigned int max_cpus);
 int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_kick_ap(unsigned int cpu, struct task_struct *tidle);
+int multikernel_kick_ap(unsigned int cpu, unsigned long kernel_start_address);
 int native_cpu_disable(void);
 void __noreturn hlt_play_dead(void);
 void native_play_dead(void);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33e166f6ab12..c2844a493ebf 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -833,6 +833,72 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
 	return 0;
 }
 
+// must be locked by cpus_read_lock()
+static int do_multikernel_boot_cpu(u32 apicid, int cpu, unsigned long kernel_start_address)
+{
+	unsigned long start_ip = real_mode_header->trampoline_start;
+	int ret;
+
+	pr_info("do_multikernel_boot_cpu(apicid=%u, cpu=%u, kernel_start_address=%lx)\n", apicid, cpu, kernel_start_address);
+#ifdef CONFIG_X86_64
+	/* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
+	if (apic->wakeup_secondary_cpu_64)
+		start_ip = real_mode_header->trampoline_start64;
+#endif
+	//initial_code = (unsigned long)start_secondary;
+	initial_code = (unsigned long)kernel_start_address;
+
+	if (IS_ENABLED(CONFIG_X86_32)) {
+		early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
+		//initial_stack = idle->thread.sp;
+	} else if (!(smpboot_control & STARTUP_PARALLEL_MASK)) {
+		smpboot_control = cpu;
+	}
+
+	/* Skip init_espfix_ap(cpu); */
+
+	/* Skip announce_cpu(cpu, apicid); */
+
+	/*
+	 * This grunge runs the startup process for
+	 * the targeted processor.
+	 */
+	if (x86_platform.legacy.warm_reset) {
+
+		pr_debug("Setting warm reset code and vector.\n");
+
+		smpboot_setup_warm_reset_vector(start_ip);
+		/*
+		 * Be paranoid about clearing APIC errors.
+		 */
+		if (APIC_INTEGRATED(boot_cpu_apic_version)) {
+			apic_write(APIC_ESR, 0);
+			apic_read(APIC_ESR);
+		}
+	}
+
+	smp_mb();
+
+	/*
+	 * Wake up a CPU in different cases:
+	 * - Use a method from the APIC driver if one defined, with wakeup
+	 *   straight to 64-bit mode preferred over wakeup to RM.
+	 * Otherwise,
+	 * - Use an INIT boot APIC message
+	 */
+	if (apic->wakeup_secondary_cpu_64)
+		ret = apic->wakeup_secondary_cpu_64(apicid, start_ip, cpu);
+	else if (apic->wakeup_secondary_cpu)
+		ret = apic->wakeup_secondary_cpu(apicid, start_ip, cpu);
+	else
+		ret = wakeup_secondary_cpu_via_init(apicid, start_ip, cpu);
+
+	pr_info("do_multikernel_boot_cpu end\n");
+	/* If the wakeup mechanism failed, cleanup the warm reset vector */
+	if (ret)
+		arch_cpuhp_cleanup_kick_cpu(cpu);
+	return ret;
+}
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -905,6 +971,44 @@ static int do_boot_cpu(u32 apicid, unsigned int cpu, struct task_struct *idle)
 	return ret;
 }
 
+// must be locked by cpus_read_lock()
+int multikernel_kick_ap(unsigned int cpu, unsigned long kernel_start_address)
+{
+	u32 apicid = apic->cpu_present_to_apicid(cpu);
+	int err;
+
+	lockdep_assert_irqs_enabled();
+
+	pr_info("++++++++++++++++++++=_---CPU UP %u\n", cpu);
+
+	if (apicid == BAD_APICID || !apic_id_valid(apicid)) {
+		pr_err("CPU %u has invalid APIC ID %x. Aborting bringup\n", cpu, apicid);
+		return -EINVAL;
+	}
+
+	if (!test_bit(apicid, phys_cpu_present_map)) {
+		pr_err("CPU %u APIC ID %x is not present. Aborting bringup\n", cpu, apicid);
+		return -EINVAL;
+	}
+
+	/*
+	 * Save current MTRR state in case it was changed since early boot
+	 * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
+	 */
+	mtrr_save_state();
+
+	/* the FPU context is blank, nobody can own it */
+	per_cpu(fpu_fpregs_owner_ctx, cpu) = NULL;
+	/* skip common_cpu_up(cpu, tidle); */
+
+	err = do_multikernel_boot_cpu(apicid, cpu, kernel_start_address);
+	if (err)
+		pr_err("do_multikernel_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
+
+	return err;
+}
+
+
 int native_kick_ap(unsigned int cpu, struct task_struct *tidle)
 {
 	u32 apicid = apic->cpu_present_to_apicid(cpu);
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 39fe3e6cd282..a3ae3e561109 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -358,9 +358,10 @@ struct kimage {
 	unsigned long control_page;
 
 	/* Flags to indicate special processing */
-	unsigned int type : 1;
+	unsigned int type : 2;
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH 1
+#define KEXEC_TYPE_MULTIKERNEL 2
 	unsigned int preserve_context : 1;
 	/* If set, we are using file mode kexec syscall */
 	unsigned int file_mode:1;
@@ -434,6 +435,7 @@ extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
 extern int kernel_kexec(void);
+extern int multikernel_kexec(int cpu);
 extern struct page *kimage_alloc_control_pages(struct kimage *image,
 						unsigned int order);
 
@@ -455,7 +457,7 @@ bool kexec_load_permitted(int kexec_image_type);
 #define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT)
 #else
 #define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR | \
-			KEXEC_CRASH_HOTPLUG_SUPPORT)
+			KEXEC_CRASH_HOTPLUG_SUPPORT | KEXEC_MULTIKERNEL)
 #endif
 
 /* List of defined/legal kexec file flags */
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 8958ebfcff94..4ed8660ef95e 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -14,6 +14,7 @@
 #define KEXEC_PRESERVE_CONTEXT	0x00000002
 #define KEXEC_UPDATE_ELFCOREHDR	0x00000004
 #define KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008
+#define KEXEC_MULTIKERNEL	0x00000010
 #define KEXEC_ARCH_MASK		0xffff0000
 
 /*
diff --git a/include/uapi/linux/reboot.h b/include/uapi/linux/reboot.h
index 58e64398efc5..aac2f2f94a98 100644
--- a/include/uapi/linux/reboot.h
+++ b/include/uapi/linux/reboot.h
@@ -34,7 +34,7 @@
 #define LINUX_REBOOT_CMD_RESTART2	0xA1B2C3D4
 #define LINUX_REBOOT_CMD_SW_SUSPEND	0xD000FCE2
 #define LINUX_REBOOT_CMD_KEXEC		0x45584543
-
+#define LINUX_REBOOT_CMD_MULTIKERNEL	0x4D4B4C49
 
 
 #endif /* _UAPI_LINUX_REBOOT_H */
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 28008e3d462e..49e62f804674 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 #include "kexec_internal.h"
 
@@ -27,6 +28,7 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 	int ret;
 	struct kimage *image;
 	bool kexec_on_panic = flags & KEXEC_ON_CRASH;
+	bool multikernel_load = flags & KEXEC_MULTIKERNEL;
 
 #ifdef CONFIG_CRASH_DUMP
 	if (kexec_on_panic) {
@@ -37,6 +39,30 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 	}
 #endif
 
+#if 0
+	if (multikernel_load) {
+		// Check if entry is in a reserved memory region
+		bool in_reserved_region = false;
+		phys_addr_t start, end;
+		u64 i;
+
+		for_each_reserved_mem_range(i, &start, &end) {
+			if (entry >= start && entry < end) {
+				in_reserved_region = true;
+				break;
+			}
+		}
+
+		if (!in_reserved_region) {
+			pr_err("Entry point 0x%lx is not in a reserved memory region\n", entry);
+			return -EADDRNOTAVAIL; // Return an error if not in a reserved region
+		}
+
+		pr_info("multikernel load: got to multikernel_load syscall, entry 0x%lx, nr_segments %lu, flags 0x%lx\n",
+			entry, nr_segments, flags);
+	}
+#endif
+
 	/* Allocate and initialize a controlling structure */
 	image = do_kimage_alloc_init();
 	if (!image)
@@ -54,10 +80,16 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 	}
 #endif
 
+	if (multikernel_load) {
+		image->type = KEXEC_TYPE_MULTIKERNEL;
+	}
+
 	ret = sanity_check_segment_list(image);
 	if (ret)
 		goto out_free_image;
 
+	if (multikernel_load)
+		goto done;
 	/*
 	 * Find a location for the control code buffer, and add it
 	 * the vector of segments so that it's pages will also be
@@ -79,6 +111,7 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 		}
 	}
 
+done:
 	*rimage = image;
 	return 0;
 out_free_control_pages:
@@ -139,9 +172,11 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 		image->hotplug_support = 1;
 #endif
 
-	ret = machine_kexec_prepare(image);
-	if (ret)
-		goto out;
+	if (!(flags & KEXEC_MULTIKERNEL)) {
+		ret = machine_kexec_prepare(image);
+		if (ret)
+			goto out;
+	}
 
 	/*
 	 * Some architecture(like S390) may touch the crash memory before
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 31203f0bacaf..35a66c8dd78b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -211,6 +212,32 @@ int sanity_check_segment_list(struct kimage *image)
 	}
 #endif
 
+#if 0
+	if (image->type == KEXEC_TYPE_MULTIKERNEL) {
+		for (i = 0; i < nr_segments; i++) {
+			unsigned long mstart, mend;
+			phys_addr_t start, end;
+			bool in_reserved_region = false;
+			u64 i;
+
+			mstart = image->segment[i].mem;
+			mend = mstart + image->segment[i].memsz - 1;
+			for_each_reserved_mem_range(i, &start, &end) {
+				if (mstart >= start && mend <= end) {
+					in_reserved_region = true;
+					break;
+				}
+			}
+
+			if (!in_reserved_region) {
+				pr_err("Segment 0x%lx-0x%lx is not in a reserved memory region\n",
+					mstart, mend);
+				return -EADDRNOTAVAIL;
+			}
+		}
+	}
+#endif
+
 	/*
 	 * The destination addresses are searched from system RAM rather than
 	 * being allocated from the buddy allocator, so they are not guaranteed
@@ -943,6 +970,84 @@ static int kimage_load_crash_segment(struct kimage *image, int idx)
 }
 #endif
 
+static int kimage_load_multikernel_segment(struct kimage *image, int idx)
+{
+	/* For multikernel we simply copy the data from
+	 * user space to its destination.
+	 * We do things a page at a time for the sake of kmap.
+	 */
+	struct kexec_segment *segment = &image->segment[idx];
+	unsigned long maddr;
+	size_t ubytes, mbytes;
+	int result;
+	unsigned char __user *buf = NULL;
+	unsigned char *kbuf = NULL;
+
+	result = 0;
+	if (image->file_mode)
+		kbuf = segment->kbuf;
+	else
+		buf = segment->buf;
+	ubytes = segment->bufsz;
+	mbytes = segment->memsz;
+	maddr = segment->mem;
+	pr_info("Loading multikernel segment: mem=0x%lx, memsz=0x%zu, buf=0x%px, bufsz=0x%zu\n",
+		maddr, mbytes, buf, ubytes);
+	while (mbytes) {
+		char *ptr;
+		size_t uchunk, mchunk;
+		unsigned long page_addr = maddr & PAGE_MASK;
+		unsigned long page_offset = maddr & ~PAGE_MASK;
+
+		/* Use memremap to map the physical address */
+		ptr = memremap(page_addr, PAGE_SIZE, MEMREMAP_WB);
+		if (!ptr) {
+			pr_err("Failed to memremap memory at 0x%lx\n", page_addr);
+			result = -ENOMEM;
+			goto out;
+		}
+
+		/* Adjust pointer to the offset within the page */
+		ptr += page_offset;
+
+		/* Calculate chunk sizes */
+		mchunk = min_t(size_t, mbytes, PAGE_SIZE - page_offset);
+		uchunk = min(ubytes, mchunk);
+
+		/* Zero the trailing part of the page if needed */
+		if (mchunk > uchunk) {
+			/* Zero the trailing part of the page */
+			memset(ptr + uchunk, 0, mchunk - uchunk);
+		}
+
+		if (uchunk) {
+			/* For file based kexec, source pages are in kernel memory */
+			if (image->file_mode)
+				memcpy(ptr, kbuf, uchunk);
+			else
+				result = copy_from_user(ptr, buf, uchunk);
+			ubytes -= uchunk;
+			if (image->file_mode)
+				kbuf += uchunk;
+			else
+				buf += uchunk;
+		}
+
+		/* Clean up */
+		memunmap(ptr - page_offset);
+		if (result) {
+			result = -EFAULT;
+			goto out;
+		}
+		maddr += mchunk;
+		mbytes -= mchunk;
+
+		cond_resched();
+	}
+out:
+	return result;
+}
+
 int kimage_load_segment(struct kimage *image, int idx)
 {
 	int result = -ENOMEM;
@@ -956,6 +1061,9 @@ int kimage_load_segment(struct kimage *image, int idx)
 		result = kimage_load_crash_segment(image, idx);
 		break;
 #endif
+	case KEXEC_TYPE_MULTIKERNEL:
+		result = kimage_load_multikernel_segment(image, idx);
+		break;
 	}
 
 	return result;
@@ -1230,3 +1338,30 @@ int kernel_kexec(void)
 	kexec_unlock();
 	return error;
 }
+
+int multikernel_kexec(int cpu)
+{
+	int rc;
+
+	pr_info("multikernel kexec: cpu %d\n", cpu);
+
+	if (cpu_online(cpu)) {
+		pr_err("The CPU is currently running with this kernel instance.\n");
+		return -EBUSY;
+	}
+
+	if (!kexec_trylock())
+		return -EBUSY;
+	if (!kexec_image) {
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	cpus_read_lock();
+	rc = multikernel_kick_ap(cpu, kexec_image->start);
+	cpus_read_unlock();
+
+unlock:
+	kexec_unlock();
+	return rc;
+}
diff --git a/kernel/reboot.c b/kernel/reboot.c
index ec087827c85c..f3ac703c4695 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -717,6 +717,10 @@ EXPORT_SYMBOL_GPL(kernel_power_off);
 
 DEFINE_MUTEX(system_transition_mutex);
 
+struct multikernel_boot_args {
+	int cpu;
+};
+
 /*
  * Reboot system call: for obvious reasons only root may call it,
  * and even root needs to set up some magic numbers in the registers
@@ -729,6 +733,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 		void __user *, arg)
 {
 	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+	struct multikernel_boot_args boot_args;
 	char buffer[256];
 	int ret = 0;
 
@@ -799,6 +804,11 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 	case LINUX_REBOOT_CMD_KEXEC:
 		ret = kernel_kexec();
 		break;
+	case LINUX_REBOOT_CMD_MULTIKERNEL:
+		if (copy_from_user(&boot_args, arg, sizeof(boot_args)))
+			return -EFAULT;
+		ret = multikernel_kexec(boot_args.cpu);
+		break;
 #endif
 
 #ifdef CONFIG_HIBERNATION
-- 
2.34.1

From nobody Thu Oct  2 07:44:01 2025
From: Cong Wang
To: linux-kernel@vger.kernel.org
Cc: pasha.tatashin@soleen.com, Cong Wang, Andrew Morton, Baoquan He,
	Alexander Graf, Mike Rapoport, Changyuan Lyu,
	kexec@lists.infradead.org, linux-mm@kvack.org
Subject: [RFC Patch 2/7] x86: Introduce SMP INIT trampoline for multikernel CPU bootstrap
Date: Thu, 18 Sep 2025 15:26:01 -0700
Message-Id: <20250918222607.186488-3-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
References: <20250918222607.186488-1-xiyou.wangcong@gmail.com>

From: Cong Wang

This patch introduces a dedicated trampoline mechanism for booting
secondary CPUs with different kernel instances in multikernel mode.
The implementation provides: - New trampoline_64_bsp.S assembly code for real-mode to long-mode transition when launching kernels on secondary CPUs - Trampoline memory allocation and setup in low memory (<1MB) for real-mode execution compatibility - Page table construction for identity mapping during CPU bootstrap - Integration with existing multikernel kexec infrastructure The trampoline handles the complete CPU initialization sequence from 16-bit real mode through 32-bit protected mode to 64-bit long mode, setting up appropriate GDT, page tables, and control registers before jumping to the target kernel entry point without resetting the whole system or the running kernel. Note: This implementation uses legacy assembly-based trampoline code and should be migrated to C-based x86 trampoline in future updates. Signed-off-by: Cong Wang --- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/head64.c | 5 + arch/x86/kernel/setup.c | 3 + arch/x86/kernel/smpboot.c | 87 +++++++-- arch/x86/kernel/trampoline_64_bsp.S | 288 ++++++++++++++++++++++++++++ arch/x86/kernel/vmlinux.lds.S | 6 + 6 files changed, 375 insertions(+), 15 deletions(-) create mode 100644 arch/x86/kernel/trampoline_64_bsp.S diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 0d2a6d953be9..ac89d82bf25b 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -50,6 +50,7 @@ CFLAGS_irq.o :=3D -I $(src)/../include/asm/trace =20 obj-y +=3D head_$(BITS).o obj-y +=3D head$(BITS).o +obj-y +=3D trampoline_64_bsp.o obj-y +=3D ebda.o obj-y +=3D platform-quirks.o obj-y +=3D process_$(BITS).o signal.o signal_$(BITS).o diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 533fcf5636fc..4097101011d2 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -216,6 +216,9 @@ static void __init copy_bootdata(char *real_mode_data) sme_unmap_bootdata(real_mode_data); } =20 +unsigned long orig_boot_params; +EXPORT_SYMBOL(orig_boot_params); + asmlinkage __visible void 
__init __noreturn x86_64_start_kernel(char * rea= l_mode_data) { /* @@ -285,6 +288,8 @@ asmlinkage __visible void __init __noreturn x86_64_star= t_kernel(char * real_mode /* set init_top_pgt kernel high mapping*/ init_top_pgt[511] =3D early_top_pgt[511]; =20 + orig_boot_params =3D (unsigned long) real_mode_data; + x86_64_start_reservations(real_mode_data); } =20 diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 1b2edd07a3e1..8342c4e46bad 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -877,6 +877,8 @@ static void __init x86_report_nx(void) * Note: On x86_64, fixmaps are ready for use even before this is called. */ =20 +extern void __init setup_trampolines_bsp(void); + void __init setup_arch(char **cmdline_p) { #ifdef CONFIG_X86_32 @@ -1103,6 +1105,7 @@ void __init setup_arch(char **cmdline_p) (max_pfn_mapped<trampoline_start; + unsigned long start_ip; int ret; =20 - pr_info("do_multikernel_boot_cpu(apicid=3D%u, cpu=3D%u, kernel_start_addr= ess=3D%lx)\n", apicid, cpu, kernel_start_address); -#ifdef CONFIG_X86_64 - /* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */ - if (apic->wakeup_secondary_cpu_64) - start_ip =3D real_mode_header->trampoline_start64; -#endif - //initial_code =3D (unsigned long)start_secondary; - initial_code =3D (unsigned long)kernel_start_address; + /* Multikernel -- set physical address where kernel has been copied. + Note that this needs to be written to the location where the + trampoline was copied, not to the location within the original + kernel itself. 
*/ + unsigned long *kernel_virt_addr =3D TRAMPOLINE_SYM_BSP(&kernel_phy= s_addr); + unsigned long *boot_params_virt_addr =3D TRAMPOLINE_SYM_BSP(&boot_= params_phys_addr); =20 - if (IS_ENABLED(CONFIG_X86_32)) { - early_gdt_descr.address =3D (unsigned long)get_cpu_gdt_rw(cpu); - //initial_stack =3D idle->thread.sp; - } else if (!(smpboot_control & STARTUP_PARALLEL_MASK)) { - smpboot_control =3D cpu; - } + *kernel_virt_addr =3D kernel_start_address; + *boot_params_virt_addr =3D orig_boot_params; + + /* start_ip had better be page-aligned! */ + start_ip =3D trampoline_bsp_address(); + + pr_info("do_multikernel_boot_cpu(apicid=3D%u, cpu=3D%u, kernel_start_addr= ess=3D%lx)\n", apicid, cpu, kernel_start_address); =20 /* Skip init_espfix_ap(cpu); */ =20 @@ -897,6 +918,9 @@ static int do_multikernel_boot_cpu(u32 apicid, int cpu,= unsigned long kernel_sta /* If the wakeup mechanism failed, cleanup the warm reset vector */ if (ret) arch_cpuhp_cleanup_kick_cpu(cpu); + + /* mark "stuck" area as not stuck */ + *(volatile u32 *)TRAMPOLINE_SYM_BSP(trampoline_status_bsp) =3D 0; return ret; } /* @@ -1008,6 +1032,39 @@ int multikernel_kick_ap(unsigned int cpu, unsigned l= ong kernel_start_address) return err; } =20 +void __init setup_trampolines_bsp(void) +{ + phys_addr_t mem; + size_t size =3D PAGE_ALIGN(x86_trampoline_bsp_end - x86_trampoline= _bsp_start); + + /* Has to be in very low memory so we can execute real-mode AP cod= e. 
*/ + mem =3D memblock_phys_alloc_range(size, PAGE_SIZE, 0, 1<<20); + if (!mem) + panic("Cannot allocate trampoline\n"); + + x86_trampoline_bsp_base =3D __va(mem); + memblock_reserve(mem, mem + size); + + printk(KERN_DEBUG "Base memory trampoline BSP at [%p] %llx size %z= u\n", + x86_trampoline_bsp_base, (unsigned long long)mem, size); + + //if (!mklinux_boot) { + memcpy(x86_trampoline_bsp_base, trampoline_data_bsp, size); + + //} else { + // printk("Multikernel boot: BSP trampoline will NOT be cop= ied\n"); + //} +} + +static int __init configure_trampolines_bsp(void) +{ + size_t size =3D PAGE_ALIGN(x86_trampoline_bsp_end - x86_trampoline= _bsp_start); + + set_memory_x((unsigned long)x86_trampoline_bsp_base, size >> PAGE_= SHIFT); + return 0; +} + +arch_initcall(configure_trampolines_bsp); =20 int native_kick_ap(unsigned int cpu, struct task_struct *tidle) { diff --git a/arch/x86/kernel/trampoline_64_bsp.S b/arch/x86/kernel/trampoli= ne_64_bsp.S new file mode 100644 index 000000000000..0bd2a971a973 --- /dev/null +++ b/arch/x86/kernel/trampoline_64_bsp.S @@ -0,0 +1,288 @@ +/* + * + * Derived from Setup.S by Linus Torvalds, then derived from Popcorn Linux + * + * 4 Jan 1997 Michael Chastain: changed to gnu as. + * 15 Sept 2005 Eric Biederman: 64bit PIC support + * + * Entry: CS:IP point to the start of our code, we are=20 + * in real mode with no stack, but the rest of the=20 + * trampoline page to make our stack and everything else + * is a mystery. + * + * On entry to trampoline_data, the processor is in real mode + * with 16-bit addressing and 16-bit data. CS has some value + * and IP is zero. Thus, data addresses need to be absolute + * (no relocation) and are taken with regard to r_base. + * + * With the addition of trampoline_level4_pgt this code can + * now enter a 64bit kernel that lives at arbitrary 64bit + * physical addresses. 
+ * + * If you work on this file, check the object module with objdump + * --full-contents --reloc to make sure there are no relocation + * entries. + */ + +#include +#include +#include +#include +#include +#include +#include + + .section ".x86_trampoline_bsp","a" + .balign PAGE_SIZE + .code16 + +SYM_CODE_START(trampoline_data_bsp) +bsp_base =3D . + cli # We should be safe anyway + wbinvd + mov %cs, %ax # Code and data in the same place + mov %ax, %ds + mov %ax, %es + mov %ax, %ss + + + movl $0xA5A5A5A5, trampoline_status_bsp - bsp_base + # write marker for master knows we're running + + # Setup stack + movw $(trampoline_stack_bsp_end - bsp_base), %sp + + # call verify_cpu # Verify the cpu supports long mode + # testl %eax, %eax # Check for return code + # jnz no_longmode_bsp + + mov %cs, %ax + movzx %ax, %esi # Find the 32bit trampoline location + shll $4, %esi + + # Fixup the absolute vectors + leal (startup_32_bsp - bsp_base)(%esi), %eax + movl %eax, startup_32_vector_bsp - bsp_base + leal (startup_64_bsp - bsp_base)(%esi), %eax + movl %eax, startup_64_vector_bsp - bsp_base + leal (tgdt_bsp - bsp_base)(%esi), %eax + movl %eax, (tgdt_bsp + 2 - bsp_base) + + /* + * GDT tables in non default location kernel can be beyond 16MB and + * lgdt will not be able to load the address as in real mode default + * operand size is 16bit. Use lgdtl instead to force operand size + * to 32 bit. + */ + + lidtl tidt_bsp - bsp_base # load idt with 0, 0 + lgdtl tgdt_bsp - bsp_base # load gdt with whatever is appropriate + + mov $X86_CR0_PE, %ax # protected mode (PE) bit + lmsw %ax # into protected mode + + # flush prefetch and jump to startup_32 + ljmpl *(startup_32_vector_bsp - bsp_base) +SYM_CODE_END(trampoline_data_bsp) + + .code32 + .balign 4 +startup_32_bsp: + + cli + movl $(__KERNEL_DS), %eax + movl %eax, %ds + movl %eax, %es + movl %eax, %ss + + /* Load new GDT with the 64bit segments using 32bit descriptor. 
+	 * The new GDT labels the entire address space as 64-bit, so we
+	 * can switch into long mode later. */
+	leal	(gdt_bsp_64 - bsp_base)(%esi), %eax
+	movl	%eax, (gdt_bsp_64 - bsp_base + 2)(%esi)
+	lgdt	(gdt_bsp_64 - bsp_base)(%esi)
+
+	/* Enable PAE mode. Note that this does not actually take effect
+	 * until paging is enabled */
+	movl	%cr4, %eax
+	orl	$(X86_CR4_PAE), %eax
+	movl	%eax, %cr4
+
+	/* Initialize Page tables to 0 */
+	leal	(pgtable_bsp - bsp_base)(%esi), %edi
+	xorl	%eax, %eax
+	movl	$((4096*6)/4), %ecx
+	rep	stosl
+
+	/* Build Level 4 */
+	leal	(pgtable_bsp - bsp_base)(%esi), %edi
+	leal	0x1007 (%edi), %eax
+	movl	%eax, 0(%edi)
+
+	/* Build Level 3 */
+	leal	(pgtable_bsp - bsp_base + 0x1000)(%esi), %edi
+	leal	0x1007(%edi), %eax
+	movl	$4, %ecx
+1:	movl	%eax, 0x00(%edi)
+	addl	$0x00001000, %eax
+	addl	$8, %edi
+	decl	%ecx
+	jnz	1b
+
+	/* Build Level 2 */
+	leal	(pgtable_bsp - bsp_base + 0x2000)(%esi), %edi
+	movl	$0x00000183, %eax
+	movl	$2048, %ecx
+1:	movl	%eax, 0(%edi)
+	addl	$0x00200000, %eax
+	addl	$8, %edi
+	decl	%ecx
+	jnz	1b
+
+	/* Enable the boot page tables */
+	leal	(pgtable_bsp - bsp_base)(%esi), %eax
+	movl	%eax, %cr3
+
+	/* Enable Long mode in EFER (Extended Feature Enable Register) */
+	movl	$MSR_EFER, %ecx
+	rdmsr
+	btsl	$_EFER_LME, %eax
+	wrmsr
+
+	/*
+	 * Setup for the jump to 64bit mode
+	 *
+	 * When the jump is performed we will be in long mode but
+	 * in 32bit compatibility mode with EFER.LME = 1, CS.L = 0, CS.D = 1
+	 * (and in turn EFER.LMA = 1). To jump into 64bit mode we use
+	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
+	 * We place all of the values on our mini stack so lret can
+	 * be used to perform that far jump.
+ */ + pushl $__KERNEL_CS + leal (startup_64_bsp - bsp_base)(%esi), %eax + pushl %eax + + /* Enter paged protected Mode, activating Long Mode */ + movl $(X86_CR0_PG | X86_CR0_PE), %eax /* Enable Paging and Prot= ected mode */ + movl %eax, %cr0 + + /* Jump from 32bit compatibility mode into 64bit mode. */ + lret + + .code64 + .balign 4 +startup_64_bsp: + + /* Get physical address of boot_params structure */ + movq (boot_params_phys_addr - bsp_base)(%rsi), %r15 + + /* Load kernel address into register */ + movq (kernel_phys_addr - bsp_base)(%rsi), %r14 + + /* Check whether the kernel is in the 4 GB we mapped already, + * and if not, add an additional mapping */ + movq $0xffffffff00000000, %r8 + testq %r8, %r14 + je 2f + + /* If we got here, we need to identity-map an additional 1 GB */ +=09 + /* Mask off to figure out what our directory pointer should be */ + movq %r14, %r13 + movq $0xffffffffc0000000, %r12 + andq %r12, %r13 + + /* Set our PDPTE */ + movq %r13, %r11 + shrq $(30-3), %r11 + leaq (pgtable_bsp - bsp_base + 0x1000)(%rsi), %rdi + addq %r11, %rdi + leaq (pgtable_extra_bsp - bsp_base + 0x7)(%rsi), %rax + movq %rax, 0(%rdi) + + /* Populate the page directory */ + leaq (pgtable_extra_bsp - bsp_base)(%rsi), %rdi + movq $0x00000183, %rax + addq %r13, %rax + movq $512, %rcx +1: movq %rax, 0(%rdi) + addq $0x00200000, %rax + addq $8, %rdi + decq %rcx + jnz 1b + + /* Set esi to point to the boot_params structure */ +2: movq %r15, %rsi + jmp *%r14 + + .align 8 +SYM_DATA(boot_params_phys_addr, .quad 0) + + .align 8 +SYM_DATA(kernel_phys_addr, .quad 0) + + .code16 + .balign 4 + # Careful these need to be in the same 64K segment as the above; +tidt_bsp: + .word 0 # idt limit =3D 0 + .word 0, 0 # idt base =3D 0L + + # Duplicate the global descriptor table + # so the kernel can live anywhere + .balign 4 +tgdt_bsp: + .short tgdt_bsp_end - tgdt_bsp # gdt limit + .long tgdt_bsp - bsp_base + .short 0 + .quad 0x00cf9b000000ffff # __KERNEL32_CS + .quad 0x00af9b000000ffff # 
__KERNEL_CS + .quad 0x00cf93000000ffff # __KERNEL_DS +tgdt_bsp_end: + + .code64 + .balign 4 +gdt_bsp_64: + .word gdt_bsp_64_end - gdt_bsp_64 + .long gdt_bsp_64 - bsp_base + .word 0 + .quad 0x0000000000000000 /* NULL descriptor */ + .quad 0x00af9a000000ffff /* __KERNEL_CS */ + .quad 0x00cf92000000ffff /* __KERNEL_DS */ + .quad 0x0080890000000000 /* TS descriptor */ + .quad 0x0000000000000000 /* TS continued */ +gdt_bsp_64_end: + + .code16 + .balign 4 +startup_32_vector_bsp: + .long startup_32_bsp - bsp_base + .word __KERNEL32_CS, 0 + + .balign 4 +startup_64_vector_bsp: + .long startup_64_bsp - bsp_base + .word __KERNEL_CS, 0 + + .balign 4 +SYM_DATA(trampoline_status_bsp, .long 0) + + .balign 4 +SYM_DATA(trampoline_location, .quad 0) + +trampoline_stack_bsp: + .fill 512,8,0 +trampoline_stack_bsp_end: + +SYM_DATA(trampoline_bsp_end) + +/* + * Space for page tables (not in .bss so not zeroed) + */ + .balign 4096 +pgtable_bsp: + .fill 6*4096, 1, 0 +pgtable_extra_bsp: + .fill 1*4096, 1, 0 + diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 4fa0be732af1..86f4fd37dc18 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -231,6 +231,12 @@ SECTIONS =20 INIT_DATA_SECTION(16) =20 + .x86_trampoline_bsp : AT(ADDR(.x86_trampoline_bsp) - LOAD_OFFSET)= { + x86_trampoline_bsp_start =3D .; + *(.x86_trampoline_bsp) + x86_trampoline_bsp_end =3D .; + } + .x86_cpu_dev.init : AT(ADDR(.x86_cpu_dev.init) - LOAD_OFFSET) { __x86_cpu_dev_start =3D .; *(.x86_cpu_dev.init) --=20 2.34.1 From nobody Thu Oct 2 07:44:01 2025 Received: from mail-pf1-f180.google.com (mail-pf1-f180.google.com [209.85.210.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CD68D3009F7 for ; Thu, 18 Sep 2025 22:26:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.180 ARC-Seal: i=1; a=rsa-sha256; 
From: Cong Wang
To: linux-kernel@vger.kernel.org
Cc: pasha.tatashin@soleen.com, Cong Wang, Andrew Morton, Baoquan He,
 Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec@lists.infradead.org,
 linux-mm@kvack.org
Subject: [RFC Patch 3/7] x86: Introduce MULTIKERNEL_VECTOR for inter-kernel communication
Date: Thu, 18 Sep 2025 15:26:02 -0700
Message-Id: <20250918222607.186488-4-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
References: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Cong Wang

This patch adds a dedicated IPI vector (0xea) for multikernel
communication, enabling different kernel instances running on separate
CPUs to send interrupts to each other.

The implementation includes:
- MULTIKERNEL_VECTOR definition at interrupt vector 0xea
- IDT entry declaration and registration for sysvec_multikernel
- Interrupt handler sysvec_multikernel() with proper APIC EOI and IRQ
  statistics tracking
- Placeholder generic_multikernel_interrupt() function for extensible
  multikernel interrupt handling

This vector provides the foundational interrupt mechanism required for
implementing inter-kernel communication protocols in multikernel
environments, where heterogeneous kernel instances coordinate while
maintaining CPU-level isolation.
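
For readers who want to follow the flow described above without a kernel
tree at hand, here is a rough userspace model. It is not kernel code: a
table indexed by vector number stands in for the IDT, and plain counters
stand in for apic_eoi() and the IRQ statistics. The names idt_model,
model_register_vector(), model_deliver(), eoi_count, irq_count and
payload_runs are made up for this sketch; only the vector value 0xea and
the EOI, count, dispatch order are taken from this patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS         256
#define MULTIKERNEL_VECTOR 0xea	/* value taken from this patch */

typedef void (*vector_handler_t)(void);

/* Hypothetical stand-ins for the IDT, the APIC EOI and the IRQ stats. */
vector_handler_t idt_model[NR_VECTORS];
unsigned int eoi_count;
unsigned int irq_count;
unsigned int payload_runs;

/* Plays the role of generic_multikernel_interrupt(). */
void model_generic_multikernel_interrupt(void)
{
	payload_runs++;
}

/* Same order as sysvec_multikernel(): EOI, statistics, then dispatch. */
void model_sysvec_multikernel(void)
{
	eoi_count++;	/* apic_eoi() in the real handler */
	irq_count++;	/* inc_irq_stat(irq_call_count) in the real handler */
	model_generic_multikernel_interrupt();
}

/* Install a handler; returns 0 on success, -1 on a bad vector or NULL. */
int model_register_vector(unsigned int vector, vector_handler_t fn)
{
	if (vector >= NR_VECTORS || !fn)
		return -1;
	idt_model[vector] = fn;
	return 0;
}

/* Deliver a vector; returns 0 if a handler ran, -1 if the slot is empty. */
int model_deliver(unsigned int vector)
{
	if (vector >= NR_VECTORS || !idt_model[vector])
		return -1;
	idt_model[vector]();
	return 0;
}
```

In this model, delivering 0xea before registration fails, and after
model_register_vector(MULTIKERNEL_VECTOR, model_sysvec_multikernel) each
delivery bumps all three counters once.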

Signed-off-by: Cong Wang
---
 arch/x86/include/asm/idtentry.h    |  1 +
 arch/x86/include/asm/irq_vectors.h |  1 +
 arch/x86/kernel/idt.c              |  1 +
 arch/x86/kernel/smp.c              | 12 ++++++++++++
 4 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index a4ec27c67988..219ee36def33 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -708,6 +708,7 @@ DECLARE_IDTENTRY(RESCHEDULE_VECTOR, sysvec_reschedule_ipi);
 DECLARE_IDTENTRY_SYSVEC(REBOOT_VECTOR, sysvec_reboot);
 DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_SINGLE_VECTOR, sysvec_call_function_single);
 DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_VECTOR, sysvec_call_function);
+DECLARE_IDTENTRY_SYSVEC(MULTIKERNEL_VECTOR, sysvec_multikernel);
 #else
 # define fred_sysvec_reschedule_ipi	NULL
 # define fred_sysvec_reboot	NULL
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 47051871b436..478e2e2d188a 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -102,6 +102,7 @@
  * the host kernel.
  */
 #define POSTED_MSI_NOTIFICATION_VECTOR	0xeb
+#define MULTIKERNEL_VECTOR		0xea
 
 #define NR_VECTORS	256
 
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index f445bec516a0..063b330d9fbf 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -135,6 +135,7 @@ static const __initconst struct idt_data apic_idts[] = {
 	INTG(RESCHEDULE_VECTOR, asm_sysvec_reschedule_ipi),
 	INTG(CALL_FUNCTION_VECTOR, asm_sysvec_call_function),
 	INTG(CALL_FUNCTION_SINGLE_VECTOR, asm_sysvec_call_function_single),
+	INTG(MULTIKERNEL_VECTOR, asm_sysvec_multikernel),
 	INTG(REBOOT_VECTOR, asm_sysvec_reboot),
 #endif
 
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index b014e6d229f9..028cc423a772 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -272,6 +272,18 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_call_function_single)
 	trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
 }
 
+static void generic_multikernel_interrupt(void)
+{
+	pr_info("Multikernel interrupt\n");
+}
+
+DEFINE_IDTENTRY_SYSVEC(sysvec_multikernel)
+{
+	apic_eoi();
+	inc_irq_stat(irq_call_count);
+	generic_multikernel_interrupt();
+}
+
 static int __init nonmi_ipi_setup(char *str)
 {
 	smp_no_nmi_ipi = true;
-- 
2.34.1

From nobody Thu Oct 2 07:44:01 2025
From: Cong Wang
To: linux-kernel@vger.kernel.org
Cc: pasha.tatashin@soleen.com, Cong Wang, Andrew Morton, Baoquan He,
 Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec@lists.infradead.org,
 linux-mm@kvack.org
Subject: [RFC Patch 4/7] kernel: Introduce generic multikernel IPI communication framework
Date: Thu, 18 Sep 2025 15:26:03 -0700
Message-Id: <20250918222607.186488-5-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
References: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Cong Wang

This patch implements a comprehensive IPI-based communication system
for multikernel environments, enabling data exchange between different
kernel instances running on separate CPUs.

Key features include:
- Generic IPI handler registration and callback mechanism allowing
  modules to register for multikernel communication events
- Shared memory infrastructure using either boot parameter-specified
  or dynamically allocated physical memory regions
- Per-CPU data buffers in shared memory for efficient IPI payload
  transfer up to 256 bytes per message
- IRQ work integration for safe callback execution in interrupt context
- PFN-based flexible shared memory APIs for page-level data sharing
- Resource tracking integration for /proc/iomem visibility

The implementation provides multikernel_send_ipi_data() for sending
typed data to target CPUs and multikernel_register_handler() for
receiving notifications.
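
To make the send/receive pairing concrete, here is a small userspace
model of the mailbox idea. It is only a sketch under stated assumptions,
not the kernel implementation: struct mk_ipi_data and MK_MAX_DATA_SIZE
follow the definitions in this patch, while MODEL_NR_CPUS, model_slots,
model_register_handler(), model_send_ipi_data(), model_on_message() and
struct model_seen are hypothetical names. The model takes the sender CPU
as an explicit argument and calls the receiver directly where the real
code would use smp_processor_id() and raise MULTIKERNEL_VECTOR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MK_MAX_DATA_SIZE 256	/* per-message limit from this patch */
#define MODEL_NR_CPUS    4	/* hypothetical, kept small for the sketch */

struct mk_ipi_data {
	int sender_cpu;			/* which CPU sent this message */
	unsigned int type;		/* user-defined type identifier */
	size_t data_size;		/* number of valid bytes in buffer */
	char buffer[MK_MAX_DATA_SIZE];
};

typedef void (*mk_ipi_callback_t)(struct mk_ipi_data *data, void *ctx);

/* One mailbox slot per target CPU, standing in for the shared region. */
struct mk_ipi_data model_slots[MODEL_NR_CPUS];

/* A single registered callback is enough for the sketch. */
mk_ipi_callback_t model_cb;
void *model_ctx;

int model_register_handler(mk_ipi_callback_t cb, void *ctx)
{
	if (!cb)
		return -1;
	model_cb = cb;
	model_ctx = ctx;
	return 0;
}

/* Fill the target's slot, then "raise the IPI" by calling the receiver. */
int model_send_ipi_data(int from_cpu, int to_cpu, const void *data,
			size_t size, unsigned int type)
{
	struct mk_ipi_data *slot;

	if (to_cpu < 0 || to_cpu >= MODEL_NR_CPUS || size > MK_MAX_DATA_SIZE)
		return -1;

	slot = &model_slots[to_cpu];
	slot->sender_cpu = from_cpu;
	slot->type = type;
	slot->data_size = size;
	if (data && size > 0)
		memcpy(slot->buffer, data, size);

	/* The real code sends MULTIKERNEL_VECTOR here; the model calls directly. */
	if (model_cb)
		model_cb(slot, model_ctx);
	return 0;
}

/* Example receiver: records the header fields of the last message. */
struct model_seen {
	unsigned int type;
	size_t len;
	int sender;
};

void model_on_message(struct mk_ipi_data *data, void *ctx)
{
	struct model_seen *seen = ctx;

	seen->type = data->type;
	seen->len = data->data_size;
	seen->sender = data->sender_cpu;
}
```

A caller in this model registers model_on_message() with a struct
model_seen as context, sends a short payload to CPU 1, and then reads the
recorded type, length and sender back from the context. Because each
target CPU has exactly one slot, a second message to the same CPU
overwrites the first, which is also how the per-CPU buffers in this patch
behave.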
Shared memory is established during early boot and mapped
using memremap() for cache-coherent access.

This infrastructure enables heterogeneous computing scenarios where
multikernel instances can coordinate and share data while maintaining
isolation on their respective CPU cores.

Note, as a proof-of-concept, we have only implemented the x86 part.

Signed-off-by: Cong Wang
---
 arch/x86/kernel/smp.c       |   5 +-
 include/linux/multikernel.h |  81 ++++++++++
 init/main.c                 |   2 +
 kernel/Makefile             |   2 +-
 kernel/multikernel.c        | 313 ++++++++++++++++++++++++++++++++++++
 5 files changed, 398 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/multikernel.h
 create mode 100644 kernel/multikernel.c

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 028cc423a772..3ee515e32383 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -272,10 +272,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_call_function_single)
 	trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
 }
 
-static void generic_multikernel_interrupt(void)
-{
-	pr_info("Multikernel interrupt\n");
-}
+void generic_multikernel_interrupt(void);
 
 DEFINE_IDTENTRY_SYSVEC(sysvec_multikernel)
 {
diff --git a/include/linux/multikernel.h b/include/linux/multikernel.h
new file mode 100644
index 000000000000..12ed5e03f92e
--- /dev/null
+++ b/include/linux/multikernel.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2025 Multikernel Technologies, Inc. All rights reserved
+ */
+#ifndef _LINUX_MULTIKERNEL_H
+#define _LINUX_MULTIKERNEL_H
+
+#include
+#include
+
+/*
+ * Multikernel IPI interface
+ *
+ * This header provides declarations for the multikernel IPI interface,
+ * allowing modules to register callbacks for IPI events and pass data
+ * between CPUs.
+ */ + +/* Maximum data size that can be transferred via IPI */ +#define MK_MAX_DATA_SIZE 256 + +/* Data structure for passing parameters via IPI */ +struct mk_ipi_data { + int sender_cpu; /* Which CPU sent this IPI */ + unsigned int type; /* User-defined type identifier */ + size_t data_size; /* Size of the data */ + char buffer[MK_MAX_DATA_SIZE]; /* Actual data buffer */ +}; + +/* Function pointer type for IPI callbacks */ +typedef void (*mk_ipi_callback_t)(struct mk_ipi_data *data, void *ctx); + +struct mk_ipi_handler { + mk_ipi_callback_t callback; + void *context; + struct mk_ipi_handler *next; + struct mk_ipi_data *saved_data; + struct irq_work work; +}; + +/** + * multikernel_register_handler - Register a callback for multikernel IPI + * @callback: Function to call when IPI is received + * @ctx: Context pointer passed to the callback + * + * Returns pointer to handler on success, NULL on failure + */ +struct mk_ipi_handler *multikernel_register_handler(mk_ipi_callback_t call= back, void *ctx); + +/** + * multikernel_unregister_handler - Unregister a multikernel IPI callback + * @handler: Handler pointer returned from multikernel_register_handler + */ +void multikernel_unregister_handler(struct mk_ipi_handler *handler); + +/** + * multikernel_send_ipi_data - Send data to another CPU via IPI + * @cpu: Target CPU + * @data: Pointer to data to send + * @data_size: Size of data + * @type: User-defined type identifier + * + * This function copies the data to per-CPU storage and sends an IPI + * to the target CPU. 
+ * + * Returns 0 on success, negative error code on failure + */ +int multikernel_send_ipi_data(int cpu, void *data, size_t data_size, unsig= ned long type); + +void generic_multikernel_interrupt(void); + +int __init multikernel_init(void); + +/* Flexible shared memory APIs (PFN-based) */ +int mk_send_pfn(int target_cpu, unsigned long pfn); +int mk_receive_pfn(struct mk_ipi_data *data, unsigned long *out_pfn); +void *mk_receive_map_page(struct mk_ipi_data *data); + +#define mk_receive_unmap_page(p) memunmap(p) + +#endif /* _LINUX_MULTIKERNEL_H */ diff --git a/init/main.c b/init/main.c index 5753e9539ae6..46a199bcb389 100644 --- a/init/main.c +++ b/init/main.c @@ -103,6 +103,7 @@ #include #include #include +#include #include =20 #include @@ -955,6 +956,7 @@ void start_kernel(void) vfs_caches_init_early(); sort_main_extable(); trap_init(); + multikernel_init(); mm_core_init(); maple_tree_init(); poking_init(); diff --git a/kernel/Makefile b/kernel/Makefile index c60623448235..e5216610a4e7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -10,7 +10,7 @@ obj-y =3D fork.o exec_domain.o panic.o \ extable.o params.o \ kthread.o sys_ni.o nsproxy.o \ notifier.o ksysfs.o cred.o reboot.o \ - async.o range.o smpboot.o ucount.o regset.o ksyms_common.o + async.o range.o smpboot.o ucount.o regset.o ksyms_common.o multikerne= l.o =20 obj-$(CONFIG_MULTIUSER) +=3D groups.o obj-$(CONFIG_VHOST_TASK) +=3D vhost_task.o diff --git a/kernel/multikernel.c b/kernel/multikernel.c new file mode 100644 index 000000000000..74e2f84b7914 --- /dev/null +++ b/kernel/multikernel.c @@ -0,0 +1,313 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2025 Multikernel Technologies, Inc. 
All rights reserved + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Memory parameters for shared region */ +#define MK_IPI_DATA_SIZE (sizeof(struct mk_ipi_data) * NR_CPUS) +#define MK_MEM_BASE_SIZE (sizeof(struct mk_shared_data)) +#define MK_MEM_SIZE (MK_MEM_BASE_SIZE + PAGE_SIZE) + +/* Boot parameter for physical address */ +static unsigned long mk_phys_addr_param; + +/* Parse multikernel physical address from kernel command line */ +static int __init multikernel_phys_addr_setup(char *str) +{ + return kstrtoul(str, 0, &mk_phys_addr_param); +} +early_param("mk_shared_memory", multikernel_phys_addr_setup); + +/* Allocated/assigned physical address for shared memory */ +static phys_addr_t mk_phys_addr_base; + +/* Resource structure for tracking the memory in /proc/iomem */ +static struct resource mk_mem_res __ro_after_init =3D { + .name =3D "Multikernel Shared Memory", + .flags =3D IORESOURCE_MEM | IORESOURCE_BUSY, +}; + +/* Shared memory structures */ +struct mk_shared_data { + struct mk_ipi_data cpu_data[NR_CPUS]; /* Data area for each CPU */ +}; + +/* Pointer to the shared memory area (remapped virtual address) */ +static struct mk_shared_data *mk_shared_mem; + +/* Callback management */ +static struct mk_ipi_handler *mk_handlers; +static raw_spinlock_t mk_handlers_lock =3D __RAW_SPIN_LOCK_UNLOCKED(mk_han= dlers_lock); + +static void handler_work(struct irq_work *work) +{ + struct mk_ipi_handler *handler =3D container_of(work, struct mk_ipi_ha= ndler, work); + if (handler->callback) + handler->callback(handler->saved_data, handler->context); +} + +/** + * multikernel_register_handler - Register a callback for multikernel IPI + * @callback: Function to call when IPI is received + * @ctx: Context pointer passed to the callback + * + * Returns pointer to handler on success, NULL on failure + */ +struct mk_ipi_handler *multikernel_register_handler(mk_ipi_callback_t call= back, void *ctx) +{ + 
struct mk_ipi_handler *handler; + unsigned long flags; + + if (!callback) + return NULL; + + handler =3D kzalloc(sizeof(*handler), GFP_KERNEL); + if (!handler) + return NULL; + + handler->callback =3D callback; + handler->context =3D ctx; + + init_irq_work(&handler->work, handler_work); + + raw_spin_lock_irqsave(&mk_handlers_lock, flags); + handler->next =3D mk_handlers; + mk_handlers =3D handler; + raw_spin_unlock_irqrestore(&mk_handlers_lock, flags); + + return handler; +} +EXPORT_SYMBOL(multikernel_register_handler); + +/** + * multikernel_unregister_handler - Unregister a multikernel IPI callback + * @handler: Handler pointer returned from multikernel_register_handler + */ +void multikernel_unregister_handler(struct mk_ipi_handler *handler) +{ + struct mk_ipi_handler **pp, *p; + unsigned long flags; + + if (!handler) + return; + + raw_spin_lock_irqsave(&mk_handlers_lock, flags); + pp =3D &mk_handlers; + while ((p =3D *pp) !=3D NULL) { + if (p =3D=3D handler) { + *pp =3D p->next; + break; + } + pp =3D &p->next; + } + raw_spin_unlock_irqrestore(&mk_handlers_lock, flags); + + /* Wait for pending work to complete */ + irq_work_sync(&handler->work); + kfree(p); +} +EXPORT_SYMBOL(multikernel_unregister_handler); + +/** + * multikernel_send_ipi_data - Send data to another CPU via IPI + * @cpu: Target CPU + * @data: Pointer to data to send + * @data_size: Size of data + * @type: User-defined type identifier + * + * This function copies the data to per-CPU storage and sends an IPI + * to the target CPU. 
+ *
+ * Returns 0 on success, negative error code on failure
+ */
+int multikernel_send_ipi_data(int cpu, void *data, size_t data_size, unsigned long type)
+{
+	struct mk_ipi_data *target;
+
+	if (cpu < 0 || cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (data_size > MK_MAX_DATA_SIZE)
+		return -EINVAL; /* Data too large for buffer */
+
+	/* Ensure shared memory is initialized */
+	if (!mk_shared_mem)
+		return -ENOMEM;
+
+	/* Get target CPU's data area from shared memory */
+	target = &mk_shared_mem->cpu_data[cpu];
+
+	/* Set header information */
+	target->data_size = data_size;
+	target->sender_cpu = smp_processor_id();
+	target->type = type;
+
+	/* Copy the actual data into the buffer */
+	if (data && data_size > 0)
+		memcpy(target->buffer, data, data_size);
+
+	/* Send IPI to target CPU */
+	__apic_send_IPI(cpu, MULTIKERNEL_VECTOR);
+
+	return 0;
+}
+EXPORT_SYMBOL(multikernel_send_ipi_data);
+
+/**
+ * multikernel_interrupt_handler - Handle the multikernel IPI
+ *
+ * This function is called when a multikernel IPI is received.
+ * It invokes all registered callbacks with the per-CPU data.
+ */
+static void multikernel_interrupt_handler(void)
+{
+	struct mk_ipi_data *data;
+	struct mk_ipi_handler *handler;
+	int current_cpu = smp_processor_id();
+
+	/* Ensure shared memory is initialized */
+	if (!mk_shared_mem) {
+		pr_err("Multikernel IPI received but shared memory not initialized\n");
+		return;
+	}
+
+	/* Get this CPU's data area from shared memory */
+	data = &mk_shared_mem->cpu_data[current_cpu];
+
+	pr_debug("Multikernel IPI received on CPU %d from CPU %d, type=%u\n",
+		 current_cpu, data->sender_cpu, data->type);
+
+	raw_spin_lock(&mk_handlers_lock);
+	for (handler = mk_handlers; handler; handler = handler->next) {
+		handler->saved_data = data;
+		irq_work_queue(&handler->work);
+	}
+	raw_spin_unlock(&mk_handlers_lock);
+}
+
+/**
+ * Generic multikernel interrupt handler - called by the IPI vector
+ *
+ * This is the function that gets called by the IPI vector handler.
+ */
+void generic_multikernel_interrupt(void)
+{
+	multikernel_interrupt_handler();
+}
+
+/**
+ * setup_shared_memory - Initialize shared memory for inter-kernel communication
+ *
+ * Maps a fixed physical memory region for sharing IPI data between kernels
+ * Returns 0 on success, negative error code on failure
+ */
+static int __init setup_shared_memory(void)
+{
+	/* Check if a fixed physical address was provided via parameter */
+	if (mk_phys_addr_param) {
+		/* Use the provided physical address */
+		mk_phys_addr_base = (phys_addr_t)mk_phys_addr_param;
+		pr_info("Using specified physical address 0x%llx for multikernel shared memory\n",
+			(unsigned long long)mk_phys_addr_base);
+	} else {
+		/* Dynamically allocate contiguous physical memory using memblock */
+		mk_phys_addr_base = memblock_phys_alloc(MK_MEM_SIZE, PAGE_SIZE);
+		if (!mk_phys_addr_base) {
+			pr_err("Failed to allocate physical memory for multikernel IPI data\n");
+			return -ENOMEM;
+		}
+	}
+
+	/* Map the physical memory region to virtual address space */
+	mk_shared_mem = memremap(mk_phys_addr_base, MK_MEM_SIZE, MEMREMAP_WB);
+	if (!mk_shared_mem) {
+		pr_err("Failed to map shared memory at 0x%llx for multikernel IPI data\n",
+		       (unsigned long long)mk_phys_addr_base);
+
+		/* Only free the memory if we allocated it dynamically */
+		if (!mk_phys_addr_param)
+			memblock_phys_free(mk_phys_addr_base, MK_MEM_SIZE);
+		return -ENOMEM;
+	}
+
+	/* Initialize the memory to zero */
+	memset(mk_shared_mem, 0, sizeof(struct mk_shared_data));
+
+	pr_info("Allocated and mapped multikernel shared memory: phys=0x%llx, virt=%px, size=%lu bytes\n",
+		(unsigned long long)mk_phys_addr_base, mk_shared_mem, MK_MEM_SIZE);
+
+	return 0;
+}
+
+int __init multikernel_init(void)
+{
+	int ret;
+
+	ret = setup_shared_memory();
+	if (ret < 0)
+		return ret;
+
+	pr_info("Multikernel IPI support initialized\n");
+	return 0;
+}
+
+static int __init init_shared_memory(void)
+{
+	/* Set up resource structure for /proc/iomem visibility */
+	mk_mem_res.start = mk_phys_addr_base;
+	mk_mem_res.end = mk_phys_addr_base + MK_MEM_SIZE - 1;
+
+	/* Register the resource in the global resource tree */
+	if (insert_resource(&iomem_resource, &mk_mem_res)) {
+		pr_warn("Could not register multikernel shared memory region in resource tracking\n");
+		/* Continue anyway as this is not fatal */
+		return -1;
+	}
+
+	pr_info("Registered multikernel shared memory in resource tree: 0x%llx-0x%llx\n",
+		(unsigned long long)mk_mem_res.start, (unsigned long long)mk_mem_res.end);
+	return 0;
+}
+core_initcall(init_shared_memory);
+
+/* ---- Flexible shared memory APIs (PFN-based) ---- */
+#define MK_PFN_IPI_TYPE 0x80000001U
+
+/* Send a PFN to another kernel via mk_ipi_data */
+int mk_send_pfn(int target_cpu, unsigned long pfn)
+{
+	return multikernel_send_ipi_data(target_cpu, &pfn, sizeof(pfn), MK_PFN_IPI_TYPE);
+}
+
+/* Receive a PFN from mk_ipi_data. Caller must check type. */
+int mk_receive_pfn(struct mk_ipi_data *data, unsigned long *out_pfn)
+{
+	if (!data || !out_pfn)
+		return -EINVAL;
+	if (data->type != MK_PFN_IPI_TYPE || data->data_size != sizeof(unsigned long))
+		return -EINVAL;
+	*out_pfn = *(unsigned long *)data->buffer;
+	return 0;
+}
+
+void *mk_receive_map_page(struct mk_ipi_data *data)
+{
+	unsigned long pfn;
+	int ret;
+
+	ret = mk_receive_pfn(data, &pfn);
+	if (ret < 0)
+		return NULL;
+	return memremap(pfn << PAGE_SHIFT, PAGE_SIZE, MEMREMAP_WB);
+}
-- 
2.34.1

From nobody Thu Oct 2 07:44:01 2025
From: Cong Wang
To: linux-kernel@vger.kernel.org
Cc: pasha.tatashin@soleen.com, Cong Wang, Andrew Morton, Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec@lists.infradead.org, linux-mm@kvack.org
Subject: [RFC Patch 5/7] x86: Introduce arch_cpu_physical_id() to obtain physical CPU ID
Date: Thu, 18 Sep 2025 15:26:04 -0700
Message-Id: <20250918222607.186488-6-xiyou.wangcong@gmail.com>
In-Reply-To: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
References: <20250918222607.186488-1-xiyou.wangcong@gmail.com>

From: Cong Wang

The traditional smp_processor_id() is a software-defined CPU ID that is
unique only within a single kernel. With the multikernel architecture, we
run multiple Linux kernels on different CPUs, hence the host kernel needs a
globally unique CPU ID to manage the CPUs. The physical CPU ID is perfect
for this case.

This API will be used to globally distinguish CPUs among different
multikernels.

Signed-off-by: Cong Wang
---
 arch/x86/include/asm/smp.h | 6 ++++++
 arch/x86/kernel/smp.c      | 6 ++++++
 kernel/multikernel.c       | 9 +++++----
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 1a59fd0de759..378be65ceafa 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -40,6 +40,7 @@ struct smp_ops {
 
 	void (*send_call_func_ipi)(const struct cpumask *mask);
 	void (*send_call_func_single_ipi)(int cpu);
+	int (*cpu_physical_id)(int cpu);
 };
 
 /* Globals due to paravirt */
@@ -100,6 +101,11 @@ static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 	smp_ops.send_call_func_ipi(mask);
 }
 
+static inline int arch_cpu_physical_id(int cpu)
+{
+	return smp_ops.cpu_physical_id(cpu);
+}
+
 void cpu_disable_common(void);
 void native_smp_prepare_boot_cpu(void);
 void smp_prepare_cpus_common(void);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 3ee515e32383..face9f80e05c 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -289,6 +289,11 @@ static int __init nonmi_ipi_setup(char *str)
 
 __setup("nonmi_ipi", nonmi_ipi_setup);
 
+static int native_cpu_physical_id(int cpu)
+{
+	return cpu_physical_id(cpu);
+}
+
 struct smp_ops smp_ops = {
 	.smp_prepare_boot_cpu = native_smp_prepare_boot_cpu,
 	.smp_prepare_cpus = native_smp_prepare_cpus,
@@ -306,6 +311,7 @@ struct smp_ops smp_ops = {
 
 	.send_call_func_ipi = native_send_call_func_ipi,
 	.send_call_func_single_ipi = native_send_call_func_single_ipi,
+	.cpu_physical_id = native_cpu_physical_id,
 };
 EXPORT_SYMBOL_GPL(smp_ops);
 
diff --git a/kernel/multikernel.c b/kernel/multikernel.c
index 74e2f84b7914..7f6f90485876 100644
--- a/kernel/multikernel.c
+++ b/kernel/multikernel.c
@@ -150,7 +150,7 @@ int multikernel_send_ipi_data(int cpu, void *data, size_t data_size, unsigned lo
 
 	/* Set header information */
 	target->data_size = data_size;
-	target->sender_cpu = smp_processor_id();
+	target->sender_cpu = arch_cpu_physical_id(smp_processor_id());
 	target->type = type;
 
 	/* Copy the actual data into the buffer */
@@ -175,6 +175,7 @@ static void multikernel_interrupt_handler(void)
 	struct mk_ipi_data *data;
 	struct mk_ipi_handler *handler;
 	int current_cpu = smp_processor_id();
+	int current_physical_id = arch_cpu_physical_id(current_cpu);
 
 	/* Ensure shared memory is initialized */
 	if (!mk_shared_mem) {
@@ -183,10 +184,10 @@ static void multikernel_interrupt_handler(void)
 	}
 
 	/* Get this CPU's data area from shared memory */
-	data = &mk_shared_mem->cpu_data[current_cpu];
+	data = &mk_shared_mem->cpu_data[current_physical_id];
 
-	pr_debug("Multikernel IPI received on CPU %d from CPU %d, type=%u\n",
-		 current_cpu, data->sender_cpu, data->type);
+	pr_info("Multikernel IPI received on CPU %d (physical id %d) from CPU %d type=%u\n",
+		current_cpu, current_physical_id, data->sender_cpu, data->type);
 
 	raw_spin_lock(&mk_handlers_lock);
 	for (handler = mk_handlers; handler; handler = handler->next) {
-- 
2.34.1

From nobody Thu Oct 2 07:44:01 2025
From: Cong Wang
To: linux-kernel@vger.kernel.org
Cc: pasha.tatashin@soleen.com, Cong Wang, Andrew Morton, Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec@lists.infradead.org, linux-mm@kvack.org
Subject: [RFC Patch 6/7] kexec: Implement dynamic kimage tracking
Date: Thu, 18 Sep 2025 15:26:05 -0700
Message-Id: <20250918222607.186488-7-xiyou.wangcong@gmail.com>
In-Reply-To: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
References: <20250918222607.186488-1-xiyou.wangcong@gmail.com>

From: Cong Wang

Replace static kexec_image and kexec_crash_image globals with a dynamic
linked list infrastructure to support multiple kernel images. This change
enables multikernel functionality while maintaining backward compatibility.

Key changes:
- Add list_head member to kimage structure for chaining
- Implement thread-safe linked list management with global mutex
- Update kexec load/unload logic to use list-based APIs for multikernel
- Add helper functions for finding and managing multiple kimages
- Preserve existing kexec_image/kexec_crash_image pointers for compatibility
- Update architecture-specific crash handling to use new APIs

The multikernel case now properly uses list-based management instead of
overwriting compatibility pointers, allowing multiple multikernel images
to coexist in the system.
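
As a rough illustration of the list management described above, here is a minimal userspace sketch, not the patch's code. It assumes a pthread mutex in place of the kernel mutex and a hand-rolled circular list in place of list_head; the names struct img, img_add(), img_remove(), img_find() and img_get_all_by_type() are hypothetical stand-ins for the kimage helpers.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical image types, standing in for the KEXEC_TYPE_* values. */
enum img_type { IMG_DEFAULT, IMG_CRASH, IMG_MULTIKERNEL };

/* Hypothetical stand-in for struct kimage with an intrusive list node. */
struct img {
	enum img_type type;
	unsigned long start;		/* entry point */
	struct img *prev, *next;	/* point to the node itself when unlinked */
};

/* Circular list head plus one global mutex protecting every link. */
static struct img img_list = { .prev = &img_list, .next = &img_list };
static pthread_mutex_t img_mutex = PTHREAD_MUTEX_INITIALIZER;

void img_init(struct img *im, enum img_type type, unsigned long start)
{
	im->type = type;
	im->start = start;
	im->prev = im->next = im;
}

/* Append at the tail so iteration order matches load order. */
void img_add(struct img *im)
{
	pthread_mutex_lock(&img_mutex);
	im->prev = img_list.prev;
	im->next = &img_list;
	img_list.prev->next = im;
	img_list.prev = im;
	pthread_mutex_unlock(&img_mutex);
}

/* Safe to call twice: an unlinked node only points to itself. */
void img_remove(struct img *im)
{
	pthread_mutex_lock(&img_mutex);
	im->prev->next = im->next;
	im->next->prev = im->prev;
	im->prev = im->next = im;
	pthread_mutex_unlock(&img_mutex);
}

/* Find the first image of the given type with the given entry point. */
struct img *img_find(enum img_type type, unsigned long start)
{
	struct img *im, *found = NULL;

	pthread_mutex_lock(&img_mutex);
	for (im = img_list.next; im != &img_list; im = im->next) {
		if (im->type == type && im->start == start) {
			found = im;
			break;
		}
	}
	pthread_mutex_unlock(&img_mutex);
	return found;
}

/* Store up to max matching images in out[]; returns how many were stored. */
int img_get_all_by_type(enum img_type type, struct img **out, int max)
{
	struct img *im;
	int n = 0;

	pthread_mutex_lock(&img_mutex);
	for (im = img_list.next; im != &img_list && n < max; im = im->next)
		if (im->type == type)
			out[n++] = im;
	pthread_mutex_unlock(&img_mutex);
	return n;
}
```

In this sketch the mutex only protects the links. A pointer returned by img_find() is no longer covered by the lock once the function returns, so a caller would need some other guarantee that the image is not removed and freed concurrently.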

Signed-off-by: Cong Wang
---
 arch/powerpc/kexec/crash.c |   8 +-
 arch/x86/kernel/crash.c    |   4 +-
 include/linux/kexec.h      |  16 ++++
 kernel/kexec.c             |  62 +++++++++++++-
 kernel/kexec_core.c        | 165 ++++++++++++++++++++++++++++++++++++-
 kernel/kexec_file.c        |  33 +++++++-
 6 files changed, 274 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index a325c1c02f96..af190fad4f22 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -477,13 +477,13 @@ static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *
 	ptr = __va(mem);
 	if (ptr) {
 		/* Temporarily invalidate the crash image while it is replaced */
-		xchg(&kexec_crash_image, NULL);
+		kimage_update_compat_pointers(NULL, KEXEC_TYPE_CRASH);
 
 		/* Replace the old elfcorehdr with newly prepared elfcorehdr */
 		memcpy((void *)ptr, elfbuf, elfsz);
 
 		/* The crash image is now valid once again */
-		xchg(&kexec_crash_image, image);
+		kimage_update_compat_pointers(image, KEXEC_TYPE_CRASH);
 	}
 out:
 	kvfree(cmem);
@@ -537,14 +537,14 @@ static void update_crash_fdt(struct kimage *image)
 	fdt = __va((void *)image->segment[fdt_index].mem);
 
 	/* Temporarily invalidate the crash image while it is replaced */
-	xchg(&kexec_crash_image, NULL);
+	kimage_update_compat_pointers(NULL, KEXEC_TYPE_CRASH);
 
 	/* update FDT to reflect changes in CPU resources */
 	if (update_cpus_node(fdt))
 		pr_err("Failed to update crash FDT");
 
 	/* The crash image is now valid once again */
-	xchg(&kexec_crash_image, image);
+	kimage_update_compat_pointers(image, KEXEC_TYPE_CRASH);
 }
 
 int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c6b12bed173d..fc561d5e058e 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -546,9 +546,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 	 * Temporarily invalidate the crash image while the
 	 * elfcorehdr is updated.
 	 */
-	xchg(&kexec_crash_image, NULL);
+	kimage_update_compat_pointers(NULL, KEXEC_TYPE_CRASH);
 	memcpy_flushcache(old_elfcorehdr, elfbuf, elfsz);
-	xchg(&kexec_crash_image, image);
+	kimage_update_compat_pointers(image, KEXEC_TYPE_CRASH);
 	kunmap_local(old_elfcorehdr);
 	pr_debug("updated elfcorehdr\n");
 
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index a3ae3e561109..3bcbbacc0108 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -428,6 +428,9 @@ struct kimage {
 	/* dm crypt keys buffer */
 	unsigned long dm_crypt_keys_addr;
 	unsigned long dm_crypt_keys_sz;
+
+	/* For multikernel support: linked list node */
+	struct list_head list;
 };
 
 /* kexec interface functions */
@@ -531,6 +534,19 @@ extern bool kexec_file_dbg_print;
 
 extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size);
 extern void kimage_unmap_segment(void *buffer);
+
+/* Multikernel support functions */
+extern struct kimage *kimage_find_by_type(int type);
+extern void kimage_add_to_list(struct kimage *image);
+extern void kimage_remove_from_list(struct kimage *image);
+extern void kimage_update_compat_pointers(struct kimage *new_image, int type);
+extern int kimage_get_all_by_type(int type, struct kimage **images, int max_count);
+extern void kimage_list_lock(void);
+extern void kimage_list_unlock(void);
+extern struct kimage *kimage_find_multikernel_by_entry(unsigned long entry);
+extern struct kimage *kimage_get_multikernel_by_index(int index);
+extern int multikernel_kexec_by_entry(int cpu, unsigned long entry);
+extern void kimage_list_multikernel_images(void);
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 49e62f804674..3d37925ee15a 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -147,7 +147,31 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 
 	if (nr_segments == 0) {
 		/* Uninstall image */
-		kimage_free(xchg(dest_image, NULL));
+		if (flags & KEXEC_ON_CRASH) {
+			struct kimage *old_image = xchg(&kexec_crash_image, NULL);
+			if (old_image) {
+				kimage_remove_from_list(old_image);
+				kimage_free(old_image);
+			}
+		} else if (flags & KEXEC_MULTIKERNEL) {
+			/* For multikernel unload, we need to specify which image to remove */
+			/* For now, remove all multikernel images - this could be enhanced */
+			struct kimage *images[10];
+			int count, i;
+
+			count = kimage_get_all_by_type(KEXEC_TYPE_MULTIKERNEL, images, 10);
+			for (i = 0; i < count; i++) {
+				kimage_remove_from_list(images[i]);
+				kimage_free(images[i]);
+			}
+			pr_info("Unloaded %d multikernel images\n", count);
+		} else {
+			struct kimage *old_image = xchg(&kexec_image, NULL);
+			if (old_image) {
+				kimage_remove_from_list(old_image);
+				kimage_free(old_image);
+			}
+		}
 		ret = 0;
 		goto out_unlock;
 	}
@@ -157,7 +181,11 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 		 * crashes. Free any current crash dump kernel before
 		 * we corrupt it.
 		 */
-		kimage_free(xchg(&kexec_crash_image, NULL));
+		struct kimage *old_crash_image = xchg(&kexec_crash_image, NULL);
+		if (old_crash_image) {
+			kimage_remove_from_list(old_crash_image);
+			kimage_free(old_crash_image);
+		}
 	}
 
 	ret = kimage_alloc_init(&image, entry, nr_segments, segments, flags);
@@ -199,7 +227,35 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 		goto out;
 
 	/* Install the new kernel and uninstall the old */
-	image = xchg(dest_image, image);
+	if (flags & KEXEC_ON_CRASH) {
+		struct kimage *old_image = xchg(&kexec_crash_image, image);
+		if (old_image) {
+			kimage_remove_from_list(old_image);
+			kimage_free(old_image);
+		}
+		if (image) {
+			kimage_add_to_list(image);
+			kimage_update_compat_pointers(image, KEXEC_TYPE_CRASH);
+		}
+		image = NULL; /* Don't free the new image */
+	} else if (flags & KEXEC_MULTIKERNEL) {
+		if (image) {
+			kimage_add_to_list(image);
+			pr_info("Added multikernel image to list (entry: 0x%lx)\n", image->start);
+		}
+		image = NULL; /* Don't free the new image */
+	} else {
+		struct kimage *old_image = xchg(&kexec_image, image);
+		if (old_image) {
+			kimage_remove_from_list(old_image);
+			kimage_free(old_image);
+		}
+		if (image) {
+			kimage_add_to_list(image);
+			kimage_update_compat_pointers(image, KEXEC_TYPE_DEFAULT);
+		}
+		image = NULL; /* Don't free the new image */
+	}
 
 out:
 #ifdef CONFIG_CRASH_DUMP
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 35a66c8dd78b..4e489a7031e6 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -56,6 +56,10 @@ bool kexec_in_progress = false;
 
 bool kexec_file_dbg_print;
 
+/* Linked list of dynamically allocated kimages */
+static LIST_HEAD(kexec_image_list);
+static DEFINE_MUTEX(kexec_image_mutex);
+
 /*
  * When kexec transitions to the new kernel there is a one-to-one
  * mapping between physical and virtual addresses. On processors
@@ -275,6 +279,9 @@ struct kimage *do_kimage_alloc_init(void)
 	/* Initialize the list of unusable pages */
 	INIT_LIST_HEAD(&image->unusable_pages);
 
+	/* Initialize the list node for multikernel support */
+	INIT_LIST_HEAD(&image->list);
+
 #ifdef CONFIG_CRASH_HOTPLUG
 	image->hp_action = KEXEC_CRASH_HP_NONE;
 	image->elfcorehdr_index = -1;
@@ -607,6 +614,13 @@ void kimage_free(struct kimage *image)
 	if (!image)
 		return;
 
+	/* Remove from linked list and update compatibility pointers */
+	kimage_remove_from_list(image);
+	if (image == kexec_image)
+		kimage_update_compat_pointers(NULL, KEXEC_TYPE_DEFAULT);
+	else if (image == kexec_crash_image)
+		kimage_update_compat_pointers(NULL, KEXEC_TYPE_CRASH);
+
 #ifdef CONFIG_CRASH_DUMP
 	if (image->vmcoreinfo_data_copy) {
 		crash_update_vmcoreinfo_safecopy(NULL);
@@ -1123,6 +1137,72 @@ void kimage_unmap_segment(void *segment_buffer)
 	vunmap(segment_buffer);
 }
 
+void kimage_add_to_list(struct kimage *image)
+{
+	mutex_lock(&kexec_image_mutex);
+	list_add_tail(&image->list, &kexec_image_list);
+	mutex_unlock(&kexec_image_mutex);
+}
+
+void kimage_remove_from_list(struct kimage *image)
+{
+	mutex_lock(&kexec_image_mutex);
+	if (!list_empty(&image->list))
+		list_del_init(&image->list);
+	mutex_unlock(&kexec_image_mutex);
+}
+
+struct kimage *kimage_find_by_type(int type)
+{
+	struct kimage *image;
+
+	mutex_lock(&kexec_image_mutex);
+	list_for_each_entry(image, &kexec_image_list, list) {
+		if (image->type == type) {
+			mutex_unlock(&kexec_image_mutex);
+			return image;
+		}
+	}
+	mutex_unlock(&kexec_image_mutex);
+	return NULL;
+}
+
+void kimage_update_compat_pointers(struct kimage *new_image, int type)
+{
+	mutex_lock(&kexec_image_mutex);
+	if (type == KEXEC_TYPE_CRASH) {
+		kexec_crash_image = new_image;
+	} else if (type == KEXEC_TYPE_DEFAULT) {
+		kexec_image = new_image;
+	}
+	mutex_unlock(&kexec_image_mutex);
+}
+
+int kimage_get_all_by_type(int type, struct kimage **images, int max_count)
+{
+	struct kimage *image;
+	int count = 0;
+
+	mutex_lock(&kexec_image_mutex);
+	list_for_each_entry(image, &kexec_image_list, list) {
+		if (image->type == type && count < max_count) {
+			images[count++] = image;
+		}
+	}
+	mutex_unlock(&kexec_image_mutex);
+	return count;
+}
+
+void kimage_list_lock(void)
+{
+	mutex_lock(&kexec_image_mutex);
+}
+
+void kimage_list_unlock(void)
+{
+	mutex_unlock(&kexec_image_mutex);
+}
+
 struct kexec_load_limit {
 	/* Mutex protects the limit count. */
 	struct mutex mutex;
@@ -1139,6 +1219,7 @@ static struct kexec_load_limit load_limit_panic = {
 	.limit = -1,
 };
 
+/* Compatibility: maintain pointers to current default and crash images */
 struct kimage *kexec_image;
 struct kimage *kexec_crash_image;
 static int kexec_load_disabled;
@@ -1339,8 +1420,49 @@ int kernel_kexec(void)
 	return error;
 }
 
+/*
+ * Find a multikernel image by entry point
+ */
+struct kimage *kimage_find_multikernel_by_entry(unsigned long entry)
+{
+	struct kimage *image;
+
+	kimage_list_lock();
+	list_for_each_entry(image, &kexec_image_list, list) {
+		if (image->type == KEXEC_TYPE_MULTIKERNEL && image->start == entry) {
+			kimage_list_unlock();
+			return image;
+		}
+	}
+	kimage_list_unlock();
+	return NULL;
+}
+
+/*
+ * Get multikernel image by index (0-based)
+ */
+struct kimage *kimage_get_multikernel_by_index(int index)
+{
+	struct kimage *image;
+	int count = 0;
+
+	kimage_list_lock();
+	list_for_each_entry(image, &kexec_image_list, list) {
+		if (image->type == KEXEC_TYPE_MULTIKERNEL) {
+			if (count == index) {
+				kimage_list_unlock();
+				return image;
+			}
+			count++;
+		}
+	}
+	kimage_list_unlock();
+	return NULL;
+}
+
 int multikernel_kexec(int cpu)
 {
+	struct kimage *mk_image;
 	int rc;
 
 	pr_info("multikernel kexec: cpu %d\n", cpu);
@@ -1352,13 +1474,52 @@ int multikernel_kexec(int cpu)
 
 	if (!kexec_trylock())
 		return -EBUSY;
-	if (!kexec_image) {
+
+	mk_image = kimage_find_by_type(KEXEC_TYPE_MULTIKERNEL);
+	if (!mk_image) {
+		pr_err("No multikernel image loaded\n");
 		rc = -EINVAL;
 		goto unlock;
 	}
 
+	pr_info("Found multikernel image with entry point: 0x%lx\n", mk_image->start);
+
+	cpus_read_lock();
+	rc = multikernel_kick_ap(cpu, mk_image->start);
+	cpus_read_unlock();
+
+unlock:
+	kexec_unlock();
+	return rc;
+}
+
+int multikernel_kexec_by_entry(int cpu, unsigned long entry)
+{
+	struct kimage *mk_image;
+	int rc;
+
+	pr_info("multikernel kexec: cpu %d, entry 0x%lx\n", cpu, entry);
+
+	if (cpu_online(cpu)) {
+		pr_err("The CPU is currently running with this kernel instance.");
+		return -EBUSY;
+	}
+
+	if (!kexec_trylock())
+		return -EBUSY;
+
+	/* Find the specific multikernel image by entry point */
+	mk_image = kimage_find_multikernel_by_entry(entry);
+	if (!mk_image) {
+		pr_err("No multikernel image found with entry point 0x%lx\n", entry);
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	pr_info("Using multikernel image with entry point: 0x%lx\n", mk_image->start);
+
 	cpus_read_lock();
-	rc = multikernel_kick_ap(cpu, kexec_image->start);
+	rc = multikernel_kick_ap(cpu, mk_image->start);
 	cpus_read_unlock();
 
 unlock:
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 91d46502a817..d4b8831eb59c 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -399,8 +399,13 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	 * same memory where old crash kernel might be loaded. Free any
 	 * current crash dump kernel before we corrupt it.
 	 */
-	if (flags & KEXEC_FILE_ON_CRASH)
-		kimage_free(xchg(&kexec_crash_image, NULL));
+	if (flags & KEXEC_FILE_ON_CRASH) {
+		struct kimage *old_crash_image = xchg(&kexec_crash_image, NULL);
+		if (old_crash_image) {
+			kimage_remove_from_list(old_crash_image);
+			kimage_free(old_crash_image);
+		}
+	}
 
 	ret = kimage_file_alloc_init(&image, kernel_fd, initrd_fd, cmdline_ptr,
 				     cmdline_len, flags);
@@ -456,7 +461,29 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	 */
 	kimage_file_post_load_cleanup(image);
 exchange:
-	image = xchg(dest_image, image);
+	if (image_type == KEXEC_TYPE_CRASH) {
+		struct kimage *old_image = xchg(&kexec_crash_image, image);
+		if (old_image) {
+			kimage_remove_from_list(old_image);
+			kimage_free(old_image);
+		}
+		if (image) {
+			kimage_add_to_list(image);
+			kimage_update_compat_pointers(image, KEXEC_TYPE_CRASH);
+		}
+		image = NULL; /* Don't free the new image */
+	} else {
+		struct kimage *old_image = xchg(&kexec_image, image);
+		if (old_image) {
+			kimage_remove_from_list(old_image);
+			kimage_free(old_image);
+		}
+		if (image) {
+			kimage_add_to_list(image);
+			kimage_update_compat_pointers(image, KEXEC_TYPE_DEFAULT);
+		}
+		image = NULL; /* Don't free the new image */
+	}
 out:
 #ifdef CONFIG_CRASH_DUMP
 	if ((flags & KEXEC_FILE_ON_CRASH) && kexec_crash_image)
-- 
2.34.1

From nobody Thu Oct 2 07:44:01 2025
From: Cong Wang
To: linux-kernel@vger.kernel.org
Cc: pasha.tatashin@soleen.com, Cong Wang, Andrew Morton, Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec@lists.infradead.org, linux-mm@kvack.org
Subject: [RFC Patch 7/7] kexec: Add /proc/multikernel interface for kimage tracking
Date: Thu, 18 Sep 2025 15:26:06 -0700
Message-Id: <20250918222607.186488-8-xiyou.wangcong@gmail.com>
In-Reply-To: <20250918222607.186488-1-xiyou.wangcong@gmail.com>
References: <20250918222607.186488-1-xiyou.wangcong@gmail.com>

From: Cong Wang

Add a dedicated /proc/multikernel file to provide read-only access to all
loaded kernel images in the system.

The interface displays kernel images in a tabular format showing:
- Type: kexec type (default, crash, multikernel)
- Start Address: entry point in hexadecimal format
- Segments: number of memory segments

A lot more information needs to be added here, for example a unique kernel
ID allocated for each kimage. For now, let's focus on the design first.

This interface is particularly useful for debugging multikernel setups,
system monitoring, and verifying that kernel images are loaded correctly.
Signed-off-by: Cong Wang
---
 kernel/kexec_core.c | 63 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 4e489a7031e6..8306c10fc337 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -13,6 +13,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -1224,6 +1226,52 @@ struct kimage *kexec_image;
 struct kimage *kexec_crash_image;
 static int kexec_load_disabled;
 
+/*
+ * Proc interface for /proc/multikernel
+ */
+static int multikernel_proc_show(struct seq_file *m, void *v)
+{
+	struct kimage *image;
+	const char *type_names[] = {
+		[KEXEC_TYPE_DEFAULT] = "default",
+		[KEXEC_TYPE_CRASH] = "crash",
+		[KEXEC_TYPE_MULTIKERNEL] = "multikernel"
+	};
+
+	seq_printf(m, "Type Start Address Segments\n");
+	seq_printf(m, "---------- -------------- --------\n");
+
+	kimage_list_lock();
+	if (list_empty(&kexec_image_list)) {
+		seq_printf(m, "No kimages loaded\n");
+	} else {
+		list_for_each_entry(image, &kexec_image_list, list) {
+			const char *type_name = "unknown";
+
+			if (image->type < ARRAY_SIZE(type_names) && type_names[image->type])
+				type_name = type_names[image->type];
+
+			seq_printf(m, "%-10s 0x%012lx %8lu\n",
+				   type_name, image->start, image->nr_segments);
+		}
+	}
+	kimage_list_unlock();
+
+	return 0;
+}
+
+static int multikernel_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, multikernel_proc_show, NULL);
+}
+
+static const struct proc_ops multikernel_proc_ops = {
+	.proc_open = multikernel_proc_open,
+	.proc_read = seq_read,
+	.proc_lseek = seq_lseek,
+	.proc_release = single_release,
+};
+
 #ifdef CONFIG_SYSCTL
 static int kexec_limit_handler(const struct ctl_table *table, int write,
 			       void *buffer, size_t *lenp, loff_t *ppos)
@@ -1295,6 +1343,21 @@ static int __init kexec_core_sysctl_init(void)
 late_initcall(kexec_core_sysctl_init);
 #endif
 
+static int __init multikernel_proc_init(void)
+{
+	struct proc_dir_entry *entry;
+
+	entry = proc_create("multikernel", 0444, NULL, &multikernel_proc_ops);
+	if (!entry) {
+		pr_err("Failed to create /proc/multikernel\n");
+		return -ENOMEM;
+	}
+
+	pr_debug("Created /proc/multikernel interface\n");
+	return 0;
+}
+late_initcall(multikernel_proc_init);
+
 bool kexec_load_permitted(int kexec_image_type)
 {
 	struct kexec_load_limit *limit;
-- 
2.34.1
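
For anyone wanting to consume the table from userspace, here is a minimal
sketch of a parser for the rows that multikernel_proc_show() prints. It is
not part of the patch. It assumes the row layout stays as in the
seq_printf() format above ("%-10s 0x%012lx %8lu"), which may well change
since this is an RFC. The names mk_row, mk_parse_row and mk_read_table are
hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical userspace view of one /proc/multikernel row. */
struct mk_row {
	char type[16];			/* "default", "crash", "multikernel" or "unknown" */
	unsigned long start;		/* entry point, printed as 0x%012lx */
	unsigned long nr_segments;	/* number of memory segments */
};

/*
 * Parse one data row. Returns 1 on success and fills *row. Returns 0
 * for the header, the separator, "No kimages loaded" and any other
 * line that does not match; *row is left untouched in that case.
 */
static int mk_parse_row(const char *line, struct mk_row *row)
{
	struct mk_row tmp;

	/* %lx accepts the leading "0x" that the kernel side prints. */
	if (sscanf(line, "%15s %lx %lu", tmp.type, &tmp.start,
		   &tmp.nr_segments) != 3)
		return 0;

	*row = tmp;
	return 1;
}

/*
 * Read every data row from an already opened stream (assumed to be
 * /proc/multikernel). Returns the number of rows stored in rows[].
 */
static int mk_read_table(FILE *fp, struct mk_row *rows, int max_rows)
{
	char line[128];
	int n = 0;

	while (n < max_rows && fgets(line, sizeof(line), fp)) {
		if (mk_parse_row(line, &rows[n]))
			n++;
	}
	return n;
}
```

A caller would fopen("/proc/multikernel", "r"), pass the stream to
mk_read_table() and fclose() it afterwards. The header and separator lines
are skipped because they fail the hexadecimal conversion, so the caller
needs no special handling for them or for the empty-list message.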