From nobody Wed Dec 17 12:51:16 2025 Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ED64B21D59F for ; Mon, 10 Mar 2025 12:04:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=178.154.239.72 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608279; cv=none; b=ZWlv9Y4lOHQ3OymTEryan4ZMOF7v9nminj/I9nFN1zerEJUOpw8N3GYob8zH0bCX5ujpgRy+/yMHiVLk32STFW9sYm6ANo+65Up++UL3tiy8lCvTL5zFLqsW2L+AeXO7K2OXR9yp1z7DVcxvM0uGUXKGnGS5xC5BaNlBWPVxMj4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608279; c=relaxed/simple; bh=QggnY5Fc5rnH5a5rz/i951URt03uV3ivnH+MIDlD5RA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aIjFypWg5NZiLz5/5ZgOTj26FRBMM6ynS4EKk5Qrjexm3QJczvCI73jXz9z2oAIA50Dv1w3nOqQagkixhgAmNS8qQEjm4IG13FM12xin8v7W2cNjZ99ezVNQamCxLz/s3oTGyXaBDmT1HlnNN00OBn7DfXp7wpDozJSjhIg+cpk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com; spf=pass smtp.mailfrom=yandex-team.com; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b=YmAdzPZ6; arc=none smtp.client-ip=178.154.239.72 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b="YmAdzPZ6" Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id 03EC360B5B; Mon, 10 Mar 2025 15:04:04 
+0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-wpbfwx9m; Mon, 10 Mar 2025 15:04:02 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608243; bh=1b7amWEdfOMn3dBa5y8kCkl4gC/rL4o8XC5y5Q7zNnY=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=YmAdzPZ6RtAzCKFkm/3kw3wrdqegk5mcA5h8cUxsfgsVjemo8cq/ZFPdS717vlMeI eosrMAMlEZ1IzlszEE9Z4z08VMWAer6yN0gi/IBi+yJJ9S/5qQGFDAuRRnUR03yAJO fX9DMIwj0e2Fd0FPp8DyK3kUZU8pVYI4/J6Kb2vo= Authentication-Results: mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net; dkim=pass header.i=@yandex-team.com From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 1/7] kstate: Add kstate - a mechanism to describe and migrate kernel state across kexec Date: Mon, 10 Mar 2025 13:03:12 +0100 Message-ID: <20250310120318.2124-2-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

KSTATE (kernel state) is a mechanism to describe internal kernel state, save it into memory and restore it in the new kernel after kexec. The end goal and main use case is to be able to update the host kernel underneath VMs with VFIO pass-through devices running on that host.
The idea behind KSTATE resembles QEMU's migration framework [1], which solves a very similar problem: migrating the state of a VM and its emulated devices across different versions of QEMU.

This patch and the following ones try to establish some basic infrastructure to describe and migrate in-kernel data structures.

The state of kernel data (usually some struct) is described by a 'struct kstate_description', which contains an array of individual field descriptions, each a 'struct kstate_field'. Each field has a set of bits in ->flags that tells how to save/restore that field of the struct. For example (see kstate.h for the full list): the KS_BASE_TYPE flag means the field can simply be copied by value; KS_POINTER means the struct member is a pointer to the actual data, so it needs to be dereferenced before saving/restoring data to/from the kstate data stream.

The kstate_register() call accepts a kstate_description along with an instance of an object and registers it in the global 'states' list.

During the kexec reboot phase we go through the list of registered 'kstate_description's, and each instance of a kstate_description forms a 'struct kstate_entry', which is saved into the kstate data stream.

The 'kstate_entry' contains information such as the ID of the kstate_description, its version, the size of the migration data and the data itself. The ->data is formed in accordance with the kstate_field's of the corresponding kstate_description.

After the reboot, when kstate_register() is called, it parses the migration stream, finds the appropriate 'kstate_entry' and restores the contents of the object in accordance with the kstate_description and its ->fields.
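To illustrate the description-driven save/restore outlined above, here is a minimal userspace sketch. It is not the kstate code from this patch: all demo_* names are hypothetical, only the by-value and pointer cases are covered, and the stream is a plain caller-supplied buffer rather than the growable kstate stream.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for the flags described above (hypothetical names). */
enum demo_flags {
	DEMO_BASE_TYPE = 1 << 0,	/* copy the member by value */
	DEMO_POINTER   = 1 << 1,	/* member is a pointer: dereference first */
	DEMO_END       = 1 << 2,	/* terminates the field list (illustrative value) */
};

struct demo_field {
	size_t offset;	/* offset of the member inside the object */
	size_t size;	/* number of data bytes to copy */
	int flags;
};

/* Hypothetical object to migrate: one plain member, one pointed-to value. */
struct demo_obj {
	int counter;
	long *total;
};

const struct demo_field demo_fields[] = {
	{ offsetof(struct demo_obj, counter), sizeof(int),  DEMO_BASE_TYPE },
	{ offsetof(struct demo_obj, total),   sizeof(long), DEMO_POINTER },
	{ 0, 0, DEMO_END },
};

/* Resolve where a field's data lives, following the pointer if needed. */
void *demo_field_data(void *obj, const struct demo_field *f)
{
	char *member = (char *)obj + f->offset;
	void *target;

	if (f->flags & DEMO_POINTER) {
		memcpy(&target, member, sizeof(target));
		return target;
	}
	return member;
}

/* Serialize all described fields; returns bytes written, 0 if buf is too small. */
size_t demo_save(void *obj, const struct demo_field *f,
		 unsigned char *buf, size_t cap)
{
	size_t pos = 0;

	for (; f->flags != DEMO_END; f++) {
		if (pos + f->size > cap)
			return 0;
		memcpy(buf + pos, demo_field_data(obj, f), f->size);
		pos += f->size;
	}
	return pos;
}

/* Copy the fields back from buf into an existing object; returns bytes read. */
size_t demo_restore(void *obj, const struct demo_field *f,
		    const unsigned char *buf)
{
	size_t pos = 0;

	for (; f->flags != DEMO_END; f++) {
		memcpy(demo_field_data(obj, f), buf + pos, f->size);
		pos += f->size;
	}
	return pos;
}
```

Note that in this sketch the pointer case restores into whatever the destination pointer already references, so the destination object has to be set up with valid pointers before demo_restore() is called.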
[1] https://www.qemu.org/docs/master/devel/migration/main.html#vmstate Signed-off-by: Andrey Ryabinin --- include/linux/kstate.h | 178 ++++++++++++++++++++++++++ kernel/Kconfig.kexec | 13 ++ kernel/Makefile | 1 + kernel/kstate.c | 282 +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 474 insertions(+) create mode 100644 include/linux/kstate.h create mode 100644 kernel/kstate.c diff --git a/include/linux/kstate.h b/include/linux/kstate.h new file mode 100644 index 000000000000..4fc01e535bc0 --- /dev/null +++ b/include/linux/kstate.h @@ -0,0 +1,178 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _KSTATE_H +#define _KSTATE_H + +#include +#include +#include +#include + +struct kstate_description; +struct kstate_stream; +struct kimage; + +enum kstate_flags { + + /* + * The struct member at 'obj + kstate_field.offset' is some basic + * type, just copy it by value. The size is kstate_field->size. + */ + + KS_BASE_TYPE =3D (1 << 0), + + /* + * The struct member at 'obj + kstate_field.offset' is a pointer + * to the actual data (e.g. struct a { int *b; }). + * save_kstate() will dereference the pointer to get the actual data + * and store it to the stream. restore_kstate() will copy the data from + * the stream to wherever the pointer points to. + */ + KS_POINTER =3D (1 << 1), + + /* + * The struct member at 'obj + kstate_field.offset' is another struct. + * kstate_field->ksd points to 'kstate_description' of that struct. + */ + KS_STRUCT =3D (1 << 2), + + /* + * Some non-trivial field that requires custom kstate_field->save() + * ->restore() callbacks to save/restore data. + */ + KS_CUSTOM =3D (1 << 3), + + /* + * The field is a array of kstate_field->count() pointers + * (e.g. struct a { uint8_t *b[]; }). Dereference each array entry + * before store/restore data. + */ + KS_ARRAY_OF_POINTER =3D (1 << 4), + + /* + * The field is a pointer to vmemmap or linear memory (determined by + * kstate_field->addr_type). 
This is used for pointers to persistent + * pages/data. Store offset from the start of the area instead of + * pointer itself, so we could defeat KASLR on restore phase (by adding + * new kernel's corresponding offset). + */ + KS_ADDRESS =3D (1 << 5), + + /* Marks the end of fields list */ + KS_END =3D (1UL << 31), +}; + +enum kstate_addr_type { + KS_VMEMMAP_ADDR, + KS_LINEAR_ADDR, +}; + +struct kstate_stream { + void *start; + void *pos; + size_t size; +}; + +struct kstate_field { + const char *name; + size_t offset; + size_t size; + enum kstate_flags flags; + const struct kstate_description *ksd; + enum kstate_addr_type addr_type; + int version_id; + int (*restore)(struct kstate_stream *stream, void *obj, + const struct kstate_field *field); + int (*save)(struct kstate_stream *stream, void *obj, + const struct kstate_field *field); + int (*count)(void); +}; + +enum kstate_ids { + KSTATE_LAST_ID =3D -1, +}; + +struct kstate_description { + const char *name; + enum kstate_ids id; + atomic_t instance_id; + int version_id; + struct list_head state_list; + + const struct kstate_field *fields; +}; + +struct state_entry { + u64 id; + struct list_head list; + struct kstate_description *kstd; + void *obj; +}; + +extern int kstate_save_data(struct kstate_stream *stream, void *val, size_= t size); + +static inline bool kstate_get_byte(struct kstate_stream *stream) +{ + bool ret =3D *(u8 *)stream->pos; + stream->pos++; + return ret; +} + +static inline unsigned long kstate_get_ulong(struct kstate_stream *stream) +{ + unsigned long ret =3D *(unsigned long *)stream->pos; + stream->pos +=3D sizeof(unsigned long); + return ret; +} + +#ifdef CONFIG_KSTATE + +int kstate_save_state(void); +void free_kstate_stream(void); + +int kstate_register(struct kstate_description *state, void *obj); + +struct kstate_entry; +int save_kstate(struct kstate_stream *stream, int id, + const struct kstate_description *kstate, + void *obj); +void restore_kstate(struct kstate_stream *stream, int id, + 
const struct kstate_description *kstate, void *obj); + +#else + +#define kstate_register(state, obj) + +static inline int kstate_save_state(void) { return 0; } +static inline void free_kstate_stream(void) { } + +#endif + + +#define KSTATE_BASE_TYPE(_f, _state, _type) { \ + .name =3D (__stringify(_f)), \ + .size =3D sizeof(_type) + BUILD_BUG_ON_ZERO( \ + !__same_type(typeof_member(_state, _f), _type)),\ + .flags =3D KS_BASE_TYPE, \ + .offset =3D offsetof(_state, _f), \ +} + +#define KSTATE_POINTER(_f, _state) { \ + .name =3D (__stringify(_f)), \ + .size =3D sizeof(*(((_state *)0)->_f)), \ + .flags =3D KS_POINTER, \ + .offset =3D offsetof(_state, _f), \ + } + +#define KSTATE_ADDRESS(_f, _state, _addr_type) { \ + .name =3D (__stringify(_f)), \ + .size =3D sizeof(*(((_state *)0)->_f)), \ + .addr_type =3D (_addr_type), \ + .flags =3D KS_ADDRESS, \ + .offset =3D offsetof(_state, _f), \ + } + +#define KSTATE_END_OF_LIST() { \ + .flags =3D KS_END,\ + } + +#endif diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec index 4d111f871951..480dc156b08b 100644 --- a/kernel/Kconfig.kexec +++ b/kernel/Kconfig.kexec @@ -151,4 +151,17 @@ config CRASH_MAX_MEMORY_RANGES the computation behind the value provided through the /sys/kernel/crash_elfcorehdr_size attribute. =20 +config ARCH_HAS_KSTATE + bool + +config KSTATE + bool "Migrate internal kernel state across kexec" + default n + depends on ARCH_HAS_KSTATE + depends on KEXEC_FILE + help + KSTATE (kernel state) is a mechanism to describe internal kernel + state, save it into the memory and restore the state after kexec + in new kernel. 
+ endmenu diff --git a/kernel/Makefile b/kernel/Makefile index 87866b037fbe..6bdf947fc84f 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -75,6 +75,7 @@ obj-$(CONFIG_CRASH_DUMP) +=3D crash_core.o obj-$(CONFIG_KEXEC) +=3D kexec.o obj-$(CONFIG_KEXEC_FILE) +=3D kexec_file.o obj-$(CONFIG_KEXEC_ELF) +=3D kexec_elf.o +obj-$(CONFIG_KSTATE) +=3D kstate.o obj-$(CONFIG_BACKTRACE_SELF_TEST) +=3D backtracetest.o obj-$(CONFIG_COMPAT) +=3D compat.o obj-$(CONFIG_CGROUPS) +=3D cgroup/ diff --git a/kernel/kstate.c b/kernel/kstate.c new file mode 100644 index 000000000000..a73a9a42e55b --- /dev/null +++ b/kernel/kstate.c @@ -0,0 +1,282 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include +#include +#include + +static LIST_HEAD(states); + +struct kstate_entry { + int state_id; + int version_id; + int instance_id; + int size; + DECLARE_FLEX_ARRAY(u8, data); +}; + +struct kstate_stream kstate_stream; + +static unsigned long get_addr_offset(const struct kstate_field *field) +{ + switch (field->addr_type) { + case KS_VMEMMAP_ADDR: + return VMEMMAP_START; + case KS_LINEAR_ADDR: + return PAGE_OFFSET; + default: + WARN_ON(1); + } + return 0; +} + +static int alloc_space(struct kstate_stream *stream, size_t size) +{ + void *new_start; + size_t new_size; + size_t cur_size =3D stream->pos - stream->start; + + size =3D size + 4; /* Always alloc extra for KSTATE_LAST_ID */ + if (cur_size + size < stream->size) + return 0; + + new_size =3D PAGE_ALIGN(cur_size + size); + + new_start =3D vrealloc(stream->start, new_size, GFP_KERNEL); + if (!new_start) + return -ENOMEM; + + stream->start =3D new_start; + stream->size =3D new_size; + stream->pos =3D stream->start + cur_size; + return 0; +} + +int kstate_save_data(struct kstate_stream *stream, void *val, size_t size) +{ + int ret; + + ret =3D alloc_space(stream, size); + if (ret) + return ret; + memcpy(stream->pos, val, size); + stream->pos +=3D size; + return 0; +} + +int save_kstate(struct kstate_stream *stream, 
int id, + const struct kstate_description *kstate, + void *obj) +{ + const struct kstate_field *field =3D kstate->fields; + struct kstate_entry *ke; + unsigned long ke_off; + int ret =3D 0; + + ret =3D alloc_space(stream, sizeof(*ke)); + if (ret) + goto err; + + ke_off =3D stream->pos - stream->start; + ke =3D stream->pos; + stream->pos +=3D sizeof(*ke); + + ke->state_id =3D kstate->id; + ke->version_id =3D kstate->version_id; + ke->instance_id =3D id; + + while (field->flags !=3D KS_END) { + void *first, *cur; + int n_elems =3D 1; + int size, i; + + first =3D obj + field->offset; + + if (field->flags & KS_POINTER) + first =3D *(void **)(obj + field->offset); + if (field->count) + n_elems =3D field->count(); + size =3D field->size; + for (i =3D 0; i < n_elems; i++) { + cur =3D first + i * size; + + if (field->flags & KS_ARRAY_OF_POINTER) + cur =3D *(void **)cur; + + if (field->flags & KS_STRUCT) { + ret =3D save_kstate(stream, 0, field->ksd, cur); + if (ret) + goto err; + } else if (field->flags & KS_CUSTOM) { + if (field->save) { + ret =3D field->save(stream, cur, field); + if (ret) + goto err; + } + } else if (field->flags & (KS_BASE_TYPE|KS_POINTER)) { + ret =3D kstate_save_data(stream, cur, size); + if (ret) + goto err; + } else if (field->flags & KS_ADDRESS) { + void *addr_offset =3D *(void **)cur + - get_addr_offset(field); + ret =3D kstate_save_data(stream, &addr_offset, + sizeof(addr_offset)); + if (ret) + goto err; + } else + WARN_ON_ONCE(1); + } + field++; + + } + + ke =3D stream->start + ke_off; + ke->size =3D (stream->pos - stream->start) - (ke_off + sizeof(*ke)); +err: + if (ret) + pr_err("kstate: save of state %s failed\n", kstate->name); + + return ret; +} + +static int alloc_kstate_stream(void) +{ + size_t size =3D PAGE_SIZE; + void *buf; + + buf =3D vzalloc(size); + if (!buf) + return -ENOMEM; + + kstate_stream.size =3D size; + kstate_stream.start =3D kstate_stream.pos =3D buf; + return 0; +} + +void free_kstate_stream(void) +{ + 
vfree(kstate_stream.start); + kstate_stream.start =3D NULL; + kstate_stream.size =3D 0; +} + +int kstate_save_state(void) +{ + struct state_entry *se; + struct kstate_entry *ke; + int ret; + + ret =3D alloc_kstate_stream(); + if (ret) + return ret; + + list_for_each_entry(se, &states, list) { + ret =3D save_kstate(&kstate_stream, se->id, se->kstd, se->obj); + if (ret) + return ret; + } + ke =3D kstate_stream.pos; + ke->state_id =3D KSTATE_LAST_ID; + return 0; +} + +void restore_kstate(struct kstate_stream *stream, int id, + const struct kstate_description *kstate, void *obj) +{ + const struct kstate_field *field =3D kstate->fields; + struct kstate_entry *ke =3D stream->pos; + stream->pos =3D ke->data; + + WARN_ONCE(ke->version_id !=3D kstate->version_id, "version mismatch %d %d= \n", + ke->version_id, kstate->version_id); + + WARN_ONCE(ke->instance_id !=3D id, "instance id mismatch %d %d\n", + ke->instance_id, id); + + while (field->flags !=3D KS_END) { + void *first, *cur; + int n_elems =3D 1; + int size, i; + + first =3D obj + field->offset; + if (field->flags & KS_POINTER) + first =3D *(void **)(obj + field->offset); + if (field->count) + n_elems =3D field->count(); + size =3D field->size; + for (i =3D 0; i < n_elems; i++) { + cur =3D first + i * size; + + if (field->flags & KS_ARRAY_OF_POINTER) + cur =3D *(void **)cur; + + if (field->flags & KS_STRUCT) + restore_kstate(stream, 0, field->ksd, cur); + else if (field->flags & KS_CUSTOM) { + if (field->restore) + field->restore(stream, cur, field); + } else if (field->flags & (KS_BASE_TYPE | KS_POINTER)) { + memcpy(cur, stream->pos, size); + stream->pos +=3D size; + } else if (field->flags & KS_ADDRESS) { + *(void **)cur =3D (*(void **)stream->pos) + + get_addr_offset(field); + stream->pos +=3D sizeof(void *); + } else + WARN_ON_ONCE(1); + + } + field++; + } +} + +static void restore_migrate_state(unsigned long kstate_data, + struct state_entry *se) +{ + struct kstate_stream stream; + struct kstate_entry *ke; + + 
if (kstate_data =3D=3D -1) + return; + + ke =3D (struct kstate_entry *)phys_to_virt(kstate_data); + if (WARN_ON_ONCE(ke->state_id =3D=3D 0)) + return; + + stream.start =3D stream.pos =3D ke; + while (ke->state_id !=3D KSTATE_LAST_ID) { + if (ke->state_id !=3D se->kstd->id || + ke->instance_id !=3D se->id) { + ke =3D (struct kstate_entry *)(ke->data + ke->size); + continue; + } + stream.pos =3D ke; + restore_kstate(&stream, se->id, se->kstd, se->obj); + ke =3D (struct kstate_entry *)(ke->data + ke->size); + } +} + +static void __kstate_register(struct kstate_description *state, void *obj, + struct state_entry *se) +{ + se->kstd =3D state; + se->id =3D atomic_inc_return(&state->instance_id); + se->obj =3D obj; + list_add(&se->list, &states); + restore_migrate_state(0 /*migrate_stream_addr*/, se); +} + +int kstate_register(struct kstate_description *state, void *obj) +{ + struct state_entry *se; + + se =3D kmalloc(sizeof(*se), GFP_KERNEL); + if (!se) + return -ENOMEM; + + __kstate_register(state, obj, se); + return 0; +} + --=20 2.45.3 From nobody Wed Dec 17 12:51:16 2025 Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 730B322156F for ; Mon, 10 Mar 2025 12:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=178.154.239.72 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608280; cv=none; b=RSTqAt//cCLveaP9EqJDrJkVeMd6feUYEJgfmx7cXhHBanOWuI4hMuFcrYjUz2SSTbQFu0wPwlHDdZDcIUxdds5xslsXQHFm8hDnkHyManIvKgVgjO7nkf09mPbdevnz7l5RaY0TL5dG0jquWGFd0wyFs0o8FWFm5tVFngk5tnk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608280; c=relaxed/simple; bh=5OH6zwiSw0oNmq1LgVVbHiG2za82nMq8SrllEkAQIHg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=FL6+u3YesI+DvrFQN6C4w2zhCyvlxrwOKuRwTVSgto2CUQGyvQKGqlxdH7GLJhGXcXzxmpPDbWBid9acHhpzJztNLO1VJPJ/ZGAtkAcjhO0kTkbRpFdEV7nK5ytzVh0Gwoy8BK9/xdIYgxJifs/rqUlmEVx2sayGz7SADseQFH8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com; spf=pass smtp.mailfrom=yandex-team.com; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b=N3tQSyH7; arc=none smtp.client-ip=178.154.239.72 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b="N3tQSyH7" Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id 3D4D160DE9; Mon, 10 Mar 2025 15:04:06 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-U5ZTnMcq; Mon, 10 Mar 2025 15:04:05 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608245; bh=QkhEET4Wo+gouJiK8Lgb4vMFbcgAXJpnZW6CjWUXYUI=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=N3tQSyH7q5Aw8vLpj6oIzXJUFol2N5amtQM0FyQ56H5V1ZTHW1L15Ttrhb4B+gc9s kwO45GOraHszy2v9J2VM/BFPgJC3K8JF8XkbIkUICHgn30PaHURZx5W6AKTRCMaz9h /YGOyUFNzrS71N7G5BQA84g6wuvpXWXemUAU3A/g= Authentication-Results: mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net; dkim=pass header.i=@yandex-team.com From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , 
x86@kernel.org, "H . Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 2/7] kstate, kexec, x86: transfer kstate data across kexec Date: Mon, 10 Mar 2025 13:03:13 +0100 Message-ID: <20250310120318.2124-3-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Add the kstate data to the kexec segments so that it gets copied to the new kernel. Use the kernel command line to tell the next kernel the location and size of the kstate data.

Signed-off-by: Andrey Ryabinin
---
I've used the cmdline as it's the simplest way to pass the address to the new kernel. Passing it via dtb would perhaps be a more elegant solution, but I don't have a strong opinion here.
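For reference, a rough userspace sketch of parsing an "<addr>@<size>" value such as the one passed in kstate_stream= on the command line. The function name is hypothetical and strtoull() stands in for the kernel's memparse(), so K/M/G suffixes are not handled here; unlike the in-kernel parser, this sketch also treats a missing "@<size>" part as an error.

```c
#include <stdlib.h>

/* Parse "<addr>@<size>"; returns 0 on success, -1 on malformed input. */
int demo_parse_kstate_stream(const char *arg,
			     unsigned long long *addr,
			     unsigned long long *size)
{
	char *end;

	if (!arg)
		return -1;

	/* Base 0 accepts both decimal and 0x-prefixed hex. */
	*addr = strtoull(arg, &end, 0);
	if (end == arg || *end != '@')
		return -1;

	arg = end + 1;
	*size = strtoull(arg, &end, 0);
	if (end == arg)
		return -1;

	return 0;
}
```

A caller would pass only the part after "kstate_stream=", for example "0x0abc000@8192", and would still need to validate the resulting range before reserving it.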
--- arch/x86/Kconfig | 1 + arch/x86/kernel/kexec-bzimage64.c | 4 +++ arch/x86/kernel/setup.c | 2 ++ include/linux/kexec.h | 2 ++ include/linux/kstate.h | 5 ++++ kernel/kexec_file.c | 5 ++++ kernel/kstate.c | 49 ++++++++++++++++++++++++++++++- 7 files changed, 67 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 0e27ebd7e36a..7358d9e15957 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -90,6 +90,7 @@ config X86 select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOV if X86_64 select ARCH_HAS_KERNEL_FPU_SUPPORT + select ARCH_HAS_KSTATE if X86_64 select ARCH_HAS_MEM_ENCRYPT select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzim= age64.c index 68530fad05f7..d3c98c8bda29 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -77,6 +78,9 @@ static int setup_cmdline(struct kimage *image, struct boo= t_params *params, len =3D sprintf(cmdline_ptr, "elfcorehdr=3D0x%lx ", image->elf_load_addr); } + if (IS_ENABLED(CONFIG_KSTATE)) + len =3D sprintf(cmdline_ptr, "kstate_stream=3D0x0%lx@%ld ", + image->kstate_stream_addr, image->kstate_size); memcpy(cmdline_ptr + len, cmdline, cmdline_len); cmdline_len +=3D len; =20 diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index cebee310e200..b32c141ffcdd 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -992,6 +993,7 @@ void __init setup_arch(char **cmdline_p) =20 memblock_set_current_limit(ISA_END_ADDRESS); e820__memblock_setup(); + kstate_init(); =20 /* * Needs to run after memblock setup because it needs the physical diff --git a/include/linux/kexec.h b/include/linux/kexec.h index f0e9f8eda7a3..bd82f04888a1 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ 
-299,6 +299,8 @@ struct kimage { unsigned long start; struct page *control_code_page; struct page *swap_page; + unsigned long kstate_stream_addr; + size_t kstate_size; void *vmcoreinfo_data_copy; /* locates in the crash memory */ =20 unsigned long nr_segments; diff --git a/include/linux/kstate.h b/include/linux/kstate.h index 4fc01e535bc0..ae583d090111 100644 --- a/include/linux/kstate.h +++ b/include/linux/kstate.h @@ -126,6 +126,8 @@ static inline unsigned long kstate_get_ulong(struct kst= ate_stream *stream) =20 #ifdef CONFIG_KSTATE =20 +void kstate_init(void); + int kstate_save_state(void); void free_kstate_stream(void); =20 @@ -137,14 +139,17 @@ int save_kstate(struct kstate_stream *stream, int id, void *obj); void restore_kstate(struct kstate_stream *stream, int id, const struct kstate_description *kstate, void *obj); +int kstate_load_migrate_buf(struct kimage *image); =20 #else =20 +static inline void kstate_init(void) { } #define kstate_register(state, obj) =20 static inline int kstate_save_state(void) { return 0; } static inline void free_kstate_stream(void) { } =20 +static inline int kstate_load_migrate_buf(struct kimage *image) { return 0= ; } #endif =20 =20 diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 3eedb8c226ad..a024ff379133 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -253,6 +254,10 @@ kimage_file_prepare_segments(struct kimage *image, int= kernel_fd, int initrd_fd, /* IMA needs to pass the measurement list to the next kernel. 
*/ ima_add_kexec_buffer(image); =20 + ret =3D kstate_load_migrate_buf(image); + if (ret) + goto out; + /* Call image load handler */ ldata =3D kexec_image_load_default(image); =20 diff --git a/kernel/kstate.c b/kernel/kstate.c index a73a9a42e55b..d35996287b76 100644 --- a/kernel/kstate.c +++ b/kernel/kstate.c @@ -2,6 +2,7 @@ #include #include #include +#include #include #include #include @@ -182,6 +183,31 @@ int kstate_save_state(void) return 0; } =20 +int kstate_load_migrate_buf(struct kimage *image) +{ + int ret; + struct kexec_buf kbuf =3D { .image =3D image, .buf_min =3D 0, + .buf_max =3D ULONG_MAX, .top_down =3D true }; + + kbuf.bufsz =3D kstate_stream.size; + kbuf.buffer =3D kstate_stream.start; + + kbuf.memsz =3D kstate_stream.size; + + kbuf.buf_align =3D PAGE_SIZE; + kbuf.mem =3D KEXEC_BUF_MEM_UNKNOWN; + ret =3D kexec_add_buffer(&kbuf); + if (ret) + return ret; + image->kstate_stream_addr =3D kbuf.mem; + image->kstate_size =3D kstate_stream.size; + + pr_info("kstate: Loaded mig_stream at 0x%lx bufsz=3D0x%lx memsz=3D0x%lx\n= ", + kbuf.mem, kbuf.bufsz, kbuf.memsz); + + return ret; +} + void restore_kstate(struct kstate_stream *stream, int id, const struct kstate_description *kstate, void *obj) { @@ -258,6 +284,9 @@ static void restore_migrate_state(unsigned long kstate_= data, } } =20 +static unsigned long kstate_stream_addr =3D -1; +static unsigned long kstate_size; + static void __kstate_register(struct kstate_description *state, void *obj, struct state_entry *se) { @@ -265,7 +294,7 @@ static void __kstate_register(struct kstate_description= *state, void *obj, se->id =3D atomic_inc_return(&state->instance_id); se->obj =3D obj; list_add(&se->list, &states); - restore_migrate_state(0 /*migrate_stream_addr*/, se); + restore_migrate_state(kstate_stream_addr, se); } =20 int kstate_register(struct kstate_description *state, void *obj) @@ -280,3 +309,21 @@ int kstate_register(struct kstate_description *state, = void *obj) return 0; } =20 +static int __init 
setup_kstate(char *arg) +{ + char *end; + + if (!arg) + return -EINVAL; + kstate_stream_addr =3D memparse(arg, &end); + if (*end =3D=3D '@') + kstate_size =3D memparse(end + 1, &end); + + return end > arg ? 0 : -EINVAL; +} +early_param("kstate_stream", setup_kstate); + +void __init kstate_init(void) +{ + memblock_reserve(kstate_stream_addr, kstate_size); +} --=20 2.45.3 From nobody Wed Dec 17 12:51:16 2025 Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7CCB8215F49 for ; Mon, 10 Mar 2025 12:04:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=178.154.239.72 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608265; cv=none; b=QZjh5pKBmiI+U+Bk5wun6qo1K8oRY9XA1xVsP8LFbK3IY/IAaizyFkYiifcWp0Tpxs3OOsNGhKQoMKSHXpH7YFu1AUaNi34RBQ/dkD/lZ4CcF2sGKOgQELis2xl/v/OsxmF6cPkX9LmT+r0eieRhcXtBBBD7IaFW74s9PbbWswM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608265; c=relaxed/simple; bh=gYFjsqXR8gnavjmf/To2k/w9el2VyW5ETtk72eDoBLE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TJ5zkJOVz6QvhYOggQYJGdG6Gae7xeUJi9X0fF0tRuR4hh8ydpdRFl8PSxU5wYQfB4hvjE182tOmeD9GdJHGUFWHuCr3TGH0bQTmlYXSvT1jOvGXzad9MaXnruyvmiPYn8umZ/gzkUzHIwyBFVR+hvnJl8+AXD4S7Q0h/Uk9MWM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com; spf=pass smtp.mailfrom=yandex-team.com; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b=Zt+/vVXD; arc=none smtp.client-ip=178.154.239.72 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=yandex-team.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b="Zt+/vVXD" Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id 5958960DCE; Mon, 10 Mar 2025 15:04:08 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-61iTaEl4; Mon, 10 Mar 2025 15:04:07 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608247; bh=dCfHmi7JSqwfRn9hK0doUgOfig37g7tF5ZyIee0j3QQ=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=Zt+/vVXDKpSCu8KDQZJ8u8uEpxL5R4fTTyn3U+HF42BdMdqDXozWrf/zuiX14bZkH oTU0E2qHEUTnKlbABGK7W3Ny4IiGns9mKIZhLDvhYoLMNrAZ/l3Mgu8cIFmpSlrxFX CW2cOcIjkxXIsVbLsjrUPNEkV+dqhsmZyuQeWDOQ= Authentication-Results: mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net; dkim=pass header.i=@yandex-team.com From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . 
Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 3/7] kexec: exclude control pages from the destination addresses Date: Mon, 10 Mar 2025 13:03:14 +0100 Message-ID: <20250310120318.2124-4-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Kexec relies on control pages being allocated only after all destination ranges have been chosen. To be able to preserve memory across kexec, we need to be able to pick destination ranges after the control pages have been allocated.

Add a check for control pages to the locate_mem_hole_top_down() and locate_mem_hole_bottom_up() callbacks so that they skip ranges occupied by control pages. With that, control pages and segments can be allocated in any order.
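The check itself boils down to an overlap test between two inclusive address ranges. A small userspace sketch of that test, with hypothetical demo_* helpers and the page size passed in explicitly rather than taken from PAGE_SIZE:

```c
#include <stdbool.h>

/* True if inclusive ranges [a_start, a_end] and [b_start, b_end] share a byte. */
bool demo_ranges_overlap(unsigned long a_start, unsigned long a_end,
			 unsigned long b_start, unsigned long b_end)
{
	return a_end >= b_start && a_start <= b_end;
}

/* Inclusive end address of a block of 2^order pages starting at pstart. */
unsigned long demo_block_end(unsigned long pstart, unsigned int order,
			     unsigned long page_size)
{
	return pstart + page_size * (1UL << order) - 1;
}
```

Because both ends are inclusive, a candidate hole that ends exactly on the first byte of a control page counts as a conflict, while one that ends on the byte before it does not.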
Signed-off-by: Andrey Ryabinin --- kernel/kexec_core.c | 18 ++++++++++++++++++ kernel/kexec_file.c | 18 ++++-------------- kernel/kexec_internal.h | 3 +++ 3 files changed, 25 insertions(+), 14 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index c0bdc1686154..647ab5705c37 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -264,6 +264,24 @@ int kimage_is_destination_range(struct kimage *image, return 0; } =20 +int kimage_is_control_page(struct kimage *image, + unsigned long start, + unsigned long end) +{ + + struct page *page; + + list_for_each_entry(page, &image->control_pages, lru) { + unsigned long pstart, pend; + pstart =3D page_to_boot_pfn(page) << PAGE_SHIFT; + pend =3D pstart + PAGE_SIZE * (1 << page_private(page)) - 1; + if ((end >=3D pstart) && (start <=3D pend)) + return 1; + } + + return 0; +} + static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order) { struct page *pages; diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index a024ff379133..8ecd34071bfa 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -464,7 +464,8 @@ static int locate_mem_hole_top_down(unsigned long start= , unsigned long end, * Make sure this does not conflict with any of existing * segments */ - if (kimage_is_destination_range(image, temp_start, temp_end)) { + if (kimage_is_destination_range(image, temp_start, temp_end) || + kimage_is_control_page(image, temp_start, temp_end)) { temp_start =3D temp_start - PAGE_SIZE; continue; } @@ -498,7 +499,8 @@ static int locate_mem_hole_bottom_up(unsigned long star= t, unsigned long end, * Make sure this does not conflict with any of existing * segments */ - if (kimage_is_destination_range(image, temp_start, temp_end)) { + if (kimage_is_destination_range(image, temp_start, temp_end) || + kimage_is_control_page(image, temp_start, temp_end)) { temp_start =3D temp_start + PAGE_SIZE; continue; } @@ -671,18 +673,6 @@ int kexec_add_buffer(struct kexec_buf *kbuf) if 
(kbuf->image->nr_segments >=3D KEXEC_SEGMENT_MAX) return -EINVAL; =20 - /* - * Make sure we are not trying to add buffer after allocating - * control pages. All segments need to be placed first before - * any control pages are allocated. As control page allocation - * logic goes through list of segments to make sure there are - * no destination overlaps. - */ - if (!list_empty(&kbuf->image->control_pages)) { - WARN_ON(1); - return -EINVAL; - } - /* Ensure minimum alignment needed for segments. */ kbuf->memsz =3D ALIGN(kbuf->memsz, PAGE_SIZE); kbuf->buf_align =3D max(kbuf->buf_align, PAGE_SIZE); diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h index d35d9792402d..12e655a70e25 100644 --- a/kernel/kexec_internal.h +++ b/kernel/kexec_internal.h @@ -14,6 +14,9 @@ int kimage_load_segment(struct kimage *image, struct kexe= c_segment *segment); void kimage_terminate(struct kimage *image); int kimage_is_destination_range(struct kimage *image, unsigned long start, unsigned long end); +int kimage_is_control_page(struct kimage *image, + unsigned long start, + unsigned long end); =20 /* * Whatever is used to serialize accesses to the kexec_crash_image needs t= o be --=20 2.45.3 From nobody Wed Dec 17 12:51:16 2025 Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6CAA221556 for ; Mon, 10 Mar 2025 12:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=178.154.239.72 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608279; cv=none; b=JjzDMrwmY2BKDsLTfgn4WXSU04Pl1V5B140UPaPAAthJJyQu+0jJ2FTWp3oSbXw78oZRHHtG+6imtK247TxkSxdcjxshFV66QCeEfiYHwP02J1ztbpIYQ7SidtA1EBpI/7KnZ2otrjGOOS2q723EX+bB5LJfOo0ZUELYXRSPBqo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608279; 
c=relaxed/simple; bh=ZiV9jUv1hl43mTkUHz7sC932CxLq046Mkxz7ZY1uga4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CdumMiVLBOpJDKD8rYSvPvq+Sv9zj6kNthHP6+IiV+QCogAPenkxBYMZO1sgUXrUK8TDPD7iUHvXmngEDulubvIF2Fpswzu/6sc1LRGNxg+EPiy9L1GsulUTllED+3yURqBIqp4ssl6DqrmCVvkDJSGV+gfkE4hl/9PYqBHGEK0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com; spf=pass smtp.mailfrom=yandex-team.com; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b=x72TBAYJ; arc=none smtp.client-ip=178.154.239.72 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b="x72TBAYJ" Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id A80A160E06; Mon, 10 Mar 2025 15:04:10 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-3q4ic3FI; Mon, 10 Mar 2025 15:04:09 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608250; bh=C5c3jBE2oO6bqhBgG572MhRimfpuB96f90y0qeqDqaE=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=x72TBAYJRosfJkV7QqfGY6QJIW4ozXZqVj+oHuUpLHNMH2cZMJvUt085RbtBwN2x+ d7TAjQWgFTsAD4ThVtRq38Oxh4sxvtZW/C5NJi3+wwzfXsmGrTOuJh6TQNaT1lZGRs bX2YRijLOeTBrqHRVuYuRUt4MkNUCTd9QMpWNxT4= Authentication-Results: mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net; dkim=pass header.i=@yandex-team.com From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander 
Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 4/7] kexec, kstate: delay loading of kexec segments Date: Mon, 10 Mar 2025 13:03:15 +0100 Message-ID: <20250310120318.2124-5-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" KSTATE's purpose is to preserve some memory across kexec. To make this happen, kexec needs to choose destination ranges after KSTATE knows which memory it preserves, so that these ranges don't collide with KSTATE-preserved memory. Kexec chooses destination ranges at the kexec load stage, which might happen long before the actual reboot into the new kernel. This means that KSTATE must know all preserved memory before kexec_file_load(), unless we delay the loading of kexec segments and the choice of destination addresses until later, at the point of reboot into the new kernel. So let's do that.
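The ordering described above can be modelled with a small userspace sketch. The types and function names (struct sketch_image, sketch_late_load, sketch_file_load, sketch_reboot) are hypothetical simplifications, and the rule that only file-mode, non-crash images are deferred is taken from the kexec_late_load() helper added by this patch:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's image state. */
enum sketch_type { SKETCH_TYPE_DEFAULT, SKETCH_TYPE_CRASH };

struct sketch_image {
	bool file_mode;        /* image came in through kexec_file_load() */
	enum sketch_type type; /* normal or crash image */
	bool segments_placed;  /* destination ranges chosen, segments loaded */
};

/* Deferral applies only when kstate is enabled, the image was loaded in
 * file mode, and it is not a crash image. */
static bool sketch_late_load(const struct sketch_image *img,
			     bool kstate_enabled)
{
	return kstate_enabled && img->file_mode &&
	       img->type == SKETCH_TYPE_DEFAULT;
}

/* Load time: place segments immediately unless deferral applies. */
static void sketch_file_load(struct sketch_image *img, bool kstate_enabled)
{
	if (!sketch_late_load(img, kstate_enabled))
		img->segments_placed = true;
}

/* Reboot time: by now all preserved memory is known, so deferred images
 * can pick destination ranges that avoid it. */
static void sketch_reboot(struct sketch_image *img, bool kstate_enabled)
{
	if (sketch_late_load(img, kstate_enabled))
		img->segments_placed = true;
}
```

In this model a crash image is still placed at load time, since a crash kernel cannot run placement logic at the moment it is needed.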
Signed-off-by: Andrey Ryabinin --- include/linux/kexec.h | 1 + kernel/kexec_core.c | 6 ++ kernel/kexec_file.c | 144 ++++++++++++++++++++++++++-------------- kernel/kexec_internal.h | 6 ++ 4 files changed, 108 insertions(+), 49 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index bd82f04888a1..539aaacfd3fd 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -377,6 +377,7 @@ extern void machine_kexec(struct kimage *image); extern int machine_kexec_prepare(struct kimage *image); extern void machine_kexec_cleanup(struct kimage *image); extern int kernel_kexec(void); +extern int kexec_file_load_segments(struct kimage *image); extern struct page *kimage_alloc_control_pages(struct kimage *image, unsigned int order); =20 diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 647ab5705c37..7c79addeb93b 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1017,6 +1017,12 @@ int kernel_kexec(void) goto Unlock; } =20 + if (kexec_late_load(kexec_image)) { + error =3D kexec_file_load_segments(kexec_image); + if (error) + goto Unlock; + } + #ifdef CONFIG_KEXEC_JUMP if (kexec_image->preserve_context) { /* diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 8ecd34071bfa..634e2ed4cc4c 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -187,6 +187,34 @@ kimage_validate_signature(struct kimage *image) } #endif =20 +static int kimage_add_buffers(struct kimage *image) +{ + void *ldata; + int ret =3D 0; + + /* IMA needs to pass the measurement list to the next kernel. 
*/ + ima_add_kexec_buffer(image); + + ret =3D kstate_load_migrate_buf(image); + if (ret) + goto out; + + /* Call image load handler */ + ldata =3D kexec_image_load_default(image); + + if (IS_ERR(ldata)) { + ret =3D PTR_ERR(ldata); + goto out; + } + + image->image_loader_data =3D ldata; +out: + /* In case of error, free up all allocated memory in this function */ + if (ret) + kimage_file_post_load_cleanup(image); + return ret; + +} /* * In file mode list of segments is prepared by kernel. Copy relevant * data from user space, do error checking, prepare segment list @@ -197,7 +225,6 @@ kimage_file_prepare_segments(struct kimage *image, int = kernel_fd, int initrd_fd, unsigned long cmdline_len, unsigned flags) { ssize_t ret; - void *ldata; =20 ret =3D kernel_read_file_from_fd(kernel_fd, 0, &image->kernel_buf, KEXEC_FILE_SIZE_MAX, NULL, @@ -251,22 +278,6 @@ kimage_file_prepare_segments(struct kimage *image, int= kernel_fd, int initrd_fd, image->cmdline_buf_len - 1); } =20 - /* IMA needs to pass the measurement list to the next kernel. 
*/ - ima_add_kexec_buffer(image); - - ret =3D kstate_load_migrate_buf(image); - if (ret) - goto out; - - /* Call image load handler */ - ldata =3D kexec_image_load_default(image); - - if (IS_ERR(ldata)) { - ret =3D PTR_ERR(ldata); - goto out; - } - - image->image_loader_data =3D ldata; out: /* In case of error, free up all allocated memory in this function */ if (ret) @@ -303,10 +314,6 @@ kimage_file_alloc_init(struct kimage **rimage, int ker= nel_fd, if (ret) goto out_free_image; =20 - ret =3D sanity_check_segment_list(image); - if (ret) - goto out_free_post_load_bufs; - ret =3D -ENOMEM; image->control_code_page =3D kimage_alloc_control_pages(image, get_order(KEXEC_CONTROL_PAGE_SIZE)); @@ -334,6 +341,70 @@ kimage_file_alloc_init(struct kimage **rimage, int ker= nel_fd, return ret; } =20 +static int kimage_post_load(struct kimage *image) +{ + int ret, i; + + ret =3D kexec_calculate_store_digests(image); + if (ret) + goto out; + + kexec_dprintk("nr_segments =3D %lu\n", image->nr_segments); + for (i =3D 0; i < image->nr_segments; i++) { + struct kexec_segment *ksegment; + + ksegment =3D &image->segment[i]; + kexec_dprintk("segment[%d]: buf=3D0x%p bufsz=3D0x%zx mem=3D0x%lx memsz= =3D0x%zx\n", + i, ksegment->buf, ksegment->bufsz, ksegment->mem, + ksegment->memsz); + + ret =3D kimage_load_segment(image, &image->segment[i]); + if (ret) + goto out; + } + + kimage_terminate(image); + + ret =3D machine_kexec_post_load(image); + if (ret) + goto out; + + kexec_dprintk("kexec_file_load: type:%u, start:0x%lx head:0x%lx\n", + image->type, image->start, image->head); +out: + return ret; +} + +int kexec_file_load_segments(struct kimage *image) +{ + int ret; + + ret =3D kimage_add_buffers(image); + if (ret) { + pr_err("failed to add kimage buffers %d\n", ret); + goto out; + } + + ret =3D sanity_check_segment_list(image); + if (ret) { + pr_err("sanity check failed %d\n", ret); + goto out; + } + + ret =3D kimage_post_load(image); + if (ret) + pr_err("kimage post load failed %d\n", 
ret); + +out: + /* + * Free up any temporary buffers allocated which are not needed + * after image has been loaded + */ + kimage_file_post_load_cleanup(image); + + return ret; +} + SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, unsigned long, cmdline_len, const char __user *, cmdline_ptr, unsigned long, flags) @@ -341,7 +412,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, i= nitrd_fd, int image_type =3D (flags & KEXEC_FILE_ON_CRASH) ? KEXEC_TYPE_CRASH : KEXEC_TYPE_DEFAULT; struct kimage **dest_image, *image; - int ret =3D 0, i; + int ret =3D 0; =20 /* We only trust the superuser with rebooting the system. */ if (!kexec_load_permitted(image_type)) @@ -398,37 +469,12 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int,= initrd_fd, if (ret) goto out; =20 - ret =3D kexec_calculate_store_digests(image); - if (ret) - goto out; - - kexec_dprintk("nr_segments =3D %lu\n", image->nr_segments); - for (i =3D 0; i < image->nr_segments; i++) { - struct kexec_segment *ksegment; - - ksegment =3D &image->segment[i]; - kexec_dprintk("segment[%d]: buf=3D0x%p bufsz=3D0x%zx mem=3D0x%lx memsz= =3D0x%zx\n", - i, ksegment->buf, ksegment->bufsz, ksegment->mem, - ksegment->memsz); - - ret =3D kimage_load_segment(image, &image->segment[i]); + if (!kexec_late_load(image)) { + ret =3D kexec_file_load_segments(image); if (ret) goto out; } =20 - kimage_terminate(image); - - ret =3D machine_kexec_post_load(image); - if (ret) - goto out; - - kexec_dprintk("kexec_file_load: type:%u, start:0x%lx head:0x%lx flags:0x%= lx\n", - image->type, image->start, image->head, flags); - /* - * Free up any temporary buffers allocated which are not needed - * after image has been loaded - */ - kimage_file_post_load_cleanup(image); exchange: image =3D xchg(dest_image, image); out: diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h index 12e655a70e25..690b1c21b642 100644 --- a/kernel/kexec_internal.h +++ b/kernel/kexec_internal.h @@ -34,6 +34,12 @@ static inline void 
kexec_unlock(void) atomic_set_release(&__kexec_lock, 0); } =20 +static inline bool kexec_late_load(struct kimage *image) +{ + return IS_ENABLED(CONFIG_KSTATE) && image->file_mode && + (image->type =3D=3D KEXEC_TYPE_DEFAULT); +} + #ifdef CONFIG_KEXEC_FILE #include void kimage_file_post_load_cleanup(struct kimage *image); --=20 2.45.3 From nobody Wed Dec 17 12:51:16 2025 Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9ADE722172E for ; Mon, 10 Mar 2025 12:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=178.154.239.72 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608279; cv=none; b=qU+MhnIKPhuILfGNaCBM31SIjCczcsBeLCKsahauQn13ZX/vhP4neJ0DAms44XG7fZUeXmN+AagwvTCLK4Zm6aTIlVLvgw0Ktg9yEDBXg1kOR6GALckQA55wyz0pcSYEBX6bXMghs7B9hkjRf0WIK4dtTzmuFnNZr9BI9gj6OX4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608279; c=relaxed/simple; bh=2vklyyA3ErRN0JV+jV70G8ZrirPgj1dr+ngipHvmLnE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HyqdxsGaGy1c8PE5T0SGY2kO+zB4k2ktCoTLUX0hGJdMOqbI7mqPnjVr5fqdsdlV6VwAii57Ns/S77x3VMt/mgezaZFJYz9RQ8pMFjhf3qTuHDs7hFjl3iZQIlufSbJfMdna8Rzha9eugB+ytCu1RDvSyYFBxV34BzEFZIPqezE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com; spf=pass smtp.mailfrom=yandex-team.com; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b=tKVDYRr1; arc=none smtp.client-ip=178.154.239.72 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass 
(1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b="tKVDYRr1" Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id D8CAE60EAF; Mon, 10 Mar 2025 15:04:12 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-nBaQjWYT; Mon, 10 Mar 2025 15:04:12 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608252; bh=9sCYhPB2JXvpHRR7mGAnOD5wFhzVOzlRXeW9jewkrPs=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=tKVDYRr1Fc9PZMG5kcffLo5ML52XYvjB9fT2RP6/B4ZrI68qC5XErlG3jfkkQS0xQ LrPykZVjqAxqEQcIvjW8Q1fYRGUY3Nxe3UrpdFO1uG7CnyBvoqCzCxivetoS5oBrG4 Z3/CxdprsbnC7nsU6PIIrdFT3yANX6Z3kGq3BFqU= Authentication-Results: mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net; dkim=pass header.i=@yandex-team.com From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 5/7] x86, kstate: Add the ability to preserve memory pages across kexec. 
Date: Mon, 10 Mar 2025 13:03:16 +0100 Message-ID: <20250310120318.2124-6-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This adds the ability to specify pages of memory that kstate needs to preserve across kexec. kstate_register_page() stores the struct page in a special list of 'struct kpage_state' entries. At the kexec reboot stage this list is iterated, and each page's order and physical address are saved into kstate's data stream. After kexec, the new kernel reads them back from the stream and marks the memory as reserved to keep it intact. Signed-off-by: Andrey Ryabinin --- include/linux/kstate.h | 30 ++++++++++ kernel/kexec_core.c | 3 +- kernel/kstate.c | 124 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 156 insertions(+), 1 deletion(-) diff --git a/include/linux/kstate.h b/include/linux/kstate.h index ae583d090111..36cfefd87572 100644 --- a/include/linux/kstate.h +++ b/include/linux/kstate.h @@ -88,6 +88,8 @@ struct kstate_field { }; =20 enum kstate_ids { + KSTATE_RSVD_MEM_ID =3D 1, + KSTATE_STRUCT_PAGE_ID, KSTATE_LAST_ID =3D -1, }; =20 @@ -124,6 +126,8 @@ static inline unsigned long kstate_get_ulong(struct kst= ate_stream *stream) return ret; } =20 +extern struct kstate_description page_state; + #ifdef CONFIG_KSTATE =20 void kstate_init(void); @@ -141,6 +145,12 @@ void restore_kstate(struct kstate_stream *stream, int = id, const struct kstate_description *kstate, void *obj); int kstate_load_migrate_buf(struct kimage *image); =20 +int kstate_page_save(struct kstate_stream *stream, void *obj, + const struct kstate_field *field); +int kstate_register_page(struct page *page, int order); + +bool kstate_range_is_preserved(unsigned long start, unsigned long end); + #else =20 static inline void
kstate_init(void) { } @@ -150,6 +160,11 @@ static inline int kstate_save_state(void) { return 0; } static inline void free_kstate_stream(void) { } =20 static inline int kstate_load_migrate_buf(struct kimage *image) { return 0= ; } + +static inline bool kstate_range_is_preserved(unsigned long start, + unsigned long end) +{ return 0; } + #endif =20 =20 @@ -176,6 +191,21 @@ static inline int kstate_load_migrate_buf(struct kimag= e *image) { return 0; } .offset =3D offsetof(_state, _f), \ } =20 +#define KSTATE_PAGE(_f, _state) \ + { \ + .name =3D "page", \ + .flags =3D KS_CUSTOM, \ + .offset =3D offsetof(_state, _f), \ + .save =3D kstate_page_save, \ + }, \ + KSTATE_ADDRESS(_f, _state, KS_VMEMMAP_ADDR), \ + { \ + .name =3D "struct_page", \ + .flags =3D KS_STRUCT | KS_POINTER, \ + .offset =3D offsetof(_state, _f), \ + .ksd =3D &page_state, \ + } + #define KSTATE_END_OF_LIST() { \ .flags =3D KS_END,\ } diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 7c79addeb93b..5d001b7a9e44 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -261,7 +262,7 @@ int kimage_is_destination_range(struct kimage *image, return 1; } =20 - return 0; + return kstate_range_is_preserved(start, end); } =20 int kimage_is_control_page(struct kimage *image, diff --git a/kernel/kstate.c b/kernel/kstate.c index d35996287b76..68a1272abceb 100644 --- a/kernel/kstate.c +++ b/kernel/kstate.c @@ -309,6 +309,13 @@ int kstate_register(struct kstate_description *state, = void *obj) return 0; } =20 +int kstate_page_save(struct kstate_stream *stream, void *obj, + const struct kstate_field *field) +{ + kstate_register_page(*(struct page **)obj, 0); + return 0; +} + static int __init setup_kstate(char *arg) { char *end; @@ -323,7 +330,124 @@ static int __init setup_kstate(char *arg) } early_param("kstate_stream", setup_kstate); =20 +/* + * TODO: probably should use folio instead/in addition, + * also will need to 
think/decide what fields + * to preserve or not + */ +struct kstate_description page_state =3D { + .name =3D "struct_page", + .id =3D KSTATE_STRUCT_PAGE_ID, + .state_list =3D LIST_HEAD_INIT(page_state.state_list), + .fields =3D (const struct kstate_field[]) { + KSTATE_BASE_TYPE(_mapcount, struct page, atomic_t), + KSTATE_BASE_TYPE(_refcount, struct page, atomic_t), + KSTATE_END_OF_LIST() + }, +}; + +struct state_entry preserved_se; + +struct preserved_pages { + unsigned int nr_pages; + struct list_head list; +}; +struct kpage_state { + struct list_head list; + u8 order; + struct page *page; +}; + +struct preserved_pages preserved_pages =3D { + .list =3D LIST_HEAD_INIT(preserved_pages.list) +}; + +int kstate_register_page(struct page *page, int order) +{ + struct kpage_state *state; + + state =3D kmalloc(sizeof(*state), GFP_KERNEL); + if (!state) + return -ENOMEM; + + state->page =3D page; + state->order =3D order; + list_add(&state->list, &preserved_pages.list); + preserved_pages.nr_pages++; + return 0; +} + +static int kstate_pages_save(struct kstate_stream *stream, void *obj, + const struct kstate_field *field) +{ + struct kpage_state *p_state; + int ret; + + list_for_each_entry(p_state, &preserved_pages.list, list) { + unsigned long paddr =3D page_to_phys(p_state->page); + + ret =3D kstate_save_data(stream, &p_state->order, + sizeof(p_state->order)); + if (ret) + return ret; + ret =3D kstate_save_data(stream, &paddr, sizeof(paddr)); + if (ret) + return ret; + } + return 0; +} + +bool kstate_range_is_preserved(unsigned long start, unsigned long end) +{ + struct kpage_state *p_state; + + list_for_each_entry(p_state, &preserved_pages.list, list) { + unsigned long pstart, pend; + pstart =3D page_to_boot_pfn(p_state->page) << PAGE_SHIFT; + pend =3D pstart + (PAGE_SIZE << p_state->order) - 1; + if ((end >=3D pstart) && (start <=3D pend)) + return 1; + } + return 0; +} + +static int __init kstate_pages_restore(struct kstate_stream *stream, void = *obj, + const struct kstate_field
*field) +{ + struct preserved_pages *preserved_pages =3D obj; + int nr_pages, i; + + nr_pages =3D preserved_pages->nr_pages; + for (i =3D 0; i < nr_pages; i++) { + int order =3D kstate_get_byte(stream); + unsigned long phys =3D kstate_get_ulong(stream); + + memblock_reserve(phys, PAGE_SIZE << order); + } + return 0; +} + +struct kstate_description kstate_preserved_mem =3D { + .name =3D "preserved_range", + .id =3D KSTATE_RSVD_MEM_ID, + .state_list =3D LIST_HEAD_INIT(kstate_preserved_mem.state_list), + .fields =3D (const struct kstate_field[]) { + KSTATE_BASE_TYPE(nr_pages, struct preserved_pages, unsigned int), + { + .name =3D "pages", + .flags =3D KS_CUSTOM, + .size =3D sizeof(struct preserved_pages), + .save =3D kstate_pages_save, + .restore =3D kstate_pages_restore, + }, + + KSTATE_END_OF_LIST() + }, +}; + void __init kstate_init(void) { memblock_reserve(kstate_stream_addr, kstate_size); + __kstate_register(&kstate_preserved_mem, &preserved_pages, + &preserved_se); } --=20 2.45.3 From nobody Wed Dec 17 12:51:16 2025 Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9AE49221733 for ; Mon, 10 Mar 2025 12:04:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=178.154.239.72 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608278; cv=none; b=YmiEyvc8YFsro59t0v4j8SJKpCmmMFkRGVQNsCE0LqBJjHvRnaYjlwdFByTepOeN8L/DTnvxI2uSJe1yvc5m3zFOCq5q+y7I+AOgCgWjedsqXa3T+N2spF4b3uI0OwZ0HkVmu5ezonwTvWrIAULLeS7Ezg8ohRywXRE+I0ysvqA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608278; c=relaxed/simple; bh=Npn4X6d6WGu5s6M3va8+mm8EMB42OwT4Yrz2X7vo9ok=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=n4gxrQUKPMYd+UmhzNjU+Fqi86uhmQSwmh1XmAtn6fsXqfNc1fY1CaN6NeM4EZEVvyvz7/osyRA2l1hCyQGYJeuibQ+3wVZsBPsUWa42azb9s+n20GHgfqpTmc4YRwDZWLHx4HZUhJmz4AqTcUCDK0oYeutq+PlTJfKja9fNthQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com; spf=pass smtp.mailfrom=yandex-team.com; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b=X8XMfwpK; arc=none smtp.client-ip=178.154.239.72 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b="X8XMfwpK" Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id EB62A60EA5; Mon, 10 Mar 2025 15:04:14 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-4JGepy3f; Mon, 10 Mar 2025 15:04:14 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608254; bh=x8DAP9eUkaG75fkMO+dR1XpuCnXAgRk3qwZNeEbFzSE=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=X8XMfwpKpDu3TGM8eK67J+BXWD2pI0JbT66tVSPMZZ0DiCY2JiXh7/uXc320kZaMU IJHEWFV9cPloQKHvb18wQ0KuvOvE+q0ePmRIbCRYatLzo/haokQJ41l3Mfs/0sOpwX 2f3uIuhOrLk7paFJV3a+Y4xsnqPm9YHI58QoRTVk= Authentication-Results: mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net; dkim=pass header.i=@yandex-team.com From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , 
x86@kernel.org, "H . Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 6/7] kexec, kstate: save kstate data before kexec'ing Date: Mon, 10 Mar 2025 13:03:17 +0100 Message-ID: <20250310120318.2124-7-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Call kstate_save_state() to serialize all the required data into the kstate data stream. Signed-off-by: Andrey Ryabinin --- kernel/kexec_core.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 5d001b7a9e44..7dcdaee14bfa 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1017,11 +1017,14 @@ int kernel_kexec(void) error =3D -EINVAL; goto Unlock; } + error =3D kstate_save_state(); + if (error) + goto Unlock; =20 if (kexec_late_load(kexec_image)) { error =3D kexec_file_load_segments(kexec_image); if (error) - goto Unlock; + goto Free_kstate; } =20 #ifdef CONFIG_KEXEC_JUMP @@ -1104,6 +1107,8 @@ int kernel_kexec(void) } #endif =20 + Free_kstate: + free_kstate_stream(); Unlock: kexec_unlock(); return error; --=20 2.45.3 From nobody Wed Dec 17 12:51:16 2025 Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF14D223323 for ; Mon, 10 Mar 2025 12:04:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=178.154.239.72 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1741608280; cv=none; b=sz3QPbTo6WjgMTGOn+G2TKzAA6ATHmy8l1DtbCgh2Gdz7QaN/SU8Hg8K19DhE9uBQJmJqPke9CY9k1sepoa9KpjaGhLAHp+CGw0meWYZlRf/gAa7wmwoAUCcpqnExyDJr5ymzJBpgG5yovGmmQOIah9BcKSTY8NQ0gVFwRdYGh4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741608280; c=relaxed/simple; bh=n3dgl6K4Adj0IvmwC1gJJjBtOyR4T3IlLIdrP+cQco4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BNNlUOVhDhlwcBqPMhftFhzgEwhLjVkMCwVtWY6fav9+ZGK8RvJoLe9kziyYL2Qfe7+3M8HS6mPxzbFO0e3EA/j0ON5JVO7HQg5XnYxST54x0KnBRY0W6AWO62f+cALhFND0ZYWMJNpvciriyb/7y1XgxmNpA6vaROWQEVA/pTI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com; spf=pass smtp.mailfrom=yandex-team.com; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b=3N8MAR+n; arc=none smtp.client-ip=178.154.239.72 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=yandex-team.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.com header.i=@yandex-team.com header.b="3N8MAR+n" Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id 1ED3460EB2; Mon, 10 Mar 2025 15:04:17 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-HnZRPCd1; Mon, 10 Mar 2025 15:04:16 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608256; bh=yQbSR/IJoyQ0CM0slHnQK+Nqd9vWCYt9LC8hjrwd9hg=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=3N8MAR+n+0LltwNhIsUCsS4z9ZdlN3mVjT3YjhVxhQapXN9CB0W8hACTOOcCmQHqZ 
mfYcPKoqXadSjs+nvISkRWgKPHTOWWT9SOuKc3vWVBZnWWdW6b1pj0AUVYfILPIgOg 7txAX17EwUCueYGi5I8why2gQaWxCynL+viNswMM= Authentication-Results: mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net; dkim=pass header.i=@yandex-team.com From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 7/7] kstate, test: add test module for testing kstate subsystem. Date: Mon, 10 Mar 2025 13:03:18 +0100 Message-ID: <20250310120318.2124-8-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This is a simple test and playground, useful for kstate subsystem development. It contains a structure with different kinds of data, which is migrated across kexec to the new kernel using kstate.

Signed-off-by: Andrey Ryabinin
---
 include/linux/kstate.h |  3 ++
 kernel/kstate.c        |  5 +++
 lib/Makefile           |  2 +
 lib/test_kstate.c      | 86 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+)
 create mode 100644 lib/test_kstate.c

diff --git a/include/linux/kstate.h b/include/linux/kstate.h
index 36cfefd87572..0bde76aa4d8f 100644
--- a/include/linux/kstate.h
+++ b/include/linux/kstate.h
@@ -90,6 +90,7 @@ struct kstate_field {
 enum kstate_ids {
 	KSTATE_RSVD_MEM_ID = 1,
 	KSTATE_STRUCT_PAGE_ID,
+	KSTATE_TEST_ID,
 	KSTATE_LAST_ID = -1,
 };
 
@@ -132,6 +133,8 @@ extern struct kstate_description page_state;
 
 void kstate_init(void);
 
+bool is_kstate_kernel(void);
+
 int kstate_save_state(void);
 void free_kstate_stream(void);
 
diff --git a/kernel/kstate.c b/kernel/kstate.c
index 68a1272abceb..3d9b786da72a 100644
--- a/kernel/kstate.c
+++ b/kernel/kstate.c
@@ -287,6 +287,11 @@ static void restore_migrate_state(unsigned long kstate_data,
 static unsigned long kstate_stream_addr = -1;
 static unsigned long kstate_size;
 
+bool is_kstate_kernel(void)
+{
+	return kstate_stream_addr != -1;
+}
+
 static void __kstate_register(struct kstate_description *state,
 		void *obj, struct state_entry *se)
 {
diff --git a/lib/Makefile b/lib/Makefile
index d5cfc7afbbb8..1395b852b58d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -356,6 +356,8 @@ obj-$(CONFIG_PARMAN) += parman.o
 
 obj-y += group_cpus.o
 
+obj-$(CONFIG_KSTATE) += test_kstate.o
+
 # GCC library routines
 obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
 obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
diff --git a/lib/test_kstate.c b/lib/test_kstate.c
new file mode 100644
index 000000000000..1d9feb017415
--- /dev/null
+++ b/lib/test_kstate.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+
+static unsigned long ulong_val;
+struct kstate_test_data {
+	int i;
+	unsigned long *p_ulong;
+	char s[10];
+	struct page *page;
+};
+
+struct kstate_description test_state = {
+	.name = "test",
+	.version_id = 1,
+	.id = KSTATE_TEST_ID,
+	.state_list = LIST_HEAD_INIT(test_state.state_list),
+	.fields = (const struct kstate_field[]) {
+		KSTATE_BASE_TYPE(i, struct kstate_test_data, int),
+		KSTATE_BASE_TYPE(s, struct kstate_test_data, char [10]),
+		KSTATE_POINTER(p_ulong, struct kstate_test_data),
+		KSTATE_PAGE(page, struct kstate_test_data),
+		KSTATE_END_OF_LIST()
+	},
+};
+
+static struct kstate_test_data test_data;
+
+static int init_test_data(void)
+{
+	struct page *page;
+	int i;
+
+	test_data.i = 10;
+	ulong_val = 20;
+	memcpy(test_data.s, "abcdefghk", sizeof(test_data.s));
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
+	for (i = 0; i < PAGE_SIZE/4; i += 4)
+		*((u32 *)page_address(page) + i) = 0xdeadbeef;
+	test_data.page = page;
+	return 0;
+}
+
+static void validate_test_data(void)
+{
+	int i;
+
+	if (WARN_ON(test_data.i != 10))
+		return;
+	if (WARN_ON(*test_data.p_ulong != 20))
+		return;
+	if (WARN_ON(strcmp(test_data.s, "abcdefghk") != 0))
+		return;
+
+	for (i = 0; i < PAGE_SIZE/4; i += 4) {
+		u32 val = *((u32 *)page_address(test_data.page) + i);
+
+		WARN_ON(val != 0xdeadbeef);
+	}
+}
+
+static int __init test_kstate_init(void)
+{
+	int ret = 0;
+
+	test_data.p_ulong = &ulong_val;
+
+	if (!is_kstate_kernel()) {
+		ret = init_test_data();
+		if (ret)
+			goto out;
+	}
+
+	kstate_register(&test_state, &test_data);
+
+	validate_test_data();
+
+out:
+	return ret;
+}
+__initcall(test_kstate_init);
-- 
2.45.3