From nobody Thu Oct 2 11:49:14 2025
Date: Tue, 16 Sep 2025 19:50:16 -0700
In-Reply-To: <20250917025019.1585041-1-jasonmiu@google.com>
References: <20250917025019.1585041-1-jasonmiu@google.com>
Message-ID: <20250917025019.1585041-2-jasonmiu@google.com>
Subject: [RFC v1 1/4] kho: Introduce KHO page table data structures
From: Jason Miu
To: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu,
 David Matlack, David Rientjes, Jason Gunthorpe, Jason Miu, Joel Granados,
 Marcos Paulo de Souza, Mario Limonciello, Mike Rapoport, Pasha Tatashin,
 Petr Mladek, "Rafael J . Wysocki", Steven Chen, Yan Zhao,
 kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Content-Type: text/plain; charset="utf-8"

Introduce a page-table-like data structure for tracking preserved
memory pages, which will replace the current xarray-based
implementation.

The primary motivation for this change is to eliminate the need for
serialization. By marking preserved pages directly in these new tables
and passing them to the next kernel, the entire serialization process
can be removed. This ultimately allows for the removal of the KHO
finalize and abort states, simplifying the overall design.

The new KHO page table is a hierarchical structure that maps physical
addresses to preservation metadata. It begins with a root
`kho_order_table` that contains an entry for each page order. Each
entry points to a separate, multi-level tree of `kho_page_table`s that
splits a physical address into indices. The traversal terminates at a
`kho_bitmap_table`, where each bit represents a single preserved page.

This commit adds the core data structures for this hierarchy:

- kho_order_table: The root table, indexed by page order.
- kho_page_table: Intermediate-level tables.
- kho_bitmap_table: The lowest-level table where individual pages are
  marked.
The new functions are not yet integrated with the public
`kho_preserve_*` APIs and are marked `__maybe_unused`. The full
integration and the removal of the old xarray code will follow in a
subsequent commit.

Signed-off-by: Jason Miu
---
 kernel/kexec_handover.c | 344 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 344 insertions(+)

diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index ecd1ac210dbd..0daed51c8fb7 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -46,6 +46,350 @@ static int __init kho_parse_enable(char *p)
 }
 early_param("kho", kho_parse_enable);
 
+/*
+ * KHO page tables provide a page-table-like data structure for tracking
+ * preserved memory pages. It is a hierarchical structure that starts with a
+ * `struct kho_order_table`. Each entry in this table points to the root of a
+ * `struct kho_page_table` tree, which tracks the preserved memory pages for a
+ * specific page order.
+ *
+ * Each entry in a `struct kho_page_table` points to the next level page table,
+ * until level 2, which points to a `struct kho_bitmap_table`. The lowest level
+ * (level 1) is a bitmap table where each bit represents a preserved page.
+ *
+ * The table hierarchy is shown below.
+ *
+ * kho_order_table
+ * +--------+--------+-------------+--------------------+
+ * | 0 order| 1 order| 2 order ... | HUGETLB_PAGE_ORDER |
+ * +---+----+--------+-------------+--------------------+
+ *     |
+ *     v
+ * +-------+
+ * | Lv 6  | kho_page_table
+ * +---+---+
+ *     |
+ *     |   +-------+
+ *     +-> | Lv 5  | kho_page_table
+ *         +---+---+
+ *             |
+ *             |   +-------+
+ *             +-> | Lv 4  | kho_page_table
+ *                 +---+---+
+ *                     |
+ *                     |   +-------+
+ *                     +-> | Lv 3  | kho_page_table
+ *                         +---+---+
+ *                             |
+ *                             |   +-------+
+ *                             +-> | Lv 2  | kho_page_table
+ *                                 +---+---+
+ *                                     |
+ *                                     |   +-------+
+ *                                     +-> | Lv 1  | kho_bitmap_table
+ *                                         +-------+
+ *
+ * The depth of the KHO page tables depends on the system's page size and
+ * the page order.
+ * Both larger page sizes and higher page orders result in shallower KHO
+ * page tables. For example, on a system with a 4KB native page size,
+ * 0-order tables have a depth of 6 levels.
+ *
+ * The following diagram illustrates how a physical address is split into
+ * indices for the different KHO page table levels and the final bitmap.
+ *
+ *    63      62:54    53:45    44:36    35:27        26:0
+ * +--------+--------+--------+--------+--------+-----------------+
+ * |  Lv 6  |  Lv 5  |  Lv 4  |  Lv 3  |  Lv 2  |  Lv 1 (bitmap)  |
+ * +--------+--------+--------+--------+--------+-----------------+
+ *
+ * For higher order pages, the bit fields for each level shift to the left
+ * by the page order.
+ *
+ * Each KHO page table and bitmap table is PAGE_SIZE in size. For 0-order
+ * pages, the bitmap table contains (PAGE_SIZE * 8) bits, covering a
+ * (PAGE_SIZE * 8 * PAGE_SIZE) memory range. For example, on a system with
+ * a 4KB native page size, the bitmap table contains 32768 bits and covers
+ * a 128MB memory range.
+ *
+ * Each KHO page table contains (PAGE_SIZE / 8) entries, where each entry
+ * is a descriptor (a physical address) pointing to the next level table.
+ * For example, with a 4KB page size, each page table holds 512 entries.
+ * The level 2 KHO page table is an exception, where each entry points to
+ * a KHO bitmap table instead.
+ *
+ * An entry of a KHO page table of a 4KB page system is shown below as an
+ * example.
+ *
+ *  63:12                          11:0
+ * +------------------------------+--------------+
+ * | descriptor to next table     | zeros        |
+ * +------------------------------+--------------+
+ */
+
+#define BITMAP_TABLE_SHIFT(_order) (PAGE_SHIFT + PAGE_SHIFT + 3 + (_order))
+#define BITMAP_TABLE_MASK(_order) ((1ULL << BITMAP_TABLE_SHIFT(_order)) - 1)
+#define PRESERVED_PAGE_OFFSET_SHIFT(_order) (PAGE_SHIFT + (_order))
+#define PAGE_TABLE_SHIFT_PER_LEVEL (ilog2(PAGE_SIZE / sizeof(unsigned long)))
+#define PAGE_TABLE_LEVEL_MASK ((1ULL << PAGE_TABLE_SHIFT_PER_LEVEL) - 1)
+#define PTR_PER_LEVEL (PAGE_SIZE / sizeof(unsigned long))
+
+typedef int (*kho_walk_callback_t)(phys_addr_t pa, int order);
+
+struct kho_bitmap_table {
+	unsigned long bitmaps[PAGE_SIZE / sizeof(unsigned long)];
+};
+
+struct kho_page_table {
+	unsigned long tables[PTR_PER_LEVEL];
+};
+
+struct kho_order_table {
+	unsigned long orders[HUGETLB_PAGE_ORDER + 1];
+};
+
+/*
+ * `kho_order_table` points to a page that serves as the root of the KHO
+ * page table hierarchy. This page is allocated during KHO module
+ * initialization. Its physical address is written to the FDT and passed
+ * to the next kernel during kexec.
+ */
+static struct kho_order_table *kho_order_table;
+
+static unsigned long kho_page_table_level_shift(int level, int order)
+{
+	/*
+	 * Calculate the cumulative bit shift required to extract the page
+	 * table index for a given physical address at a specific `level`
+	 * and `order`.
+	 *
+	 * - Level 1 is the bitmap table, which has its own indexing logic,
+	 *   so the shift is 0.
+	 * - Level 2 and above: the base shift is `BITMAP_TABLE_SHIFT(order)`,
+	 *   which corresponds to the entire address space covered by a
+	 *   single level 1 bitmap table.
+	 * - Each subsequent level adds `PAGE_TABLE_SHIFT_PER_LEVEL` to the
+	 *   total shift amount.
+	 */
+	return level <= 1 ?
+		       0 :
+		       BITMAP_TABLE_SHIFT(order) +
+			       PAGE_TABLE_SHIFT_PER_LEVEL * (level - 2);
+}
+
+static int kho_get_bitmap_table_index(unsigned long pa, int order)
+{
+	/* 4KB (12 bits of addr) + 8B per entry (6 bits of addr) + order bits */
+	unsigned long idx = pa >> (PAGE_SHIFT + 6 + order);
+
+	return idx;
+}
+
+static int kho_get_page_table_index(unsigned long pa, int order, int level)
+{
+	unsigned long high_addr;
+	unsigned long page_table_offset;
+	unsigned long shift;
+
+	if (level == 1)
+		return kho_get_bitmap_table_index(pa, order);
+
+	shift = kho_page_table_level_shift(level, order);
+	high_addr = pa >> shift;
+
+	page_table_offset = high_addr & PAGE_TABLE_LEVEL_MASK;
+	return page_table_offset;
+}
+
+static int kho_table_level(int order)
+{
+	unsigned long bits_to_resolve;
+	int page_table_num;
+
+	/* We just need 1 bitmap table to cover all addresses */
+	if (BITMAP_TABLE_SHIFT(order) >= 64)
+		return 1;
+
+	bits_to_resolve = 64 - BITMAP_TABLE_SHIFT(order);
+
+	/*
+	 * The number of levels we need is the bits to resolve divided by
+	 * the bits a page table can resolve, rounded up:
+	 * ceil(a/b) = (a + b - 1) / b.
+	 * The total is all page table levels plus the bottom bitmap level.
+	 */
+	page_table_num = (bits_to_resolve + PAGE_TABLE_SHIFT_PER_LEVEL - 1)
+			 / PAGE_TABLE_SHIFT_PER_LEVEL;
+	return page_table_num + 1;
+}
+
+static struct kho_page_table *kho_alloc_page_table(void)
+{
+	return (struct kho_page_table *)get_zeroed_page(GFP_KERNEL);
+}
+
+static void kho_set_preserved_page_bit(struct kho_bitmap_table *bitmap_table,
+				       unsigned long pa, int order)
+{
+	int bitmap_table_index = kho_get_bitmap_table_index(pa, order);
+	int offset;
+
+	/* Get the bit offset in a 64-bit bitmap entry */
+	offset = (pa >> PRESERVED_PAGE_OFFSET_SHIFT(order)) & 0x3f;
+
+	set_bit(offset,
+		(unsigned long *)&bitmap_table->bitmaps[bitmap_table_index]);
+}
+
+static unsigned long kho_pgt_desc(struct kho_page_table *va)
+{
+	return (unsigned long)virt_to_phys(va);
+}
+
+static struct kho_page_table *kho_page_table(unsigned long desc)
+{
+	return (struct kho_page_table *)phys_to_virt(desc);
+}
+
+static int __kho_preserve_page_table(unsigned long pa, int order)
+{
+	int num_table_level = kho_table_level(order);
+	struct kho_page_table *cur;
+	struct kho_page_table *next;
+	struct kho_bitmap_table *bitmap_table;
+	int i, page_table_index;
+	unsigned long page_table_desc;
+
+	if (!kho_order_table->orders[order]) {
+		cur = kho_alloc_page_table();
+		if (!cur)
+			return -ENOMEM;
+		page_table_desc = kho_pgt_desc(cur);
+		kho_order_table->orders[order] = page_table_desc;
+	}
+
+	cur = kho_page_table(kho_order_table->orders[order]);
+
+	/* Go from high level tables to low level tables */
+	for (i = num_table_level; i > 1; i--) {
+		page_table_index = kho_get_page_table_index(pa, order, i);
+
+		if (!cur->tables[page_table_index]) {
+			next = kho_alloc_page_table();
+			if (!next)
+				return -ENOMEM;
+			cur->tables[page_table_index] = kho_pgt_desc(next);
+		} else {
+			next = kho_page_table(cur->tables[page_table_index]);
+		}
+
+		cur = next;
+	}
+
+	/* cur is now pointing to the level 1 bitmap table */
+	bitmap_table = (struct kho_bitmap_table *)cur;
+	kho_set_preserved_page_bit(bitmap_table,
+				   pa & BITMAP_TABLE_MASK(order),
+				   order);
+
+	return 0;
+}
+
+/*
+ * TODO: __maybe_unused is added to the functions:
+ *   kho_preserve_page_table()
+ *   kho_walk_tables()
+ *   kho_memblock_reserve()
+ * since they are not actually being called in this change.
+ * __maybe_unused will be removed in the next patch.
+ */
+static __maybe_unused int kho_preserve_page_table(unsigned long pfn, int order)
+{
+	unsigned long pa = PFN_PHYS(pfn);
+
+	might_sleep();
+
+	return __kho_preserve_page_table(pa, order);
+}
+
+static int __kho_walk_bitmap_table(int order,
+				   struct kho_bitmap_table *bitmap_table,
+				   unsigned long pa,
+				   kho_walk_callback_t cb)
+{
+	int i;
+	unsigned long offset;
+	int ret = 0;
+	int order_factor = 1 << order;
+	unsigned long *bitmap = (unsigned long *)bitmap_table;
+
+	for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) {
+		offset = (unsigned long)PAGE_SIZE * order_factor * i;
+		ret = cb(offset + pa, order);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int __kho_walk_page_tables(int order, int level,
+				  struct kho_page_table *cur, unsigned long pa,
+				  kho_walk_callback_t cb)
+{
+	struct kho_page_table *next;
+	struct kho_bitmap_table *bitmap_table;
+	int i;
+	unsigned long offset;
+	int ret = 0;
+
+	if (level == 1) {
+		bitmap_table = (struct kho_bitmap_table *)cur;
+		return __kho_walk_bitmap_table(order, bitmap_table, pa, cb);
+	}
+
+	for (i = 0; i < PTR_PER_LEVEL; i++) {
+		if (cur->tables[i]) {
+			next = kho_page_table(cur->tables[i]);
+			offset = i;
+			offset <<= kho_page_table_level_shift(level, order);
+			ret = __kho_walk_page_tables(order, level - 1,
						     next, offset + pa, cb);
+			if (ret < 0)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static __maybe_unused int kho_walk_page_tables(struct kho_page_table *top, int order,
+					       kho_walk_callback_t cb)
+{
+	int num_table_level;
+
+	if (top) {
+		num_table_level = kho_table_level(order);
+		return __kho_walk_page_tables(order,
+					      num_table_level, top, 0, cb);
+	}
+
+	return 0;
+}
+
+static __maybe_unused int kho_memblock_reserve(phys_addr_t pa, int order)
+{
+	int sz = 1 << (order + PAGE_SHIFT);
+	struct page *page = phys_to_page(pa);
+
+	memblock_reserve(pa, sz);
+	memblock_reserved_mark_noinit(pa, sz);
+	page->private = order;
+
+	return 0;
+}
+
 /*
  * Keep track of memory that is to be preserved across KHO.
  *
-- 
2.51.0.384.g4c02a37b29-goog

From nobody Thu Oct 2 11:49:14 2025
Date: Tue, 16 Sep 2025 19:50:17 -0700
In-Reply-To: <20250917025019.1585041-1-jasonmiu@google.com>
References: <20250917025019.1585041-1-jasonmiu@google.com>
Message-ID: <20250917025019.1585041-3-jasonmiu@google.com>
Subject: [RFC v1 2/4] kho: Adopt KHO page tables and remove serialization
From: Jason Miu
To: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu,
 David Matlack, David Rientjes, Jason Gunthorpe, Jason Miu, Joel Granados,
 Marcos Paulo de Souza, Mario Limonciello, Mike Rapoport, Pasha Tatashin,
 Petr Mladek, "Rafael J . Wysocki", Steven Chen, Yan Zhao,
 kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Content-Type: text/plain; charset="utf-8"

Transition the KHO system to use the new page table data structures
for managing preserved memory, replacing the previous xarray-based
approach. Remove the serialization process and the associated
finalization and abort logic.

Update the methods for marking memory to be preserved to use the KHO
page table hierarchy. Remove the former system of tracking preserved
pages using an xarray-based structure.

Change the method of passing preserved memory information to the next
kernel to be direct. Instead of serializing the memory map, place the
physical address of the `kho_order_table`, which holds the roots of
the KHO page tables for each order, in the FDT.
Remove the explicit `kho_finalize()` and `kho_abort()` functions and
the logic supporting the finalize and abort states, as they are no
longer needed. This simplifies the KHO lifecycle.

Enable the next kernel's initialization process to read the
`kho_order_table` address from the FDT. The kernel will then traverse
the KHO page table structures to discover all preserved memory
regions, reserving them to prevent early boot-time allocators from
overwriting them.

This architectural shift to using a shared page table structure
simplifies the KHO design and eliminates the overhead of serializing
and deserializing the preserved memory map.

Signed-off-by: Jason Miu
---
 include/linux/kexec_handover.h |  17 --
 kernel/kexec_handover.c        | 532 +++++----------------------------
 2 files changed, 71 insertions(+), 478 deletions(-)

diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
index 348844cffb13..c8229cb11f4b 100644
--- a/include/linux/kexec_handover.h
+++ b/include/linux/kexec_handover.h
@@ -19,23 +19,6 @@ enum kho_event {
 struct folio;
 struct notifier_block;
 
-#define DECLARE_KHOSER_PTR(name, type) \
-	union {                        \
-		phys_addr_t phys;      \
-		type ptr;              \
-	} name
-#define KHOSER_STORE_PTR(dest, val)               \
-	({                                        \
-		typeof(val) v = val;              \
-		typecheck(typeof((dest).ptr), v); \
-		(dest).phys = virt_to_phys(v);    \
-	})
-#define KHOSER_LOAD_PTR(src)                                \
-	({                                                  \
-		typeof(src) s = src;                        \
-		(typeof((s).ptr))((s).phys ?
- phys_to_virt((s).phys) : NULL); \
-	})
-
 struct kho_serialization;
 
 #ifdef CONFIG_KEXEC_HANDOVER
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index 0daed51c8fb7..578d1c1b9cea 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -29,7 +29,7 @@
 #include "kexec_internal.h"
 
 #define KHO_FDT_COMPATIBLE "kho-v1"
-#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map"
+#define PROP_PRESERVED_ORDER_TABLE "preserved-order-table"
 #define PROP_SUB_FDT "fdt"
 
 static bool kho_enable __ro_after_init;
@@ -297,15 +297,7 @@ static int __kho_preserve_page_table(unsigned long pa, int order)
 	return 0;
 }
 
-/*
- * TODO: __maybe_unused is added to the functions:
- *   kho_preserve_page_table()
- *   kho_walk_tables()
- *   kho_memblock_reserve()
- * since they are not actually being called in this change.
- * __maybe_unused will be removed in the next patch.
- */
-static __maybe_unused int kho_preserve_page_table(unsigned long pfn, int order)
+static int kho_preserve_page_table(unsigned long pfn, int order)
 {
 	unsigned long pa = PFN_PHYS(pfn);
 
@@ -365,8 +357,8 @@ static int __kho_walk_page_tables(int order, int level,
 	return 0;
 }
 
-static __maybe_unused int kho_walk_page_tables(struct kho_page_table *top, int order,
-					       kho_walk_callback_t cb)
+static int kho_walk_page_tables(struct kho_page_table *top, int order,
+				kho_walk_callback_t cb)
 {
 	int num_table_level;
 
@@ -378,7 +370,7 @@ static __maybe_unused int kho_walk_page_tables(struct kho_page_table *top, int o
 	return 0;
 }
 
-static __maybe_unused int kho_memblock_reserve(phys_addr_t pa, int order)
+static int kho_memblock_reserve(phys_addr_t pa, int order)
 {
 	int sz = 1 << (order + PAGE_SHIFT);
 	struct page *page = phys_to_page(pa);
@@ -390,143 +382,12 @@ static __maybe_unused int kho_memblock_reserve(phys_addr_t pa, int order)
 	return 0;
 }
 
-/*
- * Keep track of memory that is to be preserved across KHO.
- *
- * The serializing side uses two levels of xarrays to manage chunks of per-order
- * 512 byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order of a
- * 1TB system would fit inside a single 512 byte bitmap. For order 0 allocations
- * each bitmap will cover 16M of address space. Thus, for 16G of memory at most
- * 512K of bitmap memory will be needed for order 0.
- *
- * This approach is fully incremental, as the serialization progresses folios
- * can continue be aggregated to the tracker. The final step, immediately prior
- * to kexec would serialize the xarray information into a linked list for the
- * successor kernel to parse.
- */
-
-#define PRESERVE_BITS (512 * 8)
-
-struct kho_mem_phys_bits {
-	DECLARE_BITMAP(preserve, PRESERVE_BITS);
-};
-
-struct kho_mem_phys {
-	/*
-	 * Points to kho_mem_phys_bits, a sparse bitmap array. Each bit is sized
-	 * to order.
-	 */
-	struct xarray phys_bits;
-};
-
-struct kho_mem_track {
-	/* Points to kho_mem_phys, each order gets its own bitmap tree */
-	struct xarray orders;
-};
-
-struct khoser_mem_chunk;
-
 struct kho_serialization {
 	struct page *fdt;
 	struct list_head fdt_list;
 	struct dentry *sub_fdt_dir;
-	struct kho_mem_track track;
-	/* First chunk of serialized preserved memory map */
-	struct khoser_mem_chunk *preserved_mem_map;
 };
 
-static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
-{
-	void *elm, *res;
-
-	elm = xa_load(xa, index);
-	if (elm)
-		return elm;
-
-	elm = kzalloc(sz, GFP_KERNEL);
-	if (!elm)
-		return ERR_PTR(-ENOMEM);
-
-	res = xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
-	if (xa_is_err(res))
-		res = ERR_PTR(xa_err(res));
-
-	if (res) {
-		kfree(elm);
-		return res;
-	}
-
-	return elm;
-}
-
-static void __kho_unpreserve(struct kho_mem_track *track, unsigned long pfn,
-			     unsigned long end_pfn)
-{
-	struct kho_mem_phys_bits *bits;
-	struct kho_mem_phys *physxa;
-
-	while (pfn < end_pfn) {
-		const unsigned int order =
-			min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn));
-		const unsigned long pfn_high = pfn >> order;
-
-		physxa = xa_load(&track->orders, order);
-		if (!physxa)
-			continue;
-
-		bits = xa_load(&physxa->phys_bits, pfn_high / PRESERVE_BITS);
-		if (!bits)
-			continue;
-
-		clear_bit(pfn_high % PRESERVE_BITS, bits->preserve);
-
-		pfn += 1 << order;
-	}
-}
-
-static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
-				unsigned int order)
-{
-	struct kho_mem_phys_bits *bits;
-	struct kho_mem_phys *physxa, *new_physxa;
-	const unsigned long pfn_high = pfn >> order;
-
-	might_sleep();
-
-	physxa = xa_load(&track->orders, order);
-	if (!physxa) {
-		int err;
-
-		new_physxa = kzalloc(sizeof(*physxa), GFP_KERNEL);
-		if (!new_physxa)
-			return -ENOMEM;
-
-		xa_init(&new_physxa->phys_bits);
-		physxa = xa_cmpxchg(&track->orders, order, NULL, new_physxa,
-				    GFP_KERNEL);
-
-		err = xa_err(physxa);
-		if (err || physxa) {
-			xa_destroy(&new_physxa->phys_bits);
-			kfree(new_physxa);
-
-			if (err)
-				return err;
-		} else {
-			physxa = new_physxa;
-		}
-	}
-
-	bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS,
-				sizeof(*bits));
-	if (IS_ERR(bits))
-		return PTR_ERR(bits);
-
-	set_bit(pfn_high % PRESERVE_BITS, bits->preserve);
-
-	return 0;
-}
-
 /* almost as free_reserved_page(), just don't free the page */
 static void kho_restore_page(struct page *page, unsigned int order)
 {
@@ -568,151 +429,29 @@ struct folio *kho_restore_folio(phys_addr_t phys)
 }
 EXPORT_SYMBOL_GPL(kho_restore_folio);
 
-/* Serialize and deserialize struct kho_mem_phys across kexec
- *
- * Record all the bitmaps in a linked list of pages for the next kernel to
- * process. Each chunk holds bitmaps of the same order and each block of bitmaps
- * starts at a given physical address. This allows the bitmaps to be sparse.
- * The xarray is used to store them in a tree while building up the data
- * structure, but the KHO successor kernel only needs to process them once
- * in order.
- *
- * All of this memory is normal kmalloc() memory and is not marked for
- * preservation. The successor kernel will remain isolated to the scratch
- * space until it completes processing this list. Once processed all the
- * memory storing these ranges will be marked as free.
- */
-
-struct khoser_mem_bitmap_ptr {
-	phys_addr_t phys_start;
-	DECLARE_KHOSER_PTR(bitmap, struct kho_mem_phys_bits *);
-};
-
-struct khoser_mem_chunk_hdr {
-	DECLARE_KHOSER_PTR(next, struct khoser_mem_chunk *);
-	unsigned int order;
-	unsigned int num_elms;
-};
-
-#define KHOSER_BITMAP_SIZE                                   \
-	((PAGE_SIZE - sizeof(struct khoser_mem_chunk_hdr)) / \
-	 sizeof(struct khoser_mem_bitmap_ptr))
-
-struct khoser_mem_chunk {
-	struct khoser_mem_chunk_hdr hdr;
-	struct khoser_mem_bitmap_ptr bitmaps[KHOSER_BITMAP_SIZE];
-};
-
-static_assert(sizeof(struct khoser_mem_chunk) == PAGE_SIZE);
-
-static struct khoser_mem_chunk *new_chunk(struct khoser_mem_chunk *cur_chunk,
-					  unsigned long order)
-{
-	struct khoser_mem_chunk *chunk;
-
-	chunk = kzalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!chunk)
-		return NULL;
-	chunk->hdr.order = order;
-	if (cur_chunk)
-		KHOSER_STORE_PTR(cur_chunk->hdr.next, chunk);
-	return chunk;
-}
-
-static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk)
-{
-	struct khoser_mem_chunk *chunk = first_chunk;
-
-	while (chunk) {
-		struct khoser_mem_chunk *tmp = chunk;
-
-		chunk = KHOSER_LOAD_PTR(chunk->hdr.next);
-		kfree(tmp);
-	}
-}
-
-static int kho_mem_serialize(struct kho_serialization *ser)
-{
-	struct khoser_mem_chunk *first_chunk = NULL;
-	struct khoser_mem_chunk *chunk = NULL;
-	struct kho_mem_phys *physxa;
-	unsigned long order;
-
-	xa_for_each(&ser->track.orders, order, physxa) {
-		struct kho_mem_phys_bits *bits;
-		unsigned long phys;
-
-		chunk = new_chunk(chunk, order);
-		if (!chunk)
-			goto err_free;
-
-		if (!first_chunk)
-			first_chunk = chunk;
-
-		xa_for_each(&physxa->phys_bits, phys, bits) {
-			struct khoser_mem_bitmap_ptr *elm;
-
-			if (chunk->hdr.num_elms == ARRAY_SIZE(chunk->bitmaps)) {
-				chunk = new_chunk(chunk, order);
-				if (!chunk)
-					goto err_free;
-			}
-
-			elm = &chunk->bitmaps[chunk->hdr.num_elms];
-			chunk->hdr.num_elms++;
-			elm->phys_start = (phys * PRESERVE_BITS)
-					  << (order + PAGE_SHIFT);
-			KHOSER_STORE_PTR(elm->bitmap, bits);
-		}
-	}
-
-	ser->preserved_mem_map = first_chunk;
-
-	return 0;
-
-err_free:
-	kho_mem_ser_free(first_chunk);
-	return -ENOMEM;
-}
-
-static void __init deserialize_bitmap(unsigned int order,
-				      struct khoser_mem_bitmap_ptr *elm)
-{
-	struct kho_mem_phys_bits *bitmap = KHOSER_LOAD_PTR(elm->bitmap);
-	unsigned long bit;
-
-	for_each_set_bit(bit, bitmap->preserve, PRESERVE_BITS) {
-		int sz = 1 << (order + PAGE_SHIFT);
-		phys_addr_t phys =
-			elm->phys_start + (bit << (order + PAGE_SHIFT));
-		struct page *page = phys_to_page(phys);
-
-		memblock_reserve(phys, sz);
-		memblock_reserved_mark_noinit(phys, sz);
-		page->private = order;
-	}
-}
-
 static void __init kho_mem_deserialize(const void *fdt)
 {
-	struct khoser_mem_chunk *chunk;
 	const phys_addr_t *mem;
-	int len;
-
-	mem = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len);
+	int len, i;
+	struct kho_order_table *order_table;
 
+	/* Retrieve the KHO order table from the passed-in FDT. */
+	mem = fdt_getprop(fdt, 0, PROP_PRESERVED_ORDER_TABLE, &len);
 	if (!mem || len != sizeof(*mem)) {
-		pr_err("failed to get preserved memory bitmaps\n");
+		pr_err("failed to get preserved order table\n");
 		return;
 	}
 
-	chunk = *mem ? phys_to_virt(*mem) : NULL;
-	while (chunk) {
-		unsigned int i;
+	order_table = *mem ?
+ (struct kho_order_table *)phys_to_virt(*mem) : + NULL; =20 - for (i =3D 0; i !=3D chunk->hdr.num_elms; i++) - deserialize_bitmap(chunk->hdr.order, - &chunk->bitmaps[i]); - chunk =3D KHOSER_LOAD_PTR(chunk->hdr.next); + if (!order_table) + return; + + for (i =3D 0; i < HUGETLB_PAGE_ORDER + 1; i++) { + kho_walk_page_tables(kho_page_table(order_table->orders[i]), + i, kho_memblock_reserve); } } =20 @@ -977,25 +716,15 @@ EXPORT_SYMBOL_GPL(kho_add_subtree); =20 struct kho_out { struct blocking_notifier_head chain_head; - struct dentry *dir; - - struct mutex lock; /* protects KHO FDT finalization */ - struct kho_serialization ser; - bool finalized; }; =20 static struct kho_out kho_out =3D { .chain_head =3D BLOCKING_NOTIFIER_INIT(kho_out.chain_head), - .lock =3D __MUTEX_INITIALIZER(kho_out.lock), .ser =3D { .fdt_list =3D LIST_HEAD_INIT(kho_out.ser.fdt_list), - .track =3D { - .orders =3D XARRAY_INIT(kho_out.ser.track.orders, 0), - }, }, - .finalized =3D false, }; =20 int register_kho_notifier(struct notifier_block *nb) @@ -1023,12 +752,8 @@ int kho_preserve_folio(struct folio *folio) { const unsigned long pfn =3D folio_pfn(folio); const unsigned int order =3D folio_order(folio); - struct kho_mem_track *track =3D &kho_out.ser.track; - - if (kho_out.finalized) - return -EBUSY; =20 - return __kho_preserve_order(track, pfn, order); + return kho_preserve_page_table(pfn, order); } EXPORT_SYMBOL_GPL(kho_preserve_folio); =20 @@ -1045,14 +770,8 @@ EXPORT_SYMBOL_GPL(kho_preserve_folio); int kho_preserve_phys(phys_addr_t phys, size_t size) { unsigned long pfn =3D PHYS_PFN(phys); - unsigned long failed_pfn =3D 0; - const unsigned long start_pfn =3D pfn; const unsigned long end_pfn =3D PHYS_PFN(phys + size); int err =3D 0; - struct kho_mem_track *track =3D &kho_out.ser.track; - - if (kho_out.finalized) - return -EBUSY; =20 if (!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size)) return -EINVAL; @@ -1061,19 +780,14 @@ int kho_preserve_phys(phys_addr_t phys, size_t size) const unsigned int order 
=3D min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn)); =20 - err =3D __kho_preserve_order(track, pfn, order); - if (err) { - failed_pfn =3D pfn; - break; - } + err =3D kho_preserve_page_table(pfn, order); + if (err) + return err; =20 pfn +=3D 1 << order; } =20 - if (err) - __kho_unpreserve(track, start_pfn, failed_pfn); - - return err; + return 0; } EXPORT_SYMBOL_GPL(kho_preserve_phys); =20 @@ -1081,150 +795,6 @@ EXPORT_SYMBOL_GPL(kho_preserve_phys); =20 static struct dentry *debugfs_root; =20 -static int kho_out_update_debugfs_fdt(void) -{ - int err =3D 0; - struct fdt_debugfs *ff, *tmp; - - if (kho_out.finalized) { - err =3D kho_debugfs_fdt_add(&kho_out.ser.fdt_list, kho_out.dir, - "fdt", page_to_virt(kho_out.ser.fdt)); - } else { - list_for_each_entry_safe(ff, tmp, &kho_out.ser.fdt_list, list) { - debugfs_remove(ff->file); - list_del(&ff->list); - kfree(ff); - } - } - - return err; -} - -static int kho_abort(void) -{ - int err; - unsigned long order; - struct kho_mem_phys *physxa; - - xa_for_each(&kho_out.ser.track.orders, order, physxa) { - struct kho_mem_phys_bits *bits; - unsigned long phys; - - xa_for_each(&physxa->phys_bits, phys, bits) - kfree(bits); - - xa_destroy(&physxa->phys_bits); - kfree(physxa); - } - xa_destroy(&kho_out.ser.track.orders); - - if (kho_out.ser.preserved_mem_map) { - kho_mem_ser_free(kho_out.ser.preserved_mem_map); - kho_out.ser.preserved_mem_map =3D NULL; - } - - err =3D blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_ABORT, - NULL); - err =3D notifier_to_errno(err); - - if (err) - pr_err("Failed to abort KHO finalization: %d\n", err); - - return err; -} - -static int kho_finalize(void) -{ - int err =3D 0; - u64 *preserved_mem_map; - void *fdt =3D page_to_virt(kho_out.ser.fdt); - - err |=3D fdt_create(fdt, PAGE_SIZE); - err |=3D fdt_finish_reservemap(fdt); - err |=3D fdt_begin_node(fdt, ""); - err |=3D fdt_property_string(fdt, "compatible", KHO_FDT_COMPATIBLE); - /** - * Reserve the preserved-memory-map property in the 
root FDT, so - * that all property definitions will precede subnodes created by - * KHO callers. - */ - err |=3D fdt_property_placeholder(fdt, PROP_PRESERVED_MEMORY_MAP, - sizeof(*preserved_mem_map), - (void **)&preserved_mem_map); - if (err) - goto abort; - - err =3D kho_preserve_folio(page_folio(kho_out.ser.fdt)); - if (err) - goto abort; - - err =3D blocking_notifier_call_chain(&kho_out.chain_head, - KEXEC_KHO_FINALIZE, &kho_out.ser); - err =3D notifier_to_errno(err); - if (err) - goto abort; - - err =3D kho_mem_serialize(&kho_out.ser); - if (err) - goto abort; - - *preserved_mem_map =3D (u64)virt_to_phys(kho_out.ser.preserved_mem_map); - - err |=3D fdt_end_node(fdt); - err |=3D fdt_finish(fdt); - -abort: - if (err) { - pr_err("Failed to convert KHO state tree: %d\n", err); - kho_abort(); - } - - return err; -} - -static int kho_out_finalize_get(void *data, u64 *val) -{ - mutex_lock(&kho_out.lock); - *val =3D kho_out.finalized; - mutex_unlock(&kho_out.lock); - - return 0; -} - -static int kho_out_finalize_set(void *data, u64 _val) -{ - int ret =3D 0; - bool val =3D !!_val; - - mutex_lock(&kho_out.lock); - - if (val =3D=3D kho_out.finalized) { - if (kho_out.finalized) - ret =3D -EEXIST; - else - ret =3D -ENOENT; - goto unlock; - } - - if (val) - ret =3D kho_finalize(); - else - ret =3D kho_abort(); - - if (ret) - goto unlock; - - kho_out.finalized =3D val; - ret =3D kho_out_update_debugfs_fdt(); - -unlock: - mutex_unlock(&kho_out.lock); - return ret; -} - -DEFINE_DEBUGFS_ATTRIBUTE(fops_kho_out_finalize, kho_out_finalize_get, - kho_out_finalize_set, "%llu\n"); - static int scratch_phys_show(struct seq_file *m, void *v) { for (int i =3D 0; i < kho_scratch_cnt; i++) @@ -1265,11 +835,6 @@ static __init int kho_out_debugfs_init(void) if (IS_ERR(f)) goto err_rmdir; =20 - f =3D debugfs_create_file("finalize", 0600, dir, NULL, - &fops_kho_out_finalize); - if (IS_ERR(f)) - goto err_rmdir; - kho_out.dir =3D dir; kho_out.ser.sub_fdt_dir =3D sub_fdt_dir; return 0; @@ -1381,6 
+946,35 @@ static __init int kho_in_debugfs_init(const void *fdt) return err; } =20 +static int kho_out_fdt_init(void) +{ + int err =3D 0; + void *fdt =3D page_to_virt(kho_out.ser.fdt); + u64 *preserved_order_table; + + err |=3D fdt_create(fdt, PAGE_SIZE); + err |=3D fdt_finish_reservemap(fdt); + err |=3D fdt_begin_node(fdt, ""); + err |=3D fdt_property_string(fdt, "compatible", KHO_FDT_COMPATIBLE); + + err |=3D fdt_property_placeholder(fdt, PROP_PRESERVED_ORDER_TABLE, + sizeof(*preserved_order_table), + (void **)&preserved_order_table); + if (err) + goto abort; + + *preserved_order_table =3D (u64)virt_to_phys(kho_order_table); + + err |=3D fdt_end_node(fdt); + err |=3D fdt_finish(fdt); + +abort: + if (err) + pr_err("Failed to convert KHO state tree: %d\n", err); + + return err; +} + static __init int kho_init(void) { int err =3D 0; @@ -1395,15 +989,26 @@ static __init int kho_init(void) goto err_free_scratch; } =20 + kho_order_table =3D (struct kho_order_table *) + kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!kho_order_table) { + err =3D -ENOMEM; + goto err_free_fdt; + } + + err =3D kho_out_fdt_init(); + if (err) + goto err_free_kho_order_table; + debugfs_root =3D debugfs_create_dir("kho", NULL); if (IS_ERR(debugfs_root)) { err =3D -ENOENT; - goto err_free_fdt; + goto err_free_kho_order_table; } =20 err =3D kho_out_debugfs_init(); if (err) - goto err_free_fdt; + goto err_free_kho_order_table; =20 if (fdt) { err =3D kho_in_debugfs_init(fdt); @@ -1431,6 +1036,9 @@ static __init int kho_init(void) =20 return 0; =20 +err_free_kho_order_table: + kfree(kho_order_table); + kho_order_table =3D NULL; err_free_fdt: put_page(kho_out.ser.fdt); kho_out.ser.fdt =3D NULL; @@ -1581,6 +1189,8 @@ int kho_fill_kimage(struct kimage *image) return 0; =20 image->kho.fdt =3D page_to_phys(kho_out.ser.fdt); + /* Preserve the memory page of FDT for the next kernel */ + kho_preserve_phys(image->kho.fdt, PAGE_SIZE); =20 scratch_size =3D sizeof(*kho_scratch) * kho_scratch_cnt; scratch =3D (struct 
kexec_buf){
-- 
2.51.0.384.g4c02a37b29-goog

From: Jason Miu
Date: Tue, 16 Sep 2025 19:50:18 -0700
Message-ID: <20250917025019.1585041-4-jasonmiu@google.com>
In-Reply-To: <20250917025019.1585041-1-jasonmiu@google.com>
Subject: [RFC v1 3/4] memblock: Remove KHO notifier usage
To: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack,
    David Rientjes, Jason Gunthorpe, Jason Miu, Joel Granados,
    Marcos Paulo de Souza, Mario Limonciello, Mike Rapoport, Pasha Tatashin,
    Petr Mladek, "Rafael J . Wysocki", Steven Chen, Yan Zhao,
    kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org

Update memblock to use direct KHO API calls for memory preservation.

Remove the KHO notifier registration and callbacks from the memblock
subsystem. These notifiers were tied to the former KHO finalize and
abort events, which are no longer used.

Memblock now preserves its `reserve_mem` regions and registers its
metadata by calling kho_preserve_phys(), kho_preserve_folio(), and
kho_add_subtree() directly within its initialization function. This is
made possible by changes in the KHO core to complete the FDT at a
later stage, during kexec.
Signed-off-by: Jason Miu
---
 include/linux/kexec_handover.h |  7 ++----
 kernel/kexec_core.c            |  4 +++
 kernel/kexec_handover.c        | 46 +++++++++++++++++++++-------------
 kernel/kexec_internal.h        |  2 ++
 mm/memblock.c                  | 46 ++++++++--------------------------
 5 files changed, 47 insertions(+), 58 deletions(-)

diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
index c8229cb11f4b..e29dcf53de7e 100644
--- a/include/linux/kexec_handover.h
+++ b/include/linux/kexec_handover.h
@@ -19,15 +19,13 @@ enum kho_event {
 struct folio;
 struct notifier_block;
 
-struct kho_serialization;
-
 #ifdef CONFIG_KEXEC_HANDOVER
 bool kho_is_enabled(void);
 
 int kho_preserve_folio(struct folio *folio);
 int kho_preserve_phys(phys_addr_t phys, size_t size);
 struct folio *kho_restore_folio(phys_addr_t phys);
-int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt);
+int kho_add_subtree(const char *name, void *fdt);
 int kho_retrieve_subtree(const char *name, phys_addr_t *phys);
 
 int register_kho_notifier(struct notifier_block *nb);
@@ -58,8 +56,7 @@ static inline struct folio *kho_restore_folio(phys_addr_t phys)
 	return NULL;
 }
 
-static inline int kho_add_subtree(struct kho_serialization *ser,
-				  const char *name, void *fdt)
+static inline int kho_add_subtree(const char *name, void *fdt)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 31203f0bacaf..3cf33aaded17 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1147,6 +1147,10 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
 
+	error = kho_commit_fdt();
+	if (error)
+		goto Unlock;
+
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		/*
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index 578d1c1b9cea..f7933b434364 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -682,9 +682,21 @@ static int kho_debugfs_fdt_add(struct list_head *list, struct dentry *dir,
 	return 0;
 }
 
+struct kho_out {
+	struct blocking_notifier_head chain_head;
+	struct dentry *dir;
+	struct kho_serialization ser;
+};
+
+static struct kho_out kho_out = {
+	.chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head),
+	.ser = {
+		.fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list),
+	},
+};
+
 /**
  * kho_add_subtree - record the physical address of a sub FDT in KHO root tree.
- * @ser: serialization control object passed by KHO notifiers.
  * @name: name of the sub tree.
  * @fdt: the sub tree blob.
  *
@@ -697,8 +709,9 @@ static int kho_debugfs_fdt_add(struct list_head *list, struct dentry *dir,
  *
  * Return: 0 on success, error code on failure
  */
-int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt)
+int kho_add_subtree(const char *name, void *fdt)
 {
+	struct kho_serialization *ser = &kho_out.ser;
 	int err = 0;
 	u64 phys = (u64)virt_to_phys(fdt);
 	void *root = page_to_virt(ser->fdt);
@@ -714,19 +727,6 @@ int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt)
 }
 EXPORT_SYMBOL_GPL(kho_add_subtree);
 
-struct kho_out {
-	struct blocking_notifier_head chain_head;
-	struct dentry *dir;
-	struct kho_serialization ser;
-};
-
-static struct kho_out kho_out = {
-	.chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head),
-	.ser = {
-		.fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list),
-	},
-};
-
 int register_kho_notifier(struct notifier_block *nb)
 {
 	return blocking_notifier_chain_register(&kho_out.chain_head, nb);
@@ -952,6 +952,7 @@ static int kho_out_fdt_init(void)
 	void *fdt = page_to_virt(kho_out.ser.fdt);
 	u64 *preserved_order_table;
 
+	/* Do not close the root node and FDT until kho_commit_fdt() */
 	err |= fdt_create(fdt, PAGE_SIZE);
 	err |= fdt_finish_reservemap(fdt);
 	err |= fdt_begin_node(fdt, "");
@@ -965,9 +966,6 @@ static int kho_out_fdt_init(void)
 
 	*preserved_order_table = (u64)virt_to_phys(kho_order_table);
 
-	err |= fdt_end_node(fdt);
-	err |= fdt_finish(fdt);
-
 abort:
 	if (err)
 		pr_err("Failed to convert KHO state tree: %d\n", err);
@@ -1211,6 +1209,18 @@ int kho_fill_kimage(struct kimage *image)
 	return 0;
 }
 
+int kho_commit_fdt(void)
+{
+	int err = 0;
+	void *fdt = page_to_virt(kho_out.ser.fdt);
+
+	/* Close the root node and commit the FDT */
+	err = fdt_end_node(fdt);
+	err |= fdt_finish(fdt);
+
+	return err;
+}
+
 static int kho_walk_scratch(struct kexec_buf *kbuf,
 			    int (*func)(struct resource *, void *))
 {
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 228bb88c018b..490170911f5a 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -46,6 +46,7 @@ struct kexec_buf;
 int kho_locate_mem_hole(struct kexec_buf *kbuf,
 			int (*func)(struct resource *, void *));
 int kho_fill_kimage(struct kimage *image);
+int kho_commit_fdt(void);
 #else
 static inline int kho_locate_mem_hole(struct kexec_buf *kbuf,
 				      int (*func)(struct resource *, void *))
@@ -54,5 +55,6 @@ static inline int kho_locate_mem_hole(struct kexec_buf *kbuf,
 }
 
 static inline int kho_fill_kimage(struct kimage *image) { return 0; }
+static inline int kho_commit_fdt(void) { return 0; }
 #endif /* CONFIG_KEXEC_HANDOVER */
 #endif /* LINUX_KEXEC_INTERNAL_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index 117d963e677c..978717d59a6f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -6,6 +6,7 @@
  * Copyright (C) 2001 Peter Bergner.
 */
 
+#include "asm-generic/memory_model.h"
 #include
 #include
 #include
@@ -2510,39 +2511,6 @@ int reserve_mem_release_by_name(const char *name)
 #define RESERVE_MEM_KHO_NODE_COMPATIBLE "reserve-mem-v1"
 static struct page *kho_fdt;
 
-static int reserve_mem_kho_finalize(struct kho_serialization *ser)
-{
-	int err = 0, i;
-
-	for (i = 0; i < reserved_mem_count; i++) {
-		struct reserve_mem_table *map = &reserved_mem_table[i];
-
-		err |= kho_preserve_phys(map->start, map->size);
-	}
-
-	err |= kho_preserve_folio(page_folio(kho_fdt));
-	err |= kho_add_subtree(ser, MEMBLOCK_KHO_FDT, page_to_virt(kho_fdt));
-
-	return notifier_from_errno(err);
-}
-
-static int reserve_mem_kho_notifier(struct notifier_block *self,
-				    unsigned long cmd, void *v)
-{
-	switch (cmd) {
-	case KEXEC_KHO_FINALIZE:
-		return reserve_mem_kho_finalize((struct kho_serialization *)v);
-	case KEXEC_KHO_ABORT:
-		return NOTIFY_DONE;
-	default:
-		return NOTIFY_BAD;
-	}
-}
-
-static struct notifier_block reserve_mem_kho_nb = {
-	.notifier_call = reserve_mem_kho_notifier,
-};
-
 static int __init prepare_kho_fdt(void)
 {
 	int err = 0, i;
@@ -2583,7 +2551,7 @@ static int __init prepare_kho_fdt(void)
 
 static int __init reserve_mem_init(void)
 {
-	int err;
+	int err, i;
 
 	if (!kho_is_enabled() || !reserved_mem_count)
 		return 0;
@@ -2592,7 +2560,15 @@ static int __init reserve_mem_init(void)
 	if (err)
 		return err;
 
-	err = register_kho_notifier(&reserve_mem_kho_nb);
+	for (i = 0; i < reserved_mem_count; i++) {
+		struct reserve_mem_table *map = &reserved_mem_table[i];
+
+		err |= kho_preserve_phys(map->start, map->size);
+	}
+
+	err |= kho_preserve_folio(page_folio(kho_fdt));
+	err |= kho_add_subtree(MEMBLOCK_KHO_FDT, page_to_virt(kho_fdt));
+
 	if (err) {
 		put_page(kho_fdt);
 		kho_fdt = NULL;
-- 
2.51.0.384.g4c02a37b29-goog

From: Jason Miu
Date: Tue, 16 Sep 2025 19:50:19 -0700
Message-ID: <20250917025019.1585041-5-jasonmiu@google.com>
In-Reply-To: <20250917025019.1585041-1-jasonmiu@google.com>
Subject: [RFC v1 4/4] kho: Remove notifier system infrastructure
To: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack,
    David Rientjes, Jason Gunthorpe, Jason Miu, Joel Granados,
    Marcos Paulo de Souza, Mario Limonciello, Mike Rapoport, Pasha Tatashin,
    Petr Mladek, "Rafael J . Wysocki", Steven Chen, Yan Zhao,
    kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org

Remove the KHO notifier system. Eliminate the core KHO notifier API
functions (`register_kho_notifier`, `unregister_kho_notifier`), the
`kho_event` enum, and the notifier chain head from KHO internal
structures.

This infrastructure was used to support the now-removed finalize and
abort states and is no longer required. Client subsystems now interact
with KHO through direct API calls.
Signed-off-by: Jason Miu
---
 include/linux/kexec_handover.h | 20 --------------------
 kernel/kexec_handover.c        | 15 ---------------
 2 files changed, 35 deletions(-)

diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
index e29dcf53de7e..09e8f0b0fcab 100644
--- a/include/linux/kexec_handover.h
+++ b/include/linux/kexec_handover.h
@@ -10,14 +10,7 @@ struct kho_scratch {
 	phys_addr_t size;
 };
 
-/* KHO Notifier index */
-enum kho_event {
-	KEXEC_KHO_FINALIZE = 0,
-	KEXEC_KHO_ABORT = 1,
-};
-
 struct folio;
-struct notifier_block;
 
 #ifdef CONFIG_KEXEC_HANDOVER
 bool kho_is_enabled(void);
@@ -28,9 +21,6 @@ struct folio *kho_restore_folio(phys_addr_t phys);
 int kho_add_subtree(const char *name, void *fdt);
 int kho_retrieve_subtree(const char *name, phys_addr_t *phys);
 
-int register_kho_notifier(struct notifier_block *nb);
-int unregister_kho_notifier(struct notifier_block *nb);
-
 void kho_memory_init(void);
 
 void kho_populate(phys_addr_t fdt_phys, u64 fdt_len, phys_addr_t scratch_phys,
@@ -66,16 +56,6 @@ static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys)
 	return -EOPNOTSUPP;
 }
 
-static inline int register_kho_notifier(struct notifier_block *nb)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int unregister_kho_notifier(struct notifier_block *nb)
-{
-	return -EOPNOTSUPP;
-}
-
 static inline void kho_memory_init(void)
 {
 }
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index f7933b434364..62f654b08c74 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -16,7 +16,6 @@
 #include
 #include
 #include
-#include
 #include
 
 #include
@@ -683,13 +682,11 @@ static int kho_debugfs_fdt_add(struct list_head *list, struct dentry *dir,
 }
 
 struct kho_out {
-	struct blocking_notifier_head chain_head;
 	struct dentry *dir;
 	struct kho_serialization ser;
 };
 
 static struct kho_out kho_out = {
-	.chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head),
 	.ser = {
 		.fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list),
 	},
@@ -727,18 +724,6 @@ int kho_add_subtree(const char *name, void *fdt)
 }
 EXPORT_SYMBOL_GPL(kho_add_subtree);
 
-int register_kho_notifier(struct notifier_block *nb)
-{
-	return blocking_notifier_chain_register(&kho_out.chain_head, nb);
-}
-EXPORT_SYMBOL_GPL(register_kho_notifier);
-
-int unregister_kho_notifier(struct notifier_block *nb)
-{
-	return blocking_notifier_chain_unregister(&kho_out.chain_head, nb);
-}
-EXPORT_SYMBOL_GPL(unregister_kho_notifier);
-
 /**
  * kho_preserve_folio - preserve a folio across kexec.
  * @folio: folio to preserve.
-- 
2.51.0.384.g4c02a37b29-goog