From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3C63E3DFC7F for ; Fri, 5 Jun 2026 18:35:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684517; cv=none; b=m9F1Fb3vteEepOTcelxEL7t695rV4FnIzdozzUnWHN22ePVIxW4yWAacuwrjP7Ek+eXgq+MSRYN/83jNwEdxlTfbHV8z37WFn9laAn0bZtQiAj63VPZArjPBoO+GWlMOZABzs0y22Ij7meX8LZl+3ZXMQhRUlkNoZFWXhlIEcFo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684517; c=relaxed/simple; bh=DEmVf4hBXT3xj5Jrny/ZqsdGdCy1hngY/fKfDn72Lzs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TgLPOYRV6HmOZLXmRP4cIYnvBppH4mFl1UN36Qd6EfAKlEhBiyKVsuBJjZ205EGA2nSCmWoK3BA2kuR5hxSCO2ZL6oV0U9T9ivZsUm7xdqg50C1lU/R4GnCCa9NVj/VrIZ1fVK64D/x1C43SjTpQKXDx9N75kqufEweVtSy0gh4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=SXN+/qOS; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="SXN+/qOS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C352D1F00898; Fri, 5 Jun 2026 18:35:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684514; bh=rkbkFtqpt3sYkBM3VkUIpTALQ213CQ+IiAxeYtzN8vI=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=SXN+/qOS0mYT0P7lGbZkzsjqLv4mH5Hg1qm196JtmR3WJzUqiAvHYpLTZMOOcor+0 M2H+j+K9Or619ubBiJiQo5TLVtpMx82tyvP+Wf+pwmHtR1xGFFRnc/E3iNq5IqAqFd kGTsTqic+vBJjx9yq9nCeILOucactdDNjJ5oF+T6L5L/4e3tk9bW+Hax3ZKA4n0upp 
gpcEk6hawR6o2hgbwcRZ7OnP8/xNjmxk8bhx0M8XV4WMb2VoFEq3F94d8i2GfDBdrU ImeoaXqEReRH5YuNfMBswww2PXjI4s03LNominlNeVf2wKfXWz8XplSHsI1dYkbRy2 sK0AtL9AZ5ZWA== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 01/18] kho: generalize radix tree APIs Date: Fri, 5 Jun 2026 20:34:34 +0200 Message-ID: <20260605183501.3884950-2-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" The KHO radix tree is a data structure that can track the presence or absence of an arbitrary key, with nothing inherently tied to KHO memory preservation tracking. This was a design goal of the radix tree, so that other users of KHO can reuse it. Despite that, the radix tree APIs are very closely tied to KHO memory preservation tracking. Adding a key is done by kho_radix_add_page(), which frames it as a page tracking operation and takes a PFN and an order. kho_radix_del_page() does the same. Both functions internally encode the key that goes into the radix tree. kho_radix_walk_tree() similarly bakes the PFN and order into the callback arguments. Generalize the APIs by taking the key directly and doing the encoding at the callers. Rename the functions to kho_radix_add_key() and kho_radix_del_key(). In practice, this removes a line each from the functions and moves the encoding function call to the callers.
Similarly, update kho_radix_tree_walk_callback_t to take the key directly. Now that key encoding is no longer an inherent part of the radix tree and can be decided by the user, rename kho_radix_{encode,decode}_key() to kho_{encode,decode}_radix_key(). This moves them out of the "kho_radix_" name space into the "kho_" namespace. This emphasizes that this is KHO's way of encoding the key for its radix tree. Reviewed-by: Pasha Tatashin Signed-off-by: Pratyush Yadav (Google) --- include/linux/kho_radix_tree.h | 18 +++---- kernel/liveupdate/kexec_handover.c | 76 ++++++++++++++---------------- 2 files changed, 42 insertions(+), 52 deletions(-) diff --git a/include/linux/kho_radix_tree.h b/include/linux/kho_radix_tree.h index 84e918b96e53..f368f3b9f923 100644 --- a/include/linux/kho_radix_tree.h +++ b/include/linux/kho_radix_tree.h @@ -34,30 +34,24 @@ struct kho_radix_tree { struct mutex lock; /* protects the tree's structure and root pointer */ }; =20 -typedef int (*kho_radix_tree_walk_callback_t)(phys_addr_t phys, - unsigned int order); +typedef int (*kho_radix_tree_walk_callback_t)(unsigned long key); =20 #ifdef CONFIG_KEXEC_HANDOVER =20 -int kho_radix_add_page(struct kho_radix_tree *tree, unsigned long pfn, - unsigned int order); - -void kho_radix_del_page(struct kho_radix_tree *tree, unsigned long pfn, - unsigned int order); - +int kho_radix_add_key(struct kho_radix_tree *tree, unsigned long key); +void kho_radix_del_key(struct kho_radix_tree *tree, unsigned long key); int kho_radix_walk_tree(struct kho_radix_tree *tree, kho_radix_tree_walk_callback_t cb); =20 #else /* #ifdef CONFIG_KEXEC_HANDOVER */ =20 -static inline int kho_radix_add_page(struct kho_radix_tree *tree, long pfn, - unsigned int order) +static inline int kho_radix_add_key(struct kho_radix_tree *tree, unsigned = long key) { return -EOPNOTSUPP; } =20 -static inline void kho_radix_del_page(struct kho_radix_tree *tree, - unsigned long pfn, unsigned int order) { } +static inline void 
kho_radix_del_key(struct kho_radix_tree *tree, + unsigned long key) { } =20 static inline int kho_radix_walk_tree(struct kho_radix_tree *tree, kho_radix_tree_walk_callback_t cb) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 4834a809985a..7349cc82f6dc 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -85,7 +85,7 @@ static struct kho_out kho_out =3D { }; =20 /** - * kho_radix_encode_key - Encodes a physical address and order into a radi= x key. + * kho_encode_radix_key - Encodes a physical address and order into a radi= x key. * @phys: The physical address of the page. * @order: The order of the page. * @@ -95,7 +95,7 @@ static struct kho_out kho_out =3D { * * Return: The encoded unsigned long radix key. */ -static unsigned long kho_radix_encode_key(phys_addr_t phys, unsigned int o= rder) +static unsigned long kho_encode_radix_key(phys_addr_t phys, unsigned int o= rder) { /* Order bits part */ unsigned long h =3D 1UL << (KHO_ORDER_0_LOG2 - order); @@ -106,17 +106,17 @@ static unsigned long kho_radix_encode_key(phys_addr_t= phys, unsigned int order) } =20 /** - * kho_radix_decode_key - Decodes a radix key back into a physical address= and order. + * kho_decode_radix_key - Decodes a radix key back into a physical address= and order. * @key: The unsigned long key to decode. * @order: An output parameter, a pointer to an unsigned int where the dec= oded * page order will be stored. * - * This function reverses the encoding performed by kho_radix_encode_key(), + * This function reverses the encoding performed by kho_encode_radix_key(), * extracting the original physical address and page order from a given ke= y. * * Return: The decoded physical address. 
*/ -static phys_addr_t kho_radix_decode_key(unsigned long key, unsigned int *o= rder) +static phys_addr_t kho_decode_radix_key(unsigned long key, unsigned int *o= rder) { unsigned int order_bit =3D fls64(key); phys_addr_t phys; @@ -144,24 +144,21 @@ static unsigned long kho_radix_get_table_index(unsign= ed long key, } =20 /** - * kho_radix_add_page - Marks a page as preserved in the radix tree. + * kho_radix_add_key - Add a key to the radix tree. * @tree: The KHO radix tree. - * @pfn: The page frame number of the page to preserve. - * @order: The order of the page. + * @key: The key to add. * - * This function traverses the radix tree based on the key derived from @p= fn - * and @order. It sets the corresponding bit in the leaf bitmap to mark the - * page for preservation. If intermediate nodes do not exist along the pat= h, - * they are allocated and added to the tree. + * This function traverses the radix tree based on the @key provided. It s= ets the + * corresponding bit in the leaf bitmap to mark the @key as present. If + * intermediate nodes do not exist along the path, they are allocated and = added + * to the tree. * * Return: 0 on success, or a negative error code on failure. */ -int kho_radix_add_page(struct kho_radix_tree *tree, - unsigned long pfn, unsigned int order) +int kho_radix_add_key(struct kho_radix_tree *tree, unsigned long key) { /* Newly allocated nodes for error cleanup */ struct kho_radix_node *intermediate_nodes[KHO_TREE_MAX_DEPTH] =3D { 0 }; - unsigned long key =3D kho_radix_encode_key(PFN_PHYS(pfn), order); struct kho_radix_node *anchor_node =3D NULL; struct kho_radix_node *node =3D tree->root; struct kho_radix_node *new_node; @@ -224,22 +221,19 @@ int kho_radix_add_page(struct kho_radix_tree *tree, =20 return err; } -EXPORT_SYMBOL_GPL(kho_radix_add_page); +EXPORT_SYMBOL_GPL(kho_radix_add_key); =20 /** - * kho_radix_del_page - Removes a page's preservation status from the radi= x tree. 
+ * kho_radix_del_key - Removes the key from the radix tree. * @tree: The KHO radix tree. - * @pfn: The page frame number of the page to unpreserve. - * @order: The order of the page. + * @key: The key to remove. * * This function traverses the radix tree and clears the bit corresponding= to - * the page, effectively removing its "preserved" status. It does not free - * the tree's intermediate nodes, even if they become empty. + * the @key, effectively removing it from the tree. It does not free the t= ree's + * intermediate nodes, even if they become empty. */ -void kho_radix_del_page(struct kho_radix_tree *tree, unsigned long pfn, - unsigned int order) +void kho_radix_del_key(struct kho_radix_tree *tree, unsigned long key) { - unsigned long key =3D kho_radix_encode_key(PFN_PHYS(pfn), order); struct kho_radix_node *node =3D tree->root; struct kho_radix_leaf *leaf; unsigned int i, idx; @@ -270,21 +264,18 @@ void kho_radix_del_page(struct kho_radix_tree *tree, = unsigned long pfn, idx =3D kho_radix_get_bitmap_index(key); __clear_bit(idx, leaf->bitmap); } -EXPORT_SYMBOL_GPL(kho_radix_del_page); +EXPORT_SYMBOL_GPL(kho_radix_del_key); =20 static int kho_radix_walk_leaf(struct kho_radix_leaf *leaf, unsigned long key, kho_radix_tree_walk_callback_t cb) { unsigned long *bitmap =3D (unsigned long *)leaf; - unsigned int order; - phys_addr_t phys; unsigned int i; int err; =20 for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) { - phys =3D kho_radix_decode_key(key | i, &order); - err =3D cb(phys, order); + err =3D cb(key | i); if (err) return err; } @@ -332,15 +323,14 @@ static int __kho_radix_walk_tree(struct kho_radix_nod= e *root, } =20 /** - * kho_radix_walk_tree - Traverses the radix tree and calls a callback for= each preserved page. + * kho_radix_walk_tree - Traverses the radix tree and calls a callback for= each key. * @tree: A pointer to the KHO radix tree to walk. 
* @cb: A callback function of type kho_radix_tree_walk_callback_t that wi= ll be - * invoked for each preserved page found in the tree. The callback re= ceives - * the physical address and order of the preserved page. + * invoked for each key in the tree. * * This function walks the radix tree, searching from the specified top le= vel - * down to the lowest level (level 0). For each preserved page found, it i= nvokes - * the provided callback, passing the page's physical address and order. + * down to the lowest level (level 0). For each key found, it invokes the + * provided callback. * * Return: 0 if the walk completed the specified tree, or the non-zero ret= urn * value from the callback that stopped the walk. @@ -484,13 +474,16 @@ static struct page *__init kho_get_preserved_page(phy= s_addr_t phys, return pfn_to_page(pfn); } =20 -static int __init kho_preserved_memory_reserve(phys_addr_t phys, - unsigned int order) +static int __init kho_preserved_memory_reserve(unsigned long key) { union kho_page_info info; struct page *page; + unsigned int order; + phys_addr_t phys; u64 sz; =20 + phys =3D kho_decode_radix_key(key, &order); + sz =3D 1 << (order + PAGE_SHIFT); page =3D kho_get_preserved_page(phys, order); =20 @@ -859,7 +852,8 @@ int kho_preserve_folio(struct folio *folio) if (WARN_ON(kho_scratch_overlap(pfn << PAGE_SHIFT, PAGE_SIZE << order))) return -EINVAL; =20 - return kho_radix_add_page(tree, pfn, order); + return kho_radix_add_key(tree, kho_encode_radix_key(PFN_PHYS(pfn), + order)); } EXPORT_SYMBOL_GPL(kho_preserve_folio); =20 @@ -877,7 +871,7 @@ void kho_unpreserve_folio(struct folio *folio) const unsigned long pfn =3D folio_pfn(folio); const unsigned int order =3D folio_order(folio); =20 - kho_radix_del_page(tree, pfn, order); + kho_radix_del_key(tree, kho_encode_radix_key(PFN_PHYS(pfn), order)); } EXPORT_SYMBOL_GPL(kho_unpreserve_folio); =20 @@ -906,7 +900,8 @@ static void __kho_unpreserve(struct kho_radix_tree *tre= e, while (pfn < end_pfn) { order 
=3D __kho_preserve_pages_order(pfn, end_pfn); =20 - kho_radix_del_page(tree, pfn, order); + kho_radix_del_key(tree, kho_encode_radix_key(PFN_PHYS(pfn), + order)); =20 pfn +=3D 1 << order; } @@ -939,7 +934,8 @@ int kho_preserve_pages(struct page *page, unsigned long= nr_pages) while (pfn < end_pfn) { unsigned int order =3D __kho_preserve_pages_order(pfn, end_pfn); =20 - err =3D kho_radix_add_page(tree, pfn, order); + err =3D kho_radix_add_key(tree, kho_encode_radix_key(PFN_PHYS(pfn), + order)); if (err) { failed_pfn =3D pfn; break; --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0EC653E1226 for ; Fri, 5 Jun 2026 18:35:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684521; cv=none; b=bS6MnxAYoZokWadRAOFN2FenijbtMamepN2Sb9nVZt+qyh/PxfMFWX3KxkxptolWx0qnGBO+YchOjDu1LXOD7W77fHfwx2fOPc0ypMToY0g2fdyJEj53O6Ytqk4sioTEwDh/+KqY9kI2DsJdAMnC24zhHV0aZsM2icyL4ch1uAY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684521; c=relaxed/simple; bh=yecGgepi4VHH9x0JMtZfqxDIBnBxx9Y1T8tCyKLgYcY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CpWhz4ugkVs1CPQXF8wJSJ5ihhV4AClFL/D7ocL1Fsv3Kh12Y1hTvC6y3CoHgMsAMdtmjmR0GYzfC7cQXY5NknZaJOdrUhLRrc8+2YMMV+3G0ZX+LIs4y8MWhmcAdblGXgBN89N+Gpak0wncnC6bh/i1IfLOW4tbPlqez0Vh29s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=eTBqi736; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org 
header.b="eTBqi736" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 404861F0089B; Fri, 5 Jun 2026 18:35:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684517; bh=ioAGw4mo1ZSaLPYQr72a7mMnE73SA6K6tW1O9fGp0Eo=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=eTBqi736uwqUzUJI9Q+VNGsjRDZoHKuLa+3iVAAhbnHhWJxIXcxg8gU8z0tqWrtVv 9FCWlBj9VEcEa/BnpGozoyHllj034V/bRRKn+vHsuaxtG3LxCEzdPS2KRQx2L7kens btlbEQkkq3gzDNhJHd7W++WPQ/tKpmAukn0b3+OzlLlgAT3wwON9oh/5+Qda5zH/Jm 6FYzfGd2crcelE9VVBt3gw+qSb4VGtokhTO+III6C95A3b9aM/IBW6snlhZ6X3FdCJ ThGmbQaoWfLwvar6GMCXSLZABMu8DwNGXgyNGsv6i5gxMyOBjF3acia4WJ40dg7h+V 4oDX9OVkERArA== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 02/18] kho: disallow wide keys in radix tree Date: Fri, 5 Jun 2026 20:34:35 +0200 Message-ID: <20260605183501.3884950-3-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" The KHO radix tree was designed to track preserved pages. So it does not provide the capability to track any 64-bit key. Instead, it limits the key width to how much it needs for tracking PFNs and their orders. Limiting the width reduces the number of levels in the tree. KHO is not expected to be the only user of the radix tree. With the API generalized to allow other users, now it is possible to add any key to the tree. 
Check the key width at kho_radix_add_key(), and error out if it exceeds what the tree can handle. Do this instead of increasing the tree depth since right now there are no users that need to use wider keys, so this avoids memory overhead and ABI breakage. Signed-off-by: Pratyush Yadav (Google) --- include/linux/kho/abi/kexec_handover.h | 8 ++++++++ kernel/liveupdate/kexec_handover.c | 12 ++++++++++++ 2 files changed, 20 insertions(+) diff --git a/include/linux/kho/abi/kexec_handover.h b/include/linux/kho/abi= /kexec_handover.h index fb2d37417ad9..6dbb98bfb586 100644 --- a/include/linux/kho/abi/kexec_handover.h +++ b/include/linux/kho/abi/kexec_handover.h @@ -278,6 +278,14 @@ enum kho_radix_consts { KHO_TABLE_SIZE_LOG2) + 1, }; =20 +/* + * The maximum key width this radix tree can track. + * + * This value isn't ABI itself, but it is derived from values that are ABI. + */ +#define KHO_RADIX_KEY_WIDTH (((KHO_TREE_MAX_DEPTH - 1) * KHO_TABLE_SIZE_LO= G2) + \ + KHO_BITMAP_SIZE_LOG2) + struct kho_radix_node { u64 table[1 << KHO_TABLE_SIZE_LOG2]; }; diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 7349cc82f6dc..e8454dc5b489 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -153,6 +153,11 @@ static unsigned long kho_radix_get_table_index(unsigne= d long key, * intermediate nodes do not exist along the path, they are allocated and = added * to the tree. * + * NOTE: Currently only keys of width up to %KHO_RADIX_KEY_WIDTH are suppo= rted. + * This limit only exists because current users of the radix tree don't us= e more + * than that. Changing the maximum width requires changing the tree depth,= which + * needs bumping the ABI version. + * * Return: 0 on success, or a negative error code on failure. 
*/ int kho_radix_add_key(struct kho_radix_tree *tree, unsigned long key) @@ -169,6 +174,9 @@ int kho_radix_add_key(struct kho_radix_tree *tree, unsi= gned long key) if (WARN_ON_ONCE(!tree->root)) return -EINVAL; =20 + if (unlikely(fls64(key) > KHO_RADIX_KEY_WIDTH)) + return -ERANGE; + might_sleep(); =20 guard(mutex)(&tree->lock); @@ -241,6 +249,10 @@ void kho_radix_del_key(struct kho_radix_tree *tree, un= signed long key) if (WARN_ON_ONCE(!tree->root)) return; =20 + /* Keys wider than KHO_RADIX_KEY_WIDTH are not allowed to be added. */ + if (unlikely(fls64(key) > KHO_RADIX_KEY_WIDTH)) + return; + might_sleep(); =20 guard(mutex)(&tree->lock); --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 495623C5539 for ; Fri, 5 Jun 2026 18:35:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684521; cv=none; b=IDrHF/+An5T//tmO0GGyQaKQX68O96ZqtoiZjDsKwcDyqBo6U7XXrOyZsj2bLqw3FwunPckuHgp9RBK9X8jz/qwPaZlbkgGZHoA0LTpKYu8+quWd5TKS2VYHMF7KbaHLJRUYA+RqYgcFSw4b313d617t7DD6KTNBboZWJT75IVE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684521; c=relaxed/simple; bh=/UBD4SRfV209W6yXQvvrQH869Z4vAO+5/8zGi2mGhRg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MvCfYRS2Ab5GSdmixdhxwPz/E98clwKPid7u+5ZqO+9aHRvFhsnUHQcL+gojrcjxJ0uX6Y6XSLEYe8cSt54OJ5osAFfY67oblunmpNPB4hEiu+tpS1uzHSMiIFRlvNW7q2VMf4xeQQwBRBXlzIUzI/ph1gLkb/UXD/nYjlwsldo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=XfzK9R9w; arc=none smtp.client-ip=100.103.45.18 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XfzK9R9w" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B14F11F00898; Fri, 5 Jun 2026 18:35:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684519; bh=R6yDcpji1C+GYUKGpWyVAnr/m2Kie2TV2BOTVneeC1s=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=XfzK9R9w/+8j97Px+AmChM/ZPMAXQowShQTxBd5VUEBg+YEwWjMWhB0B6YsyVkPO+ GjB/7x0yCdaGCzZ8sxAZ/EE/hxP8QlxBiOJJ+gwE8p+KWWB/VuK1lk07m0UZTjsWBq cyHcRCjsUdF7+sZjLUA5E55RAgute94fLjtEcDtAbM7hQ1TM76OZEObpy+PL8+M66m kHWBuG90PapNjA3UsFO8LNgzplcVXSfi45kX9pPIaZiK7SocUu9CDbiBCr5JQDEXIZ OFbQtfkQS4PwE3ghGzotdfuDdTBG3uACNcT5bA+3546BO52jNNQK4scjWbvzP3++qY swHEtglvX23lw== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 03/18] kho: return virtual address of mem_map Date: Fri, 5 Jun 2026 20:34:36 +0200 Message-ID: <20260605183501.3884950-4-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Currently kho_get_mem_map_phys() is only used by kho_populate(), which doesn't care whether the address is virtual or physical and only cares that it exists and is valid. In coming patches, more callers will be added, all of which will need the virtual address. Make things simpler by directly returning the virtual address.
Rename kho_get_mem_map_phys() to kho_get_mem_map() to accurately reflect what it returns. Signed-off-by: Pratyush Yadav (Google) --- kernel/liveupdate/kexec_handover.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index e8454dc5b489..d8dd0ede4f87 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -509,10 +509,11 @@ static int __init kho_preserved_memory_reserve(unsign= ed long key) return 0; } =20 -/* Returns physical address of the preserved memory map from FDT */ -static phys_addr_t __init kho_get_mem_map_phys(const void *fdt) +/* Returns virtual address of the preserved memory map from FDT */ +static __init void *kho_get_mem_map(const void *fdt) { const void *mem_ptr; + phys_addr_t mem_map_phys; int len; =20 mem_ptr =3D fdt_getprop(fdt, 0, KHO_FDT_MEMORY_MAP_PROP_NAME, &len); @@ -521,7 +522,11 @@ static phys_addr_t __init kho_get_mem_map_phys(const v= oid *fdt) return 0; } =20 - return get_unaligned((const u64 *)mem_ptr); + mem_map_phys =3D get_unaligned((const u64 *)mem_ptr); + if (!mem_map_phys) + return NULL; + + return phys_to_virt(mem_map_phys); } =20 /* @@ -1644,8 +1649,7 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fd= t_len, { unsigned int scratch_cnt =3D scratch_len / sizeof(*kho_scratch); struct kho_scratch *scratch =3D NULL; - phys_addr_t mem_map_phys; - void *fdt =3D NULL; + void *fdt =3D NULL, *mem_map; bool populated =3D false; int err; =20 @@ -1668,8 +1672,8 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fd= t_len, goto unmap_fdt; } =20 - mem_map_phys =3D kho_get_mem_map_phys(fdt); - if (!mem_map_phys) + mem_map =3D kho_get_mem_map(fdt); + if (!mem_map) goto unmap_fdt; =20 scratch =3D early_memremap(scratch_phys, scratch_len); --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net 
[100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 905B73D1A98 for ; Fri, 5 Jun 2026 18:35:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684524; cv=none; b=CVK0Nwg8EPtzjNO0jlNQSIQhKBambIh4xDNgZfKYAkNh6zEeSX6BxynuyEKZk/3JWiVpAUXu21A4WLiY7mKJ66UA3Wl1PiqTsET2wFNIX/OPIo1VOfl4kCQ9ZvfbEA7hW6whZu17d8X3ncimxu1EdGtpxg5eIBCkxM0ca/CdYig= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684524; c=relaxed/simple; bh=M+4ZPwpYY+sNv/R8x76iTgCcpNLWjGm2xLtnh+Pu08E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ea/f92aDiD3MKjU8KECYzEcTTdv2kAaVW8sOiLKJbxVN8EYIZbub5TZmkBgEMN0d53oFKDi1hkuTWho2iiyiM0zadZ2xiBzc+YoJYJkp5lJvKjR0OupSeA50RLiRnhJs8CmKJU/6oRFJemviVjPrr2CPgmOQrk+Jf6STXuznz58= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=MPgdUrBe; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="MPgdUrBe" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2E09D1F00893; Fri, 5 Jun 2026 18:35:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684522; bh=v/RGerRyxm8aTVCl41zoP38L2i7Ee87yOlG4ezqSjQo=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=MPgdUrBersvdIeiDJvd3LrUzlJMbmXc7u1YjP1PjvngfbnOg1wTMed/fISSH2vkVk eBSrWcyzbcOa0R9swpxLwaqTyUddu6ao3yUL8xKCDAN+N9Gp2VQqeXfynMFPggmv++ hMyo7fR3rde1IXG7mxJf3wKg9r9gOR/rJCgkI6xqG6n0sbmCkqc+B0kk/++ipOEwBF F7lKau1V1WtMfkSYGEK6vZYrTjxT5GZtaxWGKT8dPb/DE3jaKX3IuvrYdl1M27/k8F cwBxsx9XgdMuGiqDZMQ6GidAqfuioEjCuFMdGwOI1CaYau8zZ272Azz3K2wTHrBDDH Mjp/oWSKjc6DQ== From: Pratyush Yadav To: Mike 
Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 04/18] kho: store incoming radix tree in kho_in Date: Fri, 5 Jun 2026 20:34:37 +0200 Message-ID: <20260605183501.3884950-5-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Store the incoming radix tree in kho_in so that other functions can also use it. While at it, use kho_get_mem_map() instead of duplicating the code to get the radix tree root from the FDT. Signed-off-by: Pratyush Yadav (Google) --- kernel/liveupdate/kexec_handover.c | 32 ++++++++++++------------------ 1 file changed, 13 insertions(+), 19 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index d8dd0ede4f87..61e436f5077e 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1334,6 +1334,7 @@ struct kho_in { char previous_release[__NEW_UTS_LEN + 1]; u32 kexec_count; struct kho_debugfs dbg; + struct kho_radix_tree radix_tree; }; =20 static struct kho_in kho_in =3D { @@ -1413,24 +1414,15 @@ EXPORT_SYMBOL_GPL(kho_retrieve_subtree); =20 static int __init kho_mem_retrieve(const void *fdt) { - struct kho_radix_tree tree; - const phys_addr_t *mem; - int len; - - /* Retrieve the KHO radix tree from passed-in FDT.
*/ - mem =3D fdt_getprop(fdt, 0, KHO_FDT_MEMORY_MAP_PROP_NAME, &len); - - if (!mem || len !=3D sizeof(*mem)) { - pr_err("failed to get preserved KHO memory tree\n"); - return -ENOENT; - } - - if (!*mem) - return -EINVAL; - - tree.root =3D phys_to_virt(*mem); - mutex_init(&tree.lock); - return kho_radix_walk_tree(&tree, kho_preserved_memory_reserve); + /* + * kho_get_mem_map() should always succeed. If it fails, kho_populate() + * catches that and never sets kho_in.scratch_phys, which stops memory + * retrieval. + */ + kho_in.radix_tree.root =3D kho_get_mem_map(fdt); + mutex_init(&kho_in.radix_tree.lock); + return kho_radix_walk_tree(&kho_in.radix_tree, + kho_preserved_memory_reserve); } =20 static __init int kho_out_fdt_setup(void) @@ -1637,8 +1629,10 @@ void __init kho_memory_init(void) if (kho_in.scratch_phys) { kho_scratch =3D phys_to_virt(kho_in.scratch_phys); =20 - if (kho_mem_retrieve(kho_get_fdt())) + if (kho_mem_retrieve(kho_get_fdt())) { kho_in.fdt_phys =3D 0; + kho_in.radix_tree.root =3D NULL; + } } else { kho_reserve_scratch(); } --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D76263E3DA2 for ; Fri, 5 Jun 2026 18:35:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684527; cv=none; b=OiWs+kFR1C27BjN69Lc6+PqcbIk5LAbvdIogO6gCqULdbmzBDXiQrXG+Ht11IzUx+/RGkC7eU99LCcIm9/uIJ6LJuYoZKMpdKWgPEjL7UVEok2RM28kp405AooKCVcrO5eOVXo/hmgaFlesWHe2T/ZSSSQ5TVyHCz1F8IrV+sVk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684527; c=relaxed/simple; bh=Vnz++hGyFXtvQzIid6rIMpI4W5uPiUspW+qvtBe/yrM=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=L13Vim+bQMFzO1EplXxs5mVgy1pMBb9ZWqI2ouMZ9h3cnASSTvxUt8LMX5FKKUIzr1k4RswpIW8Ky7OCICGCxgy93GB45XaDbA36Ct/8eE6i2k2gugsZeBla7DZOdMjPBQDR65orcfbnKMnsCgaxvX07wrK6S2ZaToJ+5wjKf8I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=gmowAtOv; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="gmowAtOv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9F7341F00898; Fri, 5 Jun 2026 18:35:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684524; bh=alS6uT4DTp+WwL7sso5ivBySgAZ17Nint0mun5DyoD4=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=gmowAtOvxTzKu9ncHNPpRCnzGA8aVRacUBa9izDlXmH8W4c2eEeVmTQbE3rGtjedw 9ua31Na8GiOZ1dansxYyc4ww00X4kERyhigsjMj//6tPpR1dT8ta5FAYuC9rLTEbY6 clH238G1imQE2lF4pz4yqmu2Olzn6gfL3cS4YKiSBIBQxOkulAuY7RUm42r8PIKhIl AWQ4DPyn4eD0GIIoepvbCSkRashG4zd8VlKWBU/q5j/RtQ430n6KVgFAiCgEclekwd +GSPWHJEw/TboqaF5GTuSVNhantMko73vA//MVGrF5uImhBihA8fQztwHqDoFvqgWB zbxK+H2/vhaqQ== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 05/18] kho: move all memory retrieval logic to kho_mem_retrieve() Date: Fri, 5 Jun 2026 20:34:38 +0200 Message-ID: <20260605183501.3884950-6-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 
quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" The memory retrieval logic is spread out across kho_mem_retrieve() and kho_memory_init(). The incoming scratch area is initialized at kho_memory_init(), and the error handling is done there too. Consolidate all this logic into kho_mem_retrieve() to make the code cleaner. Signed-off-by: Pratyush Yadav (Google) --- kernel/liveupdate/kexec_handover.c | 31 ++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 61e436f5077e..7e556afae283 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1412,8 +1412,13 @@ int kho_retrieve_subtree(const char *name, phys_addr= _t *phys, size_t *size) } EXPORT_SYMBOL_GPL(kho_retrieve_subtree); =20 -static int __init kho_mem_retrieve(const void *fdt) +static void __init kho_mem_retrieve(void) { + const void *fdt =3D kho_get_fdt(); + int err; + + kho_scratch =3D phys_to_virt(kho_in.scratch_phys); + /* * kho_get_mem_map() should always succeed. If it fails, kho_populate() * catches that and never sets kho_in.scratch_phys, which stops memory @@ -1421,8 +1426,16 @@ static int __init kho_mem_retrieve(const void *fdt) */ kho_in.radix_tree.root =3D kho_get_mem_map(fdt); mutex_init(&kho_in.radix_tree.lock); - return kho_radix_walk_tree(&kho_in.radix_tree, - kho_preserved_memory_reserve); + + err =3D kho_radix_walk_tree(&kho_in.radix_tree, kho_preserved_memory_rese= rve); + if (err) { + /* + * Failed to initialize preserved memory. Clear FDT and radix + * so KHO users don't treat it as a KHO boot. 
+ */ + kho_in.fdt_phys =3D 0; + kho_in.radix_tree.root =3D NULL; + } } =20 static __init int kho_out_fdt_setup(void) @@ -1626,16 +1639,10 @@ fs_initcall(kho_init); =20 void __init kho_memory_init(void) { - if (kho_in.scratch_phys) { - kho_scratch =3D phys_to_virt(kho_in.scratch_phys); - - if (kho_mem_retrieve(kho_get_fdt())) { - kho_in.fdt_phys =3D 0; - kho_in.radix_tree.root =3D NULL; - } - } else { + if (kho_in.scratch_phys) + kho_mem_retrieve(); + else kho_reserve_scratch(); - } } =20 void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len, --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9EEC03E3152 for ; Fri, 5 Jun 2026 18:35:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684529; cv=none; b=U5Gaz43T34spHU1SG500dPDbBoIqIIaGq3NjmV1rVzSGqyYfwP8FnFOigT7v8sPWxpF+WQ5TznMJKUDeY1p7usWBYBNGWuQR2SsxcEZh8WA3KQeSFR+TCMy2VsQDrrtpvVJliZ61srCB7XIpgdjKg/lOYvUdTcD+QXBrO8KM8e8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684529; c=relaxed/simple; bh=tSSQtFStz452Pkv/PmTb4DwMU7lzbc43oOnJYfN5GT4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RaR7gzSbSig9mRIOrDO04CJ1CJ0L3MJEImg42wEzyVBAqLW2phThZlqlctyADzBek5JrhPDPe0Cfwz87HoeXSy7TDdEtwlHeXRe3p0g/NkZUpt7O1C0WHpUeYYH0COLSTM1QMvnIyKUiMo60jDgPyRjgiCz1aUyZUCH83iwrnBQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=AJSBHGNR; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org 
header.i=@kernel.org header.b="AJSBHGNR" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1CF961F0089A; Fri, 5 Jun 2026 18:35:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684527; bh=sy6ojk1d+tdK/kdjiYdwMCtmPtof9JNqEAxHhL3pkyA=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=AJSBHGNRx27gBO59IJQX64bjQPvI1lQz/ZrLOX8eFzLfRcasMMQ5o5FADulVTfKgQ ttkoP2/FH7ji786zuFkiEhLxKcKDHPpsTdKxumBRrExnnUhzl5bMtXsSWvZtjcAzPG J0Y+o+5bKwOjoFj63tZcCk8pPoneEoNkdE/vuo/hOix7mndykoveQ8afXIZgG6oGbA zqRdEDlzG7rVlpmMiKRyV+BFZ3OaKYlMGa0cAXndLG6oy8yxnQ7kkT+qfmmFdfxxpT zxf0yCmyTuKUPGSxyi7N8I22P0B+SgxUvh6PRdAtZD/OlS4vTSvK3q09waOS9rZWz0 WoqZbaFpehRUQ== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 06/18] kho: add a struct for radix callbacks Date: Fri, 5 Jun 2026 20:34:39 +0200 Message-ID: <20260605183501.3884950-7-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" A future commit will add more callbacks for the KHO radix tree. Add a struct for collecting the callbacks. 
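For illustration only, the callback-struct pattern can be sketched in userspace C as below. This is not the kernel code: toy_walk_cb, toy_walk_leaf(), count_key() and stop_at_first() are hypothetical names, and a flat bitmap stands in for a radix tree leaf. The sketch shows the two properties this patch relies on: a callback member left NULL is skipped, and the first non-zero return value stops the walk and is handed back to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kho_radix_walk_cb. */
struct toy_walk_cb {
	int (*leaf)(unsigned long key);	/* optional, may be NULL */
};

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Walk one bitmap "leaf". Each set bit i is reported as (base | i).
 * A non-zero return from the callback stops the walk and is returned.
 */
static int toy_walk_leaf(const unsigned long *bitmap, size_t nbits,
			 unsigned long base, const struct toy_walk_cb *cb)
{
	size_t i;
	int err;

	assert(cb);
	if (!cb->leaf)
		return 0;	/* nothing to call, the walk trivially succeeds */

	for (i = 0; i < nbits; i++) {
		if (!(bitmap[i / TOY_BITS_PER_LONG] &
		      (1UL << (i % TOY_BITS_PER_LONG))))
			continue;
		err = cb->leaf(base | i);
		if (err)
			return err;
	}
	return 0;
}

static unsigned long toy_seen;	/* number of keys reported so far */

/* Example callback: count every key and keep walking. */
static int count_key(unsigned long key)
{
	(void)key;
	toy_seen++;
	return 0;
}

/* Example callback: abort the walk on the first key. */
static int stop_at_first(unsigned long key)
{
	(void)key;
	return -1;
}
```

Collecting the callbacks in a struct means a later member can be added without touching the signature of every walk helper; callers that do not care about the new member simply leave it unset.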
Signed-off-by: Pratyush Yadav (Google) --- include/linux/kho_radix_tree.h | 15 ++++++++++++--- kernel/liveupdate/kexec_handover.c | 27 +++++++++++++++------------ 2 files changed, 27 insertions(+), 15 deletions(-) diff --git a/include/linux/kho_radix_tree.h b/include/linux/kho_radix_tree.h index f368f3b9f923..426a9cc9bcde 100644 --- a/include/linux/kho_radix_tree.h +++ b/include/linux/kho_radix_tree.h @@ -34,14 +34,23 @@ struct kho_radix_tree { struct mutex lock; /* protects the tree's structure and root pointer */ }; =20 -typedef int (*kho_radix_tree_walk_callback_t)(unsigned long key); +/** + * struct kho_radix_walk_cb - Callbacks for KHO radix tree walk. + * @leaf: Called on each present key in the radix tree. + * + * For each callback, a return value of 0 continues the walk and a non-zero + * return value is directly returned to the caller. + */ +struct kho_radix_walk_cb { + int (*leaf)(unsigned long key); +}; =20 #ifdef CONFIG_KEXEC_HANDOVER =20 int kho_radix_add_key(struct kho_radix_tree *tree, unsigned long key); void kho_radix_del_key(struct kho_radix_tree *tree, unsigned long key); int kho_radix_walk_tree(struct kho_radix_tree *tree, - kho_radix_tree_walk_callback_t cb); + const struct kho_radix_walk_cb *cb); =20 #else /* #ifdef CONFIG_KEXEC_HANDOVER */ =20 @@ -54,7 +63,7 @@ static inline void kho_radix_del_key(struct kho_radix_tre= e *tree, unsigned long key) { } =20 static inline int kho_radix_walk_tree(struct kho_radix_tree *tree, - kho_radix_tree_walk_callback_t cb) + const struct kho_radix_walk_cb *cb) { return -EOPNOTSUPP; } diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 7e556afae283..dbe075348ce4 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -278,16 +278,18 @@ void kho_radix_del_key(struct kho_radix_tree *tree, u= nsigned long key) } EXPORT_SYMBOL_GPL(kho_radix_del_key); =20 -static int kho_radix_walk_leaf(struct kho_radix_leaf *leaf, - unsigned long key, - 
kho_radix_tree_walk_callback_t cb) +static int kho_radix_walk_leaf(struct kho_radix_leaf *leaf, unsigned long = key, + const struct kho_radix_walk_cb *cb) { unsigned long *bitmap =3D (unsigned long *)leaf; unsigned int i; int err; =20 + if (!cb->leaf) + return 0; + for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) { - err =3D cb(key | i); + err =3D cb->leaf(key | i); if (err) return err; } @@ -297,7 +299,7 @@ static int kho_radix_walk_leaf(struct kho_radix_leaf *l= eaf, =20 static int __kho_radix_walk_tree(struct kho_radix_node *root, unsigned int level, unsigned long start, - kho_radix_tree_walk_callback_t cb) + const struct kho_radix_walk_cb *cb) { struct kho_radix_node *node; struct kho_radix_leaf *leaf; @@ -337,18 +339,16 @@ static int __kho_radix_walk_tree(struct kho_radix_nod= e *root, /** * kho_radix_walk_tree - Traverses the radix tree and calls a callback for= each key. * @tree: A pointer to the KHO radix tree to walk. - * @cb: A callback function of type kho_radix_tree_walk_callback_t that wi= ll be - * invoked for each key in the tree. + * @cb: Set of callbacks to be invoked during the tree walk. * - * This function walks the radix tree, searching from the specified top le= vel - * down to the lowest level (level 0). For each key found, it invokes the - * provided callback. + * This function walks the radix tree, searching from the top level down t= o the + * lowest level (level 0), invoking the appropriate callbacks. * * Return: 0 if the walk completed the specified tree, or the non-zero ret= urn * value from the callback that stopped the walk. 
*/ int kho_radix_walk_tree(struct kho_radix_tree *tree, - kho_radix_tree_walk_callback_t cb) + const struct kho_radix_walk_cb *cb) { if (WARN_ON_ONCE(!tree->root)) return -EINVAL; @@ -1414,6 +1414,9 @@ EXPORT_SYMBOL_GPL(kho_retrieve_subtree); =20 static void __init kho_mem_retrieve(void) { + const struct kho_radix_walk_cb cb =3D { + .leaf =3D kho_preserved_memory_reserve, + }; const void *fdt =3D kho_get_fdt(); int err; =20 @@ -1427,7 +1430,7 @@ static void __init kho_mem_retrieve(void) kho_in.radix_tree.root =3D kho_get_mem_map(fdt); mutex_init(&kho_in.radix_tree.lock); =20 - err =3D kho_radix_walk_tree(&kho_in.radix_tree, kho_preserved_memory_rese= rve); + err =3D kho_radix_walk_tree(&kho_in.radix_tree, &cb); if (err) { /* * Failed to initialize preserved memory. Clear FDT and radix --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2ACC3D34A1 for ; Fri, 5 Jun 2026 18:35:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684531; cv=none; b=SvSH9t3MCKeVhF6YJj2DEn8Xk7Pfl6QUF1ZEwWn6sS1EfwGLNtZo3u79Ccp9OKsWkgV9glowJsmCI3XBPENeKdFhF7WqdGuRfQ9xcyYaqUi0p6zjqpEZQHVhugoffO5XoQjX8V5OGsD03thC9Zb2vQuBq0I4ukg4ZuE7vCJr/GM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684531; c=relaxed/simple; bh=ZmHYrRn2dd+FHcK0oWeoUFnAb2C1iqcvi8A5/qBbgsM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UP1KW5Hrj27L0UCKSx0Z5JlTdlUFcR7xmapWrT2E66HLQymvuE9j6VLCRB2WcNF2jQTpOsSnc0RDvqh7DEo7LmfDSWkDrmSJcjHevqa4GyM8FIIcfiZBvX4s9TTxWq638tvK+ESqLCtL4StXGxXnL2YmoO7mEceLjyXPoL9k/uY= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CgCduHIB; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CgCduHIB" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8D7C21F00899; Fri, 5 Jun 2026 18:35:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684529; bh=//G582QmL83eTkSHOXL243u6P2jYvjAZreKxz5CziEA=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=CgCduHIBqIPIikUteN8s7UcrkafZ328qitJTMRx9Gja3CtmQtqozZSz1Zo1cI3Q4a 0mf/tCWCSTcijpLqHNl9TEhAutmrsCk9GRJ+SYcGitEs482y4PCrL1BVdLCOEG9Nw4 5czURS+nx7/D/QTH4gdNW3LVBzzZLdXwdHsDgxSUn2+44JmtFlaUo9q6aiDvBG+Lgb qPA/1r89XudaEhwvOQFP0XGpFLG7C2ZCEeP6SmQ0AYVC9gh3MrgQoPi/Kr2z+gUrHP JdsoPk7nqhko/8Pvm3dp/u6IoE848ykhluss6wPDk/Al0xWHheka8i27MJ4VZZAOi3 LRjysH+vsMxAA== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 07/18] kho: add callback for table pages Date: Fri, 5 Jun 2026 20:34:40 +0200 Message-ID: <20260605183501.3884950-8-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" The KHO memory preservation radix tree does not mark the table pages themselves as preserved. This is done to avoid a circular dependency where preserving a page can lead to allocating other preserved pages.
This means any walker looking for free ranges of memory outside of scratch areas will ignore the table pages. Add a table callback that is invoked for each table page. The callback is given the physical address of the table page. This is useful for the upcoming mechanism that discovers blocks of memory with no preserved pages and lets them be used for boot memory. Another use case is for users of the radix tree other than KHO itself. The radix tree does not preserve its own pages due to the circular dependency described above. But external users of the radix tree would need to preserve and restore their pages for the radix tree to survive past early boot. They can use this callback to do so. Signed-off-by: Pratyush Yadav (Google) --- include/linux/kho_radix_tree.h | 3 +++ kernel/liveupdate/kexec_handover.c | 12 ++++++++++++ 2 files changed, 15 insertions(+) diff --git a/include/linux/kho_radix_tree.h b/include/linux/kho_radix_tree.h index 426a9cc9bcde..ac7ba7e567e1 100644 --- a/include/linux/kho_radix_tree.h +++ b/include/linux/kho_radix_tree.h @@ -37,12 +37,15 @@ struct kho_radix_tree { /** * struct kho_radix_walk_cb - Callbacks for KHO radix tree walk. * @leaf: Called on each present key in the radix tree. + * @node: Called on each node of the radix tree itself. Receives the + * physical address of the page containing the node. * * For each callback, a return value of 0 continues the walk and a non-zero * return value is directly returned to the caller.
*/ struct kho_radix_walk_cb { int (*leaf)(unsigned long key); + int (*node)(phys_addr_t phys); }; =20 #ifdef CONFIG_KEXEC_HANDOVER diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index dbe075348ce4..94f18fe42c4b 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -285,6 +285,12 @@ static int kho_radix_walk_leaf(struct kho_radix_leaf *= leaf, unsigned long key, unsigned int i; int err; =20 + if (cb->node) { + err =3D cb->node(virt_to_phys(leaf)); + if (err) + return err; + } + if (!cb->leaf) return 0; =20 @@ -307,6 +313,12 @@ static int __kho_radix_walk_tree(struct kho_radix_node= *root, unsigned int shift; int err; =20 + if (cb->node) { + err =3D cb->node(virt_to_phys(root)); + if (err) + return err; + } + for (i =3D 0; i < PAGE_SIZE / sizeof(phys_addr_t); i++) { if (!root->table[i]) continue; --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5CA4E3E4C70 for ; Fri, 5 Jun 2026 18:35:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684534; cv=none; b=SlDWfyjhorBPWPJnikvKHFdexgWKXg/WyR6pybHW2SiNLJjImAvIkFlqiaLtpau7Y6Awt6/asf81dn3PhRQoDlhT15Y6ppms8r163qkNDzRG76gvEfDMUe+400yE+26e/Du8YJrn3iDPUESXXSLF1WsNyoSBUyRarw5lMbFEgN8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684534; c=relaxed/simple; bh=XOhw6LJKqqPTdd0dLRqGmLz0AM3gXn8BCu+xJjX9DtM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=MvGjc+Z2c1dVhOcwNG6gCY84Gur7w3dKmKzBoJEhaZh26evt69ieYcwnA0xv1il1M1/nWKfDe6dFHoofW0rb0udvUGHri2HDHJbYO4cYln+PoJs9FZlr/jD9ZVCCC92yexJ4kXBrjj/bY1D/G4BUp2fqp/xLjH4LANQsQ348uh8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ThcMUXFq; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ThcMUXFq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0BAF01F0089A; Fri, 5 Jun 2026 18:35:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684532; bh=8sc/XRFrwm98oAtw7zVK819UUwz2MaF8BMTpgUqYNBE=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=ThcMUXFqJZ0AnThNzQZ/aK2AU9Xcxnfw9S4wGdBy7fsBsO8LAKZokAFOSjxWzk5sE Kd1A+4xWaypjRz/sJK7jlcnUhA0B6oob+CyqswxcuwebDx5RSgI8tuA0zK4QJUEq9n z0iOZAjfKbFC3oDzTVuUW/3y+OXxJ83Tjiv+teJqXxPP/Pnh7MNa7kc21xAFhFvhH9 PhsokDfB98GmGPz1WEuwQtz0K2ydZPOTA+rseSP09+apgqbjhcvCV1q8c/RmE/RKhA 22LsEmd1YPMIb29jGGOZ9D8Sey5mZv4h7ZIzv9R+jjFH8yPC5Auur1DE33jEz4aP4B rGXv4/cKms9FQ== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 08/18] kho: add data argument to radix walk callback Date: Fri, 5 Jun 2026 20:34:41 +0200 Message-ID: <20260605183501.3884950-9-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Add an 
opaque data pointer argument to the callbacks in struct kho_radix_walk_cb and to kho_radix_walk_tree(). This can be used by callers to pass extra information to the callback. Reviewed-by: Pasha Tatashin Signed-off-by: Pratyush Yadav (Google) --- include/linux/kho_radix_tree.h | 8 ++++---- kernel/liveupdate/kexec_handover.c | 24 +++++++++++++----------- 2 files changed, 17 insertions(+), 15 deletions(-) diff --git a/include/linux/kho_radix_tree.h b/include/linux/kho_radix_tree.h index ac7ba7e567e1..4138621e0e87 100644 --- a/include/linux/kho_radix_tree.h +++ b/include/linux/kho_radix_tree.h @@ -44,8 +44,8 @@ struct kho_radix_tree { * return value is directly returned to the caller. */ struct kho_radix_walk_cb { - int (*leaf)(unsigned long key); - int (*node)(phys_addr_t phys); + int (*leaf)(unsigned long key, void *data); + int (*node)(phys_addr_t phys, void *data); }; =20 #ifdef CONFIG_KEXEC_HANDOVER @@ -53,7 +53,7 @@ struct kho_radix_walk_cb { int kho_radix_add_key(struct kho_radix_tree *tree, unsigned long key); void kho_radix_del_key(struct kho_radix_tree *tree, unsigned long key); int kho_radix_walk_tree(struct kho_radix_tree *tree, - const struct kho_radix_walk_cb *cb); + const struct kho_radix_walk_cb *cb, void *data); =20 #else /* #ifdef CONFIG_KEXEC_HANDOVER */ =20 @@ -66,7 +66,7 @@ static inline void kho_radix_del_key(struct kho_radix_tre= e *tree, unsigned long key) { } =20 static inline int kho_radix_walk_tree(struct kho_radix_tree *tree, - const struct kho_radix_walk_cb *cb) + const struct kho_radix_walk_cb *cb, void *data) { return -EOPNOTSUPP; } diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 94f18fe42c4b..b890a69bddd5 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -279,14 +279,14 @@ void kho_radix_del_key(struct kho_radix_tree *tree, u= nsigned long key) EXPORT_SYMBOL_GPL(kho_radix_del_key); =20 static int kho_radix_walk_leaf(struct kho_radix_leaf *leaf, unsigned long = key, - const struct kho_radix_walk_cb *cb)
+ const struct kho_radix_walk_cb *cb, void *data) { unsigned long *bitmap =3D (unsigned long *)leaf; unsigned int i; int err; =20 if (cb->node) { - err =3D cb->node(virt_to_phys(leaf)); + err =3D cb->node(virt_to_phys(leaf), data); if (err) return err; } @@ -295,7 +295,7 @@ static int kho_radix_walk_leaf(struct kho_radix_leaf *l= eaf, unsigned long key, return 0; =20 for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) { - err =3D cb->leaf(key | i); + err =3D cb->leaf(key | i, data); if (err) return err; } @@ -305,7 +305,7 @@ static int kho_radix_walk_leaf(struct kho_radix_leaf *l= eaf, unsigned long key, =20 static int __kho_radix_walk_tree(struct kho_radix_node *root, unsigned int level, unsigned long start, - const struct kho_radix_walk_cb *cb) + const struct kho_radix_walk_cb *cb, void *data) { struct kho_radix_node *node; struct kho_radix_leaf *leaf; @@ -314,7 +314,7 @@ static int __kho_radix_walk_tree(struct kho_radix_node = *root, int err; =20 if (cb->node) { - err =3D cb->node(virt_to_phys(root)); + err =3D cb->node(virt_to_phys(root), data); if (err) return err; } @@ -335,10 +335,10 @@ static int __kho_radix_walk_tree(struct kho_radix_nod= e *root, * node is pointing to the level 0 bitmap. */ leaf =3D (struct kho_radix_leaf *)node; - err =3D kho_radix_walk_leaf(leaf, key, cb); + err =3D kho_radix_walk_leaf(leaf, key, cb, data); } else { err =3D __kho_radix_walk_tree(node, level - 1, - key, cb); + key, cb, data); } =20 if (err) @@ -352,6 +352,7 @@ static int __kho_radix_walk_tree(struct kho_radix_node = *root, * kho_radix_walk_tree - Traverses the radix tree and calls a callback for= each key. * @tree: A pointer to the KHO radix tree to walk. * @cb: Set of callbacks to be invoked during the tree walk. + * @data: Opaque data pointer passed to each callback in @cb. * * This function walks the radix tree, searching from the top level down t= o the * lowest level (level 0), invoking the appropriate callbacks. 
@@ -360,14 +361,15 @@ static int __kho_radix_walk_tree(struct kho_radix_nod= e *root, * value from the callback that stopped the walk. */ int kho_radix_walk_tree(struct kho_radix_tree *tree, - const struct kho_radix_walk_cb *cb) + const struct kho_radix_walk_cb *cb, void *data) { if (WARN_ON_ONCE(!tree->root)) return -EINVAL; =20 guard(mutex)(&tree->lock); =20 - return __kho_radix_walk_tree(tree->root, KHO_TREE_MAX_DEPTH - 1, 0, cb); + return __kho_radix_walk_tree(tree->root, KHO_TREE_MAX_DEPTH - 1, 0, cb, + data); } EXPORT_SYMBOL_GPL(kho_radix_walk_tree); =20 @@ -498,7 +500,7 @@ static struct page *__init kho_get_preserved_page(phys_= addr_t phys, return pfn_to_page(pfn); } =20 -static int __init kho_preserved_memory_reserve(unsigned long key) +static int __init kho_preserved_memory_reserve(unsigned long key, void *da= ta) { union kho_page_info info; struct page *page; @@ -1442,7 +1444,7 @@ static void __init kho_mem_retrieve(void) kho_in.radix_tree.root =3D kho_get_mem_map(fdt); mutex_init(&kho_in.radix_tree.lock); =20 - err =3D kho_radix_walk_tree(&kho_in.radix_tree, &cb); + err =3D kho_radix_walk_tree(&kho_in.radix_tree, &cb, NULL); if (err) { /* * Failed to initialize preserved memory. 
Clear FDT and radix --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DB1823ECBD9 for ; Fri, 5 Jun 2026 18:35:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684537; cv=none; b=STbhORQO8YJTmrOXrVzGhhNSVTRZYiN+ypi6G1VNf0sxjdxf+BGYtJrDAHByMkgWepQ5nI+GxmjK9Ge2vI8v1OD2ImGfSc6H+7cJf5Wm1o/+HXFKAdl9FEwKkKQTz43jJH9VeXwfvTZImzW+x1P/b9YeUo3pF3roNfHvqfgQI0s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684537; c=relaxed/simple; bh=ZMus/+BO2Pl/1YIYSzuEuwEsLxaTcyn5kzeUqNBgxic=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=O3CTxkDSfPykzjeFO+ebfuuv7Qe42sCKwJ5lJ7w5ZjlhWj0z5I4tDY+8mU5UwCmqGz+6mqpXT2P9BgKcZtuynoOC+r7VENaNNaiYwf2gCwzaf7PzI7HghJRGW9A/IEMAwu/4zJ1QgZx5X3rX2c/CvyvAHuW591UFuj9VQCUVXcs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CsY8nuFv; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CsY8nuFv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7D1E61F00893; Fri, 5 Jun 2026 18:35:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684534; bh=qTphUicBGxw+ljeZ1ORDyDZbANnd1GB5XieWEaaUYI4=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=CsY8nuFvQ+DgMmh65ei90bv2MYJd/hBWXIrE9XLIGEI3BzPrS55ccke1KugAEMr2q fsdWabsyi+i/ee4Z8u2T9dKUwYd58AyFTYmr8UnT42taM0KChOZBTjJOdeQQi6jlFC 3EsQ31i+xm4ZWUiNYAUZfxIQ9gGyeaX3srtIp1duNND4yxhs38ecdPxADjb0OQ53iW 
jqRJKAnYn9hBvxXyDSO7OaF5gChGb97hTvdJLZq8vTswF3ve9ZfWZpDjeyMBalUmEq P+h7YcomJuRpKPQ/Np5eVsDVoF2nrAmnq1Wvqelv8Fas9UhoJi+Xw/UPi/z1DMoFtZ ABTaW9NCd4zXQ== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 09/18] kho: allow early-boot usage of the KHO radix tree Date: Fri, 5 Jun 2026 20:34:42 +0200 Message-ID: <20260605183501.3884950-10-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" The KHO radix tree allocates memory for table pages from the buddy allocator using get_zeroed_page(). This is not available in early boot when memblock is still active. Using the radix tree in early boot is useful for KHO to track metadata about its memory. One such example is for tracking free blocks for memory allocation when scratch runs out of space. This feature will be added in the following commits. Add kho_radix_{alloc,free}_node() which allocate and free the table pages. They use slab_is_available() to decide which allocator to use. While slab_is_available() indicates availability of the slab allocator, it gets initialized right after buddy so it serves the same practical purpose. 
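As a rough userspace illustration of the phase-dependent allocation described above (a sketch under stated assumptions, not the kernel code): toy_late_boot stands in for slab_is_available(), a small bump arena stands in for memblock, and calloc()/free() stand in for get_zeroed_page()/free_page(). All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Stand-in for slab_is_available(); flipped once "late boot" is reached. */
static bool toy_late_boot;

/* Tiny bump arena standing in for the early-boot allocator (memblock). */
static unsigned char toy_arena[4 * TOY_PAGE_SIZE];
static size_t toy_arena_used;

/* Hand out the next zeroed page-sized chunk of the arena, or NULL if full. */
static void *toy_early_alloc(void)
{
	void *p;

	if (toy_arena_used + TOY_PAGE_SIZE > sizeof(toy_arena))
		return NULL;
	p = toy_arena + toy_arena_used;
	toy_arena_used += TOY_PAGE_SIZE;
	memset(p, 0, TOY_PAGE_SIZE);
	return p;
}

/* Return one zeroed page-sized node from whichever allocator is usable. */
static void *toy_alloc_node(void)
{
	if (toy_late_boot)
		return calloc(1, TOY_PAGE_SIZE);
	return toy_early_alloc();
}

/*
 * Release a node using the allocator that matches the current phase.
 * In this toy, that is only valid for nodes allocated in the same phase.
 */
static void toy_free_node(void *node)
{
	assert(node != NULL);
	if (toy_late_boot)
		free(node);
	/* The toy arena never reclaims space; nothing to do in early boot. */
}
```

Both paths return zeroed memory, so callers of the allocation helper do not need to know which phase they run in.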
Reviewed-by: Pasha Tatashin Signed-off-by: Pratyush Yadav (Google) --- kernel/liveupdate/kexec_handover.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index b890a69bddd5..452b4dcdf2d2 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -143,6 +143,26 @@ static unsigned long kho_radix_get_table_index(unsigne= d long key, return (key >> s) % (1 << KHO_TABLE_SIZE_LOG2); } =20 +static void __ref *kho_radix_alloc_node(void) +{ + struct kho_radix_node *node; + + if (slab_is_available()) + node =3D (struct kho_radix_node *)get_zeroed_page(GFP_KERNEL); + else + node =3D memblock_alloc(PAGE_SIZE, PAGE_SIZE); + + return node; +} + +static void __ref kho_radix_free_node(struct kho_radix_node *node) +{ + if (slab_is_available()) + free_page((unsigned long)node); + else + memblock_free(node, PAGE_SIZE); +} + /** * kho_radix_add_key - Add a key to the radix tree. * @tree: The KHO radix tree. 
@@ -191,7 +211,7 @@ int kho_radix_add_key(struct kho_radix_tree *tree, unsi= gned long key) } =20 /* Next node is empty, create a new node for it */ - new_node =3D (struct kho_radix_node *)get_zeroed_page(GFP_KERNEL); + new_node =3D kho_radix_alloc_node(); if (!new_node) { err =3D -ENOMEM; goto err_free_nodes; @@ -222,7 +242,7 @@ int kho_radix_add_key(struct kho_radix_tree *tree, unsi= gned long key) err_free_nodes: for (i =3D KHO_TREE_MAX_DEPTH - 1; i > 0; i--) { if (intermediate_nodes[i]) - free_page((unsigned long)intermediate_nodes[i]); + kho_radix_free_node(intermediate_nodes[i]); } if (anchor_node) anchor_node->table[anchor_idx] =3D 0; --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 57CE43F23C9 for ; Fri, 5 Jun 2026 18:35:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684541; cv=none; b=RDkZpO7a1gpXkiW3tbL8U3fC/tR1wDj9TgLCqDkUCGvH43qm38KZ9gvSF8lO4v+8lMbDnIRsNtQXieJhyOwWL6m+JMt5hZv5u3nCeNLyu1y9h93mDF1xf/5rE3N9NXvlf/tq6YQx2fVOdLOF/rvb5CAzTfSBD9Qw+AjKLJ6PGbM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684541; c=relaxed/simple; bh=7WIR+aexSSPvq4UNHWiG9pWiz/TK26aFSuPVAtN9hE0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=X6zYDxYKDMCLBfoTwgWrwPQaonCcKPHUJ4b/oW+Myp9inMofBgL1AjIrV24lt40nBiNTbyybtJNMHIDA2jW/OHlVgRttPDUN39DkU1f2QUdNEhxjvC6nLnJd12EDBe3z/Lo0VXhUIZbJbxjFkV4iTMUZs27oNsddVg1HuTSB2VE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Ck4NKjcv; arc=none smtp.client-ip=100.103.45.18 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Ck4NKjcv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EE3EB1F00898; Fri, 5 Jun 2026 18:35:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684537; bh=W+ieLiROXsrjmcbXie0/1ghO8w8NO2+45uMA2SUyf88=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=Ck4NKjcv6KT67E4X0JZGiF3AgMSytQ+F9NG1+2UKCsfDerJ/WrWzB/nS8VfyWK2Hw yFviIheKsUScGgvfY+ZYCjZUfgxyQZab2SzgqXHccaRDIwn/TlVb3uwq56Nf0Q5gWM syPLIRi6Q+OMHb8FveP9D7hHXSdLQtDmTRY2Vdxh7oGq5x20RMa3oRtxe//x2RpbFa YQHRhzgcHATS8y4uSmLtyimCqznQM++24sXBO1Q5wmbvG+J+8Ku/0XaAYJ8rovnvTS lndZjmCbXaR0SodxcsI8bTiW93QerK8HEPpzgxpUQZTsfFs/m5BD+uOg5Zi4UF5A1T 0hbWpEPKoKMXw== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 10/18] kho: allow destroying KHO radix tree Date: Fri, 5 Jun 2026 20:34:43 +0200 Message-ID: <20260605183501.3884950-11-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Add kho_radix_destroy_tree() which allows destroying the radix tree and freeing all its pages. This will be used by the upcoming scratch extension mechanism. It creates a radix tree to track free blocks and then frees them after telling memblock about them.
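The teardown order can be sketched in userspace C as follows. This is only an illustration with hypothetical names (toy_node, toy_new_node(), toy_destroy(), toy_freed) and a small fanout; it stores plain pointers, whereas the kernel tree stores physical addresses and converts them with phys_to_virt(). The point it shows is that children are freed before their parent table, and that a level 0 node is treated as a leaf whose contents are never followed as pointers.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_FANOUT 4	/* entries per table; the kernel tree uses a full page */
#define TOY_DEPTH  3	/* levels 2 and 1 are tables, level 0 is a leaf */

struct toy_node {
	struct toy_node *table[TOY_FANOUT];
};

static unsigned int toy_freed;	/* nodes released so far, for illustration */

/* Allocate one zeroed node; returns NULL on allocation failure. */
static struct toy_node *toy_new_node(void)
{
	return calloc(1, sizeof(struct toy_node));
}

/* Free the subtree rooted at @node: children first, then the node itself. */
static void toy_destroy(struct toy_node *node, unsigned int level)
{
	unsigned int i;

	assert(node);
	if (level > 0) {
		for (i = 0; i < TOY_FANOUT; i++) {
			if (node->table[i])
				toy_destroy(node->table[i], level - 1);
		}
	}
	/* Level 0 is a leaf: its contents are data, not child pointers. */
	free(node);
	toy_freed++;
}
```

A caller would pass the root together with the top level, for example toy_destroy(root, TOY_DEPTH - 1), and then clear its own root pointer so the freed tree cannot be walked again.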
Reviewed-by: Pasha Tatashin Signed-off-by: Pratyush Yadav (Google) --- include/linux/kho_radix_tree.h | 3 +++ kernel/liveupdate/kexec_handover.c | 35 ++++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+) diff --git a/include/linux/kho_radix_tree.h b/include/linux/kho_radix_tree.h index 4138621e0e87..66ca936b3f06 100644 --- a/include/linux/kho_radix_tree.h +++ b/include/linux/kho_radix_tree.h @@ -54,6 +54,7 @@ int kho_radix_add_key(struct kho_radix_tree *tree, unsign= ed long key); void kho_radix_del_key(struct kho_radix_tree *tree, unsigned long key); int kho_radix_walk_tree(struct kho_radix_tree *tree, const struct kho_radix_walk_cb *cb, void *data); +void kho_radix_destroy_tree(struct kho_radix_tree *tree); =20 #else /* #ifdef CONFIG_KEXEC_HANDOVER */ =20 @@ -71,6 +72,8 @@ static inline int kho_radix_walk_tree(struct kho_radix_tr= ee *tree, return -EOPNOTSUPP; } =20 +static inline void kho_radix_destroy_tree(struct kho_radix_tree *tree) { } + #endif /* #ifdef CONFIG_KEXEC_HANDOVER */ =20 #endif /* _LINUX_KHO_RADIX_TREE_H */ diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 452b4dcdf2d2..df3f5eb01bf1 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -298,6 +298,41 @@ void kho_radix_del_key(struct kho_radix_tree *tree, un= signed long key) } EXPORT_SYMBOL_GPL(kho_radix_del_key); =20 +static void __kho_radix_destroy_tree(struct kho_radix_node *root, + unsigned int level) +{ + unsigned long i; + + if (level =3D=3D 0) { + kho_radix_free_node(root); + return; + } + + for (i =3D 0; i < PAGE_SIZE / sizeof(phys_addr_t); i++) { + if (root->table[i]) + __kho_radix_destroy_tree(phys_to_virt(root->table[i]), + level - 1); + } + + kho_radix_free_node(root); +} + +/** + * kho_radix_destroy_tree - Destroy the radix tree + * @tree: The radix tree to destroy + * + * Walk @tree and free all its nodes. 
+ */ +void kho_radix_destroy_tree(struct kho_radix_tree *tree) +{ + if (!tree->root) + return; + + __kho_radix_destroy_tree(tree->root, KHO_TREE_MAX_DEPTH - 1); + tree->root =3D NULL; +} +EXPORT_SYMBOL_GPL(kho_radix_destroy_tree); + static int kho_radix_walk_leaf(struct kho_radix_leaf *leaf, unsigned long = key, const struct kho_radix_walk_cb *cb, void *data) { --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C948C3EFFA5 for ; Fri, 5 Jun 2026 18:35:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684542; cv=none; b=EAtUHGQ0VDG7Dm5VV2wUrg44oHxj9fOBcT4KYlCsVyFuo2WG1Lggla5ZgNJO2NQ4Qe2YXY21rTS4P0y9rWke79UKooBJ/mZdEixhpUCTaxBdoJRWO8kbVzQVncT3NOocpvAun2ne3H+JdHTnSsaHWgyGYoUbRJA0OBuhDi6vLEY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684542; c=relaxed/simple; bh=LYhtDPZ7TaQSQFwKb1ftfd0EbDiUHnJjkobM7m/o0z0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MCBX3h2JEwqbD0RBJY0EF+8F1EHkZbNuewL4SMusbe9VCsPQoZiK+pE1eVVYh8/0GFI27aPwrtp/T+TLZWVITppro/SSZYxRc5u0dwd/VqNrfqnEucWhRTFQ/9Llaw7EICq66NOSavObH+NhyhjQcZ4jbWNAoN+0iN21tzzfE38= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TXKNcMEK; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TXKNcMEK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6B9811F00893; Fri, 5 Jun 2026 18:35:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=kernel.org; s=k20260515; t=1780684539; bh=ZkWvf8Em4F4xpQ20Sn9ei7Pk/pLFMxxTuWJ2jy/x258=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=TXKNcMEK8ZHWObM629hgW5Sx/MnPTwMvjNI5ep5QCLxUwosJq3IHU+PK8DMXF0K+X nZ5jVgzl1ww8zJaj5c/HwyY1iOrlm8X6ygGSsszb/sqBbYLsABo9zI/r4nR6JN1+O6 AcYhdl4OvZw+wwDBZQ3t8kZPD3ro8dMdpuh0M50mSFJIpf43lAhtxxnfwXTf58U5N5 TRtIu0qUUY2HBJiekh/uJo0gNTV8bbDOiDOvc/NGbCBrOuLNWz1V/lbZnw57RrLB3L YAXVcnYpOB39Dz4G/t0GD38KXJ3JC22B7GiOkDPSPHRJxH4YApZddskdrzbZD2YRwt oTOz7+fqNCcew== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 11/18] kho: add kho_radix_init_tree() Date: Fri, 5 Jun 2026 20:34:44 +0200 Message-ID: <20260605183501.3884950-12-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Move the initialization logic of the radix tree into kho_radix_init_tree() instead of having users open-code it. This makes the boundaries cleaner and reduces code duplication when a new user of the radix tree is added in a future commit. 
Signed-off-by: Pratyush Yadav (Google) --- include/linux/kho_radix_tree.h | 7 ++++ kernel/liveupdate/kexec_handover.c | 66 ++++++++++++++++++++++-------- 2 files changed, 55 insertions(+), 18 deletions(-) diff --git a/include/linux/kho_radix_tree.h b/include/linux/kho_radix_tree.h index 66ca936b3f06..5d6ae2893684 100644 --- a/include/linux/kho_radix_tree.h +++ b/include/linux/kho_radix_tree.h @@ -54,6 +54,7 @@ int kho_radix_add_key(struct kho_radix_tree *tree, unsign= ed long key); void kho_radix_del_key(struct kho_radix_tree *tree, unsigned long key); int kho_radix_walk_tree(struct kho_radix_tree *tree, const struct kho_radix_walk_cb *cb, void *data); +int kho_radix_init_tree(struct kho_radix_tree *tree, struct kho_radix_node= *root); void kho_radix_destroy_tree(struct kho_radix_tree *tree); =20 #else /* #ifdef CONFIG_KEXEC_HANDOVER */ @@ -72,6 +73,12 @@ static inline int kho_radix_walk_tree(struct kho_radix_t= ree *tree, return -EOPNOTSUPP; } =20 +static inline int kho_radix_init_tree(struct kho_radix_tree *tree, + struct kho_radix_node *root) +{ + return 0; +} + static inline void kho_radix_destroy_tree(struct kho_radix_tree *tree) { } =20 #endif /* #ifdef CONFIG_KEXEC_HANDOVER */ diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index df3f5eb01bf1..8ab2c7e234e1 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -317,6 +317,34 @@ static void __kho_radix_destroy_tree(struct kho_radix_= node *root, kho_radix_free_node(root); } =20 +/** + * kho_radix_init_tree - initialize the radix tree. + * @tree: the tree to initialize. + * @root: root table of the radix tree. + * + * Initialize the radix tree with the given root node. If root is %NULL, an + * empty root table is allocated. If root is not %NULL, it is the caller's + * responsibility to make sure the root is valid and in the correct format. + * + * Return: 0 on success, -errno on failure. 
+ */ +int kho_radix_init_tree(struct kho_radix_tree *tree, struct kho_radix_node= *root) +{ + /* Already initialized. */ + if (tree->root) + return 0; + + if (!root) + root =3D kho_radix_alloc_node(); + if (!root) + return -ENOMEM; + + tree->root =3D root; + mutex_init(&tree->lock); + return 0; +} +EXPORT_SYMBOL_GPL(kho_radix_init_tree); + /** * kho_radix_destroy_tree - Destroy the radix tree * @tree: The radix tree to destroy @@ -1496,18 +1524,23 @@ static void __init kho_mem_retrieve(void) * catches that and never sets kho_in.scratch_phys, which stops memory * retrieval. */ - kho_in.radix_tree.root =3D kho_get_mem_map(fdt); - mutex_init(&kho_in.radix_tree.lock); + err =3D kho_radix_init_tree(&kho_in.radix_tree, kho_get_mem_map(fdt)); + if (err) + goto err; =20 err =3D kho_radix_walk_tree(&kho_in.radix_tree, &cb, NULL); - if (err) { - /* - * Failed to initialize preserved memory. Clear FDT and radix - * so KHO users don't treat it as a KHO boot. - */ - kho_in.fdt_phys =3D 0; - kho_in.radix_tree.root =3D NULL; - } + if (err) + goto err; + + return; + +err: + /* + * Failed to initialize preserved memory. Clear FDT and radix so KHO + * users don't treat it as a KHO boot. 
+ */ + kho_in.fdt_phys =3D 0; + kho_in.radix_tree.root =3D NULL; } =20 static __init int kho_out_fdt_setup(void) @@ -1633,16 +1666,14 @@ static __init int kho_init(void) if (!kho_enable) return 0; =20 - tree->root =3D kzalloc(PAGE_SIZE, GFP_KERNEL); - if (!tree->root) { - err =3D -ENOMEM; + err =3D kho_radix_init_tree(tree, NULL); + if (err) goto err_free_scratch; - } =20 kho_out.fdt =3D kho_alloc_preserve(PAGE_SIZE); if (IS_ERR(kho_out.fdt)) { err =3D PTR_ERR(kho_out.fdt); - goto err_free_kho_radix_tree_root; + goto err_free_kho_radix_tree; } =20 err =3D kho_debugfs_init(); @@ -1693,9 +1724,8 @@ static __init int kho_init(void) =20 err_free_fdt: kho_unpreserve_free(kho_out.fdt); -err_free_kho_radix_tree_root: - kfree(tree->root); - tree->root =3D NULL; +err_free_kho_radix_tree: + kho_radix_destroy_tree(tree); err_free_scratch: kho_out.fdt =3D NULL; for (int i =3D 0; i < kho_scratch_cnt; i++) { --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 50B1C3EDE64 for ; Fri, 5 Jun 2026 18:35:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684545; cv=none; b=mVg1ehBWm5Kgb+ZYn6V85hggPMivB5Fe9a03h87yj3zV0sVr/V3cLxXH5rVpuEd2XAxgkhMBjQJzsVaOLx81P/wprFHI3UalrmAbFGST8gdCjf5gcJkKWdZzQa9tM8tat82niZniyT5aL1HdhkDdm4/Daa5KSbQjwTBuxxjy/0I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684545; c=relaxed/simple; bh=SfF9M9DuUmp6TbUqblSNcFDPXYUrpMVAIjfI/n/oID4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=cpX7Qz0SX/b/nougNkbr2DibeEgTIHyfm59w9cvMdD7pHZ5uYGZORcJMywA0r/0EheGvqfAe01wTJjXwS/V4iD5lwrFRyVKnhfYvkSdwQqkCexpGIwh1bAYBZr7cc8wVF6rI0l9vav/mJ+1u6NtRUag45zhrCaXTqN9yW2VQ5pM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CCVRNwyb; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CCVRNwyb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DCB0A1F00899; Fri, 5 Jun 2026 18:35:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684542; bh=LOiNopfyqH+evMdJ4RgcQZeJWf2liHs2K6IwtzYjoWE=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=CCVRNwyby+ygQUpFDO97aU/NeI3sJPjkf8B7Mrxw243tA405VrgkDHfzpVkBBL75D sCr3NTa3AYl5JZTLxRwqzRbnanUFMoh/VLphQ/GLQzn/4gznLTDZWd2iaQAQoc+RGh Mf/kq93HkUIqFIr1IWwHSPNVZ4jHg8gQz1vNXBMjbnx2HktU968oZ8H2jk7/8hges9 NvBQBbNXrvbyrsHM4bGiVPxBTCQLW6LMLNM6TQQ+u6adVZHnf6wzo57oKx3m7Q6+pi HKoIhKEVOIUyk/hFcoDJxNyFviWxzqFUNXy5mVs0zWcdGGtC7RRpOtDav5xI7xVF7p xS1nTEabVjOdA== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 12/18] kho: export kho_scratch_overlap() Date: Fri, 5 Jun 2026 20:34:45 +0200 Message-ID: <20260605183501.3884950-13-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Support for 
discovering memory blocks with no preserved memory will be added in coming patches. These areas will also be marked as scratch to allow allocations from them. Memblock will switch to looking through the scratch array to decide the right migratetype. Export kho_scratch_overlap(). Since it is now used by non-debug code, move it out of kexec_handover_debug.c and into kexec_handover.c. Gate the overlap checks in kho_preserve_folio() and kho_preserve_pages() by IS_ENABLED(CONFIG_KEXEC_HANDOVER_DEBUG) instead. Since kexec_handover_debug.c is now empty, delete it. No functional changes. Signed-off-by: Pratyush Yadav (Google) --- include/linux/kexec_handover.h | 7 ++++++ kernel/liveupdate/Makefile | 1 - kernel/liveupdate/kexec_handover.c | 22 ++++++++++++++++-- kernel/liveupdate/kexec_handover_debug.c | 25 --------------------- kernel/liveupdate/kexec_handover_internal.h | 9 -------- 5 files changed, 27 insertions(+), 37 deletions(-) delete mode 100644 kernel/liveupdate/kexec_handover_debug.c diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index 8968c56d2d73..3740c14d970d 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -40,6 +40,8 @@ void kho_memory_init(void); =20 void kho_populate(phys_addr_t fdt_phys, u64 fdt_len, phys_addr_t scratch_p= hys, u64 scratch_len); + +bool kho_scratch_overlap(phys_addr_t phys, size_t size); #else static inline bool kho_is_enabled(void) { @@ -116,6 +118,11 @@ static inline void kho_populate(phys_addr_t fdt_phys, = u64 fdt_len, phys_addr_t scratch_phys, u64 scratch_len) { } + +static inline bool kho_scratch_overlap(phys_addr_t phys, size_t size) +{ + return false; +} #endif /* CONFIG_KEXEC_HANDOVER */ =20 #endif /* LINUX_KEXEC_HANDOVER_H */ diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile index d2f779cbe279..dc352839ccf0 100644 --- a/kernel/liveupdate/Makefile +++ b/kernel/liveupdate/Makefile @@ -7,7 +7,6 @@ luo-y :=3D \ luo_session.o =20 
obj-$(CONFIG_KEXEC_HANDOVER) +=3D kexec_handover.o -obj-$(CONFIG_KEXEC_HANDOVER_DEBUG) +=3D kexec_handover_debug.o obj-$(CONFIG_KEXEC_HANDOVER_DEBUGFS) +=3D kexec_handover_debugfs.o =20 obj-$(CONFIG_LIVEUPDATE) +=3D luo.o diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 8ab2c7e234e1..a66f23a35389 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -766,6 +766,22 @@ static phys_addr_t __init scratch_size_node(int nid) return round_up(size, CMA_MIN_ALIGNMENT_BYTES); } =20 +bool kho_scratch_overlap(phys_addr_t phys, size_t size) +{ + phys_addr_t scratch_start, scratch_end; + unsigned int i; + + for (i =3D 0; i < kho_scratch_cnt; i++) { + scratch_start =3D kho_scratch[i].addr; + scratch_end =3D kho_scratch[i].addr + kho_scratch[i].size; + + if (phys < scratch_end && (phys + size) > scratch_start) + return true; + } + + return false; +} + /** * kho_reserve_scratch - Reserve a contiguous chunk of memory for kexec * @@ -963,7 +979,8 @@ int kho_preserve_folio(struct folio *folio) const unsigned long pfn =3D folio_pfn(folio); const unsigned int order =3D folio_order(folio); =20 - if (WARN_ON(kho_scratch_overlap(pfn << PAGE_SHIFT, PAGE_SIZE << order))) + if (IS_ENABLED(CONFIG_KEXEC_HANDOVER_DEBUG) && + WARN_ON(kho_scratch_overlap(pfn << PAGE_SHIFT, PAGE_SIZE << order))) return -EINVAL; =20 return kho_radix_add_key(tree, kho_encode_radix_key(PFN_PHYS(pfn), @@ -1040,7 +1057,8 @@ int kho_preserve_pages(struct page *page, unsigned lo= ng nr_pages) unsigned long failed_pfn =3D 0; int err =3D 0; =20 - if (WARN_ON(kho_scratch_overlap(start_pfn << PAGE_SHIFT, + if (IS_ENABLED(CONFIG_KEXEC_HANDOVER_DEBUG) && + WARN_ON(kho_scratch_overlap(start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT))) { return -EINVAL; } diff --git a/kernel/liveupdate/kexec_handover_debug.c b/kernel/liveupdate/k= exec_handover_debug.c deleted file mode 100644 index 6efb696f5426..000000000000 --- 
a/kernel/liveupdate/kexec_handover_debug.c +++ /dev/null @@ -1,25 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * kexec_handover_debug.c - kexec handover optional debug functionality - * Copyright (C) 2025 Google LLC, Pasha Tatashin - */ - -#define pr_fmt(fmt) "KHO: " fmt - -#include "kexec_handover_internal.h" - -bool kho_scratch_overlap(phys_addr_t phys, size_t size) -{ - phys_addr_t scratch_start, scratch_end; - unsigned int i; - - for (i =3D 0; i < kho_scratch_cnt; i++) { - scratch_start =3D kho_scratch[i].addr; - scratch_end =3D kho_scratch[i].addr + kho_scratch[i].size; - - if (phys < scratch_end && (phys + size) > scratch_start) - return true; - } - - return false; -} diff --git a/kernel/liveupdate/kexec_handover_internal.h b/kernel/liveupdat= e/kexec_handover_internal.h index 0399ff107775..805d2a76c388 100644 --- a/kernel/liveupdate/kexec_handover_internal.h +++ b/kernel/liveupdate/kexec_handover_internal.h @@ -41,13 +41,4 @@ static inline void kho_debugfs_blob_remove(struct kho_de= bugfs *dbg, void *blob) { } #endif /* CONFIG_KEXEC_HANDOVER_DEBUGFS */ =20 -#ifdef CONFIG_KEXEC_HANDOVER_DEBUG -bool kho_scratch_overlap(phys_addr_t phys, size_t size); -#else -static inline bool kho_scratch_overlap(phys_addr_t phys, size_t size) -{ - return false; -} -#endif /* CONFIG_KEXEC_HANDOVER_DEBUG */ - #endif /* LINUX_KEXEC_HANDOVER_INTERNAL_H */ --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B35313F4825 for ; Fri, 5 Jun 2026 18:35:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684547; cv=none; 
b=YOa0FF7AFOv/dqJd3WXgNqfYP3qvaBt3wZ+nnj4oWI++mUV+3EeWvNMjLLXR85xeP5y4pZZ8Sg8DQ0DxeupbZvrqEQ/i6VcIMBAMufQ+b0qiAaEIlnxswYPwG0hWiPNTLMTP3HwTDElHdPp6uNPorSsfZbzvauRUZja8DTK3v0E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684547; c=relaxed/simple; bh=lw+5TYsAVHh2euhs/gGmR+V1YcnV1T1/1LFm92fmVsc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gzUY+mphuKP3mOrAyQ+WvXDSlqJTXhWIVUX0LQn54P0TWYszjgTFNw+Xbc+8G8O+KjWXmYNA/RxmWQK34vbCXzT7EUrpmsbOBYRab45XbG3E5y6APY2IECUPpdKeAygoe6e+yjwPVXh4nmji5XCmipC1lUl2dMqRhjk2NBgEhnQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BydKaYKs; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BydKaYKs" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5A1941F0089B; Fri, 5 Jun 2026 18:35:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684544; bh=MPgTzFffYY8+ZzNyhPkuE1ZkvrAANKQQVvr8ndIw9wg=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=BydKaYKsPPMaQYb1m+R//+fdgzANeeVCNM2zxeJ3Ypk4tYKIYwwCiv9GLWp4ulG1h Hc0purKydXxPZ3b9Mx47+4HFBVaDU+3r7gQGNx/2yAa3EfqD+rQdVRvVDlrHpMsPDW bW5KJdUUu3EF28QEw65ZeOfcTPHx3ZyqRN2Pl8WUd/c8pPfpC41wFOsOqYPfXs7Flb vtRghhCn81f64hN8Gbj0kumucDSXUFDTX5zyeWyDBkFFc30dUcc9meDdHNPc18VGId U5YcKmUUpNpAS/Tl6BvmYi4/U4k/iJw2G0iQX3Y1aSUT/0sF1Cwo+Cv0Bi7hTXg2nV 3wRBscIKlW1Bw== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 13/18] kho: initialize kho_scratch pointer earlier in boot Date: Fri, 5 Jun 2026 20:34:46 +0200 Message-ID: <20260605183501.3884950-14-pratyush@kernel.org> 
X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" In a future patch, mm init will use kho_scratch_overlap() for deciding the migrate type of pageblocks it initializes. The earliest user currently is free_area_init(). kho_scratch_overlap() relies on kho_scratch pointer being initialized. Introduce kho_memory_init_early() to do this. kho_populate() would normally be a good place to do this, but unfortunately, phys_to_virt() does not work at that point on ARM64. So we need yet another initialization function. Signed-off-by: Pratyush Yadav (Google) --- include/linux/kexec_handover.h | 3 +++ kernel/liveupdate/kexec_handover.c | 12 ++++++++++-- mm/mm_init.c | 1 + 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index 3740c14d970d..9e961032e06b 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -37,6 +37,7 @@ void kho_remove_subtree(void *blob); int kho_retrieve_subtree(const char *name, phys_addr_t *phys, size_t *size= ); =20 void kho_memory_init(void); +void kho_memory_init_early(void); =20 void kho_populate(phys_addr_t fdt_phys, u64 fdt_len, phys_addr_t scratch_p= hys, u64 scratch_len); @@ -114,6 +115,8 @@ static inline int kho_retrieve_subtree(const char *name= , phys_addr_t *phys, =20 static inline void kho_memory_init(void) { } =20 +static inline void kho_memory_init_early(void) { } + static inline void kho_populate(phys_addr_t fdt_phys, u64 fdt_len, phys_addr_t scratch_phys, u64 scratch_len) { diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 
a66f23a35389..af22086ca2d6 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1535,8 +1535,6 @@ static void __init kho_mem_retrieve(void) const void *fdt =3D kho_get_fdt(); int err; =20 - kho_scratch =3D phys_to_virt(kho_in.scratch_phys); - /* * kho_get_mem_map() should always succeed. If it fails, kho_populate() * catches that and never sets kho_in.scratch_phys, which stops memory @@ -1757,6 +1755,16 @@ static __init int kho_init(void) } fs_initcall(kho_init); =20 +void __init kho_memory_init_early(void) +{ + /* + * kho_scratch_overlap() needs kho_scratch to be initialized. It is used + * by free_area_init() on KHO boots, so initialize it early. + */ + if (kho_in.scratch_phys) + kho_scratch =3D phys_to_virt(kho_in.scratch_phys); +} + void __init kho_memory_init(void) { if (kho_in.scratch_phys) diff --git a/mm/mm_init.c b/mm/mm_init.c index eddc0f03a779..0675837bbfc9 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2688,6 +2688,7 @@ void __init __weak mem_init(void) =20 void __init mm_core_init_early(void) { + kho_memory_init_early(); hugetlb_cma_reserve(); hugetlb_bootmem_alloc(); =20 --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E7A33F660D for ; Fri, 5 Jun 2026 18:35:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684549; cv=none; b=ZL5DxuArY+qDdxLt0d3u30gQuLE2FqCXz0TqqDhAkxANMPRPCnBmu9UUorNhm4iGRdTPaRt9Bb4pskzd+IiM03CR/S9wapY0uOTOvTn8qbVx5bEvqG3UVUtuVFy/WhD62kXhhC8fWekwIfUlqZciyQNIulkLkjiN1Puo65elmX4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684549; c=relaxed/simple; 
bh=qZ1gtfy3ZM9GrZ9GxBmrf7eivL3lnB6ZFkN43dxUSKk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LEv+A6huOZBZtcVUtt6G0QRZonCb423ubMu7QInUFJC7QRRUvK7YeEOjSSoiCAz6GOd07K5bpHX1wKadHwISJYm9K2owCKkzdwekXJUesgqMzEidEtsbhD+8/aSreM3SQHl1YPM6J6vH1UTIHqI9vxkIQEgK3u6C3B4q+Smuo40= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=PLUDm4rU; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="PLUDm4rU" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CB91E1F0089F; Fri, 5 Jun 2026 18:35:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684546; bh=/QT3CI10YOMV9hUEPRhdgcJ9QvcUudLRgOrpPReLPq4=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=PLUDm4rUgoMvsHQTE1pGmbmzwZJfnKYM4r0Fl12Iiu55XvxbEQyIPEmbsBBCSIKAG SpQO6BJkJDSC6dVEv5wNBaRX43ykIXIxv4QMJKlIuqJNysvE4HD/rz32IZSeOg7lIE Wkm2HwaqEWxn/T/okFVc4wz4MkgvIqqg+nCvx8y4eOMT32nFZX/HnECOpnrXZzsOV4 gov6d2M2CfqI69tunKAwHVdRNCZJgzC66LztW+WFKFk9v3juqRCrCnBiP4aEdpHrwF rs27b8UYvAwU0ogmnnlz27raGiyjlJ1taOQx6kNCb4BbXeEoNyQPTc78Tsma1gBM1i 7fs6PMOVnY/2g== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 14/18] memblock: use kho_scratch_overlap() to decide migratetype Date: Fri, 5 Jun 2026 20:34:47 +0200 Message-ID: <20260605183501.3884950-15-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: 
MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Support for discovering memory blocks with no preserved memory will be added in coming patches. These areas will also be marked as MEMBLOCK_KHO_SCRATCH to allow allocations from them. But only the scratch areas passed to KHO should be marked as MIGRATE_CMA, all others should be left as normal. So instead of checking the flags on the region, ask KHO to loop through its scratch array. Signed-off-by: Pratyush Yadav (Google) --- include/linux/memblock.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 5afcd99aa8c1..546d7ef798b8 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -11,6 +11,7 @@ #include #include #include +#include =20 extern unsigned long max_low_pfn; extern unsigned long min_low_pfn; @@ -618,7 +619,7 @@ bool memblock_is_kho_scratch_memory(phys_addr_t addr); static inline enum migratetype kho_scratch_migratetype(unsigned long pfn, enum migratetype mt) { - if (memblock_is_kho_scratch_memory(PFN_PHYS(pfn))) + if (kho_scratch_overlap(PFN_PHYS(pfn), pageblock_nr_pages << PAGE_SHIFT)) return MIGRATE_CMA; return mt; } --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6CEE3F7A9A for ; Fri, 5 Jun 2026 18:35:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684551; cv=none; 
b=kAq5P8sdEz3Hi6eDNhuv32AGg4GXX+5ZmRb8kN5+UU5pGip2rxkTaAkxTfTbOSMcWpJSK30aP8gxQYBdc1LzamjLnUL9dgnN8zTnl3Xf146niesuwkfUaBhVqvXMxqZ+UXzrRzC4SXlIiq3DuQqZHAIOqo5DGoUYEYWGuY3gtWQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684551; c=relaxed/simple; bh=6fT8XQ5B7UNb/YI6/i6I+SkOZgX8YBdrZ7MPy/WnAok=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=f5kOp1/kR3xSCaLYMpLytkPNFbrQ11PrNhd2Vx5+xmU8d2CsSbwxfuFwahzs2Y/41vFAKXFGDGfMKM9ILDCSxZqpg42VCbtsCSC1H31NSPnYiA7iyQSmukrVjLbEqV0YROjUOznjQ+hn8YBflCwNFg43xiey0Jf+nvDCJdqcwiw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=JD4n6zBy; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="JD4n6zBy" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 496271F00893; Fri, 5 Jun 2026 18:35:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684549; bh=O1/j3Ucet9v9ZybQsRQ8PCoEEC5Pb6SVzbYAbbFutT0=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=JD4n6zBytZ/U3VZK1UN3Ojx9L1qsqBExXmh5oUOQCk4b4WhJdk108VJqz03nKg6te i8KyS2qFb76L74r8jC4vVkGpCcu0yWhpLKD/w7hTRe+vsOg1828pB9kHKvRTeNAPq+ TwloeSgFBapgmMVq8oAmx4pNemeo6b9TjTcX+5F+dYUyRbXvhNsm/2hpOHoY7wICVK 8jFPy5s6eKmJavr/FuB3KJ08lYkdnu6iX5iwgVijGEHZfn2/M0yUvNtW0V2PUUkiOF gOyFxlo0Kok3SdSF31h779qqok4lOpr2amj0kyO9lGhZc8hVAhHUPUZZSYK/xKA0jf mXT73uwFg9Djw== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 15/18] kho: extend scratch Date: Fri, 5 Jun 2026 20:34:48 +0200 Message-ID: <20260605183501.3884950-16-pratyush@kernel.org> X-Mailer: git-send-email 
2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Motivation =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D The scratch space is allocated by the first kernel in the KHO chain, and is reused by all subsequent kernels. The size of the space is either set via the commandline by the system administrator or by calculating the amount of memory used by the kernel and adding a multiplier. In either case, the scratch space is a heuristic and is liable to fill up and fail allocation if a kernel uses more memory than expected. In addition, gigantic huge pages (usually 1 GiB) are allocated via memblock, and in a KHO boot that memory comes from the scratch space. In hypervisors it is common to dedicate a major part of the system's memory to gigantic hugepages for VM memory. If this memory needs to come from scratch space, then scratch needs to be greater than the memory needed for huge pages, which is impractical. In addition, hugepages can be preserved memory. Allocating them from scratch violates the assumption that scratch contains no preserved memory. Methodology =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D Discover areas that don't contain any preserved memory at boot by walking the preserved memory radix tree. Mark them as scratch to allow allocations from them. This makes KHO more resilient to memory pressure and allows supporting huge page preservation. Since the preserved memory radix tree mixes both physical address and order into a single key, and does not track table pages, it is difficult to identify free areas from it directly. Walk the tree and digest it down into another radix tree. The latter tracks blocks of KHO_EXT_SHIFT (1 GiB as of now) granularity. 
Then walk the digested tree and mark the areas between the present keys as scratch. Performance =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D The discovery algorithm traverses the preserved memory radix tree exactly once. While it does use memory for the digested radix tree, since the blocks are split by 1 GiB, a single bitmap with 4k pages can track up to 32 TiB of memory. So there are likely to be very few radix tree pages used in this tracking. For systems with all physical memory below 32 TiB, this should result in a total of 6 pages being used (KHO_TREE_MAX_DEPTH =3D=3D 6). An alternate way of achieving this would be to call kho_mem_retrieve() earlier in boot and mark all the KHO preservations as reserved. But that can blow up memblock.reserved with a bunch of 4K pages scattered everywhere, which will reduce performance of subsequent allocations. Since the free blocks are tracked in chunks of 1 GiB, this won't blow up memblock.memory as much. There is no inherent reason for using 1 GiB as the discovered block size. This can be changed later if needed. Currently, KHO is mainly targeted for server grade systems with hundreds of gigabytes to terabytes of memory. So 1 GiB is a reasonable granularity for those systems. For smaller systems this doesn't work as well, but we can arrive at a better heuristic when we have concrete use cases. Practical evaluation =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D The testing is done on a x86_64 qemu VM running under KVM with 64G memory and 12 CPUs. The machine pre-allocates 50 1G pages. Since the performance scales with how busy the radix tree is, tests are done with 2 preservation patterns: first with two 1M memfds, second with two 1G memfds, both using 4k pages. Test case 1 - 1M memfd ~~~~~~~~~~~~~~~~~~~~~~ This test case has two memfds with 1M memory each in 4k pages, plus other preservations from LUO core and other KHO users. 
This is what the radix tree stats look like: radix_nodes: 0x13 nr_preservations: 0x214 mem_preserved: 0x227000 per order preservations: order 0: 0x20f order 1: 0x4 order 4: 0x1 and this is how long it takes to extend the scratch after KHO boot: KHO: KHO extend time: 47 us KHO: KHO extend total mem: 0xe6c17b000 (~57G) Test case 2 - 1G memfd ~~~~~~~~~~~~~~~~~~~~~~ This test case has two memfds with 1G memory each in 4k pages, plus other preservations from LUO core and other KHO users. This is what the radix tree stats look like: radix_nodes: 0x28 nr_preservations: 0x80816 mem_preserved: 0x80829000 per order preservations: order 0: 0x80811 order 1: 0x4 order 4: 0x1 and this is how long it takes to extend the scratch after KHO boot: KHO: KHO extend time: 22514 us KHO: KHO extend total mem: 0xd3f200000 (~52G) Signed-off-by: Pratyush Yadav (Google) --- Notes: As one might notice, the "scratch" terminology starts to break down here. There is the original "scratch", which is passed down by the previous kernel. It is marked MEMBLOCK_KHO_SCRATCH. There is also the discovered "scratch", which also gets marked MEMBLOCK_KHO_SCRATCH, but has nothing to do with the former. =20 To limit the scope of this series, I haven't done the rename here. I can do it as a follow-up series once this stabilizes and lands in -next. =20 I suggest the following scheme: =20 - Rename "KHO scratch" to "KHO bootmem". Update the documentation and all code to use this name. We have the kho_scratch kernel cmdline parameter, which is harder to change, but perhaps we can rename it to "kho_bootmem" and if someone complains we can add it back. =20 - Rename MEMBLOCK_KHO_SCRATCH to MEMBLOCK_KHO_NOPRSRV. This describes the property of the memory, not its origin. Then KHO can mark its "bootmem" as KHO_NOPRSRV because bootmem never has any preserved memory. Later, kho_extend_scratch() (which is also due for a better name) can also mark its discovered areas as KHO_NOPRSRV.
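For readers who want to play with the two-pass discovery described under Methodology outside the kernel, here is a rough userspace sketch. It is not the code in this patch: a flat byte array stands in for the digested radix tree, and digest_range(), find_free_ranges(), struct free_range and MAX_BLOCKS are hypothetical names made up for the illustration. Only the 1 GiB block size and the prev_end gap logic are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_SHIFT  30	/* 1 GiB blocks, same value as KHO_EXT_SHIFT */
#define MAX_BLOCKS 64	/* hypothetical cap, only for this sketch */

struct free_range {
	uint64_t start;
	uint64_t size;
};

/* Pass 1: mark every 1 GiB block touched by a preserved range [start, end). */
static void digest_range(uint8_t *busy, uint64_t start, uint64_t end)
{
	uint64_t blk;

	assert(start < end && ((end - 1) >> EXT_SHIFT) < MAX_BLOCKS);

	for (blk = start >> EXT_SHIFT; blk <= (end - 1) >> EXT_SHIFT; blk++)
		busy[blk] = 1;
}

/*
 * Pass 2: visit busy blocks in ascending order and report the gaps between
 * them, then the tail from the last busy block up to end_of_dram.
 */
static size_t find_free_ranges(const uint8_t *busy, uint64_t end_of_dram,
			       struct free_range *out, size_t max_out)
{
	uint64_t prev_end = 0;
	uint64_t blk;
	size_t n = 0;

	for (blk = 0; blk < MAX_BLOCKS; blk++) {
		uint64_t start = blk << EXT_SHIFT;

		if (!busy[blk])
			continue;

		if (start > prev_end && n < max_out)
			out[n++] = (struct free_range){ prev_end, start - prev_end };

		prev_end = start + (1ULL << EXT_SHIFT);
	}

	if (prev_end < end_of_dram && n < max_out)
		out[n++] = (struct free_range){ prev_end, end_of_dram - prev_end };

	return n;
}
```

With busy blocks 1, 3 and 4 on an 8 GiB machine, this reports [0, 1G), [2G, 3G) and [5G, 8G) as free. In the patch, each such range would be passed to memblock_mark_kho_scratch() rather than stored in an array.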
kernel/liveupdate/kexec_handover.c | 149 +++++++++++++++++++++++++---- 1 file changed, 132 insertions(+), 17 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index af22086ca2d6..8540608b8602 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -84,6 +84,23 @@ static struct kho_out kho_out =3D { }, }; =20 +struct kho_in { + phys_addr_t fdt_phys; + phys_addr_t scratch_phys; + char previous_release[__NEW_UTS_LEN + 1]; + u32 kexec_count; + struct kho_debugfs dbg; + struct kho_radix_tree radix_tree; +}; + +static struct kho_in kho_in =3D { +}; + +static const void *kho_get_fdt(void) +{ + return kho_in.fdt_phys ? phys_to_virt(kho_in.fdt_phys) : NULL; +} + /** * kho_encode_radix_key - Encodes a physical address and order into a radi= x key. * @phys: The physical address of the page. @@ -869,6 +886,119 @@ static void __init kho_reserve_scratch(void) kho_enable =3D false; } =20 +#define KHO_EXT_SHIFT 30 /* 1 GiB */ + +static int __init kho_ext_walk_key(unsigned long key, void *data) +{ + struct kho_radix_tree *tree =3D data; + phys_addr_t start, end; + unsigned int order; + int err; + + start =3D kho_decode_radix_key(key, &order); + end =3D start + (1UL << (order + PAGE_SHIFT)); + + while (start < end) { + err =3D kho_radix_add_key(tree, start >> KHO_EXT_SHIFT); + if (err) + return err; + + start +=3D (1UL << KHO_EXT_SHIFT); + } + + return 0; +} + +static int __init kho_ext_walk_node(phys_addr_t phys, void *data) +{ + struct kho_radix_tree *tree =3D data; + + return kho_radix_add_key(tree, phys >> KHO_EXT_SHIFT); +} + +static int __init kho_ext_mark_scratch(unsigned long key, void *data) +{ + phys_addr_t *prev_end =3D data; + phys_addr_t start =3D key << KHO_EXT_SHIFT; + int err; + + if (start > *prev_end) { + err =3D memblock_mark_kho_scratch(*prev_end, start - *prev_end); + if (err) + return err; + } + + *prev_end =3D start + (1UL << KHO_EXT_SHIFT); + return 0; +} + +/** + * 
kho_extend_scratch - Extend the scratch regions + * + * The KHO radix tree mixes both physical address and order into a single = key. + * This makes it hard to look for free ranges directly. This function first + * walks the radix tree and digests it down into another radix tree, whose= keys + * identify blocks of KHO_EXT_SHIFT which contain preserved memory. + * + * Then it walks the digested radix tree and marks everything that doesn't= have + * preserved memory as scratch. + * + * NOTE: This function allocates memory so it should be called when scratc= h has + * available space. + * + * NOTE: The pages of the KHO radix tree tables are not marked as preserve= d in + * the KHO tree. But they are expected to remain untouched until the tree = is + * fully parsed. So this function also considers them to be "preserved mem= ory" + * and marks their blocks as busy. + */ +static void __init kho_extend_scratch(void) +{ + const struct kho_radix_walk_cb kho_cb =3D { + .leaf =3D kho_ext_walk_key, + .node =3D kho_ext_walk_node, + }; + const struct kho_radix_walk_cb ext_cb =3D { + .leaf =3D kho_ext_mark_scratch, + }; + struct kho_radix_tree radix; + phys_addr_t prev_end =3D 0; + int err =3D 0; + + if (!is_kho_boot()) + return; + + /* Make sure the KHO radix tree is initialized. */ + err =3D kho_radix_init_tree(&kho_in.radix_tree, + kho_get_mem_map(kho_get_fdt())); + if (err) + goto print; + + err =3D kho_radix_init_tree(&radix, NULL); + if (err) + goto print; + + /* Walk the KHO radix tree to find busy blocks. */ + err =3D kho_radix_walk_tree(&kho_in.radix_tree, &kho_cb, &radix); + if (err) + goto out; + + /* Walk the blocks and mark everything between keys as scratch. */ + err =3D kho_radix_walk_tree(&radix, &ext_cb, &prev_end); + if (err) + goto out; + + /* Mark everything from last busy block to end of DRAM. 
*/ + if (prev_end < memblock_end_of_DRAM()) + err =3D memblock_mark_kho_scratch(prev_end, memblock_end_of_DRAM() - pre= v_end); + + /* fallthrough */ +out: + kho_radix_destroy_tree(&radix); +print: + if (err) + pr_err("Failed to extend scratch: %pe\n", ERR_PTR(err)); +} + /** * kho_add_subtree - record the physical address of a sub blob in KHO root= tree. * @name: name of the sub tree. @@ -1443,23 +1573,6 @@ void kho_restore_free(void *mem) } EXPORT_SYMBOL_GPL(kho_restore_free); =20 -struct kho_in { - phys_addr_t fdt_phys; - phys_addr_t scratch_phys; - char previous_release[__NEW_UTS_LEN + 1]; - u32 kexec_count; - struct kho_debugfs dbg; - struct kho_radix_tree radix_tree; -}; - -static struct kho_in kho_in =3D { -}; - -static const void *kho_get_fdt(void) -{ - return kho_in.fdt_phys ? phys_to_virt(kho_in.fdt_phys) : NULL; -} - /** * is_kho_boot - check if current kernel was booted via KHO-enabled * kexec @@ -1763,6 +1876,8 @@ void __init kho_memory_init_early(void) */ if (kho_in.scratch_phys) kho_scratch =3D phys_to_virt(kho_in.scratch_phys); + + kho_extend_scratch(); } =20 void __init kho_memory_init(void) --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EA6CD3F660D for ; Fri, 5 Jun 2026 18:35:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684553; cv=none; b=O5vdkNushyXoLx6Cbax3k0l58wfKmBM0jxz86vO9yLQT7nb88CAQsSvK/Lp0wPH2b0WlRsyF556P3enjBVV2AsBtxS6o+OztfcsRGwgrpot1GtuPnvRXVEZlxc/VRdKrF0dU0Gw4ej9LCVmmtH4R1CcEDj4FSLSveOQVcQq3PMI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684553; c=relaxed/simple; 
bh=3dP24KzTFkZpn9yFu7i3dxmzjrdxjI81prbwFYLT2LQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WNoJyM2eEjveW1ho3HPdTiKS/HZb1mHL/ruGsLRVornQ9F52eU6wuiiQN8W700sdCTaD0lLAmFBMZtcQUJWs6bgvHY6j3IFQtZoznm22yyNSDCtkEOYsKYGIMPHpeiDp6NB8wyRVcdYgTjFz5EEmYzJhlY6Zcjf3Wd4RDbhJEi8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=N+JEX5bD; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="N+JEX5bD" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BAD5E1F00898; Fri, 5 Jun 2026 18:35:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684551; bh=jPW7ndBXFMlA6BumIKFLGL62i1vzthwDUNuAA0KAxEI=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=N+JEX5bD/I8m0d8e+OQIBAsmVQ/s+zaYtMEOhMZxjsUC1DvogGlpCdU5fU7dSgDuK 8FOWQdo9Jdkn1RAHZ4OeyZNGINZ3xZfaxNiixKoxDT6cDEs4v0ExWj/oQ5Og3nyed1 ASfJel4zQfjp3wIDec1AM44piTUCmrOYJfw1mOBrZWcEVcsnMDYB/CCcby8ha8JwCz +GCjI0StGsXqftjYieXLfwS/4R65Yt/er8LvobiSDz+DNU4x8j3rcON2Tq4V9pXNrL 67yP935PjiYOf1ITt6cFYJ+tZrJJClDC38cJ62veB3lkZvct/6DCvZ9bW2mGxop0E6 UsLSLdO/jBwdw== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 16/18] memblock: make HugeTLB bootmem allocation work with KHO Date: Fri, 5 Jun 2026 20:34:49 +0200 Message-ID: <20260605183501.3884950-17-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: 
MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" Gigantic huge page allocation is somewhat broken currently when KHO is used. Firstly, it breaks KHO scratch size accounting. RSRV_KERN is used to track how much memory is reserved for use by the kernel. Since alloc_bootmem() calls the memblock_alloc*() APIs, the hugepages allocated also get marked as RSRV_KERN. Allocations marked RSRV_KERN are used by KHO to calculate how much scratch space it should reserve to make sure the next kernel has enough memory to boot when it is in the scratch-only phase. Counting hugepages towards that total blows up the scratch size, and can lead to the scratch allocation failing, making KHO unusable. This will show up when huge pages make up more than 50% of system memory, which is a fairly common use case. Secondly, while not supported right now, huge pages are user memory and can be preserved via KHO. The scratch spaces should not have any preserved memory. Allocating hugepages from scratch (on a KHO boot) can lead to them being un-preservable. Introduce memblock_alloc_hugetlb(). This lets memblock cater to the needs of hugetlb without exposing those details to the general allocation routines. First, it does not use mirrored memory for hugetlb. Mirrored memory is a limited resource that is best saved for kernel data structures, not user memory. Second, if the memory found overlaps with KHO scratch areas, it discards the memory and retries. Third, it simplifies the argument list by baking in some hugetlb assumptions like alignment and exact_nid. This also simplifies allocation logic in alloc_bootmem(). Also introduce MEMBLOCK_RSRV_HUGETLB to mark reservations made for HugeTLB. This will be used by KHO in future patches to correctly calculate scratch sizes.
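To illustrate the "discard and retry on scratch overlap" behaviour mentioned above, here is a small userspace sketch. It is not memblock code: struct scratch_area, overlaps_scratch() and alloc_hugetlb_sketch() are hypothetical names, the search is a plain bottom-up scan over one contiguous window, and the top-down direction and NUMA fallback are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one KHO scratch region. */
struct scratch_area {
	uint64_t start;
	uint64_t size;
};

/* True if [addr, addr + size) intersects any scratch area. */
static bool overlaps_scratch(const struct scratch_area *s, size_t n,
			     uint64_t addr, uint64_t size)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr < s[i].start + s[i].size && s[i].start < addr + size)
			return true;

	return false;
}

/*
 * Bottom-up search in [lo, hi) for a block of @size bytes aligned to @size
 * that does not touch scratch. A candidate that overlaps scratch is dropped
 * and the search resumes just past it. Returns 0 if nothing fits.
 */
static uint64_t alloc_hugetlb_sketch(uint64_t size, uint64_t lo, uint64_t hi,
				     const struct scratch_area *s, size_t n)
{
	uint64_t addr;

	/* Huge page sizes are powers of two, so the alignment mask is valid. */
	assert(size && (size & (size - 1)) == 0);

	for (addr = (lo + size - 1) & ~(size - 1); addr + size <= hi; addr += size)
		if (!overlaps_scratch(s, n, addr, size))
			return addr;

	return 0;
}
```

With scratch at [1.25G, 1.75G) and [2G, 3G), a 1 GiB request in the window [1G, 8G) skips the candidates at 1G and 2G and settles on 3G.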
Refactor some of the preparation logic like kmemleak tracking and accepting memory into a separate helper memblock_prep_allocation(), and use it from both memblock_alloc_hugetlb() and the usual memblock_alloc_range_nid(). Signed-off-by: Pratyush Yadav (Google) --- include/linux/memblock.h | 3 ++ mm/hugetlb.c | 22 +++----- mm/memblock.c | 112 +++++++++++++++++++++++++++++++-------- 3 files changed, 100 insertions(+), 37 deletions(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 546d7ef798b8..b3b4a6145fad 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -52,6 +52,7 @@ extern unsigned long long max_possible_pfn; * memory reservations yet, so we get scratch memory from the previous * kernel that we know is good to use. It is the only memory that * allocations may happen from in this phase. + * @MEMBLOCK_RSRV_HUGETLB: memory is reserved for hugetlb pages */ enum memblock_flags { MEMBLOCK_NONE =3D 0x0, /* No special request */ @@ -62,6 +63,7 @@ enum memblock_flags { MEMBLOCK_RSRV_NOINIT =3D 0x10, /* don't initialize struct pages */ MEMBLOCK_RSRV_KERN =3D 0x20, /* memory reserved for kernel use */ MEMBLOCK_KHO_SCRATCH =3D 0x40, /* scratch memory for kexec handover */ + MEMBLOCK_RSRV_HUGETLB =3D 0x80, /* memory reserved for hugetlb pages */ }; =20 /** @@ -421,6 +423,7 @@ void *memblock_alloc_try_nid_raw(phys_addr_t size, phys= _addr_t align, void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, phys_addr_t max_addr, int nid); +void *memblock_alloc_hugetlb(phys_addr_t size, int nid, bool exact_nid); =20 static __always_inline void *memblock_alloc(phys_addr_t size, phys_addr_t = align) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4b80b167cc9c..fadcfa267ceb 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3029,29 +3029,21 @@ static __init void *alloc_bootmem(struct hstate *h,= int nid, bool node_exact) if (hugetlb_early_cma(h)) m =3D hugetlb_cma_alloc_bootmem(h, &listnode, node_exact); else 
{ - if (node_exact) - m =3D memblock_alloc_exact_nid_raw(huge_page_size(h), - huge_page_size(h), 0, - MEMBLOCK_ALLOC_ACCESSIBLE, nid); - else { - m =3D memblock_alloc_try_nid_raw(huge_page_size(h), - huge_page_size(h), 0, - MEMBLOCK_ALLOC_ACCESSIBLE, nid); + m =3D memblock_alloc_hugetlb(huge_page_size(h), nid, node_exact); + if (m) { + m->flags =3D 0; + m->cma =3D NULL; + /* * For pre-HVO to work correctly, pages need to be on * the list for the node they were actually allocated * from. That node may be different in the case of - * fallback by memblock_alloc_try_nid_raw. So, + * fallback by memblock_alloc_hugetlb. So, * extract the actual node first. */ - if (m) + if (!node_exact) listnode =3D early_pfn_to_nid(PHYS_PFN(__pa(m))); } - - if (m) { - m->flags =3D 0; - m->cma =3D NULL; - } } =20 if (m) { diff --git a/mm/memblock.c b/mm/memblock.c index 6349c48154f4..131e54dd5d8d 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1506,6 +1506,32 @@ int __init_memblock memblock_set_node(phys_addr_t ba= se, phys_addr_t size, return 0; } =20 +static void memblock_prep_allocation(phys_addr_t start, phys_addr_t size, + bool leaktrace) +{ + /* + * Skip kmemleak for those places like kasan_init() and + * early_pgtable_alloc() due to high volume. + */ + if (leaktrace) + /* + * Memblock allocated blocks are never reported as + * leaks. This is because many of these blocks are + * only referred via the physical address which is + * not looked up by kmemleak. + */ + kmemleak_alloc_phys(start, size, 0); + + /* + * Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, + * require memory to be accepted before it can be used by the + * guest. + * + * Accept the memory of the allocated buffer. 
+ */ + accept_memory(start, size); +} + /** * memblock_alloc_range_nid - allocate boot memory block * @size: size of memory block to be allocated in bytes @@ -1580,28 +1606,7 @@ phys_addr_t __init memblock_alloc_range_nid(phys_add= r_t size, return 0; =20 done: - /* - * Skip kmemleak for those places like kasan_init() and - * early_pgtable_alloc() due to high volume. - */ - if (end !=3D MEMBLOCK_ALLOC_NOLEAKTRACE) - /* - * Memblock allocated blocks are never reported as - * leaks. This is because many of these blocks are - * only referred via the physical address which is - * not looked up by kmemleak. - */ - kmemleak_alloc_phys(found, size, 0); - - /* - * Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, - * require memory to be accepted before it can be used by the - * guest. - * - * Accept the memory of the allocated buffer. - */ - accept_memory(found, size); - + memblock_prep_allocation(found, size, end !=3D MEMBLOCK_ALLOC_NOLEAKTRACE= ); return found; } =20 @@ -1756,6 +1761,69 @@ void * __init memblock_alloc_try_nid_raw( false); } =20 +/** + * memblock_alloc_hugetlb - allocate boot memory for HugeTLB pages + * @size: size of the memory to be allocated in bytes + * @nid: nid of the free memory to find, %NUMA_NO_NODE for any node + * @exact_nid: only allocate from the specified nid. If %false, the specif= ied + * nid is tried first, and then all nodes are tried as fallbac= k. + * + * HugeTLB pages are always aligned by their size, so the alignment matches + * @size. Since the memory is for userspace, mirrored memory is not used. = The + * memory is not zeroed. Does not panic if request cannot be satisfied. + * + * Return: + * Virtual address of allocated memory block on success, %NULL on failure. 
+ */ +void * __init memblock_alloc_hugetlb(phys_addr_t size, int nid, bool exact= _nid) +{ + enum memblock_flags flags =3D choose_memblock_flags(); + phys_addr_t addr, start =3D 0, end =3D MEMBLOCK_ALLOC_ACCESSIBLE; + + memblock_dbg("%s: %llu bytes, nid=3D%d, exact_nid=3D%d %pS\n", __func__, + (u64)size, nid, exact_nid, (void *)_RET_IP_); + + /* Don't waste mirrored memory on HugeTLB pages. */ + flags &=3D ~MEMBLOCK_MIRROR; +retry: + /* HugeTLB pages are always aligned by their size. */ + addr =3D memblock_find_in_range_node(size, size, start, end, nid, flags); + if (addr) + goto found; + + /* Try all nodes if allowed. */ + if (numa_valid_node(nid) && !exact_nid) { + nid =3D NUMA_NO_NODE; + goto retry; + } + + /* Found nothing... :-( */ + return NULL; + +found: + /* + * HugeTLB pages can be preserved with KHO and no preserved memory can + * be in scratch. So retry if found address overlaps with scratch. + * + * Scratch areas are normally not very large, so this shouldn't take too + * many retries. 
+ */ + if (kho_scratch_overlap(addr, size)) { + if (memblock_bottom_up()) + start =3D addr + size; + else + start =3D addr - size; + + goto retry; + } + + if (__memblock_reserve(addr, size, nid, MEMBLOCK_RSRV_KERN | MEMBLOCK_RSR= V_HUGETLB)) + return NULL; + + memblock_prep_allocation(addr, size, true); + return phys_to_virt(addr); +} + /** * memblock_alloc_try_nid - allocate boot memory block * @size: size of memory block to be allocated in bytes --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B4FD23FAE0C for ; Fri, 5 Jun 2026 18:35:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684556; cv=none; b=Pz3ztq7j3YV9Rer4aJySJIFsmudZbB651gYKOmYH1eL7+10U771Iz+bs+ahshclyg/k3tU2DbSyRbLGZF7/iJ+ivzUO29VagJ6/bdeNC3XL5XDapgw+ez7oqLhX+wQy/wgh+HXhEXOn8MWRVW3wclNFdTZO7RotqBSnl7CaV55U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684556; c=relaxed/simple; bh=qCb0sBvwG/KiWMpvZeymKtbWNYp4yATJyUBxychVqXw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VL3WnY5GJx8K/b0lbniXwIWpJlG8wSQiEv2RQ+QWNppyvVtx9it4tjhQkdvPR+4Xyz1yl/ot+AdLJrp03ktE1rajV66OYo9kFwlPgOxYjwbFvn6siG8JJYPOA7y9pzG0rGCa/ABd/bZXLNYuCMl2F8U0bwger/NRaXRFYmTNBEw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RDp1KNMz; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RDp1KNMz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 37AC11F00893; Fri, 5 Jun 
2026 18:35:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684554; bh=WPKtkl44aSyXLvNV0tx53TGY2duqMQJ5kATLVHTv48Q=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=RDp1KNMzreXggo/k5UN4yM2Vg2nlZmZX0Wh5BFATiEUCiYt/OzIlMXhoid/ZuvPyN o4JRp+ybptcE3KR0SR3U/GQFA/Br52X8PwyfyCE4K3xKLiNjlhROn0yP1az6fS50Wn bckAsmKkXRIqSnpu1yJiHqNZVyowEnPJ10728SdPQGvjnMmzM9y1joufJMK910774F lRXdWbT6mcNbofW9uEe0x5g2GSQFs7sdkzA/1xaMKVz39RhPwDt8/omJg0HDymZ245 xuO+UITtPc5kwdHIQEnEVE5tMo1s3yjeCbPEqych8hP8XehChB+QxjwGezXjXy/QBP zU6rKaEGZ2Wig== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 17/18] memblock: allow calculating reserved size by flags Date: Fri, 5 Jun 2026 20:34:50 +0200 Message-ID: <20260605183501.3884950-18-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" memblock_reserved_kern_size() returns the total size of all reserved areas flagged RSRV_KERN. KHO also needs total size of all reserved areas flagged HUGETLB to correctly size its scratch areas. Refactor memblock_reserved_kern_size() into memblock_reserved_size_flags(). The new function returns total size of all reserved areas which match _any_ of the flags. 
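As a quick illustration of the "match any of the flags" semantics, here is a userspace sketch. It is not the memblock implementation: struct rsrv_region and reserved_size_flags() are hypothetical names, the region list is a plain array, and NUMA node filtering is left out. The two flag values mirror the memblock_flags enum in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror MEMBLOCK_RSRV_KERN and MEMBLOCK_RSRV_HUGETLB. */
enum rsrv_flags {
	RSRV_KERN    = 0x20,
	RSRV_HUGETLB = 0x80,
};

/* Hypothetical stand-in for one reserved memblock region. */
struct rsrv_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/*
 * Sum the sizes of all regions below @limit that carry at least one of the
 * bits in @flags. A region straddling @limit is counted only up to @limit.
 */
static uint64_t reserved_size_flags(const struct rsrv_region *r, size_t n,
				    uint64_t limit, unsigned int flags)
{
	uint64_t total = 0;
	size_t i;

	assert(flags != 0);

	for (i = 0; i < n; i++) {
		uint64_t size = r[i].size;

		if (r[i].base >= limit)
			continue;

		if (r[i].base + size > limit)
			size = limit - r[i].base;

		if (r[i].flags & flags)
			total += size;
	}

	return total;
}
```

A region flagged with both bits is counted once when both bits are passed in, so a caller that wants "kernel reservations minus HugeTLB" has to make two calls and subtract.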
Signed-off-by: Pratyush Yadav (Google) --- include/linux/memblock.h | 3 ++- kernel/liveupdate/kexec_handover.c | 14 ++++++++------ mm/memblock.c | 8 +++++--- 3 files changed, 15 insertions(+), 10 deletions(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index b3b4a6145fad..a3b57066611d 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -487,7 +487,8 @@ static inline __init_memblock bool memblock_bottom_up(v= oid) =20 phys_addr_t memblock_phys_mem_size(void); phys_addr_t memblock_reserved_size(void); -phys_addr_t memblock_reserved_kern_size(phys_addr_t limit, int nid); +phys_addr_t memblock_reserved_size_flags(phys_addr_t limit, int nid, + enum memblock_flags flags); unsigned long memblock_estimated_nr_free_pages(void); phys_addr_t memblock_start_of_DRAM(void); phys_addr_t memblock_end_of_DRAM(void); diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index 8540608b8602..b3c33f150e85 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -749,13 +749,15 @@ static void __init scratch_size_update(void) if (scratch_scale) { phys_addr_t size; =20 - size =3D memblock_reserved_kern_size(ARCH_LOW_ADDRESS_LIMIT, - NUMA_NO_NODE); + size =3D memblock_reserved_size_flags(ARCH_LOW_ADDRESS_LIMIT, + NUMA_NO_NODE, + MEMBLOCK_RSRV_KERN); size =3D size * scratch_scale / 100; scratch_size_lowmem =3D size; =20 - size =3D memblock_reserved_kern_size(MEMBLOCK_ALLOC_ANYWHERE, - NUMA_NO_NODE); + size =3D memblock_reserved_size_flags(MEMBLOCK_ALLOC_ANYWHERE, + NUMA_NO_NODE, + MEMBLOCK_RSRV_KERN); size =3D size * scratch_scale / 100 - scratch_size_lowmem; scratch_size_global =3D size; } @@ -773,8 +775,8 @@ static phys_addr_t __init scratch_size_node(int nid) phys_addr_t size; =20 if (scratch_scale) { - size =3D memblock_reserved_kern_size(MEMBLOCK_ALLOC_ANYWHERE, - nid); + size =3D memblock_reserved_size_flags(MEMBLOCK_ALLOC_ANYWHERE, + nid, MEMBLOCK_RSRV_KERN); size =3D size * 
scratch_scale / 100; } else { size =3D scratch_size_pernode; diff --git a/mm/memblock.c b/mm/memblock.c index 131e54dd5d8d..cc21f877cb67 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1893,7 +1893,8 @@ phys_addr_t __init_memblock memblock_reserved_size(vo= id) return memblock.reserved.total_size; } =20 -phys_addr_t __init_memblock memblock_reserved_kern_size(phys_addr_t limit,= int nid) +phys_addr_t __init_memblock memblock_reserved_size_flags(phys_addr_t limit= , int nid, + enum memblock_flags flags) { struct memblock_region *r; phys_addr_t total =3D 0; @@ -1908,7 +1909,7 @@ phys_addr_t __init_memblock memblock_reserved_kern_si= ze(phys_addr_t limit, int n size =3D limit - r->base; =20 if (nid =3D=3D memblock_get_region_node(r) || !numa_valid_node(nid)) - if (r->flags & MEMBLOCK_RSRV_KERN) + if (r->flags & flags) total +=3D size; } =20 @@ -1930,7 +1931,8 @@ phys_addr_t __init_memblock memblock_reserved_kern_si= ze(phys_addr_t limit, int n unsigned long __init memblock_estimated_nr_free_pages(void) { return PHYS_PFN(memblock_phys_mem_size() - - memblock_reserved_kern_size(MEMBLOCK_ALLOC_ANYWHERE, NUMA_NO_NODE)); + memblock_reserved_size_flags(MEMBLOCK_ALLOC_ANYWHERE, NUMA_NO_NODE, + MEMBLOCK_RSRV_KERN)); } =20 /* lowest address */ --=20 2.54.0.1032.g2f8565e1d1-goog From nobody Tue Jun 16 19:32:37 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E73D3FCB22 for ; Fri, 5 Jun 2026 18:35:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684558; cv=none; b=b8pGl5Z+tFAFa4UJsx06y3a63loBlEfnNzRA5iABwtNr8FEr+3zGbyV3VTnwPjMH34RZQMfJf9Rm0oWuAe2oYb843G1L/tXYe6TAzdC2OLqhM+tkDIrmEhjQCNz78QW6razkcYuu1qs48dxG+F3M37T1fctAaKGF93pfKpk5sPE= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780684558; c=relaxed/simple; bh=0efo/ycGcd2WP7drmYEXYpmg66GfnAWUymidyYhZlTg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tNw1dJ/aJ5sjL8sce2JAkQSlSe2EAhomAQPRwZ5VZXohZDjdTWoaCugcbgo+ChxaWq/bupYOjte6hm6GpyuGjMzGlFonJ9+p0WJXyfQhxPbkPiGXRblQ1CmZpuMGwKrC5D93Bm/dnATRfXDSga1hiI3w0zLlqwXA9oAxNTKcQ+g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bUfHWrW5; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bUfHWrW5" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A913E1F0089C; Fri, 5 Jun 2026 18:35:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780684556; bh=1Hq29nobeHte/zgkXSzWoaRdOwXra0la1omlEHsUXW0=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=bUfHWrW5rXMnv2UvKJJvwS1trNmxa2rXAjoYxroYpe1q5lIcs760w3ankAJ6BB1Py g73wErrnH91np/zObpNmIFsHWrAUvhr3VJlJAcj4xTTHk/cvcRExmAy12bklfYEH+w kAaKAZcTcXTI0uCCo5vFlyOqxzHlqa46oLSVDq6qe5fpJTloUX20PBsylCddF9x0z1 wSJgYCMJGOGDk0bXCPolKk9SFYRhFNmMU1wQ2M0lk0ybHyOBjtAtQX1FwVJvlUEqUg bVjPGABJTG6H4zot5hDdg1Ua9KeRvqsn0epir31a1xmaTVbNjXDhZhjEnok0JZYRXG YmdbPzofSIHPA== From: Pratyush Yadav To: Mike Rapoport , Pasha Tatashin , Pratyush Yadav , Alexander Graf , Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , Jason Miu , Jork Loeser Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 18/18] kho: exclude hugetlb memory from scratch size calculation Date: Fri, 5 Jun 2026 20:34:51 +0200 Message-ID: <20260605183501.3884950-19-pratyush@kernel.org> X-Mailer: git-send-email 2.54.0.1032.g2f8565e1d1-goog In-Reply-To: <20260605183501.3884950-1-pratyush@kernel.org> References: <20260605183501.3884950-1-pratyush@kernel.org> 
Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Pratyush Yadav (Google)" HugeTLB pages can be preserved memory. So they are never allocated from scratch. Instead, they are allocated from the memory blocks with no preserved memory. These areas are detected at runtime on each boot. But since they are allocated via memblock, they show up as RSRV_KERN, and blow up the scratch size when scratch scale is in use. All hugetlb pages are marked RSRV_HUGETLB. Subtract their size from RSRV_KERN when calculating scratch sizes. Signed-off-by: Pratyush Yadav (Google) --- kernel/liveupdate/kexec_handover.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_h= andover.c index b3c33f150e85..0d106c9197d9 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -744,7 +744,8 @@ static void __init scratch_size_update(void) { /* * If fixed sizes are not provided via command line, calculate them - * now. + * now. Use RSRV_KERN to count allocated memory, but remove HugeTLB + * allocations from it because they never get allocated from scratch. 
*/ if (scratch_scale) { phys_addr_t size; @@ -752,12 +753,19 @@ static void __init scratch_size_update(void) size =3D memblock_reserved_size_flags(ARCH_LOW_ADDRESS_LIMIT, NUMA_NO_NODE, MEMBLOCK_RSRV_KERN); + size -=3D memblock_reserved_size_flags(ARCH_LOW_ADDRESS_LIMIT, + NUMA_NO_NODE, + MEMBLOCK_RSRV_HUGETLB); + size =3D size * scratch_scale / 100; scratch_size_lowmem =3D size; =20 size =3D memblock_reserved_size_flags(MEMBLOCK_ALLOC_ANYWHERE, NUMA_NO_NODE, MEMBLOCK_RSRV_KERN); + size -=3D memblock_reserved_size_flags(MEMBLOCK_ALLOC_ANYWHERE, + NUMA_NO_NODE, + MEMBLOCK_RSRV_HUGETLB); size =3D size * scratch_scale / 100 - scratch_size_lowmem; scratch_size_global =3D size; } @@ -777,6 +785,9 @@ static phys_addr_t __init scratch_size_node(int nid) if (scratch_scale) { size =3D memblock_reserved_size_flags(MEMBLOCK_ALLOC_ANYWHERE, nid, MEMBLOCK_RSRV_KERN); + /* Do not count HugeTLB pages. */ + size -=3D memblock_reserved_size_flags(MEMBLOCK_ALLOC_ANYWHERE, + nid, MEMBLOCK_RSRV_HUGETLB); size =3D size * scratch_scale / 100; } else { size =3D scratch_size_pernode; --=20 2.54.0.1032.g2f8565e1d1-goog