[PATCH v2 00/18] kho: make boot time huge page allocation work nicely with KHO

Pratyush Yadav posted 18 patches 1 week, 3 days ago
Posted by Pratyush Yadav 1 week, 3 days ago
From: "Pratyush Yadav (Google)" <pratyush@kernel.org>

Hi,

Gigantic huge page allocation is somewhat broken currently with KHO.

First, they break scratch size accounting. Since they are allocated
using the memblock alloc APIs, they count towards RSRV_KERN, and thus
towards the scratch size when using scratch_scale. This means that if
huge pages take a large enough chunk of system memory, the scratch size
will blow up and the scratch allocation will fail.
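
To make the accounting problem concrete, here is a toy userspace model.
It assumes scratch_scale acts as a percentage applied to the
kernel-reserved size; the helper names and the exact formula are
illustrative assumptions, not code from this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Hypothetical model: scratch is sized as a percentage of the memory
 * accounted as kernel reservations.
 */
static uint64_t scratch_size(uint64_t reserved_kern, unsigned int scale_percent)
{
	return reserved_kern * scale_percent / 100;
}

/* Scratch can only be allocated from memory that is not already in use. */
static bool scratch_fits(uint64_t total, uint64_t in_use, uint64_t scratch)
{
	return in_use <= total && scratch <= total - in_use;
}
```

With 64 GiB of RAM, 1 GiB of ordinary kernel reservations, 40 GiB of
gigantic pages, and a scale of 200, counting the huge pages asks for
82 GiB of scratch, which cannot fit. Excluding them asks for 2 GiB,
which fits in the remaining 23 GiB.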

Second, scratch cannot contain preserved memory, and if hugepages are
allocated from scratch, they will fail to be preserved with the upcoming
hugetlb preservation series [0].

Fix this by introducing the concept of extended scratch areas. They are
areas that the kernel discovers on boot by walking the radix tree and
finding free memory ranges. See patch 15 for more details.
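
The idea of finding free ranges between preserved ones can be sketched
in userspace as follows. This is a simplified stand-in: it uses a sorted
array of preserved ranges instead of the KHO radix tree, and struct
range and find_free_ranges() are hypothetical names, not APIs from this
series.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Collect the gaps between preserved ranges inside the window [lo, hi).
 * 'preserved' must be sorted by start address. Returns the number of
 * gaps written to 'out' (at most max_out).
 */
static size_t find_free_ranges(const struct range *preserved, size_t n,
			       uint64_t lo, uint64_t hi,
			       struct range *out, size_t max_out)
{
	size_t count = 0;
	uint64_t cursor = lo;	/* lowest address not yet classified */

	for (size_t i = 0; i < n && cursor < hi; i++) {
		uint64_t start = preserved[i].start;
		uint64_t end = preserved[i].end;

		if (end <= cursor)
			continue;	/* entirely below the cursor */

		if (start > cursor && count < max_out) {
			/* Free gap before this preserved range. */
			uint64_t gap_end = start < hi ? start : hi;

			out[count++] = (struct range){ cursor, gap_end };
		}
		cursor = end;
	}

	/* Trailing gap after the last preserved range. */
	if (cursor < hi && count < max_out)
		out[count++] = (struct range){ cursor, hi };

	return count;
}
```

For example, with preserved ranges [0x1000, 0x2000) and [0x4000, 0x5000)
in the window [0x0, 0x8000), the sketch reports three free ranges:
[0x0, 0x1000), [0x2000, 0x4000), and [0x5000, 0x8000).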

Discovering the scratch areas needs some preparatory changes to KHO, the
radix tree APIs, and to memblock. Patches 1-14 do that.

Patch 15 adds the scratch discovery logic.

Patch 16 adds the dedicated memblock hugetlb allocator.

Patches 17-18 fix the scratch size calculation when using scratch_scale.

[0] https://lore.kernel.org/linux-mm/20251206230222.853493-1-pratyush@kernel.org/T/#u

Changes in v2:

Detailed changelog below.

At a high level, the major change in this version is to remove
MEMBLOCK_KHO_SCRATCH_EXT. Keep MEMBLOCK_KHO_SCRATCH as the only memory
type and mark the discovered areas with it. For HugeTLB, add a dedicated
allocation routine that retries if the allocated memory lands in
scratch. Also introduce MEMBLOCK_RSRV_HUGETLB to help with accounting of
scratch area sizes.
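
A rough userspace sketch of the retry idea is below. It models
successive allocation attempts as a list of candidate base addresses and
rejects any that overlap a scratch region. All names here are
hypothetical, and how the real allocator picks its next attempt is not
modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t size;
};

/* True if the two regions share at least one byte. */
static bool regions_overlap(struct region a, struct region b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

static bool overlaps_scratch(struct region r, const struct region *scratch,
			     size_t nr_scratch)
{
	for (size_t i = 0; i < nr_scratch; i++)
		if (regions_overlap(r, scratch[i]))
			return true;
	return false;
}

/*
 * Try each candidate base in turn and keep the first one whose
 * [base, base + size) range stays clear of scratch. Returns the index
 * of the accepted candidate, or -1 if every attempt landed in scratch.
 */
static int alloc_outside_scratch(const uint64_t *candidates, size_t nr_candidates,
				 uint64_t size, const struct region *scratch,
				 size_t nr_scratch)
{
	for (size_t i = 0; i < nr_candidates; i++) {
		struct region r = { candidates[i], size };

		if (!overlaps_scratch(r, scratch, nr_scratch))
			return (int)i;
		/* Landed in scratch: drop this attempt and retry. */
	}
	return -1;
}
```

Rejecting at allocation time keeps scratch free of memory that would
later need to be preserved, which is the constraint described in the
cover letter.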

- Fixup commit message in patch 1 to make namespacing change clearer.
- Use @key in kernel-doc for radix functions.
- Add a runtime check on key width.
- Move all mem retrieval logic to kho_mem_retrieve().
- Add a comment in kho_mem_retrieve() explaining why mem_map won't be NULL.
- Rename callbacks to ->leaf() and ->node().
- Fixup commit messages.
- Clear tree->root in kho_radix_destroy_tree(). This lets the tree be
  re-initialized by calling kho_radix_init_tree().
- Add kho_get_mem_map() earlier in the series.
- Export kho_scratch_overlap() and use it in memblock_is_kho_scratch_memory().
- Get rid of MEMBLOCK_KHO_SCRATCH_EXT.
- Introduce MEMBLOCK_RSRV_HUGETLB.
- Introduce memblock_alloc_hugetlb() for hugetlb bootmem allocations.
- Refactor memblock_reserved_kern_size() to allow calculating size by flags.
- Exclude hugetlb memory from scratch size calculation.
- Collect R-bys.

Regards,
Pratyush Yadav

Pratyush Yadav (Google) (18):
  kho: generalize radix tree APIs
  kho: disallow wide keys in radix tree
  kho: return virtual address of mem_map
  kho: store incoming radix tree in kho_in
  kho: move all memory retrieval logic to kho_mem_retrieve()
  kho: add a struct for radix callbacks
  kho: add callback for table pages
  kho: add data argument to radix walk callback
  kho: allow early-boot usage of the KHO radix tree
  kho: allow destroying KHO radix tree
  kho: add kho_radix_init_tree()
  kho: export kho_scratch_overlap()
  kho: initialize kho_scratch pointer earlier in boot
  memblock: use kho_scratch_overlap() to decide migratetype
  kho: extend scratch
  memblock: make HugeTLB bootmem allocation work with KHO
  memblock: allow calculating reserved size by flags
  kho: exclude hugetlb memory from scratch size calculation

 include/linux/kexec_handover.h              |  10 +
 include/linux/kho/abi/kexec_handover.h      |   8 +
 include/linux/kho_radix_tree.h              |  44 +-
 include/linux/memblock.h                    |   9 +-
 kernel/liveupdate/Makefile                  |   1 -
 kernel/liveupdate/kexec_handover.c          | 495 +++++++++++++++-----
 kernel/liveupdate/kexec_handover_debug.c    |  25 -
 kernel/liveupdate/kexec_handover_internal.h |   9 -
 mm/hugetlb.c                                |  22 +-
 mm/memblock.c                               | 120 ++++-
 mm/mm_init.c                                |   1 +
 11 files changed, 540 insertions(+), 204 deletions(-)
 delete mode 100644 kernel/liveupdate/kexec_handover_debug.c


base-commit: 2935777b418d2bfcbfe96705bb2c0fa6c0d94e18
-- 
2.54.0.1032.g2f8565e1d1-goog
Re: [PATCH v2 00/18] kho: make boot time huge page allocation work nicely with KHO
Posted by Mike Rapoport 2 days, 6 hours ago
On Fri, 05 Jun 2026 20:34:33 +0200, Pratyush Yadav <pratyush@kernel.org> wrote:

Hi,

> [...]
> allocated from scratch, they will fail to be preserved with the upcoming
> hugetlb preservation series [0].
> 
> Fix this by introducing the concept of extended scratch areas. They are
> areas that the kernel discovers on boot by walking the radix tree and
> finding free memory ranges. See patch 15 for more details.

Overall LGTM.

I have some small comments here and there for now as I didn't get into
all the details yet.

-- 
Sincerely yours,
Mike.
Re: [PATCH v2 00/18] kho: make boot time huge page allocation work nicely with KHO
Posted by Pratyush Yadav 1 day, 4 hours ago
On Sun, Jun 14 2026, Mike Rapoport wrote:

> On Fri, 05 Jun 2026 20:34:33 +0200, Pratyush Yadav <pratyush@kernel.org> wrote:
>
> Hi,
>
>> [...]
>> allocated from scratch, they will fail to be preserved with the upcoming
>> hugetlb preservation series [0].
>> 
>> Fix this by introducing the concept of extended scratch areas. They are
>> areas that the kernel discovers on boot by walking the radix tree and
>> finding free memory ranges. See patch 15 for more details.
>
> Overall LGTM.
>
> I have some small comments here and there for now as I didn't get into
> all the details yet.

Thanks for reviewing, much appreciated! :-)

-- 
Regards,
Pratyush Yadav