From nobody Tue Apr 7 12:37:03 2026
Date: Wed, 25 Feb 2026 16:39:54 +0100
In-Reply-To: <20260225153955.1006649-1-mclapinski@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260225153955.1006649-1-mclapinski@google.com>
X-Mailer: git-send-email 2.53.0.414.gf7e9f6c205-goog
Message-ID: <20260225153955.1006649-2-mclapinski@google.com>
Subject: [PATCH v5 1/2] kho: fix deferred init of kho scratch
From: Michal Clapinski
To: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, kexec@lists.infradead.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Michal Clapinski
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Currently, mm_core_init calls kho_memory_init, which calls
kho_release_scratch.

If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, kho_release_scratch will
first initialize the struct pages of kho scratch. This is not needed. We
can just let page_alloc_init_late init it.

Next, kho_release_scratch will mark scratch as MIGRATE_CMA. If
CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, this will be overwritten
later in deferred_free_pages.

To fix this, I removed the whole kho_release_scratch. Marking the
pageblocks as MIGRATE_CMA now happens in kho_init, which runs after
deferred_free_pages.

Signed-off-by: Michal Clapinski
Reviewed-by: Mike Rapoport (Microsoft)
---
 include/linux/memblock.h           |  2 --
 kernel/liveupdate/kexec_handover.c | 43 ++++++++----------------------
 mm/memblock.c                      | 22 ---------------
 3 files changed, 11 insertions(+), 56 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 6ec5e9ac0699..3e217414e12d 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -614,11 +614,9 @@ static inline void memtest_report_meminfo(struct seq_file *m) { }
 #ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
 void memblock_set_kho_scratch_only(void);
 void memblock_clear_kho_scratch_only(void);
-void memmap_init_kho_scratch_pages(void);
 #else
 static inline void memblock_set_kho_scratch_only(void) { }
 static inline void memblock_clear_kho_scratch_only(void) { }
-static inline void memmap_init_kho_scratch_pages(void) {}
 #endif
 
 #endif /* _LINUX_MEMBLOCK_H */
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 410098bae0bf..1e1f69b10457 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -1388,11 +1388,6 @@ static __init int kho_init(void)
 	if (err)
 		goto err_free_fdt;
 
-	if (fdt) {
-		kho_in_debugfs_init(&kho_in.dbg, fdt);
-		return 0;
-	}
-
 	for (int i = 0; i < kho_scratch_cnt; i++) {
 		unsigned long base_pfn = PHYS_PFN(kho_scratch[i].addr);
 		unsigned long count = kho_scratch[i].size >> PAGE_SHIFT;
@@ -1408,8 +1403,17 @@ static __init int kho_init(void)
 		 */
 		kmemleak_ignore_phys(kho_scratch[i].addr);
 		for (pfn = base_pfn; pfn < base_pfn + count;
-		     pfn += pageblock_nr_pages)
-			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		     pfn += pageblock_nr_pages) {
+			if (fdt)
+				init_cma_pageblock(pfn_to_page(pfn));
+			else
+				init_cma_reserved_pageblock(pfn_to_page(pfn));
+		}
+	}
+
+	if (fdt) {
+		kho_in_debugfs_init(&kho_in.dbg, fdt);
+		return 0;
 	}
 
 	WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, "fdt",
@@ -1435,35 +1439,10 @@ static __init int kho_init(void)
 }
 fs_initcall(kho_init);
 
-static void __init kho_release_scratch(void)
-{
-	phys_addr_t start, end;
-	u64 i;
-
-	memmap_init_kho_scratch_pages();
-
-	/*
-	 * Mark scratch mem as CMA before we return it. That way we
-	 * ensure that no kernel allocations happen on it. That means
-	 * we can reuse it as scratch memory again later.
-	 */
-	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-			     MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
-		ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
-		ulong end_pfn = pageblock_align(PFN_UP(end));
-		ulong pfn;
-
-		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
-			init_pageblock_migratetype(pfn_to_page(pfn),
-						   MIGRATE_CMA, false);
-	}
-}
-
 void __init kho_memory_init(void)
 {
 	if (kho_in.scratch_phys) {
 		kho_scratch = phys_to_virt(kho_in.scratch_phys);
-		kho_release_scratch();
 
 		if (kho_mem_retrieve(kho_get_fdt()))
 			kho_in.fdt_phys = 0;
diff --git a/mm/memblock.c b/mm/memblock.c
index b3ddfdec7a80..ae6a5af46bd7 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -959,28 +959,6 @@ __init void memblock_clear_kho_scratch_only(void)
 {
 	kho_scratch_only = false;
 }
-
-__init void memmap_init_kho_scratch_pages(void)
-{
-	phys_addr_t start, end;
-	unsigned long pfn;
-	int nid;
-	u64 i;
-
-	if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
-		return;
-
-	/*
-	 * Initialize struct pages for free scratch memory.
-	 * The struct pages for reserved scratch memory will be set up in
-	 * reserve_bootmem_region()
-	 */
-	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
-		for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
-			init_deferred_page(pfn, nid);
-	}
-}
 #endif
 
 /**
-- 
2.53.0.414.gf7e9f6c205-goog

From nobody Tue Apr 7 12:37:03 2026
Date: Wed, 25 Feb 2026 16:39:55 +0100
In-Reply-To: <20260225153955.1006649-1-mclapinski@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260225153955.1006649-1-mclapinski@google.com>
X-Mailer: git-send-email 2.53.0.414.gf7e9f6c205-goog
Message-ID: <20260225153955.1006649-3-mclapinski@google.com>
Subject: [PATCH v5 2/2] kho: make preserved pages compatible with deferred struct page init
From: Michal Clapinski
To: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, kexec@lists.infradead.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Michal Clapinski
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Evangelos Petrongonas

When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, struct page
initialization is deferred to parallel kthreads that run later in the
boot process.

During KHO restoration, kho_preserved_memory_reserve() writes metadata
for each preserved memory region. However, if the struct page has not
been initialized, this write targets uninitialized memory, potentially
leading to errors like:

  BUG: unable to handle page fault for address: ...

Fix this by introducing kho_get_preserved_page(), which ensures all
struct pages in a preserved region are initialized by calling
init_deferred_page(), which is a no-op when the struct page is already
initialized.

Signed-off-by: Evangelos Petrongonas
Co-developed-by: Michal Clapinski
Signed-off-by: Michal Clapinski
Reviewed-by: Pratyush Yadav (Google)
Reviewed-by: Pasha Tatashin
Reviewed-by: Mike Rapoport (Microsoft)
---
I think we can't initialize those struct pages in kho_restore_page.
I encountered this stack:
  page_zone(start_page)
  __pageblock_pfn_to_page
  set_zone_contiguous
  page_alloc_init_late

So, at the end of page_alloc_init_late struct pages are expected to be
already initialized. set_zone_contiguous() looks at the first and last
struct page of each pageblock in each populated zone to figure out if
the zone is contiguous. If a kho page lands on a pageblock boundary,
this will lead to access of an uninitialized struct page.
There is also page_ext_init that invokes pfn_to_nid, which calls
page_to_nid for each section-aligned page.
There might be other places that do something similar. Therefore, it's
a good idea to initialize all struct pages by the end of deferred
struct page init. That's why I'm resending Evangelos's patch.

I also tried to implement Pratyush's idea, i.e. iterate over zones,
then get node from zone. I didn't notice any performance difference
even with 8GB of kho.
---
 kernel/liveupdate/Kconfig          |  2 --
 kernel/liveupdate/kexec_handover.c | 27 ++++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index 1a8513f16ef7..c13af38ba23a 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -1,12 +1,10 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 menu "Live Update and Kexec HandOver"
-	depends on !DEFERRED_STRUCT_PAGE_INIT
 
 config KEXEC_HANDOVER
 	bool "kexec handover"
 	depends on ARCH_SUPPORTS_KEXEC_HANDOVER && ARCH_SUPPORTS_KEXEC_FILE
-	depends on !DEFERRED_STRUCT_PAGE_INIT
 	select MEMBLOCK_KHO_SCRATCH
 	select KEXEC_FILE
 	select LIBFDT
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 1e1f69b10457..761d7c9c1ed8 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -470,6 +470,31 @@ struct page *kho_restore_pages(phys_addr_t phys, unsigned long nr_pages)
 }
 EXPORT_SYMBOL_GPL(kho_restore_pages);
 
+/*
+ * With CONFIG_DEFERRED_STRUCT_PAGE_INIT, struct pages in higher memory regions
+ * may not be initialized yet at the time KHO deserializes preserved memory.
+ * KHO uses the struct page to store metadata and a later initialization would
+ * overwrite it.
+ * Ensure all the struct pages in the preservation are
+ * initialized. kho_preserved_memory_reserve() marks the reservation as noinit
+ * to make sure they don't get re-initialized later.
+ */
+static struct page *__init kho_get_preserved_page(phys_addr_t phys,
+						  unsigned int order)
+{
+	unsigned long pfn = PHYS_PFN(phys);
+	int nid;
+
+	if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
+		return pfn_to_page(pfn);
+
+	nid = early_pfn_to_nid(pfn);
+	for (unsigned long i = 0; i < (1UL << order); i++)
+		init_deferred_page(pfn + i, nid);
+
+	return pfn_to_page(pfn);
+}
+
 static int __init kho_preserved_memory_reserve(phys_addr_t phys,
 					       unsigned int order)
 {
@@ -478,7 +503,7 @@ static int __init kho_preserved_memory_reserve(phys_addr_t phys,
 	u64 sz;
 
 	sz = 1 << (order + PAGE_SHIFT);
-	page = phys_to_page(phys);
+	page = kho_get_preserved_page(phys, order);
 
 	/* Reserve the memory preserved in KHO in memblock */
 	memblock_reserve(phys, sz);
-- 
2.53.0.414.gf7e9f6c205-goog
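
For readers following patch 2/2, the effect of kho_get_preserved_page()
can be modelled in user space. The sketch below is not kernel code and
is not part of the series: struct model_page, model_memmap and the
model_* helpers are hypothetical stand-ins, and the "no-op when already
initialized" behaviour only mirrors what the commit message says about
init_deferred_page(). It shows why initializing all 1 << order struct
pages up front is safe to repeat and does not clobber metadata that was
stored after the first initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PAGES 64UL

/* Hypothetical stand-in for struct page: init state, node, one metadata word. */
struct model_page {
	bool initialized;
	int nid;
	unsigned long private_data;	/* models the metadata KHO stores */
};

/* Hypothetical stand-in for the memmap; zeroed, i.e. nothing initialized. */
static struct model_page model_memmap[MODEL_NR_PAGES];

/* Models init_deferred_page(): does nothing if the page is already set up. */
static void model_init_deferred_page(unsigned long pfn, int nid)
{
	assert(pfn < MODEL_NR_PAGES);

	if (model_memmap[pfn].initialized)
		return;

	model_memmap[pfn].initialized = true;
	model_memmap[pfn].nid = nid;
	model_memmap[pfn].private_data = 0;
}

/*
 * Models kho_get_preserved_page(): make sure every page of the
 * order-sized block is initialized, then return the first one.
 */
static struct model_page *model_get_preserved_page(unsigned long pfn,
						   unsigned int order,
						   int nid)
{
	for (unsigned long i = 0; i < (1UL << order); i++)
		model_init_deferred_page(pfn + i, nid);

	return &model_memmap[pfn];
}
```

Calling model_get_preserved_page(8, 2, 1) initializes model pages 8..11
and leaves their neighbours untouched. A second call on the same range
leaves nid and private_data as they were, which is the property the
real helper relies on.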