From: Usama Arif
To: linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, rppt@kernel.org
Cc: linux-kernel@vger.kernel.org, fam.zheng@bytedance.com, liangma@liangbit.com, simon.evans@bytedance.com, punit.agrawal@bytedance.com, Usama Arif
Subject: [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled
Date: Mon, 24 Jul 2023 14:46:41 +0100
Message-Id: <20230724134644.1299963-2-usama.arif@bytedance.com>
In-Reply-To: <20230724134644.1299963-1-usama.arif@bytedance.com>
References: <20230724134644.1299963-1-usama.arif@bytedance.com>

When the vmemmap is optimizable, hugetlb_vmemmap_optimize will free all the
duplicated tail pages while preparing the new hugepage, so there is no need
to prepare them in the first place. For 1G x86 hugepages, this avoids
preparing 262144 - 64 = 262080 struct pages per hugepage.

Signed-off-by: Usama Arif
---
 mm/hugetlb.c         | 30 +++++++++++++++++++++---------
 mm/hugetlb_vmemmap.c |  2 +-
 mm/hugetlb_vmemmap.h |  1 +
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 64a3239b6407..24352abbb9e5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1943,13 +1943,22 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
 }
 
 static bool __prep_compound_gigantic_folio(struct folio *folio,
-                                unsigned int order, bool demote)
+                                unsigned int order, bool demote,
+                                bool hugetlb_vmemmap_optimizable)
 {
         int i, j;
         int nr_pages = 1 << order;
         struct page *p;
 
         __folio_clear_reserved(folio);
+
+        /*
+         * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize
+         * in prep_new_huge_page. Hence, reduce nr_pages to the pages that will be kept.
+         */
+        if (hugetlb_vmemmap_optimizable)
+                nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
+
         for (i = 0; i < nr_pages; i++) {
                 p = folio_page(folio, i);
 
@@ -2020,15 +2029,15 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 }
 
 static bool prep_compound_gigantic_folio(struct folio *folio,
-                                unsigned int order)
+                                unsigned int order, bool hugetlb_vmemmap_optimizable)
 {
-        return __prep_compound_gigantic_folio(folio, order, false);
+        return __prep_compound_gigantic_folio(folio, order, false, hugetlb_vmemmap_optimizable);
 }
 
 static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
-                                unsigned int order)
+                                unsigned int order, bool hugetlb_vmemmap_optimizable)
 {
-        return __prep_compound_gigantic_folio(folio, order, true);
+        return __prep_compound_gigantic_folio(folio, order, true, hugetlb_vmemmap_optimizable);
 }
 
 /*
@@ -2185,7 +2194,8 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
         if (!folio)
                 return NULL;
         if (hstate_is_gigantic(h)) {
-                if (!prep_compound_gigantic_folio(folio, huge_page_order(h))) {
+                if (!prep_compound_gigantic_folio(folio, huge_page_order(h),
+                                vmemmap_should_optimize(h, &folio->page))) {
                         /*
                          * Rare failure to convert pages to compound page.
                          * Free pages and try again - ONCE!
@@ -3201,7 +3211,8 @@ static void __init gather_bootmem_prealloc(void)
 
                 VM_BUG_ON(!hstate_is_gigantic(h));
                 WARN_ON(folio_ref_count(folio) != 1);
-                if (prep_compound_gigantic_folio(folio, huge_page_order(h))) {
+                if (prep_compound_gigantic_folio(folio, huge_page_order(h),
+                                vmemmap_should_optimize(h, page))) {
                         WARN_ON(folio_test_reserved(folio));
                         prep_new_hugetlb_folio(h, folio, folio_nid(folio));
                         free_huge_page(page); /* add to the hugepage allocator */
@@ -3624,8 +3635,9 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
                 subpage = folio_page(folio, i);
                 inner_folio = page_folio(subpage);
                 if (hstate_is_gigantic(target_hstate))
-                        prep_compound_gigantic_folio_for_demote(inner_folio,
-                                        target_hstate->order);
+                        prep_compound_gigantic_folio_for_demote(folio,
+                                        target_hstate->order,
+                                        vmemmap_should_optimize(target_hstate, subpage));
                 else
                         prep_compound_page(subpage, target_hstate->order);
                 folio_change_private(inner_folio, NULL);
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c2007ef5e9b0..b721e87de2b3 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -486,7 +486,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
 }
 
 /* Return true iff a HugeTLB whose vmemmap should and can be optimized. */
-static bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
+bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
 {
         if (!READ_ONCE(vmemmap_optimize_enabled))
                 return false;
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 25bd0e002431..3525c514c061 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -57,4 +57,5 @@ static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
 {
         return hugetlb_vmemmap_optimizable_size(h) != 0;
 }
+bool vmemmap_should_optimize(const struct hstate *h, const struct page *head);
 #endif /* _LINUX_HUGETLB_VMEMMAP_H */
-- 
2.25.1

From: Usama Arif
To: linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, rppt@kernel.org
Cc: linux-kernel@vger.kernel.org, fam.zheng@bytedance.com, liangma@liangbit.com, simon.evans@bytedance.com, punit.agrawal@bytedance.com, Usama Arif
Subject: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
Date: Mon, 24 Jul 2023 14:46:42 +0100
Message-Id: <20230724134644.1299963-3-usama.arif@bytedance.com>
In-Reply-To: <20230724134644.1299963-1-usama.arif@bytedance.com>
References: <20230724134644.1299963-1-usama.arif@bytedance.com>

This propagates the hugepage size through the memblock APIs
(memblock_alloc_try_nid_raw and memblock_alloc_range_nid) so that it can be
stored in struct memblock_region. This does not introduce any functional
change, and hugepage_size is not used in this commit. It is just a setup for
the next commit, where hugepage_size is used to skip initialization of the
struct pages that will be freed later when HVO is enabled.
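To illustrate the intended use, a sketch only (not part of the diff below): a
hugetlb boot-time caller that wants its reserved region tagged passes the
hugepage size as the new last argument, while every other caller passes 0 and
keeps its current behaviour:

        /* illustrative sketch: reserve one gigantic page and record its size */
        m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
                                       0, MEMBLOCK_ALLOC_ACCESSIBLE, nid,
                                       huge_page_size(h) /* hugepage_size */);
        if (!m)
                return 0;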
Signed-off-by: Usama Arif
---
 arch/arm64/mm/kasan_init.c                   |  2 +-
 arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
 arch/powerpc/platforms/pseries/setup.c       |  4 +-
 arch/powerpc/sysdev/dart_iommu.c             |  2 +-
 include/linux/memblock.h                     |  8 ++-
 mm/cma.c                                     |  4 +-
 mm/hugetlb.c                                 |  6 +-
 mm/memblock.c                                | 60 ++++++++++++--------
 mm/mm_init.c                                 |  2 +-
 mm/sparse-vmemmap.c                          |  2 +-
 tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
 11 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index f17d066e85eb..39992a418891 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -50,7 +50,7 @@ static phys_addr_t __init kasan_alloc_raw_page(int node)
         void *p = memblock_alloc_try_nid_raw(PAGE_SIZE, PAGE_SIZE,
                                         __pa(MAX_DMA_ADDRESS),
                                         MEMBLOCK_ALLOC_NOLEAKTRACE,
-                                        node);
+                                        node, 0);
         if (!p)
                 panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%llx\n",
                       __func__, PAGE_SIZE, PAGE_SIZE, node,
diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index 375487cba874..6963cdf76bce 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -201,7 +201,7 @@ static int __init iob_init(struct device_node *dn)
         /* For 2G space, 8x64 pages (2^21 bytes) is max total l2 size */
         iob_l2_base = memblock_alloc_try_nid_raw(1UL << 21, 1UL << 21,
                                         MEMBLOCK_LOW_LIMIT, 0x80000000,
-                                        NUMA_NO_NODE);
+                                        NUMA_NO_NODE, 0);
         if (!iob_l2_base)
                 panic("%s: Failed to allocate %lu bytes align=0x%lx max_addr=%x\n",
                       __func__, 1UL << 21, 1UL << 21, 0x80000000);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e2a57cfa6c83..cec7198b59d2 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -160,7 +160,7 @@ static void __init fwnmi_init(void)
          */
         mce_data_buf = memblock_alloc_try_nid_raw(RTAS_ERROR_LOG_MAX * nr_cpus,
                                         RTAS_ERROR_LOG_MAX, MEMBLOCK_LOW_LIMIT,
-                                        ppc64_rma_size, NUMA_NO_NODE);
+                                        ppc64_rma_size, NUMA_NO_NODE, 0);
         if (!mce_data_buf)
                 panic("Failed to allocate %d bytes below %pa for MCE buffer\n",
                       RTAS_ERROR_LOG_MAX * nr_cpus, &ppc64_rma_size);
@@ -176,7 +176,7 @@ static void __init fwnmi_init(void)
         size = sizeof(struct slb_entry) * mmu_slb_size * nr_cpus;
         slb_ptr = memblock_alloc_try_nid_raw(size, sizeof(struct slb_entry),
                                         MEMBLOCK_LOW_LIMIT,
-                                        ppc64_rma_size, NUMA_NO_NODE);
+                                        ppc64_rma_size, NUMA_NO_NODE, 0);
         if (!slb_ptr)
                 panic("Failed to allocate %zu bytes below %pa for slb area\n",
                       size, &ppc64_rma_size);
diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 98096bbfd62e..86c676b61899 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -239,7 +239,7 @@ static void __init allocate_dart(void)
          */
         dart_tablebase = memblock_alloc_try_nid_raw(SZ_16M, SZ_16M,
                                         MEMBLOCK_LOW_LIMIT, SZ_2G,
-                                        NUMA_NO_NODE);
+                                        NUMA_NO_NODE, 0);
         if (!dart_tablebase)
                 panic("Failed to allocate 16MB below 2GB for DART table\n");
 
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f71ff9f0ec81..bb8019540d73 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -63,6 +63,7 @@ struct memblock_region {
 #ifdef CONFIG_NUMA
         int nid;
 #endif
+        phys_addr_t hugepage_size;
 };
 
 /**
@@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
                                       phys_addr_t start, phys_addr_t end);
 phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
                                      phys_addr_t align, phys_addr_t start,
-                                     phys_addr_t end, int nid, bool exact_nid);
+                                     phys_addr_t end, int nid, bool exact_nid,
+                                     phys_addr_t hugepage_size);
 phys_addr_t memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
 
 static __always_inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
@@ -415,7 +417,7 @@ void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
                                  int nid);
 void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
                                  phys_addr_t min_addr, phys_addr_t max_addr,
-                                 int nid);
+                                 int nid, phys_addr_t hugepage_size);
 void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
                              phys_addr_t min_addr, phys_addr_t max_addr,
                              int nid);
@@ -431,7 +433,7 @@ static inline void *memblock_alloc_raw(phys_addr_t size,
 {
         return memblock_alloc_try_nid_raw(size, align, MEMBLOCK_LOW_LIMIT,
                                           MEMBLOCK_ALLOC_ACCESSIBLE,
-                                          NUMA_NO_NODE);
+                                          NUMA_NO_NODE, 0);
 }
 
 static inline void *memblock_alloc_from(phys_addr_t size,
diff --git a/mm/cma.c b/mm/cma.c
index a4cfe995e11e..a270905aa7f2 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -334,7 +334,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
         if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) {
                 memblock_set_bottom_up(true);
                 addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
-                                                limit, nid, true);
+                                                limit, nid, true, 0);
                 memblock_set_bottom_up(false);
         }
 #endif
@@ -353,7 +353,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 
         if (!addr) {
                 addr = memblock_alloc_range_nid(size, alignment, base,
-                                                limit, nid, true);
+                                                limit, nid, true, 0);
                 if (!addr) {
                         ret = -ENOMEM;
                         goto err;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 24352abbb9e5..5ba7fd702458 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3168,7 +3168,8 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
         /* do node specific alloc */
         if (nid != NUMA_NO_NODE) {
                 m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
-                                0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+                                0, MEMBLOCK_ALLOC_ACCESSIBLE, nid,
+                                hugetlb_vmemmap_optimizable(h) ? huge_page_size(h) : 0);
                 if (!m)
                         return 0;
                 goto found;
@@ -3177,7 +3178,8 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
         for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
                 m = memblock_alloc_try_nid_raw(
                                 huge_page_size(h), huge_page_size(h),
-                                0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
+                                0, MEMBLOCK_ALLOC_ACCESSIBLE, node,
+                                hugetlb_vmemmap_optimizable(h) ? huge_page_size(h) : 0);
                 /*
                  * Use the beginning of the huge page to store the
                  * huge_bootmem_page struct (until gather_bootmem
diff --git a/mm/memblock.c b/mm/memblock.c
index f9e61e565a53..e92d437bcb51 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -549,7 +549,8 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
                                                    int idx, phys_addr_t base,
                                                    phys_addr_t size,
                                                    int nid,
-                                                   enum memblock_flags flags)
+                                                   enum memblock_flags flags,
+                                                   phys_addr_t hugepage_size)
 {
         struct memblock_region *rgn = &type->regions[idx];
 
@@ -558,6 +559,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
         rgn->base = base;
         rgn->size = size;
         rgn->flags = flags;
+        rgn->hugepage_size = hugepage_size;
         memblock_set_region_node(rgn, nid);
         type->cnt++;
         type->total_size += size;
@@ -581,7 +583,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  */
 static int __init_memblock memblock_add_range(struct memblock_type *type,
                                 phys_addr_t base, phys_addr_t size,
-                                int nid, enum memblock_flags flags)
+                                int nid, enum memblock_flags flags, phys_addr_t hugepage_size)
 {
         bool insert = false;
         phys_addr_t obase = base;
@@ -598,6 +600,7 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
                 type->regions[0].base = base;
                 type->regions[0].size = size;
                 type->regions[0].flags = flags;
+                type->regions[0].hugepage_size = hugepage_size;
                 memblock_set_region_node(&type->regions[0], nid);
                 type->total_size = size;
                 return 0;
@@ -646,7 +649,7 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
                                 end_rgn = idx + 1;
                         memblock_insert_region(type, idx++, base,
                                                rbase - base, nid,
-                                               flags);
+                                               flags, hugepage_size);
                 }
         }
         /* area below @rend is dealt with, forget about it */
@@ -661,7 +664,7 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
                         start_rgn = idx;
                 end_rgn = idx + 1;
                 memblock_insert_region(type, idx, base, end - base,
-                                       nid, flags);
+                                       nid, flags, hugepage_size);
         }
 }
 
@@ -705,7 +708,7 @@ int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size,
         memblock_dbg("%s: [%pa-%pa] nid=%d flags=%x %pS\n", __func__,
                      &base, &end, nid, flags, (void *)_RET_IP_);
 
-        return memblock_add_range(&memblock.memory, base, size, nid, flags);
+        return memblock_add_range(&memblock.memory, base, size, nid, flags, 0);
 }
 
 /**
@@ -726,7 +729,7 @@ int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
         memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
                      &base, &end, (void *)_RET_IP_);
 
-        return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0);
+        return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0, 0);
 }
 
 /**
@@ -782,7 +785,7 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
                         type->total_size -= base - rbase;
                         memblock_insert_region(type, idx, rbase, base - rbase,
                                                memblock_get_region_node(rgn),
-                                               rgn->flags);
+                                               rgn->flags, 0);
                 } else if (rend > end) {
                         /*
                          * @rgn intersects from above.  Split and redo the
@@ -793,7 +796,7 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
                         type->total_size -= end - rbase;
                         memblock_insert_region(type, idx--, rbase, end - rbase,
                                                memblock_get_region_node(rgn),
-                                               rgn->flags);
+                                               rgn->flags, 0);
                 } else {
                         /* @rgn is fully contained, record it */
                         if (!*end_rgn)
@@ -863,14 +866,20 @@ int __init_memblock memblock_phys_free(phys_addr_t base, phys_addr_t size)
         return memblock_remove_range(&memblock.reserved, base, size);
 }
 
-int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+int __init_memblock memblock_reserve_huge(phys_addr_t base, phys_addr_t size,
+                                          phys_addr_t hugepage_size)
 {
         phys_addr_t end = base + size - 1;
 
         memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
                      &base, &end, (void *)_RET_IP_);
 
-        return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0);
+        return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0, hugepage_size);
+}
+
+int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+{
+        return memblock_reserve_huge(base, size, 0);
 }
 
 #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
@@ -881,7 +890,7 @@ int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
         memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
                      &base, &end, (void *)_RET_IP_);
 
-        return memblock_add_range(&physmem, base, size, MAX_NUMNODES, 0);
+        return memblock_add_range(&physmem, base, size, MAX_NUMNODES, 0, 0);
 }
 #endif
 
@@ -1365,6 +1374,7 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
  * @end: the upper bound of the memory region to allocate (phys address)
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  * @exact_nid: control the allocation fall back to other nodes
+ * @hugepage_size: size of the hugepages in bytes
  *
  * The allocation is performed from memory region limited by
  * memblock.current_limit if @end == %MEMBLOCK_ALLOC_ACCESSIBLE.
@@ -1385,7 +1395,7 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
 phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
                                         phys_addr_t align, phys_addr_t start,
                                         phys_addr_t end, int nid,
-                                        bool exact_nid)
+                                        bool exact_nid, phys_addr_t hugepage_size)
 {
         enum memblock_flags flags = choose_memblock_flags();
         phys_addr_t found;
@@ -1402,14 +1412,14 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 again:
         found = memblock_find_in_range_node(size, align, start, end, nid,
                                             flags);
-        if (found && !memblock_reserve(found, size))
+        if (found && !memblock_reserve_huge(found, size, hugepage_size))
                 goto done;
 
         if (nid != NUMA_NO_NODE && !exact_nid) {
                 found = memblock_find_in_range_node(size, align, start,
                                                     end, NUMA_NO_NODE,
                                                     flags);
-                if (found && !memblock_reserve(found, size))
+                if (found && !memblock_reserve_huge(found, size, hugepage_size))
                         goto done;
         }
 
@@ -1469,7 +1479,7 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t size,
                      __func__, (u64)size, (u64)align, &start, &end,
                      (void *)_RET_IP_);
         return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE,
-                                        false);
+                                        false, 0);
 }
 
 /**
@@ -1488,7 +1498,7 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t size,
 phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
 {
         return memblock_alloc_range_nid(size, align, 0,
-                                        MEMBLOCK_ALLOC_ACCESSIBLE, nid, false);
+                                        MEMBLOCK_ALLOC_ACCESSIBLE, nid, false, 0);
 }
 
 /**
@@ -1514,7 +1524,7 @@ phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t ali
 static void * __init memblock_alloc_internal(
                                 phys_addr_t size, phys_addr_t align,
                                 phys_addr_t min_addr, phys_addr_t max_addr,
-                                int nid, bool exact_nid)
+                                int nid, bool exact_nid, phys_addr_t hugepage_size)
 {
         phys_addr_t alloc;
 
@@ -1530,12 +1540,12 @@ static void * __init memblock_alloc_internal(
                 max_addr = memblock.current_limit;
 
         alloc = memblock_alloc_range_nid(size, align, min_addr, max_addr, nid,
-                                         exact_nid);
+                                         exact_nid, hugepage_size);
 
         /* retry allocation without lower limit */
         if (!alloc && min_addr)
                 alloc = memblock_alloc_range_nid(size, align, 0, max_addr, nid,
-                                                 exact_nid);
+                                                 exact_nid, hugepage_size);
 
         if (!alloc)
                 return NULL;
@@ -1571,7 +1581,7 @@ void * __init memblock_alloc_exact_nid_raw(
                      &max_addr, (void *)_RET_IP_);
 
         return memblock_alloc_internal(size, align, min_addr, max_addr, nid,
-                                       true);
+                                       true, 0);
 }
 
 /**
@@ -1585,25 +1595,29 @@ void * __init memblock_alloc_exact_nid_raw(
  * is preferred (phys address), or %MEMBLOCK_ALLOC_ACCESSIBLE to
  * allocate only from memory limited by memblock.current_limit value
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ * @hugepage_size: size of the hugepages in bytes
  *
  * Public function, provides additional debug information (including caller
  * info), if enabled. Does not zero allocated memory, does not panic if request
  * cannot be satisfied.
  *
+ * If hugepage_size is not 0 and HVO is enabled, then only the struct pages
+ * that are not freed by HVO are initialized using the hugepage_size parameter.
+ *
  * Return:
  * Virtual address of allocated memory block on success, NULL on failure.
  */
 void * __init memblock_alloc_try_nid_raw(
                         phys_addr_t size, phys_addr_t align,
                         phys_addr_t min_addr, phys_addr_t max_addr,
-                        int nid)
+                        int nid, phys_addr_t hugepage_size)
 {
         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
                      __func__, (u64)size, (u64)align, nid, &min_addr,
                      &max_addr, (void *)_RET_IP_);
 
         return memblock_alloc_internal(size, align, min_addr, max_addr, nid,
-                                       false);
+                                       false, hugepage_size);
 }
 
 /**
@@ -1634,7 +1648,7 @@ void * __init memblock_alloc_try_nid(
                      __func__, (u64)size, (u64)align, nid, &min_addr,
                      &max_addr, (void *)_RET_IP_);
         ptr = memblock_alloc_internal(size, align,
-                                      min_addr, max_addr, nid, false);
+                                      min_addr, max_addr, nid, false, 0);
         if (ptr)
                 memset(ptr, 0, size);
 
diff --git a/mm/mm_init.c b/mm/mm_init.c
index a1963c3322af..c36d768bb671 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1615,7 +1615,7 @@ void __init *memmap_alloc(phys_addr_t size, phys_addr_t align,
         else
                 ptr = memblock_alloc_try_nid_raw(size, align, min_addr,
                                                  MEMBLOCK_ALLOC_ACCESSIBLE,
-                                                 nid);
+                                                 nid, 0);
 
         if (ptr && size > 0)
                 page_init_poison(ptr, size);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a044a130405b..56b8b8e684df 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -43,7 +43,7 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
                                 unsigned long goal)
 {
         return memblock_alloc_try_nid_raw(size, align, goal,
-                                          MEMBLOCK_ALLOC_ACCESSIBLE, node);
+                                          MEMBLOCK_ALLOC_ACCESSIBLE, node, 0);
 }
 
 void * __meminit vmemmap_alloc_block(unsigned long size, int node)
diff --git a/tools/testing/memblock/tests/alloc_nid_api.c b/tools/testing/memblock/tests/alloc_nid_api.c
index 49bb416d34ff..225044366fbb 100644
--- a/tools/testing/memblock/tests/alloc_nid_api.c
+++ b/tools/testing/memblock/tests/alloc_nid_api.c
@@ -43,7 +43,7 @@ static inline void *run_memblock_alloc_nid(phys_addr_t size,
                                                max_addr, nid);
         if (alloc_nid_test_flags & TEST_F_RAW)
                 return memblock_alloc_try_nid_raw(size, align, min_addr,
-                                                  max_addr, nid);
+                                                  max_addr, nid, 0);
         return memblock_alloc_try_nid(size, align, min_addr,
                                       max_addr, nid);
 }
-- 
2.25.1

From: Usama Arif
To: linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, rppt@kernel.org
Cc: linux-kernel@vger.kernel.org, fam.zheng@bytedance.com, liangma@liangbit.com, simon.evans@bytedance.com, punit.agrawal@bytedance.com, Usama Arif
Subject: [RFC 3/4] mm/hugetlb_vmemmap: Use nid of the head page to reallocate it
Date: Mon, 24 Jul 2023 14:46:43 +0100
Message-Id: <20230724134644.1299963-4-usama.arif@bytedance.com>
In-Reply-To: <20230724134644.1299963-1-usama.arif@bytedance.com>
References: <20230724134644.1299963-1-usama.arif@bytedance.com>

If tail page prep and initialization is skipped, then the "start" page will
not contain the correct nid. Use the nid from the first vmemmap page instead.
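For context, a condensed view of the surrounding code in vmemmap_remap_free()
(paraphrased from the existing function, not new behaviour): the nid read here
feeds a node-bound allocation mask, so taking it from a struct page that was
never initialized could target the wrong node:

        /* read the node from the reused (head) vmemmap page, which is always initialized */
        int nid = page_to_nid((struct page *)reuse);
        /* replacement vmemmap pages are then allocated on that node */
        gfp_t gfp_mask = GFP_KERNEL | __GFP_THISNODE | __GFP_NORETRY | __GFP_NOWARN;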
Signed-off-by: Usama Arif
---
 mm/hugetlb_vmemmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index b721e87de2b3..bdf750a4786b 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -324,7 +324,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
                 .reuse_addr     = reuse,
                 .vmemmap_pages  = &vmemmap_pages,
         };
-        int nid = page_to_nid((struct page *)start);
+        int nid = page_to_nid((struct page *)reuse);
         gfp_t gfp_mask = GFP_KERNEL | __GFP_THISNODE | __GFP_NORETRY |
                          __GFP_NOWARN;
 
-- 
2.25.1

From: Usama Arif
To: linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, rppt@kernel.org
Cc: linux-kernel@vger.kernel.org, fam.zheng@bytedance.com, liangma@liangbit.com, simon.evans@bytedance.com, punit.agrawal@bytedance.com, Usama Arif
Subject: [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO
Date: Mon, 24 Jul 2023 14:46:44 +0100
Message-Id: <20230724134644.1299963-5-usama.arif@bytedance.com>
In-Reply-To: <20230724134644.1299963-1-usama.arif@bytedance.com>
References: <20230724134644.1299963-1-usama.arif@bytedance.com>

If the region is for hugepages and if HVO is enabled, then those struct pages
which will be freed later don't need to be initialized. This can save
significant time when a large number of hugepages are allocated at boot time.
As memmap_init_reserved_pages is only called at boot time, we don't need to
worry about memory hotplug.

Hugepage regions are kept separate from non-hugepage regions in
memblock_merge_regions so that initialization for unused struct pages can be
skipped for the entire region.

Signed-off-by: Usama Arif
---
 mm/hugetlb_vmemmap.c |  2 +-
 mm/hugetlb_vmemmap.h |  3 +++
 mm/memblock.c        | 27 ++++++++++++++++++++++-----
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index bdf750a4786b..b5b7834e0f42 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -443,7 +443,7 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
 EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 
-static bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
+bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
 core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0);
 
 /**
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 3525c514c061..8b9a1563f7b9 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -58,4 +58,7 @@ static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
         return hugetlb_vmemmap_optimizable_size(h) != 0;
 }
 bool vmemmap_should_optimize(const struct hstate *h, const struct page *head);
+
+extern bool vmemmap_optimize_enabled;
+
 #endif /* _LINUX_HUGETLB_VMEMMAP_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index e92d437bcb51..62072a0226de 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -21,6 +21,7 @@
 #include
 
 #include "internal.h"
+#include "hugetlb_vmemmap.h"
 
 #define INIT_MEMBLOCK_REGIONS			128
 #define INIT_PHYSMEM_REGIONS			4
@@ -519,7 +520,8 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type,
                 if (this->base + this->size != next->base ||
                     memblock_get_region_node(this) !=
                     memblock_get_region_node(next) ||
-                    this->flags != next->flags) {
+                    this->flags != next->flags ||
+                    this->hugepage_size != next->hugepage_size) {
                         BUG_ON(this->base + this->size > next->base);
                         i++;
                         continue;
@@ -2125,10 +2127,25 @@ static void __init memmap_init_reserved_pages(void)
         /* initialize struct pages for the reserved regions */
         for_each_reserved_mem_region(region) {
                 nid = memblock_get_region_node(region);
-                start = region->base;
-                end = start + region->size;
-
-                reserve_bootmem_region(start, end, nid);
+                /*
+                 * If the region is for hugepages and if HVO is enabled, then those
+                 * struct pages which will be freed later don't need to be initialized.
+                 * This can save significant time when a large number of hugepages are
+                 * allocated at boot time. As this is at boot time, we don't need to
+                 * worry about memory hotplug.
+                 */
+                if (region->hugepage_size && vmemmap_optimize_enabled) {
+                        for (start = region->base;
+                             start < region->base + region->size;
+                             start += region->hugepage_size) {
+                                end = start + HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page);
+                                reserve_bootmem_region(start, end, nid);
+                        }
+                } else {
+                        start = region->base;
+                        end = start + region->size;
+                        reserve_bootmem_region(start, end, nid);
+                }
         }
 }
 
-- 
2.25.1