From: Usama Arif <usama.arif@bytedance.com>
To: linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, rppt@kernel.org
Cc: linux-kernel@vger.kernel.org, songmuchun@bytedance.com, fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com, Usama Arif <usama.arif@bytedance.com>
Subject: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
Date: Fri, 25 Aug 2023 12:18:36 +0100
Message-Id: <20230825111836.1715308-5-usama.arif@bytedance.com>
In-Reply-To: <20230825111836.1715308-1-usama.arif@bytedance.com>
References: <20230825111836.1715308-1-usama.arif@bytedance.com>

The new boot flow for the initialization of gigantic pages is as follows:
- At boot time, for a gigantic page during __alloc_bootmem_huge_page, the
  region after the first struct page is marked as noinit.
- This results in only the first struct page being initialized in
  reserve_bootmem_region. As the tail struct pages are not initialized at
  this point, there can be a significant saving in boot time if HVO
  succeeds later on.
- Later on in the boot, HVO is attempted. If it is successful, only the
  first HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct
  pages after the head struct page are initialized. If it is not
  successful, then all of the tail struct pages are initialized.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 mm/hugetlb.c         | 52 +++++++++++++++++++++++++++++++++++---------
 mm/hugetlb_vmemmap.h |  8 +++----
 mm/internal.h        |  3 +++
 mm/mm_init.c         |  2 +-
 4 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6da626bfb52e..964f7a2b693e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1953,7 +1953,6 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
 
 static void __prep_new_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
-	hugetlb_vmemmap_optimize(h, &folio->page);
 	INIT_LIST_HEAD(&folio->lru);
 	folio_set_compound_dtor(folio, HUGETLB_PAGE_DTOR);
 	hugetlb_set_folio_subpool(folio, NULL);
@@ -2225,6 +2224,7 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 			return NULL;
 		}
 	}
+	hugetlb_vmemmap_optimize(h, &folio->page);
 	prep_new_hugetlb_folio(h, folio, folio_nid(folio));
 
 	return folio;
@@ -2943,6 +2943,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
 	new_folio = alloc_buddy_hugetlb_folio(h, gfp_mask, nid, NULL, NULL);
 	if (!new_folio)
 		return -ENOMEM;
+	hugetlb_vmemmap_optimize(h, &new_folio->page);
 	__prep_new_hugetlb_folio(h, new_folio);
 
 retry:
@@ -3206,6 +3207,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	}
 
 found:
+
+	/*
+	 * Only initialize the head struct page in memmap_init_reserved_pages,
+	 * rest of the struct pages will be initialized by the HugeTLB subsystem itself.
+	 * The head struct page is used to get folio information by the HugeTLB
+	 * subsystem like zone id and node id.
+	 */
+	memblock_reserved_mark_noinit_vmemmap(virt_to_phys((void *)m + PAGE_SIZE),
+					      huge_page_size(h) - PAGE_SIZE);
 	/* Put them into a private list first because mem_map is not up yet */
 	INIT_LIST_HEAD(&m->list);
 	list_add(&m->list, &huge_boot_pages);
@@ -3213,6 +3223,27 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	return 1;
 }
 
+static void __init hugetlb_folio_init_vmemmap(struct hstate *h, struct folio *folio,
+					      unsigned long nr_pages)
+{
+	enum zone_type zone = zone_idx(folio_zone(folio));
+	int nid = folio_nid(folio);
+	unsigned long head_pfn = folio_pfn(folio);
+	unsigned long pfn, end_pfn = head_pfn + nr_pages;
+
+	__folio_clear_reserved(folio);
+	__folio_set_head(folio);
+
+	for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
+		struct page *page = pfn_to_page(pfn);
+
+		__init_single_page(page, pfn, zone, nid);
+		prep_compound_tail((struct page *)folio, pfn - head_pfn);
+		set_page_count(page, 0);
+	}
+	prep_compound_head((struct page *)folio, huge_page_order(h));
+}
+
 /*
  * Put bootmem huge pages into the standard lists after mem_map is up.
  * Note: This only applies to gigantic (order > MAX_ORDER) pages.
@@ -3223,19 +3254,19 @@ static void __init gather_bootmem_prealloc(void)
 
 	list_for_each_entry(m, &huge_boot_pages, list) {
 		struct page *page = virt_to_page(m);
-		struct folio *folio = page_folio(page);
+		struct folio *folio = (void *)page;
 		struct hstate *h = m->hstate;
+		unsigned long nr_pages = pages_per_huge_page(h);
 
 		VM_BUG_ON(!hstate_is_gigantic(h));
 		WARN_ON(folio_ref_count(folio) != 1);
-		if (prep_compound_gigantic_folio(folio, huge_page_order(h))) {
-			WARN_ON(folio_test_reserved(folio));
-			prep_new_hugetlb_folio(h, folio, folio_nid(folio));
-			free_huge_page(page); /* add to the hugepage allocator */
-		} else {
-			/* VERY unlikely inflated ref count on a tail page */
-			free_gigantic_folio(folio, huge_page_order(h));
-		}
+
+		hugetlb_vmemmap_optimize(h, &folio->page);
+		if (HPageVmemmapOptimized(&folio->page))
+			nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
+		hugetlb_folio_init_vmemmap(h, folio, nr_pages);
+		prep_new_hugetlb_folio(h, folio, folio_nid(folio));
+		free_huge_page(page); /* add to the hugepage allocator */
 
 		/*
 		 * We need to restore the 'stolen' pages to totalram_pages
@@ -3656,6 +3687,7 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
 		else
 			prep_compound_page(subpage, target_hstate->order);
 		folio_change_private(inner_folio, NULL);
+		hugetlb_vmemmap_optimize(h, &folio->page);
 		prep_new_hugetlb_folio(target_hstate, inner_folio, nid);
 		free_huge_page(subpage);
 	}
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 25bd0e002431..d30aff8f3573 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -10,16 +10,16 @@
 #define _LINUX_HUGETLB_VMEMMAP_H
 #include <linux/hugetlb.h>
 
-#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
-int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
-void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
-
 /*
  * Reserve one vmemmap page, all vmemmap addresses are mapped to it. See
  * Documentation/vm/vmemmap_dedup.rst.
  */
 #define HUGETLB_VMEMMAP_RESERVE_SIZE	PAGE_SIZE
 
+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
+void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
+
 static inline unsigned int hugetlb_vmemmap_size(const struct hstate *h)
 {
 	return pages_per_huge_page(h) * sizeof(struct page);
diff --git a/mm/internal.h b/mm/internal.h
index a7d9e980429a..31b3d45f4609 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1102,4 +1102,7 @@ struct vma_prepare {
 	struct vm_area_struct *remove;
 	struct vm_area_struct *remove2;
 };
+
+void __meminit __init_single_page(struct page *page, unsigned long pfn,
+				unsigned long zone, int nid);
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index a1963c3322af..3d4ab595ca7d 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -551,7 +551,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	node_states[N_MEMORY] = saved_node_state;
 }
 
-static void __meminit __init_single_page(struct page *page, unsigned long pfn,
+void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
 	mm_zero_struct_page(page);
-- 
2.25.1
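
[Editor's note, not part of the patch: the saving described in the commit message can be quantified with a small standalone userspace sketch. It only mirrors the arithmetic of the patch and assumes a 4 KiB base page, a 1 GiB gigantic page and a 64-byte struct page; with those numbers, a successful HVO leaves HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) = 64 struct pages (the head plus 63 tails) to initialize instead of 262144.]

/*
 * Standalone sketch of the struct page accounting behind the commit
 * message. The page_size, struct_page_size and huge_page_size values
 * below are assumptions for illustration, not taken from the patch.
 */
#include <stdio.h>

int main(void)
{
	const unsigned long page_size = 4096;		/* assumed base page size */
	const unsigned long struct_page_size = 64;	/* assumed sizeof(struct page) */
	const unsigned long huge_page_size = 1UL << 30;	/* assumed 1 GiB gigantic page */

	/* One struct page per base page backing the gigantic page. */
	unsigned long total_struct_pages = huge_page_size / page_size;

	/* HUGETLB_VMEMMAP_RESERVE_SIZE (== PAGE_SIZE) worth of struct pages is kept by HVO. */
	unsigned long kept_struct_pages = page_size / struct_page_size;

	/* Head page plus (kept - 1) tail pages are initialized when HVO succeeds. */
	unsigned long init_with_hvo = kept_struct_pages;
	unsigned long init_without_hvo = total_struct_pages;

	printf("struct pages per 1 GiB page:      %lu\n", total_struct_pages);
	printf("initialized when HVO succeeds:    %lu\n", init_with_hvo);
	printf("initialized when HVO fails:       %lu\n", init_without_hvo);
	printf("struct page inits skipped by HVO: %lu\n",
	       init_without_hvo - init_with_hvo);
	return 0;
}

With the assumed sizes, the program prints 262144 total struct pages per gigantic page and 64 initialized when HVO succeeds, i.e. 262080 struct page initializations skipped per 1 GiB page at boot.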