From nobody Mon May 25 04:35:09 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6689818DB01 for ; Tue, 19 May 2026 00:19:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; cv=none; b=q8XJ4z4+/YSM7JTyPzwXz0ZDlUnzsu2KKvaQz/2zlgbKRDm+KBnQXZ15O2/kMKpFK3bMjAH4DDwyzm42X1gyYAKFEDUghkh6bj7+xUqrqrHZbf+DdXYCwEdv6ih/Ui8fKUUl6fpPAtqaZrRBEsL/guX6nS9QUglbNG1ZQXuoaoA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; c=relaxed/simple; bh=DmF912XBXWSZg+goW1bFDdFsol5YVx4sSEtgnClKN1k=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=BRJr2cSK4MX9FbYUqDlRyeq9lRuvGe3/LDs2XxSv9l4OCVXRqIOCHLH26gpiG/ZKchSHcvseZMaiIdUYG8BES+rlz1Jpka25DiQyHy+d7KiqJ/GROfKqfTrFKyAn/DHAKr0v1k+lZNIgl7mi2R/hSHAbdIE1N6sdQJjZZapwixo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jog/QEL/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jog/QEL/" Received: by smtp.kernel.org (Postfix) with ESMTPS id 1D870C2BCB8; Tue, 19 May 2026 00:19:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779149989; bh=DmF912XBXWSZg+goW1bFDdFsol5YVx4sSEtgnClKN1k=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=jog/QEL/Mtj6SyktShDjWR+b0urP8hq/27Yb3AHS8Brlshe7wjXWSa3/ycz7WSgiG oykV+LN7ZJhVY0ccWQqHKOaYH7XscLcMuYQ4HG20uf5MuNE2vvIhUHnwXCRwkkgOl8 t2JWwdqIETULelt/1jBzmvaaE8b7CYGztlflYUq0TCgJivFObhj2XqaMrDgNn2X2Tb 
A1LEt0gpYba+72tMtBqRzxieML8D/nr/32TBmbgiXEsh8tl5+uJH/2KS6nII27VjWN CCKYYFLot0sbpVldHGFWjoXP/AzFWT3zMDcPFn/mJKy+4kjuWfgTOdgxwbh5aAqzNt j0wAo1frEXKng== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08AF6CD343F; Tue, 19 May 2026 00:19:49 +0000 (UTC) From: Ackerley Tng via B4 Relay Date: Mon, 18 May 2026 17:19:40 -0700 Subject: [PATCH v3 1/6] mm: hugetlb: Consolidate interpretation of gbl_chg within alloc_hugetlb_folio() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260518-hugetlb-open-up-v3-1-e14b302477f8@google.com> References: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> In-Reply-To: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Zi Yan , Matthew Brost , Rakie Kim , Byungchul Park , Gregory Price , Ying Huang , Alistair Popple , Dan Williams , Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779149988; l=2912; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=AePQ5ge08BDvno3Z2sz00AANV/evPZ/MLFYsGkJ3Hf0=; b=0skywfukH1jzfYFODEad/4ypPJF3H2kngkFerrbzsvccVY7o7T5yItuX2PowuUM77EcPbwDhF MwzZdmsTk/RBAMrVi2w8pIQySnBvDmHLk/qra4e6oRCSiyOsn5aDJ5F X-Developer-Key: i=ackerleytng@google.com; a=ed25519; 
pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU= X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649 X-Original-From: Ackerley Tng Reply-To: ackerleytng@google.com From: Ackerley Tng The dequeue_hugetlb_folio_vma() function currently handles the gbl_chg parameter to determine if a folio can be dequeued based on global page availability. This leaks reservation-specific logic into the dequeueing path. Relocate this logic to alloc_hugetlb_folio() so that dequeue_hugetlb_folio_vma() focuses solely on selecting and dequeuing a folio. In alloc_hugetlb_folio(), only attempt to dequeue a folio if a reservation exists (gbl_chg =3D=3D 0) or if there are available huge pages = in the global pool. No functional change intended. Reviewed-by: James Houghton Acked-by: Oscar Salvador Reviewed-by: Joshua Hahn Signed-off-by: Ackerley Tng --- mm/hugetlb.c | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f24bf49be047e..190ab539a97d4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1336,7 +1336,7 @@ static unsigned long available_huge_pages(struct hsta= te *h) =20 static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma, - unsigned long address, long gbl_chg) + unsigned long address) { struct folio *folio =3D NULL; struct mempolicy *mpol; @@ -1344,13 +1344,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struc= t hstate *h, nodemask_t *nodemask; int nid; =20 - /* - * gbl_chg=3D=3D1 means the allocation requires a new page that was not - * reserved before. Making sure there's at least one free page. 
- */ - if (gbl_chg && !available_huge_pages(h)) - goto err; - gfp_mask =3D htlb_alloc_mask(h); nid =3D huge_node(vma, address, gfp_mask, &mpol, &nodemask); =20 @@ -1368,9 +1361,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct= hstate *h, =20 mpol_cond_put(mpol); return folio; - -err: - return NULL; } =20 #if defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE) && defined(CONFIG_CONTIG_ALLOC) @@ -2939,12 +2929,17 @@ struct folio *alloc_hugetlb_folio(struct vm_area_st= ruct *vma, goto out_uncharge_cgroup_reservation; =20 spin_lock_irq(&hugetlb_lock); + /* - * glb_chg is passed to indicate whether or not a page must be taken - * from the global free pool (global change). gbl_chg =3D=3D 0 indicates - * a reservation exists for the allocation. + * gbl_chg =3D=3D 0 indicates a reservation exists for the + * allocation, so try dequeuing a page. In case there was no + * reservation, try dequeuing a page if there are available + * pages in the global pool. */ - folio =3D dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg); + folio =3D NULL; + if (!gbl_chg || available_huge_pages(h)) + folio =3D dequeue_hugetlb_folio_vma(h, vma, addr); + if (!folio) { spin_unlock_irq(&hugetlb_lock); folio =3D alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr); --=20 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 04:35:09 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 669BF18DB26 for ; Tue, 19 May 2026 00:19:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; cv=none; b=Y54Cv4szyWbZbMXLFl2c6Va14Hq1zUGN/cIRH4SDz4T7F1R6wEimbK201nkOaM7+o18Ygu2L9qM1G6zAxKACGltrInwLyIHu0R1srIjpILVbEH4Pcs7PsnXvSfbpJg8F/EBKG7givTDeTRzSuSljSe0HCXSS9jzIDa+bCnIgMj8= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; c=relaxed/simple; bh=ADZ2XOBOHQDWc7jjdUHTLF/VJ0O9HDOTr3qobMe2dh4=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=J8dNPr2dp57BTg+x86V5LE2TslzZyTV7FNWFW634oE4xa4FoIhl3mKyUQ2DRR8arFlrCl5CalLJM/XCO4vGBB1CxuL1MbC1iwwQm5pLc5AwE6fgwwtkNvht0Yhd2xJiEWWcfEM0r5JNIZSsMbA8pgjl3ZlCTCFu9V7ouAhTqP4c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=GrlmsOvl; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="GrlmsOvl" Received: by smtp.kernel.org (Postfix) with ESMTPS id 31ABFC2BCF7; Tue, 19 May 2026 00:19:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779149989; bh=ADZ2XOBOHQDWc7jjdUHTLF/VJ0O9HDOTr3qobMe2dh4=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=GrlmsOvlcXlX50eqKeBJtRqX8nJkcoAgQHucGYdw8yPaMb6jFucryTNvKAIWLE7HX P+smS5nif/3oBLS+HNNjo6D0GnN2z3K8hShSg+wnIAyUcY/iWthYtnA2xnlCaBOea9 wVR00+ybGumGe83mF27/Iq1Pacmaqx7x7TKFA5fO/BOvpAue6x06D6Tu2tZ60NN1rd 1+okBrB22/7z88sJch4TWl8pnXOFy24MdZD4dDYCb4j24MUFRfPmvkdbcE6r0eFpDf yI6Afg4Wz5THRNF1Y1fuc51KohhszEbwE0PGiEt3WYNj5HLyathudGBSmRMVMgZqj/ d+bs7yA0KGMMA== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1EC8DCD4F4A; Tue, 19 May 2026 00:19:49 +0000 (UTC) From: Ackerley Tng via B4 Relay Date: Mon, 18 May 2026 17:19:41 -0700 Subject: [PATCH v3 2/6] mm: hugetlb: Move mpol interpretation out of alloc_buddy_hugetlb_folio_with_mpol() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: 
<20260518-hugetlb-open-up-v3-2-e14b302477f8@google.com> References: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> In-Reply-To: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Zi Yan , Matthew Brost , Rakie Kim , Byungchul Park , Gregory Price , Ying Huang , Alistair Popple , Dan Williams , Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779149988; l=4657; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=tD7lNPr5Joukm7mZDeaHgQAhJLEw2bXVLa7MLfuBvEM=; b=+LySESN/X13IMDWPIx+4vGv56FS9Ee4/FyVcwQEipOHfVF27nIF/LxESvqaveWWLkE45UaMt/ vbkI70tcrkaCNtpnsPRp7fpNkAFz0egcOdIsdTRnIjSTJbyqCickbv+ X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU= X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649 X-Original-From: Ackerley Tng Reply-To: ackerleytng@google.com From: Ackerley Tng Move memory policy interpretation out of alloc_buddy_hugetlb_folio_with_mpol() and into alloc_hugetlb_folio() to separate reading and interpretation of memory policy from actual allocation. This will later allow memory policy to be interpreted outside of the process of allocating a hugetlb folio entirely. This opens doors for other callers of the HugeTLB folio allocation function, such as guest_memfd, where memory may not always be mapped and hence may not have an associated vma. 
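The split described above can be sketched in userspace as follows. This is only an illustrative model, not kernel code: policy_interpreted, policy_mode, take_from_nodes() and alloc_with_policy() are hypothetical stand-ins, a plain bitmask stands in for nodemask_t, and a per-node counter stands in for the page pools. The caller interprets the policy once and the allocator only consumes the result, including the "preferred set first, then any node" fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 4

/* Hypothetical stand-ins for the policy mode and the interpreted policy. */
enum policy_mode { MODE_BIND, MODE_PREFERRED_MANY };

struct policy_interpreted {
	int nid;                       /* node to try first */
	const unsigned int *nodemask;  /* allowed nodes; NULL means any node */
	enum policy_mode mode;
};

/* Take one page, trying nid first and then the other allowed nodes. */
static int take_from_nodes(int free_pages[NR_NODES], int nid,
			   const unsigned int *nodemask)
{
	assert(nid >= 0 && nid < NR_NODES);

	for (int i = 0; i < NR_NODES; i++) {
		int node = (nid + i) % NR_NODES;
		bool allowed = !nodemask || (*nodemask & (1u << node));

		if (allowed && free_pages[node] > 0) {
			free_pages[node]--;
			return node;
		}
	}
	return -1;
}

/* The allocator never looks the policy up; it only consumes the result. */
static int alloc_with_policy(int free_pages[NR_NODES],
			     const struct policy_interpreted *p)
{
	const unsigned int *nodemask = p->nodemask;
	int node = -1;

	if (p->mode == MODE_PREFERRED_MANY) {
		node = take_from_nodes(free_pages, p->nid, nodemask);
		/* Fall back to all nodes if the preferred set had nothing. */
		nodemask = NULL;
	}

	if (node < 0)
		node = take_from_nodes(free_pages, p->nid, nodemask);

	return node;
}
```

In this model a caller without a vma only needs to fill in the three fields itself, which is the property the series is after.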
Introduce struct mempolicy_interpreted to hold all the components of an interpreted memory policy. Rename alloc_buddy_hugetlb_folio_with_mpol() to alloc_buddy_hugetlb_folio() since the function no longer interprets memory policy. No functional change intended. Reviewed-by: James Houghton Acked-by: Oscar Salvador Signed-off-by: Ackerley Tng --- include/uapi/linux/mempolicy.h | 2 +- mm/hugetlb.c | 50 +++++++++++++++++++++++++++-----------= ---- 2 files changed, 33 insertions(+), 19 deletions(-) diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h index 6c962d866e864..7f6fc9599693b 100644 --- a/include/uapi/linux/mempolicy.h +++ b/include/uapi/linux/mempolicy.h @@ -16,7 +16,7 @@ */ =20 /* Policies */ -enum { +enum mempolicy_mode { MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 190ab539a97d4..6a5f69b3b1cb4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1334,6 +1334,12 @@ static unsigned long available_huge_pages(struct hst= ate *h) return h->free_huge_pages - h->resv_huge_pages; } =20 +struct mempolicy_interpreted { + int nid; + nodemask_t *nodemask; + enum mempolicy_mode mode; +}; + static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address) @@ -2155,32 +2161,28 @@ static struct folio *alloc_migrate_hugetlb_folio(st= ruct hstate *h, gfp_t gfp_mas return folio; } =20 -/* - * Use the VMA's mpolicy to allocate a huge page from the buddy. 
- */ static -struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +struct folio *alloc_buddy_hugetlb_folio(struct hstate *h, + gfp_t gfp_mask, struct mempolicy_interpreted *mpoli) { struct folio *folio =3D NULL; - struct mempolicy *mpol; - gfp_t gfp_mask =3D htlb_alloc_mask(h); - int nid; - nodemask_t *nodemask; + nodemask_t *nodemask =3D mpoli->nodemask; =20 - nid =3D huge_node(vma, addr, gfp_mask, &mpol, &nodemask); - if (mpol_is_preferred_many(mpol)) { + if (mpoli->mode =3D=3D MPOL_PREFERRED_MANY) { gfp_t gfp =3D gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL); =20 - folio =3D alloc_surplus_hugetlb_folio(h, gfp, nid, nodemask); + folio =3D alloc_surplus_hugetlb_folio(h, gfp, mpoli->nid, + nodemask); =20 /* Fallback to all nodes if page=3D=3DNULL */ nodemask =3D NULL; } =20 - if (!folio) - folio =3D alloc_surplus_hugetlb_folio(h, gfp_mask, nid, nodemask); - mpol_cond_put(mpol); + if (!folio) { + folio =3D alloc_surplus_hugetlb_folio(h, gfp_mask, mpoli->nid, + nodemask); + } + return folio; } =20 @@ -2869,7 +2871,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_stru= ct *vma, map_chg_state map_chg; int ret, idx; struct hugetlb_cgroup *h_cg =3D NULL; - gfp_t gfp =3D htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL; + gfp_t gfp =3D htlb_alloc_mask(h); =20 idx =3D hstate_index(h); =20 @@ -2941,8 +2943,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_str= uct *vma, folio =3D dequeue_hugetlb_folio_vma(h, vma, addr); =20 if (!folio) { + struct mempolicy_interpreted mpoli; + struct mempolicy *mpol; + nodemask_t *nodemask; + int nid; + spin_unlock_irq(&hugetlb_lock); - folio =3D alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr); + nid =3D huge_node(vma, addr, gfp, &mpol, &nodemask); + mpoli =3D (struct mempolicy_interpreted){ + .nid =3D nid, + .mode =3D mpol->mode, + .nodemask =3D nodemask, + }; + folio =3D alloc_buddy_hugetlb_folio(h, gfp, &mpoli); + mpol_cond_put(mpol); if (!folio) goto 
out_uncharge_cgroup; spin_lock_irq(&hugetlb_lock); @@ -2998,7 +3012,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_stru= ct *vma, } } =20 - ret =3D mem_cgroup_charge_hugetlb(folio, gfp); + ret =3D mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL); /* * Unconditionally increment NR_HUGETLB here. If it turns out that * mem_cgroup_charge_hugetlb failed, then immediately free the page and --=20 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 04:35:09 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 667FF7262A for ; Tue, 19 May 2026 00:19:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; cv=none; b=SHT8C233IAY+qho6PmOCVFtnqUa/LC4xdqy5oHHcTKFTdRRP++3/195YwCXdBuiM4ULeelLF5Ago410zoERj2xPIAXJJk2D2eIt6ual1k7VNVWe5nZ3qWhr+qRHtUCN39iX5vgvVO2A8l+a+CejKljGHdS6NZaDwOs/CxIApB9E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; c=relaxed/simple; bh=9WgM/k2IJyHrbk7WXf2mgxFfKhGk6TnbvjvPwy9E4wE=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Ux2Bc69apBP8d/+XnwRZXqULaGzbSHFobYQdr0pymPL9Dl+6t/l3Ql0COeJXQ04LoIce20393xzHB8qE4mHDQbsqsgfjLkj3crvLty+6TzehbNSDtVNwWc3n90xcdIY1D2w/pi/8qspaJUqfki7pR7X4ros89mmy/u8nX4G6MA4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KINh+Kv9; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KINh+Kv9" Received: by smtp.kernel.org (Postfix) with ESMTPS id 3F7FFC2BCF5; Tue, 19 May 2026 00:19:49 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779149989; bh=9WgM/k2IJyHrbk7WXf2mgxFfKhGk6TnbvjvPwy9E4wE=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=KINh+Kv93PCIrNV/ijhVMcn5JB1SFUUKhd8WeljcNOBTSC+g/LieipySZ0w163Z8j wQj/tH1UW/DB0ziGqa5y+gq4V6bRTsPaxwNUbV9ziXAcNt3UDoY966E7OIKS6xz3xS Uha5Hi7wkX6nwvmWBM0P7n0IUXAiI9DlQM6sRlhWHr3yDbbAKm1rywVhe3/B69l5bv 2Q4AXcWAizhGCIobJtAkznsa2zS0xP2t60v7icFBtO+GtYPicURclE+PF2BtjJVBw2 Ntd7B7UYgNAy+htvWLUqIvOTIvDA6Idq9ZlAKW70c7v3bYZTRwv0zmFcv36NxrcpvB PPkcT+zcKEdrQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 342B7CD4F57; Tue, 19 May 2026 00:19:49 +0000 (UTC) From: Ackerley Tng via B4 Relay Date: Mon, 18 May 2026 17:19:42 -0700 Subject: [PATCH v3 3/6] mm: hugetlb: Move mpol interpretation out of dequeue_hugetlb_folio_vma() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260518-hugetlb-open-up-v3-3-e14b302477f8@google.com> References: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> In-Reply-To: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Zi Yan , Matthew Brost , Rakie Kim , Byungchul Park , Gregory Price , Ying Huang , Alistair Popple , Dan Williams , Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley 
Tng X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779149988; l=4047; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=XBsBcsD+y0HocOj/hXA27UpdO3jJ7U9sMFebRSppli8=; b=WqYG2G00uuR/tiQzDklNV902phklHnP5SUs2U3AsZetnI5SBKRVwRY7HamLyLMuWvBKuQPHq8 z5+aqi51hc6DJW7cfi+SaJMPylyc07wN+/bpiNTZ0k196Z8OTsYgou+ X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU= X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649 X-Original-From: Ackerley Tng Reply-To: ackerleytng@google.com From: Ackerley Tng Move memory policy interpretation out of dequeue_hugetlb_folio_vma() and into alloc_hugetlb_folio() to separate reading and interpretation of memory policy from actual allocation. Also rename dequeue_hugetlb_folio_vma() to dequeue_hugetlb_folio() to remove the association with vma and to align with alloc_buddy_hugetlb_folio(). This will later allow memory policy to be interpreted outside of the process of allocating a hugetlb folio entirely. This opens doors for other callers of the HugeTLB folio allocation function, such as guest_memfd, where memory may not always be mapped and hence may not have an associated vma. No functional change intended. 
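Hoisting the policy lookup to the caller also moves the reference lifetime there: the lookup takes a reference, and every exit path has to drop it exactly once. A minimal userspace sketch of that invariant, assuming a simple counter in place of the real mempolicy reference (model_policy, model_policy_lookup(), model_policy_put() and model_alloc() are hypothetical names, not kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted policy object. */
struct model_policy {
	int refcount;
};

/* Stand-in for the lookup step: hands out the policy with a reference. */
static struct model_policy *model_policy_lookup(struct model_policy *shared)
{
	shared->refcount++;
	return shared;
}

/* Stand-in for the put step: drops one reference. */
static void model_policy_put(struct model_policy *pol)
{
	assert(pol->refcount > 0);
	pol->refcount--;
}

/*
 * Model of the caller: look the policy up once, try the pool, fall back
 * to the buddy allocator, and drop the reference exactly once per path.
 * Returns 0 on success and -1 if neither source had a page.
 */
static int model_alloc(struct model_policy *shared, bool pool_has_page,
		       bool buddy_has_page)
{
	struct model_policy *pol = model_policy_lookup(shared);

	if (!pool_has_page) {
		if (!buddy_has_page) {
			/* Failure path: put before bailing out. */
			model_policy_put(pol);
			return -1;
		}
	}

	/* Success paths (dequeued or buddy-allocated) share one put. */
	model_policy_put(pol);
	return 0;
}
```

Checking that the counter returns to its starting value after each call is a cheap way to confirm that no path puts twice or forgets to put.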
Signed-off-by: Ackerley Tng Reviewed-by: James Houghton --- mm/hugetlb.c | 57 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 28 insertions(+), 29 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6a5f69b3b1cb4..9807bbe0d70df 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1340,32 +1340,26 @@ struct mempolicy_interpreted { enum mempolicy_mode mode; }; =20 -static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h, - struct vm_area_struct *vma, - unsigned long address) +static struct folio *dequeue_hugetlb_folio(struct hstate *h, gfp_t gfp_mas= k, + struct mempolicy_interpreted *mpoli) { + nodemask_t *nodemask =3D mpoli->nodemask; struct folio *folio =3D NULL; - struct mempolicy *mpol; - gfp_t gfp_mask; - nodemask_t *nodemask; - int nid; =20 - gfp_mask =3D htlb_alloc_mask(h); - nid =3D huge_node(vma, address, gfp_mask, &mpol, &nodemask); - - if (mpol_is_preferred_many(mpol)) { + if (mpoli->mode =3D=3D MPOL_PREFERRED_MANY) { folio =3D dequeue_hugetlb_folio_nodemask(h, gfp_mask, - nid, nodemask); + mpoli->nid, + nodemask); =20 /* Fallback to all nodes if page=3D=3DNULL */ nodemask =3D NULL; } =20 - if (!folio) + if (!folio) { folio =3D dequeue_hugetlb_folio_nodemask(h, gfp_mask, - nid, nodemask); - - mpol_cond_put(mpol); + mpoli->nid, + nodemask); + } return folio; } =20 @@ -2871,7 +2865,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_str= uct *vma, map_chg_state map_chg; int ret, idx; struct hugetlb_cgroup *h_cg =3D NULL; + struct mempolicy_interpreted mpoli; gfp_t gfp =3D htlb_alloc_mask(h); + struct mempolicy *mpol; + nodemask_t *nodemask; + int nid; =20 idx =3D hstate_index(h); =20 @@ -2930,6 +2928,14 @@ struct folio *alloc_hugetlb_folio(struct vm_area_str= uct *vma, if (ret) goto out_uncharge_cgroup_reservation; =20 + /* Takes reference on mpol. 
*/ + nid =3D huge_node(vma, addr, gfp, &mpol, &nodemask); + mpoli =3D (struct mempolicy_interpreted){ + .nid =3D nid, + .mode =3D mpol->mode, + .nodemask =3D nodemask, + }; + spin_lock_irq(&hugetlb_lock); =20 /* @@ -2940,31 +2946,24 @@ struct folio *alloc_hugetlb_folio(struct vm_area_st= ruct *vma, */ folio =3D NULL; if (!gbl_chg || available_huge_pages(h)) - folio =3D dequeue_hugetlb_folio_vma(h, vma, addr); + folio =3D dequeue_hugetlb_folio(h, gfp, &mpoli); =20 if (!folio) { - struct mempolicy_interpreted mpoli; - struct mempolicy *mpol; - nodemask_t *nodemask; - int nid; - spin_unlock_irq(&hugetlb_lock); - nid =3D huge_node(vma, addr, gfp, &mpol, &nodemask); - mpoli =3D (struct mempolicy_interpreted){ - .nid =3D nid, - .mode =3D mpol->mode, - .nodemask =3D nodemask, - }; folio =3D alloc_buddy_hugetlb_folio(h, gfp, &mpoli); mpol_cond_put(mpol); - if (!folio) + if (!folio) { + mpol_cond_put(mpol); goto out_uncharge_cgroup; + } spin_lock_irq(&hugetlb_lock); list_add(&folio->lru, &h->hugepage_activelist); folio_ref_unfreeze(folio, 1); /* Fall through */ } =20 + mpol_cond_put(mpol); + /* * Either dequeued or buddy-allocated folio needs to add special * mark to the folio when it consumes a global reservation. 
--=20 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 04:35:09 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 807E3196C7C for ; Tue, 19 May 2026 00:19:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; cv=none; b=SHst6Mf/irAaFX0paQNUv2FTY61Raa9O+QC3RGEE2QwRsKJr7KaFp7wfAX/zxR4bqU9Gais6Jr/0J6+GgvhbF/yqQfOIJsmVhlltKXR8tv/CdyAK0Rbn7qdG5CNA6BYBPKoKR/tU5LpE5PAd6jNZqYqN1jNGBRC1HgFkO8eTXJE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; c=relaxed/simple; bh=4pRcmdoszQb1kW+RAvRDYtLTa/5NT++4GEUmZxA+3co=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=J4LHprY9FEu3MQ6rhqvRfbTyRzB6FX4Itb9UcOd6t3DMGq5y4QhRHLSZHyxZQcXiJjxL6EgW3FBSKQrRVwCX8icRZCellvwkiAysJL9+0ojEsJAYJmtmgz+SAGXmekiwQ3tFCkPw+8RGLZ2Fq905/kRoPjsJzccdIsmHiUA6SpM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=a6C6ODg2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="a6C6ODg2" Received: by smtp.kernel.org (Postfix) with ESMTPS id 598BAC2BCFB; Tue, 19 May 2026 00:19:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779149989; bh=4pRcmdoszQb1kW+RAvRDYtLTa/5NT++4GEUmZxA+3co=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=a6C6ODg25asan6i8laEWhVV95H1HYC5NafWQrytlq4++06Nf+W3LzmyvwxuCKTTxX uI0Nt874Zxp0C3Vhswe9ewi39wDzTPzhQnZGLp6/bulUMoLVPa3pJ4/4NFf8xxMxjU saw6LbcaDsi9yt7DjqJ3Fpab2/41VskhWHxMRe4WOJOhrwMmGn0BEWc/vYZNWLOuLy 
7tWtw8tnWDw0sU3lTWpSJenxsWjc1LrcqaUzrUvuiF6DyMJnU5THOJ3bx2SNe/w2SA AfpfKGKWI0+OdWeToOvOtA/mm7jJMXOBUo8oEEb/0Jjov9YhhOnSMn/u5zv2cWkyD3 9jgYyg7SWPC5w== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C132CD4F58; Tue, 19 May 2026 00:19:49 +0000 (UTC) From: Ackerley Tng via B4 Relay Date: Mon, 18 May 2026 17:19:43 -0700 Subject: [PATCH v3 4/6] mm: hugetlb: Use error variable in alloc_hugetlb_folio Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260518-hugetlb-open-up-v3-4-e14b302477f8@google.com> References: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> In-Reply-To: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Zi Yan , Matthew Brost , Rakie Kim , Byungchul Park , Gregory Price , Ying Huang , Alistair Popple , Dan Williams , Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779149988; l=2132; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=0XVqHZOzRz+o5ty9A92C0Mk3eHhDCZNGyrqA29XGPQE=; b=aFCnv6H1zaN1sS3mad+QbIK71b5juUR1G+mlB4WyY2Y3mw9BwW1DP63mKit/pyTTPVbAf0h5d +889JUL5xDMDkN8PQ3bi9eyZygo2V5kuVz1Cd97ELKEDT2+VXxoiWEB X-Developer-Key: i=ackerleytng@google.com; a=ed25519; 
pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU= X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649 X-Original-From: Ackerley Tng Reply-To: ackerleytng@google.com From: Ackerley Tng Refactor alloc_hugetlb_folio to use a local variable for returning error codes. Instead of returning ERR_PTR(-ENOSPC) at the end of the error path, assign -ENOSPC to a return variable at each failure point and return that variable at the end. This allows the cleanup goto targets to be used with other errors in a later patch. No functional change intended. Signed-off-by: Ackerley Tng --- mm/hugetlb.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9807bbe0d70df..ad07e72d6fac3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2903,8 +2903,10 @@ struct folio *alloc_hugetlb_folio(struct vm_area_str= uct *vma, */ if (map_chg) { gbl_chg =3D hugepage_subpool_get_pages(spool, 1); - if (gbl_chg < 0) + if (gbl_chg < 0) { + ret =3D -ENOSPC; goto out_end_reservation; + } } else { /* * If we have the vma reservation ready, no need for extra @@ -2920,13 +2922,17 @@ struct folio *alloc_hugetlb_folio(struct vm_area_st= ruct *vma, if (map_chg) { ret =3D hugetlb_cgroup_charge_cgroup_rsvd( idx, pages_per_huge_page(h), &h_cg); - if (ret) + if (ret) { + ret =3D -ENOSPC; goto out_subpool_put; + } } =20 ret =3D hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); - if (ret) + if (ret) { + ret =3D -ENOSPC; goto out_uncharge_cgroup_reservation; + } =20 /* Takes reference on mpol. 
*/ nid =3D huge_node(vma, addr, gfp, &mpol, &nodemask); @@ -2954,6 +2960,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_stru= ct *vma, mpol_cond_put(mpol); if (!folio) { mpol_cond_put(mpol); + ret =3D -ENOSPC; goto out_uncharge_cgroup; } spin_lock_irq(&hugetlb_lock); @@ -3046,7 +3053,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_stru= ct *vma, out_end_reservation: if (map_chg !=3D MAP_CHG_ENFORCED) vma_end_reservation(h, vma, addr); - return ERR_PTR(-ENOSPC); + return ERR_PTR(ret); } =20 static __init void *alloc_bootmem(struct hstate *h, int nid, bool node_exa= ct) --=20 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 04:35:09 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C65D1DA0E1 for ; Tue, 19 May 2026 00:19:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; cv=none; b=LhZYhqNxHxyoyP0DjTfImPDJAYdkXlPx6fxgdmEhED86HO+kUoHMSWJYpvvcL+Xwy7yrlJ/ge+TR5pZHy8KDqzjrO4hH3dz6zUqDVTqVvolAym4gQg/6wwaPvNaG96I5tg+C42l6ZKAzIQeXf9gbNVAbocxSyV72taJdj1gPvn8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; c=relaxed/simple; bh=3zXO270ajr1tnjZV8A/bAEEs7bzJ1z/L+UKQw43NjcA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=RJLHk9UTEr7jya4LfjT1oByf+A/lnpt3ltbAs1Ub/OpYJ8lXTFCwJ1pa5mITpbkVdxec69OBdAHrM5POq9HyOozdouLEQ3AiUY8DyYVgvTJLhbUKB0CCupmH5ft0FuVx9y1bm3qDmYHD9NtSqZpqxenAqmuBwI0qtz8DzlOxtO0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=J9DRXfHT; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit 
key) header.d=kernel.org header.i=@kernel.org header.b="J9DRXfHT" Received: by smtp.kernel.org (Postfix) with ESMTPS id 73B2FC2BCC7; Tue, 19 May 2026 00:19:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779149989; bh=3zXO270ajr1tnjZV8A/bAEEs7bzJ1z/L+UKQw43NjcA=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=J9DRXfHTV1x2wT9FDO37latHI4bpt8gKaGiG5kfWObrWNPOZ1PwiIDrOaEJd+qvUc fX8hs18VuXivZ952QtRTEqLm9GDQOebDPvvpE9leVmnrASt8EZc/cXz6xShxEgVU0c b4smGrhdd51TB8f7BsoVCY4yPNH0LmIU3+U7u4X+CnHBCK/Bx6W2HZhu6J0m+pIZLN iHZhcVrD/U0y0EoS9jwt1VwH06vAlMgb9QNeUtqc86x6/5dUBPPKicK9yb/AdxLiPY eVFi+X0T0rI6PwlZjyYq5+elbPAY9VpQNo7nI2LWfrTevOu3tHlBQceHIEdmUp/aY8 JXySAESSXTJgQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 651A9CD4F3C; Tue, 19 May 2026 00:19:49 +0000 (UTC) From: Ackerley Tng via B4 Relay Date: Mon, 18 May 2026 17:19:44 -0700 Subject: [PATCH v3 5/6] mm: hugetlb: Move mem_cgroup_charge_hugetlb() earlier in allocation Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260518-hugetlb-open-up-v3-5-e14b302477f8@google.com> References: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> In-Reply-To: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Zi Yan , Matthew Brost , Rakie 
Kim , Byungchul Park , Gregory Price , Ying Huang , Alistair Popple , Dan Williams , Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779149988; l=2460; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=D0/c/tPxzDKBQ5LqjBs64oNrn/JHwmJjr3L/Nq/FGGs=; b=5rbwFVt3aoGkbaTw+PHbELOYhBY/NhAnIaG1JfuEYmejI73V/Ie6kUAohYd18a9Q8wrlAb3d0 25QVtOzNcpCBbwDanBDndQxjoPXO5fAgCNnuuhzu8IADnbhR2zQU0vp X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU= X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649 X-Original-From: Ackerley Tng Reply-To: ackerleytng@google.com From: Ackerley Tng Move mem_cgroup_charge_hugetlb() earlier in the folio allocation process. This change draws a cleaner line between memcg charging and the subsequent hugetlb-specific reservation logic for VMAs and subpools. While it would be ideal to make all accounting and reservations perfectly symmetric, mem_cgroup_charge_hugetlb() is a complex operation that cannot be performed under the hugetlb_lock. Moving the charge to this earlier point ensures that memcg charging is handled before the code begins manipulating subpool and VMA-specific state. These two types of accounting will be separated in a future patch. If mem_cgroup_charge_hugetlb() fails, the code now branches to out_subpool_put to ensure the folio is freed and the subpool references are handled correctly. 
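For illustration only, the ordering described above can be modelled in user space. This is a minimal sketch, not kernel code: struct model_state, model_memcg_charge() and model_alloc() are hypothetical stand-ins, and the only behaviour taken from this patch is that the memcg charge happens before any subpool state is touched and that a failed charge frees the folio and unwinds through out_subpool_put.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping that stands in for folio and subpool state. */
struct model_state {
	bool folio_live;	/* a folio is currently allocated */
	bool subpool_set;	/* subpool was attached to the folio */
	int subpool_puts;	/* times the out_subpool_put path ran */
};

/* Hypothetical stand-in for mem_cgroup_charge_hugetlb(). */
int model_memcg_charge(bool should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Models the order of operations after hugetlb_lock is dropped. */
int model_alloc(struct model_state *s, bool memcg_fails)
{
	int ret;

	s->folio_live = true;		/* folio dequeued or buddy-allocated */

	/* Charge memcg first, before any subpool or VMA state changes. */
	ret = model_memcg_charge(memcg_fails);
	if (ret == -ENOMEM) {
		s->folio_live = false;	/* models free_huge_folio() */
		goto out_subpool_put;
	}

	s->subpool_set = true;		/* models hugetlb_set_folio_subpool() */
	return 0;

out_subpool_put:
	s->subpool_puts++;		/* models the subpool unwinding */
	return ret;
}
```

In this model, a failed charge leaves subpool_set false, which is the separation the reordering is meant to provide.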
Signed-off-by: Ackerley Tng --- mm/hugetlb.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ad07e72d6fac3..81e73186dff09 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2991,6 +2991,24 @@ struct folio *alloc_hugetlb_folio(struct vm_area_str= uct *vma, =20 spin_unlock_irq(&hugetlb_lock); =20 + ret =3D mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL); + /* + * Unconditionally increment NR_HUGETLB here. If it turns out that + * mem_cgroup_charge_hugetlb failed, then immediately free the page and + * decrement NR_HUGETLB. + */ + lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h)); + + if (ret =3D=3D -ENOMEM) { + free_huge_folio(folio); + /* + * Skip uncharging hugetlb_cgroup since the charges + * were committed to the folio and freeing the folio + * would have cleared those up. + */ + goto out_subpool_put; + } + hugetlb_set_folio_subpool(folio, spool); =20 if (map_chg !=3D MAP_CHG_ENFORCED) { @@ -3018,19 +3036,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_str= uct *vma, } } =20 - ret =3D mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL); - /* - * Unconditionally increment NR_HUGETLB here. If it turns out that - * mem_cgroup_charge_hugetlb failed, then immediately free the page and - * decrement NR_HUGETLB. 
- */ - lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h)); - - if (ret =3D=3D -ENOMEM) { - free_huge_folio(folio); - return ERR_PTR(-ENOMEM); - } - return folio; =20 out_uncharge_cgroup: --=20 2.54.0.563.g4f69b47b94-goog From nobody Mon May 25 04:35:09 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A521A1EB19B for ; Tue, 19 May 2026 00:19:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; cv=none; b=U3xWALAe+yh1irHAEpouOiqarTjEiET5ARDp/Qnv6EGlWnz9PtLvgyPgqZMu3JWpwbwnlEB6FUZwPsmnlIk9V/wXHVXAhowmS3vXoWWiE8J/LPza4QxApk5dDBNYPDX+lliJzV5AenF63R+kJA/LaB1ry68aFRfsxA7rVgsk8rI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779149989; c=relaxed/simple; bh=Hzt/De4mOk+Mrf1ClqEzbjx7/3o5fPwlCY7Yl8WAxDo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=hjbTPKnQwQS6Q8BlX7EzqkDz3SIpAFRTgGa8x5V/rZXtXEGFdKci8ULYALsYpW5CWjbhwfIKP8jRp+qlu7QUZHNjw9xmAL68b+9LhiCAbH52Jx6BZn3rUxp6/YvkQZP4uXUtUndQh7UmE/8lUm9U/r/J03Kjz+4p10zCIEhMDFA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fP+rcvHj; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fP+rcvHj" Received: by smtp.kernel.org (Postfix) with ESMTPS id 86E48C2BCB7; Tue, 19 May 2026 00:19:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779149989; bh=Hzt/De4mOk+Mrf1ClqEzbjx7/3o5fPwlCY7Yl8WAxDo=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; 
b=fP+rcvHjoZRdF3Es0PDdaC8inQibLP81EnJeouUhnH0sNoDR2R5iV/YDhxIVzMCJg rDVz4sWtRyHnbaGNA+iozbhkAlHLLHyH9DR+d52pcqx6f8oy225qebN50jDA1ayC+x lAB0zpd1/25JPHiic18kb4de5VqJj2d/8ZVd2iGTmcBCO7AAffDiX/qtLdJKMtIyie QEFPHkmb9xUbvW9LQ8iV9Hopb9TWdwnvHWgvZGK1dcjwIBjY6hCa89EMmbPlUTcYJC XgrCVknyiy8fhw8Y6GuxK93APqSnSTwJSMJsUQZj9KxDqiMqtrYLCz/MMH9Be/SOA6 PEhedo1UJnFkA== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7AFF0CD343F; Tue, 19 May 2026 00:19:49 +0000 (UTC) From: Ackerley Tng via B4 Relay Date: Mon, 18 May 2026 17:19:45 -0700 Subject: [PATCH v3 6/6] mm: hugetlb: Refactor out hugetlb_alloc_folio() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260518-hugetlb-open-up-v3-6-e14b302477f8@google.com> References: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> In-Reply-To: <20260518-hugetlb-open-up-v3-0-e14b302477f8@google.com> To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Zi Yan , Matthew Brost , Rakie Kim , Byungchul Park , Gregory Price , Ying Huang , Alistair Popple , Dan Williams , Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779149988; l=10377; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=Rm+Th8lyn4N0VMVm3x2XIXppuej/2vu+OQYaa8lCewA=; 
b=y6o3OBX386APgcc0m2g7ZJsWrHXUNu6KGXPM0gA4cXK8pSE2pB9R+d+OoAfGl2fStV8hvVFTI nI3r4KVxx/mDHvHL5ItCeqgiwEdJpbJJJwg6p6VZpBHT3nLu5QeiwDv X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU= X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649 X-Original-From: Ackerley Tng Reply-To: ackerleytng@google.com From: Ackerley Tng Refactor hugetlb_alloc_folio() out of alloc_hugetlb_folio(). The new function handles allocation of a folio together with memcg and HugeTLB cgroup charging. This refactoring decouples HugeTLB page allocation from VMAs, which allocation currently depends on in three ways: 1. Reservations (as in resv_map) are stored in the vma. 2. mpol is stored at vma->vm_policy. 3. A vma must be used for allocation even if the pages are not meant to be used by a host process. Without this coupling, VMAs are no longer a requirement for allocation. This opens up the allocation routine for use without VMAs, which will allow guest_memfd to use HugeTLB as a more generic allocator of huge pages, since guest_memfd memory may not have any associated VMAs by design. In addition, direct allocations from HugeTLB could possibly be refactored to avoid the use of a pseudo-VMA. This also decouples HugeTLB page allocation from HugeTLBfs, where the subpool is stored at the fs mount. That is likewise a requirement for guest_memfd, where the plan is to have a subpool created per-fd and stored on the inode. Provide and use alloc_flags to allow more allocation knobs in the future without expanding the number of parameters in hugetlb_alloc_folio(). No functional change intended.
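As a rough illustration of how alloc_flags is meant to be derived by the VMA-based caller, here is a user-space sketch. It is not the kernel code: the ALLOC_* macro names, the local BIT() definition, derive_alloc_flags() and the numeric values of the map_chg_state enumerators are assumptions made for this example. Only the two conditions (map_chg != MAP_CHG_REUSE and gbl_chg == 0) are taken from the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n) (1U << (n))

/* Bit positions follow the enum in this patch; the names are local. */
enum alloc_flag_bit {
	ALLOC_CHARGE_CGROUP_RSVD_BIT = 0,
	ALLOC_USE_GLOBAL_RESERVATIONS_BIT,
};

#define ALLOC_CHARGE_CGROUP_RSVD	BIT(ALLOC_CHARGE_CGROUP_RSVD_BIT)
#define ALLOC_USE_GLOBAL_RESERVATIONS	BIT(ALLOC_USE_GLOBAL_RESERVATIONS_BIT)

/* Stand-in for map_chg_state; the numeric values are assumed here. */
typedef enum {
	MAP_CHG_REUSE = 0,
	MAP_CHG_NEEDED = 1,
	MAP_CHG_ENFORCED = 2,
} map_chg_state;

/* Translate the VMA-level reservation state into allocation flags. */
uint8_t derive_alloc_flags(map_chg_state map_chg, long gbl_chg)
{
	uint8_t flags = 0;

	/* No resv_map entry is reused, so the reservation is charged. */
	if (map_chg != MAP_CHG_REUSE)
		flags |= ALLOC_CHARGE_CGROUP_RSVD;

	/* gbl_chg == 0 means a global reservation exists for this page. */
	if (gbl_chg == 0)
		flags |= ALLOC_USE_GLOBAL_RESERVATIONS;

	return flags;
}
```

A caller without a VMA would presumably build the same u8 from its own reservation bookkeeping, which is why the flags are passed in rather than computed inside hugetlb_alloc_folio().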
Signed-off-by: Ackerley Tng --- include/linux/hugetlb.h | 19 +++++ mm/hugetlb.c | 188 +++++++++++++++++++++++++-------------------= ---- 2 files changed, 117 insertions(+), 90 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 93418625d3c5f..9a0222851573d 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -2,6 +2,7 @@ #ifndef _LINUX_HUGETLB_H #define _LINUX_HUGETLB_H =20 +#include #include #include #include @@ -705,6 +706,24 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct = huge_bootmem_page *m); int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *= list); int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long en= d_pfn); void wait_for_freed_hugetlb_folios(void); + +struct mempolicy_interpreted { + int nid; + nodemask_t *nodemask; + enum mempolicy_mode mode; +}; + +enum hugetlb_alloc_flag { + HUGETLB_ALLOC_CHARGE_CGROUP_RSVD_BIT =3D 0, + HUGETLB_ALLOC_USE_GLOBAL_RESERVATIONS_BIT, +}; + +#define HUGETLB_ALLOC_CHARG_CGROUP_RSVD BIT(HUGETLB_ALLOC_CHARGE_CGROUP_RS= VD_BIT) +#define HUGETLB_ALLOC_USE_GLOBAL_RESERVATIONS BIT(HUGETLB_ALLOC_USE_GLOBAL= _RESERVATIONS_BIT) + +struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpoo= l *spool, + gfp_t gfp, struct mempolicy_interpreted *mpoli, + u8 alloc_flags); struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, unsigned long addr, bool cow_from_owner); struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred= _nid, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 81e73186dff09..abce2ca76fb9c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1334,12 +1334,6 @@ static unsigned long available_huge_pages(struct hst= ate *h) return h->free_huge_pages - h->resv_huge_pages; } =20 -struct mempolicy_interpreted { - int nid; - nodemask_t *nodemask; - enum mempolicy_mode mode; -}; - static struct folio *dequeue_hugetlb_folio(struct hstate *h, gfp_t gfp_mas= k, struct mempolicy_interpreted *mpoli) 
{ @@ -2829,6 +2823,90 @@ void wait_for_freed_hugetlb_folios(void) flush_work(&free_hpage_work); } =20 +struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpoo= l *spool, + gfp_t gfp, struct mempolicy_interpreted *mpoli, + u8 alloc_flags) +{ + bool charge_hugetlb_cgroup_rsvd =3D alloc_flags & + HUGETLB_ALLOC_CHARG_CGROUP_RSVD; + bool use_global_reservation =3D alloc_flags & + HUGETLB_ALLOC_USE_GLOBAL_RESERVATIONS; + size_t nr_pages =3D pages_per_huge_page(h); + struct hugetlb_cgroup *h_cg =3D NULL; + int idx =3D hstate_index(h); + struct folio *folio; + int ret; + + if (charge_hugetlb_cgroup_rsvd && + hugetlb_cgroup_charge_cgroup_rsvd(idx, nr_pages, &h_cg)) + return ERR_PTR(-ENOSPC); + + if (hugetlb_cgroup_charge_cgroup(idx, nr_pages, &h_cg)) { + ret =3D -ENOSPC; + goto err_uncharge_hugetlb_cgroup_rsvd; + } + + spin_lock_irq(&hugetlb_lock); + + folio =3D NULL; + if (use_global_reservation || available_huge_pages(h)) + folio =3D dequeue_hugetlb_folio(h, gfp, mpoli); + + if (!folio) { + spin_unlock_irq(&hugetlb_lock); + folio =3D alloc_buddy_hugetlb_folio(h, gfp, mpoli); + if (!folio) { + ret =3D -ENOSPC; + goto err_uncharge_hugetlb_cgroup; + } + spin_lock_irq(&hugetlb_lock); + list_add(&folio->lru, &h->hugepage_activelist); + folio_ref_unfreeze(folio, 1); + } + + if (use_global_reservation) { + folio_set_hugetlb_restore_reserve(folio); + h->resv_huge_pages--; + } + + hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio); + + if (charge_hugetlb_cgroup_rsvd) { + hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h), + h_cg, folio); + } + + spin_unlock_irq(&hugetlb_lock); + + ret =3D mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL); + /* + * Unconditionally increment NR_HUGETLB here because if + * mem_cgroup_charge_hugetlb failed, freeing the page will + * decrement NR_HUGETLB. 
+ */ + lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h)); + + if (ret =3D=3D -ENOMEM) { + free_huge_folio(folio); + /* + * Skip uncharging hugetlb_cgroup since the charges + * were committed to the folio and freeing the folio + * would have cleared those up. + */ + return ERR_PTR(ret); + } + + return folio; + + err_uncharge_hugetlb_cgroup: + hugetlb_cgroup_uncharge_cgroup(idx, nr_pages, h_cg); + err_uncharge_hugetlb_cgroup_rsvd: + if (charge_hugetlb_cgroup_rsvd) + hugetlb_cgroup_uncharge_cgroup_rsvd(idx, nr_pages, h_cg); + + return ERR_PTR(ret); +} + typedef enum { /* * For either 0/1: we checked the per-vma resv map, and one resv @@ -2864,11 +2942,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_st= ruct *vma, long retval, gbl_chg, gbl_reserve; map_chg_state map_chg; int ret, idx; - struct hugetlb_cgroup *h_cg =3D NULL; struct mempolicy_interpreted mpoli; gfp_t gfp =3D htlb_alloc_mask(h); struct mempolicy *mpol; nodemask_t *nodemask; + u8 alloc_flags =3D 0; int nid; =20 idx =3D hstate_index(h); @@ -2916,23 +2994,18 @@ struct folio *alloc_hugetlb_folio(struct vm_area_st= ruct *vma, } =20 /* - * If this allocation is not consuming a per-vma reservation, - * charge the hugetlb cgroup now. + * If allocation doesn't reuse a reservation in the resv_map, + * charge for the reservation. */ - if (map_chg) { - ret =3D hugetlb_cgroup_charge_cgroup_rsvd( - idx, pages_per_huge_page(h), &h_cg); - if (ret) { - ret =3D -ENOSPC; - goto out_subpool_put; - } - } + if (map_chg !=3D MAP_CHG_REUSE) + alloc_flags |=3D HUGETLB_ALLOC_CHARG_CGROUP_RSVD; =20 - ret =3D hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); - if (ret) { - ret =3D -ENOSPC; - goto out_uncharge_cgroup_reservation; - } + /* + * gbl_chg =3D=3D 0 indicates a reservation exists for this + * allocation, so try to use it. + */ + if (gbl_chg =3D=3D 0) + alloc_flags |=3D HUGETLB_ALLOC_USE_GLOBAL_RESERVATIONS; =20 /* Takes reference on mpol. 
*/ nid =3D huge_node(vma, addr, gfp, &mpol, &nodemask); @@ -2942,70 +3015,12 @@ struct folio *alloc_hugetlb_folio(struct vm_area_st= ruct *vma, .nodemask =3D nodemask, }; =20 - spin_lock_irq(&hugetlb_lock); - - /* - * gbl_chg =3D=3D 0 indicates a reservation exists for the - * allocation, so try dequeuing a page. In case there was no - * reservation, try dequeuing a page if there are available - * pages in the global pool. - */ - folio =3D NULL; - if (!gbl_chg || available_huge_pages(h)) - folio =3D dequeue_hugetlb_folio(h, gfp, &mpoli); - - if (!folio) { - spin_unlock_irq(&hugetlb_lock); - folio =3D alloc_buddy_hugetlb_folio(h, gfp, &mpoli); - mpol_cond_put(mpol); - if (!folio) { - mpol_cond_put(mpol); - ret =3D -ENOSPC; - goto out_uncharge_cgroup; - } - spin_lock_irq(&hugetlb_lock); - list_add(&folio->lru, &h->hugepage_activelist); - folio_ref_unfreeze(folio, 1); - /* Fall through */ - } + folio =3D hugetlb_alloc_folio(h, spool, gfp, &mpoli, alloc_flags); =20 mpol_cond_put(mpol); =20 - /* - * Either dequeued or buddy-allocated folio needs to add special - * mark to the folio when it consumes a global reservation. - */ - if (!gbl_chg) { - folio_set_hugetlb_restore_reserve(folio); - h->resv_huge_pages--; - } - - hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio); - /* If allocation is not consuming a reservation, also store the - * hugetlb_cgroup pointer on the page. - */ - if (map_chg) { - hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h), - h_cg, folio); - } - - spin_unlock_irq(&hugetlb_lock); - - ret =3D mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL); - /* - * Unconditionally increment NR_HUGETLB here. If it turns out that - * mem_cgroup_charge_hugetlb failed, then immediately free the page and - * decrement NR_HUGETLB. 
- */ - lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h)); - - if (ret =3D=3D -ENOMEM) { - free_huge_folio(folio); - /* - * Skip uncharging hugetlb_cgroup since the charges - * were committed to the folio and freeing the folio - * would have cleared those up. - */ + if (IS_ERR(folio)) { + ret =3D PTR_ERR(folio); goto out_subpool_put; } =20 @@ -3038,12 +3053,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_str= uct *vma, =20 return folio; =20 -out_uncharge_cgroup: - hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg); -out_uncharge_cgroup_reservation: - if (map_chg) - hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h), - h_cg); out_subpool_put: /* * put page to subpool iff the quota of subpool's rsv_hpages is used @@ -3054,7 +3063,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_stru= ct *vma, hugetlb_acct_memory(h, -gbl_reserve); } =20 - out_end_reservation: if (map_chg !=3D MAP_CHG_ENFORCED) vma_end_reservation(h, vma, addr); --=20 2.54.0.563.g4f69b47b94-goog