From nobody Sat Jun 13 14:52:17 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CC56329C35A for ; Wed, 6 May 2026 15:54:47 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082887; cv=none; b=p6dDk9EtmiOjgDktAajfmRkshBjdb29yqmzyFnAo8Ojt3oPpemcweq6+HIKtq531aczPE9ZO2ZAgR8ucVrAW/XQKYsGhR8peJ/xflnSyF3RVUiwicd5QgMiiXDuZeEuLS8GKsD0Va5yN5GAsMC0KTXy+ZokvN5tI915u28aK/9s=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082887; c=relaxed/simple; bh=aXXAbdhropEO/MvYuH57KnmYvw/h15PGDWvnxkuSvMk=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=vFlFWJvuMpCADA2eqDR9ibFLi8xJOaPfIjl0g6Ydq5GBeInqCxVwvHQnjzPHvyhofljrJtajt7KKcWShOwHUha5QNTeDTyi8zwTDyRT412B1+fVc/w7Qahaz5lmSW0JORiYpzLnwByOT4UPRVG9A37osHIotLxDGEBpdPRHXbAU=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=heM//5zU; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="heM//5zU"
Received: by smtp.kernel.org (Postfix) with ESMTPS id 54E61C2BCB8; Wed, 6 May 2026 15:54:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778082887; bh=aXXAbdhropEO/MvYuH57KnmYvw/h15PGDWvnxkuSvMk=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=heM//5zU8WXXyBRnidKaEUVgFJrHBQsWaVz7Ri1cBb20IY0hk9IRWEQEbzQ3qm132 kBTV8LbGbmlwck4uvVik1emkYmeTHTtWSuUlSI1KV7i9HntOilcSf7cP+LxyKvR3FE qq1EX/vmoN/xnFGZ9ONdTmUdf5HXyioY8OJNRSt7OvZSqoxtPHqxuP2dqjGEI8NqsI
 RIS20NWOnefRftD7nVcg9mVby1wvlbJ0PWIDG96Dzpg+04RvPGSMBvy/VTIVKpI2HQ b8n2HH/goWp4vsK5PbaZXAzM3/u9GYef10qU24Yar0IVno8QC1XbkBrcXcqFyUIzWI 8ge6HntrRB2WA==
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4365ACD3442; Wed, 6 May 2026 15:54:47 +0000 (UTC)
From: Ackerley Tng via B4 Relay
Date: Wed, 06 May 2026 08:54:37 -0700
Subject: [PATCH v2 1/6] mm: hugetlb: Consolidate interpretation of gbl_chg within alloc_hugetlb_folio()
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260506-hugetlb-open-up-v2-1-826a0c5f28fc@google.com>
References: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
In-Reply-To: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Dan Williams , Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng
X-Mailer: b4 0.14.3
X-Developer-Signature: v=1; a=ed25519-sha256; t=1778082886; l=2756; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=d07xjMz226JjEc/S2RGQulaBk1NX7KLyZoqjiwrAxHY=; b=+/SUHAmWoMFGdn/H4OjwEcyODzKvoeTyEnJ4IzRXpHEDn9Nb3OZ7OR2lii3cgkG937pidxwl/ emDZjH9Y099DWpGjfBTGt96dGIVzbBczTi8J9uhVKuMPxthX4fi1lSL
X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU=
X-Endpoint-Received: by B4 Relay for
 ackerleytng@google.com/20260225 with auth_id=649
X-Original-From: Ackerley Tng
Reply-To: ackerleytng@google.com

From: Ackerley Tng

The dequeue_hugetlb_folio_vma() function currently handles the gbl_chg
parameter to determine if a folio can be dequeued based on global page
availability. This leaks reservation-specific logic into the dequeuing
path.

Relocate this logic to alloc_hugetlb_folio() so that
dequeue_hugetlb_folio_vma() focuses solely on selecting and dequeuing a
folio.

In alloc_hugetlb_folio(), only attempt to dequeue a folio if a
reservation exists (gbl_chg == 0) or if there are available huge pages
in the global pool.

No functional change intended.

Signed-off-by: Ackerley Tng
Reviewed-by: James Houghton
Acked-by: Oscar Salvador
---
 mm/hugetlb.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f24bf49be047e..8be246b4e6134 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1336,7 +1336,7 @@ static unsigned long available_huge_pages(struct hstate *h)
 
 static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 				struct vm_area_struct *vma,
-				unsigned long address, long gbl_chg)
+				unsigned long address)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
@@ -1344,13 +1344,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	nodemask_t *nodemask;
 	int nid;
 
-	/*
-	 * gbl_chg==1 means the allocation requires a new page that was not
-	 * reserved before. Making sure there's at least one free page.
-	 */
-	if (gbl_chg && !available_huge_pages(h))
-		goto err;
-
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
 
@@ -1368,9 +1361,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 
 	mpol_cond_put(mpol);
 	return folio;
-
-err:
-	return NULL;
 }
 
 #if defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE) && defined(CONFIG_CONTIG_ALLOC)
@@ -2939,12 +2929,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		goto out_uncharge_cgroup_reservation;
 
 	spin_lock_irq(&hugetlb_lock);
+
 	/*
-	 * glb_chg is passed to indicate whether or not a page must be taken
-	 * from the global free pool (global change). gbl_chg == 0 indicates
-	 * a reservation exists for the allocation.
+	 * gbl_chg == 0 indicates a reservation exists for the allocation - so
+	 * try dequeuing a page. If there are available_huge_pages(), try using
+	 * them!
 	 */
-	folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
+	folio = NULL;
+	if (!gbl_chg || available_huge_pages(h))
+		folio = dequeue_hugetlb_folio_vma(h, vma, addr);
+
 	if (!folio) {
 		spin_unlock_irq(&hugetlb_lock);
 		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
-- 
2.54.0.545.g6539524ca2-goog

From nobody Sat Jun 13 14:52:17 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B5C31481236 for ; Wed, 6 May 2026 15:54:47 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082887; cv=none; b=ihaXQgYW+vOkwKRBexhPjUKZv/oJW72FVjzD/c0+I4TIuL9fHeVrzSSro4JENjF4HKNjTU56aYD/G+jkOiQ39ej6GxWC66rjJOM29jwl9HpLPT9wPrwv5N+O/PHQsqWMKkF38awr/ZHpp2Po2ZTP0AXpYnS7x4DUyQ72EzUg6/0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1778082887; c=relaxed/simple; bh=EFOzcDiW1BDWFgsTmk3gKLgei1KKWehsjtmzBwOCodE=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=hfJkut0Q+gPf16WS1h58S1dvqOJ6M//Ivh3pa4ng6DA5U6i5OW50LbyrXH7ZjUgA8wv+pdOD7SikS/b4QhMSIWXJAU6hAjBAvceVZ9Zvq3apA66gEF2kIyrvCszDO5xqUv8KiZKChDYZnSLbD7U3VkpncVNLv+bxuEw+diCKceE=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=N0aUTaZk; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="N0aUTaZk"
Received: by smtp.kernel.org (Postfix) with ESMTPS id 652F4C4AF0B; Wed, 6 May 2026 15:54:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778082887; bh=EFOzcDiW1BDWFgsTmk3gKLgei1KKWehsjtmzBwOCodE=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=N0aUTaZkA0ShwVqw98q8eo8OiQaJq2Cur8HhxDQ8LlVPfSYGaMIUNSxQk66h2rKfJ S0wiJVIhUQEJk1GJJ+KvZlXbbBFxk51iVTzu3v120FMdaC8H7tj6bCCAtZjNw8rYD6 JYTD2LkhHj8TjebFmXeVsw5AQX8JoXPDJDgylije6DOJo80sjlyvGvFyLZVlIAn5KY vaisEsDr9euP7N4+zt3WdaDiFmQAFMm3lDP7FWA55XzQSsM/aGISzpQuJH6DxLlPLr nBAyUTndM3/pAWC2oq9y3xbW2a2UuUZhOkO+buS7N0BaXoE7ryATxFptA8gx4rqYn8 0lhyQZEjCJjmw==
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 55E32CD342C; Wed, 6 May 2026 15:54:47 +0000 (UTC)
From: Ackerley Tng via B4 Relay
Date: Wed, 06 May 2026 08:54:38 -0700
Subject: [PATCH v2 2/6] mm: hugetlb: Move mpol interpretation out of alloc_buddy_hugetlb_folio_with_mpol()
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260506-hugetlb-open-up-v2-2-826a0c5f28fc@google.com>
References: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
In-Reply-To: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Dan Williams , Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng
X-Mailer: b4 0.14.3
X-Developer-Signature: v=1; a=ed25519-sha256; t=1778082886; l=2961; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=ngW2MpP6uF73PbJ5TvqQViScClcx1raOJSqESPZNT2g=; b=60OulaMrVIeL0AUDLw/Oj4vJ4TXPrn+qA6Ksx0VcV+YxxEG7QSPdreEFwdfYdmXpaEq60mkKZ NM7+wE2n83lBiKpobQJz9oQFqYCtE8npaDcWzeD6ssL5cTOLYqSVXQ1
X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU=
X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649
X-Original-From: Ackerley Tng
Reply-To: ackerleytng@google.com

From: Ackerley Tng

Move memory policy interpretation out of
alloc_buddy_hugetlb_folio_with_mpol() and into alloc_hugetlb_folio() to
separate reading and interpretation of memory policy from actual
allocation.

This will later allow memory policy to be interpreted outside of the
process of allocating a hugetlb folio entirely. This opens doors for
other callers of the HugeTLB folio allocation function, such as
guest_memfd, where memory may not always be mapped and hence may not
have an associated vma.

No functional change intended.
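
As a rough illustration of the split described above (not code from this
series), the toy userspace sketch below separates the step that decides
where to allocate from the step that allocates. All demo_* names, and the
plain bitmask standing in for a nodemask, are assumptions made for this
example only; they are not kernel APIs.

```c
#define DEMO_NODES 4

/* Hypothetical stand-in for a mapping that carries a preferred node. */
struct demo_vma {
	int preferred_nid;
};

/* Free pages per node; a toy replacement for real allocator state. */
struct demo_pool {
	int free[DEMO_NODES];
};

/*
 * Allocation step: it does not know where the policy came from.
 * Tries nid first, then any other node allowed by the bitmask.
 * Returns the node used, or -1 if no allowed node has a free page.
 */
static int demo_alloc_on(struct demo_pool *pool, int nid, unsigned int allowed)
{
	int i;

	if (nid >= 0 && nid < DEMO_NODES &&
	    (allowed & (1u << nid)) && pool->free[nid] > 0) {
		pool->free[nid]--;
		return nid;
	}

	for (i = 0; i < DEMO_NODES; i++) {
		if ((allowed & (1u << i)) && pool->free[i] > 0) {
			pool->free[i]--;
			return i;
		}
	}

	return -1;
}

/* Policy step for mapped memory: read the policy from the mapping. */
static int demo_alloc_for_vma(struct demo_pool *pool, const struct demo_vma *vma)
{
	return demo_alloc_on(pool, vma->preferred_nid, ~0u);
}
```

A caller without a mapping, which is the guest_memfd situation mentioned
above, would call demo_alloc_on() directly with a node and mask obtained
some other way.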

Signed-off-by: Ackerley Tng
Reviewed-by: James Houghton
Acked-by: Oscar Salvador
---
 mm/hugetlb.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8be246b4e6134..ea3bc405b3162 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2160,15 +2160,11 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mas
  */
 static
 struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
-		struct vm_area_struct *vma, unsigned long addr)
+		struct mempolicy *mpol, int nid, nodemask_t *nodemask)
 {
 	struct folio *folio = NULL;
-	struct mempolicy *mpol;
 	gfp_t gfp_mask = htlb_alloc_mask(h);
-	int nid;
-	nodemask_t *nodemask;
 
-	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
 	if (mpol_is_preferred_many(mpol)) {
 		gfp_t gfp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
 
@@ -2180,7 +2176,7 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
 
 	if (!folio)
 		folio = alloc_surplus_hugetlb_folio(h, gfp_mask, nid, nodemask);
-	mpol_cond_put(mpol);
+
 	return folio;
 }
 
@@ -2869,7 +2865,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
-	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+	gfp_t gfp = htlb_alloc_mask(h);
 
 	idx = hstate_index(h);
 
@@ -2940,8 +2936,14 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		folio = dequeue_hugetlb_folio_vma(h, vma, addr);
 
 	if (!folio) {
+		struct mempolicy *mpol;
+		nodemask_t *nodemask;
+		int nid;
+
 		spin_unlock_irq(&hugetlb_lock);
-		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
+		nid = huge_node(vma, addr, gfp, &mpol, &nodemask);
+		folio = alloc_buddy_hugetlb_folio_with_mpol(h, mpol, nid, nodemask);
+		mpol_cond_put(mpol);
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
@@ -2997,7 +2999,7 @@ struct folio *alloc_hugetlb_folio(struct
vm_area_struct *vma,
 		}
 	}
 
-	ret = mem_cgroup_charge_hugetlb(folio, gfp);
+	ret = mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL);
 	/*
 	 * Unconditionally increment NR_HUGETLB here. If it turns out that
 	 * mem_cgroup_charge_hugetlb failed, then immediately free the page and
-- 
2.54.0.545.g6539524ca2-goog

From nobody Sat Jun 13 14:52:17 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF363481FC1 for ; Wed, 6 May 2026 15:54:47 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082887; cv=none; b=mRXxy6RfofJ/JUAbOVV3v2LBzIsnqe1ZmTO3viHThpXt8QmezPQN+2cDKooJ1sFxgkCudfbv9Vz+MPemwpqusCcLvF8WsUa7TgPqEzjpYsUHBRgeHB1gbBq5Q97wpWJX97946jjPWZoYvBqCG5zAkSZIcDjJIvajeBLUEdCl9AY=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082887; c=relaxed/simple; bh=Xc4Mx3TAo5LaqhnOgXJq/fFetgGTOsrrZKWvr/AN4iA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=sCIPaYz9u42EXUNkskM2ENXxHADjXHyVqm3Pfu/ABuFcJTDNBtCLAk6IK/VPRKpI0KhPMhjENsk31wzMruyVAPRoA0Qwd2OuEiM1JJvsIHEyGhuVNWs3CBPymK+GOvVljCuDqHWp8TMyx9cbdqRvITxLgzEddEZZV3O0EcarRrE=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NLnbJ6By; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NLnbJ6By"
Received: by smtp.kernel.org (Postfix) with ESMTPS id 746E3C2BCB0; Wed, 6 May 2026 15:54:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778082887;
 bh=Xc4Mx3TAo5LaqhnOgXJq/fFetgGTOsrrZKWvr/AN4iA=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=NLnbJ6BypjwjDgypbp/bz7x0EmLAhF/iB1cjZRPf7zn7V3ZU1ySM9xh195P+B5TWD n9MtvN0Z+pnO0sLXjJ8wj3AbF2U91Q/bzv8iNVf0admIuAQgEd+2ezCoiOWrrXvP2g VuUKXkofrO2nup1/Hr+sNkhGfbbrZ38dTlTceN+1Aq1H3dT56V3uumpBd8fkz3FEnC oZrri0d9zzORGdf/wFs5CPSjvHxN5qD9BgakXZEUVMH3jYKAubgJEeqjmc/d5E0oR4 LCtqML6v+3XM3ZFd88yb0EYREblaColKQmyX8KFHIAS2vNVRUwSKsNT/ED9T4ShG+p +MjDCyEqHhsbg==
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69D3ACD3445; Wed, 6 May 2026 15:54:47 +0000 (UTC)
From: Ackerley Tng via B4 Relay
Date: Wed, 06 May 2026 08:54:39 -0700
Subject: [PATCH v2 3/6] mm: hugetlb: Move mpol interpretation out of dequeue_hugetlb_folio_vma()
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260506-hugetlb-open-up-v2-3-826a0c5f28fc@google.com>
References: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
In-Reply-To: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Dan Williams , Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng
X-Mailer: b4 0.14.3
X-Developer-Signature: v=1; a=ed25519-sha256; t=1778082886; l=3663; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id;
 bh=zIDAPwSnRf8adh9I7F/T1NlbtOUHnAoWsG2BkZYAmpE=; b=e6zxTOz1Z5Z7o18qfH+LW7SX9rmjkxEwXMwf606rlXQdoT9U30y3BMbzYKZvyIorkfFoOtRCm J8flqEynR5oAbgxmGpozVQHraboIoYB2OmWYo5LFk4l23gaMyzZ0nSY
X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU=
X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649
X-Original-From: Ackerley Tng
Reply-To: ackerleytng@google.com

From: Ackerley Tng

Move memory policy interpretation out of dequeue_hugetlb_folio_vma() and
into alloc_hugetlb_folio() to separate reading and interpretation of
memory policy from actual allocation.

Also rename dequeue_hugetlb_folio_vma() to
dequeue_hugetlb_folio_with_mpol() to remove association with vma and to
align with alloc_buddy_hugetlb_folio_with_mpol().

This will later allow memory policy to be interpreted outside of the
process of allocating a hugetlb folio entirely. This opens doors for
other callers of the HugeTLB folio allocation function, such as
guest_memfd, where memory may not always be mapped and hence may not
have an associated vma.

No functional change intended.
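
One consequence of resolving the policy in the caller is that a single
policy reference has to cover both the dequeue attempt and the fallback
allocation, and must be dropped on every exit. The userspace sketch below
illustrates only that lifetime rule. struct demo_policy and the demo_*
helpers are hypothetical names for this example and do not model the
kernel's struct mempolicy.

```c
#include <stdbool.h>

/* Hypothetical refcounted policy object used only in this example. */
struct demo_policy {
	int refs;
	int nid;
};

static struct demo_policy *demo_policy_get(struct demo_policy *p)
{
	p->refs++;
	return p;
}

static void demo_policy_put(struct demo_policy *p)
{
	p->refs--;
}

/* Both helpers only consume an already resolved policy. */
static bool demo_dequeue(const struct demo_policy *p, int pool_free)
{
	(void)p;
	return pool_free > 0;
}

static bool demo_fresh_alloc(const struct demo_policy *p, bool buddy_ok)
{
	(void)p;
	return buddy_ok;
}

/*
 * Take the reference once, try the pool, fall back to a fresh
 * allocation, and drop the reference on success and on failure.
 * Returns 0 on success and -1 if both attempts fail.
 */
static int demo_alloc_with_policy(struct demo_policy *shared, int pool_free,
				  bool buddy_ok)
{
	struct demo_policy *p = demo_policy_get(shared);
	bool got = demo_dequeue(p, pool_free);

	if (!got) {
		got = demo_fresh_alloc(p, buddy_ok);
		if (!got) {
			demo_policy_put(p);
			return -1;
		}
	}

	demo_policy_put(p);
	return 0;
}
```

Whichever path is taken, the reference count seen by the owner of the
policy is the same before and after the call.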

Signed-off-by: Ackerley Tng
Reviewed-by: James Houghton
---
 mm/hugetlb.c | 34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea3bc405b3162..3395de4d0999a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1334,18 +1334,11 @@ static unsigned long available_huge_pages(struct hstate *h)
 	return h->free_huge_pages - h->resv_huge_pages;
 }
 
-static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
-				struct vm_area_struct *vma,
-				unsigned long address)
+static struct folio *dequeue_hugetlb_folio_with_mpol(struct hstate *h,
+		struct mempolicy *mpol, int nid, nodemask_t *nodemask)
 {
 	struct folio *folio = NULL;
-	struct mempolicy *mpol;
-	gfp_t gfp_mask;
-	nodemask_t *nodemask;
-	int nid;
-
-	gfp_mask = htlb_alloc_mask(h);
-	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
+	gfp_t gfp_mask = htlb_alloc_mask(h);
 
 	if (mpol_is_preferred_many(mpol)) {
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
@@ -1359,7 +1352,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	mpol_cond_put(mpol);
 	return folio;
 }
 
@@ -2866,6 +2858,9 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
 	gfp_t gfp = htlb_alloc_mask(h);
+	struct mempolicy *mpol;
+	nodemask_t *nodemask;
+	int nid;
 
 	idx = hstate_index(h);
 
@@ -2926,6 +2921,9 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	spin_lock_irq(&hugetlb_lock);
 
+	/* Takes reference on mpol. */
+	nid = huge_node(vma, addr, gfp, &mpol, &nodemask);
+
 	/*
 	 * gbl_chg == 0 indicates a reservation exists for the allocation - so
 	 * try dequeuing a page.
If there are available_huge_pages(), try using
@@ -2933,25 +2931,23 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 */
 	folio = NULL;
 	if (!gbl_chg || available_huge_pages(h))
-		folio = dequeue_hugetlb_folio_vma(h, vma, addr);
+		folio = dequeue_hugetlb_folio_with_mpol(h, mpol, nid, nodemask);
 
 	if (!folio) {
-		struct mempolicy *mpol;
-		nodemask_t *nodemask;
-		int nid;
-
 		spin_unlock_irq(&hugetlb_lock);
-		nid = huge_node(vma, addr, gfp, &mpol, &nodemask);
 		folio = alloc_buddy_hugetlb_folio_with_mpol(h, mpol, nid, nodemask);
-		mpol_cond_put(mpol);
-		if (!folio)
+		if (!folio) {
+			mpol_cond_put(mpol);
 			goto out_uncharge_cgroup;
+		}
 		spin_lock_irq(&hugetlb_lock);
 		list_add(&folio->lru, &h->hugepage_activelist);
 		folio_ref_unfreeze(folio, 1);
 		/* Fall through */
 	}
 
+	mpol_cond_put(mpol);
+
 	/*
 	 * Either dequeued or buddy-allocated folio needs to add special
 	 * mark to the folio when it consumes a global reservation.
-- 
2.54.0.545.g6539524ca2-goog

From nobody Sat Jun 13 14:52:17 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF24B3ED5A4 for ; Wed, 6 May 2026 15:54:47 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082887; cv=none; b=uj1S4K1Oyhb2FAjXSExch5wXZgEVZidhG2pnTZXYjAU4ribtZ4X9rAPS3CvGd8COoF3LYcbY6KG0sZWIACujSbo0IneWz72arcNiKuRr3/YvdG2va1mmW5Hu5gnI6hbtR6BsqmQCBL6biVtPd/WGzLn5Ri+SwC/oMV1a9w34vOk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082887; c=relaxed/simple; bh=MTAqXqjU1haYCX3aL6jCZ5kr7+TwMGVT8o6E66wbHEw=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc;
 b=FIrcBAbERUwUKQjUF5xAqTwDMrPNNfU18iKALVN+/bq7v3sdj+kq3ILW7MJF8s1UH04K9NSYUSmfmlPscw/O31/j8Av925wpudPHOPRJUmhUsuBExhEmFxhromRz2wXBUZTmRd8nxRZqOeTt9jGmBMJr5WpKIaDK2evvgSLV+B0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Spy2fGUa; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Spy2fGUa"
Received: by smtp.kernel.org (Postfix) with ESMTPS id 8589EC4AF10; Wed, 6 May 2026 15:54:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778082887; bh=MTAqXqjU1haYCX3aL6jCZ5kr7+TwMGVT8o6E66wbHEw=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=Spy2fGUaj609nhPpwatopMmHdK8eVklmI2H0UvK27zGUzJjsmAD6P6C/4FfQ4/nzb 7Yge0qfDp2zwOejFgfgs4f2S3DD1UmRA/Tdi8r2B6R503D1LWasVDnTCm3b/qcha9z 6ENjWJA6tT+OjegIKwHMWM4EkWRdWB6MT54dw/raCw+OKT7AULGJzfKFwQhv6Hz+xS UY/WjNeBt8GrmZSc/aCO4GROsN5u2wq44sE1vh3jOrDZwLM+gjUyCnncv3XewFMhgJ zp3hBlpQ5W6sEGQkXWy+08WleFIun1POXV2x3po0wZyRCKccUXIIojhorX1WWm+4f2 owEjzjDqIMELA==
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D4FECD3443; Wed, 6 May 2026 15:54:47 +0000 (UTC)
From: Ackerley Tng via B4 Relay
Date: Wed, 06 May 2026 08:54:40 -0700
Subject: [PATCH v2 4/6] mm: hugetlb: Use error variable in alloc_hugetlb_folio
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260506-hugetlb-open-up-v2-4-826a0c5f28fc@google.com>
References: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
In-Reply-To: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton ,
 fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Dan Williams , Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng
X-Mailer: b4 0.14.3
X-Developer-Signature: v=1; a=ed25519-sha256; t=1778082886; l=2128; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=eGP3URO25a2yJfXjXfljDSZfxvkXYHSvdYeyD7MI+ek=; b=7RdhDIfpu2hRHtT21qfwETAVWbSePUPaxz9JrwOIiwwAWtPE6ipWXGpk2G8+NkmVEBhRnCSRm dv6BvSt6FHdDvL8xQXod9qkE63QOT2QnsHlTzCbxf64nENwkYBB79re
X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU=
X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649
X-Original-From: Ackerley Tng
Reply-To: ackerleytng@google.com

From: Ackerley Tng

Refactor alloc_hugetlb_folio to use a local variable for returning error
codes. Instead of returning ERR_PTR(-ENOSPC) at the end of the error
path, assign -ENOSPC to a return variable at each failure point and
return that variable at the end.

This allows the cleanup goto targets to be used with other errors in a
later patch.

No functional change intended.
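
The pattern described above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: held_a, held_b and
demo_alloc_steps() are hypothetical names, the counters stand in for
resources such as charges, and the sketch returns an int where
alloc_hugetlb_folio() returns ERR_PTR(ret).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical resource counters; nonzero means "currently held". */
static int held_a;
static int held_b;

/*
 * Each failure point sets ret before jumping, so the shared cleanup
 * labels can carry errors other than -ENOSPC. On success both
 * resources stay held and 0 is returned.
 */
static int demo_alloc_steps(bool a_ok, bool b_ok, bool mem_ok)
{
	int ret;

	if (!a_ok) {
		ret = -ENOSPC;
		goto out;
	}
	held_a++;

	if (!b_ok) {
		ret = -ENOSPC;
		goto out_release_a;
	}
	held_b++;

	if (!mem_ok) {
		/* A different error reuses the same unwind path. */
		ret = -ENOMEM;
		goto out_release_b;
	}

	return 0;

out_release_b:
	held_b--;
out_release_a:
	held_a--;
out:
	return ret;
}
```

With a hard-coded return value at the bottom, the third failure point
could not share these labels without reporting the wrong error.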

Signed-off-by: Ackerley Tng
---
 mm/hugetlb.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3395de4d0999a..68c21305fc86a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2894,8 +2894,10 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 */
 	if (map_chg) {
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
-		if (gbl_chg < 0)
+		if (gbl_chg < 0) {
+			ret = -ENOSPC;
 			goto out_end_reservation;
+		}
 	} else {
 		/*
 		 * If we have the vma reservation ready, no need for extra
@@ -2911,13 +2913,17 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	if (map_chg) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
-		if (ret)
+		if (ret) {
+			ret = -ENOSPC;
 			goto out_subpool_put;
+		}
 	}
 
 	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
-	if (ret)
+	if (ret) {
+		ret = -ENOSPC;
 		goto out_uncharge_cgroup_reservation;
+	}
 
 	spin_lock_irq(&hugetlb_lock);
 
@@ -2938,6 +2944,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		folio = alloc_buddy_hugetlb_folio_with_mpol(h, mpol, nid, nodemask);
 		if (!folio) {
 			mpol_cond_put(mpol);
+			ret = -ENOSPC;
 			goto out_uncharge_cgroup;
 		}
 		spin_lock_irq(&hugetlb_lock);
@@ -3030,7 +3037,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 out_end_reservation:
 	if (map_chg != MAP_CHG_ENFORCED)
 		vma_end_reservation(h, vma, addr);
-	return ERR_PTR(-ENOSPC);
+	return ERR_PTR(ret);
 }
 
 static __init void *alloc_bootmem(struct hstate *h, int nid, bool node_exact)
-- 
2.54.0.545.g6539524ca2-goog

From nobody Sat Jun 13 14:52:17 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 63F9948BD3E for ; Wed, 6 May 2026 15:54:48 +0000 (UTC)
Authentication-Results:
 smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082888; cv=none; b=XBc8UN/6tN1G0YarEA1I90zJT7qBXwn2khdU1+EKTmj5mZHT41yxUM/QQhCh9Qlb6q/aE4tUQz5xgSPSIhoyn7OlqwhRKY3Wyv35mN0P6wpH2Y2U+z6gakSILPa9I4X3hUXsrFgRYrAInKgmmrghqZJEoo4StZXnEGhRbPLfVxU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082888; c=relaxed/simple; bh=Cjxf/EnKO3INuCZtalhdX+tplsTqxpNUbg9nn1buYwI=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=i0r6bCE4A0KP2r+a7svRKq4PQf8/ejwulPWAeUvNXftM25O9hkKXwLGbfHOs8Tlu8qSP0+z1bdGNaTeujBgSpjmejldaveOpjw64xo4OBrdTmIaZdvK2k0dnovX154xopXFL1rI3qxJRcK7r3Zz4MiaL24vXbvBf7KF7elXaZSQ=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CvwEGP9v; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CvwEGP9v"
Received: by smtp.kernel.org (Postfix) with ESMTPS id A03F9C2BCFB; Wed, 6 May 2026 15:54:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778082887; bh=Cjxf/EnKO3INuCZtalhdX+tplsTqxpNUbg9nn1buYwI=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=CvwEGP9vLmFdRLhKLodlSbyf1vlagsSAcJNfXsa9FJgTWAHPFusFLc2BiFb9ddUr6 9WIStheQWKcOfx2O5jqQZVvKA8l36RqYe07OYPQ0BsRUEIIexxiwQgGU2YoHq0f+L0 ZXdSQMyZRWjCEPAcLW1v3gkrYXPWGbAnG84BvKBX2L/8dQQ7kOV669ZaUgMLsX+3YY WSNHw0MJCGG7eOvQaErmkzVMCvH85PF/wdmUyNKH9ycfE63VYnzNJYZRhCBR77zkh4 M+AtB1ajHLW+TdW7OqOgn4ClWYs2Shcs5QZQQ7RkWSCjDerEuq5WxaHZj5bt1AVgpx JXQ6sx6Z7linQ==
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95B9ACD3444; Wed, 6 May 2026 15:54:47 +0000 (UTC)
From: Ackerley Tng via B4 Relay
Date: Wed, 06 May 2026 08:54:41 -0700
Subject: [PATCH v2 5/6] mm: hugetlb: Move mem_cgroup_charge_hugetlb() earlier in allocation
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260506-hugetlb-open-up-v2-5-826a0c5f28fc@google.com>
References: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
In-Reply-To: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Dan Williams , Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng
X-Mailer: b4 0.14.3
X-Developer-Signature: v=1; a=ed25519-sha256; t=1778082886; l=2460; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=MPbeDKNyLcbnXQ1xH9ZoNE3gcjhnProATlB2NU584e0=; b=MOe7vO6p8nmK9LtfzZoHWqxWO9rab9/pRCiXIaTQpCsX0qT0RUFUF8USA1VfyA7gflpQBiBnN YGA/0QFBYpYB+KI9ayI9sSIXOmTRv/VCNKint5FcRJbYweNG5u6hQDt
X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU=
X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649
X-Original-From: Ackerley Tng
Reply-To: ackerleytng@google.com

From: Ackerley Tng

Move mem_cgroup_charge_hugetlb() earlier in the folio allocation
process. This change draws a cleaner line between memcg charging and the
subsequent hugetlb-specific reservation logic for VMAs and subpools.

While it would be ideal to make all accounting and reservations
perfectly symmetric, mem_cgroup_charge_hugetlb() is a complex operation
that cannot be performed under the hugetlb_lock. Moving the charge to
this earlier point ensures that memcg charging is handled before the
code begins manipulating subpool and VMA-specific state. These two types
of accounting will be separated in a future patch.

If mem_cgroup_charge_hugetlb() fails, the code now branches to
out_subpool_put to ensure the folio is freed and the subpool references
are handled correctly.

Signed-off-by: Ackerley Tng
---
 mm/hugetlb.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 68c21305fc86a..4159b3565a9be 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2975,6 +2975,24 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	spin_unlock_irq(&hugetlb_lock);
 
+	ret = mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL);
+	/*
+	 * Unconditionally increment NR_HUGETLB here. If it turns out that
+	 * mem_cgroup_charge_hugetlb failed, then immediately free the page and
+	 * decrement NR_HUGETLB.
+	 */
+	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
+
+	if (ret == -ENOMEM) {
+		free_huge_folio(folio);
+		/*
+		 * Skip uncharging hugetlb_cgroup since the charges
+		 * were committed to the folio and freeing the folio
+		 * would have cleared those up.
+		 */
+		goto out_subpool_put;
+	}
+
 	hugetlb_set_folio_subpool(folio, spool);
 
 	if (map_chg != MAP_CHG_ENFORCED) {
@@ -3002,19 +3020,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		}
 	}
 
-	ret = mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL);
-	/*
-	 * Unconditionally increment NR_HUGETLB here. If it turns out that
-	 * mem_cgroup_charge_hugetlb failed, then immediately free the page and
-	 * decrement NR_HUGETLB.
- */ - lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h)); - - if (ret =3D=3D -ENOMEM) { - free_huge_folio(folio); - return ERR_PTR(-ENOMEM); - } - return folio; =20 out_uncharge_cgroup: --=20 2.54.0.545.g6539524ca2-goog From nobody Sat Jun 13 14:52:17 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 62BD148AE32 for ; Wed, 6 May 2026 15:54:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082888; cv=none; b=SmogIAYreyIIV4qEy/Rdu3R0jN+c+mNTo0jjJvB8roebjpsi8jDihJnnvH6Qg8aXsYMzDOo01j3kCjfCT/R+AQWaoN2RzeVLLfUrxBKsV/5nPdUcxW+ANkvpj8gz5qR+M09y37tkp50PFA6535bqw7ukj3iXjraWHSUrcf0sy5g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778082888; c=relaxed/simple; bh=+7G4RM081CTfKDu8eUBQS6YQNemzuMtvLb/fbGmv+eY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=ZK85uimOyirZCf1AeMK93Xz+5q8tWRS6YFOZE8XYJaAX6C51/QE0R1RIID63aaWC2IVHDy4j0XkMmRCwJSKoW+vnUt//TfdPq2C31xK1s9oMUeRJw0/BMtVwHXAwQJEOIG7qNllgnYRjAnczmiPPYG8Te1+dvK6onxrNg4H+dJs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=EZ5tb2rB; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="EZ5tb2rB" Received: by smtp.kernel.org (Postfix) with ESMTPS id B3ED3C2BCF4; Wed, 6 May 2026 15:54:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778082887; bh=+7G4RM081CTfKDu8eUBQS6YQNemzuMtvLb/fbGmv+eY=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; 
b=EZ5tb2rB4uixhV43NTPrXveOV9p7u1rAJCDGdMp8WBd6b99CYZkP+/c8nGn1rwHqG pmhTSckT+IHNBnLddE9G8ena2PrPX4ZUnKdCfNW8l1GB7mmAZE7/8jCNXntule71l8 F/IcaW/teRkSvxo3o6iQ2NKmAMZElu+/D+XUkM21i9Rip/UEoVoJQ55O4+u/H0k7Jr oYHoUiWoe/Z2KV7+QapyARywE7IAco8FBjduNInabw5FGLSXQIaxKtJTGZprxP5rGY 1s7MiRshd3TBRczF+tIDbFNqhUXK3U4uPLobNISyyeI7mwNiYl6uZWXDMymocpuWyq jEvhQnJjPtZEQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id A92DCCD3442; Wed, 6 May 2026 15:54:47 +0000 (UTC) From: Ackerley Tng via B4 Relay Date: Wed, 06 May 2026 08:54:42 -0700 Subject: [PATCH v2 6/6] mm: hugetlb: Refactor out hugetlb_alloc_folio() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260506-hugetlb-open-up-v2-6-826a0c5f28fc@google.com> References: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com> In-Reply-To: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com> To: Muchun Song , Oscar Salvador , David Hildenbrand , Andrew Morton , fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com, pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org, rick.p.edgecombe@intel.com, rientjes@google.com, roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com, Dan Williams , Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ackerley Tng X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1778082886; l=9112; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=zPlFgYkmuWlbZ0fv93XRQ9dNZUEoV1wVV83bZAcmM2o=; b=HPPWp6DtX5ckKKL2Y2JyaJF+iyeM5p2mthrOWGId5tZ2kGaFEKAjjUeJk0x1ijjrxS2mgxyZD 
757hPSkEcAiA6KNYC/D+jwmDVhIhVulPQxc2ibS0qLSwfEUrGctzgr5
X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU=
X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649
X-Original-From: Ackerley Tng
Reply-To: ackerleytng@google.com

From: Ackerley Tng

Refactor hugetlb_alloc_folio() out of alloc_hugetlb_folio(). The new
function handles allocation of a folio together with memcg and HugeTLB
cgroup charging.

This refactoring decouples HugeTLB page allocation from VMAs. The existing
coupling is:

1. Reservations (as in resv_map) are stored in the vma
2. mpol is stored at vma->vm_policy
3. A vma must be used for allocation even if the pages are not meant to be
   used by the host process.

Without this coupling, VMAs are no longer a requirement for allocation.
This opens up the allocation routine for usage without VMAs, which will
allow guest_memfd to use HugeTLB as a more generic allocator of huge
pages, since guest_memfd memory may not have any associated VMAs by
design. In addition, direct allocations from HugeTLB could possibly be
refactored to avoid the use of a pseudo-VMA.

Also, this decouples HugeTLB page allocation from HugeTLBfs, where the
subpool is stored at the fs mount. This is also a requirement for
guest_memfd, where the plan is to have a subpool created per-fd and stored
on the inode.

No functional change intended.
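
For illustration only, the error handling in hugetlb_alloc_folio() can be
modelled in userspace as "charge in order, undo in reverse order on
failure". The sketch below is not kernel code: struct model_cgroup,
model_try_charge() and model_alloc() are hypothetical stand-ins, and plain
counters replace the hugetlb cgroup reservation charge, the hugetlb cgroup
usage charge and the folio pool. It deliberately leaves out locking,
mempolicy and the memcg charge.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct model_cgroup {
	long rsvd_usage, rsvd_limit;	/* models the "rsvd" charge */
	long usage, limit;		/* models the usage charge */
};

/* Charge nr_pages against one counter, or fail without side effects. */
static int model_try_charge(long *usage, long limit, long nr_pages)
{
	if (*usage + nr_pages > limit)
		return -ENOSPC;
	*usage += nr_pages;
	return 0;
}

/*
 * Returns 0 on success or a negative errno. On failure, every charge
 * taken so far is dropped again, in the reverse order it was taken.
 */
static int model_alloc(struct model_cgroup *cg, long *free_folios,
		       long nr_pages, bool charge_rsvd)
{
	int ret;

	assert(nr_pages > 0);

	if (charge_rsvd &&
	    model_try_charge(&cg->rsvd_usage, cg->rsvd_limit, nr_pages))
		return -ENOSPC;

	if (model_try_charge(&cg->usage, cg->limit, nr_pages)) {
		ret = -ENOSPC;
		goto err_uncharge_rsvd;
	}

	/* Stand-in for dequeueing or buddy-allocating a folio. */
	if (*free_folios == 0) {
		ret = -ENOSPC;
		goto err_uncharge;
	}
	(*free_folios)--;

	return 0;

err_uncharge:
	cg->usage -= nr_pages;
err_uncharge_rsvd:
	if (charge_rsvd)
		cg->rsvd_usage -= nr_pages;
	return ret;
}
```

In this model, a caller that already holds a reservation passes
charge_rsvd = false and only the usage counter moves, which loosely mirrors
how alloc_hugetlb_folio() derives charge_hugetlb_cgroup_rsvd from map_chg.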

Signed-off-by: Ackerley Tng
---
 include/linux/hugetlb.h |   3 +
 mm/hugetlb.c            | 179 ++++++++++++++++++++++++++----------------------
 2 files changed, 100 insertions(+), 82 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 93418625d3c5f..ec205d8580885 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -705,6 +705,9 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
 int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
 void wait_for_freed_hugetlb_folios(void);
+struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpool *spool,
+		struct mempolicy *mpol, int nid, nodemask_t *nodemask,
+		bool charge_hugetlb_cgroup_rsvd, bool use_global_reservation);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4159b3565a9be..a1c5b94e52e0a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2821,6 +2821,88 @@ void wait_for_freed_hugetlb_folios(void)
 	flush_work(&free_hpage_work);
 }
 
+struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpool *spool,
+		struct mempolicy *mpol, int nid, nodemask_t *nodemask,
+		bool charge_hugetlb_cgroup_rsvd, bool use_global_reservation)
+{
+	size_t nr_pages = pages_per_huge_page(h);
+	struct hugetlb_cgroup *h_cg = NULL;
+	gfp_t gfp = htlb_alloc_mask(h);
+	int idx = hstate_index(h);
+	struct folio *folio;
+	int ret;
+
+	if (charge_hugetlb_cgroup_rsvd &&
+	    hugetlb_cgroup_charge_cgroup_rsvd(idx, nr_pages, &h_cg))
+		return ERR_PTR(-ENOSPC);
+
+	if (hugetlb_cgroup_charge_cgroup(idx, nr_pages, &h_cg)) {
+		ret = -ENOSPC;
+		goto err_uncharge_hugetlb_cgroup_rsvd;
+	}
+
+	spin_lock_irq(&hugetlb_lock);
+
+	folio = NULL;
+	if (use_global_reservation || available_huge_pages(h))
+		folio = dequeue_hugetlb_folio_with_mpol(h, mpol, nid, nodemask);
+
+	if (!folio) {
+		spin_unlock_irq(&hugetlb_lock);
+		folio = alloc_buddy_hugetlb_folio_with_mpol(h, mpol, nid, nodemask);
+		if (!folio) {
+			ret = -ENOSPC;
+			goto err_uncharge_hugetlb_cgroup;
+		}
+		spin_lock_irq(&hugetlb_lock);
+		list_add(&folio->lru, &h->hugepage_activelist);
+		folio_ref_unfreeze(folio, 1);
+		/* Fall through */
+	}
+
+	if (use_global_reservation) {
+		folio_set_hugetlb_restore_reserve(folio);
+		h->resv_huge_pages--;
+	}
+
+	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
+
+	if (charge_hugetlb_cgroup_rsvd) {
+		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
+						  h_cg, folio);
+	}
+
+	spin_unlock_irq(&hugetlb_lock);
+
+	ret = mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL);
+	/*
+	 * Unconditionally increment NR_HUGETLB here because if
+	 * mem_cgroup_charge_hugetlb failed, freeing the page will
+	 * decrement NR_HUGETLB.
+	 */
+	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
+
+	if (ret == -ENOMEM) {
+		free_huge_folio(folio);
+		/*
+		 * Skip uncharging hugetlb_cgroup since the charges
+		 * were committed to the folio and freeing the folio
+		 * would have cleared those up.
+		 */
+		return ERR_PTR(ret);
+	}
+
+	return folio;
+
+ err_uncharge_hugetlb_cgroup:
+	hugetlb_cgroup_uncharge_cgroup(idx, nr_pages, h_cg);
+ err_uncharge_hugetlb_cgroup_rsvd:
+	if (charge_hugetlb_cgroup_rsvd)
+		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, nr_pages, h_cg);
+
+	return ERR_PTR(ret);
+}
+
 typedef enum {
 	/*
 	 * For either 0/1: we checked the per-vma resv map, and one resv
@@ -2856,11 +2938,12 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	long retval, gbl_chg, gbl_reserve;
 	map_chg_state map_chg;
 	int ret, idx;
-	struct hugetlb_cgroup *h_cg = NULL;
 	gfp_t gfp = htlb_alloc_mask(h);
 	struct mempolicy *mpol;
 	nodemask_t *nodemask;
 	int nid;
+	bool charge_hugetlb_cgroup_rsvd;
+	bool global_reservation_exists;
 
 	idx = hstate_index(h);
 
@@ -2907,89 +2990,28 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	}
 
 	/*
-	 * If this allocation is not consuming a per-vma reservation,
-	 * charge the hugetlb cgroup now.
+	 * If allocation doesn't reuse a reservation in the resv_map,
+	 * charge for the reservation.
 	 */
-	if (map_chg) {
-		ret = hugetlb_cgroup_charge_cgroup_rsvd(
-			idx, pages_per_huge_page(h), &h_cg);
-		if (ret) {
-			ret = -ENOSPC;
-			goto out_subpool_put;
-		}
-	}
+	charge_hugetlb_cgroup_rsvd = map_chg != MAP_CHG_REUSE;
 
-	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
-	if (ret) {
-		ret = -ENOSPC;
-		goto out_uncharge_cgroup_reservation;
-	}
-
-	spin_lock_irq(&hugetlb_lock);
+	/*
+	 * gbl_chg == 0 indicates a reservation exists for this
+	 * allocation, so try to use it.
+	 */
+	global_reservation_exists = gbl_chg == 0;
 
 	/* Takes reference on mpol. */
 	nid = huge_node(vma, addr, gfp, &mpol, &nodemask);
 
-	/*
-	 * gbl_chg == 0 indicates a reservation exists for the allocation - so
-	 * try dequeuing a page. If there are available_huge_pages(), try using
-	 * them!
-	 */
-	folio = NULL;
-	if (!gbl_chg || available_huge_pages(h))
-		folio = dequeue_hugetlb_folio_with_mpol(h, mpol, nid, nodemask);
-
-	if (!folio) {
-		spin_unlock_irq(&hugetlb_lock);
-		folio = alloc_buddy_hugetlb_folio_with_mpol(h, mpol, nid, nodemask);
-		if (!folio) {
-			mpol_cond_put(mpol);
-			ret = -ENOSPC;
-			goto out_uncharge_cgroup;
-		}
-		spin_lock_irq(&hugetlb_lock);
-		list_add(&folio->lru, &h->hugepage_activelist);
-		folio_ref_unfreeze(folio, 1);
-		/* Fall through */
-	}
+	folio = hugetlb_alloc_folio(h, spool, mpol, nid, nodemask,
+				    charge_hugetlb_cgroup_rsvd,
+				    global_reservation_exists);
 
 	mpol_cond_put(mpol);
 
-	/*
-	 * Either dequeued or buddy-allocated folio needs to add special
-	 * mark to the folio when it consumes a global reservation.
-	 */
-	if (!gbl_chg) {
-		folio_set_hugetlb_restore_reserve(folio);
-		h->resv_huge_pages--;
-	}
-
-	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
-	/* If allocation is not consuming a reservation, also store the
-	 * hugetlb_cgroup pointer on the page.
-	 */
-	if (map_chg) {
-		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
-						  h_cg, folio);
-	}
-
-	spin_unlock_irq(&hugetlb_lock);
-
-	ret = mem_cgroup_charge_hugetlb(folio, gfp | __GFP_RETRY_MAYFAIL);
-	/*
-	 * Unconditionally increment NR_HUGETLB here. If it turns out that
-	 * mem_cgroup_charge_hugetlb failed, then immediately free the page and
-	 * decrement NR_HUGETLB.
-	 */
-	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
-
-	if (ret == -ENOMEM) {
-		free_huge_folio(folio);
-		/*
-		 * Skip uncharging hugetlb_cgroup since the charges
-		 * were committed to the folio and freeing the folio
-		 * would have cleared those up.
-		 */
+	if (IS_ERR(folio)) {
+		ret = PTR_ERR(folio);
 		goto out_subpool_put;
 	}
 
@@ -3022,12 +3044,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	return folio;
 
-out_uncharge_cgroup:
-	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
-out_uncharge_cgroup_reservation:
-	if (map_chg)
-		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
-						    h_cg);
 out_subpool_put:
 	/*
 	 * put page to subpool iff the quota of subpool's rsv_hpages is used
@@ -3038,7 +3054,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugetlb_acct_memory(h, -gbl_reserve);
 	}
 
-
 out_end_reservation:
 	if (map_chg != MAP_CHG_ENFORCED)
 		vma_end_reservation(h, vma, addr);
-- 
2.54.0.545.g6539524ca2-goog