Subject: [PATCH v2 7/8] mm/hugetlb: add epoll support for interface "zeroable_hugepages"
Date: Wed, 7 Jan 2026 19:31:29 +0800
From: "Li Zhe"
Message-Id: <20260107113130.37231-8-lizhe.67@bytedance.com>
In-Reply-To: <20260107113130.37231-1-lizhe.67@bytedance.com>

Add epoll support to the "zeroable_hugepages" interface. When no huge
folios are available for pre-zeroing, user space can wait on the
zeroable_hugepages file with epoll and will be woken as soon as one or
more huge folios become eligible for pre-zeroing.

Whenever huge folios are enqueued on a node's free list, a per-node,
per-hstate work item is scheduled that calls sysfs_notify() on that
node's zeroable_hugepages attribute.

Signed-off-by: Li Zhe
---
 mm/hugetlb.c          | 13 +++++++++++++
 mm/hugetlb_internal.h |  6 ++++++
 mm/hugetlb_sysfs.c    | 22 +++++++++++++++++++++-
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 42d327152da9..314734e434e2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1868,6 +1868,7 @@ void free_huge_folio(struct folio *folio)
 		arch_clear_hugetlb_flags(folio);
 		enqueue_hugetlb_folio(h, folio);
 		spin_unlock_irqrestore(&hugetlb_lock, flags);
+		do_zero_free_notify(h, folio_nid(folio));
 	}
 }
 
@@ -1999,8 +2000,10 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 void prep_and_add_allocated_folios(struct hstate *h,
 				   struct list_head *folio_list)
 {
+	nodemask_t allocated_mask = NODE_MASK_NONE;
 	unsigned long flags;
 	struct folio *folio, *tmp_f;
+	int nid;
 
 	/* Send list for bulk vmemmap optimization processing */
 	hugetlb_vmemmap_optimize_folios(h, folio_list);
@@ -2010,8 +2013,12 @@ void prep_and_add_allocated_folios(struct hstate *h,
 	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
 		prep_account_new_hugetlb_folio(h, folio);
 		enqueue_hugetlb_folio(h, folio);
+		node_set(folio_nid(folio), allocated_mask);
 	}
 	spin_unlock_irqrestore(&hugetlb_lock, flags);
+
+	for_each_node_mask(nid, allocated_mask)
+		do_zero_free_notify(h, nid);
 }
 
 /*
@@ -2383,6 +2390,8 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	long needed, allocated;
 	bool alloc_ok = true;
 	nodemask_t *mbind_nodemask, alloc_nodemask;
+	nodemask_t allocated_mask = NODE_MASK_NONE;
+	int nid;
 
 	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
 	if (mbind_nodemask)
@@ -2455,9 +2464,12 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 			break;
 		/* Add the page to the hugetlb allocator */
 		enqueue_hugetlb_folio(h, folio);
+		node_set(folio_nid(folio), allocated_mask);
 	}
 free:
 	spin_unlock_irq(&hugetlb_lock);
+	for_each_node_mask(nid, allocated_mask)
+		do_zero_free_notify(h, nid);
 
 	/*
 	 * Free unnecessary surplus pages to the buddy allocator.
@@ -2841,6 +2853,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio,
 		 * Folio has been replaced, we can safely free the old one.
 		 */
 		spin_unlock_irq(&hugetlb_lock);
+		do_zero_free_notify(h, folio_nid(new_folio));
 		update_and_free_hugetlb_folio(h, old_folio, false);
 	}
 
diff --git a/mm/hugetlb_internal.h b/mm/hugetlb_internal.h
index 1d2f870deccf..9c60661283c7 100644
--- a/mm/hugetlb_internal.h
+++ b/mm/hugetlb_internal.h
@@ -106,6 +106,12 @@ extern ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
 					   struct hstate *h, int nid,
 					   unsigned long count, size_t len);
 
+#ifdef CONFIG_NUMA
+extern void do_zero_free_notify(struct hstate *h, int nid);
+#else
+static inline void do_zero_free_notify(struct hstate *h, int nid) {}
+#endif
+
 extern void hugetlb_sysfs_init(void) __init;
 
 #ifdef CONFIG_SYSCTL
diff --git a/mm/hugetlb_sysfs.c b/mm/hugetlb_sysfs.c
index 03b774b1191a..77e7214a380e 100644
--- a/mm/hugetlb_sysfs.c
+++ b/mm/hugetlb_sysfs.c
@@ -340,6 +340,7 @@ static bool hugetlb_sysfs_initialized __ro_after_init;
 
 struct node_hstate_item {
 	struct kobject *hstate_kobj;
+	struct work_struct notify_work;
 };
 
 /*
@@ -355,6 +356,21 @@ struct node_hstate {
 };
 static struct node_hstate node_hstates[MAX_NUMNODES];
 
+static void pre_zero_notify_fun(struct work_struct *work)
+{
+	struct node_hstate_item *item =
+		container_of(work, struct node_hstate_item, notify_work);
+
+	sysfs_notify(item->hstate_kobj, NULL, "zeroable_hugepages");
+}
+
+void do_zero_free_notify(struct hstate *h, int nid)
+{
+	struct node_hstate *nhs = &node_hstates[nid];
+
+	schedule_work(&nhs->items[hstate_index(h)].notify_work);
+}
+
 static ssize_t zeroable_hugepages_show(struct kobject *kobj,
 				       struct kobj_attribute *attr, char *buf)
 {
@@ -568,8 +584,11 @@ void hugetlb_register_node(struct node *node)
 		return;
 
 	for_each_hstate(h) {
+		int index = hstate_index(h);
+		struct node_hstate_item *item = &nhs->items[index];
+
 		err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
-						&nhs->items[hstate_index(h)].hstate_kobj,
+						&item->hstate_kobj,
 						&per_node_hstate_attr_group);
 		if (err) {
 			pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
@@ -577,6 +596,7 @@ void hugetlb_register_node(struct node *node)
 			hugetlb_unregister_node(node);
 			break;
 		}
+		INIT_WORK(&item->notify_work, pre_zero_notify_fun);
 	}
 }
 
-- 
2.20.1