From: Li Zhe <lizhe.67@bytedance.com>
Subject: [PATCH 1/8] mm/hugetlb: add pre-zeroed framework
Date: Thu, 25 Dec 2025 16:20:52 +0800
Message-Id: <20251225082059.1632-2-lizhe.67@bytedance.com>
In-Reply-To: <20251225082059.1632-1-lizhe.67@bytedance.com>
References: <20251225082059.1632-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

This patch establishes a pre-zeroing framework by introducing two new
hugetlb page flags and extending the code at every point where these
flags may later be needed. The roles of the two flags are:

(1) HPG_zeroed - indicates that the huge folio has already been zeroed
(2) HPG_zeroing - marks that the huge folio is currently being zeroed

No functional change, as nothing sets the flags yet.

Co-developed-by: Frank van der Linden
Signed-off-by: Frank van der Linden
Signed-off-by: Li Zhe
---
 fs/hugetlbfs/inode.c    |   3 +-
 include/linux/hugetlb.h |  26 +++++++++
 mm/hugetlb.c            | 113 +++++++++++++++++++++++++++++++++++++---
 3 files changed, 133 insertions(+), 9 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3b4c152c5c73..be6b32ab3ca8 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -828,8 +828,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
                        error = PTR_ERR(folio);
                        goto out;
                }
-               folio_zero_user(folio, addr);
-               __folio_mark_uptodate(folio);
+               hugetlb_zero_folio(folio, addr);
                error = hugetlb_add_to_page_cache(folio, mapping, index);
                if (unlikely(error)) {
                        restore_reserve_on_error(h, &pseudo_vma, addr, folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 019a1c5281e4..2daf4422a17d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -584,6 +584,17 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
  * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
  * HPG_raw_hwp_unreliable - Set when the hugetlb page has a hwpoison sub-page
  *     that is not tracked by raw_hwp_page list.
+ * HPG_zeroed - page was pre-zeroed.
+ *     Synchronization: hugetlb_lock held when set by pre-zero thread.
+ *     Only valid to read outside hugetlb_lock once the page is off
+ *     the freelist, and HPG_zeroing is clear. Always cleared when a
+ *     page is put (back) on the freelist.
+ * HPG_zeroing - page is being zeroed by the pre-zero thread.
+ *     Synchronization: set and cleared by the pre-zero thread with
+ *     hugetlb_lock held. Access by others is read-only. Once the page
+ *     is off the freelist, this can only change from set -> clear,
+ *     which the new page owner must wait for. Always cleared
+ *     when a page is put (back) on the freelist.
  */
 enum hugetlb_page_flags {
        HPG_restore_reserve = 0,
@@ -593,6 +604,8 @@ enum hugetlb_page_flags {
        HPG_vmemmap_optimized,
        HPG_raw_hwp_unreliable,
        HPG_cma,
+       HPG_zeroed,
+       HPG_zeroing,
        __NR_HPAGEFLAGS,
 };

@@ -653,6 +666,8 @@ HPAGEFLAG(Freed, freed)
 HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
 HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
 HPAGEFLAG(Cma, cma)
+HPAGEFLAG(Zeroed, zeroed)
+HPAGEFLAG(Zeroing, zeroing)

 #ifdef CONFIG_HUGETLB_PAGE

@@ -678,6 +693,12 @@ struct hstate {
        unsigned int nr_huge_pages_node[MAX_NUMNODES];
        unsigned int free_huge_pages_node[MAX_NUMNODES];
        unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+
+       unsigned int free_huge_pages_zero_node[MAX_NUMNODES];
+
+       /* Queue to wait for a hugetlb folio that is being prezeroed */
+       wait_queue_head_t dqzero_wait[MAX_NUMNODES];
+
        char name[HSTATE_NAME_LEN];
 };

@@ -711,6 +732,7 @@ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
                        pgoff_t idx);
 void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
                        unsigned long address, struct folio *folio);
+void hugetlb_zero_folio(struct folio *folio, unsigned long address);

 /* arch callback */
 int __init __alloc_bootmem_huge_page(struct hstate *h, int nid);
@@ -1303,6 +1325,10 @@ static inline bool hugetlb_bootmem_allocated(void)
 {
        return false;
 }
+
+static inline void hugetlb_zero_folio(struct folio *folio, unsigned long address)
+{
+}
 #endif /* CONFIG_HUGETLB_PAGE */

 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 51273baec9e5..d20614b1c927 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -93,6 +93,8 @@ static int hugetlb_param_index __initdata;
 static __init int hugetlb_add_param(char *s, int (*setup)(char *val));
 static __init void hugetlb_parse_params(void);

+static void hpage_wait_zeroing(struct hstate *h, struct folio *folio);
+
 #define hugetlb_early_param(str, func) \
 static __init int func##args(char *s) \
 { \
@@ -1292,21 +1294,33 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
        hugetlb_dup_vma_private(vma);
 }

+/*
+ * Clear flags for either a fresh page or one that is being
+ * added to the free list.
+ */
+static inline void prep_clear_zeroed(struct folio *folio)
+{
+       folio_clear_hugetlb_zeroed(folio);
+       folio_clear_hugetlb_zeroing(folio);
+}
+
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
        int nid = folio_nid(folio);

        lockdep_assert_held(&hugetlb_lock);
        VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
+       VM_WARN_ON_FOLIO(folio_test_hugetlb_zeroing(folio), folio);

        list_move(&folio->lru, &h->hugepage_freelists[nid]);
        h->free_huge_pages++;
        h->free_huge_pages_node[nid]++;
+       prep_clear_zeroed(folio);
        folio_set_hugetlb_freed(folio);
 }

-static struct folio *dequeue_hugetlb_folio_node_exact(struct hstate *h,
-               int nid)
+static struct folio *dequeue_hugetlb_folio_node_exact(struct hstate *h, int nid,
+               gfp_t gfp_mask)
 {
        struct folio *folio;
        bool pin = !!(current->flags & PF_MEMALLOC_PIN);
@@ -1316,6 +1330,16 @@ static struct folio *dequeue_hugetlb_folio_node_exact(struct hstate *h,
                if (pin && !folio_is_longterm_pinnable(folio))
                        continue;

+               /*
+                * This shouldn't happen, as hugetlb pages are never allocated
+                * with GFP_ATOMIC. But be paranoid and check for it, as
+                * a zero_busy page might cause a sleep later in
+                * hpage_wait_zeroing().
+                */
+               if (WARN_ON_ONCE(folio_test_hugetlb_zeroing(folio) &&
+                                !gfpflags_allow_blocking(gfp_mask)))
+                       continue;
+
                if (folio_test_hwpoison(folio))
                        continue;

@@ -1327,6 +1351,10 @@ static struct folio *dequeue_hugetlb_folio_node_exact(struct hstate *h,
                folio_clear_hugetlb_freed(folio);
                h->free_huge_pages--;
                h->free_huge_pages_node[nid]--;
+               if (folio_test_hugetlb_zeroed(folio) ||
+                   folio_test_hugetlb_zeroing(folio))
+                       h->free_huge_pages_zero_node[nid]--;
+
                return folio;
        }

@@ -1363,7 +1391,7 @@ static struct folio *dequeue_hugetlb_folio_nodemask(struct hstate *h, gfp_t gfp_
                        continue;
                node = zone_to_nid(zone);

-               folio = dequeue_hugetlb_folio_node_exact(h, node);
+               folio = dequeue_hugetlb_folio_node_exact(h, node, gfp_mask);
                if (folio)
                        return folio;
        }
@@ -1490,7 +1518,16 @@ void remove_hugetlb_folio(struct hstate *h, struct folio *folio,
                folio_clear_hugetlb_freed(folio);
                h->free_huge_pages--;
                h->free_huge_pages_node[nid]--;
+               folio_clear_hugetlb_freed(folio);
        }
+       /*
+        * Adjust the zero page counters now. Note that
+        * if a page is currently being zeroed, that
+        * will be waited for in update_and_free_page()
+        */
+       if (folio_test_hugetlb_zeroed(folio) ||
+           folio_test_hugetlb_zeroing(folio))
+               h->free_huge_pages_zero_node[nid]--;
        if (adjust_surplus) {
                h->surplus_huge_pages--;
                h->surplus_huge_pages_node[nid]--;
@@ -1543,6 +1580,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 {
        bool clear_flag = folio_test_hugetlb_vmemmap_optimized(folio);

+       VM_WARN_ON_FOLIO(folio_test_hugetlb_zeroing(folio), folio);
+
        if (hstate_is_gigantic_no_runtime(h))
                return;

@@ -1627,6 +1666,7 @@ static void free_hpage_workfn(struct work_struct *work)
         */
        h = size_to_hstate(folio_size(folio));

+       hpage_wait_zeroing(h, folio);
        __update_and_free_hugetlb_folio(h, folio);

        cond_resched();
@@ -1643,7 +1683,8 @@ static inline void flush_free_hpage_work(struct hstate *h)
 static void update_and_free_hugetlb_folio(struct hstate *h, struct folio *folio,
                                 bool atomic)
 {
-       if (!folio_test_hugetlb_vmemmap_optimized(folio) || !atomic) {
+       if ((!folio_test_hugetlb_zeroing(folio) &&
+            !folio_test_hugetlb_vmemmap_optimized(folio)) || !atomic) {
                __update_and_free_hugetlb_folio(h, folio);
                return;
        }
@@ -1840,6 +1881,13 @@ static void account_new_hugetlb_folio(struct hstate *h, struct folio *folio)
        h->nr_huge_pages_node[folio_nid(folio)]++;
 }

+static void prep_new_hugetlb_folio(struct folio *folio)
+{
+       lockdep_assert_held(&hugetlb_lock);
+       folio_clear_hugetlb_freed(folio);
+       prep_clear_zeroed(folio);
+}
+
 void init_new_hugetlb_folio(struct folio *folio)
 {
        __folio_set_hugetlb(folio);
@@ -1964,6 +2012,7 @@ void prep_and_add_allocated_folios(struct hstate *h,
        /* Add all new pool pages to free lists in one lock cycle */
        spin_lock_irqsave(&hugetlb_lock, flags);
        list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
+               prep_new_hugetlb_folio(folio);
                account_new_hugetlb_folio(h, folio);
                enqueue_hugetlb_folio(h, folio);
        }
@@ -2171,6 +2220,7 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
                return NULL;

        spin_lock_irq(&hugetlb_lock);
+       prep_new_hugetlb_folio(folio);
        /*
         * nr_huge_pages needs to be adjusted within the same lock cycle
         * as surplus_pages, otherwise it might confuse
@@ -2214,6 +2264,7 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask,
                return NULL;

        spin_lock_irq(&hugetlb_lock);
+       prep_new_hugetlb_folio(folio);
        account_new_hugetlb_folio(h, folio);
        spin_unlock_irq(&hugetlb_lock);

@@ -2289,6 +2340,13 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
                                        preferred_nid, nmask);
                if (folio) {
                        spin_unlock_irq(&hugetlb_lock);
+                       /*
+                        * The contents of this page will be completely
+                        * overwritten immediately, as its a migration
+                        * target, so no clearing is needed. Do wait in
+                        * case pre-zero thread was working on it, though.
+                        */
+                       hpage_wait_zeroing(h, folio);
                        return folio;
                }
        }
@@ -2779,6 +2837,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio,
         */
        remove_hugetlb_folio(h, old_folio, false);

+       prep_new_hugetlb_folio(new_folio);
        /*
         * Ref count on new_folio is already zero as it was dropped
         * earlier. It can be directly added to the pool free list.
@@ -2999,6 +3058,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,

        spin_unlock_irq(&hugetlb_lock);

+       hpage_wait_zeroing(h, folio);
+
        hugetlb_set_folio_subpool(folio, spool);

        if (map_chg != MAP_CHG_ENFORCED) {
@@ -3257,6 +3318,7 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
                hugetlb_bootmem_init_migratetype(folio, h);
                /* Subdivide locks to achieve better parallel performance */
                spin_lock_irqsave(&hugetlb_lock, flags);
+               prep_new_hugetlb_folio(folio);
                account_new_hugetlb_folio(h, folio);
                enqueue_hugetlb_folio(h, folio);
                spin_unlock_irqrestore(&hugetlb_lock, flags);
@@ -4190,6 +4252,42 @@ bool __init __attribute((weak)) arch_hugetlb_valid_size(unsigned long size)
        return size == HPAGE_SIZE;
 }

+/*
+ * Zero a hugetlb page.
+ *
+ * The caller has already made sure that the page is not
+ * being actively zeroed out in the background.
+ *
+ * If it wasn't zeroed out, do it ourselves.
+ */
+void hugetlb_zero_folio(struct folio *folio, unsigned long address)
+{
+       if (!folio_test_hugetlb_zeroed(folio))
+               folio_zero_user(folio, address);
+
+       __folio_mark_uptodate(folio);
+}
+
+/*
+ * Once a page has been taken off the freelist, the new page owner
+ * must wait for the pre-zero thread to finish if it happens
+ * to be working on this page (which should be rare).
+ */
+static void hpage_wait_zeroing(struct hstate *h, struct folio *folio)
+{
+       if (!folio_test_hugetlb_zeroing(folio))
+               return;
+
+       spin_lock_irq(&hugetlb_lock);
+
+       wait_event_cmd(h->dqzero_wait[folio_nid(folio)],
+                      !folio_test_hugetlb_zeroing(folio),
+                      spin_unlock_irq(&hugetlb_lock),
+                      spin_lock_irq(&hugetlb_lock));
+
+       spin_unlock_irq(&hugetlb_lock);
+}
+
 void __init hugetlb_add_hstate(unsigned int order)
 {
        struct hstate *h;
@@ -4205,8 +4303,10 @@ void __init hugetlb_add_hstate(unsigned int order)
        __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
        h->order = order;
        h->mask = ~(huge_page_size(h) - 1);
-       for (i = 0; i < MAX_NUMNODES; ++i)
+       for (i = 0; i < MAX_NUMNODES; ++i) {
                INIT_LIST_HEAD(&h->hugepage_freelists[i]);
+               init_waitqueue_head(&h->dqzero_wait[i]);
+       }
        INIT_LIST_HEAD(&h->hugepage_activelist);
        snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
                                        huge_page_size(h)/SZ_1K);
@@ -5804,8 +5904,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
                        ret = 0;
                        goto out;
                }
-               folio_zero_user(folio, vmf->real_address);
-               __folio_mark_uptodate(folio);
+               hugetlb_zero_folio(folio, vmf->address);
                new_folio = true;

                if (vma->vm_flags & VM_MAYSHARE) {
-- 
2.20.1

From: Li Zhe <lizhe.67@bytedance.com>
Subject: [PATCH 2/8] mm/hugetlb: convert to prep_account_new_hugetlb_folio()
Date: Thu, 25 Dec 2025 16:20:53 +0800
Message-Id: <20251225082059.1632-3-lizhe.67@bytedance.com>
In-Reply-To: <20251225082059.1632-1-lizhe.67@bytedance.com>
References: <20251225082059.1632-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

After a huge folio is instantiated, it is always initialized through
successive calls to prep_new_hugetlb_folio() and
account_new_hugetlb_folio(). To eliminate the risk that future changes
update one routine but overlook the other, consolidate the two
functions into a single entry point, prep_account_new_hugetlb_folio().

Signed-off-by: Li Zhe
---
 mm/hugetlb.c | 29 ++++++++++-------------------
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d20614b1c927..63f9369789b5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1874,18 +1874,14 @@ void free_huge_folio(struct folio *folio)
 /*
  * Must be called with the hugetlb lock held
  */
-static void account_new_hugetlb_folio(struct hstate *h, struct folio *folio)
-{
-       lockdep_assert_held(&hugetlb_lock);
-       h->nr_huge_pages++;
-       h->nr_huge_pages_node[folio_nid(folio)]++;
-}
-
-static void prep_new_hugetlb_folio(struct folio *folio)
+static void prep_account_new_hugetlb_folio(struct hstate *h,
+               struct folio *folio)
 {
        lockdep_assert_held(&hugetlb_lock);
        folio_clear_hugetlb_freed(folio);
        prep_clear_zeroed(folio);
+       h->nr_huge_pages++;
+       h->nr_huge_pages_node[folio_nid(folio)]++;
 }

 void init_new_hugetlb_folio(struct folio *folio)
@@ -2012,8 +2008,7 @@ void prep_and_add_allocated_folios(struct hstate *h,
        /* Add all new pool pages to free lists in one lock cycle */
        spin_lock_irqsave(&hugetlb_lock, flags);
        list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
-               prep_new_hugetlb_folio(folio);
-               account_new_hugetlb_folio(h, folio);
+               prep_account_new_hugetlb_folio(h, folio);
                enqueue_hugetlb_folio(h, folio);
        }
        spin_unlock_irqrestore(&hugetlb_lock, flags);
@@ -2220,13 +2215,12 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
                return NULL;

        spin_lock_irq(&hugetlb_lock);
-       prep_new_hugetlb_folio(folio);
        /*
         * nr_huge_pages needs to be adjusted within the same lock cycle
         * as surplus_pages, otherwise it might confuse
         * persistent_huge_pages() momentarily.
         */
-       account_new_hugetlb_folio(h, folio);
+       prep_account_new_hugetlb_folio(h, folio);

        /*
         * We could have raced with the pool size change.
@@ -2264,8 +2258,7 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask,
                return NULL;

        spin_lock_irq(&hugetlb_lock);
-       prep_new_hugetlb_folio(folio);
-       account_new_hugetlb_folio(h, folio);
+       prep_account_new_hugetlb_folio(h, folio);
        spin_unlock_irq(&hugetlb_lock);

        /* fresh huge pages are frozen */
@@ -2831,18 +2824,17 @@ static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio,
        /*
         * Ok, old_folio is still a genuine free hugepage. Remove it from
         * the freelist and decrease the counters. These will be
-        * incremented again when calling account_new_hugetlb_folio()
+        * incremented again when calling prep_account_new_hugetlb_folio()
         * and enqueue_hugetlb_folio() for new_folio. The counters will
         * remain stable since this happens under the lock.
         */
        remove_hugetlb_folio(h, old_folio, false);

-       prep_new_hugetlb_folio(new_folio);
        /*
         * Ref count on new_folio is already zero as it was dropped
         * earlier. It can be directly added to the pool free list.
         */
-       account_new_hugetlb_folio(h, new_folio);
+       prep_account_new_hugetlb_folio(h, new_folio);
        enqueue_hugetlb_folio(h, new_folio);

        /*
@@ -3318,8 +3310,7 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
                hugetlb_bootmem_init_migratetype(folio, h);
                /* Subdivide locks to achieve better parallel performance */
                spin_lock_irqsave(&hugetlb_lock, flags);
-               prep_new_hugetlb_folio(folio);
-               account_new_hugetlb_folio(h, folio);
+               prep_account_new_hugetlb_folio(h, folio);
                enqueue_hugetlb_folio(h, folio);
                spin_unlock_irqrestore(&hugetlb_lock, flags);
        }
-- 
2.20.1

From: Li Zhe <lizhe.67@bytedance.com>
Subject: [PATCH 3/8] mm/hugetlb: move the huge folio to the end of the list during enqueue
Date: Thu, 25 Dec 2025 16:20:54 +0800
Message-Id: <20251225082059.1632-4-lizhe.67@bytedance.com>
In-Reply-To: <20251225082059.1632-1-lizhe.67@bytedance.com>
References: <20251225082059.1632-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

Insert unzeroed huge folios at the tail of the per-node free list. A
follow-on patch will place pre-zeroed folios at the head, so that an
allocation can find a pre-zeroed huge folio with minimal searching.
Placing newly zeroed pages at the head of the queue also means they are
handed out first by the next allocation, which helps keep the cache
hot.

Signed-off-by: Li Zhe
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 63f9369789b5..8d36487659f8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1312,7 +1312,7 @@ static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
        VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
        VM_WARN_ON_FOLIO(folio_test_hugetlb_zeroing(folio), folio);

-       list_move(&folio->lru, &h->hugepage_freelists[nid]);
+       list_move_tail(&folio->lru, &h->hugepage_freelists[nid]);
        h->free_huge_pages++;
        h->free_huge_pages_node[nid]++;
        prep_clear_zeroed(folio);
-- 
2.20.1

From: Li Zhe <lizhe.67@bytedance.com>
Subject: [PATCH 4/8] mm/hugetlb: introduce per-node sysfs interface "zeroable_hugepages"
Date: Thu, 25 Dec 2025 16:20:55 +0800
Message-Id: <20251225082059.1632-5-lizhe.67@bytedance.com>
In-Reply-To: <20251225082059.1632-1-lizhe.67@bytedance.com>
References: <20251225082059.1632-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

Fresh hugetlb pages are zeroed out when they are faulted in, just like
all other page types. For larger page sizes this can take a significant
amount of time (around 40 milliseconds for a 1G page on a recent
AMD-based system). This normally isn't a problem, since hugetlb pages
are typically mapped by the application for a long time and the initial
delay when touching them isn't much of an issue.

However, there are use cases where a large number of hugetlb pages are
touched when an application (such as a VM backed by these pages)
starts. For 256 1G pages at 40ms per page, this adds up to roughly
10 seconds, a noticeable delay.

This patch adds a new zeroable_hugepages file under each
/sys/devices/system/node/node*/hugepages/hugepages-***kB directory.
Reading it returns the number of huge folios of the corresponding size
on that node that are eligible for pre-zeroing. Writing an integer x in
the range [0, max] (or the string "max") requests that x huge pages be
zeroed on demand.

Exporting this interface offers the following advantages:

(1) User space gains full control over when zeroing is triggered,
    enabling it to minimize the impact on both CPU and cache
    utilization.
(2) Applications can spawn as many zeroing processes as they need,
    enabling concurrent background zeroing.
(3) By binding a zeroing process to specific CPUs, users can confine
    zeroing to cores that do not run latency-critical tasks,
    eliminating interference.
(4) A zeroing process can be interrupted at any time through standard
    signal mechanisms, allowing immediate cancellation.
(5) The CPU consumption incurred by zeroing can be throttled and
    contained with cgroups, so the cost is not borne system-wide.

On an AMD Milan platform, this shortens each 1 GB huge-page fault by at
least 25628 us (figure taken from the test results cited in [1]).

[1]: https://lore.kernel.org/linux-mm/202412030519.W14yll4e-lkp@intel.com/T/#t
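
As a rough illustration of the intended user-space flow (not part of
this patch), the sketch below writes "max" to the new file to pre-zero
every eligible huge folio on node 0 and then reads back the remaining
count. The exact path, including the hugepages-1048576kB directory for
1G pages and the presence of node 0, is an assumption based on the
description above.

/*
 * Hypothetical user-space sketch; the sysfs path and page size are
 * assumptions, not part of this patch.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define ZEROABLE \
	"/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/zeroable_hugepages"

int main(void)
{
	char buf[64];
	ssize_t n;
	int fd;

	fd = open(ZEROABLE, O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Request that all eligible huge folios on this node be pre-zeroed. */
	if (write(fd, "max", strlen("max")) < 0)
		perror("write");

	/* Read back how many folios are still waiting to be pre-zeroed. */
	if (lseek(fd, 0, SEEK_SET) == 0 &&
	    (n = read(fd, buf, sizeof(buf) - 1)) > 0) {
		buf[n] = '\0';
		printf("still zeroable: %s", buf);
	}

	close(fd);
	return 0;
}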
Co-developed-by: Frank van der Linden
Signed-off-by: Frank van der Linden
Signed-off-by: Li Zhe
---
 mm/hugetlb_sysfs.c | 120 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

diff --git a/mm/hugetlb_sysfs.c b/mm/hugetlb_sysfs.c
index 79ece91406bf..8c3e433209c3 100644
--- a/mm/hugetlb_sysfs.c
+++ b/mm/hugetlb_sysfs.c
@@ -352,6 +352,125 @@ struct node_hstate {
 };
 static struct node_hstate node_hstates[MAX_NUMNODES];

+static ssize_t zeroable_hugepages_show(struct kobject *kobj,
+               struct kobj_attribute *attr, char *buf)
+{
+       struct hstate *h;
+       unsigned long free_huge_pages_zero;
+       int nid;
+
+       h = kobj_to_hstate(kobj, &nid);
+       if (WARN_ON(nid == NUMA_NO_NODE))
+               return -EPERM;
+
+       free_huge_pages_zero = h->free_huge_pages_node[nid] -
+                               h->free_huge_pages_zero_node[nid];
+
+       return sprintf(buf, "%lu\n", free_huge_pages_zero);
+}
+
+static inline bool zero_should_abort(struct hstate *h, int nid)
+{
+       return (h->free_huge_pages_zero_node[nid] ==
+                       h->free_huge_pages_node[nid]) ||
+               list_empty(&h->hugepage_freelists[nid]);
+}
+
+static void zero_free_hugepages_nid(struct hstate *h,
+               int nid, unsigned int nr_zero)
+{
+       struct list_head *freelist = &h->hugepage_freelists[nid];
+       unsigned int nr_zerod = 0;
+       struct folio *folio;
+
+       if (zero_should_abort(h, nid))
+               return;
+
+       spin_lock_irq(&hugetlb_lock);
+
+       while (nr_zerod < nr_zero) {
+
+               if (zero_should_abort(h, nid) || fatal_signal_pending(current))
+                       break;
+
+               freelist = freelist->prev;
+               if (unlikely(list_is_head(freelist, &h->hugepage_freelists[nid])))
+                       break;
+               folio = list_entry(freelist, struct folio, lru);
+
+               if (folio_test_hugetlb_zeroed(folio) ||
+                   folio_test_hugetlb_zeroing(folio))
+                       continue;
+
+               folio_set_hugetlb_zeroing(folio);
+
+               /*
+                * Incrementing this here is a bit of a fib, since
+                * the page hasn't been cleared yet (it will be done
+                * immediately after dropping the lock below). But
+                * it keeps the count consistent with the overall
+                * free count in case the page gets taken off the
+                * freelist while we're working on it.
+                */
+               h->free_huge_pages_zero_node[nid]++;
+               spin_unlock_irq(&hugetlb_lock);
+
+               /*
+                * HWPoison pages may show up on the freelist.
+                * Don't try to zero it out, but do set the flag
+                * and counts, so that we don't consider it again.
+                */
+               if (!folio_test_hwpoison(folio))
+                       folio_zero_user(folio, 0);
+
+               cond_resched();
+
+               spin_lock_irq(&hugetlb_lock);
+               folio_set_hugetlb_zeroed(folio);
+               folio_clear_hugetlb_zeroing(folio);
+
+               /*
+                * If the page is still on the free list, move
+                * it to the head.
+                */
+               if (folio_test_hugetlb_freed(folio))
+                       list_move(&folio->lru, &h->hugepage_freelists[nid]);
+
+               /*
+                * If someone was waiting for the zero to
+                * finish, wake them up.
+                */
+               if (waitqueue_active(&h->dqzero_wait[nid]))
+                       wake_up(&h->dqzero_wait[nid]);
+               nr_zerod++;
+               freelist = &h->hugepage_freelists[nid];
+       }
+       spin_unlock_irq(&hugetlb_lock);
+}
+
+static ssize_t zeroable_hugepages_store(struct kobject *kobj,
+               struct kobj_attribute *attr, const char *buf, size_t len)
+{
+       unsigned int nr_zero;
+       struct hstate *h;
+       int err;
+       int nid;
+
+       if (!strcmp(buf, "max") || !strcmp(buf, "max\n")) {
+               nr_zero = UINT_MAX;
+       } else {
+               err = kstrtouint(buf, 10, &nr_zero);
+               if (err)
+                       return err;
+       }
+       h = kobj_to_hstate(kobj, &nid);
+
+       zero_free_hugepages_nid(h, nid, nr_zero);
+
+       return len;
+}
+HSTATE_ATTR(zeroable_hugepages);
+
 /*
  * A subset of global hstate attributes for node devices
  */
@@ -359,6 +478,7 @@ static struct attribute *per_node_hstate_attrs[] = {
        &nr_hugepages_attr.attr,
        &free_hugepages_attr.attr,
        &surplus_hugepages_attr.attr,
+       &zeroable_hugepages_attr.attr,
        NULL,
 };

-- 
2.20.1

From: Li Zhe <lizhe.67@bytedance.com>
Subject: [PATCH 5/8] mm/hugetlb: simplify function hugetlb_sysfs_add_hstate()
Date: Thu, 25 Dec 2025 16:20:56 +0800
Message-Id: <20251225082059.1632-6-lizhe.67@bytedance.com>
In-Reply-To: <20251225082059.1632-1-lizhe.67@bytedance.com>
References: <20251225082059.1632-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

The third parameter of hugetlb_sysfs_add_hstate() is currently an array
of struct kobject *, yet the function only ever uses a single element.
Narrow the argument to a pointer to that specific element, eliminating
the unused array.

Signed-off-by: Li Zhe
---
 mm/hugetlb_sysfs.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/mm/hugetlb_sysfs.c b/mm/hugetlb_sysfs.c
index 8c3e433209c3..87dcd3038abc 100644
--- a/mm/hugetlb_sysfs.c
+++ b/mm/hugetlb_sysfs.c
@@ -304,31 +304,30 @@ static const struct attribute_group hstate_demote_attr_group = {
 };

 static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
-                                   struct kobject **hstate_kobjs,
+                                   struct kobject **hstate_kobj,
                                    const struct attribute_group *hstate_attr_group)
 {
        int retval;
-       int hi = hstate_index(h);

-       hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
-       if (!hstate_kobjs[hi])
+       *hstate_kobj = kobject_create_and_add(h->name, parent);
+       if (!*hstate_kobj)
                return -ENOMEM;

-       retval = sysfs_create_group(hstate_kobjs[hi], hstate_attr_group);
+       retval = sysfs_create_group(*hstate_kobj, hstate_attr_group);
        if (retval) {
-               kobject_put(hstate_kobjs[hi]);
-               hstate_kobjs[hi] = NULL;
+               kobject_put(*hstate_kobj);
+               *hstate_kobj = NULL;
                return retval;
        }

        if (h->demote_order) {
-               retval = sysfs_create_group(hstate_kobjs[hi],
+               retval = sysfs_create_group(*hstate_kobj,
                                            &hstate_demote_attr_group);
                if (retval) {
                        pr_warn("HugeTLB unable to create demote interfaces for %s\n", h->name);
-                       sysfs_remove_group(hstate_kobjs[hi], hstate_attr_group);
-                       kobject_put(hstate_kobjs[hi]);
-                       hstate_kobjs[hi] = NULL;
+                       sysfs_remove_group(*hstate_kobj, hstate_attr_group);
+                       kobject_put(*hstate_kobj);
+                       *hstate_kobj = NULL;
                        return retval;
                }
        }
@@ -562,8 +561,8 @@ void hugetlb_register_node(struct node *node)

        for_each_hstate(h) {
                err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
-                                              nhs->hstate_kobjs,
-                                              &per_node_hstate_attr_group);
+                                              &nhs->hstate_kobjs[hstate_index(h)],
+                                              &per_node_hstate_attr_group);
                if (err) {
                        pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
                               h->name, node->dev.id);
@@ -610,7 +609,7 @@ void __init hugetlb_sysfs_init(void)

        for_each_hstate(h) {
                err = hugetlb_sysfs_add_hstate(h, hugepages_kobj,
-                                              hstate_kobjs, &hstate_attr_group);
+                                              &hstate_kobjs[hstate_index(h)], &hstate_attr_group);
                if (err)
                        pr_err("HugeTLB: Unable to add hstate %s\n", h->name);
        }
-- 
2.20.1

From: Li Zhe <lizhe.67@bytedance.com>
Subject: [PATCH 6/8] mm/hugetlb: relocate the per-hstate struct kobject pointer
Date: Thu, 25 Dec 2025 16:20:57 +0800
Message-Id: <20251225082059.1632-7-lizhe.67@bytedance.com>
In-Reply-To: <20251225082059.1632-1-lizhe.67@bytedance.com>
References: <20251225082059.1632-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

Relocate the per-hstate struct kobject pointer from struct node_hstate
into a standalone structure. This prepares for a later patch that adds
epoll support to the "zeroable_hugepages" interface: when a huge folio
is freed we must emit an event, but the freeing context may be atomic,
so the notification is delegated to a workqueue. Keeping the struct
kobject pointer in its own per-hstate item lets the workqueue callback
look it up easily.

Signed-off-by: Li Zhe
---
 mm/hugetlb_sysfs.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb_sysfs.c b/mm/hugetlb_sysfs.c
index 87dcd3038abc..08ad39d3e022 100644
--- a/mm/hugetlb_sysfs.c
+++ b/mm/hugetlb_sysfs.c
@@ -338,6 +338,10 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 #ifdef CONFIG_NUMA
 static bool hugetlb_sysfs_initialized __ro_after_init;

+struct node_hstate_item {
+       struct kobject *hstate_kobj;
+};
+
 /*
  * node_hstate/s - associate per node hstate attributes, via their kobjects,
  * with node devices in node_devices[] using a parallel array. The array
@@ -347,7 +351,7 @@ static bool hugetlb_sysfs_initialized __ro_after_init;
  */
 struct node_hstate {
        struct kobject *hugepages_kobj;
-       struct kobject *hstate_kobjs[HUGE_MAX_HSTATE];
+       struct node_hstate_item items[HUGE_MAX_HSTATE];
 };
 static struct node_hstate node_hstates[MAX_NUMNODES];

@@ -497,7 +501,7 @@ static struct hstate *kobj_to_node_hstate(struct kobject *kobj, int *nidp)
                struct node_hstate *nhs = &node_hstates[nid];
                int i;
                for (i = 0; i < HUGE_MAX_HSTATE; i++)
-                       if (nhs->hstate_kobjs[i] == kobj) {
+                       if (nhs->items[i].hstate_kobj == kobj) {
                                if (nidp)
                                        *nidp = nid;
                                return &hstates[i];
@@ -522,7 +526,7 @@ void hugetlb_unregister_node(struct node *node)

        for_each_hstate(h) {
                int idx = hstate_index(h);
-               struct kobject *hstate_kobj = nhs->hstate_kobjs[idx];
+               struct kobject *hstate_kobj = nhs->items[idx].hstate_kobj;

                if (!hstate_kobj)
                        continue;
@@ -530,7 +534,7 @@ void hugetlb_unregister_node(struct node *node)
                sysfs_remove_group(hstate_kobj, &hstate_demote_attr_group);
                sysfs_remove_group(hstate_kobj, &per_node_hstate_attr_group);
                kobject_put(hstate_kobj);
-               nhs->hstate_kobjs[idx] = NULL;
+               nhs->items[idx].hstate_kobj = NULL;
        }

        kobject_put(nhs->hugepages_kobj);
@@ -561,7 +565,7 @@ void hugetlb_register_node(struct node *node)

        for_each_hstate(h) {
                err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
-                                              &nhs->hstate_kobjs[hstate_index(h)],
+                                              &nhs->items[hstate_index(h)].hstate_kobj,
                                               &per_node_hstate_attr_group);
                if (err) {
                        pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
-- 
2.20.1

From: Li Zhe <lizhe.67@bytedance.com>
Subject: [PATCH 7/8] mm/hugetlb: add epoll support for interface "zeroable_hugepages"
Date: Thu, 25 Dec 2025 16:20:58 +0800
Message-Id: <20251225082059.1632-8-lizhe.67@bytedance.com>
In-Reply-To: <20251225082059.1632-1-lizhe.67@bytedance.com>
References: <20251225082059.1632-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

Add epoll support for the "zeroable_hugepages" interface. When no huge
folios are currently available for pre-zeroing, user space can block on
the zeroable_hugepages file with epoll; it is woken as soon as one or
more huge folios become eligible for pre-zeroing.

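A rough sketch of the intended consumer loop (not part of this patch;
the sysfs path below is an assumption, see patch 4): sysfs attribute
notifications are generally surfaced to pollers as EPOLLPRI/EPOLLERR,
and the attribute is typically read once before waiting and re-read
after every wakeup so the event is rearmed.

/*
 * Hypothetical consumer: block until huge folios become eligible for
 * pre-zeroing, then trigger zeroing by writing "max". The path and the
 * 1G page size are assumptions.
 */
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define ZEROABLE \
	"/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/zeroable_hugepages"

int main(void)
{
	struct epoll_event ev = { .events = EPOLLPRI | EPOLLERR };
	char buf[64];
	int fd, ep;

	fd = open(ZEROABLE, O_RDWR);
	ep = epoll_create1(0);
	if (fd < 0 || ep < 0)
		return 1;

	read(fd, buf, sizeof(buf));	/* initial read before polling */
	ev.data.fd = fd;
	epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);

	for (;;) {
		if (epoll_wait(ep, &ev, 1, -1) < 1)
			continue;
		/* sysfs_notify() fired: re-read to rearm, then request zeroing. */
		lseek(fd, 0, SEEK_SET);
		read(fd, buf, sizeof(buf) - 1);
		lseek(fd, 0, SEEK_SET);
		write(fd, "max", strlen("max"));
	}
}
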
Signed-off-by: Li Zhe
---
 mm/hugetlb.c          | 13 +++++++++++++
 mm/hugetlb_internal.h |  6 ++++++
 mm/hugetlb_sysfs.c    | 22 +++++++++++++++++++++-
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8d36487659f8..c2df0317fe15 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1868,6 +1868,7 @@ void free_huge_folio(struct folio *folio)
                arch_clear_hugetlb_flags(folio);
                enqueue_hugetlb_folio(h, folio);
                spin_unlock_irqrestore(&hugetlb_lock, flags);
+               do_zero_free_notify(h, folio_nid(folio));
        }
 }

@@ -1999,8 +2000,10 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 void prep_and_add_allocated_folios(struct hstate *h,
                                   struct list_head *folio_list)
 {
+       nodemask_t allocated_mask = NODE_MASK_NONE;
        unsigned long flags;
        struct folio *folio, *tmp_f;
+       int nid;

        /* Send list for bulk vmemmap optimization processing */
        hugetlb_vmemmap_optimize_folios(h, folio_list);
@@ -2010,8 +2013,12 @@ void prep_and_add_allocated_folios(struct hstate *h,
        list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
                prep_account_new_hugetlb_folio(h, folio);
                enqueue_hugetlb_folio(h, folio);
+               node_set(folio_nid(folio), allocated_mask);
        }
        spin_unlock_irqrestore(&hugetlb_lock, flags);
+
+       for_each_node_mask(nid, allocated_mask)
+               do_zero_free_notify(h, nid);
 }

 /*
@@ -2383,6 +2390,8 @@ static int gather_surplus_pages(struct hstate *h, long delta)
        long needed, allocated;
        bool alloc_ok = true;
        nodemask_t *mbind_nodemask, alloc_nodemask;
+       nodemask_t allocated_mask = NODE_MASK_NONE;
+       int nid;

        mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
        if (mbind_nodemask)
@@ -2455,9 +2464,12 @@ static int gather_surplus_pages(struct hstate *h, long delta)
                        break;
                /* Add the page to the hugetlb allocator */
                enqueue_hugetlb_folio(h, folio);
+               node_set(folio_nid(folio), allocated_mask);
        }
 free:
        spin_unlock_irq(&hugetlb_lock);
+       for_each_node_mask(nid, allocated_mask)
+               do_zero_free_notify(h, nid);

        /*
         * Free unnecessary surplus pages to the buddy allocator.
@@ -2841,6 +2853,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio,
         * Folio has been replaced, we can safely free the old one.
         */
        spin_unlock_irq(&hugetlb_lock);
+       do_zero_free_notify(h, folio_nid(new_folio));
        update_and_free_hugetlb_folio(h, old_folio, false);
 }

diff --git a/mm/hugetlb_internal.h b/mm/hugetlb_internal.h
index 1d2f870deccf..9c60661283c7 100644
--- a/mm/hugetlb_internal.h
+++ b/mm/hugetlb_internal.h
@@ -106,6 +106,12 @@ extern ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
                                           struct hstate *h, int nid,
                                           unsigned long count, size_t len);

+#ifdef CONFIG_NUMA
+extern void do_zero_free_notify(struct hstate *h, int nid);
+#else
+static inline void do_zero_free_notify(struct hstate *h, int nid) {}
+#endif
+
 extern void hugetlb_sysfs_init(void) __init;

 #ifdef CONFIG_SYSCTL
diff --git a/mm/hugetlb_sysfs.c b/mm/hugetlb_sysfs.c
index 08ad39d3e022..c063237249f6 100644
--- a/mm/hugetlb_sysfs.c
+++ b/mm/hugetlb_sysfs.c
@@ -340,6 +340,7 @@ static bool hugetlb_sysfs_initialized __ro_after_init;

 struct node_hstate_item {
        struct kobject *hstate_kobj;
+       struct work_struct notify_work;
 };

 /*
@@ -355,6 +356,21 @@ struct node_hstate {
 };
 static struct node_hstate node_hstates[MAX_NUMNODES];

+static void pre_zero_notify_fun(struct work_struct *work)
+{
+       struct node_hstate_item *item =
+               container_of(work, struct node_hstate_item, notify_work);
+
+       sysfs_notify(item->hstate_kobj, NULL, "zeroable_hugepages");
+}
+
+void do_zero_free_notify(struct hstate *h, int nid)
+{
+       struct node_hstate *nhs = &node_hstates[nid];
+
+       schedule_work(&nhs->items[hstate_index(h)].notify_work);
+}
+
 static ssize_t zeroable_hugepages_show(struct kobject *kobj,
                struct kobj_attribute *attr, char *buf)
 {
@@ -564,8 +580,11 @@ void hugetlb_register_node(struct node *node)
                return;

        for_each_hstate(h) {
+               int index = hstate_index(h);
+               struct node_hstate_item *item = &nhs->items[index];
+
                err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
-                                              &nhs->items[hstate_index(h)].hstate_kobj,
+                                              &item->hstate_kobj,
                                               &per_node_hstate_attr_group);
                if (err) {
                        pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
@@ -573,6 +592,7 @@ void hugetlb_register_node(struct node *node)
                        hugetlb_unregister_node(node);
                        break;
                }
+               INIT_WORK(&item->notify_work, pre_zero_notify_fun);
        }
 }

-- 
2.20.1

From: Li Zhe <lizhe.67@bytedance.com>
Subject: [PATCH 8/8] mm/hugetlb: limit event generation frequency of function do_zero_free_notify()
Date: Thu, 25 Dec 2025 16:20:59 +0800
Message-Id: <20251225082059.1632-9-lizhe.67@bytedance.com>
In-Reply-To: <20251225082059.1632-1-lizhe.67@bytedance.com>
References: <20251225082059.1632-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

Throttle notifications so that notify_work is scheduled at most once
per short interval. This makes the mechanism far more efficient when
huge numbers of huge folios are freed in rapid succession.

Signed-off-by: Li Zhe
---
 mm/hugetlb_sysfs.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb_sysfs.c b/mm/hugetlb_sysfs.c
index c063237249f6..dd47d48fe910 100644
--- a/mm/hugetlb_sysfs.c
+++ b/mm/hugetlb_sysfs.c
@@ -341,6 +341,8 @@ static bool hugetlb_sysfs_initialized __ro_after_init;
 struct node_hstate_item {
        struct kobject *hstate_kobj;
        struct work_struct notify_work;
+       unsigned long notified_at;
+       spinlock_t notify_lock;
 };

 /*
@@ -364,11 +366,30 @@ static void pre_zero_notify_fun(struct work_struct *work)
        sysfs_notify(item->hstate_kobj, NULL, "zeroable_hugepages");
 }

+static void __do_zero_free_notify(struct node_hstate_item *item)
+{
+       unsigned long last;
+       unsigned long next;
+
+#define PRE_ZERO_NOTIFY_MIN_INTV       DIV_ROUND_UP(HZ, 100)
+       spin_lock(&item->notify_lock);
+       last = item->notified_at;
+       next = last + PRE_ZERO_NOTIFY_MIN_INTV;
+       if (time_in_range(jiffies, last, next)) {
+               spin_unlock(&item->notify_lock);
+               return;
+       }
+       item->notified_at = jiffies;
+       spin_unlock(&item->notify_lock);
+
+       schedule_work(&item->notify_work);
+}
+
 void do_zero_free_notify(struct hstate *h, int nid)
 {
        struct node_hstate *nhs = &node_hstates[nid];

-       schedule_work(&nhs->items[hstate_index(h)].notify_work);
+       __do_zero_free_notify(&nhs->items[hstate_index(h)]);
 }

 static ssize_t zeroable_hugepages_show(struct kobject *kobj,
@@ -593,6 +614,8 @@ void hugetlb_register_node(struct node *node)
                        break;
                }
                INIT_WORK(&item->notify_work, pre_zero_notify_fun);
+               item->notified_at = jiffies;
+               spin_lock_init(&item->notify_lock);
        }
 }

-- 
2.20.1