From nobody Thu Oct 2 20:46:42 2025
From: Nico Pache <npache@redhat.com>
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
 dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
 mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org,
 willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com,
 usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com,
 thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
 kas@kernel.org, aarcange@redhat.com, raquini@redhat.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de,
 will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
 jglisse@google.com, surenb@google.com, zokeefe@google.com,
 hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
 rdunlap@infradead.org, hughd@google.com, richard.weiyang@gmail.com,
 lance.yang@linux.dev, vbabka@suse.cz, rppt@kernel.org, jannh@google.com,
 pfalcato@suse.de
Subject: [PATCH v11 01/15] khugepaged: rename hpage_collapse_* to collapse_*
Date: Thu, 11 Sep 2025 21:27:56 -0600
Message-ID: <20250912032810.197475-2-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The hpage_collapse functions describe functions used by madvise_collapse
and khugepaged. Remove the unnecessary hpage prefix to shorten the
function names.

Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 73 ++++++++++++++++++++++++-------------------------
 mm/mremap.c     |  2 +-
 2 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index af5f5c80fe4e..40fa6e0a6b2d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -401,14 +401,14 @@ void __init khugepaged_destroy(void)
 	kmem_cache_destroy(mm_slot_cache);
 }
 
-static inline int hpage_collapse_test_exit(struct mm_struct *mm)
+static inline int collapse_test_exit(struct mm_struct *mm)
 {
 	return atomic_read(&mm->mm_users) == 0;
 }
 
-static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
+static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
 {
-	return hpage_collapse_test_exit(mm) ||
+	return collapse_test_exit(mm) ||
 	       mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
 }
 
@@ -443,7 +443,7 @@ void __khugepaged_enter(struct mm_struct *mm)
 	int wakeup;
 
 	/* __khugepaged_exit() must not run from under us */
-	VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm);
+	VM_BUG_ON_MM(collapse_test_exit(mm), mm);
 	if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
 		return;
 
@@ -501,7 +501,7 @@ void __khugepaged_exit(struct mm_struct *mm)
 	} else if (mm_slot) {
 		/*
 		 * This is required to serialize against
-		 * hpage_collapse_test_exit() (which is guaranteed to run
+		 * collapse_test_exit() (which is guaranteed to run
 		 * under mmap sem read mode). Stop here (after we return all
 		 * pagetables will be destroyed) until khugepaged has finished
 		 * working on the pagetables under the mmap_lock.
@@ -590,7 +590,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		folio = page_folio(page);
 		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
 
-		/* See hpage_collapse_scan_pmd(). */
+		/* See collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
 			++shared;
 			if (cc->is_khugepaged &&
@@ -841,7 +841,7 @@ struct collapse_control khugepaged_collapse_control = {
 	.is_khugepaged = true,
 };
 
-static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc)
+static bool collapse_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;
 
@@ -876,7 +876,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }
 
 #ifdef CONFIG_NUMA
-static int hpage_collapse_find_target_node(struct collapse_control *cc)
+static int collapse_find_target_node(struct collapse_control *cc)
 {
 	int nid, target_node = 0, max_value = 0;
 
@@ -895,7 +895,7 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
 	return target_node;
 }
 #else
-static int hpage_collapse_find_target_node(struct collapse_control *cc)
+static int collapse_find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -916,7 +916,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	enum tva_type type = cc->is_khugepaged ? TVA_KHUGEPAGED :
 				TVA_FORCED_COLLAPSE;
 
-	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+	if (unlikely(collapse_test_exit_or_disable(mm)))
 		return SCAN_ANY_PROCESS;
 
 	*vmap = vma = find_vma(mm, address);
@@ -989,7 +989,7 @@ static int check_pmd_still_valid(struct mm_struct *mm,
 
 /*
  * Bring missing pages in from swap, to complete THP collapse.
- * Only done if hpage_collapse_scan_pmd believes it is worthwhile.
+ * Only done if khugepaged_scan_pmd believes it is worthwhile.
 *
 * Called and returns without pte mapped or spinlocks held.
 * Returns result: if not SCAN_SUCCEED, mmap_lock has been released.
@@ -1075,7 +1075,7 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
 {
 	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
 		     GFP_TRANSHUGE);
-	int node = hpage_collapse_find_target_node(cc);
+	int node = collapse_find_target_node(cc);
 	struct folio *folio;
 
 	folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
@@ -1261,10 +1261,10 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	return result;
 }
 
-static int hpage_collapse_scan_pmd(struct mm_struct *mm,
-				   struct vm_area_struct *vma,
-				   unsigned long address, bool *mmap_locked,
-				   struct collapse_control *cc)
+static int collapse_scan_pmd(struct mm_struct *mm,
+			     struct vm_area_struct *vma,
+			     unsigned long address, bool *mmap_locked,
+			     struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1372,7 +1372,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		 * hit record.
 		 */
 		node = folio_nid(folio);
-		if (hpage_collapse_scan_abort(node, cc)) {
+		if (collapse_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
@@ -1439,7 +1439,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
 
 	lockdep_assert_held(&khugepaged_mm_lock);
 
-	if (hpage_collapse_test_exit(mm)) {
+	if (collapse_test_exit(mm)) {
 		/* free mm_slot */
 		hash_del(&slot->hash);
 		list_del(&slot->mm_node);
@@ -1741,7 +1741,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
 			continue;
 
-		if (hpage_collapse_test_exit(mm))
+		if (collapse_test_exit(mm))
 			continue;
 		/*
 		 * When a vma is registered with uffd-wp, we cannot recycle
@@ -2263,9 +2263,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
-				    struct file *file, pgoff_t start,
-				    struct collapse_control *cc)
+static int collapse_scan_file(struct mm_struct *mm, unsigned long addr,
+			      struct file *file, pgoff_t start,
+			      struct collapse_control *cc)
 {
 	struct folio *folio = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2320,7 +2320,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 		}
 
 		node = folio_nid(folio);
-		if (hpage_collapse_scan_abort(node, cc)) {
+		if (collapse_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			folio_put(folio);
 			break;
@@ -2370,7 +2370,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
+static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
 					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
@@ -2408,7 +2408,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		goto breakouterloop_mmap_lock;
 
 	progress++;
-	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+	if (unlikely(collapse_test_exit_or_disable(mm)))
 		goto breakouterloop;
 
 	vma_iter_init(&vmi, mm, khugepaged_scan.address);
@@ -2416,7 +2416,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		unsigned long hstart, hend;
 
 		cond_resched();
-		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
+		if (unlikely(collapse_test_exit_or_disable(mm))) {
 			progress++;
 			break;
 		}
@@ -2437,7 +2437,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 			bool mmap_locked = true;
 
 			cond_resched();
-			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+			if (unlikely(collapse_test_exit_or_disable(mm)))
 				goto breakouterloop;
 
 			VM_BUG_ON(khugepaged_scan.address < hstart ||
@@ -2450,12 +2450,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 
 				mmap_read_unlock(mm);
 				mmap_locked = false;
-				*result = hpage_collapse_scan_file(mm,
+				*result = collapse_scan_file(mm,
 					khugepaged_scan.address, file, pgoff, cc);
 				fput(file);
 				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
 					mmap_read_lock(mm);
-					if (hpage_collapse_test_exit_or_disable(mm))
+					if (collapse_test_exit_or_disable(mm))
 						goto breakouterloop;
 					*result = collapse_pte_mapped_thp(mm,
 						khugepaged_scan.address, false);
@@ -2464,7 +2464,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 					mmap_read_unlock(mm);
 				}
 			} else {
-				*result = hpage_collapse_scan_pmd(mm, vma,
+				*result = collapse_scan_pmd(mm, vma,
 					khugepaged_scan.address, &mmap_locked, cc);
 			}
 
@@ -2497,7 +2497,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		 * Release the current mm_slot if this mm is about to die, or
 		 * if we scanned all vmas of this mm.
 		 */
-		if (hpage_collapse_test_exit(mm) || !vma) {
+		if (collapse_test_exit(mm) || !vma) {
 			/*
 			 * Make sure that if mm_users is reaching zero while
 			 * khugepaged runs here, khugepaged_exit will find
@@ -2550,8 +2550,8 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 		pass_through_head++;
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
-			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &result, cc);
+			progress += collapse_scan_mm_slot(pages - progress,
+							  &result, cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
@@ -2792,12 +2792,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 
 			mmap_read_unlock(mm);
 			mmap_locked = false;
-			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
-							  cc);
+			result = collapse_scan_file(mm, addr, file, pgoff, cc);
 			fput(file);
 		} else {
-			result = hpage_collapse_scan_pmd(mm, vma, addr,
-							 &mmap_locked, cc);
+			result = collapse_scan_pmd(mm, vma, addr,
+						   &mmap_locked, cc);
 		}
 		if (!mmap_locked)
 			*lock_dropped = true;
diff --git a/mm/mremap.c b/mm/mremap.c
index a562d8cf1eee..c461758c47f5 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -241,7 +241,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 		goto out;
 	}
 	/*
-	 * Now new_pte is none, so hpage_collapse_scan_file() path can not find
+	 * Now new_pte is none, so collapse_scan_file() path can not find
 	 * this by traversing file->f_mapping, so there is no concurrency with
 	 * retract_page_tables(). In addition, we already hold the exclusive
 	 * mmap_lock, so this new_pte page is stable, so there is no need to get
-- 
2.51.0

From nobody Thu Oct 2 20:46:42 2025
From: Nico Pache <npache@redhat.com>
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
 dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
 mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org,
 willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com,
 usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com,
 thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
 kas@kernel.org, aarcange@redhat.com, raquini@redhat.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de,
 will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
 jglisse@google.com, surenb@google.com, zokeefe@google.com,
 hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
 rdunlap@infradead.org, hughd@google.com, richard.weiyang@gmail.com,
 lance.yang@linux.dev, vbabka@suse.cz, rppt@kernel.org, jannh@google.com,
 pfalcato@suse.de
Subject: [PATCH v11 02/15] introduce collapse_single_pmd to unify khugepaged and madvise_collapse
Date: Thu, 11 Sep 2025 21:27:57 -0600
Message-ID: <20250912032810.197475-3-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The khugepaged daemon and madvise_collapse have two different
implementations that do almost the same thing. Create collapse_single_pmd
to increase code reuse and create an entry point for these two users.
Refactor madvise_collapse and collapse_scan_mm_slot to use the new
collapse_single_pmd function.

This introduces a minor behavioral change that is most likely an
undiscovered bug. The current implementation of khugepaged tests
collapse_test_exit_or_disable before calling collapse_pte_mapped_thp, but
this check was missing in the madvise_collapse case. By unifying these
two callers, madvise_collapse now also performs this check.
We also change the return value to SCAN_ANY_PROCESS, which properly
indicates that this process is no longer valid to operate on. We also
guard the khugepaged_pages_collapsed counter to ensure it is only
incremented for khugepaged.

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 97 ++++++++++++++++++++++++++-----------------------
 1 file changed, 52 insertions(+), 45 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 40fa6e0a6b2d..63d2ba4b2b6d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2370,6 +2370,53 @@ static int collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
+/*
+ * Try to collapse a single PMD starting at a PMD aligned addr, and return
+ * the results.
+ */
+static int collapse_single_pmd(unsigned long addr,
+		struct vm_area_struct *vma, bool *mmap_locked,
+		struct collapse_control *cc)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	int result;
+	struct file *file;
+	pgoff_t pgoff;
+
+	if (vma_is_anonymous(vma)) {
+		result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
+		goto end;
+	}
+
+	file = get_file(vma->vm_file);
+	pgoff = linear_page_index(vma, addr);
+
+	mmap_read_unlock(mm);
+	*mmap_locked = false;
+	result = collapse_scan_file(mm, addr, file, pgoff, cc);
+	fput(file);
+	if (result != SCAN_PTE_MAPPED_HUGEPAGE)
+		goto end;
+
+	mmap_read_lock(mm);
+	*mmap_locked = true;
+	if (collapse_test_exit_or_disable(mm)) {
+		mmap_read_unlock(mm);
+		*mmap_locked = false;
+		return SCAN_ANY_PROCESS;
+	}
+	result = collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged);
+	if (result == SCAN_PMD_MAPPED)
+		result = SCAN_SUCCEED;
+	mmap_read_unlock(mm);
+	*mmap_locked = false;
+
+end:
+	if (cc->is_khugepaged && result == SCAN_SUCCEED)
+		++khugepaged_pages_collapsed;
+	return result;
+}
+
 static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
 					  struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
@@ -2443,34 +2490,9 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
 			VM_BUG_ON(khugepaged_scan.address < hstart ||
 				  khugepaged_scan.address + HPAGE_PMD_SIZE > hend);
-			if (!vma_is_anonymous(vma)) {
-				struct file *file = get_file(vma->vm_file);
-				pgoff_t pgoff = linear_page_index(vma,
-						khugepaged_scan.address);
-
-				mmap_read_unlock(mm);
-				mmap_locked = false;
-				*result = collapse_scan_file(mm,
-					khugepaged_scan.address, file, pgoff, cc);
-				fput(file);
-				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
-					mmap_read_lock(mm);
-					if (collapse_test_exit_or_disable(mm))
-						goto breakouterloop;
-					*result = collapse_pte_mapped_thp(mm,
-						khugepaged_scan.address, false);
-					if (*result == SCAN_PMD_MAPPED)
-						*result = SCAN_SUCCEED;
-					mmap_read_unlock(mm);
-				}
-			} else {
-				*result = collapse_scan_pmd(mm, vma,
-					khugepaged_scan.address, &mmap_locked, cc);
-			}
-
-			if (*result == SCAN_SUCCEED)
-				++khugepaged_pages_collapsed;
 
+			*result = collapse_single_pmd(khugepaged_scan.address,
+						      vma, &mmap_locked, cc);
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
 			progress += HPAGE_PMD_NR;
@@ -2786,34 +2808,19 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 		mmap_assert_locked(mm);
 		memset(cc->node_load, 0, sizeof(cc->node_load));
 		nodes_clear(cc->alloc_nmask);
-		if (!vma_is_anonymous(vma)) {
-			struct file *file = get_file(vma->vm_file);
-			pgoff_t pgoff = linear_page_index(vma, addr);
 
-			mmap_read_unlock(mm);
-			mmap_locked = false;
-			result = collapse_scan_file(mm, addr, file, pgoff, cc);
-			fput(file);
-		} else {
-			result = collapse_scan_pmd(mm, vma, addr,
-						   &mmap_locked, cc);
-		}
+		result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
+
 		if (!mmap_locked)
 			*lock_dropped = true;
 
-handle_result:
 		switch (result) {
 		case SCAN_SUCCEED:
 		case SCAN_PMD_MAPPED:
 			++thps;
 			break;
-		case SCAN_PTE_MAPPED_HUGEPAGE:
-			BUG_ON(mmap_locked);
-			mmap_read_lock(mm);
-			result = collapse_pte_mapped_thp(mm, addr, true);
-			mmap_read_unlock(mm);
-			goto handle_result;
 		/* Whitelisted set of results where continuing OK */
+		case SCAN_PTE_MAPPED_HUGEPAGE:
 		case SCAN_PMD_NULL:
 		case SCAN_PTE_NON_PRESENT:
 		case SCAN_PTE_UFFD_WP:
-- 
2.51.0

From nobody Thu Oct 2 20:46:42 2025
From: Nico Pache <npache@redhat.com>
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
 dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
 mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org,
 willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com,
 usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com,
 thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
 kas@kernel.org, aarcange@redhat.com, raquini@redhat.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de,
 will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
 jglisse@google.com, surenb@google.com, zokeefe@google.com,
 hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
 rdunlap@infradead.org, hughd@google.com, richard.weiyang@gmail.com,
 lance.yang@linux.dev, vbabka@suse.cz, rppt@kernel.org, jannh@google.com,
 pfalcato@suse.de
Subject: [PATCH v11 03/15] khugepaged: generalize hugepage_vma_revalidate for mTHP support
Date: Thu, 11 Sep 2025 21:27:58 -0600
Message-ID: <20250912032810.197475-4-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

For khugepaged to support different mTHP orders, we must generalize
hugepage_vma_revalidate to check that the PMD is not shared by another
VMA and that the requested order is enabled. No functional change in
this patch. Also correct a comment about the functionality of the
revalidation.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 63d2ba4b2b6d..6dbe2d0683ac 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -903,14 +903,13 @@ static int collapse_find_target_node(struct collapse_control *cc)
 
 /*
  * If mmap_lock temporarily dropped, revalidate vma
- * before taking mmap_lock.
+ * after taking the mmap_lock again.
  * Returns enum scan_result value.
  */
 
 static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
-				   bool expect_anon,
-				   struct vm_area_struct **vmap,
-				   struct collapse_control *cc)
+				   bool expect_anon, struct vm_area_struct **vmap,
+				   struct collapse_control *cc, unsigned int order)
 {
 	struct vm_area_struct *vma;
 	enum tva_type type = cc->is_khugepaged ? TVA_KHUGEPAGED :
@@ -923,15 +922,16 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	if (!vma)
 		return SCAN_VMA_NULL;
 
+	/* Always check the PMD order to ensure its not shared by another VMA */
 	if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
 		return SCAN_ADDRESS_RANGE;
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, type, PMD_ORDER))
+	if (!thp_vma_allowable_orders(vma, vma->vm_flags, type, BIT(order)))
 		return SCAN_VMA_CHECK;
 	/*
 	 * Anon VMA expected, the address may be unmapped then
 	 * remapped to file after khugepaged reaquired the mmap_lock.
 	 *
-	 * thp_vma_allowable_order may return true for qualified file
+	 * thp_vma_allowable_orders may return true for qualified file
 	 * vmas.
 	 */
 	if (expect_anon && (!(*vmap)->anon_vma || !vma_is_anonymous(*vmap)))
@@ -1127,7 +1127,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		goto out_nolock;
 
 	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
+	result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
+					 HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
@@ -1161,7 +1162,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * mmap_lock.
 	 */
 	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
+	result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
+					 HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 	/* check if the pmd is still valid */
@@ -2797,7 +2799,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 		mmap_read_lock(mm);
 		mmap_locked = true;
 		result = hugepage_vma_revalidate(mm, addr, false, &vma,
-						 cc);
+						 cc, HPAGE_PMD_ORDER);
 		if (result != SCAN_SUCCEED) {
 			last_fail = result;
 			goto out_nolock;
-- 
2.51.0

From nobody Thu Oct 2 20:46:42 2025
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DDZ9qrlmOFMpVtOccac7kLU9yXDpYf062FRvijZrhPBP7IsTB9OfKzIpwx+9WwecCkKQPMb1K/Aw6WQo0HaLViiFsIkGX+3rwksXpyt016/JNvihdR7dT8fSR7zmzD20J3YY+Wf3qvi4j8SvvNMk6DR11X1ktdveRHJN7JZ60Tw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=PMw2ajvI; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PMw2ajvI" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1757647809; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=g2z83/9qPq5Wjmo5BrRPgJsjDp/pEICW14/vfPUKdO0=; b=PMw2ajvIARCdfWj4D4BaKJMWGQ7xHgAjQ0VSBPWOs0f2pl31vsNx8tNVU9tJOY9PbTZZVa I/JPRzq38AjTOjc3AGOBjOiGYJ6hu+5aEHYjt0uMxVI+fLXZRPFo7vEN9cLEdCA5kvzS6H Yg11JowmOQROV0hijo+iIotLLM0Dn4k= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-604-4E5lNFqlN0WrNzq7CiVYUw-1; Thu, 11 Sep 2025 23:30:04 -0400 X-MC-Unique: 4E5lNFqlN0WrNzq7CiVYUw-1 X-Mimecast-MFC-AGG-ID: 4E5lNFqlN0WrNzq7CiVYUw_1757647799 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) 
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 40A351945102; Fri, 12 Sep 2025 03:29:59 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.28]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 0423D1800451; Fri, 12 Sep 2025 03:29:48 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kas@kernel.org, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com, richard.weiyang@gmail.com, lance.yang@linux.dev, vbabka@suse.cz, rppt@kernel.org, jannh@google.com, pfalcato@suse.de Subject: [PATCH v11 04/15] khugepaged: generalize alloc_charge_folio() Date: Thu, 11 Sep 2025 21:27:59 -0600 Message-ID: <20250912032810.197475-5-npache@redhat.com> In-Reply-To: <20250912032810.197475-1-npache@redhat.com> References: <20250912032810.197475-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: 
MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" From: Dev Jain Pass order to alloc_charge_folio() and update mTHP statistics. Reviewed-by: Baolin Wang Reviewed-by: Lorenzo Stoakes Acked-by: David Hildenbrand Co-developed-by: Nico Pache Signed-off-by: Nico Pache Signed-off-by: Dev Jain --- Documentation/admin-guide/mm/transhuge.rst | 8 ++++++++ include/linux/huge_mm.h | 2 ++ mm/huge_memory.c | 4 ++++ mm/khugepaged.c | 17 +++++++++++------ 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/adm= in-guide/mm/transhuge.rst index 1654211cc6cf..13269a0074d4 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -634,6 +634,14 @@ anon_fault_fallback_charge instead falls back to using huge pages with lower orders or small pages even though the allocation was successful. =20 +collapse_alloc + is incremented every time a huge page is successfully allocated for a + khugepaged collapse. + +collapse_alloc_failed + is incremented every time a huge page allocation fails during a + khugepaged collapse. + zswpout is incremented every time a huge page is swapped out to zswap in one piece without splitting. 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a166be872628..d442f45bd458 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -128,6 +128,8 @@ enum mthp_stat_item {
 	MTHP_STAT_ANON_FAULT_ALLOC,
 	MTHP_STAT_ANON_FAULT_FALLBACK,
 	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+	MTHP_STAT_COLLAPSE_ALLOC,
+	MTHP_STAT_COLLAPSE_ALLOC_FAILED,
 	MTHP_STAT_ZSWPOUT,
 	MTHP_STAT_SWPIN,
 	MTHP_STAT_SWPIN_FALLBACK,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d6fc669e11c1..76509e3d845b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -620,6 +620,8 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
 DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
 DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
 DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(collapse_alloc, MTHP_STAT_COLLAPSE_ALLOC);
+DEFINE_MTHP_STAT_ATTR(collapse_alloc_failed, MTHP_STAT_COLLAPSE_ALLOC_FAILED);
 DEFINE_MTHP_STAT_ATTR(zswpout, MTHP_STAT_ZSWPOUT);
 DEFINE_MTHP_STAT_ATTR(swpin, MTHP_STAT_SWPIN);
 DEFINE_MTHP_STAT_ATTR(swpin_fallback, MTHP_STAT_SWPIN_FALLBACK);
@@ -685,6 +687,8 @@ static struct attribute *any_stats_attrs[] = {
 #endif
 	&split_attr.attr,
 	&split_failed_attr.attr,
+	&collapse_alloc_attr.attr,
+	&collapse_alloc_failed_attr.attr,
 	NULL,
 };
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6dbe2d0683ac..2dea49522755 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1071,21 +1071,26 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 }
 
 static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
-			      struct collapse_control *cc)
+			      struct collapse_control *cc, unsigned int order)
 {
 	gfp_t gfp = (cc->is_khugepaged ?
		     alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE);
 	int node = collapse_find_target_node(cc);
 	struct folio *folio;
 
-	folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
+	folio = __folio_alloc(gfp, order, node, &cc->alloc_nmask);
 	if (!folio) {
 		*foliop = NULL;
-		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+		if (order == HPAGE_PMD_ORDER)
+			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+		count_mthp_stat(order, MTHP_STAT_COLLAPSE_ALLOC_FAILED);
 		return SCAN_ALLOC_HUGE_PAGE_FAIL;
 	}
 
-	count_vm_event(THP_COLLAPSE_ALLOC);
+	if (order == HPAGE_PMD_ORDER)
+		count_vm_event(THP_COLLAPSE_ALLOC);
+	count_mthp_stat(order, MTHP_STAT_COLLAPSE_ALLOC);
+
 	if (unlikely(mem_cgroup_charge(folio, mm, gfp))) {
 		folio_put(folio);
 		*foliop = NULL;
@@ -1122,7 +1127,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 */
 	mmap_read_unlock(mm);
 
-	result = alloc_charge_folio(&folio, mm, cc);
+	result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
@@ -1850,7 +1855,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	result = alloc_charge_folio(&new_folio, mm, cc);
+	result = alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
-- 
2.51.0

From nobody Thu Oct 2 20:46:42 2025
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 05/15] khugepaged: generalize __collapse_huge_page_* for mTHP support
Date: Thu, 11 Sep 2025 21:28:00 -0600
Message-ID: <20250912032810.197475-6-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
Content-Type: text/plain; charset="utf-8"

Generalize the order of the __collapse_huge_page_* functions to support
future mTHP collapse.

mTHP collapse will not honor the khugepaged_max_ptes_shared or
khugepaged_max_ptes_swap parameters, and will fail if it encounters a
shared or swapped entry.

No functional changes in this patch.

Reviewed-by: Baolin Wang
Acked-by: David Hildenbrand
Co-developed-by: Dev Jain
Signed-off-by: Dev Jain
Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 78 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 48 insertions(+), 30 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2dea49522755..b0ae0b63fc9b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -547,17 +547,17 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
 }
 
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
-					unsigned long address,
-					pte_t *pte,
-					struct collapse_control *cc,
-					struct list_head *compound_pagelist)
+		unsigned long address, pte_t *pte, struct collapse_control *cc,
+		unsigned int order, struct list_head *compound_pagelist)
 {
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	pte_t *_pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
+	int scaled_max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
+	const unsigned long nr_pages = 1UL << order;
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
+	for (_pte = pte; _pte < pte + nr_pages;
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
 		if (pte_none(pteval) ||
 		    (pte_present(pteval) &&
@@ -565,7 +565,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
+			     none_or_zero <= scaled_max_ptes_none)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -593,8 +593,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		/* See collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
 			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
+			/*
+			 * TODO: Support shared pages without leading to further
+			 * mTHP collapses. Currently bringing in new pages via
+			 * shared may cause a future higher order collapse on a
+			 * rescan of the same range.
+			 */
+			if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
+			    shared > khugepaged_max_ptes_shared)) {
 				result = SCAN_EXCEED_SHARED_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 				goto out;
@@ -687,18 +693,18 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 }
 
 static void __collapse_huge_page_copy_succeeded(pte_t *pte,
-		struct vm_area_struct *vma,
-		unsigned long address,
-		spinlock_t *ptl,
-		struct list_head *compound_pagelist)
+		struct vm_area_struct *vma, unsigned long address,
+		spinlock_t *ptl, unsigned int order,
+		struct list_head *compound_pagelist)
 {
-	unsigned long end = address + HPAGE_PMD_SIZE;
+	unsigned long end = address + (PAGE_SIZE << order);
 	struct folio *src, *tmp;
 	pte_t pteval;
 	pte_t *_pte;
 	unsigned int nr_ptes;
+	const unsigned long nr_pages = 1UL << order;
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte += nr_ptes,
+	for (_pte = pte; _pte < pte + nr_pages; _pte += nr_ptes,
 	     address += nr_ptes * PAGE_SIZE) {
 		nr_ptes = 1;
 		pteval = ptep_get(_pte);
@@ -751,13 +757,11 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 }
 
 static void __collapse_huge_page_copy_failed(pte_t *pte,
-					     pmd_t *pmd,
-					     pmd_t orig_pmd,
-					     struct vm_area_struct *vma,
-					     struct list_head *compound_pagelist)
+		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
+		unsigned int order, struct list_head *compound_pagelist)
 {
 	spinlock_t *pmd_ptl;
-
+	const unsigned long nr_pages = 1UL << order;
 	/*
 	 * Re-establish the PMD to point to the original page table
 	 * entry. Restoring PMD needs to be done prior to releasing
@@ -771,7 +775,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 	 * Release both raw and compound pages isolated
 	 * in __collapse_huge_page_isolate.
 	 */
-	release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
+	release_pte_pages(pte, pte + nr_pages, compound_pagelist);
 }
 
 /*
@@ -791,16 +795,16 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
  */
 static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
-		unsigned long address, spinlock_t *ptl,
+		unsigned long address, spinlock_t *ptl, unsigned int order,
 		struct list_head *compound_pagelist)
 {
 	unsigned int i;
 	int result = SCAN_SUCCEED;
-
+	const unsigned long nr_pages = 1UL << order;
 	/*
 	 * Copying pages' contents is subject to memory poison at any iteration.
 	 */
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < nr_pages; i++) {
 		pte_t pteval = ptep_get(pte + i);
 		struct page *page = folio_page(folio, i);
 		unsigned long src_addr = address + i * PAGE_SIZE;
@@ -819,10 +823,10 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 
 	if (likely(result == SCAN_SUCCEED))
 		__collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
-						    compound_pagelist);
+						    order, compound_pagelist);
 	else
 		__collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
-						 compound_pagelist);
+						 order, compound_pagelist);
 
 	return result;
 }
@@ -995,13 +999,12 @@ static int check_pmd_still_valid(struct mm_struct *mm,
  * Returns result: if not SCAN_SUCCEED, mmap_lock has been released.
 */
 static int __collapse_huge_page_swapin(struct mm_struct *mm,
-				       struct vm_area_struct *vma,
-				       unsigned long haddr, pmd_t *pmd,
-				       int referenced)
+		struct vm_area_struct *vma, unsigned long haddr,
+		pmd_t *pmd, int referenced, unsigned int order)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
-	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
+	unsigned long address, end = haddr + (PAGE_SIZE << order);
 	int result;
 	pte_t *pte = NULL;
 	spinlock_t *ptl;
@@ -1032,6 +1035,19 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		if (!is_swap_pte(vmf.orig_pte))
 			continue;
 
+		/*
+		 * TODO: Support swapin without leading to further mTHP
+		 * collapses. Currently bringing in new pages via swapin may
+		 * cause a future higher order collapse on a rescan of the same
+		 * range.
+		 */
+		if (order != HPAGE_PMD_ORDER) {
+			pte_unmap(pte);
+			mmap_read_unlock(mm);
+			result = SCAN_EXCEED_SWAP_PTE;
+			goto out;
+		}
+
 		vmf.pte = pte;
 		vmf.ptl = ptl;
 		ret = do_swap_page(&vmf);
@@ -1152,7 +1168,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * that case. Continuing to collapse causes inconsistency.
 	 */
 	result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-					     referenced);
+					     referenced, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 	}
@@ -1200,6 +1216,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
 	if (pte) {
 		result = __collapse_huge_page_isolate(vma, address, pte, cc,
+						      HPAGE_PMD_ORDER,
 						      &compound_pagelist);
 		spin_unlock(pte_ptl);
 	} else {
@@ -1230,6 +1247,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 
 	result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
 					   vma, address, pte_ptl,
+					   HPAGE_PMD_ORDER,
 					   &compound_pagelist);
 	pte_unmap(pte);
 	if (unlikely(result != SCAN_SUCCEED))
-- 
2.51.0

From nobody Thu Oct 2 20:46:42 2025
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 06/15] khugepaged: introduce collapse_max_ptes_none helper function
Date: Thu, 11 Sep 2025 21:28:01 -0600
Message-ID: <20250912032810.197475-7-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
Content-Type: text/plain; charset="utf-8"

The current mechanism for determining mTHP collapse scales the
khugepaged_max_ptes_none value based on the target order. This
introduces an undesirable feedback loop, or "creep", when max_ptes_none
is set to a value greater than HPAGE_PMD_NR / 2.
With this configuration, a successful collapse to order N will populate enough pages to satisfy the collapse condition on order N+1 on the next scan. This leads to unnecessary work and memory churn. To fix this issue introduce a helper function that caps the max_ptes_none to HPAGE_PMD_NR / 2 - 1 (255 on 4k page size). The function also scales the max_ptes_none number by the (PMD_ORDER - target collapse order). Signed-off-by: Nico Pache --- mm/khugepaged.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b0ae0b63fc9b..4587f2def5c1 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -468,6 +468,26 @@ void __khugepaged_enter(struct mm_struct *mm) wake_up_interruptible(&khugepaged_wait); } =20 +/* Returns the scaled max_ptes_none for a given order. + * Caps the value to HPAGE_PMD_NR/2 - 1 in the case of mTHP collapse to pr= event + * a feedback loop. If max_ptes_none is greater than HPAGE_PMD_NR/2, the v= alue + * would lead to collapses that introduces 2x more pages than the original + * number of pages. 
On subsequent scans, the max_ptes_none check would be + * satisfied and the collapses would continue until the largest order is r= eached + */ +static int collapse_max_ptes_none(unsigned int order) +{ + int max_ptes_none; + + if (order !=3D HPAGE_PMD_ORDER && + khugepaged_max_ptes_none >=3D HPAGE_PMD_NR/2) + max_ptes_none =3D HPAGE_PMD_NR/2 - 1; + else + max_ptes_none =3D khugepaged_max_ptes_none; + return max_ptes_none >> (HPAGE_PMD_ORDER - order); + +} + void khugepaged_enter_vma(struct vm_area_struct *vma, vm_flags_t vm_flags) { @@ -554,7 +574,7 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, struct folio *folio =3D NULL; pte_t *_pte; int none_or_zero =3D 0, shared =3D 0, result =3D SCAN_FAIL, referenced = =3D 0; - int scaled_max_ptes_none =3D khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER= - order); + int scaled_max_ptes_none =3D collapse_max_ptes_none(order); const unsigned long nr_pages =3D 1UL << order; =20 for (_pte =3D pte; _pte < pte + nr_pages; --=20 2.51.0 From nobody Thu Oct 2 20:46:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4280923D7E0 for ; Fri, 12 Sep 2025 03:30:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757647842; cv=none; b=TRIuXGTqG/QQ0LRR8pz7k/CMc2lahMr8kl8ySVSitbV7SwLQwrKBP7Er2lZctD31xbT4uySjDmkzZuJRigZbq9FDWoPUQQz8jcUNBiIdi9Ds3FUl2RzMvFSXTfgsq+zbTLk+GK3SOnQe8LtQ7Y7IYVnQRYsyc5AvabTK3ROaKuA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757647842; c=relaxed/simple; bh=G/TWbH7qtzOZJz68h0HFliZ4cz0doC0TG+9C2dumlv0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=m5VVtS4oQiiDQMDScFtVDWEU8SLRK3C6uxKrZ7jJUreQHYoTZ03Tc9shQI5M+yAja/plc6qO4kgzdzQqdyalqEpPCtlobGTGt0uzhGso9wuSheAfMIAw4+girOviTxLhEUM+kAjDu5HwsiYl8C5+TpCTahRpvCN7AMnPM2XyypI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=SsTEyDlV; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="SsTEyDlV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1757647839; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TUbZXuD/tIL/iLobUEFL18qus/bMdCN0O7dskgZnAgQ=; b=SsTEyDlVqjIH26uIV+L7/Dj7VFUKlrNlFiHtWLRfnL47rHOGiz0T6dLcP/HSkVmk9pWZLb dbEnpElJ0zUqVEcrWXK/XMWDjbHr9Wx4eV5naobPyk2X1YiBcM7Nmd0jODfm0HU3EKPd4d HA42BxGYp5X3+E+veYwaBQ08LtOsVFA= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-463-EsB3IwknOo2sXoPAEPBHNQ-1; Thu, 11 Sep 2025 23:30:34 -0400 X-MC-Unique: EsB3IwknOo2sXoPAEPBHNQ-1 X-Mimecast-MFC-AGG-ID: EsB3IwknOo2sXoPAEPBHNQ_1757647829 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest 
SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B4EED180048E; Fri, 12 Sep 2025 03:30:29 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.80.28]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id DEE5C18004D8; Fri, 12 Sep 2025 03:30:19 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kas@kernel.org, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com, richard.weiyang@gmail.com, lance.yang@linux.dev, vbabka@suse.cz, rppt@kernel.org, jannh@google.com, pfalcato@suse.de Subject: [PATCH v11 07/15] khugepaged: generalize collapse_huge_page for mTHP collapse Date: Thu, 11 Sep 2025 21:28:02 -0600 Message-ID: <20250912032810.197475-8-npache@redhat.com> In-Reply-To: <20250912032810.197475-1-npache@redhat.com> References: <20250912032810.197475-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: 
Pass an order and offset to collapse_huge_page to support collapsing
anon memory to arbitrary orders within a PMD. order indicates what mTHP
size we are attempting to collapse to, and offset indicates where in the
PMD to start the collapse attempt.

For non-PMD collapse we must leave the anon VMA write locked until after
we collapse the mTHP: in the PMD case all the pages are isolated, but in
the mTHP case this is not true, and we must keep the lock to prevent
changes to the VMA from occurring.

Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 123 +++++++++++++++++++++++++++++-------------------
 1 file changed, 74 insertions(+), 49 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4587f2def5c1..248947e78a30 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1139,43 +1139,50 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
 	return SCAN_SUCCEED;
 }
 
-static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
-			      int referenced, int unmapped,
-			      struct collapse_control *cc)
+static int collapse_huge_page(struct mm_struct *mm, unsigned long pmd_address,
+		int referenced, int unmapped, struct collapse_control *cc,
+		bool *mmap_locked, unsigned int order, unsigned long offset)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
-	pte_t *pte;
+	pte_t *pte = NULL, mthp_pte;
 	pgtable_t pgtable;
 	struct folio *folio;
 	spinlock_t *pmd_ptl, *pte_ptl;
 	int result = SCAN_FAIL;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
+	bool anon_vma_locked = false;
+	const unsigned long nr_pages = 1UL << order;
+	unsigned long mthp_address = pmd_address + offset * PAGE_SIZE;
 
-	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+	VM_BUG_ON(pmd_address & ~HPAGE_PMD_MASK);
 
 	/*
 	 * Before allocating the hugepage, release the mmap_lock read lock.
 	 * The allocation can take potentially a long time if it involves
 	 * sync compaction, and we do not need to hold the mmap_lock during
 	 * that. We will recheck the vma after taking it again in write mode.
+	 * If collapsing mTHPs we may have already released the read_lock.
 	 */
-	mmap_read_unlock(mm);
+	if (*mmap_locked) {
+		mmap_read_unlock(mm);
+		*mmap_locked = false;
+	}
 
-	result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
+	result = alloc_charge_folio(&folio, mm, cc, order);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
 	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
-					 HPAGE_PMD_ORDER);
+	*mmap_locked = true;
+	result = hugepage_vma_revalidate(mm, pmd_address, true, &vma, cc, order);
 	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}
 
-	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	result = find_pmd_or_thp_or_none(mm, pmd_address, &pmd);
 	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
@@ -1187,13 +1194,14 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		 * released when it fails. So we jump out_nolock directly in
 		 * that case. Continuing to collapse causes inconsistency.
 		 */
-		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-						     referenced, HPAGE_PMD_ORDER);
+		result = __collapse_huge_page_swapin(mm, vma, mthp_address, pmd,
+						     referenced, order);
 		if (result != SCAN_SUCCEED)
 			goto out_nolock;
 	}
 
 	mmap_read_unlock(mm);
+	*mmap_locked = false;
 	/*
 	 * Prevent all access to pagetables with the exception of
 	 * gup_fast later handled by the ptep_clear_flush and the VM
@@ -1203,20 +1211,20 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * mmap_lock.
 	 */
 	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
-					 HPAGE_PMD_ORDER);
+	result = hugepage_vma_revalidate(mm, pmd_address, true, &vma, cc, order);
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 	/* check if the pmd is still valid */
 	vma_start_write(vma);
-	result = check_pmd_still_valid(mm, address, pmd);
+	result = check_pmd_still_valid(mm, pmd_address, pmd);
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 
 	anon_vma_lock_write(vma->anon_vma);
+	anon_vma_locked = true;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
-				address + HPAGE_PMD_SIZE);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, mthp_address,
+				mthp_address + (PAGE_SIZE << order));
 	mmu_notifier_invalidate_range_start(&range);
 
 	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
@@ -1228,24 +1236,21 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * Parallel GUP-fast is fine since GUP-fast will back off when
 	 * it detects PMD is changed.
 	 */
-	_pmd = pmdp_collapse_flush(vma, address, pmd);
+	_pmd = pmdp_collapse_flush(vma, pmd_address, pmd);
 	spin_unlock(pmd_ptl);
 	mmu_notifier_invalidate_range_end(&range);
 	tlb_remove_table_sync_one();
 
-	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
+	pte = pte_offset_map_lock(mm, &_pmd, mthp_address, &pte_ptl);
 	if (pte) {
-		result = __collapse_huge_page_isolate(vma, address, pte, cc,
-						      HPAGE_PMD_ORDER,
-						      &compound_pagelist);
+		result = __collapse_huge_page_isolate(vma, mthp_address, pte, cc,
+						      order, &compound_pagelist);
 		spin_unlock(pte_ptl);
 	} else {
 		result = SCAN_PMD_NULL;
 	}
 
 	if (unlikely(result != SCAN_SUCCEED)) {
-		if (pte)
-			pte_unmap(pte);
 		spin_lock(pmd_ptl);
 		BUG_ON(!pmd_none(*pmd));
 		/*
@@ -1255,21 +1260,21 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		 */
 		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
 		spin_unlock(pmd_ptl);
-		anon_vma_unlock_write(vma->anon_vma);
 		goto out_up_write;
 	}
 
 	/*
-	 * All pages are isolated and locked so anon_vma rmap
-	 * can't run anymore.
+	 * For PMD collapse all pages are isolated and locked so anon_vma
+	 * rmap can't run anymore. For mTHP collapse we must hold the lock
 	 */
-	anon_vma_unlock_write(vma->anon_vma);
+	if (order == HPAGE_PMD_ORDER) {
+		anon_vma_unlock_write(vma->anon_vma);
+		anon_vma_locked = false;
+	}
 
 	result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
-					   vma, address, pte_ptl,
-					   HPAGE_PMD_ORDER,
-					   &compound_pagelist);
-	pte_unmap(pte);
+					   vma, mthp_address, pte_ptl,
+					   order, &compound_pagelist);
 	if (unlikely(result != SCAN_SUCCEED))
 		goto out_up_write;
 
@@ -1279,27 +1284,48 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * write.
 	 */
 	__folio_mark_uptodate(folio);
-	pgtable = pmd_pgtable(_pmd);
-
-	_pmd = folio_mk_pmd(folio, vma->vm_page_prot);
-	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
-
-	spin_lock(pmd_ptl);
-	BUG_ON(!pmd_none(*pmd));
-	folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
-	folio_add_lru_vma(folio, vma);
-	pgtable_trans_huge_deposit(mm, pmd, pgtable);
-	set_pmd_at(mm, address, pmd, _pmd);
-	update_mmu_cache_pmd(vma, address, pmd);
-	deferred_split_folio(folio, false);
-	spin_unlock(pmd_ptl);
+	if (order == HPAGE_PMD_ORDER) {
+		pgtable = pmd_pgtable(_pmd);
+		_pmd = folio_mk_pmd(folio, vma->vm_page_prot);
+		_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
+
+		spin_lock(pmd_ptl);
+		WARN_ON_ONCE(!pmd_none(*pmd));
+		folio_add_new_anon_rmap(folio, vma, pmd_address, RMAP_EXCLUSIVE);
+		folio_add_lru_vma(folio, vma);
+		pgtable_trans_huge_deposit(mm, pmd, pgtable);
+		set_pmd_at(mm, pmd_address, pmd, _pmd);
+		update_mmu_cache_pmd(vma, pmd_address, pmd);
+		deferred_split_folio(folio, false);
+		spin_unlock(pmd_ptl);
+	} else { /* mTHP collapse */
+		mthp_pte = mk_pte(folio_page(folio, 0), vma->vm_page_prot);
+		mthp_pte = maybe_mkwrite(pte_mkdirty(mthp_pte), vma);
+
+		spin_lock(pmd_ptl);
+		WARN_ON_ONCE(!pmd_none(*pmd));
+		folio_ref_add(folio, nr_pages - 1);
+		folio_add_new_anon_rmap(folio, vma, mthp_address, RMAP_EXCLUSIVE);
+		folio_add_lru_vma(folio, vma);
+		set_ptes(vma->vm_mm, mthp_address, pte, mthp_pte, nr_pages);
+		update_mmu_cache_range(NULL, vma, mthp_address, pte, nr_pages);
+
+		smp_wmb(); /* make PTEs visible before PMD. See pmd_install() */
+		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
+		spin_unlock(pmd_ptl);
+	}
 
 	folio = NULL;
 
 	result = SCAN_SUCCEED;
 out_up_write:
+	if (anon_vma_locked)
+		anon_vma_unlock_write(vma->anon_vma);
+	if (pte)
+		pte_unmap(pte);
 	mmap_write_unlock(mm);
 out_nolock:
+	*mmap_locked = false;
 	if (folio)
 		folio_put(folio);
 	trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result);
@@ -1467,9 +1493,8 @@ static int collapse_scan_pmd(struct mm_struct *mm,
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
 		result = collapse_huge_page(mm, address, referenced,
-					    unmapped, cc);
-		/* collapse_huge_page will return with the mmap_lock released */
-		*mmap_locked = false;
+					    unmapped, cc, mmap_locked,
+					    HPAGE_PMD_ORDER, 0);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
-- 
2.51.0
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 08/15] khugepaged: skip collapsing mTHP to smaller orders
Date: Thu, 11 Sep 2025 21:28:03 -0600
Message-ID: <20250912032810.197475-9-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

khugepaged may try to collapse a mTHP to a smaller mTHP, resulting in
some pages being unmapped.
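As a toy illustration of the problem (simple page-count arithmetic, not kernel code; the function name is hypothetical):

```python
# Hypothetical sketch: if an existing mapped folio is LARGER than the
# order we try to collapse to, the new folio cannot cover all of the
# old folio's pages, so some pages would end up unmapped.
def pages_left_uncovered(existing_order: int, collapse_order: int) -> int:
    """Pages of an existing folio not covered by a smaller-order collapse."""
    return max((1 << existing_order) - (1 << collapse_order), 0)

# An order-4 folio (16 pages) vs. an order-2 collapse (4 pages):
print(pages_left_uncovered(4, 2))  # -> 12 pages would be left behind
```

This is why the check below bails out whenever `folio_order(folio) >= order` for a non-PMD collapse.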
Skip these cases until we have a way to check if it's OK to collapse to
a smaller mTHP size (like in the case of a partially mapped folio).

This patch is inspired by Dev Jain's work on khugepaged mTHP support [1].

[1] https://lore.kernel.org/lkml/20241216165105.56185-11-dev.jain@arm.com/

Reviewed-by: Lorenzo Stoakes
Reviewed-by: Baolin Wang
Acked-by: David Hildenbrand
Co-developed-by: Dev Jain
Signed-off-by: Dev Jain
Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 248947e78a30..ebcc0c85a0d6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -610,6 +610,15 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		folio = page_folio(page);
 		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
 
+		/*
+		 * TODO: In some cases of partially-mapped folios, we'd actually
+		 * want to collapse.
+		 */
+		if (order != HPAGE_PMD_ORDER && folio_order(folio) >= order) {
+			result = SCAN_PTE_MAPPED_HUGEPAGE;
+			goto out;
+		}
+
 		/* See collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
 			++shared;
-- 
2.51.0
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 09/15] khugepaged: add per-order mTHP collapse failure statistics
Date: Thu, 11 Sep 2025 21:28:04 -0600
Message-ID: <20250912032810.197475-10-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add three new mTHP statistics to track collapse failures for different
orders when encountering swap PTEs, excessive none PTEs, and shared PTEs:

- collapse_exceed_swap_pte: Counts when mTHP collapse fails due to swap
  PTEs
- collapse_exceed_none_pte: Counts when mTHP collapse fails due to
  exceeding the none PTE threshold for the given order
- collapse_exceed_shared_pte: Counts when mTHP collapse fails due to
  shared PTEs

These statistics complement the existing THP_SCAN_EXCEED_* events by
providing per-order granularity for mTHP collapse attempts. The stats
are exposed via sysfs under
`/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/` for each
supported hugepage size.

As we currently don't support collapsing mTHPs that contain a swap or
shared entry, these statistics track how often we encounter failed mTHP
collapses due to these restrictions.
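The per-order scaling of the none-PTE budget mentioned above ("the max_ptes_none threshold scaled by the collapse order") can be sketched as follows. This is a hypothetical illustration in Python, not kernel code; the shift-based scaling and the function name are assumptions for the example:

```python
# Hypothetical sketch of scaling the PMD-order max_ptes_none threshold
# down to a smaller mTHP collapse order. Assumption: the budget shrinks
# in proportion to the number of PTEs the smaller order covers.
HPAGE_PMD_ORDER = 9  # 512 PTEs per PMD with 4K pages

def scaled_max_ptes_none(max_ptes_none: int, order: int) -> int:
    """Allowed none/empty PTEs for a collapse attempt at `order`."""
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)

print(scaled_max_ptes_none(511, 9))  # PMD order keeps the full budget: 511
print(scaled_max_ptes_none(511, 4))  # a 16-page mTHP gets only 15
```

When a range exceeds this scaled budget, the collapse_exceed_none_pte counter would record the skipped attempt and khugepaged would fall back to the next enabled mTHP size.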
Signed-off-by: Nico Pache
Reviewed-by: Baolin Wang
---
 Documentation/admin-guide/mm/transhuge.rst | 23 ++++++++++++++++++++++
 include/linux/huge_mm.h | 3 +++
 mm/huge_memory.c | 7 +++++++
 mm/khugepaged.c | 16 ++++++++++++---
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 13269a0074d4..7c71cda8aea1 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -709,6 +709,29 @@ nr_anon_partially_mapped
 	an anonymous THP as "partially mapped" and count it here, even though
 	it is not actually partially mapped anymore.
 
+collapse_exceed_none_pte
+	The number of anonymous mTHP pte ranges where the number of none PTEs
+	exceeded the max_ptes_none threshold. For mTHP collapse, khugepaged
+	checks a PMD region and tracks which PTEs are present. It then tries
+	to collapse to the largest enabled mTHP size. The allowed number of
+	empty PTEs is the max_ptes_none threshold scaled by the collapse
+	order. This counter records the number of times a collapse attempt
+	was skipped for this reason, and khugepaged moved on to try the next
+	available mTHP size.
+
+collapse_exceed_swap_pte
+	The number of anonymous mTHP pte ranges which contain at least one
+	swap PTE. Currently khugepaged does not support collapsing mTHP
+	regions that contain a swap PTE. This counter can be used to monitor
+	the number of khugepaged mTHP collapses that failed due to the
+	presence of a swap PTE.
+
+collapse_exceed_shared_pte
+	The number of anonymous mTHP pte ranges which contain at least one
+	shared PTE. Currently khugepaged does not support collapsing mTHP
+	pte ranges that contain a shared PTE. This counter can be used to
+	monitor the number of khugepaged mTHP collapses that failed due to
+	the presence of a shared PTE.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d442f45bd458..990622c96c8b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -144,6 +144,9 @@ enum mthp_stat_item {
 	MTHP_STAT_SPLIT_DEFERRED,
 	MTHP_STAT_NR_ANON,
 	MTHP_STAT_NR_ANON_PARTIALLY_MAPPED,
+	MTHP_STAT_COLLAPSE_EXCEED_SWAP,
+	MTHP_STAT_COLLAPSE_EXCEED_NONE,
+	MTHP_STAT_COLLAPSE_EXCEED_SHARED,
 	__MTHP_STAT_COUNT
 };
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 76509e3d845b..07ea9aafd64c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -638,6 +638,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
 DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
 DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
 DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
+
 
 static struct attribute *anon_stats_attrs[] = {
 	&anon_fault_alloc_attr.attr,
@@ -654,6 +658,9 @@ static struct attribute *anon_stats_attrs[] = {
 	&split_deferred_attr.attr,
 	&nr_anon_attr.attr,
 	&nr_anon_partially_mapped_attr.attr,
+	&collapse_exceed_swap_pte_attr.attr,
+	&collapse_exceed_none_pte_attr.attr,
+	&collapse_exceed_shared_pte_attr.attr,
 	NULL,
 };
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ebcc0c85a0d6..8abbe6e4317a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -589,7 +589,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			continue;
 		} else {
 			result = SCAN_EXCEED_NONE_PTE;
-			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+			if (order == HPAGE_PMD_ORDER)
+				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
 			goto out;
 		}
 	}
@@ -628,10 +630,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		 * shared may cause a future higher order collapse on a
 		 * rescan of the same range.
 		 */
-		if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
-		    shared > khugepaged_max_ptes_shared)) {
+		if (order != HPAGE_PMD_ORDER) {
+			result = SCAN_EXCEED_SHARED_PTE;
+			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
+			goto out;
+		}
+
+		if (cc->is_khugepaged &&
+		    shared > khugepaged_max_ptes_shared) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
 			goto out;
 		}
 	}
@@ -1071,6 +1080,7 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 	 * range.
 	 */
 	if (order != HPAGE_PMD_ORDER) {
+		count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
 		pte_unmap(pte);
 		mmap_read_unlock(mm);
 		result = SCAN_EXCEED_SWAP_PTE;
-- 
2.51.0
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 10/15] khugepaged: improve tracepoints for mTHP orders
Date: Thu, 11 Sep 2025 21:28:05 -0600
Message-ID: <20250912032810.197475-11-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Add the order to the mm_collapse_huge_page<_swapin,_isolate> tracepoints
to give better insight into what order is being operated on.

Acked-by: David Hildenbrand
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Baolin Wang
Signed-off-by: Nico Pache
---
 include/trace/events/huge_memory.h | 34 +++++++++++++++++++-----------
 mm/khugepaged.c | 9 ++++----
 2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index dd94d14a2427..19d99b2549e6 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -88,40 +88,44 @@ TRACE_EVENT(mm_khugepaged_scan_pmd,
 
 TRACE_EVENT(mm_collapse_huge_page,
 
-	TP_PROTO(struct mm_struct *mm, int isolated, int status),
+	TP_PROTO(struct mm_struct *mm, int isolated, int status, unsigned int order),
 
-	TP_ARGS(mm, isolated, status),
+	TP_ARGS(mm, isolated, status, order),
 
 	TP_STRUCT__entry(
 		__field(struct mm_struct *, mm)
 		__field(int, isolated)
 		__field(int, status)
+		__field(unsigned int, order)
 	),
 
 	TP_fast_assign(
 		__entry->mm = mm;
 		__entry->isolated = isolated;
 		__entry->status = status;
+		__entry->order = order;
 	),
 
-	TP_printk("mm=%p, isolated=%d, status=%s",
+	TP_printk("mm=%p, isolated=%d, status=%s order=%u",
 		__entry->mm,
 		__entry->isolated,
-		__print_symbolic(__entry->status, SCAN_STATUS))
+		__print_symbolic(__entry->status, SCAN_STATUS),
+		__entry->order)
 );
 
 TRACE_EVENT(mm_collapse_huge_page_isolate,
 
 	TP_PROTO(struct folio *folio, int none_or_zero,
-		 int referenced, int status),
+		 int referenced, int status, unsigned int order),
 
-	TP_ARGS(folio, none_or_zero, referenced, status),
+	TP_ARGS(folio, none_or_zero, referenced, status, order),
 
 	TP_STRUCT__entry(
 		__field(unsigned long, pfn)
 		__field(int, none_or_zero)
 		__field(int, referenced)
 		__field(int, status)
+		__field(unsigned int, order)
 	),
 
 	TP_fast_assign(
@@ -129,26 +133,30 @@ TRACE_EVENT(mm_collapse_huge_page_isolate,
 		__entry->none_or_zero = none_or_zero;
 		__entry->referenced = referenced;
 		__entry->status = status;
+		__entry->order = order;
 	),
 
-	TP_printk("scan_pfn=0x%lx, none_or_zero=%d, referenced=%d, status=%s",
+	TP_printk("scan_pfn=0x%lx, none_or_zero=%d, referenced=%d, status=%s order=%u",
 		__entry->pfn,
 		__entry->none_or_zero,
 		__entry->referenced,
-		__print_symbolic(__entry->status, SCAN_STATUS))
+		__print_symbolic(__entry->status, SCAN_STATUS),
+		__entry->order)
 );
 
 TRACE_EVENT(mm_collapse_huge_page_swapin,
 
-	TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret),
+	TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret,
+		 unsigned int order),
 
-	TP_ARGS(mm, swapped_in, referenced, ret),
+	TP_ARGS(mm, swapped_in, referenced, ret, order),
 
 	TP_STRUCT__entry(
 		__field(struct mm_struct *, mm)
 		__field(int, swapped_in)
 		__field(int, referenced)
 		__field(int, ret)
+		__field(unsigned int, order)
 	),
 
 	TP_fast_assign(
@@ -156,13 +164,15 @@ TRACE_EVENT(mm_collapse_huge_page_swapin,
 		__entry->swapped_in = swapped_in;
 		__entry->referenced = referenced;
 		__entry->ret = ret;
+		__entry->order = order;
 	),
 
-	TP_printk("mm=%p, swapped_in=%d, referenced=%d, ret=%d",
+	TP_printk("mm=%p, swapped_in=%d, referenced=%d, ret=%d, order=%u",
 		__entry->mm,
 		__entry->swapped_in,
 		__entry->referenced,
-		__entry->ret)
+		__entry->ret,
+		__entry->order)
 );
 
 TRACE_EVENT(mm_khugepaged_scan_file,

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8abbe6e4317a..5b45ef575446 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -720,13 +720,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	} else {
 		result = SCAN_SUCCEED;
 		trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
-						    referenced, result);
+						    referenced, result, order);
 		return result;
 	}
 out:
 	release_pte_pages(pte, _pte, compound_pagelist);
 	trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
-					    referenced, result);
+					    referenced, result, order);
 	return result;
 }
 
@@ -1121,7 +1121,8 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 
 	result = SCAN_SUCCEED;
 out:
-	trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result);
+	trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result,
+					   order);
 	return result;
 }
 
@@ -1347,7 +1348,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long pmd_address,
 	*mmap_locked = false;
 	if (folio)
 		folio_put(folio);
-	trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result);
+	trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result, order);
 	return result;
 }
 
-- 
2.51.0
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 11/15] khugepaged: introduce collapse_allowable_orders helper function
Date: Thu, 11 Sep 2025 21:28:06 -0600
Message-ID: <20250912032810.197475-12-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>

Add collapse_allowable_orders() to generalize THP order eligibility. The
function determines which THP orders are permitted based on the collapse
context (khugepaged vs. madv_collapse).
This consolidates the collapse configuration logic and provides a clean
interface for future mTHP collapse support, where the allowed orders may
differ.

Signed-off-by: Nico Pache
Reviewed-by: Baolin Wang
---
 mm/khugepaged.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5b45ef575446..d224fa97281a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -485,7 +485,16 @@ static int collapse_max_ptes_none(unsigned int order)
 	else
 		max_ptes_none = khugepaged_max_ptes_none;
 	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
+}
+
+/* Check what orders are allowed based on the vma and collapse type */
+static unsigned long collapse_allowable_orders(struct vm_area_struct *vma,
+			vm_flags_t vm_flags, bool is_khugepaged)
+{
+	enum tva_type tva_flags = is_khugepaged ? TVA_KHUGEPAGED : TVA_FORCED_COLLAPSE;
+	unsigned long orders = BIT(HPAGE_PMD_ORDER);
 
+	return thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
 }
 
 void khugepaged_enter_vma(struct vm_area_struct *vma,
@@ -493,7 +502,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 {
 	if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) &&
 	    hugepage_pmd_enabled()) {
-		if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
+		if (collapse_allowable_orders(vma, vm_flags, true))
 			__khugepaged_enter(vma->vm_mm);
 	}
 }
@@ -2557,7 +2566,7 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
 			progress++;
 			break;
 		}
-		if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) {
+		if (!collapse_allowable_orders(vma, vma->vm_flags, true)) {
 skip:
 			progress++;
 			continue;
@@ -2865,7 +2874,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
 
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
+	if (!collapse_allowable_orders(vma, vma->vm_flags, false))
 		return -EINVAL;
 
 	cc = kmalloc(sizeof(*cc), GFP_KERNEL);
-- 
2.51.0

From nobody Thu Oct 2 20:46:42 2025
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 12/15] khugepaged: Introduce mTHP collapse support
Date: Thu, 11 Sep 2025 21:28:07 -0600
Message-ID: <20250912032810.197475-13-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>

During PMD range scanning, track occupied pages in a bitmap. If mTHPs
are enabled we remove the max_ptes_none restriction during the scan
phase to avoid missing potential mTHP candidates.

Implement collapse_scan_bitmap() to perform binary recursion on the
bitmap and determine the best eligible order for the collapse. A stack
struct is used instead of traditional recursion. The algorithm splits
the bitmap into smaller chunks to find the best-fit mTHP. max_ptes_none
is scaled by the attempted collapse order to determine how "full" an
order must be before being considered for collapse.

Once we determine which mTHP size fits best in that PMD range, a
collapse is attempted. A minimum collapse order of 2 is used, as this is
the lowest order supported by anonymous memory.

mTHP collapses reject regions containing swapped-out or shared pages.
This is because adding new entries can lead to new none pages, and these
may lead to constant promotion into a higher order (m)THP.
A similar issue can occur with "max_ptes_none > HPAGE_PMD_NR/2", because
a collapse then introduces at least 2x the number of pages, which on a
future scan will satisfy the promotion condition once again. This issue
is prevented via the collapse_allowable_orders() function.

Currently madv_collapse is not supported and will only attempt PMD
collapse.

Signed-off-by: Nico Pache
---
 include/linux/khugepaged.h |   2 +
 mm/khugepaged.c            | 123 ++++++++++++++++++++++++++++++++++---
 2 files changed, 116 insertions(+), 9 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index eb1946a70cff..179ce716e769 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_KHUGEPAGED_H
 #define _LINUX_KHUGEPAGED_H
+#define KHUGEPAGED_MIN_MTHP_ORDER 2
+#define MAX_MTHP_BITMAP_STACK (1UL << (ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER))
 
 #include 
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d224fa97281a..8455a02dc3d6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -93,6 +93,11 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 
 static struct kmem_cache *mm_slot_cache __ro_after_init;
 
+struct scan_bit_state {
+	u8 order;
+	u16 offset;
+};
+
 struct collapse_control {
 	bool is_khugepaged;
 
@@ -101,6 +106,13 @@ struct collapse_control {
 
 	/* nodemask for allocation fallback */
 	nodemask_t alloc_nmask;
+
+	/*
+	 * bitmap used to collapse mTHP sizes.
+	 */
+	DECLARE_BITMAP(mthp_bitmap, HPAGE_PMD_NR);
+	DECLARE_BITMAP(mthp_bitmap_mask, HPAGE_PMD_NR);
+	struct scan_bit_state mthp_bitmap_stack[MAX_MTHP_BITMAP_STACK];
 };
 
 /**
@@ -1361,6 +1373,85 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long pmd_address,
 	return result;
 }
 
+static void push_mthp_bitmap_stack(struct collapse_control *cc, int *top,
+				   u8 order, u16 offset)
+{
+	cc->mthp_bitmap_stack[++*top] = (struct scan_bit_state)
+		{ order, offset };
+}
+
+/*
+ * collapse_scan_bitmap() consumes the bitmap that is generated during
+ * collapse_scan_pmd() to determine what regions and mTHP orders fit best.
+ *
+ * Each bit in the bitmap represents a single occupied (!none/zero) page.
+ * A stack structure cc->mthp_bitmap_stack is used to check different regions
+ * of the bitmap for collapse eligibility. We start at the PMD order and
+ * check if it is eligible for collapse; if not, we add two entries to the
+ * stack at a lower order to represent the left and right halves of the region.
+ *
+ * For each region, we calculate the number of set bits and compare it
+ * against a threshold derived from collapse_max_ptes_none(). A region is
+ * eligible if the number of set bits exceeds this threshold.
+ */
+static int collapse_scan_bitmap(struct mm_struct *mm, unsigned long address,
+		int referenced, int unmapped, struct collapse_control *cc,
+		bool *mmap_locked, unsigned long enabled_orders)
+{
+	u8 order, next_order;
+	u16 offset, mid_offset;
+	int num_chunks;
+	int bits_set, threshold_bits;
+	int top = -1;
+	int collapsed = 0;
+	int ret;
+	struct scan_bit_state state;
+	unsigned int max_none_ptes;
+
+	push_mthp_bitmap_stack(cc, &top, HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER, 0);
+
+	while (top >= 0) {
+		state = cc->mthp_bitmap_stack[top--];
+		order = state.order + KHUGEPAGED_MIN_MTHP_ORDER;
+		offset = state.offset;
+		num_chunks = 1UL << order;
+
+		/* Skip mTHP orders that are not enabled */
+		if (!test_bit(order, &enabled_orders))
+			goto next_order;
+
+		max_none_ptes = collapse_max_ptes_none(order);
+
+		/* Calculate weight of the range */
+		bitmap_zero(cc->mthp_bitmap_mask, HPAGE_PMD_NR);
+		bitmap_set(cc->mthp_bitmap_mask, offset, num_chunks);
+		bits_set = bitmap_weight_and(cc->mthp_bitmap,
+					     cc->mthp_bitmap_mask, HPAGE_PMD_NR);
+
+		threshold_bits = (1UL << order) - max_none_ptes - 1;
+
+		/* Check if the region is eligible based on the threshold */
+		if (bits_set > threshold_bits) {
+			ret = collapse_huge_page(mm, address, referenced,
+						 unmapped, cc, mmap_locked,
+						 order, offset);
+			if (ret == SCAN_SUCCEED) {
+				collapsed += 1UL << order;
+				continue;
+			}
+		}
+
+next_order:
+		if (state.order > 0) {
+			next_order = state.order - 1;
+			mid_offset = offset + (num_chunks / 2);
+			push_mthp_bitmap_stack(cc, &top, next_order, mid_offset);
+			push_mthp_bitmap_stack(cc, &top, next_order, offset);
+		}
+	}
+	return collapsed;
+}
+
 static int collapse_scan_pmd(struct mm_struct *mm,
 			     struct vm_area_struct *vma,
 			     unsigned long address, bool *mmap_locked,
@@ -1368,30 +1459,39 @@ static int collapse_scan_pmd(struct mm_struct *mm,
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
+	int i;
 	int result = SCAN_FAIL, referenced = 0;
-	int none_or_zero = 0, shared = 0;
+	int none_or_zero = 0, shared = 0, nr_collapsed = 0;
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long _address;
+	unsigned long enabled_orders;
 	spinlock_t *ptl;
 	int node = NUMA_NO_NODE, unmapped = 0;
-
+	bool is_pmd_only;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	result = find_pmd_or_thp_or_none(mm, address, &pmd);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
+	bitmap_zero(cc->mthp_bitmap, HPAGE_PMD_NR);
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+
+	enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, cc->is_khugepaged);
+
+	is_pmd_only = enabled_orders == _BITUL(HPAGE_PMD_ORDER);
+
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	if (!pte) {
 		result = SCAN_PMD_NULL;
 		goto out;
 	}
 
-	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
-	     _pte++, _address += PAGE_SIZE) {
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
+		_pte = pte + i;
+		_address = address + i * PAGE_SIZE;
 		pte_t pteval = ptep_get(_pte);
 		if (is_swap_pte(pteval)) {
 			++unmapped;
@@ -1416,8 +1516,8 @@ static int collapse_scan_pmd(struct mm_struct *mm,
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
+			    (!cc->is_khugepaged || !is_pmd_only ||
+			     none_or_zero <= khugepaged_max_ptes_none)) {
				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -1425,6 +1525,8 @@ static int collapse_scan_pmd(struct mm_struct *mm,
 				goto out_unmap;
 			}
 		}
+		/* Set bit for occupied pages */
+		bitmap_set(cc->mthp_bitmap, i, 1);
 		if (pte_uffd_wp(pteval)) {
 			/*
 			 * Don't collapse the page if any of the small
@@ -1521,9 +1623,12 @@ static int collapse_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
-		result = collapse_huge_page(mm, address, referenced,
-					    unmapped, cc, mmap_locked,
-					    HPAGE_PMD_ORDER, 0);
+		nr_collapsed = collapse_scan_bitmap(mm, address, referenced, unmapped,
+						    cc, mmap_locked, enabled_orders);
+		if (nr_collapsed > 0)
+			result = SCAN_SUCCEED;
+		else
+			result = SCAN_FAIL;
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
-- 
2.51.0

From nobody Thu Oct 2 20:46:42 2025
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 13/15] khugepaged: avoid unnecessary mTHP collapse attempts
Date: Thu, 11 Sep 2025 21:28:08 -0600
Message-ID: <20250912032810.197475-14-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>

There are cases where, if an attempted collapse fails, all subsequent
orders are guaranteed to also fail. Avoid these collapse attempts by
bailing out early.
Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8455a02dc3d6..ead07ccac351 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1435,10 +1435,39 @@ static int collapse_scan_bitmap(struct mm_struct *mm, unsigned long address,
 			ret = collapse_huge_page(mm, address, referenced,
 						 unmapped, cc, mmap_locked,
 						 order, offset);
-			if (ret == SCAN_SUCCEED) {
+
+			/*
+			 * Analyze the failure reason to determine the next action:
+			 * - goto next_order: try smaller orders in the same region
+			 * - continue: try other regions at the same order
+			 * - break: stop all attempts (system-wide failure)
+			 */
+			switch (ret) {
+			/* Cases where we should continue to the next region */
+			case SCAN_SUCCEED:
 				collapsed += 1UL << order;
+				fallthrough;
+			case SCAN_PTE_MAPPED_HUGEPAGE:
 				continue;
+			/* Cases where lower orders might still succeed */
+			case SCAN_LACK_REFERENCED_PAGE:
+			case SCAN_EXCEED_NONE_PTE:
+			case SCAN_EXCEED_SWAP_PTE:
+			case SCAN_EXCEED_SHARED_PTE:
+			case SCAN_PAGE_LOCK:
+			case SCAN_PAGE_COUNT:
+			case SCAN_PAGE_LRU:
+			case SCAN_PAGE_NULL:
+			case SCAN_DEL_PAGE_LRU:
+			case SCAN_PTE_NON_PRESENT:
+			case SCAN_PTE_UFFD_WP:
+			case SCAN_ALLOC_HUGE_PAGE_FAIL:
+				goto next_order;
+			/* All other cases should stop collapse attempts */
+			default:
+				break;
 			}
+			break;
 		}
 
 next_order:
-- 
2.51.0

From nobody Thu Oct 2 20:46:42 2025
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v11 14/15] khugepaged: run khugepaged for all orders
Date: Thu, 11 Sep 2025 21:28:09 -0600
Message-ID: <20250912032810.197475-15-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>

From: Baolin Wang

If any order of (m)THP is enabled, we should allow khugepaged to run and
attempt scanning and collapsing mTHPs. For khugepaged to operate when
only mTHP sizes are specified in sysfs, we must modify the predicate
function that determines whether it ought to run. This function is
currently called hugepage_pmd_enabled(); this patch renames it to
hugepage_enabled() and updates the logic to determine whether any valid
orders exist that would justify running khugepaged.

We must also update collapse_allowable_orders() to check all orders if
the vma is anonymous and the collapse is initiated by khugepaged. After
this patch, khugepaged mTHP collapse is fully enabled.

Signed-off-by: Baolin Wang
Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ead07ccac351..1c7f3224234e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -424,23 +424,23 @@ static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
 	       mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
 }
 
-static bool hugepage_pmd_enabled(void)
+static bool hugepage_enabled(void)
 {
 	/*
 	 * We cover the anon, shmem and the file-backed case here; file-backed
 	 * hugepages, when configured in, are determined by the global control.
-	 * Anon pmd-sized hugepages are determined by the pmd-size control.
+	 * Anon hugepages are determined by their per-size mTHP controls.
* Shmem pmd-sized hugepages are also determined by its pmd-size control, * except when the global shmem_huge is set to SHMEM_HUGE_DENY. */ if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && hugepage_global_enabled()) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_always)) + if (READ_ONCE(huge_anon_orders_always)) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_madvise)) + if (READ_ONCE(huge_anon_orders_madvise)) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) && + if (READ_ONCE(huge_anon_orders_inherit) && hugepage_global_enabled()) return true; if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled()) @@ -504,7 +504,8 @@ static unsigned long collapse_allowable_orders(struct v= m_area_struct *vma, vm_flags_t vm_flags, bool is_khugepaged) { enum tva_type tva_flags =3D is_khugepaged ? TVA_KHUGEPAGED : TVA_FORCED_C= OLLAPSE; - unsigned long orders =3D BIT(HPAGE_PMD_ORDER); + unsigned long orders =3D is_khugepaged && vma_is_anonymous(vma) ? + THP_ORDERS_ALL_ANON : BIT(HPAGE_PMD_ORDER); =20 return thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders); } @@ -513,7 +514,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, vm_flags_t vm_flags) { if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) && - hugepage_pmd_enabled()) { + hugepage_enabled()) { if (collapse_allowable_orders(vma, vm_flags, true)) __khugepaged_enter(vma->vm_mm); } @@ -2776,7 +2777,7 @@ static unsigned int collapse_scan_mm_slot(unsigned in= t pages, int *result, =20 static int khugepaged_has_work(void) { - return !list_empty(&khugepaged_scan.mm_head) && hugepage_pmd_enabled(); + return !list_empty(&khugepaged_scan.mm_head) && hugepage_enabled(); } =20 static int khugepaged_wait_event(void) @@ -2849,7 +2850,7 @@ static void khugepaged_wait_work(void) return; } =20 - if (hugepage_pmd_enabled()) + if (hugepage_enabled()) wait_event_freezable(khugepaged_wait, khugepaged_wait_event()); } =20 @@ -2880,7 +2881,7 @@ static void set_recommended_min_free_kbytes(void) int 
nr_zones =3D 0; unsigned long recommended_min; =20 - if (!hugepage_pmd_enabled()) { + if (!hugepage_enabled()) { calculate_min_free_kbytes(); goto update_wmarks; } @@ -2930,7 +2931,7 @@ int start_stop_khugepaged(void) int err =3D 0; =20 mutex_lock(&khugepaged_mutex); - if (hugepage_pmd_enabled()) { + if (hugepage_enabled()) { if (!khugepaged_thread) khugepaged_thread =3D kthread_run(khugepaged, NULL, "khugepaged"); @@ -2956,7 +2957,7 @@ int start_stop_khugepaged(void) void khugepaged_min_free_kbytes_update(void) { mutex_lock(&khugepaged_mutex); - if (hugepage_pmd_enabled() && khugepaged_thread) + if (hugepage_enabled() && khugepaged_thread) set_recommended_min_free_kbytes(); mutex_unlock(&khugepaged_mutex); } --=20 2.51.0 From nobody Thu Oct 2 20:46:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BA12242D9E for ; Fri, 12 Sep 2025 03:31:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757647917; cv=none; b=WxxKwPo7YLdfwxhhhdzLB4FlUsh/n2jVvKvWXt1uAWizH2nsO3yNrl2bxdIT7Co9Ru0ahPdvh87PRPkRC+UMsSD6BgfPlZHJVvVf8fth6uaRRYTZ6sqbZI+iWogu7bpW5GSnwSTKTFTohxLfM1/d1G51cWkMR5EUd4SChVoa9hU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757647917; c=relaxed/simple; bh=Gs04Ke8ZQsZ5xlvRmdC5Jq9lQ9ztjFB5vVKPD0iGZ4o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=P6Xy48Z6vR6plFrMOFuDgkFSudg8wR/y8OMpWWfOjn3yxhWD58EXWnBuJc0VtVAJrfiGnSDRT0Ni7MzkPcs2oM7PLMMNI7KX1e4fkv7epf+zr6Yx35uCvh+D9RuTSLx8NjwIrxBupPMYTNIZvOGatmwf+cAZA9gUTbh4w6W8tX8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass 
From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
 dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
 mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org,
 willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com,
 usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com,
 thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
 kas@kernel.org, aarcange@redhat.com, raquini@redhat.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de,
 will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
 jglisse@google.com, surenb@google.com, zokeefe@google.com,
 hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
 rdunlap@infradead.org, hughd@google.com, richard.weiyang@gmail.com,
 lance.yang@linux.dev, vbabka@suse.cz, rppt@kernel.org, jannh@google.com,
 pfalcato@suse.de, Bagas Sanjaya
Subject: [PATCH v11 15/15] Documentation: mm: update the admin guide for
 mTHP collapse
Date: Thu, 11 Sep 2025 21:28:10 -0600
Message-ID: <20250912032810.197475-16-npache@redhat.com>
In-Reply-To: <20250912032810.197475-1-npache@redhat.com>
References: <20250912032810.197475-1-npache@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Now that we can collapse to mTHPs, let's update the admin guide to reflect
these changes and provide proper guidance on how to utilize the feature.
Reviewed-by: Bagas Sanjaya
Signed-off-by: Nico Pache
---
 Documentation/admin-guide/mm/transhuge.rst | 60 +++++++++++++---------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7c71cda8aea1..b3da713f7837 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -63,7 +63,8 @@ often.
 THP can be enabled system wide or restricted to certain tasks or even
 memory ranges inside task's address space. Unless THP is completely
 disabled, there is ``khugepaged`` daemon that scans memory and
-collapses sequences of basic pages into PMD-sized huge pages.
+collapses sequences of basic pages into huge pages of either PMD size
+or mTHP sizes, if the system is configured to do so.
 
 The THP behaviour is controlled via :ref:`sysfs `
 interface and using madvise(2) and prctl(2) system calls.
@@ -212,17 +213,17 @@ PMD-mappable transparent hugepage::
 
 All THPs at fault and collapse time will be added to _deferred_list,
 and will therefore be split under memory presure if they are considered
 "underused". A THP is underused if the number of zero-filled pages in
-the THP is above max_ptes_none (see below). It is possible to disable
-this behaviour by writing 0 to shrink_underused, and enable it by writing
-1 to it::
+the THP is above max_ptes_none (see below) scaled by the THP order. It is
+possible to disable this behaviour by writing 0 to shrink_underused, and enable
+it by writing 1 to it::
 
 	echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
 	echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused
 
-khugepaged will be automatically started when PMD-sized THP is enabled
+khugepaged will be automatically started when any THP size is enabled
 (either of the per-size anon control or the top-level control are set to
 "always" or "madvise"), and it'll be automatically shutdown when
-PMD-sized THP is disabled (when both the per-size anon control and the
+all THP sizes are disabled (when both the per-size anon control and the
 top-level control are "never")
 
 process THP controls
@@ -264,11 +265,6 @@ support the following arguments::
 
 Khugepaged controls
 -------------------
 
-.. note::
-   khugepaged currently only searches for opportunities to collapse to
-   PMD-sized THP and no attempt is made to collapse to other THP
-   sizes.
-
 khugepaged runs usually at low frequency so while one may not want to
 invoke defrag algorithms synchronously during the page faults, it
 should be worth invoking defrag at least in khugepaged. However it's
@@ -296,11 +292,11 @@ allocation failure to throttle the next allocation attempt::
 
 The khugepaged progress can be seen in the number of pages collapsed (note
 that this counter may not be an exact count of the number of pages
 collapsed, since "collapsed" could mean multiple things: (1) A PTE mapping
-being replaced by a PMD mapping, or (2) All 4K physical pages replaced by
-one 2M hugepage. Each may happen independently, or together, depending on
-the type of memory and the failures that occur. As such, this value should
-be interpreted roughly as a sign of progress, and counters in /proc/vmstat
-consulted for more accurate accounting)::
+being replaced by a PMD mapping, or (2) physical pages replaced by one
+hugepage of various sizes (PMD-sized or mTHP). Each may happen independently,
+or together, depending on the type of memory and the failures that occur.
+As such, this value should be interpreted roughly as a sign of progress,
+and counters in /proc/vmstat consulted for more accurate accounting)::
 
 	/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
 
@@ -308,16 +304,25 @@ for each pass::
 
 	/sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
 
-``max_ptes_none`` specifies how many extra small pages (that are
-not already mapped) can be allocated when collapsing a group
-of small pages into one large page::
+``max_ptes_none`` specifies how many empty (none/zero) pages are allowed
+when collapsing a group of small pages into one large page. This parameter
+is scaled by the page order of the attempted collapse to determine eligibility::
 
 	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
 
-A higher value leads to use additional memory for programs.
-A lower value leads to gain less thp performance. Value of
-max_ptes_none can waste cpu time very little, you can
-ignore it.
+For PMD-sized THP collapse, this directly limits the number of empty pages
+allowed in the 2MB region. For mTHP collapse, the threshold is scaled down
+by the order (e.g., for a 64K mTHP the threshold is max_ptes_none >> 5).
+
+To prevent "creeping" behavior where collapses continuously promote to larger
+orders, if max_ptes_none >= HPAGE_PMD_NR/2 (256 with 4K base pages), it is
+capped to HPAGE_PMD_NR/2 - 1 (255) for mTHP collapses. This is because, once
+more than half of the pages may be left non-populated, the region will always
+satisfy the eligibility check on the next scan and be collapsed again.
+
+A higher value allows more empty pages, potentially leading to more memory
+usage but better THP performance. A lower value is more conservative and
+may result in fewer THP collapses.
 
 ``max_ptes_swap`` specifies how many pages can be brought in from
 swap when collapsing a group of pages into a transparent huge page::
@@ -337,6 +342,15 @@ that THP is shared. Exceeding the number would block the collapse::
 
 A higher value may increase memory footprint for some workloads.
 
+.. note::
+   For mTHP collapse, khugepaged does not support collapsing regions that
+   contain shared or swapped out pages, as this could lead to continuous
+   promotion to higher orders. The collapse will fail if any shared or
+   swapped PTEs are encountered during the scan.
+
+   Currently, madvise_collapse only supports collapsing to PMD-sized THPs
+   and does not attempt mTHP collapses.
+
 Boot parameters
 ===============
 
-- 
2.51.0