From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 41B7515442C for ; Tue, 19 Aug 2025 13:43:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611003; cv=none; b=kzNTvZpmMoNtIp8sZ90gClBbHSc2no61xdBhWDV3VQmOkRr8WnwLgd4vJPkzEKmwX9kNy6pls93BB9rSYZA2U4w5561Mzl+VVGte/4cQFQ1HgGd700tIOjoTQ0S9+KgNMrv1v81GlGLEPL7eqDBChJJvFYKFpFXD6pJAYawn4sU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611003; c=relaxed/simple; bh=iN+g29VEZVesUOqKzHDnYxT7QRQ2WgN3N+zUOI9NjNI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ReKvRZVveXjr0EoWJjtdQYwTJypzglSkK+PfXdm8lS7fOanPcpE6ydfUtpYtvrDZQTG1tfvizYuGXn256vc+1bT/yRULT9XpJva6gZnhZXl1Jx/L2CHzCYD0HfgKClJNj5Hx3cm2F0bY69C0CcihbPFpM0sGASOx5bOM0fcbA7A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=i1fxAjgb; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="i1fxAjgb" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611000; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=Plh8etSbXdyh8bXmsP8QgAsGXLtsQHnu+WgqFqa+e7M=; b=i1fxAjgbcf8NHqTz8mfpjVxnNeFLctu8tMJC4CnKnGB2uzGW8ohkSnUnLSKk5AnlMGWsrG CW2OpUyuadXO51lZ/nynk2fy6ScOCq9YPLmyPtZcxYqiPlv+TM9fzwuLsXFSTlUIme44sG tofKU7BI53u07sYFGP4BgnZuet9klv8= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-544-pgZaXwinPk21EFasYWSvUQ-1; Tue, 19 Aug 2025 09:43:17 -0400 X-MC-Unique: pgZaXwinPk21EFasYWSvUQ-1 X-Mimecast-MFC-AGG-ID: pgZaXwinPk21EFasYWSvUQ_1755610993 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 5B881195608E; Tue, 19 Aug 2025 13:43:12 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 74E2619560B0; Tue, 19 Aug 2025 13:42:51 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, 
raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 01/13] khugepaged: rename hpage_collapse_* to collapse_* Date: Tue, 19 Aug 2025 07:41:53 -0600 Message-ID: <20250819134205.622806-2-npache@redhat.com> In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" The hpage_collapse_* functions are used by both madvise_collapse and khugepaged. Remove the unnecessary hpage prefix to shorten the function names. Reviewed-by: Liam R. 
Howlett Reviewed-by: Zi Yan Reviewed-by: Baolin Wang Acked-by: David Hildenbrand Signed-off-by: Nico Pache Reviewed-by: Lance Yang Reviewed-by: Lorenzo Stoakes --- mm/khugepaged.c | 73 ++++++++++++++++++++++++------------------------- 1 file changed, 36 insertions(+), 37 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index d3d4f116e14b..0e7bbadf03ee 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -402,14 +402,14 @@ void __init khugepaged_destroy(void) kmem_cache_destroy(mm_slot_cache); } =20 -static inline int hpage_collapse_test_exit(struct mm_struct *mm) +static inline int collapse_test_exit(struct mm_struct *mm) { return atomic_read(&mm->mm_users) =3D=3D 0; } =20 -static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm) +static inline int collapse_test_exit_or_disable(struct mm_struct *mm) { - return hpage_collapse_test_exit(mm) || + return collapse_test_exit(mm) || mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm); } =20 @@ -444,7 +444,7 @@ void __khugepaged_enter(struct mm_struct *mm) int wakeup; =20 /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm); + VM_BUG_ON_MM(collapse_test_exit(mm), mm); if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) return; =20 @@ -502,7 +502,7 @@ void __khugepaged_exit(struct mm_struct *mm) } else if (mm_slot) { /* * This is required to serialize against - * hpage_collapse_test_exit() (which is guaranteed to run + * collapse_test_exit() (which is guaranteed to run * under mmap sem read mode). Stop here (after we return all * pagetables will be destroyed) until khugepaged has finished * working on the pagetables under the mmap_lock. @@ -592,7 +592,7 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, folio =3D page_folio(page); VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio); =20 - /* See hpage_collapse_scan_pmd(). */ + /* See collapse_scan_pmd(). 
*/ if (folio_maybe_mapped_shared(folio)) { ++shared; if (cc->is_khugepaged && @@ -848,7 +848,7 @@ struct collapse_control khugepaged_collapse_control =3D= { .is_khugepaged =3D true, }; =20 -static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc) +static bool collapse_scan_abort(int nid, struct collapse_control *cc) { int i; =20 @@ -883,7 +883,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(v= oid) } =20 #ifdef CONFIG_NUMA -static int hpage_collapse_find_target_node(struct collapse_control *cc) +static int collapse_find_target_node(struct collapse_control *cc) { int nid, target_node =3D 0, max_value =3D 0; =20 @@ -902,7 +902,7 @@ static int hpage_collapse_find_target_node(struct colla= pse_control *cc) return target_node; } #else -static int hpage_collapse_find_target_node(struct collapse_control *cc) +static int collapse_find_target_node(struct collapse_control *cc) { return 0; } @@ -923,7 +923,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm= , unsigned long address, enum tva_type type =3D cc->is_khugepaged ? TVA_KHUGEPAGED : TVA_FORCED_COLLAPSE; =20 - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(collapse_test_exit_or_disable(mm))) return SCAN_ANY_PROCESS; =20 *vmap =3D vma =3D find_vma(mm, address); @@ -996,7 +996,7 @@ static int check_pmd_still_valid(struct mm_struct *mm, =20 /* * Bring missing pages in from swap, to complete THP collapse. - * Only done if hpage_collapse_scan_pmd believes it is worthwhile. + * Only done if collapse_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held. * Returns result: if not SCAN_SUCCEED, mmap_lock has been released. @@ -1082,7 +1082,7 @@ static int alloc_charge_folio(struct folio **foliop, = struct mm_struct *mm, { gfp_t gfp =3D (cc->is_khugepaged ? 
alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE); - int node =3D hpage_collapse_find_target_node(cc); + int node =3D collapse_find_target_node(cc); struct folio *folio; =20 folio =3D __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask); @@ -1268,10 +1268,10 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, return result; } =20 -static int hpage_collapse_scan_pmd(struct mm_struct *mm, - struct vm_area_struct *vma, - unsigned long address, bool *mmap_locked, - struct collapse_control *cc) +static int collapse_scan_pmd(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, bool *mmap_locked, + struct collapse_control *cc) { pmd_t *pmd; pte_t *pte, *_pte; @@ -1382,7 +1382,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *= mm, * hit record. */ node =3D folio_nid(folio); - if (hpage_collapse_scan_abort(node, cc)) { + if (collapse_scan_abort(node, cc)) { result =3D SCAN_SCAN_ABORT; goto out_unmap; } @@ -1451,7 +1451,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot= *mm_slot) =20 lockdep_assert_held(&khugepaged_mm_lock); =20 - if (hpage_collapse_test_exit(mm)) { + if (collapse_test_exit(mm)) { /* free mm_slot */ hash_del(&slot->hash); list_del(&slot->mm_node); @@ -1753,7 +1753,7 @@ static void retract_page_tables(struct address_space = *mapping, pgoff_t pgoff) if (find_pmd_or_thp_or_none(mm, addr, &pmd) !=3D SCAN_SUCCEED) continue; =20 - if (hpage_collapse_test_exit(mm)) + if (collapse_test_exit(mm)) continue; /* * When a vma is registered with uffd-wp, we cannot recycle @@ -2275,9 +2275,9 @@ static int collapse_file(struct mm_struct *mm, unsign= ed long addr, return result; } =20 -static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long ad= dr, - struct file *file, pgoff_t start, - struct collapse_control *cc) +static int collapse_scan_file(struct mm_struct *mm, unsigned long addr, + struct file *file, pgoff_t start, + struct collapse_control *cc) { struct folio *folio =3D 
NULL; struct address_space *mapping =3D file->f_mapping; @@ -2332,7 +2332,7 @@ static int hpage_collapse_scan_file(struct mm_struct = *mm, unsigned long addr, } =20 node =3D folio_nid(folio); - if (hpage_collapse_scan_abort(node, cc)) { + if (collapse_scan_abort(node, cc)) { result =3D SCAN_SCAN_ABORT; folio_put(folio); break; @@ -2382,7 +2382,7 @@ static int hpage_collapse_scan_file(struct mm_struct = *mm, unsigned long addr, return result; } =20 -static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *resul= t, +static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result, struct collapse_control *cc) __releases(&khugepaged_mm_lock) __acquires(&khugepaged_mm_lock) @@ -2420,7 +2420,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, goto breakouterloop_mmap_lock; =20 progress++; - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(collapse_test_exit_or_disable(mm))) goto breakouterloop; =20 vma_iter_init(&vmi, mm, khugepaged_scan.address); @@ -2428,7 +2428,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, unsigned long hstart, hend; =20 cond_resched(); - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) { + if (unlikely(collapse_test_exit_or_disable(mm))) { progress++; break; } @@ -2449,7 +2449,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, bool mmap_locked =3D true; =20 cond_resched(); - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(collapse_test_exit_or_disable(mm))) goto breakouterloop; =20 VM_BUG_ON(khugepaged_scan.address < hstart || @@ -2462,12 +2462,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigne= d int pages, int *result, =20 mmap_read_unlock(mm); mmap_locked =3D false; - *result =3D hpage_collapse_scan_file(mm, + *result =3D collapse_scan_file(mm, khugepaged_scan.address, file, pgoff, cc); fput(file); if (*result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { mmap_read_lock(mm); - if 
(hpage_collapse_test_exit_or_disable(mm)) + if (collapse_test_exit_or_disable(mm)) goto breakouterloop; *result =3D collapse_pte_mapped_thp(mm, khugepaged_scan.address, false); @@ -2476,7 +2476,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, mmap_read_unlock(mm); } } else { - *result =3D hpage_collapse_scan_pmd(mm, vma, + *result =3D collapse_scan_pmd(mm, vma, khugepaged_scan.address, &mmap_locked, cc); } =20 @@ -2509,7 +2509,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, * Release the current mm_slot if this mm is about to die, or * if we scanned all vmas of this mm. */ - if (hpage_collapse_test_exit(mm) || !vma) { + if (collapse_test_exit(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find @@ -2563,8 +2563,8 @@ static void khugepaged_do_scan(struct collapse_contro= l *cc) pass_through_head++; if (khugepaged_has_work() && pass_through_head < 2) - progress +=3D khugepaged_scan_mm_slot(pages - progress, - &result, cc); + progress +=3D collapse_scan_mm_slot(pages - progress, + &result, cc); else progress =3D pages; spin_unlock(&khugepaged_mm_lock); @@ -2805,12 +2805,11 @@ int madvise_collapse(struct vm_area_struct *vma, un= signed long start, =20 mmap_read_unlock(mm); mmap_locked =3D false; - result =3D hpage_collapse_scan_file(mm, addr, file, pgoff, - cc); + result =3D collapse_scan_file(mm, addr, file, pgoff, cc); fput(file); } else { - result =3D hpage_collapse_scan_pmd(mm, vma, addr, - &mmap_locked, cc); + result =3D collapse_scan_pmd(mm, vma, addr, + &mmap_locked, cc); } if (!mmap_locked) *lock_dropped =3D true; --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
1E8BB23AE9A for ; Tue, 19 Aug 2025 13:43:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611026; cv=none; b=GD8cSKX+GvROkU6ksq64c/rvw9dNlBlLauzWmuLDFWyboyiW4P9NNanqn03kveh861xIMSfYNDawkKPOg+veqaEZk/lQvTlLK2MljIQIxJ+CYzD9YYD6TxlrncAME+srr1yzVyVf/hyrhgrAI6WoSoQ4WybGuCicY3gmNn99iSQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611026; c=relaxed/simple; bh=8vqQgoUzIAgTTXT/fLDMR24clQkTmQf97jsSGK7CFhs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BZMRqsEPiRcBRdZF8Gwr0OdCHTJeeuEeVLQkZ7azyIEX5VPFuS6mThCJuUMrzB4m/6Xu0GjoYEUyhKvd7VeL+dFKaQjU/UpV5vgXkOjsHH/DLRA6uUpQtP3kJzDhAxDxR2t5iCE4nBPRw0vf7bV36WS3ARuCiV5roGLt/B2U8wA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=fQm2qElN; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="fQm2qElN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611024; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kTFsuAmlEgxzFtD1O/P7BIlE/vfV96f3aKi9QfqF2QQ=; b=fQm2qElNjqbDR+M2u5KxnGDsB1CNH8FVDHROV3TrptZh8LN86fAZy0M2rdGK9MtHfcJVKe yHbqKXxINkIkvArB/Eg3OEkREKfFC3FK11xFpDhlaoHG3/YX3q6VNZZ02YiEzkQ8OfK7GA TSIVZaBterJKltXY+d3ISO75KIcCpSw= Received: from 
mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-255-7FDqgCWGPy-XDcg9xC29_g-1; Tue, 19 Aug 2025 09:43:37 -0400 X-MC-Unique: 7FDqgCWGPy-XDcg9xC29_g-1 X-Mimecast-MFC-AGG-ID: 7FDqgCWGPy-XDcg9xC29_g_1755611013 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 5306C1800289; Tue, 19 Aug 2025 13:43:32 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 2F56319560AB; Tue, 19 Aug 2025 13:43:12 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, 
rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 02/13] introduce collapse_single_pmd to unify khugepaged and madvise_collapse Date: Tue, 19 Aug 2025 07:41:54 -0600 Message-ID: <20250819134205.622806-3-npache@redhat.com> In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" The khugepaged daemon and madvise_collapse have two different implementations that do almost the same thing. Create collapse_single_pmd to increase code reuse and provide a single entry point for both users. Refactor madvise_collapse and collapse_scan_mm_slot to use the new collapse_single_pmd function. This introduces a minor behavioral change that most likely fixes an undiscovered bug: khugepaged tests collapse_test_exit_or_disable before calling collapse_pte_mapped_thp, but madvise_collapse did not. With the two callers unified, madvise_collapse now performs this check as well. Reviewed-by: Baolin Wang Acked-by: David Hildenbrand Signed-off-by: Nico Pache --- mm/khugepaged.c | 94 ++++++++++++++++++++++++++----------------------- 1 file changed, 49 insertions(+), 45 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 0e7bbadf03ee..b7b98aebb670 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2382,6 +2382,50 @@ static int collapse_scan_file(struct mm_struct *mm, = unsigned long addr, return result; } =20 +/* + * Try to collapse a single PMD starting at a PMD aligned addr, and return + * the results. 
+ */ +static int collapse_single_pmd(unsigned long addr, + struct vm_area_struct *vma, bool *mmap_locked, + struct collapse_control *cc) +{ + int result =3D SCAN_FAIL; + struct mm_struct *mm =3D vma->vm_mm; + + if (!vma_is_anonymous(vma)) { + struct file *file =3D get_file(vma->vm_file); + pgoff_t pgoff =3D linear_page_index(vma, addr); + + mmap_read_unlock(mm); + *mmap_locked =3D false; + result =3D collapse_scan_file(mm, addr, file, pgoff, cc); + fput(file); + if (result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { + mmap_read_lock(mm); + *mmap_locked =3D true; + if (collapse_test_exit_or_disable(mm)) { + mmap_read_unlock(mm); + *mmap_locked =3D false; + result =3D SCAN_ANY_PROCESS; + goto end; + } + result =3D collapse_pte_mapped_thp(mm, addr, + !cc->is_khugepaged); + if (result =3D=3D SCAN_PMD_MAPPED) + result =3D SCAN_SUCCEED; + mmap_read_unlock(mm); + *mmap_locked =3D false; + } + } else { + result =3D collapse_scan_pmd(mm, vma, addr, mmap_locked, cc); + } + if (cc->is_khugepaged && result =3D=3D SCAN_SUCCEED) + ++khugepaged_pages_collapsed; +end: + return result; +} + static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result, struct collapse_control *cc) __releases(&khugepaged_mm_lock) @@ -2455,34 +2499,9 @@ static unsigned int collapse_scan_mm_slot(unsigned i= nt pages, int *result, VM_BUG_ON(khugepaged_scan.address < hstart || khugepaged_scan.address + HPAGE_PMD_SIZE > hend); - if (!vma_is_anonymous(vma)) { - struct file *file =3D get_file(vma->vm_file); - pgoff_t pgoff =3D linear_page_index(vma, - khugepaged_scan.address); - - mmap_read_unlock(mm); - mmap_locked =3D false; - *result =3D collapse_scan_file(mm, - khugepaged_scan.address, file, pgoff, cc); - fput(file); - if (*result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { - mmap_read_lock(mm); - if (collapse_test_exit_or_disable(mm)) - goto breakouterloop; - *result =3D collapse_pte_mapped_thp(mm, - khugepaged_scan.address, false); - if (*result =3D=3D SCAN_PMD_MAPPED) - *result =3D SCAN_SUCCEED; - 
mmap_read_unlock(mm); - } - } else { - *result =3D collapse_scan_pmd(mm, vma, - khugepaged_scan.address, &mmap_locked, cc); - } - - if (*result =3D=3D SCAN_SUCCEED) - ++khugepaged_pages_collapsed; =20 + *result =3D collapse_single_pmd(khugepaged_scan.address, + vma, &mmap_locked, cc); /* move to next address */ khugepaged_scan.address +=3D HPAGE_PMD_SIZE; progress +=3D HPAGE_PMD_NR; @@ -2799,34 +2818,19 @@ int madvise_collapse(struct vm_area_struct *vma, un= signed long start, mmap_assert_locked(mm); memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); - if (!vma_is_anonymous(vma)) { - struct file *file =3D get_file(vma->vm_file); - pgoff_t pgoff =3D linear_page_index(vma, addr); =20 - mmap_read_unlock(mm); - mmap_locked =3D false; - result =3D collapse_scan_file(mm, addr, file, pgoff, cc); - fput(file); - } else { - result =3D collapse_scan_pmd(mm, vma, addr, - &mmap_locked, cc); - } + result =3D collapse_single_pmd(addr, vma, &mmap_locked, cc); + if (!mmap_locked) *lock_dropped =3D true; =20 -handle_result: switch (result) { case SCAN_SUCCEED: case SCAN_PMD_MAPPED: ++thps; break; - case SCAN_PTE_MAPPED_HUGEPAGE: - BUG_ON(mmap_locked); - mmap_read_lock(mm); - result =3D collapse_pte_mapped_thp(mm, addr, true); - mmap_read_unlock(mm); - goto handle_result; /* Whitelisted set of results where continuing OK */ + case SCAN_PTE_MAPPED_HUGEPAGE: case SCAN_PMD_NULL: case SCAN_PTE_NON_PRESENT: case SCAN_PTE_UFFD_WP: --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D9C923AE9A for ; Tue, 19 Aug 2025 13:44:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611042; 
cv=none; b=Oy4CKq56/uY/Hg6GEUK6DWaXdjajGir39JPpd3ENjtKewR9x5HXNL5A8NT4Sn9U5HX3F/JSHveqMYk+wueKAS5jtk9vsmfFaP4l5lD0G96EGR+uJiPlfkl/VyqCcCcymRcCFFEYJeQPiNXx2HsQswbqiweOA1OlATOTnfVT7Y5U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611042; c=relaxed/simple; bh=LYm2VSw2vUWJ5BHSPj7eMQlGB9I4IaIQJVgnfUee0Ig=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pTUMS9lnStNGqMe4D2/NjdPuu9I+4fKsXiMPTBaYcfqQrAijxcYaTEbwUzBsUBOMAvMj0jzPeaX95GvF///rHlYW0ZjJ1leDgP8Xoy6c7o5uiEvWG5+uYCbpfFLipw2ZHvpmoLZulS9T71narzM8o+NkaUpz2qRJo/+ZWZfmpEI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=bfUZ9+sU; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bfUZ9+sU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611040; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GDpdJsewucpAnKWnmNVOjt/r9vNeFUjY9axce88FL7g=; b=bfUZ9+sURs47qvPLg6bPB0pylGb25aWmImtAbQ6Ch0nVxk5iGt+Y2xFUA8AENZlullYnb+ b+cDbEpXgei45GSRX+LK48e10HtHl5+EYDILBtlsYQ6IripFr9PymV9yFKL5we3fZGGtjM UlcK25sDQnzsN46//FovXpyB6Gjzh/8= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id 
us-mta-644-T8nz2qPBMSSy7Z0JKuXeAA-1; Tue, 19 Aug 2025 09:43:56 -0400 X-MC-Unique: T8nz2qPBMSSy7Z0JKuXeAA-1 X-Mimecast-MFC-AGG-ID: T8nz2qPBMSSy7Z0JKuXeAA_1755611032 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0AAD71800294; Tue, 19 Aug 2025 13:43:52 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A709619560B0; Tue, 19 Aug 2025 13:43:32 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 03/13] khugepaged: generalize hugepage_vma_revalidate for mTHP support Date: Tue, 19 Aug 2025 07:41:55 -0600 Message-ID: <20250819134205.622806-4-npache@redhat.com> 
In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" For khugepaged to support different mTHP orders, we must generalize hugepage_vma_revalidate to check that the PMD is not shared by another VMA and that the order is enabled. To let madvise_collapse work on mTHP orders without the PMD order enabled, convert hugepage_vma_revalidate to take a bitmap of orders. No functional change in this patch. Reviewed-by: Baolin Wang Acked-by: David Hildenbrand Co-developed-by: Dev Jain Signed-off-by: Dev Jain Signed-off-by: Nico Pache Reviewed-by: Lorenzo Stoakes --- mm/khugepaged.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b7b98aebb670..2d192ec961d2 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -917,7 +917,7 @@ static int collapse_find_target_node(struct collapse_co= ntrol *cc) static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long add= ress, bool expect_anon, struct vm_area_struct **vmap, - struct collapse_control *cc) + struct collapse_control *cc, unsigned long orders) { struct vm_area_struct *vma; enum tva_type type =3D cc->is_khugepaged ? 
TVA_KHUGEPAGED : @@ -930,9 +930,10 @@ static int hugepage_vma_revalidate(struct mm_struct *m= m, unsigned long address, if (!vma) return SCAN_VMA_NULL; =20 + /* Always check the PMD order to ensure it's not shared by another VMA */ if (!thp_vma_suitable_order(vma, address, PMD_ORDER)) return SCAN_ADDRESS_RANGE; - if (!thp_vma_allowable_order(vma, vma->vm_flags, type, PMD_ORDER)) + if (!thp_vma_allowable_orders(vma, vma->vm_flags, type, orders)) return SCAN_VMA_CHECK; /* * Anon VMA expected, the address may be unmapped then @@ -1134,7 +1135,8 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, goto out_nolock; =20 mmap_read_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, + BIT(HPAGE_PMD_ORDER)); if (result !=3D SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1168,7 +1170,8 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * mmap_lock. 
*/ mmap_write_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, + BIT(HPAGE_PMD_ORDER)); if (result !=3D SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -2807,7 +2810,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsi= gned long start, mmap_read_lock(mm); mmap_locked =3D true; result =3D hugepage_vma_revalidate(mm, addr, false, &vma, - cc); + cc, BIT(HPAGE_PMD_ORDER)); if (result !=3D SCAN_SUCCEED) { last_fail =3D result; goto out_nolock; --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AE2A51D514E for ; Tue, 19 Aug 2025 13:44:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611063; cv=none; b=r4oftHBUVEm1hCNUK8lrNtPjSYwwrSwf928WC02Z8/lWo5u1A6zElgPlDSAoXYUzvpqF+Zd4IQeXCa7THlbeTPDT9y0SyNfdsn0IPBdYkpdw/Z1B90MENqbhOFkCj52jyrODW2XopImwS4BawyvSuHW+1V+7Saf+eH7sJFK1EWs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611063; c=relaxed/simple; bh=Gww9g2Nw/ThVEJ9E0jaYpC3ZzrQufk2bo6iJi5cYp/w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PVDaEHk70YTCh53s9hnxddzk4jF4JGR9BixSyAUdf0IqzZwH1qGcUUOSYJzd2QFRwDy+JmCxK3knQA/tbZEzJagXXp1rKYVPACrDCbk6IN6iAzAbLqcLmyaAVHsT4tVLLGWTrSLk/GexTCx2s9yJ4qFkUm1SNxFc773QLl1STfo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=XdC7SYt+; arc=none smtp.client-ip=170.10.133.124 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XdC7SYt+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611060; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UxQsevBiJ70Rw00mZJVyjG8jNrue1O6FFG6Ma5YZCUg=; b=XdC7SYt+EnvOMEkjHS8FgupOYhn3KM47F155+IlY52/Nre2MzePOfqQaDInFVz7R7MWgvB JpdjoIzq1HlMLjghT6PY25LnnfNcT021xg3Zq7dXORiDtydwH2hiZ5N59jtm6cOkr3eCkg 1i28zvI5ZqBOokudgJX1oznayBbCfqc= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-160-Tgq5OxZFNF6PrFs2YTge4A-1; Tue, 19 Aug 2025 09:44:16 -0400 X-MC-Unique: Tgq5OxZFNF6PrFs2YTge4A-1 X-Mimecast-MFC-AGG-ID: Tgq5OxZFNF6PrFs2YTge4A_1755611052 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 978DA1955BD9; Tue, 19 Aug 2025 13:44:11 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id BF58419560AB; Tue, 19 Aug 2025 13:43:52 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, 
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 04/13] khugepaged: generalize alloc_charge_folio() Date: Tue, 19 Aug 2025 07:41:56 -0600 Message-ID: <20250819134205.622806-5-npache@redhat.com> In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" From: Dev Jain Pass order to alloc_charge_folio() and update mTHP statistics. 

Reviewed-by: Baolin Wang
Acked-by: David Hildenbrand
Co-developed-by: Nico Pache
Signed-off-by: Nico Pache
Signed-off-by: Dev Jain
Reviewed-by: Lorenzo Stoakes
---
 Documentation/admin-guide/mm/transhuge.rst |  8 ++++++++
 include/linux/huge_mm.h                    |  2 ++
 mm/huge_memory.c                           |  4 ++++
 mm/khugepaged.c                            | 17 +++++++++++------
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index a16a04841b96..7ccb93e22852 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -630,6 +630,14 @@ anon_fault_fallback_charge
 	instead falls back to using huge pages with lower orders or
 	small pages even though the allocation was successful.
 
+collapse_alloc
+	is incremented every time a huge page is successfully allocated for a
+	khugepaged collapse.
+
+collapse_alloc_failed
+	is incremented every time a huge page allocation fails during a
+	khugepaged collapse.
+
 zswpout
 	is incremented every time a huge page is swapped out to zswap in one
 	piece without splitting.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1ac0d06fb3c1..4ada5d1f7297 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -128,6 +128,8 @@ enum mthp_stat_item {
 	MTHP_STAT_ANON_FAULT_ALLOC,
 	MTHP_STAT_ANON_FAULT_FALLBACK,
 	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+	MTHP_STAT_COLLAPSE_ALLOC,
+	MTHP_STAT_COLLAPSE_ALLOC_FAILED,
 	MTHP_STAT_ZSWPOUT,
 	MTHP_STAT_SWPIN,
 	MTHP_STAT_SWPIN_FALLBACK,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index aac5f0a2cb54..20d005c2c61f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -621,6 +621,8 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
 DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
 DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
 DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(collapse_alloc, MTHP_STAT_COLLAPSE_ALLOC);
+DEFINE_MTHP_STAT_ATTR(collapse_alloc_failed, MTHP_STAT_COLLAPSE_ALLOC_FAILED);
 DEFINE_MTHP_STAT_ATTR(zswpout, MTHP_STAT_ZSWPOUT);
 DEFINE_MTHP_STAT_ATTR(swpin, MTHP_STAT_SWPIN);
 DEFINE_MTHP_STAT_ATTR(swpin_fallback, MTHP_STAT_SWPIN_FALLBACK);
@@ -686,6 +688,8 @@ static struct attribute *any_stats_attrs[] = {
 #endif
 	&split_attr.attr,
 	&split_failed_attr.attr,
+	&collapse_alloc_attr.attr,
+	&collapse_alloc_failed_attr.attr,
 	NULL,
 };
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2d192ec961d2..77e0d8ee59a0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1079,21 +1079,26 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 }
 
 static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
-			      struct collapse_control *cc)
+			      struct collapse_control *cc, unsigned int order)
 {
 	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
 		     GFP_TRANSHUGE);
 	int node = collapse_find_target_node(cc);
 	struct folio *folio;
 
-	folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
+	folio = __folio_alloc(gfp, order, node, &cc->alloc_nmask);
 	if (!folio) {
 		*foliop = NULL;
-		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+		if (order == HPAGE_PMD_ORDER)
+			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+		count_mthp_stat(order, MTHP_STAT_COLLAPSE_ALLOC_FAILED);
 		return SCAN_ALLOC_HUGE_PAGE_FAIL;
 	}
 
-	count_vm_event(THP_COLLAPSE_ALLOC);
+	if (order == HPAGE_PMD_ORDER)
+		count_vm_event(THP_COLLAPSE_ALLOC);
+	count_mthp_stat(order, MTHP_STAT_COLLAPSE_ALLOC);
+
 	if (unlikely(mem_cgroup_charge(folio, mm, gfp))) {
 		folio_put(folio);
 		*foliop = NULL;
@@ -1130,7 +1135,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 */
 	mmap_read_unlock(mm);
 
-	result = alloc_charge_folio(&folio, mm, cc);
+	result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
@@ -1863,7 +1868,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	result = alloc_charge_folio(&new_folio, mm, cc);
+	result = alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
-- 
2.50.1

From nobody Sat Oct 4 06:33:14 2025
From: Nico Pache
Subject: [PATCH v10 05/13] khugepaged: generalize __collapse_huge_page_* for mTHP support
Date: Tue, 19 Aug 2025 07:41:57 -0600
Message-ID: <20250819134205.622806-6-npache@redhat.com>
In-Reply-To: <20250819134205.622806-1-npache@redhat.com>
References: <20250819134205.622806-1-npache@redhat.com>

Generalize the order of the __collapse_huge_page_* functions to support
future mTHP collapse.

mTHP collapse can suffer from inconsistent behavior and memory waste
"creep". Disable swapin and shared support for mTHP collapse.

No functional changes in this patch.

Reviewed-by: Baolin Wang
Acked-by: David Hildenbrand
Co-developed-by: Dev Jain
Signed-off-by: Dev Jain
Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 62 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 77e0d8ee59a0..074101d03c9d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -551,15 +551,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
 					struct collapse_control *cc,
-					struct list_head *compound_pagelist)
+					struct list_head *compound_pagelist,
+					unsigned int order)
 {
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	pte_t *_pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
 	bool writable = false;
+	int scaled_max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
+	for (_pte = pte; _pte < pte + (1 << order);
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
 		if (pte_none(pteval) || (pte_present(pteval) &&
@@ -567,7 +569,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
+			     none_or_zero <= scaled_max_ptes_none)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -595,8 +597,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		/* See collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
 			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
+			/*
+			 * TODO: Support shared pages without leading to further
+			 * mTHP collapses. Currently bringing in new pages via
+			 * shared may cause a future higher order collapse on a
+			 * rescan of the same range.
+			 */
+			if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
+			    shared > khugepaged_max_ptes_shared)) {
 				result = SCAN_EXCEED_SHARED_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 				goto out;
@@ -697,15 +705,16 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 						struct vm_area_struct *vma,
 						unsigned long address,
 						spinlock_t *ptl,
-						struct list_head *compound_pagelist)
+						struct list_head *compound_pagelist,
+						unsigned int order)
 {
-	unsigned long end = address + HPAGE_PMD_SIZE;
+	unsigned long end = address + (PAGE_SIZE << order);
 	struct folio *src, *tmp;
 	pte_t pteval;
 	pte_t *_pte;
 	unsigned int nr_ptes;
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte += nr_ptes,
+	for (_pte = pte; _pte < pte + (1 << order); _pte += nr_ptes,
 	     address += nr_ptes * PAGE_SIZE) {
 		nr_ptes = 1;
 		pteval = ptep_get(_pte);
@@ -761,7 +770,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 					     pmd_t *pmd,
 					     pmd_t orig_pmd,
 					     struct vm_area_struct *vma,
-					     struct list_head *compound_pagelist)
+					     struct list_head *compound_pagelist,
+					     unsigned int order)
 {
 	spinlock_t *pmd_ptl;
 
@@ -778,7 +788,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 	 * Release both raw and compound pages isolated
 	 * in __collapse_huge_page_isolate.
 	 */
-	release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
+	release_pte_pages(pte, pte + (1 << order), compound_pagelist);
 }
 
 /*
@@ -799,7 +809,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
 		unsigned long address, spinlock_t *ptl,
-		struct list_head *compound_pagelist)
+		struct list_head *compound_pagelist, unsigned int order)
 {
 	unsigned int i;
 	int result = SCAN_SUCCEED;
@@ -807,7 +817,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 	/*
 	 * Copying pages' contents is subject to memory poison at any iteration.
 	 */
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < (1 << order); i++) {
 		pte_t pteval = ptep_get(pte + i);
 		struct page *page = folio_page(folio, i);
 		unsigned long src_addr = address + i * PAGE_SIZE;
@@ -826,10 +836,10 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 
 	if (likely(result == SCAN_SUCCEED))
 		__collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
-						    compound_pagelist);
+						    compound_pagelist, order);
 	else
 		__collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
-						 compound_pagelist);
+						 compound_pagelist, order);
 
 	return result;
 }
@@ -1005,11 +1015,11 @@ static int check_pmd_still_valid(struct mm_struct *mm,
 static int __collapse_huge_page_swapin(struct mm_struct *mm,
 				       struct vm_area_struct *vma,
 				       unsigned long haddr, pmd_t *pmd,
-				       int referenced)
+				       int referenced, unsigned int order)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
-	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
+	unsigned long address, end = haddr + (PAGE_SIZE << order);
 	int result;
 	pte_t *pte = NULL;
 	spinlock_t *ptl;
@@ -1040,6 +1050,19 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		if (!is_swap_pte(vmf.orig_pte))
 			continue;
 
+		/*
+		 * TODO: Support swapin without leading to further mTHP
+		 * collapses. Currently bringing in new pages via swapin may
+		 * cause a future higher order collapse on a rescan of the same
+		 * range.
+		 */
+		if (order != HPAGE_PMD_ORDER) {
+			pte_unmap(pte);
+			mmap_read_unlock(mm);
+			result = SCAN_EXCEED_SWAP_PTE;
+			goto out;
+		}
+
 		vmf.pte = pte;
 		vmf.ptl = ptl;
 		ret = do_swap_page(&vmf);
@@ -1160,7 +1183,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		 * that case.  Continuing to collapse causes inconsistency.
 		 */
 		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-						     referenced);
+						     referenced, HPAGE_PMD_ORDER);
 		if (result != SCAN_SUCCEED)
 			goto out_nolock;
 	}
@@ -1208,7 +1231,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
 	if (pte) {
 		result = __collapse_huge_page_isolate(vma, address, pte, cc,
-						      &compound_pagelist);
+						      &compound_pagelist,
+						      HPAGE_PMD_ORDER);
 		spin_unlock(pte_ptl);
 	} else {
 		result = SCAN_PMD_NULL;
@@ -1238,7 +1262,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 
 	result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
 					   vma, address, pte_ptl,
-					   &compound_pagelist);
+					   &compound_pagelist, HPAGE_PMD_ORDER);
 	pte_unmap(pte);
 	if (unlikely(result != SCAN_SUCCEED))
 		goto out_up_write;
-- 
2.50.1

From nobody Sat Oct 4 06:33:14 2025
From: Nico Pache
Subject: [PATCH v10 06/13] khugepaged: add mTHP support
Date: Tue, 19 Aug 2025 07:41:58 -0600
Message-ID: <20250819134205.622806-7-npache@redhat.com>
In-Reply-To: <20250819134205.622806-1-npache@redhat.com>
References: <20250819134205.622806-1-npache@redhat.com>

Introduce the ability for khugepaged to collapse to different mTHP sizes.
While scanning PMD ranges for potential collapse candidates, keep track
of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER PTEs. If
mTHPs are enabled, we remove the max_ptes_none restriction during the
scan phase so we don't bail out early and miss potential mTHP candidates.

A new function, collapse_scan_bitmap(), is used to perform binary
recursion on the bitmap and determine the best eligible order for the
collapse. A stack struct is used instead of traditional recursion.
max_ptes_none is scaled by the attempted collapse order to determine
how "full" an order must be before it is considered for collapse.

Once we determine which mTHP size fits best in that PMD range, a collapse
is attempted. A minimum collapse order of 2 is used as this is the lowest
order supported by anon memory.

For orders configured with "always", we perform greedy collapsing
to that order without considering bit density.

If an mTHP collapse is attempted but the range contains swapped-out or
shared pages, we don't perform the collapse. This is because adding new
entries can lead to new none pages, and these may lead to constant
promotion into a higher order (m)THP. A similar issue can occur with
"max_ptes_none > HPAGE_PMD_NR/2", because a collapse will introduce at
least 2x the number of pages and, on a future scan, will satisfy the
promotion condition once again.
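
As a rough illustration of the scaling and of the creep condition described
above, here is a minimal user-space sketch. It assumes HPAGE_PMD_ORDER is 9
(4K base pages, 2M PMD); scaled_max_ptes_none() and collapse_can_creep() are
hypothetical helper names used only for this example and are not functions
added by this series.

```c
#include <stdbool.h>

/* Assumption for this sketch: 4K base pages and a 2M PMD (512 PTEs). */
#define HPAGE_PMD_ORDER	9
#define HPAGE_PMD_NR	(1 << HPAGE_PMD_ORDER)

/*
 * Hypothetical helper: scale the PMD-level max_ptes_none limit down to a
 * smaller collapse order using the same right shift as the series.
 */
static unsigned int scaled_max_ptes_none(unsigned int max_ptes_none,
					 unsigned int order)
{
	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
}

/*
 * Hypothetical helper: true when more than half of a range may be empty,
 * which is the setting the commit message flags as prone to "creep".
 */
static bool collapse_can_creep(unsigned int max_ptes_none)
{
	return max_ptes_none > HPAGE_PMD_NR / 2;
}
```

With max_ptes_none = 511, an order-4 attempt tolerates 511 >> 5 = 15 empty
PTEs out of 16, so a single populated page is enough to collapse. The
resulting 16 populated PTEs then leave an order-5 range with only 16 empty
PTEs, which is below its scaled limit of 511 >> 4 = 31, so the next scan can
promote again.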

For non-PMD collapse we must leave the anon VMA write-locked until after
we collapse the mTHP. In the PMD case all the pages are isolated, but in
the non-PMD case this is not true, and we must keep the lock to prevent
changes to the VMA from occurring.

Currently MADV_COLLAPSE is not supported for mTHP collapse and will only
attempt PMD collapse.

Signed-off-by: Nico Pache
---
 include/linux/khugepaged.h |   4 +
 mm/khugepaged.c            | 236 +++++++++++++++++++++++++++++--------
 2 files changed, 188 insertions(+), 52 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index eb1946a70cff..d12cdb9ef3ba 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -1,6 +1,10 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_KHUGEPAGED_H
 #define _LINUX_KHUGEPAGED_H
+#define KHUGEPAGED_MIN_MTHP_ORDER	2
+#define KHUGEPAGED_MIN_MTHP_NR	(1 << KHUGEPAGED_MIN_MTHP_ORDER)
+#define MAX_MTHP_BITMAP_SIZE  (1 << (ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER))
+#define MTHP_BITMAP_SIZE  (1 << (HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER))
 
 #include
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 074101d03c9d..1ad7e00d3fd6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -94,6 +94,11 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 
 static struct kmem_cache *mm_slot_cache __ro_after_init;
 
+struct scan_bit_state {
+	u8 order;
+	u16 offset;
+};
+
 struct collapse_control {
 	bool is_khugepaged;
 
@@ -102,6 +107,18 @@ struct collapse_control {
 
 	/* nodemask for allocation fallback */
 	nodemask_t alloc_nmask;
+
+	/*
+	 * bitmap used to collapse mTHP sizes.
+	 * 1bit = order KHUGEPAGED_MIN_MTHP_ORDER mTHP
+	 */
+	DECLARE_BITMAP(mthp_bitmap, MAX_MTHP_BITMAP_SIZE);
+	DECLARE_BITMAP(mthp_bitmap_temp, MAX_MTHP_BITMAP_SIZE);
+	struct scan_bit_state mthp_bitmap_stack[MAX_MTHP_BITMAP_SIZE];
+};
+
+struct collapse_control khugepaged_collapse_control = {
+	.is_khugepaged = true,
 };
 
 /**
@@ -854,10 +871,6 @@ static void khugepaged_alloc_sleep(void)
 	remove_wait_queue(&khugepaged_wait, &wait);
 }
 
-struct collapse_control khugepaged_collapse_control = {
-	.is_khugepaged = true,
-};
-
 static bool collapse_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;
@@ -1136,17 +1149,19 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
 
 static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 			      int referenced, int unmapped,
-			      struct collapse_control *cc)
+			      struct collapse_control *cc, bool *mmap_locked,
+			      unsigned int order, unsigned long offset)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
-	pte_t *pte;
+	pte_t *pte = NULL, mthp_pte;
 	pgtable_t pgtable;
 	struct folio *folio;
 	spinlock_t *pmd_ptl, *pte_ptl;
 	int result = SCAN_FAIL;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
+	unsigned long _address = address + offset * PAGE_SIZE;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
@@ -1155,16 +1170,20 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * The allocation can take potentially a long time if it involves
 	 * sync compaction, and we do not need to hold the mmap_lock during
 	 * that. We will recheck the vma after taking it again in write mode.
+	 * If collapsing mTHPs we may have already released the read_lock.
 	 */
-	mmap_read_unlock(mm);
+	if (*mmap_locked) {
+		mmap_read_unlock(mm);
+		*mmap_locked = false;
+	}
 
-	result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
+	result = alloc_charge_folio(&folio, mm, cc, order);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
 	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
-					 BIT(HPAGE_PMD_ORDER));
+	*mmap_locked = true;
+	result = hugepage_vma_revalidate(mm, address, true, &vma, cc, BIT(order));
 	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
@@ -1182,13 +1201,14 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		 * released when it fails. So we jump out_nolock directly in
 		 * that case.  Continuing to collapse causes inconsistency.
 		 */
-		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-						     referenced, HPAGE_PMD_ORDER);
+		result = __collapse_huge_page_swapin(mm, vma, _address, pmd,
+						     referenced, order);
 		if (result != SCAN_SUCCEED)
 			goto out_nolock;
 	}
 
 	mmap_read_unlock(mm);
+	*mmap_locked = false;
 	/*
 	 * Prevent all access to pagetables with the exception of
 	 * gup_fast later handled by the ptep_clear_flush and the VM
@@ -1198,8 +1218,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * mmap_lock.
 	 */
 	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
-					 BIT(HPAGE_PMD_ORDER));
+	result = hugepage_vma_revalidate(mm, address, true, &vma, cc, BIT(order));
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 	/* check if the pmd is still valid */
@@ -1210,11 +1229,12 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 
 	anon_vma_lock_write(vma->anon_vma);
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
-				address + HPAGE_PMD_SIZE);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address,
+				_address + (PAGE_SIZE << order));
 	mmu_notifier_invalidate_range_start(&range);
 
 	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
+
 	/*
 	 * This removes any huge TLB entry from the CPU so we won't allow
 	 * huge and small TLB entries for the same virtual address to
@@ -1228,19 +1248,16 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	mmu_notifier_invalidate_range_end(&range);
 	tlb_remove_table_sync_one();
 
-	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
+	pte = pte_offset_map_lock(mm, &_pmd, _address, &pte_ptl);
 	if (pte) {
-		result = __collapse_huge_page_isolate(vma, address, pte, cc,
-						      &compound_pagelist,
-						      HPAGE_PMD_ORDER);
+		result = __collapse_huge_page_isolate(vma, _address, pte, cc,
+						      &compound_pagelist, order);
 		spin_unlock(pte_ptl);
 	} else {
 		result = SCAN_PMD_NULL;
 	}
 
 	if (unlikely(result != SCAN_SUCCEED)) {
-		if (pte)
-			pte_unmap(pte);
 		spin_lock(pmd_ptl);
 		BUG_ON(!pmd_none(*pmd));
 		/*
@@ -1255,17 +1272,17 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	}
 
 	/*
-	 * All pages are isolated and locked so anon_vma rmap
-	 * can't run anymore.
+ * For PMD collapse all pages are isolated and locked so anon_vma + * rmap can't run anymore */ - anon_vma_unlock_write(vma->anon_vma); + if (order =3D=3D HPAGE_PMD_ORDER) + anon_vma_unlock_write(vma->anon_vma); =20 result =3D __collapse_huge_page_copy(pte, folio, pmd, _pmd, - vma, address, pte_ptl, - &compound_pagelist, HPAGE_PMD_ORDER); - pte_unmap(pte); + vma, _address, pte_ptl, + &compound_pagelist, order); if (unlikely(result !=3D SCAN_SUCCEED)) - goto out_up_write; + goto out_unlock_anon_vma; =20 /* * The smp_wmb() inside __folio_mark_uptodate() ensures the @@ -1273,33 +1290,115 @@ static int collapse_huge_page(struct mm_struct *mm= , unsigned long address, * write. */ __folio_mark_uptodate(folio); - pgtable =3D pmd_pgtable(_pmd); - - _pmd =3D folio_mk_pmd(folio, vma->vm_page_prot); - _pmd =3D maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); - - spin_lock(pmd_ptl); - BUG_ON(!pmd_none(*pmd)); - folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE); - folio_add_lru_vma(folio, vma); - pgtable_trans_huge_deposit(mm, pmd, pgtable); - set_pmd_at(mm, address, pmd, _pmd); - update_mmu_cache_pmd(vma, address, pmd); - deferred_split_folio(folio, false); - spin_unlock(pmd_ptl); + if (order =3D=3D HPAGE_PMD_ORDER) { + pgtable =3D pmd_pgtable(_pmd); + _pmd =3D folio_mk_pmd(folio, vma->vm_page_prot); + _pmd =3D maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); + + spin_lock(pmd_ptl); + BUG_ON(!pmd_none(*pmd)); + folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + pgtable_trans_huge_deposit(mm, pmd, pgtable); + set_pmd_at(mm, address, pmd, _pmd); + update_mmu_cache_pmd(vma, address, pmd); + deferred_split_folio(folio, false); + spin_unlock(pmd_ptl); + } else { /* mTHP collapse */ + mthp_pte =3D mk_pte(&folio->page, vma->vm_page_prot); + mthp_pte =3D maybe_mkwrite(pte_mkdirty(mthp_pte), vma); + + spin_lock(pmd_ptl); + BUG_ON(!pmd_none(*pmd)); + folio_ref_add(folio, (1 << order) - 1); + folio_add_new_anon_rmap(folio, vma, _address, 
RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + set_ptes(vma->vm_mm, _address, pte, mthp_pte, (1 << order)); + update_mmu_cache_range(NULL, vma, _address, pte, (1 << order)); + + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pmd_pgtable(_pmd)); + spin_unlock(pmd_ptl); + } =20 folio =3D NULL; =20 result =3D SCAN_SUCCEED; +out_unlock_anon_vma: + if (order !=3D HPAGE_PMD_ORDER) + anon_vma_unlock_write(vma->anon_vma); out_up_write: + if (pte) + pte_unmap(pte); mmap_write_unlock(mm); out_nolock: + *mmap_locked =3D false; if (folio) folio_put(folio); trace_mm_collapse_huge_page(mm, result =3D=3D SCAN_SUCCEED, result); return result; } =20 +/* Consume the bitmap iteratively, using an explicit stack */ +static int collapse_scan_bitmap(struct mm_struct *mm, unsigned long addres= s, + int referenced, int unmapped, struct collapse_control *cc, + bool *mmap_locked, unsigned long enabled_orders) +{ + u8 order, next_order; + u16 offset, mid_offset; + int num_chunks; + int bits_set, threshold_bits; + int top =3D -1; + int collapsed =3D 0; + int ret; + struct scan_bit_state state; + bool is_pmd_only =3D (enabled_orders =3D=3D (1 << HPAGE_PMD_ORDER)); + + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER, 0 }; + + while (top >=3D 0) { + state =3D cc->mthp_bitmap_stack[top--]; + order =3D state.order + KHUGEPAGED_MIN_MTHP_ORDER; + offset =3D state.offset; + num_chunks =3D 1 << (state.order); + /* Skip mTHP orders that are not enabled */ + if (!test_bit(order, &enabled_orders)) + goto next_order; + + /* copy the relevant section to a new bitmap */ + bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset, + MTHP_BITMAP_SIZE); + + bits_set =3D bitmap_weight(cc->mthp_bitmap_temp, num_chunks); + threshold_bits =3D (HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) + >> (HPAGE_PMD_ORDER - state.order); + + /* Check if the region is "almost full" based on the threshold */ + if (bits_set > threshold_bits || is_pmd_only 
+ || test_bit(order, &huge_anon_orders_always)) { + ret =3D collapse_huge_page(mm, address, referenced, unmapped, + cc, mmap_locked, order, + offset * KHUGEPAGED_MIN_MTHP_NR); + if (ret =3D=3D SCAN_SUCCEED) { + collapsed +=3D (1 << order); + continue; + } + } + +next_order: + if (state.order > 0) { + next_order =3D state.order - 1; + mid_offset =3D offset + (num_chunks / 2); + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { next_order, mid_offset }; + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { next_order, offset }; + } + } + return collapsed; +} + static int collapse_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, bool *mmap_locked, @@ -1307,31 +1406,60 @@ static int collapse_scan_pmd(struct mm_struct *mm, { pmd_t *pmd; pte_t *pte, *_pte; + int i; int result =3D SCAN_FAIL, referenced =3D 0; int none_or_zero =3D 0, shared =3D 0; struct page *page =3D NULL; struct folio *folio =3D NULL; unsigned long _address; + unsigned long enabled_orders; spinlock_t *ptl; int node =3D NUMA_NO_NODE, unmapped =3D 0; + bool is_pmd_only; bool writable =3D false; - + int chunk_none_count =3D 0; + int scaled_none =3D khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - KHUGEP= AGED_MIN_MTHP_ORDER); + unsigned long tva_flags =3D cc->is_khugepaged ? 
TVA_KHUGEPAGED : TVA_FORC= ED_COLLAPSE; VM_BUG_ON(address & ~HPAGE_PMD_MASK); =20 result =3D find_pmd_or_thp_or_none(mm, address, &pmd); if (result !=3D SCAN_SUCCEED) goto out; =20 + bitmap_zero(cc->mthp_bitmap, MAX_MTHP_BITMAP_SIZE); + bitmap_zero(cc->mthp_bitmap_temp, MAX_MTHP_BITMAP_SIZE); memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); + + if (cc->is_khugepaged) + enabled_orders =3D thp_vma_allowable_orders(vma, vma->vm_flags, + tva_flags, THP_ORDERS_ALL_ANON); + else + enabled_orders =3D BIT(HPAGE_PMD_ORDER); + + is_pmd_only =3D (enabled_orders =3D=3D (1 << HPAGE_PMD_ORDER)); + pte =3D pte_offset_map_lock(mm, pmd, address, &ptl); if (!pte) { result =3D SCAN_PMD_NULL; goto out; } =20 - for (_address =3D address, _pte =3D pte; _pte < pte + HPAGE_PMD_NR; - _pte++, _address +=3D PAGE_SIZE) { + for (i =3D 0; i < HPAGE_PMD_NR; i++) { + /* + * we are reading in KHUGEPAGED_MIN_MTHP_NR page chunks. if + * there are pages in this chunk keep track of it in the bitmap + * for mTHP collapsing. 
+ */ + if (i % KHUGEPAGED_MIN_MTHP_NR =3D=3D 0) { + if (i > 0 && chunk_none_count <=3D scaled_none) + bitmap_set(cc->mthp_bitmap, + (i - 1) / KHUGEPAGED_MIN_MTHP_NR, 1); + chunk_none_count =3D 0; + } + + _pte =3D pte + i; + _address =3D address + i * PAGE_SIZE; pte_t pteval =3D ptep_get(_pte); if (is_swap_pte(pteval)) { ++unmapped; @@ -1354,10 +1482,11 @@ static int collapse_scan_pmd(struct mm_struct *mm, } } if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { + ++chunk_none_count; ++none_or_zero; if (!userfaultfd_armed(vma) && - (!cc->is_khugepaged || - none_or_zero <=3D khugepaged_max_ptes_none)) { + (!cc->is_khugepaged || !is_pmd_only || + none_or_zero <=3D khugepaged_max_ptes_none)) { continue; } else { result =3D SCAN_EXCEED_NONE_PTE; @@ -1453,6 +1582,7 @@ static int collapse_scan_pmd(struct mm_struct *mm, address))) referenced++; } + if (!writable) { result =3D SCAN_PAGE_RO; } else if (cc->is_khugepaged && @@ -1465,10 +1595,12 @@ static int collapse_scan_pmd(struct mm_struct *mm, out_unmap: pte_unmap_unlock(pte, ptl); if (result =3D=3D SCAN_SUCCEED) { - result =3D collapse_huge_page(mm, address, referenced, - unmapped, cc); - /* collapse_huge_page will return with the mmap_lock released */ - *mmap_locked =3D false; + result =3D collapse_scan_bitmap(mm, address, referenced, unmapped, cc, + mmap_locked, enabled_orders); + if (result > 0) + result =3D SCAN_SUCCEED; + else + result =3D SCAN_FAIL; } out: trace_mm_khugepaged_scan_pmd(mm, folio, writable, referenced, --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3EA66338F58 for ; Tue, 19 Aug 2025 13:45:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1755611121; cv=none; b=GazYLBbZujosxfVQ/G09/nsplr7q20Tzq0mjCysTVmcFS+ARwv9DkPdKNfOLfAEQOYeLb51+Re3oDRGmuEBfaPy6r6AEyDyMTq92DSPCjFyHC3ekPb4afVSHOIx5vjl2B1lJ9Wvoqfnh5ykoQSK748OgFfDJXYcS3R89MmHOpxE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611121; c=relaxed/simple; bh=xZpSapUCIiloQfsuG8wFnmJCS++oXh0EPCXqiTM0+v0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=b7oqRjndu1JPiNZ1xPlSkyzqVgU4hSXniI7DjmjbsolYM+kA6Rd4aJ/m8RYwP2fUKk1keABvyXE9MlVQOeJJz2gXkGCwqVwqiaGq+q+9sUCE/tJTT4hQTXdh7dv5Fo6nO9UDvz0mpnmkiI8f2wfKByCW9BdfqXrWak5eoecqQDU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=c86YeMJ7; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="c86YeMJ7" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611119; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9QiOibDnmxU0r/BF2aR9AwgLrcohp+xyaMN8bjnYCcI=; b=c86YeMJ7SNUAr2Zs5hvhVoBvET5I/KR8oigqUrEn4IjQY7R8+xvF5ezkS60tu/PYfboYSg cT8MYuIf6zfVxqLyOYEtbT11+Ga0tOdqvfXMMBFY27sq4paeFpEQPCz2OofR1HgKhGKobY s/9L0vbkrYCtSiLFKW/gyFqg8UBMgO0= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, 
cipher=TLS_AES_256_GCM_SHA384) id us-mta-368-JMbLXzPAN8e6Zz9iKHx7DA-1; Tue, 19 Aug 2025 09:45:16 -0400 X-MC-Unique: JMbLXzPAN8e6Zz9iKHx7DA-1 X-Mimecast-MFC-AGG-ID: JMbLXzPAN8e6Zz9iKHx7DA_1755611109 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 6015D180034A; Tue, 19 Aug 2025 13:45:08 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id EF37F19560AB; Tue, 19 Aug 2025 13:44:50 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 07/13] khugepaged: skip collapsing mTHP to smaller orders Date: Tue, 19 Aug 2025 07:41:59 -0600 Message-ID: 
<20250819134205.622806-8-npache@redhat.com> In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" khugepaged may try to collapse a mTHP to a smaller mTHP, resulting in some pages being unmapped. Skip these cases until we have a way to check if its ok to collapse to a smaller mTHP size (like in the case of a partially mapped folio). This patch is inspired by Dev Jain's work on khugepaged mTHP support [1]. [1] https://lore.kernel.org/lkml/20241216165105.56185-11-dev.jain@arm.com/ Acked-by: David Hildenbrand Reviewed-by: Baolin Wang Co-developed-by: Dev Jain Signed-off-by: Dev Jain Signed-off-by: Nico Pache Reviewed-by: Lorenzo Stoakes --- mm/khugepaged.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 1ad7e00d3fd6..6a4cf7e4a7cc 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -611,6 +611,15 @@ static int __collapse_huge_page_isolate(struct vm_area= _struct *vma, folio =3D page_folio(page); VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio); =20 + /* + * TODO: In some cases of partially-mapped folios, we'd actually + * want to collapse. + */ + if (order !=3D HPAGE_PMD_ORDER && folio_order(folio) >=3D order) { + result =3D SCAN_PTE_MAPPED_HUGEPAGE; + goto out; + } + /* See collapse_scan_pmd(). 
*/ if (folio_maybe_mapped_shared(folio)) { ++shared; --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A826419C540 for ; Tue, 19 Aug 2025 13:45:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611136; cv=none; b=unjf8w4hEpavf7bRK8QFtGE+zUXNnVe0CwTTa1YIt05RJgroABPJv1358OjWj1qMvtf3OZau/ToHOyzNhv5VD2Q0m86Zm4RUKDjWNy1c24EM++tIVQuRwj+BhCy64wpmu/JJWWTbiQKa7hceDd7Br1+02OQJWX110Q9CfW+0vTc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611136; c=relaxed/simple; bh=BcBkd2ONu3qEsJ8amjqoMv8e03xJ7JS1NkQC4inL7hY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=t9X9/0t3LZ9TcLkaMVTSbSWT8jMAykfcl/7uGbKVa3VA0U5vltUOWcg7Der9g9BWbRbzjUJ1wwr/dGNUoSLW7z5RFy2i0IYLZh/MPJcJwcamuXUGMw68QSEMGlPpWxFzgoD90/1KJyBNQHa7qCt1V6NKWGEFoNbhiULqqdM/MHY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=OnPLCy8o; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="OnPLCy8o" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611133; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: 
to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=V5DchuEmH6POX8OSi9R3/H3ulGAsujcIoqxqs5qQqIk=; b=OnPLCy8oZZm4CHdrwQmQ0xCdYeAl6pCIvzOCqwASk8IR/EF4718WqaUdBgerHc3kPLrucy qpCRIR4vKv8zZJtoiteFjQZoaMCb5lHb+e8DCqyb6AgIPHbBLhyZHNzrks4nBra8ZPSZNB KA+3n0xEUEqBVwJbN+oCmB/tWy7yzYc= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-682-kyYq5qvsNpunGuM_q3lK-Q-1; Tue, 19 Aug 2025 09:45:31 -0400 X-MC-Unique: kyYq5qvsNpunGuM_q3lK-Q-1 X-Mimecast-MFC-AGG-ID: kyYq5qvsNpunGuM_q3lK-Q_1755611126 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8979F180034D; Tue, 19 Aug 2025 13:45:26 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3C00D19560BC; Tue, 19 Aug 2025 13:45:08 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, 
thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 08/13] khugepaged: avoid unnecessary mTHP collapse attempts Date: Tue, 19 Aug 2025 07:42:00 -0600 Message-ID: <20250819134205.622806-9-npache@redhat.com> In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" There are cases where, if an attempted collapse fails, all subsequent orders are guaranteed to also fail. Avoid these collapse attempts by bailing out early. 
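As a rough illustration of the idea (a userspace sketch, not the patch itself), the result codes can be sorted into three follow-up actions. The scan_result enum below is a trimmed stand-in for the kernel's SCAN_* values, and enum action and next_action() are names invented for this example only.

```c
/* Trimmed stand-in for the kernel's scan result codes (sketch only). */
enum scan_result {
	SCAN_FAIL,
	SCAN_SUCCEED,
	SCAN_PAGE_RO,
	SCAN_PTE_MAPPED_HUGEPAGE,
	SCAN_EXCEED_NONE_PTE,
	SCAN_PAGE_LOCK,
	SCAN_ALLOC_HUGE_PAGE_FAIL,
	SCAN_ANY_PROCESS,
};

/* Hypothetical: what the bitmap walk should do after one attempt. */
enum action {
	ACT_NEXT_REGION,	/* done with this region, do not split it */
	ACT_SMALLER_ORDER,	/* split the region and retry at a lower order */
	ACT_STOP,		/* no further attempt can succeed, give up */
};

static enum action next_action(enum scan_result ret)
{
	switch (ret) {
	case SCAN_SUCCEED:
	case SCAN_PAGE_RO:
	case SCAN_PTE_MAPPED_HUGEPAGE:
		return ACT_NEXT_REGION;
	case SCAN_EXCEED_NONE_PTE:
	case SCAN_PAGE_LOCK:
	case SCAN_ALLOC_HUGE_PAGE_FAIL:
		return ACT_SMALLER_ORDER;
	default:
		/* Anything unlisted is treated as fatal for this scan. */
		return ACT_STOP;
	}
}
```

Keeping the "stop" case as the default means a result code that is not explicitly listed ends the scan rather than triggering further retries.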
Signed-off-by: Nico Pache --- mm/khugepaged.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 6a4cf7e4a7cc..7d9b5100bea1 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1389,10 +1389,39 @@ static int collapse_scan_bitmap(struct mm_struct *m= m, unsigned long address, ret =3D collapse_huge_page(mm, address, referenced, unmapped, cc, mmap_locked, order, offset * KHUGEPAGED_MIN_MTHP_NR); - if (ret =3D=3D SCAN_SUCCEED) { + + /* + * Analyze failure reason to determine next action: + * - goto next_order: try smaller orders in same region + * - continue: try other regions at same order + * - break: stop all attempts (system-wide failure) + */ + switch (ret) { + /* Cases where we should continue to the next region */ + case SCAN_SUCCEED: collapsed +=3D (1 << order); + case SCAN_PAGE_RO: + case SCAN_PTE_MAPPED_HUGEPAGE: continue; + /* Cases where lower orders might still succeed */ + case SCAN_LACK_REFERENCED_PAGE: + case SCAN_EXCEED_NONE_PTE: + case SCAN_EXCEED_SWAP_PTE: + case SCAN_EXCEED_SHARED_PTE: + case SCAN_PAGE_LOCK: + case SCAN_PAGE_COUNT: + case SCAN_PAGE_LRU: + case SCAN_PAGE_NULL: + case SCAN_DEL_PAGE_LRU: + case SCAN_PTE_NON_PRESENT: + case SCAN_PTE_UFFD_WP: + case SCAN_ALLOC_HUGE_PAGE_FAIL: + goto next_order; + /* All other cases should stop collapse attempts */ + default: + break; } + break; } =20 next_order: --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 69FB63376A7 for ; Tue, 19 Aug 2025 13:45:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611156; cv=none; 
b=PiqaZ0h6L8q7vfBgy4x8N5qc5PAlKag7cwyhwhw/f6y0b7DyuKVC6Kq/dxjLSJNj8kNI2NqTsfDyqkpbEFoG7cCaH1Ae1QRE93Fum+3J5Cu4+Dyll/qlHqBTpcE4fovSg8bnZUIL3c43J7exHcHKmJPbvO6wsXAfZMvigY9Wwjk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611156; c=relaxed/simple; bh=bUoF8o+qhtqbCIHGE+B5eohfl/qebE8G1Jg9K7je0to=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OJHLTTQRM1FDBsMjAf7sSRntwUVSgjN+J9RXihvF0HAvmjmoyHL/oyp0f+hNem4d45MGUHQI6Q+9TNsQAOXhOU7SeV0Tps+AObTL6eGeNUFLKkXe3PvzW4Mw8Dgg+ccohS9tjEtmy/xA6XXso/EmhLBpw5O+7pGjWrMsTaxj3Q0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=FYsFjFlf; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="FYsFjFlf" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611154; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ENywRgeFkaGbdbIXNdDIup+iQujNfOziwpxb6S+KNKI=; b=FYsFjFlfCBWE8ttksK6LTr18LVCCt0tjV4sHc0ty1+SeUffjflGIqcwUSrgIAKIdCFdhMo 7Ou8YXicCMA5wV8X2cpkgC02I+nNTVjSN5RtP/R7gqH/O4g/Ju2YRGai3laCWsIzpX+UE2 O0R1+VqjWwho8vcPGpd3rZ2qJJamG7Y= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id 
us-mta-421-PodVLyjuPvO85QIsdcVHrg-1; Tue, 19 Aug 2025 09:45:49 -0400 X-MC-Unique: PodVLyjuPvO85QIsdcVHrg-1 X-Mimecast-MFC-AGG-ID: PodVLyjuPvO85QIsdcVHrg_1755611145 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 044EC19775E2; Tue, 19 Aug 2025 13:45:45 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4934119560AB; Tue, 19 Aug 2025 13:45:26 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 09/13] khugepaged: enable collapsing mTHPs even when PMD THPs are disabled Date: Tue, 19 Aug 2025 07:42:01 -0600 Message-ID: <20250819134205.622806-10-npache@redhat.com> 
In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" From: Baolin Wang We have now allowed mTHP collapse, but thp_vma_allowable_order() still only checks if the PMD-sized mTHP is allowed to collapse. This prevents scanning and collapsing of 64K mTHP when only 64K mTHP is enabled. Thus, we should modify the checks to allow all large orders of anonymous mTHP. Acked-by: David Hildenbrand Signed-off-by: Baolin Wang Signed-off-by: Nico Pache --- mm/khugepaged.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 7d9b5100bea1..2cadd07341de 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -491,7 +491,11 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, { if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) && hugepage_pmd_enabled()) { - if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) + unsigned long orders =3D vma_is_anonymous(vma) ? + THP_ORDERS_ALL_ANON : BIT(PMD_ORDER); + + if (thp_vma_allowable_orders(vma, vm_flags, TVA_KHUGEPAGED, + orders)) __khugepaged_enter(vma->vm_mm); } } @@ -2671,6 +2675,8 @@ static unsigned int collapse_scan_mm_slot(unsigned in= t pages, int *result, =20 vma_iter_init(&vmi, mm, khugepaged_scan.address); for_each_vma(vmi, vma) { + unsigned long orders =3D vma_is_anonymous(vma) ? 
+ THP_ORDERS_ALL_ANON : BIT(PMD_ORDER); unsigned long hstart, hend; =20 cond_resched(); @@ -2678,7 +2684,8 @@ static unsigned int collapse_scan_mm_slot(unsigned in= t pages, int *result, progress++; break; } - if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORD= ER)) { + if (!thp_vma_allowable_orders(vma, vma->vm_flags, + TVA_KHUGEPAGED, orders)) { skip: progress++; continue; --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D6DA2EB850 for ; Tue, 19 Aug 2025 13:46:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611173; cv=none; b=rhKlM32UtAkUf/EwkXrVPjCka+Vcv3vussvIcNgo0903sFtpWyix0O4uboX+hpvKtUsLWHlZkHD112SKkZMbtFr/8zQyGRTb2QZ3Q/uvP70qkGiGb/aDJY2mtV+/nBVCDdojuD7GT6oUYyNZrerrQk7FCVfUpVlGn8Yi1mQr7Js= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611173; c=relaxed/simple; bh=sa4qO7krtlD7ST+deM/6/7y6HksSEi2y1TWKK9Uw4kU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=toK0Cn8+vq0jfvCM+jmJwEHMYAScQsKoP5k+gPf+BWhAZyXoIPRf5n407XAVQC/51VdQso0QjFma4VV6G64sXPC6nqYbmRQGXuDmVVwUSd/6vmdkt4OzQCRJ8izxt1b/k2HLAiriQwVQxJCaeTKhj2rqJCA2AC5ToDebzJmK6+Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Umln8OHC; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Umln8OHC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611171; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wAoZXSzkkV5CAmqb9aRsF+eQRwNUz1x/zfG2eNwgvP0=; b=Umln8OHC4IC8HqPyQaopS1ungS+kYs6uqakJ38Ufb5oeZ+YRQHkSo4j9oOX3KpDuBLGsPJ Q3KHpvoGP2r/5OIl8cq6rIh6wWKQFjYSIRacyErghRzHI/4pfKRcqGjHQT9ZohOjA2uH+S ZRfHmdsdxOHQfWGi4uDZwQXRw0aS7J8= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-367-U_3tR3dbMQ2wvlq4rTnUFA-1; Tue, 19 Aug 2025 09:46:07 -0400 X-MC-Unique: U_3tR3dbMQ2wvlq4rTnUFA-1 X-Mimecast-MFC-AGG-ID: U_3tR3dbMQ2wvlq4rTnUFA_1755611163 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 443C619541B8; Tue, 19 Aug 2025 13:46:03 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id BFBB019560AB; Tue, 19 Aug 2025 13:45:45 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, 
ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 10/13] khugepaged: kick khugepaged for enabling non-PMD-sized mTHPs Date: Tue, 19 Aug 2025 07:42:02 -0600 Message-ID: <20250819134205.622806-11-npache@redhat.com> In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" From: Baolin Wang When only non-PMD-sized mTHP is enabled (such as only 64K mTHP enabled), we should also allow kicking khugepaged to attempt scanning and collapsing 64K mTHP. Modify hugepage_pmd_enabled() to support mTHP collapse, and while we are at it, rename it to hugepage_enabled() so that the name matches what it now checks. 
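To illustrate the difference, here is a small userspace sketch rather than the kernel code. PMD_ORDER_SKETCH, pmd_order_enabled() and any_order_enabled() are names made up for this example, the order masks are plain unsigned long values, and 4K base pages are assumed, so that 64K is order 4 and a 2M PMD is order 9.

```c
#include <stdbool.h>

/* Assumption for this sketch: 4K base pages, so a 2M PMD is order 9. */
#define PMD_ORDER_SKETCH 9

/* Old-style check: only the PMD-order bit in the mask counts. */
static bool pmd_order_enabled(unsigned long orders)
{
	return (orders & (1UL << PMD_ORDER_SKETCH)) != 0;
}

/* New-style check: any enabled order in the mask counts. */
static bool any_order_enabled(unsigned long orders)
{
	return orders != 0;
}
```

With only the 64K bit set (1UL << 4), pmd_order_enabled() reports false while any_order_enabled() reports true, which is the case this patch wants khugepaged to wake up for.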
Signed-off-by: Baolin Wang Signed-off-by: Nico Pache --- mm/khugepaged.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 2cadd07341de..81d2ffd56ab9 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -430,7 +430,7 @@ static inline int collapse_test_exit_or_disable(struct = mm_struct *mm) mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm); } =20 -static bool hugepage_pmd_enabled(void) +static bool hugepage_enabled(void) { /* * We cover the anon, shmem and the file-backed case here; file-backed @@ -442,11 +442,11 @@ static bool hugepage_pmd_enabled(void) if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && hugepage_global_enabled()) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_always)) + if (READ_ONCE(huge_anon_orders_always)) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_madvise)) + if (READ_ONCE(huge_anon_orders_madvise)) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) && + if (READ_ONCE(huge_anon_orders_inherit) && hugepage_global_enabled()) return true; if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled()) @@ -490,7 +490,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, vm_flags_t vm_flags) { if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) && - hugepage_pmd_enabled()) { + hugepage_enabled()) { unsigned long orders =3D vma_is_anonymous(vma) ? 
THP_ORDERS_ALL_ANON : BIT(PMD_ORDER); =20 @@ -2762,7 +2762,7 @@ static unsigned int collapse_scan_mm_slot(unsigned in= t pages, int *result, =20 static int khugepaged_has_work(void) { - return !list_empty(&khugepaged_scan.mm_head) && hugepage_pmd_enabled(); + return !list_empty(&khugepaged_scan.mm_head) && hugepage_enabled(); } =20 static int khugepaged_wait_event(void) @@ -2835,7 +2835,7 @@ static void khugepaged_wait_work(void) return; } =20 - if (hugepage_pmd_enabled()) + if (hugepage_enabled()) wait_event_freezable(khugepaged_wait, khugepaged_wait_event()); } =20 @@ -2866,7 +2866,7 @@ static void set_recommended_min_free_kbytes(void) int nr_zones =3D 0; unsigned long recommended_min; =20 - if (!hugepage_pmd_enabled()) { + if (!hugepage_enabled()) { calculate_min_free_kbytes(); goto update_wmarks; } @@ -2916,7 +2916,7 @@ int start_stop_khugepaged(void) int err =3D 0; =20 mutex_lock(&khugepaged_mutex); - if (hugepage_pmd_enabled()) { + if (hugepage_enabled()) { if (!khugepaged_thread) khugepaged_thread =3D kthread_run(khugepaged, NULL, "khugepaged"); @@ -2942,7 +2942,7 @@ int start_stop_khugepaged(void) void khugepaged_min_free_kbytes_update(void) { mutex_lock(&khugepaged_mutex); - if (hugepage_pmd_enabled() && khugepaged_thread) + if (hugepage_enabled() && khugepaged_thread) set_recommended_min_free_kbytes(); mutex_unlock(&khugepaged_mutex); } --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B452341ACD for ; Tue, 19 Aug 2025 13:46:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611194; cv=none; 
b=OreU6ZIlVNimhrUvKDeTA7N50OMowHf09rVaTvtVZLgdwSNKBamVJ4EPrWxZw03J0qkUE87LkKDuvSBm0TaZDcQGVTMNP75epLdbMIgzeq8dxx7GjZ97kMy0UO5wiP3qS7WsGYnh5AsQe3LiAYQqRWpS/Fo3feRyAbIS323fKOU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611194; c=relaxed/simple; bh=g1AOsksZl8Di8E8u1ulIAn7HjaaSJqpRiRKp8peqsiM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mKnADHaRaY0f9hHgWi8ZhibaeO1lzIhI7HLuMj1cbh9SQfZem9UwmN+hMcA3o6DoD7ZnFVSk0Dcx0iRLa8JN5vTSH7OHGzVpBLu2JKQgemsH7GtQjPlnaykxf+/3HvNNVY4dUdtAQjym6nW3/l+6ZsYypK0mGmp59W6pWPjSqLY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=c+hRf/D/; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="c+hRf/D/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611191; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nTgdfYjH9mbf6+JVSypR+R05kCmlws5kd8OCk6LcGFs=; b=c+hRf/D/HGxJlkcJ0NTbO/raB1CoMlpfxIE+gySM74YamaNnqVPQ9lx7/9NsPDPv36zWM5 p3An/dF0mIIUGTYIZKlENnd9HL2Y411lvGvHNeUW+oa5gemKz9srUQMEIOfdaSNOl78t8X Am27wksuiQN5RihnhFI59xZuU4V6sxA= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id 
us-mta-578-q1pcZ4KyN4G2YDVS6oxUJA-1; Tue, 19 Aug 2025 09:46:26 -0400 X-MC-Unique: q1pcZ4KyN4G2YDVS6oxUJA-1 X-Mimecast-MFC-AGG-ID: q1pcZ4KyN4G2YDVS6oxUJA_1755611182 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id AD6D31954194; Tue, 19 Aug 2025 13:46:21 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 05F3C19560AB; Tue, 19 Aug 2025 13:46:03 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 11/13] khugepaged: improve tracepoints for mTHP orders Date: Tue, 19 Aug 2025 07:42:03 -0600 Message-ID: <20250819134205.622806-12-npache@redhat.com> In-Reply-To: 
<20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" Add the order to the tracepoints to give better insight into what order is being operated at for khugepaged. Acked-by: David Hildenbrand Reviewed-by: Baolin Wang Signed-off-by: Nico Pache Reviewed-by: Lorenzo Stoakes --- include/trace/events/huge_memory.h | 34 +++++++++++++++++++----------- mm/khugepaged.c | 10 +++++---- 2 files changed, 28 insertions(+), 16 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge= _memory.h index 2305df6cb485..56aa8c3b011b 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -92,34 +92,37 @@ TRACE_EVENT(mm_khugepaged_scan_pmd, =20 TRACE_EVENT(mm_collapse_huge_page, =20 - TP_PROTO(struct mm_struct *mm, int isolated, int status), + TP_PROTO(struct mm_struct *mm, int isolated, int status, unsigned int ord= er), =20 - TP_ARGS(mm, isolated, status), + TP_ARGS(mm, isolated, status, order), =20 TP_STRUCT__entry( __field(struct mm_struct *, mm) __field(int, isolated) __field(int, status) + __field(unsigned int, order) ), =20 TP_fast_assign( __entry->mm =3D mm; __entry->isolated =3D isolated; __entry->status =3D status; + __entry->order =3D order; ), =20 - TP_printk("mm=3D%p, isolated=3D%d, status=3D%s", + TP_printk("mm=3D%p, isolated=3D%d, status=3D%s order=3D%u", __entry->mm, __entry->isolated, - __print_symbolic(__entry->status, SCAN_STATUS)) + __print_symbolic(__entry->status, SCAN_STATUS), + __entry->order) ); =20 TRACE_EVENT(mm_collapse_huge_page_isolate, =20 TP_PROTO(struct folio *folio, int none_or_zero, - int referenced, bool writable, int status), + int referenced, bool writable, int status, unsigned int order), 
=20 - TP_ARGS(folio, none_or_zero, referenced, writable, status), + TP_ARGS(folio, none_or_zero, referenced, writable, status, order), =20 TP_STRUCT__entry( __field(unsigned long, pfn) @@ -127,6 +130,7 @@ TRACE_EVENT(mm_collapse_huge_page_isolate, __field(int, referenced) __field(bool, writable) __field(int, status) + __field(unsigned int, order) ), =20 TP_fast_assign( @@ -135,27 +139,31 @@ TRACE_EVENT(mm_collapse_huge_page_isolate, __entry->referenced =3D referenced; __entry->writable =3D writable; __entry->status =3D status; + __entry->order =3D order; ), =20 - TP_printk("scan_pfn=3D0x%lx, none_or_zero=3D%d, referenced=3D%d, writable= =3D%d, status=3D%s", + TP_printk("scan_pfn=3D0x%lx, none_or_zero=3D%d, referenced=3D%d, writable= =3D%d, status=3D%s order=3D%u", __entry->pfn, __entry->none_or_zero, __entry->referenced, __entry->writable, - __print_symbolic(__entry->status, SCAN_STATUS)) + __print_symbolic(__entry->status, SCAN_STATUS), + __entry->order) ); =20 TRACE_EVENT(mm_collapse_huge_page_swapin, =20 - TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret), + TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret, + unsigned int order), =20 - TP_ARGS(mm, swapped_in, referenced, ret), + TP_ARGS(mm, swapped_in, referenced, ret, order), =20 TP_STRUCT__entry( __field(struct mm_struct *, mm) __field(int, swapped_in) __field(int, referenced) __field(int, ret) + __field(unsigned int, order) ), =20 TP_fast_assign( @@ -163,13 +171,15 @@ TRACE_EVENT(mm_collapse_huge_page_swapin, __entry->swapped_in =3D swapped_in; __entry->referenced =3D referenced; __entry->ret =3D ret; + __entry->order =3D order; ), =20 - TP_printk("mm=3D%p, swapped_in=3D%d, referenced=3D%d, ret=3D%d", + TP_printk("mm=3D%p, swapped_in=3D%d, referenced=3D%d, ret=3D%d, order=3D%= u", __entry->mm, __entry->swapped_in, __entry->referenced, - __entry->ret) + __entry->ret, + __entry->order) ); =20 TRACE_EVENT(mm_khugepaged_scan_file, diff --git a/mm/khugepaged.c 
b/mm/khugepaged.c index 81d2ffd56ab9..c13bc583a368 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -721,13 +721,14 @@ static int __collapse_huge_page_isolate(struct vm_are= a_struct *vma, } else { result =3D SCAN_SUCCEED; trace_mm_collapse_huge_page_isolate(folio, none_or_zero, - referenced, writable, result); + referenced, writable, result, + order); return result; } out: release_pte_pages(pte, _pte, compound_pagelist); trace_mm_collapse_huge_page_isolate(folio, none_or_zero, - referenced, writable, result); + referenced, writable, result, order); return result; } =20 @@ -1123,7 +1124,8 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, =20 result =3D SCAN_SUCCEED; out: - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result); + trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result, + order); return result; } =20 @@ -1348,7 +1350,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, *mmap_locked =3D false; if (folio) folio_put(folio); - trace_mm_collapse_huge_page(mm, result =3D=3D SCAN_SUCCEED, result); + trace_mm_collapse_huge_page(mm, result =3D=3D SCAN_SUCCEED, result, order= ); return result; } =20 --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D33ED23FC54 for ; Tue, 19 Aug 2025 14:17:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755613027; cv=none; b=R5QqwCSsSIW0+CxlGE6aNS/NwI53YyUyhxOm3ne4RZGOQWBRC4QcA4XCLTlNoYEJuV7pUEDMB4gcq/8L+r0z5+wDhf0JURo5+XOS5rGkwlihtU15wW8yKOVCH1J7dgSVqSkSCV1kXEQZgccWPAoPZYUnvuuaQYPWfgcT4l/qhjI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1755613027; c=relaxed/simple; bh=fIvuRJk9m8C5l+h2MFh0h7kc9zU68Ycc+a0H4sEdmMo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NUZFUgWqF9HM+gjpCH0W54Tlev9dytgj+8p1u3VbDnnFSd1D3HufEHGpHGoV5Yurtfe2W3LMd0KfTUIvPBqor7mQUM6gbkdaFVJGJ6H4S71U2GQKZALsJeHIRXyUo9Kw4ll/BkGLeUd6xCFp37Hpj+W9++6Ml6mbneq86vbuIA0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=dFGehMZE; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dFGehMZE" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755613024; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AyWTWLJP6DW1C6EfE8Dr/2WHmv98BO2UL0YW0p387dM=; b=dFGehMZEqxVgCQDe5Vd93MNvYCH4+Z+nfW+jx/um9YKB2cZ7mRuBAwZ4YaqFxcAtS9/Bus 0ikW09dLG9egTs9hNF2FKYdNZk2n8tKRPoQheRZ14XupMnJgTTdOdf0xXGGK0PUAflABHp s/JV99t7eliT5OMf+o+XGYaFFKqUSdo= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-97-WICi4W53MiWUgZ8wwI-hHQ-1; Tue, 19 Aug 2025 10:17:00 -0400 X-MC-Unique: WICi4W53MiWUgZ8wwI-hHQ-1 X-Mimecast-MFC-AGG-ID: WICi4W53MiWUgZ8wwI-hHQ_1755613012 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com 
(mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9BCDA188280A; Tue, 19 Aug 2025 14:16:50 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4EBB019560BC; Tue, 19 Aug 2025 14:16:28 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com Subject: [PATCH v10 12/13] khugepaged: add per-order mTHP khugepaged stats Date: Tue, 19 Aug 2025 08:16:10 -0600 Message-ID: <20250819141610.626140-1-npache@redhat.com> In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" With mTHP support in place, let's add per-order mTHP stats for collapses that fail because they exceed the NONE, SWAP, or SHARED PTE limits. Signed-off-by: Nico Pache --- Documentation/admin-guide/mm/transhuge.rst | 17 +++++++++++++++++ include/linux/huge_mm.h | 3 +++ mm/huge_memory.c | 7 +++++++ mm/khugepaged.c | 16 +++++++++++++--- 4 files changed, 40 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/adm= in-guide/mm/transhuge.rst index 7ccb93e22852..b85547ac4fe9 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -705,6 +705,23 @@ nr_anon_partially_mapped an anonymous THP as "partially mapped" and count it here, even thou= gh it is not actually partially mapped anymore. =20 +collapse_exceed_swap_pte + The number of anonymous THP which contain at least one swap PTE. + Currently khugepaged does not support collapsing mTHP regions that + contain a swap PTE. + +collapse_exceed_none_pte + The number of anonymous THP which have exceeded the none PTE thresh= old. + With mTHP collapse, a bitmap is used to gather the state of a PMD r= egion + and is then recursively checked from largest to smallest order agai= nst + the scaled max_ptes_none count. This counter indicates that the next + enabled order will be checked. + +collapse_exceed_shared_pte + The number of anonymous THP which contain at least one shared PTE. + Currently khugepaged does not support collapsing mTHP regions that + contain a shared PTE. + As the system ages, allocating huge pages may be expensive as the system uses memory compaction to copy data around memory to free a huge page for use. 
There are some counters in ``/proc/vmstat`` to help diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4ada5d1f7297..6f1593d0b4b5 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -144,6 +144,9 @@ enum mthp_stat_item { MTHP_STAT_SPLIT_DEFERRED, MTHP_STAT_NR_ANON, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, + MTHP_STAT_COLLAPSE_EXCEED_SWAP, + MTHP_STAT_COLLAPSE_EXCEED_NONE, + MTHP_STAT_COLLAPSE_EXCEED_SHARED, __MTHP_STAT_COUNT }; =20 diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 20d005c2c61f..9f0470c3e983 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -639,6 +639,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FA= ILED); DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED); DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON); DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALL= Y_MAPPED); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_= SWAP); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_= NONE); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEE= D_SHARED); + =20 static struct attribute *anon_stats_attrs[] =3D { &anon_fault_alloc_attr.attr, @@ -655,6 +659,9 @@ static struct attribute *anon_stats_attrs[] =3D { &split_deferred_attr.attr, &nr_anon_attr.attr, &nr_anon_partially_mapped_attr.attr, + &collapse_exceed_swap_pte_attr.attr, + &collapse_exceed_none_pte_attr.attr, + &collapse_exceed_shared_pte_attr.attr, NULL, }; =20 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c13bc583a368..5a3386043f39 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -594,7 +594,9 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, continue; } else { result =3D SCAN_EXCEED_NONE_PTE; - count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + if (order =3D=3D HPAGE_PMD_ORDER) + count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE); goto out; } } @@ -633,10 
+635,17 @@ static int __collapse_huge_page_isolate(struct vm_are= a_struct *vma, * shared may cause a future higher order collapse on a * rescan of the same range. */ - if (order !=3D HPAGE_PMD_ORDER || (cc->is_khugepaged && - shared > khugepaged_max_ptes_shared)) { + if (order !=3D HPAGE_PMD_ORDER) { + result =3D SCAN_EXCEED_SHARED_PTE; + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED); + goto out; + } + + if (cc->is_khugepaged && + shared > khugepaged_max_ptes_shared) { result =3D SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED); goto out; } } @@ -1084,6 +1093,7 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, * range. */ if (order !=3D HPAGE_PMD_ORDER) { + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP); pte_unmap(pte); mmap_read_unlock(mm); result =3D SCAN_EXCEED_SWAP_PTE; --=20 2.50.1 From nobody Sat Oct 4 06:33:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7560724BD03 for ; Tue, 19 Aug 2025 14:18:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755613106; cv=none; b=HZ0KEpS8+3Q6E8W5G+k1ZQO2kC3XO+cZFBw0eNdWsowVnG2z7rBwOssMhsz3EBCU8XOQ0ISEzs12xL1PVpe50u2EWkOis3vlgc72dUmi4cLBx+H7PcrP+NSaiE1OdkTJKr1aYzml8lBuC/JrPeKSIRlpsbCHKLfr54LsF6hKQt0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755613106; c=relaxed/simple; bh=4MB8zoNyUP9PNYSZUbYxzZb9F2jIAlPMVEyNAZh5fE4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=HTxFuBjgeS+Coi5AKiZLdPltZg7R1GSeJrzAudhYowqI51qIfuiAS/pbrxT61GVKfXzGCmHpR6o1gE9YYMaO4AvsyZTp+IP5t4Np2EsBDTeMRNMD3709MK9OYSkQVsj20thzjFn5vMvjNQhJaMwGV+R6HeAnfZEj7vrMBT16fxk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=PgFjKqMd; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PgFjKqMd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755613103; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KQ5SyoE/MYPNZDWioj0kryq2voWpDfBzmzGzBWhm6as=; b=PgFjKqMdmuH82lmSahnqaGalKTh786HJihW+//jf6oo/QaBQCWfFe+Y52HfCseoH2Yp1bz lfNmH4mGNl087BKIyhIxqVEIfsFUa6Khx/tMV6LGIqxCeNlajGj6UlmeoxDVboXdLBfvG/ eE17oGWFOTbym3LIXPLeM61UieufsEI= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-175-Tt6NCjBUM0KnyyV-_L6s1A-1; Tue, 19 Aug 2025 10:18:18 -0400 X-MC-Unique: Tt6NCjBUM0KnyyV-_L6s1A-1 X-Mimecast-MFC-AGG-ID: Tt6NCjBUM0KnyyV-_L6s1A_1755613094 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest 
SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E835F19775AD; Tue, 19 Aug 2025 14:18:12 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0D8AD180028A; Tue, 19 Aug 2025 14:17:53 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com, Bagas Sanjaya Subject: [PATCH v10 13/13] Documentation: mm: update the admin guide for mTHP collapse Date: Tue, 19 Aug 2025 08:17:42 -0600 Message-ID: <20250819141742.626517-1-npache@redhat.com> In-Reply-To: <20250819134205.622806-1-npache@redhat.com> References: <20250819134205.622806-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Now that we can collapse to mTHPs, let's update the admin guide to 
reflect these changes and provide proper guidance on how to use it. Reviewed-by: Bagas Sanjaya Signed-off-by: Nico Pache --- Documentation/admin-guide/mm/transhuge.rst | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/adm= in-guide/mm/transhuge.rst index b85547ac4fe9..1f9e6a32052c 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -63,7 +63,7 @@ often. THP can be enabled system wide or restricted to certain tasks or even memory ranges inside task's address space. Unless THP is completely disabled, there is ``khugepaged`` daemon that scans memory and -collapses sequences of basic pages into PMD-sized huge pages. +collapses sequences of basic pages into huge pages. =20 The THP behaviour is controlled via :ref:`sysfs ` interface and using madvise(2) and prctl(2) system calls. @@ -149,6 +149,18 @@ hugepage sizes have enabled=3D"never". If enabling mul= tiple hugepage sizes, the kernel will select the most appropriate enabled size for a given allocation. =20 +khugepaged uses max_ptes_none scaled to the order of the enabled mTHP size +to determine collapses. When using mTHPs it's recommended to set +max_ptes_none low-- ideally less than HPAGE_PMD_NR / 2 (255 on 4k page +size). This will prevent undesired "creep" behavior that leads to +continuously collapsing to the largest mTHP size; when we collapse, we are +bringing in new non-zero pages that will, on a subsequent scan, cause the +max_ptes_none check of the +1 order to always be satisfied. By limiting +this to less than half the current order, we make sure we don't cause this +feedback loop. max_ptes_shared and max_ptes_swap have no effect when +collapsing to a mTHP, and mTHP collapse will fail on shared or swapped out +pages. 
+ It's also possible to limit defrag efforts in the VM to generate anonymous hugepages in case they're not immediately free to madvise regions or to never try to defrag memory and simply fallback to regular @@ -264,11 +276,6 @@ support the following arguments:: Khugepaged controls ------------------- =20 -.. note:: - khugepaged currently only searches for opportunities to collapse to - PMD-sized THP and no attempt is made to collapse to other THP - sizes. - khugepaged runs usually at low frequency so while one may not want to invoke defrag algorithms synchronously during the page faults, it should be worth invoking defrag at least in khugepaged. However it's --=20 2.50.1
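The max_ptes_none guidance in the documentation hunk above can be checked with a little arithmetic. What follows is a userspace sketch, not the kernel implementation. It assumes the per-order limit is computed as max_ptes_none >> (HPAGE_PMD_ORDER - order) (the documentation text only says the value is scaled to the order), assumes 4K base pages (HPAGE_PMD_ORDER = 9, HPAGE_PMD_NR = 512), and uses hypothetical helper names. It models a region of a given order whose lower half was just collapsed, so the upper half is still all none PTEs, and asks whether the next scan would accept the whole region.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K base pages, so a PMD covers 2^9 = 512 PTEs. */
#define HPAGE_PMD_ORDER 9

/*
 * Assumed scaling rule (not stated in the documentation text): the
 * PMD-level max_ptes_none shifted down to the size of the given order.
 */
static unsigned int scaled_max_ptes_none(unsigned int max_ptes_none,
					 unsigned int order)
{
	assert(order <= HPAGE_PMD_ORDER);
	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
}

/*
 * Hypothetical model of "creep": half of an order-N region is populated
 * (for example by an earlier order N-1 collapse) and the other half is
 * none. Returns true if that region would pass the none-PTE check.
 */
static bool half_full_region_collapses(unsigned int max_ptes_none,
				       unsigned int order)
{
	assert(order >= 1);
	unsigned int none_ptes = 1U << (order - 1);

	return none_ptes <= scaled_max_ptes_none(max_ptes_none, order);
}
```

Under these assumptions, max_ptes_none = 256 lets a half-populated region pass at every order, so each collapse makes the next larger order eligible on a later scan. With max_ptes_none = 255 the same check fails at every order, which is consistent with the recommendation to stay below HPAGE_PMD_NR / 2.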