From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C977279904 for ; Mon, 28 Apr 2025 18:13:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745863986; cv=none; b=PO/rt1HzeOSwPj24MbIs+p06sSyjHCS2JdH1tQUDRQye1pXcLvRhco7mgRDjB2WYZRsaKsHxO5pgkDXi4QSYeBlEyImUD6X347ZhxclbtgpdYavRaBXGClfEoNEgVzI0yvc/uI1czfRr6qPqAXmmA6AFlkGIwBIq9yikSLfgXwI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745863986; c=relaxed/simple; bh=CsgR5+bMGchqL9CAoNmcvRJFLa4yTBHYcPTEmqHGhSQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GoooXwhciV8d4oFYJRBZLuLyq9Pdk01CGzFiEZOXljobSjT2gNuPP4kzCJn+s1ADOS4moInCgYoRKOHIcbfDLi3MleilpJkDcnamUTh+fTPVi0HocIQ4MAUIBy32zqDEcvR1S+/cbDwn1GpQO4fBmud1QAXfzoPhMRZQKPraebA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=d3Hz0I6Y; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="d3Hz0I6Y" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745863983; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uLQRlgJj5XW7yzC0aMlBZ48qrkE0W51VjOc5G9NcLLA=; b=d3Hz0I6YrARx2c1O0/UiL6VykTZeVb44fdxDMqfQfYV6rOwJrKh50+Fm6y0Mmgo15c4b4o +ncgCIqFghqDzCbefagjxyn+WBHtjso451CyKcrzNMIgaa3NAA8EIpvXwqaJ45IrRzJgno Ita1hVsgtqnUfLFdzy4Ud2dVVIeRXUM= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-410-PjeTqJ5zPVKoivghpWlzew-1; Mon, 28 Apr 2025 14:13:00 -0400 X-MC-Unique: PjeTqJ5zPVKoivghpWlzew-1 X-Mimecast-MFC-AGG-ID: PjeTqJ5zPVKoivghpWlzew_1745863976 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id ED298180036D; Mon, 28 Apr 2025 18:12:54 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 68DB418009BC; Mon, 28 Apr 2025 18:12:46 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 01/12] khugepaged: rename hpage_collapse_* to khugepaged_* Date: Mon, 28 Apr 2025 12:12:07 -0600 Message-ID: <20250428181218.85925-2-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" functions in khugepaged.c use a mix of hpage_collapse and khugepaged as the function prefix. rename all of them to khugepaged to keep things consistent and slightly shorten the function names. Signed-off-by: Nico Pache Reviewed-by: Baolin Wang --- mm/khugepaged.c | 44 ++++++++++++++++++++++---------------------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 5cf204ab6af0..8b96274dce07 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -402,14 +402,14 @@ void __init khugepaged_destroy(void) kmem_cache_destroy(mm_slot_cache); } =20 -static inline int hpage_collapse_test_exit(struct mm_struct *mm) +static inline int khugepaged_test_exit(struct mm_struct *mm) { return atomic_read(&mm->mm_users) =3D=3D 0; } =20 -static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm) +static inline int khugepaged_test_exit_or_disable(struct mm_struct *mm) { - return hpage_collapse_test_exit(mm) || + return khugepaged_test_exit(mm) || test_bit(MMF_DISABLE_THP, &mm->flags); } =20 @@ -444,7 +444,7 @@ void __khugepaged_enter(struct mm_struct *mm) int wakeup; =20 /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm); + VM_BUG_ON_MM(khugepaged_test_exit(mm), mm); if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) return; =20 @@ -503,7 +503,7 @@ void __khugepaged_exit(struct mm_struct *mm) } else if (mm_slot) { /* * This is required to serialize against - * hpage_collapse_test_exit() (which is guaranteed to run + * khugepaged_test_exit() (which is guaranteed to run * under mmap sem read mode). Stop here (after we return all * pagetables will be destroyed) until khugepaged has finished * working on the pagetables under the mmap_lock. @@ -851,7 +851,7 @@ struct collapse_control khugepaged_collapse_control =3D= { .is_khugepaged =3D true, }; =20 -static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc) +static bool khugepaged_scan_abort(int nid, struct collapse_control *cc) { int i; =20 @@ -886,7 +886,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(v= oid) } =20 #ifdef CONFIG_NUMA -static int hpage_collapse_find_target_node(struct collapse_control *cc) +static int khugepaged_find_target_node(struct collapse_control *cc) { int nid, target_node =3D 0, max_value =3D 0; =20 @@ -905,7 +905,7 @@ static int hpage_collapse_find_target_node(struct colla= pse_control *cc) return target_node; } #else -static int hpage_collapse_find_target_node(struct collapse_control *cc) +static int khugepaged_find_target_node(struct collapse_control *cc) { return 0; } @@ -925,7 +925,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm= , unsigned long address, struct vm_area_struct *vma; unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; =20 - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) return SCAN_ANY_PROCESS; =20 *vmap =3D vma =3D find_vma(mm, address); @@ -992,7 +992,7 @@ static int check_pmd_still_valid(struct mm_struct *mm, =20 /* * Bring missing pages in from swap, to complete THP collapse. - * Only done if hpage_collapse_scan_pmd believes it is worthwhile. + * Only done if khugepaged_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held. * Returns result: if not SCAN_SUCCEED, mmap_lock has been released. @@ -1078,7 +1078,7 @@ static int alloc_charge_folio(struct folio **foliop, = struct mm_struct *mm, { gfp_t gfp =3D (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE); - int node =3D hpage_collapse_find_target_node(cc); + int node =3D khugepaged_find_target_node(cc); struct folio *folio; =20 folio =3D __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask); @@ -1264,7 +1264,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, return result; } =20 -static int hpage_collapse_scan_pmd(struct mm_struct *mm, +static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, bool *mmap_locked, struct collapse_control *cc) @@ -1378,7 +1378,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *= mm, * hit record. */ node =3D folio_nid(folio); - if (hpage_collapse_scan_abort(node, cc)) { + if (khugepaged_scan_abort(node, cc)) { result =3D SCAN_SCAN_ABORT; goto out_unmap; } @@ -1447,7 +1447,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot= *mm_slot) =20 lockdep_assert_held(&khugepaged_mm_lock); =20 - if (hpage_collapse_test_exit(mm)) { + if (khugepaged_test_exit(mm)) { /* free mm_slot */ hash_del(&slot->hash); list_del(&slot->mm_node); @@ -1742,7 +1742,7 @@ static void retract_page_tables(struct address_space = *mapping, pgoff_t pgoff) if (find_pmd_or_thp_or_none(mm, addr, &pmd) !=3D SCAN_SUCCEED) continue; =20 - if (hpage_collapse_test_exit(mm)) + if (khugepaged_test_exit(mm)) continue; /* * When a vma is registered with uffd-wp, we cannot recycle @@ -2264,7 +2264,7 @@ static int collapse_file(struct mm_struct *mm, unsign= ed long addr, return result; } =20 -static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long ad= dr, +static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr, struct file *file, pgoff_t start, struct collapse_control *cc) { @@ -2309,7 +2309,7 @@ static int hpage_collapse_scan_file(struct mm_struct = *mm, unsigned long addr, } =20 node =3D folio_nid(folio); - if (hpage_collapse_scan_abort(node, cc)) { + if (khugepaged_scan_abort(node, cc)) { result =3D SCAN_SCAN_ABORT; break; } @@ -2355,7 +2355,7 @@ static int hpage_collapse_scan_file(struct mm_struct = *mm, unsigned long addr, return result; } #else -static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long ad= dr, +static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr, struct file *file, pgoff_t start, struct collapse_control *cc) { @@ -2401,7 +2401,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, goto breakouterloop_mmap_lock; =20 progress++; - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) goto breakouterloop; =20 vma_iter_init(&vmi, mm, khugepaged_scan.address); @@ -2409,7 +2409,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, unsigned long hstart, hend; =20 cond_resched(); - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) { + if (unlikely(khugepaged_test_exit_or_disable(mm))) { progress++; break; } @@ -2431,7 +2431,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, bool mmap_locked =3D true; =20 cond_resched(); - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) goto breakouterloop; =20 VM_BUG_ON(khugepaged_scan.address < hstart || @@ -2491,7 +2491,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, * Release the current mm_slot if this mm is about to die, or * if we scanned all vmas of this mm. */ - if (hpage_collapse_test_exit(mm) || !vma) { + if (khugepaged_test_exit(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6AE19296153 for ; Mon, 28 Apr 2025 18:13:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745863992; cv=none; b=nVfHk2EUSzghPgdtr59xPIXFmZQAZfASYddJWCv9MfpPHcabi/RBBNvAzEJOgAOYxTxHu+gkwiKnRKG3Mc3py1q7X5tGvnV44pHmWLw9DUedg6LNs7xjzmqgA5IEpC3lzS+45l2ozkxdRHbnzLfabm+dbh7KZyhVm4Yih2qX1j8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745863992; c=relaxed/simple; bh=4y3u3E6kmJcuvWmYXSdpFo88/qAfRUGFIZf1ieRpRTE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aiy3Lp6bt+HeF+vl7C5kXVO4PiZJTKTkvUezjy0S7VGBzz5GCE42PgKKS8WBK8DVO9LrG0hN5EMhoNPg+M00jJ4+RFmxhsOEpTaBqYrn1qflyW93RT4NIo6+4r60l2+e6XYbKit7WIdv6GMO2vgMp3WuqdFdmeYvCCdt1GZn+4c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=emZk9mNq; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="emZk9mNq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745863989; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aqTWRAPjFtm5XSG33SN/WEH03Q2ejePE9J6ReQyF6bM=; b=emZk9mNqTGT4n0dj2xOFAjN3m9b617II2hr2wAJblkln3OdnE6ERpbhxc6MPqvEQYUB+SU /Ipm0Y5qb5FuWqoOy5pvREyOOpzZeL9bnBKkHeC7pOLzdgLumc+EzKbTEPfzjZ/S4JPAwH IkcKR6PSIxCEQHa7grr67pjxOQlqeWw= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-587-9xK0tfCmMY2VizAPEflFhg-1; Mon, 28 Apr 2025 14:13:07 -0400 X-MC-Unique: 9xK0tfCmMY2VizAPEflFhg-1 X-Mimecast-MFC-AGG-ID: 9xK0tfCmMY2VizAPEflFhg_1745863984 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 47726195608C; Mon, 28 Apr 2025 18:13:03 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 5889918001DA; Mon, 28 Apr 2025 18:12:55 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 02/12] introduce khugepaged_collapse_single_pmd to unify khugepaged and madvise_collapse Date: Mon, 28 Apr 2025 12:12:08 -0600 Message-ID: <20250428181218.85925-3-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" The khugepaged daemon and madvise_collapse have two different implementations that do almost the same thing. Create khugepaged_collapse_single_pmd to increase code reuse and create an entry point for future khugepaged changes. Refactor madvise_collapse and khugepaged_scan_mm_slot to use the new khugepaged_collapse_single_pmd function. Signed-off-by: Nico Pache Reviewed-by: Baolin Wang --- mm/khugepaged.c | 96 +++++++++++++++++++++++++------------------------ 1 file changed, 49 insertions(+), 47 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 8b96274dce07..3690a6ec5883 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2363,6 +2363,48 @@ static int khugepaged_scan_file(struct mm_struct *mm= , unsigned long addr, } #endif =20 +/* + * Try to collapse a single PMD starting at a PMD aligned addr, and return + * the results. + */ +static int khugepaged_collapse_single_pmd(unsigned long addr, + struct vm_area_struct *vma, bool *mmap_locked, + struct collapse_control *cc) +{ + int result =3D SCAN_FAIL; + struct mm_struct *mm =3D vma->vm_mm; + + if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) { + struct file *file =3D get_file(vma->vm_file); + pgoff_t pgoff =3D linear_page_index(vma, addr); + + mmap_read_unlock(mm); + *mmap_locked =3D false; + result =3D khugepaged_scan_file(mm, addr, file, pgoff, cc); + fput(file); + if (result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { + mmap_read_lock(mm); + *mmap_locked =3D true; + if (khugepaged_test_exit_or_disable(mm)) { + result =3D SCAN_ANY_PROCESS; + goto end; + } + result =3D collapse_pte_mapped_thp(mm, addr, + !cc->is_khugepaged); + if (result =3D=3D SCAN_PMD_MAPPED) + result =3D SCAN_SUCCEED; + mmap_read_unlock(mm); + *mmap_locked =3D false; + } + } else { + result =3D khugepaged_scan_pmd(mm, vma, addr, mmap_locked, cc); + } + if (cc->is_khugepaged && result =3D=3D SCAN_SUCCEED) + ++khugepaged_pages_collapsed; +end: + return result; +} + static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *resul= t, struct collapse_control *cc) __releases(&khugepaged_mm_lock) @@ -2437,34 +2479,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigne= d int pages, int *result, VM_BUG_ON(khugepaged_scan.address < hstart || khugepaged_scan.address + HPAGE_PMD_SIZE > hend); - if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) { - struct file *file =3D get_file(vma->vm_file); - pgoff_t pgoff =3D linear_page_index(vma, - khugepaged_scan.address); - - mmap_read_unlock(mm); - mmap_locked =3D false; - *result =3D hpage_collapse_scan_file(mm, - khugepaged_scan.address, file, pgoff, cc); - fput(file); - if (*result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { - mmap_read_lock(mm); - if (hpage_collapse_test_exit_or_disable(mm)) - goto breakouterloop; - *result =3D collapse_pte_mapped_thp(mm, - khugepaged_scan.address, false); - if (*result =3D=3D SCAN_PMD_MAPPED) - *result =3D SCAN_SUCCEED; - mmap_read_unlock(mm); - } - } else { - *result =3D hpage_collapse_scan_pmd(mm, vma, - khugepaged_scan.address, &mmap_locked, cc); - } - - if (*result =3D=3D SCAN_SUCCEED) - ++khugepaged_pages_collapsed; =20 + *result =3D khugepaged_collapse_single_pmd(khugepaged_scan.address, + vma, &mmap_locked, cc); + /* If we return SCAN_ANY_PROCESS we are holding the mmap_lock */ + if (*result =3D=3D SCAN_ANY_PROCESS) + goto breakouterloop; /* move to next address */ khugepaged_scan.address +=3D HPAGE_PMD_SIZE; progress +=3D HPAGE_PMD_NR; @@ -2783,36 +2803,18 @@ int madvise_collapse(struct vm_area_struct *vma, st= ruct vm_area_struct **prev, mmap_assert_locked(mm); memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); - if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) { - struct file *file =3D get_file(vma->vm_file); - pgoff_t pgoff =3D linear_page_index(vma, addr); =20 - mmap_read_unlock(mm); - mmap_locked =3D false; - result =3D hpage_collapse_scan_file(mm, addr, file, pgoff, - cc); - fput(file); - } else { - result =3D hpage_collapse_scan_pmd(mm, vma, addr, - &mmap_locked, cc); - } + result =3D khugepaged_collapse_single_pmd(addr, vma, &mmap_locked, cc); + if (!mmap_locked) *prev =3D NULL; /* Tell caller we dropped mmap_lock */ =20 -handle_result: switch (result) { case SCAN_SUCCEED: case SCAN_PMD_MAPPED: ++thps; break; case SCAN_PTE_MAPPED_HUGEPAGE: - BUG_ON(mmap_locked); - BUG_ON(*prev); - mmap_read_lock(mm); - result =3D collapse_pte_mapped_thp(mm, addr, true); - mmap_read_unlock(mm); - goto handle_result; - /* Whitelisted set of results where continuing OK */ case SCAN_PMD_NULL: case SCAN_PTE_NON_PRESENT: case SCAN_PTE_UFFD_WP: --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B564A2957A8 for ; Mon, 28 Apr 2025 18:13:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864003; cv=none; b=b3L6dDRd/NKR03M5tEaVZSw7/Wm9LnMZ/ngTr3zoqWUxOhFwrJgfrZbvSKRoepYnTZPMZuCX7FM6BAd+wwWwUhgAGApoHOWELxe1GYc5zL0Fg91UCuAz/rPvyPog05IacLhwrNpHgbQAExiBoWG5e/GYsmDZNTs4lJmSvFNLZzA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864003; c=relaxed/simple; bh=5z+IYGVEcdYg78Uowl2EpuhK5eJp37p2sUw0GqC0quo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aM+SC9ZrGK49Nw2fe9Hq5yfAcmyKrGqYd2G2FBMea/S/uLm1lKM2Q6xS+tB/8EtWh+yMRx8w+p4CQSnKhkrQ4KF3J59ckAjHP3YbKEdMBDNP5u5ZIC45BdNB+luVQXxqxcbpk+xa7H0QeZiq7ZvaIvCqHHJMvI3Jlanfq5ygMVY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=XXtmLw9Z; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XXtmLw9Z" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864000; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ePmLXrUr/o3h7jPNSUCdUrcTkYoB25I2nXP+9bwQ7to=; b=XXtmLw9Z8lV2yXEr6ZO15Wh6FVxrCJqL0gG5l7lathjobQin3SQWOZ6FYyPYxzwridWE8x zDrUYCwEAT1CeX5Tba15kWIUrHxJF4EjwQugimMaIdBVORyKi0xw7ZadcYtLiBwXKgy2hL CHmxEE7XzEDkmdVjsZDBG8pTuaDFc6M= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-86-4VUygtS7OQeFszrIXDtQew-1; Mon, 28 Apr 2025 14:13:18 -0400 X-MC-Unique: 4VUygtS7OQeFszrIXDtQew-1 X-Mimecast-MFC-AGG-ID: 4VUygtS7OQeFszrIXDtQew_1745863992 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B71081956089; Mon, 28 Apr 2025 18:13:11 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id AA927180094B; Mon, 28 Apr 2025 18:13:03 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 03/12] khugepaged: generalize hugepage_vma_revalidate for mTHP support Date: Mon, 28 Apr 2025 12:12:09 -0600 Message-ID: <20250428181218.85925-4-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" For khugepaged to support different mTHP orders, we must generalize this function for arbitrary orders. No functional change in this patch. Reviewed-by: Baolin Wang Co-developed-by: Dev Jain Signed-off-by: Dev Jain Signed-off-by: Nico Pache --- mm/khugepaged.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 3690a6ec5883..ddb805356e05 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -920,7 +920,7 @@ static int khugepaged_find_target_node(struct collapse_= control *cc) static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long add= ress, bool expect_anon, struct vm_area_struct **vmap, - struct collapse_control *cc) + struct collapse_control *cc, int order) { struct vm_area_struct *vma; unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; @@ -932,9 +932,9 @@ static int hugepage_vma_revalidate(struct mm_struct *mm= , unsigned long address, if (!vma) return SCAN_VMA_NULL; =20 - if (!thp_vma_suitable_order(vma, address, PMD_ORDER)) + if (!thp_vma_suitable_order(vma, address, order)) return SCAN_ADDRESS_RANGE; - if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER)) + if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, order)) return SCAN_VMA_CHECK; /* * Anon VMA expected, the address may be unmapped then @@ -1130,7 +1130,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, goto out_nolock; =20 mmap_read_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD= _ORDER); if (result !=3D SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1164,7 +1164,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * mmap_lock. */ mmap_write_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD= _ORDER); if (result !=3D SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -2792,7 +2792,7 @@ int madvise_collapse(struct vm_area_struct *vma, stru= ct vm_area_struct **prev, mmap_read_lock(mm); mmap_locked =3D true; result =3D hugepage_vma_revalidate(mm, addr, false, &vma, - cc); + cc, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) { last_fail =3D result; goto out_nolock; --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C7842973C7 for ; Mon, 28 Apr 2025 18:13:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864010; cv=none; b=cMHBkzdqT3p/TZIV1CExjpj5XHvUmO1R/A2xc4E8LwW+t2jKzm6qPAayNHaV+La3yVEsG1PAOiJtUUyXFluLIR4OqLB/PY5wNBkZ4AsJKUVfuK+aHNwvrbDhEFA9CmUXgPgRymFITvPv0I1ZaF+kqOoHxRo3guNsI1t1XNE+tIU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864010; c=relaxed/simple; bh=izoFedzkTaJOHbhKVxh1hVWz6CnEStoxb12rm/MlOVk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=i2zaXE8hWtEOdpcgmnQFIz4PGFeT7M6ktvFam7ebCo8iSZiyxPWf/JnyeFNLH2J+6fpmzf1Xmc6xyv+uZ08HtH9otbIUUCyuYqgnPwP3G7lBpuVmSoaMwRCtttpRtyEGxwsU5gJVEYXsJXB2zpfZyxrSphHZXJhZapTMWwcGxgA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=LBmJheu/; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="LBmJheu/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864007; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aTX6P2IggYqzg7C0+KpKEa9+57Ib0vSV4YGBosZ11Fk=; b=LBmJheu/qTspjhO/8XGDje/u8oplNaIsf1OtQ6iQUWDs5LUBmDxTevvCgOKbtGYqGy9o3p mee/1z35py+rtgGJkj9CH93zRC/xmMdGLKRoVg+F/JQvGxaK8bDhidumYJVUN1bjef0QUu 9R9sHqabyv5xlDBCzNnTvEzHQ6CEca0= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-591-ve0q8oP_OZCoLPOyLOidkQ-1; Mon, 28 Apr 2025 14:13:24 -0400 X-MC-Unique: ve0q8oP_OZCoLPOyLOidkQ-1 X-Mimecast-MFC-AGG-ID: ve0q8oP_OZCoLPOyLOidkQ_1745864000 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9E778180010A; Mon, 28 Apr 2025 18:13:19 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 07152180045B; Mon, 28 Apr 2025 18:13:11 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 04/12] khugepaged: generalize alloc_charge_folio() Date: Mon, 28 Apr 2025 12:12:10 -0600 Message-ID: <20250428181218.85925-5-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" From: Dev Jain Pass order to alloc_charge_folio() and update mTHP statistics. Reviewed-by: Baolin Wang Co-developed-by: Nico Pache Signed-off-by: Nico Pache Signed-off-by: Dev Jain --- include/linux/huge_mm.h | 2 ++ mm/huge_memory.c | 4 ++++ mm/khugepaged.c | 17 +++++++++++------ 3 files changed, 17 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 2f190c90192d..0bb65bd4e6dd 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -123,6 +123,8 @@ enum mthp_stat_item { MTHP_STAT_ANON_FAULT_ALLOC, MTHP_STAT_ANON_FAULT_FALLBACK, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE, + MTHP_STAT_COLLAPSE_ALLOC, + MTHP_STAT_COLLAPSE_ALLOC_FAILED, MTHP_STAT_ZSWPOUT, MTHP_STAT_SWPIN, MTHP_STAT_SWPIN_FALLBACK, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2780a12b25f0..670b8b7a6738 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -615,6 +615,8 @@ static struct kobj_attribute _name##_attr =3D __ATTR_RO= (_name) DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC); DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK); DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FAL= LBACK_CHARGE); +DEFINE_MTHP_STAT_ATTR(collapse_alloc, MTHP_STAT_COLLAPSE_ALLOC); +DEFINE_MTHP_STAT_ATTR(collapse_alloc_failed, MTHP_STAT_COLLAPSE_ALLOC_FAIL= ED); DEFINE_MTHP_STAT_ATTR(zswpout, MTHP_STAT_ZSWPOUT); DEFINE_MTHP_STAT_ATTR(swpin, MTHP_STAT_SWPIN); DEFINE_MTHP_STAT_ATTR(swpin_fallback, MTHP_STAT_SWPIN_FALLBACK); @@ -680,6 +682,8 @@ static struct attribute *any_stats_attrs[] =3D { #endif &split_attr.attr, &split_failed_attr.attr, + &collapse_alloc_attr.attr, + &collapse_alloc_failed_attr.attr, NULL, }; =20 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index ddb805356e05..16b5b3761f95 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1074,21 +1074,26 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, } =20 static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm, - struct collapse_control *cc) + struct collapse_control *cc, u8 order) { gfp_t gfp =3D (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE); int node =3D khugepaged_find_target_node(cc); struct folio *folio; =20 - folio =3D __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask); + folio =3D __folio_alloc(gfp, order, node, &cc->alloc_nmask); if (!folio) { *foliop =3D NULL; - count_vm_event(THP_COLLAPSE_ALLOC_FAILED); + if (order =3D=3D HPAGE_PMD_ORDER) + count_vm_event(THP_COLLAPSE_ALLOC_FAILED); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_ALLOC_FAILED); return SCAN_ALLOC_HUGE_PAGE_FAIL; } =20 - count_vm_event(THP_COLLAPSE_ALLOC); + if (order =3D=3D HPAGE_PMD_ORDER) + count_vm_event(THP_COLLAPSE_ALLOC); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_ALLOC); + if (unlikely(mem_cgroup_charge(folio, mm, gfp))) { folio_put(folio); *foliop =3D NULL; @@ -1125,7 +1130,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, */ mmap_read_unlock(mm); =20 - result =3D alloc_charge_folio(&folio, mm, cc); + result =3D alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out_nolock; =20 @@ -1849,7 +1854,7 @@ static int collapse_file(struct mm_struct *mm, unsign= ed long addr, VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); =20 - result =3D alloc_charge_folio(&new_folio, mm, cc); + result =3D alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out; =20 --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98076297A5C for ; Mon, 28 Apr 2025 18:13:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864015; cv=none; b=YxwMOPXhU7OUXpSK0WlRy+eX/+c8f9h+o7dV+WbIldIfte2HnYVhLFDrBtoIHkvGj3na3NXgA45mcXOKYp71/obIuHBqIhu4v/ug3B+cGiR3RnyyXKX2qQEG0cuCv2n8gwIuxE+QO1TlYhko4XVwxvRSEu7jl1ih/ngFZqRp5OA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864015; c=relaxed/simple; bh=LrkfjeDAWnDDyXhMOsAM+xMr960P9RGV0QdyzCSZGbA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IpZJYTyMj0Ba+Lo6zq5JWhKCDoebanlVd8C8soEHK9XWuZ4xilR69s81dD2re1wGBI/IfF18sOC3WcSmrnuJgHw4m3ATmI/MdkyTvhGN4C/d81Lu3sNZBZlcbHk5lCZcVA9zCrTDfrYAxf1pdb8Y4uWoZN/kGGysEvTJtMBz2LI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=WVjKr+VB; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WVjKr+VB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864012; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5P5xWMoPPMey88TbUIMFKq0k2NJVFjtZk5Hx8qz+wZs=; b=WVjKr+VB+aVEQiYzrtDyx8C1EgO/OLpmvZf+f6WuN47VPqPLu+j5Pp0wzzUEodBXm7uTWT SlTJyfSjZfO99R/rZvO2kp02CLwbq3wG7B78TC4WouYeUT8gDJbLZzX94q+/GT7+Inl+XP 6OGJpdRXFu7bezncO+L1e+MxI812dzI= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-145-PBSKMQ46N2WAHSPATUlI1Q-1; Mon, 28 Apr 2025 14:13:31 -0400 X-MC-Unique: PBSKMQ46N2WAHSPATUlI1Q-1 X-Mimecast-MFC-AGG-ID: PBSKMQ46N2WAHSPATUlI1Q_1745864007 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D0F06195609D; Mon, 28 Apr 2025 18:13:26 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id E551A180087B; Mon, 28 Apr 2025 18:13:19 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 05/12] khugepaged: generalize __collapse_huge_page_* for mTHP support Date: Mon, 28 Apr 2025 12:12:11 -0600 Message-ID: <20250428181218.85925-6-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" generalize the order of the __collapse_huge_page_* functions to support future mTHP collapse. mTHP collapse can suffer from incosistant behavior, and memory waste "creep". disable swapin and shared support for mTHP collapse. No functional changes in this patch. Reviewed-by: Baolin Wang Co-developed-by: Dev Jain Signed-off-by: Dev Jain Signed-off-by: Nico Pache --- mm/khugepaged.c | 46 ++++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 18 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 16b5b3761f95..e21998a06253 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -565,15 +565,17 @@ static int __collapse_huge_page_isolate(struct vm_are= a_struct *vma, unsigned long address, pte_t *pte, struct collapse_control *cc, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { struct page *page =3D NULL; struct folio *folio =3D NULL; pte_t *_pte; int none_or_zero =3D 0, shared =3D 0, result =3D SCAN_FAIL, referenced = =3D 0; bool writable =3D false; + int scaled_none =3D khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order); =20 - for (_pte =3D pte; _pte < pte + HPAGE_PMD_NR; + for (_pte =3D pte; _pte < pte + (1 << order); _pte++, address +=3D PAGE_SIZE) { pte_t pteval =3D ptep_get(_pte); if (pte_none(pteval) || (pte_present(pteval) && @@ -581,7 +583,7 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, ++none_or_zero; if (!userfaultfd_armed(vma) && (!cc->is_khugepaged || - none_or_zero <=3D khugepaged_max_ptes_none)) { + none_or_zero <=3D scaled_none)) { continue; } else { result =3D SCAN_EXCEED_NONE_PTE; @@ -609,8 +611,8 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, /* See hpage_collapse_scan_pmd(). */ if (folio_maybe_mapped_shared(folio)) { ++shared; - if (cc->is_khugepaged && - shared > khugepaged_max_ptes_shared) { + if (order !=3D HPAGE_PMD_ORDER || (cc->is_khugepaged && + shared > khugepaged_max_ptes_shared)) { result =3D SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out; @@ -711,13 +713,14 @@ static void __collapse_huge_page_copy_succeeded(pte_t= *pte, struct vm_area_struct *vma, unsigned long address, spinlock_t *ptl, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { struct folio *src, *tmp; pte_t *_pte; pte_t pteval; =20 - for (_pte =3D pte; _pte < pte + HPAGE_PMD_NR; + for (_pte =3D pte; _pte < pte + (1 << order); _pte++, address +=3D PAGE_SIZE) { pteval =3D ptep_get(_pte); if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { @@ -764,7 +767,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { spinlock_t *pmd_ptl; =20 @@ -781,7 +785,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, * Release both raw and compound pages isolated * in __collapse_huge_page_isolate. */ - release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist); + release_pte_pages(pte, pte + (1 << order), compound_pagelist); } =20 /* @@ -802,7 +806,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio, pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma, unsigned long address, spinlock_t *ptl, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, u8 order) { unsigned int i; int result =3D SCAN_SUCCEED; @@ -810,7 +814,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct= folio *folio, /* * Copying pages' contents is subject to memory poison at any iteration. */ - for (i =3D 0; i < HPAGE_PMD_NR; i++) { + for (i =3D 0; i < (1 << order); i++) { pte_t pteval =3D ptep_get(pte + i); struct page *page =3D folio_page(folio, i); unsigned long src_addr =3D address + i * PAGE_SIZE; @@ -829,10 +833,10 @@ static int __collapse_huge_page_copy(pte_t *pte, stru= ct folio *folio, =20 if (likely(result =3D=3D SCAN_SUCCEED)) __collapse_huge_page_copy_succeeded(pte, vma, address, ptl, - compound_pagelist); + compound_pagelist, order); else __collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma, - compound_pagelist); + compound_pagelist, order); =20 return result; } @@ -1000,11 +1004,11 @@ static int check_pmd_still_valid(struct mm_struct *= mm, static int __collapse_huge_page_swapin(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd, - int referenced) + int referenced, u8 order) { int swapped_in =3D 0; vm_fault_t ret =3D 0; - unsigned long address, end =3D haddr + (HPAGE_PMD_NR * PAGE_SIZE); + unsigned long address, end =3D haddr + (PAGE_SIZE << order); int result; pte_t *pte =3D NULL; spinlock_t *ptl; @@ -1035,6 +1039,12 @@ static int __collapse_huge_page_swapin(struct mm_str= uct *mm, if (!is_swap_pte(vmf.orig_pte)) continue; =20 + /* Dont swapin for mTHP collapse */ + if (order !=3D HPAGE_PMD_ORDER) { + result =3D SCAN_EXCEED_SWAP_PTE; + goto out; + } + vmf.pte =3D pte; vmf.ptl =3D ptl; ret =3D do_swap_page(&vmf); @@ -1154,7 +1164,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * that case. Continuing to collapse causes inconsistency. */ result =3D __collapse_huge_page_swapin(mm, vma, address, pmd, - referenced); + referenced, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out_nolock; } @@ -1201,7 +1211,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, pte =3D pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); if (pte) { result =3D __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist); + &compound_pagelist, HPAGE_PMD_ORDER); spin_unlock(pte_ptl); } else { result =3D SCAN_PMD_NULL; @@ -1231,7 +1241,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, =20 result =3D __collapse_huge_page_copy(pte, folio, pmd, _pmd, vma, address, pte_ptl, - &compound_pagelist); + &compound_pagelist, HPAGE_PMD_ORDER); pte_unmap(pte); if (unlikely(result !=3D SCAN_SUCCEED)) goto out_up_write; --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 270AD2980A0 for ; Mon, 28 Apr 2025 18:13:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864024; cv=none; b=gtPBRv5W2URRKqHU5aGCu4L53QwI8zGFMCYrqBku8Z2tYFfi6hhnkJ1vzvZ48isCXxiJ2z0jdvILoM/2Vz+cbx6P3eBjMs8H8Nf/SGugaRxgEn92Mmwr357LRgC63ZB2VfHQvGwao/q2DvLma2YZDoTPiFKzr/F9fPdLPHq+5kU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864024; c=relaxed/simple; bh=8fooa8LpMn4Bw/fr1qymLmE032t4Yl0QP3/E06QCPY8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uZNMAJuU3gaIXADN0PjRjmiVL0BC7Ci5RI8B8j6zVr2tIiJsMj2+2a0a0cK7QkE3oubXBcvYrETrfq0MF5wC3AlJaeRffjj2UqreTRUczhthVqrMqJ+zCARYYStuntazn1S8MK79GmG0LvWvcjT8+ZXKiOFWy9K2SrtRH+EWmqY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=QoaYn3IC; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QoaYn3IC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864021; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NWjIg3Fg24s78QL5xrV3h5AP8YC8bz0Iqifh84RizAw=; b=QoaYn3ICVmGzeDNRLYhoKFWh8dEA+jlhs8r14ENucLToQg1cM3DtYDNytKUXSAi3a/FqjC JSgRANHpOHYypE5DEsLoAZDNKLXLJEg3KWdSPkf8A8Nb0J75Rbsc0ry8uTHjI72OD/nDqj boTkZSpjsuMEaZcu6oc1i2DSgt+roWM= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-12-byG1LfqQPmiDTXFuy1js7w-1; Mon, 28 Apr 2025 14:13:39 -0400 X-MC-Unique: byG1LfqQPmiDTXFuy1js7w-1 X-Mimecast-MFC-AGG-ID: byG1LfqQPmiDTXFuy1js7w_1745864014 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1562A1800980; Mon, 28 Apr 2025 18:13:34 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 21A39180045B; Mon, 28 Apr 2025 18:13:26 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 06/12] khugepaged: introduce khugepaged_scan_bitmap for mTHP support Date: Mon, 28 Apr 2025 12:12:12 -0600 Message-ID: <20250428181218.85925-7-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" khugepaged scans anons PMD ranges for potential collapse to a hugepage. To add mTHP support we use this scan to instead record chunks of utilized sections of the PMD. khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap that represents chunks of utilized regions. We can then determine what mTHP size fits best and in the following patch, we set this bitmap while scanning the anon PMD. max_ptes_none is used as a scale to determine how "full" an order must be before being considered for collapse. When attempting to collapse an order that has its order set to "always" lets always collapse to that order in a greedy manner without considering the number of bits set. Signed-off-by: Nico Pache --- include/linux/khugepaged.h | 4 ++ mm/khugepaged.c | 94 ++++++++++++++++++++++++++++++++++---- 2 files changed, 89 insertions(+), 9 deletions(-) diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index 1f46046080f5..18fe6eb5051d 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -1,6 +1,10 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_KHUGEPAGED_H #define _LINUX_KHUGEPAGED_H +#define KHUGEPAGED_MIN_MTHP_ORDER 2 +#define KHUGEPAGED_MIN_MTHP_NR (1<mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER, 0 }; + + while (top >=3D 0) { + state =3D cc->mthp_bitmap_stack[top--]; + order =3D state.order + KHUGEPAGED_MIN_MTHP_ORDER; + offset =3D state.offset; + num_chunks =3D 1 << (state.order); + // Skip mTHP orders that are not enabled + if (!test_bit(order, &enabled_orders)) + goto next; + + // copy the relavant section to a new bitmap + bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset, + MTHP_BITMAP_SIZE); + + bits_set =3D bitmap_weight(cc->mthp_bitmap_temp, num_chunks); + threshold_bits =3D (HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) + >> (HPAGE_PMD_ORDER - state.order); + + //Check if the region is "almost full" based on the threshold + if (bits_set > threshold_bits || is_pmd_only + || test_bit(order, &huge_anon_orders_always)) { + ret =3D collapse_huge_page(mm, address, referenced, unmapped, cc, + mmap_locked, order, offset * KHUGEPAGED_MIN_MTHP_NR); + if (ret =3D=3D SCAN_SUCCEED) { + collapsed +=3D (1 << order); + continue; + } + } + +next: + if (state.order > 0) { + next_order =3D state.order - 1; + mid_offset =3D offset + (num_chunks / 2); + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { next_order, mid_offset }; + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { next_order, offset }; + } + } + return collapsed; +} + static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, bool *mmap_locked, @@ -1445,9 +1523,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, pte_unmap_unlock(pte, ptl); if (result =3D=3D SCAN_SUCCEED) { result =3D collapse_huge_page(mm, address, referenced, - unmapped, cc); - /* collapse_huge_page will return with the mmap_lock released */ - *mmap_locked =3D false; + unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0); } out: trace_mm_khugepaged_scan_pmd(mm, &folio->page, writable, referenced, --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E8912980CF for ; Mon, 28 Apr 2025 18:13:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864033; cv=none; b=pgvcLzoHJ9jvvVR+QMFXy6xdbdexYk7o2+P7MXEWR+WDslnoKPcPeR+AwYK+4d59EUrbQAluwq/K0bWvHAXOf+PfzSdi68BKk7iPCbIZBcL8c4xjA05xtwj0s/M5KBuGGAtOgRlglUT7ABB+5vZYrIfoVMDRvgmgS1n87dQ3CSw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864033; c=relaxed/simple; bh=nEPfstYpgPCgNSTCZrRh0SMGkPb3mKHaRZFuholmEcY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jauB0F5AQq+01vQC2+Tdox8fFXqUmh6ZxX97RF6FeHN91ZjtVaObh8cRvwgHNJGMvSKLm+HgGvk2h3w17RiizAVJgi891ccLhEeru4FMMV5m/sNX7NyUZo4UCGBJrj1b0u1MDfDkqmSrZiL6lImwbSx6R4WIC7rSCulBEbiFkME= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=bWfpQowq; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bWfpQowq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864030; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ADIlaSsBiUSMdw/JIXaCDLJREIXCNHrt4cf85r4FLSE=; b=bWfpQowqYimipcD2CRXw58Guzy5sQNb66YKguikfL2rE30WBUlDOzlPFlvjuh9CXOZdgIy RPesmlCNYHcbGcvm0iMX7ndN8qx4FW57MrpnLdK3v1u1EykbaaR34O2E5YHJC29sQ0Q5o5 iNXRz1b/x+FQtyUqJ3avb8QhwAv4IPo= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-204-hxiZ3u-SOiKem8uraryY4w-1; Mon, 28 Apr 2025 14:13:46 -0400 X-MC-Unique: hxiZ3u-SOiKem8uraryY4w-1 X-Mimecast-MFC-AGG-ID: hxiZ3u-SOiKem8uraryY4w_1745864021 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 687AA19560AB; Mon, 28 Apr 2025 18:13:41 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 5DA55180087B; Mon, 28 Apr 2025 18:13:34 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 07/12] khugepaged: add mTHP support Date: Mon, 28 Apr 2025 12:12:13 -0600 Message-ID: <20250428181218.85925-8-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Introduce the ability for khugepaged to collapse to different mTHP sizes. While scanning PMD ranges for potential collapse candidates, keep track of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If mTHPs are enabled we remove the restriction of max_ptes_none during the scan phase so we dont bailout early and miss potential mTHP candidates. After the scan is complete we will perform binary recursion on the bitmap to determine which mTHP size would be most efficient to collapse to. max_ptes_none will be scaled by the attempted collapse order to determine how full a THP must be to be eligible. If a mTHP collapse is attempted, but contains swapped out, or shared pages, we dont perform the collapse. Signed-off-by: Nico Pache --- mm/khugepaged.c | 125 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 88 insertions(+), 37 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 6e67db86409a..3a846cd70c66 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1136,13 +1136,14 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; - pte_t *pte; + pte_t *pte, mthp_pte; pgtable_t pgtable; struct folio *folio; spinlock_t *pmd_ptl, *pte_ptl; int result =3D SCAN_FAIL; struct vm_area_struct *vma; struct mmu_notifier_range range; + unsigned long _address =3D address + offset * PAGE_SIZE; =20 VM_BUG_ON(address & ~HPAGE_PMD_MASK); =20 @@ -1158,12 +1159,13 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, *mmap_locked =3D false; } =20 - result =3D alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER); + result =3D alloc_charge_folio(&folio, mm, cc, order); if (result !=3D SCAN_SUCCEED) goto out_nolock; =20 mmap_read_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD= _ORDER); + *mmap_locked =3D true; + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, order); if (result !=3D SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1181,13 +1183,14 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, * released when it fails. So we jump out_nolock directly in * that case. Continuing to collapse causes inconsistency. */ - result =3D __collapse_huge_page_swapin(mm, vma, address, pmd, - referenced, HPAGE_PMD_ORDER); + result =3D __collapse_huge_page_swapin(mm, vma, _address, pmd, + referenced, order); if (result !=3D SCAN_SUCCEED) goto out_nolock; } =20 mmap_read_unlock(mm); + *mmap_locked =3D false; /* * Prevent all access to pagetables with the exception of * gup_fast later handled by the ptep_clear_flush and the VM @@ -1197,7 +1200,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * mmap_lock. */ mmap_write_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD= _ORDER); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, order); if (result !=3D SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -1208,11 +1211,12 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, vma_start_write(vma); anon_vma_lock_write(vma->anon_vma); =20 - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address, - address + HPAGE_PMD_SIZE); + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address, + _address + (PAGE_SIZE << order)); mmu_notifier_invalidate_range_start(&range); =20 pmd_ptl =3D pmd_lock(mm, pmd); /* probably unnecessary */ + /* * This removes any huge TLB entry from the CPU so we won't allow * huge and small TLB entries for the same virtual address to @@ -1226,18 +1230,16 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, mmu_notifier_invalidate_range_end(&range); tlb_remove_table_sync_one(); =20 - pte =3D pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); + pte =3D pte_offset_map_lock(mm, &_pmd, _address, &pte_ptl); if (pte) { - result =3D __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist, HPAGE_PMD_ORDER); + result =3D __collapse_huge_page_isolate(vma, _address, pte, cc, + &compound_pagelist, order); spin_unlock(pte_ptl); } else { result =3D SCAN_PMD_NULL; } =20 if (unlikely(result !=3D SCAN_SUCCEED)) { - if (pte) - pte_unmap(pte); spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); /* @@ -1258,9 +1260,8 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, anon_vma_unlock_write(vma->anon_vma); =20 result =3D __collapse_huge_page_copy(pte, folio, pmd, _pmd, - vma, address, pte_ptl, - &compound_pagelist, HPAGE_PMD_ORDER); - pte_unmap(pte); + vma, _address, pte_ptl, + &compound_pagelist, order); if (unlikely(result !=3D SCAN_SUCCEED)) goto out_up_write; =20 @@ -1270,25 +1271,42 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, * write. */ __folio_mark_uptodate(folio); - pgtable =3D pmd_pgtable(_pmd); - - _pmd =3D folio_mk_pmd(folio, vma->vm_page_prot); - _pmd =3D maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); - - spin_lock(pmd_ptl); - BUG_ON(!pmd_none(*pmd)); - folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE); - folio_add_lru_vma(folio, vma); - pgtable_trans_huge_deposit(mm, pmd, pgtable); - set_pmd_at(mm, address, pmd, _pmd); - update_mmu_cache_pmd(vma, address, pmd); - deferred_split_folio(folio, false); - spin_unlock(pmd_ptl); + if (order =3D=3D HPAGE_PMD_ORDER) { + pgtable =3D pmd_pgtable(_pmd); + _pmd =3D folio_mk_pmd(folio, vma->vm_page_prot); + _pmd =3D maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); + + spin_lock(pmd_ptl); + BUG_ON(!pmd_none(*pmd)); + folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + pgtable_trans_huge_deposit(mm, pmd, pgtable); + set_pmd_at(mm, address, pmd, _pmd); + update_mmu_cache_pmd(vma, address, pmd); + deferred_split_folio(folio, false); + spin_unlock(pmd_ptl); + } else { /* mTHP collapse */ + mthp_pte =3D mk_pte(&folio->page, vma->vm_page_prot); + mthp_pte =3D maybe_mkwrite(pte_mkdirty(mthp_pte), vma); + + spin_lock(pmd_ptl); + folio_ref_add(folio, (1 << order) - 1); + folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + set_ptes(vma->vm_mm, _address, pte, mthp_pte, (1 << order)); + update_mmu_cache_range(NULL, vma, _address, pte, (1 << order)); + + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pmd_pgtable(_pmd)); + spin_unlock(pmd_ptl); + } =20 folio =3D NULL; =20 result =3D SCAN_SUCCEED; out_up_write: + if (pte) + pte_unmap(pte); mmap_write_unlock(mm); out_nolock: *mmap_locked =3D false; @@ -1364,31 +1382,58 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, { pmd_t *pmd; pte_t *pte, *_pte; + int i; int result =3D SCAN_FAIL, referenced =3D 0; int none_or_zero =3D 0, shared =3D 0; struct page *page =3D NULL; struct folio *folio =3D NULL; unsigned long _address; + unsigned long enabled_orders; spinlock_t *ptl; int node =3D NUMA_NO_NODE, unmapped =3D 0; + bool is_pmd_only; bool writable =3D false; - + int chunk_none_count =3D 0; + int scaled_none =3D khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - KHUGEP= AGED_MIN_MTHP_ORDER); + unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; VM_BUG_ON(address & ~HPAGE_PMD_MASK); =20 result =3D find_pmd_or_thp_or_none(mm, address, &pmd); if (result !=3D SCAN_SUCCEED) goto out; =20 + bitmap_zero(cc->mthp_bitmap, MAX_MTHP_BITMAP_SIZE); + bitmap_zero(cc->mthp_bitmap_temp, MAX_MTHP_BITMAP_SIZE); memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); + + enabled_orders =3D thp_vma_allowable_orders(vma, vma->vm_flags, + tva_flags, THP_ORDERS_ALL_ANON); + + is_pmd_only =3D (enabled_orders =3D=3D (1 << HPAGE_PMD_ORDER)); + pte =3D pte_offset_map_lock(mm, pmd, address, &ptl); if (!pte) { result =3D SCAN_PMD_NULL; goto out; } =20 - for (_address =3D address, _pte =3D pte; _pte < pte + HPAGE_PMD_NR; - _pte++, _address +=3D PAGE_SIZE) { + for (i =3D 0; i < HPAGE_PMD_NR; i++) { + /* + * we are reading in KHUGEPAGED_MIN_MTHP_NR page chunks. if + * there are pages in this chunk keep track of it in the bitmap + * for mTHP collapsing. + */ + if (i % KHUGEPAGED_MIN_MTHP_NR =3D=3D 0) { + if (chunk_none_count <=3D scaled_none) + bitmap_set(cc->mthp_bitmap, + i / KHUGEPAGED_MIN_MTHP_NR, 1); + + chunk_none_count =3D 0; + } + + _pte =3D pte + i; + _address =3D address + i * PAGE_SIZE; pte_t pteval =3D ptep_get(_pte); if (is_swap_pte(pteval)) { ++unmapped; @@ -1411,10 +1456,11 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, } } if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { + ++chunk_none_count; ++none_or_zero; if (!userfaultfd_armed(vma) && - (!cc->is_khugepaged || - none_or_zero <=3D khugepaged_max_ptes_none)) { + (!cc->is_khugepaged || !is_pmd_only || + none_or_zero <=3D khugepaged_max_ptes_none)) { continue; } else { result =3D SCAN_EXCEED_NONE_PTE; @@ -1510,6 +1556,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, address))) referenced++; } + if (!writable) { result =3D SCAN_PAGE_RO; } else if (cc->is_khugepaged && @@ -1522,8 +1569,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, out_unmap: pte_unmap_unlock(pte, ptl); if (result =3D=3D SCAN_SUCCEED) { - result =3D collapse_huge_page(mm, address, referenced, - unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0); + result =3D khugepaged_scan_bitmap(mm, address, referenced, unmapped, cc, + mmap_locked, enabled_orders); + if (result > 0) + result =3D SCAN_SUCCEED; + else + result =3D SCAN_FAIL; } out: trace_mm_khugepaged_scan_pmd(mm, &folio->page, writable, referenced, --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B0C7F27990B for ; Mon, 28 Apr 2025 18:13:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864039; cv=none; b=WcL3zUTtFmA+c4lk83mfayxO91zQRRwg896CzjIF0J/A4bgxdOOzeS/oWCh7IErjm+6xFo8kR5IMDwe8hNXFXrekpJvs9F8Wk87h/5NL3UciC/rIX1mPJXADHsjXyo6rG2Ibfld4xbajF3YYhHcSdHqJwZZUzF5kAda2rxho27c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864039; c=relaxed/simple; bh=7WQuzYTgoOktNa7BaRZ3rNS0qZfI1nMj9imLpll+iM8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iT6U7g9Gp6gj0duOBbIzygSv6H410dhDwUSDlsCH5TEwYsEvbN1Zw26JBn6rJzkgrQHYMWYMEiTb8kPhPWOuCcDpr6N8UHlM9xuwn3qtTiKFVZXm1VDoFbMArreBC37mnKAELyQCCBW+cXv4SjbqyD1Aa9wJW6Tl7KCrbl54qzg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=M65X4IAr; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="M65X4IAr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864036; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fcQozjNet+vZ+AD/dmAuUysFjwKpw68qmsV5tnNH5oA=; b=M65X4IAryyDsjeZSrqm/WmAAXRoLlIuu3LrRzt869BUB0W2d50DURJGd1LiKC96SkxMaxa OVLV3e3N10DM32+vZ9o3LXKAZpDkvp86teb0PKv+mFWdvmmLcEpEtjk9Thc7zLyCs92Zlw isr6p2JF29SwAWWvTJMNdHCRGIYJe1g= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-466-FS__RpNxNRuCkqrFoFCq5w-1; Mon, 28 Apr 2025 14:13:53 -0400 X-MC-Unique: FS__RpNxNRuCkqrFoFCq5w-1 X-Mimecast-MFC-AGG-ID: FS__RpNxNRuCkqrFoFCq5w_1745864028 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7A17719560AD; Mon, 28 Apr 2025 18:13:48 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id AC2B1180045B; Mon, 28 Apr 2025 18:13:41 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 08/12] khugepaged: skip collapsing mTHP to smaller orders Date: Mon, 28 Apr 2025 12:12:14 -0600 Message-ID: <20250428181218.85925-9-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" khugepaged may try to collapse a mTHP to a smaller mTHP, resulting in some pages being unmapped. Skip these cases until we have a way to check if its ok to collapse to a smaller mTHP size (like in the case of a partially mapped folio). This patch is inspired by Dev Jain's work on khugepaged mTHP support [1]. [1] https://lore.kernel.org/lkml/20241216165105.56185-11-dev.jain@arm.com/ Co-developed-by: Dev Jain Signed-off-by: Dev Jain Signed-off-by: Nico Pache Reviewed-by: Baolin Wang --- mm/khugepaged.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 3a846cd70c66..86d1153ce9e8 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -625,7 +625,12 @@ static int __collapse_huge_page_isolate(struct vm_area= _struct *vma, folio =3D page_folio(page); VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio); =20 - /* See hpage_collapse_scan_pmd(). */ + if (order !=3D HPAGE_PMD_ORDER && folio_order(folio) >=3D order) { + result =3D SCAN_PTE_MAPPED_HUGEPAGE; + goto out; + } + + /* See khugepaged_scan_pmd(). */ if (folio_maybe_mapped_shared(folio)) { ++shared; if (order !=3D HPAGE_PMD_ORDER || (cc->is_khugepaged && --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 533942957BC for ; Mon, 28 Apr 2025 18:14:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864047; cv=none; b=I2OyHlCaF6ZqWqvID86OQ2ole5NsONDUV8ZlVWgv3oSDKSsrflNnapkFiAHjyK7iipEhTzPV8/zf2moLZXkYRVRHghDfC2kG2GWYFoDWZvyFqwZHiI4NH4tCB+JwjLHLCy5WFnAI3nnu4pSW9mIT40Szn7wDmxBasVtI3DH/jOA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864047; c=relaxed/simple; bh=z4qGqVAHteYb356O98TgDOjX3bm35nFfTIwL8OhB43A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FAy3l6+ngXsEcjWk4L4c0/3LsIRaH6xFwJU6ydsKJyYrba917niSPFNIeqE8L3Nl3gyqXPpk3Z32s2Hbx7VeVvAiWuSrEoON879cMP1qyWfIv+RmLBeAclHF2W+H/qas8KHN+Vz0kbfzfqg5JmF2YLJvhPGDOZD0xTamFdfkX6U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=XPIK+v8P; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XPIK+v8P" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864044; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VXX1Q1S6YWWF7541obLgN0IKvQHRe1QaMnMvIKwwN+I=; b=XPIK+v8PrFWu6hNIooNl2WybPrUhPPWJQ4LMTkg7kbD7AuRvLxShrIfRivJDzGVrKJISpA KIPqOM69fqLEXo4ViziL/Z0EVOySxde7UHlBR+LZmODaYdBgl/mVzsaTuu284qbSy9P8fx V7WQyxmGTgYKZwyklRnIFMgVP8yq3oU= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-365-W2x8u2p5MKSMnDVTbh193Q-1; Mon, 28 Apr 2025 14:13:59 -0400 X-MC-Unique: W2x8u2p5MKSMnDVTbh193Q-1 X-Mimecast-MFC-AGG-ID: W2x8u2p5MKSMnDVTbh193Q_1745864036 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A99171955DC5; Mon, 28 Apr 2025 18:13:55 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id BEC2018001DA; Mon, 28 Apr 2025 18:13:48 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 09/12] khugepaged: avoid unnecessary mTHP collapse attempts Date: Mon, 28 Apr 2025 12:12:15 -0600 Message-ID: <20250428181218.85925-10-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" There are cases where, if an attempted collapse fails, all subsequent orders are guaranteed to also fail. Avoid these collapse attempts by bailing out early. Signed-off-by: Nico Pache --- mm/khugepaged.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 86d1153ce9e8..5e6732cccb86 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1365,6 +1365,23 @@ static int khugepaged_scan_bitmap(struct mm_struct *= mm, unsigned long address, collapsed +=3D (1 << order); continue; } + /* + * Some ret values indicate all lower order will also + * fail, dont trying to collapse smaller orders + */ + if (ret =3D=3D SCAN_EXCEED_NONE_PTE || + ret =3D=3D SCAN_EXCEED_SWAP_PTE || + ret =3D=3D SCAN_EXCEED_SHARED_PTE || + ret =3D=3D SCAN_PTE_NON_PRESENT || + ret =3D=3D SCAN_PTE_UFFD_WP || + ret =3D=3D SCAN_ALLOC_HUGE_PAGE_FAIL || + ret =3D=3D SCAN_CGROUP_CHARGE_FAIL || + ret =3D=3D SCAN_COPY_MC || + ret =3D=3D SCAN_PAGE_LOCK || + ret =3D=3D SCAN_PAGE_COUNT) + goto next; + else + break; } =20 next: --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F9652973A1 for ; Mon, 28 Apr 2025 18:14:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864054; cv=none; b=VRvK8Sm0RuWj05jbtOCp6EW+JsAf2xbuloayvtcDmeIFa1Esa4629wRX74/tFc47e1rl8KqKfBwYlWoQL1OZXqQtEm4YCENExUp7ZDQNfo2OsHwHOcGlg4/iETurr67fsvYpDljdhl20yFtmJY/S/nb73s5sp8StsPDHKdgSy0A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864054; c=relaxed/simple; bh=9Aqr7JXUqvK8SAIIYtakdHbkoRpEkKUaTwuvNHIlP/U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YItmp//eGIJ6YAQgVK9Q+xlQrLxfjUxqZYAGffYqeNSsD0PZ4QHrOa7Er9c34Shhn1UFytgHVOvFeWL3lrHHJ/SD/E5afKSJakm5x7rfh2H9GYAbY7PZ4td1F7tEuOH9yixgocyPv0Jmib3BsH5qx2BbOhGINJyr1TT6/YTMC3U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=dpMSCRXX; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dpMSCRXX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864051; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tkNyY6ZLizAAb0UD2nDjdjdRVOpZgMssw2cie3tgwO8=; b=dpMSCRXXcUPH2n4iWOYkrlg/7VFd6DHBE5sggDwDuOlQ8Oe/oPBrHmtz6iLaVnewAhxcCg ybE0jqzBvieLVNDhOMP4bTH756z7oyReXVBa4thpQatoSlaLHTtGwV2+NbsSx5ky7F1j1S wIXvGuDnSApIRI3ArpoNzYaejpfqXKA= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-64-Jvs5c98tNtOXZtBYdtjSSQ-1; Mon, 28 Apr 2025 14:14:07 -0400 X-MC-Unique: Jvs5c98tNtOXZtBYdtjSSQ-1 X-Mimecast-MFC-AGG-ID: Jvs5c98tNtOXZtBYdtjSSQ_1745864043 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0810D1801A0D; Mon, 28 Apr 2025 18:14:03 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id ED0EB180045B; Mon, 28 Apr 2025 18:13:55 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 10/12] khugepaged: improve tracepoints for mTHP orders Date: Mon, 28 Apr 2025 12:12:16 -0600 Message-ID: <20250428181218.85925-11-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Add the order to the tracepoints to give better insight into what order is being operated at for khugepaged. Reviewed-by: Baolin Wang Signed-off-by: Nico Pache --- include/trace/events/huge_memory.h | 34 +++++++++++++++++++----------- mm/khugepaged.c | 10 +++++---- 2 files changed, 28 insertions(+), 16 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge= _memory.h index 9d5c00b0285c..ea2fe20a39f5 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -92,34 +92,37 @@ TRACE_EVENT(mm_khugepaged_scan_pmd, =20 TRACE_EVENT(mm_collapse_huge_page, =20 - TP_PROTO(struct mm_struct *mm, int isolated, int status), + TP_PROTO(struct mm_struct *mm, int isolated, int status, int order), =20 - TP_ARGS(mm, isolated, status), + TP_ARGS(mm, isolated, status, order), =20 TP_STRUCT__entry( __field(struct mm_struct *, mm) __field(int, isolated) __field(int, status) + __field(int, order) ), =20 TP_fast_assign( __entry->mm =3D mm; __entry->isolated =3D isolated; __entry->status =3D status; + __entry->order =3D order; ), =20 - TP_printk("mm=3D%p, isolated=3D%d, status=3D%s", + TP_printk("mm=3D%p, isolated=3D%d, status=3D%s order=3D%d", __entry->mm, __entry->isolated, - __print_symbolic(__entry->status, SCAN_STATUS)) + __print_symbolic(__entry->status, SCAN_STATUS), + __entry->order) ); =20 TRACE_EVENT(mm_collapse_huge_page_isolate, =20 TP_PROTO(struct page *page, int none_or_zero, - int referenced, bool writable, int status), + int referenced, bool writable, int status, int order), =20 - TP_ARGS(page, none_or_zero, referenced, writable, status), + TP_ARGS(page, none_or_zero, referenced, writable, status, order), =20 TP_STRUCT__entry( __field(unsigned long, pfn) @@ -127,6 +130,7 @@ TRACE_EVENT(mm_collapse_huge_page_isolate, __field(int, referenced) __field(bool, writable) __field(int, status) + __field(int, order) ), =20 TP_fast_assign( @@ -135,27 +139,31 @@ TRACE_EVENT(mm_collapse_huge_page_isolate, __entry->referenced =3D referenced; __entry->writable =3D writable; __entry->status =3D status; + __entry->order =3D order; ), =20 - TP_printk("scan_pfn=3D0x%lx, none_or_zero=3D%d, referenced=3D%d, writable= =3D%d, status=3D%s", + TP_printk("scan_pfn=3D0x%lx, none_or_zero=3D%d, referenced=3D%d, writable= =3D%d, status=3D%s order=3D%d", __entry->pfn, __entry->none_or_zero, __entry->referenced, __entry->writable, - __print_symbolic(__entry->status, SCAN_STATUS)) + __print_symbolic(__entry->status, SCAN_STATUS), + __entry->order) ); =20 TRACE_EVENT(mm_collapse_huge_page_swapin, =20 - TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret), + TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret, + int order), =20 - TP_ARGS(mm, swapped_in, referenced, ret), + TP_ARGS(mm, swapped_in, referenced, ret, order), =20 TP_STRUCT__entry( __field(struct mm_struct *, mm) __field(int, swapped_in) __field(int, referenced) __field(int, ret) + __field(int, order) ), =20 TP_fast_assign( @@ -163,13 +171,15 @@ TRACE_EVENT(mm_collapse_huge_page_swapin, __entry->swapped_in =3D swapped_in; __entry->referenced =3D referenced; __entry->ret =3D ret; + __entry->order =3D order; ), =20 - TP_printk("mm=3D%p, swapped_in=3D%d, referenced=3D%d, ret=3D%d", + TP_printk("mm=3D%p, swapped_in=3D%d, referenced=3D%d, ret=3D%d, order=3D%= d", __entry->mm, __entry->swapped_in, __entry->referenced, - __entry->ret) + __entry->ret, + __entry->order) ); =20 TRACE_EVENT(mm_khugepaged_scan_file, diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 5e6732cccb86..1d99ef184c3f 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -721,13 +721,14 @@ static int __collapse_huge_page_isolate(struct vm_are= a_struct *vma, } else { result =3D SCAN_SUCCEED; trace_mm_collapse_huge_page_isolate(&folio->page, none_or_zero, - referenced, writable, result); + referenced, writable, result, + order); return result; } out: release_pte_pages(pte, _pte, compound_pagelist); trace_mm_collapse_huge_page_isolate(&folio->page, none_or_zero, - referenced, writable, result); + referenced, writable, result, order); return result; } =20 @@ -1097,7 +1098,8 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, =20 result =3D SCAN_SUCCEED; out: - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result); + trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result, + order); return result; } =20 @@ -1317,7 +1319,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, *mmap_locked =3D false; if (folio) folio_put(folio); - trace_mm_collapse_huge_page(mm, result =3D=3D SCAN_SUCCEED, result); + trace_mm_collapse_huge_page(mm, result =3D=3D SCAN_SUCCEED, result, order= ); return result; } =20 --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B28DB2973A1 for ; Mon, 28 Apr 2025 18:14:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864063; cv=none; b=TLyT6HfIZI5JmmsI7ysnTpPJlTuDHKNYGcHolCGwcrr8Tq/tNHsdCUusU6vqlDxV3bqPX9fOdJZ2J70gBf5r1XW9VXQEJmkt99cW7n73fxR6kalCGiTuH9aqNwC4T7r9OtdQq0qNS4GLx4fn2+caQDj+dYEu0Vgu/D1GPBn0O+U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864063; c=relaxed/simple; bh=Sl2zw7u9yvF4MQTFaJy9C+4e0zDrd2e8ActGDXqUnik=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uf5BAru2GYl6MsEgSzPR9mI9hOzaOZ2zbU0jI2DVw6Jt/aKg9sWr+2C2KgRgKt3BqdBlQUep18oW6HQJWtOO/TjViGdiM7c5UgVojnt/XLahUTFtGnHU1XL3166bXpj1S8z7ppJAYUQS6zkHF/IwGBcRUW2XXXjsNHM8BSZuBtc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Aniy5Lkq; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Aniy5Lkq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864059; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lUK4EJo1p9yOMVm4PqIIKoGumhQqGfO0Cgj3FXFL3EE=; b=Aniy5LkqgEU6Wt8/yK/jcPvflMebaf9elV7ViI7ZfLlcTJ2UtEbIt7klprqtlB9YzW65YQ KwtfkBR3/dbqr24+nPqsB+nLjf8rDLt5eN96eHe3tTZrMqisF4Sk3Tu6HbbsTU273+sqb8 OIpI3kAjFHv0mxNsWEADs0Qu0Nj/NZM= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-331-mba8hnT7MXae9DNdrfiMLw-1; Mon, 28 Apr 2025 14:14:16 -0400 X-MC-Unique: mba8hnT7MXae9DNdrfiMLw-1 X-Mimecast-MFC-AGG-ID: mba8hnT7MXae9DNdrfiMLw_1745864051 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id CDC72180048E; Mon, 28 Apr 2025 18:14:10 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 4E387180087B; Mon, 28 Apr 2025 18:14:03 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 11/12] khugepaged: add per-order mTHP khugepaged stats Date: Mon, 28 Apr 2025 12:12:17 -0600 Message-ID: <20250428181218.85925-12-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" With mTHP support inplace, let add the per-order mTHP stats for exceeding NONE, SWAP, and SHARED. Signed-off-by: Nico Pache --- include/linux/huge_mm.h | 3 +++ mm/huge_memory.c | 7 +++++++ mm/khugepaged.c | 16 +++++++++++++--- 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0bb65bd4e6dd..e3d15c737008 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -139,6 +139,9 @@ enum mthp_stat_item { MTHP_STAT_SPLIT_DEFERRED, MTHP_STAT_NR_ANON, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, + MTHP_STAT_COLLAPSE_EXCEED_SWAP, + MTHP_STAT_COLLAPSE_EXCEED_NONE, + MTHP_STAT_COLLAPSE_EXCEED_SHARED, __MTHP_STAT_COUNT }; =20 diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 670b8b7a6738..8af5caa0d9bc 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -633,6 +633,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FA= ILED); DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED); DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON); DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALL= Y_MAPPED); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_= SWAP); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_= NONE); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEE= D_SHARED); + =20 static struct attribute *anon_stats_attrs[] =3D { &anon_fault_alloc_attr.attr, @@ -649,6 +653,9 @@ static struct attribute *anon_stats_attrs[] =3D { &split_deferred_attr.attr, &nr_anon_attr.attr, &nr_anon_partially_mapped_attr.attr, + &collapse_exceed_swap_pte_attr.attr, + &collapse_exceed_none_pte_attr.attr, + &collapse_exceed_shared_pte_attr.attr, NULL, }; =20 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 1d99ef184c3f..812181354c46 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -604,7 +604,10 @@ static int __collapse_huge_page_isolate(struct vm_area= _struct *vma, continue; } else { result =3D SCAN_EXCEED_NONE_PTE; - count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + if (order =3D=3D HPAGE_PMD_ORDER) + count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + else + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE); goto out; } } @@ -633,8 +636,14 @@ static int __collapse_huge_page_isolate(struct vm_area= _struct *vma, /* See khugepaged_scan_pmd(). */ if (folio_maybe_mapped_shared(folio)) { ++shared; - if (order !=3D HPAGE_PMD_ORDER || (cc->is_khugepaged && - shared > khugepaged_max_ptes_shared)) { + if (order !=3D HPAGE_PMD_ORDER) { + result =3D SCAN_EXCEED_SHARED_PTE; + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED); + goto out; + } + + if (cc->is_khugepaged && + shared > khugepaged_max_ptes_shared) { result =3D SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out; @@ -1060,6 +1069,7 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, =20 /* Dont swapin for mTHP collapse */ if (order !=3D HPAGE_PMD_ORDER) { + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP); result =3D SCAN_EXCEED_SWAP_PTE; goto out; } --=20 2.48.1 From nobody Sat Feb 7 17:49:08 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C00A2973CD for ; Mon, 28 Apr 2025 18:14:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864071; cv=none; b=Q0U+IccpikaxtzNVE10abypMVPDfgv+4Zo4qM0QDqqkb9E9LJq31J7oh1X2BCsPJyigmiDs2Qu69UVmQeSYd/f4gz3ptVEO3l6Fo1N55Oo2HsVD75Bkesksow8yT8tDxbGd/XNoS3CGHOFBoNvJS6HsS9v2aXRo6FGbOXzYuFnQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745864071; c=relaxed/simple; bh=o/Zn2RjmOaY+Li0T3wfzPh+EvbQjsbJThZMIHe2Is3w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kBoW/EMz487DRyqVmyx3x7ESLRXxUYdKpPOKHsLUYPCx7L4Kzl+GVxLU5o19pRFvPqJ0eXzuK5nYe/icttPaxHzYWgtug/LmSs70ao1dZwzdX+z+iC0nFlaMuwF+LO7Ms4grsPzMMVcJlPIxJ+9Uw/dPYy/QC6rS8hixF6YCcek= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=WjtOGrO1; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WjtOGrO1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1745864068; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=I/+DK6e3bYJDKc79x5D+EtIP/qIBPJHqfvfRzdAZtOY=; b=WjtOGrO1duwqcupF1TZJAq7hEBG89R2yj4Fqn0yk4if4N9L066eORD5ZMAi3reu0sIQgUk D0dnu9F3AxwsKTMfgpk1GlwulmBDEsjc4EtoiQT/iILZD19DNwsBcTOk0cgZ+jdZ73r0AM NZhcpiyjoHc+XTCeBk9dyPa7qR5avX0= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-606-KbfyS2V7PYS2LFXFRWK3mw-1; Mon, 28 Apr 2025 14:14:23 -0400 X-MC-Unique: KbfyS2V7PYS2LFXFRWK3mw-1 X-Mimecast-MFC-AGG-ID: KbfyS2V7PYS2LFXFRWK3mw_1745864059 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 092F3180010A; Mon, 28 Apr 2025 18:14:19 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.65.12]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 1EC2318001DA; Mon, 28 Apr 2025 18:14:10 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com Subject: [PATCH v5 12/12] Documentation: mm: update the admin guide for mTHP collapse Date: Mon, 28 Apr 2025 12:12:18 -0600 Message-ID: <20250428181218.85925-13-npache@redhat.com> In-Reply-To: <20250428181218.85925-1-npache@redhat.com> References: <20250428181218.85925-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Now that we can collapse to mTHPs lets update the admin guide to reflect these changes and provide proper guidence on how to utilize it. Signed-off-by: Nico Pache Reviewed-by: Bagas Sanjaya --- Documentation/admin-guide/mm/transhuge.rst | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/adm= in-guide/mm/transhuge.rst index dff8d5985f0f..5c63fe51b3ad 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -63,7 +63,7 @@ often. THP can be enabled system wide or restricted to certain tasks or even memory ranges inside task's address space. Unless THP is completely disabled, there is ``khugepaged`` daemon that scans memory and -collapses sequences of basic pages into PMD-sized huge pages. +collapses sequences of basic pages into huge pages. =20 The THP behaviour is controlled via :ref:`sysfs ` interface and using madvise(2) and prctl(2) system calls. @@ -144,6 +144,18 @@ hugepage sizes have enabled=3D"never". If enabling mul= tiple hugepage sizes, the kernel will select the most appropriate enabled size for a given allocation. =20 +khugepaged uses max_ptes_none scaled to the order of the enabled mTHP size +to determine collapses. When using mTHPs it's recommended to set +max_ptes_none low-- ideally less than HPAGE_PMD_NR / 2 (255 on 4k page +size). This will prevent undesired "creep" behavior that leads to +continuously collapsing to a larger mTHP size; When we collapse, we are +bringing in new non-zero pages that will, on a subsequent scan, cause the +max_ptes_none check of the +1 order to always be satisfied. By limiting +this to less than half the current order, we make sure we don't cause this +feedback loop. max_ptes_shared and max_ptes_swap have no effect when +collapsing to a mTHP, and mTHP collapse will fail on shared or swapped out +pages. + It's also possible to limit defrag efforts in the VM to generate anonymous hugepages in case they're not immediately free to madvise regions or to never try to defrag memory and simply fallback to regular --=20 2.48.1