From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F1BFC2629D for ; Wed, 2 Jul 2025 05:58:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751435930; cv=none; b=ruKpHjo2lDU8xSVRTrDx38/Eonx1oJ0pMNlEnMTtVVzFrYCYjyL5F6fyHPbz3hqDHEYIpR8u6i/W0fMNqcJihxE4g3UldRr1dLvwTM4475hOyiF9Uv9zEqhBaQLi7IWAcrUSmxkm0Jfd3lQ8o575JOGxEi93293MyXXiQwety6Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751435930; c=relaxed/simple; bh=R26Jc/mPx/BZUAHA0MGmyA+L043UvJUzBB34A6DnEFg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XrvV1MtJ2m+MgD3uRHXe8P1iLfOCFyAa/oKilIN8eso8QVBdJ00nJPT4qHvUpKp0lIxEexKBSshRmb2z7pzd7TxXPp5TjhG6cIX3JsPpGaRTmGbFrnzjBoEzs6sSi6m3+1fQyg5KhJx12rCHSmOhJokL+gE4HiCt/IZREyiCsyE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=BfNLPxxR; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="BfNLPxxR" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751435928; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GO5DwRuc6cmbE2FDiRpD1Jne2FCr3l2QRxq5I5fGCxI=; b=BfNLPxxRkbMGCA8jAsH2ue7IZTWjEfGySASMjjYv20i+TvcToPkc3ngatBP+aI8tavjM3D O6739ce0GB+j2GSmlYSAc84H5D3t5l4TQQUvcCAGP+s/6sSVlnCyyt1SRVkhX+jhVqlF+Q 3drjMsqfcIyq6Kw/aSh94+fSOQPyhbo= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-94-v-99lafQPEC-IWqrYTNa8g-1; Wed, 02 Jul 2025 01:58:44 -0400 X-MC-Unique: v-99lafQPEC-IWqrYTNa8g-1 X-Mimecast-MFC-AGG-ID: v-99lafQPEC-IWqrYTNa8g_1751435920 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 2C2BC1944AA9; Wed, 2 Jul 2025 05:58:39 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D420F1800285; Wed, 2 Jul 2025 05:58:21 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 01/15] khugepaged: rename hpage_collapse_* to khugepaged_* Date: Tue, 1 Jul 2025 23:57:28 -0600 Message-ID: <20250702055742.102808-2-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" functions in khugepaged.c use a mix of hpage_collapse and khugepaged as the function prefix. rename all of them to khugepaged to keep things consistent and slightly shorten the function names. Reviewed-by: Zi Yan Reviewed-by: Baolin Wang Signed-off-by: Nico Pache --- mm/khugepaged.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 1aa7ca67c756..f27fe7ca9b86 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -402,14 +402,14 @@ void __init khugepaged_destroy(void) kmem_cache_destroy(mm_slot_cache); } =20 -static inline int hpage_collapse_test_exit(struct mm_struct *mm) +static inline int khugepaged_test_exit(struct mm_struct *mm) { return atomic_read(&mm->mm_users) =3D=3D 0; } =20 -static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm) +static inline int khugepaged_test_exit_or_disable(struct mm_struct *mm) { - return hpage_collapse_test_exit(mm) || + return khugepaged_test_exit(mm) || test_bit(MMF_DISABLE_THP, &mm->flags); } =20 @@ -444,7 +444,7 @@ void __khugepaged_enter(struct mm_struct *mm) int wakeup; =20 /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm); + VM_BUG_ON_MM(khugepaged_test_exit(mm), mm); if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) return; =20 @@ -503,7 +503,7 @@ void __khugepaged_exit(struct mm_struct *mm) } else if (mm_slot) { /* * This is required to serialize against - * hpage_collapse_test_exit() (which is guaranteed to run + * khugepaged_test_exit() (which is guaranteed to run * under mmap sem read mode). Stop here (after we return all * pagetables will be destroyed) until khugepaged has finished * working on the pagetables under the mmap_lock. @@ -838,7 +838,7 @@ struct collapse_control khugepaged_collapse_control =3D= { .is_khugepaged =3D true, }; =20 -static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc) +static bool khugepaged_scan_abort(int nid, struct collapse_control *cc) { int i; =20 @@ -873,7 +873,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(v= oid) } =20 #ifdef CONFIG_NUMA -static int hpage_collapse_find_target_node(struct collapse_control *cc) +static int khugepaged_find_target_node(struct collapse_control *cc) { int nid, target_node =3D 0, max_value =3D 0; =20 @@ -892,7 +892,7 @@ static int hpage_collapse_find_target_node(struct colla= pse_control *cc) return target_node; } #else -static int hpage_collapse_find_target_node(struct collapse_control *cc) +static int khugepaged_find_target_node(struct collapse_control *cc) { return 0; } @@ -912,7 +912,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm= , unsigned long address, struct vm_area_struct *vma; unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; =20 - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) return SCAN_ANY_PROCESS; =20 *vmap =3D vma =3D find_vma(mm, address); @@ -977,7 +977,7 @@ static int check_pmd_still_valid(struct mm_struct *mm, =20 /* * Bring missing pages in from swap, to complete THP collapse. - * Only done if hpage_collapse_scan_pmd believes it is worthwhile. + * Only done if khugepaged_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held. * Returns result: if not SCAN_SUCCEED, mmap_lock has been released. @@ -1063,7 +1063,7 @@ static int alloc_charge_folio(struct folio **foliop, = struct mm_struct *mm, { gfp_t gfp =3D (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE); - int node =3D hpage_collapse_find_target_node(cc); + int node =3D khugepaged_find_target_node(cc); struct folio *folio; =20 folio =3D __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask); @@ -1249,7 +1249,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, return result; } =20 -static int hpage_collapse_scan_pmd(struct mm_struct *mm, +static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, bool *mmap_locked, struct collapse_control *cc) @@ -1363,7 +1363,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *= mm, * hit record. */ node =3D folio_nid(folio); - if (hpage_collapse_scan_abort(node, cc)) { + if (khugepaged_scan_abort(node, cc)) { result =3D SCAN_SCAN_ABORT; goto out_unmap; } @@ -1432,7 +1432,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot= *mm_slot) =20 lockdep_assert_held(&khugepaged_mm_lock); =20 - if (hpage_collapse_test_exit(mm)) { + if (khugepaged_test_exit(mm)) { /* free mm_slot */ hash_del(&slot->hash); list_del(&slot->mm_node); @@ -1725,7 +1725,7 @@ static void retract_page_tables(struct address_space = *mapping, pgoff_t pgoff) if (find_pmd_or_thp_or_none(mm, addr, &pmd) !=3D SCAN_SUCCEED) continue; =20 - if (hpage_collapse_test_exit(mm)) + if (khugepaged_test_exit(mm)) continue; /* * When a vma is registered with uffd-wp, we cannot recycle @@ -2247,7 +2247,7 @@ static int collapse_file(struct mm_struct *mm, unsign= ed long addr, return result; } =20 -static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long ad= dr, +static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr, struct file *file, pgoff_t start, struct collapse_control *cc) { @@ -2304,7 +2304,7 @@ static int hpage_collapse_scan_file(struct mm_struct = *mm, unsigned long addr, } =20 node =3D folio_nid(folio); - if (hpage_collapse_scan_abort(node, cc)) { + if (khugepaged_scan_abort(node, cc)) { result =3D SCAN_SCAN_ABORT; folio_put(folio); break; @@ -2392,7 +2392,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, goto breakouterloop_mmap_lock; =20 progress++; - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) goto breakouterloop; =20 vma_iter_init(&vmi, mm, khugepaged_scan.address); @@ -2400,7 +2400,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, unsigned long hstart, hend; =20 cond_resched(); - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) { + if (unlikely(khugepaged_test_exit_or_disable(mm))) { progress++; break; } @@ -2422,7 +2422,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, bool mmap_locked =3D true; =20 cond_resched(); - if (unlikely(hpage_collapse_test_exit_or_disable(mm))) + if (unlikely(khugepaged_test_exit_or_disable(mm))) goto breakouterloop; =20 VM_BUG_ON(khugepaged_scan.address < hstart || @@ -2482,7 +2482,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, * Release the current mm_slot if this mm is about to die, or * if we scanned all vmas of this mm. */ - if (hpage_collapse_test_exit(mm) || !vma) { + if (khugepaged_test_exit(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D50351F8722 for ; Wed, 2 Jul 2025 05:59:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751435949; cv=none; b=OfAsg23nVBKWoKyocklPkYUqUjxkQZJCKNXXDP5CeXMTiQY+/yEmS67rE+paEp6FTnqd/o+O1DvrigQc6kJ2bT3c4CpGApZMZo32MnHOhLSihw2k5RebauzRFvnEUTqJmPRrkDjvSDY0oSqTe7NcKgxNndf4YwMTP+T81GmjgJQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751435949; c=relaxed/simple; bh=ej7QtjMFRmKKOvyMP8hAoxz47HgxGbvRh5grnaKsYXY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gCzn/uLtCkNdT4jchuuxgfMyhB7AZ2ipqEDdA8em8LFfdxpPJ1OCwbiKhVwyHkiyI+OV2ECodkyJ1gjkoa/78F7ZRvGGO9mCw/ztSZe89Ykb5Z4vHM/pCewdBNS50fUi0sJz0LNxzjad82M9EzaVHIeHMEQsMSN+8k7VVYsFdyE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=VyVSzx47; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="VyVSzx47" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751435947; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3SKcei+/sNkOhdwaQhaPJzluqvYvNAfrvgdfOhPY5dU=; b=VyVSzx47OonwC28f21h8My6SOB8xJl7C3EhTxm3cKm7Lynl3S/w49YnCyY3nRZUAEgjLAA RO6a54BDueMI1vyYhgmZJ+5E0SolPeo0B5xErrxpM3NUZiGOY6Bx14WMTXd2Nfwny06nOg ZcsFZp4mZB/5coYSFcgkBB2++onSoHw= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-339-rDf9V_ijOHOWfnRk4VAVBg-1; Wed, 02 Jul 2025 01:59:01 -0400 X-MC-Unique: rDf9V_ijOHOWfnRk4VAVBg-1 X-Mimecast-MFC-AGG-ID: rDf9V_ijOHOWfnRk4VAVBg_1751435937 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 07C69195FD30; Wed, 2 Jul 2025 05:58:56 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id CD3581800287; Wed, 2 Jul 2025 05:58:39 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 02/15] introduce khugepaged_collapse_single_pmd to unify khugepaged and madvise_collapse Date: Tue, 1 Jul 2025 23:57:29 -0600 Message-ID: <20250702055742.102808-3-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" The khugepaged daemon and madvise_collapse have two different implementations that do almost the same thing. Create khugepaged_collapse_single_pmd to increase code reuse and create an entry point these two users. Refactor madvise_collapse and khugepaged_scan_mm_slot to use the new khugepaged_collapse_single_pmd function. This introduces a minor behavioral change that is most likely an undiscovered bug. The current implementation of khugepaged tests khugepaged_test_exit_or_disable before calling khugepaged_pte_mapped_thp, but we weren't doing it in the madvise_collapse case. By unifying these two callers madvise_collapse now also performs this check. Reviewed-by: Baolin Wang Signed-off-by: Nico Pache --- mm/khugepaged.c | 95 +++++++++++++++++++++++++------------------------ 1 file changed, 49 insertions(+), 46 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index f27fe7ca9b86..bf69e81a3d82 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2354,6 +2354,50 @@ static int khugepaged_scan_file(struct mm_struct *mm= , unsigned long addr, return result; } =20 +/* + * Try to collapse a single PMD starting at a PMD aligned addr, and return + * the results. + */ +static int khugepaged_collapse_single_pmd(unsigned long addr, + struct vm_area_struct *vma, bool *mmap_locked, + struct collapse_control *cc) +{ + int result =3D SCAN_FAIL; + struct mm_struct *mm =3D vma->vm_mm; + + if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) { + struct file *file =3D get_file(vma->vm_file); + pgoff_t pgoff =3D linear_page_index(vma, addr); + + mmap_read_unlock(mm); + *mmap_locked =3D false; + result =3D khugepaged_scan_file(mm, addr, file, pgoff, cc); + fput(file); + if (result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { + mmap_read_lock(mm); + *mmap_locked =3D true; + if (khugepaged_test_exit_or_disable(mm)) { + mmap_read_unlock(mm); + *mmap_locked =3D false; + result =3D SCAN_ANY_PROCESS; + goto end; + } + result =3D collapse_pte_mapped_thp(mm, addr, + !cc->is_khugepaged); + if (result =3D=3D SCAN_PMD_MAPPED) + result =3D SCAN_SUCCEED; + mmap_read_unlock(mm); + *mmap_locked =3D false; + } + } else { + result =3D khugepaged_scan_pmd(mm, vma, addr, mmap_locked, cc); + } + if (cc->is_khugepaged && result =3D=3D SCAN_SUCCEED) + ++khugepaged_pages_collapsed; +end: + return result; +} + static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *resul= t, struct collapse_control *cc) __releases(&khugepaged_mm_lock) @@ -2428,34 +2472,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned= int pages, int *result, VM_BUG_ON(khugepaged_scan.address < hstart || khugepaged_scan.address + HPAGE_PMD_SIZE > hend); - if (!vma_is_anonymous(vma)) { - struct file *file =3D get_file(vma->vm_file); - pgoff_t pgoff =3D linear_page_index(vma, - khugepaged_scan.address); - - mmap_read_unlock(mm); - mmap_locked =3D false; - *result =3D hpage_collapse_scan_file(mm, - khugepaged_scan.address, file, pgoff, cc); - fput(file); - if (*result =3D=3D SCAN_PTE_MAPPED_HUGEPAGE) { - mmap_read_lock(mm); - if (hpage_collapse_test_exit_or_disable(mm)) - goto breakouterloop; - *result =3D collapse_pte_mapped_thp(mm, - khugepaged_scan.address, false); - if (*result =3D=3D SCAN_PMD_MAPPED) - *result =3D SCAN_SUCCEED; - mmap_read_unlock(mm); - } - } else { - *result =3D hpage_collapse_scan_pmd(mm, vma, - khugepaged_scan.address, &mmap_locked, cc); - } - - if (*result =3D=3D SCAN_SUCCEED) - ++khugepaged_pages_collapsed; =20 + *result =3D khugepaged_collapse_single_pmd(khugepaged_scan.address, + vma, &mmap_locked, cc); /* move to next address */ khugepaged_scan.address +=3D HPAGE_PMD_SIZE; progress +=3D HPAGE_PMD_NR; @@ -2772,35 +2791,19 @@ int madvise_collapse(struct vm_area_struct *vma, un= signed long start, mmap_assert_locked(mm); memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); - if (!vma_is_anonymous(vma)) { - struct file *file =3D get_file(vma->vm_file); - pgoff_t pgoff =3D linear_page_index(vma, addr); =20 - mmap_read_unlock(mm); - mmap_locked =3D false; - result =3D hpage_collapse_scan_file(mm, addr, file, pgoff, - cc); - fput(file); - } else { - result =3D hpage_collapse_scan_pmd(mm, vma, addr, - &mmap_locked, cc); - } + result =3D khugepaged_collapse_single_pmd(addr, vma, &mmap_locked, cc); + if (!mmap_locked) *lock_dropped =3D true; =20 -handle_result: switch (result) { case SCAN_SUCCEED: case SCAN_PMD_MAPPED: ++thps; break; - case SCAN_PTE_MAPPED_HUGEPAGE: - BUG_ON(mmap_locked); - mmap_read_lock(mm); - result =3D collapse_pte_mapped_thp(mm, addr, true); - mmap_read_unlock(mm); - goto handle_result; /* Whitelisted set of results where continuing OK */ + case SCAN_PTE_MAPPED_HUGEPAGE: case SCAN_PMD_NULL: case SCAN_PTE_NON_PRESENT: case SCAN_PTE_UFFD_WP: --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61CB260B8A for ; Wed, 2 Jul 2025 05:59:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751435965; cv=none; b=rqd/rKHpViRjJ7fka8bSKhhcR5A142P1a8Omk2krX0mRpy7SLsK77ERcQYL2ERFWgoY9rCgojjcHBBpo/IFrFpGC9E6qRFR76XKCqdYjUs9cLwpRc1cb/HeY4XclufuWZLiRXMC5Pw3uin9frcsNdn42x10AtAVcyqaKToYkQ2s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751435965; c=relaxed/simple; bh=MTrWvcqTp6JZcwvCGJf0cB4BmVPb2qa14BV/GxhOvUk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UfKSSO+tw7WM63sEL7EKfNFCyZorznHlsiTElAJwp3c1IKgmCEteEGA4vB0EpL0bmsZ6Y98t+EWsp7zagJQhuEE1vOOB6itPEn2ArF9+w72lVtAKbzjOqkmvxXKEFYZiExA7IvCnHcynNS1Qaw+307cti6nTZvvPMvel+v3ipGE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=TuMtK2Iz; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="TuMtK2Iz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751435963; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=u+H9xfARm1u9/s+gCR3ZQc0XlTk0h8GFkMRGzqJcOrk=; b=TuMtK2IzrXYcCpUxgnX3VUwjhzL0r1AvAP2d3d+BXV1iqeSYDX8sekjGD+gfOjPU1nxdxq 5+YbCtqnDIjyqIvjLvpAs6OmFg2Xp8Np3gInG8VJRiXRf+E9c6eSJO0+7GqQjASRJZh/H3 0dtPU6BTLFT4v3Gtv3LRoDXFOUkwVcM= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-38-TMjt8s8HNaSsbb0b_Lkisw-1; Wed, 02 Jul 2025 01:59:18 -0400 X-MC-Unique: TMjt8s8HNaSsbb0b_Lkisw-1 X-Mimecast-MFC-AGG-ID: TMjt8s8HNaSsbb0b_Lkisw_1751435953 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E67AB1800289; Wed, 2 Jul 2025 05:59:12 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B9F6918003FC; Wed, 2 Jul 2025 05:58:56 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 03/15] khugepaged: generalize hugepage_vma_revalidate for mTHP support Date: Tue, 1 Jul 2025 23:57:30 -0600 Message-ID: <20250702055742.102808-4-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" For khugepaged to support different mTHP orders, we must generalize this to check if the PMD is not shared by another VMA and the order is enabled. To ensure madvise_collapse can support working on mTHP orders without the PMD order enabled, we need to convert hugepage_vma_revalidate to take a bitmap of orders. No functional change in this patch. Reviewed-by: Baolin Wang Co-developed-by: Dev Jain Signed-off-by: Dev Jain Signed-off-by: Nico Pache --- mm/khugepaged.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index bf69e81a3d82..7bcd4d280c71 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -907,7 +907,7 @@ static int khugepaged_find_target_node(struct collapse_= control *cc) static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long add= ress, bool expect_anon, struct vm_area_struct **vmap, - struct collapse_control *cc) + struct collapse_control *cc, unsigned long orders) { struct vm_area_struct *vma; unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; @@ -919,9 +919,10 @@ static int hugepage_vma_revalidate(struct mm_struct *m= m, unsigned long address, if (!vma) return SCAN_VMA_NULL; =20 + /* Always check the PMD order to insure its not shared by another VMA */ if (!thp_vma_suitable_order(vma, address, PMD_ORDER)) return SCAN_ADDRESS_RANGE; - if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER)) + if (!thp_vma_allowable_orders(vma, vma->vm_flags, tva_flags, orders)) return SCAN_VMA_CHECK; /* * Anon VMA expected, the address may be unmapped then @@ -1115,7 +1116,8 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, goto out_nolock; =20 mmap_read_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, + BIT(HPAGE_PMD_ORDER)); if (result !=3D SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1149,7 +1151,8 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * mmap_lock. */ mmap_write_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, + BIT(HPAGE_PMD_ORDER)); if (result !=3D SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -2780,7 +2783,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsi= gned long start, mmap_read_lock(mm); mmap_locked =3D true; result =3D hugepage_vma_revalidate(mm, addr, false, &vma, - cc); + cc, BIT(HPAGE_PMD_ORDER)); if (result !=3D SCAN_SUCCEED) { last_fail =3D result; goto out_nolock; --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E721F20D4F9 for ; Wed, 2 Jul 2025 06:02:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436132; cv=none; b=aCyivvOrmJKBlmQk1yRBUithW4XB705WShJgmV8QlmiOV+hxC50ist4j9jKsnPXyxWujYMuSqtAzQIxCtwPX9EBAv1PcotTMKuRblkvwecNYTu5hXJJsreD91C5WNlEoW6luXbtNXQTb5Ady4GfBcWcEFg5j+rK4Fb8Ip8v4yp0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436132; c=relaxed/simple; bh=rgKFchSpRtVlMbrB4t3oG0+Oi78O7UqzVOpvAdmiE7w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LsLiOEss256MAaB/e4QTlvJoJNeloMDPgG1CYtpIiHh4oJMpEz+GyHX35n11ELFKEfF1tRI0iwxUDQmB9gMIm8C2phvEsv9RhtGJLDK0+HugvR0nQhju2mF00iP9ny2vI5aVzguZquNDmAvJJBbzmkBQmnBJkqBe3tcqTkDFY1Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=HtbxzGOu; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HtbxzGOu" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436130; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6UTvDkzjNaf34OHog+LnlSzW82URMLMlLfB/a8vX8sM=; b=HtbxzGOuQU+ddS2iw3zRKw4x02iZ54VCsts0D5/EFSG5OzJPriRswrL/47yCTuK6mV5FQb 2Q1LmNmQwLazXIWdDKzsA/8VoALm/DqrVklyokT5+gCwZddXH9TSIvDBzvaMwlugTwTKll idr87qn+Hdbx77fuM/E3s9wg2cODmP4= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-678-7jOVUjo2OjGxB1sN0MVdhw-1; Wed, 02 Jul 2025 01:59:33 -0400 X-MC-Unique: 7jOVUjo2OjGxB1sN0MVdhw-1 X-Mimecast-MFC-AGG-ID: 7jOVUjo2OjGxB1sN0MVdhw_1751435969 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C1BDA1956096; Wed, 2 Jul 2025 05:59:28 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A06AB180045B; Wed, 2 Jul 2025 05:59:13 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 04/15] khugepaged: generalize alloc_charge_folio() Date: Tue, 1 Jul 2025 23:57:31 -0600 Message-ID: <20250702055742.102808-5-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" From: Dev Jain Pass order to alloc_charge_folio() and update mTHP statistics. Reviewed-by: Baolin Wang Co-developed-by: Nico Pache Signed-off-by: Nico Pache Signed-off-by: Dev Jain --- include/linux/huge_mm.h | 2 ++ mm/huge_memory.c | 4 ++++ mm/khugepaged.c | 17 +++++++++++------ 3 files changed, 17 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4d5bb67dc4ec..a6ea89fdaee6 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -125,6 +125,8 @@ enum mthp_stat_item { MTHP_STAT_ANON_FAULT_ALLOC, MTHP_STAT_ANON_FAULT_FALLBACK, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE, + MTHP_STAT_COLLAPSE_ALLOC, + MTHP_STAT_COLLAPSE_ALLOC_FAILED, MTHP_STAT_ZSWPOUT, MTHP_STAT_SWPIN, MTHP_STAT_SWPIN_FALLBACK, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ce130225a8e5..69777a35e722 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -614,6 +614,8 @@ static struct kobj_attribute _name##_attr =3D __ATTR_RO= (_name) DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC); DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK); DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FAL= LBACK_CHARGE); +DEFINE_MTHP_STAT_ATTR(collapse_alloc, MTHP_STAT_COLLAPSE_ALLOC); +DEFINE_MTHP_STAT_ATTR(collapse_alloc_failed, MTHP_STAT_COLLAPSE_ALLOC_FAIL= ED); DEFINE_MTHP_STAT_ATTR(zswpout, MTHP_STAT_ZSWPOUT); DEFINE_MTHP_STAT_ATTR(swpin, MTHP_STAT_SWPIN); DEFINE_MTHP_STAT_ATTR(swpin_fallback, MTHP_STAT_SWPIN_FALLBACK); @@ -679,6 +681,8 @@ static struct attribute *any_stats_attrs[] =3D { #endif &split_attr.attr, &split_failed_attr.attr, + &collapse_alloc_attr.attr, + &collapse_alloc_failed_attr.attr, NULL, }; =20 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 7bcd4d280c71..5ff4a5bdf5f6 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1060,21 +1060,26 @@ static int __collapse_huge_page_swapin(struct mm_st= ruct *mm, } =20 static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm, - struct collapse_control *cc) + struct collapse_control *cc, u8 order) { gfp_t gfp =3D (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE); int node =3D khugepaged_find_target_node(cc); struct folio *folio; =20 - folio =3D __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask); + folio =3D __folio_alloc(gfp, order, node, &cc->alloc_nmask); if (!folio) { *foliop =3D NULL; - count_vm_event(THP_COLLAPSE_ALLOC_FAILED); + if (order =3D=3D HPAGE_PMD_ORDER) + count_vm_event(THP_COLLAPSE_ALLOC_FAILED); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_ALLOC_FAILED); return SCAN_ALLOC_HUGE_PAGE_FAIL; } =20 - count_vm_event(THP_COLLAPSE_ALLOC); + if (order =3D=3D HPAGE_PMD_ORDER) + count_vm_event(THP_COLLAPSE_ALLOC); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_ALLOC); + if (unlikely(mem_cgroup_charge(folio, mm, gfp))) { folio_put(folio); *foliop =3D NULL; @@ -1111,7 +1116,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, */ mmap_read_unlock(mm); =20 - result =3D alloc_charge_folio(&folio, mm, cc); + result =3D alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out_nolock; =20 @@ -1835,7 +1840,7 @@ static int collapse_file(struct mm_struct *mm, unsign= ed long addr, VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); =20 - result =3D alloc_charge_folio(&new_folio, mm, cc); + result =3D alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out; =20 --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA49972637 for ; Wed, 2 Jul 2025 05:59:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751435993; cv=none; b=E21TNEgM61F1v7v0feqv1b+7eTN1daM0sM3juf8Z81R9O5pJxqjCis/ylKTJvJWtIu0yoXNXfI7TDr1PJt8MV1vw6STNW+qsYDLS7KTbV2TzDAM3FKge8KfodyxbmzkFmjMYDhCoe46Jx9WNqPdQSZY//o3MCoGjZ43tInA6e04= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751435993; c=relaxed/simple; bh=1JRxXv/vT4tubdq8u46J7Pglj5F5ADS+7/ZJ18EuI64=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=l4uhkW6/7FJg73dEEo3lfBVCYoKRk7kbSqvGdVJyWDpBEsyqB/Fku5GsfS6Jx/dXpa5rUYAdl8gOzpe2ROPk/mW83nfVj0StCMmLiI0dGUQNMRTpLFxdXaBWCKg+OkABqf9ZPwu7Zk/B+ux7H8yzDQ9erwGcSkvzYPPYdgecxFY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=HjuN8PLO; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HjuN8PLO" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751435990; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sIH9BDMaqlzsWlJNg+YrL5p/TLctNhwLDQLMETbbVIg=; b=HjuN8PLOm3llX9LvStjbycUsS1dfjurQhDh5MxGLN4s+Y0tbNnNiFF2cR7/lGukEn7rTJ1 k65KdnNfBBmIxGo9X92YrCfwST+bEIuv1tld+xlIqLv8Sld8zIDsaUULL89NYdJZ54Gahc akuORi5y3zG6z8yB4DH5PslocRLUu4E= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-410-azsAXXNdMmySHwJ0VGwUkg-1; Wed, 02 Jul 2025 01:59:49 -0400 X-MC-Unique: azsAXXNdMmySHwJ0VGwUkg-1 X-Mimecast-MFC-AGG-ID: azsAXXNdMmySHwJ0VGwUkg_1751435985 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8B8BF1955EC1; Wed, 2 Jul 2025 05:59:44 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 66DED18003FC; Wed, 2 Jul 2025 05:59:29 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 05/15] khugepaged: generalize __collapse_huge_page_* for mTHP support Date: Tue, 1 Jul 2025 23:57:32 -0600 Message-ID: <20250702055742.102808-6-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" generalize the order of the __collapse_huge_page_* functions to support future mTHP collapse. mTHP collapse can suffer from incosistant behavior, and memory waste "creep". disable swapin and shared support for mTHP collapse. No functional changes in this patch. Reviewed-by: Baolin Wang Co-developed-by: Dev Jain Signed-off-by: Dev Jain Signed-off-by: Nico Pache --- mm/khugepaged.c | 48 ++++++++++++++++++++++++++++++------------------ 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 5ff4a5bdf5f6..7a176f3b7729 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -552,15 +552,17 @@ static int __collapse_huge_page_isolate(struct vm_are= a_struct *vma, unsigned long address, pte_t *pte, struct collapse_control *cc, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { struct page *page =3D NULL; struct folio *folio =3D NULL; pte_t *_pte; int none_or_zero =3D 0, shared =3D 0, result =3D SCAN_FAIL, referenced = =3D 0; bool writable =3D false; + int scaled_none =3D khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order); =20 - for (_pte =3D pte; _pte < pte + HPAGE_PMD_NR; + for (_pte =3D pte; _pte < pte + (1 << order); _pte++, address +=3D PAGE_SIZE) { pte_t pteval =3D ptep_get(_pte); if (pte_none(pteval) || (pte_present(pteval) && @@ -568,7 +570,7 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, ++none_or_zero; if (!userfaultfd_armed(vma) && (!cc->is_khugepaged || - none_or_zero <=3D khugepaged_max_ptes_none)) { + none_or_zero <=3D scaled_none)) { continue; } else { result =3D SCAN_EXCEED_NONE_PTE; @@ -596,8 +598,8 @@ static int __collapse_huge_page_isolate(struct vm_area_= struct *vma, /* See hpage_collapse_scan_pmd(). */ if (folio_maybe_mapped_shared(folio)) { ++shared; - if (cc->is_khugepaged && - shared > khugepaged_max_ptes_shared) { + if (order !=3D HPAGE_PMD_ORDER || (cc->is_khugepaged && + shared > khugepaged_max_ptes_shared)) { result =3D SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out; @@ -698,13 +700,14 @@ static void __collapse_huge_page_copy_succeeded(pte_t= *pte, struct vm_area_struct *vma, unsigned long address, spinlock_t *ptl, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { struct folio *src, *tmp; pte_t *_pte; pte_t pteval; =20 - for (_pte =3D pte; _pte < pte + HPAGE_PMD_NR; + for (_pte =3D pte; _pte < pte + (1 << order); _pte++, address +=3D PAGE_SIZE) { pteval =3D ptep_get(_pte); if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { @@ -751,7 +754,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, + u8 order) { spinlock_t *pmd_ptl; =20 @@ -768,7 +772,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, * Release both raw and compound pages isolated * in __collapse_huge_page_isolate. */ - release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist); + release_pte_pages(pte, pte + (1 << order), compound_pagelist); } =20 /* @@ -789,7 +793,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio, pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma, unsigned long address, spinlock_t *ptl, - struct list_head *compound_pagelist) + struct list_head *compound_pagelist, u8 order) { unsigned int i; int result =3D SCAN_SUCCEED; @@ -797,7 +801,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct= folio *folio, /* * Copying pages' contents is subject to memory poison at any iteration. */ - for (i =3D 0; i < HPAGE_PMD_NR; i++) { + for (i =3D 0; i < (1 << order); i++) { pte_t pteval =3D ptep_get(pte + i); struct page *page =3D folio_page(folio, i); unsigned long src_addr =3D address + i * PAGE_SIZE; @@ -816,10 +820,10 @@ static int __collapse_huge_page_copy(pte_t *pte, stru= ct folio *folio, =20 if (likely(result =3D=3D SCAN_SUCCEED)) __collapse_huge_page_copy_succeeded(pte, vma, address, ptl, - compound_pagelist); + compound_pagelist, order); else __collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma, - compound_pagelist); + compound_pagelist, order); =20 return result; } @@ -986,11 +990,11 @@ static int check_pmd_still_valid(struct mm_struct *mm, static int __collapse_huge_page_swapin(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd, - int referenced) + int referenced, u8 order) { int swapped_in =3D 0; vm_fault_t ret =3D 0; - unsigned long address, end =3D haddr + (HPAGE_PMD_NR * PAGE_SIZE); + unsigned long address, end =3D haddr + (PAGE_SIZE << order); int result; pte_t *pte =3D NULL; spinlock_t *ptl; @@ -1021,6 +1025,14 @@ static int __collapse_huge_page_swapin(struct mm_str= uct *mm, if (!is_swap_pte(vmf.orig_pte)) continue; =20 + /* Dont swapin for mTHP collapse */ + if (order !=3D HPAGE_PMD_ORDER) { + pte_unmap(pte); + mmap_read_unlock(mm); + result =3D SCAN_EXCEED_SWAP_PTE; + goto out; + } + vmf.pte =3D pte; vmf.ptl =3D ptl; ret =3D do_swap_page(&vmf); @@ -1141,7 +1153,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * that case. Continuing to collapse causes inconsistency. */ result =3D __collapse_huge_page_swapin(mm, vma, address, pmd, - referenced); + referenced, HPAGE_PMD_ORDER); if (result !=3D SCAN_SUCCEED) goto out_nolock; } @@ -1189,7 +1201,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, pte =3D pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); if (pte) { result =3D __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist); + &compound_pagelist, HPAGE_PMD_ORDER); spin_unlock(pte_ptl); } else { result =3D SCAN_PMD_NULL; @@ -1219,7 +1231,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, =20 result =3D __collapse_huge_page_copy(pte, folio, pmd, _pmd, vma, address, pte_ptl, - &compound_pagelist); + &compound_pagelist, HPAGE_PMD_ORDER); pte_unmap(pte); if (unlikely(result !=3D SCAN_SUCCEED)) goto out_up_write; --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 215AA1F5619 for ; Wed, 2 Jul 2025 06:00:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436010; cv=none; b=GY0iazkfApnoNfVu2J+34gUfjz0/y7DDp4Fhx98g3m0T01RSv4xGyw5no5/eMeSjvfUPRdb0AajWz20BILPZIc3suPhqYQuY7kRftvJ1El+cY4a8NPuKbBM05vr7Iw7zftcRkkf2BPECfKKmA9UTgCHApGMTFsbaqQ6GNDCLplk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436010; c=relaxed/simple; bh=ADGJhQ2aqaNJif+7Se2HTuIWtbcbq+rd9Fz2LCcVaRg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KwCIvWEstfKdy/eYdPsGFwGUBeTVnlpjbFxpfntTSESlMGWiR4i2ZhdHzKLDINXkloERqLMZ9VwfcGJe8udDeYV2M2hcjZkLkM0A8phT2A6uE1uPHzbpwWKWpThRVifvmX2e6DsbVrKPQuRGkOvOahSqWXsRACp/VWgkoGe5wC0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=DQA5tebF; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="DQA5tebF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436008; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=J7loTkprZZNXlyy874VYi0qZRn+a3HlQFG7EFdDFZf0=; b=DQA5tebFG6JaliPtxbCJbvSkTOHk/wjrKxMFE/WEMPaPEuDBxeuvo8akhlTyCPZAVG+lDV y8mUH6ENY4JRxr7xU9kFRknkQL6UoAqPm2VMxiRLIZtL+1M7/M2OHUGOp4S6sOb81dP/pv hQ6ESqfqSJFw3I3WNijU3WExGrROxpg= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-194-ZzPbPVkwMKWjJJxEGn_KPw-1; Wed, 02 Jul 2025 02:00:04 -0400 X-MC-Unique: ZzPbPVkwMKWjJJxEGn_KPw-1 X-Mimecast-MFC-AGG-ID: ZzPbPVkwMKWjJJxEGn_KPw_1751436000 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3215518001D6; Wed, 2 Jul 2025 06:00:00 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 35F2318003FC; Wed, 2 Jul 2025 05:59:44 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 06/15] khugepaged: introduce khugepaged_scan_bitmap for mTHP support Date: Tue, 1 Jul 2025 23:57:33 -0600 Message-ID: <20250702055742.102808-7-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" khugepaged scans anons PMD ranges for potential collapse to a hugepage. To add mTHP support we use this scan to instead record chunks of utilized sections of the PMD. khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap that represents chunks of utilized regions. We can then determine what mTHP size fits best and in the following patch, we set this bitmap while scanning the anon PMD. A minimum collapse order of 2 is used as this is the lowest order supported by anon memory. max_ptes_none is used as a scale to determine how "full" an order must be before being considered for collapse. When attempting to collapse an order that has its order set to "always" lets always collapse to that order in a greedy manner without considering the number of bits set. Signed-off-by: Nico Pache --- include/linux/khugepaged.h | 4 ++ mm/khugepaged.c | 96 ++++++++++++++++++++++++++++++++++---- 2 files changed, 90 insertions(+), 10 deletions(-) diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index ff6120463745..0f957711a117 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -1,6 +1,10 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_KHUGEPAGED_H #define _LINUX_KHUGEPAGED_H +#define KHUGEPAGED_MIN_MTHP_ORDER 2 +#define KHUGEPAGED_MIN_MTHP_NR (1<mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER, 0 }; + + while (top >=3D 0) { + state =3D cc->mthp_bitmap_stack[top--]; + order =3D state.order + KHUGEPAGED_MIN_MTHP_ORDER; + offset =3D state.offset; + num_chunks =3D 1 << (state.order); + // Skip mTHP orders that are not enabled + if (!test_bit(order, &enabled_orders)) + goto next; + + // copy the relavant section to a new bitmap + bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset, + MTHP_BITMAP_SIZE); + + bits_set =3D bitmap_weight(cc->mthp_bitmap_temp, num_chunks); + threshold_bits =3D (HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) + >> (HPAGE_PMD_ORDER - state.order); + + //Check if the region is "almost full" based on the threshold + if (bits_set > threshold_bits || is_pmd_only + || test_bit(order, &huge_anon_orders_always)) { + ret =3D collapse_huge_page(mm, address, referenced, unmapped, cc, + mmap_locked, order, offset * KHUGEPAGED_MIN_MTHP_NR); + if (ret =3D=3D SCAN_SUCCEED) { + collapsed +=3D (1 << order); + continue; + } + } + +next: + if (state.order > 0) { + next_order =3D state.order - 1; + mid_offset =3D offset + (num_chunks / 2); + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { next_order, mid_offset }; + cc->mthp_bitmap_stack[++top] =3D (struct scan_bit_state) + { next_order, offset }; + } + } + return collapsed; +} + static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, bool *mmap_locked, @@ -1435,9 +1513,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, pte_unmap_unlock(pte, ptl); if (result =3D=3D SCAN_SUCCEED) { result =3D collapse_huge_page(mm, address, referenced, - unmapped, cc); - /* collapse_huge_page will return with the mmap_lock released */ - *mmap_locked =3D false; + unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0); } out: trace_mm_khugepaged_scan_pmd(mm, folio, writable, referenced, @@ -2385,7 +2461,7 @@ static int khugepaged_collapse_single_pmd(unsigned lo= ng addr, int result =3D SCAN_FAIL; struct mm_struct *mm =3D vma->vm_mm; =20 - if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) { + if (!vma_is_anonymous(vma)) { struct file *file =3D get_file(vma->vm_file); pgoff_t pgoff =3D linear_page_index(vma, addr); =20 --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 13E422AE8B for ; Wed, 2 Jul 2025 06:00:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436028; cv=none; b=kkfXo2L4QxTT2giXqGprPxL5V90XoB6Mn+b20samdLnlk9WFI2twhZIxkl3U8eMYsBi6gruAjyyarVtet7eCWfpLgTKdEchJu+Ax1syit7UuiAr14JsMWTPrGvooE1I6yINzKYyXUi5D7Y21GiDp+3abJhB4utvKmL7DOwmmekI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436028; c=relaxed/simple; bh=PCNRiR0tv58myAMxv9Rq9/Fc/h9SDsZWQjtfGknDC2A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SD5QoOAWWmTNViTQPJZa+Ys8BovqhvuGBuxbtMfHVfnLGepoadk9aJmcQg7/5DdMhBnkVt+E5qcsApADA8XsCr7SMNaE0nEyx4FZfbtNtZuaKe65BEp2y1WOD/8FtaxBpHjmp5Yfu4xs7I02w1QQtKUiLsyhPGXP5cQOmMcYbWo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=EvyS61lr; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="EvyS61lr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436025; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Z0XP3utHs9NNbHopP0WUO88kGA3T6jwUuBhRcDkxWFY=; b=EvyS61lrXOcvqV2BywxJ276bVRQ1kMOKMW/mOFvOsw1/iK5QhGer4YPslBa0YE8WJQO2r/ Yh0XTmZ7sSClHauBlIsW9twlk0B71nNgzL2XECf2J4qIP8nqGsxjl+zVPGBHJYHqkrdlwo /dsG1qSH1whpSr6abg2f+PM2AJmF3U4= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-241-P9pgslO0Nc6aXLvNMw9oBg-1; Wed, 02 Jul 2025 02:00:20 -0400 X-MC-Unique: P9pgslO0Nc6aXLvNMw9oBg-1 X-Mimecast-MFC-AGG-ID: P9pgslO0Nc6aXLvNMw9oBg_1751436016 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 31CE1180029E; Wed, 2 Jul 2025 06:00:16 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C9BD318003FC; Wed, 2 Jul 2025 06:00:00 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 07/15] khugepaged: add mTHP support Date: Tue, 1 Jul 2025 23:57:34 -0600 Message-ID: <20250702055742.102808-8-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Introduce the ability for khugepaged to collapse to different mTHP sizes. While scanning PMD ranges for potential collapse candidates, keep track of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If mTHPs are enabled we remove the restriction of max_ptes_none during the scan phase so we dont bailout early and miss potential mTHP candidates. After the scan is complete we will perform binary recursion on the bitmap to determine which mTHP size would be most efficient to collapse to. max_ptes_none will be scaled by the attempted collapse order to determine how full a THP must be to be eligible. If a mTHP collapse is attempted, but contains swapped out, or shared pages, we dont perform the collapse. For non-PMD collapse we must leave the anon VMA write locked until after we collapse the mTHP-- in the PMD case all the pages are isolated, but in the non-PMD case this is not true, and we must keep the lock to prevent changes to the VMA from occurring. Signed-off-by: Nico Pache --- mm/khugepaged.c | 139 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 96 insertions(+), 43 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 5e8496398afd..a42f08edb0e4 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1124,13 +1124,14 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; - pte_t *pte; + pte_t *pte =3D NULL, mthp_pte; pgtable_t pgtable; struct folio *folio; spinlock_t *pmd_ptl, *pte_ptl; int result =3D SCAN_FAIL; struct vm_area_struct *vma; struct mmu_notifier_range range; + unsigned long _address =3D address + offset * PAGE_SIZE; =20 VM_BUG_ON(address & ~HPAGE_PMD_MASK); =20 @@ -1146,13 +1147,13 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, *mmap_locked =3D false; } =20 - result =3D alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER); + result =3D alloc_charge_folio(&folio, mm, cc, order); if (result !=3D SCAN_SUCCEED) goto out_nolock; =20 mmap_read_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, - BIT(HPAGE_PMD_ORDER)); + *mmap_locked =3D true; + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, BIT(order= )); if (result !=3D SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1170,13 +1171,14 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, * released when it fails. So we jump out_nolock directly in * that case. Continuing to collapse causes inconsistency. */ - result =3D __collapse_huge_page_swapin(mm, vma, address, pmd, - referenced, HPAGE_PMD_ORDER); + result =3D __collapse_huge_page_swapin(mm, vma, _address, pmd, + referenced, order); if (result !=3D SCAN_SUCCEED) goto out_nolock; } =20 mmap_read_unlock(mm); + *mmap_locked =3D false; /* * Prevent all access to pagetables with the exception of * gup_fast later handled by the ptep_clear_flush and the VM @@ -1186,8 +1188,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, * mmap_lock. */ mmap_write_lock(mm); - result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, - BIT(HPAGE_PMD_ORDER)); + result =3D hugepage_vma_revalidate(mm, address, true, &vma, cc, BIT(order= )); if (result !=3D SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -1198,11 +1199,12 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, vma_start_write(vma); anon_vma_lock_write(vma->anon_vma); =20 - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address, - address + HPAGE_PMD_SIZE); + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address, + _address + (PAGE_SIZE << order)); mmu_notifier_invalidate_range_start(&range); =20 pmd_ptl =3D pmd_lock(mm, pmd); /* probably unnecessary */ + /* * This removes any huge TLB entry from the CPU so we won't allow * huge and small TLB entries for the same virtual address to @@ -1216,18 +1218,16 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, mmu_notifier_invalidate_range_end(&range); tlb_remove_table_sync_one(); =20 - pte =3D pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); + pte =3D pte_offset_map_lock(mm, &_pmd, _address, &pte_ptl); if (pte) { - result =3D __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist, HPAGE_PMD_ORDER); + result =3D __collapse_huge_page_isolate(vma, _address, pte, cc, + &compound_pagelist, order); spin_unlock(pte_ptl); } else { result =3D SCAN_PMD_NULL; } =20 if (unlikely(result !=3D SCAN_SUCCEED)) { - if (pte) - pte_unmap(pte); spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); /* @@ -1242,17 +1242,17 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, } =20 /* - * All pages are isolated and locked so anon_vma rmap - * can't run anymore. + * For PMD collapse all pages are isolated and locked so anon_vma + * rmap can't run anymore */ - anon_vma_unlock_write(vma->anon_vma); + if (order =3D=3D HPAGE_PMD_ORDER) + anon_vma_unlock_write(vma->anon_vma); =20 result =3D __collapse_huge_page_copy(pte, folio, pmd, _pmd, - vma, address, pte_ptl, - &compound_pagelist, HPAGE_PMD_ORDER); - pte_unmap(pte); + vma, _address, pte_ptl, + &compound_pagelist, order); if (unlikely(result !=3D SCAN_SUCCEED)) - goto out_up_write; + goto out_unlock_anon_vma; =20 /* * The smp_wmb() inside __folio_mark_uptodate() ensures the @@ -1260,25 +1260,46 @@ static int collapse_huge_page(struct mm_struct *mm,= unsigned long address, * write. */ __folio_mark_uptodate(folio); - pgtable =3D pmd_pgtable(_pmd); - - _pmd =3D folio_mk_pmd(folio, vma->vm_page_prot); - _pmd =3D maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); - - spin_lock(pmd_ptl); - BUG_ON(!pmd_none(*pmd)); - folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE); - folio_add_lru_vma(folio, vma); - pgtable_trans_huge_deposit(mm, pmd, pgtable); - set_pmd_at(mm, address, pmd, _pmd); - update_mmu_cache_pmd(vma, address, pmd); - deferred_split_folio(folio, false); - spin_unlock(pmd_ptl); + if (order =3D=3D HPAGE_PMD_ORDER) { + pgtable =3D pmd_pgtable(_pmd); + _pmd =3D folio_mk_pmd(folio, vma->vm_page_prot); + _pmd =3D maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); + + spin_lock(pmd_ptl); + BUG_ON(!pmd_none(*pmd)); + folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + pgtable_trans_huge_deposit(mm, pmd, pgtable); + set_pmd_at(mm, address, pmd, _pmd); + update_mmu_cache_pmd(vma, address, pmd); + deferred_split_folio(folio, false); + spin_unlock(pmd_ptl); + } else { /* mTHP collapse */ + mthp_pte =3D mk_pte(&folio->page, vma->vm_page_prot); + mthp_pte =3D maybe_mkwrite(pte_mkdirty(mthp_pte), vma); + + spin_lock(pmd_ptl); + BUG_ON(!pmd_none(*pmd)); + folio_ref_add(folio, (1 << order) - 1); + folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + set_ptes(vma->vm_mm, _address, pte, mthp_pte, (1 << order)); + update_mmu_cache_range(NULL, vma, _address, pte, (1 << order)); + + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pmd_pgtable(_pmd)); + spin_unlock(pmd_ptl); + } =20 folio =3D NULL; =20 result =3D SCAN_SUCCEED; +out_unlock_anon_vma: + if (order !=3D HPAGE_PMD_ORDER) + anon_vma_unlock_write(vma->anon_vma); out_up_write: + if (pte) + pte_unmap(pte); mmap_write_unlock(mm); out_nolock: *mmap_locked =3D false; @@ -1354,31 +1375,57 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, { pmd_t *pmd; pte_t *pte, *_pte; + int i; int result =3D SCAN_FAIL, referenced =3D 0; int none_or_zero =3D 0, shared =3D 0; struct page *page =3D NULL; struct folio *folio =3D NULL; unsigned long _address; + unsigned long enabled_orders; spinlock_t *ptl; int node =3D NUMA_NO_NODE, unmapped =3D 0; + bool is_pmd_only; bool writable =3D false; - + int chunk_none_count =3D 0; + int scaled_none =3D khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - KHUGEP= AGED_MIN_MTHP_ORDER); + unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; VM_BUG_ON(address & ~HPAGE_PMD_MASK); =20 result =3D find_pmd_or_thp_or_none(mm, address, &pmd); if (result !=3D SCAN_SUCCEED) goto out; =20 + bitmap_zero(cc->mthp_bitmap, MAX_MTHP_BITMAP_SIZE); + bitmap_zero(cc->mthp_bitmap_temp, MAX_MTHP_BITMAP_SIZE); memset(cc->node_load, 0, sizeof(cc->node_load)); nodes_clear(cc->alloc_nmask); + + enabled_orders =3D thp_vma_allowable_orders(vma, vma->vm_flags, + tva_flags, THP_ORDERS_ALL_ANON); + + is_pmd_only =3D (enabled_orders =3D=3D (1 << HPAGE_PMD_ORDER)); + pte =3D pte_offset_map_lock(mm, pmd, address, &ptl); if (!pte) { result =3D SCAN_PMD_NULL; goto out; } =20 - for (_address =3D address, _pte =3D pte; _pte < pte + HPAGE_PMD_NR; - _pte++, _address +=3D PAGE_SIZE) { + for (i =3D 0; i < HPAGE_PMD_NR; i++) { + /* + * we are reading in KHUGEPAGED_MIN_MTHP_NR page chunks. if + * there are pages in this chunk keep track of it in the bitmap + * for mTHP collapsing. + */ + if (i % KHUGEPAGED_MIN_MTHP_NR =3D=3D 0) { + if (chunk_none_count <=3D scaled_none) + bitmap_set(cc->mthp_bitmap, + i / KHUGEPAGED_MIN_MTHP_NR, 1); + chunk_none_count =3D 0; + } + + _pte =3D pte + i; + _address =3D address + i * PAGE_SIZE; pte_t pteval =3D ptep_get(_pte); if (is_swap_pte(pteval)) { ++unmapped; @@ -1401,10 +1448,11 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, } } if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { + ++chunk_none_count; ++none_or_zero; if (!userfaultfd_armed(vma) && - (!cc->is_khugepaged || - none_or_zero <=3D khugepaged_max_ptes_none)) { + (!cc->is_khugepaged || !is_pmd_only || + none_or_zero <=3D khugepaged_max_ptes_none)) { continue; } else { result =3D SCAN_EXCEED_NONE_PTE; @@ -1500,6 +1548,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, address))) referenced++; } + if (!writable) { result =3D SCAN_PAGE_RO; } else if (cc->is_khugepaged && @@ -1512,8 +1561,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, out_unmap: pte_unmap_unlock(pte, ptl); if (result =3D=3D SCAN_SUCCEED) { - result =3D collapse_huge_page(mm, address, referenced, - unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0); + result =3D khugepaged_scan_bitmap(mm, address, referenced, unmapped, cc, + mmap_locked, enabled_orders); + if (result > 0) + result =3D SCAN_SUCCEED; + else + result =3D SCAN_FAIL; } out: trace_mm_khugepaged_scan_pmd(mm, folio, writable, referenced, --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BCBDB3FC7 for ; Wed, 2 Jul 2025 06:00:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436042; cv=none; b=Mnl6ZhrOftO7FVgIRbmTvkAABoQ/Py+WHEZ5PB+/pDagq+4g41yEJ38CBcjIiuFsXb3oggNmg1Bk9AdlAhpNEFg1rMUx0G8fbYTgqgXJQriN478XLKHtWphryez7z9v6wIPhUHb+B+TklWG2tyHuDMZD73XZUB34HlIyXMOvKeg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436042; c=relaxed/simple; bh=vUBEjYbGY3JxaS0BK4Tz2WAGo1AGM70r1TbHyW1HY2o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=F36Gi5BhwCdefXw5qk9xwQ950M6Pz4zAoi1k0kv8enhDFRpg7n3Rk7Y+VjRgFvYNeaCer29dLjkI0QiWvYaesrTwR7KgYhWezzWvlPOT6Ci0q5bX1WND2vJEh22rO6WIsOCcV2NHICmVpga85DQyrFBLQAmI1zkYa7u639Zp8XQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Xgy5ltw1; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Xgy5ltw1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436039; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dnM62A0y0GYLJiYfrqmXPdYYIUiA0LEzu0+nPh3D+kk=; b=Xgy5ltw1j5pVZ2rWiHMJHG1BwxPOk+UellvJz8UaGYRay+LHN4LntdmWwld/qC+auxVj5X tKB9wNK+kCBnI4s847RQJa9h8ghkicSdx+CrjZLiyD/4eiYuxRt2V6gzxddt6uk07Y562b tJZwUJZxXUPG2tJx8sVUxJR8Tx7pQgA= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-577-paYXR1uOMJuRxkW0FtiDog-1; Wed, 02 Jul 2025 02:00:36 -0400 X-MC-Unique: paYXR1uOMJuRxkW0FtiDog-1 X-Mimecast-MFC-AGG-ID: paYXR1uOMJuRxkW0FtiDog_1751436032 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3099A1953941; Wed, 2 Jul 2025 06:00:32 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E860F18003FC; Wed, 2 Jul 2025 06:00:16 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 08/15] khugepaged: skip collapsing mTHP to smaller orders Date: Tue, 1 Jul 2025 23:57:35 -0600 Message-ID: <20250702055742.102808-9-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" khugepaged may try to collapse a mTHP to a smaller mTHP, resulting in some pages being unmapped. Skip these cases until we have a way to check if its ok to collapse to a smaller mTHP size (like in the case of a partially mapped folio). This patch is inspired by Dev Jain's work on khugepaged mTHP support [1]. [1] https://lore.kernel.org/lkml/20241216165105.56185-11-dev.jain@arm.com/ Reviewed-by: Baolin Wang Co-developed-by: Dev Jain Signed-off-by: Dev Jain Signed-off-by: Nico Pache --- mm/khugepaged.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index a42f08edb0e4..61d2b3ebc7ac 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -612,7 +612,12 @@ static int __collapse_huge_page_isolate(struct vm_area= _struct *vma, folio =3D page_folio(page); VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio); =20 - /* See hpage_collapse_scan_pmd(). */ + if (order !=3D HPAGE_PMD_ORDER && folio_order(folio) >=3D order) { + result =3D SCAN_PTE_MAPPED_HUGEPAGE; + goto out; + } + + /* See khugepaged_scan_pmd(). */ if (folio_maybe_mapped_shared(folio)) { ++shared; if (order !=3D HPAGE_PMD_ORDER || (cc->is_khugepaged && --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6ACE61A2C04 for ; Wed, 2 Jul 2025 06:00:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436059; cv=none; b=WvGntwh38smK6T716BPFWIdrcj+7ROPxczEYAUIPeCt73qiBH+hX+eW9+eLFQ4+xoMUpEsgqDRXJdpB+I7oAE6n3wDH8V668gefBvjiKQrtD+oe1jfrCI6Q27Qm3aR5W0ato8iNW0vCYhnlB7rTuD2HfD45y2y9MQFlaDMjKWbc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436059; c=relaxed/simple; bh=4gUo5mhAQoIRF3zDizyxmkFzfYyi85yyGqjfCEGTlIk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ag83A58VWFNmrr5g9bYW4+cI+SP/XvAiwhH4E0QwaRG+dYp9fG1KFXbT7p/pPuKur3DWdZCtFKWTEz/ZnEWUnCP0gKDAwv+bd7zZH3c+E0dBVOXjOuZFJVjIY//y1YPGZCTPEmtoAd0xetXFpbbhkDTUN7QofTQFeiMUCYjpWM4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=UJ4bxkv1; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="UJ4bxkv1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436055; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hlMD3t7ycJKRomUVuQAiuk5BAoj5s4ukulb+cMLGP2s=; b=UJ4bxkv1mq7BB2B0GuTSK+8j5/A3NhlupAtbUetnBtscJVxW/YRJGAnWDmwW6LkPfAm0bt NEz43u+6YftnbMqfUp0lrXOSdAhqqy5fZvYgmEM77tKaHcsxQupxagIugYkZCP3LejKm6/ B0wWGoIgOcSxO0JUvnVGIsDWp+24x94= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-600-cY2WhJ9yNnWCB4gRyLU_-g-1; Wed, 02 Jul 2025 02:00:53 -0400 X-MC-Unique: cY2WhJ9yNnWCB4gRyLU_-g-1 X-Mimecast-MFC-AGG-ID: cY2WhJ9yNnWCB4gRyLU_-g_1751436048 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4E0671955EC3; Wed, 2 Jul 2025 06:00:48 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C9E7A180045B; Wed, 2 Jul 2025 06:00:32 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 09/15] khugepaged: avoid unnecessary mTHP collapse attempts Date: Tue, 1 Jul 2025 23:57:36 -0600 Message-ID: <20250702055742.102808-10-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" There are cases where, if an attempted collapse fails, all subsequent orders are guaranteed to also fail. Avoid these collapse attempts by bailing out early. Signed-off-by: Nico Pache --- mm/khugepaged.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 61d2b3ebc7ac..50e1d7ef7e6d 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1358,6 +1358,23 @@ static int khugepaged_scan_bitmap(struct mm_struct *= mm, unsigned long address, collapsed +=3D (1 << order); continue; } + /* + * Some ret values indicate all lower order will also + * fail, dont trying to collapse smaller orders + */ + if (ret =3D=3D SCAN_EXCEED_NONE_PTE || + ret =3D=3D SCAN_EXCEED_SWAP_PTE || + ret =3D=3D SCAN_EXCEED_SHARED_PTE || + ret =3D=3D SCAN_PTE_NON_PRESENT || + ret =3D=3D SCAN_PTE_UFFD_WP || + ret =3D=3D SCAN_ALLOC_HUGE_PAGE_FAIL || + ret =3D=3D SCAN_CGROUP_CHARGE_FAIL || + ret =3D=3D SCAN_COPY_MC || + ret =3D=3D SCAN_PAGE_LOCK || + ret =3D=3D SCAN_PAGE_COUNT) + goto next; + else + break; } =20 next: --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 450AF1EB5DB for ; Wed, 2 Jul 2025 06:01:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436074; cv=none; b=Epc9SQfNxvitBM25AhTkYkgNUpQ1Ppo9jhpuxp/VwYK7rDhSz8MBwzU/F7DtE1Jxl4pUGX+FGSYIM/0/cuAFUtw4sWwjS9iI/0ayFQLGTjfEc+sFDpWVC584KSLQnQ7zXpjzAfGQYClCNE1Q+zSHYHDUR3wB7TSjLKZXYm86Xlg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436074; c=relaxed/simple; bh=4V3VZKJ7tNnTF9gQxOHiRZHMosS+JqRZ4B6S7aZ2Xgc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=J38AiKTz3EySnxmzSZi2kVg9qDgtdmSHVHvSeIaVy8RRkibGWNYeQ17RJeXy4E3+ieQTow6bKhKLFp3wAy5I5y8g2onjP1IeGvup3D0vJexRRTA3zwjnN3WDyY0oGiRscfFWAbZMS6sftI5sd47WU8mkwSWEDQsika7p6PQ43cc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=I9E93vmT; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="I9E93vmT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436072; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=raAG5xPTJbhWGRq+uqBJOcAkRPs8VPHBCr9uCuWQi40=; b=I9E93vmTQDkYpubVV/JVbxvoTAXniC1/Fvci/WDeEQTNVIAeAof9b3HF4q4V+dFHLaQ6JU DLPB93mK2U8sXPGdf/KylZKW55rieZO6twY9IBEv7kZ52XmVSL9r9Vp/nWPihtrLjlSEh9 wFbjrjA9wJpHBJ9vV0YAxPu0NGDVmxg= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-117-kIqpZ7gwN_eHr4aMxsfPug-1; Wed, 02 Jul 2025 02:01:07 -0400 X-MC-Unique: kIqpZ7gwN_eHr4aMxsfPug-1 X-Mimecast-MFC-AGG-ID: kIqpZ7gwN_eHr4aMxsfPug_1751436063 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A5862195FCCB; Wed, 2 Jul 2025 06:01:03 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E842318003FC; Wed, 2 Jul 2025 06:00:48 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 10/15] khugepaged: allow khugepaged to check all anonymous mTHP orders Date: Tue, 1 Jul 2025 23:57:37 -0600 Message-ID: <20250702055742.102808-11-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" From: Baolin Wang We have now allowed mTHP collapse, but thp_vma_allowable_order() still only checks if the PMD-sized mTHP is allowed to collapse. This prevents scanning and collapsing of 64K mTHP when only 64K mTHP is enabled. Thus, we should modify the checks to allow all large orders of anonymous mTHP. Signed-off-by: Baolin Wang Signed-off-by: Nico Pache --- mm/khugepaged.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 50e1d7ef7e6d..b96a7327b9c0 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -491,8 +491,11 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, { if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) && hugepage_pmd_enabled()) { - if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS, - PMD_ORDER)) + unsigned long orders =3D vma_is_anonymous(vma) ? + THP_ORDERS_ALL_ANON : BIT(PMD_ORDER); + + if (thp_vma_allowable_orders(vma, vm_flags, TVA_ENFORCE_SYSFS, + orders)) __khugepaged_enter(vma->vm_mm); } } @@ -2612,6 +2615,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, =20 vma_iter_init(&vmi, mm, khugepaged_scan.address); for_each_vma(vmi, vma) { + unsigned long orders =3D vma_is_anonymous(vma) ? + THP_ORDERS_ALL_ANON : BIT(PMD_ORDER); unsigned long hstart, hend; =20 cond_resched(); @@ -2619,8 +2624,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, progress++; break; } - if (!thp_vma_allowable_order(vma, vma->vm_flags, - TVA_ENFORCE_SYSFS, PMD_ORDER)) { + if (!thp_vma_allowable_orders(vma, vma->vm_flags, + TVA_ENFORCE_SYSFS, orders)) { skip: progress++; continue; --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B7431F4621 for ; Wed, 2 Jul 2025 06:01:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436094; cv=none; b=XWxdm6IOUFgsjb8uvl8XFM7KFTYcjFKVGRW0o/P9UeGiTkTGmZoRRqvKHTD2OqzZGFU9Lm5uvJnDpt2ZgzRXoWdREMsGZQCAK3sccUxaXX1ySb/aTS1rj7GLi/F525l+gwqYwXHXNTnJkdAOrGdoruGuNr6egWDjQNveZSusLaQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436094; c=relaxed/simple; bh=g4V9DvZy86AmYxsZbjXmyqrMPPS/X9r+lilkJ6a6Wac=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MJ5/eLPIIXGtWgfr5nUl+hPVj+Qr5Hs5Mvu7q4XnlOou8P6FCqtgk00HQV0c8qLr03PS29z9eInjbjcr4iMdqm97cagS47plzbs+PDND07Eys4ECj7SCm0NAFnyojQCcBhMmiVq+1mKmWhDOHDp4nkZ3OMVVChpbhJfThO8fwnw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=H5fbS9dO; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="H5fbS9dO" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436091; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pClnkwE28GE4qFvtDIHtCpJBujOjEEnlN0VFfKLLmHs=; b=H5fbS9dOLvqP006C12msiqd3sYf9wu0Es+JWprBdy0oTbCjkDuhfzyYN6bS2ru4lH/n6+i nWIlPvmLxeSl3HYGhTGI0SF5TQRbxg7vXvYGE674J6z8R0FPTW78hopHpnt1bqcupGLeu8 Rhudzx2G0iw9bnz8R5pj8iThWMAOPXc= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-511-vEXaRJnlMOeMD53K8UT_uA-1; Wed, 02 Jul 2025 02:01:28 -0400 X-MC-Unique: vEXaRJnlMOeMD53K8UT_uA-1 X-Mimecast-MFC-AGG-ID: vEXaRJnlMOeMD53K8UT_uA_1751436081 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 39E1C180120D; Wed, 2 Jul 2025 06:01:19 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 5241B18003FC; Wed, 2 Jul 2025 06:01:04 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 11/15] khugepaged: allow madvise_collapse to check all anonymous mTHP orders Date: Tue, 1 Jul 2025 23:57:38 -0600 Message-ID: <20250702055742.102808-12-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Allow madvise_collapse to scan/collapse all mTHP orders without the strict requirement of needing the PMD-order enabled. Signed-off-by: Nico Pache --- mm/khugepaged.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b96a7327b9c0..6ea681b81647 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2930,11 +2930,13 @@ int madvise_collapse(struct vm_area_struct *vma, un= signed long start, unsigned long hstart, hend, addr; int thps =3D 0, last_fail =3D SCAN_FAIL; bool mmap_locked =3D true; + unsigned long orders =3D vma_is_anonymous(vma) ? + THP_ORDERS_ALL_ANON : BIT(PMD_ORDER); =20 BUG_ON(vma->vm_start > start); BUG_ON(vma->vm_end < end); =20 - if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER)) + if (!thp_vma_allowable_orders(vma, vma->vm_flags, 0, orders)) return -EINVAL; =20 cc =3D kmalloc(sizeof(*cc), GFP_KERNEL); @@ -2956,7 +2958,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsi= gned long start, mmap_read_lock(mm); mmap_locked =3D true; result =3D hugepage_vma_revalidate(mm, addr, false, &vma, - cc, BIT(HPAGE_PMD_ORDER)); + cc, orders); if (result !=3D SCAN_SUCCEED) { last_fail =3D result; goto out_nolock; --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E96141DF96F for ; Wed, 2 Jul 2025 06:01:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436105; cv=none; b=k+mxrEYZnsRj4BWBLkOaUKFGnKMRF3pmIPZlpYLTVW/VqEvrl6uWCG4tAWJT10Ookpg0MtSybBbrek8m6fGvzhLc4rSUjHI776TjqzXXcujtIRvuBBIjGVyCd0GN53B6S8Fl/qpL+w1ZsSWXaKLEZTowXhqjdOyeaN0peLr3h4g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436105; c=relaxed/simple; bh=PCz1XiVA+FkwVn8JcYxuVc66+H9tb0XXfMn1uyb/VJY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=M5WcqW3PW5IkfQoqyoO6su/c+lvuwS63bzKmqGAYtKbv63yDpPpBbTvo0YTVpjZPrbYETeierOzCHce9H4oNUbLvqnTfcK21ag9NPVKKyeoz+5HQbEK0yc0Djiyma7NSZ9xgYW6EEhfkES0XlBFOtydLynNrqaOWYmfVkOTZeUo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=PW86EjYH; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PW86EjYH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436103; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5nTPrWf9Aqck2ptKb2zyXs5vzzOQnB/G54DtwX2rrW0=; b=PW86EjYHgT9SMSQwBEmroumFdd1o+Ld6yp6ACH7ZTVDXdRawtYeHDws0fI6q/BdjUcdlcu 4XfmpxV4QkrciA+r4d16f96DobUxPjZKOfLprTbh2ke3oWGHW5GSWKbmkizdkCix0szJO9 pKW0Z7lYdu04OXxwPkE3fKB+33YWOxs= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-62-aYZuKyN1MwOe2pokc-BKxA-1; Wed, 02 Jul 2025 02:01:38 -0400 X-MC-Unique: aYZuKyN1MwOe2pokc-BKxA-1 X-Mimecast-MFC-AGG-ID: aYZuKyN1MwOe2pokc-BKxA_1751436094 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 259231809C8B; Wed, 2 Jul 2025 06:01:34 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9ED7F18003FC; Wed, 2 Jul 2025 06:01:19 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 12/15] khugepaged: kick khugepaged for enabling none-PMD-sized mTHPs Date: Tue, 1 Jul 2025 23:57:39 -0600 Message-ID: <20250702055742.102808-13-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" From: Baolin Wang When only non-PMD-sized mTHP is enabled (such as only 64K mTHP enabled), we should also allow kicking khugepaged to attempt scanning and collapsing 64K mTHP. Modify hugepage_pmd_enabled() to support mTHP collapse, and while we are at it, rename it to make the function name more clear. Signed-off-by: Baolin Wang Signed-off-by: Nico Pache --- mm/khugepaged.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 6ea681b81647..d20a217387ed 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -430,7 +430,7 @@ static inline int khugepaged_test_exit_or_disable(struc= t mm_struct *mm) test_bit(MMF_DISABLE_THP, &mm->flags); } =20 -static bool hugepage_pmd_enabled(void) +static bool hugepage_enabled(void) { /* * We cover the anon, shmem and the file-backed case here; file-backed @@ -442,11 +442,11 @@ static bool hugepage_pmd_enabled(void) if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && hugepage_global_enabled()) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_always)) + if (READ_ONCE(huge_anon_orders_always)) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_madvise)) + if (READ_ONCE(huge_anon_orders_madvise)) return true; - if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) && + if (READ_ONCE(huge_anon_orders_inherit) && hugepage_global_enabled()) return true; if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled()) @@ -490,7 +490,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, vm_flags_t vm_flags) { if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) && - hugepage_pmd_enabled()) { + hugepage_enabled()) { unsigned long orders =3D vma_is_anonymous(vma) ? THP_ORDERS_ALL_ANON : BIT(PMD_ORDER); =20 @@ -2702,7 +2702,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, =20 static int khugepaged_has_work(void) { - return !list_empty(&khugepaged_scan.mm_head) && hugepage_pmd_enabled(); + return !list_empty(&khugepaged_scan.mm_head) && hugepage_enabled(); } =20 static int khugepaged_wait_event(void) @@ -2775,7 +2775,7 @@ static void khugepaged_wait_work(void) return; } =20 - if (hugepage_pmd_enabled()) + if (hugepage_enabled()) wait_event_freezable(khugepaged_wait, khugepaged_wait_event()); } =20 @@ -2806,7 +2806,7 @@ static void set_recommended_min_free_kbytes(void) int nr_zones =3D 0; unsigned long recommended_min; =20 - if (!hugepage_pmd_enabled()) { + if (!hugepage_enabled()) { calculate_min_free_kbytes(); goto update_wmarks; } @@ -2856,7 +2856,7 @@ int start_stop_khugepaged(void) int err =3D 0; =20 mutex_lock(&khugepaged_mutex); - if (hugepage_pmd_enabled()) { + if (hugepage_enabled()) { if (!khugepaged_thread) khugepaged_thread =3D kthread_run(khugepaged, NULL, "khugepaged"); @@ -2882,7 +2882,7 @@ int start_stop_khugepaged(void) void khugepaged_min_free_kbytes_update(void) { mutex_lock(&khugepaged_mutex); - if (hugepage_pmd_enabled() && khugepaged_thread) + if (hugepage_enabled() && khugepaged_thread) set_recommended_min_free_kbytes(); mutex_unlock(&khugepaged_mutex); } --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DF78C2E0 for ; Wed, 2 Jul 2025 06:01:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436121; cv=none; b=kuNnYleaw9+iNwV6zZ91kwsj9a/Kx77i8Rt6m8QKgUoo+KYXfPRB2pLBY5LZx2kb2+wQlzNQEX764+3B2k0ExxIRFImO6X9aI51dZvFDwUsPvv3+Yk2Y4eVocKIN+dzQDsvidsVmayrTAQGDC6qjasuYvPXQU0S0D5sfGA/wcx0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436121; c=relaxed/simple; bh=CWjDlIqmI7r5MX+6Ff6+BdoQ8VNzO9tkbjoSxSvE4Kc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZEz6PsG9rq8GllGEM0xQrbY0zmoLO4jMRbhF4fiGNEVlwmaSJjsYcosQQLzEM6tuCJzBJp9uh0AwwjUko/mHjoeG3wJT3Ley3st6AZ9hSbOL7fC0/X3C1XSeopWOmVICkMzhlbDxGmNS/v+amcDKV5FrCKuB40rfdlsBD0hG83A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=bSnz2B3j; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bSnz2B3j" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436119; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Te3ncyc0BcAKNiPn3UGiXNtvO3eBlJiyC77AptKisoc=; b=bSnz2B3jDoYY/cJRHbNi5eQ2VYZ2nJpkTD6tBHMY/HbJuxGw6lQlU/Fh9lpLRDdYACB+MC JOVCgBPZ8XS7cLIPaOwc8K2UdZH/yGx2GOIwfrsrgyc2Htpqucn19uBQbHbeeolw60er+5 2bWLR7oaAWm1RUoIEeBvNEJo4jsfsHA= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-607-pIqVZFUKM5i-5tAW-osWpQ-1; Wed, 02 Jul 2025 02:01:54 -0400 X-MC-Unique: pIqVZFUKM5i-5tAW-osWpQ-1 X-Mimecast-MFC-AGG-ID: pIqVZFUKM5i-5tAW-osWpQ_1751436110 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id F1424180028C; Wed, 2 Jul 2025 06:01:49 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C81B918003FC; Wed, 2 Jul 2025 06:01:34 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 13/15] khugepaged: improve tracepoints for mTHP orders Date: Tue, 1 Jul 2025 23:57:40 -0600 Message-ID: <20250702055742.102808-14-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Add the order to the tracepoints to give better insight into what order is being operated at for khugepaged. Reviewed-by: Baolin Wang Signed-off-by: Nico Pache --- include/trace/events/huge_memory.h | 34 +++++++++++++++++++----------- mm/khugepaged.c | 10 +++++---- 2 files changed, 28 insertions(+), 16 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge= _memory.h index 2305df6cb485..70661bbf676f 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -92,34 +92,37 @@ TRACE_EVENT(mm_khugepaged_scan_pmd, =20 TRACE_EVENT(mm_collapse_huge_page, =20 - TP_PROTO(struct mm_struct *mm, int isolated, int status), + TP_PROTO(struct mm_struct *mm, int isolated, int status, int order), =20 - TP_ARGS(mm, isolated, status), + TP_ARGS(mm, isolated, status, order), =20 TP_STRUCT__entry( __field(struct mm_struct *, mm) __field(int, isolated) __field(int, status) + __field(int, order) ), =20 TP_fast_assign( __entry->mm =3D mm; __entry->isolated =3D isolated; __entry->status =3D status; + __entry->order =3D order; ), =20 - TP_printk("mm=3D%p, isolated=3D%d, status=3D%s", + TP_printk("mm=3D%p, isolated=3D%d, status=3D%s order=3D%d", __entry->mm, __entry->isolated, - __print_symbolic(__entry->status, SCAN_STATUS)) + __print_symbolic(__entry->status, SCAN_STATUS), + __entry->order) ); =20 TRACE_EVENT(mm_collapse_huge_page_isolate, =20 TP_PROTO(struct folio *folio, int none_or_zero, - int referenced, bool writable, int status), + int referenced, bool writable, int status, int order), =20 - TP_ARGS(folio, none_or_zero, referenced, writable, status), + TP_ARGS(folio, none_or_zero, referenced, writable, status, order), =20 TP_STRUCT__entry( __field(unsigned long, pfn) @@ -127,6 +130,7 @@ TRACE_EVENT(mm_collapse_huge_page_isolate, __field(int, referenced) __field(bool, writable) __field(int, status) + __field(int, order) ), =20 TP_fast_assign( @@ -135,27 +139,31 @@ TRACE_EVENT(mm_collapse_huge_page_isolate, __entry->referenced =3D referenced; __entry->writable =3D writable; __entry->status =3D status; + __entry->order =3D order; ), =20 - TP_printk("scan_pfn=3D0x%lx, none_or_zero=3D%d, referenced=3D%d, writable= =3D%d, status=3D%s", + TP_printk("scan_pfn=3D0x%lx, none_or_zero=3D%d, referenced=3D%d, writable= =3D%d, status=3D%s order=3D%d", __entry->pfn, __entry->none_or_zero, __entry->referenced, __entry->writable, - __print_symbolic(__entry->status, SCAN_STATUS)) + __print_symbolic(__entry->status, SCAN_STATUS), + __entry->order) ); =20 TRACE_EVENT(mm_collapse_huge_page_swapin, =20 - TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret), + TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret, + int order), =20 - TP_ARGS(mm, swapped_in, referenced, ret), + TP_ARGS(mm, swapped_in, referenced, ret, order), =20 TP_STRUCT__entry( __field(struct mm_struct *, mm) __field(int, swapped_in) __field(int, referenced) __field(int, ret) + __field(int, order) ), =20 TP_fast_assign( @@ -163,13 +171,15 @@ TRACE_EVENT(mm_collapse_huge_page_swapin, __entry->swapped_in =3D swapped_in; __entry->referenced =3D referenced; __entry->ret =3D ret; + __entry->order =3D order; ), =20 - TP_printk("mm=3D%p, swapped_in=3D%d, referenced=3D%d, ret=3D%d", + TP_printk("mm=3D%p, swapped_in=3D%d, referenced=3D%d, ret=3D%d, order=3D%= d", __entry->mm, __entry->swapped_in, __entry->referenced, - __entry->ret) + __entry->ret, + __entry->order) ); =20 TRACE_EVENT(mm_khugepaged_scan_file, diff --git a/mm/khugepaged.c b/mm/khugepaged.c index d20a217387ed..2c0962637c34 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -711,13 +711,14 @@ static int __collapse_huge_page_isolate(struct vm_are= a_struct *vma, } else { result =3D SCAN_SUCCEED; trace_mm_collapse_huge_page_isolate(folio, none_or_zero, - referenced, writable, result); + referenced, writable, result, + order); return result; } out: release_pte_pages(pte, _pte, compound_pagelist); trace_mm_collapse_huge_page_isolate(folio, none_or_zero, - referenced, writable, result); + referenced, writable, result, order); return result; } =20 @@ -1088,7 +1089,8 @@ static int __collapse_huge_page_swapin(struct mm_stru= ct *mm, =20 result =3D SCAN_SUCCEED; out: - trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result); + trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result, + order); return result; } =20 @@ -1313,7 +1315,7 @@ static int collapse_huge_page(struct mm_struct *mm, u= nsigned long address, *mmap_locked =3D false; if (folio) folio_put(folio); - trace_mm_collapse_huge_page(mm, result =3D=3D SCAN_SUCCEED, result); + trace_mm_collapse_huge_page(mm, result =3D=3D SCAN_SUCCEED, result, order= ); return result; } =20 --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C25162063FD for ; Wed, 2 Jul 2025 06:02:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436140; cv=none; b=nlUV5yJWvS1IxkR1mXHoXjPn9AmzK3YUuYyNJSbHUTAEycwrkQcp0KLFz4KthtrkDOdkSiPGTAEQB3TPE2s6r4VSJLYcwJAOlC51BPoXDYDjasNqd16EIfr4UB9ncUKs7gwrF9copo5GlozZyz0faRswvkPF+3dIuwIwjZwCIec= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436140; c=relaxed/simple; bh=NgyC3gSpfP77l+z7bbDGL6zgPF14iYn9gkyrkHGERe0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ONUazsW0fUd/oTs7Kjz/RDhUTHNk4DHoK+V0CTNYIausbIYFyXQlzSv7UKq8n0vGTjpZJtpqBWoX7Iy60ri20L+/X59YEEMPjpPI27ecoC8MQI+7LH9krfeNRdGq5YMsqELUI1Q79l6pvY4ymdXSBua5n/V9SNlYmJwqxvQqHRo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=jQkv3951; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="jQkv3951" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436136; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KrLYUyOSHv6ti8mLUpiK2UpRJKu3qFGfunZUfG2C6A4=; b=jQkv3951/8mQmAyeJYecMBcq0SXXYQQpoD+3hWPFYR6ejr0mW4JbU95uGfhv1V3tCO8qGc eB8/CB0KY3FMu7TveB/43HuvDwtFityCFFsK6K9tLi6rAdicVQ/O4aigbHzDuqEWbeeSQq pUS8N3uQ2olqLE9luAJnHmUqFE0P6Jg= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-140-xQprs1ByMli6vDlVE0PNBw-1; Wed, 02 Jul 2025 02:02:09 -0400 X-MC-Unique: xQprs1ByMli6vDlVE0PNBw-1 X-Mimecast-MFC-AGG-ID: xQprs1ByMli6vDlVE0PNBw_1751436125 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 78E0F195608C; Wed, 2 Jul 2025 06:02:05 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9DFEC1800285; Wed, 2 Jul 2025 06:01:50 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org Subject: [PATCH v8 14/15] khugepaged: add per-order mTHP khugepaged stats Date: Tue, 1 Jul 2025 23:57:41 -0600 Message-ID: <20250702055742.102808-15-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" With mTHP support inplace, let add the per-order mTHP stats for exceeding NONE, SWAP, and SHARED. Signed-off-by: Nico Pache --- include/linux/huge_mm.h | 3 +++ mm/huge_memory.c | 7 +++++++ mm/khugepaged.c | 15 ++++++++++++--- 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index a6ea89fdaee6..6034b4c9dae5 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -141,6 +141,9 @@ enum mthp_stat_item { MTHP_STAT_SPLIT_DEFERRED, MTHP_STAT_NR_ANON, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, + MTHP_STAT_COLLAPSE_EXCEED_SWAP, + MTHP_STAT_COLLAPSE_EXCEED_NONE, + MTHP_STAT_COLLAPSE_EXCEED_SHARED, __MTHP_STAT_COUNT }; =20 diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 69777a35e722..3eb1c34be601 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -632,6 +632,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FA= ILED); DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED); DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON); DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALL= Y_MAPPED); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_= SWAP); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_= NONE); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEE= D_SHARED); + =20 static struct attribute *anon_stats_attrs[] =3D { &anon_fault_alloc_attr.attr, @@ -648,6 +652,9 @@ static struct attribute *anon_stats_attrs[] =3D { &split_deferred_attr.attr, &nr_anon_attr.attr, &nr_anon_partially_mapped_attr.attr, + &collapse_exceed_swap_pte_attr.attr, + &collapse_exceed_none_pte_attr.attr, + &collapse_exceed_shared_pte_attr.attr, NULL, }; =20 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 2c0962637c34..636b84bf1ca1 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -594,7 +594,10 @@ static int __collapse_huge_page_isolate(struct vm_area= _struct *vma, continue; } else { result =3D SCAN_EXCEED_NONE_PTE; - count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + if (order =3D=3D HPAGE_PMD_ORDER) + count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + else + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE); goto out; } } @@ -623,8 +626,14 @@ static int __collapse_huge_page_isolate(struct vm_area= _struct *vma, /* See khugepaged_scan_pmd(). */ if (folio_maybe_mapped_shared(folio)) { ++shared; - if (order !=3D HPAGE_PMD_ORDER || (cc->is_khugepaged && - shared > khugepaged_max_ptes_shared)) { + if (order !=3D HPAGE_PMD_ORDER) { + result =3D SCAN_EXCEED_SHARED_PTE; + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED); + goto out; + } + + if (cc->is_khugepaged && + shared > khugepaged_max_ptes_shared) { result =3D SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out; --=20 2.49.0 From nobody Wed Oct 8 05:16:15 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D9AB01FDA89 for ; Wed, 2 Jul 2025 06:02:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436152; cv=none; b=snanf8EHC2Qkf3viKpqOVLCerlE5EFYxX6DHaxkWg+AbYZ0sto/cFucfBlUVJObSJ7n/IYfAgDuTfxczXPe8pmw6yEzulWCwZOFITbr5sWpoGQ4AbGsPz6Vji2d7MSKhn7w1hy1/kVmpyVjrfAK8mm27eoquzLXZma1N2n4x0t8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751436152; c=relaxed/simple; bh=KsDa0dapENPz5lMBPiueXxkXVedRNjXG9mMzmcErq9k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qVpopZdORgoKfUYDrJcTIMySFN92pMpB9WwpatT/ICLOh/4RpGI8w16Exc7IRyTnthlvS80cDwBYJul6DLGs7NH6TOad3lIuOEXROMAtpGnSczdOG/OSlxFX6km65XSzrn6GmeY/fgKu71jgPzXu3QcaiTTCiO0xvla67J1wcCA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=CEkudyYT; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="CEkudyYT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1751436149; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nIGpPqgNzgq47EY5ZTEnTs1FneYRqT/zHlE7fmslHU8=; b=CEkudyYT8gvyAG5K5o9PXfT2ya5g9TObKuc45vKeYfGPJxYMes+v9u/kbhxM5wJR44K1o9 JPuyy0Vcnj0ns+YB2ve2Q2cuRwb1w6FI59QUci6m0015RgA7JR/YuL3+mQMmyaX2f0saoJ PW8Oz1vd30VUpcUDb+Zh5jm0mi0f61k= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-636-exMEV-qmP7mWzW_W7wlbDA-1; Wed, 02 Jul 2025 02:02:25 -0400 X-MC-Unique: exMEV-qmP7mWzW_W7wlbDA-1 X-Mimecast-MFC-AGG-ID: exMEV-qmP7mWzW_W7wlbDA_1751436141 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3F2E51955ECA; Wed, 2 Jul 2025 06:02:21 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.88.112]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1B00618003FC; Wed, 2 Jul 2025 06:02:05 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, Bagas Sanjaya Subject: [PATCH v8 15/15] Documentation: mm: update the admin guide for mTHP collapse Date: Tue, 1 Jul 2025 23:57:42 -0600 Message-ID: <20250702055742.102808-16-npache@redhat.com> In-Reply-To: <20250702055742.102808-1-npache@redhat.com> References: <20250702055742.102808-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Now that we can collapse to mTHPs lets update the admin guide to reflect these changes and provide proper guidence on how to utilize it. Reviewed-by: Bagas Sanjaya Signed-off-by: Nico Pache --- Documentation/admin-guide/mm/transhuge.rst | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/adm= in-guide/mm/transhuge.rst index dff8d5985f0f..878796b4d7d3 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -63,7 +63,7 @@ often. THP can be enabled system wide or restricted to certain tasks or even memory ranges inside task's address space. Unless THP is completely disabled, there is ``khugepaged`` daemon that scans memory and -collapses sequences of basic pages into PMD-sized huge pages. +collapses sequences of basic pages into huge pages. =20 The THP behaviour is controlled via :ref:`sysfs ` interface and using madvise(2) and prctl(2) system calls. @@ -144,6 +144,18 @@ hugepage sizes have enabled=3D"never". If enabling mul= tiple hugepage sizes, the kernel will select the most appropriate enabled size for a given allocation. =20 +khugepaged uses max_ptes_none scaled to the order of the enabled mTHP size +to determine collapses. When using mTHPs it's recommended to set +max_ptes_none low-- ideally less than HPAGE_PMD_NR / 2 (255 on 4k page +size). This will prevent undesired "creep" behavior that leads to +continuously collapsing to the largest mTHP size; when we collapse, we are +bringing in new non-zero pages that will, on a subsequent scan, cause the +max_ptes_none check of the +1 order to always be satisfied. By limiting +this to less than half the current order, we make sure we don't cause this +feedback loop. max_ptes_shared and max_ptes_swap have no effect when +collapsing to a mTHP, and mTHP collapse will fail on shared or swapped out +pages. + It's also possible to limit defrag efforts in the VM to generate anonymous hugepages in case they're not immediately free to madvise regions or to never try to defrag memory and simply fallback to regular @@ -221,11 +233,6 @@ top-level control are "never") Khugepaged controls ------------------- =20 -.. note:: - khugepaged currently only searches for opportunities to collapse to - PMD-sized THP and no attempt is made to collapse to other THP - sizes. - khugepaged runs usually at low frequency so while one may not want to invoke defrag algorithms synchronously during the page faults, it should be worth invoking defrag at least in khugepaged. However it's --=20 2.49.0