From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 01/11] mm: khugepaged: add khugepaged_max_ptes_none check in collapse_file()
Date: Wed, 20 Aug 2025 17:07:12 +0800

As with anonymous folio collapse, we should also check
'khugepaged_max_ptes_none' when trying to collapse shmem/file folios.
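The threshold this patch applies mirrors the anonymous path: once more than 'khugepaged_max_ptes_none' of the slots in the PMD range are absent, the collapse is abandoned. A minimal user-space sketch of that accounting (illustrative only, outside the kernel; the constants assume 4K pages and a 2M PMD, where the tunable's documented default is 511):

```c
#include <stdbool.h>

#define HPAGE_PMD_NR 512	/* 4K pages per 2M PMD range (x86-64) */

/* Default of /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none */
static unsigned int khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;

/*
 * Sketch of the check this patch adds to collapse_file(): while walking
 * the page-cache range, absent entries are counted in nr_none, and the
 * collapse bails out once filling them would exceed max_ptes_none.
 * Only the khugepaged path enforces the limit; MADV_COLLAPSE does not.
 */
static bool exceeds_none_limit(int nr_none, bool is_khugepaged)
{
	return is_khugepaged && nr_none > (int)khugepaged_max_ptes_none;
}
```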
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/khugepaged.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5a3386043f39..5d4493b77f3c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2125,6 +2125,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			}
 		}
 		nr_none++;
+
+		if (cc->is_khugepaged && nr_none > khugepaged_max_ptes_none) {
+			result = SCAN_EXCEED_NONE_PTE;
+			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+			goto xa_locked;
+		}
+
 		index++;
 		continue;
 	}
-- 
2.43.5

From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 02/11] mm: khugepaged: generalize collapse_file for mTHP support
Date: Wed, 20 Aug 2025 17:07:13 +0800
Message-ID: <6a2f28b4541bbbc56ea9e07f24b67cef87899a50.1755677674.git.baolin.wang@linux.alibaba.com>

Generalize collapse_file() to take a collapse order, to support future
file/shmem mTHP collapse. No functional changes in this patch.
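With the order parameterized, the values the patch derives follow from bit arithmetic: the folio covers `nr_pages = 1 << order` base pages, and a start index is suitably aligned exactly when `start & (nr_pages - 1)` is zero (the generalized `VM_BUG_ON`). A standalone sketch of those two relations (user-space, illustrative only):

```c
#include <stdbool.h>

typedef unsigned long pgoff_t;

/* Number of base pages covered by a folio of the given order */
static int order_nr_pages(int order)
{
	return 1 << order;
}

/*
 * Sketch of the generalized VM_BUG_ON(start & (nr_pages - 1)):
 * the start index must be a multiple of the collapse size, which for
 * a power of two reduces to masking with (nr_pages - 1).
 */
static bool start_is_aligned(pgoff_t start, int order)
{
	return (start & (order_nr_pages(order) - 1)) == 0;
}
```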
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/khugepaged.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5d4493b77f3c..e64ed86d28ca 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2064,21 +2064,23 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  */
 static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			 struct file *file, pgoff_t start,
-			 struct collapse_control *cc)
+			 struct collapse_control *cc,
+			 int order)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *dst;
 	struct folio *folio, *tmp, *new_folio;
-	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
+	int nr_pages = 1 << order;
+	pgoff_t index = 0, end = start + nr_pages;
 	LIST_HEAD(pagelist);
-	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
+	XA_STATE_ORDER(xas, &mapping->i_pages, start, order);
 	int nr_none = 0, result = SCAN_SUCCEED;
 	bool is_shmem = shmem_file(file);
 
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
-	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
+	VM_BUG_ON(start & (nr_pages - 1));
 
-	result = alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER);
+	result = alloc_charge_folio(&new_folio, mm, cc, order);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
@@ -2426,14 +2428,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	 * unwritten page.
 	 */
 	folio_mark_uptodate(new_folio);
-	folio_ref_add(new_folio, HPAGE_PMD_NR - 1);
+	folio_ref_add(new_folio, nr_pages - 1);
 
 	if (is_shmem)
 		folio_mark_dirty(new_folio);
 	folio_add_lru(new_folio);
 
 	/* Join all the small entries into a single multi-index entry. */
-	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_set_order(&xas, start, order);
 	xas_store(&xas, new_folio);
 	WARN_ON_ONCE(xas_error(&xas));
 	xas_unlock_irq(&xas);
@@ -2496,7 +2498,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	folio_put(new_folio);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	trace_mm_khugepaged_collapse_file(mm, new_folio, index, addr, is_shmem, file, HPAGE_PMD_NR, result);
+	trace_mm_khugepaged_collapse_file(mm, new_folio, index, addr, is_shmem, file, nr_pages, result);
 	return result;
 }
 
@@ -2599,7 +2601,7 @@ static int collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			result = collapse_file(mm, addr, file, start, cc);
+			result = collapse_file(mm, addr, file, start, cc, HPAGE_PMD_ORDER);
 		}
 	}
 
-- 
2.43.5
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 03/11] mm: khugepaged: add an order check for THP statistics
Date: Wed, 20 Aug 2025 17:07:14 +0800
Message-ID: <7c99eb27cf98615f80f7b14af479a2d1ea56b1f1.1755677674.git.baolin.wang@linux.alibaba.com>

In order to support file/shmem mTHP collapse in the following patches,
add a PMD-sized THP order check to avoid PMD-sized THP statistics errors.
No functional changes.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/khugepaged.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e64ed86d28ca..195c26699118 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2411,10 +2411,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		xas_lock_irq(&xas);
 	}
 
-	if (is_shmem)
-		__lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
-	else
-		__lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
+	if (order == HPAGE_PMD_ORDER) {
+		if (is_shmem)
+			__lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
+		else
+			__lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
+	}
 
 	if (nr_none) {
 		__lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
-- 
2.43.5
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 04/11] mm: khugepaged: add shmem/file mTHP collapse support
Date: Wed, 20 Aug 2025 17:07:15 +0800

Khugepaged already supports anonymous mTHP collapse. Similarly, let
khugepaged also support shmem/file mTHP collapse. The strategy for
shmem/file mTHP collapse follows the anonymous mTHP collapse, which is,
quoting from Nico:

"while scanning PMD ranges for potential collapse candidates, keep track
of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER PTEs. After
the scan is complete, we will perform binary recursion on the bitmap to
determine which mTHP size would be most efficient to collapse to. The
'max_ptes_none' will be scaled by the attempted collapse order to determine
how full a THP must be to be eligible."

Moreover, to facilitate the scanning of shmem/file folios, extend the
'cc->mthp_bitmap_temp' bitmap to record whether each index within the PMD
range corresponds to a present page; this temp bitmap is then used to
determine whether each chunk should be marked as present for mTHP collapse.

Currently, collapse_pte_mapped_thp() does not build the mapping for mTHP,
because we still expect to establish mTHP mappings via refault under the
control of fault_around. So collapse_pte_mapped_thp() remains responsible
only for building the mapping for PMD-sized THP, which is reasonable and
makes life easier.

Note that we do not need to remove pte page tables for shmem/file mTHP
collapse.
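The scaling quoted above can be stated concretely: for an attempted collapse of order `order`, the budget becomes `khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order)`, halving for each order below PMD order. With the default tunable value of 511 on x86-64 (4K pages, 2M PMD), a 64K (order-4) attempt therefore tolerates at most 15 absent entries out of 16. A user-space sketch of the arithmetic (illustrative, constants assume x86-64):

```c
#define HPAGE_PMD_ORDER 9	/* 2M PMD / 4K base pages (x86-64) */

/*
 * Scale the PMD-wide max_ptes_none tunable down to a smaller collapse
 * order, as the series does for file/shmem mTHP (mirroring the
 * anonymous path): shift right by the difference in orders, so the
 * allowed number of absent entries shrinks proportionally with the
 * attempted collapse size.
 */
static int scaled_max_ptes_none(int max_ptes_none, int order)
{
	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
}
```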
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/khugepaged.c | 133 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 107 insertions(+), 26 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 195c26699118..53ca7bb72fbc 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -113,7 +113,7 @@ struct collapse_control {
 	 * 1bit = order KHUGEPAGED_MIN_MTHP_ORDER mTHP
 	 */
 	DECLARE_BITMAP(mthp_bitmap, MAX_MTHP_BITMAP_SIZE);
-	DECLARE_BITMAP(mthp_bitmap_temp, MAX_MTHP_BITMAP_SIZE);
+	DECLARE_BITMAP(mthp_bitmap_temp, HPAGE_PMD_NR);
 	struct scan_bit_state mthp_bitmap_stack[MAX_MTHP_BITMAP_SIZE];
 };
 
@@ -147,6 +147,10 @@ static struct khugepaged_scan khugepaged_scan = {
 	.mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
 };
 
+static int collapse_file(struct mm_struct *mm, unsigned long addr,
+			 struct file *file, pgoff_t start,
+			 struct collapse_control *cc, int order);
+
 #ifdef CONFIG_SYSFS
 static ssize_t scan_sleep_millisecs_show(struct kobject *kobj,
 					 struct kobj_attribute *attr,
@@ -1366,7 +1370,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 
 /* Recursive function to consume the bitmap */
 static int collapse_scan_bitmap(struct mm_struct *mm, unsigned long address,
-			int referenced, int unmapped, struct collapse_control *cc,
+			struct file *file, int referenced, int unmapped,
+			pgoff_t start, struct collapse_control *cc,
 			bool *mmap_locked, unsigned long enabled_orders)
 {
 	u8 order, next_order;
@@ -1401,10 +1406,14 @@ static int collapse_scan_bitmap(struct mm_struct *mm, unsigned long address,
 
 		/* Check if the region is "almost full" based on the threshold */
 		if (bits_set > threshold_bits || is_pmd_only
-			|| test_bit(order, &huge_anon_orders_always)) {
-			ret = collapse_huge_page(mm, address, referenced, unmapped,
-						 cc, mmap_locked, order,
-						 offset * KHUGEPAGED_MIN_MTHP_NR);
+			|| (!file && test_bit(order, &huge_anon_orders_always))) {
+			if (file)
+				ret = collapse_file(mm, address, file,
+						start + offset * KHUGEPAGED_MIN_MTHP_NR, cc, order);
+			else
+				ret = collapse_huge_page(mm, address, referenced, unmapped,
+						cc, mmap_locked, order,
+						offset * KHUGEPAGED_MIN_MTHP_NR);
 
 			/*
 			 * Analyze failure reason to determine next action:
@@ -1418,6 +1427,7 @@ static int collapse_scan_bitmap(struct mm_struct *mm, unsigned long address,
 				collapsed += (1 << order);
 			case SCAN_PAGE_RO:
 			case SCAN_PTE_MAPPED_HUGEPAGE:
+			case SCAN_PAGE_COMPOUND:
 				continue;
 			/* Cases were lower orders might still succeed */
 			case SCAN_LACK_REFERENCED_PAGE:
@@ -1481,7 +1491,7 @@ static int collapse_scan_pmd(struct mm_struct *mm,
 		goto out;
 
 	bitmap_zero(cc->mthp_bitmap, MAX_MTHP_BITMAP_SIZE);
-	bitmap_zero(cc->mthp_bitmap_temp, MAX_MTHP_BITMAP_SIZE);
+	bitmap_zero(cc->mthp_bitmap_temp, HPAGE_PMD_NR);
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
 
@@ -1649,8 +1659,8 @@ static int collapse_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
-		result = collapse_scan_bitmap(mm, address, referenced, unmapped, cc,
-					      mmap_locked, enabled_orders);
+		result = collapse_scan_bitmap(mm, address, NULL, referenced, unmapped,
+					      0, cc, mmap_locked, enabled_orders);
 		if (result > 0)
 			result = SCAN_SUCCEED;
 		else
@@ -2067,6 +2077,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			 struct collapse_control *cc, int order)
 {
+	int max_scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
 	struct address_space *mapping = file->f_mapping;
 	struct page *dst;
 	struct folio *folio, *tmp, *new_folio;
@@ -2128,9 +2139,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 		nr_none++;
 
-		if (cc->is_khugepaged && nr_none > khugepaged_max_ptes_none) {
+		if (cc->is_khugepaged && nr_none > max_scaled_none) {
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
 			goto xa_locked;
 		}
 
@@ -2223,6 +2235,18 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			goto out_unlock;
 		}
 
+		/*
+		 * If the folio order is greater than the collapse order, there is
+		 * no need to continue attempting to collapse.
+		 * And should return SCAN_PAGE_COMPOUND instead of SCAN_PTE_MAPPED_HUGEPAGE,
+		 * then we can build the mapping under the control of fault_around
+		 * when refaulting.
+		 */
+		if (folio_order(folio) >= order) {
+			result = SCAN_PAGE_COMPOUND;
+			goto out_unlock;
+		}
+
 		if (folio_mapping(folio) != mapping) {
 			result = SCAN_TRUNCATED;
 			goto out_unlock;
@@ -2443,12 +2467,12 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	xas_unlock_irq(&xas);
 
 	/*
-	 * Remove pte page tables, so we can re-fault the page as huge.
+	 * Remove pte page tables for PMD-sized THP collapse, so we can re-fault
+	 * the page as huge.
 	 * If MADV_COLLAPSE, adjust result to call collapse_pte_mapped_thp().
 	 */
-	retract_page_tables(mapping, start);
-	if (cc && !cc->is_khugepaged)
-		result = SCAN_PTE_MAPPED_HUGEPAGE;
+	if (order == HPAGE_PMD_ORDER)
+		retract_page_tables(mapping, start);
 	folio_unlock(new_folio);
 
 	/*
@@ -2504,21 +2528,35 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static int collapse_scan_file(struct mm_struct *mm, unsigned long addr,
-			      struct file *file, pgoff_t start,
+static int collapse_scan_file(struct mm_struct *mm, struct vm_area_struct *vma,
+			      unsigned long addr, struct file *file, pgoff_t start,
 			      struct collapse_control *cc)
 {
+	int max_scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER);
+	enum tva_type type = cc->is_khugepaged ? TVA_KHUGEPAGED : TVA_FORCED_COLLAPSE;
 	struct folio *folio = NULL;
 	struct address_space *mapping = file->f_mapping;
 	XA_STATE(xas, &mapping->i_pages, start);
-	int present, swap;
+	int present, swap, nr_pages;
+	unsigned long enabled_orders;
 	int node = NUMA_NO_NODE;
 	int result = SCAN_SUCCEED;
+	bool is_pmd_only;
 
 	present = 0;
 	swap = 0;
+	bitmap_zero(cc->mthp_bitmap, MAX_MTHP_BITMAP_SIZE);
+	bitmap_zero(cc->mthp_bitmap_temp, HPAGE_PMD_NR);
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+
+	if (cc->is_khugepaged)
+		enabled_orders = thp_vma_allowable_orders(vma, vma->vm_flags,
+					type, THP_ORDERS_ALL_FILE_DEFAULT);
+	else
+		enabled_orders = BIT(HPAGE_PMD_ORDER);
+	is_pmd_only = (enabled_orders == (1 << HPAGE_PMD_ORDER));
+
 	rcu_read_lock();
 	xas_for_each(&xas, folio, start + HPAGE_PMD_NR - 1) {
 		if (xas_retry(&xas, folio))
@@ -2587,7 +2625,20 @@ static int collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 		 * is just too costly...
 		 */
 
-		present += folio_nr_pages(folio);
+		nr_pages = folio_nr_pages(folio);
+		present += nr_pages;
+
+		/*
+		 * If there are folios present, keep track of it in the bitmap
+		 * for file/shmem mTHP collapse.
+		 */
+		if (!is_pmd_only) {
+			pgoff_t pgoff = max_t(pgoff_t, start, folio->index) - start;
+
+			nr_pages = min_t(int, HPAGE_PMD_NR - pgoff, nr_pages);
+			bitmap_set(cc->mthp_bitmap_temp, pgoff, nr_pages);
+		}
+
 		folio_put(folio);
 
 		if (need_resched()) {
@@ -2597,16 +2648,46 @@ static int collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 	}
 	rcu_read_unlock();
 
-	if (result == SCAN_SUCCEED) {
-		if (cc->is_khugepaged &&
-		    present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
-			result = SCAN_EXCEED_NONE_PTE;
-			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-		} else {
-			result = collapse_file(mm, addr, file, start, cc, HPAGE_PMD_ORDER);
+	if (result != SCAN_SUCCEED)
+		goto out;
+
+	if (cc->is_khugepaged && is_pmd_only &&
+	    present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
+		result = SCAN_EXCEED_NONE_PTE;
+		count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+		goto out;
+	}
+
+	/*
+	 * Check each KHUGEPAGED_MIN_MTHP_NR page chunks, and keep track of it
+	 * in the bitmap if this chunk has enough present folios.
+	 */
+	if (!is_pmd_only) {
+		int i;
+
+		for (i = 0; i < HPAGE_PMD_NR; i += KHUGEPAGED_MIN_MTHP_NR) {
+			if (bitmap_weight(cc->mthp_bitmap_temp, KHUGEPAGED_MIN_MTHP_NR) >
+			    KHUGEPAGED_MIN_MTHP_NR - max_scaled_none)
+				bitmap_set(cc->mthp_bitmap, i / KHUGEPAGED_MIN_MTHP_NR, 1);
+
+			bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap_temp,
+					   KHUGEPAGED_MIN_MTHP_NR, HPAGE_PMD_NR);
		}
+
+		bitmap_zero(cc->mthp_bitmap_temp, HPAGE_PMD_NR);
+	}
+
+	result = collapse_scan_bitmap(mm, addr, file, 0, 0, start,
+				      cc, NULL, enabled_orders);
+	if (result > 0) {
+		if (cc && !cc->is_khugepaged)
+			result = SCAN_PTE_MAPPED_HUGEPAGE;
+		else
+			result = SCAN_SUCCEED;
+	} else {
+		result = SCAN_FAIL;
 	}
 
+out:
 	trace_mm_khugepaged_scan_file(mm, folio, file, present, swap, result);
 	return result;
 }
@@ -2628,7 +2709,7 @@ static int collapse_single_pmd(unsigned long addr,
 
 	mmap_read_unlock(mm);
 	*mmap_locked = false;
-	result = collapse_scan_file(mm, addr, file, pgoff, cc);
+	result = collapse_scan_file(mm, vma, addr, file, pgoff, cc);
 	fput(file);
 	if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
 		mmap_read_lock(mm);
-- 
2.43.5
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 05/11] mm: shmem: kick khugepaged for enabling none-PMD-sized shmem mTHPs
Date: Wed, 20 Aug 2025 17:07:16 +0800
Message-ID: <7c8ee99eb146a9e8abd20d110cb591d33fa1ebae.1755677674.git.baolin.wang@linux.alibaba.com>

When only non-PMD-sized mTHP is enabled (such as only 64K mTHP enabled),
we should also allow kicking khugepaged to attempt scanning and collapsing
64K shmem mTHP. Modify shmem_hpage_pmd_enabled() to support shmem mTHP
collapse, and while we are at it, rename it to make the function name
clearer.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/shmem_fs.h |  4 ++--
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 10 +++++-----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 6d0f9c599ff7..cbe46e0c8bce 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -118,7 +118,7 @@ int shmem_unuse(unsigned int type);
 unsigned long shmem_allowable_huge_orders(struct inode *inode,
 				struct vm_area_struct *vma, pgoff_t index,
 				loff_t write_end, bool shmem_huge_force);
-bool shmem_hpage_pmd_enabled(void);
+bool shmem_hpage_enabled(void);
 #else
 static inline unsigned long shmem_allowable_huge_orders(struct inode *inode,
 				struct vm_area_struct *vma, pgoff_t index,
@@ -127,7 +127,7 @@ static inline unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	return 0;
 }
 
-static inline bool shmem_hpage_pmd_enabled(void)
+static inline bool shmem_hpage_enabled(void)
 {
 	return false;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 53ca7bb72fbc..eb0b433d6ccb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -453,7 +453,7 @@ static bool hugepage_enabled(void)
 	if (READ_ONCE(huge_anon_orders_inherit) &&
 	    hugepage_global_enabled())
 		return true;
-	if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled())
+	if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_enabled())
 		return true;
 	return false;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 13cc51df3893..a360738ab732 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1791,17 +1791,17 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-bool shmem_hpage_pmd_enabled(void)
+bool shmem_hpage_enabled(void)
 {
 	if (shmem_huge == SHMEM_HUGE_DENY)
 		return false;
-	if (test_bit(HPAGE_PMD_ORDER, &huge_shmem_orders_always))
+	if (READ_ONCE(huge_shmem_orders_always))
 		return true;
-	if (test_bit(HPAGE_PMD_ORDER, &huge_shmem_orders_madvise))
+	if (READ_ONCE(huge_shmem_orders_madvise))
 		return true;
-	if (test_bit(HPAGE_PMD_ORDER, &huge_shmem_orders_within_size))
+	if (READ_ONCE(huge_shmem_orders_within_size))
 		return true;
-	if (test_bit(HPAGE_PMD_ORDER, &huge_shmem_orders_inherit) &&
+	if (READ_ONCE(huge_shmem_orders_inherit) &&
 	    shmem_huge != SHMEM_HUGE_NEVER)
 		return true;
 
-- 
2.43.5
From: Baolin Wang
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 06/11] mm: khugepaged: allow khugepaged to check all shmem/file large orders
Date: Wed, 20 Aug 2025 17:07:17 +0800
Message-ID: <4cad4cded3f19c667442ae4d89ec03452c42a7b5.1755677674.git.baolin.wang@linux.alibaba.com>
We are now ready to enable shmem/file mTHP collapse, allowing
thp_vma_allowable_orders() to check all permissible file large orders.

Signed-off-by: Baolin Wang
---
 mm/khugepaged.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eb0b433d6ccb..d5ae2e6c4107 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -496,7 +496,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 	if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) &&
 	    hugepage_enabled()) {
 		unsigned long orders = vma_is_anonymous(vma) ?
-			THP_ORDERS_ALL_ANON : BIT(PMD_ORDER);
+			THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE_DEFAULT;
 
 		if (thp_vma_allowable_orders(vma, vm_flags, TVA_KHUGEPAGED,
 					     orders))
@@ -2780,7 +2780,7 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
 	vma_iter_init(&vmi, mm, khugepaged_scan.address);
 	for_each_vma(vmi, vma) {
 		unsigned long orders = vma_is_anonymous(vma) ?
-			THP_ORDERS_ALL_ANON : BIT(PMD_ORDER);
+			THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE_DEFAULT;
 		unsigned long hstart, hend;
 
 		cond_resched();
-- 
2.43.5

From nobody Sat Oct 4 04:56:50 2025
From: Baolin Wang
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 07/11] mm: khugepaged: skip large folios that don't need to be collapsed
Date: Wed, 20 Aug 2025 17:07:18 +0800
Message-ID: <27aa3e9657958cf17d0f42a725155c05b21806e5.1755677674.git.baolin.wang@linux.alibaba.com>

If a VMA has already created a mapping of large folios after a successful
mTHP collapse, we can skip those folios that exceed the 'highest_enabled_order'
when scanning the VMA range again, as they can no longer be collapsed further.
This helps prevent wasting CPU cycles.
Signed-off-by: Baolin Wang
---
 mm/khugepaged.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d5ae2e6c4107..c25b68b13402 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2537,6 +2537,7 @@ static int collapse_scan_file(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct folio *folio = NULL;
 	struct address_space *mapping = file->f_mapping;
 	XA_STATE(xas, &mapping->i_pages, start);
+	unsigned int highest_enabled_order;
 	int present, swap, nr_pages;
 	unsigned long enabled_orders;
 	int node = NUMA_NO_NODE;
@@ -2556,6 +2557,7 @@ static int collapse_scan_file(struct mm_struct *mm, struct vm_area_struct *vma,
 	else
 		enabled_orders = BIT(HPAGE_PMD_ORDER);
 	is_pmd_only = (enabled_orders == (1 << HPAGE_PMD_ORDER));
+	highest_enabled_order = highest_order(enabled_orders);
 
 	rcu_read_lock();
 	xas_for_each(&xas, folio, start + HPAGE_PMD_NR - 1) {
@@ -2631,8 +2633,11 @@ static int collapse_scan_file(struct mm_struct *mm, struct vm_area_struct *vma,
 		/*
 		 * If there are folios present, keep track of it in the bitmap
 		 * for file/shmem mTHP collapse.
+		 * Skip those folios whose order has already exceeded the
+		 * 'highest_enabled_order', meaning they cannot be collapsed
+		 * into larger order folios.
 		 */
-		if (!is_pmd_only) {
+		if (!is_pmd_only && folio_order(folio) < highest_enabled_order) {
 			pgoff_t pgoff = max_t(pgoff_t, start, folio->index) - start;
 
 			nr_pages = min_t(int, HPAGE_PMD_NR - pgoff, nr_pages);
-- 
2.43.5

From nobody Sat Oct 4 04:56:50 2025
From: Baolin Wang
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 08/11] selftests: mm: extend check_huge() to support mTHP check
Date: Wed, 20 Aug 2025 17:07:19 +0800
Message-ID: <54919436962dc50b77e89ec142cf114f1186fa2a.1755677674.git.baolin.wang@linux.alibaba.com>

To support checking various mTHP sizes during mTHP collapse, extend the
check_huge() function prototype in preparation for the following patches.
No functional changes.
Signed-off-by: Baolin Wang
---
 tools/testing/selftests/mm/khugepaged.c       | 66 ++++++++++---------
 .../selftests/mm/split_huge_page_test.c       | 10 +--
 tools/testing/selftests/mm/uffd-common.c      |  4 +-
 tools/testing/selftests/mm/vm_util.c          |  4 +-
 tools/testing/selftests/mm/vm_util.h          |  4 +-
 5 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index a18c50d51141..e529074a1fdf 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -45,7 +45,7 @@ struct mem_ops {
 	void *(*setup_area)(int nr_hpages);
 	void (*cleanup_area)(void *p, unsigned long size);
 	void (*fault)(void *p, unsigned long start, unsigned long end);
-	bool (*check_huge)(void *addr, int nr_hpages);
+	bool (*check_huge)(void *addr, unsigned long size, int nr_hpages, unsigned long hpage_size);
 	const char *name;
 };
 
@@ -319,7 +319,7 @@ static void *alloc_hpage(struct mem_ops *ops)
 		perror("madvise(MADV_COLLAPSE)");
 		exit(EXIT_FAILURE);
 	}
-	if (!ops->check_huge(p, 1)) {
+	if (!ops->check_huge(p, hpage_pmd_size, 1, hpage_pmd_size)) {
 		perror("madvise(MADV_COLLAPSE)");
 		exit(EXIT_FAILURE);
 	}
@@ -359,9 +359,10 @@ static void anon_fault(void *p, unsigned long start, unsigned long end)
 	fill_memory(p, start, end);
 }
 
-static bool anon_check_huge(void *addr, int nr_hpages)
+static bool anon_check_huge(void *addr, unsigned long size,
+			    int nr_hpages, unsigned long hpage_size)
 {
-	return check_huge_anon(addr, nr_hpages, hpage_pmd_size);
+	return check_huge_anon(addr, size, nr_hpages, hpage_size);
 }
 
 static void *file_setup_area(int nr_hpages)
@@ -422,13 +423,14 @@ static void file_fault(void *p, unsigned long start, unsigned long end)
 	}
 }
 
-static bool file_check_huge(void *addr, int nr_hpages)
+static bool file_check_huge(void *addr, unsigned long size,
+			    int nr_hpages, unsigned long hpage_size)
 {
 	switch (finfo.type) {
 	case VMA_FILE:
-		return check_huge_file(addr, nr_hpages, hpage_pmd_size);
+		return check_huge_file(addr, nr_hpages, hpage_size);
 	case VMA_SHMEM:
-		return check_huge_shmem(addr, nr_hpages, hpage_pmd_size);
+		return check_huge_shmem(addr, size, nr_hpages, hpage_size);
 	default:
 		exit(EXIT_FAILURE);
 		return false;
@@ -464,9 +466,10 @@ static void shmem_cleanup_area(void *p, unsigned long size)
 	close(finfo.fd);
 }
 
-static bool shmem_check_huge(void *addr, int nr_hpages)
+static bool shmem_check_huge(void *addr, unsigned long size,
+			     int nr_hpages, unsigned long hpage_size)
 {
-	return check_huge_shmem(addr, nr_hpages, hpage_pmd_size);
+	return check_huge_shmem(addr, size, nr_hpages, hpage_size);
 }
 
 static struct mem_ops __anon_ops = {
@@ -514,7 +517,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
 	ret = madvise_collapse_retry(p, nr_hpages * hpage_pmd_size);
 	if (((bool)ret) == expect)
 		fail("Fail: Bad return value");
-	else if (!ops->check_huge(p, expect ? nr_hpages : 0))
+	else if (!ops->check_huge(p, nr_hpages * hpage_pmd_size, expect ? nr_hpages : 0, hpage_pmd_size))
 		fail("Fail: check_huge()");
 	else
 		success("OK");
@@ -526,7 +529,7 @@ static void madvise_collapse(const char *msg, char *p, int nr_hpages,
 			     struct mem_ops *ops, bool expect)
 {
 	/* Sanity check */
-	if (!ops->check_huge(p, 0)) {
+	if (!ops->check_huge(p, nr_hpages * hpage_pmd_size, 0, hpage_pmd_size)) {
 		printf("Unexpected huge page\n");
 		exit(EXIT_FAILURE);
 	}
@@ -537,11 +540,12 @@ static bool wait_for_scan(const char *msg, char *p, int nr_hpages,
 			  struct mem_ops *ops)
 {
+	unsigned long size = nr_hpages * hpage_pmd_size;
 	int full_scans;
 	int timeout = 6; /* 3 seconds */
 
 	/* Sanity check */
-	if (!ops->check_huge(p, 0)) {
+	if (!ops->check_huge(p, size, 0, hpage_pmd_size)) {
 		printf("Unexpected huge page\n");
 		exit(EXIT_FAILURE);
 	}
@@ -553,7 +557,7 @@ static bool wait_for_scan(const char *msg, char *p, int nr_hpages,
 
 	printf("%s...", msg);
 	while (timeout--) {
-		if (ops->check_huge(p, nr_hpages))
+		if (ops->check_huge(p, size, nr_hpages, hpage_pmd_size))
 			break;
 		if (thp_read_num("khugepaged/full_scans") >= full_scans)
 			break;
@@ -567,6 +571,8 @@ static bool wait_for_scan(const char *msg, char *p, int nr_hpages,
 static void khugepaged_collapse(const char *msg, char *p, int nr_hpages,
 				struct mem_ops *ops, bool expect)
 {
+	unsigned long size = nr_hpages * hpage_pmd_size;
+
 	if (wait_for_scan(msg, p, nr_hpages, ops)) {
 		if (expect)
 			fail("Timeout");
@@ -583,7 +589,7 @@ static void khugepaged_collapse(const char *msg, char *p, int nr_hpages,
 	if (ops != &__anon_ops)
 		ops->fault(p, 0, nr_hpages * hpage_pmd_size);
 
-	if (ops->check_huge(p, expect ? nr_hpages : 0))
+	if (ops->check_huge(p, size, expect ? nr_hpages : 0, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -622,7 +628,7 @@ static void alloc_at_fault(void)
 	p = alloc_mapping(1);
 	*p = 1;
 	printf("Allocate huge page on fault...");
-	if (check_huge_anon(p, 1, hpage_pmd_size))
+	if (check_huge_anon(p, hpage_pmd_size, 1, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -631,7 +637,7 @@ static void alloc_at_fault(void)
 
 	madvise(p, page_size, MADV_DONTNEED);
 	printf("Split huge PMD on MADV_DONTNEED...");
-	if (check_huge_anon(p, 0, hpage_pmd_size))
+	if (check_huge_anon(p, hpage_pmd_size, 0, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -797,7 +803,7 @@ static void collapse_single_pte_entry_compound(struct collapse_context *c, struc
 	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
 	printf("Split huge page leaving single PTE mapping compound page...");
 	madvise(p + page_size, hpage_pmd_size - page_size, MADV_DONTNEED);
-	if (ops->check_huge(p, 0))
+	if (ops->check_huge(p, hpage_pmd_size, 0, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -817,7 +823,7 @@ static void collapse_full_of_compound(struct collapse_context *c, struct mem_ops
 	printf("Split huge page leaving single PTE page table full of compound pages...");
 	madvise(p, page_size, MADV_NOHUGEPAGE);
 	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
-	if (ops->check_huge(p, 0))
+	if (ops->check_huge(p, hpage_pmd_size, 0, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -840,7 +846,7 @@ static void collapse_compound_extreme(struct collapse_context *c, struct mem_ops
 
 	madvise(BASE_ADDR, hpage_pmd_size, MADV_HUGEPAGE);
 	ops->fault(BASE_ADDR, 0, hpage_pmd_size);
-	if (!ops->check_huge(BASE_ADDR, 1)) {
+	if (!ops->check_huge(BASE_ADDR, hpage_pmd_size, 1, hpage_pmd_size)) {
 		printf("Failed to allocate huge page\n");
 		exit(EXIT_FAILURE);
 	}
@@ -869,7 +875,7 @@ static void collapse_compound_extreme(struct collapse_context *c, struct mem_ops
 
 	ops->cleanup_area(BASE_ADDR, hpage_pmd_size);
 	ops->fault(p, 0, hpage_pmd_size);
-	if (!ops->check_huge(p, 1))
+	if (!ops->check_huge(p, hpage_pmd_size, 1, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -890,7 +896,7 @@ static void collapse_fork(struct collapse_context *c, struct mem_ops *ops)
 
 	printf("Allocate small page...");
 	ops->fault(p, 0, page_size);
-	if (ops->check_huge(p, 0))
+	if (ops->check_huge(p, hpage_pmd_size, 0, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -901,7 +907,7 @@ static void collapse_fork(struct collapse_context *c, struct mem_ops *ops)
 		skip_settings_restore = true;
 		exit_status = 0;
 
-		if (ops->check_huge(p, 0))
+		if (ops->check_huge(p, hpage_pmd_size, 0, hpage_pmd_size))
 			success("OK");
 		else
 			fail("Fail");
@@ -919,7 +925,7 @@ static void collapse_fork(struct collapse_context *c, struct mem_ops *ops)
 	exit_status += WEXITSTATUS(wstatus);
 
 	printf("Check if parent still has small page...");
-	if (ops->check_huge(p, 0))
+	if (ops->check_huge(p, hpage_pmd_size, 0, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -939,7 +945,7 @@ static void collapse_fork_compound(struct collapse_context *c, struct mem_ops *o
 		skip_settings_restore = true;
 		exit_status = 0;
 
-		if (ops->check_huge(p, 1))
+		if (ops->check_huge(p, hpage_pmd_size, 1, hpage_pmd_size))
 			success("OK");
 		else
 			fail("Fail");
@@ -947,7 +953,7 @@ static void collapse_fork_compound(struct collapse_context *c, struct mem_ops *o
 		printf("Split huge page PMD in child process...");
 		madvise(p, page_size, MADV_NOHUGEPAGE);
 		madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
-		if (ops->check_huge(p, 0))
+		if (ops->check_huge(p, hpage_pmd_size, 0, hpage_pmd_size))
 			success("OK");
 		else
 			fail("Fail");
@@ -968,7 +974,7 @@ static void collapse_fork_compound(struct collapse_context *c, struct mem_ops *o
 	exit_status += WEXITSTATUS(wstatus);
 
 	printf("Check if parent still has huge page...");
-	if (ops->check_huge(p, 1))
+	if (ops->check_huge(p, hpage_pmd_size, 1, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
@@ -989,7 +995,7 @@ static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops
 		skip_settings_restore = true;
 		exit_status = 0;
 
-		if (ops->check_huge(p, 1))
+		if (ops->check_huge(p, hpage_pmd_size, 1, hpage_pmd_size))
 			success("OK");
 		else
 			fail("Fail");
@@ -997,7 +1003,7 @@ static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops
 		printf("Trigger CoW on page %d of %d...",
 		       hpage_pmd_nr - max_ptes_shared - 1, hpage_pmd_nr);
 		ops->fault(p, 0, (hpage_pmd_nr - max_ptes_shared - 1) * page_size);
-		if (ops->check_huge(p, 0))
+		if (ops->check_huge(p, hpage_pmd_size, 0, hpage_pmd_size))
 			success("OK");
 		else
 			fail("Fail");
@@ -1010,7 +1016,7 @@ static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops
 		       hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
 		ops->fault(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size);
-		if (ops->check_huge(p, 0))
+		if (ops->check_huge(p, hpage_pmd_size, 0, hpage_pmd_size))
 			success("OK");
 		else
 			fail("Fail");
@@ -1028,7 +1034,7 @@ static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops
 	exit_status += WEXITSTATUS(wstatus);
 
 	printf("Check if parent still has huge page...");
-	if (ops->check_huge(p, 1))
+	if (ops->check_huge(p, hpage_pmd_size, 1, hpage_pmd_size))
 		success("OK");
 	else
 		fail("Fail");
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 80eb1f91261e..cbf190598988 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -311,7 +311,7 @@ static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, int nr_hp
 	unsigned long rss_anon_before, rss_anon_after;
 	size_t i;
 
-	if (!check_huge_anon(one_page, nr_hpages, pmd_pagesize))
+	if (!check_huge_anon(one_page, nr_hpages * pmd_pagesize, nr_hpages, pmd_pagesize))
 		ksft_exit_fail_msg("No THP is allocated\n");
 
 	rss_anon_before = rss_anon();
@@ -326,7 +326,7 @@ static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, int nr_hp
 		if (one_page[i] != (char)0)
 			ksft_exit_fail_msg("%ld byte corrupted\n", i);
 
-	if (!check_huge_anon(one_page, 0, pmd_pagesize))
+	if (!check_huge_anon(one_page, nr_hpages * pmd_pagesize, 0, pmd_pagesize))
 		ksft_exit_fail_msg("Still AnonHugePages not split\n");
 
 	rss_anon_after = rss_anon();
@@ -362,7 +362,7 @@ static void split_pmd_thp_to_order(int order)
 	for (i = 0; i < len; i++)
 		one_page[i] = (char)i;
 
-	if (!check_huge_anon(one_page, 4, pmd_pagesize))
+	if (!check_huge_anon(one_page, 4 * pmd_pagesize, 4, pmd_pagesize))
 		ksft_exit_fail_msg("No THP is allocated\n");
 
 	/* split all THPs */
@@ -381,7 +381,7 @@ static void split_pmd_thp_to_order(int order)
 			       (pmd_order + 1)))
 		ksft_exit_fail_msg("Unexpected THP split\n");
 
-	if (!check_huge_anon(one_page, 0, pmd_pagesize))
+	if (!check_huge_anon(one_page, 4 * pmd_pagesize, 0, pmd_pagesize))
 		ksft_exit_fail_msg("Still AnonHugePages not split\n");
 
 	ksft_test_result_pass("Split huge pages to order %d successful\n", order);
@@ -405,7 +405,7 @@ static void split_pte_mapped_thp(void)
 	for (i = 0; i < len; i++)
 		one_page[i] = (char)i;
 
-	if (!check_huge_anon(one_page, 4, pmd_pagesize))
+	if (!check_huge_anon(one_page, 4 * pmd_pagesize, 4, pmd_pagesize))
 		ksft_exit_fail_msg("No THP is allocated\n");
 
 	/* remap the first pagesize of first THP */
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index f4e9a5f43e24..b6cfcc6950e1 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -191,7 +191,9 @@ static void shmem_alias_mapping(uffd_global_test_opts_t *gopts, __u64 *start,
 static void shmem_check_pmd_mapping(uffd_global_test_opts_t *gopts, void __unused *p,
 				    int expect_nr_hpages)
 {
-	if (!check_huge_shmem(gopts->area_dst_alias, expect_nr_hpages,
+	unsigned long size = expect_nr_hpages * read_pmd_pagesize();
+
+	if (!check_huge_shmem(gopts->area_dst_alias, size, expect_nr_hpages,
 			      read_pmd_pagesize()))
 		err("Did not find expected %d number of hugepages",
 		    expect_nr_hpages);
diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 56e9bd541edd..6058d80c63ef 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -248,7 +248,7 @@ bool __check_huge(void *addr, char *pattern, int nr_hpages,
 	return thp == (nr_hpages * (hpage_size >> 10));
 }
 
-bool check_huge_anon(void *addr, int nr_hpages, uint64_t hpage_size)
+bool check_huge_anon(void *addr, unsigned long size, int nr_hpages, uint64_t hpage_size)
 {
 	return __check_huge(addr, "AnonHugePages: ", nr_hpages, hpage_size);
 }
@@ -258,7 +258,7 @@ bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size)
 	return __check_huge(addr, "FilePmdMapped:", nr_hpages, hpage_size);
 }
 
-bool check_huge_shmem(void *addr, int nr_hpages, uint64_t hpage_size)
+bool check_huge_shmem(void *addr, unsigned long size, int nr_hpages, uint64_t hpage_size)
 {
 	return __check_huge(addr, "ShmemPmdMapped:", nr_hpages, hpage_size);
 }
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 07c4acfd84b6..a1cd446e5140 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -82,9 +82,9 @@ void clear_softdirty(void);
 bool check_for_pattern(FILE *fp, const char *pattern, char *buf, size_t len);
 uint64_t read_pmd_pagesize(void);
 unsigned long rss_anon(void);
-bool check_huge_anon(void *addr, int nr_hpages, uint64_t hpage_size);
+bool check_huge_anon(void *addr, unsigned long size, int nr_hpages, uint64_t hpage_size);
 bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size);
-bool check_huge_shmem(void *addr, int nr_hpages, uint64_t hpage_size);
+bool check_huge_shmem(void *addr, unsigned long size, int nr_hpages, uint64_t hpage_size);
 int64_t allocate_transhuge(void *ptr, int pagemap_fd);
 unsigned long default_huge_page_size(void);
 int detect_hugetlb_page_sizes(size_t sizes[], int max);
-- 
2.43.5

From nobody Sat Oct 4 04:56:50 2025
From: Baolin Wang
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 09/11] selftests: mm: move gather_after_split_folio_orders() into vm_util.c file
Date: Wed, 20 Aug 2025 17:07:20 +0800
Message-ID: <955e0b9682b1746c528a043f0ca530b54ee22536.1755677674.git.baolin.wang@linux.alibaba.com>

Move gather_after_split_folio_orders() to vm_util.c as a helper function
in preparation for implementing checks for mTHP collapse. While we are at
it, rename this function to indicate that it is not only used for large
folio splits. No functional changes.
Signed-off-by: Baolin Wang
---
 .../selftests/mm/split_huge_page_test.c       | 125 +-----------------
 tools/testing/selftests/mm/vm_util.c          | 123 +++++++++++++++++
 tools/testing/selftests/mm/vm_util.h          |   2 +
 3 files changed, 126 insertions(+), 124 deletions(-)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbf190598988..77cf510f18e0 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -104,129 +104,6 @@ static bool is_backed_by_folio(char *vaddr, int order, int pagemap_fd,
 	return false;
 }
 
-static int vaddr_pageflags_get(char *vaddr, int pagemap_fd, int kpageflags_fd,
-			       uint64_t *flags)
-{
-	unsigned long pfn;
-
-	pfn = pagemap_get_pfn(pagemap_fd, vaddr);
-
-	/* non-present PFN */
-	if (pfn == -1UL)
-		return 1;
-
-	if (pageflags_get(pfn, kpageflags_fd, flags))
-		return -1;
-
-	return 0;
-}
-
-/*
- * gather_after_split_folio_orders - scan through [vaddr_start, len) and record
- * folio orders
- *
- * @vaddr_start: start vaddr
- * @len: range length
- * @pagemap_fd: file descriptor to /proc/<pid>/pagemap
- * @kpageflags_fd: file descriptor to /proc/kpageflags
- * @orders: output folio order array
- * @nr_orders: folio order array size
- *
- * gather_after_split_folio_orders() scan through [vaddr_start, len) and check
- * all folios within the range and record their orders. All order-0 pages will
- * be recorded. Non-present vaddr is skipped.
- *
- * NOTE: the function is used to check folio orders after a split is performed,
- * so it assumes [vaddr_start, len) fully maps to after-split folios within that
- * range.
- *
- * Return: 0 - no error, -1 - unhandled cases
- */
-static int gather_after_split_folio_orders(char *vaddr_start, size_t len,
-		int pagemap_fd, int kpageflags_fd, int orders[], int nr_orders)
-{
-	uint64_t page_flags = 0;
-	int cur_order = -1;
-	char *vaddr;
-
-	if (pagemap_fd == -1 || kpageflags_fd == -1)
-		return -1;
-	if (!orders)
-		return -1;
-	if (nr_orders <= 0)
-		return -1;
-
-	for (vaddr = vaddr_start; vaddr < vaddr_start + len;) {
-		char *next_folio_vaddr;
-		int status;
-
-		status = vaddr_pageflags_get(vaddr, pagemap_fd, kpageflags_fd,
-					     &page_flags);
-		if (status < 0)
-			return -1;
-
-		/* skip non present vaddr */
-		if (status == 1) {
-			vaddr += psize();
-			continue;
-		}
-
-		/* all order-0 pages with possible false postive (non folio) */
-		if (!(page_flags & (KPF_COMPOUND_HEAD | KPF_COMPOUND_TAIL))) {
-			orders[0]++;
-			vaddr += psize();
-			continue;
-		}
-
-		/* skip non thp compound pages */
-		if (!(page_flags & KPF_THP)) {
-			vaddr += psize();
-			continue;
-		}
-
-		/* vpn points to part of a THP at this point */
-		if (page_flags & KPF_COMPOUND_HEAD)
-			cur_order = 1;
-		else {
-			vaddr += psize();
-			continue;
-		}
-
-		next_folio_vaddr = vaddr + (1UL << (cur_order + pshift()));
-
-		if (next_folio_vaddr >= vaddr_start + len)
-			break;
-
-		while ((status = vaddr_pageflags_get(next_folio_vaddr,
-						     pagemap_fd, kpageflags_fd,
-						     &page_flags)) >= 0) {
-			/*
-			 * non present vaddr, next compound head page, or
-			 * order-0 page
-			 */
-			if (status == 1 ||
-			    (page_flags & KPF_COMPOUND_HEAD) ||
-			    !(page_flags & (KPF_COMPOUND_HEAD | KPF_COMPOUND_TAIL))) {
-				if (cur_order < nr_orders) {
-					orders[cur_order]++;
-					cur_order = -1;
-					vaddr = next_folio_vaddr;
-				}
-				break;
-			}
-
-			cur_order++;
-			next_folio_vaddr = vaddr + (1UL << (cur_order + pshift()));
-		}
-
-		if (status < 0)
-			return status;
-	}
-	if (cur_order > 0 && cur_order < nr_orders)
-		orders[cur_order]++;
-	return 0;
-}
-
 static int check_after_split_folio_orders(char *vaddr_start, size_t len,
 		int pagemap_fd, int kpageflags_fd, int orders[], int nr_orders)
 {
@@ -240,7 +117,7 @@ static int check_after_split_folio_orders(char *vaddr_start, size_t len,
 		ksft_exit_fail_msg("Cannot allocate memory for vaddr_orders");
 
 	memset(vaddr_orders, 0, sizeof(int) * nr_orders);
-	status = gather_after_split_folio_orders(vaddr_start, len, pagemap_fd,
+	status = gather_folio_orders(vaddr_start, len, pagemap_fd,
 			kpageflags_fd, vaddr_orders, nr_orders);
 	if (status)
 		ksft_exit_fail_msg("gather folio info failed\n");
diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 6058d80c63ef..853c8a4caa1d 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -195,6 +195,129 @@ unsigned long rss_anon(void)
 	return rss_anon;
 }
 
+static int vaddr_pageflags_get(char *vaddr, int pagemap_fd, int kpageflags_fd,
+			       uint64_t *flags)
+{
+	unsigned long pfn;
+
+	pfn = pagemap_get_pfn(pagemap_fd, vaddr);
+
+	/* non-present PFN */
+	if (pfn == -1UL)
+		return 1;
+
+	if (pageflags_get(pfn, kpageflags_fd, flags))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * gather_folio_orders - scan through [vaddr_start, len) and record
+ * folio orders
+ *
+ * @vaddr_start: start vaddr
+ * @len: range length
+ * @pagemap_fd: file descriptor to /proc/<pid>/pagemap
+ * @kpageflags_fd: file descriptor to /proc/kpageflags
+ * @orders: output folio order array
+ * @nr_orders: folio order array size
+ *
+ * gather_folio_orders() scans through [vaddr_start, len), checks all
+ * folios within the range, and records their orders. All order-0 pages
+ * will be recorded. Non-present vaddrs are skipped.
+ *
+ * NOTE: when used to check folio orders after a split, the function
+ * assumes [vaddr_start, len) fully maps to after-split folios within
+ * that range.
+ *
+ * Return: 0 - no error, -1 - unhandled cases
+ */
+int gather_folio_orders(char *vaddr_start, size_t len,
+		int pagemap_fd, int kpageflags_fd, int orders[], int nr_orders)
+{
+	uint64_t page_flags = 0;
+	int cur_order = -1;
+	char *vaddr;
+
+	if (pagemap_fd == -1 || kpageflags_fd == -1)
+		return -1;
+	if (!orders)
+		return -1;
+	if (nr_orders <= 0)
+		return -1;
+
+	for (vaddr = vaddr_start; vaddr < vaddr_start + len;) {
+		char *next_folio_vaddr;
+		int status;
+
+		status = vaddr_pageflags_get(vaddr, pagemap_fd, kpageflags_fd,
+					     &page_flags);
+		if (status < 0)
+			return -1;
+
+		/* skip non present vaddr */
+		if (status == 1) {
+			vaddr += psize();
+			continue;
+		}
+
+		/* all order-0 pages with possible false positive (non folio) */
+		if (!(page_flags & (KPF_COMPOUND_HEAD | KPF_COMPOUND_TAIL))) {
+			orders[0]++;
+			vaddr += psize();
+			continue;
+		}
+
+		/* skip non thp compound pages */
+		if (!(page_flags & KPF_THP)) {
+			vaddr += psize();
+			continue;
+		}
+
+		/* vpn points to part of a THP at this point */
+		if (page_flags & KPF_COMPOUND_HEAD)
+			cur_order = 1;
+		else {
+			vaddr += psize();
+			continue;
+		}
+
+		next_folio_vaddr = vaddr + (1UL << (cur_order + pshift()));
+
+		if (next_folio_vaddr >= vaddr_start + len)
+			break;
+
+		while ((status = vaddr_pageflags_get(next_folio_vaddr,
+						     pagemap_fd, kpageflags_fd,
+						     &page_flags)) >= 0) {
+			/*
+			 * non present vaddr, next compound head page, or
+			 * order-0 page
+			 */
+			if (status == 1 ||
+			    (page_flags & KPF_COMPOUND_HEAD) ||
+			    !(page_flags & (KPF_COMPOUND_HEAD | KPF_COMPOUND_TAIL))) {
+				if (cur_order < nr_orders) {
+					orders[cur_order]++;
+					cur_order = -1;
+					vaddr = next_folio_vaddr;
+				}
+				break;
+			}
+
+			cur_order++;
+			next_folio_vaddr = vaddr + (1UL << (cur_order + pshift()));
+		}
+
+		if (status < 0)
+			return status;
+	}
+	if (cur_order > 0 && cur_order < nr_orders)
+		orders[cur_order]++;
+	return 0;
+}
+
 char *__get_smap_entry(void *addr, const char *pattern, char *buf, size_t
len) { int ret; diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests= /mm/vm_util.h index a1cd446e5140..197a9b69cbba 100644 --- a/tools/testing/selftests/mm/vm_util.h +++ b/tools/testing/selftests/mm/vm_util.h @@ -89,6 +89,8 @@ int64_t allocate_transhuge(void *ptr, int pagemap_fd); unsigned long default_huge_page_size(void); int detect_hugetlb_page_sizes(size_t sizes[], int max); int pageflags_get(unsigned long pfn, int kpageflags_fd, uint64_t *flags); +int gather_folio_orders(char *vaddr_start, size_t len, + int pagemap_fd, int kpageflags_fd, int orders[], int nr_orders); =20 int uffd_register(int uffd, void *addr, uint64_t len, bool miss, bool wp, bool minor); --=20 2.43.5 From nobody Sat Oct 4 04:56:50 2025 Received: from out30-113.freemail.mail.aliyun.com (out30-113.freemail.mail.aliyun.com [115.124.30.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C6CCE2E03FE for ; Wed, 20 Aug 2025 09:07:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755680875; cv=none; b=PokoFeG5U8iYc3oph9c7eXxrFI/+VRG+z55yjWNMT+6z65azDbmo0Yk67Wl6l/20xJVJ4ak74aeGfmvbNLEazFhZBJ/IRoQKRxBywlzb4k58nrn7sJClFRoUItzPsAb26Roy8SIF4M+nobA2qOOsAFANkFy18zlUsEguIWuprpU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755680875; c=relaxed/simple; bh=O/EL/GABzezzJnhdCns5+dqaFA0xQGxrWt+V6vBqbzA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=H53a63bD+1+syGEApE5OGBRQCsTKbcCzZqFw3JPhkH6CTP1d4G6PEA86aFnzT/sYRgkXreLuXhShCx2J8f6wJKiDjgj3+VUkR4EvCLInJRQV9akYncqscNdqcfV5xlIOSeAP4QceM3LAipBtvj4tZOUyUae+vxAZ7Nh5RMtWgPk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass 
(1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=Ue2WAyMS; arc=none smtp.client-ip=115.124.30.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="Ue2WAyMS" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1755680869; h=From:To:Subject:Date:Message-ID:MIME-Version; bh=k0BMZl4FO1A/3YqzLq6Cu8hUlXulJA4evFI5qXofcb4=; b=Ue2WAyMSnWxBSuqN+JfaZ3fR+nRVSxhymHNuwfUPg3Y0xfupfZefzVJp3+u6fo6WWROGqs9/Gix4DR6PvTO73u0X1KB1lfl3jVcUDvyiKZPULh1r18dFJxT/S1u/SqiRNkIZnMWvW/QCkap+S++bgPzC0ykghd8ONFQ1FJ2Gk+s= Received: from localhost(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0WmBYU.i_1755680866 cluster:ay36) by smtp.aliyun-inc.com; Wed, 20 Aug 2025 17:07:47 +0800 From: Baolin Wang To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 10/11] selftests: mm: implement the mTHP hugepage check helper Date: Wed, 20 Aug 2025 17:07:21 +0800 Message-ID: <85ad632e5ac5844a4e8a6266bcd647932e4d0b11.1755677674.git.baolin.wang@linux.alibaba.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement the mTHP hugepage check helper. 
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 tools/testing/selftests/mm/vm_util.c | 52 +++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 853c8a4caa1d..d0f8aa66b988 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -16,6 +16,10 @@
 #define SMAP_FILE_PATH "/proc/self/smaps"
 #define STATUS_FILE_PATH "/proc/self/status"
 #define MAX_LINE_LENGTH 500
+#define PAGEMAP_PATH "/proc/self/pagemap"
+#define KPAGEFLAGS_PATH "/proc/kpageflags"
+#define GET_ORDER(nr_pages) (31 - __builtin_clz(nr_pages))
+#define NR_ORDERS 20
 
 unsigned int __page_size;
 unsigned int __page_shift;
@@ -353,7 +357,7 @@ char *__get_smap_entry(void *addr, const char *pattern, char *buf, size_t len)
 	return entry;
 }
 
-bool __check_huge(void *addr, char *pattern, int nr_hpages,
+static bool __check_pmd_huge(void *addr, char *pattern, int nr_hpages,
 		  uint64_t hpage_size)
 {
 	char buffer[MAX_LINE_LENGTH];
@@ -371,19 +375,59 @@ bool __check_huge(void *addr, char *pattern, int nr_hpages,
 	return thp == (nr_hpages * (hpage_size >> 10));
 }
 
+static bool check_large_folios(void *addr, unsigned long size, int nr_hpages, uint64_t hpage_size)
+{
+	int pagesize = getpagesize();
+	int order = GET_ORDER(hpage_size / pagesize);
+	int pagemap_fd, kpageflags_fd;
+	int orders[NR_ORDERS], status;
+	bool ret = false;
+
+	memset(orders, 0, sizeof(int) * NR_ORDERS);
+
+	pagemap_fd = open(PAGEMAP_PATH, O_RDONLY);
+	if (pagemap_fd == -1)
+		ksft_exit_fail_msg("read pagemap fail\n");
+
+	kpageflags_fd = open(KPAGEFLAGS_PATH, O_RDONLY);
+	if (kpageflags_fd == -1) {
+		close(pagemap_fd);
+		ksft_exit_fail_msg("read kpageflags fail\n");
+	}
+
+	status = gather_folio_orders(addr, size, pagemap_fd,
+				     kpageflags_fd, orders, NR_ORDERS);
+	if (status)
+		goto out;
+
+	if (orders[order] == nr_hpages)
+		ret = true;
+
+out:
+	close(pagemap_fd);
+	close(kpageflags_fd);
+	return ret;
+}
+
 bool check_huge_anon(void *addr, unsigned long size, int nr_hpages, uint64_t hpage_size)
 {
-	return __check_huge(addr, "AnonHugePages: ", nr_hpages, hpage_size);
+	if (hpage_size == read_pmd_pagesize())
+		return __check_pmd_huge(addr, "AnonHugePages: ", nr_hpages, hpage_size);
+
+	return check_large_folios(addr, size, nr_hpages, hpage_size);
 }
 
 bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size)
 {
-	return __check_huge(addr, "FilePmdMapped:", nr_hpages, hpage_size);
+	return __check_pmd_huge(addr, "FilePmdMapped:", nr_hpages, hpage_size);
 }
 
 bool check_huge_shmem(void *addr, unsigned long size, int nr_hpages, uint64_t hpage_size)
 {
-	return __check_huge(addr, "ShmemPmdMapped:", nr_hpages, hpage_size);
+	if (hpage_size == read_pmd_pagesize())
+		return __check_pmd_huge(addr, "ShmemPmdMapped:", nr_hpages, hpage_size);
+
+	return check_large_folios(addr, size, nr_hpages, hpage_size);
 }
 
 int64_t allocate_transhuge(void *ptr, int pagemap_fd)
-- 
2.43.5

From nobody Sat Oct 4 04:56:50 2025
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com,
	lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com,
	ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
	baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH 11/11] selftests: mm: add mTHP collapse test cases
Date: Wed, 20 Aug 2025 17:07:22 +0800
Add mTHP collapse test cases, driven by a new 'mthp_khugepaged' context and
a '-c' option that selects the collapse order.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 tools/testing/selftests/mm/khugepaged.c   | 102 +++++++++++++++++++---
 tools/testing/selftests/mm/run_vmtests.sh |   4 +
 2 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index e529074a1fdf..f7081e9e20ec 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -26,9 +26,11 @@
 
 #define BASE_ADDR ((void *)(1UL << 30))
 static unsigned long hpage_pmd_size;
+static int hpage_pmd_order;
 static unsigned long page_size;
 static int hpage_pmd_nr;
 static int anon_order;
+static int collapse_order;
 
 #define PID_SMAPS "/proc/self/smaps"
 #define TEST_FILE "collapse_test_file"
@@ -61,6 +63,7 @@ struct collapse_context {
 };
 
 static struct collapse_context *khugepaged_context;
+static struct collapse_context *mthp_khugepaged_context;
 static struct collapse_context *madvise_context;
 
 struct file_info {
@@ -538,26 +541,27 @@ static void madvise_collapse(const char *msg, char *p, int nr_hpages,
 
 #define TICK 500000
 static bool wait_for_scan(const char *msg, char *p, int nr_hpages,
-			  struct mem_ops *ops)
+			  int collap_order, struct mem_ops *ops)
 {
-	unsigned long size = nr_hpages * hpage_pmd_size;
+	unsigned long hpage_size = page_size << collap_order;
+	unsigned long size = nr_hpages * hpage_size;
 	int full_scans;
 	int timeout = 6; /* 3 seconds */
 
 	/* Sanity check */
-	if (!ops->check_huge(p, size, 0, hpage_pmd_size)) {
+	if (!ops->check_huge(p, size, 0, hpage_size)) {
 		printf("Unexpected huge page\n");
 		exit(EXIT_FAILURE);
 	}
 
-	madvise(p, nr_hpages * hpage_pmd_size, MADV_HUGEPAGE);
+	madvise(p, size, MADV_HUGEPAGE);
 
 	/* Wait until the second full_scan completed */
 	full_scans = thp_read_num("khugepaged/full_scans") + 2;
 
 	printf("%s...", msg);
 	while (timeout--) {
-		if (ops->check_huge(p, size, nr_hpages, hpage_pmd_size))
+		if (ops->check_huge(p, size, nr_hpages, hpage_size))
 			break;
 		if (thp_read_num("khugepaged/full_scans") >= full_scans)
 			break;
@@ -573,7 +577,7 @@ static void khugepaged_collapse(const char *msg, char *p, int nr_hpages,
 {
 	unsigned long size = nr_hpages * hpage_pmd_size;
 
-	if (wait_for_scan(msg, p, nr_hpages, ops)) {
+	if (wait_for_scan(msg, p, nr_hpages, hpage_pmd_order, ops)) {
 		if (expect)
 			fail("Timeout");
 		else
@@ -595,12 +599,66 @@ static void khugepaged_collapse(const char *msg, char *p, int nr_hpages,
 		fail("Fail");
 }
 
+static void mthp_khugepaged_collapse(const char *msg, char *p, int nr_hpages,
+				     struct mem_ops *ops, bool expect)
+{
+	unsigned long hpage_size = page_size << collapse_order;
+	unsigned long size = nr_hpages * hpage_pmd_size;
+	struct thp_settings settings = *thp_current_settings();
+
+	nr_hpages = size / hpage_size;
+
+	/* Set mTHP setting for mTHP collapse */
+	if (ops == &__anon_ops) {
+		settings.thp_enabled = THP_NEVER;
+		settings.hugepages[collapse_order].enabled = THP_ALWAYS;
+	} else if (ops == &__shmem_ops) {
+		settings.shmem_enabled = SHMEM_NEVER;
+		settings.shmem_hugepages[collapse_order].enabled = SHMEM_ALWAYS;
+	}
+
+	thp_push_settings(&settings);
+
+	if (wait_for_scan(msg, p, nr_hpages, collapse_order, ops)) {
+		if (expect)
+			fail("Timeout");
+		else
+			success("OK");
+
+		/* Restore THP settings for mTHP collapse. */
+		thp_pop_settings();
+		return;
+	}
+
+	/*
+	 * For file and shmem memory, khugepaged only retracts pte entries after
+	 * putting the new hugepage in the page cache. The hugepage must be
+	 * subsequently refaulted to install the pmd mapping for the mm.
+	 */
+	if (ops != &__anon_ops)
+		ops->fault(p, 0, size);
+
+	if (ops->check_huge(p, size, expect ? (size / hpage_size) : 0, hpage_size))
+		success("OK");
+	else
+		fail("Fail");
+
+	/* Restore THP settings for mTHP collapse. */
+	thp_pop_settings();
+}
+
 static struct collapse_context __khugepaged_context = {
 	.collapse = &khugepaged_collapse,
 	.enforce_pte_scan_limits = true,
 	.name = "khugepaged",
 };
 
+static struct collapse_context __mthp_khugepaged_context = {
+	.collapse = &mthp_khugepaged_collapse,
+	.enforce_pte_scan_limits = true,
+	.name = "mthp_khugepaged",
+};
+
 static struct collapse_context __madvise_context = {
 	.collapse = &madvise_collapse,
 	.enforce_pte_scan_limits = false,
@@ -650,6 +708,12 @@ static void collapse_full(struct collapse_context *c, struct mem_ops *ops)
 	int nr_hpages = 4;
 	unsigned long size = nr_hpages * hpage_pmd_size;
 
+	/* Only try 1 PMD sized range for mTHP collapse. */
+	if (c == &__mthp_khugepaged_context) {
+		nr_hpages = 1;
+		size = hpage_pmd_size;
+	}
+
 	p = ops->setup_area(nr_hpages);
 	ops->fault(p, 0, size);
 	c->collapse("Collapse multiple fully populated PTE table", p, nr_hpages,
@@ -1074,7 +1138,7 @@ static void madvise_retracted_page_tables(struct collapse_context *c,
 
 	/* Let khugepaged collapse and leave pmd cleared */
 	if (wait_for_scan("Collapse and leave PMD cleared", p, nr_hpages,
-			  ops)) {
+			  hpage_pmd_order, ops)) {
 		fail("Timeout");
 		return;
 	}
@@ -1089,7 +1153,7 @@ static void usage(void)
 {
 	fprintf(stderr, "\nUsage: ./khugepaged [OPTIONS] <test type> [dir]\n\n");
 	fprintf(stderr, "\t<test type>\t: <context>:<mem_type>\n");
-	fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n");
+	fprintf(stderr, "\t<context>\t: [all|khugepaged|mthp_khugepaged|madvise]\n");
 	fprintf(stderr, "\t<mem_type>\t: [all|anon|file|shmem]\n");
 	fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n");
 	fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n");
@@ -1100,6 +1164,7 @@ static void usage(void)
 	fprintf(stderr, "\t\t-h: This help message.\n");
 	fprintf(stderr, "\t\t-s: mTHP size, expressed as page order.\n");
 	fprintf(stderr, "\t\t    Defaults to 0. Use this size for anon or shmem allocations.\n");
+	fprintf(stderr, "\t\t-c: collapse order for mTHP collapse, expressed as page order.\n");
 	exit(1);
 }
 
@@ -1109,11 +1174,14 @@ static void parse_test_type(int argc, char **argv)
 	char *buf;
 	const char *token;
 
-	while ((opt = getopt(argc, argv, "s:h")) != -1) {
+	while ((opt = getopt(argc, argv, "s:c:h")) != -1) {
 		switch (opt) {
 		case 's':
 			anon_order = atoi(optarg);
 			break;
+		case 'c':
+			collapse_order = atoi(optarg);
+			break;
 		case 'h':
 		default:
 			usage();
@@ -1139,6 +1207,10 @@ static void parse_test_type(int argc, char **argv)
 		madvise_context = &__madvise_context;
 	} else if (!strcmp(token, "khugepaged")) {
 		khugepaged_context = &__khugepaged_context;
+	} else if (!strcmp(token, "mthp_khugepaged")) {
+		mthp_khugepaged_context = &__mthp_khugepaged_context;
+		if (collapse_order == 0 || collapse_order >= hpage_pmd_order)
+			usage();
 	} else if (!strcmp(token, "madvise")) {
 		madvise_context = &__madvise_context;
 	} else {
@@ -1173,7 +1245,6 @@ static void parse_test_type(int argc, char **argv)
 
 int main(int argc, char **argv)
 {
-	int hpage_pmd_order;
 	struct thp_settings default_settings = {
 		.thp_enabled = THP_MADVISE,
 		.thp_defrag = THP_DEFRAG_ALWAYS,
@@ -1199,10 +1270,6 @@ int main(int argc, char **argv)
 		return KSFT_SKIP;
 	}
 
-	parse_test_type(argc, argv);
-
-	setbuf(stdout, NULL);
-
 	page_size = getpagesize();
 	hpage_pmd_size = read_pmd_pagesize();
 	if (!hpage_pmd_size) {
@@ -1212,6 +1279,10 @@ int main(int argc, char **argv)
 	hpage_pmd_nr = hpage_pmd_size / page_size;
 	hpage_pmd_order = __builtin_ctz(hpage_pmd_nr);
 
+	parse_test_type(argc, argv);
+
+	setbuf(stdout, NULL);
+
 	default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1;
 	default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8;
 	default_settings.khugepaged.max_ptes_shared = hpage_pmd_nr / 2;
@@ -1236,11 +1307,14 @@ int main(int argc, char **argv)
 	TEST(collapse_full, khugepaged_context, anon_ops);
 	TEST(collapse_full, khugepaged_context, file_ops);
 	TEST(collapse_full, khugepaged_context, shmem_ops);
+	TEST(collapse_full, mthp_khugepaged_context, anon_ops);
+	TEST(collapse_full, mthp_khugepaged_context, shmem_ops);
 	TEST(collapse_full, madvise_context, anon_ops);
 	TEST(collapse_full, madvise_context, file_ops);
 	TEST(collapse_full, madvise_context, shmem_ops);
 
 	TEST(collapse_empty, khugepaged_context, anon_ops);
+	TEST(collapse_empty, mthp_khugepaged_context, anon_ops);
 	TEST(collapse_empty, madvise_context, anon_ops);
 
 	TEST(collapse_single_pte_entry, khugepaged_context, anon_ops);
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 75b94fdc915f..12d2a4f28ab5 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -496,6 +496,10 @@ CATEGORY="thp" run_test ./khugepaged all:shmem
 
 CATEGORY="thp" run_test ./khugepaged -s 4 all:shmem
 
+CATEGORY="thp" run_test ./khugepaged -c 4 mthp_khugepaged:anon
+
+CATEGORY="thp" run_test ./khugepaged -c 4 mthp_khugepaged:shmem
+
 CATEGORY="thp" run_test ./transhuge-stress -d 20
 
 # Try to create XFS if not provided
-- 
2.43.5