From nobody Wed Oct 1 22:26:30 2025
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
	gutierrez.asier@huawei-partners.com, willy@infradead.org,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net,
	21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org,
	lance.yang@linux.dev, rdunlap@infradead.org
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Yafang Shao, Yang Shi
Subject: [PATCH v9 mm-new 01/11] mm: thp: remove vm_flags parameter from khugepaged_enter_vma()
Date: Tue, 30 Sep 2025 13:58:16 +0800
Message-Id: <20250930055826.9810-2-laoar.shao@gmail.com>
In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com>
References: <20250930055826.9810-1-laoar.shao@gmail.com>

The khugepaged_enter_vma() function requires handling in two specific
scenarios:
1. New VMA creation
  When a new VMA is created (for an anonymous VMA, this is deferred to
  page fault), if vma->vm_mm is not present in khugepaged_mm_slot, it must
  be added. In this case, khugepaged_enter_vma() is called after
  vma->vm_flags have been set, allowing direct use of the VMA's flags.
2. VMA flag modification
  When vma->vm_flags are modified (particularly when VM_HUGEPAGE is set),
  the system must recheck whether to add vma->vm_mm to khugepaged_mm_slot.
  Currently, khugepaged_enter_vma() is called before the flag update, so
  the call must be relocated to occur after vma->vm_flags have been set.

In the VMA merging path, khugepaged_enter_vma() is also called. For this
case, since VMA merging only occurs when the vm_flags of both VMAs are
identical (excluding special flags like VM_SOFTDIRTY), we can safely use
target->vm_flags instead. (It is worth noting that khugepaged_enter_vma()
can be removed from the VMA merging path because the VMA has already been
added in the two aforementioned cases. We will address this cleanup in a
separate patch.)

After this change, we can further remove the vm_flags parameter from
thp_vma_allowable_order(). That will be handled in a follow-up patch.
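
The ordering requirement in scenario 2 can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
model_mm, struct model_vma, MODEL_VM_HUGEPAGE and the model_*() helpers
are hypothetical stand-ins invented for illustration, and eligibility is
reduced to a single flag test, which is far simpler than what
thp_vma_allowable_order() checks in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
#define MODEL_VM_HUGEPAGE 0x1UL

struct model_mm {
	bool registered;	/* models presence in khugepaged_mm_slot */
};

struct model_vma {
	unsigned long vm_flags;
	struct model_mm *vm_mm;
};

/* Models khugepaged_enter_mm(): register the mm at most once. */
static void model_enter_mm(struct model_mm *mm)
{
	assert(mm);
	if (mm->registered)
		return;
	mm->registered = true;
}

/* Models khugepaged_enter_vma(): eligibility is read from the VMA itself. */
static void model_enter_vma(struct model_vma *vma)
{
	if (!(vma->vm_flags & MODEL_VM_HUGEPAGE))
		return;
	model_enter_mm(vma->vm_mm);
}

/* Models the madvise path: update the flags first, then register the mm. */
static int model_madvise_hugepage(struct model_vma *vma)
{
	unsigned long new_flags = vma->vm_flags | MODEL_VM_HUGEPAGE;

	vma->vm_flags = new_flags;
	if (new_flags & MODEL_VM_HUGEPAGE)
		model_enter_mm(vma->vm_mm);
	return 0;
}
```

In this model, calling model_enter_vma() before the flag is set leaves the
mm unregistered, while model_madvise_hugepage() registers it because the
check runs only after the flags have been updated.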

Signed-off-by: Yafang Shao
Cc: Yang Shi
Cc: Usama Arif
---
 include/linux/khugepaged.h | 10 ++++++----
 mm/huge_memory.c           |  2 +-
 mm/khugepaged.c            | 27 ++++++++++++++-------------
 mm/madvise.c               |  7 +++++++
 mm/vma.c                   |  6 +++---
 5 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index eb1946a70cff..b30814d3d665 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -13,8 +13,8 @@ extern void khugepaged_destroy(void);
 extern int start_stop_khugepaged(void);
 extern void __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
-extern void khugepaged_enter_vma(struct vm_area_struct *vma,
-				 vm_flags_t vm_flags);
+extern void khugepaged_enter_vma(struct vm_area_struct *vma);
+extern void khugepaged_enter_mm(struct mm_struct *mm);
 extern void khugepaged_min_free_kbytes_update(void);
 extern bool current_is_khugepaged(void);
 extern int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
@@ -38,8 +38,10 @@ static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm
 static inline void khugepaged_exit(struct mm_struct *mm)
 {
 }
-static inline void khugepaged_enter_vma(struct vm_area_struct *vma,
-					vm_flags_t vm_flags)
+static inline void khugepaged_enter_vma(struct vm_area_struct *vma)
+{
+}
+static inline void khugepaged_enter_mm(struct mm_struct *mm)
 {
 }
 static inline int collapse_pte_mapped_thp(struct mm_struct *mm,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1b81680b4225..ac6601f30e65 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1346,7 +1346,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 	ret = vmf_anon_prepare(vmf);
 	if (ret)
 		return ret;
-	khugepaged_enter_vma(vma, vma->vm_flags);
+	khugepaged_enter_vma(vma);
 
 	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm) &&
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7ab2d1a42df3..5088eedafc35 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -353,12 +353,6 @@ int hugepage_madvise(struct vm_area_struct *vma,
 #endif
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
-		/*
-		 * If the vma become good for khugepaged to scan,
-		 * register it here without waiting a page fault that
-		 * may not happen any time soon.
-		 */
-		khugepaged_enter_vma(vma, *vm_flags);
 		break;
 	case MADV_NOHUGEPAGE:
 		*vm_flags &= ~VM_HUGEPAGE;
@@ -460,14 +454,21 @@ void __khugepaged_enter(struct mm_struct *mm)
 		wake_up_interruptible(&khugepaged_wait);
 }
 
-void khugepaged_enter_vma(struct vm_area_struct *vma,
-			  vm_flags_t vm_flags)
+void khugepaged_enter_mm(struct mm_struct *mm)
 {
-	if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) &&
-	    hugepage_pmd_enabled()) {
-		if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
-			__khugepaged_enter(vma->vm_mm);
-	}
+	if (mm_flags_test(MMF_VM_HUGEPAGE, mm))
+		return;
+	if (!hugepage_pmd_enabled())
+		return;
+
+	__khugepaged_enter(mm);
+}
+
+void khugepaged_enter_vma(struct vm_area_struct *vma)
+{
+	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
+		return;
+	khugepaged_enter_mm(vma->vm_mm);
 }
 
 void __khugepaged_exit(struct mm_struct *mm)
diff --git a/mm/madvise.c b/mm/madvise.c
index fb1c86e630b6..8de7c39305dd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1425,6 +1425,13 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 	VM_WARN_ON_ONCE(madv_behavior->lock_mode != MADVISE_MMAP_WRITE_LOCK);
 
 	error = madvise_update_vma(new_flags, madv_behavior);
+	/*
+	 * If the vma become good for khugepaged to scan,
+	 * register it here without waiting a page fault that
+	 * may not happen any time soon.
+	 */
+	if (!error && new_flags & VM_HUGEPAGE)
+		khugepaged_enter_mm(vma->vm_mm);
 out:
 	/*
 	 * madvise() returns EAGAIN if kernel resources, such as
diff --git a/mm/vma.c b/mm/vma.c
index a1ec405bda25..6a548b0d64cd 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -973,7 +973,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 	if (err || commit_merge(vmg))
 		goto abort;
 
-	khugepaged_enter_vma(vmg->target, vmg->vm_flags);
+	khugepaged_enter_vma(vmg->target);
 	vmg->state = VMA_MERGE_SUCCESS;
 	return vmg->target;
 
@@ -1093,7 +1093,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	 * following VMA if we have VMAs on both sides.
 	 */
 	if (vmg->target && !vma_expand(vmg)) {
-		khugepaged_enter_vma(vmg->target, vmg->vm_flags);
+		khugepaged_enter_vma(vmg->target);
 		vmg->state = VMA_MERGE_SUCCESS;
 		return vmg->target;
 	}
@@ -2520,7 +2520,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
 	 * call covers the non-merge case.
 	 */
 	if (!vma_is_anonymous(vma))
-		khugepaged_enter_vma(vma, map->vm_flags);
+		khugepaged_enter_vma(vma);
 	*vmap = vma;
 	return 0;
 
-- 
2.47.3

From nobody Wed Oct 1 22:26:30 2025
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
	gutierrez.asier@huawei-partners.com, willy@infradead.org,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net,
	21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org,
	lance.yang@linux.dev, rdunlap@infradead.org
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v9 mm-new 02/11] mm: thp: remove vm_flags parameter from thp_vma_allowable_order()
Date: Tue, 30 Sep 2025 13:58:17 +0800
Message-Id: <20250930055826.9810-3-laoar.shao@gmail.com>
In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com>
References: <20250930055826.9810-1-laoar.shao@gmail.com>

Because all calls to thp_vma_allowable_order() pass vma->vm_flags as the
vm_flags argument, we can remove the parameter and have the function
access vma->vm_flags directly.

Signed-off-by: Yafang Shao
Acked-by: Usama Arif
---
 fs/proc/task_mmu.c      |  3 +--
 include/linux/huge_mm.h | 16 ++++++++--------
 mm/huge_memory.c        |  4 ++--
 mm/khugepaged.c         | 10 +++++-----
 mm/memory.c             | 11 +++++------
 mm/shmem.c              |  2 +-
 6 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fc35a0543f01..e713d1905750 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1369,8 +1369,7 @@ static int show_smap(struct seq_file *m, void *v)
 	__show_smap(m, &mss, false);
 
 	seq_printf(m, "THPeligible: %8u\n",
-		   !!thp_vma_allowable_orders(vma, vma->vm_flags, TVA_SMAPS,
-					      THP_ORDERS_ALL));
+		   !!thp_vma_allowable_orders(vma, TVA_SMAPS, THP_ORDERS_ALL));
 
 	if (arch_pkeys_enabled())
 		seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f327d62fc985..a635dcbb2b99 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -101,8 +101,8 @@ enum tva_type {
 	TVA_FORCED_COLLAPSE, /* Forced collapse (e.g. MADV_COLLAPSE). */
 };
 
-#define thp_vma_allowable_order(vma, vm_flags, type, order) \
-	(!!thp_vma_allowable_orders(vma, vm_flags, type, BIT(order)))
+#define thp_vma_allowable_order(vma, type, order) \
+	(!!thp_vma_allowable_orders(vma, type, BIT(order)))
 
 #define split_folio(f) split_folio_to_list(f, NULL)
 
@@ -266,14 +266,12 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
 }
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
-					 vm_flags_t vm_flags,
 					 enum tva_type type,
 					 unsigned long orders);
 
 /**
  * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
  * @vma: the vm area to check
- * @vm_flags: use these vm_flags instead of vma->vm_flags
  * @type: TVA type
  * @orders: bitfield of all orders to consider
  *
@@ -287,10 +285,11 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
  */
 static inline
 unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
-				       vm_flags_t vm_flags,
 				       enum tva_type type,
 				       unsigned long orders)
 {
+	vm_flags_t vm_flags = vma->vm_flags;
+
 	/*
 	 * Optimization to check if required orders are enabled early. Only
 	 * forced collapse ignores sysfs configs.
@@ -309,7 +308,7 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 			return 0;
 	}
 
-	return __thp_vma_allowable_orders(vma, vm_flags, type, orders);
+	return __thp_vma_allowable_orders(vma, type, orders);
 }
 
 struct thpsize {
@@ -329,8 +328,10 @@ struct thpsize {
  * through madvise or prctl.
  */
 static inline bool vma_thp_disabled(struct vm_area_struct *vma,
-		vm_flags_t vm_flags, bool forced_collapse)
+				    bool forced_collapse)
 {
+	vm_flags_t vm_flags = vma->vm_flags;
+
 	/* Are THPs disabled for this VMA? */
 	if (vm_flags & VM_NOHUGEPAGE)
 		return true;
@@ -560,7 +561,6 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
 }
 
 static inline unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
-					vm_flags_t vm_flags,
 					enum tva_type type,
 					unsigned long orders)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ac6601f30e65..1ac476fe6dc5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -98,7 +98,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 }
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
-					 vm_flags_t vm_flags,
 					 enum tva_type type,
 					 unsigned long orders)
 {
@@ -106,6 +105,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	const bool in_pf = type == TVA_PAGEFAULT;
 	const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
 	unsigned long supported_orders;
+	vm_flags_t vm_flags = vma->vm_flags;
 
 	/* Check the intersection of requested and supported orders. */
 	if (vma_is_anonymous(vma))
@@ -122,7 +122,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	if (!vma->vm_mm) /* vdso */
 		return 0;
 
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags, forced_collapse))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, forced_collapse))
 		return 0;
 
 	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5088eedafc35..b60f1856714a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -466,7 +466,7 @@ void khugepaged_enter_mm(struct mm_struct *mm)
 
 void khugepaged_enter_vma(struct vm_area_struct *vma)
 {
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, TVA_KHUGEPAGED, PMD_ORDER))
 		return;
 	khugepaged_enter_mm(vma->vm_mm);
 }
@@ -917,7 +917,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 
 	if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
 		return SCAN_ADDRESS_RANGE;
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, type, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, type, PMD_ORDER))
 		return SCAN_VMA_CHECK;
 	/*
 	 * Anon VMA expected, the address may be unmapped then
@@ -1531,7 +1531,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	 * and map it by a PMD, regardless of sysfs THP settings. As such, let's
 	 * analogously elide sysfs THP settings here and force collapse.
 	 */
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, TVA_FORCED_COLLAPSE, PMD_ORDER))
 		return SCAN_VMA_CHECK;
 
 	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
@@ -2426,7 +2426,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 			progress++;
 			break;
 		}
-		if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) {
+		if (!thp_vma_allowable_order(vma, TVA_KHUGEPAGED, PMD_ORDER)) {
 skip:
 			progress++;
 			continue;
@@ -2757,7 +2757,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
 
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, TVA_FORCED_COLLAPSE, PMD_ORDER))
 		return -EINVAL;
 
 	cc = kmalloc(sizeof(*cc), GFP_KERNEL);
diff --git a/mm/memory.c b/mm/memory.c
index 7e32eb79ba99..cd04e4894725 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4558,7 +4558,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
 	 * and suitable for swapping THP.
 	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags, TVA_PAGEFAULT,
+	orders = thp_vma_allowable_orders(vma, TVA_PAGEFAULT,
 					  BIT(PMD_ORDER) - 1);
 	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
 	orders = thp_swap_suitable_orders(swp_offset(entry),
@@ -5107,7 +5107,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	 * for this vma. Then filter out the orders that can't be allocated over
 	 * the faulting address and still be fully contained in the vma.
 	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags, TVA_PAGEFAULT,
+	orders = thp_vma_allowable_orders(vma, TVA_PAGEFAULT,
 					  BIT(PMD_ORDER) - 1);
 	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
 
@@ -5379,7 +5379,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio, struct page *pa
 	 * PMD mappings if THPs are disabled. As we already have a THP,
 	 * behave as if we are forcing a collapse.
 	 */
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags,
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma,
 						     /* forced_collapse=*/ true))
 		return ret;
 
@@ -6280,7 +6280,6 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		.gfp_mask = __get_fault_gfp_mask(vma),
 	};
 	struct mm_struct *mm = vma->vm_mm;
-	vm_flags_t vm_flags = vma->vm_flags;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	vm_fault_t ret;
@@ -6295,7 +6294,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		return VM_FAULT_OOM;
 retry_pud:
 	if (pud_none(*vmf.pud) &&
-	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PUD_ORDER)) {
+	    thp_vma_allowable_order(vma, TVA_PAGEFAULT, PUD_ORDER)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
@@ -6329,7 +6328,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 			goto retry_pud;
 
 	if (pmd_none(*vmf.pmd) &&
-	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
+	    thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index 4855eee22731..cc2c90656b66 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1780,7 +1780,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	vm_flags_t vm_flags = vma ? vma->vm_flags : 0;
 	unsigned int global_orders;
 
-	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags, shmem_huge_force)))
+	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, shmem_huge_force)))
 		return 0;
 
 	global_orders = shmem_huge_global_enabled(inode, index, write_end,
-- 
2.47.3

From nobody Wed Oct 1 22:26:30 2025
ODJQWzl+Tx4TssFLVBsbMsWzq3dUBXEafX+4QScQnJNnAYKxVIcMqmhH0rujY5VyepH7L5QsJaf 8fc2DAE5zDg+0liS5Wm1gmFK6GG805WOEhqwiISAQt7KJWOgFsg6jMtXc1MycRcxzh843c1rxi0 5woJemeRp4nBRnzEZdU42Fm9RngbxY/z3AiL42O0AiQtZVQpnt3Waub7WHOeN+aTNsioP9ql4Ta /dwx5JJS0UbBfmDT0aCfe6bAFmGK/NvJpkFRqvM9XR3kltwowRG6aGQ/cVi6JTFAeaMSu2Tx45q 7dUp1+jmNZyU1LeatGrqzkUJvm318= X-Google-Smtp-Source: AGHT+IFpB1ELcmxl96OYKyLCQ93t55yCZz/ljq5ccARZ0IQJmhLOUbqZtdM1H5tzU1i/eB4FY2S1gw== X-Received: by 2002:a17:903:3586:b0:268:cc5:5e4e with SMTP id d9443c01a7336-27ed4a06de2mr192026365ad.1.1759211953994; Mon, 29 Sep 2025 22:59:13 -0700 (PDT) Received: from localhost.localdomain ([61.171.228.24]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66d43b8sm148834065ad.9.2025.09.29.22.59.05 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Sep 2025 22:59:13 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev, rdunlap@infradead.org Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao , Alexei Starovoitov Subject: [PATCH v9 mm-new 03/11] mm: thp: add support for BPF based THP order selection Date: Tue, 30 Sep 2025 13:58:18 +0800 Message-Id: <20250930055826.9810-4-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com> References: <20250930055826.9810-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

This patch introduces a new BPF struct_ops called bpf_thp_ops for dynamic
THP tuning. It includes a hook bpf_hook_thp_get_order(), allowing BPF
programs to influence THP order selection based on factors such as:

- Workload identity
  For example, workloads running in specific containers or cgroups.
- Allocation context
  Whether the allocation occurs during a page fault, khugepaged, swap or
  other paths.
- VMA's memory advice settings
  MADV_HUGEPAGE or MADV_NOHUGEPAGE
- Memory pressure
  PSI system data or associated cgroup PSI metrics

The kernel API of this new BPF hook is as follows:

/**
 * thp_order_fn_t: Get the suggested THP order from a BPF program for allocation
 * @vma: vm_area_struct associated with the THP allocation
 * @type: TVA type for current @vma
 * @orders: Bitmask of available THP orders for this allocation
 *
 * Return: The suggested THP order for allocation from the BPF program. Must be
 * a valid, available order.
 */
typedef int thp_order_fn_t(struct vm_area_struct *vma,
			   enum tva_type type,
			   unsigned long orders);

Only a single BPF program can be attached at any given time, though it can
be dynamically updated to adjust the policy. The implementation supports
anonymous THP, shmem THP, and mTHP, with future extensions planned for
file-backed THP.

This functionality is only active when system-wide THP is configured to
madvise or always mode. It remains disabled in never mode. Additionally,
if THP is explicitly disabled for a specific task via prctl(), this BPF
functionality will also be unavailable for that task.

This BPF hook enables the implementation of flexible THP allocation
policies at the system, per-cgroup, or per-task level.

This feature requires CONFIG_BPF_THP (EXPERIMENTAL) to be enabled. Note
that this capability is currently unstable and may undergo significant
changes, including potential removal, in future kernel versions.
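
For illustration only (this is not part of the patch), the way a suggested
order narrows the bitmask of available orders can be modelled in user-space
C. narrow_orders() and its range check are assumptions of this sketch; the
hook added by this patch simply masks the bitmask with BIT(order).

```c
#include <limits.h>

#define BIT(n) (1UL << (n))

/*
 * Hypothetical user-space model of the narrowing step: keep only the
 * order suggested by the policy, and only if it was available.
 * Treating an out-of-range suggestion as "no THP" is an assumption of
 * this sketch, not behaviour taken from the patch.
 */
static unsigned long narrow_orders(unsigned long orders, int suggested)
{
	if (suggested < 0 || suggested >= (int)(sizeof(orders) * CHAR_BIT))
		return 0;
	return orders & BIT(suggested);
}
```

With orders 4 and 9 available, a suggestion of 9 leaves only order 9 set,
while a suggestion that is not in the bitmask yields 0, which the caller
treats as "no THP order allowed".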
Suggested-by: David Hildenbrand Suggested-by: Lorenzo Stoakes Signed-off-by: Yafang Shao Cc: Alexei Starovoitov Cc: Usama Arif Cc: Randy Dunlap --- MAINTAINERS | 1 + include/linux/huge_mm.h | 23 +++++ mm/Kconfig | 11 +++ mm/Makefile | 1 + mm/huge_memory_bpf.c | 204 ++++++++++++++++++++++++++++++++++++++++ 5 files changed, 240 insertions(+) create mode 100644 mm/huge_memory_bpf.c diff --git a/MAINTAINERS b/MAINTAINERS index ca8e3d18eedd..7be34b2a64fd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16257,6 +16257,7 @@ F: include/linux/huge_mm.h F: include/linux/khugepaged.h F: include/trace/events/huge_memory.h F: mm/huge_memory.c +F: mm/huge_memory_bpf.c F: mm/khugepaged.c F: mm/mm_slot.h F: tools/testing/selftests/mm/khugepaged.c diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index a635dcbb2b99..02055cc93bfe 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -56,6 +56,7 @@ enum transparent_hugepage_flag { TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG, TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG, + TRANSPARENT_HUGEPAGE_BPF_ATTACHED, /* BPF prog is attached */ }; =20 struct kobject; @@ -269,6 +270,23 @@ unsigned long __thp_vma_allowable_orders(struct vm_are= a_struct *vma, enum tva_type type, unsigned long orders); =20 +#ifdef CONFIG_BPF_THP + +unsigned long +bpf_hook_thp_get_orders(struct vm_area_struct *vma, enum tva_type type, + unsigned long orders); + +#else + +static inline unsigned long +bpf_hook_thp_get_orders(struct vm_area_struct *vma, enum tva_type type, + unsigned long orders) +{ + return orders; +} + +#endif + /** * thp_vma_allowable_orders - determine hugepage orders that are allowed f= or vma * @vma: the vm area to check @@ -290,6 +308,11 @@ unsigned long thp_vma_allowable_orders(struct vm_area_= struct *vma, { vm_flags_t vm_flags =3D vma->vm_flags; =20 + /* The BPF-specified order overrides which order is selected. 
*/ + orders &=3D bpf_hook_thp_get_orders(vma, type, orders); + if (!orders) + return 0; + /* * Optimization to check if required orders are enabled early. Only * forced collapse ignores sysfs configs. diff --git a/mm/Kconfig b/mm/Kconfig index bde9f842a4a8..ffbcc5febb10 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -895,6 +895,17 @@ config NO_PAGE_MAPCOUNT =20 EXPERIMENTAL because the impact of some changes is still unclear. =20 +config BPF_THP + bool "BPF-based THP Policy (EXPERIMENTAL)" + depends on TRANSPARENT_HUGEPAGE && BPF_SYSCALL + + help + Enable dynamic THP policy adjustment using BPF programs. This feature + is currently experimental. + + WARNING: This feature is unstable and may change in future kernel + versions. + endif # TRANSPARENT_HUGEPAGE =20 # simple helper to make the code a bit easier to read diff --git a/mm/Makefile b/mm/Makefile index 21abb3353550..4efca1c8a919 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -99,6 +99,7 @@ obj-$(CONFIG_MIGRATION) +=3D migrate.o obj-$(CONFIG_NUMA) +=3D memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) +=3D migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) +=3D huge_memory.o khugepaged.o +obj-$(CONFIG_BPF_THP) +=3D huge_memory_bpf.o obj-$(CONFIG_PAGE_COUNTER) +=3D page_counter.o obj-$(CONFIG_MEMCG_V1) +=3D memcontrol-v1.o obj-$(CONFIG_MEMCG) +=3D memcontrol.o vmpressure.o diff --git a/mm/huge_memory_bpf.c b/mm/huge_memory_bpf.c new file mode 100644 index 000000000000..47c124d588b2 --- /dev/null +++ b/mm/huge_memory_bpf.c @@ -0,0 +1,204 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * BPF-based THP policy management + * + * Author: Yafang Shao + */ + +#include +#include +#include +#include + +/** + * @thp_order_fn_t: Get the suggested THP order from a BPF program for all= ocation + * @vma: vm_area_struct associated with the THP allocation + * @type: TVA type for current @vma + * @orders: Bitmask of available THP orders for this allocation + * + * Return: The suggested THP order for allocation from the BPF program. 
Mu= st be + * a valid, available order. + */ +typedef int thp_order_fn_t(struct vm_area_struct *vma, + enum tva_type type, + unsigned long orders); + +struct bpf_thp_ops { + thp_order_fn_t __rcu *thp_get_order; +}; + +static struct bpf_thp_ops bpf_thp; +static DEFINE_SPINLOCK(thp_ops_lock); + +unsigned long bpf_hook_thp_get_orders(struct vm_area_struct *vma, + enum tva_type type, + unsigned long orders) +{ + thp_order_fn_t *bpf_hook_thp_get_order; + int bpf_order; + + /* No BPF program is attached */ + if (!test_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED, + &transparent_hugepage_flags)) + return orders; + + rcu_read_lock(); + bpf_hook_thp_get_order =3D rcu_dereference(bpf_thp.thp_get_order); + if (WARN_ON_ONCE(!bpf_hook_thp_get_order)) + goto out; + + bpf_order =3D bpf_hook_thp_get_order(vma, type, orders); + orders &=3D BIT(bpf_order); + +out: + rcu_read_unlock(); + return orders; +} + +static bool bpf_thp_ops_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); +} + +static const struct bpf_func_proto * +bpf_thp_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *pr= og) +{ + return bpf_base_func_proto(func_id, prog); +} + +static const struct bpf_verifier_ops thp_bpf_verifier_ops =3D { + .get_func_proto =3D bpf_thp_get_func_proto, + .is_valid_access =3D bpf_thp_ops_is_valid_access, +}; + +static int bpf_thp_init(struct btf *btf) +{ + return 0; +} + +static int bpf_thp_check_member(const struct btf_type *t, + const struct btf_member *member, + const struct bpf_prog *prog) +{ + /* The call site operates under RCU protection. 
 */ + if (prog->sleepable) + return -EINVAL; + return 0; +} + +static int bpf_thp_init_member(const struct btf_type *t, + const struct btf_member *member, + void *kdata, const void *udata) +{ + return 0; +} + +static int bpf_thp_reg(void *kdata, struct bpf_link *link) +{ + struct bpf_thp_ops *ops =3D kdata; + + spin_lock(&thp_ops_lock); + if (test_and_set_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED, + &transparent_hugepage_flags)) { + spin_unlock(&thp_ops_lock); + return -EBUSY; + } + WARN_ON_ONCE(rcu_access_pointer(bpf_thp.thp_get_order)); + rcu_assign_pointer(bpf_thp.thp_get_order, ops->thp_get_order); + spin_unlock(&thp_ops_lock); + return 0; +} + +static void bpf_thp_unreg(void *kdata, struct bpf_link *link) +{ + thp_order_fn_t *old_fn; + + spin_lock(&thp_ops_lock); + clear_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED, &transparent_hugepage_flags); + old_fn =3D rcu_replace_pointer(bpf_thp.thp_get_order, NULL, + lockdep_is_held(&thp_ops_lock)); + WARN_ON_ONCE(!old_fn); + spin_unlock(&thp_ops_lock); + + synchronize_rcu(); +} + +static int bpf_thp_update(void *kdata, void *old_kdata, struct bpf_link *l= ink) +{ + thp_order_fn_t *old_fn, *new_fn; + struct bpf_thp_ops *old =3D old_kdata; + struct bpf_thp_ops *ops =3D kdata; + int ret =3D 0; + + if (!ops || !old) + return -EINVAL; + + spin_lock(&thp_ops_lock); + /* The prog has already been removed.
*/ + if (!test_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED, + &transparent_hugepage_flags)) { + ret =3D -ENOENT; + goto out; + } + + new_fn =3D rcu_dereference(ops->thp_get_order); + old_fn =3D rcu_replace_pointer(bpf_thp.thp_get_order, new_fn, + lockdep_is_held(&thp_ops_lock)); + WARN_ON_ONCE(!old_fn || !new_fn); + +out: + spin_unlock(&thp_ops_lock); + if (!ret) + synchronize_rcu(); + return ret; +} + +static int bpf_thp_validate(void *kdata) +{ + struct bpf_thp_ops *ops =3D kdata; + + if (!ops->thp_get_order) { + pr_err("bpf_thp: required ops isn't implemented\n"); + return -EINVAL; + } + return 0; +} + +static int bpf_thp_get_order(struct vm_area_struct *vma, + enum tva_type type, + unsigned long orders) +{ + return -1; +} + +static struct bpf_thp_ops __bpf_thp_ops =3D { + .thp_get_order =3D (thp_order_fn_t __rcu *)bpf_thp_get_order, +}; + +static struct bpf_struct_ops bpf_bpf_thp_ops =3D { + .verifier_ops =3D &thp_bpf_verifier_ops, + .init =3D bpf_thp_init, + .check_member =3D bpf_thp_check_member, + .init_member =3D bpf_thp_init_member, + .reg =3D bpf_thp_reg, + .unreg =3D bpf_thp_unreg, + .update =3D bpf_thp_update, + .validate =3D bpf_thp_validate, + .cfi_stubs =3D &__bpf_thp_ops, + .owner =3D THIS_MODULE, + .name =3D "bpf_thp_ops", +}; + +static int __init bpf_thp_ops_init(void) +{ + int err; + + err =3D register_bpf_struct_ops(&bpf_bpf_thp_ops, bpf_thp_ops); + if (err) + pr_err("bpf_thp: Failed to register struct_ops (%d)\n", err); + return err; +} +late_initcall(bpf_thp_ops_init); --=20 2.47.3 From nobody Wed Oct 1 22:26:30 2025 Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D242E27FB3E for ; Tue, 30 Sep 2025 05:59:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1759211968; cv=none; b=eAf1tBpwegmlhTiK7eNUSsdLuAjSFz2CzMRIbWVdQ2zKbjvNOtrk3aVIJFEFqSIanARorNQjlkcA1bDszcX7IiZ9jQUWD8uup3fFhHwDJuueIfzWa0xPSCOJEfq/MoSfi2HP0aMuOJQIhLtOggVV3SRCWuyxRNKSgYLjA7gdKTY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759211968; c=relaxed/simple; bh=4VkQKrS6FikymCfZ1+5puhT+uYYqYYXhQgE3Hi/wt0Q=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=mbjsBILYdq13lz2BpkJocpALlciLTndfF5LRWoHUtjt5JHJgyKEbqtZrsmoXObFfbcwDm/3Y9jys9u8MGk/Ln5z0Ukg4x3ZCtFRzx8FiPEb45YDr1M/3Uw8W+ZU1I2C8IC5I4m8kA/7NbV/SDM2oaE+eRLwIt/6IOX9nE1ewSi8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Lst4y7dB; arc=none smtp.client-ip=209.85.214.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Lst4y7dB" Received: by mail-pl1-f173.google.com with SMTP id d9443c01a7336-2681660d604so52256985ad.0 for ; Mon, 29 Sep 2025 22:59:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759211966; x=1759816766; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Bj30sQDVRkZpxPh0yEwciSjpXqkKKqMqk3AWrRPDElU=; b=Lst4y7dBYV6F49SB2gq8lqFojT9lOwA0/hTZKSmEvuQxysLhsV87b7g4LJefLfElOB sfhPeAqdMCa1mim0Pfbs57Tk1l7c/VUSNt6X7NoAAPi5IIeqQXjfLk3w+ae4psFPhEGc 7fEDbFZMOUoUE1TNywu5tJ3vHacVJLuQ/2K/2fLOgTY+fbx4zEr4mEiU3zie2+h6UPB7 JdQk4E1JmNrQaFaRWjvljdbEhFHg/0cXIzzAuRfz3RVSka5W3bjfvHKfwzJPRiw3iXga 
gkcaWuah+0QBWekZGSJu9z11IsewWc1eWQ5f8tAfE7wjkVwoKxFwOjskdL4mAMqN1jlF djVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759211966; x=1759816766; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Bj30sQDVRkZpxPh0yEwciSjpXqkKKqMqk3AWrRPDElU=; b=iC6v/Jz4nnHZzGM/+79OscqwoI2/N4KN0ZAQBfmudU5rxm/aHIiry0k4QGdDqUh0cZ T3tcs3ZzZb7beP4YUtvBo0ZtHupGgA6bw8AToYaz7qMv0Hoo3lgVqXwJgqe8Vha/swsX Y8kF75aEBhhGD+CFs4hrUVnGsDhJ+obgJyO3XL0HaF5uKSC1uNF8kzmMLzmvWN6OJWg9 O/fSEx5p9FgqxhoRkFQhSCwtQzLdH+VAGs6IpMjCP6cGDGLyfuUTwMohT12rxMQL72iN JliT5MH30MGnWZkKURtEz7VhVtOWBTI1gSBBa9/vyG9FCsI8NOMjgO+jIDq0xmAeQnTO Z8NQ== X-Forwarded-Encrypted: i=1; AJvYcCVQ1/ACfl4ogRsoCOnIlXKSVWBTBGvGc5MQED7bVPk2RKUurwwUn5RcCYZzPl7Oy90mKC9vi0kkE7yfhHE=@vger.kernel.org X-Gm-Message-State: AOJu0Yw2M9uE78eMulnUzSjcmP0wgmYBjcg5F5HTKmTtXMaijxl2LBp3 dANHBqLbzhSnOtJSWS5voRSqKI8VLGUlLJpG/Z8Ji8HyJixPgnFV/0so X-Gm-Gg: ASbGnctT+v14wPFhIhaL5OM/I97Ss45cZwUEpg44leHD1gUnTOQHNpAvRy5PcL74HwC iuaYT9r2yD8FsNgWt7/LHPs4lRDAFHQMknASamksaY7XETF+DEq5FWXiXpviP1LxcUx9uVxKdXo X0JJRFMQqFH/7nKvgFPMW6oWMkPFIaGUjwDWXwiRhXXsBJEH0zxqaghJ8qx7Z8lFSvgJIjohLAU V3w0qCU5xK5+NGn6/7D25p9ecwMSp0UlLWFwY1o3txPECe0nriYdaR0LWtzPRXlktAGGdqbJe3y YdhTzQPgcsoI3cjLlunC5DERWiE9zLhinPfVaoDbFnEVhZjauO07yLDIUYBag2xLSpi4ttgsGc1 a5qHydPgcKGvqop67pAWyM+3F+fIs2of5uY8ZPd459zKB0zGg2YtWI5sZ1eV8q0332eHMhyHkMd YLIGwaek+ByV+MZ8c1SUFt42byYqIEJItOJ+9tcw== X-Google-Smtp-Source: AGHT+IEmo2u/7V63Fa+4aKjEO7zXO6V10zopA5XEUoYxrgliFdmiMUfOOebvAfkXMDqtGOOwYbddwQ== X-Received: by 2002:a17:902:fc86:b0:269:8407:5ae3 with SMTP id d9443c01a7336-27ed4a60923mr250920865ad.54.1759211966029; Mon, 29 Sep 2025 22:59:26 -0700 (PDT) Received: from localhost.localdomain ([61.171.228.24]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66d43b8sm148834065ad.9.2025.09.29.22.59.14 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 
bits=256/256); Mon, 29 Sep 2025 22:59:25 -0700 (PDT)
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev, rdunlap@infradead.org
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v9 mm-new 04/11] mm: thp: decouple THP allocation between swap and page fault paths
Date: Tue, 30 Sep 2025 13:58:19 +0800
Message-Id: <20250930055826.9810-5-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com>
References: <20250930055826.9810-1-laoar.shao@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

The new BPF capability enables finer-grained THP policy decisions by
introducing separate handling for swap faults versus normal page faults.

As highlighted by Barry:

  We've observed that swapping in large folios can lead to more
  swap thrashing for some workloads- e.g. kernel build. Consequently,
  some workloads might prefer swapping in smaller folios than those
  allocated by alloc_anon_folio().

While prctl() could potentially be extended to leverage this new policy,
doing so would require modifications to the uAPI.
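
As a rough sketch of the kind of policy this split makes possible (not part
of this patch), the user-space C model below prefers the smallest available
order for swap faults and the largest one otherwise. pick_order() and its
two helpers are hypothetical names; a real policy would be a BPF struct_ops
program. The enum mirrors the one in include/linux/huge_mm.h after this
patch.

```c
#include <limits.h>

/* Mirrors enum tva_type as extended by this patch. */
enum tva_type {
	TVA_SMAPS,
	TVA_PAGEFAULT,
	TVA_KHUGEPAGED,
	TVA_FORCED_COLLAPSE,
	TVA_SWAP_PAGEFAULT,
};

/* Hypothetical helper: lowest set order in the bitmask, or -1 if empty. */
static int lowest_order(unsigned long orders)
{
	for (int i = 0; i < (int)(sizeof(orders) * CHAR_BIT); i++)
		if (orders & (1UL << i))
			return i;
	return -1;
}

/* Hypothetical helper: highest set order in the bitmask, or -1 if empty. */
static int highest_order(unsigned long orders)
{
	for (int i = (int)(sizeof(orders) * CHAR_BIT) - 1; i >= 0; i--)
		if (orders & (1UL << i))
			return i;
	return -1;
}

/*
 * Hypothetical policy: swap-in gets the smallest available order to
 * limit swap thrashing, every other path gets the largest one.
 */
static int pick_order(enum tva_type type, unsigned long orders)
{
	if (type == TVA_SWAP_PAGEFAULT)
		return lowest_order(orders);
	return highest_order(orders);
}
```

Before this patch both fault paths passed TVA_PAGEFAULT, so a policy had no
way to tell them apart.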
Signed-off-by: Yafang Shao Reviewed-by: Lorenzo Stoakes Acked-by: Usama Arif Cc: Barry Song <21cnbao@gmail.com> --- include/linux/huge_mm.h | 3 ++- mm/huge_memory.c | 2 +- mm/memory.c | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 02055cc93bfe..9b9dfe646daa 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -97,9 +97,10 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr; =20 enum tva_type { TVA_SMAPS, /* Exposing "THPeligible:" in smaps. */ - TVA_PAGEFAULT, /* Serving a page fault. */ + TVA_PAGEFAULT, /* Serving a non-swap page fault. */ TVA_KHUGEPAGED, /* Khugepaged collapse. */ TVA_FORCED_COLLAPSE, /* Forced collapse (e.g. MADV_COLLAPSE). */ + TVA_SWAP_PAGEFAULT, /* serving a swap page fault. */ }; =20 #define thp_vma_allowable_order(vma, type, order) \ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 1ac476fe6dc5..08372dfcb41a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -102,7 +102,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area= _struct *vma, unsigned long orders) { const bool smaps =3D type =3D=3D TVA_SMAPS; - const bool in_pf =3D type =3D=3D TVA_PAGEFAULT; + const bool in_pf =3D (type =3D=3D TVA_PAGEFAULT || type =3D=3D TVA_SWAP_P= AGEFAULT); const bool forced_collapse =3D type =3D=3D TVA_FORCED_COLLAPSE; unsigned long supported_orders; vm_flags_t vm_flags =3D vma->vm_flags; diff --git a/mm/memory.c b/mm/memory.c index cd04e4894725..58ea0f93f79e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4558,7 +4558,7 @@ static struct folio *alloc_swap_folio(struct vm_fault= *vmf) * Get a list of all the (large) orders below PMD_ORDER that are enabled * and suitable for swapping THP. 
*/ - orders =3D thp_vma_allowable_orders(vma, TVA_PAGEFAULT, + orders =3D thp_vma_allowable_orders(vma, TVA_SWAP_PAGEFAULT, BIT(PMD_ORDER) - 1); orders =3D thp_vma_suitable_orders(vma, vmf->address, orders); orders =3D thp_swap_suitable_orders(swp_offset(entry), --=20 2.47.3 From nobody Wed Oct 1 22:26:30 2025 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1B35F27B354 for ; Tue, 30 Sep 2025 05:59:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759211978; cv=none; b=CkxhqhB5SZdpCL3dh+BRAP1ExeV/2s2bpdMIjK7451o7tBoqXqkV/rgWtOE051itMNC3mfocQ7/GQ2ro39fWDcBXvUh6H6NywewW9vWztHjIF8xyRJOQaX3GOfeIHRr0gFw9HBUpGR4efDpPHiq37FQdQasrbLYaKQ6LJI7IgxI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759211978; c=relaxed/simple; bh=PdHzrFnTfY/BX2nYC+VH4KXSooOxeg+vmzUyNWX3Omw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=h4Q+YW0GxlCaP+oqNyh6ThCfjfxQk6L4vGvAbPh2AfSDDtoaVZ4CqHnG598Pa56NYoMMEfws5ks+Anlos+W/LugFntweXbtBFhMoS17q6pihGdZOJ6t8dUIVUiOmfM23dgjMdknioqU8IAFOer/g8/IVwNneIg7IAIY8hZ5KuVs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=kYKPgaf2; arc=none smtp.client-ip=209.85.214.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="kYKPgaf2" Received: by mail-pl1-f177.google.com 
with SMTP id d9443c01a7336-27eec33b737so57290725ad.1 for ; Mon, 29 Sep 2025 22:59:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759211976; x=1759816776; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1tfnp2VgNI7X5NDB06OaZAZrurrAWQnxjDDxodnKF9g=; b=kYKPgaf2hBEEypXtFPz9ODbbwNAaKfrhTifLbPD/njZ1sgZk7+QkgHPfOxfUw/jmp9 oxjW1Qv5aJ5zhXL6YTHzztPeWBLMGp6xvNtz8kDtALGI3YrzKUeVKLoVlW9pFWfFVTTL 3TuCsb+6FGBOkPXHvu9hs/jXLVWy1+BqINpx33ua2bqLO8khlHNmmU0nt28coa5NrVeF Y9ciSUgrcx2WNa9usTbTc5sC+8MFE5MSADp5pyaxvqgk9GomMRuG1RAl8jXhqAEPFuug uZYVb/ILW+koYQxEWb2piECDxcVFFZVeQSOgrIHkqXkeg0po+z6bH3Ba1JlsH9xN2kJ7 Hqag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759211976; x=1759816776; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1tfnp2VgNI7X5NDB06OaZAZrurrAWQnxjDDxodnKF9g=; b=EF9vnDsqtLjEOYgbnPhJ+XjveS+Ltc0MLjJIqHvlciQq63nNdb7vm5yszKZHntzrwq CoWgbAzkgYB2LJZf3oDQ1P09QVhEVo6SdQgf20v9gX+sETz/jOENaI4ePT+8qECpqZSc oHcAhcbsHmWpElskvyXVX1lS/iQmIEeH3MBWW/wqeKOkTdwJ6YkNN2yoHYBlY8FiRbJU kT2JnzzPkeRtR3kIWcy1eZlmrxw0DKU6kd+SNLVBb5Vu3MnayvmV4LXfsPLRZkV6Vr+E IhQ/D7K+bBZOrXMT5rAJuHoNtY+NBxzn6fAu283Zl3ys3ur7p9E8GL01npzxJfRjon7M YzyQ== X-Forwarded-Encrypted: i=1; AJvYcCVs4VWrVot81o+2N4e0zg7SNUHBCw+0oyZt9GZNg/KyLdKpz1CCDNcIRZG/8Mu4jb804ut623X+8GK4kuE=@vger.kernel.org X-Gm-Message-State: AOJu0YxIeqvFAjhoOV5sicFuTDN8UFD0A1eLHXRxSATIfFiQYC3s5ncC OPkQu+rQ10Z8iCNiquuj1c1K8Csd7IP4e14uSbp6swM+P53cU8Dld1cC X-Gm-Gg: ASbGnctkxBqc/hmV4fadw8koKdmW1+Gku3Zs5bcsbVa5RCjGeXk4jkwVz9P6VY89cNc qs3xf8i7hBkhWdIAWpoeLMdp75bxYTWhLTIqwhU2OlxGCkMYW9JIY9XCldzYDBZ6A0Wg7fHbbxz DC8ZsH5DGsQayVFJxep3C6FqxCyLKAw4ALTEPUr+TNjzIKpN0n0R3cp5H0ytQAguLpzPRPQifuj 
n4u2EhrvqIuxQvVquxnwUu7uZGB0veh58+ThRXmPqS6HiBHKI7pkhtjfJ3PjxoYm2UObDVnbn+T l32HNP4b/LspvajDwPAW9VUUpaqqx7R1VBIaJYox52eIRejzMYzWITpiRMlDEg06EGFvPzWKDKp /z+b6DVbRBdhi8MKan2Jc/HewJdI/BDQdsBI/3m8OE9jGd9cE5cAN4gyAoZTfYgRM1L4UM3H0vs IEZIV+tGwVpvvfunnL9thAlz0M79IDEYvZ18MySQ== X-Google-Smtp-Source: AGHT+IHzK9X+Xf+bVu8Rrmbp78/kS7yuMf0wwgc2gHprT/HrnmCmly3PRVr5PnyZFdbWTIQHPwUiwA== X-Received: by 2002:a17:903:3d0e:b0:250:bd52:4cdb with SMTP id d9443c01a7336-27ed4a3d9f5mr206668205ad.32.1759211976345; Mon, 29 Sep 2025 22:59:36 -0700 (PDT) Received: from localhost.localdomain ([61.171.228.24]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66d43b8sm148834065ad.9.2025.09.29.22.59.26 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Sep 2025 22:59:35 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev, rdunlap@infradead.org Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v9 mm-new 05/11] mm: thp: enable THP allocation exclusively through khugepaged Date: Tue, 30 Sep 2025 13:58:20 +0800 Message-Id: <20250930055826.9810-6-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com> References: <20250930055826.9810-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; 
charset="utf-8"

khugepaged_enter_vma() ultimately invokes any attached BPF function with
the TVA_KHUGEPAGED flag set when determining whether or not to enable
khugepaged THP for a freshly faulted-in VMA.

Currently, on fault, we invoke this in do_huge_pmd_anonymous_page(), as
invoked by create_huge_pmd(), and only when we have already checked that
an allowable TVA_PAGEFAULT order is specified.

Since we might want to disallow THP on fault-in but allow it via
khugepaged, we move things around so we always attempt to enter
khugepaged upon fault.

This change is safe because:
- khugepaged operates at the MM level rather than per-VMA. Since THP
  allocation might fail during page faults due to transient conditions
  (e.g., memory pressure), it is safe to add this MM to khugepaged for
  subsequent defragmentation.
- If __thp_vma_allowable_orders(TVA_PAGEFAULT) returns 0, then
  __thp_vma_allowable_orders(TVA_KHUGEPAGED) will also return 0.

While we could also extend prctl() to utilize this new policy, such a
change would require a uAPI modification to PR_SET_THP_DISABLE.
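
The reordering can be modelled in user-space C as below. This is a
simplified sketch, not kernel code: struct fault_model, struct fault_result
and both functions are hypothetical, and the model ignores the checks that
khugepaged_enter_vma() performs internally.

```c
#include <stdbool.h>

/* Hypothetical inputs describing one fault. */
struct fault_model {
	bool pmd_none;   /* no PMD entry is installed yet */
	bool anon;       /* the VMA is anonymous */
	bool pf_allowed; /* a PMD order is allowed for TVA_PAGEFAULT */
};

/* Hypothetical record of what the fault path attempted. */
struct fault_result {
	bool entered_khugepaged;
	bool tried_huge_pmd;
};

/* Old flow: khugepaged is entered only on the huge PMD fault path. */
static struct fault_result fault_before_patch(struct fault_model m)
{
	struct fault_result r = { false, false };

	if (m.pmd_none && m.pf_allowed) {
		r.tried_huge_pmd = true;
		if (m.anon)
			r.entered_khugepaged = true;
	}
	return r;
}

/* New flow: khugepaged is entered whether or not THP is allowed on fault. */
static struct fault_result fault_after_patch(struct fault_model m)
{
	struct fault_result r = { false, false };

	if (m.pmd_none) {
		if (m.anon)
			r.entered_khugepaged = true;
		if (m.pf_allowed)
			r.tried_huge_pmd = true;
	}
	return r;
}
```

For an anonymous VMA where the policy disallows THP at fault time, only the
new flow still hands the MM to khugepaged.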
Signed-off-by: Yafang Shao Acked-by: Lance Yang Cc: Usama Arif --- mm/huge_memory.c | 1 - mm/memory.c | 13 ++++++++----- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 08372dfcb41a..2b155a734c78 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1346,7 +1346,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault= *vmf) ret =3D vmf_anon_prepare(vmf); if (ret) return ret; - khugepaged_enter_vma(vma); =20 if (!(vmf->flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm) && diff --git a/mm/memory.c b/mm/memory.c index 58ea0f93f79e..64f91191ffff 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6327,11 +6327,14 @@ static vm_fault_t __handle_mm_fault(struct vm_area_= struct *vma, if (pud_trans_unstable(vmf.pud)) goto retry_pud; =20 - if (pmd_none(*vmf.pmd) && - thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) { - ret =3D create_huge_pmd(&vmf); - if (!(ret & VM_FAULT_FALLBACK)) - return ret; + if (pmd_none(*vmf.pmd)) { + if (vma_is_anonymous(vma)) + khugepaged_enter_vma(vma); + if (thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) { + ret =3D create_huge_pmd(&vmf); + if (!(ret & VM_FAULT_FALLBACK)) + return ret; + } } else { vmf.orig_pmd =3D pmdp_get_lockless(vmf.pmd); =20 --=20 2.47.3 From nobody Wed Oct 1 22:26:30 2025 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3BAEF280A5F for ; Tue, 30 Sep 2025 05:59:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759211987; cv=none; b=UW+Jt57C5nrdaedy7IJZM7hcxynQjDwQcRnfczGJRuFx87vBqNkxi2g27hif7AbIZZBBre59+j4kNgzq+2uCoDd5AGBmd+mqlNBezJZfA9uIwyIZ6BltbcJUe3KtQ6KdgpAuyTFlCg8acH7ZzUvUMWWluQoQSKvVASS6Kf6KLUI= ARC-Message-Signature: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759211987; c=relaxed/simple; bh=CXI0qMvWqpeePDr4xBG9e2uqemr9IpjeVUqDleKN93I=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=a8auq5jjwHviiyPzyIOs48L456qVcVVFvt7KbAMTSObxHrZCYkXwv5i6TptMVO+DiRHH/HqfMENOKV5BGWHDYJCioMWfHExADTmkPxDfPmm8rk6iVWs/YuqmxlTWnnZGcR55XeXKOVC8LOfQ+r+SpX8PCoyGEeY4TWAXxO7mui8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=YANox5pF; arc=none smtp.client-ip=209.85.214.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="YANox5pF" Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-271d1305ad7so71976745ad.2 for ; Mon, 29 Sep 2025 22:59:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759211985; x=1759816785; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=oOprrquoKlbkEreJojZ4KykFg8G0TswcVk79ghRz+Tk=; b=YANox5pFVE12qunKaUXxgelNvQtoVWW5gGIgclT/DLZg5Ns2oBahLhxY/5yLiw/H0e lXd+/IqPr//ONCHLNyAKFPbHGUUyI/HsQsoxSIB0mA3OO47s3UqZwezuiV/ytRpnC2uQ Vdku5T0A14xpECf6eAvpZvelQTqwnOTeIbZrSu/RiAtV5YywLH2HkawiDbGGa7HFIog/ MkmaAl8au+dfBBg09OoMUsxenMKlzZHS9BFRa78/U0iq90M6s7il8odJGI4OU0Z8WkN7 XEINGVkw3Bj3GSp/CYk0gSWfaVQEGXMhPExSEhCgMi9iz5ScGjiiXE13gWYHRkt2UoJX KZfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759211985; x=1759816785; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oOprrquoKlbkEreJojZ4KykFg8G0TswcVk79ghRz+Tk=; b=lGyMr/EbQau2nNtuLjyeXebdB01u1FKcP4YRUQIpvmVfzDeTX+AY73Gmt4vnfHhmkq Np+vi8o20KXBD0HdnWPRl0fV5aOpb9t5Jxn5e6coRQ6zPosbAw5cDzupBsGT2eDcwNjm NycihkeRnmWO9NFLY/t+9kKaS5FcECZZTweEwmtTiWqoAsRn8z9qA+89z9EdqPuuXSkP VcaJwXpfEXI1UYBGxWH3E3L49DZa3yr0DVOctxnvy7WtXYfTZfShM61KNkbYoNsHXt+S QUM6W3dTsNLmSpLBHUFD5atQJhsJqUc7tx2Ow/d8ne46DYWud5x5PSyoUCqLnlizlWH9 +zZw== X-Forwarded-Encrypted: i=1; AJvYcCWxUi5IdOWo2fBbdkjuoKLlX0AiZUIS0SCMeJ+AHVuUA3RgaA2dMHapvLBrq+78nimo0uRHSGicHjUqKMM=@vger.kernel.org X-Gm-Message-State: AOJu0YwdYaEyQkN/8CpmzTcZ7QbUNw8GejauPvyfkFdYsI+Df5ugDraj bPH04dfMRr3iH2yhkCoYEXiC+4t9MWJw5zI9q/KBPnn+7KbTJ7dWIfWZ X-Gm-Gg: ASbGncunpnVIPv9cReYz2V6+eXYb3SjmNBuGHBmqyEtXZQkj3rCk4LylDXldVGP0Iz9 tQxO9bugJffEvL7/aSpb6TILs9GpwyRGOIJPMHyo3AlqlxuSF36imYv2zOyxJHzrv7F50eFTgmn YMn2zL03+EAXrQscTjULfa6mNypTNK8eVqry1alfCx/SZ6fGtlkQyu1wXPLZX4uEkVPGzZ6Nd3U yjFsfNZZKhmeH3we5b1/iRMt4AS1NU11mU1seQNl2213fwMfaQBs3ky1J2bDzBQYS2w+gwzdjzk kHtnVRi76LLwk1bMwhSnt6+jgPKDdyBHeZgK5JcOBgme5fgsL/UPt/tdLRsCqE2CCGt5E9ZiYmg CC9jS6Tb0YN9RH9m3r7SfSbtWtN/wxADp7PQm1J94u2RuCjzJjra73FATbIi0ry3WcVRMzqnrBk fYAwpmf8eAIDdLQz+1oeMXyHp8nJmi+BpdeCPUHA== X-Google-Smtp-Source: AGHT+IHaVympRvUhy+mNwqzkCqg8GQH0WHYHPq1npnx2or9ONNHwBhPtVzKfVWrGEP9ED65uKj4qEQ== X-Received: by 2002:a17:903:b83:b0:265:47:a7bd with SMTP id d9443c01a7336-27ed49b802bmr208932635ad.4.1759211985356; Mon, 29 Sep 2025 22:59:45 -0700 (PDT) Received: from localhost.localdomain ([61.171.228.24]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66d43b8sm148834065ad.9.2025.09.29.22.59.36 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Sep 2025 22:59:44 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, 
dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev, rdunlap@infradead.org
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v9 mm-new 06/11] bpf: mark mm->owner as __safe_rcu_or_null
Date: Tue, 30 Sep 2025 13:58:21 +0800
Message-Id: <20250930055826.9810-7-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com>
References: <20250930055826.9810-1-laoar.shao@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

When CONFIG_MEMCG is enabled, we can access mm->owner under RCU. The
owner can be NULL. With this change, BPF helpers can safely access
mm->owner to retrieve the associated task from the mm. We can then make
policy decisions based on the task's attributes.
The typical use case is as follows:

  bpf_rcu_read_lock(); // rcu lock must be held for rcu trusted field
  @owner = @mm->owner; // mm_struct::owner is rcu trusted or null
  if (!@owner)
      goto out;

  /* Do something based on the task attribute */

out:
  bpf_rcu_read_unlock();

Suggested-by: Andrii Nakryiko
Signed-off-by: Yafang Shao
Acked-by: Lorenzo Stoakes
---
 kernel/bpf/verifier.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c4f69a9e9af6..d400e18ee31e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7123,6 +7123,9 @@ BTF_TYPE_SAFE_RCU(struct cgroup_subsys_state) {
 /* RCU trusted: these fields are trusted in RCU CS and can be NULL */
 BTF_TYPE_SAFE_RCU_OR_NULL(struct mm_struct) {
 	struct file __rcu *exe_file;
+#ifdef CONFIG_MEMCG
+	struct task_struct __rcu *owner;
+#endif
 };
 
 /* skb->sk, req->sk are not RCU protected, but we mark them as such
-- 
2.47.3

From nobody Wed Oct 1 22:26:30 2025 Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com [209.85.215.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08E43222560 for ; Tue, 30 Sep 2025 05:59:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759212000; cv=none; b=E3ICepeGKfpQVT4gqBLY+dK9S6oozOLOq/dWbltaQUAr9T9G87qulmHkwnKrx2RxWOueTBw29UmezWPdSlnTOnvOfg2ZsgeL4Pp9fBEWZnptqDH8WHWIUTVjpzfMd63hf0e5sGy0NOAlgOqDEP5iH1OXltqDYgLd/kQ3Wjlzxz4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759212000; c=relaxed/simple; bh=/PMdDC4FDfpJHwrfvPwAn92Vwj5QW0m0NvTBiXm9L2o=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version;
b=C/1PRSjA2OPatF+HiB0FYgMOoA1Ffz32+xUTlimb8+MfSW8GeHG0Ttg2nm5APSxIBLUcftUSp91jJUieFPUmj5KQnkmwe2YcCkUFUkNQnS+zqu4xiWFJV0/TJlMa+nLkYxxjWsi4q9K3WohCkEEKSbw2c2KfgoHAkXqCJjVQjOc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=jKxAC9SS; arc=none smtp.client-ip=209.85.215.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="jKxAC9SS" Received: by mail-pg1-f171.google.com with SMTP id 41be03b00d2f7-b57bf560703so4119986a12.2 for ; Mon, 29 Sep 2025 22:59:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759211998; x=1759816798; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=n/w5JYrEq/PCfn3Rwv8DsVtkgCdC0Y7Hm3GpSfLGEBg=; b=jKxAC9SSr5PpQCU4NI4dYxin5h1Ru0bOlUGNViEBSnJ0FUuVExjzi+D5UWedLVjTnj hwcA1w6HPBtE3MZKT8Dc9L3NhRCDa8ykZYueCh8Y6hNvgxvX/iSOA/fcBKWkRE71EAmC fyhWXxE7vROkgCcpS8hjbUK+MWJzWZtYyzJpZSRQX2uFI2cbWD5+3it1QP/5lEDlPLEL rUHEInGsLfiPCu/g0RIFtOoSKL4CBqJ6IuR4+Nsyg6ujC2VEkrVMhJLkSdI3ZF3gJEPd Utr10XWiZXLhu4wfJMoY5d39u/LssMpcUtj8U+Zdqn1q7IEiobYyhd2DEJhWfrbzabUm TlTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759211998; x=1759816798; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=n/w5JYrEq/PCfn3Rwv8DsVtkgCdC0Y7Hm3GpSfLGEBg=; b=hqZB99KsE6e9gZWH+BGD6O/rgMtNZd4R/sbib8GhcOsAYu+v/5GYP1iDX+GZRFLy6f 
9M/YmAxzRc1MKlHuJ4xCpXIo/bql/0S4WFbGM2iuA/s05VAiSgLzLVdfOy9vQ0LPwf8K LuPKMgWeE5B1zPDGx0ju/A0tU/ig7FQXJqNokmFQy7cpWzJcuAdE6QOrBP5nlbLuxNWK KnawHpbzIMz6xzi5LHa+lUmQEzuZUiqoi12J4GAoCigPPDZf/7PXkmrOYDJxuLzS/pHy JktYDN3jrGRCulgtrhEvRkBEz6fQ0dznb5z9KKnpUQoypV1ZnIGyCS2ElVQ2GhlaMDTj ov+g== X-Forwarded-Encrypted: i=1; AJvYcCUPOsw1/fmsRpCORnAzuvcUiRc3fIOhxSCYnhBXn5+BxrgFeCtI181uR9l8NOd29NQOp2lnUZKZpwm0LCI=@vger.kernel.org X-Gm-Message-State: AOJu0YxhKwJi8Vl04m6638tQXVvgvshPbMHUKXYrKNrwAmIO8Wm03q1G SmP5XGDI5DNPL/S39SB+u1XhXd+5hkBgCbpR1lTAeQboeJdSpxFUOHQN X-Gm-Gg: ASbGncvqtKEOrRUaEOq2YY8ULhj+K7/wZppy4+gwtvZ+NyhamjieQSjsB6sWWq4pVG8 j3X4WJJAw+53CpV6I6BmE1nj1ozcXodeSl0aI38Q9RU0KrX9OM5bgVetLDJAZ1FyRMGCMtv/xkf bcPyDVmFesp3QWS5nNDyzm7QIHZNjOgMFNVp9gwljQzwkDjGYxY1YxAVN/3XRmXtpJfxaKSFz2w IIIfmJZnQ9Zu95bXDPShnoIuJk66DMun7Aq38Q2CsTfDn0HX153ui3ayHhLish8Dm/n6dCSPeju td55VXPYx9jjY62hUaX7Jelr5LVi1p8igJw09VZl46TWYxZYWjOKEADLIgqsBxnAtLl0ilrwLgV Zn0YDM6ba7dC0DBa9/rIfHAab9hrc/KU1a2oX8hDIPhXUTg/amFyjK2BFwZIewfqf5Kpoqpbizr XYajouYYWAH6tThGLBGdkeOBij5rVhd862ER19Yw== X-Google-Smtp-Source: AGHT+IE9k45OyD1zQ/sX5CO57YKwXTRytMKq5dOgl/BGR9iX74e3JrI32v11I5E6AD/8k/izN7MaGQ== X-Received: by 2002:a17:903:13ce:b0:275:c2f:1b41 with SMTP id d9443c01a7336-27ed4ada760mr178958735ad.53.1759211998324; Mon, 29 Sep 2025 22:59:58 -0700 (PDT) Received: from localhost.localdomain ([61.171.228.24]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66d43b8sm148834065ad.9.2025.09.29.22.59.45 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Sep 2025 22:59:57 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, 
corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev, rdunlap@infradead.org Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v9 mm-new 07/11] bpf: mark vma->vm_mm as __safe_trusted_or_null Date: Tue, 30 Sep 2025 13:58:22 +0800 Message-Id: <20250930055826.9810-8-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com> References: <20250930055826.9810-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

vma->vm_mm might be NULL, and it can be accessed outside of RCU, so we can mark it as trusted_or_null. With this change, BPF helpers can safely access vma->vm_mm to retrieve the mm_struct associated with the VMA. Policy decisions can then be made based on the VMA.

The "trusted" annotation enables direct access to vma->vm_mm within kfuncs marked with KF_TRUSTED_ARGS or KF_RCU, such as bpf_task_get_cgroup1() and bpf_task_under_cgroup(). Conversely, the "null" enforcement requires every call site that uses vma->vm_mm to perform a NULL check.

The lsm selftest must be modified because it dereferences vma->vm_mm without a NULL check; without that fix, it would break after this change.

For the VMA based THP policy, the use case is as follows:

  @mm = @vma->vm_mm; // vm_area_struct::vm_mm is trusted or null
  if (!@mm)
      return;
  bpf_rcu_read_lock(); // rcu lock must be held to dereference the owner
  @owner = @mm->owner; // mm_struct::owner is rcu trusted or null
  if (!@owner)
      goto out;
  @cgroup1 = bpf_task_get_cgroup1(@owner, MEMCG_HIERARCHY_ID);

  /* make the decision based on the @cgroup1 attribute */

  bpf_cgroup_release(@cgroup1); // release the associated cgroup
out:
  bpf_rcu_read_unlock();

PSI memory information can be obtained from the associated cgroup to inform policy decisions. Since upstream PSI support is currently limited to cgroup v2, the following example demonstrates the cgroup v2 implementation:

  @owner = @mm->owner;
  if (@owner) {
      // @ancestor_cgid is user-configured
      @ancestor = bpf_cgroup_from_id(@ancestor_cgid);
      if (bpf_task_under_cgroup(@owner, @ancestor)) {
          @psi_group = @ancestor->psi;

          /* Extract PSI metrics from @psi_group and
           * implement policy logic based on the values */
      }
  }

Signed-off-by: Yafang Shao
Acked-by: Lorenzo Stoakes
Cc: "Liam R. Howlett"
---
 kernel/bpf/verifier.c                   | 5 +++++
 tools/testing/selftests/bpf/progs/lsm.c | 8 +++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d400e18ee31e..b708b98f796c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7165,6 +7165,10 @@ BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket) {
 	struct sock *sk;
 };
 
+BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct) {
+	struct mm_struct *vm_mm;
+};
+
 static bool type_is_rcu(struct bpf_verifier_env *env,
 			struct bpf_reg_state *reg,
 			const char *field_name, u32 btf_id)
@@ -7206,6 +7210,7 @@ static bool type_is_trusted_or_null(struct bpf_verifier_env *env,
 {
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket));
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct dentry));
+	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct));
 
 	return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id,
 					  "__safe_trusted_or_null");
diff --git a/tools/testing/selftests/bpf/progs/lsm.c b/tools/testing/selftests/bpf/progs/lsm.c
index 0c13b7409947..7de173daf27b 100644
--- a/tools/testing/selftests/bpf/progs/lsm.c
+++ b/tools/testing/selftests/bpf/progs/lsm.c
@@ -89,14 +89,16 @@ SEC("lsm/file_mprotect")
 int BPF_PROG(test_int_hook, struct vm_area_struct *vma,
 	     unsigned long reqprot, unsigned long prot, int ret)
 {
-	if (ret != 0)
+	struct mm_struct *mm = vma->vm_mm;
+
+	if (ret != 0 || !mm)
 		return ret;
 
 	__s32 pid = bpf_get_current_pid_tgid() >> 32;
 	int is_stack = 0;
 
-	is_stack = (vma->vm_start <= vma->vm_mm->start_stack &&
-		    vma->vm_end >= vma->vm_mm->start_stack);
+	is_stack = (vma->vm_start <= mm->start_stack &&
+		    vma->vm_end >= mm->start_stack);
 
 	if (is_stack && monitored_pid == pid) {
 		mprotect_count++;
-- 
2.47.3

From nobody Wed Oct 1 22:26:30 2025 Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com [209.85.215.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256
(128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D0E32857FC for ; Tue, 30 Sep 2025 06:00:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759212011; cv=none; b=C79tdmvW6tJJ2rYFmIjzzYu10twVkl+q+G3RRlt6mPZYHns8zUFf7VNsP8ddTA2Pu46m0P+tArRVkYUijBRbDru3e9gv85LQaEPyaQvJi42kzH2n3fRj/VXdAb+G2772RHT485SJJ70trrVceqCuYAPVzmplVPa25IQaHjDVpFI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759212011; c=relaxed/simple; bh=+qf9s7Xdxiy4sr7f2TT6UbIoM16KyqoO4xQCqkCt2gc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ESs63QwOeO6Y1qfP0VzuYLQD7FIT9AAxYOj4APp49rdDNBtNbEkt5TILPsGKIlZUlyjG0W9+C58mw0XyX73OHBEWT43ZVbrY/27IMirNMDuocGt7PnHG1IAlRTwFwEoYI3Y09fZv2by8xnsEvdFYNZLeLhzfwQrL0bfBHs6Hl0M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mTvvEGdx; arc=none smtp.client-ip=209.85.215.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mTvvEGdx" Received: by mail-pg1-f171.google.com with SMTP id 41be03b00d2f7-b556284db11so5337720a12.0 for ; Mon, 29 Sep 2025 23:00:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759212009; x=1759816809; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=q9AJux588gWgrV0mh0Bl4XWK/14sRfUKl951c+dMtJo=; 
b=mTvvEGdxC3uZf/9tPAufQYY0Zj7Jj2LEug/Q/dY32LzMHkW0A55Pzum+PnfoQ0wcHd iUXpmWKDgg/2LlYB0X2tet89vAHEBH2qDoA7F5PdHwnH6FHlQ4u3i9a9kndbutmfEhS9 ekX22GQeH9TkVnsNH5TV4LiWDsdq1gEZ3t3oTlXNmYbQD8CyqgUd59GS486Pm1legy7h 2/cm6xJEKS+NUXkXvRU+Rx7H9VhIpTc7/LmFDiKK5N7CAnO2shpYj2T/J6zcGjydj1/k 9An0iGN84ggvusbYZgnxLcEYKvvHbpVQwOszASwPOTuLBQga8tiDleoSD9AssTCC3SVD /4xA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759212009; x=1759816809; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=q9AJux588gWgrV0mh0Bl4XWK/14sRfUKl951c+dMtJo=; b=YFaYrfmlS8+YXt6NoAVsn4d0Q3lu8enoX21j+ekgefkHlgrHyXHLBbP0b1YGPOzbh0 BOCjPs3BsYT7vLT0xeYKrKT6DGHCWhKWUVQViC11v9DTbqviKHgobG2xCwVa63z11ghl 8t63CVj9gmYLozzwefwxLB934beVXMHCB7d8DD4TUhf0+x97MKBOh8jW3R1aeqJh+Z0w ma60wSRBnW0MSF/yHTFGpbHQ1DJuWLyEuMyp8KY9xEMwcOHrmdXohIlrXUR7pv9gsPwV 7DXwl//NGaHbSFbInJ+GXnBi4pT/AU9sN/I3Q/bkPV/Xh6D7HV5Xf4dRqgS7GwdDMojv EaBw== X-Forwarded-Encrypted: i=1; AJvYcCWCz0nAr/XKuSRR2zZqlkg5PcNZ8tBLdimqPeRnYfR2lsUm01NXscT1D7sNDXkOHKwmbtiApTq1Bs0HKk0=@vger.kernel.org X-Gm-Message-State: AOJu0Yy4ghks5CcuGLXfMW7AOWw9Y6IorYMG4C6bd/MPSnomGi7r3sdR 1qvM8ja4fpeSt+Bnlxe6lF8KODoMD5kSNEEXyuGRMikWA0DHKUwAiTNH X-Gm-Gg: ASbGnctPekIO0lHLzsaILJNX1pTW6y/5az59rq7a6GYnEPqe6H6BVUxGn14SUKxCjPE ALnS43dFZst5wUY8Kdi1Ppy85yRKckqk2t5GMnG849qgrWtcLUTYg+IZlTz8Fj+0Q3+PGNu326I tsh4M56Nsz8xbgG/Kg4yb8S4nW3Z7ukxwkO+6p+ZXkm1SGFF1ljz9/S/MaYxSVEIpJbOvkd0b2a UZszej5cZ5M/0gxf/dUi0bVhQDEWhftynL50Uz2r8Npin4u1yM+EGcb/6CnBmLc0dukFgjYn5D5 8G1GUwguBR+NJIsWKeadbzEce2oO/E98WOnKSqop6CC2YBI4Du+WroNOvYSZjtVKEsCB7QN+hFS mda5vWVwT43iVuCCFuA6j3xU1rl6DrnVQpFoEmyZ4BhwooRbAOxSfrBqA+Z7jBerWTLlyn+x5nD qAVqzMFdtslCIgigILdqt0+W5U5DExwJO//qqX8Q== X-Google-Smtp-Source: AGHT+IFxOy+OVpj7YYQUa1xeFuFT66rHzPXVfIt7SgvZyLCGZHlY21OivixoSzSZemCeMeGafMhUvg== X-Received: by 2002:a17:902:fc8d:b0:25c:2ed4:fd7f with SMTP id 
d9443c01a7336-27ed4a6e9e4mr225370925ad.42.1759212008758; Mon, 29 Sep 2025 23:00:08 -0700 (PDT) Received: from localhost.localdomain ([61.171.228.24]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66d43b8sm148834065ad.9.2025.09.29.22.59.58 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Sep 2025 23:00:08 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev, rdunlap@infradead.org Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v9 mm-new 08/11] selftests/bpf: add a simple BPF based THP policy Date: Tue, 30 Sep 2025 13:58:23 +0800 Message-Id: <20250930055826.9810-9-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com> References: <20250930055826.9810-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This test case implements a basic THP policy that sets THPeligible to 1 for a specific task and to 0 for all others. I selected THPeligible for verification because its straightforward nature makes it ideal for validating the BPF THP policy functionality. 
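
As a rough illustration of what checking THPeligible involves, the sketch below scans an smaps-style text excerpt held in memory and returns the THPeligible value of the mapping that contains a given address. It is not part of the selftest: the helper name thp_eligible_from_text() and the in-memory input are assumptions made for this example, whereas the test itself reads /proc/<pid>/smaps.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): walk an smaps-style text
 * buffer line by line and return the THPeligible value of the mapping
 * that contains addr, or -1 if no such mapping or field is found.
 */
static int thp_eligible_from_text(const char *text, unsigned long addr)
{
	unsigned long start, end;
	int in_vma = 0, eligible = -1;
	const char *line = text;

	while (line && *line) {
		const char *next = strchr(line, '\n');

		/* A "start-end ..." header line opens a new mapping. */
		if (sscanf(line, "%lx-%lx", &start, &end) == 2)
			in_vma = (addr >= start && addr < end);
		/* Within the target mapping, pick up its THPeligible field. */
		else if (in_vma &&
			 sscanf(line, "THPeligible: %d", &eligible) == 1)
			break;

		line = next ? next + 1 : NULL;
	}
	return eligible;
}
```

Working on a string rather than on /proc keeps the sketch independent of the running kernel; a caller would pass the contents of an smaps file together with the address returned by mmap().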
The following configs must be enabled for this test:

  CONFIG_BPF_THP=y
  CONFIG_MEMCG=y
  CONFIG_TRANSPARENT_HUGEPAGE=y

Signed-off-by: Yafang Shao
---
 MAINTAINERS                                   |   2 +
 tools/testing/selftests/bpf/config            |   3 +
 .../selftests/bpf/prog_tests/thp_adjust.c     | 257 ++++++++++++++++++
 .../selftests/bpf/progs/test_thp_adjust.c     |  41 +++
 4 files changed, 303 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/thp_adjust.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7be34b2a64fd..c1219bcd27c1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16260,6 +16260,8 @@ F:	mm/huge_memory.c
 F:	mm/huge_memory_bpf.c
 F:	mm/khugepaged.c
 F:	mm/mm_slot.h
+F:	tools/testing/selftests/bpf/prog_tests/thp_adjust.c
+F:	tools/testing/selftests/bpf/progs/test_thp_adjust*
 F:	tools/testing/selftests/mm/khugepaged.c
 F:	tools/testing/selftests/mm/split_huge_page_test.c
 F:	tools/testing/selftests/mm/transhuge-stress.c
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 8916ab814a3e..13711f773091 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -9,6 +9,7 @@ CONFIG_BPF_LIRC_MODE2=y
 CONFIG_BPF_LSM=y
 CONFIG_BPF_STREAM_PARSER=y
 CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_THP=y
 # CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
 CONFIG_CGROUP_BPF=y
 CONFIG_CRYPTO_HMAC=y
@@ -51,6 +52,7 @@ CONFIG_IPV6_TUNNEL=y
 CONFIG_KEYS=y
 CONFIG_LIRC=y
 CONFIG_LWTUNNEL=y
+CONFIG_MEMCG=y
 CONFIG_MODULE_SIG=y
 CONFIG_MODULE_SRCVERSION_ALL=y
 CONFIG_MODULE_UNLOAD=y
@@ -114,6 +116,7 @@ CONFIG_SECURITY=y
 CONFIG_SECURITYFS=y
 CONFIG_SYN_COOKIES=y
 CONFIG_TEST_BPF=m
+CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_UDMABUF=y
 CONFIG_USERFAULTFD=y
 CONFIG_VSOCKETS=y
diff --git a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
new file mode 100644
index 000000000000..0a5a43416f2f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include "test_thp_adjust.skel.h"
+
+#define LEN (16 * 1024 * 1024) /* 16MB */
+#define THP_ENABLED_FILE "/sys/kernel/mm/transparent_hugepage/enabled"
+#define PMD_SIZE_FILE "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
+
+static struct test_thp_adjust *skel;
+static char old_mode[32];
+static long pagesize;
+
+static int thp_mode_save(void)
+{
+	const char *start, *end;
+	char buf[128];
+	int fd, err;
+	size_t len;
+
+	fd = open(THP_ENABLED_FILE, O_RDONLY);
+	if (fd == -1)
+		return -1;
+
+	err = read(fd, buf, sizeof(buf) - 1);
+	if (err == -1)
+		goto close;
+
+	start = strchr(buf, '[');
+	end = start ? strchr(start, ']') : NULL;
+	if (!start || !end || end <= start) {
+		err = -1;
+		goto close;
+	}
+
+	len = end - start - 1;
+	if (len >= sizeof(old_mode))
+		len = sizeof(old_mode) - 1;
+	strncpy(old_mode, start + 1, len);
+	old_mode[len] = '\0';
+
+close:
+	close(fd);
+	return err;
+}
+
+static int thp_mode_set(const char *desired_mode)
+{
+	int fd, err;
+
+	fd = open(THP_ENABLED_FILE, O_RDWR);
+	if (fd == -1)
+		return -1;
+
+	err = write(fd, desired_mode, strlen(desired_mode));
+	close(fd);
+	return err;
+}
+
+static int thp_mode_reset(void)
+{
+	int fd, err;
+
+	fd = open(THP_ENABLED_FILE, O_WRONLY);
+	if (fd == -1)
+		return -1;
+
+	err = write(fd, old_mode, strlen(old_mode));
+	close(fd);
+	return err;
+}
+
+static char *thp_alloc(void)
+{
+	char *addr;
+	int err, i;
+
+	addr = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	err = madvise(addr, LEN, MADV_HUGEPAGE);
+	if (err == -1)
+		goto unmap;
+
+	/* Accessing a single byte within a page is sufficient to trigger a page fault. */
+	for (i = 0; i < LEN; i += pagesize)
+		addr[i] = 1;
+	return addr;
+
+unmap:
+	munmap(addr, LEN);
+	return NULL;
+}
+
+static void thp_free(char *ptr)
+{
+	munmap(ptr, LEN);
+}
+
+static int get_pmd_order(void)
+{
+	ssize_t bytes_read, size;
+	int fd, order, ret = -1;
+	char buf[64], *endptr;
+
+	fd = open(PMD_SIZE_FILE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	bytes_read = read(fd, buf, sizeof(buf) - 1);
+	if (bytes_read <= 0)
+		goto close_fd;
+
+	/* Remove potential newline character */
+	if (buf[bytes_read - 1] == '\n')
+		buf[bytes_read - 1] = '\0';
+
+	size = strtoul(buf, &endptr, 10);
+	if (endptr == buf || *endptr != '\0')
+		goto close_fd;
+	if (size % pagesize != 0)
+		goto close_fd;
+	ret = size / pagesize;
+	if ((ret & (ret - 1)) == 0) {
+		order = 0;
+		while (ret > 1) {
+			ret >>= 1;
+			order++;
+		}
+		ret = order;
+	}
+
+close_fd:
+	close(fd);
+	return ret;
+}
+
+static int get_thp_eligible(pid_t pid, unsigned long addr)
+{
+	int this_vma = 0, eligible = -1;
+	unsigned long start, end;
+	char smaps_path[64];
+	FILE *smaps_file;
+	char line[4096];
+
+	snprintf(smaps_path, sizeof(smaps_path), "/proc/%d/smaps", pid);
+	smaps_file = fopen(smaps_path, "r");
+	if (!smaps_file)
+		return -1;
+
+	while (fgets(line, sizeof(line), smaps_file)) {
+		if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
+			/* addr is monotonic */
+			if (addr < start)
+				break;
+			this_vma = (addr >= start && addr < end) ? 1 : 0;
+			continue;
+		}
+
+		if (!this_vma)
+			continue;
+
+		if (strstr(line, "THPeligible:")) {
+			sscanf(line, "THPeligible: %d", &eligible);
+			break;
+		}
+	}
+
+	fclose(smaps_file);
+	return eligible;
+}
+
+static void subtest_thp_eligible(void)
+{
+	struct bpf_link *ops_link;
+	int eligible;
+	pid_t pid;
+	char *ptr;
+
+	ops_link = bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops);
+	if (!ASSERT_OK_PTR(ops_link, "attach struct_ops"))
+		return;
+
+	pid = getpid();
+	ptr = thp_alloc();
+	if (!ASSERT_OK_PTR(ptr, "THP alloc"))
+		goto detach;
+
+	skel->bss->pid_eligible = pid;
+	eligible = get_thp_eligible(pid, (unsigned long)ptr);
+	ASSERT_EQ(eligible, 1, "THPeligible");
+
+	skel->bss->pid_eligible = 0;
+	skel->bss->pid_not_eligible = pid;
+	eligible = get_thp_eligible(pid, (unsigned long)ptr);
+	ASSERT_EQ(eligible, 0, "THP not eligible");
+
+	skel->bss->pid_eligible = 0;
+	skel->bss->pid_not_eligible = 0;
+	eligible = get_thp_eligible(pid, (unsigned long)ptr);
+	ASSERT_EQ(eligible, 0, "THP not eligible");
+
+	thp_free(ptr);
+detach:
+	bpf_link__destroy(ops_link);
+}
+
+static int thp_adjust_setup(void)
+{
+	int err = -1, pmd_order;
+
+	pagesize = sysconf(_SC_PAGESIZE);
+	pmd_order = get_pmd_order();
+	if (!ASSERT_NEQ(pmd_order, -1, "get_pmd_order"))
+		return -1;
+
+	if (!ASSERT_NEQ(thp_mode_save(), -1, "THP mode save"))
+		return -1;
+	if (!ASSERT_GE(thp_mode_set("madvise"), 0, "THP mode set"))
+		return -1;
+
+	skel = test_thp_adjust__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		goto thp_reset;
+
+	skel->bss->pmd_order = pmd_order;
+
+	err = test_thp_adjust__load(skel);
+	if (!ASSERT_OK(err, "load"))
+		goto destroy;
+	return 0;
+
+destroy:
+	test_thp_adjust__destroy(skel);
+thp_reset:
+	ASSERT_GE(thp_mode_reset(), 0, "THP mode reset");
+	return err;
+}
+
+static void thp_adjust_destroy(void)
+{
+	test_thp_adjust__destroy(skel);
+	ASSERT_GE(thp_mode_reset(), 0, "THP mode reset");
+}
+
+void test_thp_adjust(void)
+{
+	if (thp_adjust_setup() == -1)
+		return;
+
+	if (test__start_subtest("thp_eligible"))
+		subtest_thp_eligible();
+
+	thp_adjust_destroy();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust.c b/tools/testing/selftests/bpf/progs/test_thp_adjust.c
new file mode 100644
index 000000000000..74ad70c837ba
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_thp_adjust.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include
+#include
+
+char _license[] SEC("license") = "GPL";
+
+int pid_not_eligible, pid_eligible;
+int pmd_order;
+
+SEC("struct_ops/thp_get_order")
+int BPF_PROG(thp_eligible, struct vm_area_struct *vma, enum tva_type type,
+	     unsigned long orders)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	int suggested_order = 0;
+	struct task_struct *p;
+
+	if (type != TVA_SMAPS)
+		return 0;
+
+	if (!mm)
+		return 0;
+
+	/* This BPF hook is already under RCU */
+	p = mm->owner;
+	if (!p || (p->pid != pid_eligible && p->pid != pid_not_eligible))
+		return 0;
+
+	if (p->pid == pid_eligible)
+		suggested_order = pmd_order;
+	else
+		suggested_order = 30; /* invalid order */
+	return suggested_order;
+}
+
+SEC(".struct_ops.link")
+struct bpf_thp_ops thp_eligible_ops = {
+	.thp_get_order = (void *)thp_eligible,
+};
-- 
2.47.3

From nobody Wed Oct 1 22:26:30 2025 Received: from mail-pg1-f173.google.com (mail-pg1-f173.google.com [209.85.215.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EBE4839ACF for ; Tue, 30 Sep 2025 06:00:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759212021; cv=none; b=BD9Q43HSaJ8rJo9E35xaiNr4uUD73Ue/D6fT5hSYmUAhb34szRkoa0mHoeeFFVE2rUPnKv27+c9mOBOOvr467vUTGsliZr3xisBc/eBWxoHjNan2KBsArjhES79wxE9w/nsuWvnmjxW82MnD16ztPZd5kEwHI9H2fzGVC5pNwjw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759212021; c=relaxed/simple; bh=R5zbW5h2mVTkZ695ePmIiAB1l/JHUe56ilvlBhyPTSI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Vua3H73PZ6Y8QrkLubpEoqeQ4LnjYwDCepr9QgxuyLVkYT4Vjd0mYwWitAM3xFjC38UsCFnFqmoFlmiVtv2T7C8CggqcHJyWIrrY2ifdLHOOfkRAyC21KzarbyE06/cHsuZ/VPrRK2llhl+egy9BDGoEjWUdHW03Elg8opPB2Ow= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=io7zauPe; arc=none smtp.client-ip=209.85.215.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="io7zauPe" Received: by mail-pg1-f173.google.com with SMTP id 41be03b00d2f7-b5507d3ccd8so4765302a12.0 for ; Mon, 29 Sep 2025 23:00:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759212019; x=1759816819; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uIx6JtRigPCSTkRZZksmLfb/7I11NxpDSXp1+H8RPCU=; b=io7zauPe92j8Iab3UKDNWGr9ECNd6z3ovZSzeGNv3OienplqWGBhPqRAaX+G2FbOBM 7jKrM0acttAamBgZo7rX/m/seCs4z5XmxihPDTvlOpFHTS3KBdK5U3aEdzOYoOTQ0xHC 8nSZOO8KzqD8GsolLJvXu+ZmK3oh7qKiLHv5aGQxPjI4zYtavJ0uw6e0fpub35EPzt9X akRqBnPHWB6XPn+PW7UZIEKeZLp5TkNXgeh3GSNjbCWb09LXvGGBG5KSgmpIIz0sAp29 JTrhaEdY28DO3hAej/7l494SwIR5UAVfI0NvOcS0s5Kx52PJ1DoOTrmjgFhmYOk//hyd QdQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759212019; x=1759816819; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uIx6JtRigPCSTkRZZksmLfb/7I11NxpDSXp1+H8RPCU=; b=kagydq6NRa7mZusoyEnVqxmdzrUtQATYWadz+XTUGAOdUpSwOqSMwZ591RnD21CWtr iXFpeMpcrqSceMMmamMsb5nN4BsArvQJbVbItQr3Z+yrFdr76QV9MOhpmnZ3fKfZpy9M woKwfveUGHDY8TzhLgbpcRwCT9IP5I54loSGIweVAsysL9mo6ppf9dVXiTk0ZSK6cWd7 /u/aPGaZr9mm4ObBI3QXp/L+a0/4HZ9c99MOA4/oFivoyjR3nZiWwDFKbM9HxsElc33v 5PhEH92BqDORyyeoG4L+DT4ioUTZ3iSi0dq6xBU2AZBQ46BsgV679B79bK9m9MC3w3cq d1LA== X-Forwarded-Encrypted: i=1; AJvYcCVJ3TaBcyGaSfipxTuC3xoNdzUEHVk6lAvg407b7b2qvITsgiiTUYbOr8UuvIjib30ui17a+/m7sMCnllg=@vger.kernel.org X-Gm-Message-State: AOJu0YxnUKNsLRA/eL+gS3FbUHs7R0Uin6taEyWY6KksWW/qNzbpzYJ7 S5X7eY9RtRHXC/vq+ST16f8ChYTMDnOC1CzP08bdkt6cGHSknVuobJ7T X-Gm-Gg: ASbGncuNVU6kStmh/pdD5Q+ou7zBTGpj/9NBuS5nDb95+BODM1wc753y+b7jkE4juNE lAyaYuD3UnA9bpLJTNQKOvWxiRmPTIaMQmJLX4TKSn2smRv8i/T/PNhilvD7tnQ69YyGFuHmCTM 5OM3NJnTiuBPM62w7AaGOwKWyJjTkEBepSoKlhSMP+lBdITId+IbZP+OuHU9Tc0ZCDfMTzqh3ED kKm6ijnficx8MAI3DnzwSZYNNeq0zypTclpMyAU2/K8Gvvd9iWz+xRbu05G4Zcp5Pa9tI5fXi8u l7+eMy974Iz9GewDUNax6cAS4+Q9x3BWTPKaXl4DaoqWHDjMWPb8uYPQOcmGN0mrTC2rCHe8gjL LyxfW92/3Vtw6BgeWuxhp1uPM/oxwA+2NqfxD4iK56Tylv12XBovQE/FxL/1axE4RxB0LY1DSrt mbuO3FYG7Hy3BqVhLir0Y0VzZjtu7alCDal1E/LQ== X-Google-Smtp-Source: AGHT+IFMF0W1qCu4x4cjpNrUn5ZcQpDe9+QCy8Fcn/EjOKLeyug2Wicq+5aDDzqbHXeIKGe4nnQ0RQ== X-Received: by 2002:a17:903:244a:b0:25c:d4b6:f117 with SMTP id d9443c01a7336-27ed4a3de3cmr218087635ad.35.1759212019034; Mon, 29 Sep 2025 23:00:19 -0700 (PDT) Received: from localhost.localdomain ([61.171.228.24]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66d43b8sm148834065ad.9.2025.09.29.23.00.09 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Sep 2025 23:00:18 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, 
dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev, rdunlap@infradead.org Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v9 mm-new 09/11] selftests/bpf: add test case to update THP policy Date: Tue, 30 Sep 2025 13:58:24 +0800 Message-Id: <20250930055826.9810-10-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com> References: <20250930055826.9810-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This test case exercises the BPF THP update mechanism by modifying an existing policy. 
The behavior confirms that:
- An EBUSY error occurs when attempting to install a new BPF program while another is active
- Updates to the currently running program are processed successfully

Signed-off-by: Yafang Shao
---
 .../selftests/bpf/prog_tests/thp_adjust.c     | 23 +++++++++++++++++++
 .../selftests/bpf/progs/test_thp_adjust.c     | 14 +++++++++++
 2 files changed, 37 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
index 0a5a43416f2f..409ffe9e18f2 100644
--- a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
+++ b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
@@ -207,6 +207,27 @@ static void subtest_thp_eligible(void)
 	bpf_link__destroy(ops_link);
 }
 
+static void subtest_thp_policy_update(void)
+{
+	struct bpf_link *old_link, *new_link;
+	int err;
+
+	old_link = bpf_map__attach_struct_ops(skel->maps.swap_ops);
+	if (!ASSERT_OK_PTR(old_link, "attach_old_link"))
+		return;
+
+	new_link = bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops);
+	if (!ASSERT_NULL(new_link, "attach_new_link"))
+		goto destroy_old;
+	ASSERT_EQ(errno, EBUSY, "attach_new_link");
+
+	err = bpf_link__update_map(old_link, skel->maps.thp_eligible_ops);
+	ASSERT_EQ(err, 0, "update_old_link");
+
+destroy_old:
+	bpf_link__destroy(old_link);
+}
+
 static int thp_adjust_setup(void)
 {
 	int err = -1, pmd_order;
@@ -252,6 +273,8 @@ void test_thp_adjust(void)
 
 	if (test__start_subtest("thp_eligible"))
 		subtest_thp_eligible();
+	if (test__start_subtest("policy_update"))
+		subtest_thp_policy_update();
 
 	thp_adjust_destroy();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust.c b/tools/testing/selftests/bpf/progs/test_thp_adjust.c
index 74ad70c837ba..fc62f0c6f891 100644
--- a/tools/testing/selftests/bpf/progs/test_thp_adjust.c
+++ b/tools/testing/selftests/bpf/progs/test_thp_adjust.c
@@ -39,3 +39,17 @@ SEC(".struct_ops.link")
 struct bpf_thp_ops thp_eligible_ops = {
 	.thp_get_order = (void *)thp_eligible,
 };
+
+SEC("struct_ops/thp_get_order")
+int BPF_PROG(alloc_not_in_swap, struct vm_area_struct *vma, enum tva_type type,
+	     unsigned long orders)
+{
+	if (type == TVA_SWAP_PAGEFAULT)
+		return 0;
+	return -1;
+}
+
+SEC(".struct_ops.link")
+struct bpf_thp_ops swap_ops = {
+	.thp_get_order = (void *)alloc_not_in_swap,
+};
-- 
2.47.3

From nobody Wed Oct 1 22:26:30 2025 Received: from mail-pg1-f174.google.com (mail-pg1-f174.google.com [209.85.215.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1EC9228C869 for ; Tue, 30 Sep 2025 06:00:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759212032; cv=none; b=RCgVOnE855NF9ktzxvEtM2WO/nbnuMyTqucTwn3bzWcFBUTI2Hk4dyl2pmEQi7VsBLhHl5/9qjKcRX1K1xm3eqqNrjmfNhRhFQ9KLyU3dHoOhwEBICuX7tR9zpM8nkcSy5x9YanIBQDO9orZn0PgofQnuVEOcQV1JarxeSec1Yc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759212032; c=relaxed/simple; bh=megoui3FrcgMHtvlQs49ApxPOBMbnJrOEUpVHigRQTk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=BJIROqTxjmm1EFmS9ydYx3Uyqz6qYXkDfVBL0ct771rVM7+UE73tKT5uHv0JEYoMwGriJR/GdPsC1tYf5Kz+eRXaJl6YPuNoPPk8AXP4KFNYweT3MlmuA+thXTLOrHvPtOWJCUea3Dg+Veuly5HtY1uhMMp+u9dkx1n8nAu865w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=hAb3/aFp; arc=none smtp.client-ip=209.85.215.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key)
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
 baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
 dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
 gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org,
 daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com,
 rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com,
 shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev,
 rdunlap@infradead.org
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v9 mm-new 10/11] selftests/bpf: add test cases for invalid thp_adjust usage
Date: Tue, 30 Sep 2025 13:58:25 +0800
Message-Id: <20250930055826.9810-11-laoar.shao@gmail.com>
In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com>
References: <20250930055826.9810-1-laoar.shao@gmail.com>

1. The trusted vma->vm_mm pointer can be null and must be checked before
   dereferencing.
2. The trusted mm->owner pointer can be null and must be checked before
   dereferencing.
3. Sleepable programs are prohibited because the call site operates under
   RCU protection.

Signed-off-by: Yafang Shao
---
 .../selftests/bpf/prog_tests/thp_adjust.c     |  7 +++++
 .../bpf/progs/test_thp_adjust_sleepable.c     | 22 ++++++++++++++
 .../bpf/progs/test_thp_adjust_trusted_owner.c | 30 +++++++++++++++++++
 .../bpf/progs/test_thp_adjust_trusted_vma.c   | 27 +++++++++++++++++
 4 files changed, 86 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust_sleepable.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_owner.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_vma.c

diff --git a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
index 409ffe9e18f2..90af0322f775 100644
--- a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
+++ b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
@@ -3,6 +3,9 @@
 #include
 #include
 #include "test_thp_adjust.skel.h"
+#include "test_thp_adjust_sleepable.skel.h"
+#include "test_thp_adjust_trusted_vma.skel.h"
+#include "test_thp_adjust_trusted_owner.skel.h"
 
 #define LEN (16 * 1024 * 1024) /* 16MB */
 #define THP_ENABLED_FILE "/sys/kernel/mm/transparent_hugepage/enabled"
@@ -277,4 +280,8 @@ void test_thp_adjust(void)
 		subtest_thp_policy_update();
 
 	thp_adjust_destroy();
+
+	RUN_TESTS(test_thp_adjust_trusted_vma);
+	RUN_TESTS(test_thp_adjust_trusted_owner);
+	RUN_TESTS(test_thp_adjust_sleepable);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust_sleepable.c b/tools/testing/selftests/bpf/progs/test_thp_adjust_sleepable.c
new file mode 100644
index 000000000000..e3d70f258d84
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_thp_adjust_sleepable.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include
+#include
+
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops.s/thp_get_order")
+__failure __msg("attach to unsupported member thp_get_order of struct bpf_thp_ops")
+int BPF_PROG(thp_sleepable, struct vm_area_struct *vma, enum tva_type type,
+	     unsigned long orders)
+{
+	return -1;
+}
+
+SEC(".struct_ops.link")
+struct bpf_thp_ops vma_ops = {
+	.thp_get_order = (void *)thp_sleepable,
+};
diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_owner.c b/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_owner.c
new file mode 100644
index 000000000000..88bb09cb7cc2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_owner.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include
+#include
+
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops/thp_get_order")
+__failure __msg("R3 pointer arithmetic on rcu_ptr_or_null_ prohibited, null-check it first")
+int BPF_PROG(thp_trusted_owner, struct vm_area_struct *vma, enum tva_type tva_type,
+	     unsigned long orders)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct task_struct *p;
+
+	if (!mm)
+		return 0;
+
+	p = mm->owner;
+	bpf_printk("The task name is %s\n", p->comm);
+	return -1;
+}
+
+SEC(".struct_ops.link")
+struct bpf_thp_ops vma_ops = {
+	.thp_get_order = (void *)thp_trusted_owner,
+};
diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_vma.c b/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_vma.c
new file mode 100644
index 000000000000..df7b0c160153
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_vma.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include
+#include
+
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops/thp_get_order")
+__failure __msg("R1 invalid mem access 'trusted_ptr_or_null_'")
+int BPF_PROG(thp_trusted_vma, struct vm_area_struct *vma, enum tva_type tva_type,
+	     unsigned long orders)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct task_struct *p = mm->owner;
+
+	if (!p)
+		return 0;
+	return -1;
+}
+
+SEC(".struct_ops.link")
+struct bpf_thp_ops vma_ops = {
+	.thp_get_order = (void *)thp_trusted_vma,
+};
-- 
2.47.3

From nobody Wed Oct 1 22:26:30 2025
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
 baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
 dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
 gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org,
 daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com,
 rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com,
 shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev,
 rdunlap@infradead.org
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v9 mm-new 11/11] Documentation: add BPF-based THP policy management
Date: Tue, 30 Sep 2025 13:58:26 +0800
Message-Id: <20250930055826.9810-12-laoar.shao@gmail.com>
In-Reply-To: <20250930055826.9810-1-laoar.shao@gmail.com>
References: <20250930055826.9810-1-laoar.shao@gmail.com>

Document BPF-based THP policy management in
Documentation/admin-guide/mm/transhuge.rst.

Signed-off-by: Yafang Shao
---
 Documentation/admin-guide/mm/transhuge.rst | 39 ++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 1654211cc6cf..f6991c674329 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -738,3 +738,42 @@ support enabled just fine as always. No difference can be noted in
 hugetlbfs other than there will be less overall fragmentation. All
 usual features belonging to hugetlbfs are preserved and
 unaffected. libhugetlbfs will also work fine as usual.
+
+BPF THP
+=======
+
+Overview
+--------
+
+When the system is configured with "always" or "madvise" THP mode, a BPF program
+can be used to adjust THP allocation policies dynamically. This enables
+fine-grained control over THP decisions based on various factors including
+workload identity, allocation context, and system memory pressure.
+
+Program Interface
+-----------------
+
+This feature implements a struct_ops BPF program with the following interface::
+
+  int thp_get_order(struct vm_area_struct *vma,
+                    enum tva_type type,
+                    unsigned long orders);
+
+Parameters::
+
+  @vma: vm_area_struct associated with the THP allocation
+  @type: TVA type for current @vma
+  @orders: Bitmask of available THP orders for this allocation
+
+Return value::
+
+  The suggested THP order for allocation from the BPF program. Must be
+  a valid, available order.
+
+Implementation Notes
+--------------------
+
+This is currently an experimental feature. CONFIG_BPF_THP (EXPERIMENTAL) must be
+enabled to use it. Only one BPF program can be attached at a time, but the
+program can be updated dynamically to adjust policies without requiring affected
+tasks to be restarted.
-- 
2.47.3
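
The documentation in this patch describes @orders as a bitmask of available THP orders and requires the returned order to be a valid, available one. As a minimal userspace sketch (not part of this series), the two helpers below show how a policy might pick or validate an order against such a mask. Both helper names are hypothetical, and the sketch assumes that bit N of the mask stands for order N.

```c
#include <limits.h>

/*
 * Hypothetical helper, not part of the patch series: return the largest
 * order whose bit is set in @orders, or -1 when the mask is empty.
 * Assumption: bit N of @orders means "order N is available".
 */
static int highest_available_order(unsigned long orders)
{
	int order;

	for (order = (int)(sizeof(orders) * CHAR_BIT) - 1; order >= 0; order--) {
		if (orders & (1UL << order))
			return order;
	}
	return -1;
}

/*
 * Hypothetical check mirroring "Must be a valid, available order":
 * a suggested order is acceptable only if its bit is set in @orders.
 * Returns 1 when acceptable, 0 otherwise.
 */
static int order_is_available(unsigned long orders, int order)
{
	if (order < 0 || order >= (int)(sizeof(orders) * CHAR_BIT))
		return 0;
	return (orders & (1UL << order)) != 0;
}
```

With orders 4 and 9 set in the mask, highest_available_order() selects 9 and order_is_available() rejects order 5. A real policy would apply the same bit tests inside a struct_ops BPF program rather than in plain C.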