From nobody Wed Oct 1 22:26:32 2025
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v8 mm-new 01/12] mm: thp: remove disabled task from khugepaged_mm_slot
Date: Fri, 26 Sep 2025 17:33:32 +0800
Message-Id: <20250926093343.1000-2-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com>
References: <20250926093343.1000-1-laoar.shao@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Since a task with MMF_DISABLE_THP_COMPLETELY cannot use THP, remove it from the khugepaged_mm_slot to stop khugepaged from processing it.
After this change, the following semantic relationship always holds: MMF_VM_HUGEPAGE is set =3D=3D task is in khugepaged mm_slot MMF_VM_HUGEPAGE is not set =3D=3D task is not in khugepaged mm_slot Signed-off-by: Yafang Shao Acked-by: Lance Yang --- include/linux/khugepaged.h | 4 ++++ kernel/sys.c | 7 ++++-- mm/khugepaged.c | 49 ++++++++++++++++++++------------------ 3 files changed, 35 insertions(+), 25 deletions(-) diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index eb1946a70cff..f14680cd9854 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -15,6 +15,7 @@ extern void __khugepaged_enter(struct mm_struct *mm); extern void __khugepaged_exit(struct mm_struct *mm); extern void khugepaged_enter_vma(struct vm_area_struct *vma, vm_flags_t vm_flags); +extern void khugepaged_enter_mm(struct mm_struct *mm); extern void khugepaged_min_free_kbytes_update(void); extern bool current_is_khugepaged(void); extern int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long add= r, @@ -42,6 +43,9 @@ static inline void khugepaged_enter_vma(struct vm_area_st= ruct *vma, vm_flags_t vm_flags) { } +static inline void khugepaged_enter_mm(struct mm_struct *mm) +{ +} static inline int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, bool install_pmd) { diff --git a/kernel/sys.c b/kernel/sys.c index a46d9b75880b..2c445bf44ce3 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -2479,7 +2480,7 @@ static int prctl_set_thp_disable(bool thp_disable, un= signed long flags, /* Flags are only allowed when disabling. 
*/ if ((!thp_disable && flags) || (flags & ~PR_THP_DISABLE_EXCEPT_ADVISED)) return -EINVAL; - if (mmap_write_lock_killable(current->mm)) + if (mmap_write_lock_killable(mm)) return -EINTR; if (thp_disable) { if (flags & PR_THP_DISABLE_EXCEPT_ADVISED) { @@ -2493,7 +2494,9 @@ static int prctl_set_thp_disable(bool thp_disable, un= signed long flags, mm_flags_clear(MMF_DISABLE_THP_COMPLETELY, mm); mm_flags_clear(MMF_DISABLE_THP_EXCEPT_ADVISED, mm); } - mmap_write_unlock(current->mm); + + khugepaged_enter_mm(mm); + mmap_write_unlock(mm); return 0; } =20 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 7ab2d1a42df3..f47ac8c19447 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -396,15 +396,10 @@ void __init khugepaged_destroy(void) kmem_cache_destroy(mm_slot_cache); } =20 -static inline int hpage_collapse_test_exit(struct mm_struct *mm) -{ - return atomic_read(&mm->mm_users) =3D=3D 0; -} - static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm) { - return hpage_collapse_test_exit(mm) || - mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm); + return !atomic_read(&mm->mm_users) || /* exit */ + mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm); /* disable */ } =20 static bool hugepage_pmd_enabled(void) @@ -437,7 +432,7 @@ void __khugepaged_enter(struct mm_struct *mm) int wakeup; =20 /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm); + VM_WARN_ON_ONCE(hpage_collapse_test_exit_or_disable(mm)); if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) return; =20 @@ -460,14 +455,25 @@ void __khugepaged_enter(struct mm_struct *mm) wake_up_interruptible(&khugepaged_wait); } =20 +void khugepaged_enter_mm(struct mm_struct *mm) +{ + if (mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm)) + return; + if (mm_flags_test(MMF_VM_HUGEPAGE, mm)) + return; + if (!hugepage_pmd_enabled()) + return; + + __khugepaged_enter(mm); +} + void khugepaged_enter_vma(struct vm_area_struct *vma, vm_flags_t vm_flags) { - if 
(!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) && - hugepage_pmd_enabled()) { - if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) - __khugepaged_enter(vma->vm_mm); - } + if (!thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) + return; + + khugepaged_enter_mm(vma->vm_mm); } =20 void __khugepaged_exit(struct mm_struct *mm) @@ -491,7 +497,7 @@ void __khugepaged_exit(struct mm_struct *mm) } else if (slot) { /* * This is required to serialize against - * hpage_collapse_test_exit() (which is guaranteed to run + * hpage_collapse_test_exit_or_disable() (which is guaranteed to run * under mmap sem read mode). Stop here (after we return all * pagetables will be destroyed) until khugepaged has finished * working on the pagetables under the mmap_lock. @@ -1429,16 +1435,13 @@ static void collect_mm_slot(struct mm_slot *slot) =20 lockdep_assert_held(&khugepaged_mm_lock); =20 - if (hpage_collapse_test_exit(mm)) { + if (hpage_collapse_test_exit_or_disable(mm)) { /* free mm_slot */ hash_del(&slot->hash); list_del(&slot->mm_node); =20 - /* - * Not strictly needed because the mm exited already. - * - * mm_flags_clear(MMF_VM_HUGEPAGE, mm); - */ + /* If the mm is disabled, this flag must be cleared. */ + mm_flags_clear(MMF_VM_HUGEPAGE, mm); =20 /* khugepaged_mm_lock actually not necessary for the below */ mm_slot_free(mm_slot_cache, slot); @@ -1749,7 +1752,7 @@ static void retract_page_tables(struct address_space = *mapping, pgoff_t pgoff) if (find_pmd_or_thp_or_none(mm, addr, &pmd) !=3D SCAN_SUCCEED) continue; =20 - if (hpage_collapse_test_exit(mm)) + if (hpage_collapse_test_exit_or_disable(mm)) continue; /* * When a vma is registered with uffd-wp, we cannot recycle @@ -2500,9 +2503,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, VM_BUG_ON(khugepaged_scan.mm_slot !=3D slot); /* * Release the current mm_slot if this mm is about to die, or - * if we scanned all vmas of this mm. 
+ * if we scanned all vmas of this mm, or if this mm is disabled. */ - if (hpage_collapse_test_exit(mm) || !vma) { + if (hpage_collapse_test_exit_or_disable(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find --=20 2.47.3

From nobody Wed Oct 1 22:26:32 2025
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao, Yang Shi
Subject: [PATCH v8 mm-new 02/12] mm: thp: remove vm_flags parameter from khugepaged_enter_vma()
Date: Fri, 26 Sep 2025 17:33:33 +0800
Message-Id: <20250926093343.1000-3-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com>
References: <20250926093343.1000-1-laoar.shao@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The
khugepaged_enter_vma() function requires handling in two specific scenarios:

1. New VMA creation
   When a new VMA is created, if vma->vm_mm is not present in khugepaged_mm_slot, it must be added. In this case, khugepaged_enter_vma() is called after vma->vm_flags have been set, allowing direct use of the VMA's flags.

2. VMA flag modification
   When vma->vm_flags are modified (particularly when VM_HUGEPAGE is set), the system must recheck whether to add vma->vm_mm to khugepaged_mm_slot. Currently, khugepaged_enter_vma() is called before the flag update, so the call must be relocated to occur after vma->vm_flags have been set.

Additionally, khugepaged_enter_vma() is invoked in other contexts, such as during VMA merging. However, these calls are unnecessary because the existing VMA already ensures that vma->vm_mm is registered in khugepaged_mm_slot. While removing these redundant calls represents a potential optimization, that change should be addressed separately. Because VMA merging only occurs when the vm_flags of both VMAs are identical (excluding special flags like VM_SOFTDIRTY), we can safely use target->vm_flags instead of vmg->vm_flags.

After this change, we can further remove the vm_flags parameter from thp_vma_allowable_order(). That will be handled in a follow-up patch.
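
For illustration only, the ordering concern in scenario 2 can be sketched as a small userspace model. This is not kernel code: every toy_* name and the TOY_VM_HUGEPAGE value are hypothetical stand-ins, and the model ignores locking, sysfs settings and every other eligibility check. It only shows why a registration helper that reads the flags from the VMA itself has to run after the flag update.

```c
/* Toy userspace model of the call-ordering issue; NOT kernel code.
 * All toy_* identifiers are hypothetical and exist only for this sketch. */
#include <assert.h>
#include <stdbool.h>

#define TOY_VM_HUGEPAGE 0x1UL	/* stand-in for VM_HUGEPAGE */

struct toy_mm {
	bool in_khugepaged_slot;	/* models membership in khugepaged_mm_slot */
};

struct toy_vma {
	unsigned long vm_flags;
	struct toy_mm *vm_mm;
};

/* Models a single-argument enter helper: eligibility is derived
 * from the VMA's own flags, not from a separately passed value. */
static void toy_khugepaged_enter_vma(struct toy_vma *vma)
{
	if (!(vma->vm_flags & TOY_VM_HUGEPAGE))
		return;
	vma->vm_mm->in_khugepaged_slot = true;
}

/* What would happen if the parameter were dropped without moving the
 * call: the helper runs before the flag is set and sees stale flags. */
static void toy_madvise_enter_then_update(struct toy_vma *vma)
{
	toy_khugepaged_enter_vma(vma);
	vma->vm_flags |= TOY_VM_HUGEPAGE;
}

/* Relocated call: update the flags first, then re-check. */
static void toy_madvise_update_then_enter(struct toy_vma *vma)
{
	vma->vm_flags |= TOY_VM_HUGEPAGE;
	toy_khugepaged_enter_vma(vma);
}

/* Runs both orderings and checks the resulting slot membership. */
static int toy_demo(void)
{
	struct toy_mm mm_a = { .in_khugepaged_slot = false };
	struct toy_mm mm_b = { .in_khugepaged_slot = false };
	struct toy_vma vma_a = { .vm_flags = 0, .vm_mm = &mm_a };
	struct toy_vma vma_b = { .vm_flags = 0, .vm_mm = &mm_b };

	toy_madvise_enter_then_update(&vma_a);
	toy_madvise_update_then_enter(&vma_b);

	assert(!mm_a.in_khugepaged_slot);	/* stale flags: registration missed */
	assert(mm_b.in_khugepaged_slot);	/* updated flags: mm registered */
	return 0;
}
```

Calling toy_demo() from a main() exercises both orderings; only the update-then-enter variant leaves the mm registered.
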
Signed-off-by: Yafang Shao Cc: Yang Shi --- include/linux/khugepaged.h | 6 ++---- mm/huge_memory.c | 2 +- mm/khugepaged.c | 11 ++--------- mm/madvise.c | 7 +++++++ mm/vma.c | 6 +++--- 5 files changed, 15 insertions(+), 17 deletions(-) diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index f14680cd9854..b30814d3d665 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -13,8 +13,7 @@ extern void khugepaged_destroy(void); extern int start_stop_khugepaged(void); extern void __khugepaged_enter(struct mm_struct *mm); extern void __khugepaged_exit(struct mm_struct *mm); -extern void khugepaged_enter_vma(struct vm_area_struct *vma, - vm_flags_t vm_flags); +extern void khugepaged_enter_vma(struct vm_area_struct *vma); extern void khugepaged_enter_mm(struct mm_struct *mm); extern void khugepaged_min_free_kbytes_update(void); extern bool current_is_khugepaged(void); @@ -39,8 +38,7 @@ static inline void khugepaged_fork(struct mm_struct *mm, = struct mm_struct *oldmm static inline void khugepaged_exit(struct mm_struct *mm) { } -static inline void khugepaged_enter_vma(struct vm_area_struct *vma, - vm_flags_t vm_flags) +static inline void khugepaged_enter_vma(struct vm_area_struct *vma) { } static inline void khugepaged_enter_mm(struct mm_struct *mm) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 1b81680b4225..ac6601f30e65 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1346,7 +1346,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault= *vmf) ret =3D vmf_anon_prepare(vmf); if (ret) return ret; - khugepaged_enter_vma(vma, vma->vm_flags); + khugepaged_enter_vma(vma); =20 if (!(vmf->flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm) && diff --git a/mm/khugepaged.c b/mm/khugepaged.c index f47ac8c19447..04121ae7d18d 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -353,12 +353,6 @@ int hugepage_madvise(struct vm_area_struct *vma, #endif *vm_flags &=3D ~VM_NOHUGEPAGE; *vm_flags |=3D VM_HUGEPAGE; - /* - * 
If the vma become good for khugepaged to scan, - * register it here without waiting a page fault that - * may not happen any time soon. - */ - khugepaged_enter_vma(vma, *vm_flags); break; case MADV_NOHUGEPAGE: *vm_flags &=3D ~VM_HUGEPAGE; @@ -467,10 +461,9 @@ void khugepaged_enter_mm(struct mm_struct *mm) __khugepaged_enter(mm); } =20 -void khugepaged_enter_vma(struct vm_area_struct *vma, - vm_flags_t vm_flags) +void khugepaged_enter_vma(struct vm_area_struct *vma) { - if (!thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) + if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDE= R)) return; =20 khugepaged_enter_mm(vma->vm_mm); diff --git a/mm/madvise.c b/mm/madvise.c index 35ed4ab0d7c5..ab8b5d47badb 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1425,6 +1425,13 @@ static int madvise_vma_behavior(struct madvise_behav= ior *madv_behavior) VM_WARN_ON_ONCE(madv_behavior->lock_mode !=3D MADVISE_MMAP_WRITE_LOCK); =20 error =3D madvise_update_vma(new_flags, madv_behavior); + /* + * If the vma become good for khugepaged to scan, + * register it here without waiting a page fault that + * may not happen any time soon. + */ + if (!error && new_flags & VM_HUGEPAGE) + khugepaged_enter_mm(vma->vm_mm); out: /* * madvise() returns EAGAIN if kernel resources, such as diff --git a/mm/vma.c b/mm/vma.c index a1ec405bda25..6a548b0d64cd 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -973,7 +973,7 @@ static __must_check struct vm_area_struct *vma_merge_ex= isting_range( if (err || commit_merge(vmg)) goto abort; =20 - khugepaged_enter_vma(vmg->target, vmg->vm_flags); + khugepaged_enter_vma(vmg->target); vmg->state =3D VMA_MERGE_SUCCESS; return vmg->target; =20 @@ -1093,7 +1093,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma= _merge_struct *vmg) * following VMA if we have VMAs on both sides. 
*/ if (vmg->target && !vma_expand(vmg)) { - khugepaged_enter_vma(vmg->target, vmg->vm_flags); + khugepaged_enter_vma(vmg->target); vmg->state =3D VMA_MERGE_SUCCESS; return vmg->target; } @@ -2520,7 +2520,7 @@ static int __mmap_new_vma(struct mmap_state *map, str= uct vm_area_struct **vmap) * call covers the non-merge case. */ if (!vma_is_anonymous(vma)) - khugepaged_enter_vma(vma, map->vm_flags); + khugepaged_enter_vma(vma); *vmap =3D vma; return 0; =20 --=20 2.47.3

From nobody Wed Oct 1 22:26:32 2025
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v8 mm-new 03/12] mm: thp: remove vm_flags parameter from thp_vma_allowable_order()
Date: Fri, 26 Sep 2025 17:33:34 +0800
Message-Id: <20250926093343.1000-4-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com>
References: <20250926093343.1000-1-laoar.shao@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Because all calls to thp_vma_allowable_order() pass vma->vm_flags as the vma_flags argument, we can remove the parameter and have the function access vma->vm_flags directly. Signed-off-by: Yafang Shao Acked-by: Usama Arif --- fs/proc/task_mmu.c | 3 +-- include/linux/huge_mm.h | 16 ++++++++-------- mm/huge_memory.c | 4 ++-- mm/khugepaged.c | 10 +++++----- mm/memory.c | 11 +++++------ mm/shmem.c | 2 +- 6 files changed, 22 insertions(+), 24 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index fc35a0543f01..e713d1905750 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1369,8 +1369,7 @@ static int show_smap(struct seq_file *m, void *v) __show_smap(m, &mss, false); =20 seq_printf(m, "THPeligible: %8u\n", - !!thp_vma_allowable_orders(vma, vma->vm_flags, TVA_SMAPS, - THP_ORDERS_ALL)); + !!thp_vma_allowable_orders(vma, TVA_SMAPS, THP_ORDERS_ALL)); =20 if (arch_pkeys_enabled()) seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index f327d62fc985..a635dcbb2b99 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -101,8 +101,8 @@ enum tva_type { TVA_FORCED_COLLAPSE, /* Forced collapse (e.g. MADV_COLLAPSE). 
*/
 };
 
-#define thp_vma_allowable_order(vma, vm_flags, type, order) \
-	(!!thp_vma_allowable_orders(vma, vm_flags, type, BIT(order)))
+#define thp_vma_allowable_order(vma, type, order) \
+	(!!thp_vma_allowable_orders(vma, type, BIT(order)))
 
 #define split_folio(f) split_folio_to_list(f, NULL)
 
@@ -266,14 +266,12 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
 }
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
-					 vm_flags_t vm_flags,
 					 enum tva_type type,
 					 unsigned long orders);
 
 /**
  * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
  * @vma: the vm area to check
- * @vm_flags: use these vm_flags instead of vma->vm_flags
  * @type: TVA type
  * @orders: bitfield of all orders to consider
  *
@@ -287,10 +285,11 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
  */
 static inline
 unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
-				       vm_flags_t vm_flags,
 				       enum tva_type type,
 				       unsigned long orders)
 {
+	vm_flags_t vm_flags = vma->vm_flags;
+
 	/*
 	 * Optimization to check if required orders are enabled early. Only
 	 * forced collapse ignores sysfs configs.
@@ -309,7 +308,7 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 		return 0;
 	}
 
-	return __thp_vma_allowable_orders(vma, vm_flags, type, orders);
+	return __thp_vma_allowable_orders(vma, type, orders);
 }
 
 struct thpsize {
@@ -329,8 +328,10 @@ struct thpsize {
  * through madvise or prctl.
  */
 static inline bool vma_thp_disabled(struct vm_area_struct *vma,
-		vm_flags_t vm_flags, bool forced_collapse)
+		bool forced_collapse)
 {
+	vm_flags_t vm_flags = vma->vm_flags;
+
 	/* Are THPs disabled for this VMA? */
 	if (vm_flags & VM_NOHUGEPAGE)
 		return true;
@@ -560,7 +561,6 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
 }
 
 static inline unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
-					vm_flags_t vm_flags,
 					enum tva_type type,
 					unsigned long orders)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ac6601f30e65..1ac476fe6dc5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -98,7 +98,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 }
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
-					 vm_flags_t vm_flags,
 					 enum tva_type type,
 					 unsigned long orders)
 {
@@ -106,6 +105,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	const bool in_pf = type == TVA_PAGEFAULT;
 	const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
 	unsigned long supported_orders;
+	vm_flags_t vm_flags = vma->vm_flags;
 
 	/* Check the intersection of requested and supported orders. */
 	if (vma_is_anonymous(vma))
@@ -122,7 +122,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	if (!vma->vm_mm)		/* vdso */
 		return 0;
 
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags, forced_collapse))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, forced_collapse))
 		return 0;
 
 	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 04121ae7d18d..9eeb868adcd3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -463,7 +463,7 @@ void khugepaged_enter_mm(struct mm_struct *mm)
 
 void khugepaged_enter_vma(struct vm_area_struct *vma)
 {
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, TVA_KHUGEPAGED, PMD_ORDER))
 		return;
 
 	khugepaged_enter_mm(vma->vm_mm);
@@ -915,7 +915,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 
 	if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
 		return SCAN_ADDRESS_RANGE;
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, type, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, type, PMD_ORDER))
 		return SCAN_VMA_CHECK;
 	/*
 	 * Anon VMA expected, the address may be unmapped then
@@ -1526,7 +1526,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	 * and map it by a PMD, regardless of sysfs THP settings. As such, let's
 	 * analogously elide sysfs THP settings here and force collapse.
 	 */
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, TVA_FORCED_COLLAPSE, PMD_ORDER))
 		return SCAN_VMA_CHECK;
 
 	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
@@ -2421,7 +2421,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 			progress++;
 			break;
 		}
-		if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) {
+		if (!thp_vma_allowable_order(vma, TVA_KHUGEPAGED, PMD_ORDER)) {
 skip:
 			progress++;
 			continue;
@@ -2752,7 +2752,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
 
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, TVA_FORCED_COLLAPSE, PMD_ORDER))
 		return -EINVAL;
 
 	cc = kmalloc(sizeof(*cc), GFP_KERNEL);
diff --git a/mm/memory.c b/mm/memory.c
index 7e32eb79ba99..cd04e4894725 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4558,7 +4558,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
 	 * and suitable for swapping THP.
 	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags, TVA_PAGEFAULT,
+	orders = thp_vma_allowable_orders(vma, TVA_PAGEFAULT,
 					  BIT(PMD_ORDER) - 1);
 	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
 	orders = thp_swap_suitable_orders(swp_offset(entry),
@@ -5107,7 +5107,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	 * for this vma. Then filter out the orders that can't be allocated over
 	 * the faulting address and still be fully contained in the vma.
 	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags, TVA_PAGEFAULT,
+	orders = thp_vma_allowable_orders(vma, TVA_PAGEFAULT,
 					  BIT(PMD_ORDER) - 1);
 	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
 
@@ -5379,7 +5379,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio, struct page *pa
 	 * PMD mappings if THPs are disabled. As we already have a THP,
 	 * behave as if we are forcing a collapse.
 	 */
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags,
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma,
 						     /* forced_collapse=*/ true))
 		return ret;
 
@@ -6280,7 +6280,6 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		.gfp_mask = __get_fault_gfp_mask(vma),
 	};
 	struct mm_struct *mm = vma->vm_mm;
-	vm_flags_t vm_flags = vma->vm_flags;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	vm_fault_t ret;
@@ -6295,7 +6294,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		return VM_FAULT_OOM;
 retry_pud:
 	if (pud_none(*vmf.pud) &&
-	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PUD_ORDER)) {
+	    thp_vma_allowable_order(vma, TVA_PAGEFAULT, PUD_ORDER)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
@@ -6329,7 +6328,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		goto retry_pud;
 
 	if (pmd_none(*vmf.pmd) &&
-	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
+	    thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index 4855eee22731..cc2c90656b66 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1780,7 +1780,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	vm_flags_t vm_flags = vma ? vma->vm_flags : 0;
 	unsigned int global_orders;
 
-	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags, shmem_huge_force)))
+	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, shmem_huge_force)))
 		return 0;
 
 	global_orders = shmem_huge_global_enabled(inode, index, write_end,
-- 
2.47.3

From nobody Wed Oct 1 22:26:32 2025
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
	gutierrez.asier@huawei-partners.com, willy@infradead.org,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net,
	21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org,
	lance.yang@linux.dev
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v8 mm-new 04/12] mm: thp: add support for BPF based THP order selection
Date: Fri, 26 Sep 2025 17:33:35 +0800
Message-Id: <20250926093343.1000-5-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com>
References: <20250926093343.1000-1-laoar.shao@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This patch introduces a new BPF struct_ops called bpf_thp_ops for dynamic
THP tuning. It includes a hook bpf_hook_thp_get_orders(), allowing BPF
programs to influence THP order selection based on factors such as:

- Workload identity
  For example, workloads running in specific containers or cgroups.
- Allocation context
  Whether the allocation occurs during a page fault, khugepaged, swap or
  other paths.
- VMA's memory advice settings
  MADV_HUGEPAGE or MADV_NOHUGEPAGE
- Memory pressure
  PSI system data or associated cgroup PSI metrics

The kernel API of this new BPF hook is as follows:

/**
 * thp_order_fn_t: Get the suggested THP order from a BPF program for allocation
 * @vma: vm_area_struct associated with the THP allocation
 * @type: TVA type for current @vma
 * @orders: Bitmask of available THP orders for this allocation
 *
 * Return: The suggested THP order for allocation from the BPF program. Must be
 *         a valid, available order.
 */
typedef int thp_order_fn_t(struct vm_area_struct *vma,
			   enum tva_type type,
			   unsigned long orders);

Only a single BPF program can be attached at any given time, though it can
be dynamically updated to adjust the policy. The implementation supports
anonymous THP, shmem THP, and mTHP, with future extensions planned for
file-backed THP.

This functionality is only active when system-wide THP is configured to
madvise or always mode. It remains disabled in never mode. Additionally,
if THP is explicitly disabled for a specific task via prctl(), this BPF
functionality will also be unavailable for that task.

This BPF hook enables the implementation of flexible THP allocation
policies at the system, per-cgroup, or per-task level.

This feature requires CONFIG_BPF_THP_GET_ORDER_EXPERIMENTAL to be enabled.
Note that this capability is currently unstable and may undergo
significant changes, including potential removal, in future kernel
versions.
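
The masking semantics described above can be illustrated with a minimal
userspace sketch. This is not the kernel code: every model_* and MODEL_*
name below is hypothetical, and treating an out-of-range suggestion as
"no THP" is an assumption of the sketch (the patch itself applies BIT()
to the returned value directly).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's enum tva_type. */
enum model_tva_type {
	MODEL_TVA_SMAPS,
	MODEL_TVA_PAGEFAULT,
	MODEL_TVA_KHUGEPAGED,
	MODEL_TVA_FORCED_COLLAPSE,
};

/* Hypothetical policy callback: returns one order, or a negative value. */
typedef int (*model_policy_fn)(enum model_tva_type type, unsigned long orders);

#define MODEL_NR_ORDERS	((int)(sizeof(unsigned long) * CHAR_BIT))
#define MODEL_BIT(n)	(1UL << (n))

/*
 * Narrow @orders to the single order suggested by @policy.
 * Assumption of this sketch: an out-of-range suggestion yields 0 (no THP).
 */
unsigned long model_apply_policy(model_policy_fn policy,
				 enum model_tva_type type,
				 unsigned long orders)
{
	int order;

	if (policy == NULL)	/* nothing attached: orders unchanged */
		return orders;

	order = policy(type, orders);
	if (order < 0 || order >= MODEL_NR_ORDERS)
		return 0;

	/* A suggestion that was not offered in @orders also yields 0. */
	return orders & MODEL_BIT(order);
}

/* Example policy: largest offered order for khugepaged, smallest otherwise. */
int model_example_policy(enum model_tva_type type, unsigned long orders)
{
	int order;

	if (type == MODEL_TVA_KHUGEPAGED) {
		for (order = MODEL_NR_ORDERS - 1; order >= 0; order--)
			if (orders & MODEL_BIT(order))
				return order;
		return -1;
	}
	for (order = 0; order < MODEL_NR_ORDERS; order++)
		if (orders & MODEL_BIT(order))
			return order;
	return -1;
}

/* Usage: orders 2, 4 and 9 are offered. */
void model_selfcheck(void)
{
	const unsigned long offered = MODEL_BIT(2) | MODEL_BIT(4) | MODEL_BIT(9);

	assert(model_apply_policy(NULL, MODEL_TVA_PAGEFAULT, offered) == offered);
	assert(model_apply_policy(model_example_policy, MODEL_TVA_KHUGEPAGED,
				  offered) == MODEL_BIT(9));
	assert(model_apply_policy(model_example_policy, MODEL_TVA_PAGEFAULT,
				  offered) == MODEL_BIT(2));
}
```

The point of the model is that the callback can only narrow the offered
bitmask; it cannot introduce an order the caller did not offer.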
Suggested-by: David Hildenbrand
Suggested-by: Lorenzo Stoakes
Signed-off-by: Yafang Shao
---
 MAINTAINERS             |   1 +
 include/linux/huge_mm.h |  23 +++++
 mm/Kconfig              |  12 +++
 mm/Makefile             |   1 +
 mm/huge_memory_bpf.c    | 204 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 241 insertions(+)
 create mode 100644 mm/huge_memory_bpf.c

diff --git a/MAINTAINERS b/MAINTAINERS
index ca8e3d18eedd..7be34b2a64fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16257,6 +16257,7 @@ F:	include/linux/huge_mm.h
 F:	include/linux/khugepaged.h
 F:	include/trace/events/huge_memory.h
 F:	mm/huge_memory.c
+F:	mm/huge_memory_bpf.c
 F:	mm/khugepaged.c
 F:	mm/mm_slot.h
 F:	tools/testing/selftests/mm/khugepaged.c
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a635dcbb2b99..fea94c059bed 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -56,6 +56,7 @@ enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
 	TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG,
 	TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG,
+	TRANSPARENT_HUGEPAGE_BPF_ATTACHED, /* BPF prog is attached */
 };
 
 struct kobject;
@@ -269,6 +270,23 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 enum tva_type type,
 					 unsigned long orders);
 
+#ifdef CONFIG_BPF_THP_GET_ORDER_EXPERIMENTAL
+
+unsigned long
+bpf_hook_thp_get_orders(struct vm_area_struct *vma, enum tva_type type,
+			unsigned long orders);
+
+#else
+
+static inline unsigned long
+bpf_hook_thp_get_orders(struct vm_area_struct *vma, enum tva_type type,
+			unsigned long orders)
+{
+	return orders;
+}
+
+#endif
+
 /**
  * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
  * @vma: the vm area to check
@@ -290,6 +308,11 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 {
 	vm_flags_t vm_flags = vma->vm_flags;
 
+	/* The BPF-specified order overrides which order is selected. */
+	orders &= bpf_hook_thp_get_orders(vma, type, orders);
+	if (!orders)
+		return 0;
+
 	/*
 	 * Optimization to check if required orders are enabled early. Only
 	 * forced collapse ignores sysfs configs.
diff --git a/mm/Kconfig b/mm/Kconfig
index bde9f842a4a8..fd7459eecb2d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -895,6 +895,18 @@ config NO_PAGE_MAPCOUNT
 
 	  EXPERIMENTAL because the impact of some changes is still unclear.
 
+config BPF_THP_GET_ORDER_EXPERIMENTAL
+	bool "BPF-based THP order selection (EXPERIMENTAL)"
+	depends on TRANSPARENT_HUGEPAGE && BPF_SYSCALL
+
+	help
+	  Enable dynamic THP order selection using BPF programs. This
+	  experimental feature allows custom BPF logic to determine optimal
+	  transparent hugepage allocation sizes at runtime.
+
+	  WARNING: This feature is unstable and may change in future kernel
+	  versions.
+
 endif # TRANSPARENT_HUGEPAGE
 
 # simple helper to make the code a bit easier to read
diff --git a/mm/Makefile b/mm/Makefile
index 21abb3353550..62ebfa23635a 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -99,6 +99,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_NUMA) += memory-tiers.o
 obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
+obj-$(CONFIG_BPF_THP_GET_ORDER_EXPERIMENTAL) += huge_memory_bpf.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
 obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
 obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
diff --git a/mm/huge_memory_bpf.c b/mm/huge_memory_bpf.c
new file mode 100644
index 000000000000..b59a65d70a93
--- /dev/null
+++ b/mm/huge_memory_bpf.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF-based THP policy management
+ *
+ * Author: Yafang Shao
+ */
+
+#include
+#include
+#include
+#include
+
+/**
+ * @thp_order_fn_t: Get the suggested THP order from a BPF program for allocation
+ * @vma: vm_area_struct associated with the THP allocation
+ * @type: TVA type for current @vma
+ * @orders: Bitmask of available THP orders for this allocation
+ *
+ * Return: The suggested THP order for allocation from the BPF program. Must be
+ * a valid, available order.
+ */
+typedef int thp_order_fn_t(struct vm_area_struct *vma,
+			   enum tva_type type,
+			   unsigned long orders);
+
+struct bpf_thp_ops {
+	thp_order_fn_t __rcu *thp_get_order;
+};
+
+static struct bpf_thp_ops bpf_thp;
+static DEFINE_SPINLOCK(thp_ops_lock);
+
+unsigned long bpf_hook_thp_get_orders(struct vm_area_struct *vma,
+				      enum tva_type type,
+				      unsigned long orders)
+{
+	thp_order_fn_t *bpf_hook_thp_get_order;
+	int bpf_order;
+
+	/* No BPF program is attached */
+	if (!test_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED,
+		      &transparent_hugepage_flags))
+		return orders;
+
+	rcu_read_lock();
+	bpf_hook_thp_get_order = rcu_dereference(bpf_thp.thp_get_order);
+	if (!bpf_hook_thp_get_order)
+		goto out;
+
+	bpf_order = bpf_hook_thp_get_order(vma, type, orders);
+	orders &= BIT(bpf_order);
+
+out:
+	rcu_read_unlock();
+	return orders;
+}
+
+static bool bpf_thp_ops_is_valid_access(int off, int size,
+					enum bpf_access_type type,
+					const struct bpf_prog *prog,
+					struct bpf_insn_access_aux *info)
+{
+	return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
+}
+
+static const struct bpf_func_proto *
+bpf_thp_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	return bpf_base_func_proto(func_id, prog);
+}
+
+static const struct bpf_verifier_ops thp_bpf_verifier_ops = {
+	.get_func_proto = bpf_thp_get_func_proto,
+	.is_valid_access = bpf_thp_ops_is_valid_access,
+};
+
+static int bpf_thp_init(struct btf *btf)
+{
+	return 0;
+}
+
+static int bpf_thp_check_member(const struct btf_type *t,
+				const struct btf_member *member,
+				const struct bpf_prog *prog)
+{
+	/* The call site operates under RCU protection. */
+	if (prog->sleepable)
+		return -EINVAL;
+	return 0;
+}
+
+static int bpf_thp_init_member(const struct btf_type *t,
+			       const struct btf_member *member,
+			       void *kdata, const void *udata)
+{
+	return 0;
+}
+
+static int bpf_thp_reg(void *kdata, struct bpf_link *link)
+{
+	struct bpf_thp_ops *ops = kdata;
+
+	spin_lock(&thp_ops_lock);
+	if (test_and_set_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED,
+			     &transparent_hugepage_flags)) {
+		spin_unlock(&thp_ops_lock);
+		return -EBUSY;
+	}
+	WARN_ON_ONCE(rcu_access_pointer(bpf_thp.thp_get_order));
+	rcu_assign_pointer(bpf_thp.thp_get_order, ops->thp_get_order);
+	spin_unlock(&thp_ops_lock);
+	return 0;
+}
+
+static void bpf_thp_unreg(void *kdata, struct bpf_link *link)
+{
+	thp_order_fn_t *old_fn;
+
+	spin_lock(&thp_ops_lock);
+	clear_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED, &transparent_hugepage_flags);
+	old_fn = rcu_replace_pointer(bpf_thp.thp_get_order, NULL,
+				     lockdep_is_held(&thp_ops_lock));
+	WARN_ON_ONCE(!old_fn);
+	spin_unlock(&thp_ops_lock);
+
+	synchronize_rcu();
+}
+
+static int bpf_thp_update(void *kdata, void *old_kdata, struct bpf_link *link)
+{
+	thp_order_fn_t *old_fn, *new_fn;
+	struct bpf_thp_ops *old = old_kdata;
+	struct bpf_thp_ops *ops = kdata;
+	int ret = 0;
+
+	if (!ops || !old)
+		return -EINVAL;
+
+	spin_lock(&thp_ops_lock);
+	/* The prog has already been removed. */
+	if (!test_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED,
+		      &transparent_hugepage_flags)) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	new_fn = rcu_dereference(ops->thp_get_order);
+	old_fn = rcu_replace_pointer(bpf_thp.thp_get_order, new_fn,
+				     lockdep_is_held(&thp_ops_lock));
+	WARN_ON_ONCE(!old_fn || !new_fn);
+
+out:
+	spin_unlock(&thp_ops_lock);
+	if (!ret)
+		synchronize_rcu();
+	return ret;
+}
+
+static int bpf_thp_validate(void *kdata)
+{
+	struct bpf_thp_ops *ops = kdata;
+
+	if (!ops->thp_get_order) {
+		pr_err("bpf_thp: required ops isn't implemented\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int bpf_thp_get_order(struct vm_area_struct *vma,
+			     enum tva_type type,
+			     unsigned long orders)
+{
+	return -1;
+}
+
+static struct bpf_thp_ops __bpf_thp_ops = {
+	.thp_get_order = (thp_order_fn_t __rcu *)bpf_thp_get_order,
+};
+
+static struct bpf_struct_ops bpf_bpf_thp_ops = {
+	.verifier_ops = &thp_bpf_verifier_ops,
+	.init = bpf_thp_init,
+	.check_member = bpf_thp_check_member,
+	.init_member = bpf_thp_init_member,
+	.reg = bpf_thp_reg,
+	.unreg = bpf_thp_unreg,
+	.update = bpf_thp_update,
+	.validate = bpf_thp_validate,
+	.cfi_stubs = &__bpf_thp_ops,
+	.owner = THIS_MODULE,
+	.name = "bpf_thp_ops",
+};
+
+static int __init bpf_thp_ops_init(void)
+{
+	int err;
+
+	err = register_bpf_struct_ops(&bpf_bpf_thp_ops, bpf_thp_ops);
+	if (err)
+		pr_err("bpf_thp: Failed to register struct_ops (%d)\n", err);
+	return err;
+}
+late_initcall(bpf_thp_ops_init);
-- 
2.47.3

From nobody Wed Oct 1 22:26:32 2025
From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
	gutierrez.asier@huawei-partners.com, willy@infradead.org,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net,
	21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org,
	lance.yang@linux.dev
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v8 mm-new 05/12] mm: thp: decouple THP allocation between swap and page fault paths
Date: Fri, 26 Sep 2025 17:33:36 +0800
Message-Id: <20250926093343.1000-6-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com>
References: <20250926093343.1000-1-laoar.shao@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The new BPF capability enables finer-grained THP policy decisions by
introducing separate handling for swap faults versus normal page faults.

As highlighted by Barry:

  We’ve observed that swapping in large folios can lead to more
  swap thrashing for some workloads, e.g. kernel build. Consequently,
  some workloads might prefer swapping in smaller folios than those
  allocated by alloc_anon_folio().

While prctl() could potentially be extended to leverage this new policy,
doing so would require modifications to the uAPI.
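
As a rough illustration of what distinguishing the two fault types makes
possible, the following userspace sketch picks smaller orders on swap-in
than on an ordinary fault. It only mirrors the shape of the enum; the
sketch_* names and the SKETCH_SWAP_MAX_ORDER cap are hypothetical and
chosen purely for illustration, not taken from the series.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace mirror of enum tva_type after this patch. */
enum sketch_tva_type {
	SKETCH_TVA_SMAPS,
	SKETCH_TVA_PAGEFAULT,		/* non-swap page fault */
	SKETCH_TVA_KHUGEPAGED,
	SKETCH_TVA_FORCED_COLLAPSE,
	SKETCH_TVA_SWAP_PAGEFAULT,	/* swap page fault */
};

/* Hypothetical cap for swap-in, chosen only for illustration. */
#define SKETCH_SWAP_MAX_ORDER 4

/* Both fault flavours count as "in page fault", as in the in_pf change. */
int sketch_in_pf(enum sketch_tva_type type)
{
	return type == SKETCH_TVA_PAGEFAULT ||
	       type == SKETCH_TVA_SWAP_PAGEFAULT;
}

/* Highest order set in @orders that is <= @max_order, or -1 if none. */
int sketch_highest_order(unsigned long orders, int max_order)
{
	int order;

	for (order = max_order; order >= 0; order--)
		if (orders & (1UL << order))
			return order;
	return -1;
}

/* Policy: small folios on swap-in, largest offered order everywhere else. */
int sketch_get_order(enum sketch_tva_type type, unsigned long orders)
{
	const int top = (int)(sizeof(unsigned long) * CHAR_BIT) - 1;

	if (type == SKETCH_TVA_SWAP_PAGEFAULT)
		return sketch_highest_order(orders, SKETCH_SWAP_MAX_ORDER);
	return sketch_highest_order(orders, top);
}

/* Usage: orders 2, 4 and 9 are offered. */
void sketch_selfcheck(void)
{
	const unsigned long offered = (1UL << 2) | (1UL << 4) | (1UL << 9);

	assert(sketch_in_pf(SKETCH_TVA_SWAP_PAGEFAULT));
	assert(!sketch_in_pf(SKETCH_TVA_KHUGEPAGED));
	assert(sketch_get_order(SKETCH_TVA_PAGEFAULT, offered) == 9);
	assert(sketch_get_order(SKETCH_TVA_SWAP_PAGEFAULT, offered) == 4);
}
```

Before this patch both paths passed TVA_PAGEFAULT, so a policy of this
kind could not tell them apart.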
Signed-off-by: Yafang Shao
Reviewed-by: Lorenzo Stoakes
Cc: Barry Song <21cnbao@gmail.com>
Acked-by: Usama Arif
---
 include/linux/huge_mm.h | 3 ++-
 mm/huge_memory.c        | 2 +-
 mm/memory.c             | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fea94c059bed..bd30694f6a9c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -97,9 +97,10 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 
 enum tva_type {
 	TVA_SMAPS,		/* Exposing "THPeligible:" in smaps. */
-	TVA_PAGEFAULT,		/* Serving a page fault. */
+	TVA_PAGEFAULT,		/* Serving a non-swap page fault. */
 	TVA_KHUGEPAGED,		/* Khugepaged collapse. */
 	TVA_FORCED_COLLAPSE,	/* Forced collapse (e.g. MADV_COLLAPSE). */
+	TVA_SWAP_PAGEFAULT,	/* serving a swap page fault. */
 };
 
 #define thp_vma_allowable_order(vma, type, order) \
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1ac476fe6dc5..08372dfcb41a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -102,7 +102,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long orders)
 {
 	const bool smaps = type == TVA_SMAPS;
-	const bool in_pf = type == TVA_PAGEFAULT;
+	const bool in_pf = (type == TVA_PAGEFAULT || type == TVA_SWAP_PAGEFAULT);
 	const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
 	unsigned long supported_orders;
 	vm_flags_t vm_flags = vma->vm_flags;
diff --git a/mm/memory.c b/mm/memory.c
index cd04e4894725..58ea0f93f79e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4558,7 +4558,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
 	 * and suitable for swapping THP.
*/ - orders =3D thp_vma_allowable_orders(vma, TVA_PAGEFAULT, + orders =3D thp_vma_allowable_orders(vma, TVA_SWAP_PAGEFAULT, BIT(PMD_ORDER) - 1); orders =3D thp_vma_suitable_orders(vma, vmf->address, orders); orders =3D thp_swap_suitable_orders(swp_offset(entry), --=20 2.47.3 From nobody Wed Oct 1 22:26:32 2025 Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 63DAE2BD58C for ; Fri, 26 Sep 2025 09:34:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879282; cv=none; b=lX0UvyTJimgqB3iSn3l044+bAxvxS/KXMJDMPDsh2EqbFi5xOoQQ2VNTUChqwNtilzp8jvX1umkIF+VPecMj9ITg4R3IIKjcjm5bptfrYcZHuh0Khh3LIk1g9RK2IM2czQH60y3nMarw4482slD7UNhvdvfggt3ImpVFaOJ8xiA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879282; c=relaxed/simple; bh=Hma/ACUcxIvuWRcfKm0+Fz7g3KjXZp3i+HAKPNpNOmc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=sCgniHaGMwIgxsHuCUQGY4b86SRlQ/H746fgzJlpptx8otCczambltkXRZ12uRMZ2kVgHc6XJRMJ2H30SxlUW0gIKDHiJBingL6i7QBRSTxC74ssbTCdLTOu0ZumBTu1oaMCFeVytnTOTy/YmKkTJFK1ySECVsyOCmG4QdZpH1g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mrYUXZJs; arc=none smtp.client-ip=209.85.214.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mrYUXZJs" Received: by mail-pl1-f181.google.com 
with SMTP id d9443c01a7336-27d4d6b7ab5so29856565ad.2 for ; Fri, 26 Sep 2025 02:34:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1758879281; x=1759484081; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=QcUAVjuBCwqD/o526Djm1QGIR4OCMDMMqLBu73ohlCo=; b=mrYUXZJs8rHPD7HirvQTBTLLXrSkbLGEe3s34QhuXZrUR5So0DuAvy3gLIl1Euq0g0 CB+q9/AMLR+b6WtwAZN47ChdrepFj644hnSXZt5blK3BStiTovf9zQzih80++uRu/KU3 5qgDsvUZumMVjT3g2use4odlE64/c3rFvWh0ymdNs0M4gk9lgu5/8wcgYYy39Snp6P1y FdOAsUocNCRhHvYiSKJsUJIAKKGnCaX7NVcw09rYZ0Sx2FBcTkMUv/ks3bYn74bfll6y vdQIsFVdrWIPjsxcY5xS9bU8P9dUFIIOnbadP1AkHCTbd77IXjengbXwMBl/kyuauXkv YIYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758879281; x=1759484081; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QcUAVjuBCwqD/o526Djm1QGIR4OCMDMMqLBu73ohlCo=; b=eEVxn3RGE9hDsGfzas5o1iA43w8YV3768hgi4d+OWYQDaHroK3gNpjcVerhpBJeOCh zJ23CsrqiRfJXVqpomnl2zZTELVjtBcG+p3JEX6lpsdhyJJa/Qx63Yl529gaZUHVowiz EENKcitZvoaYMy/lLFehaSo2qkIwOeyG084fW/FP3iS4/W0nxbcjF0J0zdjIFynhtZkc KsVfT7cG18OeaaIjBdCHiO+ia5Aw5eIc4AuJeo0VsotQKzl2jYUWRsNnQrkzZXPT7aGH cBILGAtSdEiSQgLIs6qbsa3Igfh2f3MX+ln3emuwdSsRvY384Cu5+3ud1vxL3J9H1WGP Po4w== X-Forwarded-Encrypted: i=1; AJvYcCVXO5AKJaYXhvRg6wh2vOcRs35nC//82qbxcmqnnQcqgVByOU/sIAD3nQG7MhP/7oElfBmQ2Y6KWu4msZM=@vger.kernel.org X-Gm-Message-State: AOJu0Yzihee9bIhZGqqyTu90bOc8FD4ebov3DKcaADLcxNWMOTnxmXeN l0DxqYon+1FsPrGyZPH8Kv0VdjpOTwPU4/qFt1gUV8H163JMl+m1LpUf X-Gm-Gg: ASbGncvlji9kTR/CtopgZnRKKja8mVKSpGX9r1W9230HmFv4S0NutfoT5zOUKArpUrV 81n7FG8n8zjj5uOPeRRtDtGGGEDLstNMTrCb/9f8xd7RBhtxnIlbMM0dmVbBuK25pOvvJnFsnzP N4b3PdWzphHW2JldriVq/vDw7V6Eh7ZZrFuXF3DTNuAMQEVApPzSwniIB3CTzEtxKQOVsrPUhyX 
wkFdUJhwqM8r7NRUBn5ScCj3rkepFRdmwVTyADO2lvCJqGxTxrgH3dWnpYzUyA7v7vg3Yb0zgAE kwn5X26nwD6K2q3E2XLvrli2br4TfvrKd6bEQtrOyRnZ+C7DhhlkT0teEDDFqWaDTnnsLtblgk3 39DMRKgVkm5qFAOsW6ULpjo2z4+TAiftnTSsVyqCXEajshn4RwfUdb2DzxYbqOPg+Qz6qNpCI9w 7pHoiotw/pq3Dh X-Google-Smtp-Source: AGHT+IHpTOJHleE7ACdFvjLA52er0X3g6rm9mXYHGB20zbDxAkuQgTkRYnpRqU5bnQ67XgLmD1Xa5Q== X-Received: by 2002:a17:903:19e6:b0:266:f01a:98c4 with SMTP id d9443c01a7336-27ed49dece8mr69854105ad.13.1758879280625; Fri, 26 Sep 2025 02:34:40 -0700 (PDT) Received: from localhost.localdomain ([2409:891f:1c21:566:e1d1:c082:790c:7be6]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66cda43sm49247475ad.25.2025.09.26.02.34.33 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 26 Sep 2025 02:34:40 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v8 mm-new 06/12] mm: thp: enable THP allocation exclusively through khugepaged Date: Fri, 26 Sep 2025 17:33:37 +0800 Message-Id: <20250926093343.1000-7-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com> References: <20250926093343.1000-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" 
khugepaged_enter_vma() ultimately invokes any attached BPF function with
the TVA_KHUGEPAGED flag set when determining whether or not to enable
khugepaged THP for a freshly faulted-in VMA.

Currently, on fault, we invoke this in do_huge_pmd_anonymous_page(), which
is called by create_huge_pmd(), and only after we have already checked
that an allowable TVA_PAGEFAULT order is specified.

Since we might want to disallow THP on fault-in but allow it via
khugepaged, we move things around so we always attempt to enter khugepaged
upon fault.

This change is safe because:
- the checks for thp_vma_allowable_order(TVA_KHUGEPAGED) and
  thp_vma_allowable_order(TVA_PAGEFAULT) are functionally equivalent
- khugepaged operates at the MM level rather than per-VMA. Because THP
  allocation might fail during page faults due to transient conditions
  (e.g., memory pressure), it is safe to add this MM to khugepaged for
  subsequent defragmentation.

While we could also extend prctl() to utilize this new policy, such a
change would require a uAPI modification to PR_SET_THP_DISABLE.
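The reordering described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct fault_model,
struct fault_outcome, fault_before() and fault_after() are hypothetical
names invented for illustration, not kernel APIs, and the model ignores
the vmf_anon_prepare() failure path and the checks performed inside
khugepaged_enter_vma() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
struct fault_model {
	bool pmd_is_none;      /* models pmd_none(*vmf.pmd) */
	bool vma_is_anon;      /* models vma_is_anonymous(vma) */
	bool pagefault_thp_ok; /* models thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER) */
};

struct fault_outcome {
	bool entered_khugepaged; /* khugepaged_enter_vma() would be called */
	bool tried_huge_pmd;     /* create_huge_pmd() would be called */
};

/*
 * Ordering before the patch: khugepaged_enter_vma() sits inside the
 * anonymous huge PMD path, so it is reached only when fault-time THP
 * is allowed.
 */
static struct fault_outcome fault_before(struct fault_model m)
{
	struct fault_outcome out = { false, false };

	if (m.pmd_is_none && m.pagefault_thp_ok) {
		out.tried_huge_pmd = true;
		if (m.vma_is_anon)
			out.entered_khugepaged = true;
	}
	return out;
}

/*
 * Ordering after the patch: an anonymous VMA is offered to khugepaged
 * whenever the PMD is empty, independent of the fault-time decision.
 */
static struct fault_outcome fault_after(struct fault_model m)
{
	struct fault_outcome out = { false, false };

	if (m.pmd_is_none) {
		if (m.vma_is_anon)
			out.entered_khugepaged = true;
		if (m.pagefault_thp_ok)
			out.tried_huge_pmd = true;
	}
	return out;
}

/* Key case: fault-time THP disallowed on an anonymous VMA. */
static void check_khugepaged_only_case(void)
{
	struct fault_model m = { true, true, false };

	assert(!fault_before(m).entered_khugepaged);
	assert(fault_after(m).entered_khugepaged);
	assert(!fault_after(m).tried_huge_pmd);
}
```

In this model the only input for which the two orderings differ is an
anonymous VMA with an empty PMD where the page fault path is denied THP,
which is the "khugepaged only" configuration this patch is meant to enable.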
Signed-off-by: Yafang Shao Acked-by: Lance Yang --- mm/huge_memory.c | 1 - mm/memory.c | 13 ++++++++----- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 08372dfcb41a..2b155a734c78 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1346,7 +1346,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault= *vmf) ret =3D vmf_anon_prepare(vmf); if (ret) return ret; - khugepaged_enter_vma(vma); =20 if (!(vmf->flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm) && diff --git a/mm/memory.c b/mm/memory.c index 58ea0f93f79e..64f91191ffff 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6327,11 +6327,14 @@ static vm_fault_t __handle_mm_fault(struct vm_area_= struct *vma, if (pud_trans_unstable(vmf.pud)) goto retry_pud; =20 - if (pmd_none(*vmf.pmd) && - thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) { - ret =3D create_huge_pmd(&vmf); - if (!(ret & VM_FAULT_FALLBACK)) - return ret; + if (pmd_none(*vmf.pmd)) { + if (vma_is_anonymous(vma)) + khugepaged_enter_vma(vma); + if (thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) { + ret =3D create_huge_pmd(&vmf); + if (!(ret & VM_FAULT_FALLBACK)) + return ret; + } } else { vmf.orig_pmd =3D pmdp_get_lockless(vmf.pmd); =20 --=20 2.47.3 From nobody Wed Oct 1 22:26:32 2025 Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B841B2C0323 for ; Fri, 26 Sep 2025 09:34:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879290; cv=none; b=SoMuWr5OQ2O/dymfsxWEFsmiBNQYj24b4+KdvMdKQuDSo5ox5eUYYs1C+YOxv8TBLA4rjPNLSnArHgTtszjuuJ+Z+hQcvpL4SIhEEnaIa3h+axUxoLLaRe3d2tRtu8j7pCRo2rJs8MegmZNtqZmjkEOpyL7Duf6vRTS/Cbhz5gs= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1758879290; c=relaxed/simple; bh=CXI0qMvWqpeePDr4xBG9e2uqemr9IpjeVUqDleKN93I=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=YfGxcw2ZIfcSxD66VCNm+OPV2tJgKzQT/+0a7LhvjL7gVBkP52n67SKRmVjSEZodpR9UJNGdY9PTYU6+LaoD4J+1Up5wtHc0xc/eh9vZsP3cbbvjZBrHIFsTkbOcOHTgaOdiOSUOBcbIeHfOQogTv/fH+B5n40JAJUAqFZBDl9Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=DP8ZBPeR; arc=none smtp.client-ip=209.85.214.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="DP8ZBPeR" Received: by mail-pl1-f174.google.com with SMTP id d9443c01a7336-244580523a0so21147175ad.1 for ; Fri, 26 Sep 2025 02:34:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1758879288; x=1759484088; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=oOprrquoKlbkEreJojZ4KykFg8G0TswcVk79ghRz+Tk=; b=DP8ZBPeR5oR1hon/YL568Z773lSOyxCG4Z+gZ1kHN0RH4xkQ6MEp6q4BSo9DC2Q/R4 IXhE2EY/ogG6D1MV1CeWxmoVWyZMv5j0X1xq708K0wC80ly1w+mBMn24k6ARY5v1CrKr XR8NYrFaelqnSyv8jIE46QvZ4FVfYm2V1CgT1xvAGjKaNSc501uTFbiNCVmMeoZsWK9u M5hC2gJxnIWMk5vpq6q7D9fuC4rZsWZX8ecHwe0gWFnu9e+nOINMyVQ68oBFYFQn0QBr BhFjMLETOR60eoiZFOI+r4JQdbbBoaOY4PoxuEUJeFP/FabMkPoxihECfc0wGwg6kKMY u6yQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758879288; x=1759484088; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oOprrquoKlbkEreJojZ4KykFg8G0TswcVk79ghRz+Tk=; b=XmdiySPUNvTesop2KoPxzetUMDWFH8g8HG2aXYqR3SRBlVQFxZQgYjD245IRylnDII Jv8+Zt+xTXnAnV9JiFMER7VcZ27M0UZ3fQ5XInWRQNAL74dpnSZQtoYGxAre1pKrvc1b t/9QJnLzKfem/fc57SI1c8WQ2M/IJJjG0hNVq+BNaeJ87OqzJPkw1TWM1a83ER5yh0Fl /WPvlBIYvO2upbvQoYjyCKpr/oaX2OqpzkmHCOy8HY71OCuOn9KpzOX/LeO/wVlB2JV7 K56JR52dms1zPQa3WNIh7WWNkeY4+1UQUAi2Css3fYLE/PVkgEo9/O5P1uq+xsvD6Vol vZdQ== X-Forwarded-Encrypted: i=1; AJvYcCXTjXBOQDfu+hWblenbDniMDD24e4OxzqWLBuQimDIrpGpUrgc+jOQ9EMb+jBhco0LqftAx8mkCn+RaM+s=@vger.kernel.org X-Gm-Message-State: AOJu0Yy2vhJon4GQUc+Y4py3tCb5CBczcWlHntIfxOB4FtsoPl+ydevr U0mj+nsqopgZztgsoW/5ivfEsFT+XdtCaQFUUrAai3ku4TimxUv8qx8U X-Gm-Gg: ASbGnctjQTYp06WNEGKu9cwnMoHLqd5hcQ6ikjQtO7jbR4AOwiaWdWYF0pRFXsA2nvP lJ4CA16gx3PlG5Nt5WFC2WfnWiSJ+UsTF9gpO2N9bt4/6fT+oPBf5wvPU9vpH3oLtvl8hxsMTkU gBQx6yO++adnU7pd5kayROQEjeAQHIPaJWZucGWlmIoWx5BrqzShj6LZJcV1uIrlcAOlhT6X8Nw tkPcN1PeUxeu6jl4wrFtE+Mgj2t9KWyNh9UxyCeJGPYKn/S3LvS7eTgckPWmVzUvnFMg/MHleMd anx1p+L/ZdjW7icEhmtLCbvLYybovYMkPKrRJYbOudQG+XmpJdgewf74XJ2eaQqcjiL0cSW8cA7 Kv4Olj/fbihRQhyUA5+o73KkYAAzXFz0f2T/ydt1Q8HHCyCxO1R8lGhyE6RcX7ZJCbScu0pZfhr VyFLxECxCV1k0w X-Google-Smtp-Source: AGHT+IHyNvkqZ/H58qTgrC7RxZoAtdYJitsFBBvDlhUuvEFdmMnVmU2Ltz5BgywZtZV48lyMJHsZWw== X-Received: by 2002:a17:902:fb4b:b0:26a:b9b4:8342 with SMTP id d9443c01a7336-27ed49e9308mr63284705ad.25.1758879287971; Fri, 26 Sep 2025 02:34:47 -0700 (PDT) Received: from localhost.localdomain ([2409:891f:1c21:566:e1d1:c082:790c:7be6]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66cda43sm49247475ad.25.2025.09.26.02.34.41 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 26 Sep 2025 02:34:47 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, 
	dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v8 mm-new 07/12] bpf: mark mm->owner as __safe_rcu_or_null
Date: Fri, 26 Sep 2025 17:33:38 +0800
Message-Id: <20250926093343.1000-8-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com>
References: <20250926093343.1000-1-laoar.shao@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

When CONFIG_MEMCG is enabled, we can access mm->owner under RCU. The
owner can be NULL. With this change, BPF helpers can safely access
mm->owner to retrieve the associated task from the mm. We can then make
policy decisions based on the task's attributes.
The typical use case is as follows, bpf_rcu_read_lock(); // rcu lock must be held for rcu trusted field @owner =3D @mm->owner; // mm_struct::owner is rcu trusted or null if (!@owner) goto out; /* Do something based on the task attribute */ out: bpf_rcu_read_unlock(); Suggested-by: Andrii Nakryiko Signed-off-by: Yafang Shao Acked-by: Lorenzo Stoakes --- kernel/bpf/verifier.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c4f69a9e9af6..d400e18ee31e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7123,6 +7123,9 @@ BTF_TYPE_SAFE_RCU(struct cgroup_subsys_state) { /* RCU trusted: these fields are trusted in RCU CS and can be NULL */ BTF_TYPE_SAFE_RCU_OR_NULL(struct mm_struct) { struct file __rcu *exe_file; +#ifdef CONFIG_MEMCG + struct task_struct __rcu *owner; +#endif }; =20 /* skb->sk, req->sk are not RCU protected, but we mark them as such --=20 2.47.3 From nobody Wed Oct 1 22:26:32 2025 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 446C42BDC19 for ; Fri, 26 Sep 2025 09:34:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879297; cv=none; b=r2KSnOsukiKV+1Z1wRBjJpouzJjBF7BvL2xGamULt9yqZHlSxKnEGIkdxBJ0TfbDGWZpHboTAL4TNj3IzmYhsm33FKnmcxtQ5KpxEqnze8Zs7/quP+sEGKw7ykqhU9ABf+CeTzKD5CTXQeV7yGdK0VrAYiqUj3MESIARAOYQdiU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879297; c=relaxed/simple; bh=/PMdDC4FDfpJHwrfvPwAn92Vwj5QW0m0NvTBiXm9L2o=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=G2NMIBXBqEhN4Bu3EqWkLARLdd98QCmX7kw9k6F7kKi0r0snXtb3A50wxPiCv/RLOP86g8YD1iYjLp/wWhozHfDrbhuogvF3GCECkL7DTVp1d9ortaXLuTzzzmgnJKjsAR+Xcshe84qDOJnxTNWdaXaIwmmq+giFRGOWfsJvJlI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ZCtYvl9i; arc=none smtp.client-ip=209.85.214.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ZCtYvl9i" Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-279e2554b5fso16903355ad.1 for ; Fri, 26 Sep 2025 02:34:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1758879295; x=1759484095; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=n/w5JYrEq/PCfn3Rwv8DsVtkgCdC0Y7Hm3GpSfLGEBg=; b=ZCtYvl9ikGi0ehtdQzWeMnMzEojSZ6FTITmWcAXduTITfYKK65sIhl6Nmon9yJV4sb ymURvhbIT7bwm/FO44u9OJe/hGFOqmpFOrrj//Q/sXkL+OSVFF0q3F/F9zFDmtxwu233 QCxSmln2MvlFQjEDUw9qUDj3HBJQjs92knQR3yPTKQJ+MzxqbHkp/SUrbI4IIUD+i9GM SI7nLgnxRURhVF4aXTgT4xniTfCK45ADYePZDnYFIMFEEtEVYlPDD/5z2orUH0HM/jLA B1v819hx4+t8Til25dUtRiLIXTIlevSx66iO2h1mL0AhSBxrpgT7gBv459oMF7WzgtA9 LDBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758879295; x=1759484095; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=n/w5JYrEq/PCfn3Rwv8DsVtkgCdC0Y7Hm3GpSfLGEBg=; b=f16uf8hH1NOJccTdeME8IEvZwdewZcgrTP8a4D0mrGaP4qUDu+UCmfX7ISjAV9Ak+b 
0idc9GKY3yq5vCDsmjDV8kcmIRLZ8B2x9LYF41ocBdlZbzJM5W9Bf7biigVJLpwTnhNG z8S4ASV2pXDTv/y6tcMDXAEbrjGwgfie5id+7EEpeNxZTjXPoEAkUCEqQxaMRw1UxqxT bV5PXgLEnUyXUdE/StMtw5D6/D012KG6/PvniElZNJwxqOifUFQx9aU+DXNYnmAvStpc ST3NYtEudl/qh4xdabKQCtvWA65nLNA9TL9tdR5kxSbGOFU/f1mY0yJDwwUbSTKdTDT3 X+hA== X-Forwarded-Encrypted: i=1; AJvYcCVH4LiXYtoxUWpjAOn/GJOAD5Xyp0Np9v952SkRNnP7F0JgR47aaij+8ye2JRw5I1wSaeJKJCbMI8OPVCY=@vger.kernel.org X-Gm-Message-State: AOJu0Yzz9nrWtb23laTd15863r2VpCMEEd+gbO0nrtV9sXPicWAi/Jrm +d2WGHOHMrhlCs9wzuWnEE4D1Q4S2Jp/kxzJbrKQ2XKajIIqFoo7s2P9 X-Gm-Gg: ASbGnctEoD5uBG5asoXlOgm1VXzZW7zArg+hcWfOx5C3mRUakjKaFAwWLmK7eMsiXmo 5okEpSbVOUmR69rXp+F3t8BiWK0WNsWCwTaNqtVESJR+rjbHVjbf6n1Z0h3YT4JebsHxvN1ODN5 8HkG0+7ld58j9GONfZNYCtJz1M1fx+mux+7R5bHu8NJOAmgwg4mo1B9ht0pytBYAjcECMDVTy8r G3Ake5vmQlcwMDEk6Ot/Jn1lrJtV/OYyPbKIXL7zSDiuynkBn9EYvehlU09/rUW39tT4BInm/XF 629N6K4cykC/i9UBy7QE3Pz2VuL70DqMgMP/1zc+VaKoXua/67gwcrF2TuI71OhNH2u6NiEAwsx EJ4/2fAzkT+YNs6+Eq/0rbev3erUlHyLR39zdDxUQ7iOtJeuhUXRFhQO7XY/a7v6pUb91ArwyI5 8gfPUqg7dUzb88 X-Google-Smtp-Source: AGHT+IHrdjXmU7uWxKIA6T2fxuFhYbTSMB1I+H7KjcACH1uB/bPjPmFTLzfvYSiunLXewY8lMi00xQ== X-Received: by 2002:a17:902:db11:b0:27c:3690:2c5d with SMTP id d9443c01a7336-27ed6780b92mr76951455ad.0.1758879295495; Fri, 26 Sep 2025 02:34:55 -0700 (PDT) Received: from localhost.localdomain ([2409:891f:1c21:566:e1d1:c082:790c:7be6]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66cda43sm49247475ad.25.2025.09.26.02.34.48 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 26 Sep 2025 02:34:54 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, 
	corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v8 mm-new 08/12] bpf: mark vma->vm_mm as __safe_trusted_or_null
Date: Fri, 26 Sep 2025 17:33:39 +0800
Message-Id: <20250926093343.1000-9-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com>
References: <20250926093343.1000-1-laoar.shao@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The vma->vm_mm might be NULL and it can be accessed outside of RCU. Thus,
we can mark it as trusted_or_null. With this change, BPF helpers can
safely access vma->vm_mm to retrieve the associated mm_struct from the
VMA. We can then make policy decisions based on the VMA.

The "trusted" annotation enables direct access to vma->vm_mm within kfuncs
marked with KF_TRUSTED_ARGS or KF_RCU, such as bpf_task_get_cgroup1() and
bpf_task_under_cgroup(). Conversely, "null" enforcement requires all
callsites using vma->vm_mm to perform NULL checks.

The lsm selftest must be modified because it directly accesses vma->vm_mm
without a NULL pointer check; otherwise it will break due to this change.
For the VMA based THP policy, the use case is as follows,

  @mm =3D @vma->vm_mm; // vm_area_struct::vm_mm is trusted or null
  if (!@mm)
      return;
  bpf_rcu_read_lock(); // rcu lock must be held to dereference the owner
  @owner =3D @mm->owner; // mm_struct::owner is rcu trusted or null
  if (!@owner)
      goto out;
  @cgroup1 =3D bpf_task_get_cgroup1(@owner, MEMCG_HIERARCHY_ID);
  if (!@cgroup1) // the kfunc may return NULL
      goto out;

  /* make the decision based on the @cgroup1 attribute */

  bpf_cgroup_release(@cgroup1); // release the associated cgroup
out:
  bpf_rcu_read_unlock();

PSI memory information can be obtained from the associated cgroup to
inform policy decisions. Since upstream PSI support is currently limited
to cgroup v2, the following example demonstrates a cgroup v2
implementation:

  @owner =3D @mm->owner;
  if (@owner) {
      // @ancestor_cgid is user-configured
      @ancestor =3D bpf_cgroup_from_id(@ancestor_cgid);
      if (@ancestor) { // the kfunc may return NULL
          if (bpf_task_under_cgroup(@owner, @ancestor)) {
              @psi_group =3D @ancestor->psi;

              /* Extract PSI metrics from @psi_group and
               * implement policy logic based on the values
               */
          }
          bpf_cgroup_release(@ancestor); // drop the acquired reference
      }
  }

Signed-off-by: Yafang Shao
Acked-by: Lorenzo Stoakes
Cc: "Liam R.
Howlett" --- kernel/bpf/verifier.c | 5 +++++ tools/testing/selftests/bpf/progs/lsm.c | 8 +++++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d400e18ee31e..b708b98f796c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7165,6 +7165,10 @@ BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket) { struct sock *sk; }; =20 +BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct) { + struct mm_struct *vm_mm; +}; + static bool type_is_rcu(struct bpf_verifier_env *env, struct bpf_reg_state *reg, const char *field_name, u32 btf_id) @@ -7206,6 +7210,7 @@ static bool type_is_trusted_or_null(struct bpf_verifi= er_env *env, { BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket)); BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct dentry)); + BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct)); =20 return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id, "__safe_trusted_or_null"); diff --git a/tools/testing/selftests/bpf/progs/lsm.c b/tools/testing/selfte= sts/bpf/progs/lsm.c index 0c13b7409947..7de173daf27b 100644 --- a/tools/testing/selftests/bpf/progs/lsm.c +++ b/tools/testing/selftests/bpf/progs/lsm.c @@ -89,14 +89,16 @@ SEC("lsm/file_mprotect") int BPF_PROG(test_int_hook, struct vm_area_struct *vma, unsigned long reqprot, unsigned long prot, int ret) { - if (ret !=3D 0) + struct mm_struct *mm =3D vma->vm_mm; + + if (ret !=3D 0 || !mm) return ret; =20 __s32 pid =3D bpf_get_current_pid_tgid() >> 32; int is_stack =3D 0; =20 - is_stack =3D (vma->vm_start <=3D vma->vm_mm->start_stack && - vma->vm_end >=3D vma->vm_mm->start_stack); + is_stack =3D (vma->vm_start <=3D mm->start_stack && + vma->vm_end >=3D mm->start_stack); =20 if (is_stack && monitored_pid =3D=3D pid) { mprotect_count++; --=20 2.47.3 From nobody Wed Oct 1 22:26:32 2025 Received: from mail-pg1-f173.google.com (mail-pg1-f173.google.com [209.85.215.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 
(128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A71782BDC2F for ; Fri, 26 Sep 2025 09:35:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879305; cv=none; b=XahrplLZkevdc+oj1ChBxXaOhsdJOHkG05ghnddDM9cGOopmqE1XYA1nT/VmIM3K5ZjIfXkGp1N4/7eVQLuY9PjYL+PjXZS7vVZKsnZAb8i/rJOUwARExTsEmSuHshOg5jWTGDYzjbzKQc9w+3KU7EmG5fDr5+aHz9LjFaN/HLI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879305; c=relaxed/simple; bh=HBmJaZcDUZzcrC81eiCJDGIY48KrzLAuQ2+UCr8xkqo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=I4VZ/zPXkYgIPXL2uJuK6S76x/XI0U/lLWttyi0xofb9qzt8f4ZVdWSALEfIbHsjUUllCeT4VUY+b3q94YeRQM7s57PzNhqPJWm7Dnhj2K71edBLLIQ/olYiIrOankF/litHOHDx9KMBFI8nFeH4JTVeO8nBxISibln10Ppq1/8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Tuj7WRGm; arc=none smtp.client-ip=209.85.215.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Tuj7WRGm" Received: by mail-pg1-f173.google.com with SMTP id 41be03b00d2f7-b551b040930so1328489a12.2 for ; Fri, 26 Sep 2025 02:35:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1758879303; x=1759484103; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KejkPMgGRvReGxuBAv36+hrv4MPOBVSti8PCSW2auKE=; 
b=Tuj7WRGmflzMYd55lgdtWQ6wdMvkJpx++Rra89hqoM8ukhf9f0/t4A01+2JuRlHwWA UAgsA5cAo8SdBBVVOj6UIRZxXDq109d4lJEJn2S+nMyj3EOVi+sKvmJ3OuR9n+WmdJNZ GWFhNbCgCOdOTesC2WjK127YhRwTUCpPh+lFLK4uxcV3MZmifbnOR2sWvRXJY0IoSHDA WX/eY1SFqZkxvbilusCNnlIZcJBEpOWu6lYdhd+2ii6EVWMCqzMBvK+bBS+Czs3bIB88 tbfspq7iEE9h9QT0dVn3H26PORBtIHURc5Ftudb314SctaqUQI7Ed2Db1aOZ/M60/d7G OmzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758879303; x=1759484103; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KejkPMgGRvReGxuBAv36+hrv4MPOBVSti8PCSW2auKE=; b=a0EgrWsOy0N8b1to3RjhcG70Y/D2eiJa23mgUenPH7NfAPfyJ6YJ/16ix2UM+1LKgF k87nglyXCn88pT6FmBQUlFaGfnfggUKl4fmk4F8VOsP8nXsR/qZt7An0eCoP3P1GZDl1 gMoMVzNaYqfzqitjVR5nXO4py3VhDZVpCRiIDan42x4jqIYvgVD3kTNSinQdFSIVnizt rl+m/KHRQTyvXxo+F+NROSR/hMMVRG5PdKYu8+fKCpv92EYVOAkQKUpVEc/Q0qOS47Co CIWcK4eyV40QiAfsOWN6Jvuk3nc9263SM6h2KRRB9zpMnp/AKPeDDMJmGkSHkwJuwJbs JAnA== X-Forwarded-Encrypted: i=1; AJvYcCVyUQSaFTIm3vMGMA1QWbepg9ZfYQ3+Kimwf90RT8l60X/4GqX2CCpSeE9KG0AIScZv7buZiTYBEQ6U8xw=@vger.kernel.org X-Gm-Message-State: AOJu0YwzBWJcnxPcVnfpHCp+3/bYiGoZPwKbWjecNY8RCnC2QeMtxsKl C/YbR1asSUO/gOLRPwrsQ4vdVMi1cHRR/ULddw38R4QBvMVy1pFh/ADh X-Gm-Gg: ASbGnctZ7egPVymPoPhJARXKB11PNkwGALceXZwirpIVa7Z2CHI76sLlFQRPEi22o0U 4zFWwLAYELvoN9a0HarUxaU6jUmDOpJzMn5/TcnJhBQtLoCpyxDEDCCWF0JeZvSR/cZvZmjlfDD vqx8CiUZZx3A+XbDI+oP6HYkYuXDHQPbchu3pOjTB7Uc9bFydjLZDPMDu3dV0IQROj+UggJW7IN ffu/wETwgO7Vn+zrE+MU6Cl3c/eUw3En0bLVj3jZVZTLhhUz60Ixn7yYNix2mO+7ZdXdWte1Ao2 ikJPoqF2hAlvitsJZo1B7wMsosdN76pPNPMmySSHDzdNRMbftWZs54880P4CfT6albW0Dj/uHos aYr0VpEe/mmzRJ60uT/bJmbPjZrF6fHyzQXpQuJJkgH3WtmbsIqrJS+pohQj42HslOZ6wDkf2K9 Qby96rzy3qalfh X-Google-Smtp-Source: AGHT+IHJMvMRnfOqzEak3qmEUf11dLxaZn5jFt91kzwtU8AVAcW525LzqoVivLg7u9BwGX2hLSfT8w== X-Received: by 2002:a17:902:e749:b0:27d:6cdc:99e4 with SMTP id d9443c01a7336-27ed49e6cacmr73186945ad.21.1758879303020; Fri, 
26 Sep 2025 02:35:03 -0700 (PDT) Received: from localhost.localdomain ([2409:891f:1c21:566:e1d1:c082:790c:7be6]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66cda43sm49247475ad.25.2025.09.26.02.34.55 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 26 Sep 2025 02:35:02 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v8 mm-new 09/12] selftests/bpf: add a simple BPF based THP policy Date: Fri, 26 Sep 2025 17:33:40 +0800 Message-Id: <20250926093343.1000-10-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com> References: <20250926093343.1000-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This test case implements a basic THP policy that sets THPeligible to 1 for a specific task and to 0 for all others. I selected THPeligible for verification because its straightforward nature makes it ideal for validating the BPF THP policy functionality. 
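
The policy in this selftest is handed the PMD order from userspace, which
derives it from hpage_pmd_size and the base page size. The arithmetic can
be sketched as below. pmd_order_from_sizes() and
check_common_pmd_orders() are hypothetical helpers written for
illustration only; unlike get_pmd_order() in the selftest, this sketch
returns -1 when the ratio is not a power of two.

```c
#include <assert.h>

/*
 * Return log2(pmd_size / page_size) when that ratio is a power of two,
 * or -1 otherwise. Hypothetical helper, not part of the selftest.
 */
static int pmd_order_from_sizes(unsigned long pmd_size, unsigned long page_size)
{
	unsigned long pages;
	int order = 0;

	if (page_size == 0 || pmd_size == 0 || pmd_size % page_size != 0)
		return -1;

	pages = pmd_size / page_size;
	if (pages & (pages - 1))
		return -1; /* not a power of two */

	/* Count how many times the page count can be halved. */
	while (pages > 1) {
		pages >>= 1;
		order++;
	}
	return order;
}

/* Two worked cases: 2 MiB / 4 KiB = 512 = 2^9, 512 MiB / 64 KiB = 8192 = 2^13. */
static void check_common_pmd_orders(void)
{
	assert(pmd_order_from_sizes(2097152UL, 4096UL) == 9);
	assert(pmd_order_from_sizes(536870912UL, 65536UL) == 13);
}
```

With a 2 MiB PMD and 4 KiB base pages the ratio is 512 pages, so the order
passed to the BPF program would be 9 under these assumptions.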
Signed-off-by: Yafang Shao --- MAINTAINERS | 2 + tools/testing/selftests/bpf/config | 3 + .../selftests/bpf/prog_tests/thp_adjust.c | 258 ++++++++++++++++++ .../selftests/bpf/progs/test_thp_adjust.c | 41 +++ 4 files changed, 304 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/thp_adjust.c create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust.c diff --git a/MAINTAINERS b/MAINTAINERS index 7be34b2a64fd..c1219bcd27c1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16260,6 +16260,8 @@ F: mm/huge_memory.c F: mm/huge_memory_bpf.c F: mm/khugepaged.c F: mm/mm_slot.h +F: tools/testing/selftests/bpf/prog_tests/thp_adjust.c +F: tools/testing/selftests/bpf/progs/test_thp_adjust* F: tools/testing/selftests/mm/khugepaged.c F: tools/testing/selftests/mm/split_huge_page_test.c F: tools/testing/selftests/mm/transhuge-stress.c diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/b= pf/config index 8916ab814a3e..7ccb9809e276 100644 --- a/tools/testing/selftests/bpf/config +++ b/tools/testing/selftests/bpf/config @@ -26,6 +26,7 @@ CONFIG_DMABUF_HEAPS=3Dy CONFIG_DMABUF_HEAPS_SYSTEM=3Dy CONFIG_DUMMY=3Dy CONFIG_DYNAMIC_FTRACE=3Dy +CONFIG_BPF_THP_GET_ORDER_EXPERIMENTAL=3Dy CONFIG_FPROBE=3Dy CONFIG_FTRACE_SYSCALLS=3Dy CONFIG_FUNCTION_ERROR_INJECTION=3Dy @@ -51,6 +52,7 @@ CONFIG_IPV6_TUNNEL=3Dy CONFIG_KEYS=3Dy CONFIG_LIRC=3Dy CONFIG_LWTUNNEL=3Dy +CONFIG_MEMCG=3Dy CONFIG_MODULE_SIG=3Dy CONFIG_MODULE_SRCVERSION_ALL=3Dy CONFIG_MODULE_UNLOAD=3Dy @@ -114,6 +116,7 @@ CONFIG_SECURITY=3Dy CONFIG_SECURITYFS=3Dy CONFIG_SYN_COOKIES=3Dy CONFIG_TEST_BPF=3Dm +CONFIG_TRANSPARENT_HUGEPAGE=3Dy CONFIG_UDMABUF=3Dy CONFIG_USERFAULTFD=3Dy CONFIG_VSOCKETS=3Dy diff --git a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c b/tools/te= sting/selftests/bpf/prog_tests/thp_adjust.c new file mode 100644 index 000000000000..b14f57040654 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c @@ -0,0 +1,258 @@ +// SPDX-License-Identifier: 
GPL-2.0 + +#include +#include +#include +#include "test_thp_adjust.skel.h" + +#define LEN (16 * 1024 * 1024) /* 16MB */ +#define THP_ENABLED_FILE "/sys/kernel/mm/transparent_hugepage/enabled" +#define PMD_SIZE_FILE "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" + +static struct test_thp_adjust *skel; +static char old_mode[32]; +static long pagesize; + +static int thp_mode_save(void) +{ + const char *start, *end; + char buf[128]; + int fd, err; + size_t len; + + fd =3D open(THP_ENABLED_FILE, O_RDONLY); + if (fd =3D=3D -1) + return -1; + + err =3D read(fd, buf, sizeof(buf) - 1); + if (err =3D=3D -1) + goto close; + + start =3D strchr(buf, '['); + end =3D start ? strchr(start, ']') : NULL; + if (!start || !end || end <=3D start) { + err =3D -1; + goto close; + } + + len =3D end - start - 1; + if (len >=3D sizeof(old_mode)) + len =3D sizeof(old_mode) - 1; + strncpy(old_mode, start + 1, len); + old_mode[len] =3D '\0'; + +close: + close(fd); + return err; +} + +static int thp_mode_set(const char *desired_mode) +{ + int fd, err; + + fd =3D open(THP_ENABLED_FILE, O_RDWR); + if (fd =3D=3D -1) + return -1; + + err =3D write(fd, desired_mode, strlen(desired_mode)); + close(fd); + return err; +} + +static int thp_mode_reset(void) +{ + int fd, err; + + fd =3D open(THP_ENABLED_FILE, O_WRONLY); + if (fd =3D=3D -1) + return -1; + + err =3D write(fd, old_mode, strlen(old_mode)); + close(fd); + return err; +} + +static char *thp_alloc(void) +{ + char *addr; + int err, i; + + addr =3D mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, = -1, 0); + if (addr =3D=3D MAP_FAILED) + return NULL; + + err =3D madvise(addr, LEN, MADV_HUGEPAGE); + if (err =3D=3D -1) + goto unmap; + + /* Accessing a single byte within a page is sufficient to trigger a page = fault. 
*/ + for (i =3D 0; i < LEN; i +=3D pagesize) + addr[i] =3D 1; + return addr; + +unmap: + munmap(addr, LEN); + return NULL; +} + +static void thp_free(char *ptr) +{ + munmap(ptr, LEN); +} + +static int get_pmd_order(void) +{ + ssize_t bytes_read, size; + int fd, order, ret =3D -1; + char buf[64], *endptr; + + fd =3D open(PMD_SIZE_FILE, O_RDONLY); + if (fd < 0) + return -1; + + bytes_read =3D read(fd, buf, sizeof(buf) - 1); + if (bytes_read <=3D 0) + goto close_fd; + + /* Remove potential newline character */ + if (buf[bytes_read - 1] =3D=3D '\n') + buf[bytes_read - 1] =3D '\0'; + + size =3D strtoul(buf, &endptr, 10); + if (endptr =3D=3D buf || *endptr !=3D '\0') + goto close_fd; + if (size % pagesize !=3D 0) + goto close_fd; + ret =3D size / pagesize; + if ((ret & (ret - 1)) =3D=3D 0) { + order =3D 0; + while (ret > 1) { + ret >>=3D 1; + order++; + } + ret =3D order; + } + +close_fd: + close(fd); + return ret; +} + +static int get_thp_eligible(pid_t pid, unsigned long addr) +{ + int this_vma =3D 0, eligible =3D -1; + unsigned long start, end; + char smaps_path[64]; + FILE *smaps_file; + char line[4096]; + + snprintf(smaps_path, sizeof(smaps_path), "/proc/%d/smaps", pid); + smaps_file =3D fopen(smaps_path, "r"); + if (!smaps_file) + return -1; + + while (fgets(line, sizeof(line), smaps_file)) { + if (sscanf(line, "%lx-%lx", &start, &end) =3D=3D 2) { + /* addr is monotonic */ + if (addr < start) + break; + this_vma =3D (addr >=3D start && addr < end) ? 
1 : 0; + continue; + } + + if (!this_vma) + continue; + + if (strstr(line, "THPeligible:")) { + sscanf(line, "THPeligible: %d", &eligible); + break; + } + } + + fclose(smaps_file); + return eligible; +} + +static void subtest_thp_eligible(void) +{ + struct bpf_link *ops_link; + int elighble; + pid_t pid; + char *ptr; + + ops_link =3D bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops); + if (!ASSERT_OK_PTR(ops_link, "attach struct_ops")) + return; + + pid =3D getpid(); + ptr =3D thp_alloc(); + if (!ASSERT_OK_PTR(ptr, "THP alloc")) + goto detach; + + skel->bss->pid_eligible =3D pid; + elighble =3D get_thp_eligible(pid, (unsigned long)ptr); + ASSERT_EQ(elighble, 1, "THPeligible"); + + skel->bss->pid_eligible =3D 0; + skel->bss->pid_not_eligible =3D pid; + elighble =3D get_thp_eligible(pid, (unsigned long)ptr); + ASSERT_EQ(elighble, 0, "THP not eligible"); + + skel->bss->pid_eligible =3D 0; + skel->bss->pid_not_eligible =3D 0; + elighble =3D get_thp_eligible(pid, (unsigned long)ptr); + ASSERT_EQ(elighble, 0, "THP not eligible"); + + thp_free(ptr); +detach: + bpf_link__destroy(ops_link); +} + +static int thp_adjust_setup(void) +{ + int err =3D -1, pmd_order; + + pagesize =3D sysconf(_SC_PAGESIZE); + pmd_order =3D get_pmd_order(); + if (!ASSERT_NEQ(pmd_order, -1, "get_pmd_order")) + return -1; + + if (!ASSERT_NEQ(thp_mode_save(), -1, "THP mode save")) + return -1; + if (!ASSERT_GE(thp_mode_set("madvise"), 0, "THP mode set")) + return -1; + + skel =3D test_thp_adjust__open(); + if (!ASSERT_OK_PTR(skel, "open")) + goto thp_reset; + + skel->bss->pmd_order =3D pmd_order; + + err =3D test_thp_adjust__load(skel); + if (!ASSERT_OK(err, "load")) + goto destroy; + return 0; + +destroy: + test_thp_adjust__destroy(skel); +thp_reset: + ASSERT_GE(thp_mode_reset(), 0, "THP mode reset"); + return err; +} + +static void thp_adjust_destroy(void) +{ + test_thp_adjust__destroy(skel); + ASSERT_GE(thp_mode_reset(), 0, "THP mode reset"); +} + +void test_thp_adjust(void) +{ + if 
(thp_adjust_setup() =3D=3D -1) + return; + + if (test__start_subtest("thp_eligible")) + subtest_thp_eligible(); + + thp_adjust_destroy(); +} diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust.c b/tools/te= sting/selftests/bpf/progs/test_thp_adjust.c new file mode 100644 index 000000000000..ed8c510693a0 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_thp_adjust.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include +#include + +char _license[] SEC("license") =3D "GPL"; + +int pid_not_eligible, pid_eligible; +int pmd_order; + +SEC("struct_ops/thp_get_order") +int BPF_PROG(thp_eligible, struct vm_area_struct *vma, enum tva_type tva_t= ype, + unsigned long orders) +{ + struct mm_struct *mm =3D vma->vm_mm; + int suggested_order =3D 0; + struct task_struct *p; + + if (tva_type !=3D TVA_SMAPS) + return 0; + + if (!mm) + return 0; + + /* This BPF hook is already under RCU */ + p =3D mm->owner; + if (!p || (p->pid !=3D pid_eligible && p->pid !=3D pid_not_eligible)) + return 0; + + if (p->pid =3D=3D pid_eligible) + suggested_order =3D pmd_order; + else + suggested_order =3D 30; /* invalid order */ + return suggested_order; +} + +SEC(".struct_ops.link") +struct bpf_thp_ops thp_eligible_ops =3D { + .thp_get_order =3D (void *)thp_eligible, +}; --=20 2.47.3 From nobody Wed Oct 1 22:26:32 2025 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E30562BE021 for ; Fri, 26 Sep 2025 09:35:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879312; cv=none; 
b=GdIac65S0g9VNfscFNb5MySxlc2EA8MSPyYzy8RFkL28vURN//uf1k6E1pzdZSH6jOmor58iINUGcWSZZlXfT/TEoPa+g9OlzxIr8HhrvX8QskTWABsT4rsxd2zOH96p0kHWFCpkSrLwCbOIGRQl9ejDhB/LWYmy1/elM19ktv8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879312; c=relaxed/simple; bh=zrlWjexRS+e6cnuSCMRZ3/yt6yu9oHpAebCTZ9vSuDs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=itXIeGfdo7/zsAFDWe2m4Vbu//WovJuDMfuShB4hA6h94EzfltEv+ieo1J+3hXFso2zum4otMVamWpOlkRTHFHFvwapAeRUacoNpbENqqzf4bYpSXidCNW4pbOZslMBARETsrmGKlzn6ssfSNJK1fgWkrOvoOXYPuJQlbAaVq70= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NVora+UU; arc=none smtp.client-ip=209.85.214.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NVora+UU" Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-2681660d604so19269845ad.0 for ; Fri, 26 Sep 2025 02:35:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1758879310; x=1759484110; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rw7WNnQLHMUa0Yn2DLPfVIa+t6BCCpmxXRHnFLCbq0M=; b=NVora+UUFXjQA9a6jq4f8i1ID4HKYlAjXIzCMjvNJ8PkLP//n0Iu0L9CK352sg9XbS Z8D4pBWGZKt1Kjhb9O6fBiBDHp0O/SYsRs6gdxO6R02mRnQ5m0lG5pDM4UChnH1+jUJw bzX5V9YiFycDZGiXZLyQG9KyPUERRKqDApjBr7kzmaAKVm9IPDTjxagBvV3nXRJW0k7t PTGCv5Bakkx1iG5DFsN8dKD8Bx144kY7iZf5VgOyShNTKYDJROav6usuYe3b/bQOnrmc b3TIsz0fsXFBsDjPaZzjV6Rw64+N+wgWNGo+n+gnBBDxAaeOgEga9bWJmKvfapWSzT/H HVbg== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758879310; x=1759484110; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rw7WNnQLHMUa0Yn2DLPfVIa+t6BCCpmxXRHnFLCbq0M=; b=SdgvWKNpkS5R63YfmlLvc77Y+eq/4dCAzKypA0vYPgqZD3Vksu+0EgfEkWvHN4iocD ZzVrCq1Sl1HHCtSlg/78fYzJWDkZeZjJgWl76A3RLScbyZLfRLAZrdIMrsZbQo9CPoMu W7099CECVHZqan3uxmNIozASxuMqWzC2vzcL0xiXprFi384hazYqlG3K5GqRunJnRIL/ eblsw47TfX7cNyxs/G62fEfWrW27Ss+rWZ7XE4+mE+/Sh4/kSdt2v1WYtKTUMU9LeAQ6 71wQ3gP8WkeWI40qlbtBhYc8HcCw7rSVXjeqEANHiYVUi2dx0DnMkYu6pMpCrd7AqSR5 ctQg== X-Forwarded-Encrypted: i=1; AJvYcCVN1laucPifj+QibWU1MstZTXUW8dE9YmZEGzD2exNLGaKJOS35ebuEDauUmF5mDFxQOEA0zKS/dUZvZzs=@vger.kernel.org X-Gm-Message-State: AOJu0YwS6dZow0H3psHMH+4VLU2KKdSZHzHNdlHQLi2lWhG9SqOK+D1Q D/1aWl0nBLxWYv+UI/UwYLiflMm5uHbU3BxLaklXEetYaG2wvlyq/jeW X-Gm-Gg: ASbGnct3Pf9Cqwz/fGb95x9kDTaM7RpGHye22QIkwseD2hdly9o/KJhxcE/4yBlSRuo bGZGknkInGb5xd579eMgfX3+BC/GsHp3ECHoGVEIZyTliOQ8ZWjQ6VIH+FzXnc3XCys9L4Q4oYO HxmfEFAJleyErUmAnDtCaQi7J+ZNKwxkoWFWL6xjE0KCMZ+VB/CYsjKw8ctAUDagipOLa8AVwxW Cg7ONmE/kFeJsfPcyKh6jxDJe7oXAGCeIxF7a6+FVWpsyNWsv7UUyu9BMsGof65HIAqHzRVr8ch nCuNEqcEu4qZgUo0gElyuLkkrlkMmPXuaKMWMT0K4e1EBG8ANxUQznu6rHK0cYMa21+nd/1E6QS yHLqD7fq/1RE7B2kdeYjrJfekrt5/ecRUOenJW+EmdqfxnwuAewlV6657E+v/cZv4l1iHksF0B2 Vs+bjjdVQ70Jro X-Google-Smtp-Source: AGHT+IEEuMzIy4jcfzpCO06+83GzrwWygKfi3d1uaHoYRS3Fbpuh2MpzvVbrFawoF8VwJAU/udjvNw== X-Received: by 2002:a17:903:37ce:b0:278:f46b:d496 with SMTP id d9443c01a7336-27ed4a6085amr60657335ad.55.1758879310067; Fri, 26 Sep 2025 02:35:10 -0700 (PDT) Received: from localhost.localdomain ([2409:891f:1c21:566:e1d1:c082:790c:7be6]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66cda43sm49247475ad.25.2025.09.26.02.35.03 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 26 Sep 2025 02:35:09 -0700 (PDT) From: Yafang Shao To: 
akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v8 mm-new 10/12] selftests/bpf: add test case to update THP policy Date: Fri, 26 Sep 2025 17:33:41 +0800 Message-Id: <20250926093343.1000-11-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com> References: <20250926093343.1000-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This test case exercises the BPF THP update mechanism by modifying an existing policy. 
The test confirms that: - attaching a new BPF program while another is active fails with EBUSY - updating the currently attached program succeeds Signed-off-by: Yafang Shao --- .../selftests/bpf/prog_tests/thp_adjust.c | 23 +++++++++++++++++++ .../selftests/bpf/progs/test_thp_adjust.c | 14 +++++++++++ 2 files changed, 37 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c b/tools/te= sting/selftests/bpf/prog_tests/thp_adjust.c index b14f57040654..72b2ec31025a 100644 --- a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c +++ b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c @@ -208,6 +208,27 @@ static void subtest_thp_eligible(void) bpf_link__destroy(ops_link); } =20 +static void subtest_thp_policy_update(void) +{ + struct bpf_link *old_link, *new_link; + int err; + + old_link =3D bpf_map__attach_struct_ops(skel->maps.swap_ops); + if (!ASSERT_OK_PTR(old_link, "attach_old_link")) + return; + + new_link =3D bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops); + if (!ASSERT_NULL(new_link, "attach_new_link")) + goto destroy_old; + ASSERT_EQ(errno, EBUSY, "attach_new_link"); + + err =3D bpf_link__update_map(old_link, skel->maps.thp_eligible_ops); + ASSERT_EQ(err, 0, "update_old_link"); + +destroy_old: + bpf_link__destroy(old_link); +} + static int thp_adjust_setup(void) { int err =3D -1, pmd_order; @@ -253,6 +274,8 @@ void test_thp_adjust(void) =20 if (test__start_subtest("thp_eligible")) subtest_thp_eligible(); + if (test__start_subtest("policy_update")) + subtest_thp_policy_update(); =20 thp_adjust_destroy(); } diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust.c b/tools/te= sting/selftests/bpf/progs/test_thp_adjust.c index ed8c510693a0..8f3bc4768edc 100644 --- a/tools/testing/selftests/bpf/progs/test_thp_adjust.c +++ b/tools/testing/selftests/bpf/progs/test_thp_adjust.c @@ -39,3 +39,17 @@ SEC(".struct_ops.link") struct bpf_thp_ops thp_eligible_ops =3D { .thp_get_order =3D (void 
*)thp_eligible, }; + +SEC("struct_ops/thp_get_order") +int BPF_PROG(alloc_not_in_swap, struct vm_area_struct *vma, enum tva_type = tva_type, + unsigned long orders) +{ + if (tva_type =3D=3D TVA_SWAP_PAGEFAULT) + return 0; + return -1; +} + +SEC(".struct_ops.link") +struct bpf_thp_ops swap_ops =3D { + .thp_get_order =3D (void *)alloc_not_in_swap, +}; --=20 2.47.3 From nobody Wed Oct 1 22:26:32 2025 Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E2342BE620 for ; Fri, 26 Sep 2025 09:35:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879319; cv=none; b=NsIc6CaqQanyZ7n56xVwZflNAnJMCg0OXUGnQTV1DNCKJYCPu9CzrXtQYPAELjBruYfwi2Trv3E2lfqk/sxYKtX67Kr7p/y3lvPPuhLkHm6/Fe4o68dDrnRH5xso2ZPwfy8U9qSw3qPmD7KYh2zxUhzrMX21XR8SjmFCf3S2HHQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879319; c=relaxed/simple; bh=aHAHaqsBoprnxz9EY70FLRuOd/jhpKeP2yAc03cB17I=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=N5Liy23oY6aW5aytFFYgPZd5zUlYWE+oF3AuUyn3WM9iyb0tRXBAhUU904rujWV2FBklpzm0C/BIU6s9UAiQZQdoOh/x1+YoxrYmtrqy2kVOxIYGrOyhvXhgQJTJL2z5Sv4kYPKLXxZep846TOf+cWZz3ilEsZLIVuxqsv/cNNM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=AeGIC4dX; arc=none smtp.client-ip=209.85.214.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=gmail.com header.i=@gmail.com header.b="AeGIC4dX" Received: by mail-pl1-f173.google.com with SMTP id d9443c01a7336-2570bf6058aso28217175ad.0 for ; Fri, 26 Sep 2025 02:35:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1758879317; x=1759484117; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GjEsZ2zZ0gswTTfZGuAqtKyUugB4UMlIc5tO04MQRtQ=; b=AeGIC4dXsbF8f+pw/D7Qxws7M/v3fplxR7zs5QfeS6hEfwwmkBeu68kQkHSuJCHTuo jOwsk7UtFOS1WiIGBXkAM5vKsVvxo4ERoyboFTqtH95hfl7WDWhrC7juwfAOLNRQPDOA sq4KT49lfjxBkG/7xan6SYOnxABt0vpXOx4sA5srEPnMG5zgABzPuyXZoabqIhFWJc9s +DvL4c5tlPiIhrMjcyzY9poC5/Ce7Fo98Ufa1kQRTz+Rz9VZFjjY9ORFw2JJIdu0YGXZ VOlV6gL+pQ09Qh7WoUYbXKz8pZmDIprhSmuSdSjVesM6BGfy4RK8lpd8EjxHoO4JavDm TzRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758879317; x=1759484117; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GjEsZ2zZ0gswTTfZGuAqtKyUugB4UMlIc5tO04MQRtQ=; b=loCmqbL8hnHWwdpSZeBGBcrcfBgb9Pwh5M7iTH68EY3OS3uftT4Gq2QIspMK1vjTzP a3rRTy1mXhmh8cnyU1pkWyT7J8eKWDmuYoEBjxKIdFJAk9E/Jpi82k1km/ics73V+/Ll lVZU5kPLU+CZ1zo7/g9f/Y8XW3rH7GF6KC3pmyWcizdRnH0dY7M1ftlc/mrXG3iv5IFZ qoHvP8FTyO0AUK2t/SU7zB4Dxd8heEhqvqLVCgFChAHBKXtug+DMBVQv5bZhdH2CrvA0 BJuA/CEpreOO1qiEYFBiZ1Gjdmt4y1HupNBeb/Z6sbCRCW7BRlDWF6oQ5l/aKpd9gz4H htrA== X-Forwarded-Encrypted: i=1; AJvYcCUIyuAaELbJJv72mIU8kaqnbtpqe1p8n072cxsy//XveuoD2p10sqWD10CYQfQJS9Ef8fnP8iv1Ef23IdM=@vger.kernel.org X-Gm-Message-State: AOJu0YxBxmaZ45YKHn1ilLBRRuJKYY4ByGGKbNgUzWfln6a8UUAa5Hq7 M7vbWkLY6iCJ0OdNCcH+m8pTZy1hYE8OZjHyNxhhT2HXhTuzzaGLlGVt X-Gm-Gg: ASbGncscv3mFAeDOdG+foo2iZwAH/o7emkL7vlV83cYjCcUOXzkzSwD8/3KDflFRBhM sxJ9WrtOr9P3WyFSjLcOXRAWnqQ4Q6d0ZuXF9apToJfdTHPfgcbDHv1YadVroGS7foTdB8lSjO+ 
B6R6g57SomJOfr4pV+cTrFhFffOQ0d10CAQQbh7z7RFWhyt4yr8diNjmhFP64SpwGjaVzDm4j/Z 6+aWuATh8+/m/y/WZYZqpnqwABr6Z0XAUJ/PRpl+TUx5FCCLWhRxYNt66UOKvHCU4xqJYuI87YP WyD0/ifESskLj00MlDncnLghL1VfSI8ST7MR6VonQOcNxOQwrs2IkYe20kGWvw8yPiRIE0pkumq ltxBvgsQlQ65bQZ9NAxal+3/OP+hBTuZw3O82Khu8rPFiN4XNsKQ3fAwGa+5LZdnZpQQf6a62cV GQRMhXzUBZm7RB X-Google-Smtp-Source: AGHT+IGyv8obkrIygTqJyqvVbel1FZKkyg7zW+CRqiZxE8ebTcWIuicJPPFReDEq4JzQ2A2SZCWUTg== X-Received: by 2002:a17:903:2847:b0:269:4741:6d33 with SMTP id d9443c01a7336-27ed49d2c35mr46947235ad.23.1758879317548; Fri, 26 Sep 2025 02:35:17 -0700 (PDT) Received: from localhost.localdomain ([2409:891f:1c21:566:e1d1:c082:790c:7be6]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66cda43sm49247475ad.25.2025.09.26.02.35.10 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 26 Sep 2025 02:35:17 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v8 mm-new 11/12] selftests/bpf: add test cases for invalid thp_adjust usage Date: Fri, 26 Sep 2025 17:33:42 +0800 Message-Id: <20250926093343.1000-12-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com> References: <20250926093343.1000-1-laoar.shao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 
quoted-printable Content-Type: text/plain; charset="utf-8" 1. The trusted vma->vm_mm pointer can be null and must be checked before dereferencing. 2. The trusted mm->owner pointer can be null and must be checked before dereferencing. 3. Sleepable programs are prohibited because the call site operates under RCU protection. Signed-off-by: Yafang Shao --- .../selftests/bpf/prog_tests/thp_adjust.c | 7 +++++ .../bpf/progs/test_thp_adjust_sleepable.c | 22 ++++++++++++++ .../bpf/progs/test_thp_adjust_trusted_owner.c | 30 +++++++++++++++++++ .../bpf/progs/test_thp_adjust_trusted_vma.c | 27 +++++++++++++++++ 4 files changed, 86 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust_sleep= able.c create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust_trust= ed_owner.c create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust_trust= ed_vma.c diff --git a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c b/tools/te= sting/selftests/bpf/prog_tests/thp_adjust.c index 72b2ec31025a..2e9864732c11 100644 --- a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c +++ b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c @@ -4,6 +4,9 @@ #include #include #include "test_thp_adjust.skel.h" +#include "test_thp_adjust_sleepable.skel.h" +#include "test_thp_adjust_trusted_vma.skel.h" +#include "test_thp_adjust_trusted_owner.skel.h" =20 #define LEN (16 * 1024 * 1024) /* 16MB */ #define THP_ENABLED_FILE "/sys/kernel/mm/transparent_hugepage/enabled" @@ -278,4 +281,8 @@ void test_thp_adjust(void) subtest_thp_policy_update(); =20 thp_adjust_destroy(); + + RUN_TESTS(test_thp_adjust_trusted_vma); + RUN_TESTS(test_thp_adjust_trusted_owner); + RUN_TESTS(test_thp_adjust_sleepable); } diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust_sleepable.c = b/tools/testing/selftests/bpf/progs/test_thp_adjust_sleepable.c new file mode 100644 index 000000000000..4db78f2f0b2d --- /dev/null +++ 
b/tools/testing/selftests/bpf/progs/test_thp_adjust_sleepable.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include +#include + +#include "bpf_misc.h" + +char _license[] SEC("license") =3D "GPL"; + +SEC("struct_ops.s/thp_get_order") +__failure __msg("attach to unsupported member thp_get_order of struct bpf_= thp_ops") +int BPF_PROG(thp_sleepable, struct vm_area_struct *vma, enum tva_type tva_= type, + unsigned long orders) +{ + return -1; +} + +SEC(".struct_ops.link") +struct bpf_thp_ops vma_ops =3D { + .thp_get_order =3D (void *)thp_sleepable, +}; diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_owne= r.c b/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_owner.c new file mode 100644 index 000000000000..88bb09cb7cc2 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_owner.c @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include +#include + +#include "bpf_misc.h" + +char _license[] SEC("license") =3D "GPL"; + +SEC("struct_ops/thp_get_order") +__failure __msg("R3 pointer arithmetic on rcu_ptr_or_null_ prohibited, nul= l-check it first") +int BPF_PROG(thp_trusted_owner, struct vm_area_struct *vma, enum tva_type = tva_type, + unsigned long orders) +{ + struct mm_struct *mm =3D vma->vm_mm; + struct task_struct *p; + + if (!mm) + return 0; + + p =3D mm->owner; + bpf_printk("The task name is %s\n", p->comm); + return -1; +} + +SEC(".struct_ops.link") +struct bpf_thp_ops vma_ops =3D { + .thp_get_order =3D (void *)thp_trusted_owner, +}; diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_vma.= c b/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_vma.c new file mode 100644 index 000000000000..df7b0c160153 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_thp_adjust_trusted_vma.c @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include +#include + +#include "bpf_misc.h" + +char 
_license[] SEC("license") =3D "GPL"; + +SEC("struct_ops/thp_get_order") +__failure __msg("R1 invalid mem access 'trusted_ptr_or_null_'") +int BPF_PROG(thp_trusted_vma, struct vm_area_struct *vma, enum tva_type tv= a_type, + unsigned long orders) +{ + struct mm_struct *mm =3D vma->vm_mm; + struct task_struct *p =3D mm->owner; + + if (!p) + return 0; + return -1; +} + +SEC(".struct_ops.link") +struct bpf_thp_ops vma_ops =3D { + .thp_get_order =3D (void *)thp_trusted_vma, +}; --=20 2.47.3 From nobody Wed Oct 1 22:26:32 2025 Received: from mail-pg1-f181.google.com (mail-pg1-f181.google.com [209.85.215.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CBF912C21FA for ; Fri, 26 Sep 2025 09:35:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879327; cv=none; b=o5BPjKTTKhaU82ajbALWnQkDrFwrWZJApgY7HMBGz3JXI2AA8EVQiFth9oqmxeJfWWSr+lXnqtgqzzxuw5XQ1OiE0Pq4Mu4z95EMCsxYH7rgsgevp2Ew+CeMIE7HjZ7D1x108squwV313qabpr3/PXmjB9Pv71//qMkX5C8HYkg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758879327; c=relaxed/simple; bh=UBiranDCCKyEzq2tHLJoYwbpNJseAjiAQrhPN7Tvoyk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=XI1iaoNJLwZBoLrlXMRe0syw0TfOi3IYAOfcY7AefiPSv9iTrrQnOrmz4SdKuciB7SdalHtN5WIdeiN2MCBbO1sq86u90upkqiDo8tHfm659STtGizyWDZjZe5jHimRSsNmc04+OrzKZiQxJIDtjoEUBMNDS9qnDQgeOO+p9YJ0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=TZTZNqqX; arc=none smtp.client-ip=209.85.215.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TZTZNqqX" Received: by mail-pg1-f181.google.com with SMTP id 41be03b00d2f7-b57de3f63c8so522827a12.3 for ; Fri, 26 Sep 2025 02:35:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1758879325; x=1759484125; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=huK0MPI+64vH11fsWziyabmXxqecdshCDlVy93FTD7M=; b=TZTZNqqXH+AfeJ44oI7xZfdHiyZfUVc36Sbzn8bU6P3ojXxKXwPchN/xSYUw+Q0zKy R2EdxBuB6bCnR/VCKuteb6e3Drc+ulJehOIy3mL7CEvYPjoQv0vX4o5eQXCD0boZ9/jh 0U5plxSxr5g7+k4zk+A0aAhpLG+p0SpDn/NNfzGn37ujpQaKHHYmtEOqcUqxICfuAZIA VhBnbTStPdJueLt+6J1ePe7jKDnSiVutxHXQD5u+JL0TEBnxtvIfKnFojzyMl9yhLaxj HyRNHwCvpDpzSUXizriFjI0x8Sb1hay8fwKZrmXySw+6hBpdlCsJCWU7QBivdGBbd5nZ bJqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758879325; x=1759484125; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=huK0MPI+64vH11fsWziyabmXxqecdshCDlVy93FTD7M=; b=rs5N3Y2Vy7kGDBSbNldCPIzQGPuOLlsZA8btjCJlc1/7H0i/QFaq8GlgPvrIoIUY+Z XfLyk1s/f6E8Qro3lmH0S4v8GE16I15ym6p9dZLZIgAXwLOig5gZpqxo/TCsbk0AENOu 4P9NtWze4kY2y7fpl9SfxqzIV6WRVVkNdyoF3+xRIzJtHtDWv7wAKQgwk6Dloijkpiu5 IXue84hNuBCqX+YS9nVve1WIe8HLsV+GgIzT6tJeJwR/n3mHAFNQhLhtIw2LeWXbAH/Z W8ELhNPAzky2DmuJBtFCpRH0HhhouyvcwXUuwjFl+7CZdvbAxE7SAclwx6C0LgxcDpL3 3I+Q== X-Forwarded-Encrypted: i=1; AJvYcCV2GmMzq49FyvGPCo8BWonU92vcYJprHIJkKkGAOydzfg/zpSW1/say6AZNvReV0uG8fbrzmuDaHLKt1HI=@vger.kernel.org X-Gm-Message-State: AOJu0Yxvraoa5seSo/zgTMnSB7AvR6B1MLtkm15je+6ukOXWloJqGztA hrfgFny/qrok7TJzZOMrcaFsMXV1HEXyJA3tlvR+1ecKnE1hkzfljdA4 X-Gm-Gg: 
ASbGncsC6IcADe9Xbu9wL9cSmwlb+AnabsdNVFAixn9luQf16XVflLUbSvNdEHY8ilW eLCuiYTgZiikaCFHfMWcZ5ETJX//WxB97G/gTEhxh5ahZn8gA46fEc+IXGYaIh2k/qsckYRsqqt xXqWKp+Id0hHFHtBBdWoHMxxqVypH4RPa+5WrUfRjJ3AMEiFcDnvFgvMcTfKmBSw3dS3HDMMqBm gttbtHgpdRmwMWekaHjGxYYnzkSM4X5PeX6Iu2tyrzVj4mrWLhelvWDOSuKeDmdHRyYba5XX2y8 InWYW+jIVSDIFlESZ6GwryR1cPegEewl7ubvZXwzeVCBhF6oPANE5t+kL6/qoM2LRWaK5SL6yI9 ZxylEl2ISA0TncbOoPiU3H+hlm6U/xXwiFcGIIHaM+OfaoA2YzWHtJ2YWXCpLcCOM4ksmrFelr5 9yTW1qEmKIYtVd X-Google-Smtp-Source: AGHT+IGaBFISMjqZ56jhqOkqG6YgWjBFJGtoPxP49Jy/1yngaLT4ocjmKacYDQWcrmZR3MpcJrfLmA== X-Received: by 2002:a17:903:1c2:b0:26b:5346:5857 with SMTP id d9443c01a7336-27ed4a3719bmr75144915ad.24.1758879324875; Fri, 26 Sep 2025 02:35:24 -0700 (PDT) Received: from localhost.localdomain ([2409:891f:1c21:566:e1d1:c082:790c:7be6]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-27ed66cda43sm49247475ad.25.2025.09.26.02.35.18 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 26 Sep 2025 02:35:24 -0700 (PDT) From: Yafang Shao To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net, 21cnbao@gmail.com, shakeel.butt@linux.dev, tj@kernel.org, lance.yang@linux.dev Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH v8 mm-new 12/12] Documentation: add BPF-based THP policy management Date: Fri, 26 Sep 2025 17:33:43 +0800 Message-Id: <20250926093343.1000-13-laoar.shao@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20250926093343.1000-1-laoar.shao@gmail.com> References: <20250926093343.1000-1-laoar.shao@gmail.com> Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Document the BPF-based THP adjustment interface (the thp_get_order struct_ops hook) in the THP admin guide. Signed-off-by: Yafang Shao --- Documentation/admin-guide/mm/transhuge.rst | 39 ++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/adm= in-guide/mm/transhuge.rst index 1654211cc6cf..fa03bcdb8854 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -738,3 +738,42 @@ support enabled just fine as always. No difference can= be noted in hugetlbfs other than there will be less overall fragmentation. All usual features belonging to hugetlbfs are preserved and unaffected. libhugetlbfs will also work fine as usual. + +BPF-based THP adjustment +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Overview +-------- + +When the system is configured with "always" or "madvise" THP mode, a BPF p= rogram +can be used to adjust THP allocation policies dynamically. This enables +fine-grained control over THP decisions based on various factors including +workload identity, allocation context, and system memory pressure. + +Program Interface +----------------- + +This feature implements a struct_ops BPF program with the following interf= ace:: + + int thp_get_order(struct vm_area_struct *vma, + enum tva_type type, + unsigned long orders); + +Parameters:: + + @vma: vm_area_struct associated with the THP allocation + @type: TVA type for current @vma + @orders: Bitmask of available THP orders for this allocation + +Return value:: + + The suggested THP order for allocation from the BPF program. Must be + a valid, available order. + +Implementation Notes +-------------------- + +This is currently an experimental feature. +CONFIG_BPF_THP_GET_ORDER_EXPERIMENTAL must be enabled to use it. 
+Only one BPF program can be attached at a time, but the program can be upd= ated +dynamically to adjust policies without requiring affected tasks to be rest= arted. --=20 2.47.3
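The documented return-value contract ("a valid, available order") can be illustrated in plain user-space C. This is only a sketch and not code from this series: it assumes that bit N of @orders being set means order N is available, and the helper names order_available() and pick_order() are hypothetical.

```c
#include <limits.h>

/*
 * Hypothetical helper: nonzero if bit `order` is set in `orders`.
 * Assumption: bit N of the mask stands for THP order N.
 */
static inline int order_available(unsigned long orders, int order)
{
	if (order < 0 || order >= (int)(sizeof(orders) * CHAR_BIT))
		return 0;
	return (int)((orders >> order) & 1UL);
}

/*
 * Hypothetical policy: return the highest available order that does not
 * exceed `cap`, or 0 when no such order exists.
 */
static inline int pick_order(unsigned long orders, int cap)
{
	int order;

	for (order = cap; order > 0; order--)
		if (order_available(orders, order))
			return order;
	return 0;
}
```

With orders = (1UL << 9) | (1UL << 4) | (1UL << 2), pick_order(orders, 9) evaluates to 9 and pick_order(orders, 8) falls back to 4. A real policy would live in the BPF program and would have to follow whatever the kernel side actually enforces.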