From nobody Sat Sep 27 19:24:39 2025
From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org, surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann, sj@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 1/7] prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE
Date: Fri, 15 Aug 2025 14:54:53 +0100
Message-ID: <20250815135549.130506-2-usamaarif642@gmail.com>
In-Reply-To: <20250815135549.130506-1-usamaarif642@gmail.com>
References: <20250815135549.130506-1-usamaarif642@gmail.com>

From: David Hildenbrand

People want to make use of more THPs, for example, moving from the "never"
system policy to "madvise", or from "madvise" to "always".
While this is great news for every THP desperately waiting to get
allocated out there, apparently some workloads require a bit of care
during that transition: individual processes may need to opt out of this
behavior for various reasons, and this should be possible without making
all other workloads on the system opt out as well.

The following scenarios are imaginable:

(1) Switch from the "never" system policy to "madvise"/"always", but keep
    THPs disabled for selected workloads.

(2) Stay at the "never" system policy, but enable THPs for selected
    workloads, making only these workloads use the "madvise" or "always"
    policy.

(3) Switch from the "madvise" system policy to "always", but keep the
    "madvise" policy for selected workloads: allocate THPs only when
    advised.

(4) Stay at the "madvise" system policy, but enable THPs even when not
    advised for selected workloads -- the "always" policy.

One can emulate (2) through (1), by setting the system policy to
"madvise"/"always" while disabling THPs for all processes that don't want
THPs. It requires configuring all workloads, but that is a user-space
problem to sort out. (4) can be emulated through (3) in a similar way.

Back when (1) first became relevant, as people started enabling THPs, we
added PR_SET_THP_DISABLE, so workloads that were not ready yet (e.g.,
Redis) could simply disable THPs completely. Redis still implements the
option to use this interface to disable THPs completely.

With PR_SET_THP_DISABLE, we added a way to force-disable THPs for a
workload -- a process, including its fork+exec'ed process hierarchy. That
essentially made us support (1): simply disable THPs for all workloads
that are not ready for THPs yet, while still enabling THPs system-wide.

The quest for handling (3) and (4) started, but the current approaches (a
completely new prctl, options to set other policies per process,
alternatives to prctl -- mctrl, cgroup handling) don't look particularly
promising.
Likely, the future will use bpf or something similar to implement better
policies, in particular to also make better decisions about the THP sizes
to use, but this will certainly take a while, as that work has just
started.

Long story short: a simple enable/disable is not really suitable for the
future, so we're not willing to add completely new toggles.

While we could emulate (3)+(4) through (1)+(2) by simply disabling THPs
completely for these processes, this is a step backwards, because these
processes could no longer allocate THPs in regions where THPs were
explicitly advised: regions flagged as VM_HUGEPAGE. Apparently, that
poses a problem for relevant workloads, because "no THPs" is certainly
worse than "THPs only when advised".

Could we simply relax PR_SET_THP_DISABLE to mean "disable THPs unless
explicitly advised by the app through MADV_HUGEPAGE"? *Maybe*, but this
would change the documented semantics quite a bit and reduce its
versatility for debugging purposes, so I am not 100% sure that is what we
want -- although it would certainly be much easier.

So instead, as an easy way forward for (3) and (4), add an option to make
PR_SET_THP_DISABLE disable *fewer* THPs for a process.

In essence, this patch:

(A) Adds PR_THP_DISABLE_EXCEPT_ADVISED, to be used as a flag in arg3 of
    prctl(PR_SET_THP_DISABLE) when disabling THPs (arg2 != 0):
    prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED).

(B) Makes prctl(PR_GET_THP_DISABLE) return 3 if
    PR_THP_DISABLE_EXCEPT_ADVISED was set while disabling. Previously, it
    would return 1 if THPs were disabled completely. Now it returns the
    set flags as well: 3 if PR_THP_DISABLE_EXCEPT_ADVISED was set.

(C) Renames MMF_DISABLE_THP to MMF_DISABLE_THP_COMPLETELY, to express the
    semantics clearly. Fortunately, there are only two instances outside
    of the prctl() code.
(D) Adds MMF_DISABLE_THP_EXCEPT_ADVISED to express "no THPs except for
    VMAs with VM_HUGEPAGE" -- essentially "thp=madvise" behavior.
    Fortunately, we only have to extend vma_thp_disabled().

(E) Indicates "THP_enabled: 0" in /proc/pid/status only if THPs are
    disabled completely: we only report THPs as disabled when they are
    really disabled completely, not just partially.

For now, we don't add another interface to obtain whether THPs are
disabled partially (PR_THP_DISABLE_EXCEPT_ADVISED was set). If ever
required, we could add a new entry.

The documented semantics in the man page for PR_SET_THP_DISABLE -- "is
inherited by a child created via fork(2) and is preserved across
execve(2)" -- are maintained. This behavior, for example, allows for
disabling THPs for a workload through the launching process (e.g.,
systemd, where we fork() a helper process that then exec()s).

For now, MADV_COLLAPSE will *fail* in regions without VM_HUGEPAGE and in
regions with VM_NOHUGEPAGE. As MADV_COLLAPSE is clear advice that user
space thinks a THP is a good idea, we'll enable that separately next
(requiring a bit of cleanup first).

There is currently no way to prevent a process from issuing
PR_SET_THP_DISABLE itself to re-enable THPs. There are no real known
users for re-enabling it, and it's against the purpose of the original
interface. So if ever required, we could investigate just forbidding
re-enabling, or making this somehow configurable.
Acked-by: Usama Arif
Tested-by: Usama Arif
Signed-off-by: David Hildenbrand
Reviewed-by: Lorenzo Stoakes
Signed-off-by: Usama Arif
Acked-by: Zi Yan
---
 Documentation/filesystems/proc.rst |  5 ++-
 fs/proc/array.c                    |  2 +-
 include/linux/huge_mm.h            | 20 +++++++---
 include/linux/mm_types.h           | 14 +++----
 include/uapi/linux/prctl.h         | 10 +++++
 kernel/sys.c                       | 59 ++++++++++++++++++++++++------
 mm/khugepaged.c                    |  2 +-
 7 files changed, 83 insertions(+), 29 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 2971551b72353..915a3e44bc120 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -291,8 +291,9 @@ It's slow but very precise.
  HugetlbPages                size of hugetlb memory portions
  CoreDumping                 process's memory is currently being dumped
                              (killing the process may lead to a corrupted core)
- THP_enabled                 process is allowed to use THP (returns 0 when
-                             PR_SET_THP_DISABLE is set on the process
+ THP_enabled                 process is allowed to use THP (returns 0 when
+                             PR_SET_THP_DISABLE is set on the process to disable
+                             THP completely, not just partially)
  Threads                     number of threads
  SigQ                        number of signals queued/max. number for queue
  SigPnd                      bitmap of pending signals for the thread
diff --git a/fs/proc/array.c b/fs/proc/array.c
index c286dc12325ed..d84b291dd1ed8 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -422,7 +422,7 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
 	bool thp_enabled = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE);
 
 	if (thp_enabled)
-		thp_enabled = !mm_flags_test(MMF_DISABLE_THP, mm);
+		thp_enabled = !mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
 	seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
 }
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 84b7eebe0d685..22b8b067b295e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -318,16 +318,26 @@ struct thpsize {
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
 
 static inline bool vma_thp_disabled(struct vm_area_struct *vma,
 		vm_flags_t vm_flags)
 {
+	/* Are THPs disabled for this VMA? */
+	if (vm_flags & VM_NOHUGEPAGE)
+		return true;
+	/* Are THPs disabled for all VMAs in the whole process? */
+	if (mm_flags_test(MMF_DISABLE_THP_COMPLETELY, vma->vm_mm))
+		return true;
 	/*
-	 * Explicitly disabled through madvise or prctl, or some
-	 * architectures may disable THP for some mappings, for
-	 * example, s390 kvm.
+	 * Are THPs disabled only for VMAs where we didn't get an explicit
+	 * advise to use them?
 	 */
-	return (vm_flags & VM_NOHUGEPAGE) ||
-	       mm_flags_test(MMF_DISABLE_THP, vma->vm_mm);
+	if (vm_flags & VM_HUGEPAGE)
+		return false;
+	return mm_flags_test(MMF_DISABLE_THP_EXCEPT_ADVISED, vma->vm_mm);
 }
 
 static inline bool thp_disabled_by_hw(void)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 47d2e4598acd6..3b369dfbbedd6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1781,19 +1781,17 @@ enum {
 #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
 #define MMF_VM_HUGEPAGE		17	/* set when mm is available for khugepaged */
 
-/*
- * This one-shot flag is dropped due to necessity of changing exe once again
- * on NFS restore
- */
-//#define MMF_EXE_FILE_CHANGED	18	/* see prctl_set_mm_exe_file() */
+#define MMF_HUGE_ZERO_FOLIO	18	/* mm has ever used the global huge zero folio */
 
 #define MMF_HAS_UPROBES		19	/* has uprobes */
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
 #define MMF_OOM_SKIP		21	/* mm is of no interest for the OOM killer */
 #define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
-#define MMF_HUGE_ZERO_FOLIO	23	/* mm has ever used the global huge zero folio */
-#define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
-#define MMF_DISABLE_THP_MASK	_BITUL(MMF_DISABLE_THP)
+
+#define MMF_DISABLE_THP_EXCEPT_ADVISED	23	/* no THP except when advised (e.g., VM_HUGEPAGE) */
+#define MMF_DISABLE_THP_COMPLETELY	24	/* no THP for all VMAs */
+#define MMF_DISABLE_THP_MASK	(_BITUL(MMF_DISABLE_THP_COMPLETELY) | \
+				 _BITUL(MMF_DISABLE_THP_EXCEPT_ADVISED))
 #define MMF_OOM_REAP_QUEUED	25	/* mm was queued for oom_reaper */
 #define MMF_MULTIPROCESS	26	/* mm is shared between processes */
 /*
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index ed3aed264aeb2..150b6deebfb1e 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -177,7 +177,17 @@ struct prctl_mm_map {
 
 #define PR_GET_TID_ADDRESS	40
 
+/*
+ * Flags for PR_SET_THP_DISABLE are only applicable when disabling. Bit 0
+ * is reserved, so PR_GET_THP_DISABLE can return "1 | flags", to effectively
+ * return "1" when no flags were specified for PR_SET_THP_DISABLE.
+ */
 #define PR_SET_THP_DISABLE	41
+/*
+ * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
+ * VM_HUGEPAGE).
+ */
+# define PR_THP_DISABLE_EXCEPT_ADVISED	(1 << 1)
 #define PR_GET_THP_DISABLE	42
 
 /*
diff --git a/kernel/sys.c b/kernel/sys.c
index 605f7fe9a1432..a46d9b75880b8 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2452,6 +2452,51 @@ static int prctl_get_auxv(void __user *addr, unsigned long len)
 	return sizeof(mm->saved_auxv);
 }
 
+static int prctl_get_thp_disable(unsigned long arg2, unsigned long arg3,
+				 unsigned long arg4, unsigned long arg5)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (arg2 || arg3 || arg4 || arg5)
+		return -EINVAL;
+
+	/* If disabled, we return "1 | flags", otherwise 0. */
+	if (mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm))
+		return 1;
+	else if (mm_flags_test(MMF_DISABLE_THP_EXCEPT_ADVISED, mm))
+		return 1 | PR_THP_DISABLE_EXCEPT_ADVISED;
+	return 0;
+}
+
+static int prctl_set_thp_disable(bool thp_disable, unsigned long flags,
+				 unsigned long arg4, unsigned long arg5)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (arg4 || arg5)
+		return -EINVAL;
+
+	/* Flags are only allowed when disabling. */
+	if ((!thp_disable && flags) || (flags & ~PR_THP_DISABLE_EXCEPT_ADVISED))
+		return -EINVAL;
+	if (mmap_write_lock_killable(current->mm))
+		return -EINTR;
+	if (thp_disable) {
+		if (flags & PR_THP_DISABLE_EXCEPT_ADVISED) {
+			mm_flags_clear(MMF_DISABLE_THP_COMPLETELY, mm);
+			mm_flags_set(MMF_DISABLE_THP_EXCEPT_ADVISED, mm);
+		} else {
+			mm_flags_set(MMF_DISABLE_THP_COMPLETELY, mm);
+			mm_flags_clear(MMF_DISABLE_THP_EXCEPT_ADVISED, mm);
+		}
+	} else {
+		mm_flags_clear(MMF_DISABLE_THP_COMPLETELY, mm);
+		mm_flags_clear(MMF_DISABLE_THP_EXCEPT_ADVISED, mm);
+	}
+	mmap_write_unlock(current->mm);
+	return 0;
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2625,20 +2670,10 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		return task_no_new_privs(current) ? 1 : 0;
 	case PR_GET_THP_DISABLE:
-		if (arg2 || arg3 || arg4 || arg5)
-			return -EINVAL;
-		error = !!mm_flags_test(MMF_DISABLE_THP, me->mm);
+		error = prctl_get_thp_disable(arg2, arg3, arg4, arg5);
 		break;
 	case PR_SET_THP_DISABLE:
-		if (arg3 || arg4 || arg5)
-			return -EINVAL;
-		if (mmap_write_lock_killable(me->mm))
-			return -EINTR;
-		if (arg2)
-			mm_flags_set(MMF_DISABLE_THP, me->mm);
-		else
-			mm_flags_clear(MMF_DISABLE_THP, me->mm);
-		mmap_write_unlock(me->mm);
+		error = prctl_set_thp_disable(arg2, arg3, arg4, arg5);
 		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
 	case PR_MPX_DISABLE_MANAGEMENT:
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 550eb00116c51..1a416b8659972 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -410,7 +410,7 @@ static inline int hpage_collapse_test_exit(struct mm_struct *mm)
 static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
 {
 	return hpage_collapse_test_exit(mm) ||
-	       mm_flags_test(MMF_DISABLE_THP, mm);
+	       mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
 }
 
 static bool hugepage_pmd_enabled(void)
-- 
2.47.3

From nobody Sat Sep 27
19:24:39 2025
From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org, surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann, sj@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 2/7] mm/huge_memory: convert "tva_flags" to "enum tva_type"
Date: Fri, 15 Aug 2025 14:54:54 +0100
Message-ID: <20250815135549.130506-3-usamaarif642@gmail.com>
In-Reply-To: <20250815135549.130506-1-usamaarif642@gmail.com>
References: <20250815135549.130506-1-usamaarif642@gmail.com>

From: David Hildenbrand

When determining which THP orders are eligible for a VMA mapping, we
previously specified tva_flags; however, it turns out it is really not
necessary to treat these as flags. Rather, we distinguish between
distinct modes.
The only case where we previously combined flags was with TVA_ENFORCE_SYSFS, but we can avoid this by observing that this is the default, except for MADV_COLLAPSE or an edge cases in collapse_pte_mapped_thp() and hugepage_vma_revalidate(), and adding a mode specifically for this case - TVA_FORCED_COLLAPSE. We have: * smaps handling for showing "THPeligible" * Pagefault handling * khugepaged handling * Forced collapse handling: primarily MADV_COLLAPSE, but also for an edge case in collapse_pte_mapped_thp() Disregarding the edge cases, we only want to ignore sysfs settings only when we are forcing a collapse through MADV_COLLAPSE, otherwise we want to enforce it, hence this patch does the following flag to enum conversions: * TVA_SMAPS | TVA_ENFORCE_SYSFS -> TVA_SMAPS * TVA_IN_PF | TVA_ENFORCE_SYSFS -> TVA_PAGEFAULT * TVA_ENFORCE_SYSFS -> TVA_KHUGEPAGED * 0 -> TVA_FORCED_COLLAPSE With this change, we immediately know if we are in the forced collapse case, which will be valuable next. Signed-off-by: David Hildenbrand Acked-by: Usama Arif Signed-off-by: Usama Arif Reviewed-by: Baolin Wang Reviewed-by: Lorenzo Stoakes Reviewed-by: Zi Yan --- fs/proc/task_mmu.c | 4 ++-- include/linux/huge_mm.h | 30 ++++++++++++++++++------------ mm/huge_memory.c | 8 ++++---- mm/khugepaged.c | 17 ++++++++--------- mm/memory.c | 14 ++++++-------- 5 files changed, 38 insertions(+), 35 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index e8e7bef345313..ced01cf3c5ab3 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1369,8 +1369,8 @@ static int show_smap(struct seq_file *m, void *v) __show_smap(m, &mss, false); =20 seq_printf(m, "THPeligible: %8u\n", - !!thp_vma_allowable_orders(vma, vma->vm_flags, - TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL)); + !!thp_vma_allowable_orders(vma, vma->vm_flags, TVA_SMAPS, + THP_ORDERS_ALL)); =20 if (arch_pkeys_enabled()) seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); diff --git a/include/linux/huge_mm.h 
b/include/linux/huge_mm.h index 22b8b067b295e..92ea0b9771fae 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -94,12 +94,15 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr; #define THP_ORDERS_ALL \ (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAU= LT) =20 -#define TVA_SMAPS (1 << 0) /* Will be used for procfs */ -#define TVA_IN_PF (1 << 1) /* Page fault handler */ -#define TVA_ENFORCE_SYSFS (1 << 2) /* Obey sysfs configuration */ +enum tva_type { + TVA_SMAPS, /* Exposing "THPeligible:" in smaps. */ + TVA_PAGEFAULT, /* Serving a page fault. */ + TVA_KHUGEPAGED, /* Khugepaged collapse. */ + TVA_FORCED_COLLAPSE, /* Forced collapse (e.g. MADV_COLLAPSE). */ +}; =20 -#define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \ - (!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order))) +#define thp_vma_allowable_order(vma, vm_flags, type, order) \ + (!!thp_vma_allowable_orders(vma, vm_flags, type, BIT(order))) =20 #define split_folio(f) split_folio_to_list(f, NULL) =20 @@ -264,14 +267,14 @@ static inline unsigned long thp_vma_suitable_orders(s= truct vm_area_struct *vma, =20 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, vm_flags_t vm_flags, - unsigned long tva_flags, + enum tva_type type, unsigned long orders); =20 /** * thp_vma_allowable_orders - determine hugepage orders that are allowed f= or vma * @vma: the vm area to check * @vm_flags: use these vm_flags instead of vma->vm_flags - * @tva_flags: Which TVA flags to honour + * @type: TVA type * @orders: bitfield of all orders to consider * * Calculates the intersection of the requested hugepage orders and the al= lowed @@ -285,11 +288,14 @@ unsigned long __thp_vma_allowable_orders(struct vm_ar= ea_struct *vma, static inline unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma, vm_flags_t vm_flags, - unsigned long tva_flags, + enum tva_type type, unsigned long orders) { - /* Optimization to check if required 
orders are enabled early. */ - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) { + /* + * Optimization to check if required orders are enabled early. Only + * forced collapse ignores sysfs configs. + */ + if (type !=3D TVA_FORCED_COLLAPSE && vma_is_anonymous(vma)) { unsigned long mask =3D READ_ONCE(huge_anon_orders_always); =20 if (vm_flags & VM_HUGEPAGE) @@ -303,7 +309,7 @@ unsigned long thp_vma_allowable_orders(struct vm_area_s= truct *vma, return 0; } =20 - return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders); + return __thp_vma_allowable_orders(vma, vm_flags, type, orders); } =20 struct thpsize { @@ -547,7 +553,7 @@ static inline unsigned long thp_vma_suitable_orders(str= uct vm_area_struct *vma, =20 static inline unsigned long thp_vma_allowable_orders(struct vm_area_struct= *vma, vm_flags_t vm_flags, - unsigned long tva_flags, + enum tva_type type, unsigned long orders) { return 0; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 6df1ed0cef5cf..9c716be949cbf 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -99,12 +99,12 @@ static inline bool file_thp_enabled(struct vm_area_stru= ct *vma) =20 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, vm_flags_t vm_flags, - unsigned long tva_flags, + enum tva_type type, unsigned long orders) { - bool smaps =3D tva_flags & TVA_SMAPS; - bool in_pf =3D tva_flags & TVA_IN_PF; - bool enforce_sysfs =3D tva_flags & TVA_ENFORCE_SYSFS; + const bool smaps =3D type =3D=3D TVA_SMAPS; + const bool in_pf =3D type =3D=3D TVA_PAGEFAULT; + const bool enforce_sysfs =3D type !=3D TVA_FORCED_COLLAPSE; unsigned long supported_orders; =20 /* Check the intersection of requested and supported orders. 
*/ diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 1a416b8659972..d3d4f116e14b6 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -474,8 +474,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, { if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) && hugepage_pmd_enabled()) { - if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS, - PMD_ORDER)) + if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) __khugepaged_enter(vma->vm_mm); } } @@ -921,7 +920,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm= , unsigned long address, struct collapse_control *cc) { struct vm_area_struct *vma; - unsigned long tva_flags =3D cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0; + enum tva_type type =3D cc->is_khugepaged ? TVA_KHUGEPAGED : + TVA_FORCED_COLLAPSE; =20 if (unlikely(hpage_collapse_test_exit_or_disable(mm))) return SCAN_ANY_PROCESS; @@ -932,7 +932,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm= , unsigned long address, =20 if (!thp_vma_suitable_order(vma, address, PMD_ORDER)) return SCAN_ADDRESS_RANGE; - if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER)) + if (!thp_vma_allowable_order(vma, vma->vm_flags, type, PMD_ORDER)) return SCAN_VMA_CHECK; /* * Anon VMA expected, the address may be unmapped then @@ -1533,9 +1533,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, uns= igned long addr, * in the page cache with a single hugepage. If a mm were to fault-in * this memory (mapped by a suitably aligned VMA), we'd get the hugepage * and map it by a PMD, regardless of sysfs THP settings. As such, let's - * analogously elide sysfs THP settings here. + * analogously elide sysfs THP settings here and force collapse. 
*/ - if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER)) + if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD= _ORDER)) return SCAN_VMA_CHECK; =20 /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ @@ -2432,8 +2432,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned = int pages, int *result, progress++; break; } - if (!thp_vma_allowable_order(vma, vma->vm_flags, - TVA_ENFORCE_SYSFS, PMD_ORDER)) { + if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORD= ER)) { skip: progress++; continue; @@ -2767,7 +2766,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsi= gned long start, BUG_ON(vma->vm_start > start); BUG_ON(vma->vm_end < end); =20 - if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER)) + if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD= _ORDER)) return -EINVAL; =20 cc =3D kmalloc(sizeof(*cc), GFP_KERNEL); diff --git a/mm/memory.c b/mm/memory.c index 002c28795d8b7..7b1e8f137fa3f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4515,8 +4515,8 @@ static struct folio *alloc_swap_folio(struct vm_fault= *vmf) * Get a list of all the (large) orders below PMD_ORDER that are enabled * and suitable for swapping THP. */ - orders =3D thp_vma_allowable_orders(vma, vma->vm_flags, - TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1); + orders =3D thp_vma_allowable_orders(vma, vma->vm_flags, TVA_PAGEFAULT, + BIT(PMD_ORDER) - 1); orders =3D thp_vma_suitable_orders(vma, vmf->address, orders); orders =3D thp_swap_suitable_orders(swp_offset(entry), vmf->address, orders); @@ -5063,8 +5063,8 @@ static struct folio *alloc_anon_folio(struct vm_fault= *vmf) * for this vma. Then filter out the orders that can't be allocated over * the faulting address and still be fully contained in the vma. 
 	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
-			TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
+	orders = thp_vma_allowable_orders(vma, vma->vm_flags, TVA_PAGEFAULT,
+			BIT(PMD_ORDER) - 1);
 	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
 
 	if (!orders)
@@ -6254,8 +6254,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		return VM_FAULT_OOM;
 retry_pud:
 	if (pud_none(*vmf.pud) &&
-	    thp_vma_allowable_order(vma, vm_flags,
-				    TVA_IN_PF | TVA_ENFORCE_SYSFS, PUD_ORDER)) {
+	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PUD_ORDER)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
@@ -6289,8 +6288,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		goto retry_pud;
 
 	if (pmd_none(*vmf.pmd) &&
-	    thp_vma_allowable_order(vma, vm_flags,
-				    TVA_IN_PF | TVA_ENFORCE_SYSFS, PMD_ORDER)) {
+	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
-- 
2.47.3

From nobody Sat Sep 27 19:24:39 2025
From: Usama Arif
To: Andrew Morton , david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org, surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
vbabka@suse.cz, jannh@google.com, Arnd Bergmann , sj@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 3/7] mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED
Date: Fri, 15 Aug 2025 14:54:55 +0100
Message-ID: <20250815135549.130506-4-usamaarif642@gmail.com>
In-Reply-To: <20250815135549.130506-1-usamaarif642@gmail.com>
References: <20250815135549.130506-1-usamaarif642@gmail.com>

From: David Hildenbrand

Let's allow MADV_COLLAPSE to succeed on areas that have neither
VM_HUGEPAGE nor VM_NOHUGEPAGE set when THP is disabled unless explicitly
advised (PR_THP_DISABLE_EXCEPT_ADVISED). MADV_COLLAPSE is clear advice
that we want to collapse.

Note that we still respect the VM_NOHUGEPAGE flag, just like
MADV_COLLAPSE always does. Consequently, with
PR_THP_DISABLE_EXCEPT_ADVISED, MADV_COLLAPSE is now only refused on
VM_NOHUGEPAGE areas, including for shmem.

Co-developed-by: Usama Arif
Signed-off-by: Usama Arif
Signed-off-by: David Hildenbrand
Reviewed-by: Baolin Wang
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Zi Yan
---
 include/linux/huge_mm.h    | 8 +++++++-
 include/uapi/linux/prctl.h | 2 +-
 mm/huge_memory.c           | 5 +++--
 mm/memory.c                | 6 ++++--
 mm/shmem.c                 | 2 +-
 5 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 92ea0b9771fae..1ac0d06fb3c1d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -329,7 +329,7 @@ struct thpsize {
  * through madvise or prctl.
  */
 static inline bool vma_thp_disabled(struct vm_area_struct *vma,
-		vm_flags_t vm_flags)
+		vm_flags_t vm_flags, bool forced_collapse)
 {
 	/* Are THPs disabled for this VMA?
 	 */
 	if (vm_flags & VM_NOHUGEPAGE)
@@ -343,6 +343,12 @@ static inline bool vma_thp_disabled(struct vm_area_struct *vma,
 	 */
 	if (vm_flags & VM_HUGEPAGE)
 		return false;
+	/*
+	 * Forcing a collapse (e.g., madv_collapse), is a clear advice to
+	 * use THPs.
+	 */
+	if (forced_collapse)
+		return false;
 	return mm_flags_test(MMF_DISABLE_THP_EXCEPT_ADVISED, vma->vm_mm);
 }
 
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 150b6deebfb1e..51c4e8c82b1e9 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -185,7 +185,7 @@ struct prctl_mm_map {
 #define PR_SET_THP_DISABLE	41
 /*
  * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
- * VM_HUGEPAGE).
+ * VM_HUGEPAGE, MADV_COLLAPSE).
  */
 # define PR_THP_DISABLE_EXCEPT_ADVISED	(1 << 1)
 #define PR_GET_THP_DISABLE	42
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c716be949cbf..1eca2d543449c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -104,7 +104,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 {
 	const bool smaps = type == TVA_SMAPS;
 	const bool in_pf = type == TVA_PAGEFAULT;
-	const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE;
+	const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
+	const bool enforce_sysfs = !forced_collapse;
 	unsigned long supported_orders;
 
 	/* Check the intersection of requested and supported orders. */
@@ -122,7 +123,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	if (!vma->vm_mm)		/* vdso */
 		return 0;
 
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags, forced_collapse))
 		return 0;
 
 	/* khugepaged doesn't collapse DAX vma, but page fault is fine.
 	 */
diff --git a/mm/memory.c b/mm/memory.c
index 7b1e8f137fa3f..d9de6c0561794 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5332,9 +5332,11 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio, struct page *page)
 	 * It is too late to allocate a small folio, we already have a large
 	 * folio in the pagecache: especially s390 KVM cannot tolerate any
 	 * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
-	 * PMD mappings if THPs are disabled.
+	 * PMD mappings if THPs are disabled. As we already have a THP,
+	 * behave as if we are forcing a collapse.
 	 */
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags,
+						     /* forced_collapse=*/ true))
 		return ret;
 
 	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
diff --git a/mm/shmem.c b/mm/shmem.c
index e2c76a30802b6..d945de3a7f0e7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1817,7 +1817,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	vm_flags_t vm_flags = vma ? vma->vm_flags : 0;
 	unsigned int global_orders;
 
-	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
+	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags, shmem_huge_force)))
 		return 0;
 
 	global_orders = shmem_huge_global_enabled(inode, index, write_end,
-- 
2.47.3

From nobody Sat Sep 27 19:24:39 2025
From: Usama Arif
To: Andrew Morton , david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org, surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann , sj@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 4/7] docs: transhuge: document process level THP controls
Date: Fri, 15 Aug 2025 14:54:56 +0100
Message-ID: <20250815135549.130506-5-usamaarif642@gmail.com>
In-Reply-To: <20250815135549.130506-1-usamaarif642@gmail.com>
References: <20250815135549.130506-1-usamaarif642@gmail.com>

This includes the
PR_SET_THP_DISABLE/PR_GET_THP_DISABLE pair of prctl calls, as well as the
newly introduced PR_THP_DISABLE_EXCEPT_ADVISED flag for the
PR_SET_THP_DISABLE prctl call.

Signed-off-by: Usama Arif
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Zi Yan
---
 Documentation/admin-guide/mm/transhuge.rst | 36 ++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 370fba1134606..a16a04841b960 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -225,6 +225,42 @@ to "always" or "madvise"), and it'll be automatically shutdown when PMD-sized
 THP is disabled (when both the per-size anon control and the top-level
 control are "never")
 
+process THP controls
+--------------------
+
+A process can control its own THP behaviour using the ``PR_SET_THP_DISABLE``
+and ``PR_GET_THP_DISABLE`` pair of prctl(2) calls. The THP behaviour set using
+``PR_SET_THP_DISABLE`` is inherited across fork(2) and execve(2). These calls
+support the following arguments::
+
+  prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0):
+    This will disable THPs completely for the process, irrespective
+    of global THP controls or madvise(..., MADV_COLLAPSE) being used.
+
+  prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0):
+    This will disable THPs for the process except when the usage of THPs is
+    advised. Consequently, THPs will only be used when:
+    - Global THP controls are set to "always" or "madvise" and
+      madvise(..., MADV_HUGEPAGE) or madvise(..., MADV_COLLAPSE) is used.
+    - Global THP controls are set to "never" and madvise(..., MADV_COLLAPSE)
+      is used. This is the same behavior as if THPs would not be disabled on
+      a process level.
+    Note that MADV_COLLAPSE is currently always rejected if
+    madvise(..., MADV_NOHUGEPAGE) is set on an area.
+
+  prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0):
+    This will re-enable THPs for the process, as if they were never disabled.
+    Whether THPs will actually be used depends on global THP controls and
+    madvise() calls.
+
+  prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0):
+    This returns a value whose bits indicate how THP-disable is configured:
+
+    Bits
+    1 0   Value   Description
+    |0|0|   0     No THP-disable behaviour specified.
+    |0|1|   1     THP is entirely disabled for this process.
+    |1|1|   3     THP-except-advised mode is set for this process.
+
 Khugepaged controls
 -------------------
 
-- 
2.47.3

From nobody Sat Sep 27 19:24:39 2025
From: Usama Arif
To: Andrew Morton , david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org, surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann , sj@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 5/7] selftest/mm: Extract sz2ord function into vm_util.h
Date: Fri, 15 Aug 2025 14:54:57 +0100
Message-ID: <20250815135549.130506-6-usamaarif642@gmail.com>
In-Reply-To: <20250815135549.130506-1-usamaarif642@gmail.com>
References: <20250815135549.130506-1-usamaarif642@gmail.com>
The sz2ord() function already has two uses and will gain a third in the
prctl selftests. A pagesize argument is added to the function, since
pagesize is no longer a global variable there. No functional change
intended with this patch.

Suggested-by: David Hildenbrand
Signed-off-by: Usama Arif
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Zi Yan
---
 tools/testing/selftests/mm/cow.c            | 12 ++++--------
 tools/testing/selftests/mm/uffd-wp-mremap.c |  9 ++-------
 tools/testing/selftests/mm/vm_util.h        |  5 +++++
 3 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index 90ee5779662f3..a568fe629b094 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -41,10 +41,6 @@ static size_t hugetlbsizes[10];
 static int gup_fd;
 static bool has_huge_zeropage;
 
-static int sz2ord(size_t size)
-{
-	return __builtin_ctzll(size / pagesize);
-}
 
 static int detect_thp_sizes(size_t sizes[], int max)
 {
@@ -57,7 +53,7 @@ static int detect_thp_sizes(size_t sizes[], int max)
 	if (!pmdsize)
 		return 0;
 
-	orders = 1UL << sz2ord(pmdsize);
+	orders = 1UL << sz2ord(pmdsize, pagesize);
 	orders |= thp_supported_orders();
 
 	for (i = 0; orders && count < max; i++) {
@@ -1216,8 +1212,8 @@ static void run_anon_test_case(struct test_case const *test_case)
 		size_t size = thpsizes[i];
 		struct thp_settings settings = *thp_current_settings();
 
-		settings.hugepages[sz2ord(pmdsize)].enabled = THP_NEVER;
-		settings.hugepages[sz2ord(size)].enabled = THP_ALWAYS;
+		settings.hugepages[sz2ord(pmdsize, pagesize)].enabled = THP_NEVER;
+		settings.hugepages[sz2ord(size, pagesize)].enabled = THP_ALWAYS;
 		thp_push_settings(&settings);
 
 		if (size == pmdsize) {
@@ -1868,7 +1864,7 @@ int main(void)
 	if (pmdsize) {
 		/* Only if THP is supported. */
 		thp_read_settings(&default_settings);
-		default_settings.hugepages[sz2ord(pmdsize)].enabled = THP_INHERIT;
+		default_settings.hugepages[sz2ord(pmdsize, pagesize)].enabled = THP_INHERIT;
 		thp_save_settings();
 		thp_push_settings(&default_settings);
 
diff --git a/tools/testing/selftests/mm/uffd-wp-mremap.c b/tools/testing/selftests/mm/uffd-wp-mremap.c
index 13ceb56289701..b2b6116e65808 100644
--- a/tools/testing/selftests/mm/uffd-wp-mremap.c
+++ b/tools/testing/selftests/mm/uffd-wp-mremap.c
@@ -19,11 +19,6 @@ static size_t thpsizes[20];
 static int nr_hugetlbsizes;
 static size_t hugetlbsizes[10];
 
-static int sz2ord(size_t size)
-{
-	return __builtin_ctzll(size / pagesize);
-}
-
 static int detect_thp_sizes(size_t sizes[], int max)
 {
 	int count = 0;
@@ -87,9 +82,9 @@ static void *alloc_one_folio(size_t size, bool private, bool hugetlb)
 		struct thp_settings settings = *thp_current_settings();
 
 		if (private)
-			settings.hugepages[sz2ord(size)].enabled = THP_ALWAYS;
+			settings.hugepages[sz2ord(size, pagesize)].enabled = THP_ALWAYS;
 		else
-			settings.shmem_hugepages[sz2ord(size)].enabled = SHMEM_ALWAYS;
+			settings.shmem_hugepages[sz2ord(size, pagesize)].enabled = SHMEM_ALWAYS;
 
 		thp_push_settings(&settings);
 
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 148b792cff0fc..e5cb72bf3a2ab 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -135,6 +135,11 @@ static inline void log_test_result(int result)
 	ksft_test_result_report(result, "%s\n", test_name);
 }
 
+static inline int sz2ord(size_t size, size_t pagesize)
+{
+	return __builtin_ctzll(size / pagesize);
+}
+
 void *sys_mremap(void *old_address, unsigned long old_size,
 		 unsigned long new_size, int flags, void *new_address);
 
-- 
2.47.3

From nobody Sat Sep 27 19:24:39 2025
ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95BA3305E3C; Fri, 15 Aug 2025 13:56:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755266166; cv=none; b=ue3Dtgr1YdwaHl2F1TczM9u8EC4YND+NQgnrvCyoTYSmr2DGDKtTW6dm/jp1f8sAZ2Y5Y1NXy/jK0hs9IpHHk853TaDBJglmRX/zH53ndJzL3xzwXBvEHoDFW8TNeG8XFauf66OeaCfDxIhYC7BSfkOHPUxrVtOgK/PFSnOnmgo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755266166; c=relaxed/simple; bh=9Nh5dCrF7vmkiPO6+SZKBUdCvfvTaXBCeQl+556ey5A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hWGBQnFvqG8eCvDmDS++LX6A9dwlqJ3vZd9hDxE0tcO49XdkAf8to5WeenQnNj8oqj6/KroDW1oafS+SnRYAAISEUp6q7dtUHY3va6wDUl/f2/n2ymXlcODtljmSWJZ7XVg81ahNtwlobqrxLrCompMc9SryzdiTnvfphP6KKcA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=kA4lBaui; arc=none smtp.client-ip=209.85.219.48 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="kA4lBaui" Received: by mail-qv1-f48.google.com with SMTP id 6a1803df08f44-70a88ddb1a2so18633756d6.0; Fri, 15 Aug 2025 06:56:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1755266163; x=1755870963; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ThKI922dULUmYFiaW3lL2ZDEHLRMIx7W7/OmILgNixg=; 
b=kA4lBauixBhyLcAKgZeDSW5i89lXD8ocvkZw6eNWZ20J3bawHCgwSi6WVGUV5fTSr1 Ok2yAcqJ1swHLDUhQN+oW4xwvgyegD1EE6NQePRC1csZQHQ58gPwKaRPBhKCzx2FW+Jw P0BOaVjj69WdxwI9OdLRT3GioQW9jF9BwyKYIXLS4cvWWknA/JgsuU0wubH/TR2NMfn/ Ins/AlfOq9BSN45NVh53YsfMWUn9aMX0rB+LT8vh3v7OoS62c0igL6vQd+BQ98qBkPhA vSjmkqFfGs7qd/D+98UcmVTNPKwUYbOh5VL7BO7aLGOvJZ7bnUXJZjyIeHPkm7Fq76wZ OaPA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755266163; x=1755870963; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ThKI922dULUmYFiaW3lL2ZDEHLRMIx7W7/OmILgNixg=; b=UCG6eJOUlaQUj1t8TyaEX6hdL81tGXFCdongk00Gt6ElBUTSrhF2Kqi0ljuMGSEtvm uxb0wmXZKSZbVWtT0qSuigjS/3n4DBIPwLJrKpcTf0Kugjsh3jJlFEro7G5R+2A/3DZR STmAyfZb3l9M18JVR+p4p2hQDXxY6CTzCB37rZbKD/9/ORFiEdOgWv+4fTJlgvvkR+uo TdupIix/mu45u3EXhhp4WE4CinY4dpsKs9uhGNc5irJVa7lmEPYB2oMsL5cwi0+XZQUE XAIF3yAqJb3Y0FlgAc7HImgaDglec55i+CN/KoTPU9IxC5tJqjBL+Uc/Vl98VFI6i6fh lRDA== X-Forwarded-Encrypted: i=1; AJvYcCWxF/v+B6PD8Bz7cPwVgdxrK4chGvdJF12ml41hTuVbyIkir7o96HXpo0NCFpON3PPRrXmh0KIPwnTUQzdD@vger.kernel.org, AJvYcCXhcZngeMJZUGrLRzpEXwBHhMOJ2K/I5XiYhr9ejMGLI2mghlEOpWuAp+trWoObo5i3MXFr4HTbTQE=@vger.kernel.org X-Gm-Message-State: AOJu0YxOywqR+0BLoezu1IgAogNHI1DUnoHfIX6vFKGvi5/b78q0RbQR HnLqjUrXX7odUggSoonjw3K9vNkTFgTHVpSpmqrNfyr7LnlJyJ+3cUd1 X-Gm-Gg: ASbGncuog93Q0VtcZR5DKtTVrBl17eu1kzJ30ZZZ55e+COkqKWA89HZBOTsOSZU/ujw pE+v66PiR5+HPb2GKJCjy+yTud88VNjjs/gLHQzcqoUlfyGr4txbHzlHfgRVvrI0R4wn/bGJch5 QUG54Um+YCxANlIlyGR5yoGGoFOG4vLg8Hg302svGQ+Jw3ebL1jpLbFG2VAMM+g9eul1FnwUjd4 qRJQ9xkGNAE6ZMG+t91Ek+yIeA7DLO2D5cr5elimukptwr/wre7l0WMJKUGtqKOe0X+BE96ZPZd l5agOrvSjXdwU+C1BxoGLz4XYiFleYHmPHQN1qJpxxrRRrUznq7asfDYXgIr33Mg0putnXxIVld dS22TXCkakaXW++mLhY6L X-Google-Smtp-Source: AGHT+IG+TCWxSTubxOqPY0sQAGowCS1LkSUfiep5FDN5JBRpx5/efo7iB20uzg87MScqVq8pUB5pxg== X-Received: by 2002:a05:6214:246c:b0:709:7345:9aa3 with SMTP id 
From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org,
    surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org,
    shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com,
    laoar.shao@gmail.com, dev.jain@arm.com, baolin.wang@linux.alibaba.com,
    npache@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
    ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann,
    sj@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 6/7] selftests: prctl: introduce tests for disabling THPs completely
Date: Fri, 15 Aug 2025 14:54:58 +0100
Message-ID: <20250815135549.130506-7-usamaarif642@gmail.com>
In-Reply-To: <20250815135549.130506-1-usamaarif642@gmail.com>
References: <20250815135549.130506-1-usamaarif642@gmail.com>

The test will set the global system THP setting to never, madvise or
always depending on the fixture variant, and the 2M setting to inherit,
before it starts (and resets both to the original values at teardown).
The fixture setup will also test whether a PR_SET_THP_DISABLE prctl call
can be made to disable all THPs, and skip if it fails.

This tests if the process can:
- successfully get the policy to disable THPs completely.
- never get a hugepage when THPs are completely disabled with the
  prctl, including with MADV_HUGE and MADV_COLLAPSE.
- successfully reset the policy of the process.
- after reset, only get hugepages with:
  - MADV_COLLAPSE when policy is set to never.
  - MADV_HUGE and MADV_COLLAPSE when policy is set to madvise.
  - always when policy is set to "always".
- never get a THP with MADV_NOHUGEPAGE.
- repeat the above tests in a forked process to make sure the policy is
  carried across forks.

Signed-off-by: Usama Arif
Acked-by: David Hildenbrand
Reviewed-by: Lorenzo Stoakes
---
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../testing/selftests/mm/prctl_thp_disable.c  | 175 ++++++++++++++++++
 tools/testing/selftests/mm/thp_settings.c     |   9 +-
 tools/testing/selftests/mm/thp_settings.h     |   1 +
 5 files changed, 186 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/mm/prctl_thp_disable.c

diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index e7b23a8a05fe2..eb023ea857b31 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -58,3 +58,4 @@ pkey_sighandler_tests_32
 pkey_sighandler_tests_64
 guard-regions
 merge
+prctl_thp_disable
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index d75f1effcb791..bd5d17beafa64 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -87,6 +87,7 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += pagemap_ioctl
 TEST_GEN_FILES += pfnmap
 TEST_GEN_FILES += process_madv
+TEST_GEN_FILES += prctl_thp_disable
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += uffd-stress
diff --git a/tools/testing/selftests/mm/prctl_thp_disable.c b/tools/testing/selftests/mm/prctl_thp_disable.c
new file mode 100644
index 0000000000000..e9e519c85224c
--- /dev/null
+++ b/tools/testing/selftests/mm/prctl_thp_disable.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Basic tests for PR_GET/SET_THP_DISABLE prctl calls
+ *
+ * Author(s): Usama Arif
+ */
+#include <errno.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#include "../kselftest_harness.h"
+#include "thp_settings.h"
+#include "vm_util.h"
+
+enum thp_collapse_type {
+	THP_COLLAPSE_NONE,
+	THP_COLLAPSE_MADV_NOHUGEPAGE,
+	THP_COLLAPSE_MADV_HUGEPAGE,	/* MADV_HUGEPAGE before access */
+	THP_COLLAPSE_MADV_COLLAPSE,	/* MADV_COLLAPSE after access */
+};
+
+/*
+ * Function to mmap a buffer, fault it in, madvise it appropriately (before
+ * page fault for MADV_HUGE, and after for MADV_COLLAPSE), and check if the
+ * mmap region is huge.
+ * Returns:
+ * 0 if test doesn't give hugepage
+ * 1 if test gives a hugepage
+ * -errno if mmap fails
+ */
+static int test_mmap_thp(enum thp_collapse_type madvise_buf, size_t pmdsize)
+{
+	char *mem, *mmap_mem;
+	size_t mmap_size;
+	int ret;
+
+	/* For alignment purposes, we need twice the THP size. */
+	mmap_size = 2 * pmdsize;
+	mmap_mem = (char *)mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (mmap_mem == MAP_FAILED)
+		return -errno;
+
+	/* We need a THP-aligned memory area. */
+	mem = (char *)(((uintptr_t)mmap_mem + pmdsize) & ~(pmdsize - 1));
+
+	if (madvise_buf == THP_COLLAPSE_MADV_HUGEPAGE)
+		madvise(mem, pmdsize, MADV_HUGEPAGE);
+	else if (madvise_buf == THP_COLLAPSE_MADV_NOHUGEPAGE)
+		madvise(mem, pmdsize, MADV_NOHUGEPAGE);
+
+	/* Ensure memory is allocated */
+	memset(mem, 1, pmdsize);
+
+	if (madvise_buf == THP_COLLAPSE_MADV_COLLAPSE)
+		madvise(mem, pmdsize, MADV_COLLAPSE);
+
+	/* HACK: make sure we have a separate VMA that we can check reliably. */
+	mprotect(mem, pmdsize, PROT_READ);
+
+	ret = check_huge_anon(mem, 1, pmdsize);
+	munmap(mmap_mem, mmap_size);
+	return ret;
+}
+
+static void prctl_thp_disable_completely_test(struct __test_metadata *const _metadata,
+					      size_t pmdsize,
+					      enum thp_enabled thp_policy)
+{
+	ASSERT_EQ(prctl(PR_GET_THP_DISABLE, NULL, NULL, NULL, NULL), 1);
+
+	/* tests after prctl overrides global policy */
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_NONE, pmdsize), 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_NOHUGEPAGE, pmdsize), 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_HUGEPAGE, pmdsize), 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_COLLAPSE, pmdsize), 0);
+
+	/* Reset to global policy */
+	ASSERT_EQ(prctl(PR_SET_THP_DISABLE, 0, NULL, NULL, NULL), 0);
+
+	/* tests after prctl is cleared, and only global policy is effective */
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_NONE, pmdsize),
+		  thp_policy == THP_ALWAYS ? 1 : 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_NOHUGEPAGE, pmdsize), 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_HUGEPAGE, pmdsize),
+		  thp_policy == THP_NEVER ? 0 : 1);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_COLLAPSE, pmdsize), 1);
+}
+
+FIXTURE(prctl_thp_disable_completely)
+{
+	struct thp_settings settings;
+	size_t pmdsize;
+};
+
+FIXTURE_VARIANT(prctl_thp_disable_completely)
+{
+	enum thp_enabled thp_policy;
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_completely, never)
+{
+	.thp_policy = THP_NEVER,
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_completely, madvise)
+{
+	.thp_policy = THP_MADVISE,
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_completely, always)
+{
+	.thp_policy = THP_ALWAYS,
+};
+
+FIXTURE_SETUP(prctl_thp_disable_completely)
+{
+	if (!thp_available())
+		SKIP(return, "Transparent Hugepages not available\n");
+
+	self->pmdsize = read_pmd_pagesize();
+	if (!self->pmdsize)
+		SKIP(return, "Unable to read PMD size\n");
+
+	if (prctl(PR_SET_THP_DISABLE, 1, NULL, NULL, NULL))
+		SKIP(return, "Unable to disable THPs completely for the process\n");
+
+	thp_save_settings();
+	thp_read_settings(&self->settings);
+	self->settings.thp_enabled = variant->thp_policy;
+	self->settings.hugepages[sz2ord(self->pmdsize, getpagesize())].enabled = THP_INHERIT;
+	thp_write_settings(&self->settings);
+}
+
+FIXTURE_TEARDOWN(prctl_thp_disable_completely)
+{
+	thp_restore_settings();
+}
+
+TEST_F(prctl_thp_disable_completely, nofork)
+{
+	prctl_thp_disable_completely_test(_metadata, self->pmdsize, variant->thp_policy);
+}
+
+TEST_F(prctl_thp_disable_completely, fork)
+{
+	int ret = 0;
+	pid_t pid;
+
+	/* Make sure prctl changes are carried across fork */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (!pid)
+		prctl_thp_disable_completely_test(_metadata, self->pmdsize, variant->thp_policy);
+
+	wait(&ret);
+	if (WIFEXITED(ret))
+		ret = WEXITSTATUS(ret);
+	else
+		ret = -EINVAL;
+	ASSERT_EQ(ret, 0);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/mm/thp_settings.c b/tools/testing/selftests/mm/thp_settings.c
index bad60ac52874a..574bd0f8ae480 100644
--- a/tools/testing/selftests/mm/thp_settings.c
+++ b/tools/testing/selftests/mm/thp_settings.c
@@ -382,10 +382,17 @@ unsigned long thp_shmem_supported_orders(void)
 	return __thp_supported_orders(true);
 }
 
-bool thp_is_enabled(void)
+bool thp_available(void)
 {
 	if (access(THP_SYSFS, F_OK) != 0)
 		return false;
+	return true;
+}
+
+bool thp_is_enabled(void)
+{
+	if (!thp_available())
+		return false;
 
 	int mode = thp_read_string("enabled", thp_enabled_strings);
 
diff --git a/tools/testing/selftests/mm/thp_settings.h b/tools/testing/selftests/mm/thp_settings.h
index 6c07f70beee97..76eeb712e5f10 100644
--- a/tools/testing/selftests/mm/thp_settings.h
+++ b/tools/testing/selftests/mm/thp_settings.h
@@ -84,6 +84,7 @@ void thp_set_read_ahead_path(char *path);
 unsigned long thp_supported_orders(void);
 unsigned long thp_shmem_supported_orders(void);
 
+bool thp_available(void);
 bool thp_is_enabled(void);
 
 #endif /* __THP_SETTINGS_H__ */
-- 
2.47.3

From nobody Sat Sep 27 19:24:39 2025
From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org,
    surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org,
    shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com,
    laoar.shao@gmail.com, dev.jain@arm.com, baolin.wang@linux.alibaba.com,
    npache@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
    ryan.roberts@arm.com,
    vbabka@suse.cz, jannh@google.com, Arnd Bergmann, sj@kernel.org,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 7/7] selftests: prctl: introduce tests for disabling THPs except for madvise
Date: Fri, 15 Aug 2025 14:54:59 +0100
Message-ID: <20250815135549.130506-8-usamaarif642@gmail.com>
In-Reply-To: <20250815135549.130506-1-usamaarif642@gmail.com>
References: <20250815135549.130506-1-usamaarif642@gmail.com>

The test will set the global system THP setting to never, madvise or
always depending on the fixture variant, and the 2M setting to inherit,
before it starts (and resets both to the original values at teardown).
The fixture setup will also test whether a PR_SET_THP_DISABLE prctl call
can be made with PR_THP_DISABLE_EXCEPT_ADVISED, and skip if it fails.

This tests if the process can:
- successfully get the policy to disable THPs except for madvise.
- get hugepages only on MADV_HUGE and MADV_COLLAPSE if the global
  policy is madvise/always, and only with MADV_COLLAPSE if the global
  policy is never.
- successfully reset the policy of the process.
- after reset, only get hugepages with:
  - MADV_COLLAPSE when policy is set to never.
  - MADV_HUGE and MADV_COLLAPSE when policy is set to madvise.
  - always when policy is set to "always".
- never get a THP with MADV_NOHUGEPAGE.
- repeat the above tests in a forked process to make sure the policy is
  carried across forks.
Test results:

  ./prctl_thp_disable
  TAP version 13
  1..12
  ok 1 prctl_thp_disable_completely.never.nofork
  ok 2 prctl_thp_disable_completely.never.fork
  ok 3 prctl_thp_disable_completely.madvise.nofork
  ok 4 prctl_thp_disable_completely.madvise.fork
  ok 5 prctl_thp_disable_completely.always.nofork
  ok 6 prctl_thp_disable_completely.always.fork
  ok 7 prctl_thp_disable_except_madvise.never.nofork
  ok 8 prctl_thp_disable_except_madvise.never.fork
  ok 9 prctl_thp_disable_except_madvise.madvise.nofork
  ok 10 prctl_thp_disable_except_madvise.madvise.fork
  ok 11 prctl_thp_disable_except_madvise.always.nofork
  ok 12 prctl_thp_disable_except_madvise.always.fork

Signed-off-by: Usama Arif
Acked-by: David Hildenbrand
---
 .../testing/selftests/mm/prctl_thp_disable.c  | 111 ++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/tools/testing/selftests/mm/prctl_thp_disable.c b/tools/testing/selftests/mm/prctl_thp_disable.c
index e9e519c85224c..77c53a91124f1 100644
--- a/tools/testing/selftests/mm/prctl_thp_disable.c
+++ b/tools/testing/selftests/mm/prctl_thp_disable.c
@@ -16,6 +16,10 @@
 #include "thp_settings.h"
 #include "vm_util.h"
 
+#ifndef PR_THP_DISABLE_EXCEPT_ADVISED
+#define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1)
+#endif
+
 enum thp_collapse_type {
 	THP_COLLAPSE_NONE,
 	THP_COLLAPSE_MADV_NOHUGEPAGE,
@@ -172,4 +176,111 @@ TEST_F(prctl_thp_disable_completely, fork)
 	ASSERT_EQ(ret, 0);
 }
 
+static void prctl_thp_disable_except_madvise_test(struct __test_metadata *const _metadata,
+						  size_t pmdsize,
+						  enum thp_enabled thp_policy)
+{
+	ASSERT_EQ(prctl(PR_GET_THP_DISABLE, NULL, NULL, NULL, NULL), 3);
+
+	/* tests after prctl overrides global policy */
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_NONE, pmdsize), 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_NOHUGEPAGE, pmdsize), 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_HUGEPAGE, pmdsize),
+		  thp_policy == THP_NEVER ? 0 : 1);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_COLLAPSE, pmdsize), 1);
+
+	/* Reset to global policy */
+	ASSERT_EQ(prctl(PR_SET_THP_DISABLE, 0, NULL, NULL, NULL), 0);
+
+	/* tests after prctl is cleared, and only global policy is effective */
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_NONE, pmdsize),
+		  thp_policy == THP_ALWAYS ? 1 : 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_NOHUGEPAGE, pmdsize), 0);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_HUGEPAGE, pmdsize),
+		  thp_policy == THP_NEVER ? 0 : 1);
+
+	ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_COLLAPSE, pmdsize), 1);
+}
+
+FIXTURE(prctl_thp_disable_except_madvise)
+{
+	struct thp_settings settings;
+	size_t pmdsize;
+};
+
+FIXTURE_VARIANT(prctl_thp_disable_except_madvise)
+{
+	enum thp_enabled thp_policy;
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_except_madvise, never)
+{
+	.thp_policy = THP_NEVER,
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_except_madvise, madvise)
+{
+	.thp_policy = THP_MADVISE,
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_except_madvise, always)
+{
+	.thp_policy = THP_ALWAYS,
+};
+
+FIXTURE_SETUP(prctl_thp_disable_except_madvise)
+{
+	if (!thp_available())
+		SKIP(return, "Transparent Hugepages not available\n");
+
+	self->pmdsize = read_pmd_pagesize();
+	if (!self->pmdsize)
+		SKIP(return, "Unable to read PMD size\n");
+
+	if (prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, NULL, NULL))
+		SKIP(return, "Unable to set PR_THP_DISABLE_EXCEPT_ADVISED\n");
+
+	thp_save_settings();
+	thp_read_settings(&self->settings);
+	self->settings.thp_enabled = variant->thp_policy;
+	self->settings.hugepages[sz2ord(self->pmdsize, getpagesize())].enabled = THP_INHERIT;
+	thp_write_settings(&self->settings);
+}
+
+FIXTURE_TEARDOWN(prctl_thp_disable_except_madvise)
+{
+	thp_restore_settings();
+}
+
+TEST_F(prctl_thp_disable_except_madvise, nofork)
+{
+	prctl_thp_disable_except_madvise_test(_metadata, self->pmdsize, variant->thp_policy);
+}
+
+TEST_F(prctl_thp_disable_except_madvise, fork)
+{
+	int ret = 0;
+	pid_t pid;
+
+	/* Make sure prctl changes are carried across fork */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (!pid)
+		prctl_thp_disable_except_madvise_test(_metadata, self->pmdsize,
+						      variant->thp_policy);
+
+	wait(&ret);
+	if (WIFEXITED(ret))
+		ret = WEXITSTATUS(ret);
+	else
+		ret = -EINVAL;
+	ASSERT_EQ(ret, 0);
+}
+
 TEST_HARNESS_MAIN
-- 
2.47.3