From nobody Sun Oct 5 16:18:50 2025
From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org,
	baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com,
	ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com,
	Arnd Bergmann, sj@kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif,
	Matthew Wilcox
Subject: [PATCH v2 1/5] prctl: extend PR_SET_THP_DISABLE to optionally
 exclude VM_HUGEPAGE
Date: Thu, 31 Jul 2025 13:27:18 +0100
Message-ID: <20250731122825.2102184-2-usamaarif642@gmail.com>
In-Reply-To: <20250731122825.2102184-1-usamaarif642@gmail.com>
References: <20250731122825.2102184-1-usamaarif642@gmail.com>

From: David Hildenbrand

People want to make use of more THPs, for example, moving from the
"never" system policy to "madvise", or from "madvise" to "always".
While this is great news for every THP desperately waiting to get
allocated out there, apparently there are some workloads that require
a bit of care during that transition: individual processes may need to
opt out of this behavior for various reasons, and this should be
permitted without needing to make all other workloads on the system
similarly opt out.

The following scenarios are imaginable:

(1) Switch from the "never" system policy to "madvise"/"always", but
    keep THPs disabled for selected workloads.

(2) Stay at the "never" system policy, but enable THPs for selected
    workloads, making only these workloads use the "madvise" or
    "always" policy.

(3) Switch from the "madvise" system policy to "always", but keep the
    "madvise" policy for selected workloads: allocate THPs only when
    advised.

(4) Stay at the "madvise" system policy, but enable THPs even when not
    advised for selected workloads -- the "always" policy.

One can emulate (2) through (1), by setting the system policy to
"madvise"/"always" while disabling THPs for all processes that don't
want THPs. It requires configuring all workloads, but that is a
user-space problem to sort out; a launcher sketch follows below. (4)
can be emulated through (3) in a similar way.
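For illustration, a minimal user-space sketch (not part of this patch;
the helper name and error handling are illustrative only) of emulating
(2) through (1) with the existing interface: the system policy is set
to "madvise"/"always", and a launcher disables THPs before exec'ing
every workload that should not get THPs:

  #include <stdio.h>
  #include <sys/prctl.h>
  #include <unistd.h>

  /*
   * PR_SET_THP_DISABLE is inherited across fork() and preserved across
   * execve(), so the exec'ed workload keeps running with THPs disabled.
   */
  static pid_t spawn_without_thp(char *const argv[])
  {
          pid_t pid = fork();

          if (pid < 0)
                  return -1;
          if (pid == 0) {
                  /* arg2 != 0 disables THPs; the remaining args must be 0. */
                  if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0))
                          perror("PR_SET_THP_DISABLE");
                  execvp(argv[0], argv);
                  _exit(127);
          }
          return pid;
  }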
Back when (1) was relevant, as people started enabling THPs, we added
PR_SET_THP_DISABLE, so relevant workloads that were not ready yet
(e.g., as used by Redis) were able to just disable THPs completely.
Redis still implements the option to use this interface to disable
THPs completely.

With PR_SET_THP_DISABLE, we added a way to force-disable THPs for a
workload -- a process, including the fork+exec'ed process hierarchy.
That essentially made us support (1): simply disable THPs for all
workloads that are not ready for THPs yet, while still enabling THPs
system-wide.

The quest for handling (3) and (4) started, but current approaches
(completely new prctl, options to set other policies per process,
alternatives to prctl -- mctrl, cgroup handling) don't look
particularly promising. Likely, the future will use bpf or something
similar to implement better policies, in particular to also make
better decisions about the THP sizes to use, but this will certainly
take a while as that work just started.

Long story short: a simple enable/disable is not really suitable for
the future, so we're not willing to add completely new toggles.

While we could emulate (3)+(4) through (1)+(2) by simply disabling
THPs completely for these processes, this is a step backwards, because
these processes could no longer allocate THPs in regions where THPs
were explicitly advised: regions flagged as VM_HUGEPAGE. Apparently,
that imposes a problem for relevant workloads, because "no THPs" is
certainly worse than "THPs only when advised".

Could we simply relax PR_SET_THP_DISABLE to mean "disable THPs unless
explicitly advised by the app through MADV_HUGEPAGE"? *Maybe*, but
this would change the documented semantics quite a bit, and the
versatility to use it for debugging purposes, so I am not 100% sure
that is what we want -- although it would certainly be much easier.

So instead, as an easy way forward for (3) and (4), add an option to
make PR_SET_THP_DISABLE disable *fewer* THPs for a process.

In essence, this patch:

(A) Adds PR_THP_DISABLE_EXCEPT_ADVISED, to be used as a flag in arg3
    of prctl(PR_SET_THP_DISABLE) when disabling THPs (arg2 != 0):
    prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED).

(B) Makes prctl(PR_GET_THP_DISABLE) return 3 if
    PR_THP_DISABLE_EXCEPT_ADVISED was set while disabling. Previously,
    it would return 1 if THPs were disabled completely. Now it returns
    the set flags as well: 3 if PR_THP_DISABLE_EXCEPT_ADVISED was set.

(C) Renames MMF_DISABLE_THP to MMF_DISABLE_THP_COMPLETELY, to express
    the semantics clearly. Fortunately, there are only two instances
    outside of the prctl() code.

(D) Adds MMF_DISABLE_THP_EXCEPT_ADVISED to express "no THPs except for
    VMAs with VM_HUGEPAGE" -- essentially "thp=madvise" behavior.
    Fortunately, we only have to extend vma_thp_disabled().

(E) Indicates "THP_enabled: 0" in /proc/pid/status only if THPs are
    disabled completely -- that is, only when they are really disabled
    completely, not just partially. For now, we don't add another
    interface to query whether THPs are disabled partially (whether
    PR_THP_DISABLE_EXCEPT_ADVISED was set). If ever required, we could
    add a new entry.

The documented semantics in the man page for PR_SET_THP_DISABLE -- "is
inherited by a child created via fork(2) and is preserved across
execve(2)" -- are maintained. This behavior, for example, allows for
disabling THPs for a workload through the launching process (e.g.,
systemd, where we fork() a helper process to then exec()).
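A minimal sketch of the semantics described in (A) and (B) above,
assuming a libc whose uapi headers do not yet carry the new define
(hence the fallback definition):

  #include <stdio.h>
  #include <sys/prctl.h>

  #ifndef PR_THP_DISABLE_EXCEPT_ADVISED
  #define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1)	/* from this patch */
  #endif

  int main(void)
  {
          /* Disable THPs, except in regions with VM_HUGEPAGE ("thp=madvise"). */
          if (prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0))
                  perror("PR_SET_THP_DISABLE");

          /* Bit 0: disabled; bit 1: except-advised. Expect 3 here. */
          printf("PR_GET_THP_DISABLE = %d\n",
                 prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0));
          return 0;
  }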
For now, MADV_COLLAPSE will *fail* in regions that have neither
VM_HUGEPAGE nor VM_NOHUGEPAGE set. As MADV_COLLAPSE is clear advice
that user space thinks a THP is a good idea, we'll enable that
separately next (requiring a bit of cleanup first).

There is currently no way to prevent a process from issuing
PR_SET_THP_DISABLE itself to re-enable THPs. There are no real known
users for re-enabling it, and it's against the purpose of the original
interface. So if ever required, we could investigate just forbidding
re-enabling them, or make this somehow configurable.

Acked-by: Usama Arif
Tested-by: Usama Arif
Cc: Jonathan Corbet
Cc: Andrew Morton
Cc: Lorenzo Stoakes
Cc: Zi Yan
Cc: Baolin Wang
Cc: "Liam R. Howlett"
Cc: Nico Pache
Cc: Ryan Roberts
Cc: Dev Jain
Cc: Barry Song
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Suren Baghdasaryan
Cc: Michal Hocko
Cc: Usama Arif
Cc: SeongJae Park
Cc: Jann Horn
Cc: Yafang Shao
Cc: Matthew Wilcox
Signed-off-by: David Hildenbrand
Reviewed-by: Lorenzo Stoakes
---

At first, I thought "why not simply relax PR_SET_THP_DISABLE", but I
think there might be real use cases where we want to disable any THPs
-- in particular also around debugging THP-related problems, and
"never" not meaning ... "never" anymore ever since we added
MADV_COLLAPSE. PR_SET_THP_DISABLE will also block MADV_COLLAPSE, which
can be very helpful for debugging purposes.

Of course, I thought of having a system-wide config option to modify
PR_SET_THP_DISABLE behavior, but I just don't like the semantics.

"prctl: allow overriding system THP policy to always"[1] proposed
"overriding policies to always", which is just the wrong way around:
we should not add mechanisms to "enable more" when we already have an
interface/mechanism to "disable" them (PR_SET_THP_DISABLE). It all
gets weird otherwise.

"[PATCH 0/6] prctl: introduce PR_SET/GET_THP_POLICY"[2] proposed
setting the default of the VM_HUGEPAGE flag, which I now think is
similarly the wrong way around.

The ideas explored by Lorenzo to extend process_madvise()[3] and
mctrl()[4] were similarly built around the "default for VM_HUGEPAGE"
idea, but after the discussion, I think we should better leave
VM_HUGEPAGE untouched.

Happy to hear naming suggestions for "PR_THP_DISABLE_EXCEPT_ADVISED",
where we essentially want to say "leave advised regions alone" --
"keep THP enabled for advised regions".

The only thing I really dislike about this is using another MMF_*
flag, but well, there is no way around it -- and it seems like we
could easily support more than 32 if we want to (most users already
treat it like a proper bitmap).

I think this here (modifying an existing toggle) is the only prctl()
extension that we might be willing to accept. In general, I agree,
like most others, that prctl() is a very bad interface for this -- but
PR_SET_THP_DISABLE is already there and is getting used. Long-term, I
think the answer will be something based on bpf[5]. Maybe in that
context, there could still be value in easily disabling THPs for
selected workloads (esp. for debugging purposes).

Jann raised valid concerns[6] about new flags that are persistent
across exec. As this here is a relaxation of the existing
PR_SET_THP_DISABLE, I consider it to have a similar security risk as
the existing PR_SET_THP_DISABLE, but the devil is in the detail.
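To make the debugging angle above concrete, a rough sketch
(assumptions: a 2 MiB PMD size, and the -EINVAL failure mode of
madvise_collapse() visible in the diff below; the MADV_COLLAPSE
fallback define mirrors the uapi value on recent kernels) of a process
observing that a complete disable also blocks MADV_COLLAPSE:

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <sys/prctl.h>

  #ifndef MADV_COLLAPSE
  #define MADV_COLLAPSE 25
  #endif

  int main(void)
  {
          const size_t sz = 2UL << 20;	/* assumed PMD size */
          char *mem;

          prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);	/* disable completely */
          mem = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (mem == MAP_FAILED)
                  return 1;
          memset(mem, 1, sz);

          /* With THPs disabled completely, the collapse is refused. */
          if (madvise(mem, sz, MADV_COLLAPSE))
                  printf("MADV_COLLAPSE failed as expected: %s\n",
                         strerror(errno));
          return 0;
  }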
[1] https://lore.kernel.org/r/20250507141132.2773275-1-usamaarif642@gmail.com
[2] https://lkml.kernel.org/r/20250515133519.2779639-2-usamaarif642@gmail.com
[3] https://lore.kernel.org/r/cover.1747686021.git.lorenzo.stoakes@oracle.com
[4] https://lkml.kernel.org/r/85778a76-7dc8-4ea8-8827-acb45f74ee05@lucifer.local
[5] https://lkml.kernel.org/r/20250608073516.22415-1-laoar.shao@gmail.com
[6] https://lore.kernel.org/r/CAG48ez3-7EnBVEjpdoW7z5K0hX41nLQN5Wb65Vg-1p8DdXRnjg@mail.gmail.com

Signed-off-by: David Hildenbrand
---
 Documentation/filesystems/proc.rst |  5 ++-
 fs/proc/array.c                    |  2 +-
 include/linux/huge_mm.h            | 20 +++++++---
 include/linux/mm_types.h           | 13 +++----
 include/uapi/linux/prctl.h         | 10 +++++
 kernel/sys.c                       | 59 ++++++++++++++++++++++++------
 mm/khugepaged.c                    |  2 +-
 7 files changed, 82 insertions(+), 29 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 2971551b7235..915a3e44bc12 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -291,8 +291,9 @@ It's slow but very precise.
 HugetlbPages                size of hugetlb memory portions
 CoreDumping                 process's memory is currently being dumped
                             (killing the process may lead to a corrupted core)
-THP_enabled                 process is allowed to use THP (returns 0 when
-                            PR_SET_THP_DISABLE is set on the process
+THP_enabled                 process is allowed to use THP (returns 0 when
+                            PR_SET_THP_DISABLE is set on the process to disable
+                            THP completely, not just partially)
 Threads                     number of threads
 SigQ                        number of signals queued/max. number for queue
 SigPnd                      bitmap of pending signals for the thread

diff --git a/fs/proc/array.c b/fs/proc/array.c
index d6a0369caa93..c4f91a784104 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -422,7 +422,7 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
 	bool thp_enabled = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE);
 
 	if (thp_enabled)
-		thp_enabled = !test_bit(MMF_DISABLE_THP, &mm->flags);
+		thp_enabled = !test_bit(MMF_DISABLE_THP_COMPLETELY, &mm->flags);
 	seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
 }
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7748489fde1b..71db243a002e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -318,16 +318,26 @@ struct thpsize {
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
 
+/*
+ * Whether THPs are disabled for this VMA or the whole process,
+ * through madvise or prctl.
+ */
 static inline bool vma_thp_disabled(struct vm_area_struct *vma,
 		vm_flags_t vm_flags)
 {
+	/* Are THPs disabled for this VMA? */
+	if (vm_flags & VM_NOHUGEPAGE)
+		return true;
+	/* Are THPs disabled for the whole process? */
+	if (test_bit(MMF_DISABLE_THP_COMPLETELY, &vma->vm_mm->flags))
+		return true;
 	/*
-	 * Explicitly disabled through madvise or prctl, or some
-	 * architectures may disable THP for some mappings, for
-	 * example, s390 kvm.
+	 * Are THPs disabled only for VMAs where we didn't get an explicit
+	 * advise to use them?
 	 */
-	return (vm_flags & VM_NOHUGEPAGE) ||
-	       test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+	if (vm_flags & VM_HUGEPAGE)
+		return false;
+	return test_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, &vma->vm_mm->flags);
 }
 
 static inline bool thp_disabled_by_hw(void)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1ec273b06691..123fefaa4b98 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1743,19 +1743,16 @@ enum {
 #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
 #define MMF_VM_HUGEPAGE		17	/* set when mm is available for khugepaged */
 
-/*
- * This one-shot flag is dropped due to necessity of changing exe once again
- * on NFS restore
- */
-//#define MMF_EXE_FILE_CHANGED	18	/* see prctl_set_mm_exe_file() */
+#define MMF_HUGE_ZERO_PAGE	18	/* mm has ever used the global huge zero page */
 
 #define MMF_HAS_UPROBES		19	/* has uprobes */
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
 #define MMF_OOM_SKIP		21	/* mm is of no interest for the OOM killer */
 #define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
-#define MMF_HUGE_ZERO_PAGE	23	/* mm has ever used the global huge zero page */
-#define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
-#define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_THP_EXCEPT_ADVISED	23 /* no THP except when advised (e.g., VM_HUGEPAGE) */
+#define MMF_DISABLE_THP_COMPLETELY	24 /* no THP for all VMAs */
+#define MMF_DISABLE_THP_MASK	((1 << MMF_DISABLE_THP_COMPLETELY) |\
+				 (1 << MMF_DISABLE_THP_EXCEPT_ADVISED))
 #define MMF_OOM_REAP_QUEUED	25	/* mm was queued for oom_reaper */
 #define MMF_MULTIPROCESS	26	/* mm is shared between processes */
 /*

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 43dec6eed559..9c1d6e49b8a9 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -177,7 +177,17 @@ struct prctl_mm_map {
 
 #define PR_GET_TID_ADDRESS	40
 
+/*
+ * Flags for PR_SET_THP_DISABLE are only applicable when disabling. Bit 0
+ * is reserved, so PR_GET_THP_DISABLE can return "1 | flags", to effectively
+ * return "1" when no flags were specified for PR_SET_THP_DISABLE.
+ */
 #define PR_SET_THP_DISABLE	41
+/*
+ * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
+ * VM_HUGEPAGE).
+ */
+# define PR_THP_DISABLE_EXCEPT_ADVISED	(1 << 1)
 #define PR_GET_THP_DISABLE	42
 
 /*

diff --git a/kernel/sys.c b/kernel/sys.c
index b153fb345ada..932a8e637e78 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2423,6 +2423,51 @@ static int prctl_get_auxv(void __user *addr, unsigned long len)
 	return sizeof(mm->saved_auxv);
 }
 
+static int prctl_get_thp_disable(unsigned long arg2, unsigned long arg3,
+				 unsigned long arg4, unsigned long arg5)
+{
+	unsigned long *mm_flags = &current->mm->flags;
+
+	if (arg2 || arg3 || arg4 || arg5)
+		return -EINVAL;
+
+	/* If disabled, we return "1 | flags", otherwise 0. */
+	if (test_bit(MMF_DISABLE_THP_COMPLETELY, mm_flags))
+		return 1;
+	else if (test_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, mm_flags))
+		return 1 | PR_THP_DISABLE_EXCEPT_ADVISED;
+	return 0;
+}
+
+static int prctl_set_thp_disable(bool thp_disable, unsigned long flags,
+				 unsigned long arg4, unsigned long arg5)
+{
+	unsigned long *mm_flags = &current->mm->flags;
+
+	if (arg4 || arg5)
+		return -EINVAL;
+
+	/* Flags are only allowed when disabling. */
+	if ((!thp_disable && flags) || (flags & ~PR_THP_DISABLE_EXCEPT_ADVISED))
+		return -EINVAL;
+	if (mmap_write_lock_killable(current->mm))
+		return -EINTR;
+	if (thp_disable) {
+		if (flags & PR_THP_DISABLE_EXCEPT_ADVISED) {
+			clear_bit(MMF_DISABLE_THP_COMPLETELY, mm_flags);
+			set_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, mm_flags);
+		} else {
+			set_bit(MMF_DISABLE_THP_COMPLETELY, mm_flags);
+			clear_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, mm_flags);
+		}
+	} else {
+		clear_bit(MMF_DISABLE_THP_COMPLETELY, mm_flags);
+		clear_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, mm_flags);
+	}
+	mmap_write_unlock(current->mm);
+	return 0;
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2596,20 +2641,10 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		return task_no_new_privs(current) ? 1 : 0;
 	case PR_GET_THP_DISABLE:
-		if (arg2 || arg3 || arg4 || arg5)
-			return -EINVAL;
-		error = !!test_bit(MMF_DISABLE_THP, &me->mm->flags);
+		error = prctl_get_thp_disable(arg2, arg3, arg4, arg5);
 		break;
 	case PR_SET_THP_DISABLE:
-		if (arg3 || arg4 || arg5)
-			return -EINVAL;
-		if (mmap_write_lock_killable(me->mm))
-			return -EINTR;
-		if (arg2)
-			set_bit(MMF_DISABLE_THP, &me->mm->flags);
-		else
-			clear_bit(MMF_DISABLE_THP, &me->mm->flags);
-		mmap_write_unlock(me->mm);
+		error = prctl_set_thp_disable(arg2, arg3, arg4, arg5);
 		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
 	case PR_MPX_DISABLE_MANAGEMENT:

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1ff0c7dd2be4..2c9008246785 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -410,7 +410,7 @@ static inline int hpage_collapse_test_exit(struct mm_struct *mm)
 static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
 {
 	return hpage_collapse_test_exit(mm) ||
-	       test_bit(MMF_DISABLE_THP, &mm->flags);
+	       test_bit(MMF_DISABLE_THP_COMPLETELY, &mm->flags);
 }
 
 static bool hugepage_pmd_enabled(void)
-- 
2.47.3

From nobody Sun Oct 5 16:18:50 2025
From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org,
	baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com,
	ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com,
	Arnd Bergmann, sj@kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v2 2/5] mm/huge_memory: convert "tva_flags" to "enum
 tva_type" for thp_vma_allowable_order*()
Date: Thu, 31 Jul 2025 13:27:19 +0100
Message-ID: <20250731122825.2102184-3-usamaarif642@gmail.com>
In-Reply-To: <20250731122825.2102184-1-usamaarif642@gmail.com>
References: <20250731122825.2102184-1-usamaarif642@gmail.com>

From: David Hildenbrand

Describing the context through a type is much clearer, and good enough
for our case. We have:

* smaps handling, for showing "THPeligible"
* page fault handling
* khugepaged handling
* forced collapse handling: primarily MADV_COLLAPSE, but one other odd
  case

Really, we want to ignore sysfs only when we are forcing a collapse
through MADV_COLLAPSE; otherwise we want to enforce it. With this
change, we immediately know if we are in the forced collapse case,
which will be valuable next.

Signed-off-by: David Hildenbrand
Acked-by: Usama Arif
Signed-off-by: Usama Arif
---
 fs/proc/task_mmu.c      |  4 ++--
 include/linux/huge_mm.h | 30 ++++++++++++++++++------------
 mm/huge_memory.c        |  8 ++++----
 mm/khugepaged.c         | 18 +++++++++---------
 mm/memory.c             | 14 ++++++--------
 5 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3d6d8a9f13fc..d440df7b3d59 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1293,8 +1293,8 @@ static int show_smap(struct seq_file *m, void *v)
 	__show_smap(m, &mss, false);
 
 	seq_printf(m, "THPeligible:    %8u\n",
-		   !!thp_vma_allowable_orders(vma, vma->vm_flags,
-			   TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL));
+		   !!thp_vma_allowable_orders(vma, vma->vm_flags, TVA_SMAPS,
+					      THP_ORDERS_ALL));
 
 	if (arch_pkeys_enabled())
 		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 71db243a002e..b0ff54eee81c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -94,12 +94,15 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define THP_ORDERS_ALL	\
 	(THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAULT)
 
-#define TVA_SMAPS		(1 << 0)	/* Will be used for procfs */
-#define TVA_IN_PF		(1 << 1)	/* Page fault handler */
-#define TVA_ENFORCE_SYSFS	(1 << 2)	/* Obey sysfs configuration */
+enum tva_type {
+	TVA_SMAPS,		/* Exposing "THPeligible:" in smaps. */
+	TVA_PAGEFAULT,		/* Serving a page fault. */
+	TVA_KHUGEPAGED,		/* Khugepaged collapse. */
+	TVA_FORCED_COLLAPSE,	/* Forced collapse (i.e., MADV_COLLAPSE). */
+};
 
-#define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
-	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
+#define thp_vma_allowable_order(vma, vm_flags, type, order) \
+	(!!thp_vma_allowable_orders(vma, vm_flags, type, BIT(order)))
 
 #define split_folio(f) split_folio_to_list(f, NULL)
 
@@ -264,14 +267,14 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
					 vm_flags_t vm_flags,
-					 unsigned long tva_flags,
+					 enum tva_type type,
					 unsigned long orders);
 
 /**
  * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
  * @vma:  the vm area to check
  * @vm_flags: use these vm_flags instead of vma->vm_flags
- * @tva_flags: Which TVA flags to honour
+ * @type: TVA type
  * @orders: bitfield of all orders to consider
  *
  * Calculates the intersection of the requested hugepage orders and the allowed
@@ -285,11 +288,14 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 static inline
 unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
				       vm_flags_t vm_flags,
-				       unsigned long tva_flags,
+				       enum tva_type type,
				       unsigned long orders)
 {
-	/* Optimization to check if required orders are enabled early. */
-	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
+	/*
+	 * Optimization to check if required orders are enabled early. Only
+	 * forced collapse ignores sysfs configs.
+	 */
+	if (type != TVA_FORCED_COLLAPSE && vma_is_anonymous(vma)) {
		unsigned long mask = READ_ONCE(huge_anon_orders_always);
 
		if (vm_flags & VM_HUGEPAGE)
@@ -303,7 +309,7 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
		return 0;
	}
 
-	return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
+	return __thp_vma_allowable_orders(vma, vm_flags, type, orders);
 }
 
 struct thpsize {
@@ -536,7 +542,7 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
 
 static inline unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
						     vm_flags_t vm_flags,
-						     unsigned long tva_flags,
+						     enum tva_type type,
						     unsigned long orders)
 {
	return 0;

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2b4ea5a2ce7d..85252b468f80 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -99,12 +99,12 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
					 vm_flags_t vm_flags,
-					 unsigned long tva_flags,
+					 enum tva_type type,
					 unsigned long orders)
 {
-	bool smaps = tva_flags & TVA_SMAPS;
-	bool in_pf = tva_flags & TVA_IN_PF;
-	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
+	const bool smaps = type == TVA_SMAPS;
+	const bool in_pf = type == TVA_PAGEFAULT;
+	const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE;
	unsigned long supported_orders;
 
	/* Check the intersection of requested and supported orders. */

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2c9008246785..7a54b6f2a346 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -474,8 +474,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 {
	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
	    hugepage_pmd_enabled()) {
-		if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS,
-					    PMD_ORDER))
+		if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
			__khugepaged_enter(vma->vm_mm);
	}
 }
@@ -921,7 +920,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
				   struct collapse_control *cc)
 {
	struct vm_area_struct *vma;
-	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
+	enum tva_type tva_type = cc->is_khugepaged ? TVA_KHUGEPAGED :
+						     TVA_FORCED_COLLAPSE;
 
	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
		return SCAN_ANY_PROCESS;
@@ -932,7 +932,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 
	if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
		return SCAN_ADDRESS_RANGE;
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_type, PMD_ORDER))
		return SCAN_VMA_CHECK;
	/*
	 * Anon VMA expected, the address may be unmapped then
@@ -1532,9 +1532,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
	 * in the page cache with a single hugepage. If a mm were to fault-in
	 * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
	 * and map it by a PMD, regardless of sysfs THP settings. As such, let's
-	 * analogously elide sysfs THP settings here.
+	 * analogously elide sysfs THP settings here and pretend we are
+	 * collapsing.
	 */
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
		return SCAN_VMA_CHECK;
 
	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
@@ -2431,8 +2432,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
			progress++;
			break;
		}
-		if (!thp_vma_allowable_order(vma, vma->vm_flags,
-					     TVA_ENFORCE_SYSFS, PMD_ORDER)) {
+		if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) {
 skip:
			progress++;
			continue;
@@ -2766,7 +2766,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
	BUG_ON(vma->vm_start > start);
	BUG_ON(vma->vm_end < end);
 
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
		return -EINVAL;
 
	cc = kmalloc(sizeof(*cc), GFP_KERNEL);

diff --git a/mm/memory.c b/mm/memory.c
index 92fd18a5d8d1..be761753f240 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4369,8 +4369,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
	 * and suitable for swapping THP.
	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
-			TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
+	orders = thp_vma_allowable_orders(vma, vma->vm_flags, TVA_PAGEFAULT,
+					  BIT(PMD_ORDER) - 1);
	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
	orders = thp_swap_suitable_orders(swp_offset(entry), vmf->address,
					  orders);
@@ -4917,8 +4917,8 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
	 * for this vma. Then filter out the orders that can't be allocated over
	 * the faulting address and still be fully contained in the vma.
	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
-			TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
+	orders = thp_vma_allowable_orders(vma, vma->vm_flags, TVA_PAGEFAULT,
+					  BIT(PMD_ORDER) - 1);
	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
 
	if (!orders)
@@ -6108,8 +6108,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
		return VM_FAULT_OOM;
 retry_pud:
	if (pud_none(*vmf.pud) &&
-	    thp_vma_allowable_order(vma, vm_flags,
-				    TVA_IN_PF | TVA_ENFORCE_SYSFS, PUD_ORDER)) {
+	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PUD_ORDER)) {
		ret = create_huge_pud(&vmf);
		if (!(ret & VM_FAULT_FALLBACK))
			return ret;
@@ -6143,8 +6142,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
		goto retry_pud;
 
	if (pmd_none(*vmf.pmd) &&
-	    thp_vma_allowable_order(vma, vm_flags,
-				    TVA_IN_PF | TVA_ENFORCE_SYSFS, PMD_ORDER)) {
+	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
		ret = create_huge_pmd(&vmf);
		if (!(ret & VM_FAULT_FALLBACK))
			return ret;
-- 
2.47.3

From nobody Sun Oct 5 16:18:50 2025
From: Usama Arif
To: Andrew Morton, david@redhat.com, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org,
	baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com,
	ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com,
	Arnd Bergmann, sj@kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v2 3/5] mm/huge_memory: treat MADV_COLLAPSE as advice with
 PR_THP_DISABLE_EXCEPT_ADVISED
Date: Thu, 31 Jul 2025 13:27:20 +0100
Message-ID: <20250731122825.2102184-4-usamaarif642@gmail.com>
In-Reply-To: <20250731122825.2102184-1-usamaarif642@gmail.com>
References: <20250731122825.2102184-1-usamaarif642@gmail.com>

From: David Hildenbrand

Let's allow MADV_COLLAPSE to succeed on areas that have neither
VM_HUGEPAGE nor VM_NOHUGEPAGE set when THPs are disabled unless
explicitly advised (PR_THP_DISABLE_EXCEPT_ADVISED). MADV_COLLAPSE is
clear advice that we want to collapse.

Note that we still respect the VM_NOHUGEPAGE flag, just like
MADV_COLLAPSE always does. Consequently, with
PR_THP_DISABLE_EXCEPT_ADVISED, MADV_COLLAPSE is now only refused on
VM_NOHUGEPAGE areas.

Co-developed-by: Usama Arif
Signed-off-by: Usama Arif
Signed-off-by: David Hildenbrand
---
 include/linux/huge_mm.h    | 8 +++++++-
 include/uapi/linux/prctl.h | 2 +-
 mm/huge_memory.c           | 5 +++--
 mm/memory.c                | 6 ++++--
 mm/shmem.c                 | 2 +-
 5 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b0ff54eee81c..aeaf93f8ac2e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -329,7 +329,7 @@ struct thpsize {
  * through madvise or prctl.
  */
 static inline bool vma_thp_disabled(struct vm_area_struct *vma,
-		vm_flags_t vm_flags)
+		vm_flags_t vm_flags, bool forced_collapse)
 {
 	/* Are THPs disabled for this VMA? */
 	if (vm_flags & VM_NOHUGEPAGE)
@@ -343,6 +343,12 @@ static inline bool vma_thp_disabled(struct vm_area_struct *vma,
	 */
	if (vm_flags & VM_HUGEPAGE)
		return false;
+	/*
+	 * Forcing a collapse (e.g., MADV_COLLAPSE) is clear advice to
+	 * use THPs.
+	 */
+	if (forced_collapse)
+		return false;
	return test_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, &vma->vm_mm->flags);
 }
 
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 9c1d6e49b8a9..ee4165738779 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -185,7 +185,7 @@ struct prctl_mm_map {
 #define PR_SET_THP_DISABLE	41
 /*
  * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
- * VM_HUGEPAGE).
+ * VM_HUGEPAGE / MADV_COLLAPSE).
  */
 # define PR_THP_DISABLE_EXCEPT_ADVISED	(1 << 1)
 #define PR_GET_THP_DISABLE	42

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 85252b468f80..ef5ccb0ec5d5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -104,7 +104,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 {
	const bool smaps = type == TVA_SMAPS;
	const bool in_pf = type == TVA_PAGEFAULT;
-	const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE;
+	const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
+	const bool enforce_sysfs = !forced_collapse;
	unsigned long supported_orders;
 
	/* Check the intersection of requested and supported orders. */
@@ -122,7 +123,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
	if (!vma->vm_mm)		/* vdso */
		return 0;
 
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags, forced_collapse))
		return 0;
 
	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */

diff --git a/mm/memory.c b/mm/memory.c
index be761753f240..bd04212d6f79 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5186,9 +5186,11 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio, struct page *page)
	 * It is too late to allocate a small folio, we already have a large
	 * folio in the pagecache: especially s390 KVM cannot tolerate any
	 * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
-	 * PMD mappings if THPs are disabled.
+	 * PMD mappings if THPs are disabled. As we already have a THP ...
+	 * behave as if we are forcing a collapse.
	 */
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags,
+						     /* forced_collapse=*/ true))
		return ret;
 
	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))

diff --git a/mm/shmem.c b/mm/shmem.c
index e6cdfda08aed..30609197a266 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1816,7 +1816,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
	vm_flags_t vm_flags = vma ? vma->vm_flags : 0;
	unsigned int global_orders;
 
-	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
+	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags, shmem_huge_force)))
		return 0;
 
	global_orders = shmem_huge_global_enabled(inode, index, write_end,
-- 
2.47.3

From nobody Sun Oct 5 16:18:50 2025
bh=DdseCjeXglmAeSVo3jNDX9M0lKkfFlPBpOaLA9628VE=; b=Cs9jdMkfaXoY4G7FNndBOE3tq52BG6C2t+pU/vPa/Y5k7UY6dtFp3eGvka6OFNH605 rOIXuuQPx+jjNMXq2wGTkuEnCEJBc8QNDZWxK6N3nkTnpgKhSOKNaplrPZ9zAcgNWfL2 v2bLnVfedUHvjXU4SUw/mdf4rXmXlqUbLMEvpWaE8RQCOdZms5nanu77mGVUi7Sv1hBc 03ERLfXA9M7/QVmfnJIkFykXOQFLegdT+Gc3hEst2EgNvDYx+IRPfVeBlG5oGtWArNjy cksqKzP+H3po2bIs1msdyEU0ZgutVwRYlW8WJ9NPNPXPRgt8KhEgtSMMJrDxM5MzjUFe 0w+Q== X-Forwarded-Encrypted: i=1; AJvYcCUmdrZpqM13skOnK0qoOOs9dMWhsxxGGZkCM7bZqQPosO45ipeRhctZbFeTcwc77Bwhd8+m+OMCcFNiM3Y1@vger.kernel.org, AJvYcCUq8wmKApSZ2pXePGbEhUGCbGg+uzS2MvlaWyX4rTJPZLoBwzCXJ9j45BQDnG8uQ4h4WtgCOeXx9FI=@vger.kernel.org X-Gm-Message-State: AOJu0YyCgBYs1ROFzaRCjHhX3Jmf8gyMEYqyo7mlY0cwBCjt3KxvaBKA vP8F7RB3IMzvALlNxWfQ4iFWTLo3I4WfB2A3mHanEA3YhxPh0o4eoqb2 X-Gm-Gg: ASbGncteAmQk6Kxtwxb1GnWfw1Ght8U/GRvHxJJX0nHlZZVTgeVP7p9QCUov2niLOVX ei7utS17mXZs8d/gwGzQPOwoyC48EzrlY6xpCfAvuw883vR5NVyJuM8LO/9fucwAewn7T2NIKto y1r/vHfVXtSlw5i2wv+ANWejRjrNdA1mYEmNRMAPKESzFUW+yX10Dh7zYkVQQX8yQUsyWjsK3AP /andBdQW3fjDMQUV8rDK07hYhKCbAD9xXeQqydCuThxtmG/cF4ImJwP7H0p8BzqUVdWeeSrwz+j epBPLfZ09HCMs2G63zIk25o9lniMYT9fKcLttQBgpt6UPjiHYUdY8TlQz2a9TZdYOQq7Ir8gwDw 6dbssVfVkURqut6AyXSfQFuHxdfB2Qm8= X-Google-Smtp-Source: AGHT+IENkVw8ynAfGXw4WuZqiIpcMKUMSu/EtMQV//+NQ4yDeACqLq0qMosxKHrbCo/7QDKjW7aI3w== X-Received: by 2002:a05:6214:cab:b0:707:5694:89e4 with SMTP id 6a1803df08f44-70767482bd0mr105266576d6.47.1753964915867; Thu, 31 Jul 2025 05:28:35 -0700 (PDT) Received: from localhost ([2a03:2880:20ff:72::]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-7077ca5897bsm6678386d6.39.2025.07.31.05.28.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 31 Jul 2025 05:28:35 -0700 (PDT) From: Usama Arif To: Andrew Morton , david@redhat.com, linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org, surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann , sj@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif Subject: [PATCH v2 4/5] selftests: prctl: introduce tests for disabling THPs completely Date: Thu, 31 Jul 2025 13:27:21 +0100 Message-ID: <20250731122825.2102184-5-usamaarif642@gmail.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20250731122825.2102184-1-usamaarif642@gmail.com> References: <20250731122825.2102184-1-usamaarif642@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The test will set the global system THP setting to never, madvise or always depending on the fixture variant and the 2M setting to inherit before it starts (and reset to original at teardown). This tests if the process can: - successfully set and get the policy to disable THPs completely. - never get a hugepage when the THPs are completely disabled with the prctl, including with MADV_HUGE and MADV_COLLAPSE. - successfully reset the policy of the process. - after reset, only get hugepages with: - MADV_COLLAPSE when policy is set to never. - MADV_HUGE and MADV_COLLAPSE when policy is set to madvise. - always when policy is set to "always". 
- repeat the above tests in a forked process to make sure the policy is carried across forks. Signed-off-by: Usama Arif --- tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 1 + .../testing/selftests/mm/prctl_thp_disable.c | 241 ++++++++++++++++++ tools/testing/selftests/mm/thp_settings.c | 9 +- tools/testing/selftests/mm/thp_settings.h | 1 + 5 files changed, 252 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/mm/prctl_thp_disable.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftest= s/mm/.gitignore index e7b23a8a05fe..eb023ea857b3 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -58,3 +58,4 @@ pkey_sighandler_tests_32 pkey_sighandler_tests_64 guard-regions merge +prctl_thp_disable diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/= mm/Makefile index d13b3cef2a2b..2bb8d3ebc17c 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -86,6 +86,7 @@ TEST_GEN_FILES +=3D on-fault-limit TEST_GEN_FILES +=3D pagemap_ioctl TEST_GEN_FILES +=3D pfnmap TEST_GEN_FILES +=3D process_madv +TEST_GEN_FILES +=3D prctl_thp_disable TEST_GEN_FILES +=3D thuge-gen TEST_GEN_FILES +=3D transhuge-stress TEST_GEN_FILES +=3D uffd-stress diff --git a/tools/testing/selftests/mm/prctl_thp_disable.c b/tools/testing= /selftests/mm/prctl_thp_disable.c new file mode 100644 index 000000000000..2f54e5e52274 --- /dev/null +++ b/tools/testing/selftests/mm/prctl_thp_disable.c @@ -0,0 +1,241 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Basic tests for PR_GET/SET_THP_DISABLE prctl calls + * + * Author(s): Usama Arif + */ +#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest_harness.h" +#include "thp_settings.h" +#include "vm_util.h" + +static int sz2ord(size_t size, size_t pagesize) +{ + return __builtin_ctzll(size / pagesize); +} + +enum thp_collapse_type { + THP_COLLAPSE_NONE, + THP_COLLAPSE_MADV_HUGEPAGE, /* MADV_HUGEPAGE before access */ + THP_COLLAPSE_MADV_COLLAPSE, /* MADV_COLLAPSE after access */ +}; + +enum thp_policy { + THP_POLICY_NEVER, + THP_POLICY_MADVISE, + THP_POLICY_ALWAYS, +}; + +struct test_results { + int prctl_get_thp_disable; + int prctl_applied_collapse_none; + int prctl_applied_collapse_madv_huge; + int prctl_applied_collapse_madv_collapse; + int prctl_removed_collapse_none; + int prctl_removed_collapse_madv_huge; + int prctl_removed_collapse_madv_collapse; +}; + +/* + * Function to mmap a buffer, fault it in, madvise it appropriately (before + * page fault for MADV_HUGE, and after for MADV_COLLAPSE), and check if the + * mmap region is huge. + * Returns: + * 0 if test doesn't give hugepage + * 1 if test gives a hugepage + * -errno if mmap fails + */ +static int test_mmap_thp(enum thp_collapse_type madvise_buf, size_t pmdsiz= e) +{ + char *mem, *mmap_mem; + size_t mmap_size; + int ret; + + /* For alignment purposes, we need twice the THP size. */ + mmap_size =3D 2 * pmdsize; + mmap_mem =3D (char *)mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (mmap_mem =3D=3D MAP_FAILED) + return -errno; + + /* We need a THP-aligned memory area. 
*/ + mem =3D (char *)(((uintptr_t)mmap_mem + pmdsize) & ~(pmdsize - 1)); + + if (madvise_buf =3D=3D THP_COLLAPSE_MADV_HUGEPAGE) + madvise(mem, pmdsize, MADV_HUGEPAGE); + + /* Ensure memory is allocated */ + memset(mem, 1, pmdsize); + + if (madvise_buf =3D=3D THP_COLLAPSE_MADV_COLLAPSE) + madvise(mem, pmdsize, MADV_COLLAPSE); + + /* + * MADV_HUGEPAGE will create a new VMA at "mem", which is the address + * pattern we want to check for to detect the presence of hugepage in + * smaps. + * MADV_COLLAPSE will not create a new VMA, therefore we need to check + * for hugepage at "mmap_mem" in smaps. + * Check for hugepage at both locations to ensure that + * THP_COLLAPSE_NONE, THP_COLLAPSE_MADV_HUGEPAGE and + * THP_COLLAPSE_MADV_COLLAPSE only gives a THP when expected + * in the range [mmap_mem, mmap_mem + 2 * pmdsize]. + */ + ret =3D check_huge_anon(mem, 1, pmdsize) || + check_huge_anon(mmap_mem, 1, pmdsize); + munmap(mmap_mem, mmap_size); + return ret; +} + +static void prctl_thp_disable_test(struct __test_metadata *const _metadata, + size_t pmdsize, struct test_results *results) +{ + + ASSERT_EQ(prctl(PR_GET_THP_DISABLE, NULL, NULL, NULL, NULL), + results->prctl_get_thp_disable); + + /* tests after prctl overrides global policy */ + ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_NONE, pmdsize), + results->prctl_applied_collapse_none); + + ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_HUGEPAGE, pmdsize), + results->prctl_applied_collapse_madv_huge); + + ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_COLLAPSE, pmdsize), + results->prctl_applied_collapse_madv_collapse); + + /* Reset to global policy */ + ASSERT_EQ(prctl(PR_SET_THP_DISABLE, 0, NULL, NULL, NULL), 0); + + /* tests after prctl is cleared, and only global policy is effective */ + ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_NONE, pmdsize), + results->prctl_removed_collapse_none); + + ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_HUGEPAGE, pmdsize), + results->prctl_removed_collapse_madv_huge); + + ASSERT_EQ(test_mmap_thp(THP_COLLAPSE_MADV_COLLAPSE, pmdsize), + results->prctl_removed_collapse_madv_collapse); +} + +FIXTURE(prctl_thp_disable_completely) +{ + struct thp_settings settings; + struct test_results results; + size_t pmdsize; +}; + +FIXTURE_VARIANT(prctl_thp_disable_completely) +{ + enum thp_policy thp_global_policy; +}; + +FIXTURE_VARIANT_ADD(prctl_thp_disable_completely, never) +{ + .thp_global_policy =3D THP_POLICY_NEVER, +}; + +FIXTURE_VARIANT_ADD(prctl_thp_disable_completely, madvise) +{ + .thp_global_policy =3D THP_POLICY_MADVISE, +}; + +FIXTURE_VARIANT_ADD(prctl_thp_disable_completely, always) +{ + .thp_global_policy =3D THP_POLICY_ALWAYS, +}; + +FIXTURE_SETUP(prctl_thp_disable_completely) +{ + if (!thp_available()) + SKIP(return, "Transparent Hugepages not available\n"); + + self->pmdsize =3D read_pmd_pagesize(); + if (!self->pmdsize) + SKIP(return, "Unable to read PMD size\n"); + + thp_save_settings(); + thp_read_settings(&self->settings); + switch (variant->thp_global_policy) { + case THP_POLICY_NEVER: + self->settings.thp_enabled =3D THP_NEVER; + self->results =3D (struct test_results) { + .prctl_get_thp_disable =3D 1, + .prctl_applied_collapse_none =3D 0, + .prctl_applied_collapse_madv_huge =3D 0, + .prctl_applied_collapse_madv_collapse =3D 0, + .prctl_removed_collapse_none =3D 0, + .prctl_removed_collapse_madv_huge =3D 0, + .prctl_removed_collapse_madv_collapse =3D 1, + }; + break; + case THP_POLICY_MADVISE: + self->settings.thp_enabled =3D THP_MADVISE; + self->results =3D (struct test_results) { + .prctl_get_thp_disable =3D 1, + 
+			.prctl_applied_collapse_none = 0,
+			.prctl_applied_collapse_madv_huge = 0,
+			.prctl_applied_collapse_madv_collapse = 0,
+			.prctl_removed_collapse_none = 0,
+			.prctl_removed_collapse_madv_huge = 1,
+			.prctl_removed_collapse_madv_collapse = 1,
+		};
+		break;
+	case THP_POLICY_ALWAYS:
+		self->settings.thp_enabled = THP_ALWAYS;
+		self->results = (struct test_results) {
+			.prctl_get_thp_disable = 1,
+			.prctl_applied_collapse_none = 0,
+			.prctl_applied_collapse_madv_huge = 0,
+			.prctl_applied_collapse_madv_collapse = 0,
+			.prctl_removed_collapse_none = 1,
+			.prctl_removed_collapse_madv_huge = 1,
+			.prctl_removed_collapse_madv_collapse = 1,
+		};
+		break;
+	}
+	self->settings.hugepages[sz2ord(self->pmdsize, getpagesize())].enabled = THP_INHERIT;
+	thp_write_settings(&self->settings);
+}
+
+FIXTURE_TEARDOWN(prctl_thp_disable_completely)
+{
+	thp_restore_settings();
+}
+
+TEST_F(prctl_thp_disable_completely, nofork)
+{
+	ASSERT_EQ(prctl(PR_SET_THP_DISABLE, 1, NULL, NULL, NULL), 0);
+	prctl_thp_disable_test(_metadata, self->pmdsize, &self->results);
+}
+
+TEST_F(prctl_thp_disable_completely, fork)
+{
+	int ret = 0;
+	pid_t pid;
+
+	ASSERT_EQ(prctl(PR_SET_THP_DISABLE, 1, NULL, NULL, NULL), 0);
+
+	/* Make sure prctl changes are carried across fork */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (!pid)
+		prctl_thp_disable_test(_metadata, self->pmdsize, &self->results);
+
+	wait(&ret);
+	if (WIFEXITED(ret))
+		ret = WEXITSTATUS(ret);
+	else
+		ret = -EINVAL;
+	ASSERT_EQ(ret, 0);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/mm/thp_settings.c b/tools/testing/selftests/mm/thp_settings.c
index bad60ac52874..574bd0f8ae48 100644
--- a/tools/testing/selftests/mm/thp_settings.c
+++ b/tools/testing/selftests/mm/thp_settings.c
@@ -382,10 +382,17 @@ unsigned long thp_shmem_supported_orders(void)
 	return __thp_supported_orders(true);
 }
 
-bool thp_is_enabled(void)
+bool thp_available(void)
 {
 	if (access(THP_SYSFS, F_OK) != 0)
 		return false;
+	return true;
+}
+
+bool thp_is_enabled(void)
+{
+	if (!thp_available())
+		return false;
 
 	int mode = thp_read_string("enabled", thp_enabled_strings);
 
diff --git a/tools/testing/selftests/mm/thp_settings.h b/tools/testing/selftests/mm/thp_settings.h
index 6c07f70beee9..76eeb712e5f1 100644
--- a/tools/testing/selftests/mm/thp_settings.h
+++ b/tools/testing/selftests/mm/thp_settings.h
@@ -84,6 +84,7 @@ void thp_set_read_ahead_path(char *path);
 unsigned long thp_supported_orders(void);
 unsigned long thp_shmem_supported_orders(void);
 
+bool thp_available(void);
 bool thp_is_enabled(void);
 
 #endif /* __THP_SETTINGS_H__ */
-- 
2.47.3
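As an aside, the pre-existing "disable completely" mode that the
prctl_thp_disable_completely fixture above exercises can be reproduced with a
minimal standalone program. This is an illustrative sketch, not part of the
patch; PR_SET/GET_THP_DISABLE already exist on current kernels, so it runs
without this series:

#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
	/* Disable THPs entirely for this process (inherited across fork()). */
	if (prctl(PR_SET_THP_DISABLE, 1, NULL, NULL, NULL))
		perror("PR_SET_THP_DISABLE");

	/*
	 * Reports 1 while the override is active, matching the test's
	 * .prctl_get_thp_disable = 1 expectation above.
	 */
	printf("THP disable: %d\n",
	       (int)prctl(PR_GET_THP_DISABLE, NULL, NULL, NULL, NULL));

	/* Clear the override; the global sysfs policy applies again. */
	if (prctl(PR_SET_THP_DISABLE, 0, NULL, NULL, NULL))
		perror("PR_SET_THP_DISABLE");
	return 0;
}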
From nobody Sun Oct 5 16:18:50 2025
From: Usama Arif <usamaarif642@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>, david@redhat.com,
	linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, corbet@lwn.net, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, hannes@cmpxchg.org,
	baohua@kernel.org, shakeel.butt@linux.dev, riel@surriel.com,
	ziy@nvidia.com, laoar.shao@gmail.com, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com,
	Arnd Bergmann <arnd@arndb.de>, sj@kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kernel-team@meta.com, Usama Arif <usamaarif642@gmail.com>
Subject: [PATCH v2 5/5] selftests: prctl: introduce tests for disabling THPs
 except for madvise
Date: Thu, 31 Jul 2025 13:27:22 +0100
Message-ID: <20250731122825.2102184-6-usamaarif642@gmail.com>
In-Reply-To: <20250731122825.2102184-1-usamaarif642@gmail.com>
References: <20250731122825.2102184-1-usamaarif642@gmail.com>

The test sets the global system THP setting to never, madvise or always,
depending on the fixture variant, and the 2M setting to inherit before it
starts (and resets both to the original values at teardown).

This tests whether the process can:
- successfully set and get the policy to disable THPs except for madvise.
- get hugepages only with MADV_HUGE and MADV_COLLAPSE if the global policy
  is madvise/always, and only with MADV_COLLAPSE if the global policy is
  never.
- successfully reset the policy of the process.
- after the reset, only get hugepages with:
  - MADV_COLLAPSE when the policy is set to never.
  - MADV_HUGE and MADV_COLLAPSE when the policy is set to madvise.
  - any access when the policy is set to "always".
- repeat the above tests in a forked process to make sure the policy is
  carried across forks.
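Put differently, the transition under test boils down to the following
minimal sketch. PR_THP_DISABLE_EXCEPT_ADVISED is the flag introduced by this
series and is defined locally here, as in the hunk below; on kernels without
this series the PR_SET_THP_DISABLE call with a non-zero arg3 is expected to
fail:

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_THP_DISABLE_EXCEPT_ADVISED
#define PR_THP_DISABLE_EXCEPT_ADVISED	(1 << 1)
#endif

int main(void)
{
	/* Disable THPs for this process, except where madvise asks for them. */
	if (prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, NULL, NULL))
		perror("PR_SET_THP_DISABLE");

	/*
	 * PR_GET_THP_DISABLE now reports 3 -- the value the fixtures below
	 * expect in .prctl_get_thp_disable while this mode is set.
	 */
	printf("%d\n", (int)prctl(PR_GET_THP_DISABLE, NULL, NULL, NULL, NULL));

	/* Revert to the global policy. */
	prctl(PR_SET_THP_DISABLE, 0, NULL, NULL, NULL);
	return 0;
}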
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
 .../testing/selftests/mm/prctl_thp_disable.c  | 117 ++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/tools/testing/selftests/mm/prctl_thp_disable.c b/tools/testing/selftests/mm/prctl_thp_disable.c
index 2f54e5e52274..3c34ac7e11f1 100644
--- a/tools/testing/selftests/mm/prctl_thp_disable.c
+++ b/tools/testing/selftests/mm/prctl_thp_disable.c
@@ -16,6 +16,10 @@
 #include "thp_settings.h"
 #include "vm_util.h"
 
+#ifndef PR_THP_DISABLE_EXCEPT_ADVISED
+#define PR_THP_DISABLE_EXCEPT_ADVISED	(1 << 1)
+#endif
+
 static int sz2ord(size_t size, size_t pagesize)
 {
 	return __builtin_ctzll(size / pagesize);
@@ -238,4 +242,117 @@ TEST_F(prctl_thp_disable_completely, fork)
 	ASSERT_EQ(ret, 0);
 }
 
+FIXTURE(prctl_thp_disable_except_madvise)
+{
+	struct thp_settings settings;
+	struct test_results results;
+	size_t pmdsize;
+};
+
+FIXTURE_VARIANT(prctl_thp_disable_except_madvise)
+{
+	enum thp_policy thp_global_policy;
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_except_madvise, never)
+{
+	.thp_global_policy = THP_POLICY_NEVER,
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_except_madvise, madvise)
+{
+	.thp_global_policy = THP_POLICY_MADVISE,
+};
+
+FIXTURE_VARIANT_ADD(prctl_thp_disable_except_madvise, always)
+{
+	.thp_global_policy = THP_POLICY_ALWAYS,
+};
+
+FIXTURE_SETUP(prctl_thp_disable_except_madvise)
+{
+	if (!thp_available())
+		SKIP(return, "Transparent Hugepages not available\n");
+
+	self->pmdsize = read_pmd_pagesize();
+	if (!self->pmdsize)
+		SKIP(return, "Unable to read PMD size\n");
+
+	thp_save_settings();
+	thp_read_settings(&self->settings);
+	switch (variant->thp_global_policy) {
+	case THP_POLICY_NEVER:
+		self->settings.thp_enabled = THP_NEVER;
+		self->results = (struct test_results) {
+			.prctl_get_thp_disable = 3,
+			.prctl_applied_collapse_none = 0,
+			.prctl_applied_collapse_madv_huge = 0,
+			.prctl_applied_collapse_madv_collapse = 1,
+			.prctl_removed_collapse_none = 0,
+			.prctl_removed_collapse_madv_huge = 0,
+			.prctl_removed_collapse_madv_collapse = 1,
+		};
+		break;
+	case THP_POLICY_MADVISE:
+		self->settings.thp_enabled = THP_MADVISE;
+		self->results = (struct test_results) {
+			.prctl_get_thp_disable = 3,
+			.prctl_applied_collapse_none = 0,
+			.prctl_applied_collapse_madv_huge = 1,
+			.prctl_applied_collapse_madv_collapse = 1,
+			.prctl_removed_collapse_none = 0,
+			.prctl_removed_collapse_madv_huge = 1,
+			.prctl_removed_collapse_madv_collapse = 1,
+		};
+		break;
+	case THP_POLICY_ALWAYS:
+		self->settings.thp_enabled = THP_ALWAYS;
+		self->results = (struct test_results) {
+			.prctl_get_thp_disable = 3,
+			.prctl_applied_collapse_none = 0,
+			.prctl_applied_collapse_madv_huge = 1,
+			.prctl_applied_collapse_madv_collapse = 1,
+			.prctl_removed_collapse_none = 1,
+			.prctl_removed_collapse_madv_huge = 1,
+			.prctl_removed_collapse_madv_collapse = 1,
+		};
+		break;
+	}
+	self->settings.hugepages[sz2ord(self->pmdsize, getpagesize())].enabled = THP_INHERIT;
+	thp_write_settings(&self->settings);
+}
+
+FIXTURE_TEARDOWN(prctl_thp_disable_except_madvise)
+{
+	thp_restore_settings();
+}
+
+TEST_F(prctl_thp_disable_except_madvise, nofork)
+{
+	ASSERT_EQ(prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, NULL, NULL), 0);
+	prctl_thp_disable_test(_metadata, self->pmdsize, &self->results);
+}
+
+TEST_F(prctl_thp_disable_except_madvise, fork)
+{
+	int ret = 0;
+	pid_t pid;
+
+	ASSERT_EQ(prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, NULL, NULL), 0);
+
+	/* Make sure prctl changes are carried across fork */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (!pid)
+		prctl_thp_disable_test(_metadata, self->pmdsize, &self->results);
+
+	wait(&ret);
+	if (WIFEXITED(ret))
+		ret = WEXITSTATUS(ret);
+	else
+		ret = -EINVAL;
+	ASSERT_EQ(ret, 0);
+}
+
 TEST_HARNESS_MAIN
-- 
2.47.3
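As with the other mm selftests, one common way to build and run this test
from a kernel tree is roughly the following (the fixtures write the
transparent_hugepage sysfs settings, so root is needed; the exact kselftest
invocation may vary by tree):

	$ make -C tools/testing/selftests TARGETS=mm
	$ sudo ./tools/testing/selftests/mm/prctl_thp_disable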