From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
 tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
 gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
 hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
 peterz@infradead.org, jgroves@micron.com, ravis.opensrc@micron.com,
 sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
 seungjun.ha@samsung.com, Michal Hocko
Subject: [PATCH v2 08/11] mm/mempolicy: add set_mempolicy2 syscall
Date: Sat, 9 Dec 2023 01:59:28 -0500
Message-Id: <20231209065931.3458-9-gregory.price@memverge.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To:
 <20231209065931.3458-1-gregory.price@memverge.com>
References: <20231209065931.3458-1-gregory.price@memverge.com>

set_mempolicy2 is an extensible set_mempolicy interface which allows
a user to set the per-task memory policy.

Defined as:

set_mempolicy2(struct mpol_args *args, size_t size,
               unsigned long flags);

Relevant mpol_args fields include the following:

mode:         The MPOL_* policy (DEFAULT, INTERLEAVE, etc.)
mode_flags:   The MPOL_F_* flags that were previously OR'd into
              the mode argument. They are split out so that future
              extensions have additional mode/flag space.
pol_nodes:    The nodemask to apply for the memory policy
pol_maxnodes: The max number of nodes described by pol_nodes

The size argument (usize in the implementation) is intended for the
user to pass in sizeof(struct mpol_args), to allow forward/backward
compatibility whenever possible.

The flags argument is intended to future-proof the syscall against
extensions which may require interpreting the arguments in the
structure differently. It must currently be zero; any other value
returns -EINVAL.

Semantics of `set_mempolicy2` are otherwise the same as
`set_mempolicy` as of this patch.
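To illustrate the size-based compatibility described above, here is a
minimal userspace model of the size negotiation that
copy_struct_from_user() performs on the kernel side. It is a sketch, not
the kernel helper itself: copy_struct_model() and the two args_v*
structures are hypothetical names invented for this example, plain
memcpy() stands in for the user copy, and the -EFAULT path is not
modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model (assumption: simplified) of the size handling done by
 * copy_struct_from_user(). Returns 0 on success or a negative errno.
 */
static int copy_struct_model(void *dst, size_t ksize,
                             const void *src, size_t usize)
{
    size_t size = usize < ksize ? usize : ksize;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    if (usize > ksize) {
        /* Newer caller, older kernel: unknown trailing bytes must be 0. */
        for (size_t i = ksize; i < usize; i++)
            if (s[i] != 0)
                return -E2BIG;
    } else if (usize < ksize) {
        /* Older caller, newer kernel: fields not supplied read as 0. */
        memset((unsigned char *)dst + size, 0, ksize - size);
    }

    /* Copy only the part both sides know about. */
    memcpy(dst, src, size);
    return 0;
}

/* Hypothetical "old" and "new" argument layouts, for illustration only. */
struct args_v1 {
    unsigned short mode;
    unsigned short mode_flags;
};

struct args_v2 {
    unsigned short mode;
    unsigned short mode_flags;
    unsigned int extra;     /* field added by a later extension */
};
```

With these layouts, an args_v1 caller talking to code that expects
args_v2 sees 'extra' zero-filled, while an args_v2 caller that sets a
non-zero 'extra' against code that only knows args_v1 gets -E2BIG rather
than having the field silently ignored.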

Suggested-by: Michal Hocko
Signed-off-by: Gregory Price
---
 .../admin-guide/mm/numa_memory_policy.rst   | 10 ++++++
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 include/linux/syscalls.h                    |  2 ++
 include/uapi/asm-generic/unistd.h           |  4 ++-
 mm/mempolicy.c                              | 36 +++++++++++++++++++
 18 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 64c5804dc40f..aabc24db92d3 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -432,6 +432,8 @@ Set [Task] Memory Policy::
 
 	long set_mempolicy(int mode, const unsigned long *nmask,
 					unsigned long maxnode);
+	long set_mempolicy2(struct mpol_args *args, size_t size,
+			    unsigned long flags);
 
 Set's the calling task's "task/process memory policy" to mode
 specified by the 'mode' argument and the set of nodes defined by
@@ -440,6 +442,12 @@ specified by the 'mode' argument and the set of nodes defined by
 'mode' argument with the flag (for example: MPOL_INTERLEAVE |
 MPOL_F_STATIC_NODES).
 
+set_mempolicy2() is an extended version of set_mempolicy() capable
+of setting a mempolicy which requires more information than can be
+passed via set_mempolicy().  For example, weighted interleave with
+task-local weights requires a weight array to be passed via the
+'mpol_args->il_weights' argument in the 'struct mpol_args' arg.
+
 See the set_mempolicy(2) man page for more details
 
 
@@ -498,6 +506,8 @@ Extended Mempolicy Arguments::
 The extended mempolicy argument structure is defined to allow the mempolicy
 interfaces future extensibility without the need for additional system calls.
 
+Extended interfaces (set_mempolicy2) use this argument structure.
+
 The core arguments (mode, mode_flags, pol_nodes, and pol_maxnodes) apply to
 all interfaces relative to their non-extended counterparts. Each additional
 field may only apply to specific extended interfaces.  See the respective
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 18c842ca6c32..0dc288a1118a 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -496,3 +496,4 @@
 564	common	futex_wake			sys_futex_wake
 565	common	futex_wait			sys_futex_wait
 566	common	futex_requeue			sys_futex_requeue
+567	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 584f9528c996..50172ec0e1f5 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -470,3 +470,4 @@
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
 456	common	futex_requeue			sys_futex_requeue
+457	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 7a4b780e82cb..839d90c535f2 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
 456	common	futex_requeue			sys_futex_requeue
+457	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 5b6a0b02b7de..567c8b883735 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -462,3 +462,4 @@
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
 456	common	futex_requeue			sys_futex_requeue
+457	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index a842b41c8e06..cc0640e16f2f 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -395,3 +395,4 @@
 454	n32	futex_wake			sys_futex_wake
 455	n32	futex_wait			sys_futex_wait
 456	n32	futex_requeue			sys_futex_requeue
+457	n32	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 525cc54bc63b..f7262fde98d9 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -444,3 +444,4 @@
 454	o32	futex_wake			sys_futex_wake
 455	o32	futex_wait			sys_futex_wait
 456	o32	futex_requeue			sys_futex_requeue
+457	o32	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index a47798fed54e..e10f0e8bd064 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
 456	common	futex_requeue			sys_futex_requeue
+457	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 7fab411378f2..4f03f5f42b78 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -543,3 +543,4 @@
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
 456	common	futex_requeue			sys_futex_requeue
+457	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 86fec9b080f6..f98dadc2e9df 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -459,3 +459,4 @@
 454	common	futex_wake		sys_futex_wake			sys_futex_wake
 455	common	futex_wait		sys_futex_wait			sys_futex_wait
 456	common	futex_requeue		sys_futex_requeue		sys_futex_requeue
+457	common	set_mempolicy2		sys_set_mempolicy2		sys_set_mempolicy2
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 363fae0fe9bf..f47ba9f2d05d 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -459,3 +459,4 @@
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
 456	common	futex_requeue			sys_futex_requeue
+457	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 7bcaa3d5ea44..53fb16616728 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -502,3 +502,4 @@
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
 456	common	futex_requeue			sys_futex_requeue
+457	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index c8fac5205803..4b4dc41b24ee 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -461,3 +461,4 @@
 454	i386	futex_wake		sys_futex_wake
 455	i386	futex_wait		sys_futex_wait
 456	i386	futex_requeue		sys_futex_requeue
+457	i386	set_mempolicy2		sys_set_mempolicy2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 8cb8bf68721c..1bc2190bec27 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -378,6 +378,7 @@
 454	common	futex_wake		sys_futex_wake
 455	common	futex_wait		sys_futex_wait
 456	common	futex_requeue		sys_futex_requeue
+457	common	set_mempolicy2		sys_set_mempolicy2
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 06eefa9c1458..e26dc89399eb 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -427,3 +427,4 @@
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
 456	common	futex_requeue			sys_futex_requeue
+457	common	set_mempolicy2			sys_set_mempolicy2
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index fd9d12de7e92..3244cd990858 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -822,6 +822,8 @@ asmlinkage long sys_get_mempolicy(int __user *policy,
 				unsigned long addr, unsigned long flags);
 asmlinkage long sys_set_mempolicy(int mode, const unsigned long __user *nmask,
 				unsigned long maxnode);
+asmlinkage long sys_set_mempolicy2(struct mpol_args __user *args, size_t size,
+				   unsigned long flags);
 asmlinkage long sys_migrate_pages(pid_t pid, unsigned long maxnode,
 				const unsigned long __user *from,
 				const unsigned long __user *to);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 756b013fb832..55486aba099f 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake)
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
 #define __NR_futex_requeue 456
 __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
+#define __NR_set_mempolicy2 457
+__SYSCALL(__NR_set_mempolicy2, sys_set_mempolicy2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 457
+#define __NR_syscalls 458
 
 /*
  * 32 bit systems traditionally used different
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 446167dcebdc..a56ff02f780e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1633,6 +1633,42 @@ SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,
 	return kernel_set_mempolicy(mode, nmask, maxnode);
 }
 
+SYSCALL_DEFINE3(set_mempolicy2, struct mpol_args __user *, uargs, size_t, usize,
+		unsigned long, flags)
+{
+	struct mpol_args kargs;
+	struct mempolicy_args margs;
+	int err;
+	nodemask_t policy_nodemask;
+	unsigned long __user *nodes_ptr;
+
+	if (flags)
+		return -EINVAL;
+
+	err = copy_struct_from_user(&kargs, sizeof(kargs), uargs, usize);
+	if (err)
+		return err;
+
+	err = validate_mpol_flags(kargs.mode, &kargs.mode_flags);
+	if (err)
+		return err;
+
+	memset(&margs, '\0', sizeof(margs));
+	margs.mode = kargs.mode;
+	margs.mode_flags = kargs.mode_flags;
+	if (kargs.pol_nodes) {
+		nodes_ptr = u64_to_user_ptr(kargs.pol_nodes);
+		err = get_nodes(&policy_nodemask, nodes_ptr,
+				kargs.pol_maxnodes);
+		if (err)
+			return err;
+		margs.policy_nodes = &policy_nodemask;
+	} else
+		margs.policy_nodes = NULL;
+
+	return do_set_mempolicy(&margs);
+}
+
 static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
 				const unsigned long __user *old_nodes,
 				const unsigned long __user *new_nodes)
-- 
2.39.1
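
For reference, a caller-side sketch of how the core fields might be
filled in before invoking the new syscall. This is not part of the
patch. struct mpol_args_sketch is a hypothetical stand-in that carries
only the four core fields named in the commit message; the real
struct mpol_args is defined elsewhere in this series and its layout may
differ. SKETCH_MAX_NODES, nodemask_set() and fill_interleave_args() are
likewise names made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Same value as MPOL_INTERLEAVE in <linux/mempolicy.h>; redefined here
 * only so the sketch does not depend on kernel headers. */
#define SKETCH_MPOL_INTERLEAVE 3

#define SKETCH_MAX_NODES 64     /* assumption: arbitrary mask width */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((SKETCH_MAX_NODES + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Hypothetical layout, NOT the UAPI structure from this series. */
struct mpol_args_sketch {
    uint16_t mode;          /* MPOL_* policy */
    uint16_t mode_flags;    /* MPOL_F_* flags, no longer OR'd into mode */
    uint32_t pad;           /* explicit padding, kept zero */
    uint64_t pol_nodes;     /* user pointer to the nodemask, as a u64 */
    uint64_t pol_maxnodes;  /* number of nodes described by pol_nodes */
};

/* Set the bit for one NUMA node in an unsigned long bitmap. */
static void nodemask_set(unsigned long *mask, unsigned int node)
{
    assert(node < SKETCH_MAX_NODES);
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}

/* Describe "interleave across the nodes in mask" in the args structure. */
static void fill_interleave_args(struct mpol_args_sketch *args,
                                 const unsigned long *mask)
{
    /* Zero everything first so unused and padding bytes are 0. */
    memset(args, 0, sizeof(*args));
    args->mode = SKETCH_MPOL_INTERLEAVE;
    args->mode_flags = 0;
    /* The syscall body above recovers this with u64_to_user_ptr(). */
    args->pol_nodes = (uint64_t)(uintptr_t)mask;
    args->pol_maxnodes = SKETCH_MAX_NODES;
}
```

On a kernel built with this series, the filled structure would then be
passed as syscall(__NR_set_mempolicy2, &args, sizeof(args), 0), with
flags left at zero because the implementation rejects any other value.
The number 457 (567 on alpha) is only what this patch assigns and has no
meaning on kernels that do not carry it.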