From nobody Mon Jun 15 06:29:51 2026
From: Alban Crequy
To: Andrew Morton, David Hildenbrand, Christian Brauner
Cc: Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Alban Crequy, Alban Crequy, Peter Xu, Willy Tarreau,
 linux-kselftest@vger.kernel.org, shuah@kernel.org
Subject: [PATCH v2 1/2] mm/process_vm_access: pidfd and nowait support for process_vm_readv/writev
Date: Wed, 8 Apr 2026 16:54:35 +0200
Message-ID: <20260408145436.843538-2-alban.crequy@gmail.com>
In-Reply-To: <20260408145436.843538-1-alban.crequy@gmail.com>
References: <20260408145436.843538-1-alban.crequy@gmail.com>

From: Alban Crequy

There are two categories of users for process_vm_readv:

1. Debuggers like GDB or strace. When a debugger attempts to read the
   target memory and triggers a page fault, the page fault needs to be
   resolved so that the debugger can accurately interpret the memory.
   A debugger is typically attached to a single process.

2. Profilers like OpenTelemetry eBPF Profiler. The profiler uses a perf
   event to get stack traces from all processes at 20Hz (20 stack
   traces to resolve per second). For interpreted languages (Ruby,
   Python, etc.), the profiler uses process_vm_readv to get the correct
   symbols. In this case, performance matters most. It is fine if some
   stack traces cannot be resolved as long as the loss is not
   statistically significant.

The current behaviour of process_vm_readv is to resolve page faults in
the target VM. This is what debuggers want, but it is unwelcome for
profilers because resolving a page fault can take a long time depending
on the backing filesystem. Additionally, since profilers monitor all
processes, a slow page fault resolution for one target process should
not slow down the monitoring of all other target processes.

This patch adds the flag PROCESS_VM_NOWAIT, so the caller can choose
not to block on IO if the memory access causes a page fault.

Additionally, this patch adds the flag PROCESS_VM_PIDFD to refer to the
remote process via a PID file descriptor instead of a PID. Such a file
descriptor can be obtained with pidfd_open(2). This avoids the problem
of the PID number being reused. Reuse is unlikely to affect debuggers
because they can monitor the target process termination in other ways
(ptrace), but the flag can be helpful in some profiling scenarios.

If a given flag is unsupported, the syscall returns the error EINVAL
without checking the buffers. This gives userspace a way to detect
whether the current kernel supports a specific flag:

  process_vm_readv(pid, NULL, 1, NULL, 1, PROCESS_VM_PIDFD)
  -> EINVAL if the kernel does not support the flag PROCESS_VM_PIDFD
     (before this patch)
  -> EFAULT if the kernel supports the flag (after this patch)

Signed-off-by: Alban Crequy
Reviewed-by: Christian Brauner
---
v2:
- Expand commit message with use-case motivation (David Hildenbrand)
- Use unsigned long consistently for pvm_flags parameter (David Hildenbrand)
- Add PROCESS_VM_SUPPORTED_FLAGS kernel-internal define (David Hildenbrand)
- Keep (1UL << N) in UAPI header: BIT() is defined in vdso/bits.h which
  is not exported to userspace, so UAPI headers using BIT() would break
  when included from userspace programs (David Hildenbrand)

 MAINTAINERS                     |  1 +
 include/uapi/linux/process_vm.h |  9 +++++++++
 mm/process_vm_access.c          | 24 ++++++++++++++++++------
 3 files changed, 28 insertions(+), 6 deletions(-)
 create mode 100644 include/uapi/linux/process_vm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index c3fe46d7c4bc..f7168c5d7acc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16678,6 +16678,7 @@ F:	include/linux/pgtable.h
 F:	include/linux/ptdump.h
 F:	include/linux/vmpressure.h
 F:	include/linux/vmstat.h
+F:	include/uapi/linux/process_vm.h
 F:	kernel/fork.c
 F:	mm/Kconfig
 F:	mm/debug.c
diff --git a/include/uapi/linux/process_vm.h b/include/uapi/linux/process_vm.h
new file mode 100644
index 000000000000..4168e09f3f4e
--- /dev/null
+++ b/include/uapi/linux/process_vm.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_PROCESS_VM_H
+#define _UAPI_LINUX_PROCESS_VM_H
+
+/* Flags for process_vm_readv/process_vm_writev */
+#define PROCESS_VM_PIDFD	(1UL << 0)
+#define PROCESS_VM_NOWAIT	(1UL << 1)
+
+#endif /* _UAPI_LINUX_PROCESS_VM_H */
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 656d3e88755b..c6a25e9993e1 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -14,6 +14,9 @@
 #include
 #include
 #include
+#include
+
+#define PROCESS_VM_SUPPORTED_FLAGS (PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT)
 
 /**
  * process_vm_rw_pages - read/write pages from task specified
@@ -68,6 +71,7 @@ static int process_vm_rw_pages(struct page **pages,
  * @mm: mm for task
  * @task: task to read/write from
  * @vm_write: 0 means copy from, 1 means copy to
+ * @pvm_flags: PROCESS_VM_* flags
  * Returns 0 on success or on failure error code
  */
 static int process_vm_rw_single_vec(unsigned long addr,
@@ -76,7 +80,8 @@ static int process_vm_rw_single_vec(unsigned long addr,
 				    struct page **process_pages,
 				    struct mm_struct *mm,
 				    struct task_struct *task,
-				    int vm_write)
+				    int vm_write,
+				    unsigned long pvm_flags)
 {
 	unsigned long pa = addr & PAGE_MASK;
 	unsigned long start_offset = addr - pa;
@@ -91,6 +96,8 @@ static int process_vm_rw_single_vec(unsigned long addr,
 
 	if (vm_write)
 		flags |= FOLL_WRITE;
+	if (pvm_flags & PROCESS_VM_NOWAIT)
+		flags |= FOLL_NOWAIT;
 
 	while (!rc && nr_pages && iov_iter_count(iter)) {
 		int pinned_pages = min_t(unsigned long, nr_pages, PVM_MAX_USER_PAGES);
@@ -141,7 +148,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
  * @iter: where to copy to/from locally
  * @rvec: iovec array specifying where to copy to/from in the other process
  * @riovcnt: size of rvec array
- * @flags: currently unused
+ * @flags: process_vm_readv/writev flags
  * @vm_write: 0 if reading from other process, 1 if writing to other process
  *
  * Returns the number of bytes read/written or error code. May
@@ -163,6 +170,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
 	unsigned long nr_pages_iov;
 	ssize_t iov_len;
 	size_t total_len = iov_iter_count(iter);
+	unsigned int f_flags;
 
 	/*
 	 * Work out how many pages of struct pages we're going to need
@@ -194,7 +202,11 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
 	}
 
 	/* Get process information */
-	task = find_get_task_by_vpid(pid);
+	if (flags & PROCESS_VM_PIDFD)
+		task = pidfd_get_task(pid, &f_flags);
+	else
+		task = find_get_task_by_vpid(pid);
+
 	if (!task) {
 		rc = -ESRCH;
 		goto free_proc_pages;
@@ -215,7 +227,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
 	for (i = 0; i < riovcnt && iov_iter_count(iter) && !rc; i++)
 		rc = process_vm_rw_single_vec(
 			(unsigned long)rvec[i].iov_base, rvec[i].iov_len,
-			iter, process_pages, mm, task, vm_write);
+			iter, process_pages, mm, task, vm_write, flags);
 
 	/* copied = space before - space after */
 	total_len -= iov_iter_count(iter);
@@ -244,7 +256,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
  * @liovcnt: size of lvec array
  * @rvec: iovec array specifying where to copy to/from in the other process
  * @riovcnt: size of rvec array
- * @flags: currently unused
+ * @flags: process_vm_readv/writev flags
  * @vm_write: 0 if reading from other process, 1 if writing to other process
  *
  * Returns the number of bytes read/written or error code. May
@@ -266,7 +278,7 @@ static ssize_t process_vm_rw(pid_t pid,
 	ssize_t rc;
 	int dir = vm_write ? ITER_SOURCE : ITER_DEST;
 
-	if (flags != 0)
+	if (flags & ~PROCESS_VM_SUPPORTED_FLAGS)
 		return -EINVAL;
 
 	/* Check iovecs */
-- 
2.45.0

From nobody Mon Jun 15 06:29:51 2026
From: Alban Crequy
To: Andrew Morton, David Hildenbrand, Christian Brauner
Cc: Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Alban Crequy, Alban Crequy, Peter Xu, Willy Tarreau,
 linux-kselftest@vger.kernel.org, shuah@kernel.org
Subject: [PATCH v2 2/2] selftests/mm: add tests for process_vm_readv flags
Date: Wed, 8 Apr 2026 16:54:36 +0200
Message-ID: <20260408145436.843538-3-alban.crequy@gmail.com>
In-Reply-To: <20260408145436.843538-1-alban.crequy@gmail.com>
References: <20260408145436.843538-1-alban.crequy@gmail.com>

From: Alban Crequy

Add selftests for the PROCESS_VM_PIDFD and PROCESS_VM_NOWAIT flags
introduced in process_vm_readv/writev.

Tests cover:
- basic read with no flags
- invalid flags (EINVAL)
- invalid address (EFAULT)
- flag validation precedence over address validation
- PROCESS_VM_PIDFD: read via pidfd
- PROCESS_VM_NOWAIT: read from resident memory
- PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT combined
- userfaultfd blocking read (no flags)
- PROCESS_VM_NOWAIT with userfaultfd (non-blocking, returns EFAULT)

Signed-off-by: Alban Crequy
Reviewed-by: Christian Brauner
---
New in v2.

 tools/testing/selftests/mm/Makefile           |   1 +
 tools/testing/selftests/mm/process_vm_readv.c | 368 ++++++++++++++++++
 2 files changed, 369 insertions(+)
 create mode 100644 tools/testing/selftests/mm/process_vm_readv.c

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7a5de4e9bf52..056d9d961f6b 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -105,6 +105,7 @@ TEST_GEN_FILES += droppable
 TEST_GEN_FILES += guard-regions
 TEST_GEN_FILES += merge
 TEST_GEN_FILES += rmap
+TEST_GEN_FILES += process_vm_readv
 
 ifneq ($(ARCH),arm64)
 TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/process_vm_readv.c b/tools/testing/selftests/mm/process_vm_readv.c
new file mode 100644
index 000000000000..cc25471410b5
--- /dev/null
+++ b/tools/testing/selftests/mm/process_vm_readv.c
@@ -0,0 +1,368 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "kselftest_harness.h"
+
+#ifndef PROCESS_VM_PIDFD
+#define PROCESS_VM_PIDFD	(1UL << 0)
+#endif
+
+#ifndef PROCESS_VM_NOWAIT
+#define PROCESS_VM_NOWAIT	(1UL << 1)
+#endif
+
+#ifndef __NR_pidfd_open
+#define __NR_pidfd_open 434
+#endif
+
+static int sys_pidfd_open(pid_t pid, unsigned int flags)
+{
+	return syscall(__NR_pidfd_open, pid, flags);
+}
+
+static const uint8_t test_data[] = { 0x01, 0x02, 0x03, 0x04,
+				     0x05, 0x06, 0x07, 0x08 };
+#define POISON_BYTE 0xCC
+
+/*
+ * Test: basic process_vm_readv with no flags
+ */
+TEST(read_basic)
+{
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+
+	memset(buf, POISON_BYTE, sizeof(buf));
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 0);
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+}
+
+/*
+ * Test: invalid flags should return EINVAL
+ */
+TEST(read_invalid_flags)
+{
+	uint8_t buf[8] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 255);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EINVAL, errno);
+}
+
+/*
+ * Test: invalid address should return EFAULT
+ */
+TEST(read_invalid_address)
+{
+	uint8_t buf[8] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = { .iov_base = NULL, .iov_len = 8 };
+	ssize_t n;
+
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 0);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EFAULT, errno);
+}
+
+/*
+ * Test: invalid address with invalid flags should return EINVAL
+ * (flag check happens before address validation)
+ */
+TEST(read_invalid_address_invalid_flags)
+{
+	uint8_t buf[8] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = { .iov_base = NULL, .iov_len = 8 };
+	ssize_t n;
+
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 255);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EINVAL, errno);
+}
+
+/*
+ * Test: invalid address with all valid flags should return EFAULT
+ * (flags are valid so we get past the flag check to the address check)
+ */
+TEST(read_invalid_address_all_valid_flags)
+{
+	int pidfd;
+	struct iovec local_iov = { .iov_base = NULL, .iov_len = 8 };
+	struct iovec remote_iov = { .iov_base = NULL, .iov_len = 8 };
+	ssize_t n;
+
+	pidfd = sys_pidfd_open(getpid(), 0);
+	ASSERT_GE(pidfd, 0);
+
+	n = process_vm_readv(pidfd, &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EFAULT, errno);
+
+	close(pidfd);
+}
+
+/*
+ * Test: read with PIDFD flag
+ */
+TEST(read_pidfd)
+{
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+	int pidfd;
+
+	memset(buf, POISON_BYTE, sizeof(buf));
+	pidfd = sys_pidfd_open(getpid(), 0);
+	ASSERT_GE(pidfd, 0);
+
+	n = process_vm_readv(pidfd, &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_PIDFD);
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+
+	close(pidfd);
+}
+
+/*
+ * Test: read with NOWAIT from resident memory (should succeed)
+ */
+TEST(read_nowait_resident)
+{
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+
+	memset(buf, POISON_BYTE, sizeof(buf));
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_NOWAIT);
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+}
+
+/*
+ * Test: read with PIDFD + NOWAIT from resident memory
+ */
+TEST(read_pidfd_nowait_resident)
+{
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+	int pidfd;
+
+	memset(buf, POISON_BYTE, sizeof(buf));
+	pidfd = sys_pidfd_open(getpid(), 0);
+	ASSERT_GE(pidfd, 0);
+
+	n = process_vm_readv(pidfd, &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT);
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+
+	close(pidfd);
+}
+
+/*
+ * Userfaultfd helpers for NOWAIT tests
+ */
+static int setup_userfaultfd(void)
+{
+	struct uffdio_api api = { .api = UFFD_API };
+	int uffd;
+
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+	if (uffd < 0)
+		return -1;
+
+	if (ioctl(uffd, UFFDIO_API, &api)) {
+		close(uffd);
+		return -1;
+	}
+
+	return uffd;
+}
+
+static void *register_uffd_region(int uffd, size_t size)
+{
+	struct uffdio_register reg;
+	void *mem;
+
+	mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
+		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (mem == MAP_FAILED)
+		return NULL;
+
+	reg.range.start = (unsigned long)mem;
+	reg.range.len = size;
+	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
+	if (ioctl(uffd, UFFDIO_REGISTER, &reg)) {
+		munmap(mem, size);
+		return NULL;
+	}
+
+	return mem;
+}
+
+struct uffd_handler_args {
+	int uffd;
+	const void *content;
+	size_t content_len;
+};
+
+static void *uffd_handler_thread(void *arg)
+{
+	struct uffd_handler_args *ha = arg;
+	struct uffd_msg msg;
+	struct uffdio_copy copy;
+	struct pollfd pfd = {
+		.fd = ha->uffd,
+		.events = POLLIN
+	};
+	void *page;
+	long page_size = sysconf(_SC_PAGESIZE);
+	int ret;
+
+	page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (page == MAP_FAILED)
+		return (void *)(long)-ENOMEM;
+
+	memcpy(page, ha->content, ha->content_len);
+
+	ret = poll(&pfd, 1, 5000);
+	if (ret <= 0)
+		goto out;
+
+	if (read(ha->uffd, &msg, sizeof(msg)) != sizeof(msg))
+		goto out;
+
+	if (msg.event != UFFD_EVENT_PAGEFAULT)
+		goto out;
+
+	copy.dst = msg.arg.pagefault.address & ~(page_size - 1);
+	copy.src = (unsigned long)page;
+	copy.len = page_size;
+	copy.mode = 0;
+	ioctl(ha->uffd, UFFDIO_COPY, &copy);
+
+out:
+	munmap(page, page_size);
+	return NULL;
+}
+
+/*
+ * Test: read from userfaultfd-registered memory (no flags, should block
+ * until page fault is resolved by handler thread)
+ */
+TEST(read_userfaultfd_blocking)
+{
+	int uffd;
+	void *mem;
+	long page_size = sysconf(_SC_PAGESIZE);
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov;
+	struct uffd_handler_args ha;
+	pthread_t handler;
+	ssize_t n;
+
+	memset(buf, POISON_BYTE, sizeof(buf));
+
+	uffd = setup_userfaultfd();
+	ASSERT_GE(uffd, 0);
+
+	mem = register_uffd_region(uffd, page_size);
+	ASSERT_NE(NULL, mem);
+
+	ha.uffd = uffd;
+	ha.content = test_data;
+	ha.content_len = sizeof(test_data);
+	ASSERT_EQ(0, pthread_create(&handler, NULL, uffd_handler_thread, &ha));
+
+	remote_iov.iov_base = mem;
+	remote_iov.iov_len = sizeof(test_data);
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 0);
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+
+	pthread_join(handler, NULL);
+	munmap(mem, page_size);
+	close(uffd);
+}
+
+/*
+ * Test: read with NOWAIT from userfaultfd-registered memory that has
+ * not been faulted in yet. Should return EFAULT (not block).
+ */
+TEST(read_nowait_userfaultfd)
+{
+	int uffd;
+	void *mem;
+	long page_size = sysconf(_SC_PAGESIZE);
+	uint8_t buf[sizeof(test_data)] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov;
+	ssize_t n;
+
+	uffd = setup_userfaultfd();
+	ASSERT_GE(uffd, 0);
+
+	mem = register_uffd_region(uffd, page_size);
+	ASSERT_NE(NULL, mem);
+
+	/* Ensure the page is not present */
+	madvise(mem, page_size, MADV_DONTNEED);
+
+	remote_iov.iov_base = mem;
+	remote_iov.iov_len = sizeof(test_data);
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_NOWAIT);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EFAULT, errno);
+
+	munmap(mem, page_size);
+	close(uffd);
+}
+
+TEST_HARNESS_MAIN
-- 
2.45.0
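
The flag-detection probe described in the 1/2 commit message can be sketched from userspace roughly as follows. This is an illustration only and not part of the series. The names pvm_probe_flag(), pvm_flags_valid() and enum pvm_probe are hypothetical. The flag values are copied from the proposed include/uapi/linux/process_vm.h, and the EINVAL/EFAULT interpretation assumes the behaviour stated in the commit message. Any other outcome (for example ENOSYS, or EPERM from a seccomp filter) is reported as unknown rather than guessed at. The raw syscall() interface is used so the sketch does not depend on libc header versions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values, copied from the proposed UAPI header in patch 1/2. */
#ifndef PROCESS_VM_PIDFD
#define PROCESS_VM_PIDFD	(1UL << 0)
#endif
#ifndef PROCESS_VM_NOWAIT
#define PROCESS_VM_NOWAIT	(1UL << 1)
#endif

/* Hypothetical result type for the probe below. */
enum pvm_probe {
	PVM_FLAG_UNSUPPORTED,	/* kernel rejected the flag with EINVAL */
	PVM_FLAG_SUPPORTED,	/* flag accepted, NULL iovec hit EFAULT */
	PVM_FLAG_UNKNOWN	/* anything else: ENOSYS, EPERM, ... */
};

/*
 * Hypothetical helper: issue process_vm_readv with NULL iovecs and the
 * flag under test. Per the commit message, the flag check runs before
 * the buffers are inspected, so EINVAL means "flag unknown to this
 * kernel" and EFAULT means "flag accepted".
 */
static enum pvm_probe pvm_probe_flag(unsigned long flag)
{
	long n = syscall(SYS_process_vm_readv, (long)getpid(),
			 NULL, 1UL, NULL, 1UL, flag);

	if (n == -1 && errno == EINVAL)
		return PVM_FLAG_UNSUPPORTED;
	if (n == -1 && errno == EFAULT)
		return PVM_FLAG_SUPPORTED;
	return PVM_FLAG_UNKNOWN;
}

/*
 * Userspace mirror of the kernel-side validation in the patch,
 * "flags & ~PROCESS_VM_SUPPORTED_FLAGS": any bit outside the two
 * known flags makes the whole value invalid.
 */
static bool pvm_flags_valid(unsigned long flags)
{
	return (flags & ~(PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT)) == 0;
}
```

A profiler could call pvm_probe_flag(PROCESS_VM_NOWAIT) once at startup and fall back to flags = 0 whenever the result is not PVM_FLAG_SUPPORTED.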