From nobody Mon Jun 8 03:20:52 2026
From: Alban Crequy
To: Andrew Morton, David Hildenbrand, Christian Brauner
Cc: Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Alban Crequy, Peter Xu, Willy Tarreau,
 linux-kselftest@vger.kernel.org, shuah@kernel.org, Usama Arif,
 David Laight
Subject: [PATCH v5 1/2] mm/process_vm_access: pidfd and nowait support for
 process_vm_readv/writev
Date: Tue, 2 Jun 2026 12:09:16 +0200
Message-ID: <20260602100917.3641359-2-alban.crequy@gmail.com>
In-Reply-To: <20260602100917.3641359-1-alban.crequy@gmail.com>
References: <20260602100917.3641359-1-alban.crequy@gmail.com>

From: Alban Crequy

There are two categories of users for process_vm_readv:

1. Debuggers like GDB or strace. When a debugger attempts to read the
   target memory and triggers a page fault, the page fault needs to be
   resolved so that the debugger can accurately interpret the memory.
   A debugger is typically attached to a single process.

2. Profilers like OpenTelemetry eBPF Profiler. The profiler uses a perf
   event to get stack traces from all processes at 20Hz (20 stack
   traces to resolve per second).
   For interpreted languages (Ruby, Python, etc.), the profiler uses
   process_vm_readv to get the correct symbols. In this case,
   performance matters most. It is fine if some stack traces cannot be
   resolved as long as it is not statistically significant.

The current behaviour of process_vm_readv is to resolve page faults in
the target VM. This is as desired for debuggers, but unwelcome for
profilers because the page fault resolution could take a lot of time
depending on the backing filesystem. Additionally, since profilers
monitor all processes, we don't want a slow page fault resolution for
one target process slowing down the monitoring for all other target
processes.

This patch adds the flag PROCESS_VM_NOWAIT, so the caller can choose to
not block on IO if the memory access causes a page fault. When a page
is not resident and would require IO to fault in, the syscall returns a
short read (the number of bytes successfully read before the fault) or
-1 with errno set to EFAULT if no bytes were read.

Additionally, this patch adds the flag PROCESS_VM_PIDFD to refer to the
remote process via PID file descriptor instead of PID. Such a file
descriptor can be obtained with pidfd_open(2). This is useful to avoid
the pid number being reused. It is unlikely to happen for debuggers
because they can monitor the target process termination in other ways
(ptrace), but can be helpful in some profiling scenarios.

When using PROCESS_VM_PIDFD, the first argument is a pidfd instead of a
pid. If the pidfd is invalid, the syscall returns -1 with errno set to
EBADF.

If a given flag is unsupported, the syscall returns the error EINVAL
without checking the buffers. This gives userspace a way to detect
whether the current kernel supports a specific flag:

  process_vm_readv(pid, NULL, 1, NULL, 1, PROCESS_VM_PIDFD)
  -> EINVAL if the kernel does not support the flag PROCESS_VM_PIDFD
     (before this patch)
  -> EFAULT if the kernel supports the flag (after this patch)

Suggested man page update for process_vm_readv(2):

  The flags argument is the bitwise OR of zero or more of these flags:

  PROCESS_VM_PIDFD (since Linux 7.x)
      The pid argument is a PID file descriptor (see pidfd_open(2))
      instead of a PID number. When using this flag, the existing
      ESRCH error applies if the process referred to by the pidfd
      has exited.

  PROCESS_VM_NOWAIT (since Linux 7.x)
      Do not block on IO. If a page in the remote address space is
      not resident and would require disk IO to fault in, the system
      call returns a short read or fails with EFAULT if no bytes
      were read.

  Additional error:

  EBADF  pid is not a valid file descriptor (PROCESS_VM_PIDFD only).

Signed-off-by: Alban Crequy
Acked-by: David Hildenbrand (Arm)
---
v5:
- No changes in this patch.

v4:
- Rename process_vm.h to process_vm_access.h (David Hildenbrand)
- Fix MAINTAINERS alphabetical sort order (David Hildenbrand)
- Document NOWAIT return value and PIDFD EBADF error (David Hildenbrand)
- Add suggested man page update text (David Hildenbrand, Christian
  Brauner, Mike Rapoport)
- Keep (1UL << N) in UAPI header (David Hildenbrand)

v3:
- Fix ERR_PTR handling for pidfd_get_task(): use IS_ERR()/PTR_ERR() for
  the pidfd path, matching process_madvise() (Usama Arif, Sashiko)

v2:
- Expand commit message with use-case motivation (David Hildenbrand)
- Use unsigned long consistently for pvm_flags parameter (David
  Hildenbrand)
- Add PROCESS_VM_SUPPORTED_FLAGS kernel-internal define (David
  Hildenbrand)
- Keep (1UL << N) in UAPI header: BIT() is defined in vdso/bits.h which
  is not exported to userspace, so UAPI headers using BIT() would break
  when included from userspace programs (David Hildenbrand)

 MAINTAINERS                            |  1 +
 include/uapi/linux/process_vm_access.h |  9 +++++++
 mm/process_vm_access.c                 | 34 +++++++++++++++++++-------
 3 files changed, 35 insertions(+), 9 deletions(-)
 create mode 100644 include/uapi/linux/process_vm_access.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9ec290e38b44..1bfce49dd0bb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16800,6 +16800,7 @@ F:	include/linux/pgtable.h
 F:	include/linux/ptdump.h
 F:	include/linux/vmpressure.h
 F:	include/linux/vmstat.h
+F:	include/uapi/linux/process_vm_access.h
 F:	fs/proc/meminfo.c
 F:	kernel/fork.c
 F:	mm/Kconfig
diff --git a/include/uapi/linux/process_vm_access.h b/include/uapi/linux/process_vm_access.h
new file mode 100644
index 000000000000..2196c9c46351
--- /dev/null
+++ b/include/uapi/linux/process_vm_access.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_PROCESS_VM_ACCESS_H
+#define _UAPI_LINUX_PROCESS_VM_ACCESS_H
+
+/* Flags for process_vm_readv/process_vm_writev */
+#define PROCESS_VM_PIDFD	(1UL << 0)
+#define PROCESS_VM_NOWAIT	(1UL << 1)
+
+#endif /* _UAPI_LINUX_PROCESS_VM_ACCESS_H */
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 656d3e88755b..31004dd3c9e3 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -14,6 +14,9 @@
 #include
 #include
 #include
+#include
+
+#define PROCESS_VM_SUPPORTED_FLAGS (PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT)
 
 /**
  * process_vm_rw_pages - read/write pages from task specified
@@ -68,6 +71,7 @@ static int process_vm_rw_pages(struct page **pages,
  * @mm: mm for task
  * @task: task to read/write from
  * @vm_write: 0 means copy from, 1 means copy to
+ * @pvm_flags: PROCESS_VM_* flags
  * Returns 0 on success or on failure error code
  */
 static int process_vm_rw_single_vec(unsigned long addr,
@@ -76,7 +80,8 @@ static int process_vm_rw_single_vec(unsigned long addr,
 				    struct page **process_pages,
 				    struct mm_struct *mm,
 				    struct task_struct *task,
-				    int vm_write)
+				    int vm_write,
+				    unsigned long pvm_flags)
 {
 	unsigned long pa = addr & PAGE_MASK;
 	unsigned long start_offset = addr - pa;
@@ -91,6 +96,8 @@ static int process_vm_rw_single_vec(unsigned long addr,
 
 	if (vm_write)
 		flags |= FOLL_WRITE;
+	if (pvm_flags & PROCESS_VM_NOWAIT)
+		flags |= FOLL_NOWAIT;
 
 	while (!rc && nr_pages && iov_iter_count(iter)) {
 		int pinned_pages = min_t(unsigned long, nr_pages, PVM_MAX_USER_PAGES);
@@ -141,7 +148,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
  * @iter: where to copy to/from locally
  * @rvec: iovec array specifying where to copy to/from in the other process
  * @riovcnt: size of rvec array
- * @flags: currently unused
+ * @flags: process_vm_readv/writev flags
  * @vm_write: 0 if reading from other process, 1 if writing to other process
  *
  * Returns the number of bytes read/written or error code. May
@@ -163,6 +170,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
 	unsigned long nr_pages_iov;
 	ssize_t iov_len;
 	size_t total_len = iov_iter_count(iter);
+	unsigned int f_flags;
 
 	/*
 	 * Work out how many pages of struct pages we're going to need
@@ -194,10 +202,18 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
 	}
 
 	/* Get process information */
-	task = find_get_task_by_vpid(pid);
-	if (!task) {
-		rc = -ESRCH;
-		goto free_proc_pages;
+	if (flags & PROCESS_VM_PIDFD) {
+		task = pidfd_get_task(pid, &f_flags);
+		if (IS_ERR(task)) {
+			rc = PTR_ERR(task);
+			goto free_proc_pages;
+		}
+	} else {
+		task = find_get_task_by_vpid(pid);
+		if (!task) {
+			rc = -ESRCH;
+			goto free_proc_pages;
+		}
 	}
 
 	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
@@ -215,7 +231,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
 	for (i = 0; i < riovcnt && iov_iter_count(iter) && !rc; i++)
 		rc = process_vm_rw_single_vec(
 			(unsigned long)rvec[i].iov_base, rvec[i].iov_len,
-			iter, process_pages, mm, task, vm_write);
+			iter, process_pages, mm, task, vm_write, flags);
 
 	/* copied = space before - space after */
 	total_len -= iov_iter_count(iter);
@@ -244,7 +260,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
  * @liovcnt: size of lvec array
  * @rvec: iovec array specifying where to copy to/from in the other process
  * @riovcnt: size of rvec array
- * @flags: currently unused
+ * @flags: process_vm_readv/writev flags
  * @vm_write: 0 if reading from other process, 1 if writing to other process
  *
  * Returns the number of bytes read/written or error code. May
@@ -266,7 +282,7 @@ static ssize_t process_vm_rw(pid_t pid,
 	ssize_t rc;
 	int dir = vm_write ? ITER_SOURCE : ITER_DEST;
 
-	if (flags != 0)
+	if (flags & ~PROCESS_VM_SUPPORTED_FLAGS)
 		return -EINVAL;
 
 	/* Check iovecs */
-- 
2.45.0

From nobody Mon Jun 8 03:20:52 2026
From: Alban Crequy
To: Andrew Morton, David Hildenbrand, Christian Brauner
Cc: Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Alban Crequy, Peter Xu, Willy Tarreau,
 linux-kselftest@vger.kernel.org, shuah@kernel.org, Usama Arif,
 David Laight
Subject: [PATCH v5 2/2] selftests/mm: add tests for process_vm_readv flags
Date: Tue, 2 Jun 2026 12:09:17 +0200
Message-ID: <20260602100917.3641359-3-alban.crequy@gmail.com>
In-Reply-To: <20260602100917.3641359-1-alban.crequy@gmail.com>
References: <20260602100917.3641359-1-alban.crequy@gmail.com>

From: Alban Crequy

Add selftests for the PROCESS_VM_PIDFD and PROCESS_VM_NOWAIT flags
introduced in process_vm_readv/writev.

Tests cover:
- basic read with no flags
- invalid flags (EINVAL)
- invalid address (EFAULT)
- flag validation precedence over address validation
- invalid pidfd (EBADF)
- invalid pid (ESRCH)
- PROCESS_VM_PIDFD: read via pidfd
- PROCESS_VM_NOWAIT: read from resident memory
- PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT combined
- userfaultfd blocking read (no flags)
- PROCESS_VM_NOWAIT with userfaultfd (non-blocking, returns EFAULT)
- PROCESS_VM_NOWAIT partial read with single iovec across resident and
  non-resident pages (returns page_size bytes)
- PROCESS_VM_NOWAIT partial read with two iovecs, first resident,
  second non-resident (returns page_size bytes)

Tests gracefully SKIP on kernels without PROCESS_VM_PIDFD or
PROCESS_VM_NOWAIT support (EINVAL).

Signed-off-by: Alban Crequy
---
v5:
- Add process_vm_readv to .gitignore (Sashiko)
- Use ~0UL instead of 255 for invalid flags tests (Sashiko)
- Use volatile uint8_t instead of volatile uint64_t for page fault-in
  to avoid alignment issues on strict-alignment architectures (Sashiko)
- Use UFFDIO_UNREGISTER in uffd handler error path to safely unblock
  the main thread without double-close (Sashiko)

v4:
- Add selftests for NOWAIT partial reads across resident and
  non-resident pages (single iovec and two iovecs)
- SKIP tests gracefully on kernels without flag support (Sashiko)
- Verify content of partial reads

v3:
- Add selftest for invalid pidfd (David Hildenbrand)
- Add selftest for invalid pid
- SKIP on kernels without PROCESS_VM_PIDFD support
- Remove hardcoded __NR_pidfd_open fallback, use (Sashiko)
- SKIP pidfd tests on kernels without pidfd_open (ENOSYS) (Sashiko)
- SKIP userfaultfd tests when unprivileged userfaultfd is disabled
  (EPERM) (Sashiko)
- Fault in test_data before NOWAIT tests to ensure page is resident
  (Sashiko)
- Add ksft_process_vm_readv.sh wrapper and run_vmtests.sh entry

v2:
- New patch.
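
The SKIP-on-EINVAL behaviour listed above relies on the kernel rejecting
unknown flags before it looks at the buffers. The block below is a minimal
userspace sketch of that probe, not code from this series. It assumes the
flag values proposed in patch 1/2, which are not in released UAPI headers.
The names pvm_flag_support, pvm_classify_probe() and pvm_probe_flag() are
hypothetical, and the raw syscall() wrapper is used only to keep the sketch
independent of libc feature macros.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed values, copied from the UAPI header proposed in patch 1/2. */
#ifndef PROCESS_VM_PIDFD
#define PROCESS_VM_PIDFD	(1UL << 0)
#endif
#ifndef PROCESS_VM_NOWAIT
#define PROCESS_VM_NOWAIT	(1UL << 1)
#endif

/* Hypothetical result type for the probe. */
enum pvm_flag_support {
	PVM_FLAG_UNSUPPORTED,	/* kernel rejected the flag with EINVAL */
	PVM_FLAG_SUPPORTED,	/* kernel got past the flag check (EFAULT) */
	PVM_FLAG_UNKNOWN,	/* anything else, e.g. the syscall is filtered */
};

/* Map the (return value, errno) pair of a probe call to a verdict. */
static inline enum pvm_flag_support pvm_classify_probe(long ret, int err)
{
	if (ret == -1 && err == EINVAL)
		return PVM_FLAG_UNSUPPORTED;
	if (ret == -1 && err == EFAULT)
		return PVM_FLAG_SUPPORTED;
	return PVM_FLAG_UNKNOWN;
}

/*
 * Issue the probe from the commit message: NULL iovecs with a count of 1.
 * A kernel that knows the flag is expected to fail the iovec check with
 * EFAULT; a kernel that does not is expected to fail earlier with EINVAL.
 */
static inline enum pvm_flag_support pvm_probe_flag(unsigned long flag)
{
	long ret;

	/* flags == 0 is always accepted, so it would not probe anything. */
	assert(flag != 0);

	errno = 0;
	ret = syscall(SYS_process_vm_readv, (long)getpid(),
		      NULL, 1UL, NULL, 1UL, flag);
	return pvm_classify_probe(ret, errno);
}
```

A caller would typically run pvm_probe_flag(PROCESS_VM_NOWAIT) once at
startup and fall back to the blocking behaviour when the result is not
PVM_FLAG_SUPPORTED. Treating every other errno as "unknown" rather than
"unsupported" is a design choice of this sketch.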

 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   2 +
 .../selftests/mm/ksft_process_vm_readv.sh     |   4 +
 tools/testing/selftests/mm/process_vm_readv.c | 591 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   4 +
 5 files changed, 602 insertions(+)
 create mode 100755 tools/testing/selftests/mm/ksft_process_vm_readv.sh
 create mode 100644 tools/testing/selftests/mm/process_vm_readv.c

diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index b0c30c5ee9e3..78525dfa5a95 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -23,6 +23,7 @@ transhuge-stress
 pagemap_ioctl
 pfnmap
 process_madv
+process_vm_readv
 *.tmp*
 protection_keys
 protection_keys_32
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index cd24596cdd27..feb3a0b9a57e 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -106,6 +106,7 @@ TEST_GEN_FILES += guard-regions
 TEST_GEN_FILES += merge
 TEST_GEN_FILES += rmap
 TEST_GEN_FILES += folio_split_race_test
+TEST_GEN_FILES += process_vm_readv
 
 ifneq ($(ARCH),arm64)
 TEST_GEN_FILES += soft-dirty
@@ -167,6 +168,7 @@ TEST_PROGS += ksft_pfnmap.sh
 TEST_PROGS += ksft_pkey.sh
 TEST_PROGS += ksft_process_madv.sh
 TEST_PROGS += ksft_process_mrelease.sh
+TEST_PROGS += ksft_process_vm_readv.sh
 TEST_PROGS += ksft_rmap.sh
 TEST_PROGS += ksft_soft_dirty.sh
 TEST_PROGS += ksft_thp.sh
diff --git a/tools/testing/selftests/mm/ksft_process_vm_readv.sh b/tools/testing/selftests/mm/ksft_process_vm_readv.sh
new file mode 100755
index 000000000000..09d0fcc9a35d
--- /dev/null
+++ b/tools/testing/selftests/mm/ksft_process_vm_readv.sh
@@ -0,0 +1,4 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0
+
+./run_vmtests.sh -t process_vm_readv
diff --git a/tools/testing/selftests/mm/process_vm_readv.c b/tools/testing/selftests/mm/process_vm_readv.c
new file mode 100644
index 000000000000..34aa44c5eb67
--- /dev/null
+++ b/tools/testing/selftests/mm/process_vm_readv.c
@@ -0,0 +1,591 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "kselftest_harness.h"
+
+#ifndef PROCESS_VM_PIDFD
+#define PROCESS_VM_PIDFD	(1UL << 0)
+#endif
+
+#ifndef PROCESS_VM_NOWAIT
+#define PROCESS_VM_NOWAIT	(1UL << 1)
+#endif
+
+static int sys_pidfd_open(pid_t pid, unsigned int flags)
+{
+	return syscall(__NR_pidfd_open, pid, flags);
+}
+
+static const uint8_t test_data[] = { 0x01, 0x02, 0x03, 0x04,
+				     0x05, 0x06, 0x07, 0x08 };
+#define POISON_BYTE 0xCC
+
+/*
+ * Test: basic process_vm_readv with no flags
+ */
+TEST(read_basic)
+{
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+
+	memset(buf, POISON_BYTE, sizeof(buf));
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 0);
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+}
+
+/*
+ * Test: invalid flags should return EINVAL
+ */
+TEST(read_invalid_flags)
+{
+	uint8_t buf[8] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, ~0UL);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EINVAL, errno);
+}
+
+/*
+ * Test: invalid address should return EFAULT
+ */
+TEST(read_invalid_address)
+{
+	uint8_t buf[8] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = { .iov_base = NULL, .iov_len = 8 };
+	ssize_t n;
+
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 0);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EFAULT, errno);
+}
+
+/*
+ * Test: invalid address with invalid flags should return EINVAL
+ * (flag check happens before address validation)
+ */
+TEST(read_invalid_address_invalid_flags)
+{
+	uint8_t buf[8] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = { .iov_base = NULL, .iov_len = 8 };
+	ssize_t n;
+
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, ~0UL);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EINVAL, errno);
+}
+
+/*
+ * Test: invalid address with all valid flags should return EFAULT
+ * (flags are valid so we get past the flag check to the address check)
+ */
+TEST(read_invalid_address_all_valid_flags)
+{
+	int pidfd;
+	struct iovec local_iov = { .iov_base = NULL, .iov_len = 8 };
+	struct iovec remote_iov = { .iov_base = NULL, .iov_len = 8 };
+	ssize_t n;
+
+	pidfd = sys_pidfd_open(getpid(), 0);
+	if (pidfd < 0 && errno == ENOSYS)
+		SKIP(return, "pidfd_open not supported");
+	ASSERT_GE(pidfd, 0);
+
+	n = process_vm_readv(pidfd, &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT);
+	ASSERT_EQ(-1, n);
+	if (errno == EINVAL)
+		SKIP(return,
+		     "PROCESS_VM_PIDFD or PROCESS_VM_NOWAIT not supported");
+	ASSERT_EQ(EFAULT, errno);
+
+	close(pidfd);
+}
+
+/*
+ * Test: read with an invalid pidfd should return an error, not crash
+ */
+TEST(read_invalid_pidfd)
+{
+	uint8_t buf[sizeof(test_data)] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+
+	/* fd 9999 is almost certainly not a valid pidfd */
+	n = process_vm_readv(9999, &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_PIDFD);
+	ASSERT_EQ(-1, n);
+	if (errno == EINVAL)
+		SKIP(return, "PROCESS_VM_PIDFD not supported");
+	ASSERT_EQ(EBADF, errno);
+}
+
+/*
+ * Test: read with an invalid pid should return ESRCH
+ */
+TEST(read_invalid_pid)
+{
+	uint8_t buf[sizeof(test_data)] = { 0 };
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+
+	/* pid 999999 is almost certainly not a valid process */
+	n = process_vm_readv(999999, &local_iov, 1, &remote_iov, 1, 0);
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(ESRCH, errno);
+}
+
+/*
+ * Test: read with PIDFD flag
+ */
+TEST(read_pidfd)
+{
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+	int pidfd;
+
+	memset(buf, POISON_BYTE, sizeof(buf));
+	pidfd = sys_pidfd_open(getpid(), 0);
+	if (pidfd < 0 && errno == ENOSYS)
+		SKIP(return, "pidfd_open not supported");
+	ASSERT_GE(pidfd, 0);
+
+	n = process_vm_readv(pidfd, &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_PIDFD);
+	if (n == -1 && errno == EINVAL)
+		SKIP(return, "PROCESS_VM_PIDFD not supported");
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+
+	close(pidfd);
+}
+
+/*
+ * Test: read with NOWAIT from resident memory (should succeed)
+ */
+TEST(read_nowait_resident)
+{
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+
+	*(volatile uint8_t *)test_data; /* fault in page for NOWAIT */
+	memset(buf, POISON_BYTE, sizeof(buf));
+	n = process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_NOWAIT);
+	if (n == -1 && errno == EINVAL)
+		SKIP(return, "PROCESS_VM_NOWAIT not supported");
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+}
+
+/*
+ * Test: read with PIDFD + NOWAIT from resident memory
+ */
+TEST(read_pidfd_nowait_resident)
+{
+	uint8_t buf[sizeof(test_data)];
+	struct iovec local_iov = { .iov_base = buf, .iov_len = sizeof(buf) };
+	struct iovec remote_iov = {
+		.iov_base = (void *)test_data,
+		.iov_len = sizeof(test_data)
+	};
+	ssize_t n;
+	int pidfd;
+
+	*(volatile uint8_t *)test_data; /* fault in page for NOWAIT */
+	memset(buf, POISON_BYTE, sizeof(buf));
+	pidfd = sys_pidfd_open(getpid(), 0);
+	if (pidfd < 0 && errno == ENOSYS)
+		SKIP(return, "pidfd_open not supported");
+	ASSERT_GE(pidfd, 0);
+
+	n = process_vm_readv(pidfd, &local_iov, 1, &remote_iov, 1,
+			     PROCESS_VM_PIDFD | PROCESS_VM_NOWAIT);
+	if (n == -1 && errno == EINVAL)
+		SKIP(return, "PROCESS_VM_PIDFD or PROCESS_VM_NOWAIT not supported");
+	ASSERT_EQ(sizeof(test_data), n);
+	ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data)));
+
+	close(pidfd);
+}
+
+/*
+ * Userfaultfd helpers for NOWAIT tests
+ */
+static int setup_userfaultfd(void)
+{
+	struct uffdio_api api = { .api = UFFD_API };
+	int uffd;
+
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+	if (uffd < 0)
+		return -errno;
+
+	if (ioctl(uffd, UFFDIO_API, &api)) {
+		close(uffd);
+		return -errno;
+	}
+
+	return uffd;
+}
+
+static void *register_uffd_region(int uffd, size_t size)
+{
+	struct uffdio_register reg;
+	void *mem;
+
+	mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
+		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (mem == MAP_FAILED)
+		return NULL;
+
+	reg.range.start = (unsigned long)mem;
+	reg.range.len = size;
+	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
+	if (ioctl(uffd, UFFDIO_REGISTER, &reg)) {
+		munmap(mem, size);
+		return NULL;
+	}
+
+	return mem;
+}
+
+struct uffd_handler_args {
+	int uffd;
+	const void *content;
+	size_t content_len;
+	void *mem;
+	size_t mem_len;
+};
+
+static void *uffd_handler_thread(void *arg)
+{
+	struct uffd_handler_args *ha = arg;
+	struct uffd_msg msg;
+	struct uffdio_copy uffd_copy;
+	struct pollfd pfd = {
+		.fd = ha->uffd,
+		.events = POLLIN
+	};
+	void *page;
+	long page_size = sysconf(_SC_PAGESIZE);
+	int ret;
+
+	page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (page == MAP_FAILED)
+		return (void *)(long)-ENOMEM;
+
+	memcpy(page, ha->content, ha->content_len);
+
+	ret = poll(&pfd, 1, 5000);
+	if (ret <= 0)
+		goto err;
+
+	if (read(ha->uffd, &msg, sizeof(msg)) != sizeof(msg))
+		goto err;
+
+	if (msg.event != UFFD_EVENT_PAGEFAULT)
+		goto err;
+
+	uffd_copy.dst = msg.arg.pagefault.address & ~(page_size - 1);
+	uffd_copy.src = (unsigned long)page;
+	uffd_copy.len = page_size;
+	uffd_copy.mode = 0;
+	ioctl(ha->uffd, UFFDIO_COPY, &uffd_copy);
+
+	munmap(page, page_size);
+	return NULL;
+
+err:
+	/*
+	 * Unregister the uffd region to unblock the main thread if it
+	 * is waiting for page fault resolution.
+ */ + { + struct uffdio_range range =3D { + .start =3D (unsigned long)ha->mem, + .len =3D ha->mem_len, + }; + ioctl(ha->uffd, UFFDIO_UNREGISTER, &range); + } + munmap(page, page_size); + return (void *)(long)-EIO; +} + +/* + * Test: read from userfaultfd-registered memory (no flags, should block + * until page fault is resolved by handler thread) + */ +TEST(read_userfaultfd_blocking) +{ + int uffd; + void *mem; + long page_size =3D sysconf(_SC_PAGESIZE); + uint8_t buf[sizeof(test_data)]; + struct iovec local_iov =3D { .iov_base =3D buf, .iov_len =3D sizeof(buf) = }; + struct iovec remote_iov; + struct uffd_handler_args ha; + pthread_t handler; + ssize_t n; + + memset(buf, POISON_BYTE, sizeof(buf)); + + uffd =3D setup_userfaultfd(); + if (uffd =3D=3D -EPERM) + SKIP(return, "userfaultfd requires privileges (vm.unprivileged_userfault= fd=3D0)"); + if (uffd =3D=3D -ENOSYS) + SKIP(return, "userfaultfd not supported"); + ASSERT_GE(uffd, 0); + + mem =3D register_uffd_region(uffd, page_size); + ASSERT_NE(NULL, mem); + + ha.uffd =3D uffd; + ha.content =3D test_data; + ha.content_len =3D sizeof(test_data); + ha.mem =3D mem; + ha.mem_len =3D page_size; + ASSERT_EQ(0, pthread_create(&handler, NULL, uffd_handler_thread, &ha)); + + remote_iov.iov_base =3D mem; + remote_iov.iov_len =3D sizeof(test_data); + n =3D process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 0); + ASSERT_EQ(sizeof(test_data), n); + ASSERT_EQ(0, memcmp(buf, test_data, sizeof(test_data))); + + pthread_join(handler, NULL); + munmap(mem, page_size); + close(uffd); +} + +/* + * Test: read with NOWAIT from userfaultfd-registered memory that has + * not been faulted in yet. Should return EFAULT (not block). 
+ */ +TEST(read_nowait_userfaultfd) +{ + int uffd; + void *mem; + long page_size =3D sysconf(_SC_PAGESIZE); + uint8_t buf[sizeof(test_data)] =3D { 0 }; + struct iovec local_iov =3D { .iov_base =3D buf, .iov_len =3D sizeof(buf) = }; + struct iovec remote_iov; + ssize_t n; + + uffd =3D setup_userfaultfd(); + if (uffd =3D=3D -EPERM) + SKIP(return, "userfaultfd requires privileges (vm.unprivileged_userfault= fd=3D0)"); + if (uffd =3D=3D -ENOSYS) + SKIP(return, "userfaultfd not supported"); + ASSERT_GE(uffd, 0); + + mem =3D register_uffd_region(uffd, page_size); + ASSERT_NE(NULL, mem); + + /* Ensure the page is not present */ + madvise(mem, page_size, MADV_DONTNEED); + + remote_iov.iov_base =3D mem; + remote_iov.iov_len =3D sizeof(test_data); + n =3D process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, + PROCESS_VM_NOWAIT); + if (n =3D=3D -1 && errno =3D=3D EINVAL) + SKIP(return, "PROCESS_VM_NOWAIT not supported"); + ASSERT_EQ(-1, n); + ASSERT_EQ(EFAULT, errno); + + munmap(mem, page_size); + close(uffd); +} + +/* + * Test: NOWAIT read across two pages with a single iovec, where the + * first page is resident and the second is not. Tests whether a + * partial read within a single iovec element is possible. 
+ */ +TEST(read_nowait_partial_single_iovec) +{ + int uffd; + void *mem; + long page_size =3D sysconf(_SC_PAGESIZE); + uint8_t *buf; + struct iovec local_iov; + struct iovec remote_iov; + struct uffdio_copy uffd_copy; + ssize_t n; + void *src_page; + + uffd =3D setup_userfaultfd(); + if (uffd =3D=3D -EPERM) + SKIP(return, "userfaultfd requires privileges (vm.unprivileged_userfault= fd=3D0)"); + if (uffd =3D=3D -ENOSYS) + SKIP(return, "userfaultfd not supported"); + ASSERT_GE(uffd, 0); + + /* Allocate 2 pages and register with userfaultfd */ + mem =3D register_uffd_region(uffd, 2 * page_size); + ASSERT_NE(NULL, mem); + + /* Resolve page 1 via UFFDIO_COPY, leave page 2 missing */ + src_page =3D mmap(NULL, page_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + ASSERT_NE(MAP_FAILED, src_page); + memset(src_page, 0xAA, page_size); + + uffd_copy.dst =3D (unsigned long)mem; + uffd_copy.src =3D (unsigned long)src_page; + uffd_copy.len =3D page_size; + uffd_copy.mode =3D 0; + ASSERT_EQ(0, ioctl(uffd, UFFDIO_COPY, &uffd_copy)); + munmap(src_page, page_size); + + /* Read across both pages with a single iovec */ + buf =3D malloc(2 * page_size); + ASSERT_NE(NULL, buf); + memset(buf, POISON_BYTE, 2 * page_size); + + local_iov.iov_base =3D buf; + local_iov.iov_len =3D 2 * page_size; + remote_iov.iov_base =3D mem; + remote_iov.iov_len =3D 2 * page_size; + + n =3D process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, + PROCESS_VM_NOWAIT); + if (n =3D=3D -1 && errno =3D=3D EINVAL) + SKIP(return, "PROCESS_VM_NOWAIT not supported"); + ASSERT_EQ(page_size, n); + + /* Verify the first page was read correctly */ + for (int i =3D 0; i < page_size; i++) + ASSERT_EQ(0xAA, buf[i]); + + free(buf); + munmap(mem, 2 * page_size); + close(uffd); +} + +/* + * Test: NOWAIT read across two pages with two iovecs (one per page), + * where the first page is resident and the second is not. Tests + * whether the syscall returns bytes from the first iovec. 
+ */ +TEST(read_nowait_partial_two_iovecs) +{ + int uffd; + void *mem; + long page_size =3D sysconf(_SC_PAGESIZE); + uint8_t *buf1, *buf2; + struct iovec local_iov[2]; + struct iovec remote_iov[2]; + struct uffdio_copy uffd_copy; + ssize_t n; + void *src_page; + + uffd =3D setup_userfaultfd(); + if (uffd =3D=3D -EPERM) + SKIP(return, "userfaultfd requires privileges (vm.unprivileged_userfault= fd=3D0)"); + if (uffd =3D=3D -ENOSYS) + SKIP(return, "userfaultfd not supported"); + ASSERT_GE(uffd, 0); + + /* Allocate 2 pages and register with userfaultfd */ + mem =3D register_uffd_region(uffd, 2 * page_size); + ASSERT_NE(NULL, mem); + + /* Resolve page 1 via UFFDIO_COPY, leave page 2 missing */ + src_page =3D mmap(NULL, page_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + ASSERT_NE(MAP_FAILED, src_page); + memset(src_page, 0xBB, page_size); + + uffd_copy.dst =3D (unsigned long)mem; + uffd_copy.src =3D (unsigned long)src_page; + uffd_copy.len =3D page_size; + uffd_copy.mode =3D 0; + ASSERT_EQ(0, ioctl(uffd, UFFDIO_COPY, &uffd_copy)); + munmap(src_page, page_size); + + /* Two iovecs: one for each page */ + buf1 =3D malloc(page_size); + buf2 =3D malloc(page_size); + ASSERT_NE(NULL, buf1); + ASSERT_NE(NULL, buf2); + memset(buf1, POISON_BYTE, page_size); + memset(buf2, POISON_BYTE, page_size); + + local_iov[0].iov_base =3D buf1; + local_iov[0].iov_len =3D page_size; + local_iov[1].iov_base =3D buf2; + local_iov[1].iov_len =3D page_size; + + remote_iov[0].iov_base =3D mem; + remote_iov[0].iov_len =3D page_size; + remote_iov[1].iov_base =3D (char *)mem + page_size; + remote_iov[1].iov_len =3D page_size; + + n =3D process_vm_readv(getpid(), local_iov, 2, remote_iov, 2, + PROCESS_VM_NOWAIT); + if (n =3D=3D -1 && errno =3D=3D EINVAL) + SKIP(return, "PROCESS_VM_NOWAIT not supported"); + ASSERT_EQ(page_size, n); + + /* Verify the first page was read correctly */ + for (int i =3D 0; i < page_size; i++) + ASSERT_EQ(0xBB, buf1[i]); + + free(buf1); + free(buf2); 
+ munmap(mem, 2 * page_size); + close(uffd); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/self= tests/mm/run_vmtests.sh index c17b133a81d2..f7b55dea8d68 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -91,6 +91,8 @@ separated by spaces: test VMA merge cases behave as expected - rmap test rmap behaves as expected +- process_vm_readv + test process_vm_readv flags (pidfd, nowait) - memory-failure test memory-failure behaves as expected =20 @@ -531,6 +533,8 @@ CATEGORY=3D"page_frag" run_test ./test_page_frag.sh non= aligned =20 CATEGORY=3D"rmap" run_test ./rmap =20 +CATEGORY=3D"process_vm_readv" run_test ./process_vm_readv + # Try to load hwpoison_inject if not present. HWPOISON_DIR=3D/sys/kernel/debug/hwpoison/ if [ ! -d "$HWPOISON_DIR" ]; then --=20 2.45.0