Modernize the NFS Direct I/O path as a preparatory step to enable PCI
Peer-to-Peer DMA (P2PDMA) support. Following feedback on the initial
RFC [1], the modernization and architectural changes are split into
this standalone series.
Currently, NFS O_DIRECT relies on the legacy iov_iter_get_pages_alloc2()
API, which does not support the pinning requirements for P2P memory.
This series moves NFS to the modern iov_iter_extract_pages() API and
migrates NFS direct I/O from pages to folios.
Design
======
1. Pin-Awareness
Standard NFS requests use get_page() and put_page() for memory
management. However, memory extracted via iov_iter_extract_pages()
requires explicit pinning.
Add a PG_PINNED flag and a wb_nr_pinned count to struct nfs_page.
This lets the request lifecycle track ownership of physical pins and
ensures that unpinning is performed only when the I/O is complete.
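The pin-aware release rule can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the code in
this series: struct model_page, struct model_req and the model_*
functions are hypothetical, and the integer counters merely stand in
for page references and pins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the layout and names are assumptions. */
struct model_page {
	int refcount;	/* stands in for get_page()/put_page() references */
	int pincount;	/* stands in for pins taken during extraction */
};

struct model_req {
	struct model_page *page;	/* first page covered by the request */
	bool pinned;			/* models the PG_PINNED flag */
	unsigned int nr_pinned;		/* models wb_nr_pinned */
};

/* Build a request over nr pages that the caller has already pinned. */
static void model_req_init_pinned(struct model_req *req,
				  struct model_page *pages, unsigned int nr)
{
	req->page = pages;
	req->pinned = true;
	req->nr_pinned = nr;
}

/* Build a request that takes an ordinary reference on a single page. */
static void model_req_init_ref(struct model_req *req, struct model_page *page)
{
	page->refcount++;
	req->page = page;
	req->pinned = false;
	req->nr_pinned = 0;
}

/* Release at I/O completion: unpin if pinned, else drop the reference. */
static void model_req_release(struct model_req *req)
{
	if (req->pinned) {
		for (unsigned int i = 0; i < req->nr_pinned; i++) {
			assert(req->page[i].pincount > 0);
			req->page[i].pincount--;
		}
		req->pinned = false;
		req->nr_pinned = 0;
	} else {
		assert(req->page->refcount > 0);
		req->page->refcount--;
	}
	req->page = NULL;
}
```

In this model a pinned request never touches refcount and an
unpinned request never touches pincount, which is the ownership
split the flag and the counter are meant to record.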
2. API Migration
Migrate the Direct I/O path to iov_iter_extract_pages(). This aligns
NFS with the current extraction model and serves as the foundation
for passing ITER_ALLOW_P2PDMA in a follow-up series.
3. Extraction Helper and Folio Support
Introduce a new extraction helper in direct.c to group contiguous
pages from the same folio into a single struct nfs_page. This
effectively migrates the Direct I/O path from being page-based to being
folio-based.
Note: zone_device_pages_have_same_pgmap() checks are intentionally
omitted in the extraction helper since P2PDMA enablement will be
introduced in a follow-up series.
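The grouping idea can be sketched as a userspace model. The types and
the function name below are hypothetical and are not the helper added
to direct.c; a page is reduced to an assumed (pfn, folio_id) pair and
a "segment" stands in for what would become one struct nfs_page.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; names and fields are assumptions. */
struct model_page_ref {
	unsigned long pfn;	/* page frame number */
	unsigned long folio_id;	/* identifies the folio the page belongs to */
};

struct model_segment {
	size_t first;		/* index of the first page in the run */
	size_t nr_pages;	/* number of pages in the run */
};

/*
 * Split pages[0..nr) into runs of physically contiguous pages that
 * share a folio. segs must have room for nr entries (worst case: one
 * segment per page). Returns the number of segments written.
 */
static size_t model_group_by_folio(const struct model_page_ref *pages,
				   size_t nr, struct model_segment *segs)
{
	size_t nsegs = 0;

	assert(nr == 0 || (pages && segs));

	for (size_t i = 0; i < nr; i++) {
		if (nsegs > 0) {
			const struct model_page_ref *prev = &pages[i - 1];

			/* Extend the current run if same folio and adjacent. */
			if (pages[i].folio_id == prev->folio_id &&
			    pages[i].pfn == prev->pfn + 1) {
				segs[nsegs - 1].nr_pages++;
				continue;
			}
		}
		/* Otherwise start a new run at this page. */
		segs[nsegs].first = i;
		segs[nsegs].nr_pages = 1;
		nsegs++;
	}
	return nsegs;
}
```

A run is broken both when the folio changes and when two pages of
the same folio are not adjacent, so each segment always describes a
single contiguous range. The model performs no pgmap comparison,
mirroring the omission noted above.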
Bisectability
=============
The series attempts to remain bisectable.
[Patches 1-2] Introduce pin-aware infrastructure and accounting.
[Patch 3] Add a centralized request release helper.
[Patch 4] Migrate the Direct I/O path to iov_iter_extract_pages().
[Patches 5-6] Implement the extraction helper and folio-based grouping.
[Patch 7] Remove orphaned page-based helpers.
Testing
=======
The series was lightly tested using fio (bs=1M, size=1G) on a small
(non-server) machine running Linux 7.1-rc6. Some test logs from a run:
nfs-test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
fio-3.42-37-g5b47
Starting 1 process
nfs-test: (groupid=0, jobs=1): err= 0: pid=33264: Tue Jun 2 23:50:15 2026
read: IOPS=5145, BW=5146MiB/s (5396MB/s)(1024MiB/199msec)
slat (usec): min=8, max=168, avg=11.12, stdev= 5.16
clat (usec): min=153, max=628, avg=182.20, stdev=24.15
lat (usec): min=165, max=796, avg=193.33, stdev=27.64
clat percentiles (usec):
| 1.00th=[ 159], 5.00th=[ 163], 10.00th=[ 165], 20.00th=[ 169],
| 30.00th=[ 172], 40.00th=[ 176], 50.00th=[ 178], 60.00th=[ 182],
| 70.00th=[ 186], 80.00th=[ 194], 90.00th=[ 202], 95.00th=[ 215],
| 99.00th=[ 229], 99.50th=[ 334], 99.90th=[ 408], 99.95th=[ 627],
| 99.99th=[ 627]
lat (usec) : 250=99.32%, 500=0.59%, 750=0.10%
cpu : usr=1.01%, sys=5.56%, ctx=1025, majf=0, minf=265
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=1024,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0.00ns, window=0.00ns, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=5146MiB/s (5396MB/s), 5146MiB/s-5146MiB/s (5396MB/s-5396MB/s), io=1024MiB (1074MB), run=199-199msec
Pranjal Shrivastava (7):
nfs: make nfs_page pin-aware
nfs: Track number of pinned pages in nfs_page
nfs: Introduce nfs_release_request_list helper
nfs: migrate direct I/O to iov_iter_extract_pages
nfs: introduce nfs_direct_extract_pages helper
nfs: Optimize direct I/O to use folios for requests
nfs: Cleanup the nfs_page_create_from_page helper
fs/nfs/direct.c | 160 ++++++++++++++++++++++-----------------
fs/nfs/pagelist.c | 86 +++++++++++----------
fs/nfs/read.c | 2 +-
fs/nfs/write.c | 2 +-
include/linux/nfs_page.h | 12 ++-
5 files changed, 144 insertions(+), 118 deletions(-)
base-commit: 2c9eb6f2c18bff4cf3ddeab96db5137cc2b2572b
--
2.54.0.1013.g208068f2d8-goog