[PATCH 0/7] direct-io: even more flexible io vectors
Posted by Keith Busch 2 months ago
From: Keith Busch <kbusch@kernel.org>

To further enable direct IO from user-space buffers without bouncing to
satisfy unnecessary kernel software constraints, this series removes
the requirement that io vector lengths align to the logical block size.
The downside (if you want to call it that) is that misaligned io
vectors are caught further down the block stack rather than closer to
the syscall.

This change also removes one walking of the io vector, so that's nice
too.

Keith Busch (7):
  block: check for valid bio while splitting
  block: align the bio after building it
  block: simplify direct io validity check
  iomap: simplify direct io validity check
  block: remove bdev_iter_is_aligned
  blk-integrity: use simpler alignment check
  iov_iter: remove iov_iter_is_aligned

 block/bio-integrity.c  |  4 +-
 block/bio.c            | 58 +++++++++++++++++---------
 block/blk-merge.c      |  5 +++
 block/fops.c           |  4 +-
 fs/iomap/direct-io.c   |  3 +-
 include/linux/blkdev.h |  7 ----
 include/linux/uio.h    |  2 -
 lib/iov_iter.c         | 95 ------------------------------------------
 8 files changed, 49 insertions(+), 129 deletions(-)

-- 
2.47.3
Re: [PATCH 0/7] direct-io: even more flexible io vectors
Posted by Jens Axboe 2 months ago
On 8/1/25 5:47 PM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> To further enable direct IO from user-space buffers without bouncing to
> satisfy unnecessary kernel software constraints, this series removes
> the requirement that io vector lengths align to the logical block size.
> The downside (if you want to call it that) is that misaligned io
> vectors are caught further down the block stack rather than closer to
> the syscall.

That's not a downside imho, it's much nicer to have the correct/expected
case be fast, and catch the unexpected error case down the line when we
have to iterate the vecs anyway.

IOW, I love this patchset. I'll spend some time going over the details.
Did you write some test cases for this?

> This change also removes one walking of the io vector, so that's nice
> too.
> 
> Keith Busch (7):
>   block: check for valid bio while splitting
>   block: align the bio after building it
>   block: simplify direct io validity check
>   iomap: simplify direct io validity check
>   block: remove bdev_iter_is_aligned
>   blk-integrity: use simpler alignment check
>   iov_iter: remove iov_iter_is_aligned
> 
>  block/bio-integrity.c  |  4 +-
>  block/bio.c            | 58 +++++++++++++++++---------
>  block/blk-merge.c      |  5 +++
>  block/fops.c           |  4 +-
>  fs/iomap/direct-io.c   |  3 +-
>  include/linux/blkdev.h |  7 ----
>  include/linux/uio.h    |  2 -
>  lib/iov_iter.c         | 95 ------------------------------------------
>  8 files changed, 49 insertions(+), 129 deletions(-)

Now that's a beautiful diffstat.

-- 
Jens Axboe
Re: [PATCH 0/7] direct-io: even more flexible io vectors
Posted by Keith Busch 2 months ago
On Sat, Aug 02, 2025 at 09:37:32AM -0600, Jens Axboe wrote:
> Did you write some test cases for this?

I have some crude unit tests to hit specific conditions that might
happen with nvme.

Note, the "second" test here will fail with the wrong result with this
version of the patchset due to the issue I mentioned on patch 2, but
I've a fix for it ready for the next version.

---
/*
 * This test is aligned to NVMe's PRP virtual boundary. It is intended to
 * execute on such a device with 4k formatted logical block size.
 *
 * The first test will submit a vectored read with a total size aligned to a 4k
 * block, but individual vectors may not be. This should be successful.
 *
 * The second test will submit a vectored read with a total size aligned to a
 * 4k block, but the first vector contains an invalid address. This should get
 * EFAULT.
 *
 * The third one will submit an IO with a total size aligned to a 4k block,
 * but it will fail the virtual boundary condition, which should result in a
 * split to a 0 length bio. This should get an EINVAL.
 *
 * The fourth test will submit IO with a total size aligned to a 4k block, but
 * with invalid DMA offsets. This should get an EINVAL.
 *
 * The last test will submit a large IO with a page offset that should exceed
 * the bio max vectors limit, resulting in reverting part of a bio iteration.
 * This should be successful.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sys/uio.h>
#include <string.h>

#define BSIZE (8 * 1024 * 1024)
#define VECS 4

int main(int argc, char **argv)
{
        int fd, ret, i, j;
        struct iovec iov[VECS];
        char *buf;

        if (argc < 2)
                return -1;

        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0)
                return fd;

        ret = posix_memalign((void **)&buf, 4096, BSIZE);
        if (ret)
                return ret;

        memset(buf, 0, BSIZE);

        iov[0].iov_base = buf + 3072;
        iov[0].iov_len = 1024;

        iov[1].iov_base = buf + (2 * 4096);
        iov[1].iov_len = 4096;

        iov[2].iov_base = buf + (8 * 4096);
        iov[2].iov_len = 4096;

        iov[3].iov_base = buf + (16 * 4096);
        iov[3].iov_len = 3072;

        ret = preadv(fd, iov, VECS, 0);
        if (ret < 0)
                perror("unexpected read failure");

        iov[0].iov_base = 0;
        ret = preadv(fd, iov, VECS, 0);
        if (ret < 0)
                perror("expected read failure for invalid address");

        iov[0].iov_base = buf;
        iov[0].iov_len = 1024;

        iov[1].iov_base = buf + (2 * 4096);
        iov[1].iov_len = 1024;

        iov[2].iov_base = buf + (8 * 4096);
        iov[2].iov_len = 1024;

        iov[3].iov_base = buf + (16 * 4096);
        iov[3].iov_len = 1024;

        ret = preadv(fd, iov, VECS, 0);
        if (ret < 0)
                perror("expected read failure for invalid virtual boundary");

        iov[0].iov_base = buf + 3072;
        iov[0].iov_len = 1025;

        iov[1].iov_base = buf + (2 * 4096);
        iov[1].iov_len = 4096;

        iov[2].iov_base = buf + (8 * 4096);
        iov[2].iov_len = 4096;

        iov[3].iov_base = buf + (16 * 4096);
        iov[3].iov_len = 3073;

        ret = preadv(fd, iov, VECS, 0);
        if (ret < 0)
                perror("expected read failure for invalid dma boundary");

        ret = pread(fd, buf + 2048, BSIZE - 8192, 0);
        if (ret < 0)
                perror("unexpected large read failure");

        free(buf);
        return errno;
}
--
Re: [PATCH 0/7] direct-io: even more flexible io vectors
Posted by Keith Busch 2 months ago
On Mon, Aug 04, 2025 at 11:06:12AM -0600, Keith Busch wrote:
> On Sat, Aug 02, 2025 at 09:37:32AM -0600, Jens Axboe wrote:
> > Did you write some test cases for this?
> 
> I have some crude unit tests to hit specific conditions that might
> happen with nvme.

I've made improvements today that make these targeted tests fit into
the blktests framework.

Just fyi, I took a look at what 'fio' needs in order to exercise these
new use cases. This patchset requires multiple io-vectors for anything
interesting to happen, which 'fio' currently doesn't do. I'm not even
sure what new command line parameters could best convey how you want to
construct iovecs! Maybe just make it random within some alignment
constraints?