[PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages

Pantelis Antoniou posted 1 patch 9 months ago
[PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages
Posted by Pantelis Antoniou 9 months ago
Updates to network filesystems enabled zero copy I/O by using the
netfslib common accessors.

One example of that is the 9p filesystem which is commonly used in qemu
based setups for sharing files with the host.

In our emulation environment we have noticed failing writes when performing
I/O from a userspace mapped DRM GEM buffer object.
The platform does not use VRAM; all graphics memory is regular DRAM,
allocated via __get_free_pages().

The same write was successful from a heap allocated bounce buffer.

The sequence of events is as follows.

1. A BO (Buffer Object) is created, and its backing memory is allocated via
   __get_free_pages()

2. Userspace mmaps the BO via an mmap() call on the open file handle of
   the DRM driver. The mapping is done via the drm_gem_mmap_obj() call.

3. Userspace issues a write to a file copying the contents of the BO.

3a. If the file is located on a regular filesystem (like ext4), the write
    completes successfully.

3b. If the file is located on a network filesystem, like 9p, the write fails.
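
The userspace side of steps 2 and 3 can be sketched roughly as below. This is
only an illustration, not the reproducer mentioned in this thread:
write_from_mapping() and its parameters are hypothetical names. For a DRM BO
the mmap offset would come from a driver-specific ioctl, which is not shown,
and whether the write reaches the unbuffered netfs path depends on how the
target file is opened and mounted.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of `map_fd` at `map_off` and write
 * them to `out_path` with a single write(2).
 * Returns the number of bytes written, or -1 on failure.
 */
static ssize_t write_from_mapping(int map_fd, off_t map_off, size_t len,
                                  const char *out_path)
{
    void *buf = mmap(NULL, len, PROT_READ, MAP_SHARED, map_fd, map_off);
    if (buf == MAP_FAILED)
        return -1;

    int out_fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        munmap(buf, len);
        return -1;
    }

    /* The source buffer is the mapping itself, not a heap bounce buffer. */
    ssize_t n = write(out_fd, buf, len);

    close(out_fd);
    munmap(buf, len);
    return n;
}
```

With map_fd referring to a regular file this behaves like case 3a. The
failure described in 3b needs a mapping created by drm_gem_mmap_obj() as the
source and a file on 9p as the destination.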

The write fails because v9fs_file_write_iter() calls
netfs_unbuffered_write_iter(), which calls
netfs_unbuffered_write_iter_locked(), which in turn calls
netfs_extract_user_iter().

netfs_extract_user_iter() then calls iov_iter_extract_pages(), which for a
user-backed iterator calls iov_iter_extract_user_pages(), which calls
pin_user_pages_fast(), which finally calls __gup_longterm_locked().

__gup_longterm_locked() will call __get_user_pages_locked() which will fail
because the VMA is marked with the VM_IO and VM_PFNMAP flags.
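
For illustration, the rejection can be modelled in userspace roughly as
follows. This is a simplified sketch, not the code in mm/gup.c:
model_check_vma_flags() is a hypothetical name, the MODEL_* constants are
assumed to mirror the VM_PFNMAP and VM_IO values in include/linux/mm.h, and
the real check looks at more than these two flags.

```c
#include <errno.h>

/* Assumed to mirror VM_PFNMAP and VM_IO from include/linux/mm.h. */
#define MODEL_VM_PFNMAP 0x00000400UL
#define MODEL_VM_IO     0x00004000UL

/*
 * Simplified model of the VMA flag test on the GUP path: a VMA that
 * carries VM_IO or VM_PFNMAP is refused with -EFAULT.
 */
static int model_check_vma_flags(unsigned long vm_flags)
{
    if (vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP))
        return -EFAULT;
    return 0;
}
```

A VMA carrying either flag is refused regardless of what kind of memory
actually backs it, which is why the BO mapping is rejected even though its
pages are ordinary DRAM.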

Pantelis Antoniou (1):
  Fix zero copy I/O on __get_user_pages allocated pages

 mm/gup.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

-- 
2.25.1
Re: [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages
Posted by Andrew Morton 9 months ago
On Wed, 7 May 2025 10:41:04 -0500 Pantelis Antoniou <p.antoniou@partner.samsung.com> wrote:

> One example of that is the 9p filesystem which is commonly used in qemu
> based setups for sharing files with the host.
> 
> In our emulation environment we have noticed failing writes when performing
> I/O from a userspace mapped DRM GEM buffer object.
> The platform does not use VRAM, all graphics memory is regular DRAM memory,
> allocated via __get_free_pages
> 
> The same write was successful from a heap allocated bounce buffer.
> 
> The sequence of events is as follows.
>
> ..
>

There's a lot of good stuff in this [0/N], but a single patch "series"
is cumbersome.  Can you please redo this as a standalone patch in which
the changelog is the union of [0/1] and [1/1]?
Re: [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages
Posted by Andrew Morton 9 months ago
On Wed, 7 May 2025 10:41:04 -0500 Pantelis Antoniou <p.antoniou@partner.samsung.com> wrote:

> Updates to network filesystems enabled zero copy I/O by using the
> netfslib common accessors.

Updates by whom?  Are all the people who need to know about this being
cc'ed here?

> One example of that is the 9p filesystem which is commonly used in qemu
> based setups for sharing files with the host.
> 
> In our emulation environment we have noticed failing writes when performing
> I/O from a userspace mapped DRM GEM buffer object.
> The platform does not use VRAM, all graphics memory is regular DRAM memory,
> allocated via __get_free_pages

We should identify which kernel version(s) should be patched, please. 
6.16-rc1?  6.15?  -stable?

I often make these decisions but in this case I have far too little
information to be able to do that.

Thanks.
Re: [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages
Posted by Pantelis Antoniou 9 months ago
On Wed, 7 May 2025 14:50:18 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 7 May 2025 10:41:04 -0500 Pantelis Antoniou
> <p.antoniou@partner.samsung.com> wrote:
> 
> > Updates to network filesystems enabled zero copy I/O by using the
> > netfslib common accessors.
> 
> Updates by whom?  Are all the people who need to know about this being
> cc'ed here?
> 

I think the first cover letter contains that information.

> > One example of that is the 9p filesystem which is commonly used in
> > qemu based setups for sharing files with the host.
> > 
> > In our emulation environment we have noticed failing writes when
> > performing I/O from a userspace mapped DRM GEM buffer object.
> > The platform does not use VRAM, all graphics memory is regular DRAM
> > memory, allocated via __get_free_pages
> 
> We should identify which kernel version(s) should be patched, please. 
> 6.16-rc1?  6.15?  -stable?
> 

The first occurrence of the bug was on an internal kernel tree that was
based on 6.8.

This patch is against 6.15-rc5.

> I often make these decisions but in this case I have far too little
> information to be able to do that.
> 

No worries.

I see that this has been picked up for mm-unstable as is? Do you want
me to generate a single patch merging the info of the cover letter
and the single patch?

The reason for the split is that I was not sure if you needed to
have all the sordid details included in the applied patch.

FWIW, we also have a buildroot patch that exhibits the problem
in a much simpler way than the setup the original bug report came from.
I don't think it's appropriate content for the list, but I can
share it if anyone is curious about it.

> Thanks.

Regards

-- Pantelis