Barry said [1]:
"""
mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
__GFP_NOFAIL without direct reclamation may just result in a busy
loop within non-sleepable contexts.
""“
Unfortuantely, we do that under read lock. A possible way to fix that
is to move the pages allocation out of the lock into the caller, but
having to allocate a huge number of pages and auxiliary page array
seems to be problematic as well per Tetsuon [2]:
"""
You should implement proper error handling instead of using
__GFP_NOFAIL if count can become large.
"""
So I choose another way, which does not release the kernel bounce pages
when the user tries to register userspace bounce pages. Then we don't
need to do allocation in a path which is not expected to fail (e.g. in
the release). We pay for this by using more memory, but further
optimizations could be done on top.
[1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
[2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vdpa/vdpa_user/iova_domain.c | 18 ++++++++++--------
drivers/vdpa/vdpa_user/iova_domain.h | 1 +
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 791d38d6284c..933d2f7cd49a 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -162,6 +162,7 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
enum dma_data_direction dir)
{
struct vduse_bounce_map *map;
+ struct page *page;
unsigned int offset;
void *addr;
size_t sz;
@@ -178,7 +179,10 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
map->orig_phys == INVALID_PHYS_ADDR))
return;
- addr = kmap_local_page(map->bounce_page);
+ page = domain->user_bounce_pages ?
+ map->user_bounce_page : map->bounce_page;
+
+ addr = kmap_local_page(page);
do_bounce(map->orig_phys + offset, addr + offset, sz, dir);
kunmap_local(addr);
size -= sz;
@@ -270,9 +274,8 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
memcpy_to_page(pages[i], 0,
page_address(map->bounce_page),
PAGE_SIZE);
- __free_page(map->bounce_page);
}
- map->bounce_page = pages[i];
+ map->user_bounce_page = pages[i];
get_page(pages[i]);
}
domain->user_bounce_pages = true;
@@ -297,17 +300,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
struct page *page = NULL;
map = &domain->bounce_maps[i];
- if (WARN_ON(!map->bounce_page))
+ if (WARN_ON(!map->user_bounce_page))
continue;
/* Copy user page to kernel page if it's in use */
if (map->orig_phys != INVALID_PHYS_ADDR) {
- page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
+ page = map->bounce_page;
memcpy_from_page(page_address(page),
- map->bounce_page, 0, PAGE_SIZE);
+ map->user_bounce_page, 0, PAGE_SIZE);
}
- put_page(map->bounce_page);
- map->bounce_page = page;
+ put_page(map->user_bounce_page);
}
domain->user_bounce_pages = false;
out:
diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
index f92f22a7267d..7f3f0928ec78 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.h
+++ b/drivers/vdpa/vdpa_user/iova_domain.h
@@ -21,6 +21,7 @@
struct vduse_bounce_map {
struct page *bounce_page;
+ struct page *user_bounce_page;
u64 orig_phys;
};
--
2.31.1
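For quick orientation, the bounce path after this patch boils down to the
following (a paraphrase of the hunks above; the surrounding iteration and
locking are elided):

        /* Paraphrased from vduse_domain_bounce() after the patch: the
         * kernel bounce page is always kept, and the userspace page is
         * only used while userspace bounce pages are registered. */
        struct page *page = domain->user_bounce_pages ?
                            map->user_bounce_page : map->bounce_page;

        addr = kmap_local_page(page);
        do_bounce(map->orig_phys + offset, addr + offset, sz, dir);
        kunmap_local(addr);

So the remove path no longer needs to allocate anything: it copies back into
the kernel page that was never freed and simply drops the reference on the
userspace page.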
On Mon, Aug 5, 2024 at 4:21 PM Jason Wang <jasowang@redhat.com> wrote:
>
> Barry said [1]:
>
> """
> mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> __GFP_NOFAIL without direct reclamation may just result in a busy
> loop within non-sleepable contexts.
> ""“
>
> Unfortuantely, we do that under read lock. A possible way to fix that
> is to move the pages allocation out of the lock into the caller, but
> having to allocate a huge number of pages and auxiliary page array
> seems to be problematic as well per Tetsuon [2]:
>
> """
> You should implement proper error handling instead of using
> __GFP_NOFAIL if count can become large.
> """
>
> So I choose another way, which does not release kernel bounce pages
> when user tries to register usersapce bounce pages. Then we don't need
> to do allocation in the path which is not expected to be fail (e.g in
> the release). We pay this for more memory usage but further
> optimizations could be done on top.
>
> [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
>
> Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Tested-by: Xie Yongji <xieyongji@bytedance.com>
Have tested it with qemu-storage-daemon [1]:
$ qemu-storage-daemon \
--chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \
--monitor chardev=charmonitor \
--blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
--export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on
[1] https://github.com/bytedance/qemu/tree/vduse-umem
> drivers/vdpa/vdpa_user/iova_domain.c | 18 ++++++++++--------
> drivers/vdpa/vdpa_user/iova_domain.h | 1 +
> 2 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> index 791d38d6284c..933d2f7cd49a 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -162,6 +162,7 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
> enum dma_data_direction dir)
> {
> struct vduse_bounce_map *map;
> + struct page *page;
> unsigned int offset;
> void *addr;
> size_t sz;
> @@ -178,7 +179,10 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
> map->orig_phys == INVALID_PHYS_ADDR))
> return;
>
> - addr = kmap_local_page(map->bounce_page);
> + page = domain->user_bounce_pages ?
> + map->user_bounce_page : map->bounce_page;
> +
> + addr = kmap_local_page(page);
> do_bounce(map->orig_phys + offset, addr + offset, sz, dir);
> kunmap_local(addr);
> size -= sz;
> @@ -270,9 +274,8 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
> memcpy_to_page(pages[i], 0,
> page_address(map->bounce_page),
> PAGE_SIZE);
> - __free_page(map->bounce_page);
> }
> - map->bounce_page = pages[i];
> + map->user_bounce_page = pages[i];
> get_page(pages[i]);
> }
> domain->user_bounce_pages = true;
> @@ -297,17 +300,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
> struct page *page = NULL;
>
> map = &domain->bounce_maps[i];
> - if (WARN_ON(!map->bounce_page))
> + if (WARN_ON(!map->user_bounce_page))
> continue;
>
> /* Copy user page to kernel page if it's in use */
> if (map->orig_phys != INVALID_PHYS_ADDR) {
> - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
> + page = map->bounce_page;
> memcpy_from_page(page_address(page),
> - map->bounce_page, 0, PAGE_SIZE);
> + map->user_bounce_page, 0, PAGE_SIZE);
> }
> - put_page(map->bounce_page);
> - map->bounce_page = page;
> + put_page(map->user_bounce_page);
map->user_bounce_page = NULL?
Thanks,
Yongji
On Wed, Aug 7, 2024 at 2:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Mon, Aug 5, 2024 at 4:21 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > Barry said [1]:
> >
> > """
> > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> > __GFP_NOFAIL without direct reclamation may just result in a busy
> > loop within non-sleepable contexts.
> > ""“
> >
> > Unfortuantely, we do that under read lock. A possible way to fix that
> > is to move the pages allocation out of the lock into the caller, but
> > having to allocate a huge number of pages and auxiliary page array
> > seems to be problematic as well per Tetsuon [2]:
> >
> > """
> > You should implement proper error handling instead of using
> > __GFP_NOFAIL if count can become large.
> > """
> >
> > So I choose another way, which does not release kernel bounce pages
> > when user tries to register usersapce bounce pages. Then we don't need
> > to do allocation in the path which is not expected to be fail (e.g in
> > the release). We pay this for more memory usage but further
> > optimizations could be done on top.
> >
> > [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> > [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
> >
> > Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
>
> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
> Tested-by: Xie Yongji <xieyongji@bytedance.com>
Thanks.
>
> Have tested it with qemu-storage-daemon [1]:
>
> $ qemu-storage-daemon \
> --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \
> --monitor chardev=charmonitor \
> --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0
> \
> --export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on
>
> [1] https://github.com/bytedance/qemu/tree/vduse-umem
Great, would you want to post them to QEMU?
>
> > drivers/vdpa/vdpa_user/iova_domain.c | 18 ++++++++++--------
> > drivers/vdpa/vdpa_user/iova_domain.h | 1 +
> > 2 files changed, 11 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> > index 791d38d6284c..933d2f7cd49a 100644
> > --- a/drivers/vdpa/vdpa_user/iova_domain.c
> > +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> > @@ -162,6 +162,7 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
> > enum dma_data_direction dir)
> > {
> > struct vduse_bounce_map *map;
> > + struct page *page;
> > unsigned int offset;
> > void *addr;
> > size_t sz;
> > @@ -178,7 +179,10 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
> > map->orig_phys == INVALID_PHYS_ADDR))
> > return;
> >
> > - addr = kmap_local_page(map->bounce_page);
> > + page = domain->user_bounce_pages ?
> > + map->user_bounce_page : map->bounce_page;
> > +
> > + addr = kmap_local_page(page);
> > do_bounce(map->orig_phys + offset, addr + offset, sz, dir);
> > kunmap_local(addr);
> > size -= sz;
> > @@ -270,9 +274,8 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
> > memcpy_to_page(pages[i], 0,
> > page_address(map->bounce_page),
> > PAGE_SIZE);
> > - __free_page(map->bounce_page);
> > }
> > - map->bounce_page = pages[i];
> > + map->user_bounce_page = pages[i];
> > get_page(pages[i]);
> > }
> > domain->user_bounce_pages = true;
> > @@ -297,17 +300,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
> > struct page *page = NULL;
> >
> > map = &domain->bounce_maps[i];
> > - if (WARN_ON(!map->bounce_page))
> > + if (WARN_ON(!map->user_bounce_page))
> > continue;
> >
> > /* Copy user page to kernel page if it's in use */
> > if (map->orig_phys != INVALID_PHYS_ADDR) {
> > - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
> > + page = map->bounce_page;
> > memcpy_from_page(page_address(page),
> > - map->bounce_page, 0, PAGE_SIZE);
> > + map->user_bounce_page, 0, PAGE_SIZE);
> > }
> > - put_page(map->bounce_page);
> > - map->bounce_page = page;
> > + put_page(map->user_bounce_page);
>
> map->user_bounce_page = NULL?
For safety, it might be better. Let me add this and post a V2.
Thanks
>
> Thanks,
> Yongji
>
On Thu, Aug 8, 2024 at 10:58 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Aug 7, 2024 at 2:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Mon, Aug 5, 2024 at 4:21 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > Barry said [1]:
> > >
> > > """
> > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> > > __GFP_NOFAIL without direct reclamation may just result in a busy
> > > loop within non-sleepable contexts.
> > > ""“
> > >
> > > Unfortuantely, we do that under read lock. A possible way to fix that
> > > is to move the pages allocation out of the lock into the caller, but
> > > having to allocate a huge number of pages and auxiliary page array
> > > seems to be problematic as well per Tetsuon [2]:
> > >
> > > """
> > > You should implement proper error handling instead of using
> > > __GFP_NOFAIL if count can become large.
> > > """
> > >
> > > So I choose another way, which does not release kernel bounce pages
> > > when user tries to register usersapce bounce pages. Then we don't need
> > > to do allocation in the path which is not expected to be fail (e.g in
> > > the release). We pay this for more memory usage but further
> > > optimizations could be done on top.
> > >
> > > [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> > > [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
> > >
> > > Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> >
> > Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
> > Tested-by: Xie Yongji <xieyongji@bytedance.com>
>
> Thanks.
>
> >
> > Have tested it with qemu-storage-daemon [1]:
> >
> > $ qemu-storage-daemon \
> > --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \
> > --monitor chardev=charmonitor \
> > --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0
> > \
> > --export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on
> >
> > [1] https://github.com/bytedance/qemu/tree/vduse-umem
>
> Great, would you want to post them to the Qemu?
>
Looks like qemu-storage-daemon would not benefit from this feature
which is designed for some hugepage users such as SPDK/DPDK.
Thanks,
Yongji
On Thu, Aug 8, 2024 at 6:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Thu, Aug 8, 2024 at 10:58 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Aug 7, 2024 at 2:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > >
> > > On Mon, Aug 5, 2024 at 4:21 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > Barry said [1]:
> > > >
> > > > """
> > > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> > > > __GFP_NOFAIL without direct reclamation may just result in a busy
> > > > loop within non-sleepable contexts.
> > > > ""“
> > > >
> > > > Unfortuantely, we do that under read lock. A possible way to fix that
> > > > is to move the pages allocation out of the lock into the caller, but
> > > > having to allocate a huge number of pages and auxiliary page array
> > > > seems to be problematic as well per Tetsuon [2]:
> > > >
> > > > """
> > > > You should implement proper error handling instead of using
> > > > __GFP_NOFAIL if count can become large.
> > > > """
> > > >
> > > > So I choose another way, which does not release kernel bounce pages
> > > > when user tries to register usersapce bounce pages. Then we don't need
> > > > to do allocation in the path which is not expected to be fail (e.g in
> > > > the release). We pay this for more memory usage but further
> > > > optimizations could be done on top.
> > > >
> > > > [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> > > > [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
> > > >
> > > > Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > >
> > > Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
> > > Tested-by: Xie Yongji <xieyongji@bytedance.com>
> >
> > Thanks.
> >
> > >
> > > Have tested it with qemu-storage-daemon [1]:
> > >
> > > $ qemu-storage-daemon \
> > > --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \
> > > --monitor chardev=charmonitor \
> > > --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0
> > > \
> > > --export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on
> > >
> > > [1] https://github.com/bytedance/qemu/tree/vduse-umem
> >
> > Great, would you want to post them to the Qemu?
> >
>
> Looks like qemu-storage-daemon would not benefit from this feature
> which is designed for some hugepage users such as SPDK/DPDK.
Yes, but maybe for testing purposes like here?
Thanks
>
> Thanks,
> Yongji
>
On Mon, Aug 12, 2024 at 3:00 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Aug 8, 2024 at 6:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Thu, Aug 8, 2024 at 10:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Aug 7, 2024 at 2:52 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > > >
> > > > On Mon, Aug 5, 2024 at 4:21 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > Barry said [1]:
> > > > >
> > > > > """
> > > > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> > > > > __GFP_NOFAIL without direct reclamation may just result in a busy
> > > > > loop within non-sleepable contexts.
> > > > > ""“
> > > > >
> > > > > Unfortuantely, we do that under read lock. A possible way to fix that
> > > > > is to move the pages allocation out of the lock into the caller, but
> > > > > having to allocate a huge number of pages and auxiliary page array
> > > > > seems to be problematic as well per Tetsuon [2]:
> > > > >
> > > > > """
> > > > > You should implement proper error handling instead of using
> > > > > __GFP_NOFAIL if count can become large.
> > > > > """
> > > > >
> > > > > So I choose another way, which does not release kernel bounce pages
> > > > > when user tries to register usersapce bounce pages. Then we don't need
> > > > > to do allocation in the path which is not expected to be fail (e.g in
> > > > > the release). We pay this for more memory usage but further
> > > > > optimizations could be done on top.
> > > > >
> > > > > [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> > > > > [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
> > > > >
> > > > > Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > ---
> > > >
> > > > Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
> > > > Tested-by: Xie Yongji <xieyongji@bytedance.com>
> > >
> > > Thanks.
> > >
> > > >
> > > > Have tested it with qemu-storage-daemon [1]:
> > > >
> > > > $ qemu-storage-daemon \
> > > > --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \
> > > > --monitor chardev=charmonitor \
> > > > --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0
> > > > \
> > > > --export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on
> > > >
> > > > [1] https://github.com/bytedance/qemu/tree/vduse-umem
> > >
> > > Great, would you want to post them to the Qemu?
> > >
> >
> > Looks like qemu-storage-daemon would not benefit from this feature
> > which is designed for some hugepage users such as SPDK/DPDK.
>
> Yes, but maybe for testing purposes like here?
>
OK for me.
Thanks,
Yongji
On Mon, Aug 5, 2024 at 8:21 PM Jason Wang <jasowang@redhat.com> wrote:
>
> Barry said [1]:
>
> """
> mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> __GFP_NOFAIL without direct reclamation may just result in a busy
> loop within non-sleepable contexts.
> ""“
the current code will result in returning a NULL pointer but
not a busy-loop.
static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
...
/*
* Make sure that __GFP_NOFAIL request doesn't leak out and make sure
* we always retry
*/
if (gfp_mask & __GFP_NOFAIL) {
/*
* All existing users of the __GFP_NOFAIL are blockable, so warn
* of any new users that actually require GFP_NOWAIT
*/
if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
goto fail;
...
}
...
fail:
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
got_pg:
return page;
}
We have two choices to address the issue:
1. busy-loop
2. BUG_ON
The patch below chose 2:
https://lore.kernel.org/linux-mm/20240731000155.109583-5-21cnbao@gmail.com/
>
> Unfortuantely, we do that under read lock. A possible way to fix that
> is to move the pages allocation out of the lock into the caller, but
> having to allocate a huge number of pages and auxiliary page array
> seems to be problematic as well per Tetsuon [2]:
>
> """
> You should implement proper error handling instead of using
> __GFP_NOFAIL if count can become large.
> """
>
> So I choose another way, which does not release kernel bounce pages
> when user tries to register usersapce bounce pages. Then we don't need
> to do allocation in the path which is not expected to be fail (e.g in
> the release). We pay this for more memory usage but further
> optimizations could be done on top.
>
> [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
>
> Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vdpa/vdpa_user/iova_domain.c | 18 ++++++++++--------
> drivers/vdpa/vdpa_user/iova_domain.h | 1 +
> 2 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> index 791d38d6284c..933d2f7cd49a 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -162,6 +162,7 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
> enum dma_data_direction dir)
> {
> struct vduse_bounce_map *map;
> + struct page *page;
> unsigned int offset;
> void *addr;
> size_t sz;
> @@ -178,7 +179,10 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
> map->orig_phys == INVALID_PHYS_ADDR))
> return;
>
> - addr = kmap_local_page(map->bounce_page);
> + page = domain->user_bounce_pages ?
> + map->user_bounce_page : map->bounce_page;
> +
> + addr = kmap_local_page(page);
> do_bounce(map->orig_phys + offset, addr + offset, sz, dir);
> kunmap_local(addr);
> size -= sz;
> @@ -270,9 +274,8 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
> memcpy_to_page(pages[i], 0,
> page_address(map->bounce_page),
> PAGE_SIZE);
> - __free_page(map->bounce_page);
> }
> - map->bounce_page = pages[i];
> + map->user_bounce_page = pages[i];
> get_page(pages[i]);
> }
> domain->user_bounce_pages = true;
> @@ -297,17 +300,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
> struct page *page = NULL;
>
> map = &domain->bounce_maps[i];
> - if (WARN_ON(!map->bounce_page))
> + if (WARN_ON(!map->user_bounce_page))
> continue;
>
> /* Copy user page to kernel page if it's in use */
> if (map->orig_phys != INVALID_PHYS_ADDR) {
> - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
> + page = map->bounce_page;
> memcpy_from_page(page_address(page),
> - map->bounce_page, 0, PAGE_SIZE);
> + map->user_bounce_page, 0, PAGE_SIZE);
> }
> - put_page(map->bounce_page);
> - map->bounce_page = page;
> + put_page(map->user_bounce_page);
> }
> domain->user_bounce_pages = false;
> out:
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
> index f92f22a7267d..7f3f0928ec78 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.h
> +++ b/drivers/vdpa/vdpa_user/iova_domain.h
> @@ -21,6 +21,7 @@
>
> struct vduse_bounce_map {
> struct page *bounce_page;
> + struct page *user_bounce_page;
> u64 orig_phys;
> };
>
> --
> 2.31.1
>
On Tue, Aug 6, 2024 at 10:30 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Mon, Aug 5, 2024 at 8:21 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > Barry said [1]:
> >
> > """
> > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> > __GFP_NOFAIL without direct reclamation may just result in a busy
> > loop within non-sleepable contexts.
> > ""“
>
> the current code will result in returning a NULL pointer but
> not a busy-loop.
>
> static inline struct page *
> __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> struct alloc_context *ac)
> {
> ...
> /*
> * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
> * we always retry
> */
> if (gfp_mask & __GFP_NOFAIL) {
> /*
> * All existing users of the __GFP_NOFAIL are blockable, so warn
> * of any new users that actually require GFP_NOWAIT
> */
> if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> goto fail;
> ...
> }
> ...
> fail:
> warn_alloc(gfp_mask, ac->nodemask,
> "page allocation failure: order:%u", order);
> got_pg:
> return page;
> }
>
>
> We have two choices to address the issue:
> 1. busy-loop
> 2. BUG_ON
>
> the below patch chose 2:
> https://lore.kernel.org/linux-mm/20240731000155.109583-5-21cnbao@gmail.com/
>
I will add those to V2.
Thanks
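To make the failure mode Barry describes concrete: given the allocator
behaviour quoted above, the pre-patch removal path could dereference a NULL
page despite __GFP_NOFAIL, because a GFP_ATOMIC request cannot direct-reclaim
(a sketch of the pre-patch pattern, taken from the removed lines in the diff):

        /* Pre-patch pattern: GFP_ATOMIC cannot direct-reclaim, so the
         * __GFP_NOFAIL request warns and may still return NULL ... */
        page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
        /* ... and the following copy would then dereference a NULL page. */
        memcpy_from_page(page_address(page), map->bounce_page, 0, PAGE_SIZE);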
On Mon, Aug 05, 2024 at 04:21:06PM +0800, Jason Wang wrote:
> Barry said [1]:
>
> """
> mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> __GFP_NOFAIL without direct reclamation may just result in a busy
> loop within non-sleepable contexts.
> ""“
>
> Unfortuantely, we do that under read lock. A possible way to fix that
> is to move the pages allocation out of the lock into the caller, but
> having to allocate a huge number of pages and auxiliary page array
> seems to be problematic as well per Tetsuon [2]:
>
> """
> You should implement proper error handling instead of using
> __GFP_NOFAIL if count can become large.
> """
>
> So I choose another way, which does not release kernel bounce pages
> when user tries to register usersapce bounce pages. Then we don't need
userspace
> to do allocation in the path which is not expected to be fail (e.g in
> the release). We pay this for more memory usage but further
what does "we pay this for more memory usage" mean?
Do you mean "we pay for this by using more memory"?
How much more?
> optimizations could be done on top.
>
> [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
>
> Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vdpa/vdpa_user/iova_domain.c | 18 ++++++++++--------
> drivers/vdpa/vdpa_user/iova_domain.h | 1 +
> 2 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> index 791d38d6284c..933d2f7cd49a 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -162,6 +162,7 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
> enum dma_data_direction dir)
> {
> struct vduse_bounce_map *map;
> + struct page *page;
> unsigned int offset;
> void *addr;
> size_t sz;
> @@ -178,7 +179,10 @@ static void vduse_domain_bounce(struct vduse_iova_domain *domain,
> map->orig_phys == INVALID_PHYS_ADDR))
> return;
>
> - addr = kmap_local_page(map->bounce_page);
> + page = domain->user_bounce_pages ?
> + map->user_bounce_page : map->bounce_page;
> +
> + addr = kmap_local_page(page);
> do_bounce(map->orig_phys + offset, addr + offset, sz, dir);
> kunmap_local(addr);
> size -= sz;
> @@ -270,9 +274,8 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
> memcpy_to_page(pages[i], 0,
> page_address(map->bounce_page),
> PAGE_SIZE);
> - __free_page(map->bounce_page);
> }
> - map->bounce_page = pages[i];
> + map->user_bounce_page = pages[i];
> get_page(pages[i]);
> }
> domain->user_bounce_pages = true;
> @@ -297,17 +300,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
> struct page *page = NULL;
>
> map = &domain->bounce_maps[i];
> - if (WARN_ON(!map->bounce_page))
> + if (WARN_ON(!map->user_bounce_page))
> continue;
>
> /* Copy user page to kernel page if it's in use */
> if (map->orig_phys != INVALID_PHYS_ADDR) {
> - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
> + page = map->bounce_page;
> memcpy_from_page(page_address(page),
> - map->bounce_page, 0, PAGE_SIZE);
> + map->user_bounce_page, 0, PAGE_SIZE);
> }
> - put_page(map->bounce_page);
> - map->bounce_page = page;
> + put_page(map->user_bounce_page);
> }
> domain->user_bounce_pages = false;
> out:
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
> index f92f22a7267d..7f3f0928ec78 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.h
> +++ b/drivers/vdpa/vdpa_user/iova_domain.h
> @@ -21,6 +21,7 @@
>
> struct vduse_bounce_map {
> struct page *bounce_page;
> + struct page *user_bounce_page;
> u64 orig_phys;
> };
>
> --
> 2.31.1
On Mon, Aug 5, 2024 at 4:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Aug 05, 2024 at 04:21:06PM +0800, Jason Wang wrote:
> > So I choose another way, which does not release kernel bounce pages
> > when user tries to register usersapce bounce pages. Then we don't need
>
> userspace
>
> > to do allocation in the path which is not expected to be fail (e.g in
> > the release). We pay this for more memory usage but further
>
> what does "we pay this for more memory usage" mean?
> Do you mean "we pay for this by using more memory"?

Yes.

> How much more?

Depends on the workload, but basically, it's just the maximum size of
the bounce buffer:

Default size 64M:

#define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)

Maximum 1G:

#define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)

Thanks
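For scale, a back-of-the-envelope check of that cost (assuming 4 KiB pages;
the sizes are the defines quoted above):

        #include <stdio.h>

        int main(void)
        {
                const unsigned long page_size = 4096;
                const unsigned long bounce_default = 64UL << 20; /* VDUSE_BOUNCE_SIZE */
                const unsigned long bounce_max = 1UL << 30;      /* VDUSE_MAX_BOUNCE_SIZE */

                /* Extra kernel pages kept alive while user pages are registered. */
                printf("default: %lu pages (64 MiB)\n", bounce_default / page_size);
                printf("maximum: %lu pages (1 GiB)\n", bounce_max / page_size);
                return 0;
        }

which works out to 16384 extra kernel pages kept around by default, and up to
262144 pages at the 1 GiB maximum.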
On Mon, Aug 5, 2024 at 4:21 PM Jason Wang <jasowang@redhat.com> wrote:
>
> Barry said [1]:
>
> """
> mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> __GFP_NOFAIL without direct reclamation may just result in a busy
> loop within non-sleepable contexts.
> ""“
>
> Unfortuantely, we do that under read lock. A possible way to fix that
> is to move the pages allocation out of the lock into the caller, but
> having to allocate a huge number of pages and auxiliary page array
> seems to be problematic as well per Tetsuon [2]:
>
> """
> You should implement proper error handling instead of using
> __GFP_NOFAIL if count can become large.
> """
>
> So I choose another way, which does not release kernel bounce pages
> when user tries to register usersapce bounce pages. Then we don't need
> to do allocation in the path which is not expected to be fail (e.g in
> the release). We pay this for more memory usage but further
> optimizations could be done on top.
>
> [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
>
> Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
Note for YongJi:
I can only test it without userspace bounce pages, as neither DPDK nor
libvduse users use that. Would you be willing to review and test this?
Thanks
On Mon, Aug 5, 2024 at 4:24 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Aug 5, 2024 at 4:21 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > Barry said [1]:
> >
> > """
> > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> > __GFP_NOFAIL without direct reclamation may just result in a busy
> > loop within non-sleepable contexts.
> > ""“
> >
> > Unfortuantely, we do that under read lock. A possible way to fix that
> > is to move the pages allocation out of the lock into the caller, but
> > having to allocate a huge number of pages and auxiliary page array
> > seems to be problematic as well per Tetsuon [2]:
> >
> > """
> > You should implement proper error handling instead of using
> > __GFP_NOFAIL if count can become large.
> > """
> >
I think the problem is it's hard to do the error handling in
fops->release() currently.
So can we temporarily hold the user page refcount, and release it when
vduse_dev_open()/vduse_domain_release() is executed? The kernel page
allocation and memcpy can be done in vduse_dev_open(), which allows
some error handling.
> > So I choose another way, which does not release kernel bounce pages
> > when user tries to register usersapce bounce pages. Then we don't need
> > to do allocation in the path which is not expected to be fail (e.g in
> > the release). We pay this for more memory usage but further
> > optimizations could be done on top.
> >
> > [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> > [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
> >
> > Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
>
> Note for YongJi:
>
> I can only test it without usersapce bounce pages as neither DPDK nor
> libvduse users use that. Would you want to review and have a test for
> this?
>
I can do some tests for it.
Thanks,
Yongji
On Mon, Aug 5, 2024 at 6:42 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Mon, Aug 5, 2024 at 4:24 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Mon, Aug 5, 2024 at 4:21 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > Barry said [1]:
> > >
> > > """
> > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> > > __GFP_NOFAIL without direct reclamation may just result in a busy
> > > loop within non-sleepable contexts.
> > > ""“
> > >
> > > Unfortuantely, we do that under read lock. A possible way to fix that
> > > is to move the pages allocation out of the lock into the caller, but
> > > having to allocate a huge number of pages and auxiliary page array
> > > seems to be problematic as well per Tetsuon [2]:
> > >
> > > """
> > > You should implement proper error handling instead of using
> > > __GFP_NOFAIL if count can become large.
> > > """
> > >
>
> I think the problem is it's hard to do the error handling in
> fops->release() currently.
vduse_dev_dereg_umem() should be the same, it's very hard to allow it to fail.
>
> So can we temporarily hold the user page refcount, and release it when
> vduse_dev_open()/vduse_domain_release() is executed. The kernel page
> allocation and memcpy can be done in vduse_dev_open() which allows
> some error handling.
Just to make sure I understand this, the free is probably not the big
issue but the allocation itself.
And if we do the memcpy() in open(), it seems to be a subtle userspace
noticeable change? (Or I don't get how copying in vduse_dev_open() can
help here).
>
> > > So I choose another way, which does not release kernel bounce pages
> > > when user tries to register usersapce bounce pages. Then we don't need
> > > to do allocation in the path which is not expected to be fail (e.g in
> > > the release). We pay this for more memory usage but further
> > > optimizations could be done on top.
> > >
> > > [1] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m3caef86a66ea6318ef94f9976ddb3a0ccfe6fcf8
> > > [2] https://lore.kernel.org/all/CACGkMEtcOJAA96SF9B8m-nZ1X04-XZr+nq8ZQ2saLnUdfOGOLg@mail.gmail.com/T/#m7ad10eaba48ade5abf2d572f24e185d9fb146480
> > >
> > > Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> >
> > Note for YongJi:
> >
> > I can only test it without usersapce bounce pages as neither DPDK nor
> > libvduse users use that. Would you want to review and have a test for
> > this?
> >
>
> I can do some tests for it.
Great.
>
> Thanks,
> Yongji
>
Thanks
On Tue, Aug 6, 2024 at 10:28 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Aug 5, 2024 at 6:42 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > I think the problem is it's hard to do the error handling in
> > fops->release() currently.
>
> vduse_dev_dereg_umem() should be the same, it's very hard to allow it to fail.
>
> > So can we temporarily hold the user page refcount, and release it when
> > vduse_dev_open()/vduse_domain_release() is executed. The kernel page
> > allocation and memcpy can be done in vduse_dev_open() which allows
> > some error handling.
>
> Just to make sure I understand this, the free is probably not the big
> issue but the allocation itself.

Yes, so defer the allocation might be a solution.

> And if we do the memcpy() in open(), it seems to be a subtle userspace
> noticeable change? (Or I don't get how copying in vduse_dev_open() can
> help here).

Maybe we don't need to do the copy in open(). We can hold the user
page refcount until the inflight I/O is completed. That means the
allocation of new kernel pages can be done in
vduse_domain_map_bounce_page() and the release of old user pages can
be done in vduse_domain_unmap_bounce_page(). Of course, we still have
a copy (old user page -> new user page) if the daemon calls
vduse_dev_reg_umem() again.

Thanks,
Yongji
On Tue, Aug 6, 2024 at 11:10 AM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Tue, Aug 6, 2024 at 10:28 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > Just to make sure I understand this, the free is probably not the big
> > issue but the allocation itself.
>
> Yes, so defer the allocation might be a solution.

Would you mind posting a patch for this?

> > And if we do the memcpy() in open(), it seems to be a subtle userspace
> > noticeable change? (Or I don't get how copying in vduse_dev_open() can
> > help here).
>
> Maybe we don't need to do the copy in open(). We can hold the user
> page refcount until the inflight I/O is completed. That means the
> allocation of new kernel pages can be done in
> vduse_domain_map_bounce_page() and the release of old user pages can
> be done in vduse_domain_unmap_bounce_page().

This seems to be a subtle userspace noticeable behaviour?

> Of course, we still have
> a copy (old user page -> new user page) if the daemon calls
> vduse_dev_reg_umem() again.
>
> Thanks,
> Yongji

Thanks
On Wed, Aug 7, 2024 at 10:39 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Aug 6, 2024 at 11:10 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > Maybe we don't need to do the copy in open(). We can hold the user
> > page refcount until the inflight I/O is completed. That means the
> > allocation of new kernel pages can be done in
> > vduse_domain_map_bounce_page() and the release of old user pages can
> > be done in vduse_domain_unmap_bounce_page().
>
> This seems to be a subtle userspace noticeable behaviour?

Yes, userspace needs to ensure that it does not reuse the old user
pages for other purposes before vduse_dev_dereg_umem() returns
successfully. vduse_dev_dereg_umem() will only return successfully
when there is no inflight I/O, which means we don't need to allocate
extra kernel pages to store data. If we can't accept this, then your
current patch might be the most suitable. I will test this patch
first.

Thanks,
Yongji
On Wed, Aug 7, 2024 at 11:13 AM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Wed, Aug 7, 2024 at 10:39 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > This seems to be a subtle userspace noticeable behaviour?
>
> Yes, userspace needs to ensure that it does not reuse the old user
> pages for other purposes before vduse_dev_dereg_umem() returns
> successfully. vduse_dev_dereg_umem() will only return successfully
> when there is no inflight I/O, which means we don't need to allocate
> extra kernel pages to store data. If we can't accept this, then your
> current patch might be the most suitable.

It might be better to not break.

Actually, during my testing, the read_lock in the do_bounce path slows
down the performance. Removing the read_lock or using rcu_read_lock()
gives a 20% improvement in PPS. I do want to get rid of it (e.g. by
moving the copying somewhere else), but that seems to hit exactly the
"issue" here, since it introduces some behaviour change...

> I will test this patch
> first.

Great.

Thanks

> Thanks,
> Yongji
On Wed, Aug 7, 2024 at 12:38 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Aug 7, 2024 at 11:13 AM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > Yes, userspace needs to ensure that it does not reuse the old user
> > pages for other purposes before vduse_dev_dereg_umem() returns
> > successfully. vduse_dev_dereg_umem() will only return successfully
> > when there is no inflight I/O, which means we don't need to allocate
> > extra kernel pages to store data. If we can't accept this, then your
> > current patch might be the most suitable.
>
> It might be better to not break.
>
> Actually, during my testing, the read_lock in the do_bounce path slows
> down the performance. Removing the read_lock or using rcu_read_lock()
> gives a 20% improvement in PPS.

Looks like rcu_read_lock() should be OK here.

Thanks,
Yongji
On Wed, Aug 7, 2024 at 2:54 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Wed, Aug 7, 2024 at 12:38 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > It might be better to not break.
> >
> > Actually, during my testing, the read_lock in the do_bounce path slows
> > down the performance. Removing the read_lock or using rcu_read_lock()
> > gives a 20% improvement in PPS.
>
> Looks like rcu_read_lock() should be OK here.

The tricky part is that we may still end up with behaviour changes (or
lose some of the synchronization between the kernel and bounce pages):

RCU allows the reader to run in parallel with the writer. So bouncing
could be done in parallel with vduse_domain_add_user_bounce_pages(),
and there would be a race between the two memcpy()s.

Thanks

> Thanks,
> Yongji
On Thu, Aug 8, 2024 at 1:50 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Aug 7, 2024 at 2:54 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > [...]
> >
> > Looks like rcu_read_lock() should be OK here.
>
> The tricky part is that we may still end up with behaviour changes (or
> lose some of the synchronization between the kernel and bounce pages):
>
> RCU allows the read side to run in parallel with the writer. So bouncing
> could be done in parallel with vduse_domain_add_user_bounce_pages(), and
> the two memcpy operations would race.
>

Hmm... this is a problem. We may still need some userspace-noticeable
behaviour change, e.g. only allowing reg_umem/dereg_umem when the device
is not started.

Thanks,
Yongji
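A rough sketch of what that restriction could look like, assuming the
device's virtio status byte is mirrored in dev->status and the caller
holds the appropriate device lock; the helper name is illustrative:

/*
 * Sketch only: reject umem (de)registration once the driver has set
 * DRIVER_OK, so the bounce pages cannot be swapped while I/O may be
 * inflight. Assumes dev->status tracks the virtio status byte.
 */
static int vduse_dev_umem_allowed(struct vduse_dev *dev)
{
        if (dev->status & VIRTIO_CONFIG_S_DRIVER_OK)
                return -EBUSY;  /* device already started */
        return 0;
}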
On Thu, Aug 8, 2024 at 7:09 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Thu, Aug 8, 2024 at 1:50 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > [...]
> >
> > RCU allows the read side to run in parallel with the writer. So bouncing
> > could be done in parallel with vduse_domain_add_user_bounce_pages(), and
> > the two memcpy operations would race.
> >
>
> Hmm... this is a problem. We may still need some userspace-noticeable
> behaviour change, e.g. only allowing reg_umem/dereg_umem when the device
> is not started.

Exactly, maybe add a new userspace flag for this.

Thanks

>
> Thanks,
> Yongji
>
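Purely as an illustration of that idea, a hypothetical opt-in flag might
look like the following; the flag name and the api_flags field are invented
here and are not part of the existing VDUSE UAPI:

/*
 * Hypothetical flag, not in the current VDUSE UAPI: userspace opts in to
 * the stricter semantics where umem can only be (de)registered while the
 * device is stopped; old binaries that don't set it keep today's behaviour.
 */
#define VDUSE_DEV_FLAG_UMEM_STOPPED_ONLY        (1ULL << 0)

static bool vduse_umem_needs_stopped_dev(struct vduse_dev *dev)
{
        /* dev->api_flags is assumed to hold flags negotiated at create time */
        return dev->api_flags & VDUSE_DEV_FLAG_UMEM_STOPPED_ONLY;
}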