Hi all,
This series introduces three new ioctls, VDUSE_IOTLB_GET_INFO,
VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM, to support
registering and de-registering userspace memory for the IOTLB
to use as a bounce buffer in the virtio-vdpa case.
The VDUSE_IOTLB_GET_INFO ioctl lets the user query IOTLB
information such as the bounce buffer size. The user can then
pass that information to the VDUSE_IOTLB_REG_UMEM and
VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
userspace memory for the IOTLB.
During registration and de-registration, the DMA data in use
is copied from the kernel bounce pages to the userspace bounce
pages and back.
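To illustrate the copy semantics described above, here is a minimal userspace model in C. It is only a sketch and is not the kernel code from this series: the struct, the function names, the page size and the page count are all hypothetical, and the rule that the registered region must cover the whole bounce buffer is an assumption made for the model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096 /* assumed page size for this model */
#define NPAGES  4    /* assumed bounce buffer size, in pages */

/* Hypothetical model of a bounce buffer; not the kernel's data structure. */
struct bounce_model {
    unsigned char kernel_pages[NPAGES][PAGE_SZ]; /* kernel-owned bounce pages */
    unsigned char *user_pages; /* registered userspace memory, or NULL */
};

/* Return the page that currently backs bounce page 'idx'. */
static unsigned char *bounce_page(struct bounce_model *b, size_t idx)
{
    assert(idx < NPAGES);
    return b->user_pages ? b->user_pages + idx * PAGE_SZ
                         : b->kernel_pages[idx];
}

/*
 * Register userspace memory: copy the in-flight data from the kernel
 * pages into the userspace pages, then switch over to them.
 * Assumption: the region must cover the whole bounce buffer.
 */
static int model_reg_umem(struct bounce_model *b, unsigned char *umem,
                          size_t size)
{
    if (b->user_pages || !umem || size != (size_t)NPAGES * PAGE_SZ)
        return -1;
    memcpy(umem, b->kernel_pages, size);
    b->user_pages = umem;
    return 0;
}

/*
 * De-register: copy the data back into the kernel pages so that it
 * survives after the userspace memory goes away.
 */
static int model_dereg_umem(struct bounce_model *b)
{
    if (!b->user_pages)
        return -1;
    memcpy(b->kernel_pages, b->user_pages, (size_t)NPAGES * PAGE_SZ);
    b->user_pages = NULL;
    return 0;
}
```

In this model, data written while userspace memory is registered ends up back in the kernel pages after de-registration, so the device never observes a gap in the buffer contents.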
With this feature, existing applications such as SPDK
and DPDK can leverage the VDUSE datapath directly and
efficiently, as discussed before [1][2]. They can register
preallocated hugepages with VDUSE to avoid an extra
memcpy from the bounce buffer to the hugepages.
The kernel and userspace code can be found on GitHub:
https://github.com/bytedance/linux/tree/vduse-umem
https://github.com/bytedance/qemu/tree/vduse-umem
To test it with qemu-storage-daemon:
$ qemu-storage-daemon \
--chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \
--monitor chardev=charmonitor \
--blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
--export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on
[1] https://lkml.org/lkml/2021/6/27/318
[2] https://lkml.org/lkml/2022/7/4/246
Please review, thanks!
V1 to V2:
- Drop the patch that updates the API version [MST]
- Replace unpin_user_pages() with unpin_user_pages_dirty_lock() [MST]
- Use __vmalloc(__GFP_ACCOUNT) for memory accounting [MST]
Xie Yongji (5):
vduse: Remove unnecessary spin lock protection
vduse: Use memcpy_{to,from}_page() in do_bounce()
vduse: Support using userspace pages as bounce buffer
vduse: Support querying IOLTB information
vduse: Support registering userspace memory for IOTLB
drivers/vdpa/vdpa_user/iova_domain.c | 134 ++++++++++++++++++++---
drivers/vdpa/vdpa_user/iova_domain.h | 9 ++
drivers/vdpa/vdpa_user/vduse_dev.c | 152 +++++++++++++++++++++++++++
include/uapi/linux/vduse.h | 45 ++++++++
4 files changed, 327 insertions(+), 13 deletions(-)
--
2.20.1
On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
>
> Hi all,
>
> This series introduces three new ioctls, VDUSE_IOTLB_GET_INFO,
> VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM, to support
> registering and de-registering userspace memory for the IOTLB
> to use as a bounce buffer in the virtio-vdpa case.
>
> The VDUSE_IOTLB_GET_INFO ioctl lets the user query IOTLB
> information such as the bounce buffer size. The user can then
> pass that information to the VDUSE_IOTLB_REG_UMEM and
> VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register
> userspace memory for the IOTLB.
>
> During registration and de-registration, the DMA data in use
> is copied from the kernel bounce pages to the userspace bounce
> pages and back.
>
> With this feature, existing applications such as SPDK
> and DPDK can leverage the VDUSE datapath directly and
> efficiently, as discussed before [1][2]. They can register
> preallocated hugepages with VDUSE to avoid an extra
> memcpy from the bounce buffer to the hugepages.
This is really interesting.
But a small concern about the uAPI is that this seems to expose the VDUSE
internal implementation (the bounce buffer) to userspace. We tried hard to
hide it via GET_FD before. Is there any way we can keep it hidden?
Thanks
On Wed, Jul 6, 2022 at 5:30 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Jul 6, 2022 at 1:05 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > [...]
>
> This is really interesting.
>
> But a small concern about the uAPI is that this seems to expose the VDUSE
> internal implementation (the bounce buffer) to userspace. We tried hard to
> hide it via GET_FD before. Is there any way we can keep it hidden?
>

Another way is to change the GET_FD ioctl to add a flag, or reuse the
'perm' field, to indicate whether an IOVA region supports userspace memory
registration. Then userspace can use
VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
userspace memory for this IOVA region.

Any suggestions?

Thanks,
Yongji
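As a rough illustration of the 'perm' reuse idea floated above, the sketch below models an IOTLB entry whose permission byte carries one extra capability bit. Everything here is hypothetical: the struct is a local stand-in rather than the uapi struct, the ACCESS_UMEM bit and its value 0x4 are invented for the example, and the RO/WO/RW values are assumed to match the existing VDUSE_ACCESS_* definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the existing VDUSE_ACCESS_RO/WO/RW values. */
#define ACCESS_RO 0x1
#define ACCESS_WO 0x2
#define ACCESS_RW 0x3
/* HYPOTHETICAL capability bit; not defined by the series or the uapi. */
#define ACCESS_UMEM 0x4

/* Local stand-in for the entry returned by the GET_FD ioctl. */
struct iotlb_entry_model {
    uint64_t offset; /* mmap offset within the returned fd */
    uint64_t start;  /* first IOVA of the region */
    uint64_t last;   /* last IOVA of the region */
    uint8_t perm;    /* access bits, plus the hypothetical umem bit */
};

/* True if the region advertises support for umem registration. */
static bool region_supports_umem(const struct iotlb_entry_model *e)
{
    assert(e != NULL);
    return (e->perm & ACCESS_UMEM) != 0;
}

/* Extract only the read/write access bits, ignoring the capability bit. */
static uint8_t region_access(const struct iotlb_entry_model *e)
{
    assert(e != NULL);
    return e->perm & ACCESS_RW;
}
```

A daemon following this scheme would check region_supports_umem() on each region returned by GET_FD before attempting registration. Note that existing userspace comparing 'perm' for equality against the access values would see a changed value, which is one argument for a separate flag field instead.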
On Wed, Jul 6, 2022 at 6:16 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Wed, Jul 6, 2022 at 5:30 PM Jason Wang <jasowang@redhat.com> wrote:
> > [...]
>
> Another way is to change the GET_FD ioctl to add a flag, or reuse the
> 'perm' field, to indicate whether an IOVA region supports userspace memory
> registration. Then userspace can use
> VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
> userspace memory for this IOVA region.

Looks better.

> Any suggestions?

I wonder what the value is of keeping compatibility with the kernel
mmapped bounce buffer. It means we need to take extra care with e.g. data
copying when registering/de-registering userspace memory.

Can we simply allow a third kind of fd that only works for umem registration?

Thanks

>
> Thanks,
> Yongji
>
On Fri, Jul 8, 2022 at 4:38 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Jul 6, 2022 at 6:16 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > [...]
> > Another way is to change the GET_FD ioctl to add a flag, or reuse the
> > 'perm' field, to indicate whether an IOVA region supports userspace memory
> > registration. Then userspace can use
> > VDUSE_IOTLB_REG_UMEM/VDUSE_IOTLB_DEREG_UMEM to register/deregister
> > userspace memory for this IOVA region.
>
> Looks better.
>

OK.

> > Any suggestions?
>
> I wonder what the value is of keeping compatibility with the kernel
> mmapped bounce buffer. It means we need to take extra care with e.g. data
> copying when registering/de-registering userspace memory.
>

I'm not sure I get your point about compatibility with the kernel
bounce buffer. Do you mean they use the same iova region?

The userspace daemon might crash or reboot. In those cases, we still
need a kernel buffer to store/recover the data.

> Can we simply allow a third kind of fd that only works for umem registration?
>

Do you mean using another iova region for umem? I think we don't need
an fd in the umem case since the userspace daemon can access the memory
directly without using mmap() to map it into the address space in
advance.

Thanks,
Yongji
On Fri, Jul 8, 2022 at 5:53 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Fri, Jul 8, 2022 at 4:38 PM Jason Wang <jasowang@redhat.com> wrote:
> > [...]
>
> I'm not sure I get your point about compatibility with the kernel
> bounce buffer. Do you mean they use the same iova region?

Yes.

>
> The userspace daemon might crash or reboot. In those cases, we still
> need a kernel buffer to store/recover the data.

Yes, this is a good point.

>
> > Can we simply allow a third kind of fd that only works for umem registration?
> >
>
> Do you mean using another iova region for umem?

I meant having a new kind of fd that only allows umem registration.

> I think we don't need
> an fd in the umem case since the userspace daemon can access the memory
> directly without using mmap() to map it into the address space in
> advance.

OK, I will have a look at the code and get back to you.

Thanks

>
> Thanks,
> Yongji
>
On Mon, Jul 11, 2022 at 2:02 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Jul 8, 2022 at 5:53 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> > [...]
> > Do you mean using another iova region for umem?
>
> I meant having a new kind of fd that only allows umem registration.
>

OK. It seems a little complicated to allow mapping registered
user memory via a new fd, e.g. how do we handle the mapping if the
userspace daemon exits but the fd has already been passed to another
process?

> > I think we don't need
> > an fd in the umem case since the userspace daemon can access the memory
> > directly without using mmap() to map it into the address space in
> > advance.
>
> OK, I will have a look at the code and get back to you.
>

OK. Looking forward to your reply.

Thanks,
Yongji
On 2022/7/11 15:24, Yongji Xie wrote:
> On Mon, Jul 11, 2022 at 2:02 PM Jason Wang <jasowang@redhat.com> wrote:
>> [...]
>> I meant having a new kind of fd that only allows umem registration.
>>
> OK. It seems a little complicated to allow mapping registered
> user memory via a new fd, e.g. how do we handle the mapping if the
> userspace daemon exits but the fd has already been passed to another
> process?
>
>> OK, I will have a look at the code and get back to you.
>>
> OK. Looking forward to your reply.

Looks good overall. Just a few comments.

Thanks

>
> Thanks,
> Yongji
>