Use dmabufs for display updates instead of pixman

[RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Vivek Kasireddy 3 years, 2 months ago

This is still a WIP/RFC patch that attempts to use dmabufs for display
updates with the help of the udmabuf driver instead of pixman. It is
posted to the ML to elicit feedback and to start a discussion on whether
something like this would be useful, mainly for non-Virgl rendered BOs
but potentially in other cases as well.

This patch was tested to work OK with Weston version from here:
https://gitlab.freedesktop.org/Vivek/weston/-/blob/virtgpu_dmabuf/libweston/backend-drm/drm-gbm.c#L282

and Qemu launched with these options:
qemu-system-x86_64 -m 8192m .... -device virtio-gpu-pci,max_outputs=1 -display gtk,gl=on .....
-object memory-backend-memfd,id=mem1,size=8192M -machine memory-backend=mem1

TODO:
- Use Blob resources for getting meta-data such as modifier, format, etc.
- Test with Virgl rendered BOs to see if this can be used in that case.

Considerations/Challenges:
- One of the main concerns with using dmabufs is how to synchronize access
to them and this use-case is no different. If the Guest is running Weston,
then it could use a maximum of 4 color buffers but uses only 2 by default and
flips between them if it is not sharing the FBs with other plugins while
running with the drm backend. In this case, how do we make sure that Weston
and Qemu UI are not using the same buffer at any given time? This is
complicated by the fact that the toolkits (that Qemu UI uses) do not seem to
provide a way to wait for buffer events. For example, GTK does not apparently
provide a way to either wait for "send done" events or register a listener
for wl_buffer release events that native Wayland/Weston clients have access to.

- If we have Xorg running in the Guest, then it gets even more interesting as
Xorg in some cases does frontbuffer rendering (uses DRM_IOCTL_MODE_DIRTYFB).
The same challenge arises in this case as well to determine how to safely
destroy or reuse the buffer in the Guest while it might be used on the Host.

Some of the potential solutions for addressing the above challenges include
using synchronization primitives such as fences/sync objs in Qemu UI to
determine when a buffer/dmabuf is consumed by the Host display server/compositor
and hold up the vblank/flip done event until that time. But this one comes with
a performance concern as the Guest would not be able to queue up another flip
until the previous one finishes.
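
The fence-based option above can be sketched roughly as follows. This is
only an illustration, not code from the patch: it assumes the Host hands
back a release fence as a pollable fd (a sync_file fd reports POLLIN once
the fence has signalled), and the helper names fence_fd_wait() and
hold_flip_until_released() are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Wait for a fence fd to signal. Returns true if it signalled within
 * timeout_ms (a negative timeout waits forever), false on timeout or error.
 */
static bool fence_fd_wait(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    return ret > 0 && (pfd.revents & POLLIN);
}

/*
 * Hypothetical gate for the flip-done event: returns true once the
 * previous buffer is no longer in use, so the event may be sent.
 */
static bool hold_flip_until_released(int *release_fence_fd, int timeout_ms)
{
    bool done;

    if (*release_fence_fd < 0) {
        return true;            /* nothing pending for the previous buffer */
    }

    done = fence_fd_wait(*release_fence_fd, timeout_ms);
    if (done) {
        close(*release_fence_fd);
        *release_fence_fd = -1;
    }
    return done;
}
```

With a timeout of 0 the gate can be polled from an event loop instead of
blocking, which would avoid stalling the UI thread but does not remove
the concern that the Guest cannot queue the next flip early.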

Other options include caching 2 or more dmabufs and releasing the others but 
this may not be feasible without having to modify the Guest display server/
compositor to use all color buffers or create a new color buffer each time.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Vivek Kasireddy (1):
  virtio-gpu: Use dmabuf for display updates if possible instead of
    pixman

 hw/display/virtio-gpu.c        | 133 +++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-gpu.h |  12 +++
 2 files changed, 145 insertions(+)

-- 
2.26.2

Re: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by no-reply@patchew.org 3 years, 2 months ago

Patchew URL: https://patchew.org/QEMU/20210302080358.3095748-1-vivek.kasireddy@intel.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210302080358.3095748-1-vivek.kasireddy@intel.com
Subject: [RFC 0/1] Use dmabufs for display updates instead of pixman

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20210302080358.3095748-1-vivek.kasireddy@intel.com -> patchew/20210302080358.3095748-1-vivek.kasireddy@intel.com
Switched to a new branch 'test'
5509447 virtio-gpu: Use dmabuf for display updates if possible instead of pixman

=== OUTPUT BEGIN ===
ERROR: space prohibited between function name and open parenthesis '('
#65: FILE: hw/display/virtio-gpu.c:541:
+                       res->iov_cnt * sizeof (struct udmabuf_create_item));

ERROR: braces {} are necessary for all arms of this statement
#66: FILE: hw/display/virtio-gpu.c:542:
+    if (!create)
[...]

ERROR: space required after that ',' (ctx:VxV)
#92: FILE: hw/display/virtio-gpu.c:568:
+    modifier_lo = fourcc_mod_code(INTEL,I915_FORMAT_MOD_X_TILED) & 0xFFFFFFFF;
                                        ^

ERROR: braces {} are necessary for all arms of this statement
#182: FILE: hw/display/virtio-gpu.c:699:
+        if (!ret)
[...]

total: 4 errors, 0 warnings, 196 lines checked

Commit 550944737e2a (virtio-gpu: Use dmabuf for display updates if possible instead of pixman) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210302080358.3095748-1-vivek.kasireddy@intel.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
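
For reference, the three kinds of complaints above (space after sizeof,
missing braces, missing space after a comma) could be addressed along
these lines. This is a standalone sketch, not the patch itself: the
struct and macro definitions are local stand-ins that mirror
linux/udmabuf.h and drm_fourcc.h so that it builds without those
headers, and alloc_create_list() and split_modifier() are hypothetical
helpers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins mirroring linux/udmabuf.h. */
struct udmabuf_create_item {
    uint32_t memfd;
    uint32_t pad;
    uint64_t offset;
    uint64_t size;
};

struct udmabuf_create_list {
    uint32_t flags;
    uint32_t count;
    struct udmabuf_create_item list[];
};

/* Local stand-ins mirroring drm_fourcc.h. */
#define DRM_FORMAT_MOD_VENDOR_INTEL 0x01
#define fourcc_mod_code(vendor, val) \
    ((((uint64_t)DRM_FORMAT_MOD_VENDOR_##vendor) << 56) | \
     ((val) & 0x00ffffffffffffffULL))
#define I915_FORMAT_MOD_X_TILED fourcc_mod_code(INTEL, 1)

/* No space between sizeof and '(', braces on every arm. */
static struct udmabuf_create_list *alloc_create_list(uint32_t iov_cnt)
{
    struct udmabuf_create_list *create;

    create = calloc(1, sizeof(*create) +
                       iov_cnt * sizeof(struct udmabuf_create_item));
    if (!create) {
        return NULL;
    }

    create->count = iov_cnt;
    return create;
}

/* Split a 64-bit format modifier into two 32-bit halves. */
static void split_modifier(uint64_t modifier, uint32_t *lo, uint32_t *hi)
{
    *lo = modifier & 0xFFFFFFFF;
    *hi = modifier >> 32;
}
```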

Re: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Gerd Hoffmann 3 years, 2 months ago

On Tue, Mar 02, 2021 at 12:03:57AM -0800, Vivek Kasireddy wrote:
> This is still a WIP/RFC patch that attempts to use dmabufs for display
> updates with the help of Udmabuf driver instead of pixman. This patch
> is posted to the ML to elicit feedback and start a discussion whether
> something like this would be useful or not for mainly non-Virgl
> rendered BOs and also potentially in other cases.

Yes, it surely makes sense to go into that direction.
The patch as-is doesn't, it breaks the guest/host interface.
That's ok-ish for a quick proof-of-concept, but clearly not
merge-able.

> TODO:
> - Use Blob resources for getting meta-data such as modifier, format, etc.

That is pretty much mandatory.  Without blob resources there is no
concept of resources shared between host and guest in virtio-gpu,
all data is explicitly copied with transfer commands.

Which implies quite a bit of work because we don't have blob resource
support in qemu yet.

> - Test with Virgl rendered BOs to see if this can be used in that case.

That also opens up the question how to go forward with virtio-gpu in
general.  The object hierarchy we have right now (skipping pci + vga
variants for simplicity):

  TYPE_VIRTIO_GPU_BASE (abstract base)
   -> TYPE_VIRTIO_GPU (in-qemu implementation)
   -> TYPE_VHOST_USER_GPU (vhost-user implementation)

When compiled with opengl + virgl TYPE_VIRTIO_GPU has a virgl=on/off
property.  Having a single device is not ideal for modular builds,
because the hw-display-virtio-gpu.so module has a dependency on
ui-opengl.so so that is needed (due to symbol references) even for the
virgl=off case.  Also the code is a bit of a #ifdef mess.

I think we should split TYPE_VIRTIO_GPU into two devices.  Remove
virgl+opengl support from TYPE_VIRTIO_GPU.  Add a new
TYPE_VIRTIO_GPU_VIRGL, with either TYPE_VIRTIO_GPU or
TYPE_VIRTIO_GPU_BASE as parent (not sure which is easier), have all
opengl/virgl support code there.

I think when using opengl it makes sense to also require virgl, so we
can use the virglrenderer library to manage blob resources (even when
the actual rendering isn't done with virgl).  Also reduces the
complexity and test matrix.

Maybe it even makes sense to deprecate in-qemu virgl support and focus
exclusively on the vhost-user implementation, so we don't have to
duplicate all work for both implementations.

> Considerations/Challenges:
> - One of the main concerns with using dmabufs is how to synchronize access
> to them and this use-case is no different. If the Guest is running Weston,
> then it could use a maximum of 4 color buffers but uses only 2 by default and
> flips between them if it is not sharing the FBs with other plugins while
> running with the drm backend. In this case, how do we make sure that Weston
> and Qemu UI are not using the same buffer at any given time?

There is graphic_hw_gl_block + graphic_hw_gl_flushed for synchronization.
Right now this is only wired up in spice, and it is rather simple (just
stalls virgl rendering instead of providing per-buffer synchronization).
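
The "stall everything" behaviour described here, as opposed to
per-buffer tracking, can be modelled with a single counter. The sketch
below is a simplified, hypothetical model and not QEMU's actual data
structures or function signatures: commands queue up while at least one
block request is outstanding and are drained once the count drops back
to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a renderer that can be stalled. */
struct renderer_gate {
    int blocked;    /* outstanding block requests from the UI */
    int queued;     /* commands waiting while blocked */
    int processed;  /* commands executed so far */
};

/* Run all queued commands, but only if nobody is blocking the renderer. */
static void gate_flush_queue(struct renderer_gate *g)
{
    if (g->blocked == 0) {
        g->processed += g->queued;
        g->queued = 0;
    }
}

/* UI side: block while a frame is in flight, unblock once it is flushed. */
static void gate_block(struct renderer_gate *g, bool block)
{
    g->blocked += block ? 1 : -1;
    assert(g->blocked >= 0);
    gate_flush_queue(g);
}

/* Device side: a new command arrives from the guest. */
static void gate_submit(struct renderer_gate *g)
{
    g->queued++;
    gate_flush_queue(g);
}
```

Note that the gate knows nothing about which buffer is in use, so a
single busy frame holds back all rendering.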

> - If we have Xorg running in the Guest, then it gets even more interesting as
> Xorg in some cases does frontbuffer rendering (uses DRM_IOCTL_MODE_DIRTYFB).

Well, if the guest does frontbuffer rendering we can't do much about it
and have to live with rendering glitches I guess.

take care,
  Gerd

RE: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Kasireddy, Vivek 3 years, 2 months ago

Hi Gerd,

> Yes, it surely makes sense to go into that direction.
> The patch as-is doesn't, it breaks the guest/host interface.
> That's ok-ish for a quick proof-of-concept, but clearly not merge-able.
> 
> > TODO:
> > - Use Blob resources for getting meta-data such as modifier, format, etc.
> 
> That is pretty much mandatory.  Without blob resources there is no concept of resources
> shared between host and guest in virtio-gpu, all data is explicitly copied with transfer
> commands.
[Kasireddy, Vivek] My understanding of virtio-gpu and the concept of resources is still
fairly limited but are blob resources really needed for non-Virgl use-cases -- other than
something like a dmabuf/scanout blob that shares the meta-data such as modifier? I
thought the main motivation for blob resources would be to avoid the explicit copy you
mentioned for Virgl workloads.

> 
> Which implies quite a bit of work because we don't have blob resource support in qemu
> yet.
[Kasireddy, Vivek] I was scrubbing through old mailing list messages to understand the
motivation behind blob resources as to why they are needed and came across this:
https://gitlab.freedesktop.org/virgl/qemu/-/commits/virtio-gpu-next

Does your work above not count for anything?

> 
> > - Test with Virgl rendered BOs to see if this can be used in that case.
> 
> That also opens up the question how to go forward with virtio-gpu in general.  The object
> hierarchy we have right now (skipping pci + vga variants for simplicity):
> 
>   TYPE_VIRTIO_GPU_BASE (abstract base)
>    -> TYPE_VIRTIO_GPU (in-qemu implementation)
>    -> TYPE_VHOST_USER_GPU (vhost-user implementation)
> 
> When compiled with opengl + virgl TYPE_VIRTIO_GPU has a virgl=on/off property.
> Having a single device is not ideal for modular builds,
> because the hw-display-virtio-gpu.so module has a dependency on ui-opengl.so so that is
> needed (due to symbol references) even for the virgl=off case.  Also the code is a bit of a
> #ifdef mess.
> 
> I think we should split TYPE_VIRTIO_GPU into two devices.  Remove
> virgl+opengl support from TYPE_VIRTIO_GPU.  Add a new
> TYPE_VIRTIO_GPU_VIRGL, with either TYPE_VIRTIO_GPU or
> TYPE_VIRTIO_GPU_BASE as parent (not sure which is easier), have all opengl/virgl
> support code there.
> 
> I think when using opengl it makes sense to also require virgl, so we can use the
> virglrenderer library to manage blob resources (even when the actual rendering isn't done
> with virgl).  Also reduces the complexity and test matrix.
[Kasireddy, Vivek] When you say "using opengl" are you referring to the presentation of
the rendered buffer via dmabuf or pixman? If yes, I am not sure why this would need to
depend on Virgl. For our use-case(s) where we are using virtio-gpu in buffer sharing mode,
we'd still need opengl for submitting the dmabuf to UI, IIUC.

> 
> Maybe it even makes sense to deprecate in-qemu virgl support and focus exclusively on
> the vhost-user implementation, so we don't have to duplicate all work for both
> implementations.
[Kasireddy, Vivek] Is the vhost-user implementation better in terms of performance, generally? 

> > case, how do we make sure that Weston and Qemu UI are not using the same buffer at
> any given time?
> 
> There is graphic_hw_gl_block + graphic_hw_gl_flushed for synchronization.
> Right now this is only wired up in spice, and it is rather simple (just stalls virgl rendering
> instead of providing per-buffer synchronization).
[Kasireddy, Vivek] I guess that might work for Virgl rendering but not for our use-case. What
we need is a way to tell whether the previously submitted dmabuf has been consumed by the Host
compositor before we release/close it. Weston (wl_buffer.release event and fences)
and EGL (sync and fences) do provide a few options but I am not sure whether GTK lets us use
any of them. Any recommendations? EGLSync objects?

On a different note, any particular reason why Qemu UI EGL implementation is limited to Xorg
and not extended to Wayland/Weston for which there is GTK glarea?

Thanks,
Vivek

> 
> take care,
>   Gerd

Re: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Gerd Hoffmann 3 years, 2 months ago

  Hi,

> > That is pretty much mandatory.  Without blob resources there is no concept of resources
> > shared between host and guest in virtio-gpu, all data is explicitly copied with transfer
> > commands.
> [Kasireddy, Vivek] My understanding of virtio-gpu and the concept of resources is still
> fairly limited but are blob resources really needed for non-Virgl use-cases -- other than
> something like a dmabuf/scanout blob that shares the meta-data such as modifier? I
> thought the main motivation for blob resources would be to avoid the explicit copy you
> mentioned for Virgl workloads.

Well, you want to avoid the copy as well, right?  With blob resources you
can do that in a well defined way, i.e. the guest knows what you are
doing and behaves accordingly.  Without blob resources you can't, at
least not without violating the guest's expectation that any changes it
makes are only visible to the host after an explicit transfer (aka copy)
command.

> > Which implies quite a bit of work because we don't have blob resource support in qemu
> > yet.
> [Kasireddy, Vivek] I was scrubbing through old mailing list messages to understand the
> motivation behind blob resources as to why they are needed and came across this:
> https://gitlab.freedesktop.org/virgl/qemu/-/commits/virtio-gpu-next
> 
> Does your work above not count for anything?

It is quite old, and I think not up-to-date with the final revision of
the blob resource specification.  I wouldn't be able to update this in
near future due to being busy with other projects.  Feel free to grab
& update & submit these patches though.

> > I think when using opengl it makes sense to also require virgl, so we can use the
> > virglrenderer library to manage blob resources (even when the actual rendering isn't done
> > with virgl).  Also reduces the complexity and test matrix.
> [Kasireddy, Vivek] When you say "using opengl" are you referring to the presentation of
> the rendered buffer via dmabuf or pixman? If yes, I am not sure why this would need to
> depend on Virgl.

Well, you can probably do it without virgl as well.  But why?  Why
effectively duplicate the blob resource management bits in qemu instead
of just using the virglrenderer library?

Besides the code duplication this is also a maintenance issue.  This
adds one more configuration to virtio-gpu.  Right now you can build
virtio-gpu with virgl (depends on opengl), or you can build without
virgl (doesn't use opengl then).  I don't think it is a good idea to
add a third mode, without virgl support but using opengl for blob
dma-bufs.

> For our use-case(s) where we are using virtio-gpu in buffer sharing mode,
> we'd still need opengl for submitting the dmabuf to UI, IIUC.

Correct.  When you want to use dma-bufs you need opengl.

> > Maybe it even makes sense to deprecate in-qemu virgl support and focus exclusively on
> > the vhost-user implementation, so we don't have to duplicate all work for both
> > implementations.
> [Kasireddy, Vivek] Is the vhost-user implementation better in terms of performance, generally? 

It is better both in terms of security (it's easier to sandbox) and
performance.  The in-qemu implementation runs in the qemu iothread.
Which also handles a bunch of other jobs.  Also virglrenderer being busy
-- for example with compiling complex shaders -- can block qemu for a
while, which in turn can cause latency spikes in the guest.  With the
vhost-user implementation this is not a problem.

Drawback is the extra communication (and synchronization) needed between
vhost-user + qemu to make the guest display available via spice or gtk.

The latter can possibly be solved by exporting the guest display as
pipewire remote desktop (random idea I didn't investigate much yet).

> On a different note, any particular reason why Qemu UI EGL
> implementation is limited to Xorg and not extended to Wayland/Weston
> for which there is GTK glarea?

Well, ideally I'd love to just use glarea.  Which is what happens on wayland.

The problem with Xorg is that the gtk x11 backend uses glx not egl to
create an opengl context for glarea.  At least that used to be the case
in the past, maybe that has changed with newer versions.  qemu needs egl
contexts though, otherwise dma-bufs don't work.  So we are stuck with
our own egl widget implementation for now.  Probably we will be able to
drop it at some point in the future.

HTH,
  Gerd

RE: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Kasireddy, Vivek 3 years, 1 month ago

Hi Gerd,
Sorry for the delayed response. I wanted to wait until I finished my proof-of-concept --
that included adding synchronization --  to ask follow up questions.

> >
> > Does your work above not count for anything?
> 
> It is quite old, and I think not up-to-date with the final revision of the blob resource
> specification.  I wouldn't be able to update this in near future due to being busy with other
> projects.  Feel free to grab & update & submit these patches though.
[Kasireddy, Vivek] Sure, we'll take a look at your work and use that as a starting
point. Roughly, how much of your work can be reused?

Also, given my limited understanding of how discrete GPUs work, I was wondering how 
many copies would there need to be with blob resources/dmabufs and whether a zero-copy
goal would be feasible or not?

> 
> Besides the code duplication this is also a maintenance issue.  This adds one more
> configuration to virtio-gpu.  Right now you can build virtio-gpu with virgl (depends on
> opengl), or you can build without virgl (doesn't use opengl then).  I don't think it is a good
> idea to add a third mode, without virgl support but using opengl for blob dma-bufs.
[Kasireddy, Vivek] We'll have to re-visit this part but for our use-case with virtio-gpu, we
are disabling virglrenderer in Qemu and virgl DRI driver in the Guest. However, we still
need to use Opengl/EGL to convert the dmabuf (guest fb) to texture and render as part of
the UI/GTK updates. 

> 
> 
> > On a different note, any particular reason why Qemu UI EGL
> > implementation is limited to Xorg and not extended to Wayland/Weston
> > for which there is GTK glarea?
> 
> Well, ideally I'd love to just use glarea.  Which happens on wayland.
> 
> The problem with Xorg is that the gtk x11 backend uses glx not egl to create an opengl
> context for glarea.  At least that used to be the case in the past, maybe that has changed
> with newer versions.  qemu needs egl contexts though, otherwise dma-bufs don't work.  So
> we are stuck with our own egl widget implementation for now.  Probably we will be able
> to drop it at some point in the future.
[Kasireddy, Vivek] GTK X11 backend still uses GLX and it seems like that is not going to 
change anytime soon. Having said that, I was wondering if it makes sense to add a new
purely Wayland backend besides GtkGlArea so that Qemu UI can more quickly adopt new
features such as explicit sync. I was thinking about the new backend being similar to this:
https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-dmabuf-egl.c

The reason why I am proposing this idea is because even if we manage to add explicit 
sync support to GTK and it gets merged, upgrading Qemu GTK support from 3.22 
to > 4.x may prove to be daunting. Currently, the way I am doing explicit sync is
by adding these new APIs to GTK and calling them from Qemu:

static int
create_egl_fence_fd(EGLDisplay dpy)
{
        EGLSyncKHR sync = eglCreateSyncKHR(dpy,
                                           EGL_SYNC_NATIVE_FENCE_ANDROID,
                                           NULL);
        int fd;

        g_assert(sync != EGL_NO_SYNC_KHR);
        fd = eglDupNativeFenceFDANDROID(dpy, sync);
        g_assert(fd >= 0);

        eglDestroySyncKHR(dpy, sync);

        return fd;
}

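static void
wait_for_buffer_release_fence(EGLDisplay dpy)
{
        /* release_fence_fd is declared outside this function (not shown). */
        EGLint attrib_list[] = {
                EGL_SYNC_NATIVE_FENCE_FD_ANDROID, release_fence_fd,
                EGL_NONE,
        };
        EGLSyncKHR sync;

        /* Nothing to wait for if no release fence is pending. */
        if (release_fence_fd < 0) {
                return;
        }

        /* On success EGL takes ownership of the fd passed in attrib_list. */
        sync = eglCreateSyncKHR(dpy,
                                EGL_SYNC_NATIVE_FENCE_ANDROID,
                                attrib_list);
        g_assert(sync != EGL_NO_SYNC_KHR);

        release_fence_fd = -1;
        eglClientWaitSyncKHR(dpy, sync, 0,
                             EGL_FOREVER_KHR);
        eglDestroySyncKHR(dpy, sync);
}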

And, of course, I am tying the wait above to a dma_fence associated with the
previous guest FB that is signalled to ensure that the Host is done using the FB,
thereby providing explicit synchronization between Guest and Host. It seems to
work OK but I was wondering if you had any alternative ideas or suggestions
for doing explicit or implicit sync that are easier.

Lastly, on a different note, I noticed that there is a virtio-gpu Windows driver here:
https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/viogpu

We are going to try it out but do you know how up to date it is kept?


Thanks,
Vivek

Re: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Gerd Hoffmann 3 years, 1 month ago

On Wed, Mar 17, 2021 at 08:28:33AM +0000, Kasireddy, Vivek wrote:
> Hi Gerd,
> Sorry for the delayed response. I wanted to wait until I finished my proof-of-concept --
> that included adding synchronization --  to ask follow up questions.
> 
> > >
> > > Does your work above not count for anything?
> > 
> > It is quite old, and I think not up-to-date with the final revision of the blob resource
> > specification.  I wouldn't be able to update this in near future due to being busy with other
> > projects.  Feel free to grab & update & submit these patches though.
> [Kasireddy, Vivek] Sure, we'll take a look at your work and use that as a starting
> point. Roughly, how much of your work can be reused?

There are some small udmabuf support patches which can probably be
reused pretty much as-is.  Everything else needs larger changes I
suspect, but it's been a while since I looked at this ...

> Also, given my limited understanding of how discrete GPUs work, I was wondering how 
> many copies would there need to be with blob resources/dmabufs and whether a zero-copy
> goal would be feasible or not?

Good question.

Right now there are two copies (gtk ui):

  (1) guest ram -> DisplaySurface -> gtk widget (gl=off), or
  (2) guest ram -> DisplaySurface -> texture (gl=on).

You should be able to reduce this to one copy for gl=on ...

  (3) guest ram -> texture

... by taking DisplaySurface out of the picture, without any changes to
the guest/host interface.  Drawback is that it requires adding an opengl
dependency to virtio-gpu even with virgl=off, because the virtio-gpu
device will have to handle the copy to the texture then, in response to
guest TRANSFER commands.

When adding blob resource support:

Easiest is probably supporting VIRTIO_GPU_BLOB_MEM_GUEST (largely
identical to non-blob resources) with VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE
(allows the host to create a shared mapping).  Then you can go create a
udmabuf for the resource on the host side.  For the non-gl code path you
can mmap() the udmabuf (which gives you a linear mapping for the
scattered guest pages) and create a DisplaySurface backed by guest ram
pages (removing the guest ram -> DisplaySurface copy).  For the gl code
path you can create a texture backed by the udmabuf and go render on the
host without copying at all.
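
The host-side udmabuf step described above might look roughly like the
sketch below. It is an illustration under assumptions, not the eventual
QEMU code: guest RAM is assumed to be backed by a memfd that already
carries the F_SEAL_SHRINK seal that udmabuf requires, all offsets and
sizes are assumed to be page-aligned, and struct guest_chunk,
udmabuf_from_chunks() and udmabuf_map() are hypothetical names. Only the
UDMABUF_CREATE_LIST ioctl and its structures come from the kernel's
linux/udmabuf.h.

```c
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical description of one piece of a scattered guest resource. */
struct guest_chunk {
    uint64_t offset;   /* offset into the memfd backing guest RAM */
    uint64_t size;     /* length in bytes */
};

/*
 * Stitch the scattered chunks into a single dma-buf.
 * Returns the dma-buf fd, or -1 on failure (for example when
 * /dev/udmabuf is not available).
 */
static int udmabuf_from_chunks(int memfd, const struct guest_chunk *chunks,
                               uint32_t count)
{
    struct udmabuf_create_list *list;
    int dev, buf = -1;
    uint32_t i;

    list = calloc(1, sizeof(*list) +
                     count * sizeof(struct udmabuf_create_item));
    if (!list) {
        return -1;
    }

    list->flags = UDMABUF_FLAGS_CLOEXEC;
    list->count = count;
    for (i = 0; i < count; i++) {
        list->list[i].memfd = memfd;
        list->list[i].offset = chunks[i].offset;
        list->list[i].size = chunks[i].size;
    }

    dev = open("/dev/udmabuf", O_RDWR);
    if (dev >= 0) {
        buf = ioctl(dev, UDMABUF_CREATE_LIST, list);
        close(dev);
    }

    free(list);
    return buf;
}

/* Non-gl path: map the dma-buf to get one linear view of the chunks. */
static void *udmabuf_map(int dmabuf_fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, dmabuf_fd, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

For the gl path the returned fd would instead be imported as a texture,
so no mapping on the CPU side is needed.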

Using VIRTIO_GPU_BLOB_MEM_GUEST + VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE for
resources needs guest changes too, either in mesa (when using virgl) or
the kernel driver's dumb buffer handling (when not using virgl).

Alternatively (listed more for completeness):

You can create a blob resource with VIRTGPU_BLOB_MEM_HOST3D (requires
virgl, see also virgl_drm_winsys_resource_create_blob in mesa).  It will
be allocated by the host, then mapped into the guest using a virtual pci
memory bar.  Guest userspace (aka mesa driver) can mmap() these
resources and has direct, zero-copy access to the host resource.

Going to dma-buf export that, import into i915, then let the gpu render
implies we are doing p2p dma from a physical (pci-assigned) device to
the memory bar of a virtual pci device.

Doing that should be possible, but frankly I would be surprised if that
actually works out-of-the-box.  Dunno how many dragons are lurking here.
Could become an interesting challenge to make that fly.

> > Besides the code duplication this is also a maintenance issue.  This adds one more
> > configuration to virtio-gpu.  Right now you can build virtio-gpu with virgl (depends on
> > opengl), or you can build without virgl (doesn't use opengl then).  I don't think it is a good
> > idea to add a third mode, without virgl support but using opengl for blob dma-bufs.
> [Kasireddy, Vivek] We'll have to re-visit this part but for our use-case with virtio-gpu, we
> are disabling virglrenderer in Qemu and virgl DRI driver in the Guest. However, we still
> need to use Opengl/EGL to convert the dmabuf (guest fb) to texture and render as part of
> the UI/GTK updates. 

Well, VIRTGPU_BLOB_MEM_HOST3D blob resources are created using virgl
renderer commands (VIRGL_CCMD_PIPE_RESOURCE_CREATE).  So supporting that
without virglrenderer is not an option.

VIRTIO_GPU_BLOB_MEM_GUEST might be possible without too much effort.

> > > On a different note, any particular reason why Qemu UI EGL
> > > implementation is limited to Xorg and not extended to Wayland/Weston
> > > for which there is GTK glarea?
> > 
> > Well, ideally I'd love to just use glarea.  Which happens on wayland.
> > 
> > The problem with Xorg is that the gtk x11 backend uses glx not egl to create an opengl
> > context for glarea.  At least that used to be the case in the past, maybe that has changed
> > with newer versions.  qemu needs egl contexts though, otherwise dma-bufs don't work.  So
> > we are stuck with our own egl widget implementation for now.  Probably we will be able
> > to drop it at some point in the future.

> [Kasireddy, Vivek] GTK X11 backend still uses GLX and it seems like that is not going to 
> change anytime soon.

Hmm, so the egl backend has to stay for the time being.

> Having said that, I was wondering if it makes sense to add a new
> purely Wayland backend besides GtkGlArea so that Qemu UI can more quickly adopt new
> features such as explicit sync. I was thinking about the new backend being similar to this:
> https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-dmabuf-egl.c

I'd prefer to not do that.

> The reason why I am proposing this idea is because even if we manage to add explicit 
> sync support to GTK and it gets merged, upgrading Qemu GTK support from 3.22 
> to > 4.x may prove to be daunting. Currently, the way I am doing explicit sync is
> by adding these new APIs to GTK and calling them from Qemu:

Well, we had the same code supporting gtk2+3 with #ifdefs.  There are
also #ifdefs to avoid using functions deprecated during 3.x lifetime.
So I expect porting to gtk4 wouldn't be too bad.

Also I expect qemu wouldn't be the only application needing sync
support, so trying to get that integrated with upstream gtk certainly
makes sense.

> Lastly, on a different note, I noticed that there is a virtio-gpu Windows driver here:
> https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/viogpu
> 
> We are going to try it out but do you know how up to date it is kept?

No, not following development closely.

take care,
  Gerd

RE: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Kasireddy, Vivek 3 years, 1 month ago

Hi Gerd,
Thank you for taking the time to explain how support for blob resources needs
to be added. We are going to get started soon and here are the tasks we are
planning to do in order of priority:

1) Add support for VIRTIO_GPU_BLOB_MEM_GUEST +
VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE
2) Upgrade Qemu GTK UI from 3.22 to 4.x
3) Add explicit sync support to GTK4 and Qemu UI
4) Add support for VIRTGPU_BLOB_MEM_HOST3D 

We'll start sending patches as we go along.

Thanks,
Vivek


> > [Kasireddy, Vivek] Sure, we'll take a look at your work and use that
> > as a starting point. Roughly, how much of your work can be reused?
> 
> There are some small udmabuf support patches which can probably be reused pretty much
> as-is.  Everything else needs larger changes I suspect, but it's been a while I looked at this
> ...
> 
> > Also, given my limited understanding of how discrete GPUs work, I was
> > wondering how many copies would there need to be with blob
> > resources/dmabufs and whether a zero-copy goal would be feasible or not?
> 
> Good question.
> 
> Right now there are two copies (gtk ui):
> 
>   (1) guest ram -> DisplaySurface -> gtk widget (gl=off), or
>   (2) guest ram -> DisplaySurface -> texture (gl=on).
> 
> You should be able to reduce this to one copy for gl=on ...
> 
>   (3) guest ram -> texture
> 
> ... by taking DisplaySurface out of the picture, without any changes to the guest/host
> interface.  Drawback is that it requires adding an opengl dependency to virtio-gpu even
> with virgl=off, because the virtio-gpu device will have to handle the copy to the texture
> then, in response to guest TRANSFER commands.
> 
> When adding blob resource support:
> 
> Easiest is probably supporting VIRTIO_GPU_BLOB_MEM_GUEST (largely identical to
> non-blob resources) with VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE
> (allows the host to create a shared mapping).  Then you can go create a udmabuf for the
> resource on the host side.  For the non-gl code path you can mmap() the udmabuf (which
> gives you a linear mapping for the scattered guest pages) and create a DisplaySurface
> backed by guest ram pages (removing the guest ram -> DisplaySurface copy).  For the gl
> code path you can create a texture backed by the udmabuf and go render on the host
> without copying at all.
> 
> Using VIRTIO_GPU_BLOB_MEM_GUEST +
> VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE for resources needs guest changes too,
> either in mesa (when using virgl) or the kernel driver's dumb buffer handling (when not
> using virgl).
> 
> Alternatively (listed more for completeness):
> 
> You can create a blob resource with VIRTGPU_BLOB_MEM_HOST3D (requires virgl,
> see also virgl_drm_winsys_resource_create_blob in mesa).  It will be allocated by the
> host, then mapped into the guest using a virtual pci memory bar.  Guest userspace (aka
> mesa driver) can mmap() these resources and has direct, zero-copy access to the host
> resource.
> 
> Going to dma-buf export that, import into i915, then let the gpu render implies we are
> doing p2p dma from a physical (pci-assigned) device to the memory bar of a virtual pci
> device.
> 
> Doing that should be possible, but frankly I would be surprised if that actually works
> out-of-the-box.  Dunno how many dragons are lurking here.
> Could become an interesting challenge to make that fly.
> 
> > > Besides the code duplication this is also a maintenance issue.  This
> > > adds one more configuration to virtio-gpu.  Right now you can build
> > > virtio-gpu with virgl (depends on opengl), or you can build without
> > > virgl (doesn't use opengl then).  I don't think it is a good idea to add a third mode,
> > > without virgl support but using opengl for blob dma-bufs.
> > [Kasireddy, Vivek] We'll have to re-visit this part but for our
> > use-case with virtio-gpu, we are disabling virglrenderer in Qemu and
> > virgl DRI driver in the Guest. However, we still need to use
> > Opengl/EGL to convert the dmabuf (guest fb) to texture and render as part of the
> > UI/GTK updates.
> 
> Well, VIRTGPU_BLOB_MEM_HOST3D blob resources are created using virgl renderer
> commands (VIRGL_CCMD_PIPE_RESOURCE_CREATE).  So supporting that without
> virglrenderer is not an option.
> 
> VIRTIO_GPU_BLOB_MEM_GUEST might be possible without too much effort.
> 
> > > > On a different note, any particular reason why Qemu UI EGL
> > > > implementation is limited to Xorg and not extended to
> > > > Wayland/Weston for which there is GTK glarea?
> > >
> > > Well, ideally I'd love to just use glarea.  Which happens on wayland.
> > >
> > > The problem with Xorg is that the gtk x11 backend uses glx not egl
> > > to create an opengl context for glarea.  At least that used to be
> > > the case in the past, maybe that has changed with newer versions.
> > > qemu needs egl contexts though, otherwise dma-bufs don't work.  So
> > > we are stuck with our own egl widget implementation for now.  Probably we will be
> > > able to drop it at some point in the future.
> 
> > [Kasireddy, Vivek] GTK X11 backend still uses GLX and it seems like
> > that is not going to change anytime soon.
> 
> Hmm, so the egl backend has to stay for the time being.
> 
> > Having said that, I was wondering if it makes sense to add a new
> > purely Wayland backend besides GtkGlArea so that Qemu UI can more
> > quickly adopt new features such as explicit sync. I was thinking about the new backend
> > being similar to this:
> > https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-dmabuf-egl.c
> 
> I'd prefer to not do that.
> 
> > The reason why I am proposing this idea is because even if we manage
> > to add explicit sync support to GTK and it gets merged, upgrading Qemu
> > GTK support from 3.22 to > 4.x may prove to be daunting. Currently,
> > the way I am doing explicit sync is by adding these new APIs to GTK and calling them
> > from Qemu:
> 
> Well, we had the same code supporting gtk2+3 with #ifdefs.  There are also #ifdefs to
> avoid using functions deprecated during 3.x lifetime.
> So I expect porting to gtk4 wouldn't be too bad.
> 
> Also I expect qemu wouldn't be the only application needing sync support, so trying to get
> that integrated with upstream gtk certainly makes sense.
> 
> > Lastly, on a different note, I noticed that there is a virtio-gpu Windows driver here:
> > https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/viogpu
> >
> > We are going to try it out but do you know how up to date it is kept?
> 
> No, not following development closely.
> 
> take care,
>   Gerd

RE: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Zhang, Tina 3 years, 1 month ago


> -----Original Message-----
> From: Qemu-devel <qemu-devel-bounces+tina.zhang=intel.com@nongnu.org>
> On Behalf Of Gerd Hoffmann
> Sent: Tuesday, March 2, 2021 8:04 PM
> To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>; Kim, Dongwon
> <dongwon.kim@intel.com>; qemu-devel@nongnu.org; Marc-André Lureau
> <marcandre.lureau@redhat.com>
> Subject: Re: [RFC 0/1] Use dmabufs for display updates instead of pixman
> 
> On Tue, Mar 02, 2021 at 12:03:57AM -0800, Vivek Kasireddy wrote:
> > This is still a WIP/RFC patch that attempts to use dmabufs for display
> > updates with the help of Udmabuf driver instead of pixman. This patch
> > is posted to the ML to elicit feedback and start a discussion whether
> > something like this would be useful or not for mainly non-Virgl
> > rendered BOs and also potentially in other cases.
> 
> Yes, it surely makes sense to go into that direction.
> The patch as-is doesn't, it breaks the guest/host interface.
> That's ok-ish for a quick proof-of-concept, but clearly not merge-able.

Hi,
Following the proposal in https://lore.kernel.org/dri-devel/20210212110140.gdpu7kapnr7ovdcn@sirius.home.kraxel.org/, we have made some progress on a 'virtio-gpu (display) + pass-through GPU' prototype. We leverage the kmsro framework provided by mesa to let the virtio-gpu display work with a passed-through GPU in headless mode. The MR is here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

Although our work is different from this on-going discussion which is about enabling a general way to share buffers between guest and host, we'd like to leverage this patch. So, is there any plan to refine this patch? E.g. move the uuid blob support into another patch, as the implementation of the proposal doesn't require guest user space to share buffers with host side, and also maybe add the dma-buf support for cursor plane. Thanks.

BR,
Tina

> 
> > TODO:
> > - Use Blob resources for getting meta-data such as modifier, format, etc.
> 
> That is pretty much mandatory.  Without blob resources there is no concept of
> resources shared between host and guest in virtio-gpu, all data is explicitly
> copied with transfer commands.
> 
> Which implies quite a bit of work because we don't have blob resource support
> in qemu yet.
> 
> > - Test with Virgil rendered BOs to see if this can be used in that case..
> 
> That also opens up the question how to go forward with virtio-gpu in general.
> The object hierarchy we have right now (skipping pci + vga variants for
> simplicity):
> 
>   TYPE_VIRTIO_GPU_BASE (abstract base)
>    -> TYPE_VIRTIO_GPU (in-qemu implementation)
>    -> TYPE_VHOST_USER_GPU (vhost-user implementation)
> 
> When compiled with opengl + virgl TYPE_VIRTIO_GPU has a virgl=on/off
> property.  Having a single device is not ideal for modular builds,
> because the hw-display-virtio-gpu.so module has a dependency on ui-opengl.so,
> so that is needed (due to symbol references) even for the virgl=off case.  Also
> the code is a bit of a #ifdef mess.
> 
> I think we should split TYPE_VIRTIO_GPU into two devices.  Remove
> virgl+opengl support from TYPE_VIRTIO_GPU.  Add a new
> TYPE_VIRTIO_GPU_VIRGL, with either TYPE_VIRTIO_GPU or
> TYPE_VIRTIO_GPU_BASE as parent (not sure which is easier), have all
> opengl/virgl support code there.
> 
> I think when using opengl it makes sense to also require virgl, so we can use the
> virglrenderer library to manage blob resources (even when the actual rendering
> isn't done with virgl).  Also reduces the complexity and test matrix.
> 
> Maybe it even makes sense to deprecate in-qemu virgl support and focus
> exclusively on the vhost-user implementation, so we don't have to duplicate all
> work for both implementations.
> 
> > Considerations/Challenges:
> > - One of the main concerns with using dmabufs is how to synchronize
> > access to them and this use-case is no different. If the Guest is
> > running Weston, then it could use a maximum of 4 color buffers but
> > uses only 2 by default and flips between them if it is not sharing the
> > FBs with other plugins while running with the drm backend. In this
> > case, how do we make sure that Weston and Qemu UI are not using the same
> > buffer at any given time?
> 
> There is graphic_hw_gl_block + graphic_hw_gl_flushed for synchronization.
> Right now this is only wired up in spice, and it is rather simple (just stalls virgl
> rendering instead of providing per-buffer synchronization).
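
To make the "per-buffer" part concrete, here is a rough sketch of what such tracking could look like. It is not QEMU code and does not use graphic_hw_gl_block; every name in it (example_swapchain, example_submit, ...) is made up for illustration, and the buffer count of 2 is just Weston's default as mentioned in the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; none of this is QEMU API. */
enum example_owner { EXAMPLE_OWNER_GUEST, EXAMPLE_OWNER_HOST };

#define EXAMPLE_NUM_BUFFERS 2   /* assumed: Weston's default buffer count */

struct example_swapchain {
    enum example_owner owner[EXAMPLE_NUM_BUFFERS];
};

/* Initially the guest owns every buffer. */
static void example_swapchain_init(struct example_swapchain *sc)
{
    for (size_t i = 0; i < EXAMPLE_NUM_BUFFERS; i++) {
        sc->owner[i] = EXAMPLE_OWNER_GUEST;
    }
}

/*
 * The guest flips to buffer idx and hands it to the host UI.
 * Returns false if the host still holds that buffer, in which
 * case the guest would have to wait before reusing it.
 */
static bool example_submit(struct example_swapchain *sc, size_t idx)
{
    assert(idx < EXAMPLE_NUM_BUFFERS);
    if (sc->owner[idx] == EXAMPLE_OWNER_HOST) {
        return false;
    }
    sc->owner[idx] = EXAMPLE_OWNER_HOST;
    return true;
}

/* The host UI is done with buffer idx and gives it back to the guest. */
static void example_release(struct example_swapchain *sc, size_t idx)
{
    assert(idx < EXAMPLE_NUM_BUFFERS);
    sc->owner[idx] = EXAMPLE_OWNER_GUEST;
}

/* The guest may render into a buffer only while it owns it. */
static bool example_guest_may_render(const struct example_swapchain *sc,
                                     size_t idx)
{
    assert(idx < EXAMPLE_NUM_BUFFERS);
    return sc->owner[idx] == EXAMPLE_OWNER_GUEST;
}
```

The hard part discussed in this thread is not the bookkeeping but getting the release event from the toolkit, which is exactly what example_release() glosses over.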
> 
> > - If we have Xorg running in the Guest, then it gets even more
> > interesting as Xorg in some cases does frontbuffer rendering (uses
> > DRM_IOCTL_MODE_DIRTYFB).
> 
> Well, if the guest does frontbuffer rendering we can't do much about it and have
> to live with rendering glitches I guess.
> 
> take care,
>   Gerd
>

Re: [RFC 0/1] Use dmabufs for display updates instead of pixman

Posted by Gerd Hoffmann 3 years, 1 month ago

> Hi,

> According to
> https://lore.kernel.org/dri-devel/20210212110140.gdpu7kapnr7ovdcn@sirius.home.kraxel.org/
> proposal, we made some progress on making a 'virtio-gpu (display) +
> pass-through GPU' prototype. We leverage the kmsro framework provided
> by mesa to let the virtio-gpu display work with a passed-through GPU
> in headless mode. And the MR is here:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

Cool.

> Although our work is different from this on-going discussion which is
> about enabling a general way to share buffers between guest and host,
> we'd like to leverage this patch. So, is there any plan to refine this
> patch?

Item (1) on Vivek's new TODO list should provide that.  Once we have
shared blob resources we can create udmabufs on the host side, which
in turn allows us to drop extra copies in the display path and speed up
this use case as well (both with and without opengl).
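
For reference, a minimal sketch of the host-side udmabuf step, assuming guest RAM lives in a memfd (as with memory-backend-memfd). The helper names are hypothetical and this is not the code from the patch; only the UDMABUF_CREATE ioctl and struct udmabuf_create come from the kernel's <linux/udmabuf.h>. A real implementation would probably use UDMABUF_CREATE_LIST to cover scattered guest pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/*
 * Hypothetical helper: create a memfd of `size` bytes that udmabuf will
 * accept. The driver requires F_SEAL_SHRINK to be set on the memfd.
 * Returns the fd, or -1 with errno set.
 */
static int example_sealed_memfd(uint64_t size)
{
    int fd = memfd_create("example-guest-ram", MFD_ALLOW_SEALING);

    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)size) < 0 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: wrap `size` bytes of the memfd, starting at
 * `offset`, in a dma-buf. Both values must be page aligned.
 * Returns the dma-buf fd, or -1 with errno set (for example when
 * /dev/udmabuf does not exist or is not accessible).
 */
static int example_udmabuf_from_memfd(int memfd, uint64_t offset,
                                      uint64_t size)
{
    struct udmabuf_create create = {
        .memfd  = (uint32_t)memfd,
        .flags  = UDMABUF_FLAGS_CLOEXEC,
        .offset = offset,
        .size   = size,
    };
    int devfd, dmabuf_fd, saved;

    devfd = open("/dev/udmabuf", O_RDWR);
    if (devfd < 0) {
        return -1;
    }
    dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);
    saved = errno;
    close(devfd);
    errno = saved;
    return dmabuf_fd;
}

/*
 * Hypothetical helper for the non-gl path: map the dma-buf to get one
 * linear view of the pages. Returns NULL on failure.
 */
static void *example_map_dmabuf(int dmabuf_fd, size_t size)
{
    void *ptr;

    assert(dmabuf_fd >= 0);
    ptr = mmap(NULL, size, PROT_READ, MAP_SHARED, dmabuf_fd, 0);
    return ptr == MAP_FAILED ? NULL : ptr;
}
```

In the gl path the dma-buf fd would be imported as a texture instead of being mapped; that part is left out here.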

take care,
  Gerd