[v1] make display updates thread safe.

[Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Gerd Hoffmann 8 years, 10 months ago

  Hi,

First attempt on making display updates thread-save for real.  Most
interesting patches at this point are #2 (adds helper functions to
create and use a dirty bitmap copy) and #3 (updates vga code to use
them).

Patch #1 fixes a bug I've noticed while wading through the vga code,
and #4 removes the temporary workaround for testing purposes.  Which
will of course break some or all non-vga display adapters as those
are not ported over yet.

git branch is available here:
  git://git.kraxel.org/qemu work/vga-fixes

Gerd Hoffmann (4):
  vga: add vga_scanline_invalidated helper
  memory: add support getting and using a dirty bitmap copy.
  vga: make display updates thread safe.
  [testing] console: remove do_safe_dpy_refresh

 exec.c                  | 79 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/display/vga.c        | 50 +++++++++++++++++--------------
 include/exec/memory.h   | 13 ++++++++
 include/exec/ram_addr.h |  8 +++++
 include/qemu/typedefs.h |  1 +
 memory.c                | 15 ++++++++++
 ui/console.c            | 25 +---------------
 7 files changed, 145 insertions(+), 46 deletions(-)

-- 
1.8.3.1

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Mark Cave-Ayland 8 years, 10 months ago

On 30/03/17 07:55, Gerd Hoffmann wrote:

>   Hi,
> 
> First attempt on making display updates thread-save for real.  Most
> interesting patches at this point are #2 (adds helper functions to
> create and use a dirty bitmap copy) and #3 (updates vga code to use
> them).
> 
> Patch #1 fixes a bug I've noticed while wading through the vga code,
> and #4 removes the temporary workaround for testing purposes.  Which
> will of course break some or all non-vga display adapters as those
> are not ported over yet.
> 
> git branch is available here:
>   git://git.kraxel.org/qemu work/vga-fixes
> 
> Gerd Hoffmann (4):
>   vga: add vga_scanline_invalidated helper
>   memory: add support getting and using a dirty bitmap copy.
>   vga: make display updates thread safe.
>   [testing] console: remove do_safe_dpy_refresh
> 
>  exec.c                  | 79 +++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/display/vga.c        | 50 +++++++++++++++++--------------
>  include/exec/memory.h   | 13 ++++++++
>  include/exec/ram_addr.h |  8 +++++
>  include/qemu/typedefs.h |  1 +
>  memory.c                | 15 ++++++++++
>  ui/console.c            | 25 +---------------
>  7 files changed, 145 insertions(+), 46 deletions(-)

Excellent! I can help out with converting and/or testing the SPARC
devices (cg3/tcx) if required.

Out of interest, from your work do you have a rough estimate as to how
this affects guest performance? For example benchmarks with Jan's
original commit to reduce the locking vs. Alex's current workaround vs.
with this patchset applied?


ATB,

Mark.

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Gerd Hoffmann 8 years, 10 months ago

  Hi,

> Excellent! I can help out with converting and/or testing the SPARC
> devices (cg3/tcx) if required.

Sure, test results and patches are very welcome.

I've touched dirty bitmap code for the first time, so I've sent out this
rfc for early feedback at the approach taken before going out converting
more display drivers.  Seems nobody found patch #2 horrible so far, I
take that as good sign that we can go on with this draft API.

> Out of interest, from your work do you have a rough estimate as to how
> this affects guest performance? For example benchmarks with Jan's
> original commit to reduce the locking vs. Alex's current workaround vs.
> with this patchset applied?

I expect this improves performance with mttcg because the locking goes
away, which allows more concurrency, probably more visible with more
guest cores.  Didn't benchmark it though.

cheers,
  Gerd

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Mark Cave-Ayland 8 years, 10 months ago

On 30/03/17 14:41, Gerd Hoffmann wrote:

>   Hi,
> 
>> Excellent! I can help out with converting and/or testing the SPARC
>> devices (cg3/tcx) if required.
> 
> Sure, test results and patches are very welcome.

I've spent a bit of time over the weekend attempting to convert the
SPARC CG3/TCX framebuffers over to use your new code and pushed the
result to my github branch here:
https://github.com/mcayland/qemu/commits/vga-fixes-sparc.

Feel free to steal/test/borrow as required. The two main things I've
discovered to date are:

1) Problems with asserts() in cpu_physical_memory_snapshot_get_dirty()

Using your current git branch, I get an assert() using VGA under
qemu-system-ppc on startup unless I change the first assert to
"assert(start >= snap->start)".

I'm also not convinced by the second assert "assert(start + length <
snap->end)" either. In the example CG3/TCX conversions I ended up having
to subtract 1 from the size of the region passed to
memory_region_snapshot_get_dirty() in order to avoid it firing when
accessing the last scanline which seems wrong. I didn't explicitly add a
patch to change this to "<=" in my git patchset just in case it broke
further assumptions in the bitmap code relating to the end address.

2) Redraw issues with CG3/TCX after conversion

Despite the conversion from your patches looking reasonably
straightforward, I still see problems with areas of the screen not
updating after conversion. I can easily reproduce these locally as follows:

./qemu-system-sparc -vga cg3 -prom-env 'auto-boot?=false'
./qemu-system-sparc -vga tcx -g 1024x768x8 -prom-env 'auto-boot?=false'
./qemu-system-sparc -vga tcx -g 1024x768x24 -prom-env 'auto-boot?=false'

You can trigger extra screen updates by typing "show-devs" into OpenBIOS
which will cause a lot of screen activity as it writes the device tree
out to the framebuffer. I'm wondering if maybe this is related to the
difference in sizes between DIRTY_MEMORY_BLOCK_SIZE and TARGET_PAGE_BITS
for qemu-system-sparc?

Finally on a slightly unrelated note, the TCX driver still has old code
for 8 and 16 bpp surfaces in QEMU. Am I right in understanding that all
surfaces in QEMU are now 32-bit and the 8/16 bpp code can be removed?

ATB,

Mark.

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Gerd Hoffmann 8 years, 10 months ago

  Hi,

> I've spent a bit of time over the weekend attempting to convert the
> SPARC CG3/TCX framebuffers over to use your new code and pushed the
> result to my github branch here:
> https://github.com/mcayland/qemu/commits/vga-fixes-sparc.

work/vga-fixes branch updated & rebased to latest master, with your
patches added.

> Using your current git branch, I get an assert() using VGA under
> qemu-system-ppc on startup unless I change the first assert to
> "assert(start >= snap->start)".
> 
> I'm also not convinced by the second assert "assert(start + length <
> snap->end)" either. In the example CG3/TCX conversions I ended up having
> to subtract 1 from the size of the region passed to
> memory_region_snapshot_get_dirty() in order to avoid it firing when
> accessing the last scanline which seems wrong. I didn't explicitly add a
> patch to change this to "<=" in my git patchset just in case it broke
> further assumptions in the bitmap code relating to the end address.

Both asserts should use inclusive compare.  Added a patch fixing the
other assert too.

> 2) Redraw issues with CG3/TCX after conversion
> 
> Despite the conversion from your patches looking reasonably
> straightforward, I still see problems with areas of the screen not
> updating after conversion. I can easily reproduce these locally as follows:
> 
> ./qemu-system-sparc -vga cg3 -prom-env 'auto-boot?=false'
> ./qemu-system-sparc -vga tcx -g 1024x768x8 -prom-env 'auto-boot?=false'
> ./qemu-system-sparc -vga tcx -g 1024x768x24 -prom-env 'auto-boot?=false'

Hmm, I have a blank screen only (yellow with cg3, black with tcx).
No openfirmware prompt showing up ...

> Finally on a slightly unrelated note, the TCX driver still has old code
> for 8 and 16 bpp surfaces in QEMU. Am I right in understanding that all
> surfaces in QEMU are now 32-bit and the 8/16 bpp code can be removed?

Yes, that is correct.  tcx uses qemu_resize_console() so surfaces will
always be 32bpp.

cheers,
  Gerd

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Mark Cave-Ayland 8 years, 10 months ago

On 03/04/17 09:44, Gerd Hoffmann wrote:

>   Hi,
> 
>> I've spent a bit of time over the weekend attempting to convert the
>> SPARC CG3/TCX framebuffers over to use your new code and pushed the
>> result to my github branch here:
>> https://github.com/mcayland/qemu/commits/vga-fixes-sparc.
> 
> work/vga-fixes branch updated & rebased to latest master, with your
> patches added.

I've just rebased on your branch with some extra fixes which should be
squashed into the earlier commits and pushed to github again - see the
commit messages for more information.

>> 2) Redraw issues with CG3/TCX after conversion
>>
>> Despite the conversion from your patches looking reasonably
>> straightforward, I still see problems with areas of the screen not
>> updating after conversion. I can easily reproduce these locally as follows:
>>
>> ./qemu-system-sparc -vga cg3 -prom-env 'auto-boot?=false'
>> ./qemu-system-sparc -vga tcx -g 1024x768x8 -prom-env 'auto-boot?=false'
>> ./qemu-system-sparc -vga tcx -g 1024x768x24 -prom-env 'auto-boot?=false'
> 
> Hmm, I have a blank screen only (yellow with cg3, black with tcx).
> No openfirmware prompt showing up ...

Yeah it's slightly baffling. The easiest one to look at is cg3 since
it's a simple 8-bit palette dumb framebuffer with 256 colors, i.e. the
differences between 8d3af4a~1 and 8d3af4a ("cg3: make display updates
thread safe") in your updated branch. If nothing seems wrong with the
conversion (once my latest fixes have been squashed in) then I presume
something odd is happening in the dirty bitmap code...

ATB,

Mark.

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Paolo Bonzini 8 years, 10 months ago


On 30/03/2017 15:41, Gerd Hoffmann wrote:
>   Hi,
> 
>> Excellent! I can help out with converting and/or testing the SPARC
>> devices (cg3/tcx) if required.
> 
> Sure, test results and patches are very welcome.
> 
> I've touched dirty bitmap code for the first time, so I've sent out this
> rfc for early feedback at the approach taken before going out converting
> more display drivers.  Seems nobody found patch #2 horrible so far, I
> take that as good sign that we can go on with this draft API.
> 
>> Out of interest, from your work do you have a rough estimate as to how
>> this affects guest performance? For example benchmarks with Jan's
>> original commit to reduce the locking vs. Alex's current workaround vs.
>> with this patchset applied?
> 
> I expect this improves performance with mttcg because the locking goes
> away, which allows more concurrency, probably more visible with more
> guest cores.  Didn't benchmark it though.

I checked the branch, is bitmap_copy_and_clear_atomic correct when you
have partial updates?  Maybe you need to handle partial updates of the
first and last word?  Also, the first and last bit should optionally be
left untouched, for example:

first y: 20
last y: 99
stride: 320
bytes read: 6400..31999
snapshotted pages: 4096..32767 (copying all of word 0 would do)
cleared dirty bits: 8192..28671 (only bits 1..14 must be zeroed)

Thanks,

Paolo

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Gerd Hoffmann 8 years, 10 months ago

  Hi,

> I checked the branch, is bitmap_copy_and_clear_atomic correct when you
> have partial updates?  Maybe you need to handle partial updates of the
> first and last word?

Should not be a problem.  We might clear some more bits, but these are
outsize the visible area so they should cause visible corruption (and if
the visible area changes the display code needs to do a full refresh
anyway).

cheers,
  Gerd

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Mark Cave-Ayland 8 years, 10 months ago

On 03/04/17 13:03, Gerd Hoffmann wrote:

>   Hi,
> 
>> I checked the branch, is bitmap_copy_and_clear_atomic correct when you
>> have partial updates?  Maybe you need to handle partial updates of the
>> first and last word?
> 
> Should not be a problem.  We might clear some more bits, but these are
> outsize the visible area so they should cause visible corruption (and if
> the visible area changes the display code needs to do a full refresh
> anyway).

Right. Also is there a reason that
cpu_physical_memory_snapshot_and_clear_dirty() uses BITS_PER_LEVEL when
calculating the alignment used for the first/last addresses? I would
have thought you can remove the align variable and use:

    ram_addr_t first = QEMU_ALIGN_DOWN(start, TARGET_PAGE_SIZE);
    ram_addr_t last  = QEMU_ALIGN_UP(start + length, TARGET_PAGE_SIZE);

Otherwise you end up expanding the range of your last address
considerably beyond the next page alignment. And also in the main page
loop why is BITS_PER_LEVEL used? I see that several of the internal
bitmap routines appear to use BITS_PER_LONG for compressing into the
bitmap which might be more appropriate?

ATB,

Mark.

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Gerd Hoffmann 8 years, 10 months ago

On Mo, 2017-04-03 at 13:24 +0100, Mark Cave-Ayland wrote:
> On 03/04/17 13:03, Gerd Hoffmann wrote:
> 
> >   Hi,
> > 
> >> I checked the branch, is bitmap_copy_and_clear_atomic correct when you
> >> have partial updates?  Maybe you need to handle partial updates of the
> >> first and last word?
> > 
> > Should not be a problem.  We might clear some more bits, but these are
> > outsize the visible area so they should cause visible corruption (and if
> > the visible area changes the display code needs to do a full refresh
> > anyway).
> 
> Right. Also is there a reason that
> cpu_physical_memory_snapshot_and_clear_dirty() uses BITS_PER_LEVEL when
> calculating the alignment used for the first/last addresses?

The code copy doesn't operate on bits but on bitmap longs, so I can just
copy the whole long instead of fiddeling with bits.  So I round down to
a bitmap long start.  On 64bit hosts that are units of 64 pages.

Actually querying the copied bitmap operates on page granularity of
course.

> Otherwise you end up expanding the range of your last address
> considerably beyond the next page alignment.

Yes, it's larger, but as it expands to the start of the long word I
expect the penalty is almost zero (same cache line) and being able to
just copy the whole long instead of dealing with individual bits should
be a win.

> And also in the main page
> loop why is BITS_PER_LEVEL used? I see that several of the internal
> bitmap routines appear to use BITS_PER_LONG for compressing into the
> bitmap which might be more appropriate?

shift by BITS_PER_LEVEL is the same as division by BITS_PER_LONG

cheers,
  Gerd

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Mark Cave-Ayland 8 years, 10 months ago

On 03/04/17 13:42, Gerd Hoffmann wrote:

> On Mo, 2017-04-03 at 13:24 +0100, Mark Cave-Ayland wrote:
>> On 03/04/17 13:03, Gerd Hoffmann wrote:
>>
>>>   Hi,
>>>
>>>> I checked the branch, is bitmap_copy_and_clear_atomic correct when you
>>>> have partial updates?  Maybe you need to handle partial updates of the
>>>> first and last word?
>>>
>>> Should not be a problem.  We might clear some more bits, but these are
>>> outsize the visible area so they should cause visible corruption (and if
>>> the visible area changes the display code needs to do a full refresh
>>> anyway).
>>
>> Right. Also is there a reason that
>> cpu_physical_memory_snapshot_and_clear_dirty() uses BITS_PER_LEVEL when
>> calculating the alignment used for the first/last addresses?
> 
> The code copy doesn't operate on bits but on bitmap longs, so I can just
> copy the whole long instead of fiddeling with bits.  So I round down to
> a bitmap long start.  On 64bit hosts that are units of 64 pages.
> 
> Actually querying the copied bitmap operates on page granularity of
> course.
> 
>> Otherwise you end up expanding the range of your last address
>> considerably beyond the next page alignment.
> 
> Yes, it's larger, but as it expands to the start of the long word I
> expect the penalty is almost zero (same cache line) and being able to
> just copy the whole long instead of dealing with individual bits should
> be a win.
> 
>> And also in the main page
>> loop why is BITS_PER_LEVEL used? I see that several of the internal
>> bitmap routines appear to use BITS_PER_LONG for compressing into the
>> bitmap which might be more appropriate?
> 
> shift by BITS_PER_LEVEL is the same as division by BITS_PER_LONG

Right hopefully I understand this now. The following diff appears to fix
CG3/TCX for me:

diff --git a/exec.c b/exec.c
index 000af18..66bdcf4 100644
--- a/exec.c
+++ b/exec.c
@@ -1071,7 +1071,7 @@ DirtyBitmapSnapshot
*cpu_physical_memory_snapshot_and_clear_dirty
      (ram_addr_t start, ram_addr_t length, unsigned client)
 {
     DirtyMemoryBlocks *blocks;
-    unsigned long align = 1 << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
+    unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
     ram_addr_t first = QEMU_ALIGN_DOWN(start, align);
     ram_addr_t last  = QEMU_ALIGN_UP(start + length, align);
     DirtyBitmapSnapshot *snap;
@@ -1097,7 +1097,6 @@ DirtyBitmapSnapshot
*cpu_physical_memory_snapshot_and_clear_dirty

         assert(QEMU_IS_ALIGNED(offset, (1 << BITS_PER_LEVEL)));
         assert(QEMU_IS_ALIGNED(num,    (1 << BITS_PER_LEVEL)));
-        offset >>= BITS_PER_LEVEL;

         bitmap_copy_and_clear_atomic(snap->dirty + dest,
                                      blocks->blocks[idx] + (offset >>
BITS_PER_LEVEL),


There were 2 issues here: without the UL suffix on align I was getting
incorrect first/last addresses since the high bits of align weren't
being cleared, and then offset appeared to be shifted twice.

Does this look reasonable to you?


ATB,

Mark.

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Gerd Hoffmann 8 years, 10 months ago

  Hi,

> -    unsigned long align = 1 << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
> +    unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);

> There were 2 issues here: without the UL suffix on align I was getting
> incorrect first/last addresses since the high bits of align weren't
> being cleared,

Ah, thanks, I'll add that.

> and then offset appeared to be shifted twice.

Yep, noticed that too meanwhile, fixed in the branch pushed half an hour
ago.  I've dropped the other shift though ;)

cheers,
  Gerd

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Mark Cave-Ayland 8 years, 10 months ago

On 04/04/17 07:12, Gerd Hoffmann wrote:

>   Hi,
> 
>> -    unsigned long align = 1 << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
>> +    unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
> 
>> There were 2 issues here: without the UL suffix on align I was getting
>> incorrect first/last addresses since the high bits of align weren't
>> being cleared,
> 
> Ah, thanks, I'll add that.
> 
>> and then offset appeared to be shifted twice.
> 
> Yep, noticed that too meanwhile, fixed in the branch pushed half an hour
> ago.  I've dropped the other shift though ;)

Confirmed!

The only minor nit I've noticed is that commit 322aef7 "cg3: fix up size
parameter for memory_region_get_dirty()" isn't quite right now that the
asserts() in cpu_physical_memory_snapshot_get_dirty() have now been
fixed - the size should now be "width" rather than "width - 1".

Other than that, I've just given it a quick spin across my SPARC CG3/TCX
and PPC VGA images and it looks good here :)


ATB,

Mark.

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Paolo Bonzini 8 years, 10 months ago

On 03/04/2017 14:03, Gerd Hoffmann wrote:
> We might clear some more bits, but these are
> outsize the visible area so they should cause visible corruption (and if
> the visible area changes the display code needs to do a full refresh
> anyway).

True, though this makes the snapshot abstraction a little more
specialized.  Please add a FIXME comment with the explanation, at least.

Paolo

Re: [Qemu-devel] [RfC PATCH 0/4] make display updates thread safe.

Posted by Gerd Hoffmann 8 years, 10 months ago

On Mo, 2017-04-03 at 19:02 +0200, Paolo Bonzini wrote:
> 
> On 03/04/2017 14:03, Gerd Hoffmann wrote:
> > We might clear some more bits, but these are
> > outsize the visible area so they should cause visible corruption (and if
> > the visible area changes the display code needs to do a full refresh
> > anyway).
> 
> True, though this makes the snapshot abstraction a little more
> specialized.  Please add a FIXME comment with the explanation, at least.

Sure, I'll add a note to the (to be written) docs.

cheers,
  Gerd