[PATCH v2 0/3] exclude hyperv synic sections from vhost

Dr. David Alan Gilbert (git) posted 3 patches 1 week ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>

[PATCH v2 0/3] exclude hyperv synic sections from vhost

Posted by Dr. David Alan Gilbert (git) 1 week ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hyperv's synic (that we emulate) is a feature that allows the guest
to place some magic (4k) pages of RAM anywhere it likes in GPA.
This confuses vhost's RAM section merging when these pages
land over the top of hugepages.

Since they're not normal RAM, and they shouldn't have vhost DMAing
into them, exclude them from the vhost set.

This v2 is a complete rework after the v1 review; I've now got
a flag on MemoryRegions that we set.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041


Dr. David Alan Gilbert (3):
  vhost: Add names to section rounded warning
  memory: Allow a MemoryRegion to be marked no_vhost
  hyperv/synic: Mark regions as no vhost

 hw/hyperv/hyperv.c    |  8 ++++++++
 hw/virtio/vhost.c     | 10 ++++++----
 include/exec/memory.h | 21 +++++++++++++++++++++
 memory.c              | 15 +++++++++++++++
 4 files changed, 50 insertions(+), 4 deletions(-)

-- 
2.24.1


Re: [PATCH v2 0/3] exclude hyperv synic sections from vhost

Posted by Roman Kagan 1 week ago
On Mon, Jan 13, 2020 at 05:36:44PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hyperv's synic (that we emulate) is a feature that allows the guest
> to place some magic (4k) pages of RAM anywhere it likes in GPA.
> This confuses vhost's RAM section merging when these pages
> land over the top of hugepages.
> 
> Since they're not normal RAM, and they shouldn't have vhost DMAing
> into them, exclude them from the vhost set.

I still don't think this is a correct assessment.  These pages are normal
RAM, perfectly eligible for DMA and whatnot.

It was a thinko to implement them this way, taking the Hyper-V spec too
literally.  Among the downsides is the excessive consumption of KVM
memslots, and unnecessary large page splits or conflicts with
unsplittable ones.  I'm working on an alternative approach that doesn't
suffer from these issues; struggling to preserve compatibility ATM.

Thanks,
Roman.

Re: [PATCH v2 0/3] exclude hyperv synic sections from vhost

Posted by Paolo Bonzini 1 week ago
On 13/01/20 18:36, Dr. David Alan Gilbert (git) wrote:
> 
> Hyperv's synic (that we emulate) is a feature that allows the guest
> to place some magic (4k) pages of RAM anywhere it likes in GPA.
> This confuses vhost's RAM section merging when these pages
> land over the top of hugepages.

Can you explain what the confusion looks like?  The memory API should just
tell vhost to treat it as three sections (RAM before synIC, synIC
region, RAM after synIC) and it's not clear to me why postcopy breaks
either.

Paolo

> Since they're not normal RAM, and they shouldn't have vhost DMAing
> into them, exclude them from the vhost set.


Re: [PATCH v2 0/3] exclude hyperv synic sections from vhost

Posted by Dr. David Alan Gilbert 1 week ago
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 13/01/20 18:36, Dr. David Alan Gilbert (git) wrote:
> > 
> > Hyperv's synic (that we emulate) is a feature that allows the guest
> > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > This confuses vhost's RAM section merging when these pages
> > land over the top of hugepages.
> 
> Can you explain what is the confusion like?  The memory API should just
> tell vhost to treat it as three sections (RAM before synIC, synIC
> region, RAM after synIC) and it's not clear to me why postcopy breaks
> either.

See my v3 I posted yesterday; I've made this a lot simpler by just
turning the alignment off for vhost-kernel and only enabling it for
vhost-user.  vhost-user skips any section without a backing fd anyway,
so the synic problem goes away, as does another problem reported by
Peter Lieven that seemed to be one of the VGA regions getting in the
way (which I'd not seen before).

Dave

> Paolo
> 
> > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > into them, exclude them from the vhost set.
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH v2 0/3] exclude hyperv synic sections from vhost

Posted by Dr. David Alan Gilbert 1 week ago
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 13/01/20 18:36, Dr. David Alan Gilbert (git) wrote:
> > 
> > Hyperv's synic (that we emulate) is a feature that allows the guest
> > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > This confuses vhost's RAM section merging when these pages
> > land over the top of hugepages.
> 
> Can you explain what is the confusion like?  The memory API should just
> tell vhost to treat it as three sections (RAM before synIC, synIC
> region, RAM after synIC) and it's not clear to me why postcopy breaks
> either.

There are two separate problems:
  a) For vhost-user there's a limited size for the 'mem table' message
     containing the number of regions to send; that's small - so an
     attempt is made to coalesce regions that all refer to the same
     underlying RAMBlock.  If things split the region up, you use more
     slots.  (It's why the coalescing code was originally there.)

  b) With postcopy + vhost-user life gets more complex because of
     userfault.  We require that the vhost-user client can mmap the
     memory areas on host page granularity (i.e. hugepage granularity
     if it's hugepage backed).  To do that we tweak the aggregation code
     to align the blocks to page size boundaries and then perform
     aggregation - as long as nothing else important gets in the way
     we're OK.
     In this case the guest is programming synic to land at the 512k
     boundary (as 16 separate 4k pages next to each other).  So we end
     up with 0-512k (stretched to 0..2MB alignment) - then we see the
     synic pages (4k each, starting at 512k ...), then we see RAM at
     640k - and when we try to align that we error out, because we
     realise the synic mapping is in the way and we can't merge the
     640k RAM chunk with the base 0-512k aligned chunk.

Note the reported failure here is kernel vhost, not vhost-user;
so actually it probably doesn't need the alignment, and vhost-user would
probably filter out the synic mappings anyway due to the fact they've
not got an fd (vhost_user_mem_section_filter).  But the alignment
code always runs.

Dave



> Paolo
> 
> > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > into them, exclude them from the vhost set.
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH v2 0/3] exclude hyperv synic sections from vhost

Posted by Michael S. Tsirkin 1 week ago
On Mon, Jan 13, 2020 at 06:58:30PM +0000, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
> > On 13/01/20 18:36, Dr. David Alan Gilbert (git) wrote:
> > > 
> > > Hyperv's synic (that we emulate) is a feature that allows the guest
> > > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > > This confuses vhost's RAM section merging when these pages
> > > land over the top of hugepages.
> > 
> > Can you explain what is the confusion like?  The memory API should just
> > tell vhost to treat it as three sections (RAM before synIC, synIC
> > region, RAM after synIC) and it's not clear to me why postcopy breaks
> > either.
> 
> There's two separate problems:
>   a) For vhost-user there's a limited size for the 'mem table' message
>      containing the number of regions to send; that's small - so an
>      attempt is made to coalesce regions that all refer to the same
>      underlying RAMblock.  If things split the region up you use more
>      slots. (it's why the coalescing code was originally there.)
> 
>   b) With postcopy + vhost-user life gets more complex because of
>      userfault.  We require that the vhost-user client can mmap the
>      memory areas on host page granularity (i.e. hugepage granularity
>      if it's hugepage backed).  To do that we tweak the aggregation code
>      to align the blocks to page size boundaries and then perform
>      aggregation - as long as nothing else important gets in the way
>      we're OK.
>      In this case the guest is programming synic to land at the 512k
>      boundary (in 16 separate 4k pages next to each other).  So we end
>      up with 0-512k (stretched to 0..2MB alignment) - then we see
>      synic (512k-+4k ...) then we see RAM at 640k - and when we try
>      to align that we error because we realise the synic mapping is in
>      the way and we can't merge the 640k ram chunk with the base 0-512k
>      aligned chunk.
> 
> Note the reported failure here is kernel vhost, not vhost-user;
> so actually it probably doesn't need the alignment,

Yeah, vhost in the kernel just does copy from/to user.  No alignment
requirements.

> and vhost-user would
> probably filter out the synic mappings anyway due to the fact they've
> not got an fd ( vhost_user_mem_section_filter ).  But the alignment
> code always runs.
> 
> Dave
> 
> 
> 
> > Paolo
> > 
> > > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > > into them, exclude them from the vhost set.
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK