[PATCH 0/2] exclude hyperv synic sections from vhost

Dr. David Alan Gilbert (git) posted 2 patches 4 years, 3 months ago
Test asan failed
Test docker-mingw@fedora passed
Test checkpatch passed
Test docker-quick@centos7 passed
Test FreeBSD passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20200108135353.75471-1-dgilbert@redhat.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>
There is a newer version of this series
hw/hyperv/hyperv.c | 14 ++++++++------
hw/virtio/vhost.c  |  1 +
2 files changed, 9 insertions(+), 6 deletions(-)
[PATCH 0/2] exclude hyperv synic sections from vhost
Posted by Dr. David Alan Gilbert (git) 4 years, 3 months ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hyperv's synic (that we emulate) is a feature that allows the guest
to place some magic (4k) pages of RAM anywhere it likes in GPA.
This confuses vhost's RAM section merging when these pages
land over the top of hugepages.

Since they're not normal RAM, and they shouldn't have vhost DMAing
into them, exclude them from the vhost set.

I do that by marking them as device-ram and then excluding device-ram
from vhost.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041

Dr. David Alan Gilbert (2):
  vhost: Don't pass ram device sections
  hyperv/synic: Allocate as ram_device

 hw/hyperv/hyperv.c | 14 ++++++++------
 hw/virtio/vhost.c  |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

-- 
2.24.1


Re: [PATCH 0/2] exclude hyperv synic sections from vhost
Posted by Vitaly Kuznetsov 4 years, 3 months ago
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Hyperv's synic (that we emulate) is a feature that allows the guest
> to place some magic (4k) pages of RAM anywhere it likes in GPA.
> This confuses vhost's RAM section merging when these pages
> land over the top of hugepages.
>
> Since they're not normal RAM, and they shouldn't have vhost DMAing
> into them, exclude them from the vhost set.
>
> I do that by marking them as device-ram and then excluding device-ram
> from vhost.
>
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
>

Cc: Roman who's The Synic Overlord to make sure he doesn't miss this.

-- 
Vitaly


Re: [PATCH 0/2] exclude hyperv synic sections from vhost
Posted by Jason Wang 4 years, 3 months ago
On 2020/1/8 下午9:53, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Hyperv's synic (that we emulate) is a feature that allows the guest
> to place some magic (4k) pages of RAM anywhere it likes in GPA.
> This confuses vhost's RAM section merging when these pages
> land over the top of hugepages.


Hi David:

A silly question, is this because the alignment when adding sections? If 
yes, what's the reason for doing alignment which is not a must for vhost 
memory table.

Thanks


>
> Since they're not normal RAM, and they shouldn't have vhost DMAing
> into them, exclude them from the vhost set.
>
> I do that by marking them as device-ram and then excluding device-ram
> from vhost.
>
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
>
> Dr. David Alan Gilbert (2):
>    vhost: Don't pass ram device sections
>    hyperv/synic: Allocate as ram_device
>
>   hw/hyperv/hyperv.c | 14 ++++++++------
>   hw/virtio/vhost.c  |  1 +
>   2 files changed, 9 insertions(+), 6 deletions(-)
>


Re: [PATCH 0/2] exclude hyperv synic sections from vhost
Posted by Dr. David Alan Gilbert 4 years, 3 months ago
* Jason Wang (jasowang@redhat.com) wrote:
> 
> On 2020/1/8 下午9:53, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hyperv's synic (that we emulate) is a feature that allows the guest
> > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > This confuses vhost's RAM section merging when these pages
> > land over the top of hugepages.
> 
> 
> Hi David:
> 
> A silly question, is this because the alignment when adding sections? If
> yes, what's the reason for doing alignment which is not a must for vhost
> memory table.

Page alignment is a bit odd with vhost-user - it ends up having to mmap
each of the sections itself; and still has to map them as hugepages
to be able to mmap - in the old world you could sometimes have
the daemon mmaping the same chunk of memory twice into the vhost-user
process; without the aggregation  you'd get a hugepage mapping for the
0-2MB chunk for the 0-512K mapping, and then maybe another 0-2MB chunk
for some of the other parts over 512K.
With postcopy we can't have the multiple mappings of the same part of
guest memory; we need to have one mapping for userfault.

Also, given the 16 separate synic regions, you'd probably end up having
a lot of wasted vhost-sections.

Dave




> Thanks
> 
> 
> > 
> > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > into them, exclude them from the vhost set.
> > 
> > I do that by marking them as device-ram and then excluding device-ram
> > from vhost.
> > 
> > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
> > 
> > Dr. David Alan Gilbert (2):
> >    vhost: Don't pass ram device sections
> >    hyperv/synic: Allocate as ram_device
> > 
> >   hw/hyperv/hyperv.c | 14 ++++++++------
> >   hw/virtio/vhost.c  |  1 +
> >   2 files changed, 9 insertions(+), 6 deletions(-)
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH 0/2] exclude hyperv synic sections from vhost
Posted by Michael S. Tsirkin 4 years, 3 months ago
On Thu, Jan 09, 2020 at 12:02:16PM +0000, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
> > 
> > On 2020/1/8 下午9:53, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Hyperv's synic (that we emulate) is a feature that allows the guest
> > > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > > This confuses vhost's RAM section merging when these pages
> > > land over the top of hugepages.
> > 
> > 
> > Hi David:
> > 
> > A silly question, is this because the alignment when adding sections? If
> > yes, what's the reason for doing alignment which is not a must for vhost
> > memory table.
> 
> Page alignment is a bit odd with vhost-user - it ends up having to mmap
> each of the sections itself; and still has to map them as hugepages
> to be able to mmap - in the old world you could sometimes have
> the daemon mmaping the same chunk of memory twice into the vhost-user
> process; without the aggregation  you'd get a hugepage mapping for the
> 0-2MB chunk for the 0-512K mapping, and then maybe another 0-2MB chunk
> for some of the other parts over 512K.
> With postcopy we can't have the multiple mappings of the same part of
> guest memory; we need to have one mapping for userfault.
> 
> Also, given the 16 separate synic regions, you'd probably end up having
> a lot of wasted vhost-sections.
> 
> Dave

So I'd worry that this is more an abuse of an interface.
E.g. this means it's skipped from dumps, which is not nice.


And for vhost I worry these patches will break pass-through of
PCI attached memory.


> 
> 
> 
> > Thanks
> > 
> > 
> > > 
> > > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > > into them, exclude them from the vhost set.
> > > 
> > > I do that by marking them as device-ram and then excluding device-ram
> > > from vhost.
> > > 
> > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
> > > 
> > > Dr. David Alan Gilbert (2):
> > >    vhost: Don't pass ram device sections
> > >    hyperv/synic: Allocate as ram_device
> > > 
> > >   hw/hyperv/hyperv.c | 14 ++++++++------
> > >   hw/virtio/vhost.c  |  1 +
> > >   2 files changed, 9 insertions(+), 6 deletions(-)
> > > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH 0/2] exclude hyperv synic sections from vhost
Posted by Vitaly Kuznetsov 4 years, 3 months ago
Jason Wang <jasowang@redhat.com> writes:

> On 2020/1/8 下午9:53, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> Hyperv's synic (that we emulate) is a feature that allows the guest
>> to place some magic (4k) pages of RAM anywhere it likes in GPA.
>> This confuses vhost's RAM section merging when these pages
>> land over the top of hugepages.
>
>
> Hi David:
>
> A silly question, is this because the alignment when adding sections? If 
> yes, what's the reason for doing alignment which is not a must for vhost 
> memory table.

SynIC regions are two 4k pages and they are picked by the guest, not the
host. These can be anywhere in guest's ram.

-- 
Vitaly


Re: [PATCH 0/2] exclude hyperv synic sections from vhost
Posted by Roman Kagan 4 years, 3 months ago
On Wed, Jan 08, 2020 at 01:53:51PM +0000, Dr. David Alan Gilbert (git) wrote:
> Hyperv's synic (that we emulate) is a feature that allows the guest
> to place some magic (4k) pages of RAM anywhere it likes in GPA.
> This confuses vhost's RAM section merging when these pages
> land over the top of hugepages.
> 
> Since they're not normal RAM, and they shouldn't have vhost DMAing
> into them, exclude them from the vhost set.

But they *are* normal RAM.  If the guest driver sets up the device to
DMA to a SynIC page so be it, and the guest deserves what it gets.

> I do that by marking them as device-ram and then excluding device-ram
> from vhost.
> 
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041

I was pointed a while back by Vitaly at
https://bugs.launchpad.net/qemu/+bug/1811533 which appeared to be the
same issue, but failed to reproduce the problem.  Can you please provide
some more detail as to how it's triggered?

Thanks,
Roman.

Re: [PATCH 0/2] exclude hyperv synic sections from vhost
Posted by Dr. David Alan Gilbert 4 years, 3 months ago
* Roman Kagan (rkagan@virtuozzo.com) wrote:
> On Wed, Jan 08, 2020 at 01:53:51PM +0000, Dr. David Alan Gilbert (git) wrote:
> > Hyperv's synic (that we emulate) is a feature that allows the guest
> > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > This confuses vhost's RAM section merging when these pages
> > land over the top of hugepages.
> > 
> > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > into them, exclude them from the vhost set.
> 
> But they *are* normal RAM.  If the guest driver sets up the device to
> DMA to a SynIC page so be it, and the guest deserves what it gets.

I don't think that's guaranteed to work.
However, in our case the guest isn't doing anything that crazy; it's
just setting the GPA of these pages in an inconveninent place for us.

> > I do that by marking them as device-ram and then excluding device-ram
> > from vhost.
> > 
> > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
> 
> I was pointed a while back by Vitaly at
> https://bugs.launchpad.net/qemu/+bug/1811533 which appeared to be the
> same issue, but failed to reproduce the problem.  Can you please provide
> some more detail as to how it's triggered?

Our test script is:
/usr/libexec/qemu-kvm \
    -name 'dgilbert-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 4096 \
    -object memory-backend-file,size=1024M,prealloc=no,mem-path=/mnt/kvm_hugepage,policy=default,id=mem-mem0 \
    -object memory-backend-file,size=3072M,prealloc=no,mem-path=/mnt/kvm_hugepage,policy=default,id=mem-mem1  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -numa node,memdev=mem-mem0  \
    -numa node,memdev=mem-mem1  \
    -cpu SandyBridge,hv_stimer,hv_time,hv_synic,hv_vpindex \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=./win2019-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:5c:d7:9f:cd:48,id=id7ex9m8,netdev=idim5Sro,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idim5Sro,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio -qmp tcp:0:4444,server,nowait

Dave

> Thanks,
> Roman.
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK