[Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry()

Peter Xu posted 3 patches 8 years, 5 months ago
There is a newer version of this series
Posted by Peter Xu 8 years, 5 months ago
With the patch applied:

  [PATCH v3] exec: fix address_space_get_iotlb_entry page mask
  (already in Paolo's pull request but not yet merged)

Now we can have valid address masks. However, that is still not ideal,
because the mask may not be aligned to guest page sizes. One example
is when huge pages are used in the guest (please see the commit message
in patch 1 for details). It applies to normal pages too. So we not only
need a valid address mask, we should also make sure it is a page mask
(for x86, it should be one of the 4K/2M/1G page sizes).
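
To illustrate the difference between a merely valid address mask and a
page mask, here is a minimal sketch. It is not taken from the patches:
the macro and function names are hypothetical, and the only assumption
is the x86 page sizes named above (4K/2M/1G).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for the x86 page sizes mentioned above. */
#define PAGE_4K (1ULL << 12)
#define PAGE_2M (1ULL << 21)
#define PAGE_1G (1ULL << 30)

/*
 * A mask is "valid" in the plain address-mask sense when mask + 1 is a
 * power of two. An all-ones mask is rejected here to keep the sketch simple.
 */
static inline bool is_valid_addr_mask(uint64_t mask)
{
    uint64_t size = mask + 1;
    return size != 0 && (size & mask) == 0;
}

/* Stricter check: the mask must cover exactly one x86 page size. */
static inline bool is_x86_page_mask(uint64_t mask)
{
    return mask == PAGE_4K - 1 ||
           mask == PAGE_2M - 1 ||
           mask == PAGE_1G - 1;
}

/* Base address of the region that contains addr under the given mask. */
static inline uint64_t region_base(uint64_t addr, uint64_t mask)
{
    assert(is_valid_addr_mask(mask));
    return addr & ~mask;
}
```

For example, 0xffff (a 64K region) passes is_valid_addr_mask() but not
is_x86_page_mask(), while 0x1fffff (2M) passes both.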

Patches 1+2 fix the problem. Tested with both the kernel net driver and
testpmd, on both 4K and 2M pages, to make sure the page mask is correct.

Patch 3 is cherry-picked from the PT series. After the fixes in 1+2, we'll
definitely want patch 3 now. Here's the simplest TCP streaming test
using vhost dmar and iommu=pt in the guest:

  without patch 3:    12.0Gbps
  with patch 3:       33.5Gbps

Please review, thanks.

Peter Xu (3):
  exec: add page_mask for address_space_do_translate
  exec: simplify address_space_get_iotlb_entry
  vhost: iommu: cache static mapping if there is

 exec.c                 | 73 +++++++++++++++++++++++++++++++++-----------------
 hw/virtio/trace-events |  4 +++
 hw/virtio/vhost.c      | 66 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 24 deletions(-)

-- 
2.7.4


Re: [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry()
Posted by Michael S. Tsirkin 8 years, 5 months ago
On Fri, Jun 02, 2017 at 07:50:51PM +0800, Peter Xu wrote:
> With the patch applied:
> 
>   [PATCH v3] exec: fix address_space_get_iotlb_entry page mask
>   (already in Paolo's pull request but not yet merged)
> 
> Now we can have valid address masks. However it is still not ideal,
> considering that the mask may not be aligned to guest page sizes. One
> example would be when huge page is used in guest (please see commit
> message in patch 1 for details). It applies to normal pages too. So we
> not only need a valid address mask, we should make sure it is page
> mask (for x86, it should be either 4K/2M/1G pages).

Why should we? To get better performance, right?

> Patch 1+2 fixes the problem. Tested with both kernel net driver or
> testpmd, on either 4K/2M pages, to make sure the page mask is correct.
> 
> Patch 3 is cherry picked from PT series, after fixing from 1+2, we'll
> definitely want patch 3 now. Here's the simplest TCP streaming test
> using vhost dmar and iommu=pt in guest:
> 
>   without patch 3:    12.0Gbps

And what happens without patches 1-2?

>   with patch 3:       33.5Gbps

This is the part I don't get. Patches 1-2 will return a bigger region to
callers. The result should be better performance - instead it seems to
slow down vhost for some reason and we need tricks to get
performance back. What's going on?

> Please review, thanks.
> 
> Peter Xu (3):
>   exec: add page_mask for address_space_do_translate
>   exec: simplify address_space_get_iotlb_entry
>   vhost: iommu: cache static mapping if there is
> 
>  exec.c                 | 73 +++++++++++++++++++++++++++++++++-----------------
>  hw/virtio/trace-events |  4 +++
>  hw/virtio/vhost.c      | 66 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 119 insertions(+), 24 deletions(-)
> 
> -- 
> 2.7.4

Re: [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry()
Posted by Peter Xu 8 years, 5 months ago
On Fri, Jun 02, 2017 at 05:51:07PM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 02, 2017 at 07:50:51PM +0800, Peter Xu wrote:
> > With the patch applied:
> > 
> >   [PATCH v3] exec: fix address_space_get_iotlb_entry page mask
> >   (already in Paolo's pull request but not yet merged)
> > 
> > Now we can have valid address masks. However it is still not ideal,
> > considering that the mask may not be aligned to guest page sizes. One
> > example would be when huge page is used in guest (please see commit
> > message in patch 1 for details). It applies to normal pages too. So we
> > not only need a valid address mask, we should make sure it is page
> > mask (for x86, it should be either 4K/2M/1G pages).
> 
> Why should we? To get better performance, right?

IMHO one point is performance; the other is how we should define the
IOTLB interface. My opinion is that it is better to have valid
masks.

> 
> > Patch 1+2 fixes the problem. Tested with both kernel net driver or
> > testpmd, on either 4K/2M pages, to make sure the page mask is correct.
> > 
> > Patch 3 is cherry picked from PT series, after fixing from 1+2, we'll
> > definitely want patch 3 now. Here's the simplest TCP streaming test
> > using vhost dmar and iommu=pt in guest:
> > 
> >   without patch 3:    12.0Gbps
> 
> And what happens without patches 1-2?

Without 1-2, performance is good. But I think it is hacky to have such
a good result (I explained why the performance is good in the VT-d PT
support thread with some logs)...

> 
> >   with patch 3:       33.5Gbps
> 
> This is the part I don't get. Patches 1-2 will return a bigger region to
> callers. The result should be better performance - instead it seems to
> slow down vhost for some reason and we need tricks to get
> performance back. What's going on?

Yes. The problem is that without patches 1/2 I think the code lacks
correctness. With correctness, we lose performance, so I picked
patch 3 as well.

Again, I think the first thing we need to settle is what should be the
best definition for IOTLB (addr_mask or arbitrary length).
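
To make the two options concrete, here is a rough sketch of what each
definition could look like. The struct and function names are
hypothetical and are not QEMU's actual IOTLB entry type; the sketch
only shows how a lookup would differ between a mask-based and a
length-based entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Option A (hypothetical): the entry is described by a power-of-two mask. */
struct mask_entry {
    uint64_t iova;       /* base address, aligned to addr_mask + 1 */
    uint64_t addr_mask;  /* region size - 1; size is a power of two */
};

/* Option B (hypothetical): the entry is described by an arbitrary length. */
struct len_entry {
    uint64_t iova;       /* base address, no alignment requirement */
    uint64_t len;        /* region size in bytes, any value */
};

/* With a mask, coverage is a single mask-and-compare. */
static inline bool mask_entry_covers(const struct mask_entry *e, uint64_t addr)
{
    return (addr & ~e->addr_mask) == e->iova;
}

/* With a length, coverage is a range check. */
static inline bool len_entry_covers(const struct len_entry *e, uint64_t addr)
{
    return addr >= e->iova && addr - e->iova < e->len;
}
```

The mask form can only express aligned power-of-two regions, while the
length form can express any contiguous range (for instance 0x3000 bytes
starting at 0x1000).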

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry()
Posted by Michael S. Tsirkin 8 years, 5 months ago
On Mon, Jun 05, 2017 at 11:20:13AM +0800, Peter Xu wrote:
> On Fri, Jun 02, 2017 at 05:51:07PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 02, 2017 at 07:50:51PM +0800, Peter Xu wrote:
> > > With the patch applied:
> > > 
> > >   [PATCH v3] exec: fix address_space_get_iotlb_entry page mask
> > >   (already in Paolo's pull request but not yet merged)
> > > 
> > > Now we can have valid address masks. However it is still not ideal,
> > > considering that the mask may not be aligned to guest page sizes. One
> > > example would be when huge page is used in guest (please see commit
> > > message in patch 1 for details). It applies to normal pages too. So we
> > > not only need a valid address mask, we should make sure it is page
> > > mask (for x86, it should be either 4K/2M/1G pages).
> > 
> > Why should we? To get better performance, right?
> 
> IMHO one point is for performance, the other point is on how we should
> define the IOTLB interface. My opinion is that it is better valid
> masks.
> 
> > 
> > > Patch 1+2 fixes the problem. Tested with both kernel net driver or
> > > testpmd, on either 4K/2M pages, to make sure the page mask is correct.
> > > 
> > > Patch 3 is cherry picked from PT series, after fixing from 1+2, we'll
> > > definitely want patch 3 now. Here's the simplest TCP streaming test
> > > using vhost dmar and iommu=pt in guest:
> > > 
> > >   without patch 3:    12.0Gbps
> > 
> > And what happens without patches 1-2?
> 
> Without 1-2, performance is good. But I think it is hacky to have such
> a good result (I explained why the performance is good in the VT-d PT
> support thread with some logs)...
> 
> > 
> > >   with patch 3:       33.5Gbps
> > 
> > This is the part I don't get. Patches 1-2 will return a bigger region to
> > callers. The result should be better performance - instead it seems to
> > slow down vhost for some reason and we need tricks to get
> > performance back. What's going on?
> 
> Yes. The problem is that if without patch 1/2 I think the codes lacks
> correctness. With correctness, we lost performance, then I picked
> patch 3 as well.
> 
> Again, I think the first thing we need to settle is what should be the
> best definition for IOTLB (addr_mask or arbitrary length).
> 
> Thanks,

If arbitrary length means we don't require prefaulting hacks,
I'm for using arbitrary length.


> -- 
> Peter Xu