[Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code

Yi Min Zhao posted 3 patches 6 years, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180205072258.5968-1-zyimin@linux.vnet.ibm.com
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test docker-quick@centos6 passed
Test ppc passed
Test s390x passed
hw/s390x/s390-pci-bus.c  | 233 ++++++++++++++++++++++++++++++++++++++---------
hw/s390x/s390-pci-bus.h  |  17 ++++
hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
3 files changed, 275 insertions(+), 78 deletions(-)
[Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code
Posted by Yi Min Zhao 6 years, 2 months ago
This series contains three patches,
1) optimizes the code including walking DMA tables and rpcit handler
2) fixes the issue caused by IOTLB global refresh
3) uses the right pal and pba when registering ioat

The issue mentioned above was found when we tested SMC-r tools. This
behavior has been introduced when linux guest started using a global
refresh to purge the whole IOTLB of invalid entries in a lazy fashion
instead of flushing each entry when invalidating table entries.

The previous QEMU implementation didn't keep track of the mapping,
didn't handle correctly the global flush demand from the guest and a
major part of the IOTLB entries were not flushed.

Consequently linux kernel on the host keeping the previous mapping
reports, as it should, -EEXIST DMA mapping error on the next mapping
with the same IOVA. The second patch fixes this issue.

During the investigation, we noticed that the current code walking
PCI IOMMU page tables didn't check important flags of table entries,
including:
1) protection bit
2) table length
3) table offset
4) intermediate tables' invalid bit
5) format control bit

We implement the checking in the first patch before handling the
IOTLB global refresh issue. To keep track of the mapped IOTLB entries
and be able to check if the host IOTLB entries need to be refreshed
we implement a IOTLB cache in QEMU, and introduce some helper
functions to check these bits. All S390IOTLBEntry instances are stored
in a new hashtable which are indexed by IOVA. Each PCI device has its
own IOMMU. Therefore each IOMMU also has its own hashtable caching
corresponding PCI device's DMA entries. Finally, we split 1M
contiguous DMA range into 4K pages to do DMA map, and the code about
error notification is also optimized.

Change log:
v1->v2:
1) update commit messages
2) move some changes from the 2nd patch to the 1st patch
3) define macros for 'ett' in the 1st patch

Yi Min Zhao (3):
  s390x/pci: fixup the code walking IOMMU tables
  s390x/pci: fixup global refresh
  s390x/pci: use the right pal and pba in reg_ioat()

 hw/s390x/s390-pci-bus.c  | 233 ++++++++++++++++++++++++++++++++++++++---------
 hw/s390x/s390-pci-bus.h  |  17 ++++
 hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
 3 files changed, 275 insertions(+), 78 deletions(-)

-- 
2.14.3 (Apple Git-98)


Re: [Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code
Posted by Cornelia Huck 6 years, 2 months ago
On Mon,  5 Feb 2018 15:22:55 +0800
Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:

> This series contains three patches,
> 1) optimizes the code including walking DMA tables and rpcit handler
> 2) fixes the issue caused by IOTLB global refresh
> 3) uses the right pal and pba when registering ioat
> 
> The issue mentioned above was found when we tested SMC-r tools. This
> behavior has been introduced when linux guest started using a global
> refresh to purge the whole IOTLB of invalid entries in a lazy fashion
> instead of flushing each entry when invalidating table entries.
> 
> The previous QEMU implementation didn't keep track of the mapping,
> didn't handle correctly the global flush demand from the guest and a
> major part of the IOTLB entries were not flushed.
> 
> Consequently linux kernel on the host keeping the previous mapping
> reports, as it should, -EEXIST DMA mapping error on the next mapping
> with the same IOVA. The second patch fixes this issue.

Introducing a local tracking mechanism still feels a bit awkward to me
(even though it works, of course). If nobody else needs such a thing,
our best choice is to do it like that, though.

> 
> During the investigation, we noticed that the current code walking
> PCI IOMMU page tables didn't check important flags of table entries,
> including:
> 1) protection bit
> 2) table length
> 3) table offset
> 4) intermediate tables' invalid bit
> 5) format control bit
> 
> We implement the checking in the first patch before handling the
> IOTLB global refresh issue. To keep track of the mapped IOTLB entries
> and be able to check if the host IOTLB entries need to be refreshed
> we implement a IOTLB cache in QEMU, and introduce some helper
> functions to check these bits. All S390IOTLBEntry instances are stored
> in a new hashtable which are indexed by IOVA. Each PCI device has its
> own IOMMU. Therefore each IOMMU also has its own hashtable caching
> corresponding PCI device's DMA entries. Finally, we split 1M
> contiguous DMA range into 4K pages to do DMA map, and the code about
> error notification is also optimized.
> 
> Change log:
> v1->v2:
> 1) update commit messages
> 2) move some changes from the 2nd patch to the 1st patch
> 3) define macros for 'ett' in the 1st patch
> 
> Yi Min Zhao (3):
>   s390x/pci: fixup the code walking IOMMU tables
>   s390x/pci: fixup global refresh
>   s390x/pci: use the right pal and pba in reg_ioat()
> 
>  hw/s390x/s390-pci-bus.c  | 233 ++++++++++++++++++++++++++++++++++++++---------
>  hw/s390x/s390-pci-bus.h  |  17 ++++
>  hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
>  3 files changed, 275 insertions(+), 78 deletions(-)
> 

I have played with these patches and some virtio-pci devices (since I
don't have access to real zpci cards), and it worked both under kvm and
under tcg. So I'm inclined to apply this (I can't review further due to
missing documentation), unless the pci folks have further comments.

Re: [Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code
Posted by Yi Min Zhao 6 years, 2 months ago

在 2018/2/6 下午6:23, Cornelia Huck 写道:
> On Mon,  5 Feb 2018 15:22:55 +0800
> Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:
>
>> This series contains three patches,
>> 1) optimizes the code including walking DMA tables and rpcit handler
>> 2) fixes the issue caused by IOTLB global refresh
>> 3) uses the right pal and pba when registering ioat
>>
>> The issue mentioned above was found when we tested SMC-r tools. This
>> behavior has been introduced when linux guest started using a global
>> refresh to purge the whole IOTLB of invalid entries in a lazy fashion
>> instead of flushing each entry when invalidating table entries.
>>
>> The previous QEMU implementation didn't keep track of the mapping,
>> didn't handle correctly the global flush demand from the guest and a
>> major part of the IOTLB entries were not flushed.
>>
>> Consequently linux kernel on the host keeping the previous mapping
>> reports, as it should, -EEXIST DMA mapping error on the next mapping
>> with the same IOVA. The second patch fixes this issue.
> Introducing a local tracking mechanism still feels a bit awkward to me
> (even though it works, of course). If nobody else needs such a thing,
> our best choice is to do it like that, though.
Caching iotlb entries is also helpful for us to support 2G mapping in 
future.
>
>> During the investigation, we noticed that the current code walking
>> PCI IOMMU page tables didn't check important flags of table entries,
>> including:
>> 1) protection bit
>> 2) table length
>> 3) table offset
>> 4) intermediate tables' invalid bit
>> 5) format control bit
>>
>> We implement the checking in the first patch before handling the
>> IOTLB global refresh issue. To keep track of the mapped IOTLB entries
>> and be able to check if the host IOTLB entries need to be refreshed
>> we implement a IOTLB cache in QEMU, and introduce some helper
>> functions to check these bits. All S390IOTLBEntry instances are stored
>> in a new hashtable which are indexed by IOVA. Each PCI device has its
>> own IOMMU. Therefore each IOMMU also has its own hashtable caching
>> corresponding PCI device's DMA entries. Finally, we split 1M
>> contiguous DMA range into 4K pages to do DMA map, and the code about
>> error notification is also optimized.
>>
>> Change log:
>> v1->v2:
>> 1) update commit messages
>> 2) move some changes from the 2nd patch to the 1st patch
>> 3) define macros for 'ett' in the 1st patch
>>
>> Yi Min Zhao (3):
>>    s390x/pci: fixup the code walking IOMMU tables
>>    s390x/pci: fixup global refresh
>>    s390x/pci: use the right pal and pba in reg_ioat()
>>
>>   hw/s390x/s390-pci-bus.c  | 233 ++++++++++++++++++++++++++++++++++++++---------
>>   hw/s390x/s390-pci-bus.h  |  17 ++++
>>   hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
>>   3 files changed, 275 insertions(+), 78 deletions(-)
>>
> I have played with these patches and some virtio-pci devices (since I
> don't have access to real zpci cards), and it worked both under kvm and
> under tcg. So I'm inclined to apply this (I can't review further due to
> missing documentation), unless the pci folks have further comments.
Thanks!
>
>


Re: [Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code
Posted by Cornelia Huck 6 years, 2 months ago
On Wed, 7 Feb 2018 10:56:01 +0800
Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:

> 在 2018/2/6 下午6:23, Cornelia Huck 写道:
> > On Mon,  5 Feb 2018 15:22:55 +0800
> > Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:
> >  
> >> This series contains three patches,
> >> 1) optimizes the code including walking DMA tables and rpcit handler
> >> 2) fixes the issue caused by IOTLB global refresh
> >> 3) uses the right pal and pba when registering ioat
> >>
> >> The issue mentioned above was found when we tested SMC-r tools. This
> >> behavior has been introduced when linux guest started using a global
> >> refresh to purge the whole IOTLB of invalid entries in a lazy fashion
> >> instead of flushing each entry when invalidating table entries.
> >>
> >> The previous QEMU implementation didn't keep track of the mapping,
> >> didn't handle correctly the global flush demand from the guest and a
> >> major part of the IOTLB entries were not flushed.
> >>
> >> Consequently linux kernel on the host keeping the previous mapping
> >> reports, as it should, -EEXIST DMA mapping error on the next mapping
> >> with the same IOVA. The second patch fixes this issue.  
> > Introducing a local tracking mechanism still feels a bit awkward to me
> > (even though it works, of course). If nobody else needs such a thing,
> > our best choice is to do it like that, though.  
> Caching iotlb entries is also helpful for us to support 2G mapping in 
> future.

OK, that makes sense.

> >  
> >> During the investigation, we noticed that the current code walking
> >> PCI IOMMU page tables didn't check important flags of table entries,
> >> including:
> >> 1) protection bit
> >> 2) table length
> >> 3) table offset
> >> 4) intermediate tables' invalid bit
> >> 5) format control bit
> >>
> >> We implement the checking in the first patch before handling the
> >> IOTLB global refresh issue. To keep track of the mapped IOTLB entries
> >> and be able to check if the host IOTLB entries need to be refreshed
> >> we implement a IOTLB cache in QEMU, and introduce some helper
> >> functions to check these bits. All S390IOTLBEntry instances are stored
> >> in a new hashtable which are indexed by IOVA. Each PCI device has its
> >> own IOMMU. Therefore each IOMMU also has its own hashtable caching
> >> corresponding PCI device's DMA entries. Finally, we split 1M
> >> contiguous DMA range into 4K pages to do DMA map, and the code about
> >> error notification is also optimized.
> >>
> >> Change log:
> >> v1->v2:
> >> 1) update commit messages
> >> 2) move some changes from the 2nd patch to the 1st patch
> >> 3) define macros for 'ett' in the 1st patch
> >>
> >> Yi Min Zhao (3):
> >>    s390x/pci: fixup the code walking IOMMU tables
> >>    s390x/pci: fixup global refresh
> >>    s390x/pci: use the right pal and pba in reg_ioat()
> >>
> >>   hw/s390x/s390-pci-bus.c  | 233 ++++++++++++++++++++++++++++++++++++++---------
> >>   hw/s390x/s390-pci-bus.h  |  17 ++++
> >>   hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
> >>   3 files changed, 275 insertions(+), 78 deletions(-)
> >>  
> > I have played with these patches and some virtio-pci devices (since I
> > don't have access to real zpci cards), and it worked both under kvm and
> > under tcg. So I'm inclined to apply this (I can't review further due to
> > missing documentation), unless the pci folks have further comments.  
> Thanks!

Applied to s390-next, thanks.