Changelog
================================================================
v2: Use mapcount instead of refcount
    Synchronized with IOMMU Observability changes.
================================================================

This series frees empty page tables on unmaps. It is intended to be a
low-overhead feature.

A reader-writer lock is used to synchronize page table updates, but most
of the time the lock is held as a reader. It is held as a writer for a
short period of time only when unmapping a page that is bigger than the
current IOVA request. In all other cases the lock is taken read-only.

page->mapcount is used to track the number of entries in each page
table.

Microbenchmark data using iova_stress[1]:

Base:
$ ./iova_stress -s 16
dma_size: 4K iova space: 16T iommu: ~ 32847M time: 36.074s

Fix:
$ ./iova_stress -s 16
dma_size: 4K iova space: 16T iommu: ~ 27M time: 38.870s

The test maps and unmaps 4K pages, cycling through the IOVA space in a
tight loop. Base uses 32G of memory, and the test completes in 36.074s.
Fix uses ~0G (27M) of memory, and the test completes in 38.870s.

I believe the proposed fix is a good compromise in terms of complexity
and scalability. A more scalable solution would be to use a separate
reader-writer lock per page-table page, and use the page->private field
to store the lock itself. However, since the IOMMU already provides some
protection (i.e. no one touches the IOVA space of the requested
map/unmap), we can avoid the extra complexity, rely on a single RW lock
per page table, and hold it in reader mode most of the time.

[1] https://github.com/soleen/iova_stress

Pasha Tatashin (3):
  iommu/intel: Use page->_mapcount to count number of entries in IOMMU
  iommu/intel: synchronize page table map and unmap operations
  iommu/intel: free empty page tables on unmaps

 drivers/iommu/intel/iommu.c | 154 ++++++++++++++++++++++++++++--------
 drivers/iommu/intel/iommu.h |  42 ++++++++--
 drivers/iommu/iommu-pages.h |  30 +++++--
 3 files changed, 180 insertions(+), 46 deletions(-)

-- 
2.44.0.769.g3c40516874-goog
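The per-table entry counting described in the cover letter can be illustrated with a minimal, single-threaded userspace sketch. It assumes a toy two-level table of 4 KiB tables with 512 eight-byte entries each; `struct leaf_table`, `map_page()` and `unmap_page()` are hypothetical names, not the driver's code. Each leaf table counts its present entries (the role page->_mapcount plays in the series): mapping a new entry increments the count, and unmapping the last entry frees the table. The reader-writer locking and the freeing of higher-level tables are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model: 4 KiB tables of 512 eight-byte entries, so one leaf table
 * covers 2 MiB of IOVA space and the top table covers 1 GiB (IOVAs above
 * 1 GiB simply alias in this toy). */
#define PT_ENTRIES 512
#define PAGE_SHIFT 12
#define LEVEL_BITS 9

/* Hypothetical leaf table plus a count of present entries. */
struct leaf_table {
	uint64_t pte[PT_ENTRIES];
	int nr_present;
};

/* Hypothetical top-level table pointing at leaf tables. */
struct top_table {
	struct leaf_table *leaf[PT_ENTRIES];
};

static long nr_leaf_tables;	/* leaf tables currently allocated */

static unsigned int top_index(uint64_t iova)
{
	return (iova >> (PAGE_SHIFT + LEVEL_BITS)) & (PT_ENTRIES - 1);
}

static unsigned int leaf_index(uint64_t iova)
{
	return (iova >> PAGE_SHIFT) & (PT_ENTRIES - 1);
}

/* Map one 4 KiB page, allocating the leaf table on first use. */
static int map_page(struct top_table *top, uint64_t iova, uint64_t phys)
{
	struct leaf_table **slot = &top->leaf[top_index(iova)];
	uint64_t *pte;

	if (!*slot) {
		*slot = calloc(1, sizeof(**slot));
		if (!*slot)
			return -1;
		nr_leaf_tables++;
	}
	pte = &(*slot)->pte[leaf_index(iova)];
	if (*pte == 0)
		(*slot)->nr_present++;	/* count only newly present entries */
	*pte = phys | 1;		/* bit 0 marks the entry present */
	return 0;
}

/* Unmap one 4 KiB page and free the leaf table once it becomes empty. */
static void unmap_page(struct top_table *top, uint64_t iova)
{
	struct leaf_table **slot = &top->leaf[top_index(iova)];
	uint64_t *pte;

	if (!*slot)
		return;
	pte = &(*slot)->pte[leaf_index(iova)];
	if (*pte == 0)
		return;
	*pte = 0;
	assert((*slot)->nr_present > 0);
	if (--(*slot)->nr_present == 0) {
		free(*slot);
		*slot = NULL;
		nr_leaf_tables--;
	}
}
```

For example, mapping IOVAs 0x200000 and 0x201000 allocates one leaf table whose count is 2; unmapping one of them keeps the table, and unmapping the second drops the count to zero and frees it.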
On 26.04.24 05:43, Pasha Tatashin wrote:
> [...]
> page->mapcount is used to track the number of entries in each page
> table.

I'm wondering if this will conflict with page_type at some point? We're
already converting other page table users to ptdesc. CCing Willy.

-- 
Cheers,

David / dhildenb
On Fri, Apr 26, 2024 at 2:42 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 26.04.24 05:43, Pasha Tatashin wrote:
> > [...]
> > page->mapcount is used to track the number of entries in each page
> > table.
>
> I'm wondering if this will conflict with page_type at some point? We're
> already converting other page table users to ptdesc. CCing Willy.

Hi David,

This contradicts the following comment in mm_types.h:

   * If your page will not be mapped to userspace, you can also use the four
   * bytes in the mapcount union, but you must call page_mapcount_reset()
   * before freeing it.

Thank you,
Pasha
On 26.04.24 15:49, Pasha Tatashin wrote:
> On Fri, Apr 26, 2024 at 2:42 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 26.04.24 05:43, Pasha Tatashin wrote:
>>> [...]
>>> page->mapcount is used to track the number of entries in each page
>>> table.
>>
>> I'm wondering if this will conflict with page_type at some point? We're
>> already converting other page table users to ptdesc. CCing Willy.
>
> Hi David,

Hi!

>
> This contradicts the following comment in mm_types.h:
>   * If your page will not be mapped to userspace, you can also use the four
>   * bytes in the mapcount union, but you must call page_mapcount_reset()
>   * before freeing it.

I think the documentation is a bit outdated, because we now have page types
that are: "For pages that are never mapped to userspace"

which includes

#define PG_table

(we should update that comment, because we're now also using it for hugetlb
that can be mapped to user space, which is fine.)

Right now, using page->_mapcount would likely still be fine, as long as you
cannot end up creating a value that would resemble a type (e.g., PG_offline
could be bad).

But staring at users of _mapcount and page_mapcount_reset() ... you'd be
pretty much the only user of that.

mm/zsmalloc.c calls page_mapcount_reset(), and I am not completely sure why
... I can see it touch page->index but not page->_mapcount.

Hopefully Willy can comment.

-- 
Cheers,

David / dhildenb
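David's constraint (a counter stored in _mapcount must never look like a page type) can be made concrete with a small userspace model of the PageType() check, since _mapcount and page_type share the same four bytes. The constants and the check below are assumptions copied from include/linux/page-flags.h as of roughly v6.9; only a subset of types is listed, and the encoding has been reworked in later kernels. `looks_like_type()` and `collides_with_page_type()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of the page-type encoding from include/linux/page-flags.h,
 * roughly v6.9 (assumption; later kernels changed this encoding). */
#define PAGE_TYPE_BASE	0xf0000000u
#define PG_buddy	0x00000080u
#define PG_offline	0x00000100u
#define PG_table	0x00000200u
#define PG_guard	0x00000400u

/* Model of PageType(): the word carries type @flag when the base bits are
 * all ones and the bit for @flag has been cleared. */
static bool looks_like_type(uint32_t word, uint32_t flag)
{
	assert((flag & PAGE_TYPE_BASE) == 0);	/* type bits sit below the base */
	return (word & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE;
}

/* Would this raw _mapcount value be mistaken for one of the types above? */
static bool collides_with_page_type(int32_t value)
{
	static const uint32_t types[] = { PG_buddy, PG_offline, PG_table, PG_guard };
	uint32_t word = (uint32_t)value;

	for (unsigned int i = 0; i < sizeof(types) / sizeof(types[0]); i++)
		if (looks_like_type(word, types[i]))
			return true;
	return false;
}
```

In this model, entry counts from 0 to 512 never match a type because their top nibble is zero, and the reset value -1 (0xffffffff) does not match either. A negative value such as -257 (0xfffffeff), however, has the PG_offline bit cleared under an all-ones base and would read as PG_offline.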
On Fri, Apr 26, 2024 at 04:39:05PM +0200, David Hildenbrand wrote:
> On 26.04.24 15:49, Pasha Tatashin wrote:
> > [...]
> > This contradicts the following comment in mm_types.h:
> >   * If your page will not be mapped to userspace, you can also use the four
> >   * bytes in the mapcount union, but you must call page_mapcount_reset()
> >   * before freeing it.
>
> I think the documentation is a bit outdated, because we now have page types
> that are: "For pages that are never mapped to userspace"
>
> which includes
>
> #define PG_table
>
> (we should update that comment, because we're now also using it for hugetlb
> that can be mapped to user space, which is fine.)
>
> Right now, using page->_mapcount would likely still be fine, as long as you
> cannot end up creating a value that would resemble a type (e.g., PG_offline
> could be bad).
>
> But staring at users of _mapcount and page_mapcount_reset() ... you'd be
> pretty much the only user of that.
>
> mm/zsmalloc.c calls page_mapcount_reset(), and I am not completely sure why
> ... I can see it touch page->index but not page->_mapcount.
>
> Hopefully Willy can comment.

I feel like I have to say "no" to Pasha far too often ;-(

Agreed the documentation is out of date.

I think there's a lot of space in the struct page that can be used.
These are iommu page tables, not cpu page tables, so things are a bit
different for them. But should they be converted to use ptdesc? Maybe!

I'd suggest putting this into the union with pt_mm and pt_frag_refcount.
I think it could even go in the union with pt_list, but I think I'd
rather see it in the pt_mm union.
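Willy's suggestion of sharing storage with pt_mm and pt_frag_refcount can be sketched with a simplified userspace stand-in. Everything below is hypothetical: the real struct ptdesc in include/linux/mm_types.h has many more fields and uses atomic_t, and `struct iommu_ptdesc_slice`, `iommu_nr_entries` and the helpers are illustrative names only. The sketch assumes an IOMMU page table needs neither pt_mm nor pt_frag_refcount, so a counter added to that union reuses existing storage instead of growing the descriptor.

```c
#include <assert.h>
#include <stddef.h>

struct mm_struct;	/* opaque here; only ever pointed to */

/* Hypothetical, heavily simplified slice of struct ptdesc: only the union
 * holding pt_mm and pt_frag_refcount (atomic_t in the kernel). */
struct ptdesc_slice {
	union {
		struct mm_struct *pt_mm;
		int pt_frag_refcount;
	};
};

/* The same slice with a hypothetical IOMMU entry counter in the union. */
struct iommu_ptdesc_slice {
	union {
		struct mm_struct *pt_mm;
		int pt_frag_refcount;
		int iommu_nr_entries;	/* hypothetical field name */
	};
};

/* Adding a member to an existing union reuses its storage. */
_Static_assert(sizeof(struct iommu_ptdesc_slice) == sizeof(struct ptdesc_slice),
	       "the counter must not grow the descriptor");

/* Record one more present entry in this page table. */
static void iommu_pt_inc(struct iommu_ptdesc_slice *pt)
{
	pt->iommu_nr_entries++;
}

/* Drop one entry; returns nonzero when the table is empty and may be freed. */
static int iommu_pt_dec_and_test(struct iommu_ptdesc_slice *pt)
{
	assert(pt->iommu_nr_entries > 0);
	return --pt->iommu_nr_entries == 0;
}
```

Because the counter overlays pt_mm, it sits at the same offset; the trade-off is that a table using the counter can never also use pt_mm or pt_frag_refcount.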
On Fri, Apr 26, 2024 at 08:39:14PM +0100, Matthew Wilcox wrote:
> I think there's a lot of space in the struct page that can be used.
> These are iommu page tables, not cpu page tables, so things are a bit
> different for them. But should they be converted to use ptdesc? Maybe!

Definitely!

Someday we will need more stuff in here...

Jason