On 24.08.2021 16:13, Jan Beulich wrote:
> For a long time we've been rather inefficient with IOMMU page table
> management when not sharing page tables, i.e. in particular for PV (and
> further specifically also for PV Dom0) and AMD (where nowadays we never
> share page tables). While up to about 2.5 years ago AMD code had logic
> to un-shatter page mappings, that logic was ripped out for being buggy
> (XSA-275 plus follow-on).
>
> This series enables use of large pages in AMD and Intel (VT-d) code;
> Arm is presently not in need of any enabling as pagetables are always
> shared there. It also augments PV Dom0 creation with suitable explicit
> IOMMU mapping calls to facilitate use of large pages there without
> getting into the business of un-shattering page mappings just yet.
> Depending on the amount of memory handed to Dom0 this improves booting
> time (latency until Dom0 actually starts) quite a bit; subsequent
> shattering of some of the large pages may of course consume some of the
> saved time.
>
> Parts of this series functionally depend on the previously submitted
> "VT-d: fix caching mode IOTLB flushing".
>
> Known fallout has been spelled out here:
> https://lists.xen.org/archives/html/xen-devel/2021-08/msg00781.html
>
> I'm inclined to say "of course" there are also various seemingly
> unrelated changes included here, which I just came to consider necessary
> or at least desirable (in part for having been in need of adjustment for
> a long time) along the way. Some of these changes are likely independent
> of the bulk of the work here, and hence may be fine to go in ahead of
> earlier patches.
>
> While, as said above, un-shattering of mappings isn't an immediate goal,
> I intend to at least arrange for freeing page tables which have ended up
> all empty. But that's not part of this v1 of the series.
>
> 01: AMD/IOMMU: avoid recording each level's MFN when walking page table
> 02: AMD/IOMMU: have callers specify the target level for page table walks
> 03: VT-d: have callers specify the target level for page table walks
> 04: IOMMU: have vendor code announce supported page sizes
> 05: IOMMU: add order parameter to ->{,un}map_page() hooks
> 06: IOMMU: have iommu_{,un}map() split requests into largest possible chunks
> 07: IOMMU/x86: restrict IO-APIC mappings for PV Dom0
> 08: IOMMU/x86: perform PV Dom0 mappings in batches
> 09: IOMMU/x86: support freeing of pagetables
> 10: AMD/IOMMU: drop stray TLB flush
> 11: AMD/IOMMU: walk trees upon page fault
> 12: AMD/IOMMU: return old PTE from {set,clear}_iommu_pte_present()
> 13: AMD/IOMMU: allow use of superpage mappings
> 14: VT-d: allow use of superpage mappings
I should probably spell this out explicitly: These last two, which actually
enable use of superpages, and which hence introduce the need to shatter
some, comes with the risk of tripping code like this
ret = xenmem_reservation_decrease(nr_pages, frame_list);
BUG_ON(ret != nr_pages);
still found e.g. in up-to-date Linux. This is - similar to the other
fallout description pointed at further up - a result of now potentially
needing to allocate memory in order to free some, and perhaps the needed
amount being higher than the freed one.
Jan