This proposes a fix for a security vulnerability related to IOMMU Shared
Virtual Addressing (SVA). In an SVA context, an IOMMU can cache kernel
page table entries. When a kernel page table page is freed and
reallocated for another purpose, the IOMMU might still hold stale,
incorrect entries. This can be exploited to cause a use-after-free or
write-after-free condition, potentially leading to privilege escalation
or data corruption.

This solution introduces a deferred freeing mechanism for kernel page
table pages, which provides a safe window to notify the IOMMU to
invalidate its caches before the page is reused.

Change log:
v5:
 - Renamed pagetable_free_async() to pagetable_free_kernel() to avoid
   confusion.
 - Removed list_del() when the list is on the stack, as it will be freed
   when the function returns.
 - Discussed a corner case related to memory unplug of memory that was
   present as reserved memory at boot. Given that it's extremely rare
   and cannot be triggered by unprivileged users, we decided to focus
   our efforts on the common vfree() case and noted that corner case in
   the commit message.
 - Some cleanups.

v4:
 - https://lore.kernel.org/linux-iommu/20250905055103.3821518-1-baolu.lu@linux.intel.com/
 - Introduce a mechanism to defer the freeing of page-table pages for
   KVA mappings. Call iommu_sva_invalidate_kva_range() in the deferred
   work thread before freeing the pages.

v3:
 - https://lore.kernel.org/linux-iommu/20250806052505.3113108-1-baolu.lu@linux.intel.com/
 - iommu_sva_mms is an unbounded list; iterating it in an atomic context
   could introduce significant latency issues. Schedule it in a kernel
   thread and replace the spinlock with a mutex.
 - Replace the static key with a normal bool; it can be brought back if
   data shows the benefit.
 - Invalidate KVA range in the flush_tlb_all() paths.
 - All previous reviewed-bys are preserved. Please let me know if there
   are any objections.

v2:
 - https://lore.kernel.org/linux-iommu/20250709062800.651521-1-baolu.lu@linux.intel.com/
 - Remove EXPORT_SYMBOL_GPL(iommu_sva_invalidate_kva_range);
 - Replace the mutex with a spinlock to make the interface usable in
   critical regions.

v1: https://lore.kernel.org/linux-iommu/20250704133056.4023816-1-baolu.lu@linux.intel.com/

Dave Hansen (6):
  mm: Add a ptdesc flag to mark kernel page tables
  mm: Actually mark kernel page table pages
  x86/mm: Use 'ptdesc' when freeing PMD pages
  mm: Introduce pure page table freeing function
  mm: Introduce deferred freeing for kernel page tables
  mm: Hook up Kconfig options for async page table freeing

Lu Baolu (2):
  x86/mm: Use pagetable_free()
  iommu/sva: Invalidate stale IOTLB entries for kernel address space

 arch/x86/Kconfig              |  1 +
 arch/x86/mm/init_64.c         |  2 +-
 arch/x86/mm/pat/set_memory.c  |  2 +-
 arch/x86/mm/pgtable.c         | 12 ++++-----
 drivers/iommu/iommu-sva.c     | 29 +++++++++++++++++++++-
 include/asm-generic/pgalloc.h | 18 ++++++++++++++
 include/linux/iommu.h         |  4 +++
 include/linux/mm.h            | 24 +++++++++++++++---
 include/linux/page-flags.h    | 46 +++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  3 +++
 mm/pgtable-generic.c          | 39 +++++++++++++++++++++++++++++
 11 files changed, 168 insertions(+), 12 deletions(-)

--
2.43.0
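
For readers skimming the thread, the ordering that the cover letter
describes (queue the page, invalidate the IOMMU caches, and only then
free) can be modelled in a few lines of userspace C. This is only a
sketch under stated assumptions, not the code from the series: struct
pt_page, pt_free_deferred(), pt_free_worker() and iommu_invalidate() are
hypothetical names, with iommu_invalidate() standing in for the role
that iommu_sva_invalidate_kva_range() plays in the deferred work thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel page table page. */
struct pt_page {
	struct pt_page *next;
	bool cached_by_iommu;	/* models a stale entry cached by the IOMMU */
};

static struct pt_page *pending;	/* pages queued for deferred freeing */
static int invalidations;	/* number of invalidation passes run */
static int freed;		/* pages actually released */
static int freed_while_cached;	/* must stay 0 if the ordering is right */

/* Queue the page instead of freeing it immediately. */
static void pt_free_deferred(struct pt_page *p)
{
	p->next = pending;
	pending = p;
}

/* Hypothetical model of the invalidation step: drop cached entries. */
static void iommu_invalidate(struct pt_page *list)
{
	for (; list; list = list->next)
		list->cached_by_iommu = false;
	invalidations++;
}

/* Deferred worker: invalidate first, then release the queued pages. */
static void pt_free_worker(void)
{
	struct pt_page *list = pending;

	pending = NULL;
	if (!list)
		return;

	iommu_invalidate(list);

	while (list) {
		struct pt_page *next = list->next;

		if (list->cached_by_iommu)
			freed_while_cached++;
		free(list);
		freed++;
		list = next;
	}
}
```

The point of the model is only the ordering inside pt_free_worker():
one invalidation pass covers the whole batch, and no page returns to the
allocator while the (modelled) IOMMU can still reference it.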
On 9/18/25 22:39, Lu Baolu wrote:
> This solution introduces a deferred freeing mechanism for kernel page
> table pages, which provides a safe window to notify the IOMMU to
> invalidate its caches before the page is reused.

I think all the activity has died down and everyone seems happy enough
with how this looks. Right?

So is this something we should prod Andrew to take through the mm tree,
or is it x86-specific enough it should go through tip?
I wondered why no mm folks were commenting on this. linux-mm@ was cc'd,
but the _people_ on cc seem to have been almost all IOMMU and x86 folks.
So I added a few mm folks...

On 9/25/25 13:24, Dave Hansen wrote:
> On 9/18/25 22:39, Lu Baolu wrote:
>> This solution introduces a deferred freeing mechanism for kernel page
>> table pages, which provides a safe window to notify the IOMMU to
>> invalidate its caches before the page is reused.
>
> I think all the activity has died down and everyone seems happy enough
> with how this looks. Right?
>
> So is this something we should prod Andrew to take through the mm tree,
> or is it x86-specific enough it should go through tip?

Hi Folks! We've got a bug fix here that has impact on x86, mm, and IOMMU
code. I know I've talked with a few of you about this along the way, but
it's really thin on mm reviews, probably because mm folks haven't been
cc'd. Any eyeballs on it would be appreciated!

It seems like it should _probably_ go through the mm tree, although I'm
happy to send it through tip if folks disagree.

Diffstat for reference:

 arch/x86/Kconfig              |  1 +
 arch/x86/mm/init_64.c         |  2 +-
 arch/x86/mm/pat/set_memory.c  |  2 +-
 arch/x86/mm/pgtable.c         | 12 ++++-----
 drivers/iommu/iommu-sva.c     | 29 +++++++++++++++++++++-
 include/asm-generic/pgalloc.h | 18 ++++++++++++++
 include/linux/iommu.h         |  4 +++
 include/linux/mm.h            | 24 +++++++++++++++---
 include/linux/page-flags.h    | 46 +++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  3 +++
 mm/pgtable-generic.c          | 39 +++++++++++++++++++++++++++++
 11 files changed, 168 insertions(+), 12 deletions(-)
On 10/9/2025 3:42 AM, Dave Hansen wrote:
> I wondered why no mm folks were commenting on this. linux-mm@ was cc'd,
> but the _people_ on cc seem to have been almost all IOMMU and x86 folks.
> So I added a few mm folks...
>
> On 9/25/25 13:24, Dave Hansen wrote:
>> On 9/18/25 22:39, Lu Baolu wrote:
>>> This solution introduces a deferred freeing mechanism for kernel page
>>> table pages, which provides a safe window to notify the IOMMU to
>>> invalidate its caches before the page is reused.
>> I think all the activity has died down and everyone seems happy enough
>> with how this looks. Right?
>>
>> So is this something we should prod Andrew to take through the mm tree,
>> or is it x86-specific enough it should go through tip?
> Hi Folks! We've got a bug fix here that has impact on x86, mm, and IOMMU
> code. I know I've talked with a few of you about this along the way, but
> it's really thin on mm reviews, probably because mm folks haven't been
> cc'd. Any eyeballs on it would be appreciated!
>
> It seems like it should _probably_ go through the mm tree, although I'm
> happy to send it through tip if folks disagree.

Thank you all for the review comments. I have updated this series with a
new version and posted it here, with the mm folks cc'ed:

https://lore.kernel.org/linux-iommu/20251014130437.1090448-1-baolu.lu@linux.intel.com/

Thanks,
baolu
On 08.10.25 21:42, Dave Hansen wrote:
> I wondered why no mm folks were commenting on this. linux-mm@ was cc'd,
> but the _people_ on cc seem to have been almost all IOMMU and x86 folks.
> So I added a few mm folks...

Thanks. Lately I find myself scanning linux-mm only sporadically. So if
something is not in my inbox, I likely won't notice that it needs our
attention.

Will take a look.

>
> On 9/25/25 13:24, Dave Hansen wrote:
>> On 9/18/25 22:39, Lu Baolu wrote:
>>> This solution introduces a deferred freeing mechanism for kernel page
>>> table pages, which provides a safe window to notify the IOMMU to
>>> invalidate its caches before the page is reused.
>>
>> I think all the activity has died down and everyone seems happy enough
>> with how this looks. Right?
>>
>> So is this something we should prod Andrew to take through the mm tree,
>> or is it x86-specific enough it should go through tip?
>
> Hi Folks! We've got a bug fix here that has impact on x86, mm, and IOMMU
> code. I know I've talked with a few of you about this along the way, but
> it's really thin on mm reviews, probably because mm folks haven't been
> cc'd. Any eyeballs on it would be appreciated!
>
> It seems like it should _probably_ go through the mm tree, although I'm
> happy to send it through tip if folks disagree.
>
> Diffstat for reference:
>
>  arch/x86/Kconfig              |  1 +
>  arch/x86/mm/init_64.c         |  2 +-
>  arch/x86/mm/pat/set_memory.c  |  2 +-
>  arch/x86/mm/pgtable.c         | 12 ++++-----
>  drivers/iommu/iommu-sva.c     | 29 +++++++++++++++++++++-
>  include/asm-generic/pgalloc.h | 18 ++++++++++++++
>  include/linux/iommu.h         |  4 +++
>  include/linux/mm.h            | 24 +++++++++++++++---
>  include/linux/page-flags.h    | 46 +++++++++++++++++++++++++++++++++++
>  mm/Kconfig                    |  3 +++
>  mm/pgtable-generic.c          | 39 +++++++++++++++++++++++++++++
>  11 files changed, 168 insertions(+), 12 deletions(-)
>

--
Cheers

David / dhildenb