switch_to_super_page() assumes the memory range it's working on is aligned
to the target large page level. Unfortunately, __domain_mapping() doesn't
take this into account when using it, and will pass unaligned ranges,
ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400,
0x4c0600], which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
to switch_to_super_page() at a 1 GiB granularity, which will in turn
free PTEs all the way to iov_pfn 0x4fffff.
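The over-free is easy to see with a minimal userspace sketch of the loop
arithmetic (illustration only, not part of the patch; it assumes 4KiB
base pages, so a 1GiB page spans 0x40000 PTEs, which is what
lvl_to_nr_pages() returns for level 3):

#include <stdio.h>

int main(void)
{
	/* Values from the example above. */
	unsigned long lvl_pages = 0x40000;	/* PTEs per 1GiB page */
	unsigned long start_pfn = 0x400000;	/* aligned */
	unsigned long end_pfn = 0x4c05ff;	/* end_pfn + 1 is not a
						 * multiple of lvl_pages
						 */

	/* switch_to_super_page() walks the range one large page at a
	 * time, freeing the PTE range behind each step.
	 */
	while (start_pfn <= end_pfn) {
		printf("frees [%#lx, %#lx]\n",
		       start_pfn, start_pfn + lvl_pages - 1);
		start_pfn += lvl_pages;
	}
	/* Last line printed: frees [0x4c0000, 0x4fffff], i.e. PTEs
	 * past the requested end_pfn are freed.
	 */
	return 0;
}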
Mitigate this by rounding down the iov_pfn range passed to
switch_to_super_page() in __domain_mapping() to the target large page
level.
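Concretely, with the example above: when iov_pfn reaches 0x400000 the
remaining nr_pages is 0x4c0600 - 0x400000 = 0xc0600 and lvl_pages is
0x40000 at the 1GiB level, so round_down(nr_pages, lvl_pages) = 0xc0000
and the range handed to switch_to_super_page() now ends at
0x400000 + 0xc0000 - 1 = 0x4bffff, aligned at both ends.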
Additionally, add range alignment checks to switch_to_super_page().
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
Signed-off-by: Eugene Koira <eugkoira@amazon.com>
Cc: stable@vger.kernel.org
---
drivers/iommu/intel/iommu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9c3ab9d9f69a..dff2d895b8ab 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain,
 	unsigned long lvl_pages = lvl_to_nr_pages(level);
 	struct dma_pte *pte = NULL;
 
+	if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
+		    !IS_ALIGNED(end_pfn + 1, lvl_pages)))
+		return;
+
 	while (start_pfn <= end_pfn) {
 		if (!pte)
 			pte = pfn_to_dma_pte(domain, start_pfn, &level,
@@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		unsigned long pages_to_remove;
 
 		pteval |= DMA_PTE_LARGE_PAGE;
-		pages_to_remove = min_t(unsigned long, nr_pages,
+		pages_to_remove = min_t(unsigned long,
+					round_down(nr_pages, lvl_pages),
 					nr_pte_to_next_page(pte) * lvl_pages);
 		end_pfn = iov_pfn + pages_to_remove - 1;
 		switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
--
2.47.3
On 8/26/25 22:38, Eugene Koira wrote:

[...]

Queued for v6.17-rc. Thank you!

Thanks,
baolu
On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:

[...]

> Take for example a mapping with the following iov_pfn range [0x3fe400,
> 0x4c0600], which should be backed by the following mappings:

The range is [0x3fe400, 0x4c0600) ?

> iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
> iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
> iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages

[...]

> -		pages_to_remove = min_t(unsigned long, nr_pages,
> +		pages_to_remove = min_t(unsigned long,
> +					round_down(nr_pages, lvl_pages),
> 					nr_pte_to_next_page(pte) * lvl_pages);
> 		end_pfn = iov_pfn + pages_to_remove - 1;
> 		switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);

I'm mildly entertained by the fact that the *only* comment in this
block of code is a lie. Would you care to address that while you're
here? Maybe the comment could look something like...

	/* If the new mapping is eligible for a large page, then
	 * remove all smaller pages that the existing pte at this
	 * level references.
	 * XX: do we even need to bother calling switch_to_super_page()
	 * if this PTE wasn't *present* before?
	 */

I bet it would benefit from one or two other one-line comments to make
it clearer what's going on, too...

But in general, I think this looks sane even though this code makes my
brain hurt. Could do with a test case, in an ideal world. Maybe we can
work on that as part of the generic pagetable support which is coming?
On 8/26/25 23:48, David Woodhouse wrote:
> On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:

[...]

> But in general, I think this looks sane even though this code makes my
> brain hurt. Could do with a test case, in an ideal world. Maybe we can
> work on that as part of the generic pagetable support which is coming?

Agreed. The generic pagetable work will replace this code, so this will
be removed. Therefore, we need a fix patch that can be backported before
the generic pagetable for vt-d lands.

Thanks,
baolu
On Thu, 2025-08-28 at 14:33 +0800, Baolu Lu wrote:
> On 8/26/25 23:48, David Woodhouse wrote:

[...]

> Agreed. The generic pagetable work will replace this code, so this will
> be removed. Therefore, we need a fix patch that can be backported before
> the generic pagetable for vt-d lands.

Yep. And since Jason has in fact already posted the patches which
*delete* all this code, I don't think I even care about fixing the
comments. Eugene's patch is fine as-is.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
On Tue Aug 26, 2025 at 2:38 PM UTC, Eugene Koira wrote:

[...]

> +	if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
> +		    !IS_ALIGNED(end_pfn + 1, lvl_pages)))

It might make sense to downgrade the warning to WARN_ON_ONCE().

Other than that:

Reviewed-by: Nicolas Saenz Julienne <nsaenz@amazon.com>

Regards,
Nicolas