xen/drivers/passthrough/vtd/iommu.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
When translating an address that falls inside of a superpage in the
IOMMU page tables the fetching of the PTE physical address field
wasn't using dma_pte_addr(), which caused the returned data to be
corrupt as it would contain bits not related to the address field.
Fix this by re-using the value of pte_maddr which is obtained using
dma_pte_addr(). Such change requires adjusting the previous error
path to zero pte_maddr.
Fixes: c71e55501a61 ('VT-d: have callers specify the target level for page table walks')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/drivers/passthrough/vtd/iommu.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 130a159cde07..819e996e6269 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -359,16 +359,18 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
if ( !alloc )
{
- pte_maddr = 0;
if ( !dma_pte_present(*pte) )
+ {
+ pte_maddr = 0;
break;
+ }
/*
* When the leaf entry was requested, pass back the full PTE,
* with the address adjusted to account for the residual of
* the walk.
*/
- pte_maddr = pte->val +
+ pte_maddr +=
(addr & ((1UL << level_to_offset_bits(level)) - 1) &
PAGE_MASK);
if ( !target )
--
2.40.0
On 09.05.2023 12:41, Roger Pau Monne wrote:
> When translating an address that falls inside of a superpage in the
> IOMMU page tables the fetching of the PTE physical address field
> wasn't using dma_pte_addr(), which caused the returned data to be
> corrupt as it would contain bits not related to the address field.
I'm afraid I don't understand:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -359,16 +359,18 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
>
> if ( !alloc )
> {
> - pte_maddr = 0;
> if ( !dma_pte_present(*pte) )
> + {
> + pte_maddr = 0;
> break;
> + }
>
> /*
> * When the leaf entry was requested, pass back the full PTE,
> * with the address adjusted to account for the residual of
> * the walk.
> */
> - pte_maddr = pte->val +
> + pte_maddr +=
> (addr & ((1UL << level_to_offset_bits(level)) - 1) &
> PAGE_MASK);
With this change you're now violating what the comment says (plus what
the comment ahead of the function says). And it says what it says for
a reason - see intel_iommu_lookup_page(), which I think your change is
breaking.
Note also the following code:
if ( !target )
break;
}
pte_maddr = level - 1;
IOW the local variable is overwritten right away unless target == 0.
Jan
On Tue, May 09, 2023 at 06:06:45PM +0200, Jan Beulich wrote:
> On 09.05.2023 12:41, Roger Pau Monne wrote:
> > When translating an address that falls inside of a superpage in the
> > IOMMU page tables the fetching of the PTE physical address field
> > wasn't using dma_pte_addr(), which caused the returned data to be
> > corrupt as it would contain bits not related to the address field.
>
> I'm afraid I don't understand:
>
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -359,16 +359,18 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
> >
> > if ( !alloc )
> > {
> > - pte_maddr = 0;
> > if ( !dma_pte_present(*pte) )
> > + {
> > + pte_maddr = 0;
> > break;
> > + }
> >
> > /*
> > * When the leaf entry was requested, pass back the full PTE,
> > * with the address adjusted to account for the residual of
> > * the walk.
> > */
> > - pte_maddr = pte->val +
> > + pte_maddr +=
> > (addr & ((1UL << level_to_offset_bits(level)) - 1) &
> > PAGE_MASK);
>
> With this change you're now violating what the comment says (plus what
> the comment ahead of the function says). And it says what it says for
> a reason - see intel_iommu_lookup_page(), which I think your change is
> breaking.
Hm, but the code in intel_iommu_lookup_page() is now wrong as it takes
the bits in DMA_PTE_CONTIG_MASK as part of the physical address when
doing the conversion to mfn? maddr_to_mfn() doesn't perform a any
masking to remove the bits above PADDR_BITS.
Thanks, Roger.
On 10.05.2023 10:27, Roger Pau Monné wrote:
> On Tue, May 09, 2023 at 06:06:45PM +0200, Jan Beulich wrote:
>> On 09.05.2023 12:41, Roger Pau Monne wrote:
>>> When translating an address that falls inside of a superpage in the
>>> IOMMU page tables the fetching of the PTE physical address field
>>> wasn't using dma_pte_addr(), which caused the returned data to be
>>> corrupt as it would contain bits not related to the address field.
>>
>> I'm afraid I don't understand:
>>
>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>> @@ -359,16 +359,18 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
>>>
>>> if ( !alloc )
>>> {
>>> - pte_maddr = 0;
>>> if ( !dma_pte_present(*pte) )
>>> + {
>>> + pte_maddr = 0;
>>> break;
>>> + }
>>>
>>> /*
>>> * When the leaf entry was requested, pass back the full PTE,
>>> * with the address adjusted to account for the residual of
>>> * the walk.
>>> */
>>> - pte_maddr = pte->val +
>>> + pte_maddr +=
>>> (addr & ((1UL << level_to_offset_bits(level)) - 1) &
>>> PAGE_MASK);
>>
>> With this change you're now violating what the comment says (plus what
>> the comment ahead of the function says). And it says what it says for
>> a reason - see intel_iommu_lookup_page(), which I think your change is
>> breaking.
>
> Hm, but the code in intel_iommu_lookup_page() is now wrong as it takes
> the bits in DMA_PTE_CONTIG_MASK as part of the physical address when
> doing the conversion to mfn? maddr_to_mfn() doesn't perform a any
> masking to remove the bits above PADDR_BITS.
Oh, right. But that's a missing dma_pte_addr() in intel_iommu_lookup_page()
then. (It would likely be better anyway to switch "uint64_t val" to
"struct dma_pte pte" there, to make more visible that it's a PTE we're
dealing with.) I indeed overlooked this aspect when doing the earlier
change.
Jan
On Wed, May 10, 2023 at 12:00:51PM +0200, Jan Beulich wrote:
> On 10.05.2023 10:27, Roger Pau Monné wrote:
> > On Tue, May 09, 2023 at 06:06:45PM +0200, Jan Beulich wrote:
> >> On 09.05.2023 12:41, Roger Pau Monne wrote:
> >>> When translating an address that falls inside of a superpage in the
> >>> IOMMU page tables the fetching of the PTE physical address field
> >>> wasn't using dma_pte_addr(), which caused the returned data to be
> >>> corrupt as it would contain bits not related to the address field.
> >>
> >> I'm afraid I don't understand:
> >>
> >>> --- a/xen/drivers/passthrough/vtd/iommu.c
> >>> +++ b/xen/drivers/passthrough/vtd/iommu.c
> >>> @@ -359,16 +359,18 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
> >>>
> >>> if ( !alloc )
> >>> {
> >>> - pte_maddr = 0;
> >>> if ( !dma_pte_present(*pte) )
> >>> + {
> >>> + pte_maddr = 0;
> >>> break;
> >>> + }
> >>>
> >>> /*
> >>> * When the leaf entry was requested, pass back the full PTE,
> >>> * with the address adjusted to account for the residual of
> >>> * the walk.
> >>> */
> >>> - pte_maddr = pte->val +
> >>> + pte_maddr +=
> >>> (addr & ((1UL << level_to_offset_bits(level)) - 1) &
> >>> PAGE_MASK);
> >>
> >> With this change you're now violating what the comment says (plus what
> >> the comment ahead of the function says). And it says what it says for
> >> a reason - see intel_iommu_lookup_page(), which I think your change is
> >> breaking.
> >
> > Hm, but the code in intel_iommu_lookup_page() is now wrong as it takes
> > the bits in DMA_PTE_CONTIG_MASK as part of the physical address when
> > doing the conversion to mfn? maddr_to_mfn() doesn't perform a any
> > masking to remove the bits above PADDR_BITS.
>
> Oh, right. But that's a missing dma_pte_addr() in intel_iommu_lookup_page()
> then. (It would likely be better anyway to switch "uint64_t val" to
> "struct dma_pte pte" there, to make more visible that it's a PTE we're
> dealing with.) I indeed overlooked this aspect when doing the earlier
> change.
I guess I'm still confused, as the other return value for target == 0
(when the address is not part of a superpage) does return
dma_pte_addr(pte). I think that needs further fixing then.
Thanks, Roger.
On 10.05.2023 12:22, Roger Pau Monné wrote:
> On Wed, May 10, 2023 at 12:00:51PM +0200, Jan Beulich wrote:
>> On 10.05.2023 10:27, Roger Pau Monné wrote:
>>> On Tue, May 09, 2023 at 06:06:45PM +0200, Jan Beulich wrote:
>>>> On 09.05.2023 12:41, Roger Pau Monne wrote:
>>>>> When translating an address that falls inside of a superpage in the
>>>>> IOMMU page tables the fetching of the PTE physical address field
>>>>> wasn't using dma_pte_addr(), which caused the returned data to be
>>>>> corrupt as it would contain bits not related to the address field.
>>>>
>>>> I'm afraid I don't understand:
>>>>
>>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>>>> @@ -359,16 +359,18 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
>>>>>
>>>>> if ( !alloc )
>>>>> {
>>>>> - pte_maddr = 0;
>>>>> if ( !dma_pte_present(*pte) )
>>>>> + {
>>>>> + pte_maddr = 0;
>>>>> break;
>>>>> + }
>>>>>
>>>>> /*
>>>>> * When the leaf entry was requested, pass back the full PTE,
>>>>> * with the address adjusted to account for the residual of
>>>>> * the walk.
>>>>> */
>>>>> - pte_maddr = pte->val +
>>>>> + pte_maddr +=
>>>>> (addr & ((1UL << level_to_offset_bits(level)) - 1) &
>>>>> PAGE_MASK);
>>>>
>>>> With this change you're now violating what the comment says (plus what
>>>> the comment ahead of the function says). And it says what it says for
>>>> a reason - see intel_iommu_lookup_page(), which I think your change is
>>>> breaking.
>>>
>>> Hm, but the code in intel_iommu_lookup_page() is now wrong as it takes
>>> the bits in DMA_PTE_CONTIG_MASK as part of the physical address when
>>> doing the conversion to mfn? maddr_to_mfn() doesn't perform a any
>>> masking to remove the bits above PADDR_BITS.
>>
>> Oh, right. But that's a missing dma_pte_addr() in intel_iommu_lookup_page()
>> then. (It would likely be better anyway to switch "uint64_t val" to
>> "struct dma_pte pte" there, to make more visible that it's a PTE we're
>> dealing with.) I indeed overlooked this aspect when doing the earlier
>> change.
>
> I guess I'm still confused, as the other return value for target == 0
> (when the address is not part of a superpage) does return
> dma_pte_addr(pte). I think that needs further fixing then.
Hmm, indeed. But I think it's worse than this: addr_to_dma_page_maddr()
also does one too many iterations in that case. All "normal" callers
supply a positive "target". We need to terminate the walk at level 1
also when target == 0.
Jan
On Wed, May 10, 2023 at 03:30:21PM +0200, Jan Beulich wrote:
> On 10.05.2023 12:22, Roger Pau Monné wrote:
> > On Wed, May 10, 2023 at 12:00:51PM +0200, Jan Beulich wrote:
> >> On 10.05.2023 10:27, Roger Pau Monné wrote:
> >>> On Tue, May 09, 2023 at 06:06:45PM +0200, Jan Beulich wrote:
> >>>> On 09.05.2023 12:41, Roger Pau Monne wrote:
> >>>>> When translating an address that falls inside of a superpage in the
> >>>>> IOMMU page tables the fetching of the PTE physical address field
> >>>>> wasn't using dma_pte_addr(), which caused the returned data to be
> >>>>> corrupt as it would contain bits not related to the address field.
> >>>>
> >>>> I'm afraid I don't understand:
> >>>>
> >>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
> >>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
> >>>>> @@ -359,16 +359,18 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
> >>>>>
> >>>>> if ( !alloc )
> >>>>> {
> >>>>> - pte_maddr = 0;
> >>>>> if ( !dma_pte_present(*pte) )
> >>>>> + {
> >>>>> + pte_maddr = 0;
> >>>>> break;
> >>>>> + }
> >>>>>
> >>>>> /*
> >>>>> * When the leaf entry was requested, pass back the full PTE,
> >>>>> * with the address adjusted to account for the residual of
> >>>>> * the walk.
> >>>>> */
> >>>>> - pte_maddr = pte->val +
> >>>>> + pte_maddr +=
> >>>>> (addr & ((1UL << level_to_offset_bits(level)) - 1) &
> >>>>> PAGE_MASK);
> >>>>
> >>>> With this change you're now violating what the comment says (plus what
> >>>> the comment ahead of the function says). And it says what it says for
> >>>> a reason - see intel_iommu_lookup_page(), which I think your change is
> >>>> breaking.
> >>>
> >>> Hm, but the code in intel_iommu_lookup_page() is now wrong as it takes
> >>> the bits in DMA_PTE_CONTIG_MASK as part of the physical address when
> >>> doing the conversion to mfn? maddr_to_mfn() doesn't perform a any
> >>> masking to remove the bits above PADDR_BITS.
> >>
> >> Oh, right. But that's a missing dma_pte_addr() in intel_iommu_lookup_page()
> >> then. (It would likely be better anyway to switch "uint64_t val" to
> >> "struct dma_pte pte" there, to make more visible that it's a PTE we're
> >> dealing with.) I indeed overlooked this aspect when doing the earlier
> >> change.
> >
> > I guess I'm still confused, as the other return value for target == 0
> > (when the address is not part of a superpage) does return
> > dma_pte_addr(pte). I think that needs further fixing then.
>
> Hmm, indeed. But I think it's worse than this: addr_to_dma_page_maddr()
> also does one too many iterations in that case. All "normal" callers
> supply a positive "target". We need to terminate the walk at level 1
> also when target == 0.
Don't we do that already due to the following check:
if ( --level == target )
break;
Which prevents mapping the PTE address as a page table directory?
Thanks, Roger.
On 10.05.2023 17:19, Roger Pau Monné wrote:
> On Wed, May 10, 2023 at 03:30:21PM +0200, Jan Beulich wrote:
>> On 10.05.2023 12:22, Roger Pau Monné wrote:
>>> On Wed, May 10, 2023 at 12:00:51PM +0200, Jan Beulich wrote:
>>>> On 10.05.2023 10:27, Roger Pau Monné wrote:
>>>>> On Tue, May 09, 2023 at 06:06:45PM +0200, Jan Beulich wrote:
>>>>>> On 09.05.2023 12:41, Roger Pau Monne wrote:
>>>>>>> When translating an address that falls inside of a superpage in the
>>>>>>> IOMMU page tables the fetching of the PTE physical address field
>>>>>>> wasn't using dma_pte_addr(), which caused the returned data to be
>>>>>>> corrupt as it would contain bits not related to the address field.
>>>>>>
>>>>>> I'm afraid I don't understand:
>>>>>>
>>>>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>>>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>>>>>> @@ -359,16 +359,18 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr,
>>>>>>>
>>>>>>> if ( !alloc )
>>>>>>> {
>>>>>>> - pte_maddr = 0;
>>>>>>> if ( !dma_pte_present(*pte) )
>>>>>>> + {
>>>>>>> + pte_maddr = 0;
>>>>>>> break;
>>>>>>> + }
>>>>>>>
>>>>>>> /*
>>>>>>> * When the leaf entry was requested, pass back the full PTE,
>>>>>>> * with the address adjusted to account for the residual of
>>>>>>> * the walk.
>>>>>>> */
>>>>>>> - pte_maddr = pte->val +
>>>>>>> + pte_maddr +=
>>>>>>> (addr & ((1UL << level_to_offset_bits(level)) - 1) &
>>>>>>> PAGE_MASK);
>>>>>>
>>>>>> With this change you're now violating what the comment says (plus what
>>>>>> the comment ahead of the function says). And it says what it says for
>>>>>> a reason - see intel_iommu_lookup_page(), which I think your change is
>>>>>> breaking.
>>>>>
>>>>> Hm, but the code in intel_iommu_lookup_page() is now wrong as it takes
>>>>> the bits in DMA_PTE_CONTIG_MASK as part of the physical address when
>>>>> doing the conversion to mfn? maddr_to_mfn() doesn't perform a any
>>>>> masking to remove the bits above PADDR_BITS.
>>>>
>>>> Oh, right. But that's a missing dma_pte_addr() in intel_iommu_lookup_page()
>>>> then. (It would likely be better anyway to switch "uint64_t val" to
>>>> "struct dma_pte pte" there, to make more visible that it's a PTE we're
>>>> dealing with.) I indeed overlooked this aspect when doing the earlier
>>>> change.
>>>
>>> I guess I'm still confused, as the other return value for target == 0
>>> (when the address is not part of a superpage) does return
>>> dma_pte_addr(pte). I think that needs further fixing then.
>>
>> Hmm, indeed. But I think it's worse than this: addr_to_dma_page_maddr()
>> also does one too many iterations in that case. All "normal" callers
>> supply a positive "target". We need to terminate the walk at level 1
>> also when target == 0.
>
> Don't we do that already due to the following check:
>
> if ( --level == target )
> break;
>
> Which prevents mapping the PTE address as a page table directory?
I don't think this is enough - this code, afaict right now, is only
sufficient when target >= 1.
Jan
© 2016 - 2026 Red Hat, Inc.