[PATCH 6/6] amd_iommu: Do not assume passthrough translation for devices with DTE[TV]=0

Alejandro Jimenez posted 6 patches 3 weeks, 1 day ago
[PATCH 6/6] amd_iommu: Do not assume passthrough translation for devices with DTE[TV]=0
Posted by Alejandro Jimenez 3 weeks, 1 day ago
The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V,
TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
TV=1 does not contain a valid address translation information.  If a request
requires a table walk, the walk is terminated when this condition is
encountered.

Do not assume that addresses for a device with DTE[TV]=0 are passed through
(i.e. not remapped) and instead terminate the page table walk early.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
---
 hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 39 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index cf00450ebe..31d5522a62 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace *as, uint64_t *dte,
     uint64_t pte = dte[0], pte_addr, page_mask;
 
     /* make sure the DTE has TV = 1 */
-    if (pte & AMDVI_DEV_TRANSLATION_VALID) {
-        level = get_pte_translation_mode(pte);
-        if (level >= 7) {
-            trace_amdvi_mode_invalid(level, addr);
+    if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
+        /*
+         * A DTE with V=1, TV=0 does not have a valid Page Table Root Pointer.
+         * An IOMMU processing a request that requires a table walk terminates
+         * the walk when it encounters this condition. Do the same and return
+         * instead of assuming that the address is forwarded without translation
+         * i.e. the passthrough case, as it is done for the case where DTE[V]=0.
+         */
+        return;
+    }
+
+    level = get_pte_translation_mode(pte);
+    if (level >= 7) {
+        trace_amdvi_mode_invalid(level, addr);
+        return;
+    }
+    if (level == 0) {
+        goto no_remap;
+    }
+
+    /* we are at the leaf page table or page table encodes a huge page */
+    do {
+        pte_perms = amdvi_get_perms(pte);
+        present = pte & 1;
+        if (!present || perms != (perms & pte_perms)) {
+            amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
+            trace_amdvi_page_fault(addr);
             return;
         }
-        if (level == 0) {
-            goto no_remap;
-        }
 
-        /* we are at the leaf page table or page table encodes a huge page */
-        do {
-            pte_perms = amdvi_get_perms(pte);
-            present = pte & 1;
-            if (!present || perms != (perms & pte_perms)) {
-                amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
-                trace_amdvi_page_fault(addr);
-                return;
-            }
-
-            /* go to the next lower level */
-            pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
-            /* add offset and load pte */
-            pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
-            pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
-            if (!pte) {
-                return;
-            }
-            oldlevel = level;
-            level = get_pte_translation_mode(pte);
-        } while (level > 0 && level < 7);
-
-        if (level == 0x7) {
-            page_mask = pte_override_page_mask(pte);
-        } else {
-            page_mask = pte_get_page_mask(oldlevel);
+        /* go to the next lower level */
+        pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
+        /* add offset and load pte */
+        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+        pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
+        if (!pte) {
+            return;
         }
+        oldlevel = level;
+        level = get_pte_translation_mode(pte);
+    } while (level > 0 && level < 7);
 
-        /* get access permissions from pte */
-        ret->iova = addr & page_mask;
-        ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
-        ret->addr_mask = ~page_mask;
-        ret->perm = amdvi_get_perms(pte);
-        return;
+    if (level == 0x7) {
+        page_mask = pte_override_page_mask(pte);
+    } else {
+        page_mask = pte_get_page_mask(oldlevel);
     }
+
+    /* get access permissions from pte */
+    ret->iova = addr & page_mask;
+    ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
+    ret->addr_mask = ~page_mask;
+    ret->perm = amdvi_get_perms(pte);
+    return;
+
 no_remap:
     ret->iova = addr & AMDVI_PAGE_MASK_4K;
     ret->translated_addr = addr & AMDVI_PAGE_MASK_4K;
-- 
2.43.5
Re: [PATCH 6/6] amd_iommu: Do not assume passthrough translation for devices with DTE[TV]=0
Posted by Arun Kodilkar, Sairaj 2 weeks ago

On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
> The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V,
> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
> TV=1 does not contain a valid address translation information.  If a request
> requires a table walk, the walk is terminated when this condition is
> encountered.
> 
> Do not assume that addresses for a device with DTE[TV]=0 are passed through
> (i.e. not remapped) and instead terminate the page table walk early.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> ---
>   hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
>   1 file changed, 49 insertions(+), 39 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index cf00450ebe..31d5522a62 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace *as, uint64_t *dte,
>       uint64_t pte = dte[0], pte_addr, page_mask;
>   
>       /* make sure the DTE has TV = 1 */
> -    if (pte & AMDVI_DEV_TRANSLATION_VALID) {
> -        level = get_pte_translation_mode(pte);
> -        if (level >= 7) {
> -            trace_amdvi_mode_invalid(level, addr);
> +    if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
> +        /*
> +         * A DTE with V=1, TV=0 does not have a valid Page Table Root Pointer.
> +         * An IOMMU processing a request that requires a table walk terminates
> +         * the walk when it encounters this condition. Do the same and return
> +         * instead of assuming that the address is forwarded without translation
> +         * i.e. the passthrough case, as it is done for the case where DTE[V]=0.
> +         */
Hi Alejandro,
According to AMD IOMMU specs TABLE 44 (IO_PAGE_FAULT Event Types), IOMMU
reports IO_PAGE_FAULT event when TV bit is not set in the DTE.

Hence you should use amdvi_page_fault to report the IO_PAGE_FAULT
event before returning.

Regards
Sairaj Kodilkar
> +        return;
> +    }
> +
> +    level = get_pte_translation_mode(pte);
> +    if (level >= 7) {
> +        trace_amdvi_mode_invalid(level, addr);
> +        return;
> +    }
> +    if (level == 0) {
> +        goto no_remap;
> +    }
> +
> +    /* we are at the leaf page table or page table encodes a huge page */
> +    do {
> +        pte_perms = amdvi_get_perms(pte);
> +        present = pte & 1;
> +        if (!present || perms != (perms & pte_perms)) {
> +            amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> +            trace_amdvi_page_fault(addr);
>               return;
>           }
> -        if (level == 0) {
> -            goto no_remap;
> -        }
>   
> -        /* we are at the leaf page table or page table encodes a huge page */
> -        do {
> -            pte_perms = amdvi_get_perms(pte);
> -            present = pte & 1;
> -            if (!present || perms != (perms & pte_perms)) {
> -                amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> -                trace_amdvi_page_fault(addr);
> -                return;
> -            }
> -
> -            /* go to the next lower level */
> -            pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
> -            /* add offset and load pte */
> -            pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> -            pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
> -            if (!pte) {
> -                return;
> -            }
> -            oldlevel = level;
> -            level = get_pte_translation_mode(pte);
> -        } while (level > 0 && level < 7);
> -
> -        if (level == 0x7) {
> -            page_mask = pte_override_page_mask(pte);
> -        } else {
> -            page_mask = pte_get_page_mask(oldlevel);
> +        /* go to the next lower level */
> +        pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
> +        /* add offset and load pte */
> +        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> +        pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
> +        if (!pte) {
> +            return;
>           }
> +        oldlevel = level;
> +        level = get_pte_translation_mode(pte);
> +    } while (level > 0 && level < 7);
>   
> -        /* get access permissions from pte */
> -        ret->iova = addr & page_mask;
> -        ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
> -        ret->addr_mask = ~page_mask;
> -        ret->perm = amdvi_get_perms(pte);
> -        return;
> +    if (level == 0x7) {
> +        page_mask = pte_override_page_mask(pte);
> +    } else {
> +        page_mask = pte_get_page_mask(oldlevel);
>       }
> +
> +    /* get access permissions from pte */
> +    ret->iova = addr & page_mask;
> +    ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
> +    ret->addr_mask = ~page_mask;
> +    ret->perm = amdvi_get_perms(pte);
> +    return;
> +
>   no_remap:
>       ret->iova = addr & AMDVI_PAGE_MASK_4K;
>       ret->translated_addr = addr & AMDVI_PAGE_MASK_4K;
Re: [PATCH 6/6] amd_iommu: Do not assume passthrough translation for devices with DTE[TV]=0
Posted by Alejandro Jimenez 1 week, 6 days ago
Hi Sairaj Kodilkar,

On 3/20/25 1:11 AM, Arun Kodilkar, Sairaj wrote:
> 
> 
> On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
>> The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 
>> 8: V,
>> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
>> TV=1 does not contain a valid address translation information.  If a 
>> request
>> requires a table walk, the walk is terminated when this condition is
>> encountered.
>>
>> Do not assume that addresses for a device with DTE[TV]=0 are passed 
>> through
>> (i.e. not remapped) and instead terminate the page table walk early.
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>> ---
>>   hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
>>   1 file changed, 49 insertions(+), 39 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index cf00450ebe..31d5522a62 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace 
>> *as, uint64_t *dte,
>>       uint64_t pte = dte[0], pte_addr, page_mask;
>>       /* make sure the DTE has TV = 1 */
>> -    if (pte & AMDVI_DEV_TRANSLATION_VALID) {
>> -        level = get_pte_translation_mode(pte);
>> -        if (level >= 7) {
>> -            trace_amdvi_mode_invalid(level, addr);
>> +    if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
>> +        /*
>> +         * A DTE with V=1, TV=0 does not have a valid Page Table Root 
>> Pointer.
>> +         * An IOMMU processing a request that requires a table walk 
>> terminates
>> +         * the walk when it encounters this condition. Do the same 
>> and return
>> +         * instead of assuming that the address is forwarded without 
>> translation
>> +         * i.e. the passthrough case, as it is done for the case 
>> where DTE[V]=0.
>> +         */
> Hi Alejandro,
> According to AMD IOMMU specs TABLE 44 (IO_PAGE_FAULT Event Types), IOMMU
> reports IO_PAGE_FAULT event when TV bit is not set in the DTE.
> 
> Hence you should use amdvi_page_fault to report the IO_PAGE_FAULT
> event before returning.

Good point, I had not considered this (and really haven't paid attention 
to the page fault reporting until now). On first impression, I tend to 
agree with your observation and will include in on v2. That being said...

The current role of amdvi_page_walk() is to be a helper for the 
translate() method and ultimately the IOMMU replay() functionality. 
These are needed for emulation but do not necessarily correspond 1:1 
with guest-initiated operations. e.g. VFIO code will call 
memory_region_iommu_replay() (which ends up calling amdvi_walk_page())
when syncing the dirty bitmap or after registering a new notifier. The 
guest would get IO_PAGE_FAULT events for all the regions where a mapping 
doesn't currently exist, which doesn't seem correct.

IOW, I think even this existing call to amdvi_page_fault() is not 
necessary/correct:

     do {
         pte_perms = amdvi_get_perms(pte);
         present = pte & 1;
         if (!present || perms != (perms & pte_perms)) {
             amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
             trace_amdvi_page_fault(addr);
             return;
         }
    [...]

I am aware I am jumping ahead a bit too much, but the point is that the 
whole event reporting likely needs a comprehensive review.

I need to study the area a lot more since even the simplest/only current 
use of amdvi_page_fault() right now is not very clear to me.

Alejandro

P.S.
Also confusing is this assertion in 2.5.3 IO_PAGE_FAULT Event:

"I/O page faults detected for translation requests return a 
translation-not-present response (R=W=0) to the device and are not 
logged in the event log."

so this suggests we should not emit a page fault i.e. don't log it in 
the event log.


> 
> Regards
> Sairaj Kodilkar



Re: [PATCH 6/6] amd_iommu: Do not assume passthrough translation for devices with DTE[TV]=0
Posted by Arun Kodilkar, Sairaj 1 week, 6 days ago

On 3/20/2025 10:26 PM, Alejandro Jimenez wrote:
> Hi Sairaj Kodilkar,
> 
> On 3/20/25 1:11 AM, Arun Kodilkar, Sairaj wrote:
>>
>>
>> On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
>>> The AMD I/O Virtualization Technology (IOMMU) Specification (see 
>>> Table 8: V,
>>> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
>>> TV=1 does not contain a valid address translation information.  If a 
>>> request
>>> requires a table walk, the walk is terminated when this condition is
>>> encountered.
>>>
>>> Do not assume that addresses for a device with DTE[TV]=0 are passed 
>>> through
>>> (i.e. not remapped) and instead terminate the page table walk early.
>>>
>>> Cc: qemu-stable@nongnu.org
>>> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
>>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>>> ---
>>>   hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
>>>   1 file changed, 49 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>>> index cf00450ebe..31d5522a62 100644
>>> --- a/hw/i386/amd_iommu.c
>>> +++ b/hw/i386/amd_iommu.c
>>> @@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace 
>>> *as, uint64_t *dte,
>>>       uint64_t pte = dte[0], pte_addr, page_mask;
>>>       /* make sure the DTE has TV = 1 */
>>> -    if (pte & AMDVI_DEV_TRANSLATION_VALID) {
>>> -        level = get_pte_translation_mode(pte);
>>> -        if (level >= 7) {
>>> -            trace_amdvi_mode_invalid(level, addr);
>>> +    if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
>>> +        /*
>>> +         * A DTE with V=1, TV=0 does not have a valid Page Table 
>>> Root Pointer.
>>> +         * An IOMMU processing a request that requires a table walk 
>>> terminates
>>> +         * the walk when it encounters this condition. Do the same 
>>> and return
>>> +         * instead of assuming that the address is forwarded without 
>>> translation
>>> +         * i.e. the passthrough case, as it is done for the case 
>>> where DTE[V]=0.
>>> +         */
>> Hi Alejandro,
>> According to AMD IOMMU specs TABLE 44 (IO_PAGE_FAULT Event Types), IOMMU
>> reports IO_PAGE_FAULT event when TV bit is not set in the DTE.
>>
>> Hence you should use amdvi_page_fault to report the IO_PAGE_FAULT
>> event before returning.
> 
> Good point, I had not considered this (and really haven't paid attention 
> to the page fault reporting until now). On first impression, I tend to 
> agree with your observation and will include in on v2. That being said...
> 
> The current role of amdvi_page_walk() is to be a helper for the 
> translate() method and ultimately the IOMMU replay() functionality. 
> These are needed for emulation but do not necessarily correspond 1:1 
> with guest-initiated operations. e.g. VFIO code will call 
> memory_region_iommu_replay() (which ends up calling amdvi_walk_page())
> when syncing the dirty bitmap or after registering a new notifier. The 
> guest would get IO_PAGE_FAULT events for all the regions where a mapping 
> doesn't currently exist, which doesn't seem correct.
> 
> IOW, I think even this existing call to amdvi_page_fault() is not 
> necessary/correct:
> 
>      do {
>          pte_perms = amdvi_get_perms(pte);
>          present = pte & 1;
>          if (!present || perms != (perms & pte_perms)) {
>              amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
>              trace_amdvi_page_fault(addr);
>              return;
>          }
>     [...]
> 
> I am aware I am jumping ahead a bit too much, but the point is that the 
> whole event reporting likely needs a comprehensive review.
>
Hi Alejandro,
I agree with this and we can do a comprehensive review before reporting
page fault in this path.
I think in future we should separate these two paths (replay and normal 
DMA translation). That way we can report page faults in DMA translation 
path and keep the replay path clean.

> I need to study the area a lot more since even the simplest/only current 
> use of amdvi_page_fault() right now is not very clear to me.
> 
> Alejandro
> 
> P.S.
> Also confusing is this assertion in 2.5.3 IO_PAGE_FAULT Event:
> 
> "I/O page faults detected for translation requests return a translation- 
> not-present response (R=W=0) to the device and are not logged in the 
> event log."
> 
> so this suggests we should not emit a page fault i.e. don't log it in 
> the event log.
> 

Here "translation requests" means ATS translation requests and not the 
DMA read/write. Basically IOMMU emits the IO_PAGE_FAULT when it fails to 
translate the DMA request but returns "translation-not-present" response
to the device for ATS request

Regards
Sairaj Kodilkar
> 
>>
>> Regards
>> Sairaj Kodilkar
> 
> 


Re: [PATCH 6/6] amd_iommu: Do not assume passthrough translation for devices with DTE[TV]=0
Posted by Vasant Hegde 2 weeks, 1 day ago
Alejandro,


On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
> The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V,
> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
> TV=1 does not contain a valid address translation information.  If a request
> requires a table walk, the walk is terminated when this condition is
> encountered.
> 
> Do not assume that addresses for a device with DTE[TV]=0 are passed through
> (i.e. not remapped) and instead terminate the page table walk early.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>

I did double check and I think this patch matches HW behaviour. I did run few
tests w/ this series. It seems to work fine for me.


Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>



> ---
>  hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
>  1 file changed, 49 insertions(+), 39 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index cf00450ebe..31d5522a62 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace *as, uint64_t *dte,
>      uint64_t pte = dte[0], pte_addr, page_mask;
>  
>      /* make sure the DTE has TV = 1 */
> -    if (pte & AMDVI_DEV_TRANSLATION_VALID) {
> -        level = get_pte_translation_mode(pte);
> -        if (level >= 7) {
> -            trace_amdvi_mode_invalid(level, addr);
> +    if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
> +        /*
> +         * A DTE with V=1, TV=0 does not have a valid Page Table Root Pointer.
> +         * An IOMMU processing a request that requires a table walk terminates
> +         * the walk when it encounters this condition. Do the same and return
> +         * instead of assuming that the address is forwarded without translation
> +         * i.e. the passthrough case, as it is done for the case where DTE[V]=0.
> +         */
> +        return;
> +    }
> +
> +    level = get_pte_translation_mode(pte);
> +    if (level >= 7) {
> +        trace_amdvi_mode_invalid(level, addr);
> +        return;
> +    }
> +    if (level == 0) {
> +        goto no_remap;
> +    }
> +
> +    /* we are at the leaf page table or page table encodes a huge page */
> +    do {
> +        pte_perms = amdvi_get_perms(pte);
> +        present = pte & 1;
> +        if (!present || perms != (perms & pte_perms)) {
> +            amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> +            trace_amdvi_page_fault(addr);
>              return;
>          }
> -        if (level == 0) {
> -            goto no_remap;
> -        }
>  
> -        /* we are at the leaf page table or page table encodes a huge page */
> -        do {
> -            pte_perms = amdvi_get_perms(pte);
> -            present = pte & 1;
> -            if (!present || perms != (perms & pte_perms)) {
> -                amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> -                trace_amdvi_page_fault(addr);
> -                return;
> -            }
> -
> -            /* go to the next lower level */
> -            pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
> -            /* add offset and load pte */
> -            pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> -            pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
> -            if (!pte) {
> -                return;
> -            }
> -            oldlevel = level;
> -            level = get_pte_translation_mode(pte);
> -        } while (level > 0 && level < 7);
> -
> -        if (level == 0x7) {
> -            page_mask = pte_override_page_mask(pte);
> -        } else {
> -            page_mask = pte_get_page_mask(oldlevel);
> +        /* go to the next lower level */
> +        pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
> +        /* add offset and load pte */
> +        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> +        pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
> +        if (!pte) {
> +            return;
>          }
> +        oldlevel = level;
> +        level = get_pte_translation_mode(pte);

Unrelated to this patch.. We may want to add check to make sure level is
decreasing. Something like
	if (oldlevel <= level)
		error out

Otherwise bad page table can cause the issue.

-Vasant


> +    } while (level > 0 && level < 7);
>  
> -        /* get access permissions from pte */
> -        ret->iova = addr & page_mask;
> -        ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
> -        ret->addr_mask = ~page_mask;
> -        ret->perm = amdvi_get_perms(pte);
> -        return;
> +    if (level == 0x7) {
> +        page_mask = pte_override_page_mask(pte);
> +    } else {
> +        page_mask = pte_get_page_mask(oldlevel);
>      }
> +
> +    /* get access permissions from pte */
> +    ret->iova = addr & page_mask;
> +    ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
> +    ret->addr_mask = ~page_mask;
> +    ret->perm = amdvi_get_perms(pte);
> +    return;
> +
>  no_remap:
>      ret->iova = addr & AMDVI_PAGE_MASK_4K;
>      ret->translated_addr = addr & AMDVI_PAGE_MASK_4K;
Re: [PATCH 6/6] amd_iommu: Do not assume passthrough translation for devices with DTE[TV]=0
Posted by Alejandro Jimenez 2 weeks ago

On 3/19/25 2:06 AM, Vasant Hegde wrote:
> Alejandro,
> 
> 
> On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
>> The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V,
>> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
>> TV=1 does not contain a valid address translation information.  If a request
>> requires a table walk, the walk is terminated when this condition is
>> encountered.
>>
>> Do not assume that addresses for a device with DTE[TV]=0 are passed through
>> (i.e. not remapped) and instead terminate the page table walk early.
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> 
> I did double check and I think this patch matches HW behaviour. I did run few
> tests w/ this series. It seems to work fine for me.

Thank you for confirming and reviewing.

My understanding from the Linux driver code is that the TV field is not 
really used to control behavior and DTE[V] ends up doing that job 
(though I need to take a closer look at the more involved scenarios with 
SNP enabled). So I have not seen the case with V=1,TV=0 in the basic 
scenarios I have run, but my intention is to adhere to the spec as 
closely as possible. So I appreciate the confirmation.


> 
> 
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
> 
> 
> 

[...]


>> +        oldlevel = level;
>> +        level = get_pte_translation_mode(pte);
> 
> Unrelated to this patch.. We may want to add check to make sure level is
> decreasing. Something like
> 	if (oldlevel <= level)
> 		error out
> 
> Otherwise bad page table can cause the issue.

ACK. I have considered replacing this (amdvi_page_walk()) with code that 
more closely mimics the fetch_pte() algorithm in the Linux driver. I've 
been using it on my (coming soon) RFC for DMA remapping, and comparing 
against amdvi_page_walk() for verification. I put in assertions like the 
one you proposed in that new code. But if that approach is not taken, 
I'll add the safety checks you suggested here.

Thank you,
Alejandro

> 
> -Vasant
> 
> 
>> +    } while (level > 0 && level < 7);
>>