The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V,
TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
TV=1 does not contain a valid address translation information. If a request
requires a table walk, the walk is terminated when this condition is
encountered.
Do not assume that addresses for a device with DTE[TV]=0 are passed through
(i.e. not remapped) and instead terminate the page table walk early.
Cc: qemu-stable@nongnu.org
Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
---
hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
1 file changed, 49 insertions(+), 39 deletions(-)
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index cf00450ebe..31d5522a62 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace *as, uint64_t *dte,
uint64_t pte = dte[0], pte_addr, page_mask;
/* make sure the DTE has TV = 1 */
- if (pte & AMDVI_DEV_TRANSLATION_VALID) {
- level = get_pte_translation_mode(pte);
- if (level >= 7) {
- trace_amdvi_mode_invalid(level, addr);
+ if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
+ /*
+ * A DTE with V=1, TV=0 does not have a valid Page Table Root Pointer.
+ * An IOMMU processing a request that requires a table walk terminates
+ * the walk when it encounters this condition. Do the same and return
+ * instead of assuming that the address is forwarded without translation
+ * i.e. the passthrough case, as it is done for the case where DTE[V]=0.
+ */
+ return;
+ }
+
+ level = get_pte_translation_mode(pte);
+ if (level >= 7) {
+ trace_amdvi_mode_invalid(level, addr);
+ return;
+ }
+ if (level == 0) {
+ goto no_remap;
+ }
+
+ /* we are at the leaf page table or page table encodes a huge page */
+ do {
+ pte_perms = amdvi_get_perms(pte);
+ present = pte & 1;
+ if (!present || perms != (perms & pte_perms)) {
+ amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
+ trace_amdvi_page_fault(addr);
return;
}
- if (level == 0) {
- goto no_remap;
- }
- /* we are at the leaf page table or page table encodes a huge page */
- do {
- pte_perms = amdvi_get_perms(pte);
- present = pte & 1;
- if (!present || perms != (perms & pte_perms)) {
- amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
- trace_amdvi_page_fault(addr);
- return;
- }
-
- /* go to the next lower level */
- pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
- /* add offset and load pte */
- pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
- pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
- if (!pte) {
- return;
- }
- oldlevel = level;
- level = get_pte_translation_mode(pte);
- } while (level > 0 && level < 7);
-
- if (level == 0x7) {
- page_mask = pte_override_page_mask(pte);
- } else {
- page_mask = pte_get_page_mask(oldlevel);
+ /* go to the next lower level */
+ pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
+ /* add offset and load pte */
+ pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+ pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
+ if (!pte) {
+ return;
}
+ oldlevel = level;
+ level = get_pte_translation_mode(pte);
+ } while (level > 0 && level < 7);
- /* get access permissions from pte */
- ret->iova = addr & page_mask;
- ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
- ret->addr_mask = ~page_mask;
- ret->perm = amdvi_get_perms(pte);
- return;
+ if (level == 0x7) {
+ page_mask = pte_override_page_mask(pte);
+ } else {
+ page_mask = pte_get_page_mask(oldlevel);
}
+
+ /* get access permissions from pte */
+ ret->iova = addr & page_mask;
+ ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
+ ret->addr_mask = ~page_mask;
+ ret->perm = amdvi_get_perms(pte);
+ return;
+
no_remap:
ret->iova = addr & AMDVI_PAGE_MASK_4K;
ret->translated_addr = addr & AMDVI_PAGE_MASK_4K;
--
2.43.5
On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
> The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V,
> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
> TV=1 does not contain a valid address translation information. If a request
> requires a table walk, the walk is terminated when this condition is
> encountered.
>
> Do not assume that addresses for a device with DTE[TV]=0 are passed through
> (i.e. not remapped) and instead terminate the page table walk early.
>
> Cc: qemu-stable@nongnu.org
> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> ---
> hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
> 1 file changed, 49 insertions(+), 39 deletions(-)
>
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index cf00450ebe..31d5522a62 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace *as, uint64_t *dte,
> uint64_t pte = dte[0], pte_addr, page_mask;
>
> /* make sure the DTE has TV = 1 */
> - if (pte & AMDVI_DEV_TRANSLATION_VALID) {
> - level = get_pte_translation_mode(pte);
> - if (level >= 7) {
> - trace_amdvi_mode_invalid(level, addr);
> + if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
> + /*
> + * A DTE with V=1, TV=0 does not have a valid Page Table Root Pointer.
> + * An IOMMU processing a request that requires a table walk terminates
> + * the walk when it encounters this condition. Do the same and return
> + * instead of assuming that the address is forwarded without translation
> + * i.e. the passthrough case, as it is done for the case where DTE[V]=0.
> + */
Hi Alejandro,
According to AMD IOMMU specs TABLE 44 (IO_PAGE_FAULT Event Types), IOMMU
reports IO_PAGE_FAULT event when TV bit is not set in the DTE.
Hence you should use amdvi_page_fault to report the IO_PAGE_FAULT
event before returning.
Regards
Sairaj Kodilkar
> + return;
> + }
> +
> + level = get_pte_translation_mode(pte);
> + if (level >= 7) {
> + trace_amdvi_mode_invalid(level, addr);
> + return;
> + }
> + if (level == 0) {
> + goto no_remap;
> + }
> +
> + /* we are at the leaf page table or page table encodes a huge page */
> + do {
> + pte_perms = amdvi_get_perms(pte);
> + present = pte & 1;
> + if (!present || perms != (perms & pte_perms)) {
> + amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> + trace_amdvi_page_fault(addr);
> return;
> }
> - if (level == 0) {
> - goto no_remap;
> - }
>
> - /* we are at the leaf page table or page table encodes a huge page */
> - do {
> - pte_perms = amdvi_get_perms(pte);
> - present = pte & 1;
> - if (!present || perms != (perms & pte_perms)) {
> - amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> - trace_amdvi_page_fault(addr);
> - return;
> - }
> -
> - /* go to the next lower level */
> - pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
> - /* add offset and load pte */
> - pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> - pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
> - if (!pte) {
> - return;
> - }
> - oldlevel = level;
> - level = get_pte_translation_mode(pte);
> - } while (level > 0 && level < 7);
> -
> - if (level == 0x7) {
> - page_mask = pte_override_page_mask(pte);
> - } else {
> - page_mask = pte_get_page_mask(oldlevel);
> + /* go to the next lower level */
> + pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
> + /* add offset and load pte */
> + pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> + pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
> + if (!pte) {
> + return;
> }
> + oldlevel = level;
> + level = get_pte_translation_mode(pte);
> + } while (level > 0 && level < 7);
>
> - /* get access permissions from pte */
> - ret->iova = addr & page_mask;
> - ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
> - ret->addr_mask = ~page_mask;
> - ret->perm = amdvi_get_perms(pte);
> - return;
> + if (level == 0x7) {
> + page_mask = pte_override_page_mask(pte);
> + } else {
> + page_mask = pte_get_page_mask(oldlevel);
> }
> +
> + /* get access permissions from pte */
> + ret->iova = addr & page_mask;
> + ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
> + ret->addr_mask = ~page_mask;
> + ret->perm = amdvi_get_perms(pte);
> + return;
> +
> no_remap:
> ret->iova = addr & AMDVI_PAGE_MASK_4K;
> ret->translated_addr = addr & AMDVI_PAGE_MASK_4K;
Hi Sairaj Kodilkar,
On 3/20/25 1:11 AM, Arun Kodilkar, Sairaj wrote:
>
>
> On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
>> The AMD I/O Virtualization Technology (IOMMU) Specification (see Table
>> 8: V,
>> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
>> TV=1 does not contain a valid address translation information. If a
>> request
>> requires a table walk, the walk is terminated when this condition is
>> encountered.
>>
>> Do not assume that addresses for a device with DTE[TV]=0 are passed
>> through
>> (i.e. not remapped) and instead terminate the page table walk early.
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>> ---
>> hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
>> 1 file changed, 49 insertions(+), 39 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index cf00450ebe..31d5522a62 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace
>> *as, uint64_t *dte,
>> uint64_t pte = dte[0], pte_addr, page_mask;
>> /* make sure the DTE has TV = 1 */
>> - if (pte & AMDVI_DEV_TRANSLATION_VALID) {
>> - level = get_pte_translation_mode(pte);
>> - if (level >= 7) {
>> - trace_amdvi_mode_invalid(level, addr);
>> + if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
>> + /*
>> + * A DTE with V=1, TV=0 does not have a valid Page Table Root
>> Pointer.
>> + * An IOMMU processing a request that requires a table walk
>> terminates
>> + * the walk when it encounters this condition. Do the same
>> and return
>> + * instead of assuming that the address is forwarded without
>> translation
>> + * i.e. the passthrough case, as it is done for the case
>> where DTE[V]=0.
>> + */
> Hi Alejandro,
> According to AMD IOMMU specs TABLE 44 (IO_PAGE_FAULT Event Types), IOMMU
> reports IO_PAGE_FAULT event when TV bit is not set in the DTE.
>
> Hence you should use amdvi_page_fault to report the IO_PAGE_FAULT
> event before returning.
Good point, I had not considered this (and really haven't paid attention
to the page fault reporting until now). On first impression, I tend to
agree with your observation and will include in on v2. That being said...
The current role of amdvi_page_walk() is to be a helper for the
translate() method and ultimately the IOMMU replay() functionality.
These are needed for emulation but do not necessarily correspond 1:1
with guest-initiated operations. e.g. VFIO code will call
memory_region_iommu_replay() (which ends up calling amdvi_walk_page())
when syncing the dirty bitmap or after registering a new notifier. The
guest would get IO_PAGE_FAULT events for all the regions where a mapping
doesn't currently exist, which doesn't seem correct.
IOW, I think even this existing call to amdvi_page_fault() is not
necessary/correct:
do {
pte_perms = amdvi_get_perms(pte);
present = pte & 1;
if (!present || perms != (perms & pte_perms)) {
amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
trace_amdvi_page_fault(addr);
return;
}
[...]
I am aware I am jumping ahead a bit too much, but the point is that the
whole event reporting likely needs a comprehensive review.
I need to study the area a lot more since even the simplest/only current
use of amdvi_page_fault() right now is not very clear to me.
Alejandro
P.S.
Also confusing is this assertion in 2.5.3 IO_PAGE_FAULT Event:
"I/O page faults detected for translation requests return a
translation-not-present response (R=W=0) to the device and are not
logged in the event log."
so this suggests we should not emit a page fault i.e. don't log it in
the event log.
>
> Regards
> Sairaj Kodilkar
On 3/20/2025 10:26 PM, Alejandro Jimenez wrote:
> Hi Sairaj Kodilkar,
>
> On 3/20/25 1:11 AM, Arun Kodilkar, Sairaj wrote:
>>
>>
>> On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
>>> The AMD I/O Virtualization Technology (IOMMU) Specification (see
>>> Table 8: V,
>>> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
>>> TV=1 does not contain a valid address translation information. If a
>>> request
>>> requires a table walk, the walk is terminated when this condition is
>>> encountered.
>>>
>>> Do not assume that addresses for a device with DTE[TV]=0 are passed
>>> through
>>> (i.e. not remapped) and instead terminate the page table walk early.
>>>
>>> Cc: qemu-stable@nongnu.org
>>> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
>>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>>> ---
>>> hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
>>> 1 file changed, 49 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>>> index cf00450ebe..31d5522a62 100644
>>> --- a/hw/i386/amd_iommu.c
>>> +++ b/hw/i386/amd_iommu.c
>>> @@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace
>>> *as, uint64_t *dte,
>>> uint64_t pte = dte[0], pte_addr, page_mask;
>>> /* make sure the DTE has TV = 1 */
>>> - if (pte & AMDVI_DEV_TRANSLATION_VALID) {
>>> - level = get_pte_translation_mode(pte);
>>> - if (level >= 7) {
>>> - trace_amdvi_mode_invalid(level, addr);
>>> + if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
>>> + /*
>>> + * A DTE with V=1, TV=0 does not have a valid Page Table
>>> Root Pointer.
>>> + * An IOMMU processing a request that requires a table walk
>>> terminates
>>> + * the walk when it encounters this condition. Do the same
>>> and return
>>> + * instead of assuming that the address is forwarded without
>>> translation
>>> + * i.e. the passthrough case, as it is done for the case
>>> where DTE[V]=0.
>>> + */
>> Hi Alejandro,
>> According to AMD IOMMU specs TABLE 44 (IO_PAGE_FAULT Event Types), IOMMU
>> reports IO_PAGE_FAULT event when TV bit is not set in the DTE.
>>
>> Hence you should use amdvi_page_fault to report the IO_PAGE_FAULT
>> event before returning.
>
> Good point, I had not considered this (and really haven't paid attention
> to the page fault reporting until now). On first impression, I tend to
> agree with your observation and will include in on v2. That being said...
>
> The current role of amdvi_page_walk() is to be a helper for the
> translate() method and ultimately the IOMMU replay() functionality.
> These are needed for emulation but do not necessarily correspond 1:1
> with guest-initiated operations. e.g. VFIO code will call
> memory_region_iommu_replay() (which ends up calling amdvi_walk_page())
> when syncing the dirty bitmap or after registering a new notifier. The
> guest would get IO_PAGE_FAULT events for all the regions where a mapping
> doesn't currently exist, which doesn't seem correct.
>
> IOW, I think even this existing call to amdvi_page_fault() is not
> necessary/correct:
>
> do {
> pte_perms = amdvi_get_perms(pte);
> present = pte & 1;
> if (!present || perms != (perms & pte_perms)) {
> amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> trace_amdvi_page_fault(addr);
> return;
> }
> [...]
>
> I am aware I am jumping ahead a bit too much, but the point is that the
> whole event reporting likely needs a comprehensive review.
>
Hi Alejandro,
I agree with this and we can do a comprehensive review before reporting
page fault in this path.
I think in future we should separate these two paths (replay and normal
DMA translation). That way we can report page faults in DMA translation
path and keep the replay path clean.
> I need to study the area a lot more since even the simplest/only current
> use of amdvi_page_fault() right now is not very clear to me.
>
> Alejandro
>
> P.S.
> Also confusing is this assertion in 2.5.3 IO_PAGE_FAULT Event:
>
> "I/O page faults detected for translation requests return a translation-
> not-present response (R=W=0) to the device and are not logged in the
> event log."
>
> so this suggests we should not emit a page fault i.e. don't log it in
> the event log.
>
Here "translation requests" means ATS translation requests and not the
DMA read/write. Basically IOMMU emits the IO_PAGE_FAULT when it fails to
translate the DMA request but returns "translation-not-present" response
to the device for ATS request
Regards
Sairaj Kodilkar
>
>>
>> Regards
>> Sairaj Kodilkar
>
>
Alejandro,
On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
> The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V,
> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
> TV=1 does not contain a valid address translation information. If a request
> requires a table walk, the walk is terminated when this condition is
> encountered.
>
> Do not assume that addresses for a device with DTE[TV]=0 are passed through
> (i.e. not remapped) and instead terminate the page table walk early.
>
> Cc: qemu-stable@nongnu.org
> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
I did double check and I think this patch matches HW behaviour. I did run few
tests w/ this series. It seems to work fine for me.
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
> hw/i386/amd_iommu.c | 88 +++++++++++++++++++++++++--------------------
> 1 file changed, 49 insertions(+), 39 deletions(-)
>
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index cf00450ebe..31d5522a62 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -932,51 +932,61 @@ static void amdvi_page_walk(AMDVIAddressSpace *as, uint64_t *dte,
> uint64_t pte = dte[0], pte_addr, page_mask;
>
> /* make sure the DTE has TV = 1 */
> - if (pte & AMDVI_DEV_TRANSLATION_VALID) {
> - level = get_pte_translation_mode(pte);
> - if (level >= 7) {
> - trace_amdvi_mode_invalid(level, addr);
> + if (!(pte & AMDVI_DEV_TRANSLATION_VALID)) {
> + /*
> + * A DTE with V=1, TV=0 does not have a valid Page Table Root Pointer.
> + * An IOMMU processing a request that requires a table walk terminates
> + * the walk when it encounters this condition. Do the same and return
> + * instead of assuming that the address is forwarded without translation
> + * i.e. the passthrough case, as it is done for the case where DTE[V]=0.
> + */
> + return;
> + }
> +
> + level = get_pte_translation_mode(pte);
> + if (level >= 7) {
> + trace_amdvi_mode_invalid(level, addr);
> + return;
> + }
> + if (level == 0) {
> + goto no_remap;
> + }
> +
> + /* we are at the leaf page table or page table encodes a huge page */
> + do {
> + pte_perms = amdvi_get_perms(pte);
> + present = pte & 1;
> + if (!present || perms != (perms & pte_perms)) {
> + amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> + trace_amdvi_page_fault(addr);
> return;
> }
> - if (level == 0) {
> - goto no_remap;
> - }
>
> - /* we are at the leaf page table or page table encodes a huge page */
> - do {
> - pte_perms = amdvi_get_perms(pte);
> - present = pte & 1;
> - if (!present || perms != (perms & pte_perms)) {
> - amdvi_page_fault(as->iommu_state, as->devfn, addr, perms);
> - trace_amdvi_page_fault(addr);
> - return;
> - }
> -
> - /* go to the next lower level */
> - pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
> - /* add offset and load pte */
> - pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> - pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
> - if (!pte) {
> - return;
> - }
> - oldlevel = level;
> - level = get_pte_translation_mode(pte);
> - } while (level > 0 && level < 7);
> -
> - if (level == 0x7) {
> - page_mask = pte_override_page_mask(pte);
> - } else {
> - page_mask = pte_get_page_mask(oldlevel);
> + /* go to the next lower level */
> + pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
> + /* add offset and load pte */
> + pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> + pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
> + if (!pte) {
> + return;
> }
> + oldlevel = level;
> + level = get_pte_translation_mode(pte);
Unrelated to this patch.. We may want to add check to make sure level is
decreasing. Something like
if (oldlevel <= level)
error out
Otherwise bad page table can cause the issue.
-Vasant
> + } while (level > 0 && level < 7);
>
> - /* get access permissions from pte */
> - ret->iova = addr & page_mask;
> - ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
> - ret->addr_mask = ~page_mask;
> - ret->perm = amdvi_get_perms(pte);
> - return;
> + if (level == 0x7) {
> + page_mask = pte_override_page_mask(pte);
> + } else {
> + page_mask = pte_get_page_mask(oldlevel);
> }
> +
> + /* get access permissions from pte */
> + ret->iova = addr & page_mask;
> + ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) & page_mask;
> + ret->addr_mask = ~page_mask;
> + ret->perm = amdvi_get_perms(pte);
> + return;
> +
> no_remap:
> ret->iova = addr & AMDVI_PAGE_MASK_4K;
> ret->translated_addr = addr & AMDVI_PAGE_MASK_4K;
On 3/19/25 2:06 AM, Vasant Hegde wrote:
> Alejandro,
>
>
> On 3/11/2025 8:54 PM, Alejandro Jimenez wrote:
>> The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V,
>> TV, and GV Fields in Device Table Entry), specifies that a DTE with V=0,
>> TV=1 does not contain a valid address translation information. If a request
>> requires a table walk, the walk is terminated when this condition is
>> encountered.
>>
>> Do not assume that addresses for a device with DTE[TV]=0 are passed through
>> (i.e. not remapped) and instead terminate the page table walk early.
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>
> I did double check and I think this patch matches HW behaviour. I did run few
> tests w/ this series. It seems to work fine for me.
Thank you for confirming and reviewing.
My understanding from the Linux driver code is that the TV field is not
really used to control behavior and DTE[V] ends up doing that job
(though I need to take a closer look at the more involved scenarios with
SNP enabled). So I have not seen the case with V=1,TV=0 in the basic
scenarios I have run, but my intention is to adhere to the spec as
closely as possible. So I appreciate the confirmation.
>
>
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
>
>
>
[...]
>> + oldlevel = level;
>> + level = get_pte_translation_mode(pte);
>
> Unrelated to this patch.. We may want to add check to make sure level is
> decreasing. Something like
> if (oldlevel <= level)
> error out
>
> Otherwise bad page table can cause the issue.
ACK. I have considered replacing this (amdvi_page_walk()) with code that
more closely mimics the fetch_pte() algorithm in the Linux driver. I've
been using it on my (coming soon) RFC for DMA remapping, and comparing
against amdvi_page_walk() for verification. I put in assertions like the
one you proposed in that new code. But if that approach is not taken,
I'll add the safety checks you suggested here.
Thank you,
Alejandro
>
> -Vasant
>
>
>> + } while (level > 0 && level < 7);
>>
© 2016 - 2026 Red Hat, Inc.