[PATCH V3] hw/riscv/riscv-iommu: Fixup PDT Nested Walk

guoren@kernel.org posted 1 patch 4 months, 3 weeks ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20250913041233.972870-1-guoren@kernel.org
Maintainers: Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <alistair.francis@wdc.com>, Weiwei Li <liwei1518@gmail.com>, Daniel Henrique Barboza <dbarboza@ventanamicro.com>, Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
hw/riscv/riscv-iommu.c | 141 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 139 insertions(+), 2 deletions(-)
[PATCH V3] hw/riscv/riscv-iommu: Fixup PDT Nested Walk
Posted by guoren@kernel.org 4 months, 3 weeks ago
From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>

The current implementation is wrong when iohgatp != bare. The RISC-V
IOMMU specification defines that the PDT is based on GPA, not
SPA. This patch fixes the problem, making the PDT walk correct
when the G-stage table walk is enabled.

Fixes: 0c54acb8243d ("hw/riscv: add RISC-V IOMMU base emulation")
Cc: qemu-stable@nongnu.org
Cc: Sebastien Boeuf <seb@rivosinc.com>
Cc: Tomasz Jeznach <tjeznach@rivosinc.com>
Reviewed-by: Weiwei Li <liwei1518@gmail.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
---
Changes in V3:
 - Fixup inner non-leaf walking for 4KB-align.
 - Add two Reviewed-by tags.

Changes in V2:
 - Remove nested param to make patch clearer.
---
 hw/riscv/riscv-iommu.c | 141 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index 96a7fbdefcf3..ddb5236f55d1 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -866,6 +866,143 @@ static bool riscv_iommu_validate_process_ctx(RISCVIOMMUState *s,
     return true;
 }
 
+/**
+ * pdt_memory_read: PDT wrapper of dma_memory_read.
+ *
+ * @s: IOMMU Device State
+ * @ctx: Device Translation Context with devid and pasid set
+ * @addr: address within that address space
+ * @buf: buffer with the data transferred
+ * @len: length of the data transferred
+ * @attrs: memory transaction attributes
+ */
+static MemTxResult pdt_memory_read(RISCVIOMMUState *s,
+                                   RISCVIOMMUContext *ctx,
+                                   dma_addr_t addr,
+                                   void *buf, dma_addr_t len,
+                                   MemTxAttrs attrs)
+{
+    uint64_t gatp_mode, pte;
+    struct {
+        unsigned char step;
+        unsigned char levels;
+        unsigned char ptidxbits;
+        unsigned char ptesize;
+    } sc;
+    MemTxResult ret;
+    dma_addr_t base = addr;
+
+    /* G stages translation mode */
+    gatp_mode = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
+    if (gatp_mode == RISCV_IOMMU_DC_IOHGATP_MODE_BARE)
+        goto out;
+
+    /* G stages translation tables root pointer */
+    base = PPN_PHYS(get_field(ctx->gatp, RISCV_IOMMU_ATP_PPN_FIELD));
+
+    /* Start at step 0 */
+    sc.step = 0;
+
+    if (s->fctl & RISCV_IOMMU_FCTL_GXL) {
+        /* 32bit mode for GXL == 1 */
+        switch (gatp_mode) {
+        case RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4:
+            if (!(s->cap & RISCV_IOMMU_CAP_SV32X4)) {
+                return MEMTX_ACCESS_ERROR;
+            }
+            sc.levels    = 2;
+            sc.ptidxbits = 10;
+            sc.ptesize   = 4;
+            break;
+        default:
+            return MEMTX_ACCESS_ERROR;
+        }
+    } else {
+        /* 64bit mode for GXL == 0 */
+        switch (gatp_mode) {
+        case RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4:
+            if (!(s->cap & RISCV_IOMMU_CAP_SV39X4)) {
+                return MEMTX_ACCESS_ERROR;
+            }
+            sc.levels    = 3;
+            sc.ptidxbits = 9;
+            sc.ptesize   = 8;
+            break;
+        case RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4:
+            if (!(s->cap & RISCV_IOMMU_CAP_SV48X4)) {
+                return MEMTX_ACCESS_ERROR;
+            }
+            sc.levels    = 4;
+            sc.ptidxbits = 9;
+            sc.ptesize   = 8;
+            break;
+        case RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4:
+            if (!(s->cap & RISCV_IOMMU_CAP_SV57X4)) {
+                return MEMTX_ACCESS_ERROR;
+            }
+            sc.levels    = 5;
+            sc.ptidxbits = 9;
+            sc.ptesize   = 8;
+            break;
+        default:
+            return MEMTX_ACCESS_ERROR;
+        }
+    }
+
+    do {
+        const unsigned va_bits = (sc.step ? 0 : 2) + sc.ptidxbits;
+        const unsigned va_skip = TARGET_PAGE_BITS + sc.ptidxbits *
+                                 (sc.levels - 1 - sc.step);
+        const unsigned idx = (addr >> va_skip) & ((1 << va_bits) - 1);
+        const dma_addr_t pte_addr = base + idx * sc.ptesize;
+
+        /* Address range check before first level lookup */
+        if (!sc.step) {
+            const uint64_t va_mask = (1ULL << (va_skip + va_bits)) - 1;
+            if ((addr & va_mask) != addr) {
+                return MEMTX_ACCESS_ERROR;
+            }
+        }
+
+        /* Read page table entry */
+        if (sc.ptesize == 4) {
+            uint32_t pte32 = 0;
+            ret = ldl_le_dma(s->target_as, pte_addr, &pte32, attrs);
+            pte = pte32;
+        } else {
+            ret = ldq_le_dma(s->target_as, pte_addr, &pte, attrs);
+        }
+        if (ret != MEMTX_OK)
+            return ret;
+
+        sc.step++;
+        hwaddr ppn = pte >> PTE_PPN_SHIFT;
+
+        if (!(pte & PTE_V)) {
+            return MEMTX_ACCESS_ERROR; /* Invalid PTE */
+        } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
+            base = PPN_PHYS(ppn); /* Inner PTE, continue walking */
+        } else if ((pte & (PTE_R | PTE_W | PTE_X)) == PTE_W) {
+            return MEMTX_ACCESS_ERROR; /* Reserved leaf PTE flags: PTE_W */
+        } else if ((pte & (PTE_R | PTE_W | PTE_X)) == (PTE_W | PTE_X)) {
+            return MEMTX_ACCESS_ERROR; /* Reserved leaf PTE flags: PTE_W + PTE_X */
+        } else if (ppn & ((1ULL << (va_skip - TARGET_PAGE_BITS)) - 1)) {
+            return MEMTX_ACCESS_ERROR; /* Misaligned PPN */
+        } else {
+            /* Leaf PTE, translation completed. */
+            base = PPN_PHYS(ppn) | (addr & ((1ULL << va_skip) - 1));
+            break;
+        }
+
+        if (sc.step == sc.levels) {
+            return MEMTX_ACCESS_ERROR; /* Can't find leaf PTE */
+        }
+    } while (1);
+
+out:
+    return dma_memory_read(s->target_as, base, buf, len, attrs);
+}
+
 /*
  * RISC-V IOMMU Device Context Loopkup - Device Directory Tree Walk
  *
@@ -1038,7 +1175,7 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
          */
         const int split = depth * 9 + 8;
         addr |= ((ctx->process_id >> split) << 3) & ~TARGET_PAGE_MASK;
-        if (dma_memory_read(s->target_as, addr, &de, sizeof(de),
+        if (pdt_memory_read(s, ctx, addr, &de, sizeof(de),
                             MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
             return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT;
         }
@@ -1053,7 +1190,7 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
 
     /* Leaf entry in PDT */
     addr |= (ctx->process_id << 4) & ~TARGET_PAGE_MASK;
-    if (dma_memory_read(s->target_as, addr, &dc.ta, sizeof(uint64_t) * 2,
+    if (pdt_memory_read(s, ctx, addr, &dc.ta, sizeof(uint64_t) * 2,
                         MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
         return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT;
     }
-- 
2.40.1
Re: [PATCH V3] hw/riscv/riscv-iommu: Fixup PDT Nested Walk
Posted by Alistair Francis 4 months, 1 week ago
On Sat, Sep 13, 2025 at 2:15 PM <guoren@kernel.org> wrote:
>
> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
>
> Current implementation is wrong when iohgatp != bare. The RISC-V
> IOMMU specification has defined that the PDT is based on GPA, not
> SPA. So this patch fixes the problem, making PDT walk correctly
> when the G-stage table walk is enabled.
>
> Fixes: 0c54acb8243d ("hw/riscv: add RISC-V IOMMU base emulation")
> Cc: qemu-stable@nongnu.org
> Cc: Sebastien Boeuf <seb@rivosinc.com>
> Cc: Tomasz Jeznach <tjeznach@rivosinc.com>
> Reviewed-by: Weiwei Li <liwei1518@gmail.com>
> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>

Thanks!

Applied to riscv-to-apply.next

Alistair

Re: [PATCH V3] hw/riscv/riscv-iommu: Fixup PDT Nested Walk
Posted by fangyu.yu@linux.alibaba.com 4 months, 1 week ago
Enabled PDT in the VM and verified that this patch works correctly.

Tested-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>

Thanks,
Fangyu

Re: [PATCH V3] hw/riscv/riscv-iommu: Fixup PDT Nested Walk
Posted by cp0613@linux.alibaba.com 4 months, 1 week ago
On Sat, 13 Sep 2025 00:12:33 -0400, guoren@kernel.org wrote:

> Changes in V3:
>  - Fixup inner non-leaf walking for 4KB-align.
>  - Add two Reviewed-by tags.

> Changes in V2:
>  - Remove nested param to make patch clearer.
> ---
>  hw/riscv/riscv-iommu.c | 141 ++++++++++++++++++++
>  1 file changed, 139 insertions(+), 2 deletions(-)

Tested-by: Chen Pei <cp0613@linux.alibaba.com>

Thanks,
Pei
Re: [PATCH V3] hw/riscv/riscv-iommu: Fixup PDT Nested Walk
Posted by Michael Tokarev 4 months, 1 week ago
On 13.09.2025 07:12, guoren@kernel.org wrote:
> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> 
> Current implementation is wrong when iohgatp != bare. The RISC-V
> IOMMU specification has defined that the PDT is based on GPA, not
> SPA. So this patch fixes the problem, making PDT walk correctly
> when the G-stage table walk is enabled.
> 
> Fixes: 0c54acb8243d ("hw/riscv: add RISC-V IOMMU base emulation")
> Cc: qemu-stable@nongnu.org
> Cc: Sebastien Boeuf <seb@rivosinc.com>
> Cc: Tomasz Jeznach <tjeznach@rivosinc.com>
> Reviewed-by: Weiwei Li <liwei1518@gmail.com>
> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>

Ping?  Can we merge this one to master, so I can pick it up for
the stable branches?

Thanks,

/mjt