Hi,

This is the second version of the work Tomasz sent in July 2023 [1].
I'll be helping Tomasz upstream it.

The core emulation code is left unchanged, but a few tweaks were made
in v2:

- The most notable difference in this version is that the code was
  split into smaller chunks. Patch 03 is still a 1700-line patch, which
  is an improvement over the 3800-line patch from v1, but we can only
  go so far when splitting the core components of the emulation. The
  reality is that the IOMMU emulation is a rather complex piece of
  software and there's not much we can do to alleviate that;

- I'm not contributing the HPM support that was present in v1. Dropping
  it shaved 600 lines of code off the series, which is already large
  enough as is. We'll introduce HPM in later versions or as a
  follow-up;

- The riscv-iommu-bits.h header was also trimmed. I removed 300 or so
  lines from it, all of them definitions that the emulation isn't
  using. The header will eventually be imported from the Linux driver
  (not upstream yet), so for now we can live with a trimmed header for
  the emulation usage alone;

- I added libqos tests for the riscv-iommu-pci device. The idea of
  these tests is to give us more confidence in the emulation code;

- 'edu' device support. The support was retrieved from Tomasz's EDU
  branch [2]. This device can then be used to test PCI passthrough and
  exercise the IOMMU.

Patches based on alistair/riscv-to-apply.next.

[1] https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjeznach@rivosinc.com/
[2] https://github.com/tjeznach/qemu.git, branch 'riscv_iommu_edu_impl'

Andrew Jones (1):
  hw/riscv/riscv-iommu: Add another irq for mrif notifications

Daniel Henrique Barboza (2):
  test/qtest: add riscv-iommu-pci tests
  qtest/riscv-iommu-test: add init queues test

Tomasz Jeznach (12):
  exec/memtxattr: add process identifier to the transaction attributes
  hw/riscv: add riscv-iommu-bits.h
  hw/riscv: add RISC-V IOMMU base emulation
  hw/riscv: add riscv-iommu-pci device
  hw/riscv: add riscv-iommu-sys platform device
  hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
  hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
  hw/riscv/riscv-iommu: add s-stage and g-stage support
  hw/riscv/riscv-iommu: add ATS support
  hw/riscv/riscv-iommu: add DBG support
  hw/misc: EDU: added PASID support
  hw/misc: EDU: add ATS/PRI capability

 hw/misc/edu.c                    |  297 ++++-
 hw/riscv/Kconfig                 |    4 +
 hw/riscv/meson.build             |    1 +
 hw/riscv/riscv-iommu-bits.h      |  407 ++++++
 hw/riscv/riscv-iommu-pci.c       |  173 +++
 hw/riscv/riscv-iommu-sys.c       |   93 ++
 hw/riscv/riscv-iommu.c           | 2085 ++++++++++++++++++++++++++++++
 hw/riscv/riscv-iommu.h           |  146 +++
 hw/riscv/trace-events            |   15 +
 hw/riscv/trace.h                 |    2 +
 hw/riscv/virt.c                  |   33 +-
 include/exec/memattrs.h          |    5 +
 include/hw/riscv/iommu.h         |   40 +
 meson.build                      |    1 +
 tests/qtest/libqos/meson.build   |    4 +
 tests/qtest/libqos/riscv-iommu.c |   79 ++
 tests/qtest/libqos/riscv-iommu.h |   96 ++
 tests/qtest/meson.build          |    1 +
 tests/qtest/riscv-iommu-test.c   |  234 ++++
 19 files changed, 3704 insertions(+), 12 deletions(-)
 create mode 100644 hw/riscv/riscv-iommu-bits.h
 create mode 100644 hw/riscv/riscv-iommu-pci.c
 create mode 100644 hw/riscv/riscv-iommu-sys.c
 create mode 100644 hw/riscv/riscv-iommu.c
 create mode 100644 hw/riscv/riscv-iommu.h
 create mode 100644 hw/riscv/trace-events
 create mode 100644 hw/riscv/trace.h
 create mode 100644 include/hw/riscv/iommu.h
 create mode 100644 tests/qtest/libqos/riscv-iommu.c
 create mode 100644 tests/qtest/libqos/riscv-iommu.h
 create mode 100644 tests/qtest/riscv-iommu-test.c

--
2.43.2
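For reviewers who want to try the device before digging into the libqos
tests, a minimal qtest along the lines below boots the 'virt' machine
with the riscv-iommu-pci device attached. This is only a sketch under
the assumptions of this series (the riscv-iommu-pci device name and the
virt machine target come from the patches; the test name and body are
illustrative and not part of the series):

/*
 * Hypothetical smoke test: checks that the 'virt' machine accepts a
 * riscv-iommu-pci device. Register-level checks would go through the
 * device's PCI BAR, as the libqos tests in this series do.
 */
#include "qemu/osdep.h"
#include "libqtest.h"

static void test_iommu_smoke(void)
{
    QTestState *qts = qtest_init("-machine virt -device riscv-iommu-pci");

    qtest_quit(qts);
}

int main(int argc, char **argv)
{
    g_test_init(&argc, &argv, NULL);
    qtest_add_func("/riscv-iommu/smoke", test_iommu_smoke);
    return g_test_run();
}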
From: Tomasz Jeznach <tjeznach@rivosinc.com>

Extend the memory transaction attributes with a process identifier,
allowing per-request address translation logic to use requester_id /
process_id to identify the memory mapping (e.g. enabling IOMMU w/
PASID translations).

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 include/exec/memattrs.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -XXX,XX +XXX,XX @@ typedef struct MemTxAttrs {
     unsigned int memory:1;
     /* Requester ID (for MSI for example) */
     unsigned int requester_id:16;
+
+    /*
+     * PCI PASID support: Limited to 8 bits process identifier.
+     */
+    unsigned int pasid:8;
 } MemTxAttrs;

 /* Bus masters which don't specify any attributes will get this,
--
2.43.2
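To illustrate how a bus master would consume the new attribute, here is
a minimal sketch, assuming a PCI device model that wants its DMA tagged
with a process id (the function and the pasid value are made up for
illustration; dma_memory_write(), pci_get_address_space() and
pci_requester_id() are existing QEMU helpers):

#include "qemu/osdep.h"
#include "hw/pci/pci.h"
#include "sysemu/dma.h"

/* Issue a DMA write carrying both the requester BDF and a PASID. */
static MemTxResult issue_tagged_dma(PCIDevice *pdev, dma_addr_t addr,
                                    const void *buf, dma_addr_t len,
                                    uint8_t pasid)
{
    MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;

    attrs.unspecified = 0;
    attrs.requester_id = pci_requester_id(pdev); /* initiator's BDF */
    attrs.pasid = pasid;                         /* new field added here */

    return dma_memory_write(pci_get_address_space(pdev), addr, buf,
                            len, attrs);
}

An IOMMU sitting downstream can then key its device/process directory
lookup on attrs.requester_id and attrs.pasid, which is what the
riscv-iommu emulation does later in this series.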
From: Tomasz Jeznach <tjeznach@rivosinc.com> This header will be used by the RISC-V IOMMU emulation to be added in the next patch. Due to its size it's being sent separately for easier review. One thing to notice is that this header can be replaced by the future Linux RISC-V IOMMU driver header, which would become a linux-header we would import instead of keeping our own. The Linux implementation isn't upstream yet, so for now we'll have to manage riscv-iommu-bits.h. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/riscv-iommu-bits.h | 335 ++++++++++++++++++++++++++++++++++++ 1 file changed, 335 insertions(+) create mode 100644 hw/riscv/riscv-iommu-bits.h diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu-bits.h @@ -XXX,XX +XXX,XX @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright © 2022-2023 Rivos Inc. + * Copyright © 2023 FORTH-ICS/CARV + * Copyright © 2023 RISC-V IOMMU Task Group + * + * RISC-V Ziommu - Register Layout and Data Structures. + * + * Based on the IOMMU spec version 1.0, 3/2023 + * https://github.com/riscv-non-isa/riscv-iommu + */ + +#ifndef HW_RISCV_IOMMU_BITS_H +#define HW_RISCV_IOMMU_BITS_H + +#include "qemu/osdep.h" + +#define RISCV_IOMMU_SPEC_DOT_VER 0x010 + +#ifndef GENMASK_ULL +#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l)) +#endif + +/* + * struct riscv_iommu_fq_record - Fault/Event Queue Record + * See section 3.2 for more info. + */ +struct riscv_iommu_fq_record { + uint64_t hdr; + uint64_t _reserved; + uint64_t iotval; + uint64_t iotval2; +}; +/* Header fields */ +#define RISCV_IOMMU_FQ_HDR_CAUSE GENMASK_ULL(11, 0) +#define RISCV_IOMMU_FQ_HDR_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_FQ_HDR_PV BIT_ULL(32) +#define RISCV_IOMMU_FQ_HDR_TTYPE GENMASK_ULL(39, 34) +#define RISCV_IOMMU_FQ_HDR_DID GENMASK_ULL(63, 40) + +/* + * struct riscv_iommu_pq_record - PCIe Page Request record + * For more info on the PCIe Page Request queue see chapter 3.3. 
+ */ +struct riscv_iommu_pq_record { + uint64_t hdr; + uint64_t payload; +}; +/* Header fields */ +#define RISCV_IOMMU_PREQ_HDR_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_PREQ_HDR_PV BIT_ULL(32) +#define RISCV_IOMMU_PREQ_HDR_PRIV BIT_ULL(33) +#define RISCV_IOMMU_PREQ_HDR_EXEC BIT_ULL(34) +#define RISCV_IOMMU_PREQ_HDR_DID GENMASK_ULL(63, 40) +/* Payload fields */ +#define RISCV_IOMMU_PREQ_PAYLOAD_M GENMASK_ULL(2, 0) + +/* Common field positions */ +#define RISCV_IOMMU_PPN_FIELD GENMASK_ULL(53, 10) +#define RISCV_IOMMU_QUEUE_LOGSZ_FIELD GENMASK_ULL(4, 0) +#define RISCV_IOMMU_QUEUE_INDEX_FIELD GENMASK_ULL(31, 0) +#define RISCV_IOMMU_QUEUE_ENABLE BIT(0) +#define RISCV_IOMMU_QUEUE_INTR_ENABLE BIT(1) +#define RISCV_IOMMU_QUEUE_MEM_FAULT BIT(8) +#define RISCV_IOMMU_QUEUE_OVERFLOW BIT(9) +#define RISCV_IOMMU_QUEUE_ACTIVE BIT(16) +#define RISCV_IOMMU_QUEUE_BUSY BIT(17) +#define RISCV_IOMMU_ATP_PPN_FIELD GENMASK_ULL(43, 0) +#define RISCV_IOMMU_ATP_MODE_FIELD GENMASK_ULL(63, 60) + +/* 5.3 IOMMU Capabilities (64bits) */ +#define RISCV_IOMMU_REG_CAP 0x0000 +#define RISCV_IOMMU_CAP_VERSION GENMASK_ULL(7, 0) +#define RISCV_IOMMU_CAP_MSI_FLAT BIT_ULL(22) +#define RISCV_IOMMU_CAP_MSI_MRIF BIT_ULL(23) +#define RISCV_IOMMU_CAP_IGS GENMASK_ULL(29, 28) +#define RISCV_IOMMU_CAP_PAS GENMASK_ULL(37, 32) +#define RISCV_IOMMU_CAP_PD8 BIT_ULL(38) + +/* 5.4 Features control register (32bits) */ +#define RISCV_IOMMU_REG_FCTL 0x0008 + +/* 5.5 Device-directory-table pointer (64bits) */ +#define RISCV_IOMMU_REG_DDTP 0x0010 +#define RISCV_IOMMU_DDTP_MODE GENMASK_ULL(3, 0) +#define RISCV_IOMMU_DDTP_BUSY BIT_ULL(4) +#define RISCV_IOMMU_DDTP_PPN RISCV_IOMMU_PPN_FIELD + +enum riscv_iommu_ddtp_modes { + RISCV_IOMMU_DDTP_MODE_OFF = 0, + RISCV_IOMMU_DDTP_MODE_BARE = 1, + RISCV_IOMMU_DDTP_MODE_1LVL = 2, + RISCV_IOMMU_DDTP_MODE_2LVL = 3, + RISCV_IOMMU_DDTP_MODE_3LVL = 4, + RISCV_IOMMU_DDTP_MODE_MAX = 4 +}; + +/* 5.6 Command Queue Base (64bits) */ +#define RISCV_IOMMU_REG_CQB 0x0018 +#define RISCV_IOMMU_CQB_LOG2SZ RISCV_IOMMU_QUEUE_LOGSZ_FIELD +#define RISCV_IOMMU_CQB_PPN RISCV_IOMMU_PPN_FIELD + +/* 5.7 Command Queue head (32bits) */ +#define RISCV_IOMMU_REG_CQH 0x0020 + +/* 5.8 Command Queue tail (32bits) */ +#define RISCV_IOMMU_REG_CQT 0x0024 + +/* 5.9 Fault Queue Base (64bits) */ +#define RISCV_IOMMU_REG_FQB 0x0028 +#define RISCV_IOMMU_FQB_LOG2SZ RISCV_IOMMU_QUEUE_LOGSZ_FIELD +#define RISCV_IOMMU_FQB_PPN RISCV_IOMMU_PPN_FIELD + +/* 5.10 Fault Queue Head (32bits) */ +#define RISCV_IOMMU_REG_FQH 0x0030 + +/* 5.11 Fault Queue tail (32bits) */ +#define RISCV_IOMMU_REG_FQT 0x0034 + +/* 5.12 Page Request Queue base (64bits) */ +#define RISCV_IOMMU_REG_PQB 0x0038 +#define RISCV_IOMMU_PQB_LOG2SZ RISCV_IOMMU_QUEUE_LOGSZ_FIELD +#define RISCV_IOMMU_PQB_PPN RISCV_IOMMU_PPN_FIELD + +/* 5.13 Page Request Queue head (32bits) */ +#define RISCV_IOMMU_REG_PQH 0x0040 + +/* 5.14 Page Request Queue tail (32bits) */ +#define RISCV_IOMMU_REG_PQT 0x0044 + +/* 5.15 Command Queue CSR (32bits) */ +#define RISCV_IOMMU_REG_CQCSR 0x0048 +#define RISCV_IOMMU_CQCSR_CQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_CQCSR_CIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_CQCSR_CQMF RISCV_IOMMU_QUEUE_MEM_FAULT +#define RISCV_IOMMU_CQCSR_CMD_TO BIT(9) +#define RISCV_IOMMU_CQCSR_CMD_ILL BIT(10) +#define RISCV_IOMMU_CQCSR_CQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_CQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +/* 5.16 Fault Queue CSR (32bits) */ +#define RISCV_IOMMU_REG_FQCSR 0x004C +#define RISCV_IOMMU_FQCSR_FQEN RISCV_IOMMU_QUEUE_ENABLE +#define 
RISCV_IOMMU_FQCSR_FIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_FQCSR_FQMF RISCV_IOMMU_QUEUE_MEM_FAULT +#define RISCV_IOMMU_FQCSR_FQOF RISCV_IOMMU_QUEUE_OVERFLOW +#define RISCV_IOMMU_FQCSR_FQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_FQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +/* 5.17 Page Request Queue CSR (32bits) */ +#define RISCV_IOMMU_REG_PQCSR 0x0050 +#define RISCV_IOMMU_PQCSR_PQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_PQCSR_PIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_PQCSR_PQMF RISCV_IOMMU_QUEUE_MEM_FAULT +#define RISCV_IOMMU_PQCSR_PQOF RISCV_IOMMU_QUEUE_OVERFLOW +#define RISCV_IOMMU_PQCSR_PQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_PQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +/* 5.18 Interrupt Pending Status (32bits) */ +#define RISCV_IOMMU_REG_IPSR 0x0054 + +enum { + RISCV_IOMMU_INTR_CQ, + RISCV_IOMMU_INTR_FQ, + RISCV_IOMMU_INTR_PM, + RISCV_IOMMU_INTR_PQ, + RISCV_IOMMU_INTR_COUNT +}; + +/* 5.27 Interrupt cause to vector (64bits) */ +#define RISCV_IOMMU_REG_IVEC 0x02F8 + +/* 5.28 MSI Configuration table (32 * 64bits) */ +#define RISCV_IOMMU_REG_MSI_CONFIG 0x0300 + +#define RISCV_IOMMU_REG_SIZE 0x1000 + +#define RISCV_IOMMU_DDTE_VALID BIT_ULL(0) +#define RISCV_IOMMU_DDTE_PPN RISCV_IOMMU_PPN_FIELD + +/* Struct riscv_iommu_dc - Device Context - section 2.1 */ +struct riscv_iommu_dc { + uint64_t tc; + uint64_t iohgatp; + uint64_t ta; + uint64_t fsc; + uint64_t msiptp; + uint64_t msi_addr_mask; + uint64_t msi_addr_pattern; + uint64_t _reserved; +}; + +/* Translation control fields */ +#define RISCV_IOMMU_DC_TC_V BIT_ULL(0) +#define RISCV_IOMMU_DC_TC_DTF BIT_ULL(4) +#define RISCV_IOMMU_DC_TC_PDTV BIT_ULL(5) +#define RISCV_IOMMU_DC_TC_PRPR BIT_ULL(6) +#define RISCV_IOMMU_DC_TC_DPE BIT_ULL(9) +#define RISCV_IOMMU_DC_TC_SXL BIT_ULL(11) + +/* Second-stage (aka G-stage) context fields */ +#define RISCV_IOMMU_DC_IOHGATP_PPN RISCV_IOMMU_ATP_PPN_FIELD +#define RISCV_IOMMU_DC_IOHGATP_GSCID GENMASK_ULL(59, 44) +#define RISCV_IOMMU_DC_IOHGATP_MODE RISCV_IOMMU_ATP_MODE_FIELD + +enum riscv_iommu_dc_iohgatp_modes { + RISCV_IOMMU_DC_IOHGATP_MODE_BARE = 0, + RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4 = 8, + RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4 = 8, + RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4 = 9, + RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4 = 10 +}; + +/* Translation attributes fields */ +#define RISCV_IOMMU_DC_TA_PSCID GENMASK_ULL(31, 12) + +/* First-stage context fields */ +#define RISCV_IOMMU_DC_FSC_PPN RISCV_IOMMU_ATP_PPN_FIELD +#define RISCV_IOMMU_DC_FSC_MODE RISCV_IOMMU_ATP_MODE_FIELD + +/* Generic I/O MMU command structure - check section 3.1 */ +struct riscv_iommu_command { + uint64_t dword0; + uint64_t dword1; +}; + +#define RISCV_IOMMU_CMD_OPCODE GENMASK_ULL(6, 0) +#define RISCV_IOMMU_CMD_FUNC GENMASK_ULL(9, 7) + +#define RISCV_IOMMU_CMD_IOTINVAL_OPCODE 1 +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA 0 +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA 1 +#define RISCV_IOMMU_CMD_IOTINVAL_AV BIT_ULL(10) +#define RISCV_IOMMU_CMD_IOTINVAL_PSCID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_CMD_IOTINVAL_PSCV BIT_ULL(32) +#define RISCV_IOMMU_CMD_IOTINVAL_GV BIT_ULL(33) +#define RISCV_IOMMU_CMD_IOTINVAL_GSCID GENMASK_ULL(59, 44) + +#define RISCV_IOMMU_CMD_IOFENCE_OPCODE 2 +#define RISCV_IOMMU_CMD_IOFENCE_FUNC_C 0 +#define RISCV_IOMMU_CMD_IOFENCE_AV BIT_ULL(10) +#define RISCV_IOMMU_CMD_IOFENCE_DATA GENMASK_ULL(63, 32) + +#define RISCV_IOMMU_CMD_IODIR_OPCODE 3 +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT 0 +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT 1 +#define RISCV_IOMMU_CMD_IODIR_PID GENMASK_ULL(31, 12) 
+#define RISCV_IOMMU_CMD_IODIR_DV BIT_ULL(33) +#define RISCV_IOMMU_CMD_IODIR_DID GENMASK_ULL(63, 40) + +enum riscv_iommu_dc_fsc_atp_modes { + RISCV_IOMMU_DC_FSC_MODE_BARE = 0, + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8, + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 = 8, + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48 = 9, + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 = 10, + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8 = 1, + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17 = 2, + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20 = 3 +}; + +enum riscv_iommu_fq_causes { + RISCV_IOMMU_FQ_CAUSE_INST_FAULT = 1, + RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED = 4, + RISCV_IOMMU_FQ_CAUSE_RD_FAULT = 5, + RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED = 6, + RISCV_IOMMU_FQ_CAUSE_WR_FAULT = 7, + RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S = 12, + RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S = 13, + RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S = 15, + RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS = 20, + RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS = 21, + RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS = 23, + RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED = 256, + RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT = 257, + RISCV_IOMMU_FQ_CAUSE_DDT_INVALID = 258, + RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED = 259, + RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED = 260, + RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT = 261, + RISCV_IOMMU_FQ_CAUSE_MSI_INVALID = 262, + RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED = 263, + RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT = 264, + RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT = 265, + RISCV_IOMMU_FQ_CAUSE_PDT_INVALID = 266, + RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED = 267, + RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED = 268, + RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED = 269, + RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED = 270, + RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUIPTED = 271, + RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR = 272, + RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT = 273, + RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED = 274 +}; + +/* MSI page table pointer */ +#define RISCV_IOMMU_DC_MSIPTP_PPN RISCV_IOMMU_ATP_PPN_FIELD +#define RISCV_IOMMU_DC_MSIPTP_MODE RISCV_IOMMU_ATP_MODE_FIELD +#define RISCV_IOMMU_DC_MSIPTP_MODE_FLAT 1 + +/* Translation attributes fields */ +#define RISCV_IOMMU_PC_TA_V BIT_ULL(0) + +/* First stage context fields */ +#define RISCV_IOMMU_PC_FSC_PPN GENMASK_ULL(43, 0) + +enum riscv_iommu_fq_ttypes { + RISCV_IOMMU_FQ_TTYPE_NONE = 0, + RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH = 1, + RISCV_IOMMU_FQ_TTYPE_UADDR_RD = 2, + RISCV_IOMMU_FQ_TTYPE_UADDR_WR = 3, + RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5, + RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6, + RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7, + RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 8, +}; + +/* Fields on pte */ +#define RISCV_IOMMU_MSI_PTE_V BIT_ULL(0) +#define RISCV_IOMMU_MSI_PTE_M GENMASK_ULL(2, 1) + +#define RISCV_IOMMU_MSI_PTE_M_MRIF 1 +#define RISCV_IOMMU_MSI_PTE_M_BASIC 3 + +/* When M == 1 (MRIF mode) */ +#define RISCV_IOMMU_MSI_PTE_MRIF_ADDR GENMASK_ULL(53, 7) +/* When M == 3 (basic mode) */ +#define RISCV_IOMMU_MSI_PTE_PPN RISCV_IOMMU_PPN_FIELD +#define RISCV_IOMMU_MSI_PTE_C BIT_ULL(63) + +/* Fields on mrif_info */ +#define RISCV_IOMMU_MSI_MRIF_NID GENMASK_ULL(9, 0) +#define RISCV_IOMMU_MSI_MRIF_NPPN RISCV_IOMMU_PPN_FIELD +#define RISCV_IOMMU_MSI_MRIF_NID_MSB BIT_ULL(60) + +#endif /* _RISCV_IOMMU_BITS_H_ */ -- 2.43.2
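Most of this header consists of GENMASK_ULL() field masks that the
emulation pairs with RISC-V-style get_field()/set_field() helpers. As a
self-contained sketch of that pattern (field_get() mirrors the
get_field() helper from target/riscv/cpu_bits.h; the sample record
header value is made up):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))

/* Two of the fault queue header fields defined above */
#define RISCV_IOMMU_FQ_HDR_CAUSE GENMASK_ULL(11, 0)
#define RISCV_IOMMU_FQ_HDR_DID   GENMASK_ULL(63, 40)

/*
 * Mask the register, then divide by the mask's lowest set bit to
 * shift the field down to bit 0.
 */
static uint64_t field_get(uint64_t reg, uint64_t mask)
{
    return (reg & mask) / (mask & ~(mask << 1));
}

int main(void)
{
    /* Made-up record header: cause 258 (DDT_INVALID), devid 0x123456 */
    uint64_t hdr = ((uint64_t)0x123456 << 40) | 258;

    printf("cause=%" PRIu64 " devid=0x%" PRIx64 "\n",
           field_get(hdr, RISCV_IOMMU_FQ_HDR_CAUSE),
           field_get(hdr, RISCV_IOMMU_FQ_HDR_DID));
    return 0;
}

This prints "cause=258 devid=0x123456".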
From: Tomasz Jeznach <tjeznach@rivosinc.com> The RISC-V IOMMU specification is now ratified as per the RISC-V International process. The latest frozen specification can be found at: https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf Add the foundation of the device emulation for the RISC-V IOMMU, which includes an IOMMU that has no capabilities but MSI interrupt support and fault queue interfaces. We'll add more features incrementally in the next patches. Co-developed-by: Sebastien Boeuf <seb@rivosinc.com> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/Kconfig | 4 + hw/riscv/meson.build | 1 + hw/riscv/riscv-iommu.c | 1492 ++++++++++++++++++++++++++++++++++++++ hw/riscv/riscv-iommu.h | 141 ++++ hw/riscv/trace-events | 11 + hw/riscv/trace.h | 2 + include/hw/riscv/iommu.h | 36 + meson.build | 1 + 8 files changed, 1688 insertions(+) create mode 100644 hw/riscv/riscv-iommu.c create mode 100644 hw/riscv/riscv-iommu.h create mode 100644 hw/riscv/trace-events create mode 100644 hw/riscv/trace.h create mode 100644 include/hw/riscv/iommu.h diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/Kconfig +++ b/hw/riscv/Kconfig @@ -XXX,XX +XXX,XX @@ +config RISCV_IOMMU + bool + config RISCV_NUMA bool @@ -XXX,XX +XXX,XX @@ config RISCV_VIRT select SERIAL select RISCV_ACLINT select RISCV_APLIC + select RISCV_IOMMU select RISCV_IMSIC select SIFIVE_PLIC select SIFIVE_TEST diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/meson.build +++ b/hw/riscv/meson.build @@ -XXX,XX +XXX,XX @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c')) riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c')) riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c')) riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c')) +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c')) hw_arch += {'riscv': riscv_ss} diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of a RISC-V IOMMU (Ziommu) + * + * Copyright (C) 2021-2023, Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#include "qemu/osdep.h" +#include "qom/object.h" +#include "hw/pci/pci_bus.h" +#include "hw/pci/pci_device.h" +#include "hw/qdev-properties.h" +#include "hw/riscv/riscv_hart.h" +#include "migration/vmstate.h" +#include "qapi/error.h" +#include "qemu/timer.h" + +#include "cpu_bits.h" +#include "riscv-iommu.h" +#include "riscv-iommu-bits.h" +#include "trace.h" + +#define LIMIT_CACHE_CTX (1U << 7) +#define LIMIT_CACHE_IOT (1U << 20) + +/* Physical page number coversions */ +#define PPN_PHYS(ppn) ((ppn) << TARGET_PAGE_BITS) +#define PPN_DOWN(phy) ((phy) >> TARGET_PAGE_BITS) + +typedef struct RISCVIOMMUContext RISCVIOMMUContext; +typedef struct RISCVIOMMUEntry RISCVIOMMUEntry; + +/* Device assigned I/O address space */ +struct RISCVIOMMUSpace { + IOMMUMemoryRegion iova_mr; /* IOVA memory region for attached device */ + AddressSpace iova_as; /* IOVA address space for attached device */ + RISCVIOMMUState *iommu; /* Managing IOMMU device state */ + uint32_t devid; /* Requester identifier, AKA device_id */ + bool notifier; /* IOMMU unmap notifier enabled */ + QLIST_ENTRY(RISCVIOMMUSpace) list; +}; + +/* Device translation context state. */ +struct RISCVIOMMUContext { + uint64_t devid:24; /* Requester Id, AKA device_id */ + uint64_t pasid:20; /* Process Address Space ID */ + uint64_t __rfu:20; /* reserved */ + uint64_t tc; /* Translation Control */ + uint64_t ta; /* Translation Attributes */ + uint64_t msi_addr_mask; /* MSI filtering - address mask */ + uint64_t msi_addr_pattern; /* MSI filtering - address pattern */ + uint64_t msiptp; /* MSI redirection page table pointer */ +}; + +/* IOMMU index for transactions without PASID specified. */ +#define RISCV_IOMMU_NOPASID 0 + +static void riscv_iommu_notify(RISCVIOMMUState *s, int vec) +{ + const uint32_t ipsr = + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, (1 << vec), 0); + const uint32_t ivec = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_IVEC); + if (s->notify && !(ipsr & (1 << vec))) { + s->notify(s, (ivec >> (vec * 4)) & 0x0F); + } +} + +static void riscv_iommu_fault(RISCVIOMMUState *s, + struct riscv_iommu_fq_record *ev) +{ + uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR); + uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQH) & s->fq_mask; + uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQT) & s->fq_mask; + uint32_t next = (tail + 1) & s->fq_mask; + uint32_t devid = get_field(ev->hdr, RISCV_IOMMU_FQ_HDR_DID); + + trace_riscv_iommu_flt(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid), + PCI_FUNC(devid), ev->hdr, ev->iotval); + + if (!(ctrl & RISCV_IOMMU_FQCSR_FQON) || + !!(ctrl & (RISCV_IOMMU_FQCSR_FQOF | RISCV_IOMMU_FQCSR_FQMF))) { + return; + } + + if (head == next) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, + RISCV_IOMMU_FQCSR_FQOF, 0); + } else { + dma_addr_t addr = s->fq_addr + tail * sizeof(*ev); + if (dma_memory_write(s->target_as, addr, ev, sizeof(*ev), + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, + RISCV_IOMMU_FQCSR_FQMF, 0); + } else { + riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_FQT, next); + } + } + + if (ctrl & RISCV_IOMMU_FQCSR_FIE) { + riscv_iommu_notify(s, RISCV_IOMMU_INTR_FQ); + } +} + +static void riscv_iommu_pri(RISCVIOMMUState *s, + struct riscv_iommu_pq_record *pr) +{ + uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR); + uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQH) & s->pq_mask; + uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQT) & s->pq_mask; + uint32_t next = (tail + 1) & 
s->pq_mask; + uint32_t devid = get_field(pr->hdr, RISCV_IOMMU_PREQ_HDR_DID); + + trace_riscv_iommu_pri(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid), + PCI_FUNC(devid), pr->payload); + + if (!(ctrl & RISCV_IOMMU_PQCSR_PQON) || + !!(ctrl & (RISCV_IOMMU_PQCSR_PQOF | RISCV_IOMMU_PQCSR_PQMF))) { + return; + } + + if (head == next) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, + RISCV_IOMMU_PQCSR_PQOF, 0); + } else { + dma_addr_t addr = s->pq_addr + tail * sizeof(*pr); + if (dma_memory_write(s->target_as, addr, pr, sizeof(*pr), + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, + RISCV_IOMMU_PQCSR_PQMF, 0); + } else { + riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_PQT, next); + } + } + + if (ctrl & RISCV_IOMMU_PQCSR_PIE) { + riscv_iommu_notify(s, RISCV_IOMMU_INTR_PQ); + } +} + +/* Portable implementation of pext_u64, bit-mask extraction. */ +static uint64_t _pext_u64(uint64_t val, uint64_t ext) +{ + uint64_t ret = 0; + uint64_t rot = 1; + + while (ext) { + if (ext & 1) { + if (val & 1) { + ret |= rot; + } + rot <<= 1; + } + val >>= 1; + ext >>= 1; + } + + return ret; +} + +/* Check if GPA matches MSI/MRIF pattern. */ +static bool riscv_iommu_msi_check(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, + dma_addr_t gpa) +{ + if (get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_MODE) != + RISCV_IOMMU_DC_MSIPTP_MODE_FLAT) { + return false; /* Invalid MSI/MRIF mode */ + } + + if ((PPN_DOWN(gpa) ^ ctx->msi_addr_pattern) & ~ctx->msi_addr_mask) { + return false; /* GPA not in MSI range defined by AIA IMSIC rules. */ + } + + return true; +} + +/* RISCV IOMMU Address Translation Lookup - Page Table Walk */ +static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, + IOMMUTLBEntry *iotlb) +{ + /* Early check for MSI address match when IOVA == GPA */ + if (iotlb->perm & IOMMU_WO && + riscv_iommu_msi_check(s, ctx, iotlb->iova)) { + iotlb->target_as = &s->trap_as; + iotlb->translated_addr = iotlb->iova; + iotlb->addr_mask = ~TARGET_PAGE_MASK; + return 0; + } + + /* Exit early for pass-through mode. */ + iotlb->translated_addr = iotlb->iova; + iotlb->addr_mask = ~TARGET_PAGE_MASK; + /* Allow R/W in pass-through mode */ + iotlb->perm = IOMMU_RW; + return 0; +} + +/* Redirect MSI write for given GPA. 
*/ +static MemTxResult riscv_iommu_msi_write(RISCVIOMMUState *s, + RISCVIOMMUContext *ctx, uint64_t gpa, uint64_t data, + unsigned size, MemTxAttrs attrs) +{ + MemTxResult res; + dma_addr_t addr; + uint64_t intn; + uint32_t n190; + uint64_t pte[2]; + + if (!riscv_iommu_msi_check(s, ctx, gpa)) { + return MEMTX_ACCESS_ERROR; + } + + /* Interrupt File Number */ + intn = _pext_u64(PPN_DOWN(gpa), ctx->msi_addr_mask); + if (intn >= 256) { + /* Interrupt file number out of range */ + return MEMTX_ACCESS_ERROR; + } + + /* fetch MSI PTE */ + addr = PPN_PHYS(get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_PPN)); + addr = addr | (intn * sizeof(pte)); + res = dma_memory_read(s->target_as, addr, &pte, sizeof(pte), + MEMTXATTRS_UNSPECIFIED); + if (res != MEMTX_OK) { + return res; + } + + le64_to_cpus(&pte[0]); + le64_to_cpus(&pte[1]); + + if (!(pte[0] & RISCV_IOMMU_MSI_PTE_V) || (pte[0] & RISCV_IOMMU_MSI_PTE_C)) { + return MEMTX_ACCESS_ERROR; + } + + switch (get_field(pte[0], RISCV_IOMMU_MSI_PTE_M)) { + case RISCV_IOMMU_MSI_PTE_M_BASIC: + /* MSI Pass-through mode */ + addr = PPN_PHYS(get_field(pte[0], RISCV_IOMMU_MSI_PTE_PPN)); + addr = addr | (gpa & TARGET_PAGE_MASK); + + trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid), + PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid), + gpa, addr); + + return dma_memory_write(s->target_as, addr, &data, size, attrs); + case RISCV_IOMMU_MSI_PTE_M_MRIF: + /* MRIF mode, continue. */ + break; + default: + return MEMTX_ACCESS_ERROR; + } + + /* + * Report an error for interrupt identities exceeding the maximum allowed + * for an IMSIC interrupt file (2047) or destination address is not 32-bit + * aligned. See IOMMU Specification, Chapter 2.3. MSI page tables. + */ + if ((data > 2047) || (gpa & 3)) { + return MEMTX_ACCESS_ERROR; + } + + /* MSI MRIF mode, non atomic pending bit update */ + + /* MRIF pending bit address */ + addr = get_field(pte[0], RISCV_IOMMU_MSI_PTE_MRIF_ADDR) << 9; + addr = addr | ((data & 0x7c0) >> 3); + + trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid), + PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid), + gpa, addr); + + /* MRIF pending bit mask */ + data = 1ULL << (data & 0x03f); + res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs); + if (res != MEMTX_OK) { + return res; + } + intn = intn | data; + res = dma_memory_write(s->target_as, addr, &intn, sizeof(intn), attrs); + if (res != MEMTX_OK) { + return res; + } + + /* Get MRIF enable bits */ + addr = addr + sizeof(intn); + res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs); + if (res != MEMTX_OK) { + return res; + } + if (!(intn & data)) { + /* notification disabled, MRIF update completed. */ + return MEMTX_OK; + } + + /* Send notification message */ + addr = PPN_PHYS(get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NPPN)); + n190 = get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID) | + (get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID_MSB) << 10); + + res = dma_memory_write(s->target_as, addr, &n190, sizeof(n190), attrs); + if (res != MEMTX_OK) { + return res; + } + + return MEMTX_OK; +} + +/* + * RISC-V IOMMU Device Context Loopkup - Device Directory Tree Walk + * + * @s : IOMMU Device State + * @ctx : Device Translation Context with devid and pasid set. + * @return : success or fault code. 
+ */ +static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) +{ + const uint64_t ddtp = s->ddtp; + unsigned mode = get_field(ddtp, RISCV_IOMMU_DDTP_MODE); + dma_addr_t addr = PPN_PHYS(get_field(ddtp, RISCV_IOMMU_DDTP_PPN)); + struct riscv_iommu_dc dc; + /* Device Context format: 0: extended (64 bytes) | 1: base (32 bytes) */ + const int dc_fmt = !s->enable_msi; + const size_t dc_len = sizeof(dc) >> dc_fmt; + unsigned depth; + uint64_t de; + + switch (mode) { + case RISCV_IOMMU_DDTP_MODE_OFF: + return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED; + + case RISCV_IOMMU_DDTP_MODE_BARE: + /* mock up pass-through translation context */ + ctx->tc = RISCV_IOMMU_DC_TC_V; + ctx->ta = 0; + ctx->msiptp = 0; + return 0; + + case RISCV_IOMMU_DDTP_MODE_1LVL: + depth = 0; + break; + + case RISCV_IOMMU_DDTP_MODE_2LVL: + depth = 1; + break; + + case RISCV_IOMMU_DDTP_MODE_3LVL: + depth = 2; + break; + + default: + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + + /* + * Check supported device id width (in bits). + * See IOMMU Specification, Chapter 6. Software guidelines. + * - if extended device-context format is used: + * 1LVL: 6, 2LVL: 15, 3LVL: 24 + * - if base device-context format is used: + * 1LVL: 7, 2LVL: 16, 3LVL: 24 + */ + if (ctx->devid >= (1 << (depth * 9 + 6 + (dc_fmt && depth != 2)))) { + return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; + } + + /* Device directory tree walk */ + for (; depth-- > 0; ) { + /* + * Select device id index bits based on device directory tree level + * and device context format. + * See IOMMU Specification, Chapter 2. Data Structures. + * - if extended device-context format is used: + * device index: [23:15][14:6][5:0] + * - if base device-context format is used: + * device index: [23:16][15:7][6:0] + */ + const int split = depth * 9 + 6 + dc_fmt; + addr |= ((ctx->devid >> split) << 3) & ~TARGET_PAGE_MASK; + if (dma_memory_read(s->target_as, addr, &de, sizeof(de), + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT; + } + le64_to_cpus(&de); + if (!(de & RISCV_IOMMU_DDTE_VALID)) { + /* invalid directory entry */ + return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; + } + if (de & ~(RISCV_IOMMU_DDTE_PPN | RISCV_IOMMU_DDTE_VALID)) { + /* reserved bits set */ + return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; + } + addr = PPN_PHYS(get_field(de, RISCV_IOMMU_DDTE_PPN)); + } + + /* index into device context entry page */ + addr |= (ctx->devid * dc_len) & ~TARGET_PAGE_MASK; + + memset(&dc, 0, sizeof(dc)); + if (dma_memory_read(s->target_as, addr, &dc, dc_len, + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT; + } + + /* Set translation context. */ + ctx->tc = le64_to_cpu(dc.tc); + ctx->ta = le64_to_cpu(dc.ta); + ctx->msiptp = le64_to_cpu(dc.msiptp); + ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask); + ctx->msi_addr_pattern = le64_to_cpu(dc.msi_addr_pattern); + + if (!(ctx->tc & RISCV_IOMMU_DC_TC_V)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; + } + + if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) { + if (ctx->pasid != RISCV_IOMMU_NOPASID) { + /* PASID is disabled */ + return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED; + } + return 0; + } + + /* FSC.TC.PDTV enabled */ + if (mode > RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20) { + /* Invalid PDTP.MODE */ + return RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED; + } + + for (depth = mode - RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8; depth-- > 0; ) { + /* + * Select process id index bits based on process directory tree + * level. See IOMMU Specification, 2.2. Process-Directory-Table. 
+ */ + const int split = depth * 9 + 8; + addr |= ((ctx->pasid >> split) << 3) & ~TARGET_PAGE_MASK; + if (dma_memory_read(s->target_as, addr, &de, sizeof(de), + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT; + } + le64_to_cpus(&de); + if (!(de & RISCV_IOMMU_PC_TA_V)) { + return RISCV_IOMMU_FQ_CAUSE_PDT_INVALID; + } + addr = PPN_PHYS(get_field(de, RISCV_IOMMU_PC_FSC_PPN)); + } + + /* Leaf entry in PDT */ + addr |= (ctx->pasid << 4) & ~TARGET_PAGE_MASK; + if (dma_memory_read(s->target_as, addr, &dc.ta, sizeof(uint64_t) * 2, + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT; + } + + /* Use FSC and TA from process directory entry. */ + ctx->ta = le64_to_cpu(dc.ta); + + return 0; +} + +/* Translation Context cache support */ +static gboolean __ctx_equal(gconstpointer v1, gconstpointer v2) +{ + RISCVIOMMUContext *c1 = (RISCVIOMMUContext *) v1; + RISCVIOMMUContext *c2 = (RISCVIOMMUContext *) v2; + return c1->devid == c2->devid && c1->pasid == c2->pasid; +} + +static guint __ctx_hash(gconstpointer v) +{ + RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) v; + /* Generate simple hash of (pasid, devid), assuming 24-bit wide devid */ + return (guint)(ctx->devid) + ((guint)(ctx->pasid) << 24); +} + +static void __ctx_inval_devid_pasid(gpointer key, gpointer value, gpointer data) +{ + RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value; + RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data; + if (ctx->tc & RISCV_IOMMU_DC_TC_V && + ctx->devid == arg->devid && + ctx->pasid == arg->pasid) { + ctx->tc &= ~RISCV_IOMMU_DC_TC_V; + } +} + +static void __ctx_inval_devid(gpointer key, gpointer value, gpointer data) +{ + RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value; + RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data; + if (ctx->tc & RISCV_IOMMU_DC_TC_V && + ctx->devid == arg->devid) { + ctx->tc &= ~RISCV_IOMMU_DC_TC_V; + } +} + +static void __ctx_inval_all(gpointer key, gpointer value, gpointer data) +{ + RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value; + if (ctx->tc & RISCV_IOMMU_DC_TC_V) { + ctx->tc &= ~RISCV_IOMMU_DC_TC_V; + } +} + +static void riscv_iommu_ctx_inval(RISCVIOMMUState *s, GHFunc func, + uint32_t devid, uint32_t pasid) +{ + GHashTable *ctx_cache; + RISCVIOMMUContext key = { + .devid = devid, + .pasid = pasid, + }; + ctx_cache = g_hash_table_ref(s->ctx_cache); + g_hash_table_foreach(ctx_cache, func, &key); + g_hash_table_unref(ctx_cache); +} + +/* Find or allocate translation context for a given {device_id, process_id} */ +static RISCVIOMMUContext *riscv_iommu_ctx(RISCVIOMMUState *s, + unsigned devid, unsigned pasid, void **ref) +{ + GHashTable *ctx_cache; + RISCVIOMMUContext *ctx; + RISCVIOMMUContext key = { + .devid = devid, + .pasid = pasid, + }; + + ctx_cache = g_hash_table_ref(s->ctx_cache); + ctx = g_hash_table_lookup(ctx_cache, &key); + + if (ctx && (ctx->tc & RISCV_IOMMU_DC_TC_V)) { + *ref = ctx_cache; + return ctx; + } + + if (g_hash_table_size(s->ctx_cache) >= LIMIT_CACHE_CTX) { + ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal, + g_free, NULL); + g_hash_table_unref(qatomic_xchg(&s->ctx_cache, ctx_cache)); + } + + ctx = g_new0(RISCVIOMMUContext, 1); + ctx->devid = devid; + ctx->pasid = pasid; + + int fault = riscv_iommu_ctx_fetch(s, ctx); + if (!fault) { + g_hash_table_add(ctx_cache, ctx); + *ref = ctx_cache; + return ctx; + } + + g_hash_table_unref(ctx_cache); + *ref = NULL; + + if (!(ctx->tc & RISCV_IOMMU_DC_TC_DTF)) { + struct riscv_iommu_fq_record ev = { 0 }; + ev.hdr = set_field(ev.hdr, 
RISCV_IOMMU_FQ_HDR_CAUSE, fault); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, + RISCV_IOMMU_FQ_TTYPE_UADDR_RD); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, devid); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, pasid); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, !!pasid); + riscv_iommu_fault(s, &ev); + } + + g_free(ctx); + return NULL; +} + +static void riscv_iommu_ctx_put(RISCVIOMMUState *s, void *ref) +{ + if (ref) { + g_hash_table_unref((GHashTable *)ref); + } +} + +/* Find or allocate address space for a given device */ +static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid) +{ + RISCVIOMMUSpace *as; + + /* FIXME: PCIe bus remapping for attached endpoints. */ + devid |= s->bus << 8; + + qemu_mutex_lock(&s->core_lock); + QLIST_FOREACH(as, &s->spaces, list) { + if (as->devid == devid) { + break; + } + } + qemu_mutex_unlock(&s->core_lock); + + if (as == NULL) { + char name[64]; + as = g_new0(RISCVIOMMUSpace, 1); + + as->iommu = s; + as->devid = devid; + + snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova", + PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid)); + + /* IOVA address space, untranslated addresses */ + memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr), + TYPE_RISCV_IOMMU_MEMORY_REGION, + OBJECT(as), name, UINT64_MAX); + address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr), + TYPE_RISCV_IOMMU_PCI); + + qemu_mutex_lock(&s->core_lock); + QLIST_INSERT_HEAD(&s->spaces, as, list); + qemu_mutex_unlock(&s->core_lock); + + trace_riscv_iommu_new(s->parent_obj.id, PCI_BUS_NUM(as->devid), + PCI_SLOT(as->devid), PCI_FUNC(as->devid)); + } + return &as->iova_as; +} + +static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, + IOMMUTLBEntry *iotlb) +{ + bool enable_faults; + bool enable_pasid; + bool enable_pri; + int fault; + + enable_faults = !(ctx->tc & RISCV_IOMMU_DC_TC_DTF); + /* + * TC[32] is reserved for custom extensions, used here to temporarily + * enable automatic page-request generation for ATS queries. + */ + enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32)); + enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV); + + /* Translate using device directory / page table information. 
*/ + fault = riscv_iommu_spa_fetch(s, ctx, iotlb); + + if (enable_pri && fault) { + struct riscv_iommu_pq_record pr = {0}; + if (enable_pasid) { + pr.hdr = set_field(RISCV_IOMMU_PREQ_HDR_PV, + RISCV_IOMMU_PREQ_HDR_PID, ctx->pasid); + } + pr.hdr = set_field(pr.hdr, RISCV_IOMMU_PREQ_HDR_DID, ctx->devid); + pr.payload = (iotlb->iova & TARGET_PAGE_MASK) | + RISCV_IOMMU_PREQ_PAYLOAD_M; + riscv_iommu_pri(s, &pr); + return fault; + } + + if (enable_faults && fault) { + struct riscv_iommu_fq_record ev; + unsigned ttype; + + if (iotlb->perm & IOMMU_RW) { + ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR; + } else { + ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD; + } + ev.hdr = set_field(0, RISCV_IOMMU_FQ_HDR_CAUSE, fault); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, ttype); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, enable_pasid); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, ctx->pasid); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, ctx->devid); + ev.iotval = iotlb->iova; + ev.iotval2 = iotlb->translated_addr; + ev._reserved = 0; + riscv_iommu_fault(s, &ev); + return fault; + } + + return 0; +} + +/* IOMMU Command Interface */ +static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify, + uint64_t addr, uint32_t data) +{ + /* + * ATS processing in this implementation of the IOMMU is synchronous, + * no need to wait for completions here. + */ + if (!notify) { + return MEMTX_OK; + } + + return dma_memory_write(s->target_as, addr, &data, sizeof(data), + MEMTXATTRS_UNSPECIFIED); +} + +static void riscv_iommu_process_ddtp(RISCVIOMMUState *s) +{ + uint64_t old_ddtp = s->ddtp; + uint64_t new_ddtp = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_DDTP); + unsigned new_mode = get_field(new_ddtp, RISCV_IOMMU_DDTP_MODE); + unsigned old_mode = get_field(old_ddtp, RISCV_IOMMU_DDTP_MODE); + bool ok = false; + + /* + * Check for allowed DDTP.MODE transitions: + * {OFF, BARE} -> {OFF, BARE, 1LVL, 2LVL, 3LVL} + * {1LVL, 2LVL, 3LVL} -> {OFF, BARE} + */ + if (new_mode == old_mode || + new_mode == RISCV_IOMMU_DDTP_MODE_OFF || + new_mode == RISCV_IOMMU_DDTP_MODE_BARE) { + ok = true; + } else if (new_mode == RISCV_IOMMU_DDTP_MODE_1LVL || + new_mode == RISCV_IOMMU_DDTP_MODE_2LVL || + new_mode == RISCV_IOMMU_DDTP_MODE_3LVL) { + ok = old_mode == RISCV_IOMMU_DDTP_MODE_OFF || + old_mode == RISCV_IOMMU_DDTP_MODE_BARE; + } + + if (ok) { + /* clear reserved and busy bits, report back sanitized version */ + new_ddtp = set_field(new_ddtp & RISCV_IOMMU_DDTP_PPN, + RISCV_IOMMU_DDTP_MODE, new_mode); + } else { + new_ddtp = old_ddtp; + } + s->ddtp = new_ddtp; + + riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_DDTP, new_ddtp); +} + +/* Command function and opcode field. 
*/ +#define RISCV_IOMMU_CMD(func, op) (((func) << 7) | (op)) + +static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s) +{ + struct riscv_iommu_command cmd; + MemTxResult res; + dma_addr_t addr; + uint32_t tail, head, ctrl; + uint64_t cmd_opcode; + GHFunc func; + + ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR); + tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQT) & s->cq_mask; + head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQH) & s->cq_mask; + + /* Check for pending error or queue processing disabled */ + if (!(ctrl & RISCV_IOMMU_CQCSR_CQON) || + !!(ctrl & (RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CQMF))) { + return; + } + + while (tail != head) { + addr = s->cq_addr + head * sizeof(cmd); + res = dma_memory_read(s->target_as, addr, &cmd, sizeof(cmd), + MEMTXATTRS_UNSPECIFIED); + + if (res != MEMTX_OK) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, + RISCV_IOMMU_CQCSR_CQMF, 0); + goto fault; + } + + trace_riscv_iommu_cmd(s->parent_obj.id, cmd.dword0, cmd.dword1); + + cmd_opcode = get_field(cmd.dword0, + RISCV_IOMMU_CMD_OPCODE | RISCV_IOMMU_CMD_FUNC); + + switch (cmd_opcode) { + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOFENCE_FUNC_C, + RISCV_IOMMU_CMD_IOFENCE_OPCODE): + res = riscv_iommu_iofence(s, + cmd.dword0 & RISCV_IOMMU_CMD_IOFENCE_AV, cmd.dword1, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IOFENCE_DATA)); + + if (res != MEMTX_OK) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, + RISCV_IOMMU_CQCSR_CQMF, 0); + goto fault; + } + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA, + RISCV_IOMMU_CMD_IOTINVAL_OPCODE): + if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) { + /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */ + goto cmd_ill; + } + /* translation cache not implemented yet */ + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA, + RISCV_IOMMU_CMD_IOTINVAL_OPCODE): + /* translation cache not implemented yet */ + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT, + RISCV_IOMMU_CMD_IODIR_OPCODE): + if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) { + /* invalidate all device context cache mappings */ + func = __ctx_inval_all; + } else { + /* invalidate all device context matching DID */ + func = __ctx_inval_devid; + } + riscv_iommu_ctx_inval(s, func, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID), 0); + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT, + RISCV_IOMMU_CMD_IODIR_OPCODE): + if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) { + /* illegal command arguments IODIR_PDT & DV == 0 */ + goto cmd_ill; + } else { + func = __ctx_inval_devid_pasid; + } + riscv_iommu_ctx_inval(s, func, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID), + get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID)); + break; + + default: + cmd_ill: + /* Invalid instruction, do not advance instruction index. */ + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, + RISCV_IOMMU_CQCSR_CMD_ILL, 0); + goto fault; + } + + /* Advance and update head pointer after command completes. 
*/ + head = (head + 1) & s->cq_mask; + riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_CQH, head); + } + return; + +fault: + if (ctrl & RISCV_IOMMU_CQCSR_CIE) { + riscv_iommu_notify(s, RISCV_IOMMU_INTR_CQ); + } +} + +static void riscv_iommu_process_cq_control(RISCVIOMMUState *s) +{ + uint64_t base; + uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR); + uint32_t ctrl_clr; + bool enable = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQEN); + bool active = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQON); + + if (enable && !active) { + base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_CQB); + s->cq_mask = (2ULL << get_field(base, RISCV_IOMMU_CQB_LOG2SZ)) - 1; + s->cq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_CQB_PPN)); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~s->cq_mask); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQH], 0); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQT], 0); + ctrl_set = RISCV_IOMMU_CQCSR_CQON; + ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQMF | + RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CMD_TO; + } else if (!enable && active) { + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~0); + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQON; + } else { + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_CQCSR_BUSY; + } + + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, ctrl_set, ctrl_clr); +} + +static void riscv_iommu_process_fq_control(RISCVIOMMUState *s) +{ + uint64_t base; + uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR); + uint32_t ctrl_clr; + bool enable = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQEN); + bool active = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQON); + + if (enable && !active) { + base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_FQB); + s->fq_mask = (2ULL << get_field(base, RISCV_IOMMU_FQB_LOG2SZ)) - 1; + s->fq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_FQB_PPN)); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~s->fq_mask); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQH], 0); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQT], 0); + ctrl_set = RISCV_IOMMU_FQCSR_FQON; + ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQMF | + RISCV_IOMMU_FQCSR_FQOF; + } else if (!enable && active) { + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~0); + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQON; + } else { + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_FQCSR_BUSY; + } + + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, ctrl_set, ctrl_clr); +} + +static void riscv_iommu_process_pq_control(RISCVIOMMUState *s) +{ + uint64_t base; + uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR); + uint32_t ctrl_clr; + bool enable = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQEN); + bool active = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQON); + + if (enable && !active) { + base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_PQB); + s->pq_mask = (2ULL << get_field(base, RISCV_IOMMU_PQB_LOG2SZ)) - 1; + s->pq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_PQB_PPN)); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~s->pq_mask); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQH], 0); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQT], 0); + ctrl_set = RISCV_IOMMU_PQCSR_PQON; + ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQMF | + RISCV_IOMMU_PQCSR_PQOF; + } else if (!enable && active) { + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~0); + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQON; + } else { + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_PQCSR_BUSY; + } + + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr); +} + +/* Core IOMMU execution 
activation */ +enum { + RISCV_IOMMU_EXEC_DDTP, + RISCV_IOMMU_EXEC_CQCSR, + RISCV_IOMMU_EXEC_CQT, + RISCV_IOMMU_EXEC_FQCSR, + RISCV_IOMMU_EXEC_FQH, + RISCV_IOMMU_EXEC_PQCSR, + RISCV_IOMMU_EXEC_PQH, + RISCV_IOMMU_EXEC_TR_REQUEST, + /* RISCV_IOMMU_EXEC_EXIT must be the last enum value */ + RISCV_IOMMU_EXEC_EXIT, +}; + +static void *riscv_iommu_core_proc(void* arg) +{ + RISCVIOMMUState *s = arg; + unsigned exec = 0; + unsigned mask = 0; + + while (!(exec & BIT(RISCV_IOMMU_EXEC_EXIT))) { + mask = (mask ? mask : BIT(RISCV_IOMMU_EXEC_EXIT)) >> 1; + switch (exec & mask) { + case BIT(RISCV_IOMMU_EXEC_DDTP): + riscv_iommu_process_ddtp(s); + break; + case BIT(RISCV_IOMMU_EXEC_CQCSR): + riscv_iommu_process_cq_control(s); + break; + case BIT(RISCV_IOMMU_EXEC_CQT): + riscv_iommu_process_cq_tail(s); + break; + case BIT(RISCV_IOMMU_EXEC_FQCSR): + riscv_iommu_process_fq_control(s); + break; + case BIT(RISCV_IOMMU_EXEC_FQH): + /* NOP */ + break; + case BIT(RISCV_IOMMU_EXEC_PQCSR): + riscv_iommu_process_pq_control(s); + break; + case BIT(RISCV_IOMMU_EXEC_PQH): + /* NOP */ + break; + case BIT(RISCV_IOMMU_EXEC_TR_REQUEST): + /* DBG support not implemented yet */ + break; + } + exec &= ~mask; + if (!exec) { + qemu_mutex_lock(&s->core_lock); + exec = s->core_exec; + while (!exec) { + qemu_cond_wait(&s->core_cond, &s->core_lock); + exec = s->core_exec; + } + s->core_exec = 0; + qemu_mutex_unlock(&s->core_lock); + } + }; + + return NULL; +} + +static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size, MemTxAttrs attrs) +{ + RISCVIOMMUState *s = opaque; + uint32_t regb = addr & ~3; + uint32_t busy = 0; + uint32_t exec = 0; + + if (size == 0 || size > 8 || (addr & (size - 1)) != 0) { + /* Unsupported MMIO alignment or access size */ + return MEMTX_ERROR; + } + + if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) { + /* Unsupported MMIO access location. */ + return MEMTX_ACCESS_ERROR; + } + + /* Track actionable MMIO write. */ + switch (regb) { + case RISCV_IOMMU_REG_DDTP: + case RISCV_IOMMU_REG_DDTP + 4: + exec = BIT(RISCV_IOMMU_EXEC_DDTP); + regb = RISCV_IOMMU_REG_DDTP; + busy = RISCV_IOMMU_DDTP_BUSY; + break; + + case RISCV_IOMMU_REG_CQT: + exec = BIT(RISCV_IOMMU_EXEC_CQT); + break; + + case RISCV_IOMMU_REG_CQCSR: + exec = BIT(RISCV_IOMMU_EXEC_CQCSR); + busy = RISCV_IOMMU_CQCSR_BUSY; + break; + + case RISCV_IOMMU_REG_FQH: + exec = BIT(RISCV_IOMMU_EXEC_FQH); + break; + + case RISCV_IOMMU_REG_FQCSR: + exec = BIT(RISCV_IOMMU_EXEC_FQCSR); + busy = RISCV_IOMMU_FQCSR_BUSY; + break; + + case RISCV_IOMMU_REG_PQH: + exec = BIT(RISCV_IOMMU_EXEC_PQH); + break; + + case RISCV_IOMMU_REG_PQCSR: + exec = BIT(RISCV_IOMMU_EXEC_PQCSR); + busy = RISCV_IOMMU_PQCSR_BUSY; + break; + } + + /* + * Registers update might be not synchronized with core logic. 
+ * If system software updates register when relevant BUSY bit is set + * IOMMU behavior of additional writes to the register is UNSPECIFIED + */ + + qemu_spin_lock(&s->regs_lock); + if (size == 1) { + uint8_t ro = s->regs_ro[addr]; + uint8_t wc = s->regs_wc[addr]; + uint8_t rw = s->regs_rw[addr]; + s->regs_rw[addr] = ((rw & ro) | (data & ~ro)) & ~(data & wc); + } else if (size == 2) { + uint16_t ro = lduw_le_p(&s->regs_ro[addr]); + uint16_t wc = lduw_le_p(&s->regs_wc[addr]); + uint16_t rw = lduw_le_p(&s->regs_rw[addr]); + stw_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc)); + } else if (size == 4) { + uint32_t ro = ldl_le_p(&s->regs_ro[addr]); + uint32_t wc = ldl_le_p(&s->regs_wc[addr]); + uint32_t rw = ldl_le_p(&s->regs_rw[addr]); + stl_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc)); + } else if (size == 8) { + uint64_t ro = ldq_le_p(&s->regs_ro[addr]); + uint64_t wc = ldq_le_p(&s->regs_wc[addr]); + uint64_t rw = ldq_le_p(&s->regs_rw[addr]); + stq_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc)); + } + + /* Busy flag update, MSB 4-byte register. */ + if (busy) { + uint32_t rw = ldl_le_p(&s->regs_rw[regb]); + stl_le_p(&s->regs_rw[regb], rw | busy); + } + qemu_spin_unlock(&s->regs_lock); + + /* Wake up core processing thread. */ + if (exec) { + qemu_mutex_lock(&s->core_lock); + s->core_exec |= exec; + qemu_cond_signal(&s->core_cond); + qemu_mutex_unlock(&s->core_lock); + } + + return MEMTX_OK; +} + +static MemTxResult riscv_iommu_mmio_read(void *opaque, hwaddr addr, + uint64_t *data, unsigned size, MemTxAttrs attrs) +{ + RISCVIOMMUState *s = opaque; + uint64_t val = -1; + uint8_t *ptr; + + if ((addr & (size - 1)) != 0) { + /* Unsupported MMIO alignment. */ + return MEMTX_ERROR; + } + + if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) { + return MEMTX_ACCESS_ERROR; + } + + ptr = &s->regs_rw[addr]; + + if (size == 1) { + val = (uint64_t)*ptr; + } else if (size == 2) { + val = lduw_le_p(ptr); + } else if (size == 4) { + val = ldl_le_p(ptr); + } else if (size == 8) { + val = ldq_le_p(ptr); + } else { + return MEMTX_ERROR; + } + + *data = val; + + return MEMTX_OK; +} + +static const MemoryRegionOps riscv_iommu_mmio_ops = { + .read_with_attrs = riscv_iommu_mmio_read, + .write_with_attrs = riscv_iommu_mmio_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .impl = { + .min_access_size = 1, + .max_access_size = 8, + .unaligned = false, + }, + .valid = { + .min_access_size = 1, + .max_access_size = 8, + } +}; + +/* + * Translations matching MSI pattern check are redirected to "riscv-iommu-trap" + * memory region as untranslated address, for additional MSI/MRIF interception + * by IOMMU interrupt remapping implementation. + * Note: Device emulation code generating an MSI is expected to provide a valid + * memory transaction attributes with requested_id set. + */ +static MemTxResult riscv_iommu_trap_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size, MemTxAttrs attrs) +{ + RISCVIOMMUState* s = (RISCVIOMMUState *)opaque; + RISCVIOMMUContext *ctx; + MemTxResult res; + void *ref; + uint32_t devid = attrs.requester_id; + + if (attrs.unspecified) { + return MEMTX_ACCESS_ERROR; + } + + /* FIXME: PCIe bus remapping for attached endpoints. 
*/ + devid |= s->bus << 8; + + ctx = riscv_iommu_ctx(s, devid, 0, &ref); + if (ctx == NULL) { + res = MEMTX_ACCESS_ERROR; + } else { + res = riscv_iommu_msi_write(s, ctx, addr, data, size, attrs); + } + riscv_iommu_ctx_put(s, ref); + return res; +} + +static MemTxResult riscv_iommu_trap_read(void *opaque, hwaddr addr, + uint64_t *data, unsigned size, MemTxAttrs attrs) +{ + return MEMTX_ACCESS_ERROR; +} + +static const MemoryRegionOps riscv_iommu_trap_ops = { + .read_with_attrs = riscv_iommu_trap_read, + .write_with_attrs = riscv_iommu_trap_write, + .endianness = DEVICE_LITTLE_ENDIAN, + .impl = { + .min_access_size = 1, + .max_access_size = 8, + .unaligned = true, + }, + .valid = { + .min_access_size = 1, + .max_access_size = 8, + } +}; + +static void riscv_iommu_realize(DeviceState *dev, Error **errp) +{ + RISCVIOMMUState *s = RISCV_IOMMU(dev); + + s->cap = s->version & RISCV_IOMMU_CAP_VERSION; + if (s->enable_msi) { + s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF; + } + /* Report QEMU target physical address space limits */ + s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS, + TARGET_PHYS_ADDR_SPACE_BITS); + + /* TODO: method to report supported PASID bits */ + s->pasid_bits = 8; /* restricted to size of MemTxAttrs.pasid */ + s->cap |= RISCV_IOMMU_CAP_PD8; + + /* Out-of-reset translation mode: OFF (DMA disabled) BARE (passthrough) */ + s->ddtp = set_field(0, RISCV_IOMMU_DDTP_MODE, s->enable_off ? + RISCV_IOMMU_DDTP_MODE_OFF : RISCV_IOMMU_DDTP_MODE_BARE); + + /* register storage */ + s->regs_rw = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE); + s->regs_ro = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE); + s->regs_wc = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE); + + /* Mark all registers read-only */ + memset(s->regs_ro, 0xff, RISCV_IOMMU_REG_SIZE); + + /* + * Register complete MMIO space, including MSI/PBA registers. + * Note, PCIDevice implementation will add overlapping MR for MSI/PBA, + * managed directly by the PCIDevice implementation. + */ + memory_region_init_io(&s->regs_mr, OBJECT(dev), &riscv_iommu_mmio_ops, s, + "riscv-iommu-regs", RISCV_IOMMU_REG_SIZE); + + /* Set power-on register state */ + stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_CAP], s->cap); + stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_FCTL], s->fctl); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_DDTP], + ~(RISCV_IOMMU_DDTP_PPN | RISCV_IOMMU_DDTP_MODE)); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQB], + ~(RISCV_IOMMU_CQB_LOG2SZ | RISCV_IOMMU_CQB_PPN)); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQB], + ~(RISCV_IOMMU_FQB_LOG2SZ | RISCV_IOMMU_FQB_PPN)); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQB], + ~(RISCV_IOMMU_PQB_LOG2SZ | RISCV_IOMMU_PQB_PPN)); + stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQMF | + RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQON | + RISCV_IOMMU_CQCSR_BUSY); + stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQMF | + RISCV_IOMMU_FQCSR_FQOF); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQON | + RISCV_IOMMU_FQCSR_BUSY); + stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQMF | + RISCV_IOMMU_PQCSR_PQOF); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQON | + RISCV_IOMMU_PQCSR_BUSY); + stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0); + stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp); + + /* Memory region for downstream access, if specified. 
*/ + if (s->target_mr) { + s->target_as = g_new0(AddressSpace, 1); + address_space_init(s->target_as, s->target_mr, + "riscv-iommu-downstream"); + } else { + /* Fallback to global system memory. */ + s->target_as = &address_space_memory; + } + + /* Memory region for untranslated MRIF/MSI writes */ + memory_region_init_io(&s->trap_mr, OBJECT(dev), &riscv_iommu_trap_ops, s, + "riscv-iommu-trap", ~0ULL); + address_space_init(&s->trap_as, &s->trap_mr, "riscv-iommu-trap-as"); + + /* Device translation context cache */ + s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal, + g_free, NULL); + + s->iommus.le_next = NULL; + s->iommus.le_prev = NULL; + QLIST_INIT(&s->spaces); + qemu_cond_init(&s->core_cond); + qemu_mutex_init(&s->core_lock); + qemu_spin_init(&s->regs_lock); + qemu_thread_create(&s->core_proc, "riscv-iommu-core", + riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE); +} + +static void riscv_iommu_unrealize(DeviceState *dev) +{ + RISCVIOMMUState *s = RISCV_IOMMU(dev); + + qemu_mutex_lock(&s->core_lock); + /* cancel pending operations and stop */ + s->core_exec = BIT(RISCV_IOMMU_EXEC_EXIT); + qemu_cond_signal(&s->core_cond); + qemu_mutex_unlock(&s->core_lock); + qemu_thread_join(&s->core_proc); + qemu_cond_destroy(&s->core_cond); + qemu_mutex_destroy(&s->core_lock); + g_hash_table_unref(s->ctx_cache); +} + +static Property riscv_iommu_properties[] = { + DEFINE_PROP_UINT32("version", RISCVIOMMUState, version, + RISCV_IOMMU_SPEC_DOT_VER), + DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0), + DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE), + DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE), + DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr, + TYPE_MEMORY_REGION, MemoryRegion *), + DEFINE_PROP_END_OF_LIST(), +}; + +static void riscv_iommu_class_init(ObjectClass *klass, void* data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + /* internal device for riscv-iommu-{pci/sys}, not user-creatable */ + dc->user_creatable = false; + dc->realize = riscv_iommu_realize; + dc->unrealize = riscv_iommu_unrealize; + device_class_set_props(dc, riscv_iommu_properties); +} + +static const TypeInfo riscv_iommu_info = { + .name = TYPE_RISCV_IOMMU, + .parent = TYPE_DEVICE, + .instance_size = sizeof(RISCVIOMMUState), + .class_init = riscv_iommu_class_init, +}; + +static const char *IOMMU_FLAG_STR[] = { + "NA", + "RO", + "WR", + "RW", +}; + +/* RISC-V IOMMU Memory Region - Address Translation Space */ +static IOMMUTLBEntry riscv_iommu_memory_region_translate( + IOMMUMemoryRegion *iommu_mr, hwaddr addr, + IOMMUAccessFlags flag, int iommu_idx) +{ + RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr); + RISCVIOMMUContext *ctx; + void *ref; + IOMMUTLBEntry iotlb = { + .iova = addr, + .target_as = as->iommu->target_as, + .addr_mask = ~0ULL, + .perm = flag, + }; + + ctx = riscv_iommu_ctx(as->iommu, as->devid, iommu_idx, &ref); + if (ctx == NULL) { + /* Translation disabled or invalid. */ + iotlb.addr_mask = 0; + iotlb.perm = IOMMU_NONE; + } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) { + /* Translation disabled or fault reported. */ + iotlb.addr_mask = 0; + iotlb.perm = IOMMU_NONE; + } + + /* Trace all dma translations with original access flags. 
*/ + trace_riscv_iommu_dma(as->iommu->parent_obj.id, PCI_BUS_NUM(as->devid), + PCI_SLOT(as->devid), PCI_FUNC(as->devid), iommu_idx, + IOMMU_FLAG_STR[flag & IOMMU_RW], iotlb.iova, + iotlb.translated_addr); + + riscv_iommu_ctx_put(as->iommu, ref); + + return iotlb; +} + +static int riscv_iommu_memory_region_notify( + IOMMUMemoryRegion *iommu_mr, IOMMUNotifierFlag old, + IOMMUNotifierFlag new, Error **errp) +{ + RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr); + + if (old == IOMMU_NOTIFIER_NONE) { + as->notifier = true; + trace_riscv_iommu_notifier_add(iommu_mr->parent_obj.name); + } else if (new == IOMMU_NOTIFIER_NONE) { + as->notifier = false; + trace_riscv_iommu_notifier_del(iommu_mr->parent_obj.name); + } + + return 0; +} + +static inline bool pci_is_iommu(PCIDevice *pdev) +{ + return pci_get_word(pdev->config + PCI_CLASS_DEVICE) == 0x0806; +} + +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn) +{ + RISCVIOMMUState *s = (RISCVIOMMUState *) opaque; + PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn); + AddressSpace *as = NULL; + + if (pdev && pci_is_iommu(pdev)) { + return s->target_as; + } + + /* Find first registered IOMMU device */ + while (s->iommus.le_prev) { + s = *(s->iommus.le_prev); + } + + /* Find first matching IOMMU */ + while (s != NULL && as == NULL) { + as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn)); + s = s->iommus.le_next; + } + + return as ? as : &address_space_memory; +} + +static const PCIIOMMUOps riscv_iommu_ops = { + .get_address_space = riscv_iommu_find_as, +}; + +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus, + Error **errp) +{ + if (bus->iommu_ops && + bus->iommu_ops->get_address_space == riscv_iommu_find_as) { + /* Allow multiple IOMMUs on the same PCIe bus, link known devices */ + RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque; + QLIST_INSERT_AFTER(last, iommu, iommus); + } else if (bus->iommu_ops == NULL) { + pci_setup_iommu(bus, &riscv_iommu_ops, iommu); + } else { + error_setg(errp, "can't register secondary IOMMU for PCI bus #%d", + pci_bus_num(bus)); + } +} + +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr, + MemTxAttrs attrs) +{ + return attrs.unspecified ? 
RISCV_IOMMU_NOPASID : (int)attrs.pasid; +} + +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr) +{ + RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr); + return 1 << as->iommu->pasid_bits; +} + +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data) +{ + IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass); + + imrc->translate = riscv_iommu_memory_region_translate; + imrc->notify_flag_changed = riscv_iommu_memory_region_notify; + imrc->attrs_to_index = riscv_iommu_memory_region_index; + imrc->num_indexes = riscv_iommu_memory_region_index_len; +} + +static const TypeInfo riscv_iommu_memory_region_info = { + .parent = TYPE_IOMMU_MEMORY_REGION, + .name = TYPE_RISCV_IOMMU_MEMORY_REGION, + .class_init = riscv_iommu_memory_region_init, +}; + +static void riscv_iommu_register_mr_types(void) +{ + type_register_static(&riscv_iommu_memory_region_info); + type_register_static(&riscv_iommu_info); +} + +type_init(riscv_iommu_register_mr_types); diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of an RISC-V IOMMU (Ziommu) + * + * Copyright (C) 2022-2023 Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef HW_RISCV_IOMMU_STATE_H +#define HW_RISCV_IOMMU_STATE_H + +#include "qemu/osdep.h" +#include "qom/object.h" + +#include "hw/riscv/iommu.h" + +struct RISCVIOMMUState { + /*< private >*/ + DeviceState parent_obj; + + /*< public >*/ + uint32_t version; /* Reported interface version number */ + uint32_t pasid_bits; /* process identifier width */ + uint32_t bus; /* PCI bus mapping for non-root endpoints */ + + uint64_t cap; /* IOMMU supported capabilities */ + uint64_t fctl; /* IOMMU enabled features */ + + bool enable_off; /* Enable out-of-reset OFF mode (DMA disabled) */ + bool enable_msi; /* Enable MSI remapping */ + + /* IOMMU Internal State */ + uint64_t ddtp; /* Validated Device Directory Tree Root Pointer */ + + dma_addr_t cq_addr; /* Command queue base physical address */ + dma_addr_t fq_addr; /* Fault/event queue base physical address */ + dma_addr_t pq_addr; /* Page request queue base physical address */ + + uint32_t cq_mask; /* Command queue index bit mask */ + uint32_t fq_mask; /* Fault/event queue index bit mask */ + uint32_t pq_mask; /* Page request queue index bit mask */ + + /* interrupt notifier */ + void (*notify)(RISCVIOMMUState *iommu, unsigned vector); + + /* IOMMU State Machine */ + QemuThread core_proc; /* Background processing thread */ + QemuMutex core_lock; /* Global IOMMU lock, used for cache/regs updates */ + QemuCond core_cond; /* Background processing wake up signal */ + unsigned core_exec; /* Processing thread execution actions */ + + /* IOMMU target address space */ + AddressSpace *target_as; + MemoryRegion *target_mr; + + /* MSI / MRIF access trap */ + AddressSpace trap_as; + MemoryRegion trap_mr; + + GHashTable *ctx_cache; /* Device translation Context Cache */ + + /* MMIO Hardware Interface */ + MemoryRegion regs_mr; + QemuSpin regs_lock; + uint8_t *regs_rw; /* register state (user write) */ + uint8_t *regs_wc; /* write-1-to-clear mask */ + uint8_t *regs_ro; /* read-only mask */ + + QLIST_ENTRY(RISCVIOMMUState) iommus; + QLIST_HEAD(, RISCVIOMMUSpace) spaces; +}; + +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus, + Error **errp); + +/* private helpers */ + +/* Register helper functions */ +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s, + unsigned idx, uint32_t set, uint32_t clr) +{ + uint32_t val; + qemu_spin_lock(&s->regs_lock); + val = ldl_le_p(s->regs_rw + idx); + stl_le_p(s->regs_rw + idx, (val & ~clr) | set); + qemu_spin_unlock(&s->regs_lock); + return val; +} + +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s, + unsigned idx, uint32_t set) +{ + qemu_spin_lock(&s->regs_lock); + stl_le_p(s->regs_rw + idx, set); + qemu_spin_unlock(&s->regs_lock); +} + +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s, + unsigned idx) +{ + return ldl_le_p(s->regs_rw + idx); +} + +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s, + unsigned idx, uint64_t set, uint64_t clr) +{ + uint64_t val; + qemu_spin_lock(&s->regs_lock); + val = ldq_le_p(s->regs_rw + idx); + stq_le_p(s->regs_rw + idx, (val & ~clr) | set); + qemu_spin_unlock(&s->regs_lock); + return val; +} + +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s, + unsigned idx, uint64_t set) +{ + qemu_spin_lock(&s->regs_lock); + stq_le_p(s->regs_rw + idx, set); + qemu_spin_unlock(&s->regs_lock); +} + +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s, + unsigned idx) +{ + return ldq_le_p(s->regs_rw + idx); +} + + + +#endif diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events new file mode 100644 index 
XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/trace-events @@ -XXX,XX +XXX,XX @@ +# See documentation at docs/devel/tracing.rst + +# riscv-iommu.c +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d" +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64 +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64 +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64 +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64 +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64 +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added" +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed" diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/trace.h @@ -XXX,XX +XXX,XX @@ +#include "trace/trace-hw_riscv.h" + diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/hw/riscv/iommu.h @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of an RISC-V IOMMU (Ziommu) + * + * Copyright (C) 2022-2023 Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef HW_RISCV_IOMMU_H +#define HW_RISCV_IOMMU_H + +#include "qemu/osdep.h" +#include "qom/object.h" + +#define TYPE_RISCV_IOMMU "riscv-iommu" +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU) +typedef struct RISCVIOMMUState RISCVIOMMUState; + +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr" +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace; + +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci" +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI) +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci; + +#endif diff --git a/meson.build b/meson.build index XXXXXXX..XXXXXXX 100644 --- a/meson.build +++ b/meson.build @@ -XXX,XX +XXX,XX @@ if have_system 'hw/rdma', 'hw/rdma/vmw', 'hw/rtc', + 'hw/riscv', 'hw/s390x', 'hw/scsi', 'hw/sd', -- 2.43.2
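A note on the MMIO write path in the patch above: the three shadow arrays (regs_rw, regs_ro, regs_wc) reduce the per-bit write semantics of every register to a single masking expression. The stand-alone sketch below illustrates that expression; the register layout, the reg_write() helper and the values are made up for the example and are not part of the series.

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /*
     * Same per-bit rule as riscv_iommu_mmio_write():
     *  - bits set in 'ro' keep their current value,
     *  - bits set in 'wc' are cleared when the guest writes 1 to them,
     *  - all other bits take the newly written value.
     */
    static uint32_t reg_write(uint32_t rw, uint32_t ro, uint32_t wc,
                              uint32_t data)
    {
        return ((rw & ro) | (data & ~ro)) & ~(data & wc);
    }

    int main(void)
    {
        uint32_t ro = 0xf0;   /* example: bits [7:4] are read-only */
        uint32_t wc = 0x01;   /* example: bit 0 is write-1-to-clear */
        uint32_t rw = 0xa1;   /* current register state */

        /* Writing 0xff: RO bits keep 0xa0, the W1C bit clears, rest set. */
        uint32_t val = reg_write(rw, ro, wc, 0xff);
        assert(val == 0xae);
        printf("0x%02x\n", val);
        return 0;
    }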
From: Tomasz Jeznach <tjeznach@rivosinc.com> The RISC-V IOMMU can be modelled as a PCIe device following the guidelines of the RISC-V IOMMU spec, chapter 7.1, "Integrating an IOMMU as a PCIe device". Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/meson.build | 2 +- hw/riscv/riscv-iommu-pci.c | 173 +++++++++++++++++++++++++++++++++++++ 2 files changed, 174 insertions(+), 1 deletion(-) create mode 100644 hw/riscv/riscv-iommu-pci.c diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/meson.build +++ b/hw/riscv/meson.build @@ -XXX,XX +XXX,XX @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c')) riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c')) riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c')) riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c')) -riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c')) +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c')) hw_arch += {'riscv': riscv_ss} diff --git a/hw/riscv/riscv-iommu-pci.c b/hw/riscv/riscv-iommu-pci.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu-pci.c @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of an RISC-V IOMMU (Ziommu) + * + * Copyright (C) 2022-2023 Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#include "qemu/osdep.h" +#include "hw/pci/msi.h" +#include "hw/pci/msix.h" +#include "hw/pci/pci_bus.h" +#include "hw/qdev-properties.h" +#include "hw/riscv/riscv_hart.h" +#include "migration/vmstate.h" +#include "qapi/error.h" +#include "qemu/error-report.h" +#include "qemu/host-utils.h" +#include "qom/object.h" + +#include "cpu_bits.h" +#include "riscv-iommu.h" +#include "riscv-iommu-bits.h" + +#ifndef PCI_VENDOR_ID_RIVOS +#define PCI_VENDOR_ID_RIVOS 0x1efd +#endif + +#ifndef PCI_DEVICE_ID_RIVOS_IOMMU +#define PCI_DEVICE_ID_RIVOS_IOMMU 0xedf1 +#endif + +/* RISC-V IOMMU PCI Device Emulation */ + +typedef struct RISCVIOMMUStatePci { + PCIDevice pci; /* Parent PCIe device state */ + MemoryRegion bar0; /* PCI BAR (including MSI-x config) */ + RISCVIOMMUState iommu; /* common IOMMU state */ +} RISCVIOMMUStatePci; + +/* interrupt delivery callback */ +static void riscv_iommu_pci_notify(RISCVIOMMUState *iommu, unsigned vector) +{ + RISCVIOMMUStatePci *s = container_of(iommu, RISCVIOMMUStatePci, iommu); + + if (msix_enabled(&(s->pci))) { + msix_notify(&(s->pci), vector); + } +} + +static void riscv_iommu_pci_realize(PCIDevice *dev, Error **errp) +{ + RISCVIOMMUStatePci *s = DO_UPCAST(RISCVIOMMUStatePci, pci, dev); + RISCVIOMMUState *iommu = &s->iommu; + Error *err = NULL; + + /* Set device id for trace / debug */ + DEVICE(iommu)->id = g_strdup_printf("%02x:%02x.%01x", + pci_dev_bus_num(dev), PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn)); + qdev_realize(DEVICE(iommu), NULL, errp); + + memory_region_init(&s->bar0, OBJECT(s), "riscv-iommu-bar0", + QEMU_ALIGN_UP(memory_region_size(&iommu->regs_mr), TARGET_PAGE_SIZE)); + memory_region_add_subregion(&s->bar0, 0, &iommu->regs_mr); + + pcie_endpoint_cap_init(dev, 0); + + pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY | + PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar0); + + int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT, + &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG, + &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG + 256, 0, &err); + + if (ret == -ENOTSUP) { + /* + * MSI-x is not supported by the platform. + * Driver should use timer/polling based notification handlers. + */ + warn_report_err(err); + } else if (ret < 0) { + error_propagate(errp, err); + return; + } else { + /* mark all allocated MSIx vectors as used. 
*/ + msix_vector_use(dev, RISCV_IOMMU_INTR_CQ); + msix_vector_use(dev, RISCV_IOMMU_INTR_FQ); + msix_vector_use(dev, RISCV_IOMMU_INTR_PM); + msix_vector_use(dev, RISCV_IOMMU_INTR_PQ); + iommu->notify = riscv_iommu_pci_notify; + } + + PCIBus *bus = pci_device_root_bus(dev); + if (!bus) { + error_setg(errp, "can't find PCIe root port for %02x:%02x.%x", + pci_bus_num(pci_get_bus(dev)), PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn)); + return; + } + + riscv_iommu_pci_setup_iommu(iommu, bus, errp); +} + +static void riscv_iommu_pci_exit(PCIDevice *pci_dev) +{ + pci_setup_iommu(pci_device_root_bus(pci_dev), NULL, NULL); +} + +static const VMStateDescription riscv_iommu_vmstate = { + .name = "riscv-iommu", + .unmigratable = 1 +}; + +static void riscv_iommu_pci_init(Object *obj) +{ + RISCVIOMMUStatePci *s = RISCV_IOMMU_PCI(obj); + RISCVIOMMUState *iommu = &s->iommu; + + object_initialize_child(obj, "iommu", iommu, TYPE_RISCV_IOMMU); + qdev_alias_all_properties(DEVICE(iommu), obj); +} + +static Property riscv_iommu_pci_properties[] = { + DEFINE_PROP_END_OF_LIST(), +}; + +static void riscv_iommu_pci_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + + k->realize = riscv_iommu_pci_realize; + k->exit = riscv_iommu_pci_exit; + k->vendor_id = PCI_VENDOR_ID_RIVOS; + k->device_id = PCI_DEVICE_ID_RIVOS_IOMMU; + k->revision = 0; + k->class_id = 0x0806; + dc->desc = "RISCV-IOMMU DMA Remapping device"; + dc->vmsd = &riscv_iommu_vmstate; + dc->hotpluggable = false; + dc->user_creatable = true; + set_bit(DEVICE_CATEGORY_MISC, dc->categories); + device_class_set_props(dc, riscv_iommu_pci_properties); +} + +static const TypeInfo riscv_iommu_pci = { + .name = TYPE_RISCV_IOMMU_PCI, + .parent = TYPE_PCI_DEVICE, + .class_init = riscv_iommu_pci_class_init, + .instance_init = riscv_iommu_pci_init, + .instance_size = sizeof(RISCVIOMMUStatePci), + .interfaces = (InterfaceInfo[]) { + { INTERFACE_PCIE_DEVICE }, + { }, + }, +}; + +static void riscv_iommu_register_pci_types(void) +{ + type_register_static(&riscv_iommu_pci); +} + +type_init(riscv_iommu_register_pci_types); -- 2.43.2
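The interrupt plumbing above relies on embedding the common RISCVIOMMUState inside RISCVIOMMUStatePci, so that riscv_iommu_pci_notify() can recover the PCI wrapper from the core pointer with container_of(). A minimal stand-alone sketch of that pattern follows; all names in it are illustrative, not taken from the series. The same composition trick is what lets the core emulation be reused unchanged by the platform device in the next patch.

    #include <stddef.h>
    #include <stdio.h>

    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct core_state { int vector; };   /* shared emulation core */

    struct pci_wrapper {                 /* transport-specific shell */
        int msix_enabled;
        struct core_state core;
    };

    /* The callback receives only the core, but needs the wrapper. */
    static void notify(struct core_state *c)
    {
        struct pci_wrapper *w = container_of(c, struct pci_wrapper, core);
        if (w->msix_enabled) {
            printf("deliver MSI-X vector %d\n", c->vector);
        }
    }

    int main(void)
    {
        struct pci_wrapper w = { .msix_enabled = 1, .core = { .vector = 2 } };
        notify(&w.core);
        return 0;
    }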
From: Tomasz Jeznach <tjeznach@rivosinc.com> This device models the RISC-V IOMMU as a sysbus device. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/meson.build | 2 +- hw/riscv/riscv-iommu-sys.c | 93 ++++++++++++++++++++++++++++++++++++++ include/hw/riscv/iommu.h | 4 ++ 3 files changed, 98 insertions(+), 1 deletion(-) create mode 100644 hw/riscv/riscv-iommu-sys.c diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/meson.build +++ b/hw/riscv/meson.build @@ -XXX,XX +XXX,XX @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c')) riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c')) riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c')) riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c')) -riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c')) +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c', 'riscv-iommu-sys.c')) hw_arch += {'riscv': riscv_ss} diff --git a/hw/riscv/riscv-iommu-sys.c b/hw/riscv/riscv-iommu-sys.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu-sys.c @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of an RISC-V IOMMU (Ziommu) - Platform Device + * + * Copyright (C) 2022-2023 Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#include "qemu/osdep.h" +#include "hw/pci/pci_bus.h" +#include "hw/qdev-properties.h" +#include "hw/sysbus.h" +#include "qapi/error.h" +#include "qemu/error-report.h" +#include "qemu/host-utils.h" +#include "qemu/module.h" +#include "qom/object.h" + +#include "riscv-iommu.h" + +/* RISC-V IOMMU System Platform Device Emulation */ + +struct RISCVIOMMUStateSys { + SysBusDevice parent; + uint64_t addr; + RISCVIOMMUState iommu; +}; + +static void riscv_iommu_sys_realize(DeviceState *dev, Error **errp) +{ + RISCVIOMMUStateSys *s = RISCV_IOMMU_SYS(dev); + PCIBus *pci_bus; + + qdev_realize(DEVICE(&s->iommu), NULL, errp); + sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iommu.regs_mr); + if (s->addr) { + sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, s->addr); + } + + pci_bus = (PCIBus *) object_resolve_path_type("", TYPE_PCI_BUS, NULL); + if (pci_bus) { + riscv_iommu_pci_setup_iommu(&s->iommu, pci_bus, errp); + } +} + +static void riscv_iommu_sys_init(Object *obj) +{ + RISCVIOMMUStateSys *s = RISCV_IOMMU_SYS(obj); + RISCVIOMMUState *iommu = &s->iommu; + + object_initialize_child(obj, "iommu", iommu, TYPE_RISCV_IOMMU); + qdev_alias_all_properties(DEVICE(iommu), obj); +} + +static Property riscv_iommu_sys_properties[] = { + DEFINE_PROP_UINT64("addr", RISCVIOMMUStateSys, addr, 0), + DEFINE_PROP_END_OF_LIST(), +}; + +static void riscv_iommu_sys_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + dc->realize = riscv_iommu_sys_realize; + set_bit(DEVICE_CATEGORY_MISC, dc->categories); + device_class_set_props(dc, riscv_iommu_sys_properties); +} + +static const TypeInfo riscv_iommu_sys = { + .name = TYPE_RISCV_IOMMU_SYS, + .parent = TYPE_SYS_BUS_DEVICE, + .class_init = riscv_iommu_sys_class_init, + .instance_init = riscv_iommu_sys_init, + .instance_size = sizeof(RISCVIOMMUStateSys), +}; + +static void riscv_iommu_register_sys(void) +{ + type_register_static(&riscv_iommu_sys); +} + +type_init(riscv_iommu_register_sys) diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/riscv/iommu.h +++ b/include/hw/riscv/iommu.h @@ -XXX,XX +XXX,XX @@ typedef struct RISCVIOMMUSpace RISCVIOMMUSpace; OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI) typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci; +#define TYPE_RISCV_IOMMU_SYS "riscv-iommu-device" +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStateSys, RISCV_IOMMU_SYS) +typedef struct RISCVIOMMUStateSys RISCVIOMMUStateSys; + #endif -- 2.43.2
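For context, a board model could wire the platform device up along these lines. This is only a hedged sketch using standard qdev/sysbus calls; the base address is a placeholder, and no interrupt wiring is shown since this patch leaves the notify callback unset.

    /*
     * Sketch for a machine init function; needs "hw/qdev-properties.h",
     * "hw/sysbus.h", "hw/riscv/iommu.h" and "qapi/error.h".
     */
    DeviceState *dev = qdev_new(TYPE_RISCV_IOMMU_SYS);

    /* The device maps its own register block once "addr" is set. */
    qdev_prop_set_uint64(dev, "addr", 0x3010000 /* placeholder base */);
    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);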
From: Tomasz Jeznach <tjeznach@rivosinc.com> Generate device tree entry for riscv-iommu PCI device, along with mapping all PCI device identifiers to the single IOMMU device instance. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/virt.c | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/virt.c +++ b/hw/riscv/virt.c @@ -XXX,XX +XXX,XX @@ #include "hw/core/sysbus-fdt.h" #include "target/riscv/pmu.h" #include "hw/riscv/riscv_hart.h" +#include "hw/riscv/iommu.h" #include "hw/riscv/virt.h" #include "hw/riscv/boot.h" #include "hw/riscv/numa.h" @@ -XXX,XX +XXX,XX @@ static void create_fdt_virtio_iommu(RISCVVirtState *s, uint16_t bdf) bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf); } +static void create_fdt_iommu(RISCVVirtState *s, uint16_t bdf) +{ + const char comp[] = "riscv,pci-iommu"; + void *fdt = MACHINE(s)->fdt; + uint32_t iommu_phandle; + g_autofree char *iommu_node = NULL; + g_autofree char *pci_node = NULL; + + pci_node = g_strdup_printf("/soc/pci@%lx", + (long) virt_memmap[VIRT_PCIE_ECAM].base); + iommu_node = g_strdup_printf("%s/iommu@%x", pci_node, bdf); + iommu_phandle = qemu_fdt_alloc_phandle(fdt); + qemu_fdt_add_subnode(fdt, iommu_node); + + qemu_fdt_setprop(fdt, iommu_node, "compatible", comp, sizeof(comp)); + qemu_fdt_setprop_cell(fdt, iommu_node, "#iommu-cells", 1); + qemu_fdt_setprop_cell(fdt, iommu_node, "phandle", iommu_phandle); + qemu_fdt_setprop_cells(fdt, iommu_node, "reg", + bdf << 8, 0, 0, 0, 0); + qemu_fdt_setprop_cells(fdt, pci_node, "iommu-map", + 0, iommu_phandle, 0, bdf, + bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf); +} + static void finalize_fdt(RISCVVirtState *s) { uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1; @@ -XXX,XX +XXX,XX @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine, MachineClass *mc = MACHINE_GET_CLASS(machine); if (device_is_dynamic_sysbus(mc, dev) || - object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) { + object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI) || + object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) { return HOTPLUG_HANDLER(machine); } + return NULL; } @@ -XXX,XX +XXX,XX @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev, if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) { create_fdt_virtio_iommu(s, pci_get_bdf(PCI_DEVICE(dev))); } + + if (object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) { + create_fdt_iommu(s, pci_get_bdf(PCI_DEVICE(dev))); + } } static void virt_machine_class_init(ObjectClass *oc, void *data) -- 2.43.2
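The "iommu-map" property written above splits the requester ID space into two ranges so that every RID except the IOMMU's own BDF is translated by the IOMMU. The stand-alone sketch below just prints the two <rid-base iommu-phandle iommu-base length> tuples the patch emits; the phandle and BDF values are illustrative.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t phandle = 1;     /* placeholder IOMMU phandle */
        uint32_t bdf = 0x0008;    /* e.g. the IOMMU at 00:01.0 */

        /* RIDs [0, bdf) map to the IOMMU, starting at IOMMU-side id 0. */
        printf("iommu-map = <0x0 0x%x 0x0 0x%x>,\n", phandle, bdf);
        /* RIDs [bdf + 1, 0xffff] map 1:1, skipping the IOMMU itself. */
        printf("            <0x%x 0x%x 0x%x 0x%x>;\n",
               bdf + 1, phandle, bdf + 1, 0xffff - bdf);
        return 0;
    }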
To test the RISC-V IOMMU emulation we'll use its PCI representation. Create a new 'riscv-iommu-pci' libqos device that will be present with CONFIG_RISCV_IOMMU. This config is only available for RISC-V, so this device will only be consumed by the RISC-V libqos machine. Start with basic tests: a PCI sanity check and a reset state register test. The reset test was taken from the RISC-V IOMMU spec chapter 5.2, "Reset behavior". More tests will be added later. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- tests/qtest/libqos/meson.build | 4 ++ tests/qtest/libqos/riscv-iommu.c | 79 +++++++++++++++++++++++++++ tests/qtest/libqos/riscv-iommu.h | 67 +++++++++++++++++++++++ tests/qtest/meson.build | 1 + tests/qtest/riscv-iommu-test.c | 93 ++++++++++++++++++++++++++++++++ 5 files changed, 244 insertions(+) create mode 100644 tests/qtest/libqos/riscv-iommu.c create mode 100644 tests/qtest/libqos/riscv-iommu.h create mode 100644 tests/qtest/riscv-iommu-test.c diff --git a/tests/qtest/libqos/meson.build b/tests/qtest/libqos/meson.build index XXXXXXX..XXXXXXX 100644 --- a/tests/qtest/libqos/meson.build +++ b/tests/qtest/libqos/meson.build @@ -XXX,XX +XXX,XX @@ if have_virtfs libqos_srcs += files('virtio-9p.c', 'virtio-9p-client.c') endif +if config_all_devices.has_key('CONFIG_RISCV_IOMMU') + libqos_srcs += files('riscv-iommu.c') +endif + libqos = static_library('qos', libqos_srcs + genh, name_suffix: 'fa', build_by_default: false) diff --git a/tests/qtest/libqos/riscv-iommu.c b/tests/qtest/libqos/riscv-iommu.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qtest/libqos/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ +/* + * libqos driver riscv-iommu-pci framework + * + * Copyright (c) 2024 Ventana Micro Systems Inc. + * + * This work is licensed under the terms of the GNU GPL, version 2 or (at your + * option) any later version. See the COPYING file in the top-level directory. 
+ * + */ + +#include "qemu/osdep.h" +#include "../libqtest.h" +#include "qemu/module.h" +#include "qgraph.h" +#include "pci.h" +#include "riscv-iommu.h" + +#define PCI_VENDOR_ID_RIVOS 0x1efd +#define PCI_DEVICE_ID_RIVOS_IOMMU 0xedf1 + +static void *riscv_iommu_pci_get_driver(void *obj, const char *interface) +{ + QRISCVIOMMU *r_iommu_pci = obj; + + if (!g_strcmp0(interface, "pci-device")) { + return &r_iommu_pci->dev; + } + + fprintf(stderr, "%s not present in riscv_iommu_pci\n", interface); + g_assert_not_reached(); +} + +static void riscv_iommu_pci_start_hw(QOSGraphObject *obj) +{ + QRISCVIOMMU *pci = (QRISCVIOMMU *)obj; + qpci_device_enable(&pci->dev); +} + +static void riscv_iommu_pci_destructor(QOSGraphObject *obj) +{ + QRISCVIOMMU *pci = (QRISCVIOMMU *)obj; + qpci_iounmap(&pci->dev, pci->reg_bar); +} + +static void *riscv_iommu_pci_create(void *pci_bus, QGuestAllocator *alloc, + void *addr) +{ + QRISCVIOMMU *r_iommu_pci = g_new0(QRISCVIOMMU, 1); + QPCIBus *bus = pci_bus; + + qpci_device_init(&r_iommu_pci->dev, bus, addr); + r_iommu_pci->reg_bar = qpci_iomap(&r_iommu_pci->dev, 0, NULL); + + r_iommu_pci->obj.get_driver = riscv_iommu_pci_get_driver; + r_iommu_pci->obj.start_hw = riscv_iommu_pci_start_hw; + r_iommu_pci->obj.destructor = riscv_iommu_pci_destructor; + return &r_iommu_pci->obj; +} + +static void riscv_iommu_pci_register_nodes(void) +{ + QPCIAddress addr = { + .vendor_id = PCI_VENDOR_ID_RIVOS, + .device_id = PCI_DEVICE_ID_RIVOS_IOMMU, + .devfn = QPCI_DEVFN(1, 0), + }; + + QOSGraphEdgeOptions opts = { + .extra_device_opts = "addr=01.0", + }; + + add_qpci_address(&opts, &addr); + + qos_node_create_driver("riscv-iommu-pci", riscv_iommu_pci_create); + qos_node_produces("riscv-iommu-pci", "pci-device"); + qos_node_consumes("riscv-iommu-pci", "pci-bus", &opts); +} + +libqos_init(riscv_iommu_pci_register_nodes); diff --git a/tests/qtest/libqos/riscv-iommu.h b/tests/qtest/libqos/riscv-iommu.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qtest/libqos/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ +/* + * libqos driver riscv-iommu-pci framework + * + * Copyright (c) 2024 Ventana Micro Systems Inc. + * + * This work is licensed under the terms of the GNU GPL, version 2 or (at your + * option) any later version. See the COPYING file in the top-level directory. 
+ * + */ + +#ifndef TESTS_LIBQOS_RISCV_IOMMU_H +#define TESTS_LIBQOS_RISCV_IOMMU_H + +#include "qgraph.h" +#include "pci.h" +#include "qemu/bitops.h" + +#ifndef GENMASK_ULL +#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l)) +#endif + +#define RISCV_IOMMU_PCI_VENDOR_ID_RIVOS 0x1efd +#define RISCV_IOMMU_PCI_DEVICE_ID_RIVOS 0xedf1 +#define RISCV_IOMMU_PCI_DEVICE_CLASS 0x0806 + +/* Common field positions */ +#define RISCV_IOMMU_QUEUE_ENABLE BIT(0) +#define RISCV_IOMMU_QUEUE_INTR_ENABLE BIT(1) +#define RISCV_IOMMU_QUEUE_MEM_FAULT BIT(8) +#define RISCV_IOMMU_QUEUE_ACTIVE BIT(16) +#define RISCV_IOMMU_QUEUE_BUSY BIT(17) + +#define RISCV_IOMMU_REG_CAP 0x0000 +#define RISCV_IOMMU_CAP_VERSION GENMASK_ULL(7, 0) + +#define RISCV_IOMMU_REG_DDTP 0x0010 +#define RISCV_IOMMU_DDTP_BUSY BIT_ULL(4) +#define RISCV_IOMMU_DDTP_MODE GENMASK_ULL(3, 0) +#define RISCV_IOMMU_DDTP_MODE_OFF 0 + +#define RISCV_IOMMU_REG_CQCSR 0x0048 +#define RISCV_IOMMU_CQCSR_CQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_CQCSR_CIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_CQCSR_CQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_CQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +#define RISCV_IOMMU_REG_FQCSR 0x004C +#define RISCV_IOMMU_FQCSR_FQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_FQCSR_FIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_FQCSR_FQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_FQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +#define RISCV_IOMMU_REG_PQCSR 0x0050 +#define RISCV_IOMMU_PQCSR_PQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_PQCSR_PIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_PQCSR_PQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_PQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +#define RISCV_IOMMU_REG_IPSR 0x0054 + +typedef struct QRISCVIOMMU { + QOSGraphObject obj; + QPCIDevice dev; + QPCIBar reg_bar; +} QRISCVIOMMU; + +#endif diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build index XXXXXXX..XXXXXXX 100644 --- a/tests/qtest/meson.build +++ b/tests/qtest/meson.build @@ -XXX,XX +XXX,XX @@ qos_test_ss.add( 'vmxnet3-test.c', 'igb-test.c', 'ufs-test.c', + 'riscv-iommu-test.c', ) if config_all_devices.has_key('CONFIG_VIRTIO_SERIAL') diff --git a/tests/qtest/riscv-iommu-test.c b/tests/qtest/riscv-iommu-test.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qtest/riscv-iommu-test.c @@ -XXX,XX +XXX,XX @@ +/* + * QTest testcase for RISC-V IOMMU + * + * Copyright (c) 2024 Ventana Micro Systems Inc. + * + * This work is licensed under the terms of the GNU GPL, version 2 or (at your + * option) any later version. See the COPYING file in the top-level directory. 
+ * + */ + +#include "qemu/osdep.h" +#include "libqtest-single.h" +#include "qemu/module.h" +#include "libqos/qgraph.h" +#include "libqos/riscv-iommu.h" +#include "hw/pci/pci_regs.h" + +static uint32_t riscv_iommu_read_reg32(QRISCVIOMMU *r_iommu, int reg_offset) +{ + uint32_t reg; + + qpci_memread(&r_iommu->dev, r_iommu->reg_bar, reg_offset, + ®, sizeof(reg)); + return reg; +} + +static uint64_t riscv_iommu_read_reg64(QRISCVIOMMU *r_iommu, int reg_offset) +{ + uint64_t reg; + + qpci_memread(&r_iommu->dev, r_iommu->reg_bar, reg_offset, + ®, sizeof(reg)); + return reg; +} + +static void test_pci_config(void *obj, void *data, QGuestAllocator *t_alloc) +{ + QRISCVIOMMU *r_iommu = obj; + QPCIDevice *dev = &r_iommu->dev; + uint16_t vendorid, deviceid, classid; + + vendorid = qpci_config_readw(dev, PCI_VENDOR_ID); + deviceid = qpci_config_readw(dev, PCI_DEVICE_ID); + classid = qpci_config_readw(dev, PCI_CLASS_DEVICE); + + g_assert_cmpuint(vendorid, ==, RISCV_IOMMU_PCI_VENDOR_ID_RIVOS); + g_assert_cmpuint(deviceid, ==, RISCV_IOMMU_PCI_DEVICE_ID_RIVOS); + g_assert_cmpuint(classid, ==, RISCV_IOMMU_PCI_DEVICE_CLASS); +} + +static void test_reg_reset(void *obj, void *data, QGuestAllocator *t_alloc) +{ + QRISCVIOMMU *r_iommu = obj; + uint64_t cap; + uint32_t reg; + + cap = riscv_iommu_read_reg64(r_iommu, RISCV_IOMMU_REG_CAP); + g_assert_cmpuint(cap & RISCV_IOMMU_CAP_VERSION, ==, 0x10); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR); + g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CQEN, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CIE, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CQON, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_BUSY, ==, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR); + g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FQEN, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FIE, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FQON, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_BUSY, ==, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR); + g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PQEN, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PIE, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PQON, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_BUSY, ==, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_DDTP); + g_assert_cmpuint(reg & RISCV_IOMMU_DDTP_BUSY, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_DDTP_MODE, ==, + RISCV_IOMMU_DDTP_MODE_OFF); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IPSR); + g_assert_cmpuint(reg, ==, 0); +} + +static void register_riscv_iommu_test(void) +{ + qos_add_test("pci_config", "riscv-iommu-pci", test_pci_config, NULL); + qos_add_test("reg_reset", "riscv-iommu-pci", test_reg_reset, NULL); +} + +libqos_init(register_riscv_iommu_test); -- 2.43.2
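As noted above, more tests will be added later. To illustrate the direction, a follow-up case could exercise the write-1-to-clear behavior of IPSR with the same helpers. This is a sketch only: it assumes libqos' qpci_memwrite() and would still need a qos_add_test() registration.

    static void riscv_iommu_write_reg32(QRISCVIOMMU *r_iommu, int reg_offset,
                                        uint32_t val)
    {
        qpci_memwrite(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
                      &val, sizeof(val));
    }

    /* IPSR is W1C: clearing an already-zero IPSR must leave it zero. */
    static void test_ipsr_w1c(void *obj, void *data, QGuestAllocator *t_alloc)
    {
        QRISCVIOMMU *r_iommu = obj;

        riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IPSR, 0xffffffff);
        g_assert_cmpuint(riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IPSR),
                         ==, 0);
    }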
From: Tomasz Jeznach <tjeznach@rivosinc.com> The RISC-V IOMMU spec allows the IOMMU to use translation caches to hold entries from the DDT. Add such a cache and implement all the cache invalidation commands that were previously marked as 'not implemented'. The cache entries already carry fields that anticipate s-stage and g-stage support (GSCID, PSCID), even though those stages aren't supported yet; they will be introduced next. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/riscv-iommu.c | 190 ++++++++++++++++++++++++++++++++++++++++- hw/riscv/riscv-iommu.h | 2 + 2 files changed, 188 insertions(+), 4 deletions(-) diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.c +++ b/hw/riscv/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ struct RISCVIOMMUContext { uint64_t msiptp; /* MSI redirection page table pointer */ }; +/* Address translation cache entry */ +struct RISCVIOMMUEntry { + uint64_t iova:44; /* IOVA Page Number */ + uint64_t pscid:20; /* Process Soft-Context identifier */ + uint64_t phys:44; /* Physical Page Number */ + uint64_t gscid:16; /* Guest Soft-Context identifier */ + uint64_t perm:2; /* IOMMU_RW flags */ + uint64_t __rfu:2; +}; + /* IOMMU index for transactions without PASID specified. */ #define RISCV_IOMMU_NOPASID 0 @@ -XXX,XX +XXX,XX @@ static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid) return &as->iova_as; } +/* Translation Object cache support */ +static gboolean __iot_equal(gconstpointer v1, gconstpointer v2) +{ + RISCVIOMMUEntry *t1 = (RISCVIOMMUEntry *) v1; + RISCVIOMMUEntry *t2 = (RISCVIOMMUEntry *) v2; + return t1->gscid == t2->gscid && t1->pscid == t2->pscid && + t1->iova == t2->iova; +} + +static guint __iot_hash(gconstpointer v) +{ + RISCVIOMMUEntry *t = (RISCVIOMMUEntry *) v; + return (guint)t->iova; +} + +/* GV: 1 PSCV: 1 AV: 1 */ +static void __iot_inval_pscid_iova(gpointer key, gpointer value, gpointer data) +{ + RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value; + RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data; + if (iot->gscid == arg->gscid && + iot->pscid == arg->pscid && + iot->iova == arg->iova) { + iot->perm = 0; + } +} + +/* GV: 1 PSCV: 1 AV: 0 */ +static void __iot_inval_pscid(gpointer key, gpointer value, gpointer data) +{ + RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value; + RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data; + if (iot->gscid == arg->gscid && + iot->pscid == arg->pscid) { + iot->perm = 0; + } +} + +/* GV: 1 GVMA: 1 */ +static void __iot_inval_gscid_gpa(gpointer key, gpointer value, gpointer data) +{ + RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value; + RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data; + if (iot->gscid == arg->gscid) { + /* simplified cache, no GPA matching */ + iot->perm = 0; + } +} + +/* GV: 1 GVMA: 0 */ +static void __iot_inval_gscid(gpointer key, gpointer value, gpointer data) +{ + RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value; + RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data; + if (iot->gscid == arg->gscid) { + iot->perm = 0; + } +} + +/* GV: 0 */ +static void __iot_inval_all(gpointer key, gpointer value, gpointer data) +{ + RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value; + iot->perm = 0; +} + +/* caller should keep ref-count for iot_cache object */ +static RISCVIOMMUEntry *riscv_iommu_iot_lookup(RISCVIOMMUContext *ctx, + GHashTable *iot_cache, hwaddr iova) +{ + RISCVIOMMUEntry key = { + .pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID), + .iova = PPN_DOWN(iova), + }; + return
g_hash_table_lookup(iot_cache, &key); +} + +/* caller should keep ref-count for iot_cache object */ +static void riscv_iommu_iot_update(RISCVIOMMUState *s, + GHashTable *iot_cache, RISCVIOMMUEntry *iot) +{ + if (!s->iot_limit) { + return; + } + + if (g_hash_table_size(s->iot_cache) >= s->iot_limit) { + iot_cache = g_hash_table_new_full(__iot_hash, __iot_equal, + g_free, NULL); + g_hash_table_unref(qatomic_xchg(&s->iot_cache, iot_cache)); + } + g_hash_table_add(iot_cache, iot); +} + +static void riscv_iommu_iot_inval(RISCVIOMMUState *s, GHFunc func, + uint32_t gscid, uint32_t pscid, hwaddr iova) +{ + GHashTable *iot_cache; + RISCVIOMMUEntry key = { + .gscid = gscid, + .pscid = pscid, + .iova = PPN_DOWN(iova), + }; + + iot_cache = g_hash_table_ref(s->iot_cache); + g_hash_table_foreach(iot_cache, func, &key); + g_hash_table_unref(iot_cache); +} + static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, - IOMMUTLBEntry *iotlb) + IOMMUTLBEntry *iotlb, bool enable_cache) { + RISCVIOMMUEntry *iot; + IOMMUAccessFlags perm; bool enable_faults; bool enable_pasid; bool enable_pri; + GHashTable *iot_cache; int fault; + iot_cache = g_hash_table_ref(s->iot_cache); + enable_faults = !(ctx->tc & RISCV_IOMMU_DC_TC_DTF); /* * TC[32] is reserved for custom extensions, used here to temporarily @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32)); enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV); + iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova); + perm = iot ? iot->perm : IOMMU_NONE; + if (perm != IOMMU_NONE) { + iotlb->translated_addr = PPN_PHYS(iot->phys); + iotlb->addr_mask = ~TARGET_PAGE_MASK; + iotlb->perm = perm; + fault = 0; + goto done; + } + /* Translate using device directory / page table information. 
*/ fault = riscv_iommu_spa_fetch(s, ctx, iotlb); + if (!fault && iotlb->target_as == &s->trap_as) { + /* Do not cache trapped MSI translations */ + goto done; + } + + if (!fault && iotlb->translated_addr != iotlb->iova && enable_cache) { + iot = g_new0(RISCVIOMMUEntry, 1); + iot->iova = PPN_DOWN(iotlb->iova); + iot->phys = PPN_DOWN(iotlb->translated_addr); + iot->pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID); + iot->perm = iotlb->perm; + riscv_iommu_iot_update(s, iot_cache, iot); + } + +done: + g_hash_table_unref(iot_cache); + if (enable_pri && fault) { struct riscv_iommu_pq_record pr = {0}; if (enable_pasid) { @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s) if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) { /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */ goto cmd_ill; + } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) { + /* invalidate all cache mappings */ + func = __iot_inval_all; + } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) { + /* invalidate cache matching GSCID */ + func = __iot_inval_gscid; + } else { + /* invalidate cache matching GSCID and ADDR (GPA) */ + func = __iot_inval_gscid_gpa; } - /* translation cache not implemented yet */ + riscv_iommu_iot_inval(s, func, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID), 0, + cmd.dword1 & TARGET_PAGE_MASK); break; case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA, RISCV_IOMMU_CMD_IOTINVAL_OPCODE): - /* translation cache not implemented yet */ + if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) { + /* invalidate all cache mappings, simplified model */ + func = __iot_inval_all; + } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV)) { + /* invalidate cache matching GSCID, simplified model */ + func = __iot_inval_gscid; + } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) { + /* invalidate cache matching GSCID and PSCID */ + func = __iot_inval_pscid; + } else { + /* invalidate cache matching GSCID and PSCID and ADDR (IOVA) */ + func = __iot_inval_pscid_iova; + } + riscv_iommu_iot_inval(s, func, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID), + get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_PSCID), + cmd.dword1 & TARGET_PAGE_MASK); break; case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT, @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp) /* Device translation context cache */ s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal, g_free, NULL); + s->iot_cache = g_hash_table_new_full(__iot_hash, __iot_equal, + g_free, NULL); s->iommus.le_next = NULL; s->iommus.le_prev = NULL; @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_unrealize(DeviceState *dev) qemu_thread_join(&s->core_proc); qemu_cond_destroy(&s->core_cond); qemu_mutex_destroy(&s->core_lock); + g_hash_table_unref(s->iot_cache); g_hash_table_unref(s->ctx_cache); } @@ -XXX,XX +XXX,XX @@ static Property riscv_iommu_properties[] = { DEFINE_PROP_UINT32("version", RISCVIOMMUState, version, RISCV_IOMMU_SPEC_DOT_VER), DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0), + DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit, + LIMIT_CACHE_IOT), DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE), DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE), DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr, @@ -XXX,XX +XXX,XX @@ static IOMMUTLBEntry riscv_iommu_memory_region_translate( /* Translation disabled or invalid. 
*/ iotlb.addr_mask = 0; iotlb.perm = IOMMU_NONE; - } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) { + } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb, true)) { /* Translation disabled or fault reported. */ iotlb.addr_mask = 0; iotlb.perm = IOMMU_NONE; diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.h +++ b/hw/riscv/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ struct RISCVIOMMUState { MemoryRegion trap_mr; GHashTable *ctx_cache; /* Device translation Context Cache */ + GHashTable *iot_cache; /* IO Translated Address Cache */ + unsigned iot_limit; /* IO Translation Cache size limit */ /* MMIO Hardware Interface */ MemoryRegion regs_mr; -- 2.43.2
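The IOATC introduced above is a GLib hash table keyed by (GSCID, PSCID, IOVA page number), with a deliberately blunt eviction policy: once the 'ioatc-limit' threshold is reached, riscv_iommu_iot_update() swaps in a fresh table instead of evicting individual entries, and readers holding a reference simply drain the old one. A simplified stand-alone sketch of that flush-on-full scheme (single-field key, illustrative names; the real code also swaps the table atomically):

    #include <glib.h>
    #include <stdio.h>

    #define CACHE_LIMIT 2   /* tiny limit so the flush is visible */

    typedef struct { guint64 iova; guint64 phys; } Entry;

    static guint entry_hash(gconstpointer v)
    {
        return (guint)((const Entry *)v)->iova;
    }

    static gboolean entry_equal(gconstpointer a, gconstpointer b)
    {
        return ((const Entry *)a)->iova == ((const Entry *)b)->iova;
    }

    static void cache_update(GHashTable **cache, guint64 iova, guint64 phys)
    {
        if (g_hash_table_size(*cache) >= CACHE_LIMIT) {
            /* Flush-on-full: replace the table, don't evict one entry. */
            g_hash_table_unref(*cache);
            *cache = g_hash_table_new_full(entry_hash, entry_equal,
                                           g_free, NULL);
        }
        Entry *e = g_new0(Entry, 1);
        e->iova = iova;
        e->phys = phys;
        g_hash_table_add(*cache, e);
    }

    int main(void)
    {
        GHashTable *cache = g_hash_table_new_full(entry_hash, entry_equal,
                                                  g_free, NULL);
        for (guint64 i = 0; i < 5; i++) {
            cache_update(&cache, i, i + 100);
        }
        printf("entries after 5 inserts: %u\n", g_hash_table_size(cache));
        g_hash_table_unref(cache);
        return 0;
    }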
From: Tomasz Jeznach <tjeznach@rivosinc.com> Add support for s-stage (sv32, sv39, sv48, sv57 caps) and g-stage (sv32x4, sv39x4, sv48x4, sv57x4 caps). Most of the work is done in the riscv_iommu_spa_fetch() function, which now has to consider how many translation stages are needed to walk the page tables. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/riscv-iommu-bits.h | 11 ++ hw/riscv/riscv-iommu.c | 282 ++++++++++++++++++++++++++++++++++-- hw/riscv/riscv-iommu.h | 2 + 3 files changed, 286 insertions(+), 9 deletions(-) diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu-bits.h +++ b/hw/riscv/riscv-iommu-bits.h @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_pq_record { /* 5.3 IOMMU Capabilities (64bits) */ #define RISCV_IOMMU_REG_CAP 0x0000 #define RISCV_IOMMU_CAP_VERSION GENMASK_ULL(7, 0) +#define RISCV_IOMMU_CAP_SV32 BIT_ULL(8) +#define RISCV_IOMMU_CAP_SV39 BIT_ULL(9) +#define RISCV_IOMMU_CAP_SV48 BIT_ULL(10) +#define RISCV_IOMMU_CAP_SV57 BIT_ULL(11) +#define RISCV_IOMMU_CAP_SV32X4 BIT_ULL(16) +#define RISCV_IOMMU_CAP_SV39X4 BIT_ULL(17) +#define RISCV_IOMMU_CAP_SV48X4 BIT_ULL(18) +#define RISCV_IOMMU_CAP_SV57X4 BIT_ULL(19) #define RISCV_IOMMU_CAP_MSI_FLAT BIT_ULL(22) #define RISCV_IOMMU_CAP_MSI_MRIF BIT_ULL(23) #define RISCV_IOMMU_CAP_IGS GENMASK_ULL(29, 28) @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_pq_record { /* 5.4 Features control register (32bits) */ #define RISCV_IOMMU_REG_FCTL 0x0008 +#define RISCV_IOMMU_FCTL_GXL BIT(2) /* 5.5 Device-directory-table pointer (64bits) */ #define RISCV_IOMMU_REG_DDTP 0x0010 @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_dc { #define RISCV_IOMMU_DC_TC_DTF BIT_ULL(4) #define RISCV_IOMMU_DC_TC_PDTV BIT_ULL(5) #define RISCV_IOMMU_DC_TC_PRPR BIT_ULL(6) +#define RISCV_IOMMU_DC_TC_GADE BIT_ULL(7) +#define RISCV_IOMMU_DC_TC_SADE BIT_ULL(8) #define RISCV_IOMMU_DC_TC_DPE BIT_ULL(9) #define RISCV_IOMMU_DC_TC_SXL BIT_ULL(11) diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.c +++ b/hw/riscv/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ struct RISCVIOMMUContext { uint64_t __rfu:20; /* reserved */ uint64_t tc; /* Translation Control */ uint64_t ta; /* Translation Attributes */ + uint64_t satp; /* S-Stage address translation and protection */ + uint64_t gatp; /* G-Stage address translation and protection */ uint64_t msi_addr_mask; /* MSI filtering - address mask */ uint64_t msi_addr_pattern; /* MSI filtering - address pattern */ uint64_t msiptp; /* MSI redirection page table pointer */ @@ -XXX,XX +XXX,XX @@ static bool riscv_iommu_msi_check(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, return true; } -/* RISCV IOMMU Address Translation Lookup - Page Table Walk */ +/* + * RISCV IOMMU Address Translation Lookup - Page Table Walk + * + * Note: Code is based on get_physical_address() from target/riscv/cpu_helper.c. + * Both implementations can be merged into a single helper function in the + * future. Keeping them separate for now, as the error reporting and flow + * specifics are sufficiently different. + * + * @s : IOMMU Device State + * @ctx : Translation context for device id and process address space id. + * @iotlb : translation data: physical address and access mode. + * @gpa : provided IOVA is a guest physical address, use G-Stage only. + * @return : success or fault cause code.
+ */ static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, - IOMMUTLBEntry *iotlb) + IOMMUTLBEntry *iotlb, bool gpa) { + dma_addr_t addr, base; + uint64_t satp, gatp, pte; + bool en_s, en_g; + struct { + unsigned char step; + unsigned char levels; + unsigned char ptidxbits; + unsigned char ptesize; + } sc[2]; + /* Translation stage phase */ + enum { + S_STAGE = 0, + G_STAGE = 1, + } pass; + + satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD); + gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD); + + en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE && !gpa; + en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE; + /* Early check for MSI address match when IOVA == GPA */ - if (iotlb->perm & IOMMU_WO && + if (!en_s && (iotlb->perm & IOMMU_WO) && riscv_iommu_msi_check(s, ctx, iotlb->iova)) { iotlb->target_as = &s->trap_as; iotlb->translated_addr = iotlb->iova; @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, } /* Exit early for pass-through mode. */ - iotlb->translated_addr = iotlb->iova; - iotlb->addr_mask = ~TARGET_PAGE_MASK; - /* Allow R/W in pass-through mode */ - iotlb->perm = IOMMU_RW; - return 0; + if (!(en_s || en_g)) { + iotlb->translated_addr = iotlb->iova; + iotlb->addr_mask = ~TARGET_PAGE_MASK; + /* Allow R/W in pass-through mode */ + iotlb->perm = IOMMU_RW; + return 0; + } + + /* S/G translation parameters. */ + for (pass = 0; pass < 2; pass++) { + uint32_t sv_mode; + + sc[pass].step = 0; + if (pass ? (s->fctl & RISCV_IOMMU_FCTL_GXL) : + (ctx->tc & RISCV_IOMMU_DC_TC_SXL)) { + /* 32bit mode for GXL/SXL == 1 */ + switch (pass ? gatp : satp) { + case RISCV_IOMMU_DC_IOHGATP_MODE_BARE: + sc[pass].levels = 0; + sc[pass].ptidxbits = 0; + sc[pass].ptesize = 0; + break; + case RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4: + sv_mode = pass ? RISCV_IOMMU_CAP_SV32X4 : RISCV_IOMMU_CAP_SV32; + if (!(s->cap & sv_mode)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + sc[pass].levels = 2; + sc[pass].ptidxbits = 10; + sc[pass].ptesize = 4; + break; + default: + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + } else { + /* 64bit mode for GXL/SXL == 0 */ + switch (pass ? gatp : satp) { + case RISCV_IOMMU_DC_IOHGATP_MODE_BARE: + sc[pass].levels = 0; + sc[pass].ptidxbits = 0; + sc[pass].ptesize = 0; + break; + case RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4: + sv_mode = pass ? RISCV_IOMMU_CAP_SV39X4 : RISCV_IOMMU_CAP_SV39; + if (!(s->cap & sv_mode)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + sc[pass].levels = 3; + sc[pass].ptidxbits = 9; + sc[pass].ptesize = 8; + break; + case RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4: + sv_mode = pass ? RISCV_IOMMU_CAP_SV48X4 : RISCV_IOMMU_CAP_SV48; + if (!(s->cap & sv_mode)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + sc[pass].levels = 4; + sc[pass].ptidxbits = 9; + sc[pass].ptesize = 8; + break; + case RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4: + sv_mode = pass ? RISCV_IOMMU_CAP_SV57X4 : RISCV_IOMMU_CAP_SV57; + if (!(s->cap & sv_mode)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + sc[pass].levels = 5; + sc[pass].ptidxbits = 9; + sc[pass].ptesize = 8; + break; + default: + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + } + }; + + /* S/G stages translation tables root pointers */ + gatp = PPN_PHYS(get_field(ctx->gatp, RISCV_IOMMU_ATP_PPN_FIELD)); + satp = PPN_PHYS(get_field(ctx->satp, RISCV_IOMMU_ATP_PPN_FIELD)); + addr = (en_s && en_g) ? satp : iotlb->iova; + base = en_g ? gatp : satp; + pass = en_g ? 
G_STAGE : S_STAGE; + + do { + const unsigned widened = (pass && !sc[pass].step) ? 2 : 0; + const unsigned va_bits = widened + sc[pass].ptidxbits; + const unsigned va_skip = TARGET_PAGE_BITS + sc[pass].ptidxbits * + (sc[pass].levels - 1 - sc[pass].step); + const unsigned idx = (addr >> va_skip) & ((1 << va_bits) - 1); + const dma_addr_t pte_addr = base + idx * sc[pass].ptesize; + const bool ade = + ctx->tc & (pass ? RISCV_IOMMU_DC_TC_GADE : RISCV_IOMMU_DC_TC_SADE); + + /* Address range check before first level lookup */ + if (!sc[pass].step) { + const uint64_t va_mask = (1ULL << (va_skip + va_bits)) - 1; + if ((addr & va_mask) != addr) { + return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED; + } + } + + /* Read page table entry */ + if (dma_memory_read(s->target_as, pte_addr, &pte, + sc[pass].ptesize, MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return (iotlb->perm & IOMMU_WO) ? RISCV_IOMMU_FQ_CAUSE_WR_FAULT + : RISCV_IOMMU_FQ_CAUSE_RD_FAULT; + } + + if (sc[pass].ptesize == 4) { + pte = (uint64_t) le32_to_cpu(*((uint32_t *)&pte)); + } else { + pte = le64_to_cpu(pte); + } + + sc[pass].step++; + hwaddr ppn = pte >> PTE_PPN_SHIFT; + + if (!(pte & PTE_V)) { + break; /* Invalid PTE */ + } else if (!(pte & (PTE_R | PTE_W | PTE_X))) { + base = PPN_PHYS(ppn); /* Inner PTE, continue walking */ + } else if ((pte & (PTE_R | PTE_W | PTE_X)) == PTE_W) { + break; /* Reserved leaf PTE flags: PTE_W */ + } else if ((pte & (PTE_R | PTE_W | PTE_X)) == (PTE_W | PTE_X)) { + break; /* Reserved leaf PTE flags: PTE_W + PTE_X */ + } else if (ppn & ((1ULL << (va_skip - TARGET_PAGE_BITS)) - 1)) { + break; /* Misaligned PPN */ + } else if ((iotlb->perm & IOMMU_RO) && !(pte & PTE_R)) { + break; /* Read access check failed */ + } else if ((iotlb->perm & IOMMU_WO) && !(pte & PTE_W)) { + break; /* Write access check failed */ + } else if ((iotlb->perm & IOMMU_RO) && !ade && !(pte & PTE_A)) { + break; /* Access bit not set */ + } else if ((iotlb->perm & IOMMU_WO) && !ade && !(pte & PTE_D)) { + break; /* Dirty bit not set */ + } else { + /* Leaf PTE, translation completed. */ + sc[pass].step = sc[pass].levels; + base = PPN_PHYS(ppn) | (addr & ((1ULL << va_skip) - 1)); + /* Update address mask based on smallest translation granularity */ + iotlb->addr_mask &= (1ULL << va_skip) - 1; + /* Continue with S-Stage translation? */ + if (pass && sc[0].step != sc[0].levels) { + pass = S_STAGE; + addr = iotlb->iova; + continue; + } + /* Translation phase completed (GPA or SPA) */ + iotlb->translated_addr = base; + iotlb->perm = (pte & PTE_W) ? ((pte & PTE_R) ? IOMMU_RW : IOMMU_WO) + : IOMMU_RO; + + /* Check MSI GPA address match */ + if (pass == S_STAGE && (iotlb->perm & IOMMU_WO) && + riscv_iommu_msi_check(s, ctx, base)) { + /* Trap MSI writes and return GPA address. */ + iotlb->target_as = &s->trap_as; + iotlb->addr_mask = ~TARGET_PAGE_MASK; + return 0; + } + + /* Continue with G-Stage translation? */ + if (!pass && en_g) { + pass = G_STAGE; + addr = base; + base = gatp; + sc[pass].step = 0; + continue; + } + + return 0; + } + + if (sc[pass].step == sc[pass].levels) { + break; /* Can't find leaf PTE */ + } + + /* Continue with G-Stage translation? */ + if (!pass && en_g) { + pass = G_STAGE; + addr = base; + base = gatp; + sc[pass].step = 0; + } + } while (1); + + return (iotlb->perm & IOMMU_WO) ? + (pass ? RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS : + RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S) : + (pass ? RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS : + RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S); } /* Redirect MSI write for given GPA. 
*/ @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) case RISCV_IOMMU_DDTP_MODE_BARE: /* mock up pass-through translation context */ + ctx->gatp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD, + RISCV_IOMMU_DC_IOHGATP_MODE_BARE); + ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD, + RISCV_IOMMU_DC_FSC_MODE_BARE); ctx->tc = RISCV_IOMMU_DC_TC_V; ctx->ta = 0; ctx->msiptp = 0; @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) /* Set translation context. */ ctx->tc = le64_to_cpu(dc.tc); + ctx->gatp = le64_to_cpu(dc.iohgatp); + ctx->satp = le64_to_cpu(dc.fsc); ctx->ta = le64_to_cpu(dc.ta); ctx->msiptp = le64_to_cpu(dc.msiptp); ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask); @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; } + /* FSC field checks */ + mode = get_field(ctx->satp, RISCV_IOMMU_DC_FSC_MODE); + addr = PPN_PHYS(get_field(ctx->satp, RISCV_IOMMU_DC_FSC_PPN)); + + if (mode == RISCV_IOMMU_DC_FSC_MODE_BARE) { + /* No S-Stage translation, done. */ + return 0; + } + if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) { if (ctx->pasid != RISCV_IOMMU_NOPASID) { /* PASID is disabled */ return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED; } + if (mode > RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57) { + /* Invalid translation mode */ + return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; + } return 0; } + if (ctx->pasid == RISCV_IOMMU_NOPASID) { + if (!(ctx->tc & RISCV_IOMMU_DC_TC_DPE)) { + /* No default PASID enabled, set BARE mode */ + ctx->satp = 0ULL; + return 0; + } else { + /* Use default PASID #0 */ + ctx->pasid = 0; + } + } + /* FSC.TC.PDTV enabled */ if (mode > RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20) { /* Invalid PDTP.MODE */ @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) /* Use FSC and TA from process directory entry. */ ctx->ta = le64_to_cpu(dc.ta); + ctx->satp = le64_to_cpu(dc.fsc); return 0; } @@ -XXX,XX +XXX,XX @@ static RISCVIOMMUEntry *riscv_iommu_iot_lookup(RISCVIOMMUContext *ctx, GHashTable *iot_cache, hwaddr iova) { RISCVIOMMUEntry key = { + .gscid = get_field(ctx->gatp, RISCV_IOMMU_DC_IOHGATP_GSCID), .pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID), .iova = PPN_DOWN(iova), }; @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, } /* Translate using device directory / page table information. 
*/ - fault = riscv_iommu_spa_fetch(s, ctx, iotlb); + fault = riscv_iommu_spa_fetch(s, ctx, iotlb, false); if (!fault && iotlb->target_as == &s->trap_as) { /* Do not cache trapped MSI translations */ @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, iot = g_new0(RISCVIOMMUEntry, 1); iot->iova = PPN_DOWN(iotlb->iova); iot->phys = PPN_DOWN(iotlb->translated_addr); + iot->gscid = get_field(ctx->gatp, RISCV_IOMMU_DC_IOHGATP_GSCID); iot->pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID); iot->perm = iotlb->perm; riscv_iommu_iot_update(s, iot_cache, iot); @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp) if (s->enable_msi) { s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF; } + if (s->enable_s_stage) { + s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 | + RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57; + } + if (s->enable_g_stage) { + s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 | + RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4; + } /* Report QEMU target physical address space limits */ s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS, TARGET_PHYS_ADDR_SPACE_BITS); @@ -XXX,XX +XXX,XX @@ static Property riscv_iommu_properties[] = { LIMIT_CACHE_IOT), DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE), DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE), + DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE), + DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE), DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr, TYPE_MEMORY_REGION, MemoryRegion *), DEFINE_PROP_END_OF_LIST(), diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.h +++ b/hw/riscv/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ struct RISCVIOMMUState { bool enable_off; /* Enable out-of-reset OFF mode (DMA disabled) */ bool enable_msi; /* Enable MSI remapping */ + bool enable_s_stage; /* Enable S/VS-Stage translation */ + bool enable_g_stage; /* Enable G-Stage translation */ /* IOMMU Internal State */ uint64_t ddtp; /* Validated Device Directory Tree Root Pointer */ -- 2.43.2
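A note on the walk added above: at each step the walker derives the page-table index by skipping the bits already translated by the lower levels, and the first G-stage level is widened by two bits (the SvXXx4 formats). A minimal sketch of that arithmetic, assuming 4K pages (TARGET_PAGE_BITS = 12); pt_index() is a stand-in for the inline va_skip/va_bits computation in riscv_iommu_spa_fetch():

    /* Index into the page table at a given walk step. */
    static unsigned pt_index(uint64_t addr, unsigned step, unsigned levels,
                             unsigned ptidxbits, unsigned widened)
    {
        /* widened is non-zero only at step 0 of a G-stage walk */
        unsigned va_bits = widened + ptidxbits;
        unsigned va_skip = 12 + ptidxbits * (levels - 1 - step);
        return (addr >> va_skip) & ((1u << va_bits) - 1);
    }

For Sv39 (levels = 3, ptidxbits = 9) the three steps consume address bits 38:30, 29:21 and 20:12; for Sv39x4 the first G-stage step widens to an 11-bit index (bits 40:30), covering the 41-bit guest-physical address space.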
From: Tomasz Jeznach <tjeznach@rivosinc.com> Add PCIe Address Translation Services (ATS) capabilities to the IOMMU. This will add support for ATS translation requests in Fault/Event queues, Page-request queue and IOATC invalidations. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/riscv-iommu-bits.h | 43 ++++++++++++++- hw/riscv/riscv-iommu.c | 107 +++++++++++++++++++++++++++++++++--- hw/riscv/riscv-iommu.h | 1 + hw/riscv/trace-events | 3 + 4 files changed, 145 insertions(+), 9 deletions(-) diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu-bits.h +++ b/hw/riscv/riscv-iommu-bits.h @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_pq_record { #define RISCV_IOMMU_CAP_SV57X4 BIT_ULL(19) #define RISCV_IOMMU_CAP_MSI_FLAT BIT_ULL(22) #define RISCV_IOMMU_CAP_MSI_MRIF BIT_ULL(23) +#define RISCV_IOMMU_CAP_ATS BIT_ULL(25) #define RISCV_IOMMU_CAP_IGS GENMASK_ULL(29, 28) #define RISCV_IOMMU_CAP_PAS GENMASK_ULL(37, 32) #define RISCV_IOMMU_CAP_PD8 BIT_ULL(38) @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_dc { /* Translation control fields */ #define RISCV_IOMMU_DC_TC_V BIT_ULL(0) +#define RISCV_IOMMU_DC_TC_EN_ATS BIT_ULL(1) #define RISCV_IOMMU_DC_TC_DTF BIT_ULL(4) #define RISCV_IOMMU_DC_TC_PDTV BIT_ULL(5) #define RISCV_IOMMU_DC_TC_PRPR BIT_ULL(6) @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_command { #define RISCV_IOMMU_CMD_IODIR_DV BIT_ULL(33) #define RISCV_IOMMU_CMD_IODIR_DID GENMASK_ULL(63, 40) +/* 3.1.4 I/O MMU PCIe ATS */ +#define RISCV_IOMMU_CMD_ATS_OPCODE 4 +#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL 0 +#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR 1 +#define RISCV_IOMMU_CMD_ATS_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_CMD_ATS_PV BIT_ULL(32) +#define RISCV_IOMMU_CMD_ATS_DSV BIT_ULL(33) +#define RISCV_IOMMU_CMD_ATS_RID GENMASK_ULL(55, 40) +#define RISCV_IOMMU_CMD_ATS_DSEG GENMASK_ULL(63, 56) +/* dword1 is the ATS payload, two different payload types for INVAL and PRGR */ + +/* ATS.PRGR payload */ +#define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE GENMASK_ULL(47, 44) + enum riscv_iommu_dc_fsc_atp_modes { RISCV_IOMMU_DC_FSC_MODE_BARE = 0, RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8, @@ -XXX,XX +XXX,XX @@ enum riscv_iommu_fq_ttypes { RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5, RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6, RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7, - RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 8, + RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ = 8, + RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 9, +}; + +/* Header fields */ +#define RISCV_IOMMU_PREQ_HDR_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_PREQ_HDR_PV BIT_ULL(32) +#define RISCV_IOMMU_PREQ_HDR_PRIV BIT_ULL(33) +#define RISCV_IOMMU_PREQ_HDR_EXEC BIT_ULL(34) +#define RISCV_IOMMU_PREQ_HDR_DID GENMASK_ULL(63, 40) + +/* Payload fields */ +#define RISCV_IOMMU_PREQ_PAYLOAD_R BIT_ULL(0) +#define RISCV_IOMMU_PREQ_PAYLOAD_W BIT_ULL(1) +#define RISCV_IOMMU_PREQ_PAYLOAD_L BIT_ULL(2) +#define RISCV_IOMMU_PREQ_PAYLOAD_M GENMASK_ULL(2, 0) +#define RISCV_IOMMU_PREQ_PRG_INDEX GENMASK_ULL(11, 3) +#define RISCV_IOMMU_PREQ_UADDR GENMASK_ULL(63, 12) + + +/* + * struct riscv_iommu_msi_pte - MSI Page Table Entry + */ +struct riscv_iommu_msi_pte { + uint64_t pte; + uint64_t mrif_info; }; /* Fields on pte */ diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.c +++ b/hw/riscv/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) 
RISCV_IOMMU_DC_IOHGATP_MODE_BARE); ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD, RISCV_IOMMU_DC_FSC_MODE_BARE); - ctx->tc = RISCV_IOMMU_DC_TC_V; + ctx->tc = RISCV_IOMMU_DC_TC_EN_ATS | RISCV_IOMMU_DC_TC_V; ctx->ta = 0; ctx->msiptp = 0; return 0; @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32)); enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV); + /* Check for ATS request. */ + if (iotlb->perm == IOMMU_NONE) { + /* Check if ATS is disabled. */ + if (!(ctx->tc & RISCV_IOMMU_DC_TC_EN_ATS)) { + enable_pri = false; + fault = RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED; + goto done; + } + trace_riscv_iommu_ats(s->parent_obj.id, PCI_BUS_NUM(ctx->devid), + PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid), iotlb->iova); + } + iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova); perm = iot ? iot->perm : IOMMU_NONE; if (perm != IOMMU_NONE) { @@ -XXX,XX +XXX,XX @@ done: if (enable_faults && fault) { struct riscv_iommu_fq_record ev; - unsigned ttype; - - if (iotlb->perm & IOMMU_RW) { - ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR; - } else { - ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD; - } + const unsigned ttype = + (iotlb->perm & IOMMU_RW) ? RISCV_IOMMU_FQ_TTYPE_UADDR_WR : + ((iotlb->perm & IOMMU_RO) ? RISCV_IOMMU_FQ_TTYPE_UADDR_RD : + RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ); ev.hdr = set_field(0, RISCV_IOMMU_FQ_HDR_CAUSE, fault); ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, ttype); ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, enable_pasid); @@ -XXX,XX +XXX,XX @@ static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify, MEMTXATTRS_UNSPECIFIED); } +static void riscv_iommu_ats(RISCVIOMMUState *s, + struct riscv_iommu_command *cmd, IOMMUNotifierFlag flag, + IOMMUAccessFlags perm, + void (*trace_fn)(const char *id)) +{ + RISCVIOMMUSpace *as = NULL; + IOMMUNotifier *n; + IOMMUTLBEvent event; + uint32_t pasid; + uint32_t devid; + const bool pv = cmd->dword0 & RISCV_IOMMU_CMD_ATS_PV; + + if (cmd->dword0 & RISCV_IOMMU_CMD_ATS_DSV) { + /* Use device segment and requester id */ + devid = get_field(cmd->dword0, + RISCV_IOMMU_CMD_ATS_DSEG | RISCV_IOMMU_CMD_ATS_RID); + } else { + devid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_RID); + } + + pasid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_PID); + + qemu_mutex_lock(&s->core_lock); + QLIST_FOREACH(as, &s->spaces, list) { + if (as->devid == devid) { + break; + } + } + qemu_mutex_unlock(&s->core_lock); + + if (!as || !as->notifier) { + return; + } + + event.type = flag; + event.entry.perm = perm; + event.entry.target_as = s->target_as; + + IOMMU_NOTIFIER_FOREACH(n, &as->iova_mr) { + if (!pv || n->iommu_idx == pasid) { + event.entry.iova = n->start; + event.entry.addr_mask = n->end - n->start; + trace_fn(as->iova_mr.parent_obj.name); + memory_region_notify_iommu_one(n, &event); + } + } +} + +static void riscv_iommu_ats_inval(RISCVIOMMUState *s, + struct riscv_iommu_command *cmd) +{ + return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_DEVIOTLB_UNMAP, IOMMU_NONE, + trace_riscv_iommu_ats_inval); +} + +static void riscv_iommu_ats_prgr(RISCVIOMMUState *s, + struct riscv_iommu_command *cmd) +{ + unsigned resp_code = get_field(cmd->dword1, + RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE); + + /* Using the access flag to carry response code information */ + IOMMUAccessFlags perm = resp_code ? 
IOMMU_NONE : IOMMU_RW; + return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_MAP, perm, + trace_riscv_iommu_ats_prgr); +} + static void riscv_iommu_process_ddtp(RISCVIOMMUState *s) { uint64_t old_ddtp = s->ddtp; @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s) get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID)); break; + /* ATS commands */ + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_INVAL, + RISCV_IOMMU_CMD_ATS_OPCODE): + riscv_iommu_ats_inval(s, &cmd); + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_PRGR, + RISCV_IOMMU_CMD_ATS_OPCODE): + riscv_iommu_ats_prgr(s, &cmd); + break; + default: cmd_ill: /* Invalid instruction, do not advance instruction index. */ @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp) if (s->enable_msi) { s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF; } + if (s->enable_ats) { + s->cap |= RISCV_IOMMU_CAP_ATS; + } if (s->enable_s_stage) { s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 | RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57; @@ -XXX,XX +XXX,XX @@ static Property riscv_iommu_properties[] = { DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit, LIMIT_CACHE_IOT), DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE), + DEFINE_PROP_BOOL("ats", RISCVIOMMUState, enable_ats, TRUE), DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE), DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE), DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE), diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.h +++ b/hw/riscv/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ struct RISCVIOMMUState { bool enable_off; /* Enable out-of-reset OFF mode (DMA disabled) */ bool enable_msi; /* Enable MSI remapping */ + bool enable_ats; /* Enable ATS support */ bool enable_s_stage; /* Enable S/VS-Stage translation */ bool enable_g_stage; /* Enable G-Stage translation */ diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/trace-events +++ b/hw/riscv/trace-events @@ -XXX,XX +XXX,XX @@ riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iov riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64 riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added" riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed" +riscv_iommu_ats(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: translate request %04x:%02x.%u iova: 0x%"PRIx64 +riscv_iommu_ats_inval(const char *id) "%s: dev-iotlb invalidate" +riscv_iommu_ats_prgr(const char *id) "%s: dev-iotlb page request group response" -- 2.43.2
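For reference, a sketch of how software could encode the ATS.INVAL command consumed by riscv_iommu_ats_inval() above, following the dword0 field layout from riscv-iommu-bits.h (the PASID and requester id values are made up for the example):

    struct riscv_iommu_command cmd = { 0 };

    cmd.dword0 |= 4;                       /* RISCV_IOMMU_CMD_ATS_OPCODE     */
    cmd.dword0 |= 0ULL << 7;               /* RISCV_IOMMU_CMD_ATS_FUNC_INVAL */
    cmd.dword0 |= (uint64_t)5 << 12;       /* PID: PASID 5                   */
    cmd.dword0 |= 1ULL << 32;              /* PV: the PID field is valid     */
    cmd.dword0 |= (uint64_t)0x0100 << 40;  /* RID: requester 01:00.0         */
    /* dword1 carries the PCIe ATS invalidation payload */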
From: Tomasz Jeznach <tjeznach@rivosinc.com> DBG support adds three additional registers: tr_req_iova, tr_req_ctl and tr_response. The DBG cap is always enabled. No on/off toggle is provided for it. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/riscv-iommu-bits.h | 20 +++++++++++++ hw/riscv/riscv-iommu.c | 57 ++++++++++++++++++++++++++++++++++++- 2 files changed, 76 insertions(+), 1 deletion(-) diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu-bits.h +++ b/hw/riscv/riscv-iommu-bits.h @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_pq_record { #define RISCV_IOMMU_CAP_MSI_MRIF BIT_ULL(23) #define RISCV_IOMMU_CAP_ATS BIT_ULL(25) #define RISCV_IOMMU_CAP_IGS GENMASK_ULL(29, 28) +#define RISCV_IOMMU_CAP_DBG BIT_ULL(31) #define RISCV_IOMMU_CAP_PAS GENMASK_ULL(37, 32) #define RISCV_IOMMU_CAP_PD8 BIT_ULL(38) @@ -XXX,XX +XXX,XX @@ enum { RISCV_IOMMU_INTR_COUNT }; +#define RISCV_IOMMU_IPSR_CIP BIT(RISCV_IOMMU_INTR_CQ) +#define RISCV_IOMMU_IPSR_FIP BIT(RISCV_IOMMU_INTR_FQ) +#define RISCV_IOMMU_IPSR_PMIP BIT(RISCV_IOMMU_INTR_PM) +#define RISCV_IOMMU_IPSR_PIP BIT(RISCV_IOMMU_INTR_PQ) + +/* 5.24 Translation request IOVA (64bits) */ +#define RISCV_IOMMU_REG_TR_REQ_IOVA 0x0258 + +/* 5.25 Translation request control (64bits) */ +#define RISCV_IOMMU_REG_TR_REQ_CTL 0x0260 +#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY BIT_ULL(0) +#define RISCV_IOMMU_TR_REQ_CTL_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_TR_REQ_CTL_DID GENMASK_ULL(63, 40) + +/* 5.26 Translation request response (64bits) */ +#define RISCV_IOMMU_REG_TR_RESPONSE 0x0268 +#define RISCV_IOMMU_TR_RESPONSE_FAULT BIT_ULL(0) +#define RISCV_IOMMU_TR_RESPONSE_PPN RISCV_IOMMU_PPN_FIELD + /* 5.27 Interrupt cause to vector (64bits) */ #define RISCV_IOMMU_REG_IVEC 0x02F8 diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.c +++ b/hw/riscv/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_process_pq_control(RISCVIOMMUState *s) riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr); } +static void riscv_iommu_process_dbg(RISCVIOMMUState *s) +{ + uint64_t iova = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_IOVA); + uint64_t ctrl = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_CTL); + unsigned devid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_DID); + unsigned pid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_PID); + RISCVIOMMUContext *ctx; + void *ref; + + if (!(ctrl & RISCV_IOMMU_TR_REQ_CTL_GO_BUSY)) { + return; + } + + ctx = riscv_iommu_ctx(s, devid, pid, &ref); + if (ctx == NULL) { + riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE, + RISCV_IOMMU_TR_RESPONSE_FAULT | + (RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED << 10)); + } else { + IOMMUTLBEntry iotlb = { + .iova = iova, + .perm = IOMMU_NONE, + .addr_mask = ~0, + .target_as = NULL, + }; + int fault = riscv_iommu_translate(s, ctx, &iotlb, false); + if (fault) { + iova = RISCV_IOMMU_TR_RESPONSE_FAULT | (((uint64_t) fault) << 10); + } else { + iova = ((iotlb.translated_addr & ~iotlb.addr_mask) >> 2) & + RISCV_IOMMU_TR_RESPONSE_PPN; + } + riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE, iova); + } + + riscv_iommu_reg_mod64(s, RISCV_IOMMU_REG_TR_REQ_CTL, 0, + RISCV_IOMMU_TR_REQ_CTL_GO_BUSY); + riscv_iommu_ctx_put(s, ref); +} + /* Core IOMMU execution activation */ enum { RISCV_IOMMU_EXEC_DDTP, @@ -XXX,XX +XXX,XX @@ static void *riscv_iommu_core_proc(void* arg) /* NOP */ break; 
case BIT(RISCV_IOMMU_EXEC_TR_REQUEST): - /* DBG support not implemented yet */ + riscv_iommu_process_dbg(s); break; } exec &= ~mask; @@ -XXX,XX +XXX,XX @@ static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr, exec = BIT(RISCV_IOMMU_EXEC_PQCSR); busy = RISCV_IOMMU_PQCSR_BUSY; break; + + case RISCV_IOMMU_REG_TR_REQ_CTL: + exec = BIT(RISCV_IOMMU_EXEC_TR_REQUEST); + regb = RISCV_IOMMU_REG_TR_REQ_CTL; + busy = RISCV_IOMMU_TR_REQ_CTL_GO_BUSY; + break; } /* @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp) s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 | RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4; } + /* Enable translation debug interface */ + s->cap |= RISCV_IOMMU_CAP_DBG; + /* Report QEMU target physical address space limits */ s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS, TARGET_PHYS_ADDR_SPACE_BITS); @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp) stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0); stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0); stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp); + /* If debug registers enabled. */ + if (s->cap & RISCV_IOMMU_CAP_DBG) { + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_IOVA], 0); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_CTL], + RISCV_IOMMU_TR_REQ_CTL_GO_BUSY); + } /* Memory region for downstream access, if specified. */ if (s->target_mr) { -- 2.43.2
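The registers above implement the spec's translation-request debug interface. A sketch of the software sequence it expects, with reg_read64()/reg_write64() standing in for hypothetical MMIO accessors of the IOMMU register page:

    reg_write64(RISCV_IOMMU_REG_TR_REQ_IOVA, iova);
    reg_write64(RISCV_IOMMU_REG_TR_REQ_CTL,
                ((uint64_t)devid << 40) |          /* TR_REQ_CTL_DID */
                RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);

    /* the IOMMU clears GO/BUSY once the request has been serviced */
    while (reg_read64(RISCV_IOMMU_REG_TR_REQ_CTL) &
           RISCV_IOMMU_TR_REQ_CTL_GO_BUSY) {
        /* poll */
    }

    resp = reg_read64(RISCV_IOMMU_REG_TR_RESPONSE);
    if (resp & RISCV_IOMMU_TR_RESPONSE_FAULT) {
        /* the fault cause sits in the upper bits (the << 10 above) */
    } else {
        /* translated PPN in RISCV_IOMMU_TR_RESPONSE_PPN */
    }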
From: Andrew Jones <ajones@ventanamicro.com> And add mrif notification trace. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- hw/riscv/riscv-iommu-pci.c | 2 +- hw/riscv/riscv-iommu.c | 1 + hw/riscv/trace-events | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/riscv/riscv-iommu-pci.c b/hw/riscv/riscv-iommu-pci.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu-pci.c +++ b/hw/riscv/riscv-iommu-pci.c @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_pci_realize(PCIDevice *dev, Error **errp) pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar0); - int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT, + int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT + 1, &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG, &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG + 256, 0, &err); diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.c +++ b/hw/riscv/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ static MemTxResult riscv_iommu_msi_write(RISCVIOMMUState *s, if (res != MEMTX_OK) { return res; } + trace_riscv_iommu_mrif_notification(s->parent_obj.id, n190, addr); return MEMTX_OK; } diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/trace-events +++ b/hw/riscv/trace-events @@ -XXX,XX +XXX,XX @@ riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t rea riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64 riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64 riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64 +riscv_iommu_mrif_notification(const char *id, uint32_t nid, uint64_t phys) "%s: sent MRIF notification 0x%x to 0x%"PRIx64 riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64 riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added" riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed" -- 2.43.2
Add an additional test to further exercise the IOMMU where we attempt to initialize the command, fault and page-request queues. These steps are taken from chapter 6.2 of the RISC-V IOMMU spec, "Guidelines for initialization". It emulates what we expect from the software/OS when initializing the IOMMU. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- tests/qtest/libqos/riscv-iommu.h | 29 +++++++ tests/qtest/riscv-iommu-test.c | 141 +++++++++++++++++++++++++++++++ 2 files changed, 170 insertions(+) diff --git a/tests/qtest/libqos/riscv-iommu.h b/tests/qtest/libqos/riscv-iommu.h index XXXXXXX..XXXXXXX 100644 --- a/tests/qtest/libqos/riscv-iommu.h +++ b/tests/qtest/libqos/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ #define RISCV_IOMMU_REG_IPSR 0x0054 +#define RISCV_IOMMU_REG_IVEC 0x02F8 +#define RISCV_IOMMU_REG_IVEC_CIV GENMASK_ULL(3, 0) +#define RISCV_IOMMU_REG_IVEC_FIV GENMASK_ULL(7, 4) +#define RISCV_IOMMU_REG_IVEC_PIV GENMASK_ULL(15, 12) + +#define RISCV_IOMMU_REG_CQB 0x0018 +#define RISCV_IOMMU_CQB_PPN_START 10 +#define RISCV_IOMMU_CQB_PPN_LEN 44 +#define RISCV_IOMMU_CQB_LOG2SZ_START 0 +#define RISCV_IOMMU_CQB_LOG2SZ_LEN 5 + +#define RISCV_IOMMU_REG_CQT 0x0024 + +#define RISCV_IOMMU_REG_FQB 0x0028 +#define RISCV_IOMMU_FQB_PPN_START 10 +#define RISCV_IOMMU_FQB_PPN_LEN 44 +#define RISCV_IOMMU_FQB_LOG2SZ_START 0 +#define RISCV_IOMMU_FQB_LOG2SZ_LEN 5 + +#define RISCV_IOMMU_REG_FQT 0x0034 + +#define RISCV_IOMMU_REG_PQB 0x0038 +#define RISCV_IOMMU_PQB_PPN_START 10 +#define RISCV_IOMMU_PQB_PPN_LEN 44 +#define RISCV_IOMMU_PQB_LOG2SZ_START 0 +#define RISCV_IOMMU_PQB_LOG2SZ_LEN 5 + +#define RISCV_IOMMU_REG_PQT 0x0044 + typedef struct QRISCVIOMMU { QOSGraphObject obj; QPCIDevice dev; diff --git a/tests/qtest/riscv-iommu-test.c b/tests/qtest/riscv-iommu-test.c index XXXXXXX..XXXXXXX 100644 --- a/tests/qtest/riscv-iommu-test.c +++ b/tests/qtest/riscv-iommu-test.c @@ -XXX,XX +XXX,XX @@ static uint64_t riscv_iommu_read_reg64(QRISCVIOMMU *r_iommu, int reg_offset) return reg; } +static void riscv_iommu_write_reg32(QRISCVIOMMU *r_iommu, int reg_offset, + uint32_t val) +{ + qpci_memwrite(&r_iommu->dev, r_iommu->reg_bar, reg_offset, + &val, sizeof(val)); +} + +static void riscv_iommu_write_reg64(QRISCVIOMMU *r_iommu, int reg_offset, + uint64_t val) +{ + qpci_memwrite(&r_iommu->dev, r_iommu->reg_bar, reg_offset, + &val, sizeof(val)); +} + static void test_pci_config(void *obj, void *data, QGuestAllocator *t_alloc) { QRISCVIOMMU *r_iommu = obj; @@ -XXX,XX +XXX,XX @@ static void test_reg_reset(void *obj, void *data, QGuestAllocator *t_alloc) g_assert_cmpuint(reg, ==, 0); } +/* + * Common timeout-based poll for CQCSR, FQCSR and PQCSR. All + * their ON bits are mapped as RISCV_IOMMU_QUEUE_ACTIVE (16), + */ +static void qtest_wait_for_queue_active(QRISCVIOMMU *r_iommu, + uint32_t queue_csr) +{ + QTestState *qts = global_qtest; + guint64 timeout_us = 2 * 1000 * 1000; + gint64 start_time = g_get_monotonic_time(); + uint32_t reg; + + for (;;) { + qtest_clock_step(qts, 100); + + reg = riscv_iommu_read_reg32(r_iommu, queue_csr); + if (reg & RISCV_IOMMU_QUEUE_ACTIVE) { + break; + } + g_assert(g_get_monotonic_time() - start_time <= timeout_us); + } +} + +/* + * Goes through the queue activation procedures of chapter 6.2, + * "Guidelines for initialization", of the RISCV-IOMMU spec. 
+ */ +static void test_iommu_init_queues(void *obj, void *data, + QGuestAllocator *t_alloc) +{ + QRISCVIOMMU *r_iommu = obj; + uint64_t reg64, q_addr; + uint32_t reg; + int k; + + reg64 = riscv_iommu_read_reg64(r_iommu, RISCV_IOMMU_REG_CAP); + g_assert_cmpuint(reg64 & RISCV_IOMMU_CAP_VERSION, ==, 0x10); + + /* + * Program the command queue. Write 0xF to civ, assert that + * we have 4 writable bits (k = 4). The number of entries N in the + * command queue is 2^4 = 16. We need to alloc a N*16 bytes + * buffer and use it to set cqb. + */ + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC, + 0xFFFF & RISCV_IOMMU_REG_IVEC_CIV); + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC); + g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_CIV, ==, 0xF); + + q_addr = guest_alloc(t_alloc, 16 * 16); + reg64 = 0; + k = 4; + /* deposit64() is pure, so assign its result; the PPN field takes a page number */ + reg64 = deposit64(reg64, RISCV_IOMMU_CQB_PPN_START, + RISCV_IOMMU_CQB_PPN_LEN, q_addr >> 12); + reg64 = deposit64(reg64, RISCV_IOMMU_CQB_LOG2SZ_START, + RISCV_IOMMU_CQB_LOG2SZ_LEN, k - 1); + riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_CQB, reg64); + + /* cqt = 0, cqcsr.cqen = 1, poll cqcsr.cqon until it reads 1 */ + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_CQT, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR); + reg |= RISCV_IOMMU_CQCSR_CQEN; + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR, reg); + + qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_CQCSR); + + /* + * Program the fault queue. Similar to the above: + * - Write 0xF to fiv, assert that we have 4 writable bits (k = 4) + * - Alloc a 16*32 bytes (instead of 16*16) buffer and use it to set + * fqb + */ + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC, + 0xFFFF & RISCV_IOMMU_REG_IVEC_FIV); + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC); + g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_FIV, ==, 0xF0); + + q_addr = guest_alloc(t_alloc, 16 * 32); + reg64 = 0; + k = 4; + reg64 = deposit64(reg64, RISCV_IOMMU_FQB_PPN_START, + RISCV_IOMMU_FQB_PPN_LEN, q_addr >> 12); + reg64 = deposit64(reg64, RISCV_IOMMU_FQB_LOG2SZ_START, + RISCV_IOMMU_FQB_LOG2SZ_LEN, k - 1); + riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_FQB, reg64); + + /* fqt = 0, fqcsr.fqen = 1, poll fqcsr.fqon until it reads 1 */ + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_FQT, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR); + reg |= RISCV_IOMMU_FQCSR_FQEN; + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR, reg); + + qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_FQCSR); + + /* + * Program the page-request queue: + * - Write 0xF to piv, assert that we have 4 writable bits (k = 4) + * - Alloc a 16*16 bytes buffer and use it to set pqb. + */ + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC, + 0xFFFF & RISCV_IOMMU_REG_IVEC_PIV); + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC); + g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_PIV, ==, 0xF000); + + q_addr = guest_alloc(t_alloc, 16 * 16); + reg64 = 0; + k = 4; + reg64 = deposit64(reg64, RISCV_IOMMU_PQB_PPN_START, + RISCV_IOMMU_PQB_PPN_LEN, q_addr >> 12); + reg64 = deposit64(reg64, RISCV_IOMMU_PQB_LOG2SZ_START, + RISCV_IOMMU_PQB_LOG2SZ_LEN, k - 1); + riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_PQB, reg64); + + /* pqt = 0, pqcsr.pqen = 1, poll pqcsr.pqon until it reads 1 */ + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_PQT, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR); + reg |= RISCV_IOMMU_PQCSR_PQEN; + riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR, reg); + + qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_PQCSR); +} + static void register_riscv_iommu_test(void) { qos_add_test("pci_config", "riscv-iommu-pci", test_pci_config, NULL); qos_add_test("reg_reset", "riscv-iommu-pci", test_reg_reset, NULL); + qos_add_test("iommu_init_queues", "riscv-iommu-pci", + test_iommu_init_queues, NULL); } libqos_init(register_riscv_iommu_test); -- 2.43.2
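A note on the sizing used throughout the test: writing k - 1 = 3 to a LOG2SZ field selects 2^k = 16 queue entries, and the allocation sizes follow the record widths in riscv-iommu-bits.h: commands and page-request records are 16 bytes each (hence 16 * 16 bytes), while fault records are 32 bytes (hence 16 * 32 bytes).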
From: Tomasz Jeznach <tjeznach@rivosinc.com> Extend the edu device to support DMA with a PASID identifier and to report the PASID extended PCIe capability. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> --- hw/misc/edu.c | 57 +++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 44 insertions(+), 13 deletions(-) diff --git a/hw/misc/edu.c b/hw/misc/edu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/edu.c +++ b/hw/misc/edu.c @@ -XXX,XX +XXX,XX @@ #include "qemu/units.h" #include "hw/pci/pci.h" #include "hw/hw.h" +#include "hw/qdev-properties.h" #include "hw/pci/msi.h" #include "qemu/timer.h" #include "qom/object.h" @@ -XXX,XX +XXX,XX @@ struct EduState { QemuCond thr_cond; bool stopping; + bool enable_pasid; + uint32_t addr4; uint32_t fact; #define EDU_STATUS_COMPUTING 0x01 @@ -XXX,XX +XXX,XX @@ struct EduState { # define EDU_DMA_FROM_PCI 0 # define EDU_DMA_TO_PCI 1 #define EDU_DMA_IRQ 0x4 +#define EDU_DMA_PV 0x8 +#define EDU_DMA_PASID(cmd) (((cmd) >> 8) & ((1U << 20) - 1)) + struct dma_state { dma_addr_t src; dma_addr_t dst; @@ -XXX,XX +XXX,XX @@ static void edu_check_range(uint64_t addr, uint64_t size1, uint64_t start, static dma_addr_t edu_clamp_addr(const EduState *edu, dma_addr_t addr) { - dma_addr_t res = addr & edu->dma_mask; - - if (addr != res) { - printf("EDU: clamping DMA %#.16"PRIx64" to %#.16"PRIx64"!\n", addr, res); - } - + dma_addr_t res = addr; return res; } @@ -XXX,XX +XXX,XX @@ static void edu_dma_timer(void *opaque) { EduState *edu = opaque; bool raise_irq = false; + MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED; if (!(edu->dma.cmd & EDU_DMA_RUN)) { return; } + if (edu->enable_pasid && (edu->dma.cmd & EDU_DMA_PV)) { + attrs.unspecified = 0; + attrs.pasid = EDU_DMA_PASID(edu->dma.cmd); + attrs.requester_id = pci_requester_id(&edu->pdev); + attrs.secure = 0; + } + if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) { uint64_t dst = edu->dma.dst; edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE); dst -= DMA_START; - pci_dma_read(&edu->pdev, edu_clamp_addr(edu, edu->dma.src), - edu->dma_buf + dst, edu->dma.cnt); + pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.src), + edu->dma_buf + dst, edu->dma.cnt, + DMA_DIRECTION_TO_DEVICE, attrs); } else { uint64_t src = edu->dma.src; edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE); src -= DMA_START; - pci_dma_write(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst), - edu->dma_buf + src, edu->dma.cnt); + pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst), + edu->dma_buf + src, edu->dma.cnt, + DMA_DIRECTION_FROM_DEVICE, attrs); } edu->dma.cmd &= ~EDU_DMA_RUN; @@ -XXX,XX +XXX,XX @@ static void edu_mmio_write(void *opaque, hwaddr addr, uint64_t val, if (qatomic_read(&edu->status) & EDU_STATUS_COMPUTING) { break; } - /* EDU_STATUS_COMPUTING cannot go 0->1 concurrently, because it is only + /* + * EDU_STATUS_COMPUTING cannot go 0->1 concurrently, because it is only * set in this function and it is under the iothread mutex. 
*/ qemu_mutex_lock(&edu->thr_mutex); @@ -XXX,XX +XXX,XX @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp) { EduState *edu = EDU(pdev); uint8_t *pci_conf = pdev->config; + int pos; pci_config_set_interrupt_pin(pci_conf, 1); + pcie_endpoint_cap_init(pdev, 0); + + /* PCIe extended capability for PASID */ + pos = PCI_CONFIG_SPACE_SIZE; + if (edu->enable_pasid) { + /* PCIe Spec 7.8.9 PASID Extended Capability Structure */ + pcie_add_capability(pdev, 0x1b, 1, pos, 8); + pci_set_long(pdev->config + pos + 4, 0x00001400); + pci_set_long(pdev->wmask + pos + 4, 0xfff0ffff); + } + if (msi_init(pdev, 0, 1, true, false, errp)) { return; } @@ -XXX,XX +XXX,XX @@ static void pci_edu_uninit(PCIDevice *pdev) msi_uninit(pdev); } + static void edu_instance_init(Object *obj) { EduState *edu = EDU(obj); - edu->dma_mask = (1UL << 28) - 1; + edu->dma_mask = ~0ULL; object_property_add_uint64_ptr(obj, "dma_mask", &edu->dma_mask, OBJ_PROP_FLAG_READWRITE); } +static Property edu_properties[] = { + DEFINE_PROP_BOOL("pasid", EduState, enable_pasid, TRUE), + DEFINE_PROP_END_OF_LIST(), +}; + static void edu_class_init(ObjectClass *class, void *data) { DeviceClass *dc = DEVICE_CLASS(class); PCIDeviceClass *k = PCI_DEVICE_CLASS(class); + device_class_set_props(dc, edu_properties); k->realize = pci_edu_realize; k->exit = pci_edu_uninit; k->vendor_id = PCI_VENDOR_ID_QEMU; @@ -XXX,XX +XXX,XX @@ static void edu_class_init(ObjectClass *class, void *data) static void pci_edu_register_types(void) { static InterfaceInfo interfaces[] = { - { INTERFACE_CONVENTIONAL_PCI_DEVICE }, + { INTERFACE_PCIE_DEVICE }, { }, }; static const TypeInfo edu_info = { -- 2.43.2
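With the bits above, a guest driver tags a DMA with a PASID by setting EDU_DMA_PV and placing the PASID in bits 8..27 of the DMA command register. A sketch of the guest side, assuming the BAR0 DMA register offsets from docs/specs/edu.txt (0x80 source, 0x88 destination, 0x90 count, 0x98 command) and a hypothetical mmio_write() helper:

    uint64_t cmd = EDU_DMA_RUN | EDU_DMA_IRQ;   /* direction bit clear: from PCI */
    cmd |= EDU_DMA_PV | ((uint64_t)7 << 8);     /* PASID 7, PV set */

    mmio_write(0x80, src_addr);    /* DMA source                    */
    mmio_write(0x88, DMA_START);   /* DMA destination               */
    mmio_write(0x90, len);         /* transfer count                */
    mmio_write(0x98, cmd);         /* kick off the timer-driven DMA */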
From: Tomasz Jeznach <tjeznach@rivosinc.com> Mimic the ATS interface via an IOMMU translate request with IOMMU_NONE. If a mapping exists, the translation service returns the current permission flags; otherwise it reports no permissions. Implement and register the IOMMU memory region listener to be notified whenever an ATS invalidation request is sent from the IOMMU, and likewise whenever an ATS page request group response is triggered from the IOMMU. Introduce a retry mechanism in the timer design so that a page that is not yet available is only accessed after the PRGR notification has been received. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com> --- hw/misc/edu.c | 258 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 251 insertions(+), 7 deletions(-) diff --git a/hw/misc/edu.c b/hw/misc/edu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/edu.c +++ b/hw/misc/edu.c @@ -XXX,XX +XXX,XX @@ DECLARE_INSTANCE_CHECKER(EduState, EDU, #define DMA_START 0x40000 #define DMA_SIZE 4096 +/* + * Number of tries before giving up on page request group response. + * Given the timer callback is scheduled to be run again after 100ms, + * 10 tries give roughly a second for the PRGR notification to be + * received. + */ +#define NUM_TRIES 10 + struct EduState { PCIDevice pdev; MemoryRegion mmio; @@ -XXX,XX +XXX,XX @@ struct EduState { bool stopping; bool enable_pasid; + uint32_t try; uint32_t addr4; uint32_t fact; @@ -XXX,XX +XXX,XX @@ struct EduState { QEMUTimer dma_timer; char dma_buf[DMA_SIZE]; uint64_t dma_mask; + + MemoryListener iommu_listener; + QLIST_HEAD(, edu_iommu) iommu_list; + + bool prgr_rcvd; + bool prgr_success; +}; + +struct edu_iommu { + EduState *edu; + IOMMUMemoryRegion *iommu_mr; + hwaddr iommu_offset; + IOMMUNotifier n; + QLIST_ENTRY(edu_iommu) iommu_next; }; static bool edu_msi_enabled(EduState *edu) @@ -XXX,XX +XXX,XX @@ static dma_addr_t edu_clamp_addr(const EduState *edu, dma_addr_t addr) return res; } +static bool edu_find_iommu_mr_cb(Int128 start, Int128 len, const MemoryRegion *mr, + hwaddr offset_in_region, void *opaque) +{ + IOMMUMemoryRegion **iommu_mr = opaque; + *iommu_mr = memory_region_get_iommu((MemoryRegion *)mr); + return *iommu_mr != NULL; +} + +static int pci_dma_perm(PCIDevice *pdev, dma_addr_t iova, MemTxAttrs attrs) +{ + IOMMUMemoryRegion *iommu_mr = NULL; + IOMMUMemoryRegionClass *imrc; + int iommu_idx; + FlatView *fv; + EduState *edu = EDU(pdev); + struct edu_iommu *iommu; + + RCU_READ_LOCK_GUARD(); + + fv = address_space_to_flatview(pci_get_address_space(pdev)); + + /* Find first IOMMUMemoryRegion */ + flatview_for_each_range(fv, edu_find_iommu_mr_cb, &iommu_mr); + + if (iommu_mr) { + imrc = memory_region_get_iommu_class_nocheck(iommu_mr); + + /* The IOMMU index maps to memory attributes (PASID, etc) */ + iommu_idx = imrc->attrs_to_index ? 
+ imrc->attrs_to_index(iommu_mr, attrs) : 0; + + /* Update IOMMU notifiers with proper index */ + QLIST_FOREACH(iommu, &edu->iommu_list, iommu_next) { + if (iommu->iommu_mr == iommu_mr && + iommu->n.iommu_idx != iommu_idx) { + memory_region_unregister_iommu_notifier( + MEMORY_REGION(iommu->iommu_mr), &iommu->n); + iommu->n.iommu_idx = iommu_idx; + memory_region_register_iommu_notifier( + MEMORY_REGION(iommu->iommu_mr), &iommu->n, NULL); + } + } + + /* Translate request with IOMMU_NONE is an ATS request */ + IOMMUTLBEntry iotlb = imrc->translate(iommu_mr, iova, IOMMU_NONE, + iommu_idx); + + return iotlb.perm; + } + + return IOMMU_NONE; +} + static void edu_dma_timer(void *opaque) { EduState *edu = opaque; bool raise_irq = false; MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED; + MemTxResult res; if (!(edu->dma.cmd & EDU_DMA_RUN)) { return; @@ -XXX,XX +XXX,XX @@ static void edu_dma_timer(void *opaque) if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) { uint64_t dst = edu->dma.dst; + uint64_t src = edu_clamp_addr(edu, edu->dma.src); edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE); dst -= DMA_START; - pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.src), - edu->dma_buf + dst, edu->dma.cnt, - DMA_DIRECTION_TO_DEVICE, attrs); + if (edu->try-- == NUM_TRIES) { + edu->prgr_rcvd = false; + if (!(pci_dma_perm(&edu->pdev, src, attrs) & IOMMU_RO)) { + timer_mod(&edu->dma_timer, + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100); + return; + } + } else if (edu->try) { + if (!edu->prgr_rcvd) { + timer_mod(&edu->dma_timer, + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100); + return; + } + if (!edu->prgr_success) { + /* PRGR failure, fail DMA. */ + edu->dma.cmd &= ~EDU_DMA_RUN; + return; + } + } else { + /* timeout, fail DMA. */ + edu->dma.cmd &= ~EDU_DMA_RUN; + return; + } + res = pci_dma_rw(&edu->pdev, src, edu->dma_buf + dst, edu->dma.cnt, + DMA_DIRECTION_TO_DEVICE, attrs); + if (res != MEMTX_OK) { + hw_error("EDU: DMA transfer TO 0x%"PRIx64" failed.\n", dst); + } } else { uint64_t src = edu->dma.src; + uint64_t dst = edu_clamp_addr(edu, edu->dma.dst); edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE); src -= DMA_START; - pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst), - edu->dma_buf + src, edu->dma.cnt, - DMA_DIRECTION_FROM_DEVICE, attrs); + if (edu->try-- == NUM_TRIES) { + edu->prgr_rcvd = false; + if (!(pci_dma_perm(&edu->pdev, dst, attrs) & IOMMU_WO)) { + timer_mod(&edu->dma_timer, + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100); + return; + } + } else if (edu->try) { + if (!edu->prgr_rcvd) { + timer_mod(&edu->dma_timer, + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100); + return; + } + if (!edu->prgr_success) { + /* PRGR failure, fail DMA. */ + edu->dma.cmd &= ~EDU_DMA_RUN; + return; + } + } else { + /* timeout, fail DMA. 
*/ + edu->dma.cmd &= ~EDU_DMA_RUN; + return; + } + res = pci_dma_rw(&edu->pdev, dst, edu->dma_buf + src, edu->dma.cnt, + DMA_DIRECTION_FROM_DEVICE, attrs); + if (res != MEMTX_OK) { + hw_error("EDU: DMA transfer FROM 0x%"PRIx64" failed.\n", src); + } } edu->dma.cmd &= ~EDU_DMA_RUN; @@ -XXX,XX +XXX,XX @@ static void dma_rw(EduState *edu, bool write, dma_addr_t *val, dma_addr_t *dma, } if (timer) { + edu->try = NUM_TRIES; timer_mod(&edu->dma_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100); } } @@ -XXX,XX +XXX,XX @@ static void *edu_fact_thread(void *opaque) return NULL; } +static void edu_iommu_ats_prgr_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) +{ + struct edu_iommu *iommu = container_of(n, struct edu_iommu, n); + EduState *edu = iommu->edu; + edu->prgr_success = (iotlb->perm != IOMMU_NONE); + barrier(); + edu->prgr_rcvd = true; +} + +static void edu_iommu_ats_inval_notify(IOMMUNotifier *n, + IOMMUTLBEntry *iotlb) +{ + +} + +static void edu_iommu_region_add(MemoryListener *listener, + MemoryRegionSection *section) +{ + EduState *edu = container_of(listener, EduState, iommu_listener); + struct edu_iommu *iommu; + Int128 end; + int iommu_idx; + IOMMUMemoryRegion *iommu_mr; + + if (!memory_region_is_iommu(section->mr)) { + return; + } + + iommu_mr = IOMMU_MEMORY_REGION(section->mr); + + /* Register ATS.INVAL notifier */ + iommu = g_malloc0(sizeof(*iommu)); + iommu->iommu_mr = iommu_mr; + iommu->iommu_offset = section->offset_within_address_space - + section->offset_within_region; + iommu->edu = edu; + end = int128_add(int128_make64(section->offset_within_region), + section->size); + end = int128_sub(end, int128_one()); + iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr, + MEMTXATTRS_UNSPECIFIED); + iommu_notifier_init(&iommu->n, edu_iommu_ats_inval_notify, + IOMMU_NOTIFIER_DEVIOTLB_UNMAP, + section->offset_within_region, + int128_get64(end), + iommu_idx); + memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL); + QLIST_INSERT_HEAD(&edu->iommu_list, iommu, iommu_next); + + /* Register ATS.PRGR notifier */ + iommu = g_memdup2(iommu, sizeof(*iommu)); + iommu_notifier_init(&iommu->n, edu_iommu_ats_prgr_notify, + IOMMU_NOTIFIER_MAP, + section->offset_within_region, + int128_get64(end), + iommu_idx); + memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL); + QLIST_INSERT_HEAD(&edu->iommu_list, iommu, iommu_next); +} + +static void edu_iommu_region_del(MemoryListener *listener, + MemoryRegionSection *section) +{ + EduState *edu = container_of(listener, EduState, iommu_listener); + struct edu_iommu *iommu; + + if (!memory_region_is_iommu(section->mr)) { + return; + } + + QLIST_FOREACH(iommu, &edu->iommu_list, iommu_next) { + if (MEMORY_REGION(iommu->iommu_mr) == section->mr && + iommu->n.start == section->offset_within_region) { + memory_region_unregister_iommu_notifier(section->mr, + &iommu->n); + QLIST_REMOVE(iommu, iommu_next); + g_free(iommu); + break; + } + } +} + static void pci_edu_realize(PCIDevice *pdev, Error **errp) { EduState *edu = EDU(pdev); + AddressSpace *dma_as = NULL; uint8_t *pci_conf = pdev->config; int pos; @@ -XXX,XX +XXX,XX @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp) pos = PCI_CONFIG_SPACE_SIZE; if (edu->enable_pasid) { /* PCIe Spec 7.8.9 PASID Extended Capability Structure */ - pcie_add_capability(pdev, 0x1b, 1, pos, 8); + pcie_add_capability(pdev, PCI_EXT_CAP_ID_PASID, 1, pos, 8); pci_set_long(pdev->config + pos + 4, 0x00001400); pci_set_long(pdev->wmask + pos + 4, 0xfff0ffff); + pos += 8; + + /* ATS Capability */ + 
pcie_ats_init(pdev, pos, true); + pos += PCI_EXT_CAP_ATS_SIZEOF; + + /* PRI Capability */ + pcie_add_capability(pdev, PCI_EXT_CAP_ID_PRI, 1, pos, 16); + /* PRI STOPPED */ + pci_set_long(pdev->config + pos + 4, 0x01000000); + /* PRI ENABLE bit writable */ + pci_set_long(pdev->wmask + pos + 4, 0x00000001); + /* PRI Capacity Supported */ + pci_set_long(pdev->config + pos + 8, 0x00000080); + /* PRI Allocations Allowed, 64 */ + pci_set_long(pdev->config + pos + 12, 0x00000040); + pci_set_long(pdev->wmask + pos + 12, 0x0000007f); + + pos += 8; } if (msi_init(pdev, 0, 1, true, false, errp)) { @@ -XXX,XX +XXX,XX @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp) memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu, "edu-mmio", 1 * MiB); pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio); + + /* Register IOMMU listener */ + edu->iommu_listener = (MemoryListener) { + .name = "edu-iommu", + .region_add = edu_iommu_region_add, + .region_del = edu_iommu_region_del, + }; + + dma_as = pci_device_iommu_address_space(pdev); + memory_listener_register(&edu->iommu_listener, dma_as); } static void pci_edu_uninit(PCIDevice *pdev) { EduState *edu = EDU(pdev); + memory_listener_unregister(&edu->iommu_listener); + qemu_mutex_lock(&edu->thr_mutex); edu->stopping = true; qemu_mutex_unlock(&edu->thr_mutex); -- 2.43.2
Hi, We hit problems right at the finish line of the pull request due to endianness issues reported in the Gitlab CI [1]. This triggered discussions in the middle of the pull request patches [2] that resulted in this new version. We dealt with the endianness problem that was hitting the Gitlab CI on s390x. It was a mix of test changes required in the libqos support, a solution proposed by Peter, and a real endianness problem in patch 3. I made sure to test this version in an s390x VM and everything seems good. Another change made, after review comments from Peter, was the removal of all locks that were left over from the old threading approach that we abandoned in v3. The BQL has proven to be enough, at least in my testing, to prevent race conditions. Nothing is stopping us from changing our minds later, and even rolling back to a threaded model with spinlocks and so on, but for now we'll stick to what we decided back in v3. The rest are smaller, specific changes also requested by Peter and Tomasz during the discussions in the pull request [1]. Patches based on riscv-to-apply.next. All patches acked. Changes from v7: - patch 3: - fixed Copyright text of added files - all functions starting with '__X' were renamed to 'riscv_iommu_X' - removed qemu/osdep.h from riscv-iommu.h - removed unused '__rfu' attribute from RISCVIOMMUContext - removed core_lock, ctx_lock and regs_lock - converted the 'read page table entry' code from using dma_memory_read() and le_to_cpu() to use ldl_le_dma() (see the sketch after the diffstat below) - replaced the 'if size' chain with 'ldn_le_p()' in riscv_iommu_mmio_read() - added a new riscv_iommu_write_reg_val() helper. Used it to remove the 'if size' chains inside riscv_iommu_mmio_write() - do le64_to_cpu() before using 'val' in the riscv_iommu_mmio_write() ICVEC/IPSR path - use brackets to wrap (icvec & bit) in riscv_iommu_get_icvec_vector() - patch 5: - fixed Copyright text of riscv-iommu-pci.c - patch 7: - use qpci_io_read(l/q) instead of qpci_memread() - patch 8: - remove iot_lock - remove unused '__rfu' attribute from RISCVIOMMUEntry - all functions starting with '__X' were renamed to 'riscv_iommu_X' - patch 11: - use qpci_io_write(l/q) instead of qpci_memwrite() - v7 link: https://lore.kernel.org/qemu-riscv/20240903201633.93182-1-dbarboza@ventanamicro.com/ [1] https://lore.kernel.org/qemu-devel/CAFEAcA_8A=8q8aFSPwy177x6q3RLOS-kOHwyUDhTbpOjNjLexg@mail.gmail.com/ [2] https://lore.kernel.org/qemu-devel/CAFEAcA8rdFYACFKdJga72WA4ET9NFRwrOifdbTYDBxY6G6uOXA@mail.gmail.com/ Daniel Henrique Barboza (4): pci-ids.rst: add Red Hat pci-id for RISC-V IOMMU device test/qtest: add riscv-iommu-pci tests qtest/riscv-iommu-test: add init queues test docs/specs: add riscv-iommu Tomasz Jeznach (8): exec/memtxattr: add process identifier to the transaction attributes hw/riscv: add riscv-iommu-bits.h hw/riscv: add RISC-V IOMMU base emulation hw/riscv: add riscv-iommu-pci reference device hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug hw/riscv/riscv-iommu: add Address Translation Cache (IOATC) hw/riscv/riscv-iommu: add ATS support hw/riscv/riscv-iommu: add DBG support docs/specs/index.rst | 1 + docs/specs/pci-ids.rst | 2 + docs/specs/riscv-iommu.rst | 90 ++ docs/system/riscv/virt.rst | 13 + hw/riscv/Kconfig | 4 + hw/riscv/meson.build | 1 + hw/riscv/riscv-iommu-bits.h | 421 ++++++ hw/riscv/riscv-iommu-pci.c | 202 +++ hw/riscv/riscv-iommu.c | 2396 ++++++++++++++++++++++++++++++ hw/riscv/riscv-iommu.h | 130 ++ hw/riscv/trace-events | 17 + hw/riscv/trace.h | 1 + hw/riscv/virt.c | 33 +- include/exec/memattrs.h | 5 +
include/hw/pci/pci.h | 1 + include/hw/riscv/iommu.h | 36 + meson.build | 1 + tests/qtest/libqos/meson.build | 4 + tests/qtest/libqos/riscv-iommu.c | 76 + tests/qtest/libqos/riscv-iommu.h | 101 ++ tests/qtest/meson.build | 1 + tests/qtest/riscv-iommu-test.c | 210 +++ 22 files changed, 3745 insertions(+), 1 deletion(-) create mode 100644 docs/specs/riscv-iommu.rst create mode 100644 hw/riscv/riscv-iommu-bits.h create mode 100644 hw/riscv/riscv-iommu-pci.c create mode 100644 hw/riscv/riscv-iommu.c create mode 100644 hw/riscv/riscv-iommu.h create mode 100644 hw/riscv/trace-events create mode 100644 hw/riscv/trace.h create mode 100644 include/hw/riscv/iommu.h create mode 100644 tests/qtest/libqos/riscv-iommu.c create mode 100644 tests/qtest/libqos/riscv-iommu.h create mode 100644 tests/qtest/riscv-iommu-test.c -- 2.45.2
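To illustrate the patch 3 change mentioned in the changelog above: the dma_ldst helpers fold the endian conversion into the read. A sketch under the assumption of the current ldq_le_dma() signature (the 4-byte PTE path uses ldl_le_dma() the same way):

    uint64_t pte;

    /* before: explicit read plus byte swap */
    if (dma_memory_read(s->target_as, pte_addr, &pte, 8,
                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
        return RISCV_IOMMU_FQ_CAUSE_RD_FAULT;
    }
    pte = le64_to_cpu(pte);

    /* after: one helper does both */
    if (ldq_le_dma(s->target_as, pte_addr, &pte,
                   MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
        return RISCV_IOMMU_FQ_CAUSE_RD_FAULT;
    }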
From: Tomasz Jeznach <tjeznach@rivosinc.com> Extend memory transaction attributes with process identifier to allow per-request address translation logic to use requester_id / process_id to identify memory mapping (e.g. enabling IOMMU w/ PASID translations). Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Jason Chien <jason.chien@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> --- include/exec/memattrs.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h index XXXXXXX..XXXXXXX 100644 --- a/include/exec/memattrs.h +++ b/include/exec/memattrs.h @@ -XXX,XX +XXX,XX @@ typedef struct MemTxAttrs { unsigned int memory:1; /* Requester ID (for MSI for example) */ unsigned int requester_id:16; + + /* + * PID (PCI PASID) support: Limited to 8 bits process identifier. + */ + unsigned int pid:8; } MemTxAttrs; /* Bus masters which don't specify any attributes will get this, -- 2.45.2
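A sketch of how a PCI device model would tag its DMA with the new attribute (pdev, iova, buf and len are placeholders; pci_dma_rw() takes a MemTxAttrs argument in current QEMU):

    MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;

    attrs.unspecified = 0;
    attrs.requester_id = pci_requester_id(pdev);  /* the device's BDF        */
    attrs.pid = 5;                                /* PCI PASID, 8 bits here  */

    pci_dma_rw(pdev, iova, buf, len, DMA_DIRECTION_TO_DEVICE, attrs);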
From: Tomasz Jeznach <tjeznach@rivosinc.com> This header will be used by the RISC-V IOMMU emulation to be added in the next patch. Due to its size it's being sent separately for easier review. One thing to notice is that this header can be replaced by the future Linux RISC-V IOMMU driver header, which would become a Linux header we would import instead of keeping our own. The Linux implementation isn't upstream yet so for now we'll have to manage riscv-iommu-bits.h. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Jason Chien <jason.chien@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> --- hw/riscv/riscv-iommu-bits.h | 345 ++++++++++++++++++++++++++++++++++++ 1 file changed, 345 insertions(+) create mode 100644 hw/riscv/riscv-iommu-bits.h diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu-bits.h @@ -XXX,XX +XXX,XX @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright © 2022-2023 Rivos Inc. + * Copyright © 2023 FORTH-ICS/CARV + * Copyright © 2023 RISC-V IOMMU Task Group + * + * RISC-V IOMMU - Register Layout and Data Structures. + * + * Based on the IOMMU spec version 1.0, 3/2023 + * https://github.com/riscv-non-isa/riscv-iommu + */ + +#ifndef HW_RISCV_IOMMU_BITS_H +#define HW_RISCV_IOMMU_BITS_H + +#define RISCV_IOMMU_SPEC_DOT_VER 0x010 + +#ifndef GENMASK_ULL +#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l)) +#endif + +/* + * struct riscv_iommu_fq_record - Fault/Event Queue Record + * See section 3.2 for more info. + */ +struct riscv_iommu_fq_record { + uint64_t hdr; + uint64_t _reserved; + uint64_t iotval; + uint64_t iotval2; +}; +/* Header fields */ +#define RISCV_IOMMU_FQ_HDR_CAUSE GENMASK_ULL(11, 0) +#define RISCV_IOMMU_FQ_HDR_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_FQ_HDR_PV BIT_ULL(32) +#define RISCV_IOMMU_FQ_HDR_TTYPE GENMASK_ULL(39, 34) +#define RISCV_IOMMU_FQ_HDR_DID GENMASK_ULL(63, 40) + +/* + * struct riscv_iommu_pq_record - PCIe Page Request record + * For more info on the PCIe Page Request queue see chapter 3.3. 
+ */ +struct riscv_iommu_pq_record { + uint64_t hdr; + uint64_t payload; +}; +/* Header fields */ +#define RISCV_IOMMU_PREQ_HDR_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_PREQ_HDR_PV BIT_ULL(32) +#define RISCV_IOMMU_PREQ_HDR_PRIV BIT_ULL(33) +#define RISCV_IOMMU_PREQ_HDR_EXEC BIT_ULL(34) +#define RISCV_IOMMU_PREQ_HDR_DID GENMASK_ULL(63, 40) +/* Payload fields */ +#define RISCV_IOMMU_PREQ_PAYLOAD_M GENMASK_ULL(2, 0) + +/* Common field positions */ +#define RISCV_IOMMU_PPN_FIELD GENMASK_ULL(53, 10) +#define RISCV_IOMMU_QUEUE_LOGSZ_FIELD GENMASK_ULL(4, 0) +#define RISCV_IOMMU_QUEUE_INDEX_FIELD GENMASK_ULL(31, 0) +#define RISCV_IOMMU_QUEUE_ENABLE BIT(0) +#define RISCV_IOMMU_QUEUE_INTR_ENABLE BIT(1) +#define RISCV_IOMMU_QUEUE_MEM_FAULT BIT(8) +#define RISCV_IOMMU_QUEUE_OVERFLOW BIT(9) +#define RISCV_IOMMU_QUEUE_ACTIVE BIT(16) +#define RISCV_IOMMU_QUEUE_BUSY BIT(17) +#define RISCV_IOMMU_ATP_PPN_FIELD GENMASK_ULL(43, 0) +#define RISCV_IOMMU_ATP_MODE_FIELD GENMASK_ULL(63, 60) + +/* 5.3 IOMMU Capabilities (64bits) */ +#define RISCV_IOMMU_REG_CAP 0x0000 +#define RISCV_IOMMU_CAP_VERSION GENMASK_ULL(7, 0) +#define RISCV_IOMMU_CAP_MSI_FLAT BIT_ULL(22) +#define RISCV_IOMMU_CAP_MSI_MRIF BIT_ULL(23) +#define RISCV_IOMMU_CAP_T2GPA BIT_ULL(26) +#define RISCV_IOMMU_CAP_IGS GENMASK_ULL(29, 28) +#define RISCV_IOMMU_CAP_PAS GENMASK_ULL(37, 32) +#define RISCV_IOMMU_CAP_PD8 BIT_ULL(38) +#define RISCV_IOMMU_CAP_PD17 BIT_ULL(39) +#define RISCV_IOMMU_CAP_PD20 BIT_ULL(40) + +/* 5.4 Features control register (32bits) */ +#define RISCV_IOMMU_REG_FCTL 0x0008 +#define RISCV_IOMMU_FCTL_WSI BIT(1) + +/* 5.5 Device-directory-table pointer (64bits) */ +#define RISCV_IOMMU_REG_DDTP 0x0010 +#define RISCV_IOMMU_DDTP_MODE GENMASK_ULL(3, 0) +#define RISCV_IOMMU_DDTP_BUSY BIT_ULL(4) +#define RISCV_IOMMU_DDTP_PPN RISCV_IOMMU_PPN_FIELD + +enum riscv_iommu_ddtp_modes { + RISCV_IOMMU_DDTP_MODE_OFF = 0, + RISCV_IOMMU_DDTP_MODE_BARE = 1, + RISCV_IOMMU_DDTP_MODE_1LVL = 2, + RISCV_IOMMU_DDTP_MODE_2LVL = 3, + RISCV_IOMMU_DDTP_MODE_3LVL = 4, + RISCV_IOMMU_DDTP_MODE_MAX = 4 +}; + +/* 5.6 Command Queue Base (64bits) */ +#define RISCV_IOMMU_REG_CQB 0x0018 +#define RISCV_IOMMU_CQB_LOG2SZ RISCV_IOMMU_QUEUE_LOGSZ_FIELD +#define RISCV_IOMMU_CQB_PPN RISCV_IOMMU_PPN_FIELD + +/* 5.7 Command Queue head (32bits) */ +#define RISCV_IOMMU_REG_CQH 0x0020 + +/* 5.8 Command Queue tail (32bits) */ +#define RISCV_IOMMU_REG_CQT 0x0024 + +/* 5.9 Fault Queue Base (64bits) */ +#define RISCV_IOMMU_REG_FQB 0x0028 +#define RISCV_IOMMU_FQB_LOG2SZ RISCV_IOMMU_QUEUE_LOGSZ_FIELD +#define RISCV_IOMMU_FQB_PPN RISCV_IOMMU_PPN_FIELD + +/* 5.10 Fault Queue Head (32bits) */ +#define RISCV_IOMMU_REG_FQH 0x0030 + +/* 5.11 Fault Queue tail (32bits) */ +#define RISCV_IOMMU_REG_FQT 0x0034 + +/* 5.12 Page Request Queue base (64bits) */ +#define RISCV_IOMMU_REG_PQB 0x0038 +#define RISCV_IOMMU_PQB_LOG2SZ RISCV_IOMMU_QUEUE_LOGSZ_FIELD +#define RISCV_IOMMU_PQB_PPN RISCV_IOMMU_PPN_FIELD + +/* 5.13 Page Request Queue head (32bits) */ +#define RISCV_IOMMU_REG_PQH 0x0040 + +/* 5.14 Page Request Queue tail (32bits) */ +#define RISCV_IOMMU_REG_PQT 0x0044 + +/* 5.15 Command Queue CSR (32bits) */ +#define RISCV_IOMMU_REG_CQCSR 0x0048 +#define RISCV_IOMMU_CQCSR_CQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_CQCSR_CIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_CQCSR_CQMF RISCV_IOMMU_QUEUE_MEM_FAULT +#define RISCV_IOMMU_CQCSR_CMD_TO BIT(9) +#define RISCV_IOMMU_CQCSR_CMD_ILL BIT(10) +#define RISCV_IOMMU_CQCSR_FENCE_W_IP BIT(11) +#define RISCV_IOMMU_CQCSR_CQON RISCV_IOMMU_QUEUE_ACTIVE 
+#define RISCV_IOMMU_CQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +/* 5.16 Fault Queue CSR (32bits) */ +#define RISCV_IOMMU_REG_FQCSR 0x004C +#define RISCV_IOMMU_FQCSR_FQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_FQCSR_FIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_FQCSR_FQMF RISCV_IOMMU_QUEUE_MEM_FAULT +#define RISCV_IOMMU_FQCSR_FQOF RISCV_IOMMU_QUEUE_OVERFLOW +#define RISCV_IOMMU_FQCSR_FQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_FQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +/* 5.17 Page Request Queue CSR (32bits) */ +#define RISCV_IOMMU_REG_PQCSR 0x0050 +#define RISCV_IOMMU_PQCSR_PQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_PQCSR_PIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_PQCSR_PQMF RISCV_IOMMU_QUEUE_MEM_FAULT +#define RISCV_IOMMU_PQCSR_PQOF RISCV_IOMMU_QUEUE_OVERFLOW +#define RISCV_IOMMU_PQCSR_PQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_PQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +/* 5.18 Interrupt Pending Status (32bits) */ +#define RISCV_IOMMU_REG_IPSR 0x0054 +#define RISCV_IOMMU_IPSR_CIP BIT(0) +#define RISCV_IOMMU_IPSR_FIP BIT(1) +#define RISCV_IOMMU_IPSR_PIP BIT(3) + +enum { + RISCV_IOMMU_INTR_CQ, + RISCV_IOMMU_INTR_FQ, + RISCV_IOMMU_INTR_PM, + RISCV_IOMMU_INTR_PQ, + RISCV_IOMMU_INTR_COUNT +}; + +/* 5.27 Interrupt cause to vector (64bits) */ +#define RISCV_IOMMU_REG_ICVEC 0x02F8 + +/* 5.28 MSI Configuration table (32 * 64bits) */ +#define RISCV_IOMMU_REG_MSI_CONFIG 0x0300 + +#define RISCV_IOMMU_REG_SIZE 0x1000 + +#define RISCV_IOMMU_DDTE_VALID BIT_ULL(0) +#define RISCV_IOMMU_DDTE_PPN RISCV_IOMMU_PPN_FIELD + +/* Struct riscv_iommu_dc - Device Context - section 2.1 */ +struct riscv_iommu_dc { + uint64_t tc; + uint64_t iohgatp; + uint64_t ta; + uint64_t fsc; + uint64_t msiptp; + uint64_t msi_addr_mask; + uint64_t msi_addr_pattern; + uint64_t _reserved; +}; + +/* Translation control fields */ +#define RISCV_IOMMU_DC_TC_V BIT_ULL(0) +#define RISCV_IOMMU_DC_TC_EN_PRI BIT_ULL(2) +#define RISCV_IOMMU_DC_TC_T2GPA BIT_ULL(3) +#define RISCV_IOMMU_DC_TC_DTF BIT_ULL(4) +#define RISCV_IOMMU_DC_TC_PDTV BIT_ULL(5) +#define RISCV_IOMMU_DC_TC_PRPR BIT_ULL(6) +#define RISCV_IOMMU_DC_TC_DPE BIT_ULL(9) +#define RISCV_IOMMU_DC_TC_SBE BIT_ULL(10) +#define RISCV_IOMMU_DC_TC_SXL BIT_ULL(11) + +/* Second-stage (aka G-stage) context fields */ +#define RISCV_IOMMU_DC_IOHGATP_PPN RISCV_IOMMU_ATP_PPN_FIELD +#define RISCV_IOMMU_DC_IOHGATP_GSCID GENMASK_ULL(59, 44) +#define RISCV_IOMMU_DC_IOHGATP_MODE RISCV_IOMMU_ATP_MODE_FIELD + +enum riscv_iommu_dc_iohgatp_modes { + RISCV_IOMMU_DC_IOHGATP_MODE_BARE = 0, + RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4 = 8, + RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4 = 8, + RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4 = 9, + RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4 = 10 +}; + +/* Translation attributes fields */ +#define RISCV_IOMMU_DC_TA_PSCID GENMASK_ULL(31, 12) + +/* First-stage context fields */ +#define RISCV_IOMMU_DC_FSC_PPN RISCV_IOMMU_ATP_PPN_FIELD +#define RISCV_IOMMU_DC_FSC_MODE RISCV_IOMMU_ATP_MODE_FIELD + +/* Generic I/O MMU command structure - check section 3.1 */ +struct riscv_iommu_command { + uint64_t dword0; + uint64_t dword1; +}; + +#define RISCV_IOMMU_CMD_OPCODE GENMASK_ULL(6, 0) +#define RISCV_IOMMU_CMD_FUNC GENMASK_ULL(9, 7) + +#define RISCV_IOMMU_CMD_IOTINVAL_OPCODE 1 +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA 0 +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA 1 +#define RISCV_IOMMU_CMD_IOTINVAL_AV BIT_ULL(10) +#define RISCV_IOMMU_CMD_IOTINVAL_PSCID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_CMD_IOTINVAL_PSCV BIT_ULL(32) +#define RISCV_IOMMU_CMD_IOTINVAL_GV BIT_ULL(33) +#define 
RISCV_IOMMU_CMD_IOTINVAL_GSCID GENMASK_ULL(59, 44) + +#define RISCV_IOMMU_CMD_IOFENCE_OPCODE 2 +#define RISCV_IOMMU_CMD_IOFENCE_FUNC_C 0 +#define RISCV_IOMMU_CMD_IOFENCE_AV BIT_ULL(10) +#define RISCV_IOMMU_CMD_IOFENCE_DATA GENMASK_ULL(63, 32) + +#define RISCV_IOMMU_CMD_IODIR_OPCODE 3 +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT 0 +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT 1 +#define RISCV_IOMMU_CMD_IODIR_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_CMD_IODIR_DV BIT_ULL(33) +#define RISCV_IOMMU_CMD_IODIR_DID GENMASK_ULL(63, 40) + +enum riscv_iommu_dc_fsc_atp_modes { + RISCV_IOMMU_DC_FSC_MODE_BARE = 0, + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8, + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 = 8, + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48 = 9, + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 = 10, + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8 = 1, + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17 = 2, + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20 = 3 +}; + +enum riscv_iommu_fq_causes { + RISCV_IOMMU_FQ_CAUSE_INST_FAULT = 1, + RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED = 4, + RISCV_IOMMU_FQ_CAUSE_RD_FAULT = 5, + RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED = 6, + RISCV_IOMMU_FQ_CAUSE_WR_FAULT = 7, + RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S = 12, + RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S = 13, + RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S = 15, + RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS = 20, + RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS = 21, + RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS = 23, + RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED = 256, + RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT = 257, + RISCV_IOMMU_FQ_CAUSE_DDT_INVALID = 258, + RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED = 259, + RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED = 260, + RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT = 261, + RISCV_IOMMU_FQ_CAUSE_MSI_INVALID = 262, + RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED = 263, + RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT = 264, + RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT = 265, + RISCV_IOMMU_FQ_CAUSE_PDT_INVALID = 266, + RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED = 267, + RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED = 268, + RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED = 269, + RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED = 270, + RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUPTED = 271, + RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR = 272, + RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT = 273, + RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED = 274 +}; + +/* MSI page table pointer */ +#define RISCV_IOMMU_DC_MSIPTP_PPN RISCV_IOMMU_ATP_PPN_FIELD +#define RISCV_IOMMU_DC_MSIPTP_MODE RISCV_IOMMU_ATP_MODE_FIELD +#define RISCV_IOMMU_DC_MSIPTP_MODE_OFF 0 +#define RISCV_IOMMU_DC_MSIPTP_MODE_FLAT 1 + +/* Translation attributes fields */ +#define RISCV_IOMMU_PC_TA_V BIT_ULL(0) + +/* First stage context fields */ +#define RISCV_IOMMU_PC_FSC_PPN GENMASK_ULL(43, 0) + +enum riscv_iommu_fq_ttypes { + RISCV_IOMMU_FQ_TTYPE_NONE = 0, + RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH = 1, + RISCV_IOMMU_FQ_TTYPE_UADDR_RD = 2, + RISCV_IOMMU_FQ_TTYPE_UADDR_WR = 3, + RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5, + RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6, + RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7, + RISCV_IOMMU_FQ_TTYPE_PCIE_MSG_REQ = 8, +}; + +/* Fields on pte */ +#define RISCV_IOMMU_MSI_PTE_V BIT_ULL(0) +#define RISCV_IOMMU_MSI_PTE_M GENMASK_ULL(2, 1) + +#define RISCV_IOMMU_MSI_PTE_M_MRIF 1 +#define RISCV_IOMMU_MSI_PTE_M_BASIC 3 + +/* When M == 1 (MRIF mode) */ +#define RISCV_IOMMU_MSI_PTE_MRIF_ADDR GENMASK_ULL(53, 7) +/* When M == 3 (basic mode) */ +#define RISCV_IOMMU_MSI_PTE_PPN RISCV_IOMMU_PPN_FIELD +#define RISCV_IOMMU_MSI_PTE_C BIT_ULL(63) + +/* Fields on mrif_info */ +#define RISCV_IOMMU_MSI_MRIF_NID GENMASK_ULL(9, 0) +#define RISCV_IOMMU_MSI_MRIF_NPPN RISCV_IOMMU_PPN_FIELD
+#define RISCV_IOMMU_MSI_MRIF_NID_MSB BIT_ULL(60) + +#endif /* _RISCV_IOMMU_BITS_H_ */ -- 2.45.2
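To illustrate how the mask-style definitions above are meant to be consumed, here is a minimal, self-contained sketch (editorial, not part of the series). It decodes a command's opcode/func pair and locates an MRIF pending bit the same way riscv_iommu_msi_write() does in the next patch; the GENMASK_ULL()/get_field() helpers below are local stand-ins for the QEMU ones (qemu/bitops.h, cpu_bits.h), and the sample PTE value, interrupt identity and main() harness are made up for the demonstration:

#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the QEMU helpers used by the emulation. */
#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h))) & (~0ULL << (l)))

static uint64_t get_field(uint64_t reg, uint64_t mask)
{
    /* mask & ~(mask << 1) isolates the lowest set bit of the mask */
    return (reg & mask) / (mask & ~(mask << 1));
}

#define CMD_OPCODE        GENMASK_ULL(6, 0)  /* RISCV_IOMMU_CMD_OPCODE */
#define CMD_FUNC          GENMASK_ULL(9, 7)  /* RISCV_IOMMU_CMD_FUNC */
#define MSI_PTE_MRIF_ADDR GENMASK_ULL(53, 7) /* RISCV_IOMMU_MSI_PTE_MRIF_ADDR */

int main(void)
{
    /* dword0 of an IOTINVAL.GVMA command: opcode 1, func 1 */
    uint64_t dword0 = 1 | (1ULL << 7);
    printf("opcode=%llu func=%llu\n",
           (unsigned long long)get_field(dword0, CMD_OPCODE),
           (unsigned long long)get_field(dword0, CMD_FUNC));

    /*
     * MRIF pending-bit location for an interrupt identity 'id':
     * MRIF_ADDR keeps bits 55:9 of the 512-byte aligned MRIF base,
     * (id & 0x7c0) >> 3 is the byte offset of the 64-bit doubleword
     * holding the pending bit, and id & 0x3f is the bit within it.
     */
    uint64_t msipte = 0x1234ULL << 7; /* hypothetical PTE with M == MRIF */
    uint32_t id = 77;                 /* interrupt identity, <= 2047 */
    uint64_t dw_addr = (get_field(msipte, MSI_PTE_MRIF_ADDR) << 9) |
                       ((id & 0x7c0) >> 3);
    uint64_t dw_mask = 1ULL << (id & 0x3f);
    printf("pending doubleword @ 0x%llx, mask 0x%llx\n",
           (unsigned long long)dw_addr, (unsigned long long)dw_mask);
    return 0;
}

Built with any C compiler, this should print opcode=1 func=1 and a pending doubleword at 0x246808 with mask 0x2000, matching the arithmetic the emulation applies when it updates a memory-resident interrupt file.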
From: Tomasz Jeznach <tjeznach@rivosinc.com> The RISC-V IOMMU specification is now ratified, as per the RISC-V international process. The latest ratified specification can be found at: https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf Add the foundation of the device emulation for the RISC-V IOMMU. It includes support for s-stage (sv32, sv39, sv48, sv57 caps) and g-stage (sv32x4, sv39x4, sv48x4, sv57x4 caps). Other capabilities like ATS and DBG support will be added incrementally in the next patches. Co-developed-by: Sebastien Boeuf <seb@rivosinc.com> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> --- hw/riscv/Kconfig | 4 + hw/riscv/meson.build | 1 + hw/riscv/riscv-iommu-bits.h | 18 + hw/riscv/riscv-iommu.c | 2018 +++++++++++++++++++++++++++++++++++ hw/riscv/riscv-iommu.h | 126 +++ hw/riscv/trace-events | 14 + hw/riscv/trace.h | 1 + include/hw/riscv/iommu.h | 36 + meson.build | 1 + 9 files changed, 2219 insertions(+) create mode 100644 hw/riscv/riscv-iommu.c create mode 100644 hw/riscv/riscv-iommu.h create mode 100644 hw/riscv/trace-events create mode 100644 hw/riscv/trace.h create mode 100644 include/hw/riscv/iommu.h diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/Kconfig +++ b/hw/riscv/Kconfig @@ -XXX,XX +XXX,XX @@ +config RISCV_IOMMU + bool + config RISCV_NUMA bool @@ -XXX,XX +XXX,XX @@ config RISCV_VIRT select SERIAL select RISCV_ACLINT select RISCV_APLIC + select RISCV_IOMMU select RISCV_IMSIC select SIFIVE_PLIC select SIFIVE_TEST diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/meson.build +++ b/hw/riscv/meson.build @@ -XXX,XX +XXX,XX @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c')) riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c')) riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c')) riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c')) +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c')) hw_arch += {'riscv': riscv_ss} diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu-bits.h +++ b/hw/riscv/riscv-iommu-bits.h @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_pq_record { /* 5.3 IOMMU Capabilities (64bits) */ #define RISCV_IOMMU_REG_CAP 0x0000 #define RISCV_IOMMU_CAP_VERSION GENMASK_ULL(7, 0) +#define RISCV_IOMMU_CAP_SV32 BIT_ULL(8) +#define RISCV_IOMMU_CAP_SV39 BIT_ULL(9) +#define RISCV_IOMMU_CAP_SV48 BIT_ULL(10) +#define RISCV_IOMMU_CAP_SV57 BIT_ULL(11) +#define RISCV_IOMMU_CAP_SV32X4 BIT_ULL(16) +#define RISCV_IOMMU_CAP_SV39X4 BIT_ULL(17) +#define RISCV_IOMMU_CAP_SV48X4 BIT_ULL(18) +#define RISCV_IOMMU_CAP_SV57X4 BIT_ULL(19) #define RISCV_IOMMU_CAP_MSI_FLAT BIT_ULL(22) #define RISCV_IOMMU_CAP_MSI_MRIF BIT_ULL(23) #define RISCV_IOMMU_CAP_T2GPA BIT_ULL(26) @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_pq_record { /* 5.4 Features control register (32bits) */ #define RISCV_IOMMU_REG_FCTL 0x0008 +#define RISCV_IOMMU_FCTL_BE BIT(0) #define RISCV_IOMMU_FCTL_WSI BIT(1) +#define RISCV_IOMMU_FCTL_GXL BIT(2) /* 5.5 Device-directory-table pointer (64bits) */ #define RISCV_IOMMU_REG_DDTP 0x0010 @@ -XXX,XX +XXX,XX @@ enum { /* 5.27 Interrupt cause to vector (64bits) */ #define RISCV_IOMMU_REG_ICVEC 0x02F8 +#define RISCV_IOMMU_ICVEC_CIV
GENMASK_ULL(3, 0) +#define RISCV_IOMMU_ICVEC_FIV GENMASK_ULL(7, 4) +#define RISCV_IOMMU_ICVEC_PMIV GENMASK_ULL(11, 8) +#define RISCV_IOMMU_ICVEC_PIV GENMASK_ULL(15, 12) /* 5.28 MSI Configuration table (32 * 64bits) */ #define RISCV_IOMMU_REG_MSI_CONFIG 0x0300 @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_dc { #define RISCV_IOMMU_DC_TC_DTF BIT_ULL(4) #define RISCV_IOMMU_DC_TC_PDTV BIT_ULL(5) #define RISCV_IOMMU_DC_TC_PRPR BIT_ULL(6) +#define RISCV_IOMMU_DC_TC_GADE BIT_ULL(7) +#define RISCV_IOMMU_DC_TC_SADE BIT_ULL(8) #define RISCV_IOMMU_DC_TC_DPE BIT_ULL(9) #define RISCV_IOMMU_DC_TC_SBE BIT_ULL(10) #define RISCV_IOMMU_DC_TC_SXL BIT_ULL(11) @@ -XXX,XX +XXX,XX @@ enum riscv_iommu_fq_causes { /* Translation attributes fields */ #define RISCV_IOMMU_PC_TA_V BIT_ULL(0) +#define RISCV_IOMMU_PC_TA_RESERVED GENMASK_ULL(63, 32) /* First stage context fields */ #define RISCV_IOMMU_PC_FSC_PPN GENMASK_ULL(43, 0) +#define RISCV_IOMMU_PC_FSC_RESERVED GENMASK_ULL(59, 44) enum riscv_iommu_fq_ttypes { RISCV_IOMMU_FQ_TTYPE_NONE = 0, diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of a RISC-V IOMMU + * + * Copyright (C) 2021-2023, Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2 or later, as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#include "qemu/osdep.h" +#include "qom/object.h" +#include "hw/pci/pci_bus.h" +#include "hw/pci/pci_device.h" +#include "hw/qdev-properties.h" +#include "hw/riscv/riscv_hart.h" +#include "migration/vmstate.h" +#include "qapi/error.h" +#include "qemu/timer.h" + +#include "cpu_bits.h" +#include "riscv-iommu.h" +#include "riscv-iommu-bits.h" +#include "trace.h" + +#define LIMIT_CACHE_CTX (1U << 7) +#define LIMIT_CACHE_IOT (1U << 20) + +/* Physical page number conversions */ +#define PPN_PHYS(ppn) ((ppn) << TARGET_PAGE_BITS) +#define PPN_DOWN(phy) ((phy) >> TARGET_PAGE_BITS) + +typedef struct RISCVIOMMUContext RISCVIOMMUContext; +typedef struct RISCVIOMMUEntry RISCVIOMMUEntry; + +/* Device assigned I/O address space */ +struct RISCVIOMMUSpace { + IOMMUMemoryRegion iova_mr; /* IOVA memory region for attached device */ + AddressSpace iova_as; /* IOVA address space for attached device */ + RISCVIOMMUState *iommu; /* Managing IOMMU device state */ + uint32_t devid; /* Requester identifier, AKA device_id */ + bool notifier; /* IOMMU unmap notifier enabled */ + QLIST_ENTRY(RISCVIOMMUSpace) list; +}; + +/* Device translation context state. */ +struct RISCVIOMMUContext { + uint64_t devid:24; /* Requester Id, AKA device_id */ + uint64_t process_id:20; /* Process ID.
PASID for PCIe */ + uint64_t tc; /* Translation Control */ + uint64_t ta; /* Translation Attributes */ + uint64_t satp; /* S-Stage address translation and protection */ + uint64_t gatp; /* G-Stage address translation and protection */ + uint64_t msi_addr_mask; /* MSI filtering - address mask */ + uint64_t msi_addr_pattern; /* MSI filtering - address pattern */ + uint64_t msiptp; /* MSI redirection page table pointer */ +}; + +/* IOMMU index for transactions without process_id specified. */ +#define RISCV_IOMMU_NOPROCID 0 + +static uint8_t riscv_iommu_get_icvec_vector(uint32_t icvec, uint32_t vec_type) +{ + switch (vec_type) { + case RISCV_IOMMU_INTR_CQ: + return icvec & RISCV_IOMMU_ICVEC_CIV; + case RISCV_IOMMU_INTR_FQ: + return (icvec & RISCV_IOMMU_ICVEC_FIV) >> 4; + case RISCV_IOMMU_INTR_PM: + return (icvec & RISCV_IOMMU_ICVEC_PMIV) >> 8; + case RISCV_IOMMU_INTR_PQ: + return (icvec & RISCV_IOMMU_ICVEC_PIV) >> 12; + default: + g_assert_not_reached(); + } +} + +static void riscv_iommu_notify(RISCVIOMMUState *s, int vec_type) +{ + const uint32_t fctl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FCTL); + uint32_t ipsr, icvec, vector; + + if (fctl & RISCV_IOMMU_FCTL_WSI || !s->notify) { + return; + } + + icvec = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_ICVEC); + ipsr = riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, (1 << vec_type), 0); + + if (!(ipsr & (1 << vec_type))) { + vector = riscv_iommu_get_icvec_vector(icvec, vec_type); + s->notify(s, vector); + trace_riscv_iommu_notify_int_vector(vec_type, vector); + } +} + +static void riscv_iommu_fault(RISCVIOMMUState *s, + struct riscv_iommu_fq_record *ev) +{ + uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR); + uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQH) & s->fq_mask; + uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQT) & s->fq_mask; + uint32_t next = (tail + 1) & s->fq_mask; + uint32_t devid = get_field(ev->hdr, RISCV_IOMMU_FQ_HDR_DID); + + trace_riscv_iommu_flt(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid), + PCI_FUNC(devid), ev->hdr, ev->iotval); + + if (!(ctrl & RISCV_IOMMU_FQCSR_FQON) || + !!(ctrl & (RISCV_IOMMU_FQCSR_FQOF | RISCV_IOMMU_FQCSR_FQMF))) { + return; + } + + if (head == next) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, + RISCV_IOMMU_FQCSR_FQOF, 0); + } else { + dma_addr_t addr = s->fq_addr + tail * sizeof(*ev); + if (dma_memory_write(s->target_as, addr, ev, sizeof(*ev), + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, + RISCV_IOMMU_FQCSR_FQMF, 0); + } else { + riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_FQT, next); + } + } + + if (ctrl & RISCV_IOMMU_FQCSR_FIE) { + riscv_iommu_notify(s, RISCV_IOMMU_INTR_FQ); + } +} + +static void riscv_iommu_pri(RISCVIOMMUState *s, + struct riscv_iommu_pq_record *pr) +{ + uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR); + uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQH) & s->pq_mask; + uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQT) & s->pq_mask; + uint32_t next = (tail + 1) & s->pq_mask; + uint32_t devid = get_field(pr->hdr, RISCV_IOMMU_PREQ_HDR_DID); + + trace_riscv_iommu_pri(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid), + PCI_FUNC(devid), pr->payload); + + if (!(ctrl & RISCV_IOMMU_PQCSR_PQON) || + !!(ctrl & (RISCV_IOMMU_PQCSR_PQOF | RISCV_IOMMU_PQCSR_PQMF))) { + return; + } + + if (head == next) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, + RISCV_IOMMU_PQCSR_PQOF, 0); + } else { + dma_addr_t addr = s->pq_addr + tail * sizeof(*pr); + if 
(dma_memory_write(s->target_as, addr, pr, sizeof(*pr), + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, + RISCV_IOMMU_PQCSR_PQMF, 0); + } else { + riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_PQT, next); + } + } + + if (ctrl & RISCV_IOMMU_PQCSR_PIE) { + riscv_iommu_notify(s, RISCV_IOMMU_INTR_PQ); + } +} + +/* Portable implementation of pext_u64, bit-mask extraction. */ +static uint64_t _pext_u64(uint64_t val, uint64_t ext) +{ + uint64_t ret = 0; + uint64_t rot = 1; + + while (ext) { + if (ext & 1) { + if (val & 1) { + ret |= rot; + } + rot <<= 1; + } + val >>= 1; + ext >>= 1; + } + + return ret; +} + +/* Check if GPA matches MSI/MRIF pattern. */ +static bool riscv_iommu_msi_check(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, + dma_addr_t gpa) +{ + if (!s->enable_msi) { + return false; + } + + if (get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_MODE) != + RISCV_IOMMU_DC_MSIPTP_MODE_FLAT) { + return false; /* Invalid MSI/MRIF mode */ + } + + if ((PPN_DOWN(gpa) ^ ctx->msi_addr_pattern) & ~ctx->msi_addr_mask) { + return false; /* GPA not in MSI range defined by AIA IMSIC rules. */ + } + + return true; +} + +/* + * RISC-V IOMMU Address Translation Lookup - Page Table Walk + * + * Note: Code is based on get_physical_address() from target/riscv/cpu_helper.c + * Both implementations can be merged into a single helper function in the + * future. Keeping them separate for now, as error reporting and flow specifics + * are sufficiently different for separate implementations. + * + * @s : IOMMU Device State + * @ctx : Translation context for device id and process address space id. + * @iotlb : translation data: physical address and access mode. + * @return : success or fault cause code. + */ +static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, + IOMMUTLBEntry *iotlb) +{ + dma_addr_t addr, base; + uint64_t satp, gatp, pte; + bool en_s, en_g; + struct { + unsigned char step; + unsigned char levels; + unsigned char ptidxbits; + unsigned char ptesize; + } sc[2]; + /* Translation stage phase */ + enum { + S_STAGE = 0, + G_STAGE = 1, + } pass; + MemTxResult ret; + + satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD); + gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD); + + en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE; + en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE; + + /* + * Early check for MSI address match when IOVA == GPA. This check + * is required to ensure MSI translation is applied in case + * first-stage translation is set to BARE mode. In this case the IOVA + * provided is a valid GPA. Running the translation through the + * second-stage page walk would incorrectly try to translate the GPA + * to a host physical page, likely hitting an IOPF. + */ + if ((iotlb->perm & IOMMU_WO) && + riscv_iommu_msi_check(s, ctx, iotlb->iova)) { + iotlb->target_as = &s->trap_as; + iotlb->translated_addr = iotlb->iova; + iotlb->addr_mask = ~TARGET_PAGE_MASK; + return 0; + } + + /* Exit early for pass-through mode. */ + if (!(en_s || en_g)) { + iotlb->translated_addr = iotlb->iova; + iotlb->addr_mask = ~TARGET_PAGE_MASK; + /* Allow R/W in pass-through mode */ + iotlb->perm = IOMMU_RW; + return 0; + } + + /* S/G translation parameters. */ + for (pass = 0; pass < 2; pass++) { + uint32_t sv_mode; + + sc[pass].step = 0; + if (pass ? (s->fctl & RISCV_IOMMU_FCTL_GXL) : + (ctx->tc & RISCV_IOMMU_DC_TC_SXL)) { + /* 32bit mode for GXL/SXL == 1 */ + switch (pass ?
gatp : satp) { + case RISCV_IOMMU_DC_IOHGATP_MODE_BARE: + sc[pass].levels = 0; + sc[pass].ptidxbits = 0; + sc[pass].ptesize = 0; + break; + case RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4: + sv_mode = pass ? RISCV_IOMMU_CAP_SV32X4 : RISCV_IOMMU_CAP_SV32; + if (!(s->cap & sv_mode)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + sc[pass].levels = 2; + sc[pass].ptidxbits = 10; + sc[pass].ptesize = 4; + break; + default: + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + } else { + /* 64bit mode for GXL/SXL == 0 */ + switch (pass ? gatp : satp) { + case RISCV_IOMMU_DC_IOHGATP_MODE_BARE: + sc[pass].levels = 0; + sc[pass].ptidxbits = 0; + sc[pass].ptesize = 0; + break; + case RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4: + sv_mode = pass ? RISCV_IOMMU_CAP_SV39X4 : RISCV_IOMMU_CAP_SV39; + if (!(s->cap & sv_mode)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + sc[pass].levels = 3; + sc[pass].ptidxbits = 9; + sc[pass].ptesize = 8; + break; + case RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4: + sv_mode = pass ? RISCV_IOMMU_CAP_SV48X4 : RISCV_IOMMU_CAP_SV48; + if (!(s->cap & sv_mode)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + sc[pass].levels = 4; + sc[pass].ptidxbits = 9; + sc[pass].ptesize = 8; + break; + case RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4: + sv_mode = pass ? RISCV_IOMMU_CAP_SV57X4 : RISCV_IOMMU_CAP_SV57; + if (!(s->cap & sv_mode)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + sc[pass].levels = 5; + sc[pass].ptidxbits = 9; + sc[pass].ptesize = 8; + break; + default: + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + } + }; + + /* S/G stages translation tables root pointers */ + gatp = PPN_PHYS(get_field(ctx->gatp, RISCV_IOMMU_ATP_PPN_FIELD)); + satp = PPN_PHYS(get_field(ctx->satp, RISCV_IOMMU_ATP_PPN_FIELD)); + addr = (en_s && en_g) ? satp : iotlb->iova; + base = en_g ? gatp : satp; + pass = en_g ? G_STAGE : S_STAGE; + + do { + const unsigned widened = (pass && !sc[pass].step) ? 2 : 0; + const unsigned va_bits = widened + sc[pass].ptidxbits; + const unsigned va_skip = TARGET_PAGE_BITS + sc[pass].ptidxbits * + (sc[pass].levels - 1 - sc[pass].step); + const unsigned idx = (addr >> va_skip) & ((1 << va_bits) - 1); + const dma_addr_t pte_addr = base + idx * sc[pass].ptesize; + const bool ade = + ctx->tc & (pass ? RISCV_IOMMU_DC_TC_GADE : RISCV_IOMMU_DC_TC_SADE); + + /* Address range check before first level lookup */ + if (!sc[pass].step) { + const uint64_t va_mask = (1ULL << (va_skip + va_bits)) - 1; + if ((addr & va_mask) != addr) { + return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED; + } + } + + /* Read page table entry */ + if (sc[pass].ptesize == 4) { + uint32_t pte32 = 0; + ret = ldl_le_dma(s->target_as, pte_addr, &pte32, + MEMTXATTRS_UNSPECIFIED); + pte = pte32; + } else { + ret = ldq_le_dma(s->target_as, pte_addr, &pte, + MEMTXATTRS_UNSPECIFIED); + } + if (ret != MEMTX_OK) { + return (iotlb->perm & IOMMU_WO) ? 
RISCV_IOMMU_FQ_CAUSE_WR_FAULT + : RISCV_IOMMU_FQ_CAUSE_RD_FAULT; + } + + sc[pass].step++; + hwaddr ppn = pte >> PTE_PPN_SHIFT; + + if (!(pte & PTE_V)) { + break; /* Invalid PTE */ + } else if (!(pte & (PTE_R | PTE_W | PTE_X))) { + base = PPN_PHYS(ppn); /* Inner PTE, continue walking */ + } else if ((pte & (PTE_R | PTE_W | PTE_X)) == PTE_W) { + break; /* Reserved leaf PTE flags: PTE_W */ + } else if ((pte & (PTE_R | PTE_W | PTE_X)) == (PTE_W | PTE_X)) { + break; /* Reserved leaf PTE flags: PTE_W + PTE_X */ + } else if (ppn & ((1ULL << (va_skip - TARGET_PAGE_BITS)) - 1)) { + break; /* Misaligned PPN */ + } else if ((iotlb->perm & IOMMU_RO) && !(pte & PTE_R)) { + break; /* Read access check failed */ + } else if ((iotlb->perm & IOMMU_WO) && !(pte & PTE_W)) { + break; /* Write access check failed */ + } else if ((iotlb->perm & IOMMU_RO) && !ade && !(pte & PTE_A)) { + break; /* Access bit not set */ + } else if ((iotlb->perm & IOMMU_WO) && !ade && !(pte & PTE_D)) { + break; /* Dirty bit not set */ + } else { + /* Leaf PTE, translation completed. */ + sc[pass].step = sc[pass].levels; + base = PPN_PHYS(ppn) | (addr & ((1ULL << va_skip) - 1)); + /* Update address mask based on smallest translation granularity */ + iotlb->addr_mask &= (1ULL << va_skip) - 1; + /* Continue with S-Stage translation? */ + if (pass && sc[0].step != sc[0].levels) { + pass = S_STAGE; + addr = iotlb->iova; + continue; + } + /* Translation phase completed (GPA or SPA) */ + iotlb->translated_addr = base; + iotlb->perm = (pte & PTE_W) ? ((pte & PTE_R) ? IOMMU_RW : IOMMU_WO) + : IOMMU_RO; + + /* Check MSI GPA address match */ + if (pass == S_STAGE && (iotlb->perm & IOMMU_WO) && + riscv_iommu_msi_check(s, ctx, base)) { + /* Trap MSI writes and return GPA address. */ + iotlb->target_as = &s->trap_as; + iotlb->addr_mask = ~TARGET_PAGE_MASK; + return 0; + } + + /* Continue with G-Stage translation? */ + if (!pass && en_g) { + pass = G_STAGE; + addr = base; + base = gatp; + sc[pass].step = 0; + continue; + } + + return 0; + } + + if (sc[pass].step == sc[pass].levels) { + break; /* Can't find leaf PTE */ + } + + /* Continue with G-Stage translation? */ + if (!pass && en_g) { + pass = G_STAGE; + addr = base; + base = gatp; + sc[pass].step = 0; + } + } while (1); + + return (iotlb->perm & IOMMU_WO) ? + (pass ? RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS : + RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S) : + (pass ? 
RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS : + RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S); +} + +static void riscv_iommu_report_fault(RISCVIOMMUState *s, + RISCVIOMMUContext *ctx, + uint32_t fault_type, uint32_t cause, + bool pv, + uint64_t iotval, uint64_t iotval2) +{ + struct riscv_iommu_fq_record ev = { 0 }; + + if (ctx->tc & RISCV_IOMMU_DC_TC_DTF) { + switch (cause) { + case RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED: + case RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT: + case RISCV_IOMMU_FQ_CAUSE_DDT_INVALID: + case RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED: + case RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED: + case RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR: + case RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT: + break; + default: + /* DTF prevents reporting a fault for this given cause */ + return; + } + } + + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_CAUSE, cause); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, fault_type); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, ctx->devid); + + if (pv) { + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, true); + ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, ctx->process_id); + } + + ev.iotval = iotval; + ev.iotval2 = iotval2; + + riscv_iommu_fault(s, &ev); +} + +/* Redirect MSI write for given GPA. */ +static MemTxResult riscv_iommu_msi_write(RISCVIOMMUState *s, + RISCVIOMMUContext *ctx, uint64_t gpa, uint64_t data, + unsigned size, MemTxAttrs attrs) +{ + MemTxResult res; + dma_addr_t addr; + uint64_t intn; + uint32_t n190; + uint64_t pte[2]; + int fault_type = RISCV_IOMMU_FQ_TTYPE_UADDR_WR; + int cause; + + /* Interrupt File Number */ + intn = _pext_u64(PPN_DOWN(gpa), ctx->msi_addr_mask); + if (intn >= 256) { + /* Interrupt file number out of range */ + res = MEMTX_ACCESS_ERROR; + cause = RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT; + goto err; + } + + /* fetch MSI PTE */ + addr = PPN_PHYS(get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_PPN)); + addr = addr | (intn * sizeof(pte)); + res = dma_memory_read(s->target_as, addr, &pte, sizeof(pte), + MEMTXATTRS_UNSPECIFIED); + if (res != MEMTX_OK) { + if (res == MEMTX_DECODE_ERROR) { + cause = RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED; + } else { + cause = RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT; + } + goto err; + } + + le64_to_cpus(&pte[0]); + le64_to_cpus(&pte[1]); + + if (!(pte[0] & RISCV_IOMMU_MSI_PTE_V) || (pte[0] & RISCV_IOMMU_MSI_PTE_C)) { + /* + * The spec mentions that: "If msipte.C == 1, then further + * processing to interpret the PTE is implementation + * defined.". We'll abort with cause = 262 for this + * case too. + */ + res = MEMTX_ACCESS_ERROR; + cause = RISCV_IOMMU_FQ_CAUSE_MSI_INVALID; + goto err; + } + + switch (get_field(pte[0], RISCV_IOMMU_MSI_PTE_M)) { + case RISCV_IOMMU_MSI_PTE_M_BASIC: + /* MSI Pass-through mode */ + addr = PPN_PHYS(get_field(pte[0], RISCV_IOMMU_MSI_PTE_PPN)); + + trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid), + PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid), + gpa, addr); + + res = dma_memory_write(s->target_as, addr, &data, size, attrs); + if (res != MEMTX_OK) { + cause = RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT; + goto err; + } + + return MEMTX_OK; + case RISCV_IOMMU_MSI_PTE_M_MRIF: + /* MRIF mode, continue. */ + break; + default: + res = MEMTX_ACCESS_ERROR; + cause = RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED; + goto err; + } + + /* + * Report an error for interrupt identities exceeding the maximum allowed + * for an IMSIC interrupt file (2047), or when the destination address is + * not 32-bit aligned. See the IOMMU Specification, Chapter 2.3, MSI page + * tables.
+ */ + if ((data > 2047) || (gpa & 3)) { + res = MEMTX_ACCESS_ERROR; + cause = RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED; + goto err; + } + + /* MSI MRIF mode, non-atomic pending bit update */ + + /* MRIF pending bit address */ + addr = get_field(pte[0], RISCV_IOMMU_MSI_PTE_MRIF_ADDR) << 9; + addr = addr | ((data & 0x7c0) >> 3); + + trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid), + PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid), + gpa, addr); + + /* MRIF pending bit mask */ + data = 1ULL << (data & 0x03f); + res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs); + if (res != MEMTX_OK) { + cause = RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT; + goto err; + } + + intn = intn | data; + res = dma_memory_write(s->target_as, addr, &intn, sizeof(intn), attrs); + if (res != MEMTX_OK) { + cause = RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT; + goto err; + } + + /* Get MRIF enable bits */ + addr = addr + sizeof(intn); + res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs); + if (res != MEMTX_OK) { + cause = RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT; + goto err; + } + + if (!(intn & data)) { + /* notification disabled, MRIF update completed. */ + return MEMTX_OK; + } + + /* Send notification message */ + addr = PPN_PHYS(get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NPPN)); + n190 = get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID) | + (get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID_MSB) << 10); + + res = dma_memory_write(s->target_as, addr, &n190, sizeof(n190), attrs); + if (res != MEMTX_OK) { + cause = RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT; + goto err; + } + + trace_riscv_iommu_mrif_notification(s->parent_obj.id, n190, addr); + + return MEMTX_OK; + +err: + riscv_iommu_report_fault(s, ctx, fault_type, cause, + !!ctx->process_id, 0, 0); + return res; +} + +/* + * Check device context configuration as described by the + * riscv-iommu spec section "Device-context configuration + * checks".
+ */ +static bool riscv_iommu_validate_device_ctx(RISCVIOMMUState *s, + RISCVIOMMUContext *ctx) +{ + uint32_t fsc_mode, msi_mode; + + if (!(ctx->tc & RISCV_IOMMU_DC_TC_EN_PRI) && + ctx->tc & RISCV_IOMMU_DC_TC_PRPR) { + return false; + } + + if (!(s->cap & RISCV_IOMMU_CAP_T2GPA) && + ctx->tc & RISCV_IOMMU_DC_TC_T2GPA) { + return false; + } + + if (s->cap & RISCV_IOMMU_CAP_MSI_FLAT) { + msi_mode = get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_MODE); + + if (msi_mode != RISCV_IOMMU_DC_MSIPTP_MODE_OFF && + msi_mode != RISCV_IOMMU_DC_MSIPTP_MODE_FLAT) { + return false; + } + } + + fsc_mode = get_field(ctx->satp, RISCV_IOMMU_DC_FSC_MODE); + + if (ctx->tc & RISCV_IOMMU_DC_TC_PDTV) { + switch (fsc_mode) { + case RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8: + if (!(s->cap & RISCV_IOMMU_CAP_PD8)) { + return false; + } + break; + case RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17: + if (!(s->cap & RISCV_IOMMU_CAP_PD17)) { + return false; + } + break; + case RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20: + if (!(s->cap & RISCV_IOMMU_CAP_PD20)) { + return false; + } + break; + } + } else { + /* DC.tc.PDTV is 0 */ + if (ctx->tc & RISCV_IOMMU_DC_TC_DPE) { + return false; + } + + if (ctx->tc & RISCV_IOMMU_DC_TC_SXL) { + if (fsc_mode == RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 && + !(s->cap & RISCV_IOMMU_CAP_SV32)) { + return false; + } + } else { + switch (fsc_mode) { + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39: + if (!(s->cap & RISCV_IOMMU_CAP_SV39)) { + return false; + } + break; + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48: + if (!(s->cap & RISCV_IOMMU_CAP_SV48)) { + return false; + } + break; + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57: + if (!(s->cap & RISCV_IOMMU_CAP_SV57)) { + return false; + } + break; + } + } + } + + /* + * CAP_END is always zero (only one endianness). FCTL_BE is + * always zero (little-endian accesses). Thus TC_SBE must + * always be zero, i.e. little-endian. + */ + if (ctx->tc & RISCV_IOMMU_DC_TC_SBE) { + return false; + } + + return true; +} + +/* + * Validate process context (PC) according to section + * "Process-context configuration checks". + */ +static bool riscv_iommu_validate_process_ctx(RISCVIOMMUState *s, + RISCVIOMMUContext *ctx) +{ + uint32_t mode; + + if (get_field(ctx->ta, RISCV_IOMMU_PC_TA_RESERVED)) { + return false; + } + + if (get_field(ctx->satp, RISCV_IOMMU_PC_FSC_RESERVED)) { + return false; + } + + mode = get_field(ctx->satp, RISCV_IOMMU_DC_FSC_MODE); + switch (mode) { + case RISCV_IOMMU_DC_FSC_MODE_BARE: + /* sv39 and sv32 modes have the same value (8) */ + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39: + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48: + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57: + break; + default: + return false; + } + + if (ctx->tc & RISCV_IOMMU_DC_TC_SXL) { + if (mode == RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 && + !(s->cap & RISCV_IOMMU_CAP_SV32)) { + return false; + } + } else { + switch (mode) { + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39: + if (!(s->cap & RISCV_IOMMU_CAP_SV39)) { + return false; + } + break; + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48: + if (!(s->cap & RISCV_IOMMU_CAP_SV48)) { + return false; + } + break; + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57: + if (!(s->cap & RISCV_IOMMU_CAP_SV57)) { + return false; + } + break; + } + } + + return true; +} + +/* + * RISC-V IOMMU Device Context Lookup - Device Directory Tree Walk + * + * @s : IOMMU Device State + * @ctx : Device Translation Context with devid and process_id set. + * @return : success or fault code.
+ */ +static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) +{ + const uint64_t ddtp = s->ddtp; + unsigned mode = get_field(ddtp, RISCV_IOMMU_DDTP_MODE); + dma_addr_t addr = PPN_PHYS(get_field(ddtp, RISCV_IOMMU_DDTP_PPN)); + struct riscv_iommu_dc dc; + /* Device Context format: 0: extended (64 bytes) | 1: base (32 bytes) */ + const int dc_fmt = !s->enable_msi; + const size_t dc_len = sizeof(dc) >> dc_fmt; + unsigned depth; + uint64_t de; + + switch (mode) { + case RISCV_IOMMU_DDTP_MODE_OFF: + return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED; + + case RISCV_IOMMU_DDTP_MODE_BARE: + /* mock up pass-through translation context */ + ctx->gatp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD, + RISCV_IOMMU_DC_IOHGATP_MODE_BARE); + ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD, + RISCV_IOMMU_DC_FSC_MODE_BARE); + ctx->tc = RISCV_IOMMU_DC_TC_V; + ctx->ta = 0; + ctx->msiptp = 0; + return 0; + + case RISCV_IOMMU_DDTP_MODE_1LVL: + depth = 0; + break; + + case RISCV_IOMMU_DDTP_MODE_2LVL: + depth = 1; + break; + + case RISCV_IOMMU_DDTP_MODE_3LVL: + depth = 2; + break; + + default: + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + + /* + * Check supported device id width (in bits). + * See IOMMU Specification, Chapter 6. Software guidelines. + * - if extended device-context format is used: + * 1LVL: 6, 2LVL: 15, 3LVL: 24 + * - if base device-context format is used: + * 1LVL: 7, 2LVL: 16, 3LVL: 24 + */ + if (ctx->devid >= (1 << (depth * 9 + 6 + (dc_fmt && depth != 2)))) { + return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED; + } + + /* Device directory tree walk */ + for (; depth-- > 0; ) { + /* + * Select device id index bits based on device directory tree level + * and device context format. + * See IOMMU Specification, Chapter 2. Data Structures. + * - if extended device-context format is used: + * device index: [23:15][14:6][5:0] + * - if base device-context format is used: + * device index: [23:16][15:7][6:0] + */ + const int split = depth * 9 + 6 + dc_fmt; + addr |= ((ctx->devid >> split) << 3) & ~TARGET_PAGE_MASK; + if (dma_memory_read(s->target_as, addr, &de, sizeof(de), + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT; + } + le64_to_cpus(&de); + if (!(de & RISCV_IOMMU_DDTE_VALID)) { + /* invalid directory entry */ + return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; + } + if (de & ~(RISCV_IOMMU_DDTE_PPN | RISCV_IOMMU_DDTE_VALID)) { + /* reserved bits set */ + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + addr = PPN_PHYS(get_field(de, RISCV_IOMMU_DDTE_PPN)); + } + + /* index into device context entry page */ + addr |= (ctx->devid * dc_len) & ~TARGET_PAGE_MASK; + + memset(&dc, 0, sizeof(dc)); + if (dma_memory_read(s->target_as, addr, &dc, dc_len, + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT; + } + + /* Set translation context. 
*/ + ctx->tc = le64_to_cpu(dc.tc); + ctx->gatp = le64_to_cpu(dc.iohgatp); + ctx->satp = le64_to_cpu(dc.fsc); + ctx->ta = le64_to_cpu(dc.ta); + ctx->msiptp = le64_to_cpu(dc.msiptp); + ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask); + ctx->msi_addr_pattern = le64_to_cpu(dc.msi_addr_pattern); + + if (!(ctx->tc & RISCV_IOMMU_DC_TC_V)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; + } + + if (!riscv_iommu_validate_device_ctx(s, ctx)) { + return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED; + } + + /* FSC field checks */ + mode = get_field(ctx->satp, RISCV_IOMMU_DC_FSC_MODE); + addr = PPN_PHYS(get_field(ctx->satp, RISCV_IOMMU_DC_FSC_PPN)); + + if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) { + if (ctx->process_id != RISCV_IOMMU_NOPROCID) { + /* PID is disabled */ + return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED; + } + if (mode > RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57) { + /* Invalid translation mode */ + return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID; + } + return 0; + } + + if (ctx->process_id == RISCV_IOMMU_NOPROCID) { + if (!(ctx->tc & RISCV_IOMMU_DC_TC_DPE)) { + /* No default process_id enabled, set BARE mode */ + ctx->satp = 0ULL; + return 0; + } else { + /* Use default process_id #0 */ + ctx->process_id = 0; + } + } + + if (mode == RISCV_IOMMU_DC_FSC_MODE_BARE) { + /* No S-Stage translation, done. */ + return 0; + } + + /* DC.tc.PDTV enabled, FSC field holds PDTP */ + if (mode > RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20) { + /* Invalid PDTP.MODE */ + return RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED; + } + + for (depth = mode - RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8; depth-- > 0; ) { + /* + * Select process id index bits based on process directory tree + * level. See IOMMU Specification, 2.2. Process-Directory-Table. + */ + const int split = depth * 9 + 8; + addr |= ((ctx->process_id >> split) << 3) & ~TARGET_PAGE_MASK; + if (dma_memory_read(s->target_as, addr, &de, sizeof(de), + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT; + } + le64_to_cpus(&de); + if (!(de & RISCV_IOMMU_PC_TA_V)) { + return RISCV_IOMMU_FQ_CAUSE_PDT_INVALID; + } + addr = PPN_PHYS(get_field(de, RISCV_IOMMU_PC_FSC_PPN)); + } + + /* Leaf entry in PDT */ + addr |= (ctx->process_id << 4) & ~TARGET_PAGE_MASK; + if (dma_memory_read(s->target_as, addr, &dc.ta, sizeof(uint64_t) * 2, + MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) { + return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT; + } + + /* Use FSC and TA from process directory entry. */ + ctx->ta = le64_to_cpu(dc.ta); + ctx->satp = le64_to_cpu(dc.fsc); + + if (!(ctx->ta & RISCV_IOMMU_PC_TA_V)) { + return RISCV_IOMMU_FQ_CAUSE_PDT_INVALID; + } + + if (!riscv_iommu_validate_process_ctx(s, ctx)) { + return RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED; + } + + return 0; +} + +/* Translation Context cache support */ +static gboolean riscv_iommu_ctx_equal(gconstpointer v1, gconstpointer v2) +{ + RISCVIOMMUContext *c1 = (RISCVIOMMUContext *) v1; + RISCVIOMMUContext *c2 = (RISCVIOMMUContext *) v2; + return c1->devid == c2->devid && + c1->process_id == c2->process_id; +} + +static guint riscv_iommu_ctx_hash(gconstpointer v) +{ + RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) v; + /* + * Generate simple hash of (process_id, devid) + * assuming 24-bit wide devid.
+ */ + return (guint)(ctx->devid) + ((guint)(ctx->process_id) << 24); +} + +static void riscv_iommu_ctx_inval_devid_procid(gpointer key, gpointer value, + gpointer data) +{ + RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value; + RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data; + if (ctx->tc & RISCV_IOMMU_DC_TC_V && + ctx->devid == arg->devid && + ctx->process_id == arg->process_id) { + ctx->tc &= ~RISCV_IOMMU_DC_TC_V; + } +} + +static void riscv_iommu_ctx_inval_devid(gpointer key, gpointer value, + gpointer data) +{ + RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value; + RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data; + if (ctx->tc & RISCV_IOMMU_DC_TC_V && + ctx->devid == arg->devid) { + ctx->tc &= ~RISCV_IOMMU_DC_TC_V; + } +} + +static void riscv_iommu_ctx_inval_all(gpointer key, gpointer value, + gpointer data) +{ + RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value; + if (ctx->tc & RISCV_IOMMU_DC_TC_V) { + ctx->tc &= ~RISCV_IOMMU_DC_TC_V; + } +} + +static void riscv_iommu_ctx_inval(RISCVIOMMUState *s, GHFunc func, + uint32_t devid, uint32_t process_id) +{ + GHashTable *ctx_cache; + RISCVIOMMUContext key = { + .devid = devid, + .process_id = process_id, + }; + ctx_cache = g_hash_table_ref(s->ctx_cache); + g_hash_table_foreach(ctx_cache, func, &key); + g_hash_table_unref(ctx_cache); +} + +/* Find or allocate translation context for a given {device_id, process_id} */ +static RISCVIOMMUContext *riscv_iommu_ctx(RISCVIOMMUState *s, + unsigned devid, unsigned process_id, + void **ref) +{ + GHashTable *ctx_cache; + RISCVIOMMUContext *ctx; + RISCVIOMMUContext key = { + .devid = devid, + .process_id = process_id, + }; + + ctx_cache = g_hash_table_ref(s->ctx_cache); + ctx = g_hash_table_lookup(ctx_cache, &key); + + if (ctx && (ctx->tc & RISCV_IOMMU_DC_TC_V)) { + *ref = ctx_cache; + return ctx; + } + + ctx = g_new0(RISCVIOMMUContext, 1); + ctx->devid = devid; + ctx->process_id = process_id; + + int fault = riscv_iommu_ctx_fetch(s, ctx); + if (!fault) { + if (g_hash_table_size(ctx_cache) >= LIMIT_CACHE_CTX) { + g_hash_table_unref(ctx_cache); + ctx_cache = g_hash_table_new_full(riscv_iommu_ctx_hash, + riscv_iommu_ctx_equal, + g_free, NULL); + g_hash_table_ref(ctx_cache); + g_hash_table_unref(qatomic_xchg(&s->ctx_cache, ctx_cache)); + } + g_hash_table_add(ctx_cache, ctx); + *ref = ctx_cache; + return ctx; + } + + g_hash_table_unref(ctx_cache); + *ref = NULL; + + riscv_iommu_report_fault(s, ctx, RISCV_IOMMU_FQ_TTYPE_UADDR_RD, + fault, !!process_id, 0, 0); + + g_free(ctx); + return NULL; +} + +static void riscv_iommu_ctx_put(RISCVIOMMUState *s, void *ref) +{ + if (ref) { + g_hash_table_unref((GHashTable *)ref); + } +} + +/* Find or allocate address space for a given device */ +static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid) +{ + RISCVIOMMUSpace *as; + + /* FIXME: PCIe bus remapping for attached endpoints. 
*/ + devid |= s->bus << 8; + + QLIST_FOREACH(as, &s->spaces, list) { + if (as->devid == devid) { + break; + } + } + + if (as == NULL) { + char name[64]; + as = g_new0(RISCVIOMMUSpace, 1); + + as->iommu = s; + as->devid = devid; + + snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova", + PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid)); + + /* IOVA address space, untranslated addresses */ + memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr), + TYPE_RISCV_IOMMU_MEMORY_REGION, + OBJECT(as), "riscv_iommu", UINT64_MAX); + address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr), name); + + QLIST_INSERT_HEAD(&s->spaces, as, list); + + trace_riscv_iommu_new(s->parent_obj.id, PCI_BUS_NUM(as->devid), + PCI_SLOT(as->devid), PCI_FUNC(as->devid)); + } + return &as->iova_as; +} + +static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, + IOMMUTLBEntry *iotlb) +{ + bool enable_pid; + bool enable_pri; + int fault; + + /* + * TC[32] is reserved for custom extensions, used here to temporarily + * enable automatic page-request generation for ATS queries. + */ + enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32)); + enable_pid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV); + + /* Translate using device directory / page table information. */ + fault = riscv_iommu_spa_fetch(s, ctx, iotlb); + + if (enable_pri && fault) { + struct riscv_iommu_pq_record pr = {0}; + if (enable_pid) { + pr.hdr = set_field(RISCV_IOMMU_PREQ_HDR_PV, + RISCV_IOMMU_PREQ_HDR_PID, ctx->process_id); + } + pr.hdr = set_field(pr.hdr, RISCV_IOMMU_PREQ_HDR_DID, ctx->devid); + pr.payload = (iotlb->iova & TARGET_PAGE_MASK) | + RISCV_IOMMU_PREQ_PAYLOAD_M; + riscv_iommu_pri(s, &pr); + return fault; + } + + if (fault) { + unsigned ttype; + + if (iotlb->perm & IOMMU_RW) { + ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR; + } else { + ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD; + } + + riscv_iommu_report_fault(s, ctx, ttype, fault, enable_pid, + iotlb->iova, iotlb->translated_addr); + return fault; + } + + return 0; +} + +/* IOMMU Command Interface */ +static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify, + uint64_t addr, uint32_t data) +{ + /* + * ATS processing in this implementation of the IOMMU is synchronous, + * no need to wait for completions here. 
+ */ + if (!notify) { + return MEMTX_OK; + } + + return dma_memory_write(s->target_as, addr, &data, sizeof(data), + MEMTXATTRS_UNSPECIFIED); +} + +static void riscv_iommu_process_ddtp(RISCVIOMMUState *s) +{ + uint64_t old_ddtp = s->ddtp; + uint64_t new_ddtp = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_DDTP); + unsigned new_mode = get_field(new_ddtp, RISCV_IOMMU_DDTP_MODE); + unsigned old_mode = get_field(old_ddtp, RISCV_IOMMU_DDTP_MODE); + bool ok = false; + + /* + * Check for allowed DDTP.MODE transitions: + * {OFF, BARE} -> {OFF, BARE, 1LVL, 2LVL, 3LVL} + * {1LVL, 2LVL, 3LVL} -> {OFF, BARE} + */ + if (new_mode == old_mode || + new_mode == RISCV_IOMMU_DDTP_MODE_OFF || + new_mode == RISCV_IOMMU_DDTP_MODE_BARE) { + ok = true; + } else if (new_mode == RISCV_IOMMU_DDTP_MODE_1LVL || + new_mode == RISCV_IOMMU_DDTP_MODE_2LVL || + new_mode == RISCV_IOMMU_DDTP_MODE_3LVL) { + ok = old_mode == RISCV_IOMMU_DDTP_MODE_OFF || + old_mode == RISCV_IOMMU_DDTP_MODE_BARE; + } + + if (ok) { + /* clear reserved and busy bits, report back sanitized version */ + new_ddtp = set_field(new_ddtp & RISCV_IOMMU_DDTP_PPN, + RISCV_IOMMU_DDTP_MODE, new_mode); + } else { + new_ddtp = old_ddtp; + } + s->ddtp = new_ddtp; + + riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_DDTP, new_ddtp); +} + +/* Command function and opcode field. */ +#define RISCV_IOMMU_CMD(func, op) (((func) << 7) | (op)) + +static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s) +{ + struct riscv_iommu_command cmd; + MemTxResult res; + dma_addr_t addr; + uint32_t tail, head, ctrl; + uint64_t cmd_opcode; + GHFunc func; + + ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR); + tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQT) & s->cq_mask; + head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQH) & s->cq_mask; + + /* Check for pending error or queue processing disabled */ + if (!(ctrl & RISCV_IOMMU_CQCSR_CQON) || + !!(ctrl & (RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CQMF))) { + return; + } + + while (tail != head) { + addr = s->cq_addr + head * sizeof(cmd); + res = dma_memory_read(s->target_as, addr, &cmd, sizeof(cmd), + MEMTXATTRS_UNSPECIFIED); + + if (res != MEMTX_OK) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, + RISCV_IOMMU_CQCSR_CQMF, 0); + goto fault; + } + + trace_riscv_iommu_cmd(s->parent_obj.id, cmd.dword0, cmd.dword1); + + cmd_opcode = get_field(cmd.dword0, + RISCV_IOMMU_CMD_OPCODE | RISCV_IOMMU_CMD_FUNC); + + switch (cmd_opcode) { + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOFENCE_FUNC_C, + RISCV_IOMMU_CMD_IOFENCE_OPCODE): + res = riscv_iommu_iofence(s, + cmd.dword0 & RISCV_IOMMU_CMD_IOFENCE_AV, cmd.dword1, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IOFENCE_DATA)); + + if (res != MEMTX_OK) { + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, + RISCV_IOMMU_CQCSR_CQMF, 0); + goto fault; + } + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA, + RISCV_IOMMU_CMD_IOTINVAL_OPCODE): + if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) { + /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */ + goto cmd_ill; + } + /* translation cache not implemented yet */ + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA, + RISCV_IOMMU_CMD_IOTINVAL_OPCODE): + /* translation cache not implemented yet */ + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT, + RISCV_IOMMU_CMD_IODIR_OPCODE): + if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) { + /* invalidate all device context cache mappings */ + func = riscv_iommu_ctx_inval_all; + } else { + /* invalidate all device context matching DID */ + func = 
riscv_iommu_ctx_inval_devid; + } + riscv_iommu_ctx_inval(s, func, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID), 0); + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT, + RISCV_IOMMU_CMD_IODIR_OPCODE): + if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) { + /* illegal command arguments IODIR_PDT & DV == 0 */ + goto cmd_ill; + } else { + func = riscv_iommu_ctx_inval_devid_procid; + } + riscv_iommu_ctx_inval(s, func, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID), + get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID)); + break; + + default: + cmd_ill: + /* Invalid instruction, do not advance instruction index. */ + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, + RISCV_IOMMU_CQCSR_CMD_ILL, 0); + goto fault; + } + + /* Advance and update head pointer after command completes. */ + head = (head + 1) & s->cq_mask; + riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_CQH, head); + } + return; + +fault: + if (ctrl & RISCV_IOMMU_CQCSR_CIE) { + riscv_iommu_notify(s, RISCV_IOMMU_INTR_CQ); + } +} + +static void riscv_iommu_process_cq_control(RISCVIOMMUState *s) +{ + uint64_t base; + uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR); + uint32_t ctrl_clr; + bool enable = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQEN); + bool active = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQON); + + if (enable && !active) { + base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_CQB); + s->cq_mask = (2ULL << get_field(base, RISCV_IOMMU_CQB_LOG2SZ)) - 1; + s->cq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_CQB_PPN)); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~s->cq_mask); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQH], 0); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQT], 0); + ctrl_set = RISCV_IOMMU_CQCSR_CQON; + ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQMF | + RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CMD_TO | + RISCV_IOMMU_CQCSR_FENCE_W_IP; + } else if (!enable && active) { + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~0); + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQON; + } else { + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_CQCSR_BUSY; + } + + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, ctrl_set, ctrl_clr); +} + +static void riscv_iommu_process_fq_control(RISCVIOMMUState *s) +{ + uint64_t base; + uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR); + uint32_t ctrl_clr; + bool enable = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQEN); + bool active = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQON); + + if (enable && !active) { + base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_FQB); + s->fq_mask = (2ULL << get_field(base, RISCV_IOMMU_FQB_LOG2SZ)) - 1; + s->fq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_FQB_PPN)); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~s->fq_mask); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQH], 0); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQT], 0); + ctrl_set = RISCV_IOMMU_FQCSR_FQON; + ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQMF | + RISCV_IOMMU_FQCSR_FQOF; + } else if (!enable && active) { + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~0); + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQON; + } else { + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_FQCSR_BUSY; + } + + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, ctrl_set, ctrl_clr); +} + +static void riscv_iommu_process_pq_control(RISCVIOMMUState *s) +{ + uint64_t base; + uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR); + uint32_t ctrl_clr; + bool enable = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQEN); + bool active = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQON); 
+ + if (enable && !active) { + base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_PQB); + s->pq_mask = (2ULL << get_field(base, RISCV_IOMMU_PQB_LOG2SZ)) - 1; + s->pq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_PQB_PPN)); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~s->pq_mask); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQH], 0); + stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQT], 0); + ctrl_set = RISCV_IOMMU_PQCSR_PQON; + ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQMF | + RISCV_IOMMU_PQCSR_PQOF; + } else if (!enable && active) { + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~0); + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQON; + } else { + ctrl_set = 0; + ctrl_clr = RISCV_IOMMU_PQCSR_BUSY; + } + + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr); +} + +typedef void riscv_iommu_process_fn(RISCVIOMMUState *s); + +static void riscv_iommu_update_icvec(RISCVIOMMUState *s, uint64_t data) +{ + uint64_t icvec = 0; + + icvec |= MIN(data & RISCV_IOMMU_ICVEC_CIV, + s->icvec_avail_vectors & RISCV_IOMMU_ICVEC_CIV); + + icvec |= MIN(data & RISCV_IOMMU_ICVEC_FIV, + s->icvec_avail_vectors & RISCV_IOMMU_ICVEC_FIV); + + icvec |= MIN(data & RISCV_IOMMU_ICVEC_PMIV, + s->icvec_avail_vectors & RISCV_IOMMU_ICVEC_PMIV); + + icvec |= MIN(data & RISCV_IOMMU_ICVEC_PIV, + s->icvec_avail_vectors & RISCV_IOMMU_ICVEC_PIV); + + trace_riscv_iommu_icvec_write(data, icvec); + + riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_ICVEC, icvec); +} + +static void riscv_iommu_update_ipsr(RISCVIOMMUState *s, uint64_t data) +{ + uint32_t cqcsr, fqcsr, pqcsr; + uint32_t ipsr_set = 0; + uint32_t ipsr_clr = 0; + + if (data & RISCV_IOMMU_IPSR_CIP) { + cqcsr = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR); + + if (cqcsr & RISCV_IOMMU_CQCSR_CIE && + (cqcsr & RISCV_IOMMU_CQCSR_FENCE_W_IP || + cqcsr & RISCV_IOMMU_CQCSR_CMD_ILL || + cqcsr & RISCV_IOMMU_CQCSR_CMD_TO || + cqcsr & RISCV_IOMMU_CQCSR_CQMF)) { + ipsr_set |= RISCV_IOMMU_IPSR_CIP; + } else { + ipsr_clr |= RISCV_IOMMU_IPSR_CIP; + } + } else { + ipsr_clr |= RISCV_IOMMU_IPSR_CIP; + } + + if (data & RISCV_IOMMU_IPSR_FIP) { + fqcsr = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR); + + if (fqcsr & RISCV_IOMMU_FQCSR_FIE && + (fqcsr & RISCV_IOMMU_FQCSR_FQOF || + fqcsr & RISCV_IOMMU_FQCSR_FQMF)) { + ipsr_set |= RISCV_IOMMU_IPSR_FIP; + } else { + ipsr_clr |= RISCV_IOMMU_IPSR_FIP; + } + } else { + ipsr_clr |= RISCV_IOMMU_IPSR_FIP; + } + + if (data & RISCV_IOMMU_IPSR_PIP) { + pqcsr = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR); + + if (pqcsr & RISCV_IOMMU_PQCSR_PIE && + (pqcsr & RISCV_IOMMU_PQCSR_PQOF || + pqcsr & RISCV_IOMMU_PQCSR_PQMF)) { + ipsr_set |= RISCV_IOMMU_IPSR_PIP; + } else { + ipsr_clr |= RISCV_IOMMU_IPSR_PIP; + } + } else { + ipsr_clr |= RISCV_IOMMU_IPSR_PIP; + } + + riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, ipsr_set, ipsr_clr); +} + +/* + * Write the resulting value of 'data' for the reg specified + * by 'reg_addr', after considering read-only/read-write/write-clear + * bits, in the pointer 'dest'. + * + * The result is written in little-endian. 
+ */ +static void riscv_iommu_write_reg_val(RISCVIOMMUState *s, + void *dest, hwaddr reg_addr, + int size, uint64_t data) +{ + uint64_t ro = ldn_le_p(&s->regs_ro[reg_addr], size); + uint64_t wc = ldn_le_p(&s->regs_wc[reg_addr], size); + uint64_t rw = ldn_le_p(&s->regs_rw[reg_addr], size); + + stn_le_p(dest, size, ((rw & ro) | (data & ~ro)) & ~(data & wc)); +} + +static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size, + MemTxAttrs attrs) +{ + riscv_iommu_process_fn *process_fn = NULL; + RISCVIOMMUState *s = opaque; + uint32_t regb = addr & ~3; + uint32_t busy = 0; + uint64_t val = 0; + + if ((addr & (size - 1)) != 0) { + /* Unsupported MMIO alignment or access size */ + return MEMTX_ERROR; + } + + if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) { + /* Unsupported MMIO access location. */ + return MEMTX_ACCESS_ERROR; + } + + /* Track actionable MMIO write. */ + switch (regb) { + case RISCV_IOMMU_REG_DDTP: + case RISCV_IOMMU_REG_DDTP + 4: + process_fn = riscv_iommu_process_ddtp; + regb = RISCV_IOMMU_REG_DDTP; + busy = RISCV_IOMMU_DDTP_BUSY; + break; + + case RISCV_IOMMU_REG_CQT: + process_fn = riscv_iommu_process_cq_tail; + break; + + case RISCV_IOMMU_REG_CQCSR: + process_fn = riscv_iommu_process_cq_control; + busy = RISCV_IOMMU_CQCSR_BUSY; + break; + + case RISCV_IOMMU_REG_FQCSR: + process_fn = riscv_iommu_process_fq_control; + busy = RISCV_IOMMU_FQCSR_BUSY; + break; + + case RISCV_IOMMU_REG_PQCSR: + process_fn = riscv_iommu_process_pq_control; + busy = RISCV_IOMMU_PQCSR_BUSY; + break; + + case RISCV_IOMMU_REG_ICVEC: + case RISCV_IOMMU_REG_IPSR: + /* + * ICVEC and IPSR have special read/write procedures. We'll + * call their respective helpers and exit. + */ + riscv_iommu_write_reg_val(s, &val, addr, size, data); + + /* + * 'val' is stored as LE. Switch to host endianness + * before using it. + */ + val = le64_to_cpu(val); + + if (regb == RISCV_IOMMU_REG_ICVEC) { + riscv_iommu_update_icvec(s, val); + } else { + riscv_iommu_update_ipsr(s, val); + } + + return MEMTX_OK; + + default: + break; + } + + /* + * Register updates might not be synchronized with the core logic. + * If system software updates a register while the relevant BUSY bit + * is set, the IOMMU behavior of additional writes to the register + * is UNSPECIFIED. + */ + riscv_iommu_write_reg_val(s, &s->regs_rw[addr], addr, size, data); + + /* Busy flag update, MSB 4-byte register. */ + if (busy) { + uint32_t rw = ldl_le_p(&s->regs_rw[regb]); + stl_le_p(&s->regs_rw[regb], rw | busy); + } + + if (process_fn) { + process_fn(s); + } + + return MEMTX_OK; +} + +static MemTxResult riscv_iommu_mmio_read(void *opaque, hwaddr addr, + uint64_t *data, unsigned size, MemTxAttrs attrs) +{ + RISCVIOMMUState *s = opaque; + uint64_t val = -1; + uint8_t *ptr; + + if ((addr & (size - 1)) != 0) { + /* Unsupported MMIO alignment.
*/ + return MEMTX_ERROR; + } + + if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) { + return MEMTX_ACCESS_ERROR; + } + + ptr = &s->regs_rw[addr]; + val = ldn_le_p(ptr, size); + + *data = val; + + return MEMTX_OK; +} + +static const MemoryRegionOps riscv_iommu_mmio_ops = { + .read_with_attrs = riscv_iommu_mmio_read, + .write_with_attrs = riscv_iommu_mmio_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .impl = { + .min_access_size = 4, + .max_access_size = 8, + .unaligned = false, + }, + .valid = { + .min_access_size = 4, + .max_access_size = 8, + } +}; + +/* + * Translations matching MSI pattern check are redirected to "riscv-iommu-trap" + * memory region as untranslated address, for additional MSI/MRIF interception + * by IOMMU interrupt remapping implementation. + * Note: Device emulation code generating an MSI is expected to provide valid + * memory transaction attributes with requester_id set. + */ +static MemTxResult riscv_iommu_trap_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size, MemTxAttrs attrs) +{ + RISCVIOMMUState *s = (RISCVIOMMUState *)opaque; + RISCVIOMMUContext *ctx; + MemTxResult res; + void *ref; + uint32_t devid = attrs.requester_id; + + if (attrs.unspecified) { + return MEMTX_ACCESS_ERROR; + } + + /* FIXME: PCIe bus remapping for attached endpoints. */ + devid |= s->bus << 8; + + ctx = riscv_iommu_ctx(s, devid, 0, &ref); + if (ctx == NULL) { + res = MEMTX_ACCESS_ERROR; + } else { + res = riscv_iommu_msi_write(s, ctx, addr, data, size, attrs); + } + riscv_iommu_ctx_put(s, ref); + return res; +} + +static MemTxResult riscv_iommu_trap_read(void *opaque, hwaddr addr, + uint64_t *data, unsigned size, MemTxAttrs attrs) +{ + return MEMTX_ACCESS_ERROR; +} + +static const MemoryRegionOps riscv_iommu_trap_ops = { + .read_with_attrs = riscv_iommu_trap_read, + .write_with_attrs = riscv_iommu_trap_write, + .endianness = DEVICE_LITTLE_ENDIAN, + .impl = { + .min_access_size = 4, + .max_access_size = 8, + .unaligned = true, + }, + .valid = { + .min_access_size = 4, + .max_access_size = 8, + } +}; + +static void riscv_iommu_realize(DeviceState *dev, Error **errp) +{ + RISCVIOMMUState *s = RISCV_IOMMU(dev); + + s->cap = s->version & RISCV_IOMMU_CAP_VERSION; + if (s->enable_msi) { + s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF; + } + if (s->enable_s_stage) { + s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 | + RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57; + } + if (s->enable_g_stage) { + s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 | + RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4; + } + /* Report QEMU target physical address space limits */ + s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS, + TARGET_PHYS_ADDR_SPACE_BITS); + + /* TODO: method to report supported PID bits */ + s->pid_bits = 8; /* restricted to size of MemTxAttrs.pid */ + s->cap |= RISCV_IOMMU_CAP_PD8; + + /* Out-of-reset translation mode: OFF (DMA disabled) or BARE (passthrough) */ + s->ddtp = set_field(0, RISCV_IOMMU_DDTP_MODE, s->enable_off ? + RISCV_IOMMU_DDTP_MODE_OFF : RISCV_IOMMU_DDTP_MODE_BARE); + + /* register storage */ + s->regs_rw = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE); + s->regs_ro = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE); + s->regs_wc = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE); + + /* Mark all registers read-only */ + memset(s->regs_ro, 0xff, RISCV_IOMMU_REG_SIZE); + + /* + * Register complete MMIO space, including MSI/PBA registers. + * Note, PCIDevice implementation will add overlapping MR for MSI/PBA, + * managed directly by the PCIDevice implementation.
+ */ + memory_region_init_io(&s->regs_mr, OBJECT(dev), &riscv_iommu_mmio_ops, s, + "riscv-iommu-regs", RISCV_IOMMU_REG_SIZE); + + /* Set power-on register state */ + stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_CAP], s->cap); + stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_FCTL], 0); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_FCTL], + ~(RISCV_IOMMU_FCTL_BE | RISCV_IOMMU_FCTL_WSI)); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_DDTP], + ~(RISCV_IOMMU_DDTP_PPN | RISCV_IOMMU_DDTP_MODE)); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQB], + ~(RISCV_IOMMU_CQB_LOG2SZ | RISCV_IOMMU_CQB_PPN)); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQB], + ~(RISCV_IOMMU_FQB_LOG2SZ | RISCV_IOMMU_FQB_PPN)); + stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQB], + ~(RISCV_IOMMU_PQB_LOG2SZ | RISCV_IOMMU_PQB_PPN)); + stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQMF | + RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQON | + RISCV_IOMMU_CQCSR_BUSY); + stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQMF | + RISCV_IOMMU_FQCSR_FQOF); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQON | + RISCV_IOMMU_FQCSR_BUSY); + stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQMF | + RISCV_IOMMU_PQCSR_PQOF); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQON | + RISCV_IOMMU_PQCSR_BUSY); + stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0); + stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_ICVEC], 0); + stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp); + + /* Memory region for downstream access, if specified. */ + if (s->target_mr) { + s->target_as = g_new0(AddressSpace, 1); + address_space_init(s->target_as, s->target_mr, + "riscv-iommu-downstream"); + } else { + /* Fallback to global system memory. 
*/ + s->target_as = &address_space_memory; + } + + /* Memory region for untranslated MRIF/MSI writes */ + memory_region_init_io(&s->trap_mr, OBJECT(dev), &riscv_iommu_trap_ops, s, + "riscv-iommu-trap", ~0ULL); + address_space_init(&s->trap_as, &s->trap_mr, "riscv-iommu-trap-as"); + + /* Device translation context cache */ + s->ctx_cache = g_hash_table_new_full(riscv_iommu_ctx_hash, + riscv_iommu_ctx_equal, + g_free, NULL); + + s->iommus.le_next = NULL; + s->iommus.le_prev = NULL; + QLIST_INIT(&s->spaces); +} + +static void riscv_iommu_unrealize(DeviceState *dev) +{ + RISCVIOMMUState *s = RISCV_IOMMU(dev); + + g_hash_table_unref(s->ctx_cache); +} + +static Property riscv_iommu_properties[] = { + DEFINE_PROP_UINT32("version", RISCVIOMMUState, version, + RISCV_IOMMU_SPEC_DOT_VER), + DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0), + DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE), + DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE), + DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE), + DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE), + DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr, + TYPE_MEMORY_REGION, MemoryRegion *), + DEFINE_PROP_END_OF_LIST(), +}; + +static void riscv_iommu_class_init(ObjectClass *klass, void* data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + /* internal device for riscv-iommu-{pci/sys}, not user-creatable */ + dc->user_creatable = false; + dc->realize = riscv_iommu_realize; + dc->unrealize = riscv_iommu_unrealize; + device_class_set_props(dc, riscv_iommu_properties); +} + +static const TypeInfo riscv_iommu_info = { + .name = TYPE_RISCV_IOMMU, + .parent = TYPE_DEVICE, + .instance_size = sizeof(RISCVIOMMUState), + .class_init = riscv_iommu_class_init, +}; + +static const char *IOMMU_FLAG_STR[] = { + "NA", + "RO", + "WR", + "RW", +}; + +/* RISC-V IOMMU Memory Region - Address Translation Space */ +static IOMMUTLBEntry riscv_iommu_memory_region_translate( + IOMMUMemoryRegion *iommu_mr, hwaddr addr, + IOMMUAccessFlags flag, int iommu_idx) +{ + RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr); + RISCVIOMMUContext *ctx; + void *ref; + IOMMUTLBEntry iotlb = { + .iova = addr, + .target_as = as->iommu->target_as, + .addr_mask = ~0ULL, + .perm = flag, + }; + + ctx = riscv_iommu_ctx(as->iommu, as->devid, iommu_idx, &ref); + if (ctx == NULL) { + /* Translation disabled or invalid. */ + iotlb.addr_mask = 0; + iotlb.perm = IOMMU_NONE; + } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) { + /* Translation disabled or fault reported. */ + iotlb.addr_mask = 0; + iotlb.perm = IOMMU_NONE; + } + + /* Trace all dma translations with original access flags. 
*/ + trace_riscv_iommu_dma(as->iommu->parent_obj.id, PCI_BUS_NUM(as->devid), + PCI_SLOT(as->devid), PCI_FUNC(as->devid), iommu_idx, + IOMMU_FLAG_STR[flag & IOMMU_RW], iotlb.iova, + iotlb.translated_addr); + + riscv_iommu_ctx_put(as->iommu, ref); + + return iotlb; +} + +static int riscv_iommu_memory_region_notify( + IOMMUMemoryRegion *iommu_mr, IOMMUNotifierFlag old, + IOMMUNotifierFlag new, Error **errp) +{ + RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr); + + if (old == IOMMU_NOTIFIER_NONE) { + as->notifier = true; + trace_riscv_iommu_notifier_add(iommu_mr->parent_obj.name); + } else if (new == IOMMU_NOTIFIER_NONE) { + as->notifier = false; + trace_riscv_iommu_notifier_del(iommu_mr->parent_obj.name); + } + + return 0; +} + +static inline bool pci_is_iommu(PCIDevice *pdev) +{ + return pci_get_word(pdev->config + PCI_CLASS_DEVICE) == 0x0806; +} + +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn) +{ + RISCVIOMMUState *s = (RISCVIOMMUState *) opaque; + PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn); + AddressSpace *as = NULL; + + if (pdev && pci_is_iommu(pdev)) { + return s->target_as; + } + + /* Find first registered IOMMU device */ + while (s->iommus.le_prev) { + s = *(s->iommus.le_prev); + } + + /* Find first matching IOMMU */ + while (s != NULL && as == NULL) { + as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn)); + s = s->iommus.le_next; + } + + return as ? as : &address_space_memory; +} + +static const PCIIOMMUOps riscv_iommu_ops = { + .get_address_space = riscv_iommu_find_as, +}; + +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus, + Error **errp) +{ + if (bus->iommu_ops && + bus->iommu_ops->get_address_space == riscv_iommu_find_as) { + /* Allow multiple IOMMUs on the same PCIe bus, link known devices */ + RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque; + QLIST_INSERT_AFTER(last, iommu, iommus); + } else if (!bus->iommu_ops && !bus->iommu_opaque) { + pci_setup_iommu(bus, &riscv_iommu_ops, iommu); + } else { + error_setg(errp, "can't register secondary IOMMU for PCI bus #%d", + pci_bus_num(bus)); + } +} + +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr, + MemTxAttrs attrs) +{ + return attrs.unspecified ? 
RISCV_IOMMU_NOPROCID : (int)attrs.pid; +} + +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr) +{ + RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr); + return 1 << as->iommu->pid_bits; +} + +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data) +{ + IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass); + + imrc->translate = riscv_iommu_memory_region_translate; + imrc->notify_flag_changed = riscv_iommu_memory_region_notify; + imrc->attrs_to_index = riscv_iommu_memory_region_index; + imrc->num_indexes = riscv_iommu_memory_region_index_len; +} + +static const TypeInfo riscv_iommu_memory_region_info = { + .parent = TYPE_IOMMU_MEMORY_REGION, + .name = TYPE_RISCV_IOMMU_MEMORY_REGION, + .class_init = riscv_iommu_memory_region_init, +}; + +static void riscv_iommu_register_mr_types(void) +{ + type_register_static(&riscv_iommu_memory_region_info); + type_register_static(&riscv_iommu_info); +} + +type_init(riscv_iommu_register_mr_types); diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of an RISC-V IOMMU + * + * Copyright (C) 2022-2023 Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2 or later, as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef HW_RISCV_IOMMU_STATE_H +#define HW_RISCV_IOMMU_STATE_H + +#include "qom/object.h" +#include "hw/riscv/iommu.h" + +struct RISCVIOMMUState { + /*< private >*/ + DeviceState parent_obj; + + /*< public >*/ + uint32_t version; /* Reported interface version number */ + uint32_t pid_bits; /* process identifier width */ + uint32_t bus; /* PCI bus mapping for non-root endpoints */ + + uint64_t cap; /* IOMMU supported capabilities */ + uint64_t fctl; /* IOMMU enabled features */ + uint64_t icvec_avail_vectors; /* Available interrupt vectors in ICVEC */ + + bool enable_off; /* Enable out-of-reset OFF mode (DMA disabled) */ + bool enable_msi; /* Enable MSI remapping */ + bool enable_s_stage; /* Enable S/VS-Stage translation */ + bool enable_g_stage; /* Enable G-Stage translation */ + + /* IOMMU Internal State */ + uint64_t ddtp; /* Validated Device Directory Tree Root Pointer */ + + dma_addr_t cq_addr; /* Command queue base physical address */ + dma_addr_t fq_addr; /* Fault/event queue base physical address */ + dma_addr_t pq_addr; /* Page request queue base physical address */ + + uint32_t cq_mask; /* Command queue index bit mask */ + uint32_t fq_mask; /* Fault/event queue index bit mask */ + uint32_t pq_mask; /* Page request queue index bit mask */ + + /* interrupt notifier */ + void (*notify)(RISCVIOMMUState *iommu, unsigned vector); + + /* IOMMU State Machine */ + QemuThread core_proc; /* Background processing thread */ + QemuCond core_cond; /* Background processing wake up signal */ + unsigned core_exec; /* Processing thread execution actions */ + + /* IOMMU target address space */ + AddressSpace *target_as; + MemoryRegion *target_mr; + + /* MSI / MRIF access trap */ + AddressSpace trap_as; + MemoryRegion trap_mr; + + GHashTable *ctx_cache; /* Device translation Context Cache */ + + /* MMIO Hardware Interface */ + MemoryRegion regs_mr; + uint8_t *regs_rw; /* register state (user write) */ + uint8_t *regs_wc; /* write-1-to-clear mask */ + uint8_t *regs_ro; /* read-only mask */ + + QLIST_ENTRY(RISCVIOMMUState) iommus; + QLIST_HEAD(, RISCVIOMMUSpace) spaces; +}; + +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus, + Error **errp); + +/* private helpers */ + +/* Register helper functions */ +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s, + unsigned idx, uint32_t set, uint32_t clr) +{ + uint32_t val = ldl_le_p(s->regs_rw + idx); + stl_le_p(s->regs_rw + idx, (val & ~clr) | set); + return val; +} + +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s, unsigned idx, + uint32_t set) +{ + stl_le_p(s->regs_rw + idx, set); +} + +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s, unsigned idx) +{ + return ldl_le_p(s->regs_rw + idx); +} + +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s, unsigned idx, + uint64_t set, uint64_t clr) +{ + uint64_t val = ldq_le_p(s->regs_rw + idx); + stq_le_p(s->regs_rw + idx, (val & ~clr) | set); + return val; +} + +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s, unsigned idx, + uint64_t set) +{ + stq_le_p(s->regs_rw + idx, set); +} + +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s, + unsigned idx) +{ + return ldq_le_p(s->regs_rw + idx); +} +#endif diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/trace-events @@ -XXX,XX +XXX,XX @@ +# See documentation at docs/devel/tracing.rst + +# riscv-iommu.c +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device 
attached %04x:%02x.%d" +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64 +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64 +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64 +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64 +riscv_iommu_mrif_notification(const char *id, uint32_t nid, uint64_t phys) "%s: sent MRIF notification 0x%x to 0x%"PRIx64 +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64 +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added" +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed" +riscv_iommu_notify_int_vector(uint32_t cause, uint32_t vector) "Interrupt cause 0x%x sent via vector 0x%x" +riscv_iommu_icvec_write(uint32_t orig, uint32_t actual) "ICVEC write: incoming 0x%x actual 0x%x" diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/trace.h @@ -0,0 +1 @@ +#include "trace/trace-hw_riscv.h" diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/hw/riscv/iommu.h @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of an RISC-V IOMMU + * + * Copyright (C) 2022-2023 Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2 or later, as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef HW_RISCV_IOMMU_H +#define HW_RISCV_IOMMU_H + +#include "qemu/osdep.h" +#include "qom/object.h" + +#define TYPE_RISCV_IOMMU "riscv-iommu" +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU) +typedef struct RISCVIOMMUState RISCVIOMMUState; + +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr" +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace; + +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci" +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI) +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci; + +#endif diff --git a/meson.build b/meson.build index XXXXXXX..XXXXXXX 100644 --- a/meson.build +++ b/meson.build @@ -XXX,XX +XXX,XX @@ if have_system 'hw/pci-host', 'hw/ppc', 'hw/rtc', + 'hw/riscv', 'hw/s390x', 'hw/scsi', 'hw/sd', -- 2.45.2
The RISC-V IOMMU PCI device we're going to add next is a reference
implementation of the riscv-iommu spec [1], which allows the IOMMU to be
implemented as a PCIe device. However, RISC-V International (RVI), the entity
that ratified the riscv-iommu spec, didn't assign a PCI ID for the IOMMU PCIe
implementation that the spec describes. This puts us in an uncommon situation:
we want to add the reference IOMMU PCIe implementation but we don't have a
PCI ID for it. Given that RVI doesn't provide a PCI ID for it, we reached out
to Red Hat and Gerd Hoffmann, and they were kind enough to give us a PCI ID
for the RISC-V IOMMU PCI reference device.

Thanks Red Hat and Gerd for this RISC-V IOMMU PCIe device ID.

[1] https://github.com/riscv-non-isa/riscv-iommu/releases/tag/v1.0.0

Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
---
 docs/specs/pci-ids.rst | 2 ++
 include/hw/pci/pci.h   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/docs/specs/pci-ids.rst b/docs/specs/pci-ids.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/specs/pci-ids.rst
+++ b/docs/specs/pci-ids.rst
@@ -XXX,XX +XXX,XX @@ PCI devices (other than virtio):
   PCI ACPI ERST device (``-device acpi-erst``)
 1b36:0013
   PCI UFS device (``-device ufs``)
+1b36:0014
+  PCI RISC-V IOMMU device

 All these devices are documented in :doc:`index`.

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -XXX,XX +XXX,XX @@ extern bool pci_available;
 #define PCI_DEVICE_ID_REDHAT_PVPANIC     0x0011
 #define PCI_DEVICE_ID_REDHAT_ACPI_ERST   0x0012
 #define PCI_DEVICE_ID_REDHAT_UFS         0x0013
+#define PCI_DEVICE_ID_REDHAT_RISCV_IOMMU 0x0014
 #define PCI_DEVICE_ID_REDHAT_QXL         0x0100

 #define FMT_PCIBUS PRIx64
-- 2.45.2
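With the ID reserved, the riscv-iommu-pci device added in the next patch can
default to 1b36:0014. Assuming a riscv64 build and the default 'virt'
machine, it can be instantiated with:

    qemu-system-riscv64 -M virt -device riscv-iommu-pci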
From: Tomasz Jeznach <tjeznach@rivosinc.com> The RISC-V IOMMU can be modelled as a PCIe device following the guidelines of the RISC-V IOMMU spec, chapter 7.1, "Integrating an IOMMU as a PCIe device". Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> --- hw/riscv/meson.build | 2 +- hw/riscv/riscv-iommu-pci.c | 202 +++++++++++++++++++++++++++++++++++++ 2 files changed, 203 insertions(+), 1 deletion(-) create mode 100644 hw/riscv/riscv-iommu-pci.c diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/meson.build +++ b/hw/riscv/meson.build @@ -XXX,XX +XXX,XX @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c')) riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c')) riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c')) riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c')) -riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c')) +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c')) hw_arch += {'riscv': riscv_ss} diff --git a/hw/riscv/riscv-iommu-pci.c b/hw/riscv/riscv-iommu-pci.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/riscv/riscv-iommu-pci.c @@ -XXX,XX +XXX,XX @@ +/* + * QEMU emulation of an RISC-V IOMMU + * + * Copyright (C) 2022-2023 Rivos Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2 or later, as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#include "qemu/osdep.h" +#include "hw/pci/msi.h" +#include "hw/pci/msix.h" +#include "hw/pci/pci_bus.h" +#include "hw/qdev-properties.h" +#include "hw/riscv/riscv_hart.h" +#include "migration/vmstate.h" +#include "qapi/error.h" +#include "qemu/error-report.h" +#include "qemu/host-utils.h" +#include "qom/object.h" + +#include "cpu_bits.h" +#include "riscv-iommu.h" +#include "riscv-iommu-bits.h" + +/* RISC-V IOMMU PCI Device Emulation */ +#define RISCV_PCI_CLASS_SYSTEM_IOMMU 0x0806 + +/* + * 4 MSIx vectors for ICVEC, one for MRIF. The spec mentions in + * the "Placement and data flow" section that: + * + * "The interfaces related to recording an incoming MSI in a memory-resident + * interrupt file (MRIF) are implementation-specific. The partitioning of + * responsibility between the IOMMU and the IO bridge for recording the + * incoming MSI in an MRIF and generating the associated notice MSI are + * implementation-specific." + * + * We're making a design decision to create the MSIx for MRIF in the + * IOMMU MSIx emulation. + */ +#define RISCV_IOMMU_PCI_MSIX_VECTORS 5 + +/* + * 4 vectors that can be used by civ, fiv, pmiv and piv. Number of + * vectors is represented by 2^N, where N = number of writable bits + * in each cause. For 4 vectors we'll write 0b11 (3) in each reg. 
+ */ +#define RISCV_IOMMU_PCI_ICVEC_VECTORS 0x3333 + +typedef struct RISCVIOMMUStatePci { + PCIDevice pci; /* Parent PCIe device state */ + uint16_t vendor_id; + uint16_t device_id; + uint8_t revision; + MemoryRegion bar0; /* PCI BAR (including MSI-x config) */ + RISCVIOMMUState iommu; /* common IOMMU state */ +} RISCVIOMMUStatePci; + +/* interrupt delivery callback */ +static void riscv_iommu_pci_notify(RISCVIOMMUState *iommu, unsigned vector) +{ + RISCVIOMMUStatePci *s = container_of(iommu, RISCVIOMMUStatePci, iommu); + + if (msix_enabled(&(s->pci))) { + msix_notify(&(s->pci), vector); + } +} + +static void riscv_iommu_pci_realize(PCIDevice *dev, Error **errp) +{ + RISCVIOMMUStatePci *s = DO_UPCAST(RISCVIOMMUStatePci, pci, dev); + RISCVIOMMUState *iommu = &s->iommu; + uint8_t *pci_conf = dev->config; + Error *err = NULL; + + pci_set_word(pci_conf + PCI_VENDOR_ID, s->vendor_id); + pci_set_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID, s->vendor_id); + pci_set_word(pci_conf + PCI_DEVICE_ID, s->device_id); + pci_set_word(pci_conf + PCI_SUBSYSTEM_ID, s->device_id); + pci_set_byte(pci_conf + PCI_REVISION_ID, s->revision); + + /* Set device id for trace / debug */ + DEVICE(iommu)->id = g_strdup_printf("%02x:%02x.%01x", + pci_dev_bus_num(dev), PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn)); + qdev_realize(DEVICE(iommu), NULL, errp); + + memory_region_init(&s->bar0, OBJECT(s), "riscv-iommu-bar0", + QEMU_ALIGN_UP(memory_region_size(&iommu->regs_mr), TARGET_PAGE_SIZE)); + memory_region_add_subregion(&s->bar0, 0, &iommu->regs_mr); + + pcie_endpoint_cap_init(dev, 0); + + pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY | + PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar0); + + int ret = msix_init(dev, RISCV_IOMMU_PCI_MSIX_VECTORS, + &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG, + &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG + 256, 0, &err); + + if (ret == -ENOTSUP) { + /* + * MSI-x is not supported by the platform. + * Driver should use timer/polling based notification handlers. 
+ */ + warn_report_err(err); + } else if (ret < 0) { + error_propagate(errp, err); + return; + } else { + /* Mark all ICVEC MSIx vectors as used */ + for (int i = 0; i < RISCV_IOMMU_PCI_MSIX_VECTORS; i++) { + msix_vector_use(dev, i); + } + + iommu->notify = riscv_iommu_pci_notify; + } + + PCIBus *bus = pci_device_root_bus(dev); + if (!bus) { + error_setg(errp, "can't find PCIe root port for %02x:%02x.%x", + pci_bus_num(pci_get_bus(dev)), PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn)); + return; + } + + riscv_iommu_pci_setup_iommu(iommu, bus, errp); +} + +static void riscv_iommu_pci_exit(PCIDevice *pci_dev) +{ + pci_setup_iommu(pci_device_root_bus(pci_dev), NULL, NULL); +} + +static const VMStateDescription riscv_iommu_vmstate = { + .name = "riscv-iommu", + .unmigratable = 1 +}; + +static void riscv_iommu_pci_init(Object *obj) +{ + RISCVIOMMUStatePci *s = RISCV_IOMMU_PCI(obj); + RISCVIOMMUState *iommu = &s->iommu; + + object_initialize_child(obj, "iommu", iommu, TYPE_RISCV_IOMMU); + qdev_alias_all_properties(DEVICE(iommu), obj); + + iommu->icvec_avail_vectors = RISCV_IOMMU_PCI_ICVEC_VECTORS; +} + +static Property riscv_iommu_pci_properties[] = { + DEFINE_PROP_UINT16("vendor-id", RISCVIOMMUStatePci, vendor_id, + PCI_VENDOR_ID_REDHAT), + DEFINE_PROP_UINT16("device-id", RISCVIOMMUStatePci, device_id, + PCI_DEVICE_ID_REDHAT_RISCV_IOMMU), + DEFINE_PROP_UINT8("revision", RISCVIOMMUStatePci, revision, 0x01), + DEFINE_PROP_END_OF_LIST(), +}; + +static void riscv_iommu_pci_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + + k->realize = riscv_iommu_pci_realize; + k->exit = riscv_iommu_pci_exit; + k->class_id = RISCV_PCI_CLASS_SYSTEM_IOMMU; + dc->desc = "RISCV-IOMMU DMA Remapping device"; + dc->vmsd = &riscv_iommu_vmstate; + dc->hotpluggable = false; + dc->user_creatable = true; + set_bit(DEVICE_CATEGORY_MISC, dc->categories); + device_class_set_props(dc, riscv_iommu_pci_properties); +} + +static const TypeInfo riscv_iommu_pci = { + .name = TYPE_RISCV_IOMMU_PCI, + .parent = TYPE_PCI_DEVICE, + .class_init = riscv_iommu_pci_class_init, + .instance_init = riscv_iommu_pci_init, + .instance_size = sizeof(RISCVIOMMUStatePci), + .interfaces = (InterfaceInfo[]) { + { INTERFACE_PCIE_DEVICE }, + { }, + }, +}; + +static void riscv_iommu_register_pci_types(void) +{ + type_register_static(&riscv_iommu_pci); +} + +type_init(riscv_iommu_register_pci_types); -- 2.45.2
From: Tomasz Jeznach <tjeznach@rivosinc.com> Generate device tree entry for riscv-iommu PCI device, along with mapping all PCI device identifiers to the single IOMMU device instance. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> --- hw/riscv/virt.c | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/virt.c +++ b/hw/riscv/virt.c @@ -XXX,XX +XXX,XX @@ #include "hw/core/sysbus-fdt.h" #include "target/riscv/pmu.h" #include "hw/riscv/riscv_hart.h" +#include "hw/riscv/iommu.h" #include "hw/riscv/virt.h" #include "hw/riscv/boot.h" #include "hw/riscv/numa.h" @@ -XXX,XX +XXX,XX @@ static void create_fdt_virtio_iommu(RISCVVirtState *s, uint16_t bdf) bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf); } +static void create_fdt_iommu(RISCVVirtState *s, uint16_t bdf) +{ + const char comp[] = "riscv,pci-iommu"; + void *fdt = MACHINE(s)->fdt; + uint32_t iommu_phandle; + g_autofree char *iommu_node = NULL; + g_autofree char *pci_node = NULL; + + pci_node = g_strdup_printf("/soc/pci@%lx", + (long) virt_memmap[VIRT_PCIE_ECAM].base); + iommu_node = g_strdup_printf("%s/iommu@%x", pci_node, bdf); + iommu_phandle = qemu_fdt_alloc_phandle(fdt); + qemu_fdt_add_subnode(fdt, iommu_node); + + qemu_fdt_setprop(fdt, iommu_node, "compatible", comp, sizeof(comp)); + qemu_fdt_setprop_cell(fdt, iommu_node, "#iommu-cells", 1); + qemu_fdt_setprop_cell(fdt, iommu_node, "phandle", iommu_phandle); + qemu_fdt_setprop_cells(fdt, iommu_node, "reg", + bdf << 8, 0, 0, 0, 0); + qemu_fdt_setprop_cells(fdt, pci_node, "iommu-map", + 0, iommu_phandle, 0, bdf, + bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf); +} + static void finalize_fdt(RISCVVirtState *s) { uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1; @@ -XXX,XX +XXX,XX @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine, MachineClass *mc = MACHINE_GET_CLASS(machine); if (device_is_dynamic_sysbus(mc, dev) || - object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) { + object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI) || + object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) { return HOTPLUG_HANDLER(machine); } + return NULL; } @@ -XXX,XX +XXX,XX @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev, if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) { create_fdt_virtio_iommu(s, pci_get_bdf(PCI_DEVICE(dev))); } + + if (object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) { + create_fdt_iommu(s, pci_get_bdf(PCI_DEVICE(dev))); + } } static void virt_machine_class_init(ObjectClass *oc, void *data) -- 2.45.2
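For illustration, with the IOMMU at 00:01.0 (bdf 0x8) the code above would
generate roughly the following fragment; the ECAM base and the phandle label
are hypothetical, and note how the two iommu-map entries carve the IOMMU's
own requester ID out of the mapping:

    pci@30000000 {
            iommu-map = <0x0 &pci_iommu 0x0 0x8>,
                        <0x9 &pci_iommu 0x9 0xfff7>;

            pci_iommu: iommu@8 {
                    compatible = "riscv,pci-iommu";
                    #iommu-cells = <1>;
                    reg = <0x800 0x0 0x0 0x0 0x0>;
            };
    };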
To test the RISC-V IOMMU emulation we'll use its PCI representation. Create a new 'riscv-iommu-pci' libqos device that will be present with CONFIG_RISCV_IOMMU. This config is only available for RISC-V, so this device will only be consumed by the RISC-V libqos machine. Start with basic tests: a PCI sanity check and a reset state register test. The reset test was taken from the RISC-V IOMMU spec chapter 5.2, "Reset behavior". More tests will be added later. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> --- tests/qtest/libqos/meson.build | 4 ++ tests/qtest/libqos/riscv-iommu.c | 76 ++++++++++++++++++++++++++++ tests/qtest/libqos/riscv-iommu.h | 71 ++++++++++++++++++++++++++ tests/qtest/meson.build | 1 + tests/qtest/riscv-iommu-test.c | 85 ++++++++++++++++++++++++++++++++ 5 files changed, 237 insertions(+) create mode 100644 tests/qtest/libqos/riscv-iommu.c create mode 100644 tests/qtest/libqos/riscv-iommu.h create mode 100644 tests/qtest/riscv-iommu-test.c diff --git a/tests/qtest/libqos/meson.build b/tests/qtest/libqos/meson.build index XXXXXXX..XXXXXXX 100644 --- a/tests/qtest/libqos/meson.build +++ b/tests/qtest/libqos/meson.build @@ -XXX,XX +XXX,XX @@ if have_virtfs libqos_srcs += files('virtio-9p.c', 'virtio-9p-client.c') endif +if config_all_devices.has_key('CONFIG_RISCV_IOMMU') + libqos_srcs += files('riscv-iommu.c') +endif + libqos = static_library('qos', libqos_srcs + genh, build_by_default: false) diff --git a/tests/qtest/libqos/riscv-iommu.c b/tests/qtest/libqos/riscv-iommu.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qtest/libqos/riscv-iommu.c @@ -XXX,XX +XXX,XX @@ +/* + * libqos driver riscv-iommu-pci framework + * + * Copyright (c) 2024 Ventana Micro Systems Inc. + * + * This work is licensed under the terms of the GNU GPL, version 2 or (at your + * option) any later version. See the COPYING file in the top-level directory. 
+ * + */ + +#include "qemu/osdep.h" +#include "../libqtest.h" +#include "qemu/module.h" +#include "qgraph.h" +#include "pci.h" +#include "riscv-iommu.h" + +static void *riscv_iommu_pci_get_driver(void *obj, const char *interface) +{ + QRISCVIOMMU *r_iommu_pci = obj; + + if (!g_strcmp0(interface, "pci-device")) { + return &r_iommu_pci->dev; + } + + fprintf(stderr, "%s not present in riscv_iommu_pci\n", interface); + g_assert_not_reached(); +} + +static void riscv_iommu_pci_start_hw(QOSGraphObject *obj) +{ + QRISCVIOMMU *pci = (QRISCVIOMMU *)obj; + qpci_device_enable(&pci->dev); +} + +static void riscv_iommu_pci_destructor(QOSGraphObject *obj) +{ + QRISCVIOMMU *pci = (QRISCVIOMMU *)obj; + qpci_iounmap(&pci->dev, pci->reg_bar); +} + +static void *riscv_iommu_pci_create(void *pci_bus, QGuestAllocator *alloc, + void *addr) +{ + QRISCVIOMMU *r_iommu_pci = g_new0(QRISCVIOMMU, 1); + QPCIBus *bus = pci_bus; + + qpci_device_init(&r_iommu_pci->dev, bus, addr); + r_iommu_pci->reg_bar = qpci_iomap(&r_iommu_pci->dev, 0, NULL); + + r_iommu_pci->obj.get_driver = riscv_iommu_pci_get_driver; + r_iommu_pci->obj.start_hw = riscv_iommu_pci_start_hw; + r_iommu_pci->obj.destructor = riscv_iommu_pci_destructor; + return &r_iommu_pci->obj; +} + +static void riscv_iommu_pci_register_nodes(void) +{ + QPCIAddress addr = { + .vendor_id = RISCV_IOMMU_PCI_VENDOR_ID, + .device_id = RISCV_IOMMU_PCI_DEVICE_ID, + .devfn = QPCI_DEVFN(1, 0), + }; + + QOSGraphEdgeOptions opts = { + .extra_device_opts = "addr=01.0", + }; + + add_qpci_address(&opts, &addr); + + qos_node_create_driver("riscv-iommu-pci", riscv_iommu_pci_create); + qos_node_produces("riscv-iommu-pci", "pci-device"); + qos_node_consumes("riscv-iommu-pci", "pci-bus", &opts); +} + +libqos_init(riscv_iommu_pci_register_nodes); diff --git a/tests/qtest/libqos/riscv-iommu.h b/tests/qtest/libqos/riscv-iommu.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qtest/libqos/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ +/* + * libqos driver riscv-iommu-pci framework + * + * Copyright (c) 2024 Ventana Micro Systems Inc. + * + * This work is licensed under the terms of the GNU GPL, version 2 or (at your + * option) any later version. See the COPYING file in the top-level directory. + * + */ + +#ifndef TESTS_LIBQOS_RISCV_IOMMU_H +#define TESTS_LIBQOS_RISCV_IOMMU_H + +#include "qgraph.h" +#include "pci.h" +#include "qemu/bitops.h" + +#ifndef GENMASK_ULL +#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l)) +#endif + +/* + * RISC-V IOMMU uses PCI_VENDOR_ID_REDHAT 0x1b36 and + * PCI_DEVICE_ID_REDHAT_RISCV_IOMMU 0x0014. 
+ */ +#define RISCV_IOMMU_PCI_VENDOR_ID 0x1b36 +#define RISCV_IOMMU_PCI_DEVICE_ID 0x0014 +#define RISCV_IOMMU_PCI_DEVICE_CLASS 0x0806 + +/* Common field positions */ +#define RISCV_IOMMU_QUEUE_ENABLE BIT(0) +#define RISCV_IOMMU_QUEUE_INTR_ENABLE BIT(1) +#define RISCV_IOMMU_QUEUE_MEM_FAULT BIT(8) +#define RISCV_IOMMU_QUEUE_ACTIVE BIT(16) +#define RISCV_IOMMU_QUEUE_BUSY BIT(17) + +#define RISCV_IOMMU_REG_CAP 0x0000 +#define RISCV_IOMMU_CAP_VERSION GENMASK_ULL(7, 0) + +#define RISCV_IOMMU_REG_DDTP 0x0010 +#define RISCV_IOMMU_DDTP_BUSY BIT_ULL(4) +#define RISCV_IOMMU_DDTP_MODE GENMASK_ULL(3, 0) +#define RISCV_IOMMU_DDTP_MODE_OFF 0 + +#define RISCV_IOMMU_REG_CQCSR 0x0048 +#define RISCV_IOMMU_CQCSR_CQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_CQCSR_CIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_CQCSR_CQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_CQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +#define RISCV_IOMMU_REG_FQCSR 0x004C +#define RISCV_IOMMU_FQCSR_FQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_FQCSR_FIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_FQCSR_FQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_FQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +#define RISCV_IOMMU_REG_PQCSR 0x0050 +#define RISCV_IOMMU_PQCSR_PQEN RISCV_IOMMU_QUEUE_ENABLE +#define RISCV_IOMMU_PQCSR_PIE RISCV_IOMMU_QUEUE_INTR_ENABLE +#define RISCV_IOMMU_PQCSR_PQON RISCV_IOMMU_QUEUE_ACTIVE +#define RISCV_IOMMU_PQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY + +#define RISCV_IOMMU_REG_IPSR 0x0054 + +typedef struct QRISCVIOMMU { + QOSGraphObject obj; + QPCIDevice dev; + QPCIBar reg_bar; +} QRISCVIOMMU; + +#endif diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build index XXXXXXX..XXXXXXX 100644 --- a/tests/qtest/meson.build +++ b/tests/qtest/meson.build @@ -XXX,XX +XXX,XX @@ qos_test_ss.add( 'vmxnet3-test.c', 'igb-test.c', 'ufs-test.c', + 'riscv-iommu-test.c', ) if config_all_devices.has_key('CONFIG_VIRTIO_SERIAL') diff --git a/tests/qtest/riscv-iommu-test.c b/tests/qtest/riscv-iommu-test.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qtest/riscv-iommu-test.c @@ -XXX,XX +XXX,XX @@ +/* + * QTest testcase for RISC-V IOMMU + * + * Copyright (c) 2024 Ventana Micro Systems Inc. + * + * This work is licensed under the terms of the GNU GPL, version 2 or (at your + * option) any later version. See the COPYING file in the top-level directory. 
+ * + */ + +#include "qemu/osdep.h" +#include "libqtest-single.h" +#include "qemu/module.h" +#include "libqos/qgraph.h" +#include "libqos/riscv-iommu.h" +#include "hw/pci/pci_regs.h" + +static uint32_t riscv_iommu_read_reg32(QRISCVIOMMU *r_iommu, int reg_offset) +{ + return qpci_io_readl(&r_iommu->dev, r_iommu->reg_bar, reg_offset); +} + +static uint64_t riscv_iommu_read_reg64(QRISCVIOMMU *r_iommu, int reg_offset) +{ + return qpci_io_readq(&r_iommu->dev, r_iommu->reg_bar, reg_offset); +} + +static void test_pci_config(void *obj, void *data, QGuestAllocator *t_alloc) +{ + QRISCVIOMMU *r_iommu = obj; + QPCIDevice *dev = &r_iommu->dev; + uint16_t vendorid, deviceid, classid; + + vendorid = qpci_config_readw(dev, PCI_VENDOR_ID); + deviceid = qpci_config_readw(dev, PCI_DEVICE_ID); + classid = qpci_config_readw(dev, PCI_CLASS_DEVICE); + + g_assert_cmpuint(vendorid, ==, RISCV_IOMMU_PCI_VENDOR_ID); + g_assert_cmpuint(deviceid, ==, RISCV_IOMMU_PCI_DEVICE_ID); + g_assert_cmpuint(classid, ==, RISCV_IOMMU_PCI_DEVICE_CLASS); +} + +static void test_reg_reset(void *obj, void *data, QGuestAllocator *t_alloc) +{ + QRISCVIOMMU *r_iommu = obj; + uint64_t cap; + uint32_t reg; + + cap = riscv_iommu_read_reg64(r_iommu, RISCV_IOMMU_REG_CAP); + g_assert_cmpuint(cap & RISCV_IOMMU_CAP_VERSION, ==, 0x10); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR); + g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CQEN, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CIE, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CQON, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_BUSY, ==, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR); + g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FQEN, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FIE, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FQON, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_BUSY, ==, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR); + g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PQEN, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PIE, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PQON, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_BUSY, ==, 0); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_DDTP); + g_assert_cmpuint(reg & RISCV_IOMMU_DDTP_BUSY, ==, 0); + g_assert_cmpuint(reg & RISCV_IOMMU_DDTP_MODE, ==, + RISCV_IOMMU_DDTP_MODE_OFF); + + reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IPSR); + g_assert_cmpuint(reg, ==, 0); +} + +static void register_riscv_iommu_test(void) +{ + qos_add_test("pci_config", "riscv-iommu-pci", test_pci_config, NULL); + qos_add_test("reg_reset", "riscv-iommu-pci", test_reg_reset, NULL); +} + +libqos_init(register_riscv_iommu_test); -- 2.45.2
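The new tests hook into the qos graph, so they run along with the rest of
the device tests (e.g. via 'make check-qtest-riscv64' on a riscv64 build).
Follow-up tests that need to program registers could mirror the read helpers
with write counterparts; a sketch using the existing libqos BAR accessors
(the helper names are hypothetical):

/*
 * Hypothetical write counterparts to riscv_iommu_read_reg32/64(),
 * built on qpci_io_writel()/qpci_io_writeq() from tests/qtest/libqos.
 */
static void riscv_iommu_write_reg32(QRISCVIOMMU *r_iommu, int reg_offset,
                                    uint32_t val)
{
    qpci_io_writel(&r_iommu->dev, r_iommu->reg_bar, reg_offset, val);
}

static void riscv_iommu_write_reg64(QRISCVIOMMU *r_iommu, int reg_offset,
                                    uint64_t val)
{
    qpci_io_writeq(&r_iommu->dev, r_iommu->reg_bar, reg_offset, val);
}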
From: Tomasz Jeznach <tjeznach@rivosinc.com>

The RISC-V IOMMU spec allows the IOMMU to use translation caches to hold
entries from the DDT. This patch adds such a cache, implementing all the
cache commands that were so far marked as 'not implemented'.

The cache entries already carry s-stage and g-stage elements (GSCID and
PSCID), although we don't support those stages yet. We'll introduce them
next.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
 hw/riscv/riscv-iommu.c | 204 ++++++++++++++++++++++++++++++++++++++++-
 hw/riscv/riscv-iommu.h |   3 +
 2 files changed, 203 insertions(+), 4 deletions(-)

diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -XXX,XX +XXX,XX @@ struct RISCVIOMMUContext {
     uint64_t msiptp;        /* MSI redirection page table pointer */
 };

+/* Address translation cache entry */
+struct RISCVIOMMUEntry {
+    uint64_t iova:44;       /* IOVA Page Number */
+    uint64_t pscid:20;      /* Process Soft-Context identifier */
+    uint64_t phys:44;       /* Physical Page Number */
+    uint64_t gscid:16;      /* Guest Soft-Context identifier */
+    uint64_t perm:2;        /* IOMMU_RW flags */
+};
+
 /* IOMMU index for transactions without process_id specified. */
 #define RISCV_IOMMU_NOPROCID 0

@@ -XXX,XX +XXX,XX @@ static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
     return &as->iova_as;
 }

+/* Translation Object cache support */
+static gboolean riscv_iommu_iot_equal(gconstpointer v1, gconstpointer v2)
+{
+    RISCVIOMMUEntry *t1 = (RISCVIOMMUEntry *) v1;
+    RISCVIOMMUEntry *t2 = (RISCVIOMMUEntry *) v2;
+    return t1->gscid == t2->gscid && t1->pscid == t2->pscid &&
+           t1->iova == t2->iova;
+}
+
+static guint riscv_iommu_iot_hash(gconstpointer v)
+{
+    RISCVIOMMUEntry *t = (RISCVIOMMUEntry *) v;
+    return (guint)t->iova;
+}
+
+/* GV: 1 PSCV: 1 AV: 1 */
+static void riscv_iommu_iot_inval_pscid_iova(gpointer key, gpointer value,
+                                             gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
+    if (iot->gscid == arg->gscid &&
+        iot->pscid == arg->pscid &&
+        iot->iova == arg->iova) {
+        iot->perm = IOMMU_NONE;
+    }
+}
+
+/* GV: 1 PSCV: 1 AV: 0 */
+static void riscv_iommu_iot_inval_pscid(gpointer key, gpointer value,
+                                        gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
+    if (iot->gscid == arg->gscid &&
+        iot->pscid == arg->pscid) {
+        iot->perm = IOMMU_NONE;
+    }
+}
+
+/* GV: 1 GVMA: 1 */
+static void riscv_iommu_iot_inval_gscid_gpa(gpointer key, gpointer value,
+                                            gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
+    if (iot->gscid == arg->gscid) {
+        /* simplified cache, no GPA matching */
+        iot->perm = IOMMU_NONE;
+    }
+}
+
+/* GV: 1 GVMA: 0 */
+static void riscv_iommu_iot_inval_gscid(gpointer key, gpointer value,
+                                        gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
+    if (iot->gscid == arg->gscid) {
+        iot->perm = IOMMU_NONE;
+    }
+}
+
+/* GV: 0 */
+static void riscv_iommu_iot_inval_all(gpointer key, gpointer value,
+                                      gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    iot->perm = IOMMU_NONE;
+}
+
+/* caller should keep ref-count for iot_cache object */
+static RISCVIOMMUEntry
*riscv_iommu_iot_lookup(RISCVIOMMUContext *ctx, + GHashTable *iot_cache, hwaddr iova) +{ + RISCVIOMMUEntry key = { + .gscid = get_field(ctx->gatp, RISCV_IOMMU_DC_IOHGATP_GSCID), + .pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID), + .iova = PPN_DOWN(iova), + }; + return g_hash_table_lookup(iot_cache, &key); +} + +/* caller should keep ref-count for iot_cache object */ +static void riscv_iommu_iot_update(RISCVIOMMUState *s, + GHashTable *iot_cache, RISCVIOMMUEntry *iot) +{ + if (!s->iot_limit) { + return; + } + + if (g_hash_table_size(s->iot_cache) >= s->iot_limit) { + iot_cache = g_hash_table_new_full(riscv_iommu_iot_hash, + riscv_iommu_iot_equal, + g_free, NULL); + g_hash_table_unref(qatomic_xchg(&s->iot_cache, iot_cache)); + } + g_hash_table_add(iot_cache, iot); +} + +static void riscv_iommu_iot_inval(RISCVIOMMUState *s, GHFunc func, + uint32_t gscid, uint32_t pscid, hwaddr iova) +{ + GHashTable *iot_cache; + RISCVIOMMUEntry key = { + .gscid = gscid, + .pscid = pscid, + .iova = PPN_DOWN(iova), + }; + + iot_cache = g_hash_table_ref(s->iot_cache); + g_hash_table_foreach(iot_cache, func, &key); + g_hash_table_unref(iot_cache); +} + static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, - IOMMUTLBEntry *iotlb) + IOMMUTLBEntry *iotlb, bool enable_cache) { + RISCVIOMMUEntry *iot; + IOMMUAccessFlags perm; bool enable_pid; bool enable_pri; + GHashTable *iot_cache; int fault; + iot_cache = g_hash_table_ref(s->iot_cache); /* * TC[32] is reserved for custom extensions, used here to temporarily * enable automatic page-request generation for ATS queries. @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32)); enable_pid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV); + iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova); + perm = iot ? iot->perm : IOMMU_NONE; + if (perm != IOMMU_NONE) { + iotlb->translated_addr = PPN_PHYS(iot->phys); + iotlb->addr_mask = ~TARGET_PAGE_MASK; + iotlb->perm = perm; + fault = 0; + goto done; + } + /* Translate using device directory / page table information. */ fault = riscv_iommu_spa_fetch(s, ctx, iotlb); + if (!fault && iotlb->target_as == &s->trap_as) { + /* Do not cache trapped MSI translations */ + goto done; + } + + /* + * We made an implementation choice to not cache identity-mapped + * translations, as allowed by the specification, to avoid + * translation cache evictions for other devices sharing the + * IOMMU hardware model. 
+ */ + if (!fault && iotlb->translated_addr != iotlb->iova && enable_cache) { + iot = g_new0(RISCVIOMMUEntry, 1); + iot->iova = PPN_DOWN(iotlb->iova); + iot->phys = PPN_DOWN(iotlb->translated_addr); + iot->gscid = get_field(ctx->gatp, RISCV_IOMMU_DC_IOHGATP_GSCID); + iot->pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID); + iot->perm = iotlb->perm; + riscv_iommu_iot_update(s, iot_cache, iot); + } + +done: + g_hash_table_unref(iot_cache); + if (enable_pri && fault) { struct riscv_iommu_pq_record pr = {0}; if (enable_pid) { @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s) if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) { /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */ goto cmd_ill; + } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) { + /* invalidate all cache mappings */ + func = riscv_iommu_iot_inval_all; + } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) { + /* invalidate cache matching GSCID */ + func = riscv_iommu_iot_inval_gscid; + } else { + /* invalidate cache matching GSCID and ADDR (GPA) */ + func = riscv_iommu_iot_inval_gscid_gpa; } - /* translation cache not implemented yet */ + riscv_iommu_iot_inval(s, func, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID), 0, + cmd.dword1 & TARGET_PAGE_MASK); break; case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA, RISCV_IOMMU_CMD_IOTINVAL_OPCODE): - /* translation cache not implemented yet */ + if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) { + /* invalidate all cache mappings, simplified model */ + func = riscv_iommu_iot_inval_all; + } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV)) { + /* invalidate cache matching GSCID, simplified model */ + func = riscv_iommu_iot_inval_gscid; + } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) { + /* invalidate cache matching GSCID and PSCID */ + func = riscv_iommu_iot_inval_pscid; + } else { + /* invalidate cache matching GSCID and PSCID and ADDR (IOVA) */ + func = riscv_iommu_iot_inval_pscid_iova; + } + riscv_iommu_iot_inval(s, func, + get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID), + get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_PSCID), + cmd.dword1 & TARGET_PAGE_MASK); break; case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT, @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp) riscv_iommu_ctx_equal, g_free, NULL); + s->iot_cache = g_hash_table_new_full(riscv_iommu_iot_hash, + riscv_iommu_iot_equal, + g_free, NULL); + s->iommus.le_next = NULL; s->iommus.le_prev = NULL; QLIST_INIT(&s->spaces); @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_unrealize(DeviceState *dev) { RISCVIOMMUState *s = RISCV_IOMMU(dev); + g_hash_table_unref(s->iot_cache); g_hash_table_unref(s->ctx_cache); } @@ -XXX,XX +XXX,XX @@ static Property riscv_iommu_properties[] = { DEFINE_PROP_UINT32("version", RISCVIOMMUState, version, RISCV_IOMMU_SPEC_DOT_VER), DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0), + DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit, + LIMIT_CACHE_IOT), DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE), DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE), DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE), @@ -XXX,XX +XXX,XX @@ static IOMMUTLBEntry riscv_iommu_memory_region_translate( /* Translation disabled or invalid. 
*/ iotlb.addr_mask = 0; iotlb.perm = IOMMU_NONE; - } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) { + } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb, true)) { /* Translation disabled or fault reported. */ iotlb.addr_mask = 0; iotlb.perm = IOMMU_NONE; diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.h +++ b/hw/riscv/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ struct RISCVIOMMUState { GHashTable *ctx_cache; /* Device translation Context Cache */ + GHashTable *iot_cache; /* IO Translated Address Cache */ + unsigned iot_limit; /* IO Translation Cache size limit */ + /* MMIO Hardware Interface */ MemoryRegion regs_mr; uint8_t *regs_rw; /* register state (user write) */ -- 2.45.2
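Two IOATC details worth spelling out: the hash function only folds the IOVA
page number, relying on riscv_iommu_iot_equal() to separate colliding
GSCID/PSCID entries in the same bucket, and invalidation never removes
entries, it just downgrades perm to IOMMU_NONE so the next lookup treats
them as misses. A self-contained glib sketch of the same set-style table,
with a simplified entry layout (no bitfields):

#include <glib.h>
#include <stdint.h>

/* Simplified IOATC entry, mirroring RISCVIOMMUEntry. */
typedef struct {
    uint64_t iova;   /* IOVA page number */
    uint64_t pscid;
    uint64_t gscid;
    unsigned perm;
} Entry;

static gboolean entry_equal(gconstpointer a, gconstpointer b)
{
    const Entry *t1 = a, *t2 = b;
    return t1->gscid == t2->gscid && t1->pscid == t2->pscid &&
           t1->iova == t2->iova;
}

/*
 * Hashing on the IOVA alone is enough: entries differing only in
 * GSCID/PSCID land in one bucket and entry_equal() tells them apart.
 */
static guint entry_hash(gconstpointer v)
{
    return (guint)((const Entry *)v)->iova;
}

int main(void)
{
    GHashTable *tc = g_hash_table_new_full(entry_hash, entry_equal,
                                           g_free, NULL);
    Entry *e = g_new0(Entry, 1);
    e->iova = 0x1234;
    e->gscid = 1;
    e->perm = 0x3; /* IOMMU_RW */

    /*
     * g_hash_table_add(): the entry is both key and value, just like
     * riscv_iommu_iot_update() does with the real cache.
     */
    g_hash_table_add(tc, e);

    Entry key = { .iova = 0x1234, .gscid = 1 };
    Entry *hit = g_hash_table_lookup(tc, &key);
    g_assert(hit && hit->perm == 0x3);

    g_hash_table_unref(tc);
    return 0;
}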
From: Tomasz Jeznach <tjeznach@rivosinc.com> Add PCIe Address Translation Services (ATS) capabilities to the IOMMU. This will add support for ATS translation requests in Fault/Event queues, Page-request queue and IOATC invalidations. Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> --- hw/riscv/riscv-iommu-bits.h | 43 +++++++++++- hw/riscv/riscv-iommu.c | 127 +++++++++++++++++++++++++++++++++++- hw/riscv/riscv-iommu.h | 1 + hw/riscv/trace-events | 3 + 4 files changed, 171 insertions(+), 3 deletions(-) diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu-bits.h +++ b/hw/riscv/riscv-iommu-bits.h @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_pq_record { #define RISCV_IOMMU_CAP_SV57X4 BIT_ULL(19) #define RISCV_IOMMU_CAP_MSI_FLAT BIT_ULL(22) #define RISCV_IOMMU_CAP_MSI_MRIF BIT_ULL(23) +#define RISCV_IOMMU_CAP_ATS BIT_ULL(25) #define RISCV_IOMMU_CAP_T2GPA BIT_ULL(26) #define RISCV_IOMMU_CAP_IGS GENMASK_ULL(29, 28) #define RISCV_IOMMU_CAP_PAS GENMASK_ULL(37, 32) @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_dc { /* Translation control fields */ #define RISCV_IOMMU_DC_TC_V BIT_ULL(0) +#define RISCV_IOMMU_DC_TC_EN_ATS BIT_ULL(1) #define RISCV_IOMMU_DC_TC_EN_PRI BIT_ULL(2) #define RISCV_IOMMU_DC_TC_T2GPA BIT_ULL(3) #define RISCV_IOMMU_DC_TC_DTF BIT_ULL(4) @@ -XXX,XX +XXX,XX @@ struct riscv_iommu_command { #define RISCV_IOMMU_CMD_IODIR_DV BIT_ULL(33) #define RISCV_IOMMU_CMD_IODIR_DID GENMASK_ULL(63, 40) +/* 3.1.4 I/O MMU PCIe ATS */ +#define RISCV_IOMMU_CMD_ATS_OPCODE 4 +#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL 0 +#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR 1 +#define RISCV_IOMMU_CMD_ATS_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_CMD_ATS_PV BIT_ULL(32) +#define RISCV_IOMMU_CMD_ATS_DSV BIT_ULL(33) +#define RISCV_IOMMU_CMD_ATS_RID GENMASK_ULL(55, 40) +#define RISCV_IOMMU_CMD_ATS_DSEG GENMASK_ULL(63, 56) +/* dword1 is the ATS payload, two different payload types for INVAL and PRGR */ + +/* ATS.PRGR payload */ +#define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE GENMASK_ULL(47, 44) + enum riscv_iommu_dc_fsc_atp_modes { RISCV_IOMMU_DC_FSC_MODE_BARE = 0, RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8, @@ -XXX,XX +XXX,XX @@ enum riscv_iommu_fq_ttypes { RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5, RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6, RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7, - RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 8, + RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ = 8, + RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 9, +}; + +/* Header fields */ +#define RISCV_IOMMU_PREQ_HDR_PID GENMASK_ULL(31, 12) +#define RISCV_IOMMU_PREQ_HDR_PV BIT_ULL(32) +#define RISCV_IOMMU_PREQ_HDR_PRIV BIT_ULL(33) +#define RISCV_IOMMU_PREQ_HDR_EXEC BIT_ULL(34) +#define RISCV_IOMMU_PREQ_HDR_DID GENMASK_ULL(63, 40) + +/* Payload fields */ +#define RISCV_IOMMU_PREQ_PAYLOAD_R BIT_ULL(0) +#define RISCV_IOMMU_PREQ_PAYLOAD_W BIT_ULL(1) +#define RISCV_IOMMU_PREQ_PAYLOAD_L BIT_ULL(2) +#define RISCV_IOMMU_PREQ_PAYLOAD_M GENMASK_ULL(2, 0) +#define RISCV_IOMMU_PREQ_PRG_INDEX GENMASK_ULL(11, 3) +#define RISCV_IOMMU_PREQ_UADDR GENMASK_ULL(63, 12) + + +/* + * struct riscv_iommu_msi_pte - MSI Page Table Entry + */ +struct riscv_iommu_msi_pte { + uint64_t pte; + uint64_t mrif_info; }; /* Fields on pte */ diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.c +++ b/hw/riscv/riscv-iommu.c @@ -XXX,XX +XXX,XX 
@@ static bool riscv_iommu_validate_device_ctx(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) { uint32_t fsc_mode, msi_mode; + uint64_t gatp; + + if (!(s->cap & RISCV_IOMMU_CAP_ATS) && + (ctx->tc & RISCV_IOMMU_DC_TC_EN_ATS || + ctx->tc & RISCV_IOMMU_DC_TC_EN_PRI || + ctx->tc & RISCV_IOMMU_DC_TC_PRPR)) { + return false; + } + + if (!(ctx->tc & RISCV_IOMMU_DC_TC_EN_ATS) && + (ctx->tc & RISCV_IOMMU_DC_TC_T2GPA || + ctx->tc & RISCV_IOMMU_DC_TC_EN_PRI)) { + return false; + } if (!(ctx->tc & RISCV_IOMMU_DC_TC_EN_PRI) && ctx->tc & RISCV_IOMMU_DC_TC_PRPR) { @@ -XXX,XX +XXX,XX @@ static bool riscv_iommu_validate_device_ctx(RISCVIOMMUState *s, } } + gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD); + if (ctx->tc & RISCV_IOMMU_DC_TC_T2GPA && + gatp == RISCV_IOMMU_DC_IOHGATP_MODE_BARE) { + return false; + } + fsc_mode = get_field(ctx->satp, RISCV_IOMMU_DC_FSC_MODE); if (ctx->tc & RISCV_IOMMU_DC_TC_PDTV) { @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx) RISCV_IOMMU_DC_IOHGATP_MODE_BARE); ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD, RISCV_IOMMU_DC_FSC_MODE_BARE); + ctx->tc = RISCV_IOMMU_DC_TC_V; + if (s->enable_ats) { + ctx->tc |= RISCV_IOMMU_DC_TC_EN_ATS; + } + ctx->ta = 0; ctx->msiptp = 0; return 0; @@ -XXX,XX +XXX,XX @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx, enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32)); enable_pid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV); + /* Check for ATS request. */ + if (iotlb->perm == IOMMU_NONE) { + /* Check if ATS is disabled. */ + if (!(ctx->tc & RISCV_IOMMU_DC_TC_EN_ATS)) { + enable_pri = false; + fault = RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED; + goto done; + } + } + iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova); perm = iot ? 
iot->perm : IOMMU_NONE; if (perm != IOMMU_NONE) { @@ -XXX,XX +XXX,XX @@ done: } if (fault) { - unsigned ttype; + unsigned ttype = RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ; if (iotlb->perm & IOMMU_RW) { ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR; - } else { + } else if (iotlb->perm & IOMMU_RO) { ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD; } @@ -XXX,XX +XXX,XX @@ static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify, MEMTXATTRS_UNSPECIFIED); } +static void riscv_iommu_ats(RISCVIOMMUState *s, + struct riscv_iommu_command *cmd, IOMMUNotifierFlag flag, + IOMMUAccessFlags perm, + void (*trace_fn)(const char *id)) +{ + RISCVIOMMUSpace *as = NULL; + IOMMUNotifier *n; + IOMMUTLBEvent event; + uint32_t pid; + uint32_t devid; + const bool pv = cmd->dword0 & RISCV_IOMMU_CMD_ATS_PV; + + if (cmd->dword0 & RISCV_IOMMU_CMD_ATS_DSV) { + /* Use device segment and requester id */ + devid = get_field(cmd->dword0, + RISCV_IOMMU_CMD_ATS_DSEG | RISCV_IOMMU_CMD_ATS_RID); + } else { + devid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_RID); + } + + pid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_PID); + + QLIST_FOREACH(as, &s->spaces, list) { + if (as->devid == devid) { + break; + } + } + + if (!as || !as->notifier) { + return; + } + + event.type = flag; + event.entry.perm = perm; + event.entry.target_as = s->target_as; + + IOMMU_NOTIFIER_FOREACH(n, &as->iova_mr) { + if (!pv || n->iommu_idx == pid) { + event.entry.iova = n->start; + event.entry.addr_mask = n->end - n->start; + trace_fn(as->iova_mr.parent_obj.name); + memory_region_notify_iommu_one(n, &event); + } + } +} + +static void riscv_iommu_ats_inval(RISCVIOMMUState *s, + struct riscv_iommu_command *cmd) +{ + return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_DEVIOTLB_UNMAP, IOMMU_NONE, + trace_riscv_iommu_ats_inval); +} + +static void riscv_iommu_ats_prgr(RISCVIOMMUState *s, + struct riscv_iommu_command *cmd) +{ + unsigned resp_code = get_field(cmd->dword1, + RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE); + + /* Using the access flag to carry response code information */ + IOMMUAccessFlags perm = resp_code ? IOMMU_NONE : IOMMU_RW; + return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_MAP, perm, + trace_riscv_iommu_ats_prgr); +} + static void riscv_iommu_process_ddtp(RISCVIOMMUState *s) { uint64_t old_ddtp = s->ddtp; @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s) get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID)); break; + /* ATS commands */ + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_INVAL, + RISCV_IOMMU_CMD_ATS_OPCODE): + if (!s->enable_ats) { + goto cmd_ill; + } + + riscv_iommu_ats_inval(s, &cmd); + break; + + case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_PRGR, + RISCV_IOMMU_CMD_ATS_OPCODE): + if (!s->enable_ats) { + goto cmd_ill; + } + + riscv_iommu_ats_prgr(s, &cmd); + break; + default: cmd_ill: /* Invalid instruction, do not advance instruction index. 
*/ @@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp) if (s->enable_msi) { s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF; } + if (s->enable_ats) { + s->cap |= RISCV_IOMMU_CAP_ATS; + } if (s->enable_s_stage) { s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 | RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57; @@ -XXX,XX +XXX,XX @@ static Property riscv_iommu_properties[] = { DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit, LIMIT_CACHE_IOT), DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE), + DEFINE_PROP_BOOL("ats", RISCVIOMMUState, enable_ats, TRUE), DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE), DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE), DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE), diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/riscv-iommu.h +++ b/hw/riscv/riscv-iommu.h @@ -XXX,XX +XXX,XX @@ struct RISCVIOMMUState { bool enable_off; /* Enable out-of-reset OFF mode (DMA disabled) */ bool enable_msi; /* Enable MSI remapping */ + bool enable_ats; /* Enable ATS support */ bool enable_s_stage; /* Enable S/VS-Stage translation */ bool enable_g_stage; /* Enable G-Stage translation */ diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/riscv/trace-events +++ b/hw/riscv/trace-events @@ -XXX,XX +XXX,XX @@ riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added" riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed" riscv_iommu_notify_int_vector(uint32_t cause, uint32_t vector) "Interrupt cause 0x%x sent via vector 0x%x" riscv_iommu_icvec_write(uint32_t orig, uint32_t actual) "ICVEC write: incoming 0x%x actual 0x%x" +riscv_iommu_ats(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: translate request %04x:%02x.%u iova: 0x%"PRIx64 +riscv_iommu_ats_inval(const char *id) "%s: dev-iotlb invalidate" +riscv_iommu_ats_prgr(const char *id) "%s: dev-iotlb page request group response" -- 2.45.2
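On the receiving end, ATS.INVAL and ATS.PRGR are delivered through QEMU's
generic IOMMU notifier machinery, so an emulated endpoint that caches
translations would subscribe to the IOMMU memory region. A hedged sketch
follows; the device state, callback body, and function names are
hypothetical, only the notifier API calls are the real ones:

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Hypothetical endpoint state embedding a dev-iotlb notifier. */
typedef struct MyEndpointState {
    IOMMUNotifier ats_notifier;
} MyEndpointState;

static void my_endpoint_ats_unmap(IOMMUNotifier *n, IOMMUTLBEntry *entry)
{
    /* Drop cached translations in [iova, iova + addr_mask]. */
}

/*
 * 'iommu_mr' would be the IOMMUMemoryRegion backing the endpoint's DMA
 * address space; 'pasid' selects the IOMMU index (see attrs_to_index).
 */
static void my_endpoint_watch_ats(MyEndpointState *s,
                                  IOMMUMemoryRegion *iommu_mr,
                                  int pasid, Error **errp)
{
    iommu_notifier_init(&s->ats_notifier, my_endpoint_ats_unmap,
                        IOMMU_NOTIFIER_DEVIOTLB_UNMAP, 0, HWADDR_MAX,
                        pasid);
    memory_region_register_iommu_notifier(MEMORY_REGION(iommu_mr),
                                          &s->ats_notifier, errp);
}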
From: Tomasz Jeznach <tjeznach@rivosinc.com>

DBG support adds three additional registers: tr_req_iova, tr_req_ctl
and tr_response. The DBG capability is always enabled; no on/off
toggle is provided for it.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
 hw/riscv/riscv-iommu-bits.h | 17 +++++++++++
 hw/riscv/riscv-iommu.c      | 59 +++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/riscv/riscv-iommu-bits.h
+++ b/hw/riscv/riscv-iommu-bits.h
@@ -XXX,XX +XXX,XX @@ struct riscv_iommu_pq_record {
 #define RISCV_IOMMU_CAP_ATS             BIT_ULL(25)
 #define RISCV_IOMMU_CAP_T2GPA           BIT_ULL(26)
 #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
+#define RISCV_IOMMU_CAP_DBG             BIT_ULL(31)
 #define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
 #define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
 #define RISCV_IOMMU_CAP_PD17            BIT_ULL(39)
@@ -XXX,XX +XXX,XX @@ enum {
     RISCV_IOMMU_INTR_COUNT
 };
 
+/* 5.24 Translation request IOVA (64bits) */
+#define RISCV_IOMMU_REG_TR_REQ_IOVA     0x0258
+
+/* 5.25 Translation request control (64bits) */
+#define RISCV_IOMMU_REG_TR_REQ_CTL      0x0260
+#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY  BIT_ULL(0)
+#define RISCV_IOMMU_TR_REQ_CTL_NW       BIT_ULL(3)
+#define RISCV_IOMMU_TR_REQ_CTL_PID      GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_TR_REQ_CTL_DID      GENMASK_ULL(63, 40)
+
+/* 5.26 Translation request response (64bits) */
+#define RISCV_IOMMU_REG_TR_RESPONSE     0x0268
+#define RISCV_IOMMU_TR_RESPONSE_FAULT   BIT_ULL(0)
+#define RISCV_IOMMU_TR_RESPONSE_S       BIT_ULL(9)
+#define RISCV_IOMMU_TR_RESPONSE_PPN     RISCV_IOMMU_PPN_FIELD
+
 /* 5.27 Interrupt cause to vector (64bits) */
 #define RISCV_IOMMU_REG_ICVEC           0x02F8
 #define RISCV_IOMMU_ICVEC_CIV           GENMASK_ULL(3, 0)
diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -XXX,XX +XXX,XX @@ static void riscv_iommu_process_pq_control(RISCVIOMMUState *s)
     riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr);
 }
 
+static void riscv_iommu_process_dbg(RISCVIOMMUState *s)
+{
+    uint64_t iova = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_IOVA);
+    uint64_t ctrl = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_CTL);
+    unsigned devid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_DID);
+    unsigned pid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_PID);
+    RISCVIOMMUContext *ctx;
+    void *ref;
+
+    if (!(ctrl & RISCV_IOMMU_TR_REQ_CTL_GO_BUSY)) {
+        return;
+    }
+
+    ctx = riscv_iommu_ctx(s, devid, pid, &ref);
+    if (ctx == NULL) {
+        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE,
+                              RISCV_IOMMU_TR_RESPONSE_FAULT |
+                              (RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED << 10));
+    } else {
+        IOMMUTLBEntry iotlb = {
+            .iova = iova,
+            .perm = ctrl & RISCV_IOMMU_TR_REQ_CTL_NW ? IOMMU_RO : IOMMU_RW,
+            .addr_mask = ~0,
+            .target_as = NULL,
+        };
+        int fault = riscv_iommu_translate(s, ctx, &iotlb, false);
+        if (fault) {
+            iova = RISCV_IOMMU_TR_RESPONSE_FAULT | (((uint64_t) fault) << 10);
+        } else {
+            iova = iotlb.translated_addr & ~iotlb.addr_mask;
+            iova >>= TARGET_PAGE_BITS;
+            iova &= RISCV_IOMMU_TR_RESPONSE_PPN;
+
+            /* We do not support superpages (> 4 KiB) for now */
+            iova &= ~RISCV_IOMMU_TR_RESPONSE_S;
+        }
+        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE, iova);
+    }
+
+    riscv_iommu_reg_mod64(s, RISCV_IOMMU_REG_TR_REQ_CTL, 0,
+                          RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
+    riscv_iommu_ctx_put(s, ref);
+}
+
 typedef void riscv_iommu_process_fn(RISCVIOMMUState *s);
 
 static void riscv_iommu_update_icvec(RISCVIOMMUState *s, uint64_t data)
@@ -XXX,XX +XXX,XX @@ static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
         return MEMTX_OK;
 
+    case RISCV_IOMMU_REG_TR_REQ_CTL:
+        process_fn = riscv_iommu_process_dbg;
+        regb = RISCV_IOMMU_REG_TR_REQ_CTL;
+        busy = RISCV_IOMMU_TR_REQ_CTL_GO_BUSY;
+        break;
+
     default:
         break;
     }
@@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
         s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 |
                   RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4;
     }
+    /* Enable translation debug interface */
+    s->cap |= RISCV_IOMMU_CAP_DBG;
+
     /* Report QEMU target physical address space limits */
     s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
                        TARGET_PHYS_ADDR_SPACE_BITS);
@@ -XXX,XX +XXX,XX @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
     stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0);
     stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_ICVEC], 0);
     stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp);
+    /* If debug registers enabled. */
+    if (s->cap & RISCV_IOMMU_CAP_DBG) {
+        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_IOVA], 0);
+        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_CTL],
+                 RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
+    }
 
     /* Memory region for downstream access, if specified. */
     if (s->target_mr) {
-- 
2.45.2
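[Note on using the debug interface above: from the software side it is a
simple busy-wait protocol; program tr_req_iova and tr_req_ctl, set Go/Busy,
poll until the IOMMU clears it, then read tr_response. A minimal sketch,
assuming hypothetical reg_write64()/reg_read64() accessors for the IOMMU
register file; the register and field names are the ones defined in this
patch:

    /* Sketch: translate one IOVA for a given device via the debug
     * interface. reg_write64()/reg_read64() are assumed MMIO accessors. */
    static uint64_t dbg_translate(uint64_t iova, uint64_t devid)
    {
        uint64_t ctl = RISCV_IOMMU_TR_REQ_CTL_GO_BUSY;

        ctl |= devid << 40;    /* RISCV_IOMMU_TR_REQ_CTL_DID, bits 63:40 */
        reg_write64(RISCV_IOMMU_REG_TR_REQ_IOVA, iova);
        reg_write64(RISCV_IOMMU_REG_TR_REQ_CTL, ctl);

        /* The IOMMU clears Go/Busy once the request completes */
        while (reg_read64(RISCV_IOMMU_REG_TR_REQ_CTL) &
               RISCV_IOMMU_TR_REQ_CTL_GO_BUSY) {
            /* poll */
        }

        /* bit 0 set reports a fault; otherwise the PPN field is valid */
        return reg_read64(RISCV_IOMMU_REG_TR_RESPONSE);
    }
]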
Add an additional test that further exercises the IOMMU by initializing
the command, fault and page-request queues. The steps are taken from
chapter 6.2 of the RISC-V IOMMU spec, "Guidelines for initialization",
and emulate what we expect the software/OS to do when initializing the
IOMMU.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
---
 tests/qtest/libqos/riscv-iommu.h |  30 ++++++++
 tests/qtest/riscv-iommu-test.c   | 125 +++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+)

diff --git a/tests/qtest/libqos/riscv-iommu.h b/tests/qtest/libqos/riscv-iommu.h
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/libqos/riscv-iommu.h
+++ b/tests/qtest/libqos/riscv-iommu.h
@@ -XXX,XX +XXX,XX @@
 
 #define RISCV_IOMMU_REG_IPSR            0x0054
 
+#define RISCV_IOMMU_REG_IVEC            0x02F8
+#define RISCV_IOMMU_REG_IVEC_CIV        GENMASK_ULL(3, 0)
+#define RISCV_IOMMU_REG_IVEC_FIV        GENMASK_ULL(7, 4)
+#define RISCV_IOMMU_REG_IVEC_PMIV       GENMASK_ULL(11, 8)
+#define RISCV_IOMMU_REG_IVEC_PIV        GENMASK_ULL(15, 12)
+
+#define RISCV_IOMMU_REG_CQB             0x0018
+#define RISCV_IOMMU_CQB_PPN_START       10
+#define RISCV_IOMMU_CQB_PPN_LEN         44
+#define RISCV_IOMMU_CQB_LOG2SZ_START    0
+#define RISCV_IOMMU_CQB_LOG2SZ_LEN      5
+
+#define RISCV_IOMMU_REG_CQT             0x0024
+
+#define RISCV_IOMMU_REG_FQB             0x0028
+#define RISCV_IOMMU_FQB_PPN_START       10
+#define RISCV_IOMMU_FQB_PPN_LEN         44
+#define RISCV_IOMMU_FQB_LOG2SZ_START    0
+#define RISCV_IOMMU_FQB_LOG2SZ_LEN      5
+
+#define RISCV_IOMMU_REG_FQT             0x0034
+
+#define RISCV_IOMMU_REG_PQB             0x0038
+#define RISCV_IOMMU_PQB_PPN_START       10
+#define RISCV_IOMMU_PQB_PPN_LEN         44
+#define RISCV_IOMMU_PQB_LOG2SZ_START    0
+#define RISCV_IOMMU_PQB_LOG2SZ_LEN      5
+
+#define RISCV_IOMMU_REG_PQT             0x0044
+
 typedef struct QRISCVIOMMU {
     QOSGraphObject obj;
     QPCIDevice dev;
diff --git a/tests/qtest/riscv-iommu-test.c b/tests/qtest/riscv-iommu-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/riscv-iommu-test.c
+++ b/tests/qtest/riscv-iommu-test.c
@@ -XXX,XX +XXX,XX @@ static uint64_t riscv_iommu_read_reg64(QRISCVIOMMU *r_iommu, int reg_offset)
     return qpci_io_readq(&r_iommu->dev, r_iommu->reg_bar, reg_offset);
 }
 
+static void riscv_iommu_write_reg32(QRISCVIOMMU *r_iommu, int reg_offset,
+                                    uint32_t val)
+{
+    qpci_io_writel(&r_iommu->dev, r_iommu->reg_bar, reg_offset, val);
+}
+
+static void riscv_iommu_write_reg64(QRISCVIOMMU *r_iommu, int reg_offset,
+                                    uint64_t val)
+{
+    qpci_io_writeq(&r_iommu->dev, r_iommu->reg_bar, reg_offset, val);
+}
+
 static void test_pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
 {
     QRISCVIOMMU *r_iommu = obj;
@@ -XXX,XX +XXX,XX @@ static void test_reg_reset(void *obj, void *data, QGuestAllocator *t_alloc)
     g_assert_cmpuint(reg, ==, 0);
 }
 
+/*
+ * Common timeout-based poll for CQCSR, FQCSR and PQCSR. All
+ * their ON bits are mapped as RISCV_IOMMU_QUEUE_ACTIVE (16).
+ */
+static void qtest_wait_for_queue_active(QRISCVIOMMU *r_iommu,
+                                        uint32_t queue_csr)
+{
+    QTestState *qts = global_qtest;
+    guint64 timeout_us = 2 * 1000 * 1000;
+    gint64 start_time = g_get_monotonic_time();
+    uint32_t reg;
+
+    for (;;) {
+        qtest_clock_step(qts, 100);
+
+        reg = riscv_iommu_read_reg32(r_iommu, queue_csr);
+        if (reg & RISCV_IOMMU_QUEUE_ACTIVE) {
+            break;
+        }
+        g_assert(g_get_monotonic_time() - start_time <= timeout_us);
+    }
+}
+
+/*
+ * Goes through the queue activation procedures of chapter 6.2,
+ * "Guidelines for initialization", of the RISC-V IOMMU spec.
+ */
+static void test_iommu_init_queues(void *obj, void *data,
+                                   QGuestAllocator *t_alloc)
+{
+    QRISCVIOMMU *r_iommu = obj;
+    uint64_t reg64, q_addr;
+    uint32_t reg;
+    int k = 2;
+
+    reg64 = riscv_iommu_read_reg64(r_iommu, RISCV_IOMMU_REG_CAP);
+    g_assert_cmpuint(reg64 & RISCV_IOMMU_CAP_VERSION, ==, 0x10);
+
+    /*
+     * Program the command queue. Write 0xF to civ, fiv, pmiv and
+     * piv. With the current PCI device implementation we expect 2
+     * writable bits for each (k = 2) since we have N = 4 total
+     * vectors (2^k).
+     */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC, 0xFFFF);
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC);
+    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_CIV, ==, 0x3);
+    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_FIV, ==, 0x30);
+    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_PMIV, ==, 0x300);
+    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_PIV, ==, 0x3000);
+
+    /* Alloc a 4*16 bytes buffer and use it to set cqb */
+    q_addr = guest_alloc(t_alloc, 4 * 16);
+    reg64 = 0;
+    reg64 = deposit64(reg64, RISCV_IOMMU_CQB_PPN_START,
+                      RISCV_IOMMU_CQB_PPN_LEN, q_addr);
+    reg64 = deposit64(reg64, RISCV_IOMMU_CQB_LOG2SZ_START,
+                      RISCV_IOMMU_CQB_LOG2SZ_LEN, k - 1);
+    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_CQB, reg64);
+
+    /* cqt = 0, cqcsr.cqen = 1, poll cqcsr.cqon until it reads 1 */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_CQT, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR);
+    reg |= RISCV_IOMMU_CQCSR_CQEN;
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR, reg);
+
+    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_CQCSR);
+
+    /*
+     * Program the fault queue. Alloc a 4*32 bytes (instead of 4*16)
+     * buffer and use it to set fqb.
+     */
+    q_addr = guest_alloc(t_alloc, 4 * 32);
+    reg64 = 0;
+    reg64 = deposit64(reg64, RISCV_IOMMU_FQB_PPN_START,
+                      RISCV_IOMMU_FQB_PPN_LEN, q_addr);
+    reg64 = deposit64(reg64, RISCV_IOMMU_FQB_LOG2SZ_START,
+                      RISCV_IOMMU_FQB_LOG2SZ_LEN, k - 1);
+    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_FQB, reg64);
+
+    /* fqt = 0, fqcsr.fqen = 1, poll fqcsr.fqon until it reads 1 */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_FQT, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR);
+    reg |= RISCV_IOMMU_FQCSR_FQEN;
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR, reg);
+
+    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_FQCSR);
+
+    /*
+     * Program the page-request queue. Alloc a 4*16 bytes buffer
+     * and use it to set pqb.
+     */
+    q_addr = guest_alloc(t_alloc, 4 * 16);
+    reg64 = 0;
+    reg64 = deposit64(reg64, RISCV_IOMMU_PQB_PPN_START,
+                      RISCV_IOMMU_PQB_PPN_LEN, q_addr);
+    reg64 = deposit64(reg64, RISCV_IOMMU_PQB_LOG2SZ_START,
+                      RISCV_IOMMU_PQB_LOG2SZ_LEN, k - 1);
+    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_PQB, reg64);
+
+    /* pqt = 0, pqcsr.pqen = 1, poll pqcsr.pqon until it reads 1 */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_PQT, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR);
+    reg |= RISCV_IOMMU_PQCSR_PQEN;
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR, reg);
+
+    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_PQCSR);
+}
+
 static void register_riscv_iommu_test(void)
 {
     qos_add_test("pci_config", "riscv-iommu-pci", test_pci_config, NULL);
     qos_add_test("reg_reset", "riscv-iommu-pci", test_reg_reset, NULL);
+    qos_add_test("iommu_init_queues", "riscv-iommu-pci",
+                 test_iommu_init_queues, NULL);
 }
 
 libqos_init(register_riscv_iommu_test);
-- 
2.45.2
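[Note on the test above: the command, fault and page-request queues share the
same base register layout (PPN field at bit 10, LOG2SZ-1 field at bit 0) and
the same enable-and-poll sequence, so the three blocks could be condensed into
one helper. A sketch of such a refactor, reusing the accessors already defined
in riscv-iommu-test.c; the enable_queue() helper itself is hypothetical:

    /* Possible refactor: program and enable one queue, then poll its
     * CSR for the ACTIVE bit. */
    static void enable_queue(QRISCVIOMMU *r, uint64_t q_addr, int k,
                             int reg_qb, int reg_qt, int reg_qcsr,
                             uint32_t qen_bit)
    {
        uint64_t reg64 = 0;
        uint32_t reg;

        reg64 = deposit64(reg64, 10, 44, q_addr);  /* PPN field */
        reg64 = deposit64(reg64, 0, 5, k - 1);     /* LOG2SZ-1 field */
        riscv_iommu_write_reg64(r, reg_qb, reg64);
        riscv_iommu_write_reg32(r, reg_qt, 0);     /* tail starts at zero */

        reg = riscv_iommu_read_reg32(r, reg_qcsr);
        riscv_iommu_write_reg32(r, reg_qcsr, reg | qen_bit);
        qtest_wait_for_queue_active(r, reg_qcsr);
    }

With it, the command queue setup would reduce to:

    enable_queue(r_iommu, guest_alloc(t_alloc, 4 * 16), k,
                 RISCV_IOMMU_REG_CQB, RISCV_IOMMU_REG_CQT,
                 RISCV_IOMMU_REG_CQCSR, RISCV_IOMMU_CQCSR_CQEN);
]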
Add a short guide on how to use the RISC-V IOMMU support we just added.
This doc will be updated once we add the riscv-iommu-sys device.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
 docs/specs/index.rst       |  1 +
 docs/specs/riscv-iommu.rst | 90 ++++++++++++++++++++++++++++++++++++++
 docs/system/riscv/virt.rst | 13 ++++++
 3 files changed, 104 insertions(+)
 create mode 100644 docs/specs/riscv-iommu.rst

diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -XXX,XX +XXX,XX @@ guest hardware that is specific to QEMU.
    vmgenid
    rapl-msr
    rocker
+   riscv-iommu
diff --git a/docs/specs/riscv-iommu.rst b/docs/specs/riscv-iommu.rst
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/specs/riscv-iommu.rst
@@ -XXX,XX +XXX,XX @@
+.. _riscv-iommu:
+
+RISC-V IOMMU support for RISC-V machines
+========================================
+
+QEMU implements a RISC-V IOMMU emulation based on the RISC-V IOMMU spec
+version 1.0 `iommu1.0`_.
+
+The emulation includes a PCI reference device, riscv-iommu-pci, that QEMU
+RISC-V boards can use. The 'virt' RISC-V machine is compatible with this
+device.
+
+riscv-iommu-pci reference device
+--------------------------------
+
+This device implements the RISC-V IOMMU emulation as recommended by the
+section "Integrating an IOMMU as a PCIe device" of `iommu1.0`_: a PCI device
+with base class 08h, sub-class 06h and programming interface 00h.
+
+As a reference device it doesn't implement anything outside of the
+specification, so it uses a generic default PCI ID given by QEMU: 1b36:0014.
+
+To include the device in the 'virt' machine:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M virt -device riscv-iommu-pci,[optional_pci_opts] (...)
+
+This will add a RISC-V IOMMU PCI device to the board, honoring any additional
+PCI parameters (such as the PCI bus address). The behavior of the RISC-V
+IOMMU is defined by the spec, but its operation is OS dependent.
+
+As of this writing the existing Linux kernel support, `linux-v8`_, is not yet
+merged and does not support features like VFIO passthrough. The IOMMU
+emulation was tested using a public Ventana Micro Systems kernel repository,
+`ventana-linux`_. This kernel is based on `linux-v8`_ with additional patches
+that enable features like KVM VFIO passthrough with irqbypass. Until the
+kernel support is feature complete, feel free to use the kernel available in
+the Ventana Micro Systems mirror.
+
+The current Linux kernel support will use the IOMMU device to create IOMMU
+groups with any eligible cards available in the system, regardless of factors
+such as the order in which the devices are added on the command line.
+
+This means that, as far as the current IOMMU kernel driver is concerned, the
+following command lines are equivalent:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 \
+      -M virt,aia=aplic-imsic,aia-guests=5 \
+      -device riscv-iommu-pci,addr=1.0,vendor-id=0x1efd,device-id=0xedf1 \
+      -device e1000e,netdev=net1 -netdev user,id=net1,net=192.168.0.0/24 \
+      -device e1000e,netdev=net2 -netdev user,id=net2,net=192.168.200.0/24 \
+      (...)
+
+  $ qemu-system-riscv64 \
+      -M virt,aia=aplic-imsic,aia-guests=5 \
+      -device e1000e,netdev=net1 -netdev user,id=net1,net=192.168.0.0/24 \
+      -device e1000e,netdev=net2 -netdev user,id=net2,net=192.168.200.0/24 \
+      -device riscv-iommu-pci,addr=1.0,vendor-id=0x1efd,device-id=0xedf1 \
+      (...)
+
+Both will create IOMMU groups for the two e1000e cards.
+
+Another thing to note about `linux-v8`_ and `ventana-linux`_ is that the
+kernel driver expects the IOMMU to identify itself as a Rivos device, i.e. to
+use the Rivos vendor ID. To use the riscv-iommu-pci device with the existing
+kernel support we need to emulate a Rivos PCI IOMMU by setting 'vendor-id'
+and 'device-id':
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M virt \
+      -device riscv-iommu-pci,vendor-id=0x1efd,device-id=0xedf1 (...)
+
+Several options are available to control the capabilities of the device,
+namely:
+
+- "bus": the bus that the IOMMU device uses
+- "ioatc-limit": size of the Address Translation Cache (defaults to 2 MiB)
+- "intremap": enable/disable MSI support
+- "ats": enable ATS support
+- "off": out-of-reset translation mode ('on' for DMA disabled, 'off' for 'BARE' (passthrough))
+- "s-stage": enable S-stage support
+- "g-stage": enable G-stage support
+
+.. _iommu1.0: https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
+
+.. _linux-v8: https://lore.kernel.org/linux-riscv/cover.1718388908.git.tjeznach@rivosinc.com/
+
+.. _ventana-linux: https://github.com/ventanamicro/linux/tree/dev-upstream
diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/riscv/virt.rst
+++ b/docs/system/riscv/virt.rst
@@ -XXX,XX +XXX,XX @@ none``, as in
 
 Firmware images used for pflash must be exactly 32 MiB in size.
 
+riscv-iommu support
+-------------------
+
+The board supports the riscv-iommu-pci device, which can be added with the
+following command line:
+
+.. code-block:: bash
+
+  $ qemu-system-riscv64 -M virt -device riscv-iommu-pci (...)
+
+Refer to :ref:`riscv-iommu` for more information on how the RISC-V IOMMU
+support works.
+
 Machine-specific options
 ------------------------
-- 
2.45.2
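[Note on verifying the setup described in the docs above: one way to confirm
that the guest driver picked up the IOMMU is to inspect the sysfs IOMMU group
topology after boot. This assumes a guest kernel carrying the driver (e.g.
`ventana-linux`_); the group numbers and PCI addresses below are illustrative:

    # Inside the guest: each eligible device should now sit in an
    # IOMMU group created by the RISC-V IOMMU driver.
    $ ls /sys/kernel/iommu_groups/
    0  1
    $ ls /sys/kernel/iommu_groups/0/devices/
    0000:00:01.0
]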