This patch series introduces the functions necessary to build and manage
RISC-V guest page tables and MMIO/RAM mappings.

---
Changes in V3:
 - Introduce metadata table to store P2M types.
 - Use x86's way to allocate VMID.
 - Abstract Arm-specific p2m type name for device MMIO mappings.
 - For all other updates, please look at the specific patches.
---
Changes in V2:
 - Merged to staging:
   - [PATCH v1 1/6] xen/riscv: add inclusion of xen/bitops.h to asm/cmpxchg.h
 - New patches:
   - xen/riscv: implement sbi_remote_hfence_gvma{_vmid}().
 - Split patch "xen/riscv: implement p2m mapping functionality" into smaller
   patches:
   - xen/riscv: introduce page_set_xenheap_gfn()
   - xen/riscv: implement guest_physmap_add_entry() for mapping GFNs to MFNs
   - xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
   - xen/riscv: Implement p2m_free_entry() and related helpers
   - xen/riscv: Implement superpage splitting for p2m mappings
   - xen/riscv: implement p2m_next_level()
   - xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration
 - Move root p2m table allocation to separate patch:
   xen/riscv: add root page table allocation
 - Drop the dependency of this patch series on the patch introducing SvPBMT,
   as it was merged.
 - Patch "[PATCH v1 4/6] xen/riscv: define pt_t and pt_walk_t structures" was
   renamed to "xen/riscv: introduce pte_{set,get}_mfn()" because, after
   dropping the bitfields for the PTE structure, this patch introduces only
   pte_{set,get}_mfn().
 - Rename "xen/riscv: define pt_t and pt_walk_t structures" to
   "xen/riscv: introduce pte_{set,get}_mfn()" as pt_t and pt_walk_t were
   dropped.
 - Introduce guest domain's VMID allocation and management.
 - Add patches necessary to implement p2m lookup:
   - xen/riscv: implement mfn_valid() and page reference, ownership handling
     helpers
   - xen/riscv: add support of page lookup by GFN
 - Re-sort patch series.
 - All other changes are patch-specific. Please check them.
---
Oleksii Kurochko (20):
  xen/riscv: implement sbi_remote_hfence_gvma()
  xen/riscv: introduce sbi_remote_hfence_gvma_vmid()
  xen/riscv: introduce VMID allocation and manegement
  xen/riscv: introduce things necessary for p2m initialization
  xen/riscv: construct the P2M pages pool for guests
  xen/riscv: add root page table allocation
  xen/riscv: introduce pte_{set,get}_mfn()
  xen/riscv: add new p2m types and helper macros for type classification
  xen/dom0less: abstract Arm-specific p2m type name for device MMIO mappings
  xen/riscv: introduce page_{get,set}_xenheap_gfn()
  xen/riscv: implement function to map memory in guest p2m
  xen/riscv: implement p2m_set_range()
  xen/riscv: Implement p2m_free_subtree() and related helpers
  xen/riscv: Implement p2m_pte_from_mfn() and support PBMT configuration
  xen/riscv: implement p2m_next_level()
  xen/riscv: Implement superpage splitting for p2m mappings
  xen/riscv: implement put_page()
  xen/riscv: implement mfn_valid() and page reference, ownership handling helpers
  xen/riscv: add support of page lookup by GFN
  xen/riscv: introduce metadata table to store P2M type

 xen/arch/arm/include/asm/p2m.h              |    2 +
 xen/arch/riscv/Makefile                     |    3 +
 xen/arch/riscv/include/asm/Makefile         |    1 -
 xen/arch/riscv/include/asm/domain.h         |   23 +
 xen/arch/riscv/include/asm/flushtlb.h       |    5 +
 xen/arch/riscv/include/asm/mm.h             |   72 +-
 xen/arch/riscv/include/asm/p2m.h            |  145 ++-
 xen/arch/riscv/include/asm/page.h           |   38 +
 xen/arch/riscv/include/asm/paging.h         |   19 +
 xen/arch/riscv/include/asm/riscv_encoding.h |    6 +
 xen/arch/riscv/include/asm/sbi.h            |   32 +
 xen/arch/riscv/include/asm/vmid.h           |    8 +
 xen/arch/riscv/mm.c                         |   73 +-
 xen/arch/riscv/p2m.c                        | 1107 +++++++++++++++++++
 xen/arch/riscv/paging.c                     |  112 ++
 xen/arch/riscv/sbi.c                        |   14 +
 xen/arch/riscv/setup.c                      |    3 +
 xen/arch/riscv/vmid.c                       |  165 +++
 xen/common/device-tree/dom0less-build.c     |    2 +-
 19 files changed, 1809 insertions(+), 21 deletions(-)
 create mode 100644 xen/arch/riscv/include/asm/paging.h
 create mode 100644 xen/arch/riscv/include/asm/vmid.h
 create mode 100644 xen/arch/riscv/p2m.c
 create mode 100644 xen/arch/riscv/paging.c
 create mode 100644 xen/arch/riscv/vmid.c

-- 
2.50.1
Instruct the remote harts to execute one or more HFENCE.GVMA instructions,
covering the range of guest physical addresses between start_addr and
start_addr + size for all VMIDs.

The remote fence operation applies to the entire address space if either:
 - start_addr and size are both 0, or
 - size is equal to 2^XLEN-1.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
 - Update the comment message above declaration of sbi_remote_hfence_gvma()
   and update the commit message in sync.
 - Drop ASSERT() in sbi_remote_hfence_gvma().
---
 xen/arch/riscv/include/asm/sbi.h | 19 +++++++++++++++++++
 xen/arch/riscv/sbi.c             |  7 +++++++
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/riscv/include/asm/sbi.h b/xen/arch/riscv/include/asm/sbi.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/include/asm/sbi.h
+++ b/xen/arch/riscv/include/asm/sbi.h
@@ -XXX,XX +XXX,XX @@ bool sbi_has_rfence(void);
 int sbi_remote_sfence_vma(const cpumask_t *cpu_mask, vaddr_t start,
                           size_t size);
 
+/*
+ * Instructs the remote harts to execute one or more HFENCE.GVMA
+ * instructions, covering the range of guest physical addresses
+ * between start_addr and start_addr + size for all VMIDs.
+ *
+ * Returns 0 if IPI was sent to all the targeted harts successfully
+ * or negative value if start_addr or size is not valid.
+ *
+ * The remote fence operation applies to the entire address space if either:
+ *  - start_addr and size are both 0, or
+ *  - size is equal to 2^XLEN-1.
+ *
+ * @cpu_mask a cpu mask containing all the target CPUs (in Xen space).
+ * @param start virtual address start
+ * @param size virtual address range size
+ */
+int sbi_remote_hfence_gvma(const cpumask_t *cpu_mask, vaddr_t start,
+                           size_t size);
+
 /*
  * Initialize SBI library
  *
diff --git a/xen/arch/riscv/sbi.c b/xen/arch/riscv/sbi.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/sbi.c
+++ b/xen/arch/riscv/sbi.c
@@ -XXX,XX +XXX,XX @@ int sbi_remote_sfence_vma(const cpumask_t *cpu_mask, vaddr_t start,
                       cpu_mask, start, size, 0, 0);
 }
 
+int sbi_remote_hfence_gvma(const cpumask_t *cpu_mask, vaddr_t start,
+                           size_t size)
+{
+    return sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA,
+                      cpu_mask, start, size, 0, 0);
+}
+
 /* This function must always succeed. */
 #define sbi_get_spec_version() \
     sbi_ext_base_func(SBI_EXT_BASE_GET_SPEC_VERSION)
-- 
2.50.1
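
Note on the "entire address space" rule in the commit message above: the
condition can be sketched as a small host-side predicate. This is only an
illustration. The helper name rfence_covers_whole_space() is hypothetical and
is not part of this patch or of the SBI interface, and uintptr_t stands in
for an XLEN-wide register, which assumes the host pointer width equals XLEN.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): returns true when a remote
 * fence request, as described in the commit message, applies to the
 * entire address space rather than to [start, start + size).
 *
 * Assumption: uintptr_t is XLEN bits wide, so UINTPTR_MAX == 2^XLEN - 1.
 */
static bool rfence_covers_whole_space(uintptr_t start, uintptr_t size)
{
    /* Case 1: start_addr and size are both 0. */
    if ( start == 0 && size == 0 )
        return true;

    /* Case 2: size is equal to 2^XLEN - 1, whatever start is. */
    return size == UINTPTR_MAX;
}
```

A caller wanting a full flush could therefore pass start = 0 and size = 0,
while any other pair with a size below 2^XLEN-1 names a bounded range.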
It instructs the remote harts to execute one or more HFENCE.GVMA instructions
by making an SBI call, covering the range of guest physical addresses between
start_addr and start_addr + size only for the given VMID.

The remote fence operation applies to the entire address space if either:
 - start_addr and size are both 0, or
 - size is equal to 2^XLEN-1.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
 - Drop ASSERT() in sbi_remote_hfence_gvma_vmid() as failure will happen
   anyway if rfence isn't initialized.
 - Drop "This function call is only valid for harts implementing hypervisor
   extension." from the commit message and the comment above the declaration
   of sbi_remote_hfence_gvma_vmid().
 - Use proper FID for sbi_remote_hfence_gvma_vmid().
---
Changes in V2:
 - New patch.
---
 xen/arch/riscv/include/asm/sbi.h | 13 +++++++++++++
 xen/arch/riscv/sbi.c             |  7 +++++++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/riscv/include/asm/sbi.h b/xen/arch/riscv/include/asm/sbi.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/include/asm/sbi.h
+++ b/xen/arch/riscv/include/asm/sbi.h
@@ -XXX,XX +XXX,XX @@ int sbi_remote_sfence_vma(const cpumask_t *cpu_mask, vaddr_t start,
 int sbi_remote_hfence_gvma(const cpumask_t *cpu_mask, vaddr_t start,
                            size_t size);
 
+/*
+ * Instruct the remote harts to execute one or more HFENCE.GVMA instructions,
+ * covering the range of guest physical addresses between start_addr and
+ * start_addr + size only for the given VMID.
+ *
+ * @cpu_mask a cpu mask containing all the target CPUs (in Xen space).
+ * @param start virtual address start
+ * @param size virtual address range size
+ * @param vmid virtual machine id
+ */
+int sbi_remote_hfence_gvma_vmid(const cpumask_t *cpu_mask, vaddr_t start,
+                                size_t size, unsigned long vmid);
+
 /*
  * Initialize SBI library
  *
diff --git a/xen/arch/riscv/sbi.c b/xen/arch/riscv/sbi.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/sbi.c
+++ b/xen/arch/riscv/sbi.c
@@ -XXX,XX +XXX,XX @@ int sbi_remote_hfence_gvma(const cpumask_t *cpu_mask, vaddr_t start,
                       cpu_mask, start, size, 0, 0);
 }
 
+int sbi_remote_hfence_gvma_vmid(const cpumask_t *cpu_mask, vaddr_t start,
+                                size_t size, unsigned long vmid)
+{
+    return sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID,
+                      cpu_mask, start, size, vmid, 0);
+}
+
 /* This function must always succeed. */
 #define sbi_get_spec_version() \
     sbi_ext_base_func(SBI_EXT_BASE_GET_SPEC_VERSION)
-- 
2.50.1
The current implementation is based on x86's way of allocating VMIDs:

  VMIDs partition the physical TLB. In the current implementation VMIDs are
  introduced to reduce the number of TLB flushes. Each time the guest's
  virtual address space changes, instead of flushing the TLB, a new VMID is
  assigned. This reduces the number of TLB flushes to at most 1/#VMIDs.
  The biggest advantage is that hot parts of the hypervisor's code and data
  remain in the TLB.

  VMIDs are a hart-local resource. As preemption of VMIDs is not possible,
  VMIDs are assigned in a round-robin scheme. To minimize the overhead of
  VMID invalidation, at the time of a TLB flush, VMIDs are tagged with a
  64-bit generation. Only on a generation overflow does the code need to
  invalidate all VMID information stored at the VCPUs which are run on the
  specific physical processor. This overflow appears after about 2^80
  host processor cycles, so we do not optimize this case, but simply disable
  VMID usage to retain correctness.

Only minor changes are made compared to the x86 implementation.
These include using RISC-V-specific terminology, adding a check to ensure
the type used for storing the VMID has enough bits to hold VMIDLEN,
and introducing a new function vmidlen_detect() to clarify the VMIDLEN
value.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
 - Reimplement VMID allocation similar to what x86 has implemented.
---
Changes in V2:
 - New patch.
--- xen/arch/riscv/Makefile | 1 + xen/arch/riscv/include/asm/domain.h | 6 + xen/arch/riscv/include/asm/flushtlb.h | 5 + xen/arch/riscv/include/asm/vmid.h | 8 ++ xen/arch/riscv/setup.c | 3 + xen/arch/riscv/vmid.c | 165 ++++++++++++++++++++++++++ 6 files changed, 188 insertions(+) create mode 100644 xen/arch/riscv/include/asm/vmid.h create mode 100644 xen/arch/riscv/vmid.c diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/Makefile +++ b/xen/arch/riscv/Makefile @@ -XXX,XX +XXX,XX @@ obj-y += smpboot.o obj-y += stubs.o obj-y += time.o obj-y += traps.o +obj-y += vmid.o obj-y += vm_event.o $(TARGET): $(TARGET)-syms diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/domain.h +++ b/xen/arch/riscv/include/asm/domain.h @@ -XXX,XX +XXX,XX @@ #include <xen/xmalloc.h> #include <public/hvm/params.h> +struct vcpu_vmid { + uint64_t generation; + uint16_t vmid; +}; + struct hvm_domain { uint64_t params[HVM_NR_PARAMS]; @@ -XXX,XX +XXX,XX @@ struct arch_vcpu_io { }; struct arch_vcpu { + struct vcpu_vmid vmid; }; struct arch_domain { diff --git a/xen/arch/riscv/include/asm/flushtlb.h b/xen/arch/riscv/include/asm/flushtlb.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/flushtlb.h +++ b/xen/arch/riscv/include/asm/flushtlb.h @@ -XXX,XX +XXX,XX @@ #include <asm/sbi.h> +static inline void local_hfence_gvma_all(void) +{ + asm volatile ( "hfence.gvma zero, zero" ::: "memory" ); +} + /* Flush TLB of local processor for address va. 
*/ static inline void flush_tlb_one_local(vaddr_t va) { diff --git a/xen/arch/riscv/include/asm/vmid.h b/xen/arch/riscv/include/asm/vmid.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/riscv/include/asm/vmid.h @@ -XXX,XX +XXX,XX @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef ASM_RISCV_VMID_H +#define ASM_RISCV_VMID_H + +void vmid_init(void); + +#endif /* ASM_RISCV_VMID_H */ diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/setup.c +++ b/xen/arch/riscv/setup.c @@ -XXX,XX +XXX,XX @@ #include <asm/sbi.h> #include <asm/setup.h> #include <asm/traps.h> +#include <asm/vmid.h> /* Xen stack for bringing up the first CPU. */ unsigned char __initdata cpu0_boot_stack[STACK_SIZE] @@ -XXX,XX +XXX,XX @@ void __init noreturn start_xen(unsigned long bootcpu_id, console_init_postirq(); + vmid_init(); + printk("All set up\n"); machine_halt(); diff --git a/xen/arch/riscv/vmid.c b/xen/arch/riscv/vmid.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/riscv/vmid.c @@ -XXX,XX +XXX,XX @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#include <xen/domain.h> +#include <xen/init.h> +#include <xen/sections.h> +#include <xen/lib.h> +#include <xen/param.h> +#include <xen/percpu.h> + +#include <asm/atomic.h> +#include <asm/csr.h> +#include <asm/flushtlb.h> + +/* Xen command-line option to enable VMIDs */ +static bool __read_mostly opt_vmid_enabled = true; +boolean_param("vmid", opt_vmid_enabled); + +/* + * VMIDs partition the physical TLB. In the current implementation VMIDs are + * introduced to reduce the number of TLB flushes. Each time the guest's + * virtual address space changes, instead of flushing the TLB, a new VMID is + * assigned. This reduces the number of TLB flushes to at most 1/#VMIDs. + * The biggest advantage is that hot parts of the hypervisor's code and data + * retain in the TLB. 
+ * + * Sketch of the Implementation: + * + * VMIDs are a hart-local resource. As preemption of VMIDs is not possible, + * VMIDs are assigned in a round-robin scheme. To minimize the overhead of + * VMID invalidation, at the time of a TLB flush, VMIDs are tagged with a + * 64-bit generation. Only on a generation overflow the code needs to + * invalidate all VMID information stored at the VCPUs with are run on the + * specific physical processor. This overflow appears after about 2^80 + * host processor cycles, so we do not optimize this case, but simply disable + * VMID useage to retain correctness. + */ + +/* Per-Hart VMID management. */ +struct vmid_data { + uint64_t hart_vmid_generation; + uint16_t next_vmid; + uint16_t max_vmid; + bool disabled; +}; + +static DEFINE_PER_CPU(struct vmid_data, vmid_data); + +static unsigned long vmidlen_detect(void) +{ + unsigned long vmid_bits; + unsigned long old; + + /* Figure-out number of VMID bits in HW */ + old = csr_read(CSR_HGATP); + + csr_write(CSR_HGATP, old | HGATP_VMID_MASK); + vmid_bits = csr_read(CSR_HGATP); + vmid_bits = MASK_EXTR(vmid_bits, HGATP_VMID_MASK); + vmid_bits = flsl(vmid_bits); + csr_write(CSR_HGATP, old); + + /* + * We polluted local TLB so flush all guest TLB as + * a speculative access can happen at any time. + */ + local_hfence_gvma_all(); + + return vmid_bits; +} + +void vmid_init(void) +{ + static bool g_disabled = false; + + unsigned long vmid_len = vmidlen_detect(); + struct vmid_data *data = &this_cpu(vmid_data); + unsigned long max_availalbe_bits = sizeof(data->max_vmid) << 3; + + if ( vmid_len > max_availalbe_bits ) + panic("%s: VMIDLEN is bigger then a type which represent VMID: %lu(%lu)\n", + __func__, vmid_len, max_availalbe_bits); + + data->max_vmid = BIT(vmid_len, U) - 1; + data->disabled = !opt_vmid_enabled || (vmid_len <= 1); + + if ( g_disabled != data->disabled ) + { + printk("%s: VMIDs %sabled.\n", __func__, + data->disabled ? 
"dis" : "en"); + if ( !g_disabled ) + g_disabled = data->disabled; + } + + /* Zero indicates 'invalid generation', so we start the count at one. */ + data->hart_vmid_generation = 1; + + /* Zero indicates 'VMIDs disabled', so we start the count at one. */ + data->next_vmid = 1; +} + +void vcpu_vmid_flush_vcpu(struct vcpu *v) +{ + write_atomic(&v->arch.vmid.generation, 0); +} + +void vmid_flush_hart(void) +{ + struct vmid_data *data = &this_cpu(vmid_data); + + if ( data->disabled ) + return; + + if ( likely(++data->hart_vmid_generation != 0) ) + return; + + /* + * VMID generations are 64 bit. Overflow of generations never happens. + * For safety, we simply disable ASIDs, so correctness is established; it + * only runs a bit slower. + */ + printk("%s: VMID generation overrun. Disabling VMIDs.\n", __func__); + data->disabled = 1; +} + +bool vmid_handle_vmenter(struct vcpu_vmid *vmid) +{ + struct vmid_data *data = &this_cpu(vmid_data); + + /* Test if VCPU has valid VMID. */ + if ( read_atomic(&vmid->generation) == data->hart_vmid_generation ) + return 0; + + /* If there are no free VMIDs, need to go to a new generation. */ + if ( unlikely(data->next_vmid > data->max_vmid) ) + { + vmid_flush_hart(); + data->next_vmid = 1; + if ( data->disabled ) + goto disabled; + } + + /* Now guaranteed to be a free VMID. */ + vmid->vmid = data->next_vmid++; + write_atomic(&vmid->generation, data->hart_vmid_generation); + + /* + * When we assign VMID 1, flush all TLB entries as we are starting a new + * generation, and all old VMID allocations are now stale. + */ + return (vmid->vmid == 1); + + disabled: + vmid->vmid = 0; + return 0; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ -- 2.50.1
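
For readers following the generation scheme described in this patch, the
allocation logic can be modelled outside of Xen. The sketch below is a
simplified, single-threaded stand-in: all model_* names are hypothetical,
atomics and per-CPU storage are left out, and the VMID width is assumed to be
at most 14 bits so that next_vmid cannot wrap. It mirrors the flow of
vmid_handle_vmenter() and the read-back step of vmidlen_detect(), not the
exact code.

```c
#include <stdbool.h>
#include <stdint.h>

/* RV64 hgatp VMID field; mask and shift values as in riscv_encoding.h. */
#define MODEL_HGATP64_VMID_MASK  0x03FFF00000000000ULL
#define MODEL_HGATP64_VMID_SHIFT 44

struct model_vcpu_vmid {
    uint64_t generation;    /* 0 means "no valid VMID" */
    uint16_t vmid;
};

struct model_hart {
    uint64_t generation;    /* current per-hart generation */
    uint16_t next_vmid;
    uint16_t max_vmid;
    bool disabled;
};

/*
 * Number of implemented VMID bits, given the hgatp value read back after
 * all-ones was written to the VMID field (the patch uses flsl() here).
 */
static unsigned int model_vmidlen(uint64_t hgatp_readback)
{
    uint64_t field = (hgatp_readback & MODEL_HGATP64_VMID_MASK) >>
                     MODEL_HGATP64_VMID_SHIFT;
    unsigned int bits = 0;

    while ( field )
    {
        bits++;
        field >>= 1;
    }

    return bits;
}

/* Assumption: vmid_bits <= 14, the width of the RV64 hgatp VMID field. */
static void model_hart_init(struct model_hart *h, unsigned int vmid_bits)
{
    h->max_vmid = (uint16_t)((1u << vmid_bits) - 1);
    h->disabled = vmid_bits <= 1;
    h->generation = 1;      /* zero is the 'invalid generation' */
    h->next_vmid = 1;       /* zero is the 'VMIDs disabled' value */
}

/* Returns true when the caller must flush the guest TLB before entry. */
static bool model_vmenter(struct model_hart *h, struct model_vcpu_vmid *v)
{
    /* The vCPU still owns a VMID from the current generation. */
    if ( v->generation == h->generation )
        return false;

    /* Out of VMIDs: start a new generation, which makes old ones stale. */
    if ( h->next_vmid > h->max_vmid )
    {
        if ( !h->disabled && ++h->generation == 0 )
            h->disabled = true;     /* generation counter wrapped */

        h->next_vmid = 1;

        if ( h->disabled )
        {
            v->vmid = 0;
            return false;
        }
    }

    v->vmid = h->next_vmid++;
    v->generation = h->generation;

    /* Handing out VMID 1 marks the start of a generation. */
    return v->vmid == 1;
}
```

With two VMID bits (VMIDs 1 to 3), a fourth vCPU entering the hart rolls the
generation over and receives VMID 1 together with a flush request, while a
vCPU holding a VMID from the older generation is given a fresh one on its
next entry.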
Introduce the following things:
 - Update the p2m_domain structure, which describes per-p2m-table state, with:
   - lock to protect updates to the p2m.
   - pool with pages used to construct the p2m.
   - clean_pte, which indicates whether it is required to clean the cache
     when writing an entry.
   - back pointer to the domain structure.
 - p2m_init() to initialize the members introduced in the p2m_domain
   structure.
 - Call of paging_domain_init() in p2m_init() to initialize the paging
   spinlock and freelist head.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
 - s/p2m_type/p2m_types.
 - Drop init. of p2m->clean_pte in p2m_init() as CONFIG_HAS_PASSTHROUGH is
   going to be selected unconditionally. Plus CONFIG_HAS_PASSTHROUGH isn't
   ready to be used for RISC-V.
   Add compilation error to not forget to init p2m->clean_pte.
 - Move definition of p2m->domain up in p2m_init().
 - Add iommu_use_hap_pt() when p2m->clean_pte is initialized.
 - Add the comment above p2m_types member of p2m_domain struct.
 - Add need_flush member to p2m_domain structure.
 - Move introduction of p2m_write_(un)lock() and p2m_tlb_flush_sync()
   to the patch where they are really used:
     xen/riscv: implement guest_physmap_add_entry() for mapping GFNs to MFN
 - Add p2m member to arch_domain structure.
 - Drop p2m_types from struct p2m_domain as P2M type for PTE will be stored
   differently.
 - Drop default_access as it isn't going to be used for now.
 - Move definition of p2m_is_write_locked() to "implement function to map
   memory in guest p2m" where it is really used.
---
Changes in V2:
 - Use the earlier introduced sbi_remote_hfence_gvma_vmid() for a proper
   implementation of p2m_force_tlb_flush_sync(), as TLB flushing needs to
   happen for each pCPU which potentially has cached a mapping, which is
   tracked by d->dirty_cpumask.
 - Drop unnecessary blanks.
 - Fix code style for # of pre-processor directive.
 - Drop max_mapped_gfn and lowest_mapped_gfn as they aren't used now.
 - [p2m_init()] Set p2m->clean_pte=false if CONFIG_HAS_PASSTHROUGH=n.
- [p2m_init()] Update the comment above p2m->domain = d; - Drop p2m->need_flush as it seems to be always true for RISC-V and as a consequence drop p2m_tlb_flush_sync(). - Move to separate patch an introduction of root page table allocation. --- xen/arch/riscv/Makefile | 1 + xen/arch/riscv/include/asm/domain.h | 5 +++++ xen/arch/riscv/include/asm/p2m.h | 34 +++++++++++++++++++++++++++++ xen/arch/riscv/p2m.c | 32 +++++++++++++++++++++++++++ 4 files changed, 72 insertions(+) create mode 100644 xen/arch/riscv/p2m.c diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/Makefile +++ b/xen/arch/riscv/Makefile @@ -XXX,XX +XXX,XX @@ obj-y += intc.o obj-y += irq.o obj-y += mm.o obj-y += pt.o +obj-y += p2m.o obj-$(CONFIG_RISCV_64) += riscv64/ obj-y += sbi.o obj-y += setup.o diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/domain.h +++ b/xen/arch/riscv/include/asm/domain.h @@ -XXX,XX +XXX,XX @@ #include <xen/xmalloc.h> #include <public/hvm/params.h> +#include <asm/p2m.h> + struct vcpu_vmid { uint64_t generation; uint16_t vmid; @@ -XXX,XX +XXX,XX @@ struct arch_vcpu { struct arch_domain { struct hvm_domain hvm; + + /* Virtual MMU */ + struct p2m_domain p2m; }; #include <xen/sched.h> diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/p2m.h +++ b/xen/arch/riscv/include/asm/p2m.h @@ -XXX,XX +XXX,XX @@ #define ASM__RISCV__P2M_H #include <xen/errno.h> +#include <xen/mm.h> +#include <xen/rwlock.h> +#include <xen/types.h> #include <asm/page-bits.h> #define paddr_bits PADDR_BITS +/* Get host p2m table */ +#define p2m_get_hostp2m(d) (&(d)->arch.p2m) + +/* Per-p2m-table state */ +struct p2m_domain { + /* + * Lock that protects updates to the p2m. 
+ */ + rwlock_t lock; + + /* Pages used to construct the p2m */ + struct page_list_head pages; + + /* Indicate if it is required to clean the cache when writing an entry */ + bool clean_pte; + + /* Back pointer to domain */ + struct domain *domain; + + /* + * P2M updates may required TLBs to be flushed (invalidated). + * + * Flushes may be deferred by setting 'need_flush' and then flushing + * when the p2m write lock is released. + * + * If an immediate flush is required (e.g, if a super page is + * shattered), call p2m_tlb_flush_sync(). + */ + bool need_flush; +}; + /* * List of possible type for each page in the p2m entry. * The number of available bit per page in the pte for this purpose is 2 bits. diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ +#include <xen/mm.h> +#include <xen/rwlock.h> +#include <xen/sched.h> + +int p2m_init(struct domain *d) +{ + struct p2m_domain *p2m = p2m_get_hostp2m(d); + + /* + * "Trivial" initialisation is now complete. Set the backpointer so the + * users of p2m could get an access to domain structure. + */ + p2m->domain = d; + + rwlock_init(&p2m->lock); + INIT_PAGE_LIST_HEAD(&p2m->pages); + + /* + * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH + * is not ready for RISC-V support. + * + * When CONFIG_HAS_PASSTHROUGH=y, p2m->clean_pte must be properly + * initialized. + * At the moment, it defaults to false because the p2m structure is + * zero-initialized. + */ +#ifdef CONFIG_HAS_PASSTHROUGH +# error "Add init of p2m->clean_pte" +#endif + + return 0; +} -- 2.50.1
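
The need_flush comment in struct p2m_domain describes deferring TLB flushes
until the p2m write lock is released. A rough, single-threaded model of that
pattern is sketched below. Everything here is an assumption made for
illustration: the model_* names are hypothetical, no real lock is taken, and
the flush is replaced by a counter. The actual p2m_write_(un)lock() and
p2m_tlb_flush_sync() are introduced by a later patch in the series, per the
changelog above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct p2m_domain. */
struct model_p2m {
    bool write_locked;
    bool need_flush;        /* set by updates, cleared by a flush */
    unsigned int flushes;   /* counts flushes instead of issuing them */
};

static void model_write_lock(struct model_p2m *p2m)
{
    p2m->write_locked = true;
}

/* Flush now if any update since the last flush asked for one. */
static void model_tlb_flush_sync(struct model_p2m *p2m)
{
    if ( p2m->need_flush )
    {
        p2m->flushes++;
        p2m->need_flush = false;
    }
}

/* An update only records that a flush is owed; it does not flush. */
static void model_update_entry(struct model_p2m *p2m)
{
    p2m->need_flush = true;
}

/* Releasing the write lock pays off any deferred flush exactly once. */
static void model_write_unlock(struct model_p2m *p2m)
{
    model_tlb_flush_sync(p2m);
    p2m->write_locked = false;
}
```

In this model several updates made under one write lock cost a single flush
at unlock time, while a caller that cannot wait (the superpage shattering case
mentioned in the comment) would call the sync helper directly.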
Implement p2m_set_allocation() to construct p2m pages pool for guests based on required number of pages. This is implemented by: - Adding a `struct paging_domain` which contains a freelist, a counter variable and a spinlock to `struct arch_domain` to indicate the free p2m pages and the number of p2m total pages in the p2m pages pool. - Adding a helper `p2m_set_allocation` to set the p2m pages pool size. This helper should be called before allocating memory for a guest and is called from domain_p2m_set_allocation(), the latter is a part of common dom0less code. - Adding paging_freelist_init() to struct paging_domain. Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> --- Changes in v3: - Drop usage of p2m_ prefix inside struct paging_domain(). - Introduce paging_domain_init() to init paging struct. --- Changes in v2: - Drop the comment above inclusion of <xen/event.h> in riscv/p2m.c. - Use ACCESS_ONCE() for lhs and rhs for the expressions in p2m_set_allocation(). --- xen/arch/riscv/Makefile | 1 + xen/arch/riscv/include/asm/Makefile | 1 - xen/arch/riscv/include/asm/domain.h | 12 ++++++ xen/arch/riscv/include/asm/paging.h | 13 ++++++ xen/arch/riscv/p2m.c | 19 +++++++++ xen/arch/riscv/paging.c | 64 +++++++++++++++++++++++++++++ 6 files changed, 109 insertions(+), 1 deletion(-) create mode 100644 xen/arch/riscv/include/asm/paging.h create mode 100644 xen/arch/riscv/paging.c diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/Makefile +++ b/xen/arch/riscv/Makefile @@ -XXX,XX +XXX,XX @@ obj-y += imsic.o obj-y += intc.o obj-y += irq.o obj-y += mm.o +obj-y += paging.o obj-y += pt.o obj-y += p2m.o obj-$(CONFIG_RISCV_64) += riscv64/ diff --git a/xen/arch/riscv/include/asm/Makefile b/xen/arch/riscv/include/asm/Makefile index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/Makefile +++ b/xen/arch/riscv/include/asm/Makefile @@ -XXX,XX +XXX,XX @@ generic-y += hardirq.h generic-y += hypercall.h 
generic-y += iocap.h generic-y += irq-dt.h -generic-y += paging.h generic-y += percpu.h generic-y += perfc_defn.h generic-y += random.h diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/domain.h +++ b/xen/arch/riscv/include/asm/domain.h @@ -XXX,XX +XXX,XX @@ #ifndef ASM__RISCV__DOMAIN_H #define ASM__RISCV__DOMAIN_H +#include <xen/mm.h> +#include <xen/spinlock.h> #include <xen/xmalloc.h> #include <public/hvm/params.h> @@ -XXX,XX +XXX,XX @@ struct arch_vcpu { struct vcpu_vmid vmid; }; +struct paging_domain { + spinlock_t lock; + /* Free pages from the pre-allocated pool */ + struct page_list_head freelist; + /* Number of pages from the pre-allocated pool */ + unsigned long total_pages; +}; + struct arch_domain { struct hvm_domain hvm; /* Virtual MMU */ struct p2m_domain p2m; + + struct paging_domain paging; }; #include <xen/sched.h> diff --git a/xen/arch/riscv/include/asm/paging.h b/xen/arch/riscv/include/asm/paging.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/riscv/include/asm/paging.h @@ -XXX,XX +XXX,XX @@ +#ifndef ASM_RISCV_PAGING_H +#define ASM_RISCV_PAGING_H + +#include <asm-generic/paging.h> + +struct domain; + +int paging_domain_init(struct domain *d); + +int paging_freelist_init(struct domain *d, unsigned long pages, + bool *preempted); + +#endif /* ASM_RISCV_PAGING_H */ diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ #include <xen/rwlock.h> #include <xen/sched.h> +#include <asm/paging.h> + int p2m_init(struct domain *d) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -XXX,XX +XXX,XX @@ int p2m_init(struct domain *d) */ p2m->domain = d; + paging_domain_init(d); + rwlock_init(&p2m->lock); INIT_PAGE_LIST_HEAD(&p2m->pages); @@ -XXX,XX +XXX,XX @@ int p2m_init(struct domain *d) return 0; } + +/* + * Set the pool of pages 
to the required number of pages. + * Returns 0 for success, non-zero for failure. + * Call with d->arch.paging.lock held. + */ +int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted) +{ + int rc; + + if ( (rc = paging_freelist_init(d, pages, preempted)) ) + return rc; + + return 0; +} diff --git a/xen/arch/riscv/paging.c b/xen/arch/riscv/paging.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/riscv/paging.c @@ -XXX,XX +XXX,XX @@ +#include <xen/event.h> +#include <xen/lib.h> +#include <xen/mm.h> +#include <xen/sched.h> +#include <xen/spinlock.h> + +int paging_freelist_init(struct domain *d, unsigned long pages, + bool *preempted) +{ + struct page_info *pg; + + ASSERT(spin_is_locked(&d->arch.paging.lock)); + + for ( ; ; ) + { + if ( d->arch.paging.total_pages < pages ) + { + /* Need to allocate more memory from domheap */ + pg = alloc_domheap_page(d, MEMF_no_owner); + if ( pg == NULL ) + { + printk(XENLOG_ERR "Failed to allocate pages.\n"); + return -ENOMEM; + } + ACCESS_ONCE(d->arch.paging.total_pages)++; + page_list_add_tail(pg, &d->arch.paging.freelist); + } + else if ( d->arch.paging.total_pages > pages ) + { + /* Need to return memory to domheap */ + pg = page_list_remove_head(&d->arch.paging.freelist); + if ( pg ) + { + ACCESS_ONCE(d->arch.paging.total_pages)--; + free_domheap_page(pg); + } + else + { + printk(XENLOG_ERR + "Failed to free pages, freelist is empty.\n"); + return -ENOMEM; + } + } + else + break; + + /* Check to see if we need to yield and try again */ + if ( preempted && general_preempt_check() ) + { + *preempted = true; + return -ERESTART; + } + } + + return 0; +} +/* Domain paging struct initialization. */ +int paging_domain_init(struct domain *d) +{ + spin_lock_init(&d->arch.paging.lock); + INIT_PAGE_LIST_HEAD(&d->arch.paging.freelist); + + return 0; +} -- 2.50.1
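
The grow/shrink loop of paging_freelist_init() can be tried out in user space
with a small model. The sketch below is a simplification under stated
assumptions: malloc()/free() stand in for the domheap allocator, the list is
LIFO rather than the tail-add/head-remove order used in the patch, the
preemption check and the spinlock are omitted, and every model_* name is
hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct model_page {
    struct model_page *next;
};

struct model_pool {
    struct model_page *freelist;    /* free pages of the pool */
    unsigned long total_pages;      /* free pages plus pages in use */
};

/* Grow or shrink the pool until it holds exactly 'pages' pages. */
static int model_pool_set_size(struct model_pool *pool, unsigned long pages)
{
    while ( pool->total_pages != pages )
    {
        struct model_page *pg;

        if ( pool->total_pages < pages )
        {
            /* Need more memory: take a page from the backing allocator. */
            pg = malloc(sizeof(*pg));
            if ( pg == NULL )
                return -ENOMEM;

            pg->next = pool->freelist;
            pool->freelist = pg;
            pool->total_pages++;
        }
        else
        {
            /* Need to give memory back: only free pages can be returned. */
            pg = pool->freelist;
            if ( pg == NULL )
                return -ENOMEM;     /* remaining pages are in use */

            pool->freelist = pg->next;
            pool->total_pages--;
            free(pg);
        }
    }

    return 0;
}

/* Hypothetical consumer: take one free page, e.g. for a page table. */
static struct model_page *model_pool_take(struct model_pool *pool)
{
    struct model_page *pg = pool->freelist;

    if ( pg != NULL )
        pool->freelist = pg->next;

    return pg;
}
```

The model shows why shrinking can fail with -ENOMEM: total_pages also counts
pages that were handed out, so once the freelist runs dry the pool cannot get
any smaller until those pages come back.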
Introduce support for allocating and initializing the root page table
required for RISC-V stage-2 address translation.

To implement root page table allocation the following is introduced:
 - p2m_get_clean_page() and p2m_alloc_root_table(), p2m_allocate_root()
   helpers to allocate and zero a 16 KiB root page table, as mandated
   by the RISC-V privileged specification for Sv32x4/Sv39x4/Sv48x4/Sv57x4
   modes.
 - Update p2m_init() to initialize p2m_root_order.
 - Add maddr_to_page() and page_to_maddr() macros for easier address
   manipulation.
 - Introduce paging_ret_pages_to_domheap() to return some pages before
   allocating the 16 KiB of pages for the root page table.
 - Allocate root p2m table after p2m pool is initialized.
 - Add construct_hgatp() to construct the hgatp register value based on
   p2m->root, p2m->hgatp_mode and VMID.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
 - Drop inserting of p2m->vmid in hgatp_from_page() as now vmid is allocated
   per-CPU, not per-domain, so it will be inserted later somewhere in
   context_switch or before returning control to a guest.
 - Use BIT() to init nr_pages in p2m_allocate_root() instead of open-coding
   the BIT() macro.
 - Fix order in clear_and_clean_page().
 - s/panic("Specify more xen,domain-p2m-mem-mb\n")/return NULL.
 - Use lock around the procedure of returning back the pages necessary for
   the p2m root table.
 - Update the comment about allocation of page for root page table.
 - Update an argument of hgatp_from_page() to "struct page_info *p2m_root_page"
   to be consistent with the function name.
 - Use p2m_get_hostp2m(d) instead of open-coding it.
 - Update the comment above the call of p2m_alloc_root_table().
 - Update the comments in p2m_allocate_root().
 - Move the part which returns some pages to domheap before root page table
   allocation to paging.c.
 - Pass p2m_domain * instead of struct domain * for p2m_alloc_root_table().
 - Introduce construct_hgatp() instead of hgatp_from_page().
 - Add vmid and hgatp_mode member of struct p2m_domain.
 - Add explanatory comment above clean_dcache_va_range() in
   clear_and_clean_page().
 - Introduce P2M_ROOT_ORDER and P2M_ROOT_PAGES.
 - Drop vmid member from p2m_domain as now we are using per-pCPU
   VMID allocation.
 - Update the declaration of construct_hgatp() to receive VMID as it
   isn't per-VM anymore.
 - Drop hgatp member of p2m_domain struct as, with the new VMID allocation
   scheme, construction of hgatp will be needed more often.
 - Drop is_hardware_domain() case in p2m_allocate_root(), just always
   allocate root using p2m pool pages.
 - Refactor p2m_alloc_root_table() and p2m_alloc_table().
---
Changes in v2:
 - This patch was created from "xen/riscv: introduce things necessary for p2m
   initialization" with the following changes:
   - [clear_and_clean_page()] Add missed call of clean_dcache_va_range().
   - Drop p2m_get_clean_page() as it is going to be used only once to allocate
     root page table. Open-code it explicitly in p2m_allocate_root(). Also,
     it will help avoid duplication of the code connected to order and nr_pages
     of p2m root page table.
   - Instead of using order 2 for alloc_domheap_pages(), use
     get_order_from_bytes(KB(16)).
   - Clear and clean a proper amount of allocated pages in p2m_allocate_root().
   - Drop _info from the function name hgatp_from_page_info() and its argument
     page_info.
   - Introduce HGATP_MODE_MASK and use MASK_INSR() instead of shift to calculate
     value of hgatp.
   - Drop unnecessary parentheses in definition of page_to_maddr().
   - Add support of VMID.
   - Drop TLB flushing in p2m_alloc_root_table() and do that once when VMID
     is re-used. [Look at p2m_alloc_vmid()]
   - Allocate p2m root table after p2m pool is fully initialized: first
     return pages to p2m pool, then allocate p2m root table.
--- xen/arch/riscv/include/asm/mm.h | 4 + xen/arch/riscv/include/asm/p2m.h | 12 +++ xen/arch/riscv/include/asm/paging.h | 2 + xen/arch/riscv/include/asm/riscv_encoding.h | 6 ++ xen/arch/riscv/p2m.c | 90 +++++++++++++++++++++ xen/arch/riscv/paging.c | 30 +++++++ 6 files changed, 144 insertions(+) diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/mm.h +++ b/xen/arch/riscv/include/asm/mm.h @@ -XXX,XX +XXX,XX @@ extern struct page_info *frametable_virt_start; #define mfn_to_page(mfn) (frametable_virt_start + mfn_x(mfn)) #define page_to_mfn(pg) _mfn((pg) - frametable_virt_start) +/* Convert between machine addresses and page-info structures. */ +#define maddr_to_page(ma) mfn_to_page(maddr_to_mfn(ma)) +#define page_to_maddr(pg) mfn_to_maddr(page_to_mfn(pg)) + static inline void *page_to_virt(const struct page_info *pg) { return mfn_to_virt(mfn_x(page_to_mfn(pg))); diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/p2m.h +++ b/xen/arch/riscv/include/asm/p2m.h @@ -XXX,XX +XXX,XX @@ #include <asm/page-bits.h> +extern unsigned int p2m_root_order; +#define P2M_ROOT_ORDER p2m_root_order +#define P2M_ROOT_PAGES BIT(P2M_ROOT_ORDER, U) + #define paddr_bits PADDR_BITS /* Get host p2m table */ @@ -XXX,XX +XXX,XX @@ struct p2m_domain { /* Pages used to construct the p2m */ struct page_list_head pages; + /* The root of the p2m tree. May be concatenated */ + struct page_info *root; + + /* G-stage (stage-2) address translation mode */ + unsigned long hgatp_mode; + /* Indicate if it is required to clean the cache when writing an entry */ bool clean_pte; @@ -XXX,XX +XXX,XX @@ static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx) /* Not supported on RISCV. 
*/ } +unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid); + #endif /* ASM__RISCV__P2M_H */ /* diff --git a/xen/arch/riscv/include/asm/paging.h b/xen/arch/riscv/include/asm/paging.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/paging.h +++ b/xen/arch/riscv/include/asm/paging.h @@ -XXX,XX +XXX,XX @@ int paging_domain_init(struct domain *d); int paging_freelist_init(struct domain *d, unsigned long pages, bool *preempted); +bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages); + #endif /* ASM_RISCV_PAGING_H */ diff --git a/xen/arch/riscv/include/asm/riscv_encoding.h b/xen/arch/riscv/include/asm/riscv_encoding.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/riscv_encoding.h +++ b/xen/arch/riscv/include/asm/riscv_encoding.h @@ -XXX,XX +XXX,XX @@ #define HGATP_MODE_SV48X4 _UL(9) #define HGATP32_MODE_SHIFT 31 +#define HGATP32_MODE_MASK _UL(0x80000000) #define HGATP32_VMID_SHIFT 22 #define HGATP32_VMID_MASK _UL(0x1FC00000) #define HGATP32_PPN _UL(0x003FFFFF) #define HGATP64_MODE_SHIFT 60 +#define HGATP64_MODE_MASK _ULL(0xF000000000000000) #define HGATP64_VMID_SHIFT 44 #define HGATP64_VMID_MASK _ULL(0x03FFF00000000000) #define HGATP64_PPN _ULL(0x00000FFFFFFFFFFF) @@ -XXX,XX +XXX,XX @@ #define HGATP_VMID_SHIFT HGATP64_VMID_SHIFT #define HGATP_VMID_MASK HGATP64_VMID_MASK #define HGATP_MODE_SHIFT HGATP64_MODE_SHIFT +#define HGATP_MODE_MASK HGATP64_MODE_MASK #else #define MSTATUS_SD MSTATUS32_SD #define SSTATUS_SD SSTATUS32_SD @@ -XXX,XX +XXX,XX @@ #define HGATP_VMID_SHIFT HGATP32_VMID_SHIFT #define HGATP_VMID_MASK HGATP32_VMID_MASK #define HGATP_MODE_SHIFT HGATP32_MODE_SHIFT +#define HGATP_MODE_MASK HGATP32_MODE_MASK #endif +#define GUEST_ROOT_PAGE_TABLE_SIZE KB(16) + #define TOPI_IID_SHIFT 16 #define TOPI_IID_MASK 0xfff #define TOPI_IPRIO_MASK 0xff diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX 
+XXX,XX @@ +#include <xen/domain_page.h> #include <xen/mm.h> #include <xen/rwlock.h> #include <xen/sched.h> #include <asm/paging.h> +#include <asm/p2m.h> +#include <asm/riscv_encoding.h> + +unsigned int __read_mostly p2m_root_order; + +static void clear_and_clean_page(struct page_info *page) +{ + clear_domain_page(page_to_mfn(page)); + + /* + * If the IOMMU doesn't support coherent walks and the p2m tables are + * shared between the CPU and IOMMU, it is necessary to clean the + * d-cache. + */ + clean_dcache_va_range(page, PAGE_SIZE); +} + +static struct page_info *p2m_allocate_root(struct domain *d) +{ + struct page_info *page; + + /* + * As mentioned in the Priviliged Architecture Spec (version 20240411) + * in Section 18.5.1, for the paged virtual-memory schemes (Sv32x4, + * Sv39x4, Sv48x4, and Sv57x4), the root page table is 16 KiB and must + * be aligned to a 16-KiB boundary. + */ + page = alloc_domheap_pages(d, P2M_ROOT_ORDER, MEMF_no_owner); + if ( !page ) + return NULL; + + for ( unsigned int i = 0; i < P2M_ROOT_PAGES; i++ ) + clear_and_clean_page(page + i); + + return page; +} + +unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid) +{ + unsigned long ppn; + + ppn = PFN_DOWN(page_to_maddr(p2m->root)) & HGATP_PPN; + + /* TODO: add detection of hgatp_mode instead of hard-coding it. */ +#if RV_STAGE1_MODE == SATP_MODE_SV39 + p2m->hgatp_mode = HGATP_MODE_SV39X4; +#elif RV_STAGE1_MODE == SATP_MODE_SV48 + p2m->hgatp_mode = HGATP_MODE_SV48X4; +#else +# error "add HGATP_MODE" +#endif + + return ppn | MASK_INSR(p2m->hgatp_mode, HGATP_MODE_MASK) | + MASK_INSR(vmid, HGATP_VMID_MASK); +} + +static int p2m_alloc_root_table(struct p2m_domain *p2m) +{ + struct domain *d = p2m->domain; + struct page_info *page; + const unsigned int nr_root_pages = P2M_ROOT_PAGES; + + /* + * Return back nr_root_pages to assure the root table memory is also + * accounted against the P2M pool of the domain. 
+ */ + if ( !paging_ret_pages_to_domheap(d, nr_root_pages) ) + return -ENOMEM; + + page = p2m_allocate_root(d); + if ( !page ) + return -ENOMEM; + + p2m->root = page; + + return 0; +} int p2m_init(struct domain *d) { @@ -XXX,XX +XXX,XX @@ int p2m_init(struct domain *d) # error "Add init of p2m->clean_pte" #endif + p2m_root_order = get_order_from_bytes(GUEST_ROOT_PAGE_TABLE_SIZE); + return 0; } @@ -XXX,XX +XXX,XX @@ int p2m_init(struct domain *d) */ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted) { + struct p2m_domain *p2m = p2m_get_hostp2m(d); int rc; if ( (rc = paging_freelist_init(d, pages, preempted)) ) return rc; + /* + * First, initialize p2m pool. Then allocate the root + * table so that the necessary pages can be returned from the p2m pool, + * since the root table must be allocated using alloc_domheap_pages(...) + * to meet its specific requirements. + */ + if ( !p2m->root ) + p2m_alloc_root_table(p2m); + return 0; } diff --git a/xen/arch/riscv/paging.c b/xen/arch/riscv/paging.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/paging.c +++ b/xen/arch/riscv/paging.c @@ -XXX,XX +XXX,XX @@ int paging_freelist_init(struct domain *d, unsigned long pages, return 0; } + +bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages) +{ + struct page_info *page; + + ASSERT(spin_is_locked(&d->arch.paging.lock)); + + if ( ACCESS_ONCE(d->arch.paging.total_pages) < nr_pages ) + return false; + + for ( unsigned int i = 0; i < nr_pages; i++ ) + { + /* Return memory to domheap. */ + page = page_list_remove_head(&d->arch.paging.freelist); + if( page ) + { + ACCESS_ONCE(d->arch.paging.total_pages)--; + free_domheap_page(page); + } + else + { + printk(XENLOG_ERR + "Failed to free P2M pages, P2M freelist is empty.\n"); + return false; + } + } + + return true; +} + /* Domain paging struct initialization. */ int paging_domain_init(struct domain *d) { -- 2.50.1
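For readers following the hgatp construction above, here is a minimal standalone sketch of the arithmetic, not the Xen code itself. It assumes the RV64 field masks from the riscv_encoding.h hunk, a 4 KiB page size, and a MASK_INSR()-style helper that shifts a value to the lowest set bit of a mask. All sketch_/SKETCH_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* RV64 hgatp field masks, as given in the riscv_encoding.h hunk. */
#define HGATP64_MODE_MASK UINT64_C(0xF000000000000000)
#define HGATP64_VMID_MASK UINT64_C(0x03FFF00000000000)
#define HGATP64_PPN       UINT64_C(0x00000FFFFFFFFFFF)

/* Assumed stand-in for MASK_INSR(): place v at the lowest set bit of m. */
#define SKETCH_MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))

/* Assumption: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical helper: smallest order whose page run covers 'bytes'. */
static unsigned int sketch_order_from_bytes(uint64_t bytes)
{
    unsigned int order = 0;

    assert(bytes != 0);

    while ( (UINT64_C(1) << (order + SKETCH_PAGE_SHIFT)) < bytes )
        order++;

    return order;
}

/* Combine root table address, translation mode and VMID into one value. */
static uint64_t sketch_construct_hgatp(uint64_t root_maddr, uint64_t mode,
                                       uint16_t vmid)
{
    uint64_t ppn = (root_maddr >> SKETCH_PAGE_SHIFT) & HGATP64_PPN;

    /* The root table must sit on a 16 KiB boundary. */
    assert((root_maddr & (UINT64_C(16384) - 1)) == 0);

    return ppn | SKETCH_MASK_INSR(mode, HGATP64_MODE_MASK) |
           SKETCH_MASK_INSR((uint64_t)vmid, HGATP64_VMID_MASK);
}
```

With these assumptions a 16 KiB root table corresponds to order 2 (four 4 KiB pages), which is why the allocation loop clears P2M_ROOT_PAGES consecutive pages.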
Introduce helpers pte_{set,get}_mfn() to simplify setting and getting
the MFN of a PTE.

Also, introduce PTE_PPN_MASK and add BUILD_BUG_ON() to be sure that
PTE_PPN_MASK remains the same for all MMU modes except Sv32.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V3:
- Add Acked-by: Jan Beulich <jbeulich@suse.com>.
---
Changes in V2:
- Patch "[PATCH v1 4/6] xen/riscv: define pt_t and pt_walk_t structures"
  was renamed to "xen/riscv: introduce pte_{set,get}_mfn()" because,
  after dropping the bitfields for the PTE structure, this patch
  introduces only pte_{set,get}_mfn().
- As pt_t and pt_walk_t were dropped, update implementation of
  pte_{set,get}_mfn() to use bit operations and shifts instead of
  bitfields.
- Introduce PTE_PPN_MASK to be able to use MASK_INSR for setting/getting
  PPN.
- Add BUILD_BUG_ON(RV_STAGE1_MODE > SATP_MODE_SV57) to be sure that when
  a new MMU mode is added, someone checks that the PPN is still
  bits 53:10.
---
 xen/arch/riscv/include/asm/page.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/page.h +++ b/xen/arch/riscv/include/asm/page.h @@ -XXX,XX +XXX,XX @@ typedef struct { #endif } pte_t; +#if RV_STAGE1_MODE != SATP_MODE_SV32 +#define PTE_PPN_MASK _UL(0x3FFFFFFFFFFC00) +#else +#define PTE_PPN_MASK _U(0xFFFFFC00) +#endif + +static inline void pte_set_mfn(pte_t *p, mfn_t mfn) +{ + /* + * At the moment the spec provides Sv32 - Sv57. + * If a new MMU mode is added one day, it will be necessary + * to check that the PPN mask still covers bits 53:10.
+ */ + BUILD_BUG_ON(RV_STAGE1_MODE > SATP_MODE_SV57); + + p->pte &= ~PTE_PPN_MASK; + p->pte |= MASK_INSR(mfn_x(mfn), PTE_PPN_MASK); +} + +static inline mfn_t pte_get_mfn(pte_t p) +{ + return _mfn(MASK_EXTR(p.pte, PTE_PPN_MASK)); +} + static inline bool pte_is_valid(pte_t p) { return p.pte & PTE_VALID; -- 2.50.1
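As an illustration of the mask-based accessors above, the following standalone sketch round-trips a frame number through the PPN field (bits 53:10) while leaving the low flag bits alone. It is a sketch under assumptions: the macros mirror the usual MASK_INSR()/MASK_EXTR() idiom, and the sketch_ types and names are hypothetical rather than Xen's.

```c
#include <assert.h>
#include <stdint.h>

/* PPN field for the non-Sv32 modes, as in the page.h hunk. */
#define SKETCH_PTE_PPN_MASK UINT64_C(0x3FFFFFFFFFFC00)

/* Assumed stand-ins for MASK_INSR()/MASK_EXTR(). */
#define SKETCH_MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))
#define SKETCH_MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

typedef struct { uint64_t pte; } sketch_pte_t;

/* Replace only the PPN bits; permission/flag bits stay untouched. */
static void sketch_pte_set_mfn(sketch_pte_t *p, uint64_t mfn)
{
    assert(p);

    p->pte &= ~SKETCH_PTE_PPN_MASK;
    p->pte |= SKETCH_MASK_INSR(mfn, SKETCH_PTE_PPN_MASK);
}

/* Extract the frame number stored in the PPN bits. */
static uint64_t sketch_pte_get_mfn(sketch_pte_t p)
{
    return SKETCH_MASK_EXTR(p.pte, SKETCH_PTE_PPN_MASK);
}
```

Clearing the field before inserting is what makes the setter safe to call on an entry that already holds a frame number.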
- Extended p2m_type_t with additional types: p2m_mmio_direct_io,
  p2m_grant_map_{rw,ro}.
- Added macros to classify memory types: P2M_RAM_TYPES, P2M_GRANT_TYPES.
- Introduced helper predicates: p2m_is_ram(), p2m_is_any_ram().
- Define p2m_mmio_direct as an alias of p2m_mmio_direct_io to tell
  handle_passthrough_prop() from common code how to map device memory.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
- Drop p2m_ram_ro.
- Rename p2m_mmio_direct_dev to p2m_mmio_direct_io to make it more
  RISC-V specific.
- s/p2m_mmio_direct_dev/p2m_mmio_direct_io.
---
Changes in V2:
- Drop stuff connected to foreign mapping as it isn't necessary for
  RISC-V right now.
---
 xen/arch/riscv/include/asm/p2m.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/p2m.h +++ b/xen/arch/riscv/include/asm/p2m.h @@ -XXX,XX +XXX,XX @@ struct p2m_domain { typedef enum { p2m_invalid = 0, /* Nothing mapped here */ p2m_ram_rw, /* Normal read/write domain RAM */ + p2m_mmio_direct_io, /* Read/write mapping of genuine Device MMIO area, + PTE_PBMT_IO will be used for such mappings */ + p2m_ext_storage, /* Following types'll be stored outside PTE bits: */ + p2m_grant_map_rw, /* Read/write grant mapping */ + p2m_grant_map_ro, /* Read-only grant mapping */ } p2m_type_t; +#define p2m_mmio_direct p2m_mmio_direct_io + +/* We use bitmaps and mask to handle groups of types */ +#define p2m_to_mask(t_) BIT(t_, UL) + +/* RAM types, which map to real machine frames */ +#define P2M_RAM_TYPES (p2m_to_mask(p2m_ram_rw)) + +/* Grant mapping types, which map to a real frame in another VM */ +#define P2M_GRANT_TYPES (p2m_to_mask(p2m_grant_map_rw) | \ + p2m_to_mask(p2m_grant_map_ro)) + +/* Useful predicates */ +#define p2m_is_ram(t_) (p2m_to_mask(t_) & P2M_RAM_TYPES) +#define p2m_is_any_ram(t_) (p2m_to_mask(t_) & \ + (P2M_RAM_TYPES | P2M_GRANT_TYPES)) + #include
<xen/p2m-common.h> static inline int get_page_and_type(struct page_info *page, -- 2.50.1
Rename `p2m_mmio_direct_dev` to a more architecture-neutral alias `p2m_mmio_direct` to avoid leaking Arm-specific naming into common Xen code, such as dom0less passthrough property handling. This helps reduce platform-specific terminology in shared logic and improves clarity for future non-Arm ports (e.g. RISC-V or PowerPC). No functional changes — the definition is preserved via a macro alias for Arm. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> --- Changes in v3: - New patch --- xen/arch/arm/include/asm/p2m.h | 2 ++ xen/common/device-tree/dom0less-build.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/include/asm/p2m.h +++ b/xen/arch/arm/include/asm/p2m.h @@ -XXX,XX +XXX,XX @@ typedef enum { p2m_max_real_type, /* Types after this won't be store in the p2m */ } p2m_type_t; +#define p2m_mmio_direct p2m_mmio_direct_dev + /* We use bitmaps and mask to handle groups of types */ #define p2m_to_mask(_t) (1UL << (_t)) diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/device-tree/dom0less-build.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/device-tree/dom0less-build.c +++ b/xen/common/device-tree/dom0less-build.c @@ -XXX,XX +XXX,XX @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo, gaddr_to_gfn(gstart), PFN_DOWN(size), maddr_to_mfn(mstart), - p2m_mmio_direct_dev); + p2m_mmio_direct); if ( res < 0 ) { printk(XENLOG_ERR -- 2.50.1
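The p2m_to_mask() idiom visible in the Arm context above, and reused by the RISC-V p2m.h patch, can be tried out in isolation. The sketch below mirrors the RISC-V type list from this series under hypothetical sketch_ names; the extra sketch_p2m_nr_types sentinel is an addition for bounds checking only and is not part of the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the RISC-V p2m_type_t from this series; names are hypothetical. */
typedef enum {
    sketch_p2m_invalid = 0,
    sketch_p2m_ram_rw,
    sketch_p2m_mmio_direct_io,
    sketch_p2m_ext_storage,
    sketch_p2m_grant_map_rw,
    sketch_p2m_grant_map_ro,
    sketch_p2m_nr_types,        /* sentinel, added for this sketch only */
} sketch_p2m_type_t;

/* One bit per type, so a group of types is just an OR of bits. */
#define SKETCH_TO_MASK(t)  (1UL << (t))

#define SKETCH_RAM_TYPES   (SKETCH_TO_MASK(sketch_p2m_ram_rw))
#define SKETCH_GRANT_TYPES (SKETCH_TO_MASK(sketch_p2m_grant_map_rw) | \
                            SKETCH_TO_MASK(sketch_p2m_grant_map_ro))

/* True only for plain guest RAM. */
static bool sketch_is_ram(sketch_p2m_type_t t)
{
    assert(t < sketch_p2m_nr_types);

    return SKETCH_TO_MASK(t) & SKETCH_RAM_TYPES;
}

/* True for plain RAM and for grant mappings. */
static bool sketch_is_any_ram(sketch_p2m_type_t t)
{
    assert(t < sketch_p2m_nr_types);

    return SKETCH_TO_MASK(t) & (SKETCH_RAM_TYPES | SKETCH_GRANT_TYPES);
}
```

A membership test is then a single AND, and adding a type to a group only touches the group macro.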
Introduce page_set_xenheap_gfn() to encode the GFN associated with a Xen
heap page directly into the type_info field of struct page_info.

Introduce page_get_xenheap_gfn() to retrieve the GFN from a Xen heap page.

Reserve the upper 10 bits of type_info for the usage counter and frame
type; use the remaining lower bits to store the grant table frame GFN.
This is sufficient for all supported RISC-V MMU modes: Sv32 uses 22-bit
GFNs, while Sv39, Sv48, and Sv57 use up to 44-bit GFNs.

Define PGT_gfn_mask and PGT_gfn_width to ensure a consistent bit layout
across all RISC-V MMU modes, avoiding the need for mode-specific ifdefs.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Update the comment above definitions of PGT_gfn_width, PGT_gfn_mask.
- Add page_get_xenheap_gfn().
- Make commit message clearer.
---
Changes in v2:
- These changes were part of "xen/riscv: implement p2m mapping
  functionality". No additional changes were made.
---
 xen/arch/riscv/include/asm/mm.h | 43 ++++++++++++++++++++++++++++++--- 1 file changed, 40 insertions(+), 3 deletions(-) diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/mm.h +++ b/xen/arch/riscv/include/asm/mm.h @@ -XXX,XX +XXX,XX @@ #include <xen/sections.h> #include <xen/types.h> +#include <asm/cmpxchg.h> #include <asm/page.h> #include <asm/page-bits.h> @@ -XXX,XX +XXX,XX @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr) #define PGT_writable_page PG_mask(1, 1) /* has writable mappings? */ #define PGT_type_mask PG_mask(1, 1) /* Bits 31 or 63. */ -/* Count of uses of this frame as its current type. */ -#define PGT_count_width PG_shift(2) -#define PGT_count_mask ((1UL << PGT_count_width) - 1) + /* 9-bit count of uses of this frame as its current type. */ +#define PGT_count_mask PG_mask(0x3FF, 10) + +/* + * Stored in bits [22:0] (Sv32) or [44:0] (Sv39,48,57) GFN if page is + * xenheap page.
+ */ +#define PGT_gfn_width PG_shift(10) +#define PGT_gfn_mask (BIT(PGT_gfn_width, UL) - 1) + +#define PGT_INVALID_XENHEAP_GFN _gfn(PGT_gfn_mask) /* * Page needs to be scrubbed. Since this bit can only be set on a page that is @@ -XXX,XX +XXX,XX @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr) #define PFN_ORDER(pg) ((pg)->v.free.order) +/* + * All accesses to the GFN portion of type_info field should always be + * protected by the P2M lock. In case when it is not feasible to satisfy + * that requirement (risk of deadlock, lock inversion, etc) it is important + * to make sure that all non-protected updates to this field are atomic. + */ +static inline gfn_t page_get_xenheap_gfn(const struct page_info *p) +{ + gfn_t gfn = _gfn(ACCESS_ONCE(p->u.inuse.type_info) & PGT_gfn_mask); + + ASSERT(is_xen_heap_page(p)); + + return gfn_eq(gfn, PGT_INVALID_XENHEAP_GFN) ? INVALID_GFN : gfn; +} + +static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn) +{ + gfn_t gfn_ = gfn_eq(gfn, INVALID_GFN) ? PGT_INVALID_XENHEAP_GFN : gfn; + unsigned long x, nx, y = p->u.inuse.type_info; + + ASSERT(is_xen_heap_page(p)); + + do { + x = y; + nx = (x & ~PGT_gfn_mask) | gfn_x(gfn_); + } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x ); +} + extern unsigned char cpu0_boot_stack[]; void setup_initial_pagetables(void); -- 2.50.1
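The compare-and-exchange loop in page_set_xenheap_gfn() can be illustrated outside Xen with C11 atomics. This is a sketch under assumptions: a 64-bit type_info word, a GFN field width of 64 - 10 = 54 bits, an all-ones value standing in for INVALID_GFN, and atomic_compare_exchange_weak() in place of Xen's cmpxchg(). All sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 64-bit type_info with the upper 10 bits reserved. */
#define SKETCH_GFN_WIDTH   54
#define SKETCH_GFN_MASK    ((UINT64_C(1) << SKETCH_GFN_WIDTH) - 1)
/* Assumption: the invalid GFN is the all-ones value. */
#define SKETCH_INVALID_GFN UINT64_MAX

/* Read the GFN field, translating the in-field marker back to "invalid". */
static uint64_t sketch_get_gfn(_Atomic uint64_t *type_info)
{
    uint64_t gfn = atomic_load(type_info) & SKETCH_GFN_MASK;

    return gfn == SKETCH_GFN_MASK ? SKETCH_INVALID_GFN : gfn;
}

/* Update only the GFN field, retrying if another writer raced with us. */
static void sketch_set_gfn(_Atomic uint64_t *type_info, uint64_t gfn)
{
    uint64_t stored = (gfn == SKETCH_INVALID_GFN) ? SKETCH_GFN_MASK : gfn;
    uint64_t old = atomic_load(type_info);
    uint64_t next;

    assert(stored <= SKETCH_GFN_MASK);

    do {
        next = (old & ~SKETCH_GFN_MASK) | stored;
        /* On failure 'old' is refreshed with the current value. */
    } while ( !atomic_compare_exchange_weak(type_info, &old, next) );
}
```

The retry loop is what keeps concurrent updates to the count and type bits in the upper part of the word from being lost when the GFN is rewritten.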
Implement map_regions_p2mt() to map a region in the guest p2m with a
specific p2m type. The memory attributes will be derived from the p2m
type. This function is going to be called from dom0less common code.

To implement it, introduce:
- p2m_write_(un)lock() to ensure safe concurrent updates to the P2M.
  As part of this change, introduce p2m_tlb_flush_sync() and
  p2m_force_tlb_flush_sync().
- A stub for p2m_set_range() to map a range of GFNs to MFNs.
- p2m_insert_mapping().
- p2m_is_write_locked().

Drop guest_physmap_add_entry() and call map_regions_p2mt() directly from
guest_physmap_add_page(), making guest_physmap_add_entry() unnecessary.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Introduce p2m_write_lock() and p2m_is_write_locked().
- Introduce p2m_force_tlb_flush_sync() and p2m_tlb_flush_sync() to flush
  TLBs after p2m table update.
- Change an argument of p2m_insert_mapping() from struct domain *d to
  p2m_domain *p2m.
- Drop guest_physmap_add_entry() and use map_regions_p2mt() to define
  guest_physmap_add_page().
- Add declaration of map_regions_p2mt() to asm/p2m.h.
- Rewrite commit message and subject.
- Drop p2m_access_t related stuff.
- Add definition of p2m_is_write_locked().
---
Changes in v2:
- These changes were part of "xen/riscv: implement p2m mapping
  functionality". No additional significant changes were made.
--- xen/arch/riscv/include/asm/p2m.h | 31 ++++++++++----- xen/arch/riscv/p2m.c | 65 ++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+), 9 deletions(-) diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/p2m.h +++ b/xen/arch/riscv/include/asm/p2m.h @@ -XXX,XX +XXX,XX @@ static inline int guest_physmap_mark_populate_on_demand(struct domain *d, return -EOPNOTSUPP; } -static inline int guest_physmap_add_entry(struct domain *d, - gfn_t gfn, mfn_t mfn, - unsigned long page_order, - p2m_type_t t) -{ - BUG_ON("unimplemented"); - return -EINVAL; -} +/* + * Map a region in the guest p2m with a specific p2m type. + * The memory attributes will be derived from the p2m type. + */ +int map_regions_p2mt(struct domain *d, + gfn_t gfn, + unsigned long nr, + mfn_t mfn, + p2m_type_t p2mt); /* Untyped version for RAM only, for compatibility */ static inline int __must_check guest_physmap_add_page(struct domain *d, gfn_t gfn, mfn_t mfn, unsigned int page_order) { - return guest_physmap_add_entry(d, gfn, mfn, page_order, p2m_ram_rw); + return map_regions_p2mt(d, gfn, BIT(page_order, UL), mfn, p2m_ram_rw); } static inline mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn) @@ -XXX,XX +XXX,XX @@ static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx) /* Not supported on RISCV. */ } +static inline void p2m_write_lock(struct p2m_domain *p2m) +{ + write_lock(&p2m->lock); +} + +void p2m_write_unlock(struct p2m_domain *p2m); + +static inline int p2m_is_write_locked(struct p2m_domain *p2m) +{ + return rw_is_write_locked(&p2m->lock); +} + unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid); #endif /* ASM__RISCV__P2M_H */ diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ unsigned int __read_mostly p2m_root_order; +/* + * Force a synchronous P2M TLB flush. 
+ * + * Must be called with the p2m lock held. + */ +static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m) +{ + struct domain *d = p2m->domain; + + ASSERT(p2m_is_write_locked(p2m)); + + sbi_remote_hfence_gvma(d->dirty_cpumask, 0, 0); + + p2m->need_flush = false; +} + +void p2m_tlb_flush_sync(struct p2m_domain *p2m) +{ + if ( p2m->need_flush ) + p2m_force_tlb_flush_sync(p2m); +} + +/* Unlock the P2M and do a P2M TLB flush if necessary */ +void p2m_write_unlock(struct p2m_domain *p2m) +{ + /* + * The final flush is done with the P2M write lock taken to avoid + * someone else modifying the P2M before the TLB invalidation has + * completed. + */ + p2m_tlb_flush_sync(p2m); + + write_unlock(&p2m->lock); +} + static void clear_and_clean_page(struct page_info *page) { clear_domain_page(page_to_mfn(page)); @@ -XXX,XX +XXX,XX @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted) return 0; } + +static int p2m_set_range(struct p2m_domain *p2m, + gfn_t sgfn, + unsigned long nr, + mfn_t smfn, + p2m_type_t t) +{ + return -EOPNOTSUPP; +} + +static int p2m_insert_mapping(struct p2m_domain *p2m, gfn_t start_gfn, + unsigned long nr, mfn_t mfn, p2m_type_t t) +{ + int rc; + + p2m_write_lock(p2m); + rc = p2m_set_range(p2m, start_gfn, nr, mfn, t); + p2m_write_unlock(p2m); + + return rc; +} + +int map_regions_p2mt(struct domain *d, + gfn_t gfn, + unsigned long nr, + mfn_t mfn, + p2m_type_t p2mt) +{ + return p2m_insert_mapping(p2m_get_hostp2m(d), gfn, nr, mfn, p2mt); +} -- 2.50.1
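The lock/modify/unlock-with-flush flow of p2m_insert_mapping() can be modelled with a toy state machine. The sketch below is only a model: the lock is a boolean, the remote fence is a counter, and sketch_set_range() is assumed to mark the P2M dirty whenever it touches at least one page (in this patch the real p2m_set_range() is still a stub). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the p2m state that matters for the flush protocol. */
struct sketch_p2m {
    bool write_locked;      /* stands in for the rwlock */
    bool need_flush;        /* set when an entry was changed */
    unsigned int flushes;   /* counts simulated remote fences */
};

static void sketch_write_lock(struct sketch_p2m *p2m)
{
    assert(!p2m->write_locked);
    p2m->write_locked = true;
}

/* Flush unconditionally; only legal while the write lock is held. */
static void sketch_force_flush(struct sketch_p2m *p2m)
{
    assert(p2m->write_locked);
    p2m->flushes++;
    p2m->need_flush = false;
}

/* Flush before dropping the lock so nobody sees a stale translation. */
static void sketch_write_unlock(struct sketch_p2m *p2m)
{
    if ( p2m->need_flush )
        sketch_force_flush(p2m);
    p2m->write_locked = false;
}

/* Assumption: any non-empty update marks the P2M as needing a flush. */
static int sketch_set_range(struct sketch_p2m *p2m, unsigned long nr)
{
    assert(p2m->write_locked);
    if ( nr )
        p2m->need_flush = true;
    return 0;
}

static int sketch_insert_mapping(struct sketch_p2m *p2m, unsigned long nr)
{
    int rc;

    sketch_write_lock(p2m);
    rc = sketch_set_range(p2m, nr);
    sketch_write_unlock(p2m);

    return rc;
}
```

Deferring the flush to unlock time means a batch of entry updates under one lock hold costs a single fence rather than one per entry.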
This patch introduces p2m_set_range() and its core helper p2m_set_entry()
for RISC-V, based loosely on the Arm implementation, with several
RISC-V-specific modifications.

The main changes are:
- Simplification of the Break-Before-Make (BBM) approach, as according to
  the RISC-V spec:
    It is permitted for multiple address-translation cache entries to
    co-exist for the same address. This represents the fact that in a
    conventional TLB hierarchy, it is possible for multiple entries to
    match a single address if, for example, a page is upgraded to a
    superpage without first clearing the original non-leaf PTE’s valid
    bit and executing an SFENCE.VMA with rs1=x0, or if multiple TLBs
    exist in parallel at a given level of the hierarchy. In this case,
    just as if an SFENCE.VMA is not executed between a write to the
    memory-management tables and subsequent implicit read of the same
    address: it is unpredictable whether the old non-leaf PTE or the new
    leaf PTE is used, but the behavior is otherwise well defined.
  In contrast to the Arm architecture, where BBM is mandatory and failing
  to use it in some cases can lead to CPU instability, RISC-V guarantees
  stability, and the behavior remains safe, though unpredictable in terms
  of which translation will be used.
- Unlike Arm, the valid bit is not repurposed for other uses in this
  implementation. Instead, entry validity is determined based solely on
  the P2M PTE's valid bit.

The main functionality is in p2m_set_entry(), which handles mappings
aligned to page table block entries (e.g., 1GB, 2MB, or 4KB with 4KB
granularity).

p2m_set_range() breaks a region down into block-aligned mappings and
calls p2m_set_entry() accordingly.

Stub implementations (to be completed later) include:
- p2m_free_subtree()
- p2m_next_level()
- p2m_pte_from_mfn()

Note: Support for shattering block entries is not implemented in this
patch and will be added separately.

Additionally, some straightforward helper functions are now implemented:
- p2m_write_pte()
- p2m_remove_pte()
- p2m_get_root_pointer()

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
- Drop p2m_access_t connected stuff as it isn't going to be used, at
  least for now.
- Move definition of P2M_ROOT_ORDER and P2M_ROOT_PAGES to earlier patches.
- Update the comment above lowest_mapped_gfn declaration.
- Update the comment above p2m_get_root_pointer():
  s/"...offset of the root table"/"...offset into root table".
- s/p2m_remove_pte/p2m_clean_pte.
- Use plain 0 instead of 0x00 in p2m_clean_pte().
- s/p2m_entry_from_mfn/p2m_pte_from_mfn.
- s/GUEST_TABLE_*/P2M_TABLE_*.
- Update the comment above p2m_next_level(): "GFN entry" ->
  "corresponding the entry corresponding to the GFN".
- s/__p2m_set_entry/_p2m_set_entry.
- Drop "s" for sgfn and smfn prefixes of _p2m_set_entry()'s arguments as
  this function works only with one GFN and one MFN.
- Return the correct return code when p2m_next_level() failed in
  _p2m_set_entry(), also drop "else" and just handle the case
  (rc != P2M_TABLE_NORMAL) separately.
- Code style fixes.
- Use unsigned int for "order" in p2m_set_entry().
- s/p2m_set_entry/p2m_free_subtree.
- Update ASSERT() in __p2m_set_entry() to check that page_order is
  properly aligned.
- Return -EACCES instead of -ENOMEM in the case when the domain is dying
  and someone called p2m_set_entry.
- s/p2m_set_entry/p2m_set_range.
- s/__p2m_set_entry/p2m_set_entry
- s/p2me_is_valid/p2m_is_valid()
- Return the number of successfully mapped GFNs in case not all were
  mapped in p2m_set_range().
- Use BIT(order, UL) instead of 1 << order.
- Drop IOMMU flushing code from p2m_set_entry().
- Set p2m->need_flush=true when an entry in p2m_set_entry() is changed.
- Introduce p2m_mapping_order() to support superpages.
- Drop p2m_is_valid() and use pte_is_valid() instead as there are no
  tricks with copying of the valid bit anymore.
- Update p2m_pte_from_mfn() prototype: drop p2m argument.
---
Changes in V2:
- New patch. It was a part of a big patch "xen/riscv: implement p2m
  mapping functionality" which was split into smaller ones.
- Update the way the p2m TLB is flushed:
- RISC-V doesn't require BBM so there is no need to remove the PTE
  before making a new one, so drop 'if /*pte_is_valid(orig_pte) */' and
  remove the PTE only when removal has been requested.
- Drop p2m->need_flush |= !!pte_is_valid(orig_pte); for the case when a
  PTE is being removed, as RISC-V could cache an invalid PTE and
  therefore a flush is required each time, regardless of whether the PTE
  is valid at the moment of removal.
- Drop the check of whether the PTE is valid when a PTE is modified: as
  mentioned above, BBM isn't required, so TLB flushing can be deferred
  and there is no need to do it before modifying the PTE.
- Drop p2m->need_flush as it seems like it will always be true.
- Drop foreign mapping things as they aren't necessary for RISC-V right
  now.
- s/p2m_is_valid/p2me_is_valid.
- Move definition and initialization of
  p2m->{max_mapped_gfn,lowest_mapped_gfn} to this patch.
---
 xen/arch/riscv/include/asm/p2m.h | 12 ++ xen/arch/riscv/p2m.c | 250 ++++++++++++++++++++++++++++++- 2 files changed, 261 insertions(+), 1 deletion(-) diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/p2m.h +++ b/xen/arch/riscv/include/asm/p2m.h @@ -XXX,XX +XXX,XX @@ #include <xen/rwlock.h> #include <xen/types.h> +#include <asm/page.h> #include <asm/page-bits.h> extern unsigned int p2m_root_order; #define P2M_ROOT_ORDER p2m_root_order #define P2M_ROOT_PAGES BIT(P2M_ROOT_ORDER, U) +#define P2M_ROOT_LEVEL HYP_PT_ROOT_LEVEL #define paddr_bits PADDR_BITS @@ -XXX,XX +XXX,XX @@ struct p2m_domain { * shattered), call p2m_tlb_flush_sync(). */ bool need_flush; + + /* Highest guest frame that's ever been mapped in the p2m */ + gfn_t max_mapped_gfn; + + /* + * Lowest mapped gfn in the p2m.
When releasing mapped gfn's in a + * preemptible manner this is updated to track where to resume + * the search. Apart from during teardown this can only decrease. + */ + gfn_t lowest_mapped_gfn; }; /* diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ #include <xen/rwlock.h> #include <xen/sched.h> +#include <asm/page.h> #include <asm/paging.h> #include <asm/p2m.h> #include <asm/riscv_encoding.h> @@ -XXX,XX +XXX,XX @@ int p2m_init(struct domain *d) rwlock_init(&p2m->lock); INIT_PAGE_LIST_HEAD(&p2m->pages); + p2m->max_mapped_gfn = _gfn(0); + p2m->lowest_mapped_gfn = _gfn(ULONG_MAX); + /* * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH * is not ready for RISC-V support. @@ -XXX,XX +XXX,XX @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted) return 0; } +/* + * Find and map the root page table. The caller is responsible for + * unmapping the table. + * + * The function will return NULL if the offset into the root table is + * invalid. 
+ */ +static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn) +{ + unsigned long root_table_indx; + + root_table_indx = gfn_x(gfn) >> XEN_PT_LEVEL_ORDER(P2M_ROOT_LEVEL); + if ( root_table_indx >= P2M_ROOT_PAGES ) + return NULL; + + return __map_domain_page(p2m->root + root_table_indx); +} + +static inline void p2m_write_pte(pte_t *p, pte_t pte, bool clean_pte) +{ + write_pte(p, pte); + if ( clean_pte ) + clean_dcache_va_range(p, sizeof(*p)); +} + +static inline void p2m_clean_pte(pte_t *p, bool clean_pte) +{ + pte_t pte; + + memset(&pte, 0, sizeof(pte)); + p2m_write_pte(p, pte, clean_pte); +} + +static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t) +{ + panic("%s: hasn't been implemented yet\n", __func__); + + return (pte_t) { .pte = 0 }; +} + +#define P2M_TABLE_MAP_NONE 0 +#define P2M_TABLE_MAP_NOMEM 1 +#define P2M_TABLE_SUPER_PAGE 2 +#define P2M_TABLE_NORMAL 3 + +/* + * Take the currently mapped table, find the entry + * corresponding to the GFN, and map the next table, if available. + * The previous table will be unmapped if the next level was mapped + * (e.g P2M_TABLE_NORMAL returned). + * + * `alloc_tbl` parameter indicates whether intermediate tables should + * be allocated when not present. + * + * Return values: + * P2M_TABLE_MAP_NONE: a table allocation isn't permitted. + * P2M_TABLE_MAP_NOMEM: allocating a new page failed. + * P2M_TABLE_SUPER_PAGE: The next entry points to a superpage. + * P2M_TABLE_NORMAL: next level or leaf mapped normally. + */ +static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl, + unsigned int level, pte_t **table, + unsigned int offset) +{ + panic("%s: hasn't been implemented yet\n", __func__); + + return P2M_TABLE_MAP_NONE; +} + +/* Free pte sub-tree behind an entry */ +static void p2m_free_subtree(struct p2m_domain *p2m, + pte_t entry, unsigned int level) +{ + panic("%s: hasn't been implemented yet\n", __func__); +} + +/* + * Insert an entry in the p2m.
This should be called with a mapping + * equal to a page/superpage. + */ +static int p2m_set_entry(struct p2m_domain *p2m, + gfn_t gfn, + unsigned long page_order, + mfn_t mfn, + p2m_type_t t) +{ + unsigned int level; + unsigned int target = page_order / PAGETABLE_ORDER; + pte_t *entry, *table, orig_pte; + int rc; + /* A mapping is removed if the MFN is invalid. */ + bool removing_mapping = mfn_eq(mfn, INVALID_MFN); + DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn)); + + ASSERT(p2m_is_write_locked(p2m)); + + /* + * Check if the level target is valid: we only support + * 4K - 2M - 1G mapping. + */ + ASSERT((target <= 2) && !(page_order % PAGETABLE_ORDER)); + + table = p2m_get_root_pointer(p2m, gfn); + if ( !table ) + return -EINVAL; + + for ( level = P2M_ROOT_LEVEL; level > target; level-- ) + { + /* + * Don't try to allocate intermediate page table if the mapping + * is about to be removed. + */ + rc = p2m_next_level(p2m, !removing_mapping, + level, &table, offsets[level]); + if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) ) + { + rc = (rc == P2M_TABLE_MAP_NONE) ? -ENOENT : -ENOMEM; + /* + * We are here because p2m_next_level has failed to map + * the intermediate page table (e.g the table does not exist + * and the p2m tree is read-only). It is a valid case + * when removing a mapping as it may not exist in the + * page table. In this case, just ignore it. + */ + rc = removing_mapping ? 0 : rc; + goto out; + } + + if ( rc != P2M_TABLE_NORMAL ) + break; + } + + entry = table + offsets[level]; + + /* + * If we are here with level > target, we must be at a leaf node, + * and we need to break up the superpage. + */ + if ( level > target ) + { + panic("Shattering isn't implemented\n"); + } + + /* + * We should always be there with the correct level because all the + * intermediate tables have been installed if necessary.
+ */ + ASSERT(level == target); + + orig_pte = *entry; + + if ( removing_mapping ) + p2m_clean_pte(entry, p2m->clean_pte); + else + { + pte_t pte = p2m_pte_from_mfn(mfn, t); + + p2m_write_pte(entry, pte, p2m->clean_pte); + + p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn, + gfn_add(gfn, BIT(page_order, UL) - 1)); + p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, gfn); + } + + p2m->need_flush = true; + + /* + * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH + * is not ready for RISC-V support. + * + * When CONFIG_HAS_PASSTHROUGH=y, iommu_iotlb_flush() should be done + * here. + */ +#ifdef CONFIG_HAS_PASSTHROUGH +# error "add code to flush IOMMU TLB" +#endif + + rc = 0; + + /* + * Free the entry only if the original pte was valid and the base + * is different (to avoid freeing when permission is changed). + */ + if ( pte_is_valid(orig_pte) && + !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ) + p2m_free_subtree(p2m, orig_pte, level); + + out: + unmap_domain_page(table); + + return rc; +} + +/* Return mapping order for given gfn, mfn and nr */ +static unsigned long p2m_mapping_order(gfn_t gfn, mfn_t mfn, unsigned long nr) +{ + unsigned long mask; + /* 1gb, 2mb, 4k mappings are supported */ + unsigned int level = min(P2M_ROOT_LEVEL, 2); + unsigned long order = 0; + + mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0; + mask |= gfn_x(gfn); + + for ( ; level != 0; level-- ) + { + if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(level), UL) - 1)) && + (nr >= BIT(XEN_PT_LEVEL_ORDER(level), UL)) ) + { + order = XEN_PT_LEVEL_ORDER(level); + break; + } + } + + return order; +} + static int p2m_set_range(struct p2m_domain *p2m, gfn_t sgfn, unsigned long nr, mfn_t smfn, p2m_type_t t) { - return -EOPNOTSUPP; + int rc = 0; + unsigned long left = nr; + + /* + * Any reference taken by the P2M mappings (e.g. foreign mapping) will + * be dropped in relinquish_p2m_mapping(). 
As the P2M will still + * be accessible after, we need to prevent mapping to be added when the + * domain is dying. + */ + if ( unlikely(p2m->domain->is_dying) ) + return -EACCES; + + while ( left ) + { + unsigned long order = p2m_mapping_order(sgfn, smfn, left); + + rc = p2m_set_entry(p2m, sgfn, order, smfn, t); + if ( rc ) + break; + + sgfn = gfn_add(sgfn, BIT(order, UL)); + if ( !mfn_eq(smfn, INVALID_MFN) ) + smfn = mfn_add(smfn, BIT(order, UL)); + + left -= BIT(order, UL); + } + + return !left ? 0 : left == nr ? rc : (nr - left); } static int p2m_insert_mapping(struct p2m_domain *p2m, gfn_t start_gfn, -- 2.50.1
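The order selection done by p2m_mapping_order() and the loop in p2m_set_range() can be tried outside of Xen. What follows is only a minimal sketch, assuming Sv39-style levels with 9 index bits each; LEVEL_ORDER, NO_MFN, mapping_order() and entries_needed() are stand-in names made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 9 index bits per page table level, as on Sv39. */
#define PAGETABLE_ORDER 9
#define LEVEL_ORDER(lvl) ((lvl) * PAGETABLE_ORDER)
/* Hypothetical stand-in for INVALID_MFN (used when removing mappings). */
#define NO_MFN UINT64_MAX

/* Pick the largest order (1G, 2M, else 4K) that gfn, mfn and nr all allow. */
static unsigned int mapping_order(uint64_t gfn, uint64_t mfn, uint64_t nr)
{
    /* An invalid MFN imposes no alignment constraint. */
    uint64_t mask = (mfn != NO_MFN ? mfn : 0) | gfn;
    unsigned int level;

    for ( level = 2; level != 0; level-- )
    {
        uint64_t pages = UINT64_C(1) << LEVEL_ORDER(level);

        if ( !(mask & (pages - 1)) && nr >= pages )
            return LEVEL_ORDER(level);
    }

    return 0;
}

/* Count the entries a range would need, stepping like p2m_set_range(). */
static unsigned int entries_needed(uint64_t gfn, uint64_t mfn, uint64_t nr)
{
    unsigned int n = 0;

    while ( nr )
    {
        uint64_t pages = UINT64_C(1) << mapping_order(gfn, mfn, nr);

        gfn += pages;
        if ( mfn != NO_MFN )
            mfn += pages;
        nr -= pages;
        n++;
    }

    return n;
}
```

A range that starts one page before a 2M boundary therefore costs one 4K entry followed by one 2M entry, instead of 513 separate 4K entries.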
This patch introduces a working implementation of p2m_free_subtree() for
RISC-V, based on Arm's implementation of p2m_free_entry(), enabling proper
cleanup of page table entries in the P2M (physical-to-machine) mapping.

Only a few things are changed compared to Arm:
- Introduce and use p2m_get_type() to get the type of a p2m entry, as
  RISC-V's PTE doesn't have enough space to store all necessary types, so
  some types are stored outside the PTE. At the moment, only types which
  fit into the PTE's bits are handled.

Key additions include:
- p2m_free_subtree(): Recursively frees page table entries at all levels.
  It handles both regular and superpage mappings and ensures that TLB
  entries are flushed before freeing intermediate tables.
- p2m_put_page() and helpers:
  - p2m_put_4k_page(): Clears GFN from xenheap pages if applicable.
  - p2m_put_2m_superpage(): Releases foreign page references in a 2MB
    superpage.
- p2m_get_type(): Extracts the stored p2m_type from the PTE bits.
- p2m_free_page(): Returns a page to a domain's freelist.
- Introduce p2m_is_foreign() and the related foreign mapping types.

Define XEN_PT_ENTRIES in asm/page.h to simplify loops over page table
entries.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
- Use p2m_tlb_flush_sync(p2m) instead of p2m_force_tlb_flush_sync() in
  p2m_free_subtree().
- Drop p2m_is_valid() implementation as pte_is_valid() is going to be used
  instead.
- Drop p2m_is_superpage() and introduce pte_is_superpage() instead.
- s/p2m_free_entry/p2m_free_subtree.
- s/p2m_type_radix_get/p2m_get_type.
- Update implementation of p2m_get_type() to get the type from PTE bits;
  other cases will be covered in a separate patch. This requires the
  introduction of a new P2M_TYPE_PTE_BITS_MASK macro.
- Drop p2m argument of p2m_get_type() as it isn't needed anymore.
- Put cheapest checks first in p2m_is_superpage().
- Use switch() in p2m_put_page().
- Update the comment in p2m_put_foreign_page().
- Code style fixes.
- Move p2m_foreign stuff to this commit.
- Drop p2m argument of p2m_put_page() as it isn't used anymore.
---
Changes in V2:
- New patch. It was a part of a big patch "xen/riscv: implement p2m mapping
  functionality" which was split into smaller ones.
- s/p2m_is_superpage/p2me_is_superpage.
---
 xen/arch/riscv/include/asm/p2m.h    |  18 +++-
 xen/arch/riscv/include/asm/page.h   |   6 ++
 xen/arch/riscv/include/asm/paging.h |   2 +
 xen/arch/riscv/p2m.c                | 137 +++++++++++++++++++++++++++-
 xen/arch/riscv/paging.c             |   7 ++
 5 files changed, 168 insertions(+), 2 deletions(-)

diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -XXX,XX +XXX,XX @@ typedef enum {
     p2m_ext_storage, /* Following types'll be stored outsude PTE bits: */
     p2m_grant_map_rw, /* Read/write grant mapping */
     p2m_grant_map_ro, /* Read-only grant mapping */
+    p2m_map_foreign_rw, /* Read/write RAM pages from foreign domain */
+    p2m_map_foreign_ro, /* Read-only RAM pages from foreign domain */
 } p2m_type_t;
 
 #define p2m_mmio_direct p2m_mmio_direct_io
 
+/*
+ * Bits 8 and 9 are reserved for use by supervisor software;
+ * the implementation shall ignore this field.
+ * We use these bits to store frequently used types, to avoid
+ * getting/setting the type from the radix tree.
+ */ +#define P2M_TYPE_PTE_BITS_MASK 0x300 + /* We use bitmaps and mask to handle groups of types */ #define p2m_to_mask(t_) BIT(t_, UL) @@ -XXX,XX +XXX,XX @@ typedef enum { #define P2M_GRANT_TYPES (p2m_to_mask(p2m_grant_map_rw) | \ p2m_to_mask(p2m_grant_map_ro)) + /* Foreign mappings types */ +#define P2M_FOREIGN_TYPES (p2m_to_mask(p2m_map_foreign_rw) | \ + p2m_to_mask(p2m_map_foreign_ro)) + /* Useful predicates */ #define p2m_is_ram(t_) (p2m_to_mask(t_) & P2M_RAM_TYPES) #define p2m_is_any_ram(t_) (p2m_to_mask(t_) & \ - (P2M_RAM_TYPES | P2M_GRANT_TYPES)) + (P2M_RAM_TYPES | P2M_GRANT_TYPES | \ + P2M_FOREIGN_TYPES)) +#define p2m_is_foreign(t_) (p2m_to_mask(t_) & P2M_FOREIGN_TYPES) #include <xen/p2m-common.h> diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/page.h +++ b/xen/arch/riscv/include/asm/page.h @@ -XXX,XX +XXX,XX @@ #define XEN_PT_LEVEL_SIZE(lvl) (_AT(paddr_t, 1) << XEN_PT_LEVEL_SHIFT(lvl)) #define XEN_PT_LEVEL_MAP_MASK(lvl) (~(XEN_PT_LEVEL_SIZE(lvl) - 1)) #define XEN_PT_LEVEL_MASK(lvl) (VPN_MASK << XEN_PT_LEVEL_SHIFT(lvl)) +#define XEN_PT_ENTRIES (_AT(unsigned int, 1) << PAGETABLE_ORDER) /* * PTE format: @@ -XXX,XX +XXX,XX @@ static inline bool pte_is_mapping(pte_t p) return (p.pte & PTE_VALID) && (p.pte & PTE_ACCESS_MASK); } +static inline bool pte_is_superpage(pte_t p, unsigned int level) +{ + return (level > 0) && pte_is_mapping(p); +} + static inline int clean_and_invalidate_dcache_va_range(const void *p, unsigned long size) { diff --git a/xen/arch/riscv/include/asm/paging.h b/xen/arch/riscv/include/asm/paging.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/paging.h +++ b/xen/arch/riscv/include/asm/paging.h @@ -XXX,XX +XXX,XX @@ int paging_freelist_init(struct domain *d, unsigned long pages, bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages); +void paging_free_page(struct domain *d, struct page_info *pg); + #endif /* 
ASM_RISCV_PAGING_H */ diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn) return __map_domain_page(p2m->root + root_table_indx); } +static p2m_type_t p2m_get_type(const pte_t pte) +{ + p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK); + + if ( type == p2m_ext_storage ) + panic("unimplemented\n"); + + return type; +} + static inline void p2m_write_pte(pte_t *p, pte_t pte, bool clean_pte) { write_pte(p, pte); @@ -XXX,XX +XXX,XX @@ static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl, return P2M_TABLE_MAP_NONE; } +static void p2m_put_foreign_page(struct page_info *pg) +{ + /* + * It’s safe to call put_page() here because arch_flush_tlb_mask() + * will be invoked if the page is reallocated before the end of + * this loop, which will trigger a flush of the guest TLBs. + */ + put_page(pg); +} + +/* Put any references on the single 4K page referenced by mfn. */ +static void p2m_put_4k_page(mfn_t mfn, p2m_type_t type) +{ + /* TODO: Handle other p2m types */ + + if ( p2m_is_foreign(type) ) + { + ASSERT(mfn_valid(mfn)); + p2m_put_foreign_page(mfn_to_page(mfn)); + } + + /* + * Detect the xenheap page and mark the stored GFN as invalid. + * We don't free the underlying page until the guest requested to do so. + * So we only need to tell the page is not mapped anymore in the P2M by + * marking the stored GFN as invalid. + */ + if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) ) + page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN); +} + +/* Put any references on the superpage referenced by mfn. 
*/ +static void p2m_put_2m_superpage(mfn_t mfn, p2m_type_t type) +{ + struct page_info *pg; + unsigned int i; + + ASSERT(mfn_valid(mfn)); + + pg = mfn_to_page(mfn); + + for ( i = 0; i < XEN_PT_ENTRIES; i++, pg++ ) + p2m_put_foreign_page(pg); +} + +/* Put any references on the page referenced by pte. */ +static void p2m_put_page(const pte_t pte, unsigned int level) +{ + mfn_t mfn = pte_get_mfn(pte); + p2m_type_t p2m_type = p2m_get_type(pte); + + ASSERT(pte_is_valid(pte)); + + /* + * TODO: Currently we don't handle level 2 super-page, Xen is not + * preemptible and therefore some work is needed to handle such + * superpages, for which at some point Xen might end up freeing memory + * and therefore for such a big mapping it could end up in a very long + * operation. + */ + switch ( level ) + { + case 1: + return p2m_put_2m_superpage(mfn, p2m_type); + + case 0: + return p2m_put_4k_page(mfn, p2m_type); + } +} + +static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg) +{ + page_list_del(pg, &p2m->pages); + + paging_free_page(p2m->domain, pg); +} + /* Free pte sub-tree behind an entry */ static void p2m_free_subtree(struct p2m_domain *p2m, pte_t entry, unsigned int level) { - panic("%s: hasn't been implemented yet\n", __func__); + unsigned int i; + pte_t *table; + mfn_t mfn; + struct page_info *pg; + + /* Nothing to do if the entry is invalid. */ + if ( !pte_is_valid(entry) ) + return; + + if ( pte_is_superpage(entry, level) || (level == 0) ) + { +#ifdef CONFIG_IOREQ_SERVER + /* + * If this gets called then either the entry was replaced by an entry + * with a different base (valid case) or the shattering of a superpage + * has failed (error case). + * So, at worst, the spurious mapcache invalidation might be sent. 
+         */
+        if ( p2m_is_ram(p2m_get_type(entry)) &&
+             domain_has_ioreq_server(p2m->domain) )
+            ioreq_request_mapcache_invalidate(p2m->domain);
+#endif
+
+        p2m_put_page(entry, level);
+
+        return;
+    }
+
+    table = map_domain_page(pte_get_mfn(entry));
+    for ( i = 0; i < XEN_PT_ENTRIES; i++ )
+        p2m_free_subtree(p2m, table[i], level - 1);
+
+    unmap_domain_page(table);
+
+    /*
+     * Make sure all the references in the TLB have been removed before
+     * freeing the intermediate page table.
+     * XXX: Should we defer the free of the page table to avoid the
+     * flush?
+     */
+    p2m_tlb_flush_sync(p2m);
+
+    mfn = pte_get_mfn(entry);
+    ASSERT(mfn_valid(mfn));
+
+    pg = mfn_to_page(mfn);
+
+    /* p2m_free_page() already removes pg from p2m->pages. */
+    p2m_free_page(p2m, pg);
 }
 
 /*
diff --git a/xen/arch/riscv/paging.c b/xen/arch/riscv/paging.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/paging.c
+++ b/xen/arch/riscv/paging.c
@@ -XXX,XX +XXX,XX @@ bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages)
     return true;
 }
 
+void paging_free_page(struct domain *d, struct page_info *pg)
+{
+    spin_lock(&d->arch.paging.lock);
+    page_list_add_tail(pg, &d->arch.paging.freelist);
+    spin_unlock(&d->arch.paging.lock);
+}
+
 /* Domain paging struct initialization. */
 int paging_domain_init(struct domain *d)
 {
-- 
2.50.1
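The decisions p2m_free_subtree() makes for each entry (skip, release a leaf, or descend) depend only on a few PTE bits and on the level. The sketch below restates them on plain 64-bit values. It assumes the standard RISC-V PTE layout (V, R, W, X in bits 0 to 3, RSW in bits 8 and 9); the function names are hypothetical and only mirror the helpers in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RISC-V PTE bits: V is bit 0; R, W, X are bits 1..3. */
#define PTE_VALID       (UINT64_C(1) << 0)
#define PTE_ACCESS_MASK (UINT64_C(7) << 1)
/* RSW field (bits 8-9), which the patch uses to hold the p2m type. */
#define TYPE_MASK       UINT64_C(0x300)

/* A valid PTE with any of R/W/X set is a leaf mapping. */
static bool is_mapping(uint64_t pte)
{
    return (pte & PTE_VALID) && (pte & PTE_ACCESS_MASK);
}

/* A leaf above level 0 is a superpage; the cheap level check goes first. */
static bool is_superpage(uint64_t pte, unsigned int level)
{
    return level > 0 && is_mapping(pte);
}

/* Extract the type stored in bits 8-9. */
static unsigned int get_type(uint64_t pte)
{
    return (unsigned int)((pte & TYPE_MASK) >> 8);
}

/* True when a subtree walk has to recurse into a next-level table. */
static bool must_descend(uint64_t pte, unsigned int level)
{
    return (pte & PTE_VALID) && level > 0 && !is_superpage(pte, level);
}
```

A valid entry with R=W=X=0 at level 1 or 2 is the only case that recurses; every other valid entry is treated as a leaf.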
This patch adds the initial logic for constructing PTEs from MFNs in the
RISC-V p2m subsystem. It includes:
- Implementation of p2m_pte_from_mfn(): Generates a valid PTE from the
  given MFN and p2m_type_t, including permission encoding and PBMT
  attribute setup.
- New helper p2m_set_permission(): Encodes access rights (r, w, x) into
  the PTE based on the p2m type.
- p2m_set_type(): Stores the p2m type in the PTE's bits. The storage of
  types which don't fit into PTE bits will be implemented separately
  later.

PBMT type encoding support:
- Introduces an enum pbmt_type to represent the PBMT field values.
- Sets the IO PBMT attribute (PTE_PBMT_IO) for p2m_mmio_direct_io; all
  other types keep the default PMA attribute.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
- s/p2m_entry_from_mfn/p2m_pte_from_mfn.
- s/pbmt_type_t/pbmt_type.
- s/pbmt_max/pbmt_count.
- s/p2m_type_radix_set/p2m_set_type.
- Rework p2m_set_type() to handle only types which fit into the PTE's
  bits. Other types will be covered separately.
  Update arguments of p2m_set_type(): the p2m argument is no longer needed.
- p2m_set_permission() updates:
  - Update the code in p2m_set_permission() for cases p2m_ram_rw and
    p2m_mmio_direct_io to set proper type permissions.
  - Add cases for p2m_grant_map_rw and p2m_grant_map_ro.
  - Use ASSERT_UNREACHABLE() instead of BUG() in switch cases of
    p2m_set_permission().
  - Add blank lines between non-fall-through case blocks in switch cases.
- Set MFN before permissions are set in p2m_pte_from_mfn().
- Update prototype of p2m_pte_from_mfn().
---
Changes in V2:
- New patch. It was a part of a big patch "xen/riscv: implement p2m mapping
  functionality" which was split into smaller ones.
--- xen/arch/riscv/include/asm/page.h | 8 +++ xen/arch/riscv/p2m.c | 81 +++++++++++++++++++++++++++++-- 2 files changed, 85 insertions(+), 4 deletions(-) diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/page.h +++ b/xen/arch/riscv/include/asm/page.h @@ -XXX,XX +XXX,XX @@ #define PTE_SMALL BIT(10, UL) #define PTE_POPULATE BIT(11, UL) +enum pbmt_type { + pbmt_pma, + pbmt_nc, + pbmt_io, + pbmt_rsvd, + pbmt_count, +}; + #define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE) #define PTE_PBMT_MASK (PTE_PBMT_NOCACHE | PTE_PBMT_IO) diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ +#include <xen/bug.h> #include <xen/domain_page.h> #include <xen/mm.h> #include <xen/rwlock.h> @@ -XXX,XX +XXX,XX @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn) return __map_domain_page(p2m->root + root_table_indx); } +static int p2m_set_type(pte_t *pte, p2m_type_t t) +{ + int rc = 0; + + if ( t > p2m_ext_storage ) + panic("unimplemeted\n"); + else + pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK); + + return rc; +} + static p2m_type_t p2m_get_type(const pte_t pte) { p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK); @@ -XXX,XX +XXX,XX @@ static inline void p2m_clean_pte(pte_t *p, bool clean_pte) p2m_write_pte(p, pte, clean_pte); } -static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t) +static void p2m_set_permission(pte_t *e, p2m_type_t t) { - panic("%s: hasn't been implemented yet\n", __func__); + e->pte &= ~PTE_ACCESS_MASK; + + switch ( t ) + { + case p2m_grant_map_rw: + case p2m_ram_rw: + e->pte |= PTE_READABLE | PTE_WRITABLE; + break; + + case p2m_ext_storage: + case p2m_mmio_direct_io: + e->pte |= PTE_ACCESS_MASK; + break; + + case p2m_invalid: + e->pte &= ~(PTE_ACCESS_MASK | PTE_VALID); + break; + + case p2m_grant_map_ro: + 
e->pte |= PTE_READABLE; + break; + + default: + ASSERT_UNREACHABLE(); + break; + } +} + +static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table) +{ + pte_t e = (pte_t) { PTE_VALID }; + + switch ( t ) + { + case p2m_mmio_direct_io: + e.pte |= PTE_PBMT_IO; + break; + + default: + break; + } + + pte_set_mfn(&e, mfn); + + ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK)); + + if ( !is_table ) + { + p2m_set_permission(&e, t); + + if ( t < p2m_ext_storage ) + p2m_set_type(&e, t); + else + panic("unimplemeted\n"); + } + else + /* + * According to the spec and table "Encoding of PTE R/W/X fields": + * X=W=R=0 -> Pointer to next level of page table. + */ + e.pte &= ~PTE_ACCESS_MASK; - return (pte_t) { .pte = 0 }; + return e; } #define P2M_TABLE_MAP_NONE 0 @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, p2m_clean_pte(entry, p2m->clean_pte); else { - pte_t pte = p2m_pte_from_mfn(mfn, t); + pte_t pte = p2m_pte_from_mfn(mfn, t, false); p2m_write_pte(entry, pte, p2m->clean_pte); -- 2.50.1
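The way p2m_pte_from_mfn() combines MFN, permissions, type and PBMT can be shown with plain integers. This is a rough sketch under stated assumptions, not the Xen code: it assumes the Sv39 PTE layout (PPN from bit 10, RSW in bits 8 and 9) and the Svpbmt encoding where IO sets bit 62; the enum toy_type numbering and make_pte() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID       (UINT64_C(1) << 0)
#define PTE_READABLE    (UINT64_C(1) << 1)
#define PTE_WRITABLE    (UINT64_C(1) << 2)
#define PTE_EXECUTABLE  (UINT64_C(1) << 3)
#define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)
#define PTE_TYPE_SHIFT  8                     /* RSW field, bits 8-9 */
#define PTE_PPN_SHIFT   10
#define PTE_PBMT_IO     (UINT64_C(1) << 62)   /* PBMT = 2 (IO) */

/* Hypothetical numbering; only values 0..3 fit in the two RSW bits. */
enum toy_type { TOY_INVALID, TOY_RAM_RW, TOY_MMIO_IO };

static uint64_t make_pte(uint64_t mfn, enum toy_type t, int is_table)
{
    uint64_t pte = PTE_VALID | (mfn << PTE_PPN_SHIFT);

    /* R=W=X=0 marks a pointer to the next level; no type is stored. */
    if ( is_table )
        return pte;

    switch ( t )
    {
    case TOY_RAM_RW:
        pte |= PTE_READABLE | PTE_WRITABLE;
        break;

    case TOY_MMIO_IO:
        /* Same permissions as the patch gives p2m_mmio_direct_io. */
        pte |= PTE_ACCESS_MASK | PTE_PBMT_IO;
        break;

    default:
        /* Invalid type: produce a non-present entry. */
        return 0;
    }

    return pte | ((uint64_t)t << PTE_TYPE_SHIFT);
}
```

Keeping the table case separate makes it explicit that a non-leaf PTE carries neither permissions nor a type.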
Implement the p2m_next_level() function, which enables traversal and dynamic
allocation of intermediate levels (if necessary) in the RISC-V
p2m (physical-to-machine) page table hierarchy.

To support this, the following helpers are introduced:
- page_to_p2m_table(): Constructs non-leaf PTEs pointing to next-level page
  tables with correct attributes.
- p2m_alloc_page(): Allocates page table pages, supporting both hardware and
  guest domains.
- p2m_create_table(): Allocates and initializes a new page table page and
  installs it into the hierarchy.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
- s/p2me_is_mapping/p2m_is_mapping to be in sync with other p2m_is_*()
  functions.
- Use clear_and_clean_page() in p2m_create_table() instead of clear_page()
  to be sure that the page is cleared and the d-cache is flushed for it.
- Move ASSERT(level != 0) in p2m_next_level() ahead of trying to allocate a
  page table.
- Update p2m_create_table() to allocate a metadata page to store the p2m
  type in it for each entry of the page table.
- Introduce paging_alloc_page() and use it inside p2m_alloc_page().
- Add the allocated page to the p2m->pages list in p2m_alloc_page() to
  simplify the caller code a little bit.
- Drop p2m_is_mapping() and use pte_is_mapping() instead, as the P2M PTE's
  valid bit doesn't have another purpose anymore.
- Update the implementation and prototype of page_to_p2m_table(); it is
  enough to pass only a page as an argument.
---
Changes in V2:
- New patch. It was a part of a big patch "xen/riscv: implement p2m mapping
  functionality" which was split into smaller ones.
- s/p2m_is_mapping/p2me_is_mapping.
--- xen/arch/riscv/include/asm/paging.h | 2 + xen/arch/riscv/p2m.c | 80 ++++++++++++++++++++++++++++- xen/arch/riscv/paging.c | 11 ++++ 3 files changed, 91 insertions(+), 2 deletions(-) diff --git a/xen/arch/riscv/include/asm/paging.h b/xen/arch/riscv/include/asm/paging.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/paging.h +++ b/xen/arch/riscv/include/asm/paging.h @@ -XXX,XX +XXX,XX @@ bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages); void paging_free_page(struct domain *d, struct page_info *pg); +struct page_info * paging_alloc_page(struct domain *d); + #endif /* ASM_RISCV_PAGING_H */ diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table) return e; } +/* Generate table entry with correct attributes. */ +static pte_t page_to_p2m_table(struct page_info *page) +{ + /* + * p2m_invalid will be ignored inside p2m_pte_from_mfn() as is_table is + * set to true and p2m_type_t shouldn't be applied for PTEs which + * describe an intermidiate table. + */ + return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, true); +} + +static struct page_info *p2m_alloc_page(struct p2m_domain *p2m) +{ + struct page_info *pg = paging_alloc_page(p2m->domain); + + if ( pg ) + page_list_add(pg, &p2m->pages); + + return pg; +} + +/* + * Allocate a new page table page with an extra metadata page and hook it + * in via the given entry. 
+ */ +static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry) +{ + struct page_info *page; + + ASSERT(!pte_is_valid(*entry)); + + page = p2m_alloc_page(p2m); + if ( page == NULL ) + return -ENOMEM; + + clear_and_clean_page(page); + + p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte); + + return 0; +} + #define P2M_TABLE_MAP_NONE 0 #define P2M_TABLE_MAP_NOMEM 1 #define P2M_TABLE_SUPER_PAGE 2 @@ -XXX,XX +XXX,XX @@ static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl, unsigned int level, pte_t **table, unsigned int offset) { - panic("%s: hasn't been implemented yet\n", __func__); + pte_t *entry; + int ret; + mfn_t mfn; + + /* The function p2m_next_level() is never called at the last level */ + ASSERT(level != 0); + + entry = *table + offset; + + if ( !pte_is_valid(*entry) ) + { + if ( !alloc_tbl ) + return P2M_TABLE_MAP_NONE; + + ret = p2m_create_table(p2m, entry); + if ( ret ) + return P2M_TABLE_MAP_NOMEM; + } + + /* The function p2m_next_level() is never called at the last level */ + ASSERT(level != 0); + if ( pte_is_mapping(*entry) ) + return P2M_TABLE_SUPER_PAGE; + + mfn = mfn_from_pte(*entry); + + unmap_domain_page(*table); + + /* + * TODO: There's an inefficiency here: + * In p2m_create_table(), the page is mapped to clear it. + * Then that mapping is torn down in p2m_create_table(), + * only to be re-established here. 
+ */ + *table = map_domain_page(mfn); - return P2M_TABLE_MAP_NONE; + return P2M_TABLE_NORMAL; } static void p2m_put_foreign_page(struct page_info *pg) diff --git a/xen/arch/riscv/paging.c b/xen/arch/riscv/paging.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/paging.c +++ b/xen/arch/riscv/paging.c @@ -XXX,XX +XXX,XX @@ void paging_free_page(struct domain *d, struct page_info *pg) spin_unlock(&d->arch.paging.lock); } +struct page_info * paging_alloc_page(struct domain *d) +{ + struct page_info *pg; + + spin_lock(&d->arch.paging.lock); + pg = page_list_remove_head(&d->arch.paging.freelist); + spin_unlock(&d->arch.paging.lock); + + return pg; +} + /* Domain paging struct initialization. */ int paging_domain_init(struct domain *d) { -- 2.50.1
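The four outcomes of p2m_next_level() can be summarised as a pure function of the current entry and of whether allocation is allowed and succeeds. The following is a simplified sketch for illustration only: it leaves out the actual mapping and unmapping of tables, and next_level_result() and its pool_has_page parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID       (UINT64_C(1) << 0)
#define PTE_ACCESS_MASK (UINT64_C(7) << 1)

/* Same values as the P2M_TABLE_* macros in the patch. */
enum { MAP_NONE = 0, MAP_NOMEM = 1, SUPER_PAGE = 2, NORMAL = 3 };

/*
 * pte:           entry found at the current level
 * alloc_tbl:     whether a missing intermediate table may be allocated
 * pool_has_page: whether the (hypothetical) page pool can supply a page
 */
static int next_level_result(uint64_t pte, bool alloc_tbl, bool pool_has_page)
{
    if ( !(pte & PTE_VALID) )
    {
        if ( !alloc_tbl )
            return MAP_NONE;

        if ( !pool_has_page )
            return MAP_NOMEM;

        /* A fresh, empty table was installed; the walk continues into it. */
        return NORMAL;
    }

    /* Valid with R/W/X set: a leaf, so there is no table to descend into. */
    if ( pte & PTE_ACCESS_MASK )
        return SUPER_PAGE;

    return NORMAL;
}
```

In p2m_set_entry() the first two results end the walk (mapped to -ENOENT and -ENOMEM, or to 0 when a mapping is being removed), while SUPER_PAGE stops the descent at the current level.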
Add support for breaking down large memory mappings ("superpages") in the
RISC-V p2m mapping so that smaller, more precise mappings ("finer-grained
entries") can be inserted into lower levels of the page table hierarchy.

To implement that the following is done:
- Introduce p2m_split_superpage(): Recursively shatters a superpage into
  smaller page table entries down to the target level, preserving original
  permissions and attributes.
- p2m_set_entry() updated to invoke superpage splitting when inserting
  entries at lower levels within a superpage-mapped region.

This implementation is based on the Arm code, with modifications to the part
that follows the BBM (break-before-make) approach. Some parts are simplified
because, according to the RISC-V spec:
  It is permitted for multiple address-translation cache entries to co-exist
  for the same address. This represents the fact that in a conventional
  TLB hierarchy, it is possible for multiple entries to match a single
  address if, for example, a page is upgraded to a superpage without first
  clearing the original non-leaf PTE’s valid bit and executing an SFENCE.VMA
  with rs1=x0, or if multiple TLBs exist in parallel at a given level of the
  hierarchy. In this case, just as if an SFENCE.VMA is not executed between
  a write to the memory-management tables and subsequent implicit read of the
  same address: it is unpredictable whether the old non-leaf PTE or the new
  leaf PTE is used, but the behavior is otherwise well defined.

In contrast to the Arm architecture, where BBM is mandatory and failing to
use it in some cases can lead to CPU instability, RISC-V guarantees
stability, and the behavior remains safe, though unpredictable in terms of
which translation will be used.

Additionally, the page table walk logic has been adjusted, as Arm numbers
page table levels in the opposite order compared to RISC-V.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
- Move page_list_add(page, &p2m->pages) inside p2m_alloc_page().
- Use 'unsigned long' for local variable 'i' in p2m_split_superpage().
- Update the comment above if ( next_level != target ) in
  p2m_split_superpage().
- Reverse cycle to iterate through page table levels in p2m_set_entry().
- Update p2m_split_superpage() with the same changes which are done in the
  patch "P2M: Don't try to free the existing PTE if we can't allocate a new
  table".
---
Changes in V2:
- New patch. It was a part of a big patch "xen/riscv: implement p2m mapping
  functionality" which was split into smaller ones.
- Update the comment above the loop which creates the new page table, as
  RISC-V traverses page tables in the opposite order to Arm.
- RISC-V doesn't require BBM, so there is no need for invalidating and
  TLB flushing before updating a PTE.
---
 xen/arch/riscv/p2m.c | 118 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 117 insertions(+), 1 deletion(-)

diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -XXX,XX +XXX,XX @@ static void p2m_free_subtree(struct p2m_domain *p2m,
     p2m_free_page(p2m, pg);
 }
 
+static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
+                                unsigned int level, unsigned int target,
+                                const unsigned int *offsets)
+{
+    struct page_info *page;
+    unsigned long i;
+    pte_t pte, *table;
+    bool rv = true;
+
+    /* Convenience aliases */
+    mfn_t mfn = pte_get_mfn(*entry);
+    unsigned int next_level = level - 1;
+    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
+
+    /*
+     * This should only be called with target != level and the entry is
+     * a superpage.
+     */
+    ASSERT(level > target);
+    ASSERT(pte_is_superpage(*entry, level));
+
+    page = p2m_alloc_page(p2m);
+    if ( !page )
+    {
+        /*
+         * The caller is in charge to free the sub-tree.
+         * As we didn't manage to allocate anything, just tell the
+         * caller there is nothing to free by invalidating the PTE.
+ */ + memset(entry, 0, sizeof(*entry)); + return false; + } + + table = __map_domain_page(page); + + /* + * We are either splitting a second level 1G page into 512 first level + * 2M pages, or a first level 2M page into 512 zero level 4K pages. + */ + for ( i = 0; i < XEN_PT_ENTRIES; i++ ) + { + pte_t *new_entry = table + i; + + /* + * Use the content of the superpage entry and override + * the necessary fields. So the correct permission are kept. + */ + pte = *entry; + pte_set_mfn(&pte, mfn_add(mfn, i << level_order)); + + write_pte(new_entry, pte); + } + + /* + * Shatter superpage in the page to the level we want to make the + * changes. + * This is done outside the loop to avoid checking the offset + * for every entry to know whether the entry should be shattered. + */ + if ( next_level != target ) + rv = p2m_split_superpage(p2m, table + offsets[next_level], + level - 1, target, offsets); + + if ( p2m->clean_pte ) + clean_dcache_va_range(table, PAGE_SIZE); + + /* + * TODO: an inefficiency here: the caller almost certainly wants to map + * the same page again, to update the one entry that caused the + * request to shatter the page. + */ + unmap_domain_page(table); + + /* + * Even if we failed, we should (according to the current implemetation + * of a way how sub-tree is freed if p2m_split_superpage hasn't been + * finished fully) install the newly allocated PTE + * entry. + * The caller will be in charge to free the sub-tree. + */ + p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte); + + return rv; +} + /* * Insert an entry in the p2m. This should be called with a mapping * equal to a page/superpage. @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, */ if ( level > target ) { - panic("Shattering isn't implemented\n"); + /* We need to split the original page. 
*/ + pte_t split_pte = *entry; + + ASSERT(pte_is_superpage(*entry, level)); + + if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) ) + { + /* Free the allocated sub-tree */ + p2m_free_subtree(p2m, split_pte, level); + + rc = -ENOMEM; + goto out; + } + + p2m_write_pte(entry, split_pte, p2m->clean_pte); + + p2m->need_flush = true; + + /* Then move to the level we want to make real changes */ + for ( ; level > target; level-- ) + { + rc = p2m_next_level(p2m, true, level, &table, offsets[level]); + + /* + * The entry should be found and either be a table + * or a superpage if level 0 is not targeted + */ + ASSERT(rc == P2M_TABLE_NORMAL || + (rc == P2M_TABLE_SUPER_PAGE && target > 0)); + } + + entry = table + offsets[level]; } /* -- 2.50.1
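When a superpage is shattered, every new entry keeps the attributes of the original PTE and differs only in its MFN. The arithmetic behind pte_set_mfn(&pte, mfn_add(mfn, i << level_order)) is sketched below, assuming 9 index bits per level; child_mfn() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 9 index bits per page table level, i.e. 512 entries. */
#define PAGETABLE_ORDER 9

/*
 * MFN of the i-th child when a superpage at `level` (1 = 2M, 2 = 1G) is
 * split into 512 entries one level down.
 */
static uint64_t child_mfn(uint64_t super_mfn, unsigned int level,
                          unsigned int i)
{
    unsigned int next_order = (level - 1) * PAGETABLE_ORDER;

    return super_mfn + ((uint64_t)i << next_order);
}
```

Splitting a 1G page therefore yields 2M children whose MFNs are 512 frames apart, and splitting a 2M page yields consecutive 4K frames.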
Implement put_page(), as it is needed by p2m_put_foreign_page().

Although CONFIG_STATIC_MEMORY has not yet been introduced for RISC-V,
a stub for PGC_static is added to avoid cluttering the code of
put_page() with #ifdefs.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
 xen/arch/riscv/include/asm/mm.h |  7 +++++++
 xen/arch/riscv/mm.c             | 25 ++++++++++++++++++++-----
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -XXX,XX +XXX,XX @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
 /* Page is Xen heap? */
 #define _PGC_xen_heap PG_shift(2)
 #define PGC_xen_heap PG_mask(1, 2)
+#ifdef CONFIG_STATIC_MEMORY
+/* Page is static memory */
+#define _PGC_static PG_shift(3)
+#define PGC_static PG_mask(1, 3)
+#else
+#define PGC_static 0
+#endif
 /* Page is broken? */
 #define _PGC_broken PG_shift(7)
 #define PGC_broken PG_mask(1, 7)
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -XXX,XX +XXX,XX @@ unsigned long __init calc_phys_offset(void)
     return phys_offset;
 }
 
-void put_page(struct page_info *page)
-{
-    BUG_ON("unimplemented");
-}
-
 void arch_dump_shared_mem_info(void)
 {
     BUG_ON("unimplemented");
@@ -XXX,XX +XXX,XX @@ void flush_page_to_ram(unsigned long mfn, bool sync_icache)
     if ( sync_icache )
         invalidate_icache();
 }
+
+void put_page(struct page_info *page)
+{
+    unsigned long nx, x, y = page->count_info;
+
+    do {
+        ASSERT((y & PGC_count_mask) >= 1);
+        x = y;
+        nx = x - 1;
+    }
+    while ( unlikely((y = cmpxchg(&page->count_info, x, nx)) != x) );
+
+    if ( unlikely((nx & PGC_count_mask) == 0) )
+    {
+        if ( unlikely(nx & PGC_static) )
+            free_domstatic_page(page);
+        else
+            free_domheap_page(page);
+    }
+}
-- 
2.50.1
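The lock-free decrement in put_page() follows a common compare-and-swap retry pattern. A portable sketch of the same idea with C11 atomics is given below; it is not the Xen implementation. COUNT_MASK is an assumed width for the count field (standing in for PGC_count_mask), and toy_put_page() reports whether the caller should free the page instead of freeing it itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: low 8 bits hold the reference count, upper bits are flags. */
#define COUNT_MASK 0xffUL

/* Drop one reference; return true when the last reference is gone. */
static bool toy_put_page(atomic_ulong *count_info)
{
    unsigned long x = atomic_load(count_info), nx;

    do {
        /* Dropping a reference that was never taken is a caller bug. */
        assert((x & COUNT_MASK) >= 1);
        nx = x - 1;
        /* On failure, x is reloaded with the current value and we retry. */
    } while ( !atomic_compare_exchange_weak(count_info, &x, nx) );

    return (nx & COUNT_MASK) == 0;
}
```

Because the count shares a word with flag bits, the zero test is applied to the masked count and the flags are left untouched by the decrement.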
Implement the mfn_valid() macro to verify whether a given MFN is valid by
checking that it falls within the range [start_page, max_page).
These bounds are initialized based on the start and end addresses of RAM.

As part of this patch, start_page is introduced and initialized with the
PFN of the first RAM page.
Also, initialize the pdx_group_valid bitmap by calling set_pdx_range() when
memory banks are being mapped.

Also, after providing a non-stub implementation of the mfn_valid() macro,
the following compilation errors started to occur:
  riscv64-linux-gnu-ld: prelink.o: in function `__next_node':
  /build/xen/./include/xen/nodemask.h:202: undefined reference to `page_is_ram_type'
  riscv64-linux-gnu-ld: prelink.o: in function `get_free_buddy':
  /build/xen/common/page_alloc.c:881: undefined reference to `page_is_ram_type'
  riscv64-linux-gnu-ld: prelink.o: in function `alloc_heap_pages':
  /build/xen/common/page_alloc.c:1043: undefined reference to `page_get_owner_and_reference'
  riscv64-linux-gnu-ld: /build/xen/common/page_alloc.c:1098: undefined reference to `page_is_ram_type'
  riscv64-linux-gnu-ld: prelink.o: in function `ns16550_interrupt':
  /build/xen/drivers/char/ns16550.c:205: undefined reference to `get_page'
  riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol `page_get_owner_and_reference' isn't defined
  riscv64-linux-gnu-ld: final link failed: bad value
  make[2]: *** [arch/riscv/Makefile:35: xen-syms] Error 1

To resolve these errors, the following functions have also been introduced,
based on their Arm counterparts:
- page_get_owner_and_reference() to safely acquire a reference to a page
  and retrieve its owner.
- A stub for page_is_ram_type() that currently always returns 0 and asserts
  unreachable, as RAM type checking is not yet implemented.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
- Update definition of mfn_valid().
- Use __ro_after_init for variable start_page.
- Drop ASSERT_UNREACHABLE() in page_get_owner_and_nr_reference(). - Update the comment inside do/while in page_get_owner_and_nr_reference(). - Define _PGC_static and drop "#ifdef CONFIG_STATIC_MEMORY" in put_page_nr(). - Initialize pdx_group_valid() by calling set_pdx_range() when memory banks are mapped. - Drop page_get_owner_and_nr_reference() and implement page_get_owner_and_reference() without reusing of a page_get_owner_and_nr_reference() to avoid potential dead code. - Move defintion of get_page() to "xen/riscv: add support of page lookup by GFN", where it is really used. --- Changes in V2: - New patch. --- xen/arch/riscv/include/asm/mm.h | 9 +++++++-- xen/arch/riscv/mm.c | 35 +++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+), 2 deletions(-) diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/mm.h +++ b/xen/arch/riscv/include/asm/mm.h @@ -XXX,XX +XXX,XX @@ #include <public/xen.h> #include <xen/bug.h> +#include <xen/compiler.h> #include <xen/const.h> #include <xen/mm-frame.h> #include <xen/pdx.h> @@ -XXX,XX +XXX,XX @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr) #define page_get_owner(p) (p)->v.inuse.domain #define page_set_owner(p, d) ((p)->v.inuse.domain = (d)) -/* TODO: implement */ -#define mfn_valid(mfn) ({ (void)(mfn); 0; }) +extern unsigned long start_page; + +#define mfn_valid(mfn) ({ \ + unsigned long mfn__ = mfn_x(mfn); \ + likely((mfn__ >= start_page)) && likely(__mfn_valid(mfn__)); \ +}) #define domain_set_alloc_bitsize(d) ((void)(d)) #define domain_clamp_alloc_bitsize(d, b) ((void)(d), (b)) diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/mm.c +++ b/xen/arch/riscv/mm.c @@ -XXX,XX +XXX,XX @@ static void __init setup_directmap_mappings(unsigned long base_mfn, #error setup_{directmap,frametable}_mapping() should be implemented for RV_32 #endif +unsigned 
long __ro_after_init start_page; + /* * Setup memory management * @@ -XXX,XX +XXX,XX @@ void __init setup_mm(void) ram_end = max(ram_end, bank_end); setup_directmap_mappings(PFN_DOWN(bank_start), PFN_DOWN(bank_size)); + + set_pdx_range(paddr_to_pfn(bank_start), paddr_to_pfn(bank_end)); } setup_frametable_mappings(ram_start, ram_end); + + start_page = PFN_DOWN(ram_start); max_page = PFN_DOWN(ram_end); } @@ -XXX,XX +XXX,XX @@ void put_page(struct page_info *page) free_domheap_page(page); } } + +int page_is_ram_type(unsigned long mfn, unsigned long mem_type) +{ + ASSERT_UNREACHABLE(); + + return 0; +} + +struct domain *page_get_owner_and_reference(struct page_info *page) +{ + unsigned long x, y = page->count_info; + struct domain *owner; + + do { + x = y; + /* + * Count == 0: Page is not allocated, so we cannot take a reference. + * Count == -1: Reference count would wrap, which is invalid. + */ + if ( unlikely(((x + 1) & PGC_count_mask) <= 1) ) + return NULL; + } + while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x ); + + owner = page_get_owner(page); + ASSERT(owner); + + return owner; +} -- 2.50.1
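Illustration only, not part of the patch: the single test `((x + 1) & PGC_count_mask) <= 1` in page_get_owner_and_reference() rejects both an unallocated page (count 0, so the incremented count is 1) and a saturated count (all ones, so the incremented count wraps to 0). A small user-space sketch of that guarded increment follows. COUNT_MASK, FLAG_BIT, struct page, and get_ref() are hypothetical names, and the 24-bit count width is an assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: low 24 bits hold the count, flags live above. */
#define COUNT_MASK 0x00ffffffUL
#define FLAG_BIT   0x01000000UL

struct page {
    _Atomic unsigned long count_info;
};

/* Take one reference unless the page is free or the count would wrap. */
static bool get_ref(struct page *pg)
{
    unsigned long x = atomic_load(&pg->count_info);

    do {
        /*
         * count == 0        -> (x + 1) & mask == 1: page is not allocated.
         * count == all ones -> (x + 1) & mask == 0: count would overflow.
         */
        if ( ((x + 1) & COUNT_MASK) <= 1 )
            return false;
        /* On failure, x is refreshed and the check is repeated. */
    } while ( !atomic_compare_exchange_weak(&pg->count_info, &x, x + 1) );

    return true;
}
```

Masking after the addition is what keeps the check correct when flag bits are set above the count: a carry out of the count field lands in the flag area and is discarded by the mask.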
Introduce helper functions for safely querying the P2M (physical-to-machine)
mapping:
 - Add p2m_read_lock(), p2m_read_unlock(), and p2m_is_locked() for managing
   P2M lock state.
 - Implement p2m_get_entry() to retrieve mapping details for a given GFN,
   including MFN, page order, and validity.
 - Add p2m_lookup() to encapsulate read-locked MFN retrieval.
 - Introduce p2m_get_page_from_gfn() to convert a GFN into a page_info
   pointer, acquiring a reference to the page if valid.
 - Introduce get_page().

Implementations are based on Arm's functions with some minor modifications:
- p2m_get_entry():
  - Reverse traversal of page tables, as RISC-V uses the opposite level
    numbering compared to Arm.
  - Removed the return of p2m_access_t from p2m_get_entry() since
    mem_access_settings is not introduced for RISC-V.
  - Updated BUILD_BUG_ON() to check using the level 0 mask, which corresponds
    to Arm's THIRD_MASK.
  - Replaced open-coded bit shifts with the BIT() macro.
- Other minor changes, such as using RISC-V-specific functions to validate
  P2M PTEs, and replacing Arm-specific GUEST_* macros with their RISC-V
  equivalents.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V3:
 - Add is_p2m_foreign() macro and connected stuff.
 - Change struct domain *d argument of p2m_get_page_from_gfn() to
   struct p2m_domain.
 - Update the comment above p2m_get_entry().
 - s/_t/p2mt for local variable in p2m_get_entry().
 - Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn)
   to define offsets array.
 - Code style fixes.
 - Update a check of rc code from p2m_next_level() in p2m_get_entry()
   and drop "else" case.
 - Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL.
 - Use struct p2m_domain instead of struct domain for p2m_lookup() and
   p2m_get_page_from_gfn().
 - Move definition of get_page() from "xen/riscv: implement mfn_valid() and
   page reference, ownership handling helpers".
---
Changes in V2:
 - New patch.
--- xen/arch/riscv/include/asm/p2m.h | 18 ++++ xen/arch/riscv/mm.c | 13 +++ xen/arch/riscv/p2m.c | 136 +++++++++++++++++++++++++++++++ 3 files changed, 167 insertions(+) diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/p2m.h +++ b/xen/arch/riscv/include/asm/p2m.h @@ -XXX,XX +XXX,XX @@ static inline int p2m_is_write_locked(struct p2m_domain *p2m) unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid); +static inline void p2m_read_lock(struct p2m_domain *p2m) +{ + read_lock(&p2m->lock); +} + +static inline void p2m_read_unlock(struct p2m_domain *p2m) +{ + read_unlock(&p2m->lock); +} + +static inline int p2m_is_locked(struct p2m_domain *p2m) +{ + return rw_is_locked(&p2m->lock); +} + +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn, + p2m_type_t *t); + #endif /* ASM__RISCV__P2M_H */ /* diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/mm.c +++ b/xen/arch/riscv/mm.c @@ -XXX,XX +XXX,XX @@ struct domain *page_get_owner_and_reference(struct page_info *page) return owner; } + +bool get_page(struct page_info *page, const struct domain *domain) +{ + const struct domain *owner = page_get_owner_and_reference(page); + + if ( likely(owner == domain) ) + return true; + + if ( owner != NULL ) + put_page(page); + + return false; +} diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ int map_regions_p2mt(struct domain *d, { return p2m_insert_mapping(p2m_get_hostp2m(d), gfn, nr, mfn, p2mt); } + +/* + * Get the details of a given gfn. + * + * If the entry is present, the associated MFN will be returned type filled up. + * The page_order will correspond to the order of the mapping in the page + * table (i.e it could be a superpage). 
+ * + * If the entry is not present, INVALID_MFN will be returned and the + * page_order will be set according to the order of the invalid range. + * + * valid will contain the value of bit[0] (e.g valid bit) of the + * entry. + */ +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn, + p2m_type_t *t, + unsigned int *page_order, + bool *valid) +{ + unsigned int level = 0; + pte_t entry, *table; + int rc; + mfn_t mfn = INVALID_MFN; + DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn)); + + ASSERT(p2m_is_locked(p2m)); + BUILD_BUG_ON(XEN_PT_LEVEL_MAP_MASK(0) != PAGE_MASK); + + if ( valid ) + *valid = false; + + /* XXX: Check if the mapping is lower than the mapped gfn */ + + /* This gfn is higher than the highest the p2m map currently holds */ + if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) ) + { + for ( level = P2M_ROOT_LEVEL; level; level-- ) + if ( (gfn_x(gfn) & (XEN_PT_LEVEL_MASK(level) >> PAGE_SHIFT)) > + gfn_x(p2m->max_mapped_gfn) ) + break; + + goto out; + } + + table = p2m_get_root_pointer(p2m, gfn); + + /* + * the table should always be non-NULL because the gfn is below + * p2m->max_mapped_gfn and the root table pages are always present. + */ + if ( !table ) + { + ASSERT_UNREACHABLE(); + level = P2M_ROOT_LEVEL; + goto out; + } + + for ( level = P2M_ROOT_LEVEL; level; level-- ) + { + rc = p2m_next_level(p2m, true, level, &table, offsets[level]); + if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) ) + goto out_unmap; + + if ( rc != P2M_TABLE_NORMAL ) + break; + } + + entry = table[offsets[level]]; + + if ( pte_is_valid(entry) ) + { + if ( t ) + *t = p2m_get_type(entry); + + mfn = pte_get_mfn(entry); + /* + * The entry may point to a superpage. Find the MFN associated + * to the GFN. 
+ */ + mfn = mfn_add(mfn, + gfn_x(gfn) & (BIT(XEN_PT_LEVEL_ORDER(level), UL) - 1)); + + if ( valid ) + *valid = pte_is_valid(entry); + } + + out_unmap: + unmap_domain_page(table); + + out: + if ( page_order ) + *page_order = XEN_PT_LEVEL_ORDER(level); + + return mfn; +} + +static mfn_t p2m_lookup(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t *t) +{ + mfn_t mfn; + + p2m_read_lock(p2m); + mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL); + p2m_read_unlock(p2m); + + return mfn; +} + +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn, + p2m_type_t *t) +{ + struct page_info *page; + p2m_type_t p2mt = p2m_invalid; + mfn_t mfn = p2m_lookup(p2m, gfn, t); + + if ( !mfn_valid(mfn) ) + return NULL; + + if ( t ) + p2mt = *t; + + page = mfn_to_page(mfn); + + /* + * get_page won't work on foreign mapping because the page doesn't + * belong to the current domain. + */ + if ( p2m_is_foreign(p2mt) ) + { + struct domain *fdom = page_get_owner_and_reference(page); + ASSERT(fdom != NULL); + ASSERT(fdom != p2m->domain); + return page; + } + + return get_page(page, p2m->domain) ? page : NULL; +} -- 2.50.1
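Illustration only, not part of the patch: when p2m_get_entry() stops at a superpage, the PTE holds the MFN of the first 4K frame, and the low GFN bits below the mapping order select the frame inside it. The sketch below reproduces that arithmetic in user space. It assumes 9 index bits per level (512-entry tables), which is what XEN_PT_LEVEL_ORDER() is expected to encode. PT_BITS_PER_LEVEL, level_order(), and superpage_mfn() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each page-table level resolves 9 bits (512 entries). */
#define PT_BITS_PER_LEVEL 9U

/* Order (log2 of the number of 4K pages) mapped by one entry at a level. */
static unsigned int level_order(unsigned int level)
{
    return level * PT_BITS_PER_LEVEL;
}

/*
 * Given the base MFN stored in a leaf entry found at `level`, return the
 * MFN that backs `gfn`: base plus the GFN's offset within the mapping.
 */
static uint64_t superpage_mfn(uint64_t base_mfn, uint64_t gfn,
                              unsigned int level)
{
    uint64_t mask = (UINT64_C(1) << level_order(level)) - 1;

    return base_mfn + (gfn & mask);
}
```

For a level 0 (4K) entry the mask is zero, so the base MFN is returned unchanged. For a level 1 (2M) entry the low 9 GFN bits are added, and for a level 2 (1G) entry the low 18 bits.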
RISC-V's PTE has only two available bits that can be used to store the P2M type. This is insufficient to represent all the current RISC-V P2M types. Therefore, some P2M types must be stored outside the PTE bits. To address this, a metadata table is introduced to store P2M types that cannot fit in the PTE itself. Not all P2M types are stored in the metadata table—only those that require it. The metadata table is linked to the intermediate page table via the `struct page_info`'s list field of the corresponding intermediate page. To simplify the allocation and linking of intermediate and metadata page tables, `p2m_{alloc,free}_table()` functions are implemented. These changes impact `p2m_split_superpage()`, since when a superpage is split, it is necessary to update the metadata table of the new intermediate page table — if the entry being split has its P2M type set to `p2m_ext_storage` in its `P2M_TYPES` bits. In addition to updating the metadata of the new intermediate page table, the corresponding entry in the metadata for the original superpage is invalidated. Also, update p2m_{get,set}_type to work with P2M types which don't fit into PTE bits. Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> --- Changes in V3: - Add is_p2m_foreign() macro and connected stuff. - Change struct domain *d argument of p2m_get_page_from_gfn() to struct p2m_domain. - Update the comment above p2m_get_entry(). - s/_t/p2mt for local variable in p2m_get_entry(). - Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn) to define offsets array. - Code style fixes. - Update a check of rc code from p2m_next_level() in p2m_get_entry() and drop "else" case. - Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL. - Use struct p2m_domain instead of struct domain for p2m_lookup() and p2m_get_page_from_gfn(). - Move defintion of get_page() from "xen/riscv: implement mfn_valid() and page reference, ownership handling helpers" --- Changes in V2: - New patch. 
--- xen/arch/riscv/include/asm/mm.h | 9 ++ xen/arch/riscv/p2m.c | 205 +++++++++++++++++++++++++------- 2 files changed, 170 insertions(+), 44 deletions(-) diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/mm.h +++ b/xen/arch/riscv/include/asm/mm.h @@ -XXX,XX +XXX,XX @@ struct page_info /* Order-size of the free chunk this page is the head of. */ unsigned int order; } free; + + /* Page is used to store metadata: p2m type. */ + struct { + /* + * Pointer to a page which store metadata for an intermediate page + * table. + */ + struct page_info *metadata; + } md; } v; union { diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ static int p2m_alloc_root_table(struct p2m_domain *p2m) { struct domain *d = p2m->domain; struct page_info *page; - const unsigned int nr_root_pages = P2M_ROOT_PAGES; + /* + * If the root page table starts at Level <= 2, and since only 1GB, 2MB, + * and 4KB mappings are supported (as enforced by the ASSERT() in + * p2m_set_entry()), it is necessary to allocate P2M_ROOT_PAGES for + * the root page table itself, plus an additional P2M_ROOT_PAGES for + * metadata storage. This is because only two free bits are available in + * the PTE, which are not sufficient to represent all possible P2M types. + */ + const unsigned int nr_root_pages = P2M_ROOT_PAGES * + ((P2M_ROOT_LEVEL <= 2) ? 2 : 1); /* * Return back nr_root_pages to assure the root table memory is also @@ -XXX,XX +XXX,XX @@ static int p2m_alloc_root_table(struct p2m_domain *p2m) if ( !page ) return -ENOMEM; + if ( P2M_ROOT_LEVEL <= 2 ) + { + /* + * In the case where P2M_ROOT_LEVEL <= 2, it is necessary to allocate + * a page of the same size as that used for the root page table. + * Therefore, p2m_allocate_root() can be safely reused. 
+ */ + struct page_info *metadata = p2m_allocate_root(d); + if ( !metadata ) + { + free_domheap_pages(page, P2M_ROOT_ORDER); + return -ENOMEM; + } + + page->v.md.metadata = metadata; + } + p2m->root = page; return 0; @@ -XXX,XX +XXX,XX @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn) return __map_domain_page(p2m->root + root_table_indx); } -static int p2m_set_type(pte_t *pte, p2m_type_t t) +static void p2m_set_type(pte_t *pte, const p2m_type_t t, const unsigned int i) { - int rc = 0; - if ( t > p2m_ext_storage ) - panic("unimplemeted\n"); + { + ASSERT(pte); + + pte[i].pte = t; + } else pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK); - - return rc; } -static p2m_type_t p2m_get_type(const pte_t pte) +static p2m_type_t p2m_get_type(const pte_t pte, const pte_t *metadata, + const unsigned int i) { p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK); if ( type == p2m_ext_storage ) - panic("unimplemented\n"); + type = metadata[i].pte; return type; } @@ -XXX,XX +XXX,XX @@ static void p2m_set_permission(pte_t *e, p2m_type_t t) } } -static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table) +static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, + struct page_info *metadata_pg, + const unsigned int indx, + bool is_table) { pte_t e = (pte_t) { PTE_VALID }; @@ -XXX,XX +XXX,XX @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table) if ( !is_table ) { + pte_t *metadata = __map_domain_page(metadata_pg); + p2m_set_permission(&e, t); + metadata[indx].pte = p2m_invalid; + if ( t < p2m_ext_storage ) - p2m_set_type(&e, t); + p2m_set_type(&e, t, indx); else - panic("unimplemeted\n"); + { + e.pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK); + p2m_set_type(metadata, t, indx); + } + + unmap_domain_page(metadata); } else /* @@ -XXX,XX +XXX,XX @@ static pte_t page_to_p2m_table(struct page_info *page) * p2m_invalid will be ignored inside p2m_pte_from_mfn() as is_table is * set to true and p2m_type_t shouldn't be 
applied for PTEs which * describe an intermidiate table. + * That it also a reason why `metadata` and `indx` argument of + * p2m_pte_from_mfn() are NULL. */ - return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, true); + return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, NULL, 0, true); } static struct page_info *p2m_alloc_page(struct p2m_domain *p2m) @@ -XXX,XX +XXX,XX @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m) return pg; } +static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg); + +/* + * Allocate a page table with an additional extra page to store + * metadata for each entry of the page table. + * Link this metadata page to page table page's list field. + */ +static struct page_info * p2m_alloc_table(struct p2m_domain *p2m) +{ + enum table_type + { + INTERMEDIATE_TABLE=0, + /* + * At the moment, metadata is going to store P2M type + * for each PTE of page table. + */ + METADATA_TABLE, + TABLE_MAX + }; + + struct page_info *tables[TABLE_MAX]; + + for ( unsigned int i = 0; i < TABLE_MAX; i++ ) + { + tables[i] = p2m_alloc_page(p2m); + + if ( !tables[i] ) + goto out; + + clear_and_clean_page(tables[i]); + } + + tables[INTERMEDIATE_TABLE]->v.md.metadata = tables[METADATA_TABLE]; + + return tables[INTERMEDIATE_TABLE]; + + out: + for ( unsigned int i = 0; i < TABLE_MAX; i++ ) + if ( tables[i] ) + p2m_free_page(p2m, tables[i]); + + return NULL; +} + +/* + * Free page table's page and metadata page linked to page table's page. + */ +static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg) +{ + ASSERT(tbl_pg->v.md.metadata); + + p2m_free_page(p2m, tbl_pg->v.md.metadata); + p2m_free_page(p2m, tbl_pg); +} + /* * Allocate a new page table page with an extra metadata page and hook it * in via the given entry. 
*/ static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry) { - struct page_info *page; + struct page_info *page = p2m_alloc_table(p2m); ASSERT(!pte_is_valid(*entry)); - page = p2m_alloc_page(p2m); - if ( page == NULL ) - return -ENOMEM; - - clear_and_clean_page(page); - p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_pte); return 0; @@ -XXX,XX +XXX,XX @@ static void p2m_put_2m_superpage(mfn_t mfn, p2m_type_t type) } /* Put any references on the page referenced by pte. */ -static void p2m_put_page(const pte_t pte, unsigned int level) +static void p2m_put_page(const pte_t pte, unsigned int level, p2m_type_t p2mt) { mfn_t mfn = pte_get_mfn(pte); - p2m_type_t p2m_type = p2m_get_type(pte); ASSERT(pte_is_valid(pte)); @@ -XXX,XX +XXX,XX @@ static void p2m_put_page(const pte_t pte, unsigned int level) switch ( level ) { case 1: - return p2m_put_2m_superpage(mfn, p2m_type); + return p2m_put_2m_superpage(mfn, p2mt); case 0: - return p2m_put_4k_page(mfn, p2m_type); + return p2m_put_4k_page(mfn, p2mt); } } @@ -XXX,XX +XXX,XX @@ static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg) /* Free pte sub-tree behind an entry */ static void p2m_free_subtree(struct p2m_domain *p2m, - pte_t entry, unsigned int level) + pte_t entry, unsigned int level, + const pte_t *metadata, const unsigned int index) { unsigned int i; - pte_t *table; + pte_t *table, *tmp_metadata; mfn_t mfn; struct page_info *pg; @@ -XXX,XX +XXX,XX @@ static void p2m_free_subtree(struct p2m_domain *p2m, if ( pte_is_superpage(entry, level) || (level == 0) ) { + p2m_type_t p2mt = p2m_get_type(entry, metadata, index); + #ifdef CONFIG_IOREQ_SERVER /* * If this gets called then either the entry was replaced by an entry @@ -XXX,XX +XXX,XX @@ static void p2m_free_subtree(struct p2m_domain *p2m, ioreq_request_mapcache_invalidate(p2m->domain); #endif - p2m_put_page(entry, level); + p2m_put_page(entry, level, p2mt); return; } - table = map_domain_page(pte_get_mfn(entry)); + mfn = 
pte_get_mfn(entry); + ASSERT(mfn_valid(mfn)); + table = map_domain_page(mfn); + pg = mfn_to_page(mfn); + tmp_metadata = __map_domain_page(pg->v.md.metadata); + for ( i = 0; i < XEN_PT_ENTRIES; i++ ) - p2m_free_subtree(p2m, table[i], level - 1); + p2m_free_subtree(p2m, table[i], level - 1, tmp_metadata, i); + unmap_domain_page(tmp_metadata); unmap_domain_page(table); /* @@ -XXX,XX +XXX,XX @@ static void p2m_free_subtree(struct p2m_domain *p2m, */ p2m_tlb_flush_sync(p2m); - mfn = pte_get_mfn(entry); - ASSERT(mfn_valid(mfn)); - - pg = mfn_to_page(mfn); - - page_list_del(pg, &p2m->pages); - p2m_free_page(p2m, pg); + p2m_free_table(p2m, pg); } static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, unsigned int level, unsigned int target, - const unsigned int *offsets) + const unsigned int *offsets, + pte_t *metadata_tbl) { struct page_info *page; unsigned long i; pte_t pte, *table; bool rv = true; + pte_t *tmp_metadata_tbl; /* Convenience aliases */ mfn_t mfn = pte_get_mfn(*entry); @@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, ASSERT(level > target); ASSERT(pte_is_superpage(*entry, level)); - page = p2m_alloc_page(p2m->domain); + page = p2m_alloc_table(p2m); if ( !page ) { /* @@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, return false; } + tmp_metadata_tbl = __map_domain_page(page->v.md.metadata); + table = __map_domain_page(page); /* @@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, pte = *entry; pte_set_mfn(&pte, mfn_add(mfn, i << level_order)); + if ( MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage ) + tmp_metadata_tbl[i] = metadata_tbl[offsets[level]]; + write_pte(new_entry, pte); } @@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, */ if ( next_level != target ) rv = p2m_split_superpage(p2m, table + offsets[next_level], - level - 1, target, offsets); + level - 1, 
target, offsets, tmp_metadata_tbl); if ( p2m->clean_pte ) clean_dcache_va_range(table, PAGE_SIZE); @@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, */ unmap_domain_page(table); + unmap_domain_page(tmp_metadata_tbl); + /* * Even if we failed, we should (according to the current implemetation * of a way how sub-tree is freed if p2m_split_superpage hasn't been @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, { /* We need to split the original page. */ pte_t split_pte = *entry; + struct page_info *metadata = virt_to_page(table)->v.md.metadata; + pte_t *metadata_tbl = __map_domain_page(metadata); ASSERT(pte_is_superpage(*entry, level)); - if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) ) + if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets, + metadata_tbl) ) { /* Free the allocated sub-tree */ - p2m_free_subtree(p2m, split_pte, level); + p2m_free_subtree(p2m, split_pte, level, metadata_tbl, offsets[level]); rc = -ENOMEM; goto out; } + unmap_domain_page(metadata_tbl); + p2m_write_pte(entry, split_pte, p2m->clean_pte); p2m->need_flush = true; @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, p2m_clean_pte(entry, p2m->clean_pte); else { - pte_t pte = p2m_pte_from_mfn(mfn, t, false); + pte_t pte = p2m_pte_from_mfn(mfn, t, virt_to_page(table)->v.md.metadata, + offsets[level], false); p2m_write_pte(entry, pte, p2m->clean_pte); @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, */ if ( pte_is_valid(orig_pte) && !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ) - p2m_free_subtree(p2m, orig_pte, level); + { + struct page_info *metadata = virt_to_page(table)->v.md.metadata; + pte_t *metadata_tbl = __map_domain_page(metadata); + p2m_free_subtree(p2m, orig_pte, level, metadata_tbl, offsets[level]); + unmap_domain_page(metadata_tbl); + } out: unmap_domain_page(table); @@ -XXX,XX +XXX,XX @@ static mfn_t p2m_get_entry(struct p2m_domain *p2m, 
gfn_t gfn, if ( pte_is_valid(entry) ) { if ( t ) - *t = p2m_get_type(entry); + { + struct page_info *metadata_pg = virt_to_page(table)->v.md.metadata; + pte_t *metadata = __map_domain_page(metadata_pg); + *t = p2m_get_type(entry, metadata, offsets[level]); + unmap_domain_page(metadata); + } mfn = pte_get_mfn(entry); /* -- 2.50.1
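Illustration only, not part of the patch: the commit message above describes a two-tier encoding in which types that fit in the two spare PTE bits are stored inline, while any other type leaves a p2m_ext_storage marker in the PTE and puts the real type in a parallel metadata slot with the same index. The sketch below models that scheme on plain 64-bit words. The type values, the position of the two bits (8 and 9), and the names set_type() and get_type() are assumptions for the example and are not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the two software-reserved PTE bits are bits 8 and 9. */
#define TYPE_SHIFT 8U
#define TYPE_MASK  (UINT64_C(3) << TYPE_SHIFT)

/* Hypothetical type numbering; only T_EXT_STORAGE == 3 matters here. */
enum {
    T_INVALID     = 0,
    T_RAM_RW      = 1,
    T_MMIO        = 2,
    T_EXT_STORAGE = 3, /* marker: real type lives in the metadata slot */
    T_FOREIGN_RW  = 4,
    T_FOREIGN_RO  = 5,
};

/* Record `type` for entry `idx`, inline when it fits, else in metadata. */
static void set_type(uint64_t *pte, uint64_t *metadata, unsigned int idx,
                     unsigned int type)
{
    *pte &= ~TYPE_MASK;
    metadata[idx] = T_INVALID; /* drop any stale out-of-line type */

    if ( type < T_EXT_STORAGE )
        *pte |= (uint64_t)type << TYPE_SHIFT;
    else
    {
        *pte |= (uint64_t)T_EXT_STORAGE << TYPE_SHIFT;
        metadata[idx] = type;
    }
}

/* Read the type back, following the marker into metadata if needed. */
static unsigned int get_type(uint64_t pte, const uint64_t *metadata,
                             unsigned int idx)
{
    unsigned int type = (unsigned int)((pte & TYPE_MASK) >> TYPE_SHIFT);

    return (type == T_EXT_STORAGE) ? (unsigned int)metadata[idx] : type;
}
```

Keeping the metadata array indexed exactly like the page table is what allows the lookup to reuse the offset already computed for the walk, at the cost of one extra page per intermediate table.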
This patch series introduces the functions necessary to build and manage
RISC-V guest page tables and MMIO/RAM mappings.

CI tests: https://gitlab.com/xen-project/people/olkur/xen/-/pipelines/2247120521

---
Changes in V9:
 - Addressed comments for v8.
---
Changes in V8:
 - All patches (except the last three) are merged to staging.
 - Addressed comments for v7.
---
Changes in V7:
 - Merged to staging:
   - xen/riscv: avoid redundant HGATP*_MODE_SHIFT and HGATP*_VMID_SHIFT
 - Introduce new patch:
   - xen/riscv: update p2m_set_entry to free unused metadata page
     (could be merged with the previous one:
      xen/riscv: introduce metadata table to store P2M type)
 - Addressed comments for v6.
---
Changes in V6:
 - Addressed comments for v5.
---
Changes in V5:
 - Addressed comments for v4.
---
Changes in V4:
 - Merged to staging:
   - xen/riscv: introduce sbi_remote_hfence_gvma()
   - xen/riscv: introduce sbi_remote_hfence_gvma_vmid()
 - Drop "xen/riscv: introduce page_{get,set}_xenheap_gfn()" as grant tables
   aren't going to be introduced for the moment. Also, drop other parts
   connected to grant tables support.
 - All other changes are patch-specific.
---
Changes in V3:
 - Introduce metadata table to store P2M types.
 - Use x86's way to allocate VMID.
 - Abstract Arm-specific p2m type name for device MMIO mappings.
 - For all other updates, please look at the specific patch.
---
Changes in V2:
 - Merged to staging:
   - [PATCH v1 1/6] xen/riscv: add inclusion of xen/bitops.h to asm/cmpxchg.h
 - New patches:
   - xen/riscv: implement sbi_remote_hfence_gvma{_vmid}().
 - Split patch "xen/riscv: implement p2m mapping functionality" into smaller
   patches:
   - xen/riscv: introduce page_set_xenheap_gfn()
   - xen/riscv: implement guest_physmap_add_entry() for mapping GFNs to MFNs
   - xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
   - xen/riscv: Implement p2m_free_entry() and related helpers
   - xen/riscv: Implement superpage splitting for p2m mappings
   - xen/riscv: implement p2m_next_level()
   - xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration
 - Move root p2m table allocation to a separate patch:
   xen/riscv: add root page table allocation
 - Drop the dependency of this patch series on the patch introducing SvPBMT,
   as it was merged.
 - Patch "[PATCH v1 4/6] xen/riscv: define pt_t and pt_walk_t structures" was
   renamed to "xen/riscv: introduce pte_{set,get}_mfn()" because, after
   dropping the bitfields for the PTE structure, this patch introduces only
   pte_{set,get}_mfn().
 - Rename "xen/riscv: define pt_t and pt_walk_t structures" to
   "xen/riscv: introduce pte_{set,get}_mfn()" as pt_t and pt_walk_t were
   dropped.
 - Introduce guest domain's VMID allocation and management.
 - Add patches necessary to implement p2m lookup:
   - xen/riscv: implement mfn_valid() and page reference, ownership handling
     helpers
   - xen/riscv: add support of page lookup by GFN
 - Re-sort patch series.
 - All other changes are patch-specific. Please check them.
---

Oleksii Kurochko (3):
  xen/riscv: add support of page lookup by GFN
  xen/riscv: introduce metadata table to store P2M type
  xen/riscv: update p2m_set_entry() to free unused metadata pages

 xen/arch/riscv/include/asm/flushtlb.h |   2 +-
 xen/arch/riscv/include/asm/mm.h       |  21 ++
 xen/arch/riscv/include/asm/p2m.h      |  21 ++
 xen/arch/riscv/mm.c                   |  13 +
 xen/arch/riscv/p2m.c                  | 435 ++++++++++++++++++++++++--
 5 files changed, 462 insertions(+), 30 deletions(-)

--
2.52.0
Introduce helper functions for safely querying the P2M (physical-to-machine) mapping: - add p2m_read_lock(), p2m_read_unlock(), and p2m_is_locked() for managing P2M lock state. - Implement p2m_get_entry() to retrieve mapping details for a given GFN, including MFN, page order, and validity. - Introduce p2m_get_page_from_gfn() to convert a GFN into a page_info pointer, acquiring a reference to the page if valid. - Introduce get_page(). Implementations are based on Arm's functions with some minor modifications: - p2m_get_entry(): - Reverse traversal of page tables, as RISC-V uses the opposite level numbering compared to Arm. - Removed the return of p2m_access_t from p2m_get_entry() since mem_access_settings is not introduced for RISC-V. - Updated BUILD_BUG_ON() to check using the level 0 mask, which corresponds to Arm's THIRD_MASK. - Replaced open-coded bit shifts with the BIT() macro. Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> --- Changes in V9: - Update check_outside_boundary() to return (P2M_MAX_ROOT_LEVEL + 1) in the case if gfn is inside range. --- Changes in V8: - Drop the local variable masked_gfn inside check_outside_boundary() and fold the is_lower conditionals into the for loop. - Initialize the local variable level in p2m_get_entry() to the root level and drop the explicit assignment when root page table wasn't found, as it now defaults to the root level. - Introduce gfn_limit_bits and use it to calculate the maximum GFN for the MMU second stage, and return the appropriate page_order when the GFN exceeds this limit. --- Changes in V7: - Refactor check_outside_boundary(). - Reword the comment above p2m_get_entry(). - As at the moment p2m_get_entry() doesn't pass `t` as NULL we could drop "if ( t )" checks inside it to not have dead code now. - Add the check inside p2m_get_entry() that requested gfn is correct. - Add "if ( t )" check inside p2m_get_page_from_gfn() as it is going to be some callers with t = NULL. 
---
Changes in V6:
 - Move if-condition with initialization up in p2m_get_page_from_gfn().
 - Pass p2mt to the call of p2m_get_entry() inside p2m_get_page_from_gfn()
   to avoid an issue when 't' is passed as NULL. With p2mt passed to
   p2m_get_entry() we will receive a proper type, so the rest of the
   function will be able to continue using a proper type.
 - In check_outside_boundary(), in the case when is_lower == true, fill the
   bottom bits of masked_gfn with all 1s.
 - Update code of check_outside_boundary() to return the proper level in the
   case when `level` is equal to 0.
 - Add ASSERT(p2m) in check_outside_boundary() to be sure that p2m isn't NULL,
   as P2M_LEVEL_MASK() depends on the p2m value.
---
Changes in V5:
 - Use P2M_DECLARE_OFFSETS(), introduced in earlier patches, instead of
   DECLARE_OFFSETS().
 - Drop blank line before check_outside_boundary().
 - Use a more readable version of the if statements inside
   check_outside_boundary().
 - Accumulate the mask in check_outside_boundary() instead of re-writing it
   for each page table level, to have correct gfns for comparison.
 - Set argument `t` of p2m_get_entry() to p2m_invalid by default.
 - Drop checking of (rc == P2M_TABLE_MAP_NOMEM ) when
   p2m_next_level(...,false,...) is called.
 - Add ASSERT(mfn & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)); in
   p2m_get_entry() to be sure that the received `mfn` has its lowest bits
   cleared.
 - Drop the `valid` argument from p2m_get_entry(); it is not needed anymore.
 - Drop p2m_lookup(), use p2m_get_entry() explicitly inside
   p2m_get_page_from_gfn().
 - Update the commit message.
---
Changes in V4:
 - Update prototype of p2m_is_locked() to return bool and accept
   pointer-to-const.
 - Correct the comment above p2m_get_entry().
 - Drop the check "BUILD_BUG_ON(XEN_PT_LEVEL_MAP_MASK(0) != PAGE_MASK);"
   inside p2m_get_entry() as it is stale. It was needed to be sure that
   4k page(s) are used on L3 (in Arm terms), which is true for RISC-V
   (if no special extensions are used). There was another reason for Arm to
   have it (and I copied it to RISC-V), but it isn't true for RISC-V.
   (Some details can be found in the response to the patch.)
 - Style fixes.
 - Add an explanatory comment on what the loop inside "gfn is higher than
   the highest p2m mapping" does. Move this loop to a separate function,
   check_outside_boundary(), to cover both boundaries (lower_mapped_gfn and
   max_mapped_gfn).
 - There is no need to allocate a page table, as it is expected that
   p2m_get_entry() would normally be called after a corresponding
   p2m_set_entry() was called. So change 'true' to 'false' in the page table
   walking loop inside p2m_get_entry().
 - Correct handling of the p2m_is_foreign case inside
   p2m_get_page_from_gfn().
 - Introduce and use P2M_LEVEL_MASK instead of XEN_PT_LEVEL_MASK, as the
   latter doesn't take into account the two extra bits for the root table in
   the case of P2M.
 - Drop stale item from "change in v3" - Add is_p2m_foreign() macro and
   connected stuff.
 - Add p2m_read_(un)lock().
---
Changes in V3:
 - Change struct domain *d argument of p2m_get_page_from_gfn() to
   struct p2m_domain.
 - Update the comment above p2m_get_entry().
 - s/_t/p2mt for local variable in p2m_get_entry().
 - Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn)
   to define offsets array.
 - Code style fixes.
 - Update a check of rc code from p2m_next_level() in p2m_get_entry()
   and drop "else" case.
 - Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL.
 - Use struct p2m_domain instead of struct domain for p2m_lookup() and
   p2m_get_page_from_gfn().
 - Move definition of get_page() from "xen/riscv: implement mfn_valid() and
   page reference, ownership handling helpers".
---
Changes in V2:
 - New patch.
--- xen/arch/riscv/include/asm/p2m.h | 21 ++++ xen/arch/riscv/mm.c | 13 +++ xen/arch/riscv/p2m.c | 185 +++++++++++++++++++++++++++++++ 3 files changed, 219 insertions(+) diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/p2m.h +++ b/xen/arch/riscv/include/asm/p2m.h @@ -XXX,XX +XXX,XX @@ #define P2M_GFN_LEVEL_SHIFT(lvl) (P2M_LEVEL_ORDER(lvl) + PAGE_SHIFT) +#define P2M_LEVEL_MASK(p2m, lvl) \ + (P2M_TABLE_OFFSET(p2m, lvl) << P2M_GFN_LEVEL_SHIFT(lvl)) + #define paddr_bits PADDR_BITS /* Get host p2m table */ @@ -XXX,XX +XXX,XX @@ static inline bool p2m_is_write_locked(struct p2m_domain *p2m) unsigned long construct_hgatp(const struct p2m_domain *p2m, uint16_t vmid); +static inline void p2m_read_lock(struct p2m_domain *p2m) +{ + read_lock(&p2m->lock); +} + +static inline void p2m_read_unlock(struct p2m_domain *p2m) +{ + read_unlock(&p2m->lock); +} + +static inline bool p2m_is_locked(const struct p2m_domain *p2m) +{ + return rw_is_locked(&p2m->lock); +} + +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn, + p2m_type_t *t); + #endif /* ASM__RISCV__P2M_H */ /* diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/mm.c +++ b/xen/arch/riscv/mm.c @@ -XXX,XX +XXX,XX @@ struct domain *page_get_owner_and_reference(struct page_info *page) return owner; } + +bool get_page(struct page_info *page, const struct domain *domain) +{ + const struct domain *owner = page_get_owner_and_reference(page); + + if ( likely(owner == domain) ) + return true; + + if ( owner != NULL ) + put_page(page); + + return false; +} diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ int map_regions_p2mt(struct domain *d, return rc; } + +/* + * p2m_get_entry() should always return the correct order value, even if an + * entry is not 
present (i.e. the GFN is outside the range): + * [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn] (1) + * + * This ensures that callers of p2m_get_entry() can determine what range of + * address space would be altered by a corresponding p2m_set_entry(). + * Also, it would help to avoid costly page walks for GFNs outside range (1). + * + * Therefore, this function returns true for GFNs outside range (1), and in + * that case the corresponding level is returned via the level_out argument. + * Otherwise, it returns false and p2m_get_entry() performs a page walk to + * find the proper entry. + */ +static bool check_outside_boundary(const struct p2m_domain *p2m, gfn_t gfn, + gfn_t boundary, bool is_lower, + unsigned int *level_out) +{ + unsigned int level = P2M_MAX_ROOT_LEVEL + 1; + bool ret = false; + + ASSERT(p2m); + + if ( is_lower ? gfn_x(gfn) < gfn_x(boundary) + : gfn_x(gfn) > gfn_x(boundary) ) + { + for ( level = P2M_ROOT_LEVEL(p2m) ; level; level-- ) + { + unsigned long mask = BIT(P2M_GFN_LEVEL_SHIFT(level), UL) - 1; + + if ( is_lower ? (gfn_x(gfn) | mask) < gfn_x(boundary) + : (gfn_x(gfn) & ~mask) > gfn_x(boundary) ) + break; + } + + ret = true; + } + + if ( level_out ) + *level_out = level; + + return ret; +} + +/* + * Get the details of a given gfn. + * + * If the entry is present, the associated MFN, the p2m type of the mapping, + * and the page order of the mapping in the page table (i.e., it could be a + * superpage) will be returned. + * + * If the entry is not present, INVALID_MFN will be returned, page_order will + * be set according to the order of the invalid range, and the type will be + * p2m_invalid. 
+ */ +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn, + p2m_type_t *t, + unsigned int *page_order) +{ + unsigned int level = P2M_ROOT_LEVEL(p2m); + unsigned int gfn_limit_bits = + P2M_LEVEL_ORDER(level + 1) + P2M_ROOT_EXTRA_BITS(p2m, level); + pte_t entry, *table; + int rc; + mfn_t mfn = INVALID_MFN; + + P2M_BUILD_LEVEL_OFFSETS(p2m, offsets, gfn_to_gaddr(gfn)); + + ASSERT(p2m_is_locked(p2m)); + + *t = p2m_invalid; + + if ( gfn_x(gfn) > (BIT(gfn_limit_bits, UL) - 1) ) + { + if ( page_order ) + *page_order = gfn_limit_bits; + + return mfn; + } + + if ( check_outside_boundary(p2m, gfn, p2m->lowest_mapped_gfn, true, + &level) ) + goto out; + + if ( check_outside_boundary(p2m, gfn, p2m->max_mapped_gfn, false, &level) ) + goto out; + + table = p2m_get_root_pointer(p2m, gfn); + + /* + * The table should always be non-NULL because the gfn is below + * p2m->max_mapped_gfn and the root table pages are always present. + */ + if ( !table ) + { + ASSERT_UNREACHABLE(); + goto out; + } + + for ( level = P2M_ROOT_LEVEL(p2m); level; level-- ) + { + rc = p2m_next_level(p2m, false, level, &table, offsets[level]); + if ( rc == P2M_TABLE_MAP_NONE ) + goto out_unmap; + + if ( rc != P2M_TABLE_NORMAL ) + break; + } + + entry = table[offsets[level]]; + + if ( pte_is_valid(entry) ) + { + *t = p2m_get_type(entry); + + mfn = pte_get_mfn(entry); + + ASSERT(!(mfn_x(mfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1))); + + /* + * The entry may point to a superpage. Find the MFN associated + * to the GFN. 
+ */ + mfn = mfn_add(mfn, + gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)); + } + + out_unmap: + unmap_domain_page(table); + + out: + if ( page_order ) + *page_order = P2M_LEVEL_ORDER(level); + + return mfn; +} + +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn, + p2m_type_t *t) +{ + struct page_info *page; + p2m_type_t p2mt; + mfn_t mfn; + + p2m_read_lock(p2m); + mfn = p2m_get_entry(p2m, gfn, &p2mt, NULL); + + if ( t ) + *t = p2mt; + + if ( !mfn_valid(mfn) ) + { + p2m_read_unlock(p2m); + return NULL; + } + + page = mfn_to_page(mfn); + + /* + * get_page won't work on foreign mapping because the page doesn't + * belong to the current domain. + */ + if ( unlikely(p2m_is_foreign(p2mt)) ) + { + const struct domain *fdom = page_get_owner_and_reference(page); + + p2m_read_unlock(p2m); + + if ( fdom ) + { + if ( likely(fdom != p2m->domain) ) + return page; + + ASSERT_UNREACHABLE(); + put_page(page); + } + + return NULL; + } + + p2m_read_unlock(p2m); + + return get_page(page, p2m->domain) ? page : NULL; +} -- 2.52.0
RISC-V's PTE has only two available bits that can be used to store the P2M type. This is insufficient to represent all the current RISC-V P2M types. Therefore, some P2M types must be stored outside the PTE bits.

To address this, a metadata table is introduced to store P2M types that cannot fit in the PTE itself. Not all P2M types are stored in the metadata table, only those that require it.

The metadata table is linked to the intermediate page table via the `struct page_info`'s v.md.pg field of the corresponding intermediate page. Such pages are allocated with MEMF_no_owner, which allows us to use the v field for the purpose of storing the metadata table.

To simplify freeing an intermediate page table together with its linked metadata page, `p2m_free_table()` is implemented.

These changes impact `p2m_split_superpage()`: when a superpage is split, it is necessary to update the metadata table of the new intermediate page table if the entry being split has its P2M type set to `p2m_ext_storage` in its `P2M_TYPES` bits. In addition to updating the metadata of the new intermediate page table, the corresponding entry in the metadata for the original superpage is invalidated.

Also, update p2m_{get,set}_type() to work with P2M types which don't fit into PTE bits.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V9:
- Fold ASSERT(ctx->p2m) into the preceding ASSERT() in p2m_set_type().
---
Changes in V8:
- Update the comment above p2m_set_type().
- Drop BUG_ON(ctx->level ...) and "if ( ctx->level <= P2M_MAX_SUPPORTED_LEVEL_MAPPING )" as p2m_set_type() doesn't care about ctx->level; the passed `pte` is expected to be valid, and so ctx->level is expected to be valid too.
- Rename the p2m_pte_ctx argument to ctx for p2m_pte_from_mfn() and p2m_free_subtree().
- Initialize local variable p2m_pte_ctx inside p2m_split_superpage() with an initializer.
Drop the assignment of p2m_pte_ctx->level when the old pte's type is read.
- Use an initializer for tmp_ctx and drop the assignment of tmp_ctx.p2m inside p2m_set_type().
- Drop brackets around the p2m_free_subtree() call inside p2m_set_entry().
---
Changes in V7:
- Put p2m_domain * inside struct p2m_pte_ctx and update the APIs of p2m_set_type() and p2m_pte_from_mfn(). Also, move ASSERT(p2m) closer to p2m_alloc_page(ctx->p2m) inside p2m_set_type(). Update all callers of p2m_set_type() and p2m_pte_from_mfn().
- Update the comment above BUILD_BUG_ON(p2m_invalid): drop unnecessary sentences and make it shorter than 80 chars.
- Drop the comment and BUILD_BUG_ON() in p2m_get_type() as it is enough to have it in p2m_set_type().
- Update the comment above p2m_set_type() about the p2m argument, which was dropped.
- Make the ctx argument of p2m_set_type() const to be able to re-use p2m_pte_ctx across multiple iterations without fully reinitializing it.
- Declare "struct p2m_pte_ctx tmp_ctx;" as a function-scope variable and rework p2m_set_entry() correspondingly.
---
Changes in V6:
- Introduce new type md_t to use instead of pte_t to store metadata types outside PTE bits.
- Integrate the introduced struct md_t.
- Drop local variable "struct domain *d" inside p2m_set_type().
- Drop __func__ printing and use %pv.
- Code style fixes.
- Drop an unnecessary check inside the if-condition in p2m_pte_from_mfn() as we have ASSERT(p2m) inside p2m_set_type() anyway.
- Restore the comment inside page_to_p2m_table() as it was deleted accidentally.
- Move the initialization of p2m_pte_ctx.pt_page and p2m_pte_ctx.level ahead of the loop.
- Add BUILD_BUG_ON(p2m_invalid) before the call of p2m_alloc_page() in p2m_set_type() and in p2m_get_type() before " if ( type == p2m_ext_storage )".
- Set tbl_pg->v.md.pg to NULL in p2m_free_table().
- Make argument 't' of p2m_set_type() non-const as we are going to change it.
- Add some explanatory comments.
- Update the ASSERT at the start of p2m_set_type() to verify that the passed ctx->index is less than 512, and drop the calculation of the root page index, as calc_offset() and get_root_pointer() guarantee that we already get the proper page and the proper index inside this page.
---
Changes in V5:
- Rename the metadata member of struct md inside struct page_info to pg.
- Drop stray blank in the declaration of p2m_alloc_table().
- Use "<" instead of "<=" in ASSERT() in p2m_set_type().
- Move the check that ctx is provided to an earlier point in p2m_set_type().
- Set `md_pg` after ASSERT() in p2m_set_type().
- Use BUG_ON() instead of ASSERT_UNREACHABLE() in p2m_set_type().
- Drop the check that metadata isn't NULL before unmap_domain_page() is called.
- Make the `md` variable in p2m_get_type() const.
- Unmap the correct domain page in p2m_get_type(): use `md` instead of ctx->pt_page->v.md.pg.
- Add a description of how p2m and p2m_pte_ctx are expected to be used in p2m_pte_from_mfn() and drop a comment from page_to_p2m_table().
- Drop the stale part of the comment above p2m_alloc_table().
- Drop ASSERT(tbl_pg->v.md.pg) from p2m_free_table() as tbl_pg->v.md.pg is created conditionally now.
- Drop the introduction of p2m_alloc_table(), update p2m_alloc_page() correspondingly and use it instead.
- Add missing blank in the definition of the level member for the tmp_ctx variable in p2m_free_subtree(). Also, add the comma at the end.
- Initialize old_type once before the for-loop in p2m_split_superpage() as the old type will be used for all newly created PTEs.
- Properly initialize p2m_pte_ctx.level with next_level instead of level when p2m_set_type() is going to be called for new PTEs.
- Fix indentation.
- Move ASSERT(p2m) to the top of p2m_set_type() to be sure that NULL isn't passed for the p2m argument of p2m_set_type().
- s/virt_to_page(table)/mfn_to_page(domain_page_map_to_mfn(table)) to receive the correct page for a table which is mapped by domain_page_map().
- Add "return;" after domain_crash() in p2m_set_type() to avoid a potential NULL pointer dereference of md_pg.
---
Changes in V4:
- Add Suggested-by: Jan Beulich <jbeulich@suse.com>.
- Update the comment above the declaration of the md structure inside struct page_info to: "Page is used as an intermediate P2M page table".
- Allocate the metadata table on demand to save some memory. (1)
- Rework p2m_set_type():
  - Add allocation of the metadata page only if needed.
  - Move the check of what kind of type we are handling inside p2m_set_type().
- Move the mapping of the metadata page inside p2m_get_type() as it is needed only if the PTE's type is equal to p2m_ext_storage.
- Add some description to the p2m_get_type() function.
- Drop blank after the return type of p2m_alloc_table().
- Drop allocation of the metadata page inside p2m_alloc_table() because of (1).
- Fix p2m_free_table() to free the metadata page only if it was allocated.
---
Changes in V3:
- Add is_p2m_foreign() macro and connected stuff.
- Change the struct domain *d argument of p2m_get_page_from_gfn() to struct p2m_domain.
- Update the comment above p2m_get_entry().
- s/_t/p2mt for the local variable in p2m_get_entry().
- Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn) to define the offsets array.
- Code style fixes.
- Update the check of the rc code from p2m_next_level() in p2m_get_entry() and drop the "else" case.
- Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL.
- Use struct p2m_domain instead of struct domain for p2m_lookup() and p2m_get_page_from_gfn().
- Move the definition of get_page() from "xen/riscv: implement mfn_valid() and page reference, ownership handling helpers"
---
Changes in V2:
- New patch.
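Note for readers (not part of the patch): the scheme described in the commit message, two type bits in the PTE plus a side table for the types that don't fit, can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: the sk_* names and the enum members are hypothetical, the type field is assumed to occupy PTE bits 9:8, and the metadata lives in a plain array here rather than in a separately allocated page reached through page_info. Unlike the patch, which ORs the type into a freshly built PTE, the sketch clears the field first so that an entry can be retyped in place.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type list; only the ordering matters for the sketch. */
enum sk_type {
    SK_INVALID = 0,          /* must be 0: fresh metadata is zero-filled */
    SK_RAM_RW,
    SK_MMIO_DIRECT,
    SK_EXT_STORAGE,          /* sentinel: the real type is in the metadata */
    SK_FIRST_EXTERNAL = SK_EXT_STORAGE,
    SK_FOREIGN_RW,
    SK_FOREIGN_RO,
};

#define SK_TYPE_SHIFT 8      /* assumption: the two software PTE bits */
#define SK_TYPE_MASK  (UINT64_C(3) << SK_TYPE_SHIFT)
#define SK_PT_ENTRIES 512U

/* One page table plus its metadata, indexed by the same PTE index. */
struct sk_table {
    uint64_t pte[SK_PT_ENTRIES];
    uint8_t  md[SK_PT_ENTRIES];
};

static void sk_set_type(struct sk_table *tbl, unsigned int idx,
                        enum sk_type t)
{
    enum sk_type in_pte = t;

    assert(idx < SK_PT_ENTRIES);

    if ( t >= SK_FIRST_EXTERNAL )
    {
        /* Type doesn't fit in two bits: park it in the metadata. */
        tbl->md[idx] = (uint8_t)t;
        in_pte = SK_EXT_STORAGE;
    }
    else
        /* Type fits in the PTE: invalidate any stale metadata entry. */
        tbl->md[idx] = SK_INVALID;

    tbl->pte[idx] = (tbl->pte[idx] & ~SK_TYPE_MASK) |
                    ((uint64_t)in_pte << SK_TYPE_SHIFT);
}

static enum sk_type sk_get_type(const struct sk_table *tbl, unsigned int idx)
{
    enum sk_type t;

    assert(idx < SK_PT_ENTRIES);

    t = (enum sk_type)((tbl->pte[idx] & SK_TYPE_MASK) >> SK_TYPE_SHIFT);

    /* The sentinel redirects the lookup to the metadata entry. */
    if ( t == SK_EXT_STORAGE )
        t = (enum sk_type)tbl->md[idx];

    return t;
}
```

The point of the sentinel is that readers of the common, in-PTE types never touch the metadata at all; only entries tagged as externally stored pay for the extra lookup.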
--- xen/arch/riscv/include/asm/mm.h | 9 ++ xen/arch/riscv/p2m.c | 234 ++++++++++++++++++++++++++++---- 2 files changed, 213 insertions(+), 30 deletions(-) diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/mm.h +++ b/xen/arch/riscv/include/asm/mm.h @@ -XXX,XX +XXX,XX @@ struct page_info /* Order-size of the free chunk this page is the head of. */ unsigned int order; } free; + + /* Page is used as an intermediate P2M page table */ + struct { + /* + * Pointer to a page which store metadata for an intermediate page + * table. + */ + struct page_info *pg; + } md; } v; union { diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ */ #define P2M_MAX_SUPPORTED_LEVEL_MAPPING _AC(2, U) +struct md_t { + /* + * Describes a type stored outside PTE bits. + * Look at the comment above definition of enum p2m_type_t. + */ + p2m_type_t type : 4; +}; + +/* + * P2M PTE context is used only when a PTE's P2M type is p2m_ext_storage. + * In this case, the P2M type is stored separately in the metadata page. + */ +struct p2m_pte_ctx { + struct p2m_domain *p2m; + struct page_info *pt_page; /* Page table page containing the PTE. */ + unsigned int index; /* Index of the PTE within that page. */ + unsigned int level; /* Paging level at which the PTE resides. */ +}; + static struct gstage_mode_desc __ro_after_init max_gstage_mode = { .mode = HGATP_MODE_OFF, .paging_levels = 0, @@ -XXX,XX +XXX,XX @@ unsigned char get_max_supported_mode(void) return max_gstage_mode.mode; } +/* + * If anything is changed here, it may also require updates to + * p2m_{get,set}_type(). 
+ */ static inline unsigned int calc_offset(const struct p2m_domain *p2m, const unsigned int lvl, const paddr_t gpa) @@ -XXX,XX +XXX,XX @@ static inline unsigned int calc_offset(const struct p2m_domain *p2m, * The caller is responsible for unmapping the page after use. * * Returns NULL if the calculated offset into the root table is invalid. + * + * If anything is changed here, it may also require updates to + * p2m_{get,set}_type(). */ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn) { @@ -XXX,XX +XXX,XX @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m) return pg; } -static int p2m_set_type(pte_t *pte, p2m_type_t t) +/* + * `pte` – PTE entry for which the type `t` will be stored. + * + * If `t` >= p2m_first_external, a valid `ctx` must be provided. + */ +static void p2m_set_type(pte_t *pte, p2m_type_t t, + const struct p2m_pte_ctx *ctx) { - int rc = 0; + struct page_info **md_pg; + struct md_t *metadata = NULL; - if ( t > p2m_first_external ) - panic("unimplemeted\n"); - else - pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK); + /* + * It is sufficient to compare ctx->index with PAGETABLE_ENTRIES because, + * even for the p2m root page table (which is a 16 KB page allocated as + * four 4 KB pages), calc_offset() guarantees that the page-table index + * will always fall within the range [0, 511]. + */ + ASSERT(ctx && ctx->index < PAGETABLE_ENTRIES && ctx->p2m); - return rc; + /* + * At the moment, p2m_get_root_pointer() returns one of four possible p2m + * root pages, so there is no need to search for the correct ->pt_page + * here. + * Non-root page tables are 4 KB pages, so simply using ->pt_page is + * sufficient. + */ + md_pg = &ctx->pt_page->v.md.pg; + + if ( !*md_pg && (t >= p2m_first_external) ) + { + /* + * Since p2m_alloc_page() initializes an allocated page with + * zeros, p2m_invalid is expected to have the value 0 as well. 
+ */ + BUILD_BUG_ON(p2m_invalid); + + *md_pg = p2m_alloc_page(ctx->p2m); + if ( !*md_pg ) + { + printk("%pd: can't allocate metadata page\n", + ctx->p2m->domain); + domain_crash(ctx->p2m->domain); + + return; + } + } + + if ( *md_pg ) + metadata = __map_domain_page(*md_pg); + + if ( t >= p2m_first_external ) + { + metadata[ctx->index].type = t; + + t = p2m_ext_storage; + } + else if ( metadata ) + metadata[ctx->index].type = p2m_invalid; + + pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK); + + unmap_domain_page(metadata); } -static p2m_type_t p2m_get_type(const pte_t pte) +/* + * `pte` -> PTE entry that stores the PTE's type. + * + * If the PTE's type is `p2m_ext_storage`, `ctx` should be provided; + * otherwise it could be NULL. + */ +static p2m_type_t p2m_get_type(const pte_t pte, const struct p2m_pte_ctx *ctx) { p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK); if ( type == p2m_ext_storage ) - panic("unimplemented\n"); + { + const struct md_t *md = __map_domain_page(ctx->pt_page->v.md.pg); + + type = md[ctx->index].type; + + /* + * Since p2m_set_type() guarantees that the type will be greater than + * p2m_first_external, just check that we received a valid type here. + */ + ASSERT(type > p2m_first_external); + + unmap_domain_page(md); + } return type; } @@ -XXX,XX +XXX,XX @@ static void p2m_set_permission(pte_t *e, p2m_type_t t) } } -static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table) +/* + * If p2m_pte_from_mfn() is called with ctx = NULL, + * it means the function is working with a page table for which the `t` + * should not be applicable. Otherwise, the function is handling a leaf PTE + * for which `t` is applicable. 
+ */ +static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, + struct p2m_pte_ctx *ctx) { pte_t e = (pte_t) { PTE_VALID }; @@ -XXX,XX +XXX,XX @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table) ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK) || mfn_eq(mfn, INVALID_MFN)); - if ( !is_table ) + if ( ctx ) { switch ( t ) { @@ -XXX,XX +XXX,XX @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table) } p2m_set_permission(&e, t); - p2m_set_type(&e, t); + p2m_set_type(&e, t, ctx); } else /* @@ -XXX,XX +XXX,XX @@ static pte_t page_to_p2m_table(const struct page_info *page) * set to true and p2m_type_t shouldn't be applied for PTEs which * describe an intermediate table. */ - return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, true); + return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, NULL); +} + +static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg); + +/* + * Free page table's page and metadata page linked to page table's page. + */ +static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg) +{ + if ( tbl_pg->v.md.pg ) + { + p2m_free_page(p2m, tbl_pg->v.md.pg); + tbl_pg->v.md.pg = NULL; + } + p2m_free_page(p2m, tbl_pg); } /* Allocate a new page table page and hook it in via the given entry. */ @@ -XXX,XX +XXX,XX @@ static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg) /* Free pte sub-tree behind an entry */ static void p2m_free_subtree(struct p2m_domain *p2m, - pte_t entry, unsigned int level) + pte_t entry, + const struct p2m_pte_ctx *ctx) { unsigned int i; pte_t *table; mfn_t mfn; struct page_info *pg; + unsigned int level = ctx->level; /* * Check if the level is valid: only 4K - 2M - 1G mappings are supported. 
@@ -XXX,XX +XXX,XX @@ static void p2m_free_subtree(struct p2m_domain *p2m, if ( pte_is_mapping(entry) ) { - p2m_type_t p2mt = p2m_get_type(entry); + p2m_type_t p2mt = p2m_get_type(entry, ctx); #ifdef CONFIG_IOREQ_SERVER /* @@ -XXX,XX +XXX,XX @@ static void p2m_free_subtree(struct p2m_domain *p2m, return; } - table = map_domain_page(pte_get_mfn(entry)); + mfn = pte_get_mfn(entry); + ASSERT(mfn_valid(mfn)); + table = map_domain_page(mfn); + pg = mfn_to_page(mfn); for ( i = 0; i < P2M_PAGETABLE_ENTRIES(p2m, level); i++ ) - p2m_free_subtree(p2m, table[i], level - 1); + { + struct p2m_pte_ctx tmp_ctx = { + .pt_page = pg, + .index = i, + .level = level - 1, + .p2m = p2m, + }; + + p2m_free_subtree(p2m, table[i], &tmp_ctx); + } unmap_domain_page(table); @@ -XXX,XX +XXX,XX @@ static void p2m_free_subtree(struct p2m_domain *p2m, */ p2m_tlb_flush_sync(p2m); - mfn = pte_get_mfn(entry); - ASSERT(mfn_valid(mfn)); - - pg = mfn_to_page(mfn); - - p2m_free_page(p2m, pg); + p2m_free_table(p2m, pg); } static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, unsigned int level, unsigned int target, - const unsigned int *offsets) + const unsigned int *offsets, + struct page_info *tbl_pg) { struct page_info *page; unsigned long i; @@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, unsigned int next_level = level - 1; unsigned int level_order = P2M_LEVEL_ORDER(next_level); + struct p2m_pte_ctx p2m_pte_ctx = { + .p2m = p2m, + .level = level, + }; + + /* Init with p2m_invalid just to make compiler happy. */ + p2m_type_t old_type = p2m_invalid; + /* * This should only be called with target != level and the entry is * a superpage. 
@@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, table = __map_domain_page(page); + if ( MASK_EXTR(entry->pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage ) + { + p2m_pte_ctx.pt_page = tbl_pg; + p2m_pte_ctx.index = offsets[level]; + + old_type = p2m_get_type(*entry, &p2m_pte_ctx); + } + + p2m_pte_ctx.pt_page = page; + p2m_pte_ctx.level = next_level; + for ( i = 0; i < P2M_PAGETABLE_ENTRIES(p2m, next_level); i++ ) { pte_t *new_entry = table + i; @@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, pte = *entry; pte_set_mfn(&pte, mfn_add(mfn, i << level_order)); + if ( MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage ) + { + p2m_pte_ctx.index = i; + + p2m_set_type(&pte, old_type, &p2m_pte_ctx); + } + write_pte(new_entry, pte); } @@ -XXX,XX +XXX,XX @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry, */ if ( next_level != target ) rv = p2m_split_superpage(p2m, table + offsets[next_level], - next_level, target, offsets); + next_level, target, offsets, page); if ( p2m->clean_dcache ) clean_dcache_va_range(table, PAGE_SIZE); @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, * are still allowed. */ bool removing_mapping = mfn_eq(mfn, INVALID_MFN); + struct p2m_pte_ctx tmp_ctx = { + .p2m = p2m, + }; P2M_BUILD_LEVEL_OFFSETS(p2m, offsets, gfn_to_gaddr(gfn)); ASSERT(p2m_is_write_locked(p2m)); @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, { /* We need to split the original page. 
*/ pte_t split_pte = *entry; + struct page_info *tbl_pg = mfn_to_page(domain_page_map_to_mfn(table)); ASSERT(pte_is_superpage(*entry, level)); - if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) ) + if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets, + tbl_pg) ) { + tmp_ctx.pt_page = tbl_pg; + tmp_ctx.index = offsets[level]; + tmp_ctx.level = level; + /* Free the allocated sub-tree */ - p2m_free_subtree(p2m, split_pte, level); + p2m_free_subtree(p2m, split_pte, &tmp_ctx); rc = -ENOMEM; goto out; @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, entry = table + offsets[level]; } + tmp_ctx.pt_page = mfn_to_page(domain_page_map_to_mfn(table)); + tmp_ctx.index = offsets[level]; + tmp_ctx.level = level; + /* * We should always be there with the correct level because all the * intermediate tables have been installed if necessary. @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, p2m_clean_pte(entry, p2m->clean_dcache); else { - pte_t pte = p2m_pte_from_mfn(mfn, t, false); + pte_t pte = p2m_pte_from_mfn(mfn, t, &tmp_ctx); p2m_write_pte(entry, pte, p2m->clean_dcache); @@ -XXX,XX +XXX,XX @@ static int p2m_set_entry(struct p2m_domain *p2m, if ( pte_is_valid(orig_pte) && (!pte_is_valid(*entry) || !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte))) ) - p2m_free_subtree(p2m, orig_pte, level); + p2m_free_subtree(p2m, orig_pte, &tmp_ctx); out: unmap_domain_page(table); @@ -XXX,XX +XXX,XX @@ static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn, if ( pte_is_valid(entry) ) { - *t = p2m_get_type(entry); + struct p2m_pte_ctx p2m_pte_ctx = { + .pt_page = mfn_to_page(domain_page_map_to_mfn(table)), + .index = offsets[level], + .level = level, + .p2m = p2m, + }; + + *t = p2m_get_type(entry, &p2m_pte_ctx); mfn = pte_get_mfn(entry); -- 2.52.0
Track how many metadata page entries are in use and free the metadata page once all of them are p2m_invalid.

Intermediate P2M page tables are allocated with MEMF_no_owner, so we are free to repurpose struct page_info fields for them. Since page_info.u.* is not used for such pages, introduce a used_entries counter in struct page_info to track how many metadata entries are in use for a given intermediate P2M page table.

The counter is updated in p2m_set_type() when metadata entries transition between p2m_invalid and a valid external type. When the last metadata entry is cleared (used_entries == 0), the associated metadata page is freed and returned to the P2M pool.

Refactor metadata page freeing into a new helper, p2m_free_metadata_page(), as the same logic is needed both when tearing down a P2M table and when all metadata entries become p2m_invalid in p2m_set_type(). As part of this refactoring, move the declaration of p2m_free_page() earlier to satisfy the new helper.

Additionally, implement page_set_tlbflush_timestamp() for RISC-V instead of BUGing, as it is invoked when returning memory to the domheap.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in v5:
- Nothing changed. Only rebase.
---
Changes in v4:
- Move the implementation of alloc_domain_struct() and free_domain_struct() ahead of alloc_vcpu_struct().
---
Changes in v3:
- Move alloc_domain_struct() and free_domain_struct() to avoid a forward declaration.
- Add Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>.
---
Changes in v2:
- New patch.
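Note for readers (not part of the patch): the counting rule from the commit message can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the Xen code: the skc_* names are hypothetical, calloc()/free() stand in for p2m_alloc_page() and p2m_free_metadata_page(), and a metadata value of 0 plays the role of p2m_invalid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKC_PT_ENTRIES 512U

/* Stand-in for the page_info fields of one intermediate page table. */
struct skc_table {
    uint8_t *md;                /* lazily allocated metadata, or NULL */
    unsigned long used_entries; /* number of non-zero metadata entries */
};

/*
 * Store `type` in the metadata slot `idx`; 0 means "no external type".
 * Returns 0 on success and -1 if the metadata could not be allocated.
 */
static int skc_md_store(struct skc_table *t, unsigned int idx, uint8_t type)
{
    assert(idx < SKC_PT_ENTRIES);

    if ( !t->md )
    {
        /* Nothing to clear and nothing to allocate for a non-external type. */
        if ( !type )
            return 0;

        t->md = calloc(SKC_PT_ENTRIES, sizeof(*t->md));
        if ( !t->md )
            return -1;
    }

    /* Count only transitions between "unused" and "used". */
    if ( type && !t->md[idx] )
        t->used_entries++;
    else if ( !type && t->md[idx] )
        t->used_entries--;

    t->md[idx] = type;

    /* The last user is gone: give the metadata back. */
    if ( !t->used_entries )
    {
        free(t->md);
        t->md = NULL;
    }

    return 0;
}
```

Overwriting one external type with another leaves the counter untouched, which is why the sketch compares the old and new values instead of counting every store.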
--- xen/arch/riscv/include/asm/flushtlb.h | 2 +- xen/arch/riscv/include/asm/mm.h | 12 ++++++++++ xen/arch/riscv/p2m.c | 32 +++++++++++++++++++++------ 3 files changed, 38 insertions(+), 8 deletions(-) diff --git a/xen/arch/riscv/include/asm/flushtlb.h b/xen/arch/riscv/include/asm/flushtlb.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/flushtlb.h +++ b/xen/arch/riscv/include/asm/flushtlb.h @@ -XXX,XX +XXX,XX @@ static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp) {} static inline void page_set_tlbflush_timestamp(struct page_info *page) { - BUG_ON("unimplemented"); + page->tlbflush_timestamp = tlbflush_current_time(); } static inline void arch_flush_tlb_mask(const cpumask_t *mask) diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/include/asm/mm.h +++ b/xen/arch/riscv/include/asm/mm.h @@ -XXX,XX +XXX,XX @@ struct page_info unsigned long type_info; } inuse; + /* Page is used as an intermediate P2M page table: count_info == 0 */ + struct { + /* + * Tracks the number of used entries in the metadata page table. + * + * If used_entries == 0, then `page_info.v.md.pg` can be freed and + * returned to the P2M pool. + */ + unsigned long used_entries; + } md; + + /* Page is on a free list: ((count_info & PGC_count_mask) == 0). 
*/ union { struct { diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/p2m.c +++ b/xen/arch/riscv/p2m.c @@ -XXX,XX +XXX,XX @@ static struct gstage_mode_desc __ro_after_init max_gstage_mode = { .name = "Bare", }; +static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg); + +static inline void p2m_free_metadata_page(struct p2m_domain *p2m, + struct page_info **md_pg) +{ + if ( *md_pg ) + { + p2m_free_page(p2m, *md_pg); + *md_pg = NULL; + } +} + unsigned char get_max_supported_mode(void) { return max_gstage_mode.mode; @@ -XXX,XX +XXX,XX @@ static void p2m_set_type(pte_t *pte, p2m_type_t t, if ( t >= p2m_first_external ) { + if ( metadata[ctx->index].type == p2m_invalid ) + ctx->pt_page->u.md.used_entries++; + metadata[ctx->index].type = t; t = p2m_ext_storage; } else if ( metadata ) + { + if ( metadata[ctx->index].type != p2m_invalid ) + ctx->pt_page->u.md.used_entries--; + metadata[ctx->index].type = p2m_invalid; + } pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK); unmap_domain_page(metadata); + + if ( *md_pg && !ctx->pt_page->u.md.used_entries ) + p2m_free_metadata_page(ctx->p2m, md_pg); } /* @@ -XXX,XX +XXX,XX @@ static pte_t page_to_p2m_table(const struct page_info *page) return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, NULL); } -static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg); - /* * Free page table's page and metadata page linked to page table's page. */ static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg) { - if ( tbl_pg->v.md.pg ) - { - p2m_free_page(p2m, tbl_pg->v.md.pg); - tbl_pg->v.md.pg = NULL; - } + p2m_free_metadata_page(p2m, &tbl_pg->v.md.pg); + p2m_free_page(p2m, tbl_pg); } -- 2.52.0