[PATCH v3 06/20] xen/riscv: add root page table allocation

Posted by Oleksii Kurochko 3 months ago
Introduce support for allocating and initializing the root page table
required for RISC-V stage-2 address translation.

To implement root page table allocation the following is introduced:
- p2m_alloc_root_table() and p2m_allocate_root() helpers to allocate and
  zero the 16 KiB root page table, as mandated by the RISC-V privileged
  specification for the Sv32x4/Sv39x4/Sv48x4/Sv57x4 modes.
- Update p2m_init() to initialize p2m_root_order.
- Add maddr_to_page() and page_to_maddr() macros for easier address
  manipulation.
- Introduce paging_ret_pages_to_domheap() to return some pages to the
  domheap before allocating the 16 KiB root page table.
- Allocate root p2m table after p2m pool is initialized.
- Add construct_hgatp() to construct the hgatp register value based on
  p2m->root, p2m->hgatp_mode and VMID.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
 - Drop inserting of p2m->vmid in hgatp_from_page() as the VMID is now
   allocated per-CPU, not per-domain, so it will be inserted later, somewhere
   in context_switch() or before returning control to a guest.
 - Use BIT() to initialize nr_pages in p2m_allocate_root() instead of
   open-coding it.
 - Fix order in clear_and_clean_page().
 - s/panic("Specify more xen,domain-p2m-mem-mb\n")/return NULL.
 - Take the lock around the procedure of returning the pages needed for the
   p2m root table.
 - Update the comment about allocation of page for root page table.
 - Update the argument of hgatp_from_page() to "struct page_info *p2m_root_page"
   to be consistent with the function name.
 - Use p2m_get_hostp2m(d) instead of open-coding it.
 - Update the comment above the call of p2m_alloc_root_table().
 - Update the comments in p2m_allocate_root().
 - Move the part which returns pages to the domheap before root page table
   allocation to paging.c.
 - Pass p2m_domain * instead of struct domain * for p2m_alloc_root_table().
 - Introduce construct_hgatp() instead of hgatp_from_page().
 - Add vmid and hgatp_mode member of struct p2m_domain.
 - Add explanatory comment above clean_dcache_va_range() in
   clear_and_clean_page().
 - Introduce P2M_ROOT_ORDER and P2M_ROOT_PAGES.
 - Drop vmid member from p2m_domain as now we are using per-pCPU
   VMID allocation.
 - Update the declaration of construct_hgatp() to receive the VMID as it
   isn't per-VM anymore.
 - Drop the hgatp member of the p2m_domain struct as, with the new VMID
   allocation scheme, construction of hgatp will be needed more often.
 - Drop is_hardware_domain() case in p2m_allocate_root(), just always
   allocate root using p2m pool pages.
 - Refactor p2m_alloc_root_table() and p2m_alloc_table().
---
Changes in v2:
 - This patch was created from "xen/riscv: introduce things necessary for p2m
   initialization" with the following changes:
   - [clear_and_clean_page()] Add missed call of clean_dcache_va_range().
   - Drop p2m_get_clean_page() as it would be used only once, to allocate the
     root page table. Open-code it explicitly in p2m_allocate_root(). This
     also helps avoid duplication of the code connected to the order and
     nr_pages of the p2m root page table.
   - Instead of using order 2 for alloc_domheap_pages(), use
     get_order_from_bytes(KB(16)).
   - Clear and clean a proper amount of allocated pages in p2m_allocate_root().
   - Drop _info from the function name hgatp_from_page_info() and its argument
     page_info.
   - Introduce HGATP_MODE_MASK and use MASK_INSR() instead of shift to calculate
     value of hgatp.
   - Drop unnecessary parentheses in definition of page_to_maddr().
   - Add support of VMID.
   - Drop TLB flushing in p2m_alloc_root_table() and do that once when VMID
     is re-used. [Look at p2m_alloc_vmid()]
   - Allocate the p2m root table after the p2m pool is fully initialized: first
     return pages to the p2m pool, then allocate the p2m root table.
---
 xen/arch/riscv/include/asm/mm.h             |  4 +
 xen/arch/riscv/include/asm/p2m.h            | 12 +++
 xen/arch/riscv/include/asm/paging.h         |  2 +
 xen/arch/riscv/include/asm/riscv_encoding.h |  6 ++
 xen/arch/riscv/p2m.c                        | 90 +++++++++++++++++++++
 xen/arch/riscv/paging.c                     | 30 +++++++
 6 files changed, 144 insertions(+)

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 9283616c02..dd8cdc9782 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -167,6 +167,10 @@ extern struct page_info *frametable_virt_start;
 #define mfn_to_page(mfn)    (frametable_virt_start + mfn_x(mfn))
 #define page_to_mfn(pg)     _mfn((pg) - frametable_virt_start)
 
+/* Convert between machine addresses and page-info structures. */
+#define maddr_to_page(ma) mfn_to_page(maddr_to_mfn(ma))
+#define page_to_maddr(pg) mfn_to_maddr(page_to_mfn(pg))
+
 static inline void *page_to_virt(const struct page_info *pg)
 {
     return mfn_to_virt(mfn_x(page_to_mfn(pg)));
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index f8051ed893..3c37a708db 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -9,6 +9,10 @@
 
 #include <asm/page-bits.h>
 
+extern unsigned int p2m_root_order;
+#define P2M_ROOT_ORDER  p2m_root_order
+#define P2M_ROOT_PAGES  BIT(P2M_ROOT_ORDER, U)
+
 #define paddr_bits PADDR_BITS
 
 /* Get host p2m table */
@@ -24,6 +28,12 @@ struct p2m_domain {
     /* Pages used to construct the p2m */
     struct page_list_head pages;
 
+    /* The root of the p2m tree. May be concatenated */
+    struct page_info *root;
+
+    /* G-stage (stage-2) address translation mode */
+    unsigned long hgatp_mode;
+
     /* Indicate if it is required to clean the cache when writing an entry */
     bool clean_pte;
 
@@ -127,6 +137,8 @@ static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
     /* Not supported on RISCV. */
 }
 
+unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid);
+
 #endif /* ASM__RISCV__P2M_H */
 
 /*
diff --git a/xen/arch/riscv/include/asm/paging.h b/xen/arch/riscv/include/asm/paging.h
index 8fdaeeb2e4..557fbd1abc 100644
--- a/xen/arch/riscv/include/asm/paging.h
+++ b/xen/arch/riscv/include/asm/paging.h
@@ -10,4 +10,6 @@ int paging_domain_init(struct domain *d);
 int paging_freelist_init(struct domain *d, unsigned long pages,
                          bool *preempted);
 
+bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages);
+
 #endif /* ASM_RISCV_PAGING_H */
diff --git a/xen/arch/riscv/include/asm/riscv_encoding.h b/xen/arch/riscv/include/asm/riscv_encoding.h
index 6cc8f4eb45..8362df8784 100644
--- a/xen/arch/riscv/include/asm/riscv_encoding.h
+++ b/xen/arch/riscv/include/asm/riscv_encoding.h
@@ -133,11 +133,13 @@
 #define HGATP_MODE_SV48X4		_UL(9)
 
 #define HGATP32_MODE_SHIFT		31
+#define HGATP32_MODE_MASK		_UL(0x80000000)
 #define HGATP32_VMID_SHIFT		22
 #define HGATP32_VMID_MASK		_UL(0x1FC00000)
 #define HGATP32_PPN			_UL(0x003FFFFF)
 
 #define HGATP64_MODE_SHIFT		60
+#define HGATP64_MODE_MASK		_ULL(0xF000000000000000)
 #define HGATP64_VMID_SHIFT		44
 #define HGATP64_VMID_MASK		_ULL(0x03FFF00000000000)
 #define HGATP64_PPN			_ULL(0x00000FFFFFFFFFFF)
@@ -170,6 +172,7 @@
 #define HGATP_VMID_SHIFT		HGATP64_VMID_SHIFT
 #define HGATP_VMID_MASK			HGATP64_VMID_MASK
 #define HGATP_MODE_SHIFT		HGATP64_MODE_SHIFT
+#define HGATP_MODE_MASK			HGATP64_MODE_MASK
 #else
 #define MSTATUS_SD			MSTATUS32_SD
 #define SSTATUS_SD			SSTATUS32_SD
@@ -181,8 +184,11 @@
 #define HGATP_VMID_SHIFT		HGATP32_VMID_SHIFT
 #define HGATP_VMID_MASK			HGATP32_VMID_MASK
 #define HGATP_MODE_SHIFT		HGATP32_MODE_SHIFT
+#define HGATP_MODE_MASK			HGATP32_MODE_MASK
 #endif
 
+#define GUEST_ROOT_PAGE_TABLE_SIZE	KB(16)
+
 #define TOPI_IID_SHIFT			16
 #define TOPI_IID_MASK			0xfff
 #define TOPI_IPRIO_MASK		0xff
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 214b4861d2..cac07c51c9 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -1,8 +1,86 @@
+#include <xen/domain_page.h>
 #include <xen/mm.h>
 #include <xen/rwlock.h>
 #include <xen/sched.h>
 
 #include <asm/paging.h>
+#include <asm/p2m.h>
+#include <asm/riscv_encoding.h>
+
+unsigned int __read_mostly p2m_root_order;
+
+static void clear_and_clean_page(struct page_info *page)
+{
+    clear_domain_page(page_to_mfn(page));
+
+    /*
+     * If the IOMMU doesn't support coherent walks and the p2m tables are
+     * shared between the CPU and IOMMU, it is necessary to clean the
+     * d-cache.
+     */
+    clean_dcache_va_range(page, PAGE_SIZE);
+}
+
+static struct page_info *p2m_allocate_root(struct domain *d)
+{
+    struct page_info *page;
+
+    /*
+     * As mentioned in the Privileged Architecture Spec (version 20240411)
+     * in Section 18.5.1, for the paged virtual-memory schemes (Sv32x4,
+     * Sv39x4, Sv48x4, and Sv57x4), the root page table is 16 KiB and must
+     * be aligned to a 16-KiB boundary.
+     */
+    page = alloc_domheap_pages(d, P2M_ROOT_ORDER, MEMF_no_owner);
+    if ( !page )
+        return NULL;
+
+    for ( unsigned int i = 0; i < P2M_ROOT_PAGES; i++ )
+        clear_and_clean_page(page + i);
+
+    return page;
+}
+
+unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid)
+{
+    unsigned long ppn;
+
+    ppn = PFN_DOWN(page_to_maddr(p2m->root)) & HGATP_PPN;
+
+    /* TODO: add detection of hgatp_mode instead of hard-coding it. */
+#if RV_STAGE1_MODE == SATP_MODE_SV39
+    p2m->hgatp_mode = HGATP_MODE_SV39X4;
+#elif RV_STAGE1_MODE == SATP_MODE_SV48
+    p2m->hgatp_mode = HGATP_MODE_SV48X4;
+#else
+#   error "add HGATP_MODE"
+#endif
+
+    return ppn | MASK_INSR(p2m->hgatp_mode, HGATP_MODE_MASK) |
+                 MASK_INSR(vmid, HGATP_VMID_MASK);
+}
+
+static int p2m_alloc_root_table(struct p2m_domain *p2m)
+{
+    struct domain *d = p2m->domain;
+    struct page_info *page;
+    const unsigned int nr_root_pages = P2M_ROOT_PAGES;
+
+    /*
+     * Return back nr_root_pages to assure the root table memory is also
+     * accounted against the P2M pool of the domain.
+     */
+    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
+        return -ENOMEM;
+
+    page = p2m_allocate_root(d);
+    if ( !page )
+        return -ENOMEM;
+
+    p2m->root = page;
+
+    return 0;
+}
 
 int p2m_init(struct domain *d)
 {
@@ -32,6 +110,8 @@ int p2m_init(struct domain *d)
 #   error "Add init of p2m->clean_pte"
 #endif
 
+    p2m_root_order = get_order_from_bytes(GUEST_ROOT_PAGE_TABLE_SIZE);
+
     return 0;
 }
 
@@ -42,10 +122,20 @@ int p2m_init(struct domain *d)
  */
 int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
     int rc;
 
     if ( (rc = paging_freelist_init(d, pages, preempted)) )
         return rc;
 
+    /*
+     * First, initialize p2m pool. Then allocate the root
+     * table so that the necessary pages can be returned from the p2m pool,
+     * since the root table must be allocated using alloc_domheap_pages(...)
+     * to meet its specific requirements.
+     */
+    if ( !p2m->root )
+        p2m_alloc_root_table(p2m);
+
     return 0;
 }
diff --git a/xen/arch/riscv/paging.c b/xen/arch/riscv/paging.c
index 8882be5ac9..bbe1186900 100644
--- a/xen/arch/riscv/paging.c
+++ b/xen/arch/riscv/paging.c
@@ -54,6 +54,36 @@ int paging_freelist_init(struct domain *d, unsigned long pages,
 
     return 0;
 }
+
+bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages)
+{
+    struct page_info *page;
+
+    ASSERT(spin_is_locked(&d->arch.paging.lock));
+
+    if ( ACCESS_ONCE(d->arch.paging.total_pages) < nr_pages )
+        return false;
+
+    for ( unsigned int i = 0; i < nr_pages; i++ )
+    {
+        /* Return memory to domheap. */
+        page = page_list_remove_head(&d->arch.paging.freelist);
+        if( page )
+        {
+            ACCESS_ONCE(d->arch.paging.total_pages)--;
+            free_domheap_page(page);
+        }
+        else
+        {
+            printk(XENLOG_ERR
+                   "Failed to free P2M pages, P2M freelist is empty.\n");
+            return false;
+        }
+    }
+
+    return true;
+}
+
 /* Domain paging struct initialization. */
 int paging_domain_init(struct domain *d)
 {
-- 
2.50.1
Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Jan Beulich 2 months, 3 weeks ago
On 31.07.2025 17:58, Oleksii Kurochko wrote:
> +static int p2m_alloc_root_table(struct p2m_domain *p2m)
> +{
> +    struct domain *d = p2m->domain;
> +    struct page_info *page;
> +    const unsigned int nr_root_pages = P2M_ROOT_PAGES;
> +
> +    /*
> +     * Return back nr_root_pages to assure the root table memory is also
> +     * accounted against the P2M pool of the domain.
> +     */
> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
> +        return -ENOMEM;
> +
> +    page = p2m_allocate_root(d);
> +    if ( !page )
> +        return -ENOMEM;
> +
> +    p2m->root = page;
> +
> +    return 0;
> +}

In the success case, shouldn't you bump the paging pool's total_pages by
P2M_ROOT_PAGES? (As the freeing side is missing so far, it's not easy to
tell whether there's [going to be] a balancing problem in the long run.
In the short run there certainly is.)

Jan
Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Oleksii Kurochko 2 months, 3 weeks ago
On 8/5/25 12:43 PM, Jan Beulich wrote:
> On 31.07.2025 17:58, Oleksii Kurochko wrote:
>> +static int p2m_alloc_root_table(struct p2m_domain *p2m)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    struct page_info *page;
>> +    const unsigned int nr_root_pages = P2M_ROOT_PAGES;
>> +
>> +    /*
>> +     * Return back nr_root_pages to assure the root table memory is also
>> +     * accounted against the P2M pool of the domain.
>> +     */
>> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
>> +        return -ENOMEM;
>> +
>> +    page = p2m_allocate_root(d);
>> +    if ( !page )
>> +        return -ENOMEM;
>> +
>> +    p2m->root = page;
>> +
>> +    return 0;
>> +}
> In the success case, shouldn't you bump the paging pool's total_pages by
> P2M_ROOT_PAGES? (As the freeing side is missing so far, it's not easy to
> tell whether there's [going to be] a balancing problem in the long run.
> In the short run there certainly is.)

I think that total_pages should be updated only in the case when a page is
added to the freelist.
In the case of the p2m root table, we are just returning some pages to the
domheap and, while doing that, decreasing total_pages as the freelist has
fewer pages, and then just allocating pages from the domheap without adding
them to the freelist.

~ Oleksii
Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Jan Beulich 2 months, 3 weeks ago
On 07.08.2025 15:35, Oleksii Kurochko wrote:
> 
> On 8/5/25 12:43 PM, Jan Beulich wrote:
>> On 31.07.2025 17:58, Oleksii Kurochko wrote:
>>> +static int p2m_alloc_root_table(struct p2m_domain *p2m)
>>> +{
>>> +    struct domain *d = p2m->domain;
>>> +    struct page_info *page;
>>> +    const unsigned int nr_root_pages = P2M_ROOT_PAGES;
>>> +
>>> +    /*
>>> +     * Return back nr_root_pages to assure the root table memory is also
>>> +     * accounted against the P2M pool of the domain.
>>> +     */
>>> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
>>> +        return -ENOMEM;
>>> +
>>> +    page = p2m_allocate_root(d);
>>> +    if ( !page )
>>> +        return -ENOMEM;
>>> +
>>> +    p2m->root = page;
>>> +
>>> +    return 0;
>>> +}
>> In the success case, shouldn't you bump the paging pool's total_pages by
>> P2M_ROOT_PAGES? (As the freeing side is missing so far, it's not easy to
>> tell whether there's [going to be] a balancing problem in the long run.
>> In the short run there certainly is.)
> 
> I think that total_pages should be updated only in case when page is added
> to freelist.
> In the case of p2m root table, we just returning some pages to domheap and
> durint that decreasing an amount of total_pages as freelist has lesser pages,
> and then just allocate pages from domheap without adding them to freelist.

But how's freeing of a root table going to look like? Logically that group
of 4 pages would be put back into the pool. And from that the pool's
total_pages should reflect that right after successful allocation.

Jan
Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Oleksii Kurochko 2 months, 3 weeks ago
On 8/7/25 5:57 PM, Jan Beulich wrote:
> On 07.08.2025 15:35, Oleksii Kurochko wrote:
>> On 8/5/25 12:43 PM, Jan Beulich wrote:
>>> On 31.07.2025 17:58, Oleksii Kurochko wrote:
>>>> +static int p2m_alloc_root_table(struct p2m_domain *p2m)
>>>> +{
>>>> +    struct domain *d = p2m->domain;
>>>> +    struct page_info *page;
>>>> +    const unsigned int nr_root_pages = P2M_ROOT_PAGES;
>>>> +
>>>> +    /*
>>>> +     * Return back nr_root_pages to assure the root table memory is also
>>>> +     * accounted against the P2M pool of the domain.
>>>> +     */
>>>> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
>>>> +        return -ENOMEM;
>>>> +
>>>> +    page = p2m_allocate_root(d);
>>>> +    if ( !page )
>>>> +        return -ENOMEM;
>>>> +
>>>> +    p2m->root = page;
>>>> +
>>>> +    return 0;
>>>> +}
>>> In the success case, shouldn't you bump the paging pool's total_pages by
>>> P2M_ROOT_PAGES? (As the freeing side is missing so far, it's not easy to
>>> tell whether there's [going to be] a balancing problem in the long run.
>>> In the short run there certainly is.)
>> I think that total_pages should be updated only in case when page is added
>> to freelist.
>> In the case of p2m root table, we just returning some pages to domheap and
>> durint that decreasing an amount of total_pages as freelist has lesser pages,
>> and then just allocate pages from domheap without adding them to freelist.
> But how's freeing of a root table going to look like?

We have saved a pointer to the first of the P2M_ROOT_PAGES pages allocated for
the root page table, which is stored in p2m->root. Then, when a domain is going
to be destroyed, do something like:
     for ( i = 0; i < P2M_ROOT_PAGES; i++ )
         clear_and_clean_page(p2m->root + i);
...


> Logically that group
> of 4 pages would be put back into the pool. And from that the pool's
> total_pages should reflect that right after successful allocation.

... I think that instead of having the loop mentioned above we could add the
root table pages to p2m->pages (as you suggested) in p2m_allocate_root(), and
then when a domain is being destroyed just do the following:
   while ( (pg = page_list_remove_head(&p2m->pages)) )
   {
       p2m_free_page(p2m->domain, pg);
It will then be the job of the internals of p2m_free_page() ->
paging_free_page() to adjust the freelist's total_pages and return the page(s)
allocated for the root table to the freelist. (Note: the current implementation
of paging_free_page() just adds a page to the freelist without updating the
freelist's total_pages, which looks incorrect. And that will be enough, as
total_pages exists only for the freelist and there is no separate total_pages
(or something similar) for p2m->pages.)
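
A rough sketch of that direction (p2m_free_table_pages() is only a placeholder
name here, and the total_pages adjustment in paging_free_page() is exactly the
piece that is currently missing):

    static void paging_free_page(struct domain *d, struct page_info *pg)
    {
        ASSERT(spin_is_locked(&d->arch.paging.lock));

        /* Put the page back into the pool and account for it. */
        page_list_add_tail(pg, &d->arch.paging.freelist);
        ACCESS_ONCE(d->arch.paging.total_pages)++;
    }

    /* Domain-destruction path: drain all p2m pages, root pages included. */
    static void p2m_free_table_pages(struct p2m_domain *p2m)
    {
        struct domain *d = p2m->domain;
        struct page_info *pg;

        spin_lock(&d->arch.paging.lock);

        while ( (pg = page_list_remove_head(&p2m->pages)) )
            paging_free_page(d, pg);

        spin_unlock(&d->arch.paging.lock);

        p2m->root = NULL;
    }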

~ Oleksii
Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Jan Beulich 2 months, 3 weeks ago
On 31.07.2025 17:58, Oleksii Kurochko wrote:
> Introduce support for allocating and initializing the root page table
> required for RISC-V stage-2 address translation.
> 
> To implement root page table allocation the following is introduced:
> - p2m_get_clean_page() and p2m_alloc_root_table(), p2m_allocate_root()
>   helpers to allocate and zero a 16 KiB root page table, as mandated
>   by the RISC-V privileged specification for Sv32x4/Sv39x4/Sv48x4/Sv57x4
>   modes.
> - Update p2m_init() to inititialize p2m_root_order.
> - Add maddr_to_page() and page_to_maddr() macros for easier address
>   manipulation.
> - Introduce paging_ret_pages_to_domheap() to return some pages before
>   allocate 16 KiB pages for root page table.
> - Allocate root p2m table after p2m pool is initialized.
> - Add construct_hgatp() to construct the hgatp register value based on
>   p2m->root, p2m->hgatp_mode and VMID.

Imo for this to be complete, freeing of the root table also wants taking
care of. Much like imo p2m_init() would better immediately be accompanied
by the respective teardown function. Once you start using them, you want
to use them in pairs, after all.

> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
> @@ -133,11 +133,13 @@
>  #define HGATP_MODE_SV48X4		_UL(9)
>  
>  #define HGATP32_MODE_SHIFT		31
> +#define HGATP32_MODE_MASK		_UL(0x80000000)
>  #define HGATP32_VMID_SHIFT		22
>  #define HGATP32_VMID_MASK		_UL(0x1FC00000)
>  #define HGATP32_PPN			_UL(0x003FFFFF)
>  
>  #define HGATP64_MODE_SHIFT		60
> +#define HGATP64_MODE_MASK		_ULL(0xF000000000000000)
>  #define HGATP64_VMID_SHIFT		44
>  #define HGATP64_VMID_MASK		_ULL(0x03FFF00000000000)
>  #define HGATP64_PPN			_ULL(0x00000FFFFFFFFFFF)
> @@ -170,6 +172,7 @@
>  #define HGATP_VMID_SHIFT		HGATP64_VMID_SHIFT
>  #define HGATP_VMID_MASK			HGATP64_VMID_MASK
>  #define HGATP_MODE_SHIFT		HGATP64_MODE_SHIFT
> +#define HGATP_MODE_MASK			HGATP64_MODE_MASK
>  #else
>  #define MSTATUS_SD			MSTATUS32_SD
>  #define SSTATUS_SD			SSTATUS32_SD
> @@ -181,8 +184,11 @@
>  #define HGATP_VMID_SHIFT		HGATP32_VMID_SHIFT
>  #define HGATP_VMID_MASK			HGATP32_VMID_MASK
>  #define HGATP_MODE_SHIFT		HGATP32_MODE_SHIFT
> +#define HGATP_MODE_MASK			HGATP32_MODE_MASK
>  #endif
>  
> +#define GUEST_ROOT_PAGE_TABLE_SIZE	KB(16)

In another context I already mentioned that imo you want to be careful with
the use of "guest" in identifiers. It's not the guest page tables which have
an order-2 root table, but the P2M (Xen terminology) or G-stage / second
stage (RISC-V spec terminology) ones. As long as you're only doing P2M
work, this may not look significant. But once you actually start dealing
with guest page tables, it easily can end up confusing.

> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -1,8 +1,86 @@
> +#include <xen/domain_page.h>
>  #include <xen/mm.h>
>  #include <xen/rwlock.h>
>  #include <xen/sched.h>
>  
>  #include <asm/paging.h>
> +#include <asm/p2m.h>
> +#include <asm/riscv_encoding.h>
> +
> +unsigned int __read_mostly p2m_root_order;

If this is to be a variable at all, it ought to be __ro_after_init, and
hence it shouldn't be written every time p2m_init() is run. If you want
it to remain as a variable, what's wrong with

const unsigned int p2m_root_order = ilog2(GUEST_ROOT_PAGE_TABLE_SIZE) - PAGE_SHIFT;

or some such? But of course equally well you could have

#define P2M_ROOT_ORDER  (ilog2(GUEST_ROOT_PAGE_TABLE_SIZE) - PAGE_SHIFT)

> +static void clear_and_clean_page(struct page_info *page)
> +{
> +    clear_domain_page(page_to_mfn(page));
> +
> +    /*
> +     * If the IOMMU doesn't support coherent walks and the p2m tables are
> +     * shared between the CPU and IOMMU, it is necessary to clean the
> +     * d-cache.
> +     */

That is, ...

> +    clean_dcache_va_range(page, PAGE_SIZE);

... this call really wants to be conditional?

> +}
> +
> +static struct page_info *p2m_allocate_root(struct domain *d)

With there also being p2m_alloc_root_table() and with that being the sole
caller of the function here, I wonder: Is having this in a separate
function really outweighing the possible confusion of which of the two
functions to use?

> +{
> +    struct page_info *page;
> +
> +    /*
> +     * As mentioned in the Priviliged Architecture Spec (version 20240411)
> +     * in Section 18.5.1, for the paged virtual-memory schemes  (Sv32x4,
> +     * Sv39x4, Sv48x4, and Sv57x4), the root page table is 16 KiB and must
> +     * be aligned to a 16-KiB boundary.
> +     */
> +    page = alloc_domheap_pages(d, P2M_ROOT_ORDER, MEMF_no_owner);
> +    if ( !page )
> +        return NULL;
> +
> +    for ( unsigned int i = 0; i < P2M_ROOT_PAGES; i++ )
> +        clear_and_clean_page(page + i);
> +
> +    return page;
> +}
> +
> +unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid)
> +{
> +    unsigned long ppn;
> +
> +    ppn = PFN_DOWN(page_to_maddr(p2m->root)) & HGATP_PPN;

Why not page_to_pfn() or mfn_x(page_to_mfn())? I.e. why mix different groups
of accessors?

As to "& HGATP_PPN" - that's making an assumption that you could avoid by
using ...

> +    /* TODO: add detection of hgatp_mode instead of hard-coding it. */
> +#if RV_STAGE1_MODE == SATP_MODE_SV39
> +    p2m->hgatp_mode = HGATP_MODE_SV39X4;
> +#elif RV_STAGE1_MODE == SATP_MODE_SV48
> +    p2m->hgatp_mode = HGATP_MODE_SV48X4;
> +#else
> +#   error "add HGATP_MODE"
> +#endif
> +
> +    return ppn | MASK_INSR(p2m->hgatp_mode, HGATP_MODE_MASK) |
> +                 MASK_INSR(vmid, HGATP_VMID_MASK);

... MASK_INSR() also on "ppn".

As to the writing of p2m->hgatp_mode - you don't want to do this here, when
this is the function to calculate the value to put into hgatp. This field
needs calculating only once, perhaps in p2m_init().

> +static int p2m_alloc_root_table(struct p2m_domain *p2m)
> +{
> +    struct domain *d = p2m->domain;
> +    struct page_info *page;
> +    const unsigned int nr_root_pages = P2M_ROOT_PAGES;

Is this local variable really of any use?

> +    /*
> +     * Return back nr_root_pages to assure the root table memory is also
> +     * accounted against the P2M pool of the domain.
> +     */
> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
> +        return -ENOMEM;
> +
> +    page = p2m_allocate_root(d);
> +    if ( !page )
> +        return -ENOMEM;

Hmm, and the pool is then left shrunk by 4 pages?

> --- a/xen/arch/riscv/paging.c
> +++ b/xen/arch/riscv/paging.c
> @@ -54,6 +54,36 @@ int paging_freelist_init(struct domain *d, unsigned long pages,
>  
>      return 0;
>  }
> +
> +bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages)
> +{
> +    struct page_info *page;
> +
> +    ASSERT(spin_is_locked(&d->arch.paging.lock));
> +
> +    if ( ACCESS_ONCE(d->arch.paging.total_pages) < nr_pages )
> +        return false;
> +
> +    for ( unsigned int i = 0; i < nr_pages; i++ )
> +    {
> +        /* Return memory to domheap. */
> +        page = page_list_remove_head(&d->arch.paging.freelist);
> +        if( page )
> +        {
> +            ACCESS_ONCE(d->arch.paging.total_pages)--;
> +            free_domheap_page(page);
> +        }
> +        else
> +        {
> +            printk(XENLOG_ERR
> +                   "Failed to free P2M pages, P2M freelist is empty.\n");
> +            return false;

Looks pretty redundant with half of paging_freelist_init(), including the
stray full stop in the log message.

Jan
Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Oleksii Kurochko 2 months, 3 weeks ago
On 8/5/25 12:37 PM, Jan Beulich wrote:
> On 31.07.2025 17:58, Oleksii Kurochko wrote:
>> Introduce support for allocating and initializing the root page table
>> required for RISC-V stage-2 address translation.
>>
>> To implement root page table allocation the following is introduced:
>> - p2m_get_clean_page() and p2m_alloc_root_table(), p2m_allocate_root()
>>    helpers to allocate and zero a 16 KiB root page table, as mandated
>>    by the RISC-V privileged specification for Sv32x4/Sv39x4/Sv48x4/Sv57x4
>>    modes.
>> - Update p2m_init() to inititialize p2m_root_order.
>> - Add maddr_to_page() and page_to_maddr() macros for easier address
>>    manipulation.
>> - Introduce paging_ret_pages_to_domheap() to return some pages before
>>    allocate 16 KiB pages for root page table.
>> - Allocate root p2m table after p2m pool is initialized.
>> - Add construct_hgatp() to construct the hgatp register value based on
>>    p2m->root, p2m->hgatp_mode and VMID.
> Imo for this to be complete, freeing of the root table also wants taking
> care of. Much like imo p2m_init() would better immediately be accompanied
> by the respective teardown function. Once you start using them, you want
> to use them in pairs, after all.

I decided to leave out freeing of the root table and tearing down of the p2m
mapping, as they are only going to be used during domain destruction, which
isn't supported at the moment, and thereby their implementation can be delayed
until they are really used.

>
>> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
>> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
>> @@ -133,11 +133,13 @@
>>   #define HGATP_MODE_SV48X4		_UL(9)
>>   
>>   #define HGATP32_MODE_SHIFT		31
>> +#define HGATP32_MODE_MASK		_UL(0x80000000)
>>   #define HGATP32_VMID_SHIFT		22
>>   #define HGATP32_VMID_MASK		_UL(0x1FC00000)
>>   #define HGATP32_PPN			_UL(0x003FFFFF)
>>   
>>   #define HGATP64_MODE_SHIFT		60
>> +#define HGATP64_MODE_MASK		_ULL(0xF000000000000000)
>>   #define HGATP64_VMID_SHIFT		44
>>   #define HGATP64_VMID_MASK		_ULL(0x03FFF00000000000)
>>   #define HGATP64_PPN			_ULL(0x00000FFFFFFFFFFF)
>> @@ -170,6 +172,7 @@
>>   #define HGATP_VMID_SHIFT		HGATP64_VMID_SHIFT
>>   #define HGATP_VMID_MASK			HGATP64_VMID_MASK
>>   #define HGATP_MODE_SHIFT		HGATP64_MODE_SHIFT
>> +#define HGATP_MODE_MASK			HGATP64_MODE_MASK
>>   #else
>>   #define MSTATUS_SD			MSTATUS32_SD
>>   #define SSTATUS_SD			SSTATUS32_SD
>> @@ -181,8 +184,11 @@
>>   #define HGATP_VMID_SHIFT		HGATP32_VMID_SHIFT
>>   #define HGATP_VMID_MASK			HGATP32_VMID_MASK
>>   #define HGATP_MODE_SHIFT		HGATP32_MODE_SHIFT
>> +#define HGATP_MODE_MASK			HGATP32_MODE_MASK
>>   #endif
>>   
>> +#define GUEST_ROOT_PAGE_TABLE_SIZE	KB(16)
> In another context I already mentioned that imo you want to be careful with
> the use of "guest" in identifiers. It's not the guest page tables which have
> an order-2 root table, but the P2M (Xen terminology) or G-stage / second
> stage (RISC-V spec terminology) ones. As long as you're only doing P2M
> work, this may not look significant. But once you actually start dealing
> with guest page tables, it easily can end up confusing.

I thought that GUEST_ROOT_PAGE_TABLE referred to the G-stage root page table.
But as it is confusing even now, I'll use GSTAGE_ROOT_PAGE_TABLE_SIZE
instead.

>
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -1,8 +1,86 @@
>> +#include <xen/domain_page.h>
>>   #include <xen/mm.h>
>>   #include <xen/rwlock.h>
>>   #include <xen/sched.h>
>>   
>>   #include <asm/paging.h>
>> +#include <asm/p2m.h>
>> +#include <asm/riscv_encoding.h>
>> +
>> +unsigned int __read_mostly p2m_root_order;
> If this is to be a variable at all, it ought to be __ro_after_init, and
> hence it shouldn't be written every time p2m_init() is run. If you want
> to to remain as a variable, what's wrong with
>
> const unsigned int p2m_root_order = ilog2(GUEST_ROOT_PAGE_TABLE_SIZE) - PAGE_SHIFT;
>
> or some such? But of course equally well you could have
>
> #define P2M_ROOT_ORDER  (ilog2(GUEST_ROOT_PAGE_TABLE_SIZE) - PAGE_SHIFT)

The only reason p2m_root_order was introduced as a variable was that
I had a compilation issue when defining P2M_ROOT_ORDER as:
   #define P2M_ROOT_ORDER  get_order_from_bytes(GUEST_ROOT_PAGE_TABLE_SIZE)
But I can't reproduce it anymore.

Anyway, your option is better as it should be faster.

>
>> +static void clear_and_clean_page(struct page_info *page)
>> +{
>> +    clear_domain_page(page_to_mfn(page));
>> +
>> +    /*
>> +     * If the IOMMU doesn't support coherent walks and the p2m tables are
>> +     * shared between the CPU and IOMMU, it is necessary to clean the
>> +     * d-cache.
>> +     */
> That is, ...
>
>> +    clean_dcache_va_range(page, PAGE_SIZE);
> ... this call really wants to be conditional?

It makes sense. I will add "if ( p2m->clean_pte )" and update the
clear_and_clean_page() declaration.
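
For illustration, something like the below (whether to pass the whole p2m or
just a bool is still an open choice; page_to_virt() is used here so that the
page contents, rather than the page_info itself, get cleaned):

    static void clear_and_clean_page(const struct p2m_domain *p2m,
                                     struct page_info *page)
    {
        clear_domain_page(page_to_mfn(page));

        /*
         * Only clean the d-cache when the p2m tables may be walked by a
         * non-coherent IOMMU, i.e. when clean_pte is set.
         */
        if ( p2m->clean_pte )
            clean_dcache_va_range(page_to_virt(page), PAGE_SIZE);
    }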

>
>> +}
>> +
>> +static struct page_info *p2m_allocate_root(struct domain *d)
> With there also being p2m_alloc_root_table() and with that being the sole
> caller of the function here, I wonder: Is having this in a separate
> function really outweighing the possible confusion of which of the two
> functions to use?

p2m_allocate_root() will also be used in further patches to allocate the
root's metadata page(s), in addition to its use in p2m_alloc_root_table().

Probably, to avoid confusion, it makes sense to rename p2m_allocate_root() to
p2m_allocate_root_page().


>
>> +{
>> +    struct page_info *page;
>> +
>> +    /*
>> +     * As mentioned in the Priviliged Architecture Spec (version 20240411)
>> +     * in Section 18.5.1, for the paged virtual-memory schemes  (Sv32x4,
>> +     * Sv39x4, Sv48x4, and Sv57x4), the root page table is 16 KiB and must
>> +     * be aligned to a 16-KiB boundary.
>> +     */
>> +    page = alloc_domheap_pages(d, P2M_ROOT_ORDER, MEMF_no_owner);
>> +    if ( !page )
>> +        return NULL;
>> +
>> +    for ( unsigned int i = 0; i < P2M_ROOT_PAGES; i++ )
>> +        clear_and_clean_page(page + i);
>> +
>> +    return page;
>> +}
>> +
>> +unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid)
>> +{
>> +    unsigned long ppn;
>> +
>> +    ppn = PFN_DOWN(page_to_maddr(p2m->root)) & HGATP_PPN;
> Why not page_to_pfn() or mfn_x(page_to_mfn())? I.e. why mix different groups
> of accessors?

No specific reason, I just missed that option.

>
> As to "& HGATP_PPN" - that's making an assumption that you could avoid by
> using ...
>
>> +    /* TODO: add detection of hgatp_mode instead of hard-coding it. */
>> +#if RV_STAGE1_MODE == SATP_MODE_SV39
>> +    p2m->hgatp_mode = HGATP_MODE_SV39X4;
>> +#elif RV_STAGE1_MODE == SATP_MODE_SV48
>> +    p2m->hgatp_mode = HGATP_MODE_SV48X4;
>> +#else
>> +#   error "add HGATP_MODE"
>> +#endif
>> +
>> +    return ppn | MASK_INSR(p2m->hgatp_mode, HGATP_MODE_MASK) |
>> +                 MASK_INSR(vmid, HGATP_VMID_MASK);
> ... MASK_INSR() also on "ppn".
>
> As to the writing of p2m->hgatp_mode - you don't want to do this here, when
> this is the function to calculate the value to put into hgatp. This field
> needs calculating only once, perhaps in p2m_init().

Agree, it makes sense to move hgatp_mode detection to p2m_init().
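
Something along these lines, I suppose (just a sketch, reusing the existing
RV_STAGE1_MODE / HGATP_* definitions from this patch):

    /* In p2m_init(), done once per domain: */
    #if RV_STAGE1_MODE == SATP_MODE_SV39
        p2m->hgatp_mode = HGATP_MODE_SV39X4;
    #elif RV_STAGE1_MODE == SATP_MODE_SV48
        p2m->hgatp_mode = HGATP_MODE_SV48X4;
    #else
    #   error "add HGATP_MODE"
    #endif

    /* construct_hgatp() then only builds the register value: */
    unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid)
    {
        return MASK_INSR(mfn_x(page_to_mfn(p2m->root)), HGATP_PPN) |
               MASK_INSR(p2m->hgatp_mode, HGATP_MODE_MASK) |
               MASK_INSR(vmid, HGATP_VMID_MASK);
    }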

>
>> +static int p2m_alloc_root_table(struct p2m_domain *p2m)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    struct page_info *page;
>> +    const unsigned int nr_root_pages = P2M_ROOT_PAGES;
> Is this local variable really of any use?

It will be needed by one of the next patches, and to have fewer changes in
that patch, I've decided to introduce it here.

>
>> +    /*
>> +     * Return back nr_root_pages to assure the root table memory is also
>> +     * accounted against the P2M pool of the domain.
>> +     */
>> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
>> +        return -ENOMEM;
>> +
>> +    page = p2m_allocate_root(d);
>> +    if ( !page )
>> +        return -ENOMEM;
> Hmm, and the pool is then left shrunk by 4 pages?

Yes, while they are used for the root table they shouldn't be in the p2m pool
(freelist); when the root table is freed, it will make sense to return them.
Am I missing something?

Probably, you meant that p2m->pages needs to be updated?

>
>> --- a/xen/arch/riscv/paging.c
>> +++ b/xen/arch/riscv/paging.c
>> @@ -54,6 +54,36 @@ int paging_freelist_init(struct domain *d, unsigned long pages,
>>   
>>       return 0;
>>   }
>> +
>> +bool paging_ret_pages_to_domheap(struct domain *d, unsigned int nr_pages)
>> +{
>> +    struct page_info *page;
>> +
>> +    ASSERT(spin_is_locked(&d->arch.paging.lock));
>> +
>> +    if ( ACCESS_ONCE(d->arch.paging.total_pages) < nr_pages )
>> +        return false;
>> +
>> +    for ( unsigned int i = 0; i < nr_pages; i++ )
>> +    {
>> +        /* Return memory to domheap. */
>> +        page = page_list_remove_head(&d->arch.paging.freelist);
>> +        if( page )
>> +        {
>> +            ACCESS_ONCE(d->arch.paging.total_pages)--;
>> +            free_domheap_page(page);
>> +        }
>> +        else
>> +        {
>> +            printk(XENLOG_ERR
>> +                   "Failed to free P2M pages, P2M freelist is empty.\n");
>> +            return false;
> Looks pretty redundant with half of paging_freelist_init(), including the
> stray full stop in the log message.

I will then introduce a separate function (for the code which is inside the
for-loop) and use it here and in paging_freelist_init().
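
E.g. something like the following (just a sketch, the name is made up):

    /* Return a single page from the paging pool back to the domheap. */
    static bool paging_ret_page_to_domheap(struct domain *d)
    {
        struct page_info *page;

        ASSERT(spin_is_locked(&d->arch.paging.lock));

        page = page_list_remove_head(&d->arch.paging.freelist);
        if ( !page )
            return false;

        ACCESS_ONCE(d->arch.paging.total_pages)--;
        free_domheap_page(page);

        return true;
    }

paging_ret_pages_to_domheap() and the shrinking part of paging_freelist_init()
would then simply loop over this helper.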

Thanks.

~ Oleksii
Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Jan Beulich 2 months, 3 weeks ago
On 07.08.2025 14:00, Oleksii Kurochko wrote:
> On 8/5/25 12:37 PM, Jan Beulich wrote:
>> On 31.07.2025 17:58, Oleksii Kurochko wrote:
>>> +    /*
>>> +     * Return back nr_root_pages to assure the root table memory is also
>>> +     * accounted against the P2M pool of the domain.
>>> +     */
>>> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
>>> +        return -ENOMEM;
>>> +
>>> +    page = p2m_allocate_root(d);
>>> +    if ( !page )
>>> +        return -ENOMEM;
>> Hmm, and the pool is then left shrunk by 4 pages?
> 
> Yes until they are used for root table it shouldn't be in p2m pool (freelist),
> when root table will be freed then it makes sense to return them back.
> Am I missing something?

I'm commenting specifically on the error path here.

> Probably, you meant that it is needed to update p2m->pages?

That (I think) I commented on elsewhere, yes.

Jan
Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Oleksii Kurochko 2 months, 3 weeks ago
On 8/7/25 5:30 PM, Jan Beulich wrote:
> On 07.08.2025 14:00, Oleksii Kurochko wrote:
>> On 8/5/25 12:37 PM, Jan Beulich wrote:
>>> On 31.07.2025 17:58, Oleksii Kurochko wrote:
>>>> +    /*
>>>> +     * Return back nr_root_pages to assure the root table memory is also
>>>> +     * accounted against the P2M pool of the domain.
>>>> +     */
>>>> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
>>>> +        return -ENOMEM;
>>>> +
>>>> +    page = p2m_allocate_root(d);
>>>> +    if ( !page )
>>>> +        return -ENOMEM;
>>> Hmm, and the pool is then left shrunk by 4 pages?
>> Yes until they are used for root table it shouldn't be in p2m pool (freelist),
>> when root table will be freed then it makes sense to return them back.
>> Am I missing something?
> I'm commenting specifically on the error path here.

Ohh, got it.

In this case, should we really care about these 4 pages, as a domain can't be
run without an allocated root page table and a panic() will occur anyway according
to the create_domUs() common code (construct_domU() -> domain_p2m_set_allocation()
-> p2m_set_allocation() -> p2m_alloc_root_table()):
...
         rc = construct_domU(&ki, node);
         if ( rc )
             panic("Could not set up domain %s (rc = %d)\n",
                   dt_node_name(node), rc);
...
(Note: I missed returning the value returned by p2m_alloc_root_table() in
p2m_set_allocation(), so it isn't really propagated at the moment, but I will
fix that in the next patch version.) ...

>> Probably, you meant that it is needed to update p2m->pages?
> That (I think) I commented on elsewhere, yes.
...

If it is really needed to update p2m->pages when a page is allocated, I think
it would be better to do that in p2m_allocate_root(), immediately after
alloc_domheap_pages() is called.
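
I.e. roughly (sketch only, on top of the current p2m_allocate_root()):

    struct p2m_domain *p2m = p2m_get_hostp2m(d);
    struct page_info *page = alloc_domheap_pages(d, P2M_ROOT_ORDER,
                                                 MEMF_no_owner);

    if ( !page )
        return NULL;

    for ( unsigned int i = 0; i < P2M_ROOT_PAGES; i++ )
    {
        clear_and_clean_page(page + i);
        /* Track the root pages so a later teardown path can find them. */
        page_list_add_tail(page + i, &p2m->pages);
    }

    return page;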

~ Oleksii

Re: [PATCH v3 06/20] xen/riscv: add root page table allocation
Posted by Jan Beulich 2 months, 3 weeks ago
On 07.08.2025 17:59, Oleksii Kurochko wrote:
> On 8/7/25 5:30 PM, Jan Beulich wrote:
>> On 07.08.2025 14:00, Oleksii Kurochko wrote:
>>> On 8/5/25 12:37 PM, Jan Beulich wrote:
>>>> On 31.07.2025 17:58, Oleksii Kurochko wrote:
>>>>> +    /*
>>>>> +     * Return back nr_root_pages to assure the root table memory is also
>>>>> +     * accounted against the P2M pool of the domain.
>>>>> +     */
>>>>> +    if ( !paging_ret_pages_to_domheap(d, nr_root_pages) )
>>>>> +        return -ENOMEM;
>>>>> +
>>>>> +    page = p2m_allocate_root(d);
>>>>> +    if ( !page )
>>>>> +        return -ENOMEM;
>>>> Hmm, and the pool is then left shrunk by 4 pages?
>>> Yes until they are used for root table it shouldn't be in p2m pool (freelist),
>>> when root table will be freed then it makes sense to return them back.
>>> Am I missing something?
>> I'm commenting specifically on the error path here.
> 
> Ohh, got it.
> 
> In this case, should we really care about this 4 pages as a domain can't be ran
> without allocated page root table and a panic() will be occured anyway according
> to the create_domUs() common code (construct_domU() -> domain_p2m_set_allocation()
> -> p2m_set_allocation() -> p2m_alloc_root_table()):
> ...
>          rc = construct_domU(&ki, node);
>          if ( rc )
>              panic("Could not set up domain %s (rc = %d)\n",
>                    dt_node_name(node), rc);

Well, that's for dom0less. Even for tool-stack created VMs there would be
no problem. But root tables required on demand (altp2m, nested) would be
different. So what you do here may be good enough for now, but likely will
want improving later on. (Such temporary restrictions may want putting
down somewhere.)

Jan