[PATCH v4 13/18] xen/riscv: implement p2m_next_level()

Oleksii Kurochko posted 18 patches 1 month, 1 week ago
There is a newer version of this series
Implement the p2m_next_level() function, which enables traversal and dynamic
allocation of intermediate levels (if necessary) in the RISC-V
p2m (physical-to-machine) page table hierarchy.

To support this, the following helpers are introduced:
- page_to_p2m_table(): Constructs non-leaf PTEs pointing to next-level page
  tables with correct attributes.
- p2m_alloc_page(): Allocates page table pages, supporting both hardware and
  guest domains.
- p2m_create_table(): Allocates and initializes a new page table page and
  installs it into the hierarchy.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V4:
 - make `page` argument of page_to_p2m_table pointer-to-const.
 - Move p2m_next_level()'s local variable `ret` into the narrower scope where
   it is actually used.
 - Drop stale ASSERT() in p2m_next_level().
 - Drop stray blank after * in the declaration of paging_alloc_page().
 - Decrease p2m_freelist.total_pages when a page is taken from the p2m freelist.
---
Changes in V3:
 - s/p2me_is_mapping/p2m_is_mapping to be in sync with the other p2m_is_*() functions.
 - Use clear_and_clean_page() in p2m_create_table() instead of clear_page() to
   make sure the page is cleared and the d-cache is flushed for it.
 - Move ASSERT(level != 0) in p2m_next_level() ahead of trying to allocate a
   page table.
 - Update p2m_create_table() to allocate a metadata page storing the p2m type
   for each entry of the page table.
 - Introduce paging_alloc_page() and use it inside p2m_alloc_page().
 - Add the allocated page to the p2m->pages list in p2m_alloc_page() to
   simplify the caller's code a little bit.
 - Drop p2m_is_mapping() and use pte_is_mapping() instead as P2M PTE's valid
   bit doesn't have another purpose anymore.
 - Update an implementation and prototype of page_to_p2m_table(), it is enough
   to pass only a page as an argument.
---
Changes in V2:
 - New patch. It was part of the big patch "xen/riscv: implement p2m mapping
   functionality", which was split into smaller ones.
 - s/p2me_is_mapping/p2m_is_mapping.
---
 xen/arch/riscv/include/asm/paging.h |  2 +
 xen/arch/riscv/p2m.c                | 79 ++++++++++++++++++++++++++++-
 xen/arch/riscv/paging.c             | 12 +++++
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/xen/arch/riscv/include/asm/paging.h b/xen/arch/riscv/include/asm/paging.h
index 9712aa77c5..69cb414962 100644
--- a/xen/arch/riscv/include/asm/paging.h
+++ b/xen/arch/riscv/include/asm/paging.h
@@ -15,4 +15,6 @@ int paging_ret_pages_to_freelist(struct domain *d, unsigned int nr_pages);
 
 void paging_free_page(struct domain *d, struct page_info *pg);
 
+struct page_info * paging_alloc_page(struct domain *d);
+
 #endif /* ASM_RISCV_PAGING_H */
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 2d4433360d..bf4945e99f 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -414,6 +414,48 @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
     return e;
 }
 
+/* Generate table entry with correct attributes. */
+static pte_t page_to_p2m_table(const struct page_info *page)
+{
+    /*
+     * p2m_invalid will be ignored inside p2m_pte_from_mfn() as is_table is
+     * set to true and p2m_type_t shouldn't be applied to PTEs which
+     * describe an intermediate table.
+     */
+    return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, true);
+}
+
+static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
+{
+    struct page_info *pg = paging_alloc_page(p2m->domain);
+
+    if ( pg )
+        page_list_add(pg, &p2m->pages);
+
+    return pg;
+}
+
+/*
+ * Allocate a new page table page with an extra metadata page and hook it
+ * in via the given entry.
+ */
+static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry)
+{
+    struct page_info *page;
+
+    ASSERT(!pte_is_valid(*entry));
+
+    page = p2m_alloc_page(p2m);
+    if ( page == NULL )
+        return -ENOMEM;
+
+    clear_and_clean_page(page, p2m->clean_dcache);
+
+    p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_dcache);
+
+    return 0;
+}
+
 #define P2M_TABLE_MAP_NONE 0
 #define P2M_TABLE_MAP_NOMEM 1
 #define P2M_TABLE_SUPER_PAGE 2
@@ -438,9 +480,42 @@ static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl,
                           unsigned int level, pte_t **table,
                           unsigned int offset)
 {
-    panic("%s: hasn't been implemented yet\n", __func__);
+    pte_t *entry;
+    mfn_t mfn;
+
+    /* The function p2m_next_level() is never called at the last level */
+    ASSERT(level != 0);
+
+    entry = *table + offset;
+
+    if ( !pte_is_valid(*entry) )
+    {
+        int ret;
+
+        if ( !alloc_tbl )
+            return P2M_TABLE_MAP_NONE;
+
+        ret = p2m_create_table(p2m, entry);
+        if ( ret )
+            return P2M_TABLE_MAP_NOMEM;
+    }
+
+    if ( pte_is_mapping(*entry) )
+        return P2M_TABLE_SUPER_PAGE;
+
+    mfn = mfn_from_pte(*entry);
+
+    unmap_domain_page(*table);
+
+    /*
+     * TODO: There's an inefficiency here:
+     *       p2m_create_table() maps the page to clear it and then tears
+     *       that mapping down again, only for it to be re-established
+     *       here.
+     */
+    *table = map_domain_page(mfn);
 
-    return P2M_TABLE_MAP_NONE;
+    return P2M_TABLE_NORMAL;
 }
 
 static void p2m_put_foreign_page(struct page_info *pg)
diff --git a/xen/arch/riscv/paging.c b/xen/arch/riscv/paging.c
index 049b850e03..803b026f34 100644
--- a/xen/arch/riscv/paging.c
+++ b/xen/arch/riscv/paging.c
@@ -115,6 +115,18 @@ void paging_free_page(struct domain *d, struct page_info *pg)
     spin_unlock(&d->arch.paging.lock);
 }
 
+struct page_info *paging_alloc_page(struct domain *d)
+{
+    struct page_info *pg;
+
+    spin_lock(&d->arch.paging.lock);
+    pg = page_list_remove_head(&d->arch.paging.freelist);
+    ACCESS_ONCE(d->arch.paging.total_pages)--;
+    spin_unlock(&d->arch.paging.lock);
+
+    return pg;
+}
+
 /* Domain paging struct initialization. */
 int paging_domain_init(struct domain *d)
 {
-- 
2.51.0
Re: [PATCH v4 13/18] xen/riscv: implement p2m_next_level()
Posted by Jan Beulich 1 month, 1 week ago
On 17.09.2025 23:55, Oleksii Kurochko wrote:
> Implement the p2m_next_level() function, which enables traversal and dynamic
> allocation of intermediate levels (if necessary) in the RISC-V
> p2m (physical-to-machine) page table hierarchy.
> 
> To support this, the following helpers are introduced:
> - page_to_p2m_table(): Constructs non-leaf PTEs pointing to next-level page
>   tables with correct attributes.
> - p2m_alloc_page(): Allocates page table pages, supporting both hardware and
>   guest domains.
> - p2m_create_table(): Allocates and initializes a new page table page and
>   installs it into the hierarchy.
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> ---
> Changes in V4:
>  - make `page` argument of page_to_p2m_table pointer-to-const.
>  - Move p2m_next_level()'s local variable `ret` to the more narrow space where
>    it is really used.
>  - Drop stale ASSERT() in p2m_next_level().
>  - Stray blank after * in declaration of paging_alloc_page().

When you deal with comments like this, can you please make sure you
apply them to at least a patch as a whole, if not the entire series?
I notice ...

> --- a/xen/arch/riscv/include/asm/paging.h
> +++ b/xen/arch/riscv/include/asm/paging.h
> @@ -15,4 +15,6 @@ int paging_ret_pages_to_freelist(struct domain *d, unsigned int nr_pages);
>  
>  void paging_free_page(struct domain *d, struct page_info *pg);
>  
> +struct page_info * paging_alloc_page(struct domain *d);

... there's still a stray blank here. With this dropped:
Acked-by: Jan Beulich <jbeulich@suse.com>
I have one other question, though:

> +/*
> + * Allocate a new page table page with an extra metadata page and hook it
> + * in via the given entry.
> + */
> +static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry)
> +{
> +    struct page_info *page;
> +
> +    ASSERT(!pte_is_valid(*entry));

Isn't this going to get in the way of splitting superpages? The caller
will need to initialize *entry just for this assertion to not trigger.

Jan
Re: [PATCH v4 13/18] xen/riscv: implement p2m_next_level()
Posted by Oleksii Kurochko 1 month ago
On 9/22/25 7:35 PM, Jan Beulich wrote:
> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>> Implement the p2m_next_level() function, which enables traversal and dynamic
>> allocation of intermediate levels (if necessary) in the RISC-V
>> p2m (physical-to-machine) page table hierarchy.
>>
>> To support this, the following helpers are introduced:
>> - page_to_p2m_table(): Constructs non-leaf PTEs pointing to next-level page
>>    tables with correct attributes.
>> - p2m_alloc_page(): Allocates page table pages, supporting both hardware and
>>    guest domains.
>> - p2m_create_table(): Allocates and initializes a new page table page and
>>    installs it into the hierarchy.
>>
>> Signed-off-by: Oleksii Kurochko<oleksii.kurochko@gmail.com>
>> ---
>> Changes in V4:
>>   - make `page` argument of page_to_p2m_table pointer-to-const.
>>   - Move p2m_next_level()'s local variable `ret` to the more narrow space where
>>     it is really used.
>>   - Drop stale ASSERT() in p2m_next_level().
>>   - Stray blank after * in declaration of paging_alloc_page().
> When you deal with comments like this, can you please make sure you
> apply them to at least a patch as a whole, if not the entire series?
> I notice ...
>
>> --- a/xen/arch/riscv/include/asm/paging.h
>> +++ b/xen/arch/riscv/include/asm/paging.h
>> @@ -15,4 +15,6 @@ int paging_ret_pages_to_freelist(struct domain *d, unsigned int nr_pages);
>>   
>>   void paging_free_page(struct domain *d, struct page_info *pg);
>>   
>> +struct page_info * paging_alloc_page(struct domain *d);
> ... there's still a stray blank here. With this dropped:
> Acked-by: Jan Beulich<jbeulich@suse.com>

Thanks.

> I have one other question, though:
>
>> +/*
>> + * Allocate a new page table page with an extra metadata page and hook it
>> + * in via the given entry.
>> + */
>> +static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry)
>> +{
>> +    struct page_info *page;
>> +
>> +    ASSERT(!pte_is_valid(*entry));
> Isn't this going to get in the way of splitting superpages? The caller
> will need to initialize *entry just for this assertion to not trigger.

The superpage splitting function doesn't use p2m_create_table(). It calls
p2m_alloc_table(), then fills the table, and finally updates the entry
using p2m_write_pte(). So this shouldn't be an issue.

Ohh, I just noticed, the comment should be updated, since an extra metadata
page is no longer allocated here.

~ Oleksii