[for 4.22 v5 17/18] xen/riscv: add support of page lookup by GFN

Oleksii Kurochko posted 18 patches 3 months, 3 weeks ago
There is a newer version of this series
Posted by Oleksii Kurochko 3 months, 3 weeks ago
Introduce helper functions for safely querying the P2M (physical-to-machine)
mapping:
 - Add p2m_read_lock(), p2m_read_unlock(), and p2m_is_locked() for managing
   P2M lock state.
 - Implement p2m_get_entry() to retrieve mapping details for a given GFN,
   including MFN, page order, and validity.
 - Introduce p2m_get_page_from_gfn() to convert a GFN into a page_info
   pointer, acquiring a reference to the page if valid.
 - Introduce get_page(), which takes a reference to a page only if the page
   is owned by the given domain.

Implementations are based on Arm's functions with some minor modifications:
- p2m_get_entry():
  - Reverse traversal of page tables, as RISC-V uses the opposite level
    numbering compared to Arm.
  - Removed the return of p2m_access_t from p2m_get_entry() since
    mem_access_settings is not introduced for RISC-V.
  - Updated BUILD_BUG_ON() to check using the level 0 mask, which corresponds
    to Arm's THIRD_MASK.
  - Replaced open-coded bit shifts with the BIT() macro.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V5:
 - Use P2M_DECLARE_OFFSETS(), introduced in earlier patches, instead of
   DECLARE_OFFSETS().
 - Drop blank line before check_outside_boundary().
 - Use more readable version of if statements inside check_outside_boundary().
 - Accumulate the mask in check_outside_boundary() instead of rewriting it for
   each page table level, so that the correct GFNs are compared.
 - Set argument `t` of p2m_get_entry() to p2m_invalid by default.
 - Drop the check for (rc == P2M_TABLE_MAP_NOMEM) when p2m_next_level(..., false, ...)
   is called.
 - Add ASSERT(!(mfn_x(mfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1))); in
   p2m_get_entry() to ensure that the received `mfn` has its lowest bits cleared.
 - Drop the `valid` argument from p2m_get_entry(), as it is no longer needed.
 - Drop p2m_lookup(), use p2m_get_entry() explicitly inside p2m_get_page_from_gfn().
 - Update the commit message.
---
Changes in V4:
 - Update prototype of p2m_is_locked() to return bool and accept pointer-to-const.
 - Correct the comment above p2m_get_entry().
 - Drop the check "BUILD_BUG_ON(XEN_PT_LEVEL_MAP_MASK(0) != PAGE_MASK);" inside
   p2m_get_entry() as it is stale: it was needed to ensure that 4k pages are
   used at L3 (in Arm terms), which is always true for RISC-V unless special
   extensions are used. Arm had another reason for this check (which is why I
   copied it to RISC-V), but that reason doesn't apply to RISC-V (some details
   can be found in the reply to the patch).
 - Style fixes.
 - Add an explanatory comment describing what the loop inside "gfn is higher
   than the highest p2m mapping" does. Move this loop to a separate function,
   check_outside_boundary(), to cover both boundaries (lowest_mapped_gfn and
   max_mapped_gfn).
 - There is no need to allocate a page table, as p2m_get_entry() is normally
   expected to be called after a corresponding p2m_set_entry(). So change
   'true' to 'false' in the page table walking loop inside p2m_get_entry().
 - Correct handling of p2m_is_foreign case inside p2m_get_page_from_gfn().
 - Introduce and use P2M_LEVEL_MASK instead of XEN_PT_LEVEL_MASK, as the latter
   doesn't take into account the two extra bits of the root table in the P2M case.
 - Drop the stale "Add is_p2m_foreign() macro and connected stuff" item from
   "Changes in V3".
 - Add p2m_read_(un)lock().
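The two extra root-table bits mentioned in the V4 item above, together with the reversed level numbering described in the commit message, can be illustrated with a small standalone sketch. It assumes Sv39x4 G-stage translation (4 KiB pages, levels 2/1/0 with level 2 as the root, 9 index bits per level plus 2 extra at the root). The helpers level_order() and level_index() are hypothetical names that only loosely mirror P2M_LEVEL_ORDER() and the offsets produced by P2M_DECLARE_OFFSETS().

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (hypothetical): Sv39x4 G-stage, 4 KiB pages, levels 2 (root), 1, 0. */
#define LEVEL_BITS      9   /* index bits per non-root level */
#define ROOT_LEVEL      2
#define ROOT_EXTRA_BITS 2   /* root table spans 4 pages: 2048 entries */

/* Page order of a mapping at 'level' (0 = 4 KiB, 1 = 2 MiB, 2 = 1 GiB). */
static unsigned int level_order(unsigned int level)
{
    return level * LEVEL_BITS;
}

/* Table index used at 'level' when walking towards 'gfn'. */
static unsigned int level_index(uint64_t gfn, unsigned int level)
{
    unsigned int bits = LEVEL_BITS;

    assert(level <= ROOT_LEVEL);
    if (level == ROOT_LEVEL)
        bits += ROOT_EXTRA_BITS;   /* the root index is 11 bits wide */

    return (unsigned int)((gfn >> level_order(level)) &
                          ((UINT64_C(1) << bits) - 1));
}
```

A walk starts with level_index(gfn, ROOT_LEVEL) and counts down to level 0. This is why the loop in p2m_get_entry() decrements `level`, whereas Arm, which numbers its root as level 0, increments it.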
---
Changes in V3:
 - Change struct domain *d argument of p2m_get_page_from_gfn() to
   struct p2m_domain.
 - Update the comment above p2m_get_entry().
 - s/_t/p2mt for local variable in p2m_get_entry().
 - Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn)
   to define offsets array.
 - Code style fixes.
 - Update a check of rc code from p2m_next_level() in p2m_get_entry()
   and drop "else" case.
 - Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL.
 - Use struct p2m_domain instead of struct domain for p2m_lookup() and
   p2m_get_page_from_gfn().
 - Move the definition of get_page() here from "xen/riscv: implement mfn_valid() and page reference, ownership handling helpers".
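The get_page() added by this patch appears to rely on page_get_owner_and_reference() taking a reference only when it returns a non-NULL owner. That is why put_page() is called only when an owner was returned but does not match. The sketch below is a simplified, hypothetical model of that contract. It uses a plain counter instead of Xen's count_info and made-up names (owner_and_reference(), put_ref(), get_page_model()). It is not Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Xen's domain and page_info. */
struct domain { int id; };
struct page_info {
    unsigned long count;          /* reference count; 0 means free */
    const struct domain *owner;
};

/* Take a reference only if the page is in use, and report its owner. */
static const struct domain *owner_and_reference(struct page_info *pg)
{
    if (pg->count == 0)
        return NULL;              /* free page: no reference taken */
    pg->count++;
    return pg->owner;
}

static void put_ref(struct page_info *pg)
{
    assert(pg->count > 0);
    pg->count--;
}

/* Same shape as the patch's get_page(): keep the reference only for 'd'. */
static bool get_page_model(struct page_info *pg, const struct domain *d)
{
    const struct domain *owner = owner_and_reference(pg);

    if (owner == d)
        return true;
    if (owner != NULL)
        put_ref(pg);              /* wrong owner: drop the reference just taken */
    return false;
}
```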
---
Changes in V2:
 - New patch.
---
 xen/arch/riscv/include/asm/p2m.h |  20 ++++
 xen/arch/riscv/mm.c              |  13 +++
 xen/arch/riscv/p2m.c             | 175 +++++++++++++++++++++++++++++++
 3 files changed, 208 insertions(+)

diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 6a17cd52fc..39cfc1fd9e 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -48,6 +48,8 @@ extern unsigned int gstage_root_level;
 
 #define P2M_LEVEL_SHIFT(lvl) (P2M_LEVEL_ORDER(lvl) + PAGE_SHIFT)
 
+#define P2M_LEVEL_MASK(lvl) (GFN_MASK(lvl) << P2M_LEVEL_SHIFT(lvl))
+
 #define paddr_bits PADDR_BITS
 
 /* Get host p2m table */
@@ -232,6 +234,24 @@ static inline bool p2m_is_write_locked(struct p2m_domain *p2m)
 
 unsigned long construct_hgatp(const struct p2m_domain *p2m, uint16_t vmid);
 
+static inline void p2m_read_lock(struct p2m_domain *p2m)
+{
+    read_lock(&p2m->lock);
+}
+
+static inline void p2m_read_unlock(struct p2m_domain *p2m)
+{
+    read_unlock(&p2m->lock);
+}
+
+static inline bool p2m_is_locked(const struct p2m_domain *p2m)
+{
+    return rw_is_locked(&p2m->lock);
+}
+
+struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
+                                        p2m_type_t *t);
+
 #endif /* ASM__RISCV__P2M_H */
 
 /*
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index e25f995b72..e9ce182d06 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -673,3 +673,16 @@ struct domain *page_get_owner_and_reference(struct page_info *page)
 
     return owner;
 }
+
+bool get_page(struct page_info *page, const struct domain *domain)
+{
+    const struct domain *owner = page_get_owner_and_reference(page);
+
+    if ( likely(owner == domain) )
+        return true;
+
+    if ( owner != NULL )
+        put_page(page);
+
+    return false;
+}
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 383047580a..785d11aaff 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -1049,3 +1049,178 @@ int map_regions_p2mt(struct domain *d,
 
     return rc;
 }
+
+/*
+ * p2m_get_entry() should always return the correct order value, even if an
+ * entry is not present (i.e. the GFN is outside the range):
+ *   [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]).    (1)
+ *
+ * This ensures that callers of p2m_get_entry() can determine what range of
+ * address space would be altered by a corresponding p2m_set_entry().
+ * Also, it would help to avoid cost page walks for GFNs outside range (1).
+ *
+ * Therefore, this function returns true for GFNs outside range (1), and in
+ * that case the corresponding level is returned via the level_out argument.
+ * Otherwise, it returns false and p2m_get_entry() performs a page walk to
+ * find the proper entry.
+ */
+static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
+                                   unsigned int *level_out)
+{
+    unsigned int level;
+
+    if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
+                  : gfn_x(gfn) > gfn_x(boundary) )
+    {
+        unsigned long mask = 0;
+
+        for ( level = P2M_ROOT_LEVEL; level; level-- )
+        {
+            unsigned long masked_gfn;
+
+            mask |= PFN_DOWN(P2M_LEVEL_MASK(level));
+            masked_gfn = gfn_x(gfn) & mask;
+
+            if ( is_lower ? masked_gfn < gfn_x(boundary)
+                          : masked_gfn > gfn_x(boundary) )
+            {
+                *level_out = level;
+                return true;
+            }
+        }
+    }
+
+    return false;
+}
+
+/*
+ * Get the details of a given gfn.
+ *
+ * If the entry is present, the associated MFN will be returned and the
+ * p2m type of the mapping.
+ * The page_order will correspond to the order of the mapping in the page
+ * table (i.e it could be a superpage).
+ *
+ * If the entry is not present, INVALID_MFN will be returned and the
+ * page_order will be set according to the order of the invalid range.
+ */
+static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
+                           p2m_type_t *t,
+                           unsigned int *page_order)
+{
+    unsigned int level = 0;
+    pte_t entry, *table;
+    int rc;
+    mfn_t mfn = INVALID_MFN;
+    P2M_DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
+
+    ASSERT(p2m_is_locked(p2m));
+
+    if ( t )
+        *t = p2m_invalid;
+
+    if ( check_outside_boundary(gfn, p2m->lowest_mapped_gfn, true, &level) )
+        goto out;
+
+    if ( check_outside_boundary(gfn, p2m->max_mapped_gfn, false, &level) )
+        goto out;
+
+    table = p2m_get_root_pointer(p2m, gfn);
+
+    /*
+     * The table should always be non-NULL because the gfn is below
+     * p2m->max_mapped_gfn and the root table pages are always present.
+     */
+    if ( !table )
+    {
+        ASSERT_UNREACHABLE();
+        level = P2M_ROOT_LEVEL;
+        goto out;
+    }
+
+    for ( level = P2M_ROOT_LEVEL; level; level-- )
+    {
+        rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
+        if ( rc == P2M_TABLE_MAP_NONE )
+            goto out_unmap;
+
+        if ( rc != P2M_TABLE_NORMAL )
+            break;
+    }
+
+    entry = table[offsets[level]];
+
+    if ( pte_is_valid(entry) )
+    {
+        if ( t )
+            *t = p2m_get_type(entry);
+
+        mfn = pte_get_mfn(entry);
+
+        ASSERT(!(mfn_x(mfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)));
+
+        /*
+         * The entry may point to a superpage. Find the MFN associated
+         * to the GFN.
+         */
+        mfn = mfn_add(mfn,
+                      gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));
+    }
+
+ out_unmap:
+    unmap_domain_page(table);
+
+ out:
+    if ( page_order )
+        *page_order = P2M_LEVEL_ORDER(level);
+
+    return mfn;
+}
+
+struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
+                                        p2m_type_t *t)
+{
+    struct page_info *page;
+    p2m_type_t p2mt = p2m_invalid;
+    mfn_t mfn;
+
+    p2m_read_lock(p2m);
+    mfn = p2m_get_entry(p2m, gfn, t, NULL);
+
+    if ( !mfn_valid(mfn) )
+    {
+        p2m_read_unlock(p2m);
+        return NULL;
+    }
+
+    if ( t )
+        p2mt = *t;
+
+    page = mfn_to_page(mfn);
+
+    /*
+     * get_page won't work on foreign mapping because the page doesn't
+     * belong to the current domain.
+     */
+    if ( unlikely(p2m_is_foreign(p2mt)) )
+    {
+        const struct domain *fdom = page_get_owner_and_reference(page);
+
+        p2m_read_unlock(p2m);
+
+        if ( fdom )
+        {
+            if ( likely(fdom != p2m->domain) )
+                return page;
+
+            ASSERT_UNREACHABLE();
+            put_page(page);
+        }
+
+        return NULL;
+    }
+
+    p2m_read_unlock(p2m);
+
+    return get_page(page, p2m->domain) ? page : NULL;
+}
-- 
2.51.0
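When p2m_get_entry() finds a valid leaf at some level, it treats the leaf's MFN as the base of a naturally aligned 2^order block. It asserts that the low bits of the base are clear, then adds the GFN's offset within that block. The sketch below reproduces only that arithmetic. It uses plain uint64_t in place of Xen's mfn_t/gfn_t wrappers so that it compiles standalone, and leaf_mfn_for_gfn() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate 'gfn' through a leaf entry whose mapping
 * has page order 'order'. 'leaf_mfn' is the base of a 2^order block.
 */
static uint64_t leaf_mfn_for_gfn(uint64_t leaf_mfn, uint64_t gfn,
                                 unsigned int order)
{
    uint64_t offset_mask = (UINT64_C(1) << order) - 1;

    /* Mirrors the patch's ASSERT(): the base MFN has its low bits clear. */
    assert(!(leaf_mfn & offset_mask));

    /* Add the GFN's offset within the superpage to the base MFN. */
    return leaf_mfn + (gfn & offset_mask);
}
```

For example, with a 2 MiB leaf (order 9) whose base MFN is 0x80000, GFN 0x40123 lands on MFN 0x80123.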
Re: [for 4.22 v5 17/18] xen/riscv: add support of page lookup by GFN
Posted by Jan Beulich 3 months ago
On 20.10.2025 17:58, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -1049,3 +1049,178 @@ int map_regions_p2mt(struct domain *d,
>  
>      return rc;
>  }
> +
> +/*
> + * p2m_get_entry() should always return the correct order value, even if an
> + * entry is not present (i.e. the GFN is outside the range):
> + *   [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]).    (1)

There's one closing parenthesis too many here (likely the one before the colon).

> + * This ensures that callers of p2m_get_entry() can determine what range of
> + * address space would be altered by a corresponding p2m_set_entry().
> + * Also, it would help to avoid cost page walks for GFNs outside range (1).

DYM "costly"?

> + * Therefore, this function returns true for GFNs outside range (1), and in
> + * that case the corresponding level is returned via the level_out argument.
> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
> + * find the proper entry.
> + */
> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
> +                                   unsigned int *level_out)
> +{
> +    unsigned int level;
> +
> +    if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
> +                  : gfn_x(gfn) > gfn_x(boundary) )
> +    {
> +        unsigned long mask = 0;
> +
> +        for ( level = P2M_ROOT_LEVEL; level; level-- )
> +        {
> +            unsigned long masked_gfn;
> +
> +            mask |= PFN_DOWN(P2M_LEVEL_MASK(level));
> +            masked_gfn = gfn_x(gfn) & mask;
> +
> +            if ( is_lower ? masked_gfn < gfn_x(boundary)
> +                          : masked_gfn > gfn_x(boundary) )
> +            {
> +                *level_out = level;

For this to be correct in the is_lower case, don't you need to fill the
bottom bits of masked_gfn with all 1s, rather than with all 0s? Otherwise
the tail of the range may be above boundary.

> +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
> +                                        p2m_type_t *t)
> +{
> +    struct page_info *page;
> +    p2m_type_t p2mt = p2m_invalid;
> +    mfn_t mfn;
> +
> +    p2m_read_lock(p2m);
> +    mfn = p2m_get_entry(p2m, gfn, t, NULL);
> +
> +    if ( !mfn_valid(mfn) )
> +    {
> +        p2m_read_unlock(p2m);
> +        return NULL;
> +    }
> +
> +    if ( t )
> +        p2mt = *t;

Doesn't it need to be the other way around? The way you have it, when a caller
passes NULL for t, p2m_get_entry() won't give you a type, and you'll do all
further work with p2m_invalid.

Also, might this better move ahead of the earlier if()? Callers might be able
to do still something based on the type, when they get back NULL as function
return value. (Practically this might only become of interest once you add
something like PoD, paging, or sharing.)

Jan
Re: [for 4.22 v5 17/18] xen/riscv: add support of page lookup by GFN
Posted by Oleksii Kurochko 2 months, 3 weeks ago
On 11/10/25 5:46 PM, Jan Beulich wrote:
> On 20.10.2025 17:58, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -1049,3 +1049,178 @@ int map_regions_p2mt(struct domain *d,
>>   
>>       return rc;
>>   }
>> +
>> +/*
>> + * p2m_get_entry() should always return the correct order value, even if an
>> + * entry is not present (i.e. the GFN is outside the range):
>> + *   [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]).    (1)
> There's one closing parenthesis too many here (likely the one before the colon).

You are right, ')' should be dropped. I think that "." could be dropped too.

>
>> + * This ensures that callers of p2m_get_entry() can determine what range of
>> + * address space would be altered by a corresponding p2m_set_entry().
>> + * Also, it would help to avoid cost page walks for GFNs outside range (1).
> DYM "costly"?

Agree, costly would be better here.

>
>> + * Therefore, this function returns true for GFNs outside range (1), and in
>> + * that case the corresponding level is returned via the level_out argument.
>> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
>> + * find the proper entry.
>> + */
>> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
>> +                                   unsigned int *level_out)
>> +{
>> +    unsigned int level;
>> +
>> +    if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
>> +                  : gfn_x(gfn) > gfn_x(boundary) )
>> +    {
>> +        unsigned long mask = 0;
>> +
>> +        for ( level = P2M_ROOT_LEVEL; level; level-- )
>> +        {
>> +            unsigned long masked_gfn;
>> +
>> +            mask |= PFN_DOWN(P2M_LEVEL_MASK(level));
>> +            masked_gfn = gfn_x(gfn) & mask;
>> +
>> +            if ( is_lower ? masked_gfn < gfn_x(boundary)
>> +                          : masked_gfn > gfn_x(boundary) )
>> +            {
>> +                *level_out = level;
> For this to be correct in the is_lower case, don't you need to fill the
> bottom bits of masked_gfn with all 1s, rather than with all 0s? Otherwise
> the tail of the range may be above boundary.

I think that I didn't get what you mean by "the range" here and so I can't understand
what is "the tail of the range".
Could you please clarify?

>
>> +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
>> +                                        p2m_type_t *t)
>> +{
>> +    struct page_info *page;
>> +    p2m_type_t p2mt = p2m_invalid;
>> +    mfn_t mfn;
>> +
>> +    p2m_read_lock(p2m);
>> +    mfn = p2m_get_entry(p2m, gfn, t, NULL);
>> +
>> +    if ( !mfn_valid(mfn) )
>> +    {
>> +        p2m_read_unlock(p2m);
>> +        return NULL;
>> +    }
>> +
>> +    if ( t )
>> +        p2mt = *t;
> Doesn't it need to be the other way around? The way you have it, when a caller
> passes NULL for t, p2m_get_entry() won't give you a type, and you'll do all
> further work with p2m_invalid.

IIUC, then the following should resolve the mentioned issue:
@@ -1344,11 +1344,14 @@ struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
                                          p2m_type_t *t)
  {
      struct page_info *page;
-    p2m_type_t p2mt = p2m_invalid;
+    p2m_type_t p2mt;
      mfn_t mfn;
  
      p2m_read_lock(p2m);
-    mfn = p2m_get_entry(p2m, gfn, t, NULL);
+    mfn = p2m_get_entry(p2m, gfn, &p2mt, NULL);

>
> Also, might this better move ahead of the earlier if()? Callers might be able
> to do still something based on the type, when they get back NULL as function
> return value. (Practically this might only become of interest once you add
> something like PoD, paging, or sharing.)

Agree with that, it should be moved before "if ( !mfn_valid(mfn) )"

Thanks.

~ Oleksii
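The two changes agreed in this exchange can be modelled outside Xen. First, always pass a local p2mt to p2m_get_entry(), so the foreign-mapping decision does not depend on whether the caller passed t. Second, copy the type to *t before the mfn_valid() early exit. In the sketch below, everything is hypothetical: lookup() stands in for p2m_get_entry(), and T_POD stands in for a future type (such as PoD) that has a type but no backing page. The return codes merely label which path was taken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type set; T_POD models a typed entry with no page yet. */
typedef enum { T_INVALID, T_RAM, T_FOREIGN, T_POD } ptype_t;

/* Hypothetical stand-in for p2m_get_entry(): always reports a type. */
static bool lookup(unsigned long gfn, ptype_t *t)
{
    switch (gfn) {
    case 1:  *t = T_RAM;     return true;
    case 2:  *t = T_FOREIGN; return true;
    case 3:  *t = T_POD;     return false;  /* typed, but no valid MFN */
    default: *t = T_INVALID; return false;
    }
}

/* Returns 0 if no page, 1 for the get_page() path, 2 for the foreign path. */
static int page_from_gfn(unsigned long gfn, ptype_t *t /* may be NULL */)
{
    ptype_t p2mt;                     /* always filled in by lookup() */
    bool valid = lookup(gfn, &p2mt);

    if (t)
        *t = p2mt;                    /* reported even when failing below */

    if (!valid)
        return 0;

    return p2mt == T_FOREIGN ? 2 : 1; /* decided correctly even if t == NULL */
}
```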
Re: [for 4.22 v5 17/18] xen/riscv: add support of page lookup by GFN
Posted by Jan Beulich 2 months, 3 weeks ago
On 17.11.2025 16:52, Oleksii Kurochko wrote:
> On 11/10/25 5:46 PM, Jan Beulich wrote:
>> On 20.10.2025 17:58, Oleksii Kurochko wrote:
>>> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
>>> +                                   unsigned int *level_out)
>>> +{
>>> +    unsigned int level;
>>> +
>>> +    if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
>>> +                  : gfn_x(gfn) > gfn_x(boundary) )
>>> +    {
>>> +        unsigned long mask = 0;
>>> +
>>> +        for ( level = P2M_ROOT_LEVEL; level; level-- )
>>> +        {
>>> +            unsigned long masked_gfn;
>>> +
>>> +            mask |= PFN_DOWN(P2M_LEVEL_MASK(level));
>>> +            masked_gfn = gfn_x(gfn) & mask;
>>> +
>>> +            if ( is_lower ? masked_gfn < gfn_x(boundary)
>>> +                          : masked_gfn > gfn_x(boundary) )
>>> +            {
>>> +                *level_out = level;
>> For this to be correct in the is_lower case, don't you need to fill the
>> bottom bits of masked_gfn with all 1s, rather than with all 0s? Otherwise
>> the tail of the range may be above boundary.
> 
> I think that I didn't get what you mean by "the range" here and so I can't understand
> what is "the tail of the range".
> Could you please clarify?

By applying "mask" you effectively produce a range (with "gfn" somewhere in
the middle). For the level (which you return to the caller) to be correct,
the entire range must be matching "gfn" in being below or above of the
boundary. My impression is that this isn't the case when is_lower is true.

Jan
Re: [for 4.22 v5 17/18] xen/riscv: add support of page lookup by GFN
Posted by Oleksii Kurochko 2 months, 3 weeks ago
On 11/17/25 5:00 PM, Jan Beulich wrote:
> On 17.11.2025 16:52, Oleksii Kurochko wrote:
>> On 11/10/25 5:46 PM, Jan Beulich wrote:
>>> On 20.10.2025 17:58, Oleksii Kurochko wrote:
>>>> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
>>>> +                                   unsigned int *level_out)
>>>> +{
>>>> +    unsigned int level;
>>>> +
>>>> +    if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
>>>> +                  : gfn_x(gfn) > gfn_x(boundary) )
>>>> +    {
>>>> +        unsigned long mask = 0;
>>>> +
>>>> +        for ( level = P2M_ROOT_LEVEL; level; level-- )
>>>> +        {
>>>> +            unsigned long masked_gfn;
>>>> +
>>>> +            mask |= PFN_DOWN(P2M_LEVEL_MASK(level));
>>>> +            masked_gfn = gfn_x(gfn) & mask;
>>>> +
>>>> +            if ( is_lower ? masked_gfn < gfn_x(boundary)
>>>> +                          : masked_gfn > gfn_x(boundary) )
>>>> +            {
>>>> +                *level_out = level;
>>> For this to be correct in the is_lower case, don't you need to fill the
>>> bottom bits of masked_gfn with all 1s, rather than with all 0s? Otherwise
>>> the tail of the range may be above boundary.
>> I think that I didn't get what you mean by "the range" here and so I can't understand
>> what is "the tail of the range".
>> Could you please clarify?
> By applying "mask" you effectively produce a range (with "gfn" somewhere in
> the middle). For the level (which you return to the caller) to be correct,
> the entire range must be matching "gfn" in being below or above of the
> boundary. My impression is that this isn't the case when is_lower is true.

Oh, got it. Then I agree that when is_lower is true we really need to fill the bottom
bits of masked_gfn with all 1s.

Thanks for clarifying.

~ Oleksii
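Jan's point, accepted above, is that in the is_lower case the whole block selected by the mask must lie below the boundary, not just its first GFN. The last GFN of the block (low bits all 1s) is therefore the one to compare. The standalone sketch below shows the corrected check. It assumes an Sv39x4-like geometry (9 index bits per level, root at level 2), and outside_boundary() is a hypothetical name. It also simplifies the patch in one respect: it derives the block bounds from ~low rather than accumulating PFN_DOWN(P2M_LEVEL_MASK(level)), so any bits above the root index are kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry (Sv39x4-like): 9 index bits per level, root at level 2. */
#define LEVEL_BITS 9
#define ROOT_LEVEL 2

/*
 * Hypothetical helper: if the whole naturally aligned block of order
 * level * LEVEL_BITS containing 'gfn' lies strictly below (is_lower) or
 * above 'boundary', return true and the largest such level via level_out.
 */
static bool outside_boundary(uint64_t gfn, uint64_t boundary, bool is_lower,
                             unsigned int *level_out)
{
    assert(level_out);

    if (is_lower ? gfn >= boundary : gfn <= boundary)
        return false;

    /* Like the patch, only levels ROOT_LEVEL..1 are considered. */
    for (unsigned int level = ROOT_LEVEL; level; level--) {
        uint64_t low = (UINT64_C(1) << (level * LEVEL_BITS)) - 1;
        uint64_t first = gfn & ~low;  /* low bits filled with 0s */
        uint64_t last = gfn | low;    /* low bits filled with 1s */

        /* Below: the block's last GFN must be below the boundary. */
        if (is_lower ? last < boundary : first > boundary) {
            *level_out = level;
            return true;
        }
    }

    return false;
}
```

With zero-filled low bits instead, GFN 0x1ff checked against a lowest_mapped_gfn of 0x200 would report level 2, even though that block, [0, 0x3ffff], also covers mapped GFNs. The sketch reports level 1, whose block [0, 0x1ff] is entirely below the boundary.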