[PATCH v4 17/18] xen/riscv: add support of page lookup by GFN

Posted by Oleksii Kurochko 1 month, 1 week ago
Introduce helper functions for safely querying the P2M (physical-to-machine)
mapping:
 - Add p2m_read_lock(), p2m_read_unlock(), and p2m_is_locked() for managing
   P2M lock state.
 - Implement p2m_get_entry() to retrieve mapping details for a given GFN,
   including MFN, page order, and validity.
 - Add p2m_lookup() to encapsulate read-locked MFN retrieval.
 - Introduce p2m_get_page_from_gfn() to convert a GFN into a page_info
   pointer, acquiring a reference to the page if valid.
 - Introduce get_page().

Implementations are based on Arm's functions with some minor modifications:
- p2m_get_entry():
  - Reverse traversal of page tables, as RISC-V uses the opposite level
    numbering compared to Arm.
  - Removed the return of p2m_access_t from p2m_get_entry() since
    mem_access_settings is not introduced for RISC-V.
  - Updated BUILD_BUG_ON() to check using the level 0 mask, which corresponds
    to Arm's THIRD_MASK.
  - Replaced open-coded bit shifts with the BIT() macro.
  - Other minor changes, such as using RISC-V-specific functions to validate
    P2M PTEs, and replacing Arm-specific GUEST_* macros with their RISC-V
    equivalents.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V4:
 - Update prototype of p2m_is_locked() to return bool and accept pointer-to-const.
 - Correct the comment above p2m_get_entry().
 - Drop the check "BUILD_BUG_ON(XEN_PT_LEVEL_MAP_MASK(0) != PAGE_MASK);" inside
   p2m_get_entry() as it is stale and it was needed to sure that 4k page(s) are
   used on L3 (in Arm terms) what is true for RISC-V. (if not special extension
   are used). It was another reason for Arm to have it (and I copied it to RISC-V),
   but it isn't true for RISC-V. (some details could be found in response to the
   patch).
 - Style fixes.
 - Add an explanatory comment describing what the loop inside "gfn is higher than
   the highest p2m mapping" does. Move this loop to a separate function
   check_outside_boundary() to cover both boundaries (lowest_mapped_gfn and
   max_mapped_gfn).
 - There is no need to allocate a page table, as p2m_get_entry() is normally
   expected to be called after a corresponding p2m_set_entry() call. So change
   'true' to 'false' in the page table walking loop inside p2m_get_entry().
 - Correct handling of p2m_is_foreign case inside p2m_get_page_from_gfn().
 - Introduce and use P2M_LEVEL_MASK instead of XEN_PT_LEVEL_MASK, as the latter
   doesn't take into account the two extra bits of the root table in the P2M case.
 - Drop stale item from "changes in v3": "Add is_p2m_foreign() macro and connected stuff".
 - Add p2m_read_(un)lock().
---
Changes in V3:
 - Change struct domain *d argument of p2m_get_page_from_gfn() to
   struct p2m_domain.
 - Update the comment above p2m_get_entry().
 - s/_t/p2mt for local variable in p2m_get_entry().
 - Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn)
   to define offsets array.
 - Code style fixes.
 - Update a check of rc code from p2m_next_level() in p2m_get_entry()
   and drop "else" case.
 - Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL.
 - Use struct p2m_domain instead of struct domain for p2m_lookup() and
   p2m_get_page_from_gfn().
 - Move definition of get_page() from "xen/riscv: implement mfn_valid() and page reference, ownership handling helpers"
---
Changes in V2:
 - New patch.
---
 xen/arch/riscv/include/asm/p2m.h |  24 ++++
 xen/arch/riscv/mm.c              |  13 +++
 xen/arch/riscv/p2m.c             | 186 +++++++++++++++++++++++++++++++
 3 files changed, 223 insertions(+)

diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 29685c7852..2d0b0375d5 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -44,6 +44,12 @@ extern unsigned int gstage_root_level;
 #define P2M_PAGETABLE_ENTRIES(lvl) \
     (BIT(PAGETABLE_ORDER + P2M_ROOT_EXTRA_BITS(lvl), UL))
 
+#define GFN_MASK(lvl) (P2M_PAGETABLE_ENTRIES(lvl) - 1UL)
+
+#define P2M_LEVEL_SHIFT(lvl) (P2M_LEVEL_ORDER(lvl) + PAGE_SHIFT)
+
+#define P2M_LEVEL_MASK(lvl) (GFN_MASK(lvl) << P2M_LEVEL_SHIFT(lvl))
+
 #define paddr_bits PADDR_BITS
 
 /* Get host p2m table */
@@ -229,6 +235,24 @@ static inline bool p2m_is_write_locked(struct p2m_domain *p2m)
 
 unsigned long construct_hgatp(struct p2m_domain *p2m, uint16_t vmid);
 
+static inline void p2m_read_lock(struct p2m_domain *p2m)
+{
+    read_lock(&p2m->lock);
+}
+
+static inline void p2m_read_unlock(struct p2m_domain *p2m)
+{
+    read_unlock(&p2m->lock);
+}
+
+static inline bool p2m_is_locked(const struct p2m_domain *p2m)
+{
+    return rw_is_locked(&p2m->lock);
+}
+
+struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
+                                        p2m_type_t *t);
+
 #endif /* ASM__RISCV__P2M_H */
 
 /*
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 8c6e8075f3..e34b1b674a 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -675,3 +675,16 @@ struct domain *page_get_owner_and_reference(struct page_info *page)
 
     return owner;
 }
+
+bool get_page(struct page_info *page, const struct domain *domain)
+{
+    const struct domain *owner = page_get_owner_and_reference(page);
+
+    if ( likely(owner == domain) )
+        return true;
+
+    if ( owner != NULL )
+        put_page(page);
+
+    return false;
+}
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 1577b09b15..a5ea61fe61 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -978,3 +978,189 @@ int map_regions_p2mt(struct domain *d,
 
     return rc;
 }
+
+
+/*
+ * p2m_get_entry() should always return the correct order value, even if an
+ * entry is not present (i.e. the GFN is outside the range):
+ *   [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]).    (1)
+ *
+ * This ensures that callers of p2m_get_entry() can determine what range of
+ * address space would be altered by a corresponding p2m_set_entry().
+ * Also, it would help to avoid costly page walks for GFNs outside range (1).
+ *
+ * Therefore, this function returns true for GFNs outside range (1), and in
+ * that case the corresponding level is returned via the level_out argument.
+ * Otherwise, it returns false and p2m_get_entry() performs a page walk to
+ * find the proper entry.
+ */
+static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
+                                   unsigned int *level_out)
+{
+    unsigned int level;
+
+    if ( (is_lower && gfn_x(gfn) < gfn_x(boundary)) ||
+         (!is_lower && gfn_x(gfn) > gfn_x(boundary)) )
+    {
+        for ( level = P2M_ROOT_LEVEL; level; level-- )
+        {
+            unsigned long mask = PFN_DOWN(P2M_LEVEL_MASK(level));
+
+            if ( (is_lower && ((gfn_x(gfn) & mask) < gfn_x(boundary))) ||
+                 (!is_lower && ((gfn_x(gfn) & mask) > gfn_x(boundary))) )
+            {
+                *level_out = level;
+                return true;
+            }
+        }
+    }
+
+    return false;
+}
+
+/*
+ * Get the details of a given gfn.
+ *
+ * If the entry is present, the associated MFN will be returned and the
+ * p2m type of the mapping.
+ * The page_order will correspond to the order of the mapping in the page
+ * table (i.e it could be a superpage).
+ *
+ * If the entry is not present, INVALID_MFN will be returned and the
+ * page_order will be set according to the order of the invalid range.
+ *
+ * valid will contain the value of bit[0] (e.g valid bit) of the
+ * entry.
+ */
+static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
+                           p2m_type_t *t,
+                           unsigned int *page_order,
+                           bool *valid)
+{
+    unsigned int level = 0;
+    pte_t entry, *table;
+    int rc;
+    mfn_t mfn = INVALID_MFN;
+    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
+
+    ASSERT(p2m_is_locked(p2m));
+
+    if ( valid )
+        *valid = false;
+
+    if ( check_outside_boundary(gfn, p2m->lowest_mapped_gfn, true, &level) )
+        goto out;
+
+    if ( check_outside_boundary(gfn, p2m->max_mapped_gfn, false, &level) )
+        goto out;
+
+    table = p2m_get_root_pointer(p2m, gfn);
+
+    /*
+     * The table should always be non-NULL because the gfn is below
+     * p2m->max_mapped_gfn and the root table pages are always present.
+     */
+    if ( !table )
+    {
+        ASSERT_UNREACHABLE();
+        level = P2M_ROOT_LEVEL;
+        goto out;
+    }
+
+    for ( level = P2M_ROOT_LEVEL; level; level-- )
+    {
+        rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
+        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
+            goto out_unmap;
+
+        if ( rc != P2M_TABLE_NORMAL )
+            break;
+    }
+
+    entry = table[offsets[level]];
+
+    if ( pte_is_valid(entry) )
+    {
+        if ( t )
+            *t = p2m_get_type(entry);
+
+        mfn = pte_get_mfn(entry);
+        /*
+         * The entry may point to a superpage. Find the MFN associated
+         * to the GFN.
+         */
+        mfn = mfn_add(mfn,
+                      gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));
+
+        if ( valid )
+            *valid = pte_is_valid(entry);
+    }
+
+ out_unmap:
+    unmap_domain_page(table);
+
+ out:
+    if ( page_order )
+        *page_order = P2M_LEVEL_ORDER(level);
+
+    return mfn;
+}
+
+static mfn_t p2m_lookup(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t *t)
+{
+    mfn_t mfn;
+
+    p2m_read_lock(p2m);
+    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL);
+    p2m_read_unlock(p2m);
+
+    return mfn;
+}
+
+struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
+                                        p2m_type_t *t)
+{
+    struct page_info *page;
+    p2m_type_t p2mt = p2m_invalid;
+    mfn_t mfn;
+
+    p2m_read_lock(p2m);
+    mfn = p2m_lookup(p2m, gfn, t);
+
+    if ( !mfn_valid(mfn) )
+    {
+        p2m_read_unlock(p2m);
+        return NULL;
+    }
+
+    if ( t )
+        p2mt = *t;
+
+    page = mfn_to_page(mfn);
+
+    /*
+     * get_page won't work on foreign mapping because the page doesn't
+     * belong to the current domain.
+     */
+    if ( unlikely(p2m_is_foreign(p2mt)) )
+    {
+        const struct domain *fdom = page_get_owner_and_reference(page);
+
+        p2m_read_unlock(p2m);
+
+        if ( fdom )
+        {
+            if ( likely(fdom != p2m->domain) )
+                return page;
+
+            ASSERT_UNREACHABLE();
+            put_page(page);
+        }
+
+        return NULL;
+    }
+
+    p2m_read_unlock(p2m);
+
+    return get_page(page, p2m->domain) ? page : NULL;
+}
-- 
2.51.0
Re: [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN
Posted by Jan Beulich 1 month, 1 week ago
On 17.09.2025 23:55, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -978,3 +978,189 @@ int map_regions_p2mt(struct domain *d,
>  
>      return rc;
>  }
> +
> +

Nit: No double blank lines please.

> +/*
> + * p2m_get_entry() should always return the correct order value, even if an
> + * entry is not present (i.e. the GFN is outside the range):
> + *   [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]).    (1)
> + *
> + * This ensures that callers of p2m_get_entry() can determine what range of
> + * address space would be altered by a corresponding p2m_set_entry().
> + * Also, it would help to avoid costly page walks for GFNs outside range (1).
> + *
> + * Therefore, this function returns true for GFNs outside range (1), and in
> + * that case the corresponding level is returned via the level_out argument.
> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
> + * find the proper entry.
> + */
> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
> +                                   unsigned int *level_out)
> +{
> +    unsigned int level;
> +
> +    if ( (is_lower && gfn_x(gfn) < gfn_x(boundary)) ||
> +         (!is_lower && gfn_x(gfn) > gfn_x(boundary)) )

I understand people write things this way, but personally I find it confusing
to read. Why not simply use a conditional operator here (and again below):

    if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
                  : gfn_x(gfn) > gfn_x(boundary) )

> +    {
> +        for ( level = P2M_ROOT_LEVEL; level; level-- )
> +        {
> +            unsigned long mask = PFN_DOWN(P2M_LEVEL_MASK(level));

Don't you need to accumulate the mask to use across loop iterations here
(or calculate it accordingly)? Else ...

> +            if ( (is_lower && ((gfn_x(gfn) & mask) < gfn_x(boundary))) ||
> +                 (!is_lower && ((gfn_x(gfn) & mask) > gfn_x(boundary))) )

... here you'll compare some middle part of the original GFN against the
boundary.

> +            {
> +                *level_out = level;
> +                return true;
> +            }
> +        }
> +    }
> +
> +    return false;
> +}
> +
> +/*
> + * Get the details of a given gfn.
> + *
> + * If the entry is present, the associated MFN will be returned and the
> + * p2m type of the mapping.
> + * The page_order will correspond to the order of the mapping in the page
> + * table (i.e it could be a superpage).
> + *
> + * If the entry is not present, INVALID_MFN will be returned and the
> + * page_order will be set according to the order of the invalid range.
> + *
> + * valid will contain the value of bit[0] (e.g valid bit) of the
> + * entry.
> + */
> +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
> +                           p2m_type_t *t,
> +                           unsigned int *page_order,
> +                           bool *valid)
> +{
> +    unsigned int level = 0;
> +    pte_t entry, *table;
> +    int rc;
> +    mfn_t mfn = INVALID_MFN;
> +    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
> +
> +    ASSERT(p2m_is_locked(p2m));
> +
> +    if ( valid )
> +        *valid = false;

Wouldn't you better similarly set *t to some "default" value?

> +    if ( check_outside_boundary(gfn, p2m->lowest_mapped_gfn, true, &level) )
> +        goto out;
> +
> +    if ( check_outside_boundary(gfn, p2m->max_mapped_gfn, false, &level) )
> +        goto out;
> +
> +    table = p2m_get_root_pointer(p2m, gfn);
> +
> +    /*
> +     * The table should always be non-NULL because the gfn is below
> +     * p2m->max_mapped_gfn and the root table pages are always present.
> +     */
> +    if ( !table )
> +    {
> +        ASSERT_UNREACHABLE();
> +        level = P2M_ROOT_LEVEL;
> +        goto out;
> +    }
> +
> +    for ( level = P2M_ROOT_LEVEL; level; level-- )
> +    {
> +        rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
> +        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
> +            goto out_unmap;

Getting back P2M_TABLE_MAP_NOMEM here is a bug, not really a loop exit
condition.

> +        if ( rc != P2M_TABLE_NORMAL )
> +            break;
> +    }
> +
> +    entry = table[offsets[level]];
> +
> +    if ( pte_is_valid(entry) )
> +    {
> +        if ( t )
> +            *t = p2m_get_type(entry);
> +
> +        mfn = pte_get_mfn(entry);
> +        /*
> +         * The entry may point to a superpage. Find the MFN associated
> +         * to the GFN.
> +         */
> +        mfn = mfn_add(mfn,
> +                      gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));

May want to assert that the respective bits of "mfn" are actually clear
before this calculation.

> +        if ( valid )
> +            *valid = pte_is_valid(entry);
> +    }
> +
> + out_unmap:
> +    unmap_domain_page(table);
> +
> + out:
> +    if ( page_order )
> +        *page_order = P2M_LEVEL_ORDER(level);
> +
> +    return mfn;
> +}
> +
> +static mfn_t p2m_lookup(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t *t)
> +{
> +    mfn_t mfn;
> +
> +    p2m_read_lock(p2m);
> +    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL);

Seeing the two NULLs here I wonder: What use is the "valid" parameter of that
function? And what use is the function here when it doesn't also return the
order? IOW I'm not sure having this helper is actually worthwhile. This is
even more so that ...

> +    p2m_read_unlock(p2m);
> +
> +    return mfn;
> +}
> +
> +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
> +                                        p2m_type_t *t)
> +{
> +    struct page_info *page;
> +    p2m_type_t p2mt = p2m_invalid;
> +    mfn_t mfn;
> +
> +    p2m_read_lock(p2m);
> +    mfn = p2m_lookup(p2m, gfn, t);

... there's a locking problem here: You cannot acquire a read lock in a
nested fashion - that's a recipe for a deadlock when between the first
acquire and the 2nd acquire attempt another CPU tries to acquire the
lock for writing (which will result in no further readers being allowed
in). It wasn't all that long ago that in the security team we actually
audited the code base for the absence of such a pattern.

Jan
Re: [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN
Posted by Oleksii Kurochko 1 month ago
On 9/22/25 10:46 PM, Jan Beulich wrote:
> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -978,3 +978,189 @@ int map_regions_p2mt(struct domain *d,
>>   
>>       return rc;
>>   }
>> +
>> +
> Nit: No double blank lines please.
>
>> +/*
>> + * p2m_get_entry() should always return the correct order value, even if an
>> + * entry is not present (i.e. the GFN is outside the range):
>> + *   [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]).    (1)
>> + *
>> + * This ensures that callers of p2m_get_entry() can determine what range of
>> + * address space would be altered by a corresponding p2m_set_entry().
>> + * Also, it would help to avoid costly page walks for GFNs outside range (1).
>> + *
>> + * Therefore, this function returns true for GFNs outside range (1), and in
>> + * that case the corresponding level is returned via the level_out argument.
>> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
>> + * find the proper entry.
>> + */
>> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
>> +                                   unsigned int *level_out)
>> +{
>> +    unsigned int level;
>> +
>> +    if ( (is_lower && gfn_x(gfn) < gfn_x(boundary)) ||
>> +         (!is_lower && gfn_x(gfn) > gfn_x(boundary)) )
> I understand people write things this way, but personally I find it confusing
> to read. Why not simply use a conditional operator here (and again below):
>
>      if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
>                    : gfn_x(gfn) > gfn_x(boundary) )

I am okay with both options. If you think the second one is more readable, then
I will use it.

>> +    {
>> +        for ( level = P2M_ROOT_LEVEL; level; level-- )
>> +        {
>> +            unsigned long mask = PFN_DOWN(P2M_LEVEL_MASK(level));
> Don't you need to accumulate the mask to use across loop iterations here
> (or calculate it accordingly)? Else ...
>
>> +            if ( (is_lower && ((gfn_x(gfn) & mask) < gfn_x(boundary))) ||
>> +                 (!is_lower && ((gfn_x(gfn) & mask) > gfn_x(boundary))) )
> ... here you'll compare some middle part of the original GFN against the
> boundary.

Agree, accumulation of the mask should be done here.
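Something along these lines, perhaps (just a sketch of the accumulation inside
the existing loop, not the final code):

    unsigned long mask = 0;

    for ( level = P2M_ROOT_LEVEL; level; level-- )
    {
        /* Cover the GFN bits of the current level and of every level above it. */
        mask |= PFN_DOWN(P2M_LEVEL_MASK(level));

        if ( is_lower ? (gfn_x(gfn) & mask) < gfn_x(boundary)
                      : (gfn_x(gfn) & mask) > gfn_x(boundary) )
        {
            *level_out = level;
            return true;
        }
    }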

>> +            {
>> +                *level_out = level;
>> +                return true;
>> +            }
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> +/*
>> + * Get the details of a given gfn.
>> + *
>> + * If the entry is present, the associated MFN will be returned and the
>> + * p2m type of the mapping.
>> + * The page_order will correspond to the order of the mapping in the page
>> + * table (i.e it could be a superpage).
>> + *
>> + * If the entry is not present, INVALID_MFN will be returned and the
>> + * page_order will be set according to the order of the invalid range.
>> + *
>> + * valid will contain the value of bit[0] (e.g valid bit) of the
>> + * entry.
>> + */
>> +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
>> +                           p2m_type_t *t,
>> +                           unsigned int *page_order,
>> +                           bool *valid)
>> +{
>> +    unsigned int level = 0;
>> +    pte_t entry, *table;
>> +    int rc;
>> +    mfn_t mfn = INVALID_MFN;
>> +    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
>> +
>> +    ASSERT(p2m_is_locked(p2m));
>> +
>> +    if ( valid )
>> +        *valid = false;
> Wouldn't you better similarly set *t to some "default" value?

I think it makes sense. I will set it to p2m_invalid.
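I.e. near the top of p2m_get_entry(), next to the handling of `valid` (sketch):

    if ( t )
        *t = p2m_invalid;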

>> +    if ( check_outside_boundary(gfn, p2m->lowest_mapped_gfn, true, &level) )
>> +        goto out;
>> +
>> +    if ( check_outside_boundary(gfn, p2m->max_mapped_gfn, false, &level) )
>> +        goto out;
>> +
>> +    table = p2m_get_root_pointer(p2m, gfn);
>> +
>> +    /*
>> +     * The table should always be non-NULL because the gfn is below
>> +     * p2m->max_mapped_gfn and the root table pages are always present.
>> +     */
>> +    if ( !table )
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        level = P2M_ROOT_LEVEL;
>> +        goto out;
>> +    }
>> +
>> +    for ( level = P2M_ROOT_LEVEL; level; level-- )
>> +    {
>> +        rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
>> +        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
>> +            goto out_unmap;
> Getting back P2M_TABLE_MAP_NOMEM here is a bug, not really a loop exit
> condition.

Oh, I agree. With the second argument set to false, rc = P2M_TABLE_MAP_NOMEM
will never be returned, so it can simply be dropped.
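I.e. the loop exit check would then become just (sketch):

    rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
    if ( rc == P2M_TABLE_MAP_NONE )
        goto out_unmap;

    if ( rc != P2M_TABLE_NORMAL )
        break;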


>
>> +        if ( rc != P2M_TABLE_NORMAL )
>> +            break;
>> +    }
>> +
>> +    entry = table[offsets[level]];
>> +
>> +    if ( pte_is_valid(entry) )
>> +    {
>> +        if ( t )
>> +            *t = p2m_get_type(entry);
>> +
>> +        mfn = pte_get_mfn(entry);
>> +        /*
>> +         * The entry may point to a superpage. Find the MFN associated
>> +         * to the GFN.
>> +         */
>> +        mfn = mfn_add(mfn,
>> +                      gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));
> May want to assert that the respective bits of "mfn" are actually clear
> before this calculation.

ASSERT(!(mfn_x(mfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)));
Do you mean something like that?

I am not 100% sure that there is really a need for that, as a page-fault
exception is raised if the PA is insufficiently aligned:
  Any level of PTE may be a leaf PTE, so in addition to 4 KiB pages, Sv39 supports
  2 MiB megapages and 1 GiB gigapages, each of which must be virtually and
  physically aligned to a boundary equal to its size. A page-fault exception is
  raised if the physical address is insufficiently aligned.

>
>> +        if ( valid )
>> +            *valid = pte_is_valid(entry);
>> +    }
>> +
>> + out_unmap:
>> +    unmap_domain_page(table);
>> +
>> + out:
>> +    if ( page_order )
>> +        *page_order = P2M_LEVEL_ORDER(level);
>> +
>> +    return mfn;
>> +}
>> +
>> +static mfn_t p2m_lookup(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t *t)
>> +{
>> +    mfn_t mfn;
>> +
>> +    p2m_read_lock(p2m);
>> +    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL);
> Seeing the two NULLs here I wonder: What use is the "valid" parameter of that
> function?

The `valid` parameter isn't really needed anymore. It was needed when I had a
copy of the valid bit, with the real (in-PTE) valid bit set to 0, to track
which pages are in use.
I will drop the `valid` parameter.

> And what use is the function here when it doesn't also return the
> order?

It could be used for gfn_to_mfn(), but p2m_get_entry() could be used too; one
just needs to remember to wrap each use of p2m_get_entry() in
p2m_read_(un)lock(). Probably it makes sense to put p2m_read_(un)lock() inside
p2m_get_entry(). I think we can keep only p2m_get_entry() and drop
p2m_lookup().

>   IOW I'm not sure having this helper is actually worthwhile. This is
> even more so that ...
>> +    p2m_read_unlock(p2m);
>> +
>> +    return mfn;
>> +}
>> +
>> +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
>> +                                        p2m_type_t *t)
>> +{
>> +    struct page_info *page;
>> +    p2m_type_t p2mt = p2m_invalid;
>> +    mfn_t mfn;
>> +
>> +    p2m_read_lock(p2m);
>> +    mfn = p2m_lookup(p2m, gfn, t);
> ... there's a locking problem here: You cannot acquire a read lock in a
> nested fashion - that's a recipe for a deadlock when between the first
> acquire and the 2nd acquire attempt another CPU tries to acquire the
> lock for writing (which will result in no further readers being allowed
> in). It wasn't all that long ago that in the security team we actually
> audited the code base for the absence of such a pattern.

Oh, I missed that case. Thanks for the explanation and review.
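
Presumably the rework should then take the read lock only once, roughly like
this (a sketch only, keeping the v4 names and the v4 p2m_get_entry() signature,
with p2m_lookup() dropped as discussed above):

    struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
                                            p2m_type_t *t)
    {
        struct page_info *page = NULL;
        p2m_type_t p2mt = p2m_invalid;
        mfn_t mfn;

        p2m_read_lock(p2m);

        /* The read lock is already held, so call p2m_get_entry() directly. */
        mfn = p2m_get_entry(p2m, gfn, &p2mt, NULL, NULL);

        if ( t )
            *t = p2mt;

        if ( mfn_valid(mfn) )
        {
            page = mfn_to_page(mfn);

            /*
             * get_page() won't work on a foreign mapping because the page
             * doesn't belong to the current domain.
             */
            if ( unlikely(p2m_is_foreign(p2mt)) )
            {
                const struct domain *fdom = page_get_owner_and_reference(page);

                if ( fdom == NULL )
                    page = NULL;
                else if ( unlikely(fdom == p2m->domain) )
                {
                    ASSERT_UNREACHABLE();
                    put_page(page);
                    page = NULL;
                }
            }
            else if ( !get_page(page, p2m->domain) )
                page = NULL;
        }

        p2m_read_unlock(p2m);

        return page;
    }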

~ Oleksii
Re: [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN
Posted by Jan Beulich 3 weeks, 2 days ago
On 30.09.2025 17:37, Oleksii Kurochko wrote:
> On 9/22/25 10:46 PM, Jan Beulich wrote:
>> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>>> +        if ( rc != P2M_TABLE_NORMAL )
>>> +            break;
>>> +    }
>>> +
>>> +    entry = table[offsets[level]];
>>> +
>>> +    if ( pte_is_valid(entry) )
>>> +    {
>>> +        if ( t )
>>> +            *t = p2m_get_type(entry);
>>> +
>>> +        mfn = pte_get_mfn(entry);
>>> +        /*
>>> +         * The entry may point to a superpage. Find the MFN associated
>>> +         * to the GFN.
>>> +         */
>>> +        mfn = mfn_add(mfn,
>>> +                      gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));
>> May want to assert that the respective bits of "mfn" are actually clear
>> before this calculation.
> 
> ASSERT(!(mfn_x(mfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)));
> Do you mean something like that?

Yes.

> I am not 100% sure that there is really a need for that, as a page-fault
> exception is raised if the PA is insufficiently aligned:
>   Any level of PTE may be a leaf PTE, so in addition to 4 KiB pages, Sv39 supports
>   2 MiB megapages and 1 GiB gigapages, each of which must be virtually and
>   physically aligned to a boundary equal to its size. A page-fault exception is
>   raised if the physical address is insufficiently aligned.

But that would be raised only when a page walk encounters such a PTE. You may
be altering a PTE here which never was involved in a page walk, though.

Jan