[PATCH v2 05/17] xen/riscv: introduce things necessary for p2m initialization

[PATCH v2 05/17] xen/riscv: introduce things necessary for p2m initialization
Posted by Oleksii Kurochko 4 months, 3 weeks ago
Introduce the following:
- Update the p2m_domain structure, which describes per-p2m-table state, with:
  - a lock to protect updates to the p2m.
  - a pool of pages used to construct the p2m.
  - clean_pte, which indicates whether the cache must be cleaned when
    writing an entry.
  - a radix tree to store the p2m type, as the PTE doesn't have enough free
    bits to store it.
  - default_access to store the p2m access type for each page in the domain.
  - a back pointer to the domain structure.
- Introduce p2m_init() to initialize the members added to the p2m_domain
  structure.
- Introduce p2m_write_lock() and p2m_is_write_locked().
- Introduce p2m_force_tlb_flush_sync() to flush the TLBs after a p2m table
  update.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V2:
 - Use the earlier introduced sbi_remote_hfence_gvma_vmid() for a proper
   implementation of p2m_force_tlb_flush_sync(), as TLB flushing needs to
   happen on each pCPU which has potentially cached a mapping, which is
   tracked by d->dirty_cpumask.
 - Drop unnecessary blanks.
 - Fix code style for the '#' of pre-processor directives.
 - Drop max_mapped_gfn and lowest_mapped_gfn as they aren't used yet.
 - [p2m_init()] Set p2m->clean_pte=false if CONFIG_HAS_PASSTHROUGH=n.
 - [p2m_init()] Update the comment above p2m->domain = d;
 - Drop p2m->need_flush as it seems to be always true for RISC-V and, as a
   consequence, drop p2m_tlb_flush_sync().
 - Move the introduction of root page table allocation to a separate patch.
---
 xen/arch/riscv/include/asm/p2m.h | 39 +++++++++++++++++++++
 xen/arch/riscv/p2m.c             | 58 ++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 359408e1be..9570eff014 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -3,6 +3,10 @@
 #define ASM__RISCV__P2M_H
 
 #include <xen/errno.h>
+#include <xen/mem_access.h>
+#include <xen/mm.h>
+#include <xen/radix-tree.h>
+#include <xen/rwlock.h>
 #include <xen/types.h>
 
 #include <asm/page-bits.h>
@@ -14,6 +18,29 @@
 
 /* Per-p2m-table state */
 struct p2m_domain {
+    /*
+     * Lock that protects updates to the p2m.
+     */
+    rwlock_t lock;
+
+    /* Pages used to construct the p2m */
+    struct page_list_head pages;
+
+    /* Indicate if it is required to clean the cache when writing an entry */
+    bool clean_pte;
+
+    struct radix_tree_root p2m_type;
+
+    /*
+     * Default P2M access type for each page in the domain: new pages,
+     * swapped in pages, cleared pages, and pages that are ambiguously
+     * retyped get this access type.  See definition of p2m_access_t.
+     */
+    p2m_access_t default_access;
+
+    /* Back pointer to domain */
+    struct domain *domain;
+
     /* Current VMID in use */
     uint16_t vmid;
 };
@@ -107,6 +134,18 @@ void p2m_vmid_allocator_init(void);
 
 int p2m_init(struct domain *d);
 
+static inline void p2m_write_lock(struct p2m_domain *p2m)
+{
+    write_lock(&p2m->lock);
+}
+
+void p2m_write_unlock(struct p2m_domain *p2m);
+
+static inline int p2m_is_write_locked(struct p2m_domain *p2m)
+{
+    return rw_is_write_locked(&p2m->lock);
+}
+
 #endif /* ASM__RISCV__P2M_H */
 
 /*
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index f33c7147ff..e409997499 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -1,13 +1,46 @@
 #include <xen/bitops.h>
+#include <xen/domain_page.h>
 #include <xen/event.h>
+#include <xen/iommu.h>
 #include <xen/lib.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+#include <xen/rwlock.h>
 #include <xen/sched.h>
 #include <xen/spinlock.h>
 #include <xen/xvmalloc.h>
 
+#include <asm/page.h>
 #include <asm/p2m.h>
 #include <asm/sbi.h>
 
+/*
+ * Force a synchronous P2M TLB flush.
+ *
+ * Must be called with the p2m lock held.
+ */
+static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
+{
+    struct domain *d = p2m->domain;
+
+    ASSERT(p2m_is_write_locked(p2m));
+
+    sbi_remote_hfence_gvma_vmid(d->dirty_cpumask, 0, 0, p2m->vmid);
+}
+
+/* Unlock the p2m and do a P2M TLB flush if necessary */
+void p2m_write_unlock(struct p2m_domain *p2m)
+{
+    /*
+     * The final flush is done with the P2M write lock taken to avoid
+     * someone else modifying the P2M before the TLB invalidation has
+     * completed.
+     */
+    p2m_force_tlb_flush_sync(p2m);
+
+    write_unlock(&p2m->lock);
+}
+
 static spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
 
 /*
@@ -109,8 +142,33 @@ int p2m_init(struct domain *d)
     spin_lock_init(&d->arch.paging.lock);
     INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
 
+    rwlock_init(&p2m->lock);
+    INIT_PAGE_LIST_HEAD(&p2m->pages);
+
     p2m->vmid = INVALID_VMID;
 
+    p2m->default_access = p2m_access_rwx;
+
+    radix_tree_init(&p2m->p2m_type);
+
+#ifdef CONFIG_HAS_PASSTHROUGH
+    /*
+     * Some IOMMUs don't support coherent PT walk. When the p2m is
+     * shared with the CPU, Xen has to make sure that the PT changes have
+     * reached the memory
+     */
+    p2m->clean_pte = is_iommu_enabled(d) &&
+        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
+#else
+    p2m->clean_pte = false;
+#endif
+
+    /*
+     * "Trivial" initialisation is now complete.  Set the backpointer so the
+     * users of p2m could get an access to domain structure.
+     */
+    p2m->domain = d;
+
     rc = p2m_alloc_vmid(d);
     if ( rc )
         return rc;
-- 
2.49.0
Re: [PATCH v2 05/17] xen/riscv: introduce things necessary for p2m initialization
Posted by Jan Beulich 4 months, 2 weeks ago
On 10.06.2025 15:05, Oleksii Kurochko wrote:
> Introduce the following things:
> - Update p2m_domain structure, which describe per p2m-table state, with:
>   - lock to protect updates to p2m.
>   - pool with pages used to construct p2m.
>   - clean_pte which indicate if it is requires to clean the cache when
>     writing an entry.
>   - radix tree to store p2m type as PTE doesn't have enough free bits to
>     store type.
>   - default_access to store p2m access type for each page in the domain.
>   - back pointer to domain structure.
> - p2m_init() to initalize members introduced in p2m_domain structure.
> - Introudce p2m_write_lock() and p2m_is_write_locked().

What about the reader variant? If you don't need that, why not use a simple
spin lock?

> @@ -14,6 +18,29 @@
>  
>  /* Per-p2m-table state */
>  struct p2m_domain {
> +    /*
> +     * Lock that protects updates to the p2m.
> +     */
> +    rwlock_t lock;
> +
> +    /* Pages used to construct the p2m */
> +    struct page_list_head pages;
> +
> +    /* Indicate if it is required to clean the cache when writing an entry */
> +    bool clean_pte;
> +
> +    struct radix_tree_root p2m_type;

A field with a p2m_ prefix in a p2m struct? And is this tree really about
just a single "type"?

> +    /*
> +     * Default P2M access type for each page in the the domain: new pages,
> +     * swapped in pages, cleared pages, and pages that are ambiguously
> +     * retyped get this access type.  See definition of p2m_access_t.
> +     */
> +    p2m_access_t default_access;
> +
> +    /* Back pointer to domain */
> +    struct domain *domain;

This you may want to introduce earlier, to prefer passing around struct
p2m_domain * in / to P2M functions (which would benefit earlier patches
already, I think).

> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -1,13 +1,46 @@
>  #include <xen/bitops.h>
> +#include <xen/domain_page.h>
>  #include <xen/event.h>
> +#include <xen/iommu.h>
>  #include <xen/lib.h>
> +#include <xen/mm.h>
> +#include <xen/pfn.h>
> +#include <xen/rwlock.h>
>  #include <xen/sched.h>
>  #include <xen/spinlock.h>
>  #include <xen/xvmalloc.h>
>  
> +#include <asm/page.h>
>  #include <asm/p2m.h>
>  #include <asm/sbi.h>
>  
> +/*
> + * Force a synchronous P2M TLB flush.
> + *
> + * Must be called with the p2m lock held.
> + */
> +static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
> +{
> +    struct domain *d = p2m->domain;
> +
> +    ASSERT(p2m_is_write_locked(p2m));
> +
> +    sbi_remote_hfence_gvma_vmid(d->dirty_cpumask, 0, 0, p2m->vmid);
> +}
> +
> +/* Unlock the flush and do a P2M TLB flush if necessary */
> +void p2m_write_unlock(struct p2m_domain *p2m)
> +{
> +    /*
> +     * The final flush is done with the P2M write lock taken to avoid
> +     * someone else modifying the P2M wbefore the TLB invalidation has
> +     * completed.
> +     */
> +    p2m_force_tlb_flush_sync(p2m);

The comment ahead of the function says "if necessary". Yet there's no
conditional here. I also question the need for a global flush in all
cases.

> @@ -109,8 +142,33 @@ int p2m_init(struct domain *d)
>      spin_lock_init(&d->arch.paging.lock);
>      INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
>  
> +    rwlock_init(&p2m->lock);
> +    INIT_PAGE_LIST_HEAD(&p2m->pages);
> +
>      p2m->vmid = INVALID_VMID;
>  
> +    p2m->default_access = p2m_access_rwx;
> +
> +    radix_tree_init(&p2m->p2m_type);
> +
> +#ifdef CONFIG_HAS_PASSTHROUGH

Do you expect this to be conditionally selected on RISC-V?

> +    /*
> +     * Some IOMMUs don't support coherent PT walk. When the p2m is
> +     * shared with the CPU, Xen has to make sure that the PT changes have
> +     * reached the memory
> +     */
> +    p2m->clean_pte = is_iommu_enabled(d) &&
> +        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);

The comment talks about shared page tables, yet you don't check whether
page table sharing is actually enabled for the domain.

> +#else
> +    p2m->clean_pte = false;

I hope the struct starts out zero-filled, in which case you wouldn't need
this.

> +#endif
> +
> +    /*
> +     * "Trivial" initialisation is now complete.  Set the backpointer so the
> +     * users of p2m could get an access to domain structure.
> +     */
> +    p2m->domain = d;

Better set this about the very first thing?

Jan
Re: [PATCH v2 05/17] xen/riscv: introduce things necessary for p2m initialization
Posted by Oleksii Kurochko 4 months, 1 week ago
On 6/18/25 6:08 PM, Jan Beulich wrote:
> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>> Introduce the following things:
>> - Update p2m_domain structure, which describe per p2m-table state, with:
>>    - lock to protect updates to p2m.
>>    - pool with pages used to construct p2m.
>>    - clean_pte which indicate if it is requires to clean the cache when
>>      writing an entry.
>>    - radix tree to store p2m type as PTE doesn't have enough free bits to
>>      store type.
>>    - default_access to store p2m access type for each page in the domain.
>>    - back pointer to domain structure.
>> - p2m_init() to initalize members introduced in p2m_domain structure.
>> - Introudce p2m_write_lock() and p2m_is_write_locked().
> What about the reader variant? If you don't need that, why not use a simple
> spin lock?

It will be introduced later, in "xen/riscv: add support of page lookup by GFN"
of this patch series, where it is actually used.

But I can move it here.
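
For reference, a minimal sketch of what the read-side helpers could look like
(mirroring the write-side helpers in this patch; only an assumption about how
the later patch will shape them):

    static inline void p2m_read_lock(struct p2m_domain *p2m)
    {
        read_lock(&p2m->lock);
    }

    static inline void p2m_read_unlock(struct p2m_domain *p2m)
    {
        read_unlock(&p2m->lock);
    }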

>
>> @@ -14,6 +18,29 @@
>>   
>>   /* Per-p2m-table state */
>>   struct p2m_domain {
>> +    /*
>> +     * Lock that protects updates to the p2m.
>> +     */
>> +    rwlock_t lock;
>> +
>> +    /* Pages used to construct the p2m */
>> +    struct page_list_head pages;
>> +
>> +    /* Indicate if it is required to clean the cache when writing an entry */
>> +    bool clean_pte;
>> +
>> +    struct radix_tree_root p2m_type;
> A field with a p2m_ prefix in a p2m struct?

p2m_ prefix could be really dropped.

>   And is this tree really about
> just a single "type"?

Yes, we don't have enough bits in the PTE, so we need some extra storage for the type.

>
>> +    /*
>> +     * Default P2M access type for each page in the the domain: new pages,
>> +     * swapped in pages, cleared pages, and pages that are ambiguously
>> +     * retyped get this access type.  See definition of p2m_access_t.
>> +     */
>> +    p2m_access_t default_access;
>> +
>> +    /* Back pointer to domain */
>> +    struct domain *domain;
> This you may want to introduce earlier, to prefer passing around struct
> p2m_domain * in / to P2M functions (which would benefit earlier patches
> already, I think).

But nothing uses it earlier.

>
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -1,13 +1,46 @@
>>   #include <xen/bitops.h>
>> +#include <xen/domain_page.h>
>>   #include <xen/event.h>
>> +#include <xen/iommu.h>
>>   #include <xen/lib.h>
>> +#include <xen/mm.h>
>> +#include <xen/pfn.h>
>> +#include <xen/rwlock.h>
>>   #include <xen/sched.h>
>>   #include <xen/spinlock.h>
>>   #include <xen/xvmalloc.h>
>>   
>> +#include <asm/page.h>
>>   #include <asm/p2m.h>
>>   #include <asm/sbi.h>
>>   
>> +/*
>> + * Force a synchronous P2M TLB flush.
>> + *
>> + * Must be called with the p2m lock held.
>> + */
>> +static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
>> +{
>> +    struct domain *d = p2m->domain;
>> +
>> +    ASSERT(p2m_is_write_locked(p2m));
>> +
>> +    sbi_remote_hfence_gvma_vmid(d->dirty_cpumask, 0, 0, p2m->vmid);
>> +}
>> +
>> +/* Unlock the flush and do a P2M TLB flush if necessary */
>> +void p2m_write_unlock(struct p2m_domain *p2m)
>> +{
>> +    /*
>> +     * The final flush is done with the P2M write lock taken to avoid
>> +     * someone else modifying the P2M wbefore the TLB invalidation has
>> +     * completed.
>> +     */
>> +    p2m_force_tlb_flush_sync(p2m);
> The comment ahead of the function says "if necessary". Yet there's no
> conditional here. I also question the need for a global flush in all
> cases.

Stale comment.

But if the p2m page table was modified, then a flush is needed for the CPUs
in d->dirty_cpumask.

>
>> @@ -109,8 +142,33 @@ int p2m_init(struct domain *d)
>>       spin_lock_init(&d->arch.paging.lock);
>>       INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
>>   
>> +    rwlock_init(&p2m->lock);
>> +    INIT_PAGE_LIST_HEAD(&p2m->pages);
>> +
>>       p2m->vmid = INVALID_VMID;
>>   
>> +    p2m->default_access = p2m_access_rwx;
>> +
>> +    radix_tree_init(&p2m->p2m_type);
>> +
>> +#ifdef CONFIG_HAS_PASSTHROUGH
> Do you expect this to be conditionally selected on RISC-V?

No, once it is implemented it will simply be selected unconditionally by config RISC-V.
And it was done this way because iommu_has_feature() isn't implemented yet, as the
IOMMU isn't supported yet and depends on CONFIG_HAS_PASSTHROUGH.

>
>> +    /*
>> +     * Some IOMMUs don't support coherent PT walk. When the p2m is
>> +     * shared with the CPU, Xen has to make sure that the PT changes have
>> +     * reached the memory
>> +     */
>> +    p2m->clean_pte = is_iommu_enabled(d) &&
>> +        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
> The comment talks about shared page tables, yet you don't check whether
> page table sharing is actually enabled for the domain.

Do we have such a function/macro? It is shared by the implementation now.

>
>> +#else
>> +    p2m->clean_pte = false;
> I hope the struct starts out zero-filled, in which case you wouldn't need
> this.
>
>> +#endif
>> +
>> +    /*
>> +     * "Trivial" initialisation is now complete.  Set the backpointer so the
>> +     * users of p2m could get an access to domain structure.
>> +     */
>> +    p2m->domain = d;
> Better set this about the very first thing?

It makes sense. I will move it up.

Thanks.

~ Oleksii

Re: [PATCH v2 05/17] xen/riscv: introduce things necessary for p2m initialization
Posted by Jan Beulich 4 months, 1 week ago
On 25.06.2025 17:31, Oleksii Kurochko wrote:
> On 6/18/25 6:08 PM, Jan Beulich wrote:
>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>> @@ -14,6 +18,29 @@
>>>   
>>>   /* Per-p2m-table state */
>>>   struct p2m_domain {
>>> +    /*
>>> +     * Lock that protects updates to the p2m.
>>> +     */
>>> +    rwlock_t lock;
>>> +
>>> +    /* Pages used to construct the p2m */
>>> +    struct page_list_head pages;
>>> +
>>> +    /* Indicate if it is required to clean the cache when writing an entry */
>>> +    bool clean_pte;
>>> +
>>> +    struct radix_tree_root p2m_type;
>> A field with a p2m_ prefix in a p2m struct?
> 
> p2m_ prefix could be really dropped.
> 
>>   And is this tree really about
>> just a single "type"?
> 
> Yes, we don't have enough bits in PTE so we need some extra storage to store type.

My question wasn't about that, though. My question was whether the singular
"type" in the name is appropriate. I didn't think you need a tree to store just
a single type.

>>> +    /*
>>> +     * Default P2M access type for each page in the the domain: new pages,
>>> +     * swapped in pages, cleared pages, and pages that are ambiguously
>>> +     * retyped get this access type.  See definition of p2m_access_t.
>>> +     */
>>> +    p2m_access_t default_access;
>>> +
>>> +    /* Back pointer to domain */
>>> +    struct domain *domain;
>> This you may want to introduce earlier, to prefer passing around struct
>> p2m_domain * in / to P2M functions (which would benefit earlier patches
>> already, I think).
> 
> But nothing uses it earlier.

If you do as suggested and pass around struct p2m_domain * for p2m_*()
functions, you'll quickly find it used, I think.

>>> --- a/xen/arch/riscv/p2m.c
>>> +++ b/xen/arch/riscv/p2m.c
>>> @@ -1,13 +1,46 @@
>>>   #include <xen/bitops.h>
>>> +#include <xen/domain_page.h>
>>>   #include <xen/event.h>
>>> +#include <xen/iommu.h>
>>>   #include <xen/lib.h>
>>> +#include <xen/mm.h>
>>> +#include <xen/pfn.h>
>>> +#include <xen/rwlock.h>
>>>   #include <xen/sched.h>
>>>   #include <xen/spinlock.h>
>>>   #include <xen/xvmalloc.h>
>>>   
>>> +#include <asm/page.h>
>>>   #include <asm/p2m.h>
>>>   #include <asm/sbi.h>
>>>   
>>> +/*
>>> + * Force a synchronous P2M TLB flush.
>>> + *
>>> + * Must be called with the p2m lock held.
>>> + */
>>> +static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
>>> +{
>>> +    struct domain *d = p2m->domain;
>>> +
>>> +    ASSERT(p2m_is_write_locked(p2m));
>>> +
>>> +    sbi_remote_hfence_gvma_vmid(d->dirty_cpumask, 0, 0, p2m->vmid);
>>> +}
>>> +
>>> +/* Unlock the flush and do a P2M TLB flush if necessary */
>>> +void p2m_write_unlock(struct p2m_domain *p2m)
>>> +{
>>> +    /*
>>> +     * The final flush is done with the P2M write lock taken to avoid
>>> +     * someone else modifying the P2M wbefore the TLB invalidation has
>>> +     * completed.
>>> +     */
>>> +    p2m_force_tlb_flush_sync(p2m);
>> The comment ahead of the function says "if necessary". Yet there's no
>> conditional here. I also question the need for a global flush in all
>> cases.
> 
> Stale comment.
> 
> But if p2m page table was modified that it is needed to do a flush for CPUs
> in d->dirty_cpumask.

Right, but is that true for each and every case where you acquire the
lock in write mode? There may e.g. be early-out paths which end up doing
nothing, yet you would then still flush the TLB.

>>> @@ -109,8 +142,33 @@ int p2m_init(struct domain *d)
>>>       spin_lock_init(&d->arch.paging.lock);
>>>       INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
>>>   
>>> +    rwlock_init(&p2m->lock);
>>> +    INIT_PAGE_LIST_HEAD(&p2m->pages);
>>> +
>>>       p2m->vmid = INVALID_VMID;
>>>   
>>> +    p2m->default_access = p2m_access_rwx;
>>> +
>>> +    radix_tree_init(&p2m->p2m_type);
>>> +
>>> +#ifdef CONFIG_HAS_PASSTHROUGH
>> Do you expect this to be conditionally selected on RISC-V?
> 
> No, once it will be implemented it will be just selected once by config RISC-V.
> And it was done so because iommu_has_feature() isn't implemented now as IOMMU
> isn't supported now and depends on CONFIG_HAS_PASSTHROUGH.

If the selection isn't going to be conditional, then I see no reason to have
such conditionals in RISC-V-specific code. The piece of code presently inside
that #ifdef may simply need adding later, once there's enough infrastructure
to allow that code to compile. Or maybe it would even compile fine already now?

>>> +    /*
>>> +     * Some IOMMUs don't support coherent PT walk. When the p2m is
>>> +     * shared with the CPU, Xen has to make sure that the PT changes have
>>> +     * reached the memory
>>> +     */
>>> +    p2m->clean_pte = is_iommu_enabled(d) &&
>>> +        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
>> The comment talks about shared page tables, yet you don't check whether
>> page table sharing is actually enabled for the domain.
> 
> Do we have such function/macros?

We have iommu_hap_pt_share, and we have the per-domain hap_pt_share flag.

> It is shared by implementation now.

I don't understand. There's no IOMMU support yet for RISC-V. Hence it's in
neither state - not shared, but also not not shared.

Jan
Re: [PATCH v2 05/17] xen/riscv: introduce things necessary for p2m initialization
Posted by Oleksii Kurochko 4 months ago
On 6/25/25 5:53 PM, Jan Beulich wrote:
> On 25.06.2025 17:31, Oleksii Kurochko wrote:
>> On 6/18/25 6:08 PM, Jan Beulich wrote:
>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>> @@ -14,6 +18,29 @@
>>>>    
>>>>    /* Per-p2m-table state */
>>>>    struct p2m_domain {
>>>> +    /*
>>>> +     * Lock that protects updates to the p2m.
>>>> +     */
>>>> +    rwlock_t lock;
>>>> +
>>>> +    /* Pages used to construct the p2m */
>>>> +    struct page_list_head pages;
>>>> +
>>>> +    /* Indicate if it is required to clean the cache when writing an entry */
>>>> +    bool clean_pte;
>>>> +
>>>> +    struct radix_tree_root p2m_type;
>>> A field with a p2m_ prefix in a p2m struct?
>> p2m_ prefix could be really dropped.
>>
>>>    And is this tree really about
>>> just a single "type"?
>> Yes, we don't have enough bits in PTE so we need some extra storage to store type.
> My question wasn't about that, though. My question was whether in the name
> "type" (singular) is appropriate. I didn't think you need a tree to store just
> a single type.

I need the tree to store <gfn, p2m_type> pairs, where the gfn is the index. And a tree
seems to me a good structure for fast insert/search.
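
To make the intended use a bit more concrete, a rough sketch of the accessors
(hypothetical helper names, not part of this patch; the integer-in-pointer
storage follows the pattern Arm uses for its mem_access radix tree):

    /* Record the type for a GFN; the GFN is the index into the radix tree. */
    static int p2m_set_type(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t t)
    {
        return radix_tree_insert(&p2m->p2m_type, gfn_x(gfn),
                                 radix_tree_int_to_ptr(t));
    }

    static p2m_type_t p2m_get_type(struct p2m_domain *p2m, gfn_t gfn)
    {
        void *ptr = radix_tree_lookup(&p2m->p2m_type, gfn_x(gfn));

        /* No entry in the tree means nothing was typed for this GFN yet. */
        return ptr ? radix_tree_ptr_to_int(ptr) : p2m_invalid;
    }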

>
>>>> +    /*
>>>> +     * Default P2M access type for each page in the the domain: new pages,
>>>> +     * swapped in pages, cleared pages, and pages that are ambiguously
>>>> +     * retyped get this access type.  See definition of p2m_access_t.
>>>> +     */
>>>> +    p2m_access_t default_access;
>>>> +
>>>> +    /* Back pointer to domain */
>>>> +    struct domain *domain;
>>> This you may want to introduce earlier, to prefer passing around struct
>>> p2m_domain * in / to P2M functions (which would benefit earlier patches
>>> already, I think).
>> But nothing uses it earlier.
> If you do as suggested and pass around struct p2m_domain * for p2m_*()
> functions, you'll quickly find it used, I think.
>
>>>> --- a/xen/arch/riscv/p2m.c
>>>> +++ b/xen/arch/riscv/p2m.c
>>>> @@ -1,13 +1,46 @@
>>>>    #include <xen/bitops.h>
>>>> +#include <xen/domain_page.h>
>>>>    #include <xen/event.h>
>>>> +#include <xen/iommu.h>
>>>>    #include <xen/lib.h>
>>>> +#include <xen/mm.h>
>>>> +#include <xen/pfn.h>
>>>> +#include <xen/rwlock.h>
>>>>    #include <xen/sched.h>
>>>>    #include <xen/spinlock.h>
>>>>    #include <xen/xvmalloc.h>
>>>>    
>>>> +#include <asm/page.h>
>>>>    #include <asm/p2m.h>
>>>>    #include <asm/sbi.h>
>>>>    
>>>> +/*
>>>> + * Force a synchronous P2M TLB flush.
>>>> + *
>>>> + * Must be called with the p2m lock held.
>>>> + */
>>>> +static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
>>>> +{
>>>> +    struct domain *d = p2m->domain;
>>>> +
>>>> +    ASSERT(p2m_is_write_locked(p2m));
>>>> +
>>>> +    sbi_remote_hfence_gvma_vmid(d->dirty_cpumask, 0, 0, p2m->vmid);
>>>> +}
>>>> +
>>>> +/* Unlock the flush and do a P2M TLB flush if necessary */
>>>> +void p2m_write_unlock(struct p2m_domain *p2m)
>>>> +{
>>>> +    /*
>>>> +     * The final flush is done with the P2M write lock taken to avoid
>>>> +     * someone else modifying the P2M wbefore the TLB invalidation has
>>>> +     * completed.
>>>> +     */
>>>> +    p2m_force_tlb_flush_sync(p2m);
>>> The comment ahead of the function says "if necessary". Yet there's no
>>> conditional here. I also question the need for a global flush in all
>>> cases.
>> Stale comment.
>>
>> But if p2m page table was modified that it is needed to do a flush for CPUs
>> in d->dirty_cpumask.
> Right, but is that true for each and every case where you acquire the
> lock in write mode? There may e.g. be early-out path which end up doing
> nothing, yet you would then still flush the TLB.

Initially, I assumed that the early-out paths would mostly be taken when some
error happens, so it would be okay to flush the TLB each time.

But, yes, I missed some cases where it ends up doing nothing. I will bring back
need_flush.
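
Roughly along these lines (just a sketch of how need_flush could come back; the
field doesn't exist in this version of the patch):

    /* Unlock the p2m and do a P2M TLB flush if necessary */
    void p2m_write_unlock(struct p2m_domain *p2m)
    {
        /*
         * The flush is done with the write lock still held, so nobody can
         * modify the P2M before the TLB invalidation has completed, and it
         * is skipped when nothing was actually changed.
         */
        if ( p2m->need_flush )
        {
            p2m->need_flush = false;
            p2m_force_tlb_flush_sync(p2m);
        }

        write_unlock(&p2m->lock);
    }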

>
>>>> @@ -109,8 +142,33 @@ int p2m_init(struct domain *d)
>>>>        spin_lock_init(&d->arch.paging.lock);
>>>>        INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
>>>>    
>>>> +    rwlock_init(&p2m->lock);
>>>> +    INIT_PAGE_LIST_HEAD(&p2m->pages);
>>>> +
>>>>        p2m->vmid = INVALID_VMID;
>>>>    
>>>> +    p2m->default_access = p2m_access_rwx;
>>>> +
>>>> +    radix_tree_init(&p2m->p2m_type);
>>>> +
>>>> +#ifdef CONFIG_HAS_PASSTHROUGH
>>> Do you expect this to be conditionally selected on RISC-V?
>> No, once it will be implemented it will be just selected once by config RISC-V.
>> And it was done so because iommu_has_feature() isn't implemented now as IOMMU
>> isn't supported now and depends on CONFIG_HAS_PASSTHROUGH.
> If the selection isn't going to be conditional, then I see no reason to have
> such conditionals in RISC-V-specific code. The piece of code presently inside
> that #ifdef may simply need adding later, once there's enough infrastructure
> to allow that code to compile. Or maybe it would even compile fine already now?

I haven't tried. Anyway, I get your point.

>
>>>> +    /*
>>>> +     * Some IOMMUs don't support coherent PT walk. When the p2m is
>>>> +     * shared with the CPU, Xen has to make sure that the PT changes have
>>>> +     * reached the memory
>>>> +     */
>>>> +    p2m->clean_pte = is_iommu_enabled(d) &&
>>>> +        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
>>> The comment talks about shared page tables, yet you don't check whether
>>> page table sharing is actually enabled for the domain.
>> Do we have such function/macros?
> We have iommu_hap_pt_share, and we have the per-domain hap_pt_share flag.
>
>> It is shared by implementation now.
> I don't understand. There's no IOMMU support yet for RISC-V. Hence it's in
> neither state - not shared, but also not not shared.

In our downstream tree there is IOMMU support for RISC-V.

~ Oleksii
Re: [PATCH v2 05/17] xen/riscv: introduce things necessary for p2m initialization
Posted by Jan Beulich 4 months ago
On 26.06.2025 10:40, Oleksii Kurochko wrote:
> On 6/25/25 5:53 PM, Jan Beulich wrote:
>> On 25.06.2025 17:31, Oleksii Kurochko wrote:
>>> On 6/18/25 6:08 PM, Jan Beulich wrote:
>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>> @@ -14,6 +18,29 @@
>>>>>    
>>>>>    /* Per-p2m-table state */
>>>>>    struct p2m_domain {
>>>>> +    /*
>>>>> +     * Lock that protects updates to the p2m.
>>>>> +     */
>>>>> +    rwlock_t lock;
>>>>> +
>>>>> +    /* Pages used to construct the p2m */
>>>>> +    struct page_list_head pages;
>>>>> +
>>>>> +    /* Indicate if it is required to clean the cache when writing an entry */
>>>>> +    bool clean_pte;
>>>>> +
>>>>> +    struct radix_tree_root p2m_type;
>>>> A field with a p2m_ prefix in a p2m struct?
>>> p2m_ prefix could be really dropped.
>>>
>>>>    And is this tree really about
>>>> just a single "type"?
>>> Yes, we don't have enough bits in PTE so we need some extra storage to store type.
>> My question wasn't about that, though. My question was whether in the name
>> "type" (singular) is appropriate. I didn't think you need a tree to store just
>> a single type.
> 
> I need tree to store a pair of <gfn, p2m_type>, where gfn is an index. And it seems
> to me a tree is a good structure for fast insert/search.

Hmm, I'm increasingly puzzled. I tried to emphasize that my question was towards
the singular "type" in the variable name. I can't see any relationship between
that and your reply. (And yes, using a tree here may be appropriate. There is a
concern towards memory consumption, but that's a separate topic.)

Having said that, aiui you don't use the two RSW bits in the PTE. Do you have
any plans there? If not, can't they be used to at least represent the most
commonly used types, such that the number of entries in that tree can be kept
(relatively) low?

>>>>> +    /*
>>>>> +     * Some IOMMUs don't support coherent PT walk. When the p2m is
>>>>> +     * shared with the CPU, Xen has to make sure that the PT changes have
>>>>> +     * reached the memory
>>>>> +     */
>>>>> +    p2m->clean_pte = is_iommu_enabled(d) &&
>>>>> +        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
>>>> The comment talks about shared page tables, yet you don't check whether
>>>> page table sharing is actually enabled for the domain.
>>> Do we have such function/macros?
>> We have iommu_hap_pt_share, and we have the per-domain hap_pt_share flag.
>>
>>> It is shared by implementation now.
>> I don't understand. There's no IOMMU support yet for RISC-V. Hence it's in
>> neither state - not shared, but also not not shared.
> 
> In downstream there is a support of IOMMU for RISC-V.

And there page tables are unconditionally shared? I'll be surprised if no
want/need for non-shared page tables would ever appear.

Jan
Re: [PATCH v2 05/17] xen/riscv: introduce things necessary for p2m initialization
Posted by Oleksii Kurochko 4 months ago
On 6/26/25 1:01 PM, Jan Beulich wrote:
> On 26.06.2025 10:40, Oleksii Kurochko wrote:
>> On 6/25/25 5:53 PM, Jan Beulich wrote:
>>> On 25.06.2025 17:31, Oleksii Kurochko wrote:
>>>> On 6/18/25 6:08 PM, Jan Beulich wrote:
>>>>> On 10.06.2025 15:05, Oleksii Kurochko wrote:
>>>>>> @@ -14,6 +18,29 @@
>>>>>>     
>>>>>>     /* Per-p2m-table state */
>>>>>>     struct p2m_domain {
>>>>>> +    /*
>>>>>> +     * Lock that protects updates to the p2m.
>>>>>> +     */
>>>>>> +    rwlock_t lock;
>>>>>> +
>>>>>> +    /* Pages used to construct the p2m */
>>>>>> +    struct page_list_head pages;
>>>>>> +
>>>>>> +    /* Indicate if it is required to clean the cache when writing an entry */
>>>>>> +    bool clean_pte;
>>>>>> +
>>>>>> +    struct radix_tree_root p2m_type;
>>>>> A field with a p2m_ prefix in a p2m struct?
>>>> p2m_ prefix could be really dropped.
>>>>
>>>>>     And is this tree really about
>>>>> just a single "type"?
>>>> Yes, we don't have enough bits in PTE so we need some extra storage to store type.
>>> My question wasn't about that, though. My question was whether in the name
>>> "type" (singular) is appropriate. I didn't think you need a tree to store just
>>> a single type.
>> I need tree to store a pair of <gfn, p2m_type>, where gfn is an index. And it seems
>> to me a tree is a good structure for fast insert/search.
> Hmm, I'm increasingly puzzled. I tried to emphasize that my question was towards
> the singular "type" in the variable name. I can't see any relationship between
> that and your reply. (And yes, using a tree here may be appropriate. There is a
> concern towards memory consumption, but that's a separate topic.)

Oh, now I see your original intention. For sure, it should be "types".

>
> Having said that, aiui you don't use the two RSW bits in the PTE. Do you have
> any plans there? If not, can't they be used to at least represent the most
> commonly used types, such that the number of entries in that tree can be kept
> (relatively) low?

It could really be an option for optimization.

In this case I have to extend p2m_type_t by adding a new marker p2m_tree_type:
typedef enum {
    p2m_invalid = 0,    /* Nothing mapped here */
    p2m_ram_rw,         /* Normal read/write domain RAM */
    p2m_ram_ro,         /* Read-only */

    p2m_tree_type,      /* Types after this marker are stored outside the PTE's bits */

    p2m_mmio_direct_dev,/* Read/write mapping of genuine Device MMIO area */
    p2m_grant_map_rw,   /* Read/write grant mapping */
    p2m_grant_map_ro,   /* Read-only grant mapping */
} p2m_type_t;

Probably it makes sense to swap p2m_ram_ro and p2m_mmio_direct_dev; I think device
mappings are the more frequent operation.
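
To illustrate the RSW idea (only a sketch; the macro names and
p2m_lookup_type_in_tree() are made up, and the layout assumes the RSW field
sits in PTE bits 8-9 as per the privileged spec):

    #define PTE_RSW_SHIFT 8
    #define PTE_RSW_MASK  0x3UL

    static p2m_type_t pte_get_type(pte_t pte)
    {
        unsigned long rsw = (pte.pte >> PTE_RSW_SHIFT) & PTE_RSW_MASK;

        /* Values below p2m_tree_type are the type itself, stored in the PTE. */
        if ( rsw < p2m_tree_type )
            return (p2m_type_t)rsw;

        /* p2m_tree_type means: the real type lives in the radix tree. */
        return p2m_lookup_type_in_tree(pte);
    }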

>
>>>>>> +    /*
>>>>>> +     * Some IOMMUs don't support coherent PT walk. When the p2m is
>>>>>> +     * shared with the CPU, Xen has to make sure that the PT changes have
>>>>>> +     * reached the memory
>>>>>> +     */
>>>>>> +    p2m->clean_pte = is_iommu_enabled(d) &&
>>>>>> +        !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
>>>>> The comment talks about shared page tables, yet you don't check whether
>>>>> page table sharing is actually enabled for the domain.
>>>> Do we have such function/macros?
>>> We have iommu_hap_pt_share, and we have the per-domain hap_pt_share flag.
>>>
>>>> It is shared by implementation now.
>>> I don't understand. There's no IOMMU support yet for RISC-V. Hence it's in
>>> neither state - not shared, but also not not shared.
>> In downstream there is a support of IOMMU for RISC-V.
> And there page tables are unconditionally shared? I'll be surprised if no
> want/need for non-shared page tables would ever appear.

At the moment, yes, but it isn't a strict limitation. So yes, the page tables
should be conditionally shared.
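
A minimal sketch of what the clean_pte initialization could then become
(assuming a per-domain predicate along the lines of Arm's iommu_use_hap_pt();
whether/how such a helper appears for RISC-V is still open):

    /*
     * Clean PTE writes only when an enabled IOMMU shares the CPU page
     * tables and cannot walk them coherently.
     */
    p2m->clean_pte = is_iommu_enabled(d) && iommu_use_hap_pt(d) &&
                     !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);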

~ Oleksii