RISC-V's PTE has only two available bits that can be used to store the P2M
type. This is insufficient to represent all the current RISC-V P2M types.
Therefore, some P2M types must be stored outside the PTE bits.
To address this, a metadata table is introduced to store P2M types that
cannot fit in the PTE itself. Not all P2M types are stored in the
metadata table—only those that require it.
The metadata table is linked to the intermediate page table via the
v.md.metadata field of the corresponding intermediate page's
`struct page_info`.
Such pages are allocated with MEMF_no_owner, which allows the v field to be
reused for storing a pointer to the metadata table.
To simplify the allocation and linking of intermediate and metadata page
tables, `p2m_{alloc,free}_table()` functions are implemented.
These changes impact `p2m_split_superpage()`: when a superpage is split and
the entry being split has its P2M type set to `p2m_ext_storage` in its
`P2M_TYPES` bits, the metadata table of the new intermediate page table must
be updated. In addition, the corresponding entry in the metadata for the
original superpage is invalidated.
Also, update p2m_{get,set}_type() to work with P2M types that don't fit
into the PTE bits.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V4:
- Add Suggested-by: Jan Beulich <jbeulich@suse.com>.
- Update the comment above the declaration of the md structure inside struct page_info to:
"Page is used as an intermediate P2M page table".
- Allocate metadata table on demand to save some memory. (1)
- Rework p2m_set_type():
- Add allocation of the metadata page only if needed.
- Move the check of what kind of type is being handled inside p2m_set_type().
- Move mapping of the metadata page inside p2m_get_type(), as it is needed
only when the PTE's type is p2m_ext_storage.
- Add some description to p2m_get_type() function.
- Drop blank after return type of p2m_alloc_table().
- Drop allocation of the metadata page inside p2m_alloc_table() because of (1).
- Fix p2m_free_table() to free metadata page only if it was allocated.
---
Changes in V3:
- Add is_p2m_foreign() macro and connected stuff.
- Change struct domain *d argument of p2m_get_page_from_gfn() to
struct p2m_domain.
- Update the comment above p2m_get_entry().
- s/_t/p2mt for local variable in p2m_get_entry().
- Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn)
to define offsets array.
- Code style fixes.
- Update a check of rc code from p2m_next_level() in p2m_get_entry()
and drop "else" case.
- Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL.
- Use struct p2m_domain instead of struct domain for p2m_lookup() and
p2m_get_page_from_gfn().
- Move definition of get_page() from "xen/riscv: implement mfn_valid() and page reference, ownership handling helpers"
---
Changes in V2:
- New patch.
---
xen/arch/riscv/include/asm/mm.h | 9 ++
xen/arch/riscv/p2m.c | 247 +++++++++++++++++++++++++++-----
2 files changed, 218 insertions(+), 38 deletions(-)
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 1b16809749..1464119b6f 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -149,6 +149,15 @@ struct page_info
/* Order-size of the free chunk this page is the head of. */
unsigned int order;
} free;
+
+ /* Page is used as an intermediate P2M page table */
+ struct {
+ /*
+ * Pointer to a page which store metadata for an intermediate page
+ * table.
+ */
+ struct page_info *metadata;
+ } md;
} v;
union {
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index a5ea61fe61..14809bd089 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -16,6 +16,16 @@
#include <asm/paging.h>
#include <asm/riscv_encoding.h>
+/*
+ * P2M PTE context is used only when a PTE's P2M type is p2m_ext_storage.
+ * In this case, the P2M type is stored separately in the metadata page.
+ */
+struct p2m_pte_ctx {
+ struct page_info *pt_page; /* Page table page containing the PTE. */
+ unsigned int index; /* Index of the PTE within that page. */
+ unsigned int level; /* Paging level at which the PTE resides. */
+};
+
unsigned long __ro_after_init gstage_mode;
unsigned int __ro_after_init gstage_root_level;
@@ -289,24 +299,98 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
return __map_domain_page(p2m->root + root_table_indx);
}
-static int p2m_set_type(pte_t *pte, p2m_type_t t)
+static struct page_info * p2m_alloc_table(struct p2m_domain *p2m);
+
+/*
+ * `pte` – PTE entry for which the type `t` will be stored.
+ *
+ * If `t` is `p2m_ext_storage`, both `ctx` and `p2m` must be provided;
+ * otherwise, they may be NULL.
+ */
+static void p2m_set_type(pte_t *pte, const p2m_type_t t,
+ struct p2m_pte_ctx *ctx,
+ struct p2m_domain *p2m)
{
- int rc = 0;
+ /*
+ * For the root page table (16 KB in size), we need to select the correct
+ * metadata table, since allocations are 4 KB each. In total, there are
+ * 4 tables of 4 KB each.
+ * For a non-root page table, the index into ->pt_page[] will always be 0, as
+ * the index won't be higher than 511. The ASSERT() below verifies that.
+ */
+ struct page_info **md_pg =
+ &ctx->pt_page[ctx->index / PAGETABLE_ENTRIES].v.md.metadata;
+ pte_t *metadata = NULL;
+
+ /* Be sure that an index corresponding to the page level is passed. */
+ ASSERT(ctx->index <= P2M_PAGETABLE_ENTRIES(ctx->level));
+
+ if ( !*md_pg && (t >= p2m_first_external) )
+ {
+ /*
+ * Ensure that when `t` is stored outside the PTE bits
+ * (i.e. `t == p2m_ext_storage` or higher),
+ * both `ctx` and `p2m` are provided.
+ */
+ ASSERT(p2m && ctx);
- if ( t > p2m_first_external )
- panic("unimplemeted\n");
- else
+ if ( ctx->level <= P2M_SUPPORTED_LEVEL_MAPPING )
+ {
+ struct domain *d = p2m->domain;
+
+ *md_pg = p2m_alloc_table(p2m);
+ if ( !*md_pg )
+ {
+ printk("%s: can't allocate extra memory for dom%d\n",
+ __func__, d->domain_id);
+ domain_crash(d);
+ }
+ }
+ else
+ /*
+ * It is not legal to set a type for an entry which shouldn't
+ * be mapped.
+ */
+ ASSERT_UNREACHABLE();
+ }
+
+ if ( *md_pg )
+ metadata = __map_domain_page(*md_pg);
+
+ if ( t < p2m_first_external )
+ {
pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
- return rc;
+ if ( metadata )
+ metadata[ctx->index].pte = p2m_invalid;
+ }
+ else
+ {
+ pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
+
+ metadata[ctx->index].pte = t;
+ }
+
+ if ( metadata )
+ unmap_domain_page(metadata);
}
-static p2m_type_t p2m_get_type(const pte_t pte)
+/*
+ * `pte` -> PTE entry that stores the PTE's type.
+ *
+ * If the PTE's type is `p2m_ext_storage`, `ctx` should be provided;
+ * otherwise it could be NULL.
+ */
+static p2m_type_t p2m_get_type(const pte_t pte, const struct p2m_pte_ctx *ctx)
{
p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);
if ( type == p2m_ext_storage )
- panic("unimplemented\n");
+ {
+ pte_t *md = __map_domain_page(ctx->pt_page->v.md.metadata);
+ type = md[ctx->index].pte;
+ unmap_domain_page(ctx->pt_page->v.md.metadata);
+ }
return type;
}
@@ -381,7 +465,10 @@ static void p2m_set_permission(pte_t *e, p2m_type_t t)
}
}
-static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
+static pte_t p2m_pte_from_mfn(const mfn_t mfn, const p2m_type_t t,
+ struct p2m_pte_ctx *p2m_pte_ctx,
+ const bool is_table,
+ struct p2m_domain *p2m)
{
pte_t e = (pte_t) { PTE_VALID };
@@ -402,7 +489,7 @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
if ( !is_table )
{
p2m_set_permission(&e, t);
- p2m_set_type(&e, t);
+ p2m_set_type(&e, t, p2m_pte_ctx, p2m);
}
else
/*
@@ -421,8 +508,13 @@ static pte_t page_to_p2m_table(const struct page_info *page)
* p2m_invalid will be ignored inside p2m_pte_from_mfn() as is_table is
* set to true and p2m_type_t shouldn't be applied for PTEs which
* describe an intermidiate table.
+ * That is also the reason why the `p2m_pte_ctx` and `p2m` arguments are
+ * NULL here: p2m_pte_from_mfn()'s last arguments are needed only when a
+ * type is to be set, and no type is set for P2M tables.
*/
- return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, true);
+ return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, NULL, true, NULL);
}
static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
@@ -435,22 +527,47 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
return pg;
}
+static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
+
+/*
+ * Allocate a page table with an additional extra page to store
+ * metadata for each entry of the page table.
+ * Link this metadata page to page table page's list field.
+ */
+static struct page_info *p2m_alloc_table(struct p2m_domain *p2m)
+{
+ struct page_info *page_tbl = p2m_alloc_page(p2m);
+
+ if ( !page_tbl )
+ return NULL;
+
+ clear_and_clean_page(page_tbl, p2m->clean_dcache);
+
+ return page_tbl;
+}
+
+/*
+ * Free page table's page and metadata page linked to page table's page.
+ */
+static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg)
+{
+ ASSERT(tbl_pg->v.md.metadata);
+
+ if ( tbl_pg->v.md.metadata )
+ p2m_free_page(p2m, tbl_pg->v.md.metadata);
+ p2m_free_page(p2m, tbl_pg);
+}
+
/*
* Allocate a new page table page with an extra metadata page and hook it
* in via the given entry.
*/
static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry)
{
- struct page_info *page;
+ struct page_info *page = p2m_alloc_table(p2m);
ASSERT(!pte_is_valid(*entry));
- page = p2m_alloc_page(p2m);
- if ( page == NULL )
- return -ENOMEM;
-
- clear_and_clean_page(page, p2m->clean_dcache);
-
p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_dcache);
return 0;
@@ -599,12 +716,14 @@ static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg)
/* Free pte sub-tree behind an entry */
static void p2m_free_subtree(struct p2m_domain *p2m,
- pte_t entry, unsigned int level)
+ pte_t entry,
+ const struct p2m_pte_ctx *p2m_pte_ctx)
{
unsigned int i;
pte_t *table;
mfn_t mfn;
struct page_info *pg;
+ unsigned int level = p2m_pte_ctx->level;
/*
* Check if the level is valid: only 4K - 2M - 1G mappings are supported.
@@ -620,7 +739,7 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
if ( (level == 0) || pte_is_superpage(entry, level) )
{
- p2m_type_t p2mt = p2m_get_type(entry);
+ p2m_type_t p2mt = p2m_get_type(entry, p2m_pte_ctx);
#ifdef CONFIG_IOREQ_SERVER
/*
@@ -629,7 +748,7 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
* has failed (error case).
* So, at worst, the spurious mapcache invalidation might be sent.
*/
- if ( p2m_is_ram(p2m_get_type(p2m, entry)) &&
+ if ( p2m_is_ram(p2mt) &&
domain_has_ioreq_server(p2m->domain) )
ioreq_request_mapcache_invalidate(p2m->domain);
#endif
@@ -639,9 +758,21 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
return;
}
- table = map_domain_page(pte_get_mfn(entry));
+ mfn = pte_get_mfn(entry);
+ ASSERT(mfn_valid(mfn));
+ table = map_domain_page(mfn);
+ pg = mfn_to_page(mfn);
+
for ( i = 0; i < P2M_PAGETABLE_ENTRIES(level); i++ )
- p2m_free_subtree(p2m, table[i], level - 1);
+ {
+ struct p2m_pte_ctx tmp_ctx = {
+ .pt_page = pg,
+ .index = i,
+ .level = level -1
+ };
+
+ p2m_free_subtree(p2m, table[i], &tmp_ctx);
+ }
unmap_domain_page(table);
@@ -653,17 +784,13 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
*/
p2m_tlb_flush_sync(p2m);
- mfn = pte_get_mfn(entry);
- ASSERT(mfn_valid(mfn));
-
- pg = mfn_to_page(mfn);
-
- p2m_free_page(p2m, pg);
+ p2m_free_table(p2m, pg);
}
static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
unsigned int level, unsigned int target,
- const unsigned int *offsets)
+ const unsigned int *offsets,
+ struct page_info *tbl_pg)
{
struct page_info *page;
unsigned long i;
@@ -682,7 +809,7 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
ASSERT(level > target);
ASSERT(pte_is_superpage(*entry, level));
- page = p2m_alloc_page(p2m);
+ page = p2m_alloc_table(p2m);
if ( !page )
{
/*
@@ -707,6 +834,22 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
pte = *entry;
pte_set_mfn(&pte, mfn_add(mfn, i << level_order));
+ if ( MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage )
+ {
+ struct p2m_pte_ctx p2m_pte_ctx = {
+ .pt_page = tbl_pg,
+ .index = offsets[level],
+ };
+
+ p2m_type_t old_type = p2m_get_type(pte, &p2m_pte_ctx);
+
+ p2m_pte_ctx.pt_page = page;
+ p2m_pte_ctx.index = i;
+ p2m_pte_ctx.level = level;
+
+ p2m_set_type(&pte, old_type, &p2m_pte_ctx, p2m);
+ }
+
write_pte(new_entry, pte);
}
@@ -718,7 +861,7 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
*/
if ( next_level != target )
rv = p2m_split_superpage(p2m, table + offsets[next_level],
- level - 1, target, offsets);
+ level - 1, target, offsets, page);
if ( p2m->clean_dcache )
clean_dcache_va_range(table, PAGE_SIZE);
@@ -812,13 +955,21 @@ static int p2m_set_entry(struct p2m_domain *p2m,
{
/* We need to split the original page. */
pte_t split_pte = *entry;
+ struct page_info *tbl_pg = virt_to_page(table);
ASSERT(pte_is_superpage(*entry, level));
- if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
+ if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets,
+ tbl_pg) )
{
+ struct p2m_pte_ctx tmp_ctx = {
+ .pt_page = tbl_pg,
+ .index = offsets[level],
+ .level = level,
+ };
+
/* Free the allocated sub-tree */
- p2m_free_subtree(p2m, split_pte, level);
+ p2m_free_subtree(p2m, split_pte, &tmp_ctx);
rc = -ENOMEM;
goto out;
@@ -856,7 +1007,13 @@ static int p2m_set_entry(struct p2m_domain *p2m,
p2m_clean_pte(entry, p2m->clean_dcache);
else
{
- pte_t pte = p2m_pte_from_mfn(mfn, t, false);
+ struct p2m_pte_ctx tmp_ctx = {
+ .pt_page = virt_to_page(table),
+ .index = offsets[level],
+ .level = level,
+ };
+
+ pte_t pte = p2m_pte_from_mfn(mfn, t, &tmp_ctx, false, p2m);
p2m_write_pte(entry, pte, p2m->clean_dcache);
@@ -892,7 +1049,15 @@ static int p2m_set_entry(struct p2m_domain *p2m,
if ( pte_is_valid(orig_pte) &&
(!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ||
(removing_mapping && mfn_eq(pte_get_mfn(*entry), _mfn(0)))) )
- p2m_free_subtree(p2m, orig_pte, level);
+ {
+ struct p2m_pte_ctx tmp_ctx = {
+ .pt_page = virt_to_page(table),
+ .index = offsets[level],
+ .level = level,
+ };
+
+ p2m_free_subtree(p2m, orig_pte, &tmp_ctx);
+ }
out:
unmap_domain_page(table);
@@ -979,7 +1144,6 @@ int map_regions_p2mt(struct domain *d,
return rc;
}
-
/*
* p2m_get_entry() should always return the correct order value, even if an
* entry is not present (i.e. the GFN is outside the range):
@@ -1082,7 +1246,14 @@ static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
if ( pte_is_valid(entry) )
{
if ( t )
- *t = p2m_get_type(entry);
+ {
+ struct p2m_pte_ctx p2m_pte_ctx = {
+ .pt_page = virt_to_page(table),
+ .index = offsets[level],
+ };
+
+ *t = p2m_get_type(entry,&p2m_pte_ctx);
+ }
mfn = pte_get_mfn(entry);
/*
--
2.51.0
On 17.09.2025 23:55, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -149,6 +149,15 @@ struct page_info
> /* Order-size of the free chunk this page is the head of. */
> unsigned int order;
> } free;
> +
> + /* Page is used as an intermediate P2M page table */
> + struct {
> + /*
> + * Pointer to a page which store metadata for an intermediate page
> + * table.
> + */
> + struct page_info *metadata;
Any reason for this to not be "page" or "pg"? The metadata aspect is already
covered ...
> + } md;
... by the "md" here.
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -16,6 +16,16 @@
> #include <asm/paging.h>
> #include <asm/riscv_encoding.h>
>
> +/*
> + * P2M PTE context is used only when a PTE's P2M type is p2m_ext_storage.
> + * In this case, the P2M type is stored separately in the metadata page.
> + */
> +struct p2m_pte_ctx {
> + struct page_info *pt_page; /* Page table page containing the PTE. */
> + unsigned int index; /* Index of the PTE within that page. */
> + unsigned int level; /* Paging level at which the PTE resides. */
> +};
> +
> unsigned long __ro_after_init gstage_mode;
> unsigned int __ro_after_init gstage_root_level;
>
> @@ -289,24 +299,98 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
> return __map_domain_page(p2m->root + root_table_indx);
> }
>
> -static int p2m_set_type(pte_t *pte, p2m_type_t t)
> +static struct page_info * p2m_alloc_table(struct p2m_domain *p2m);
Nit: Stray blank again.
> +/*
> + * `pte` – PTE entry for which the type `t` will be stored.
> + *
> + * If `t` is `p2m_ext_storage`, both `ctx` and `p2m` must be provided;
> + * otherwise, they may be NULL.
> + */
> +static void p2m_set_type(pte_t *pte, const p2m_type_t t,
> + struct p2m_pte_ctx *ctx,
> + struct p2m_domain *p2m)
> {
> - int rc = 0;
> + /*
> + * For the root page table (16 KB in size), we need to select the correct
> + * metadata table, since allocations are 4 KB each. In total, there are
> + * 4 tables of 4 KB each.
> + * For a non-root page table, the index into ->pt_page[] will always be 0, as
> + * the index won't be higher than 511. The ASSERT() below verifies that.
> + */
> + struct page_info **md_pg =
> + &ctx->pt_page[ctx->index / PAGETABLE_ENTRIES].v.md.metadata;
> + pte_t *metadata = NULL;
> +
> + /* Be sure that an index corresponding to the page level is passed. */
> + ASSERT(ctx->index <= P2M_PAGETABLE_ENTRIES(ctx->level));
Doesn't this need to be < ?
> + if ( !*md_pg && (t >= p2m_first_external) )
> + {
> + /*
> + * Ensure that when `t` is stored outside the PTE bits
> + * (i.e. `t == p2m_ext_storage` or higher),
> + * both `ctx` and `p2m` are provided.
> + */
> + ASSERT(p2m && ctx);
Imo this would want to be checked whenever t > p2m_first_external, no
matter whether a metadata page was already allocated.
> - if ( t > p2m_first_external )
> - panic("unimplemeted\n");
> - else
> + if ( ctx->level <= P2M_SUPPORTED_LEVEL_MAPPING )
> + {
> + struct domain *d = p2m->domain;
> +
> + *md_pg = p2m_alloc_table(p2m);
> + if ( !*md_pg )
> + {
> + printk("%s: can't allocate extra memory for dom%d\n",
> + __func__, d->domain_id);
> + domain_crash(d);
> + }
> + }
> + else
> + /*
> + * It is not legal to set a type for an entry which shouldn't
> + * be mapped.
> + */
> + ASSERT_UNREACHABLE();
Something not being legal doesn't mean it can't happen. Imo in this case
BUG_ON() (in place of the if() above) would be better.
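I.e. (just a sketch), with the if() / else above dropped:

    BUG_ON(ctx->level > P2M_SUPPORTED_LEVEL_MAPPING);

    *md_pg = p2m_alloc_table(p2m);
    ...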
> + }
> +
> + if ( *md_pg )
> + metadata = __map_domain_page(*md_pg);
> +
> + if ( t < p2m_first_external )
> + {
> pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
>
> - return rc;
> + if ( metadata )
> + metadata[ctx->index].pte = p2m_invalid;
> + }
> + else
> + {
> + pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
> +
> + metadata[ctx->index].pte = t;
Afaict metadata can still be NULL when you get here.
> + }
> +
> + if ( metadata )
> + unmap_domain_page(metadata);
According to the x86 implementation, passing NULL here ought to be fine,
so no if() needed.
> }
>
> -static p2m_type_t p2m_get_type(const pte_t pte)
> +/*
> + * `pte` -> PTE entry that stores the PTE's type.
> + *
> + * If the PTE's type is `p2m_ext_storage`, `ctx` should be provided;
> + * otherwise it could be NULL.
> + */
> +static p2m_type_t p2m_get_type(const pte_t pte, const struct p2m_pte_ctx *ctx)
> {
> p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);
>
> if ( type == p2m_ext_storage )
> - panic("unimplemented\n");
> + {
> + pte_t *md = __map_domain_page(ctx->pt_page->v.md.metadata);
Pointer-to-const?
> + type = md[ctx->index].pte;
> + unmap_domain_page(ctx->pt_page->v.md.metadata);
I'm pretty sure you want to pass md here, not the pointer you passed
into __map_domain_page().
> @@ -381,7 +465,10 @@ static void p2m_set_permission(pte_t *e, p2m_type_t t)
> }
> }
>
> -static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
> +static pte_t p2m_pte_from_mfn(const mfn_t mfn, const p2m_type_t t,
> + struct p2m_pte_ctx *p2m_pte_ctx,
> + const bool is_table,
Do you really need both "is_table" and the context pointer? Couldn't
the "is intermediate page table" case be identified by a NULL context
and/or p2m pointer?
Also why "const" all of the sudden?
> @@ -435,22 +527,47 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
> return pg;
> }
>
> +static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
> +
> +/*
> + * Allocate a page table with an additional extra page to store
> + * metadata for each entry of the page table.
Isn't this stale now? At which point the question is whether ...
> + * Link this metadata page to page table page's list field.
> + */
> +static struct page_info *p2m_alloc_table(struct p2m_domain *p2m)
> +{
> + struct page_info *page_tbl = p2m_alloc_page(p2m);
> +
> + if ( !page_tbl )
> + return NULL;
> +
> + clear_and_clean_page(page_tbl, p2m->clean_dcache);
> +
> + return page_tbl;
> +}
... the function is needed in the first place.
> +/*
> + * Free page table's page and metadata page linked to page table's page.
> + */
> +static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg)
> +{
> + ASSERT(tbl_pg->v.md.metadata);
Why, when you no longer unconditionally alloc that page?
> + if ( tbl_pg->v.md.metadata )
> + p2m_free_page(p2m, tbl_pg->v.md.metadata);
> + p2m_free_page(p2m, tbl_pg);
> +}
> +
> /*
> * Allocate a new page table page with an extra metadata page and hook it
> * in via the given entry.
> */
This comment looks to have been inapplicable already when it was introduced.
> static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry)
> {
> - struct page_info *page;
> + struct page_info *page = p2m_alloc_table(p2m);
>
> ASSERT(!pte_is_valid(*entry));
>
> - page = p2m_alloc_page(p2m);
> - if ( page == NULL )
> - return -ENOMEM;
> -
> - clear_and_clean_page(page, p2m->clean_dcache);
> -
> p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_dcache);
>
> return 0;
As per above I don't think any change is needed here.
> @@ -629,7 +748,7 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
> * has failed (error case).
> * So, at worst, the spurious mapcache invalidation might be sent.
> */
> - if ( p2m_is_ram(p2m_get_type(p2m, entry)) &&
> + if ( p2m_is_ram(p2mt) &&
> domain_has_ioreq_server(p2m->domain) )
> ioreq_request_mapcache_invalidate(p2m->domain);
> #endif
This change wants making right in the earlier patch, where "p2mt" is
being introduced.
> @@ -639,9 +758,21 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
> return;
> }
>
> - table = map_domain_page(pte_get_mfn(entry));
> + mfn = pte_get_mfn(entry);
> + ASSERT(mfn_valid(mfn));
> + table = map_domain_page(mfn);
> + pg = mfn_to_page(mfn);
> +
> for ( i = 0; i < P2M_PAGETABLE_ENTRIES(level); i++ )
> - p2m_free_subtree(p2m, table[i], level - 1);
> + {
> + struct p2m_pte_ctx tmp_ctx = {
> + .pt_page = pg,
> + .index = i,
> + .level = level -1
Nit: Missing blank after - . Also it is generally better to end such
initialization with a comma.
> @@ -707,6 +834,22 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
> pte = *entry;
> pte_set_mfn(&pte, mfn_add(mfn, i << level_order));
>
> + if ( MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage )
> + {
> + struct p2m_pte_ctx p2m_pte_ctx = {
> + .pt_page = tbl_pg,
> + .index = offsets[level],
> + };
Assuming using "level" is correct here (which it looks like it is), ...
> + p2m_type_t old_type = p2m_get_type(pte, &p2m_pte_ctx);
... can't this move ahead of the loop?
> + p2m_pte_ctx.pt_page = page;
> + p2m_pte_ctx.index = i;
> + p2m_pte_ctx.level = level;
Whereas - doesn't this need to be "next_level"?
> @@ -718,7 +861,7 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
> */
> if ( next_level != target )
> rv = p2m_split_superpage(p2m, table + offsets[next_level],
> - level - 1, target, offsets);
> + level - 1, target, offsets, page);
And btw (already in the earlier patch introducing this code) - why isn't
it "next_level" here, instead of "level - 1" (if already you have that
variable)?
> @@ -812,13 +955,21 @@ static int p2m_set_entry(struct p2m_domain *p2m,
> {
> /* We need to split the original page. */
> pte_t split_pte = *entry;
> + struct page_info *tbl_pg = virt_to_page(table);
This isn't valid on a pointer obtained from map_domain_page().
> @@ -892,7 +1049,15 @@ static int p2m_set_entry(struct p2m_domain *p2m,
> if ( pte_is_valid(orig_pte) &&
> (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ||
> (removing_mapping && mfn_eq(pte_get_mfn(*entry), _mfn(0)))) )
> - p2m_free_subtree(p2m, orig_pte, level);
> + {
> + struct p2m_pte_ctx tmp_ctx = {
> + .pt_page = virt_to_page(table),
> + .index = offsets[level],
> + .level = level,
Nit: Indentation is off here.
> @@ -1082,7 +1246,14 @@ static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
> if ( pte_is_valid(entry) )
> {
> if ( t )
> - *t = p2m_get_type(entry);
> + {
> + struct p2m_pte_ctx p2m_pte_ctx = {
> + .pt_page = virt_to_page(table),
> + .index = offsets[level],
> + };
> +
> + *t = p2m_get_type(entry,&p2m_pte_ctx);
Nit: Blank after comma please.
Jan
On 9/23/25 12:41 AM, Jan Beulich wrote:
> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>
>> +/*
>> + * `pte` – PTE entry for which the type `t` will be stored.
>> + *
>> + * If `t` is `p2m_ext_storage`, both `ctx` and `p2m` must be provided;
>> + * otherwise, they may be NULL.
>> + */
>> +static void p2m_set_type(pte_t *pte, const p2m_type_t t,
>> + struct p2m_pte_ctx *ctx,
>> + struct p2m_domain *p2m)
>> {
>> - int rc = 0;
>> + /*
>> + * For the root page table (16 KB in size), we need to select the correct
>> + * metadata table, since allocations are 4 KB each. In total, there are
>> + * 4 tables of 4 KB each.
>> + * For a non-root page table, the index into ->pt_page[] will always be 0, as
>> + * the index won't be higher than 511. The ASSERT() below verifies that.
>> + */
>> + struct page_info **md_pg =
>> + &ctx->pt_page[ctx->index / PAGETABLE_ENTRIES].v.md.metadata;
>> + pte_t *metadata = NULL;
>> +
>> + /* Be sure that an index corresponding to the page level is passed. */
>> + ASSERT(ctx->index <= P2M_PAGETABLE_ENTRIES(ctx->level));
> Doesn't this need to be < ?
Yeah, it should be <.
>
>> + if ( !*md_pg && (t >= p2m_first_external) )
>> + {
>> + /*
>> + * Ensure that when `t` is stored outside the PTE bits
>> + * (i.e. `t == p2m_ext_storage` or higher),
>> + * both `ctx` and `p2m` are provided.
>> + */
>> + ASSERT(p2m && ctx);
> Imo this would want to be checked whenever t > p2m_first_external, no
> matter whether a metadata page was already allocated.
I think that ctx should be checked before this if condition, since it is
used to obtain the proper metadata page.
The check for p2m can remain inside the if condition, as it is essentially
only needed for allocating a metadata page.
>
>> - if ( t > p2m_first_external )
>> - panic("unimplemeted\n");
>> - else
>> + if ( ctx->level <= P2M_SUPPORTED_LEVEL_MAPPING )
>> + {
>> + struct domain *d = p2m->domain;
>> +
>> + *md_pg = p2m_alloc_table(p2m);
>> + if ( !*md_pg )
>> + {
>> + printk("%s: can't allocate extra memory for dom%d\n",
>> + __func__, d->domain_id);
>> + domain_crash(d);
>> + }
>> + }
>> + else
>> + /*
>> + * It is not legal to set a type for an entry which shouldn't
>> + * be mapped.
>> + */
>> + ASSERT_UNREACHABLE();
> Something not being legal doesn't mean it can't happen. Imo in this case
> BUG_ON() (in place of the if() above) would be better.
>
>> + }
>> +
>> + if ( *md_pg )
>> + metadata = __map_domain_page(*md_pg);
>> +
>> + if ( t < p2m_first_external )
>> + {
>> pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
>>
>> - return rc;
>> + if ( metadata )
>> + metadata[ctx->index].pte = p2m_invalid;
>> + }
>> + else
>> + {
>> + pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
>> +
>> + metadata[ctx->index].pte = t;
> Afaict metadata can still be NULL when you get here.
It shouldn't be, because when this line is executed, the metadata page already
exists or was allocated at the start of p2m_set_type().
>
>> + }
>> +
>> + if ( metadata )
>> + unmap_domain_page(metadata);
> According to the x86 implementation, passing NULL here ought to be fine,
> so no if() needed.
With the current implementation for RISC-V (CONFIG_ARCH_MAP_DOMAIN_PAGE=n, so
unmap_domain_page() does nothing), it is fine too.
>
>> }
>>
>> -static p2m_type_t p2m_get_type(const pte_t pte)
>> +/*
>> + * `pte` -> PTE entry that stores the PTE's type.
>> + *
>> + * If the PTE's type is `p2m_ext_storage`, `ctx` should be provided;
>> + * otherwise it could be NULL.
>> + */
>> +static p2m_type_t p2m_get_type(const pte_t pte, const struct p2m_pte_ctx *ctx)
>> {
>> p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);
>>
>> if ( type == p2m_ext_storage )
>> - panic("unimplemented\n");
>> + {
>> + pte_t *md = __map_domain_page(ctx->pt_page->v.md.metadata);
> Pointer-to-const?
>
>> + type = md[ctx->index].pte;
>> + unmap_domain_page(ctx->pt_page->v.md.metadata);
> I'm pretty sure you want to pass md here, not the pointer you passed
> into __map_domain_page().
Oh, right. It should be `md`.
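I.e. (a sketch, with the pointer-to-const change folded in as well):

    if ( type == p2m_ext_storage )
    {
        const pte_t *md = __map_domain_page(ctx->pt_page->v.md.metadata);

        type = md[ctx->index].pte;
        unmap_domain_page(md);
    }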
>
>> @@ -381,7 +465,10 @@ static void p2m_set_permission(pte_t *e, p2m_type_t t)
>> }
>> }
>>
>> -static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
>> +static pte_t p2m_pte_from_mfn(const mfn_t mfn, const p2m_type_t t,
>> + struct p2m_pte_ctx *p2m_pte_ctx,
>> + const bool is_table,
> Do you really need both "is_table" and the context pointer? Couldn't
> the "is intermediate page table" case be identified by a NULL context
> and/or p2m pointer?
Good point. I will drop is_table.
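Something like this, then (just a sketch; the PTE attribute setup in the
middle of the function is elided):

    static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t,
                                  struct p2m_pte_ctx *ctx,
                                  struct p2m_domain *p2m)
    {
        pte_t e = (pte_t) { PTE_VALID };

        pte_set_mfn(&e, mfn);

        /* ... other PTE attribute setup elided ... */

        /* NULL ctx/p2m now identifies a PTE referencing an intermediate table. */
        if ( ctx && p2m )
        {
            p2m_set_permission(&e, t);
            p2m_set_type(&e, t, ctx, p2m);
        }

        return e;
    }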
>
> Also why "const" all of the sudden?
Because none of that is going to be changed in p2m_pte_from_mfn(). To keep
the diff clearer, I can revert these changes.
>
>> @@ -435,22 +527,47 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
>> return pg;
>> }
>>
>> + * Link this metadata page to page table page's list field.
>> + */
>> +static struct page_info *p2m_alloc_table(struct p2m_domain *p2m)
>> +{
>> + struct page_info *page_tbl = p2m_alloc_page(p2m);
>> +
>> + if ( !page_tbl )
>> + return NULL;
>> +
>> + clear_and_clean_page(page_tbl, p2m->clean_dcache);
>> +
>> + return page_tbl;
>> +}
> ... the function is needed in the first place.
On one hand, it may not seem strictly necessary, but on the
other hand, without it we would need to repeat the pattern of
allocating, clearing, and cleaning a page each time a page table
is allocated. At the moment, I prefer to keep it.
But considering another of your comments below ...
>
>> +/*
>> + * Free page table's page and metadata page linked to page table's page.
>> + */
>> +static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg)
>> +{
>> + ASSERT(tbl_pg->v.md.metadata);
> Why, when you no longer unconditionally alloc that page?
Agreed, there is no need for this ASSERT(), as "lazy allocation" is used
for the metadata.
>> static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry)
>> {
>> - struct page_info *page;
>> + struct page_info *page = p2m_alloc_table(p2m);
>>
>> ASSERT(!pte_is_valid(*entry));
>>
>> - page = p2m_alloc_page(p2m);
>> - if ( page == NULL )
>> - return -ENOMEM;
>> -
>> - clear_and_clean_page(page, p2m->clean_dcache);
>> -
>> p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_dcache);
>>
>> return 0;
> As per above I don't think any change is needed here.
There are some places in the code where it isn’t necessary to immediately
write the address of a newly allocated page table page into a PTE:
- During superpage splitting: a new page is first allocated for the new
page table, then it is filled, and only afterwards is the PTE updated
with the new page table address.
- In p2m_set_type(): when a table is allocated for storing metadata
(although I think p2m_alloc_page() would work fine here as well),
there is no need to update any PTE correspondingly.
...
So, I think I can agree that p2m_alloc_table() isn’t really needed.
It should be sufficient to move the clear_and_clean_page(page_tbl, p2m->clean_dcache)
call from p2m_alloc_table() into p2m_alloc_page(), and then just use
p2m_alloc_page() everywhere.
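Roughly (a sketch; the existing allocation logic of p2m_alloc_page() is
elided and stays as is):

    static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
    {
        struct page_info *pg;

        /* ... existing allocation from the P2M pool; may yield NULL ... */

        if ( pg )
            clear_and_clean_page(pg, p2m->clean_dcache);

        return pg;
    }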
Does the last paragraph make sense?
>> @@ -707,6 +834,22 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
>> pte = *entry;
>> pte_set_mfn(&pte, mfn_add(mfn, i << level_order));
>>
>> + if ( MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage )
>> + {
>> + struct p2m_pte_ctx p2m_pte_ctx = {
>> + .pt_page = tbl_pg,
>> + .index = offsets[level],
>> + };
> Assuming using "level" is correct here (which it looks like it is), ...
>
>> + p2m_type_t old_type = p2m_get_type(pte, &p2m_pte_ctx);
> ... can't this move ahead of the loop?
Considering that old_type is expected to be the same for all new PTEs, I think
we can move that ahead of the loop. I'll do that.
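E.g. (a sketch; the p2m_ext_storage check is hoisted together with the
lookup, and tbl_pg / offsets are as in this patch):

    p2m_type_t old_type = p2m_invalid;

    /* The old type is the same for every new PTE, so resolve it once. */
    if ( MASK_EXTR(entry->pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage )
    {
        struct p2m_pte_ctx old_ctx = {
            .pt_page = tbl_pg,
            .index = offsets[level],
        };

        old_type = p2m_get_type(*entry, &old_ctx);
    }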
>
>> + p2m_pte_ctx.pt_page = page;
>> + p2m_pte_ctx.index = i;
>> + p2m_pte_ctx.level = level;
> Whereas - doesn't this need to be "next_level"?
Yes, it should be next_level.
>
>> @@ -718,7 +861,7 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
>> */
>> if ( next_level != target )
>> rv = p2m_split_superpage(p2m, table + offsets[next_level],
>> - level - 1, target, offsets);
>> + level - 1, target, offsets, page);
> And btw (alredy in the earlier patch introducing this code) - why isn't
> it "next_level" here, instead of "level - 1" (if already you have that
> variable)?
I missed updating that part; next_level should be used instead of level - 1.
>
>> @@ -812,13 +955,21 @@ static int p2m_set_entry(struct p2m_domain *p2m,
>> {
>> /* We need to split the original page. */
>> pte_t split_pte = *entry;
>> + struct page_info *tbl_pg = virt_to_page(table);
> This isn't valid on a pointer obtained from map_domain_page().
Oh, sure — virt_to_page() and page_to_virt() should be used only for Xen
heap addresses.
By the way, do we have any documentation, comments, or notes describing
what should be allocated and from where?
Since map_domain_page() returns an address from the direct map region,
should we instead use maddr_to_page(virt_to_maddr(table))?
Thanks for review.
~ Oleksii
On 01.10.2025 18:00, Oleksii Kurochko wrote:
> On 9/23/25 12:41 AM, Jan Beulich wrote:
>> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>>
>>> +/*
>>> + * `pte` – PTE entry for which the type `t` will be stored.
>>> + *
>>> + * If `t` is `p2m_ext_storage`, both `ctx` and `p2m` must be provided;
>>> + * otherwise, they may be NULL.
>>> + */
>>> +static void p2m_set_type(pte_t *pte, const p2m_type_t t,
>>> + struct p2m_pte_ctx *ctx,
>>> + struct p2m_domain *p2m)
>>> {
>>> - int rc = 0;
>>> + /*
>>> + * For the root page table (16 KB in size), we need to select the correct
>>> + * metadata table, since allocations are 4 KB each. In total, there are
>>> + * 4 tables of 4 KB each.
>>> + * For a non-root page table, the index into ->pt_page[] will always be 0, as
>>> + * the index won't be higher than 511. The ASSERT() below verifies that.
>>> + */
>>> + struct page_info **md_pg =
>>> + &ctx->pt_page[ctx->index / PAGETABLE_ENTRIES].v.md.metadata;
>>> + pte_t *metadata = NULL;
>>> +
>>> + /* Be sure that an index corresponding to the page level is passed. */
>>> + ASSERT(ctx->index <= P2M_PAGETABLE_ENTRIES(ctx->level));
>> Doesn't this need to be < ?
>
> Yeah, it should be <.
>
>>
>>> + if ( !*md_pg && (t >= p2m_first_external) )
>>> + {
>>> + /*
>>> + * Ensure that when `t` is stored outside the PTE bits
>>> + * (i.e. `t == p2m_ext_storage` or higher),
>>> + * both `ctx` and `p2m` are provided.
>>> + */
>>> + ASSERT(p2m && ctx);
>> Imo this would want to be checked whenever t > p2m_first_external, no
>> matter whether a metadata page was already allocated.
>
> I think that ctx should be checked before this if condition, since it is
> used to obtain the proper metadata page.
>
> The check for p2m can remain inside the if condition, as it is essentially
> only needed for allocating a metadata page.
That is, you want to allow callers to pass in NULL for the "p2m" parameter?
Isn't this going to be risky?
>>> - if ( t > p2m_first_external )
>>> - panic("unimplemeted\n");
>>> - else
>>> + if ( ctx->level <= P2M_SUPPORTED_LEVEL_MAPPING )
>>> + {
>>> + struct domain *d = p2m->domain;
>>> +
>>> + *md_pg = p2m_alloc_table(p2m);
>>> + if ( !*md_pg )
>>> + {
>>> + printk("%s: can't allocate extra memory for dom%d\n",
>>> + __func__, d->domain_id);
>>> + domain_crash(d);
>>> + }
>>> + }
>>> + else
>>> + /*
>>> + * It is not legal to set a type for an entry which shouldn't
>>> + * be mapped.
>>> + */
>>> + ASSERT_UNREACHABLE();
>> Something not being legal doesn't mean it can't happen. Imo in this case
>> BUG_ON() (in place of the if() above) would be better.
>>
>>> + }
>>> +
>>> + if ( *md_pg )
>>> + metadata = __map_domain_page(*md_pg);
Note this conditional assignment for ...
>>> + if ( t < p2m_first_external )
>>> + {
>>> pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
>>>
>>> - return rc;
>>> + if ( metadata )
>>> + metadata[ctx->index].pte = p2m_invalid;
>>> + }
>>> + else
>>> + {
>>> + pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
>>> +
>>> + metadata[ctx->index].pte = t;
>> Afaict metadata can still be NULL when you get here.
>
> It shouldn't be, because when this line is executed, the metadata page already
> exists or was allocated at the start of p2m_set_type().
... this reply of yours. And the condition there can be false, in case you
took the domain_crash() path.
>>> @@ -812,13 +955,21 @@ static int p2m_set_entry(struct p2m_domain *p2m,
>>> {
>>> /* We need to split the original page. */
>>> pte_t split_pte = *entry;
>>> + struct page_info *tbl_pg = virt_to_page(table);
>> This isn't valid on a pointer obtained from map_domain_page().
>
> Oh, sure — virt_to_page() and page_to_virt() should be used only for Xen
> heap addresses.
>
> By the way, do we have any documentation, comments, or notes describing
> what should be allocated and from where?
>
> Since map_domain_page() returns an address from the direct map region,
> should we instead use maddr_to_page(virt_to_maddr(table))?
How would that be any better? Even if right now you only build RISC-V code
with map_domain_page() having a trivial expansion, you will want to avoid
any assumptions along those lines. Or else you could avoid the use of that
abstraction altogether. It exists so when you need to support memory
amounts beyond what the directmap can cover, you can provide a suitable
implementation of the function and be done. You (or whoever else) in
particular shouldn't be required to go audit all the places where
map_domain_page() (and the pointers it returns) is (are) used.
Jan
On 10/7/25 3:25 PM, Jan Beulich wrote:
> On 01.10.2025 18:00, Oleksii Kurochko wrote:
>> On 9/23/25 12:41 AM, Jan Beulich wrote:
>>> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>>>
>>>> +/*
>>>> + * `pte` – PTE entry for which the type `t` will be stored.
>>>> + *
>>>> + * If `t` is `p2m_ext_storage`, both `ctx` and `p2m` must be provided;
>>>> + * otherwise, they may be NULL.
>>>> + */
>>>> +static void p2m_set_type(pte_t *pte, const p2m_type_t t,
>>>> + struct p2m_pte_ctx *ctx,
>>>> + struct p2m_domain *p2m)
>>>> {
>>>> - int rc = 0;
>>>> + /*
>>>> + * For the root page table (16 KB in size), we need to select the correct
>>>> + * metadata table, since allocations are 4 KB each. In total, there are
>>>> + * 4 tables of 4 KB each.
>>>> + * For a non-root page table, the index into ->pt_page[] will always be 0, as
>>>> + * the index won't be higher than 511. The ASSERT() below verifies that.
>>>> + */
>>>> + struct page_info **md_pg =
>>>> + &ctx->pt_page[ctx->index / PAGETABLE_ENTRIES].v.md.metadata;
>>>> + pte_t *metadata = NULL;
>>>> +
>>>> + /* Be sure that an index corresponding to the page level is passed. */
>>>> + ASSERT(ctx->index <= P2M_PAGETABLE_ENTRIES(ctx->level));
>>> Doesn't this need to be < ?
>> Yeah, it should be <.
>>
>>>> + if ( !*md_pg && (t >= p2m_first_external) )
>>>> + {
>>>> + /*
>>>> + * Ensure that when `t` is stored outside the PTE bits
>>>> + * (i.e. `t == p2m_ext_storage` or higher),
>>>> + * both `ctx` and `p2m` are provided.
>>>> + */
>>>> + ASSERT(p2m && ctx);
>>> Imo this would want to be checked whenever t > p2m_first_external, no
>>> matter whether a metadata page was already allocated.
>> I think that ctx should be checked before this if condition, since it is
>> used to obtain the proper metadata page.
>>
>> The check for p2m can remain inside the if condition, as it is essentially
>> only needed for allocating a metadata page.
> That is, you want to allow callers to pass in NULL for the "p2m" parameter?
> Isn't this going to be risky?
With the current implementation it is not risky; initially I thought that
NULL could be passed for p2m for the types stored within the PTE, as the
p2m argument isn't really needed in that case.
But just to be sure that nothing gets broken by future changes, let's move
ASSERT(p2m) to the top of the function.
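I.e. (sketch):

    static void p2m_set_type(pte_t *pte, p2m_type_t t,
                             struct p2m_pte_ctx *ctx,
                             struct p2m_domain *p2m)
    {
        struct page_info **md_pg;
        pte_t *metadata = NULL;

        /* ctx is dereferenced right below; p2m is needed when allocating. */
        ASSERT(p2m && ctx);

        md_pg = &ctx->pt_page[ctx->index / PAGETABLE_ENTRIES].v.md.metadata;

        /* ... rest as before ... */
    }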
>
>>>> - if ( t > p2m_first_external )
>>>> - panic("unimplemeted\n");
>>>> - else
>>>> + if ( ctx->level <= P2M_SUPPORTED_LEVEL_MAPPING )
>>>> + {
>>>> + struct domain *d = p2m->domain;
>>>> +
>>>> + *md_pg = p2m_alloc_table(p2m);
>>>> + if ( !*md_pg )
>>>> + {
>>>> + printk("%s: can't allocate extra memory for dom%d\n",
>>>> + __func__, d->domain_id);
>>>> + domain_crash(d);
>>>> + }
>>>> + }
>>>> + else
>>>> + /*
>>>> + * It is not legal to set a type for an entry which shouldn't
>>>> + * be mapped.
>>>> + */
>>>> + ASSERT_UNREACHABLE();
>>> Something not being legal doesn't mean it can't happen. Imo in this case
>>> BUG_ON() (in place of the if() above) would be better.
>>>
>>>> + }
>>>> +
>>>> + if ( *md_pg )
>>>> + metadata = __map_domain_page(*md_pg);
> Note this conditional assignment for ...
>
>>>> + if ( t < p2m_first_external )
>>>> + {
>>>> pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
>>>>
>>>> - return rc;
>>>> + if ( metadata )
>>>> + metadata[ctx->index].pte = p2m_invalid;
>>>> + }
>>>> + else
>>>> + {
>>>> + pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
>>>> +
>>>> + metadata[ctx->index].pte = t;
>>> Afaict metadata can still be NULL when you get here.
>> It shouldn't be, because when this line is executed, the metadata page already
>> exists or was allocated at the start of p2m_set_type().
> ... this reply of yours. And the condition there can be false, in case you
> took the domain_crash() path.
Oh, right, for some reason, I thought we didn’t return from domain_crash().
I’m curious whether calling domain_crash() might break something, as some useful
data could be freed and negatively affect the internals of map_regions_p2mt().

It might make more sense to use panic() here instead.
Do you have any thoughts or suggestions on this?
>
>>>> @@ -812,13 +955,21 @@ static int p2m_set_entry(struct p2m_domain *p2m,
>>>> {
>>>> /* We need to split the original page. */
>>>> pte_t split_pte = *entry;
>>>> + struct page_info *tbl_pg = virt_to_page(table);
>>> This isn't valid on a pointer obtained from map_domain_page().
>> Oh, sure — virt_to_page() and page_to_virt() should be used only for Xen
>> heap addresses.
>>
>> By the way, do we have any documentation, comments, or notes describing
>> what should be allocated and from where?
>>
>> Since map_domain_page() returns an address from the direct map region,
>> should we instead use maddr_to_page(virt_to_maddr(table))?
> How would that be any better? Even if right now you only build RISC-V code
> with map_domain_page() having a trivial expansion, you will want to avoid
> any assumptions along those lines. Or else you could avoid the use of that
> abstraction altogether. It exists so when you need to support memory
> amounts beyond what the directmap can cover, you can provide a suitable
> implementation of the function and be done. You (or whoever else) in
> particular shouldn't be required to go audit all the places where
> map_domain_page() (and the pointers it returns) is (are) used.
Then domain_page_map_to_mfn() is the appropriate function to use, and to
get a page, mfn_to_page(domain_page_map_to_mfn(virt)) should be called.
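I.e., for the p2m_set_entry() case above, something like (assuming `table`
was obtained from map_domain_page()):

    struct page_info *tbl_pg = mfn_to_page(domain_page_map_to_mfn(table));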
Thanks.
~ Oleksii
On 09.10.2025 13:34, Oleksii Kurochko wrote:
> On 10/7/25 3:25 PM, Jan Beulich wrote:
>> On 01.10.2025 18:00, Oleksii Kurochko wrote:
>>> On 9/23/25 12:41 AM, Jan Beulich wrote:
>>>> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>>>>> + if ( *md_pg )
>>>>> + metadata = __map_domain_page(*md_pg);
>> Note this conditional assignment for ...
>>
>>>>> + if ( t < p2m_first_external )
>>>>> + {
>>>>> pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
>>>>>
>>>>> - return rc;
>>>>> + if ( metadata )
>>>>> + metadata[ctx->index].pte = p2m_invalid;
>>>>> + }
>>>>> + else
>>>>> + {
>>>>> + pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
>>>>> +
>>>>> + metadata[ctx->index].pte = t;
>>>> Afaict metadata can still be NULL when you get here.
>>> It shouldn't be, because when this line is executed, the metadata page already
>>> exists or was allocated at the start of p2m_set_type().
>> ... this reply of yours. And the condition there can be false, in case you
>> took the domain_crash() path.
>
> Oh, right, for some reason, I thought we didn’t return from domain_crash().
> I’m curious whether calling domain_crash() might break something, as some useful
> data could be freed and negatively affect the internals of map_regions_p2mt().
>
> It might make more sense to use panic() here instead.
> Do you have any thoughts or suggestions on this?
domain_crash() is generally preferable over crashing the system as a whole.
I don't follow what negative effects you're alluding to. Did you look at
what domain_crash() does? It doesn't start tearing down the domain, that'll
still need invoking from the toolstack. A crashed domain will stay around
with all its resources allocated.
Jan
On 10/9/25 2:10 PM, Jan Beulich wrote:
> On 09.10.2025 13:34, Oleksii Kurochko wrote:
>> On 10/7/25 3:25 PM, Jan Beulich wrote:
>>> On 01.10.2025 18:00, Oleksii Kurochko wrote:
>>>> On 9/23/25 12:41 AM, Jan Beulich wrote:
>>>>> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>>>>>> + if ( *md_pg )
>>>>>> + metadata = __map_domain_page(*md_pg);
>>> Not this conditional assignment for ...
>>>
>>>>>> + if ( t < p2m_first_external )
>>>>>> + {
>>>>>> pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
>>>>>>
>>>>>> - return rc;
>>>>>> + if ( metadata )
>>>>>> + metadata[ctx->index].pte = p2m_invalid;
>>>>>> + }
>>>>>> + else
>>>>>> + {
>>>>>> + pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
>>>>>> +
>>>>>> + metadata[ctx->index].pte = t;
>>>>> Afaict metadata can still be NULL when you get here.
>>>> It shouldn't be, because when this line is executed, the metadata page already
>>>> exists or was allocated at the start of p2m_set_type().
>>> ... this reply of yours. And the condition there can be false, in case you
>>> took the domain_crash() path.
>> Oh, right, for some reason, I thought we didn’t return from domain_crash().
>> I’m curious whether calling domain_crash() might break something, as some useful
>> data could be freed and negatively affect the internals of map_regions_p2mt().
>>
>> It might make more sense to use panic() here instead.
>> Do you have any thoughts or suggestions on this?
> domain_crash() is generally preferable over crashing the system as a whole.
> I don't follow what negative effects you're alluding to. Did you look at
> what domain_crash() does? It doesn't start tearing down the domain, that'll
> still need invoking from the toolstack. A crashed domain will stay around
> with all its resources allocated.
I was confused by arch_domain_shutdown(), which is called somewhere inside
domain_crash(), since the function name suggests that some resource cleanup
might happen there. There’s also no comment explaining what
arch_domain_shutdown() is expected to do or not to do.
However, since it’s an architecture-specific function, we can control its
behavior for a given architecture.
So, if it doesn’t actually start tearing down the domain, I don’t see any
other negative effects.
Anyway, if domain_crash() is called, I’m not really sure we need to set the
PTE type afterward. We could simply add a return; right after the
domain_crash() call and so we won't have a NULL pointer dereference.
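I.e. (sketch):

    *md_pg = p2m_alloc_table(p2m);
    if ( !*md_pg )
    {
        printk("%s: can't allocate extra memory for dom%d\n",
               __func__, d->domain_id);
        domain_crash(d);
        return;
    }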
Thanks.
~ Oleksii
On 10.10.2025 10:42, Oleksii Kurochko wrote:
> Anyway, if domain_crash() is called, I’m not really sure we need to set the
> PTE type afterward. We could simply add a return; right after the
> domain_crash() call and so we won't have a NULL pointer dereference.

That's indeed preferable, so long as it won't cause other issues in the caller.

Jan