Introduce the implementation of setup_mm(), which includes:
1. Adding all free regions to the boot allocator, as memory is needed
to allocate page tables used for frame table mapping.
2. Calculating RAM size and the RAM end address.
3. Setting up direct map mappings for each RAM bank and initializing
directmap_virt_start (also introducing XENHEAP_VIRT_START, which is
defined as directmap_virt_start) so that it is properly aligned with
the RAM start, allowing more superpages to be used and reducing
pressure on the TLB.
4. Setting up frame table mappings from physical address 0 to ram_end
to simplify mfn_to_page() and page_to_mfn() conversions.
5. Setting up total_pages and max_page.
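
As a hedged illustration of point 4 (not part of the diff below): with
the frame table based at physical address 0, the MFN <-> struct
page_info conversions reduce to plain array indexing, with no
frametable_base_pdx term, roughly along these lines:

```c
/* Illustrative only; the real macros live in the RISC-V headers. */
#define mfn_to_page(mfn) (frame_table + mfn_x(mfn))
#define page_to_mfn(pg)  _mfn((pg) - frame_table)
```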
Update virt_to_maddr() to use introduced XENHEAP_VIRT_START.
Implement maddr_to_virt() function to convert a machine address
to a virtual address. This function is specifically designed to be used
only for the DIRECTMAP region, so a check has been added to ensure that
the address does not exceed DIRECTMAP_SIZE.
After the introduction of maddr_to_virt() the following link error starts
to occur, and a share_xen_page_with_guest() stub is added to avoid it:
riscv64-linux-gnu-ld: prelink.o: in function `tasklet_kill':
/build/xen/common/tasklet.c:176: undefined reference to
`share_xen_page_with_guest'
riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol `share_xen_page_with_guest'
isn't defined
riscv64-linux-gnu-ld: final link failed: bad value
Despite the linker fingering tasklet.c, it's trace.o which has the undefined
reference:
$ find . -name \*.o | while read F; do nm $F | grep share_xen_page_with_guest &&
echo $F; done
U share_xen_page_with_guest
./xen/common/built_in.o
U share_xen_page_with_guest
./xen/common/trace.o
U share_xen_page_with_guest
./xen/prelink.o
Looking at trace.i, there is a call of share_xen_page_with_guest(), but
when maddr_to_virt() is defined as "return NULL" the compiler optimizes
away the part of common/trace.c code where
share_xen_page_with_privileged_guest() is called (there is no code at all
in the disassembled common/trace.o), so there is no real call of
share_xen_page_with_privileged_guest().
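
The mechanism can be reproduced with a standalone sketch (this is not Xen
code, and as the review below notes, it is really BUG_ON()'s
unreachable() rather than the NULL return that licenses the elimination):

```c
extern void share_xen_page_with_guest(void *page);

static inline void *maddr_to_virt(unsigned long ma)
{
    /* Models BUG_ON("unimplemented"): trap, then tell the compiler
     * that control never gets past this point. */
    __builtin_trap();
    __builtin_unreachable();
}

void caller(unsigned long ma)
{
    /* Provably dead after the unreachable() above, so the compiler
     * emits neither this call nor an undefined reference for it. */
    share_xen_page_with_guest(maddr_to_virt(ma));
}
```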
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V2:
- merge patch 2 ( xen/riscv: implement maddr_to_virt() ) into the current one
as maddr_to_virt() started to use things which are introduced in the
current patch.
- merge with patch 1 ( xen/riscv: add stub for share_xen_page_with_guest() )
as this linkage issue happens during introduction of maddr_to_virt().
- use mathematical range expressions for log messages.
- properly calculate the amount of MFNs in setup_frametable_mapping(), taking
into account that ps and pe may not be properly aligned.
- drop full stop at the end of debug message.
- use PFN_DOWN(frametable_size) instead of frametable_size >> PAGE_SHIFT.
- round down ram_size when it is being accumulated in setup_mm() to guarantee
that banks can never have partial pages at their start/end.
- call setup_directmap_mappings() only for ram bank regions instead of
mapping [0, ram_end] region.
- drop directmap_virt_end for now as it isn't used at the moment.
- update the commit message.
---
xen/arch/riscv/include/asm/config.h | 1 +
xen/arch/riscv/include/asm/mm.h | 13 ++-
xen/arch/riscv/include/asm/setup.h | 2 +
xen/arch/riscv/mm.c | 121 ++++++++++++++++++++++++++++
xen/arch/riscv/setup.c | 3 +
xen/arch/riscv/stubs.c | 10 +++
6 files changed, 146 insertions(+), 4 deletions(-)
diff --git a/xen/arch/riscv/include/asm/config.h b/xen/arch/riscv/include/asm/config.h
index ad75871283..3aa9afa5ad 100644
--- a/xen/arch/riscv/include/asm/config.h
+++ b/xen/arch/riscv/include/asm/config.h
@@ -90,6 +90,7 @@
 #define DIRECTMAP_SLOT_START 200
 #define DIRECTMAP_VIRT_START SLOTN(DIRECTMAP_SLOT_START)
 #define DIRECTMAP_SIZE (SLOTN(DIRECTMAP_SLOT_END) - SLOTN(DIRECTMAP_SLOT_START))
+#define XENHEAP_VIRT_START directmap_virt_start
 
 #define FRAMETABLE_SCALE_FACTOR (PAGE_SIZE/sizeof(struct page_info))
 #define FRAMETABLE_SIZE_IN_SLOTS (((DIRECTMAP_SIZE / SLOTN(1)) / FRAMETABLE_SCALE_FACTOR) + 1)
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index ebb142502e..bff4e763d9 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -12,6 +12,8 @@
 
 #include <asm/page-bits.h>
 
+extern vaddr_t directmap_virt_start;
+
 #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
 #define paddr_to_pfn(pa) ((unsigned long)((pa) >> PAGE_SHIFT))
 
@@ -25,8 +27,11 @@
 
 static inline void *maddr_to_virt(paddr_t ma)
 {
-    BUG_ON("unimplemented");
-    return NULL;
+    unsigned long va_offset = maddr_to_directmapoff(ma);
+
+    ASSERT(va_offset < DIRECTMAP_SIZE);
+
+    return (void *)(XENHEAP_VIRT_START + va_offset);
 }
 
 /*
@@ -37,9 +42,9 @@ static inline void *maddr_to_virt(paddr_t ma)
  */
 static inline unsigned long virt_to_maddr(unsigned long va)
 {
-    if ((va >= DIRECTMAP_VIRT_START) &&
+    if ((va >= XENHEAP_VIRT_START) &&
         (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
-        return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START);
+        return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
 
     BUILD_BUG_ON(XEN_VIRT_SIZE != MB(2));
     ASSERT((va >> (PAGETABLE_ORDER + PAGE_SHIFT)) ==
diff --git a/xen/arch/riscv/include/asm/setup.h b/xen/arch/riscv/include/asm/setup.h
index c0214a9bf2..844a2f0ef1 100644
--- a/xen/arch/riscv/include/asm/setup.h
+++ b/xen/arch/riscv/include/asm/setup.h
@@ -5,6 +5,8 @@
 
 #define max_init_domid (0)
 
+void setup_mm(void);
+
 #endif /* ASM__RISCV__SETUP_H */
 
 /*
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 27026d803b..5be5a7b52a 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -8,6 +8,7 @@
 #include <xen/libfdt/libfdt.h>
 #include <xen/macros.h>
 #include <xen/mm.h>
+#include <xen/pdx.h>
 #include <xen/pfn.h>
 #include <xen/sections.h>
 #include <xen/sizes.h>
@@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
 
     return fdt_virt;
 }
+
+#ifndef CONFIG_RISCV_32
+
+#define ROUNDDOWN(addr, size) ((addr) & ~((size) - 1))
+
+/* Map a frame table to cover physical addresses ps through pe */
+static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
+{
+    paddr_t aligned_ps = ROUNDDOWN(ps, PAGE_SIZE);
+    paddr_t aligned_pe = ROUNDUP(pe, PAGE_SIZE);
+    unsigned long nr_mfns = PFN_DOWN(aligned_pe - aligned_ps);
+    unsigned long frametable_size = nr_mfns * sizeof(struct page_info);
+    mfn_t base_mfn;
+
+    if ( frametable_size > FRAMETABLE_SIZE )
+        panic("The frametable cannot cover the physical region [%#"PRIpaddr" - %#"PRIpaddr")\n",
+              ps, pe);
+
+    frametable_size = ROUNDUP(frametable_size, MB(2));
+    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, PFN_DOWN(MB(2)));
+
+    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
+                          PFN_DOWN(frametable_size),
+                          PAGE_HYPERVISOR_RW) )
+        panic("Unable to setup the frametable mappings\n");
+
+    memset(&frame_table[0], 0, nr_mfns * sizeof(struct page_info));
+    memset(&frame_table[nr_mfns], -1,
+           frametable_size - (nr_mfns * sizeof(struct page_info)));
+}
+
+
+static mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER;
+vaddr_t __ro_after_init directmap_virt_start;
+
+/* Map the region in the directmap area. */
+static void __init setup_directmap_mappings(unsigned long base_mfn,
+                                            unsigned long nr_mfns)
+{
+    int rc;
+
+    /* First call sets the directmap physical and virtual offset. */
+    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
+    {
+        directmap_mfn_start = _mfn(base_mfn);
+
+        /*
+         * The base address may not be aligned to the second level
+         * size (e.g. 1GB when using 4KB pages). This would prevent
+         * superpage mappings for all the regions because the virtual
+         * address and machine address should both be suitably aligned.
+         *
+         * Prevent that by offsetting the start of the directmap virtual
+         * address.
+         */
+        directmap_virt_start = DIRECTMAP_VIRT_START + pfn_to_paddr(base_mfn);
+    }
+
+    if ( base_mfn < mfn_x(directmap_mfn_start) )
+        panic("cannot add directmap mapping at %#lx below heap start %#lx\n",
+              base_mfn, mfn_x(directmap_mfn_start));
+
+    rc = map_pages_to_xen((vaddr_t)mfn_to_virt(base_mfn),
+                          _mfn(base_mfn), nr_mfns,
+                          PAGE_HYPERVISOR_RW);
+    if ( rc )
+        panic("Unable to setup the directmap mappings.\n");
+}
+
+/*
+ * Setup memory management
+ *
+ * RISC-V 64 has a large virtual address space (the minimum supported
+ * MMU mode is Sv39, which provides TBs of VA space).
+ * In the case of RISC-V 64, the directmap and frametable are mapped
+ * starting from physical address 0 to simplify the page_to_mfn(),
+ * mfn_to_page(), and maddr_to_virt() calculations, as there is no need
+ * to account for {directmap, frametable}_base_pdx in this setup.
+ */
+void __init setup_mm(void)
+{
+    const struct membanks *banks = bootinfo_get_mem();
+    paddr_t ram_start = INVALID_PADDR;
+    paddr_t ram_end = 0;
+    paddr_t ram_size = 0;
+    unsigned int i;
+
+    /*
+     * We need some memory to allocate the page-tables used for the directmap
+     * mappings. But some regions may contain memory already allocated
+     * for other uses (e.g. modules, reserved-memory...).
+     *
+     * For simplicity, add all the free regions in the boot allocator.
+     */
+    populate_boot_allocator();
+
+    total_pages = 0;
+
+    for ( i = 0; i < banks->nr_banks; i++ )
+    {
+        const struct membank *bank = &banks->bank[i];
+        paddr_t bank_end = bank->start + bank->size;
+
+        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
+        ram_start = min(ram_start, bank->start);
+        ram_end = max(ram_end, bank_end);
+
+        setup_directmap_mappings(PFN_DOWN(bank->start),
+                                 PFN_DOWN(bank->size));
+    }
+
+    total_pages = PFN_DOWN(ram_size);
+
+    setup_frametable_mappings(0, ram_end);
+    max_page = PFN_DOWN(ram_end);
+}
+
+#else /* CONFIG_RISCV_32 */
+#error setup_mm(), setup_{directmap,frametable}_mapping() should be implemented for RV_32
+#endif
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index e29bd75d7c..2887a18c0c 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -12,6 +12,7 @@
 
 #include <asm/early_printk.h>
 #include <asm/sbi.h>
+#include <asm/setup.h>
 #include <asm/smp.h>
 #include <asm/traps.h>
 
@@ -59,6 +60,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
     printk("Command line: %s\n", cmdline);
     cmdline_parse(cmdline);
 
+    setup_mm();
+
     printk("All set up\n");
 
     machine_halt();
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index 5951b0ce91..c9a590b225 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -2,7 +2,9 @@
 #include <xen/cpumask.h>
 #include <xen/domain.h>
 #include <xen/irq.h>
+#include <xen/mm.h>
 #include <xen/nodemask.h>
+#include <xen/sched.h>
 #include <xen/sections.h>
 #include <xen/time.h>
 #include <public/domctl.h>
@@ -409,3 +411,11 @@ unsigned long get_upper_mfn_bound(void)
 {
     BUG_ON("unimplemented");
 }
+
+/* mm.c */
+
+void share_xen_page_with_guest(struct page_info *page, struct domain *d,
+                               enum XENSHARE_flags flags)
+{
+    BUG_ON("unimplemented");
+}
--
2.47.0
On 23.10.2024 17:50, Oleksii Kurochko wrote:
> Introduce the implementation of setup_mm(), which includes:
> 1. Adding all free regions to the boot allocator, as memory is needed
> to allocate page tables used for frame table mapping.
> 2. Calculating RAM size and the RAM end address.
> 3. Setting up direct map mappings for each RAM bank and initializing
> directmap_virt_start (also introducing XENHEAP_VIRT_START, which is
> defined as directmap_virt_start) so that it is properly aligned with
> the RAM start, allowing more superpages to be used and reducing
> pressure on the TLB.
> 4. Setting up frame table mappings from physical address 0 to ram_end
> to simplify mfn_to_page() and page_to_mfn() conversions.
> 5. Setting up total_pages and max_page.
>
> Update virt_to_maddr() to use introduced XENHEAP_VIRT_START.
>
> Implement maddr_to_virt() function to convert a machine address
> to a virtual address. This function is specifically designed to be used
> only for the DIRECTMAP region, so a check has been added to ensure that
> the address does not exceed DIRECTMAP_SIZE.

I'm unconvinced by this. Conceivably the function could be used on
"imaginary" addresses, just to calculate abstract positions or e.g.
deltas. At the same time I'm also not going to insist on the removal of
that assertion, so long as it doesn't trigger.

> After the introduction of maddr_to_virt() the following link error starts
> to occur, and a share_xen_page_with_guest() stub is added to avoid it:
> riscv64-linux-gnu-ld: prelink.o: in function `tasklet_kill':
> /build/xen/common/tasklet.c:176: undefined reference to
> `share_xen_page_with_guest'
> riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol `share_xen_page_with_guest'
> isn't defined
> riscv64-linux-gnu-ld: final link failed: bad value
>
> Despite the linker fingering tasklet.c, it's trace.o which has the undefined
> reference:
> $ find . -name \*.o | while read F; do nm $F | grep share_xen_page_with_guest &&
> echo $F; done
> U share_xen_page_with_guest
> ./xen/common/built_in.o
> U share_xen_page_with_guest
> ./xen/common/trace.o
> U share_xen_page_with_guest
> ./xen/prelink.o
>
> Looking at trace.i, there is a call of share_xen_page_with_guest(), but
> when maddr_to_virt() is defined as "return NULL" the compiler optimizes
> away the part of common/trace.c code where
> share_xen_page_with_privileged_guest() is called (there is no code at all
> in the disassembled common/trace.o), so there is no real call of
> share_xen_page_with_privileged_guest().

I don't think it's the "return NULL", but rather BUG_ON()'s (really BUG()'s)
unreachable(). Not the least because the function can't validly return NULL,
and hence callers have no need to check for NULL.

> @@ -25,8 +27,11 @@
>
> static inline void *maddr_to_virt(paddr_t ma)
> {
> -    BUG_ON("unimplemented");
> -    return NULL;
> +    unsigned long va_offset = maddr_to_directmapoff(ma);
> +
> +    ASSERT(va_offset < DIRECTMAP_SIZE);
> +
> +    return (void *)(XENHEAP_VIRT_START + va_offset);
> }

I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when
it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START
in the first place: You don't have a separate "heap" virtual address
range, do you?

> @@ -37,9 +42,9 @@ static inline void *maddr_to_virt(paddr_t ma)
>  */
> static inline unsigned long virt_to_maddr(unsigned long va)
> {
> -    if ((va >= DIRECTMAP_VIRT_START) &&
> +    if ((va >= XENHEAP_VIRT_START) &&
>          (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
> -        return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START);
> +        return directmapoff_to_maddr(va - XENHEAP_VIRT_START);

Same concern here then.

> @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
>
>     return fdt_virt;
> }
> +
> +#ifndef CONFIG_RISCV_32

I'd like to ask that you be more selective with this #ifdef (or omit it
altogether here). setup_mm() itself, for example, looks good for any
mode. Like does ...

> +#define ROUNDDOWN(addr, size) ((addr) & ~((size) - 1))

... this #define. Then again this macro may better be placed in
xen/macros.h anyway, next to ROUNDUP().

> +/* Map a frame table to cover physical addresses ps through pe */
> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
> +{
> +    paddr_t aligned_ps = ROUNDDOWN(ps, PAGE_SIZE);
> +    paddr_t aligned_pe = ROUNDUP(pe, PAGE_SIZE);
> +    unsigned long nr_mfns = PFN_DOWN(aligned_pe - aligned_ps);
> +    unsigned long frametable_size = nr_mfns * sizeof(struct page_info);

Nit: Better sizeof(*frame_table).

> +    mfn_t base_mfn;
> +
> +    if ( frametable_size > FRAMETABLE_SIZE )
> +        panic("The frametable cannot cover the physical region [%#"PRIpaddr" - %#"PRIpaddr")\n",
> +              ps, pe);

As per prior comments of mine: Imo the message is too verbose (and too
long). "frametable cannot cover [%#"PRIpaddr", %#"PRIpaddr")\n" doesn't
leave any ambiguity, I think. (Please take this as a general remark,
i.e. potentially applicable elsewhere as well.) Note also the adjustment
to how the range is presented. As said before, using mathematical
intervals is (imo) least ambiguous.

> +    frametable_size = ROUNDUP(frametable_size, MB(2));
> +    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, PFN_DOWN(MB(2)));

The 2Mb aspect wants a (brief) comment, imo.

> +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> +                          PFN_DOWN(frametable_size),
> +                          PAGE_HYPERVISOR_RW) )
> +        panic("Unable to setup the frametable mappings\n");
> +
> +    memset(&frame_table[0], 0, nr_mfns * sizeof(struct page_info));
> +    memset(&frame_table[nr_mfns], -1,
> +           frametable_size - (nr_mfns * sizeof(struct page_info)));

Here (see comments on v1) you're still assuming ps == 0.

> +}
> +
> +

Nit: No double blank lines please.

> +static mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER;

This is used only by __init code, and hence ought to be __initdata. In
fact as it's used by just one function afaics, it may want to move into
that function (to limit its scope).

> +vaddr_t __ro_after_init directmap_virt_start;

Even if largely benign, I think this would better be initialized to
DIRECTMAP_VIRT_START.

> +/* Map the region in the directmap area. */
> +static void __init setup_directmap_mappings(unsigned long base_mfn,
> +                                            unsigned long nr_mfns)
> +{
> +    int rc;
> +
> +    /* First call sets the directmap physical and virtual offset. */
> +    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
> +    {
> +        directmap_mfn_start = _mfn(base_mfn);
> +
> +        /*
> +         * The base address may not be aligned to the second level
> +         * size (e.g. 1GB when using 4KB pages). This would prevent
> +         * superpage mappings for all the regions because the virtual
> +         * address and machine address should both be suitably aligned.
> +         *
> +         * Prevent that by offsetting the start of the directmap virtual
> +         * address.
> +         */
> +        directmap_virt_start = DIRECTMAP_VIRT_START + pfn_to_paddr(base_mfn);

Don't you need to mask off top bits of the incoming MFN here, or else you
may waste a huge part of direct map space?

> +    }
> +
> +    if ( base_mfn < mfn_x(directmap_mfn_start) )
> +        panic("cannot add directmap mapping at %#lx below heap start %#lx\n",
> +              base_mfn, mfn_x(directmap_mfn_start));
> +
> +    rc = map_pages_to_xen((vaddr_t)mfn_to_virt(base_mfn),
> +                          _mfn(base_mfn), nr_mfns,
> +                          PAGE_HYPERVISOR_RW);
> +    if ( rc )
> +        panic("Unable to setup the directmap mappings.\n");

Might help to also log the range in question. Also, to repeat a prior
nit: No full stop please at the end of log messages.

> +}
> +
> +/*
> + * Setup memory management
> + *
> + * RISC-V 64 has a large virtual address space (the minimum supported
> + * MMU mode is Sv39, which provides TBs of VA space).

Is it really TBs? According to my math you'd need more than 40 bits to
map a single Tb (alongside other stuff).

> + * In the case of RISC-V 64, the directmap and frametable are mapped
> + * starting from physical address 0 to simplify the page_to_mfn(),
> + * mfn_to_page(), and maddr_to_virt() calculations, as there is no need
> + * to account for {directmap, frametable}_base_pdx in this setup.

This looks somewhat stale for the directmap part, now that you have
directmap_virt_start.

> + */
> +void __init setup_mm(void)
> +{
> +    const struct membanks *banks = bootinfo_get_mem();
> +    paddr_t ram_start = INVALID_PADDR;
> +    paddr_t ram_end = 0;
> +    paddr_t ram_size = 0;
> +    unsigned int i;
> +
> +    /*
> +     * We need some memory to allocate the page-tables used for the directmap
> +     * mappings. But some regions may contain memory already allocated
> +     * for other uses (e.g. modules, reserved-memory...).
> +     *
> +     * For simplicity, add all the free regions in the boot allocator.
> +     */
> +    populate_boot_allocator();
> +
> +    total_pages = 0;
> +
> +    for ( i = 0; i < banks->nr_banks; i++ )
> +    {
> +        const struct membank *bank = &banks->bank[i];
> +        paddr_t bank_end = bank->start + bank->size;
> +
> +        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);

As before - if a bank doesn't cover full pages, this may give the impression
of there being more "total pages" than there are.

> +        ram_start = min(ram_start, bank->start);
> +        ram_end = max(ram_end, bank_end);
> +
> +        setup_directmap_mappings(PFN_DOWN(bank->start),
> +                                 PFN_DOWN(bank->size));

Similarly I don't think this is right when both start and size aren't
multiples of PAGE_SIZE. You may map an unusable partial page at the start,
and then fail to map a fully usable page at the end.

> --- a/xen/arch/riscv/stubs.c
> +++ b/xen/arch/riscv/stubs.c
> @@ -2,7 +2,9 @@
> #include <xen/cpumask.h>
> #include <xen/domain.h>
> #include <xen/irq.h>
> +#include <xen/mm.h>
> #include <xen/nodemask.h>
> +#include <xen/sched.h>
> #include <xen/sections.h>
> #include <xen/time.h>
> #include <public/domctl.h>

Neither of these are needed afaict, even without the further comment below.

> @@ -409,3 +411,11 @@ unsigned long get_upper_mfn_bound(void)
> {
>     BUG_ON("unimplemented");
> }
> +
> +/* mm.c */
> +
> +void share_xen_page_with_guest(struct page_info *page, struct domain *d,
> +                               enum XENSHARE_flags flags)
> +{
> +    BUG_ON("unimplemented");
> +}

Why not right in mm.c? I thought stubs.c exists only for functions which
don't have a proper "home" source file yet.

Jan
On Wed, 2024-10-30 at 11:25 +0100, Jan Beulich wrote:
> On 23.10.2024 17:50, Oleksii Kurochko wrote:
> > Introduce the implementation of setup_mm(), which includes:
> > 1. Adding all free regions to the boot allocator, as memory is needed
> > to allocate page tables used for frame table mapping.
> > 2. Calculating RAM size and the RAM end address.
> > 3. Setting up direct map mappings for each RAM bank and initializing
> > directmap_virt_start (also introducing XENHEAP_VIRT_START, which is
> > defined as directmap_virt_start) so that it is properly aligned with
> > the RAM start, allowing more superpages to be used and reducing
> > pressure on the TLB.
> > 4. Setting up frame table mappings from physical address 0 to ram_end
> > to simplify mfn_to_page() and page_to_mfn() conversions.
> > 5. Setting up total_pages and max_page.
> >
> > Update virt_to_maddr() to use introduced XENHEAP_VIRT_START.
> >
> > Implement maddr_to_virt() function to convert a machine address
> > to a virtual address. This function is specifically designed to be
> > used only for the DIRECTMAP region, so a check has been added to
> > ensure that the address does not exceed DIRECTMAP_SIZE.
>
> I'm unconvinced by this. Conceivably the function could be used on
> "imaginary" addresses, just to calculate abstract positions or e.g.
> deltas. At the same time I'm also not going to insist on the removal
> of that assertion, so long as it doesn't trigger.
>
> > After the introduction of maddr_to_virt() the following link error
> > starts to occur, and a share_xen_page_with_guest() stub is added to
> > avoid it:
> > riscv64-linux-gnu-ld: prelink.o: in function `tasklet_kill':
> > /build/xen/common/tasklet.c:176: undefined reference to
> > `share_xen_page_with_guest'
> > riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol
> > `share_xen_page_with_guest' isn't defined
> > riscv64-linux-gnu-ld: final link failed: bad value
> >
> > Despite the linker fingering tasklet.c, it's trace.o which has the
> > undefined reference:
> > $ find . -name \*.o | while read F; do nm $F | grep
> > share_xen_page_with_guest && echo $F; done
> > U share_xen_page_with_guest
> > ./xen/common/built_in.o
> > U share_xen_page_with_guest
> > ./xen/common/trace.o
> > U share_xen_page_with_guest
> > ./xen/prelink.o
> >
> > Looking at trace.i, there is a call of share_xen_page_with_guest(),
> > but when maddr_to_virt() is defined as "return NULL" the compiler
> > optimizes away the part of common/trace.c code where
> > share_xen_page_with_privileged_guest() is called (there is no code
> > at all in the disassembled common/trace.o), so there is no real call
> > of share_xen_page_with_privileged_guest().
>
> I don't think it's the "return NULL", but rather BUG_ON()'s (really
> BUG()'s) unreachable(). Not the least because the function can't
> validly return NULL, and hence callers have no need to check for NULL.
>
> > @@ -25,8 +27,11 @@
> >
> > static inline void *maddr_to_virt(paddr_t ma)
> > {
> > -    BUG_ON("unimplemented");
> > -    return NULL;
> > +    unsigned long va_offset = maddr_to_directmapoff(ma);
> > +
> > +    ASSERT(va_offset < DIRECTMAP_SIZE);
> > +
> > +    return (void *)(XENHEAP_VIRT_START + va_offset);
> > }
>
> I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when
> it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START
> in the first place: You don't have a separate "heap" virtual address
> range, do you?

The name may not be ideal for RISC-V. I borrowed it from Arm, intending
to account for cases where the directmap virtual start might not align
with DIRECTMAP_VIRT_START due to potential adjustments for superpage
mapping. And my understanding is that XENHEAP == DIRECTMAP in case of
Arm64.

Let's discuss below whether XENHEAP_VIRT_START is necessary, as there
are related questions connected to it.

> > @@ -37,9 +42,9 @@ static inline void *maddr_to_virt(paddr_t ma)
> >  */
> > static inline unsigned long virt_to_maddr(unsigned long va)
> > {
> > -    if ((va >= DIRECTMAP_VIRT_START) &&
> > +    if ((va >= XENHEAP_VIRT_START) &&
> >          (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
> > -        return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START);
> > +        return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
>
> Same concern here then.
>
> > @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
> >
> >     return fdt_virt;
> > }
> > +
> > +#ifndef CONFIG_RISCV_32
>
> I'd like to ask that you be more selective with this #ifdef (or omit
> it altogether here). setup_mm() itself, for example, looks good for
> any mode.

Regarding setup_mm(): the 32-bit and 64-bit versions have pretty
different implementations.

> Like does ...
>
> > +#define ROUNDDOWN(addr, size) ((addr) & ~((size) - 1))
>
> ... this #define. Then again this macro may better be placed in
> xen/macros.h anyway, next to ROUNDUP().

I will put it there. It was put in arch-specific code because, over the
long existence of the Xen project, no one had introduced it, so I
decided it was a one-off case with no real need to go to common code.

> > +    frametable_size = ROUNDUP(frametable_size, MB(2));
> > +    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT,
> > PFN_DOWN(MB(2)));
>
> The 2Mb aspect wants a (brief) comment, imo.
>
> > +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> > +                          PFN_DOWN(frametable_size),
> > +                          PAGE_HYPERVISOR_RW) )
> > +        panic("Unable to setup the frametable mappings\n");
> > +
> > +    memset(&frame_table[0], 0, nr_mfns * sizeof(struct page_info));
> > +    memset(&frame_table[nr_mfns], -1,
> > +           frametable_size - (nr_mfns * sizeof(struct page_info)));
>
> Here (see comments on v1) you're still assuming ps == 0.

Do you refer to?
```
> +/* Map a frame table to cover physical addresses ps through pe */
> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
> +{
> +    unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -

This looks to be accounting for a partial page at the end.

> +                            mfn_x(maddr_to_mfn(ps)) + 1;

Whereas this doesn't do the same at the start. The sole present caller
passes 0, so that's going to be fine for the time being. Yet it's a
latent pitfall. I'd recommend to either drop the function parameter, or
to deal with it correctly right away.
```
And I've added aligned_ps to cover the case that ps could be not page
aligned. Or are you referring to the 0 in memset(&frame_table[0], ...)?

> > +/* Map the region in the directmap area. */
> > +static void __init setup_directmap_mappings(unsigned long base_mfn,
> > +                                            unsigned long nr_mfns)
> > +{
> > +    int rc;
> > +
> > +    /* First call sets the directmap physical and virtual offset. */
> > +    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
> > +    {
> > +        directmap_mfn_start = _mfn(base_mfn);
> > +
> > +        /*
> > +         * The base address may not be aligned to the second level
> > +         * size (e.g. 1GB when using 4KB pages). This would prevent
> > +         * superpage mappings for all the regions because the virtual
> > +         * address and machine address should both be suitably aligned.
> > +         *
> > +         * Prevent that by offsetting the start of the directmap virtual
> > +         * address.
> > +         */
> > +        directmap_virt_start = DIRECTMAP_VIRT_START +
> > pfn_to_paddr(base_mfn);
>
> Don't you need to mask off top bits of the incoming MFN here, or else
> you may waste a huge part of direct map space?

Yes, it will result in a loss of direct map space, but we still have a
considerable amount available in Sv39 mode and higher modes. The
largest RAM_START I see currently is 0x1000000000, which means we would
lose 68 GB. However, our DIRECTMAP_SIZE is 308 GB, so there is still
plenty of free space available, and we can always increase
DIRECTMAP_SIZE since we have a lot of free virtual address space in
Sv39.

That said, I'm not insisting on this approach. My suggestion was to
handle the addition and subtraction of directmap_mfn_start in
maddr_to_virt() and virt_to_maddr():
```
+extern mfn_t directmap_mfn_start;
 extern vaddr_t directmap_virt_start;
 
 #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
@@ -31,7 +32,7 @@ static inline void *maddr_to_virt(paddr_t ma)
 
     ASSERT(va_offset < DIRECTMAP_SIZE);
 
-    return (void *)(XENHEAP_VIRT_START + va_offset);
+    return (void *)(XENHEAP_VIRT_START - (mfn_to_maddr(directmap_mfn_start)) + va_offset);
 }
 
 /*
@@ -44,7 +45,7 @@ static inline unsigned long virt_to_maddr(unsigned long va)
 {
     if ((va >= XENHEAP_VIRT_START) &&
         (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
-        return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
+        return directmapoff_to_maddr(va - XENHEAP_VIRT_START + mfn_to_maddr(directmap_mfn_start));
 
     BUILD_BUG_ON(XEN_VIRT_SIZE != MB(2));
     ASSERT((va >> (PAGETABLE_ORDER + PAGE_SHIFT)) ==
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 262cec811e..7ef9db2363 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -450,7 +450,7 @@ static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
 }
 
-static mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER;
+mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER;
 vaddr_t __ro_after_init directmap_virt_start;
 
 /* Map the region in the directmap area. */
@@ -462,6 +462,8 @@ static void __init setup_directmap_mappings(unsigned long base_mfn,
     /* First call sets the directmap physical and virtual offset. */
     if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
     {
+        unsigned long mfn_gb = base_mfn & ~XEN_PT_LEVEL_SIZE(2);
+
         directmap_mfn_start = _mfn(base_mfn);
 
         /*
@@ -473,7 +475,8 @@ static void __init setup_directmap_mappings(unsigned long base_mfn,
          * Prevent that by offsetting the start of the directmap virtual
          * address.
          */
-        directmap_virt_start = DIRECTMAP_VIRT_START + pfn_to_paddr(base_mfn);
+        directmap_virt_start = DIRECTMAP_VIRT_START +
+                               (base_mfn - mfn_gb) * PAGE_SIZE; /*+ pfn_to_paddr(base_mfn)*/;
```
Finally, regarding masking off the top bits of mfn, I'm not entirely
clear on how this should work. If I understand correctly, if I mask off
certain top bits in mfn, then I would need to unmask those same top
bits in maddr_to_virt() and virt_to_maddr(). Is that correct?

Another point I'm unclear on is which specific part of the top bits
should be masked. If you could explain this to me, I would really
appreciate it, and I'll be happy to use the masking approach.

> > +}
> > +
> > +/*
> > + * Setup memory management
> > + *
> > + * RISC-V 64 has a large virtual address space (the minimum
> > supported
> > + * MMU mode is Sv39, which provides TBs of VA space).
>
> Is it really TBs? According to my math you'd need more than 40 bits
> to map a single Tb (alongside other stuff).

I accidentally calculated it as the first 40 bits (from bits 0 to 39)
due to the "39" in Sv39. However, in reality, it's actually 39 bits
(from bits 0 to 38), so it represents less than TBs, only GBs of
virtual address space.

> > + */
> > +void __init setup_mm(void)
> > +{
> > +    const struct membanks *banks = bootinfo_get_mem();
> > +    paddr_t ram_start = INVALID_PADDR;
> > +    paddr_t ram_end = 0;
> > +    paddr_t ram_size = 0;
> > +    unsigned int i;
> > +
> > +    /*
> > +     * We need some memory to allocate the page-tables used for
> > the directmap
> > +     * mappings. But some regions may contain memory already
> > allocated
> > +     * for other uses (e.g. modules, reserved-memory...).
> > +     *
> > +     * For simplicity, add all the free regions in the boot
> > allocator.
> > +     */
> > +    populate_boot_allocator();
> > +
> > +    total_pages = 0;
> > +
> > +    for ( i = 0; i < banks->nr_banks; i++ )
> > +    {
> > +        const struct membank *bank = &banks->bank[i];
> > +        paddr_t bank_end = bank->start + bank->size;
> > +
> > +        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>
> As before - if a bank doesn't cover full pages, this may give the
> impression of there being more "total pages" than there are.

Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
size of a bank is 11K, ram_size will end up being 8K, so the "total
pages" will cover less RAM than the actual size of the RAM bank.

> > +        ram_start = min(ram_start, bank->start);
> > +        ram_end = max(ram_end, bank_end);
> > +
> > +        setup_directmap_mappings(PFN_DOWN(bank->start),
> > +                                 PFN_DOWN(bank->size));
>
> Similarly I don't think this is right when both start and size aren't
> multiples of PAGE_SIZE. You may map an unusable partial page at the
> start, and then fail to map a fully usable page at the end.

ram_size should be a multiple of PAGE_SIZE because we have:
ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);

Do you know of any examples where bank->start isn't aligned to
PAGE_SIZE? Should it be mentioned somewhere what a legal physical
address for RAM start is? If it's not PAGE_SIZE-aligned, then it seems
we have no choice but to use ALIGNUP(..., PAGE_SIZE), which would mean
losing part of the bank.

~ Oleksii
On 30.10.2024 17:50, oleksii.kurochko@gmail.com wrote:
> On Wed, 2024-10-30 at 11:25 +0100, Jan Beulich wrote:
>> On 23.10.2024 17:50, Oleksii Kurochko wrote:
>>> @@ -25,8 +27,11 @@
>>>
>>> static inline void *maddr_to_virt(paddr_t ma)
>>> {
>>> -    BUG_ON("unimplemented");
>>> -    return NULL;
>>> +    unsigned long va_offset = maddr_to_directmapoff(ma);
>>> +
>>> +    ASSERT(va_offset < DIRECTMAP_SIZE);
>>> +
>>> +    return (void *)(XENHEAP_VIRT_START + va_offset);
>>> }
>>
>> I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when
>> it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START
>> in the first place: You don't have a separate "heap" virtual address
>> range, do you?
> The name may not be ideal for RISC-V. I borrowed it from Arm, intending
> to account for cases where the directmap virtual start might not align
> with DIRECTMAP_VIRT_START due to potential adjustments for superpage
> mapping.
> And my understanding is that XENHEAP == DIRECTMAP in case of Arm64.

Just to mention it: If I looked at Arm64 in isolation (without also
considering Arm32, and hence the desire to keep code common where
possible), I'd consider the mere existence of XENHEAP_VIRT_START
(without an accompanying XENHEAP_VIRT_SIZE) a mistake. Therefore for
RISC-V its introduction may be justified by (remote) plans to also
cover RV32 at some point. Yet such then needs saying explicitly in the
description.

>>> @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
>>>
>>>     return fdt_virt;
>>> }
>>> +
>>> +#ifndef CONFIG_RISCV_32
>>
>> I'd like to ask that you be more selective with this #ifdef (or omit
>> it altogether here). setup_mm() itself, for example, looks good for
>> any mode.
> Regarding setup_mm(): the 32-bit and 64-bit versions have pretty
> different implementations.

Not setup_mm() itself, it seems. Its helpers - sure.

>>> +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
>>> +                          PFN_DOWN(frametable_size),
>>> +                          PAGE_HYPERVISOR_RW) )
>>> +        panic("Unable to setup the frametable mappings\n");
>>> +
>>> +    memset(&frame_table[0], 0, nr_mfns * sizeof(struct page_info));
>>> +    memset(&frame_table[nr_mfns], -1,
>>> +           frametable_size - (nr_mfns * sizeof(struct page_info)));
>>
>> Here (see comments on v1) you're still assuming ps == 0.
> Do you refer to?
> ```
>> +/* Map a frame table to cover physical addresses ps through pe */
>> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>> +{
>> +    unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -
>
> This looks to be accounting for a partial page at the end.
>
>> +                            mfn_x(maddr_to_mfn(ps)) + 1;
>
> Whereas this doesn't do the same at the start. The sole present caller
> passes 0, so that's going to be fine for the time being. Yet it's a
> latent pitfall. I'd recommend to either drop the function parameter, or
> to deal with it correctly right away.
> ```
> And I've added aligned_ps to cover the case that ps could be not page
> aligned.

Not this, no, but ...

> Or are you referring to the 0 in memset(&frame_table[0], ...)?

... this. If the start address wasn't 0, you'd need to invalidate a
region at the start of the table, just as you invalidate a region at
the end.

>>> +/* Map the region in the directmap area. */
>>> +static void __init setup_directmap_mappings(unsigned long base_mfn,
>>> +                                            unsigned long nr_mfns)
>>> +{
>>> +    int rc;
>>> +
>>> +    /* First call sets the directmap physical and virtual offset. */
>>> +    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
>>> +    {
>>> +        directmap_mfn_start = _mfn(base_mfn);
>>> +
>>> +        /*
>>> +         * The base address may not be aligned to the second level
>>> +         * size (e.g. 1GB when using 4KB pages). This would prevent
>>> +         * superpage mappings for all the regions because the virtual
>>> +         * address and machine address should both be suitably aligned.
>>> +         *
>>> +         * Prevent that by offsetting the start of the directmap virtual
>>> +         * address.
>>> +         */
>>> +        directmap_virt_start = DIRECTMAP_VIRT_START +
>>> pfn_to_paddr(base_mfn);
>>
>> Don't you need to mask off top bits of the incoming MFN here, or else
>> you may waste a huge part of direct map space?
> Yes, it will result in a loss of direct map space, but we still have a
> considerable amount available in Sv39 mode and higher modes. The
> largest RAM_START I see currently is 0x1000000000, which means we would
> lose 68 GB. However, our DIRECTMAP_SIZE is 308 GB, so there is still
> plenty of free space available, and we can always increase
> DIRECTMAP_SIZE since we have a lot of free virtual address space in
> Sv39.

Wow, 68 out of 308 - that's more than 20%. I'm definitely concerned
about this then.

> Finally, regarding masking off the top bits of mfn, I'm not entirely
> clear on how this should work. If I understand correctly, if I mask off
> certain top bits in mfn, then I would need to unmask those same top
> bits in maddr_to_virt() and virt_to_maddr(). Is that correct?
>
> Another point I'm unclear on is which specific part of the top bits
> should be masked.

You want to "move" the directmap such that the first legitimate RAM
page is within the first (large/huge) page mapping of the directmap.
IOW the "virtual" start of the directmap would move down in VA space.
That still leaves things at a simple offset calculation when
translating VA <-> PA.

To give an example: Let's assume RAM starts at 61.5 Gb, and you want to
use 1Gb mappings for the bulk of the directmap. Then the "virtual"
start of the directmap would shift down to DIRECTMAP_VIRT_START - 60Gb,
such that the first RAM page would be mapped at
DIRECTMAP_VIRT_START + 1.5Gb. IOW it would be the low 30 address bits
of the start address that you use (30 - PAGE_SHIFT for the MFN), with
the higher bits contributing to the offset involved in the VA <-> PA
translation. Values used depend on the (largest) page size you mean to
use for the direct map: On systems with terabytes of memory (demanding
Sv48 or even Sv57 mode) you may want to use 512Gb mappings, and hence
you'd then need to mask the low 39 bits (or 48 for 256Tb mappings).

>>> + */
>>> +void __init setup_mm(void)
>>> +{
>>> +    const struct membanks *banks = bootinfo_get_mem();
>>> +    paddr_t ram_start = INVALID_PADDR;
>>> +    paddr_t ram_end = 0;
>>> +    paddr_t ram_size = 0;
>>> +    unsigned int i;
>>> +
>>> +    /*
>>> +     * We need some memory to allocate the page-tables used for
>>> the directmap
>>> +     * mappings. But some regions may contain memory already
>>> allocated
>>> +     * for other uses (e.g. modules, reserved-memory...).
>>> +     *
>>> +     * For simplicity, add all the free regions in the boot
>>> allocator.
>>> +     */
>>> +    populate_boot_allocator();
>>> +
>>> +    total_pages = 0;
>>> +
>>> +    for ( i = 0; i < banks->nr_banks; i++ )
>>> +    {
>>> +        const struct membank *bank = &banks->bank[i];
>>> +        paddr_t bank_end = bank->start + bank->size;
>>> +
>>> +        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>>
>> As before - if a bank doesn't cover full pages, this may give the
>> impression of there being more "total pages" than there are.
> Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
> size of a bank is 11K, ram_size will end up being 8K, so the "total
> pages" will cover less RAM than the actual size of the RAM bank.

ram_start at 2k but bank size being 13k would yield 2 usable pages
(first partial page of 2k unusable and last partial page of 3k
unusable), yet ram_size of 12k (3 pages). You need to consider the
common case; checking things work for a randomly chosen example isn't
enough.

>>> +        ram_start = min(ram_start, bank->start);
>>> +        ram_end = max(ram_end, bank_end);
>>> +
>>> +        setup_directmap_mappings(PFN_DOWN(bank->start),
>>> +                                 PFN_DOWN(bank->size));
>>
>> Similarly I don't think this is right when both start and size aren't
>> multiples of PAGE_SIZE. You may map an unusable partial page at the
>> start, and then fail to map a fully usable page at the end.
> ram_size should be a multiple of PAGE_SIZE because we have:
> ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>
> Do you know of any examples where bank->start isn't aligned to
> PAGE_SIZE?

Question is the other way around: Is it specified anywhere that start
(and size) _need_ to be aligned? And if it is - do all firmware makers
play by that (on x86 at least specifications often mean pretty little
to firmware people, apparently)?

> Should it be mentioned somewhere what a legal physical address for RAM
> start is? If it's not PAGE_SIZE-aligned, then it seems we have no
> choice but to use ALIGNUP(..., PAGE_SIZE), which would mean losing
> part of the bank.

Correct - partial pages simply cannot be used (except in adhoc ways,
which likely isn't very desirable).

Jan
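In code, Jan's worked example corresponds roughly to the sketch below
(illustrative only, not a posted patch; it assumes 1Gb level-2
superpages and the XEN_PT_LEVEL_SIZE() helper used elsewhere in the
port):

```c
/*
 * Sketch: keep only the in-superpage offset of the RAM start (the low
 * 30 bits for 1Gb mappings) and shift the "virtual" origin of the
 * directmap down by the masked-off part, so VA <-> PA stays a plain
 * offset computation.
 */
paddr_t base   = pfn_to_paddr(base_mfn);            /* e.g. 61.5 Gb */
paddr_t offset = base & (XEN_PT_LEVEL_SIZE(2) - 1); /* e.g. 1.5 Gb  */

/* PA 0 notionally translates to DIRECTMAP_VIRT_START - 60 Gb ...    */
directmap_virt_start = DIRECTMAP_VIRT_START - (base - offset);

/* ... so the first RAM page lands at DIRECTMAP_VIRT_START + 1.5 Gb,
 * and maddr_to_virt(ma) remains directmap_virt_start + ma.          */
```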
> > Finally, regarding masking off the top bits of mfn, I'm not
> > entirely clear on how this should work. If I understand correctly,
> > if I mask off certain top bits in mfn, then I would need to unmask
> > those same top bits in maddr_to_virt() and virt_to_maddr(). Is that
> > correct?
> >
> > Another point I'm unclear on is which specific part of the top bits
> > should be masked.
>
> You want to "move" the directmap such that the first legitimate RAM
> page is within the first (large/huge) page mapping of the directmap.
> IOW the "virtual" start of the directmap would move down in VA space.
> That still leaves things at a simple offset calculation when
> translating VA <-> PA.
>
> To give an example: Let's assume RAM starts at 61.5 Gb, and you want
> to use 1Gb mappings for the bulk of the directmap. Then the "virtual"
> start of the directmap would shift down to
> DIRECTMAP_VIRT_START - 60Gb, such that the first RAM page would be
> mapped at DIRECTMAP_VIRT_START + 1.5Gb. IOW it would be the low 30
> address bits of the start address that you use (30 - PAGE_SHIFT for
> the MFN), with the higher bits contributing to the offset involved in
> the VA <-> PA translation. Values used depend on the (largest) page
> size you mean to use for the direct map: On systems with terabytes of
> memory (demanding Sv48 or even Sv57 mode) you may want to use 512Gb
> mappings, and hence you'd then need to mask the low 39 bits (or 48
> for 256Tb mappings).

Thanks a lot for the clarification. IIUC, then not too many things
should be changed: only the directmap mapping virtual address and the
calculation of the proper virtual address start of the directmap:

--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -457,6 +457,7 @@ vaddr_t __ro_after_init directmap_virt_start;
 static void __init setup_directmap_mappings(unsigned long base_mfn,
                                             unsigned long nr_mfns)
 {
+    unsigned long base_addr = mfn_to_maddr(_mfn(base_mfn));
     int rc;
 
     /* First call sets the directmap physical and virtual offset. */
@@ -473,14 +474,14 @@ static void __init setup_directmap_mappings(unsigned long base_mfn,
          * Prevent that by offsetting the start of the directmap virtual
          * address.
          */
-        directmap_virt_start = DIRECTMAP_VIRT_START + pfn_to_paddr(base_mfn);
+        directmap_virt_start = DIRECTMAP_VIRT_START - (base_addr & ~XEN_PT_LEVEL_SIZE(HYP_PT_ROOT_LEVEL));
     }
 
     if ( base_mfn < mfn_x(directmap_mfn_start) )
         panic("cannot add directmap mapping at %#lx below heap start %#lx\n",
               base_mfn, mfn_x(directmap_mfn_start));
 
-    rc = map_pages_to_xen((vaddr_t)mfn_to_virt(base_mfn),
+    rc = map_pages_to_xen(DIRECTMAP_VIRT_START + (base_addr & XEN_PT_LEVEL_SIZE(HYP_PT_ROOT_LEVEL)),
                           _mfn(base_mfn), nr_mfns,
                           PAGE_HYPERVISOR_RW);

And of course then use directmap_virt_start in maddr_to_virt() and
virt_to_maddr():

@@ -31,7 +31,7 @@ static inline void *maddr_to_virt(paddr_t ma)
 
     ASSERT(va_offset < DIRECTMAP_SIZE);
 
-    return (void *)(XENHEAP_VIRT_START + va_offset);
+    return (void *)(directmap_virt_start + va_offset);
 }
 
 /*
@@ -44,7 +44,7 @@ static inline unsigned long virt_to_maddr(unsigned long va)
 {
     if ((va >= XENHEAP_VIRT_START) &&
         (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
-        return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
+        return directmapoff_to_maddr(va - directmap_virt_start);

> > > > +void __init setup_mm(void)
> > > > +{
> > > > +    const struct membanks *banks = bootinfo_get_mem();
> > > > +    paddr_t ram_start = INVALID_PADDR;
> > > > +    paddr_t ram_end = 0;
> > > > +    paddr_t ram_size = 0;
> > > > +    unsigned int i;
> > > > +
> > > > +    /*
> > > > +     * We need some memory to allocate the page-tables used
> > > > for the directmap
> > > > +     * mappings. But some regions may contain memory already
> > > > allocated
> > > > +     * for other uses (e.g. modules, reserved-memory...).
> > > > +     *
> > > > +     * For simplicity, add all the free regions in the boot
> > > > allocator.
> > > > +     */
> > > > +    populate_boot_allocator();
> > > > +
> > > > +    total_pages = 0;
> > > > +
> > > > +    for ( i = 0; i < banks->nr_banks; i++ )
> > > > +    {
> > > > +        const struct membank *bank = &banks->bank[i];
> > > > +        paddr_t bank_end = bank->start + bank->size;
> > > > +
> > > > +        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
> > >
> > > As before - if a bank doesn't cover full pages, this may give the
> > > impression of there being more "total pages" than there are.
> > Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
> > size of a bank is 11K, ram_size will end up being 8K, so the "total
> > pages" will cover less RAM than the actual size of the RAM bank.
>
> ram_start at 2k but bank size being 13k would yield 2 usable pages
> (first partial page of 2k unusable and last partial page of 3k
> unusable), yet ram_size of 12k (3 pages). You need to consider the
> common case; checking things work for a randomly chosen example isn't
> enough.

Then I have to check the start and end of the bank separately, and
check whether ram_size should be reduced in case the start or end isn't
properly aligned.

> > > > +        ram_start = min(ram_start, bank->start);
> > > > +        ram_end = max(ram_end, bank_end);
> > > > +
> > > > +        setup_directmap_mappings(PFN_DOWN(bank->start),
> > > > +                                 PFN_DOWN(bank->size));
> > >
> > > Similarly I don't think this is right when both start and size
> > > aren't multiples of PAGE_SIZE. You may map an unusable partial
> > > page at the start, and then fail to map a fully usable page at
> > > the end.
> > ram_size should be a multiple of PAGE_SIZE because we have:
> > ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
> >
> > Do you know of any examples where bank->start isn't aligned to
> > PAGE_SIZE?
>
> Question is the other way around: Is it specified anywhere that start
> (and size) _need_ to be aligned? And if it is - do all firmware makers
> play by that (on x86 at least specifications often mean pretty little
> to firmware people, apparently)?

Yes, I understand that; I tried to find that somewhere in the
priv/unpriv specs and wasn't able to find it. That is why I asked
whether it should be mentioned somewhere. Anyway, I think it will be
better just to update the code and make it work in any case.

Thanks.

~ Oleksii
On 31.10.2024 14:19, oleksii.kurochko@gmail.com wrote:
>>>>> +        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>>>>
>>>> As before - if a bank doesn't cover full pages, this may give the
>>>> impression of there being more "total pages" than there are.
>>> Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
>>> size of a bank is 11K, ram_size will end up being 8K, so the "total
>>> pages" will cover less RAM than the actual size of the RAM bank.
>>
>> ram_start at 2k but bank size being 13k would yield 2 usable pages
>> (first partial page of 2k unusable and last partial page of 3k
>> unusable), yet ram_size of 12k (3 pages). You need to consider the
>> common case; checking things work for a randomly chosen example isn't
>> enough.
> Then I have to check the start and end of the bank separately, and
> check whether ram_size should be reduced in case the start or end
> isn't properly aligned.

All I think you need to do is align bank start up to a page boundary
and align bank end down to a page boundary.

Jan
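A hedged sketch of what that clamping could look like in the setup_mm()
loop (illustrative only; this exact code was not posted in the thread):

```c
for ( i = 0; i < banks->nr_banks; i++ )
{
    const struct membank *bank = &banks->bank[i];
    /* Clamp to whole pages: round the start up, the end down. */
    paddr_t start = ROUNDUP(bank->start, PAGE_SIZE);
    paddr_t end = ROUNDDOWN(bank->start + bank->size, PAGE_SIZE);

    /* A bank smaller than one usable page contributes nothing. */
    if ( start >= end )
        continue;

    ram_size += end - start;
    ram_start = min(ram_start, start);
    ram_end = max(ram_end, end);

    setup_directmap_mappings(PFN_DOWN(start), PFN_DOWN(end - start));
}
```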