There's a loop in arch_init_memory() that iterates over holes and non-RAM
regions to possibly mark any page_info structures matching those addresses
as IO. The looping there is done over the PFN space.
PFNs not covered by the PDX space will always fail the mfn_valid() check,
hence re-write the loop to iterate over the PDX space and avoid checking
any holes that are not covered by the PDX translation.
On a system with a ~6TiB hole this change together with using PDX
compression reduces boot time in approximately 20 seconds. Xen boot time
without the change is ~50s, with the change it's ~30s.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
- New in this version.
---
xen/arch/x86/mm.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e7fd56c7ce90..647bf0b41db6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info *page)
void __init arch_init_memory(void)
{
- unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
+ unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, pdx;
/*
* Basic guest-accessible flags:
@@ -328,9 +328,14 @@ void __init arch_init_memory(void)
destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
(unsigned long)mfn_to_virt(ioend_pfn));
- /* Mark as I/O up to next RAM region. */
- for ( ; pfn < rstart_pfn; pfn++ )
+ /*
+ * Mark as I/O up to next RAM region. Iterate over the PDX space to
+ * skip holes which would always fail the mfn_valid() check.
+ */
+ for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )
{
+ pfn = pdx_to_pfn(pdx);
+
if ( !mfn_valid(_mfn(pfn)) )
continue;
--
2.49.0
On 24.07.2025 13:04, Roger Pau Monne wrote:
> There's a loop in arch_init_memory() that iterates over holes and non-RAM
> regions to possibly mark any page_info structures matching those addresses
> as IO. The looping there is done over the PFN space.
>
> PFNs not covered by the PDX space will always fail the mfn_valid() check,
> hence re-write the loop to iterate over the PDX space and avoid checking
> any holes that are not covered by the PDX translation.
>
> On a system with a ~6TiB hole this change together with using PDX
> compression reduces boot time in approximately 20 seconds. Xen boot time
> without the change is ~50s, with the change it's ~30s.
That's nice, and I agree what we currently do isn't very efficient, but ...
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info *page)
>
> void __init arch_init_memory(void)
> {
> - unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
> + unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, pdx;
>
> /*
> * Basic guest-accessible flags:
> @@ -328,9 +328,14 @@ void __init arch_init_memory(void)
> destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
> (unsigned long)mfn_to_virt(ioend_pfn));
>
> - /* Mark as I/O up to next RAM region. */
> - for ( ; pfn < rstart_pfn; pfn++ )
> + /*
> + * Mark as I/O up to next RAM region. Iterate over the PDX space to
> + * skip holes which would always fail the mfn_valid() check.
> + */
> + for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )
... pfn_to_pdx() isn't well-defined for a non-RAM PFN, or more precisely for any
PFN that fails the mfn_valid() check. That is, I think, particularly noticeable
with the new offset compression you introduce.
Jan
On Tue, Jul 29, 2025 at 04:33:53PM +0200, Jan Beulich wrote:
> On 24.07.2025 13:04, Roger Pau Monne wrote:
> > There's a loop in arch_init_memory() that iterates over holes and non-RAM
> > regions to possibly mark any page_info structures matching those addresses
> > as IO. The looping there is done over the PFN space.
> >
> > PFNs not covered by the PDX space will always fail the mfn_valid() check,
> > hence re-write the loop to iterate over the PDX space and avoid checking
> > any holes that are not covered by the PDX translation.
> >
> > On a system with a ~6TiB hole this change together with using PDX
> > compression reduces boot time in approximately 20 seconds. Xen boot time
> > without the change is ~50s, with the change it's ~30s.
>
> That's nice, and I agree what we currently do isn't very efficient, but ...
>
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info *page)
> >
> > void __init arch_init_memory(void)
> > {
> > - unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
> > + unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, pdx;
> >
> > /*
> > * Basic guest-accessible flags:
> > @@ -328,9 +328,14 @@ void __init arch_init_memory(void)
> > destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
> > (unsigned long)mfn_to_virt(ioend_pfn));
> >
> > - /* Mark as I/O up to next RAM region. */
> > - for ( ; pfn < rstart_pfn; pfn++ )
> > + /*
> > + * Mark as I/O up to next RAM region. Iterate over the PDX space to
> > + * skip holes which would always fail the mfn_valid() check.
> > + */
> > + for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )
>
> ... pfn_to_pdx() isn't well-defined for a non-RAM PFN, or more precisely for any
> PFN that fails the mfn_valid() check. That is, I think, particularly noticeable
> with the new offset compression you introduce.
rstart_pfn will always point to the start of the next RAM region (or
the end of the current region if it's the last one). So for that case
pfn_to_pdx() is always provided a RAM PFN as input parameter.
However for the pfn parameter, we would need to do pfn_to_pdx(pfn -
1), as that's the last address in the previous RAM range. The loop
would then possibly be:
for ( pdx = pfn_to_pdx((pfn ?: 1) - 1) + 1; pdx < pfn_to_pdx(rstart_pfn); pdx++ )
{
...
This also assumes that PFN 0 will always have a valid PDX translation,
regardless of whether it's RAM or not (which is the case given the PDX
code currently used).
Thanks, Roger.
On 29.07.2025 18:54, Roger Pau Monné wrote:
> On Tue, Jul 29, 2025 at 04:33:53PM +0200, Jan Beulich wrote:
>> On 24.07.2025 13:04, Roger Pau Monne wrote:
>>> There's a loop in arch_init_memory() that iterates over holes and non-RAM
>>> regions to possibly mark any page_info structures matching those addresses
>>> as IO. The looping there is done over the PFN space.
>>>
>>> PFNs not covered by the PDX space will always fail the mfn_valid() check,
>>> hence re-write the loop to iterate over the PDX space and avoid checking
>>> any holes that are not covered by the PDX translation.
>>>
>>> On a system with a ~6TiB hole this change together with using PDX
>>> compression reduces boot time in approximately 20 seconds. Xen boot time
>>> without the change is ~50s, with the change it's ~30s.
>>
>> That's nice, and I agree what we currently do isn't very efficient, but ...
>>
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info *page)
>>>
>>> void __init arch_init_memory(void)
>>> {
>>> - unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
>>> + unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, pdx;
>>>
>>> /*
>>> * Basic guest-accessible flags:
>>> @@ -328,9 +328,14 @@ void __init arch_init_memory(void)
>>> destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
>>> (unsigned long)mfn_to_virt(ioend_pfn));
>>>
>>> - /* Mark as I/O up to next RAM region. */
>>> - for ( ; pfn < rstart_pfn; pfn++ )
>>> + /*
>>> + * Mark as I/O up to next RAM region. Iterate over the PDX space to
>>> + * skip holes which would always fail the mfn_valid() check.
>>> + */
>>> + for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )
>>
>> ... pfn_to_pdx() isn't well-defined for a non-RAM PFN, or more precisely for any
>> PFN that fails the mfn_valid() check. That is, I think, particularly noticeable
>> with the new offset compression you introduce.
>
> rstart_pfn will always point to the start of the next RAM region (or
> the end of the current region if it's the last one). So for that case
> pfn_to_pdx() is always provided a RAM PFN as input parameter.
>
> However for the pfn parameter, we would need to do pfn_to_pdx(pfn -
> 1), as that's the last address in the previous RAM range. The loop
> would then possibly be:
>
> for ( pdx = pfn_to_pdx((pfn ?: 1) - 1) + 1; pdx < pfn_to_pdx(rstart_pfn); pdx++ )
> {
> ...
>
> This also assumes that PFN 0 will always have a valid PDX translation,
> regardless of whether it's RAM or not (which is the case given the PDX
> code currently used).
Looks good to me. The caveat may then want mentioning in the comment as
well.
Jan
© 2016 - 2025 Red Hat, Inc.