[PATCH v3 8/8] x86/mm: adjust loop in arch_init_memory() to iterate over the PDX space

Posted by Roger Pau Monne 3 months, 1 week ago
There's a loop in arch_init_memory() that iterates over holes and non-RAM
regions to possibly mark any page_info structures matching those addresses
as I/O.  The looping there is done over the PFN space.

PFNs not covered by the PDX space will always fail the mfn_valid() check,
hence rewrite the loop to iterate over the PDX space and avoid checking
any holes that are not covered by the PDX translation.

On a system with a ~6TiB hole, this change together with PDX compression
reduces boot time by approximately 20 seconds.  Xen boot time without the
change is ~50s; with the change it's ~30s.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - New in this version.
---
 xen/arch/x86/mm.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e7fd56c7ce90..647bf0b41db6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info *page)
 
 void __init arch_init_memory(void)
 {
-    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
+    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, pdx;
 
     /*
      * Basic guest-accessible flags:
@@ -328,9 +328,14 @@ void __init arch_init_memory(void)
             destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
                                  (unsigned long)mfn_to_virt(ioend_pfn));
 
-        /* Mark as I/O up to next RAM region. */
-        for ( ; pfn < rstart_pfn; pfn++ )
+        /*
+         * Mark as I/O up to next RAM region.  Iterate over the PDX space to
+         * skip holes which would always fail the mfn_valid() check.
+         */
+        for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )
         {
+            pfn = pdx_to_pfn(pdx);
+
             if ( !mfn_valid(_mfn(pfn)) )
                 continue;
 
-- 
2.49.0
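
Not part of the patch, but to illustrate the effect: a standalone toy
model with made-up masks and simplified pfn_to_pdx()/pdx_to_pfn()
stand-ins, rather than the real helpers from xen/include/xen/pdx.h.

/*
 * Standalone toy model, not the Xen implementation: RAM lives at PFNs
 * [0, 0x100) and [0x1000000, 0x1000100), and the PDX machinery is
 * assumed to compress away PFN bits 12..23, which are zero for all RAM.
 */
#include <stdio.h>

#define HOLE_SHIFT 12

static unsigned long pfn_to_pdx(unsigned long pfn)
{
    return (pfn & 0xfff) | ((pfn >> 24) << HOLE_SHIFT);
}

static unsigned long pdx_to_pfn(unsigned long pdx)
{
    return (pdx & 0xfff) | ((pdx >> HOLE_SHIFT) << 24);
}

int main(void)
{
    unsigned long pfn = 0x100;            /* end of the first RAM region */
    unsigned long rstart_pfn = 0x1000000; /* start of the next RAM region */
    unsigned long pdx, visited = 0;

    /*
     * The old loop, for ( ; pfn < rstart_pfn; pfn++ ), would take ~16.7M
     * iterations, nearly all of them PFNs inside the hole where
     * mfn_valid() is guaranteed to fail.  Iterating over the compressed
     * PDX space instead skips the hole entirely.
     */
    for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )
    {
        unsigned long p = pdx_to_pfn(pdx);

        /* The mfn_valid(_mfn(p)) check and I/O marking would go here. */
        (void)p;
        visited++;
    }

    printf("PDX iterations: %lu (PFN loop would do %lu)\n",
           visited, rstart_pfn - pfn);
    return 0;
}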


Re: [PATCH v3 8/8] x86/mm: adjust loop in arch_init_memory() to iterate over the PDX space
Posted by Jan Beulich 3 months ago
On 24.07.2025 13:04, Roger Pau Monne wrote:
> There's a loop in arch_init_memory() that iterates over holes and non-RAM
> regions to possibly mark any page_info structures matching those addresses
> as I/O.  The looping there is done over the PFN space.
> 
> PFNs not covered by the PDX space will always fail the mfn_valid() check,
> hence rewrite the loop to iterate over the PDX space and avoid checking
> any holes that are not covered by the PDX translation.
> 
> On a system with a ~6TiB hole, this change together with PDX compression
> reduces boot time by approximately 20 seconds.  Xen boot time without the
> change is ~50s; with the change it's ~30s.

That's nice, and I agree what we currently do isn't very efficient, but ...

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info *page)
>  
>  void __init arch_init_memory(void)
>  {
> -    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
> +    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, pdx;
>  
>      /*
>       * Basic guest-accessible flags:
> @@ -328,9 +328,14 @@ void __init arch_init_memory(void)
>              destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
>                                   (unsigned long)mfn_to_virt(ioend_pfn));
>  
> -        /* Mark as I/O up to next RAM region. */
> -        for ( ; pfn < rstart_pfn; pfn++ )
> +        /*
> +         * Mark as I/O up to next RAM region.  Iterate over the PDX space to
> +         * skip holes which would always fail the mfn_valid() check.
> +         */
> +        for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )

... pfn_to_pdx() isn't well-defined for a non-RAM PFN, or more precisely for any
PFN that fails the mfn_valid() check. That is, I think, particularly noticeable
with the new offset compression you introduce.
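
For reference, the mask-based translation simply discards the bits under
pfn_hole_mask, so an uncovered PFN silently aliases some covered one
(simplified from xen/include/xen/pdx.h; the comment is mine):

static inline unsigned long pfn_to_pdx(unsigned long pfn)
{
    /* Hole bits are dropped: PFNs differing only in those bits collide. */
    return (pfn & pfn_pdx_bottom_mask) |
           ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
}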

Jan
Re: [PATCH v3 8/8] x86/mm: adjust loop in arch_init_memory() to iterate over the PDX space
Posted by Roger Pau Monné 3 months ago
On Tue, Jul 29, 2025 at 04:33:53PM +0200, Jan Beulich wrote:
> On 24.07.2025 13:04, Roger Pau Monne wrote:
> > There's a loop in arch_init_memory() that iterates over holes and non-RAM
> > regions to possibly mark any page_info structures matching those addresses
> > as I/O.  The looping there is done over the PFN space.
> > 
> > PFNs not covered by the PDX space will always fail the mfn_valid() check,
> > hence rewrite the loop to iterate over the PDX space and avoid checking
> > any holes that are not covered by the PDX translation.
> > 
> > On a system with a ~6TiB hole, this change together with PDX compression
> > reduces boot time by approximately 20 seconds.  Xen boot time without the
> > change is ~50s; with the change it's ~30s.
> 
> That's nice, and I agree what we currently do isn't very efficient, but ...
> 
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info *page)
> >  
> >  void __init arch_init_memory(void)
> >  {
> > -    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
> > +    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, pdx;
> >  
> >      /*
> >       * Basic guest-accessible flags:
> > @@ -328,9 +328,14 @@ void __init arch_init_memory(void)
> >              destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
> >                                   (unsigned long)mfn_to_virt(ioend_pfn));
> >  
> > -        /* Mark as I/O up to next RAM region. */
> > -        for ( ; pfn < rstart_pfn; pfn++ )
> > +        /*
> > +         * Mark as I/O up to next RAM region.  Iterate over the PDX space to
> > +         * skip holes which would always fail the mfn_valid() check.
> > +         */
> > +        for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )
> 
> ... pfn_to_pdx() isn't well-defined for a non-RAM PFN, or more precisely for any
> PFN that fails the mfn_valid() check. That is, I think, particularly noticeable
> with the new offset compression you introduce.

rstart_pfn will always point to the start of the next RAM region (or
the end of the current region if it's the last one).  So for that case
pfn_to_pdx() is always provided a RAM PFN as input parameter.

However for the pfn parameter, we would need to do pfn_to_pdx(pfn -
1), as that's the last address in the previous RAM range.  The loop
would then possibly be:

for ( pdx = pfn_to_pdx((pfn ?: 1) - 1) + 1; pdx < pfn_to_pdx(rstart_pfn); pdx++ )
{
    ...

This also assumes that PFN 0 will always have a valid PDX translation,
regardless of whether it's RAM or not (which is the case given the PDX
code currently used).
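
To make the failure mode concrete with toy masks (PFN bits 12..23
compressed away; a standalone sketch, not the Xen code): a RAM region
ending exactly at the hole boundary translates to a bogus PDX, while
translating its last PFN and advancing in PDX space does not.

#include <stdio.h>

#define HOLE_SHIFT 12  /* toy: PFN bits 12..23 are compressed away */

static unsigned long pfn_to_pdx(unsigned long pfn)
{
    return (pfn & 0xfff) | ((pfn >> 24) << HOLE_SHIFT);
}

int main(void)
{
    unsigned long pfn = 0x1000;  /* end of a RAM region [0, 0x1000) */

    /* Naive start: bit 12 is a hole bit and gets dropped, aliasing PFN 0. */
    printf("pfn_to_pdx(pfn)         = %#lx\n", pfn_to_pdx(pfn));
    /* Translate the last RAM PFN instead, then advance in PDX space. */
    printf("pfn_to_pdx(pfn - 1) + 1 = %#lx\n", pfn_to_pdx(pfn - 1) + 1);
    return 0;
}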

Thanks, Roger.
Re: [PATCH v3 8/8] x86/mm: adjust loop in arch_init_memory() to iterate over the PDX space
Posted by Jan Beulich 3 months ago
On 29.07.2025 18:54, Roger Pau Monné wrote:
> On Tue, Jul 29, 2025 at 04:33:53PM +0200, Jan Beulich wrote:
>> On 24.07.2025 13:04, Roger Pau Monne wrote:
>>> There's a loop in arch_init_memory() that iterates over holes and non-RAM
>>> regions to possibly mark any page_info structures matching those addresses
>>> as I/O.  The looping there is done over the PFN space.
>>>
>>> PFNs not covered by the PDX space will always fail the mfn_valid() check,
>>> hence rewrite the loop to iterate over the PDX space and avoid checking
>>> any holes that are not covered by the PDX translation.
>>>
>>> On a system with a ~6TiB hole, this change together with PDX compression
>>> reduces boot time by approximately 20 seconds.  Xen boot time without the
>>> change is ~50s; with the change it's ~30s.
>>
>> That's nice, and I agree what we currently do isn't very efficient, but ...
>>
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -275,7 +275,7 @@ static void __init assign_io_page(struct page_info *page)
>>>  
>>>  void __init arch_init_memory(void)
>>>  {
>>> -    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
>>> +    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn, pdx;
>>>  
>>>      /*
>>>       * Basic guest-accessible flags:
>>> @@ -328,9 +328,14 @@ void __init arch_init_memory(void)
>>>              destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
>>>                                   (unsigned long)mfn_to_virt(ioend_pfn));
>>>  
>>> -        /* Mark as I/O up to next RAM region. */
>>> -        for ( ; pfn < rstart_pfn; pfn++ )
>>> +        /*
>>> +         * Mark as I/O up to next RAM region.  Iterate over the PDX space to
>>> +         * skip holes which would always fail the mfn_valid() check.
>>> +         */
>>> +        for ( pdx = pfn_to_pdx(pfn); pdx < pfn_to_pdx(rstart_pfn); pdx++ )
>>
>> ... pfn_to_pdx() isn't well-defined for a non-RAM PFN, or more precisely for any
>> PFN that fails the mfn_valid() check. That is, I think, particularly noticeable
>> with the new offset compression you introduce.
> 
> rstart_pfn will always point to the start of the next RAM region (or
> the end of the current region if it's the last one).  So for that case
> pfn_to_pdx() is always provided a RAM PFN as input parameter.
> 
> However for the pfn parameter, we would need to do pfn_to_pdx(pfn -
> 1), as that's the last address in the previous RAM range.  The loop
> would then possibly be:
> 
> for ( pdx = pfn_to_pdx((pfn ?: 1) - 1) + 1; pdx < pfn_to_pdx(rstart_pfn); pdx++ )
> {
>     ...
> 
> This also assumes that PFN 0 will always have a valid PDX translation,
> regardless of whether it's RAM or not (which is the case given the PDX
> code currently used).

Looks good to me. The caveat may then want mentioning in the comment as
well.
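
E.g. (only a sketch of how the hunk might then read; comment wording
illustrative, not final):

        /*
         * Mark as I/O up to next RAM region.  Iterate over the PDX space to
         * skip holes which would always fail the mfn_valid() check.  pfn
         * itself may sit in a hole with no well-defined PDX translation, so
         * start from the last covered PFN (the tail of the previous RAM
         * region, or PFN 0, which is assumed to always translate) plus one.
         */
        for ( pdx = pfn_to_pdx((pfn ?: 1) - 1) + 1;
              pdx < pfn_to_pdx(rstart_pfn); pdx++ )
        {
            pfn = pdx_to_pfn(pdx);

            if ( !mfn_valid(_mfn(pfn)) )
                continue;
            ...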

Jan