[PATCH 1/8] x86/pdx: simplify calculation of domain struct allocation boundary

Roger Pau Monne posted 8 patches 4 months, 3 weeks ago
There is a newer version of this series
Posted by Roger Pau Monne 4 months, 3 weeks ago
When not using CONFIG_BIGMEM there are restrictions on the address
width for allocations of the domain structure: its PDX, truncated to
32 bits, is stashed in the page_info structure of domain-allocated pages.

The current logic to calculate this limit is based on the internals of the
PDX compression used, which is not strictly required.  Instead simplify the
logic to rely on the existing PDX to PFN conversion helpers used elsewhere.

This has the added benefit of allowing alternative PDX compression
algorithms to be implemented without requiring changes to the calculation
of the domain structure allocation boundary.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/domain.c | 35 ++++++-----------------------------
 1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 7536b6c8717e..2f438ce367cf 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -461,30 +461,6 @@ void domain_cpu_policy_changed(struct domain *d)
     }
 }
 
-#if !defined(CONFIG_BIGMEM) && defined(CONFIG_PDX_COMPRESSION)
-/*
- * The hole may be at or above the 44-bit boundary, so we need to determine
- * the total bit count until reaching 32 significant (not squashed out) bits
- * in PFN representations.
- * Note that the way "bits" gets initialized/updated/bounds-checked guarantees
- * that the function will never return zero, and hence will never be called
- * more than once (which is important due to it being deliberately placed in
- * .init.text).
- */
-static unsigned int __init noinline _domain_struct_bits(void)
-{
-    unsigned int bits = 32 + PAGE_SHIFT;
-    unsigned int sig = hweight32(~pfn_hole_mask);
-    unsigned int mask = pfn_hole_mask >> 32;
-
-    for ( ; bits < BITS_PER_LONG && sig < 32; ++bits, mask >>= 1 )
-        if ( !(mask & 1) )
-            ++sig;
-
-    return bits;
-}
-#endif
-
 struct domain *alloc_domain_struct(void)
 {
     struct domain *d;
@@ -498,14 +474,15 @@ struct domain *alloc_domain_struct(void)
      * On systems with CONFIG_BIGMEM there's no packing, and so there's no
      * such restriction.
      */
-#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION)
-    const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 :
-                                                          32 + PAGE_SHIFT;
+#if defined(CONFIG_BIGMEM)
+    const unsigned int bits = 0;
 #else
-    static unsigned int __read_mostly bits;
+    static unsigned int __ro_after_init bits;
 
     if ( unlikely(!bits) )
-         bits = _domain_struct_bits();
+         bits = flsl(pfn_to_paddr(pdx_to_pfn(
+             1UL << (sizeof(((struct page_info *)NULL)->v.inuse._domain) * 8))))
+             - 1;
 #endif
 
     BUILD_BUG_ON(sizeof(*d) > PAGE_SIZE);
-- 
2.49.0


Re: [PATCH 1/8] x86/pdx: simplify calculation of domain struct allocation boundary
Posted by Jan Beulich 4 months, 2 weeks ago
On 11.06.2025 19:16, Roger Pau Monne wrote:
> @@ -498,14 +474,15 @@ struct domain *alloc_domain_struct(void)
>       * On systems with CONFIG_BIGMEM there's no packing, and so there's no
>       * such restriction.
>       */
> -#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION)
> -    const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 :
> -                                                          32 + PAGE_SHIFT;
> +#if defined(CONFIG_BIGMEM)
> +    const unsigned int bits = 0;
>  #else
> -    static unsigned int __read_mostly bits;
> +    static unsigned int __ro_after_init bits;
>  
>      if ( unlikely(!bits) )
> -         bits = _domain_struct_bits();
> +         bits = flsl(pfn_to_paddr(pdx_to_pfn(
> +             1UL << (sizeof(((struct page_info *)NULL)->v.inuse._domain) * 8))))
> +             - 1;

While Andrew did point you at sizeof_field(), we can have this even less verbose
by utilizing that frame_table is of the right type and (almost) globally in scope.

Further, why use pfn_to_paddr()?

         bits = flsl(pdx_to_pfn(1UL << 
                                (sizeof(frame_table->v.inuse._domain) * 8))) +
                PAGE_SHIFT - 1;

However, it further feels like this was off by one; we had similar issues over
time in several places. There potentially being a gap between one less than
the PDX used here and that very PDX, don't we need to calculate based on the
"one less" value here? Hmm, there being a gap means no allocation would
succeed for the precise value of "bits" (in the mask-compression scheme), so
functionally all would be fine. Yet just to avoid setting a bad precedent I
think we'd still be better off using

         bits = flsl(pdx_to_pfn((1UL << 
                                 (sizeof(frame_table->v.inuse._domain) * 8)) -
                                1)) + PAGE_SHIFT;

If one would log the value of bits, the result would then also be less
confusing in (at least) the mask-compression scheme.

Jan
Re: [PATCH 1/8] x86/pdx: simplify calculation of domain struct allocation boundary
Posted by Roger Pau Monné 4 months, 2 weeks ago
On Thu, Jun 12, 2025 at 11:03:21AM +0200, Jan Beulich wrote:
> On 11.06.2025 19:16, Roger Pau Monne wrote:
> > @@ -498,14 +474,15 @@ struct domain *alloc_domain_struct(void)
> >       * On systems with CONFIG_BIGMEM there's no packing, and so there's no
> >       * such restriction.
> >       */
> > -#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION)
> > -    const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 :
> > -                                                          32 + PAGE_SHIFT;
> > +#if defined(CONFIG_BIGMEM)
> > +    const unsigned int bits = 0;
> >  #else
> > -    static unsigned int __read_mostly bits;
> > +    static unsigned int __ro_after_init bits;
> >  
> >      if ( unlikely(!bits) )
> > -         bits = _domain_struct_bits();
> > +         bits = flsl(pfn_to_paddr(pdx_to_pfn(
> > +             1UL << (sizeof(((struct page_info *)NULL)->v.inuse._domain) * 8))))
> > +             - 1;
> 
> While Andrew did point you at sizeof_field(), we can have this even less verbose
> by utilizing that frame_table is of the right type and (almost) globally in scope.
> 
> Further, why use pfn_to_paddr()?
> 
>          bits = flsl(pdx_to_pfn(1UL << 
>                                 (sizeof(frame_table->v.inuse._domain) * 8))) +
>                 PAGE_SHIFT - 1;

I've introduced and used pdx_to_paddr(), which I think is more
natural.  We already had a paddr_to_pdx() which was missing its
bidirectional equivalent.  It's now:

         bits = flsl(pdx_to_paddr(1UL << (sizeof_field(struct page_info,
                                                       v.inuse._domain) * 8)))
                - 1;

> However, it further feels like this was off by one; we had similar issues over
> time in several places. There potentially being a gap between one less than
> the PDX used here and that very PDX, don't we need to calculate based on the
> "one less" value here? Hmm, there being a gap means no allocation would
> succeed for the precise value of "bits" (in the mask-compression scheme), so
> functionally all would be fine. Yet just to avoid setting a bad precedent I
> think we'd still be better off using
> 
>          bits = flsl(pdx_to_pfn((1UL << 
>                                  (sizeof(frame_table->v.inuse._domain) * 8)) -
>                                 1)) + PAGE_SHIFT;
> 
> If one would log the value of bits, the result would then also be less
> confusing in (at least) the mask-compression scheme.

Is the above correct though?

Take for example the hypothetical case where pdx_to_pfn() returns
0x10.  Then flsl() will return 5 (let's leave the PAGE_SHIFT
adjustment out for the example here).  The allocation bit width would
be off-by-one, because allocating using a bit width of 5 would also
allow 0x11 to be allocated, and that won't be correct.

I think we need to get the bit width of the next pdx (so the
non-inclusive end of the range), and then subtract 1 from it,
otherwise the allocation bit width is possibly off-by-one.

Thanks, Roger.
Re: [PATCH 1/8] x86/pdx: simplify calculation of domain struct allocation boundary
Posted by Jan Beulich 4 months, 2 weeks ago
On 12.06.2025 12:46, Roger Pau Monné wrote:
> On Thu, Jun 12, 2025 at 11:03:21AM +0200, Jan Beulich wrote:
>> On 11.06.2025 19:16, Roger Pau Monne wrote:
>>> @@ -498,14 +474,15 @@ struct domain *alloc_domain_struct(void)
>>>       * On systems with CONFIG_BIGMEM there's no packing, and so there's no
>>>       * such restriction.
>>>       */
>>> -#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION)
>>> -    const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 :
>>> -                                                          32 + PAGE_SHIFT;
>>> +#if defined(CONFIG_BIGMEM)
>>> +    const unsigned int bits = 0;
>>>  #else
>>> -    static unsigned int __read_mostly bits;
>>> +    static unsigned int __ro_after_init bits;
>>>  
>>>      if ( unlikely(!bits) )
>>> -         bits = _domain_struct_bits();
>>> +         bits = flsl(pfn_to_paddr(pdx_to_pfn(
>>> +             1UL << (sizeof(((struct page_info *)NULL)->v.inuse._domain) * 8))))
>>> +             - 1;
>>
>> While Andrew did point you at sizeof_field(), we can have this even less verbose
>> by utilizing that frame_table is of the right type and (almost) globally in scope.
>>
>> Further, why use pfn_to_paddr()?
>>
>>          bits = flsl(pdx_to_pfn(1UL << 
>>                                 (sizeof(frame_table->v.inuse._domain) * 8))) +
>>                 PAGE_SHIFT - 1;
> 
> I've introduced and used pdx_to_paddr(), which I think it's more
> natural.  We already had a paddr_to_pdx() which was missing it's
> bidirectional equivalent.  It's now:
> 
>          bits = flsl(pdx_to_paddr(1UL << (sizeof_field(struct page_info,
>                                                        v.inuse._domain) * 8)))
>                 - 1;

Textually this is better, yes. I won't insist on the other variant, while
still noting that your way there's an extra shift whereas my way there's
merely an extra add.

>> However, it further feels like this was off by one; we had similar issues over
>> time in several places. There potentially being a gap between one less than
>> the PDX used here and that very PDX, don't we need to calculate based on the
>> "one less" value here? Hmm, there being a gap means no allocation would
>> succeed for the precise value of "bits" (in the mask-compression scheme), so
>> functionally all would be fine. Yet just to avoid setting a bad precedent I
>> think we'd still be better off using
>>
>>          bits = flsl(pdx_to_pfn((1UL << 
>>                                  (sizeof(frame_table->v.inuse._domain) * 8)) -
>>                                 1)) + PAGE_SHIFT;
>>
>> If one would log the value of bits, the result would then also be less
>> confusing in (at least) the mask-compression scheme.
> 
> 
> Is the above correct tough?
> 
> Take for example the hypothetical case where pdx_to_pfn() returns
> 0x10.

Hmm, yes - while impossible in the mask-compression scheme, it is in
principle possible with other schemes (like the offset one).

>  Then flsl() will return 5 (let's leave the PAGE_SHIFT
> adjustment out for the example here).  The allocation bit width would
> be off-by-one, because allocating using a bit width of 5 would also
> allow 0x11 to be allocated, and that won't be correct.
> 
> I think we need to get the bit width of the next pdx (so the
> non-inclusive end of the range), and then subtract 1 from it,
> otherwise the allocation bit width is possibly off-by-one.

I think you're right, and I can't really see how to (easily) get the more
precise value for the mask-compression scheme then. I would therefore
like to ask that you attach a comment clarifying the slight oddity.

Jan

Re: [PATCH 1/8] x86/pdx: simplify calculation of domain struct allocation boundary
Posted by Andrew Cooper 4 months, 3 weeks ago
On 11/06/2025 6:16 pm, Roger Pau Monne wrote:
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 7536b6c8717e..2f438ce367cf 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -461,30 +461,6 @@ void domain_cpu_policy_changed(struct domain *d)
>      }
>  }
>  
> -#if !defined(CONFIG_BIGMEM) && defined(CONFIG_PDX_COMPRESSION)
> -/*
> - * The hole may be at or above the 44-bit boundary, so we need to determine
> - * the total bit count until reaching 32 significant (not squashed out) bits
> - * in PFN representations.
> - * Note that the way "bits" gets initialized/updated/bounds-checked guarantees
> - * that the function will never return zero, and hence will never be called
> - * more than once (which is important due to it being deliberately placed in
> - * .init.text).
> - */
> -static unsigned int __init noinline _domain_struct_bits(void)
> -{
> -    unsigned int bits = 32 + PAGE_SHIFT;
> -    unsigned int sig = hweight32(~pfn_hole_mask);
> -    unsigned int mask = pfn_hole_mask >> 32;
> -
> -    for ( ; bits < BITS_PER_LONG && sig < 32; ++bits, mask >>= 1 )
> -        if ( !(mask & 1) )
> -            ++sig;
> -
> -    return bits;
> -}
> -#endif
> -

I'm very happy to see this disappear.  Both because of a non-__init
function calling an __init function, and because this internal is just
horrible.

>  struct domain *alloc_domain_struct(void)
>  {
>      struct domain *d;
> @@ -498,14 +474,15 @@ struct domain *alloc_domain_struct(void)
>       * On systems with CONFIG_BIGMEM there's no packing, and so there's no
>       * such restriction.
>       */
> -#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION)
> -    const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 :
> -                                                          32 + PAGE_SHIFT;
> +#if defined(CONFIG_BIGMEM)
> +    const unsigned int bits = 0;
>  #else
> -    static unsigned int __read_mostly bits;
> +    static unsigned int __ro_after_init bits;
>  
>      if ( unlikely(!bits) )
> -         bits = _domain_struct_bits();
> +         bits = flsl(pfn_to_paddr(pdx_to_pfn(
> +             1UL << (sizeof(((struct page_info *)NULL)->v.inuse._domain) * 8))))
> +             - 1;

I think this would benefit greatly by not being a oneliner.  There's
sizeof_field() which helps a little.

But, isn't this UB with CONFIG_BIGMEM?  You're shifting 1UL by 64.

When __pdx_t is unsigned long, there's no bits restriction necessary. 
Therefore, don't you want !bits && sizeof_field(...) < BYTES_PER_LONG as
the entry criteria?

~Andrew

Re: [PATCH 1/8] x86/pdx: simplify calculation of domain struct allocation boundary
Posted by Roger Pau Monné 4 months, 2 weeks ago
On Wed, Jun 11, 2025 at 06:58:31PM +0100, Andrew Cooper wrote:
> On 11/06/2025 6:16 pm, Roger Pau Monne wrote:
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index 7536b6c8717e..2f438ce367cf 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -461,30 +461,6 @@ void domain_cpu_policy_changed(struct domain *d)
> >      }
> >  }
> >  
> > -#if !defined(CONFIG_BIGMEM) && defined(CONFIG_PDX_COMPRESSION)
> > -/*
> > - * The hole may be at or above the 44-bit boundary, so we need to determine
> > - * the total bit count until reaching 32 significant (not squashed out) bits
> > - * in PFN representations.
> > - * Note that the way "bits" gets initialized/updated/bounds-checked guarantees
> > - * that the function will never return zero, and hence will never be called
> > - * more than once (which is important due to it being deliberately placed in
> > - * .init.text).
> > - */
> > -static unsigned int __init noinline _domain_struct_bits(void)
> > -{
> > -    unsigned int bits = 32 + PAGE_SHIFT;
> > -    unsigned int sig = hweight32(~pfn_hole_mask);
> > -    unsigned int mask = pfn_hole_mask >> 32;
> > -
> > -    for ( ; bits < BITS_PER_LONG && sig < 32; ++bits, mask >>= 1 )
> > -        if ( !(mask & 1) )
> > -            ++sig;
> > -
> > -    return bits;
> > -}
> > -#endif
> > -
> 
> I'm very happy to see this disappear.  Both because of a non-__init
> function calling an __init function, and because this internal is just
> horrible.
> 
> >  struct domain *alloc_domain_struct(void)
> >  {
> >      struct domain *d;
> > @@ -498,14 +474,15 @@ struct domain *alloc_domain_struct(void)
> >       * On systems with CONFIG_BIGMEM there's no packing, and so there's no
> >       * such restriction.
> >       */
> > -#if defined(CONFIG_BIGMEM) || !defined(CONFIG_PDX_COMPRESSION)
> > -    const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 :
> > -                                                          32 + PAGE_SHIFT;
> > +#if defined(CONFIG_BIGMEM)
> > +    const unsigned int bits = 0;
> >  #else
> > -    static unsigned int __read_mostly bits;
> > +    static unsigned int __ro_after_init bits;
> >  
> >      if ( unlikely(!bits) )
> > -         bits = _domain_struct_bits();
> > +         bits = flsl(pfn_to_paddr(pdx_to_pfn(
> > +             1UL << (sizeof(((struct page_info *)NULL)->v.inuse._domain) * 8))))
> > +             - 1;
> 
> I think this would benefit greatly by not being a oneliner.  There's
> sizeof_field() which helps a little.

Oh, I missed that.  I was expecting to find something like
member_size() or similar.

> But, isn't this UB with CONFIG_BIGMEM?  You're shifting 1UL by 64.

CONFIG_BIGMEM doesn't get here (see the #ifdef above).

> When __pdx_t is unsigned long, there's no bits restriction necessary. 
> Therefore, don't you want !bits && sizeof_field(...) < BYTES_PER_LONG as
> the entry criteria?

__pdx_t being unsigned long can only happen for CONFIG_BIGMEM, and
that's still handled separately using a #if defined(CONFIG_BIGMEM) ...
(which I should have converted to #ifdef instead).

Thanks, Roger.