[PATCH v2 3/3] xen/mm: limit non-scrubbed allocations to a specific order

Roger Pau Monne posted 3 patches 1 week, 3 days ago
There is a newer version of this series
Posted by Roger Pau Monne 1 week, 3 days ago
The current logic allows for up to 1G pages to be scrubbed in place, which
can cause the watchdog to trigger in practice.  Reduce the limit for
in-place scrubbed allocations to a newly introduced define:
CONFIG_DIRTY_MAX_ORDER.  This currently defaults to CONFIG_DOMU_MAX_ORDER
on all architectures.  Also introduce a command line option to set the
value.

Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Split from previous patch.
 - Introduce a command line option to set the limit.
---
 docs/misc/xen-command-line.pandoc |  9 +++++++++
 xen/common/page_alloc.c           | 23 ++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 50d7edb2488e..65b4dfc826b5 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1822,6 +1822,15 @@ Specify the deepest C-state CPUs are permitted to be placed in, and
 optionally the maximum sub C-state to be used used.  The latter only applies
 to the highest permitted C-state.
 
+### max-order-dirty
+> `= <integer>`
+
+Specify the maximum allocation order allowed when scrubbing allocated pages
+in-place.  The allocation is non-preemptive, and hence the value must be kept
+low enough to avoid hogging the CPU for too long.
+
+Defaults to `CONFIG_DIRTY_MAX_ORDER` or if unset to `CONFIG_DOMU_MAX_ORDER`.
+
 ### max_gsi_irqs (x86)
 > `= <integer>`
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index c9e82fd7ab62..728b4d6c9861 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -267,6 +267,13 @@ static PAGE_LIST_HEAD(page_offlined_list);
 /* Broken page list, protected by heap_lock. */
 static PAGE_LIST_HEAD(page_broken_list);
 
+/* Maximum order allowed for allocations with MEMF_no_scrub. */
+#ifndef CONFIG_DIRTY_MAX_ORDER
+# define CONFIG_DIRTY_MAX_ORDER CONFIG_DOMU_MAX_ORDER
+#endif
+static unsigned int __ro_after_init dirty_max_order = CONFIG_DIRTY_MAX_ORDER;
+integer_param("max-order-dirty", dirty_max_order);
+
 /*************************
  * BOOT-TIME ALLOCATOR
  */
@@ -1008,7 +1015,13 @@ static struct page_info *alloc_heap_pages(
 
     pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
     /* Try getting a dirty buddy if we couldn't get a clean one. */
-    if ( !pg && !(memflags & MEMF_no_scrub) )
+    if ( !pg && !(memflags & MEMF_no_scrub) &&
+         /*
+          * Allow any order unscrubbed allocations during boot time, we
+          * compensate by processing softirqs in the scrubbing loop below once
+          * irqs are enabled.
+          */
+         (order <= dirty_max_order || system_state < SYS_STATE_active) )
         pg = get_free_buddy(zone_lo, zone_hi, order,
                             memflags | MEMF_no_scrub, d);
     if ( !pg )
@@ -1117,6 +1130,14 @@ static struct page_info *alloc_heap_pages(
                     scrub_one_page(&pg[i], cold);
 
                 dirty_cnt++;
+
+                /*
+                 * Use SYS_STATE_smp_boot explicitly; ahead of that state
+                 * interrupts are disabled.
+                 */
+                if ( system_state == SYS_STATE_smp_boot &&
+                     !(dirty_cnt & 0xff) )
+                    process_pending_softirqs();
             }
             else
                 check_one_page(&pg[i]);
-- 
2.51.0
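
To make the numbers in the patch concrete, here is a minimal standalone sketch (not Xen code) of the two checks it adds. It assumes 4 KiB base pages and uses a cut-down copy of the system_state enumeration; the helper names order_to_bytes(), may_take_dirty_buddy() and should_process_softirqs() are made up for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages. */
#define PAGE_SHIFT 12

/* Subset of Xen's system states; only the relative ordering matters here. */
enum system_states {
    SYS_STATE_early_boot,
    SYS_STATE_boot,
    SYS_STATE_smp_boot,
    SYS_STATE_active,
};

/* Number of bytes covered by a single allocation of the given order. */
static uint64_t order_to_bytes(unsigned int order)
{
    return (uint64_t)1 << (order + PAGE_SHIFT);
}

/*
 * Same shape as the condition added to alloc_heap_pages(): a dirty buddy
 * may be used if the order is within the limit, or if the system has not
 * reached the active state yet.
 */
static bool may_take_dirty_buddy(unsigned int order,
                                 unsigned int dirty_max_order,
                                 enum system_states state)
{
    return order <= dirty_max_order || state < SYS_STATE_active;
}

/*
 * Same shape as the check added to the scrubbing loop: poll for softirqs
 * once every 256 scrubbed pages, and only in the smp_boot state.
 */
static bool should_process_softirqs(unsigned long dirty_cnt,
                                    enum system_states state)
{
    return state == SYS_STATE_smp_boot && !(dirty_cnt & 0xff);
}
```

Under these assumptions an order-18 request covers 1 GiB, which is the case the commit message refers to. With a hypothetical limit of 9, may_take_dirty_buddy(18, 9, SYS_STATE_active) is false, while the same request made during boot is still allowed.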


Re: [PATCH v2 3/3] xen/mm: limit non-scrubbed allocations to a specific order
Posted by Jan Beulich 6 days, 15 hours ago
On 15.01.2026 12:18, Roger Pau Monne wrote:
> The current logic allows for up to 1G pages to be scrubbed in place, which
> can cause the watchdog to trigger in practice.  Reduce the limit for
> in-place scrubbed allocations to a newly introduced define:
> CONFIG_DIRTY_MAX_ORDER.  This currently defaults to CONFIG_DOMU_MAX_ORDER
> on all architectures.  Also introduce a command line option to set the
> value.
> 
> Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Split from previous patch.
>  - Introduce a command line option to set the limit.
> ---
>  docs/misc/xen-command-line.pandoc |  9 +++++++++
>  xen/common/page_alloc.c           | 23 ++++++++++++++++++++++-
>  2 files changed, 31 insertions(+), 1 deletion(-)

If you confine the change to page_alloc.c, won't this mean that patch 2's
passing of MEMF_no_scrub will then also be bounded (in which case the need
for patch 2 would largely disappear)?

> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1822,6 +1822,15 @@ Specify the deepest C-state CPUs are permitted to be placed in, and
>  optionally the maximum sub C-state to be used used.  The latter only applies
>  to the highest permitted C-state.
>  
> +### max-order-dirty
> +> `= <integer>`
> +
> +Specify the maximum allocation order allowed when scrubbing allocated pages
> +in-place.  The allocation is non-preemptive, and hence the value must be kept
> +low enough to avoid hogging the CPU for too long.
> +
> +Defaults to `CONFIG_DIRTY_MAX_ORDER` or if unset to `CONFIG_DOMU_MAX_ORDER`.

This may end up misleading, as - despite their names - these aren't really
Kconfig settings that people could easily control in their builds.

>  ### max_gsi_irqs (x86)
>  > `= <integer>`

I also wonder whether your addition wouldn't more naturally go a little
further down, by assuming / implying that the sorting used largely ignores
separator characters (underscore vs dash here).

Jan
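
The ordering rule suggested above (sorting that ignores separator characters) can be sketched as a strcmp()-style comparison that skips '-' and '_'. This is only an illustration of the idea; the helper names are hypothetical and nothing like this is claimed to exist in the Xen tree.

```c
#include <stdbool.h>

/* Characters treated as invisible for ordering purposes. */
static bool is_separator(char c)
{
    return c == '-' || c == '_';
}

/*
 * Compare two option names the way strcmp() would, except that dashes
 * and underscores are skipped in both strings.
 */
static int cmp_ignoring_separators(const char *a, const char *b)
{
    for ( ; ; )
    {
        while ( is_separator(*a) )
            a++;
        while ( is_separator(*b) )
            b++;

        if ( *a != *b || *a == '\0' )
            return (unsigned char)*a - (unsigned char)*b;

        a++;
        b++;
    }
}
```

With this comparison "max-order-dirty" sorts after "max_gsi_irqs" (the first differing letters are 'o' and 'g'), whereas a name such as "max-dirty-order" would sort before it.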

Re: [PATCH v2 3/3] xen/mm: limit non-scrubbed allocations to a specific order
Posted by Roger Pau Monné 3 days, 18 hours ago
On Mon, Jan 19, 2026 at 05:13:25PM +0100, Jan Beulich wrote:
> On 15.01.2026 12:18, Roger Pau Monne wrote:
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -1822,6 +1822,15 @@ Specify the deepest C-state CPUs are permitted to be placed in, and
> >  optionally the maximum sub C-state to be used used.  The latter only applies
> >  to the highest permitted C-state.
> >  
> > +### max-order-dirty
> > +> `= <integer>`
> > +
> > +Specify the maximum allocation order allowed when scrubbing allocated pages
> > +in-place.  The allocation is non-preemptive, and hence the value must be kept
> > +low enough to avoid hogging the CPU for too long.
> > +
> > +Defaults to `CONFIG_DIRTY_MAX_ORDER` or if unset to `CONFIG_DOMU_MAX_ORDER`.
> 
> This may end up misleading, as - despite their names - these aren't really
> Kconfig settings that people could easily control in their builds.

But those have different default values depending on the architecture,
hence I didn't know what else to reference as the default.  I'm
open to suggestions, but I think we need to reference some default
value so the user knows where to look.

> >  ### max_gsi_irqs (x86)
> >  > `= <integer>`
> 
> I also wonder whether your addition wouldn't more naturally go a little
> further down, by assuming / implying that the sorting used largely ignores
> separator characters (underscore vs dash here).

My bad, I think I originally named it max-dirty-order and forgot to
move it down when renaming it to max-order-dirty.

Thanks, Roger.

Re: [PATCH v2 3/3] xen/mm: limit non-scrubbed allocations to a specific order
Posted by Jan Beulich 3 days, 18 hours ago
On 22.01.2026 13:55, Roger Pau Monné wrote:
> On Mon, Jan 19, 2026 at 05:13:25PM +0100, Jan Beulich wrote:
>> On 15.01.2026 12:18, Roger Pau Monne wrote:
>>> --- a/docs/misc/xen-command-line.pandoc
>>> +++ b/docs/misc/xen-command-line.pandoc
>>> @@ -1822,6 +1822,15 @@ Specify the deepest C-state CPUs are permitted to be placed in, and
>>>  optionally the maximum sub C-state to be used used.  The latter only applies
>>>  to the highest permitted C-state.
>>>  
>>> +### max-order-dirty
>>> +> `= <integer>`
>>> +
>>> +Specify the maximum allocation order allowed when scrubbing allocated pages
>>> +in-place.  The allocation is non-preemptive, and hence the value must be kept
>>> +low enough to avoid hogging the CPU for too long.
>>> +
>>> +Defaults to `CONFIG_DIRTY_MAX_ORDER` or if unset to `CONFIG_DOMU_MAX_ORDER`.
>>
>> This may end up misleading, as - despite their names - these aren't really
>> Kconfig settings that people could easily control in their builds.
> 
> But those have different default values depending on the architecture,
> hence I didn't know what else to reference as the default.  I'm
> open to suggestions, but I think we need to reference some default
> value so the user knows where to look.

I agree something needs saying. In the absence of anything better we may be
able to think of, perhaps simply clarify that these are #define-s in source,
not Kconfig settings?

Jan

Re: [PATCH v2 3/3] xen/mm: limit non-scrubbed allocations to a specific order
Posted by Jan Beulich 6 days ago
On 19.01.2026 17:13, Jan Beulich wrote:
> On 15.01.2026 12:18, Roger Pau Monne wrote:
>> The current logic allows for up to 1G pages to be scrubbed in place, which
>> can cause the watchdog to trigger in practice.  Reduce the limit for
>> in-place scrubbed allocations to a newly introduced define:
>> CONFIG_DIRTY_MAX_ORDER.  This currently defaults to CONFIG_DOMU_MAX_ORDER
>> on all architectures.  Also introduce a command line option to set the
>> value.
>>
>> Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>> Changes since v1:
>>  - Split from previous patch.
>>  - Introduce a command line option to set the limit.
>> ---
>>  docs/misc/xen-command-line.pandoc |  9 +++++++++
>>  xen/common/page_alloc.c           | 23 ++++++++++++++++++++++-
>>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> If you confine the change to page_alloc.c, won't this mean that patch 2's
> passing of MEMF_no_scrub will then also be bounded (in which case the need
> for patch 2 would largely disappear)?

This was rubbish, sorry. Besides my being thick-headed I can only attribute
this to the double negation in !(memflags & MEMF_no_scrub).

I have another concern, though: You effectively undermine ptdom_max_order,
which is even more of a problem as that would also affect Dom0's ability to
obtain larger contiguous I/O buffers. Perhaps DIRTY_MAX_ORDER ought to
default to PTDOM_MAX_ORDER (if HAS_PASSTHROUGH)? Yet then command line
options may also need tying together, such that people using
"memop-max-order=" to alter (increase) ptdom_max_order won't need to
additionally use "max-order-dirty="? At which point maybe the new option
shouldn't be a standalone one, but be added to "memop-max-order=" (despite
it being effected in alloc_heap_pages())?

Jan
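
One possible shape for the default suggested above, sketched as a standalone preprocessor fallback. This is a guess at how such a default could be expressed, not the code of any posted version; the numeric values 9 and 12 are placeholders rather than the real per-architecture settings, and default_dirty_max_order() is a made-up accessor.

```c
/* Placeholder values standing in for the per-architecture #define-s. */
#ifndef CONFIG_DOMU_MAX_ORDER
# define CONFIG_DOMU_MAX_ORDER 9
#endif
#ifndef CONFIG_PTDOM_MAX_ORDER
# define CONFIG_PTDOM_MAX_ORDER 12
#endif

/*
 * Fall back to the passthrough domain limit when passthrough support is
 * built in, and to the plain DomU limit otherwise.
 */
#ifndef CONFIG_DIRTY_MAX_ORDER
# ifdef CONFIG_HAS_PASSTHROUGH
#  define CONFIG_DIRTY_MAX_ORDER CONFIG_PTDOM_MAX_ORDER
# else
#  define CONFIG_DIRTY_MAX_ORDER CONFIG_DOMU_MAX_ORDER
# endif
#endif

static unsigned int dirty_max_order = CONFIG_DIRTY_MAX_ORDER;

/* Hypothetical accessor so the chosen default can be inspected. */
unsigned int default_dirty_max_order(void)
{
    return dirty_max_order;
}
```

Because these are plain #define-s, the value can only be changed at build time by defining the symbol before this point (for example with -DCONFIG_DIRTY_MAX_ORDER=... on the compiler command line), which is different from toggling a Kconfig option.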

Re: [PATCH v2 3/3] xen/mm: limit non-scrubbed allocations to a specific order
Posted by Roger Pau Monné 3 days, 18 hours ago
On Tue, Jan 20, 2026 at 08:25:49AM +0100, Jan Beulich wrote:
> On 19.01.2026 17:13, Jan Beulich wrote:
> > On 15.01.2026 12:18, Roger Pau Monne wrote:
> >> The current logic allows for up to 1G pages to be scrubbed in place, which
> >> can cause the watchdog to trigger in practice.  Reduce the limit for
> >> in-place scrubbed allocations to a newly introduced define:
> >> CONFIG_DIRTY_MAX_ORDER.  This currently defaults to CONFIG_DOMU_MAX_ORDER
> >> on all architectures.  Also introduce a command line option to set the
> >> value.
> >>
> >> Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >> ---
> >> Changes since v1:
> >>  - Split from previous patch.
> >>  - Introduce a command line option to set the limit.
> >> ---
> >>  docs/misc/xen-command-line.pandoc |  9 +++++++++
> >>  xen/common/page_alloc.c           | 23 ++++++++++++++++++++++-
> >>  2 files changed, 31 insertions(+), 1 deletion(-)
> > 
> > If you confine the change to page_alloc.c, won't this mean that patch 2's
> > passing of MEMF_no_scrub will then also be bounded (in which case the need
> > for patch 2 would largely disappear)?
> 
> This was rubbish, sorry. Besides my being thick-headed I can only attribute
> this to the double negation in !(memflags & MEMF_no_scrub).
> 
> I have another concern, though: You effectively undermine ptdom_max_order,
> which is even more of a problem as that would also affect Dom0's ability to
> obtain larger contiguous I/O buffers. Perhaps DIRTY_MAX_ORDER ought to
> default to PTDOM_MAX_ORDER (if HAS_PASSTHROUGH)?

OK, yes, I can default to PTDOM_MAX_ORDER instead of DOMU_MAX_ORDER.

> Yet then command line
> options may also need tying together, such that people using
> "memop-max-order=" to alter (increase) ptdom_max_order won't need to
> additionally use "max-order-dirty="? At which point maybe the new option
> shouldn't be a standalone one, but be added to "memop-max-order=" (despite
> it being effected in alloc_heap_pages())?

I had concerns about adding it to "memop-max-order=" because its effect
is not limited to "issued by the various kinds of domain", this is an
option that affects all allocations.  I could try expanding the option
description to reflect that, but I wasn't sure whether it would lead
to confusion (as all options there are per-domain currently).

Also if added to "memop-max-order=" the parsing function needs to be
adjusted a bit to consume an extra parameter in the !HAS_PASSTHROUGH
case (which is not much of an issue).

Thanks, Roger.
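
The parsing adjustment mentioned above can be illustrated with a small standalone sketch of a positional, comma-separated list in which empty fields keep their previous value. It is not the Xen parser: parse_orders(), NR_ORDERS and the idea of a fifth slot for the dirty limit are assumptions made for the example, and it uses strtoul() from the C library rather than any hypervisor helper.

```c
#include <stdlib.h>

/* Hypothetical layout: four existing slots plus one extra slot. */
#define NR_ORDERS 5

/*
 * Parse up to NR_ORDERS comma-separated integers.  An empty field leaves
 * the corresponding entry untouched.  Every slot is consumed in turn, so
 * a later field can be reached even if an earlier one is unused.
 * Returns 0 on success and -1 on malformed or surplus input.
 */
static int parse_orders(const char *s, unsigned int orders[NR_ORDERS])
{
    for ( unsigned int i = 0; i < NR_ORDERS; i++ )
    {
        if ( *s != ',' && *s != '\0' )
        {
            char *end;
            unsigned long val = strtoul(s, &end, 0);

            if ( end == s )
                return -1;      /* Not a number. */

            orders[i] = val;
            s = end;
        }

        if ( *s != ',' )
            break;              /* End of list, or trailing garbage. */

        s++;                    /* Step over the separator. */
    }

    return *s ? -1 : 0;
}
```

Given the string ",,,,7" only the last entry is updated, which corresponds to the point that the intermediate slot still has to be stepped over when passthrough support is not built in.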

Re: [PATCH v2 3/3] xen/mm: limit non-scrubbed allocations to a specific order
Posted by Jan Beulich 3 days, 18 hours ago
On 22.01.2026 14:05, Roger Pau Monné wrote:
> On Tue, Jan 20, 2026 at 08:25:49AM +0100, Jan Beulich wrote:
>> On 19.01.2026 17:13, Jan Beulich wrote:
>>> On 15.01.2026 12:18, Roger Pau Monne wrote:
>>>> The current logic allows for up to 1G pages to be scrubbed in place, which
>>>> can cause the watchdog to trigger in practice.  Reduce the limit for
>>>> in-place scrubbed allocations to a newly introduced define:
>>>> CONFIG_DIRTY_MAX_ORDER.  This currently defaults to CONFIG_DOMU_MAX_ORDER
>>>> on all architectures.  Also introduce a command line option to set the
>>>> value.
>>>>
>>>> Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> ---
>>>> Changes since v1:
>>>>  - Split from previous patch.
>>>>  - Introduce a command line option to set the limit.
>>>> ---
>>>>  docs/misc/xen-command-line.pandoc |  9 +++++++++
>>>>  xen/common/page_alloc.c           | 23 ++++++++++++++++++++++-
>>>>  2 files changed, 31 insertions(+), 1 deletion(-)
>>>
>>> If you confine the change to page_alloc.c, won't this mean that patch 2's
>>> passing of MEMF_no_scrub will then also be bounded (in which case the need
>>> for patch 2 would largely disappear)?
>>
>> This was rubbish, sorry. Besides my being thick-headed I can only attribute
>> this to the double negation in !(memflags & MEMF_no_scrub).
>>
>> I have another concern, though: You effectively undermine ptdom_max_order,
>> which is even more of a problem as that would also affect Dom0's ability to
>> obtain larger contiguous I/O buffers. Perhaps DIRTY_MAX_ORDER ought to
>> default to PTDOM_MAX_ORDER (if HAS_PASSTHROUGH)?
> 
> OK, yes, I can default to PTDOM_MAX_ORDER instead of DOMU_MAX_ORDER.
> 
>> Yet then command line
>> options may also need tying together, such that people using
>> "memop-max-order=" to alter (increase) ptdom_max_order won't need to
>> additionally use "max-order-dirty="? At which point maybe the new option
>> shouldn't be a standalone one, but be added to "memop-max-order=" (despite
>> it being effected in alloc_heap_pages())?
> 
> I had concerns about adding it to "memop-max-order=" because its effect
> is not limited to "issued by the various kinds of domain", this is an
> option that affects all allocations.  I could try expanding the option
> description to reflect that, but I wasn't sure whether it would lead
> to confusion (as all options there are per-domain currently).

Hmm, fair point. Let's keep it separate then.

Jan