[PATCH v2 3/3] xen/mm: improve freeing of partially scrubbed pages

Posted by Roger Pau Monne 1 week ago
When freeing possibly partially scrubbed pages in populate_physmap() the
whole page is marked as dirty, but that's not fully accurate.  Since the
PGC_need_scrub bit is preserved for the populate_physmap() allocation we
can use those when freeing to detect which pages need scrubbing instead of
marking the whole page as dirty.

This requires exposing free_heap_pages() globally, and switching
populate_physmap() to use it instead of free_domheap_pages().

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Jan: I'm not sure if that's what you suggested in the review of v1.  I've
added your Suggested-by but I can drop it if that's not what you were
thinking of.
---
 xen/common/memory.c     |  6 +++---
 xen/common/page_alloc.c | 16 +++++++++++++---
 xen/include/xen/mm.h    |  6 ++++++
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 1ad4b51c5b02..68eef8291571 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -177,7 +177,7 @@ static void stash_allocation(struct domain *d, struct page_info *page,
      * interface is designed to be used for single-threaded domain creation.
      */
     if ( d->pending_scrub || d->is_dying )
-        free_domheap_pages(page, order);
+        free_heap_pages(page, order, false);
     else
     {
         d->pending_scrub_index = scrub_index;
@@ -210,7 +210,7 @@ static struct page_info *get_stashed_allocation(struct domain *d,
             *scrub_index = d->pending_scrub_index;
         }
         else
-            free_domheap_pages(d->pending_scrub, d->pending_scrub_order);
+            free_heap_pages(d->pending_scrub, d->pending_scrub_order, false);
 
         /*
          * The caller now owns the page or it has been freed, clear stashed
@@ -391,7 +391,7 @@ static void populate_physmap(struct memop_args *a)
 
                     if ( assign_page(page, a->extent_order, d, memflags) )
                     {
-                        free_domheap_pages(page, a->extent_order);
+                        free_heap_pages(page, a->extent_order, false);
                         goto out;
                     }
                 }
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index b1edef87124f..8fc9b5a27f1b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1529,13 +1529,13 @@ static bool mark_page_free(struct page_info *pg, mfn_t mfn)
 static void free_color_heap_page(struct page_info *pg, bool need_scrub);
 
 /* Free 2^@order set of pages. */
-static void free_heap_pages(
-    struct page_info *pg, unsigned int order, bool need_scrub)
+void free_heap_pages(struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask;
     mfn_t mfn = page_to_mfn(pg);
     unsigned int i, node = mfn_to_nid(mfn);
     unsigned int zone = page_to_zone(pg);
+    unsigned int first_dirty = INVALID_DIRTY_IDX, dirty_cnt = 0;
     bool pg_offlined = false;
 
     ASSERT(order <= MAX_ORDER);
@@ -1552,6 +1552,13 @@ static void free_heap_pages(
             pg[i].count_info |= PGC_need_scrub;
             poison_one_page(&pg[i]);
         }
+        else if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+        {
+            /* The caller might have returned pages pending scrub. */
+            if ( first_dirty == INVALID_DIRTY_IDX )
+                first_dirty = i;
+            dirty_cnt++;
+        }
 
         if ( pg->count_info & PGC_colored )
         {
@@ -1571,7 +1578,10 @@ static void free_heap_pages(
         pg->u.free.first_dirty = 0;
     }
     else
-        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
+    {
+        node_need_scrub[node] += dirty_cnt;
+        pg->u.free.first_dirty = first_dirty;
+    }
 
     /* Merge chunks as far as possible. */
     while ( order < MAX_ORDER )
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index b80bec00c124..0b192caa07bc 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -153,6 +153,12 @@ unsigned long avail_node_heap_pages(unsigned int nodeid);
 } while ( false )
 #define FREE_DOMHEAP_PAGE(p) FREE_DOMHEAP_PAGES(p, 0)
 
+/*
+ * Most callers should use free_{xen,dom}heap_pages() instead of directly
+ * calling free_heap_pages().
+ */
+void free_heap_pages(struct page_info *pg, unsigned int order, bool need_scrub);
+
 void scrub_one_page(const struct page_info *pg, bool cold);
 
 int online_page(mfn_t mfn, uint32_t *status);
-- 
2.51.0
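[Editorial note: the accounting this hunk adds to free_heap_pages() can be illustrated in isolation. The sketch below is a simplified standalone model, not Xen code: `struct fake_page`, its `need_scrub` field, and `scan_dirty()` are hypothetical stand-ins for `struct page_info`, `PGC_need_scrub`, and the in-loop tracking in the patch. It shows how scanning the 2^order range for preserved scrub flags yields `first_dirty` (index of the first page still needing a scrub, or `INVALID_DIRTY_IDX` if none) and `dirty_cnt` (how many pages to add to the node's scrub backlog), instead of treating the whole range as dirty.]

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_DIRTY_IDX ((unsigned int)-1)

/* Simplified stand-in for struct page_info: only the scrub flag matters. */
struct fake_page {
    bool need_scrub; /* models PGC_need_scrub being preserved on free */
};

/*
 * Scan a 2^order range and report which pages still need scrubbing,
 * mirroring the first_dirty/dirty_cnt accounting added by the patch.
 */
static void scan_dirty(const struct fake_page *pg, unsigned int order,
                       unsigned int *first_dirty, unsigned int *dirty_cnt)
{
    unsigned long i;

    *first_dirty = INVALID_DIRTY_IDX;
    *dirty_cnt = 0;

    for ( i = 0; i < (1UL << order); i++ )
        if ( pg[i].need_scrub )
        {
            /* Remember only the first dirty index; count all of them. */
            if ( *first_dirty == INVALID_DIRTY_IDX )
                *first_dirty = i;
            ++*dirty_cnt;
        }
}
```

With e.g. an order-3 (8-page) range where only pages 2 and 5 carry the flag, the model reports `first_dirty == 2` and `dirty_cnt == 2`, matching what the hunk feeds into `pg->u.free.first_dirty` and `node_need_scrub[node]` respectively.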


Re: [PATCH v2 3/3] xen/mm: improve freeing of partially scrubbed pages
Posted by Jan Beulich 1 week ago
On 26.03.2026 09:51, Roger Pau Monne wrote:
> When freeing possibly partially scrubbed pages in populate_physmap() the
> whole page is marked as dirty, but that's not fully accurate.  Since the
> PGC_need_scrub bit is preserved for the populate_physmap() allocation we
> can use those when freeing to detect which pages need scrubbing instead of
> marking the whole page as dirty.
> 
> This requires exposing free_heap_pages() globally, and switching
> populate_physmap() to use it instead of free_domheap_pages().
> 
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Jan: I'm not sure if that's what you suggested in the review of v1.  I've
> added your Suggested-by but I can drop it if that's not what you were
> thinking of.

You're going quite a bit farther. In my comment I really only meant the one
new use you add in patch 2 (in which case no changes to the body of
free_heap_pages() would have been needed, and hence why I thought that it
could maybe be done right there). Up to you whether to keep the tag.

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -177,7 +177,7 @@ static void stash_allocation(struct domain *d, struct page_info *page,
>       * interface is designed to be used for single-threaded domain creation.
>       */
>      if ( d->pending_scrub || d->is_dying )
> -        free_domheap_pages(page, order);
> +        free_heap_pages(page, order, false);
>      else
>      {
>          d->pending_scrub_index = scrub_index;
> @@ -210,7 +210,7 @@ static struct page_info *get_stashed_allocation(struct domain *d,
>              *scrub_index = d->pending_scrub_index;
>          }
>          else
> -            free_domheap_pages(d->pending_scrub, d->pending_scrub_order);
> +            free_heap_pages(d->pending_scrub, d->pending_scrub_order, false);
>  
>          /*
>           * The caller now owns the page or it has been freed, clear stashed
> @@ -391,7 +391,7 @@ static void populate_physmap(struct memop_args *a)
>  
>                      if ( assign_page(page, a->extent_order, d, memflags) )
>                      {
> -                        free_domheap_pages(page, a->extent_order);
> +                        free_heap_pages(page, a->extent_order, false);
>                          goto out;
>                      }
>                  }

Along with all of these there's then also domain_pending_scrub_free().

> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -153,6 +153,12 @@ unsigned long avail_node_heap_pages(unsigned int nodeid);
>  } while ( false )
>  #define FREE_DOMHEAP_PAGE(p) FREE_DOMHEAP_PAGES(p, 0)
>  
> +/*
> + * Most callers should use free_{xen,dom}heap_pages() instead of directly
> + * calling free_heap_pages().
> + */
> +void free_heap_pages(struct page_info *pg, unsigned int order, bool need_scrub);

Might we better not put this here, but instead in a private header in common/?

Jan

Re: [PATCH v2 3/3] xen/mm: improve freeing of partially scrubbed pages
Posted by Roger Pau Monné 1 week ago
On Thu, Mar 26, 2026 at 12:50:27PM +0100, Jan Beulich wrote:
> On 26.03.2026 09:51, Roger Pau Monne wrote:
> > When freeing possibly partially scrubbed pages in populate_physmap() the
> > whole page is marked as dirty, but that's not fully accurate.  Since the
> > PGC_need_scrub bit is preserved for the populate_physmap() allocation we
> > can use those when freeing to detect which pages need scrubbing instead of
> > marking the whole page as dirty.
> > 
> > This requires exposing free_heap_pages() globally, and switching
> > populate_physmap() to use it instead of free_domheap_pages().
> > 
> > Suggested-by: Jan Beulich <jbeulich@suse.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Jan: I'm not sure if that's what you suggested in the review of v1.  I've
> > added your Suggested-by but I can drop it if that's not what you were
> > thinking of.
> 
> You're going quite a bit farther. In my comment I really only meant the one
> new use you add in patch 2 (in which case no changes to the body of
> free_heap_pages() would have been needed, and hence why I thought that it
> could maybe be done right there). Up to you whether to keep the tag.

I see, you meant to change the single usage in case assign_page()
fails.  I think going a bit further is fine, seeing the adjustment to
free_heap_pages() is very minimal?

> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -177,7 +177,7 @@ static void stash_allocation(struct domain *d, struct page_info *page,
> >       * interface is designed to be used for single-threaded domain creation.
> >       */
> >      if ( d->pending_scrub || d->is_dying )
> > -        free_domheap_pages(page, order);
> > +        free_heap_pages(page, order, false);
> >      else
> >      {
> >          d->pending_scrub_index = scrub_index;
> > @@ -210,7 +210,7 @@ static struct page_info *get_stashed_allocation(struct domain *d,
> >              *scrub_index = d->pending_scrub_index;
> >          }
> >          else
> > -            free_domheap_pages(d->pending_scrub, d->pending_scrub_order);
> > +            free_heap_pages(d->pending_scrub, d->pending_scrub_order, false);
> >  
> >          /*
> >           * The caller now owns the page or it has been freed, clear stashed
> > @@ -391,7 +391,7 @@ static void populate_physmap(struct memop_args *a)
> >  
> >                      if ( assign_page(page, a->extent_order, d, memflags) )
> >                      {
> > -                        free_domheap_pages(page, a->extent_order);
> > +                        free_heap_pages(page, a->extent_order, false);
> >                          goto out;
> >                      }
> >                  }
> 
> Along with all of these there's then also domain_pending_scrub_free().

Yes, indeed.

> > --- a/xen/include/xen/mm.h
> > +++ b/xen/include/xen/mm.h
> > @@ -153,6 +153,12 @@ unsigned long avail_node_heap_pages(unsigned int nodeid);
> >  } while ( false )
> >  #define FREE_DOMHEAP_PAGE(p) FREE_DOMHEAP_PAGES(p, 0)
> >  
> > +/*
> > + * Most callers should use free_{xen,dom}heap_pages() instead of directly
> > + * calling free_heap_pages().
> > + */
> > +void free_heap_pages(struct page_info *pg, unsigned int order, bool need_scrub);
> 
> Might we better not put this here, but instead in a private header in common/?

No strong opinion.  It could logically be used outside of common in
principle, hence we might end up moving it anyway.  Would you prefer
me to introduce a common/memory.h header with just this prototype?

Thanks, Roger.

Re: [PATCH v2 3/3] xen/mm: improve freeing of partially scrubbed pages
Posted by Jan Beulich 1 week ago
On 26.03.2026 16:53, Roger Pau Monné wrote:
> On Thu, Mar 26, 2026 at 12:50:27PM +0100, Jan Beulich wrote:
>> On 26.03.2026 09:51, Roger Pau Monne wrote:
>>> When freeing possibly partially scrubbed pages in populate_physmap() the
>>> whole page is marked as dirty, but that's not fully accurate.  Since the
>>> PGC_need_scrub bit is preserved for the populate_physmap() allocation we
>>> can use those when freeing to detect which pages need scrubbing instead of
>>> marking the whole page as dirty.
>>>
>>> This requires exposing free_heap_pages() globally, and switching
>>> populate_physmap() to use it instead of free_domheap_pages().
>>>
>>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Jan: I'm not sure if that's what you suggested in the review of v1.  I've
>>> added your Suggested-by but I can drop it if that's not what you were
>>> thinking of.
>>
>> You're going quite a bit farther. In my comment I really only meant the one
>> new use you add in patch 2 (in which case no changes to the body of
>> free_heap_pages() would have been needed, and hence why I thought that it
>> could maybe be done right there). Up to you whether to keep the tag.
> 
> I see, you meant to change the single usage in case assign_page()
> fails.  I think going a bit further is fine, seeing the adjustment to
> free_heap_pages() is very minimal?

Oh, yes, sure. I was merely trying to address your remark.

>>> --- a/xen/include/xen/mm.h
>>> +++ b/xen/include/xen/mm.h
>>> @@ -153,6 +153,12 @@ unsigned long avail_node_heap_pages(unsigned int nodeid);
>>>  } while ( false )
>>>  #define FREE_DOMHEAP_PAGE(p) FREE_DOMHEAP_PAGES(p, 0)
>>>  
>>> +/*
>>> + * Most callers should use free_{xen,dom}heap_pages() instead of directly
>>> + * calling free_heap_pages().
>>> + */
>>> +void free_heap_pages(struct page_info *pg, unsigned int order, bool need_scrub);
>>
>> Might we better not put this here, but instead in a private header in common/?
> 
> No strong opinion.  It could logically be used outside of common in
> principle, hence we might end up moving it anyway.  Would you prefer
> me to introduce a common/memory.h header with just this prototype?

It would help if others could voice an opinion. To me exposing this
supposedly internal (to the page allocator) function feels a little
risky. Yet of course any undue use would likely be spotted and objected
to during review.

Jan