[PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()

David Hildenbrand posted 29 patches 3 months, 1 week ago
There is a newer version of this series
[PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by David Hildenbrand 3 months, 1 week ago
Instead, let's check in the callbacks if the page was already destroyed,
which can be checked by looking at zpdesc->zspage (see reset_zpdesc()).

If we detect that the page was destroyed:

(1) Fail isolation, just like the migration core would

(2) Fake migration success just like the migration core would

In the putback case there is nothing to do, as we don't do anything just
like the migration core would do.

In the future, we should look into not letting these pages get destroyed
while they are isolated -- and instead delaying that to the
putback/migration call. Add a TODO for that.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/zsmalloc.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f98747aed4330..72c2b7562c511 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -876,7 +876,6 @@ static void reset_zpdesc(struct zpdesc *zpdesc)
 {
 	struct page *page = zpdesc_page(zpdesc);
 
-	__ClearPageMovable(page);
 	ClearPagePrivate(page);
 	zpdesc->zspage = NULL;
 	zpdesc->next = NULL;
@@ -1715,10 +1714,11 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
 static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
 {
 	/*
-	 * Page is locked so zspage couldn't be destroyed. For detail, look at
-	 * lock_zspage in free_zspage.
+	 * Page is locked so zspage can't be destroyed concurrently
+	 * (see free_zspage()). But if the page was already destroyed
+	 * (see reset_zpdesc()), refuse isolation here.
 	 */
-	return true;
+	return page_zpdesc(page)->zspage;
 }
 
 static int zs_page_migrate(struct page *newpage, struct page *page,
@@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	unsigned long old_obj, new_obj;
 	unsigned int obj_idx;
 
+	/*
+	 * TODO: nothing prevents a zspage from getting destroyed while
+	 * isolated: we should disallow that and defer it.
+	 */
+	if (!zpdesc->zspage)
+		return MIGRATEPAGE_SUCCESS;
+
 	/* The page is locked, so this pointer must remain valid */
 	zspage = get_zspage(zpdesc);
 	pool = zspage->pool;
-- 
2.49.0
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by Sergey Senozhatsky 3 months, 1 week ago
On (25/06/30 14:59), David Hildenbrand wrote:
[..]
>  static int zs_page_migrate(struct page *newpage, struct page *page,
> @@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
>  	unsigned long old_obj, new_obj;
>  	unsigned int obj_idx;
>  
> +	/*
> +	 * TODO: nothing prevents a zspage from getting destroyed while
> +	 * isolated: we should disallow that and defer it.
> +	 */

Can you elaborate?
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by David Hildenbrand 3 months, 1 week ago
On 02.07.25 10:11, Sergey Senozhatsky wrote:
> On (25/06/30 14:59), David Hildenbrand wrote:
> [..]
>>   static int zs_page_migrate(struct page *newpage, struct page *page,
>> @@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
>>   	unsigned long old_obj, new_obj;
>>   	unsigned int obj_idx;
>>   
>> +	/*
>> +	 * TODO: nothing prevents a zspage from getting destroyed while
>> +	 * isolated: we should disallow that and defer it.
>> +	 */
> 
> Can you elaborate?

We can only free a zspage in free_zspage() while the page is locked.

After we isolated a zspage page for migration (under page lock!), we 
drop the lock again, to retake the lock when trying to migrate it.

That means, there is a window where a zspage can be freed although the 
page is isolated for migration.

While we currently keep that working (as far as I can see), in the 
future we want to remove that support from the core.

So what probably needs to be done is, checking in free_zspage(), whether 
the page is isolated. If isolated, defer freeing to the 
putback/migration call.

That way, it will be clear who the current owner of an object is 
(isolation makes mm core the owner, while putback returns ownership), 
and prepare for some pages to be migrated to have a permanently frozen 
refcount (esp PageOffline pages without any refcount).

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by Sergey Senozhatsky 3 months, 1 week ago
On (25/07/02 10:25), David Hildenbrand wrote:
> On 02.07.25 10:11, Sergey Senozhatsky wrote:
> > On (25/06/30 14:59), David Hildenbrand wrote:
> > [..]
> > >   static int zs_page_migrate(struct page *newpage, struct page *page,
> > > @@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
> > >   	unsigned long old_obj, new_obj;
> > >   	unsigned int obj_idx;
> > > +	/*
> > > +	 * TODO: nothing prevents a zspage from getting destroyed while
> > > +	 * isolated: we should disallow that and defer it.
> > > +	 */
> > 
> > Can you elaborate?
> 
> We can only free a zspage in free_zspage() while the page is locked.
> 
> After we isolated a zspage page for migration (under page lock!), we drop
                      ^^ a physical page? (IOW zspage chain page?)

> the lock again, to retake the lock when trying to migrate it.
> 
> That means, there is a window where a zspage can be freed although the page
> is isolated for migration.

I see, thanks.  Looks somewhat fragile.  Is this a new thing?

> While we currently keep that working (as far as I can see), in the future we
> want to remove that support from the core.

Maybe comment can more explicitly distinguish zspage isolation and
physical page (zspage chain) isolation?  zspages can get isolated
for compaction (defragmentation), for instance, which is a different
form of isolation.

> So what probably needs to be done is, checking in free_zspage(), whether the
> page is isolated. If isolated, defer freeing to the putback/migration call.

Perhaps.
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by David Hildenbrand 3 months, 1 week ago
On 02.07.25 12:10, Sergey Senozhatsky wrote:
> On (25/07/02 10:25), David Hildenbrand wrote:
>> On 02.07.25 10:11, Sergey Senozhatsky wrote:
>>> On (25/06/30 14:59), David Hildenbrand wrote:
>>> [..]
>>>>    static int zs_page_migrate(struct page *newpage, struct page *page,
>>>> @@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
>>>>    	unsigned long old_obj, new_obj;
>>>>    	unsigned int obj_idx;
>>>> +	/*
>>>> +	 * TODO: nothing prevents a zspage from getting destroyed while
>>>> +	 * isolated: we should disallow that and defer it.
>>>> +	 */
>>>
>>> Can you elaborate?
>>
>> We can only free a zspage in free_zspage() while the page is locked.
>>
>> After we isolated a zspage page for migration (under page lock!), we drop
>                        ^^ a physical page? (IOW zspage chain page?)
> 
>> the lock again, to retake the lock when trying to migrate it.
>>
>> That means, there is a window where a zspage can be freed although the page
>> is isolated for migration.
> 
> I see, thanks.  Looks somewhat fragile.  Is this a new thing?

No, it's been like that forever. And I was surprised that only zsmalloc 
behaves that way -- balloon implements isolation as one would expect it 
(disallow freeing while isolated).

> 
>> While we currently keep that working (as far as I can see), in the future we
>> want to remove that support from the core.
> 
> Maybe comment can more explicitly distinguish zspage isolation and
> physical page (zspage chain) isolation?  zspages can get isolated
> for compaction (defragmentation), for instance, which is a different
> form of isolation.

Well, it's confusing, as we have MM compaction (-> migration) and 
apparently zs_compact.

I'll try to clarify that we are talking about isolation for page 
migration purposes.

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by Sergey Senozhatsky 3 months, 1 week ago
On (25/07/02 12:55), David Hildenbrand wrote:
> On 02.07.25 12:10, Sergey Senozhatsky wrote:
> > On (25/07/02 10:25), David Hildenbrand wrote:
> > > On 02.07.25 10:11, Sergey Senozhatsky wrote:
> > > > On (25/06/30 14:59), David Hildenbrand wrote:
> > > > [..]
> > > > >    static int zs_page_migrate(struct page *newpage, struct page *page,
> > > > > @@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
> > > > >    	unsigned long old_obj, new_obj;
> > > > >    	unsigned int obj_idx;
> > > > > +	/*
> > > > > +	 * TODO: nothing prevents a zspage from getting destroyed while
> > > > > +	 * isolated: we should disallow that and defer it.
> > > > > +	 */
> > > > 
> > > > Can you elaborate?
> > > 
> > > We can only free a zspage in free_zspage() while the page is locked.
> > > 
> > > After we isolated a zspage page for migration (under page lock!), we drop
> >                        ^^ a physical page? (IOW zspage chain page?)
> > 
> > > the lock again, to retake the lock when trying to migrate it.
> > > 
> > > That means, there is a window where a zspage can be freed although the page
> > > is isolated for migration.
> > 
> > I see, thanks.  Looks somewhat fragile.  Is this a new thing?
> 
> No, it's been like that forever. And I was surprised that only zsmalloc
> behaves that way

Oh, that makes two of us.

> > > While we currently keep that working (as far as I can see), in the future we
> > > want to remove that support from the core.
> > 
> > Maybe comment can more explicitly distinguish zspage isolation and
> > physical page (zspage chain) isolation?  zspages can get isolated
> > for compaction (defragmentation), for instance, which is a different
> > form of isolation.
> 
> Well, it's confusing, as we have MM compaction (-> migration) and apparently
> zs_compact.

True.

> I'll try to clarify that we are talking about isolation for page migration
> purposes.

Thanks.
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by Sergey Senozhatsky 3 months, 1 week ago
On (25/07/03 11:28), Sergey Senozhatsky wrote:
> > > > > >    static int zs_page_migrate(struct page *newpage, struct page *page,
> > > > > > @@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
> > > > > >    	unsigned long old_obj, new_obj;
> > > > > >    	unsigned int obj_idx;
> > > > > > +	/*
> > > > > > +	 * TODO: nothing prevents a zspage from getting destroyed while
> > > > > > +	 * isolated: we should disallow that and defer it.
> > > > > > +	 */
> > > > > 
> > > > > Can you elaborate?
> > > > 
> > > > We can only free a zspage in free_zspage() while the page is locked.
> > > > 
> > > > After we isolated a zspage page for migration (under page lock!), we drop
> > >                        ^^ a physical page? (IOW zspage chain page?)
> > > 
> > > > the lock again, to retake the lock when trying to migrate it.
> > > > 
> > > > That means, there is a window where a zspage can be freed although the page
> > > > is isolated for migration.
> > > 
> > > I see, thanks.  Looks somewhat fragile.  Is this a new thing?
> > 
> > No, it's been like that forever. And I was surprised that only zsmalloc
> > behaves that way
> 
> Oh, that makes two of us.

I sort of wonder if zs_page_migrate() VM_BUG_ON_PAGE() removal and
zspage check addition need to be landed outside of this series, as
a zsmalloc fixup.
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by David Hildenbrand 3 months, 1 week ago
On 03.07.25 05:22, Sergey Senozhatsky wrote:
> On (25/07/03 11:28), Sergey Senozhatsky wrote:
>>>>>>>     static int zs_page_migrate(struct page *newpage, struct page *page,
>>>>>>> @@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
>>>>>>>     	unsigned long old_obj, new_obj;
>>>>>>>     	unsigned int obj_idx;
>>>>>>> +	/*
>>>>>>> +	 * TODO: nothing prevents a zspage from getting destroyed while
>>>>>>> +	 * isolated: we should disallow that and defer it.
>>>>>>> +	 */
>>>>>>
>>>>>> Can you elaborate?
>>>>>
>>>>> We can only free a zspage in free_zspage() while the page is locked.
>>>>>
>>>>> After we isolated a zspage page for migration (under page lock!), we drop
>>>>                         ^^ a physical page? (IOW zspage chain page?)
>>>>
>>>>> the lock again, to retake the lock when trying to migrate it.
>>>>>
>>>>> That means, there is a window where a zspage can be freed although the page
>>>>> is isolated for migration.
>>>>
>>>> I see, thanks.  Looks somewhat fragile.  Is this a new thing?
>>>
>>> No, it's been like that forever. And I was surprised that only zsmalloc
>>> behaves that way
>>
>> Oh, that makes two of us.
> 
> I sort of wonder if zs_page_migrate() VM_BUG_ON_PAGE() removal and
> zspage check addition need to be landed outside of this series, as
> a zsmalloc fixup.

Not sure if there is real value for that; given the review status, I 
assume this series won't take too long to be ready for upstream. Of 
course, if that is not the case we could try pulling them out.

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by Sergey Senozhatsky 3 months, 1 week ago
On (25/07/03 09:45), David Hildenbrand wrote:
> Not sure if there is real value for that; given the review status, I assume
> this series won't take too long to be ready for upstream. Of course, if that
> is not the case we could try pulling them out.

Sounds good to me.
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by Lorenzo Stoakes 3 months, 1 week ago
On Mon, Jun 30, 2025 at 02:59:53PM +0200, David Hildenbrand wrote:
> Instead, let's check in the callbacks if the page was already destroyed,
> which can be checked by looking at zpdesc->zspage (see reset_zpdesc()).
>
> If we detect that the page was destroyed:
>
> (1) Fail isolation, just like the migration core would
>
> (2) Fake migration success just like the migration core would
>
> In the putback case there is nothing to do, as we don't do anything just
> like the migration core would do.
>
> In the future, we should look into not letting these pages get destroyed
> while they are isolated -- and instead delaying that to the
> putback/migration call. Add a TODO for that.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  mm/zsmalloc.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index f98747aed4330..72c2b7562c511 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -876,7 +876,6 @@ static void reset_zpdesc(struct zpdesc *zpdesc)
>  {
>  	struct page *page = zpdesc_page(zpdesc);
>
> -	__ClearPageMovable(page);
>  	ClearPagePrivate(page);
>  	zpdesc->zspage = NULL;
>  	zpdesc->next = NULL;
> @@ -1715,10 +1714,11 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
>  static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
>  {
>  	/*
> -	 * Page is locked so zspage couldn't be destroyed. For detail, look at
> -	 * lock_zspage in free_zspage.
> +	 * Page is locked so zspage can't be destroyed concurrently
> +	 * (see free_zspage()). But if the page was already destroyed
> +	 * (see reset_zpdesc()), refuse isolation here.
>  	 */
> -	return true;
> +	return page_zpdesc(page)->zspage;
>  }
>
>  static int zs_page_migrate(struct page *newpage, struct page *page,
> @@ -1736,6 +1736,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
>  	unsigned long old_obj, new_obj;
>  	unsigned int obj_idx;
>
> +	/*
> +	 * TODO: nothing prevents a zspage from getting destroyed while
> +	 * isolated: we should disallow that and defer it.
> +	 */
> +	if (!zpdesc->zspage)
> +		return MIGRATEPAGE_SUCCESS;
> +
>  	/* The page is locked, so this pointer must remain valid */
>  	zspage = get_zspage(zpdesc);
>  	pool = zspage->pool;
> --
> 2.49.0
>
Re: [PATCH v1 12/29] mm/zsmalloc: stop using __ClearPageMovable()
Posted by Harry Yoo 3 months, 1 week ago
On Mon, Jun 30, 2025 at 02:59:53PM +0200, David Hildenbrand wrote:
> Instead, let's check in the callbacks if the page was already destroyed,
> which can be checked by looking at zpdesc->zspage (see reset_zpdesc()).
> 
> If we detect that the page was destroyed:
> 
> (1) Fail isolation, just like the migration core would
> 
> (2) Fake migration success just like the migration core would
> 
> In the putback case there is nothing to do, as we don't do anything just
> like the migration core would do.
>
> In the future, we should look into not letting these pages get destroyed
> while they are isolated -- and instead delaying that to the
> putback/migration call. Add a TODO for that.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Looks correct to me.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon