[PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio

[PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by Baolin Wang 1 year, 4 months ago
Page reclaim will not scan the anon LRU if there is no swap space;
however, MADV_PAGEOUT can still split shmem large folios even without a
swap device. Thus validate that swap space is available before
splitting a shmem large folio, to avoid a redundant split.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/vmscan.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 31d13462571e..796f65781f4f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			}
 		} else if (folio_test_swapbacked(folio) &&
 			   folio_test_large(folio)) {
+
+			/*
+			 * Do not split shmem folio if no swap memory
+			 * available.
+			 */
+			if (!total_swap_pages)
+				goto activate_locked;
+
 			/* Split shmem folio */
 			if (split_folio_to_list(folio, folio_list))
 				goto keep_locked;
-- 
2.39.3
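
Concretely, the reclaim flow the commit message describes can be sketched as
below. This is a hedged condensation, not the actual mm/vmscan.c code: the
helper name and the enum are invented for illustration, while
total_swap_pages and split_folio_to_list() are the real mm symbols the patch
relies on.

/*
 * Hypothetical condensation of the swap-backed large folio handling in
 * shrink_folio_list() once the v4 check is applied.
 */
enum reclaim_action { ACTIVATE_LOCKED, KEEP_LOCKED, TRY_PAGEOUT };

static enum reclaim_action swapbacked_large_action(struct folio *folio,
                                                   struct list_head *folio_list)
{
        /*
         * No swap space anywhere: splitting cannot lead to reclaim, so keep
         * the large folio intact and return it to the active LRU. This is
         * the case MADV_PAGEOUT could previously hit without a swap device.
         */
        if (!total_swap_pages)
                return ACTIVATE_LOCKED;

        /*
         * Split the shmem large folio; the subpages are queued on
         * folio_list and will be scanned (and swapped out) individually.
         */
        if (split_folio_to_list(folio, folio_list))
                return KEEP_LOCKED;     /* split failed, retry on a later pass */

        return TRY_PAGEOUT;             /* proceed towards pageout() */
}
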
Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by David Hildenbrand 1 year, 4 months ago
On 07.08.24 09:31, Baolin Wang wrote:
> Page reclaim will not scan the anon LRU if there is no swap space;
> however, MADV_PAGEOUT can still split shmem large folios even without a
> swap device. Thus validate that swap space is available before
> splitting a shmem large folio, to avoid a redundant split.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>   mm/vmscan.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 31d13462571e..796f65781f4f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>   			}
>   		} else if (folio_test_swapbacked(folio) &&
>   			   folio_test_large(folio)) {
> +
> +			/*
> +			 * Do not split shmem folio if no swap memory
> +			 * available.
> +			 */
> +			if (!total_swap_pages)
> +				goto activate_locked;
> +
>   			/* Split shmem folio */
>   			if (split_folio_to_list(folio, folio_list))
>   				goto keep_locked;

Reminds me of

commit 9a976f0c847b67d22ed694556a3626ed92da0422
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Thu Mar 9 15:05:43 2023 -0800

     shmem: skip page split if we're not reclaiming
     
     In theory when info->flags & VM_LOCKED we should not be getting
     shem_writepage() called so we should be verifying this with a
     WARN_ON_ONCE().  Since we should not be swapping then best to ensure we
     also don't do the folio split earlier too.  So just move the check early
     to avoid folio splits in case its a dubious call.
     
     We also have a similar early bail when !total_swap_pages so just move that
     earlier to avoid the possible folio split in the same situation.


But indeed, pageout() -> writepage() is called *after* the split in the vmscan path.

In that "noswap" context, I wonder if we also want to skip folios that
are part of a shmem mapping with swapping disabled?

But now I am wondering under which circumstances we end up calling
shmem_writepage() with a large folio. And I think the answer is the comment of
folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.


... so if shmem_writepage() handles+checks that, could we do

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a332cb80e928..7dfa3d6e8ba7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                                                 goto activate_locked_split;
                                 }
                         }
-               } else if (folio_test_swapbacked(folio) &&
-                          folio_test_large(folio)) {
-                       /* Split shmem folio */
-                       if (split_folio_to_list(folio, folio_list))
-                               goto keep_locked;
                 }
  
                 /*

instead?

-- 
Cheers,

David / dhildenb
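
A hedged sketch of the extra "noswap" bail-out David is hinting at. The
'noswap' flag in shmem_sb_info comes from the tmpfs "noswap" mount option;
the helper itself is hypothetical and not part of the posted series.

/*
 * Hypothetical helper: true when reclaiming this shmem folio can never
 * succeed, so splitting it would be wasted work.
 */
static bool shmem_swap_unavailable(struct folio *folio)
{
        struct inode *inode = folio->mapping->host;

        /* tmpfs mounted with "noswap", or no swap device configured at all */
        return SHMEM_SB(inode->i_sb)->noswap || !total_swap_pages;
}
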
Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by Baolin Wang 1 year, 4 months ago

On 2024/8/7 23:53, David Hildenbrand wrote:
> On 07.08.24 09:31, Baolin Wang wrote:
>> Page reclaim will not scan the anon LRU if there is no swap space;
>> however, MADV_PAGEOUT can still split shmem large folios even without a
>> swap device. Thus validate that swap space is available before
>> splitting a shmem large folio, to avoid a redundant split.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/vmscan.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 31d13462571e..796f65781f4f 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>               }
>>           } else if (folio_test_swapbacked(folio) &&
>>                  folio_test_large(folio)) {
>> +
>> +            /*
>> +             * Do not split shmem folio if no swap memory
>> +             * available.
>> +             */
>> +            if (!total_swap_pages)
>> +                goto activate_locked;
>> +
>>               /* Split shmem folio */
>>               if (split_folio_to_list(folio, folio_list))
>>                   goto keep_locked;
> 
> Reminds me of
> 
> commit 9a976f0c847b67d22ed694556a3626ed92da0422
> Author: Luis Chamberlain <mcgrof@kernel.org>
> Date:   Thu Mar 9 15:05:43 2023 -0800
> 
>      shmem: skip page split if we're not reclaiming
>      In theory when info->flags & VM_LOCKED we should not be getting
>      shem_writepage() called so we should be verifying this with a
>      WARN_ON_ONCE().  Since we should not be swapping then best to 
> ensure we
>      also don't do the folio split earlier too.  So just move the check 
> early
>      to avoid folio splits in case its a dubious call.
>      We also have a similar early bail when !total_swap_pages so just 
> move that
>      earlier to avoid the possible folio split in the same situation.
> 
> 
> But indeed, pageout() -> writepage() is called *after* the split in the 
> vmscan path.
> 
> In that "noswap" context, I wonder if we also want to skip folios that
> are part of a shmem mapping with swapping disabled?

Yes, I think so.

> 
> But now I am wondering under which circumstances we end up calling
> shmem_writepage() with a large folio. And I think the answer is the 
> comment of
> folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> 
> 
> ... so if shmem_writepage() handles+checks that, could we do
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a332cb80e928..7dfa3d6e8ba7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>                                                  goto activate_locked_split;
>                                  }
>                          }
> -               } else if (folio_test_swapbacked(folio) &&
> -                          folio_test_large(folio)) {
> -                       /* Split shmem folio */
> -                       if (split_folio_to_list(folio, folio_list))
> -                               goto keep_locked;
>                  }
> 
>                  /*
> 
> instead?

Seems reasonable to me. But we should pass the 'folio_list' to 
shmem_writepage() to list the subpages of the large folio. Let me try. 
Thanks.
Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by Matthew Wilcox 1 year, 4 months ago
On Thu, Aug 08, 2024 at 10:36:23AM +0800, Baolin Wang wrote:
> On 2024/8/7 23:53, David Hildenbrand wrote:
> > But now I am wondering under which circumstances we end up calling
> > shmem_writepage() with a large folio. And I think the answer is the
> > comment of
> > folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> > 
> > 
> > ... so if shmem_writepage() handles+checks that, could we do
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a332cb80e928..7dfa3d6e8ba7 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >                                                  goto activate_locked_split;
> >                                  }
> >                          }
> > -               } else if (folio_test_swapbacked(folio) &&
> > -                          folio_test_large(folio)) {
> > -                       /* Split shmem folio */
> > -                       if (split_folio_to_list(folio, folio_list))
> > -                               goto keep_locked;
> >                  }
> > 
> >                  /*
> > 
> > instead?
> 
> Seems reasonable to me. But we should pass the 'folio_list' to
> shmem_writepage() to list the subpages of the large folio. Let me try.
> Thanks.

We should be trying to remove shmem_writepage(), not make it more
complex.  We're making good progress removing instances of ->writepage;
just ceph, ecryptfs, f2fs, gfs2, hostfs, nilfs2, orangefs, vboxsf, shmem
& swap are left.  gfs2 patches are out for review.

As you can see from previous patches, the approach is to use
->writepages instead of ->writepage.  There should be no need to
handle a folio split list as splitting a folio leaves the folios in the
page cache and they'll naturally be found by subsequent iterations.
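
For readers unfamiliar with the pattern Matthew describes, here is a hedged
sketch of a ->writepages-style loop built on the writeback_iter() helper;
example_writepages() and example_write_folio() are placeholders, not shmem's
actual code.

/*
 * Sketch only: how a ->writepages implementation naturally revisits the
 * order-0 folios that a split leaves behind in the page cache.
 */
static int example_writepages(struct address_space *mapping,
                              struct writeback_control *wbc)
{
        struct folio *folio = NULL;
        int error = 0;

        while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
                /*
                 * If an earlier folio was split, its former subpages are now
                 * separate folios still present in the page cache, so this
                 * loop simply finds them on later iterations; no explicit
                 * split list needs to be threaded through.
                 */
                error = example_write_folio(folio, wbc);
        }
        return error;
}
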
Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by Baolin Wang 1 year, 4 months ago

On 2024/8/8 20:35, Matthew Wilcox wrote:
> On Thu, Aug 08, 2024 at 10:36:23AM +0800, Baolin Wang wrote:
>> On 2024/8/7 23:53, David Hildenbrand wrote:
>>> But now I am wondering under which circumstances we end up calling
>>> shmem_writepage() with a large folio. And I think the answer is the
>>> comment of
>>> folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
>>>
>>>
>>> ... so if shmem_writepage() handles+checks that, could we do
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index a332cb80e928..7dfa3d6e8ba7 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>                                                   goto activate_locked_split;
>>>                                   }
>>>                           }
>>> -               } else if (folio_test_swapbacked(folio) &&
>>> -                          folio_test_large(folio)) {
>>> -                       /* Split shmem folio */
>>> -                       if (split_folio_to_list(folio, folio_list))
>>> -                               goto keep_locked;
>>>                   }
>>>
>>>                   /*
>>>
>>> instead?
>>
>> Seems reasonable to me. But we should pass the 'folio_list' to
>> shmem_writepage() to list the subpages of the large folio. Let me try.
>> Thanks.
> 
> We should be trying to remove shmem_writepage(), not make it more
> complex.  We're making good progress removing instances of ->writepage;
> just ceph, ecryptfs, f2fs, gfs2, hostfs, nilfs2, orangefs, vboxsf, shmem
> & swap are left.  gfs2 patches are out for review.

I am afraid shmem is a bit special. IIUC, ->writepages() is used to 
write back some dirty pages from the mapping by the writeback flusher 
thread, but shmem cannot be written back (mapping_can_writeback() will 
return false). Therefore, shmem can only be reclaimed through direct 
reclaim or kswapd if a swap device is set up (if no swap, shmem should 
always be kept in memory). So currently, we should still keep 
shmem_writepage() to reclaim shmem pages.
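
For reference, the check in question is essentially the following paraphrase
of mapping_can_writeback() from include/linux/backing-dev.h; shmem's backing
device does not advertise BDI_CAP_WRITEBACK, so the flusher threads never
drive tmpfs writeback.

/*
 * Paraphrased from include/linux/backing-dev.h: returns false for tmpfs
 * because shmem's bdi lacks BDI_CAP_WRITEBACK, so reclaim has to go
 * through ->writepage() instead of flusher-driven ->writepages().
 */
static inline bool mapping_can_writeback(struct address_space *mapping)
{
        return inode_to_bdi(mapping->host)->capabilities & BDI_CAP_WRITEBACK;
}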

> As you can see from previous patches, the approach is to use
> ->writepages instead of ->writepage.  There should be no need to
> handle a folio split list as splitting a folio leaves the folios in the
> page cache and they'll naturally be found by subsequent iterations.

Again, shmem is special. If a shmem folio is reclaimable (when a swap 
device is set up), we need to allocate contiguous swap entries for large 
folios. However, if swap entries are significantly fragmented (there is 
already a topic to discuss this issue), we will not be able to allocate 
contiguous swap entries for large shmem folios. Therefore, in this case, 
it is necessary to split the large shmem folio in order to try to 
allocate single swap entries for reclaiming shmem.
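
A condensed, hedged sketch of that fallback: the helper below is
hypothetical, while folio_alloc_swap() and split_folio_to_list() are the
real mm primitives; the actual handling lands in the shmem_writepage() diff
later in this thread.

/* Hypothetical condensation of the split-on-fragmentation fallback. */
static int try_swap_large_folio(struct folio *folio,
                                struct list_head *folio_list)
{
        /* Wants 1 << folio_order(folio) contiguous swap slots. */
        swp_entry_t entry = folio_alloc_swap(folio);

        if (!entry.val && folio_test_large(folio)) {
                /*
                 * Swap space too fragmented for a contiguous run: fall back
                 * to order-0 folios, each of which needs only one slot.
                 */
                if (split_folio_to_list(folio, folio_list))
                        return -EBUSY;  /* split failed, retry later */
                entry = folio_alloc_swap(folio);
        }

        return entry.val ? 0 : -ENOMEM;
}
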
Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by David Hildenbrand 1 year, 4 months ago
On 08.08.24 04:36, Baolin Wang wrote:
> 
> 
> On 2024/8/7 23:53, David Hildenbrand wrote:
>> On 07.08.24 09:31, Baolin Wang wrote:
>>> Page reclaim will not scan the anon LRU if there is no swap space;
>>> however, MADV_PAGEOUT can still split shmem large folios even without a
>>> swap device. Thus validate that swap space is available before
>>> splitting a shmem large folio, to avoid a redundant split.
>>>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>    mm/vmscan.c | 8 ++++++++
>>>    1 file changed, 8 insertions(+)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 31d13462571e..796f65781f4f 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>                }
>>>            } else if (folio_test_swapbacked(folio) &&
>>>                   folio_test_large(folio)) {
>>> +
>>> +            /*
>>> +             * Do not split shmem folio if no swap memory
>>> +             * available.
>>> +             */
>>> +            if (!total_swap_pages)
>>> +                goto activate_locked;
>>> +
>>>                /* Split shmem folio */
>>>                if (split_folio_to_list(folio, folio_list))
>>>                    goto keep_locked;
>>
>> Reminds me of
>>
>> commit 9a976f0c847b67d22ed694556a3626ed92da0422
>> Author: Luis Chamberlain <mcgrof@kernel.org>
>> Date:   Thu Mar 9 15:05:43 2023 -0800
>>
>>       shmem: skip page split if we're not reclaiming
>>       In theory when info->flags & VM_LOCKED we should not be getting
>>       shem_writepage() called so we should be verifying this with a
>>       WARN_ON_ONCE().  Since we should not be swapping then best to
>> ensure we
>>       also don't do the folio split earlier too.  So just move the check
>> early
>>       to avoid folio splits in case its a dubious call.
>>       We also have a similar early bail when !total_swap_pages so just
>> move that
>>       earlier to avoid the possible folio split in the same situation.
>>
>>
>> But indeed, pageout() -> writepage() is called *after* the split in the
>> vmscan path.
>>
>> In that "noswap" context, I wonder if we also want to skip folios that
>> are part of a shmem mapping with swapping disabled?
> 
> Yes, I think so.
> 
>>
>> But now I am wondering under which circumstances we end up calling
>> shmem_writepage() with a large folio. And I think the answer is the
>> comment of
>> folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
>>
>>
>> ... so if shmem_writepage() handles+checks that, could we do
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a332cb80e928..7dfa3d6e8ba7 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>                                                   goto activate_locked_split;
>>                                   }
>>                           }
>> -               } else if (folio_test_swapbacked(folio) &&
>> -                          folio_test_large(folio)) {
>> -                       /* Split shmem folio */
>> -                       if (split_folio_to_list(folio, folio_list))
>> -                               goto keep_locked;
>>                   }
>>
>>                   /*
>>
>> instead?
> 
> Seems reasonable to me. But we should pass the 'folio_list' to
> shmem_writepage() to list the subpages of the large folio. Let me try.

Ah, yes, good point. Alternatively, we'd have to split and try writing 
all subpages in there. I wonder what to do if we fail to write some, and 
if we could handle that transparently, without the folio_list.

-- 
Cheers,

David / dhildenb

Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by Baolin Wang 1 year, 4 months ago

On 2024/8/8 16:51, David Hildenbrand wrote:
> On 08.08.24 04:36, Baolin Wang wrote:
>>
>>
>> On 2024/8/7 23:53, David Hildenbrand wrote:
>>> On 07.08.24 09:31, Baolin Wang wrote:
>>>> Page reclaim will not scan the anon LRU if there is no swap space;
>>>> however, MADV_PAGEOUT can still split shmem large folios even without a
>>>> swap device. Thus validate that swap space is available before
>>>> splitting a shmem large folio, to avoid a redundant split.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>>    mm/vmscan.c | 8 ++++++++
>>>>    1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 31d13462571e..796f65781f4f 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>>                }
>>>>            } else if (folio_test_swapbacked(folio) &&
>>>>                   folio_test_large(folio)) {
>>>> +
>>>> +            /*
>>>> +             * Do not split shmem folio if no swap memory
>>>> +             * available.
>>>> +             */
>>>> +            if (!total_swap_pages)
>>>> +                goto activate_locked;
>>>> +
>>>>                /* Split shmem folio */
>>>>                if (split_folio_to_list(folio, folio_list))
>>>>                    goto keep_locked;
>>>
>>> Reminds me of
>>>
>>> commit 9a976f0c847b67d22ed694556a3626ed92da0422
>>> Author: Luis Chamberlain <mcgrof@kernel.org>
>>> Date:   Thu Mar 9 15:05:43 2023 -0800
>>>
>>>       shmem: skip page split if we're not reclaiming
>>>       In theory when info->flags & VM_LOCKED we should not be getting
>>>       shem_writepage() called so we should be verifying this with a
>>>       WARN_ON_ONCE().  Since we should not be swapping then best to
>>> ensure we
>>>       also don't do the folio split earlier too.  So just move the check
>>> early
>>>       to avoid folio splits in case its a dubious call.
>>>       We also have a similar early bail when !total_swap_pages so just
>>> move that
>>>       earlier to avoid the possible folio split in the same situation.
>>>
>>>
>>> But indeed, pageout() -> writepage() is called *after* the split in the
>>> vmscan path.
>>>
>>> In that "noswap" context, I wonder if we also want to skip folios that
>>> are part of a shmem mapping with swapping disabled?
>>
>> Yes, I think so.
>>
>>>
>>> But now I am wondering under which circumstances we end up calling
>>> shmem_writepage() with a large folio. And I think the answer is the
>>> comment of
>>> folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
>>>
>>>
>>> ... so if shmem_writepage() handles+checks that, could we do
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index a332cb80e928..7dfa3d6e8ba7 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>                                                   goto activate_locked_split;
>>>                                   }
>>>                           }
>>> -               } else if (folio_test_swapbacked(folio) &&
>>> -                          folio_test_large(folio)) {
>>> -                       /* Split shmem folio */
>>> -                       if (split_folio_to_list(folio, folio_list))
>>> -                               goto keep_locked;
>>>                   }
>>>
>>>                   /*
>>>
>>> instead?
>>
>> Seems reasonable to me. But we should pass the 'folio_list' to
>> shmem_writepage() to list the subpages of the large folio. Let me try.
> 
> Ah, yes, good point. Alternatively, we'd have to split and try writing 
> all subpages in there. I wonder what to do if we fail to write some, and 
> if we could handle that transparently, without the folio_list.

After some investigation, I prefer to pass 'folio_list' to 
shmem_writepage() via 'struct writeback_control', which could simplify 
the logic a lot. Otherwise, we need to handle each subpage's 
writeback/reclaim/dirty state, as well as tracking each subpage's write 
result, which seems more complicated.

I made a quick change by passing 'folio_list', and it looks simple and 
works as expected.

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 75196b0f894f..10100e22d5c6 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -80,6 +80,9 @@ struct writeback_control {
          */
         struct swap_iocb **swap_plug;

+       /* Target list for splitting a large folio */
+       struct list_head *list;
+
         /* internal fields used by the ->writepages implementation: */
         struct folio_batch fbatch;
         pgoff_t index;
diff --git a/mm/shmem.c b/mm/shmem.c
index 05525e9e7423..0a5a68f7d0a0 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
          * and its shmem_writeback() needs them to be split when swapping.
          */
         if (wbc->split_large_folio && folio_test_large(folio)) {
+try_split:
                 /* Ensure the subpages are still dirty */
                 folio_test_set_dirty(folio);
-               if (split_huge_page(page) < 0)
+               if (split_huge_page_to_list_to_order(page, wbc->list, 0))
                         goto redirty;
                 folio = page_folio(page);
                 folio_clear_dirty(folio);
@@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
         }

         swap = folio_alloc_swap(folio);
-       if (!swap.val)
+       if (!swap.val) {
+               if (nr_pages > 1)
+                       goto try_split;
+
                 goto redirty;
+       }

         /*
          * Add inode to shmem_unuse()'s list of swapped-out inodes,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 277571815789..cf982cf2454f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -628,7 +628,7 @@ typedef enum {
   * Calls ->writepage().
   */
  static pageout_t pageout(struct folio *folio, struct address_space *mapping,
-                        struct swap_iocb **plug)
+                        struct swap_iocb **plug, struct list_head *folio_list)
  {
         /*
          * If the folio is dirty, only perform writeback if that write
@@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
                         .swap_plug = plug,
                 };

+               if (shmem_mapping(mapping)) {
+                       wbc.list = folio_list;
+                       wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
+               }
+
                 folio_set_reclaim(folio);
                 res = mapping->a_ops->writepage(&folio->page, &wbc);
                 if (res < 0)
@@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                                                 goto activate_locked_split;
                                 }
                         }
-               } else if (folio_test_swapbacked(folio) &&
-                          folio_test_large(folio)) {
-
-                       /*
-                        * Do not split shmem folio if no swap memory
-                        * available.
-                        */
-                       if (!total_swap_pages)
-                               goto activate_locked;
-
-                       /*
-                        * Only split shmem folio when CONFIG_THP_SWAP
-                        * is not enabled.
-                        */
-                       if (!IS_ENABLED(CONFIG_THP_SWAP) &&
-                           split_folio_to_list(folio, folio_list))
-                               goto keep_locked;
                 }

                 /*
@@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                          * starts and then write it out here.
                          */
                         try_to_unmap_flush_dirty();
-try_pageout:
-                       switch (pageout(folio, mapping, &plug)) {
+                       switch (pageout(folio, mapping, &plug, folio_list)) {
                         case PAGE_KEEP:
                                 goto keep_locked;
                         case PAGE_ACTIVATE:
-                               if (shmem_mapping(mapping) && folio_test_large(folio) &&
-                                   !split_folio_to_list(folio, folio_list)) {
+                               /* Shmem can be split when writeback to swap */
+                               if ((nr_pages > 1) && !folio_test_large(folio)) {
+                                       sc->nr_scanned -= (nr_pages - 1);
                                         nr_pages = 1;
-                                       goto try_pageout;
                                 }
                                 goto activate_locked;
                         case PAGE_SUCCESS:
+                               if ((nr_pages > 1) && !folio_test_large(folio)) {
+                                       sc->nr_scanned -= (nr_pages - 1);
+                                       nr_pages = 1;
+                               }
                                 stat->nr_pageout += nr_pages;

                                 if (folio_test_writeback(folio)) {
Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by Daniel Gomez 1 year, 4 months ago
On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
> 
> 
> On 2024/8/8 16:51, David Hildenbrand wrote:
> > On 08.08.24 04:36, Baolin Wang wrote:
> > > 
> > > 
> > > On 2024/8/7 23:53, David Hildenbrand wrote:
> > > > On 07.08.24 09:31, Baolin Wang wrote:
> > > > > Page reclaim will not scan the anon LRU if there is no swap space;
> > > > > however, MADV_PAGEOUT can still split shmem large folios even without a
> > > > > swap device. Thus validate that swap space is available before
> > > > > splitting a shmem large folio, to avoid a redundant split.
> > > > > 
> > > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > > ---
> > > > >    mm/vmscan.c | 8 ++++++++
> > > > >    1 file changed, 8 insertions(+)
> > > > > 
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 31d13462571e..796f65781f4f 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > >                }
> > > > >            } else if (folio_test_swapbacked(folio) &&
> > > > >                   folio_test_large(folio)) {
> > > > > +
> > > > > +            /*
> > > > > +             * Do not split shmem folio if no swap memory
> > > > > +             * available.
> > > > > +             */
> > > > > +            if (!total_swap_pages)
> > > > > +                goto activate_locked;
> > > > > +
> > > > >                /* Split shmem folio */
> > > > >                if (split_folio_to_list(folio, folio_list))
> > > > >                    goto keep_locked;
> > > > 
> > > > Reminds me of
> > > > 
> > > > commit 9a976f0c847b67d22ed694556a3626ed92da0422
> > > > Author: Luis Chamberlain <mcgrof@kernel.org>
> > > > Date:   Thu Mar 9 15:05:43 2023 -0800
> > > > 
> > > >       shmem: skip page split if we're not reclaiming
> > > >       In theory when info->flags & VM_LOCKED we should not be getting
> > > >       shem_writepage() called so we should be verifying this with a
> > > >       WARN_ON_ONCE().  Since we should not be swapping then best to
> > > > ensure we
> > > >       also don't do the folio split earlier too.  So just move the check
> > > > early
> > > >       to avoid folio splits in case its a dubious call.
> > > >       We also have a similar early bail when !total_swap_pages so just
> > > > move that
> > > >       earlier to avoid the possible folio split in the same situation.
> > > > 
> > > > 
> > > > But indeed, pageout() -> writepage() is called *after* the split in the
> > > > vmscan path.
> > > > 
> > > > In that "noswap" context, I wonder if we also want to skip folios that
> > > > are part of a shmem mapping with swapping disabled?
> > > 
> > > Yes, I think so.
> > > 
> > > > 
> > > > But now I am wondering under which circumstances we end up calling
> > > > shmem_writepage() with a large folio. And I think the answer is the
> > > > comment of
> > > > folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> > > > 
> > > > 
> > > > ... so if shmem_writepage() handles+checks that, could we do
> > > > 
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index a332cb80e928..7dfa3d6e8ba7 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > >                                                   goto activate_locked_split;
> > > >                                   }
> > > >                           }
> > > > -               } else if (folio_test_swapbacked(folio) &&
> > > > -                          folio_test_large(folio)) {
> > > > -                       /* Split shmem folio */
> > > > -                       if (split_folio_to_list(folio, folio_list))
> > > > -                               goto keep_locked;
> > > >                   }
> > > > 
> > > >                   /*
> > > > 
> > > > instead?
> > > 
> > > Seems reasonable to me. But we should pass the 'folio_list' to
> > > shmem_writepage() to list the subpages of the large folio. Let me try.
> > 
> > Ah, yes, good point. Alternatively, we'd have to split and try writing
> > all subpages in there. I wonder what to do if we fail to write some, and
> > if we could handle that transparently, without the folio_list.
> 
> After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
> via 'struct writeback_control', which could simplify the logic a lot.
> Otherwise, we need to handle each subpage's writeback/reclaim/dirty state,
> as well as tracking each subpage's write result, which seems more
> complicated.
> 
> I made a quick change by passing 'folio_list', and it looks simple and works
> as expected.
> 
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 75196b0f894f..10100e22d5c6 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -80,6 +80,9 @@ struct writeback_control {
>          */
>         struct swap_iocb **swap_plug;
> 
> +       /* Target list for splitting a large folio */
> +       struct list_head *list;
> +
>         /* internal fields used by the ->writepages implementation: */
>         struct folio_batch fbatch;
>         pgoff_t index;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 05525e9e7423..0a5a68f7d0a0 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>          * and its shmem_writeback() needs them to be split when swapping.
>          */
>         if (wbc->split_large_folio && folio_test_large(folio)) {
> +try_split:
>                 /* Ensure the subpages are still dirty */
>                 folio_test_set_dirty(folio);
> -               if (split_huge_page(page) < 0)
> +               if (split_huge_page_to_list_to_order(page, wbc->list, 0))

We check for split_large_folio, but we still pass wbc->list in the i915
writepage() case. Previously, we were passing a NULL list. Shouldn't we
address that case too?

>                         goto redirty;
>                 folio = page_folio(page);
>                 folio_clear_dirty(folio);
> @@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>         }
> 
>         swap = folio_alloc_swap(folio);
> -       if (!swap.val)
> +       if (!swap.val) {
> +               if (nr_pages > 1)
> +                       goto try_split;
> +
>                 goto redirty;
> +       }
> 
>         /*
>          * Add inode to shmem_unuse()'s list of swapped-out inodes,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 277571815789..cf982cf2454f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -628,7 +628,7 @@ typedef enum {
>   * Calls ->writepage().
>   */
>  static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> -                        struct swap_iocb **plug)
> +                        struct swap_iocb **plug, struct list_head *folio_list)
>  {
>         /*
>          * If the folio is dirty, only perform writeback if that write
> @@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>                         .swap_plug = plug,
>                 };
> 
> +               if (shmem_mapping(mapping)) {
> +                       wbc.list = folio_list;
> +                       wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
> +               }
> +
>                 folio_set_reclaim(folio);
>                 res = mapping->a_ops->writepage(&folio->page, &wbc);
>                 if (res < 0)
> @@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>                                                 goto activate_locked_split;
>                                 }
>                         }
> -               } else if (folio_test_swapbacked(folio) &&
> -                          folio_test_large(folio)) {
> -
> -                       /*
> -                        * Do not split shmem folio if no swap memory
> -                        * available.
> -                        */
> -                       if (!total_swap_pages)
> -                               goto activate_locked;
> -
> -                       /*
> -                        * Only split shmem folio when CONFIG_THP_SWAP
> -                        * is not enabled.
> -                        */
> -                       if (!IS_ENABLED(CONFIG_THP_SWAP) &&
> -                           split_folio_to_list(folio, folio_list))
> -                               goto keep_locked;
>                 }
> 
>                 /*
> @@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>                          * starts and then write it out here.
>                          */
>                         try_to_unmap_flush_dirty();
> -try_pageout:
> -                       switch (pageout(folio, mapping, &plug)) {
> +                       switch (pageout(folio, mapping, &plug, folio_list)) {
>                         case PAGE_KEEP:
>                                 goto keep_locked;
>                         case PAGE_ACTIVATE:
> -                               if (shmem_mapping(mapping) && folio_test_large(folio) &&
> -                                   !split_folio_to_list(folio, folio_list)) {
> +                               /* Shmem can be split when writeback to swap */
> +                               if ((nr_pages > 1) && !folio_test_large(folio)) {
> +                                       sc->nr_scanned -= (nr_pages - 1);
>                                         nr_pages = 1;
> -                                       goto try_pageout;
>                                 }
>                                 goto activate_locked;
>                         case PAGE_SUCCESS:
> +                               if ((nr_pages > 1) && !folio_test_large(folio)) {
> +                                       sc->nr_scanned -= (nr_pages - 1);
> +                                       nr_pages = 1;
> +                               }
>                                 stat->nr_pageout += nr_pages;
> 
>                                 if (folio_test_writeback(folio)) {
Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by Daniel Gomez 1 year, 4 months ago
On Thu, Aug 08, 2024 at 12:48:52PM GMT, Daniel Gomez wrote:
> On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
> > 
> > 
> > On 2024/8/8 16:51, David Hildenbrand wrote:
> > > On 08.08.24 04:36, Baolin Wang wrote:
> > > > 
> > > > 
> > > > On 2024/8/7 23:53, David Hildenbrand wrote:
> > > > > On 07.08.24 09:31, Baolin Wang wrote:
> > > > > > Page reclaim will not scan the anon LRU if there is no swap space;
> > > > > > however, MADV_PAGEOUT can still split shmem large folios even without a
> > > > > > swap device. Thus validate that swap space is available before
> > > > > > splitting a shmem large folio, to avoid a redundant split.
> > > > > > 
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > > > ---
> > > > > >    mm/vmscan.c | 8 ++++++++
> > > > > >    1 file changed, 8 insertions(+)
> > > > > > 
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 31d13462571e..796f65781f4f 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > > >                }
> > > > > >            } else if (folio_test_swapbacked(folio) &&
> > > > > >                   folio_test_large(folio)) {
> > > > > > +
> > > > > > +            /*
> > > > > > +             * Do not split shmem folio if no swap memory
> > > > > > +             * available.
> > > > > > +             */
> > > > > > +            if (!total_swap_pages)
> > > > > > +                goto activate_locked;
> > > > > > +
> > > > > >                /* Split shmem folio */
> > > > > >                if (split_folio_to_list(folio, folio_list))
> > > > > >                    goto keep_locked;
> > > > > 
> > > > > Reminds me of
> > > > > 
> > > > > commit 9a976f0c847b67d22ed694556a3626ed92da0422
> > > > > Author: Luis Chamberlain <mcgrof@kernel.org>
> > > > > Date:   Thu Mar 9 15:05:43 2023 -0800
> > > > > 
> > > > >       shmem: skip page split if we're not reclaiming
> > > > >       In theory when info->flags & VM_LOCKED we should not be getting
> > > > >       shem_writepage() called so we should be verifying this with a
> > > > >       WARN_ON_ONCE().  Since we should not be swapping then best to
> > > > > ensure we
> > > > >       also don't do the folio split earlier too.  So just move the check
> > > > > early
> > > > >       to avoid folio splits in case its a dubious call.
> > > > >       We also have a similar early bail when !total_swap_pages so just
> > > > > move that
> > > > >       earlier to avoid the possible folio split in the same situation.
> > > > > 
> > > > > 
> > > > > But indeed, pageout() -> writepage() is called *after* the split in the
> > > > > vmscan path.
> > > > > 
> > > > > In that "noswap" context, I wonder if we also want to skip folios that
> > > > > are part of a shmem mapping with swapping disabled?
> > > > 
> > > > Yes, I think so.
> > > > 
> > > > > 
> > > > > But now I am wondering under which circumstances we end up calling
> > > > > shmem_writepage() with a large folio. And I think the answer is the
> > > > > comment of
> > > > > folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> > > > > 
> > > > > 
> > > > > ... so if shmem_writepage() handles+checks that, could we do
> > > > > 
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index a332cb80e928..7dfa3d6e8ba7 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > >                                                   goto activate_locked_split;
> > > > >                                   }
> > > > >                           }
> > > > > -               } else if (folio_test_swapbacked(folio) &&
> > > > > -                          folio_test_large(folio)) {
> > > > > -                       /* Split shmem folio */
> > > > > -                       if (split_folio_to_list(folio, folio_list))
> > > > > -                               goto keep_locked;
> > > > >                   }
> > > > > 
> > > > >                   /*
> > > > > 
> > > > > instead?
> > > > 
> > > > Seems reasonable to me. But we should pass the 'folio_list' to
> > > > shmem_writepage() to list the subpages of the large folio. Let me try.
> > > 
> > > Ah, yes, good point. Alternatively, we'd have to split and try writing
> > > all subpages in there. I wonder what to do if we fail to write some, and
> > > if we could handle that transparently, without the folio_list.
> > 
> > After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
> > via 'struct writeback_control', which could simplify the logic a lot.
> > Otherwise, we need to handle each subpage's writeback/reclaim/dirty state,
> > as well as tracking each subpage's write result, which seems more
> > complicated.
> > 
> > I made a quick change by passing 'folio_list', and it looks simple and works
> > as expected.
> > 
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index 75196b0f894f..10100e22d5c6 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -80,6 +80,9 @@ struct writeback_control {
> >          */
> >         struct swap_iocb **swap_plug;
> > 
> > +       /* Target list for splitting a large folio */
> > +       struct list_head *list;
> > +
> >         /* internal fields used by the ->writepages implementation: */
> >         struct folio_batch fbatch;
> >         pgoff_t index;
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 05525e9e7423..0a5a68f7d0a0 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >          * and its shmem_writeback() needs them to be split when swapping.
> >          */
> >         if (wbc->split_large_folio && folio_test_large(folio)) {
> > +try_split:
> >                 /* Ensure the subpages are still dirty */
> >                 folio_test_set_dirty(folio);
> > -               if (split_huge_page(page) < 0)
> > +               if (split_huge_page_to_list_to_order(page, wbc->list, 0))
> 
> We check for split_large_folio, but we still pass wbc->list in the i915
> writepage() case. Previously, we were passing a NULL list. Shouldn't we
> address that case too?

I guess I was missing the wbc initialization snippet:

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index fe69f2c8527d..174b95a9a988 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
                .range_start = 0,
                .range_end = LLONG_MAX,
                .for_reclaim = 1,
+               .list = NULL,
        };
        unsigned long i;

> >                         goto redirty;
> >                 folio = page_folio(page);
> >                 folio_clear_dirty(folio);
> > @@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >         }
> > 
> >         swap = folio_alloc_swap(folio);
> > -       if (!swap.val)
> > +       if (!swap.val) {
> > +               if (nr_pages > 1)
> > +                       goto try_split;
> > +
> >                 goto redirty;
> > +       }
> > 
> >         /*
> >          * Add inode to shmem_unuse()'s list of swapped-out inodes,
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 277571815789..cf982cf2454f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -628,7 +628,7 @@ typedef enum {
> >   * Calls ->writepage().
> >   */
> >  static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> > -                        struct swap_iocb **plug)
> > +                        struct swap_iocb **plug, struct list_head *folio_list)
> >  {
> >         /*
> >          * If the folio is dirty, only perform writeback if that write
> > @@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> >                         .swap_plug = plug,
> >                 };
> > 
> > +               if (shmem_mapping(mapping)) {
> > +                       wbc.list = folio_list;
> > +                       wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
> > +               }
> > +
> >                 folio_set_reclaim(folio);
> >                 res = mapping->a_ops->writepage(&folio->page, &wbc);
> >                 if (res < 0)
> > @@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >                                                 goto activate_locked_split;
> >                                 }
> >                         }
> > -               } else if (folio_test_swapbacked(folio) &&
> > -                          folio_test_large(folio)) {
> > -
> > -                       /*
> > -                        * Do not split shmem folio if no swap memory
> > -                        * available.
> > -                        */
> > -                       if (!total_swap_pages)
> > -                               goto activate_locked;
> > -
> > -                       /*
> > -                        * Only split shmem folio when CONFIG_THP_SWAP
> > -                        * is not enabled.
> > -                        */
> > -                       if (!IS_ENABLED(CONFIG_THP_SWAP) &&
> > -                           split_folio_to_list(folio, folio_list))
> > -                               goto keep_locked;
> >                 }
> > 
> >                 /*
> > @@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >                          * starts and then write it out here.
> >                          */
> >                         try_to_unmap_flush_dirty();
> > -try_pageout:
> > -                       switch (pageout(folio, mapping, &plug)) {
> > +                       switch (pageout(folio, mapping, &plug, folio_list)) {
> >                         case PAGE_KEEP:
> >                                 goto keep_locked;
> >                         case PAGE_ACTIVATE:
> > -                               if (shmem_mapping(mapping) && folio_test_large(folio) &&
> > -                                   !split_folio_to_list(folio, folio_list)) {
> > +                               /* Shmem can be split when writeback to swap */
> > +                               if ((nr_pages > 1) && !folio_test_large(folio)) {
> > +                                       sc->nr_scanned -= (nr_pages - 1);
> >                                         nr_pages = 1;
> > -                                       goto try_pageout;
> >                                 }
> >                                 goto activate_locked;
> >                         case PAGE_SUCCESS:
> > +                               if ((nr_pages > 1) && !folio_test_large(folio)) {
> > +                                       sc->nr_scanned -= (nr_pages - 1);
> > +                                       nr_pages = 1;
> > +                               }
> >                                 stat->nr_pageout += nr_pages;
> > 
> >                                 if (folio_test_writeback(folio)) {
Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio
Posted by Baolin Wang 1 year, 4 months ago

On 2024/8/8 18:57, Daniel Gomez wrote:
> On Thu, Aug 08, 2024 at 12:48:52PM GMT, Daniel Gomez wrote:
>> On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
>>>
>>>
>>> On 2024/8/8 16:51, David Hildenbrand wrote:
>>>> On 08.08.24 04:36, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/8/7 23:53, David Hildenbrand wrote:
>>>>>> On 07.08.24 09:31, Baolin Wang wrote:
>>>>>>> Page reclaim will not scan the anon LRU if there is no swap space;
>>>>>>> however, MADV_PAGEOUT can still split shmem large folios even without a
>>>>>>> swap device. Thus validate that swap space is available before
>>>>>>> splitting a shmem large folio, to avoid a redundant split.
>>>>>>>
>>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>> ---
>>>>>>>     mm/vmscan.c | 8 ++++++++
>>>>>>>     1 file changed, 8 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>> index 31d13462571e..796f65781f4f 100644
>>>>>>> --- a/mm/vmscan.c
>>>>>>> +++ b/mm/vmscan.c
>>>>>>> @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>>>>>                 }
>>>>>>>             } else if (folio_test_swapbacked(folio) &&
>>>>>>>                    folio_test_large(folio)) {
>>>>>>> +
>>>>>>> +            /*
>>>>>>> +             * Do not split shmem folio if no swap memory
>>>>>>> +             * available.
>>>>>>> +             */
>>>>>>> +            if (!total_swap_pages)
>>>>>>> +                goto activate_locked;
>>>>>>> +
>>>>>>>                 /* Split shmem folio */
>>>>>>>                 if (split_folio_to_list(folio, folio_list))
>>>>>>>                     goto keep_locked;
>>>>>>
>>>>>> Reminds me of
>>>>>>
>>>>>> commit 9a976f0c847b67d22ed694556a3626ed92da0422
>>>>>> Author: Luis Chamberlain <mcgrof@kernel.org>
>>>>>> Date:   Thu Mar 9 15:05:43 2023 -0800
>>>>>>
>>>>>>        shmem: skip page split if we're not reclaiming
>>>>>>        In theory when info->flags & VM_LOCKED we should not be getting
>>>>>>        shem_writepage() called so we should be verifying this with a
>>>>>>        WARN_ON_ONCE().  Since we should not be swapping then best to
>>>>>> ensure we
>>>>>>        also don't do the folio split earlier too.  So just move the check
>>>>>> early
>>>>>>        to avoid folio splits in case its a dubious call.
>>>>>>        We also have a similar early bail when !total_swap_pages so just
>>>>>> move that
>>>>>>        earlier to avoid the possible folio split in the same situation.
>>>>>>
>>>>>>
>>>>>> But indeed, pageout() -> writepage() is called *after* the split in the
>>>>>> vmscan path.
>>>>>>
>>>>>> In that "noswap" context, I wonder if we also want to skip folios that
>>>>>> are part of a shmem mapping with swapping disabled?
>>>>>
>>>>> Yes, I think so.
>>>>>
>>>>>>
>>>>>> But now I am wondering under which circumstances we end up calling
>>>>>> shmem_writepage() with a large folio. And I think the answer is the
>>>>>> comment of
>>>>>> folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
>>>>>>
>>>>>>
>>>>>> ... so if shmem_writepage() handles+checks that, could we do
>>>>>>
>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>> index a332cb80e928..7dfa3d6e8ba7 100644
>>>>>> --- a/mm/vmscan.c
>>>>>> +++ b/mm/vmscan.c
>>>>>> @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>>>>                                                    goto activate_locked_split;
>>>>>>                                    }
>>>>>>                            }
>>>>>> -               } else if (folio_test_swapbacked(folio) &&
>>>>>> -                          folio_test_large(folio)) {
>>>>>> -                       /* Split shmem folio */
>>>>>> -                       if (split_folio_to_list(folio, folio_list))
>>>>>> -                               goto keep_locked;
>>>>>>                    }
>>>>>>
>>>>>>                    /*
>>>>>>
>>>>>> instead?
>>>>>
>>>>> Seems reasonable to me. But we should pass the 'folio_list' to
>>>>> shmem_writepage() to list the subpages of the large folio. Let me try.
>>>>
>>>> Ah, yes, good point. Alternatively, we'd have to split and try writing
>>>> all subpages in there. I wonder what to do if we fail to write some, and
>>>> if we could handle that transparently, without the folio_list.
>>>
>>> After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
>>> via 'struct writeback_control', which could simplify the logic a lot.
>>> Otherwise, we need to handle each subpage's writeback/reclaim/dirty state,
>>> as well as tracking each subpage's write result, which seems more
>>> complicated.
>>>
>>> I made a quick change by passing 'folio_list', and it looks simple and works
>>> as expected.
>>>
>>> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
>>> index 75196b0f894f..10100e22d5c6 100644
>>> --- a/include/linux/writeback.h
>>> +++ b/include/linux/writeback.h
>>> @@ -80,6 +80,9 @@ struct writeback_control {
>>>           */
>>>          struct swap_iocb **swap_plug;
>>>
>>> +       /* Target list for splitting a large folio */
>>> +       struct list_head *list;
>>> +
>>>          /* internal fields used by the ->writepages implementation: */
>>>          struct folio_batch fbatch;
>>>          pgoff_t index;
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 05525e9e7423..0a5a68f7d0a0 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>>>           * and its shmem_writeback() needs them to be split when swapping.
>>>           */
>>>          if (wbc->split_large_folio && folio_test_large(folio)) {
>>> +try_split:
>>>                  /* Ensure the subpages are still dirty */
>>>                  folio_test_set_dirty(folio);
>>> -               if (split_huge_page(page) < 0)
>>> +               if (split_huge_page_to_list_to_order(page, wbc->list, 0))
>>
>> We check for split_large_folio, but we still pass wbc->list in the i915
>> writepage() case. Previously, we were passing a NULL list. Shouldn't we
>> address that case too?
> 
> I guess I was missing the wbc initialization snippet:
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index fe69f2c8527d..174b95a9a988 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
>                  .range_start = 0,
>                  .range_end = LLONG_MAX,
>                  .for_reclaim = 1,
> +               .list = NULL,
>          };
>          unsigned long i;
> 

IMO, we don't need an explicit initialization: because 'wbc' is built 
with designated initializers, any field that is not named, including 
'list', is implicitly zero-initialized (i.e. NULL). Please see: 
https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
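
The C rule Baolin points to is easy to demonstrate in userspace; below is a
minimal standalone example, with a struct that merely mimics the relevant
writeback_control fields.

#include <stdio.h>

/* Stand-in for the handful of writeback_control fields discussed above. */
struct wbc_like {
        long nr_to_write;
        int for_reclaim;
        void *list;             /* plays the role of 'struct list_head *list' */
};

int main(void)
{
        /*
         * With designated initializers, every member that is not named is
         * initialized as if the object had static storage duration, so the
         * 'list' pointer below is NULL without being spelled out.
         */
        struct wbc_like wbc = {
                .nr_to_write = 32,
                .for_reclaim = 1,
        };

        printf("wbc.list = %p\n", wbc.list);    /* prints a null pointer */
        return 0;
}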