[v5] mm/shmem, swap: bugfix and improvement of mTHP swap in

[PATCH v5 6/8] mm/shmem, swap: simplify swapin path and result handling

Posted by Kairui Song 7 months ago

From: Kairui Song <kasong@tencent.com>

Slightly tidy up the different handling of swap in and error handling
for SWP_SYNCHRONOUS_IO and non-SWP_SYNCHRONOUS_IO devices. Now swapin
will always use either shmem_swap_alloc_folio or shmem_swapin_cluster,
then check the result.

Simplify the control flow and avoid a redundant goto label.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/shmem.c | 45 +++++++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 847e6f128485..80f5b8c73eb8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2320,40 +2320,33 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
 
-		/* Skip swapcache for synchronous device. */
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
+			/* Direct mTHP swapin skipping swap cache & readhaed */
 			folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);
-			if (!IS_ERR(folio)) {
-				skip_swapcache = true;
-				goto alloced;
+			if (IS_ERR(folio)) {
+				error = PTR_ERR(folio);
+				folio = NULL;
+				goto failed;
 			}
-
+			skip_swapcache = true;
+		} else {
 			/*
-			 * Direct swapin handled order 0 fallback already,
-			 * if it failed, abort.
+			 * Cached swapin only supports order 0 folio, it is
+			 * necessary to recalculate the new swap entry based on
+			 * the offset, as the swapin index might be unalgined.
 			 */
-			error = PTR_ERR(folio);
-			folio = NULL;
-			goto failed;
-		}
-
-		/*
-		 * Now swap device can only swap in order 0 folio, it is
-		 * necessary to recalculate the new swap entry based on
-		 * the offset, as the swapin index might be unalgined.
-		 */
-		if (order) {
-			offset = index - round_down(index, 1 << order);
-			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
-		}
+			if (order) {
+				offset = index - round_down(index, 1 << order);
+				swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
+			}
 
-		folio = shmem_swapin_cluster(swap, gfp, info, index);
-		if (!folio) {
-			error = -ENOMEM;
-			goto failed;
+			folio = shmem_swapin_cluster(swap, gfp, info, index);
+			if (!folio) {
+				error = -ENOMEM;
+				goto failed;
+			}
 		}
 	}
-alloced:
 	if (order > folio_order(folio)) {
 		/*
 		 * Swapin may get smaller folios due to various reasons:
-- 
2.50.0

Re: [PATCH v5 6/8] mm/shmem, swap: simplify swapin path and result handling

Posted by Baolin Wang 7 months ago


On 2025/7/10 11:37, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
> 
> Slightly tidy up the different handling of swap in and error handling
> for SWP_SYNCHRONOUS_IO and non-SWP_SYNCHRONOUS_IO devices. Now swapin
> will always use either shmem_swap_alloc_folio or shmem_swapin_cluster,
> then check the result.
> 
> Simplify the control flow and avoid a redundant goto label.
> 
> Signed-off-by: Kairui Song <kasong@tencent.com>

LGTM, with a nit as follows.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> ---
>   mm/shmem.c | 45 +++++++++++++++++++--------------------------
>   1 file changed, 19 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 847e6f128485..80f5b8c73eb8 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2320,40 +2320,33 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>   			count_memcg_event_mm(fault_mm, PGMAJFAULT);
>   		}
>   
> -		/* Skip swapcache for synchronous device. */
>   		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
> +			/* Direct mTHP swapin skipping swap cache & readhaed */
>   			folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);

Nit: the 'mTHP' word can be confusing, since we will skip swapcache for 
order 0 too. Please drop it.

Re: [PATCH v5 6/8] mm/shmem, swap: simplify swapin path and result handling

Posted by Kairui Song 7 months ago

On Fri, Jul 11, 2025 at 2:23 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/7/10 11:37, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > Slightly tidy up the different handling of swap in and error handling
> > for SWP_SYNCHRONOUS_IO and non-SWP_SYNCHRONOUS_IO devices. Now swapin
> > will always use either shmem_swap_alloc_folio or shmem_swapin_cluster,
> > then check the result.
> >
> > Simplify the control flow and avoid a redundant goto label.
> >
> > Signed-off-by: Kairui Song <kasong@tencent.com>
>
> LGTM, with a nit as follows.
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>
> > ---
> >   mm/shmem.c | 45 +++++++++++++++++++--------------------------
> >   1 file changed, 19 insertions(+), 26 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 847e6f128485..80f5b8c73eb8 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -2320,40 +2320,33 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >                       count_memcg_event_mm(fault_mm, PGMAJFAULT);
> >               }
> >
> > -             /* Skip swapcache for synchronous device. */
> >               if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
> > +                     /* Direct mTHP swapin skipping swap cache & readhaed */
> >                       folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);
>
> Nit: the 'mTHP' word can be confusing, since we will skip swapcache for
> order 0 too. Please drop it.
>

Yes, thanks for the review.

Re: [PATCH v5 6/8] mm/shmem, swap: simplify swapin path and result handling

Posted by Hugh Dickins 6 months, 3 weeks ago

On Fri, 11 Jul 2025, Kairui Song wrote:
> On Fri, Jul 11, 2025 at 2:23 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/7/10 11:37, Kairui Song wrote:
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > Slightly tidy up the different handling of swap in and error handling
> > > for SWP_SYNCHRONOUS_IO and non-SWP_SYNCHRONOUS_IO devices. Now swapin
> > > will always use either shmem_swap_alloc_folio or shmem_swapin_cluster,
> > > then check the result.
> > >
> > > Simplify the control flow and avoid a redundant goto label.
> > >
> > > Signed-off-by: Kairui Song <kasong@tencent.com>
> >
> > LGTM, with a nit as follows.
> > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >
> > > ---
> > >   mm/shmem.c | 45 +++++++++++++++++++--------------------------
> > >   1 file changed, 19 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index 847e6f128485..80f5b8c73eb8 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -2320,40 +2320,33 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> > >                       count_memcg_event_mm(fault_mm, PGMAJFAULT);
> > >               }
> > >
> > > -             /* Skip swapcache for synchronous device. */
> > >               if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
> > > +                     /* Direct mTHP swapin skipping swap cache & readhaed */
> > >                       folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);
> >
> > Nit: the 'mTHP' word can be confusing, since we will skip swapcache for
> > order 0 too. Please drop it.
> >
> 
> Yes, thanks for the review.

And a few words after that 'mTHP ', I keep wincing at 'readhaed':
Andrew, you already did a fix to remove the 'mTHP ', I hope we can
also persuade you to change 'readhaed' to 'readahead' there - thanks!

Kairui, I'm a little uneasy about the way this series does arithmetic
on swap.val, in the knowledge that swp_offset(entry) involves no shift.

Perhaps I haven't noticed, but I think this is the first place to
make that assumption; and a few years ago it was not true at all -
swp_type() was down the bottom.  Usually we would do it all with
swp_entry(swp_type(x), arithmetic_on(swp_offset(x))).

But I guess, let's just agree that it's easier to read and get right
the way you have it, and make no change: if I try to "correct" you,
or demand that you change it, we shall probably just bring in bugs.

I'm particularly glad that you now avoid SWP_SYNCHRONOUS_IO readahead:
that stupidity had very much annoyed me, once I realized it.

Thanks,
Hugh

Re: [PATCH v5 6/8] mm/shmem, swap: simplify swapin path and result handling

Posted by Kairui Song 6 months, 3 weeks ago

On Wed, Jul 16, 2025 at 6:10 AM Hugh Dickins <hughd@google.com> wrote:
>
> On Fri, 11 Jul 2025, Kairui Song wrote:
> > On Fri, Jul 11, 2025 at 2:23 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 2025/7/10 11:37, Kairui Song wrote:
> > > > From: Kairui Song <kasong@tencent.com>
> > > >
> > > > Slightly tidy up the different handling of swap in and error handling
> > > > for SWP_SYNCHRONOUS_IO and non-SWP_SYNCHRONOUS_IO devices. Now swapin
> > > > will always use either shmem_swap_alloc_folio or shmem_swapin_cluster,
> > > > then check the result.
> > > >
> > > > Simplify the control flow and avoid a redundant goto label.
> > > >
> > > > Signed-off-by: Kairui Song <kasong@tencent.com>
> > >
> > > LGTM, with a nit as follows.
> > > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > >
> > > > ---
> > > >   mm/shmem.c | 45 +++++++++++++++++++--------------------------
> > > >   1 file changed, 19 insertions(+), 26 deletions(-)
> > > >
> > > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > > index 847e6f128485..80f5b8c73eb8 100644
> > > > --- a/mm/shmem.c
> > > > +++ b/mm/shmem.c
> > > > @@ -2320,40 +2320,33 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> > > >                       count_memcg_event_mm(fault_mm, PGMAJFAULT);
> > > >               }
> > > >
> > > > -             /* Skip swapcache for synchronous device. */
> > > >               if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
> > > > +                     /* Direct mTHP swapin skipping swap cache & readhaed */
> > > >                       folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);
> > >
> > > Nit: the 'mTHP' word can be confusing, since we will skip swapcache for
> > > order 0 too. Please drop it.
> > >
> >
> > Yes, thanks for the review.
>
> And a few words after that 'mTHP ', I keep wincing at 'readhaed':
> Andrew, you already did a fix to remove the 'mTHP ', I hope we can
> also persuade you to change 'readhaed' to 'readahead' there - thanks!
>
> Kairui, I'm a little uneasy about the way this series does arithmetic
> on swap.val, in the knowledge that swp_offset(entry) involves no shift.
>
> Perhaps I haven't noticed, but I think this is the first place to
> make that assumption; and a few years ago it was not true at all -
> swp_type() was down the bottom.  Usually we would do it all with
> swp_entry(swp_type(x), arithmetic_on(swp_offset(x))).
>
> But I guess, let's just agree that it's easier to read and get right
> the way you have it, and make no change: if I try to "correct" you,
> or demand that you change it, we shall probably just bring in bugs.

Thanks!

I think maybe we can introduce some helpers for things like rounding
the swap entry later, we already have other similar helpers for swap
entries.

There was already same arithmetics in memoy.c some time ago, and I
remember seeing people doing this several times. Current swap values
are OK with this, will be easier to track with a helper.


> I'm particularly glad that you now avoid SWP_SYNCHRONOUS_IO readahead:
> that stupidity had very much annoyed me, once I realized it.
>
> Thanks,
> Hugh