[v5] mm/shmem, swap: bugfix and improvement of mTHP swap in

[PATCH v5 5/8] mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO

Posted by Kairui Song 7 months ago

From: Kairui Song <kasong@tencent.com>

For SWP_SYNCHRONOUS_IO devices, if a cache-bypassing THP swapin fails
due to memory pressure, a partially conflicting swap cache, or ZSWAP
being enabled, shmem falls back to a cached order 0 swapin.

Right now the swap cache still has non-trivial overhead, and readahead
is not helpful for SWP_SYNCHRONOUS_IO devices, so the readahead and the
swap cache should always be skipped, even if the swapin falls back to
order 0.

So handle the order 0 fallback within the direct swapin path instead of
falling back to the cached read.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/shmem.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 97db1097f7de..847e6f128485 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1982,6 +1982,7 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int nr_pages = 1 << order;
 	struct folio *new;
+	gfp_t alloc_gfp;
 	void *shadow;
 
 	/*
@@ -1989,6 +1990,7 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	 * limit chance of success with further cpuset and node constraints.
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
+	alloc_gfp = gfp;
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 		if (WARN_ON_ONCE(order))
 			return ERR_PTR(-EINVAL);
@@ -2003,19 +2005,22 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		if ((vma && unlikely(userfaultfd_armed(vma))) ||
 		     !zswap_never_enabled() ||
 		     non_swapcache_batch(entry, nr_pages) != nr_pages)
-			return ERR_PTR(-EINVAL);
+			goto fallback;
 
-		gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
+		alloc_gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
+	}
+retry:
+	new = shmem_alloc_folio(alloc_gfp, order, info, index);
+	if (!new) {
+		new = ERR_PTR(-ENOMEM);
+		goto fallback;
 	}
-
-	new = shmem_alloc_folio(gfp, order, info, index);
-	if (!new)
-		return ERR_PTR(-ENOMEM);
 
 	if (mem_cgroup_swapin_charge_folio(new, vma ? vma->vm_mm : NULL,
-					   gfp, entry)) {
+					   alloc_gfp, entry)) {
 		folio_put(new);
-		return ERR_PTR(-ENOMEM);
+		new = ERR_PTR(-ENOMEM);
+		goto fallback;
 	}
 
 	/*
@@ -2030,7 +2035,9 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	 */
 	if (swapcache_prepare(entry, nr_pages)) {
 		folio_put(new);
-		return ERR_PTR(-EEXIST);
+		new = ERR_PTR(-EEXIST);
+		/* Try smaller folio to avoid cache conflict */
+		goto fallback;
 	}
 
 	__folio_set_locked(new);
@@ -2044,6 +2051,15 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	folio_add_lru(new);
 	swap_read_folio(new, NULL);
 	return new;
+fallback:
+	/* Order 0 swapin failed, nothing to fall back to, abort */
+	if (!order)
+		return new;
+	entry.val += index - round_down(index, nr_pages);
+	alloc_gfp = gfp;
+	nr_pages = 1;
+	order = 0;
+	goto retry;
 }
 
 /*
@@ -2313,13 +2329,12 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			}
 
 			/*
-			 * Fallback to swapin order-0 folio unless the swap entry
-			 * already exists.
+			 * Direct swapin already handled the order 0 fallback;
+			 * if it still failed, abort.
 			 */
 			error = PTR_ERR(folio);
 			folio = NULL;
-			if (error == -EEXIST)
-				goto failed;
+			goto failed;
 		}
 
 		/*
-- 
2.50.0
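The fallback added to shmem_swap_alloc_folio() can be modelled in userspace as a retry loop. It first attempts the large order. On any failure it retries once at order 0, after stepping the swap entry from the folio-aligned slot to the faulting page's slot.

This is a hypothetical, non-kernel illustration. `swapin_with_fallback()`, `try_swapin_fn` and `succeed_up_to_order()` are invented names, and a plain `uint64_t` stands in for `swp_entry_t`. The sketch also assumes that the large-order entry corresponds to the folio-aligned index, which is what the `entry.val += index - round_down(index, nr_pages)` line implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, not kernel code. try_swapin_fn stands in
 * for the allocate / memcg charge / swapcache_prepare() sequence and
 * returns true if the swapin at this entry and order succeeded.
 */
typedef bool (*try_swapin_fn)(uint64_t entry, int order, void *ctx);

/* Power-of-two round down, equivalent to the kernel's round_down(). */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Try a swapin of 1 << order pages at @entry (assumed to be the swap slot
 * of the folio-aligned index). On failure, retry once at order 0 for the
 * faulting @index only. Returns the order that succeeded, or -1 if even
 * order 0 failed; *out_entry receives the entry that was read.
 */
static int swapin_with_fallback(uint64_t entry, uint64_t index, int order,
				try_swapin_fn try_swapin, void *ctx,
				uint64_t *out_entry)
{
	assert(order >= 0 && order < 64);

retry:
	if (try_swapin(entry, order, ctx)) {
		*out_entry = entry;
		return order;
	}
	/* Order 0 failed: nothing left to fall back to, abort. */
	if (order == 0)
		return -1;
	/* Step from the aligned slot to the faulting page's slot. */
	entry += index - round_down_pow2(index, UINT64_C(1) << order);
	order = 0;
	goto retry;
}

/* Demo policy (hypothetical): succeed only if order <= *(int *)ctx. */
static bool succeed_up_to_order(uint64_t entry, int order, void *ctx)
{
	(void)entry;
	return order <= *(const int *)ctx;
}
```

For example, take `index = 0x23` at order 4 (16 pages). The aligned index is 0x20, so a failed large swapin of entry 0x100 is retried at order 0 with entry 0x103.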

Re: [PATCH v5 5/8] mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO

Posted by Barry Song 7 months ago

On Thu, Jul 10, 2025 at 11:37 AM Kairui Song <ryncsn@gmail.com> wrote:
> [...]
> @@ -2044,6 +2051,15 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
>         folio_add_lru(new);
>         swap_read_folio(new, NULL);
>         return new;
> +fallback:
> +       /* Order 0 swapin failed, nothing to fallback to, abort */
> +       if (!order)
> +               return new;


Feels a bit odd to me. Would it be possible to handle this earlier,
like:

    if (!order)
        return ERR_PTR(-ENOMEM);
    goto fallback;

or:

    if (order)
        goto fallback;
    return ERR_PTR(-ENOMEM);

Not strongly opinionated here—totally up to you.

Thanks
Barry
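Barry's alternative can be sketched as a userspace model in which each failure site decides on the spot. With a large order it jumps to the fallback; at order 0 it returns its own error code, so nothing has to be carried in `new` across the `fallback` label. The sketch also shows that the failure sites in the patch do not all return -ENOMEM: the swap cache conflict returns -EEXIST, so each site would presumably need its own early return.

`swapin_model()`, `enum stage` and `struct faults` are hypothetical names. The stages only stand in for shmem_alloc_folio(), mem_cgroup_swapin_charge_folio() and swapcache_prepare(). The mTHP precheck (userfaultfd, zswap, non_swapcache_batch()) is omitted.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical userspace model of the suggested control flow. Each stage
 * stands in for one failure site of shmem_swap_alloc_folio():
 *   STAGE_ALLOC  ~ shmem_alloc_folio() returning NULL
 *   STAGE_CHARGE ~ mem_cgroup_swapin_charge_folio() failing
 *   STAGE_CACHE  ~ swapcache_prepare() reporting a conflict
 */
enum stage { STAGE_NONE, STAGE_ALLOC, STAGE_CHARGE, STAGE_CACHE };

/* Which stage (if any) fails on the large attempt and on the order 0 one. */
struct faults {
	enum stage large;
	enum stage small;
};

/* Returns the order read in (>= 0) or a negative errno. */
static int swapin_model(int order, const struct faults *f)
{
	enum stage fail;

	assert(order >= 0);
retry:
	fail = order ? f->large : f->small;
	if (fail == STAGE_ALLOC) {
		if (order)
			goto fallback;
		return -ENOMEM;
	}
	if (fail == STAGE_CHARGE) {
		if (order)
			goto fallback;
		return -ENOMEM;
	}
	if (fail == STAGE_CACHE) {
		/* A smaller folio may avoid a partial swap cache conflict. */
		if (order)
			goto fallback;
		return -EEXIST;
	}
	return order;
fallback:
	order = 0;
	goto retry;
}
```

The trade-off is one extra branch at each failure site, in exchange for never returning a value that was stashed earlier in `new`.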

Re: [PATCH v5 5/8] mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO

Posted by Baolin Wang 7 months ago


On 2025/7/10 11:37, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
> 
> For SWP_SYNCHRONOUS_IO devices, if a cache bypassing THP swapin failed
> due to reasons like memory pressure, partially conflicting swap cache
> or ZSWAP enabled, shmem will fallback to cached order 0 swapin.
> 
> Right now the swap cache still has a non-trivial overhead, and readahead
> is not helpful for SWP_SYNCHRONOUS_IO devices, so we should always skip
> the readahead and swap cache even if the swapin falls back to order 0.
> 
> So handle the fallback logic without falling back to the cached read.
> 
> Signed-off-by: Kairui Song <kasong@tencent.com>

LGTM. Thanks.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

[PATCH v5 1/8] mm/shmem, swap: improve cached mTHP handling and fix potential hung
[PATCH v5 2/8] mm/shmem, swap: avoid redundant Xarray lookup during swapin
[PATCH v5 3/8] mm/shmem, swap: tidy up THP swapin checks
[PATCH v5 4/8] mm/shmem, swap: tidy up swap entry splitting
[PATCH v5 5/8] mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
[PATCH v5 6/8] mm/shmem, swap: simplify swapin path and result handling
[PATCH v5 7/8] mm/shmem, swap: rework swap entry and index calculation for large swapin
[PATCH v5 8/8] mm/shmem, swap: fix major fault counting