On Tue, Nov 4, 2025 at 4:27 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Nov 4, 2025 at 12:19 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Wed, Oct 29, 2025 at 11:59 PM Kairui Song <ryncsn@gmail.com> wrote:
> > >
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > Now SWP_SYNCHRONOUS_IO devices are also using swap cache. One side
> > > effect is that a folio may stay in swap cache for a longer time due to
> > > lazy freeing (vm_swap_full()). This can help save some CPU / IO if folios
> > > are being swapped out very frequently right after swapin, hence improving
> > > the performance. But the long pinning of swap slots also increases the
> > > fragmentation rate of the swap device significantly, and currently,
> > > all in-tree SWP_SYNCHRONOUS_IO devices are RAM disks, so it also
> > > causes the backing memory to be pinned, increasing the memory pressure.
> > >
> > > So drop the swap cache immediately for SWP_SYNCHRONOUS_IO devices
> > > after swapin finishes. Swap cache has served its role as a
> > > synchronization layer to prevent any parallel swapin from wasting
> > > CPU or memory allocation, and the redundant IO is not a major concern
> > > for SWP_SYNCHRONOUS_IO devices.
> > >
> > > Signed-off-by: Kairui Song <kasong@tencent.com>
> > > ---
> > > mm/memory.c | 13 +++++++++++--
> > > 1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 9a43d4811781..78457347ae60 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -4359,12 +4359,21 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
> > >          return 0;
> > >  }
> > >
> > > -static inline bool should_try_to_free_swap(struct folio *folio,
> > > +static inline bool should_try_to_free_swap(struct swap_info_struct *si,
> > > +                                           struct folio *folio,
> > >                                             struct vm_area_struct *vma,
> > >                                             unsigned int fault_flags)
> > >  {
> > >          if (!folio_test_swapcache(folio))
> > >                  return false;
> > > +        /*
> > > +         * Try to free swap cache for SWP_SYNCHRONOUS_IO devices.
> > > +         * Redundant IO is unlikely to be an issue for them, but a
> > > +         * slot being pinned by swap cache may cause more fragmentation
> > > +         * and delayed freeing of swap metadata.
> > > +         */
> >
> > I don't like the claim about "redundant I/O"; it sounds misleading. Those
> > I/Os are not redundant; they are simply saved by swapcache, which prevents
> > some swap-out I/O when a recently swapped-in folio is swapped out again.
> >
> > So, could we make it a bit more specific in both the comment and the commit
> > message?
>
> Sorry, on second thought: consider a case where process A mmaps 100 MB and writes
> to it to populate memory, then forks process B. If that 100 MB gets swapped out,
> and A and B later swap it in separately for reading, with this change it seems
> they would each get their own 100 MB copy (total 2 × 100 MB), whereas previously
> they could share the same 100 MB?

It's a bit tricky here: folio_free_swap() only frees the swap cache if the
folio's swap count is 0. So if A swaps in these folios first, the swap
cache won't be freed until B has also mapped these folios and reduced the
swap count to 0. A and B still end up sharing the same folios.

And this function is called should_try_to_free_swap(): it only tries
to free the swap cache, which succeeds only if swap count == 0. I think
I can add some comments on that.
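
As a rough illustration of the swap count rule, here is a toy user-space
model in C. It is not kernel code: struct toy_slot, toy_swapin() and
toy_try_free_swap() are made-up names, and it models only the
"swap count == 0" condition. The real path has further checks that are
ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
struct toy_slot {
	int swap_count;       /* swap PTEs still referencing this slot */
	bool in_swap_cache;   /* a folio for this slot is in the swap cache */
	int folio_mapcount;   /* processes that have mapped the cached folio */
	int folios_allocated; /* folios ever allocated for this slot */
};

/* Models the rule: the swap cache is dropped only when swap count is 0. */
static bool toy_try_free_swap(struct toy_slot *slot)
{
	if (!slot->in_swap_cache)
		return false;
	if (slot->swap_count != 0)
		return false;	/* someone else still holds a swap PTE */
	slot->in_swap_cache = false;
	return true;
}

/*
 * One process faults on its swap PTE for this slot.
 * Returns true if the swap cache was dropped at the end of the fault.
 */
static bool toy_swapin(struct toy_slot *slot)
{
	assert(slot->swap_count > 0);

	if (!slot->in_swap_cache) {
		/* First faulter: allocate a folio and add it to the swap cache. */
		slot->folios_allocated++;
		slot->in_swap_cache = true;
	}

	/* Map the (possibly shared) folio; the swap PTE goes away. */
	slot->folio_mapcount++;
	slot->swap_count--;

	return toy_try_free_swap(slot);
}
```

With swap_count starting at 2 (A and B after fork), the first
toy_swapin() call returns false and leaves the folio in the swap cache.
The second call reuses that folio and returns true, with
folios_allocated still at 1.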