[PATCH v2 01/15] docs/mm: add document for swap table

Kairui Song posted 15 patches 4 days, 10 hours ago
Posted by Kairui Song 4 days, 10 hours ago
From: Kairui Song <kasong@tencent.com>

From: Chris Li <chrisl@kernel.org>

Swap table is the new swap cache.

Signed-off-by: Chris Li <chrisl@kernel.org>
Signed-off-by: Kairui Song <kasong@tencent.com>
---
 Documentation/mm/swap-table.rst | 72 +++++++++++++++++++++++++++++++++
 MAINTAINERS                     |  1 +
 2 files changed, 73 insertions(+)
 create mode 100644 Documentation/mm/swap-table.rst

diff --git a/Documentation/mm/swap-table.rst b/Documentation/mm/swap-table.rst
new file mode 100644
index 000000000000..929cd91aa984
--- /dev/null
+++ b/Documentation/mm/swap-table.rst
@@ -0,0 +1,72 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+:Author: Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>
+
+==========
+Swap Table
+==========
+
+Swap table implements swap cache as a per-cluster swap cache value array.
+
+Swap Entry
+----------
+
+A swap entry contains the information required to serve an anonymous page
+fault.
+
+A swap entry is encoded in two parts: the swap type and the swap offset.
+
+The swap type indicates which swap device to use.
+The swap offset is the offset, in pages, of the page data within that device.
+
+Swap Cache
+----------
+
+The swap cache is a map that looks up folios using the swap entry as the key.
+The value can be one of three types, depending on which stage the swap entry
+is in:
+
+1. NULL: This swap entry is not used.
+
+2. folio: A folio has been allocated and bound to this swap entry. This is
+   the transient state during swap-out or swap-in. The data can be in the
+   folio, in the swap file, or both.
+
+3. shadow: The shadow contains the working set information of the swapped-out
+   folio. This is the normal state for a swapped-out page.
+
+Swap Table
+----------
+
+The previous swap cache was implemented with an XArray, which is a tree
+structure, so each lookup walks through multiple nodes. Can we do better?
+
+Notice that most of the time when we look up the swap cache, we are in
+either the swap-in or the swap-out path, so we should already have the swap
+cluster that contains the swap entry.
+
+If each cluster has an array that stores the swap cache values of its
+entries, a swap cache lookup within the cluster becomes a simple array lookup.
+
+We call such a per-cluster array of swap cache values the swap table.
+
+Each swap cluster contains 512 entries, so a swap table stores one cluster's
+worth of swap cache values, which is exactly one page. This is not a
+coincidence: the cluster size is determined by the huge page size.
+The swap table holds an array of pointers, and each pointer has the same
+size as a PTE, so the swap table is the same size as the PTE table that a
+single PMD entry maps: exactly one page.
+
+With the swap table, swap cache lookups get better locality and become
+simpler and faster.
+
+Locking
+-------
+
+Modifying the swap table requires taking the cluster lock. If a folio is
+being added to or removed from the swap table, the folio must be locked
+before the cluster lock is taken. After the addition or removal is done,
+the folio shall be unlocked.
+
+Swap table lookups are protected by RCU and use atomic reads. If a lookup
+returns a folio, the caller must lock the folio before using it.
diff --git a/MAINTAINERS b/MAINTAINERS
index ec19be6c9917..1c8292c0318d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16219,6 +16219,7 @@ R:	Barry Song <baohua@kernel.org>
 R:	Chris Li <chrisl@kernel.org>
 L:	linux-mm@kvack.org
 S:	Maintained
+F:	Documentation/mm/swap-table.rst
 F:	include/linux/swap.h
 F:	include/linux/swapfile.h
 F:	include/linux/swapops.h
-- 
2.51.0
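The lookup path described in the Swap Entry and Swap Table sections of the patch above can be modeled in user space. What follows is a minimal sketch under stated assumptions, not the kernel code: the entry layout (5 type bits above the offset), the 512-slot cluster, and every identifier in it (`swp_entry_model_t`, `swap_entry_make()`, `struct swap_cluster_model`, `swap_table_lookup()`, `swap_table_demo()` and so on) are hypothetical. It shows how an entry splits into type and offset, how the offset picks a cluster and a slot, and why 512 pointer-sized slots fill exactly one 4 KiB page on a 64-bit machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: top 5 bits hold the swap type, the rest the offset. */
#define SWP_TYPE_BITS   5
#define SWP_OFFSET_BITS (64 - SWP_TYPE_BITS)
#define SWP_OFFSET_MASK ((UINT64_C(1) << SWP_OFFSET_BITS) - 1)

/* Assumed cluster size: 512 slots (2 MiB huge page / 4 KiB base page). */
#define SWAP_CLUSTER_NR 512

typedef struct {
	uint64_t val;
} swp_entry_model_t;

static swp_entry_model_t swap_entry_make(unsigned int type, uint64_t offset)
{
	swp_entry_model_t e;

	e.val = ((uint64_t)type << SWP_OFFSET_BITS) | (offset & SWP_OFFSET_MASK);
	return e;
}

static unsigned int swap_entry_type(swp_entry_model_t e)
{
	return (unsigned int)(e.val >> SWP_OFFSET_BITS);
}

static uint64_t swap_entry_offset(swp_entry_model_t e)
{
	return e.val & SWP_OFFSET_MASK;
}

/* Each cluster owns one swap table: one pointer-sized slot per entry. */
struct swap_cluster_model {
	void *table[SWAP_CLUSTER_NR];
};

/* With 8-byte pointers, 512 * 8 = 4096 bytes: exactly one 4 KiB page. */
_Static_assert(sizeof(void *) != 8 ||
	       sizeof(struct swap_cluster_model) == 4096,
	       "swap table should fill one 4 KiB page");

/* Which cluster covers an offset, and which slot of its table it uses. */
static size_t swap_cluster_index(uint64_t offset)
{
	return (size_t)(offset / SWAP_CLUSTER_NR);
}

static size_t swap_table_slot(uint64_t offset)
{
	return (size_t)(offset % SWAP_CLUSTER_NR);
}

/*
 * The caller already has the cluster, so a lookup is one array read.
 * The patch says real lookups use RCU and atomic reads; this
 * single-threaded model uses a plain load.
 */
static void *swap_table_lookup(const struct swap_cluster_model *ci,
			       swp_entry_model_t e)
{
	return ci->table[swap_table_slot(swap_entry_offset(e))];
}

/* Model of an update; per the patch, the kernel takes the cluster lock. */
static void swap_table_store(struct swap_cluster_model *ci,
			     swp_entry_model_t e, void *val)
{
	ci->table[swap_table_slot(swap_entry_offset(e))] = val;
}

/* Usage: encode an entry, find its slot, then store and look up a folio. */
static void swap_table_demo(void)
{
	static struct swap_cluster_model cluster; /* zeroed: every slot NULL */
	int fake_folio;                           /* stand-in for a folio */
	swp_entry_model_t e = swap_entry_make(3, 1025);

	assert(swap_entry_type(e) == 3);
	assert(swap_entry_offset(e) == 1025);
	assert(swap_cluster_index(1025) == 2); /* cluster 2 covers 1024..1535 */
	assert(swap_table_slot(1025) == 1);
	assert(swap_table_lookup(&cluster, e) == NULL);
	swap_table_store(&cluster, e, &fake_folio);
	assert(swap_table_lookup(&cluster, e) == &fake_folio);
	swap_table_store(&cluster, e, NULL);      /* back to "not used" */
}
```

Because the caller already holds the cluster, the lookup in this model is a single indexed load with no tree walk, which is the locality argument the patch makes against the XArray.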
Re: [PATCH v2 01/15] docs/mm: add document for swap table
Posted by Baoquan He 1 day, 16 hours ago
On 09/06/25 at 03:13am, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
> 
> From: Chris Li <chrisl@kernel.org>

'From author <authorkernel.org>' can only be one person, and the co-author
should be specified by "Co-developed-by:" and "Signed-off-by:"?

> 
> Swap table is the new swap cache.
> 
> Signed-off-by: Chris Li <chrisl@kernel.org>
> Signed-off-by: Kairui Song <kasong@tencent.com>
Re: [PATCH v2 01/15] docs/mm: add document for swap table
Posted by Chris Li 1 day, 14 hours ago
On Mon, Sep 8, 2025 at 5:36 AM Baoquan He <bhe@redhat.com> wrote:
>
> On 09/06/25 at 03:13am, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > From: Chris Li <chrisl@kernel.org>
>
> 'From author <authorkernel.org>' can only be one person, and the co-author
> should be specified by "Co-developed-by:" and "Signed-off-by:"?

That is an artifact of sending another person's patch in a series.
The first "From" comes from the email header sender. The second "From" is
the real author of the patch. It is just like IP tunneling: an inner IP
packet is wrapped inside the outer IP packet.

I think that is all normal and does not violate the kernel rules. When
I included Kairui's patch in my swap allocator series, the same thing
happened there. In the end, git will know who the real author is,
because those patches are output by git anyway.

Chris
Re: [PATCH v2 01/15] docs/mm: add document for swap table
Posted by Baoquan He 1 day, 14 hours ago
On 09/08/25 at 08:01am, Chris Li wrote:
> On Mon, Sep 8, 2025 at 5:36 AM Baoquan He <bhe@redhat.com> wrote:
> >
> > On 09/06/25 at 03:13am, Kairui Song wrote:
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > From: Chris Li <chrisl@kernel.org>
> >
> > 'From author <authorkernel.org>' can only be one person, and the co-author
> > should be specified by "Co-developed-by:" and "Signed-off-by:"?
> 
> That is the artifact of sending another person's patch in a series.
> The first "From" is from the email header sender. The second "from" is
> the real author of the patch. Just like an IP tunnel packet there is
> another inner IP packet wrapped in the outer IP packet.
> 
> I think that is all normal and did not violate the kernel rules. When
> I include Kairui's patch in my swap allocator series. The same thing
> happened there on Kairui's patch. In the end the git will know enough
> who is the real author, because those patches are  outputted by git
> anyway.

Hmm, maybe git doesn't work like that. I applied this patch via git am
and got this on my local branch. The second 'From' became part of the commit log.

commit 337b3cd6c0ffad355df8851414e8aa5be052f4cb (HEAD -> kasan-v3)
Author: Kairui Song <kasong@tencent.com>
Date:   Sat Sep 6 03:13:43 2025 +0800

    docs/mm: add document for swap table
    
    From: Chris Li <chrisl@kernel.org>
    
    Swap table is the new swap cache.
    
    Signed-off-by: Chris Li <chrisl@kernel.org>
    Signed-off-by: Kairui Song <kasong@tencent.com>

Re: [PATCH v2 01/15] docs/mm: add document for swap table
Posted by Chris Li 1 day, 13 hours ago
On Mon, Sep 8, 2025 at 8:09 AM Baoquan He <bhe@redhat.com> wrote:
> > I think that is all normal and did not violate the kernel rules. When
> > I include Kairui's patch in my swap allocator series. The same thing
> > happened there on Kairui's patch. In the end the git will know enough
> > who is the real author, because those patches are  outputted by git
> > anyway.
>
> Hmm, maybe git doesn't work like that. I applied this patch via git am,
> I got this on my local branch. The 2nd 'From' become part of commit log.
>

In that case, Kairui needs to fix his send-email config.

Maybe, as you suggested, remove his own From: line in this case. I
don't recall needing such a special git send-email config. Maybe
Kairui's SMTP server is different.

BTW, I definitely know that Google's SMTP server does not work well
with "b4 send --reflect". Google SMTP is like: "Oh, I see you might
want to CC these people, because you included them in your inner
envelope CC list. Let me do a CC for you on the outer envelope as
well." It defeats the purpose of "b4 send --reflect", which is meant to be
a dry run. I still recall the horror on my poor colleague's face when I
convinced him to try out "b4 send --reflect", which should be safe,
but the email was actually sent out to the full list. I should file a bug
for that.

Chris

> commit 337b3cd6c0ffad355df8851414e8aa5be052f4cb (HEAD -> kasan-v3)
> Author: Kairui Song <kasong@tencent.com>
> Date:   Sat Sep 6 03:13:43 2025 +0800
>
>     docs/mm: add document for swap table
>
>     From: Chris Li <chrisl@kernel.org>
>
>     Swap table is the new swap cache.
>
>     Signed-off-by: Chris Li <chrisl@kernel.org>
>     Signed-off-by: Kairui Song <kasong@tencent.com>
>
Re: [PATCH v2 01/15] docs/mm: add document for swap table
Posted by Kairui Song 1 day, 15 hours ago
On Mon, Sep 8, 2025 at 8:54 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 09/06/25 at 03:13am, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > From: Chris Li <chrisl@kernel.org>
>
> 'From author <authorkernel.org>' can only be one person, and the co-author
> should be specified by "Co-developed-by:" and "Signed-off-by:"?
>

Hmm, that's interesting. I'm using git send-email with the setup below:

[sendemail]
from = Kairui Song <ryncsn@gmail.com>
confirm = auto
smtpServer = smtp.gmail.com
smtpServerPort = 587
smtpEncryption = tls
smtpUser = ryncsn@gmail.com

So it adds a "From:" line automatically when I'm using Gmail's SMTP server
and the patch author doesn't match the sender. It seems git somehow got
confused by this commit; maybe I used some sending parameters wrongly.

The author of the doc really should be Chris.
Re: [PATCH v2 01/15] docs/mm: add document for swap table
Posted by Baoquan He 1 day, 14 hours ago
On 09/08/25 at 10:27pm, Kairui Song wrote:
> On Mon, Sep 8, 2025 at 8:54 PM Baoquan He <bhe@redhat.com> wrote:
> >
> > On 09/06/25 at 03:13am, Kairui Song wrote:
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > From: Chris Li <chrisl@kernel.org>
> >
> > 'From author <authorkernel.org>' can only be one person, and the co-author
> > should be specified by "Co-developed-by:" and "Signed-off-by:"?
> >
> 
> Hmm, that's interesting, I'm using git send mail with below setup:
> 
> [sendemail]
> from = Kairui Song <ryncsn@gmail.com>
> confirm = auto
> smtpServer = smtp.gmail.com
> smtpServerPort = 587
> smtpEncryption = tls
> smtpUser = ryncsn@gmail.com
> 
> So it will add a "From:" automatically when I'm using gmail's SMTP but
> the patch author doesn't match the sender. It seems git somehow got
> confused by this commit, maybe I used some sending parameters wrongly.

Then you may need to remove the 'from' field from the [sendemail] section
of your git config. When I git am your patch, your first 'From' becomes the
patch author.

> 
> The author of the doc really should be Chris.
> 

Re: [PATCH v2 01/15] docs/mm: add document for swap table
Posted by Chris Li 4 days, 5 hours ago
On Fri, Sep 5, 2025 at 12:14 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> From: Chris Li <chrisl@kernel.org>
>
> Swap table is the new swap cache.
>
> Signed-off-by: Chris Li <chrisl@kernel.org>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> +Swap Cache
> +----------
> +
> +Swap cache is a map to look up folios using swap entry as the key. The result
> +value can have three possible types depending on which stage of this swap entry
> +was in.
> +
> +1. NULL: This swap entry is not used.
> +
> +2. folio: A folio has been allocated and bound to this swap entry. This is
> +   the transient state of swap out or swap in. The folio data can be in
> +   the folio or swap file, or both.
> +
> +3. shadow: The shadow contains the working set information of the swap

I just noticed a typo here; it should be "swapped out page".

> +   outed folio. This is the normal state for a swap outed page.

Same here. "swap outed page" -> "swapped out page"

Chris
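The three swap cache value types in the quoted list (NULL, folio, shadow) all have to fit in one pointer-sized slot. Below is a minimal sketch under an assumption the patch does not state: that a shadow is tagged the way the XArray tags value entries (`xa_mk_value()` shifts the value left by one and sets bit 0), which works because folio pointers are word aligned. `struct folio_model`, `shadow_make()`, `shadow_value()`, `swap_cache_value_demo()` and the `slot_is_*()` helpers are hypothetical names; the real swap table encoding may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct folio; only its alignment matters. */
struct folio_model {
	unsigned long flags;
};

/*
 * Assumed encoding, borrowed from xa_mk_value(): a shadow is an integer
 * shifted left by one with bit 0 set. Folio pointers are at least word
 * aligned, so their bit 0 is always clear and the two cannot collide.
 */
static void *shadow_make(uintptr_t v)
{
	return (void *)((v << 1) | 1);
}

/* Type 1: NULL, the swap entry is not used. */
static bool slot_is_empty(const void *slot)
{
	return slot == NULL;
}

/* Type 3: shadow, recognized by the tag in bit 0. */
static bool slot_is_shadow(const void *slot)
{
	return ((uintptr_t)slot & 1) != 0;
}

/* Type 2: folio, any non-NULL, untagged pointer. */
static bool slot_is_folio(const void *slot)
{
	return slot != NULL && !slot_is_shadow(slot);
}

/* Recover the working set value stored in a shadow slot. */
static uintptr_t shadow_value(const void *slot)
{
	assert(slot_is_shadow(slot));
	return (uintptr_t)slot >> 1;
}

/* Usage: classify one slot value of each type. */
static void swap_cache_value_demo(void)
{
	struct folio_model folio = { 0 };
	void *shadow = shadow_make(42);

	assert(slot_is_empty(NULL));
	assert(slot_is_folio(&folio) && !slot_is_shadow(&folio));
	assert(slot_is_shadow(shadow) && !slot_is_folio(shadow));
	assert(shadow_value(shadow) == 42);
}
```

Checking the tag bit before dereferencing keeps a shadow from being mistaken for a folio; a caller that does get a folio back still has to lock it, as the patch's Locking section requires.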
Re: [PATCH v2 01/15] docs/mm: add document for swap table
Posted by Kairui Song 3 days, 16 hours ago
On Sat, Sep 6, 2025 at 8:05 AM Chris Li <chrisl@kernel.org> wrote:
>
> On Fri, Sep 5, 2025 at 12:14 PM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > From: Chris Li <chrisl@kernel.org>
> >
> > Swap table is the new swap cache.
> >
> > Signed-off-by: Chris Li <chrisl@kernel.org>
> > Signed-off-by: Kairui Song <kasong@tencent.com>
> > +Swap Cache
> > +----------
> > +
> > +Swap cache is a map to look up folios using swap entry as the key. The result
> > +value can have three possible types depending on which stage of this swap entry
> > +was in.
> > +
> > +1. NULL: This swap entry is not used.
> > +
> > +2. folio: A folio has been allocated and bound to this swap entry. This is
> > +   the transient state of swap out or swap in. The folio data can be in
> > +   the folio or swap file, or both.
> > +
> > +3. shadow: The shadow contains the working set information of the swap
>
> I just noticed a typo here, should be "swapped out page"
>
> > +   outed folio. This is the normal state for a swap outed page.
>
> Same here. "swap outed page" -> "swapped out page"

Thanks. I used some grammar-checking tools, and it seems they are not
perfect with kernel terminology.

>
> Chris
>