From: Kairui Song <kasong@tencent.com>
From: Chris Li <chrisl@kernel.org>
Swap table is the new swap cache.
Signed-off-by: Chris Li <chrisl@kernel.org>
Signed-off-by: Kairui Song <kasong@tencent.com>
---
Documentation/mm/swap-table.rst | 72 +++++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 73 insertions(+)
create mode 100644 Documentation/mm/swap-table.rst
diff --git a/Documentation/mm/swap-table.rst b/Documentation/mm/swap-table.rst
new file mode 100644
index 000000000000..929cd91aa984
--- /dev/null
+++ b/Documentation/mm/swap-table.rst
@@ -0,0 +1,72 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+:Author: Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>
+
+==========
+Swap Table
+==========
+
+Swap table implements swap cache as a per-cluster swap cache value array.
+
+Swap Entry
+----------
+
+A swap entry contains the information required to serve the anonymous page
+fault.
+
+A swap entry is encoded in two parts: swap type and swap offset.
+
+The swap type indicates which swap device to use.
+The swap offset is the offset within the swap file to read the page data from.
+
+Swap Cache
+----------
+
+Swap cache is a map to look up folios using swap entry as the key. The result
+value can have three possible types depending on which stage this swap entry
+is in.
+
+1. NULL: This swap entry is not used.
+
+2. folio: A folio has been allocated and bound to this swap entry. This is
+   the transient state of swap out or swap in. The folio data can be in
+   the folio or swap file, or both.
+
+3. shadow: The shadow contains the working set information of the swapped
+   out folio. This is the normal state for a swapped out page.
+
+Swap Table
+----------
+
+The previous swap cache is implemented by XArray. The XArray is a tree
+structure. Each lookup will go through multiple nodes. Can we do better?
+
+Notice that most of the time when we look up the swap cache, we are either
+in a swap in or swap out path. We should already have the swap cluster,
+which contains the swap entry.
+
+If we have a per-cluster array to store swap cache values in the cluster,
+swap cache lookup within the cluster can be a very simple array lookup.
+
+We give such a per-cluster swap cache value array a name: the swap table.
+
+Each swap cluster contains 512 entries, so a swap table stores one cluster
+worth of swap cache values, which is exactly one page. This is not
+coincidental because the cluster size is determined by the huge page size.
+The swap table holds an array of pointers, each the same size as a PTE, so
+the size of the swap table matches that of a second-to-last level page
+table page: exactly one page.
+
+With the swap table, swap cache lookup achieves better locality and is
+simpler and faster.
+
+Locking
+-------
+
+Swap table modification requires taking the cluster lock. If a folio
+is being added to or removed from the swap table, the folio must be
+locked prior to the cluster lock. After adding or removing is done, the
+folio shall be unlocked.
+
+Swap table lookup is protected by RCU and atomic read. If the lookup
+returns a folio, the user must lock the folio before use.
diff --git a/MAINTAINERS b/MAINTAINERS
index ec19be6c9917..1c8292c0318d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16219,6 +16219,7 @@ R: Barry Song <baohua@kernel.org>
R: Chris Li <chrisl@kernel.org>
L: linux-mm@kvack.org
S: Maintained
+F: Documentation/mm/swap-table.rst
F: include/linux/swap.h
F: include/linux/swapfile.h
F: include/linux/swapops.h
--
2.51.0
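
For readers following the thread, the per-cluster lookup described in the
document can be sketched in user-space C. This is only an illustration, not
the kernel implementation. It assumes a 64-bit build with 4 KiB pages, and
every name in it (sketch_cluster, sketch_swp_make, and so on) is
hypothetical. The 5-bit type field and the low-bit tag used to tell a shadow
from a folio pointer are likewise assumptions made for the example. Locking
and RCU are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, 512 slots per cluster. */
#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_CLUSTER_SLOTS 512u

/* Hypothetical encoding: swap type in the top 5 bits, offset in the rest. */
#define SKETCH_TYPE_BITS   5u
#define SKETCH_TYPE_MASK   ((UINT64_C(1) << SKETCH_TYPE_BITS) - 1)
#define SKETCH_TYPE_SHIFT  (64u - SKETCH_TYPE_BITS)
#define SKETCH_OFFSET_MASK ((UINT64_C(1) << SKETCH_TYPE_SHIFT) - 1)

typedef struct { uint64_t val; } sketch_swp_entry;

/* Pack a swap type (which device) and a swap offset into one value. */
static inline sketch_swp_entry sketch_swp_make(unsigned type, uint64_t offset)
{
    sketch_swp_entry e;

    e.val = (((uint64_t)type & SKETCH_TYPE_MASK) << SKETCH_TYPE_SHIFT) |
            (offset & SKETCH_OFFSET_MASK);
    return e;
}

static inline unsigned sketch_swp_type(sketch_swp_entry e)
{
    return (unsigned)(e.val >> SKETCH_TYPE_SHIFT);
}

static inline uint64_t sketch_swp_offset(sketch_swp_entry e)
{
    return e.val & SKETCH_OFFSET_MASK;
}

/* One swap table: one pointer-sized swap cache value per slot. */
struct sketch_cluster {
    void *table[SKETCH_CLUSTER_SLOTS];
};

/* With 8-byte pointers, 512 slots fill exactly one 4 KiB page. */
static_assert(sizeof(void *) != 8 ||
              sizeof(struct sketch_cluster) == SKETCH_PAGE_SIZE,
              "swap table should be exactly one page");

/* Hypothetical tagging: a set low bit marks a shadow, not a folio pointer. */
static inline void *sketch_make_shadow(uintptr_t info)
{
    return (void *)((info << 1) | 1u);
}

static inline int sketch_is_shadow(const void *v)
{
    return ((uintptr_t)v & 1u) != 0;
}

/* Lookup is a plain array index once the cluster is known. */
static inline void *sketch_table_get(const struct sketch_cluster *ci,
                                     uint64_t offset)
{
    return ci->table[offset % SKETCH_CLUSTER_SLOTS];
}

static inline void sketch_table_set(struct sketch_cluster *ci,
                                    uint64_t offset, void *value)
{
    ci->table[offset % SKETCH_CLUSTER_SLOTS] = value;
}
```

A NULL slot corresponds to an unused swap entry, an untagged pointer to a
folio, and a tagged value to a shadow, mirroring the three states listed in
the document.
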

On 09/06/25 at 03:13am, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
>
> From: Chris Li <chrisl@kernel.org>

'From author <authorkernel.org>' can only be one person, and the co-author
should be specified by "Co-developed-by:" and "Signed-off-by:"?

>
> Swap table is the new swap cache.
>
> Signed-off-by: Chris Li <chrisl@kernel.org>
> Signed-off-by: Kairui Song <kasong@tencent.com>

On Mon, Sep 8, 2025 at 5:36 AM Baoquan He <bhe@redhat.com> wrote:
>
> On 09/06/25 at 03:13am, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > From: Chris Li <chrisl@kernel.org>
>
> 'From author <authorkernel.org>' can only be one person, and the co-author
> should be specified by "Co-developed-by:" and "Signed-off-by:"?

That is the artifact of sending another person's patch in a series.
The first "From" is from the email header sender. The second "from" is
the real author of the patch. Just like an IP tunnel packet there is
another inner IP packet wrapped in the outer IP packet.

I think that is all normal and did not violate the kernel rules. When
I include Kairui's patch in my swap allocator series. The same thing
happened there on Kairui's patch. In the end the git will know enough
who is the real author, because those patches are outputted by git
anyway.

Chris

On 09/08/25 at 08:01am, Chris Li wrote:
> On Mon, Sep 8, 2025 at 5:36 AM Baoquan He <bhe@redhat.com> wrote:
> >
> > On 09/06/25 at 03:13am, Kairui Song wrote:
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > From: Chris Li <chrisl@kernel.org>
> >
> > 'From author <authorkernel.org>' can only be one person, and the co-author
> > should be specified by "Co-developed-by:" and "Signed-off-by:"?
>
> That is the artifact of sending another person's patch in a series.
> The first "From" is from the email header sender. The second "from" is
> the real author of the patch. Just like an IP tunnel packet there is
> another inner IP packet wrapped in the outer IP packet.
>
> I think that is all normal and did not violate the kernel rules. When
> I include Kairui's patch in my swap allocator series. The same thing
> happened there on Kairui's patch. In the end the git will know enough
> who is the real author, because those patches are outputted by git
> anyway.

Hmm, maybe git doesn't work like that. I applied this patch via git am,
I got this on my local branch. The 2nd 'From' become part of commit log.

commit 337b3cd6c0ffad355df8851414e8aa5be052f4cb (HEAD -> kasan-v3)
Author: Kairui Song <kasong@tencent.com>
Date:   Sat Sep 6 03:13:43 2025 +0800

    docs/mm: add document for swap table

    From: Chris Li <chrisl@kernel.org>

    Swap table is the new swap cache.

    Signed-off-by: Chris Li <chrisl@kernel.org>
    Signed-off-by: Kairui Song <kasong@tencent.com>

On Mon, Sep 8, 2025 at 8:09 AM Baoquan He <bhe@redhat.com> wrote:
> > I think that is all normal and did not violate the kernel rules. When
> > I include Kairui's patch in my swap allocator series. The same thing
> > happened there on Kairui's patch. In the end the git will know enough
> > who is the real author, because those patches are outputted by git
> > anyway.
>
> Hmm, maybe git doesn't work like that. I applied this patch via git am,
> I got this on my local branch. The 2nd 'From' become part of commit log.
>

In that case, Kairui needs to fix his sendmail config. Maybe as you
suggested, remove his own From: line in this case. I don't recall
needing such a special git-send-mail config. Maybe Kairui's smtp server
is different.

BTW, I definitely know that Google's smtp server does not work well
with "b4 send --reflect". Google SMTP is like: "Oh, I see you might
want to CC these people, because you include them in your inner
envelope CC list. let me do a CC for you on the outer envelope as
well". It defeats the purpose of "b4 send --reflect", which is a dry
run. I still recall the horror on my poor colleague's face when I
convinced him to try out "b4 send --reflect", which should be safe, but
the email actually sent out to the full list. I should file a bug for
that.

Chris

> commit 337b3cd6c0ffad355df8851414e8aa5be052f4cb (HEAD -> kasan-v3)
> Author: Kairui Song <kasong@tencent.com>
> Date:   Sat Sep 6 03:13:43 2025 +0800
>
>     docs/mm: add document for swap table
>
>     From: Chris Li <chrisl@kernel.org>
>
>     Swap table is the new swap cache.
>
>     Signed-off-by: Chris Li <chrisl@kernel.org>
>     Signed-off-by: Kairui Song <kasong@tencent.com>
>

On Mon, Sep 8, 2025 at 8:54 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 09/06/25 at 03:13am, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > From: Chris Li <chrisl@kernel.org>
>
> 'From author <authorkernel.org>' can only be one person, and the co-author
> should be specified by "Co-developed-by:" and "Signed-off-by:"?
>

Hmm, that's interesting, I'm using git send mail with below setup:

[sendemail]
    from = Kairui Song <ryncsn@gmail.com>
    confirm = auto
    smtpServer = smtp.gmail.com
    smtpServerPort = 587
    smtpEncryption = tls
    smtpUser = ryncsn@gmail.com

So it will add a "From:" automatically when I'm using gmail's SMTP but
the patch author doesn't match the sender. It seems git somehow got
confused by this commit, maybe I used some sending parameters wrongly.

The author of the doc really should be Chris.

On 09/08/25 at 10:27pm, Kairui Song wrote:
> On Mon, Sep 8, 2025 at 8:54 PM Baoquan He <bhe@redhat.com> wrote:
> >
> > On 09/06/25 at 03:13am, Kairui Song wrote:
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > From: Chris Li <chrisl@kernel.org>
> >
> > 'From author <authorkernel.org>' can only be one person, and the co-author
> > should be specified by "Co-developed-by:" and "Signed-off-by:"?
> >
>
> Hmm, that's interesting, I'm using git send mail with below setup:
>
> [sendemail]
>     from = Kairui Song <ryncsn@gmail.com>
>     confirm = auto
>     smtpServer = smtp.gmail.com
>     smtpServerPort = 587
>     smtpEncryption = tls
>     smtpUser = ryncsn@gmail.com
>
> So it will add a "From:" automatically when I'm using gmail's SMTP but
> the patch author doesn't match the sender. It seems git somehow got
> confused by this commit, maybe I used some sending parameters wrongly.

Then you may need to remove the 'from' field of your git [sendemail]
section. If I git am your patch, then your first 'from' will be the
patch author.

>
> The author of the doc really should be Chris.
>

On Fri, Sep 5, 2025 at 12:14 PM Kairui Song <ryncsn@gmail.com> wrote:
> +2. folio: A folio has been allocated and bound to this swap entry. This is
> +   the transient state of swap out or swap in. The folio data can be in
> +   the folio or swap file, or both.
> +
> +3. shadow: The shadow contains the working set information of the swap

I just noticed a typo here, should be "swapped out page"

> +   outed folio. This is the normal state for a swap outed page.

Same here. "swap outed page" -> "swapped out page"

Chris

On Sat, Sep 6, 2025 at 8:05 AM Chris Li <chrisl@kernel.org> wrote:
>
> On Fri, Sep 5, 2025 at 12:14 PM Kairui Song <ryncsn@gmail.com> wrote:
> > +3. shadow: The shadow contains the working set information of the swap
>
> I just noticed a typo here, should be "swapped out page"
>
> > +   outed folio. This is the normal state for a swap outed page.
>
> Same here. "swap outed page" -> "swapped out page"

Thanks, I used some grammar check tools and it seems they are not
perfect with kernel terminologies.

>
> Chris
>