Currently, folio_expected_ref_count() only adds references for the swap
cache if the folio is anonymous. However, according to the comment above
the definition of PG_swapcache in enum pageflags, shmem folios can also
have PG_swapcache set. This patch makes sure references for the swap
cache are added if folio_test_swapcache(folio) is true.
This issue was found when trying to hot-unplug memory in a QEMU/KVM
virtual machine. When hot-unplug is initiated while most of the guest
memory is allocated, it hangs partway through removal due to
migration failures. The following message would be printed several
times, and would be printed again about every five seconds:
[ 49.641309] migrating pfn b12f25 failed ret:7
[ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
[ 49.641311] aops:swap_aops
[ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
[ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
[ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
[ 49.641315] page dumped because: migration failure
When debugging this, I found that these migration failures were due to
__migrate_folio() returning -EAGAIN for a small set of folios because
the expected reference count it calculates via folio_expected_ref_count()
is one less than the actual reference count of the folios. Furthermore,
all of the affected folios were not anonymous, but had the PG_swapcache
flag set, inspiring this patch. After applying this patch, the memory
hot-unplug behaves as expected.
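To make the arithmetic concrete, below is a small userspace model of the
accounting for the dumped folio (order-0, not anonymous, in the swap cache,
with no folio->mapping set and mapcount 0, per the dump above). The struct
and helper names are made up for illustration and this is not kernel code;
it only mirrors the expected-count calculation before and after this patch,
plus the one reference the migration caller adds on top:

#include <stdbool.h>
#include <stdio.h>

struct folio_state {
	bool anon, swapcache, has_mapping, has_private;
	int order, mapcount;
};

/* Expected count as computed before this patch (simplified). */
static int expected_before(const struct folio_state *f)
{
	int refs = 0;

	if (f->anon)
		refs += (int)f->swapcache << f->order;
	else
		refs += ((int)f->has_mapping << f->order) + f->has_private;
	return refs + f->mapcount;
}

/* Expected count as computed after this patch (simplified). */
static int expected_after(const struct folio_state *f)
{
	int refs = (int)f->swapcache << f->order; /* swapcache counted unconditionally */

	if (!f->anon)
		refs += ((int)f->has_mapping << f->order) + f->has_private;
	return refs + f->mapcount;
}

int main(void)
{
	/* The dumped folio: only the swap cache holds a reference on it. */
	struct folio_state f = { .swapcache = true };
	int actual = 2;	/* 1 from the swap cache + 1 held by the migration caller */

	/* Migration compares the refcount against the expected count plus its own reference. */
	printf("before: expected %d, actual %d -> -EAGAIN\n", expected_before(&f) + 1, actual);
	printf("after:  expected %d, actual %d -> migration proceeds\n", expected_after(&f) + 1, actual);
	return 0;
}

Compiled with any C compiler, this prints an expected count of 1 vs. an
actual count of 2 before the patch, and a matching 2 vs. 2 after it.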
I tested this on a machine running Ubuntu 24.04 with kernel version
6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
mm-unstable branch as of Dec 16, 2025 was also tested and behaves the
same) and 48GB of memory. The libvirt XML definition for the VM can be
found at [1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in
the guest kernel so the hot-pluggable memory is automatically onlined.
Below are the steps to reproduce this behavior:
1) Define and start the virtual machine
host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
host$ virsh -c qemu:///system start test_vm
2) Setup swap in the guest
guest$ sudo fallocate -l 32G /swapfile
guest$ sudo chmod 0600 /swapfile
guest$ sudo mkswap /swapfile
guest$ sudo swapon /swapfile
3) Use alloc_data [2] to allocate most of the remaining guest memory
   (a stand-in sketch of such a tool follows these steps)
guest$ ./alloc_data 45
4) In a separate guest terminal, monitor the amount of used memory
guest$ watch -n1 free -h
5) When alloc_data has finished allocating, initiate the memory
hot-unplug using the provided xml file [3]
host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live
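For completeness, the allocation step only needs an allocate-and-touch
program that keeps the memory resident. The snippet below is a hypothetical
stand-in written for this description; the actual alloc_data.c at [2] may
differ in its details:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	/* Number of GiB to allocate, e.g. "./alloc_data 45". */
	size_t gib = (argc > 1) ? strtoull(argv[1], NULL, 0) : 1;
	size_t sz = gib << 30;
	char *buf = malloc(sz);

	if (!buf) {
		perror("malloc");
		return 1;
	}
	/* Touch every byte so the allocation is actually backed by memory. */
	memset(buf, 0x5a, sz);
	printf("allocated and touched %zu GiB; keeping it resident\n", gib);
	pause();	/* hold the memory until the process is interrupted */
	return 0;
}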
After initiating the memory hot-unplug, you should see the amount of
available memory in the guest decrease, and the amount of used swap data
increase. If everything works as expected, when all of the memory is
unplugged, there should be around 8.5-9GB of data in swap. If the
unplugging is unsuccessful, the amount of used swap data will settle
below that. If that happens, you should be able to see log messages in
dmesg similar to the one posted above.
[1] https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml
[2] https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c
[3] https://github.com/BijanT/linux_patch_files/blob/main/remove.xml
Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
Signed-off-by: Bijan Tabatabai <bijan311@gmail.com>
---
I am not very familiar with the memory hot-(un)plug or swapping code, so
I am not 100% certain if this patch actually solves the root of the
problem. I believe the issue is from shmem folios, in which case I believe
this patch is correct. However, I couldn't think of an easy way to confirm
that the affected folios were from shmem. It could be that the root cause
is some bug where folio_test_anon() returns false for some anonymous
pages. I don't think that's the case, but
figured the MM maintainers would have a better idea of what's going on.
---
include/linux/mm.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 15076261d0c2..6f959d8ca4b4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2459,10 +2459,10 @@ static inline int folio_expected_ref_count(const struct folio *folio)
if (WARN_ON_ONCE(page_has_type(&folio->page) && !folio_test_hugetlb(folio)))
return 0;
- if (folio_test_anon(folio)) {
- /* One reference per page from the swapcache. */
- ref_count += folio_test_swapcache(folio) << order;
- } else {
+ /* One reference per page from the swapcache. */
+ ref_count += folio_test_swapcache(folio) << order;
+
+ if (!folio_test_anon(folio)) {
/* One reference per page from the pagecache. */
ref_count += !!folio->mapping << order;
/* One reference from PG_private. */
--
2.43.0
On 12/16/25 21:07, Bijan Tabatabai wrote:
> Currently, folio_expected_ref_count() only adds references for the swap
> cache if the folio is anonymous. However, according to the comment above
> the definition of PG_swapcache in enum pageflags, shmem folios can also
> have PG_swapcache set. This patch makes sure references for the swap
> cache are added if folio_test_swapcache(folio) is true.
>
> This issue was found when trying to hot-unplug memory in a QEMU/KVM
> virtual machine. When initiating hot-unplug when most of the guest
> memory is allocated, hot-unplug hangs partway through removal due to
> migration failures. The following message would be printed several
> times, and would be printed again about every five seconds:
>
> [ 49.641309] migrating pfn b12f25 failed ret:7
> [ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
> [ 49.641311] aops:swap_aops
> [ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
> [ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
> [ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
> [ 49.641315] page dumped because: migration failure
>
> When debugging this, I found that these migration failures were due to
> __migrate_folio() returning -EAGAIN for a small set of folios because
> the expected reference count it calculates via folio_expected_ref_count()
> is one less than the actual reference count of the folios. Furthermore,
> all of the affected folios were not anonymous, but had the PG_swapcache
> flag set, inspiring this patch. After applying this patch, the memory
> hot-unplug behaves as expected.
>
> I tested this on a machine running Ubuntu 24.04 with kernel version
> 6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
> and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
> mm-unstable branch as of Dec 16, 2025 was also tested and behaves the
> same) and 48GB of memory. The libvirt XML definition for the VM can be
> found at [1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in
> the guest kernel so the hot-pluggable memory is automatically onlined.
>
> Below are the steps to reproduce this behavior:
>
> 1) Define and start the virtual machine
> host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
> host$ virsh -c qemu:///system start test_vm
>
> 2) Setup swap in the guest
> guest$ sudo fallocate -l 32G /swapfile
> guest$ sudo chmod 0600 /swapfile
> guest$ sudo mkswap /swapfile
> guest$ sudo swapon /swapfile
>
> 3) Use alloc_data [2] to allocate most of the remaining guest memory
> guest$ ./alloc_data 45
>
> 4) In a separate guest terminal, monitor the amount of used memory
> guest$ watch -n1 free -h
>
> 5) When alloc_data has finished allocating, initiate the memory
> hot-unplug using the provided xml file [3]
> host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live
>
> After initiating the memory hot-unplug, you should see the amount of
> available memory in the guest decrease, and the amount of used swap data
> increase. If everything works as expected, when all of the memory is
> unplugged, there should be around 8.5-9GB of data in swap. If the
> unplugging is unsuccessful, the amount of used swap data will settle
> below that. If that happens, you should be able to see log messages in
> dmesg similar to the one posted above.
>
> [1] https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml
> [2] https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c
> [3] https://github.com/BijanT/linux_patch_files/blob/main/remove.xml
>
> Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
> Signed-off-by: Bijan Tabatabai <bijan311@gmail.com>
> ---
>
> I am not very familiar with the memory hot-(un)plug or swapping code, so
> I am not 100% certain if this patch actually solves the root of the
> problem. I believe the issue is from shmem folios, in which case I believe
> this patch is correct. However, I couldn't think of an easy way to confirm
> that the affected folios were from shmem. I guess it could be possible that
> the root cause could be from some bug where some anonymous pages do not
> return true to folio_test_anon(). I don't think that's the case, but
> figured the MM maintainers would have a better idea of what's going on.
>
> ---
> include/linux/mm.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 15076261d0c2..6f959d8ca4b4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2459,10 +2459,10 @@ static inline int folio_expected_ref_count(const struct folio *folio)
> if (WARN_ON_ONCE(page_has_type(&folio->page) && !folio_test_hugetlb(folio)))
> return 0;
>
> - if (folio_test_anon(folio)) {
> - /* One reference per page from the swapcache. */
> - ref_count += folio_test_swapcache(folio) << order;
> - } else {
> + /* One reference per page from the swapcache. */
> + ref_count += folio_test_swapcache(folio) << order;
> +
> + if (!folio_test_anon(folio)) {
> /* One reference per page from the pagecache. */
> ref_count += !!folio->mapping << order;
> /* One reference from PG_private. */
We discussed that recently [1] and I think Zi wanted to send a patch. We
were a bit confused about the semantics of folio_test_swapcache(), but
concluded that it should be fine when called against pagecache folios.
So far I thought 86ebd50224c0 did not result in the issue because it
replaced
-static int folio_expected_refs(struct address_space *mapping,
- struct folio *folio)
-{
- int refs = 1;
- if (!mapping)
- return refs;
-
- refs += folio_nr_pages(folio);
- if (folio_test_private(folio))
- refs++;
-
- return refs;
-}
in migration code, where !mapping would have only returned 1 (the
reference held by the caller), which folio_expected_ref_count() now
expects to be added by the caller.
But looking again, in the caller, we obtain
mapping = folio_mapping(src)
Which returns the swap_address_space() for folios in the swapcache.
So it indeed looks like 86ebd50224c0 introduced the issue.
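Put as numbers (illustrative userspace model only, with made-up helper
names, not kernel source): for an order-0 swapcache folio with no extra
pins, the old helper -- fed folio_mapping(src), i.e. the swap address
space -- expected two references, while the unfixed
folio_expected_ref_count() + 1 only expects one:

#include <stdbool.h>
#include <stdio.h>

/* Model of the old folio_expected_refs() quoted above (simplified). */
static int old_expected_refs(bool has_mapping, int nr_pages, bool has_private)
{
	int refs = 1;		/* the reference held by the migration caller */

	if (!has_mapping)
		return refs;
	refs += nr_pages;
	if (has_private)
		refs++;
	return refs;
}

/* Model of folio_expected_ref_count() + 1 before the fix (simplified, PG_private ignored). */
static int new_expected_refs(bool anon, bool swapcache, bool mapping_set,
			     int nr_pages, int mapcount)
{
	int refs = anon ? (swapcache ? nr_pages : 0)
			: (mapping_set ? nr_pages : 0);

	return refs + mapcount + 1;	/* +1 added by the migration caller */
}

int main(void)
{
	/*
	 * Order-0 folio in the swap cache: folio_mapping() hands the old
	 * helper a non-NULL swap address space, but folio->mapping itself
	 * is NULL and the folio is not (yet) anonymous.
	 */
	printf("old helper expects:     %d\n", old_expected_refs(true, 1, false));
	printf("unfixed helper expects: %d\n", new_expected_refs(false, true, false, 1, 0));
	printf("actual refcount:        2\n");
	return 0;
}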
Thanks!
We should cc: stable
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
[1]
https://lore.kernel.org/all/33A929D1-7438-43C1-AA4A-398183976F8F@nvidia.com/
[2]
https://lore.kernel.org/all/66C159D8-D267-4B3B-9384-1CE94533990E@nvidia.com/
--
Cheers
David
On 16 Dec 2025, at 19:07, David Hildenbrand (Red Hat) wrote:
> On 12/16/25 21:07, Bijan Tabatabai wrote:
>> Currently, folio_expected_ref_count() only adds references for the swap
>> cache if the folio is anonymous. However, according to the comment above
>> the definition of PG_swapcache in enum pageflags, shmem folios can also
>> have PG_swapcache set. This patch makes sure references for the swap
>> cache are added if folio_test_swapcache(folio) is true.
>>
>> This issue was found when trying to hot-unplug memory in a QEMU/KVM
>> virtual machine. When initiating hot-unplug when most of the guest
>> memory is allocated, hot-unplug hangs partway through removal due to
>> migration failures. The following message would be printed several
>> times, and would be printed again about every five seconds:
>>
>> [ 49.641309] migrating pfn b12f25 failed ret:7
>> [ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
>> [ 49.641311] aops:swap_aops
>> [ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
>> [ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
>> [ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
>> [ 49.641315] page dumped because: migration failure
>>
>> When debugging this, I found that these migration failures were due to
>> __migrate_folio() returning -EAGAIN for a small set of folios because
>> the expected reference count it calculates via folio_expected_ref_count()
>> is one less than the actual reference count of the folios. Furthermore,
>> all of the affected folios were not anonymous, but had the PG_swapcache
>> flag set, inspiring this patch. After applying this patch, the memory
>> hot-unplug behaves as expected.
>>
>> I tested this on a machine running Ubuntu 24.04 with kernel version
>> 6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
>> and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
>> mm-unstable branch as of Dec 16, 2025 was also tested and behaves the
>> same) and 48GB of memory. The libvirt XML definition for the VM can be
>> found at [1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in
>> the guest kernel so the hot-pluggable memory is automatically onlined.
>>
>> Below are the steps to reproduce this behavior:
>>
>> 1) Define and start the virtual machine
>> host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
>> host$ virsh -c qemu:///system start test_vm
>>
>> 2) Setup swap in the guest
>> guest$ sudo fallocate -l 32G /swapfile
>> guest$ sudo chmod 0600 /swapfile
>> guest$ sudo mkswap /swapfile
>> guest$ sudo swapon /swapfile
>>
>> 3) Use alloc_data [2] to allocate most of the remaining guest memory
>> guest$ ./alloc_data 45
>>
>> 4) In a separate guest terminal, monitor the amount of used memory
>> guest$ watch -n1 free -h
>>
>> 5) When alloc_data has finished allocating, initiate the memory
>> hot-unplug using the provided xml file [3]
>> host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live
>>
>> After initiating the memory hot-unplug, you should see the amount of
>> available memory in the guest decrease, and the amount of used swap data
>> increase. If everything works as expected, when all of the memory is
>> unplugged, there should be around 8.5-9GB of data in swap. If the
>> unplugging is unsuccessful, the amount of used swap data will settle
>> below that. If that happens, you should be able to see log messages in
>> dmesg similar to the one posted above.
>>
>> [1] https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml
>> [2] https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c
>> [3] https://github.com/BijanT/linux_patch_files/blob/main/remove.xml
>>
>> Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
>> Signed-off-by: Bijan Tabatabai <bijan311@gmail.com>
>> ---
>>
>> I am not very familiar with the memory hot-(un)plug or swapping code, so
>> I am not 100% certain if this patch actually solves the root of the
>> problem. I believe the issue is from shmem folios, in which case I believe
>> this patch is correct. However, I couldn't think of an easy way to confirm
>> that the affected folios were from shmem. I guess it could be possible that
>> the root cause could be from some bug where some anonymous pages do not
>> return true to folio_test_anon(). I don't think that's the case, but
>> figured the MM maintainers would have a better idea of what's going on.
I am not sure if shmem in the swapcache causes the issue, since
the above setup does not involve shmem. +Baolin and Hugh for some insight.
But David also mentioned that in __read_swap_cache_async() there is a chance
that an anon folio in the swapcache can have the anon flag not set yet. +Chris
and Kairui for more analysis.
>>
>> ---
>> include/linux/mm.h | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 15076261d0c2..6f959d8ca4b4 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2459,10 +2459,10 @@ static inline int folio_expected_ref_count(const struct folio *folio)
>> if (WARN_ON_ONCE(page_has_type(&folio->page) && !folio_test_hugetlb(folio)))
>> return 0;
>> - if (folio_test_anon(folio)) {
>> - /* One reference per page from the swapcache. */
>> - ref_count += folio_test_swapcache(folio) << order;
>> - } else {
>> + /* One reference per page from the swapcache. */
>> + ref_count += folio_test_swapcache(folio) << order;
>> +
>> + if (!folio_test_anon(folio)) {
>> /* One reference per page from the pagecache. */
>> ref_count += !!folio->mapping << order;
>> /* One reference from PG_private. */
This change is almost the same as what I proposed in [1] during my discussion
with David.
>
> We discussed that recently [1] and I think Zi wanted to send a patch. We were a bit confused about the semantics of folio_test_swapcache(), but concluded that it should be fine when called against pagecache folios.
>
> So far I thought 86ebd50224c0 did not result in the issue because it replaced
>
> -static int folio_expected_refs(struct address_space *mapping,
> - struct folio *folio)
> -{
> - int refs = 1;
> - if (!mapping)
> - return refs;
> -
> - refs += folio_nr_pages(folio);
> - if (folio_test_private(folio))
> - refs++;
> -
> - return refs;
> -}
>
> in migration code, where !mapping would have only returned 1 (the reference held by the caller), which folio_expected_ref_count() now expects to be added by the caller.
>
>
> But looking again, in the caller, we obtain
>
> mapping = folio_mapping(src)
>
> Which returns the swap_address_space() for folios in the swapcache.
>
>
> So it indeed looks like 86ebd50224c0 introduced the issue.
>
> Thanks!
>
> We should cc: stable
>
>
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
>
>
> [1] https://lore.kernel.org/all/33A929D1-7438-43C1-AA4A-398183976F8F@nvidia.com/
> [2] https://lore.kernel.org/all/66C159D8-D267-4B3B-9384-1CE94533990E@nvidia.com/
>
I agree with David. Acked-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
On Wed, Dec 17, 2025 at 8:34 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 16 Dec 2025, at 19:07, David Hildenbrand (Red Hat) wrote:
>
> > On 12/16/25 21:07, Bijan Tabatabai wrote:
> >> Currently, folio_expected_ref_count() only adds references for the swap
> >> cache if the folio is anonymous. However, according to the comment above
> >> the definition of PG_swapcache in enum pageflags, shmem folios can also
> >> have PG_swapcache set. This patch makes sure references for the swap
> >> cache are added if folio_test_swapcache(folio) is true.
> >>
> >> This issue was found when trying to hot-unplug memory in a QEMU/KVM
> >> virtual machine. When initiating hot-unplug when most of the guest
> >> memory is allocated, hot-unplug hangs partway through removal due to
> >> migration failures. The following message would be printed several
> >> times, and would be printed again about every five seconds:
> >>
> >> [ 49.641309] migrating pfn b12f25 failed ret:7
> >> [ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
> >> [ 49.641311] aops:swap_aops
> >> [ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
> >> [ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
> >> [ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
> >> [ 49.641315] page dumped because: migration failure
> >>
> >> When debugging this, I found that these migration failures were due to
> >> __migrate_folio() returning -EAGAIN for a small set of folios because
> >> the expected reference count it calculates via folio_expected_ref_count()
> >> is one less than the actual reference count of the folios. Furthermore,
> >> all of the affected folios were not anonymous, but had the PG_swapcache
> >> flag set, inspiring this patch. After applying this patch, the memory
> >> hot-unplug behaves as expected.
> >>
> >> I tested this on a machine running Ubuntu 24.04 with kernel version
> >> 6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
> >> and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
> >> mm-unstable branch as of Dec 16, 2025 was also tested and behaves the
> >> same) and 48GB of memory. The libvirt XML definition for the VM can be
> >> found at [1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in
> >> the guest kernel so the hot-pluggable memory is automatically onlined.
> >>
> >> Below are the steps to reproduce this behavior:
> >>
> >> 1) Define and start the virtual machine
> >> host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
> >> host$ virsh -c qemu:///system start test_vm
> >>
> >> 2) Setup swap in the guest
> >> guest$ sudo fallocate -l 32G /swapfile
> >> guest$ sudo chmod 0600 /swapfile
> >> guest$ sudo mkswap /swapfile
> >> guest$ sudo swapon /swapfile
> >>
> >> 3) Use alloc_data [2] to allocate most of the remaining guest memory
> >> guest$ ./alloc_data 45
> >>
> >> 4) In a separate guest terminal, monitor the amount of used memory
> >> guest$ watch -n1 free -h
> >>
> >> 5) When alloc_data has finished allocating, initiate the memory
> >> hot-unplug using the provided xml file [3]
> >> host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live
> >>
> >> After initiating the memory hot-unplug, you should see the amount of
> >> available memory in the guest decrease, and the amount of used swap data
> >> increase. If everything works as expected, when all of the memory is
> >> unplugged, there should be around 8.5-9GB of data in swap. If the
> >> unplugging is unsuccessful, the amount of used swap data will settle
> >> below that. If that happens, you should be able to see log messages in
> >> dmesg similar to the one posted above.
> >>
> >> [1] https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml
> >> [2] https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c
> >> [3] https://github.com/BijanT/linux_patch_files/blob/main/remove.xml
> >>
> >> Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
> >> Signed-off-by: Bijan Tabatabai <bijan311@gmail.com>
> >> ---
> >>
> >> I am not very familiar with the memory hot-(un)plug or swapping code, so
> >> I am not 100% certain if this patch actually solves the root of the
> >> problem. I believe the issue is from shmem folios, in which case I believe
> >> this patch is correct. However, I couldn't think of an easy way to confirm
> >> that the affected folios were from shmem. I guess it could be possible that
> >> the root cause could be from some bug where some anonymous pages do not
> >> return true to folio_test_anon(). I don't think that's the case, but
> >> figured the MM maintainers would have a better idea of what's going on.
>
> I am not sure about if shmem in swapcache causes the issue, since
> the above setup does not involve shmem. +Baolin and Hugh for some insight.
>
> But David also mentioned that in __read_swap_cache_async() there is a chance
> that anon folio in swapcache can have anon flag not set yet. +Chris and Kairui
> for more analysis.
Yeah, that's possible. A typical case is swap readahead, which will allocate
folios and add them to the swap cache, but won't add them to the anon/shmem
mapping. Anon/shmem will use the folio in the swapcache upon page fault, and
only make it an anon/shmem folio by then.
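As a toy illustration of that window (made-up names, plain userspace C, not
the actual swap code):

#include <stdbool.h>
#include <stdio.h>

struct toy_folio {
	bool swapcache;	/* PG_swapcache */
	bool anon;	/* what folio_test_anon() would report */
};

int main(void)
{
	struct toy_folio f = { 0 };

	/* Readahead: the folio is allocated and added to the swap cache only. */
	f.swapcache = true;
	printf("after readahead: swapcache=%d anon=%d (swapcache ref was missed here)\n",
	       f.swapcache, f.anon);

	/* A later page fault makes it an anon (or shmem-mapped) folio. */
	f.anon = true;
	printf("after fault:     swapcache=%d anon=%d\n", f.swapcache, f.anon);
	return 0;
}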
This change looks good to me too, thanks for Ccing me.
>>>
>>> I am not very familiar with the memory hot-(un)plug or swapping code, so
>>> I am not 100% certain if this patch actually solves the root of the
>>> problem. I believe the issue is from shmem folios, in which case I believe
>>> this patch is correct. However, I couldn't think of an easy way to confirm
>>> that the affected folios were from shmem. I guess it could be possible that
>>> the root cause could be from some bug where some anonymous pages do not
>>> return true to folio_test_anon(). I don't think that's the case, but
>>> figured the MM maintainers would have a better idea of what's going on.
>
> I am not sure about if shmem in swapcache causes the issue, since
> the above setup does not involve shmem. +Baolin and Hugh for some insight.
We might just push out another unrelated shmem page to swap as we create
memory pressure in the system I think.
>
> But David also mentioned that in __read_swap_cache_async() there is a chance
> that anon folio in swapcache can have anon flag not set yet. +Chris and Kairui
> for more analysis.
Right, when we swapin an anon folio and did not map it into the page
table yet. Likely we can trigger something similar when we proactively
read a shmem page from swap into the swapcache.
So it's unclear "where" a swapcache page belongs to until we move it to
its owner (anon / shmem), which is also why I cannot judge easily from
[ 49.641309] migrating pfn b12f25 failed ret:7
[ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
[ 49.641311] aops:swap_aops
[ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
[ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
[ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
[ 49.641315] page dumped because: migration failure
What exactly that was.
It was certainly an order-0 folio.
[...]
>
> I agree with David. Acked-by: Zi Yan <ziy@nvidia.com>
Thanks for the fast review :)
--
Cheers
David
On 2025/12/17 09:04, David Hildenbrand (Red Hat) wrote:
>>>>
>>>> I am not very familiar with the memory hot-(un)plug or swapping code, so
>>>> I am not 100% certain if this patch actually solves the root of the
>>>> problem. I believe the issue is from shmem folios, in which case I believe
>>>> this patch is correct. However, I couldn't think of an easy way to confirm
>>>> that the affected folios were from shmem. I guess it could be possible that
>>>> the root cause could be from some bug where some anonymous pages do not
>>>> return true to folio_test_anon(). I don't think that's the case, but
>>>> figured the MM maintainers would have a better idea of what's going on.
>>
>> I am not sure about if shmem in swapcache causes the issue, since
>> the above setup does not involve shmem. +Baolin and Hugh for some insight.
>
> We might just push out another unrelated shmem page to swap as we create
> memory pressure in the system I think.
>
>>
>> But David also mentioned that in __read_swap_cache_async() there is a chance
>> that anon folio in swapcache can have anon flag not set yet. +Chris and Kairui
>> for more analysis.
>
> Right, when we swapin an anon folio and did not map it into the page
> table yet. Likely we can trigger something similar when we proactively
> read a shmem page from swap into the swapcache.
>
> So it's unclear "where" a swapcache page belongs to until we move it to
> its owner (anon / shmem), which is also why I cannot judge easily from
>
> [ 49.641309] migrating pfn b12f25 failed ret:7
> [ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
> [ 49.641311] aops:swap_aops
> [ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
> [ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
> [ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
> [ 49.641315] page dumped because: migration failure
>
> What exactly that was.
>
> It was certainly an order-0 folio.
Thanks David for the explanation. It completely makes sense to me.
So feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>