The issue is reported when removing memory through a virtio_mem device.
A transparent huge page that experienced a copy-on-write fault is
wrongly regarded as pinned, so it escapes isolation in
isolate_migratepages_block(). As a result, the transparent huge page
can't be migrated and the corresponding memory block can't be put
into offline state.

Fix it by replacing page_mapcount() with total_mapcount(). With this,
the transparent huge page can be isolated and migrated, and the memory
block can be put into offline state. Besides, the page's refcount is
increased a bit earlier to avoid the page being released while the
check is executed.
Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
v2: Corrected the fixes tag and increased the page's refcount before the check
---
mm/compaction.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index c51f7f545afe..1f6da31dd9a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
goto isolate_fail;
}
+ /*
+ * Be careful not to clear PageLRU until after we're
+ * sure the page is not being freed elsewhere -- the
+ * page release code relies on it.
+ */
+ if (unlikely(!get_page_unless_zero(page)))
+ goto isolate_fail;
+
/*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
* admittedly racy check.
*/
mapping = page_mapping(page);
- if (!mapping && page_count(page) > page_mapcount(page))
- goto isolate_fail;
+ if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+ goto isolate_fail_put;
/*
* Only allow to migrate anonymous pages in GFP_NOFS context
* because those do not depend on fs locks.
*/
if (!(cc->gfp_mask & __GFP_FS) && mapping)
- goto isolate_fail;
-
- /*
- * Be careful not to clear PageLRU until after we're
- * sure the page is not being freed elsewhere -- the
- * page release code relies on it.
- */
- if (unlikely(!get_page_unless_zero(page)))
- goto isolate_fail;
+ goto isolate_fail_put;
/* Only take pages on LRU: a check now makes later tests safe */
if (!PageLRU(page))
--
2.23.0
On 24.11.22 10:55, Gavin Shan wrote:
> The issue is reported when removing memory through virtio_mem device.
> The transparent huge page, experienced copy-on-write fault, is wrongly
> regarded as pinned. The transparent huge page is escaped from being
> isolated in isolate_migratepages_block(). The transparent huge page
> can't be migrated and the corresponding memory block can't be put
> into offline state.
>
> Fix it by replacing page_mapcount() with total_mapcount(). With this,
> the transparent huge page can be isolated and migrated, and the memory
> block can be put into offline state. Besides, The page's refcount is
> increased a bit earlier to avoid the page is released when the check
> is executed.
Did you look into handling the case of pages in the swapcache as well?
See is_refcount_suitable() in mm/khugepaged.c.
Should be easy to reproduce, let me know if you need inspiration.
>
> Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
> Cc: stable@vger.kernel.org # v5.7+
> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
> v2: Corrected fix tag and increase page's refcount before the check
> ---
> mm/compaction.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c51f7f545afe..1f6da31dd9a5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> goto isolate_fail;
> }
>
> + /*
> + * Be careful not to clear PageLRU until after we're
> + * sure the page is not being freed elsewhere -- the
> + * page release code relies on it.
> + */
> + if (unlikely(!get_page_unless_zero(page)))
> + goto isolate_fail;
> +
> /*
> * Migration will fail if an anonymous page is pinned in memory,
> * so avoid taking lru_lock and isolating it unnecessarily in an
> * admittedly racy check.
> */
> mapping = page_mapping(page);
> - if (!mapping && page_count(page) > page_mapcount(page))
> - goto isolate_fail;
> + if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> + goto isolate_fail_put;
>
> /*
> * Only allow to migrate anonymous pages in GFP_NOFS context
> * because those do not depend on fs locks.
> */
> if (!(cc->gfp_mask & __GFP_FS) && mapping)
> - goto isolate_fail;
> -
> - /*
> - * Be careful not to clear PageLRU until after we're
> - * sure the page is not being freed elsewhere -- the
> - * page release code relies on it.
> - */
> - if (unlikely(!get_page_unless_zero(page)))
> - goto isolate_fail;
> + goto isolate_fail_put;
>
> /* Only take pages on LRU: a check now makes later tests safe */
> if (!PageLRU(page))
--
Thanks,
David / dhildenb
On 11/24/22 6:09 PM, David Hildenbrand wrote:
> On 24.11.22 10:55, Gavin Shan wrote:
>> The issue is reported when removing memory through virtio_mem device.
>> The transparent huge page, experienced copy-on-write fault, is wrongly
>> regarded as pinned. The transparent huge page is escaped from being
>> isolated in isolate_migratepages_block(). The transparent huge page
>> can't be migrated and the corresponding memory block can't be put
>> into offline state.
>>
>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>> the transparent huge page can be isolated and migrated, and the memory
>> block can be put into offline state. Besides, The page's refcount is
>> increased a bit earlier to avoid the page is released when the check
>> is executed.
>
> Did you look into handling pages that are in the swapcache case as well?
>
> See is_refcount_suitable() in mm/khugepaged.c.
>
> Should be easy to reproduce, let me know if you need inspiration.
>
Nope, I didn't look into that case. Please elaborate on the details so
that I can reproduce it first.
>>
>> Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
>> Cc: stable@vger.kernel.org # v5.7+
>> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>> v2: Corrected fix tag and increase page's refcount before the check
>> ---
>> mm/compaction.c | 22 +++++++++++-----------
>> 1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index c51f7f545afe..1f6da31dd9a5 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> goto isolate_fail;
>> }
>> + /*
>> + * Be careful not to clear PageLRU until after we're
>> + * sure the page is not being freed elsewhere -- the
>> + * page release code relies on it.
>> + */
>> + if (unlikely(!get_page_unless_zero(page)))
>> + goto isolate_fail;
>> +
>> /*
>> * Migration will fail if an anonymous page is pinned in memory,
>> * so avoid taking lru_lock and isolating it unnecessarily in an
>> * admittedly racy check.
>> */
>> mapping = page_mapping(page);
>> - if (!mapping && page_count(page) > page_mapcount(page))
>> - goto isolate_fail;
>> + if (!mapping && (page_count(page) - 1) > total_mapcount(page))
>> + goto isolate_fail_put;
>> /*
>> * Only allow to migrate anonymous pages in GFP_NOFS context
>> * because those do not depend on fs locks.
>> */
>> if (!(cc->gfp_mask & __GFP_FS) && mapping)
>> - goto isolate_fail;
>> -
>> - /*
>> - * Be careful not to clear PageLRU until after we're
>> - * sure the page is not being freed elsewhere -- the
>> - * page release code relies on it.
>> - */
>> - if (unlikely(!get_page_unless_zero(page)))
>> - goto isolate_fail;
>> + goto isolate_fail_put;
>> /* Only take pages on LRU: a check now makes later tests safe */
>> if (!PageLRU(page))
>
Thanks,
Gavin
On 24.11.22 11:21, Gavin Shan wrote:
> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>> On 24.11.22 10:55, Gavin Shan wrote:
>>> The issue is reported when removing memory through virtio_mem device.
>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>> regarded as pinned. The transparent huge page is escaped from being
>>> isolated in isolate_migratepages_block(). The transparent huge page
>>> can't be migrated and the corresponding memory block can't be put
>>> into offline state.
>>>
>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>> the transparent huge page can be isolated and migrated, and the memory
>>> block can be put into offline state. Besides, The page's refcount is
>>> increased a bit earlier to avoid the page is released when the check
>>> is executed.
>>
>> Did you look into handling pages that are in the swapcache case as well?
>>
>> See is_refcount_suitable() in mm/khugepaged.c.
>>
>> Should be easy to reproduce, let me know if you need inspiration.
>>
>
> Nope, I didn't look into the case. Please elaborate the details so that
> I can reproduce it firstly.

A simple reproducer would be (on a system with ordinary swap (not zram))

1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP

2) Enable THP for that region (MADV_HUGEPAGE)

3) Populate a THP (e.g., write access)

4) PTE-map the THP, for example, using MADV_FREE on the last subpage

5) Trigger swapout of the THP, for example, using MADV_PAGEOUT

6) Read-access to some subpages to fault them in from the swapcache


Now you'd have a THP, which

1) Is partially PTE-mapped into the page table
2) Is in the swapcache (each subpage should have one reference from the swapcache)


Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).

--
Thanks,

David / dhildenb
On 11/24/22 6:43 PM, David Hildenbrand wrote:
> On 24.11.22 11:21, Gavin Shan wrote:
>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>> The issue is reported when removing memory through virtio_mem device.
>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>> regarded as pinned. The transparent huge page is escaped from being
>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>> can't be migrated and the corresponding memory block can't be put
>>>> into offline state.
>>>>
>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>> the transparent huge page can be isolated and migrated, and the memory
>>>> block can be put into offline state. Besides, The page's refcount is
>>>> increased a bit earlier to avoid the page is released when the check
>>>> is executed.
>>>
>>> Did you look into handling pages that are in the swapcache case as well?
>>>
>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>
>>> Should be easy to reproduce, let me know if you need inspiration.
>>>
>>
>> Nope, I didn't look into the case. Please elaborate the details so that
>> I can reproduce it firstly.
>
>
> A simple reproducer would be (on a system with ordinary swap (not zram))
>
> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>
> 2) Enable THP for that region (MADV_HUGEPAGE)
>
> 3) Populate a THP (e.g., write access)
>
> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>
> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>
> 6) Read-access to some subpages to fault them in from the swapcache
>
>
> Now you'd have a THP, which
>
> 1) Is partially PTE-mapped into the page table
> 2) Is in the swapcache (each subpage should have one reference from the swapcache)
>
>
> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>
Thanks for the details. Steps (4) and (5) can actually be combined: swapping
out part of the THP (e.g. one sub-page) forces the THP to be split.
I followed your steps in the attached program, and there is no issue doing
memory hot-remove through virtio-mem, with or without this patch.
# numactl -p 1 testsuite mm swap -k
Any key to split THP
Any key to swap sub-pages
Any key to read the swapped sub-pages
Page[000]: 0xffffffffffffffff
Page[001]: 0xffffffffffffffff
:
Page[255]: 0xffffffffffffffff
Any key to exit // hold here and the program doesn't exit
(qemu) qom-set vm1 requested-size 0
[ 356.005396] virtio_mem virtio1: plugged size: 0x40000000
[ 356.005996] virtio_mem virtio1: requested size: 0x0
[ 356.350299] Fallback order for Node 0: 0 1
[ 356.350810] Fallback order for Node 1: 1 0
[ 356.351260] Built 2 zonelists, mobility grouping on. Total pages: 491343
[ 356.351998] Policy zone: DMA
Thanks,
Gavin
On 24.11.22 13:55, Gavin Shan wrote:
> On 11/24/22 6:43 PM, David Hildenbrand wrote:
>> On 24.11.22 11:21, Gavin Shan wrote:
>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>> can't be migrated and the corresponding memory block can't be put
>>>>> into offline state.
>>>>>
>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>> increased a bit earlier to avoid the page is released when the check
>>>>> is executed.
>>>>
>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>
>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>
>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>
>>>
>>> Nope, I didn't look into the case. Please elaborate the details so that
>>> I can reproduce it firstly.
>>
>>
>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>
>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>
>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>
>> 3) Populate a THP (e.g., write access)
>>
>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>
>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>>
>> 6) Read-access to some subpages to fault them in from the swapcache
>>
>>
>> Now you'd have a THP, which
>>
>> 1) Is partially PTE-mapped into the page table
>> 2) Is in the swapcache (each subpage should have one reference from the swapcache)
>>
>>
>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>>
>
> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
> the THP (e.g. one sub-page) will force the THP to be split.
>
> I followed your steps in the attached program, there is no issue to do memory hot-remove
> through virtio-mem with or without this patch.

Interesting. But I don't really see how we could pass this check with a
page that's in the swapcache, maybe I'm missing something else.

I'll try to see if I can reproduce it.

--
Thanks,

David / dhildenb
On 24.11.22 14:22, David Hildenbrand wrote:
> On 24.11.22 13:55, Gavin Shan wrote:
>> On 11/24/22 6:43 PM, David Hildenbrand wrote:
>>> On 24.11.22 11:21, Gavin Shan wrote:
>>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>>> can't be migrated and the corresponding memory block can't be put
>>>>>> into offline state.
>>>>>>
>>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>>> increased a bit earlier to avoid the page is released when the check
>>>>>> is executed.
>>>>>
>>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>>
>>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>>
>>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>>
>>>>
>>>> Nope, I didn't look into the case. Please elaborate the details so that
>>>> I can reproduce it firstly.
>>>
>>>
>>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>>
>>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>>
>>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>>
>>> 3) Populate a THP (e.g., write access)
>>>
>>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>>
>>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>>>
>>> 6) Read-access to some subpages to fault them in from the swapcache
>>>
>>>
>>> Now you'd have a THP, which
>>>
>>> 1) Is partially PTE-mapped into the page table
>>> 2) Is in the swapcache (each subpage should have one reference from the swapcache)
>>>
>>>
>>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>>>
>>
>> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
>> the THP (e.g. one sub-page) will force the THP to be split.
>>
>> I followed your steps in the attached program, there is no issue to do memory hot-remove
>> through virtio-mem with or without this patch.
>
> Interesting. But I don't really see how we could pass this check with a
> page that's in the swapcache, maybe I'm missing something else.
>
> I'll try to see if I can reproduce it.
>

After some unsuccessful attempts and many head-scratches, I realized
that it's quite simple why we don't have to worry about swapcache pages
here:

page_mapping() is != NULL for pages in the swapcache: folio_mapping()
makes this rather obvious:

	if (unlikely(folio_test_swapcache(folio)))
		return swap_address_space(folio_swap_entry(folio));


I think the get_page_unless_zero() might also be a fix for the
page_mapping() call, smells like something could blow up on concurrent
page freeing. (what about concurrent removal from the swapcache? nobody
knows :) )


Thanks Gavin!

Acked-by: David Hildenbrand <david@redhat.com>

--
Thanks,

David / dhildenb
With the patch applied, I'm unable to hit memory hot-remove failure in
the environment where the issue was initially found.

Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>

On Thu, Nov 24, 2022 at 10:09 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.11.22 14:22, David Hildenbrand wrote:
> > On 24.11.22 13:55, Gavin Shan wrote:
> >> On 11/24/22 6:43 PM, David Hildenbrand wrote:
> >>> On 24.11.22 11:21, Gavin Shan wrote:
> >>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
> >>>>> On 24.11.22 10:55, Gavin Shan wrote:
> >>>>>> The issue is reported when removing memory through virtio_mem device.
> >>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
> >>>>>> regarded as pinned. The transparent huge page is escaped from being
> >>>>>> isolated in isolate_migratepages_block(). The transparent huge page
> >>>>>> can't be migrated and the corresponding memory block can't be put
> >>>>>> into offline state.
> >>>>>>
> >>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
> >>>>>> the transparent huge page can be isolated and migrated, and the memory
> >>>>>> block can be put into offline state. Besides, The page's refcount is
> >>>>>> increased a bit earlier to avoid the page is released when the check
> >>>>>> is executed.
> >>>>>
> >>>>> Did you look into handling pages that are in the swapcache case as well?
> >>>>>
> >>>>> See is_refcount_suitable() in mm/khugepaged.c.
> >>>>>
> >>>>> Should be easy to reproduce, let me know if you need inspiration.
> >>>>>
> >>>>
> >>>> Nope, I didn't look into the case. Please elaborate the details so that
> >>>> I can reproduce it firstly.
> >>>
> >>>
> >>> A simple reproducer would be (on a system with ordinary swap (not zram))
> >>>
> >>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
> >>>
> >>> 2) Enable THP for that region (MADV_HUGEPAGE)
> >>>
> >>> 3) Populate a THP (e.g., write access)
> >>>
> >>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
> >>>
> >>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
> >>>
> >>> 6) Read-access to some subpages to fault them in from the swapcache
> >>>
> >>>
> >>> Now you'd have a THP, which
> >>>
> >>> 1) Is partially PTE-mapped into the page table
> >>> 2) Is in the swapcache (each subpage should have one reference from the swapcache)
> >>>
> >>>
> >>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
> >>>
> >>
> >> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
> >> the THP (e.g. one sub-page) will force the THP to be split.
> >>
> >> I followed your steps in the attached program, there is no issue to do memory hot-remove
> >> through virtio-mem with or without this patch.
> >
> > Interesting. But I don't really see how we could pass this check with a
> > page that's in the swapcache, maybe I'm missing something else.
> >
> > I'll try to see if I can reproduce it.
> >
>
> After some unsuccessful attempts and many head-scratches, I realized
> that it's quite simple why we don't have to worry about swapcache pages
> here:
>
> page_mapping() is != NULL for pages in the swapcache: folio_mapping()
> makes this rather obvious:
>
>         if (unlikely(folio_test_swapcache(folio)))
>                 return swap_address_space(folio_swap_entry(folio));
>
>
> I think the get_page_unless_zero() might also be a fix for the
> page_mapping() call, smells like something could blow up on concurrent
> page freeing. (what about concurrent removal from the swapcache? nobody
> knows :) )
>
>
> Thanks Gavin!
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
> --
> Thanks,
>
> David / dhildenb
>
On 24 Nov 2022, at 5:43, David Hildenbrand wrote:

> On 24.11.22 11:21, Gavin Shan wrote:
>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>> The issue is reported when removing memory through virtio_mem device.
>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>> regarded as pinned. The transparent huge page is escaped from being
>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>> can't be migrated and the corresponding memory block can't be put
>>>> into offline state.
>>>>
>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>> the transparent huge page can be isolated and migrated, and the memory
>>>> block can be put into offline state. Besides, The page's refcount is
>>>> increased a bit earlier to avoid the page is released when the check
>>>> is executed.
>>>
>>> Did you look into handling pages that are in the swapcache case as well?
>>>
>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>
>>> Should be easy to reproduce, let me know if you need inspiration.
>>>
>>
>> Nope, I didn't look into the case. Please elaborate the details so that
>> I can reproduce it firstly.
>
>
> A simple reproducer would be (on a system with ordinary swap (not zram))
>
> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>
> 2) Enable THP for that region (MADV_HUGEPAGE)
>
> 3) Populate a THP (e.g., write access)
>
> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>
> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT

Added the original THP swapout code author, Ying.

At this step, the THP will be split, right?

https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786

Even if a THP has PMD mapping, IIRC, it is split in the add_to_swap()
then swapped out. But I cannot find that split code now.

>
> 6) Read-access to some subpages to fault them in from the swapcache
>
>
> Now you'd have a THP, which
>
> 1) Is partially PTE-mapped into the page table
> 2) Is in the swapcache (each subpage should have one reference from the swapcache)
>
>
> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>
> --
> Thanks,
>
> David / dhildenb

--
Best Regards,
Yan Zi
On 24.11.22 13:38, Zi Yan wrote:
>
> On 24 Nov 2022, at 5:43, David Hildenbrand wrote:
>
>> On 24.11.22 11:21, Gavin Shan wrote:
>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>> can't be migrated and the corresponding memory block can't be put
>>>>> into offline state.
>>>>>
>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>> increased a bit earlier to avoid the page is released when the check
>>>>> is executed.
>>>>
>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>
>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>
>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>
>>>
>>> Nope, I didn't look into the case. Please elaborate the details so that
>>> I can reproduce it firstly.
>>
>>
>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>
>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>
>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>
>> 3) Populate a THP (e.g., write access)
>>
>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>
>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>
> Added the original THP swapout code author, Ying.
>
> At this step, the THP will be split, right?
>
> https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786
>
> Even if a THP has PMD mapping, IIRC, it is split in the add_to_swap()
> then swapped out. But I cannot find that split code now.
I recall there was some sequence to achieve it. Maybe it was
swapping out the PMD first and not triggering a PTE-mapping first.
mm/vmscan.c:shrink_folio_list()
if (folio_test_large(folio)) {
/* cannot split folio, skip it */
if (!can_split_folio(folio, NULL))
goto activate_locked;
/*
* Split folios without a PMD map right
* away. Chances are some or all of the
* tail pages can be freed without IO.
*/
if (!folio_entire_mapcount(folio) &&
split_folio_to_list(folio, folio_list))
goto activate_locked;
}
}
So the sequence might have to be
1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
2) Enable THP for that region (MADV_HUGEPAGE)
3) Populate a THP (e.g., write access)
4) Trigger swapout of the THP, for example, using MADV_PAGEOUT
5) Access some subpage
As we don't have PMD swap entries, we will PTE-map the
THP during try_to_unmap() IIRC.
Independent of that, the check we have here also doesn't consider
ordinary order-0 pages that might be in the swapcache.
--
Thanks,
David / dhildenb