[PATCH v5 4/6] mm: Introduce a pageflag for partially mapped folios

Posted by Usama Arif 1 year, 3 months ago
Currently folio->_deferred_list is used to keep track of
partially_mapped folios that are going to be split under memory
pressure. In the next patch, all THPs that are faulted in and collapsed
by khugepaged are also going to be tracked using _deferred_list.

This patch introduces a pageflag so that partially mapped folios can be
distinguished from the other folios on the deferred_list at split time
in deferred_split_scan. It's needed because __folio_remove_rmap
decrements _mapcount, _large_mapcount and _entire_mapcount, so the
mapcounts alone won't be enough to make that distinction when
deferred_split_scan runs.

Even though this introduces an extra flag to track whether the folio is
partially mapped, there is no functional change intended with this
patch, and the flag is not useful in this patch itself; it becomes
useful in the next patch, when _deferred_list also holds non partially
mapped folios.
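
For illustration, this ends up with the following calling convention
(the second call is only a sketch of what the next patch is expected to
do; it is not part of this patch):

	/* __folio_remove_rmap(): the folio has just become partially mapped */
	deferred_split_folio(folio, true);

	/* fault/collapse path (next patch): queue without setting the flag */
	deferred_split_folio(folio, false);

At shrink time, deferred_split_scan() can then use
folio_test_partially_mapped() to tell the two kinds of queued folios
apart, since the mapcounts can no longer be relied on at that point.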

Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
 include/linux/huge_mm.h    |  4 ++--
 include/linux/page-flags.h | 13 +++++++++++-
 mm/huge_memory.c           | 41 ++++++++++++++++++++++++++++----------
 mm/memcontrol.c            |  3 ++-
 mm/migrate.c               |  3 ++-
 mm/page_alloc.c            |  5 +++--
 mm/rmap.c                  |  5 +++--
 mm/vmscan.c                |  3 ++-
 8 files changed, 56 insertions(+), 21 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4da102b74a8c..0b0539f4ee1a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -333,7 +333,7 @@ static inline int split_huge_page(struct page *page)
 {
 	return split_huge_page_to_list_to_order(page, NULL, 0);
 }
-void deferred_split_folio(struct folio *folio);
+void deferred_split_folio(struct folio *folio, bool partially_mapped);
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze, struct folio *folio);
@@ -502,7 +502,7 @@ static inline int split_huge_page(struct page *page)
 {
 	return 0;
 }
-static inline void deferred_split_folio(struct folio *folio) {}
+static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 2175ebceb41c..1b3a76710487 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -186,6 +186,7 @@ enum pageflags {
 	/* At least one page in this folio has the hwpoison flag set */
 	PG_has_hwpoisoned = PG_active,
 	PG_large_rmappable = PG_workingset, /* anon or file-backed */
+	PG_partially_mapped = PG_reclaim, /* was identified to be partially mapped */
 };
 
 #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
@@ -859,8 +860,18 @@ static inline void ClearPageCompound(struct page *page)
 	ClearPageHead(page);
 }
 FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
+FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
+/*
+ * PG_partially_mapped is protected by deferred_split split_queue_lock,
+ * so its safe to use non-atomic set/clear.
+ */
+__FOLIO_SET_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
+__FOLIO_CLEAR_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
 #else
 FOLIO_FLAG_FALSE(large_rmappable)
+FOLIO_TEST_FLAG_FALSE(partially_mapped)
+__FOLIO_SET_FLAG_NOOP(partially_mapped)
+__FOLIO_CLEAR_FLAG_NOOP(partially_mapped)
 #endif
 
 #define PG_head_mask ((1UL << PG_head))
@@ -1171,7 +1182,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
  */
 #define PAGE_FLAGS_SECOND						\
 	(0xffUL /* order */		| 1UL << PG_has_hwpoisoned |	\
-	 1UL << PG_large_rmappable)
+	 1UL << PG_large_rmappable	| 1UL << PG_partially_mapped)
 
 #define PAGE_FLAGS_PRIVATE				\
 	(1UL << PG_private | 1UL << PG_private_2)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index af60684e7c70..166f8810f3c6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3503,7 +3503,11 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		if (folio_order(folio) > 1 &&
 		    !list_empty(&folio->_deferred_list)) {
 			ds_queue->split_queue_len--;
-			mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
+			if (folio_test_partially_mapped(folio)) {
+				__folio_clear_partially_mapped(folio);
+				mod_mthp_stat(folio_order(folio),
+					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
+			}
 			/*
 			 * Reinitialize page_deferred_list after removing the
 			 * page from the split_queue, otherwise a subsequent
@@ -3570,13 +3574,18 @@ void __folio_undo_large_rmappable(struct folio *folio)
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (!list_empty(&folio->_deferred_list)) {
 		ds_queue->split_queue_len--;
-		mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
+		if (folio_test_partially_mapped(folio)) {
+			__folio_clear_partially_mapped(folio);
+			mod_mthp_stat(folio_order(folio),
+				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
+		}
 		list_del_init(&folio->_deferred_list);
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 }
 
-void deferred_split_folio(struct folio *folio)
+/* partially_mapped=false won't clear PG_partially_mapped folio flag */
+void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 #ifdef CONFIG_MEMCG
@@ -3604,15 +3613,21 @@ void deferred_split_folio(struct folio *folio)
 	if (folio_test_swapcache(folio))
 		return;
 
-	if (!list_empty(&folio->_deferred_list))
-		return;
-
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	if (partially_mapped) {
+		if (!folio_test_partially_mapped(folio)) {
+			__folio_set_partially_mapped(folio);
+			if (folio_test_pmd_mappable(folio))
+				count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+			count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
+			mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1);
+
+		}
+	} else {
+		/* partially mapped folios cannot become non-partially mapped */
+		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
+	}
 	if (list_empty(&folio->_deferred_list)) {
-		if (folio_test_pmd_mappable(folio))
-			count_vm_event(THP_DEFERRED_SPLIT_PAGE);
-		count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
-		mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1);
 		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
 		ds_queue->split_queue_len++;
 #ifdef CONFIG_MEMCG
@@ -3660,7 +3675,11 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 			list_move(&folio->_deferred_list, &list);
 		} else {
 			/* We lost race with folio_put() */
-			mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
+			if (folio_test_partially_mapped(folio)) {
+				__folio_clear_partially_mapped(folio);
+				mod_mthp_stat(folio_order(folio),
+					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
+			}
 			list_del_init(&folio->_deferred_list);
 			ds_queue->split_queue_len--;
 		}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 087a8cb1a6d8..e66da58a365d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4629,7 +4629,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
 			!folio_test_hugetlb(folio) &&
-			!list_empty(&folio->_deferred_list), folio);
+			!list_empty(&folio->_deferred_list) &&
+			folio_test_partially_mapped(folio), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
diff --git a/mm/migrate.c b/mm/migrate.c
index d039863e014b..35cc9d35064b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1766,7 +1766,8 @@ static int migrate_pages_batch(struct list_head *from,
 			 * use _deferred_list.
 			 */
 			if (nr_pages > 2 &&
-			   !list_empty(&folio->_deferred_list)) {
+			   !list_empty(&folio->_deferred_list) &&
+			   folio_test_partially_mapped(folio)) {
 				if (!try_split_folio(folio, split_folios, mode)) {
 					nr_failed++;
 					stats->nr_thp_failed += is_thp;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c2ffccf9d213..a82c221b7c2e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -962,8 +962,9 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		break;
 	case 2:
 		/* the second tail page: deferred_list overlaps ->mapping */
-		if (unlikely(!list_empty(&folio->_deferred_list))) {
-			bad_page(page, "on deferred list");
+		if (unlikely(!list_empty(&folio->_deferred_list) &&
+		    folio_test_partially_mapped(folio))) {
+			bad_page(page, "partially mapped folio on deferred list");
 			goto out;
 		}
 		break;
diff --git a/mm/rmap.c b/mm/rmap.c
index 78529cf0fd66..a8797d1b3d49 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1579,8 +1579,9 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 	 * Check partially_mapped first to ensure it is a large folio.
 	 */
 	if (partially_mapped && folio_test_anon(folio) &&
-	    list_empty(&folio->_deferred_list))
-		deferred_split_folio(folio);
+	    !folio_test_partially_mapped(folio))
+		deferred_split_folio(folio, true);
+
 	__folio_mod_stat(folio, -nr, -nr_pmdmapped);
 
 	/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f27792e77a0f..4ca612f7e473 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1238,7 +1238,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 					 * Split partially mapped folios right away.
 					 * We can free the unmapped pages without IO.
 					 */
-					if (data_race(!list_empty(&folio->_deferred_list)) &&
+					if (data_race(!list_empty(&folio->_deferred_list) &&
+					    folio_test_partially_mapped(folio)) &&
 					    split_folio_to_list(folio, folio_list))
 						goto activate_locked;
 				}
-- 
2.43.5
Re: [PATCH v5 4/6] mm: Introduce a pageflag for partially mapped folios
Posted by David Hildenbrand 1 year ago
On 30.08.24 12:03, Usama Arif wrote:
> [...]
>   FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
> +FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
> +/*
> + * PG_partially_mapped is protected by deferred_split split_queue_lock,
> + * so its safe to use non-atomic set/clear.

Just stumbled over that. In my understanding, this assumption is wrong.

I don't think anything prevents other PF_ANY (PG_anon_exclusive,
PG_hwpoison) / PF_SECOND (PG_has_hwpoisoned) flags from getting
modified concurrently, I'm afraid.
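
As a plain userspace analogy of the lost update I'm worried about
(illustrative only, not kernel code; the bit numbers and helper names
below are made up), the non-atomic read-modify-write usually overwrites
the bit that the other thread sets atomically in the same word:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static unsigned long flags;	/* stands in for the second page's flags word */

#define BIT_PARTIALLY_MAPPED	0
#define BIT_HAS_HWPOISONED	1

/* like __folio_set_partially_mapped(): plain read-modify-write */
static void *nonatomic_set(void *arg)
{
	unsigned long old = flags;			/* read */
	usleep(1000);					/* widen the race window */
	flags = old | (1UL << BIT_PARTIALLY_MAPPED);	/* modify + write back */
	(void)arg;
	return NULL;
}

/* like set_bit()/folio_set_has_hwpoisoned(): atomic RMW on the same word */
static void *atomic_set(void *arg)
{
	__atomic_fetch_or(&flags, 1UL << BIT_HAS_HWPOISONED, __ATOMIC_SEQ_CST);
	(void)arg;
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, nonatomic_set, NULL);
	pthread_create(&b, NULL, atomic_set, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	printf("flags = %#lx, has_hwpoisoned stand-in bit %s\n", flags,
	       flags & (1UL << BIT_HAS_HWPOISONED) ? "survived" : "lost");
	return 0;
}

Built with "gcc -pthread", this typically reports the bit as lost.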

-- 
Cheers,

David / dhildenb
Re: [PATCH v5 4/6] mm: Introduce a pageflag for partially mapped folios
Posted by Usama Arif 1 year ago

On 11/12/2024 18:03, David Hildenbrand wrote:
> On 30.08.24 12:03, Usama Arif wrote:
>> [...]
>> +FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
>> +/*
>> + * PG_partially_mapped is protected by deferred_split split_queue_lock,
>> + * so its safe to use non-atomic set/clear.
> 
> Just stumbled over that. In my understanding, this assumption is wrong.
> 
> I don't think anything prevents other PF_ANY (PG_anon_exclusive, PG_hwpoison) / PF_SECOND (PG_has_hwpoisoned) flags from getting modified concurrently, I'm afraid.
> 
Hi David,

Just to clarify my understanding, what you are suggesting could happen in __folio_set/clear_partially_mapped is:
1) __folio_set/clear_partially_mapped reads the second-page flags (x) while one of the other second-page flags is, let's say, not set.
2) One of the other second-page flags is set atomically.
3) __folio_set/clear_partially_mapped writes back x plus the change to partially_mapped, so the change made in step 2 to the other second-page flag is lost.

Is that correct? But that would mean we shouldn't have any page flags (first or second page) as non-atomic? Although it would depend on whether they are being
changed at the same point in time. If you encountered a particular instance of PG_anon_exclusive or PG_has_hwpoisoned being changed at the same point as
__folio_set/clear_partially_mapped, could you point to it?

I am happy to send a fix changing all of the set/clear_partially_mapped helpers to atomic (rough sketch below), but I just want to understand this better first.
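
Sketch only, untested; essentially dropping the non-atomic variants in
favour of the atomic FOLIO_FLAG() helpers:

-FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
-/*
- * PG_partially_mapped is protected by deferred_split split_queue_lock,
- * so its safe to use non-atomic set/clear.
- */
-__FOLIO_SET_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
-__FOLIO_CLEAR_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
+FOLIO_FLAG(partially_mapped, FOLIO_SECOND_PAGE)

plus s/__folio_set_partially_mapped/folio_set_partially_mapped/ (and the
clear side) at the call sites, and collapsing the FALSE/NOOP stubs into
FOLIO_FLAG_FALSE().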

Thanks!
Usama

Re: [PATCH v5 4/6] mm: Introduce a pageflag for partially mapped folios
Posted by David Hildenbrand 1 year ago
On 12.12.24 11:30, Usama Arif wrote:
> 
> 
> On 11/12/2024 18:03, David Hildenbrand wrote:
>> On 30.08.24 12:03, Usama Arif wrote:
>>> [...]
>>> +FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
>>> +/*
>>> + * PG_partially_mapped is protected by deferred_split split_queue_lock,
>>> + * so its safe to use non-atomic set/clear.
>>
>> Just stumbled over that. In my understanding, this assumption is wrong.
>>
>> I don't think anything prevents other PF_ANY (PG_anon_exclusive, PG_hwpoison) / PF_SECOND (PG_has_hwpoisoned) flags from getting modified concurrently, I'm afraid.
>>
> Hi David,
> 
> Just to clarify my understanding, what you are suggesting could happen in __folio_set/clear_partially_mapped is:
> 1) __folio_set/clear_partially_mapped reads the second-page flags (x) while one of the other second-page flags is, let's say, not set.
> 2) One of the other second-page flags is set atomically.
> 3) __folio_set/clear_partially_mapped writes back x plus the change to partially_mapped, so the change made in step 2 to the other second-page flag is lost.
> 
> Is that correct?

That matches my understanding.

> But that would mean we shouldn't have any page flags (first or second
> page) as non-atomic?

Yes. We may only use non-atomic variants as long as we can guarantee 
that nobody can concurrently operate on the flags, for example on the 
early folio allocation path or on the folio freeing path.

Observe how the other SECOND users are atomic, PG_anon_exclusive is
atomic (except on two page freeing paths), and PG_hwpoison is atomic.


> Although it would depend on whether they are being
> changed at the same point in time. If you encountered a particular instance of PG_anon_exclusive or PG_has_hwpoisoned being changed at the same point as
> __folio_set/clear_partially_mapped, could you point to it?

Regarding PG_hwpoison, observe how memory_failure() performs the 
TestSetPageHWPoison() + folio_set_has_hwpoisoned() before unmapping the 
pages, without any locking. This can race with pretty much any operation 
that triggers unmapping.

With PG_anon_exclusive it's a bit more complicated, but it's probably
sufficient if MADV_DONTNEED (setting partially_mapped on the
deferred-split path) races with concurrent swapout/migration (clearing
PG_anon_exclusive), where the two operations are not performed under
the same PT lock. This can happen after a partial mremap(), or after
fork() when only parts of the large folio were shared with the child.

-- 
Cheers,

David / dhildenb