[PATCH V2 2/2] mm/khugepaged: map dirty/writeback pages failures to EAGAIN

Shivank Garg posted 2 patches 1 week, 4 days ago
There is a newer version of this series
[PATCH V2 2/2] mm/khugepaged: map dirty/writeback pages failures to EAGAIN
Posted by Shivank Garg 1 week, 4 days ago
When collapse_file encounters dirty or writeback pages in file-backed
mappings, it currently SCAN_FAIL which maps to -EINVAL. This is
misleading as EINVAL suggests invalid arguments, whereas dirty/writeback
pages represent transient conditions that may resolve on retry.

Introduce SCAN_PAGE_NOT_CLEAN to cover both dirty and writeback states,
mapping it to -EAGAIN. For MADV_COLLAPSE, this provides userspace with
a clear signal that retry may succeed after writeback completes, making
-EAGAIN semantically correct. For khugepaged, this is harmless as it
will naturally revisit the range during periodic scans after async
writeback completes.

Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 include/trace/events/huge_memory.h | 3 ++-
 mm/khugepaged.c                    | 8 +++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 4cde53b45a85..1caf24b951e1 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -37,7 +37,8 @@
 	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
 	EM( SCAN_STORE_FAILED,		"store_failed")			\
 	EM( SCAN_COPY_MC,		"copy_poisoned_page")		\
-	EMe(SCAN_PAGE_FILLED,		"page_filled")
+	EM( SCAN_PAGE_FILLED,		"page_filled")			\
+	EMe(SCAN_PAGE_NOT_CLEAN,	"page_not_clean")
 
 #undef EM
 #undef EMe
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 066a332c76ad..282b413d17e8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -59,6 +59,7 @@ enum scan_result {
 	SCAN_STORE_FAILED,
 	SCAN_COPY_MC,
 	SCAN_PAGE_FILLED,
+	SCAN_PAGE_NOT_CLEAN,
 };
 
 #define CREATE_TRACE_POINTS
@@ -1968,11 +1969,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 				 */
 				xas_unlock_irq(&xas);
 				filemap_flush(mapping);
-				result = SCAN_FAIL;
+				result = SCAN_PAGE_NOT_CLEAN;
 				goto xa_unlocked;
 			} else if (folio_test_writeback(folio)) {
 				xas_unlock_irq(&xas);
-				result = SCAN_FAIL;
+				result = SCAN_PAGE_NOT_CLEAN;
 				goto xa_unlocked;
 			} else if (folio_trylock(folio)) {
 				folio_get(folio);
@@ -2019,7 +2020,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			 * folio is dirty because it hasn't been flushed
 			 * since first write.
 			 */
-			result = SCAN_FAIL;
+			result = SCAN_PAGE_NOT_CLEAN;
 			goto out_unlock;
 		}
 
@@ -2748,6 +2749,7 @@ static int madvise_collapse_errno(enum scan_result r)
 	case SCAN_PAGE_LRU:
 	case SCAN_DEL_PAGE_LRU:
 	case SCAN_PAGE_FILLED:
+	case SCAN_PAGE_NOT_CLEAN:
 		return -EAGAIN;
 	/*
 	 * Other: Trying again likely not to succeed / error intrinsic to
-- 
2.43.0
Re: [PATCH V2 2/2] mm/khugepaged: map dirty/writeback pages failures to EAGAIN
Posted by Dev Jain 1 week, 4 days ago
On 20/11/25 12:20 pm, Shivank Garg wrote:
> When collapse_file encounters dirty or writeback pages in file-backed
> mappings, it currently SCAN_FAIL which maps to -EINVAL. This is
> misleading as EINVAL suggests invalid arguments, whereas dirty/writeback
> pages represent transient conditions that may resolve on retry.
>
> Introduce SCAN_PAGE_NOT_CLEAN to cover both dirty and writeback states,
> mapping it to -EAGAIN. For MADV_COLLAPSE, this provides userspace with
> a clear signal that retry may succeed after writeback completes, making
> -EAGAIN semantically correct. For khugepaged, this is harmless as it
> will naturally revisit the range during periodic scans after async
> writeback completes.
>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>   include/trace/events/huge_memory.h | 3 ++-
>   mm/khugepaged.c                    | 8 +++++---
>   2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 4cde53b45a85..1caf24b951e1 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -37,7 +37,8 @@
>   	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
>   	EM( SCAN_STORE_FAILED,		"store_failed")			\
>   	EM( SCAN_COPY_MC,		"copy_poisoned_page")		\
> -	EMe(SCAN_PAGE_FILLED,		"page_filled")
> +	EM( SCAN_PAGE_FILLED,		"page_filled")			\
> +	EMe(SCAN_PAGE_NOT_CLEAN,	"page_not_clean")
>   
>   #undef EM
>   #undef EMe
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 066a332c76ad..282b413d17e8 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -59,6 +59,7 @@ enum scan_result {
>   	SCAN_STORE_FAILED,
>   	SCAN_COPY_MC,
>   	SCAN_PAGE_FILLED,
> +	SCAN_PAGE_NOT_CLEAN,
>   };
>   
>   #define CREATE_TRACE_POINTS
> @@ -1968,11 +1969,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>   				 */
>   				xas_unlock_irq(&xas);
>   				filemap_flush(mapping);
> -				result = SCAN_FAIL;
> +				result = SCAN_PAGE_NOT_CLEAN;
>   				goto xa_unlocked;
>   			} else if (folio_test_writeback(folio)) {
>   				xas_unlock_irq(&xas);
> -				result = SCAN_FAIL;
> +				result = SCAN_PAGE_NOT_CLEAN;
>   				goto xa_unlocked;
>   			} else if (folio_trylock(folio)) {
>   				folio_get(folio);
> @@ -2019,7 +2020,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>   			 * folio is dirty because it hasn't been flushed
>   			 * since first write.
>   			 */
> -			result = SCAN_FAIL;
> +			result = SCAN_PAGE_NOT_CLEAN;
>   			goto out_unlock;
>   		}
>   
> @@ -2748,6 +2749,7 @@ static int madvise_collapse_errno(enum scan_result r)
>   	case SCAN_PAGE_LRU:
>   	case SCAN_DEL_PAGE_LRU:
>   	case SCAN_PAGE_FILLED:
> +	case SCAN_PAGE_NOT_CLEAN:
>   		return -EAGAIN;
>   	/*
>   	 * Other: Trying again likely not to succeed / error intrinsic to

SCAN_PAGE_NOT_CLEAN is confusing - NOT_CLEAN literally means dirty, so why not SCAN_PAGE_DIRTY?
Or SCAN_PAGE_DIRTY_OR_UNDER_WRITEBACK? Since folio_test_writeback() is true as a result of
the folio being dirty, maybe just SCAN_PAGE_DIRTY can do.

Reviewed-by: Dev Jain <dev.jain@arm.com>
Re: [PATCH V2 2/2] mm/khugepaged: map dirty/writeback pages failures to EAGAIN
Posted by Garg, Shivank 1 week, 4 days ago

On 11/20/2025 1:33 PM, Dev Jain wrote:
> 
> On 20/11/25 12:20 pm, Shivank Garg wrote:

> SCAN_PAGE_NOT_CLEAN is confusing - NOT_CLEAN literally means dirty, so why not SCAN_PAGE_DIRTY?
> Or SCAN_PAGE_DIRTY_OR_UNDER_WRITEBACK? Since folio_test_writeback() is true as a result of
> the folio being dirty, maybe just SCAN_PAGE_DIRTY can do.
> 
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> 
Thanks for the review.

I chose not to use SCAN_PAGE_DIRTY because dirty and writeback have different meanings[1]:

    Dirty: Memory that is waiting to be written back to disk
Writeback: Memory that is actively being written back to disk

[1] https://www.kernel.org/doc/Documentation/filesystems/proc.txt

IIUC, a page under writeback is no longer dirty, so using SCAN_PAGE_DIRTY would be misleading
for pages in the writeback state.

I considered SCAN_PAGE_DIRTY_OR_WRITEBACK initially but felt it was too long.

SCAN_PAGE_NOT_CLEAN covers both states that indicate the page is not in a clean/stable
state suitable for collapse.

[1] https://www.kernel.org/doc/Documentation/filesystems/proc.txt

Thanks,
Shivank
Re: [PATCH V2 2/2] mm/khugepaged: map dirty/writeback pages failures to EAGAIN
Posted by Lance Yang 1 week, 4 days ago

On 2025/11/20 16:17, Garg, Shivank wrote:
> 
> 
> On 11/20/2025 1:33 PM, Dev Jain wrote:
>>
>> On 20/11/25 12:20 pm, Shivank Garg wrote:
> 
>> SCAN_PAGE_NOT_CLEAN is confusing - NOT_CLEAN literally means dirty, so why not SCAN_PAGE_DIRTY?
>> Or SCAN_PAGE_DIRTY_OR_UNDER_WRITEBACK? Since folio_test_writeback() is true as a result of
>> the folio being dirty, maybe just SCAN_PAGE_DIRTY can do.
>>
>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>>
> Thanks for the review.
> 
> I chose not to use SCAN_PAGE_DIRTY because dirty and writeback have different meanings[1]:
> 
>      Dirty: Memory that is waiting to be written back to disk
> Writeback: Memory that is actively being written back to disk
> 
> [1] https://www.kernel.org/doc/Documentation/filesystems/proc.txt
> 
> IIUC, a page under writeback is no longer dirty, so using SCAN_PAGE_DIRTY would be misleading
> for pages in the writeback state.
> 
> I considered SCAN_PAGE_DIRTY_OR_WRITEBACK initially but felt it was too long.

Nit: If SCAN_PAGE_DIRTY_OR_WRITEBACK is too verbose, how about 
SCAN_PAGE_DIRTY_WB?

It keeps the specificity without the length, and is arguably more 
descriptive
than NOT_CLEAN ;)

That said, LGTM.

Reviewed-by: Lance Yang <lance.yang@linux.dev>

> 
> SCAN_PAGE_NOT_CLEAN covers both states that indicate the page is not in a clean/stable
> state suitable for collapse.
> 
> [1] https://www.kernel.org/doc/Documentation/filesystems/proc.txt
> 
> Thanks,
> Shivank
Re: [PATCH V2 2/2] mm/khugepaged: map dirty/writeback pages failures to EAGAIN
Posted by David Hildenbrand (Red Hat) 1 week, 4 days ago
On 11/20/25 13:24, Lance Yang wrote:
> 
> 
> On 2025/11/20 16:17, Garg, Shivank wrote:
>>
>>
>> On 11/20/2025 1:33 PM, Dev Jain wrote:
>>>
>>> On 20/11/25 12:20 pm, Shivank Garg wrote:
>>
>>> SCAN_PAGE_NOT_CLEAN is confusing - NOT_CLEAN literally means dirty, so why not SCAN_PAGE_DIRTY?
>>> Or SCAN_PAGE_DIRTY_OR_UNDER_WRITEBACK? Since folio_test_writeback() is true as a result of
>>> the folio being dirty, maybe just SCAN_PAGE_DIRTY can do.
>>>
>>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>>>
>> Thanks for the review.
>>
>> I chose not to use SCAN_PAGE_DIRTY because dirty and writeback have different meanings[1]:
>>
>>       Dirty: Memory that is waiting to be written back to disk
>> Writeback: Memory that is actively being written back to disk
>>
>> [1] https://www.kernel.org/doc/Documentation/filesystems/proc.txt
>>
>> IIUC, a page under writeback is no longer dirty, so using SCAN_PAGE_DIRTY would be misleading
>> for pages in the writeback state.
>>
>> I considered SCAN_PAGE_DIRTY_OR_WRITEBACK initially but felt it was too long.
> 
> Nit: If SCAN_PAGE_DIRTY_OR_WRITEBACK 

I would prefer that here.

-- 
Cheers

David
Re: [PATCH V2 2/2] mm/khugepaged: map dirty/writeback pages failures to EAGAIN
Posted by Garg, Shivank 1 week, 3 days ago

On 11/20/2025 6:59 PM, David Hildenbrand (Red Hat) wrote:
> On 11/20/25 13:24, Lance Yang wrote:
>>
>>
>> On 2025/11/20 16:17, Garg, Shivank wrote:
>>>
>>>
>>> On 11/20/2025 1:33 PM, Dev Jain wrote:
>>>>
>>>> On 20/11/25 12:20 pm, Shivank Garg wrote:
>>>
>>>> SCAN_PAGE_NOT_CLEAN is confusing - NOT_CLEAN literally means dirty, so why not SCAN_PAGE_DIRTY?
>>>> Or SCAN_PAGE_DIRTY_OR_UNDER_WRITEBACK? Since folio_test_writeback() is true as a result of
>>>> the folio being dirty, maybe just SCAN_PAGE_DIRTY can do.
>>>>
>>>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>>>>
>>> Thanks for the review.
>>>
>>> I chose not to use SCAN_PAGE_DIRTY because dirty and writeback have different meanings[1]:
>>>
>>>       Dirty: Memory that is waiting to be written back to disk
>>> Writeback: Memory that is actively being written back to disk
>>>
>>> [1] https://www.kernel.org/doc/Documentation/filesystems/proc.txt
>>>
>>> IIUC, a page under writeback is no longer dirty, so using SCAN_PAGE_DIRTY would be misleading
>>> for pages in the writeback state.
>>>
>>> I considered SCAN_PAGE_DIRTY_OR_WRITEBACK initially but felt it was too long.
>>
>> Nit: If SCAN_PAGE_DIRTY_OR_WRITEBACK 
> 
> I would prefer that here.
> 

I agree on this. 
If the consensus is SCAN_PAGE_DIRTY_OR_WRITEBACK, I'll use it in v3.

Thanks,
Shivank
Re: [PATCH V2 2/2] mm/khugepaged: map dirty/writeback pages failures to EAGAIN
Posted by Dev Jain 1 week, 4 days ago
On 20/11/25 1:47 pm, Garg, Shivank wrote:
>
> On 11/20/2025 1:33 PM, Dev Jain wrote:
>> On 20/11/25 12:20 pm, Shivank Garg wrote:
>> SCAN_PAGE_NOT_CLEAN is confusing - NOT_CLEAN literally means dirty, so why not SCAN_PAGE_DIRTY?
>> Or SCAN_PAGE_DIRTY_OR_UNDER_WRITEBACK? Since folio_test_writeback() is true as a result of
>> the folio being dirty, maybe just SCAN_PAGE_DIRTY can do.
>>
>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>>
> Thanks for the review.
>
> I chose not to use SCAN_PAGE_DIRTY because dirty and writeback have different meanings[1]:
>
>      Dirty: Memory that is waiting to be written back to disk
> Writeback: Memory that is actively being written back to disk
>
> [1] https://www.kernel.org/doc/Documentation/filesystems/proc.txt
>
> IIUC, a page under writeback is no longer dirty, so using SCAN_PAGE_DIRTY would be misleading
> for pages in the writeback state.
>
> I considered SCAN_PAGE_DIRTY_OR_WRITEBACK initially but felt it was too long.
>
> SCAN_PAGE_NOT_CLEAN covers both states that indicate the page is not in a clean/stable
> state suitable for collapse.
>
> [1] https://www.kernel.org/doc/Documentation/filesystems/proc.txt

Alright, makes sense.

>
> Thanks,
> Shivank