Code refactoring of __folio_split() via the helper
__folio_freeze_and_split_unmapped() caused a regression with clang-20
and CONFIG_SHMEM=n: the compiler was no longer able to optimize away
the call to shmem_uncharge() due to the changes around nr_shmem_dropped.
Fix this by checking shmem_mapping() prior to calling shmem_uncharge();
shmem_mapping() returns false when CONFIG_SHMEM=n.
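(For reference, a minimal sketch of the !CONFIG_SHMEM stub, paraphrased
from include/linux/shmem_fs.h; the exact form may differ by kernel
version:

	static inline bool shmem_mapping(struct address_space *mapping)
	{
		return false;	/* never a shmem mapping without CONFIG_SHMEM */
	}

so the shmem_mapping() check is a compile-time constant false here.)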
smatch also complained about the parameter "end" being used without
initialization. This is a false positive, but keep the tool happy by
passing in an initialized parameter: end is initialized to 0.
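(A minimal, hypothetical illustration of the pattern smatch trips over:
the variable is written and read only under the same condition, so it
is never actually used uninitialized, but the tool cannot prove that.
consume() is a made-up placeholder:

	static void example(struct address_space *mapping)
	{
		pgoff_t end;		/* smatch: used uninitialized? */

		if (mapping)
			end = 128;	/* written only when mapping != NULL */
		if (mapping)
			consume(end);	/* read only when mapping != NULL */
	}

Initializing end to 0 silences the warning without changing behavior.)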
Add detailed documentation comments for folio_split_unmapped()
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
---
mm/huge_memory.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 78a31a476ad3..c4267a0f74df 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3751,6 +3751,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
int ret = 0;
struct deferred_split *ds_queue;
+ VM_WARN_ON_ONCE(!mapping && end != 0);
/* Prevent deferred_split_scan() touching ->_refcount */
ds_queue = folio_split_queue_lock(folio);
if (folio_ref_freeze(folio, 1 + extra_pins)) {
@@ -3919,7 +3920,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
int nr_shmem_dropped = 0;
int remap_flags = 0;
int extra_pins, ret;
- pgoff_t end;
+ pgoff_t end = 0;
bool is_hzp;
VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
@@ -4049,7 +4050,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
local_irq_enable();
- if (nr_shmem_dropped)
+ if (mapping && shmem_mapping(mapping) && nr_shmem_dropped)
shmem_uncharge(mapping->host, nr_shmem_dropped);
if (!ret && is_anon && !folio_is_device_private(folio))
@@ -4092,16 +4093,27 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
return ret;
}
-/*
- * This function is a helper for splitting folios that have already been unmapped.
- * The use case is that the device or the CPU can refuse to migrate THP pages in
- * the middle of migration, due to allocation issues on either side
+/**
+ * folio_split_unmapped() - split a large anon folio that is already unmapped
+ * @folio: folio to split
+ * @new_order: the order of folios after split
+ *
+ * This function is a helper for splitting folios that have already been
+ * unmapped. The use case is that the device or the CPU can refuse to migrate
+ * THP pages in the middle of migration, due to allocation issues on either
+ * side.
+ *
+ * anon_vma_lock is not required to be held, mmap_read_lock() or
+ * mmap_write_lock() should be held. @folio is expected to be locked by the
+ * caller. device-private and non device-private folios are supported along
+ * with folios that are in the swapcache. @folio should also be unmapped and
+ * isolated from LRU (if applicable)
*
- * The high level code is copied from __folio_split, since the pages are anonymous
- * and are already isolated from the LRU, the code has been simplified to not
- * burden __folio_split with unmapped sprinkled into the code.
+ * Upon return, the folio is not remapped, split folios are not added to LRU,
+ * free_folio_and_swap_cache() is not called, and new folios remain locked.
*
- * None of the split folios are unlocked
+ * Return: 0 on success, -EAGAIN if the folio cannot be split (e.g., due to
+ * insufficient reference count or extra pins).
*/
int folio_split_unmapped(struct folio *folio, unsigned int new_order)
{
--
2.51.1
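[Editorial sketch, not part of the patch: a hypothetical caller
illustrating the documented contract of folio_split_unmapped(). The
surrounding function name and error handling are made up; only the
signature and the locking/isolation requirements come from the
kerneldoc above.

	static int demote_thp_for_migration(struct folio *folio)
	{
		int ret;

		/*
		 * Per the kerneldoc: the caller holds the folio lock and
		 * mmap_read_lock(), and the folio is already unmapped and
		 * isolated from the LRU.
		 */
		ret = folio_split_unmapped(folio, 0);	/* split to order-0 */
		if (ret == -EAGAIN)
			return ret;	/* extra pins; retry or fail migration */

		/*
		 * On success the split folios remain locked and off the LRU;
		 * the caller migrates them individually and remaps later.
		 */
		return 0;
	}

]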
On 11/20/25 04:07, Balbir Singh wrote:
> Code refactoring of __folio_split() via helper
> __folio_freeze_and_split_unmapped() caused a regression with clang-20
> with CONFIG_SHMEM=n, the compiler was not able to optimize away the
> call to shmem_uncharge() due to changes in nr_shmem_dropped.
> Fix this by checking for shmem_mapping() prior to calling
> shmem_uncharge(), shmem_mapping() returns false when CONFIG_SHMEM=n.
>
> smatch also complained about parameter end being used without
> initialization, which is a false positive, but keep the tool happy
> by sending in initialized parameters. end is initialized to 0.
>
> Add detailed documentation comments for folio_split_unmapped()
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
> Cc: Rakie Kim <rakie.kim@sk.com>
> Cc: Byungchul Park <byungchul@sk.com>
> Cc: Gregory Price <gourry@gourry.net>
> Cc: Ying Huang <ying.huang@linux.alibaba.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Nico Pache <npache@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Barry Song <baohua@kernel.org>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Mika Penttilä <mpenttil@redhat.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Francois Dugast <francois.dugast@intel.com>
>
> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
> ---
> mm/huge_memory.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 78a31a476ad3..c4267a0f74df 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3751,6 +3751,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> int ret = 0;
> struct deferred_split *ds_queue;
>
> + VM_WARN_ON_ONCE(!mapping && end != 0);
You could drop the "!= 0"
> /* Prevent deferred_split_scan() touching ->_refcount */
> ds_queue = folio_split_queue_lock(folio);
> if (folio_ref_freeze(folio, 1 + extra_pins)) {
> @@ -3919,7 +3920,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> int nr_shmem_dropped = 0;
> int remap_flags = 0;
> int extra_pins, ret;
> - pgoff_t end;
> + pgoff_t end = 0;
> bool is_hzp;
>
> VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
> @@ -4049,7 +4050,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>
> local_irq_enable();
>
> - if (nr_shmem_dropped)
> + if (mapping && shmem_mapping(mapping) && nr_shmem_dropped)
> shmem_uncharge(mapping->host, nr_shmem_dropped);
That looks questionable. We shouldn't add a runtime check to handle
build-time things.
Likely what you want is instead
if (IS_ENABLED(CONFIG_SHMEM) && nr_shmem_dropped)
shmem_uncharge()
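(For reference: IS_ENABLED(CONFIG_SHMEM) evaluates to a compile-time 0
or 1, so with CONFIG_SHMEM=n the condition folds to false and the
compiler drops the call as dead code, e.g.:

	/* compile-time constant condition, no runtime shmem check */
	if (IS_ENABLED(CONFIG_SHMEM) && nr_shmem_dropped)
		shmem_uncharge(mapping->host, nr_shmem_dropped);

)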
>
> if (!ret && is_anon && !folio_is_device_private(folio))
> @@ -4092,16 +4093,27 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> return ret;
> }
>
> -/*
> - * This function is a helper for splitting folios that have already been unmapped.
> - * The use case is that the device or the CPU can refuse to migrate THP pages in
> - * the middle of migration, due to allocation issues on either side
> +/**
> + * folio_split_unmapped() - split a large anon folio that is already unmapped
> + * @folio: folio to split
> + * @new_order: the order of folios after split
> + *
> + * This function is a helper for splitting folios that have already been
> + * unmapped. The use case is that the device or the CPU can refuse to migrate
> + * THP pages in the middle of migration, due to allocation issues on either
> + * side.
> + *
> + * anon_vma_lock is not required to be held, mmap_read_lock() or
> + * mmap_write_lock() should be held. @folio is expected to be locked by the
> + * caller. device-private and non device-private folios are supported along
> + * with folios that are in the swapcache. @folio should also be unmapped and
> + * isolated from LRU (if applicable)
> *
> - * The high level code is copied from __folio_split, since the pages are anonymous
> - * and are already isolated from the LRU, the code has been simplified to not
> - * burden __folio_split with unmapped sprinkled into the code.
> + * Upon return, the folio is not remapped, split folios are not added to LRU,
> + * free_folio_and_swap_cache() is not called, and new folios remain locked.
> *
> - * None of the split folios are unlocked
> + * Return: 0 on success, -EAGAIN if the folio cannot be split (e.g., due to
> + * insufficient reference count or extra pins).
Sounds much better to me, thanks.
--
Cheers
David
On 11/20/25 20:09, David Hildenbrand (Red Hat) wrote:
> On 11/20/25 04:07, Balbir Singh wrote:
>> Code refactoring of __folio_split() via helper
>> __folio_freeze_and_split_unmapped() caused a regression with clang-20
>> with CONFIG_SHMEM=n, the compiler was not able to optimize away the
>> call to shmem_uncharge() due to changes in nr_shmem_dropped.
>> Fix this by checking for shmem_mapping() prior to calling
>> shmem_uncharge(), shmem_mapping() returns false when CONFIG_SHMEM=n.
>>
>> smatch also complained about parameter end being used without
>> initialization, which is a false positive, but keep the tool happy
>> by sending in initialized parameters. end is initialized to 0.
>>
>> Add detailed documentation comments for folio_split_unmapped()
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
>> Cc: Rakie Kim <rakie.kim@sk.com>
>> Cc: Byungchul Park <byungchul@sk.com>
>> Cc: Gregory Price <gourry@gourry.net>
>> Cc: Ying Huang <ying.huang@linux.alibaba.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
>> Cc: Nico Pache <npache@redhat.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Dev Jain <dev.jain@arm.com>
>> Cc: Barry Song <baohua@kernel.org>
>> Cc: Lyude Paul <lyude@redhat.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: David Airlie <airlied@gmail.com>
>> Cc: Simona Vetter <simona@ffwll.ch>
>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>> Cc: Mika Penttilä <mpenttil@redhat.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Francois Dugast <francois.dugast@intel.com>
>>
>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>> ---
>> mm/huge_memory.c | 32 ++++++++++++++++++++++----------
>> 1 file changed, 22 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 78a31a476ad3..c4267a0f74df 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3751,6 +3751,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>> int ret = 0;
>> struct deferred_split *ds_queue;
>> + VM_WARN_ON_ONCE(!mapping && end != 0);
>
> You could drop the "!= 0"
Ack

VM_WARN_ON_ONCE(!mapping && end);
>
>> /* Prevent deferred_split_scan() touching ->_refcount */
>> ds_queue = folio_split_queue_lock(folio);
>> if (folio_ref_freeze(folio, 1 + extra_pins)) {
>> @@ -3919,7 +3920,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>> int nr_shmem_dropped = 0;
>> int remap_flags = 0;
>> int extra_pins, ret;
>> - pgoff_t end;
>> + pgoff_t end = 0;
>> bool is_hzp;
>> VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
>> @@ -4049,7 +4050,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>> local_irq_enable();
>> - if (nr_shmem_dropped)
>> + if (mapping && shmem_mapping(mapping) && nr_shmem_dropped)
>> shmem_uncharge(mapping->host, nr_shmem_dropped);
>
> That looks questionable. We shouldn't add runtime check to handle buildtime things.
>
> Likely what you want is instead
>
> if (IS_ENABLED(CONFIG_SHMEM) && nr_shmem_dropped)
> shmem_uncharge()
>
shmem_mapping() returns false for CONFIG_SHMEM=n, and it also checks
that the mapping really uses the shmem ops, so the test gates the
uncharge on actual shmem mappings. Happy to change it if you like;
your version is more readable.
>> if (!ret && is_anon && !folio_is_device_private(folio))
>> @@ -4092,16 +4093,27 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>> return ret;
>> }
>> -/*
>> - * This function is a helper for splitting folios that have already been unmapped.
>> - * The use case is that the device or the CPU can refuse to migrate THP pages in
>> - * the middle of migration, due to allocation issues on either side
>> +/**
>> + * folio_split_unmapped() - split a large anon folio that is already unmapped
>> + * @folio: folio to split
>> + * @new_order: the order of folios after split
>> + *
>> + * This function is a helper for splitting folios that have already been
>> + * unmapped. The use case is that the device or the CPU can refuse to migrate
>> + * THP pages in the middle of migration, due to allocation issues on either
>> + * side.
>> + *
>> + * anon_vma_lock is not required to be held, mmap_read_lock() or
>> + * mmap_write_lock() should be held. @folio is expected to be locked by the
>> + * caller. device-private and non device-private folios are supported along
>> + * with folios that are in the swapcache. @folio should also be unmapped and
>> + * isolated from LRU (if applicable)
>> *
>> - * The high level code is copied from __folio_split, since the pages are anonymous
>> - * and are already isolated from the LRU, the code has been simplified to not
>> - * burden __folio_split with unmapped sprinkled into the code.
>> + * Upon return, the folio is not remapped, split folios are not added to LRU,
>> + * free_folio_and_swap_cache() is not called, and new folios remain locked.
>> *
>> - * None of the split folios are unlocked
>> + * Return: 0 on success, -EAGAIN if the folio cannot be split (e.g., due to
>> + * insufficient reference count or extra pins).
>
> Sounds much better to me, thanks.
>
Thanks
Balbir
On 11/20/25 10:25, Balbir Singh wrote:
> On 11/20/25 20:09, David Hildenbrand (Red Hat) wrote:
>> On 11/20/25 04:07, Balbir Singh wrote:
>>> Code refactoring of __folio_split() via helper
>>> __folio_freeze_and_split_unmapped() caused a regression with clang-20
>>> with CONFIG_SHMEM=n, the compiler was not able to optimize away the
>>> call to shmem_uncharge() due to changes in nr_shmem_dropped.
>>> Fix this by checking for shmem_mapping() prior to calling
>>> shmem_uncharge(), shmem_mapping() returns false when CONFIG_SHMEM=n.
>>>
>>> smatch also complained about parameter end being used without
>>> initialization, which is a false positive, but keep the tool happy
>>> by sending in initialized parameters. end is initialized to 0.
>>>
>>> Add detailed documentation comments for folio_split_unmapped()
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
>>> Cc: Rakie Kim <rakie.kim@sk.com>
>>> Cc: Byungchul Park <byungchul@sk.com>
>>> Cc: Gregory Price <gourry@gourry.net>
>>> Cc: Ying Huang <ying.huang@linux.alibaba.com>
>>> Cc: Alistair Popple <apopple@nvidia.com>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
>>> Cc: Nico Pache <npache@redhat.com>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: Dev Jain <dev.jain@arm.com>
>>> Cc: Barry Song <baohua@kernel.org>
>>> Cc: Lyude Paul <lyude@redhat.com>
>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>> Cc: David Airlie <airlied@gmail.com>
>>> Cc: Simona Vetter <simona@ffwll.ch>
>>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>>> Cc: Mika Penttilä <mpenttil@redhat.com>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Francois Dugast <francois.dugast@intel.com>
>>>
>>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>>> ---
>>> mm/huge_memory.c | 32 ++++++++++++++++++++++----------
>>> 1 file changed, 22 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 78a31a476ad3..c4267a0f74df 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3751,6 +3751,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>> int ret = 0;
>>> struct deferred_split *ds_queue;
>>> + VM_WARN_ON_ONCE(!mapping && end != 0);
>>
>> You could drop the "!= 0"
>
> Ack
>
> VM_WARN_ONE(!mapping && end);
>
>>
>>> /* Prevent deferred_split_scan() touching ->_refcount */
>>> ds_queue = folio_split_queue_lock(folio);
>>> if (folio_ref_freeze(folio, 1 + extra_pins)) {
>>> @@ -3919,7 +3920,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>> int nr_shmem_dropped = 0;
>>> int remap_flags = 0;
>>> int extra_pins, ret;
>>> - pgoff_t end;
>>> + pgoff_t end = 0;
>>> bool is_hzp;
>>> VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
>>> @@ -4049,7 +4050,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>> local_irq_enable();
>>> - if (nr_shmem_dropped)
>>> + if (mapping && shmem_mapping(mapping) && nr_shmem_dropped)
>>> shmem_uncharge(mapping->host, nr_shmem_dropped);
>>
>> That looks questionable. We shouldn't add runtime check to handle buildtime things.
>>
>> Likely what you want is instead
>>
>> if (IS_ENABLED(CONFIG_SHMEM) && nr_shmem_dropped)
>> shmem_uncharge()
>>
>
> shmem_mapping() returns false for CONFIG_SHMEM=n and shmem_mapping() checks that the mapping
> is indeed for shmem ops before uncharging. Happy to change it if you like,
> your version is more readable
Good point, but the questionable thing is that the check makes it look
like nr_shmem_dropped could be set for non-shmem mappings, when it's
really just a compiler issue.
What about handling it through a proper stub so we can keep the calling
code simple?
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 5b368f9549d67..e38cb01031200 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -136,11 +136,15 @@ static inline bool shmem_hpage_pmd_enabled(void)
#ifdef CONFIG_SHMEM
extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
+extern void shmem_uncharge(struct inode *inode, long pages);
#else
static inline unsigned long shmem_swap_usage(struct vm_area_struct *vma)
{
return 0;
}
+static inline void shmem_uncharge(struct inode *inode, long pages)
+{
+}
#endif
extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end);
@@ -194,7 +198,6 @@ static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
}
extern bool shmem_charge(struct inode *inode, long pages);
-extern void shmem_uncharge(struct inode *inode, long pages);
#ifdef CONFIG_USERFAULTFD
#ifdef CONFIG_SHMEM
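(With a stub like this in place, the call site could presumably stay as
simple as the original:

	if (nr_shmem_dropped)
		shmem_uncharge(mapping->host, nr_shmem_dropped);

since the empty inline is trivially eliminated when CONFIG_SHMEM=n.)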
--
Cheers
David
On 11/20/25 20:32, David Hildenbrand (Red Hat) wrote:
> On 11/20/25 10:25, Balbir Singh wrote:
>> On 11/20/25 20:09, David Hildenbrand (Red Hat) wrote:
>>> On 11/20/25 04:07, Balbir Singh wrote:
>>>> Code refactoring of __folio_split() via helper
>>>> __folio_freeze_and_split_unmapped() caused a regression with clang-20
>>>> with CONFIG_SHMEM=n, the compiler was not able to optimize away the
>>>> call to shmem_uncharge() due to changes in nr_shmem_dropped.
>>>> Fix this by checking for shmem_mapping() prior to calling
>>>> shmem_uncharge(), shmem_mapping() returns false when CONFIG_SHMEM=n.
>>>>
>>>> smatch also complained about parameter end being used without
>>>> initialization, which is a false positive, but keep the tool happy
>>>> by sending in initialized parameters. end is initialized to 0.
>>>>
>>>> Add detailed documentation comments for folio_split_unmapped()
>>>>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>> Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
>>>> Cc: Rakie Kim <rakie.kim@sk.com>
>>>> Cc: Byungchul Park <byungchul@sk.com>
>>>> Cc: Gregory Price <gourry@gourry.net>
>>>> Cc: Ying Huang <ying.huang@linux.alibaba.com>
>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
>>>> Cc: Nico Pache <npache@redhat.com>
>>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>>> Cc: Dev Jain <dev.jain@arm.com>
>>>> Cc: Barry Song <baohua@kernel.org>
>>>> Cc: Lyude Paul <lyude@redhat.com>
>>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>>> Cc: David Airlie <airlied@gmail.com>
>>>> Cc: Simona Vetter <simona@ffwll.ch>
>>>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>>>> Cc: Mika Penttilä <mpenttil@redhat.com>
>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>> Cc: Francois Dugast <francois.dugast@intel.com>
>>>>
>>>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>>>> ---
>>>> mm/huge_memory.c | 32 ++++++++++++++++++++++----------
>>>> 1 file changed, 22 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 78a31a476ad3..c4267a0f74df 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -3751,6 +3751,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>>> int ret = 0;
>>>> struct deferred_split *ds_queue;
>>>> + VM_WARN_ON_ONCE(!mapping && end != 0);
>>>
>>> You could drop the "!= 0"
>>
>> Ack
>>
>> VM_WARN_ONE(!mapping && end);
>>
>>>
>>>> /* Prevent deferred_split_scan() touching ->_refcount */
>>>> ds_queue = folio_split_queue_lock(folio);
>>>> if (folio_ref_freeze(folio, 1 + extra_pins)) {
>>>> @@ -3919,7 +3920,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>> int nr_shmem_dropped = 0;
>>>> int remap_flags = 0;
>>>> int extra_pins, ret;
>>>> - pgoff_t end;
>>>> + pgoff_t end = 0;
>>>> bool is_hzp;
>>>> VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
>>>> @@ -4049,7 +4050,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>> local_irq_enable();
>>>> - if (nr_shmem_dropped)
>>>> + if (mapping && shmem_mapping(mapping) && nr_shmem_dropped)
>>>> shmem_uncharge(mapping->host, nr_shmem_dropped);
>>>
>>> That looks questionable. We shouldn't add runtime check to handle buildtime things.
>>>
>>> Likely what you want is instead
>>>
>>> if (IS_ENABLED(CONFIG_SHMEM) && nr_shmem_dropped)
>>> shmem_uncharge()
>>>
>>
>> shmem_mapping() returns false for CONFIG_SHMEM=n and shmem_mapping() checks that the mapping
>> is indeed for shmem ops before uncharging. Happy to change it if you like,
>> your version is more readable
> Good point, but the questionable thing is that it looks like nr_shmem_dropped
> could be set for non-shmem mappings, when it's really just a compiler thing.
>
> What about handling it through a proper stub so we can keep this calling code simple?
>
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 5b368f9549d67..e38cb01031200 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -136,11 +136,15 @@ static inline bool shmem_hpage_pmd_enabled(void)
>
> #ifdef CONFIG_SHMEM
> extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
> +extern void shmem_uncharge(struct inode *inode, long pages);
> #else
> static inline unsigned long shmem_swap_usage(struct vm_area_struct *vma)
> {
> return 0;
> }
> +static inline void shmem_uncharge(struct inode *inode, long pages)
> +{
> +}
> #endif
> extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> pgoff_t start, pgoff_t end);
> @@ -194,7 +198,6 @@ static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
> }
>
> extern bool shmem_charge(struct inode *inode, long pages);
> -extern void shmem_uncharge(struct inode *inode, long pages);
>
> #ifdef CONFIG_USERFAULTFD
> #ifdef CONFIG_SHMEM
>
>
Agreed. I would like to let this patch proceed and then immediately
follow up with a patch along the lines of the CONFIG_SHMEM stub, as a
separate patch independent of this series. What do you think?
Balbir
On 11/20/25 11:35, Balbir Singh wrote:
> On 11/20/25 20:32, David Hildenbrand (Red Hat) wrote:
>> On 11/20/25 10:25, Balbir Singh wrote:
>>> On 11/20/25 20:09, David Hildenbrand (Red Hat) wrote:
>>>> On 11/20/25 04:07, Balbir Singh wrote:
>>>>> Code refactoring of __folio_split() via helper
>>>>> __folio_freeze_and_split_unmapped() caused a regression with clang-20
>>>>> with CONFIG_SHMEM=n, the compiler was not able to optimize away the
>>>>> call to shmem_uncharge() due to changes in nr_shmem_dropped.
>>>>> Fix this by checking for shmem_mapping() prior to calling
>>>>> shmem_uncharge(), shmem_mapping() returns false when CONFIG_SHMEM=n.
>>>>>
>>>>> smatch also complained about parameter end being used without
>>>>> initialization, which is a false positive, but keep the tool happy
>>>>> by sending in initialized parameters. end is initialized to 0.
>>>>>
>>>>> Add detailed documentation comments for folio_split_unmapped()
>>>>>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: David Hildenbrand <david@redhat.com>
>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>> Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
>>>>> Cc: Rakie Kim <rakie.kim@sk.com>
>>>>> Cc: Byungchul Park <byungchul@sk.com>
>>>>> Cc: Gregory Price <gourry@gourry.net>
>>>>> Cc: Ying Huang <ying.huang@linux.alibaba.com>
>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
>>>>> Cc: Nico Pache <npache@redhat.com>
>>>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>>>> Cc: Dev Jain <dev.jain@arm.com>
>>>>> Cc: Barry Song <baohua@kernel.org>
>>>>> Cc: Lyude Paul <lyude@redhat.com>
>>>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>>>> Cc: David Airlie <airlied@gmail.com>
>>>>> Cc: Simona Vetter <simona@ffwll.ch>
>>>>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>>>>> Cc: Mika Penttilä <mpenttil@redhat.com>
>>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>>> Cc: Francois Dugast <francois.dugast@intel.com>
>>>>>
>>>>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>>>>> ---
>>>>> mm/huge_memory.c | 32 ++++++++++++++++++++++----------
>>>>> 1 file changed, 22 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 78a31a476ad3..c4267a0f74df 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -3751,6 +3751,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>>>> int ret = 0;
>>>>> struct deferred_split *ds_queue;
>>>>> + VM_WARN_ON_ONCE(!mapping && end != 0);
>>>>
>>>> You could drop the "!= 0"
>>>
>>> Ack
>>>
>>> VM_WARN_ONE(!mapping && end);
>>>
>>>>
>>>>> /* Prevent deferred_split_scan() touching ->_refcount */
>>>>> ds_queue = folio_split_queue_lock(folio);
>>>>> if (folio_ref_freeze(folio, 1 + extra_pins)) {
>>>>> @@ -3919,7 +3920,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>> int nr_shmem_dropped = 0;
>>>>> int remap_flags = 0;
>>>>> int extra_pins, ret;
>>>>> - pgoff_t end;
>>>>> + pgoff_t end = 0;
>>>>> bool is_hzp;
>>>>> VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
>>>>> @@ -4049,7 +4050,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>> local_irq_enable();
>>>>> - if (nr_shmem_dropped)
>>>>> + if (mapping && shmem_mapping(mapping) && nr_shmem_dropped)
>>>>> shmem_uncharge(mapping->host, nr_shmem_dropped);
>>>>
>>>> That looks questionable. We shouldn't add runtime check to handle buildtime things.
>>>>
>>>> Likely what you want is instead
>>>>
>>>> if (IS_ENABLED(CONFIG_SHMEM) && nr_shmem_dropped)
>>>> shmem_uncharge()
>>>>
>>>
>>> shmem_mapping() returns false for CONFIG_SHMEM=n and shmem_mapping() checks that the mapping
>>> is indeed for shmem ops before uncharging. Happy to change it if you like,
>>> your version is more readable
>> Good point, but the questionable thing is that it looks like nr_shmem_dropped
>> could be set for non-shmem mappings, when it's really just a compiler thing.
>>
>> What about handling it through a proper stub so we can keep this calling code simple?
>>
>> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
>> index 5b368f9549d67..e38cb01031200 100644
>> --- a/include/linux/shmem_fs.h
>> +++ b/include/linux/shmem_fs.h
>> @@ -136,11 +136,15 @@ static inline bool shmem_hpage_pmd_enabled(void)
>>
>> #ifdef CONFIG_SHMEM
>> extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
>> +extern void shmem_uncharge(struct inode *inode, long pages);
>> #else
>> static inline unsigned long shmem_swap_usage(struct vm_area_struct *vma)
>> {
>> return 0;
>> }
>> +static inline void shmem_uncharge(struct inode *inode, long pages)
>> +{
>> +}
>> #endif
>> extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>> pgoff_t start, pgoff_t end);
>> @@ -194,7 +198,6 @@ static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
>> }
>>
>> extern bool shmem_charge(struct inode *inode, long pages);
>> -extern void shmem_uncharge(struct inode *inode, long pages);
>>
>> #ifdef CONFIG_USERFAULTFD
>> #ifdef CONFIG_SHMEM
>>
>>
>
> Agreed, I would like to let this patch proceed and then immediately follow up patch
> along the lines of CONFIG_SHMEM as separate independent patch (independent of this
> series). What do you think?
Let's do it properly right away, no need to hurry that much.
--
Cheers
David