From: Vernon Yang <yanglincheng@kylinos.cn>
When the system memory is sufficient, allocating memory is always
successful, but when the tmpfs size is small (e.g. 1MB), the allocation
falls back directly from 2MB to 4KB, and the other small granularities
(8KB ~ 1024KB) are not tried.

Therefore, add a check of whether the remaining space of tmpfs is
sufficient for the allocation. If there is too little space left, try a
smaller large folio.
Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
---
mm/shmem.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index 8c592c6db2a0..b20affd57b23 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1820,6 +1820,7 @@ static unsigned long shmem_suitable_orders(struct inode *inode, struct vm_fault
 					   unsigned long orders)
 {
 	struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	pgoff_t aligned_index;
 	unsigned long pages;
 	int order;
@@ -1835,6 +1836,18 @@ static unsigned long shmem_suitable_orders(struct inode *inode, struct vm_fault
 	while (orders) {
 		pages = 1UL << order;
 		aligned_index = round_down(index, pages);
+
+		/*
+		 * Check whether the remaining space of tmpfs is sufficient for
+		 * allocation. If there is too little space left, try smaller
+		 * large folio.
+		 */
+		if (sbinfo->max_blocks && percpu_counter_read(&sbinfo->used_blocks)
+		    + pages > sbinfo->max_blocks) {
+			order = next_order(&orders, order);
+			continue;
+		}
+
 		/*
 		 * Check for conflict before waiting on a huge allocation.
 		 * Conflict might be that a huge page has just been allocated
--
2.51.0
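
To make the intent of the added check concrete, here is a minimal user-space
sketch: walk the allowed orders from largest to smallest and skip any order
whose page count no longer fits in the remaining tmpfs blocks. The function
and values below are purely illustrative (the field names mirror
sbinfo->max_blocks and used_blocks from the patch, but this is not the kernel
code path):

/* Illustrative user-space model only; not the kernel implementation. */
#include <stdio.h>

/* Pick the largest allowed order whose pages still fit in the free blocks. */
static int pick_order(unsigned long orders, unsigned long max_blocks,
		      unsigned long used_blocks)
{
	for (int order = 8 * (int)sizeof(unsigned long) - 1; order >= 0; order--) {
		unsigned long pages;

		if (!(orders & (1UL << order)))
			continue;
		pages = 1UL << order;
		/* Same idea as the added check: too little space left, try smaller. */
		if (max_blocks && used_blocks + pages > max_blocks)
			continue;
		return order;
	}
	return 0;	/* nothing fits: fall back to a single 4KB page */
}

int main(void)
{
	/* 1MB tmpfs = 256 blocks of 4KB, one block already used, orders 1..9 allowed. */
	unsigned long orders = 0x3fe;

	printf("chosen order: %d\n", pick_order(orders, 256, 1));
	return 0;
}

With a 1MB tmpfs that already has 4KB in use, this picks order 7 (512KB)
rather than jumping straight to order 0, which is the behaviour the patch
argues for.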
On 2025/9/8 20:31, Vernon Yang wrote:
> From: Vernon Yang <yanglincheng@kylinos.cn>
>
> When the system memory is sufficient, allocating memory is always
> successful, but when tmpfs size is low (e.g. 1MB), it falls back
> directly from 2MB to 4KB, and other small granularity (8KB ~ 1024KB)
> will not be tried.
>
> Therefore add check whether the remaining space of tmpfs is sufficient
> for allocation. If there is too little space left, try smaller large
> folio.
I don't think so.
For a tmpfs mount with 'huge=within_size' and 'size=1M', if you try to
write 1M of data, it will allocate an order-8 large folio and will not
fall back to order 0.

For a tmpfs mount with 'huge=always' and 'size=1M', if you try to write
1M of data, it will not completely fall back to order 0 either; instead, it
will still allocate some order-1 to order-7 large folios.
I'm not sure if this is your actual user scenario. If your files are
small and you are concerned about not getting large folio allocations, I
recommend using the 'huge=within_size' mount option.
> Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
No, this doesn't fix anything.
> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> ---
> mm/shmem.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8c592c6db2a0..b20affd57b23 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1820,6 +1820,7 @@ static unsigned long shmem_suitable_orders(struct inode *inode, struct vm_fault
> unsigned long orders)
> {
> struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
> + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> pgoff_t aligned_index;
> unsigned long pages;
> int order;
> @@ -1835,6 +1836,18 @@ static unsigned long shmem_suitable_orders(struct inode *inode, struct vm_fault
> while (orders) {
> pages = 1UL << order;
> aligned_index = round_down(index, pages);
> +
> + /*
> + * Check whether the remaining space of tmpfs is sufficient for
> + * allocation. If there is too little space left, try smaller
> + * large folio.
> + */
> + if (sbinfo->max_blocks && percpu_counter_read(&sbinfo->used_blocks)
> + + pages > sbinfo->max_blocks) {
> + order = next_order(&orders, order);
> + continue;
> + }
> +
> /*
> * Check for conflict before waiting on a huge allocation.
> * Conflict might be that a huge page has just been allocated
> On Sep 9, 2025, at 13:58, Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/9/8 20:31, Vernon Yang wrote:
>> From: Vernon Yang <yanglincheng@kylinos.cn>
>> When the system memory is sufficient, allocating memory is always
>> successful, but when tmpfs size is low (e.g. 1MB), it falls back
>> directly from 2MB to 4KB, and other small granularity (8KB ~ 1024KB)
>> will not be tried.
>> Therefore add check whether the remaining space of tmpfs is sufficient
>> for allocation. If there is too little space left, try smaller large
>> folio.
>
> I don't think so.
>
> For a tmpfs mount with 'huge=within_size' and 'size=1M', if you try to write 1M data, it will allocate an order 8 large folio and will not fallback to order 0.
>
> For a tmpfs mount with 'huge=always' and 'size=1M', if you try to write 1M data, it will not completely fallback to order 0 either, instead, it will still allocate some order 1 to order 7 large folios.
>
> I'm not sure if this is your actual user scenario. If your files are small and you are concerned about not getting large folio allocations, I recommend using the 'huge=within_size' mount option.
>
No, this is not my user scenario.
Based on your previous patch [1], this scenario can be easily reproduced as
follows.
$ mount -t tmpfs -o size=1024K,huge=always tmpfs /xxx/test
$ echo hello > /xxx/test/README
$ df -h
tmpfs 1.0M 4.0K 1020K 1% /xxx/test
The code logic is as follows:

shmem_get_folio_gfp()
  orders = shmem_allowable_huge_orders()
  shmem_alloc_and_add_folio(orders)          return -ENOSPC;
    shmem_alloc_folio()                      alloc 2MB
    shmem_inode_acct_blocks()
      percpu_counter_limited_add()           goto unacct;
    filemap_remove_folio()
  shmem_alloc_and_add_folio(order = 0)

As long as the remaining tmpfs space is too small and the system can still
allocate 2MB of memory, the above path will be triggered.
[1] https://lore.kernel.org/linux-mm/10e7ac6cebe6535c137c064d5c5a235643eebb4a.1756888965.git.baolin.wang@linux.alibaba.com/
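
For clarity, a toy user-space model of the path traced above (the struct and
helper names are made up for illustration, and only the largest order plus
order 0 are attempted, which is effectively what happens today without the
extra check):

/* Toy user-space model of the trace above; not kernel code. */
#include <stdio.h>
#include <stdbool.h>

struct sb_limits {
	unsigned long max_blocks;	/* tmpfs 'size' in 4KB blocks */
	unsigned long used_blocks;	/* blocks already charged */
};

/* Models the effect of shmem_inode_acct_blocks(): charge the blocks or refuse. */
static bool acct_blocks(struct sb_limits *sb, unsigned long pages)
{
	if (sb->max_blocks && sb->used_blocks + pages > sb->max_blocks)
		return false;		/* the limited add fails, -ENOSPC */
	sb->used_blocks += pages;
	return true;
}

int main(void)
{
	/* size=1024K tmpfs with 4KB already used, as in the df output above. */
	struct sb_limits sb = { .max_blocks = 256, .used_blocks = 1 };

	if (!acct_blocks(&sb, 512))	/* 2MB (order-9) attempt is refused */
		printf("2MB charge refused -> folio dropped, retry smaller\n");
	if (acct_blocks(&sb, 1))	/* today the retry goes straight to order 0 */
		printf("order-0 page charged, used_blocks=%lu\n", sb.used_blocks);
	return 0;
}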
>> Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
>
> No, this doesn't fix anything.
>
>> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
>> ---
>> mm/shmem.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 8c592c6db2a0..b20affd57b23 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1820,6 +1820,7 @@ static unsigned long shmem_suitable_orders(struct inode *inode, struct vm_fault
>> unsigned long orders)
>> {
>> struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
>> + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>> pgoff_t aligned_index;
>> unsigned long pages;
>> int order;
>> @@ -1835,6 +1836,18 @@ static unsigned long shmem_suitable_orders(struct inode *inode, struct vm_fault
>> while (orders) {
>> pages = 1UL << order;
>> aligned_index = round_down(index, pages);
>> +
>> + /*
>> + * Check whether the remaining space of tmpfs is sufficient for
>> + * allocation. If there is too little space left, try smaller
>> + * large folio.
>> + */
>> + if (sbinfo->max_blocks && percpu_counter_read(&sbinfo->used_blocks)
>> + + pages > sbinfo->max_blocks) {
>> + order = next_order(&orders, order);
>> + continue;
>> + }
>> +
>> /*
>> * Check for conflict before waiting on a huge allocation.
>> * Conflict might be that a huge page has just been allocated
>
On 2025/9/9 20:29, Vernon Yang wrote:
> No, this is not my user scenario.
>
> Based on your previous patch [1], this scenario can be easily reproduced as
> follows.
>
> $ mount -t tmpfs -o size=1024K,huge=always tmpfs /xxx/test
> $ echo hello > /xxx/test/README
> $ df -h
> tmpfs 1.0M 4.0K 1020K 1% /xxx/test
>
> The code logic is as follows:
>
> shmem_get_folio_gfp()
>   orders = shmem_allowable_huge_orders()
>   shmem_alloc_and_add_folio(orders)          return -ENOSPC;
>     shmem_alloc_folio()                      alloc 2MB
>     shmem_inode_acct_blocks()
>       percpu_counter_limited_add()           goto unacct;
>     filemap_remove_folio()
>   shmem_alloc_and_add_folio(order = 0)
>
> As long as the remaining tmpfs space is too small and the system can still
> allocate 2MB of memory, the above path will be triggered.

In your scenario, wouldn't allocating 4K be more reasonable? Using a 1M
large folio would waste memory. Moreover, if you want to use a large folio,
I think you could increase the 'size' mount option. To me, this doesn't seem
like a real-world usage scenario, instead it looks more like a contrived
test case for a specific situation.

Sorry, this still doesn't convince me.
On Mon, Sep 22, 2025 at 09:46:53AM +0800, Baolin Wang wrote:
> In your scenario, wouldn't allocating 4K be more reasonable? Using a 1M
> large folio would waste memory. Moreover, if you want to use a large folio,
> I think you could increase the 'size' mount option. To me, this doesn't seem
> like a real-world usage scenario, instead it looks more like a contrived
> test case for a specific situation.

The previous example is just an easy demo to reproduce, and if someone
uses this example in the real world, of course the best method is to
increase the 'size'.

But the scenario I want to express here is that when the tmpfs space is
*consumed* to less than 2MB, only 4KB will be allocated. You can imagine
a tmpfs that is constantly being consumed while someone else is reclaiming
or freeing memory, so the free space often stays in the range of [0~2MB);
then tmpfs will always allocate only 4KB.

> Sorry, this still doesn't convince me.
On 2025/9/22 10:51, Vernon Yang wrote:
> The previous example is just an easy demo to reproduce, and if someone
> uses this example in the real world, of course the best method is to
> increase the 'size'.
>
> But the scenario I want to express here is that when the tmpfs space is
> *consumed* to less than 2MB, only 4KB will be allocated. You can imagine
> a tmpfs that is constantly being consumed while someone else is reclaiming
> or freeing memory, so the free space often stays in the range of [0~2MB);
> then tmpfs will always allocate only 4KB.

Please increase your 'size' mount option for testing. I don't see why we
need to add more such logic without a solid reason.

Andrew, please drop this patch.
On Mon, 8 Sep 2025 20:31:28 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:

> From: Vernon Yang <yanglincheng@kylinos.cn>
>
> When the system memory is sufficient, allocating memory is always
> successful, but when tmpfs size is low (e.g. 1MB), it falls back
> directly from 2MB to 4KB, and other small granularity (8KB ~ 1024KB)
> will not be tried.
>
> Therefore add check whether the remaining space of tmpfs is sufficient
> for allocation. If there is too little space left, try smaller large
> folio.

Thanks.

What are the effects of this change? I'm assuming it's an
*improvement*, rather than a fix for some misbehavior?
> On Sep 9, 2025, at 07:22, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Thanks.
>
> What are the effects of this change? I'm assuming it's an
> *improvement*, rather than a fix for some misbehavior?

When we use tmpfs and its space is getting smaller and smaller (e.g. less
than 2MB remaining), it can still allocate 8KB~1MB large folios instead of
falling back directly to 4KB.

Thank you for your feedback.