[v1] Support large folios for tmpfs

[PATCH 0/4] Support large folios for tmpfs

Posted by Baolin Wang 1 year, 3 months ago

Traditionally, tmpfs only supported PMD-sized huge folios. However, now that
other file systems support large folios of any size, and anonymous memory has
been extended to support mTHP, we should not restrict tmpfs to allocating only
PMD-sized huge folios, which makes it a special case. Instead, we should allow
tmpfs to allocate large folios of any size.

Considering that tmpfs already has the 'huge=' option to control huge folio
allocation, we can extend the 'huge=' option to allow huge folios of any
size. The semantics of the 'huge=' mount option are:

huge=never: no huge folios of any size
huge=always: huge folios of any size
huge=within_size: like 'always' but respect the i_size
huge=advise: like 'always' if requested with fadvise()/madvise()
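
The four policies listed above can be modelled as a small decision function. This is only a simplified sketch, not the code in mm/shmem.c: the function name, its parameters, the 4 KiB page size and the exact form of the within_size check are assumptions made for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB base pages


def huge_allowed(huge, index, order, i_size, advised=False):
    """Hypothetical model: may a folio of 2**order pages cover page `index`?"""
    if huge == "never":
        return False
    if huge == "always":
        return True
    if huge == "advise":
        # Only when the range was requested with fadvise()/madvise().
        return advised
    if huge == "within_size":
        nr_pages = 1 << order
        # End (in pages) of the naturally aligned folio containing `index`.
        aligned_end = (index // nr_pages + 1) * nr_pages
        # i_size in pages, rounded up to a whole page.
        size_pages = (i_size + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT
        return size_pages >= aligned_end
    raise ValueError(f"unknown huge= value: {huge!r}")
```

In this model, a 64 KiB folio (order 4) at index 0 is allowed under within_size only once i_size reaches 64 KiB.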

Note: for tmpfs mmap() faults, due to the lack of a write size hint, we still
allocate PMD-sized huge folios if huge=always/within_size/advise is set.
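
To illustrate why the write size hint matters: a write() carries a length from which a folio order can be derived, while a fault has no such hint. The sketch below is an assumption-laden model, not the helper added by this series. The function names, the 4 KiB page size, a PMD order of 9 and the cap at the PMD order are all hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB base pages
PMD_ORDER = 9    # assumption: 2 MiB PMD with 4 KiB pages


def order_for_write(length):
    """Largest order whose folio size does not exceed `length` bytes."""
    if length <= 0:
        return 0
    shift = length.bit_length() - 1      # floor(log2(length))
    order = max(shift - PAGE_SHIFT, 0)   # anything up to one page is order 0
    return min(order, PMD_ORDER)         # cap at PMD size (assumption)


def order_for_fault():
    """mmap() faults have no size hint, so the model falls back to PMD order."""
    return PMD_ORDER
```

With these assumptions, a 64 KiB write maps to order 4 and a 100000-byte write also maps to order 4, since the order is rounded down.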

Moreover, the 'deny' and 'force' testing options controlled by
'/sys/kernel/mm/transparent_hugepage/shmem_enabled' still retain the same
semantics. 'deny' disables large folios of any size for tmpfs, while
'force' enables PMD-sized large folios for tmpfs.
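
For anyone checking which of these options is active on a test machine, the sysfs file marks the selected value with square brackets. A minimal sketch for reading it follows; the helper names are hypothetical, and the file may be absent on kernels built without THP support.

```python
import re
from pathlib import Path

SHMEM_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/shmem_enabled")


def selected_policy(text):
    """Return the bracketed token, e.g. 'never' from '... [never] deny force'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_policy(path=SHMEM_ENABLED):
    """Read the sysfs file, or return None if it cannot be read."""
    try:
        return selected_policy(path.read_text())
    except OSError:
        return None


print(current_policy())
```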

Any comments and suggestions are appreciated. Thanks.

Hi David,
I did not add a new Kconfig option to control the default behavior of 'huge='
in the current version. I have not changed the default behavior for now;
let's see whether there is a need for it.

Changes from RFC v3:
 - Drop the huge=write_size option.
 - Allow any sized huge folios for 'huge' option.
 - Update the documentation, per David.

Changes from RFC v2:
 - Drop mTHP interfaces to control huge page allocation, per Matthew.
 - Add a new helper to calculate the order, suggested by Matthew.
 - Add a new huge=write_size option to allocate large folios based on
   the write size.
 - Add a new patch to update the documentation.

Changes from RFC v1:
 - Drop patch 1.
 - Use 'write_end' to calculate the length in shmem_allowable_huge_orders().
 - Update shmem_mapping_size_order() per Daniel.

Baolin Wang (3):
  mm: factor out the order calculation into a new helper
  mm: shmem: change shmem_huge_global_enabled() to return huge order
    bitmap
  mm: shmem: add large folio support for tmpfs

David Hildenbrand (1):
  docs: tmpfs: update the huge folios policy for tmpfs and shmem

 Documentation/admin-guide/mm/transhuge.rst |  52 ++++++---
 include/linux/pagemap.h                    |  16 ++-
 mm/shmem.c                                 | 128 ++++++++++++++++-----
 3 files changed, 146 insertions(+), 50 deletions(-)

-- 
2.39.3

Re: [PATCH 0/4] Support large folios for tmpfs

Posted by David Hildenbrand 1 year, 3 months ago

On 08.11.24 05:12, Baolin Wang wrote:
> Traditionally, tmpfs only supported PMD-sized huge folios. However nowadays
> with other file systems supporting any sized large folios, and extending
> anonymous to support mTHP, we should not restrict tmpfs to allocating only
> PMD-sized huge folios, making it more special. Instead, we should allow
> tmpfs can allocate any sized large folios.
> 
> Considering that tmpfs already has the 'huge=' option to control the huge
> folios allocation, we can extend the 'huge=' option to allow any sized huge
> folios. The semantics of the 'huge=' mount option are:
> 
> huge=never: no any sized huge folios
> huge=always: any sized huge folios
> huge=within_size: like 'always' but respect the i_size
> huge=advise: like 'always' if requested with fadvise()/madvise()
> 
> Note: for tmpfs mmap() faults, due to the lack of a write size hint, still
> allocate the PMD-sized huge folios if huge=always/within_size/advise is set.

So, no fallback to smaller sizes for now in case we fail to allocate a 
PMD one? Of course, this can be added later fairly easily.

> 
> Moreover, the 'deny' and 'force' testing options controlled by
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same
> semantics. The 'deny' can disable any sized large folios for tmpfs, while
> the 'force' can enable PMD sized large folios for tmpfs.
> 
> Any comments and suggestions are appreciated. Thanks.
> 
> Hi David,
> I did not add a new Kconfig option to control the default behavior of 'huge='
> in the current version. I have not changed the default behavior at this
> time, and let's see if there is a need for this.

Likely we want to change the default at some point so people might get a 
benefit in more scenarios automatically. But I did not investigate how 
/tmp is mapped as default by Fedora, for example.

-- 
Cheers,

David / dhildenb

Re: [PATCH 0/4] Support large folios for tmpfs

Posted by Baolin Wang 1 year, 3 months ago


On 2024/11/8 23:30, David Hildenbrand wrote:
> On 08.11.24 05:12, Baolin Wang wrote:
>> Traditionally, tmpfs only supported PMD-sized huge folios. However 
>> nowadays
>> with other file systems supporting any sized large folios, and extending
>> anonymous to support mTHP, we should not restrict tmpfs to allocating 
>> only
>> PMD-sized huge folios, making it more special. Instead, we should allow
>> tmpfs can allocate any sized large folios.
>>
>> Considering that tmpfs already has the 'huge=' option to control the huge
>> folios allocation, we can extend the 'huge=' option to allow any sized 
>> huge
>> folios. The semantics of the 'huge=' mount option are:
>>
>> huge=never: no any sized huge folios
>> huge=always: any sized huge folios
>> huge=within_size: like 'always' but respect the i_size
>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>
>> Note: for tmpfs mmap() faults, due to the lack of a write size hint, 
>> still
>> allocate the PMD-sized huge folios if huge=always/within_size/advise 
>> is set.
> 
> So, no fallback to smaller sizes for now in case we fail to allocate a 
> PMD one? Of course, this can be added later fairly easily.

Right. I have no strong preference on this. If no one objects, I can add 
a fallback to smaller large folios if the PMD sized allocation fails in 
the next version.

>> Moreover, the 'deny' and 'force' testing options controlled by
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the 
>> same
>> semantics. The 'deny' can disable any sized large folios for tmpfs, while
>> the 'force' can enable PMD sized large folios for tmpfs.
>>
>> Any comments and suggestions are appreciated. Thanks.
>>
>> Hi David,
>> I did not add a new Kconfig option to control the default behavior of 
>> 'huge='
>> in the current version. I have not changed the default behavior at this
>> time, and let's see if there is a need for this.
> 
> Likely we want to change the default at some point so people might get a 
> benefit in more scenarios automatically. But I did not investigate how 
> /tmp is mapped as default by Fedora, for example.

Personally, I think adding a cmdline option to change the default value might
be more useful than the Kconfig. Anyway, I still want to investigate if there
is a real need.

Re: [PATCH 0/4] Support large folios for tmpfs

Posted by David Hildenbrand 1 year, 2 months ago

On 09.11.24 08:12, Baolin Wang wrote:
> 
> 
> On 2024/11/8 23:30, David Hildenbrand wrote:
>> On 08.11.24 05:12, Baolin Wang wrote:
>>> Traditionally, tmpfs only supported PMD-sized huge folios. However
>>> nowadays
>>> with other file systems supporting any sized large folios, and extending
>>> anonymous to support mTHP, we should not restrict tmpfs to allocating
>>> only
>>> PMD-sized huge folios, making it more special. Instead, we should allow
>>> tmpfs can allocate any sized large folios.
>>>
>>> Considering that tmpfs already has the 'huge=' option to control the huge
>>> folios allocation, we can extend the 'huge=' option to allow any sized
>>> huge
>>> folios. The semantics of the 'huge=' mount option are:
>>>
>>> huge=never: no any sized huge folios
>>> huge=always: any sized huge folios
>>> huge=within_size: like 'always' but respect the i_size
>>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>>
>>> Note: for tmpfs mmap() faults, due to the lack of a write size hint,
>>> still
>>> allocate the PMD-sized huge folios if huge=always/within_size/advise
>>> is set.
>>
>> So, no fallback to smaller sizes for now in case we fail to allocate a
>> PMD one? Of course, this can be added later fairly easily.
> 
> Right. I have no strong preference on this. If no one objects, I can add
> a fallback to smaller large folios if the PMD sized allocation fails in
> the next version.

I'm fine with a staged approach, to perform this change separately.

> 
>>> Moreover, the 'deny' and 'force' testing options controlled by
>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the
>>> same
>>> semantics. The 'deny' can disable any sized large folios for tmpfs, while
>>> the 'force' can enable PMD sized large folios for tmpfs.
>>>
>>> Any comments and suggestions are appreciated. Thanks.
>>>
>>> Hi David,
>>> I did not add a new Kconfig option to control the default behavior of
>>> 'huge='
>>> in the current version. I have not changed the default behavior at this
>>> time, and let's see if there is a need for this.
>>
>> Likely we want to change the default at some point so people might get a
>> benefit in more scenarios automatically. But I did not investigate how
>> /tmp is mapped as default by Fedora, for example.
> 
> Personally, adding a cmdline to change the default value might be more
> useful than the Kconfig. Anyway, I still want to investigate if there is
> a real need.

Likely both will be reasonable to have.

FWIW, "systemctl cat tmp.mount" on a Fedora40 system tells me
"Options=mode=1777,strictatime,nosuid,nodev,size=50%%,nr_inodes=1m"

To be precise:

$ grep tmpfs /etc/mtab
vendorfw /usr/lib/firmware/vendor tmpfs rw,relatime,mode=755,inode64 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=4063361,mode=755,inode64 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=6511156k,nr_inodes=819200,mode=755,inode64 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,size=16277892k,nr_inodes=1048576,inode64 0 0
tmpfs /run/user/100813 tmpfs rw,nosuid,nodev,relatime,size=3255576k,nr_inodes=813894,mode=700,uid=100813,gid=100813,inode64 0 0
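
The same check can be scripted. The sketch below lists each tmpfs mount with its 'huge=' option, if any; the function name is hypothetical, and it assumes the option shows up in the fourth mtab field when it was set at mount time.

```python
def tmpfs_huge_options(mtab_text):
    """Map each tmpfs mount point to its huge= value, or None if unset."""
    result = {}
    for line in mtab_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "tmpfs":
            continue
        huge = None
        for opt in fields[3].split(","):
            if opt.startswith("huge="):
                huge = opt.split("=", 1)[1]
        result[fields[1]] = huge
    return result


sample = """\
devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=4063361,mode=755,inode64 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,size=16277892k,nr_inodes=1048576,inode64 0 0
"""
print(tmpfs_huge_options(sample))
```

For the mounts shown here, every tmpfs entry maps to None, so none of them would use huge folios unless the default changes.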


Having a way to change the default will likely be extremely helpful.

-- 
Cheers,

David / dhildenb

Re: [PATCH 0/4] Support large folios for tmpfs

Posted by Baolin Wang 1 year, 2 months ago


On 2024/11/12 03:47, David Hildenbrand wrote:
> On 09.11.24 08:12, Baolin Wang wrote:
>>
>>
>> On 2024/11/8 23:30, David Hildenbrand wrote:
>>> On 08.11.24 05:12, Baolin Wang wrote:
>>>> Traditionally, tmpfs only supported PMD-sized huge folios. However
>>>> nowadays
>>>> with other file systems supporting any sized large folios, and 
>>>> extending
>>>> anonymous to support mTHP, we should not restrict tmpfs to allocating
>>>> only
>>>> PMD-sized huge folios, making it more special. Instead, we should allow
>>>> tmpfs can allocate any sized large folios.
>>>>
>>>> Considering that tmpfs already has the 'huge=' option to control the 
>>>> huge
>>>> folios allocation, we can extend the 'huge=' option to allow any sized
>>>> huge
>>>> folios. The semantics of the 'huge=' mount option are:
>>>>
>>>> huge=never: no any sized huge folios
>>>> huge=always: any sized huge folios
>>>> huge=within_size: like 'always' but respect the i_size
>>>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>>>
>>>> Note: for tmpfs mmap() faults, due to the lack of a write size hint,
>>>> still
>>>> allocate the PMD-sized huge folios if huge=always/within_size/advise
>>>> is set.
>>>
>>> So, no fallback to smaller sizes for now in case we fail to allocate a
>>> PMD one? Of course, this can be added later fairly easily.
>>
>> Right. I have no strong preference on this. If no one objects, I can add
>> a fallback to smaller large folios if the PMD sized allocation fails in
>> the next version.
> 
> I'm fine with a staged approach, to perform this change separately.

Sure.

>>>> Moreover, the 'deny' and 'force' testing options controlled by
>>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the
>>>> same
>>>> semantics. The 'deny' can disable any sized large folios for tmpfs, 
>>>> while
>>>> the 'force' can enable PMD sized large folios for tmpfs.
>>>>
>>>> Any comments and suggestions are appreciated. Thanks.
>>>>
>>>> Hi David,
>>>> I did not add a new Kconfig option to control the default behavior of
>>>> 'huge='
>>>> in the current version. I have not changed the default behavior at this
>>>> time, and let's see if there is a need for this.
>>>
>>> Likely we want to change the default at some point so people might get a
>>> benefit in more scenarios automatically. But I did not investigate how
>>> /tmp is mapped as default by Fedora, for example.
>>
>> Personally, adding a cmdline to change the default value might be more
>> useful than the Kconfig. Anyway, I still want to investigate if there is
>> a real need.
> 
> Likely both will be reasonable to have.
> 
> FWIW, "systemctl cat tmp.mount" on a Fedora40 system tells me
> "Options=mode=1777,strictatime,nosuid,nodev,size=50%%,nr_inodes=1m"
> 
> To be precise:
> 
> $ grep tmpfs /etc/mtab
> vendorfw /usr/lib/firmware/vendor tmpfs rw,relatime,mode=755,inode64 0 0
> devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=4063361,mode=755,inode64 0 0
> tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
> tmpfs /run tmpfs rw,nosuid,nodev,size=6511156k,nr_inodes=819200,mode=755,inode64 0 0
> tmpfs /tmp tmpfs rw,nosuid,nodev,size=16277892k,nr_inodes=1048576,inode64 0 0
> tmpfs /run/user/100813 tmpfs rw,nosuid,nodev,relatime,size=3255576k,nr_inodes=813894,mode=700,uid=100813,gid=100813,inode64 0 0
> 
> 
> Having a way to change the default will likely be extremely helpful.

Thanks. I'd like to add a command line option like 
'transparent_hugepage_shmem' to control the default value.