Traditionally, tmpfs only supported PMD-sized huge folios. However,
nowadays, with other file systems supporting large folios of any size,
and with anonymous memory extended to support mTHP, we should not
restrict tmpfs to allocating only PMD-sized huge folios, which makes it
a special case. Instead, we should allow tmpfs to allocate large folios
of any size.

Considering that tmpfs already has the 'huge=' option to control huge
folio allocation, we can extend the 'huge=' option to allow huge folios
of any size. The semantics of the 'huge=' mount option are:

huge=never: no huge folios of any size
huge=always: huge folios of any size
huge=within_size: like 'always' but respect i_size
huge=advise: like 'always' if requested with fadvise()/madvise()

Note: for tmpfs mmap() faults, due to the lack of a write size hint, we
still allocate PMD-sized huge folios if huge=always/within_size/advise
is set.

Moreover, the 'deny' and 'force' testing options controlled by
'/sys/kernel/mm/transparent_hugepage/shmem_enabled' still retain the
same semantics. 'deny' disables large folios of any size for tmpfs,
while 'force' enables PMD-sized large folios for tmpfs.

Any comments and suggestions are appreciated. Thanks.

Hi David,
I did not add a new Kconfig option to control the default behavior of
'huge=' in the current version. I have not changed the default behavior
at this time; let's see if there is a need for this.

Changes from RFC v3:
 - Drop the huge=write_size option.
 - Allow huge folios of any size for the 'huge=' option.
 - Update the documentation, per David.

Changes from RFC v2:
 - Drop mTHP interfaces to control huge page allocation, per Matthew.
 - Add a new helper to calculate the order, suggested by Matthew.
 - Add a new huge=write_size option to allocate large folios based on
   the write size.
 - Add a new patch to update the documentation.

Changes from RFC v1:
 - Drop patch 1.
 - Use 'write_end' to calculate the length in
   shmem_allowable_huge_orders().
 - Update shmem_mapping_size_order() per Daniel.

Baolin Wang (3):
  mm: factor out the order calculation into a new helper
  mm: shmem: change shmem_huge_global_enabled() to return huge order bitmap
  mm: shmem: add large folio support for tmpfs

David Hildenbrand (1):
  docs: tmpfs: update the huge folios policy for tmpfs and shmem

 Documentation/admin-guide/mm/transhuge.rst |  52 ++++++---
 include/linux/pagemap.h                    |  16 ++-
 mm/shmem.c                                 | 128 ++++++++++++++++-----
 3 files changed, 146 insertions(+), 50 deletions(-)

-- 
2.39.3
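The huge= semantics above, together with the mmap() fault note and the 'deny'/'force' overrides, can be modelled as a bitmap of permitted folio orders. The following is a minimal, hypothetical sketch, not the code in mm/shmem.c: every identifier is invented for illustration; it assumes 4 KiB base pages with a PMD order of 9 (2 MiB, as on x86-64); it treats orders 1 to 9 as the candidate set; and it models the within_size rule on the PMD-only check shmem has used (the naturally aligned folio must end within the page-rounded i_size). The write-order helper is likewise an assumption about how a write-size hint could pick an order, not the series' helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry (x86-64 style): 4 KiB base pages, PMD order 9 = 2 MiB. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_PMD_ORDER 9
/* Assumed candidate set for regular writes: orders 1..SKETCH_PMD_ORDER. */
#define SKETCH_ALL_ORDERS (((1UL << (SKETCH_PMD_ORDER + 1)) - 1) & ~1UL)

/* Hypothetical names for the four huge= mount values. */
enum sketch_huge_mode { HUGE_NEVER, HUGE_ALWAYS, HUGE_WITHIN_SIZE, HUGE_ADVISE };
/* Hypothetical names for the shmem_enabled testing overrides. */
enum sketch_sysfs { SYSFS_NONE, SYSFS_DENY, SYSFS_FORCE };

/*
 * Return the bitmap of folio orders allowed at page @index of a file whose
 * size is @i_size bytes. @advised models fadvise()/madvise(); @is_fault
 * models an mmap() fault, which has no write-size hint and so only
 * considers the PMD order.
 */
unsigned long sketch_allowed_orders(enum sketch_huge_mode mode,
				    enum sketch_sysfs sysfs, bool advised,
				    bool is_fault, uint64_t index,
				    uint64_t i_size)
{
	unsigned long candidates, allowed = 0;
	uint64_t size_pages;

	if (sysfs == SYSFS_DENY)
		return 0;                        /* no large folios at all */
	if (sysfs == SYSFS_FORCE)
		return 1UL << SKETCH_PMD_ORDER;  /* PMD-sized folios only */
	if (mode == HUGE_NEVER || (mode == HUGE_ADVISE && !advised))
		return 0;

	candidates = is_fault ? 1UL << SKETCH_PMD_ORDER : SKETCH_ALL_ORDERS;
	if (mode != HUGE_WITHIN_SIZE)
		return candidates;

	/* within_size: the naturally aligned folio must end within i_size. */
	size_pages = (i_size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	for (int order = 1; order <= SKETCH_PMD_ORDER; order++) {
		uint64_t nr = 1ULL << order;
		uint64_t end = (index / nr + 1) * nr; /* round_up(index + 1, nr) */

		if ((candidates & (1UL << order)) && size_pages >= end)
			allowed |= 1UL << order;
	}
	return allowed;
}

/*
 * Hypothetical write-size helper: the largest allowed order whose folio
 * fits within @len bytes and is naturally aligned at @index (0 = one page).
 */
int sketch_write_order(unsigned long allowed, uint64_t index, uint64_t len)
{
	for (int order = SKETCH_PMD_ORDER; order > 0; order--) {
		uint64_t nr = 1ULL << order;

		if ((allowed & (1UL << order)) &&
		    nr * SKETCH_PAGE_SIZE <= len && (index & (nr - 1)) == 0)
			return order;
	}
	return 0;
}
```

For example, with huge=within_size and a 64 KiB file, page 0 may use orders 1 to 4 (up to 16 pages), while an mmap() fault at the same index gets no large folio, because a 2 MiB folio would extend past i_size.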
On 08.11.24 05:12, Baolin Wang wrote:
> Traditionally, tmpfs only supported PMD-sized huge folios. However,
> nowadays, with other file systems supporting large folios of any size,
> and with anonymous memory extended to support mTHP, we should not
> restrict tmpfs to allocating only PMD-sized huge folios, which makes it
> a special case. Instead, we should allow tmpfs to allocate large folios
> of any size.
>
> Considering that tmpfs already has the 'huge=' option to control huge
> folio allocation, we can extend the 'huge=' option to allow huge folios
> of any size. The semantics of the 'huge=' mount option are:
>
> huge=never: no huge folios of any size
> huge=always: huge folios of any size
> huge=within_size: like 'always' but respect i_size
> huge=advise: like 'always' if requested with fadvise()/madvise()
>
> Note: for tmpfs mmap() faults, due to the lack of a write size hint, we
> still allocate PMD-sized huge folios if huge=always/within_size/advise
> is set.

So, no fallback to smaller sizes for now in case we fail to allocate a
PMD-sized one? Of course, this can be added later fairly easily.

> Moreover, the 'deny' and 'force' testing options controlled by
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled' still retain the
> same semantics. 'deny' disables large folios of any size for tmpfs,
> while 'force' enables PMD-sized large folios for tmpfs.
>
> Any comments and suggestions are appreciated. Thanks.
>
> Hi David,
> I did not add a new Kconfig option to control the default behavior of
> 'huge=' in the current version. I have not changed the default behavior
> at this time; let's see if there is a need for this.

Likely we want to change the default at some point so people might get a
benefit in more scenarios automatically. But I did not investigate how
/tmp is mounted by default on Fedora, for example.

-- 
Cheers,

David / dhildenb
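David's question concerns what happens when the PMD-sized allocation fails. A fallback of the kind discussed here could walk the allowed orders from largest to smallest. This is a hypothetical sketch: the function names and the callback-style allocator are invented for illustration, and it assumes a folio of order n must start at a page index that is a multiple of 2^n (natural alignment), skipping orders that would violate that.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ORDER 9 /* assumed: PMD order with 4 KiB pages */

/* Hypothetical allocator callback: returns true if an order-@order folio was obtained. */
typedef bool (*sketch_alloc_fn)(int order, void *ctx);

/*
 * Try the allowed orders from largest to smallest, skipping orders that
 * @index is not naturally aligned for, then fall back to a single page
 * (order 0). Returns the order that succeeded, or -1 if nothing did.
 */
int sketch_alloc_with_fallback(unsigned long allowed, uint64_t index,
			       sketch_alloc_fn try_alloc, void *ctx)
{
	for (int order = SKETCH_MAX_ORDER; order > 0; order--) {
		if (!(allowed & (1UL << order)))
			continue;
		if (index & ((1ULL << order) - 1))
			continue; /* folio would not be naturally aligned */
		if (try_alloc(order, ctx))
			return order;
	}
	return try_alloc(0, ctx) ? 0 : -1;
}

/* Toy allocator for illustration: succeeds only for orders <= *(int *)ctx. */
bool sketch_toy_alloc(int order, void *ctx)
{
	return order <= *(const int *)ctx;
}
```

With only the PMD bit set in `allowed`, the sketch either returns 9 or drops straight to a single page; setting lower-order bits is what a staged fallback to smaller large folios would add.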
On 2024/11/8 23:30, David Hildenbrand wrote:
> On 08.11.24 05:12, Baolin Wang wrote:
>> Traditionally, tmpfs only supported PMD-sized huge folios. However,
>> nowadays, with other file systems supporting large folios of any size,
>> and with anonymous memory extended to support mTHP, we should not
>> restrict tmpfs to allocating only PMD-sized huge folios, which makes it
>> a special case. Instead, we should allow tmpfs to allocate large folios
>> of any size.
>>
>> Considering that tmpfs already has the 'huge=' option to control huge
>> folio allocation, we can extend the 'huge=' option to allow huge folios
>> of any size. The semantics of the 'huge=' mount option are:
>>
>> huge=never: no huge folios of any size
>> huge=always: huge folios of any size
>> huge=within_size: like 'always' but respect i_size
>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>
>> Note: for tmpfs mmap() faults, due to the lack of a write size hint, we
>> still allocate PMD-sized huge folios if huge=always/within_size/advise
>> is set.
>
> So, no fallback to smaller sizes for now in case we fail to allocate a
> PMD-sized one? Of course, this can be added later fairly easily.

Right. I have no strong preference on this. If no one objects, I can add
a fallback to smaller large folios if the PMD-sized allocation fails in
the next version.

>> Moreover, the 'deny' and 'force' testing options controlled by
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled' still retain the
>> same semantics. 'deny' disables large folios of any size for tmpfs,
>> while 'force' enables PMD-sized large folios for tmpfs.
>>
>> Any comments and suggestions are appreciated. Thanks.
>>
>> Hi David,
>> I did not add a new Kconfig option to control the default behavior of
>> 'huge=' in the current version. I have not changed the default behavior
>> at this time; let's see if there is a need for this.
>
> Likely we want to change the default at some point so people might get a
> benefit in more scenarios automatically. But I did not investigate how
> /tmp is mounted by default on Fedora, for example.

Personally, I think a command-line option to change the default value
might be more useful than a Kconfig option. Anyway, I still want to
investigate whether there is a real need.
On 09.11.24 08:12, Baolin Wang wrote:
> On 2024/11/8 23:30, David Hildenbrand wrote:
>> On 08.11.24 05:12, Baolin Wang wrote:
>>> Traditionally, tmpfs only supported PMD-sized huge folios. However,
>>> nowadays, with other file systems supporting large folios of any size,
>>> and with anonymous memory extended to support mTHP, we should not
>>> restrict tmpfs to allocating only PMD-sized huge folios, which makes it
>>> a special case. Instead, we should allow tmpfs to allocate large folios
>>> of any size.
>>>
>>> Considering that tmpfs already has the 'huge=' option to control huge
>>> folio allocation, we can extend the 'huge=' option to allow huge folios
>>> of any size. The semantics of the 'huge=' mount option are:
>>>
>>> huge=never: no huge folios of any size
>>> huge=always: huge folios of any size
>>> huge=within_size: like 'always' but respect i_size
>>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>>
>>> Note: for tmpfs mmap() faults, due to the lack of a write size hint, we
>>> still allocate PMD-sized huge folios if huge=always/within_size/advise
>>> is set.
>>
>> So, no fallback to smaller sizes for now in case we fail to allocate a
>> PMD-sized one? Of course, this can be added later fairly easily.
>
> Right. I have no strong preference on this. If no one objects, I can add
> a fallback to smaller large folios if the PMD-sized allocation fails in
> the next version.

I'm fine with a staged approach, to perform this change separately.

>>> Moreover, the 'deny' and 'force' testing options controlled by
>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled' still retain the
>>> same semantics. 'deny' disables large folios of any size for tmpfs,
>>> while 'force' enables PMD-sized large folios for tmpfs.
>>>
>>> Any comments and suggestions are appreciated. Thanks.
>>>
>>> Hi David,
>>> I did not add a new Kconfig option to control the default behavior of
>>> 'huge=' in the current version. I have not changed the default behavior
>>> at this time; let's see if there is a need for this.
>>
>> Likely we want to change the default at some point so people might get a
>> benefit in more scenarios automatically. But I did not investigate how
>> /tmp is mounted by default on Fedora, for example.
>
> Personally, I think a command-line option to change the default value
> might be more useful than a Kconfig option. Anyway, I still want to
> investigate whether there is a real need.

Likely both will be reasonable to have.

FWIW, "systemctl cat tmp.mount" on a Fedora 40 system tells me
"Options=mode=1777,strictatime,nosuid,nodev,size=50%%,nr_inodes=1m"

To be precise:

$ grep tmpfs /etc/mtab
vendorfw /usr/lib/firmware/vendor tmpfs rw,relatime,mode=755,inode64 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=4063361,mode=755,inode64 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=6511156k,nr_inodes=819200,mode=755,inode64 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,size=16277892k,nr_inodes=1048576,inode64 0 0
tmpfs /run/user/100813 tmpfs rw,nosuid,nodev,relatime,size=3255576k,nr_inodes=813894,mode=700,uid=100813,gid=100813,inode64 0 0

Having a way to change the default will likely be extremely helpful.

-- 
Cheers,

David / dhildenb
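David's listing shows that none of the tmpfs mounts on that Fedora 40 system passes a huge= option, so all of them get the tmpfs default, huge=never. A small sketch can report this from any mtab-formatted input (for example /proc/mounts) using getmntent() and hasmntopt() from <mntent.h>. The function name is hypothetical, and the caller is assumed to supply already-opened input and output streams.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read mtab-formatted entries from @in (for example
 * /proc/mounts), print each tmpfs mount's huge= setting to @out, and return
 * how many tmpfs mounts have no huge= option, i.e. use the default (never).
 */
int sketch_report_tmpfs_huge(FILE *in, FILE *out)
{
	struct mntent *m;
	int on_default = 0;

	while ((m = getmntent(in)) != NULL) {
		char *opt;

		if (strcmp(m->mnt_type, "tmpfs") != 0)
			continue; /* skips devtmpfs and other types */

		opt = hasmntopt(m, "huge");
		if (opt == NULL) {
			on_default++;
			fprintf(out, "%s: huge=never (default)\n", m->mnt_dir);
		} else {
			/* Print just this option, up to the next comma. */
			fprintf(out, "%s: %.*s\n", m->mnt_dir,
				(int)strcspn(opt, ","), opt);
		}
	}
	return on_default;
}
```

Fed the six lines above, the sketch reports five tmpfs mounts on the default; the devtmpfs entry has a different filesystem type and is skipped.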
On 2024/11/12 03:47, David Hildenbrand wrote:
> On 09.11.24 08:12, Baolin Wang wrote:
>> On 2024/11/8 23:30, David Hildenbrand wrote:
>>> On 08.11.24 05:12, Baolin Wang wrote:
>>>> Traditionally, tmpfs only supported PMD-sized huge folios. However,
>>>> nowadays, with other file systems supporting large folios of any size,
>>>> and with anonymous memory extended to support mTHP, we should not
>>>> restrict tmpfs to allocating only PMD-sized huge folios, which makes it
>>>> a special case. Instead, we should allow tmpfs to allocate large folios
>>>> of any size.
>>>>
>>>> Considering that tmpfs already has the 'huge=' option to control huge
>>>> folio allocation, we can extend the 'huge=' option to allow huge folios
>>>> of any size. The semantics of the 'huge=' mount option are:
>>>>
>>>> huge=never: no huge folios of any size
>>>> huge=always: huge folios of any size
>>>> huge=within_size: like 'always' but respect i_size
>>>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>>>
>>>> Note: for tmpfs mmap() faults, due to the lack of a write size hint, we
>>>> still allocate PMD-sized huge folios if huge=always/within_size/advise
>>>> is set.
>>>
>>> So, no fallback to smaller sizes for now in case we fail to allocate a
>>> PMD-sized one? Of course, this can be added later fairly easily.
>>
>> Right. I have no strong preference on this. If no one objects, I can add
>> a fallback to smaller large folios if the PMD-sized allocation fails in
>> the next version.
>
> I'm fine with a staged approach, to perform this change separately.

Sure.

>>>> Moreover, the 'deny' and 'force' testing options controlled by
>>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled' still retain the
>>>> same semantics. 'deny' disables large folios of any size for tmpfs,
>>>> while 'force' enables PMD-sized large folios for tmpfs.
>>>>
>>>> Any comments and suggestions are appreciated. Thanks.
>>>>
>>>> Hi David,
>>>> I did not add a new Kconfig option to control the default behavior of
>>>> 'huge=' in the current version. I have not changed the default behavior
>>>> at this time; let's see if there is a need for this.
>>>
>>> Likely we want to change the default at some point so people might get a
>>> benefit in more scenarios automatically. But I did not investigate how
>>> /tmp is mounted by default on Fedora, for example.
>>
>> Personally, I think a command-line option to change the default value
>> might be more useful than a Kconfig option. Anyway, I still want to
>> investigate whether there is a real need.
>
> Likely both will be reasonable to have.
>
> FWIW, "systemctl cat tmp.mount" on a Fedora 40 system tells me
> "Options=mode=1777,strictatime,nosuid,nodev,size=50%%,nr_inodes=1m"
>
> To be precise:
>
> $ grep tmpfs /etc/mtab
> vendorfw /usr/lib/firmware/vendor tmpfs rw,relatime,mode=755,inode64 0 0
> devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=4063361,mode=755,inode64 0 0
> tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
> tmpfs /run tmpfs rw,nosuid,nodev,size=6511156k,nr_inodes=819200,mode=755,inode64 0 0
> tmpfs /tmp tmpfs rw,nosuid,nodev,size=16277892k,nr_inodes=1048576,inode64 0 0
> tmpfs /run/user/100813 tmpfs rw,nosuid,nodev,relatime,size=3255576k,nr_inodes=813894,mode=700,uid=100813,gid=100813,inode64 0 0
>
> Having a way to change the default will likely be extremely helpful.

Thanks. I'd like to add a command-line option such as
'transparent_hugepage_shmem' to control the default value.
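Baolin proposes a command-line option such as 'transparent_hugepage_shmem' to control the default value. Only the option name comes from the thread; everything else below is an assumption made for illustration, not the kernel's boot-parameter handling. That includes the accepted values (the four huge= values), the rule that the last valid occurrence wins, and keeping never when the option is missing or invalid. The sketch only shows the intended precedence: an explicit huge= on the mount wins, otherwise the boot-time default applies.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum mirroring the four huge= values. */
enum sketch_boot_huge {
	BOOT_HUGE_NEVER,
	BOOT_HUGE_ALWAYS,
	BOOT_HUGE_WITHIN_SIZE,
	BOOT_HUGE_ADVISE,
};

static const char *const sketch_boot_names[] = {
	[BOOT_HUGE_NEVER] = "never",
	[BOOT_HUGE_ALWAYS] = "always",
	[BOOT_HUGE_WITHIN_SIZE] = "within_size",
	[BOOT_HUGE_ADVISE] = "advise",
};

/* Match exactly @len bytes of @val against the known values. */
static bool sketch_parse_value(const char *val, size_t len,
			       enum sketch_boot_huge *out)
{
	size_t n = sizeof(sketch_boot_names) / sizeof(sketch_boot_names[0]);

	for (size_t i = 0; i < n; i++) {
		if (strlen(sketch_boot_names[i]) == len &&
		    strncmp(val, sketch_boot_names[i], len) == 0) {
			*out = (enum sketch_boot_huge)i;
			return true;
		}
	}
	return false;
}

/*
 * Scan a space-separated command line for
 * "transparent_hugepage_shmem=<value>". The last valid occurrence wins;
 * a missing or unknown value keeps the current default, never.
 */
enum sketch_boot_huge sketch_default_huge(const char *cmdline)
{
	static const char key[] = "transparent_hugepage_shmem=";
	const size_t keylen = sizeof(key) - 1;
	enum sketch_boot_huge mode = BOOT_HUGE_NEVER;
	const char *p = cmdline;

	while (*p) {
		size_t toklen;
		enum sketch_boot_huge parsed;

		p += strspn(p, " ");
		toklen = strcspn(p, " ");
		if (toklen > keylen && strncmp(p, key, keylen) == 0 &&
		    sketch_parse_value(p + keylen, toklen - keylen, &parsed))
			mode = parsed;
		p += toklen;
	}
	return mode;
}

/* An explicit huge= on the mount wins over the boot-time default. */
enum sketch_boot_huge sketch_effective_huge(bool mount_has_huge,
					    enum sketch_boot_huge mount_huge,
					    enum sketch_boot_huge boot_default)
{
	return mount_has_huge ? mount_huge : boot_default;
}
```

Under these assumptions, a /tmp mount like the Fedora one above (no huge= option) would pick up whatever the boot parameter selects, while a mount with an explicit huge=never keeps it.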