[PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size

Kiryl Shutsemau posted 2 patches 3 months, 2 weeks ago
There is a newer version of this series
[PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Kiryl Shutsemau 3 months, 2 weeks ago
From: Kiryl Shutsemau <kas@kernel.org>

Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are
supposed to generate SIGBUS.

Recent changes attempted to fault in full folio where possible. They did
not respect i_size, which led to populating PTEs beyond i_size and
breaking SIGBUS semantics.

Darrick reported generic/749 breakage because of this.

However, the problem existed before the recent changes. With huge=always
tmpfs, any write to a file leads to a PMD-sized allocation. The subsequent
fault-in of the folio will then install a PMD mapping regardless of i_size.

Fix filemap_map_pages() and finish_fault() to not install:
  - PTEs beyond i_size;
  - PMD mappings across i_size;

Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")
Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Reported-by: "Darrick J. Wong" <djwong@kernel.org>
---
 mm/filemap.c | 18 ++++++++++--------
 mm/memory.c  | 13 +++++++++++--
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 13f0259d993c..0d251f6ab480 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3681,7 +3681,8 @@ static struct folio *next_uptodate_folio(struct xa_state *xas,
 static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 			struct folio *folio, unsigned long start,
 			unsigned long addr, unsigned int nr_pages,
-			unsigned long *rss, unsigned short *mmap_miss)
+			unsigned long *rss, unsigned short *mmap_miss,
+			pgoff_t file_end)
 {
 	unsigned int ref_from_caller = 1;
 	vm_fault_t ret = 0;
@@ -3697,7 +3698,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 	 */
 	addr0 = addr - start * PAGE_SIZE;
 	if (folio_within_vma(folio, vmf->vma) &&
-	    (addr0 & PMD_MASK) == ((addr0 + folio_size(folio) - 1) & PMD_MASK)) {
+	    (addr0 & PMD_MASK) == ((addr0 + folio_size(folio) - 1) & PMD_MASK) &&
+	    file_end >= folio_next_index(folio)) {
 		vmf->pte -= start;
 		page -= start;
 		addr = addr0;
@@ -3817,7 +3819,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	if (!folio)
 		goto out;
 
-	if (filemap_map_pmd(vmf, folio, start_pgoff)) {
+	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
+	end_pgoff = min(end_pgoff, file_end);
+
+	if (file_end >= folio_next_index(folio) &&
+	    filemap_map_pmd(vmf, folio, start_pgoff)) {
 		ret = VM_FAULT_NOPAGE;
 		goto out;
 	}
@@ -3830,10 +3836,6 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		goto out;
 	}
 
-	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
-	if (end_pgoff > file_end)
-		end_pgoff = file_end;
-
 	folio_type = mm_counter_file(folio);
 	do {
 		unsigned long end;
@@ -3850,7 +3852,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		else
 			ret |= filemap_map_folio_range(vmf, folio,
 					xas.xa_index - folio->index, addr,
-					nr_pages, &rss, &mmap_miss);
+					nr_pages, &rss, &mmap_miss, file_end);
 
 		folio_unlock(folio);
 	} while ((folio = next_uptodate_folio(&xas, mapping, end_pgoff)) != NULL);
diff --git a/mm/memory.c b/mm/memory.c
index 74b45e258323..9bbe59e6922f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5480,6 +5480,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	int type, nr_pages;
 	unsigned long addr;
 	bool needs_fallback = false;
+	pgoff_t file_end = -1UL;
 
 fallback:
 	addr = vmf->address;
@@ -5501,8 +5502,15 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 			return ret;
 	}
 
+	if (vma->vm_file) {
+		struct inode *inode = vma->vm_file->f_mapping->host;
+
+		file_end = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+	}
+
 	if (pmd_none(*vmf->pmd)) {
-		if (folio_test_pmd_mappable(folio)) {
+		if (folio_test_pmd_mappable(folio) &&
+		    file_end >= folio_next_index(folio)) {
 			ret = do_set_pmd(vmf, folio, page);
 			if (ret != VM_FAULT_FALLBACK)
 				return ret;
@@ -5533,7 +5541,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 		if (unlikely(vma_off < idx ||
 			    vma_off + (nr_pages - idx) > vma_pages(vma) ||
 			    pte_off < idx ||
-			    pte_off + (nr_pages - idx)  > PTRS_PER_PTE)) {
+			    pte_off + (nr_pages - idx) > PTRS_PER_PTE ||
+			    file_end < folio_next_index(folio))) {
 			nr_pages = 1;
 		} else {
 			/* Now we can set mappings for the whole large folio. */
-- 
2.50.1
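
As a reminder of the rule being enforced, here is a minimal user-space
sketch (the file path is only an example): map two pages of a one-page
file and touch the page beyond round_up(i_size, PAGE_SIZE); the second
store is expected to be killed with SIGBUS.

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		long page = sysconf(_SC_PAGESIZE);
		int fd = open("/tmp/sigbus-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);

		if (fd < 0 || ftruncate(fd, page) < 0)
			return 1;

		/* VMA covers two pages, but i_size is only one page. */
		char *p = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		p[0] = 'x';	/* within i_size: fine */
		p[page] = 'x';	/* beyond i_size, within the VMA: SIGBUS */

		fprintf(stderr, "no SIGBUS delivered\n");
		return 0;
	}
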
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Hugh Dickins 3 months, 2 weeks ago
On Thu, 23 Oct 2025, Kiryl Shutsemau wrote:

> From: Kiryl Shutsemau <kas@kernel.org>
> 
> Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are
> supposed to generate SIGBUS.
> 
> Recent changes attempted to fault in full folio where possible. They did
> not respect i_size, which led to populating PTEs beyond i_size and
> breaking SIGBUS semantics.
> 
> Darrick reported generic/749 breakage because of this.
> 
> However, the problem existed before the recent changes. With huge=always
> tmpfs, any write to a file leads to PMD-size allocation. Following the
> fault-in of the folio will install PMD mapping regardless of i_size.
> 
> Fix filemap_map_pages() and finish_fault() to not install:
>   - PTEs beyond i_size;
>   - PMD mappings across i_size;

Sorry for coming in late as usual, and complicating matters.

> 
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")
> Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")

ACK to restoring the correct POSIX behaviour to those filesystems
which are being given large folios beyond EOF transparently,
without any huge= mount option to permit it.

> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")

But NAK to regressing the intentional behaviour of huge=always
on shmem/tmpfs: the page size, whenever possible, is PMD-sized.  In
6.18-rc huge=always is currently (thanks to Baolin) behaving correctly
again, as it had done for nine years: I insist we do not re-break it.

Andrew, please drop this version (and no need to worry about backports).

I'm guessing that yet another ugly shmem_file() or shmem_mapping()
exception should be good enough - I doubt you need to consider the
huge= option, just go by whether there is a huge folio already there -
though that would have an implication for the following patch.

(But what do I mean by "huge folio" above?  Do I mean large or do
I mean pmd_mappable?  It's the huge=always pmd_mappable folios I
care not to break, the mTHPy ones can be argued either way.)

Hugh
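
A rough sketch of what such an exception could look like against the
finish_fault() hunk in this v2 (illustration only, reusing the names from
the diff above; the eventual v3 may take a different shape): keep the PMD
path whenever the mapping is shmem and a PMD-mappable folio is already
there, even when it spans file_end.

	if (folio_test_pmd_mappable(folio) &&
	    (file_end >= folio_next_index(folio) ||
	     shmem_mapping(vma->vm_file->f_mapping))) {
		/* file_end stays -1UL for anon, so the || short-circuits
		 * before dereferencing vm_file. */
		ret = do_set_pmd(vmf, folio, page);
		if (ret != VM_FAULT_FALLBACK)
			return ret;
	}
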
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by David Hildenbrand 3 months, 2 weeks ago
On 27.10.25 09:20, Hugh Dickins wrote:
> On Thu, 23 Oct 2025, Kiryl Shutsemau wrote:
> 
>> From: Kiryl Shutsemau <kas@kernel.org>
>>
>> Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are
>> supposed to generate SIGBUS.
>>
>> Recent changes attempted to fault in full folio where possible. They did
>> not respect i_size, which led to populating PTEs beyond i_size and
>> breaking SIGBUS semantics.
>>
>> Darrick reported generic/749 breakage because of this.
>>
>> However, the problem existed before the recent changes. With huge=always
>> tmpfs, any write to a file leads to PMD-size allocation. Following the
>> fault-in of the folio will install PMD mapping regardless of i_size.
>>
>> Fix filemap_map_pages() and finish_fault() to not install:
>>    - PTEs beyond i_size;
>>    - PMD mappings across i_size;
> 
> Sorry for coming in late as usual, and complicating matters.
> 

No problem, we CCed you on earlier versions to get your input, and we 
were speculating that shmem behavior might be intended (one way or the 
other).

>>
>> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>> Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")
>> Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")
> 
> ACK to restoring the correct POSIX behaviour to those filesystems
> which are being given large folios beyond EOF transparently,
> without any huge= mount option to permit it.
> 
>> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> 
> But NAK to regressing the intentional behaviour of huge=always
> on shmem/tmpfs: the page size, whenever possible, is PMD-sized.  In
> 6.18-rc huge=always is currently (thanks to Baolin) behaving correctly
> again, as it had done for nine years: I insist we do not re-break it.

Just so we are on the same page: this is not about which folio sizes we 
allocate (like what Baolin fixed) but what/how much to map.

I guess this patch here would imply the following changes:

1) A file with a size that is not PMD-aligned will have the last 
(unaligned) part not mapped by PMDs.

2) When growing a file, the previously-last part would not be mapped by 
PMDs.


Of course, we would have only mapped the last part of the file by PMDs 
if the VMA had been large enough in the first place. I'm curious, 
is that something that is commonly done by applications with shmem files 
(map beyond EOF)?
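
For a concrete feel of point 1), using 4 KiB pages and 2 MiB PMDs (a
sketch only, matching the checks in the v2 diff):

	i_size = 3 MiB  ->  the last page index inside i_size is 767
	the second PMD-sized folio covers indices 512..1023,
	so folio_next_index() = 1024
	1024 > 768 = DIV_ROUND_UP(i_size, PAGE_SIZE), so the PMD mapping is
	refused; the folio falls back to PTEs, installed only up to index 767.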

-- 
Cheers

David / dhildenb
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Hugh Dickins 3 months, 1 week ago
On Mon, 27 Oct 2025, David Hildenbrand wrote:
...
> 
> Just so we are on the same page: this is not about which folio sizes we
> allocate (like what Baolin fixed) but what/how much to map.
> 
> I guess this patch here would imply the following changes
> 
> 1) A file with a size that is not PMD aligned will have the last (unaligned
> part) not mapped by PMDs.
> 
> 2) Once growing a file, the previously-last-part would not be mapped by PMDs.

Yes, the v2 patch was so, and the v3 patch fixes it.

khugepaged might have fixed it up later on, I suppose.

Hmm, does hpage_collapse_scan_file() or collapse_pte_mapped_thp()
want a modification, to prevent reinserting a PMD after a failed
non-shmem truncation folio_split?  And collapse_file() after a
successful non-shmem truncation folio_split?

Conversely, shouldn't MADV_COLLAPSE be happy to give you a PMD
if the map size permits, even when spanning EOF?

> 
> Of course, we would have only mapped the last part of the file by PMDs if the
> VMA would have been large enough in the first place. I'm curious, is that
> something that is commonly done by applications with shmem files (map beyond
> eof)?

Setting aside the very common case of mapping a fraction of PAGE_SIZE
beyond EOF...

I do not know whether it's common to map a >= PAGE_SIZE fraction of
HPAGE_PMD_SIZE beyond EOF, but it has often been sensible to do so.
For example, imagine (using x86_64 numbers) a 4MiB map of a 3MiB
file on huge tmpfs, requiring two TLB entries for the whole file.

Hugh
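
Spelling out those x86_64 numbers (a sketch of the trade-off being argued,
assuming the mapping is suitably aligned):

	3 MiB file, huge=always  ->  two 2 MiB folios, the second spanning EOF
	4 MiB map, PMDs allowed across EOF:   2 PMD entries, 2 TLB entries
	4 MiB map, v2 restriction in effect:  1 PMD for the first 2 MiB, plus
	                                      256 PTEs for the last 1 MiB
	                                      (indices 512..767)
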
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Kiryl Shutsemau 3 months, 1 week ago
On Wed, Oct 29, 2025 at 01:31:45AM -0700, Hugh Dickins wrote:
> On Mon, 27 Oct 2025, David Hildenbrand wrote:
> ...
> > 
> > Just so we are on the same page: this is not about which folio sizes we
> > allocate (like what Baolin fixed) but what/how much to map.
> > 
> > I guess this patch here would imply the following changes
> > 
> > 1) A file with a size that is not PMD aligned will have the last (unaligned
> > part) not mapped by PMDs.
> > 
> > 2) Once growing a file, the previously-last-part would not be mapped by PMDs.
> 
> Yes, the v2 patch was so, and the v3 patch fixes it.
> 
> khugepaged might have fixed it up later on, I suppose.
> 
> Hmm, does hpage_collapse_scan_file() or collapse_pte_mapped_thp()
> want a modification, to prevent reinserting a PMD after a failed
> non-shmem truncation folio_split?  And collapse_file() after a
> successful non-shmem truncation folio_split?

I operated from the assumption that file collapse is still lazy, as I
wrote it back in the day, and doesn't install PMDs. It *seems* to be
true for khugepaged, but not for MADV_COLLAPSE.

Hm...

> Conversely, shouldn't MADV_COLLAPSE be happy to give you a PMD
> if the map size permits, even when spanning EOF?

Filesystem folks say allowing the folio to be mapped beyond
round_up(i_size, PAGE_SIZE) is a correctness issue, not only a POSIX
violation.

I am considering dropping 'install_pmd' from collapse_pte_mapped_thp() so
that the fault path is the source of truth for whether a PMD can be
installed or not.

Objections?

> > Of course, we would have only mapped the last part of the file by PMDs if the
> > VMA would have been large enough in the first place. I'm curious, is that
> > something that is commonly done by applications with shmem files (map beyond
> > eof)?
> 
> Setting aside the very common case of mapping a fraction of PAGE_SIZE
> beyond EOF...
> 
> I do not know whether it's common to map a >= PAGE_SIZE fraction of
> HPAGE_PMD_SIZE beyond EOF, but it has often been sensible to do so.
> For example, imagine (using x86_64 numbers) a 4MiB map of a 3MiB
> file on huge tmpfs, requiring two TLB entries for the whole file.

I am all for ignoring POSIX here. But I am in the minority.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Hugh Dickins 3 months, 1 week ago
On Wed, 29 Oct 2025, Kiryl Shutsemau wrote:
> On Wed, Oct 29, 2025 at 01:31:45AM -0700, Hugh Dickins wrote:
> > On Mon, 27 Oct 2025, David Hildenbrand wrote:
> > ...
> > > 
> > > Just so we are on the same page: this is not about which folio sizes we
> > > allocate (like what Baolin fixed) but what/how much to map.
> > > 
> > > I guess this patch here would imply the following changes
> > > 
> > > 1) A file with a size that is not PMD aligned will have the last (unaligned
> > > part) not mapped by PMDs.
> > > 
> > > 2) Once growing a file, the previously-last-part would not be mapped by PMDs.
> > 
> > Yes, the v2 patch was so, and the v3 patch fixes it.
> > 
> > khugepaged might have fixed it up later on, I suppose.
> > 
> > Hmm, does hpage_collapse_scan_file() or collapse_pte_mapped_thp()
> > want a modification, to prevent reinserting a PMD after a failed
> > non-shmem truncation folio_split?  And collapse_file() after a
> > successful non-shmem truncation folio_split?
> 
> I operated from an assumption that file collapse is still lazy as I
> wrote it back it the days and doesn't install PMDs. It *seems* to be
> true for khugepaged, but not MADV_COLLAPSE.
> 
> Hm...
> 
> > Conversely, shouldn't MADV_COLLAPSE be happy to give you a PMD
> > if the map size permits, even when spanning EOF?
> 
> Filesystem folks say allowing the folio to be mapped beyond
> round_up(i_size, PAGE_SIZE) is a correctness issue, not only POSIX
> violation.
> 
> I consider dropping 'install_pmd' from collapse_pte_mapped_thp() so the
> fault path is source of truth of whether PMD can be installed or not.

(Didn't you yourself just recently enhance that?)

> 
> Objections?

Yes, I would probably object (or perhaps want to allow until EOF);
but now it looks to me like we can agree no change is needed there.

I was mistaken in raising those khugepaged/MADV_COLLAPSE doubts,
because file_thp_enabled(vma) is checked in the !shmem !anonymous
!dax case, and file_thp_enabled(vma) still limits to
CONFIG_READ_ONLY_THP_FOR_FS=y, refusing to allow collapse if anyone
has the file open for writing (and you cannot truncate or hole-punch
without write permission); and pagecache is invalidated afterwards
if there are any THPs when reopened for writing (presumably for
page_mkwrite()-ish consistency reasons, which you interestingly
pointed to in another mail where I had worried about ENOSPC after
split failure).

But shmem is simple, does not use page_mkwrite(), and is fine to
continue with install_pmd here, just as it's fine to continue
with huge page spanning EOF as you're now allowing in v3.

But please double check my conclusion there, it's so easy to
get lost in the maze of hugepage permissions and prohibitions.

Hugh
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Kiryl Shutsemau 3 months, 1 week ago
On Wed, Oct 29, 2025 at 10:59:24PM -0700, Hugh Dickins wrote:
> On Wed, 29 Oct 2025, Kiryl Shutsemau wrote:
> > On Wed, Oct 29, 2025 at 01:31:45AM -0700, Hugh Dickins wrote:
> > > On Mon, 27 Oct 2025, David Hildenbrand wrote:
> > > ...
> > > > 
> > > > Just so we are on the same page: this is not about which folio sizes we
> > > > allocate (like what Baolin fixed) but what/how much to map.
> > > > 
> > > > I guess this patch here would imply the following changes
> > > > 
> > > > 1) A file with a size that is not PMD aligned will have the last (unaligned
> > > > part) not mapped by PMDs.
> > > > 
> > > > 2) Once growing a file, the previously-last-part would not be mapped by PMDs.
> > > 
> > > Yes, the v2 patch was so, and the v3 patch fixes it.
> > > 
> > > khugepaged might have fixed it up later on, I suppose.
> > > 
> > > Hmm, does hpage_collapse_scan_file() or collapse_pte_mapped_thp()
> > > want a modification, to prevent reinserting a PMD after a failed
> > > non-shmem truncation folio_split?  And collapse_file() after a
> > > successful non-shmem truncation folio_split?
> > 
> > I operated from an assumption that file collapse is still lazy as I
> > wrote it back it the days and doesn't install PMDs. It *seems* to be
> > true for khugepaged, but not MADV_COLLAPSE.
> > 
> > Hm...
> > 
> > > Conversely, shouldn't MADV_COLLAPSE be happy to give you a PMD
> > > if the map size permits, even when spanning EOF?
> > 
> > Filesystem folks say allowing the folio to be mapped beyond
> > round_up(i_size, PAGE_SIZE) is a correctness issue, not only POSIX
> > violation.
> > 
> > I consider dropping 'install_pmd' from collapse_pte_mapped_thp() so the
> > fault path is source of truth of whether PMD can be installed or not.
> 
> (Didn't you yourself just recently enhance that?)

I failed to adjust my mental model :P

> > 
> > Objections?
> 
> Yes, I would probably object (or perhaps want to allow until EOF);
> but now it looks to me like we can agree no change is needed there.
> 
> I was mistaken in raising those khugepaged/MADV_COLLAPSE doubts,
> because file_thp_enabled(vma) is checked in the !shmem !anonymous
> !dax case, and file_thp_enabled(vma) still limits to
> CONFIG_READ_ONLY_THP_FOR_FS=y, refusing to allow collapse if anyone
> has the file open for writing (and you cannot truncate or hole-punch
> without write permission); and pagecache is invalidated afterwards
> if there are any THPs when reopened for writing (presumably for
> page_mkwrite()-ish consistency reasons, which you interestingly
> pointed to in another mail where I had worried about ENOSPC after
> split failure).
> 
> But shmem is simple, does not use page_mkwrite(), and is fine to
> continue with install_pmd here, just as it's fine to continue
> with huge page spanning EOF as you're now allowing in v3.
> 
> But please double check my conclusion there, it's so easy to
> get lost in the maze of hugepage permissions and prohibitions.

Your analysis looks correct to me.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Kiryl Shutsemau 3 months, 2 weeks ago
On Mon, Oct 27, 2025 at 01:20:42AM -0700, Hugh Dickins wrote:
> On Thu, 23 Oct 2025, Kiryl Shutsemau wrote:
> 
> > From: Kiryl Shutsemau <kas@kernel.org>
> > 
> > Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are
> > supposed to generate SIGBUS.
> > 
> > Recent changes attempted to fault in full folio where possible. They did
> > not respect i_size, which led to populating PTEs beyond i_size and
> > breaking SIGBUS semantics.
> > 
> > Darrick reported generic/749 breakage because of this.
> > 
> > However, the problem existed before the recent changes. With huge=always
> > tmpfs, any write to a file leads to PMD-size allocation. Following the
> > fault-in of the folio will install PMD mapping regardless of i_size.
> > 
> > Fix filemap_map_pages() and finish_fault() to not install:
> >   - PTEs beyond i_size;
> >   - PMD mappings across i_size;
> 
> Sorry for coming in late as usual, and complicating matters.
> 
> > 
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")
> > Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")
> 
> ACK to restoring the correct POSIX behaviour to those filesystems
> which are being given large folios beyond EOF transparently,
> without any huge= mount option to permit it.
> 
> > Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> 
> But NAK to regressing the intentional behaviour of huge=always
> on shmem/tmpfs: the page size, whenever possible, is PMD-sized.  In
> 6.18-rc huge=always is currently (thanks to Baolin) behaving correctly
> again, as it had done for nine years: I insist we do not re-break it.
> 
> Andrew, please drop this version (and no need to worry about backports).
> 
> I'm guessing that yet another ugly shmem_file() or shmem_mapping()
> exception should be good enough - I doubt you need to consider the
> huge= option, just go by whether there is a huge folio already there -
> though that would have an implication for the following patch.
> 
> (But what do I mean by "huge folio" above?  Do I mean large or do
> I mean pmd_mappable?  It's the huge=always pmd_mappable folios I
> care not to break, the mTHPy ones can be argued either way.)

I assume you want the same exception for the second patch as well?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by David Hildenbrand 3 months, 2 weeks ago
On 23.10.25 11:32, Kiryl Shutsemau wrote:
> From: Kiryl Shutsemau <kas@kernel.org>
> 
> Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are
> supposed to generate SIGBUS.
> 
> Recent changes attempted to fault in full folio where possible. They did
> not respect i_size, which led to populating PTEs beyond i_size and
> breaking SIGBUS semantics.
> 
> Darrick reported generic/749 breakage because of this.
> 
> However, the problem existed before the recent changes. With huge=always
> tmpfs, any write to a file leads to PMD-size allocation. Following the
> fault-in of the folio will install PMD mapping regardless of i_size.
> 
> Fix filemap_map_pages() and finish_fault() to not install:
>    - PTEs beyond i_size;
>    - PMD mappings across i_size;
> 
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")
> Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")
> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> Reported-by: "Darrick J. Wong" <djwong@kernel.org>
> ---

Some of the code in here might deserve some cleanups IMHO :)

[...]

>   	addr0 = addr - start * PAGE_SIZE;
>   	if (folio_within_vma(folio, vmf->vma) &&
> -	    (addr0 & PMD_MASK) == ((addr0 + folio_size(folio) - 1) & PMD_MASK)) {
> +	    (addr0 & PMD_MASK) == ((addr0 + folio_size(folio) - 1) & PMD_MASK) &&

Isn't this just testing whether addr0 is aligned to folio_size(folio)? 
(given that we don't support folios > PMD_SIZE), like

	IS_ALIGNED(addr0, folio_size(folio))

Anyhow, unrelated to this patch ...



Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Kirill A. Shutemov 3 months, 2 weeks ago

On Fri, Oct 24, 2025, at 16:42, David Hildenbrand wrote:
> On 23.10.25 11:32, Kiryl Shutsemau wrote:
>>   	addr0 = addr - start * PAGE_SIZE;
>>   	if (folio_within_vma(folio, vmf->vma) &&
>> -	    (addr0 & PMD_MASK) == ((addr0 + folio_size(folio) - 1) & PMD_MASK)) {
>> +	    (addr0 & PMD_MASK) == ((addr0 + folio_size(folio) - 1) & PMD_MASK) &&
>
> Isn't this just testing whether addr0 is aligned to folio_size(folio)? 
> (given that we don't support folios > PMD_SIZE), like
>
> 	IS_ALIGNED(addr0, folio_size(folio))

Actually, no. The VMA is not necessarily aligned to folio_size().

-- 
Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by David Hildenbrand 3 months, 2 weeks ago
On 24.10.25 21:32, Kirill A. Shutemov wrote:
> 
> 
> On Fri, Oct 24, 2025, at 16:42, David Hildenbrand wrote:
>> On 23.10.25 11:32, Kiryl Shutsemau wrote:
>>>    	addr0 = addr - start * PAGE_SIZE;
>>>    	if (folio_within_vma(folio, vmf->vma) &&
>>> -	    (addr0 & PMD_MASK) == ((addr0 + folio_size(folio) - 1) & PMD_MASK)) {
>>> +	    (addr0 & PMD_MASK) == ((addr0 + folio_size(folio) - 1) & PMD_MASK) &&
>>
>> Isn't this just testing whether addr0 is aligned to folio_size(folio)?
>> (given that we don't support folios > PMD_SIZE), like
>>
>> 	IS_ALIGNED(addr0, folio_size(folio))
> 
> Actually, no. VMA can be not aligned to folio_size().

Ah, I missed that we can also have folio sizes besides PMD_SIZE here.

So it's all about testing whether the complete folio would be mapped by 
a single page table.

(a helper would be nice, but I cannot immediately come up with a good name)

-- 
Cheers

David / dhildenb
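
For illustration, one possible shape for the helper mentioned above,
directly lifting the condition from the patch (the name here is made up):

	/*
	 * Would the folio, mapped at addr, be covered by a single page
	 * table, i.e. not cross a PMD boundary?
	 */
	static inline bool folio_within_single_pt(unsigned long addr,
						  struct folio *folio)
	{
		return (addr & PMD_MASK) ==
		       ((addr + folio_size(folio) - 1) & PMD_MASK);
	}
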
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Andrew Morton 3 months, 2 weeks ago
On Thu, 23 Oct 2025 10:32:50 +0100 Kiryl Shutsemau <kirill@shutemov.name> wrote:

> From: Kiryl Shutsemau <kas@kernel.org>
> 
> Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are
> supposed to generate SIGBUS.
> 
> Recent changes attempted to fault in full folio where possible. They did
> not respect i_size, which led to populating PTEs beyond i_size and
> breaking SIGBUS semantics.
> 
> Darrick reported generic/749 breakage because of this.
> 
> However, the problem existed before the recent changes. With huge=always
> tmpfs, any write to a file leads to PMD-size allocation. Following the
> fault-in of the folio will install PMD mapping regardless of i_size.
> 
> Fix filemap_map_pages() and finish_fault() to not install:
>   - PTEs beyond i_size;
>   - PMD mappings across i_size;
> 
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")

Sep 28 2025

> Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")

Sep 28 2025

> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")

Jul 26 2016

eek, what's this one doing here?  Are we asking -stable maintainers
to backport this patch into nine years worth of kernels?

I'll remove this Fixes: line for now...
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by David Hildenbrand 3 months, 2 weeks ago
On 23.10.25 22:49, Andrew Morton wrote:
> On Thu, 23 Oct 2025 10:32:50 +0100 Kiryl Shutsemau <kirill@shutemov.name> wrote:
> 
>> From: Kiryl Shutsemau <kas@kernel.org>
>>
>> Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are
>> supposed to generate SIGBUS.
>>
>> Recent changes attempted to fault in full folio where possible. They did
>> not respect i_size, which led to populating PTEs beyond i_size and
>> breaking SIGBUS semantics.
>>
>> Darrick reported generic/749 breakage because of this.
>>
>> However, the problem existed before the recent changes. With huge=always
>> tmpfs, any write to a file leads to PMD-size allocation. Following the
>> fault-in of the folio will install PMD mapping regardless of i_size.
>>
>> Fix filemap_map_pages() and finish_fault() to not install:
>>    - PTEs beyond i_size;
>>    - PMD mappings across i_size;
>>
>> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>> Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")
> 
> Sep 28 2025
> 
>> Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")
> 
> Sep 28 2025
> 
>> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> 
> Jul 26 2016
> 
> eek, what's this one doing here?  Are we asking -stable maintainers
> to backport this patch into nine years worth of kernels?
> 
> I'll remove this Fixes: line for now...

Ehm, why? It sure is a fix for that. We can indicate to which stable 
versions we want to have it backported.

And yes, it might be all stable kernels.

-- 
Cheers

David / dhildenb
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Andrew Morton 3 months, 2 weeks ago
On Thu, 23 Oct 2025 22:54:49 +0200 David Hildenbrand <david@redhat.com> wrote:

> >> Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")
> > 
> > Sep 28 2025
> > 
> >> Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")
> > 
> > Sep 28 2025
> > 
> >> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> > 
> > Jul 26 2016
> > 
> > eek, what's this one doing here?  Are we asking -stable maintainers
> > to backport this patch into nine years worth of kernels?
> > 
> > I'll remove this Fixes: line for now...
> 
> Ehm, why?

Because the Sep 28 2025 Fixes: totally fooled me and because this
doesn't apply to 6.17, let alone to 6.ancient.

> It sure is a fix for that. We can indicate to which stable 
> versions we want to have ti backported.
> 
> And yes, it might be all stable kernels.

No probs, thanks for clarifying.  I'll restore the

	Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
	Cc: <stable@vger.kernel.org>

and shall let others sort out the backporting issues.
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Kiryl Shutsemau 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 02:36:24PM -0700, Andrew Morton wrote:
> On Thu, 23 Oct 2025 22:54:49 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
> > >> Fixes: 19773df031bc ("mm/fault: try to map the entire file folio in finish_fault()")
> > > 
> > > Sep 28 2025
> > > 
> > >> Fixes: 357b92761d94 ("mm/filemap: map entire large folio faultaround")
> > > 
> > > Sep 28 2025
> > > 
> > >> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> > > 
> > > Jul 26 2016
> > > 
> > > eek, what's this one doing here?  Are we asking -stable maintainers
> > > to backport this patch into nine years worth of kernels?
> > > 
> > > I'll remove this Fixes: line for now...
> > 
> > Ehm, why?
> 
> Because the Sep 28 2025 Fixes: totally fooled me and because this
> doesn't apply to 6.17, let alone to 6.ancient.
> 
> > It sure is a fix for that. We can indicate to which stable 
> > versions we want to have ti backported.
> > 
> > And yes, it might be all stable kernels.
> 
> No probs, thanks for clarifying.  I'll restore the
> 
> 	Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> 	Cc; <stable@vger.kernel.org>
> 
> and shall let others sort out the backporting issues.

One possible way to relax the backporting requirements is to only backport
to kernels where we can have a writable file mapping to a filesystem with
backing storage (non-shmem).

Maybe

Fixes: 01c70267053d ("fs: add a filesystem flag for THPs")

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 1/2] mm/memory: Do not populate page table entries beyond i_size
Posted by Andrew Morton 3 months, 2 weeks ago
On Fri, 24 Oct 2025 10:26:05 +0100 Kiryl Shutsemau <kirill@shutemov.name> wrote:

> > Because the Sep 28 2025 Fixes: totally fooled me and because this
> > doesn't apply to 6.17, let alone to 6.ancient.
> > 
> > > It sure is a fix for that. We can indicate to which stable 
> > > versions we want to have ti backported.
> > > 
> > > And yes, it might be all stable kernels.
> > 
> > No probs, thanks for clarifying.  I'll restore the
> > 
> > 	Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> > 	Cc; <stable@vger.kernel.org>
> > 
> > and shall let others sort out the backporting issues.
> 
> One possible way to relax backporting requirements is only to backport
> to kernels where we can have writable file mapping to filesystem with a
> backing storage (non-shmem).
> 
> Maybe
> 
> Fixes: 01c70267053d ("fs: add a filesystem flag for THPs")

OK, thanks, I changed it to

Link: https://lkml.kernel.org/r/20251023093251.54146-2-kirill@shutemov.name
Fixes: 01c70267053d ("fs: add a filesystem flag for THPs")
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
...
Cc: <stable@vger.kernel.org>