[PATCH RFC 03/14] mm: compare pfns only if the entry is present when inserting pfns/pages

David Hildenbrand posted 14 patches 3 months, 3 weeks ago
[PATCH RFC 03/14] mm: compare pfns only if the entry is present when inserting pfns/pages
Posted by David Hildenbrand 3 months, 3 weeks ago
Doing a pte_pfn() etc. of something that is not a present page table
entry is wrong. Let's check in all relevant cases where we want to
upgrade write permissions when inserting pfns/pages whether the entry
is actually present.

It's not expected to have caused real harm in practice, so this is more a
cleanup than a fix for something that would likely trigger in some
weird circumstances.

At some point, we should likely unify the two pte handling paths,
similar to how we did it for pmds/puds.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/huge_memory.c | 4 ++--
 mm/memory.c      | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e0e3cfd9f223..e52360df87d15 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1392,7 +1392,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
 		const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
 					  fop.pfn;
 
-		if (write) {
+		if (write && pmd_present(*pmd)) {
 			if (pmd_pfn(*pmd) != pfn) {
 				WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
 				return -EEXIST;
@@ -1541,7 +1541,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
 		const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
 					  fop.pfn;
 
-		if (write) {
+		if (write && pud_present(*pud)) {
 			if (WARN_ON_ONCE(pud_pfn(*pud) != pfn))
 				return;
 			entry = pud_mkyoung(*pud);
diff --git a/mm/memory.c b/mm/memory.c
index a1b5575db52ac..9a1acd057ce59 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2137,7 +2137,7 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
 	pte_t pteval = ptep_get(pte);
 
 	if (!pte_none(pteval)) {
-		if (!mkwrite)
+		if (!mkwrite || !pte_present(pteval))
 			return -EBUSY;
 
 		/* see insert_pfn(). */
@@ -2434,7 +2434,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 		return VM_FAULT_OOM;
 	entry = ptep_get(pte);
 	if (!pte_none(entry)) {
-		if (mkwrite) {
+		if (mkwrite && pte_present(entry)) {
 			/*
 			 * For read faults on private mappings the PFN passed
 			 * in may not match the PFN we have mapped if the
-- 
2.49.0
Re: [PATCH RFC 03/14] mm: compare pfns only if the entry is present when inserting pfns/pages
Posted by Pedro Falcato 3 months, 2 weeks ago
On Tue, Jun 17, 2025 at 05:43:34PM +0200, David Hildenbrand wrote:
> Doing a pte_pfn() etc. of something that is not a present page table
> entry is wrong. Let's check in all relevant cases where we want to
> upgrade write permissions when inserting pfns/pages whether the entry
> is actually present.
> 
> It's not expected to have caused real harm in practice, so this is more a
> cleanup than a fix for something that would likely trigger in some
> weird circumstances.

Couldn't we e.g have a swap entry's "pfn" accidentally match the one we're
inserting? Isn't that a correctness problem?

> 
> At some point, we should likely unify the two pte handling paths,
> similar to how we did it for pmds/puds.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Nice little cleanup, thanks.

Reviewed-by: Pedro Falcato <pfalcato@suse.de>

-- 
Pedro
Re: [PATCH RFC 03/14] mm: compare pfns only if the entry is present when inserting pfns/pages
Posted by David Hildenbrand 3 months, 2 weeks ago
On 20.06.25 20:24, Pedro Falcato wrote:
> On Tue, Jun 17, 2025 at 05:43:34PM +0200, David Hildenbrand wrote:
>> Doing a pte_pfn() etc. of something that is not a present page table
>> entry is wrong. Let's check in all relevant cases where we want to
>> upgrade write permissions when inserting pfns/pages whether the entry
>> is actually present.
>>
>> It's not expected to have caused real harm in practice, so this is more a
>> cleanup than a fix for something that would likely trigger in some
>> weird circumstances.
> 
> Couldn't we e.g have a swap entry's "pfn" accidentally match the one we're
> inserting? Isn't that a correctness problem?

In theory yes, in practice I think this will not happen.

... especially because the WARN_ON_ONCE() would already trigger in many 
other cases before we would find one situation where it doesn't.

That's why I decided against Fixes: and declaring this more a cleanup, 
because the starts really would have to align ... :)

Thanks!

-- 
Cheers,

David / dhildenb
Re: [PATCH RFC 03/14] mm: compare pfns only if the entry is present when inserting pfns/pages
Posted by Oscar Salvador 3 months, 2 weeks ago
On Tue, Jun 17, 2025 at 05:43:34PM +0200, David Hildenbrand wrote:
> Doing a pte_pfn() etc. of something that is not a present page table
> entry is wrong. Let's check in all relevant cases where we want to
> upgrade write permissions when inserting pfns/pages whether the entry
> is actually present.

Maybe I would add that's because the pte can have other info like
marker, swp_entry etc.

> It's not expected to have caused real harm in practice, so this is more a
> cleanup than a fix for something that would likely trigger in some
> weird circumstances.
> 
> At some point, we should likely unify the two pte handling paths,
> similar to how we did it for pmds/puds.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

Should we scream if someone passes us a non-present entry?


> ---
>  mm/huge_memory.c | 4 ++--
>  mm/memory.c      | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 8e0e3cfd9f223..e52360df87d15 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1392,7 +1392,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
>  		const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
>  					  fop.pfn;
>  
> -		if (write) {
> +		if (write && pmd_present(*pmd)) {
>  			if (pmd_pfn(*pmd) != pfn) {
>  				WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
>  				return -EEXIST;
> @@ -1541,7 +1541,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
>  		const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
>  					  fop.pfn;
>  
> -		if (write) {
> +		if (write && pud_present(*pud)) {
>  			if (WARN_ON_ONCE(pud_pfn(*pud) != pfn))
>  				return;
>  			entry = pud_mkyoung(*pud);
> diff --git a/mm/memory.c b/mm/memory.c
> index a1b5575db52ac..9a1acd057ce59 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2137,7 +2137,7 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
>  	pte_t pteval = ptep_get(pte);
>  
>  	if (!pte_none(pteval)) {
> -		if (!mkwrite)
> +		if (!mkwrite || !pte_present(pteval))
>  			return -EBUSY;

Why EBUSY? because it might transitory?


-- 
Oscar Salvador
SUSE Labs
Re: [PATCH RFC 03/14] mm: compare pfns only if the entry is present when inserting pfns/pages
Posted by David Hildenbrand 3 months, 2 weeks ago
On 20.06.25 15:27, Oscar Salvador wrote:
> On Tue, Jun 17, 2025 at 05:43:34PM +0200, David Hildenbrand wrote:
>> Doing a pte_pfn() etc. of something that is not a present page table
>> entry is wrong. Let's check in all relevant cases where we want to
>> upgrade write permissions when inserting pfns/pages whether the entry
>> is actually present.
> 
> Maybe I would add that's because the pte can have other info like
> marker, swp_entry etc.
> 
>> It's not expected to have caused real harm in practice, so this is more a
>> cleanup than a fix for something that would likely trigger in some
>> weird circumstances.
>>
>> At some point, we should likely unify the two pte handling paths,
>> similar to how we did it for pmds/puds.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> 
> Should we scream if someone passes us a non-present entry?
> 

Probably? Good point, let me think about that.

> 
>> ---
>>   mm/huge_memory.c | 4 ++--
>>   mm/memory.c      | 4 ++--
>>   2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 8e0e3cfd9f223..e52360df87d15 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1392,7 +1392,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
>>   		const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
>>   					  fop.pfn;
>>   
>> -		if (write) {
>> +		if (write && pmd_present(*pmd)) {
>>   			if (pmd_pfn(*pmd) != pfn) {
>>   				WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
>>   				return -EEXIST;
>> @@ -1541,7 +1541,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
>>   		const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
>>   					  fop.pfn;
>>   
>> -		if (write) {
>> +		if (write && pud_present(*pud)) {
>>   			if (WARN_ON_ONCE(pud_pfn(*pud) != pfn))
>>   				return;
>>   			entry = pud_mkyoung(*pud);
>> diff --git a/mm/memory.c b/mm/memory.c
>> index a1b5575db52ac..9a1acd057ce59 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2137,7 +2137,7 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
>>   	pte_t pteval = ptep_get(pte);
>>   
>>   	if (!pte_none(pteval)) {
>> -		if (!mkwrite)
>> +		if (!mkwrite || !pte_present(pteval))
>>   			return -EBUSY;
> 
> Why EBUSY? because it might transitory?

I was confused myself about error handling, and why it differs for all 
cases ... adding to me todo list to investigate that (clean it up ...) :)

-- 
Cheers,

David / dhildenb