On 20.06.25 15:27, Oscar Salvador wrote:
> On Tue, Jun 17, 2025 at 05:43:34PM +0200, David Hildenbrand wrote:
>> Doing a pte_pfn() etc. of something that is not a present page table
>> entry is wrong. Let's check in all relevant cases where we want to
>> upgrade write permissions when inserting pfns/pages whether the entry
>> is actually present.
>
> Maybe I would add that's because the pte can have other info like
> marker, swp_entry etc.
>
>> It's not expected to have caused real harm in practice, so this is more a
>> cleanup than a fix for something that would likely trigger in some
>> weird circumstances.
>>
>> At some point, we should likely unify the two pte handling paths,
>> similar to how we did it for pmds/puds.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>
> Should we scream if someone passes us a non-present entry?
>
Probably? Good point, let me think about that.
>
>> ---
>> mm/huge_memory.c | 4 ++--
>> mm/memory.c | 4 ++--
>> 2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 8e0e3cfd9f223..e52360df87d15 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1392,7 +1392,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
>> const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
>> fop.pfn;
>>
>> - if (write) {
>> + if (write && pmd_present(*pmd)) {
>> if (pmd_pfn(*pmd) != pfn) {
>> WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
>> return -EEXIST;
>> @@ -1541,7 +1541,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
>> const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
>> fop.pfn;
>>
>> - if (write) {
>> + if (write && pud_present(*pud)) {
>> if (WARN_ON_ONCE(pud_pfn(*pud) != pfn))
>> return;
>> entry = pud_mkyoung(*pud);
>> diff --git a/mm/memory.c b/mm/memory.c
>> index a1b5575db52ac..9a1acd057ce59 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2137,7 +2137,7 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
>> pte_t pteval = ptep_get(pte);
>>
>> if (!pte_none(pteval)) {
>> - if (!mkwrite)
>> + if (!mkwrite || !pte_present(pteval))
>> return -EBUSY;
>
> Why EBUSY? because it might transitory?
I was confused myself about error handling, and why it differs for all
cases ... adding to me todo list to investigate that (clean it up ...) :)
--
Cheers,
David / dhildenb