Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in hugetlb_change_protection()")
similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/huge_memory.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..bdda4f426d58 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_swp_uffd_wp(*pvmw->pmd))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
--
2.39.2
On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
> similarly applies to THP.
>
> Setting/clearing uffd-wp on THP migration entries is not implemented
> properly. Further, while removing migration PMDs considers the uffd-wp
> bit, inserting migration PMDs does not consider the uffd-wp bit.
>
> We have to set/clear independently of the migration entry type in
> change_huge_pmd() and properly copy the uffd-wp bit in
> set_pmd_migration_entry().
>
> Verified using a simple reproducer that triggers migration of a THP, that
> the set_pmd_migration_entry() no longer loses the uffd-wp bit.
>
> Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Thanks, one trivial nitpick:
> ---
> mm/huge_memory.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..bdda4f426d58 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> if (is_swap_pmd(*pmd)) {
> swp_entry_t entry = pmd_to_swp_entry(*pmd);
> struct page *page = pfn_swap_entry_to_page(entry);
> + pmd_t newpmd;
>
> VM_BUG_ON(!is_pmd_migration_entry(*pmd));
> if (is_writable_migration_entry(entry)) {
> - pmd_t newpmd;
> /*
> * A protection check is difficult so
> * just be safe and disable write
> @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> newpmd = pmd_swp_mksoft_dirty(newpmd);
> if (pmd_swp_uffd_wp(*pmd))
> newpmd = pmd_swp_mkuffd_wp(newpmd);
> - set_pmd_at(mm, addr, pmd, newpmd);
> + } else {
> + newpmd = *pmd;
> }
> +
> + if (uffd_wp)
> + newpmd = pmd_swp_mkuffd_wp(newpmd);
> + else if (uffd_wp_resolve)
> + newpmd = pmd_swp_clear_uffd_wp(newpmd);
> + if (!pmd_same(*pmd, newpmd))
> + set_pmd_at(mm, addr, pmd, newpmd);
> goto unlock;
> }
> #endif
> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> pmdswp = swp_entry_to_pmd(entry);
> if (pmd_soft_dirty(pmdval))
> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> + if (pmd_swp_uffd_wp(*pvmw->pmd))
> + pmdswp = pmd_swp_mkuffd_wp(pmdswp);
I think it's fine to use *pmd, but maybe still better to use pmdval? I
worry pmdp_invalidate()) can be something else in the future that may
affect the bit.
> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> page_remove_rmap(page, vma, true);
> put_page(page);
> --
> 2.39.2
>
--
Peter Xu
On 05.04.23 17:12, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
>> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
>> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
>> similarly applies to THP.
>>
>> Setting/clearing uffd-wp on THP migration entries is not implemented
>> properly. Further, while removing migration PMDs considers the uffd-wp
>> bit, inserting migration PMDs does not consider the uffd-wp bit.
>>
>> We have to set/clear independently of the migration entry type in
>> change_huge_pmd() and properly copy the uffd-wp bit in
>> set_pmd_migration_entry().
>>
>> Verified using a simple reproducer that triggers migration of a THP, that
>> the set_pmd_migration_entry() no longer loses the uffd-wp bit.
>>
>> Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Thanks, one trivial nitpick:
>
>> ---
>> mm/huge_memory.c | 14 ++++++++++++--
>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..bdda4f426d58 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>> if (is_swap_pmd(*pmd)) {
>> swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> struct page *page = pfn_swap_entry_to_page(entry);
>> + pmd_t newpmd;
>>
>> VM_BUG_ON(!is_pmd_migration_entry(*pmd));
>> if (is_writable_migration_entry(entry)) {
>> - pmd_t newpmd;
>> /*
>> * A protection check is difficult so
>> * just be safe and disable write
>> @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>> newpmd = pmd_swp_mksoft_dirty(newpmd);
>> if (pmd_swp_uffd_wp(*pmd))
>> newpmd = pmd_swp_mkuffd_wp(newpmd);
>> - set_pmd_at(mm, addr, pmd, newpmd);
>> + } else {
>> + newpmd = *pmd;
>> }
>> +
>> + if (uffd_wp)
>> + newpmd = pmd_swp_mkuffd_wp(newpmd);
>> + else if (uffd_wp_resolve)
>> + newpmd = pmd_swp_clear_uffd_wp(newpmd);
>> + if (!pmd_same(*pmd, newpmd))
>> + set_pmd_at(mm, addr, pmd, newpmd);
>> goto unlock;
>> }
>> #endif
>> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>> pmdswp = swp_entry_to_pmd(entry);
>> if (pmd_soft_dirty(pmdval))
>> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>> + if (pmd_swp_uffd_wp(*pvmw->pmd))
>> + pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>
> I think it's fine to use *pmd, but maybe still better to use pmdval? I
> worry pmdp_invalidate()) can be something else in the future that may
> affect the bit.
Wondering how I ended up with that, I realized that it's actually
wrong and might have worked by chance for my reproducer on x86.
That should make it work:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f977c965fdad..fffc953fa6ea 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
- if (pmd_swp_uffd_wp(*pvmw->pmd))
+ if (pmd_uffd_wp(pmdval))
pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
--
Thanks,
David / dhildenb
On Wed, Apr 05, 2023 at 05:17:31PM +0200, David Hildenbrand wrote:
> On 05.04.23 17:12, Peter Xu wrote:
> > On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
> > > Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
> > > fix uffd-wp handling for migration entries in hugetlb_change_protection()")
> > > similarly applies to THP.
> > >
> > > Setting/clearing uffd-wp on THP migration entries is not implemented
> > > properly. Further, while removing migration PMDs considers the uffd-wp
> > > bit, inserting migration PMDs does not consider the uffd-wp bit.
> > >
> > > We have to set/clear independently of the migration entry type in
> > > change_huge_pmd() and properly copy the uffd-wp bit in
> > > set_pmd_migration_entry().
> > >
> > > Verified using a simple reproducer that triggers migration of a THP, that
> > > the set_pmd_migration_entry() no longer loses the uffd-wp bit.
> > >
> > > Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> >
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> >
> > Thanks, one trivial nitpick:
> >
> > > ---
> > > mm/huge_memory.c | 14 ++++++++++++--
> > > 1 file changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 032fb0ef9cd1..bdda4f426d58 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > > if (is_swap_pmd(*pmd)) {
> > > swp_entry_t entry = pmd_to_swp_entry(*pmd);
> > > struct page *page = pfn_swap_entry_to_page(entry);
> > > + pmd_t newpmd;
> > > VM_BUG_ON(!is_pmd_migration_entry(*pmd));
> > > if (is_writable_migration_entry(entry)) {
> > > - pmd_t newpmd;
> > > /*
> > > * A protection check is difficult so
> > > * just be safe and disable write
> > > @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > > newpmd = pmd_swp_mksoft_dirty(newpmd);
> > > if (pmd_swp_uffd_wp(*pmd))
> > > newpmd = pmd_swp_mkuffd_wp(newpmd);
> > > - set_pmd_at(mm, addr, pmd, newpmd);
> > > + } else {
> > > + newpmd = *pmd;
> > > }
> > > +
> > > + if (uffd_wp)
> > > + newpmd = pmd_swp_mkuffd_wp(newpmd);
> > > + else if (uffd_wp_resolve)
> > > + newpmd = pmd_swp_clear_uffd_wp(newpmd);
> > > + if (!pmd_same(*pmd, newpmd))
> > > + set_pmd_at(mm, addr, pmd, newpmd);
> > > goto unlock;
> > > }
> > > #endif
> > > @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > > pmdswp = swp_entry_to_pmd(entry);
> > > if (pmd_soft_dirty(pmdval))
> > > pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> > > + if (pmd_swp_uffd_wp(*pvmw->pmd))
> > > + pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> >
> > I think it's fine to use *pmd, but maybe still better to use pmdval? I
> > worry pmdp_invalidate()) can be something else in the future that may
> > affect the bit.
>
> Wondering how I ended up with that, I realized that it's actually
> wrong and might have worked by chance for my reproducer on x86.
>
> That should make it work:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f977c965fdad..fffc953fa6ea 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> pmdswp = swp_entry_to_pmd(entry);
> if (pmd_soft_dirty(pmdval))
> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> - if (pmd_swp_uffd_wp(*pvmw->pmd))
> + if (pmd_uffd_wp(pmdval))
> pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> page_remove_rmap(page, vma, true);
I guess pmd_swp_uffd_wp() just reads the _USER bit 2 which is also set for
a present pte, but then it sets swp uffd-wp always even if it was not set.
Yes the change must be squashed in to be correct, with that, my R-b keeps.
Thanks,
--
Peter Xu
On 05.04.23 17:43, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 05:17:31PM +0200, David Hildenbrand wrote:
>> On 05.04.23 17:12, Peter Xu wrote:
>>> On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
>>>> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
>>>> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
>>>> similarly applies to THP.
>>>>
>>>> Setting/clearing uffd-wp on THP migration entries is not implemented
>>>> properly. Further, while removing migration PMDs considers the uffd-wp
>>>> bit, inserting migration PMDs does not consider the uffd-wp bit.
>>>>
>>>> We have to set/clear independently of the migration entry type in
>>>> change_huge_pmd() and properly copy the uffd-wp bit in
>>>> set_pmd_migration_entry().
>>>>
>>>> Verified using a simple reproducer that triggers migration of a THP, that
>>>> the set_pmd_migration_entry() no longer loses the uffd-wp bit.
>>>>
>>>> Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>
>>> Thanks, one trivial nitpick:
>>>
>>>> ---
>>>> mm/huge_memory.c | 14 ++++++++++++--
>>>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 032fb0ef9cd1..bdda4f426d58 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>> if (is_swap_pmd(*pmd)) {
>>>> swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>>> struct page *page = pfn_swap_entry_to_page(entry);
>>>> + pmd_t newpmd;
>>>> VM_BUG_ON(!is_pmd_migration_entry(*pmd));
>>>> if (is_writable_migration_entry(entry)) {
>>>> - pmd_t newpmd;
>>>> /*
>>>> * A protection check is difficult so
>>>> * just be safe and disable write
>>>> @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>> newpmd = pmd_swp_mksoft_dirty(newpmd);
>>>> if (pmd_swp_uffd_wp(*pmd))
>>>> newpmd = pmd_swp_mkuffd_wp(newpmd);
>>>> - set_pmd_at(mm, addr, pmd, newpmd);
>>>> + } else {
>>>> + newpmd = *pmd;
>>>> }
>>>> +
>>>> + if (uffd_wp)
>>>> + newpmd = pmd_swp_mkuffd_wp(newpmd);
>>>> + else if (uffd_wp_resolve)
>>>> + newpmd = pmd_swp_clear_uffd_wp(newpmd);
>>>> + if (!pmd_same(*pmd, newpmd))
>>>> + set_pmd_at(mm, addr, pmd, newpmd);
>>>> goto unlock;
>>>> }
>>>> #endif
>>>> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>>>> pmdswp = swp_entry_to_pmd(entry);
>>>> if (pmd_soft_dirty(pmdval))
>>>> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>>>> + if (pmd_swp_uffd_wp(*pvmw->pmd))
>>>> + pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>>>
>>> I think it's fine to use *pmd, but maybe still better to use pmdval? I
>>> worry pmdp_invalidate()) can be something else in the future that may
>>> affect the bit.
>>
>> Wondering how I ended up with that, I realized that it's actually
>> wrong and might have worked by chance for my reproducer on x86.
>>
>> That should make it work:
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index f977c965fdad..fffc953fa6ea 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>> pmdswp = swp_entry_to_pmd(entry);
>> if (pmd_soft_dirty(pmdval))
>> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>> - if (pmd_swp_uffd_wp(*pvmw->pmd))
>> + if (pmd_uffd_wp(pmdval))
>> pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
>> page_remove_rmap(page, vma, true);
>
> I guess pmd_swp_uffd_wp() just reads the _USER bit 2 which is also set for
> a present pte, but then it sets swp uffd-wp always even if it was not set.
>
Yes. I modified the reproducer to migrate without uffd-wp first and we
suddenly gain a uffd-wp bit.
> Yes the change must be squashed in to be correct, with that, my R-b keeps.
Thanks, I will resend later.
--
Thanks,
David / dhildenb
© 2016 - 2025 Red Hat, Inc.