Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
false" has extended folio_add_new_anon_rmap() to use on non-exclusive
folios, already visible to others in swap cache and on LRU.
That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
overwriting concurrent atomic operations on folio->flags, losing bits
added or restoring bits cleared. Since it's only used in this risky
way when folio_test_locked and !folio_test_anon, many such races are
excluded; but, for example, isolations by folio_test_clear_lru() are
vulnerable, and setting or clearing active.
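
For illustration only, here is a minimal userspace sketch of that
lost-update pattern (simplified flag values, not the actual kernel
code; the two CPUs' steps are interleaved by hand):

	/* cc -std=c11 lost_update_sketch.c && ./a.out */
	#include <stdio.h>

	#define PG_swapbacked	(1UL << 0)
	#define PG_lru		(1UL << 1)

	int main(void)
	{
		unsigned long flags = PG_lru;	/* on LRU, not yet swapbacked */

		/* CPU 0: non-atomic __folio_set_swapbacked() reads flags ... */
		unsigned long old = flags;

		/* CPU 1: folio_test_clear_lru() atomically clears PG_lru. */
		flags &= ~PG_lru;

		/*
		 * CPU 0: ... then writes back old | PG_swapbacked,
		 * unintentionally restoring the just-cleared PG_lru.
		 */
		flags = old | PG_swapbacked;

		printf("PG_lru is %s after the race\n",
		       (flags & PG_lru) ? "set again: lost update" : "clear");
		return 0;
	}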
It could just use the atomic folio_set_swapbacked(); but this function
does try to avoid atomics where it can, so use a branch instead: just
avoid setting swapbacked when it is already set, that is good enough.
(Swapbacked is normally stable once set: lazyfree can undo it, but
only later, when found anon in a page table.)
This fixes a lot of instability under compaction and swapping loads:
assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
frees - though I've not worked out what races could lead to the latter.
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/rmap.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index df1a43295c85..5394c1178bf1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 	VM_BUG_ON_VMA(address < vma->vm_start ||
 			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
-	__folio_set_swapbacked(folio);
+
+	if (!folio_test_swapbacked(folio))
+		__folio_set_swapbacked(folio);
 	__folio_set_anon(folio, vma, address, exclusive);
 
 	if (likely(!folio_test_large(folio))) {
--
2.35.3
On 25.06.24 07:00, Hugh Dickins wrote:
> Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
> false" has extended folio_add_new_anon_rmap() to use on non-exclusive
> folios, already visible to others in swap cache and on LRU.
>
> That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
> overwriting concurrent atomic operations on folio->flags, losing bits
> added or restoring bits cleared. Since it's only used in this risky
> way when folio_test_locked and !folio_test_anon, many such races are
> excluded; but, for example, isolations by folio_test_clear_lru() are
> vulnerable, and setting or clearing active.
>
> It could just use the atomic folio_set_swapbacked(); but this function
> does try to avoid atomics where it can, so use a branch instead: just
> avoid setting swapbacked when it is already set, that is good enough.
> (Swapbacked is normally stable once set: lazyfree can undo it, but
> only later, when found anon in a page table.)
>
> This fixes a lot of instability under compaction and swapping loads:
> assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
> frees - though I've not worked out what races could lead to the latter.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> mm/rmap.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index df1a43295c85..5394c1178bf1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_VMA(address < vma->vm_start ||
> address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> - __folio_set_swapbacked(folio);
> +
> + if (!folio_test_swapbacked(folio))
> + __folio_set_swapbacked(folio);
> __folio_set_anon(folio, vma, address, exclusive);
>
> if (likely(!folio_test_large(folio))) {
LGTM.
I'll point out that it's sufficient for a PFN walker to do a tryget +
trylock to cause trouble.
Fortunately isolate_movable_page() will check __folio_test_movable()
before doing the trylock.
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
On Tue, 25 Jun 2024, David Hildenbrand wrote:
> On 25.06.24 07:00, Hugh Dickins wrote:
> > Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
> > false" has extended folio_add_new_anon_rmap() to use on non-exclusive
> > folios, already visible to others in swap cache and on LRU.
> >
> > That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
> > overwriting concurrent atomic operations on folio->flags, losing bits
> > added or restoring bits cleared. Since it's only used in this risky
> > way when folio_test_locked and !folio_test_anon, many such races are
> > excluded; but, for example, isolations by folio_test_clear_lru() are
> > vulnerable, and setting or clearing active.
> >
> > It could just use the atomic folio_set_swapbacked(); but this function
> > does try to avoid atomics where it can, so use a branch instead: just
> > avoid setting swapbacked when it is already set, that is good enough.
> > (Swapbacked is normally stable once set: lazyfree can undo it, but
> > only later, when found anon in a page table.)
> >
> > This fixes a lot of instability under compaction and swapping loads:
> > assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
> > frees - though I've not worked out what races could lead to the latter.
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > mm/rmap.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index df1a43295c85..5394c1178bf1 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio,
> > struct vm_area_struct *vma,
> > VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> > VM_BUG_ON_VMA(address < vma->vm_start ||
> > address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> > - __folio_set_swapbacked(folio);
> > +
> > + if (!folio_test_swapbacked(folio))
> > + __folio_set_swapbacked(folio);
> > __folio_set_anon(folio, vma, address, exclusive);
> >
> > if (likely(!folio_test_large(folio))) {
>
> LGTM.
>
> I'll point out that it's sufficient for a PFN walker to do a tryget + trylock
> to cause trouble.
That surprises me. I thought a racer's tryget was irrelevant (touching
a different field) and its trylock not a problem, since "we" hold the
folio lock throughout. If my mental model is too naive there, please
explain in more detail: we all need to understand this better.
>
> Fortunately isolate_movable_page() will check __folio_test_movable() before
> doing the trylock.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
Thanks,
Hugh
On 25.06.24 21:37, Hugh Dickins wrote:
> On Tue, 25 Jun 2024, David Hildenbrand wrote:
>> On 25.06.24 07:00, Hugh Dickins wrote:
>>> Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
>>> false" has extended folio_add_new_anon_rmap() to use on non-exclusive
>>> folios, already visible to others in swap cache and on LRU.
>>>
>>> That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
>>> overwriting concurrent atomic operations on folio->flags, losing bits
>>> added or restoring bits cleared. Since it's only used in this risky
>>> way when folio_test_locked and !folio_test_anon, many such races are
>>> excluded; but, for example, isolations by folio_test_clear_lru() are
>>> vulnerable, and setting or clearing active.
>>>
>>> It could just use the atomic folio_set_swapbacked(); but this function
>>> does try to avoid atomics where it can, so use a branch instead: just
>>> avoid setting swapbacked when it is already set, that is good enough.
>>> (Swapbacked is normally stable once set: lazyfree can undo it, but
>>> only later, when found anon in a page table.)
>>>
>>> This fixes a lot of instability under compaction and swapping loads:
>>> assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
>>> frees - though I've not worked out what races could lead to the latter.
>>>
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>> ---
>>> mm/rmap.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index df1a43295c85..5394c1178bf1 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio,
>>> struct vm_area_struct *vma,
>>> VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>>> VM_BUG_ON_VMA(address < vma->vm_start ||
>>> address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>> - __folio_set_swapbacked(folio);
>>> +
>>> + if (!folio_test_swapbacked(folio))
>>> + __folio_set_swapbacked(folio);
>>> __folio_set_anon(folio, vma, address, exclusive);
>>>
>>> if (likely(!folio_test_large(folio))) {
>>
>> LGTM.
>>
>> I'll point out that it's sufficient for a PFN walker to do a tryget + trylock
>> to cause trouble.
>
> That surprises me. I thought a racer's tryget was irrelevant (touching
> a different field) and its trylock not a problem, since "we" hold the
> folio lock throughout. If my mental model is too naive there, please
> explain in more detail: we all need to understand this better.
Sorry, I was imprecise.
tryget+trylock should indeed not be a problem, tryget+lock would be,
because IIRC folio_wait_bit_common()->folio_set_waiters() would be
messing with folio flags.
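
A rough sketch of that interleaving (illustrative only, simplified flag
values, not the real folio lock implementation): the sleeper's PG_waiters
could be wiped by the non-atomic write-back, so the later folio_unlock()
would see no waiters and skip the wakeup:

	#include <stdio.h>

	#define PG_locked	(1UL << 0)
	#define PG_waiters	(1UL << 1)
	#define PG_swapbacked	(1UL << 2)

	int main(void)
	{
		unsigned long flags = PG_locked; /* "we" hold the folio lock */

		/* CPU 0: __folio_set_swapbacked() snapshots flags ... */
		unsigned long old = flags;

		/*
		 * CPU 1: PFN walker's folio_lock() finds the folio locked, so
		 * folio_wait_bit_common() sets PG_waiters and goes to sleep.
		 */
		flags |= PG_waiters;

		/* CPU 0: ... writes back the stale snapshot: PG_waiters erased. */
		flags = old | PG_swapbacked;

		/* CPU 0: folio_unlock() then sees no PG_waiters and skips wake_up. */
		printf("folio_unlock() would %s\n",
		       (flags & PG_waiters) ? "wake the sleeper"
					    : "miss the wakeup");
		return 0;
	}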
--
Cheers,
David / dhildenb
On Tue, 25 Jun 2024, David Hildenbrand wrote:
> On 25.06.24 21:37, Hugh Dickins wrote:
> > On Tue, 25 Jun 2024, David Hildenbrand wrote:
> >>
> >> I'll point out that it's sufficient for a PFN walker to do a tryget +
> >> trylock to cause trouble.
> >
> > That surprises me. I thought a racer's tryget was irrelevant (touching
> > a different field) and its trylock not a problem, since "we" hold the
> > folio lock throughout. If my mental model is too naive there, please
> > explain in more detail: we all need to understand this better.
>
> Sorry, I was imprecise.
>
> tryget+trylock should indeed not be a problem, tryget+lock would be,
> because IIRC folio_wait_bit_common()->folio_set_waiters() would be
> messing with folio flags.

Interesting observation, thanks. I had imagined that a folio locker was
safe, but I think you're right that (before the fix) this could have
erased its PG_waiters.

Typically, I guess something else would come along sooner or later to
lock the folio, and that would succeed in waking up the earlier one: so
probably not an issue that would be detected in testing, but not good.

Hugh
On Tue, Jun 25, 2024 at 5:00 PM Hugh Dickins <hughd@google.com> wrote:
>
> Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
> false" has extended folio_add_new_anon_rmap() to use on non-exclusive
> folios, already visible to others in swap cache and on LRU.
>
> That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
> overwriting concurrent atomic operations on folio->flags, losing bits
> added or restoring bits cleared. Since it's only used in this risky
> way when folio_test_locked and !folio_test_anon, many such races are
> excluded; but, for example, isolations by folio_test_clear_lru() are
> vulnerable, and setting or clearing active.
>
> It could just use the atomic folio_set_swapbacked(); but this function
> does try to avoid atomics where it can, so use a branch instead: just
> avoid setting swapbacked when it is already set, that is good enough.
> (Swapbacked is normally stable once set: lazyfree can undo it, but
> only later, when found anon in a page table.)
>
> This fixes a lot of instability under compaction and swapping loads:
> assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
> frees - though I've not worked out what races could lead to the latter.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
Thanks a lot, Hugh. Sorry for my mistake. I guess we should squash this into
patch 1/3 "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio) ==
false"?
Andrew, could you please help to squash this one?
> ---
> mm/rmap.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index df1a43295c85..5394c1178bf1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_VMA(address < vma->vm_start ||
> address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> - __folio_set_swapbacked(folio);
> +
> + if (!folio_test_swapbacked(folio))
> + __folio_set_swapbacked(folio);
> __folio_set_anon(folio, vma, address, exclusive);
>
> if (likely(!folio_test_large(folio))) {
> --
> 2.35.3
>
Thanks
Barry
On 2024/6/25 13:55, Barry Song wrote:
> On Tue, Jun 25, 2024 at 5:00 PM Hugh Dickins <hughd@google.com> wrote:
>>
>> Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
>> false" has extended folio_add_new_anon_rmap() to use on non-exclusive
>> folios, already visible to others in swap cache and on LRU.
>>
>> That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
>> overwriting concurrent atomic operations on folio->flags, losing bits
>> added or restoring bits cleared. Since it's only used in this risky
>> way when folio_test_locked and !folio_test_anon, many such races are
>> excluded; but, for example, isolations by folio_test_clear_lru() are
>> vulnerable, and setting or clearing active.
>>
>> It could just use the atomic folio_set_swapbacked(); but this function
>> does try to avoid atomics where it can, so use a branch instead: just
>> avoid setting swapbacked when it is already set, that is good enough.
>> (Swapbacked is normally stable once set: lazyfree can undo it, but
>> only later, when found anon in a page table.)
>>
>> This fixes a lot of instability under compaction and swapping loads:
>> assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
>> frees - though I've not worked out what races could lead to the latter.
>>
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>
> Thanks a lot, Hugh. Sorry for my mistake. I guess we should squash this into
> patch 1/3 "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio) ==
> false"?
> Andrew, could you please help to squash this one?
Hope the commit message written by Hugh can also be squashed into the
original patch, as it is very helpful to me :)
>> ---
>> mm/rmap.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index df1a43295c85..5394c1178bf1 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>> VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>> VM_BUG_ON_VMA(address < vma->vm_start ||
>> address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>> - __folio_set_swapbacked(folio);
>> +
>> + if (!folio_test_swapbacked(folio))
>> + __folio_set_swapbacked(folio);
>> __folio_set_anon(folio, vma, address, exclusive);
>>
>> if (likely(!folio_test_large(folio))) {
>> --
>> 2.35.3
>>
>
> Thanks
> Barry