Now that swap supports storing all mTHP sizes, avoid splitting large
folios before swap-out. This benefits performance of the swap-out path
by eliding split_folio_to_list(), which is expensive, and also sets us
up for swapping in large folios in a future series.
If the folio is partially mapped, we continue to split it since we want
to avoid the extra IO overhead and storage of writing out pages
unnecessarily.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
mm/vmscan.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cf7d4cf47f1a..0ebec99e04c6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
if (!can_split_folio(folio, NULL))
goto activate_locked;
/*
- * Split folios without a PMD map right
- * away. Chances are some or all of the
- * tail pages can be freed without IO.
+ * Split partially mapped folios
+ * right away. Chances are some or all
+ * of the tail pages can be freed
+ * without IO.
*/
- if (!folio_entire_mapcount(folio) &&
+ if (!list_empty(&folio->_deferred_list) &&
split_folio_to_list(folio,
folio_list))
goto activate_locked;
--
2.25.1
On 11.03.24 16:00, Ryan Roberts wrote:
> Now that swap supports storing all mTHP sizes, avoid splitting large
> folios before swap-out. This benefits performance of the swap-out path
> by eliding split_folio_to_list(), which is expensive, and also sets us
> up for swapping in large folios in a future series.
>
> If the folio is partially mapped, we continue to split it since we want
> to avoid the extra IO overhead and storage of writing out pages
> uneccessarily.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> mm/vmscan.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cf7d4cf47f1a..0ebec99e04c6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> if (!can_split_folio(folio, NULL))
> goto activate_locked;
> /*
> - * Split folios without a PMD map right
> - * away. Chances are some or all of the
> - * tail pages can be freed without IO.
> + * Split partially mapped folios map
> + * right away. Chances are some or all
> + * of the tail pages can be freed
> + * without IO.
> */
> - if (!folio_entire_mapcount(folio) &&
> + if (!list_empty(&folio->_deferred_list) &&
> split_folio_to_list(folio,
> folio_list))
> goto activate_locked;

Not sure if we might have to annotate that with data_race().

Reviewed-by: David Hildenbrand <david@redhat.com>

--
Cheers,

David / dhildenb
On 15/03/2024 10:43, David Hildenbrand wrote:
> On 11.03.24 16:00, Ryan Roberts wrote:
>> Now that swap supports storing all mTHP sizes, avoid splitting large
>> folios before swap-out. This benefits performance of the swap-out path
>> by eliding split_folio_to_list(), which is expensive, and also sets us
>> up for swapping in large folios in a future series.
>>
>> If the folio is partially mapped, we continue to split it since we want
>> to avoid the extra IO overhead and storage of writing out pages
>> uneccessarily.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>> mm/vmscan.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index cf7d4cf47f1a..0ebec99e04c6 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head
>> *folio_list,
>> if (!can_split_folio(folio, NULL))
>> goto activate_locked;
>> /*
>> - * Split folios without a PMD map right
>> - * away. Chances are some or all of the
>> - * tail pages can be freed without IO.
>> + * Split partially mapped folios map
>> + * right away. Chances are some or all
>> + * of the tail pages can be freed
>> + * without IO.
>> */
>> - if (!folio_entire_mapcount(folio) &&
>> + if (!list_empty(&folio->_deferred_list) &&
>> split_folio_to_list(folio,
>> folio_list))
>> goto activate_locked;
>
> Not sure if we might have to annotate that with data_race().
I asked that exact question to Matthew in another context but didn't get a
response. There are examples of checking if the deferred list is empty with and
without data_race() in the code base. But list_empty() is implemented like this:
static inline int list_empty(const struct list_head *head)
{
	return READ_ONCE(head->next) == head;
}
So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not
sufficient for KCSAN?
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
>
Thanks!
On 15.03.24 11:49, Ryan Roberts wrote:
> On 15/03/2024 10:43, David Hildenbrand wrote:
>> On 11.03.24 16:00, Ryan Roberts wrote:
>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>> folios before swap-out. This benefits performance of the swap-out path
>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>> up for swapping in large folios in a future series.
>>>
>>> If the folio is partially mapped, we continue to split it since we want
>>> to avoid the extra IO overhead and storage of writing out pages
>>> uneccessarily.
>>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>> mm/vmscan.c | 9 +++++----
>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head
>>> *folio_list,
>>> if (!can_split_folio(folio, NULL))
>>> goto activate_locked;
>>> /*
>>> - * Split folios without a PMD map right
>>> - * away. Chances are some or all of the
>>> - * tail pages can be freed without IO.
>>> + * Split partially mapped folios map
>>> + * right away. Chances are some or all
>>> + * of the tail pages can be freed
>>> + * without IO.
>>> */
>>> - if (!folio_entire_mapcount(folio) &&
>>> + if (!list_empty(&folio->_deferred_list) &&
>>> split_folio_to_list(folio,
>>> folio_list))
>>> goto activate_locked;
>>
>> Not sure if we might have to annotate that with data_race().
>
> I asked that exact question to Matthew in another context bt didn't get a
> response. There are examples of checking if the deferred list is empty with and
> without data_race() in the code base. But list_empty() is implemented like this:
>
> static inline int list_empty(const struct list_head *head)
> {
> return READ_ONCE(head->next) == head;
> }
>
> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not
> sufficient for KCSAN?
Yeah, there is only one use of data_race with that list.
It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is
in deferred list").
Looks like that was added right in v1 of that change [1], so my best
guess is that it is not actually required.
If not required, likely we should just cleanup the single user.
[1]
https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
--
Cheers,
David / dhildenb
Hi Yin Fengwei,
On 15/03/2024 11:12, David Hildenbrand wrote:
> On 15.03.24 11:49, Ryan Roberts wrote:
>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>> folios before swap-out. This benefits performance of the swap-out path
>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>> up for swapping in large folios in a future series.
>>>>
>>>> If the folio is partially mapped, we continue to split it since we want
>>>> to avoid the extra IO overhead and storage of writing out pages
>>>> uneccessarily.
>>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>> mm/vmscan.c | 9 +++++----
>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head
>>>> *folio_list,
>>>> if (!can_split_folio(folio, NULL))
>>>> goto activate_locked;
>>>> /*
>>>> - * Split folios without a PMD map right
>>>> - * away. Chances are some or all of the
>>>> - * tail pages can be freed without IO.
>>>> + * Split partially mapped folios map
>>>> + * right away. Chances are some or all
>>>> + * of the tail pages can be freed
>>>> + * without IO.
>>>> */
>>>> - if (!folio_entire_mapcount(folio) &&
>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>> split_folio_to_list(folio,
>>>> folio_list))
>>>> goto activate_locked;
>>>
>>> Not sure if we might have to annotate that with data_race().
>>
>> I asked that exact question to Matthew in another context bt didn't get a
>> response. There are examples of checking if the deferred list is empty with and
>> without data_race() in the code base. But list_empty() is implemented like this:
>>
>> static inline int list_empty(const struct list_head *head)
>> {
>> return READ_ONCE(head->next) == head;
>> }
>>
>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not
>> sufficient for KCSAN?
>
> Yeah, there is only one use of data_race with that list.
>
> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
> deferred list").
>
> Looks like that was added right in v1 of that change [1], so my best guess is
> that it is not actually required.
>
> If not required, likely we should just cleanup the single user.
>
> [1]
> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
Do you have any recollection of why you added the data_race() markup?
Thanks,
Ryan
>
Ryan Roberts <ryan.roberts@arm.com> writes:
> Hi Yin Fengwei,
>
> On 15/03/2024 11:12, David Hildenbrand wrote:
>> On 15.03.24 11:49, Ryan Roberts wrote:
>>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>>> folios before swap-out. This benefits performance of the swap-out path
>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>>> up for swapping in large folios in a future series.
>>>>>
>>>>> If the folio is partially mapped, we continue to split it since we want
>>>>> to avoid the extra IO overhead and storage of writing out pages
>>>>> uneccessarily.
>>>>>
>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>> ---
>>>>> mm/vmscan.c | 9 +++++----
>>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head
>>>>> *folio_list,
>>>>> if (!can_split_folio(folio, NULL))
>>>>> goto activate_locked;
>>>>> /*
>>>>> - * Split folios without a PMD map right
>>>>> - * away. Chances are some or all of the
>>>>> - * tail pages can be freed without IO.
>>>>> + * Split partially mapped folios map
>>>>> + * right away. Chances are some or all
>>>>> + * of the tail pages can be freed
>>>>> + * without IO.
>>>>> */
>>>>> - if (!folio_entire_mapcount(folio) &&
>>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>>> split_folio_to_list(folio,
>>>>> folio_list))
>>>>> goto activate_locked;
>>>>
>>>> Not sure if we might have to annotate that with data_race().
>>>
>>> I asked that exact question to Matthew in another context bt didn't get a
>>> response. There are examples of checking if the deferred list is empty with and
>>> without data_race() in the code base. But list_empty() is implemented like this:
>>>
>>> static inline int list_empty(const struct list_head *head)
>>> {
>>> return READ_ONCE(head->next) == head;
>>> }
>>>
>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not
>>> sufficient for KCSAN?
>>
>> Yeah, there is only one use of data_race with that list.
>>
>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>> deferred list").
>>
>> Looks like that was added right in v1 of that change [1], so my best guess is
>> that it is not actually required.
>>
>> If not required, likely we should just cleanup the single user.
>>
>> [1]
>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>
> Do you have any recollection of why you added the data_race() markup?
Per my understanding, this is used to mark that the code accesses
folio->_deferred_list without lock intentionally, while
folio->_deferred_list may be changed in parallel. IIUC, this is what
data_race() is used for. Or, my understanding is wrong?
--
Best Regards,
Huang, Ying
On 3/18/2024 10:16 AM, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
>
>> Hi Yin Fengwei,
>>
>> On 15/03/2024 11:12, David Hildenbrand wrote:
>>> On 15.03.24 11:49, Ryan Roberts wrote:
>>>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>>>> folios before swap-out. This benefits performance of the swap-out path
>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>>>> up for swapping in large folios in a future series.
>>>>>>
>>>>>> If the folio is partially mapped, we continue to split it since we want
>>>>>> to avoid the extra IO overhead and storage of writing out pages
>>>>>> uneccessarily.
>>>>>>
>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>> ---
>>>>>> mm/vmscan.c | 9 +++++----
>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>>>> --- a/mm/vmscan.c
>>>>>> +++ b/mm/vmscan.c
>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head
>>>>>> *folio_list,
>>>>>> if (!can_split_folio(folio, NULL))
>>>>>> goto activate_locked;
>>>>>> /*
>>>>>> - * Split folios without a PMD map right
>>>>>> - * away. Chances are some or all of the
>>>>>> - * tail pages can be freed without IO.
>>>>>> + * Split partially mapped folios map
>>>>>> + * right away. Chances are some or all
>>>>>> + * of the tail pages can be freed
>>>>>> + * without IO.
>>>>>> */
>>>>>> - if (!folio_entire_mapcount(folio) &&
>>>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>>>> split_folio_to_list(folio,
>>>>>> folio_list))
>>>>>> goto activate_locked;
>>>>>
>>>>> Not sure if we might have to annotate that with data_race().
>>>>
>>>> I asked that exact question to Matthew in another context bt didn't get a
>>>> response. There are examples of checking if the deferred list is empty with and
>>>> without data_race() in the code base. But list_empty() is implemented like this:
>>>>
>>>> static inline int list_empty(const struct list_head *head)
>>>> {
>>>> return READ_ONCE(head->next) == head;
>>>> }
>>>>
>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not
>>>> sufficient for KCSAN?
I don't think READ_ONCE() can replace the lock.
>>>
>>> Yeah, there is only one use of data_race with that list.
>>>
>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>>> deferred list").
>>>
>>> Looks like that was added right in v1 of that change [1], so my best guess is
>>> that it is not actually required.
>>>
>>> If not required, likely we should just cleanup the single user.
>>>
>>> [1]
>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>>
>> Do you have any recollection of why you added the data_race() markup?
>
> Per my understanding, this is used to mark that the code accesses
> folio->_deferred_list without lock intentionally, while
> folio->_deferred_list may be changed in parallel. IIUC, this is what
> data_race() is used for. Or, my understanding is wrong?
Yes. This is my understanding also.
Regards
Yin, Fengwei
>
> --
> Best Regards,
> Huang, Ying
On 18.03.24 11:00, Yin, Fengwei wrote:
>
>
> On 3/18/2024 10:16 AM, Huang, Ying wrote:
>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>
>>> Hi Yin Fengwei,
>>>
>>> On 15/03/2024 11:12, David Hildenbrand wrote:
>>>> On 15.03.24 11:49, Ryan Roberts wrote:
>>>>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>>>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>>>>> folios before swap-out. This benefits performance of the swap-out path
>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>>>>> up for swapping in large folios in a future series.
>>>>>>>
>>>>>>> If the folio is partially mapped, we continue to split it since we want
>>>>>>> to avoid the extra IO overhead and storage of writing out pages
>>>>>>> uneccessarily.
>>>>>>>
>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>> ---
>>>>>>> mm/vmscan.c | 9 +++++----
>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>>>>> --- a/mm/vmscan.c
>>>>>>> +++ b/mm/vmscan.c
>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head
>>>>>>> *folio_list,
>>>>>>> if (!can_split_folio(folio, NULL))
>>>>>>> goto activate_locked;
>>>>>>> /*
>>>>>>> - * Split folios without a PMD map right
>>>>>>> - * away. Chances are some or all of the
>>>>>>> - * tail pages can be freed without IO.
>>>>>>> + * Split partially mapped folios map
>>>>>>> + * right away. Chances are some or all
>>>>>>> + * of the tail pages can be freed
>>>>>>> + * without IO.
>>>>>>> */
>>>>>>> - if (!folio_entire_mapcount(folio) &&
>>>>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>>>>> split_folio_to_list(folio,
>>>>>>> folio_list))
>>>>>>> goto activate_locked;
>>>>>>
>>>>>> Not sure if we might have to annotate that with data_race().
>>>>>
>>>>> I asked that exact question to Matthew in another context bt didn't get a
>>>>> response. There are examples of checking if the deferred list is empty with and
>>>>> without data_race() in the code base. But list_empty() is implemented like this:
>>>>>
>>>>> static inline int list_empty(const struct list_head *head)
>>>>> {
>>>>> return READ_ONCE(head->next) == head;
>>>>> }
>>>>>
>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not
>>>>> sufficient for KCSAN?
> I don't think READ_ONCE() can replace the lock.
>
>>>>
>>>> Yeah, there is only one use of data_race with that list.
>>>>
>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>>>> deferred list").
>>>>
>>>> Looks like that was added right in v1 of that change [1], so my best guess is
>>>> that it is not actually required.
>>>>
>>>> If not required, likely we should just cleanup the single user.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>>>
>>> Do you have any recollection of why you added the data_race() markup?
>>
>> Per my understanding, this is used to mark that the code accesses
>> folio->_deferred_list without lock intentionally, while
>> folio->_deferred_list may be changed in parallel. IIUC, this is what
>> data_race() is used for. Or, my understanding is wrong?
> Yes. This is my understanding also.
Why don't we have a data_race() in deferred_split_folio() then, before
taking the lock?
It's used a bit inconsistently here.
--
Cheers,
David / dhildenb
On 3/18/24 18:05, David Hildenbrand wrote:
> On 18.03.24 11:00, Yin, Fengwei wrote:
>>
>>
>> On 3/18/2024 10:16 AM, Huang, Ying wrote:
>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>
>>>> Hi Yin Fengwei,
>>>>
>>>> On 15/03/2024 11:12, David Hildenbrand wrote:
>>>>> On 15.03.24 11:49, Ryan Roberts wrote:
>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>>>>>> folios before swap-out. This benefits performance of the swap-out path
>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>>>>>> up for swapping in large folios in a future series.
>>>>>>>>
>>>>>>>> If the folio is partially mapped, we continue to split it since we want
>>>>>>>> to avoid the extra IO overhead and storage of writing out pages
>>>>>>>> uneccessarily.
>>>>>>>>
>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>>> ---
>>>>>>>> mm/vmscan.c | 9 +++++----
>>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>>>>>> --- a/mm/vmscan.c
>>>>>>>> +++ b/mm/vmscan.c
>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head
>>>>>>>> *folio_list,
>>>>>>>> if (!can_split_folio(folio, NULL))
>>>>>>>> goto activate_locked;
>>>>>>>> /*
>>>>>>>> - * Split folios without a PMD map right
>>>>>>>> - * away. Chances are some or all of the
>>>>>>>> - * tail pages can be freed without IO.
>>>>>>>> + * Split partially mapped folios map
>>>>>>>> + * right away. Chances are some or all
>>>>>>>> + * of the tail pages can be freed
>>>>>>>> + * without IO.
>>>>>>>> */
>>>>>>>> - if (!folio_entire_mapcount(folio) &&
>>>>>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>>>>>> split_folio_to_list(folio,
>>>>>>>> folio_list))
>>>>>>>> goto activate_locked;
>>>>>>>
>>>>>>> Not sure if we might have to annotate that with data_race().
>>>>>>
>>>>>> I asked that exact question to Matthew in another context bt didn't get a
>>>>>> response. There are examples of checking if the deferred list is empty with and
>>>>>> without data_race() in the code base. But list_empty() is implemented like this:
>>>>>>
>>>>>> static inline int list_empty(const struct list_head *head)
>>>>>> {
>>>>>> return READ_ONCE(head->next) == head;
>>>>>> }
>>>>>>
>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not
>>>>>> sufficient for KCSAN?
>> I don't think READ_ONCE() can replace the lock.
>>
>>>>>
>>>>> Yeah, there is only one use of data_race with that list.
>>>>>
>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>>>>> deferred list").
>>>>>
>>>>> Looks like that was added right in v1 of that change [1], so my best guess is
>>>>> that it is not actually required.
>>>>>
>>>>> If not required, likely we should just cleanup the single user.
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>>>>
>>>> Do you have any recollection of why you added the data_race() markup?
>>>
>>> Per my understanding, this is used to mark that the code accesses
>>> folio->_deferred_list without lock intentionally, while
>>> folio->_deferred_list may be changed in parallel. IIUC, this is what
>>> data_race() is used for. Or, my understanding is wrong?
>> Yes. This is my understanding also.
>
> Why don't we have a data_race() in deferred_split_folio() then, before taking the lock?
No idea why there is no data_race() added. But I think we should add data_race().
Regards
Yin, Fengwei
>
> It's used a bit inconsistently here.
>
On 18/03/2024 10:05, David Hildenbrand wrote:
> On 18.03.24 11:00, Yin, Fengwei wrote:
>>
>>
>> On 3/18/2024 10:16 AM, Huang, Ying wrote:
>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>
>>>> Hi Yin Fengwei,
>>>>
>>>> On 15/03/2024 11:12, David Hildenbrand wrote:
>>>>> On 15.03.24 11:49, Ryan Roberts wrote:
>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>>>>>> folios before swap-out. This benefits performance of the swap-out path
>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>>>>>> up for swapping in large folios in a future series.
>>>>>>>>
>>>>>>>> If the folio is partially mapped, we continue to split it since we want
>>>>>>>> to avoid the extra IO overhead and storage of writing out pages
>>>>>>>> uneccessarily.
>>>>>>>>
>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>>> ---
>>>>>>>> mm/vmscan.c | 9 +++++----
>>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>>>>>> --- a/mm/vmscan.c
>>>>>>>> +++ b/mm/vmscan.c
>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct
>>>>>>>> list_head
>>>>>>>> *folio_list,
>>>>>>>> if (!can_split_folio(folio, NULL))
>>>>>>>> goto activate_locked;
>>>>>>>> /*
>>>>>>>> - * Split folios without a PMD map right
>>>>>>>> - * away. Chances are some or all of the
>>>>>>>> - * tail pages can be freed without IO.
>>>>>>>> + * Split partially mapped folios map
>>>>>>>> + * right away. Chances are some or all
>>>>>>>> + * of the tail pages can be freed
>>>>>>>> + * without IO.
>>>>>>>> */
>>>>>>>> - if (!folio_entire_mapcount(folio) &&
>>>>>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>>>>>> split_folio_to_list(folio,
>>>>>>>> folio_list))
>>>>>>>> goto activate_locked;
>>>>>>>
>>>>>>> Not sure if we might have to annotate that with data_race().
>>>>>>
>>>>>> I asked that exact question to Matthew in another context bt didn't get a
>>>>>> response. There are examples of checking if the deferred list is empty
>>>>>> with and
>>>>>> without data_race() in the code base. But list_empty() is implemented like
>>>>>> this:
>>>>>>
>>>>>> static inline int list_empty(const struct list_head *head)
>>>>>> {
>>>>>> return READ_ONCE(head->next) == head;
>>>>>> }
>>>>>>
>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps
>>>>>> not
>>>>>> sufficient for KCSAN?
>> I don't think READ_ONCE() can replace the lock.
But it does ensure we get a consistent value and that the compiler orders the
load correctly. There are lots of patterns in the kernel that use READ_ONCE()
without a lock and they don't use data_race() - e.g. ptep_get_lockless().
It sounds like none of us really understand what data_race() is for, so I guess
I'll just do a KCSAN build and invoke the code path to see if it complains.
>>
>>>>>
>>>>> Yeah, there is only one use of data_race with that list.
>>>>>
>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>>>>> deferred list").
>>>>>
>>>>> Looks like that was added right in v1 of that change [1], so my best guess is
>>>>> that it is not actually required.
>>>>>
>>>>> If not required, likely we should just cleanup the single user.
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>>>>
>>>> Do you have any recollection of why you added the data_race() markup?
>>>
>>> Per my understanding, this is used to mark that the code accesses
>>> folio->_deferred_list without lock intentionally, while
>>> folio->_deferred_list may be changed in parallel. IIUC, this is what
>>> data_race() is used for. Or, my understanding is wrong?
>> Yes. This is my understanding also.
>
> Why don't we have a data_race() in deferred_split_folio() then, before taking
> the lock?
>
> It's used a bit inconsistently here.
>
On 3/18/24 23:35, Ryan Roberts wrote:
> On 18/03/2024 10:05, David Hildenbrand wrote:
>> On 18.03.24 11:00, Yin, Fengwei wrote:
>>>
>>>
>>> On 3/18/2024 10:16 AM, Huang, Ying wrote:
>>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>>
>>>>> Hi Yin Fengwei,
>>>>>
>>>>> On 15/03/2024 11:12, David Hildenbrand wrote:
>>>>>> On 15.03.24 11:49, Ryan Roberts wrote:
>>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>>>>>>> folios before swap-out. This benefits performance of the swap-out path
>>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>>>>>>> up for swapping in large folios in a future series.
>>>>>>>>>
>>>>>>>>> If the folio is partially mapped, we continue to split it since we want
>>>>>>>>> to avoid the extra IO overhead and storage of writing out pages
>>>>>>>>> uneccessarily.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>>>> ---
>>>>>>>>> mm/vmscan.c | 9 +++++----
>>>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>>>>>>> --- a/mm/vmscan.c
>>>>>>>>> +++ b/mm/vmscan.c
>>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct
>>>>>>>>> list_head
>>>>>>>>> *folio_list,
>>>>>>>>> if (!can_split_folio(folio, NULL))
>>>>>>>>> goto activate_locked;
>>>>>>>>> /*
>>>>>>>>> - * Split folios without a PMD map right
>>>>>>>>> - * away. Chances are some or all of the
>>>>>>>>> - * tail pages can be freed without IO.
>>>>>>>>> + * Split partially mapped folios map
>>>>>>>>> + * right away. Chances are some or all
>>>>>>>>> + * of the tail pages can be freed
>>>>>>>>> + * without IO.
>>>>>>>>> */
>>>>>>>>> - if (!folio_entire_mapcount(folio) &&
>>>>>>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>>>>>>> split_folio_to_list(folio,
>>>>>>>>> folio_list))
>>>>>>>>> goto activate_locked;
>>>>>>>>
>>>>>>>> Not sure if we might have to annotate that with data_race().
>>>>>>>
>>>>>>> I asked that exact question to Matthew in another context bt didn't get a
>>>>>>> response. There are examples of checking if the deferred list is empty
>>>>>>> with and
>>>>>>> without data_race() in the code base. But list_empty() is implemented like
>>>>>>> this:
>>>>>>>
>>>>>>> static inline int list_empty(const struct list_head *head)
>>>>>>> {
>>>>>>> return READ_ONCE(head->next) == head;
>>>>>>> }
>>>>>>>
>>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps
>>>>>>> not
>>>>>>> sufficient for KCSAN?
>>> I don't think READ_ONCE() can replace the lock.
>
> But it doesn't ensure we get a consistent value and that the compiler orders the
> load correctly. There are lots of patterns in the kernel that use READ_ONCE()
> without a lock and they don't use data_race() - e.g. ptep_get_lockless().
They (ptep_get_lockless() and deferred_list) have different access patterns
(or race patterns) here. I don't think they are comparable.
>
> It sounds like none of us really understand what data_race() is for, so I guess
> I'll just do a KCSAN build and invoke the code path to see if it complains.
READ_ONCE() in list_empty() will silence the KCSAN report, too.
>
>
>>>
>>>>>>
>>>>>> Yeah, there is only one use of data_race with that list.
>>>>>>
>>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>>>>>> deferred list").
>>>>>>
>>>>>> Looks like that was added right in v1 of that change [1], so my best guess is
>>>>>> that it is not actually required.
>>>>>>
>>>>>> If not required, likely we should just cleanup the single user.
>>>>>>
>>>>>> [1]
>>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>>>>>
>>>>> Do you have any recollection of why you added the data_race() markup?
>>>>
>>>> Per my understanding, this is used to mark that the code accesses
>>>> folio->_deferred_list without lock intentionally, while
>>>> folio->_deferred_list may be changed in parallel. IIUC, this is what
>>>> data_race() is used for. Or, my understanding is wrong?
>>> Yes. This is my understanding also.
>>
>> Why don't we have a data_race() in deferred_split_folio() then, before taking
>> the lock?
>>
>> It's used a bit inconsistently here.
>>
>
On 19/03/2024 02:20, Yin Fengwei wrote:
>
>
> On 3/18/24 23:35, Ryan Roberts wrote:
>> On 18/03/2024 10:05, David Hildenbrand wrote:
>>> On 18.03.24 11:00, Yin, Fengwei wrote:
>>>>
>>>>
>>>> On 3/18/2024 10:16 AM, Huang, Ying wrote:
>>>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>>>
>>>>>> Hi Yin Fengwei,
>>>>>>
>>>>>> On 15/03/2024 11:12, David Hildenbrand wrote:
>>>>>>> On 15.03.24 11:49, Ryan Roberts wrote:
>>>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>>>>>>>> folios before swap-out. This benefits performance of the swap-out path
>>>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>>>>>>>> up for swapping in large folios in a future series.
>>>>>>>>>>
>>>>>>>>>> If the folio is partially mapped, we continue to split it since we want
>>>>>>>>>> to avoid the extra IO overhead and storage of writing out pages
>>>>>>>>>> uneccessarily.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>>>>> ---
>>>>>>>>>> mm/vmscan.c | 9 +++++----
>>>>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>>>>>>>> --- a/mm/vmscan.c
>>>>>>>>>> +++ b/mm/vmscan.c
>>>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct
>>>>>>>>>> list_head
>>>>>>>>>> *folio_list,
>>>>>>>>>> if (!can_split_folio(folio, NULL))
>>>>>>>>>> goto activate_locked;
>>>>>>>>>> /*
>>>>>>>>>> - * Split folios without a PMD map right
>>>>>>>>>> - * away. Chances are some or all of the
>>>>>>>>>> - * tail pages can be freed without IO.
>>>>>>>>>> + * Split partially mapped folios map
>>>>>>>>>> + * right away. Chances are some or all
>>>>>>>>>> + * of the tail pages can be freed
>>>>>>>>>> + * without IO.
>>>>>>>>>> */
>>>>>>>>>> - if (!folio_entire_mapcount(folio) &&
>>>>>>>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>>>>>>>> split_folio_to_list(folio,
>>>>>>>>>> folio_list))
>>>>>>>>>> goto activate_locked;
>>>>>>>>>
>>>>>>>>> Not sure if we might have to annotate that with data_race().
>>>>>>>>
>>>>>>>> I asked that exact question to Matthew in another context bt didn't get a
>>>>>>>> response. There are examples of checking if the deferred list is empty
>>>>>>>> with and
>>>>>>>> without data_race() in the code base. But list_empty() is implemented like
>>>>>>>> this:
>>>>>>>>
>>>>>>>> static inline int list_empty(const struct list_head *head)
>>>>>>>> {
>>>>>>>> return READ_ONCE(head->next) == head;
>>>>>>>> }
>>>>>>>>
>>>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps
>>>>>>>> not
>>>>>>>> sufficient for KCSAN?
>>>> I don't think READ_ONCE() can replace the lock.
>>
>> But it doesn't ensure we get a consistent value and that the compiler orders the
>> load correctly. There are lots of patterns in the kernel that use READ_ONCE()
>> without a lock and they don't use data_race() - e.g. ptep_get_lockless().
> They (ptep_get_lockless() and deferred_list) have different access pattern
> (or race pattern) here. I don't think they are comparable.
>
>>
>> It sounds like none of us really understand what data_race() is for, so I guess
>> I'll just do a KCSAN build and invoke the code path to see if it complains.
> READ_ONCE() in list_empty will shutdown the KCSAN also.
OK, I found some time to run the test with KCSAN; nothing fires.
But then I read the docs and looked at the code a bit.
Documentation/dev-tools/kcsan.rst states:
In an execution, two memory accesses form a *data race* if they *conflict*,
they happen concurrently in different threads, and at least one of them is a
*plain access*; they *conflict* if both access the same memory location, and
at least one is a write.
It also clarifies that READ_ONCE() is a "marked access". So we would have a
data race if there was a concurrent, *plain* write to
folio->_deferred_list.next.
This can occur in a couple of places I believe, for example:
deferred_split_folio()
  list_add_tail()
    __list_add()
      new->next = next;

deferred_split_scan()
  list_move()
    list_add()
      __list_add()
        new->next = next;
So if either deferred_split_folio() or deferred_split_scan() can run
concurrently with shrink_folio_list() for the same folio (I believe both
can), then we have a race, and this list_empty() check needs to be
protected with data_race(). The race is safe/by design, but it does need
to be marked.
I'll fix this in my next version.
Thanks,
Ryan
>
>>
>>
>>>>
>>>>>>>
>>>>>>> Yeah, there is only one use of data_race with that list.
>>>>>>>
>>>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>>>>>>> deferred list").
>>>>>>>
>>>>>>> Looks like that was added right in v1 of that change [1], so my best guess is
>>>>>>> that it is not actually required.
>>>>>>>
>>>>>>> If not required, likely we should just cleanup the single user.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>>>>>>
>>>>>> Do you have any recollection of why you added the data_race() markup?
>>>>>
>>>>> Per my understanding, this is used to mark that the code accesses
>>>>> folio->_deferred_list without lock intentionally, while
>>>>> folio->_deferred_list may be changed in parallel. IIUC, this is what
>>>>> data_race() is used for. Or, my understanding is wrong?
>>>> Yes. This is my understanding also.
>>>
>>> Why don't we have a data_race() in deferred_split_folio() then, before taking
>>> the lock?
>>>
>>> It's used a bit inconsistently here.
>>>
>>
On 18/03/2024 15:35, Ryan Roberts wrote:
> On 18/03/2024 10:05, David Hildenbrand wrote:
>> On 18.03.24 11:00, Yin, Fengwei wrote:
>>>
>>>
>>> On 3/18/2024 10:16 AM, Huang, Ying wrote:
>>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>>
>>>>> Hi Yin Fengwei,
>>>>>
>>>>> On 15/03/2024 11:12, David Hildenbrand wrote:
>>>>>> On 15.03.24 11:49, Ryan Roberts wrote:
>>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote:
>>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote:
>>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large
>>>>>>>>> folios before swap-out. This benefits performance of the swap-out path
>>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us
>>>>>>>>> up for swapping in large folios in a future series.
>>>>>>>>>
>>>>>>>>> If the folio is partially mapped, we continue to split it since we want
>>>>>>>>> to avoid the extra IO overhead and storage of writing out pages
>>>>>>>>> uneccessarily.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>>>> ---
>>>>>>>>> mm/vmscan.c | 9 +++++----
>>>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644
>>>>>>>>> --- a/mm/vmscan.c
>>>>>>>>> +++ b/mm/vmscan.c
>>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct
>>>>>>>>> list_head
>>>>>>>>> *folio_list,
>>>>>>>>> if (!can_split_folio(folio, NULL))
>>>>>>>>> goto activate_locked;
>>>>>>>>> /*
>>>>>>>>> - * Split folios without a PMD map right
>>>>>>>>> - * away. Chances are some or all of the
>>>>>>>>> - * tail pages can be freed without IO.
>>>>>>>>> + * Split partially mapped folios map
>>>>>>>>> + * right away. Chances are some or all
>>>>>>>>> + * of the tail pages can be freed
>>>>>>>>> + * without IO.
>>>>>>>>> */
>>>>>>>>> - if (!folio_entire_mapcount(folio) &&
>>>>>>>>> + if (!list_empty(&folio->_deferred_list) &&
>>>>>>>>> split_folio_to_list(folio,
>>>>>>>>> folio_list))
>>>>>>>>> goto activate_locked;
>>>>>>>>
>>>>>>>> Not sure if we might have to annotate that with data_race().
>>>>>>>
>>>>>>> I asked that exact question to Matthew in another context bt didn't get a
>>>>>>> response. There are examples of checking if the deferred list is empty
>>>>>>> with and
>>>>>>> without data_race() in the code base. But list_empty() is implemented like
>>>>>>> this:
>>>>>>>
>>>>>>> static inline int list_empty(const struct list_head *head)
>>>>>>> {
>>>>>>> return READ_ONCE(head->next) == head;
>>>>>>> }
>>>>>>>
>>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps
>>>>>>> not
>>>>>>> sufficient for KCSAN?
>>> I don't think READ_ONCE() can replace the lock.
>
> But it doesn't ensure we get a consistent value and that the compiler orders the
Sorry - fat fingers... I meant it *does* ensure we get a consistent value (i.e.
untorn)
> load correctly. There are lots of patterns in the kernel that use READ_ONCE()
> without a lock and they don't use data_race() - e.g. ptep_get_lockless().
>
> It sounds like none of us really understand what data_race() is for, so I guess
> I'll just do a KCSAN build and invoke the code path to see if it complains.
>
>
>>>
>>>>>>
>>>>>> Yeah, there is only one use of data_race with that list.
>>>>>>
>>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>>>>>> deferred list").
>>>>>>
>>>>>> Looks like that was added right in v1 of that change [1], so my best guess is
>>>>>> that it is not actually required.
>>>>>>
>>>>>> If not required, likely we should just cleanup the single user.
>>>>>>
>>>>>> [1]
>>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>>>>>
>>>>> Do you have any recollection of why you added the data_race() markup?
>>>>
>>>> Per my understanding, this is used to mark that the code accesses
>>>> folio->_deferred_list without lock intentionally, while
>>>> folio->_deferred_list may be changed in parallel. IIUC, this is what
>>>> data_race() is used for. Or, my understanding is wrong?
>>> Yes. This is my understanding also.
>>
>> Why don't we have a data_race() in deferred_split_folio() then, before taking
>> the lock?
>>
>> It's used a bit inconsistently here.
>>
>
On Mon, Mar 11, 2024 at 11:01 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Now that swap supports storing all mTHP sizes, avoid splitting large
> folios before swap-out. This benefits performance of the swap-out path
> by eliding split_folio_to_list(), which is expensive, and also sets us
> up for swapping in large folios in a future series.
>
> If the folio is partially mapped, we continue to split it since we want
> to avoid the extra IO overhead and storage of writing out pages
> uneccessarily.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> mm/vmscan.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cf7d4cf47f1a..0ebec99e04c6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> if (!can_split_folio(folio, NULL))
> goto activate_locked;
> /*
> - * Split folios without a PMD map right
> - * away. Chances are some or all of the
> - * tail pages can be freed without IO.
> + * Split partially mapped folios map
> + * right away. Chances are some or all
> + * of the tail pages can be freed
> + * without IO.
> */
> - if (!folio_entire_mapcount(folio) &&
> + if (!list_empty(&folio->_deferred_list) &&

Hi Ryan,
After reconsidering our previous discussion about PMD-mapped large
folios, I've pondered the possibility of PMD-mapped Transparent Huge
Pages (THPs) being mapped by multiple processes. In such a scenario, if
one process decides to unmap a portion of the folio while others retain
the entire mapping, it raises questions about how the system should
handle this situation. Would the large folio be placed in a deferred
list? If so, splitting it might not yield benefits, as neither I/O nor
swap slots would increase in this case by not splitting it.

Regarding PTE-mapped large folios, the absence of an indicator like
"entire_map" makes it challenging to identify cases where the entire
folio is mapped. Thus, splitting seems to be the only viable solution
in such circumstances.

> split_folio_to_list(folio,
> folio_list))
> goto activate_locked;
> --
> 2.25.1

Thanks
Barry
On 11/03/2024 22:30, Barry Song wrote:
> On Mon, Mar 11, 2024 at 11:01 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Now that swap supports storing all mTHP sizes, avoid splitting large
>> folios before swap-out. This benefits performance of the swap-out path
>> by eliding split_folio_to_list(), which is expensive, and also sets us
>> up for swapping in large folios in a future series.
>>
>> If the folio is partially mapped, we continue to split it since we want
>> to avoid the extra IO overhead and storage of writing out pages
>> uneccessarily.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>> mm/vmscan.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index cf7d4cf47f1a..0ebec99e04c6 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> if (!can_split_folio(folio, NULL))
>> goto activate_locked;
>> /*
>> - * Split folios without a PMD map right
>> - * away. Chances are some or all of the
>> - * tail pages can be freed without IO.
>> + * Split partially mapped folios map
>> + * right away. Chances are some or all
>> + * of the tail pages can be freed
>> + * without IO.
>> */
>> - if (!folio_entire_mapcount(folio) &&
>> + if (!list_empty(&folio->_deferred_list) &&
>
> Hi Ryan,
> After reconsidering our previous discussion about PMD-mapped large
> folios, I've pondered
> the possibility of PMD-mapped Transparent Huge Pages (THPs) being
> mapped by multiple
> processes. In such a scenario, if one process decides to unmap a
> portion of the folio while
> others retain the entire mapping, it raises questions about how the
> system should handle
> this situation. Would the large folio be placed in a deferred list?

No - if the large folio is entirely mapped (via PMD), then the folio will not be
put on the deferred split list in the first place. See __folio_remove_rmap():

	last = (last < ENTIRELY_MAPPED);

means that nr will never be incremented above 0. (_nr_pages_mapped is
incremented by ENTIRELY_MAPPED for every PMD map).

> If
> so, splitting it might not
> yield benefits, as neither I/O nor swap slots would increase in this
> case by not splitting it.
>
> Regarding PTE-mapped large folios, the absence of an indicator like
> "entire_map" makes it
> challenging to identify cases where the entire folio is mapped. Thus,
> splitting seems to be
> the only viable solution in such circumstances.
>
>> split_folio_to_list(folio,
>> folio_list))
>> goto activate_locked;
>> --
>> 2.25.1
>
> Thanks
> Barry
On Tue, Mar 12, 2024 at 9:12 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 11/03/2024 22:30, Barry Song wrote:
> > On Mon, Mar 11, 2024 at 11:01 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> Now that swap supports storing all mTHP sizes, avoid splitting large
> >> folios before swap-out. This benefits performance of the swap-out path
> >> by eliding split_folio_to_list(), which is expensive, and also sets us
> >> up for swapping in large folios in a future series.
> >>
> >> If the folio is partially mapped, we continue to split it since we want
> >> to avoid the extra IO overhead and storage of writing out pages
> >> uneccessarily.
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >> ---
> >> mm/vmscan.c | 9 +++++----
> >> 1 file changed, 5 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index cf7d4cf47f1a..0ebec99e04c6 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >> if (!can_split_folio(folio, NULL))
> >> goto activate_locked;
> >> /*
> >> - * Split folios without a PMD map right
> >> - * away. Chances are some or all of the
> >> - * tail pages can be freed without IO.
> >> + * Split partially mapped folios map
> >> + * right away. Chances are some or all
> >> + * of the tail pages can be freed
> >> + * without IO.
> >> */
> >> - if (!folio_entire_mapcount(folio) &&
> >> + if (!list_empty(&folio->_deferred_list) &&
> >
> > Hi Ryan,
> > After reconsidering our previous discussion about PMD-mapped large
> > folios, I've pondered
> > the possibility of PMD-mapped Transparent Huge Pages (THPs) being
> > mapped by multiple
> > processes. In such a scenario, if one process decides to unmap a
> > portion of the folio while
> > others retain the entire mapping, it raises questions about how the
> > system should handle
> > this situation. Would the large folio be placed in a deferred list?
>
> No - if the large folio is entirely mapped (via PMD), then the folio will not be
> put on the deferred split list in the first place. See __folio_remove_rmap():
>
> 	last = (last < ENTIRELY_MAPPED);
>
> means that nr will never be incremented above 0. (_nr_pages_mapped is
> incremented by ENTIRELY_MAPPED for every PMD map).

You are right, I missed this part; we break early for RMAP_LEVEL_PTE, so
we won't get to the "if (nr)" check. Thanks for your clarification.

Now we get unified code for both PMD-mapped and PTE-mapped large folios.
Feel free to add,

Reviewed-by: Barry Song <v-songbaohua@oppo.com>

> >
> > If
> > so, splitting it might not
> > yield benefits, as neither I/O nor swap slots would increase in this
> > case by not splitting it.
> >
> > Regarding PTE-mapped large folios, the absence of an indicator like
> > "entire_map" makes it
> > challenging to identify cases where the entire folio is mapped. Thus,
> > splitting seems to be
> > the only viable solution in such circumstances.
> >
> >> split_folio_to_list(folio,
> >> folio_list))
> >> goto activate_locked;
> >> --
> >> 2.25.1
>

Thanks
Barry