[v5] RE: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

RE: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

Posted by Sridhar, Kanchana P 1 year, 5 months ago

> -----Original Message-----
> From: Nhat Pham <nphamcs@gmail.com>
> Sent: Wednesday, August 28, 2024 2:35 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
> Huang, Ying <ying.huang@intel.com>; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> 
> On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > Hi All,
> >
> > This patch-series enables zswap_store() to accept and store mTHP
> > folios. The most significant contribution in this series is from the
> > earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
> > migrated to v6.11-rc3 in patch 2/4 of this series.
> >
> > [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> >      https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u
> >
> > Additionally, there is an attempt to modularize some of the functionality
> > in zswap_store(), to make it more amenable to supporting any-order
> > mTHPs. For instance, the function zswap_store_entry() stores a zswap_entry
> > in the xarray. Likewise, zswap_delete_stored_offsets() can be used to
> > delete all offsets corresponding to a higher order folio stored in zswap.
> >
> 
> Will this have any conflict with mTHP swap work? Especially with mTHP
> swap-in and zswap writeback.
> 
> My understanding is that, from zswap's perspective, the large folio is
> broken apart into independent subpages, correct? What happens when we
> have a partially written back mTHP (i.e. some subpages are still in
> zswap, whereas others have been written back to swap)? Would this
> automatically prevent mTHP swapin?

That is a good point. To begin with, this patch-series makes the default
behavior for mTHP swapout/storage and swapin with ZSWAP on par with
ZRAM. From zswap's perspective, imo this is a significant step towards
realizing cold memory storage with mTHP folios. However, it is only a starting
point that makes the behavior uniform across zswap/zram. Initially, workloads
would see a one-time benefit from reclaim being able to swap out mTHP
folios to zswap without splitting them. If the mTHPs are cold memory, we
get zswap's memory savings with lower reclaim latency.

However, if the mTHP is part of "not so cold" memory, this results
in a one-way conversion of the mTHP to 4K folios. Depending on the workload and
its access patterns, we could see either individual 4K folios being swapped in,
or larger chunks, up to the entire original mTHP, needing to be swapped in.

Note that this is really a trade-off between performance and cold memory
preservation, and that trade-off needs to drive mTHP reclaim, storage, swapin
and writeback policy. Different workloads could require different policies.
However, even though this series is only a starting point, it is still
functionally correct: it is equivalent to zram-mTHP, and compatible with the
rest of mm and swap as far as mTHP is concerned. Another important
functionality/data consistency decision I made in this patch series concerns
error handling during zswap_store() of an mTHP: in case of any error, all swap
offsets for the mTHP are deleted from the zswap xarray/zpool, since we know
that the mTHP will now have to be stored in the backing swap device. IOW, an
mTHP is either entirely stored in zswap, or entirely not stored in zswap.
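
The all-or-nothing rule above can be illustrated with a small user-space
sketch. This is not the code from the series: the flag array stands in for the
zswap xarray, and store_subpage(), delete_stored_offsets(), store_folio(),
count_stored(), MAX_SLOTS and NO_FAILURE are hypothetical names chosen for the
illustration. It only shows the rollback idea, assuming a failure can happen at
any subpage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS  64
#define NO_FAILURE ((size_t)-1)   /* fail_at value that never matches an offset */

/* Toy stand-in for the zswap xarray: one "present" flag per swap offset. */
static bool stored[MAX_SLOTS];

/* Hypothetical per-subpage store; fails at offset == fail_at to simulate an error. */
static bool store_subpage(size_t offset, size_t fail_at)
{
	if (offset == fail_at)
		return false;
	stored[offset] = true;
	return true;
}

/* Drop every offset in [start, start + nr) from the toy store. */
static void delete_stored_offsets(size_t start, size_t nr)
{
	assert(nr <= MAX_SLOTS && start <= MAX_SLOTS - nr);
	for (size_t i = 0; i < nr; i++)
		stored[start + i] = false;
}

/* Store all nr_pages subpages, or none of them. */
static bool store_folio(size_t start, size_t nr_pages, size_t fail_at)
{
	if (nr_pages > MAX_SLOTS || start > MAX_SLOTS - nr_pages)
		return false;

	for (size_t i = 0; i < nr_pages; i++) {
		if (!store_subpage(start + i, fail_at)) {
			/* Roll back the whole range so no partial mTHP remains. */
			delete_stored_offsets(start, nr_pages);
			return false;
		}
	}
	return true;
}

/* Count how many offsets in [start, start + nr) are currently stored. */
static size_t count_stored(size_t start, size_t nr)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++)
		n += stored[start + i];
	return n;
}
```

In this sketch a failed store_folio() leaves none of the folio's offsets
behind, so the caller can send the whole folio to the backing swap device
without worrying about stale per-subpage entries.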

To answer your question, we would need to define the semantics for zswap zpool
storage granularity, swapin granularity, readahead granularity and writeback
wrt mTHP, and how the overall swap sub-system should "preserve" mTHPs vs.
splitting them into 4K/lower-order folios during swapout. Once we have a good
understanding of these policies, we could implement them in zswap.
Alternatively, we could develop an abstraction one level above zswap/zram that
makes things easier and shareable between zswap and zram. By this, I mean
capturing fundamental assumptions such as consecutive swap offsets (for
instance). To some extent, this implies that an mTHP as a swap entity is
defined by the consecutiveness of its swap offsets. Maybe the policy for
keeping mTHPs in the system over an extended duration could be to assemble
them dynamically based on swapin_readahead() decisions (which are based on
workload access patterns). In other words, mTHPs could be a useful abstraction
that can be static or even dynamic based on working set characteristics and
cold memory preservation. This is quite a complex topic imho.

As we know, Barry Song and Chuanhua Han have started the discussion on
this in their zram mTHP swapin series [1].

[1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@oppo.com/T/#u

Thanks,
Kanchana

Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

Posted by Nhat Pham 1 year, 5 months ago

On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
> As we know, Barry Song and Chuanhua Han have started the discussion on
> this in their zram mTHP swapin series [1].

Yeah I'm a bit more concerned with the correctness aspect. As long as
it's not buggy, then we can implement mTHP zswapout first, and force
individual subpage (z)swapin for now (since we cannot control
writeback from writing individual subpages).
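
One conservative way to picture the fallback described above is the following
user-space sketch. It is an assumption made for illustration, not kernel code:
the `enum backing` array models where each swap slot currently lives, and
range_touches_zswap() and swapin_order() are hypothetical helpers. Large-folio
swapin is allowed only when no slot in the aligned range is still in zswap;
otherwise the fault falls back to a single page (order 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where the data for one swap slot currently lives (toy model). */
enum backing { ON_SWAP_DEVICE = 0, IN_ZSWAP = 1 };

/* True if any slot in [start, start + nr) is still held by zswap. */
static bool range_touches_zswap(const enum backing *slots, size_t start, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (slots[start + i] == IN_ZSWAP)
			return true;
	return false;
}

/*
 * Pick the swapin order for a fault at `offset`: the requested order if the
 * naturally aligned range is entirely on the swap device, else order 0.
 */
static unsigned int swapin_order(const enum backing *slots, size_t nr_slots,
				 size_t offset, unsigned int order)
{
	assert(order < 8 * sizeof(size_t));

	size_t nr = (size_t)1 << order;
	size_t start = offset & ~(nr - 1);   /* align down to the folio boundary */

	if (nr > nr_slots || start > nr_slots - nr)
		return 0;                    /* range does not fit: single page */
	if (range_touches_zswap(slots, start, nr))
		return 0;                    /* mixed or zswap-backed: single page */
	return order;
}
```

A partially written back mTHP shows up here as a range with both values
present, which this sketch treats the same as a fully zswap-backed range.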

We can discuss strategy to harmonize mTHP, zswap (with writeback) as
we go along.

BTW, I think we're not cc-ing Chengming? Is the get_maintainer script
not working properly... Let me manually add him in - please include
him in future submissions and responses, as he is also a zswap reviewer
:)

Also cc-ing Usama who is interested in this work.

>
> [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@oppo.com/T/#u
>
> Thanks,
> Kanchana

RE: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

Posted by Sridhar, Kanchana P 1 year, 5 months ago

Hi Nhat,

> -----Original Message-----
> From: Nhat Pham <nphamcs@gmail.com>
> Sent: Thursday, August 29, 2024 10:11 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
> Huang, Ying <ying.huang@intel.com>; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>;
> Usama Arif <usamaarif642@gmail.com>; Chengming Zhou
> <chengming.zhou@linux.dev>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> 
> Yeah I'm a bit more concerned with the correctness aspect. As long as
> it's not buggy, then we can implement mTHP zswapout first, and force
> individual subpage (z)swapin for now (since we cannot control
> writeback from writing individual subpages).

Absolutely, this sounds like the way to go!

> 
> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
> we go along.

Sounds great :)

> 
> BTW, I think we're not cc-ing Chengming? Is the get_maintainer script
> not working properly... Let me manually add him in - please include
> him in future submissions and responses, as he is also a zswap reviewer
> :)

I think when I ran get_maintainer.pl, I was on v6.10. For sure, I will include
Chengming in future submissions and responses :)

> 
> Also cc-ing Usama who is interested in this work.

Sounds great.

Thanks,
Kanchana

Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

Posted by Chengming Zhou 1 year, 5 months ago

On 2024/8/30 03:38, Sridhar, Kanchana P wrote:
> Hi Nhat,
> 
>>
>> BTW, I think we're not cc-ing Chengming? Is the get_maintainer script
>> not working properly... Let me manually add him in - please include
>> him in future submissions and responses, as he is also a zswap reviewer
>> :)
> 
> I think when I ran get_maintainer.pl, I was on v6.10. For sure, I will include
> Chengming in future submissions and responses :)

Maybe a little late to the party, but I will take a look ASAP.
This is interesting and great work.

Thanks!

RE: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

Posted by Sridhar, Kanchana P 1 year, 4 months ago

Hi Chengming,

> -----Original Message-----
> From: Chengming Zhou <chengming.zhou@linux.dev>
> Sent: Thursday, August 29, 2024 9:52 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Nhat Pham
> <nphamcs@gmail.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
> Huang, Ying <ying.huang@intel.com>; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>;
> Usama Arif <usamaarif642@gmail.com>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> 
> Maybe a little late to the party, but I will take a look ASAP.
> This is interesting and great work.

Thanks! Appreciate your code review and suggestions to improve
the patchset.

Thanks,
Kanchana
