Hi Nhat,

> -----Original Message-----
> From: Nhat Pham <nphamcs@gmail.com>
> Sent: Thursday, August 29, 2024 10:11 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
> Huang, Ying <ying.huang@intel.com>; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> Gopal, Vinodh <vinodh.gopal@intel.com>;
> Usama Arif <usamaarif642@gmail.com>;
> Chengming Zhou <chengming.zhou@linux.dev>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
>
> On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: Nhat Pham <nphamcs@gmail.com>
> > > Sent: Wednesday, August 28, 2024 2:35 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
> > > Huang, Ying <ying.huang@intel.com>; 21cnbao@gmail.com;
> > > akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>;
> > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> > > Gopal, Vinodh <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> > >
> > > On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar
> > > <kanchana.p.sridhar@intel.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > This patch-series enables zswap_store() to accept and store mTHP
> > > > folios. The most significant contribution in this series is from the
> > > > earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
> > > > migrated to v6.11-rc3 in patch 2/4 of this series.
> > > >
> > > > [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> > > >      https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u
> > > >
> > > > Additionally, there is an attempt to modularize some of the functionality
> > > > in zswap_store(), to make it more amenable to supporting any-order
> > > > mTHPs. For instance, the function zswap_store_entry() stores a zswap_entry
> > > > in the xarray. Likewise, zswap_delete_stored_offsets() can be used to
> > > > delete all offsets corresponding to a higher order folio stored in zswap.
> > > >
> > >
> > > Will this have any conflict with mTHP swap work? Especially with mTHP
> > > swap-in and zswap writeback.
> > >
> > > My understanding is from zswap's perspective, the large folio is
> > > broken apart into independent subpages, correct? What happens when we
> > > have partially written back mTHP (i.e some subpages are in zswap
> > > still, whereas others are written back to swap). Would this
> > > automatically prevent mTHP swapin?
> >
> > That is a good point. To begin with, this patch-series would make the default
> > behavior for mTHP swapout/storage and swapin for ZSWAP to be on par with
> > ZRAM. From zswap's perspective, imo this is a significant step forward towards
> > realizing cold memory storage with mTHP folios. However, it is only a starting
> > point that makes the behavior uniform across zswap/zram. Initially, workloads
> > would see a one-time benefit with reclaim being able to swapout mTHP
> > folios without splitting, to zswap. If the mTHPs were cold memory, then we
> > would have derived latency gains towards memory savings (with zswap).
> >
> > However, if the mTHP were part of "not so cold" memory, this would result
> > in a one-way mTHP conversion to 4K folios. Depending on workloads and their
> > access patterns, we could either see individual 4K folios being swapped in,
> > or entire chunks if not the entire (original) mTHP needing to be swapped in.
> >
> > It should be noted that this is more of a performance vs. cold memory
> > preservation trade-off that needs to drive mTHP reclaim, storage, swapin and
> > writeback policy. Different workloads could require different policies. However,
> > even though this patch is only a starting point, it is still functionally correct
> > by being equivalent to zram-mTHP, and compatible with the rest of mm and
> > swap as far as mTHP. Another important functionality/data consistency decision
> > I made in this patch series is error handling during zswap_store() of mTHP:
> > in case of any errors, all swap offsets for the mTHP are deleted from the
> > zswap xarray/zpool, since we know that the mTHP will now have to be stored
> > in the backing swap device. IOW, an mTHP is either entirely stored in zswap,
> > or entirely not stored in zswap.
> >
> > To answer your question, we would need to come up with what the semantics
> > would need to be for zswap zpool storage granularity, swapin granularity,
> > readahead granularity and writeback wrt mTHP and how the overall swap
> > sub-system needs to "preserve" mTHP vs. splitting mTHP into 4K/lower-order
> > folios during swapout. Once we have a good understanding of these policies,
> > we could implement them in zswap. Alternately, develop an abstraction that is
> > one level above zswap/zram and makes things easier and shareable between
> > zswap and zram. By this, I mean fundamental assumptions such as consecutive
> > swap offsets (for instance). To some extent, this implies that an mTHP as a
> > swap entity is defined by consecutiveness of swap offsets. Maybe the policy
> > to keep mTHPs in the system over extended duration might be to assemble
> > them dynamically based on swapin_readahead() decisions (which is based on
> > workload access patterns). In other words, mTHPs could be a useful abstraction
> > that can be static or even dynamic based on working set characteristics, and
> > cold memory preservation. This is quite a complex topic imho.
> >
> > As we know, Barry Song and Chuanhua Han have started the discussion on
> > this in their zram mTHP swapin series [1].
>
> Yeah I'm a bit more concerned with the correctness aspect. As long as
> it's not buggy, then we can implement mTHP zswapout first, and force
> individual subpage (z)swapin for now (since we cannot control
> writeback from writing individual subpages).

Absolutely, this sounds like the way to go!

>
> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
> we go along.

Sounds great :)

>
> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
> not working properly... Let me manually add him in - please include
> him in future submission and responses, as he is also a zswap reviewer
> :)

I think when I ran get_maintainers.pl, I was in v6.10. For sure, will include
Chengming in future submissions and responses :)

>
> Also cc-ing Usama who is interested in this work.

Sounds great.

Thanks,
Kanchana

>
> >
> > [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@oppo.com/T/#u
> >
> > Thanks,
> > Kanchana
On 2024/8/30 03:38, Sridhar, Kanchana P wrote:
> Hi Nhat,
>
[...]
>> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
>> not working properly... Let me manually add him in - please include
>> him in future submission and responses, as he is also a zswap reviewer
>> :)
>
> I think when I ran get_maintainers.pl, I was in v6.10. For sure, will include
> Chengming in future submissions and responses :)

Maybe a little late for the party, will take a look ASAP.
It's an interesting and great work.

Thanks!

>
>>
>> Also cc-ing Usama who is interested in this work.
>
> Sounds great.
>
> Thanks,
> Kanchana
[...]
Hi Chengming,

> -----Original Message-----
> From: Chengming Zhou <chengming.zhou@linux.dev>
> Sent: Thursday, August 29, 2024 9:52 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Nhat Pham
> <nphamcs@gmail.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
> Huang, Ying <ying.huang@intel.com>; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> Gopal, Vinodh <vinodh.gopal@intel.com>;
> Usama Arif <usamaarif642@gmail.com>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
>
> On 2024/8/30 03:38, Sridhar, Kanchana P wrote:
[...]
> > I think when I ran get_maintainers.pl, I was in v6.10. For sure, will include
> > Chengming in future submissions and responses :)
>
> Maybe a little late for the party, will take a look ASAP.
> It's an interesting and great work.

Thanks! Appreciate your code review and suggestions to improve the patchset.

Thanks,
Kanchana

>
> Thanks!
>
[...]
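The all-or-nothing rule described in this thread (on any error during zswap_store() of an mTHP, every swap offset belonging to that mTHP is removed, so the folio ends up either entirely in zswap or entirely not) can be sketched roughly as follows. This is a userspace Python model under stated assumptions, not the kernel code from the series: the function name `store_folio_all_or_nothing`, the dict standing in for the zswap xarray, and the `compress` callback that returns `None` on failure are all hypothetical and chosen only for illustration.

```python
import zlib
from typing import Callable, Dict, List, Optional


def store_folio_all_or_nothing(
    tree: Dict[int, bytes],
    first_offset: int,
    subpages: List[bytes],
    compress: Callable[[bytes], Optional[bytes]],
) -> bool:
    """Store each subpage at consecutive offsets, or leave none behind.

    `tree` is a plain dict standing in for the zswap xarray (assumption).
    `compress` returning None models any per-page store failure.
    """
    nr_pages = len(subpages)
    for i, page in enumerate(subpages):
        blob = compress(page)
        if blob is None:
            # Failure: drop every offset in this folio's range, so the
            # caller can fall back to the backing swap device for the
            # whole folio. Entries that were never stored are skipped.
            for off in range(first_offset, first_offset + nr_pages):
                tree.pop(off, None)
            return False
        tree[first_offset + i] = blob
    return True


if __name__ == "__main__":
    tree: Dict[int, bytes] = {}
    folio = [bytes([i]) * 4096 for i in range(4)]  # four 4K subpages
    ok = store_folio_all_or_nothing(tree, 100, folio, zlib.compress)
    print(ok, sorted(tree))
```

The model clears the whole offset range on failure rather than only the entries it just stored, which mirrors the description that all offsets for the mTHP are deleted; whether the real helper behaves identically in every case is not something this sketch claims.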