Use alloc_workqueue() to create the workqueue and set finer-grained
work item attributes, instead of create_workqueue(), which is
deprecated.
Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
---
 mm/zswap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 74411dfdad92..64dbe3e944a2 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1620,7 +1620,8 @@ static int zswap_setup(void)
 		zswap_enabled = false;
 	}
 
-	shrink_wq = create_workqueue("zswap-shrink");
+	shrink_wq = alloc_workqueue("zswap-shrink",
+			WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
 	if (!shrink_wq)
 		goto fallback_fail;
 
-- 
2.34.1

On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero
<debug.penguin32@gmail.com> wrote:
>
> Use alloc_workqueue() to create and set finer
> work item attributes instead of create_workqueue()
> which is to be deprecated.
>
> Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> ---
>  mm/zswap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 74411dfdad92..64dbe3e944a2 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
>                 zswap_enabled = false;
>         }
>
> -       shrink_wq = create_workqueue("zswap-shrink");
> +       shrink_wq = alloc_workqueue("zswap-shrink",
> +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
Hmmm this changes the current behavior a bit right? create_workqueue()
is currently defined as:
alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
I think this should be noted in the changelog, at the very least, even
if it is fine. We should be as explicit as possible about behavior
changes.
>         if (!shrink_wq)
>                 goto fallback_fail;
>
> --
> 2.34.1
>
>
                
Hi Nhat,
Thanks for checking.
On Tue, Dec 12, 2023 at 12:16 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero
> <debug.penguin32@gmail.com> wrote:
> >
> > Use alloc_workqueue() to create and set finer
> > work item attributes instead of create_workqueue()
> > which is to be deprecated.
> >
> > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > ---
> >  mm/zswap.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 74411dfdad92..64dbe3e944a2 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> >                 zswap_enabled = false;
> >         }
> >
> > -       shrink_wq = create_workqueue("zswap-shrink");
> > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
>
> Hmmm this changes the current behavior a bit right? create_workqueue()
> is currently defined as:
>
> alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
create_workqueue is deprecated and it's observed that most of the
subsystems have changed to using alloc_workqueue, so only a small
minority of remnant instances in the kernel and some drivers still
use create_workqueue. With create_workqueue defined as is, it
hardcodes the workqueue flags to be per-cpu and bound in nature and
does not give the flexibility of using any other workqueue
flags/attributes (WQ_CPU_INTENSIVE, WQ_HIGHPRI, WQ_RESCUER,
WQ_FREEZABLE, WQ_UNBOUND). Hence most of the subsystems and drivers
use the alloc_workqueue() api.

#define create_workqueue(name) \
	alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
> I think this should be noted in the changelog, at the very least, even
> if it is fine. We should be as explicit as possible about behavior
> changes.
>
imo, it's sort of known and consistently changed for quite some time already.
https://lists.openwall.net/linux-kernel/2016/06/07/1086
https://lists.openwall.net/linux-kernel/2011/01/03/124
https://lwn.net/Articles/403891/   => quoted: "The long-term plan, it
seems, is to convert all create_workqueue() users over to an
appropriate alloc_workqueue() call; eventually create_workqueue() will
be removed"
Glad to take some suggestions, thoughts?
BR,
ronald
                
On Wed, Dec 13, 2023 at 5:20 AM Ronald Monthero
<debug.penguin32@gmail.com> wrote:
>
> Hi Nhat,
> Thanks for checking.
> On Tue, Dec 12, 2023 at 12:16 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero
> > <debug.penguin32@gmail.com> wrote:
> > >
> > > Use alloc_workqueue() to create and set finer
> > > work item attributes instead of create_workqueue()
> > > which is to be deprecated.
> > >
> > > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > > ---
> > >  mm/zswap.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 74411dfdad92..64dbe3e944a2 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> > >                 zswap_enabled = false;
> > >         }
> > >
> > > -       shrink_wq = create_workqueue("zswap-shrink");
> > > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
> >
> > Hmmm this changes the current behavior a bit right? create_workqueue()
> > is currently defined as:
> >
> > alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
>
> create_workqueue is deprecated and it's observed that most of the
> subsystems have changed to using alloc_workqueue. So it's a small
> minority of few remnant instances in the kernel and some drivers still
> using create_workqueue. With the create_workqueue defined as is , it
> hardcodes the workqueue  flags to be per cpu and unbound in nature and
> not giving the flexibility of using any other wait queue
> flags/attributes. ( WQ_CPU_INTENSIVE, WQ_HIGHPRI, WQ_RESCUER,
> WQ_FREEZEABLE, WQ_UNBOUND) . Hence most of the subsystems and drivers
> use the alloc_workqueue( ) api.
> #define create_workqueue(name) \
> alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
>
> > I think this should be noted in the changelog, at the very least, even
> > if it is fine. We should be as explicit as possible about behavior
> > changes.
> >
> imo, it's sort of known and consistently changed for quite some time already.
> https://lists.openwall.net/linux-kernel/2016/06/07/1086
> https://lists.openwall.net/linux-kernel/2011/01/03/124
> https://lwn.net/Articles/403891/   => quoted: "The long-term plan, it
> seems, is to convert all create_workqueue() users over to an
> appropriate alloc_workqueue() call; eventually create_workqueue() will
> be removed"
>
> glad to take some suggestions , thoughts ?
>
> BR,
> ronald
I should have been clearer. I'm not against the change per-se - I
agree that we should replace create_workqueue() with
alloc_workqueue(). What I meant was, IIUC, there are two behavioral
changes with this new workqueue creation:
a) We're replacing a bounded workqueue (which as you noted, is fixed
by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
fine to me - I doubt locality buys us much here.
b) create_workqueue() limits the number of concurrent per-cpu
execution contexts at 1 (i.e only one single global reclaimer),
whereas after this patch this is set to the default value. This seems
fine to me too - I don't remember us taking advantage of the previous
concurrency limitation. Also, in practice, the task_struct is
one-to-one with the zswap_pool's anyway, and most of the time, there
is just a single pool being used. (But it begs the question - what's
the point of using 0 instead of 1 here?)
Both seem fine (to me anyway - other reviewers feel free to take a
look and fact-check everything). I just feel like this should be
explicitly noted in the changelog, IMHO, in case we are mistaken and
need to revisit this :) Either way, not a NACK from me.
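
For reference, the two effective calls under discussion boil down to
roughly the following - a sketch pieced together from the
create_workqueue() definition quoted above, with the default max_active
of 256 mentioned later in the thread:

/* before: create_workqueue("zswap-shrink") - a bound, per-cpu workqueue
 * with a rescuer (WQ_MEM_RECLAIM) and max_active capped at 1 */
shrink_wq = alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1,
			    "zswap-shrink");

/* after (this patch): an unbound workqueue, still with a rescuer;
 * max_active of 0 means "use the default", currently 256 per cpu */
shrink_wq = alloc_workqueue("zswap-shrink", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);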
                
On Thu, Dec 14, 2023 at 10:28 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
.. < snipped >
> I should have been clearer. I'm not against the change per-se - I
> agree that we should replace create_workqueue() with
> alloc_workqueue(). What I meant was, IIUC, there are two behavioral
> changes with this new workqueue creation:
>
> a) We're replacing a bounded workqueue (which as you noted, is fixed
> by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
> fine to me - I doubt locality buys us much here.

Yes the workqueue attribute change per se but the existing
functionality remains seamless and the change is more a forward
looking change. imo under a memory pressure scenario an unbound
workqueue might workaround the scenario better as the number of
backing pools is dynamic. And with the WQ_UNBOUND attribute the
scheduler is more open to exercise some improvisations in any
demanding scenarios for offloading cpu time slicing for workers, ie if
any other worker of the same primary cpu had to be served due to
workers with WQ_HIGHPRI and WQ_CPU_INTENSIVE. Although normal and
highpri worker-pools don't interact with each other, the target cpu
at least need not be the same if our worker for zswap is WQ_UNBOUND.

Also noting that the existing wq of the zswap worker has the
WQ_MEM_RECLAIM attribute, so is there a rescue worker for zswap during
a memory pressure scenario?
Quoting: "All work items which might be used on code paths that handle
memory reclaim are required to be queued on wq's that have a
rescue-worker reserved for execution under memory pressure. Else it is
possible that the worker-pool deadlocks waiting for execution contexts
to free up"

Also an additional thought: would adding the WQ_FREEZABLE attribute
while creating the zswap worker make sense in scenarios to handle
freeze and unfreeze of specific cgroups or file system wide freeze and
unfreeze scenarios? Does the zswap worker participate in
freeze/unfreeze code path scenarios?

> b) create_workqueue() limits the number of concurrent per-cpu
> execution contexts at 1 (i.e only one single global reclaimer),
> whereas after this patch this is set to the default value. This seems
> fine to me too - I don't remember us taking advantage of the previous
> concurrency limitation. Also, in practice, the task_struct is
> one-to-one with the zswap_pool's anyway, and most of the time, there
> is just a single pool being used. (But it begs the question - what's
> the point of using 0 instead of 1 here?)

Nothing in particular but I left it at the default 0, which can go up
to 256 (@max_active per cpu).
But if the zswap worker is always intended to only have 1 active worker
per cpu, then that's fine with 1, otherwise a default setting might be
flexible for scaling.
Just a thought: does having a higher value help for larger memory
systems?

> Both seem fine (to me anyway - other reviewers feel free to take a
> look and fact-check everything). I just feel like this should be
> explicitly noted in the changelog, IMHO, in case we are mistaken and
> need to revisit this :) Either way, not a NACK from me.

Thanks Nhat, for checking. Above are my thoughts; I could be missing
some info or be incorrect on certain fronts, so I would seek
clarifications.
Also thanks in advance to other experts/maintainers, please share
feedback and suggestions.

BR,
ronald
On Fri, Dec 15, 2023 at 1:04 AM Ronald Monthero <debug.penguin32@gmail.com> wrote:
>
> On Thu, Dec 14, 2023 at 10:28 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> .. < snipped >
>
> > I should have been clearer. I'm not against the change per-se - I
> > agree that we should replace create_workqueue() with
> > alloc_workqueue(). What I meant was, IIUC, there are two behavioral
> > changes with this new workqueue creation:
> >
> > a) We're replacing a bounded workqueue (which as you noted, is fixed
> > by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
> > fine to me - I doubt locality buys us much here.
>
> Yes the workqueue attribute change per se but the existing
> functionality remains seamless and the change is more a forward
> looking change. imo under a memory pressure scenario an unbound
> workqueue might workaround the scenario better as the number of
> backing pools is dynamic. And with the WQ_UNBOUND attribute the
> scheduler is more open to exercise some improvisations in any
> demanding scenarios for offloading cpu time slicing for workers, ie if
> any other worker of the same primary cpu had to be served due to
> workers with WQ_HIGHPRI and WQ_CPU_INTENSIVE. Although normal and
> highpri worker-pools don't interact with each other, the target cpu
> at least need not be the same if our worker for zswap is WQ_UNBOUND.

I don't object to the change per-se. I just meant that these
changes/discussion should be spelled out in the patch's changelog :)
IMHO, we should document behavior changes if there are any.

For instance, when we switched to kmap_local_page() for zswap, there is
a discussion in the changelog regarding how it differs from the old
(and deprecated) kmap_atomic():

https://lore.kernel.org/linux-mm/20231127160058.586446-1-fabio.maria.de.francesco@linux.intel.com/

and how that difference doesn't matter in the case of zswap.

>
> Also noting that the existing wq of the zswap worker has the
> WQ_MEM_RECLAIM attribute, so is there a rescue worker for zswap during
> a memory pressure scenario?
> Quoting: "All work items which might be used on code paths that handle
> memory reclaim are required to be queued on wq's that have a
> rescue-worker reserved for execution under memory pressure. Else it is
> possible that the worker-pool deadlocks waiting for execution contexts
> to free up"

Seems like it, but this behavior is not changed by your patch :) So I'm
not too concerned by it one way or another.

>
> Also an additional thought: would adding the WQ_FREEZABLE attribute
> while creating the zswap worker make sense in scenarios to handle
> freeze and unfreeze of specific cgroups or file system wide freeze and
> unfreeze scenarios? Does the zswap worker participate in
> freeze/unfreeze code path scenarios?

I don't think so, no? This zswap worker is scheduled upon the zswap
pool limit hit (which happens on the zswap store/swapping/memory
reclaim path).

>
> > b) create_workqueue() limits the number of concurrent per-cpu
> > execution contexts at 1 (i.e only one single global reclaimer),
> > whereas after this patch this is set to the default value. This seems
> > fine to me too - I don't remember us taking advantage of the previous
> > concurrency limitation. Also, in practice, the task_struct is
> > one-to-one with the zswap_pool's anyway, and most of the time, there
> > is just a single pool being used. (But it begs the question - what's
> > the point of using 0 instead of 1 here?)
>
> Nothing in particular but I left it at the default 0, which can go up
> to 256 (@max_active per cpu).
> But if the zswap worker is always intended to only have 1 active worker
> per cpu, then that's fine with 1, otherwise a default setting might be
> flexible for scaling.
> Just a thought: does having a higher value help for larger memory
> systems?

I don't think having a higher value helps here tbh. We only have one
work_struct per pool, so it shouldn't make a difference either way :)

>
> > Both seem fine (to me anyway - other reviewers feel free to take a
> > look and fact-check everything). I just feel like this should be
> > explicitly noted in the changelog, IMHO, in case we are mistaken and
> > need to revisit this :) Either way, not a NACK from me.
>
> Thanks Nhat, for checking. Above are my thoughts; I could be missing
> some info or be incorrect on certain fronts, so I would seek
> clarifications.
> Also thanks in advance to other experts/maintainers, please share
> feedback and suggestions.
>
> BR,
> ronald

Also +Chris Li, who is also working on improving zswap :)
The core-api create_workqueue is deprecated; this patch replaces
create_workqueue with alloc_workqueue. The previous zswap
implementation used a bounded workqueue; this patch uses
alloc_workqueue() to create an unbounded workqueue. The WQ_UNBOUND
attribute is desirable as it makes the workqueue not localized to a
specific cpu, so that the scheduler is free to exercise improvisations
in any demanding scenarios for offloading cpu time slices for
workqueues, for example if any other workqueues of the same primary
cpu had to be served which are WQ_HIGHPRI and WQ_CPU_INTENSIVE.

Also an unbound workqueue happens to be more efficient in a system
during memory pressure scenarios in comparison to a bounded workqueue.
shrink_wq = alloc_workqueue("zswap-shrink",
                     WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
Overall the change suggested in this patch should be
seamless and does not alter the existing behavior,
other than the improvisation to be an unbounded workqueue.
Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
---
 mm/zswap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 74411dfdad92..64dbe3e944a2 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1620,7 +1620,8 @@ static int zswap_setup(void)
 		zswap_enabled = false;
 	}
 
-	shrink_wq = create_workqueue("zswap-shrink");
+	shrink_wq = alloc_workqueue("zswap-shrink",
+			WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
 	if (!shrink_wq)
 		goto fallback_fail;
 
-- 
2.34.1

On Tue, Jan 16, 2024 at 5:32 AM Ronald Monthero
<debug.penguin32@gmail.com> wrote:
+ Johannes and Yosry
>
> The core-api create_workqueue is deprecated, this patch replaces
> the create_workqueue with alloc_workqueue. The previous
> implementation workqueue of zswap was a bounded workqueue, this
> patch uses alloc_workqueue() to create an unbounded workqueue.
> The WQ_UNBOUND attribute is desirable making the workqueue
> not localized to a specific cpu so that the scheduler is free
> to exercise improvisations in any demanding scenarios for
> offloading cpu time slices for workqueues.
nit: extra space between paragraph would be nice.
> For example if any other workqueues of the same primary cpu
> had to be served which are WQ_HIGHPRI and WQ_CPU_INTENSIVE.
> Also Unbound workqueue happens to be more efficient
> in a system during memory pressure scenarios in comparison
>  to a bounded workqueue.
>
> shrink_wq = alloc_workqueue("zswap-shrink",
>                      WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
>
> Overall the change suggested in this patch should be
> seamless and does not alter the existing behavior,
> other than the improvisation to be an unbounded workqueue.
>
> Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> ---
>  mm/zswap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 74411dfdad92..64dbe3e944a2 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
>                 zswap_enabled = false;
>         }
>
> -       shrink_wq = create_workqueue("zswap-shrink");
> +       shrink_wq = alloc_workqueue("zswap-shrink",
> +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
Have you benchmarked this to check if there is any regression, just to
be safe? With an unbounded workqueue, you're gaining scheduling
flexibility at the cost of cache locality. My intuition is that it
doesn't matter too much here, but you should probably double check by
stress testing - run some workload with a relatively small zswap pool
limit (i.e heavy global writeback), and see if there is any difference
in performance.
>         if (!shrink_wq)
>                 goto fallback_fail;
>
> --
> 2.34.1
>
On a different note, I wonder if it would help to perform synchronous
reclaim here instead. With our current design, the zswap store failure
(due to global limit hit) would leave the incoming page going to swap
instead, creating an LRU inversion. Not sure if that's ideal.
                
On Wed, Jan 17, 2024 at 11:13 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 5:32 AM Ronald Monthero
> <debug.penguin32@gmail.com> wrote:
>
> + Johannes and Yosry
>
> >
> > The core-api create_workqueue is deprecated, this patch replaces
> > the create_workqueue with alloc_workqueue. The previous
> > implementation workqueue of zswap was a bounded workqueue, this
> > patch uses alloc_workqueue() to create an unbounded workqueue.
> > The WQ_UNBOUND attribute is desirable making the workqueue
> > not localized to a specific cpu so that the scheduler is free
> > to exercise improvisations in any demanding scenarios for
> > offloading cpu time slices for workqueues.
>
> nit: extra space between paragraph would be nice.
>
> > For example if any other workqueues of the same primary cpu
> > had to be served which are WQ_HIGHPRI and WQ_CPU_INTENSIVE.
> > Also Unbound workqueue happens to be more efficient
> > in a system during memory pressure scenarios in comparison
> >  to a bounded workqueue.
> >
> > shrink_wq = alloc_workqueue("zswap-shrink",
> >                      WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
> >
> > Overall the change suggested in this patch should be
> > seamless and does not alter the existing behavior,
> > other than the improvisation to be an unbounded workqueue.
> >
> > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > ---
> >  mm/zswap.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 74411dfdad92..64dbe3e944a2 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> >                 zswap_enabled = false;
> >         }
> >
> > -       shrink_wq = create_workqueue("zswap-shrink");
> > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
>
[...]
> >         if (!shrink_wq)
> >                 goto fallback_fail;
> >
> > --
> > 2.34.1
> >
FWIW:
Acked-by: Nhat Pham <nphamcs@gmail.com>
                
On Wed, Jan 17, 2024 at 11:14 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 5:32 AM Ronald Monthero
> <debug.penguin32@gmail.com> wrote:
>
> + Johannes and Yosry
>
> >
> > The core-api create_workqueue is deprecated, this patch replaces
> > the create_workqueue with alloc_workqueue. The previous
> > implementation workqueue of zswap was a bounded workqueue, this
> > patch uses alloc_workqueue() to create an unbounded workqueue.
> > The WQ_UNBOUND attribute is desirable making the workqueue
> > not localized to a specific cpu so that the scheduler is free
> > to exercise improvisations in any demanding scenarios for
> > offloading cpu time slices for workqueues.
>
> nit: extra space between paragraph would be nice.
>
> > For example if any other workqueues of the same primary cpu
> > had to be served which are WQ_HIGHPRI and WQ_CPU_INTENSIVE.
> > Also Unbound workqueue happens to be more efficient
> > in a system during memory pressure scenarios in comparison
> >  to a bounded workqueue.
> >
> > shrink_wq = alloc_workqueue("zswap-shrink",
> >                      WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
> >
> > Overall the change suggested in this patch should be
> > seamless and does not alter the existing behavior,
> > other than the improvisation to be an unbounded workqueue.
> >
> > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > ---
> >  mm/zswap.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 74411dfdad92..64dbe3e944a2 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> >                 zswap_enabled = false;
> >         }
> >
> > -       shrink_wq = create_workqueue("zswap-shrink");
> > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
>
> Have you benchmarked this to check if there is any regression, just to
> be safe? With an unbounded workqueue, you're gaining scheduling
> flexibility at the cost of cache locality. My intuition is that it
> doesn't matter too much here, but you should probably double check by
> stress testing - run some workload with a relatively small zswap pool
> limit (i.e heavy global writeback), and see if there is any difference
> in performance.
I also think this shouldn't make a large difference. The global
shrinking work is already expensive, and I imagine that it exhausts
the caches anyway by iterating memcgs. A performance smoketest would
be reassuring for sure, but I believe it won't make a difference.
Keep in mind that even with WQ_UNBOUND, we prefer the local CPU (see
wq_select_unbound_cpu()), so it will take more than global writeback
to observe a difference. The local CPU must not be in
wq_unbound_cpumask, or CONFIG_DEBUG_WQ_FORCE_RR_CPU should be on.
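
A simplified sketch of the selection logic being referenced here
(paraphrased for illustration, not the verbatim kernel/workqueue.c
code):

static int wq_select_unbound_cpu_sketch(int local_cpu)
{
	/* common case: round-robin debug knob off and the local CPU is in
	 * wq_unbound_cpumask - keep the work local */
	if (!IS_ENABLED(CONFIG_DEBUG_WQ_FORCE_RR_CPU) &&
	    cpumask_test_cpu(local_cpu, wq_unbound_cpumask))
		return local_cpu;

	/* otherwise: round-robin over the allowed online CPUs
	 * (wq_rr_cpu_last is per-cpu state in the real implementation) */
	return cpumask_next_and(wq_rr_cpu_last, wq_unbound_cpumask,
				cpu_online_mask);
}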
>
> >         if (!shrink_wq)
> >                 goto fallback_fail;
> >
> > --
> > 2.34.1
> >
>
> On a different note, I wonder if it would help to perform synchronous
> reclaim here instead. With our current design, the zswap store failure
> (due to global limit hit) would leave the incoming page going to swap
> instead, creating an LRU inversion. Not sure if that's ideal.
The global shrink path keeps reclaiming until zswap can accept again
(by default, that means reclaiming 10% of the total limit). I think
this is too expensive to be done synchronously.
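
A rough sketch of the mechanism being described (simplified;
zswap_writeback_one() is a stand-in name, the rest approximates the
mm/zswap.c of that period):

/* in zswap_store(), once the max pool limit has been hit: reject the
 * page (it goes to swap instead) and kick the async shrink worker */
queue_work(shrink_wq, &pool->shrink_work);
goto reject;

/* shrink_work: write back cold LRU entries until the pool drops back
 * under the acceptance threshold (zswap_accept_thr_percent, 90% of the
 * limit by default - hence "reclaiming 10% of the total limit") */
static void shrink_worker(struct work_struct *w)
{
	struct zswap_pool *pool = container_of(w, typeof(*pool), shrink_work);

	do {
		/* stand-in for the real LRU writeback path */
		if (zswap_writeback_one(pool))
			break;		/* persistent failure / nothing left */
	} while (!zswap_can_accept());
}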
                
On Wed, Jan 17, 2024 at 11:30:50AM -0800, Yosry Ahmed wrote:
> On Wed, Jan 17, 2024 at 11:14 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 5:32 AM Ronald Monthero
> > <debug.penguin32@gmail.com> wrote:
> >
> > + Johannes and Yosry
> >
> > >
> > > The core-api create_workqueue is deprecated, this patch replaces
> > > the create_workqueue with alloc_workqueue. The previous
> > > implementation workqueue of zswap was a bounded workqueue, this
> > > patch uses alloc_workqueue() to create an unbounded workqueue.
> > > The WQ_UNBOUND attribute is desirable making the workqueue
> > > not localized to a specific cpu so that the scheduler is free
> > > to exercise improvisations in any demanding scenarios for
> > > offloading cpu time slices for workqueues.
> >
> > nit: extra space between paragraph would be nice.
> >
> > > For example if any other workqueues of the same primary cpu
> > > had to be served which are WQ_HIGHPRI and WQ_CPU_INTENSIVE.
> > > Also Unbound workqueue happens to be more efficient
> > > in a system during memory pressure scenarios in comparison
> > >  to a bounded workqueue.
> > >
> > > shrink_wq = alloc_workqueue("zswap-shrink",
> > >                      WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
> > >
> > > Overall the change suggested in this patch should be
> > > seamless and does not alter the existing behavior,
> > > other than the improvisation to be an unbounded workqueue.
> > >
> > > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > > ---
> > >  mm/zswap.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 74411dfdad92..64dbe3e944a2 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> > >                 zswap_enabled = false;
> > >         }
> > >
> > > -       shrink_wq = create_workqueue("zswap-shrink");
> > > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
> >
> > Have you benchmarked this to check if there is any regression, just to
> > be safe? With an unbounded workqueue, you're gaining scheduling
> > flexibility at the cost of cache locality. My intuition is that it
> > doesn't matter too much here, but you should probably double check by
> > stress testing - run some workload with a relatively small zswap pool
> > limit (i.e heavy global writeback), and see if there is any difference
> > in performance.
> 
> I also think this shouldn't make a large difference. The global
> shrinking work is already expensive, and I imagine that it exhausts
> the caches anyway by iterating memcgs. A performance smoketest would
> be reassuring for sure, but I believe it won't make a difference.
The LRU inherently makes the shrinker work on the oldest and coldest
entries, so I doubt we benefit a lot from cache locality there.
What could make a difference though is the increased concurrency by
switching max_active from 1 to 0. This could cause a higher rate of
shrinker runs, which might increase lock contention and reclaim
volume. That part would be good to double check with the shrinker
benchmarks.
> > On a different note, I wonder if it would help to perform synchronous
> > reclaim here instead. With our current design, the zswap store failure
> > (due to global limit hit) would leave the incoming page going to swap
> > instead, creating an LRU inversion. Not sure if that's ideal.
> 
> The global shrink path keeps reclaiming until zswap can accept again
> (by default, that means reclaiming 10% of the total limit). I think
> this is too expensive to be done synchronously.
That thresholding code is a bit weird right now.
It wakes the shrinker and rejects at the same time. We're guaranteed
to see rejections, even if the shrinker has no trouble flushing some
entries a split second later.
It would make more sense to wake the shrinker at e.g. 95% full and
have it run until 90%.
But with that in place we also *should* do synchronous reclaim once we
hit 100%. Just enough to make room for the store. This is important to
catch the case where reclaim rate exceeds swapout rate. Rejecting and
going to swap means the reclaimer will be throttled down to IO rate
anyway, and the app latency isn't any worse. But this way we keep the
pipeline alive, and keep swapping out the oldest zswap entries,
instead of rejecting and swapping what would be the hottest ones.
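
Roughly, the flow being proposed might look like this (a hypothetical
sketch of the idea, not existing code; the helpers and thresholds are
made up for illustration):

/* in zswap_store(): */
if (zswap_pool_usage_percent() >= 95)		/* wake the shrinker early ... */
	queue_work(shrink_wq, &pool->shrink_work);	/* ... it runs down to ~90% */

if (zswap_pool_usage_percent() >= 100) {
	/* pool genuinely full: synchronously write back just enough of the
	 * coldest entries to admit this store, instead of rejecting it and
	 * pushing the (hotter) incoming page straight to swap */
	if (zswap_writeback_one(pool))
		goto reject;	/* writeback failed, fall back to swap */
}
/* proceed with the compressed store */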
                
> > > On a different note, I wonder if it would help to perform synchronous
> > > reclaim here instead. With our current design, the zswap store failure
> > > (due to global limit hit) would leave the incoming page going to swap
> > > instead, creating an LRU inversion. Not sure if that's ideal.
> >
> > The global shrink path keeps reclaiming until zswap can accept again
> > (by default, that means reclaiming 10% of the total limit). I think
> > this is too expensive to be done synchronously.
>
> That thresholding code is a bit weird right now.
>
> It wakes the shrinker and rejects at the same time. We're guaranteed
> to see rejections, even if the shrinker has no trouble flushing some
> entries a split second later.
>
> It would make more sense to wake the shrinker at e.g. 95% full and
> have it run until 90%.
>
> But with that in place we also *should* do synchronous reclaim once we
> hit 100%. Just enough to make room for the store. This is important to
> catch the case where reclaim rate exceeds swapout rate. Rejecting and
> going to swap means the reclaimer will be throttled down to IO rate
> anyway, and the app latency isn't any worse. But this way we keep the
> pipeline alive, and keep swapping out the oldest zswap entries,
> instead of rejecting and swapping what would be the hottest ones.

I fully agree with the thresholding code being weird, and with waking
up the shrinker before the pool is full. What I don't understand is
how we can do synchronous reclaim when we hit 100% and still respect
the acceptance threshold :/

Are you proposing we change the semantics of the acceptance threshold
to begin with?
On Thu, Jan 18, 2024 at 09:06:43AM -0800, Yosry Ahmed wrote:
> > > > On a different note, I wonder if it would help to perform synchronous
> > > > reclaim here instead. With our current design, the zswap store failure
> > > > (due to global limit hit) would leave the incoming page going to swap
> > > > instead, creating an LRU inversion. Not sure if that's ideal.
> > >
> > > The global shrink path keeps reclaiming until zswap can accept again
> > > (by default, that means reclaiming 10% of the total limit). I think
> > > this is too expensive to be done synchronously.
> >
> > That thresholding code is a bit weird right now.
> >
> > It wakes the shrinker and rejects at the same time. We're guaranteed
> > to see rejections, even if the shrinker has no trouble flushing some
> > entries a split second later.
> >
> > It would make more sense to wake the shrinker at e.g. 95% full and
> > have it run until 90%.
> >
> > But with that in place we also *should* do synchronous reclaim once we
> > hit 100%. Just enough to make room for the store. This is important to
> > catch the case where reclaim rate exceeds swapout rate. Rejecting and
> > going to swap means the reclaimer will be throttled down to IO rate
> > anyway, and the app latency isn't any worse. But this way we keep the
> > pipeline alive, and keep swapping out the oldest zswap entries,
> > instead of rejecting and swapping what would be the hottest ones.
>
> I fully agree with the thresholding code being weird, and with waking
> up the shrinker before the pool is full. What I don't understand is
> how we can do synchronous reclaim when we hit 100% and still respect
> the acceptance threshold :/
>
> Are you proposing we change the semantics of the acceptance threshold
> to begin with?

I kind of am. It's worth looking at the history of this knob.

It was added in 2020 by 45190f01dd402112d3d22c0ddc4152994f9e1e55, and
from the changelogs and the code in this patch I do not understand how
this was supposed to work.

It also *didn't* work for very basic real world applications. See
Domenico's follow-up (e0228d590beb0d0af345c58a282f01afac5c57f3), which
effectively reverted it to get halfway reasonable behavior.

If there are no good usecases for this knob, then I think it makes
sense to phase it out again.
On Thu, Jan 18, 2024 at 9:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Jan 18, 2024 at 09:06:43AM -0800, Yosry Ahmed wrote:
> > > > > On a different note, I wonder if it would help to perform synchronous
> > > > > reclaim here instead. With our current design, the zswap store failure
> > > > > (due to global limit hit) would leave the incoming page going to swap
> > > > > instead, creating an LRU inversion. Not sure if that's ideal.
> > > >
> > > > The global shrink path keeps reclaiming until zswap can accept again
> > > > (by default, that means reclaiming 10% of the total limit). I think
> > > > this is too expensive to be done synchronously.
> > >
> > > That thresholding code is a bit weird right now.
> > >
> > > It wakes the shrinker and rejects at the same time. We're guaranteed
> > > to see rejections, even if the shrinker has no trouble flushing some
> > > entries a split second later.
> > >
> > > It would make more sense to wake the shrinker at e.g. 95% full and
> > > have it run until 90%.

Yep, we should be reclaiming zswap objects way ahead of the pool
limit. Hence the new shrinker, which is memory pressure-driven (i.e
independent of zswap internal limits), and will typically be triggered
even if the pool is not full. During experiments, I never observe the
pool becoming full, with the default settings. I'd be happy to extend
it (or build in extra shrinking logic) to cover these pool limits too,
if it turns out to be necessary.

> > >
> > > But with that in place we also *should* do synchronous reclaim once we
> > > hit 100%. Just enough to make room for the store. This is important to
> > > catch the case where reclaim rate exceeds swapout rate. Rejecting and
> > > going to swap means the reclaimer will be throttled down to IO rate
> > > anyway, and the app latency isn't any worse. But this way we keep the
> > > pipeline alive, and keep swapping out the oldest zswap entries,
> > > instead of rejecting and swapping what would be the hottest ones.
> >
> > I fully agree with the thresholding code being weird, and with waking
> > up the shrinker before the pool is full. What I don't understand is
> > how we can do synchronous reclaim when we hit 100% and still respect
> > the acceptance threshold :/
> >
> > Are you proposing we change the semantics of the acceptance threshold
> > to begin with?
>
> I kind of am. It's worth looking at the history of this knob.
>
> It was added in 2020 by 45190f01dd402112d3d22c0ddc4152994f9e1e55, and
> from the changelogs and the code in this patch I do not understand how
> this was supposed to work.
>
> It also *didn't* work for very basic real world applications. See
> Domenico's follow-up (e0228d590beb0d0af345c58a282f01afac5c57f3), which
> effectively reverted it to get halfway reasonable behavior.
>
> If there are no good usecases for this knob, then I think it makes
> sense to phase it out again.

Yeah this was my original proposal - remove this knob altogether :)
Based on a cursory read, it just seems like zswap was originally
trying to shrink (synchronously) one "object", then try to check if
the pool size is now under the limit. This is indeed insufficient.
However, I'm not quite convinced by the solution (hysteresis) either.

Maybe we can synchronously shrink a la Domenico, i.e until the pool
can accept new pages, but this time capacity-based (maybe under the
limit + some headroom, 1 page for example)? This is just so that the
immediate incoming zswap store succeeds - we can still have the
shrinker freeing up space later on (or maybe keep an asynchronous
pool-limit based shrinker around).
Thanks for the reviews.
This patch is available at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-zswap-improve-with-alloc_workqueue-call.patch
This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ronald Monthero <debug.penguin32@gmail.com>
Subject: mm/zswap: improve with alloc_workqueue() call
Date: Tue, 16 Jan 2024 23:31:45 +1000
The core-api create_workqueue is deprecated, this patch replaces the
create_workqueue with alloc_workqueue.  The previous implementation
workqueue of zswap was a bounded workqueue, this patch uses
alloc_workqueue() to create an unbounded workqueue.  The WQ_UNBOUND
attribute is desirable making the workqueue not localized to a specific
cpu so that the scheduler is free to exercise improvisations in any
demanding scenarios for offloading cpu time slices for workqueues.  For
example if any other workqueues of the same primary cpu had to be served
which are WQ_HIGHPRI and WQ_CPU_INTENSIVE.  Also Unbound workqueue happens
to be more efficient in a system during memory pressure scenarios in
comparison to a bounded workqueue.
shrink_wq = alloc_workqueue("zswap-shrink",
                     WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
Overall the change suggested in this patch should be seamless and does not
alter the existing behavior, other than the improvisation to be an
unbounded workqueue.
Link: https://lkml.kernel.org/r/20240116133145.12454-1-debug.penguin32@gmail.com
Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/zswap.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/zswap.c~mm-zswap-improve-with-alloc_workqueue-call
+++ a/mm/zswap.c
@@ -1884,7 +1884,8 @@ static int zswap_setup(void)
                zswap_enabled = false;
        }
-       shrink_wq = create_workqueue("zswap-shrink");
+       shrink_wq = alloc_workqueue("zswap-shrink",
+                       WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
        if (!shrink_wq)
                goto fallback_fail;
_
On Fri, Jan 19, 2024 at 4:33 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Thu, Jan 18, 2024 at 9:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, Jan 18, 2024 at 09:06:43AM -0800, Yosry Ahmed wrote:
> > > > > > On a different note, I wonder if it would help to perform synchronous
> > > > > > reclaim here instead. With our current design, the zswap store failure
> > > > > > (due to global limit hit) would leave the incoming page going to swap
> > > > > > instead, creating an LRU inversion. Not sure if that's ideal.
> > > > >
> > > > > The global shrink path keeps reclaiming until zswap can accept again
> > > > > (by default, that means reclaiming 10% of the total limit). I think
> > > > > this is too expensive to be done synchronously.
> > > >
> > > > That thresholding code is a bit weird right now.
> > > >
> > > > It wakes the shrinker and rejects at the same time. We're guaranteed
> > > > to see rejections, even if the shrinker has no trouble flushing some
> > > > entries a split second later.
> > > >
> > > > It would make more sense to wake the shrinker at e.g. 95% full and
> > > > have it run until 90%.
>
> Yep, we should be reclaiming zswap objects way ahead of the pool
> limit. Hence the new shrinker, which is memory pressure-driven (i.e
> independent of zswap internal limits), and will typically be triggered
> even if the pool is not full. During experiments, I never observe the
> pool becoming full, with the default settings. I'd be happy to extend
> it (or build in extra shrinking logic) to cover these pool limits too,
> if it turns out to be necessary.
>
> > > >
> > > > But with that in place we also *should* do synchronous reclaim once we
> > > > hit 100%. Just enough to make room for the store. This is important to
> > > > catch the case where reclaim rate exceeds swapout rate. Rejecting and
> > > > going to swap means the reclaimer will be throttled down to IO rate
> > > > anyway, and the app latency isn't any worse. But this way we keep the
> > > > pipeline alive, and keep swapping out the oldest zswap entries,
> > > > instead of rejecting and swapping what would be the hottest ones.
> > >
> > > I fully agree with the thresholding code being weird, and with waking
> > > up the shrinker before the pool is full. What I don't understand is
> > > how we can do synchronous reclaim when we hit 100% and still respect
> > > the acceptance threshold :/
> > >
> > > Are you proposing we change the semantics of the acceptance threshold
> > > to begin with?
> >
> > I kind of am. It's worth looking at the history of this knob.
> >
> > It was added in 2020 by 45190f01dd402112d3d22c0ddc4152994f9e1e55, and
> > from the changelogs and the code in this patch I do not understand how
> > this was supposed to work.
> >
> > It also *didn't* work for very basic real world applications. See
> > Domenico's follow-up (e0228d590beb0d0af345c58a282f01afac5c57f3), which
> > effectively reverted it to get halfway reasonable behavior.
> >
> > If there are no good usecases for this knob, then I think it makes
> > sense to phase it out again.
>
> Yeah this was my original proposal - remove this knob altogether :)
> Based on a cursory read, it just seems like zswap was originally
> trying to shrink (synchronously) one "object", then try to check if
> the pool size is now under the limit. This is indeed insufficient.
> However, I'm not quite convinced by the solution (hysteresis) either.
>
> Maybe we can synchronously shrink a la Domenico, i.e until the pool
> can accept new pages, but this time capacity-based (maybe under the
> limit + some headroom, 1 page for example)? This is just so that the
> immediate incoming zswap store succeeds - we can still have the
> shrinker freeing up space later on (or maybe keep an asynchronous
> pool-limit based shrinker around).
                
On Thu, Jan 18, 2024 at 9:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Jan 18, 2024 at 09:06:43AM -0800, Yosry Ahmed wrote:
> > > > > On a different note, I wonder if it would help to perform synchronous
> > > > > reclaim here instead. With our current design, the zswap store failure
> > > > > (due to global limit hit) would leave the incoming page going to swap
> > > > > instead, creating an LRU inversion. Not sure if that's ideal.
> > > >
> > > > The global shrink path keeps reclaiming until zswap can accept again
> > > > (by default, that means reclaiming 10% of the total limit). I think
> > > > this is too expensive to be done synchronously.
> > >
> > > That thresholding code is a bit weird right now.
> > >
> > > It wakes the shrinker and rejects at the same time. We're guaranteed
> > > to see rejections, even if the shrinker has no trouble flushing some
> > > entries a split second later.
> > >
> > > It would make more sense to wake the shrinker at e.g. 95% full and
> > > have it run until 90%.
> > >
> > > But with that in place we also *should* do synchronous reclaim once we
> > > hit 100%. Just enough to make room for the store. This is important to
> > > catch the case where reclaim rate exceeds swapout rate. Rejecting and
> > > going to swap means the reclaimer will be throttled down to IO rate
> > > anyway, and the app latency isn't any worse. But this way we keep the
> > > pipeline alive, and keep swapping out the oldest zswap entries,
> > > instead of rejecting and swapping what would be the hottest ones.
> >
> > I fully agree with the thresholding code being weird, and with waking
> > up the shrinker before the pool is full. What I don't understand is
> > how we can do synchronous reclaim when we hit 100% and still respect
> > the acceptance threshold :/
> >
> > Are you proposing we change the semantics of the acceptance threshold
> > to begin with?
>
> I kind of am. It's worth looking at the history of this knob.
>
> It was added in 2020 by 45190f01dd402112d3d22c0ddc4152994f9e1e55, and
> from the changelogs and the code in this patch I do not understand how
> this was supposed to work.
>
> It also *didn't* work for very basic real world applications. See
> Domenico's follow-up (e0228d590beb0d0af345c58a282f01afac5c57f3), which
> effectively reverted it to get halfway reasonable behavior.
>
> If there are no good usecases for this knob, then I think it makes
> sense to phase it out again.

I am always nervous about removing/altering user visible knobs, but if
you think it's fine then I am all for it. I think it makes more sense
to start writeback early to avoid the whole situation if possible, and
synchronously reclaim a little bit if we hit 100%. I think the
proactive writeback should reduce the amount of synchronous IO we need
to do in reclaim as well, so we may see some latency improvements.
On Thu, Jan 18, 2024 at 11:16:08AM -0500, Johannes Weiner wrote:
> > > On Tue, Jan 16, 2024 at 5:32 AM Ronald Monthero
> > > > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> > > >                 zswap_enabled = false;
> > > >         }
> > > >
> > > > -       shrink_wq = create_workqueue("zswap-shrink");
> > > > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > > > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
> What could make a difference though is the increased concurrency by
> switching max_active from 1 to 0. This could cause a higher rate of
> shrinker runs, which might increase lock contention and reclaim
> volume. That part would be good to double check with the shrinker
> benchmarks.
Nevermind, I clearly can't read.
Could still be worthwhile testing with the default 0, but it's not a
concern in the patch as-is.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
                
On Thu, Jan 18, 2024 at 8:48 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Jan 18, 2024 at 11:16:08AM -0500, Johannes Weiner wrote:
> > > > On Tue, Jan 16, 2024 at 5:32 AM Ronald Monthero
> > > > > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> > > > >                 zswap_enabled = false;
> > > > >         }
> > > > >
> > > > > -       shrink_wq = create_workqueue("zswap-shrink");
> > > > > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > > > > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
>
> > What could make a difference though is the increased concurrency by
> > switching max_active from 1 to 0. This could cause a higher rate of
> > shrinker runs, which might increase lock contention and reclaim
> > volume. That part would be good to double check with the shrinker
> > benchmarks.
>
> Nevermind, I clearly can't read.
Regardless of max_active, we only have one shrink_work per zswap pool,
and we can only have one instance of the work running at any time,
right?
>
> Could still be worthwhile testing with the default 0, but it's not a
> concern in the patch as-is.
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
                
On Thu, Jan 18, 2024 at 9:03 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Thu, Jan 18, 2024 at 8:48 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, Jan 18, 2024 at 11:16:08AM -0500, Johannes Weiner wrote:
> > > > > On Tue, Jan 16, 2024 at 5:32 AM Ronald Monthero
> > > > > > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> > > > > >                 zswap_enabled = false;
> > > > > >         }
> > > > > >
> > > > > > -       shrink_wq = create_workqueue("zswap-shrink");
> > > > > > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > > > > > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
> >
> > > What could make a difference though is the increased concurrency by
> > > switching max_active from 1 to 0. This could cause a higher rate of
> > > shrinker runs, which might increase lock contention and reclaim
> > > volume. That part would be good to double check with the shrinker
> > > benchmarks.
> >
> > Nevermind, I clearly can't read.
>
> Regardless of max_active, we only have one shrink_work per zswap pool,
> and we can only have one instance of the work running at any time,
> right?
I believe so, yeah. Well I guess you can have a weird setup where
somehow multiple pools are full and submit shrink_work concurrently?
But who does that :) Keeping it as is, to reduce our mental workload
(i.e. not having to keep track of what changes), would be ideal.
>
> >
> > Could still be worthwhile testing with the default 0, but it's not a
> > concern in the patch as-is.
> >
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >
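
A side note on why max_active is largely moot here, assuming only the
standard workqueue guarantees the thread relies on: each pool has a
single shrink_work item, an already-pending item is not queued twice,
and the same item never runs concurrently with itself. For example:

	/* pool-limit hit: queue_work() returns false - and is a no-op - if
	 * &pool->shrink_work is already pending, so at most one shrink runs */
	if (!queue_work(shrink_wq, &pool->shrink_work))
		zswap_pool_put(pool);	/* drop the reference taken for queueing */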
                
On Wed, Dec 13, 2023 at 4:28 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> concurrency limitation. Also, in practice, the task_struct is

/s/task/work. I was looking at some articles about tasks recently, so
my brain just auto-completed. I was referring to pool->shrink_work here.