workqueue: Introduce PF_WQ_RESCUE_WORKER

[RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Aaron Tomlin 2 years, 6 months ago

The Linux kernel does not provide user-mode with a way to differentiate
between a kworker and a rescue kworker.
From user-mode, one can establish whether a task is a kworker by testing
for PF_WQ_WORKER in the specified task's flags bit mask via
/proc/[PID]/stat. One can also examine /proc/[PID]/stack and search for
the function "rescuer_thread", but this is only available to the root
user.
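
The PF_WQ_WORKER test described above can be sketched from user-mode as
follows. This is a minimal illustration, not part of the series: the helper
names are hypothetical, and the flag value 0x00000020 is taken from
include/linux/sched.h and should be checked against the running kernel.

```python
# Assumed value of PF_WQ_WORKER from include/linux/sched.h; verify it
# against the kernel you are actually running.
PF_WQ_WORKER = 0x00000020


def task_flags(stat_line):
    """Return the flags field (field 9) of a /proc/[PID]/stat line."""
    # comm (field 2) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' instead of on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 9 sits at index 6.
    return int(after_comm[6])


def is_kworker(stat_line):
    """True if PF_WQ_WORKER is set in the task's flags."""
    return bool(task_flags(stat_line) & PF_WQ_WORKER)


def is_kworker_pid(pid):
    """Hypothetical convenience wrapper that reads /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        return is_kworker(f.read())
```

Note that this only separates kworkers from other tasks. It cannot tell a
rescuer apart from a regular kworker, which is the gap the series describes.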

It can be useful to identify a rescue kworker since their CPU affinity
cannot be modified and their initial CPU assignment can be safely ignored.
Furthermore, for a workqueue that was created with WQ_MEM_RECLAIM and
WQ_SYSFS, the cpumask file is not applicable to the rescue kworker.
By design, a rescue kworker should run anywhere.

This patch series introduces PF_WQ_RESCUE_WORKER and ensures it is set and
cleared appropriately and simplifies current_is_workqueue_rescuer().

Aaron Tomlin (2):
  workqueue: Introduce PF_WQ_RESCUE_WORKER
  workqueue: Simplify current_is_workqueue_rescuer()

 include/linux/sched.h |  2 +-
 kernel/workqueue.c    | 25 +++++++++++++++----------
 2 files changed, 16 insertions(+), 11 deletions(-)

-- 
2.39.1

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Juri Lelli 2 years, 1 month ago

Hi,

Just stumbled upon this series while looking into rescuers myself. :)

On 29/07/23 14:53, Aaron Tomlin wrote:
> The Linux kernel does not provide user-mode with a way to differentiate
> between a kworker and a rescue kworker.
> From user-mode, one can establish whether a task is a kworker by testing
> for PF_WQ_WORKER in the specified task's flags bit mask via
> /proc/[PID]/stat. One can also examine /proc/[PID]/stack and search for
> the function "rescuer_thread", but this is only available to the root
> user.
> 
> It can be useful to identify a rescue kworker since their CPU affinity
> cannot be modified and their initial CPU assignment can be safely ignored.
> Furthermore, for a workqueue that was created with WQ_MEM_RECLAIM and
> WQ_SYSFS, the cpumask file is not applicable to the rescue kworker.
> By design, a rescue kworker should run anywhere.

Guess this is a requirement because, if workqueue processing is stuck
for some reason, getting rescuers to run on the same set of cpus
workqueues have been restricted to already doesn't really have good
chances of making any progress?

Wonder if we still might need some sort of fail hard/warn mode in case
strict isolation is in place? Or maybe we have that already?

Thanks!
Juri

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 1 month ago

Hello,

On Mon, Dec 11, 2023 at 03:51:57PM +0100, Juri Lelli wrote:
> Guess this is a requirement because, if workqueue processing is stuck
> for some reason, getting rescuers to run on the same set of cpus
> workqueues have been restricted to already doesn't really have good
> chances of making any progress?

The only problem rescuers try to solve is deadlocks caused by lack of
memory, so on the cpu side, it just follows whatever worker pool it's trying
to help.

> Wonder if we still might need some sort of fail hard/warn mode in case
> strict isolation is in place? Or maybe we have that already?

For both percpu and unbound workqueues, the rescuers just follow whatever
pool it's trying to help at the moment, so it shouldn't cause any surprises
in terms of isolation. It just temporarily joins the already active but
stuck pool.

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Juri Lelli 2 years, 1 month ago

Hello,

Thanks for the quick reply!

On 11/12/23 08:39, Tejun Heo wrote:
> Hello,
> 
> On Mon, Dec 11, 2023 at 03:51:57PM +0100, Juri Lelli wrote:
> > Guess this is a requirement because, if workqueue processing is stuck
> > for some reason, getting rescuers to run on the same set of cpus
> > workqueues have been restricted to already doesn't really have good
> > chances of making any progress?
> 
> The only problem rescuers try to solve is deadlocks caused by lack of
> memory, so on the cpu side, it just follows whatever worker pool it's trying
> to help.
> 
> > Wonder if we still might need some sort of fail hard/warn mode in case
> > strict isolation is in place? Or maybe we have that already?
> 
> For both percpu and unbound workqueues, the rescuers just follow whatever
> pool it's trying to help at the moment, so it shouldn't cause any surprises
> in terms of isolation. It just temporarily joins the already active but
> stuck pool.

Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
how are we making sure that the wake up is always happening on
housekeeping CPUs (assuming unbound workqueues have been restricted to
those)?

AFAICS, we have

send_mayday ->
  wake_up_process(wq->rescuer->task)

which is not affined to the workqueue cpumask it's called to rescue, so
in theory can be woken up anywhere?

Thanks,
Juri

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 1 month ago

Hello, Juri.

On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> how are we making sure that the wake up is always happening on
> housekeeping CPUs (assuming unbound workqueues have been restricted to
> those)?
> 
> AFAICS, we have
> 
> send_mayday ->
>   wake_up_process(wq->rescuer->task)
> 
> which is not affined to the workqueue cpumask it's called to rescue, so
> in theory can be woken up anywhere?

Ah, was only thinking about work item execution. Yeah, it's not following
the isolation rule there and we probably should affine it as we're waking it
up.

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Juri Lelli 2 years, 1 month ago

On 12/12/23 07:14, Tejun Heo wrote:
> Hello, Juri.
> 
> On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> > Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> > how are we making sure that the wake up is always happening on
> > housekeeping CPUs (assuming unbound workqueues have been restricted to
> > those)?
> > 
> > AFAICS, we have
> > 
> > send_mayday ->
> >   wake_up_process(wq->rescuer->task)
> > 
> > which is not affined to the workqueue cpumask it's called to rescue, so
> > in theory can be woken up anywhere?
> 
> Ah, was only thinking about work item execution. Yeah, it's not following
> the isolation rule there and we probably should affine it as we're waking it
> up.

Something like the following then maybe?

---
 kernel/workqueue.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2989b57e154a7..ed73f7f80d57d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
        link_pwq(ctx->dfl_pwq);
        swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);

+       /* rescuer needs to respect wq cpumask changes */
+       if (ctx->wq->rescuer) {
+               kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
+               wake_up_process(ctx->wq->rescuer->task);
+       }
+
        mutex_unlock(&ctx->wq->mutex);
 }

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 1 month ago

Hello,

On Wed, Dec 13, 2023 at 09:59:42AM +0100, Juri Lelli wrote:
> Something like the following then maybe?
> 
> ---
>  kernel/workqueue.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2989b57e154a7..ed73f7f80d57d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
>         link_pwq(ctx->dfl_pwq);
>         swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
> 
> +       /* rescuer needs to respect wq cpumask changes */
> +       if (ctx->wq->rescuer) {
> +               kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
> +               wake_up_process(ctx->wq->rescuer->task);
> +       }
> +
>         mutex_unlock(&ctx->wq->mutex);
>  }

I'm not sure kthread_bind_mask() would be safe here. The rescuer might be
running a work item. wait_task_inactive() might fail and we don't want to
change cpumask while the rescuer is active anyway.

Maybe the easiest way to do this is making rescuer_thread() restore the wq's
cpumask right before going to sleep, and making apply_wqattrs_commit() just
wake up the rescuer.

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Juri Lelli 2 years, 1 month ago

On 13/12/23 05:35, Tejun Heo wrote:
> Hello,
> 
> On Wed, Dec 13, 2023 at 09:59:42AM +0100, Juri Lelli wrote:
> > Something like the following then maybe?
> > 
> > ---
> >  kernel/workqueue.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 2989b57e154a7..ed73f7f80d57d 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -4405,6 +4405,12 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
> >         link_pwq(ctx->dfl_pwq);
> >         swap(ctx->wq->dfl_pwq, ctx->dfl_pwq);
> > 
> > +       /* rescuer needs to respect wq cpumask changes */
> > +       if (ctx->wq->rescuer) {
> > +               kthread_bind_mask(ctx->wq->rescuer->task, ctx->attrs->cpumask);
> > +               wake_up_process(ctx->wq->rescuer->task);
> > +       }
> > +
> >         mutex_unlock(&ctx->wq->mutex);
> >  }
> 
> I'm not sure kthread_bind_mask() would be safe here. The rescuer might be
> running a work item. wait_task_inactive() might fail and we don't want to
> change cpumask while the rescuer is active anyway.
> 
> Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> cpumask right before going to sleep, and making apply_wqattrs_commit() just
> wake up the rescuer.

Hummm, don't think we can call that either while the rescuer is actually
running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
in the above?

Thanks,
Juri

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 1 month ago

On Wed, Dec 13, 2023 at 07:32:10PM +0100, Juri Lelli wrote:
> > Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> > cpumask right before going to sleep, and making apply_wqattrs_commit() just
> > wake up the rescuer.
> 
> Hummm, don't think we can call that either while the rescuer is actually
> running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
> in the above?

So, we have to use set_cpus_allowed_ptr() but we still don't want to change
the affinity of a rescuer which is already running a task for a pool.

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Juri Lelli 2 years, 1 month ago

On 13/12/23 08:38, Tejun Heo wrote:
> On Wed, Dec 13, 2023 at 07:32:10PM +0100, Juri Lelli wrote:
> > > Maybe the easiest way to do this is making rescuer_thread() restore the wq's
> > > cpumask right before going to sleep, and making apply_wqattrs_commit() just
> > > wake up the rescuer.
> > 
> > Hummm, don't think we can call that either while the rescuer is actually
> > running. Maybe we can simply s/kthread_bind_mask/set_cpus_allowed_ptr/
> > in the above?
> 
> So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> the affinity of a rescuer which is already running a task for a pool.

But then, even today, a rescuer might keep handling work on a cpu
outside its wq cpumask if the associated wq cpumask change can proceed
w/o waiting for it to finish the iteration?

BTW, apologies for all the questions, but I'd like to make sure I can
get the implications hopefully right. :)

Thanks,
Juri

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 1 month ago

Hello,

On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > the affinity of a rescuer which is already running a task for a pool.
> 
> But then, even today, a rescuer might keep handling work on a cpu
> outside its wq cpumask if the associated wq cpumask change can proceed
> w/o waiting for it to finish the iteration?

Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
cpumask that they're serving, your original approach likely isn't broken
either.

> BTW, apologies for all the questions, but I'd like to make sure I can
> get the implications hopefully right. :)

I obviously haven't thought through it very well, so thanks for the
questions. So, yeah, I think we actually need to set the rescuer's cpumask
when wq's cpumask changes and doing it where you were suggesting should
probably work.

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Juri Lelli 2 years, 1 month ago

On 14/12/23 09:47, Tejun Heo wrote:
> Hello,
> 
> On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > > the affinity of a rescuer which is already running a task for a pool.
> > 
> > But then, even today, a rescuer might keep handling work on a cpu
> > outside its wq cpumask if the associated wq cpumask change can proceed
> > w/o waiting for it to finish the iteration?
> 
> Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
> cpumask that they're serving, your original approach likely isn't broken
> either.
> 
> > BTW, apologies for all the questions, but I'd like to make sure I can
> > get the implications hopefully right. :)
> 
> I obviously haven't thought through it very well, so thanks for the
> questions. So, yeah, I think we actually need to set the rescuer's cpumask
> when wq's cpumask changes and doing it where you were suggesting should
> probably work.

OK. Going to send a proper patch asap.

Thanks!
Juri

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Juri Lelli 2 years, 1 month ago

Hello again,

On 15/12/23 07:50, Juri Lelli wrote:
> On 14/12/23 09:47, Tejun Heo wrote:
> > Hello,
> > 
> > On Thu, Dec 14, 2023 at 12:25:25PM +0100, Juri Lelli wrote:
> > > > So, we have to use set_cpus_allowed_ptr() but we still don't want to change
> > > > the affinity of a rescuer which is already running a task for a pool.
> > > 
> > > But then, even today, a rescuer might keep handling work on a cpu
> > > outside its wq cpumask if the associated wq cpumask change can proceed
> > > w/o waiting for it to finish the iteration?
> > 
> > Yeah, that can happen and pool cpumasks naturally being subsets of the wq's
> > cpumask that they're serving, your original approach likely isn't broken
> > either.
> > 
> > > BTW, apologies for all the questions, but I'd like to make sure I can
> > > get the implications hopefully right. :)
> > 
> > I obviously haven't thought through it very well, so thanks for the
> > questions. So, yeah, I think we actually need to set the rescuer's cpumask
> > when wq's cpumask changes and doing it where you were suggesting should
> > probably work.
> 
> OK. Going to send a proper patch asap.

I actually didn't do that yet, as it turns out the proposed approach
doesn't cover !WQ_SYSFS unbound wqs. Well, I thought those should be
covered as well, since we have (initiated by echo <mask> into
/sys/devices/virtual/workqueue/cpumask)

workqueue_apply_unbound_cpumask ->
  apply_wqattrs_commit

but for some reason the mask change is not reflected in the rescuers'
affinity.

Trying to dig deeper I went ahead and extended the recent wq_dump.py
addition with the following

---
 tools/workqueue/wq_dump.py | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tools/workqueue/wq_dump.py b/tools/workqueue/wq_dump.py
index d0df5833f2c18..6da621989e210 100644
--- a/tools/workqueue/wq_dump.py
+++ b/tools/workqueue/wq_dump.py
@@ -175,3 +175,32 @@ for wq in list_for_each_entry('struct workqueue_struct', workqueues.address_of_(
     if wq.flags & WQ_UNBOUND:
         print(f' {wq.dfl_pwq.pool.id.value_():{max_pool_id_len}}', end='')
     print('')
+
+print('')
+print('Workqueue -> rescuer')
+print('=====================')
+print(f'wq_unbound_cpumask={cpumask_str(wq_unbound_cpumask)}')
+print('')
+print('[    workqueue     \\     type            unbound_cpumask     rescuer                  pid   cpumask]')
+
+for wq in list_for_each_entry('struct workqueue_struct', workqueues.address_of_(), 'list'):
+    print(f'{wq.name.string_().decode()[-24:]:24}', end='')
+    if wq.flags & WQ_UNBOUND:
+        if wq.flags & WQ_ORDERED:
+            print(' ordered   ', end='')
+        else:
+            print(' unbound', end='')
+            if wq.unbound_attrs.affn_strict:
+                print(',S ', end='')
+            else:
+                print('   ', end='')
+        print(f' {cpumask_str(wq.unbound_attrs.cpumask):24}', end='')
+    else:
+        print(' percpu    ', end='')
+        print('                         ', end='')
+
+    if wq.flags & WQ_MEM_RECLAIM:
+        print(f' {wq.rescuer.task.comm.string_().decode()[-24:]:24}', end='')
+        print(f' {wq.rescuer.task.pid.value_():5}', end='')
+        print(f' {cpumask_str(wq.rescuer.task.cpus_ptr)}', end='')
+    print('')
---

which shows the following situation after an

# echo 00,00000003 > /sys/devices/virtual/workqueue/cpumask

on the system I'm testing with:

...
Workqueue -> rescuer
=====================
wq_unbound_cpumask=00000003

[    workqueue     \     type            unbound_cpumask     rescuer                  pid   cpumask]
events                   percpu
events_highpri           percpu
events_long              percpu
events_unbound           unbound    0xffffffff 000000ff
events_freezable         percpu
events_power_efficient   percpu
events_freezable_power_  percpu
rcu_gp                   percpu                              kworker/R-rcu_g              4 0xffffffff 000000ff
rcu_par_gp               percpu                              kworker/R-rcu_p              5 0xffffffff 000000ff
slub_flushwq             percpu                              kworker/R-slub_              6 0xffffffff 000000ff
netns                    ordered    0xffffffff 000000ff      kworker/R-netns              7 0xffffffff 000000ff
mm_percpu_wq             percpu                              kworker/R-mm_pe             13 0xffffffff 000000ff
cpuset_migrate_mm        ordered    0xffffffff 000000ff
inet_frag_wq             percpu                              kworker/R-inet_            300 0xffffffff 000000ff
pm                       percpu
cgroup_destroy           percpu
cgroup_pidlist_destroy   percpu
writeback                unbound    0xffffffff 000000ff      kworker/R-write            308 0xffffffff 000000ff
cgwb_release             percpu
cryptd                   percpu                              kworker/R-crypt            314 0xffffffff 000000ff
kintegrityd              percpu                              kworker/R-kinte            315 0xffffffff 000000ff
kblockd                  percpu                              kworker/R-kbloc            316 0xffffffff 000000ff
kacpid                   percpu
kacpi_notify             percpu
kacpi_hotplug            ordered    0xffffffff 000000ff
kec                      ordered    0xffffffff 000000ff
kec_query                percpu
tpm_dev_wq               percpu                              kworker/R-tpm_d            352 0xffffffff 000000ff
usb_hub_wq               percpu
md                       percpu                              kworker/R-md               353 0xffffffff 000000ff
md_misc                  percpu
md_bitmap                unbound    0xffffffff 000000ff      kworker/R-md_bi            354 0xffffffff 000000ff
edac-poller              ordered    0xffffffff 000000ff      kworker/R-edac-            355 0xffffffff 000000ff
...

I guess I expected wq_unbound_cpumask and unbound_cpumask for each
unbound wq to be kept in sync, so I'm evidently missing details. :)
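
The comparison being made by eye above (does a rescuer's affinity stay
within the mask written to the sysfs file?) can be sketched as a small
helper. This is only an illustration under assumptions: the function names
are hypothetical, and the input is assumed to be the comma-separated hex
format used by the echo command above, not the drgn cpumask_str() format
shown in the table.

```python
def parse_cpumask(text):
    """Convert a hex cpumask such as '00,00000003' into a set of CPU ids."""
    # The commas only separate 32-bit groups, so dropping them yields one
    # hex number whose bit N stands for CPU N.
    value = int(text.strip().replace(",", ""), 16)
    return {cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1}


def rescuer_follows_wq(rescuer_mask, wq_mask):
    """True if every CPU the rescuer may run on is also in the wq mask."""
    return parse_cpumask(rescuer_mask) <= parse_cpumask(wq_mask)
```

For the situation reported above, a rescuer still allowed on 40 CPUs would
fail this check against a workqueue mask of 00,00000003.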

Can you please help me understand what I am missing?

Thanks!
Juri

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Aaron Tomlin 2 years, 1 month ago

On Tue, Dec 12, 2023 at 07:14:48AM -1000, Tejun Heo wrote:
> Hello, Juri.
> 
> On Tue, Dec 12, 2023 at 10:56:02AM +0100, Juri Lelli wrote:
> > Hummm, OK, but in terms of which CPU the rescuer is possibly woken up,
> > how are we making sure that the wake up is always happening on
> > housekeeping CPUs (assuming unbound workqueues have been restricted to
> > those)?
> > 
> > AFAICS, we have
> > 
> > send_mayday ->
> >   wake_up_process(wq->rescuer->task)
> > 
> > which is not affined to the workqueue cpumask it's called to rescue, so
> > in theory can be woken up anywhere?
> 
> Ah, was only thinking about work item execution. Yeah, it's not following
> the isolation rule there and we probably should affine it as we're waking it
> up.

Hi Tejun,

I am confused.

I thought by design we want a rescuer kthread to execute on any CPU, no?


Kind regards,

-- 
Aaron Tomlin

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 1 month ago

On Tue, Dec 12, 2023 at 07:06:48PM +0000, Aaron Tomlin wrote:
> I thought by design we want a rescuer kthread to execute on any CPU, no?

Well, it needs to be able to move around because it dynamically attaches to
the worker pool it's rescuing and needs to take on its cpumask, but it
doesn't have to be able to run on all cpus all the time.

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 6 months ago

Hello,

On Sat, Jul 29, 2023 at 02:53:32PM +0100, Aaron Tomlin wrote:
> It can be useful to identify a rescue kworker since their CPU affinity
> cannot be modified and their initial CPU assignment can be safely ignored.

You really shouldn't be setting affinities on kworkers manually. There's no
way of knowing which kworker is going to execute which workqueue. Please use
the attributes API and sysfs interface to modify per-workqueue worker
attributes. If that's not sufficient and you need finer grained control, the
right thing to do is using kthread_worker which gives you a dedicated
kthread that you can manipulate as appropriate.

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Aaron Tomlin 2 years, 6 months ago

> You really shouldn't be setting affinities on kworkers manually. There's
> no way of knowing which kworker is going to execute which workqueue.
> Please use the attributes API and sysfs interface to modify per-workqueue
> worker attributes. If that's not sufficient and you need finer grained
> control, the right thing to do is using kthread_worker which gives you a
> dedicated kthread that you can manipulate as appropriate.

Hi Tejun,

I completely agree. Each kworker has PF_NO_SETAFFINITY applied anyway.
If I understand correctly, only an unbound kworker can have their CPU
affinity modified via sysfs. The objective of this series was to easily
identify a rescuer kworker from user-mode.


Kind regards,
-- 
Aaron Tomlin

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 6 months ago

Hello,

On Tue, Aug 01, 2023 at 11:53:01AM +0100, Aaron Tomlin wrote:
> > You really shouldn't be setting affinities on kworkers manually. There's
> > no way of knowing which kworker is going to execute which workqueue.
> > Please use the attributes API and sysfs interface to modify per-workqueue
> > worker attributes. If that's not sufficient and you need finer grained
> > control, the right thing to do is using kthread_worker which gives you a
> > dedicated kthread that you can manipulate as appropriate.
> 
> I completely agree. Each kworker has PF_NO_SETAFFINITY applied anyway.
> If I understand correctly, only an unbound kworker can have their CPU
> affinity modified via sysfs. The objective of this series was to easily
> identify a rescuer kworker from user-mode.

But why do you need to identify rescue workers? What are you trying to
achieve?

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Aaron Tomlin 2 years, 6 months ago

> But why do you need to identify rescue workers? What are you trying to
> achieve?

Hi Tejun,

I had a conversation with a colleague of mine. It can be useful to identify
and account for all kernel threads. From the perspective of user-mode, the
name given currently to the rescuer kworker is ambiguous. For instance,
"kworker/u16:9-kcryptd/253:0" is clearly identifiable as an unbound kworker
for the specified workqueue which can have their CPU affinity adjusted as
you mentioned before. I think if we followed the same naming convention
for a rescuer kworker then it would be more consistent. I'll send a patch
so it can be discussed further.


Kind regards,
-- 
Aaron Tomlin

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Tejun Heo 2 years, 6 months ago

On Thu, Aug 03, 2023 at 09:19:14PM +0100, Aaron Tomlin wrote:
> > But why do you need to identify rescue workers? What are you trying to
> > achieve?
> 
> Hi Tejun,
> 
> I had a conversation with a colleague of mine. It can be useful to identify
> and account for all kernel threads. From the perspective of user-mode, the
> name given currently to the rescuer kworker is ambiguous. For instance,
> "kworker/u16:9-kcryptd/253:0" is clearly identifiable as an unbound kworker
> for the specified workqueue which can have their CPU affinity adjusted as

Note that the name changes to the work item the worker is currently
executing. It won't stay that way. Workers are shared across the workqueues,
so I'm not sure "identify and account all kernel threads" is working as well
as you think it is.

> you mentioned before. I think if we followed the same naming convention
> for a rescuer kworker then it would be more consistent. I'll send a patch
> so it can be discussed further.

We can certainly rename them to indicate that they are rescuers - e.g. maybe
krescuer? But, at the moment, the proposed reason seems rather dubious.

Thanks.

-- 
tejun

Re: [RFC PATCH 0/2] workqueue: Introduce PF_WQ_RESCUE_WORKER

Posted by Aaron Tomlin 2 years, 6 months ago

> Note that the name changes to the work item the worker is currently
> executing. It won't stay that way. Workers are shared across the
> workqueues, so I'm not sure "identify and account all kernel threads" is
> working as well as you think it is.

Hi Tejun,

Indeed. The point is that these kworker kthreads are easily identifiable.

> We can certainly rename them to indicate that they are rescuers - e.g.
> maybe krescuer? But, at the moment, the proposed reason seems rather
> dubious.

Personally, I would prefer "kworker/r-%s" and then include the specified
workqueue's name e.g. "kworker/r-ext4-rsv-conver". So the rescuer task's
name is more consistent with the current naming scheme.
I will send a follow up patch.
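
If rescuers were named this way, user-mode could recognise them from the
task name alone. The sketch below is hypothetical: the function name is
made up, and the accepted prefixes are assumptions, namely the
"kworker/r-" form proposed here and the "kworker/R-" form that appears in
the wq_dump.py output elsewhere in this thread. Check the actual prefix on
the kernel in question.

```python
# Assumed rescuer name prefixes; neither is guaranteed on a given kernel.
RESCUER_PREFIXES = ("kworker/r-", "kworker/R-")


def rescuer_workqueue(comm):
    """Return the (possibly truncated) workqueue name if comm looks like a
    rescuer's task name, or None otherwise."""
    for prefix in RESCUER_PREFIXES:
        if comm.startswith(prefix):
            return comm[len(prefix):]
    return None
```

Because task names can be truncated, the returned workqueue name may be
only a prefix of the real one.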


Kind regards,

-- 
Aaron Tomlin