[v1] workqueue: introduce queue_delayed_work_on_offline_safe.

[RFC PATCH] workqueue: introduce queue_delayed_work_on_offline_safe.

Posted by Imran Khan 1 year ago

Currently users of queue_delayed_work_on, need to ensure
that specified cpu is and remains online. The failure to
do so may result in delayed_work getting queued on an
offlined cpu and hence never getting executed.

The current users of queue_delayed_work_on, seem to ensure
the above mentioned criteria but for those, unknown amongst
current users or new users, who can't confirm to this
we need another interface.

So introduce queue_delayed_work_on_offline_safe, which
explicitly ensures that specified cpu is and remains online.
If the specified cpu is already offline or if that can't be
confirmed (due to failure in acquiring hotplug lock),
delayed_work.timer is not queued on specified cpu.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
---
I have kept the patch as RFC because from mailing list,
I could not find any users, of queue_delayed_work_on,
that is ending up queuing dwork on an offlined CPU.
We have some in-house code that is running into this problem,
and currently we are fixing it on caller side of queue_delayed_work_on.
Other users who run into this issue, can also use the approach of
fixing it on caller side or we can use the interface introduced
here for such use cases.

 include/linux/workqueue.h |  3 ++
 kernel/workqueue.c        | 67 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index b0dc957c3e560..57f39807f3bf1 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -589,6 +589,9 @@ extern bool queue_work_node(int node, struct workqueue_struct *wq,
 			    struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
+extern bool queue_delayed_work_on_offline_safe(int cpu,
+			struct workqueue_struct *wq, struct delayed_work *work,
+			unsigned long delay);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay);
 extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8e0bb3c608239..f1b6320f675a6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2487,7 +2487,8 @@ void delayed_work_timer_fn(struct timer_list *t)
 EXPORT_SYMBOL(delayed_work_timer_fn);
 
 static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
-				struct delayed_work *dwork, unsigned long delay)
+				struct delayed_work *dwork, unsigned long delay,
+				bool offline_safe)
 {
 	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
@@ -2521,8 +2522,22 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 	} else {
 		if (likely(cpu == WORK_CPU_UNBOUND))
 			add_timer_global(timer);
-		else
+		else if (likely(!offline_safe))
 			add_timer_on(timer, cpu);
+		else {	/* offline_safe */
+			if (unlikely(!cpu_online(cpu)))
+				/* cpu is already offline*/
+				add_timer_global(timer);
+			else if (cpus_read_trylock()) {
+				if (likely(cpu_online(cpu)))
+					add_timer_on(timer, cpu);
+				else
+					add_timer_global(timer);
+				cpus_read_unlock();
+			} else
+				/* could not get hotplug lock*/
+				add_timer_global(timer);
+		}
 	}
 }
 
@@ -2549,7 +2564,7 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)) &&
 	    !clear_pending_if_disabled(work)) {
-		__queue_delayed_work(cpu, wq, dwork, delay);
+		__queue_delayed_work(cpu, wq, dwork, delay, false);
 		ret = true;
 	}
 
@@ -2558,6 +2573,50 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 }
 EXPORT_SYMBOL(queue_delayed_work_on);
 
+/**
+ * queue_delayed_work_on_offline_safe - queue work on specific online CPU after
+ *					delay,
+ *
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * same as queue_delayed_work_on, but checks and ensures that specified @cpu
+ * is online. If @cpu is found to be offline or if its online status can't
+ * be confirmed queue @dwork on any CPU.
+ * cpus_read_trylock is used to ensure @cpu remains online, but we don't block
+ * if hptplug lock can't be taken. So there is good chance that even when
+ * specified @cpu was online, we queued @dwork to some other CPU, because
+ * cpu_read_trylock returned failure. On the plus side this interface
+ * makes sure that we don't endup putting @dwork on offlined @cpu and
+ * thus possibly losing it forever.
+ *
+ * Return: %false if @work was already on a queue, %true otherwise.  If
+ * @delay is zero and @dwork is idle, it will be scheduled for immediate
+ * execution.
+ */
+bool queue_delayed_work_on_offline_safe(int cpu, struct workqueue_struct *wq,
+			   struct delayed_work *dwork, unsigned long delay)
+{
+	struct work_struct *work = &dwork->work;
+	bool ret = false;
+	unsigned long irq_flags;
+
+	/* read the comment in __queue_work() */
+	local_irq_save(irq_flags);
+
+	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)) &&
+	    !clear_pending_if_disabled(work)) {
+		__queue_delayed_work(cpu, wq, dwork, delay, true);
+		ret = true;
+	}
+
+	local_irq_restore(irq_flags);
+	return ret;
+}
+EXPORT_SYMBOL(queue_delayed_work_on_offline_safe);
+
 /**
  * mod_delayed_work_on - modify delay of or queue a delayed work on specific CPU
  * @cpu: CPU number to execute work on
@@ -2585,7 +2644,7 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	ret = work_grab_pending(&dwork->work, WORK_CANCEL_DELAYED, &irq_flags);
 
 	if (!clear_pending_if_disabled(&dwork->work))
-		__queue_delayed_work(cpu, wq, dwork, delay);
+		__queue_delayed_work(cpu, wq, dwork, delay, false);
 
 	local_irq_restore(irq_flags);
 	return ret;

base-commit: 2b88851f583d3c4e40bcd40cfe1965241ec229dd
-- 
2.34.1

Re: [RFC PATCH] workqueue: introduce queue_delayed_work_on_offline_safe.

Posted by Tejun Heo 1 year ago

On Mon, Jan 13, 2025 at 03:35:40PM +1100, Imran Khan wrote:
...
> I have kept the patch as RFC because from mailing list,
> I could not find any users, of queue_delayed_work_on,
> that is ending up queuing dwork on an offlined CPU.
> We have some in-house code that is running into this problem,
> and currently we are fixing it on caller side of queue_delayed_work_on.
> Other users who run into this issue, can also use the approach of
> fixing it on caller side or we can use the interface introduced
> here for such use cases.

I'm not sure how necessary this is. If the timer is okay to run on other
CPUs, might as well just use queue_delayed_work().

Thanks.

-- 
tejun

Re: [RFC PATCH] workqueue: introduce queue_delayed_work_on_offline_safe.

Posted by imran.f.khan@oracle.com 1 year ago

Hello Tejun,
Thanks for taking a look into it.
On 14/1/2025 5:21 am, Tejun Heo wrote:
> On Mon, Jan 13, 2025 at 03:35:40PM +1100, Imran Khan wrote:
> ...
>> I have kept the patch as RFC because from mailing list,
>> I could not find any users, of queue_delayed_work_on,
>> that is ending up queuing dwork on an offlined CPU.
>> We have some in-house code that is running into this problem,
>> and currently we are fixing it on caller side of queue_delayed_work_on.
>> Other users who run into this issue, can also use the approach of
>> fixing it on caller side or we can use the interface introduced
>> here for such use cases.
> 
> I'm not sure how necessary this is. If the timer is okay to run on other
> CPUs, might as well just use queue_delayed_work().
> 

Yes, right now I can't locate something in upstream kernel that gets
broken due to the issue mentioned here.
All (except 3, mentioned further down) users of queued_delayed_work_on
are using smp_processor_id(), to specify the CPU. So specified CPU can't
be an already offlined CPU.

I see below 3 files (in v6.12.6), using queue_delayed_work_on with some sort of cached
cpu information:

  drivers/net/ethernet/pensando/ionic/ionic_dev.c -> line 177
  drivers/scsi/esas2r/esas2r_main.c -> line 1858
  drivers/scsi/lpfc/lpfc_sli.c
       -> line 14987
       -> line 15381 
But looks like in these cases specified CPU remains online or
they simply have not encountered the issue mentioned here.


For this patch, yes the timer is okay to run on other CPUs but that is
only as a last resort, most of the times it could still run on specified
CPU (assuming its online)

Thanks,
Imran



> Thanks.
>

Re: [RFC PATCH] workqueue: introduce queue_delayed_work_on_offline_safe.

Posted by imran.f.khan@oracle.com 1 year ago

Hello Tejun,

On 14/1/2025 2:01 pm, imran.f.khan@oracle.com wrote:
> Hello Tejun,
> Thanks for taking a look into it.
> On 14/1/2025 5:21 am, Tejun Heo wrote:
>> On Mon, Jan 13, 2025 at 03:35:40PM +1100, Imran Khan wrote:
>> ...
>>> I have kept the patch as RFC because from mailing list,
>>> I could not find any users, of queue_delayed_work_on,
>>> that is ending up queuing dwork on an offlined CPU.
>>> We have some in-house code that is running into this problem,
>>> and currently we are fixing it on caller side of queue_delayed_work_on.
>>> Other users who run into this issue, can also use the approach of
>>> fixing it on caller side or we can use the interface introduced
>>> here for such use cases.
>>
>> I'm not sure how necessary this is. If the timer is okay to run on other
>> CPUs, might as well just use queue_delayed_work().
>>

Could you kindly let me know, if it would be acceptable, to have 
queue_delayed_work_on_offline_safe, as a wrapper around 
queue_delayed_work_on, such that it can check and ensure CPU's
availability. If it can't, then it can simply return false and let
caller decide which cpu to use next. Something like below:


diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index b0dc957c3e560..57f39807f3bf1 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -589,6 +589,9 @@ extern bool queue_work_node(int node, struct workqueue_struct *wq,
                            struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
                        struct delayed_work *work, unsigned long delay);
+extern bool queue_delayed_work_on_offline_safe(int cpu,
+                       struct workqueue_struct *wq, struct delayed_work *work,
+                       unsigned long delay);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
                        struct delayed_work *dwork, unsigned long delay);
 extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9362484a653c4..7d3b8050422e4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2565,6 +2565,37 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 }
 EXPORT_SYMBOL(queue_delayed_work_on);

+/**
+ * queue_delayed_work_on_offline_safe - queue work on specific online CPU after
+ *                                     delay,
+ *
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * a wrapper, around queue_delayed_work_on, that checks and ensures that
+ * specified @cpu is online. If @cpu is found to be offline or if its online
+ * status can't be reliably determined, return false and leave the decision,
+ * of selecting new cpu for delayed_work, to caller.
+ *
+ */
+bool queue_delayed_work_on_offline_safe(int cpu, struct workqueue_struct *wq,
+                          struct delayed_work *dwork, unsigned long delay)
+{
+       bool ret = false;
+       int locked = 0;
+
+       if ((locked = cpus_read_trylock()) && cpu_online(cpu))
+               ret = queue_delayed_work_on(cpu, wq, dwork, delay);
+       else if (locked)
+               cpus_read_unlock();
+
+       return ret;
+}
+EXPORT_SYMBOL(queue_delayed_work_on_offline_safe);
+
+

If this looks acceptable to you, I can send a v2 of earlier patch.

Thanks,
Imran

> 
> Yes, right now I can't locate something in upstream kernel that gets
> broken due to the issue mentioned here.
> All (except 3, mentioned further down) users of queued_delayed_work_on
> are using smp_processor_id(), to specify the CPU. So specified CPU can't
> be an already offlined CPU.
> 
> I see below 3 files (in v6.12.6), using queue_delayed_work_on with some sort of cached
> cpu information:
> 
>   drivers/net/ethernet/pensando/ionic/ionic_dev.c -> line 177
>   drivers/scsi/esas2r/esas2r_main.c -> line 1858
>   drivers/scsi/lpfc/lpfc_sli.c
>        -> line 14987
>        -> line 15381 
> But looks like in these cases specified CPU remains online or
> they simply have not encountered the issue mentioned here.
> 
> 
> For this patch, yes the timer is okay to run on other CPUs but that is
> only as a last resort, most of the times it could still run on specified
> CPU (assuming its online)
> 
> Thanks,
> Imran
> 
> 
> 
>> Thanks.
>>
>

Re: [RFC PATCH] workqueue: introduce queue_delayed_work_on_offline_safe.

Posted by imran.f.khan@oracle.com 1 year ago

Hello again Tejun,
sorry, just found one mistake in earlier shared patch, it
missed an unlock done below:

Thanks,
Imran
On 31/1/2025 9:37 pm, imran.f.khan@oracle.com wrote:
> Hello Tejun,
> 
[...]
> 
> Could you kindly let me know, if it would be acceptable, to have 
> queue_delayed_work_on_offline_safe, as a wrapper around 
> queue_delayed_work_on, such that it can check and ensure CPU's
> availability. If it can't, then it can simply return false and let
> caller decide which cpu to use next. Something like below:
> 
> 
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index b0dc957c3e560..57f39807f3bf1 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -589,6 +589,9 @@ extern bool queue_work_node(int node, struct workqueue_struct *wq,
>                             struct work_struct *work);
>  extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
>                         struct delayed_work *work, unsigned long delay);
> +extern bool queue_delayed_work_on_offline_safe(int cpu,
> +                       struct workqueue_struct *wq, struct delayed_work *work,
> +                       unsigned long delay);
>  extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
>                         struct delayed_work *dwork, unsigned long delay);
>  extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9362484a653c4..7d3b8050422e4 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2565,6 +2565,37 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
>  }
>  EXPORT_SYMBOL(queue_delayed_work_on);
> 
> +/**
> + * queue_delayed_work_on_offline_safe - queue work on specific online CPU after
> + *                                     delay,
> + *
> + * @cpu: CPU number to execute work on
> + * @wq: workqueue to use
> + * @dwork: work to queue
> + * @delay: number of jiffies to wait before queueing
> + *
> + * a wrapper, around queue_delayed_work_on, that checks and ensures that
> + * specified @cpu is online. If @cpu is found to be offline or if its online
> + * status can't be reliably determined, return false and leave the decision,
> + * of selecting new cpu for delayed_work, to caller.
> + *
> + */
> +bool queue_delayed_work_on_offline_safe(int cpu, struct workqueue_struct *wq,
> +                          struct delayed_work *dwork, unsigned long delay)
> +{
> +       bool ret = false;
> +       int locked = 0;
> +
> +       if ((locked = cpus_read_trylock()) && cpu_online(cpu)) {
> +               ret = queue_delayed_work_on(cpu, wq, dwork, delay);
                  cpus_read_unlock(); 
> +       } else if (locked)
> +               cpus_read_unlock();
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL(queue_delayed_work_on_offline_safe);
> +
> +
> 
> If this looks acceptable to you, I can send a v2 of earlier patch.
> 
> Thanks,
> Imran
>

Re: [RFC PATCH] workqueue: introduce queue_delayed_work_on_offline_safe.

Posted by Haakon Bugge 1 year ago


On 31 Jan 2025, at 12:16, Imran Khan <imran.f.khan@oracle.com> wrote:
> Hello again Tejun,
> sorry, just found one mistake in earlier shared patch, it
> missed an unlock done below:
> 
> Thanks,
> Imran
> On 31/1/2025 9:37 pm, imran.f.khan@oracle.com wrote:
> Hello Tejun,
> 
> [...]
> 
> Could you kindly let me know, if it would be acceptable, to have
> queue_delayed_work_on_offline_safe, as a wrapper around
> queue_delayed_work_on, such that it can check and ensure CPU's
> availability. If it can't, then it can simply return false and let
> caller decide which cpu to use next. Something like below:
> 
> 
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index b0dc957c3e560..57f39807f3bf1 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -589,6 +589,9 @@ extern bool queue_work_node(int node, struct workqueue_struct *wq,
>                            struct work_struct *work);
> extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
>                        struct delayed_work *work, unsigned long delay);
> +extern bool queue_delayed_work_on_offline_safe(int cpu,
> +                       struct workqueue_struct *wq, struct delayed_work *work,
> +                       unsigned long delay);

Hi Imran,


I am not quite sure this signature will be OK. See below.

> extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
>                        struct delayed_work *dwork, unsigned long delay);
> extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9362484a653c4..7d3b8050422e4 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2565,6 +2565,37 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
> }
> EXPORT_SYMBOL(queue_delayed_work_on);
> 
> +/**
> + * queue_delayed_work_on_offline_safe - queue work on specific online CPU after
> + *                                     delay,
> + *
> + * @cpu: CPU number to execute work on
> + * @wq: workqueue to use
> + * @dwork: work to queue
> + * @delay: number of jiffies to wait before queueing
> + *
> + * a wrapper, around queue_delayed_work_on, that checks and ensures that
> + * specified @cpu is online. If @cpu is found to be offline or if its online
> + * status can't be reliably determined, return false and leave the decision,
> + * of selecting new cpu for delayed_work, to caller.

The return value here is ambiguous.

> + *
> + */
> +bool queue_delayed_work_on_offline_safe(int cpu, struct workqueue_struct *wq,
> +                          struct delayed_work *dwork, unsigned long delay)
> +{
> +       bool ret = false;
> +       int locked = 0;
> +
> +       if ((locked = cpus_read_trylock()) && cpu_online(cpu)) {
> +               ret = queue_delayed_work_on(cpu, wq, dwork, delay);

Now, ret will be false if the work was already queued.

>                  cpus_read_unlock();
> +       } else if (locked)
> +               cpus_read_unlock();
> +
> +       return ret;

If false is returned, it is a) because the designated @cpu was reliable detected online but the work was already queued or b) because the designated @cpu could not reliable be detected online.

Hence, I think you need to distinguish these cases. I suggest to keep the bool return value to mean what it does for queue_delayed_work_on() and add a bool pointer as argument which can be used to determine reliable online detection of @cpu or not.

A nit is that you should have braces on both the if/else clauses is any of them have it. But even simpler, do not use "else if" but an ordinary "if (locked)", and this way, you can remove the first cpus_read_unlock().


Thxs, Håkon


> +}
> +EXPORT_SYMBOL(queue_delayed_work_on_offline_safe);
> +
> +
> 
> If this looks acceptable to you, I can send a v2 of earlier patch.
> 
> Thanks,
> Imran

Re: [RFC PATCH] workqueue: introduce queue_delayed_work_on_offline_safe.

Posted by imran.f.khan@oracle.com 1 year ago

Hello Haakon,
Thanks a lot for providing your feedback.

On 3/2/2025 10:48 pm, Haakon Bugge wrote:
> 
> 
> On 31 Jan 2025, at 12:16, Imran Khan <imran.f.khan@oracle.com> wrote:
>> Hello again Tejun,
>> sorry, just found one mistake in earlier shared patch, it
>> missed an unlock done below:
>>
>> Thanks,
>> Imran
>> On 31/1/2025 9:37 pm, imran.f.khan@oracle.com wrote:
>> Hello Tejun,
>>
>> [...]
>>
>> Could you kindly let me know, if it would be acceptable, to have
>> queue_delayed_work_on_offline_safe, as a wrapper around
>> queue_delayed_work_on, such that it can check and ensure CPU's
>> availability. If it can't, then it can simply return false and let
>> caller decide which cpu to use next. Something like below:
>>
>>
>> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
>> index b0dc957c3e560..57f39807f3bf1 100644
>> --- a/include/linux/workqueue.h
>> +++ b/include/linux/workqueue.h
>> @@ -589,6 +589,9 @@ extern bool queue_work_node(int node, struct workqueue_struct *wq,
>>                            struct work_struct *work);
>> extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
>>                        struct delayed_work *work, unsigned long delay);
>> +extern bool queue_delayed_work_on_offline_safe(int cpu,
>> +                       struct workqueue_struct *wq, struct delayed_work *work,
>> +                       unsigned long delay);
> 
> Hi Imran,
> 
> 
> I am not quite sure this signature will be OK. See below.
> 
>> extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
>>                        struct delayed_work *dwork, unsigned long delay);
>> extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 9362484a653c4..7d3b8050422e4 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -2565,6 +2565,37 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
>> }
>> EXPORT_SYMBOL(queue_delayed_work_on);
>>
>> +/**
>> + * queue_delayed_work_on_offline_safe - queue work on specific online CPU after
>> + *                                     delay,
>> + *
>> + * @cpu: CPU number to execute work on
>> + * @wq: workqueue to use
>> + * @dwork: work to queue
>> + * @delay: number of jiffies to wait before queueing
>> + *
>> + * a wrapper, around queue_delayed_work_on, that checks and ensures that
>> + * specified @cpu is online. If @cpu is found to be offline or if its online
>> + * status can't be reliably determined, return false and leave the decision,
>> + * of selecting new cpu for delayed_work, to caller.
> 
> The return value here is ambiguous.
>
Agree. Thanks for pointing this out. I have modified this patch,
as per your suggestion and have sent a v2 of my earlier approach.
Please see [1].

Could you please have a look and let me know your opinion.

Thanks,
Imran

[1]: https://lore.kernel.org/all/20250204054404.268888-1-imran.f.khan@oracle.com/

Re: [RFC PATCH] workqueue: introduce queue_delayed_work_on_offline_safe.

Posted by imran.f.khan@oracle.com 1 year ago

Hello everyone,

Firstly, apologies If I have added too many people. 
I have used maintainer list for drivers mentioned in [1] and 
further mentioned below in [2], [3], [4] and [5].

To give some context, in this thread we are discussing the problem
of, queue_delayed_work_on (with non zero delay) putting the work
on an (already), offlined CPU and that work not getting executed or at least
not getting executed timely because the underlying timer would not
fire on an offlined cpu.
We observed this issue with some in-house changes in RDS code, where some
preferred-cpu was being used to queue the delayed_work , but it was offline
and as a result corresponding work item remained stuck

I could not locate any reports of similar issue in the mailing list but I
see that [2], [4] and [5]are also using some sort of cached/preferred cpu
information to queue the delayed_work on.
So could you please confirm if you have faced issue, similar to one being described
here and if not how are you ensuring that selected cpu is and remains online.

For example [2] is using a preferred-cpu, obtained from [3], but [3] is using
"cpu_online_mask" to confirm that cpu is online. But can't it happen that
cpu was seen as online in cpu_online_mask by [3], but by the time, timer
of delayed_work got added this cpu was offline. I understand that race window
is very slow and very unlikely but it still looks racy or am I understanding
something totally wrong.

Similarly, [4] and [5] are using cq->chann as cpu for queueing delayed_work on,
but its not clear how its ensured that these cpus are online. Is there some
implicit semantics,that is confirming that cq->chann is and remains online,
during submission of delayed_work.

I will very thankful, if you could clarify the above queries.

Thanks,
Imran

[1]: https://lore.kernel.org/all/eab73ac2-d9eb-4ac7-9e0f-9a02b81a31bb@oracle.com/
[2]: https://elixir.bootlin.com/linux/v6.12.6/source/drivers/net/ethernet/pensando/ionic/ionic_dev.c#L177
[3]: https://elixir.bootlin.com/linux/v6.12.6/source/drivers/net/ethernet/pensando/ionic/ionic_dev.c#L76
[4]: https://elixir.bootlin.com/linux/v6.12.6/source/drivers/scsi/lpfc/lpfc_sli.c#L14987
[5]: https://elixir.bootlin.com/linux/v6.12.6/source/drivers/scsi/lpfc/lpfc_sli.c#L15381

On 14/1/2025 2:01 pm, imran.f.khan@oracle.com wrote:
> Hello Tejun,
> Thanks for taking a look into it.
> On 14/1/2025 5:21 am, Tejun Heo wrote:
>> On Mon, Jan 13, 2025 at 03:35:40PM +1100, Imran Khan wrote:
>> ...
>>> I have kept the patch as RFC because from mailing list,
>>> I could not find any users, of queue_delayed_work_on,
>>> that is ending up queuing dwork on an offlined CPU.
>>> We have some in-house code that is running into this problem,
>>> and currently we are fixing it on caller side of queue_delayed_work_on.
>>> Other users who run into this issue, can also use the approach of
>>> fixing it on caller side or we can use the interface introduced
>>> here for such use cases.
>>
>> I'm not sure how necessary this is. If the timer is okay to run on other
>> CPUs, might as well just use queue_delayed_work().
>>
> 
> Yes, right now I can't locate something in upstream kernel that gets
> broken due to the issue mentioned here.
> All (except 3, mentioned further down) users of queued_delayed_work_on
> are using smp_processor_id(), to specify the CPU. So specified CPU can't
> be an already offlined CPU.
> 
> I see below 3 files (in v6.12.6), using queue_delayed_work_on with some sort of cached
> cpu information:
> 
>   drivers/net/ethernet/pensando/ionic/ionic_dev.c -> line 177
>   drivers/scsi/esas2r/esas2r_main.c -> line 1858
>   drivers/scsi/lpfc/lpfc_sli.c
>        -> line 14987
>        -> line 15381 
> But looks like in these cases specified CPU remains online or
> they simply have not encountered the issue mentioned here.
> 
> 
> For this patch, yes the timer is okay to run on other CPUs but that is
> only as a last resort, most of the times it could still run on specified
> CPU (assuming its online)
> 
> Thanks,
> Imran
> 
> 
> 
>> Thanks.
>>
>