[PATCH v2] workqueue: introduce queue_delayed_work_on_offline_safe.

Imran Khan posted 1 patch 10 months, 1 week ago
[PATCH v2] workqueue: introduce queue_delayed_work_on_offline_safe.
Posted by Imran Khan 10 months, 1 week ago
Currently, users of queue_delayed_work_on() need to ensure
that the specified CPU is, and remains, online. Failing to
do so may result in the delayed_work getting queued on an
offlined CPU and hence never being executed.

The current users of queue_delayed_work_on() seem to meet
this requirement, but any user, existing or new, that
cannot guarantee it needs another interface.

So introduce queue_delayed_work_on_offline_safe(), a
wrapper around queue_delayed_work_on() that ensures the
specified CPU is online and stays online while the work
is being queued.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
---
v1 --> v2:
  - Remove RFC tag
  - For cases where dwork can't be queued on the specified CPU,
    let the caller decide which CPU to try next.

 include/linux/workqueue.h |  3 +++
 kernel/workqueue.c        | 42 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index b0dc957c3e560..cefcf9e89be6f 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -589,6 +589,9 @@ extern bool queue_work_node(int node, struct workqueue_struct *wq,
 			    struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
+extern bool queue_delayed_work_on_offline_safe(int cpu,
+			struct workqueue_struct *wq, struct delayed_work *work,
+			unsigned long delay, bool *online);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay);
 extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9362484a653c4..b3c030e6c6b17 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2565,6 +2565,48 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 }
 EXPORT_SYMBOL(queue_delayed_work_on);
 
+/**
+ * queue_delayed_work_on_offline_safe - queue work on a specific online CPU
+ *					after delay
+ *
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ * @online: set to true if @cpu was confirmed online, false otherwise
+ *
+ * A wrapper around queue_delayed_work_on() that checks that the specified
+ * @cpu is online and keeps it online while @dwork is being queued. If @cpu
+ * is offline, or its online status cannot be reliably determined because
+ * the CPU hotplug lock could not be taken, set @online to false and return
+ * false, leaving the choice of another CPU for @dwork to the caller.
+ *
+ * If the caller sees @online as false, it can retry on a different CPU.
+ * If it sees @online as true, the return value tells whether @dwork was
+ * actually queued (false means @dwork was already pending).
+ */
+bool queue_delayed_work_on_offline_safe(int cpu, struct workqueue_struct *wq,
+			   struct delayed_work *dwork, unsigned long delay,
+			   bool *online)
+{
+	bool ret = false;
+	int locked = cpus_read_trylock();
+
+	if (locked && cpu_online(cpu)) {
+		ret = queue_delayed_work_on(cpu, wq, dwork, delay);
+		*online = true;
+	} else {
+		*online = false;
+	}
+
+	if (locked)
+		cpus_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL(queue_delayed_work_on_offline_safe);
+
+
 /**
  * mod_delayed_work_on - modify delay of or queue a delayed work on specific CPU
  * @cpu: CPU number to execute work on

base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532
-- 
2.34.1
Re: [PATCH v2] workqueue: introduce queue_delayed_work_on_offline_safe.
Posted by Haakon Bugge 10 months, 1 week ago
Hi Imran,

Looks good to me. Just two minor NITs.

$Subject shall not be terminated with a dot (".").

> On 4 Feb 2025, at 06:44, Imran Khan <imran.f.khan@oracle.com> wrote:
> Currently users of queue_delayed_work_on, need to ensure
> that specified cpu is and remains online. The failure to
> do so may result in delayed_work getting queued on an
> offlined cpu and hence never getting executed.
> 
> The current users of queue_delayed_work_on, seem to ensure
> the above mentioned criteria but for those, unknown amongst
> current users or new users, who can't confirm to this
> we need another interface.
> 
> So introduce queue_delayed_work_on_offline_safe, which
> is a wrapper around queue_delayed_work_one to ensure that

s/queue_delayed_work_one/queue_delayed_work_on/

Otherwise, looks good, hence:

Acked-by: Håkon Bugge <haakon.bugge@oracle.com>


Thxs, Håkon

