From: Chen Ridong <chenridong@huawei.com>
If something is expected to generate a large number of concurrent work
items, it should use its own dedicated workqueue rather than a system wq,
because it may otherwise saturate system_wq and block other work items,
e.g. the cgroup release work. Let's document this as a note.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
Documentation/core-api/workqueue.rst | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index 16f861c9791e..9de622188f2f 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -357,6 +357,11 @@ Guidelines
difference in execution characteristics between using a dedicated wq
and a system wq.
+ Note: If something is expected to generate a large number of concurrent
+ works, it should utilize its own dedicated workqueue rather than
+ system wq. Because this may saturate system_wq and potentially lead
+ to deadlock.
+
* Unless work items are expected to consume a huge amount of CPU
cycles, using a bound wq is usually beneficial due to the increased
level of locality in wq operations and work item execution.
--
2.34.1
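As a minimal sketch of what the added note recommends (the names
example_wq, example_work_fn and example_produce are hypothetical, not
from the patch): a producer that may queue many work items allocates its
own workqueue with alloc_workqueue() and queues onto it with
queue_work(), instead of calling schedule_work(), which targets the
shared system_wq.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *example_wq;

static void example_work_fn(struct work_struct *work)
{
	/* potentially slow work, e.g. blocking on a subsystem mutex */
}

static DECLARE_WORK(example_work, example_work_fn);

static int __init example_init(void)
{
	/* a dedicated wq so bursts of items cannot saturate system_wq */
	example_wq = alloc_workqueue("example_wq", 0, 0);
	if (!example_wq)
		return -ENOMEM;
	return 0;
}

static void example_produce(void)
{
	/* instead of schedule_work(&example_work), which uses system_wq */
	queue_work(example_wq, &example_work);
}

static void __exit example_exit(void)
{
	destroy_workqueue(example_wq);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");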
On Mon, Sep 23, 2024 at 11:43:51AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
> + Note: If something is expected to generate a large number of concurrent
> + works, it should utilize its own dedicated workqueue rather than
> + system wq. Because this may saturate system_wq and potentially lead
> + to deadlock.

How does "large number of concurrent" translate practically?

Take the example with the released cgroup_bpf from

  cgroup_destroy_locked
    cgroup_bpf_offline

which is serialized under cgroup_mutex, as argued previously. So this
generates a single entry at a time, and it wouldn't hint towards the
creation of cgroup_bpf_destroy_wq.

I reckon the argument could be something like: the processing rate vs
the production rate of entry items should be such that the number of
active items is bounded. But I'm not sure that is practical, since users
may not know the comparison result and they would end up always creating
a dedicated workqueue.

Michal
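To make the rate argument above concrete, here is a hypothetical sketch
(example_obj, example_mutex and example_destroy_locked are made-up
names, not the cgroup code): a mutex only serializes the enqueues, so
items from distinct objects can still pile up on the shared wq whenever
the work function processes them more slowly than they are produced.

#include <linux/mutex.h>
#include <linux/workqueue.h>

/* hypothetical object carrying one work item per instance */
struct example_obj {
	struct work_struct release_work;
};

static DEFINE_MUTEX(example_mutex);

static void example_destroy_locked(struct example_obj *obj)
{
	lockdep_assert_held(&example_mutex);

	/*
	 * The mutex serializes the enqueues (one at a time), but each
	 * object has its own work_struct, so pending items still
	 * accumulate on system_wq if release_work runs slower than
	 * objects are destroyed, i.e. production outpaces processing.
	 */
	queue_work(system_wq, &obj->release_work);
}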
On 2024/9/26 20:49, Michal Koutný wrote:
> On Mon, Sep 23, 2024 at 11:43:51AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> + Note: If something is expected to generate a large number of concurrent
>> + works, it should utilize its own dedicated workqueue rather than
>> + system wq. Because this may saturate system_wq and potentially lead
>> + to deadlock.
>
> How does "large number of concurrent" translate practically?
>
> Take the example with the released cgroup_bpf from
>
>   cgroup_destroy_locked
>     cgroup_bpf_offline
>
> which is serialized under cgroup_mutex, as argued previously. So this
> generates a single entry at a time, and it wouldn't hint towards the
> creation of cgroup_bpf_destroy_wq.
>
> I reckon the argument could be something like: the processing rate vs
> the production rate of entry items should be such that the number of
> active items is bounded. But I'm not sure that is practical, since users
> may not know the comparison result and they would end up always creating
> a dedicated workqueue.
>
> Michal

Thank you, Michal.

I think it's difficult to measure the comparison result. Actually, if
something generates work at a high frequency, it would be better to use
a dedicated wq. How about:

Note: If something may generate works frequently, it may saturate the
system_wq and potentially lead to deadlock. It should utilize its own
dedicated workqueue rather than system wq.

Best regards,
Ridong
Hi.

On Fri, Sep 27, 2024 at 04:08:26PM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
> How about:
>
> Note: If something may generate works frequently, it may saturate the
> system_wq and potentially lead to deadlock. It should utilize its own
> dedicated workqueue rather than system wq.

It doesn't depend only on the generation frequency (in Tetsuo's example
with slow works, the "high" would only be 256/s), and accurate
information is likely only empirical, thus I'd refine it further:

> Note: If something may generate more than @max_active outstanding
> work items (do stress test your producers), it may saturate a system
> wq and potentially lead to deadlock. It should utilize its own
> dedicated workqueue rather than the system wq.

(Besides the @max_active reference, I also changed the generic system_wq
to "system wq", as the surrounding text seems to refer to any of the
system_*wq.)

Michal
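For reference, the @max_active mentioned above is the third argument of
alloc_workqueue(). A rough sketch of making that bound explicit for a
dedicated wq (the name example_wq and the value 16 are arbitrary);
passing 0 instead selects the default of 256 concurrently executing
items, the same default the shared system wqs are created with.

#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *example_wq;

static int __init example_wq_init(void)
{
	/*
	 * Cap concurrently executing work items at 16 instead of
	 * relying on the default max_active of 256 that a 0 argument
	 * would select.
	 */
	example_wq = alloc_workqueue("example_wq", WQ_UNBOUND, 16);
	return example_wq ? 0 : -ENOMEM;
}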
On 2024/9/30 20:50, Michal Koutný wrote:
> Hi.
>
> On Fri, Sep 27, 2024 at 04:08:26PM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> How about:
>>
>> Note: If something may generate works frequently, it may saturate the
>> system_wq and potentially lead to deadlock. It should utilize its own
>> dedicated workqueue rather than system wq.
>
> It doesn't depend only on the generation frequency (in Tetsuo's example
> with slow works, the "high" would only be 256/s), and accurate
> information is likely only empirical, thus I'd refine it further:
>
>> Note: If something may generate more than @max_active outstanding
>> work items (do stress test your producers), it may saturate a system
>> wq and potentially lead to deadlock. It should utilize its own
>> dedicated workqueue rather than the system wq.
>
> (Besides the @max_active reference, I also changed the generic system_wq
> to "system wq", as the surrounding text seems to refer to any of the
> system_*wq.)
>
> Michal

Thank you, Michal. I took a week off; I will update soon.

Best regards,
Ridong