From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan, Tejun Heo, Peter Zijlstra, Frederic Weisbecker, Juri Lelli, Phil Auld, Marcelo Tosatti
Subject: [PATCH v4 1/4] workqueue: Protect wq_unbound_cpumask with wq_pool_attach_mutex
Date: Tue, 4 Oct 2022 16:05:18 +0100
Message-Id: <20221004150521.822266-2-vschneid@redhat.com>
In-Reply-To: <20221004150521.822266-1-vschneid@redhat.com>
References: <20221004150521.822266-1-vschneid@redhat.com>

From: Lai Jiangshan

When unbind_workers() reads wq_unbound_cpumask to set the affinity of
freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.

Make wq_unbound_cpumask protected by wq_pool_attach_mutex as well, which
also removes the need for the temporary saved_cpumask.

Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Reported-by: Valentin Schneider
Signed-off-by: Lai Jiangshan
---
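The rule this patch establishes is that writers of wq_unbound_cpumask
hold both wq_pool_mutex and wq_pool_attach_mutex (the new "PL&A"
annotation), so a reader holding either mutex sees a stable value. A
minimal userspace sketch of that dual-lock scheme using pthreads; all
names here are illustrative stand-ins, not kernel code:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;   /* "PL" */
static pthread_mutex_t attach_mutex = PTHREAD_MUTEX_INITIALIZER; /* "A"  */
static unsigned long unbound_mask; /* stands in for wq_unbound_cpumask */

static void write_mask(unsigned long new_mask)
{
	/* Writers take both locks, mirroring how the patched
	 * workqueue_apply_unbound_cpumask() takes wq_pool_attach_mutex
	 * around the cpumask_copy(). */
	pthread_mutex_lock(&pool_mutex);
	pthread_mutex_lock(&attach_mutex);
	unbound_mask = new_mask;
	pthread_mutex_unlock(&attach_mutex);
	pthread_mutex_unlock(&pool_mutex);
}

static unsigned long read_mask_attach_side(void)
{
	/* A reader like unbind_workers() holds only the attach-side lock;
	 * that now suffices because every writer also takes it. */
	unsigned long mask;

	pthread_mutex_lock(&attach_mutex);
	mask = unbound_mask;
	pthread_mutex_unlock(&attach_mutex);
	return mask;
}

int main(void)
{
	write_mask(0xf0);
	printf("mask=%#lx\n", read_mask_attach_side());
	return 0;
}
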
 kernel/workqueue.c | 41 ++++++++++++++++-------------------------
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7cd5f5e7e0a1..8e21c352c155 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -326,7 +326,7 @@ static struct rcuwait manager_wait = __RCUWAIT_INITIALIZER(manager_wait);
 static LIST_HEAD(workqueues);		/* PR: list of all workqueues */
 static bool workqueue_freezing;		/* PL: have wqs started freezing? */
 
-/* PL: allowable cpus for unbound wqs and work items */
+/* PL&A: allowable cpus for unbound wqs and work items */
 static cpumask_var_t wq_unbound_cpumask;
 
 /* CPU where unbound work was last round robin scheduled from this CPU */
@@ -3952,7 +3952,8 @@ static void apply_wqattrs_cleanup(struct apply_wqattrs_ctx *ctx)
 /* allocate the attrs and pwqs for later installation */
 static struct apply_wqattrs_ctx *
 apply_wqattrs_prepare(struct workqueue_struct *wq,
-		      const struct workqueue_attrs *attrs)
+		      const struct workqueue_attrs *attrs,
+		      const cpumask_var_t unbound_cpumask)
 {
 	struct apply_wqattrs_ctx *ctx;
 	struct workqueue_attrs *new_attrs, *tmp_attrs;
@@ -3968,14 +3969,15 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 		goto out_free;
 
 	/*
-	 * Calculate the attrs of the default pwq.
+	 * Calculate the attrs of the default pwq with unbound_cpumask
+	 * which is wq_unbound_cpumask or to set to wq_unbound_cpumask.
 	 * If the user configured cpumask doesn't overlap with the
 	 * wq_unbound_cpumask, we fallback to the wq_unbound_cpumask.
 	 */
 	copy_workqueue_attrs(new_attrs, attrs);
-	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
+	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbound_cpumask);
 	if (unlikely(cpumask_empty(new_attrs->cpumask)))
-		cpumask_copy(new_attrs->cpumask, wq_unbound_cpumask);
+		cpumask_copy(new_attrs->cpumask, unbound_cpumask);
 
 	/*
 	 * We may create multiple pwqs with differing cpumasks.  Make a
@@ -4072,7 +4074,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
 		wq->flags &= ~__WQ_ORDERED;
 	}
 
-	ctx = apply_wqattrs_prepare(wq, attrs);
+	ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask);
 	if (!ctx)
 		return -ENOMEM;
 
@@ -5334,7 +5336,7 @@ void thaw_workqueues(void)
 }
 #endif	/* CONFIG_FREEZER */
 
-static int workqueue_apply_unbound_cpumask(void)
+static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 {
 	LIST_HEAD(ctxs);
 	int ret = 0;
@@ -5350,7 +5352,7 @@ static int workqueue_apply_unbound_cpumask(void)
 		if (wq->flags & __WQ_ORDERED)
 			continue;
 
-		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs);
+		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
 		if (!ctx) {
 			ret = -ENOMEM;
 			break;
@@ -5365,6 +5367,11 @@ static int workqueue_apply_unbound_cpumask(void)
 		apply_wqattrs_cleanup(ctx);
 	}
 
+	if (!ret) {
+		mutex_lock(&wq_pool_attach_mutex);
+		cpumask_copy(wq_unbound_cpumask, unbound_cpumask);
+		mutex_unlock(&wq_pool_attach_mutex);
+	}
 	return ret;
 }
 
@@ -5383,7 +5390,6 @@ static int workqueue_apply_unbound_cpumask(void)
 int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 {
 	int ret = -EINVAL;
-	cpumask_var_t saved_cpumask;
 
 	/*
 	 * Not excluding isolated cpus on purpose.
@@ -5397,23 +5403,8 @@ int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 			goto out_unlock;
 		}
 
-		if (!zalloc_cpumask_var(&saved_cpumask, GFP_KERNEL)) {
-			ret = -ENOMEM;
-			goto out_unlock;
-		}
-
-		/* save the old wq_unbound_cpumask. */
-		cpumask_copy(saved_cpumask, wq_unbound_cpumask);
-
-		/* update wq_unbound_cpumask at first and apply it to wqs. */
-		cpumask_copy(wq_unbound_cpumask, cpumask);
-		ret = workqueue_apply_unbound_cpumask();
-
-		/* restore the wq_unbound_cpumask when failed. */
-		if (ret < 0)
-			cpumask_copy(wq_unbound_cpumask, saved_cpumask);
+		ret = workqueue_apply_unbound_cpumask(cpumask);
 
-		free_cpumask_var(saved_cpumask);
 out_unlock:
 		apply_wqattrs_unlock();
 	}
-- 
2.31.1


From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Frederic Weisbecker, Juri Lelli, Phil Auld, Marcelo Tosatti
Subject: [PATCH v4 2/4] workqueue: Factorize unbind/rebind_workers() logic
Date: Tue, 4 Oct 2022 16:05:19 +0100
Message-Id: <20221004150521.822266-3-vschneid@redhat.com>
In-Reply-To: <20221004150521.822266-1-vschneid@redhat.com>
References: <20221004150521.822266-1-vschneid@redhat.com>

Later patches will reuse this code, so move it into reusable functions.

Signed-off-by: Valentin Schneider
---
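The new unbind_worker() helper encodes the affinity fallback that
unbind_workers() previously open-coded: prefer wq_unbound_cpumask, but
fall back to cpu_possible_mask when it has no intersection with
cpu_active_mask. A standalone sketch of that decision, with plain
bitmasks standing in for cpumask_t and a made-up pick_affinity() helper:

#include <stdio.h>

static unsigned long pick_affinity(unsigned long unbound,
				   unsigned long active,
				   unsigned long possible)
{
	/* cpumask_intersects() boils down to a non-empty bitwise AND */
	return (unbound & active) ? unbound : possible;
}

int main(void)
{
	/* CPUs 4-7 unbound but only CPUs 0-3 active: fall back */
	printf("%#lx\n", pick_affinity(0xf0, 0x0f, 0xff)); /* 0xff */
	/* CPUs 2-3 unbound and active: use the unbound mask */
	printf("%#lx\n", pick_affinity(0x0c, 0x0f, 0xff)); /* 0xc */
	return 0;
}
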
 kernel/workqueue.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8e21c352c155..8185a42848c5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1972,6 +1972,23 @@ static struct worker *create_worker(struct worker_pool *pool)
 	return NULL;
 }
 
+static void unbind_worker(struct worker *worker)
+{
+	lockdep_assert_held(&wq_pool_attach_mutex);
+
+	kthread_set_per_cpu(worker->task, -1);
+	if (cpumask_intersects(wq_unbound_cpumask, cpu_active_mask))
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
+	else
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
+}
+
+static void rebind_worker(struct worker *worker, struct worker_pool *pool)
+{
+	kthread_set_per_cpu(worker->task, pool->cpu);
+	WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
+}
+
 /**
  * destroy_worker - destroy a workqueue worker
  * @worker: worker to be destroyed
@@ -5008,13 +5025,8 @@ static void unbind_workers(int cpu)
 
 		raw_spin_unlock_irq(&pool->lock);
 
-		for_each_pool_worker(worker, pool) {
-			kthread_set_per_cpu(worker->task, -1);
-			if (cpumask_intersects(wq_unbound_cpumask, cpu_active_mask))
-				WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
-			else
-				WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
-		}
+		for_each_pool_worker(worker, pool)
+			unbind_worker(worker);
 
 		mutex_unlock(&wq_pool_attach_mutex);
 	}
@@ -5039,11 +5051,8 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, pool) {
-		kthread_set_per_cpu(worker->task, pool->cpu);
-		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
-	}
+	for_each_pool_worker(worker, pool)
+		rebind_worker(worker, pool);
 
 	raw_spin_lock_irq(&pool->lock);
 
-- 
2.31.1


From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Frederic Weisbecker, Juri Lelli, Phil Auld, Marcelo Tosatti
Subject: [PATCH v4 3/4] workqueue: Convert the idle_timer to a delayed_work
Date: Tue, 4 Oct 2022 16:05:20 +0100
Message-Id: <20221004150521.822266-4-vschneid@redhat.com>
In-Reply-To: <20221004150521.822266-1-vschneid@redhat.com>
References: <20221004150521.822266-1-vschneid@redhat.com>

A later patch will require a sleepable context in the idle worker
timeout function. Converting worker_pool.idle_timer to a delayed_work
gives us just that.

One caveat: we need to be careful about re-queuing the dwork from its
callback function. Lai expressed concerns about overtly violating
documented locking rules, but extra locking is required around delaying
the dwork: without it, a worker thread adding itself to the idle_list
could push the dwork further back (by IDLE_WORKER_TIMEOUT) than the work
callback would (to the next idle worker expiry).

No change in functionality intended.

Signed-off-by: Valentin Schneider
---
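One detail worth calling out: mod_timer() took the absolute expiry time,
while mod_delayed_work() takes a relative delay, hence the "expires - now"
in the new idle_reaper_fn(). The unsigned subtraction stays correct even
if jiffies wraps between last_active and now. A standalone sketch of the
arithmetic; IDLE_TIMEOUT and the reimplemented time_before() here are
illustrative stand-ins, not the kernel's definitions:

#include <stdio.h>

#define IDLE_TIMEOUT 300UL

/* mirrors the kernel's wraparound-safe time_before(a, b) */
static int time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

int main(void)
{
	unsigned long now = (unsigned long)-100;	/* counter about to wrap */
	unsigned long last_active = now - 50;
	unsigned long expires = last_active + IDLE_TIMEOUT; /* wraps past 0 */

	if (time_before(now, expires))
		/* relative delay for mod_delayed_work(): prints 250 */
		printf("requeue in %lu ticks\n", expires - now);
	return 0;
}
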
 kernel/workqueue.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8185a42848c5..436b1dbdf9ff 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -167,9 +167,9 @@ struct worker_pool {
 	int			nr_workers;	/* L: total number of workers */
 	int			nr_idle;	/* L: currently idle workers */
 
-	struct list_head	idle_list;	/* L: list of idle workers */
-	struct timer_list	idle_timer;	/* L: worker idle timeout */
-	struct timer_list	mayday_timer;	/* L: SOS timer for workers */
+	struct list_head	idle_list;	      /* L: list of idle workers */
+	struct delayed_work	idle_reaper_work;     /* L: worker idle timeout */
+	struct timer_list	mayday_timer;	      /* L: SOS timer for workers */
 
 	/* a workers is either on busy_hash or idle_list, or the manager */
 	DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
@@ -1806,8 +1806,10 @@ static void worker_enter_idle(struct worker *worker)
 	/* idle_list is LIFO */
 	list_add(&worker->entry, &pool->idle_list);
 
-	if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
-		mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
+	if (too_many_workers(pool) && !delayed_work_pending(&pool->idle_reaper_work))
+		mod_delayed_work(system_unbound_wq,
+				 &pool->idle_reaper_work,
+				 IDLE_WORKER_TIMEOUT);
 
 	/* Sanity check nr_running. */
 	WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
@@ -2019,22 +2021,37 @@ static void destroy_worker(struct worker *worker)
 	wake_up_process(worker->task);
 }
 
-static void idle_worker_timeout(struct timer_list *t)
+/*
+ * idle_reaper_fn - reap workers that have been idle for too long.
+ *
+ * The delayed_work is only ever modified under raw_spin_lock_irq(pool->lock).
+ */
+static void idle_reaper_fn(struct work_struct *work)
 {
-	struct worker_pool *pool = from_timer(pool, t, idle_timer);
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct worker_pool *pool = container_of(dwork, struct worker_pool, idle_reaper_work);
 
 	raw_spin_lock_irq(&pool->lock);
 
 	while (too_many_workers(pool)) {
 		struct worker *worker;
 		unsigned long expires;
+		unsigned long now = jiffies;
 
-		/* idle_list is kept in LIFO order, check the last one */
+		/* idle_list is kept in LIFO order, check the oldest entry */
 		worker = list_entry(pool->idle_list.prev, struct worker, entry);
 		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
 
-		if (time_before(jiffies, expires)) {
-			mod_timer(&pool->idle_timer, expires);
+		/*
+		 * Careful: queueing a work item from here can and will cause a
+		 * self-deadlock when dealing with an unbound pool. However,
+		 * here the delay *cannot* be zero, so the queuing will
+		 * happen in the timer callback.
+		 */
+		if (time_before(now, expires)) {
+			mod_delayed_work(system_unbound_wq,
+					 &pool->idle_reaper_work,
+					 expires - now);
 			break;
 		}
 
@@ -3478,7 +3495,7 @@ static int init_worker_pool(struct worker_pool *pool)
 	INIT_LIST_HEAD(&pool->idle_list);
 	hash_init(pool->busy_hash);
 
-	timer_setup(&pool->idle_timer, idle_worker_timeout, TIMER_DEFERRABLE);
+	INIT_DEFERRABLE_WORK(&pool->idle_reaper_work, idle_reaper_fn);
 
 	timer_setup(&pool->mayday_timer, pool_mayday_timeout, 0);
 
@@ -3625,7 +3642,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 		wait_for_completion(pool->detach_completion);
 
 	/* shut down the timers */
-	del_timer_sync(&pool->idle_timer);
+	cancel_delayed_work_sync(&pool->idle_reaper_work);
 	del_timer_sync(&pool->mayday_timer);
 
 	/* RCU protected to allow dereferences from get_work_pool() */
-- 
2.31.1


From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Frederic Weisbecker, Juri Lelli, Phil Auld, Marcelo Tosatti
Subject: [PATCH v4 4/4] workqueue: Unbind workers before sending them to exit()
Date: Tue, 4 Oct 2022 16:05:21 +0100
Message-Id: <20221004150521.822266-5-vschneid@redhat.com>
In-Reply-To: <20221004150521.822266-1-vschneid@redhat.com>
References: <20221004150521.822266-1-vschneid@redhat.com>

It has been reported that isolated CPUs can suffer from interference due
to per-CPU kworkers waking up just to die.

A surge of workqueue activity during initial setup of a latency-sensitive
application (refresh_vm_stats() being one of the culprits) can cause extra
per-CPU kworkers to be spawned. Then, said latency-sensitive task can be
running merrily on an isolated CPU, only to be interrupted sometime later
by a kworker marked for death (cf. IDLE_WORKER_TIMEOUT, five minutes after
the last kworker activity).

Prevent this by affining kworkers to the wq_unbound_cpumask (which doesn't
contain isolated CPUs, cf. HK_TYPE_WQ) before waking them up after marking
them with WORKER_DIE. Changing the affinity does require a sleepable
context; leverage the newly introduced pool->idle_reaper_work to get that.

Remove dying workers from pool->workers and keep track of them in a
separate list. This intentionally prevents for_each_pool_worker() from
iterating over workers that are marked for death.

Signed-off-by: Valentin Schneider
---
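The core pattern here is a two-phase teardown: destroy_worker() runs
under pool->lock and only moves the victim onto a private reap list,
while the sleepable part (affinity change plus wakeup) happens in
reap_workers() once the spinlock is dropped. A userspace sketch of that
pattern with made-up singly-linked-list helpers; the kernel uses list.h
and real locks:

#include <stdio.h>
#include <stdlib.h>

struct node {
	int id;
	struct node *next;
};

static struct node *idle_list;	/* notionally protected by "pool->lock" */

static void destroy_one(struct node **reaplist)
{
	/* "under the lock": just move one idle entry onto the reap list */
	struct node *n = idle_list;

	if (!n)
		return;
	idle_list = n->next;
	n->next = *reaplist;
	*reaplist = n;
}

static void reap_all(struct node *reaplist)
{
	/* "after unlock": sleepable work is safe here */
	while (reaplist) {
		struct node *n = reaplist;

		reaplist = n->next;
		printf("unbind + wake worker %d\n", n->id);
		free(n);
	}
}

int main(void)
{
	for (int i = 0; i < 3; i++) {
		struct node *n = malloc(sizeof(*n));

		if (!n)
			return 1;
		n->id = i;
		n->next = idle_list;
		idle_list = n;
	}

	struct node *reaplist = NULL;
	/* lock(pool->lock); */
	while (idle_list)
		destroy_one(&reaplist);
	/* unlock(pool->lock); */
	reap_all(reaplist);
	return 0;
}
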
 kernel/workqueue.c | 80 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 436b1dbdf9ff..714db7df7105 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -177,6 +177,7 @@ struct worker_pool {
 
 	struct worker		*manager;	/* L: purely informational */
 	struct list_head	workers;	/* A: attached workers */
+	struct list_head	dying_workers;	/* A: workers about to die */
 	struct completion	*detach_completion; /* all workers detached */
 
 	struct ida		worker_ida;	/* worker IDs for task name */
@@ -1902,7 +1903,7 @@ static void worker_detach_from_pool(struct worker *worker)
 	list_del(&worker->node);
 	worker->pool = NULL;
 
-	if (list_empty(&pool->workers))
+	if (list_empty(&pool->workers) && list_empty(&pool->dying_workers))
 		detach_completion = pool->detach_completion;
 	mutex_unlock(&wq_pool_attach_mutex);
 
@@ -1991,9 +1992,31 @@ static void rebind_worker(struct worker *worker, struct worker_pool *pool)
 	WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
 }
 
+static void reap_workers(struct list_head *reaplist)
+{
+	struct worker *worker, *tmp;
+
+	list_for_each_entry_safe(worker, tmp, reaplist, entry) {
+		list_del_init(&worker->entry);
+		unbind_worker(worker);
+		/*
+		 * If the worker was somehow already running, then it had to be
+		 * in pool->idle_list when destroy_worker() happened or we
+		 * wouldn't have gotten here.
+		 *
+		 * Thus, the worker must either have observed the WORKER_DIE
+		 * flag, or have set its state to TASK_IDLE. Either way, the
+		 * below will be observed by the worker and is safe to do
+		 * outside of pool->lock.
+		 */
+		wake_up_process(worker->task);
+	}
+}
+
 /**
  * destroy_worker - destroy a workqueue worker
  * @worker: worker to be destroyed
+ * @list: transfer worker away from its pool->idle_list and into list
  *
  * Destroy @worker and adjust @pool stats accordingly.  The worker should
  * be idle.
@@ -2001,11 +2024,12 @@ static void rebind_worker(struct worker *worker, struct worker_pool *pool)
  * CONTEXT:
  * raw_spin_lock_irq(pool->lock).
  */
-static void destroy_worker(struct worker *worker)
+static void destroy_worker(struct worker *worker, struct list_head *list)
 {
 	struct worker_pool *pool = worker->pool;
 
 	lockdep_assert_held(&pool->lock);
+	lockdep_assert_held(&wq_pool_attach_mutex);
 
 	/* sanity check frenzy */
 	if (WARN_ON(worker->current_work) ||
@@ -2016,21 +2040,50 @@ static void destroy_worker(struct worker *worker)
 	pool->nr_workers--;
 	pool->nr_idle--;
 
-	list_del_init(&worker->entry);
 	worker->flags |= WORKER_DIE;
-	wake_up_process(worker->task);
+
+	list_move(&worker->entry, list);
+	list_move(&worker->node, &pool->dying_workers);
 }
 
 /*
  * idle_reaper_fn - reap workers that have been idle for too long.
  *
+ * Unbinding marked-for-destruction workers requires a sleepable context, as
+ * changing a task's affinity is not an atomic operation, and we don't want
+ * to disturb isolated CPUs IDLE_WORKER_TIMEOUT in the future just for a kworker
+ * to do_exit().
+ *
+ * Percpu kworkers should meet the conditions for the affinity change to not
+ * block (not migration-disabled and not running), but there is no *hard*
+ * guarantee that they are not running when we get here.
+ *
  * The delayed_work is only ever modified under raw_spin_lock_irq(pool->lock).
  */
 static void idle_reaper_fn(struct work_struct *work)
 {
 	struct delayed_work *dwork = to_delayed_work(work);
 	struct worker_pool *pool = container_of(dwork, struct worker_pool, idle_reaper_work);
+	struct list_head reaplist;
 
+	INIT_LIST_HEAD(&reaplist);
+
+	/*
+	 * Unlikely as it may be, a to-be-reaped worker could run after
+	 * idle_reaper_fn()::destroy_worker() has happened but before
+	 * idle_reaper_fn()::reap_workers() (consider a worker that stays
+	 * preempted after setting itself in the idle list, or before removing
+	 * itself from it).
+	 *
+	 * WORKER_DIE would be set in worker->flags, so it would be able to
+	 * kfree(worker) and head out to do_exit(), which wouldn't be nice to
+	 * the idle reaper.
+	 *
+	 * Grabbing wq_pool_attach_mutex here ensures an already-running worker
+	 * won't go beyond worker_detach_from_pool() in its self-destruct path
+	 * (WORKER_DIE is set with wq_pool_attach_mutex set).
+	 */
+	mutex_lock(&wq_pool_attach_mutex);
 	raw_spin_lock_irq(&pool->lock);
 
 	while (too_many_workers(pool)) {
@@ -2055,10 +2108,11 @@ static void idle_reaper_fn(struct work_struct *work)
 			break;
 		}
 
-		destroy_worker(worker);
+		destroy_worker(worker, &reaplist);
 	}
-
 	raw_spin_unlock_irq(&pool->lock);
+
+	reap_workers(&reaplist);
+	mutex_unlock(&wq_pool_attach_mutex);
 }
 
 static void send_mayday(struct work_struct *work)
@@ -2422,12 +2476,12 @@ static int worker_thread(void *__worker)
 	/* am I supposed to die? */
 	if (unlikely(worker->flags & WORKER_DIE)) {
 		raw_spin_unlock_irq(&pool->lock);
-		WARN_ON_ONCE(!list_empty(&worker->entry));
 		set_pf_worker(false);
 
 		set_task_comm(worker->task, "kworker/dying");
 		ida_free(&pool->worker_ida, worker->id);
 		worker_detach_from_pool(worker);
+		WARN_ON_ONCE(!list_empty(&worker->entry));
 		kfree(worker);
 		return 0;
 	}
@@ -3500,6 +3554,7 @@ static int init_worker_pool(struct worker_pool *pool)
 	timer_setup(&pool->mayday_timer, pool_mayday_timeout, 0);
 
 	INIT_LIST_HEAD(&pool->workers);
+	INIT_LIST_HEAD(&pool->dying_workers);
 
 	ida_init(&pool->worker_ida);
 	INIT_HLIST_NODE(&pool->hash_node);
@@ -3600,8 +3655,11 @@ static bool wq_manager_inactive(struct worker_pool *pool)
 static void put_unbound_pool(struct worker_pool *pool)
 {
 	DECLARE_COMPLETION_ONSTACK(detach_completion);
+	struct list_head reaplist;
 	struct worker *worker;
 
+	INIT_LIST_HEAD(&reaplist);
+
 	lockdep_assert_held(&wq_pool_mutex);
 
 	if (--pool->refcnt)
@@ -3624,17 +3682,19 @@ static void put_unbound_pool(struct worker_pool *pool)
 	 * Because of how wq_manager_inactive() works, we will hold the
 	 * spinlock after a successful wait.
 	 */
+	mutex_lock(&wq_pool_attach_mutex);
 	rcuwait_wait_event(&manager_wait, wq_manager_inactive(pool),
 			   TASK_UNINTERRUPTIBLE);
 	pool->flags |= POOL_MANAGER_ACTIVE;
 
 	while ((worker = first_idle_worker(pool)))
-		destroy_worker(worker);
+		destroy_worker(worker, &reaplist);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
 	raw_spin_unlock_irq(&pool->lock);
 
-	mutex_lock(&wq_pool_attach_mutex);
-	if (!list_empty(&pool->workers))
+	reap_workers(&reaplist);
+
+	if (!list_empty(&pool->workers) || !list_empty(&pool->dying_workers))
 		pool->detach_completion = &detach_completion;
 	mutex_unlock(&wq_pool_attach_mutex);
-- 
2.31.1