From: Tvrtko Ursulin
To: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com, linux-kernel@vger.kernel.org, Tvrtko Ursulin,
 Tejun Heo, Peter Zijlstra, Lai Jiangshan, Alex Deucher, Matthew Brost,
 stable@vger.kernel.org
Subject: [PATCH] workqueue: Do not warn when cancelling WQ_MEM_RECLAIM work
 from !WQ_MEM_RECLAIM worker
Date: Thu, 19 Dec 2024 09:30:30 +0000
Message-ID: <20241219093030.52080-1-tursulin@igalia.com>

From: Tvrtko Ursulin

After commit 746ae46c1113 ("drm/sched: Mark scheduler work queues with
WQ_MEM_RECLAIM") amdgpu started seeing the following warning:

[ ] workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
...
[ ] Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
...
[ ] Call Trace:
[ ]
...
[ ] ? check_flush_dependency+0xf5/0x110
...
[ ] cancel_delayed_work_sync+0x6e/0x80
[ ] amdgpu_gfx_off_ctrl+0xab/0x140 [amdgpu]
[ ] amdgpu_ring_alloc+0x40/0x50 [amdgpu]
[ ] amdgpu_ib_schedule+0xf4/0x810 [amdgpu]
[ ] ? drm_sched_run_job_work+0x22c/0x430 [gpu_sched]
[ ] amdgpu_job_run+0xaa/0x1f0 [amdgpu]
[ ] drm_sched_run_job_work+0x257/0x430 [gpu_sched]
[ ] process_one_work+0x217/0x720
...
[ ]

The intent of the verification done in check_flush_dependency() is to
ensure forward progress during memory reclaim, by flagging cases where
either a memory reclaim process or a memory reclaim work item flushes a
workqueue or work item that is not marked as memory reclaim safe.

This is correct when flushing, but when called from the
cancel(_delayed)_work_sync() paths it is a false positive because the work
is either already running or will not run at all. Cancelling it is
therefore safe, and we can relax the warning criteria by letting the helper
know the calling context.

Signed-off-by: Tvrtko Ursulin
Fixes: fca839c00a12 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
References: 746ae46c1113 ("drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM")
Cc: Tejun Heo
Cc: Peter Zijlstra
Cc: Lai Jiangshan
Cc: Alex Deucher
Cc: Christian König
Cc: # v4.5+
---
 kernel/workqueue.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9949ffad8df0..7abba81296cd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3680,23 +3680,27 @@ void workqueue_softirq_dead(unsigned int cpu)
  * check_flush_dependency - check for flush dependency sanity
  * @target_wq: workqueue being flushed
  * @target_work: work item being flushed (NULL for workqueue flushes)
+ * @from_cancel: are we called from the work cancel path
  *
  * %current is trying to flush the whole @target_wq or @target_work on it.
- * If @target_wq doesn't have %WQ_MEM_RECLAIM, verify that %current is not
- * reclaiming memory or running on a workqueue which doesn't have
- * %WQ_MEM_RECLAIM as that can break forward-progress guarantee leading to
- * a deadlock.
+ * If this is not the cancel path (which implies work being flushed is either
+ * already running, or will not be at all), check if @target_wq doesn't have
+ * %WQ_MEM_RECLAIM and verify that %current is not reclaiming memory or running
+ * on a workqueue which doesn't have %WQ_MEM_RECLAIM as that can break forward-
+ * progress guarantee leading to a deadlock.
  */
 static void check_flush_dependency(struct workqueue_struct *target_wq,
-				   struct work_struct *target_work)
+				   struct work_struct *target_work,
+				   bool from_cancel)
 {
-	work_func_t target_func = target_work ? target_work->func : NULL;
+	work_func_t target_func;
 	struct worker *worker;
 
-	if (target_wq->flags & WQ_MEM_RECLAIM)
+	if (from_cancel || target_wq->flags & WQ_MEM_RECLAIM)
 		return;
 
 	worker = current_wq_worker();
+	target_func = target_work ? target_work->func : NULL;
 
 	WARN_ONCE(current->flags & PF_MEMALLOC,
 		  "workqueue: PF_MEMALLOC task %d(%s) is flushing !WQ_MEM_RECLAIM %s:%ps",
@@ -3966,7 +3970,7 @@ void __flush_workqueue(struct workqueue_struct *wq)
 		list_add_tail(&this_flusher.list, &wq->flusher_overflow);
 	}
 
-	check_flush_dependency(wq, NULL);
+	check_flush_dependency(wq, NULL, false);
 
 	mutex_unlock(&wq->mutex);
 
@@ -4141,7 +4145,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 	}
 
 	wq = pwq->wq;
-	check_flush_dependency(wq, work);
+	check_flush_dependency(wq, work, from_cancel);
 
 	insert_wq_barrier(pwq, barr, work, worker);
 	raw_spin_unlock_irq(&pool->lock);
-- 
2.47.1
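
For readers who want to reason about the relaxed criteria outside the
kernel, the decision made by the patched helper can be modelled in a few
lines of user-space C. This is only a sketch under stated assumptions, not
the kernel implementation: struct model_caller, MODEL_WQ_MEM_RECLAIM and
model_flush_would_warn() are hypothetical names invented for illustration,
and the second warning condition (a WQ_MEM_RECLAIM worker flushing a
!WQ_MEM_RECLAIM target) is inferred from the warning text quoted in the
commit message rather than copied from kernel/workqueue.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's WQ_MEM_RECLAIM flag; value is arbitrary. */
#define MODEL_WQ_MEM_RECLAIM (1u << 3)

/* Hypothetical, simplified description of the context doing the flush or cancel. */
struct model_caller {
	bool memalloc;         /* models current->flags & PF_MEMALLOC */
	bool is_wq_worker;     /* models current_wq_worker() != NULL */
	unsigned int wq_flags; /* flags of the workqueue this worker is serving */
};

/* Returns true when the modelled check would emit a warning. */
static bool model_flush_would_warn(const struct model_caller *caller,
				   unsigned int target_wq_flags,
				   bool from_cancel)
{
	assert(caller != NULL);

	/* Patched behaviour: the cancel path and reclaim-safe targets never warn. */
	if (from_cancel || (target_wq_flags & MODEL_WQ_MEM_RECLAIM))
		return false;

	/* A task that is reclaiming memory is waiting on a !WQ_MEM_RECLAIM target. */
	if (caller->memalloc)
		return true;

	/* A WQ_MEM_RECLAIM worker is waiting on a !WQ_MEM_RECLAIM target. */
	return caller->is_wq_worker &&
	       (caller->wq_flags & MODEL_WQ_MEM_RECLAIM) != 0;
}
```

In this model the amdgpu case from the trace corresponds to a caller with
is_wq_worker set and wq_flags containing MODEL_WQ_MEM_RECLAIM, acting on a
target with no flags: with from_cancel false the function returns true, and
with from_cancel true it returns false.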