Date: Fri, 13 Jan 2023 13:07:02 -0800
Message-ID: <20230113210703.62107-1-nhuck@google.com>
Subject: [PATCH] workqueue: Add WQ_SCHED_FIFO
From: Nathan Huckleberry
Cc: Sandeep Dhavale, Daeho Jeong, Eric Biggers, Sami Tolvanen,
	Tejun Heo, Lai Jiangshan, Jonathan Corbet,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org

Add a WQ flag that allows workqueues to use SCHED_FIFO with the least
important RT priority. This can reduce scheduler latency for IO
post-processing when the CPU is under load without impacting other RT
workloads. This has been shown to improve app startup time on
Android [1].

Scheduler latency affects several drivers, as evidenced by [1], [2],
[3], [4]. Some of these drivers have moved post-processing into IRQ
context. However, this can cause latency spikes for real-time threads
and jitter-related jank on Android. Using a workqueue with SCHED_FIFO
improves scheduler latency without causing latency problems for RT
threads.
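For context, a driver opting in would simply pass the new flag at workqueue allocation time. A minimal, hypothetical sketch (the workqueue name, work item, and callback are illustrative, and it assumes this patch is applied):

```c
/* Hypothetical driver fragment; assumes this patch is applied. */
static struct workqueue_struct *post_wq;
static DECLARE_WORK(post_work, post_process_fn);	/* post_process_fn is illustrative */

static int init_post_processing(void)
{
	/*
	 * Workers backing this wq run SCHED_FIFO at the lowest RT
	 * priority.  Combining WQ_SCHED_FIFO with WQ_HIGHPRI is
	 * rejected (alloc_and_link_pwqs() returns -EINVAL).
	 */
	post_wq = alloc_workqueue("io_post", WQ_SCHED_FIFO, 0);
	if (!post_wq)
		return -ENOMEM;

	queue_work(post_wq, &post_work);
	return 0;
}
```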
[1]: https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@google.com/
[2]: https://lore.kernel.org/linux-f2fs-devel/20220802192437.1895492-1-daeho43@gmail.com/
[3]: https://lore.kernel.org/dm-devel/20220722093823.4158756-4-nhuck@google.com/
[4]: https://lore.kernel.org/dm-crypt/20200706173731.3734-1-ignat@cloudflare.com/

This change has been tested on dm-verity with the following fio config:

[global]
time_based
runtime=120

[do-verify]
ioengine=sync
filename=/dev/testing
rw=randread
direct=1

[burn_8x90%_qsort]
ioengine=cpuio
cpuload=90
numjobs=8
cpumode=qsort

Before:
clat (usec): min=13, max=23882, avg=29.56, stdev=113.29
READ: bw=122MiB/s (128MB/s), 122MiB/s-122MiB/s (128MB/s-128MB/s),
io=14.3GiB (15.3GB), run=120001-120001msec

After:
clat (usec): min=13, max=23137, avg=19.96, stdev=105.71
READ: bw=180MiB/s (189MB/s), 180MiB/s-180MiB/s (189MB/s-189MB/s),
io=21.1GiB (22.7GB), run=120012-120012msec

Cc: Sandeep Dhavale
Cc: Daeho Jeong
Cc: Eric Biggers
Cc: Sami Tolvanen
Signed-off-by: Nathan Huckleberry
Acked-by: Gao Xiang
---
 Documentation/core-api/workqueue.rst | 12 ++++++++++
 include/linux/workqueue.h            |  9 +++++++
 kernel/workqueue.c                   | 36 +++++++++++++++++++++-------
 3 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index 3b22ed137662..26faf2806c66 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -216,6 +216,18 @@ resources, scheduled and executed.
 
   This flag is meaningless for unbound wq.
 
+``WQ_SCHED_FIFO``
+  Work items of a fifo wq are queued to the fifo
+  worker-pool of the target cpu.  Fifo worker-pools are
+  served by worker threads with scheduler policy SCHED_FIFO and
+  the least important real-time priority.  This can be useful
+  for workloads where low latency is important.
+
+  A workqueue cannot be both high-priority and fifo.
+
+  Note that normal and fifo worker-pools don't interact with
+  each other.  Each maintains its separate pool of workers and
+  implements concurrency management among its workers.
 
 ``max_active``
 --------------

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index ac551b8ee7d9..43a4eeaf8ff4 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -134,6 +134,10 @@ struct workqueue_attrs {
 	 * @nice: nice level
 	 */
 	int nice;
+	/**
+	 * @sched_fifo: is using SCHED_FIFO
+	 */
+	bool sched_fifo;
 
 	/**
 	 * @cpumask: allowed CPUs
@@ -334,6 +338,11 @@ enum {
 	 * http://thread.gmane.org/gmane.linux.kernel/1480396
 	 */
 	WQ_POWER_EFFICIENT	= 1 << 7,
+	/*
+	 * Low real-time priority workqueues can reduce scheduler latency
+	 * for latency sensitive workloads like IO post-processing.
+	 */
+	WQ_SCHED_FIFO		= 1 << 8,
 
 	__WQ_DESTROYING		= 1 << 15, /* internal: workqueue is destroying */
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5dc67aa9d696..99c5e0a3dc28 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -85,7 +85,7 @@ enum {
 	WORKER_NOT_RUNNING	= WORKER_PREP | WORKER_CPU_INTENSIVE |
 				  WORKER_UNBOUND | WORKER_REBOUND,
 
-	NR_STD_WORKER_POOLS	= 2,		/* # standard pools per cpu */
+	NR_STD_WORKER_POOLS	= 3,		/* # standard pools per cpu */
 
 	UNBOUND_POOL_HASH_ORDER	= 6,		/* hashed by pool->attrs */
 	BUSY_WORKER_HASH_ORDER	= 6,		/* 64 pointers */
@@ -1949,7 +1949,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	if (pool->cpu >= 0)
 		snprintf(id_buf, sizeof(id_buf), "%d:%d%s", pool->cpu, id,
-			 pool->attrs->nice < 0 ? "H" : "");
+			 pool->attrs->sched_fifo ? "F" :
+			 (pool->attrs->nice < 0 ? "H" : ""));
 	else
 		snprintf(id_buf, sizeof(id_buf), "u%d:%d", pool->id, id);
 
@@ -1958,7 +1959,11 @@ static struct worker *create_worker(struct worker_pool *pool)
 	if (IS_ERR(worker->task))
 		goto fail;
 
-	set_user_nice(worker->task, pool->attrs->nice);
+	if (pool->attrs->sched_fifo)
+		sched_set_fifo_low(worker->task);
+	else
+		set_user_nice(worker->task, pool->attrs->nice);
+
 	kthread_bind_mask(worker->task, pool->attrs->cpumask);
 
 	/* successful, attach the worker to the pool */
@@ -4323,9 +4328,17 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
 
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
-	bool highpri = wq->flags & WQ_HIGHPRI;
+	int pool_index = 0;
 	int cpu, ret;
 
+	if (wq->flags & WQ_HIGHPRI && wq->flags & WQ_SCHED_FIFO)
+		return -EINVAL;
+
+	if (wq->flags & WQ_HIGHPRI)
+		pool_index = 1;
+	if (wq->flags & WQ_SCHED_FIFO)
+		pool_index = 2;
+
 	if (!(wq->flags & WQ_UNBOUND)) {
 		wq->cpu_pwqs = alloc_percpu(struct pool_workqueue);
 		if (!wq->cpu_pwqs)
@@ -4337,7 +4350,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 			struct worker_pool *cpu_pools =
 				per_cpu(cpu_worker_pools, cpu);
 
-			init_pwq(pwq, wq, &cpu_pools[highpri]);
+			init_pwq(pwq, wq, &cpu_pools[pool_index]);
 
 			mutex_lock(&wq->mutex);
 			link_pwq(pwq);
@@ -4348,13 +4361,13 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 
 	cpus_read_lock();
 	if (wq->flags & __WQ_ORDERED) {
-		ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
+		ret = apply_workqueue_attrs(wq, ordered_wq_attrs[pool_index]);
 		/* there should only be single pwq for ordering guarantee */
 		WARN(!ret && (wq->pwqs.next != &wq->dfl_pwq->pwqs_node ||
 			      wq->pwqs.prev != &wq->dfl_pwq->pwqs_node),
 		     "ordering guarantee broken for workqueue %s\n", wq->name);
 	} else {
-		ret = apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
+		ret = apply_workqueue_attrs(wq, unbound_std_wq_attrs[pool_index]);
 	}
 	cpus_read_unlock();
 
@@ -6138,7 +6151,8 @@ static void __init wq_numa_init(void)
  */
 void __init workqueue_init_early(void)
 {
-	int std_nice[NR_STD_WORKER_POOLS] = { 0, HIGHPRI_NICE_LEVEL };
+	int std_nice[NR_STD_WORKER_POOLS] = { 0, HIGHPRI_NICE_LEVEL, 0 };
+	bool std_sched_fifo[NR_STD_WORKER_POOLS] = { false, false, true };
 	int i, cpu;
 
 	BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
@@ -6158,8 +6172,10 @@ void __init workqueue_init_early(void)
 			BUG_ON(init_worker_pool(pool));
 			pool->cpu = cpu;
 			cpumask_copy(pool->attrs->cpumask, cpumask_of(cpu));
-			pool->attrs->nice = std_nice[i++];
+			pool->attrs->nice = std_nice[i];
+			pool->attrs->sched_fifo = std_sched_fifo[i];
 			pool->node = cpu_to_node(cpu);
+			i++;
 
 			/* alloc pool ID */
 			mutex_lock(&wq_pool_mutex);
@@ -6174,6 +6190,7 @@ void __init workqueue_init_early(void)
 
 		BUG_ON(!(attrs = alloc_workqueue_attrs()));
 		attrs->nice = std_nice[i];
+		attrs->sched_fifo = std_sched_fifo[i];
 		unbound_std_wq_attrs[i] = attrs;
 
 		/*
@@ -6183,6 +6200,7 @@ void __init workqueue_init_early(void)
 		 */
 		BUG_ON(!(attrs = alloc_workqueue_attrs()));
 		attrs->nice = std_nice[i];
+		attrs->sched_fifo = std_sched_fifo[i];
 		attrs->no_numa = true;
 		ordered_wq_attrs[i] = attrs;
 	}
-- 
2.39.0.314.g84b9a713c41-goog