From nobody Sat Feb 7 15:40:35 2026
From: Jens Axboe
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, mingo@redhat.com, Jens Axboe
Subject: [PATCH 1/2] sched/core: switch struct rq->nr_iowait to a normal int
Date: Wed, 28 Feb 2024 12:16:56 -0700
Message-ID: <20240228192355.290114-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240228192355.290114-1-axboe@kernel.dk>
References: <20240228192355.290114-1-axboe@kernel.dk>

In 3 of the 4 spots where we modify rq->nr_iowait we already hold the
rq lock, and hence don't need atomics to modify the current per-rq
iowait count. In the 4th case, where we are scheduling in on a different
CPU than the task was previously on, we do not hold the previous rq lock,
and hence still need to use an atomic to increment the iowait count.

Rename the existing nr_iowait to nr_iowait_remote, and use that for the
4th case. The other three cases can simply inc/dec in a non-atomic
fashion under the held rq lock.

The per-rq iowait now becomes the difference between the two, the local
count minus the remote count.
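
As a rough userspace illustration of the counting scheme described above,
the local/remote split could be modelled as below. This is only a sketch
and not kernel code: C11 atomics stand in for atomic_t, the rq lock exists
only in the comments, and struct rq_counts and the iowait_*() helpers are
made-up names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two per-rq counters. */
struct rq_counts {
	unsigned int nr_iowait;		/* only modified with the rq lock held */
	atomic_uint nr_iowait_remote;	/* modified without the rq lock */
};

static void rq_counts_init(struct rq_counts *rq)
{
	rq->nr_iowait = 0;
	atomic_init(&rq->nr_iowait_remote, 0);
}

/* Number of iowaiters: the local count minus the remote count. */
static unsigned int iowait_count(struct rq_counts *rq)
{
	return rq->nr_iowait - atomic_load(&rq->nr_iowait_remote);
}

/* Task blocks on IO; the caller is assumed to hold the rq lock. */
static void iowait_sleep(struct rq_counts *rq)
{
	rq->nr_iowait++;
}

/* Task wakes on the same CPU; the caller is assumed to hold the rq lock. */
static void iowait_wake_local(struct rq_counts *rq)
{
	assert(iowait_count(rq) > 0);	/* sanity check for the sketch only */
	rq->nr_iowait--;
}

/* Task wakes on another CPU; the old rq lock is not held, so go atomic. */
static void iowait_wake_remote(struct rq_counts *rq)
{
	atomic_fetch_add(&rq->nr_iowait_remote, 1);
}
```

Three sleeps followed by one local and one remote wakeup leave
iowait_count() at 1.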

Signed-off-by: Jens Axboe
Reviewed-by: Thomas Gleixner
---
 kernel/sched/core.c    | 15 ++++++++++-----
 kernel/sched/cputime.c |  3 +--
 kernel/sched/sched.h   |  8 +++++++-
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9116bcc90346..48d15529a777 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3789,7 +3789,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 #endif
 	if (p->in_iowait) {
 		delayacct_blkio_end(p);
-		atomic_dec(&task_rq(p)->nr_iowait);
+		task_rq(p)->nr_iowait--;
 	}
 
 	activate_task(rq, p, en_flags);
@@ -4354,8 +4354,10 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
 	if (task_cpu(p) != cpu) {
 		if (p->in_iowait) {
+			struct rq *__rq = task_rq(p);
+
 			delayacct_blkio_end(p);
-			atomic_dec(&task_rq(p)->nr_iowait);
+			atomic_inc(&__rq->nr_iowait_remote);
 		}
 
 		wake_flags |= WF_MIGRATED;
@@ -5463,7 +5465,9 @@ unsigned long long nr_context_switches(void)
 
 unsigned int nr_iowait_cpu(int cpu)
 {
-	return atomic_read(&cpu_rq(cpu)->nr_iowait);
+	struct rq *rq = cpu_rq(cpu);
+
+	return rq->nr_iowait - atomic_read(&rq->nr_iowait_remote);
 }
 
 /*
@@ -6681,7 +6685,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
 
 			if (prev->in_iowait) {
-				atomic_inc(&rq->nr_iowait);
+				rq->nr_iowait++;
 				delayacct_blkio_start();
 			}
 		}
@@ -10029,7 +10033,8 @@ void __init sched_init(void)
 #endif
 #endif /* CONFIG_SMP */
 		hrtick_rq_init(rq);
-		atomic_set(&rq->nr_iowait, 0);
+		rq->nr_iowait = 0;
+		atomic_set(&rq->nr_iowait_remote, 0);
 
 #ifdef CONFIG_SCHED_CORE
 		rq->core = rq;
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index af7952f12e6c..0ed81c2d3c3b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -222,9 +222,8 @@ void account_steal_time(u64 cputime)
 void account_idle_time(u64 cputime)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
-	struct rq *rq = this_rq();
 
-	if (atomic_read(&rq->nr_iowait) > 0)
+	if (nr_iowait_cpu(smp_processor_id()) > 0)
 		cpustat[CPUTIME_IOWAIT] += cputime;
 	else
 		cpustat[CPUTIME_IDLE] += cputime;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 001fe047bd5d..91fa5b4d45ed 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1049,7 +1049,13 @@ struct rq {
 	u64			clock_idle_copy;
 #endif
 
-	atomic_t		nr_iowait;
+	/*
+	 * Total per-cpu iowait is the difference of the two below. One is
+	 * modified under the rq lock (nr_iowait), and if we don't have the rq
+	 * lock, then nr_iowait_remote is used.
+	 */
+	unsigned int		nr_iowait;
+	atomic_t		nr_iowait_remote;
 
 #ifdef CONFIG_SCHED_DEBUG
 	u64			last_seen_need_resched_ns;
-- 
2.43.0

From nobody Sat Feb 7 15:40:35 2026
From: Jens Axboe
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, mingo@redhat.com, Jens Axboe
Subject: [PATCH 2/2] sched/core: split iowait state into two states
Date: Wed, 28 Feb 2024 12:16:57 -0700
Message-ID: <20240228192355.290114-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240228192355.290114-1-axboe@kernel.dk>
References: <20240228192355.290114-1-axboe@kernel.dk>

iowait is a bogus metric, but it's helpful in the sense that it allows
short waits to not enter sleep states that have a higher exit latency
than we would've picked for iowait'ing tasks. However, it's harmful in
that lots of applications and monitoring assume that iowait is busy
time, or otherwise use it as a health metric. Particularly for async IO
it's entirely nonsensical.

Split the iowait part into two parts - one that tracks whether we need
boosting for short waits, and one that says we need to account the task
as such.
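
To make the split above concrete, here is a small userspace model of one
"boost" state and one "accounting" state, each feeding its own counter.
It is a sketch under simplifying assumptions (single CPU, no locking, no
remote wakeups); struct task_split, struct rq_split and the helper names
are hypothetical and only mirror the field names used in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the two task bits. */
struct task_split {
	unsigned in_iowait:1;		/* waiter wants the short-wait boost */
	unsigned in_iowait_acct:1;	/* waiter is also reported as iowait */
};

/* Hypothetical, simplified stand-in for the two per-rq counters. */
struct rq_split {
	unsigned int nr_iowait;		/* all iowaiters */
	unsigned int nr_iowait_acct;	/* only the accounted iowaiters */
};

/* Task goes to sleep: count it once or twice depending on its bits. */
static void split_sleep(struct rq_split *rq, const struct task_split *t)
{
	/* The accounting bit is only valid together with the iowait bit. */
	assert(!t->in_iowait_acct || t->in_iowait);

	if (t->in_iowait) {
		rq->nr_iowait++;
		if (t->in_iowait_acct)
			rq->nr_iowait_acct++;
	}
}

/* Task wakes up again: undo exactly what split_sleep() counted. */
static void split_wake(struct rq_split *rq, const struct task_split *t)
{
	if (t->in_iowait) {
		rq->nr_iowait--;
		if (t->in_iowait_acct)
			rq->nr_iowait_acct--;
	}
}

/* What an idle governor would look at. */
static bool governor_sees_iowait(const struct rq_split *rq)
{
	return rq->nr_iowait > 0;
}

/* What the iowait statistics would look at. */
static bool stats_see_iowait(const struct rq_split *rq)
{
	return rq->nr_iowait_acct > 0;
}
```

A sleeping task with only in_iowait set makes governor_sees_iowait()
true while stats_see_iowait() stays false.
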
->in_iowait_acct nests inside of ->in_iowait, both for efficiency
reasons, but also so that the relationship between the two is clear.
A waiter may set ->in_iowait alone and not care about the accounting.

Existing users of nr_iowait() for accounting purposes are switched to
use nr_iowait_acct(), which leaves the governor using nr_iowait_cpu()
as it only cares about iowaiters, not the accounting side.

io_schedule_prepare() now returns, and io_schedule_finish() now takes,
a simple mask of two state bits, as we now have more than one state to
manage. Outside of that, no further changes are needed to support this
generically.

Signed-off-by: Jens Axboe
---
 arch/s390/appldata/appldata_base.c |  2 +-
 arch/s390/appldata/appldata_os.c   |  2 +-
 fs/proc/stat.c                     |  2 +-
 include/linux/sched.h              |  6 ++++
 include/linux/sched/stat.h         | 10 +++++--
 kernel/sched/core.c                | 45 ++++++++++++++++++++++++------
 kernel/sched/sched.h               |  2 ++
 kernel/time/tick-sched.c           |  6 ++--
 8 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
index c2978cb03b36..6844b5294a8b 100644
--- a/arch/s390/appldata/appldata_base.c
+++ b/arch/s390/appldata/appldata_base.c
@@ -423,4 +423,4 @@ EXPORT_SYMBOL_GPL(si_swapinfo);
 #endif
 EXPORT_SYMBOL_GPL(nr_threads);
 EXPORT_SYMBOL_GPL(nr_running);
-EXPORT_SYMBOL_GPL(nr_iowait);
+EXPORT_SYMBOL_GPL(nr_iowait_acct);
diff --git a/arch/s390/appldata/appldata_os.c b/arch/s390/appldata/appldata_os.c
index a363d30ce739..fa4b278aca6c 100644
--- a/arch/s390/appldata/appldata_os.c
+++ b/arch/s390/appldata/appldata_os.c
@@ -100,7 +100,7 @@ static void appldata_get_os_data(void *data)
 
 	os_data->nr_threads = nr_threads;
 	os_data->nr_running = nr_running();
-	os_data->nr_iowait = nr_iowait();
+	os_data->nr_iowait = nr_iowait_acct();
 	os_data->avenrun[0] = avenrun[0] + (FIXED_1/200);
 	os_data->avenrun[1] = avenrun[1] + (FIXED_1/200);
 	os_data->avenrun[2] = avenrun[2] + (FIXED_1/200);
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index da60956b2915..149be7a884fb 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -180,7 +180,7 @@ static int show_stat(struct seq_file *p, void *v)
 		(unsigned long long)boottime.tv_sec,
 		total_forks,
 		nr_running(),
-		nr_iowait());
+		nr_iowait_acct());
 
 	seq_put_decimal_ull(p, "softirq ", (unsigned long long)sum_softirq);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffe8f618ab86..1e198e268df1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -922,7 +922,13 @@ struct task_struct {
 
 	/* Bit to tell TOMOYO we're in execve(): */
 	unsigned			in_execve:1;
+	/* task is in iowait */
 	unsigned			in_iowait:1;
+	/*
+	 * task is in iowait and should be accounted as such. can only be set
+	 * if ->in_iowait is also set.
+	 */
+	unsigned			in_iowait_acct:1;
 #ifndef TIF_RESTORE_SIGMASK
 	unsigned			restore_sigmask:1;
 #endif
diff --git a/include/linux/sched/stat.h b/include/linux/sched/stat.h
index 0108a38bb64d..7c48a35f98ee 100644
--- a/include/linux/sched/stat.h
+++ b/include/linux/sched/stat.h
@@ -19,8 +19,14 @@ DECLARE_PER_CPU(unsigned long, process_counts);
 extern int nr_processes(void);
 extern unsigned int nr_running(void);
 extern bool single_task_running(void);
-extern unsigned int nr_iowait(void);
-extern unsigned int nr_iowait_cpu(int cpu);
+extern unsigned int nr_iowait_acct(void);
+extern unsigned int nr_iowait_acct_cpu(int cpu);
+unsigned int nr_iowait_cpu(int cpu);
+
+enum {
+	TASK_IOWAIT		= 1,
+	TASK_IOWAIT_ACCT	= 2,
+};
 
 static inline int sched_info_on(void)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 48d15529a777..66a3654aab5d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3790,6 +3790,8 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 	if (p->in_iowait) {
 		delayacct_blkio_end(p);
 		task_rq(p)->nr_iowait--;
+		if (p->in_iowait_acct)
+			task_rq(p)->nr_iowait_acct--;
 	}
 
 	activate_task(rq, p, en_flags);
@@ -4358,6 +4360,8 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 
 			delayacct_blkio_end(p);
 			atomic_inc(&__rq->nr_iowait_remote);
+			if (p->in_iowait_acct)
+				atomic_inc(&__rq->nr_iowait_acct_remote);
 		}
 
 		wake_flags |= WF_MIGRATED;
@@ -5463,11 +5467,11 @@ unsigned long long nr_context_switches(void)
  * it does become runnable.
  */
 
-unsigned int nr_iowait_cpu(int cpu)
+unsigned int nr_iowait_acct_cpu(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
-	return rq->nr_iowait - atomic_read(&rq->nr_iowait_remote);
+	return rq->nr_iowait_acct - atomic_read(&rq->nr_iowait_acct_remote);
 }
 
 /*
@@ -5500,16 +5504,23 @@ unsigned int nr_iowait_cpu(int cpu)
  * Task CPU affinities can make all that even more 'interesting'.
  */
 
-unsigned int nr_iowait(void)
+unsigned int nr_iowait_acct(void)
 {
 	unsigned int i, sum = 0;
 
 	for_each_possible_cpu(i)
-		sum += nr_iowait_cpu(i);
+		sum += nr_iowait_acct_cpu(i);
 
 	return sum;
 }
 
+unsigned int nr_iowait_cpu(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	return rq->nr_iowait - atomic_read(&rq->nr_iowait_remote);
+}
+
 #ifdef CONFIG_SMP
 
 /*
@@ -6686,6 +6697,8 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 
 			if (prev->in_iowait) {
 				rq->nr_iowait++;
+				if (prev->in_iowait_acct)
+					rq->nr_iowait_acct++;
 				delayacct_blkio_start();
 			}
 		}
@@ -8988,18 +9001,32 @@ int __sched yield_to(struct task_struct *p, bool preempt)
 }
 EXPORT_SYMBOL_GPL(yield_to);
 
+/*
+ * Returns a token which is comprised of the two bits of iowait wait state -
+ * one is whether we're marking ourselves as in iowait for cpufreq reasons,
+ * and the other is if the task should be accounted as such.
+ */
 int io_schedule_prepare(void)
 {
-	int old_iowait = current->in_iowait;
+	int old_wait_flags = 0;
+
+	if (current->in_iowait)
+		old_wait_flags |= TASK_IOWAIT;
+	if (current->in_iowait_acct)
+		old_wait_flags |= TASK_IOWAIT_ACCT;
 
 	current->in_iowait = 1;
+	current->in_iowait_acct = 1;
 	blk_flush_plug(current->plug, true);
-	return old_iowait;
+	return old_wait_flags;
 }
 
-void io_schedule_finish(int token)
+void io_schedule_finish(int old_wait_flags)
 {
-	current->in_iowait = token;
+	if (!(old_wait_flags & TASK_IOWAIT))
+		current->in_iowait = 0;
+	if (!(old_wait_flags & TASK_IOWAIT_ACCT))
+		current->in_iowait_acct = 0;
 }
 
 /*
@@ -10033,6 +10060,8 @@ void __init sched_init(void)
 #endif
 #endif /* CONFIG_SMP */
 		hrtick_rq_init(rq);
+		rq->nr_iowait_acct = 0;
+		atomic_set(&rq->nr_iowait_acct_remote, 0);
 		rq->nr_iowait = 0;
 		atomic_set(&rq->nr_iowait_remote, 0);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 91fa5b4d45ed..abd7a938bc99 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1054,6 +1054,8 @@ struct rq {
 	 * modified under the rq lock (nr_iowait), and if we don't have the rq
 	 * lock, then nr_iowait_remote is used.
 	 */
+	unsigned int		nr_iowait_acct;
+	atomic_t		nr_iowait_acct_remote;
 	unsigned int		nr_iowait;
 	atomic_t		nr_iowait_remote;
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 01fb50c1b17e..f6709d543dac 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -669,7 +669,7 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 	delta = ktime_sub(now, ts->idle_entrytime);
 
 	write_seqcount_begin(&ts->idle_sleeptime_seq);
-	if (nr_iowait_cpu(smp_processor_id()) > 0)
+	if (nr_iowait_acct_cpu(smp_processor_id()) > 0)
 		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 	else
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
@@ -742,7 +742,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
 
 	return get_cpu_sleep_time_us(ts, &ts->idle_sleeptime,
-				     !nr_iowait_cpu(cpu), last_update_time);
+				     !nr_iowait_acct_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
@@ -768,7 +768,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
 
 	return get_cpu_sleep_time_us(ts, &ts->iowait_sleeptime,
-				     nr_iowait_cpu(cpu), last_update_time);
+				     nr_iowait_acct_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 
-- 
2.43.0
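
For readers who want to see why the prepare/finish pair now passes a mask
instead of a single bit, the save-and-restore logic from the core.c hunk
can be exercised in userspace roughly as follows. The enum values are the
ones added by this patch; struct task_bits, model_prepare() and
model_finish() are hypothetical stand-ins for current and the
io_schedule_*() helpers, and the plug flush is left out.

```c
#include <assert.h>

enum {
	TASK_IOWAIT		= 1,
	TASK_IOWAIT_ACCT	= 2,
};

/* Hypothetical stand-in for the two bits on the current task. */
struct task_bits {
	unsigned in_iowait:1;
	unsigned in_iowait_acct:1;
};

/* Remember the previous state as a mask, then set both bits. */
static int model_prepare(struct task_bits *t)
{
	int old_wait_flags = 0;

	if (t->in_iowait)
		old_wait_flags |= TASK_IOWAIT;
	if (t->in_iowait_acct)
		old_wait_flags |= TASK_IOWAIT_ACCT;

	t->in_iowait = 1;
	t->in_iowait_acct = 1;
	return old_wait_flags;
}

/* Clear only the bits that were not already set before prepare. */
static void model_finish(struct task_bits *t, int old_wait_flags)
{
	if (!(old_wait_flags & TASK_IOWAIT))
		t->in_iowait = 0;
	if (!(old_wait_flags & TASK_IOWAIT_ACCT))
		t->in_iowait_acct = 0;
}
```

A task that entered with only in_iowait set gets TASK_IOWAIT back as its
token, and after model_finish() it is again in iowait without accounting.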