From nobody Fri Apr 10 12:57:08 2026
From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
    Zefan Li, Johannes Weiner, Will Deacon
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Lai Jiangshan, Waiman Long
Subject: [PATCH v9 1/7] sched: Add __releases annotations to affine_move_task()
Date: Fri, 16 Sep 2022 14:32:11 -0400
Message-Id: <20220916183217.1172225-2-longman@redhat.com>
In-Reply-To: <20220916183217.1172225-1-longman@redhat.com>
References: <20220916183217.1172225-1-longman@redhat.com>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

affine_move_task() assumes task_rq_lock() has been called and it does
an implicit task_rq_unlock() before returning. Add the appropriate
__releases annotations to make this clear.

A typo in a comment is also fixed.
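As a side note for reviewers less familiar with sparse annotations: outside a `__CHECKER__` build these macros compile to nothing, so they are pure documentation plus static-analysis hints. The sketch below is a minimal userspace model of the pattern this patch annotates (a function entered with a lock held that drops it before returning); the macro definitions and `update_and_unlock()`/`run_example()` names are illustrative, not kernel code.

```c
#include <pthread.h>

/* Outside a sparse (__CHECKER__) build the kernel defines these
 * annotations as empty, so they cost nothing at runtime. */
#define __releases(x)
#define __acquires(x)

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared;

/* Like affine_move_task(): entered with 'lock' held, drops it before
 * returning, and the annotation documents that contract. */
static int update_and_unlock(int v)
	__releases(lock)
{
	shared = v;
	pthread_mutex_unlock(&lock);
	return 0;
}

int run_example(void)
{
	pthread_mutex_lock(&lock);
	update_and_unlock(42);	/* returns with the lock already released */
	return shared;
}
```

With the annotation in place, a sparse run can flag callers that forget the lock is dropped on return, which is exactly the ambiguity the patch removes for affine_move_task().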
Signed-off-by: Waiman Long
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee28253c9ac0..b351e6d173b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2696,6 +2696,8 @@ void release_user_cpus_ptr(struct task_struct *p)
  */
 static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flags *rf,
 			    int dest_cpu, unsigned int flags)
+	__releases(rq->lock)
+	__releases(p->pi_lock)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
 	bool stop_pending, complete = false;
@@ -3005,7 +3007,7 @@ static int restrict_cpus_allowed_ptr(struct task_struct *p,
 
 /*
  * Restrict the CPU affinity of task @p so that it is a subset of
- * task_cpu_possible_mask() and point @p->user_cpu_ptr to a copy of the
+ * task_cpu_possible_mask() and point @p->user_cpus_ptr to a copy of the
 * old affinity mask. If the resulting mask is empty, we warn and walk
 * up the cpuset hierarchy until we find a suitable mask.
 */
-- 
2.31.1

From nobody Fri Apr 10 12:57:08 2026
From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
    Zefan Li, Johannes Weiner, Will Deacon
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Lai Jiangshan, Waiman Long
Subject: [PATCH v9 2/7] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity()
Date: Fri, 16 Sep 2022 14:32:12 -0400
Message-Id: <20220916183217.1172225-3-longman@redhat.com>
In-Reply-To: <20220916183217.1172225-1-longman@redhat.com>
References: <20220916183217.1172225-1-longman@redhat.com>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

The user_cpus_ptr field was added by commit b90ca8badbd1 ("sched:
Introduce task_struct::user_cpus_ptr to track requested affinity"). It
is currently used only by the arm64 arch due to possible asymmetric CPU
setup. This patch extends its usage to save the user-provided cpumask
when sched_setaffinity() is called, for all arches. With this patch
applied, user_cpus_ptr, once allocated after a successful call to
sched_setaffinity(), will only be freed when the task exits.

Since user_cpus_ptr is supposed to hold the "requested affinity", there
is actually no point in saving the current cpu affinity in
restrict_cpus_allowed_ptr() if sched_setaffinity() has never been
called. Modify the logic to set user_cpus_ptr only in
sched_setaffinity() and to use it in restrict_cpus_allowed_ptr() and
relax_compatible_cpus_allowed_ptr() if defined, without changing it.
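The fallback semantics introduced here can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not kernel code: a cpumask is reduced to a 64-bit bitmap, and `struct task`, `restrict_cpus()` and `CPU_POSSIBLE_MASK` are hypothetical stand-ins for the kernel structures this patch touches.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model: a cpumask is a 64-bit bitmap, one bit per CPU. */
typedef uint64_t cpumask_t;

#define CPU_POSSIBLE_MASK ((cpumask_t)0xff)	/* pretend 8 CPUs exist */

struct task {
	cpumask_t cpus_mask;		/* effective affinity */
	const cpumask_t *user_mask;	/* models p->user_cpus_ptr: NULL until
					 * a sched_setaffinity() succeeds */
};

/* Mirrors the task_user_cpus() helper this patch adds to
 * kernel/sched/sched.h: fall back to the possible mask when the user
 * never requested an affinity. */
static cpumask_t task_user_cpus(const struct task *p)
{
	return p->user_mask ? *p->user_mask : CPU_POSSIBLE_MASK;
}

/* Models restrict_cpus_allowed_ptr() after the patch: the restriction
 * starts from the user-requested mask, not the current effective mask. */
static int restrict_cpus(struct task *p, cpumask_t subset)
{
	cpumask_t new_mask = task_user_cpus(p) & subset;

	if (!new_mask)
		return -1;	/* -EINVAL in the kernel */
	p->cpus_mask = new_mask;
	return 0;
}
```

The point of the model: once a user mask exists it is the stable basis for every later restriction, so repeated cpuset changes can no longer erode what the user asked for.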
This results in a change of behavior for arm64 systems with asymmetric
CPUs in some corner cases. For instance, if sched_setaffinity() has
never been called and there is a cpuset change before
relax_compatible_cpus_allowed_ptr() is called, its subsequent call will
follow what the cpuset allows but not what the previous cpu affinity
setting allowed.

Signed-off-by: Waiman Long
---
 kernel/sched/core.c  | 82 ++++++++++++++++++++------------------
 kernel/sched/sched.h |  7 ++++
 2 files changed, 44 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b351e6d173b7..c7c0425974c2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2850,7 +2850,6 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
 	bool kthread = p->flags & PF_KTHREAD;
-	struct cpumask *user_mask = NULL;
 	unsigned int dest_cpu;
 	int ret = 0;
 
@@ -2909,14 +2908,7 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 
 	__do_set_cpus_allowed(p, new_mask, flags);
 
-	if (flags & SCA_USER)
-		user_mask = clear_user_cpus_ptr(p);
-
-	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
-
-	kfree(user_mask);
-
-	return ret;
+	return affine_move_task(rq, p, rf, dest_cpu, flags);
 
 out:
 	task_rq_unlock(rq, p, rf);
@@ -2951,8 +2943,10 @@ EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
 /*
  * Change a given task's CPU affinity to the intersection of its current
- * affinity mask and @subset_mask, writing the resulting mask to @new_mask
- * and pointing @p->user_cpus_ptr to a copy of the old mask.
+ * affinity mask and @subset_mask, writing the resulting mask to @new_mask.
+ * If user_cpus_ptr is defined, use it as the basis for restricting CPU
+ * affinity or use cpu_online_mask instead.
+ *
 * If the resulting mask is empty, leave the affinity unchanged and return
 * -EINVAL.
 */
@@ -2960,17 +2954,10 @@ static int restrict_cpus_allowed_ptr(struct task_struct *p,
 					 struct cpumask *new_mask,
 					 const struct cpumask *subset_mask)
 {
-	struct cpumask *user_mask = NULL;
 	struct rq_flags rf;
 	struct rq *rq;
 	int err;
 
-	if (!p->user_cpus_ptr) {
-		user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
-		if (!user_mask)
-			return -ENOMEM;
-	}
-
 	rq = task_rq_lock(p, &rf);
 
 	/*
@@ -2983,25 +2970,15 @@ static int restrict_cpus_allowed_ptr(struct task_struct *p,
 		goto err_unlock;
 	}
 
-	if (!cpumask_and(new_mask, &p->cpus_mask, subset_mask)) {
+	if (!cpumask_and(new_mask, task_user_cpus(p), subset_mask)) {
 		err = -EINVAL;
 		goto err_unlock;
 	}
 
-	/*
-	 * We're about to butcher the task affinity, so keep track of what
-	 * the user asked for in case we're able to restore it later on.
-	 */
-	if (user_mask) {
-		cpumask_copy(user_mask, p->cpus_ptr);
-		p->user_cpus_ptr = user_mask;
-	}
-
 	return __set_cpus_allowed_ptr_locked(p, new_mask, 0, rq, &rf);
 
 err_unlock:
 	task_rq_unlock(rq, p, &rf);
-	kfree(user_mask);
 	return err;
 }
 
@@ -3055,30 +3032,21 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
 
 /*
  * Restore the affinity of a task @p which was previously restricted by a
- * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
- * @p->user_cpus_ptr.
+ * call to force_compatible_cpus_allowed_ptr().
 *
 * It is the caller's responsibility to serialise this with any calls to
 * force_compatible_cpus_allowed_ptr(@p).
 */
void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
{
-	struct cpumask *user_mask = p->user_cpus_ptr;
-	unsigned long flags;
+	int ret;
 
 	/*
-	 * Try to restore the old affinity mask. If this fails, then
-	 * we free the mask explicitly to avoid it being inherited across
-	 * a subsequent fork().
+	 * Try to restore the old affinity mask with __sched_setaffinity().
+	 * Cpuset masking will be done there too.
 	 */
-	if (!user_mask || !__sched_setaffinity(p, user_mask))
-		return;
-
-	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	user_mask = clear_user_cpus_ptr(p);
-	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-
-	kfree(user_mask);
+	ret = __sched_setaffinity(p, task_user_cpus(p));
+	WARN_ON_ONCE(ret);
 }
 
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
@@ -8101,7 +8069,7 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 	if (retval)
 		goto out_free_new_mask;
again:
-	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK | SCA_USER);
+	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK);
 	if (retval)
 		goto out_free_new_mask;
 
@@ -8124,6 +8092,7 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 
 long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
+	struct cpumask *user_mask;
 	struct task_struct *p;
 	int retval;
 
@@ -8158,7 +8127,30 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	if (retval)
 		goto out_put_task;
 
+	user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
+	if (!user_mask) {
+		retval = -ENOMEM;
+		goto out_put_task;
+	}
+	cpumask_copy(user_mask, in_mask);
+
 	retval = __sched_setaffinity(p, in_mask);
+
+	/*
+	 * Save in_mask into user_cpus_ptr after a successful
+	 * __sched_setaffinity() call. pi_lock is used to synchronize
+	 * changes to user_cpus_ptr.
+	 */
+	if (!retval) {
+		unsigned long flags;
+
+		/* Use pi_lock to synchronize changes to user_cpus_ptr */
+		raw_spin_lock_irqsave(&p->pi_lock, flags);
+		swap(p->user_cpus_ptr, user_mask);
+		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	}
+	kfree(user_mask);
+
 out_put_task:
 	put_task_struct(p);
 	return retval;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e26688d387ae..ac235bc8ef08 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1881,6 +1881,13 @@ static inline void dirty_sched_domain_sysctl(int cpu)
 #endif
 
 extern int sched_update_scaling(void);
+
+static inline const struct cpumask *task_user_cpus(struct task_struct *p)
+{
+	if (!p->user_cpus_ptr)
+		return cpu_possible_mask; /* &init_task.cpus_mask */
+	return p->user_cpus_ptr;
+}
 #endif /* CONFIG_SMP */
 
 #include "stats.h"
-- 
2.31.1

From nobody Fri Apr 10 12:57:08 2026
From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
    Zefan Li, Johannes Weiner, Will Deacon
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Lai Jiangshan, Waiman Long
Subject: [PATCH v9 3/7] sched: Enforce user requested affinity
Date: Fri, 16 Sep 2022 14:32:13 -0400
Message-Id: <20220916183217.1172225-4-longman@redhat.com>
In-Reply-To: <20220916183217.1172225-1-longman@redhat.com>
References: <20220916183217.1172225-1-longman@redhat.com>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

It was found that the user requested affinity via sched_setaffinity()
can be easily overwritten by other kernel
subsystems without an easy way to reset it back to what the user
requested. For example, any change to the current cpuset hierarchy may
reset the cpumask of the tasks in the affected cpusets to the default
cpuset value even if those tasks have pre-existing user requested
affinity. That is especially easy to trigger under a cgroup v2
environment where writing "+cpuset" to the root cgroup's
cgroup.subtree_control file will reset the cpu affinity of all the
processes in the system.

That is problematic in a nohz_full environment where the tasks running
on the nohz_full CPUs usually have their cpu affinity explicitly set
and will behave incorrectly if that affinity changes.

Fix this problem by looking at user_cpus_ptr in __set_cpus_allowed_ptr()
and using it to restrict the given cpumask unless there is no overlap.
In that case, it will fall back to the given one. The SCA_USER flag is
reused to indicate intent to set user_cpus_ptr, in which case
user_cpus_ptr masking should be skipped. In addition, masking should
also be skipped if any of the SCA_MIGRATE_* flags is set. All callers
of set_cpus_allowed_ptr() will be affected by this change.

A scratch cpumask is added to the percpu runqueues structure for doing
additional masking when user_cpus_ptr is set.

Signed-off-by: Waiman Long
---
 kernel/sched/core.c  | 22 +++++++++++++++++-----
 kernel/sched/sched.h |  3 +++
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c7c0425974c2..ab8e591dbaf5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2932,6 +2932,15 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	struct rq *rq;
 
 	rq = task_rq_lock(p, &rf);
+	/*
+	 * Masking should be skipped if SCA_USER or any of the SCA_MIGRATE_*
+	 * flags are set.
+	 */
+	if (p->user_cpus_ptr &&
+	    !(flags & (SCA_USER | SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) &&
+	    cpumask_and(rq->scratch_mask, new_mask, p->user_cpus_ptr))
+		new_mask = rq->scratch_mask;
+
 	return __set_cpus_allowed_ptr_locked(p, new_mask, flags, rq, &rf);
 }
 
@@ -3028,7 +3037,7 @@ void force_compatible_cpus_allowed_ptr(struct task_struct *p)
 }
 
 static int
-__sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
+__sched_setaffinity(struct task_struct *p, const struct cpumask *mask, int flags);
 
 /*
  * Restore the affinity of a task @p which was previously restricted by a
@@ -3045,7 +3054,7 @@ void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
 	 * Try to restore the old affinity mask with __sched_setaffinity().
 	 * Cpuset masking will be done there too.
 	 */
-	ret = __sched_setaffinity(p, task_user_cpus(p));
+	ret = __sched_setaffinity(p, task_user_cpus(p), 0);
 	WARN_ON_ONCE(ret);
 }
 
@@ -8049,7 +8058,7 @@ int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask)
 #endif
 
 static int
-__sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
+__sched_setaffinity(struct task_struct *p, const struct cpumask *mask, int flags)
 {
 	int retval;
 	cpumask_var_t cpus_allowed, new_mask;
@@ -8069,7 +8078,7 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 	if (retval)
 		goto out_free_new_mask;
again:
-	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK);
+	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK | flags);
 	if (retval)
 		goto out_free_new_mask;
 
@@ -8134,7 +8143,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	}
 	cpumask_copy(user_mask, in_mask);
 
-	retval = __sched_setaffinity(p, in_mask);
+	retval = __sched_setaffinity(p, in_mask, SCA_USER);
 
 	/*
 	 * Save in_mask into user_cpus_ptr after a successful
@@ -9647,6 +9656,9 @@ void __init sched_init(void)
 			cpumask_size(), GFP_KERNEL, cpu_to_node(i));
 		per_cpu(select_rq_mask, i) =
			(cpumask_var_t)kzalloc_node(
			cpumask_size(), GFP_KERNEL, cpu_to_node(i));
+		per_cpu(runqueues.scratch_mask, i) =
+			(cpumask_var_t)kzalloc_node(cpumask_size(),
+					GFP_KERNEL, cpu_to_node(i));
 	}
 #endif /* CONFIG_CPUMASK_OFFSTACK */
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ac235bc8ef08..482b702d65ea 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1159,6 +1159,9 @@ struct rq {
 	unsigned int		core_forceidle_occupation;
 	u64			core_forceidle_start;
 #endif
+
+	/* Scratch cpumask to be temporarily used under rq_lock */
+	cpumask_var_t		scratch_mask;
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-- 
2.31.1

From nobody Fri Apr 10 12:57:08 2026
From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
    Zefan Li, Johannes Weiner, Will Deacon
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Lai Jiangshan, Waiman Long
Subject: [PATCH v9 4/7] sched: Introduce affinity_context structure
Date: Fri, 16 Sep 2022 14:32:14 -0400
Message-Id: <20220916183217.1172225-5-longman@redhat.com>
In-Reply-To: <20220916183217.1172225-1-longman@redhat.com>
References: <20220916183217.1172225-1-longman@redhat.com>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

Introduce a new affinity_context structure for passing cpu affinity
information around in core scheduler code. The relevant functions are
modified to use the new structure. There is no functional change.
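The shape of the refactoring can be sketched outside the kernel. This toy model is an assumption-laden illustration, not the kernel code: the SCA_* values, the simplified bitmap cpumask, and the `struct task` stand-in are all hypothetical, but the struct-bundling pattern matches what the patch does to `set_cpus_allowed_common()` and friends.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel types touched by this patch. */
typedef uint64_t cpumask_t;

#define SCA_CHECK		0x01
#define SCA_MIGRATE_DISABLE	0x02
#define SCA_MIGRATE_ENABLE	0x04
#define SCA_USER		0x08

/* Instead of threading a (mask, flags) pair through every call chain,
 * bundle them into one context object passed by pointer. */
struct affinity_context {
	const cpumask_t	*new_mask;
	uint32_t	flags;
};

struct task {
	cpumask_t	cpus_mask;
	int		nr_cpus_allowed;
};

/* Sketch of set_cpus_allowed_common() after the refactor: one context
 * argument replaces two positional ones, so adding a field later does
 * not ripple through every prototype. */
static void set_cpus_allowed_common(struct task *p, struct affinity_context *ctx)
{
	if (ctx->flags & (SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE))
		return;	/* the kernel only redirects cpus_ptr on this path */

	p->cpus_mask = *ctx->new_mask;
	p->nr_cpus_allowed = __builtin_popcountll(*ctx->new_mask);
}
```

The design payoff is exactly what the commit message claims: later patches can grow the context (e.g. a user mask field) without touching every caller again.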
Suggested-by: Peter Zijlstra
Signed-off-by: Waiman Long
---
 kernel/sched/core.c     | 120 ++++++++++++++++++++++++++--------------
 kernel/sched/deadline.c |   7 +--
 kernel/sched/sched.h    |  11 ++--
 3 files changed, 89 insertions(+), 49 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ab8e591dbaf5..b662d8ddc169 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2195,14 +2195,18 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 #ifdef CONFIG_SMP
 
 static void
-__do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask, u32 flags);
+__do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx);
 
 static int __set_cpus_allowed_ptr(struct task_struct *p,
-				  const struct cpumask *new_mask,
-				  u32 flags);
+				  struct affinity_context *ctx);
 
 static void migrate_disable_switch(struct rq *rq, struct task_struct *p)
 {
+	struct affinity_context ac = {
+		.new_mask  = cpumask_of(rq->cpu),
+		.flags     = SCA_MIGRATE_DISABLE,
+	};
+
 	if (likely(!p->migration_disabled))
 		return;
 
@@ -2212,7 +2216,7 @@ static void migrate_disable_switch(struct rq *rq, struct task_struct *p)
 	/*
 	 * Violates locking rules! see comment in __do_set_cpus_allowed().
 	 */
-	__do_set_cpus_allowed(p, cpumask_of(rq->cpu), SCA_MIGRATE_DISABLE);
+	__do_set_cpus_allowed(p, &ac);
 }
 
 void migrate_disable(void)
@@ -2234,6 +2238,10 @@ EXPORT_SYMBOL_GPL(migrate_disable);
 void migrate_enable(void)
 {
 	struct task_struct *p = current;
+	struct affinity_context ac = {
+		.new_mask  = &p->cpus_mask,
+		.flags     = SCA_MIGRATE_ENABLE,
+	};
 
 	if (p->migration_disabled > 1) {
 		p->migration_disabled--;
@@ -2249,7 +2257,7 @@ void migrate_enable(void)
 	 */
 	preempt_disable();
 	if (p->cpus_ptr != &p->cpus_mask)
-		__set_cpus_allowed_ptr(p, &p->cpus_mask, SCA_MIGRATE_ENABLE);
+		__set_cpus_allowed_ptr(p, &ac);
 	/*
 	 * Mustn't clear migration_disabled() until cpus_ptr points back at the
 	 * regular cpus_mask, otherwise things that race (eg.
@@ -2529,19 +2537,19 @@ int push_cpu_stop(void *arg)
 * sched_class::set_cpus_allowed must do the below, but is not required to
 * actually call this function.
 */
-void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask, u32 flags)
+void set_cpus_allowed_common(struct task_struct *p, struct affinity_context *ctx)
 {
-	if (flags & (SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) {
-		p->cpus_ptr = new_mask;
+	if (ctx->flags & (SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) {
+		p->cpus_ptr = ctx->new_mask;
 		return;
 	}
 
-	cpumask_copy(&p->cpus_mask, new_mask);
-	p->nr_cpus_allowed = cpumask_weight(new_mask);
+	cpumask_copy(&p->cpus_mask, ctx->new_mask);
+	p->nr_cpus_allowed = cpumask_weight(ctx->new_mask);
 }
 
 static void
-__do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask, u32 flags)
+__do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
 {
 	struct rq *rq = task_rq(p);
 	bool queued, running;
@@ -2558,7 +2566,7 @@ __do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask, u32
 	 *
 	 * XXX do further audits, this smells like something putrid.
 	 */
-	if (flags & SCA_MIGRATE_DISABLE)
+	if (ctx->flags & SCA_MIGRATE_DISABLE)
 		SCHED_WARN_ON(!p->on_cpu);
 	else
 		lockdep_assert_held(&p->pi_lock);
@@ -2577,7 +2585,7 @@ __do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask, u32
 	if (running)
 		put_prev_task(rq, p);
 
-	p->sched_class->set_cpus_allowed(p, new_mask, flags);
+	p->sched_class->set_cpus_allowed(p, ctx);
 
 	if (queued)
 		enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
@@ -2587,7 +2595,12 @@ __do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask, u32
 
 void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
-	__do_set_cpus_allowed(p, new_mask, 0);
+	struct affinity_context ac = {
+		.new_mask  = new_mask,
+		.flags     = 0,
+	};
+
+	__do_set_cpus_allowed(p, &ac);
 }
 
 int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
@@ -2840,8 +2853,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flags
 * Called with both p->pi_lock and rq->lock held; drops both before returning.
 */
static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
-					 const struct cpumask *new_mask,
-					 u32 flags,
+					 struct affinity_context *ctx,
 					 struct rq *rq,
 					 struct rq_flags *rf)
 	__releases(rq->lock)
@@ -2869,7 +2881,7 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 		cpu_valid_mask = cpu_online_mask;
 	}
 
-	if (!kthread && !cpumask_subset(new_mask, cpu_allowed_mask)) {
+	if (!kthread && !cpumask_subset(ctx->new_mask, cpu_allowed_mask)) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -2878,18 +2890,18 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 	 * Must re-check here, to close a race against __kthread_bind(),
 	 * sched_setaffinity() is not guaranteed to observe the flag.
 	 */
-	if ((flags & SCA_CHECK) && (p->flags & PF_NO_SETAFFINITY)) {
+	if ((ctx->flags & SCA_CHECK) && (p->flags & PF_NO_SETAFFINITY)) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	if (!(flags & SCA_MIGRATE_ENABLE)) {
-		if (cpumask_equal(&p->cpus_mask, new_mask))
+	if (!(ctx->flags & SCA_MIGRATE_ENABLE)) {
+		if (cpumask_equal(&p->cpus_mask, ctx->new_mask))
 			goto out;
 
 		if (WARN_ON_ONCE(p == current &&
 				 is_migration_disabled(p) &&
-				 !cpumask_test_cpu(task_cpu(p), new_mask))) {
+				 !cpumask_test_cpu(task_cpu(p), ctx->new_mask))) {
 			ret = -EBUSY;
 			goto out;
 		}
@@ -2900,15 +2912,15 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 	 * for groups of tasks (ie. cpuset), so that load balancing is not
 	 * immediately required to distribute the tasks within their new mask.
 	 */
-	dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, new_mask);
+	dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, ctx->new_mask);
 	if (dest_cpu >= nr_cpu_ids) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	__do_set_cpus_allowed(p, new_mask, flags);
+	__do_set_cpus_allowed(p, ctx);
 
-	return affine_move_task(rq, p, rf, dest_cpu, flags);
+	return affine_move_task(rq, p, rf, dest_cpu, ctx->flags);
 
 out:
 	task_rq_unlock(rq, p, rf);
@@ -2926,7 +2938,7 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 * call is not atomic; no spinlocks may be held.
 */
static int __set_cpus_allowed_ptr(struct task_struct *p,
-				  const struct cpumask *new_mask, u32 flags)
+				  struct affinity_context *ctx)
 {
 	struct rq_flags rf;
 	struct rq *rq;
@@ -2937,16 +2949,21 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	 * flags are set.
 	 */
 	if (p->user_cpus_ptr &&
-	    !(flags & (SCA_USER | SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) &&
-	    cpumask_and(rq->scratch_mask, new_mask, p->user_cpus_ptr))
-		new_mask = rq->scratch_mask;
+	    !(ctx->flags & (SCA_USER | SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) &&
+	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr))
+		ctx->new_mask = rq->scratch_mask;
 
-	return __set_cpus_allowed_ptr_locked(p, new_mask, flags, rq, &rf);
+	return __set_cpus_allowed_ptr_locked(p, ctx, rq, &rf);
 }
 
 int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 {
-	return __set_cpus_allowed_ptr(p, new_mask, 0);
+	struct affinity_context ac = {
+		.new_mask  = new_mask,
+		.flags     = 0,
+	};
+
+	return __set_cpus_allowed_ptr(p, &ac);
 }
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
@@ -2963,6 +2980,10 @@ static int restrict_cpus_allowed_ptr(struct task_struct *p,
 					 struct cpumask *new_mask,
 					 const struct cpumask *subset_mask)
 {
+	struct affinity_context ac = {
+		.new_mask  = new_mask,
+		.flags     = 0,
+	};
 	struct rq_flags rf;
 	struct rq *rq;
 	int err;
@@ -2984,7 +3005,7 @@ static int restrict_cpus_allowed_ptr(struct task_struct *p,
 		goto err_unlock;
 	}
 
-	return __set_cpus_allowed_ptr_locked(p, new_mask, 0, rq, &rf);
+	return __set_cpus_allowed_ptr_locked(p, &ac, rq, &rf);
 
 err_unlock:
 	task_rq_unlock(rq, p, &rf);
@@ -3037,7 +3058,7 @@ void force_compatible_cpus_allowed_ptr(struct task_struct *p)
 }
 
 static int
-__sched_setaffinity(struct task_struct *p, const struct cpumask *mask, int flags);
+__sched_setaffinity(struct task_struct *p, struct affinity_context *ctx);
 
 /*
  * Restore the affinity of a task @p which was previously restricted by a
@@ -3048,13 +3069,17 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask, int flags)
 */
void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
{
+	struct affinity_context ac = {
+		.new_mask  = task_user_cpus(p),
+		.flags     = 0,
+	};
 	int ret;
 
 	/*
 	 * Try to restore
the old affinity mask with __sched_setaffinity(). * Cpuset masking will be done there too. */ - ret =3D __sched_setaffinity(p, task_user_cpus(p), 0); + ret =3D __sched_setaffinity(p, &ac); WARN_ON_ONCE(ret); } =20 @@ -3533,10 +3558,9 @@ void sched_set_stop_task(int cpu, struct task_struct= *stop) #else /* CONFIG_SMP */ =20 static inline int __set_cpus_allowed_ptr(struct task_struct *p, - const struct cpumask *new_mask, - u32 flags) + struct affinity_context *ctx) { - return set_cpus_allowed_ptr(p, new_mask); + return set_cpus_allowed_ptr(p, ctx->new_mask); } =20 static inline void migrate_disable_switch(struct rq *rq, struct task_struc= t *p) { } @@ -8058,7 +8082,7 @@ int dl_task_check_affinity(struct task_struct *p, con= st struct cpumask *mask) #endif =20 static int -__sched_setaffinity(struct task_struct *p, const struct cpumask *mask, int= flags) +__sched_setaffinity(struct task_struct *p, struct affinity_context *ctx) { int retval; cpumask_var_t cpus_allowed, new_mask; @@ -8072,13 +8096,16 @@ __sched_setaffinity(struct task_struct *p, const st= ruct cpumask *mask, int flags } =20 cpuset_cpus_allowed(p, cpus_allowed); - cpumask_and(new_mask, mask, cpus_allowed); + cpumask_and(new_mask, ctx->new_mask, cpus_allowed); + + ctx->new_mask =3D new_mask; + ctx->flags |=3D SCA_CHECK; =20 retval =3D dl_task_check_affinity(p, new_mask); if (retval) goto out_free_new_mask; again: - retval =3D __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK | flags); + retval =3D __set_cpus_allowed_ptr(p, ctx); if (retval) goto out_free_new_mask; =20 @@ -8101,6 +8128,7 @@ __sched_setaffinity(struct task_struct *p, const stru= ct cpumask *mask, int flags =20 long sched_setaffinity(pid_t pid, const struct cpumask *in_mask) { + struct affinity_context ac; struct cpumask *user_mask; struct task_struct *p; int retval; @@ -8142,8 +8170,12 @@ long sched_setaffinity(pid_t pid, const struct cpuma= sk *in_mask) goto out_put_task; } cpumask_copy(user_mask, in_mask); + ac =3D (struct affinity_context){ + 
.new_mask =3D in_mask, + .flags =3D SCA_USER, + }; =20 - retval =3D __sched_setaffinity(p, in_mask, SCA_USER); + retval =3D __sched_setaffinity(p, &ac); =20 /* * Save in_mask into user_cpus_ptr after a successful @@ -8940,6 +8972,12 @@ void show_state_filter(unsigned int state_filter) */ void __init init_idle(struct task_struct *idle, int cpu) { +#ifdef CONFIG_SMP + struct affinity_context ac =3D (struct affinity_context) { + .new_mask =3D cpumask_of(cpu), + .flags =3D 0, + }; +#endif struct rq *rq =3D cpu_rq(cpu); unsigned long flags; =20 @@ -8964,7 +9002,7 @@ void __init init_idle(struct task_struct *idle, int c= pu) * * And since this is boot we can forgo the serialization. */ - set_cpus_allowed_common(idle, cpumask_of(cpu), 0); + set_cpus_allowed_common(idle, &ac); #endif /* * We're having a chicken and egg problem, even though we are diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 0ab79d819a0d..38fa2c3ef7db 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2486,8 +2486,7 @@ static void task_woken_dl(struct rq *rq, struct task_= struct *p) } =20 static void set_cpus_allowed_dl(struct task_struct *p, - const struct cpumask *new_mask, - u32 flags) + struct affinity_context *ctx) { struct root_domain *src_rd; struct rq *rq; @@ -2502,7 +2501,7 @@ static void set_cpus_allowed_dl(struct task_struct *p, * update. We already made space for us in the destination * domain (see cpuset_can_attach()). 
	 */
-	if (!cpumask_intersects(src_rd->span, new_mask)) {
+	if (!cpumask_intersects(src_rd->span, ctx->new_mask)) {
 		struct dl_bw *src_dl_b;
 
 		src_dl_b = dl_bw_of(cpu_of(rq));
@@ -2516,7 +2515,7 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 		raw_spin_unlock(&src_dl_b->lock);
 	}
 
-	set_cpus_allowed_common(p, new_mask, flags);
+	set_cpus_allowed_common(p, ctx);
 }
 
 /* Assumes rq->lock is held */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 482b702d65ea..1927c02f68fa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2157,6 +2157,11 @@ extern const u32 sched_prio_to_wmult[40];
 
 #define RETRY_TASK		((void *)-1UL)
 
+struct affinity_context {
+	const struct cpumask	*new_mask;
+	unsigned int		flags;
+};
+
 struct sched_class {
 
 #ifdef CONFIG_UCLAMP_TASK
@@ -2185,9 +2190,7 @@ struct sched_class {
 
 	void (*task_woken)(struct rq *this_rq, struct task_struct *task);
 
-	void (*set_cpus_allowed)(struct task_struct *p,
-				 const struct cpumask *newmask,
-				 u32 flags);
+	void (*set_cpus_allowed)(struct task_struct *p, struct affinity_context *ctx);
 
 	void (*rq_online)(struct rq *rq);
 	void (*rq_offline)(struct rq *rq);
@@ -2301,7 +2304,7 @@ extern void update_group_capacity(struct sched_domain *sd, int cpu);
 
 extern void trigger_load_balance(struct rq *rq);
 
-extern void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask, u32 flags);
+extern void set_cpus_allowed_common(struct task_struct *p, struct affinity_context *ctx);
 
 static inline struct task_struct *get_push_task(struct rq *rq)
 {
-- 
2.31.1

From nobody Fri Apr 10 12:57:08 2026
From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
 Valentin Schneider, Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Lai Jiangshan, Waiman Long
Subject: [PATCH v9 5/7] sched: Handle set_cpus_allowed_ptr() & sched_setaffinity() race
Date: Fri, 16 Sep 2022 14:32:15 -0400
Message-Id: <20220916183217.1172225-6-longman@redhat.com>
In-Reply-To: <20220916183217.1172225-1-longman@redhat.com>
References: <20220916183217.1172225-1-longman@redhat.com>

Races are possible between set_cpus_allowed_ptr() and sched_setaffinity(),
and between multiple sched_setaffinity() calls from different CPUs. To
resolve them, update both user_cpus_ptr and cpus_mask in a single critical
section instead of two separate ones. This requires moving the
user_cpus_ptr update into set_cpus_allowed_common() by putting the
user_mask into the affinity_context structure.

This patch also changes the handling of the race between a
sched_setaffinity() call and a concurrent change to the cpumask of the
current cpuset. If the new mask conflicts with the newly updated cpuset,
cpus_mask is reset to the cpuset cpumask and -EINVAL is returned. If a
previous user_cpus_ptr value exists, it is swapped back in and new_mask
is further restricted to what is allowed by the cpumask pointed to by the
old user_cpus_ptr.
Signed-off-by: Waiman Long
---
 kernel/sched/core.c  | 44 +++++++++++++++++++++++++++-----------------
 kernel/sched/sched.h |  1 +
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b662d8ddc169..c748e56ba254 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2546,6 +2546,12 @@ void set_cpus_allowed_common(struct task_struct *p, struct affinity_context *ctx
 
 	cpumask_copy(&p->cpus_mask, ctx->new_mask);
 	p->nr_cpus_allowed = cpumask_weight(ctx->new_mask);
+
+	/*
+	 * Swap in a new user_cpus_ptr if SCA_USER flag set
+	 */
+	if (ctx->flags & SCA_USER)
+		swap(p->user_cpus_ptr, ctx->user_mask);
 }
 
 static void
@@ -8104,7 +8110,7 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
 	retval = dl_task_check_affinity(p, new_mask);
 	if (retval)
 		goto out_free_new_mask;
-again:
+
 	retval = __set_cpus_allowed_ptr(p, ctx);
 	if (retval)
 		goto out_free_new_mask;
@@ -8116,7 +8122,24 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
 	 * Just reset the cpumask to the cpuset's cpus_allowed.
	 */
	cpumask_copy(new_mask, cpus_allowed);
-		goto again;
+
+	/*
+	 * If SCA_USER is set, a 2nd call to __set_cpus_allowed_ptr()
+	 * will restore the previous user_cpus_ptr value.
+	 *
+	 * In the unlikely event a previous user_cpus_ptr exists,
+	 * we need to further restrict the mask to what is allowed
+	 * by that old user_cpus_ptr.
+	 */
+	if (unlikely((ctx->flags & SCA_USER) && ctx->user_mask)) {
+		bool empty = !cpumask_and(new_mask, new_mask,
+					  ctx->user_mask);
+
+		if (WARN_ON_ONCE(empty))
+			cpumask_copy(new_mask, cpus_allowed);
+	}
+	__set_cpus_allowed_ptr(p, ctx);
+	retval = -EINVAL;
 	}
 
 out_free_new_mask:
@@ -8172,25 +8195,12 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	cpumask_copy(user_mask, in_mask);
 	ac = (struct affinity_context){
 		.new_mask	= in_mask,
+		.user_mask	= user_mask,
 		.flags		= SCA_USER,
 	};
 
 	retval = __sched_setaffinity(p, &ac);
-
-	/*
-	 * Save in_mask into user_cpus_ptr after a successful
-	 * __sched_setaffinity() call. pi_lock is used to synchronize
-	 * changes to user_cpus_ptr.
-	 */
-	if (!retval) {
-		unsigned long flags;
-
-		/* Use pi_lock to synchronize changes to user_cpus_ptr */
-		raw_spin_lock_irqsave(&p->pi_lock, flags);
-		swap(p->user_cpus_ptr, user_mask);
-		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
-	}
-	kfree(user_mask);
+	kfree(ac.user_mask);
 
 out_put_task:
 	put_task_struct(p);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1927c02f68fa..110e13b7d78b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2159,6 +2159,7 @@ extern const u32 sched_prio_to_wmult[40];
 
 struct affinity_context {
 	const struct cpumask	*new_mask;
+	struct cpumask		*user_mask;
 	unsigned int		flags;
 };
 
-- 
2.31.1

From nobody Fri Apr 10 12:57:08 2026
From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
 Valentin Schneider, Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Lai Jiangshan, Waiman Long
Subject: [PATCH v9 6/7] sched: Fix sched_setaffinity() and fork/clone() race
Date: Fri, 16 Sep 2022 14:32:16 -0400
Message-Id: <20220916183217.1172225-7-longman@redhat.com>
In-Reply-To: <20220916183217.1172225-1-longman@redhat.com>
References: <20220916183217.1172225-1-longman@redhat.com>

sched_setaffinity() can also race with a concurrent fork/clone() syscall
calling dup_user_cpus_ptr(), which may lead to a use-after-free. Fix that
by protecting the cpumask copy with the pi_lock of the source task.

Signed-off-by: Waiman Long
---
 kernel/sched/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c748e56ba254..ce626cad4105 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2612,6 +2612,8 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
 		      int node)
 {
+	unsigned long flags;
+
 	if (!src->user_cpus_ptr)
 		return 0;
 
@@ -2619,7 +2621,10 @@ int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
 	if (!dst->user_cpus_ptr)
 		return -ENOMEM;
 
+	/* Use pi_lock to protect content of user_cpus_ptr */
+	raw_spin_lock_irqsave(&src->pi_lock, flags);
 	cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+	raw_spin_unlock_irqrestore(&src->pi_lock, flags);
 	return 0;
 }
 
-- 
2.31.1

From nobody Fri Apr 10 12:57:08 2026
From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
 Valentin Schneider, Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Lai Jiangshan, Waiman Long
Subject: [PATCH v9 7/7] sched: Always clear user_cpus_ptr in do_set_cpus_allowed()
Date: Fri, 16 Sep 2022 14:32:17 -0400
Message-Id: <20220916183217.1172225-8-longman@redhat.com>
In-Reply-To: <20220916183217.1172225-1-longman@redhat.com>
References: <20220916183217.1172225-1-longman@redhat.com>

The do_set_cpus_allowed() function is used by either kthread_bind() or
select_fallback_rq(). In both cases the user affinity (if any) should be
destroyed too.

Suggested-by: Peter Zijlstra
Signed-off-by: Waiman Long
---
 kernel/sched/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ce626cad4105..a5240c603667 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2599,14 +2599,20 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
 	set_next_task(rq, p);
 }
 
+/*
+ * Used for kthread_bind() and select_fallback_rq(), in both cases the user
+ * affinity (if any) should be destroyed too.
+ */
 void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
 	struct affinity_context ac = {
 		.new_mask  = new_mask,
-		.flags     = 0,
+		.user_mask = NULL,
+		.flags     = SCA_USER,	/* clear the user requested mask */
 	};
 
 	__do_set_cpus_allowed(p, &ac);
+	kfree(ac.user_mask);
 }
 
 int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
-- 
2.31.1