From nobody Tue Dec 23 16:22:39 2025
From: Chuyi Zhou <zhouchuyi@bytedance.com>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, longman@redhat.com, riel@surriel.com
Cc: chengming.zhou@linux.dev, kprateek.nayak@amd.com,
	linux-kernel@vger.kernel.org, Chuyi Zhou <zhouchuyi@bytedance.com>
Subject: [PATCH v3 3/3] sched/fair: Take sched_domain into account in task_numa_migrate
Date: Mon, 13 Jan 2025 15:30:50 +0800
Message-Id: <20250113073050.2811925-4-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20250113073050.2811925-1-zhouchuyi@bytedance.com>
References: <20250113073050.2811925-1-zhouchuyi@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When we attempt to migrate a task in task_numa_migrate(), we need to
take the scheduling domain into account. Specifically:

When searching for the best_cpu, skip CPUs that are not in the current
scheduling domain, such as isolated CPUs. Currently we only look for
suitable CPUs in p->cpus_ptr, which is not sufficient: cpuset-configured
partitions are always reflected in each member task's cpumask, but with
the isolcpus= kernel command line option the isolated CPUs are simply
omitted from the sched_domains, without any further restriction on the
tasks' cpumasks. If a task's cpumask includes isolated CPUs, the task
may end up migrated to an isolated CPU.

In update_numa_stats(), likewise skip CPUs that are not in the
scheduling domain. update_numa_stats() is meant to stay consistent with
standard load balancing, so CPUs that do not participate in load
balancing, such as isolated CPUs, should be skipped there as well.

Fix this by taking src_rq->rd->span into account in task_numa_migrate().
Note that src_cpu itself may sit in an isolated domain, in which case
its rd may point to def_root_domain and the span may not be what we
expect. In that case sd_numa is NULL, so the existing NULL check bails
out early.
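For illustration only (this snippet is not part of the patch; the masks
below are hypothetical stand-ins for cpumask_of_node(), p->cpus_ptr and
rd->span), the effect of the added filtering can be sketched with plain
bitmasks in userspace C:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
  	/* Hypothetical 8-CPU machine, CPUs 6-7 isolated via isolcpus= */
  	uint8_t node_cpus    = 0xF0; /* CPUs 4-7 sit on the destination node */
  	uint8_t task_allowed = 0xFF; /* p->cpus_ptr still contains CPUs 6-7  */
  	uint8_t rd_span      = 0x3F; /* root-domain span omits CPUs 6-7      */

  	/* Before: candidates = node mask & task mask, may pick CPU 6 or 7 */
  	uint8_t before = node_cpus & task_allowed;

  	/* After: also AND with rd->span, which drops the isolated CPUs */
  	uint8_t after = before & rd_span;

  	printf("candidates before: 0x%02x, after: 0x%02x\n", before, after);
  	return 0;
  }
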
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/sched/fair.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 53fd95129b48..764797dd3744 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2120,12 +2120,13 @@ static void update_numa_stats(struct task_numa_env *env,
 			      struct numa_stats *ns, int nid,
 			      bool find_idle)
 {
+	cpumask_t *span = cpu_rq(env->src_cpu)->rd->span;
 	int cpu, idle_core = -1;
 
 	memset(ns, 0, sizeof(*ns));
 	ns->idle_cpu = -1;
 
-	cpumask_copy(env->cpus, cpumask_of_node(nid));
+	cpumask_and(env->cpus, span, cpumask_of_node(nid));
 
 	rcu_read_lock();
 	for_each_cpu(cpu, env->cpus) {
@@ -2435,10 +2436,12 @@ static bool task_numa_compare(struct task_numa_env *env,
 static void task_numa_find_cpu(struct task_numa_env *env,
 				long taskimp, long groupimp)
 {
+	cpumask_t *span = cpu_rq(env->src_cpu)->rd->span;
 	bool maymove = false;
 	int cpu;
 
 	cpumask_and(env->cpus, cpumask_of_node(env->dst_nid), env->p->cpus_ptr);
+	cpumask_and(env->cpus, env->cpus, span);
 
 	/*
 	 * If dst node has spare capacity, then check if there is an
@@ -2503,10 +2506,10 @@ static void task_numa_migrate(struct task_struct *p)
 		.best_cpu = -1,
 	};
 	unsigned long taskweight, groupweight;
+	struct rq *best_rq, *src_rq;
 	struct sched_domain *sd;
 	long taskimp, groupimp;
 	struct numa_group *ng;
-	struct rq *best_rq;
 	int nid, ret, dist;
 
 	/*
@@ -2530,6 +2533,9 @@ static void task_numa_migrate(struct task_struct *p)
 	 * balance domains, some of which do not cross NUMA boundaries.
 	 * Tasks that are "trapped" in such domains cannot be migrated
 	 * elsewhere, so there is no point in (re)trying.
+	 *
+	 * Another situation is that src_cpu is in an isolated domain;
+	 * if so, bail out early.
 	 */
 	if (unlikely(!sd)) {
 		sched_setnuma(p, task_node(p));
@@ -2541,6 +2547,7 @@ static void task_numa_migrate(struct task_struct *p)
 	 */
 	preempt_disable();
 
+	src_rq = cpu_rq(env.src_cpu);
 	env.cpus = this_cpu_cpumask_var_ptr(numa_balance_mask);
 	env.dst_nid = p->numa_preferred_nid;
 	dist = env.dist = node_distance(env.src_nid, env.dst_nid);
@@ -2567,6 +2574,10 @@ static void task_numa_migrate(struct task_struct *p)
 		if (nid == env.src_nid || nid == p->numa_preferred_nid)
 			continue;
 
+		if (unlikely(!cpumask_intersects(src_rq->rd->span,
+						 cpumask_of_node(nid))))
+			continue;
+
 		dist = node_distance(env.src_nid, env.dst_nid);
 		if (sched_numa_topology_type == NUMA_BACKPLANE &&
 		    dist != env.dist) {
-- 
2.20.1