From: Chuyi Zhou <zhouchuyi@bytedance.com>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, longman@redhat.com, riel@surriel.com
Cc: chengming.zhou@linux.dev, kprateek.nayak@amd.com,
	linux-kernel@vger.kernel.org, Chuyi Zhou <zhouchuyi@bytedance.com>
Subject: [PATCH v3 2/3] sched/fair: Introduce per cpu numa_balance_mask
Date: Mon, 13 Jan 2025 15:30:49 +0800
Message-Id: <20250113073050.2811925-3-zhouchuyi@bytedance.com>
In-Reply-To: <20250113073050.2811925-1-zhouchuyi@bytedance.com>
References: <20250113073050.2811925-1-zhouchuyi@bytedance.com>

This patch introduces a per-CPU cpumask, numa_balance_mask. Similar to
select_rq_mask, it is used as a temporary working mask when searching
for a candidate CPU and updating NUMA stats. This simplifies the later
patch: the mask is filtered against env->p->cpus_ptr once up front, so
we no longer need to repeatedly verify whether each candidate CPU is in
env->p->cpus_ptr during iteration.
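As a concrete illustration of the idiom (not part of the diff below;
example_mask and example_scan() are hypothetical names used only for
this sketch), the per-CPU working-cpumask pattern looks like:

/*
 * Minimal sketch of the per-CPU temporary cpumask idiom; hypothetical
 * names, not part of this patch. The mask is filtered once against
 * p->cpus_ptr, so the scan loop need not test each CPU individually.
 */
static DEFINE_PER_CPU(cpumask_var_t, example_mask);

static void example_scan(struct task_struct *p, int nid)
{
	struct cpumask *cpus;
	int cpu;

	/* No preemption while we borrow this CPU's mask. */
	preempt_disable();
	cpus = this_cpu_cpumask_var_ptr(example_mask);

	/* Restrict the node's CPUs to those the task may run on. */
	cpumask_and(cpus, cpumask_of_node(nid), p->cpus_ptr);

	for_each_cpu(cpu, cpus) {
		/* ... evaluate each allowed CPU on the node ... */
	}

	preempt_enable();
}

The preempt_disable()/preempt_enable() pair matters: without it the
task could migrate to another CPU mid-scan and two tasks could end up
racing on the same CPU's mask.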
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/sched/fair.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f544012b9320..53fd95129b48 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1448,6 +1448,9 @@ unsigned int sysctl_numa_balancing_scan_delay = 1000;
 /* The page with hint page fault latency < threshold in ms is considered hot */
 unsigned int sysctl_numa_balancing_hot_threshold = MSEC_PER_SEC;
 
+/* Working cpumask for task_numa_migrate() */
+static DEFINE_PER_CPU(cpumask_var_t, numa_balance_mask);
+
 struct numa_group {
 	refcount_t refcount;
 
@@ -2047,6 +2050,7 @@ struct numa_stats {
 struct task_numa_env {
 	struct task_struct *p;
 
+	struct cpumask *cpus;
 	int src_cpu, src_nid;
 	int dst_cpu, dst_nid;
 	int imb_numa_nr;
@@ -2121,8 +2125,10 @@ static void update_numa_stats(struct task_numa_env *env,
 	memset(ns, 0, sizeof(*ns));
 	ns->idle_cpu = -1;
 
+	cpumask_copy(env->cpus, cpumask_of_node(nid));
+
 	rcu_read_lock();
-	for_each_cpu(cpu, cpumask_of_node(nid)) {
+	for_each_cpu(cpu, env->cpus) {
 		struct rq *rq = cpu_rq(cpu);
 
 		ns->load += cpu_load(rq);
@@ -2144,7 +2150,7 @@ static void update_numa_stats(struct task_numa_env *env,
 	}
 	rcu_read_unlock();
 
-	ns->weight = cpumask_weight(cpumask_of_node(nid));
+	ns->weight = cpumask_weight(env->cpus);
 
 	ns->node_type = numa_classify(env->imbalance_pct, ns);
 
@@ -2163,11 +2169,9 @@ static void task_numa_assign(struct task_numa_env *env,
 		int start = env->dst_cpu;
 
 		/* Find alternative idle CPU. */
-		for_each_cpu_wrap(cpu, cpumask_of_node(env->dst_nid), start + 1) {
-			if (cpu == env->best_cpu || !idle_cpu(cpu) ||
-			    !cpumask_test_cpu(cpu, env->p->cpus_ptr)) {
+		for_each_cpu_wrap(cpu, env->cpus, start + 1) {
+			if (cpu == env->best_cpu || !idle_cpu(cpu))
 				continue;
-			}
 
 			env->dst_cpu = cpu;
 			rq = cpu_rq(env->dst_cpu);
@@ -2434,6 +2438,8 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 	bool maymove = false;
 	int cpu;
 
+	cpumask_and(env->cpus, cpumask_of_node(env->dst_nid), env->p->cpus_ptr);
+
 	/*
 	 * If dst node has spare capacity, then check if there is an
 	 * imbalance that would be overruled by the load balancer.
@@ -2475,11 +2481,7 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 		maymove = !load_too_imbalanced(src_load, dst_load, env);
 	}
 
-	for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
-		/* Skip this CPU if the source task cannot migrate */
-		if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
-			continue;
-
+	for_each_cpu(cpu, env->cpus) {
 		env->dst_cpu = cpu;
 		if (task_numa_compare(env, taskimp, groupimp, maymove))
 			break;
@@ -2534,6 +2536,12 @@ static void task_numa_migrate(struct task_struct *p)
 		return;
 	}
 
+	/*
+	 * per-cpu numa_balance_mask and rq->rd->span usage
+	 */
+	preempt_disable();
+
+	env.cpus = this_cpu_cpumask_var_ptr(numa_balance_mask);
 	env.dst_nid = p->numa_preferred_nid;
 	dist = env.dist = node_distance(env.src_nid, env.dst_nid);
 	taskweight = task_weight(p, env.src_nid, dist);
@@ -2579,6 +2587,8 @@ static void task_numa_migrate(struct task_struct *p)
 		}
 	}
 
+	preempt_enable();
+
 	/*
 	 * If the task is part of a workload that spans multiple NUMA nodes,
 	 * and is migrating into one of the workload's active nodes, remember
@@ -13638,6 +13648,10 @@ __init void init_sched_fair_class(void)
 		zalloc_cpumask_var_node(&per_cpu(should_we_balance_tmpmask, i),
 					GFP_KERNEL, cpu_to_node(i));
 
+#ifdef CONFIG_NUMA_BALANCING
+		zalloc_cpumask_var_node(&per_cpu(numa_balance_mask, i), GFP_KERNEL, cpu_to_node(i));
+#endif
+
 #ifdef CONFIG_CFS_BANDWIDTH
 		INIT_CSD(&cpu_rq(i)->cfsb_csd, __cfsb_csd_unthrottle, cpu_rq(i));
 		INIT_LIST_HEAD(&cpu_rq(i)->cfsb_csd_list);
-- 
2.20.1