From: Srikar Dronamraju
To: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Peter Zijlstra
Cc: Ben Segall, Christophe Leroy, Dietmar Eggemann, Ingo Molnar, Juri Lelli, K Prateek Nayak, Madhavan Srinivasan, Mel Gorman, Michael Ellerman, Nicholas Piggin, Shrikanth Hegde, Srikar Dronamraju, Steven Rostedt, Swapnil Sapkal, Thomas Huth, Valentin Schneider, Vincent Guittot, virtualization@lists.linux.dev, Yicong Yang, Ilya Leoshkevich
Subject: [PATCH 12/17] pseries/smp: Trigger softoffline based on steal metrics
Date: Thu, 4 Dec 2025 23:24:00 +0530
Message-ID: <20251204175405.1511340-13-srikar@linux.ibm.com>
In-Reply-To: <20251204175405.1511340-1-srikar@linux.ibm.com>

Based on the steal metrics, update the number of CPUs that need to be soft-onlined or soft-offlined.

If the LPAR continues to see steal above the higher threshold, continue to offline more CPUs. More CPUs of the remaining active cores are then used, and the LPAR should see less vCPU preemption; in the next interval, the steal metrics should continue to drop. If the LPAR continues to see steal below the lower threshold, continue to online more cores. To avoid ping-pong behaviour, a core is onlined or offlined only if the steal trend persists for at least 2 consecutive intervals.

In a PowerVM environment, the hypervisor schedules at core granularity. Hence it is preferable to soft online/offline an entire core. Onlining or offlining only a few CPUs of a core would neither reduce steal nor use the resources efficiently.

A shared LPAR in a PowerVM environment has its cores interleaved across multiple NUMA nodes. Hence choosing the last active core to offline and the first inactive core to online is likely to keep the active cores balanced across NUMA nodes. A more intelligent approach to selecting cores to online/offline may be needed in the future.
Signed-off-by: Srikar Dronamraju
---
 arch/powerpc/platforms/pseries/lpar.c    |  3 --
 arch/powerpc/platforms/pseries/pseries.h |  3 ++
 arch/powerpc/platforms/pseries/smp.c     | 57 ++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index f8e049ac9364..f5caf1137707 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -662,9 +662,6 @@ machine_device_initcall(pseries, vcpudispatch_stats_procfs_init);
 #define STEAL_MULTIPLE (STEAL_RATIO * STEAL_RATIO)
 #define PURR_UPDATE_TB tb_ticks_per_sec
 
-static void trigger_softoffline(unsigned long steal_ratio)
-{
-}
 
 static bool should_cpu_process_steal(int cpu)
 {
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 68cf25152870..2527c2049e74 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -119,6 +119,9 @@ int dlpar_workqueue_init(void);
 
 extern u32 pseries_security_flavor;
 void pseries_setup_security_mitigations(void);
+#ifdef CONFIG_PPC_SPLPAR
+void trigger_softoffline(unsigned long steal_ratio);
+#endif
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
 void pseries_lpar_read_hblkrm_characteristics(void);
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index ec1af13670f2..4c83749018d0 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -51,6 +51,9 @@
  * interface by prom_hold_cpus and is spinning on secondary_hold_spinloop.
  */
 static cpumask_var_t of_spin_mask;
+#ifdef CONFIG_PPC_SPLPAR
+static cpumask_var_t cpus;
+#endif
 
 /* Query where a cpu is now.  Return codes #defined in plpar_wrappers.h */
 int smp_query_cpu_stopped(unsigned int pcpu)
@@ -277,6 +280,14 @@ static __init void pSeries_smp_probe(void)
 }
 
 #ifdef CONFIG_PPC_SPLPAR
+/*
+ * Set higher threshold values to which steal has to be limited. Also set
+ * lower threshold values below which allow work to spread out to more
+ * cores.
+ */
+#define STEAL_RATIO_HIGH (10 * STEAL_RATIO)
+#define STEAL_RATIO_LOW (5 * STEAL_RATIO)
+
 static unsigned int max_virtual_cores __read_mostly;
 static unsigned int entitled_cores __read_mostly;
 static unsigned int available_cores;
@@ -311,6 +322,49 @@ static unsigned int pseries_num_available_cores(void)
 
 	return available_cores;
 }
+
+void trigger_softoffline(unsigned long steal_ratio)
+{
+	int currcpu = smp_processor_id();
+	static int prev_direction;
+	int cpu, i;
+
+	if (steal_ratio >= STEAL_RATIO_HIGH && prev_direction > 0) {
+		/*
+		 * System entitlement was reduced earlier but we continue to
+		 * see steal time. Reduce entitlement further.
+		 */
+		cpu = cpumask_last(cpu_active_mask);
+		for_each_cpu_andnot(i, cpu_sibling_mask(cpu), cpu_sibling_mask(currcpu)) {
+			struct offline_worker *worker = &per_cpu(offline_workers, i);
+
+			worker->offline = 1;
+			schedule_work_on(i, &worker->work);
+		}
+	} else if (steal_ratio <= STEAL_RATIO_LOW && prev_direction < 0) {
+		/*
+		 * System entitlement was increased but we continue to see
+		 * less steal time. Increase entitlement further.
+		 */
+		cpumask_andnot(cpus, cpu_online_mask, cpu_active_mask);
+		if (cpumask_empty(cpus))
+			return;
+
+		cpu = cpumask_first(cpus);
+		for_each_cpu_andnot(i, cpu_sibling_mask(cpu), cpu_sibling_mask(currcpu)) {
+			struct offline_worker *worker = &per_cpu(offline_workers, i);
+
+			worker->offline = 0;
+			schedule_work_on(i, &worker->work);
+		}
+	}
+	if (steal_ratio >= STEAL_RATIO_HIGH)
+		prev_direction = 1;
+	else if (steal_ratio <= STEAL_RATIO_LOW)
+		prev_direction = -1;
+	else
+		prev_direction = 0;
+}
 #endif
 
 static struct smp_ops_t pseries_smp_ops = {
@@ -336,6 +390,9 @@ void __init smp_init_pseries(void)
 	smp_ops = &pseries_smp_ops;
 
 	alloc_bootmem_cpumask_var(&of_spin_mask);
+#ifdef CONFIG_PPC_SPLPAR
+	alloc_bootmem_cpumask_var(&cpus);
+#endif
 
 	/*
 	 * Mark threads which are still spinning in hold loops
-- 
2.43.7