From nobody Fri Sep 19 03:46:44 2025
Date: Tue, 29 Nov 2022 12:10:54 +0100
In-Reply-To: <20221129111055.953833-1-peternewman@google.com>
References: <20221129111055.953833-1-peternewman@google.com>
X-Mailer: git-send-email 2.38.1.584.g0f3c55d4c2-goog
Message-ID: <20221129111055.953833-2-peternewman@google.com>
Subject: [PATCH v4 1/2] x86/resctrl: Update task closid/rmid with task_call_func()
From: Peter Newman
To: reinette.chatre@intel.com, fenghua.yu@intel.com
Cc: bp@alien8.de, derkling@google.com, eranian@google.com, hpa@zytor.com,
 james.morse@arm.com, jannh@google.com, kpsingh@google.com,
 linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de,
 x86@kernel.org, Peter Newman
X-Mailing-List: linux-kernel@vger.kernel.org

When the user moves a running task to a new rdtgroup using the tasks
file interface, the resulting change in CLOSID/RMID must be immediately
propagated to the PQR_ASSOC MSR on the task's CPU.

It is possible for a task to wake up or migrate while it is being moved
to a new group.
If __rdtgroup_move_task() fails to observe that a task has begun running
or misses that it migrated to a new CPU, the task will continue to use
the old CLOSID or RMID until it switches in again.

__rdtgroup_move_task() assumes that if the task migrates off of its CPU
before it can IPI the task, then the task has already observed the
updated CLOSID/RMID. Because this is done locklessly and an x86 CPU can
delay stores until after loads, the following incorrect scenarios are
possible:

 1. __rdtgroup_move_task() stores the new closid and rmid in
    the task structure after it loads task_curr() and task_cpu().
 2. resctrl_sched_in() loads t->{closid,rmid} before the calling context
    switch stores new task_curr() and task_cpu() values.

Use task_call_func() in __rdtgroup_move_task() to serialize updates to
the closid and rmid fields in the task_struct with context switch.

Signed-off-by: Peter Newman
Reviewed-by: James Morse
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 78 ++++++++++++++++----------
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e5a48f05e787..59b7ffcd53bb 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -528,6 +528,31 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 	kfree(rdtgrp);
 }
 
+static int update_locked_task_closid_rmid(struct task_struct *t, void *arg)
+{
+	struct rdtgroup *rdtgrp = arg;
+
+	/*
+	 * Although task_call_func() serializes the writes below with the paired
+	 * reads in resctrl_sched_in(), resctrl_sched_in() still needs
+	 * READ_ONCE() due to rdt_move_group_tasks(), so use WRITE_ONCE() here
+	 * to conform.
+	 */
+	if (rdtgrp->type == RDTCTRL_GROUP) {
+		WRITE_ONCE(t->closid, rdtgrp->closid);
+		WRITE_ONCE(t->rmid, rdtgrp->mon.rmid);
+	} else if (rdtgrp->type == RDTMON_GROUP) {
+		WRITE_ONCE(t->rmid, rdtgrp->mon.rmid);
+	}
+
+	/*
+	 * If the task is current on a CPU, the PQR_ASSOC MSR needs to be
+	 * updated to make the resource group go into effect. If the task is not
+	 * current, the MSR will be updated when the task is scheduled in.
+	 */
+	return task_curr(t);
+}
+
 static void _update_task_closid_rmid(void *task)
 {
 	/*
@@ -538,10 +563,24 @@ static void _update_task_closid_rmid(void *task)
 		resctrl_sched_in();
 }
 
-static void update_task_closid_rmid(struct task_struct *t)
+static void update_task_closid_rmid(struct task_struct *t,
+				    struct rdtgroup *rdtgrp)
 {
-	if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
-		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+	/*
+	 * Serialize the closid and rmid update with context switch. If
+	 * task_call_func() indicates that the task was running during
+	 * update_locked_task_closid_rmid(), then interrupt it.
+	 */
+	if (task_call_func(t, update_locked_task_closid_rmid, rdtgrp) &&
+	    IS_ENABLED(CONFIG_SMP))
+		/*
+		 * If the task has migrated away from the CPU indicated by
+		 * task_cpu() below, then it has already switched in on the
+		 * new CPU using the updated closid and rmid and the call below
+		 * is unnecessary, but harmless.
+		 */
+		smp_call_function_single(task_cpu(t),
+					 _update_task_closid_rmid, t, 1);
 	else
 		_update_task_closid_rmid(t);
 }
@@ -557,39 +596,16 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
 		return 0;
 
 	/*
-	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
-	 * updated by them.
-	 *
-	 * For ctrl_mon groups, move both closid and rmid.
 	 * For monitor groups, can move the tasks only from
 	 * their parent CTRL group.
 	 */
-
-	if (rdtgrp->type == RDTCTRL_GROUP) {
-		WRITE_ONCE(tsk->closid, rdtgrp->closid);
-		WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);
-	} else if (rdtgrp->type == RDTMON_GROUP) {
-		if (rdtgrp->mon.parent->closid == tsk->closid) {
-			WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);
-		} else {
-			rdt_last_cmd_puts("Can't move task to different control group\n");
-			return -EINVAL;
-		}
+	if (rdtgrp->type == RDTMON_GROUP &&
+	    rdtgrp->mon.parent->closid != tsk->closid) {
+		rdt_last_cmd_puts("Can't move task to different control group\n");
+		return -EINVAL;
 	}
 
-	/*
-	 * Ensure the task's closid and rmid are written before determining if
-	 * the task is current that will decide if it will be interrupted.
-	 */
-	barrier();
-
-	/*
-	 * By now, the task's closid and rmid are set. If the task is current
-	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
-	 * group go into effect. If the task is not current, the MSR will be
-	 * updated when the task is scheduled in.
-	 */
-	update_task_closid_rmid(tsk);
+	update_task_closid_rmid(tsk, rdtgrp);
 
 	return 0;
 }
-- 
2.38.1.584.g0f3c55d4c2-goog

From nobody Fri Sep 19 03:46:44 2025
Date: Tue, 29 Nov 2022 12:10:55 +0100
In-Reply-To: <20221129111055.953833-1-peternewman@google.com>
References: <20221129111055.953833-1-peternewman@google.com>
X-Mailer: git-send-email 2.38.1.584.g0f3c55d4c2-goog
Message-ID: <20221129111055.953833-3-peternewman@google.com>
Subject: [PATCH v4 2/2] x86/resctrl: IPI all online CPUs for group updates
From: Peter Newman
To: reinette.chatre@intel.com, fenghua.yu@intel.com
Cc: bp@alien8.de, derkling@google.com, eranian@google.com, hpa@zytor.com,
 james.morse@arm.com, jannh@google.com, kpsingh@google.com,
 linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de,
 x86@kernel.org, Peter Newman
X-Mailing-List: linux-kernel@vger.kernel.org

Removing a CTRL_MON or MON group directory moves all tasks to the parent
group. The rmdir implementation therefore interrupts any running tasks
which were in the deleted group to update their CLOSID/RMID to those of
the parent.

The rmdir operation iterates over all tasks in the deleted group while
read-locking the tasklist_lock to ensure that no newly-created child
tasks remain in the deleted group. Calling task_call_func() to perform
the updates on every task in the deleted group, similar to the recent
fix in __rdtgroup_move_task(), would result in a much longer
tasklist_lock critical section.

To avoid this, stop attempting to construct a precise mask of CPUs
hosting the moved tasks in rdt_move_group_tasks(). Its callers instead
perform the PQR_ASSOC MSR update on all online CPUs to ensure all
affected tasks are notified.

To measure the impact of the rdt_move_group_tasks() implementation
options, the following commands were run in an rdtgroup to produce a
1600-task workload:

 # mkdir /sys/fs/resctrl/test
 # echo $$ > /sys/fs/resctrl/test/tasks
 # perf bench sched messaging -g 40 -l 100000

Results collected using:

 # perf stat rmdir /sys/fs/resctrl/test

CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads)

Calling task_call_func() on all tasks in the deleted group increased
task-clock time from 1.54 to 2.35 ms, while the IPI broadcast reduced the
time to 1.31 ms.

Restructuring resctrl groups is assumed to be a rare act of system-level
reconfiguration by the user, so the impact of additional IPIs resulting
from this change to a CPU-isolated workload is not a concern.

Signed-off-by: Peter Newman
Reviewed-by: James Morse
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 +++++++-------------------
 1 file changed, 13 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 59b7ffcd53bb..4a3c0b315484 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2401,12 +2401,10 @@ static int reset_all_ctrls(struct rdt_resource *r)
  * Move tasks from one to the other group. If @from is NULL, then all tasks
  * in the systems are moved unconditionally (used for teardown).
  *
- * If @mask is not NULL the cpus on which moved tasks are running are set
- * in that mask so the update smp function call is restricted to affected
- * cpus.
+ * Following this operation, the caller should update PQR_ASSOC MSR and per-CPU
+ * storage on all online CPUs.
  */
-static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
-				 struct cpumask *mask)
+static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to)
 {
 	struct task_struct *p, *t;
 
@@ -2416,16 +2414,6 @@ static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
 		    is_rmid_match(t, from)) {
 			WRITE_ONCE(t->closid, to->closid);
 			WRITE_ONCE(t->rmid, to->mon.rmid);
-
-			/*
-			 * If the task is on a CPU, set the CPU in the mask.
-			 * The detection is inaccurate as tasks might move or
-			 * schedule before the smp function call takes place.
-			 * In such a case the function call is pointless, but
-			 * there is no other side effect.
-			 */
-			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
-				cpumask_set_cpu(task_cpu(t), mask);
 		}
 	}
 	read_unlock(&tasklist_lock);
@@ -2456,7 +2444,7 @@ static void rmdir_all_sub(void)
 	struct rdtgroup *rdtgrp, *tmp;
 
 	/* Move all tasks to the default resource group */
-	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
+	rdt_move_group_tasks(NULL, &rdtgroup_default);
 
 	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
 		/* Free any child rmids */
@@ -3115,23 +3103,19 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	return -EPERM;
 }
 
-static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
+static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp)
 {
 	struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
 	int cpu;
 
 	/* Give any tasks back to the parent group */
-	rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask);
+	rdt_move_group_tasks(rdtgrp, prdtgrp);
 
 	/* Update per cpu rmid of the moved CPUs first */
 	for_each_cpu(cpu, &rdtgrp->cpu_mask)
 		per_cpu(pqr_state.default_rmid, cpu) = prdtgrp->mon.rmid;
-	/*
-	 * Update the MSR on moved CPUs and CPUs which have moved
-	 * task running on them.
-	 */
-	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
-	update_closid_rmid(tmpmask, NULL);
+
+	update_closid_rmid(cpu_online_mask, NULL);
 
 	rdtgrp->flags = RDT_DELETED;
 	free_rmid(rdtgrp->mon.rmid);
@@ -3156,12 +3140,12 @@ static int rdtgroup_ctrl_remove(struct rdtgroup *rdtgrp)
 	return 0;
 }
 
-static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
+static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp)
 {
 	int cpu;
 
 	/* Give any tasks back to the default group */
-	rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask);
+	rdt_move_group_tasks(rdtgrp, &rdtgroup_default);
 
 	/* Give any CPUs back to the default group */
 	cpumask_or(&rdtgroup_default.cpu_mask,
@@ -3173,12 +3157,7 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 		per_cpu(pqr_state.default_rmid, cpu) = rdtgroup_default.mon.rmid;
 	}
 
-	/*
-	 * Update the MSR on moved CPUs and CPUs which have moved
-	 * task running on them.
-	 */
-	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
-	update_closid_rmid(tmpmask, NULL);
+	update_closid_rmid(cpu_online_mask, NULL);
 
 	closid_free(rdtgrp->closid);
 	free_rmid(rdtgrp->mon.rmid);
@@ -3197,12 +3176,8 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 {
 	struct kernfs_node *parent_kn = kn->parent;
 	struct rdtgroup *rdtgrp;
-	cpumask_var_t tmpmask;
 	int ret = 0;
 
-	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
-		return -ENOMEM;
-
 	rdtgrp = rdtgroup_kn_lock_live(kn);
 	if (!rdtgrp) {
 		ret = -EPERM;
@@ -3222,18 +3197,17 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
 			ret = rdtgroup_ctrl_remove(rdtgrp);
 		} else {
-			ret = rdtgroup_rmdir_ctrl(rdtgrp, tmpmask);
+			ret = rdtgroup_rmdir_ctrl(rdtgrp);
 		}
 	} else if (rdtgrp->type == RDTMON_GROUP &&
 		 is_mon_groups(parent_kn, kn->name)) {
-		ret = rdtgroup_rmdir_mon(rdtgrp, tmpmask);
+		ret = rdtgroup_rmdir_mon(rdtgrp);
 	} else {
 		ret = -EPERM;
 	}
 
 out:
 	rdtgroup_kn_unlock(kn);
-	free_cpumask_var(tmpmask);
 	return ret;
 }
 
-- 
2.38.1.584.g0f3c55d4c2-goog