From nobody Sun Feb 8 05:09:05 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, Shuah Khan, x86@kernel.org
Cc: Shaopeng Tan, James Morse, Jamie Iles, Babu Moger, Randy Dunlap, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v4 1/7] x86/resctrl: Create separate domains for control and monitoring
Date: Sat, 22 Jul 2023 12:07:34 -0700
Message-Id: <20230722190740.326190-2-tony.luck@intel.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230722190740.326190-1-tony.luck@intel.com>
References: <20230713163207.219710-1-tony.luck@intel.com> <20230722190740.326190-1-tony.luck@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

First step towards supporting resource control where the scope of
control operations is not the same as monitor operations.

Add an extra list in the rdt_resource structure. For now this will
just duplicate the existing list of domains based on the L3 cache
scope.

Refactor the domain_add_cpu() and domain_remove_cpu() functions to
build separate lists for r->alloc_capable and r->mon_capable
resources. Note that only the "L3" resource currently supports
both types.

Change all places where monitoring functions walk the list of
domains to use the new "mondomains" list instead of the old
"domains" list.
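
For illustration only (this is not part of the patch): a minimal
user-space sketch of why the lookup takes a list head instead of a
resource, so that one function can serve both the "domains" and the
"mondomains" list. All names here (struct list_node, struct domain,
struct resource, find_domain(), add_domain()) are hypothetical
stand-ins, not the kernel list API or the resctrl structures.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct list_node {
	struct list_node *next, *prev;
};

struct domain {
	struct list_node list;	/* must stay the first member in this sketch */
	int id;
};

struct resource {
	struct list_node domains;	/* control domains */
	struct list_node mondomains;	/* monitor domains */
};

static void list_init(struct list_node *h)
{
	h->next = h->prev = h;
}

/* Insert n immediately before pos (this appends when pos is the head). */
static void list_add_before(struct list_node *n, struct list_node *pos)
{
	n->prev = pos->prev;
	n->next = pos;
	pos->prev->next = n;
	pos->prev = n;
}

/*
 * Search the id-sorted list h. Return the matching domain, or NULL
 * with *pos set to the node that a new entry must be inserted before.
 */
static struct domain *find_domain(struct list_node *h, int id,
				  struct list_node **pos)
{
	struct list_node *l;

	for (l = h->next; l != h; l = l->next) {
		/* Valid only because list is the first member of struct domain. */
		struct domain *d = (struct domain *)l;

		if (d->id == id)
			return d;
		if (d->id > id)
			break;
	}
	if (pos)
		*pos = l;
	return NULL;
}

/* Add a domain to whichever list the caller passes, keeping id order. */
static int add_domain(struct list_node *h, struct domain *d)
{
	struct list_node *pos;

	if (find_domain(h, d->id, &pos))
		return -1;	/* an entry with this id already exists */
	list_add_before(&d->list, pos);
	return 0;
}
```

A caller would pass &r.domains from the control path and
&r.mondomains from the monitor path; the lookup itself does not need
to know which kind of list it is walking.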

Signed-off-by: Tony Luck
---
 include/linux/resctrl.h                   |  10 +-
 arch/x86/kernel/cpu/resctrl/internal.h    |   2 +-
 arch/x86/kernel/cpu/resctrl/core.c        | 195 +++++++++++++++-------
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |   2 +-
 arch/x86/kernel/cpu/resctrl/monitor.c     |   2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  30 ++--
 6 files changed, 167 insertions(+), 74 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8334eeacfec5..1267d56f9e76 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -151,9 +151,11 @@ struct resctrl_schema;
  * @mon_capable:	Is monitor feature available on this machine
  * @num_rmid:		Number of RMIDs available
  * @cache_level:	Which cache level defines scope of this resource
+ * @mon_scope:		Scope of this resource if different from cache_level
  * @cache:		Cache allocation related data
  * @membw:		If the component has bandwidth controls, their properties.
  * @domains:		All domains for this resource
+ * @mondomains:		Monitor domains for this resource
  * @name:		Name to use in "schemata" file.
  * @data_width:		Character width of data when displaying
  * @default_ctrl:	Specifies default cache cbm or memory B/W percent.
@@ -169,9 +171,11 @@ struct rdt_resource {
 	bool			mon_capable;
 	int			num_rmid;
 	int			cache_level;
+	int			mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
 	struct list_head	domains;
+	struct list_head	mondomains;
 	char			*name;
 	int			data_width;
 	u32			default_ctrl;
@@ -217,8 +221,10 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
 
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
 
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 85ceaf9a31ac..c5e2ac2a60cf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -511,7 +511,7 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn);
 int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
 int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
 			     umode_t mask);
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
+struct rdt_domain *rdt_find_domain(struct list_head *h, int id,
 				   struct list_head **pos);
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 030d3b409768..274605aaa026 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -57,7 +57,7 @@ static void
 mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
 	      struct rdt_resource *r);
 
-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)
+#define domain_init(id, field) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.field)
 
 struct rdt_hw_resource rdt_resources_all[] = {
 	[RDT_RESOURCE_L3] =
@@ -66,7 +66,9 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid			= RDT_RESOURCE_L3,
 			.name			= "L3",
 			.cache_level		= 3,
-			.domains		= domain_init(RDT_RESOURCE_L3),
+			.mon_scope		= 3,
+			.domains		= domain_init(RDT_RESOURCE_L3, domains),
+			.mondomains		= domain_init(RDT_RESOURCE_L3, mondomains),
 			.parse_ctrlval		= parse_cbm,
 			.format_str		= "%d=%0*x",
 			.fflags			= RFTYPE_RES_CACHE,
@@ -80,7 +82,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid			= RDT_RESOURCE_L2,
 			.name			= "L2",
 			.cache_level		= 2,
-			.domains		= domain_init(RDT_RESOURCE_L2),
+			.domains		= domain_init(RDT_RESOURCE_L2, domains),
 			.parse_ctrlval		= parse_cbm,
 			.format_str		= "%d=%0*x",
 			.fflags			= RFTYPE_RES_CACHE,
@@ -94,7 +96,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid			= RDT_RESOURCE_MBA,
 			.name			= "MB",
 			.cache_level		= 3,
-			.domains		= domain_init(RDT_RESOURCE_MBA),
+			.domains		= domain_init(RDT_RESOURCE_MBA, domains),
 			.parse_ctrlval		= parse_bw,
 			.format_str		= "%d=%*u",
 			.fflags			= RFTYPE_RES_MB,
@@ -106,7 +108,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid			= RDT_RESOURCE_SMBA,
 			.name			= "SMBA",
 			.cache_level		= 3,
-			.domains		= domain_init(RDT_RESOURCE_SMBA),
+			.domains		= domain_init(RDT_RESOURCE_SMBA, domains),
 			.parse_ctrlval		= parse_bw,
 			.format_str		= "%d=%*u",
 			.fflags			= RFTYPE_RES_MB,
@@ -384,14 +386,15 @@ void rdt_ctrl_update(void *arg)
 }
 
 /*
- * rdt_find_domain - Find a domain in a resource that matches input resource id
+ * rdt_find_domain - Find a domain in one of the lists for a resource that
+ *		     matches input resource id
  *
  * Search resource r's domain list to find the resource id. If the resource
  * id is found in a domain, return the domain. Otherwise, if requested by
  * caller, return the first domain whose id is bigger than the input id.
  * The domain list is sorted by id in ascending order.
  */
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
+struct rdt_domain *rdt_find_domain(struct list_head *h, int id,
 				   struct list_head **pos)
 {
 	struct rdt_domain *d;
@@ -400,7 +403,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
 	if (id < 0)
 		return ERR_PTR(-ENODEV);
 
-	list_for_each(l, &r->domains) {
+	list_for_each(l, h) {
 		d = list_entry(l, struct rdt_domain, list);
 		/* When id is found, return its domain. */
 		if (id == d->id)
@@ -487,6 +490,94 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
 	return 0;
 }
 
+static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
+{
+	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+	struct list_head *add_pos = NULL;
+	struct rdt_hw_domain *hw_dom;
+	struct rdt_domain *d;
+	int err;
+
+	d = rdt_find_domain(&r->domains, id, &add_pos);
+	if (IS_ERR(d)) {
+		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+		return;
+	}
+
+	if (d) {
+		cpumask_set_cpu(cpu, &d->cpu_mask);
+		if (r->cache.arch_has_per_cpu_cfg)
+			rdt_domain_reconfigure_cdp(r);
+		return;
+	}
+
+	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+	if (!hw_dom)
+		return;
+
+	d = &hw_dom->d_resctrl;
+	d->id = id;
+	cpumask_set_cpu(cpu, &d->cpu_mask);
+
+	rdt_domain_reconfigure_cdp(r);
+
+	if (domain_setup_ctrlval(r, d)) {
+		domain_free(hw_dom);
+		return;
+	}
+
+	list_add_tail(&d->list, add_pos);
+
+	err = resctrl_online_ctrl_domain(r, d);
+	if (err) {
+		list_del(&d->list);
+		domain_free(hw_dom);
+	}
+}
+
+static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+{
+	int id = get_cpu_cacheinfo_id(cpu, r->mon_scope);
+	struct list_head *add_pos = NULL;
+	struct rdt_hw_domain *hw_dom;
+	struct rdt_domain *d;
+	int err;
+
+	d = rdt_find_domain(&r->mondomains, id, &add_pos);
+	if (IS_ERR(d)) {
+		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+		return;
+	}
+
+	if (d) {
+		cpumask_set_cpu(cpu, &d->cpu_mask);
+		if (r->cache.arch_has_per_cpu_cfg)
+			rdt_domain_reconfigure_cdp(r);
+		return;
+	}
+
+	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+	if (!hw_dom)
+		return;
+
+	d = &hw_dom->d_resctrl;
+	d->id = id;
+	cpumask_set_cpu(cpu, &d->cpu_mask);
+
+	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+		domain_free(hw_dom);
+		return;
+	}
+
+	list_add_tail(&d->list, add_pos);
+
+	err = resctrl_online_mon_domain(r, d);
+	if (err) {
+		list_del(&d->list);
+		domain_free(hw_dom);
+	}
+}
+
 /*
  * domain_add_cpu - Add a cpu to a resource's domain list.
  *
@@ -502,61 +593,19 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
  */
 static void domain_add_cpu(int cpu, struct rdt_resource *r)
 {
-	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
-	struct list_head *add_pos = NULL;
-	struct rdt_hw_domain *hw_dom;
-	struct rdt_domain *d;
-	int err;
-
-	d = rdt_find_domain(r, id, &add_pos);
-	if (IS_ERR(d)) {
-		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
-		return;
-	}
-
-	if (d) {
-		cpumask_set_cpu(cpu, &d->cpu_mask);
-		if (r->cache.arch_has_per_cpu_cfg)
-			rdt_domain_reconfigure_cdp(r);
-		return;
-	}
-
-	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
-	if (!hw_dom)
-		return;
-
-	d = &hw_dom->d_resctrl;
-	d->id = id;
-	cpumask_set_cpu(cpu, &d->cpu_mask);
-
-	rdt_domain_reconfigure_cdp(r);
-
-	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
-		domain_free(hw_dom);
-		return;
-	}
-
-	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
-		domain_free(hw_dom);
-		return;
-	}
-
-	list_add_tail(&d->list, add_pos);
-
-	err = resctrl_online_domain(r, d);
-	if (err) {
-		list_del(&d->list);
-		domain_free(hw_dom);
-	}
+	if (r->alloc_capable)
+		domain_add_cpu_ctrl(cpu, r);
+	if (r->mon_capable)
+		domain_add_cpu_mon(cpu, r);
 }
 
-static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 {
 	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
 	struct rdt_hw_domain *hw_dom;
 	struct rdt_domain *d;
 
-	d = rdt_find_domain(r, id, NULL);
+	d = rdt_find_domain(&r->domains, id, NULL);
 	if (IS_ERR_OR_NULL(d)) {
 		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
 		return;
@@ -565,7 +614,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 
 	cpumask_clear_cpu(cpu, &d->cpu_mask);
 	if (cpumask_empty(&d->cpu_mask)) {
-		resctrl_offline_domain(r, d);
+		resctrl_offline_ctrl_domain(r, d);
 		list_del(&d->list);
 
 		/*
@@ -578,6 +627,30 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 
 		return;
 	}
+}
+
+static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
+{
+	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+	struct rdt_hw_domain *hw_dom;
+	struct rdt_domain *d;
+
+	d = rdt_find_domain(&r->mondomains, id, NULL);
+	if (IS_ERR_OR_NULL(d)) {
+		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+		return;
+	}
+	hw_dom = resctrl_to_arch_dom(d);
+
+	cpumask_clear_cpu(cpu, &d->cpu_mask);
+	if (cpumask_empty(&d->cpu_mask)) {
+		resctrl_offline_mon_domain(r, d);
+		list_del(&d->list);
+
+		domain_free(hw_dom);
+
+		return;
+	}
 
 	if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
 		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
@@ -592,6 +665,14 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	}
 }
 
+static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+{
+	if (r->alloc_capable)
+		domain_remove_cpu_ctrl(cpu, r);
+	if (r->mon_capable)
+		domain_remove_cpu_mon(cpu, r);
+}
+
 static void clear_closid_rmid(int cpu)
 {
 	struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index b44c487727d4..839df83d1a0a 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -560,7 +560,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	evtid = md.u.evtid;
 
 	r = &rdt_resources_all[resid].r_resctrl;
-	d = rdt_find_domain(r, domid, NULL);
+	d = rdt_find_domain(&r->mondomains, domid, NULL);
 	if (IS_ERR_OR_NULL(d)) {
 		ret = -ENOENT;
 		goto out;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ded1fc7cb7cb..66beca785535 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -340,7 +340,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 
 	entry->busy = 0;
 	cpu = get_cpu();
-	list_for_each_entry(d, &r->domains, list) {
+	list_for_each_entry(d, &r->mondomains, list) {
 		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
 			err = resctrl_arch_rmid_read(r, d, entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 725344048f85..27753eb5d513 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1496,7 +1496,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
 
 	mutex_lock(&rdtgroup_mutex);
 
-	list_for_each_entry(dom, &r->domains, list) {
+	list_for_each_entry(dom, &r->mondomains, list) {
 		if (sep)
 			seq_puts(s, ";");
 
@@ -1619,7 +1619,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
 		return -EINVAL;
 	}
 
-	list_for_each_entry(d, &r->domains, list) {
+	list_for_each_entry(d, &r->mondomains, list) {
 		if (d->id == dom_id) {
 			ret = mbm_config_write_domain(r, d, evtid, val);
 			if (ret)
@@ -2525,7 +2525,7 @@ static int rdt_get_tree(struct fs_context *fc)
 
 	if (is_mbm_enabled()) {
 		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-		list_for_each_entry(dom, &r->domains, list)
+		list_for_each_entry(dom, &r->mondomains, list)
 			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
 	}
 
@@ -2919,7 +2919,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
 	struct rdt_domain *dom;
 	int ret;
 
-	list_for_each_entry(dom, &r->domains, list) {
+	list_for_each_entry(dom, &r->mondomains, list) {
 		ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
 		if (ret)
 			return ret;
@@ -3708,15 +3708,17 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
 	kfree(d->mbm_local);
 }
 
-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
 {
 	lockdep_assert_held(&rdtgroup_mutex);
 
 	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
 		mba_sc_domain_destroy(r, d);
+}
 
-	if (!r->mon_capable)
-		return;
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
 
 	/*
 	 * If resctrl is mounted, remove all the
@@ -3773,18 +3775,22 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
 {
-	int err;
-
 	lockdep_assert_held(&rdtgroup_mutex);
 
 	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
 		/* RDT_RESOURCE_MBA is never mon_capable */
 		return mba_sc_domain_allocate(r, d);
 
-	if (!r->mon_capable)
-		return 0;
+	return 0;
+}
+
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+	int err;
+
+	lockdep_assert_held(&rdtgroup_mutex);
 
 	err = domain_setup_mon_state(r, d);
 	if (err)
-- 
2.40.1

From nobody Sun Feb 8 05:09:05 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, Shuah Khan, x86@kernel.org
Cc: Shaopeng Tan, James Morse, Jamie Iles, Babu Moger, Randy Dunlap, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v4 2/7] x86/resctrl: Split the rdt_domain structures
Date: Sat, 22 Jul 2023 12:07:35 -0700
Message-Id: <20230722190740.326190-3-tony.luck@intel.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230722190740.326190-1-tony.luck@intel.com>
References: <20230713163207.219710-1-tony.luck@intel.com> <20230722190740.326190-1-tony.luck@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The rdt_domain and rdt_hw_domain structures contain an amalgam of fields
used by control and monitoring features. Now that there are separate
domain lists for control/monitoring these can be divided between two
structures.

First step: Add new domain structures for monitoring with the fields
that are needed. Leave these fields in the legacy structure so compilation
won't fail. They will be deleted once all the monitoring code has been
converted to use the new structure.
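
For illustration only (this is not part of the patch): a user-space
sketch of the embedding pattern used for the new monitor structures,
where the generic structure lives inside an arch-private wrapper and
a container_of()-style helper recovers the wrapper. The types
struct mondomain and struct hw_mondomain, the arch_private field and
to_arch_mondom() are hypothetical simplifications; the macro is a
plain offsetof() version without the kernel's type checking.

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the generic (filesystem-visible) domain. */
struct mondomain {
	int id;
};

/* Hypothetical stand-in for the arch-private wrapper. */
struct hw_mondomain {
	int arch_private;		/* placeholder for arch-only state */
	struct mondomain d_resctrl;	/* generic part, embedded by value */
};

/* Recover the wrapper from a pointer to the embedded generic part. */
static struct hw_mondomain *to_arch_mondom(struct mondomain *d)
{
	return container_of(d, struct hw_mondomain, d_resctrl);
}
```

Generic code only ever sees the embedded structure; arch code calls
the helper when it needs its private state back.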

Signed-off-by: Tony Luck
---
 include/linux/resctrl.h                | 28 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/resctrl/internal.h | 17 +++++++++++++++-
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 1267d56f9e76..475912662e47 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -53,7 +53,7 @@ struct resctrl_staged_config {
 };
 
 /**
- * struct rdt_domain - group of CPUs sharing a resctrl resource
+ * struct rdt_domain - group of CPUs sharing a resctrl control resource
  * @list:		all instances of this resource
  * @id:			unique id for this instance
  * @cpu_mask:		which CPUs share this resource
@@ -86,6 +86,32 @@ struct rdt_domain {
 	u32				*mbps_val;
 };
 
+/**
+ * struct rdt_mondomain - group of CPUs sharing a resctrl monitor resource
+ * @list:		all instances of this resource
+ * @id:			unique id for this instance
+ * @cpu_mask:		which CPUs share this resource
+ * @rmid_busy_llc:	bitmap of which limbo RMIDs are above threshold
+ * @mbm_total:		saved state for MBM total bandwidth
+ * @mbm_local:		saved state for MBM local bandwidth
+ * @mbm_over:		worker to periodically read MBM h/w counters
+ * @cqm_limbo:		worker to periodically read CQM h/w counters
+ * @mbm_work_cpu:	worker CPU for MBM h/w counters
+ * @cqm_work_cpu:	worker CPU for CQM h/w counters
+ */
+struct rdt_mondomain {
+	struct list_head		list;
+	int				id;
+	struct cpumask			cpu_mask;
+	unsigned long			*rmid_busy_llc;
+	struct mbm_state		*mbm_total;
+	struct mbm_state		*mbm_local;
+	struct delayed_work		mbm_over;
+	struct delayed_work		cqm_limbo;
+	int				mbm_work_cpu;
+	int				cqm_work_cpu;
+};
+
 /**
  * struct resctrl_cache - Cache allocation related data
  * @cbm_len:		Length of the cache bit mask
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c5e2ac2a60cf..e956090a874e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -320,7 +320,7 @@ struct arch_mbm_state {
 
 /**
  * struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
- *			  a resource
+ *			  a control resource
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
  * @arch_mbm_total:	arch private state for MBM total bandwidth
@@ -335,6 +335,21 @@ struct rdt_hw_domain {
 	struct arch_mbm_state		*arch_mbm_local;
 };
 
+/**
+ * struct rdt_hw_mondomain - Arch private attributes of a set of CPUs that share
+ *			     a monitor resource
+ * @d_resctrl:	Properties exposed to the resctrl file system
+ * @arch_mbm_total:	arch private state for MBM total bandwidth
+ * @arch_mbm_local:	arch private state for MBM local bandwidth
+ *
+ * Members of this structure are accessed via helpers that provide abstraction.
+ */
+struct rdt_hw_mondomain {
+	struct rdt_mondomain		d_resctrl;
+	struct arch_mbm_state		*arch_mbm_total;
+	struct arch_mbm_state		*arch_mbm_local;
+};
+
 static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
 {
 	return container_of(r, struct rdt_hw_domain, d_resctrl);
-- 
2.40.1

From nobody Sun Feb 8 05:09:05 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, Shuah Khan, x86@kernel.org
Cc: Shaopeng Tan, James Morse, Jamie Iles, Babu Moger, Randy Dunlap, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v4 3/7] x86/resctrl: Change monitor code to use rdt_mondomain
Date: Sat, 22 Jul 2023 12:07:36 -0700
Message-Id: <20230722190740.326190-4-tony.luck@intel.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230722190740.326190-1-tony.luck@intel.com>
References: <20230713163207.219710-1-tony.luck@intel.com> <20230722190740.326190-1-tony.luck@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

A few functions need to be duplicated to provide versions to
operate on control and monitor domains respectively. But most
of the changes are just fixing argument and return value types.

Signed-off-by: Tony Luck
---
 include/linux/resctrl.h                   | 10 +++---
 arch/x86/kernel/cpu/resctrl/internal.h    | 21 +++++++-----
 arch/x86/kernel/cpu/resctrl/core.c        | 40 ++++++++++++++---------
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
 arch/x86/kernel/cpu/resctrl/monitor.c     | 38 ++++++++++-----------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 24 +++++++-------
 6 files changed, 75 insertions(+), 62 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 475912662e47..663bbc427c4b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -248,9 +248,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mondomain *d);
 void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mondomain *d);
 
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
@@ -266,7 +266,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
  * Return:
  * 0 on success, or -EIO, -EINVAL etc on error.
  */
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mondomain *d,
 			   u32 rmid, enum resctrl_event_id eventid, u64 *val);
 
 /**
@@ -279,7 +279,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mondomain *d,
 			     u32 rmid, enum resctrl_event_id eventid);
 
 /**
@@ -291,7 +291,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mondomain *d);
 
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e956090a874e..401af6ccf272 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -106,7 +106,7 @@ union mon_data_bits {
 struct rmid_read {
 	struct rdtgroup		*rgrp;
 	struct rdt_resource	*r;
-	struct rdt_domain	*d;
+	struct rdt_mondomain	*d;
 	enum resctrl_event_id	evtid;
 	bool			first;
 	int			err;
@@ -355,6 +355,11 @@ static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
 	return container_of(r, struct rdt_hw_domain, d_resctrl);
 }
 
+static inline struct rdt_hw_mondomain *resctrl_to_arch_mondom(struct rdt_mondomain *r)
+{
+	return container_of(r, struct rdt_hw_mondomain, d_resctrl);
+}
+
 /**
  * struct msr_param - set a range of MSRs from a domain
  * @res:       The resource to use
@@ -526,8 +531,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn);
 int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
 int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
 			     umode_t mask);
-struct rdt_domain *rdt_find_domain(struct list_head *h, int id,
-				   struct list_head **pos);
+void *rdt_find_domain(struct list_head *h, int id,
+		      struct list_head **pos);
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off);
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
@@ -556,17 +561,17 @@ bool __init rdt_cpu_has(int flag);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
-		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
+		    struct rdt_mondomain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
-void mbm_setup_overflow_handler(struct rdt_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_mondomain *dom,
 				unsigned long delay_ms);
 void mbm_handle_overflow(struct work_struct *work);
 void __init intel_rdt_mbm_apply_quirk(void);
 bool is_mba_sc(struct rdt_resource *r);
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
+void cqm_setup_limbo_handler(struct rdt_mondomain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
-void __check_limbo(struct rdt_domain *d, bool force_free);
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mondomain *d);
+void __check_limbo(struct rdt_mondomain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init thread_throttle_mode_init(void);
 void __init mbm_config_rftype_init(const char *config);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 274605aaa026..0161362b0c3e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -393,9 +393,12 @@ void rdt_ctrl_update(void *arg)
  * id is found in a domain, return the domain. Otherwise, if requested by
  * caller, return the first domain whose id is bigger than the input id.
  * The domain list is sorted by id in ascending order.
+ *
+ * N.B. Returned value may be either a pointer to "struct rdt_domain" or
+ * to "struct rdt_mondomain" depending on which domain list is scanned.
  */
-struct rdt_domain *rdt_find_domain(struct list_head *h, int id,
-				   struct list_head **pos)
+void *rdt_find_domain(struct list_head *h, int id,
+		      struct list_head **pos)
 {
 	struct rdt_domain *d;
 	struct list_head *l;
@@ -434,10 +437,15 @@ static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
 }
 
 static void domain_free(struct rdt_hw_domain *hw_dom)
+{
+	kfree(hw_dom->ctrl_val);
+	kfree(hw_dom);
+}
+
+static void mondomain_free(struct rdt_hw_mondomain *hw_dom)
 {
 	kfree(hw_dom->arch_mbm_total);
 	kfree(hw_dom->arch_mbm_local);
-	kfree(hw_dom->ctrl_val);
 	kfree(hw_dom);
 }
 
@@ -467,7 +475,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
  * @num_rmid:	The size of the MBM counter array
  * @hw_dom:	The domain that owns the allocated arrays
  */
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mondomain *hw_dom)
 {
 	size_t tsize;
 
@@ -539,8 +547,8 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 {
 	int id = get_cpu_cacheinfo_id(cpu, r->mon_scope);
 	struct list_head *add_pos = NULL;
-	struct rdt_hw_domain *hw_dom;
-	struct rdt_domain *d;
+	struct rdt_hw_mondomain *hw_mondom;
+	struct rdt_mondomain *d;
 	int err;
 
 	d = rdt_find_domain(&r->mondomains, id, &add_pos);
@@ -556,16 +564,16 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
-	if (!hw_dom)
+	hw_mondom = kzalloc_node(sizeof(*hw_mondom), GFP_KERNEL, cpu_to_node(cpu));
+	if (!hw_mondom)
 		return;
 
-	d = &hw_dom->d_resctrl;
+	d = &hw_mondom->d_resctrl;
 	d->id = id;
 	cpumask_set_cpu(cpu, &d->cpu_mask);
 
-	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
-		domain_free(hw_dom);
+	if (arch_domain_mbm_alloc(r->num_rmid, hw_mondom)) {
+		mondomain_free(hw_mondom);
 		return;
 	}
 
@@ -574,7 +582,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	err = resctrl_online_mon_domain(r, d);
 	if (err) {
 		list_del(&d->list);
-		domain_free(hw_dom);
+		mondomain_free(hw_mondom);
 	}
 }
 
@@ -632,22 +640,22 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 {
 	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
-	struct rdt_hw_domain *hw_dom;
-	struct rdt_domain *d;
+	struct rdt_hw_mondomain *hw_mondom;
+	struct rdt_mondomain *d;
 
 	d = rdt_find_domain(&r->mondomains, id, NULL);
 	if (IS_ERR_OR_NULL(d)) {
 		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
 		return;
 	}
-	hw_dom = resctrl_to_arch_dom(d);
+	hw_mondom = resctrl_to_arch_mondom(d);
 
 	cpumask_clear_cpu(cpu, &d->cpu_mask);
 	if (cpumask_empty(&d->cpu_mask)) {
 		resctrl_offline_mon_domain(r, d);
 		list_del(&d->list);
 
-		domain_free(hw_dom);
+		mondomain_free(hw_mondom);
 
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 839df83d1a0a..86fc5b0e3d39 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -521,7 +521,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 }
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
-		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
+		    struct rdt_mondomain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first)
 {
 	/*
@@ -544,7 +544,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
 	union mon_data_bits md;
-	struct rdt_domain *d;
+	struct rdt_mondomain *d;
 	struct rmid_read rr;
 	int ret = 0;
 
diff --git
a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/re= sctrl/monitor.c index 66beca785535..0d9605fccb34 100644 --- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -170,7 +170,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id = eventid, u64 *val) return 0; } =20 -static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_= dom, +static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mondomain *= hw_dom, u32 rmid, enum resctrl_event_id eventid) { @@ -189,10 +189,10 @@ static struct arch_mbm_state *get_arch_mbm_state(stru= ct rdt_hw_domain *hw_dom, return NULL; } =20 -void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, +void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mondomain = *d, u32 rmid, enum resctrl_event_id eventid) { - struct rdt_hw_domain *hw_dom =3D resctrl_to_arch_dom(d); + struct rdt_hw_mondomain *hw_dom =3D resctrl_to_arch_mondom(d); struct arch_mbm_state *am; =20 am =3D get_arch_mbm_state(hw_dom, rmid, eventid); @@ -208,9 +208,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, st= ruct rdt_domain *d, * Assumes that hardware counters are also reset and thus that there is * no need to record initial non-zero counts. 
*/ -void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain= *d) +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mondom= ain *d) { - struct rdt_hw_domain *hw_dom =3D resctrl_to_arch_dom(d); + struct rdt_hw_mondomain *hw_dom =3D resctrl_to_arch_mondom(d); =20 if (is_mbm_total_enabled()) memset(hw_dom->arch_mbm_total, 0, @@ -229,11 +229,11 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_m= sr, unsigned int width) return chunks >> shift; } =20 -int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d, +int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mondomain *d, u32 rmid, enum resctrl_event_id eventid, u64 *val) { struct rdt_hw_resource *hw_res =3D resctrl_to_arch_res(r); - struct rdt_hw_domain *hw_dom =3D resctrl_to_arch_dom(d); + struct rdt_hw_mondomain *hw_dom =3D resctrl_to_arch_mondom(d); struct arch_mbm_state *am; u64 msr_val, chunks; int ret; @@ -266,7 +266,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, stru= ct rdt_domain *d, * decrement the count. 
If the busy count gets to zero on an RMID, we * free the RMID */ -void __check_limbo(struct rdt_domain *d, bool force_free) +void __check_limbo(struct rdt_mondomain *d, bool force_free) { struct rdt_resource *r =3D &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; struct rmid_entry *entry; @@ -305,7 +305,7 @@ void __check_limbo(struct rdt_domain *d, bool force_fre= e) } } =20 -bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d) +bool has_busy_rmid(struct rdt_resource *r, struct rdt_mondomain *d) { return find_first_bit(d->rmid_busy_llc, r->num_rmid) !=3D r->num_rmid; } @@ -334,7 +334,7 @@ int alloc_rmid(void) static void add_rmid_to_limbo(struct rmid_entry *entry) { struct rdt_resource *r =3D &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; - struct rdt_domain *d; + struct rdt_mondomain *d; int cpu, err; u64 val =3D 0; =20 @@ -383,7 +383,7 @@ void free_rmid(u32 rmid) list_add_tail(&entry->list, &rmid_free_lru); } =20 -static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid, +static struct mbm_state *get_mbm_state(struct rdt_mondomain *d, u32 rmid, enum resctrl_event_id evtid) { switch (evtid) { @@ -516,7 +516,7 @@ void mon_event_count(void *info) * throttle MSRs already have low percentage values. To avoid * unnecessarily restricting such rdtgroups, we also increase the bandwidt= h. 
*/ -static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mb= m) +static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mondomain *dom= _mbm) { u32 closid, rmid, cur_msr_val, new_msr_val; struct mbm_state *pmbm_data, *cmbm_data; @@ -600,7 +600,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct= rdt_domain *dom_mbm) } } =20 -static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int r= mid) +static void mbm_update(struct rdt_resource *r, struct rdt_mondomain *d, in= t rmid) { struct rmid_read rr; =20 @@ -641,12 +641,12 @@ void cqm_handle_limbo(struct work_struct *work) unsigned long delay =3D msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL); int cpu =3D smp_processor_id(); struct rdt_resource *r; - struct rdt_domain *d; + struct rdt_mondomain *d; =20 mutex_lock(&rdtgroup_mutex); =20 r =3D &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; - d =3D container_of(work, struct rdt_domain, cqm_limbo.work); + d =3D container_of(work, struct rdt_mondomain, cqm_limbo.work); =20 __check_limbo(d, false); =20 @@ -656,7 +656,7 @@ void cqm_handle_limbo(struct work_struct *work) mutex_unlock(&rdtgroup_mutex); } =20 -void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_m= s) +void cqm_setup_limbo_handler(struct rdt_mondomain *dom, unsigned long dela= y_ms) { unsigned long delay =3D msecs_to_jiffies(delay_ms); int cpu; @@ -674,7 +674,7 @@ void mbm_handle_overflow(struct work_struct *work) int cpu =3D smp_processor_id(); struct list_head *head; struct rdt_resource *r; - struct rdt_domain *d; + struct rdt_mondomain *d; =20 mutex_lock(&rdtgroup_mutex); =20 @@ -682,7 +682,7 @@ void mbm_handle_overflow(struct work_struct *work) goto out_unlock; =20 r =3D &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; - d =3D container_of(work, struct rdt_domain, mbm_over.work); + d =3D container_of(work, struct rdt_mondomain, mbm_over.work); =20 list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) { mbm_update(r, d, prgrp->mon.rmid); @@ 
-701,7 +701,7 @@ void mbm_handle_overflow(struct work_struct *work) mutex_unlock(&rdtgroup_mutex); } =20 -void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long dela= y_ms) +void mbm_setup_overflow_handler(struct rdt_mondomain *dom, unsigned long d= elay_ms) { unsigned long delay =3D msecs_to_jiffies(delay_ms); int cpu; diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/r= esctrl/rdtgroup.c index 27753eb5d513..4a268df9b456 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -1483,7 +1483,7 @@ static void mon_event_config_read(void *info) mon_info->mon_config =3D msrval & MAX_EVT_CONFIG_BITS; } =20 -static void mondata_config_read(struct rdt_domain *d, struct mon_config_in= fo *mon_info) +static void mondata_config_read(struct rdt_mondomain *d, struct mon_config= _info *mon_info) { smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1); } @@ -1491,7 +1491,7 @@ static void mondata_config_read(struct rdt_domain *d,= struct mon_config_info *mo static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32= evtid) { struct mon_config_info mon_info =3D {0}; - struct rdt_domain *dom; + struct rdt_mondomain *dom; bool sep =3D false; =20 mutex_lock(&rdtgroup_mutex); @@ -1548,7 +1548,7 @@ static void mon_event_config_write(void *info) } =20 static int mbm_config_write_domain(struct rdt_resource *r, - struct rdt_domain *d, u32 evtid, u32 val) + struct rdt_mondomain *d, u32 evtid, u32 val) { struct mon_config_info mon_info =3D {0}; int ret =3D 0; @@ -1598,7 +1598,7 @@ static int mon_config_write(struct rdt_resource *r, c= har *tok, u32 evtid) { char *dom_str =3D NULL, *id_str; unsigned long dom_id, val; - struct rdt_domain *d; + struct rdt_mondomain *d; int ret =3D 0; =20 next: @@ -2463,7 +2463,7 @@ static void schemata_list_destroy(void) static int rdt_get_tree(struct fs_context *fc) { struct rdt_fs_context *ctx =3D rdt_fc2context(fc); - struct rdt_domain *dom; + 
struct rdt_mondomain *dom; struct rdt_resource *r; int ret; =20 @@ -2845,7 +2845,7 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt= _resource *r, } =20 static int mkdir_mondata_subdir(struct kernfs_node *parent_kn, - struct rdt_domain *d, + struct rdt_mondomain *d, struct rdt_resource *r, struct rdtgroup *prgrp) { union mon_data_bits priv; @@ -2894,7 +2894,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *p= arent_kn, * and "monitor" groups with given domain id. */ static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, - struct rdt_domain *d) + struct rdt_mondomain *d) { struct kernfs_node *parent_kn; struct rdtgroup *prgrp, *crgrp; @@ -2916,7 +2916,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_= node *parent_kn, struct rdt_resource *r, struct rdtgroup *prgrp) { - struct rdt_domain *dom; + struct rdt_mondomain *dom; int ret; =20 list_for_each_entry(dom, &r->mondomains, list) { @@ -3701,7 +3701,7 @@ static int __init rdtgroup_setup_root(void) return ret; } =20 -static void domain_destroy_mon_state(struct rdt_domain *d) +static void domain_destroy_mon_state(struct rdt_mondomain *d) { bitmap_free(d->rmid_busy_llc); kfree(d->mbm_total); @@ -3716,7 +3716,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource = *r, struct rdt_domain *d) mba_sc_domain_destroy(r, d); } =20 -void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain = *d) +void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mondoma= in *d) { lockdep_assert_held(&rdtgroup_mutex); =20 @@ -3745,7 +3745,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *= r, struct rdt_domain *d) domain_destroy_mon_state(d); } =20 -static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domai= n *d) +static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mondo= main *d) { size_t tsize; =20 @@ -3786,7 +3786,7 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r= , struct rdt_domain *d) return 0; } =20 -int 
resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d) +int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mondomain= *d) { int err; =20 --=20 2.40.1 From nobody Sun Feb 8 05:09:05 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C382AC0015E for ; Sat, 22 Jul 2023 19:08:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229843AbjGVTIC (ORCPT ); Sat, 22 Jul 2023 15:08:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50646 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229684AbjGVTHz (ORCPT ); Sat, 22 Jul 2023 15:07:55 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2A2AFE65; Sat, 22 Jul 2023 12:07:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1690052874; x=1721588874; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=S8skQQ7drJ+64mO8NcrQzFMdrllN3e8XeB4pKF6xRCE=; b=Rhm5MRAltWbnFMD/FG6xQmYR2POmfCqLcoYsZrhgTGKGSeAR7RdkQ+6H 6PZcCZsOo5pgbqXvzt6f29TXKTso8K20d5foW7uN9x+GrATSqbSRsvijL N/4hNRsNWYiTpQDh4kT/aMyXTc9Z01jW3WCehzEq6KNtmBbz+5qUvpn0b 36gEinwcpL0Ie0AcvO7bDD8HWj0dBOlprd4wlVUo39hhiAqS4YLV70Xhz 8LzyqQcRKG7mVPPMzvU6s/9xgzh+F1FxqtiafDaIvVsFoV08ynbvehb2u z0msKtwBtz9faNF1cHefYThHTyjDx/gfupj+hGI3GdFF4nva8CG3Q8jZ8 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10779"; a="346823981" X-IronPort-AV: E=Sophos;i="6.01,224,1684825200"; d="scan'208";a="346823981" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jul 2023 12:07:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10779"; 
a="815368088" X-IronPort-AV: E=Sophos;i="6.01,224,1684825200"; d="scan'208";a="815368088" Received: from agluck-desk3.sc.intel.com ([172.25.222.74]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jul 2023 12:07:51 -0700 From: Tony Luck To: Fenghua Yu , Reinette Chatre , Peter Newman , Jonathan Corbet , Shuah Khan , x86@kernel.org Cc: Shaopeng Tan , James Morse , Jamie Iles , Babu Moger , Randy Dunlap , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck Subject: [PATCH v4 4/7] x86/resctrl: Delete unused fields from struct rdt_domain Date: Sat, 22 Jul 2023 12:07:37 -0700 Message-Id: <20230722190740.326190-5-tony.luck@intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230722190740.326190-1-tony.luck@intel.com> References: <20230713163207.219710-1-tony.luck@intel.com> <20230722190740.326190-1-tony.luck@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Now that all the monitoring functions use struct rdt_mondomain the monitor fields can be dropped from the structure used for control operations. 
Signed-off-by: Tony Luck --- include/linux/resctrl.h | 14 -------------- arch/x86/kernel/cpu/resctrl/internal.h | 4 ---- 2 files changed, 18 deletions(-) diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 663bbc427c4b..80a89d171eba 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -57,13 +57,6 @@ struct resctrl_staged_config { * @list: all instances of this resource * @id: unique id for this instance * @cpu_mask: which CPUs share this resource - * @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold - * @mbm_total: saved state for MBM total bandwidth - * @mbm_local: saved state for MBM local bandwidth - * @mbm_over: worker to periodically read MBM h/w counters - * @cqm_limbo: worker to periodically read CQM h/w counters - * @mbm_work_cpu: worker CPU for MBM h/w counters - * @cqm_work_cpu: worker CPU for CQM h/w counters * @plr: pseudo-locked region (if any) associated with domain * @staged_config: parsed configuration to be applied * @mbps_val: When mba_sc is enabled, this holds the array of user @@ -74,13 +67,6 @@ struct rdt_domain { struct list_head list; int id; struct cpumask cpu_mask; - unsigned long *rmid_busy_llc; - struct mbm_state *mbm_total; - struct mbm_state *mbm_local; - struct delayed_work mbm_over; - struct delayed_work cqm_limbo; - int mbm_work_cpu; - int cqm_work_cpu; struct pseudo_lock_region *plr; struct resctrl_staged_config staged_config[CDP_NUM_TYPES]; u32 *mbps_val; diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/r= esctrl/internal.h index 401af6ccf272..016ef0373c5a 100644 --- a/arch/x86/kernel/cpu/resctrl/internal.h +++ b/arch/x86/kernel/cpu/resctrl/internal.h @@ -323,16 +323,12 @@ struct arch_mbm_state { * a control resource * @d_resctrl: Properties exposed to the resctrl file system * @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID) - * @arch_mbm_total: arch private state for MBM total bandwidth - * @arch_mbm_local: arch private state for MBM local 
bandwidth * * Members of this structure are accessed via helpers that provide abstrac= tion. */ struct rdt_hw_domain { struct rdt_domain d_resctrl; u32 *ctrl_val; - struct arch_mbm_state *arch_mbm_total; - struct arch_mbm_state *arch_mbm_local; }; =20 /** --=20 2.40.1 From nobody Sun Feb 8 05:09:05 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80A4BC0015E for ; Sat, 22 Jul 2023 19:08:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229897AbjGVTIN (ORCPT ); Sat, 22 Jul 2023 15:08:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50668 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229771AbjGVTH4 (ORCPT ); Sat, 22 Jul 2023 15:07:56 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2887410D5; Sat, 22 Jul 2023 12:07:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1690052875; x=1721588875; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZOy5MkUskaMi6eenP6yRXpishaPCWFKbnwW46GbBktI=; b=InNELO8yAmETukic+SMgG2lWQ0WrUAv5yCTLFat7ASoq2nQ/ZqRlf5Ua 6p5M+WRooZLOV0JTSPFeplVLlmMhBXbyzgvt99ZXEBwvyyNRxw+qBK3jF ouBURxR8Ws+iexuHCjokQ8wN5EWgE4iuzQJulKJrPj3foPr7sxvGKJpOF 5Ic59aYj/vRnXai6SrXLpAZnYUcyFFwdB6utB+TINLQLL/b/5s8I4QL+7 k7txYVje7AP8rD0sbRNdEKvT2uSUvAph/vdrsSXATBH2cp2LCO2V4UfTd t9AOturEYtMdYAbQDGBwRmJKTlMdIcNjHczbNeXXWGIZpg33RxFQlBEhH Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10779"; a="346823986" X-IronPort-AV: E=Sophos;i="6.01,224,1684825200"; d="scan'208";a="346823986" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 
Jul 2023 12:07:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10779"; a="815368091" X-IronPort-AV: E=Sophos;i="6.01,224,1684825200"; d="scan'208";a="815368091" Received: from agluck-desk3.sc.intel.com ([172.25.222.74]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jul 2023 12:07:52 -0700 From: Tony Luck To: Fenghua Yu , Reinette Chatre , Peter Newman , Jonathan Corbet , Shuah Khan , x86@kernel.org Cc: Shaopeng Tan , James Morse , Jamie Iles , Babu Moger , Randy Dunlap , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck Subject: [PATCH v4 5/7] x86/resctrl: Determine if Sub-NUMA Cluster is enabled and initialize. Date: Sat, 22 Jul 2023 12:07:38 -0700 Message-Id: <20230722190740.326190-6-tony.luck@intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230722190740.326190-1-tony.luck@intel.com> References: <20230713163207.219710-1-tony.luck@intel.com> <20230722190740.326190-1-tony.luck@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" There isn't a simple hardware enumeration to indicate to software that a system is running with Sub-NUMA Cluster enabled. Compare the number of NUMA nodes with the number of L3 caches to calculate the number of Sub-NUMA nodes per L3 cache. When Sub-NUMA cluster mode is enabled in BIOS setup the RMID counters are distributed equally between the SNC nodes within each socket. E.g. if there are 400 RMID counters, and the system is configured with two SNC nodes per socket, then RMID counter 0..199 are used on SNC node 0 on the socket, and RMID counter 200..399 on SNC node 1. A model specific MSR (0xca0) can change the configuration of the RMIDs when SNC mode is enabled. 
The MSR controls the interpretation of the RMID field in the IA32_PQR_ASSOC MSR so that the appropriate hardware counters within the SNC node are updated. To read the RMID counters, an offset must be used to get data from the physical counter associated with the SNC node. In the example above with 400 RMID counters, Linux sees only 200 counters. No special action is needed to read a counter from the first SNC node on a socket. But to read Linux-visible counter 50 on the second SNC node, the kernel must load 250 into the QM_EVTSEL MSR. N.B. this works well for well-behaved NUMA applications that access memory predominantly from the local memory node. For applications that access memory across multiple nodes, it may be necessary for the user to read counters for all SNC nodes on a socket and add the values to get the actual LLC occupancy or memory bandwidth. Perhaps this isn't all that different from applications that span multiple sockets in a legacy system. The cache allocation feature still provides the same number of bits in a mask to control allocation into the L3 cache, but each of those ways has its capacity reduced because the cache is divided between the SNC nodes. Adjust the value reported in the resctrl "size" file accordingly. Mounting the file system with the "mba_MBps" option is disabled when SNC mode is enabled, because bandwidth is measured per SNC node while the MBA throttling controls are still at the L3 cache scope.
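The RMID offset arithmetic described above can be sketched in plain user-space C. This is only an illustration under stated assumptions: the helper names snc_rmids_per_node() and snc_physical_rmid() are hypothetical and do not appear in the patch, and the node number is passed in directly rather than obtained from cpu_to_node(). The formula itself mirrors the rmid_offset calculation this patch adds to __rmid_read().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): number of RMIDs that
 * Linux sees per SNC node, given the hardware total per L3 cache.
 */
static uint32_t snc_rmids_per_node(uint32_t hw_num_rmid, int snc_nodes_per_l3)
{
	assert(snc_nodes_per_l3 > 0);
	return hw_num_rmid / (uint32_t)snc_nodes_per_l3;
}

/*
 * Hypothetical helper: translate a Linux-visible RMID on a given NUMA
 * node into the physical RMID that would be loaded into QM_EVTSEL.
 * The node's position within its L3 cache is taken as
 * node % snc_nodes_per_l3, the same assumption the patch makes.
 */
static uint32_t snc_physical_rmid(uint32_t rmid, int node,
				  uint32_t hw_num_rmid, int snc_nodes_per_l3)
{
	uint32_t per_node = snc_rmids_per_node(hw_num_rmid, snc_nodes_per_l3);

	assert(node >= 0);
	assert(rmid < per_node);
	return rmid + (uint32_t)(node % snc_nodes_per_l3) * per_node;
}
```

With 400 hardware RMIDs and two SNC nodes per L3 cache, snc_physical_rmid(50, 1, 400, 2) evaluates to 250, matching the example in the text, while the same RMID on node 0 is left unchanged at 50.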
Signed-off-by: Tony Luck --- include/linux/resctrl.h | 2 + arch/x86/include/asm/msr-index.h | 1 + arch/x86/kernel/cpu/resctrl/internal.h | 2 + arch/x86/kernel/cpu/resctrl/core.c | 82 +++++++++++++++++++++++++- arch/x86/kernel/cpu/resctrl/monitor.c | 18 +++++- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 +- 6 files changed, 103 insertions(+), 6 deletions(-) diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 80a89d171eba..576dc21bd990 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -200,6 +200,8 @@ struct rdt_resource { bool cdp_capable; }; =20 +#define MON_SCOPE_NODE 100 + /** * struct resctrl_schema - configuration abilities of a resource presented= to * user-space diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-in= dex.h index 3aedae61af4f..4b624a37d64a 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -1087,6 +1087,7 @@ #define MSR_IA32_QM_CTR 0xc8e #define MSR_IA32_PQR_ASSOC 0xc8f #define MSR_IA32_L3_CBM_BASE 0xc90 +#define MSR_RMID_SNC_CONFIG 0xca0 #define MSR_IA32_L2_CBM_BASE 0xd10 #define MSR_IA32_MBA_THRTL_BASE 0xd50 =20 diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/r= esctrl/internal.h index 016ef0373c5a..00a330bc5ced 100644 --- a/arch/x86/kernel/cpu/resctrl/internal.h +++ b/arch/x86/kernel/cpu/resctrl/internal.h @@ -446,6 +446,8 @@ DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key); =20 extern struct dentry *debugfs_resctrl; =20 +extern int snc_nodes_per_l3_cache; + enum resctrl_res_level { RDT_RESOURCE_L3, RDT_RESOURCE_L2, diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resct= rl/core.c index 0161362b0c3e..1331add347fc 100644 --- a/arch/x86/kernel/cpu/resctrl/core.c +++ b/arch/x86/kernel/cpu/resctrl/core.c @@ -16,11 +16,14 @@ =20 #define pr_fmt(fmt) "resctrl: " fmt =20 +#include #include #include #include #include +#include =20 +#include #include #include #include "internal.h" @@ -48,6 +51,13 @@ int 
max_name_width, max_data_width; */ bool rdt_alloc_capable; =20 +/* + * Number of SNC nodes that share each L3 cache. + * Default is 1 for systems that do not support + * SNC, or have SNC disabled. + */ +int snc_nodes_per_l3_cache =3D 1; + static void mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r); @@ -543,9 +553,16 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_re= source *r) } } =20 +static int get_mon_scope_id(int cpu, int scope) +{ + if (scope =3D=3D MON_SCOPE_NODE) + return cpu_to_node(cpu); + return get_cpu_cacheinfo_id(cpu, scope); +} + static void domain_add_cpu_mon(int cpu, struct rdt_resource *r) { - int id =3D get_cpu_cacheinfo_id(cpu, r->mon_scope); + int id =3D get_mon_scope_id(cpu, r->mon_scope); struct list_head *add_pos =3D NULL; struct rdt_hw_mondomain *hw_mondom; struct rdt_mondomain *d; @@ -692,11 +709,28 @@ static void clear_closid_rmid(int cpu) wrmsr(MSR_IA32_PQR_ASSOC, 0, 0); } =20 +static void snc_remap_rmids(int cpu) +{ + u64 val; + + /* Only need to enable once per package */ + if (cpumask_first(topology_core_cpumask(cpu)) !=3D cpu) + return; + + rdmsrl(MSR_RMID_SNC_CONFIG, val); + val &=3D ~BIT_ULL(0); + wrmsrl(MSR_RMID_SNC_CONFIG, val); +} + static int resctrl_online_cpu(unsigned int cpu) { struct rdt_resource *r; =20 mutex_lock(&rdtgroup_mutex); + + if (snc_nodes_per_l3_cache > 1) + snc_remap_rmids(cpu); + for_each_capable_rdt_resource(r) domain_add_cpu(cpu, r); /* The cpu is set in default rdtgroup after online. */ @@ -951,11 +985,57 @@ static __init bool get_rdt_resources(void) return (rdt_mon_capable || rdt_alloc_capable); } =20 +static const struct x86_cpu_id snc_cpu_ids[] __initconst =3D { + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0), + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0), + X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0), + {} +}; + +/* + * There isn't a simple enumeration bit to show whether SNC mode + * is enabled. 
Look at the ratio of number of NUMA nodes to the + * number of distinct L3 caches. Take care to skip memory-only nodes. + */ +static __init int get_snc_config(void) +{ + unsigned long *node_caches; + int mem_only_nodes =3D 0; + int cpu, node, ret; + + if (!x86_match_cpu(snc_cpu_ids)) + return 1; + + node_caches =3D kcalloc(BITS_TO_LONGS(nr_node_ids), sizeof(*node_caches),= GFP_KERNEL); + if (!node_caches) + return 1; + + cpus_read_lock(); + for_each_node(node) { + cpu =3D cpumask_first(cpumask_of_node(node)); + if (cpu < nr_cpu_ids) + set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches); + else + mem_only_nodes++; + } + cpus_read_unlock(); + + ret =3D (nr_node_ids - mem_only_nodes) / bitmap_weight(node_caches, nr_no= de_ids); + kfree(node_caches); + + if (ret > 1) + rdt_resources_all[RDT_RESOURCE_L3].r_resctrl.mon_scope =3D MON_SCOPE_NOD= E; + + return ret; +} + static __init void rdt_init_res_defs_intel(void) { struct rdt_hw_resource *hw_res; struct rdt_resource *r; =20 + snc_nodes_per_l3_cache =3D get_snc_config(); + for_each_rdt_resource(r) { hw_res =3D resctrl_to_arch_res(r); =20 diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/re= sctrl/monitor.c index 0d9605fccb34..4ca064e62911 100644 --- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -148,8 +148,18 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid) =20 static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val) { + struct rdt_resource *r =3D &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; + int cpu =3D get_cpu(); + int rmid_offset =3D 0; u64 msr_val; =20 + /* + * When SNC mode is on, need to compute the offset to read the + * physical RMID counter for the node to which this CPU belongs + */ + if (snc_nodes_per_l3_cache > 1) + rmid_offset =3D (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmi= d; + /* * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured * with a valid event code for supported resource type 
and the bits
@@ -158,9 +168,11 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 	 * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
 	 * are error bits.
 	 */
-	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
+	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid + rmid_offset);
 	rdmsrl(MSR_IA32_QM_CTR, msr_val);
 
+	put_cpu();
+
 	if (msr_val & RMID_VAL_ERROR)
 		return -EIO;
 	if (msr_val & RMID_VAL_UNAVAIL)
@@ -783,8 +795,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	int ret;
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
-	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
-	r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
+	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4a268df9b456..d831b21f7389 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1354,7 +1354,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
 		}
 	}
 
-	return size;
+	return size / snc_nodes_per_l3_cache;
 }
 
 /**
@@ -2587,7 +2587,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
 		ctx->enable_cdpl2 = true;
 		return 0;
 	case Opt_mba_mbps:
-		if (!supports_mba_mbps())
+		if (!supports_mba_mbps() || snc_nodes_per_l3_cache > 1)
 			return -EINVAL;
 		ctx->enable_mba_mbps = true;
 		return 0;
-- 
2.40.1

From nobody Sun Feb 8 05:09:05 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, Shuah Khan, x86@kernel.org
Cc: Shaopeng Tan, James Morse, Jamie Iles, Babu Moger, Randy Dunlap, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v4 6/7] x86/resctrl: Update documentation with Sub-NUMA cluster changes
Date: Sat, 22 Jul 2023 12:07:39 -0700
Message-Id: <20230722190740.326190-7-tony.luck@intel.com>
In-Reply-To: <20230722190740.326190-1-tony.luck@intel.com>
References: <20230713163207.219710-1-tony.luck@intel.com> <20230722190740.326190-1-tony.luck@intel.com>

With Sub-NUMA Cluster mode enabled the scope of monitoring resources is
per-NODE instead of per-L3 cache. Suffixes of directories with "L3" in
their name refer to Sub-NUMA nodes instead of L3 cache ids.

Signed-off-by: Tony Luck
Reviewed-by: Peter Newman
---
 Documentation/arch/x86/resctrl.rst | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index cb05d90111b4..4d9ddb91751d 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -345,9 +345,13 @@ When control is enabled all CTRL_MON groups will also contain:
 When monitoring is enabled all MON groups will also contain:
 
 "mon_data":
-	This contains a set of files organized by L3 domain and by
-	RDT event. E.g. on a system with two L3 domains there will
-	be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
+	This contains a set of files organized by L3 domain or by NUMA
+	node (depending on whether Sub-NUMA Cluster (SNC) mode is disabled
+	or enabled respectively) and by RDT event. E.g. on a system with
+	SNC mode disabled with two L3 domains there will be subdirectories
+	"mon_L3_00" and "mon_L3_01". The numerical suffix refers to the
+	L3 cache id. With SNC enabled the directory names are the same,
+	but the numerical suffix refers to the node id. Each of these
 	directories have one file per event (e.g. "llc_occupancy",
 	"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
 	files provide a read out of the current value of the event for
-- 
2.40.1

From nobody Sun Feb 8 05:09:05 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, Shuah Khan, x86@kernel.org
Cc: Shaopeng Tan, James Morse, Jamie Iles, Babu Moger, Randy Dunlap, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v4 7/7] selftests/resctrl: Adjust effective L3 cache size when SNC enabled
Date: Sat, 22 Jul 2023 12:07:40 -0700
Message-Id: <20230722190740.326190-8-tony.luck@intel.com>
In-Reply-To: <20230722190740.326190-1-tony.luck@intel.com>
References: <20230713163207.219710-1-tony.luck@intel.com> <20230722190740.326190-1-tony.luck@intel.com>

Sub-NUMA Cluster divides CPUs sharing an L3 cache into separate NUMA
nodes. Systems may support splitting into either two or four nodes.

When SNC mode is enabled the effective amount of L3 cache available
for allocation is divided by the number of nodes per L3.

Detect which SNC mode is active by comparing the number of CPUs
that share a cache with CPU0, with the number of CPUs on node0.
Reported-by: "Shaopeng Tan (Fujitsu)"
Closes: https://lore.kernel.org/r/TYAPR01MB6330B9B17686EF426D2C3F308B25A@TYAPR01MB6330.jpnprd01.prod.outlook.com
Signed-off-by: Tony Luck
---
 tools/testing/selftests/resctrl/resctrl.h   |  1 +
 tools/testing/selftests/resctrl/resctrlfs.c | 57 +++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 87e39456dee0..a8b43210b573 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index fb00245dee92..79eecbf9f863 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -130,6 +130,61 @@ int get_resource_id(int cpu_no, int *resource_id)
 	return 0;
 }
 
+/*
+ * Count number of CPUs in a /sys bit map
+ */
+static int count_sys_bitmap_bits(char *name)
+{
+	FILE *fp = fopen(name, "r");
+	int count = 0, c;
+
+	if (!fp)
+		return 0;
+
+	while ((c = fgetc(fp)) != EOF) {
+		if (!isxdigit(c))
+			continue;
+		switch (c) {
+		case 'f':
+			count++;
+		case '7': case 'b': case 'd': case 'e':
+			count++;
+		case '3': case '5': case '6': case '9': case 'a': case 'c':
+			count++;
+		case '1': case '2': case '4': case '8':
+			count++;
+		}
+	}
+	fclose(fp);
+
+	return count;
+}
+
+/*
+ * Detect SNC by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0
+ * Try to get this right, even if a few CPUs are offline so that the number
+ * of CPUs in node0 is not exactly half or a quarter of the CPUs sharing the
+ * LLC of CPU0.
+ */
+static int snc_ways(void)
+{
+	int node_cpus, cache_cpus;
+
+	node_cpus = count_sys_bitmap_bits("/sys/devices/system/node/node0/cpumap");
+	cache_cpus = count_sys_bitmap_bits("/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_map");
+
+	if (!node_cpus || !cache_cpus) {
+		fprintf(stderr, "Warning could not determine Sub-NUMA Cluster mode\n");
+		return 1;
+	}
+
+	if (4 * node_cpus >= cache_cpus)
+		return 4;
+	else if (2 * node_cpus >= cache_cpus)
+		return 2;
+	return 1;
+}
+
 /*
  * get_cache_size - Get cache size for a specified CPU
  * @cpu_no: CPU number
@@ -190,6 +245,8 @@ int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size)
 		break;
 	}
 
+	if (cache_num == 3)
+		*cache_size /= snc_ways();
 	return 0;
 }
 
-- 
2.40.1
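
The detection idea in the selftest patch above (count the CPUs on node0, count the CPUs sharing an L3 with CPU0, then infer how many nodes share one L3) can be sketched in userspace C as follows. This is a minimal illustration and not the code from the patch: count_hex_bits() and snc_ways_from_counts() are hypothetical helper names, the input is assumed to be a string already read from a sysfs cpumap file, and the "closest of 1, 2 or 4" rule is just one assumed way of tolerating a few offline CPUs.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: count set bits in a hex bitmap string such as
 * "00ff,0000ffff" (comma-separated hex words, as printed by sysfs
 * cpumap files). Non-hex characters (commas, newlines) are skipped.
 */
static int count_hex_bits(const char *s)
{
	/* Number of set bits for each hex digit value 0..15. */
	static const int nibble_bits[16] = {
		0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
	};
	int count = 0;

	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (!isxdigit(c))
			continue;
		if (isdigit(c))
			count += nibble_bits[c - '0'];
		else
			count += nibble_bits[tolower(c) - 'a' + 10];
	}
	return count;
}

/*
 * Hypothetical helper: choose the SNC mode (1, 2 or 4 nodes per L3)
 * for which node_cpus * ways is closest to cache_cpus. Returns 1 when
 * either count is unusable. This rule is an assumption of the sketch.
 */
static int snc_ways_from_counts(int node_cpus, int cache_cpus)
{
	static const int ways[] = { 1, 2, 4 };
	long best_err = -1;
	int best = 1;
	size_t i;

	if (node_cpus <= 0 || cache_cpus <= 0)
		return 1;

	for (i = 0; i < sizeof(ways) / sizeof(ways[0]); i++) {
		/* Multiply instead of dividing to stay in integers. */
		long err = (long)node_cpus * ways[i] - cache_cpus;

		if (err < 0)
			err = -err;
		if (best_err < 0 || err < best_err) {
			best_err = err;
			best = ways[i];
		}
	}
	return best;
}
```

With 16 CPUs on node0 and 64 CPUs sharing the L3, snc_ways_from_counts(16, 64) selects 4, and the effective L3 size for the test would then be the reported cache size divided by that value, mirroring the `*cache_size /= snc_ways()` step in the patch.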