[PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups

Babu Moger posted 23 patches 1 year ago
There is a newer version of this series
[PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Babu Moger 1 year ago
When mbm_cntr_assign mode is enabled, users can designate which of the MBM
events in the CTRL_MON or MON groups should have counters assigned.

Provide an interface for assigning MBM events by writing to the file:
/sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface,
events can be assigned or unassigned as needed.

Format is similar to the list format with addition of opcode for the
assignment operation.
 "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"

Format for specific type of groups:

 * Default CTRL_MON group:
         "//<domain_id><opcode><flags>"

 * Non-default CTRL_MON group:
         "<CTRL_MON group>//<domain_id><opcode><flags>"

 * Child MON group of default CTRL_MON group:
         "/<MON group>/<domain_id><opcode><flags>"

 * Child MON group of non-default CTRL_MON group:
         "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"

Domain_id '*' will apply the flags on all the domains.

Opcode can be one of the following:

 = Update the assignment to match the flags
 + Assign a new MBM event without impacting existing assignments.
 - Unassign a MBM event from currently assigned events.

Assignment flags can be one of the following:
 t  MBM total event
 l  MBM local event
 tl Both total and local MBM events
 _  None of the MBM events. Valid only with '=' opcode. This flag cannot
    be combined with other flags.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v11: Fixed the static check warning with initializing dom_id in resctrl_process_flags().

v10: Fixed the issue with finding the domain in multiple iterations.
     Printed error message with domain information when assign fails.
     Changed the variables to unsigned for processing assign state.
     Taken care of few format corrections.

v9: Fixed handling special case '//0=' and '//".
    Removed extra strstr() call.
    Added generic failure text when assignment operation fails.
    Corrected user documentation format texts.

v8: Moved unassign as the first action during the assign modification.
    Assign none "_" takes priority. Cannot be mixed with other flags.
    Updated the documentation and .rst file format. htmldoc looks ok.

v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
    Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
    Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
    Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
    Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
    Removed ABMC reference in FS code.
    Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
    Not sure if we need to change the behaviour here. Processed them sequencially right now.
    Users have the liberty to pass the flags. Restricting it might be a problem later.

v6: Added support assign all if domain id is '*'
    Fixed the allocation of counter id if it not assigned already.

v5: Interface name changed from mbm_assign_control to mbm_control.
    Fixed opcode and flags combination.
    '=_" is valid.
    "-_" amd "+_" is not valid.
    Minor message update.
    Renamed the function with prefix - rdtgroup_.
    Corrected few documentation mistakes.
    Rebase related changes after SNC support.

v4: Added domain specific assignments. Fixed the opcode parsing.

v3: New patch.
    Addresses the feedback to provide the global assignment interface.
---
 Documentation/arch/x86/resctrl.rst     | 116 +++++++++++-
 arch/x86/kernel/cpu/resctrl/internal.h |  10 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 241 ++++++++++++++++++++++++-
 3 files changed, 365 insertions(+), 2 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 3040e5c4cd76..47e15b48d951 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -356,7 +356,8 @@ with the following files:
 	 t  MBM total event is assigned.
 	 l  MBM local event is assigned.
 	 tl Both MBM total and local events are assigned.
-	 _  None of the MBM events are assigned.
+	 _  None of the MBM events are assigned. Only works with opcode '=' for write
+	    and cannot be combined with other flags.
 
 	Examples:
 	::
@@ -374,6 +375,119 @@ with the following files:
 	There are four resctrl groups. All the groups have total and local MBM events
 	assigned on domain 0 and 1.
 
+	Assignment state can be updated by writing to "mbm_assign_control".
+
+	Format is similar to the list format with addition of opcode for the
+	assignment operation.
+
+		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+	Format for each type of group:
+
+        * Default CTRL_MON group:
+                "//<domain_id><opcode><flags>"
+
+        * Non-default CTRL_MON group:
+                "<CTRL_MON group>//<domain_id><opcode><flags>"
+
+        * Child MON group of default CTRL_MON group:
+                "/<MON group>/<domain_id><opcode><flags>"
+
+        * Child MON group of non-default CTRL_MON group:
+                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+	Domain_id '*' will apply the flags to all the domains.
+
+	Opcode can be one of the following:
+	::
+
+	 = Update the assignment to match the MBM event.
+	 + Assign a new MBM event without impacting existing assignments.
+	 - Unassign a MBM event from currently assigned events.
+
+	Examples:
+	Initial group status:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
+	  //0=tl;1=tl
+	  /child_default_mon_grp/0=tl;1=tl
+
+	To update the default group to assign only total MBM event on domain 0:
+	::
+
+	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
+	  //0=t;1=tl
+	  /child_default_mon_grp/0=tl;1=tl
+
+	To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
+	::
+
+	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
+	  //0=t;1=tl
+	  /child_default_mon_grp/0=tl;1=l
+
+	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign
+	both local and total MBM events on domain 1:
+	::
+
+	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
+			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_
+	  //0=t;1=tl
+	  /child_default_mon_grp/0=tl;1=l
+
+	To update the default group to add a local MBM event domain 0:
+	::
+
+	  # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=tl;1=tl
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_
+	  //0=tl;1=tl
+	  /child_default_mon_grp/0=tl;1=l
+
+	To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the
+	MBM events on all the domains:
+	::
+
+	  # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+	Assignment status after the update:
+	::
+
+	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+	  non_default_ctrl_mon_grp//0=_;1=_
+	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_
+	  //0=tl;1=tl
+	  /child_default_mon_grp/0=tl;1=l
+
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
 		bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index aec564fa2833..377b5db66793 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -62,6 +62,16 @@
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
 #define ABMC_ENABLE_BIT			0
 
+/*
+ * Assignment flags for mbm_cntr_assign mode
+ */
+enum {
+	ASSIGN_NONE	= 0,
+	ASSIGN_TOTAL	= BIT(QOS_L3_MBM_TOTAL_EVENT_ID),
+	ASSIGN_LOCAL	= BIT(QOS_L3_MBM_LOCAL_EVENT_ID),
+	ASSIGN_INVALID,
+};
+
 /**
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6e29827239e0..299839bcf23f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static unsigned int resctrl_str_to_mon_state(char *flag)
+{
+	unsigned int i, mon_state = ASSIGN_NONE;
+
+	if (!strlen(flag))
+		return ASSIGN_INVALID;
+
+	for (i = 0; i < strlen(flag); i++) {
+		switch (*(flag + i)) {
+		case 't':
+			mon_state |= ASSIGN_TOTAL;
+			break;
+		case 'l':
+			mon_state |= ASSIGN_LOCAL;
+			break;
+		case '_':
+			return ASSIGN_NONE;
+		default:
+			return ASSIGN_INVALID;
+		}
+	}
+
+	return mon_state;
+}
+
+static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype,
+						  char *p_grp, char *c_grp)
+{
+	struct rdtgroup *rdtg, *crg;
+
+	if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
+		return &rdtgroup_default;
+	} else if (rtype == RDTCTRL_GROUP) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
+			if (!strcmp(p_grp, rdtg->kn->name))
+				return rdtg;
+	} else if (rtype == RDTMON_GROUP) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+			if (!strcmp(p_grp, rdtg->kn->name)) {
+				list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+						    mon.crdtgrp_list) {
+					if (!strcmp(c_grp, crg->kn->name))
+						return crg;
+				}
+			}
+		}
+	}
+
+	return NULL;
+}
+
+static int resctrl_process_flags(struct rdt_resource *r,
+				 enum rdt_group_type rtype,
+				 char *p_grp, char *c_grp, char *tok)
+{
+	unsigned int op, mon_state, assign_state, unassign_state;
+	char *dom_str, *id_str, *op_str;
+	struct rdt_mon_domain *d;
+	unsigned long dom_id = 0;
+	struct rdtgroup *rdtgrp;
+	char domain[10];
+	bool found;
+	int ret;
+
+	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
+
+	if (!rdtgrp) {
+		rdt_last_cmd_puts("Not a valid resctrl group\n");
+		return -EINVAL;
+	}
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+
+	op_str = strpbrk(dom_str, "=+-");
+
+	if (op_str) {
+		op = *op_str;
+	} else {
+		rdt_last_cmd_puts("Missing operation =, +, - character\n");
+		return -EINVAL;
+	}
+
+	id_str = strsep(&dom_str, "=+-");
+
+	/* Check for domain id '*' which means all domains */
+	if (id_str && *id_str == '*') {
+		d = NULL;
+		goto check_state;
+	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing domain id\n");
+		return -EINVAL;
+	}
+
+	/* Verify if the dom_id is valid */
+	found = false;
+	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		if (d->hdr.id == dom_id) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
+		return -EINVAL;
+	}
+
+check_state:
+	mon_state = resctrl_str_to_mon_state(dom_str);
+
+	if (mon_state == ASSIGN_INVALID) {
+		rdt_last_cmd_puts("Invalid assign flag\n");
+		goto out_fail;
+	}
+
+	assign_state = 0;
+	unassign_state = 0;
+
+	switch (op) {
+	case '+':
+		if (mon_state == ASSIGN_NONE) {
+			rdt_last_cmd_puts("Invalid assign opcode\n");
+			goto out_fail;
+		}
+		assign_state = mon_state;
+		break;
+	case '-':
+		if (mon_state == ASSIGN_NONE) {
+			rdt_last_cmd_puts("Invalid assign opcode\n");
+			goto out_fail;
+		}
+		unassign_state = mon_state;
+		break;
+	case '=':
+		assign_state = mon_state;
+		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
+		break;
+	default:
+		break;
+	}
+
+	if (unassign_state & ASSIGN_TOTAL) {
+		ret = resctrl_unassign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (unassign_state & ASSIGN_LOCAL) {
+		ret = resctrl_unassign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (assign_state & ASSIGN_TOTAL) {
+		ret = resctrl_assign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	if (assign_state & ASSIGN_LOCAL) {
+		ret = resctrl_assign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+		if (ret)
+			goto out_fail;
+	}
+
+	goto next;
+
+out_fail:
+	sprintf(domain, d ? "%ld" : "*", dom_id);
+
+	rdt_last_cmd_printf("Assign operation '%s%c%s' failed on the group %s/%s/\n",
+			    domain, op, dom_str, p_grp, c_grp);
+
+	return -EINVAL;
+}
+
+static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
+						char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	char *token, *cmon_grp, *mon_grp;
+	enum rdt_group_type rtype;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+		mutex_unlock(&rdtgroup_mutex);
+		cpus_read_unlock();
+		return -EINVAL;
+	}
+
+	while ((token = strsep(&buf, "\n")) != NULL) {
+		/*
+		 * The write command follows the following format:
+		 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
+		 * Extract the CTRL_MON group.
+		 */
+		cmon_grp = strsep(&token, "/");
+
+		/*
+		 * Extract the MON_GROUP.
+		 * strsep returns empty string for contiguous delimiters.
+		 * Empty mon_grp here means it is a RDTCTRL_GROUP.
+		 */
+		mon_grp = strsep(&token, "/");
+
+		if (*mon_grp == '\0')
+			rtype = RDTCTRL_GROUP;
+		else
+			rtype = RDTMON_GROUP;
+
+		ret = resctrl_process_flags(r, rtype, cmon_grp, mon_grp, token);
+		if (ret)
+			break;
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -2073,9 +2311,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_assign_control",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= resctrl_mbm_assign_control_show,
+		.write		= resctrl_mbm_assign_control_write,
 	},
 	{
 		.name		= "mbm_assign_mode",
-- 
2.34.1

Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by James Morse 11 months, 3 weeks ago
Hi Babu,

On 22/01/2025 20:20, Babu Moger wrote:
> When mbm_cntr_assign mode is enabled, users can designate which of the MBM
> events in the CTRL_MON or MON groups should have counters assigned.
> 
> Provide an interface for assigning MBM events by writing to the file:
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface,
> events can be assigned or unassigned as needed.
> 
> Format is similar to the list format with addition of opcode for the
> assignment operation.
>  "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> Format for specific type of groups:
> 
>  * Default CTRL_MON group:
>          "//<domain_id><opcode><flags>"
> 
>  * Non-default CTRL_MON group:
>          "<CTRL_MON group>//<domain_id><opcode><flags>"
> 
>  * Child MON group of default CTRL_MON group:
>          "/<MON group>/<domain_id><opcode><flags>"
> 
>  * Child MON group of non-default CTRL_MON group:
>          "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> Domain_id '*' will apply the flags on all the domains.
> 
> Opcode can be one of the following:
> 
>  = Update the assignment to match the flags
>  + Assign a new MBM event without impacting existing assignments.
>  - Unassign a MBM event from currently assigned events.
> 
> Assignment flags can be one of the following:
>  t  MBM total event
>  l  MBM local event
>  tl Both total and local MBM events
>  _  None of the MBM events. Valid only with '=' opcode. This flag cannot
>     be combined with other flags.

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6e29827239e0..299839bcf23f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,

> +static int resctrl_process_flags(struct rdt_resource *r,
> +				 enum rdt_group_type rtype,
> +				 char *p_grp, char *c_grp, char *tok)
> +{
> +	unsigned int op, mon_state, assign_state, unassign_state;
> +	char *dom_str, *id_str, *op_str;
> +	struct rdt_mon_domain *d;
> +	unsigned long dom_id = 0;
> +	struct rdtgroup *rdtgrp;
> +	char domain[10];
> +	bool found;
> +	int ret;
> +
> +	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
> +
> +	if (!rdtgrp) {
> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
> +		return -EINVAL;
> +	}
> +
> +next:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each domain */
> +	dom_str = strim(strsep(&tok, ";"));
> +
> +	op_str = strpbrk(dom_str, "=+-");
> +
> +	if (op_str) {
> +		op = *op_str;
> +	} else {
> +		rdt_last_cmd_puts("Missing operation =, +, - character\n");
> +		return -EINVAL;
> +	}
> +
> +	id_str = strsep(&dom_str, "=+-");
> +
> +	/* Check for domain id '*' which means all domains */
> +	if (id_str && *id_str == '*') {
> +		d = NULL;
> +		goto check_state;
> +	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> +		rdt_last_cmd_puts("Missing domain id\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Verify if the dom_id is valid */
> +	found = false;
> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +		if (d->hdr.id == dom_id) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (!found) {
> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
> +		return -EINVAL;
> +	}
> +
> +check_state:
> +	mon_state = resctrl_str_to_mon_state(dom_str);
> +
> +	if (mon_state == ASSIGN_INVALID) {
> +		rdt_last_cmd_puts("Invalid assign flag\n");
> +		goto out_fail;
> +	}
> +
> +	assign_state = 0;
> +	unassign_state = 0;
> +
> +	switch (op) {
> +	case '+':
> +		if (mon_state == ASSIGN_NONE) {
> +			rdt_last_cmd_puts("Invalid assign opcode\n");
> +			goto out_fail;
> +		}
> +		assign_state = mon_state;
> +		break;
> +	case '-':
> +		if (mon_state == ASSIGN_NONE) {
> +			rdt_last_cmd_puts("Invalid assign opcode\n");
> +			goto out_fail;
> +		}
> +		unassign_state = mon_state;
> +		break;
> +	case '=':
> +		assign_state = mon_state;
> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
> +		break;
> +	default:
> +		break;
> +	}


> +	if (unassign_state & ASSIGN_TOTAL) {
> +		ret = resctrl_unassign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (unassign_state & ASSIGN_LOCAL) {
> +		ret = resctrl_unassign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (assign_state & ASSIGN_TOTAL) {
> +		ret = resctrl_assign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}
> +
> +	if (assign_state & ASSIGN_LOCAL) {
> +		ret = resctrl_assign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> +		if (ret)
> +			goto out_fail;
> +	}

This sequence of if's allows the helpers to be called on platforms that doesn't support
both local and total. Could we reject such misconfiguration here in the parsing code?
You have these checks in rdtgroup_assign_cntrs() added in patch 17.


What do you think to trying to group these four by event type, and passing the event type
in as an argument? ... it ends up with a helper that takes a large number of arguments,
(both assign_state and unassign_state), but there is less repetition...


Thanks,

James

> +	goto next;
> +
> +out_fail:
> +	sprintf(domain, d ? "%ld" : "*", dom_id);
> +
> +	rdt_last_cmd_printf("Assign operation '%s%c%s' failed on the group %s/%s/\n",
> +			    domain, op, dom_str, p_grp, c_grp);
> +
> +	return -EINVAL;
> +}
> +
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Moger, Babu 11 months, 3 weeks ago
Hi James,

On 2/21/25 12:07, James Morse wrote:
> Hi Babu,
> 
> On 22/01/2025 20:20, Babu Moger wrote:
>> When mbm_cntr_assign mode is enabled, users can designate which of the MBM
>> events in the CTRL_MON or MON groups should have counters assigned.
>>
>> Provide an interface for assigning MBM events by writing to the file:
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface,
>> events can be assigned or unassigned as needed.
>>
>> Format is similar to the list format with addition of opcode for the
>> assignment operation.
>>  "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Format for specific type of groups:
>>
>>  * Default CTRL_MON group:
>>          "//<domain_id><opcode><flags>"
>>
>>  * Non-default CTRL_MON group:
>>          "<CTRL_MON group>//<domain_id><opcode><flags>"
>>
>>  * Child MON group of default CTRL_MON group:
>>          "/<MON group>/<domain_id><opcode><flags>"
>>
>>  * Child MON group of non-default CTRL_MON group:
>>          "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Domain_id '*' will apply the flags on all the domains.
>>
>> Opcode can be one of the following:
>>
>>  = Update the assignment to match the flags
>>  + Assign a new MBM event without impacting existing assignments.
>>  - Unassign a MBM event from currently assigned events.
>>
>> Assignment flags can be one of the following:
>>  t  MBM total event
>>  l  MBM local event
>>  tl Both total and local MBM events
>>  _  None of the MBM events. Valid only with '=' opcode. This flag cannot
>>     be combined with other flags.
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6e29827239e0..299839bcf23f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
> 
>> +static int resctrl_process_flags(struct rdt_resource *r,
>> +				 enum rdt_group_type rtype,
>> +				 char *p_grp, char *c_grp, char *tok)
>> +{
>> +	unsigned int op, mon_state, assign_state, unassign_state;
>> +	char *dom_str, *id_str, *op_str;
>> +	struct rdt_mon_domain *d;
>> +	unsigned long dom_id = 0;
>> +	struct rdtgroup *rdtgrp;
>> +	char domain[10];
>> +	bool found;
>> +	int ret;
>> +
>> +	rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
>> +
>> +	if (!rdtgrp) {
>> +		rdt_last_cmd_puts("Not a valid resctrl group\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +next:
>> +	if (!tok || tok[0] == '\0')
>> +		return 0;
>> +
>> +	/* Start processing the strings for each domain */
>> +	dom_str = strim(strsep(&tok, ";"));
>> +
>> +	op_str = strpbrk(dom_str, "=+-");
>> +
>> +	if (op_str) {
>> +		op = *op_str;
>> +	} else {
>> +		rdt_last_cmd_puts("Missing operation =, +, - character\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	id_str = strsep(&dom_str, "=+-");
>> +
>> +	/* Check for domain id '*' which means all domains */
>> +	if (id_str && *id_str == '*') {
>> +		d = NULL;
>> +		goto check_state;
>> +	} else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
>> +		rdt_last_cmd_puts("Missing domain id\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Verify if the dom_id is valid */
>> +	found = false;
>> +	list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> +		if (d->hdr.id == dom_id) {
>> +			found = true;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!found) {
>> +		rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
>> +		return -EINVAL;
>> +	}
>> +
>> +check_state:
>> +	mon_state = resctrl_str_to_mon_state(dom_str);
>> +
>> +	if (mon_state == ASSIGN_INVALID) {
>> +		rdt_last_cmd_puts("Invalid assign flag\n");
>> +		goto out_fail;
>> +	}
>> +
>> +	assign_state = 0;
>> +	unassign_state = 0;
>> +
>> +	switch (op) {
>> +	case '+':
>> +		if (mon_state == ASSIGN_NONE) {
>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>> +			goto out_fail;
>> +		}
>> +		assign_state = mon_state;
>> +		break;
>> +	case '-':
>> +		if (mon_state == ASSIGN_NONE) {
>> +			rdt_last_cmd_puts("Invalid assign opcode\n");
>> +			goto out_fail;
>> +		}
>> +		unassign_state = mon_state;
>> +		break;
>> +	case '=':
>> +		assign_state = mon_state;
>> +		unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
>> +		break;
>> +	default:
>> +		break;
>> +	}
> 
> 
>> +	if (unassign_state & ASSIGN_TOTAL) {
>> +		ret = resctrl_unassign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (unassign_state & ASSIGN_LOCAL) {
>> +		ret = resctrl_unassign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (assign_state & ASSIGN_TOTAL) {
>> +		ret = resctrl_assign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
>> +
>> +	if (assign_state & ASSIGN_LOCAL) {
>> +		ret = resctrl_assign_cntr_event(r, d, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +		if (ret)
>> +			goto out_fail;
>> +	}
> 
> This sequence of if's allows the helpers to be called on platforms that doesn't support
> both local and total. Could we reject such misconfiguration here in the parsing code?
> You have these checks in rdtgroup_assign_cntrs() added in patch 17.
> 

Yes. Added the check is_mbm_total_enabled() and is_mbm_local_enabled().

> 
> What do you think to trying to group these four by event type, and passing the event type
> in as an argument? ... it ends up with a helper that takes a large number of arguments,
> (both assign_state and unassign_state), but there is less repetition...

Added a new helper function resctrl_process_cntr_event(). I could not
avoid the repetitions.



> 
> 
> Thanks,
> 
> James
> 
>> +	goto next;
>> +
>> +out_fail:
>> +	sprintf(domain, d ? "%ld" : "*", dom_id);
>> +
>> +	rdt_last_cmd_printf("Assign operation '%s%c%s' failed on the group %s/%s/\n",
>> +			    domain, op, dom_str, p_grp, c_grp);
>> +
>> +	return -EINVAL;
>> +}
>> +
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Dave Martin 11 months, 3 weeks ago
Hi,

On Wed, Jan 22, 2025 at 02:20:31PM -0600, Babu Moger wrote:
> When mbm_cntr_assign mode is enabled, users can designate which of the MBM
> events in the CTRL_MON or MON groups should have counters assigned.
> 
> Provide an interface for assigning MBM events by writing to the file:
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface,
> events can be assigned or unassigned as needed.
> 
> Format is similar to the list format with addition of opcode for the
> assignment operation.
>  "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"

[...]

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6e29827239e0..299839bcf23f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,

[...]

> +static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
> +						char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	char *token, *cmon_grp, *mon_grp;
> +	enum rdt_group_type rtype;
> +	int ret;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
> +		mutex_unlock(&rdtgroup_mutex);
> +		cpus_read_unlock();
> +		return -EINVAL;
> +	}
> +
> +	while ((token = strsep(&buf, "\n")) != NULL) {
> +		/*
> +		 * The write command follows the following format:
> +		 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
> +		 * Extract the CTRL_MON group.
> +		 */
> +		cmon_grp = strsep(&token, "/");
> +

As when reading this file, I think that the data can grow larger than a
page and get split into multiple write() calls.

I don't currently think the file needs to be redesigned, but there are
some concerns about how userspace will work with it that need to be
sorted out.

Every monitoring group can contribute a line to this file:

	CTRL_GROUP / MON_GROUP / DOMAIN = [t][l] [ ; DOMAIN = [t][l] ]* LF

so, 2 * (NAME_MAX + 1) + NUM_DOMAINS * 5 - 1 + 1

NAME_MAX on Linux is 255, so with, say, up to 16 domains, that's about
600 bytes per monitoring group in the worst case.

We don't need to have many control and monitoring groups for this to
grow potentially over 4K.


We could simply place a limit on how much userspace is allowed to write
to this file in one go, although this restriction feels difficult for
userspace to follow -- but maybe this is workable in the short term, on
current systems (?)

Otherwise, since we expect this interface to be written using scripting
languages, I think we need to be prepared to accept fully-buffered
I/O.  That means that the data may be cut at random places, not
necessarily at newlines.  (For smaller files such as schemata this is
not such an issue, since the whole file is likely to be small enough to
fit into the default stdio buffers -- this is how sysfs gets away with
it IIUC.)

For fully-buffered I/O, we may have to cache an incomplete line in
between write() calls.  If there is a dangling incomplete line when the
file is closed then it is hard to tell userspace, because people often
don't bother to check the return value of close(), fclose() etc.
However, since it's an ABI violation for userspace to end this file
with a partial line, I think it's sufficient to report that via
last_cmd_status.  (Making close() return -EIO still seems a good idea
though, just in case userspace is listening.)

I hacked up something a bit like this so that schemata could be written
interactively from the shell, so I can try to port that onto this series
as an illustration, if it helps.

Cheers
---Dave
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Moger, Babu 11 months, 3 weeks ago
Hi Dave,

On 2/19/2025 10:07 AM, Dave Martin wrote:
> Hi,
> 
> On Wed, Jan 22, 2025 at 02:20:31PM -0600, Babu Moger wrote:
>> When mbm_cntr_assign mode is enabled, users can designate which of the MBM
>> events in the CTRL_MON or MON groups should have counters assigned.
>>
>> Provide an interface for assigning MBM events by writing to the file:
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface,
>> events can be assigned or unassigned as needed.
>>
>> Format is similar to the list format with addition of opcode for the
>> assignment operation.
>>   "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> [...]
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6e29827239e0..299839bcf23f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
> 
> [...]
> 
>> +static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
>> +						char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +	char *token, *cmon_grp, *mon_grp;
>> +	enum rdt_group_type rtype;
>> +	int ret;
>> +
>> +	/* Valid input requires a trailing newline */
>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +		return -EINVAL;
>> +
>> +	buf[nbytes - 1] = '\0';
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>> +		mutex_unlock(&rdtgroup_mutex);
>> +		cpus_read_unlock();
>> +		return -EINVAL;
>> +	}
>> +
>> +	while ((token = strsep(&buf, "\n")) != NULL) {
>> +		/*
>> +		 * The write command follows the following format:
>> +		 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
>> +		 * Extract the CTRL_MON group.
>> +		 */
>> +		cmon_grp = strsep(&token, "/");
>> +
> 
> As when reading this file, I think that the data can grow larger than a
> page and get split into multiple write() calls.
> 
> I don't currently think the file needs to be redesigned, but there are
> some concerns about how userspace will work with it that need to be
> sorted out.
> 
> Every monitoring group can contribute a line to this file:
> 
> 	CTRL_GROUP / MON_GROUP / DOMAIN = [t][l] [ ; DOMAIN = [t][l] ]* LF
> 
> so, 2 * (NAME_MAX + 1) + NUM_DOMAINS * 5 - 1 + 1
> 
> NAME_MAX on Linux is 255, so with, say, up to 16 domains, that's about
> 600 bytes per monitoring group in the worst case.
> 
> We don't need to have many control and monitoring groups for this to
> grow potentially over 4K.
> 
> 
> We could simply place a limit on how much userspace is allowed to write
> to this file in one go, although this restriction feels difficult for
> userspace to follow -- but maybe this is workable in the short term, on
> current systems (?)
> 
> Otherwise, since we expect this interface to be written using scripting
> languages, I think we need to be prepared to accept fully-buffered
> I/O.  That means that the data may be cut at random places, not
> necessarily at newlines.  (For smaller files such as schemata this is
> not such an issue, since the whole file is likely to be small enough to
> fit into the default stdio buffers -- this is how sysfs gets away with
> it IIUC.)
> 
> For fully-buffered I/O, we may have to cache an incomplete line in
> between write() calls.  If there is a dangling incomplete line when the
> file is closed then it is hard to tell userspace, because people often
> don't bother to check the return value of close(), fclose() etc.
> However, since it's an ABI violation for userspace to end this file
> with a partial line, I think it's sufficient to report that via
> last_cmd_status.  (Making close() return -EIO still seems a good idea
> though, just in case userspace is listening.)

Seems like we can add a check in resctrl_mbm_assign_control_write() to 
compare nbytes > PAGE_SIZE.

But do we really need this? I have no way of testing this. Help me 
understand.

All these file operations go thru generic call kernfs_fop_write_iter(). 
Doesn't it take care of buffer check and overflow?


> 
> I hacked up something a bit like this so that schemata could be written
> interactively from the shell, so I can try to port that onto this series
> as an illustration, if it helps.
> 
> Cheers
> ---Dave
> 

Thanks
Babu
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Dave Martin 11 months, 3 weeks ago
Hi,

On Wed, Feb 19, 2025 at 06:34:42PM -0600, Moger, Babu wrote:
> Hi Dave,
> 
> On 2/19/2025 10:07 AM, Dave Martin wrote:
> > Hi,
> > 
> > On Wed, Jan 22, 2025 at 02:20:31PM -0600, Babu Moger wrote:

> > [...]
> > 
> > > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > > index 6e29827239e0..299839bcf23f 100644
> > > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > > @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
> > 
> > [...]
> > 
> > > +static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
> > > +						char *buf, size_t nbytes, loff_t off)
> > > +{

[...]

> > > +	while ((token = strsep(&buf, "\n")) != NULL) {
> > > +		/*
> > > +		 * The write command follows the following format:
> > > +		 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
> > > +		 * Extract the CTRL_MON group.
> > > +		 */
> > > +		cmon_grp = strsep(&token, "/");
> > > +
> > 
> > As when reading this file, I think that the data can grow larger than a
> > page and get split into multiple write() calls.
> > 
> > I don't currently think the file needs to be redesigned, but there are
> > some concerns about how userspace will work with it that need to be
> > sorted out.
> > 
> > Every monitoring group can contribute a line to this file:
> > 
> > 	CTRL_GROUP / MON_GROUP / DOMAIN = [t][l] [ ; DOMAIN = [t][l] ]* LF
> > 
> > so, 2 * (NAME_MAX + 1) + NUM_DOMAINS * 5 - 1 + 1
> > 
> > NAME_MAX on Linux is 255, so with, say, up to 16 domains, that's about
> > 600 bytes per monitoring group in the worst case.
> > 
> > We don't need to have many control and monitoring groups for this to
> > grow potentially over 4K.
> > 
> > 
> > We could simply place a limit on how much userspace is allowed to write
> > to this file in one go, although this restriction feels difficult for
> > userspace to follow -- but maybe this is workable in the short term, on
> > current systems (?)
> > 
> > Otherwise, since we expect this interface to be written using scripting
> > languages, I think we need to be prepared to accept fully-buffered
> > I/O.  That means that the data may be cut at random places, not
> > necessarily at newlines.  (For smaller files such as schemata this is
> > not such an issue, since the whole file is likely to be small enough to
> > fit into the default stdio buffers -- this is how sysfs gets away with
> > it IIUC.)
> > 
> > For fully-buffered I/O, we may have to cache an incomplete line in
> > between write() calls.  If there is a dangling incomplete line when the
> > file is closed then it is hard to tell userspace, because people often
> > don't bother to check the return value of close(), fclose() etc.
> > However, since it's an ABI violation for userspace to end this file
> > with a partial line, I think it's sufficient to report that via
> > last_cmd_status.  (Making close() return -EIO still seems a good idea
> > though, just in case userspace is listening.)
> 
> Seems like we can add a check in resctrl_mbm_assign_control_write() to
> compare nbytes > PAGE_SIZE.

This might be a reasonable stopgap approach, if we are confident that the
number of RMIDs and monitoring domains is small enough on known
platforms that the problem is unlikely to be hit.  I can't really judge
on this.

> But do we really need this? I have no way of testing this. Help me
> understand.

It's easy to demonatrate this using the schemata file (which works in a
similar way).  Open f in /sys/fs/resctrl/schemata, then:

	int n = 0;

	for (n = 0; n < 1000; n++)
		if (fputs("MB:0=100;1=100\n", f) == EOF)
			fprintf(stderr, "Failed on interation %d\n", n);

This will succeed a certain number of times (272, for me) and then fail
when the stdio buffer for f overflows, triggering a write().

Putting an explicit fflush() after every fputs() call (or doing a
setlinebuf(f) before the loop) makes it work.  But this is awkward and
unexpected for the user, and doing the right thing from a scripting
language may be tricky.

In this example I am doing something a bit artificial -- we don't
officially say what happens when a pre-opened schemata file handle is
reused in this way, AFAICT.  But for mbm_assign_control it is
legitimate to write many lines, and we can hit this kind of problem.


I'll leave it to others to judge whether we _need_ to fix this, but it
feels like a problem waiting to happen.


> All these file operations go thru generic call kernfs_fop_write_iter().
> Doesn't it take care of buffer check and overflow?

No, this is called for each iovec segment (where userspace used one of
the iovec based I/O syscalls).  But there is no buffering or
concatenation of the data read in: each segment gets passed down to the
individual kernfs_file_operations write method for the file:

	len = ops->write(of, buf, len, iocb->ki_pos)

calls down to

	resctrl_mbm_assign_control_write(of, buf, len, iocb->ki_pos).


I'll try to port my buffering hack on top of the series -- that should
help to illustrate what I mean.

Cheers
---Dave
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Moger, Babu 11 months, 3 weeks ago
Hi Dave,

On 2/20/25 09:21, Dave Martin wrote:
> Hi,
> 
> On Wed, Feb 19, 2025 at 06:34:42PM -0600, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 2/19/2025 10:07 AM, Dave Martin wrote:
>>> Hi,
>>>
>>> On Wed, Jan 22, 2025 at 02:20:31PM -0600, Babu Moger wrote:
> 
>>> [...]
>>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 6e29827239e0..299839bcf23f 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
>>>
>>> [...]
>>>
>>>> +static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
>>>> +						char *buf, size_t nbytes, loff_t off)
>>>> +{
> 
> [...]
> 
>>>> +	while ((token = strsep(&buf, "\n")) != NULL) {
>>>> +		/*
>>>> +		 * The write command follows the following format:
>>>> +		 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
>>>> +		 * Extract the CTRL_MON group.
>>>> +		 */
>>>> +		cmon_grp = strsep(&token, "/");
>>>> +
>>>
>>> As when reading this file, I think that the data can grow larger than a
>>> page and get split into multiple write() calls.
>>>
>>> I don't currently think the file needs to be redesigned, but there are
>>> some concerns about how userspace will work with it that need to be
>>> sorted out.
>>>
>>> Every monitoring group can contribute a line to this file:
>>>
>>> 	CTRL_GROUP / MON_GROUP / DOMAIN = [t][l] [ ; DOMAIN = [t][l] ]* LF
>>>
>>> so, 2 * (NAME_MAX + 1) + NUM_DOMAINS * 5 - 1 + 1
>>>
>>> NAME_MAX on Linux is 255, so with, say, up to 16 domains, that's about
>>> 600 bytes per monitoring group in the worst case.
>>>
>>> We don't need to have many control and monitoring groups for this to
>>> grow potentially over 4K.
>>>
>>>
>>> We could simply place a limit on how much userspace is allowed to write
>>> to this file in one go, although this restriction feels difficult for
>>> userspace to follow -- but maybe this is workable in the short term, on
>>> current systems (?)
>>>
>>> Otherwise, since we expect this interface to be written using scripting
>>> languages, I think we need to be prepared to accept fully-buffered
>>> I/O.  That means that the data may be cut at random places, not
>>> necessarily at newlines.  (For smaller files such as schemata this is
>>> not such an issue, since the whole file is likely to be small enough to
>>> fit into the default stdio buffers -- this is how sysfs gets away with
>>> it IIUC.)
>>>
>>> For fully-buffered I/O, we may have to cache an incomplete line in
>>> between write() calls.  If there is a dangling incomplete line when the
>>> file is closed then it is hard to tell userspace, because people often
>>> don't bother to check the return value of close(), fclose() etc.
>>> However, since it's an ABI violation for userspace to end this file
>>> with a partial line, I think it's sufficient to report that via
>>> last_cmd_status.  (Making close() return -EIO still seems a good idea
>>> though, just in case userspace is listening.)
>>
>> Seems like we can add a check in resctrl_mbm_assign_control_write() to
>> compare nbytes > PAGE_SIZE.
> 
> This might be a reasonable stopgap approach, if we are confident that the
> number of RMIDs and monitoring domains is small enough on known
> platforms that the problem is unlikely to be hit.  I can't really judge
> on this.
> 
>> But do we really need this? I have no way of testing this. Help me
>> understand.
> 
> It's easy to demonatrate this using the schemata file (which works in a
> similar way).  Open f in /sys/fs/resctrl/schemata, then:
> 
> 	int n = 0;
> 
> 	for (n = 0; n < 1000; n++)
> 		if (fputs("MB:0=100;1=100\n", f) == EOF)
> 			fprintf(stderr, "Failed on interation %d\n", n);
> 
> This will succeed a certain number of times (272, for me) and then fail
> when the stdio buffer for f overflows, triggering a write().
> 
> Putting an explicit fflush() after every fputs() call (or doing a
> setlinebuf(f) before the loop) makes it work.  But this is awkward and
> unexpected for the user, and doing the right thing from a scripting
> language may be tricky.
> 
> In this example I am doing something a bit artificial -- we don't
> officially say what happens when a pre-opened schemata file handle is
> reused in this way, AFAICT.  But for mbm_assign_control it is
> legitimate to write many lines, and we can hit this kind of problem.
> 
> 
> I'll leave it to others to judge whether we _need_ to fix this, but it
> feels like a problem waiting to happen.

Created the problem using this code using a "test" group.

include <stdio.h>
#include <errno.h>
#include <string.h>

int main()
{
        FILE *file;
        int n;

        file = fopen("/sys/fs/resctrl/info/L3_MON/mbm_assign_control", "w");

        if (file == NULL) {
                printf("Error opening file!\n");
                return 1;
        }

        printf("File opened successfully.\n");

        for (n = 0; n < 100; n++)
                if
(fputs("test//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;9=tl;10=tl;11=tl\n", file) == EOF)
                        fprintf(stderr, "Failed on interation %d error
%s\n ", n, strerror(errno));

        if (fclose(file) == 0) {
                printf("File closed successfully.\n");
        } else {
                printf("Error closing file!\n");
        }
}


When the buffer overflow happens the newline will not be there. I have
added this error via rdt_last_cmd_puts. At least user knows there is an error.

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 484d6009869f..70a96976e3ab 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1250,8 +1252,10 @@ static ssize_t
resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
        int ret;

        /* Valid input requires a trailing newline */
-       if (nbytes == 0 || buf[nbytes - 1] != '\n')
+       if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+               rdt_last_cmd_puts("mbm_cntr_assign: buffer invalid\n");
                return -EINVAL;
+       }

        buf[nbytes - 1] = '\0';



I am open to other ideas to handle this case.


> 
> 
>> All these file operations go thru generic call kernfs_fop_write_iter().
>> Doesn't it take care of buffer check and overflow?
> 
> No, this is called for each iovec segment (where userspace used one of
> the iovec based I/O syscalls).  But there is no buffering or
> concatenation of the data read in: each segment gets passed down to the
> individual kernfs_file_operations write method for the file:
> 
> 	len = ops->write(of, buf, len, iocb->ki_pos)
> 
> calls down to
> 
> 	resctrl_mbm_assign_control_write(of, buf, len, iocb->ki_pos).
> 
> 
> I'll try to port my buffering hack on top of the series -- that should
> help to illustrate what I mean.
> 
> Cheers
> ---Dave
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Dave Martin 11 months, 3 weeks ago
Hi,

On Thu, Feb 20, 2025 at 02:57:31PM -0600, Moger, Babu wrote:
> Hi Dave,

[...]

> Created the problem using this code using a "test" group.
> 
> include <stdio.h>
> #include <errno.h>
> #include <string.h>
> 
> int main()
> {
>         FILE *file;
>         int n;
> 
>         file = fopen("/sys/fs/resctrl/info/L3_MON/mbm_assign_control", "w");
> 
>         if (file == NULL) {
>                 printf("Error opening file!\n");
>                 return 1;
>         }
> 
>         printf("File opened successfully.\n");
> 
>         for (n = 0; n < 100; n++)
>                 if
> (fputs("test//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;9=tl;10=tl;11=tl\n", file) == EOF)
>                         fprintf(stderr, "Failed on interation %d error
> %s\n ", n, strerror(errno));
> 
>         if (fclose(file) == 0) {
>                 printf("File closed successfully.\n");
>         } else {
>                 printf("Error closing file!\n");
>         }
> }

Right.

> When the buffer overflow happens the newline will not be there. I have
> added this error via rdt_last_cmd_puts. At least user knows there is an error.
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 484d6009869f..70a96976e3ab 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1250,8 +1252,10 @@ static ssize_t
> resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
>         int ret;
> 
>         /* Valid input requires a trailing newline */
> -       if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +       if (nbytes == 0 || buf[nbytes - 1] != '\n') {
> +               rdt_last_cmd_puts("mbm_cntr_assign: buffer invalid\n");
>                 return -EINVAL;
> +       }
> 
>         buf[nbytes - 1] = '\0';
> 
> 
> 
> I am open to other ideas to handle this case.

Reinette, what do you think about this as a stopgap approach?

The worst that happens is that userspace gets an unexpected failure in
scenarios that seem unlikely in the near future (i.e., where there are
a lot of RMIDs available, and at the same time groups have been given
stupidly long names).

Since this is an implementation issue rather than an interface issue,
we could fix it later on.


Longer term, we may want to define some stuff along the lines of

	struct rdtgroup_file {
		/* persistent data for an rdtgroup open file instance */
	};

	static int rdtgroup_file_open(struct kernfs_open_file *of)
	{
		struct rdtgroup_file *rf;

		rf = kzalloc(sizeof(*rf), GFP_KERNEL);
		if (!rf)
			return -ENOMEM;

		of->priv;
	}

	static void rdtgroup_file_release(struct kernfs_open_file *of)
	{
		/*
		 * Deal with dangling data and do cleanup appropriate
		 * for whatever kind of file this is, then:
		 */
		kfree(of->priv);
	}


Then we'd have somewhere to stash data that needs to be carried over
from one read/write call to the next.

I tried to port my schemata buffering hack over, but the requirements
are not exactly the same as for mbm_assign_control, so it wasn't
trivial.  It feels do-able, but it might be better to stabilise this
series before going down that road.

(I'm happy to spend some time trying to wire this up if it would be
useful, though.)

Cheers
---Dave
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Dave,

On 2/21/25 7:53 AM, Dave Martin wrote:
> Hi,
> 
> On Thu, Feb 20, 2025 at 02:57:31PM -0600, Moger, Babu wrote:
>> Hi Dave,
> 
> [...]
> 
>> Created the problem using this code using a "test" group.
>>
>> include <stdio.h>
>> #include <errno.h>
>> #include <string.h>
>>
>> int main()
>> {
>>         FILE *file;
>>         int n;
>>
>>         file = fopen("/sys/fs/resctrl/info/L3_MON/mbm_assign_control", "w");
>>
>>         if (file == NULL) {
>>                 printf("Error opening file!\n");
>>                 return 1;
>>         }
>>
>>         printf("File opened successfully.\n");
>>
>>         for (n = 0; n < 100; n++)
>>                 if
>> (fputs("test//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;9=tl;10=tl;11=tl\n", file) == EOF)
>>                         fprintf(stderr, "Failed on interation %d error
>> %s\n ", n, strerror(errno));
>>
>>         if (fclose(file) == 0) {
>>                 printf("File closed successfully.\n");
>>         } else {
>>                 printf("Error closing file!\n");
>>         }
>> }
> 
> Right.
> 
>> When the buffer overflow happens the newline will not be there. I have
>> added this error via rdt_last_cmd_puts. At least user knows there is an error.
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 484d6009869f..70a96976e3ab 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1250,8 +1252,10 @@ static ssize_t
>> resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
>>         int ret;
>>
>>         /* Valid input requires a trailing newline */
>> -       if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +       if (nbytes == 0 || buf[nbytes - 1] != '\n') {
>> +               rdt_last_cmd_puts("mbm_cntr_assign: buffer invalid\n");
>>                 return -EINVAL;
>> +       }
>>
>>         buf[nbytes - 1] = '\0';
>>
>>
>>
>> I am open to other ideas to handle this case.
> 
> Reinette, what do you think about this as a stopgap approach?

This seems fair. I expect that we need to document somewhere that writes
"may" (to leave wiggle room for improvements) require page sized writes. 

> 
> The worst that happens is that userspace gets an unexpected failure in
> scenarios that seem unlikely in the near future (i.e., where there are
> a lot of RMIDs available, and at the same time groups have been given
> stupidly long names).
> 
> Since this is an implementation issue rather than an interface issue,
> we could fix it later on.
> 
> 
> Longer term, we may want to define some stuff along the lines of
> 
> 	struct rdtgroup_file {
> 		/* persistent data for an rdtgroup open file instance */
> 	};
> 
> 	static int rdtgroup_file_open(struct kernfs_open_file *of)
> 	{
> 		struct rdtgroup_file *rf;
> 
> 		rf = kzalloc(sizeof(*rf), GFP_KERNEL);
> 		if (!rf)
> 			return -ENOMEM;
> 
> 		of->priv;
> 	}
> 
> 	static void rdtgroup_file_release(struct kernfs_open_file *of)
> 	{
> 		/*
> 		 * Deal with dangling data and do cleanup appropriate
> 		 * for whatever kind of file this is, then:
> 		 */
> 		kfree(of->priv);
> 	}
> 
> 
> Then we'd have somewhere to stash data that needs to be carried over
> from one read/write call to the next.

Something like this seems needed for reading from this file. 

> 
> I tried to port my schemata buffering hack over, but the requirements
> are not exactly the same as for mbm_assign_control, so it wasn't
> trivial.  It feels do-able, but it might be better to stabilise this
> series before going down that road.
> 
> (I'm happy to spend some time trying to wire this up if it would be
> useful, though.)

I was hoping that we would not need to re-invent something here. This does
not seem like a new problem.

Reinette
RE: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Luck, Tony 11 months, 3 weeks ago
> I hacked up something a bit like this so that schemata could be written
> interactively from the shell, so I can try to port that onto this series
> as an illustration, if it helps.

Note that schemata will accept writes that just change the bits you want to change.

So from the shell:

# cat schemata
MB:0=100;1=100
L3:0=fff;1=fff

# echo "MB:1=90" > schemata

# cat schemata
MB:0=100;1= 90
L3:0=fff;1=fff

-Tony

Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Dave Martin 11 months, 3 weeks ago
[Dropped Cc: fenghua.yu@intel.com <fenghua.yu@intel.com> (bounces)]

On Wed, Feb 19, 2025 at 05:43:43PM +0000, Luck, Tony wrote:
> > I hacked up something a bit like this so that schemata could be written
> > interactively from the shell, so I can try to port that onto this series
> > as an illustration, if it helps.
> 
> Note that schemata will accept writes that just change the bits you want to change.
> 
> So from the shell:
> 
> # cat schemata
> MB:0=100;1=100
> L3:0=fff;1=fff
> 
> # echo "MB:1=90" > schemata
> 
> # cat schemata
> MB:0=100;1= 90
> L3:0=fff;1=fff
> 
> -Tony
> 

Yes, but not:

# {
	p=:
	echo -n MB
	for ((d = 0; d < 2; d++)); do
		echo -n "$p$d=100"
		p=';'
	done
	echo
  } >schemata

(Or at least, it depends on the shell.  Each simple command that
generates output can result in a separate write() call -- certainly
there is no guarantee that it won't.)

Doing the same thing from C will "work", because by default I/O on the
schemata file will be fully buffered in userspace... unless the whole
output exceeds the default buffer size.

The difference from sysfs here is that it would be insane to write a
small, single formatted value in pieces when it is natural to generate
it from a single format specifier -- whereas the syntax of some of
resctrl's files has a multilevel internal structure that has to be
built up in a piecemeal fashion (whether or not it is written to the
file in one go).


I'm not saying that this is an issue for realistic uses though, and
anyway, the schemata file is nothing to do with this series.

Cheers
---Dave
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Reinette Chatre 1 year ago
Hi Babu,

On 1/22/25 12:20 PM, Babu Moger wrote:
> When mbm_cntr_assign mode is enabled, users can designate which of the MBM
> events in the CTRL_MON or MON groups should have counters assigned.
> 
> Provide an interface for assigning MBM events by writing to the file:
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface,
> events can be assigned or unassigned as needed.
> 
> Format is similar to the list format with addition of opcode for the
> assignment operation.
>  "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> Format for specific type of groups:
> 
>  * Default CTRL_MON group:
>          "//<domain_id><opcode><flags>"
> 
>  * Non-default CTRL_MON group:
>          "<CTRL_MON group>//<domain_id><opcode><flags>"
> 
>  * Child MON group of default CTRL_MON group:
>          "/<MON group>/<domain_id><opcode><flags>"
> 
>  * Child MON group of non-default CTRL_MON group:
>          "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> Domain_id '*' will apply the flags on all the domains.
> 
> Opcode can be one of the following:
> 
>  = Update the assignment to match the flags
>  + Assign a new MBM event without impacting existing assignments.
>  - Unassign a MBM event from currently assigned events.
> 
> Assignment flags can be one of the following:
>  t  MBM total event
>  l  MBM local event
>  tl Both total and local MBM events
>  _  None of the MBM events. Valid only with '=' opcode. This flag cannot
>     be combined with other flags.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v11: Fixed the static check warning with initializing dom_id in resctrl_process_flags().
> 
> v10: Fixed the issue with finding the domain in multiple iterations.
>      Printed error message with domain information when assign fails.
>      Changed the variables to unsigned for processing assign state.
>      Taken care of few format corrections.
> 
> v9: Fixed handling special case '//0=' and '//".
>     Removed extra strstr() call.
>     Added generic failure text when assignment operation fails.
>     Corrected user documentation format texts.
> 
> v8: Moved unassign as the first action during the assign modification.
>     Assign none "_" takes priority. Cannot be mixed with other flags.
>     Updated the documentation and .rst file format. htmldoc looks ok.
> 
> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>     Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>     Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>     Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>     Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>     Removed ABMC reference in FS code.
>     Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>     Not sure if we need to change the behaviour here. Processed them sequencially right now.
>     Users have the liberty to pass the flags. Restricting it might be a problem later.
> 
> v6: Added support assign all if domain id is '*'
>     Fixed the allocation of counter id if it not assigned already.
> 
> v5: Interface name changed from mbm_assign_control to mbm_control.
>     Fixed opcode and flags combination.
>     '=_" is valid.
>     "-_" amd "+_" is not valid.
>     Minor message update.
>     Renamed the function with prefix - rdtgroup_.
>     Corrected few documentation mistakes.
>     Rebase related changes after SNC support.
> 
> v4: Added domain specific assignments. Fixed the opcode parsing.
> 
> v3: New patch.
>     Addresses the feedback to provide the global assignment interface.
> ---
>  Documentation/arch/x86/resctrl.rst     | 116 +++++++++++-
>  arch/x86/kernel/cpu/resctrl/internal.h |  10 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 241 ++++++++++++++++++++++++-
>  3 files changed, 365 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 3040e5c4cd76..47e15b48d951 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -356,7 +356,8 @@ with the following files:
>  	 t  MBM total event is assigned.
>  	 l  MBM local event is assigned.
>  	 tl Both MBM total and local events are assigned.
> -	 _  None of the MBM events are assigned.
> +	 _  None of the MBM events are assigned. Only works with opcode '=' for write
> +	    and cannot be combined with other flags.
>  
>  	Examples:
>  	::
> @@ -374,6 +375,119 @@ with the following files:
>  	There are four resctrl groups. All the groups have total and local MBM events
>  	assigned on domain 0 and 1.
>  
> +	Assignment state can be updated by writing to "mbm_assign_control".
> +
> +	Format is similar to the list format with addition of opcode for the
> +	assignment operation.
> +
> +		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> +
> +	Format for each type of group:
> +
> +        * Default CTRL_MON group:
> +                "//<domain_id><opcode><flags>"
> +
> +        * Non-default CTRL_MON group:
> +                "<CTRL_MON group>//<domain_id><opcode><flags>"
> +
> +        * Child MON group of default CTRL_MON group:
> +                "/<MON group>/<domain_id><opcode><flags>"
> +
> +        * Child MON group of non-default CTRL_MON group:
> +                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> +
> +	Domain_id '*' will apply the flags to all the domains.
> +
> +	Opcode can be one of the following:
> +	::
> +
> +	 = Update the assignment to match the MBM event.
> +	 + Assign a new MBM event without impacting existing assignments.
> +	 - Unassign a MBM event from currently assigned events.
> +
> +	Examples:
> +	Initial group status:
> +	::
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
> +	  //0=tl;1=tl
> +	  /child_default_mon_grp/0=tl;1=tl
> +
> +	To update the default group to assign only total MBM event on domain 0:
> +	::
> +
> +	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	Assignment status after the update:
> +	::
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
> +	  //0=t;1=tl
> +	  /child_default_mon_grp/0=tl;1=tl
> +
> +	To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
> +	::
> +
> +	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	Assignment status after the update:
> +	::
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
> +	  //0=t;1=tl
> +	  /child_default_mon_grp/0=tl;1=l
> +
> +	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign
> +	both local and total MBM events on domain 1:
> +	::
> +
> +	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> +			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +
> +	Assignment status after the update:
> +	::
> +
> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> +	  non_default_ctrl_mon_grp//0=tl;1=tl
> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_
> +	  //0=t;1=tl
> +	  /child_default_mon_grp/0=tl;1=l
> +
> +	To update the default group to add a local MBM event domain 0:

"local MBM event domain 0" -> "local MBM event on domain 0"?

Taking a step back to look at the completed "mbm_assign_control" section
it is noteworthy that all this work is about assigning counters to events
but after this large section is complete the word "counter" does not appear
a single time.

The section starts with a brief:
"Reports the resctrl group and monitor status of each group." and then
moves to terms like "assigning events"/"assignment status" without defining
what that means.

Instead of rewriting this, what do you think of adding some definition
of what "assignment state" means to the start of the section. For example,
(I am sure it can be improved):

"Use "mbm_assign_control" to manage monitoring counter assignment to
monitoring events when mbm_cntr_assign_mode is enabled."

Reinette
Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Posted by Moger, Babu 1 year ago
Hi Reinette,

On 2/6/25 12:48, Reinette Chatre wrote:
> Hi Babu,
> 
> On 1/22/25 12:20 PM, Babu Moger wrote:
>> When mbm_cntr_assign mode is enabled, users can designate which of the MBM
>> events in the CTRL_MON or MON groups should have counters assigned.
>>
>> Provide an interface for assigning MBM events by writing to the file:
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface,
>> events can be assigned or unassigned as needed.
>>
>> Format is similar to the list format with addition of opcode for the
>> assignment operation.
>>  "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Format for specific type of groups:
>>
>>  * Default CTRL_MON group:
>>          "//<domain_id><opcode><flags>"
>>
>>  * Non-default CTRL_MON group:
>>          "<CTRL_MON group>//<domain_id><opcode><flags>"
>>
>>  * Child MON group of default CTRL_MON group:
>>          "/<MON group>/<domain_id><opcode><flags>"
>>
>>  * Child MON group of non-default CTRL_MON group:
>>          "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>>
>> Domain_id '*' will apply the flags on all the domains.
>>
>> Opcode can be one of the following:
>>
>>  = Update the assignment to match the flags
>>  + Assign a new MBM event without impacting existing assignments.
>>  - Unassign a MBM event from currently assigned events.
>>
>> Assignment flags can be one of the following:
>>  t  MBM total event
>>  l  MBM local event
>>  tl Both total and local MBM events
>>  _  None of the MBM events. Valid only with '=' opcode. This flag cannot
>>     be combined with other flags.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v11: Fixed the static check warning with initializing dom_id in resctrl_process_flags().
>>
>> v10: Fixed the issue with finding the domain in multiple iterations.
>>      Printed error message with domain information when assign fails.
>>      Changed the variables to unsigned for processing assign state.
>>      Taken care of few format corrections.
>>
>> v9: Fixed handling special case '//0=' and '//".
>>     Removed extra strstr() call.
>>     Added generic failure text when assignment operation fails.
>>     Corrected user documentation format texts.
>>
>> v8: Moved unassign as the first action during the assign modification.
>>     Assign none "_" takes priority. Cannot be mixed with other flags.
>>     Updated the documentation and .rst file format. htmldoc looks ok.
>>
>> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
>>     Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
>>     Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
>>     Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
>>     Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
>>     Removed ABMC reference in FS code.
>>     Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
>>     Not sure if we need to change the behaviour here. Processed them sequencially right now.
>>     Users have the liberty to pass the flags. Restricting it might be a problem later.
>>
>> v6: Added support assign all if domain id is '*'
>>     Fixed the allocation of counter id if it not assigned already.
>>
>> v5: Interface name changed from mbm_assign_control to mbm_control.
>>     Fixed opcode and flags combination.
>>     '=_" is valid.
>>     "-_" amd "+_" is not valid.
>>     Minor message update.
>>     Renamed the function with prefix - rdtgroup_.
>>     Corrected few documentation mistakes.
>>     Rebase related changes after SNC support.
>>
>> v4: Added domain specific assignments. Fixed the opcode parsing.
>>
>> v3: New patch.
>>     Addresses the feedback to provide the global assignment interface.
>> ---
>>  Documentation/arch/x86/resctrl.rst     | 116 +++++++++++-
>>  arch/x86/kernel/cpu/resctrl/internal.h |  10 +
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 241 ++++++++++++++++++++++++-
>>  3 files changed, 365 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 3040e5c4cd76..47e15b48d951 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -356,7 +356,8 @@ with the following files:
>>  	 t  MBM total event is assigned.
>>  	 l  MBM local event is assigned.
>>  	 tl Both MBM total and local events are assigned.
>> -	 _  None of the MBM events are assigned.
>> +	 _  None of the MBM events are assigned. Only works with opcode '=' for write
>> +	    and cannot be combined with other flags.
>>  
>>  	Examples:
>>  	::
>> @@ -374,6 +375,119 @@ with the following files:
>>  	There are four resctrl groups. All the groups have total and local MBM events
>>  	assigned on domain 0 and 1.
>>  
>> +	Assignment state can be updated by writing to "mbm_assign_control".
>> +
>> +	Format is similar to the list format with addition of opcode for the
>> +	assignment operation.
>> +
>> +		"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>> +
>> +	Format for each type of group:
>> +
>> +        * Default CTRL_MON group:
>> +                "//<domain_id><opcode><flags>"
>> +
>> +        * Non-default CTRL_MON group:
>> +                "<CTRL_MON group>//<domain_id><opcode><flags>"
>> +
>> +        * Child MON group of default CTRL_MON group:
>> +                "/<MON group>/<domain_id><opcode><flags>"
>> +
>> +        * Child MON group of non-default CTRL_MON group:
>> +                "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
>> +
>> +	Domain_id '*' will apply the flags to all the domains.
>> +
>> +	Opcode can be one of the following:
>> +	::
>> +
>> +	 = Update the assignment to match the MBM event.
>> +	 + Assign a new MBM event without impacting existing assignments.
>> +	 - Unassign a MBM event from currently assigned events.
>> +
>> +	Examples:
>> +	Initial group status:
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
>> +	  //0=tl;1=tl
>> +	  /child_default_mon_grp/0=tl;1=tl
>> +
>> +	To update the default group to assign only total MBM event on domain 0:
>> +	::
>> +
>> +	  # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	Assignment status after the update:
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
>> +	  //0=t;1=tl
>> +	  /child_default_mon_grp/0=tl;1=tl
>> +
>> +	To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
>> +	::
>> +
>> +	  # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	Assignment status after the update:
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
>> +	  //0=t;1=tl
>> +	  /child_default_mon_grp/0=tl;1=l
>> +
>> +	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign
>> +	both local and total MBM events on domain 1:
>> +	::
>> +
>> +	  # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
>> +			/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +
>> +	Assignment status after the update:
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> +	  non_default_ctrl_mon_grp//0=tl;1=tl
>> +	  non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_
>> +	  //0=t;1=tl
>> +	  /child_default_mon_grp/0=tl;1=l
>> +
>> +	To update the default group to add a local MBM event domain 0:
> 
> "local MBM event domain 0" -> "local MBM event on domain 0"?

Sure.

> 
> Taking a step back to look at the completed "mbm_assign_control" section
> it is noteworthy that all this work is about assigning counters to events
> but after this large section is complete the word "counter" does not appear
> a single time.
> 
> The section starts with a brief:
> "Reports the resctrl group and monitor status of each group." and then
> moves to terms like "assigning events"/"assignment status" without defining
> what that means.
> 
> Instead of rewriting this, what do you think of adding some definition
> of what "assignment state" means to the start of the section. For example,
> (I am sure it can be improved):
> 
> "Use "mbm_assign_control" to manage monitoring counter assignment to
> monitoring events when mbm_cntr_assign_mode is enabled."


Sure.

-- 
Thanks
Babu Moger