Introduce the interface to assign MBM events in mbm_cntr_assign mode.
Events can be enabled or disabled by writing to file
/sys/fs/resctrl/info/L3_MON/mbm_assign_control
Format is similar to the list format with addition of opcode for the
assignment operation.
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
Format for specific type of groups:
* Default CTRL_MON group:
"//<domain_id><opcode><flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id><opcode><flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id><opcode><flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
Domain_id '*' will apply the flags on all the domains.
Opcode can be one of the following:
= Update the assignment to match the flags
+ Assign a new MBM event without impacting existing assignments.
- Unassign a MBM event from currently assigned events.
Assignment flags can be one of the following:
t MBM total event
l MBM local event
tl Both total and local MBM events
_ None of the MBM events. Valid only with '=' opcode. This flag cannot
be combined with other flags.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Moved unassign as the first action during the assign modification.
Assign none "_" takes priority. Cannot be mixed with other flags.
Updated the documentation and .rst file format. htmldoc looks ok.
v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write().
Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.
Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name.
Fixed rdtgroup_str_to_mon_state to return error for invalid flags.
Simplified the calls rdtgroup_assign_cntr by merging few functions earlier.
Removed ABMC reference in FS code.
Reinette commented about handling the combination of flags like 'lt_' and '_lt'.
Not sure if we need to change the behaviour here. Processed them sequencially right now.
Users have the liberty to pass the flags. Restricting it might be a problem later.
v6: Added support assign all if domain id is '*'
Fixed the allocation of counter id if it not assigned already.
v5: Interface name changed from mbm_assign_control to mbm_control.
Fixed opcode and flags combination.
'=_" is valid.
"-_" amd "+_" is not valid.
Minor message update.
Renamed the function with prefix - rdtgroup_.
Corrected few documentation mistakes.
Rebase related changes after SNC support.
v4: Added domain specific assignments. Fixed the opcode parsing.
v3: New patch.
Addresses the feedback to provide the global assignment interface.
https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/
---
Documentation/arch/x86/resctrl.rst | 115 +++++++++++-
arch/x86/kernel/cpu/resctrl/internal.h | 10 ++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 233 ++++++++++++++++++++++++-
3 files changed, 356 insertions(+), 2 deletions(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index b85d3bc3e301..77bb0b095127 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -336,7 +336,8 @@ with the following files:
t MBM total event is assigned.
l MBM local event is assigned.
tl Both total and local MBM events are assigned.
- _ None of the MBM events are assigned.
+ _ None of the MBM events are assigned. Only works with opcode '=' for write
+ and cannot be combined with other flags.
Examples:
::
@@ -354,6 +355,118 @@ with the following files:
There are four resctrl groups. All the groups have total and local MBM events
assigned on domain 0 and 1.
+ Assignment state can be updated by writing to the interface.
+
+ Format is similar to the list format with addition of opcode for the
+ assignment operation.
+
+ "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+ Format for each type of groups:
+
+ * Default CTRL_MON group:
+ "//<domain_id><opcode><flags>"
+
+ * Non-default CTRL_MON group:
+ "<CTRL_MON group>//<domain_id><opcode><flags>"
+
+ * Child MON group of default CTRL_MON group:
+ "/<MON group>/<domain_id><opcode><flags>"
+
+ * Child MON group of non-default CTRL_MON group:
+ "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
+
+ Domain_id '*' will apply the flags on all the domains.
+
+ Opcode can be one of the following:
+ ::
+
+ = Update the assignment to match the MBM event.
+ + Assign a new MBM event without impacting existing assignments.
+ - Unassign a MBM event from currently assigned events.
+
+ Examples:
+ Initial group status:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ To update the default group to assign only total MBM event on domain 0:
+ ::
+
+ # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ To update the MON group child_default_mon_grp to remove total MBM event on domain 1:
+ ::
+
+ # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign
+ both local and total MBM events on domain 1:
+ ::
+
+ # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
+ /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the default group to add a local MBM event domain 0.
+ ::
+
+ # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
+ To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the
+ MBM events on all the domains.
+ ::
+
+ # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ ::
+
+ #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=_;1=_;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=l;
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a6f40d3115f4..e8d6a430dc4a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -74,6 +74,16 @@
*/
#define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2)
+/*
+ * Assignment flags for mbm_cntr_assign feature
+ */
+enum {
+ ASSIGN_NONE = 0,
+ ASSIGN_TOTAL = BIT(QOS_L3_MBM_TOTAL_EVENT_ID),
+ ASSIGN_LOCAL = BIT(QOS_L3_MBM_LOCAL_EVENT_ID),
+ ASSIGN_INVALID,
+};
+
/**
* cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
* aren't marked nohz_full
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cf92ceb0f05e..6095146e3ba4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1040,6 +1040,236 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
return 0;
}
+static int rdtgroup_str_to_mon_state(char *flag)
+{
+ int i, mon_state = ASSIGN_NONE;
+
+ for (i = 0; i < strlen(flag); i++) {
+ switch (*(flag + i)) {
+ case 't':
+ mon_state |= ASSIGN_TOTAL;
+ break;
+ case 'l':
+ mon_state |= ASSIGN_LOCAL;
+ break;
+ case '_':
+ return ASSIGN_NONE;
+ default:
+ return ASSIGN_INVALID;
+ }
+ }
+
+ return mon_state;
+}
+
+static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype,
+ char *p_grp, char *c_grp)
+{
+ struct rdtgroup *rdtg, *crg;
+
+ if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
+ return &rdtgroup_default;
+ } else if (rtype == RDTCTRL_GROUP) {
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
+ if (!strcmp(p_grp, rdtg->kn->name))
+ return rdtg;
+ } else if (rtype == RDTMON_GROUP) {
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+ if (!strcmp(p_grp, rdtg->kn->name)) {
+ list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ if (!strcmp(c_grp, crg->kn->name))
+ return crg;
+ }
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static int rdtgroup_process_flags(struct rdt_resource *r,
+ enum rdt_group_type rtype,
+ char *p_grp, char *c_grp, char *tok)
+{
+ int op, mon_state, assign_state, unassign_state;
+ char *dom_str, *id_str, *op_str;
+ struct rdt_mon_domain *d;
+ struct rdtgroup *rdtgrp;
+ unsigned long dom_id;
+ int ret, found = 0;
+
+ rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp);
+
+ if (!rdtgrp) {
+ rdt_last_cmd_puts("Not a valid resctrl group\n");
+ return -EINVAL;
+ }
+
+next:
+ if (!tok || tok[0] == '\0')
+ return 0;
+
+ /* Start processing the strings for each domain */
+ dom_str = strim(strsep(&tok, ";"));
+
+ op_str = strpbrk(dom_str, "=+-");
+
+ if (op_str) {
+ op = *op_str;
+ } else {
+ rdt_last_cmd_puts("Missing operation =, +, - character\n");
+ return -EINVAL;
+ }
+
+ id_str = strsep(&dom_str, "=+-");
+
+ /* Check for domain id '*' which means all domains */
+ if (id_str && *id_str == '*') {
+ d = NULL;
+ goto check_state;
+ } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+ rdt_last_cmd_puts("Missing domain id\n");
+ return -EINVAL;
+ }
+
+ /* Verify if the dom_id is valid */
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (d->hdr.id == dom_id) {
+ found = 1;
+ break;
+ }
+ }
+
+ if (!found) {
+ rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
+ return -EINVAL;
+ }
+
+check_state:
+ mon_state = rdtgroup_str_to_mon_state(dom_str);
+
+ if (mon_state == ASSIGN_INVALID) {
+ rdt_last_cmd_puts("Invalid assign flag\n");
+ goto out_fail;
+ }
+
+ assign_state = 0;
+ unassign_state = 0;
+
+ switch (op) {
+ case '+':
+ if (mon_state == ASSIGN_NONE) {
+ rdt_last_cmd_puts("Invalid assign opcode\n");
+ goto out_fail;
+ }
+ assign_state = mon_state;
+ break;
+ case '-':
+ if (mon_state == ASSIGN_NONE) {
+ rdt_last_cmd_puts("Invalid assign opcode\n");
+ goto out_fail;
+ }
+ unassign_state = mon_state;
+ break;
+ case '=':
+ assign_state = mon_state;
+ unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
+ break;
+ default:
+ break;
+ }
+
+ if (unassign_state & ASSIGN_TOTAL) {
+ ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (unassign_state & ASSIGN_LOCAL) {
+ ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (assign_state & ASSIGN_TOTAL) {
+ ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID);
+ if (ret)
+ goto out_fail;
+ }
+
+ if (assign_state & ASSIGN_LOCAL) {
+ ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID);
+ if (ret)
+ goto out_fail;
+ }
+
+ goto next;
+
+out_fail:
+
+ return -EINVAL;
+}
+
+static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ char *token, *cmon_grp, *mon_grp;
+ enum rdt_group_type rtype;
+ int ret;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+ return -EINVAL;
+ }
+
+ rdt_last_cmd_clear();
+
+ while ((token = strsep(&buf, "\n")) != NULL) {
+ if (strstr(token, "/")) {
+ /*
+ * The write command follows the following format:
+ * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
+ * Extract the CTRL_MON group.
+ */
+ cmon_grp = strsep(&token, "/");
+
+ /*
+ * Extract the MON_GROUP.
+ * strsep returns empty string for contiguous delimiters.
+ * Empty mon_grp here means it is a RDTCTRL_GROUP.
+ */
+ mon_grp = strsep(&token, "/");
+
+ if (*mon_grp == '\0')
+ rtype = RDTCTRL_GROUP;
+ else
+ rtype = RDTMON_GROUP;
+
+ ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token);
+ if (ret)
+ break;
+ }
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_assign_control",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_mbm_assign_control_show,
+ .write = rdtgroup_mbm_assign_control_write,
},
{
.name = "cpus_list",
--
2.34.1
Hi Babu, On 10/9/24 10:39 AM, Babu Moger wrote: > Introduce the interface to assign MBM events in mbm_cntr_assign mode. > > Events can be enabled or disabled by writing to file > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > > Format is similar to the list format with addition of opcode for the > assignment operation. > "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" > > Format for specific type of groups: > > * Default CTRL_MON group: > "//<domain_id><opcode><flags>" > > * Non-default CTRL_MON group: > "<CTRL_MON group>//<domain_id><opcode><flags>" > > * Child MON group of default CTRL_MON group: > "/<MON group>/<domain_id><opcode><flags>" > > * Child MON group of non-default CTRL_MON group: > "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" > > Domain_id '*' will apply the flags on all the domains. > > Opcode can be one of the following: > > = Update the assignment to match the flags > + Assign a new MBM event without impacting existing assignments. > - Unassign a MBM event from currently assigned events. > > Assignment flags can be one of the following: > t MBM total event > l MBM local event > tl Both total and local MBM events > _ None of the MBM events. Valid only with '=' opcode. This flag cannot > be combined with other flags. > > Signed-off-by: Babu Moger <babu.moger@amd.com> > --- > v8: Moved unassign as the first action during the assign modification. > Assign none "_" takes priority. Cannot be mixed with other flags. > Updated the documentation and .rst file format. htmldoc looks ok. > > v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write(). > Added mutex lock in rdtgroup_mbm_assign_control_write() while processing. > Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name. > Fixed rdtgroup_str_to_mon_state to return error for invalid flags. > Simplified the calls rdtgroup_assign_cntr by merging few functions earlier. > Removed ABMC reference in FS code. > Reinette commented about handling the combination of flags like 'lt_' and '_lt'. > Not sure if we need to change the behaviour here. Processed them sequencially right now. > Users have the liberty to pass the flags. Restricting it might be a problem later. > > v6: Added support assign all if domain id is '*' > Fixed the allocation of counter id if it not assigned already. > > v5: Interface name changed from mbm_assign_control to mbm_control. > Fixed opcode and flags combination. > '=_" is valid. > "-_" amd "+_" is not valid. > Minor message update. > Renamed the function with prefix - rdtgroup_. > Corrected few documentation mistakes. > Rebase related changes after SNC support. > > v4: Added domain specific assignments. Fixed the opcode parsing. > > v3: New patch. > Addresses the feedback to provide the global assignment interface. > https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/ > --- > Documentation/arch/x86/resctrl.rst | 115 +++++++++++- > arch/x86/kernel/cpu/resctrl/internal.h | 10 ++ > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 233 ++++++++++++++++++++++++- > 3 files changed, 356 insertions(+), 2 deletions(-) > > diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst > index b85d3bc3e301..77bb0b095127 100644 > --- a/Documentation/arch/x86/resctrl.rst > +++ b/Documentation/arch/x86/resctrl.rst > @@ -336,7 +336,8 @@ with the following files: > t MBM total event is assigned. > l MBM local event is assigned. > tl Both total and local MBM events are assigned. > - _ None of the MBM events are assigned. > + _ None of the MBM events are assigned. Only works with opcode '=' for write > + and cannot be combined with other flags. > > Examples: > :: > @@ -354,6 +355,118 @@ with the following files: > There are four resctrl groups. All the groups have total and local MBM events > assigned on domain 0 and 1. > > + Assignment state can be updated by writing to the interface. > + > + Format is similar to the list format with addition of opcode for the > + assignment operation. > + > + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" > + > + Format for each type of groups: > + > + * Default CTRL_MON group: > + "//<domain_id><opcode><flags>" > + > + * Non-default CTRL_MON group: > + "<CTRL_MON group>//<domain_id><opcode><flags>" > + > + * Child MON group of default CTRL_MON group: > + "/<MON group>/<domain_id><opcode><flags>" > + > + * Child MON group of non-default CTRL_MON group: > + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" > + > + Domain_id '*' will apply the flags on all the domains. > + > + Opcode can be one of the following: > + :: > + > + = Update the assignment to match the MBM event. > + + Assign a new MBM event without impacting existing assignments. > + - Unassign a MBM event from currently assigned events. > + > + Examples: > + Initial group status: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; > + //0=tl;1=tl; > + /child_default_mon_grp/0=tl;1=tl; > + > + To update the default group to assign only total MBM event on domain 0: > + :: > + > + # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; > + //0=t;1=tl; > + /child_default_mon_grp/0=tl;1=tl; > + > + To update the MON group child_default_mon_grp to remove total MBM event on domain 1: > + :: > + > + # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control Please be consistent by always using "# cat", not sometimes "$ cat" as above. > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; > + //0=t;1=tl; > + /child_default_mon_grp/0=tl;1=l; > + > + To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign > + both local and total MBM events on domain 1: > + :: > + > + # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > > + /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + Missing "# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control" > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; > + //0=t;1=tl; > + /child_default_mon_grp/0=tl;1=l; > + > + To update the default group to add a local MBM event domain 0. > + :: > + > + # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + non_default_ctrl_mon_grp//0=tl;1=tl; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; > + //0=tl;1=tl; > + /child_default_mon_grp/0=tl;1=l; > + > + To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the > + MBM events on all the domains. > + :: > + > + # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > + > + Assignment status after the update: > + :: > + > + #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control Please be consistent with spacing "# cat" vs "#cat". This is very noticeable when viewing the formatted docs. > + non_default_ctrl_mon_grp//0=_;1=_; > + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; > + //0=tl;1=tl; > + /child_default_mon_grp/0=tl;1=l; > + > "max_threshold_occupancy": > Read/write file provides the largest value (in > bytes) at which a previously used LLC_occupancy > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h > index a6f40d3115f4..e8d6a430dc4a 100644 > --- a/arch/x86/kernel/cpu/resctrl/internal.h > +++ b/arch/x86/kernel/cpu/resctrl/internal.h > @@ -74,6 +74,16 @@ > */ > #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2) > > +/* > + * Assignment flags for mbm_cntr_assign feature > + */ > +enum { > + ASSIGN_NONE = 0, > + ASSIGN_TOTAL = BIT(QOS_L3_MBM_TOTAL_EVENT_ID), > + ASSIGN_LOCAL = BIT(QOS_L3_MBM_LOCAL_EVENT_ID), > + ASSIGN_INVALID, > +}; > + > /** > * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that > * aren't marked nohz_full > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index cf92ceb0f05e..6095146e3ba4 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -1040,6 +1040,236 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of, > return 0; > } > > +static int rdtgroup_str_to_mon_state(char *flag) > +{ > + int i, mon_state = ASSIGN_NONE; > + > + for (i = 0; i < strlen(flag); i++) { > + switch (*(flag + i)) { > + case 't': > + mon_state |= ASSIGN_TOTAL; > + break; > + case 'l': > + mon_state |= ASSIGN_LOCAL; > + break; > + case '_': > + return ASSIGN_NONE; > + default: > + return ASSIGN_INVALID; > + } > + } > + > + return mon_state; > +} > + > +static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype, > + char *p_grp, char *c_grp) > +{ > + struct rdtgroup *rdtg, *crg; > + > + if (rtype == RDTCTRL_GROUP && *p_grp == '\0') { > + return &rdtgroup_default; > + } else if (rtype == RDTCTRL_GROUP) { > + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) > + if (!strcmp(p_grp, rdtg->kn->name)) > + return rdtg; > + } else if (rtype == RDTMON_GROUP) { > + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) { > + if (!strcmp(p_grp, rdtg->kn->name)) { > + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, > + mon.crdtgrp_list) { > + if (!strcmp(c_grp, crg->kn->name)) > + return crg; > + } > + } > + } > + } > + > + return NULL; > +} > + > +static int rdtgroup_process_flags(struct rdt_resource *r, > + enum rdt_group_type rtype, > + char *p_grp, char *c_grp, char *tok) > +{ > + int op, mon_state, assign_state, unassign_state; > + char *dom_str, *id_str, *op_str; > + struct rdt_mon_domain *d; > + struct rdtgroup *rdtgrp; > + unsigned long dom_id; > + int ret, found = 0; > + > + rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp); > + > + if (!rdtgrp) { > + rdt_last_cmd_puts("Not a valid resctrl group\n"); > + return -EINVAL; > + } > + > +next: > + if (!tok || tok[0] == '\0') > + return 0; > + > + /* Start processing the strings for each domain */ > + dom_str = strim(strsep(&tok, ";")); > + > + op_str = strpbrk(dom_str, "=+-"); > + > + if (op_str) { > + op = *op_str; > + } else { > + rdt_last_cmd_puts("Missing operation =, +, - character\n"); > + return -EINVAL; > + } > + > + id_str = strsep(&dom_str, "=+-"); > + > + /* Check for domain id '*' which means all domains */ > + if (id_str && *id_str == '*') { > + d = NULL; > + goto check_state; > + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) { > + rdt_last_cmd_puts("Missing domain id\n"); > + return -EINVAL; > + } > + > + /* Verify if the dom_id is valid */ > + list_for_each_entry(d, &r->mon_domains, hdr.list) { > + if (d->hdr.id == dom_id) { > + found = 1; > + break; > + } > + } > + > + if (!found) { > + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id); > + return -EINVAL; > + } > + > +check_state: > + mon_state = rdtgroup_str_to_mon_state(dom_str); > + > + if (mon_state == ASSIGN_INVALID) { > + rdt_last_cmd_puts("Invalid assign flag\n"); > + goto out_fail; > + } > + > + assign_state = 0; > + unassign_state = 0; > + > + switch (op) { > + case '+': > + if (mon_state == ASSIGN_NONE) { > + rdt_last_cmd_puts("Invalid assign opcode\n"); > + goto out_fail; > + } > + assign_state = mon_state; > + break; > + case '-': > + if (mon_state == ASSIGN_NONE) { > + rdt_last_cmd_puts("Invalid assign opcode\n"); > + goto out_fail; > + } > + unassign_state = mon_state; > + break; > + case '=': > + assign_state = mon_state; > + unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state; > + break; > + default: > + break; > + } > + > + if (unassign_state & ASSIGN_TOTAL) { > + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); > + if (ret) > + goto out_fail; > + } > + > + if (unassign_state & ASSIGN_LOCAL) { > + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); > + if (ret) > + goto out_fail; > + } > + > + if (assign_state & ASSIGN_TOTAL) { > + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); > + if (ret) > + goto out_fail; > + } > + > + if (assign_state & ASSIGN_LOCAL) { > + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); > + if (ret) > + goto out_fail; > + } > + > + goto next; > + > +out_fail: Is it possible to print a message to the command status to give some details about which request failed? I am wondering about a scenario where a user changes multiple domains of multiple groups, since the operation does not undo changes, it will fail without information to user space about which setting triggered the failure and which settings succeeded. This is similar to what is done when user attempts to move several tasks ... the error will indicate which task triggered failure so that user space knows what completed successfully. > + > + return -EINVAL; > +} > + > +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of, > + char *buf, size_t nbytes, loff_t off) > +{ > + struct rdt_resource *r = of->kn->parent->priv; > + char *token, *cmon_grp, *mon_grp; > + enum rdt_group_type rtype; > + int ret; > + > + /* Valid input requires a trailing newline */ > + if (nbytes == 0 || buf[nbytes - 1] != '\n') > + return -EINVAL; > + > + buf[nbytes - 1] = '\0'; > + > + cpus_read_lock(); > + mutex_lock(&rdtgroup_mutex); > + > + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) { > + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n"); Writing to last_cmd_status_buf here ... > + mutex_unlock(&rdtgroup_mutex); > + cpus_read_unlock(); > + return -EINVAL; > + } > + > + rdt_last_cmd_clear(); ... but initializing buffer here. Sidenote: This was an issue before. If you receive comments about items in patches, please do check if those comments apply to other patches also. > + > + while ((token = strsep(&buf, "\n")) != NULL) { > + if (strstr(token, "/")) { > + /* > + * The write command follows the following format: > + * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>” > + * Extract the CTRL_MON group. > + */ > + cmon_grp = strsep(&token, "/"); > + > + /* > + * Extract the MON_GROUP. > + * strsep returns empty string for contiguous delimiters. > + * Empty mon_grp here means it is a RDTCTRL_GROUP. > + */ > + mon_grp = strsep(&token, "/"); > + > + if (*mon_grp == '\0') > + rtype = RDTCTRL_GROUP; > + else > + rtype = RDTMON_GROUP; > + > + ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token); > + if (ret) > + break; > + } > + } > + > + mutex_unlock(&rdtgroup_mutex); > + cpus_read_unlock(); > + > + return ret ?: nbytes; > +} > + > #ifdef CONFIG_PROC_CPU_RESCTRL > > /* > @@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = { > }, > { > .name = "mbm_assign_control", > - .mode = 0444, > + .mode = 0644, > .kf_ops = &rdtgroup_kf_single_ops, > .seq_show = rdtgroup_mbm_assign_control_show, > + .write = rdtgroup_mbm_assign_control_write, > }, > { > .name = "cpus_list", On a high level this looks ok but this code needs to be more robust. This will parse data from user space that may include all kinds of input ... think malicious user or a buggy script. I am not able to test this code but I tried to work through what will happen under some wrong input and found some issues. For example, if user space provides input like '//\n' then rdtgroup_process_flags() will be called with token == NULL. This will result in rdtgroup_process_flags() returning "success", but fortunately do nothing, for this invalid input. A more severe example is with input like '//0=\n', from what I can tell this will result in rdtgroup_str_to_mon_state() called with dom_str==NULL that will treat this as ASSIGN_NONE and proceed as if user provided '//0=_'. This was just some scenarios with basic input that could be typos, no real stress tests. I stopped here though since I believe it is already clear this needs to be more robust. Please do test this interface by exercising it with invalid input and corner cases. Reinette
Hi Reinette, On 10/15/24 22:43, Reinette Chatre wrote: > Hi Babu, > > On 10/9/24 10:39 AM, Babu Moger wrote: >> Introduce the interface to assign MBM events in mbm_cntr_assign mode. >> >> Events can be enabled or disabled by writing to file >> /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> >> Format is similar to the list format with addition of opcode for the >> assignment operation. >> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" >> >> Format for specific type of groups: >> >> * Default CTRL_MON group: >> "//<domain_id><opcode><flags>" >> >> * Non-default CTRL_MON group: >> "<CTRL_MON group>//<domain_id><opcode><flags>" >> >> * Child MON group of default CTRL_MON group: >> "/<MON group>/<domain_id><opcode><flags>" >> >> * Child MON group of non-default CTRL_MON group: >> "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" >> >> Domain_id '*' will apply the flags on all the domains. >> >> Opcode can be one of the following: >> >> = Update the assignment to match the flags >> + Assign a new MBM event without impacting existing assignments. >> - Unassign a MBM event from currently assigned events. >> >> Assignment flags can be one of the following: >> t MBM total event >> l MBM local event >> tl Both total and local MBM events >> _ None of the MBM events. Valid only with '=' opcode. This flag cannot >> be combined with other flags. >> >> Signed-off-by: Babu Moger <babu.moger@amd.com> >> --- >> v8: Moved unassign as the first action during the assign modification. >> Assign none "_" takes priority. Cannot be mixed with other flags. >> Updated the documentation and .rst file format. htmldoc looks ok. >> >> v7: Simplified the parsing (strsep(&token, "//") in rdtgroup_mbm_assign_control_write(). >> Added mutex lock in rdtgroup_mbm_assign_control_write() while processing. >> Renamed rdtgroup_find_grp to rdtgroup_find_grp_by_name. >> Fixed rdtgroup_str_to_mon_state to return error for invalid flags. >> Simplified the calls rdtgroup_assign_cntr by merging few functions earlier. >> Removed ABMC reference in FS code. >> Reinette commented about handling the combination of flags like 'lt_' and '_lt'. >> Not sure if we need to change the behaviour here. Processed them sequencially right now. >> Users have the liberty to pass the flags. Restricting it might be a problem later. >> >> v6: Added support assign all if domain id is '*' >> Fixed the allocation of counter id if it not assigned already. >> >> v5: Interface name changed from mbm_assign_control to mbm_control. >> Fixed opcode and flags combination. >> '=_" is valid. >> "-_" amd "+_" is not valid. >> Minor message update. >> Renamed the function with prefix - rdtgroup_. >> Corrected few documentation mistakes. >> Rebase related changes after SNC support. >> >> v4: Added domain specific assignments. Fixed the opcode parsing. >> >> v3: New patch. >> Addresses the feedback to provide the global assignment interface. >> https://lore.kernel.org/lkml/c73f444b-83a1-4e9a-95d3-54c5165ee782@intel.com/ >> --- >> Documentation/arch/x86/resctrl.rst | 115 +++++++++++- >> arch/x86/kernel/cpu/resctrl/internal.h | 10 ++ >> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 233 ++++++++++++++++++++++++- >> 3 files changed, 356 insertions(+), 2 deletions(-) >> >> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst >> index b85d3bc3e301..77bb0b095127 100644 >> --- a/Documentation/arch/x86/resctrl.rst >> +++ b/Documentation/arch/x86/resctrl.rst >> @@ -336,7 +336,8 @@ with the following files: >> t MBM total event is assigned. >> l MBM local event is assigned. >> tl Both total and local MBM events are assigned. >> - _ None of the MBM events are assigned. >> + _ None of the MBM events are assigned. Only works with opcode '=' for write >> + and cannot be combined with other flags. >> >> Examples: >> :: >> @@ -354,6 +355,118 @@ with the following files: >> There are four resctrl groups. All the groups have total and local MBM events >> assigned on domain 0 and 1. >> >> + Assignment state can be updated by writing to the interface. >> + >> + Format is similar to the list format with addition of opcode for the >> + assignment operation. >> + >> + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" >> + >> + Format for each type of groups: >> + >> + * Default CTRL_MON group: >> + "//<domain_id><opcode><flags>" >> + >> + * Non-default CTRL_MON group: >> + "<CTRL_MON group>//<domain_id><opcode><flags>" >> + >> + * Child MON group of default CTRL_MON group: >> + "/<MON group>/<domain_id><opcode><flags>" >> + >> + * Child MON group of non-default CTRL_MON group: >> + "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>" >> + >> + Domain_id '*' will apply the flags on all the domains. >> + >> + Opcode can be one of the following: >> + :: >> + >> + = Update the assignment to match the MBM event. >> + + Assign a new MBM event without impacting existing assignments. >> + - Unassign a MBM event from currently assigned events. >> + >> + Examples: >> + Initial group status: >> + :: >> + >> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> + non_default_ctrl_mon_grp//0=tl;1=tl; >> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; >> + //0=tl;1=tl; >> + /child_default_mon_grp/0=tl;1=tl; >> + >> + To update the default group to assign only total MBM event on domain 0: >> + :: >> + >> + # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> + >> + Assignment status after the update: >> + :: >> + >> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> + non_default_ctrl_mon_grp//0=tl;1=tl; >> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; >> + //0=t;1=tl; >> + /child_default_mon_grp/0=tl;1=tl; >> + >> + To update the MON group child_default_mon_grp to remove total MBM event on domain 1: >> + :: >> + >> + # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> + >> + Assignment status after the update: >> + :: >> + >> + $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > > Please be consistent by always using "# cat", not sometimes "$ cat" as above. Sure. > >> + non_default_ctrl_mon_grp//0=tl;1=tl; >> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl; >> + //0=t;1=tl; >> + /child_default_mon_grp/0=tl;1=l; >> + >> + To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to unassign >> + both local and total MBM events on domain 1: >> + :: >> + >> + # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > >> + /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> + >> + Assignment status after the update: >> + :: >> + > > Missing "# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control" Sure. > >> + non_default_ctrl_mon_grp//0=tl;1=tl; >> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; >> + //0=t;1=tl; >> + /child_default_mon_grp/0=tl;1=l; >> + >> + To update the default group to add a local MBM event domain 0. >> + :: >> + >> + # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> + >> + Assignment status after the update: >> + :: >> + >> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> + non_default_ctrl_mon_grp//0=tl;1=tl; >> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; >> + //0=tl;1=tl; >> + /child_default_mon_grp/0=tl;1=l; >> + >> + To update the non default CTRL_MON group non_default_ctrl_mon_grp to unassign all the >> + MBM events on all the domains. >> + :: >> + >> + # echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> + >> + Assignment status after the update: >> + :: >> + >> + #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > > Please be consistent with spacing "# cat" vs "#cat". This is very noticeable when > viewing the formatted docs. Sure. > >> + non_default_ctrl_mon_grp//0=_;1=_; >> + non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_; >> + //0=tl;1=tl; >> + /child_default_mon_grp/0=tl;1=l; >> + >> "max_threshold_occupancy": >> Read/write file provides the largest value (in >> bytes) at which a previously used LLC_occupancy >> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h >> index a6f40d3115f4..e8d6a430dc4a 100644 >> --- a/arch/x86/kernel/cpu/resctrl/internal.h >> +++ b/arch/x86/kernel/cpu/resctrl/internal.h >> @@ -74,6 +74,16 @@ >> */ >> #define MBM_EVENT_ARRAY_INDEX(_event) ((_event) - 2) >> >> +/* >> + * Assignment flags for mbm_cntr_assign feature >> + */ >> +enum { >> + ASSIGN_NONE = 0, >> + ASSIGN_TOTAL = BIT(QOS_L3_MBM_TOTAL_EVENT_ID), >> + ASSIGN_LOCAL = BIT(QOS_L3_MBM_LOCAL_EVENT_ID), >> + ASSIGN_INVALID, >> +}; >> + >> /** >> * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that >> * aren't marked nohz_full >> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c >> index cf92ceb0f05e..6095146e3ba4 100644 >> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c >> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c >> @@ -1040,6 +1040,236 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of, >> return 0; >> } >> >> +static int rdtgroup_str_to_mon_state(char *flag) >> +{ >> + int i, mon_state = ASSIGN_NONE; >> + >> + for (i = 0; i < strlen(flag); i++) { >> + switch (*(flag + i)) { >> + case 't': >> + mon_state |= ASSIGN_TOTAL; >> + break; >> + case 'l': >> + mon_state |= ASSIGN_LOCAL; >> + break; >> + case '_': >> + return ASSIGN_NONE; >> + default: >> + return ASSIGN_INVALID; >> + } >> + } >> + >> + return mon_state; >> +} >> + >> +static struct rdtgroup *rdtgroup_find_grp_by_name(enum rdt_group_type rtype, >> + char *p_grp, char *c_grp) >> +{ >> + struct rdtgroup *rdtg, *crg; >> + >> + if (rtype == RDTCTRL_GROUP && *p_grp == '\0') { >> + return &rdtgroup_default; >> + } else if (rtype == RDTCTRL_GROUP) { >> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) >> + if (!strcmp(p_grp, rdtg->kn->name)) >> + return rdtg; >> + } else if (rtype == RDTMON_GROUP) { >> + list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) { >> + if (!strcmp(p_grp, rdtg->kn->name)) { >> + list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, >> + mon.crdtgrp_list) { >> + if (!strcmp(c_grp, crg->kn->name)) >> + return crg; >> + } >> + } >> + } >> + } >> + >> + return NULL; >> +} >> + >> +static int rdtgroup_process_flags(struct rdt_resource *r, >> + enum rdt_group_type rtype, >> + char *p_grp, char *c_grp, char *tok) >> +{ >> + int op, mon_state, assign_state, unassign_state; >> + char *dom_str, *id_str, *op_str; >> + struct rdt_mon_domain *d; >> + struct rdtgroup *rdtgrp; >> + unsigned long dom_id; >> + int ret, found = 0; >> + >> + rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp); >> + >> + if (!rdtgrp) { >> + rdt_last_cmd_puts("Not a valid resctrl group\n"); >> + return -EINVAL; >> + } >> + >> +next: >> + if (!tok || tok[0] == '\0') >> + return 0; >> + >> + /* Start processing the strings for each domain */ >> + dom_str = strim(strsep(&tok, ";")); >> + >> + op_str = strpbrk(dom_str, "=+-"); >> + >> + if (op_str) { >> + op = *op_str; >> + } else { >> + rdt_last_cmd_puts("Missing operation =, +, - character\n"); >> + return -EINVAL; >> + } >> + >> + id_str = strsep(&dom_str, "=+-"); >> + >> + /* Check for domain id '*' which means all domains */ >> + if (id_str && *id_str == '*') { >> + d = NULL; >> + goto check_state; >> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) { >> + rdt_last_cmd_puts("Missing domain id\n"); >> + return -EINVAL; >> + } >> + >> + /* Verify if the dom_id is valid */ >> + list_for_each_entry(d, &r->mon_domains, hdr.list) { >> + if (d->hdr.id == dom_id) { >> + found = 1; >> + break; >> + } >> + } >> + >> + if (!found) { >> + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id); >> + return -EINVAL; >> + } >> + >> +check_state: >> + mon_state = rdtgroup_str_to_mon_state(dom_str); >> + >> + if (mon_state == ASSIGN_INVALID) { >> + rdt_last_cmd_puts("Invalid assign flag\n"); >> + goto out_fail; >> + } >> + >> + assign_state = 0; >> + unassign_state = 0; >> + >> + switch (op) { >> + case '+': >> + if (mon_state == ASSIGN_NONE) { >> + rdt_last_cmd_puts("Invalid assign opcode\n"); >> + goto out_fail; >> + } >> + assign_state = mon_state; >> + break; >> + case '-': >> + if (mon_state == ASSIGN_NONE) { >> + rdt_last_cmd_puts("Invalid assign opcode\n"); >> + goto out_fail; >> + } >> + unassign_state = mon_state; >> + break; >> + case '=': >> + assign_state = mon_state; >> + unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state; >> + break; >> + default: >> + break; >> + } >> + >> + if (unassign_state & ASSIGN_TOTAL) { >> + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); >> + if (ret) >> + goto out_fail; >> + } >> + >> + if (unassign_state & ASSIGN_LOCAL) { >> + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); >> + if (ret) >> + goto out_fail; >> + } >> + >> + if (assign_state & ASSIGN_TOTAL) { >> + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); >> + if (ret) >> + goto out_fail; >> + } >> + >> + if (assign_state & ASSIGN_LOCAL) { >> + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); >> + if (ret) >> + goto out_fail; >> + } >> + >> + goto next; >> + >> +out_fail: > > Is it possible to print a message to the command status to give some details about which > request failed? I am wondering about a scenario where a user changes multiple domains of > multiple groups, since the operation does not undo changes, it will fail without information > to user space about which setting triggered the failure and which settings succeeded. > This is similar to what is done when user attempts to move several tasks ... the error will > indicate which task triggered failure so that user space knows what completed successfully. Will add something like this on failure. rdt_last_cmd_printf("Total event assign failed on domain %d\n", dom_id); > >> + >> + return -EINVAL; >> +} >> + >> +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of, >> + char *buf, size_t nbytes, loff_t off) >> +{ >> + struct rdt_resource *r = of->kn->parent->priv; >> + char *token, *cmon_grp, *mon_grp; >> + enum rdt_group_type rtype; >> + int ret; >> + >> + /* Valid input requires a trailing newline */ >> + if (nbytes == 0 || buf[nbytes - 1] != '\n') >> + return -EINVAL; >> + >> + buf[nbytes - 1] = '\0'; >> + >> + cpus_read_lock(); >> + mutex_lock(&rdtgroup_mutex); >> + >> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) { >> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n"); > > Writing to last_cmd_status_buf here ... Sure. > >> + mutex_unlock(&rdtgroup_mutex); >> + cpus_read_unlock(); >> + return -EINVAL; >> + } >> + >> + rdt_last_cmd_clear(); > > ... but initializing buffer here. > Sidenote: This was an issue before. If you receive comments about > items in patches, please do check if those comments apply to other patches also. Missed it. > >> + >> + while ((token = strsep(&buf, "\n")) != NULL) { >> + if (strstr(token, "/")) { >> + /* >> + * The write command follows the following format: >> + * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>” >> + * Extract the CTRL_MON group. >> + */ >> + cmon_grp = strsep(&token, "/"); >> + >> + /* >> + * Extract the MON_GROUP. >> + * strsep returns empty string for contiguous delimiters. >> + * Empty mon_grp here means it is a RDTCTRL_GROUP. >> + */ >> + mon_grp = strsep(&token, "/"); >> + >> + if (*mon_grp == '\0') >> + rtype = RDTCTRL_GROUP; >> + else >> + rtype = RDTMON_GROUP; >> + >> + ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token); >> + if (ret) >> + break; >> + } >> + } >> + >> + mutex_unlock(&rdtgroup_mutex); >> + cpus_read_unlock(); >> + >> + return ret ?: nbytes; >> +} >> + >> #ifdef CONFIG_PROC_CPU_RESCTRL >> >> /* >> @@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = { >> }, >> { >> .name = "mbm_assign_control", >> - .mode = 0444, >> + .mode = 0644, >> .kf_ops = &rdtgroup_kf_single_ops, >> .seq_show = rdtgroup_mbm_assign_control_show, >> + .write = rdtgroup_mbm_assign_control_write, >> }, >> { >> .name = "cpus_list", > > On a high level this looks ok but this code needs to be more robust. This will parse > data from user space that may include all kinds of input ... think malicious user or > a buggy script. I am not able to test this code but I tried to work through what will > happen under some wrong input and found some issues. For example, if user space provides > input like '//\n' then rdtgroup_process_flags() will be called with token == NULL. This will > result in rdtgroup_process_flags() returning "success", but fortunately do nothing, for > this invalid input. A more severe example is with input like '//0=\n', from what I can tell > this will result in rdtgroup_str_to_mon_state() called with dom_str==NULL that will treat > this as ASSIGN_NONE and proceed as if user provided '//0=_'. > This was just some scenarios with basic input that could be typos, no real stress tests. > I stopped here though since I believe it is already clear this needs to be more robust. > Please do test this interface by exercising it with invalid input and corner cases. Agree. But, tested the cases you mentioned above. It seems to handle as expected. # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl; #echo '//\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control bash: echo: write error: Invalid argument # cat /sys/fs/resctrl/info/last_cmd_status Missing operation =, +, - character #echo '//0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control bash: echo: write error: Invalid argument #cat /sys/fs/resctrl/info/last_cmd_status Invalid assign flag #echo '/0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control bash: echo: write error: Invalid argument # cat /sys/fs/resctrl/info/last_cmd_status Not a valid resctrl group The assign state did not change. #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl; Sure. will test some more combinations to be sure. -- Thanks Babu Moger
Hi Babu, On 10/21/24 10:04 AM, Moger, Babu wrote: > On 10/15/24 22:43, Reinette Chatre wrote: >> On 10/9/24 10:39 AM, Babu Moger wrote: >>> +static int rdtgroup_process_flags(struct rdt_resource *r, >>> + enum rdt_group_type rtype, >>> + char *p_grp, char *c_grp, char *tok) >>> +{ >>> + int op, mon_state, assign_state, unassign_state; >>> + char *dom_str, *id_str, *op_str; >>> + struct rdt_mon_domain *d; >>> + struct rdtgroup *rdtgrp; >>> + unsigned long dom_id; >>> + int ret, found = 0; >>> + >>> + rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp); >>> + >>> + if (!rdtgrp) { >>> + rdt_last_cmd_puts("Not a valid resctrl group\n"); >>> + return -EINVAL; >>> + } >>> + >>> +next: >>> + if (!tok || tok[0] == '\0') >>> + return 0; >>> + >>> + /* Start processing the strings for each domain */ >>> + dom_str = strim(strsep(&tok, ";")); >>> + >>> + op_str = strpbrk(dom_str, "=+-"); >>> + >>> + if (op_str) { >>> + op = *op_str; >>> + } else { >>> + rdt_last_cmd_puts("Missing operation =, +, - character\n"); >>> + return -EINVAL; >>> + } >>> + >>> + id_str = strsep(&dom_str, "=+-"); >>> + >>> + /* Check for domain id '*' which means all domains */ >>> + if (id_str && *id_str == '*') { >>> + d = NULL; >>> + goto check_state; >>> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) { >>> + rdt_last_cmd_puts("Missing domain id\n"); >>> + return -EINVAL; >>> + } >>> + >>> + /* Verify if the dom_id is valid */ >>> + list_for_each_entry(d, &r->mon_domains, hdr.list) { >>> + if (d->hdr.id == dom_id) { >>> + found = 1; >>> + break; >>> + } >>> + } >>> + >>> + if (!found) { >>> + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id); >>> + return -EINVAL; >>> + } >>> + >>> +check_state: >>> + mon_state = rdtgroup_str_to_mon_state(dom_str); >>> + >>> + if (mon_state == ASSIGN_INVALID) { >>> + rdt_last_cmd_puts("Invalid assign flag\n"); >>> + goto out_fail; >>> + } >>> + >>> + assign_state = 0; >>> + unassign_state = 0; >>> + >>> + switch (op) { >>> + case '+': >>> + if (mon_state == ASSIGN_NONE) { >>> + rdt_last_cmd_puts("Invalid assign opcode\n"); >>> + goto out_fail; >>> + } >>> + assign_state = mon_state; >>> + break; >>> + case '-': >>> + if (mon_state == ASSIGN_NONE) { >>> + rdt_last_cmd_puts("Invalid assign opcode\n"); >>> + goto out_fail; >>> + } >>> + unassign_state = mon_state; >>> + break; >>> + case '=': >>> + assign_state = mon_state; >>> + unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state; >>> + break; >>> + default: >>> + break; >>> + } >>> + >>> + if (unassign_state & ASSIGN_TOTAL) { >>> + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); >>> + if (ret) >>> + goto out_fail; >>> + } >>> + >>> + if (unassign_state & ASSIGN_LOCAL) { >>> + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); >>> + if (ret) >>> + goto out_fail; >>> + } >>> + >>> + if (assign_state & ASSIGN_TOTAL) { >>> + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); >>> + if (ret) >>> + goto out_fail; >>> + } >>> + >>> + if (assign_state & ASSIGN_LOCAL) { >>> + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); >>> + if (ret) >>> + goto out_fail; >>> + } >>> + >>> + goto next; >>> + >>> +out_fail: >> >> Is it possible to print a message to the command status to give some details about which >> request failed? I am wondering about a scenario where a user changes multiple domains of >> multiple groups, since the operation does not undo changes, it will fail without information >> to user space about which setting triggered the failure and which settings succeeded. >> This is similar to what is done when user attempts to move several tasks ... the error will >> indicate which task triggered failure so that user space knows what completed successfully. > > Will add something like this on failure. > > rdt_last_cmd_printf("Total event assign failed on domain %d\n", dom_id); The user may provide changes for several groups in a single write. Could the CTRL_MON and MON group names also be printed? It is not clear to me if it will be easier to print the flags the user provides or verbose text that the flags translate to, that is "t" vs "Total event". >>> + >>> + return -EINVAL; >>> +} >>> + >>> +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of, >>> + char *buf, size_t nbytes, loff_t off) >>> +{ >>> + struct rdt_resource *r = of->kn->parent->priv; >>> + char *token, *cmon_grp, *mon_grp; >>> + enum rdt_group_type rtype; >>> + int ret; >>> + >>> + /* Valid input requires a trailing newline */ >>> + if (nbytes == 0 || buf[nbytes - 1] != '\n') >>> + return -EINVAL; >>> + >>> + buf[nbytes - 1] = '\0'; >>> + >>> + cpus_read_lock(); >>> + mutex_lock(&rdtgroup_mutex); >>> + >>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) { >>> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n"); >> >> Writing to last_cmd_status_buf here ... > > Sure. > >> >>> + mutex_unlock(&rdtgroup_mutex); >>> + cpus_read_unlock(); >>> + return -EINVAL; >>> + } >>> + >>> + rdt_last_cmd_clear(); >> >> ... but initializing buffer here. >> Sidenote: This was an issue before. If you receive comments about >> items in patches, please do check if those comments apply to other patches also. > > Missed it. > >> >>> + >>> + while ((token = strsep(&buf, "\n")) != NULL) { >>> + if (strstr(token, "/")) { What is the purpose of this strstr() call? >>> + /* >>> + * The write command follows the following format: >>> + * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>” >>> + * Extract the CTRL_MON group. >>> + */ >>> + cmon_grp = strsep(&token, "/"); >>> + >>> + /* >>> + * Extract the MON_GROUP. >>> + * strsep returns empty string for contiguous delimiters. >>> + * Empty mon_grp here means it is a RDTCTRL_GROUP. >>> + */ >>> + mon_grp = strsep(&token, "/"); >>> + >>> + if (*mon_grp == '\0') >>> + rtype = RDTCTRL_GROUP; >>> + else >>> + rtype = RDTMON_GROUP; >>> + >>> + ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token); >>> + if (ret) >>> + break; >>> + } >>> + } >>> + >>> + mutex_unlock(&rdtgroup_mutex); >>> + cpus_read_unlock(); >>> + >>> + return ret ?: nbytes; >>> +} >>> + >>> #ifdef CONFIG_PROC_CPU_RESCTRL >>> >>> /* >>> @@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = { >>> }, >>> { >>> .name = "mbm_assign_control", >>> - .mode = 0444, >>> + .mode = 0644, >>> .kf_ops = &rdtgroup_kf_single_ops, >>> .seq_show = rdtgroup_mbm_assign_control_show, >>> + .write = rdtgroup_mbm_assign_control_write, >>> }, >>> { >>> .name = "cpus_list", >> >> On a high level this looks ok but this code needs to be more robust. This will parse >> data from user space that may include all kinds of input ... think malicious user or >> a buggy script. I am not able to test this code but I tried to work through what will >> happen under some wrong input and found some issues. For example, if user space provides >> input like '//\n' then rdtgroup_process_flags() will be called with token == NULL. This will >> result in rdtgroup_process_flags() returning "success", but fortunately do nothing, for >> this invalid input. A more severe example is with input like '//0=\n', from what I can tell >> this will result in rdtgroup_str_to_mon_state() called with dom_str==NULL that will treat >> this as ASSIGN_NONE and proceed as if user provided '//0=_'. >> This was just some scenarios with basic input that could be typos, no real stress tests. >> I stopped here though since I believe it is already clear this needs to be more robust. >> Please do test this interface by exercising it with invalid input and corner cases. > > Agree. > > But, tested the cases you mentioned above. It seems to handle as expected. > > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl; > > #echo '//\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > bash: echo: write error: Invalid argument > > # cat /sys/fs/resctrl/info/last_cmd_status > Missing operation =, +, - character > > > #echo '//0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > bash: echo: write error: Invalid argument > > #cat /sys/fs/resctrl/info/last_cmd_status > Invalid assign flag > > #echo '/0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > bash: echo: write error: Invalid argument > # cat /sys/fs/resctrl/info/last_cmd_status > Not a valid resctrl group > > > The assign state did not change. > #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl; > > Sure. will test some more combinations to be sure. hmmm ... these are not quite the examples I shared since from what I can tell it adds a second \n that impacts the processing of string. Could you please try: # echo '//' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control and # echo '//0=' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control Reinette
Hi Reinette, On 10/21/2024 12:20 PM, Reinette Chatre wrote: > Hi Babu, > > On 10/21/24 10:04 AM, Moger, Babu wrote: >> On 10/15/24 22:43, Reinette Chatre wrote: >>> On 10/9/24 10:39 AM, Babu Moger wrote: > > >>>> +static int rdtgroup_process_flags(struct rdt_resource *r, >>>> + enum rdt_group_type rtype, >>>> + char *p_grp, char *c_grp, char *tok) >>>> +{ >>>> + int op, mon_state, assign_state, unassign_state; >>>> + char *dom_str, *id_str, *op_str; >>>> + struct rdt_mon_domain *d; >>>> + struct rdtgroup *rdtgrp; >>>> + unsigned long dom_id; >>>> + int ret, found = 0; >>>> + >>>> + rdtgrp = rdtgroup_find_grp_by_name(rtype, p_grp, c_grp); >>>> + >>>> + if (!rdtgrp) { >>>> + rdt_last_cmd_puts("Not a valid resctrl group\n"); >>>> + return -EINVAL; >>>> + } >>>> + >>>> +next: >>>> + if (!tok || tok[0] == '\0') >>>> + return 0; >>>> + >>>> + /* Start processing the strings for each domain */ >>>> + dom_str = strim(strsep(&tok, ";")); >>>> + >>>> + op_str = strpbrk(dom_str, "=+-"); >>>> + >>>> + if (op_str) { >>>> + op = *op_str; >>>> + } else { >>>> + rdt_last_cmd_puts("Missing operation =, +, - character\n"); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + id_str = strsep(&dom_str, "=+-"); >>>> + >>>> + /* Check for domain id '*' which means all domains */ >>>> + if (id_str && *id_str == '*') { >>>> + d = NULL; >>>> + goto check_state; >>>> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) { >>>> + rdt_last_cmd_puts("Missing domain id\n"); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + /* Verify if the dom_id is valid */ >>>> + list_for_each_entry(d, &r->mon_domains, hdr.list) { >>>> + if (d->hdr.id == dom_id) { >>>> + found = 1; >>>> + break; >>>> + } >>>> + } >>>> + >>>> + if (!found) { >>>> + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id); >>>> + return -EINVAL; >>>> + } >>>> + >>>> +check_state: >>>> + mon_state = rdtgroup_str_to_mon_state(dom_str); >>>> + >>>> + if (mon_state == ASSIGN_INVALID) { >>>> + rdt_last_cmd_puts("Invalid assign flag\n"); >>>> + goto out_fail; >>>> + } >>>> + >>>> + assign_state = 0; >>>> + unassign_state = 0; >>>> + >>>> + switch (op) { >>>> + case '+': >>>> + if (mon_state == ASSIGN_NONE) { >>>> + rdt_last_cmd_puts("Invalid assign opcode\n"); >>>> + goto out_fail; >>>> + } >>>> + assign_state = mon_state; >>>> + break; >>>> + case '-': >>>> + if (mon_state == ASSIGN_NONE) { >>>> + rdt_last_cmd_puts("Invalid assign opcode\n"); >>>> + goto out_fail; >>>> + } >>>> + unassign_state = mon_state; >>>> + break; >>>> + case '=': >>>> + assign_state = mon_state; >>>> + unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state; >>>> + break; >>>> + default: >>>> + break; >>>> + } >>>> + >>>> + if (unassign_state & ASSIGN_TOTAL) { >>>> + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); >>>> + if (ret) >>>> + goto out_fail; >>>> + } >>>> + >>>> + if (unassign_state & ASSIGN_LOCAL) { >>>> + ret = rdtgroup_unassign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); >>>> + if (ret) >>>> + goto out_fail; >>>> + } >>>> + >>>> + if (assign_state & ASSIGN_TOTAL) { >>>> + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_TOTAL_EVENT_ID); >>>> + if (ret) >>>> + goto out_fail; >>>> + } >>>> + >>>> + if (assign_state & ASSIGN_LOCAL) { >>>> + ret = rdtgroup_assign_cntr_event(r, rdtgrp, d, QOS_L3_MBM_LOCAL_EVENT_ID); >>>> + if (ret) >>>> + goto out_fail; >>>> + } >>>> + >>>> + goto next; >>>> + >>>> +out_fail: >>> >>> Is it possible to print a message to the command status to give some details about which >>> request failed? I am wondering about a scenario where a user changes multiple domains of >>> multiple groups, since the operation does not undo changes, it will fail without information >>> to user space about which setting triggered the failure and which settings succeeded. >>> This is similar to what is done when user attempts to move several tasks ... the error will >>> indicate which task triggered failure so that user space knows what completed successfully. >> >> Will add something like this on failure. >> >> rdt_last_cmd_printf("Total event assign failed on domain %d\n", dom_id); > > The user may provide changes for several groups in a single write. > Could the CTRL_MON and MON group names also be printed? It is not clear > to me if it will be easier to print the flags the user provides or verbose text > that the flags translate to, that is "t" vs "Total event". Yes. We can print generic messages with group names and flags "Assignment operation +-= failed on resctrl group ABC with flags = lt" > >>>> + >>>> + return -EINVAL; >>>> +} >>>> + >>>> +static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of, >>>> + char *buf, size_t nbytes, loff_t off) >>>> +{ >>>> + struct rdt_resource *r = of->kn->parent->priv; >>>> + char *token, *cmon_grp, *mon_grp; >>>> + enum rdt_group_type rtype; >>>> + int ret; >>>> + >>>> + /* Valid input requires a trailing newline */ >>>> + if (nbytes == 0 || buf[nbytes - 1] != '\n') >>>> + return -EINVAL; >>>> + >>>> + buf[nbytes - 1] = '\0'; >>>> + >>>> + cpus_read_lock(); >>>> + mutex_lock(&rdtgroup_mutex); >>>> + >>>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) { >>>> + rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n"); >>> >>> Writing to last_cmd_status_buf here ... >> >> Sure. >> >>> >>>> + mutex_unlock(&rdtgroup_mutex); >>>> + cpus_read_unlock(); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + rdt_last_cmd_clear(); >>> >>> ... but initializing buffer here. >>> Sidenote: This was an issue before. If you receive comments about >>> items in patches, please do check if those comments apply to other patches also. >> >> Missed it. >> >>> >>>> + >>>> + while ((token = strsep(&buf, "\n")) != NULL) { >>>> + if (strstr(token, "/")) { > > What is the purpose of this strstr() call? This is a carry over for v6. Not required. Will remove. > >>>> + /* >>>> + * The write command follows the following format: >>>> + * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>” >>>> + * Extract the CTRL_MON group. >>>> + */ >>>> + cmon_grp = strsep(&token, "/"); >>>> + >>>> + /* >>>> + * Extract the MON_GROUP. >>>> + * strsep returns empty string for contiguous delimiters. >>>> + * Empty mon_grp here means it is a RDTCTRL_GROUP. >>>> + */ >>>> + mon_grp = strsep(&token, "/"); >>>> + >>>> + if (*mon_grp == '\0') >>>> + rtype = RDTCTRL_GROUP; >>>> + else >>>> + rtype = RDTMON_GROUP; >>>> + >>>> + ret = rdtgroup_process_flags(r, rtype, cmon_grp, mon_grp, token); >>>> + if (ret) >>>> + break; >>>> + } >>>> + } >>>> + >>>> + mutex_unlock(&rdtgroup_mutex); >>>> + cpus_read_unlock(); >>>> + >>>> + return ret ?: nbytes; >>>> +} >>>> + >>>> #ifdef CONFIG_PROC_CPU_RESCTRL >>>> >>>> /* >>>> @@ -2328,9 +2558,10 @@ static struct rftype res_common_files[] = { >>>> }, >>>> { >>>> .name = "mbm_assign_control", >>>> - .mode = 0444, >>>> + .mode = 0644, >>>> .kf_ops = &rdtgroup_kf_single_ops, >>>> .seq_show = rdtgroup_mbm_assign_control_show, >>>> + .write = rdtgroup_mbm_assign_control_write, >>>> }, >>>> { >>>> .name = "cpus_list", >>> >>> On a high level this looks ok but this code needs to be more robust. This will parse >>> data from user space that may include all kinds of input ... think malicious user or >>> a buggy script. I am not able to test this code but I tried to work through what will >>> happen under some wrong input and found some issues. For example, if user space provides >>> input like '//\n' then rdtgroup_process_flags() will be called with token == NULL. This will >>> result in rdtgroup_process_flags() returning "success", but fortunately do nothing, for >>> this invalid input. A more severe example is with input like '//0=\n', from what I can tell >>> this will result in rdtgroup_str_to_mon_state() called with dom_str==NULL that will treat >>> this as ASSIGN_NONE and proceed as if user provided '//0=_'. >>> This was just some scenarios with basic input that could be typos, no real stress tests. >>> I stopped here though since I believe it is already clear this needs to be more robust. >>> Please do test this interface by exercising it with invalid input and corner cases. >> >> Agree. >> >> But, tested the cases you mentioned above. It seems to handle as expected. >> >> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl; >> >> #echo '//\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> bash: echo: write error: Invalid argument >> >> # cat /sys/fs/resctrl/info/last_cmd_status >> Missing operation =, +, - character >> >> >> #echo '//0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> bash: echo: write error: Invalid argument >> >> #cat /sys/fs/resctrl/info/last_cmd_status >> Invalid assign flag >> >> #echo '/0=\n' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> bash: echo: write error: Invalid argument >> # cat /sys/fs/resctrl/info/last_cmd_status >> Not a valid resctrl group >> >> >> The assign state did not change. >> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl; >> >> Sure. will test some more combinations to be sure. > > hmmm ... these are not quite the examples I shared since from what I can > tell it adds a second \n that impacts the processing of string. > > Could you please try: > # echo '//' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > and > # echo '//0=' > /sys/fs/resctrl/info/L3_MON/mbm_assign_control > Yes. You are right. Above cases does not work as expected. This should fix. Will test more. diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index 6095146e3ba4..cccce991d2d0 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -1044,6 +1044,9 @@ static int rdtgroup_str_to_mon_state(char *flag) { int i, mon_state = ASSIGN_NONE; + if (!strlen(flag)) + return ASSIGN_INVALID; + for (i = 0; i < strlen(flag); i++) { switch (*(flag + i)) { case 't': Thanks - Babu Moger
© 2016 - 2024 Red Hat, Inc.