[PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file

Tony Luck posted 7 patches 3 weeks, 5 days ago
There is a newer version of this series
[PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file
Posted by Tony Luck 3 weeks, 5 days ago
A user can choose any of the memory bandwidth monitoring events
listed in /sys/fs/resctrl/info/L3_mon/mon_features independently
for each ctrl_mon group by writing to the "mba_MBps_event" file.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h    |  2 +
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 46 +++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  1 +
 3 files changed, 49 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5f3438ca9e2b..35483c6615b6 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -609,6 +609,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off);
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
 			   struct seq_file *s, void *v);
+ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
+				      char *buf, size_t nbytes, loff_t off);
 int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
 				 struct seq_file *s, void *v);
 bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index b9ba419e5c88..fc5585dc688f 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -518,6 +518,52 @@ static int smp_mon_event_count(void *arg)
 	return 0;
 }
 
+ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
+				      char *buf, size_t nbytes, loff_t off)
+{
+	struct rdtgroup *rdtgrp;
+	int ret = 0;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+	buf[nbytes - 1] = '\0';
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(of->kn);
+		return -ENOENT;
+	}
+	rdt_last_cmd_clear();
+
+	if (!strcmp(buf, "mbm_local_bytes")) {
+		if (is_mbm_local_enabled())
+			rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
+		else
+			ret = -ENXIO;
+	} else if (!strcmp(buf, "mbm_total_bytes")) {
+		if (is_mbm_total_enabled())
+			rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
+		else
+			ret = -ENXIO;
+	} else {
+		ret = -EINVAL;
+	}
+
+	switch (ret) {
+	case -ENXIO:
+		rdt_last_cmd_printf("Unsupported event id '%s'\n", buf);
+		break;
+	case -EINVAL:
+		rdt_last_cmd_printf("Unknown event id '%s'\n", buf);
+		break;
+	}
+
+	rdtgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
 int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
 				 struct seq_file *s, void *v)
 {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 3ba81963e981..6fa501ef187f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1947,6 +1947,7 @@ static struct rftype res_common_files[] = {
 		.name		= "mba_MBps_event",
 		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
+		.write		= rdtgroup_mba_mbps_event_write,
 		.seq_show	= rdtgroup_mba_mbps_event_show,
 	},
 	{
-- 
2.47.0
Re: [PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file
Posted by Reinette Chatre 1 week, 5 days ago
Hi Tony,

On 10/29/24 10:28 AM, Tony Luck wrote:
> A user can choose any of the memory bandwidth monitoring events
> listed in /sys/fs/resctrl/info/L3_mon/mon_features independently
> for each ctrl_mon group by writing to the "mba_MBps_event" file.

Please review the changelog based on tip requirements. Folks used to
tip customs may expect changelog to start with the context, while the
above reads like it describes current context it describes what this patch
makes possible without ever getting to description of the change itself.

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h    |  2 +
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 46 +++++++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  1 +
>  3 files changed, 49 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 5f3438ca9e2b..35483c6615b6 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -609,6 +609,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>  				char *buf, size_t nbytes, loff_t off);
>  int rdtgroup_schemata_show(struct kernfs_open_file *of,
>  			   struct seq_file *s, void *v);
> +ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
> +				      char *buf, size_t nbytes, loff_t off);
>  int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
>  				 struct seq_file *s, void *v);
>  bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index b9ba419e5c88..fc5585dc688f 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -518,6 +518,52 @@ static int smp_mon_event_count(void *arg)
>  	return 0;
>  }
>  
> +ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
> +				      char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdtgroup *rdtgrp;
> +	int ret = 0;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +	buf[nbytes - 1] = '\0';
> +
> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> +	if (!rdtgrp) {
> +		rdtgroup_kn_unlock(of->kn);
> +		return -ENOENT;
> +	}
> +	rdt_last_cmd_clear();
> +
> +	if (!strcmp(buf, "mbm_local_bytes")) {
> +		if (is_mbm_local_enabled())
> +			rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
> +		else
> +			ret = -ENXIO;
> +	} else if (!strcmp(buf, "mbm_total_bytes")) {
> +		if (is_mbm_total_enabled())
> +			rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
> +		else
> +			ret = -ENXIO;
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +	switch (ret) {
> +	case -ENXIO:
> +		rdt_last_cmd_printf("Unsupported event id '%s'\n", buf);
> +		break;
> +	case -EINVAL:
> +		rdt_last_cmd_printf("Unknown event id '%s'\n", buf);
> +		break;
> +	}

nit: What is the benefit of distinguishing between these cases in messaging to
user space? I am thinking of the scenario where user may write "llc_occupancy",
this will print "Unknown event id", which is technically not correct since it is
"known", it is just not "supported". Perhaps the default error message can just
always be "Unsupported"?

> +
> +	rdtgroup_kn_unlock(of->kn);
> +
> +	return ret ?: nbytes;
> +}
> +
>  int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
>  				 struct seq_file *s, void *v)
>  {
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 3ba81963e981..6fa501ef187f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1947,6 +1947,7 @@ static struct rftype res_common_files[] = {
>  		.name		= "mba_MBps_event",
>  		.mode		= 0644,

Writable bits can be set in this commit.

>  		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.write		= rdtgroup_mba_mbps_event_write,
>  		.seq_show	= rdtgroup_mba_mbps_event_show,
>  	},
>  	{


Reinette
Re: [PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file
Posted by Fenghua Yu 3 weeks, 2 days ago
Hi, Tony,

On 10/29/24 10:28, Tony Luck wrote:
> A user can choose any of the memory bandwidth monitoring events
> listed in /sys/fs/resctrl/info/L3_mon/mon_features independently
> for each ctrl_mon group by writing to the "mba_MBps_event" file.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h    |  2 +
>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 46 +++++++++++++++++++++++
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  1 +
>   3 files changed, 49 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 5f3438ca9e2b..35483c6615b6 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -609,6 +609,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>   				char *buf, size_t nbytes, loff_t off);
>   int rdtgroup_schemata_show(struct kernfs_open_file *of,
>   			   struct seq_file *s, void *v);
> +ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
> +				      char *buf, size_t nbytes, loff_t off);
>   int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
>   				 struct seq_file *s, void *v);
>   bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index b9ba419e5c88..fc5585dc688f 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -518,6 +518,52 @@ static int smp_mon_event_count(void *arg)
>   	return 0;
>   }
>   
> +ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
> +				      char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdtgroup *rdtgrp;
> +	int ret = 0;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +	buf[nbytes - 1] = '\0';
> +
> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> +	if (!rdtgrp) {
> +		rdtgroup_kn_unlock(of->kn);
> +		return -ENOENT;
> +	}
> +	rdt_last_cmd_clear();
> +
> +	if (!strcmp(buf, "mbm_local_bytes")) {
> +		if (is_mbm_local_enabled())
> +			rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
> +		else
> +			ret = -ENXIO;
> +	} else if (!strcmp(buf, "mbm_total_bytes")) {
> +		if (is_mbm_total_enabled())
> +			rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;


User may think each time toggling the local/total event will effect MBA. 
And they may create usage case like frequently changing the events to 
maintain/adjust both total and local within bw boundary.

But toggling mba_mbps_event faster than 1sec doesn't have any effect on 
MBA SC because MBA SC is called every one second.

Maybe need to add a ratelimit of 1 second on calling this function? And 
adding info in the document that toggling speed should be slower than 1 
second?

> +		else
> +			ret = -ENXIO;
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +	switch (ret) {
> +	case -ENXIO:
> +		rdt_last_cmd_printf("Unsupported event id '%s'\n", buf);
> +		break;
> +	case -EINVAL:
> +		rdt_last_cmd_printf("Unknown event id '%s'\n", buf);
> +		break;
> +	}
> +
> +	rdtgroup_kn_unlock(of->kn);
> +
> +	return ret ?: nbytes;
> +}
> +
>   int rdtgroup_mba_mbps_event_show(struct kernfs_open_file *of,
>   				 struct seq_file *s, void *v)
>   {
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 3ba81963e981..6fa501ef187f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1947,6 +1947,7 @@ static struct rftype res_common_files[] = {
>   		.name		= "mba_MBps_event",
>   		.mode		= 0644,
>   		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.write		= rdtgroup_mba_mbps_event_write,
>   		.seq_show	= rdtgroup_mba_mbps_event_show,
>   	},
>   	{

Thanks.

-Fenghua
RE: [PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file
Posted by Luck, Tony 3 weeks, 2 days ago
> > +   if (!strcmp(buf, "mbm_local_bytes")) {
> > +           if (is_mbm_local_enabled())
> > +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
> > +           else
> > +                   ret = -ENXIO;
> > +   } else if (!strcmp(buf, "mbm_total_bytes")) {
> > +           if (is_mbm_total_enabled())
> > +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>
>
> User may think each time toggling the local/total event will effect MBA.
> And they may create usage case like frequently changing the events to
> maintain/adjust both total and local within bw boundary.
>
> But toggling mba_mbps_event faster than 1sec doesn't have any effect on
> MBA SC because MBA SC is called every one second.
>
> Maybe need to add a ratelimit of 1 second on calling this function? And
> adding info in the document that toggling speed should be slower than 1
> second?

The limit would need to be per ctrl_mon group, not on calls to this function.
It's perfectly ok to switch multiple groups in a short interval.

I'm not sure how to rate limit here. I could add a delay so that the write()
call blocks until enough time passes before making the change. But
what should I do if a user submits more writes to the file? Queue them
all and apply at one second intervals?

Maybe it would be better to just to add some additional text to the
documentation pointing out that resctrl only checks bandwidth once
per second to make throttling adjustments. So changes to the event
will only have effect after some seconds have passed?

-Tony
Re: [PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file
Posted by Fenghua Yu 3 weeks, 2 days ago
Hi, Tony,

On 11/1/24 16:55, Luck, Tony wrote:
>>> +   if (!strcmp(buf, "mbm_local_bytes")) {
>>> +           if (is_mbm_local_enabled())
>>> +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
>>> +           else
>>> +                   ret = -ENXIO;
>>> +   } else if (!strcmp(buf, "mbm_total_bytes")) {
>>> +           if (is_mbm_total_enabled())
>>> +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>>
>>
>> User may think each time toggling the local/total event will effect MBA.
>> And they may create usage case like frequently changing the events to
>> maintain/adjust both total and local within bw boundary.
>>
>> But toggling mba_mbps_event faster than 1sec doesn't have any effect on
>> MBA SC because MBA SC is called every one second.
>>
>> Maybe need to add a ratelimit of 1 second on calling this function? And
>> adding info in the document that toggling speed should be slower than 1
>> second?
> 
> The limit would need to be per ctrl_mon group, not on calls to this function.
> It's perfectly ok to switch multiple groups in a short interval.

Agree.

> 
> I'm not sure how to rate limit here. I could add a delay so that the write()
> call blocks until enough time passes before making the change. But
> what should I do if a user submits more writes to the file? Queue them
> all and apply at one second intervals?

Maybe define "mba_mbps_last_time" in rdtgroup. Then

if (time_before(jiffies, rdtgrp->mba_mbps_last_time + HZ) {
	rdt_last_cmd_printf("Too fast (>1/s) mba_MBps event change)\n");
         rdtgroup_kn_unlock(of->kn);
	return -EAGAIN;
}
rdtgrp->mba_mbps_last_time = jiffies;

> 
> Maybe it would be better to just to add some additional text to the
> documentation pointing out that resctrl only checks bandwidth once
> per second to make throttling adjustments. So changes to the event
> will only have effect after some seconds have passed?


Add additional text would be great.

Thanks.

-Fenghua
Re: [PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file
Posted by Reinette Chatre 1 week, 5 days ago
Hi Fenghua and Tony,

On 11/1/24 5:57 PM, Fenghua Yu wrote:
> Hi, Tony,
> 
> On 11/1/24 16:55, Luck, Tony wrote:
>>>> +   if (!strcmp(buf, "mbm_local_bytes")) {
>>>> +           if (is_mbm_local_enabled())
>>>> +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
>>>> +           else
>>>> +                   ret = -ENXIO;
>>>> +   } else if (!strcmp(buf, "mbm_total_bytes")) {
>>>> +           if (is_mbm_total_enabled())
>>>> +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>>>
>>>
>>> User may think each time toggling the local/total event will effect MBA.
>>> And they may create usage case like frequently changing the events to
>>> maintain/adjust both total and local within bw boundary.
>>>
>>> But toggling mba_mbps_event faster than 1sec doesn't have any effect on
>>> MBA SC because MBA SC is called every one second.
>>>
>>> Maybe need to add a ratelimit of 1 second on calling this function? And
>>> adding info in the document that toggling speed should be slower than 1
>>> second?
>>
>> The limit would need to be per ctrl_mon group, not on calls to this function.
>> It's perfectly ok to switch multiple groups in a short interval.
> 
> Agree.
> 
>>
>> I'm not sure how to rate limit here. I could add a delay so that the write()
>> call blocks until enough time passes before making the change. But
>> what should I do if a user submits more writes to the file? Queue them
>> all and apply at one second intervals?
> 
> Maybe define "mba_mbps_last_time" in rdtgroup. Then
> 
> if (time_before(jiffies, rdtgrp->mba_mbps_last_time + HZ) {
>     rdt_last_cmd_printf("Too fast (>1/s) mba_MBps event change)\n");
>         rdtgroup_kn_unlock(of->kn);
>     return -EAGAIN;
> }
> rdtgrp->mba_mbps_last_time = jiffies;

This seems like enforcing an unnecessary limitation. For example, this would mean
that user space needs to wait for a second to undo a change to the monitoring event
in case there was a mistake. This seems like an unnecessary restriction to me.

I am also afraid that there may be some corner cases where a write to the file and
the actual run of the overflow handler  (on every domain) may not be exactly HZ apart.

Bandwidth allocation remains to be adjusted in either direction with at most the bandwidth
granularity. A user attempting to toggle bandwidth event cannot expecting large
changes in allocated bandwidth even if the events differ significantly.

Surely we can explore more if we learn about a specific use case.

>> Maybe it would be better to just to add some additional text to the
>> documentation pointing out that resctrl only checks bandwidth once
>> per second to make throttling adjustments. So changes to the event
>> will only have effect after some seconds have passed?
> 
> 
> Add additional text would be great.


Agreed.

Reinette
RE: [PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file
Posted by Luck, Tony 1 week, 5 days ago
> >>>> +   if (!strcmp(buf, "mbm_local_bytes")) {
> >>>> +           if (is_mbm_local_enabled())
> >>>> +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
> >>>> +           else
> >>>> +                   ret = -ENXIO;
> >>>> +   } else if (!strcmp(buf, "mbm_total_bytes")) {
> >>>> +           if (is_mbm_total_enabled())
> >>>> +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
> >>>
> >>>
> >>> User may think each time toggling the local/total event will effect MBA.
> >>> And they may create usage case like frequently changing the events to
> >>> maintain/adjust both total and local within bw boundary.
> >>>
> >>> But toggling mba_mbps_event faster than 1sec doesn't have any effect on
> >>> MBA SC because MBA SC is called every one second.
> >>>
> >>> Maybe need to add a ratelimit of 1 second on calling this function? And
> >>> adding info in the document that toggling speed should be slower than 1
> >>> second?
> >>
> >> The limit would need to be per ctrl_mon group, not on calls to this function.
> >> It's perfectly ok to switch multiple groups in a short interval.
> >
> > Agree.
> >
> >>
> >> I'm not sure how to rate limit here. I could add a delay so that the write()
> >> call blocks until enough time passes before making the change. But
> >> what should I do if a user submits more writes to the file? Queue them
> >> all and apply at one second intervals?
> >
> > Maybe define "mba_mbps_last_time" in rdtgroup. Then
> >
> > if (time_before(jiffies, rdtgrp->mba_mbps_last_time + HZ) {
> >     rdt_last_cmd_printf("Too fast (>1/s) mba_MBps event change)\n");
> >         rdtgroup_kn_unlock(of->kn);
> >     return -EAGAIN;
> > }
> > rdtgrp->mba_mbps_last_time = jiffies;
>
> This seems like enforcing an unnecessary limitation. For example, this would mean
> that user space needs to wait for a second to undo a change to the monitoring event
> in case there was a mistake. This seems like an unnecessary restriction to me.
>
> I am also afraid that there may be some corner cases where a write to the file and
> the actual run of the overflow handler  (on every domain) may not be exactly HZ apart.
>
> Bandwidth allocation remains to be adjusted in either direction with at most the bandwidth
> granularity. A user attempting to toggle bandwidth event cannot expecting large
> changes in allocated bandwidth even if the events differ significantly.
>
> Surely we can explore more if we learn about a specific use case.

Note that the kernel generally doesn't prevent the user from doing inane things
that do not cause damage.  E.g. a user can already abuse the legacy memory
percentage bandwidth controls with

while :
do
	echo "MB:0=100;1=10" > schemata
	sleep 0.1
	echo "MB:0=10;1=100" > schemata
	sleep 0.1
done

Similar abuse of the percentage mode is also possible.

>
> >> Maybe it would be better to just to add some additional text to the
> >> documentation pointing out that resctrl only checks bandwidth once
> >> per second to make throttling adjustments. So changes to the event
> >> will only have effect after some seconds have passed?
> >
> >
> > Add additional text would be great.
>
>
> Agreed.

We hadn't previously documented the rate at which mba_sc measured and adjusted
the memory bandwidth allocation controls. Once per second is currently an implementation
detail that in theory could be changed in the future.

I'm reluctant to carve that value into the stone of resctrl.rst, But without a specific
value "don't change the values too rapidly" is meaningless.

-Tony
Re: [PATCH v8 6/7] x86/resctrl: Add write option to "mba_MBps_event" file
Posted by Reinette Chatre 1 week, 5 days ago
Hi Tony,

On 11/12/24 3:57 PM, Luck, Tony wrote:
>>>>>> +   if (!strcmp(buf, "mbm_local_bytes")) {
>>>>>> +           if (is_mbm_local_enabled())
>>>>>> +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
>>>>>> +           else
>>>>>> +                   ret = -ENXIO;
>>>>>> +   } else if (!strcmp(buf, "mbm_total_bytes")) {
>>>>>> +           if (is_mbm_total_enabled())
>>>>>> +                   rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>>>>>
>>>>>
>>>>> User may think each time toggling the local/total event will effect MBA.
>>>>> And they may create usage case like frequently changing the events to
>>>>> maintain/adjust both total and local within bw boundary.
>>>>>
>>>>> But toggling mba_mbps_event faster than 1sec doesn't have any effect on
>>>>> MBA SC because MBA SC is called every one second.
>>>>>
>>>>> Maybe need to add a ratelimit of 1 second on calling this function? And
>>>>> adding info in the document that toggling speed should be slower than 1
>>>>> second?
>>>>
>>>> The limit would need to be per ctrl_mon group, not on calls to this function.
>>>> It's perfectly ok to switch multiple groups in a short interval.
>>>
>>> Agree.
>>>
>>>>
>>>> I'm not sure how to rate limit here. I could add a delay so that the write()
>>>> call blocks until enough time passes before making the change. But
>>>> what should I do if a user submits more writes to the file? Queue them
>>>> all and apply at one second intervals?
>>>
>>> Maybe define "mba_mbps_last_time" in rdtgroup. Then
>>>
>>> if (time_before(jiffies, rdtgrp->mba_mbps_last_time + HZ) {
>>>     rdt_last_cmd_printf("Too fast (>1/s) mba_MBps event change)\n");
>>>         rdtgroup_kn_unlock(of->kn);
>>>     return -EAGAIN;
>>> }
>>> rdtgrp->mba_mbps_last_time = jiffies;
>>
>> This seems like enforcing an unnecessary limitation. For example, this would mean
>> that user space needs to wait for a second to undo a change to the monitoring event
>> in case there was a mistake. This seems like an unnecessary restriction to me.
>>
>> I am also afraid that there may be some corner cases where a write to the file and
>> the actual run of the overflow handler  (on every domain) may not be exactly HZ apart.
>>
>> Bandwidth allocation remains to be adjusted in either direction with at most the bandwidth
>> granularity. A user attempting to toggle bandwidth event cannot expecting large
>> changes in allocated bandwidth even if the events differ significantly.
>>
>> Surely we can explore more if we learn about a specific use case.
> 
> Note that the kernel generally doesn't prevent the user from doing inane things
> that do not cause damage.  E.g. a user can already abuse the legacy memory
> percentage bandwidth controls with
> 
> while :
> do
> 	echo "MB:0=100;1=10" > schemata
> 	sleep 0.1
> 	echo "MB:0=10;1=100" > schemata
> 	sleep 0.1
> done
> 
> Similar abuse of the percentage mode is also possible.

Good point.

At least with the percentage modes the MBA allocation is promptly written to
hardware.

Related to issue at hand the user is also already able to quickly switch the MBps
value via quick successive writes to the schemata and those changes are already
invisible to the software controller that only consumes user's MBps once per second.

>>>> Maybe it would be better to just to add some additional text to the
>>>> documentation pointing out that resctrl only checks bandwidth once
>>>> per second to make throttling adjustments. So changes to the event
>>>> will only have effect after some seconds have passed?
>>>
>>>
>>> Add additional text would be great.
>>
>>
>> Agreed.
> 
> We hadn't previously documented the rate at which mba_sc measured and adjusted
> the memory bandwidth allocation controls. Once per second is currently an implementation
> detail that in theory could be changed in the future.
> 
> I'm reluctant to carve that value into the stone of resctrl.rst, But without a specific
> value "don't change the values too rapidly" is meaningless.

I agree that one second should not be part of documentation, a higher level generic "interval"
can be used instead. But ... since you pointed out that writing to schemata also has this
caveat any documentation should ideally address that also. With existing documentation seemingly
sufficient in that regard I'll also be ok with keeping it as high level with this new event
support.

Reinette