[v5] x86/resctrl telemetry monitoring

[PATCH v5 14/29] x86,fs/resctrl: Support binary fixed point event counters

Posted by Tony Luck 8 months, 3 weeks ago

Resctrl was written with the assumption that all monitor events
can be displayed as unsigned decimal integers.

Hardware architecture counters may provide some telemetry events with
greater precision where the event is not a simple count, but is a
measurement of some sort (e.g. Joules for energy consumed).

Add a new argument to resctrl_enable_mon_event() for architecture
code to inform the file system that the value for a counter is
a fixed-point value with a specific number of binary places.

Fixed point values are displayed with values rounded to an
appropriate number of decimal places.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            |  4 +-
 fs/resctrl/internal.h              |  2 +
 arch/x86/kernel/cpu/resctrl/core.c |  6 +--
 fs/resctrl/ctrlmondata.c           | 75 +++++++++++++++++++++++++++++-
 fs/resctrl/monitor.c               |  5 +-
 5 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9aab3d78005a..46ba62ee94a1 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -377,7 +377,9 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
-void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu);
+#define MAX_BINARY_BITS	27
+
+void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu, u32 binary_bits);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id evt);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index eb6e92d1ab15..d5045491790e 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -58,6 +58,7 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
  * @name:		name of the event
  * @configurable:	true if the event is configurable
  * @any_cpu:		true if the event can be read from any CPU
+ * @binary_bits:	number of fixed-point binary bits from architecture
  * @enabled:		true if the event is enabled
  */
 struct mon_evt {
@@ -66,6 +67,7 @@ struct mon_evt {
 	char			*name;
 	bool			configurable;
 	bool			any_cpu;
+	int			binary_bits;
 	bool			enabled;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 5d9a024ce4b0..306afb50fd37 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -880,15 +880,15 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
 		ret = true;
 	}
 
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 1337716f59c8..07bf44834a46 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -590,6 +590,77 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
 }
 
+/**
+ * struct fixed_params - parameters to decode a binary fixed point value
+ * @mask:	Mask for fractional part of value.
+ * @lshift:	Shift to round-up binary places.
+ * @pow10:	Multiplier (10 ^ decimal places).
+ * @round:	Add to round up to nearest decimal representation.
+ * @rshift:	Shift back for final answer.
+ * @decplaces:	Number of decimal places for this number of binary places.
+ */
+struct fixed_params {
+	u64	mask;
+	int	lshift;
+	int	pow10;
+	u64	round;
+	int	rshift;
+	int	decplaces;
+};
+
+static struct fixed_params fixed_params[MAX_BINARY_BITS + 1] = {
+	[1]  = { GENMASK_ULL(1, 0),  0,         10, 0x00000000,  1,  1 },
+	[2]  = { GENMASK_ULL(2, 0),  0,        100, 0x00000000,  2,  2 },
+	[3]  = { GENMASK_ULL(3, 0),  0,       1000, 0x00000000,  3,  3 },
+	[4]  = { GENMASK_ULL(4, 0),  2,       1000, 0x00000020,  6,  3 },
+	[5]  = { GENMASK_ULL(5, 0),  1,       1000, 0x00000020,  6,  3 },
+	[6]  = { GENMASK_ULL(6, 0),  0,       1000, 0x00000020,  6,  3 },
+	[7]  = { GENMASK_ULL(7, 0),  2,       1000, 0x00000100,  9,  3 },
+	[8]  = { GENMASK_ULL(8, 0),  1,       1000, 0x00000100,  9,  3 },
+	[9]  = { GENMASK_ULL(9, 0),  0,       1000, 0x00000100,  9,  3 },
+	[10] = { GENMASK_ULL(10, 0), 2,      10000, 0x00000800, 12,  4 },
+	[11] = { GENMASK_ULL(11, 0), 1,      10000, 0x00000800, 12,  4 },
+	[12] = { GENMASK_ULL(12, 0), 0,      10000, 0x00000800, 12,  4 },
+	[13] = { GENMASK_ULL(13, 0), 2,     100000, 0x00004000, 15,  5 },
+	[14] = { GENMASK_ULL(14, 0), 1,     100000, 0x00004000, 15,  5 },
+	[15] = { GENMASK_ULL(15, 0), 0,     100000, 0x00004000, 15,  5 },
+	[16] = { GENMASK_ULL(16, 0), 2,    1000000, 0x00020000, 18,  6 },
+	[17] = { GENMASK_ULL(17, 0), 1,    1000000, 0x00020000, 18,  6 },
+	[18] = { GENMASK_ULL(18, 0), 0,    1000000, 0x00020000, 18,  6 },
+	[19] = { GENMASK_ULL(19, 0), 2,   10000000, 0x00100000, 21,  7 },
+	[20] = { GENMASK_ULL(20, 0), 1,   10000000, 0x00100000, 21,  7 },
+	[21] = { GENMASK_ULL(21, 0), 0,   10000000, 0x00100000, 21,  7 },
+	[22] = { GENMASK_ULL(22, 0), 2,  100000000, 0x00800000, 24,  8 },
+	[23] = { GENMASK_ULL(23, 0), 1,  100000000, 0x00800000, 24,  8 },
+	[24] = { GENMASK_ULL(24, 0), 0,  100000000, 0x00800000, 24,  8 },
+	[25] = { GENMASK_ULL(25, 0), 2, 1000000000, 0x04000000, 27,  9 },
+	[26] = { GENMASK_ULL(26, 0), 1, 1000000000, 0x04000000, 27,  9 },
+	[27] = { GENMASK_ULL(27, 0), 0, 1000000000, 0x04000000, 27,  9 }
+};
+
+static void print_event_value(struct seq_file *m, int binary_bits, u64 val)
+{
+	struct fixed_params *fp = &fixed_params[binary_bits];
+	unsigned long long frac;
+	char buf[10];
+
+	frac = val & fp->mask;
+	frac <<= fp->lshift;
+	frac *= fp->pow10;
+	frac += fp->round;
+	frac >>= fp->rshift;
+
+	sprintf(buf, "%0*llu", fp->decplaces, frac);
+
+	/* Trim trailing zeroes */
+	for (int i = fp->decplaces - 1; i > 0; i--) {
+		if (buf[i] != '0')
+			break;
+		buf[i] = '\0';
+	}
+	seq_printf(m, "%llu.%s\n", val >> binary_bits, buf);
+}
+
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
@@ -657,8 +728,10 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		seq_puts(m, "Error\n");
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
-	else
+	else if (evt->binary_bits == 0)
 		seq_printf(m, "%llu\n", rr.val);
+	else
+		print_event_value(m, evt->binary_bits, rr.val);
 
 out:
 	rdtgroup_kn_unlock(of->kn);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index e6e3be990638..f554d7933739 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -878,9 +878,9 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
 	},
 };
 
-void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu)
+void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu, u32 binary_bits)
 {
-	if (WARN_ON_ONCE(evtid >= QOS_NUM_EVENTS))
+	if (WARN_ON_ONCE(evtid >= QOS_NUM_EVENTS) || binary_bits > MAX_BINARY_BITS)
 		return;
 	if (mon_event_all[evtid].enabled) {
 		pr_warn("Duplicate enable for event %d\n", evtid);
@@ -888,6 +888,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu)
 	}
 
 	mon_event_all[evtid].any_cpu = any_cpu;
+	mon_event_all[evtid].binary_bits = binary_bits;
 	mon_event_all[evtid].enabled = true;
 }
 
-- 
2.49.0

Re: [PATCH v5 14/29] x86,fs/resctrl: Support binary fixed point event counters

Posted by Reinette Chatre 8 months, 1 week ago

Hi Tony,

On 5/21/25 3:50 PM, Tony Luck wrote:
> Resctrl was written with the assumption that all monitor events
> can be displayed as unsigned decimal integers.
> 
> Hardware architecture counters may provide some telemetry events with
> greater precision where the event is not a simple count, but is a
> measurement of some sort (e.g. Joules for energy consumed).
> 
> Add a new argument to resctrl_enable_mon_event() for architecture
> code to inform the file system that the value for a counter is
> a fixed-point value with a specific number of binary places.

resctrl fs contract with user space, per patch #29, is that only "core_energy"
and "activity" can be floating point. We do not want to make it possible for
an architecture to change this contract. Other events should not be able
to become floating point. I thus think there needs to be an extra setting that
indicates _if_ the architecture can specify a fraction.

> 
> Fixed point values are displayed with values rounded to an
> appropriate number of decimal places.

How are the "appropriate number of decimal places" determined?

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h            |  4 +-
>  fs/resctrl/internal.h              |  2 +
>  arch/x86/kernel/cpu/resctrl/core.c |  6 +--
>  fs/resctrl/ctrlmondata.c           | 75 +++++++++++++++++++++++++++++-
>  fs/resctrl/monitor.c               |  5 +-
>  5 files changed, 85 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 9aab3d78005a..46ba62ee94a1 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -377,7 +377,9 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
>  u32 resctrl_arch_system_num_rmid_idx(void);
>  int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
>  
> -void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu);
> +#define MAX_BINARY_BITS	27
> +
> +void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu, u32 binary_bits);
>  
>  bool resctrl_is_mon_event_enabled(enum resctrl_event_id evt);
>  
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index eb6e92d1ab15..d5045491790e 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -58,6 +58,7 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>   * @name:		name of the event
>   * @configurable:	true if the event is configurable
>   * @any_cpu:		true if the event can be read from any CPU
> + * @binary_bits:	number of fixed-point binary bits from architecture
>   * @enabled:		true if the event is enabled
>   */
>  struct mon_evt {
> @@ -66,6 +67,7 @@ struct mon_evt {
>  	char			*name;
>  	bool			configurable;
>  	bool			any_cpu;
> +	int			binary_bits;
>  	bool			enabled;
>  };

Perhaps a new member "is_floating_point" can be hardcoded by resctrl fs and only
events that are floating point can have their precision set?

>  
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 5d9a024ce4b0..306afb50fd37 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -880,15 +880,15 @@ static __init bool get_rdt_mon_resources(void)
>  	bool ret = false;
>  
>  	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
> -		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);

We do not want architecture to be able to make these be floating point.

>  		ret = true;
>  	}
>  	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
> -		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
>  		ret = true;
>  	}
>  	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
> -		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
>  		ret = true;
>  	}
>  
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index 1337716f59c8..07bf44834a46 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -590,6 +590,77 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  	resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
>  }
>  
> +/**
> + * struct fixed_params - parameters to decode a binary fixed point value
> + * @mask:	Mask for fractional part of value.
> + * @lshift:	Shift to round-up binary places.
> + * @pow10:	Multiplier (10 ^ decimal places).
> + * @round:	Add to round up to nearest decimal representation.
> + * @rshift:	Shift back for final answer.
> + * @decplaces:	Number of decimal places for this number of binary places.
> + */
> +struct fixed_params {
> +	u64	mask;
> +	int	lshift;
> +	int	pow10;
> +	u64	round;
> +	int	rshift;
> +	int	decplaces;
> +};
> +
> +static struct fixed_params fixed_params[MAX_BINARY_BITS + 1] = {
> +	[1]  = { GENMASK_ULL(1, 0),  0,         10, 0x00000000,  1,  1 },
> +	[2]  = { GENMASK_ULL(2, 0),  0,        100, 0x00000000,  2,  2 },
> +	[3]  = { GENMASK_ULL(3, 0),  0,       1000, 0x00000000,  3,  3 },
> +	[4]  = { GENMASK_ULL(4, 0),  2,       1000, 0x00000020,  6,  3 },
> +	[5]  = { GENMASK_ULL(5, 0),  1,       1000, 0x00000020,  6,  3 },
> +	[6]  = { GENMASK_ULL(6, 0),  0,       1000, 0x00000020,  6,  3 },
> +	[7]  = { GENMASK_ULL(7, 0),  2,       1000, 0x00000100,  9,  3 },
> +	[8]  = { GENMASK_ULL(8, 0),  1,       1000, 0x00000100,  9,  3 },
> +	[9]  = { GENMASK_ULL(9, 0),  0,       1000, 0x00000100,  9,  3 },
> +	[10] = { GENMASK_ULL(10, 0), 2,      10000, 0x00000800, 12,  4 },
> +	[11] = { GENMASK_ULL(11, 0), 1,      10000, 0x00000800, 12,  4 },
> +	[12] = { GENMASK_ULL(12, 0), 0,      10000, 0x00000800, 12,  4 },
> +	[13] = { GENMASK_ULL(13, 0), 2,     100000, 0x00004000, 15,  5 },
> +	[14] = { GENMASK_ULL(14, 0), 1,     100000, 0x00004000, 15,  5 },
> +	[15] = { GENMASK_ULL(15, 0), 0,     100000, 0x00004000, 15,  5 },
> +	[16] = { GENMASK_ULL(16, 0), 2,    1000000, 0x00020000, 18,  6 },
> +	[17] = { GENMASK_ULL(17, 0), 1,    1000000, 0x00020000, 18,  6 },
> +	[18] = { GENMASK_ULL(18, 0), 0,    1000000, 0x00020000, 18,  6 },
> +	[19] = { GENMASK_ULL(19, 0), 2,   10000000, 0x00100000, 21,  7 },
> +	[20] = { GENMASK_ULL(20, 0), 1,   10000000, 0x00100000, 21,  7 },
> +	[21] = { GENMASK_ULL(21, 0), 0,   10000000, 0x00100000, 21,  7 },
> +	[22] = { GENMASK_ULL(22, 0), 2,  100000000, 0x00800000, 24,  8 },
> +	[23] = { GENMASK_ULL(23, 0), 1,  100000000, 0x00800000, 24,  8 },
> +	[24] = { GENMASK_ULL(24, 0), 0,  100000000, 0x00800000, 24,  8 },
> +	[25] = { GENMASK_ULL(25, 0), 2, 1000000000, 0x04000000, 27,  9 },
> +	[26] = { GENMASK_ULL(26, 0), 1, 1000000000, 0x04000000, 27,  9 },
> +	[27] = { GENMASK_ULL(27, 0), 0, 1000000000, 0x04000000, 27,  9 }
> +};
> +
> +static void print_event_value(struct seq_file *m, int binary_bits, u64 val)
> +{
> +	struct fixed_params *fp = &fixed_params[binary_bits];
> +	unsigned long long frac;
> +	char buf[10];
> +
> +	frac = val & fp->mask;
> +	frac <<= fp->lshift;
> +	frac *= fp->pow10;
> +	frac += fp->round;
> +	frac >>= fp->rshift;
> +

Could you please document this algorithm? I wonder why lshift is necessary at all
and why rshift cannot just always be the fraction bits? Also note earlier question about
choice of decimal places.

> +	sprintf(buf, "%0*llu", fp->decplaces, frac);

I'm a bit confused here. I see fp->decplaces as the field width and the "0" indicates
that the value is zero padded on the _left_. I interpret this to mean that, for example,
if the value of frac is 42 then it will be printed as "0042". The fraction's value is modified
(it is printed as "0.0042") and there are no trailing zeroes to remove. What am I missing?

> +
> +	/* Trim trailing zeroes */
> +	for (int i = fp->decplaces - 1; i > 0; i--) {
> +		if (buf[i] != '0')
> +			break;
> +		buf[i] = '\0';
> +	}
> +	seq_printf(m, "%llu.%s\n", val >> binary_bits, buf);
> +}
> +
>  int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  {
>  	struct kernfs_open_file *of = m->private;
> @@ -657,8 +728,10 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  		seq_puts(m, "Error\n");
>  	else if (rr.err == -EINVAL)
>  		seq_puts(m, "Unavailable\n");
> -	else
> +	else if (evt->binary_bits == 0)
>  		seq_printf(m, "%llu\n", rr.val);
> +	else
> +		print_event_value(m, evt->binary_bits, rr.val);
>  
>  out:
>  	rdtgroup_kn_unlock(of->kn);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index e6e3be990638..f554d7933739 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -878,9 +878,9 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
>  	},
>  };
>  
> -void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu)
> +void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu, u32 binary_bits)
>  {
> -	if (WARN_ON_ONCE(evtid >= QOS_NUM_EVENTS))
> +	if (WARN_ON_ONCE(evtid >= QOS_NUM_EVENTS) || binary_bits > MAX_BINARY_BITS)
>  		return;
>  	if (mon_event_all[evtid].enabled) {
>  		pr_warn("Duplicate enable for event %d\n", evtid);
> @@ -888,6 +888,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id evtid, bool any_cpu)
>  	}
>  
>  	mon_event_all[evtid].any_cpu = any_cpu;
> +	mon_event_all[evtid].binary_bits = binary_bits;
>  	mon_event_all[evtid].enabled = true;
>  }
>  

Reinette

Re: [PATCH v5 14/29] x86,fs/resctrl: Support binary fixed point event counters

Posted by Luck, Tony 8 months, 1 week ago

On Tue, Jun 03, 2025 at 08:49:08PM -0700, Reinette Chatre wrote:
> > +	sprintf(buf, "%0*llu", fp->decplaces, frac);
> 
> I'm a bit confused here. I see fp->decplaces as the field width and the "0" indicates
> that the value is zero padded on the _left_. I interpret this to mean that, for example,
> if the value of frac is 42 then it will be printed as "0042". The fraction's value is modified
> (it is printed as "0.0042") and there are no trailing zeroes to remove. What am I missing?

An example may help. Suppose architecture is providing 18 binary place
numbers, and delivers the value 0x60000 to be displayed. With 18 binary
places filesystem chooses 6 decimal places (I'll document the rationale
for this choice in comments in next version). In binary the value looks
like this:

	integer	binary_places
	1	100000000000000000

Running through my algorithm will end with "frac" = 500000 (decimal).

Thus there are *trailing* zeroes. The value should be displayed as
"1.5" not as "1.500000".

-Tony

Re: [PATCH v5 14/29] x86,fs/resctrl: Support binary fixed point event counters

Posted by Reinette Chatre 8 months, 1 week ago

Hi Tony,

On 6/6/25 9:25 AM, Luck, Tony wrote:
> On Tue, Jun 03, 2025 at 08:49:08PM -0700, Reinette Chatre wrote:
>>> +	sprintf(buf, "%0*llu", fp->decplaces, frac);
>>
>> I'm a bit confused here. I see fp->decplaces as the field width and the "0" indicates
>> that the value is zero padded on the _left_. I interpret this to mean that, for example,
>> if the value of frac is 42 then it will be printed as "0042". The fraction's value is modified
>> (it is printed as "0.0042") and there are no trailing zeroes to remove. What am I missing?
> 
> An example may help. Suppose architecture is providing 18 binary place
> numbers, and delivers the value 0x60000 to be displayed. With 18 binary
> places filesystem chooses 6 decimal places (I'll document the rationale
> for this choice in comments in next version). In binary the value looks
> like this:
> 
> 	integer	binary_places
> 	1	100000000000000000
> 
> Running through my algorithm will end with "frac" = 500000 (decimal).
> 
> Thus there are *trailing* zeroes. The value should be displayed as
> "1.5" not as "1.500000".

Instead of a counter example, could you please make it obvious through
the algorithm description and/or explanation of decimal place choice how
"frac" is guaranteed to never be smaller than "decplaces"?

Reinette

Re: [PATCH v5 14/29] x86,fs/resctrl: Support binary fixed point event counters

Posted by Dave Martin 8 months ago

On Fri, Jun 06, 2025 at 09:56:25AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/6/25 9:25 AM, Luck, Tony wrote:
> > On Tue, Jun 03, 2025 at 08:49:08PM -0700, Reinette Chatre wrote:
> >>> +	sprintf(buf, "%0*llu", fp->decplaces, frac);
> >>
> >> I'm a bit confused here. I see fp->decplaces as the field width and the "0" indicates
> >> that the value is zero padded on the _left_. I interpret this to mean that, for example,
> >> if the value of frac is 42 then it will be printed as "0042". The fraction's value is modified
> >> (it is printed as "0.0042") and there are no trailing zeroes to remove. What am I missing?
> > 
> > An example may help. Suppose architecture is providing 18 binary place
> > numbers, and delivers the value 0x60000 to be displayed. With 18 binary
> > places filesystem chooses 6 decimal places (I'll document the rationale
> > for this choice in comments in next version). In binary the value looks
> > like this:
> > 
> > 	integer	binary_places
> > 	1	100000000000000000
> > 
> > Running through my algorithm will end with "frac" = 500000 (decimal).
> > 
> > Thus there are *trailing* zeroes. The value should be displayed as
> > "1.5" not as "1.500000".
> 
> Instead of a counter example, could you please make it obvious through
> the algorithm description and/or explanation of decimal place choice how
> "frac" is guaranteed to never be smaller than "decplaces"?
> 
> Reinette

Trying to circumvent this...

Why do these conversions need to be done in the kernel at all?

Can't we just tell userspace the scaling factor and expose the
parameter as an integer?

In your example, this above value would be exposed as

	0b110_0000_0000_0000_0000 / 0b100_0000_0000_0000_0000

(=	0x60000 / 0x40000)

This has the advantage that the data exchanged with userspace is exact,
(so far as the hardware permits, anyway) and there is no unnecessary
cost or complexity in the kernel.

Since userspace is probably some kind of scripting language, it can do
scaling conversions and pretty-print tables more straightforwardly
than the kernel can -- if it wants to.  But it can also work in the
native representation with no introduction of rounding errors, and do
conversions only when necessary rather than every time a value crosses
the user/kernel boundary...

Cheers
---Dave

RE: [PATCH v5 14/29] x86,fs/resctrl: Support binary fixed point event counters

Posted by Luck, Tony 8 months ago

> Trying to circumvent this...
>
> Why do these conversions need to be done in the kernel at all?
>
> Can't we just tell userspace the scaling factor and expose the
> parameter as an integer?
>
> In your example, this above value would be exposed as
>
>       0b110_0000_0000_0000_0000 / 0b100_0000_0000_0000_0000
>
> (=    0x60000 / 0x40000)
>
> This has the advantage that the data exchanged with userspace is exact,
> (so far as the hardware permits, anyway) and there is no unnecessary
> cost or complexity in the kernel.
>
> Since userspace is probably some kind of scripting language, it can do
> scaling conversions and pretty-print tables more straightforwardly
> than the kernel can -- if it wants to.  But it can also work in the
> native representation with no introduction of rounding errors, and do
> conversions only when necessary rather than every time a value crosses
> the user/kernel boundary...

It seems user hostile to print 8974832975 with some info file to explain that
the scaling factor is 262144. While it may be common to read using some
special tool, it make life harder for casual scripts.

Printing that value as 34236.270809 makes it simple for all tools.

The rounding error from the kernel is insignificant ("true" value is
34236.270809173583984375 ... so the error is around five parts
per trillion).

Things are worse sampling the Joule values once per-second to convert
to Watts. But even there the rounding errors from a 1-Watt workload
are tiny. Worst case you see 0.999999 followed by 2.000001 one second
later and report as 1.000002 Watts instead of 1.0

The error bars on the values computed by hardware are enormously
larger than this. Further compounded by the telemetry update rate
of 2 millliseconds. Errors from uncertainty in when the value was
captured are also orders of magnitude higher than kernel rounding
errors.

-Tony

Re: [PATCH v5 14/29] x86,fs/resctrl: Support binary fixed point event counters

Posted by Dave Martin 8 months ago

Hi,

On Tue, Jun 10, 2025 at 03:54:35PM +0000, Luck, Tony wrote:
> > Trying to circumvent this...
> >
> > Why do these conversions need to be done in the kernel at all?
> >
> > Can't we just tell userspace the scaling factor and expose the
> > parameter as an integer?
> >
> > In your example, this above value would be exposed as
> >
> >       0b110_0000_0000_0000_0000 / 0b100_0000_0000_0000_0000
> >
> > (=    0x60000 / 0x40000)
> >
> > This has the advantage that the data exchanged with userspace is exact,
> > (so far as the hardware permits, anyway) and there is no unnecessary
> > cost or complexity in the kernel.
> >
> > Since userspace is probably some kind of scripting language, it can do
> > scaling conversions and pretty-print tables more straightforwardly
> > than the kernel can -- if it wants to.  But it can also work in the
> > native representation with no introduction of rounding errors, and do
> > conversions only when necessary rather than every time a value crosses
> > the user/kernel boundary...
> 
> It seems user hostile to print 8974832975 with some info file to explain that
> the scaling factor is 262144. While it may be common to read using some
> special tool, it make life harder for casual scripts.
> 
> Printing that value as 34236.270809 makes it simple for all tools.

The divisor is going to be a power of two or a power of ten in
practice, and I think most technical users are fairly used to looking
at values scaled by those -- so I'm not convinced that this is quite as
bad as you suggest.

The choice of unit in the interface is still arbitrary, and the kernel
is already inconsistent with itself in this area, so I think userspace
is often going to have to do some scaling conversions anyway.

resctrl is not (necessarily) a user interface, but I agree that it is
no bad thing for make the output eyeball-friendly, so long is the cost
of doing so is reasonable.

(Plenty of virtual "text" files exported by the kernel are extremely
cryptic and user-hostile, though.)

> The rounding error from the kernel is insignificant ("true" value is
> 34236.270809173583984375 ... so the error is around five parts
> per trillion).
> 
> Things are worse sampling the Joule values once per-second to convert
> to Watts. But even there the rounding errors from a 1-Watt workload
> are tiny. Worst case you see 0.999999 followed by 2.000001 one second
> later and report as 1.000002 Watts instead of 1.0
> 
> The error bars on the values computed by hardware are enormously
> larger than this. Further compounded by the telemetry update rate
> of 2 millliseconds. Errors from uncertainty in when the value was
> captured are also orders of magnitude higher than kernel rounding
> errors.
> 
> -Tony

If we can make the intermediate interface error-free by construction
and without making life especially hard for anyone, then that means we
can bolt whatever on at each end without having to even think about the
effect on accuracy.

I agree though that the inaccuracies introduced by the interface will
be very marginal, and likely swamped by hardware limitations and timing
skid.

Either way, it's not my call...

Cheers
---Dave