selftests/cgroup: better bound for cpu.max tests

[PATCH 0/2] selftests/cgroup: better bound for cpu.max tests

Posted by Shashank Balaji 3 months, 1 week ago

cpu.max selftests (both the normal one and the nested one) test the
working of throttling by setting up cpu.max, running a cpu hog process
for a specified duration, and comparing usage_usec as reported by
cpu.stat with the duration of the cpu hog: they should be far enough apart.

Currently, this is done by using values_close, which has two problems:

1. Semantic: values_close is used with an error percentage of 95%, which
   is not what one expects on seeing "values close". The intent is
   actually "values far".

2. Accuracy: the tests can pass even if usage_usec is up to around double
   the expected amount. That's too high of a margin for usage_usec.
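
To make the margin concrete, here is a minimal sketch of the current check. It assumes values_close() accepts two values when their difference is within err percent of their sum (written from memory of cgroup_util.h, not copied from it), a 1 second hog, and a 1/100 quota, so roughly 10000 usec of usage would be expected. The name current_check_passes is hypothetical.

```c
#include <stdlib.h>

/*
 * Local stand-in for the selftest helper (assumption, see above):
 * true when |a - b| is within err% of a + b.
 */
static int values_close(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}

/*
 * Mirrors the shape of the current check: the test fails when
 * usage_usec is "close" to the hog duration at 95%, and passes
 * otherwise.
 */
static int current_check_passes(long usage_usec, long duration_usec)
{
	return !values_close(usage_usec, duration_usec, 95);
}
```

Under these assumptions, current_check_passes(20000, 1000000) returns 1 even though 20000 usec is double the expected usage, while current_check_passes(30000, 1000000) returns 0.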

Overall, this patchset improves the readability and accuracy of the
cpu.max tests.

Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
Shashank Balaji (2):
      selftests/cgroup: rename `expected` to `duration` in cpu.max tests
      selftests/cgroup: better bound in cpu.max tests

 tools/testing/selftests/cgroup/test_cpu.c | 42 ++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 18 deletions(-)
---
base-commit: 66701750d5565c574af42bef0b789ce0203e3071
change-id: 20250227-kselftest-cgroup-fix-cpu-max-56619928e99b

Best regards,
-- 
Shashank Balaji <shashank.mahadasyam@sony.com>

Re: [PATCH 0/2] selftests/cgroup: better bound for cpu.max tests

Posted by Michal Koutný 3 months, 1 week ago

Hello Shashank.

On Tue, Jul 01, 2025 at 11:13:54PM +0900, Shashank Balaji <shashank.mahadasyam@sony.com> wrote:
> cpu.max selftests (both the normal one and the nested one) test the
> working of throttling by setting up cpu.max, running a cpu hog process
> for a specified duration, and comparing usage_usec as reported by
> cpu.stat with the duration of the cpu hog: they should be far enough apart.
> 
> Currently, this is done by using values_close, which has two problems:
> 
> 1. Semantic: values_close is used with an error percentage of 95%, which
>    is not what one expects on seeing "values close". The intent is
>    actually "values far".
> 
> 2. Accuracy: the tests can pass even if usage_usec is up to around double
>    the expected amount. That's too high of a margin for usage_usec.
> 
> Overall, this patchset improves the readability and accuracy of the
> cpu.max tests.
> 
> Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>

I think you're getting at an actual bug in the test definition. 

I think that the test_cpucg_max should either run hog_cpus_timed with
CPU_HOG_CLOCK_PROCESS instead of CPU_HOG_CLOCK_WALL to make sense or the
expected_usage_usec should be defined with the configured quota in mind
(i.e. 1/100).  (The latter seems to make the test more natural.)

With such defined metrics, the asserted expression could be
	values_close(usage_usec, expected_usage_usec, 10)
based on your numbers, error is around 20% so our helper's argument is
roughly half of that. (I'd be fine even with err=20 to prevent some
false positives.)
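
The arithmetic behind this suggestion can be sketched as follows. This is an illustration only: the values_close() body is an assumption (difference within err percent of the sum, written from memory of cgroup_util.h), expected_usage is a hypothetical helper name, and the 1 second duration is assumed; the 1/100 quota comes from the discussion above.

```c
#include <stdlib.h>

#define USEC_PER_SEC 1000000L

/*
 * Local stand-in for the selftest helper (assumption):
 * true when |a - b| is within err% of a + b.
 */
static int values_close(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}

/*
 * CPU time a hog is expected to accumulate when it runs for
 * duration_usec of wall time under a quota_usec/period_usec limit.
 */
static long expected_usage(long quota_usec, long period_usec,
			   long duration_usec)
{
	return duration_usec * quota_usec / period_usec;
}
```

With a quota of 1000 usec per 100000 usec period and a 1 second hog, expected_usage() gives 10000 usec. Because the helper compares the difference against the sum of both values, a usage of 12000 usec (20% over) still counts as close with err=10, which is why the helper's argument ends up at roughly half of the observed relative error; 13000 usec does not.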

I think those changes could even be in one patch but I leave that up to
you. My comment to your 2nd patch is that I'd like to stick to relative
errors and keep positive values_close() predicate that's used in other
selftests too. (But those 95% in the current code are clumsy given two
different quantities are compared.)

Thanks,
Michal

Re: [PATCH 0/2] selftests/cgroup: better bound for cpu.max tests

Posted by Shashank Balaji 3 months, 1 week ago

Hi Michal, 

Thanks for the reply!

On Wed, Jul 02, 2025 at 02:34:29PM +0200, Michal Koutný wrote:
> Hello Shashank.
> 
> On Tue, Jul 01, 2025 at 11:13:54PM +0900, Shashank Balaji <shashank.mahadasyam@sony.com> wrote:
> > cpu.max selftests (both the normal one and the nested one) test the
> > working of throttling by setting up cpu.max, running a cpu hog process
> > for a specified duration, and comparing usage_usec as reported by
> > cpu.stat with the duration of the cpu hog: they should be far enough apart.
> > 
> > Currently, this is done by using values_close, which has two problems:
> > 
> > 1. Semantic: values_close is used with an error percentage of 95%, which
> >    is not what one expects on seeing "values close". The intent is
> >    actually "values far".
> > 
> > 2. Accuracy: the tests can pass even if usage_usec is up to around double
> >    the expected amount. That's too high of a margin for usage_usec.
> > 
> > Overall, this patchset improves the readability and accuracy of the
> > cpu.max tests.
> > 
> > Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
> 
> I think you're getting at an actual bug in the test definition. 
> 
> I think that the test_cpucg_max should either run hog_cpus_timed with
> CPU_HOG_CLOCK_PROCESS instead of CPU_HOG_CLOCK_WALL to make sense or the
> expected_usage_usec should be defined with the configured quota in mind
> (i.e. 1/100).  (The latter seems to make the test more natural.)

Going with the more natural way of sticking to CPU_HOG_CLOCK_WALL, the
second patch does calculate expected_usage_usec based on the configured
quota, as the code comment explains. So I'm guessing we're on the same page
about this?

> With such defined metrics, the asserted expression could be
> 	values_close(usage_usec, expected_usage_usec, 10)
> based on your numbers, error is around 20% so our helper's argument is
> roughly half of that. (I'd be fine even with err=20 to prevent some
> false positives.)
> 
> I think those changes could even be in one patch but I leave that up to
> you. My comment to your 2nd patch is that I'd like to stick to relative
> errors and keep positive values_close() predicate that's used in other
> selftests too. (But those 95% in the current code are clumsy given two
> different quantities are compared.)

Do you mean something like

	if (values_close(usage_usec, expected_usage_usec, 10))
		goto cleanup;

using the positive values_close() predicate? If so, I'm not sure I
understand because if usage_usec and expected_usage_usec _are_ close,
then we want the test to pass! We should be using the negative
predicate.

And sure, I'll send v2 as a single patch.

Thanks

Shashank

Re: [PATCH 0/2] selftests/cgroup: better bound for cpu.max tests

Posted by Michal Koutný 3 months ago

On Thu, Jul 03, 2025 at 10:41:17AM +0900, Shashank Balaji <shashank.mahadasyam@sony.com> wrote:
> Going with the more natural way of sticking to CPU_HOG_CLOCK_WALL, the
> second patch does calculate expected_usage_usec based on the configured
> quota, as the code comment explains. So I'm guessing we're on the same page
> about this?

Yes, the expected_usage_usec in the 2nd patch is correct. (It'd be nicer
if the value were calculated from the configured cpu.max rather than typed
out as a literal that may diverge should the cpu.max be changed in the test.)
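
One way to keep the two in sync is to derive both the string written to cpu.max and the expected usage from the same constants. The following is a minimal sketch under that assumption; struct cpu_max_cfg and both function names are hypothetical and do not come from test_cpu.c.

```c
#include <stdio.h>

/* Hypothetical holder for the limit the test configures. */
struct cpu_max_cfg {
	long quota_usec;
	long period_usec;
};

/*
 * Formats the limit as "<quota> <period>", the form cpu.max accepts.
 * Returns what snprintf() returns.
 */
static int cpu_max_format(const struct cpu_max_cfg *cfg, char *buf,
			  size_t len)
{
	return snprintf(buf, len, "%ld %ld", cfg->quota_usec,
			cfg->period_usec);
}

/*
 * Expected CPU time for a hog running duration_usec of wall time
 * under the same limit, so the two cannot drift apart.
 */
static long cpu_max_expected_usage(const struct cpu_max_cfg *cfg,
				   long duration_usec)
{
	return duration_usec * cfg->quota_usec / cfg->period_usec;
}
```

A test written this way would pass the formatted buffer to its cpu.max write and compare usage_usec against cpu_max_expected_usage(), so changing the quota in one place updates both.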

> Do you mean something like
> 
> 	if (values_close(usage_usec, expected_usage_usec, 10))
> 		goto cleanup;
> 
> using the positive values_close() predicate? If so, I'm not sure I
> understand because if usage_usec and expected_usage_usec _are_ close,
> then we want the test to pass! We should be using the negative
> predicate.

I meant to use it the same way test_memcontrol.c does, i.e.
values_close() <=> pass.

So codewise it boils down to (a negation, so I see where the confusion
came from):
	if (!values_close(usage_usec, expected_usage_usec, 10))
		goto cleanup;
	ret = KSFT_PASS;


Michal