[v1] selftests/timers/posix_timers: reimplement check_timer_distribution()

[PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Oleg Nesterov 1 year, 10 months ago

Thomas says:

	The signal distribution test has a tendency to hang for a long
	time as the signal delivery is not really evenly distributed. In
	fact it might never be distributed across all threads ever in
	the way it is written.

To me even the

	This primarily tests that the kernel does not favour any one.

comment doesn't look right. The kernel does favour a thread which hits
the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.

The new version simply checks that the group leader sleeping in join()
never receives SIGALRM, cpu_timer_fire() should always send the signal
to the thread which burns cpu.

Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
to the current thread") the test-case fails immediately, the very 1st tick
wakes the leader up. Otherwise it quickly succeeds after 100 ticks.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 tools/testing/selftests/timers/posix_timers.c | 102 ++++++++----------
 1 file changed, 46 insertions(+), 56 deletions(-)

diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d49dd3ffd0d9..2586a6552737 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,70 @@ static int check_timer_create(int which)
 	return 0;
 }
 
-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;
 
-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
 {
-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
-	return NULL;
+	if (pthread_self() != ctd_thread)
+		ctd_failed = 1;
+	ctd_count--;
 }
 
-static void distribution_handler(int nr)
+static void *ctd_thread_func(void *arg)
 {
-	if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
-		__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
-}
-
-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
-{
-	int err, i;
-	timer_t id;
-	const int nthreads = 10;
-	pthread_t threads[nthreads];
 	struct itimerspec val = {
 		.it_value.tv_sec = 0,
 		.it_value.tv_nsec = 1000 * 1000,
 		.it_interval.tv_sec = 0,
 		.it_interval.tv_nsec = 1000 * 1000,
 	};
+	timer_t id;
 
-	remain = nthreads + 1;  /* worker threads + this thread */
-	signal(SIGALRM, distribution_handler);
-	err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
-	if (err < 0) {
-		ksft_perror("Can't create timer");
-		return -1;
-	}
-	err = timer_settime(id, 0, &val, NULL);
-	if (err < 0) {
-		ksft_perror("Can't set timer");
-		return -1;
-	}
+	/* 1/10 seconds to ensure the leader sleeps */
+	usleep(10000);
 
-	for (i = 0; i < nthreads; i++) {
-		err = pthread_create(&threads[i], NULL, distribution_thread,
-				     NULL);
-		if (err) {
-			ksft_print_msg("Can't create thread: %s (%d)\n",
-				       strerror(errno), errno);
-			return -1;
-		}
-	}
+	ctd_count = 100;
+	if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+		return "Can't create timer";
+	if (timer_settime(id, 0, &val, NULL))
+		return "Can't set timer";
 
-	/* Wait for all threads to receive the signal. */
-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+	while (ctd_count > 0 && !ctd_failed)
+		;
 
-	for (i = 0; i < nthreads; i++) {
-		err = pthread_join(threads[i], NULL);
-		if (err) {
-			ksft_print_msg("Can't join thread: %s (%d)\n",
-				       strerror(errno), errno);
-			return -1;
-		}
-	}
+	if (timer_delete(id))
+		return "Can't delete timer";
 
-	if (timer_delete(id)) {
-		ksft_perror("Can't delete timer");
-		return -1;
-	}
+	return NULL;
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+	const char *errmsg;
+
+	signal(SIGALRM, ctd_sighandler);
+
+	errmsg = "Can't create thread";
+	if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+		goto err;
+
+	errmsg = "Can't join thread";
+	if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+		goto err;
+
+	if (ctd_failed)
+		ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
+	else
+		ksft_test_result_pass("check signal distribution\n");
 
-	ksft_test_result_pass("check_timer_distribution\n");
 	return 0;
+err:
+	ksft_print_msg(errmsg);
+	return -1;
 }
 
 int main(int argc, char **argv)
-- 
2.25.1.362.g51ebf55

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Mark Brown 1 year, 10 months ago

On Sat, Apr 06, 2024 at 05:09:51PM +0200, Oleg Nesterov wrote:
> Thomas says:
> 
> 	The signal distribution test has a tendency to hang for a long
> 	time as the signal delivery is not really evenly distributed. In
> 	fact it might never be distributed across all threads ever in
> 	the way it is written.
> 
> To me even the
> 
> 	This primarily tests that the kernel does not favour any one.

Further to my previous mail it's also broken the arm64 selftest builds,
they use kselftest.h with nolibc in order to test low level
functionality mainly used by libc implementations and nolibc doesn't
implement uname():

In file included from za-fork.c:12:
../../kselftest.h:433:17: error: variable has incomplete type 'struct utsname'
        struct utsname info;
                       ^
../../kselftest.h:433:9: note: forward declaration of 'struct utsname'
        struct utsname info;
               ^
../../kselftest.h:435:6: error: call to undeclared function 'uname'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
        if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
            ^
../../kselftest.h:435:22: error: call to undeclared function 'sscanf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
        if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
                            ^
1 warning and 3 errors generated.

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Thomas Gleixner 1 year, 10 months ago

On Thu, Apr 11 2024 at 13:44, Mark Brown wrote:
> On Sat, Apr 06, 2024 at 05:09:51PM +0200, Oleg Nesterov wrote:
>> Thomas says:
>> 
>> 	The signal distribution test has a tendency to hang for a long
>> 	time as the signal delivery is not really evenly distributed. In
>> 	fact it might never be distributed across all threads ever in
>> 	the way it is written.
>> 
>> To me even the
>> 
>> 	This primarily tests that the kernel does not favour any one.
>
> Further to my previous mail it's also broken the arm64 selftest builds,
> they use kselftest.h with nolibc in order to test low level
> functionality mainly used by libc implementations and nolibc doesn't
> implement uname():
>
> In file included from za-fork.c:12:
> ../../kselftest.h:433:17: error: variable has incomplete type 'struct utsname'
>         struct utsname info;
>                        ^
> ../../kselftest.h:433:9: note: forward declaration of 'struct utsname'
>         struct utsname info;
>                ^
> ../../kselftest.h:435:6: error: call to undeclared function 'uname'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
>             ^
> ../../kselftest.h:435:22: error: call to undeclared function 'sscanf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)

Grrr. Let me stare at this.

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Oleg Nesterov 1 year, 10 months ago

On 04/11, Thomas Gleixner wrote:
>
> On Thu, Apr 11 2024 at 13:44, Mark Brown wrote:
> >
> > Further to my previous mail it's also broken the arm64 selftest builds,
> > they use kselftest.h with nolibc in order to test low level
> > functionality mainly used by libc implementations and nolibc doesn't
> > implement uname():
> >
> > In file included from za-fork.c:12:
> > ../../kselftest.h:433:17: error: variable has incomplete type 'struct utsname'
> >         struct utsname info;
> >                        ^
> > ../../kselftest.h:433:9: note: forward declaration of 'struct utsname'
> >         struct utsname info;
> >                ^
> > ../../kselftest.h:435:6: error: call to undeclared function 'uname'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
> >         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> >             ^
> > ../../kselftest.h:435:22: error: call to undeclared function 'sscanf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
> >         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
>
> Grrr. Let me stare at this.

Damn ;)

Can't we just turn ksft_min_kernel_version() into

	static inline int ksft_min_kernel_version(unsigned int min_major,
						  unsigned int min_minor)
	{
	#ifdef NOLIBC
		return -1;
	#else
		unsigned int major, minor;
		struct utsname info;

		if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
		       ksft_exit_fail_msg("Can't parse kernel version\n");

		return major > min_major || (major == min_major && minor >= min_minor);
	#endif
	}

?

Not sure what should check_timer_distribution() do in this case, to me
ksft_test_result_fail() is fine.

Oleg.

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Mark Brown 1 year, 10 months ago

On Thu, Apr 11, 2024 at 05:50:53PM +0200, Oleg Nesterov wrote:
> On 04/11, Thomas Gleixner wrote:

> > Grrr. Let me stare at this.

> Damn ;)

> Can't we just turn ksft_min_kernel_version() into

> 	static inline int ksft_min_kernel_version(unsigned int min_major,
> 						  unsigned int min_minor)
> 	{
> 	#ifdef NOLIBC
> 		return -1;
> 	#else

That'd probably work well enough here.  I think it's reasonable for
someone who wants to build a test that uses ksft_min_kernel_version()
with nolibc to figure out how to implement it, right now it's not
actually getting used with nolibc and just happens to be seen due to
being in the same header.

> Not sure what should check_timer_distribution() do in this case, to me
> ksft_test_result_fail() is fine.

I'd go with skip but yeah.

[PATCH] selftests: fix build failure with NOLIBC

Posted by Oleg Nesterov 1 year, 10 months ago

As Mark explains ksft_min_kernel_version() can't be compiled with nolibc,
it doesn't implement uname().

Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/f0523b3a-ea08-4615-b0fb-5b504a2d39df@sirena.org.uk/
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 tools/testing/selftests/kselftest.h           | 6 ++++++
 tools/testing/selftests/timers/posix_timers.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 973b18e156b2..0d9ed3255f5e 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -392,6 +392,11 @@ static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
 static inline int ksft_min_kernel_version(unsigned int min_major,
 					  unsigned int min_minor)
 {
+#ifdef NOLIBC
+	ksft_print_msg("NOLIBC: Can't check kernel version: "
+			"Function not implemented\n");
+	return -1;
+#else
 	unsigned int major, minor;
 	struct utsname info;
 
@@ -399,6 +404,7 @@ static inline int ksft_min_kernel_version(unsigned int min_major,
 		ksft_exit_fail_msg("Can't parse kernel version\n");
 
 	return major > min_major || (major == min_major && minor >= min_minor);
+#endif
 }
 
 #endif /* __KSELFTEST_H */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d86a0e00711e..878496d2a656 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -241,7 +241,7 @@ static int check_timer_distribution(void)
 
 	if (!ctd_failed)
 		ksft_test_result_pass("check signal distribution\n");
-	else if (ksft_min_kernel_version(6, 3))
+	else if (ksft_min_kernel_version(6, 3) > 0)
 		ksft_test_result_fail("check signal distribution\n");
 	else
 		ksft_test_result_skip("check signal distribution (old kernel)\n");
-- 
2.25.1.362.g51ebf55

Re: [PATCH] selftests: fix build failure with NOLIBC

Posted by Mark Brown 1 year, 9 months ago

On Fri, Apr 12, 2024 at 02:35:36PM +0200, Oleg Nesterov wrote:
> As Mark explains ksft_min_kernel_version() can't be compiled with nolibc,
> it doesn't implement uname().
> 
> Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
> Reported-by: Mark Brown <broonie@kernel.org>
> Closes: https://lore.kernel.org/all/f0523b3a-ea08-4615-b0fb-5b504a2d39df@sirena.org.uk/
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Makes sense to me given that there's not likely to be any immediate
users.

Reviewed-by: Mark Brown <broonie@kernel.org>

[tip: timers/urgent] selftests: kselftest: Fix build failure with NOLIBC

Posted by tip-bot2 for Oleg Nesterov 1 year, 10 months ago

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     16767502aa990cca2cb7d1372b31d328c4c85b40
Gitweb:        https://git.kernel.org/tip/16767502aa990cca2cb7d1372b31d328c4c85b40
Author:        Oleg Nesterov <oleg@redhat.com>
AuthorDate:    Fri, 12 Apr 2024 14:35:36 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 12 Apr 2024 16:55:00 +02:00

selftests: kselftest: Fix build failure with NOLIBC

As Mark explains ksft_min_kernel_version() can't be compiled with nolibc,
it doesn't implement uname().

Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240412123536.GA32444@redhat.com
Closes: https://lore.kernel.org/all/f0523b3a-ea08-4615-b0fb-5b504a2d39df@sirena.org.uk/
---
 tools/testing/selftests/kselftest.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 0591974..7b89362 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -395,6 +395,10 @@ static inline __noreturn __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
 static inline int ksft_min_kernel_version(unsigned int min_major,
 					  unsigned int min_minor)
 {
+#ifdef NOLIBC
+	ksft_print_msg("NOLIBC: Can't check kernel version: Function not implemented\n");
+	return 0;
+#else
 	unsigned int major, minor;
 	struct utsname info;
 
@@ -402,6 +406,7 @@ static inline int ksft_min_kernel_version(unsigned int min_major,
 		ksft_exit_fail_msg("Can't parse kernel version\n");
 
 	return major > min_major || (major == min_major && minor >= min_minor);
+#endif
 }
 
 #endif /* __KSELFTEST_H */

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Mark Brown 1 year, 10 months ago

On Sat, Apr 06, 2024 at 05:09:51PM +0200, Oleg Nesterov wrote:
> Thomas says:
> 
> 	The signal distribution test has a tendency to hang for a long
> 	time as the signal delivery is not really evenly distributed. In
> 	fact it might never be distributed across all threads ever in
> 	the way it is written.
> 
> To me even the
> 
> 	This primarily tests that the kernel does not favour any one.
> 
> comment doesn't look right. The kernel does favour a thread which hits
> the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.
> 
> The new version simply checks that the group leader sleeping in join()
> never receives SIGALRM, cpu_timer_fire() should always send the signal
> to the thread which burns cpu.
> 
> Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
> to the current thread") the test-case fails immediately, the very 1st tick
> wakes the leader up. Otherwise it quickly succeeds after 100 ticks.

This has landed in -next and is causing warning spam throughout
kselftest when built with clang:

/home/broonie/git/bisect/tools/testing/selftests/kselftest.h:435:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
        if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
            ^~~~~~~~~~~~
/home/broonie/git/bisect/tools/testing/selftests/kselftest.h:438:9: note: uninitialized use occurs here
        return major > min_major || (major == min_major && minor >= min_minor);
               ^~~~~
/home/broonie/git/bisect/tools/testing/selftests/kselftest.h:435:6: note: remove the '||' if its condition is always false
        if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
            ^~~~~~~~~~~~~~~
/home/broonie/git/bisect/tools/testing/selftests/kselftest.h:432:20: note: initialize the variable 'major' to silence this warning
        unsigned int major, minor;
                          ^
                           = 0

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by John Stultz 1 year, 10 months ago

On Thu, Apr 11, 2024 at 5:42 AM Mark Brown <broonie@kernel.org> wrote:
> On Sat, Apr 06, 2024 at 05:09:51PM +0200, Oleg Nesterov wrote:
> > Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
> > to the current thread") the test-case fails immediately, the very 1st tick
> > wakes the leader up. Otherwise it quickly succeeds after 100 ticks.
>
> This has landed in -next and is causing warning spam throughout
> kselftest when built with clang:
>
> /home/broonie/git/bisect/tools/testing/selftests/kselftest.h:435:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
>         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
>             ^~~~~~~~~~~~
> /home/broonie/git/bisect/tools/testing/selftests/kselftest.h:438:9: note: uninitialized use occurs here
>         return major > min_major || (major == min_major && minor >= min_minor);
>                ^~~~~
> /home/broonie/git/bisect/tools/testing/selftests/kselftest.h:435:6: note: remove the '||' if its condition is always false
>         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
>             ^~~~~~~~~~~~~~~
> /home/broonie/git/bisect/tools/testing/selftests/kselftest.h:432:20: note: initialize the variable 'major' to silence this warning
>         unsigned int major, minor;
>                           ^
>                            = 0

I hit this one too yesterday and included a fix for it here:
  https://lore.kernel.org/lkml/20240410232637.4135564-2-jstultz@google.com/

thanks
-john

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Oleg Nesterov 1 year, 10 months ago

Dmitry, Thomas,

To simplify the review I've attached the code with this patch applied below.

Yes, this changes the "semantics" of check_timer_distribution(), perhaps it
should be renamed.

But I do not see a better approach, and in fact I think that

	Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID

is the wrong goal.

Do you agree?

Oleg.
-------------------------------------------------------------------------------

static pthread_t ctd_thread;
static volatile int ctd_count, ctd_failed;

static void ctd_sighandler(int sig)
{
	if (pthread_self() != ctd_thread)
		ctd_failed = 1;
	ctd_count--;
}

static void *ctd_thread_func(void *arg)
{
	struct itimerspec val = {
		.it_value.tv_sec = 0,
		.it_value.tv_nsec = 1000 * 1000,
		.it_interval.tv_sec = 0,
		.it_interval.tv_nsec = 1000 * 1000,
	};
	timer_t id;

	/* 1/10 seconds to ensure the leader sleeps */
	usleep(10000);

	ctd_count = 100;
	if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
		return "Can't create timer";
	if (timer_settime(id, 0, &val, NULL))
		return "Can't set timer";

	while (ctd_count > 0 && !ctd_failed)
		;

	if (timer_delete(id))
		return "Can't delete timer";

	return NULL;
}

/*
 * Test that only the running thread receives the timer signal.
 */
static int check_timer_distribution(void)
{
	const char *errmsg;

	signal(SIGALRM, ctd_sighandler);

	errmsg = "Can't create thread";
	if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
		goto err;

	errmsg = "Can't join thread";
	if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
		goto err;

	if (ctd_failed)
		ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
	else
		ksft_test_result_pass("check signal distribution\n");

	return 0;
err:
	ksft_print_msg(errmsg);
	return -1;
}

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Dmitry Vyukov 1 year, 10 months ago

On Sat, 6 Apr 2024 at 17:12, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Dmitry, Thomas,
>
> To simplify the review I've attached the code with this patch applied below.
>
> Yes, this changes the "semantics" of check_timer_distribution(), perhaps it
> should be renamed.
>
> But I do not see a better approach, and in fact I think that
>
>         Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
>
> is the wrong goal.
>
> Do you agree?
>
> Oleg.
> -------------------------------------------------------------------------------
>
> static pthread_t ctd_thread;
> static volatile int ctd_count, ctd_failed;
>
> static void ctd_sighandler(int sig)
> {
>         if (pthread_self() != ctd_thread)
>                 ctd_failed = 1;
>         ctd_count--;
> }
>
> static void *ctd_thread_func(void *arg)
> {
>         struct itimerspec val = {
>                 .it_value.tv_sec = 0,
>                 .it_value.tv_nsec = 1000 * 1000,
>                 .it_interval.tv_sec = 0,
>                 .it_interval.tv_nsec = 1000 * 1000,
>         };
>         timer_t id;
>
>         /* 1/10 seconds to ensure the leader sleeps */
>         usleep(10000);
>
>         ctd_count = 100;
>         if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
>                 return "Can't create timer";
>         if (timer_settime(id, 0, &val, NULL))
>                 return "Can't set timer";
>
>         while (ctd_count > 0 && !ctd_failed)
>                 ;
>
>         if (timer_delete(id))
>                 return "Can't delete timer";
>
>         return NULL;
> }
>
> /*
>  * Test that only the running thread receives the timer signal.
>  */
> static int check_timer_distribution(void)
> {
>         const char *errmsg;
>
>         signal(SIGALRM, ctd_sighandler);
>
>         errmsg = "Can't create thread";
>         if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
>                 goto err;
>
>         errmsg = "Can't join thread";
>         if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
>                 goto err;
>
>         if (ctd_failed)
>                 ksft_test_result_skip("No signal distribution. Assuming old kernel\n");

Shouldn't the test fail here? The goal of a test is to fail when
things don't work.
I don't see any other ksft_test_result_fail() calls, and it does not
look that the test will hang on incorrect distribution.


>         else
>                 ksft_test_result_pass("check signal distribution\n");
>
>         return 0;
> err:
>         ksft_print_msg(errmsg);
>         return -1;
> }
>

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Oleg Nesterov 1 year, 10 months ago

On 04/08, Dmitry Vyukov wrote:
>
> >
> >         if (ctd_failed)
> >                 ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
>
> Shouldn't the test fail here? The goal of a test is to fail when
> things don't work.

I've copied this from the previous patch from Thomas, I am fine
either way.

> I don't see any other ksft_test_result_fail() calls, and it does not
> look that the test will hang on incorrect distribution.

Yes, it should never hang.

Thanks,

Oleg.

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Oleg Nesterov 1 year, 10 months ago

On 04/08, Oleg Nesterov wrote:
>
> On 04/08, Dmitry Vyukov wrote:
> >
> > >
> > >         if (ctd_failed)
> > >                 ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
> >
> > Shouldn't the test fail here? The goal of a test is to fail when
> > things don't work.
>
> I've copied this from the previous patch from Thomas, I am fine
> either way.
>
> > I don't see any other ksft_test_result_fail() calls, and it does not
> > look that the test will hang on incorrect distribution.
>
> Yes, it should never hang.

Forgot to say...

To me this test should simply do

	ksft_test_result(!ctd_failed, "check signal distribution\n");
	return 0;

but I am not familiar with tools/testing/selftests/ and I am not sure
I understand the last email from Thomas.

I agree with whatever you and Thomas decide.

Oleg.

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Thomas Gleixner 1 year, 10 months ago

On Mon, Apr 08 2024 at 20:49, Oleg Nesterov wrote:
> To me this test should simply do
>
> 	ksft_test_result(!ctd_failed, "check signal distribution\n");
> 	return 0;

Right.

> but I am not familiar with tools/testing/selftests/ and I am not sure
> I understand the last email from Thomas.

The discussion started about running new tests on older kernels. As this
is a feature and not a bug fix that obviously fails on older kernels.

So something like the uncompiled below should work.

Thanks,

        tglx
---
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,83 @@ static int check_timer_create(int which)
 	return 0;
 }
 
-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;
 
-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
 {
-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
-	return NULL;
-}
-
-static void distribution_handler(int nr)
-{
-	if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
-		__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
+	if (pthread_self() != ctd_thread)
+		ctd_failed = 1;
+	ctd_count--;
 }
 
-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
+static void *ctd_thread_func(void *arg)
 {
-	int err, i;
-	timer_t id;
-	const int nthreads = 10;
-	pthread_t threads[nthreads];
 	struct itimerspec val = {
 		.it_value.tv_sec = 0,
 		.it_value.tv_nsec = 1000 * 1000,
 		.it_interval.tv_sec = 0,
 		.it_interval.tv_nsec = 1000 * 1000,
 	};
+	timer_t id;
 
-	remain = nthreads + 1;  /* worker threads + this thread */
-	signal(SIGALRM, distribution_handler);
-	err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
-	if (err < 0) {
-		ksft_perror("Can't create timer");
-		return -1;
-	}
-	err = timer_settime(id, 0, &val, NULL);
-	if (err < 0) {
-		ksft_perror("Can't set timer");
-		return -1;
-	}
+	/* 1/10 seconds to ensure the leader sleeps */
+	usleep(10000);
 
-	for (i = 0; i < nthreads; i++) {
-		err = pthread_create(&threads[i], NULL, distribution_thread,
-				     NULL);
-		if (err) {
-			ksft_print_msg("Can't create thread: %s (%d)\n",
-				       strerror(errno), errno);
-			return -1;
-		}
-	}
+	ctd_count = 100;
+	if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+		return "Can't create timer";
+	if (timer_settime(id, 0, &val, NULL))
+		return "Can't set timer";
 
-	/* Wait for all threads to receive the signal. */
-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+	while (ctd_count > 0 && !ctd_failed)
+		;
 
-	for (i = 0; i < nthreads; i++) {
-		err = pthread_join(threads[i], NULL);
-		if (err) {
-			ksft_print_msg("Can't join thread: %s (%d)\n",
-				       strerror(errno), errno);
-			return -1;
-		}
-	}
+	if (timer_delete(id))
+		return "Can't delete timer";
 
-	if (timer_delete(id)) {
-		ksft_perror("Can't delete timer");
+	return NULL;
+}
+
+static bool check_kernel_version(unsigned int min_major, unsigned int min_minor)
+{
+	unsigned int major, minor;
+	struct utsname info;
+
+	uname(&info);
+	if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
+		ksft_exit_fail();
+	return major > min_major || (major == min_major && minor >= min_minor);
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+	const char *errmsg;
+
+	if (!check_kernel_version(6, 3)) {
+		ksft_test_result_skip("check signal distribution (old kernel)\n");
 		return 0;
 	}
 
-	ksft_test_result_pass("check_timer_distribution\n");
+	signal(SIGALRM, ctd_sighandler);
+
+	errmsg = "Can't create thread";
+	if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+		goto err;
+
+	errmsg = "Can't join thread";
+	if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+		goto err;
+
+	ksft_test_result(!ctd_failed, "check signal distribution\n");
 	return 0;
+
+err:
+	ksft_print_msg(errmsg);
+	return -1;
 }
 
 int main(int argc, char **argv)

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Oleg Nesterov 1 year, 10 months ago

On 04/09, Thomas Gleixner wrote:
>
> The discussion started about running new tests on older kernels. As this
> is a feature and not a bug fix that obviously fails on older kernels.

OK, I see... please see below.

> So something like the uncompiled below should work.

Hmm... this patch doesn't apply to Linus's tree...

It seems that this is because in your tree check_timer_distribution() does

	if (timer_delete(id)) {
		ksft_perror("Can't delete timer");
		return 0;
	}

while in Linus's tree it returns -1 if timer_delete() fails. Nevermind.

Thomas, I am almost shy to continue this discussion and waste your time ;)
But ...

> +static bool check_kernel_version(unsigned int min_major, unsigned int min_minor)
> +{
> +	unsigned int major, minor;
> +	struct utsname info;
> +
> +	uname(&info);
> +	if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> +		ksft_exit_fail();
> +	return major > min_major || (major == min_major && minor >= min_minor);
> +}

this looks useful regardless. Perhaps it should be moved into
tools/testing/selftests/kselftest.h as ksft_ck_kernel_version() ?

> +static int check_timer_distribution(void)
> +{
> +	const char *errmsg;
> +
> +	if (!check_kernel_version(6, 3)) {
> +		ksft_test_result_skip("check signal distribution (old kernel)\n");
>  		return 0;

...

> +	ksft_test_result(!ctd_failed, "check signal distribution\n");

Perhaps

	if (!ctd_failed)
		ksft_test_result_pass("check signal distribution\n");
	else if (check_kernel_version(6, 3))
		ksft_test_result_fail("check signal distribution\n");
	else
		ksft_test_result_skip("check signal distribution (old kernel)\n");

makes more sense?

This way it can be used on the older kernels with bcb7ee79029d backported.

Oleg.

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Thomas Gleixner 1 year, 10 months ago

On Tue, Apr 09 2024 at 13:10, Oleg Nesterov wrote:
> On 04/09, Thomas Gleixner wrote:
> It seems that this is because in your tree check_timer_distribution() does
>
> 	if (timer_delete(id)) {
> 		ksft_perror("Can't delete timer");
> 		return 0;
> 	}
>
> while in Linus's tree it returns -1 if timer_delete()
> fails. Nevermind.

Ooops.

>> +static bool check_kernel_version(unsigned int min_major, unsigned int min_minor)
>> +{
>> +	unsigned int major, minor;
>> +	struct utsname info;
>> +
>> +	uname(&info);
>> +	if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
>> +		ksft_exit_fail();
>> +	return major > min_major || (major == min_major && minor >= min_minor);
>> +}
>
> this looks useful regardless. Perhaps it should be moved into
> tools/testing/selftests/kselftest.h as ksft_ck_kernel_version() ?

Makes sense.

>> +static int check_timer_distribution(void)
>> +{
>> +	const char *errmsg;
>> +
>> +	if (!check_kernel_version(6, 3)) {
>> +		ksft_test_result_skip("check signal distribution (old kernel)\n");
>>  		return 0;
>
> ..
>
>> +	ksft_test_result(!ctd_failed, "check signal distribution\n");
>
> Perhaps
>
> 	if (!ctd_failed)
> 		ksft_test_result_pass("check signal distribution\n");
> 	else if (check_kernel_version(6, 3))
> 		ksft_test_result_fail("check signal distribution\n");
> 	else
> 		ksft_test_result_skip("check signal distribution (old kernel)\n");
>
> makes more sense?
>
> This way it can be used on the older kernels with bcb7ee79029d backported.

Indeed.

[PATCH v2] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Oleg Nesterov 1 year, 10 months ago

Thomas says:

	The signal distribution test has a tendency to hang for a long
	time as the signal delivery is not really evenly distributed. In
	fact it might never be distributed across all threads ever in
	the way it is written.

To me even the

	This primarily tests that the kernel does not favour any one.

comment doesn't look right. The kernel does favour a thread which hits
the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.

The new version simply checks that the group leader sleeping in join()
never receives SIGALRM, cpu_timer_fire() should always send the signal
to the thread which burns cpu.

Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
to the current thread") the test-case fails immediately, the very 1st tick
wakes the leader up. Otherwise it quickly succeeds after 100 ticks.

As Thomas suggested, the new version doesn't report the failure on the
pre v6.3 kernels that do not have the commit bcb7ee79029d; this is a
feature that obviously fails on the older kernels. So the patch adds the
new simple ksft_ck_kernel_version() helper and uses ksft_test_result_skip()
if check_timer_distribution() fails on the older kernel.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 tools/testing/selftests/kselftest.h           |  14 +++
 tools/testing/selftests/timers/posix_timers.c | 103 ++++++++----------
 2 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 541bf192e30e..6aab3309c6a3 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -51,6 +51,7 @@
 #include <stdarg.h>
 #include <string.h>
 #include <stdio.h>
+#include <sys/utsname.h>
 #endif
 
 #ifndef ARRAY_SIZE
@@ -388,4 +389,17 @@ static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
 	exit(KSFT_SKIP);
 }
 
+static inline int ksft_ck_kernel_version(unsigned int min_major,
+					 unsigned int min_minor)
+{
+	struct utsname info;
+	unsigned int major, minor;
+
+	uname(&info);
+	if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
+		ksft_exit_fail();
+
+	return major > min_major || (major == min_major && minor >= min_minor);
+}
+
 #endif /* __KSELFTEST_H */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d49dd3ffd0d9..64c41463b704 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,71 @@ static int check_timer_create(int which)
 	return 0;
 }
 
-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;
 
-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
 {
-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
-	return NULL;
+	if (pthread_self() != ctd_thread)
+		ctd_failed = 1;
+	ctd_count--;
 }
 
-static void distribution_handler(int nr)
+static void *ctd_thread_func(void *arg)
 {
-	if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
-		__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
-}
-
-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
-{
-	int err, i;
-	timer_t id;
-	const int nthreads = 10;
-	pthread_t threads[nthreads];
 	struct itimerspec val = {
 		.it_value.tv_sec = 0,
 		.it_value.tv_nsec = 1000 * 1000,
 		.it_interval.tv_sec = 0,
 		.it_interval.tv_nsec = 1000 * 1000,
 	};
+	timer_t id;
 
-	remain = nthreads + 1;  /* worker threads + this thread */
-	signal(SIGALRM, distribution_handler);
-	err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
-	if (err < 0) {
-		ksft_perror("Can't create timer");
-		return -1;
-	}
-	err = timer_settime(id, 0, &val, NULL);
-	if (err < 0) {
-		ksft_perror("Can't set timer");
-		return -1;
-	}
+	/* 1/10 seconds to ensure the leader sleeps */
+	usleep(10000);
 
-	for (i = 0; i < nthreads; i++) {
-		err = pthread_create(&threads[i], NULL, distribution_thread,
-				     NULL);
-		if (err) {
-			ksft_print_msg("Can't create thread: %s (%d)\n",
-				       strerror(errno), errno);
-			return -1;
-		}
-	}
+	ctd_count = 100;
+	if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+		return "Can't create timer";
+	if (timer_settime(id, 0, &val, NULL))
+		return "Can't set timer";
 
-	/* Wait for all threads to receive the signal. */
-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+	while (ctd_count > 0 && !ctd_failed)
+		;
 
-	for (i = 0; i < nthreads; i++) {
-		err = pthread_join(threads[i], NULL);
-		if (err) {
-			ksft_print_msg("Can't join thread: %s (%d)\n",
-				       strerror(errno), errno);
-			return -1;
-		}
-	}
+	if (timer_delete(id))
+		return "Can't delete timer";
 
-	if (timer_delete(id)) {
-		ksft_perror("Can't delete timer");
-		return -1;
-	}
+	return NULL;
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+	const char *errmsg;
 
-	ksft_test_result_pass("check_timer_distribution\n");
+	signal(SIGALRM, ctd_sighandler);
+
+	errmsg = "Can't create thread";
+	if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+		goto err;
+
+	errmsg = "Can't join thread";
+	if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+		goto err;
+
+	if (!ctd_failed)
+		ksft_test_result_pass("check signal distribution\n");
+	else if (ksft_ck_kernel_version(6, 3))
+		ksft_test_result_fail("check signal distribution\n");
+	else
+		ksft_test_result_skip("check signal distribution (old kernel)\n");
 	return 0;
+err:
+	ksft_print_msg(errmsg);
+	return -1;
 }
 
 int main(int argc, char **argv)
-- 
2.25.1.362.g51ebf55

Re: [PATCH v2] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by John Stultz 1 year, 10 months ago

On Tue, Apr 9, 2024 at 6:39 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Thomas says:
>
>         The signal distribution test has a tendency to hang for a long
>         time as the signal delivery is not really evenly distributed. In
>         fact it might never be distributed across all threads ever in
>         the way it is written.
>
> To me even the
>
>         This primarily tests that the kernel does not favour any one.
>
> comment doesn't look right. The kernel does favour a thread which hits
> the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.
>
> The new version simply checks that the group leader sleeping in join()
> never receives SIGALRM, cpu_timer_fire() should always send the signal
> to the thread which burns cpu.
>
> Without the commit bcb7ee79029d ("posix-timers: Prefer delivery of signals
> to the current thread") the test-case fails immediately, the very 1st tick
> wakes the leader up. Otherwise it quickly succeeds after 100 ticks.
>
> As Thomas suggested, the new version doesn't report the failure on the
> pre v6.3 kernels that do not have the commit bcb7ee79029d; this is a
> feature that obviously fails on the older kernels. So the patch adds the
> new simple ksft_ck_kernel_version() helper and uses ksft_test_result_skip()
> if check_timer_distribution() fails on the older kernel.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

This is working great here (on both 6.6 and the older 6.1)! Thanks so
much for fixing this!
One nit below, but otherwise:
  Tested-by: John Stultz <jstultz@google.com>

> +err:
> +       ksft_print_msg(errmsg);

This bit is causing the following warning:
posix_timers.c:250:2: warning: format not a string literal and no
format arguments [-Wformat-security]
  250 |  ksft_print_msg(errmsg);
      |  ^~~~~~~~~~~~~~

A simple fix is just to switch it to:
  ksft_print_msg("%s", errmsg);

thanks
-john

Re: [PATCH v2] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Thomas Gleixner 1 year, 10 months ago

On Wed, Apr 10 2024 at 15:21, John Stultz wrote:
> On Tue, Apr 9, 2024 at 6:39 AM Oleg Nesterov <oleg@redhat.com> wrote:
> This is working great here (on both 6.6 and the older 6.1)! Thanks so
> much for fixing this!
> One nit below, but otherwise:
>   Tested-by: John Stultz <jstultz@google.com>
>
>> +err:
>> +       ksft_print_msg(errmsg);
>
> This bit is causing the following warning:
> posix_timers.c:250:2: warning: format not a string literal and no
> format arguments [-Wformat-security]
>   250 |  ksft_print_msg(errmsg);
>       |  ^~~~~~~~~~~~~~
>
> A simple fix is just to switch it to:
>   ksft_print_msg("%s", errmsg);

Can you please send a delta patch against tip timers/urgent?

Re: [PATCH v2] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by John Stultz 1 year, 10 months ago

On Wed, Apr 10, 2024 at 3:31 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, Apr 10 2024 at 15:21, John Stultz wrote:
> > On Tue, Apr 9, 2024 at 6:39 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > This is working great here (on both 6.6 and the older 6.1)! Thanks so
> > much for fixing this!
> > One nit below, but otherwise:
> >   Tested-by: John Stultz <jstultz@google.com>
> >
> >> +err:
> >> +       ksft_print_msg(errmsg);
> >
> > This bit is causing the following warning:
> > posix_timers.c:250:2: warning: format not a string literal and no
> > format arguments [-Wformat-security]
> >   250 |  ksft_print_msg(errmsg);
> >       |  ^~~~~~~~~~~~~~
> >
> > A simple fix is just to switch it to:
> >   ksft_print_msg("%s", errmsg);
>
> Can you please send a delta patch against tip timers/urgent?

Will do! Apologies for not getting to test and reply earlier.
-john

[tip: timers/urgent] selftests/timers/posix_timers: Reimplement check_timer_distribution()

Posted by tip-bot2 for Oleg Nesterov 1 year, 10 months ago

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     6d029c25b71f2de2838a6f093ce0fa0e69336154
Gitweb:        https://git.kernel.org/tip/6d029c25b71f2de2838a6f093ce0fa0e69336154
Author:        Oleg Nesterov <oleg@redhat.com>
AuthorDate:    Tue, 09 Apr 2024 15:38:03 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 09 Apr 2024 17:48:19 +02:00

selftests/timers/posix_timers: Reimplement check_timer_distribution()

check_timer_distribution() runs ten threads in a busy loop and tries to
test that the kernel distributes a process posix CPU timer signal to every
thread over time.

There is not guarantee that this is true even after commit bcb7ee79029d
("posix-timers: Prefer delivery of signals to the current thread") because
that commit only avoids waking up the sleeping process leader thread, but
that has nothing to do with the actual signal delivery.

As the signal is process wide the first thread which observes sigpending
and wins the race to lock sighand will deliver the signal. Testing shows
that this hangs on a regular base because some threads never win the race.

The comment "This primarily tests that the kernel does not favour any one."
is wrong. The kernel does favour a thread which hits the timer interrupt
when CLOCK_PROCESS_CPUTIME_ID expires.

Rewrite the test so it only checks that the group leader sleeping in join()
never receives SIGALRM and the thread which burns CPU cycles receives all
signals.

In older kernels which do not have commit bcb7ee79029d ("posix-timers:
Prefer delivery of signals to the current thread") the test-case fails
immediately, the very 1st tick wakes the leader up. Otherwise it quickly
succeeds after 100 ticks.

CI testing wants to use newer selftest versions on stable kernels. In this
case the test is guaranteed to fail.

So check in the failure case whether the kernel version is less than v6.3
and skip the test result in that case.

[ tglx: Massaged change log, renamed the version check helper ]

Fixes: e797203fb3ba ("selftests/timers/posix_timers: Test delivery of signals across threads")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240409133802.GD29396@redhat.com
---
 tools/testing/selftests/kselftest.h           |  13 ++-
 tools/testing/selftests/timers/posix_timers.c | 103 +++++++----------
 2 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 541bf19..973b18e 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -51,6 +51,7 @@
 #include <stdarg.h>
 #include <string.h>
 #include <stdio.h>
+#include <sys/utsname.h>
 #endif
 
 #ifndef ARRAY_SIZE
@@ -388,4 +389,16 @@ static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
 	exit(KSFT_SKIP);
 }
 
+static inline int ksft_min_kernel_version(unsigned int min_major,
+					  unsigned int min_minor)
+{
+	unsigned int major, minor;
+	struct utsname info;
+
+	if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
+		ksft_exit_fail_msg("Can't parse kernel version\n");
+
+	return major > min_major || (major == min_major && minor >= min_minor);
+}
+
 #endif /* __KSELFTEST_H */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d49dd3f..d86a0e0 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,71 @@ static int check_timer_create(int which)
 	return 0;
 }
 
-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;
 
-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
 {
-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
-	return NULL;
+	if (pthread_self() != ctd_thread)
+		ctd_failed = 1;
+	ctd_count--;
 }
 
-static void distribution_handler(int nr)
+static void *ctd_thread_func(void *arg)
 {
-	if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
-		__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
-}
-
-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
-{
-	int err, i;
-	timer_t id;
-	const int nthreads = 10;
-	pthread_t threads[nthreads];
 	struct itimerspec val = {
 		.it_value.tv_sec = 0,
 		.it_value.tv_nsec = 1000 * 1000,
 		.it_interval.tv_sec = 0,
 		.it_interval.tv_nsec = 1000 * 1000,
 	};
+	timer_t id;
 
-	remain = nthreads + 1;  /* worker threads + this thread */
-	signal(SIGALRM, distribution_handler);
-	err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
-	if (err < 0) {
-		ksft_perror("Can't create timer");
-		return -1;
-	}
-	err = timer_settime(id, 0, &val, NULL);
-	if (err < 0) {
-		ksft_perror("Can't set timer");
-		return -1;
-	}
+	/* 1/10 seconds to ensure the leader sleeps */
+	usleep(10000);
 
-	for (i = 0; i < nthreads; i++) {
-		err = pthread_create(&threads[i], NULL, distribution_thread,
-				     NULL);
-		if (err) {
-			ksft_print_msg("Can't create thread: %s (%d)\n",
-				       strerror(errno), errno);
-			return -1;
-		}
-	}
+	ctd_count = 100;
+	if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+		return "Can't create timer\n";
+	if (timer_settime(id, 0, &val, NULL))
+		return "Can't set timer\n";
 
-	/* Wait for all threads to receive the signal. */
-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+	while (ctd_count > 0 && !ctd_failed)
+		;
 
-	for (i = 0; i < nthreads; i++) {
-		err = pthread_join(threads[i], NULL);
-		if (err) {
-			ksft_print_msg("Can't join thread: %s (%d)\n",
-				       strerror(errno), errno);
-			return -1;
-		}
-	}
+	if (timer_delete(id))
+		return "Can't delete timer\n";
 
-	if (timer_delete(id)) {
-		ksft_perror("Can't delete timer");
-		return -1;
-	}
+	return NULL;
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+	const char *errmsg;
 
-	ksft_test_result_pass("check_timer_distribution\n");
+	signal(SIGALRM, ctd_sighandler);
+
+	errmsg = "Can't create thread\n";
+	if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+		goto err;
+
+	errmsg = "Can't join thread\n";
+	if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+		goto err;
+
+	if (!ctd_failed)
+		ksft_test_result_pass("check signal distribution\n");
+	else if (ksft_min_kernel_version(6, 3))
+		ksft_test_result_fail("check signal distribution\n");
+	else
+		ksft_test_result_skip("check signal distribution (old kernel)\n");
 	return 0;
+err:
+	ksft_print_msg(errmsg);
+	return -1;
 }
 
 int main(int argc, char **argv)

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Dmitry Vyukov 1 year, 10 months ago

On Tue, 9 Apr 2024 at 13:12, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 04/09, Thomas Gleixner wrote:
> >
> > The discussion started about running new tests on older kernels. As this
> > is a feature and not a bug fix that obviously fails on older kernels.
>
> OK, I see... please see below.
>
> > So something like the uncompiled below should work.
>
> Hmm... this patch doesn't apply to Linus's tree...
>
> It seems that this is because in your tree check_timer_distribution() does
>
>         if (timer_delete(id)) {
>                 ksft_perror("Can't delete timer");
>                 return 0;
>         }
>
> while in Linus's tree it returns -1 if timer_delete() fails. Nevermind.
>
> Thomas, I am almost shy to continue this discussion and waste your time ;)
> But ...
>
> > +static bool check_kernel_version(unsigned int min_major, unsigned int min_minor)
> > +{
> > +     unsigned int major, minor;
> > +     struct utsname info;
> > +
> > +     uname(&info);
> > +     if (sscanf(info.release, "%u.%u.", &major, &minor) != 2)
> > +             ksft_exit_fail();
> > +     return major > min_major || (major == min_major && minor >= min_minor);
> > +}
>
> this looks useful regardless. Perhaps it should be moved into
> tools/testing/selftests/kselftest.h as ksft_ck_kernel_version() ?
>
> > +static int check_timer_distribution(void)
> > +{
> > +     const char *errmsg;
> > +
> > +     if (!check_kernel_version(6, 3)) {
> > +             ksft_test_result_skip("check signal distribution (old kernel)\n");
> >               return 0;
>
> ...
>
> > +     ksft_test_result(!ctd_failed, "check signal distribution\n");
>
> Perhaps
>
>         if (!ctd_failed)
>                 ksft_test_result_pass("check signal distribution\n");
>         else if (check_kernel_version(6, 3))
>                 ksft_test_result_fail("check signal distribution\n");
>         else
>                 ksft_test_result_skip("check signal distribution (old kernel)\n");
>
> makes more sense?

This looks even better!

> This way it can be used on the older kernels with bcb7ee79029d backported.
>
> Oleg.
>

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Thomas Gleixner 1 year, 10 months ago

On Mon, Apr 08 2024 at 10:30, Dmitry Vyukov wrote:
> On Sat, 6 Apr 2024 at 17:12, Oleg Nesterov <oleg@redhat.com> wrote:
>>         if (ctd_failed)
>>                 ksft_test_result_skip("No signal distribution. Assuming old kernel\n");
>
> Shouldn't the test fail here? The goal of a test is to fail when
> things don't work.
> I don't see any other ksft_test_result_fail() calls, and it does not
> look that the test will hang on incorrect distribution.

I have a fixup for older kernels. I'll get to Olegs patch and the fixup
later today.

Thanks,

        tglx

Re: [PATCH] selftests/timers/posix_timers: reimplement check_timer_distribution()

Posted by Thomas Gleixner 1 year, 10 months ago

On Sat, Apr 06 2024 at 17:10, Oleg Nesterov wrote:
> Yes, this changes the "semantics" of check_timer_distribution(), perhaps it
> should be renamed.

Definitely.

> But I do not see a better approach, and in fact I think that
>
> 	Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
>
> is the wrong goal.
>
> Do you agree?

No argument from my side. All we can test is that the leader is not
woken up.

Thanks,

        tglx