perf test: Fix LBR test by adding indirect calls

[PATCH] perf test: Fix LBR test by adding indirect calls

Posted by Namhyung Kim 1 year, 3 months ago

I've noticed that the perf record LBR tests sometimes fail on the
indirect call test because it sees more empty branch stacks than expected.

The test workload (thloop) spawns a thread and calls a loop function for
1 second from both the main thread and the new thread.  However, neither
of them makes indirect calls in the loop body, so the samples end up with
empty branch stacks.

  LBR any indirect call test
  [ perf record: Woken up 21 times to write data ]
  [ perf record: Captured and wrote 5.607 MB /tmp/__perf_test.perf.data.pujKd (7924 samples) ]
  LBR any indirect call test: 7924 samples
  LBR any indirect call test [Failed empty br stack ratio exceed 2%: 3%]

Refactor the test workload to call test_loop() both directly and
indirectly.  The expected share of indirect calls is now 50%, but let's
add some margin for the startup and finish routines.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/tests/shell/record_lbr.sh | 2 +-
 tools/perf/tests/workloads/thloop.c  | 9 ++++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/record_lbr.sh b/tools/perf/tests/shell/record_lbr.sh
index 8d750ee631f877fd..7a23b2095be8acba 100755
--- a/tools/perf/tests/shell/record_lbr.sh
+++ b/tools/perf/tests/shell/record_lbr.sh
@@ -121,7 +121,7 @@ lbr_test "-j any_ret" "any ret" 2
 lbr_test "-j ind_call" "any indirect call" 2
 lbr_test "-j ind_jmp" "any indirect jump" 100
 lbr_test "-j call" "direct calls" 2
-lbr_test "-j ind_call,u" "any indirect user call" 100
+lbr_test "-j ind_call,u" "any indirect user call" 52
 lbr_test "-a -b" "system wide any branch" 2
 lbr_test "-a -j any_call" "system wide any call" 2
 
diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c
index 457b29f91c3ee277..fa5547939882cf6c 100644
--- a/tools/perf/tests/workloads/thloop.c
+++ b/tools/perf/tests/workloads/thloop.c
@@ -18,14 +18,16 @@ static void sighandler(int sig __maybe_unused)
 
 noinline void test_loop(void)
 {
-	while (!done);
+	for (volatile int i = 0; i < 10000; i++)
+		continue;
 }
 
 static void *thfunc(void *arg)
 {
 	void (*loop_fn)(void) = arg;
 
-	loop_fn();
+	while (!done)
+		loop_fn();
 	return NULL;
 }
 
@@ -42,7 +44,8 @@ static int thloop(int argc, const char **argv)
 	alarm(sec);
 
 	pthread_create(&th, NULL, thfunc, test_loop);
-	test_loop();
+	while (!done)
+		test_loop();
 	pthread_join(th, NULL);
 
 	return 0;
-- 
2.47.0.163.g1226f6d8fa-goog
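
For readers without the tree at hand, here is a minimal standalone sketch of
the shape the patch gives the workload: the main thread calls test_loop()
directly while the second thread calls it through a function pointer. It is
not the actual thloop.c. The run_workload() wrapper, the on_alarm() handler,
the two counters and the use of atomic_int for the done flag are assumptions
made for illustration only.

```c
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical stop flag; set from the SIGALRM handler. */
static atomic_int done;

/* Hypothetical counters, each written by exactly one thread. */
static unsigned long direct_calls, indirect_calls;

static void on_alarm(int sig)
{
	(void)sig;
	atomic_store(&done, 1);
}

/* Short bounded loop, so callers re-enter it many times per second. */
__attribute__((noinline)) void test_loop(void)
{
	for (volatile int i = 0; i < 10000; i++)
		continue;
}

/* New thread: every call goes through the pointer (indirect call). */
static void *thfunc(void *arg)
{
	void (*loop_fn)(void) = (void (*)(void))arg;

	while (!atomic_load(&done)) {
		loop_fn();
		indirect_calls++;
	}
	return NULL;
}

/* Run both callers for 'sec' seconds; returns 0 on success. */
int run_workload(unsigned int sec)
{
	pthread_t th;

	atomic_store(&done, 0);
	direct_calls = indirect_calls = 0;
	signal(SIGALRM, on_alarm);
	alarm(sec);

	if (pthread_create(&th, NULL, thfunc, (void *)test_loop))
		return -1;

	/* Main thread: direct calls only. */
	while (!atomic_load(&done)) {
		test_loop();
		direct_calls++;
	}
	pthread_join(th, NULL);
	return 0;
}
```

With a user-space indirect-call branch filter, only the samples from the
second thread would be expected to carry branch entries, which is where the
roughly 50% expectation in the commit message comes from. Whether the
compiler keeps the pointer call indirect depends on the optimization level,
so treat this as a sketch rather than a guarantee.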

Re: [PATCH] perf test: Fix LBR test by adding indirect calls

Posted by Ian Rogers 1 year, 3 months ago

On Sat, Nov 2, 2024 at 5:24 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> I've noticed sometimes perf record LBR tests failed on indirect call
> test because it has empty branch stacks more than expected.
>
> The test workload (thloop) spawns a thread and calls a loop function for
> 1 second both from the main thread and the new thread.  However neither
> of them has indirect calls in the body so it ended up with empty branch
> stacks.
>
>   LBR any indirect call test
>   [ perf record: Woken up 21 times to write data ]
>   [ perf record: Captured and wrote 5.607 MB /tmp/__perf_test.perf.data.pujKd (7924 samples) ]
>   LBR any indirect call test: 7924 samples
>   LBR any indirect call test [Failed empty br stack ratio exceed 2%: 3%]
>
> Refactor the test workload to call the test_loop() both directly and
> indirectly.  Now expectation of indirect call is 50% but let's add some
> margin for startup and finish routines.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/tests/shell/record_lbr.sh | 2 +-
>  tools/perf/tests/workloads/thloop.c  | 9 ++++++---
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/tests/shell/record_lbr.sh b/tools/perf/tests/shell/record_lbr.sh
> index 8d750ee631f877fd..7a23b2095be8acba 100755
> --- a/tools/perf/tests/shell/record_lbr.sh
> +++ b/tools/perf/tests/shell/record_lbr.sh
> @@ -121,7 +121,7 @@ lbr_test "-j any_ret" "any ret" 2
>  lbr_test "-j ind_call" "any indirect call" 2
>  lbr_test "-j ind_jmp" "any indirect jump" 100
>  lbr_test "-j call" "direct calls" 2
> -lbr_test "-j ind_call,u" "any indirect user call" 100
> +lbr_test "-j ind_call,u" "any indirect user call" 52
>  lbr_test "-a -b" "system wide any branch" 2
>  lbr_test "-a -j any_call" "system wide any call" 2
>
> diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c
> index 457b29f91c3ee277..fa5547939882cf6c 100644
> --- a/tools/perf/tests/workloads/thloop.c
> +++ b/tools/perf/tests/workloads/thloop.c
> @@ -18,14 +18,16 @@ static void sighandler(int sig __maybe_unused)
>
>  noinline void test_loop(void)
>  {
> -       while (!done);
> +       for (volatile int i = 0; i < 10000; i++)

I don't think the volatile here will stop a sufficiently eager
optimizing compiler. I think it may need to be static as well.

Thanks,
Ian

> +               continue;
>  }
>
>  static void *thfunc(void *arg)
>  {
>         void (*loop_fn)(void) = arg;
>
> -       loop_fn();
> +       while (!done)
> +               loop_fn();
>         return NULL;
>  }
>
> @@ -42,7 +44,8 @@ static int thloop(int argc, const char **argv)
>         alarm(sec);
>
>         pthread_create(&th, NULL, thfunc, test_loop);
> -       test_loop();
> +       while (!done)
> +               test_loop();
>         pthread_join(th, NULL);
>
>         return 0;
> --
> 2.47.0.163.g1226f6d8fa-goog
>
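
To make the concern about the loop counter concrete, here is a small sketch
of two ways a spin loop might be kept alive under optimization. Neither is
taken from the patch. The names spin_static() and spin_barrier() and the
iters parameter are hypothetical. Variant A gives the volatile counter static
storage, as suggested in the review, and adds _Thread_local because the
workload calls the loop from two threads at once (that addition is an
assumption of this sketch). Variant B relies on an empty asm statement, which
is a GCC/Clang extension.

```c
/*
 * Variant A: volatile counter with static storage duration.
 * _Thread_local gives each thread its own copy, so two concurrent
 * callers do not increment the same counter.
 */
static _Thread_local volatile int spin_counter;

__attribute__((noinline)) int spin_static(int iters)
{
	for (spin_counter = 0; spin_counter < iters; spin_counter++)
		continue;
	return spin_counter;
}

/*
 * Variant B: ordinary counter plus an empty asm statement that acts as
 * a compiler barrier, so the loop cannot be folded into a single
 * assignment.
 */
__attribute__((noinline)) int spin_barrier(int iters)
{
	int i;

	for (i = 0; i < iters; i++)
		__asm__ volatile("" ::: "memory");
	return i;
}
```

A plain "static volatile int" without _Thread_local would be shared by both
threads of the workload, which changes how long each call spins; that
trade-off is worth keeping in mind when picking between the variants.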

Re: [PATCH] perf test: Fix LBR test by adding indirect calls

Posted by Namhyung Kim 1 year, 3 months ago

On Sat, Nov 02, 2024 at 09:58:03PM -0700, Ian Rogers wrote:
> On Sat, Nov 2, 2024 at 5:24 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > I've noticed sometimes perf record LBR tests failed on indirect call
> > test because it has empty branch stacks more than expected.
> >
> > The test workload (thloop) spawns a thread and calls a loop function for
> > 1 second both from the main thread and the new thread.  However neither
> > of them has indirect calls in the body so it ended up with empty branch
> > stacks.
> >
> >   LBR any indirect call test
> >   [ perf record: Woken up 21 times to write data ]
> >   [ perf record: Captured and wrote 5.607 MB /tmp/__perf_test.perf.data.pujKd (7924 samples) ]
> >   LBR any indirect call test: 7924 samples
> >   LBR any indirect call test [Failed empty br stack ratio exceed 2%: 3%]
> >
> > Refactor the test workload to call the test_loop() both directly and
> > indirectly.  Now expectation of indirect call is 50% but let's add some
> > margin for startup and finish routines.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/tests/shell/record_lbr.sh | 2 +-
> >  tools/perf/tests/workloads/thloop.c  | 9 ++++++---
> >  2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/perf/tests/shell/record_lbr.sh b/tools/perf/tests/shell/record_lbr.sh
> > index 8d750ee631f877fd..7a23b2095be8acba 100755
> > --- a/tools/perf/tests/shell/record_lbr.sh
> > +++ b/tools/perf/tests/shell/record_lbr.sh
> > @@ -121,7 +121,7 @@ lbr_test "-j any_ret" "any ret" 2
> >  lbr_test "-j ind_call" "any indirect call" 2
> >  lbr_test "-j ind_jmp" "any indirect jump" 100
> >  lbr_test "-j call" "direct calls" 2
> > -lbr_test "-j ind_call,u" "any indirect user call" 100
> > +lbr_test "-j ind_call,u" "any indirect user call" 52
> >  lbr_test "-a -b" "system wide any branch" 2
> >  lbr_test "-a -j any_call" "system wide any call" 2
> >
> > diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c
> > index 457b29f91c3ee277..fa5547939882cf6c 100644
> > --- a/tools/perf/tests/workloads/thloop.c
> > +++ b/tools/perf/tests/workloads/thloop.c
> > @@ -18,14 +18,16 @@ static void sighandler(int sig __maybe_unused)
> >
> >  noinline void test_loop(void)
> >  {
> > -       while (!done);
> > +       for (volatile int i = 0; i < 10000; i++)
> 
> I don't think the volatile here will stop a sufficiently eager
> optimizing compiler. I think it may need to be static as well.

Ok, we can probably disable optimizations in this code, as is done for
the other test workloads.

Thanks for your review!
Namhyung
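
As an illustration of what disabling optimizations could look like at the
source level, the sketch below marks a single function as unoptimized with
compiler attributes. This is an assumption about one possible approach, not
what the thread settled on: the reply does not say how the other workloads do
it, and a per-object compiler flag in the build would achieve the same effect
without touching the C file. The no_optimize macro and spin_plain() are
hypothetical names.

```c
/*
 * Hypothetical helper macro: ask the compiler not to optimize or
 * inline one function. Clang spells this "optnone"; GCC uses the
 * "optimize" attribute. Other compilers get an empty definition.
 */
#if defined(__clang__)
# define no_optimize __attribute__((optnone, noinline))
#elif defined(__GNUC__)
# define no_optimize __attribute__((optimize("O0"), noinline))
#else
# define no_optimize
#endif

/* Plain counting loop; with optimization off it is emitted as written. */
no_optimize int spin_plain(int iters)
{
	int i;

	for (i = 0; i < iters; i++)
		continue;
	return i;
}
```

GCC documents its optimize attribute as intended for debugging rather than
production use, so a build-level flag is likely the more robust choice for a
test workload.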