Why is there a --off-cpu-thresh 2000?

We collect an off-cpu period __ONLY ONCE__, either in direct sample form,
or in accumulated form (in BPF stack trace map).

If I don't add --off-cpu-thresh 2000, the sample in the original test
goes into the ring buffer instead of the BPF stack trace map.

Additionally, when using -e dummy, the ring buffer is not open, causing
us to lose a sample.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-11-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/tests/shell/record_offcpu.sh | 71 +++++++++++++++++++++++++
1 file changed, 71 insertions(+)
diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh
index 678947fe69ee..c5d6cae94c65 100755
--- a/tools/perf/tests/shell/record_offcpu.sh
+++ b/tools/perf/tests/shell/record_offcpu.sh
@@ -7,6 +7,9 @@ set -e
err=0
perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+ts=$(printf "%u" $((~0 << 32))) # OFF_CPU_TIMESTAMP
+dummy_timestamp=${ts%???} # remove the last 3 digits to match perf script
+
cleanup() {
rm -f ${perfdata}
rm -f ${perfdata}.old
@@ -19,6 +22,9 @@ trap_cleanup() {
}
trap trap_cleanup EXIT TERM INT
+test_over_thresh="Threshold test (over threshold)"
+test_below_thresh="Threshold test (below threshold)"
+
test_offcpu_priv() {
echo "Checking off-cpu privilege"
@@ -88,6 +94,63 @@ test_offcpu_child() {
echo "Child task off-cpu test [Success]"
}
+# task blocks longer than the --off-cpu-thresh, perf should collect a direct sample
+test_offcpu_over_thresh() {
+ echo "${test_over_thresh}"
+
+ # collect direct off-cpu samples for tasks blocked for more than 999ms
+ if ! perf record -e dummy --off-cpu --off-cpu-thresh 999 -o ${perfdata} -- sleep 1 2> /dev/null
+ then
+ echo "${test_over_thresh} [Failed record]"
+ err=1
+ return
+ fi
+ # direct sample's timestamp should be lower than the dummy_timestamp of the at-the-end sample
+ # check if a direct sample exists
+ if ! perf script --time "0, ${dummy_timestamp}" -i ${perfdata} -F event | grep -q "offcpu-time"
+ then
+ echo "${test_over_thresh} [Failed missing direct samples]"
+ err=1
+ return
+ fi
+ # there should only be one direct sample, and its period should be higher than off-cpu-thresh
+ if ! perf script --time "0, ${dummy_timestamp}" -i ${perfdata} -F period | \
+ awk '{ if (int($1) > 999000000) exit 0; else exit 1; }'
+ then
+ echo "${test_over_thresh} [Failed off-cpu time too short]"
+ err=1
+ return
+ fi
+ echo "${test_over_thresh} [Success]"
+}
+
+# task blocks shorter than the --off-cpu-thresh, perf should collect an at-the-end sample
+test_offcpu_below_thresh() {
+ echo "${test_below_thresh}"
+
+ # direct samples need more than 12s (12000ms) off-cpu, so sleep 1 stays below the threshold
+ if ! perf record -e dummy --off-cpu --off-cpu-thresh 12000 -o ${perfdata} -- sleep 1 2> /dev/null
+ then
+ echo "${test_below_thresh} [Failed record]"
+ err=1
+ return
+ fi
+ # see if there's an at-the-end sample
+ if ! perf script --time "${dummy_timestamp}," -i ${perfdata} -F event | grep -q 'offcpu-time'
+ then
+ echo "${test_below_thresh} [Failed at-the-end samples cannot be found]"
+ err=1
+ return
+ fi
+ # plus there shouldn't be any direct samples
+ if perf script --time "0, ${dummy_timestamp}" -i ${perfdata} -F event | grep -q 'offcpu-time'
+ then
+ echo "${test_below_thresh} [Failed direct samples are found when they shouldn't be]"
+ err=1
+ return
+ fi
+ echo "${test_below_thresh} [Success]"
+}
test_offcpu_priv
@@ -99,5 +162,13 @@ if [ $err = 0 ]; then
test_offcpu_child
fi
+if [ $err = 0 ]; then
+ test_offcpu_over_thresh
+fi
+
+if [ $err = 0 ]; then
+ test_offcpu_below_thresh
+fi
+
cleanup
exit $err
--
2.45.2
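
For readers following the timestamp filter used by both new tests, here is a minimal sketch of how the two variables at the top of the script evaluate. It only repeats the arithmetic from the patch and assumes a shell with 64-bit arithmetic (for example bash on x86_64); nothing here is new perf behavior.

```shell
# Same arithmetic as the patch: all ones shifted left by 32 bits,
# printed as an unsigned 64-bit integer (OFF_CPU_TIMESTAMP).
ts=$(printf "%u" $((~0 << 32)))
# Strip the last three digits, as the patch does "to match perf script".
dummy_timestamp=${ts%???}

echo "ts=${ts}"
echo "dummy_timestamp=${dummy_timestamp}"
```

Under that assumption, ts is 18446744069414584320 (2^64 - 2^32) and dummy_timestamp is 18446744069414584. The tests then use --time "0, ${dummy_timestamp}" to select direct samples and --time "${dummy_timestamp}," to select the at-the-end sample.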
Hello,
On Thu, Feb 13, 2025 at 3:00 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Why is there a --off-cpu-thresh 2000?
>
> We collect an off-cpu period __ONLY ONCE__, either in direct sample form,
> or in accumulated form (in BPF stack trace map).
>
> If I don't add --off-cpu-thresh 2000, the sample in the original test
> goes into the ring buffer instead of the BPF stack trace map.
>
> Additionally, when using -e dummy, the ring buffer is not open, causing
> us to lose a sample.
Just noticed that this commit message is wrong, should be:
"""
Add tests for direct off-cpu samples and --off-cpu-thresh option.
"""
Sorry.
Thanks,
Howard
On Thu, Feb 13, 2025 at 3:04 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Hello,
>
> On Thu, Feb 13, 2025 at 3:00 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > Why is there a --off-cpu-thresh 2000?
> >
> > We collect an off-cpu period __ONLY ONCE__, either in direct sample form,
> > or in accumulated form (in BPF stack trace map).
> >
> > If I don't add --off-cpu-thresh 2000, the sample in the original test
> > goes into the ring buffer instead of the BPF stack trace map.
> >
> > Additionally, when using -e dummy, the ring buffer is not open, causing
> > us to lose a sample.
>
> Just noticed that this commit message is wrong, should be:
> """
> Add tests for direct off-cpu samples and --off-cpu-thresh option.
> """

Tested-by: Ian Rogers <irogers@google.com>

```
121: perf record offcpu profiling tests : Ok
```

I'd be tempted to keep the comments about why 2000 next to the actual
code rather than in the commit message. In the code the value is 12000
and not 2000 though?

Thanks,
Ian
Hello Ian,

Thanks for testing this patch :).

On Tue, Feb 18, 2025 at 11:32 AM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Feb 13, 2025 at 3:04 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > Hello,
> >
> > On Thu, Feb 13, 2025 at 3:00 PM Howard Chu <howardchu95@gmail.com> wrote:
> > >
> > > Why is there a --off-cpu-thresh 2000?
> > >
> > > We collect an off-cpu period __ONLY ONCE__, either in direct sample form,
> > > or in accumulated form (in BPF stack trace map).
> > >
> > > If I don't add --off-cpu-thresh 2000, the sample in the original test
> > > goes into the ring buffer instead of the BPF stack trace map.
> > >
> > > Additionally, when using -e dummy, the ring buffer is not open, causing
> > > us to lose a sample.
> >
> > Just noticed that this commit message is wrong, should be:
> > """
> > Add tests for direct off-cpu samples and --off-cpu-thresh option.
> > """
>
> Tested-by: Ian Rogers <irogers@google.com>
> ```
> 121: perf record offcpu profiling tests : Ok
> ```
> I'd be tempted to keep the comments about why 2000 next to the actual
> code rather than in the commit message. In the code the value is 12000
> and not 2000 though?

I actually deleted the --off-cpu-thresh 2000. It was intended to fix
Namhyung's original test because I forgot to enable the off-cpu event.
Now that recording the off-cpu time of a task is fixed, that workaround
is no longer necessary.

The --off-cpu-thresh 12000 option is used to force sleep 1 to produce an
at-the-end sample, as only tasks that have been off the CPU for more
than 12 seconds (12000 ms) can emit direct samples. Since this is
recording an off-cpu period below the threshold, I think it makes sense
to put it here in test_offcpu_below_thresh().

That being said, if you'd like me to add a comment or two, I'd be glad
to do so.

Thanks,
Howard

>
> Thanks,
> Ian
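
The threshold rule discussed above can be sketched in a few lines of shell. This is only an illustration of the rule as described in this thread, not perf's implementation: classify_offcpu is a hypothetical helper, and the strict "greater than" comparison is an assumption taken from the "more than" wording and from the period check in test_offcpu_over_thresh().

```shell
# Hypothetical helper, not part of perf: classify one off-cpu period.
#   $1 = off-cpu duration in nanoseconds
#   $2 = --off-cpu-thresh value in milliseconds
classify_offcpu() {
	if [ "$1" -gt $(( $2 * 1000000 )) ]; then
		echo "direct"        # emitted as a direct sample
	else
		echo "at-the-end"    # accumulated and emitted at the end
	fi
}

# sleep 1 is off-cpu for roughly 1000000000 ns
classify_offcpu 1000000000 999      # over the 999ms threshold
classify_offcpu 1000000000 12000    # below the 12000ms threshold
```

With these inputs the first call prints "direct" and the second prints "at-the-end", which matches what the two new tests expect from perf script.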