[PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected

Yabin Cui posted 1 patch 5 days ago
Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
arch/arm64/Kconfig                  |  1 +
2 files changed, 18 insertions(+), 1 deletion(-)
[PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
Posted by Yabin Cui 5 days ago
Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
selected.

On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
Experiments on Android show 4% improvement in cold app startup time
and 13% improvement in binder benchmarks.

Signed-off-by: Yabin Cui <yabinc@google.com>
---

Changes in v2:

1. Use "For ARM platforms with ETM trace" in autofdo.rst.
2. Filed an issue and uploaded a change to use the extbinary format in the
   instructions:
   https://github.com/Linaro/OpenCSD/issues/65
   https://android-review.googlesource.com/c/platform/system/extras/+/3362107

 Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
 arch/arm64/Kconfig                  |  1 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
index 1f0a451e9ccd..a890e84a2fdd 100644
--- a/Documentation/dev-tools/autofdo.rst
+++ b/Documentation/dev-tools/autofdo.rst
@@ -55,7 +55,7 @@ process consists of the following steps:
    workload to gather execution frequency data. This data is
    collected using hardware sampling, via perf. AutoFDO is most
    effective on platforms supporting advanced PMU features like
-   LBR on Intel machines.
+   LBR on Intel machines and ETM traces on ARM machines.
 
 #. AutoFDO profile generation: Perf output file is converted to
    the AutoFDO profile via offline tools.
@@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
 
       $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
 
+   - For ARM platforms with ETM trace:
+
+     Follow the instructions in the `Linaro OpenCSD document
+     <https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md>`_
+     to record ETM traces for AutoFDO::
+
+      $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>
+      $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il
+
+     For ARM platforms running Android, follow the instructions in the
+     `Android simpleperf document
+     <https://android.googlesource.com/platform/system/extras/+/main/simpleperf/doc/collect_etm_data_for_autofdo.md>`_
+     to record ETM traces for AutoFDO::
+
+      $ simpleperf record -e cs-etm:k -a -o <perf_file> -- <loadtest>
+
 4) (Optional) Download the raw perf file to the host machine.
 
 5) To generate an AutoFDO profile, two offline tools are available:
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd9df6dcc593..c3814df5e391 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -103,6 +103,7 @@ config ARM64
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
 	select ARCH_SUPPORTS_RT
+	select ARCH_SUPPORTS_AUTOFDO_CLANG
 	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
 	select ARCH_WANT_DEFAULT_BPF_JIT
-- 
2.47.0.338.g60cca15819-goog
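
For anyone trying the patch, here is a minimal sketch of how one might confirm that a kernel configuration has AutoFDO enabled once the arch-level select is in place. The helper name `autofdo_enabled` and its `.config` path argument are illustrative assumptions, not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): succeed if the given
# kernel .config enables CONFIG_AUTOFDO_CLANG.
autofdo_enabled() {
    config_file="${1:-.config}"
    # Match only the exact "=y" line, so the commented-out form
    # "# CONFIG_AUTOFDO_CLANG is not set" does not count as enabled.
    grep -q '^CONFIG_AUTOFDO_CLANG=y$' "$config_file"
}

# Example use: only runs the check when a .config exists in this directory.
if [ -f .config ]; then
    if autofdo_enabled .config; then
        echo "AutoFDO is enabled"
    else
        echo "AutoFDO is not enabled"
    fi
fi
```

With the option enabled, the existing autofdo.rst workflow builds the kernel against a profile using `make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>`.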
Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
Posted by Kees Cook 3 days, 5 hours ago
On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> selected.
> 
> On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> Experiments on Android show 4% improvement in cold app startup time
> and 13% improvement in binder benchmarks.
> 
> Signed-off-by: Yabin Cui <yabinc@google.com>

This looks trivial enough to enable. ;) I expect this could go via the
kbuild tree (Masahiro) with an arm64 maintainer Ack.

FWIW:

Reviewed-by: Kees Cook <kees@kernel.org>

-- 
Kees Cook
Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
Posted by Rong Xu 4 days, 23 hours ago
This patch looks good to me.

I assume the profile format change in the Android doc will be submitted soon.
Since "extbinary" is a superset of "binary", using the "extbinary" format
profile in Android shouldn't cause any compatibility issues.

Reviewed-by: Rong Xu <xur@google.com>

-Rong
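
As a rough illustration of the format point above, the sketch below prints the `create_llvm_prof` invocation that the existing autofdo.rst uses to produce an extbinary profile. The wrapper name `afdo_profile_cmd` and the sample file names are assumptions. The command is printed rather than executed, so the sketch runs on machines without the tool installed.

```shell
#!/bin/sh
# Hypothetical wrapper: print (not run) the create_llvm_prof command that
# converts a perf file into an extbinary AutoFDO profile.
afdo_profile_cmd() {
    vmlinux="$1"
    perf_file="$2"
    profile_file="$3"
    # --format=extbinary selects the extensible binary profile format.
    printf 'create_llvm_prof --binary=%s --profile=%s --format=extbinary --out=%s\n' \
        "$vmlinux" "$perf_file" "$profile_file"
}

# Example with placeholder file names.
afdo_profile_cmd vmlinux perf.data kernel.afdo
```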

On Mon, Nov 18, 2024 at 2:25 PM Yabin Cui <yabinc@google.com> wrote:
>
> Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> selected.
>
> On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> Experiments on Android show 4% improvement in cold app startup time
> and 13% improvement in binder benchmarks.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>
Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
Posted by Yabin Cui 3 days, 23 hours ago
Add George from ChromeOS.

On Mon, Nov 18, 2024 at 3:49 PM Rong Xu <xur@google.com> wrote:
>
> This patch looks good to me.
>
> I assume the profile format change in the Android doc will be submitted soon.
> Since "extbinary" is a superset of "binary", using the "extbinary" format
> profile in Android shouldn't cause any compatibility issues.
>
> Reviewed-by: Rong Xu <xur@google.com>
>
> -Rong
Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
Posted by George Burgess 3 days, 7 hours ago
We've used ETM in ChromeOS for a while now. Hardware
requirements make it unfortunately less ubiquitous than LBR, but:
- we first launched it on 5.15,
- it's still humming along nicely today on 6.6, so:

Tested-by: George Burgess IV <gbiv@google.com>

IIRC, with a baseline of "using x86_64 AFDO profiles on ARM kernels,"
we saw a perf win on the order of a few (3? 4?) percentage points when
we made the switch.

On Tue, Nov 19, 2024 at 5:04 PM Yabin Cui <yabinc@google.com> wrote:
>
> Add George from ChromeOS.
>
> > > +   - For ARM platforms with ETM trace:
> > > +
> > > +     Follow the instructions in the `Linaro OpenCSD document
> > > +     https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md`_
> > > +     to record ETM traces for AutoFDO::
> > > +
> > > +      $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>

FWIW, CrOS spells the event 'cs_etm/autofdo/u'.

I'm not familiar enough with perf event syntax (or downstream patches
that CrOS has to its kernel) to say whether that should motivate a
change here. Happy to find out more if there's interest.
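
For context on the two spellings, and hedged accordingly: in perf's CoreSight syntax, a term prefixed with `@` (as in `@tmc_etr0`) names a trace sink, while a bare name such as `autofdo` appears to select a named CoreSight configuration. The trailing `k` or `u` is the usual perf modifier restricting the event to kernel or user space. The sketch below lists the terms a given machine exposes. The helper name `cs_etm_terms` and the `CS_ETM_SYSFS` override are assumptions added so the sketch can run without ETM hardware.

```shell
#!/bin/sh
# Hypothetical helper: list the names usable inside "cs_etm/.../" on this
# machine. CS_ETM_SYSFS is an assumed override for testing without hardware.
CS_ETM_SYSFS="${CS_ETM_SYSFS:-/sys/bus/event_source/devices/cs_etm}"

cs_etm_terms() {
    # Sinks are selected with an "@" prefix, e.g. cs_etm/@tmc_etr0/k.
    for f in "$CS_ETM_SYSFS"/sinks/*; do
        if [ -e "$f" ]; then
            echo "@$(basename "$f")"
        fi
    done
    # Named configurations are used bare, e.g. cs_etm/autofdo/u.
    for f in "$CS_ETM_SYSFS"/events/*; do
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Prints nothing on machines without a cs_etm PMU.
cs_etm_terms
```

Whether a given sink or configuration is present depends on the platform and kernel version, so the output will differ between machines.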
