Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
selected.
On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
Experiments on Android show 4% improvement in cold app startup time
and 13% improvement in binder benchmarks.
Signed-off-by: Yabin Cui <yabinc@google.com>
---
Change-Logs in V2:
1. Use "For ARM platforms with ETM trace" in autofdo.rst.
2. Create an issue and a change to use extbinary format in instructions:
https://github.com/Linaro/OpenCSD/issues/65
https://android-review.googlesource.com/c/platform/system/extras/+/3362107
Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
arch/arm64/Kconfig | 1 +
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
index 1f0a451e9ccd..a890e84a2fdd 100644
--- a/Documentation/dev-tools/autofdo.rst
+++ b/Documentation/dev-tools/autofdo.rst
@@ -55,7 +55,7 @@ process consists of the following steps:
workload to gather execution frequency data. This data is
collected using hardware sampling, via perf. AutoFDO is most
effective on platforms supporting advanced PMU features like
- LBR on Intel machines.
+ LBR on Intel machines, ETM traces on ARM machines.
#. AutoFDO profile generation: Perf output file is converted to
the AutoFDO profile via offline tools.
@@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+ - For ARM platforms with ETM trace:
+
+ Follow the instructions in the `Linaro OpenCSD document
+ <https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md>`_
+ to record ETM traces for AutoFDO::
+
+ $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>
+ $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il
+
+ For ARM platforms running Android, follow the instructions in the
+ `Android simpleperf document
+ <https://android.googlesource.com/platform/system/extras/+/main/simpleperf/doc/collect_etm_data_for_autofdo.md>`_
+ to record ETM traces for AutoFDO::
+
+ $ simpleperf record -e cs-etm:k -a -o <perf_file> -- <loadtest>
+
4) (Optional) Download the raw perf file to the host machine.
5) To generate an AutoFDO profile, two offline tools are available:
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd9df6dcc593..c3814df5e391 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -103,6 +103,7 @@ config ARM64
select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_RT
+ select ARCH_SUPPORTS_AUTOFDO_CLANG
select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
select ARCH_WANT_DEFAULT_BPF_JIT
--
2.47.0.338.g60cca15819-goog
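For reference, the two-step ETM collection flow from the autofdo.rst hunk above can be wrapped in a small shell helper. This is a minimal sketch, not part of the patch: the function name `etm_autofdo_cmds` and its argument order are hypothetical, and it only prints the two commands so they can be reviewed before being run on a target. The usual reading of `--itrace=i500009il` (synthesize an instruction sample every 500009 instructions, with last-branch entries attached) is an assumption based on perf's itrace option syntax and is not stated in the patch.

```shell
#!/bin/sh
# Print (do not run) the ETM collection commands from the autofdo.rst hunk.
# etm_autofdo_cmds is a hypothetical helper name, not part of perf or the kernel.
etm_autofdo_cmds() {
    etm_perf_file="$1"   # raw ETM trace written by perf record
    perf_file="$2"       # branch-sample file produced by perf inject
    shift 2              # the remaining arguments form the load test command
    printf 'perf record -e cs_etm/@tmc_etr0/k -a -o %s -- %s\n' "$etm_perf_file" "$*"
    printf 'perf inject -i %s -o %s --itrace=i500009il\n' "$etm_perf_file" "$perf_file"
}

# Example: show the commands for a placeholder load test script.
etm_autofdo_cmds etm.data perf.data ./loadtest.sh
```

The sink name `tmc_etr0` is taken from the patch as-is; the sink available on a given board may differ.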
On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> selected.
>
> On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> Experiments on Android show 4% improvement in cold app startup time
> and 13% improvement in binder benchmarks.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>

[...]

> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fd9df6dcc593..c3814df5e391 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -103,6 +103,7 @@ config ARM64
>         select ARCH_SUPPORTS_PER_VMA_LOCK
>         select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
>         select ARCH_SUPPORTS_RT
> +       select ARCH_SUPPORTS_AUTOFDO_CLANG
>         select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
>         select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
>         select ARCH_WANT_DEFAULT_BPF_JIT

After this change, both arm64 and x86 select this option unconditionally
and with no apparent support code being added. So what is actually
required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
it just available for all architectures instead?

Will

Enabling an AutoFDO build requires users to explicitly set CONFIG_AUTOFDO_CLANG.
The support code is in Commit 315ad8780a129e82 (kbuild: Add AutoFDO
support for Clang build).

The CONFIG_AUTOFDO_CLANG config, even if selected by the user, will not be
enabled unless ARCH_SUPPORTS_AUTOFDO_CLANG is present.

We are not enabling this for all architectures because AutoFDO's optimized build
relies on Last Branch Records (LBR) which aren't available on all architectures.

-Rong

On Mon, Dec 9, 2024 at 8:20 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:

[...]

> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index fd9df6dcc593..c3814df5e391 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -103,6 +103,7 @@ config ARM64
> >         select ARCH_SUPPORTS_PER_VMA_LOCK
> >         select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> >         select ARCH_SUPPORTS_RT
> > +       select ARCH_SUPPORTS_AUTOFDO_CLANG
> >         select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> >         select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> >         select ARCH_WANT_DEFAULT_BPF_JIT
>
> After this change, both arm64 and x86 select this option unconditionally
> and with no apparent support code being added. So what is actually
> required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
> it just available for all architectures instead?
>
> Will

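The gating described in the message above (the user-visible option only takes effect when the architecture gate is present) can be checked against a generated kernel configuration. The sketch below is illustrative only: the helper name `autofdo_config_state` and its three result strings are hypothetical, and it assumes that the selected gate symbol shows up in `.config` as `CONFIG_ARCH_SUPPORTS_AUTOFDO_CLANG=y`, as selected hidden bool symbols normally do. Any further dependencies of CONFIG_AUTOFDO_CLANG (for example on the compiler) are not modelled.

```shell
#!/bin/sh
# autofdo_config_state FILE: classify a kernel .config with respect to AutoFDO.
# Hypothetical helper; it prints one of three made-up state names.
autofdo_config_state() {
    cfg="$1"
    if ! grep -q '^CONFIG_ARCH_SUPPORTS_AUTOFDO_CLANG=y$' "$cfg"; then
        # The architecture does not select the gate, so the user option
        # cannot be turned on for this configuration.
        echo "arch-gate-missing"
    elif grep -q '^CONFIG_AUTOFDO_CLANG=y$' "$cfg"; then
        echo "enabled"
    else
        # Gate present, but the user has not opted in.
        echo "available-but-off"
    fi
}
```

A typical call would be `autofdo_config_state .config` from the top of a configured build tree.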
(Aside: please try to avoid top-posting on the public lists as it messes up
the flow of conversation; I'll try to piece this back together.)

On Mon, Dec 09, 2024 at 09:30:50AM -0800, Rong Xu wrote:
> On Mon, Dec 9, 2024 at 8:20 AM Will Deacon <will@kernel.org> wrote:
> > On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index fd9df6dcc593..c3814df5e391 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -103,6 +103,7 @@ config ARM64
> > >         select ARCH_SUPPORTS_PER_VMA_LOCK
> > >         select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> > >         select ARCH_SUPPORTS_RT
> > > +       select ARCH_SUPPORTS_AUTOFDO_CLANG
> > >         select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > >         select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> > >         select ARCH_WANT_DEFAULT_BPF_JIT
> >
> > After this change, both arm64 and x86 select this option unconditionally
> > and with no apparent support code being added. So what is actually
> > required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
> > it just available for all architectures instead?

> Enabling an AutoFDO build requires users to explicitly set CONFIG_AUTOFDO_CLANG.
> The support code is in Commit 315ad8780a129e82 (kbuild: Add AutoFDO
> support for Clang build).

Yes, that is precisely my point. The user has to enable
CONFIG_AUTOFDO_CLANG anyway, so what is the point in having
ARCH_SUPPORTS_AUTOFDO_CLANG. Why would an architecture _not_ want to
select that?

> We are not enabling this for all architectures because AutoFDO's optimized build
> relies on Last Branch Records (LBR) which aren't available on all architectures.

So? ETM isn't available on all arm64 machines and I doubt whether LBR is
available on _all_ x86 machines either. So there's a runtime failure
mode that needs to be handled anyway and I don't think the arch-specific
Kconfig option is really doing anything useful.

Will

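The runtime failure mode mentioned above can be made concrete with a pre-flight check on the profiling target. This is a sketch under a stated assumption: that the CoreSight ETM PMU, when present and enabled in the running kernel, is exposed as the `cs_etm` directory under `/sys/bus/event_source/devices`. The helper name `have_cs_etm` and its optional root-prefix argument (used only to make the check testable) are hypothetical.

```shell
#!/bin/sh
# have_cs_etm [ROOT]: succeed if the cs_etm perf PMU directory exists.
# ROOT is an optional prefix for the sysfs tree (hypothetical, for testing).
have_cs_etm() {
    root="${1:-}"
    [ -d "$root/sys/bus/event_source/devices/cs_etm" ]
}

# Report the result without attempting any recording.
if have_cs_etm; then
    echo "cs_etm PMU found: ETM-based AutoFDO profiling can be attempted"
else
    echo "cs_etm PMU not found: ETM tracing is unavailable on this system" >&2
fi
```

The check says nothing about whether a usable trace sink exists; it only covers the first way collection can fail regardless of how the kernel was configured.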
On Mon, Dec 9, 2024 at 10:56 AM Will Deacon <will@kernel.org> wrote:
>
> (Aside: please try to avoid top-posting on the public lists as it messes up
> the flow of conversation; I'll try to piece this back together.)
>
> On Mon, Dec 09, 2024 at 09:30:50AM -0800, Rong Xu wrote:
> > On Mon, Dec 9, 2024 at 8:20 AM Will Deacon <will@kernel.org> wrote:
> > > On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > index fd9df6dcc593..c3814df5e391 100644
> > > > --- a/arch/arm64/Kconfig
> > > > +++ b/arch/arm64/Kconfig
> > > > @@ -103,6 +103,7 @@ config ARM64
> > > >         select ARCH_SUPPORTS_PER_VMA_LOCK
> > > >         select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> > > >         select ARCH_SUPPORTS_RT
> > > > +       select ARCH_SUPPORTS_AUTOFDO_CLANG
> > > >         select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > > >         select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> > > >         select ARCH_WANT_DEFAULT_BPF_JIT
> > >
> > > After this change, both arm64 and x86 select this option unconditionally
> > > and with no apparent support code being added. So what is actually
> > > required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
> > > it just available for all architectures instead?

I think it's similar to ARCH_SUPPORTS_LTO_CLANG, which also doesn't need any
support code but requires testing to ensure it works on a specific architecture.

> >
> > Enabling an AutoFDO build requires users to explicitly set CONFIG_AUTOFDO_CLANG.
> > The support code is in Commit 315ad8780a129e82 (kbuild: Add AutoFDO
> > support for Clang build).
>
> Yes, that is precisely my point. The user has to enable
> CONFIG_AUTOFDO_CLANG anyway, so what is the point in having
> ARCH_SUPPORTS_AUTOFDO_CLANG. Why would an architecture _not_ want to
> select that?
>
> > We are not enabling this for all architectures because AutoFDO's optimized build
> > relies on Last Branch Records (LBR) which aren't available on all architectures.
>
> So? ETM isn't available on all arm64 machines and I doubt whether LBR is
> available on _all_ x86 machines either. So there's a runtime failure
> mode that needs to be handled anyway and I don't think the arch-specific
> Kconfig option is really doing anything useful.

My understanding of the benefits of ARCH_SUPPORTS_AUTOFDO_CLANG is:
1. Generally, we don't prefer to collect an AutoFDO profile on one
   architecture and use it to build the kernel for another architecture.
   This is because the profile misses data for architecture-dependent
   code. ARCH_SUPPORTS_AUTOFDO_CLANG can partially prevent this from
   happening.
2. Building a kernel with an AutoFDO profile involves using new
   optimization flags for clang. Having ARCH_SUPPORTS_AUTOFDO_CLANG=y
   for one architecture means someone has tested building a kernel with
   an AutoFDO profile on this architecture.

>
> Will

On Mon, Dec 09, 2024 at 03:51:34PM -0800, Yabin Cui wrote:
> On Mon, Dec 9, 2024 at 10:56 AM Will Deacon <will@kernel.org> wrote:

[...]

> > So? ETM isn't available on all arm64 machines and I doubt whether LBR is
> > available on _all_ x86 machines either. So there's a runtime failure
> > mode that needs to be handled anyway and I don't think the arch-specific
> > Kconfig option is really doing anything useful.
>
> My understanding of the benefits of ARCH_SUPPORTS_AUTOFDO_CLANG is:
> 1. Generally, we don't prefer to collect an AutoFDO profile on one
>    architecture and use it to build the kernel for another architecture.
>    This is because the profile misses data for architecture-dependent
>    code. ARCH_SUPPORTS_AUTOFDO_CLANG can partially prevent this from
>    happening.

Hmm, not really. Once more than one architecture selects the option, you
have the possibility of the mismatch you're trying to avoid.

> 2. Building a kernel with an AutoFDO profile involves using new
>    optimization flags for clang. Having ARCH_SUPPORTS_AUTOFDO_CLANG=y
>    for one architecture means someone has tested building a kernel with
>    an AutoFDO profile on this architecture.

On the flip side, allowing all architectures to select the option
actually increases your test coverage.

Will

On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> selected.
>
> On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> Experiments on Android show 4% improvement in cold app startup time
> and 13% improvement in binder benchmarks.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>

This looks trivial enough to enable. ;) I expect this could go via the
kbuild tree (Masahiro) with an arm64 maintainer Ack.

FWIW:

Reviewed-by: Kees Cook <kees@kernel.org>

--
Kees Cook

This patch looks good to me.

I assume the profile format change in the Android doc will be submitted soon.
Since "extbinary" is a superset of "binary", using the "extbinary"
format profile in Android shouldn't cause any compatibility issues.

Reviewed-by: Rong Xu <xur@google.com>

-Rong

On Mon, Nov 18, 2024 at 2:25 PM Yabin Cui <yabinc@google.com> wrote:
>
> Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> selected.
>
> On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> Experiments on Android show 4% improvement in cold app startup time
> and 13% improvement in binder benchmarks.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>
> ---
>
> Change-Logs in V2:
>
> 1. Use "For ARM platforms with ETM trace" in autofdo.rst.
> 2. Create an issue and a change to use extbinary format in instructions:
> https://github.com/Linaro/OpenCSD/issues/65
> https://android-review.googlesource.com/c/platform/system/extras/+/3362107

[...]

Add George from ChromeOS.

On Mon, Nov 18, 2024 at 3:49 PM Rong Xu <xur@google.com> wrote:
>
> This patch looks good to me.
>
> I assume the profile format change in the Android doc will be submitted soon.
> Since "extbinary" is a superset of "binary", using the "extbinary"
> format profile
> in Android shouldn't cause any compatibility issues.

[...]

> On Mon, Nov 18, 2024 at 2:25 PM Yabin Cui <yabinc@google.com> wrote:
> >
> > Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> > selected.
> >
> > On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> > Experiments on Android show 4% improvement in cold app startup time
> > and 13% improvement in binder benchmarks.
> >
> > Signed-off-by: Yabin Cui <yabinc@google.com>

[...]

We've used ETM in ChromeOS for a while now. Hardware requirements make
it unfortunately less ubiquitous than LBR, but:
- we first launched it on 5.15,
- it's still humming along nicely today on 6.6,

so:

Tested-by: George Burgess IV <gbiv@google.com>

IIRC, with a baseline of "using x86_64 AFDO profiles on ARM kernels,"
we saw a perf win on the order of a few (3? 4?) percentage points when
we made the switch.

On Tue, Nov 19, 2024 at 5:04 PM Yabin Cui <yabinc@google.com> wrote:
>
> Add George from ChromeOS.
>
> On Mon, Nov 18, 2024 at 3:49 PM Rong Xu <xur@google.com> wrote:

[...]

> > On Mon, Nov 18, 2024 at 2:25 PM Yabin Cui <yabinc@google.com> wrote:

[...]

> > > @@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:

[...]

> > > + to record ETM traces for AutoFDO::
> > > +
> > > + $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>

FWIW, CrOS spells the event 'cs_etm/autofdo/u'. I'm not familiar
enough with perf event syntax (or downstream patches that CrOS has to
its kernel) to say whether that should motivate a change here. Happy
to find out more if there's interest.

> > > + $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il

[...]
