After KVM added PEBS support for guests on Intel platforms
(https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
the host lost the ability to sample guests with PEBS, because all
PEBS-related MSRs are switched to guest values at VM-entry; for example,
IA32_DS_AREA is switched to a guest GVA. As a result, "perf kvm record"
fails to sample the guest on Intel platforms, since the "cycles:P" event
is used to sample the guest by default, as the case below shows.

  sudo perf kvm record -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.787 MB perf.data.guest ]

To ensure the guest can be sampled successfully, use "cycles" instead of
"cycles:P" as the default event for sampling the guest on Intel
platforms. With this patch, guest samples are captured successfully.

  sudo perf kvm record -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]

Reported-by: Kevin Tian <kevin.tian@intel.com>
Fixes: 634d36f82517 ("perf record: Just use "cycles:P" as the default event")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
tools/perf/arch/x86/util/kvm-stat.c | 46 +++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c
index 424716518b75..cdb5f3e1b5be 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -3,9 +3,11 @@
#include <string.h>
#include "../../../util/kvm-stat.h"
#include "../../../util/evsel.h"
+#include "../../../util/env.h"
#include <asm/svm.h>
#include <asm/vmx.h>
#include <asm/kvm.h>
+#include <subcmd/parse-options.h>
define_exit_reasons_table(vmx_exit_reasons, VMX_EXIT_REASONS);
define_exit_reasons_table(svm_exit_reasons, SVM_EXIT_REASONS);
@@ -211,3 +213,47 @@ int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid)
return 0;
}
+
+/*
+ * After KVM added PEBS support for guests on Intel platforms
+ * (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
+ * the host lost the ability to sample guests with PEBS, because all
+ * PEBS-related MSRs are switched to guest values at VM-entry; for example,
+ * IA32_DS_AREA is switched to a guest GVA. As a result, "perf kvm record"
+ * fails to sample the guest on Intel platforms, since the "cycles:P" event
+ * is used to sample the guest by default.
+ *
+ * To avoid this, explicitly use "cycles" instead of "cycles:P" as the
+ * default event for sampling the guest on Intel platforms.
+ */
+int kvm_add_default_arch_event(int *argc, const char **argv)
+{
+	const char **tmp;
+	bool event = false;
+	int i, j = *argc;
+
+	const struct option event_options[] = {
+		OPT_BOOLEAN('e', "event", &event, NULL),
+		OPT_END()
+	};
+
+	if (!x86__is_intel_cpu())
+		return 0;
+
+	tmp = calloc(j + 1, sizeof(char *));
+	if (!tmp)
+		return -ENOMEM;
+
+	for (i = 0; i < j; i++)
+		tmp[i] = argv[i];
+
+	parse_options(j, tmp, event_options, NULL, PARSE_OPT_KEEP_UNKNOWN);
+	if (!event) {
+		argv[j++] = strdup("-e");
+		argv[j++] = strdup("cycles");
+		*argc += 2;
+	}
+
+	free(tmp);
+	return 0;
+}
--
2.34.1

On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
> After KVM supports PEBS for guest on Intel platforms
> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
> host loses the capability to sample guest with PEBS since all PEBS related
> MSRs are switched to guest value after vm-entry, like IA32_DS_AREA MSR is
> switched to guest GVA at vm-entry. This would lead to "perf kvm record"
> fails to sample guest on Intel platforms since "cycles:P" event is used to
> sample guest by default as below case shows.

Do you mean we cannot use "cycles:PG" for perf kvm record?

>
> sudo perf kvm record -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
>
> So to ensure guest record can be sampled successfully, use "cycles"
> instead of "cycles:P" to sample guest record by default on Intel
> platforms. With this patch, the guest record can be sampled
> successfully.
>
> sudo perf kvm record -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]

What if user already gave some events in the command line? I think you
need to check if "-e" or "--event" (and "--pfm-events" too) is in the
argv[] before adding these.

Thanks,
Namhyung

On 8/7/2025 8:08 AM, Namhyung Kim wrote:
> On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
>> After KVM supports PEBS for guest on Intel platforms
>> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
>> host loses the capability to sample guest with PEBS since all PEBS related
>> MSRs are switched to guest value after vm-entry, like IA32_DS_AREA MSR is
>> switched to guest GVA at vm-entry. This would lead to "perf kvm record"
>> fails to sample guest on Intel platforms since "cycles:P" event is used to
>> sample guest by default as below case shows.
> Do you mean we cannot use "cycles:PG" for perf kvm record?

Yes. Here is the output on Intel Sapphire Rapids.

  sudo ./perf record -e cycles:PG -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.801 MB perf.data ]

No guest records are captured with PEBS; guest records can be sampled
only without PEBS.

  sudo ./perf record -e cycles:G -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.798 MB perf.data (60 samples) ]

> What if user already gave some events in the command line? I think you
> need to check if "-e" or "--event" (and "--pfm-events" too) is in the
> argv[] before adding these.

kvm_add_default_arch_event() detects whether the user has already set
events explicitly. If so, it does not add the "cycles" event. Thanks.

On Thu, Aug 07, 2025 at 11:08:11AM +0800, Mi, Dapeng wrote:
> On 8/7/2025 8:08 AM, Namhyung Kim wrote:
> > What if user already gave some events in the command line? I think you
> > need to check if "-e" or "--event" (and "--pfm-events" too) is in the
> > argv[] before adding these.
>
> kvm_add_default_arch_event() would detect if user already sets events
> explicitly. If so, it won't add "cycles" event any more. Thanks.

Oh, ok. I can see you called parse_options to check the option.
You'd better check "--pfm-events" as well.

Thanks,
Namhyung

On 8/9/2025 6:10 AM, Namhyung Kim wrote:
> On Thu, Aug 07, 2025 at 11:08:11AM +0800, Mi, Dapeng wrote:
>> kvm_add_default_arch_event() would detect if user already sets events
>> explicitly. If so, it won't add "cycles" event any more. Thanks.
> Oh, ok. I can see you called parse_options to check the option.
> You'd better to check "--pfm-events" as well.

Sure. Thanks.
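
For illustration only, the check discussed in this thread (skip the default
event when "-e", "--event" or "--pfm-events" is already present) can be
sketched as a standalone program. This is not the perf implementation: it
scans argv by hand instead of calling parse_options(), and the names
matches_long_opt(), user_selected_events() and add_default_event() are
hypothetical. It also assumes the caller has reserved two spare argv slots,
and it does not handle abbreviated long options or bundled short options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if arg is exactly `name` or `name=value`. */
bool matches_long_opt(const char *arg, const char *name)
{
	size_t len = strlen(name);

	if (strncmp(arg, name, len) != 0)
		return false;
	return arg[len] == '\0' || arg[len] == '=';
}

/* Hypothetical helper: did the user already select events explicitly? */
bool user_selected_events(int argc, const char **argv)
{
	for (int i = 0; i < argc; i++) {
		const char *arg = argv[i];

		/* Simplification: treat everything after "--" as the workload. */
		if (strcmp(arg, "--") == 0)
			break;
		/* "-e EVENT" or "-eEVENT" */
		if (strncmp(arg, "-e", 2) == 0)
			return true;
		if (matches_long_opt(arg, "--event") ||
		    matches_long_opt(arg, "--pfm-events"))
			return true;
	}
	return false;
}

/*
 * Hypothetical stand-in for kvm_add_default_arch_event(): append
 * "-e cycles" unless events were given. The caller must guarantee that
 * argv has room for two more entries beyond *argc.
 */
int add_default_event(int *argc, const char **argv)
{
	assert(argc != NULL && argv != NULL);

	if (user_selected_events(*argc, argv))
		return 0;

	argv[(*argc)++] = "-e";
	argv[(*argc)++] = "cycles";
	return 0;
}
```

With argv = { "record", "-a" } and two spare slots, add_default_event()
leaves argc at 4 with "-e" and "cycles" appended; with "--pfm-events" or
"--event=..." already present, argc is unchanged.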