Since KVM gained guest PEBS support on Intel platforms
(https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
the host can no longer sample the guest with PEBS, because all PEBS-related
MSRs are switched to their guest values at vm-entry; for example, the
IA32_DS_AREA MSR is switched to a guest virtual address (GVA). As a result,
"perf kvm record" fails to sample the guest on Intel platforms, because the
"cycles:P" event is used to sample the guest by default, as the case below
shows.

  sudo perf kvm record -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.787 MB perf.data.guest ]

To ensure that guest samples can be captured, use "cycles" instead of
"cycles:P" as the default event for guest sampling on Intel platforms.
With this patch, guest samples are captured successfully.

  sudo perf kvm record -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]

Reported-by: Kevin Tian <kevin.tian@intel.com>
Fixes: 634d36f82517 ("perf record: Just use "cycles:P" as the default event")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
tools/perf/arch/x86/util/kvm-stat.c | 46 +++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c
index 424716518b75..cdb5f3e1b5be 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -3,9 +3,11 @@
#include <string.h>
#include "../../../util/kvm-stat.h"
#include "../../../util/evsel.h"
+#include "../../../util/env.h"
#include <asm/svm.h>
#include <asm/vmx.h>
#include <asm/kvm.h>
+#include <subcmd/parse-options.h>
define_exit_reasons_table(vmx_exit_reasons, VMX_EXIT_REASONS);
define_exit_reasons_table(svm_exit_reasons, SVM_EXIT_REASONS);
@@ -211,3 +213,47 @@ int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid)
return 0;
}
+
+/*
+ * Since KVM gained guest PEBS support on Intel platforms
+ * (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
+ * the host can no longer sample the guest with PEBS, because all PEBS-related
+ * MSRs are switched to their guest values at vm-entry; for example, the
+ * IA32_DS_AREA MSR is switched to a guest virtual address. As a result,
+ * "perf kvm record" fails to sample the guest on Intel platforms when the
+ * default "cycles:P" event is used.
+ *
+ * To avoid this, explicitly use "cycles" instead of "cycles:P" as the
+ * default event for sampling the guest on Intel platforms.
+ */
+int kvm_add_default_arch_event(int *argc, const char **argv)
+{
+	const char **tmp;
+	bool event = false;
+	int i, j = *argc;
+
+	const struct option event_options[] = {
+		OPT_BOOLEAN('e', "event", &event, NULL),
+		OPT_END()
+	};
+
+	if (!x86__is_intel_cpu())
+		return 0;
+
+	tmp = calloc(j + 1, sizeof(char *));
+	if (!tmp)
+		return -ENOMEM;
+
+	for (i = 0; i < j; i++)
+		tmp[i] = argv[i];
+
+	parse_options(j, tmp, event_options, NULL, PARSE_OPT_KEEP_UNKNOWN);
+	if (!event) {
+		argv[j++] = strdup("-e");
+		argv[j++] = strdup("cycles");
+		*argc += 2;
+	}
+
+	free(tmp);
+	return 0;
+}
--
2.34.1
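
kvm_add_default_arch_event() stores the two new strings at argv[*argc] and argv[*argc + 1], so it depends on the caller having reserved at least two spare slots in argv; that caller is not part of this hunk. As a rough illustration only, the sketch below shows one way to avoid that dependency by building a fresh NULL-terminated vector. append_default_event() is a hypothetical helper invented for this example, not a perf function.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: copy argv into a new vector
 * with room for "-e cycles" plus a terminating NULL. The two appended
 * entries are string literals, so the caller frees only the vector itself.
 */
static int append_default_event(int *argc, const char ***argv)
{
	int n = *argc;
	const char **out = calloc(n + 3, sizeof(*out));

	if (!out)
		return -ENOMEM;

	/* Carry over the existing arguments unchanged. */
	if (n > 0)
		memcpy(out, *argv, n * sizeof(*out));

	out[n++] = "-e";
	out[n++] = "cycles";
	/* out[n] is already NULL because calloc() zeroes the allocation. */

	*argv = out;
	*argc = n;
	return 0;
}
```

Returning a new vector costs one allocation, but the size of the original argv no longer matters to the helper.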
On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
> After KVM supports PEBS for guest on Intel platforms
> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
> host loses the capability to sample guest with PEBS since all PEBS related
> MSRs are switched to guest value after vm-entry, like IA32_DS_AREA MSR is
> switched to guest GVA at vm-entry. This would lead to "perf kvm record"
> fails to sample guest on Intel platforms since "cycles:P" event is used to
> sample guest by default as below case shows.
Do you mean we cannot use "cycles:PG" for perf kvm record?
>
> sudo perf kvm record -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
>
> So to ensure guest record can be sampled successfully, use "cycles"
> instead of "cycles:P" to sample guest record by default on Intel
> platforms. With this patch, the guest record can be sampled
> successfully.
>
> sudo perf kvm record -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]
What if the user already gave some events on the command line? I think you
need to check whether "-e" or "--event" (and "--pfm-events" too) is in
argv[] before adding these.
Thanks,
Namhyung
On 8/7/2025 8:08 AM, Namhyung Kim wrote:
> On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
>> After KVM supports PEBS for guest on Intel platforms
>> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
>> host loses the capability to sample guest with PEBS since all PEBS related
>> MSRs are switched to guest value after vm-entry, like IA32_DS_AREA MSR is
>> switched to guest GVA at vm-entry. This would lead to "perf kvm record"
>> fails to sample guest on Intel platforms since "cycles:P" event is used to
>> sample guest by default as below case shows.
> Do you mean we cannot use "cycles:PG" for perf kvm record?
Yes. Here is the output on Intel Sapphire Rapids.

  sudo ./perf record -e cycles:PG -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.801 MB perf.data ]

No guest samples are captured with PEBS; guest samples can be captured
only without PEBS.

  sudo ./perf record -e cycles:G -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.798 MB perf.data (60 samples) ]
>
>> sudo perf kvm record -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
>>
>> So to ensure guest record can be sampled successfully, use "cycles"
>> instead of "cycles:P" to sample guest record by default on Intel
>> platforms. With this patch, the guest record can be sampled
>> successfully.
>>
>> sudo perf kvm record -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]
> What if user already gave some events in the command line? I think you
> need to check if "-e" or "--event" (and "--pfm-events" too) is in the
> argv[] before adding these.
kvm_add_default_arch_event() detects whether the user has already set events
explicitly. If so, it does not add the "cycles" event. Thanks.
>
> Thanks,
> Namhyung
On Thu, Aug 07, 2025 at 11:08:11AM +0800, Mi, Dapeng wrote:
>
> On 8/7/2025 8:08 AM, Namhyung Kim wrote:
> > On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
> >> After KVM supports PEBS for guest on Intel platforms
> >> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
> >> host loses the capability to sample guest with PEBS since all PEBS related
> >> MSRs are switched to guest value after vm-entry, like IA32_DS_AREA MSR is
> >> switched to guest GVA at vm-entry. This would lead to "perf kvm record"
> >> fails to sample guest on Intel platforms since "cycles:P" event is used to
> >> sample guest by default as below case shows.
> > Do you mean we cannot use "cycles:PG" for perf kvm record?
>
> Yes. Here is the output on Intel Sapphire rapids.
>
> sudo ./perf record -e cycles:PG -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.801 MB perf.data ]
>
> No guest records are captured with PEBS, and guest PEBS records can be
> sampled only without PEBS.
>
> sudo ./perf record -e cycles:G -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.798 MB perf.data (60 samples) ]
>
>
> >
> >> sudo perf kvm record -a
> >> ^C[ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
> >>
> >> So to ensure guest record can be sampled successfully, use "cycles"
> >> instead of "cycles:P" to sample guest record by default on Intel
> >> platforms. With this patch, the guest record can be sampled
> >> successfully.
> >>
> >> sudo perf kvm record -a
> >> ^C[ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]
> > What if user already gave some events in the command line? I think you
> > need to check if "-e" or "--event" (and "--pfm-events" too) is in the
> > argv[] before adding these.
>
> kvm_add_default_arch_event() would detect if user already sets events explicitly. If so, it won't add "cycles" event any more. Thanks.

Oh, ok. I can see you called parse_options to check the option.
You'd better check "--pfm-events" as well.

Thanks,
Namhyung
On 8/9/2025 6:10 AM, Namhyung Kim wrote:
> On Thu, Aug 07, 2025 at 11:08:11AM +0800, Mi, Dapeng wrote:
>> On 8/7/2025 8:08 AM, Namhyung Kim wrote:
>>> On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
>>>> After KVM supports PEBS for guest on Intel platforms
>>>> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
>>>> host loses the capability to sample guest with PEBS since all PEBS related
>>>> MSRs are switched to guest value after vm-entry, like IA32_DS_AREA MSR is
>>>> switched to guest GVA at vm-entry. This would lead to "perf kvm record"
>>>> fails to sample guest on Intel platforms since "cycles:P" event is used to
>>>> sample guest by default as below case shows.
>>> Do you mean we cannot use "cycles:PG" for perf kvm record?
>> Yes. Here is the output on Intel Sapphire rapids.
>>
>> sudo ./perf record -e cycles:PG -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.801 MB perf.data ]
>>
>> No guest records are captured with PEBS, and guest PEBS records can be
>> sampled only without PEBS.
>>
>> sudo ./perf record -e cycles:G -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.798 MB perf.data (60 samples) ]
>>
>>
>>>> sudo perf kvm record -a
>>>> ^C[ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
>>>>
>>>> So to ensure guest record can be sampled successfully, use "cycles"
>>>> instead of "cycles:P" to sample guest record by default on Intel
>>>> platforms. With this patch, the guest record can be sampled
>>>> successfully.
>>>>
>>>> sudo perf kvm record -a
>>>> ^C[ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]
>>> What if user already gave some events in the command line? I think you
>>> need to check if "-e" or "--event" (and "--pfm-events" too) is in the
>>> argv[] before adding these.
>> kvm_add_default_arch_event() would detect if user already sets events explicitly. If so, it won't add "cycles" event any more. Thanks.
> Oh, ok. I can see you called parse_options to check the option.
> You'd better to check "--pfm-events" as well.

Sure. Thanks.

>
> Thanks,
> Namhyung
>
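
The check discussed in this thread (treat "-e", "--event" and "--pfm-events" alike as "the user already chose events") can be illustrated outside perf with a plain argv scan. This is only a minimal sketch under stated assumptions, not the code of any later revision: user_gave_event() is a hypothetical helper, and it ignores cases a full option parser would handle, such as abbreviated long options or -e bundled behind other short flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if argv already selects events explicitly. */
static bool user_gave_event(int argc, const char **argv)
{
	static const char *const long_opts[] = { "--event", "--pfm-events" };

	for (int i = 0; i < argc; i++) {
		const char *arg = argv[i];

		/* Everything after "--" belongs to the profiled command. */
		if (!strcmp(arg, "--"))
			break;

		/* Short form: "-e cycles" or "-ecycles". */
		if (!strncmp(arg, "-e", 2))
			return true;

		/* Long forms: "--event cycles" or "--event=cycles". */
		for (size_t k = 0; k < sizeof(long_opts) / sizeof(long_opts[0]); k++) {
			size_t len = strlen(long_opts[k]);

			if (!strncmp(arg, long_opts[k], len) &&
			    (arg[len] == '\0' || arg[len] == '='))
				return true;
		}
	}

	return false;
}
```

For { "record", "-a" } the helper returns false, so a default event would be appended; for { "record", "--pfm-events", "cycles" } it returns true and the command line is left alone.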