[PATCH 4/5] perf tools kvm: Use "cycles" to sample guest for "kvm record" on Intel

Dapeng Mi posted 5 patches 2 months ago
There is a newer version of this series
Posted by Dapeng Mi 2 months ago
After KVM added guest PEBS support on Intel platforms
(https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
the host can no longer sample the guest with PEBS, because all PEBS-related
MSRs are switched to guest values at vm-entry; for example, IA32_DS_AREA is
switched to a guest virtual address (GVA). As a result, "perf kvm record"
fails to sample the guest on Intel platforms, since the "cycles:P" event is
used to sample the guest by default, as the case below shows.

sudo perf kvm record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.787 MB perf.data.guest ]

To ensure the guest can be sampled successfully, use "cycles" instead of
"cycles:P" as the default event for sampling the guest on Intel
platforms. With this patch, guest samples are captured successfully.

sudo perf kvm record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]

Reported-by: Kevin Tian <kevin.tian@intel.com>
Fixes: 634d36f82517 ("perf record: Just use "cycles:P" as the default event")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/perf/arch/x86/util/kvm-stat.c | 46 +++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c
index 424716518b75..cdb5f3e1b5be 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -3,9 +3,11 @@
 #include <string.h>
 #include "../../../util/kvm-stat.h"
 #include "../../../util/evsel.h"
+#include "../../../util/env.h"
 #include <asm/svm.h>
 #include <asm/vmx.h>
 #include <asm/kvm.h>
+#include <subcmd/parse-options.h>
 
 define_exit_reasons_table(vmx_exit_reasons, VMX_EXIT_REASONS);
 define_exit_reasons_table(svm_exit_reasons, SVM_EXIT_REASONS);
@@ -211,3 +213,47 @@ int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid)
 
 	return 0;
 }
+
+/*
+ * After KVM added guest PEBS support on Intel platforms
+ * (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
+ * the host can no longer sample the guest with PEBS, because all PEBS-related
+ * MSRs are switched to guest values at vm-entry; for example, IA32_DS_AREA
+ * is switched to a guest virtual address (GVA). As a result, "perf kvm
+ * record" fails to sample the guest on Intel platforms, since the "cycles:P"
+ * event is used to sample the guest by default.
+ *
+ * To avoid this, explicitly use "cycles" instead of "cycles:P" as the
+ * default event for sampling the guest on Intel platforms.
+ */
+int kvm_add_default_arch_event(int *argc, const char **argv)
+{
+	const char **tmp;
+	bool event = false;
+	int i, j = *argc;
+
+	const struct option event_options[] = {
+		OPT_BOOLEAN('e', "event", &event, NULL),
+		OPT_END()
+	};
+
+	if (!x86__is_intel_cpu())
+		return 0;
+
+	tmp = calloc(j + 1, sizeof(char *));
+	if (!tmp)
+		return -ENOMEM;
+
+	for (i = 0; i < j; i++)
+		tmp[i] = argv[i];
+
+	parse_options(j, tmp, event_options, NULL, PARSE_OPT_KEEP_UNKNOWN);
+	if (!event) {
+		argv[j++] = strdup("-e");
+		argv[j++] = strdup("cycles");
+		*argc += 2;
+	}
+
+	free(tmp);
+	return 0;
+}
-- 
2.34.1
Re: [PATCH 4/5] perf tools kvm: Use "cycles" to sample guest for "kvm record" on Intel
Posted by Namhyung Kim 1 month, 4 weeks ago
On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
> After KVM added guest PEBS support on Intel platforms
> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
> the host can no longer sample the guest with PEBS, because all PEBS-related
> MSRs are switched to guest values at vm-entry; for example, IA32_DS_AREA is
> switched to a guest virtual address (GVA). As a result, "perf kvm record"
> fails to sample the guest on Intel platforms, since the "cycles:P" event is
> used to sample the guest by default, as the case below shows.

Do you mean we cannot use "cycles:PG" for perf kvm record?

> 
> sudo perf kvm record -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
> 
> To ensure the guest can be sampled successfully, use "cycles" instead of
> "cycles:P" as the default event for sampling the guest on Intel
> platforms. With this patch, guest samples are captured successfully.
> 
> sudo perf kvm record -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]

What if user already gave some events in the command line?  I think you
need to check if "-e" or "--event" (and "--pfm-events" too) is in the
argv[] before adding these.

Thanks,
Namhyung
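
The check suggested above can be sketched as a plain scan over argv. This is only an illustration: the patch itself uses parse_options() for this, and the helper names is_long_opt() and user_gave_event() are hypothetical. The scan deliberately ignores bundled short options (such as "-ae") and abbreviated long options, which a real option parser would also need to handle.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if arg is exactly `name` or `name=value`. */
static bool is_long_opt(const char *arg, const char *name)
{
	size_t len = strlen(name);

	return strncmp(arg, name, len) == 0 &&
	       (arg[len] == '\0' || arg[len] == '=');
}

/* Hypothetical helper: did the user already select events? */
bool user_gave_event(int argc, const char **argv)
{
	for (int i = 0; i < argc; i++) {
		const char *arg = argv[i];

		/* Everything after "--" belongs to the workload. */
		if (strcmp(arg, "--") == 0)
			break;
		/* Matches both "-e ev" and the attached form "-eev". */
		if (strncmp(arg, "-e", 2) == 0)
			return true;
		if (is_long_opt(arg, "--event") ||
		    is_long_opt(arg, "--pfm-events"))
			return true;
	}
	return false;
}
```

A default event would then be appended only when user_gave_event() returns false.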

Re: [PATCH 4/5] perf tools kvm: Use "cycles" to sample guest for "kvm record" on Intel
Posted by Mi, Dapeng 1 month, 4 weeks ago
On 8/7/2025 8:08 AM, Namhyung Kim wrote:
> On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
>> After KVM added guest PEBS support on Intel platforms
>> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
>> the host can no longer sample the guest with PEBS, because all PEBS-related
>> MSRs are switched to guest values at vm-entry; for example, IA32_DS_AREA is
>> switched to a guest virtual address (GVA). As a result, "perf kvm record"
>> fails to sample the guest on Intel platforms, since the "cycles:P" event is
>> used to sample the guest by default, as the case below shows.
> Do you mean we cannot use "cycles:PG" for perf kvm record?

Yes. Here is the output on Intel Sapphire Rapids.

sudo ./perf record -e cycles:PG -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.801 MB perf.data ]

No guest records are captured with PEBS; guest records can be sampled only
without PEBS.

sudo ./perf record -e cycles:G -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.798 MB perf.data (60 samples) ]
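
To make the difference between the two commands concrete, the sketch below models the two modifiers involved: "G" restricts counting to the guest, and "P" requests the highest available precise level, which on Intel means PEBS. The struct attr_model type and both functions are hypothetical simplifications for illustration; they are not the kernel's perf_event_attr or perf's modifier parser, and only the modifiers "G" and "P" are handled.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Simplified stand-in for the event attribute bits discussed here.
 * This is a model for illustration, not the kernel structure.
 */
struct attr_model {
	bool exclude_host;	/* set by the 'G' (guest-only) modifier */
	bool precise_max;	/* set by 'P': use the highest precise level */
};

/* Hypothetical parser for the modifier subset {G, P}. */
int apply_modifiers(const char *mods, struct attr_model *attr)
{
	memset(attr, 0, sizeof(*attr));
	for (; *mods; mods++) {
		switch (*mods) {
		case 'G':
			attr->exclude_host = true;
			break;
		case 'P':
			attr->precise_max = true;
			break;
		default:
			return -1;	/* modifier outside this sketch */
		}
	}
	return 0;
}

/*
 * Workaround modeled on the patch: when only the guest is sampled,
 * do not request a precise (PEBS) level.
 */
void avoid_pebs_for_guest(struct attr_model *attr)
{
	if (attr->exclude_host)
		attr->precise_max = false;
}
```

In this model, "cycles:PG" ends up with both bits set, which is the combination that produced no samples above, while "cycles:G" leaves precise_max clear.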


>
>> sudo perf kvm record -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
>>
>> To ensure the guest can be sampled successfully, use "cycles" instead of
>> "cycles:P" as the default event for sampling the guest on Intel
>> platforms. With this patch, guest samples are captured successfully.
>>
>> sudo perf kvm record -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]
> What if user already gave some events in the command line?  I think you
> need to check if "-e" or "--event" (and "--pfm-events" too) is in the
> argv[] before adding these.

kvm_add_default_arch_event() detects whether the user has already specified events explicitly. If so, it does not add the "cycles" event. Thanks.

Re: [PATCH 4/5] perf tools kvm: Use "cycles" to sample guest for "kvm record" on Intel
Posted by Namhyung Kim 1 month, 3 weeks ago
On Thu, Aug 07, 2025 at 11:08:11AM +0800, Mi, Dapeng wrote:
> 
> On 8/7/2025 8:08 AM, Namhyung Kim wrote:
> > On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
> >> After KVM added guest PEBS support on Intel platforms
> >> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
> >> the host can no longer sample the guest with PEBS, because all PEBS-related
> >> MSRs are switched to guest values at vm-entry; for example, IA32_DS_AREA is
> >> switched to a guest virtual address (GVA). As a result, "perf kvm record"
> >> fails to sample the guest on Intel platforms, since the "cycles:P" event is
> >> used to sample the guest by default, as the case below shows.
> > Do you mean we cannot use "cycles:PG" for perf kvm record?
> 
> Yes. Here is the output on Intel Sapphire Rapids.
> 
> sudo ./perf record -e cycles:PG -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.801 MB perf.data ]
> 
> No guest records are captured with PEBS; guest records can be sampled only
> without PEBS.
> 
> sudo ./perf record -e cycles:G -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.798 MB perf.data (60 samples) ]
> 
> 
> >
> >> sudo perf kvm record -a
> >> ^C[ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
> >>
> >> To ensure the guest can be sampled successfully, use "cycles" instead of
> >> "cycles:P" as the default event for sampling the guest on Intel
> >> platforms. With this patch, guest samples are captured successfully.
> >>
> >> sudo perf kvm record -a
> >> ^C[ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]
> > What if user already gave some events in the command line?  I think you
> > need to check if "-e" or "--event" (and "--pfm-events" too) is in the
> > argv[] before adding these.
> 
> kvm_add_default_arch_event() detects whether the user has already specified events explicitly. If so, it does not add the "cycles" event. Thanks.

Oh, ok.  I can see you called parse_options to check the option.
You'd better check "--pfm-events" as well.

Thanks,
Namhyung
Re: [PATCH 4/5] perf tools kvm: Use "cycles" to sample guest for "kvm record" on Intel
Posted by Mi, Dapeng 1 month, 3 weeks ago
On 8/9/2025 6:10 AM, Namhyung Kim wrote:
> On Thu, Aug 07, 2025 at 11:08:11AM +0800, Mi, Dapeng wrote:
>> On 8/7/2025 8:08 AM, Namhyung Kim wrote:
>>> On Tue, Aug 05, 2025 at 08:46:32AM +0800, Dapeng Mi wrote:
>>>> After KVM added guest PEBS support on Intel platforms
>>>> (https://lore.kernel.org/all/20220411101946.20262-1-likexu@tencent.com/),
>>>> the host can no longer sample the guest with PEBS, because all PEBS-related
>>>> MSRs are switched to guest values at vm-entry; for example, IA32_DS_AREA is
>>>> switched to a guest virtual address (GVA). As a result, "perf kvm record"
>>>> fails to sample the guest on Intel platforms, since the "cycles:P" event is
>>>> used to sample the guest by default, as the case below shows.
>>> Do you mean we cannot use "cycles:PG" for perf kvm record?
>> Yes. Here is the output on Intel Sapphire Rapids.
>>
>> sudo ./perf record -e cycles:PG -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.801 MB perf.data ]
>>
>> No guest records are captured with PEBS; guest records can be sampled only
>> without PEBS.
>>
>> sudo ./perf record -e cycles:G -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.798 MB perf.data (60 samples) ]
>>
>>
>>>> sudo perf kvm record -a
>>>> ^C[ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.787 MB perf.data.guest ]
>>>>
>>>> To ensure the guest can be sampled successfully, use "cycles" instead of
>>>> "cycles:P" as the default event for sampling the guest on Intel
>>>> platforms. With this patch, guest samples are captured successfully.
>>>>
>>>> sudo perf kvm record -a
>>>> ^C[ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.783 MB perf.data.guest (23 samples) ]
>>> What if user already gave some events in the command line?  I think you
>>> need to check if "-e" or "--event" (and "--pfm-events" too) is in the
>>> argv[] before adding these.
>> kvm_add_default_arch_event() detects whether the user has already specified events explicitly. If so, it does not add the "cycles" event. Thanks.
> Oh, ok.  I can see you called parse_options to check the option.
> You'd better check "--pfm-events" as well.

Sure. Thanks.

