arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/debugfs.c | 17 +++++++++++++++++ arch/x86/kvm/x86.c | 1 + 3 files changed, 19 insertions(+)
It's helpful to know whether some other host activity affects a virtual
machine to estimate virtual machine quality of sevice.
The fact of virtual machine affection from the host side can be obtained
by reading "preemption_reported" counter via kvm entries of sysfs, but
the exact vcpu waiting time isn't reported to the host.
This patch adds this reporting.
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/debugfs.c | 17 +++++++++++++++++
arch/x86/kvm/x86.c | 1 +
3 files changed, 19 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a68cb3eba78f..3d4bd3ca83593 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -899,6 +899,7 @@ struct kvm_vcpu_arch {
u64 msr_val;
u64 last_steal;
struct gfn_to_hva_cache cache;
+ u64 steal_total;
} st;
u64 l1_tsc_offset;
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index 999227fc7c665..e67136954c095 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -11,6 +11,7 @@
#include "lapic.h"
#include "mmu.h"
#include "mmu/mmu_internal.h"
+#include "cpuid.h"
static int vcpu_get_timer_advance_ns(void *data, u64 *val)
{
@@ -56,6 +57,19 @@ static int vcpu_get_tsc_scaling_frac_bits(void *data, u64 *val)
DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_frac_fops, vcpu_get_tsc_scaling_frac_bits, NULL, "%llu\n");
+static int vcpu_get_steal_time(void *data, u64 *val)
+{
+ struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
+
+ if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
+ return 1;
+
+ *val = vcpu->arch.st.steal_total;
+ return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(vcpu_steal_time_fops, vcpu_get_steal_time, NULL, "%llu\n");
+
void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
{
debugfs_create_file("guest_mode", 0444, debugfs_dentry, vcpu,
@@ -76,6 +90,9 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
debugfs_dentry, vcpu,
&vcpu_tsc_scaling_frac_fops);
}
+
+ debugfs_create_file("steal-time", 0444, debugfs_dentry, vcpu,
+ &vcpu_steal_time_fops);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 70219e4069874..ca5f21b930d4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3735,6 +3735,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
steal += current->sched_info.run_delay -
vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
+ vcpu->arch.st.steal_total = steal;
unsafe_put_user(steal, &st->steal, out);
version += 1;
--
2.34.1
On Tue, Sep 17, 2024, Denis Plotnikov wrote: > It's helpful to know whether some other host activity affects a virtual > machine to estimate virtual machine quality of sevice. > The fact of virtual machine affection from the host side can be obtained > by reading "preemption_reported" counter via kvm entries of sysfs, but > the exact vcpu waiting time isn't reported to the host. > This patch adds this reporting. > > Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> > --- > arch/x86/include/asm/kvm_host.h | 1 + > arch/x86/kvm/debugfs.c | 17 +++++++++++++++++ Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be present as KVM (correctly) ignores debugfs setup errors. Using debugfs is also unnecessary. The total steal time is available in guest memory, and by definition that memory is shared with the host. To query total steal time from userspace, use MSR filtering to trap writes (and reflect writes back into KVM) so that the GPA of the steal time structure is known, and then simply read the actual steal time from guest memory as needed.
On 9/22/24 11:04, Sean Christopherson wrote: > On Tue, Sep 17, 2024, Denis Plotnikov wrote: >> It's helpful to know whether some other host activity affects a virtual >> machine to estimate virtual machine quality of sevice. >> The fact of virtual machine affection from the host side can be obtained >> by reading "preemption_reported" counter via kvm entries of sysfs, but >> the exact vcpu waiting time isn't reported to the host. >> This patch adds this reporting. >> >> Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> >> --- >> arch/x86/include/asm/kvm_host.h | 1 + >> arch/x86/kvm/debugfs.c | 17 +++++++++++++++++ > > Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be > present as KVM (correctly) ignores debugfs setup errors. > > Using debugfs is also unnecessary. The total steal time is available in guest > memory, and by definition that memory is shared with the host. To query total > steal time from userspace, use MSR filtering to trap writes (and reflect writes > back into KVM) so that the GPA of the steal time structure is known, and then > simply read the actual steal time from guest memory as needed. Thanks for the reply! Just to clarify, by reading the actual steal time from guest memory do you mean by using some kind of new vcpu ioctl? Best, Denis
On Mon, Sep 23, 2024, Denis Plotnikov wrote: > On 9/22/24 11:04, Sean Christopherson wrote: > > On Tue, Sep 17, 2024, Denis Plotnikov wrote: > > > It's helpful to know whether some other host activity affects a virtual > > > machine to estimate virtual machine quality of sevice. > > > The fact of virtual machine affection from the host side can be obtained > > > by reading "preemption_reported" counter via kvm entries of sysfs, but > > > the exact vcpu waiting time isn't reported to the host. > > > This patch adds this reporting. > > > > > > Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> > > > --- > > > arch/x86/include/asm/kvm_host.h | 1 + > > > arch/x86/kvm/debugfs.c | 17 +++++++++++++++++ > > > > Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be > > present as KVM (correctly) ignores debugfs setup errors. > > > > Using debugfs is also unnecessary. The total steal time is available in guest > > memory, and by definition that memory is shared with the host. To query total > > steal time from userspace, use MSR filtering to trap writes (and reflect writes > > back into KVM) so that the GPA of the steal time structure is known, and then > > simply read the actual steal time from guest memory as needed. > Thanks for the reply! > Just to clarify, by reading the actual steal time from guest memory do you > mean by using some kind of new vcpu ioctl? No, I mean by using the host userspace VMA to read the memory.
On 9/23/24 14:46, Sean Christopherson wrote: > On Mon, Sep 23, 2024, Denis Plotnikov wrote: >> On 9/22/24 11:04, Sean Christopherson wrote: >>> On Tue, Sep 17, 2024, Denis Plotnikov wrote: >>>> It's helpful to know whether some other host activity affects a virtual >>>> machine to estimate virtual machine quality of sevice. >>>> The fact of virtual machine affection from the host side can be obtained >>>> by reading "preemption_reported" counter via kvm entries of sysfs, but >>>> the exact vcpu waiting time isn't reported to the host. >>>> This patch adds this reporting. >>>> >>>> Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> >>>> --- >>>> arch/x86/include/asm/kvm_host.h | 1 + >>>> arch/x86/kvm/debugfs.c | 17 +++++++++++++++++ >>> >>> Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be >>> present as KVM (correctly) ignores debugfs setup errors. >>> >>> Using debugfs is also unnecessary. The total steal time is available in guest >>> memory, and by definition that memory is shared with the host. To query total >>> steal time from userspace, use MSR filtering to trap writes (and reflect writes >>> back into KVM) so that the GPA of the steal time structure is known, and then >>> simply read the actual steal time from guest memory as needed. >> Thanks for the reply! >> Just to clarify, by reading the actual steal time from guest memory do you >> mean by using some kind of new vcpu ioctl? > > No, I mean by using the host userspace VMA to read the memory. Oh, I think I got your idea. You mean using KVM_CAP_X86_MSR_FILTER which... "In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to trap and emulate MSRs ..." And then having guest's steal time struct valid address read the value from userspace VMM like qemu directly. Thanks for the answers! Best, Denis
On Mon, Sep 30, 2024, Denis Plotnikov wrote: > > > On 9/23/24 14:46, Sean Christopherson wrote: > > On Mon, Sep 23, 2024, Denis Plotnikov wrote: > > > On 9/22/24 11:04, Sean Christopherson wrote: > > > > On Tue, Sep 17, 2024, Denis Plotnikov wrote: > > > > > It's helpful to know whether some other host activity affects a virtual > > > > > machine to estimate virtual machine quality of sevice. > > > > > The fact of virtual machine affection from the host side can be obtained > > > > > by reading "preemption_reported" counter via kvm entries of sysfs, but > > > > > the exact vcpu waiting time isn't reported to the host. > > > > > This patch adds this reporting. > > > > > > > > > > Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> > > > > > --- > > > > > arch/x86/include/asm/kvm_host.h | 1 + > > > > > arch/x86/kvm/debugfs.c | 17 +++++++++++++++++ > > > > > > > > Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be > > > > present as KVM (correctly) ignores debugfs setup errors. > > > > > > > > Using debugfs is also unnecessary. The total steal time is available in guest > > > > memory, and by definition that memory is shared with the host. To query total > > > > steal time from userspace, use MSR filtering to trap writes (and reflect writes > > > > back into KVM) so that the GPA of the steal time structure is known, and then > > > > simply read the actual steal time from guest memory as needed. > > > Thanks for the reply! > > > Just to clarify, by reading the actual steal time from guest memory do you > > > mean by using some kind of new vcpu ioctl? > > > > No, I mean by using the host userspace VMA to read the memory. > > Oh, I think I got your idea. You mean > using KVM_CAP_X86_MSR_FILTER which... > > "In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to > trap and emulate MSRs ..." > > And then having guest's steal time struct valid address read the value from > userspace VMM like qemu directly. Yep, exactly!
> On 9/30/24 18:44, Sean Christopherson wrote: >>> No, I mean by using the host userspace VMA to read the memory. >> >> Oh, I think I got your idea. You mean >> using KVM_CAP_X86_MSR_FILTER which... >> >> "In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to >> trap and emulate MSRs ..." >> >> And then having guest's steal time struct valid address read the value from >> userspace VMM like qemu directly. > > Yep, exactly! By the way, what if we add "steal time" as a kvm statistics item? Why I think it's a good idea? * it is available via standard KVM_GET_STATS_FD * it doesn't introduce any overhead * it is quite easy to add with just three lines of code --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1596,6 +1596,7 @@ struct kvm_vcpu_stat { u64 preemption_other; u64 guest_mode; u64 notify_window_exits; + u64 steal_time; }; struct x86_instruction_info; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83fe0a78146fc..cd771aef1558a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -291,6 +291,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = { STATS_DESC_COUNTER(VCPU, preemption_other), STATS_DESC_IBOOLEAN(VCPU, guest_mode), STATS_DESC_COUNTER(VCPU, notify_window_exits), + STATS_DESC_TIME_NSEC(VCPU, steal_time), }; const struct kvm_stats_header kvm_vcpu_stats_header = { @@ -3763,6 +3764,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu) version += 1; unsafe_put_user(version, &st->version, out); + vcpu->stat.steal_time = steal; The disadvantage of this approach is that it adds some kind of data duplication but it doesn't seem to be a problem - using shadowing and caching are common practices. My concern about intercepting steal time MSR in user space is overcomplication - we need to add significant amount of userspace code to achieve what we can get in much easier and, in my opinion, cleaner way. I think it's a cleaner way because every userspace app (like QEMU) will get steal time without any modification via means provided by kvm. For example, QEMU will be able to get steal time via qmp with "query-stats" command which returns every statistics item provided by KVM_GET_STATS_FD.
Ping! In the last message I suggested another approach to get steal time statistics. It would be nice to get some opinion according to that! Thanks! On 11/5/24 15:43, Denis Plotnikov wrote: > > > On 9/30/24 18:44, Sean Christopherson wrote: >>>> No, I mean by using the host userspace VMA to read the memory. >>> >>> Oh, I think I got your idea. You mean >>> using KVM_CAP_X86_MSR_FILTER which... >>> >>> "In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user >>> space to >>> trap and emulate MSRs ..." >>> >>> And then having guest's steal time struct valid address read the >>> value from >>> userspace VMM like qemu directly. >> >> Yep, exactly! > > By the way, what if we add "steal time" as a kvm statistics item? > > Why I think it's a good idea? > * it is available via standard KVM_GET_STATS_FD > * it doesn't introduce any overhead > * it is quite easy to add with just three lines of code > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1596,6 +1596,7 @@ struct kvm_vcpu_stat { > u64 preemption_other; > u64 guest_mode; > u64 notify_window_exits; > + u64 steal_time; > }; > > struct x86_instruction_info; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 83fe0a78146fc..cd771aef1558a 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -291,6 +291,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = { > STATS_DESC_COUNTER(VCPU, preemption_other), > STATS_DESC_IBOOLEAN(VCPU, guest_mode), > STATS_DESC_COUNTER(VCPU, notify_window_exits), > + STATS_DESC_TIME_NSEC(VCPU, steal_time), > }; > > const struct kvm_stats_header kvm_vcpu_stats_header = { > @@ -3763,6 +3764,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu) > version += 1; > unsafe_put_user(version, &st->version, out); > > + vcpu->stat.steal_time = steal; > > The disadvantage of this approach is that it adds some kind of data > duplication but it doesn't seem to be a problem - using shadowing and > caching are common practices. > > My concern about intercepting steal time MSR in user space is > overcomplication - we need to add significant amount of userspace code > to achieve what we can get in much easier and, in my opinion, cleaner > way. I think it's a cleaner way because every userspace app (like QEMU) > will get steal time without any modification via means provided by kvm. > For example, QEMU will be able to get steal time via qmp with > "query-stats" command which returns every statistics item provided by > KVM_GET_STATS_FD.
© 2016 - 2024 Red Hat, Inc.