[RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative

David Woodhouse posted 21 patches 1 year, 6 months ago
[RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative
Posted by David Woodhouse 1 year, 6 months ago
From: David Woodhouse <dwmw@amazon.co.uk>

When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
time spent in the previous runstate is accounted. This is based on the
delta between the current KVM clock time, and the previous value stored
in vcpu->arch.xen.runstate_entry_time.

If the KVM clock goes backwards, that delta will be negative. Or, since
it's an unsigned 64-bit integer, very *large*. Linux guests deal with
that particularly badly, reporting 100% steal time for ever more (well,
for *centuries* at least, until the delta has been consumed).

So when a negative delta is detected, just refrain from updating the
runstates until the KVM clock catches up with runstate_entry_time again.

The userspace APIs for setting the runstate times do not allow them to
be set past the current KVM clock, but userspace can still adjust the
KVM clock *after* setting the runstate times, which would cause this
situation to occur.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/xen.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 014048c22652..3d4111de4472 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -538,24 +538,34 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
 {
 	struct kvm_vcpu_xen *vx = &v->arch.xen;
 	u64 now = get_kvmclock_ns(v->kvm);
-	u64 delta_ns = now - vx->runstate_entry_time;
 	u64 run_delay = current->sched_info.run_delay;
+	s64 delta_ns = now - vx->runstate_entry_time;
+	s64 steal_ns = run_delay - vx->last_steal;
 
 	if (unlikely(!vx->runstate_entry_time))
 		vx->current_runstate = RUNSTATE_offline;
 
+	vx->last_steal = run_delay;
+
+	/*
+	 * If KVM clock time went backwards, stop updating until it
+	 * catches up (or the runstates are reset by userspace).
+	 */
+	if (delta_ns < 0)
+		return;
+
 	/*
 	 * Time waiting for the scheduler isn't "stolen" if the
 	 * vCPU wasn't running anyway.
 	 */
-	if (vx->current_runstate == RUNSTATE_running) {
-		u64 steal_ns = run_delay - vx->last_steal;
+	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
+		if (steal_ns > delta_ns)
+			steal_ns = delta_ns;
 
 		delta_ns -= steal_ns;
 
 		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
 	}
-	vx->last_steal = run_delay;
 
 	vx->runstate_times[vx->current_runstate] += delta_ns;
 	vx->current_runstate = state;
-- 
2.44.0
Re: [RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative
Posted by Sean Christopherson 1 year, 3 months ago
On Wed, May 22, 2024, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
> time spent in the previous runstate is accounted. This is based on the
> delta between the current KVM clock time, and the previous value stored
> in vcpu->arch.xen.runstate_entry_time.
> 
> If the KVM clock goes backwards, that delta will be negative. Or, since
> it's an unsigned 64-bit integer, very *large*. Linux guests deal with
> that particularly badly, reporting 100% steal time for ever more (well,
> for *centuries* at least, until the delta has been consumed).
> 
> So when a negative delta is detected, just refrain from updating the
> runstates until the KVM clock catches up with runstate_entry_time again.
> 
> The userspace APIs for setting the runstate times do not allow them to
> be set past the current KVM clock, but userspace can still adjust the
> KVM clock *after* setting the runstate times, which would cause this
> situation to occur.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/xen.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 014048c22652..3d4111de4472 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -538,24 +538,34 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
>  {
>  	struct kvm_vcpu_xen *vx = &v->arch.xen;
>  	u64 now = get_kvmclock_ns(v->kvm);
> -	u64 delta_ns = now - vx->runstate_entry_time;
>  	u64 run_delay = current->sched_info.run_delay;
> +	s64 delta_ns = now - vx->runstate_entry_time;
> +	s64 steal_ns = run_delay - vx->last_steal;
>  
>  	if (unlikely(!vx->runstate_entry_time))
>  		vx->current_runstate = RUNSTATE_offline;
>  
> +	vx->last_steal = run_delay;
> +
> +	/*
> +	 * If KVM clock time went backwards, stop updating until it
> +	 * catches up (or the runstates are reset by userspace).
> +	 */

I take it this is a legitimate scenario where userpace sets KVM clock and then
the runstates, and KVM needs to lend a hand because userspace can't do those two
things atomically?

> +	if (delta_ns < 0)
> +		return;
> +
>  	/*
>  	 * Time waiting for the scheduler isn't "stolen" if the
>  	 * vCPU wasn't running anyway.
>  	 */
> -	if (vx->current_runstate == RUNSTATE_running) {
> -		u64 steal_ns = run_delay - vx->last_steal;
> +	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
> +		if (steal_ns > delta_ns)
> +			steal_ns = delta_ns;
>  
>  		delta_ns -= steal_ns;
>  
>  		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
>  	}
> -	vx->last_steal = run_delay;
>  
>  	vx->runstate_times[vx->current_runstate] += delta_ns;
>  	vx->current_runstate = state;
> -- 
> 2.44.0
>
Re: [RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative
Posted by David Woodhouse 1 year, 3 months ago
On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > +       vx->last_steal = run_delay;
> > +
> > +       /*
> > +        * If KVM clock time went backwards, stop updating until it
> > +        * catches up (or the runstates are reset by userspace).
> > +        */
> 
> I take it this is a legitimate scenario where userpace sets KVM clock and then
> the runstates, and KVM needs to lend a hand because userspace can't do those two
> things atomically?

Indeed. Will update the comment to make that more obvious.

Thanks for the rest of the review on this series. I'll go through in
detail and update it, hopefully this week.
Re: [RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative
Posted by Steven Rostedt 1 year, 3 months ago
On Tue, 20 Aug 2024 11:22:31 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > > +       vx->last_steal = run_delay;
> > > +
> > > +       /*
> > > +        * If KVM clock time went backwards, stop updating until it
> > > +        * catches up (or the runstates are reset by userspace).
> > > +        */  
> > 
> > I take it this is a legitimate scenario where userpace sets KVM clock and then
> > the runstates, and KVM needs to lend a hand because userspace can't do those two
> > things atomically?  
> 
> Indeed. Will update the comment to make that more obvious.
> 
> Thanks for the rest of the review on this series. I'll go through in
> detail and update it, hopefully this week.

Hmm, is this related at all to this:

  https://lore.kernel.org/all/20240806111157.1336532-1-suleiman@google.com/

-- Steve
Re: [RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative
Posted by David Woodhouse 1 year, 3 months ago
On Tue, 2024-08-20 at 11:08 -0400, Steven Rostedt wrote:
> On Tue, 20 Aug 2024 11:22:31 +0100
> David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > On Thu, 2024-08-15 at 21:39 -0700, Sean Christopherson wrote:
> > > > +       vx->last_steal = run_delay;
> > > > +
> > > > +       /*
> > > > +        * If KVM clock time went backwards, stop updating
> > > > until it
> > > > +        * catches up (or the runstates are reset by
> > > > userspace).
> > > > +        */  
> > > 
> > > I take it this is a legitimate scenario where userpace sets KVM
> > > clock and then
> > > the runstates, and KVM needs to lend a hand because userspace
> > > can't do those two
> > > things atomically?  
> > 
> > Indeed. Will update the comment to make that more obvious.
> > 
> > Thanks for the rest of the review on this series. I'll go through
> > in
> > detail and update it, hopefully this week.
> 
> Hmm, is this related at all to this:
> 
>  
> https://lore.kernel.org/all/20240806111157.1336532-1-suleiman@google.com/

No, but patch 21 in this same series is.
Re: [RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative
Posted by Paul Durrant 1 year, 6 months ago
On 22/05/2024 01:17, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
> time spent in the previous runstate is accounted. This is based on the
> delta between the current KVM clock time, and the previous value stored
> in vcpu->arch.xen.runstate_entry_time.
> 
> If the KVM clock goes backwards, that delta will be negative. Or, since
> it's an unsigned 64-bit integer, very *large*. Linux guests deal with
> that particularly badly, reporting 100% steal time for ever more (well,
> for *centuries* at least, until the delta has been consumed).
> 
> So when a negative delta is detected, just refrain from updating the
> runstates until the KVM clock catches up with runstate_entry_time again.
> 
> The userspace APIs for setting the runstate times do not allow them to
> be set past the current KVM clock, but userspace can still adjust the
> KVM clock *after* setting the runstate times, which would cause this
> situation to occur.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>   arch/x86/kvm/xen.c | 18 ++++++++++++++----
>   1 file changed, 14 insertions(+), 4 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>