[PATCH v2] xen/vcpu: ignore VCPU_SSHOTTMR_future

Roger Pau Monne posted 1 patch 1 year ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20230419114556.34856-1-roger.pau@citrix.com
There is a newer version of this series
CHANGELOG.md              |  2 ++
xen/common/domain.c       | 13 ++++++++++---
xen/include/public/vcpu.h |  2 +-
3 files changed, 13 insertions(+), 4 deletions(-)
[PATCH v2] xen/vcpu: ignore VCPU_SSHOTTMR_future
Posted by Roger Pau Monne 1 year ago
The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
When the hypervisor returns -ENOTIME (timeout in the past) Linux keeps
retrying to setup the timer with a higher timeout instead of
self-injecting a timer interrupt.

On boxes without any hardware assistance for logdirty we have seen HVM
Linux guests < 4.7 with 32vCPUs give up trying to setup the timer when
logdirty is enabled:

CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 506250 nsec
CE: xen increased min_delta_ns to 759375 nsec
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
Freezing user space processes ...
INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
Task dump for CPU 14:
swapper/14      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14

Thus leading to CPU stalls and a broken system as a result.

Workaround this bogus usage by ignoring the VCPU_SSHOTTMR_future in
the hypervisor.  Old Linux versions are the only ones known to have
(wrongly) attempted to use the flag, and ignoring it is compatible
with the behavior expected by any guests setting that flag.

Note the usage of the flag has been removed from Linux by commit:

c06b6d70feb3 xen/x86: don't lose event interrupts

Which landed in Linux 4.7.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Just ignore the flag, as there's no ABI breakage.
---
 CHANGELOG.md              |  2 ++
 xen/common/domain.c       | 13 ++++++++++---
 xen/include/public/vcpu.h |  2 +-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5dbf8b06d72c..ffe009af2dc8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 ### Changed
  - Repurpose command line gnttab_max_{maptrack_,}frames options so they don't
    cap toolstack provided values.
+ - Ignore VCPU_SSHOTTMR_future VCPUOP_set_singleshot_timer flag. The only
+   known user doesn't use it properly, leading to in-guest breakage.
 
 ### Added
  - On x86, support for features new in Intel Sapphire Rapids CPUs:
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 626debbae095..6a440590fe2a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1762,9 +1762,16 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&set, arg, 1) )
             return -EFAULT;
 
-        if ( (set.flags & VCPU_SSHOTTMR_future) &&
-             (set.timeout_abs_ns < NOW()) )
-            return -ETIME;
+        if ( set.timeout_abs_ns < NOW() )
+        {
+            /*
+             * Simplify the logic if the timeout has already expired and just
+             * inject the event.
+             */
+            stop_timer(&v->singleshot_timer);
+            send_timer_event(v);
+            break;
+        }
 
         migrate_timer(&v->singleshot_timer, smp_processor_id());
         set_timer(&v->singleshot_timer, set.timeout_abs_ns);
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 81a3b3a7438c..30b5291cd447 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -150,7 +150,7 @@ typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
 
 /* Flags to VCPUOP_set_singleshot_timer. */
- /* Require the timeout to be in the future (return -ETIME if it's passed). */
+ /* Ignored. */
 #define _VCPU_SSHOTTMR_future (0)
 #define VCPU_SSHOTTMR_future  (1U << _VCPU_SSHOTTMR_future)
 
-- 
2.40.0


Re: [PATCH v2] xen/vcpu: ignore VCPU_SSHOTTMR_future
Posted by Jan Beulich 1 year ago
On 19.04.2023 13:45, Roger Pau Monne wrote:
> The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
> When the hypervisor returns -ENOTIME (timeout in the past) Linux keeps

Nit: -ETIME

> retrying to setup the timer with a higher timeout instead of
> self-injecting a timer interrupt.
>[...]
> --- a/CHANGELOG.md
> +++ b/CHANGELOG.md
> @@ -9,6 +9,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
>  ### Changed
>   - Repurpose command line gnttab_max_{maptrack_,}frames options so they don't
>     cap toolstack provided values.
> + - Ignore VCPU_SSHOTTMR_future VCPUOP_set_singleshot_timer flag. The only
> +   known user doesn't use it properly, leading to in-guest breakage.

Might this read a little better as

 - Ignore VCPUOP_set_singleshot_timer's VCPU_SSHOTTMR_future flag. The only
   known user doesn't use it properly, leading to in-guest breakage.

> --- a/xen/include/public/vcpu.h
> +++ b/xen/include/public/vcpu.h
> @@ -150,7 +150,7 @@ typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
>  DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
>  
>  /* Flags to VCPUOP_set_singleshot_timer. */
> - /* Require the timeout to be in the future (return -ETIME if it's passed). */
> + /* Ignored. */

I think this could do with something like "as of Xen 4.18", as the public
header shouldn't be tied to a specific version (and then perhaps also
retaining the original text). Arguably mentioning a specific version may be
a little odd in case we'd consider backporting this, but something would
imo better be said.

All this said - while I'm likely to ack the final patch, I would still feel
a little uneasy doing so.

Jan
Re: [PATCH v2] xen/vcpu: ignore VCPU_SSHOTTMR_future
Posted by Roger Pau Monné 1 year ago
On Wed, Apr 19, 2023 at 02:14:44PM +0200, Jan Beulich wrote:
> On 19.04.2023 13:45, Roger Pau Monne wrote:
> > The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
> > When the hypervisor returns -ENOTIME (timeout in the past) Linux keeps
> 
> Nit: -ETIME

Oh, thanks.

> 
> > retrying to setup the timer with a higher timeout instead of
> > self-injecting a timer interrupt.
> >[...]
> > --- a/CHANGELOG.md
> > +++ b/CHANGELOG.md
> > @@ -9,6 +9,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
> >  ### Changed
> >   - Repurpose command line gnttab_max_{maptrack_,}frames options so they don't
> >     cap toolstack provided values.
> > + - Ignore VCPU_SSHOTTMR_future VCPUOP_set_singleshot_timer flag. The only
> > +   known user doesn't use it properly, leading to in-guest breakage.
> 
> Might this read a little better as
> 
>  - Ignore VCPUOP_set_singleshot_timer's VCPU_SSHOTTMR_future flag. The only
>    known user doesn't use it properly, leading to in-guest breakage.

Sure.

> > --- a/xen/include/public/vcpu.h
> > +++ b/xen/include/public/vcpu.h
> > @@ -150,7 +150,7 @@ typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
> >  DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
> >  
> >  /* Flags to VCPUOP_set_singleshot_timer. */
> > - /* Require the timeout to be in the future (return -ETIME if it's passed). */
> > + /* Ignored. */
> 
> I think this could do with something like "as of Xen 4.18", as the public
> header shouldn't be tied to a specific version (and then perhaps also
> retaining the original text). Arguably mentioning a specific version may be
> a little odd in case we'd consider backporting this, but something would
> imo better be said.

Hm, at least for XenServer we will backport this to 4.13, other
vendors might backport to different versions, at which point I'm not
sure the comment is very helpful.  It can be misleading because it
might seem to infer that checking the Xen version will tell you
whether the flag is ignored or not.

What about:

/*
 * Request the timeout to be in the future (return -ETIME if it's passed)
 * but can be ignored by the hypervisor.
 */

> All this said - while I'm likely to ack the final patch, I would still feel
> a little uneasy doing so.

Thanks, Roger.
Re: [PATCH v2] xen/vcpu: ignore VCPU_SSHOTTMR_future
Posted by Jan Beulich 1 year ago
On 19.04.2023 15:22, Roger Pau Monné wrote:
> On Wed, Apr 19, 2023 at 02:14:44PM +0200, Jan Beulich wrote:
>> On 19.04.2023 13:45, Roger Pau Monne wrote:
>>> --- a/xen/include/public/vcpu.h
>>> +++ b/xen/include/public/vcpu.h
>>> @@ -150,7 +150,7 @@ typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
>>>  DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
>>>  
>>>  /* Flags to VCPUOP_set_singleshot_timer. */
>>> - /* Require the timeout to be in the future (return -ETIME if it's passed). */
>>> + /* Ignored. */
>>
>> I think this could do with something like "as of Xen 4.18", as the public
>> header shouldn't be tied to a specific version (and then perhaps also
>> retaining the original text). Arguably mentioning a specific version may be
>> a little odd in case we'd consider backporting this, but something would
>> imo better be said.
> 
> Hm, at least for XenServer we will backport this to 4.13, other
> vendors might backport to different versions, at which point I'm not
> sure the comment is very helpful.  It can be misleading because it
> might seem to infer that checking the Xen version will tell you
> whether the flag is ignored or not.
> 
> What about:
> 
> /*
>  * Request the timeout to be in the future (return -ETIME if it's passed)
>  * but can be ignored by the hypervisor.
>  */

Would be fine with me.

Jan

RE: [PATCH v2] xen/vcpu: ignore VCPU_SSHOTTMR_future
Posted by Henry Wang 1 year ago
Hi Roger,

> -----Original Message-----
> From: Roger Pau Monne <roger.pau@citrix.com>
> Subject: [PATCH v2] xen/vcpu: ignore VCPU_SSHOTTMR_future
> 
> The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
> When the hypervisor returns -ENOTIME (timeout in the past) Linux keeps
> retrying to setup the timer with a higher timeout instead of
> self-injecting a timer interrupt.
> 
> On boxes without any hardware assistance for logdirty we have seen HVM
> Linux guests < 4.7 with 32vCPUs give up trying to setup the timer when
> logdirty is enabled:
> 
> CE: Reprogramming failure. Giving up
> CE: xen increased min_delta_ns to 1000000 nsec
> CE: Reprogramming failure. Giving up
> CE: Reprogramming failure. Giving up
> CE: xen increased min_delta_ns to 506250 nsec
> CE: xen increased min_delta_ns to 759375 nsec
> CE: xen increased min_delta_ns to 1000000 nsec
> CE: Reprogramming failure. Giving up
> CE: Reprogramming failure. Giving up
> CE: Reprogramming failure. Giving up
> Freezing user space processes ...
> INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002
> jiffies, g=4006, c=4005, q=14130)
> Task dump for CPU 14:
> swapper/14      R  running task        0     0      1 0x00000000
> Call Trace:
>  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
>  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
>  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
>  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
>  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
>  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002
> jiffies, g=6922, c=6921, q=7013)
> Task dump for CPU 26:
> swapper/26      R  running task        0     0      1 0x00000000
> Call Trace:
>  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
>  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
>  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
>  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
>  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
>  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002
> jiffies, g=8499, c=8498, q=7664)
> Task dump for CPU 26:
> swapper/26      R  running task        0     0      1 0x00000000
> Call Trace:
>  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
>  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
>  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
>  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
>  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
>  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> 
> Thus leading to CPU stalls and a broken system as a result.
> 
> Workaround this bogus usage by ignoring the VCPU_SSHOTTMR_future in
> the hypervisor.  Old Linux versions are the only ones known to have
> (wrongly) attempted to use the flag, and ignoring it is compatible
> with the behavior expected by any guests setting that flag.
> 
> Note the usage of the flag has been removed from Linux by commit:
> 
> c06b6d70feb3 xen/x86: don't lose event interrupts
> 
> Which landed in Linux 4.7.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Henry Wang <Henry.Wang@arm.com> # CHANGELOG

Kind regards,
Henry