[PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field

Roger Pau Monne posted 1 patch 1 year ago
Failed in applying to current master (apply log)
CHANGELOG.md                    |  2 ++
xen/common/compat/domain.c      | 18 +++++-------------
xen/common/domain.c             | 13 ++++++++++---
xen/include/public/vcpu.h       | 12 +++++++-----
xen/include/public/xen-compat.h |  2 +-
xen/include/xlat.lst            |  2 +-
6 files changed, 26 insertions(+), 23 deletions(-)
[PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Roger Pau Monne 1 year ago
The addition of the flags field in the vcpu_set_singleshot_timer in
505ef3ea8687 is an ABI breakage, as the size of the structure is
increased.

Remove such field addition and drop the implementation of the
VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
value just inject the timer interrupt.

Bump the Xen interface version, and keep the flags field and
VCPU_SSHOTTMR_future available for guests using the old interface.

Note the removal of the field from the vcpu_set_singleshot_timer
struct allows removing the compat translation of the struct.

Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 CHANGELOG.md                    |  2 ++
 xen/common/compat/domain.c      | 18 +++++-------------
 xen/common/domain.c             | 13 ++++++++++---
 xen/include/public/vcpu.h       | 12 +++++++-----
 xen/include/public/xen-compat.h |  2 +-
 xen/include/xlat.lst            |  2 +-
 6 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5dbf8b06d72c..b0d9bf4edbda 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 ### Changed
  - Repurpose command line gnttab_max_{maptrack_,}frames options so they don't
    cap toolstack provided values.
+ - Remove flags field from vcpu_set_periodic_timer: its introduction was an
+   ABI breakage.
 
 ### Added
  - On x86, support for features new in Intel Sapphire Rapids CPUs:
diff --git a/xen/common/compat/domain.c b/xen/common/compat/domain.c
index c4254905359e..ffc73a9a1dc9 100644
--- a/xen/common/compat/domain.c
+++ b/xen/common/compat/domain.c
@@ -16,6 +16,10 @@ EMIT_FILE;
 CHECK_vcpu_set_periodic_timer;
 #undef xen_vcpu_set_periodic_timer
 
+#define xen_vcpu_set_singleshot_timer vcpu_set_singleshot_timer
+CHECK_vcpu_set_singleshot_timer;
+#undef xen_vcpu_set_singleshot_timer
+
 #define xen_vcpu_info vcpu_info
 CHECK_SIZE_(struct, vcpu_info);
 #undef xen_vcpu_info
@@ -97,6 +101,7 @@ int compat_common_vcpu_op(int cmd, struct vcpu *v,
     case VCPUOP_is_up:
     case VCPUOP_set_periodic_timer:
     case VCPUOP_stop_periodic_timer:
+    case VCPUOP_set_singleshot_timer:
     case VCPUOP_stop_singleshot_timer:
     case VCPUOP_register_vcpu_info:
         rc = common_vcpu_op(cmd, v, arg);
@@ -116,19 +121,6 @@ int compat_common_vcpu_op(int cmd, struct vcpu *v,
         break;
     }
 
-    case VCPUOP_set_singleshot_timer:
-    {
-        struct compat_vcpu_set_singleshot_timer cmp;
-        struct vcpu_set_singleshot_timer *nat;
-
-        if ( copy_from_guest(&cmp, arg, 1) )
-            return -EFAULT;
-        nat = COMPAT_ARG_XLAT_VIRT_BASE;
-        XLAT_vcpu_set_singleshot_timer(nat, &cmp);
-        rc = do_vcpu_op(cmd, vcpuid, guest_handle_from_ptr(nat, void));
-        break;
-    }
-
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 626debbae095..6a440590fe2a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1762,9 +1762,16 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&set, arg, 1) )
             return -EFAULT;
 
-        if ( (set.flags & VCPU_SSHOTTMR_future) &&
-             (set.timeout_abs_ns < NOW()) )
-            return -ETIME;
+        if ( set.timeout_abs_ns < NOW() )
+        {
+            /*
+             * Simplify the logic if the timeout has already expired and just
+             * inject the event.
+             */
+            stop_timer(&v->singleshot_timer);
+            send_timer_event(v);
+            break;
+        }
 
         migrate_timer(&v->singleshot_timer, smp_processor_id());
         set_timer(&v->singleshot_timer, set.timeout_abs_ns);
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 81a3b3a7438c..6d86a661bd67 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -144,15 +144,17 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_set_periodic_timer_t);
 #define VCPUOP_stop_singleshot_timer 9 /* arg == NULL */
 struct vcpu_set_singleshot_timer {
     uint64_t timeout_abs_ns;   /* Absolute system time value in nanoseconds. */
-    uint32_t flags;            /* VCPU_SSHOTTMR_??? */
+#if __XEN_INTERFACE_VERSION__ < 0x00040f00
+    uint32_t flags;            /* Ignored. */
+#endif
 };
 typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
 
-/* Flags to VCPUOP_set_singleshot_timer. */
- /* Require the timeout to be in the future (return -ETIME if it's passed). */
-#define _VCPU_SSHOTTMR_future (0)
-#define VCPU_SSHOTTMR_future  (1U << _VCPU_SSHOTTMR_future)
+#if __XEN_INTERFACE_VERSION__ < 0x00040f00
+/* Ignored. */
+#define VCPU_SSHOTTMR_future  1
+#endif
 
 /*
  * Register a memory location in the guest address space for the
diff --git a/xen/include/public/xen-compat.h b/xen/include/public/xen-compat.h
index 97fe6984989a..dc43cc9567c0 100644
--- a/xen/include/public/xen-compat.h
+++ b/xen/include/public/xen-compat.h
@@ -10,7 +10,7 @@
 #ifndef __XEN_PUBLIC_XEN_COMPAT_H__
 #define __XEN_PUBLIC_XEN_COMPAT_H__
 
-#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040e00
+#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040f00
 
 #if defined(__XEN__) || defined(__XEN_TOOLS__)
 /* Xen is built with matching headers and implements the latest interface. */
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index d601a8a98421..5463961ce26b 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -168,7 +168,7 @@
 ?	vcpu_register_vcpu_info		vcpu.h
 !	vcpu_runstate_info		vcpu.h
 ?	vcpu_set_periodic_timer		vcpu.h
-!	vcpu_set_singleshot_timer	vcpu.h
+?	vcpu_set_singleshot_timer	vcpu.h
 ?	build_id                        version.h
 ?	compile_info                    version.h
 ?	feature_info                    version.h
-- 
2.40.0


RE: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Henry Wang 1 year ago
Hi Roger,

> -----Original Message-----
> From: Roger Pau Monne <roger.pau@citrix.com>
> Subject: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
> 
> The addition of the flags field in the vcpu_set_singleshot_timer in
> 505ef3ea8687 is an ABI breakage, as the size of the structure is
> increased.
> 
> Remove such field addition and drop the implementation of the
> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> value just inject the timer interrupt.
> 
> Bump the Xen interface version, and keep the flags field and
> VCPU_SSHOTTMR_future available for guests using the old interface.
> 
> Note the removal of the field from the vcpu_set_singleshot_timer
> struct allows removing the compat translation of the struct.
> 
> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

For the CHANGELOG part:
Acked-by: Henry Wang <Henry.Wang@arm.com> # CHANGELOG

Kind regards,
Henry
Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Andrew Cooper 1 year ago
On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
> The addition of the flags field in the vcpu_set_singleshot_timer in
> 505ef3ea8687 is an ABI breakage, as the size of the structure is
> increased.
>
> Remove such field addition and drop the implementation of the
> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> value just inject the timer interrupt.
>
> Bump the Xen interface version, and keep the flags field and
> VCPU_SSHOTTMR_future available for guests using the old interface.
>
> Note the removal of the field from the vcpu_set_singleshot_timer
> struct allows removing the compat translation of the struct.
>
> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

While everything said is true, this isn't the reason to to get rid of
VCPU_SSHOTTMR_future

It 505ef3ea8687 does appear to have been an ABI break, that's
incidental.  It dates from 2007 so whatever we have now is the de-facto
ABI, whether we like it or not.

The reason to delete this is because it is a monumentality and entirely
stupid idea which should have been rejected outright at the time, and
the only guest we can find which uses it also BUG_ON()'s in response to
-ETIME.

It can literally only be used to shoot yourself in the foot with, and
more recent Linuxes have dropped their use of it.

The structure needs to stay it's current shape, and while it's fine to
hide the VCPU_SSHOTTMR_future behind an interface version macro, we do
need to say that it is explicitly ignored.

~Andrew

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Jan Beulich 1 year ago
On 18.04.2023 17:54, Andrew Cooper wrote:
> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
>> The addition of the flags field in the vcpu_set_singleshot_timer in
>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
>> increased.
>>
>> Remove such field addition and drop the implementation of the
>> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
>> value just inject the timer interrupt.
>>
>> Bump the Xen interface version, and keep the flags field and
>> VCPU_SSHOTTMR_future available for guests using the old interface.
>>
>> Note the removal of the field from the vcpu_set_singleshot_timer
>> struct allows removing the compat translation of the struct.
>>
>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> While everything said is true, this isn't the reason to to get rid of
> VCPU_SSHOTTMR_future
> 
> It 505ef3ea8687 does appear to have been an ABI break, that's
> incidental.  It dates from 2007 so whatever we have now is the de-facto
> ABI, whether we like it or not.
> 
> The reason to delete this is because it is a monumentality and entirely
> stupid idea which should have been rejected outright at the time, and
> the only guest we can find which uses it also BUG_ON()'s in response to
> -ETIME.

The instance in Linux (up to 4.6) that I could find was

	BUG_ON(ret != 0 && ret != -ETIME);

i.e. not really dying when getting back -ETIME. (And if there really was
a BUG_ON(ret) somewhere despite setting the flag, it would be a bug there,
not something to "fix" in Xen.) I'm afraid I also disagree on "stupid
idea" as well as ...

> It can literally only be used to shoot yourself in the foot with, and
> more recent Linuxes have dropped their use of it.

... this: If used correctly, it can avoid injection of a pointless event.
Clearly the Linux change dropping use of the flag indicates that its use
wasn't correct (anymore?), likely because of not properly dealing with
-ETIME up the call stack. I'm willing to trust Jeremy / Keir that at the
time of its introduction such a problem didn't exist.

Jan

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Roger Pau Monné 1 year ago
On Wed, Apr 19, 2023 at 09:07:41AM +0200, Jan Beulich wrote:
> On 18.04.2023 17:54, Andrew Cooper wrote:
> > On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
> >> The addition of the flags field in the vcpu_set_singleshot_timer in
> >> 505ef3ea8687 is an ABI breakage, as the size of the structure is
> >> increased.
> >>
> >> Remove such field addition and drop the implementation of the
> >> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> >> value just inject the timer interrupt.
> >>
> >> Bump the Xen interface version, and keep the flags field and
> >> VCPU_SSHOTTMR_future available for guests using the old interface.
> >>
> >> Note the removal of the field from the vcpu_set_singleshot_timer
> >> struct allows removing the compat translation of the struct.
> >>
> >> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> >> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > 
> > While everything said is true, this isn't the reason to to get rid of
> > VCPU_SSHOTTMR_future
> > 
> > It 505ef3ea8687 does appear to have been an ABI break, that's
> > incidental.  It dates from 2007 so whatever we have now is the de-facto
> > ABI, whether we like it or not.
> > 
> > The reason to delete this is because it is a monumentality and entirely
> > stupid idea which should have been rejected outright at the time, and
> > the only guest we can find which uses it also BUG_ON()'s in response to
> > -ETIME.
> 
> The instance in Linux (up to 4.6) that I could find was
> 
> 	BUG_ON(ret != 0 && ret != -ETIME);
> 
> i.e. not really dying when getting back -ETIME. (And if there really was
> a BUG_ON(ret) somewhere despite setting the flag, it would be a bug there,
> not something to "fix" in Xen.) I'm afraid I also disagree on "stupid
> idea" as well as ...

The logic in old Linux is indeed 'fine' in the sense that it doesn't
hit a BUG_ON.

The problem we are seeing is that when logdirty is enabled on a guest
with 32vCPUs (and without any kind of logdirty hardware assistance)
the contention on the p2m lock is so high that by the time
VCPUOP_set_singleshot_timer has copied the hypercall data from HVM
context the provided timeout has already expired, leading to:

[   65.543736] hrtimer: interrupt took 10817714 ns
[   65.514171] CE: xen increased min_delta_ns to 150000 nsec
[   65.514171] CE: xen increased min_delta_ns to 225000 nsec
[   65.514171] CE: xen increased min_delta_ns to 337500 nsec
[   65.566495] CE: xen increased min_delta_ns to 150000 nsec
[   65.514171] CE: xen increased min_delta_ns to 506250 nsec
[   65.573088] CE: xen increased min_delta_ns to 150000 nsec
[   65.572884] CE: xen increased min_delta_ns to 150000 nsec
[   65.514171] CE: xen increased min_delta_ns to 759375 nsec
[   65.638644] CE: xen increased min_delta_ns to 150000 nsec
[   65.566495] CE: xen increased min_delta_ns to 225000 nsec
[   65.514171] CE: xen increased min_delta_ns to 1000000 nsec
[   65.572884] CE: xen increased min_delta_ns to 225000 nsec
[   65.573088] CE: xen increased min_delta_ns to 225000 nsec
[   65.630224] CE: xen increased min_delta_ns to 150000 nsec
...

xenrt1062821 login: [   82.752788] CE: Reprogramming failure. Giving up
[   82.779470] CE: xen increased min_delta_ns to 1000000 nsec
[   82.793075] CE: Reprogramming failure. Giving up
[   82.779470] CE: Reprogramming failure. Giving up
[   82.821864] CE: xen increased min_delta_ns to 506250 nsec
[   82.821864] CE: xen increased min_delta_ns to 759375 nsec
[   82.821864] CE: xen increased min_delta_ns to 1000000 nsec
[   82.821864] CE: Reprogramming failure. Giving up
[   82.856256] CE: Reprogramming failure. Giving up
[   84.566279] CE: Reprogramming failure. Giving up
[   84.649493] Freezing user space processes ... 
[  130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
[  130.604032] Task dump for CPU 14:
[  130.604032] swapper/14      R  running task        0     0      1 0x00000000
[  130.604032] Call Trace:
[  130.604032]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  130.604032]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  130.604032]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  130.604032]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  130.604032]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  130.604032]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[  549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
[  549.655463] Task dump for CPU 26:
[  549.655463] swapper/26      R  running task        0     0      1 0x00000000
[  549.655463] Call Trace:
[  549.655463]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  549.655463]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  549.655463]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  549.655463]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  549.655463]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  549.655463]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[  821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
[  821.888596] Task dump for CPU 26:
[  821.888622] swapper/26      R  running task        0     0      1 0x00000000
[  821.888677] Call Trace:
[  821.888712]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  821.888771]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  821.888818]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  821.888865]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  821.888917]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  821.888966]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14

At some point Linux simply gives up trying to reprogram the timer, and
that obviously lead to CPU stalls.

Ignoring the VCPU_SSHOTTMR_future flag allows the guest to survive, by
not returning ETIME and just injecting the expired interrupt.

Overall I'm not sure how useful VCPU_SSHOTTMR_future is at least when
implemented as done currently in Linux.

Instead of trying to reprogram the timer Linux should do the
equivalent of self-inject a timer interrupt in order to cope with the
fact that the selected timeout has already expired.

> > It can literally only be used to shoot yourself in the foot with, and
> > more recent Linuxes have dropped their use of it.
> 
> ... this: If used correctly, it can avoid injection of a pointless event.
> Clearly the Linux change dropping use of the flag indicates that its use
> wasn't correct (anymore?), likely because of not properly dealing with
> -ETIME up the call stack. I'm willing to trust Jeremy / Keir that at the
> time of its introduction such a problem didn't exist.

I'm afraid Linux didn't implement this properly originally, as it
attempted to reprogram the timer with a bigger timeout rather than
just doing the equivalent of self-injecting a timer interrupt.

Thanks, Roger.

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Jan Beulich 1 year ago
On 19.04.2023 11:02, Roger Pau Monné wrote:
> On Wed, Apr 19, 2023 at 09:07:41AM +0200, Jan Beulich wrote:
>> On 18.04.2023 17:54, Andrew Cooper wrote:
>>> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
>>>> The addition of the flags field in the vcpu_set_singleshot_timer in
>>>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
>>>> increased.
>>>>
>>>> Remove such field addition and drop the implementation of the
>>>> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
>>>> value just inject the timer interrupt.
>>>>
>>>> Bump the Xen interface version, and keep the flags field and
>>>> VCPU_SSHOTTMR_future available for guests using the old interface.
>>>>
>>>> Note the removal of the field from the vcpu_set_singleshot_timer
>>>> struct allows removing the compat translation of the struct.
>>>>
>>>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
>>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>
>>> While everything said is true, this isn't the reason to to get rid of
>>> VCPU_SSHOTTMR_future
>>>
>>> It 505ef3ea8687 does appear to have been an ABI break, that's
>>> incidental.  It dates from 2007 so whatever we have now is the de-facto
>>> ABI, whether we like it or not.
>>>
>>> The reason to delete this is because it is a monumentality and entirely
>>> stupid idea which should have been rejected outright at the time, and
>>> the only guest we can find which uses it also BUG_ON()'s in response to
>>> -ETIME.
>>
>> The instance in Linux (up to 4.6) that I could find was
>>
>> 	BUG_ON(ret != 0 && ret != -ETIME);
>>
>> i.e. not really dying when getting back -ETIME. (And if there really was
>> a BUG_ON(ret) somewhere despite setting the flag, it would be a bug there,
>> not something to "fix" in Xen.) I'm afraid I also disagree on "stupid
>> idea" as well as ...
> 
> The logic in old Linux is indeed 'fine' in the sense that it doesn't
> hit a BUG_ON.
> 
> The problem we are seeing is that when logdirty is enabled on a guest
> with 32vCPUs (and without any kind of logdirty hardware assistance)
> the contention on the p2m lock is so high that by the time
> VCPUOP_set_singleshot_timer has copied the hypercall data from HVM
> context the provided timeout has already expired, leading to:
> 
> [   65.543736] hrtimer: interrupt took 10817714 ns
> [   65.514171] CE: xen increased min_delta_ns to 150000 nsec
> [   65.514171] CE: xen increased min_delta_ns to 225000 nsec
> [   65.514171] CE: xen increased min_delta_ns to 337500 nsec
> [   65.566495] CE: xen increased min_delta_ns to 150000 nsec
> [   65.514171] CE: xen increased min_delta_ns to 506250 nsec
> [   65.573088] CE: xen increased min_delta_ns to 150000 nsec
> [   65.572884] CE: xen increased min_delta_ns to 150000 nsec
> [   65.514171] CE: xen increased min_delta_ns to 759375 nsec
> [   65.638644] CE: xen increased min_delta_ns to 150000 nsec
> [   65.566495] CE: xen increased min_delta_ns to 225000 nsec
> [   65.514171] CE: xen increased min_delta_ns to 1000000 nsec
> [   65.572884] CE: xen increased min_delta_ns to 225000 nsec
> [   65.573088] CE: xen increased min_delta_ns to 225000 nsec
> [   65.630224] CE: xen increased min_delta_ns to 150000 nsec
> ...
> 
> xenrt1062821 login: [   82.752788] CE: Reprogramming failure. Giving up
> [   82.779470] CE: xen increased min_delta_ns to 1000000 nsec
> [   82.793075] CE: Reprogramming failure. Giving up
> [   82.779470] CE: Reprogramming failure. Giving up
> [   82.821864] CE: xen increased min_delta_ns to 506250 nsec
> [   82.821864] CE: xen increased min_delta_ns to 759375 nsec
> [   82.821864] CE: xen increased min_delta_ns to 1000000 nsec
> [   82.821864] CE: Reprogramming failure. Giving up
> [   82.856256] CE: Reprogramming failure. Giving up
> [   84.566279] CE: Reprogramming failure. Giving up
> [   84.649493] Freezing user space processes ... 
> [  130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
> [  130.604032] Task dump for CPU 14:
> [  130.604032] swapper/14      R  running task        0     0      1 0x00000000
> [  130.604032] Call Trace:
> [  130.604032]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [  130.604032]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [  130.604032]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [  130.604032]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [  130.604032]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [  130.604032]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> [  549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
> [  549.655463] Task dump for CPU 26:
> [  549.655463] swapper/26      R  running task        0     0      1 0x00000000
> [  549.655463] Call Trace:
> [  549.655463]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [  549.655463]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [  549.655463]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [  549.655463]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [  549.655463]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [  549.655463]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> [  821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
> [  821.888596] Task dump for CPU 26:
> [  821.888622] swapper/26      R  running task        0     0      1 0x00000000
> [  821.888677] Call Trace:
> [  821.888712]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [  821.888771]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [  821.888818]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [  821.888865]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [  821.888917]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [  821.888966]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> 
> At some point Linux simply gives up trying to reprogram the timer, and
> that obviously lead to CPU stalls.

And that's all with old enough Linux then, I suppose?

> Ignoring the VCPU_SSHOTTMR_future flag allows the guest to survive, by
> not returning ETIME and just injecting the expired interrupt.
> 
> Overall I'm not sure how useful VCPU_SSHOTTMR_future is at least when
> implemented as done currently in Linux.
> 
> Instead of trying to reprogram the timer Linux should do the
> equivalent of self-inject a timer interrupt in order to cope with the
> fact that the selected timeout has already expired.

Indeed - that's what I was expecting would be happening. But I didn't
go check their code ... Yet them getting it wrong still isn't a reason
to ignore the request, at least not unconditionally. OSes could be
getting it right, and they could then benefit from the avoided event.

As to "unconditionally": Introducing a per-guest control is likely too
much overhead for something that, aiui, isn't commonly used (anymore).
But tying this to a command line option might make sense - engaging it
shouldn't (hopefully) lead to misbehavior in guests properly using the
flag, so ought to be okay to enable in a system-wide manner.

I vaguely recall considerations for similar overrides to hypercall
behavior in other areas, so such an option - if made extensible -
might find further uses down the road.

Jan

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Roger Pau Monné 1 year ago
On Wed, Apr 19, 2023 at 11:18:38AM +0200, Jan Beulich wrote:
> On 19.04.2023 11:02, Roger Pau Monné wrote:
> > On Wed, Apr 19, 2023 at 09:07:41AM +0200, Jan Beulich wrote:
> >> On 18.04.2023 17:54, Andrew Cooper wrote:
> >>> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
> >>>> The addition of the flags field in the vcpu_set_singleshot_timer in
> >>>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
> >>>> increased.
> >>>>
> >>>> Remove such field addition and drop the implementation of the
> >>>> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> >>>> value just inject the timer interrupt.
> >>>>
> >>>> Bump the Xen interface version, and keep the flags field and
> >>>> VCPU_SSHOTTMR_future available for guests using the old interface.
> >>>>
> >>>> Note the removal of the field from the vcpu_set_singleshot_timer
> >>>> struct allows removing the compat translation of the struct.
> >>>>
> >>>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> >>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>
> >>> While everything said is true, this isn't the reason to to get rid of
> >>> VCPU_SSHOTTMR_future
> >>>
> >>> It 505ef3ea8687 does appear to have been an ABI break, that's
> >>> incidental.  It dates from 2007 so whatever we have now is the de-facto
> >>> ABI, whether we like it or not.
> >>>
> >>> The reason to delete this is because it is a monumentality and entirely
> >>> stupid idea which should have been rejected outright at the time, and
> >>> the only guest we can find which uses it also BUG_ON()'s in response to
> >>> -ETIME.
> >>
> >> The instance in Linux (up to 4.6) that I could find was
> >>
> >> 	BUG_ON(ret != 0 && ret != -ETIME);
> >>
> >> i.e. not really dying when getting back -ETIME. (And if there really was
> >> a BUG_ON(ret) somewhere despite setting the flag, it would be a bug there,
> >> not something to "fix" in Xen.) I'm afraid I also disagree on "stupid
> >> idea" as well as ...
> > 
> > The logic in old Linux is indeed 'fine' in the sense that it doesn't
> > hit a BUG_ON.
> > 
> > The problem we are seeing is that when logdirty is enabled on a guest
> > with 32vCPUs (and without any kind of logdirty hardware assistance)
> > the contention on the p2m lock is so high that by the time
> > VCPUOP_set_singleshot_timer has copied the hypercall data from HVM
> > context the provided timeout has already expired, leading to:
> > 
> > [   65.543736] hrtimer: interrupt took 10817714 ns
> > [   65.514171] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 225000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 337500 nsec
> > [   65.566495] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 506250 nsec
> > [   65.573088] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.572884] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 759375 nsec
> > [   65.638644] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.566495] CE: xen increased min_delta_ns to 225000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 1000000 nsec
> > [   65.572884] CE: xen increased min_delta_ns to 225000 nsec
> > [   65.573088] CE: xen increased min_delta_ns to 225000 nsec
> > [   65.630224] CE: xen increased min_delta_ns to 150000 nsec
> > ...
> > 
> > xenrt1062821 login: [   82.752788] CE: Reprogramming failure. Giving up
> > [   82.779470] CE: xen increased min_delta_ns to 1000000 nsec
> > [   82.793075] CE: Reprogramming failure. Giving up
> > [   82.779470] CE: Reprogramming failure. Giving up
> > [   82.821864] CE: xen increased min_delta_ns to 506250 nsec
> > [   82.821864] CE: xen increased min_delta_ns to 759375 nsec
> > [   82.821864] CE: xen increased min_delta_ns to 1000000 nsec
> > [   82.821864] CE: Reprogramming failure. Giving up
> > [   82.856256] CE: Reprogramming failure. Giving up
> > [   84.566279] CE: Reprogramming failure. Giving up
> > [   84.649493] Freezing user space processes ... 
> > [  130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
> > [  130.604032] Task dump for CPU 14:
> > [  130.604032] swapper/14      R  running task        0     0      1 0x00000000
> > [  130.604032] Call Trace:
> > [  130.604032]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [  130.604032]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [  130.604032]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [  130.604032]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [  130.604032]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [  130.604032]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > [  549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
> > [  549.655463] Task dump for CPU 26:
> > [  549.655463] swapper/26      R  running task        0     0      1 0x00000000
> > [  549.655463] Call Trace:
> > [  549.655463]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [  549.655463]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [  549.655463]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [  549.655463]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [  549.655463]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [  549.655463]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > [  821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
> > [  821.888596] Task dump for CPU 26:
> > [  821.888622] swapper/26      R  running task        0     0      1 0x00000000
> > [  821.888677] Call Trace:
> > [  821.888712]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [  821.888771]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [  821.888818]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [  821.888865]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [  821.888917]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [  821.888966]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > 
> > At some point Linux simply gives up trying to reprogram the timer, and
> > that obviously lead to CPU stalls.
> 
> And that's all with old enough Linux then, I suppose?

That's Linux 3.10.

> > Ignoring the VCPU_SSHOTTMR_future flag allows the guest to survive, by
> > not returning ETIME and just injecting the expired interrupt.
> > 
> > Overall I'm not sure how useful VCPU_SSHOTTMR_future is at least when
> > implemented as done currently in Linux.
> > 
> > Instead of trying to reprogram the timer Linux should do the
> > equivalent of self-inject a timer interrupt in order to cope with the
> > fact that the selected timeout has already expired.
> 
> Indeed - that's what I was expecting would be happening. But I didn't
> go check their code ... Yet them getting it wrong still isn't a reason
> to ignore the request, at least not unconditionally. OSes could be
> getting it right, and they could then benefit from the avoided event.

Well, the reason to ignore would be because the introduction of the
flags field and the VCPU_SSHOTTMR_future option did break the ABI.

If we care about that behavior we should introduce a new hypercall,
either that behaves in such way, or that has a flags field in order to
implement it.

> As to "unconditionally": Introducing a per-guest control is likely too
> much overhead for something that, aiui, isn't commonly used (anymore).

No, I don't think any guest I've looked at (Linux, NetBSD, FreeBSD)
use the VCPU_SSHOTTMR_future flag.

> But tying this to a command line option might make sense - engaging it
> shouldn't (hopefully) lead to misbehavior in guests properly using the
> flag, so ought to be okay to enable in a system-wide manner.

I personally don't think we should go to those lengths in order to
keep this behavior, because ignoring VCPU_SSHOTTMR_future (and thus
never returning -ENOTIME) is compatible with the current
implementation.  The Linux implementation shows that even
it's current/past user didn't know how the flag should be used anyway.

As said above, if there's a willingness to have this behavior (which
based on the current public implementations it seems it's not) it can
always be implemented as a separate hypercall.

Thanks, Roger.

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Roger Pau Monné 1 year ago
On Tue, Apr 18, 2023 at 04:54:49PM +0100, Andrew Cooper wrote:
> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
> > The addition of the flags field in the vcpu_set_singleshot_timer in
> > 505ef3ea8687 is an ABI breakage, as the size of the structure is
> > increased.
> >
> > Remove such field addition and drop the implementation of the
> > VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> > value just inject the timer interrupt.
> >
> > Bump the Xen interface version, and keep the flags field and
> > VCPU_SSHOTTMR_future available for guests using the old interface.
> >
> > Note the removal of the field from the vcpu_set_singleshot_timer
> > struct allows removing the compat translation of the struct.
> >
> > Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> > Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> While everything said is true, this isn't the reason to to get rid of
> VCPU_SSHOTTMR_future
> 
> It 505ef3ea8687 does appear to have been an ABI break, that's
> incidental.  It dates from 2007 so whatever we have now is the de-facto
> ABI, whether we like it or not.
> 
> The reason to delete this is because it is a monumentality and entirely
> stupid idea which should have been rejected outright at the time, and
> the only guest we can find which uses it also BUG_ON()'s in response to
> -ETIME.

I agree, but didn't think it was necessary to get into debating
whether it's useful or not, since its introduction was bogus anyway.

> It can literally only be used to shoot yourself in the foot with, and
> more recent Linuxes have dropped their use of it.
> 
> The structure needs to stay it's current shape, and while it's fine to
> hide the VCPU_SSHOTTMR_future behind an interface version macro, we do
> need to say that it is explicitly ignored.

Oh, I think I've dropped the comment I had added next to
VCPU_SSHOTTMR_future that contained /* Ignored. */ (just like for the whole
flags field).

I can elaborate a bit on why VCPU_SSHOTTMR_future is not useful in the
commit log, and add that Ignored comment to the flag.

Thanks, Roger.

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Andrew Cooper 1 year ago
On 18/04/2023 5:02 pm, Roger Pau Monné wrote:
> On Tue, Apr 18, 2023 at 04:54:49PM +0100, Andrew Cooper wrote:
>> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
>>> The addition of the flags field in the vcpu_set_singleshot_timer in
>>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
>>> increased.
>>>
>>> Remove such field addition and drop the implementation of the
>>> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
>>> value just inject the timer interrupt.
>>>
>>> Bump the Xen interface version, and keep the flags field and
>>> VCPU_SSHOTTMR_future available for guests using the old interface.
>>>
>>> Note the removal of the field from the vcpu_set_singleshot_timer
>>> struct allows removing the compat translation of the struct.
>>>
>>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> While everything said is true, this isn't the reason to to get rid of
>> VCPU_SSHOTTMR_future
>>
>> It 505ef3ea8687 does appear to have been an ABI break, that's
>> incidental.  It dates from 2007 so whatever we have now is the de-facto
>> ABI, whether we like it or not.
>>
>> The reason to delete this is because it is a monumentality and entirely
>> stupid idea which should have been rejected outright at the time, and
>> the only guest we can find which uses it also BUG_ON()'s in response to
>> -ETIME.
> I agree, but didn't think it was necessary to get into debating
> whether it's useful or not, since its introduction was bogus anyway.

Well - the reason to actually make a change is that (older) guests are
really exploding on that BUG_ON() for reasons outside of their own control.

And the reason to fix it by ignoring VCPU_SSHOTTMR_future is because the
entire concept is broken and should never have existed.

The ABI argument just adds to why the patch ought to have been rejected
at the time.  But it was done, and the fact it has been like this for 16
years means that the ABI shouldn't change further, even if it done in
error in the first place.

>
>> It can literally only be used to shoot yourself in the foot with, and
>> more recent Linuxes have dropped their use of it.
>>
>> The structure needs to stay it's current shape, and while it's fine to
>> hide the VCPU_SSHOTTMR_future behind an interface version macro, we do
>> need to say that it is explicitly ignored.
> Oh, I think I've dropped the comment I had added next to
> VCPU_SSHOTTMR_future that contained /* Ignored. */ (just like for the whole
> flags field).
>
> I can elaborate a bit on why VCPU_SSHOTTMR_future is not useful in the
> commit log, and add that Ignored comment to the flag.

The important thing is to not actually change the size of the structure,
and to change the commit message to explain the real reason why we need
to make the change.

~Andrew

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Roger Pau Monné 1 year ago
On Tue, Apr 18, 2023 at 05:12:07PM +0100, Andrew Cooper wrote:
> On 18/04/2023 5:02 pm, Roger Pau Monné wrote:
> > On Tue, Apr 18, 2023 at 04:54:49PM +0100, Andrew Cooper wrote:
> >> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
> >>> The addition of the flags field in the vcpu_set_singleshot_timer in
> >>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
> >>> increased.
> >>>
> >>> Remove such field addition and drop the implementation of the
> >>> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> >>> value just inject the timer interrupt.
> >>>
> >>> Bump the Xen interface version, and keep the flags field and
> >>> VCPU_SSHOTTMR_future available for guests using the old interface.
> >>>
> >>> Note the removal of the field from the vcpu_set_singleshot_timer
> >>> struct allows removing the compat translation of the struct.
> >>>
> >>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> >>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >> While everything said is true, this isn't the reason to to get rid of
> >> VCPU_SSHOTTMR_future
> >>
> >> It 505ef3ea8687 does appear to have been an ABI break, that's
> >> incidental.  It dates from 2007 so whatever we have now is the de-facto
> >> ABI, whether we like it or not.
> >>
> >> The reason to delete this is because it is a monumentality and entirely
> >> stupid idea which should have been rejected outright at the time, and
> >> the only guest we can find which uses it also BUG_ON()'s in response to
> >> -ETIME.
> > I agree, but didn't think it was necessary to get into debating
> > whether it's useful or not, since its introduction was bogus anyway.
> 
> Well - the reason to actually make a change is that (older) guests are
> really exploding on that BUG_ON() for reasons outside of their own control.
> 
> And the reason to fix it by ignoring VCPU_SSHOTTMR_future is because the
> entire concept is broken and should never have existed.
> 
> The ABI argument just adds to why the patch ought to have been rejected
> at the time.  But it was done, and the fact it has been like this for 16
> years means that the ABI shouldn't change further, even if it done in
> error in the first place.
> 
> >
> >> It can literally only be used to shoot yourself in the foot with, and
> >> more recent Linuxes have dropped their use of it.
> >>
> >> The structure needs to stay it's current shape, and while it's fine to
> >> hide the VCPU_SSHOTTMR_future behind an interface version macro, we do
> >> need to say that it is explicitly ignored.
> > Oh, I think I've dropped the comment I had added next to
> > VCPU_SSHOTTMR_future that contained /* Ignored. */ (just like for the whole
> > flags field).
> >
> > I can elaborate a bit on why VCPU_SSHOTTMR_future is not useful in the
> > commit log, and add that Ignored comment to the flag.
> 
> The important thing is to not actually change the size of the structure,
> and to change the commit message to explain the real reason why we need
> to make the change.

Why not revert back to the previous (smaller) size of the structure?

That would work for guests that have been built with Xen 3.0 headers.
Currently Xen does copy past the original (3.0) structure and likely
copy rubbish from the guest stack from those guests, that might
contain the VCPU_SSHOTTMR_future bit set and end up returning -ENOTIME
unexpectedly.

Overall I don't see the benefit of keeping the flags field, even if
technically it's been so long the ABI was broken that it doesn't
matter anymore.  But still keeping an empty flags field is kind of
useless, the more that removing it allows removing the compat code
handling for VCPUOP_set_singleshot_timer.

Thanks, Roger.

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Jan Beulich 1 year ago
On 19.04.2023 11:41, Roger Pau Monné wrote:
> On Tue, Apr 18, 2023 at 05:12:07PM +0100, Andrew Cooper wrote:
>> On 18/04/2023 5:02 pm, Roger Pau Monné wrote:
>>> On Tue, Apr 18, 2023 at 04:54:49PM +0100, Andrew Cooper wrote:
>>>> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
>>>>> The addition of the flags field in the vcpu_set_singleshot_timer in
>>>>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
>>>>> increased.
>>>>>
>>>>> Remove such field addition and drop the implementation of the
>>>>> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
>>>>> value just inject the timer interrupt.
>>>>>
>>>>> Bump the Xen interface version, and keep the flags field and
>>>>> VCPU_SSHOTTMR_future available for guests using the old interface.
>>>>>
>>>>> Note the removal of the field from the vcpu_set_singleshot_timer
>>>>> struct allows removing the compat translation of the struct.
>>>>>
>>>>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
>>>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> While everything said is true, this isn't the reason to to get rid of
>>>> VCPU_SSHOTTMR_future
>>>>
>>>> It 505ef3ea8687 does appear to have been an ABI break, that's
>>>> incidental.  It dates from 2007 so whatever we have now is the de-facto
>>>> ABI, whether we like it or not.
>>>>
>>>> The reason to delete this is because it is a monumentality and entirely
>>>> stupid idea which should have been rejected outright at the time, and
>>>> the only guest we can find which uses it also BUG_ON()'s in response to
>>>> -ETIME.
>>> I agree, but didn't think it was necessary to get into debating
>>> whether it's useful or not, since its introduction was bogus anyway.
>>
>> Well - the reason to actually make a change is that (older) guests are
>> really exploding on that BUG_ON() for reasons outside of their own control.
>>
>> And the reason to fix it by ignoring VCPU_SSHOTTMR_future is because the
>> entire concept is broken and should never have existed.
>>
>> The ABI argument just adds to why the patch ought to have been rejected
>> at the time.  But it was done, and the fact it has been like this for 16
>> years means that the ABI shouldn't change further, even if it done in
>> error in the first place.
>>
>>>
>>>> It can literally only be used to shoot yourself in the foot with, and
>>>> more recent Linuxes have dropped their use of it.
>>>>
>>>> The structure needs to stay it's current shape, and while it's fine to
>>>> hide the VCPU_SSHOTTMR_future behind an interface version macro, we do
>>>> need to say that it is explicitly ignored.
>>> Oh, I think I've dropped the comment I had added next to
>>> VCPU_SSHOTTMR_future that contained /* Ignored. */ (just like for the whole
>>> flags field).
>>>
>>> I can elaborate a bit on why VCPU_SSHOTTMR_future is not useful in the
>>> commit log, and add that Ignored comment to the flag.
>>
>> The important thing is to not actually change the size of the structure,
>> and to change the commit message to explain the real reason why we need
>> to make the change.
> 
> Why not revert back to the previous (smaller) size of the structure?
> 
> That would work for guests that have been built with Xen 3.0 headers.

Are there any such guests known to still be in active use? Linux iirc
requires 4.0 as a minimum ...

Jan

> Currently Xen does copy past the original (3.0) structure and likely
> copy rubbish from the guest stack from those guests, that might
> contain the VCPU_SSHOTTMR_future bit set and end up returning -ENOTIME
> unexpectedly.
> 
> Overall I don't see the benefit of keeping the flags field, even if
> technically it's been so long the ABI was broken that it doesn't
> matter anymore.  But still keeping an empty flags field is kind of
> useless, the more that removing it allows removing the compat code
> handling for VCPUOP_set_singleshot_timer.
> 
> Thanks, Roger.


Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field
Posted by Roger Pau Monné 1 year ago
On Wed, Apr 19, 2023 at 12:11:09PM +0200, Jan Beulich wrote:
> On 19.04.2023 11:41, Roger Pau Monné wrote:
> > On Tue, Apr 18, 2023 at 05:12:07PM +0100, Andrew Cooper wrote:
> >> On 18/04/2023 5:02 pm, Roger Pau Monné wrote:
> >>> On Tue, Apr 18, 2023 at 04:54:49PM +0100, Andrew Cooper wrote:
> >>>> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
> >>>>> The addition of the flags field in the vcpu_set_singleshot_timer in
> >>>>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
> >>>>> increased.
> >>>>>
> >>>>> Remove such field addition and drop the implementation of the
> >>>>> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> >>>>> value just inject the timer interrupt.
> >>>>>
> >>>>> Bump the Xen interface version, and keep the flags field and
> >>>>> VCPU_SSHOTTMR_future available for guests using the old interface.
> >>>>>
> >>>>> Note the removal of the field from the vcpu_set_singleshot_timer
> >>>>> struct allows removing the compat translation of the struct.
> >>>>>
> >>>>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> >>>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>> While everything said is true, this isn't the reason to to get rid of
> >>>> VCPU_SSHOTTMR_future
> >>>>
> >>>> It 505ef3ea8687 does appear to have been an ABI break, that's
> >>>> incidental.  It dates from 2007 so whatever we have now is the de-facto
> >>>> ABI, whether we like it or not.
> >>>>
> >>>> The reason to delete this is because it is a monumentality and entirely
> >>>> stupid idea which should have been rejected outright at the time, and
> >>>> the only guest we can find which uses it also BUG_ON()'s in response to
> >>>> -ETIME.
> >>> I agree, but didn't think it was necessary to get into debating
> >>> whether it's useful or not, since its introduction was bogus anyway.
> >>
> >> Well - the reason to actually make a change is that (older) guests are
> >> really exploding on that BUG_ON() for reasons outside of their own control.
> >>
> >> And the reason to fix it by ignoring VCPU_SSHOTTMR_future is because the
> >> entire concept is broken and should never have existed.
> >>
> >> The ABI argument just adds to why the patch ought to have been rejected
> >> at the time.  But it was done, and the fact it has been like this for 16
> >> years means that the ABI shouldn't change further, even if it done in
> >> error in the first place.
> >>
> >>>
> >>>> It can literally only be used to shoot yourself in the foot with, and
> >>>> more recent Linuxes have dropped their use of it.
> >>>>
> >>>> The structure needs to stay it's current shape, and while it's fine to
> >>>> hide the VCPU_SSHOTTMR_future behind an interface version macro, we do
> >>>> need to say that it is explicitly ignored.
> >>> Oh, I think I've dropped the comment I had added next to
> >>> VCPU_SSHOTTMR_future that contained /* Ignored. */ (just like for the whole
> >>> flags field).
> >>>
> >>> I can elaborate a bit on why VCPU_SSHOTTMR_future is not useful in the
> >>> commit log, and add that Ignored comment to the flag.
> >>
> >> The important thing is to not actually change the size of the structure,
> >> and to change the commit message to explain the real reason why we need
> >> to make the change.
> > 
> > Why not revert back to the previous (smaller) size of the structure?
> > 
> > That would work for guests that have been built with Xen 3.0 headers.
> 
> Are there any such guests known to still be in active use? Linux iirc
> requires 4.0 as a minimum ...

That would be something from the pre-pvops Linux days.

I looked forward a bit, and I don't think the introduction of the
field was an ABI breakage, because it was done a day after introducing
the original hypercall, so no released version of the hypervisor
headers contained the structure without the flags field:

commit 505ef3ea86870bb8a35533ec9d446f98a6b61ea6
Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain>
Date:   Sat Mar 10 16:58:11 2007 +0000

    Add flags field to VCPUOP_set_singlsehot_timer.
    Flag 'future' causes Xen to check if the timeout is in the past and
    return -ETIME if so.
    From: Jeremy Fitzhardinge <jeremy@goop.org>
    Signed-off-by: Keir Fraser <keir@xensource.com>

commit eb1a565927c0fdcd89be41f6d063c458539cca8d
Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain>
Date:   Fri Mar 9 18:26:47 2007 +0000

    xen: New vcpu_op commands for setting periodic and single-shot timers.
    Signed-off-by: Keir Fraser <keir@xensource.com>

So I think my proposal is to declare the flag as deprecated (and
effectively ignored in the hypervisor) due to the bogus usage of it in
Linux.

Thanks, Roger.