[PATCH] rpc: avoid crash when system time jump back

BiaoxiangYe posted 1 patch 3 years, 1 month ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/1612771947-8968-1-git-send-email-yebiaoxiang@huawei.com
There is a newer version of this series
src/rpc/virkeepalive.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
[PATCH] rpc: avoid crash when system time jump back
Posted by BiaoxiangYe 3 years, 1 month ago
From: BiaoXiang Ye <yebiaoxiang@huawei.com>

Setting the system time backward would lead to a
multiplication overflow in function virKeepAliveStart.
FunctionvirKeepAliveTimerInternal got the same bug too.

Backtrace below:
 #0  0x0000ffffae898470 in raise () from /usr/lib64/libc.so.6
 #1  0x0000ffffae89981c in abort () from /usr/lib64/libc.so.6
 #2  0x0000ffffaf9a36a8 in __mulvsi3 () from /usr/lib64/libvirt.so.0
 #3  0x0000ffffaf8fd9e8 in virKeepAliveStart (ka=0xaaaaf954ce10, interval=interval@entry=0,
     count=count@entry=0) at ../../src/rpc/virkeepalive.c:283
 #4  0x0000ffffaf908560 in virNetServerClientStartKeepAlive (client=0xaaaaf954cbe0)
     at ../../src/rpc/virnetserverclient.c:1628
 #5  0x0000aaaac57eb6dc in remoteDispatchConnectSupportsFeature (server=0xaaaaf95309d0,
     msg=0xaaaaf9549d90, ret=0xffff8c007fc0, args=0xffff8c002e70, rerr=0xffff9ea054a0,
     client=0xaaaaf954cbe0) at ../../src/remote/remote_daemon_dispatch.c:5063
 #6  remoteDispatchConnectSupportsFeatureHelper (server=0xaaaaf95309d0, client=0xaaaaf954cbe0,
     msg=0xaaaaf9549d90, rerr=0xffff9ea054a0, args=0xffff8c002e70, ret=0xffff8c007fc0)
     at ./remote/remote_daemon_dispatch_stubs.h:3503
 #7  0x0000ffffaf9053a4 in virNetServerProgramDispatchCall(msg=0xaaaaf9549d90, client=0xaaaaf954cbe0,
     server=0x0, prog=0xaaaaf953a170) at ../../src/rpc/virnetserverprogram.c:451
 #8  virNetServerProgramDispatch (prog=0xaaaaf953a170, server=0x0, server@entry=0xaaaaf95309d0,
     client=0xaaaaf954cbe0, msg=0xaaaaf9549d90) at ../../src/rpc/virnetserverprogram.c:306
 #9  0x0000ffffaf90a6bc in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>,
     client=<optimized out>, srv=0xaaaaf95309d0) at ../../src/rpc/virnetserver.c:137
 #10 virNetServerHandleJob (jobOpaque=0xaaaaf950df80, opaque=0xaaaaf95309d0)
     at ../../src/rpc/virnetserver.c:154
 #11 0x0000ffffaf812e14 in virThreadPoolWorker (opaque=<optimized out>)
     at ../../src/util/virthreadpool.c:163
 #12 0x0000ffffaf81237c in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:246
 #13 0x0000ffffaea327ac in ?? () from /usr/lib64/libpthread.so.0
 #14 0x0000ffffae93747c in ?? () from /usr/lib64/libc.so.6
 (gdb) frame 3
 #3  0x0000ffffaf8fd9e8 in virKeepAliveStart (ka=0xaaaaf954ce10, interval=interval@entry=0,
     count=count@entry=0) at ../../src/rpc/virkeepalive.c:283
 283	        timeout = ka->interval - delay;
 (gdb) list
 278    now = time(NULL);
 279    delay = now - ka->lastPacketReceived; <='delay' got a negative value
 280    if (delay > ka->interval)
 281        timeout = 0;
 282    else
 283        timeout = ka->interval - delay;
 284    ka->intervalStart = now - (ka->interval - timeout);
 285    ka->timer = virEventAddTimeout(timeout * 1000, virKeepAliveTimer, <= multiplication overflow
 286                                   ka, virObjectFreeCallback);
 287    if (ka->timer < 0)
 (gdb) p now
 $2 = 18288001
 (gdb) p ka->lastPacketReceived
 $3 = 1609430405

Signed-off-by: BiaoXiang Ye <yebiaoxiang@huawei.com>
---
 src/rpc/virkeepalive.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/src/rpc/virkeepalive.c b/src/rpc/virkeepalive.c
index 860b91b6b1..e662484ea6 100644
--- a/src/rpc/virkeepalive.c
+++ b/src/rpc/virkeepalive.c
@@ -119,12 +119,25 @@ virKeepAliveTimerInternal(virKeepAlivePtr ka,
     if (ka->interval <= 0 || ka->intervalStart == 0)
         return false;
 
+    if (now < ka->intervalStart) {
+        VIR_WARN("system time jump back detected. now=%ld,"
+                 " intervalStart=%ld",
+                 now, ka->intervalStart);
+        ka->intervalStart = now;
+    }
+
     if (now - ka->intervalStart < ka->interval) {
         timeval = ka->interval - (now - ka->intervalStart);
         virEventUpdateTimeout(ka->timer, timeval * 1000);
         return false;
     }
 
+    if (now < ka->lastPacketReceived) {
+        VIR_WARN("system time jump back detected. now=%ld,"
+                 " lastPacketReceived=%ld",
+                 now, ka->lastPacketReceived);
+        ka->lastPacketReceived = now;
+    }
     timeval = now - ka->lastPacketReceived;
     PROBE(RPC_KEEPALIVE_TIMEOUT,
           "ka=%p client=%p countToDeath=%d idle=%d",
@@ -271,6 +284,13 @@ virKeepAliveStart(virKeepAlivePtr ka,
           ka, ka->client, interval, count);
 
     now = time(NULL);
+    if (now < ka->lastPacketReceived) {
+        VIR_WARN("system time jump back detected. now=%ld,"
+                 " lastPacketReceived=%ld",
+                 now, ka->lastPacketReceived);
+        ka->lastPacketReceived = now;
+    }
+
     delay = now - ka->lastPacketReceived;
     if (delay > ka->interval)
         timeout = 0;
-- 
2.23.0

Re: [PATCH] rpc: avoid crash when system time jump back
Posted by Ján Tomko 3 years, 1 month ago
On a Monday in 2021, BiaoxiangYe wrote:
>From: BiaoXiang Ye <yebiaoxiang@huawei.com>
>
>Setting the system time backward would lead to a
>multiplication overflow in function virKeepAliveStart.
>FunctionvirKeepAliveTimerInternal got the same bug too.
>
>Backtrace below:
> #0  0x0000ffffae898470 in raise () from /usr/lib64/libc.so.6
> #1  0x0000ffffae89981c in abort () from /usr/lib64/libc.so.6
> #2  0x0000ffffaf9a36a8 in __mulvsi3 () from /usr/lib64/libvirt.so.0
> #3  0x0000ffffaf8fd9e8 in virKeepAliveStart (ka=0xaaaaf954ce10, interval=interval@entry=0,
>     count=count@entry=0) at ../../src/rpc/virkeepalive.c:283
> #4  0x0000ffffaf908560 in virNetServerClientStartKeepAlive (client=0xaaaaf954cbe0)
>     at ../../src/rpc/virnetserverclient.c:1628
> #5  0x0000aaaac57eb6dc in remoteDispatchConnectSupportsFeature (server=0xaaaaf95309d0,
>     msg=0xaaaaf9549d90, ret=0xffff8c007fc0, args=0xffff8c002e70, rerr=0xffff9ea054a0,
>     client=0xaaaaf954cbe0) at ../../src/remote/remote_daemon_dispatch.c:5063
> #6  remoteDispatchConnectSupportsFeatureHelper (server=0xaaaaf95309d0, client=0xaaaaf954cbe0,
>     msg=0xaaaaf9549d90, rerr=0xffff9ea054a0, args=0xffff8c002e70, ret=0xffff8c007fc0)
>     at ./remote/remote_daemon_dispatch_stubs.h:3503
> #7  0x0000ffffaf9053a4 in virNetServerProgramDispatchCall(msg=0xaaaaf9549d90, client=0xaaaaf954cbe0,
>     server=0x0, prog=0xaaaaf953a170) at ../../src/rpc/virnetserverprogram.c:451
> #8  virNetServerProgramDispatch (prog=0xaaaaf953a170, server=0x0, server@entry=0xaaaaf95309d0,
>     client=0xaaaaf954cbe0, msg=0xaaaaf9549d90) at ../../src/rpc/virnetserverprogram.c:306
> #9  0x0000ffffaf90a6bc in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>,
>     client=<optimized out>, srv=0xaaaaf95309d0) at ../../src/rpc/virnetserver.c:137
> #10 virNetServerHandleJob (jobOpaque=0xaaaaf950df80, opaque=0xaaaaf95309d0)
>     at ../../src/rpc/virnetserver.c:154
> #11 0x0000ffffaf812e14 in virThreadPoolWorker (opaque=<optimized out>)
>     at ../../src/util/virthreadpool.c:163
> #12 0x0000ffffaf81237c in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:246
> #13 0x0000ffffaea327ac in ?? () from /usr/lib64/libpthread.so.0
> #14 0x0000ffffae93747c in ?? () from /usr/lib64/libc.so.6
> (gdb) frame 3
> #3  0x0000ffffaf8fd9e8 in virKeepAliveStart (ka=0xaaaaf954ce10, interval=interval@entry=0,
>     count=count@entry=0) at ../../src/rpc/virkeepalive.c:283
> 283	        timeout = ka->interval - delay;
> (gdb) list
> 278    now = time(NULL);
> 279    delay = now - ka->lastPacketReceived; <='delay' got a negative value
> 280    if (delay > ka->interval)
> 281        timeout = 0;
> 282    else
> 283        timeout = ka->interval - delay;
> 284    ka->intervalStart = now - (ka->interval - timeout);
> 285    ka->timer = virEventAddTimeout(timeout * 1000, virKeepAliveTimer, <= multiplication overflow
> 286                                   ka, virObjectFreeCallback);
> 287    if (ka->timer < 0)
> (gdb) p now
> $2 = 18288001
> (gdb) p ka->lastPacketReceived
> $3 = 1609430405
>
>Signed-off-by: BiaoXiang Ye <yebiaoxiang@huawei.com>
>---
> src/rpc/virkeepalive.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
>diff --git a/src/rpc/virkeepalive.c b/src/rpc/virkeepalive.c
>index 860b91b6b1..e662484ea6 100644
>--- a/src/rpc/virkeepalive.c
>+++ b/src/rpc/virkeepalive.c
>@@ -119,12 +119,25 @@ virKeepAliveTimerInternal(virKeepAlivePtr ka,
>     if (ka->interval <= 0 || ka->intervalStart == 0)
>         return false;
>
>+    if (now < ka->intervalStart) {
>+        VIR_WARN("system time jump back detected. now=%ld,"
>+                 " intervalStart=%ld",
>+                 now, ka->intervalStart);
>+        ka->intervalStart = now;
>+    }
>+

Instead of detecting a jump backwards, we could use a monotonic timer
that does not go backwards.

g_get_monotonic_time () from GLib should do that
https://developer.gnome.org/glib/stable/glib-Date-and-Time-Functions.html#g-get-monotonic-time

(Although they say the timer might not tick while the machine is
suspended - not sure whether we need that here or not)

Jano

>     if (now - ka->intervalStart < ka->interval) {
>         timeval = ka->interval - (now - ka->intervalStart);
>         virEventUpdateTimeout(ka->timer, timeval * 1000);
>         return false;
>     }
>
>+    if (now < ka->lastPacketReceived) {
>+        VIR_WARN("system time jump back detected. now=%ld,"
>+                 " lastPacketReceived=%ld",
>+                 now, ka->lastPacketReceived);
>+        ka->lastPacketReceived = now;
>+    }
>     timeval = now - ka->lastPacketReceived;
>     PROBE(RPC_KEEPALIVE_TIMEOUT,
>           "ka=%p client=%p countToDeath=%d idle=%d",
Re: [PATCH] rpc: avoid crash when system time jump back
Posted by Daniel P. Berrangé 3 years, 1 month ago
On Mon, Feb 08, 2021 at 07:50:11PM +0100, Ján Tomko wrote:
> On a Monday in 2021, BiaoxiangYe wrote:
> > From: BiaoXiang Ye <yebiaoxiang@huawei.com>
> > 
> > Setting the system time backward would lead to a
> > multiplication overflow in function virKeepAliveStart.
> > FunctionvirKeepAliveTimerInternal got the same bug too.
> > 
> > Backtrace below:
> > #0  0x0000ffffae898470 in raise () from /usr/lib64/libc.so.6
> > #1  0x0000ffffae89981c in abort () from /usr/lib64/libc.so.6
> > #2  0x0000ffffaf9a36a8 in __mulvsi3 () from /usr/lib64/libvirt.so.0
> > #3  0x0000ffffaf8fd9e8 in virKeepAliveStart (ka=0xaaaaf954ce10, interval=interval@entry=0,
> >     count=count@entry=0) at ../../src/rpc/virkeepalive.c:283
> > #4  0x0000ffffaf908560 in virNetServerClientStartKeepAlive (client=0xaaaaf954cbe0)
> >     at ../../src/rpc/virnetserverclient.c:1628
> > #5  0x0000aaaac57eb6dc in remoteDispatchConnectSupportsFeature (server=0xaaaaf95309d0,
> >     msg=0xaaaaf9549d90, ret=0xffff8c007fc0, args=0xffff8c002e70, rerr=0xffff9ea054a0,
> >     client=0xaaaaf954cbe0) at ../../src/remote/remote_daemon_dispatch.c:5063
> > #6  remoteDispatchConnectSupportsFeatureHelper (server=0xaaaaf95309d0, client=0xaaaaf954cbe0,
> >     msg=0xaaaaf9549d90, rerr=0xffff9ea054a0, args=0xffff8c002e70, ret=0xffff8c007fc0)
> >     at ./remote/remote_daemon_dispatch_stubs.h:3503
> > #7  0x0000ffffaf9053a4 in virNetServerProgramDispatchCall(msg=0xaaaaf9549d90, client=0xaaaaf954cbe0,
> >     server=0x0, prog=0xaaaaf953a170) at ../../src/rpc/virnetserverprogram.c:451
> > #8  virNetServerProgramDispatch (prog=0xaaaaf953a170, server=0x0, server@entry=0xaaaaf95309d0,
> >     client=0xaaaaf954cbe0, msg=0xaaaaf9549d90) at ../../src/rpc/virnetserverprogram.c:306
> > #9  0x0000ffffaf90a6bc in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>,
> >     client=<optimized out>, srv=0xaaaaf95309d0) at ../../src/rpc/virnetserver.c:137
> > #10 virNetServerHandleJob (jobOpaque=0xaaaaf950df80, opaque=0xaaaaf95309d0)
> >     at ../../src/rpc/virnetserver.c:154
> > #11 0x0000ffffaf812e14 in virThreadPoolWorker (opaque=<optimized out>)
> >     at ../../src/util/virthreadpool.c:163
> > #12 0x0000ffffaf81237c in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:246
> > #13 0x0000ffffaea327ac in ?? () from /usr/lib64/libpthread.so.0
> > #14 0x0000ffffae93747c in ?? () from /usr/lib64/libc.so.6
> > (gdb) frame 3
> > #3  0x0000ffffaf8fd9e8 in virKeepAliveStart (ka=0xaaaaf954ce10, interval=interval@entry=0,
> >     count=count@entry=0) at ../../src/rpc/virkeepalive.c:283
> > 283	        timeout = ka->interval - delay;
> > (gdb) list
> > 278    now = time(NULL);
> > 279    delay = now - ka->lastPacketReceived; <='delay' got a negative value
> > 280    if (delay > ka->interval)
> > 281        timeout = 0;
> > 282    else
> > 283        timeout = ka->interval - delay;
> > 284    ka->intervalStart = now - (ka->interval - timeout);
> > 285    ka->timer = virEventAddTimeout(timeout * 1000, virKeepAliveTimer, <= multiplication overflow
> > 286                                   ka, virObjectFreeCallback);
> > 287    if (ka->timer < 0)
> > (gdb) p now
> > $2 = 18288001
> > (gdb) p ka->lastPacketReceived
> > $3 = 1609430405
> > 
> > Signed-off-by: BiaoXiang Ye <yebiaoxiang@huawei.com>
> > ---
> > src/rpc/virkeepalive.c | 20 ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> > 
> > diff --git a/src/rpc/virkeepalive.c b/src/rpc/virkeepalive.c
> > index 860b91b6b1..e662484ea6 100644
> > --- a/src/rpc/virkeepalive.c
> > +++ b/src/rpc/virkeepalive.c
> > @@ -119,12 +119,25 @@ virKeepAliveTimerInternal(virKeepAlivePtr ka,
> >     if (ka->interval <= 0 || ka->intervalStart == 0)
> >         return false;
> > 
> > +    if (now < ka->intervalStart) {
> > +        VIR_WARN("system time jump back detected. now=%ld,"
> > +                 " intervalStart=%ld",
> > +                 now, ka->intervalStart);
> > +        ka->intervalStart = now;
> > +    }
> > +
> 
> Instead of detecting a jump backwards, we could use a monotonic timer
> that does not go backwards.
> 
> g_get_monotonic_time () from GLib should do that
> https://developer.gnome.org/glib/stable/glib-Date-and-Time-Functions.html#g-get-monotonic-time
> 
> (Although they say the timer might not tick while the machine is
> suspended - not sure whether we need that here or not)

Your connections are doomed no matter what if you suspend for any
non-negligible amount of time, as the low level TCP connection
will get timed out by the OS most likely.

IOW regardless of which timer we use for keepalive, we're
going to start counting the keepalive failures from when we
resume. It is probably only going to show a difference for any
keepalive responses that we had received, but not yet processed
at time of suspend.

So I think using monotonic timer would likely be ok here.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|