[Qemu-devel] [PATCH] migration: fix crash in when incoming client channel setup fails

Daniel P. Berrangé posted 1 patch 5 years, 10 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180619151719.17002-1-berrange@redhat.com
Test checkpatch passed
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test s390x failed
There is a newer version of this series
migration/migration.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
[Qemu-devel] [PATCH] migration: fix crash in when incoming client channel setup fails
Posted by Daniel P. Berrangé 5 years, 10 months ago
The way we determine if we can start the incoming migration was
changed to use migration_has_all_channels() in:

  commit 428d89084c709e568f9cd301c2f6416a54c53d6d
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Mon Jul 24 13:06:25 2017 +0200

    migration: Create migration_has_all_channels

This method in turn calls multifd_recv_all_channels_created()
which is hardcoded to always return 'true' when multifd is
not in use.

This means that if channel initialization fails with normal
migration, it'll never notice and attempt to start the
incoming migration regardless.

This can be seen, for example, if a client connects to a server
requiring TLS, but has an invalid x509 certificate:

qemu-system-x86_64: The certificate hasn't got a known issuer
qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.

 #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
 #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
 #2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
 #3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
 #4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
 #5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
 #6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
 #7  0x0000000000000000 in  ()

To handle the non-multifd case, we check whether mis->from_src_file
is non-NULL.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 migration/migration.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index e1eaa97df4..38ad818b23 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -518,11 +518,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
  */
 bool migration_has_all_channels(void)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     bool all_channels;
 
     all_channels = multifd_recv_all_channels_created();
 
-    return all_channels;
+    return all_channels && mis->from_src_file != NULL;
 }
 
 /*
-- 
2.17.0


Re: [Qemu-devel] [PATCH] migration: fix crash in when incoming client channel setup fails
Posted by Dr. David Alan Gilbert 5 years, 10 months ago
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> The way we determine if we can start the incoming migration was
> changed to use migration_has_all_channels() in:
> 
>   commit 428d89084c709e568f9cd301c2f6416a54c53d6d
>   Author: Juan Quintela <quintela@redhat.com>
>   Date:   Mon Jul 24 13:06:25 2017 +0200
> 
>     migration: Create migration_has_all_channels
> 
> This method in turn calls multifd_recv_all_channels_created()
> which is hardcoded to always return 'true' when multifd is
> not in use.
> 
> This means that if channel initialization fails with normal
> migration, it'll never notice and attempt to start the
> incoming migration regardless.

I'm not sure if that was actually the cause, because older versions
had no check at all in socket_accept_incoming_migration.

> This can be seen, for example, if a client connects to a server
> requiring TLS, but has an invalid x509 certificate:
> 
> qemu-system-x86_64: The certificate hasn't got a known issuer
> qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.
> 
>  #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
>  #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
>  #2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
>  #3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
>  #4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
>  #5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
>  #6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
>  #7  0x0000000000000000 in  ()
> 
> To handle the non-multifd case, we check whether mis->from_src_file
> is non-NULL.

So, what happens with this fix, does the destination exit cleanly, or
stay to accept another connection or what?

Dave

> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  migration/migration.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index e1eaa97df4..38ad818b23 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -518,11 +518,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
>   */
>  bool migration_has_all_channels(void)
>  {
> +    MigrationIncomingState *mis = migration_incoming_get_current();
>      bool all_channels;
>  
>      all_channels = multifd_recv_all_channels_created();
>  
> -    return all_channels;
> +    return all_channels && mis->from_src_file != NULL;
>  }
>  
>  /*
> -- 
> 2.17.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: fix crash in when incoming client channel setup fails
Posted by Daniel P. Berrangé 5 years, 10 months ago
On Tue, Jun 19, 2018 at 04:45:05PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > The way we determine if we can start the incoming migration was
> > changed to use migration_has_all_channels() in:
> > 
> >   commit 428d89084c709e568f9cd301c2f6416a54c53d6d
> >   Author: Juan Quintela <quintela@redhat.com>
> >   Date:   Mon Jul 24 13:06:25 2017 +0200
> > 
> >     migration: Create migration_has_all_channels
> > 
> > This method in turn calls multifd_recv_all_channels_created()
> > which is hardcoded to always return 'true' when multifd is
> > not in use.
> > 
> > This means that if channel initialization fails with normal
> > migration, it'll never notice and attempt to start the
> > incoming migration regardless.
> 
> I'm not sure if that was actually the cause, because older versions
> had no check at all in socket_accept_incoming_migration.

Hmm, actually your write - it was a complicated series of refactorings
for multifd, and I think I mis-identified. I'll do a proper bisect so
we can have an accurate record.

> 
> > This can be seen, for example, if a client connects to a server
> > requiring TLS, but has an invalid x509 certificate:
> > 
> > qemu-system-x86_64: The certificate hasn't got a known issuer
> > qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.
> > 
> >  #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
> >  #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
> >  #2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
> >  #3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
> >  #4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
> >  #5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
> >  #6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
> >  #7  0x0000000000000000 in  ()
> > 
> > To handle the non-multifd case, we check whether mis->from_src_file
> > is non-NULL.
> 
> So, what happens with this fix, does the destination exit cleanly, or
> stay to accept another connection or what?

It stays around and can accept another connection, which is actually
quite desirable, as it limits impact of a malicious client from DOS'ing
the genuine client by racing to connect first.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: fix crash in when incoming client channel setup fails
Posted by Dr. David Alan Gilbert 5 years, 10 months ago
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Tue, Jun 19, 2018 at 04:45:05PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > The way we determine if we can start the incoming migration was
> > > changed to use migration_has_all_channels() in:
> > > 
> > >   commit 428d89084c709e568f9cd301c2f6416a54c53d6d
> > >   Author: Juan Quintela <quintela@redhat.com>
> > >   Date:   Mon Jul 24 13:06:25 2017 +0200
> > > 
> > >     migration: Create migration_has_all_channels
> > > 
> > > This method in turn calls multifd_recv_all_channels_created()
> > > which is hardcoded to always return 'true' when multifd is
> > > not in use.
> > > 
> > > This means that if channel initialization fails with normal
> > > migration, it'll never notice and attempt to start the
> > > incoming migration regardless.
> > 
> > I'm not sure if that was actually the cause, because older versions
> > had no check at all in socket_accept_incoming_migration.
> 
> Hmm, actually your write - it was a complicated series of refactorings
> for multifd, and I think I mis-identified. I'll do a proper bisect so
> we can have an accurate record.
> 
> > 
> > > This can be seen, for example, if a client connects to a server
> > > requiring TLS, but has an invalid x509 certificate:
> > > 
> > > qemu-system-x86_64: The certificate hasn't got a known issuer
> > > qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.
> > > 
> > >  #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
> > >  #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
> > >  #2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
> > >  #3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
> > >  #4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
> > >  #5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
> > >  #6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
> > >  #7  0x0000000000000000 in  ()
> > > 
> > > To handle the non-multifd case, we check whether mis->from_src_file
> > > is non-NULL.
> > 
> > So, what happens with this fix, does the destination exit cleanly, or
> > stay to accept another connection or what?
> 
> It stays around and can accept another connection, which is actually
> quite desirable, as it limits impact of a malicious client from DOS'ing
> the genuine client by racing to connect first.

OK, that's fine, although I suspect in most other cases QEMU exits on
the destination with an error (e.g. if you aren't using TLS and just
connect to the port and squirt garbage down it, I think it'll exit
telling you that it's not a migration header).


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: fix crash in when incoming client channel setup fails
Posted by Daniel P. Berrangé 5 years, 10 months ago
On Tue, Jun 19, 2018 at 05:00:34PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Tue, Jun 19, 2018 at 04:45:05PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > The way we determine if we can start the incoming migration was
> > > > changed to use migration_has_all_channels() in:
> > > > 
> > > >   commit 428d89084c709e568f9cd301c2f6416a54c53d6d
> > > >   Author: Juan Quintela <quintela@redhat.com>
> > > >   Date:   Mon Jul 24 13:06:25 2017 +0200
> > > > 
> > > >     migration: Create migration_has_all_channels
> > > > 
> > > > This method in turn calls multifd_recv_all_channels_created()
> > > > which is hardcoded to always return 'true' when multifd is
> > > > not in use.
> > > > 
> > > > This means that if channel initialization fails with normal
> > > > migration, it'll never notice and attempt to start the
> > > > incoming migration regardless.
> > > 
> > > I'm not sure if that was actually the cause, because older versions
> > > had no check at all in socket_accept_incoming_migration.
> > 
> > Hmm, actually your write - it was a complicated series of refactorings
> > for multifd, and I think I mis-identified. I'll do a proper bisect so
> > we can have an accurate record.
> > 
> > > 
> > > > This can be seen, for example, if a client connects to a server
> > > > requiring TLS, but has an invalid x509 certificate:
> > > > 
> > > > qemu-system-x86_64: The certificate hasn't got a known issuer
> > > > qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.
> > > > 
> > > >  #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
> > > >  #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
> > > >  #2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
> > > >  #3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
> > > >  #4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
> > > >  #5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
> > > >  #6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
> > > >  #7  0x0000000000000000 in  ()
> > > > 
> > > > To handle the non-multifd case, we check whether mis->from_src_file
> > > > is non-NULL.
> > > 
> > > So, what happens with this fix, does the destination exit cleanly, or
> > > stay to accept another connection or what?
> > 
> > It stays around and can accept another connection, which is actually
> > quite desirable, as it limits impact of a malicious client from DOS'ing
> > the genuine client by racing to connect first.
> 
> OK, that's fine, although I suspect in most other cases QEMU exits on
> the destination with an error (e.g. if you aren't using TLS and just
> connect to the port and squirt garbage down it, I think it'll exit
> telling you that it's not a migration header).

That's ok as non-TLS has zero security anyway, so improving resilience
only really matters for TLS setups, prior to authorizing the client.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: fix crash in when incoming client channel setup fails
Posted by Daniel P. Berrangé 5 years, 10 months ago
On Tue, Jun 19, 2018 at 04:58:59PM +0100, Daniel P. Berrangé wrote:
> On Tue, Jun 19, 2018 at 04:45:05PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > The way we determine if we can start the incoming migration was
> > > changed to use migration_has_all_channels() in:
> > > 
> > >   commit 428d89084c709e568f9cd301c2f6416a54c53d6d
> > >   Author: Juan Quintela <quintela@redhat.com>
> > >   Date:   Mon Jul 24 13:06:25 2017 +0200
> > > 
> > >     migration: Create migration_has_all_channels
> > > 
> > > This method in turn calls multifd_recv_all_channels_created()
> > > which is hardcoded to always return 'true' when multifd is
> > > not in use.
> > > 
> > > This means that if channel initialization fails with normal
> > > migration, it'll never notice and attempt to start the
> > > incoming migration regardless.
> > 
> > I'm not sure if that was actually the cause, because older versions
> > had no check at all in socket_accept_incoming_migration.
> 
> Hmm, actually your write - it was a complicated series of refactorings
> for multifd, and I think I mis-identified. I'll do a proper bisect so
> we can have an accurate record.

Ok, this commit above introduces the latent bug, and then it is
activated by

  commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Wed Mar 7 08:40:52 2018 +0100

    migration: Delay start of migration main routines
    

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: fix crash in when incoming client channel setup fails
Posted by Eric Blake 5 years, 10 months ago
On 06/19/2018 10:58 AM, Daniel P. Berrangé wrote:

>> So, what happens with this fix, does the destination exit cleanly, or
>> stay to accept another connection or what?
> 
> It stays around and can accept another connection, which is actually
> quite desirable, as it limits impact of a malicious client from DOS'ing
> the genuine client by racing to connect first.

Compare to NBD which had a CVE created for the DOS aspect of a malicious 
client connecting first; see commit 0c9390d9, CVE-2017-9524.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org