Report better error message with no socket to connect to

[PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Martin Kletzander 2 years, 2 months ago

Before this patch users might be confused with the error when no daemon
nor systemd socket unit is running due to the error message being a bit
vague when running as root with no URI:

  # virsh list
  error: failed to connect to the hypervisor
  error: Operation not supported: Cannot use direct socket mode if no
  URI is set

Instead of merely suggesting to start any daemon, also give a hint as to
what socket we have tried looking up:

  # virsh list
  error: failed to connect to the hypervisor
  error: Operation not supported: Cannot connect to
  '/var/run/libvirt/virtqemud-sock' and no URI is set, is any virt
  daemon or systemd socket unit started?

Resolves: https://issues.redhat.com/browse/RHEL-700
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
 src/remote/remote_sockets.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/remote/remote_sockets.c b/src/remote/remote_sockets.c
index c21970cd31e7..0c0e31e4eb78 100644
--- a/src/remote/remote_sockets.c
+++ b/src/remote/remote_sockets.c
@@ -311,6 +311,7 @@ remoteGetUNIXSocket(remoteDriverTransport transport,
     g_autofree char *daemon_name = NULL;
     g_autofree char *direct_sock_name = NULL;
     g_autofree char *legacy_sock_name = NULL;
+    g_autofree char *default_socket = NULL;
 #ifdef REMOTE_DRIVER_AUTOSTART_DIRECT
     g_autofree char *guessdriver = NULL;
 #endif
@@ -345,7 +346,7 @@ remoteGetUNIXSocket(remoteDriverTransport transport,
         } else {
             if (remoteProbeSystemDriverFromSocket(flags & REMOTE_DRIVER_OPEN_RO,
                                                   &guessdriver,
-                                                  NULL) < 0)
+                                                  &default_socket) < 0)
                 return NULL;
         }
         driver = guessdriver;
@@ -404,8 +405,14 @@ remoteGetUNIXSocket(remoteDriverTransport transport,
         }
 
         if (!direct_sock_name) {
-            virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
-                           _("Cannot use direct socket mode if no URI is set"));
+            if (default_socket) {
+                virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
+                               _("Cannot connect to '%1$s' and no URI is set, is any virt daemon or systemd socket unit started?"),
+                               default_socket);
+            } else {
+                virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+                               _("Cannot use direct socket mode if no URI is set"));
+            }
             return NULL;
         }
 
-- 
2.43.0
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Richard W.M. Jones 2 years, 2 months ago

On Tue, Nov 28, 2023 at 01:02:40PM +0100, Martin Kletzander wrote:
> Before this patch users might be confused with the error when no daemon
> nor systemd socket unit is running due to the error message being a bit
> vague when running as root with no URI:
> 
>   # virsh list
>   error: failed to connect to the hypervisor
>   error: Operation not supported: Cannot use direct socket mode if no
>   URI is set
> 
> Instead of merely suggesting to start any daemon, also give a hint as to
> what socket we have tried looking up:
> 
>   # virsh list
>   error: failed to connect to the hypervisor
>   error: Operation not supported: Cannot connect to
>   '/var/run/libvirt/virtqemud-sock' and no URI is set, is any virt
>   daemon or systemd socket unit started?

This is definitely much better, but can we suggest the actual command
here?

An alternative might be to refer to a libvirt wiki page which
describes how to start the libvirt daemon(s) after installing libvirt.
This seems to be a common problem, but I can't find any libvirt.org
page about it.  There is https://libvirt.org/daemons.html but my eyes
glaze over reading it.
https://wiki.libvirt.org/The_daemon_cannot_be_started.html is close,
but not quite the same problem.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Martin Kletzander 2 years, 2 months ago

On Tue, Nov 28, 2023 at 01:00:53PM +0000, Richard W.M. Jones wrote:
>On Tue, Nov 28, 2023 at 01:02:40PM +0100, Martin Kletzander wrote:
>> Before this patch users might be confused with the error when no daemon
>> nor systemd socket unit is running due to the error message being a bit
>> vague when running as root with no URI:
>>
>>   # virsh list
>>   error: failed to connect to the hypervisor
>>   error: Operation not supported: Cannot use direct socket mode if no
>>   URI is set
>>
>> Instead of merely suggesting to start any daemon, also give a hint as to
>> what socket we have tried looking up:
>>
>>   # virsh list
>>   error: failed to connect to the hypervisor
>>   error: Operation not supported: Cannot connect to
>>   '/var/run/libvirt/virtqemud-sock' and no URI is set, is any virt
>>   daemon or systemd socket unit started?
>
>This is definitely much better, but can we suggest the actual command
>here?
>

I don't think so.  It would differ between distributions and depend on
what the users really want to do (especially when there is no URI set for
the command or in libvirt.conf).  I had a patch that constructed a long
error message listing all the tried sockets, but that looked like pure
overkill.  I added "is any virt daemon or systemd socket unit started"
in order to avoid anything more complicated.  I think that's enough of a
suggestion for an error message; with debugging turned on, all the
sockets are listed in the order they are tried.
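
The "try each socket in order and log it" behaviour mentioned above can be
sketched roughly as follows.  This is only an illustration under stated
assumptions, not libvirt's implementation: probe_sockets() is a
hypothetical helper, the candidate names are supplied by the caller, and
access() only checks that the path exists, not that a daemon is listening.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical probe loop (not libvirt code): tries each candidate
 * socket name under dir in order, logs every attempt to stderr, and
 * copies the first path that exists into found.
 * Returns 0 if a candidate exists, -1 otherwise.
 */
int
probe_sockets(const char *dir, const char *const *names, size_t nnames,
              char *found, size_t foundlen)
{
    char path[4096];
    size_t i;

    for (i = 0; i < nnames; i++) {
        int n = snprintf(path, sizeof(path), "%s/%s", dir, names[i]);

        if (n < 0 || (size_t)n >= sizeof(path))
            continue; /* path would be truncated, skip this candidate */

        fprintf(stderr, "debug: probing socket '%s'\n", path);

        /* Existence check only; nothing is connected to here. */
        if (access(path, F_OK) == 0) {
            snprintf(found, foundlen, "%s", path);
            return 0;
        }
    }

    return -1;
}
```

With a loop like this the debug output shows every path in the order it was
tried, while the user-facing error can stay short.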

>An alternative might be to refer to a libvirt wiki page which
>describes how to start the libvirt daemon(s) after installing libvirt.
>This seems to be a common problem, but I can't find any libvirt.org
>page about it.  There is https://libvirt.org/daemons.html but my eyes
>glaze over reading it.
>https://wiki.libvirt.org/The_daemon_cannot_be_started.html is close,
>but not quite the same problem.
>

Well, we can add some kbase page saying "you need to start a daemon if
you want to use libvirt right after installation".  If that seems
appropriate (to be honest, it does not, at least to me), I suggest that
someone with less bias against users who do not know they should start
a daemon when the error message asks them if any daemon is running
should do it.  I would most probably not phrase it in a complimentary way.

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Daniel P. Berrangé 2 years, 2 months ago

On Tue, Nov 28, 2023 at 01:02:40PM +0100, Martin Kletzander wrote:
> Before this patch users might be confused with the error when no daemon
> nor systemd socket unit is running due to the error message being a bit
> vague when running as root with no URI:
> 
>   # virsh list
>   error: failed to connect to the hypervisor
>   error: Operation not supported: Cannot use direct socket mode if no
>   URI is set
> 
> Instead of merely suggesting to start any daemon, also give a hint as to
> what socket we have tried looking up:
> 
>   # virsh list
>   error: failed to connect to the hypervisor
>   error: Operation not supported: Cannot connect to
>   '/var/run/libvirt/virtqemud-sock' and no URI is set, is any virt
>   daemon or systemd socket unit started?

As Peter points out, this is a misleading message because it is
arbitrarily reporting the first compiled in driver, which may
bear no resemblance to what the user was expecting to run. I think
we should not include the sock path here, but we could include
the sock *directory*, as that would help diagnose when someone
built with the wrong install prefix, eg

  "No active daemon socket found in /var/run/libvirt, and no
   URI is set. Is any libvirt daemon or socket unit started ?"
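
A minimal sketch of how a directory-based message like this could be
assembled.  It is an assumption-laden illustration, not the actual patch:
format_no_socket_error() and EXAMPLE_RUNSTATEDIR are hypothetical names,
and plain snprintf() stands in for virReportError().

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed value for illustration; a real build would take this from the
 * configured run state directory instead of hard-coding it. */
#define EXAMPLE_RUNSTATEDIR "/var/run"

/*
 * Hypothetical helper (not part of libvirt): writes the proposed
 * message into buf, naming only the socket directory rather than any
 * particular driver's socket.  Returns the snprintf() result.
 */
int
format_no_socket_error(char *buf, size_t buflen, const char *runstatedir)
{
    return snprintf(buf, buflen,
                    "No active daemon socket found in %s/libvirt, and no "
                    "URI is set. Is any libvirt daemon or socket unit "
                    "started?",
                    runstatedir);
}
```

Naming the directory still exposes a wrong install prefix without implying
that any specific hypervisor driver was expected.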

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Peter Krempa 2 years, 2 months ago

On Tue, Nov 28, 2023 at 13:22:47 +0000, Daniel P. Berrangé wrote:
> On Tue, Nov 28, 2023 at 01:02:40PM +0100, Martin Kletzander wrote:
> > Before this patch users might be confused with the error when no daemon
> > nor systemd socket unit is running due to the error message being a bit
> > vague when running as root with no URI:
> > 
> >   # virsh list
> >   error: failed to connect to the hypervisor
> >   error: Operation not supported: Cannot use direct socket mode if no
> >   URI is set
> > 
> > Instead of merely suggesting to start any daemon, also give a hint as to
> > what socket we have tried looking up:
> > 
> >   # virsh list
> >   error: failed to connect to the hypervisor
> >   error: Operation not supported: Cannot connect to
> >   '/var/run/libvirt/virtqemud-sock' and no URI is set, is any virt
> >   daemon or systemd socket unit started?
> 
> As Peter points out, this is a misleading message because it is
> arbitrarily reporting the first compiled in driver, which may
> bear no resemblance to what the user was expecting to run. I think

Exactly. E.g. on Fedora and other distros which enable most features
you'd get:

  error: Operation not supported: Cannot connect to '/var/run/libvirt/virtxend-sock' and no URI is set, is any virt daemon or systemd socket unit started?

And running xen is not what most users want to do when they install
libvirt.

> we should not include the sock path here, but we could include
> the sock *directory*, as that would help diagnose when someone
> built with the wrong install prefix, eg
> 
>   "No active daemon socket found in /var/run/libvirt, and no
>    URI is set. Is any libvirt daemon or socket unit started ?"

This sounds much better. I also don't think we should be putting any
command examples here as suggested by Rich as there isn't a good default
value we could use because of exactly the same reason.

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Martin Kletzander 2 years, 2 months ago

On Tue, Nov 28, 2023 at 02:29:20PM +0100, Peter Krempa wrote:
>On Tue, Nov 28, 2023 at 13:22:47 +0000, Daniel P. Berrangé wrote:
>> On Tue, Nov 28, 2023 at 01:02:40PM +0100, Martin Kletzander wrote:
>> > Before this patch users might be confused with the error when no daemon
>> > nor systemd socket unit is running due to the error message being a bit
>> > vague when running as root with no URI:
>> >
>> >   # virsh list
>> >   error: failed to connect to the hypervisor
>> >   error: Operation not supported: Cannot use direct socket mode if no
>> >   URI is set
>> >
>> > Instead of merely suggesting to start any daemon, also give a hint as to
>> > what socket we have tried looking up:
>> >
>> >   # virsh list
>> >   error: failed to connect to the hypervisor
>> >   error: Operation not supported: Cannot connect to
>> >   '/var/run/libvirt/virtqemud-sock' and no URI is set, is any virt
>> >   daemon or systemd socket unit started?
>>
>> As Peter points out, this is a misleading message because it is
>> arbitrarily reporting the first compiled in driver, which may
>> bear no resemblance to what the user was expecting to run. I think
>
>Exactly. E.g. on Fedora and other distros which enable most features
>you'd get:
>
>  error: Operation not supported: Cannot connect to '/var/run/libvirt/virtxend-sock' and no URI is set, is any virt daemon or systemd socket unit started?
>
>And running xen is not what most users want to do when they install
>libvirt.
>

How do you know when there is no URI set?  Maybe they really want to
connect to a different host.  We don't say that QEMU is the default for
some reason.  If there is something most users want, then we could
rearrange the order of the drivers that were tried.

What I think is that users who install libvirt and then immediately run
virsh define or something similar as non-root user probably also do not
want the session daemon to be started.  I just don't know whether we can
accommodate everyone.  But...

>> we should not include the sock path here, but we could include
>> the sock *directory*, as that would help diagnose when someone
>> built with the wrong install prefix, eg
>>
>>   "No active daemon socket found in /var/run/libvirt, and no
>>    URI is set. Is any libvirt daemon or socket unit started ?"
>
>This sounds much better. I also don't think we should be putting any
>command examples here as suggested by Rich as there isn't a good default
>value we could use because of exactly the same reason.
>

...this looks like a best effort.  If this is fine with everyone, then
I'll change it to this, including RUNSTATEDIR / libvirt.  Of course that
might still not work in other situations, e.g. when a readonly connection
is being made but no readonly socket is there, etc., but that's very
much a corner case anyway.

So let me know if that's fine with you all.

Thanks,
Martin

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Richard W.M. Jones 2 years, 2 months ago

Can we have the error message include a link to this page?
https://wiki.libvirt.org/Failed_to_connect_to_the_hypervisor.html

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Martin Kletzander 2 years, 2 months ago

On Wed, Nov 29, 2023 at 06:33:54PM +0000, Richard W.M. Jones wrote:
>
>Can we have the error message include a link to this page?
>https://wiki.libvirt.org/Failed_to_connect_to_the_hypervisor.html
>

I think we could, although I'd be a bit afraid of the precedent.  That
by itself is not that bad, but keeping the separate document(s) up to
date could come back to bite us after a while.  I won't go against the
idea if others in this discussion are fine with it and we reach a
consensus.

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Peter Krempa 2 years, 2 months ago

On Thu, Nov 30, 2023 at 09:49:56 +0100, Martin Kletzander wrote:
> On Wed, Nov 29, 2023 at 06:33:54PM +0000, Richard W.M. Jones wrote:
> > 
> > Can we have the error message include a link to this page?
> > https://wiki.libvirt.org/Failed_to_connect_to_the_hypervisor.html
> > 
> 
> I think we could, although I'd be a bit afraid of the precedent which,
> by itself is not that bad, but keeping the separate document(s) up to
> date could bite us in the back in a while.  I won't go against the idea
> if others in this discussion are fine with it and we reach a consensus.

There is one precedent case: when backing image probing fails due to a
mess-up in the format fields, the error links to an article in the
kbase on proper setup of images.

I suggest that if you want to add a link into the error, the article
should be kept in the kbase in the main repo instead of the wiki.

The article should probably also be narrowed to cover only this case,
and perhaps link to the page about how to set up the daemons to run.

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Richard W.M. Jones 2 years, 2 months ago

On Thu, Nov 30, 2023 at 10:06:38AM +0100, Peter Krempa wrote:
> On Thu, Nov 30, 2023 at 09:49:56 +0100, Martin Kletzander wrote:
> > On Wed, Nov 29, 2023 at 06:33:54PM +0000, Richard W.M. Jones wrote:
> > > 
> > > Can we have the error message include a link to this page?
> > > https://wiki.libvirt.org/Failed_to_connect_to_the_hypervisor.html
> > > 
> > 
> > I think we could, although I'd be a bit afraid of the precedent which,
> > by itself is not that bad, but keeping the separate document(s) up to
> > date could bite us in the back in a while.  I won't go against the idea
> > if others in this discussion are fine with it and we reach a consensus.
> 
> There is one precedent case, when backing image probing fails due to
> mess-up in the format fields, in which case the error links to an
> article in the kbase on proper setup of images.
> 
> I suggest that if you want to add a link into the error, the article
> should be kept in the kbase in the main repo instead of the wiki.
> 
> The article should probably be also clarified to contain only this case
> and also link perhaps to the page about how to setup daemons to run.

This is a lot of extra work just to fix a bad error message.

Error messages should **always** be actionable.  They should, every
single time, tell the user in simple language: What is wrong.  How to
fix it.

This is an egregious example of a bad error from libvirt that I see
happening over and over again.  Let's just fix it.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Martin Kletzander 2 years, 2 months ago

On Thu, Nov 30, 2023 at 09:48:10AM +0000, Richard W.M. Jones wrote:
>On Thu, Nov 30, 2023 at 10:06:38AM +0100, Peter Krempa wrote:
>> On Thu, Nov 30, 2023 at 09:49:56 +0100, Martin Kletzander wrote:
>> > On Wed, Nov 29, 2023 at 06:33:54PM +0000, Richard W.M. Jones wrote:
>> > >
>> > > Can we have the error message include a link to this page?
>> > > https://wiki.libvirt.org/Failed_to_connect_to_the_hypervisor.html
>> > >
>> >
>> > I think we could, although I'd be a bit afraid of the precedent which,
>> > by itself is not that bad, but keeping the separate document(s) up to
>> > date could bite us in the back in a while.  I won't go against the idea
>> > if others in this discussion are fine with it and we reach a consensus.
>>
>> There is one precedent case, when backing image probing fails due to
>> mess-up in the format fields, in which case the error links to an
>> article in the kbase on proper setup of images.
>>
>> I suggest that if you want to add a link into the error, the article
>> should be kept in the kbase in the main repo instead of the wiki.
>>
>> The article should probably be also clarified to contain only this case
>> and also link perhaps to the page about how to setup daemons to run.
>
>This is a lot of extra work just to fix a bad error message.
>
>Error messages should **always** be actionable.  They should, every
>single time, tell the user in simple language: What is wrong.  How to
>fix it.
>
>This is an egregious example of a bad error from libvirt that I see
>happening over and over again.  Let's just fix it.
>

The clearest actionable error message for me would be:

   Cannot find a socket to connect to, please specify a connection URI.

Would that be OK?  It clearly tells the user what to do, and even though
it is not the user experience one might want, it is less error-prone
than relying on libvirt to guess the right URI.  It's just luck that
people running hypervisor daemons other than virtqemud are not using
tools that use libvirt but automatically presume a QEMU connection.

I know what I suggest is a lot of work for such a simple thing, just as
creating and maintaining a page which tells everyone how to start a
system service is.  If someone has root access to a machine and can
install packages, but does not know how to start a system service, then
there are better places to find up-to-date information than every
single project maintaining a list of distro-specific instructions.

If you don't like that last suggestion above, then I'll rewrite it,
create a kbase article and link to it.  No problem.  I'm just trying to
find a middle ground here.

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Richard W.M. Jones 2 years, 2 months ago

On Fri, Dec 01, 2023 at 01:22:14PM +0100, Martin Kletzander wrote:
> On Thu, Nov 30, 2023 at 09:48:10AM +0000, Richard W.M. Jones wrote:
> >On Thu, Nov 30, 2023 at 10:06:38AM +0100, Peter Krempa wrote:
> >>On Thu, Nov 30, 2023 at 09:49:56 +0100, Martin Kletzander wrote:
> >>> On Wed, Nov 29, 2023 at 06:33:54PM +0000, Richard W.M. Jones wrote:
> >>> >
> >>> > Can we have the error message include a link to this page?
> >>> > https://wiki.libvirt.org/Failed_to_connect_to_the_hypervisor.html
> >>> >
> >>>
> >>> I think we could, although I'd be a bit afraid of the precedent which,
> >>> by itself is not that bad, but keeping the separate document(s) up to
> >>> date could bite us in the back in a while.  I won't go against the idea
> >>> if others in this discussion are fine with it and we reach a consensus.
> >>
> >>There is one precedent case, when backing image probing fails due to
> >>mess-up in the format fields, in which case the error links to an
> >>article in the kbase on proper setup of images.
> >>
> >>I suggest that if you want to add a link into the error, the article
> >>should be kept in the kbase in the main repo instead of the wiki.
> >>
> >>The article should probably be also clarified to contain only this case
> >>and also link perhaps to the page about how to setup daemons to run.
> >
> >This is a lot of extra work just to fix a bad error message.
> >
> >Error messages should **always** be actionable.  They should, every
> >single time, tell the user in simple language: What is wrong.  How to
> >fix it.
> >
> >This is an egregious example of a bad error from libvirt that I see
> >happening over and over again.  Let's just fix it.
> >
> 
> The clearest actionable error message for me would be:
> 
>   Cannot find a socket to connect to, please specify a connection URI.

This isn't really actionable.  What an actionable error message means
is something that explains what the average user has to do, clearly.
And by average user, don't assume they know a lot about how libvirt
works.

If we wanted to go down this route we'd have to explain what the
sockets are and how they are started, what a "connection URI" is, how
to show what connection URIs are available (or suggest some), etc.

Can you actually show the list of available connection URIs?  As far
as I know there's no libvirt tool for doing that.  At least for local
sockets it should be possible, I suppose.
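
For the local case, a rough sketch of what such a listing might look like,
assuming the sockets live in a single directory and follow the "*-sock"
naming seen in the error messages above.  list_local_sockets() is a
hypothetical helper, not an existing libvirt tool or API; it matches names
only, so sockets with a different suffix (a read-only variant, for
instance) would not be listed, and a match does not prove that anything is
listening.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libvirt): prints every entry in dir
 * whose name ends in "-sock" and returns how many were found, or -1 if
 * the directory cannot be opened.
 */
int
list_local_sockets(const char *dir, FILE *out)
{
    static const char suffix[] = "-sock";
    const size_t suffixlen = sizeof(suffix) - 1;
    struct dirent *ent;
    int count = 0;
    DIR *d = opendir(dir);

    if (!d)
        return -1;

    while ((ent = readdir(d)) != NULL) {
        size_t len = strlen(ent->d_name);

        /* Compare only the tail of the name against the suffix. */
        if (len > suffixlen &&
            strcmp(ent->d_name + len - suffixlen, suffix) == 0) {
            fprintf(out, "%s/%s\n", dir, ent->d_name);
            count++;
        }
    }

    closedir(d);
    return count;
}
```

Turning each listed socket into a connection URI would still need a mapping
from daemon name to driver, which this sketch does not attempt.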

Think about this from the point of view of the average user who really
doesn't want to invest a lot of effort.

> Would that be OK?  It clearly tells the user what to do, and even though
> it is not the user experience one might want, it is less error-prone when
> compared to relying on libvirt to guess the right URI.  It's just luck
> that people running different hypervisor daemons than virtqemud are not
> using tools that use libvirt, but automatically presume QEMU connection.

I agree that it may be the user meant to connect to a remote libvirt
server.  Is that more common than needing to start the local libvirt
daemon?  It probably does depend on the type of user.  There will be
some users who are using virt-* tools who need the local server.  And
another quite separate class of users who mainly use libvirt for
remote connections.

Probably if the libvirt install isn't capable of running a local libvirt
daemon (macOS? Windows?) then we can discount the "needs to start the
socket" case, and just explain what connection URIs are.

> I know what I suggest is a lot of work for such a simple thing, the same
> way as creating and maintaining a page which tells everyone how to start
> a system service is.  If someone has root access to a machine, can
> install packages, but does not know how to start a system service, then
> there are better places where to find up to date information than every
> single project maintaining a list of distro-specific instructions.

I really think we do need to do that.  That's the approach we aim for
(very imperfectly) in libguestfs, nbdkit, libnbd, etc.

> If you don't like that last suggestion above, then I'll rewrite it,
> create a kbase article and link to it.  No problem.  I'm just trying to
> find a middle ground here.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Richard W.M. Jones 2 years, 2 months ago

On Wed, Nov 29, 2023 at 10:33:34AM +0100, Martin Kletzander wrote:
> On Tue, Nov 28, 2023 at 02:29:20PM +0100, Peter Krempa wrote:
> >On Tue, Nov 28, 2023 at 13:22:47 +0000, Daniel P. Berrangé wrote:
> >>On Tue, Nov 28, 2023 at 01:02:40PM +0100, Martin Kletzander wrote:
> >>> Before this patch users might be confused with the error when no daemon
> >>> nor systemd socket unit is running due to the error message being a bit
> >>> vague when running as root with no URI:
> >>>
> >>>   # virsh list
> >>>   error: failed to connect to the hypervisor
> >>>   error: Operation not supported: Cannot use direct socket mode if no
> >>>   URI is set
> >>>
> >>> Instead of merely suggesting to start any daemon, also give a hint as to
> >>> what socket we have tried looking up:
> >>>
> >>>   # virsh list
> >>>   error: failed to connect to the hypervisor
> >>>   error: Operation not supported: Cannot connect to
> >>>   '/var/run/libvirt/virtqemud-sock' and no URI is set, is any virt
> >>>   daemon or systemd socket unit started?
> >>
> >>As Peter points out, this is a misleading message because it is
> >>arbitrarily reporting the first compiled in driver, which may
> >>bear no resemblance to what the user was expecting to run. I think
> >
> >Exactly. E.g. on Fedora and other distros which enable most features
> >you'd get:
> >
> > error: Operation not supported: Cannot connect to '/var/run/libvirt/virtxend-sock' and no URI is set, is any virt daemon or systemd socket unit started?
> >
> >And running xen is not what most users want to do when they install
> >libvirt.
> >
> 
> How do you know when there is no URI set?  Maybe they really want to
> connect to a different host.  We don't say that QEMU is the default for
> some reason.  If there is something most users want, then we could
> rearrange the order of the drivers that were tried.
> 
> What I think is that users who install libvirt and then immediately run
> virsh define or something similar as non-root user probably also do not
> want the session daemon to be started.  I just don't know whether we can
> accommodate everyone.  But...
> 
> >>we should not include the sock path here, but we could include
> >>the sock *directory*, as that would help diagnose when someone
> >>built with the wrong install prefix, eg
> >>
> >>  "No active daemon socket found in /var/run/libvirt, and no
> >>   URI is set. Is any libvirt daemon or socket unit started ?"
> >
> >This sounds much better. I also don't think we should be putting any
> >command examples here as suggested by Rich as there isn't a good default
> >value we could use because of exactly the same reason.
> >
> 
> ...this looks like a best effort.  If this is fine with everyone, then
> I'll change it to this, including RUNSTATEDIR / libvirt.  Of course that
> might still not work in other situations, e.g. when readonly connection
> is being made, but no readonly socket is there, etc., but that's a very
> corner case situation anyway.
> 
> So let me know if that's fine with you all.

I'd like to try writing a wiki page that we can link in this message,
since I have to explain this problem over and over again to people and
it'd be good to have one place that explains it.
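
For reference, the directory-based wording proposed in the quoted text
could be put together roughly like this.  It is only a minimal sketch:
format_no_socket_error() is a hypothetical helper, the fallback
RUNSTATEDIR value is an assumption, and plain snprintf() stands in for
libvirt's own error reporting and translation helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: stand-in for the build-time RUNSTATEDIR value. */
#ifndef RUNSTATEDIR
# define RUNSTATEDIR "/var/run"
#endif

/*
 * Hypothetical helper: format the "no socket found" hint, naming only
 * the socket directory rather than one particular driver's socket.
 * Returns what snprintf() returns, i.e. the length the full message
 * needs, so a result >= buflen means the text was truncated.
 */
int format_no_socket_error(char *buf, size_t buflen, const char *runstatedir)
{
    assert(buf != NULL && runstatedir != NULL);

    return snprintf(buf, buflen,
                    "No active daemon socket found in %s/libvirt, and no "
                    "URI is set. Is any libvirt daemon or socket unit started?",
                    runstatedir);
}
```

A caller would pass RUNSTATEDIR as the last argument.  Naming the
directory instead of a specific socket avoids pointing at whichever
driver happens to be compiled in first.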

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Richard W.M. Jones 2 years, 2 months ago

On Wed, Nov 29, 2023 at 09:49:47AM +0000, Richard W.M. Jones wrote:
> I'd like to try writing a wiki page that we can link in this message,
> since I have to explain this problem over and over again to people and
> it'd be good to have one place that explains it.

It's a start:

https://gitlab.com/libvirt/libvirt-wiki/-/merge_requests/6

I realise after writing it that I don't fully understand the problem
myself.

Why is it exactly that the socket doesn't work after installation, but
does work after reboot?  On my laptop, the socket unit is set to
"disabled", yet libvirt works fine (since the laptop has been rebooted
since libvirt was installed, I guess).  Shouldn't the command be
"systemctl enable virtqemud.socket --now"?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Andrea Bolognani 2 years, 2 months ago

On Wed, Nov 29, 2023 at 10:09:36AM +0000, Richard W.M. Jones wrote:
> On Wed, Nov 29, 2023 at 09:49:47AM +0000, Richard W.M. Jones wrote:
> > I'd like to try writing a wiki page that we can link in this message,
> > since I have to explain this problem over and over again to people and
> > it'd be good to have one place that explains it.
>
> It's a start:
>
> https://gitlab.com/libvirt/libvirt-wiki/-/merge_requests/6
>
> I realise after writing it that I don't fully understand the problem
> myself.
>
> Why is it exactly that the socket doesn't work after installation, but
> does work after reboot?  On my laptop, the socket unit is set to
> "disabled", yet libvirt works fine (since the laptop has been rebooted
> since libvirt was installed, I guess).  Shouldn't the command be
> "systemctl enable virtqemud.socket --now"?

It's a distro policy.

I assume you're running Fedora/RHEL on your laptop, and the policy
there is that services (or sockets) should not be started right after
a package is installed. Debian has the opposite policy.

This is in no way specific to libvirt. For example, [1] describes how
to set up Apache on RHEL, and as you can see manually starting the
service after installation is an explicitly documented step.

Regarding why the socket is disabled, are you sure that you're
looking at the actual status rather than the preset? I've made some
improvement to that area recently[2] so things should be less
confusing going forward.

Additionally, virtqemud.service BindsTo=virtqemud.socket, so even if
the socket is disabled, the service being enabled will be enough to
cause it to be started on boot.
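
Since the "enabled" state of a unit and whether anything is actually
listening are separate questions, a client-side probe can be a useful
complement to checking the unit status.  The following is a minimal
sketch, not libvirt's implementation: unix_socket_is_active() is a
hypothetical name, and it only checks whether a connect() to the given
UNIX socket path succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: return true only if something accepts a
 * connection on the UNIX stream socket at `path`.  Any failure
 * (missing file, nothing listening, path too long) yields false.
 */
bool unix_socket_is_active(const char *path)
{
    struct sockaddr_un addr;
    bool active;
    int fd;

    assert(path != NULL);

    /* sun_path is a fixed-size buffer; reject paths that do not fit. */
    if (strlen(path) >= sizeof(addr.sun_path))
        return false;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return false;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* length checked above */

    active = connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0;
    close(fd);
    return active;
}
```

It could be called with a path such as
'/var/run/libvirt/virtqemud-sock' from the patch description.  Note
that with socket activation a successful connect() will itself cause
systemd to start the service, so this is a probe with side effects.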

[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/deploying_web_servers_and_reverse_proxies/setting-apache-http-server_deploying-web-servers-and-reverse-proxies#setting-up-a-single-instance-apache-http-server_setting-apache-http-server
[2] https://src.fedoraproject.org/rpms/fedora-release/pull-request/281
-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Richard W.M. Jones 2 years, 2 months ago

On Wed, Nov 29, 2023 at 05:44:59AM -0500, Andrea Bolognani wrote:
> On Wed, Nov 29, 2023 at 10:09:36AM +0000, Richard W.M. Jones wrote:
> > On Wed, Nov 29, 2023 at 09:49:47AM +0000, Richard W.M. Jones wrote:
> > > I'd like to try writing a wiki page that we can link in this message,
> > > since I have to explain this problem over and over again to people and
> > > it'd be good to have one place that explains it.
> >
> > It's a start:
> >
> > https://gitlab.com/libvirt/libvirt-wiki/-/merge_requests/6
> >
> > I realise after writing it that I don't fully understand the problem
> > myself.
> >
> > Why is it exactly that the socket doesn't work after installation, but
> > does work after reboot?  On my laptop, the socket unit is set to
> > "disabled", yet libvirt works fine (since the laptop has been rebooted
> > since libvirt was installed, I guess).  Shouldn't the command be
> > "systemctl enable virtqemud.socket --now"?
> 
> It's a distro policy.
> 
> I assume you're running Fedora/RHEL on your laptop, and the policy
> there is that services (or sockets) should not be started right after
> a package is installed. Debian has the opposite policy.

I think this is a very weird choice (for Fedora).  Why would
installing the package not start the service, but then the service
would be started without further intervention after reboot?  It's the
opposite of predictable behaviour.

> This is in no way specific to libvirt. For example, [1] describes how
> to set up Apache on RHEL, and as you can see manually starting the
> service after installation is an explicitly documented step.
> 
> 
> Regarding why the socket is disabled, are you sure that you're
> looking at the actual status rather than the preset? I've made some
> improvement to that area recently[2] so things should be less
> confusing going forward.
> 
> Additionally, virtqemud.service BindsTo=virtqemud.socket, so even if
> the socket is disabled, the service being enabled will be enough to
> cause it to be started on boot.

OK I see, and yes you're right that the service is enabled.

> [1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/deploying_web_servers_and_reverse_proxies/setting-apache-http-server_deploying-web-servers-and-reverse-proxies#setting-up-a-single-instance-apache-http-server_setting-apache-http-server
> [2] https://src.fedoraproject.org/rpms/fedora-release/pull-request/281
> -- 
> Andrea Bolognani / Red Hat / Virtualization

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Andrea Bolognani 2 years, 2 months ago

On Wed, Nov 29, 2023 at 10:49:23AM +0000, Richard W.M. Jones wrote:
> On Wed, Nov 29, 2023 at 05:44:59AM -0500, Andrea Bolognani wrote:
> > On Wed, Nov 29, 2023 at 10:09:36AM +0000, Richard W.M. Jones wrote:
> > > Why is it exactly that the socket doesn't work after installation, but
> > > does work after reboot?  On my laptop, the socket unit is set to
> > > "disabled", yet libvirt works fine (since the laptop has been rebooted
> > > since libvirt was installed, I guess).  Shouldn't the command be
> > > "systemctl enable virtqemud.socket --now"?
> >
> > It's a distro policy.
> >
> > I assume you're running Fedora/RHEL on your laptop, and the policy
> > there is that services (or sockets) should not be started right after
> > a package is installed. Debian has the opposite policy.
>
> I think this is a very weird choice (for Fedora).  Why would
> installing the package not start the service, but then the service
> would be started without further intervention after reboot?  It's the
> opposite of predictable behaviour.

I believe that the rationale is that a newly-installed service might
need to be configured before it can work correctly/securely. Not
starting it right away provides a temporal window that can be used to
perform the initial configuration, and which is entirely under the
local admin's control.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Martin Kletzander 2 years, 2 months ago

On Wed, Nov 29, 2023 at 12:30:19PM -0600, Andrea Bolognani wrote:
>On Wed, Nov 29, 2023 at 10:49:23AM +0000, Richard W.M. Jones wrote:
>> On Wed, Nov 29, 2023 at 05:44:59AM -0500, Andrea Bolognani wrote:
>> > On Wed, Nov 29, 2023 at 10:09:36AM +0000, Richard W.M. Jones wrote:
>> > > Why is it exactly that the socket doesn't work after installation, but
>> > > does work after reboot?  On my laptop, the socket unit is set to
>> > > "disabled", yet libvirt works fine (since the laptop has been rebooted
>> > > since libvirt was installed, I guess).  Shouldn't the command be
>> > > "systemctl enable virtqemud.socket --now"?
>> >
>> > It's a distro policy.
>> >
>> > I assume you're running Fedora/RHEL on your laptop, and the policy
>> > there is that services (or sockets) should not be started right after
>> > a package is installed. Debian has the opposite policy.
>>
>> I think this is a very weird choice (for Fedora).  Why would
>> installing the package not start the service, but then the service
>> would be started without further intervention after reboot?  It's the
>> opposite of predictable behaviour.
>
>I believe that the rationale is that a newly-installed service might
>need to be configured before it can work correctly/securely. Not
>starting it right away provides a temporal window that can be used to
>perform the initial configuration, and which is entirely under the
>local admin's control.
>

One could argue that the maintainer should be able to decide that based
on whether the service is configured with safe defaults from the get go.

Not to mention that your rationale is void once someone reboots.  On the
other hand when you start only the sockets you can still configure the
service before connecting to it.

I feel like "enable but don't start" is meant as a middle ground between
seamless user experience and safe defaults, yet it achieves neither.  I
think either default would be better than the current state, but that's
straying from what we're trying to fix here.

>--
>Andrea Bolognani / Red Hat / Virtualization
>

Re: [PATCH 2/2] Report better error message in remoteGetUNIXSocket

Posted by Andrea Bolognani 2 years, 2 months ago

On Thu, Nov 30, 2023 at 09:40:47AM +0100, Martin Kletzander wrote:
> On Wed, Nov 29, 2023 at 12:30:19PM -0600, Andrea Bolognani wrote:
> > On Wed, Nov 29, 2023 at 10:49:23AM +0000, Richard W.M. Jones wrote:
> > > On Wed, Nov 29, 2023 at 05:44:59AM -0500, Andrea Bolognani wrote:
> > > > I assume you're running Fedora/RHEL on your laptop, and the policy
> > > > there is that services (or sockets) should not be started right after
> > > > a package is installed. Debian has the opposite policy.
> > >
> > > I think this is a very weird choice (for Fedora).  Why would
> > > installing the package not start the service, but then the service
> > > would be started without further intervention after reboot?  It's the
> > > opposite of predictable behaviour.
> >
> > I believe that the rationale is that a newly-installed service might
> > need to be configured before it can work correctly/securely. Not
> > starting it right away provides a temporal window that can be used to
> > perform the initial configuration, and which is entirely under the
> > local admin's control.
>
> One could argue that the maintainer should be able to decide that based
> on whether the service is configured with safe defaults from the get go.

The maintainer's definition of "safe" might not match the local
admin's. There might be additional constraints that are specific to
the local environment and that upstream can't possibly know about.

> Not to mention that your rationale is void once someone reboots.

Rebooting is a pretty explicit choice by the admin, just like
manually starting the service. That's what I meant when I said that
the temporal window is entirely under the local admin's control.

> On the
> other hand when you start only the sockets you can still configure the
> service before connecting to it.

If you started the sockets, any user on the system would be able to
trigger startup of the daemon by connecting to the read/only one,
possibly at a time when the final configuration is not in place yet.

Incidentally, "any user on the system can open a read/only connection
to libvirt" is a perfect example of a configuration that the local
admin might not like :)

We have recently made it possible to conveniently disable the
read/only (and admin) socket altogether to minimize the attack
surface for the daemon, but again that requires explicit
configuration by the local admin.

> I feel like "enable but don't start" is meant as a middle ground between
> seamless user experience and safe defaults, yet it achieves neither.  I
> think either default would be better than the current state, but that's
> straying from what we're trying to fix here.

Yeah, as I said it's a distro-wide policy so we don't really have any
control over it.

-- 
Andrea Bolognani / Red Hat / Virtualization