[PATCH] bhyve: process: improve domain startup error handling

Roman Bogorodskiy posted 1 patch 1 month, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20251028182204.81586-1-bogorodskiy@gmail.com
src/bhyve/bhyve_process.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
[PATCH] bhyve: process: improve domain startup error handling
Posted by Roman Bogorodskiy 1 month, 2 weeks ago
After executing the bhyve binary, it might happen that it fails very
early due to configuration issues (missing/inaccessible files, incorrect
custom args), bugs, etc. In this case it'll look like the domain has
started normally, but quickly turned off.

Improve that by waiting for the domain's vmm entity to appear in
/dev/vmm.

Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
---
Ideally, I'd also like to display the errors that the bhyve binary logs
in this case. But as I'm doing:

 virCommandSetErrorFD(cmd, &logfd);

I'm not sure if I should read that again from logfd, or are there more
optimal ways to do that?


 src/bhyve/bhyve_process.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/src/bhyve/bhyve_process.c b/src/bhyve/bhyve_process.c
index 4fb7e642e1..591c2d1ded 100644
--- a/src/bhyve/bhyve_process.c
+++ b/src/bhyve/bhyve_process.c
@@ -143,7 +143,12 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver,
     g_autoptr(virCommand) cmd = NULL;
     g_autoptr(virCommand) load_cmd = NULL;
     bhyveDomainObjPrivate *priv = vm->privateData;
+    g_auto(virBuffer) domain_vmm_path_buf = VIR_BUFFER_INITIALIZER;
+    g_autofree char *domain_vmm_path = NULL;
     int ret = -1, rc;
+    size_t i = 0;
+    int timeout = 3;
+    bool vmm_appeared = false;
 
     logfile = g_strdup_printf("%s/%s.log", BHYVE_LOG_DIR, vm->def->name);
     if ((logfd = open(logfile, O_WRONLY | O_APPEND | O_CREAT,
@@ -228,6 +233,25 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver,
         goto cleanup;
     }
 
+    virBufferAsprintf(&domain_vmm_path_buf, "/dev/vmm/%s", vm->def->name);
+    domain_vmm_path = virBufferContentAndReset(&domain_vmm_path_buf);
+
+    do {
+        if (virFileExists(domain_vmm_path)) {
+            vmm_appeared = true;
+            break;
+        }
+
+        g_usleep(500000);
+    } while (!vmm_appeared && ++i < timeout);
+
+    if (!vmm_appeared) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Domain %1$s didn't show up in /dev/vmm"),
+                       vm->def->name);
+        goto cleanup;
+    }
+
     vm->def->id = vm->pid;
     virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason);
     priv->mon = bhyveMonitorOpen(vm, driver);
-- 
2.51.0
Re: [PATCH] bhyve: process: improve domain startup error handling
Posted by Michal Prívozník via Devel 1 month, 2 weeks ago
On 10/28/25 19:22, Roman Bogorodskiy wrote:
> After executing the bhyve binary, it might happen that it fails very
> early due to configuration issues (missing/inaccessible files, incorrect
> custom args), bugs, etc. In this case it'll look like the domain has
> started normally, but quickly turned off.
> 
> Improve that by waiting for the domain's vmm entity to appear in
> /dev/vmm.
> 
> Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
> ---
> Ideally, I'd also like to display the errors that the bhyve binary logs
> in this case. But as I'm doing:
> 
>  virCommandSetErrorFD(cmd, &logfd);
> 
> I'm not sure if I should read that again from logfd, or are there more
> optimal ways to do that?
> 
> 
>  src/bhyve/bhyve_process.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)

Please consider a squashing this in:


diff --git a/src/bhyve/bhyve_process.c b/src/bhyve/bhyve_process.c
index 591c2d1ded..61638abf85 100644
--- a/src/bhyve/bhyve_process.c
+++ b/src/bhyve/bhyve_process.c
@@ -51,6 +51,7 @@
 #include "virnetdev.h"
 #include "virnetdevbridge.h"
 #include "virnetdevtap.h"
+#include "virtime.h"
 
 #define VIR_FROM_THIS   VIR_FROM_BHYVE
 
@@ -143,11 +144,9 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver,
     g_autoptr(virCommand) cmd = NULL;
     g_autoptr(virCommand) load_cmd = NULL;
     bhyveDomainObjPrivate *priv = vm->privateData;
-    g_auto(virBuffer) domain_vmm_path_buf = VIR_BUFFER_INITIALIZER;
     g_autofree char *domain_vmm_path = NULL;
+    virTimeBackOffVar timebackoff;
     int ret = -1, rc;
-    size_t i = 0;
-    int timeout = 3;
     bool vmm_appeared = false;
 
     logfile = g_strdup_printf("%s/%s.log", BHYVE_LOG_DIR, vm->def->name);
@@ -233,17 +232,16 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver,
         goto cleanup;
     }
 
-    virBufferAsprintf(&domain_vmm_path_buf, "/dev/vmm/%s", vm->def->name);
-    domain_vmm_path = virBufferContentAndReset(&domain_vmm_path_buf);
+    domain_vmm_path = g_strdup_printf("/dev/vmm/%s", vm->def->name);
 
-    do {
+    if (virTimeBackOffStart(&timebackoff, 1, 1000) < 0)
+        goto cleanup;
+    while (virTimeBackOffWait(&timebackoff)) {
         if (virFileExists(domain_vmm_path)) {
             vmm_appeared = true;
             break;
         }
-
-        g_usleep(500000);
-    } while (!vmm_appeared && ++i < timeout);
+    }
 
     if (!vmm_appeared) {
         virReportError(VIR_ERR_INTERNAL_ERROR,


Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal
Re: [PATCH] bhyve: process: improve domain startup error handling
Posted by Roman Bogorodskiy 1 month, 1 week ago
  Michal Prívozník wrote:

> On 10/28/25 19:22, Roman Bogorodskiy wrote:
> > After executing the bhyve binary, it might happen that it fails very
> > early due to configuration issues (missing/inaccessible files, incorrect
> > custom args), bugs, etc. In this case it'll look like the domain has
> > started normally, but quickly turned off.
> > 
> > Improve that by waiting for the domain's vmm entity to appear in
> > /dev/vmm.
> > 
> > Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
> > ---
> > Ideally, I'd also like to display the errors that the bhyve binary logs
> > in this case. But as I'm doing:
> > 
> >  virCommandSetErrorFD(cmd, &logfd);
> > 
> > I'm not sure if I should read that again from logfd, or are there more
> > optimal ways to do that?
> > 
> > 
> >  src/bhyve/bhyve_process.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> 
> Please consider a squashing this in:
> 
> 
> diff --git a/src/bhyve/bhyve_process.c b/src/bhyve/bhyve_process.c
> index 591c2d1ded..61638abf85 100644
> --- a/src/bhyve/bhyve_process.c
> +++ b/src/bhyve/bhyve_process.c
> @@ -51,6 +51,7 @@
>  #include "virnetdev.h"
>  #include "virnetdevbridge.h"
>  #include "virnetdevtap.h"
> +#include "virtime.h"
>  
>  #define VIR_FROM_THIS   VIR_FROM_BHYVE
>  
> @@ -143,11 +144,9 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver,
>      g_autoptr(virCommand) cmd = NULL;
>      g_autoptr(virCommand) load_cmd = NULL;
>      bhyveDomainObjPrivate *priv = vm->privateData;
> -    g_auto(virBuffer) domain_vmm_path_buf = VIR_BUFFER_INITIALIZER;
>      g_autofree char *domain_vmm_path = NULL;
> +    virTimeBackOffVar timebackoff;
>      int ret = -1, rc;
> -    size_t i = 0;
> -    int timeout = 3;
>      bool vmm_appeared = false;
>  
>      logfile = g_strdup_printf("%s/%s.log", BHYVE_LOG_DIR, vm->def->name);
> @@ -233,17 +232,16 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver,
>          goto cleanup;
>      }
>  
> -    virBufferAsprintf(&domain_vmm_path_buf, "/dev/vmm/%s", vm->def->name);
> -    domain_vmm_path = virBufferContentAndReset(&domain_vmm_path_buf);
> +    domain_vmm_path = g_strdup_printf("/dev/vmm/%s", vm->def->name);
>  
> -    do {
> +    if (virTimeBackOffStart(&timebackoff, 1, 1000) < 0)
> +        goto cleanup;
> +    while (virTimeBackOffWait(&timebackoff)) {
>          if (virFileExists(domain_vmm_path)) {
>              vmm_appeared = true;
>              break;
>          }
> -
> -        g_usleep(500000);
> -    } while (!vmm_appeared && ++i < timeout);
> +    }
>  
>      if (!vmm_appeared) {
>          virReportError(VIR_ERR_INTERNAL_ERROR,
> 
> 
> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
> 
> Michal

Thanks, I was looking for something like that, but somehow overlooked
the virTimeBackOff* functions.

I squashed that in and changed 1000 to 1500 in virTimeBackOffStart() to
keep the original 1.5 seconds timeout. Then I thought a bit more about
it and bumped it to 5000 to add some safety gap. I'll test it a bit more
and merge after the freeze.

By the way, any pointers how to improve logging for these situation,
i.e. so I could somehow keep the error log not only in the actual log
FD, but also accessible in the code, so I could report that to a user?

Thanks,
Roman