[PATCH] libxl: Defer starting job until migration perform phase

Jim Fehlig posted 1 patch 10 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20230707201754.17060-1-jfehlig@suse.com
src/libxl/libxl_migration.c | 37 +++++++++++++++----------------------
1 file changed, 15 insertions(+), 22 deletions(-)
[PATCH] libxl: Defer starting job until migration perform phase
Posted by Jim Fehlig 10 months ago
Currently the libxl driver calls virDomainObjBeginJob in the begin
phase of migration on the source host, expecting to call virDomainObjEndJob
in the confirm phase. But the confirm phase is never executed if the
prepare or perform phases fail. The job was already being terminated
in the perform phase in case of failure, but there is no way to do that
if the prepare phase fails since it runs on the destination host.

Move the call to virDomainObjBeginJob to the perform phase. It can be
terminated there in the event of failure. On success, the confirm phase
will be executed, ensuring the job is terminated.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/libxl/libxl_migration.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index 961815f9f7..126265f895 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -378,19 +378,11 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
     virDomainDef *def;
     char *xml = NULL;
 
-    /*
-     * In the case of successful migration, a job is started here and
-     * terminated in the confirm phase. Errors in the begin or perform
-     * phase will also terminate the job.
-     */
-    if (virDomainObjBeginJob(vm, VIR_JOB_MODIFY) < 0)
-        goto cleanup;
-
     if (!(mig = libxlMigrationCookieNew(vm)))
-        goto endjob;
+        goto cleanup;
 
     if (libxlMigrationBakeCookie(mig, cookieout, cookieoutlen) < 0)
-        goto endjob;
+        goto cleanup;
 
     if (xmlin) {
         if (!(tmpdef = virDomainDefParseString(xmlin,
@@ -398,10 +390,10 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
                                                NULL,
                                                VIR_DOMAIN_DEF_PARSE_INACTIVE |
                                                VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATE)))
-            goto endjob;
+            goto cleanup;
 
         if (!libxlDomainDefCheckABIStability(driver, vm->def, tmpdef))
-            goto endjob;
+            goto cleanup;
 
         def = tmpdef;
     } else {
@@ -409,15 +401,9 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
     }
 
     if (!libxlDomainMigrationIsAllowed(def))
-        goto endjob;
+        goto cleanup;
 
     xml = virDomainDefFormat(def, driver->xmlopt, VIR_DOMAIN_DEF_FORMAT_SECURE);
-    /* Valid xml means success! EndJob in the confirm phase */
-    if (xml)
-        goto cleanup;
-
- endjob:
-    virDomainObjEndJob(vm);
 
  cleanup:
     libxlMigrationCookieFree(mig);
@@ -1209,6 +1195,13 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivate *driver,
     sockfd = virNetSocketDupFD(sock, true);
     virObjectUnref(sock);
 
+    /*
+     * Start a job for the migration and terminate it in the confirm
+     * phase, taking care to handle error cases.
+     */
+    if (virDomainObjBeginJob(vm, VIR_JOB_MODIFY) < 0)
+        return -1;
+
     /* suspend vm and send saved data to dst through socket fd */
     virObjectUnlock(vm);
     ret = libxlDoMigrateSrcSend(driver, vm, flags, sockfd);
@@ -1223,8 +1216,8 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivate *driver,
         }
     } else {
         /*
-         * Confirm phase will not be executed if perform fails. End the
-         * job started in begin phase.
+         * End the job since the confirm phase will not be executed if
+         * perform fails.
          */
         virDomainObjEndJob(vm);
     }
@@ -1383,7 +1376,7 @@ libxlDomainMigrationSrcConfirm(libxlDriverPrivate *driver,
     ret = 0;
 
  cleanup:
-    /* EndJob for corresponding BeginJob in begin phase */
+    /* EndJob for corresponding BeginJob in perform phase */
     virDomainObjEndJob(vm);
     virObjectEventStateQueue(driver->domainEventState, event);
     virObjectUnref(cfg);
-- 
2.41.0
Re: [PATCH] libxl: Defer starting job until migration perform phase
Posted by Jim Fehlig 9 months, 4 weeks ago
On 7/7/23 14:17, Jim Fehlig wrote:
> Currently the libxl driver calls virDomainObjBeginJob in the begin
> phase of migration on the source host, expecting to call virDomainObjEndJob
> in the confirm phase. But the confirm phase is never executed if the
> prepare or perform phases fail. The job was already being terminated
> in the perform phase in case of failure, but there is no way to do that
> if the prepare phase fails since it runs on the destination host.
> 
> Move the call to virDomainObjBeginJob to the perform phase. It can be
> terminated there in the event of failure. On success, the confirm phase
> will be executed, ensuring the job is terminated.

After taking another look at the job handling code in libxl_migration.c, and 
reviewing the documentation for VIR_MIGRATE_CHANGE_PROTECTION, I realize the 
libxl driver is making a weak attempt at implementing CHANGE_PROTECTION without 
advertising it. I need to do a bit more thinking/testing if the current attempt 
is strong enough, and if so drop this patch in favor of simply advertising 
support for CHANGE_PROTECTION.

Regards,
Jim

> 
> Signed-off-by: Jim Fehlig <jfehlig@suse.com>
> ---
>   src/libxl/libxl_migration.c | 37 +++++++++++++++----------------------
>   1 file changed, 15 insertions(+), 22 deletions(-)
> 
> diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
> index 961815f9f7..126265f895 100644
> --- a/src/libxl/libxl_migration.c
> +++ b/src/libxl/libxl_migration.c
> @@ -378,19 +378,11 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
>       virDomainDef *def;
>       char *xml = NULL;
>   
> -    /*
> -     * In the case of successful migration, a job is started here and
> -     * terminated in the confirm phase. Errors in the begin or perform
> -     * phase will also terminate the job.
> -     */
> -    if (virDomainObjBeginJob(vm, VIR_JOB_MODIFY) < 0)
> -        goto cleanup;
> -
>       if (!(mig = libxlMigrationCookieNew(vm)))
> -        goto endjob;
> +        goto cleanup;
>   
>       if (libxlMigrationBakeCookie(mig, cookieout, cookieoutlen) < 0)
> -        goto endjob;
> +        goto cleanup;
>   
>       if (xmlin) {
>           if (!(tmpdef = virDomainDefParseString(xmlin,
> @@ -398,10 +390,10 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
>                                                  NULL,
>                                                  VIR_DOMAIN_DEF_PARSE_INACTIVE |
>                                                  VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATE)))
> -            goto endjob;
> +            goto cleanup;
>   
>           if (!libxlDomainDefCheckABIStability(driver, vm->def, tmpdef))
> -            goto endjob;
> +            goto cleanup;
>   
>           def = tmpdef;
>       } else {
> @@ -409,15 +401,9 @@ libxlDomainMigrationSrcBegin(virConnectPtr conn,
>       }
>   
>       if (!libxlDomainMigrationIsAllowed(def))
> -        goto endjob;
> +        goto cleanup;
>   
>       xml = virDomainDefFormat(def, driver->xmlopt, VIR_DOMAIN_DEF_FORMAT_SECURE);
> -    /* Valid xml means success! EndJob in the confirm phase */
> -    if (xml)
> -        goto cleanup;
> -
> - endjob:
> -    virDomainObjEndJob(vm);
>   
>    cleanup:
>       libxlMigrationCookieFree(mig);
> @@ -1209,6 +1195,13 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivate *driver,
>       sockfd = virNetSocketDupFD(sock, true);
>       virObjectUnref(sock);
>   
> +    /*
> +     * Start a job for the migration and terminate it in the confirm
> +     * phase, taking care to handle error cases.
> +     */
> +    if (virDomainObjBeginJob(vm, VIR_JOB_MODIFY) < 0)
> +        return -1;
> +
>       /* suspend vm and send saved data to dst through socket fd */
>       virObjectUnlock(vm);
>       ret = libxlDoMigrateSrcSend(driver, vm, flags, sockfd);
> @@ -1223,8 +1216,8 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivate *driver,
>           }
>       } else {
>           /*
> -         * Confirm phase will not be executed if perform fails. End the
> -         * job started in begin phase.
> +         * End the job since the confirm phase will not be executed if
> +         * perform fails.
>            */
>           virDomainObjEndJob(vm);
>       }
> @@ -1383,7 +1376,7 @@ libxlDomainMigrationSrcConfirm(libxlDriverPrivate *driver,
>       ret = 0;
>   
>    cleanup:
> -    /* EndJob for corresponding BeginJob in begin phase */
> +    /* EndJob for corresponding BeginJob in perform phase */
>       virDomainObjEndJob(vm);
>       virObjectEventStateQueue(driver->domainEventState, event);
>       virObjectUnref(cfg);
Re: [PATCH] libxl: Defer starting job until migration perform phase
Posted by Jim Fehlig 9 months, 3 weeks ago
On 7/10/23 17:00, Jim Fehlig wrote:
> On 7/7/23 14:17, Jim Fehlig wrote:
>> Currently the libxl driver calls virDomainObjBeginJob in the begin
>> phase of migration on the source host, expecting to call virDomainObjEndJob
>> in the confirm phase. But the confirm phase is never executed if the
>> prepare or perform phases fail. The job was already being terminated
>> in the perform phase in case of failure, but there is no way to do that
>> if the prepare phase fails since it runs on the destination host.
>>
>> Move the call to virDomainObjBeginJob to the perform phase. It can be
>> terminated there in the event of failure. On success, the confirm phase
>> will be executed, ensuring the job is terminated.
> 
> After taking another look at the job handling code in libxl_migration.c, and 
> reviewing the documentation for VIR_MIGRATE_CHANGE_PROTECTION, I realize the 
> libxl driver is making a weak attempt at implementing CHANGE_PROTECTION without 
> advertising it. I need to do a bit more thinking/testing if the current attempt 
> is strong enough, and if so drop this patch in favor of simply advertising 
> support for CHANGE_PROTECTION.

FTR, self-NACK this patch in favor of

https://listman.redhat.com/archives/libvir-list/2023-July/240681.html

Regards,
Jim