[libvirt] [RFC] New domain job control and stat APIs

Peter Krempa posted 1 patch 4 years, 9 months ago
Test syntax-check passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/b6567a036c7a24ef88aeb8195d5279fc72536a7d.1562761598.git.pkrempa@redhat.com
include/libvirt/libvirt-domain.h | 91 +++++++++++++++++++++++++++++++
src/libvirt-domain.c             | 94 ++++++++++++++++++++++++++++++++
2 files changed, 185 insertions(+)
[libvirt] [RFC] New domain job control and stat APIs
Posted by Peter Krempa 4 years, 9 months ago
Currently we don't have a consolidated approach for managing
asynchronous long-running domain jobs. Historically there were
long-running jobs which interlocked with each other and thus there was
only one such job possible at given time (migration, save, restore, dump)

These jobs have a not very flexible set of APIs:
virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.

These don't really allow selecting which job to terminate since there's
only one, thus if we wanted to add different kinds of jobs which not
necessarily interlock but are able to run in parallel we had to
introduce another set of APIs.

This resulted into creation of block job APIs:
virDomainBlockJobAbort, virDomainGetBlockJobInfo

These allow parallel jobs (discriminated by disk to which the job
belongs) but are not universal and nor allow parallel jobs on a single
disk.

Similarly blockjobs can also become detached from the disk e.g. if the
guest unplugs the disk fronted. That way the job would linger in a limbo
and would not be controllable. (This is certainly a possibility with
-blockdev).

With -blockdev we also get a potentially long-running blockdev-create
job which is not bound to any disk as part of kicking of a snapshot or
block copy job. This one might also get stuck and in the current state
is not really controllable.

Additionally the upcomming block-backup job will be a combination of the
above. It's a job which spans multiple disks (thus not really a block
job in libvirt terminology) but not a domain job either as there
can be potentially more than one block backup job. The proposal for
block-backup introduces it's own single-purpose set of APIs for managing
the backup job only, but abuses the block job and domain job events to
distribute the async state updates.

With this series I want to introduce a set of APIs for managing the jobs
which are designed to be universal enough and a new event so that noting
will try to retrofit onto existing infrastructure.

An example of the job XML would be:

<job type='block-commit-active' state='ready'>
  <config>
    <disk>vda</disk>
    <top>vda[1]</top>
    <base>vda[5]</base>
  </config>
  <stats>
   <current>12345</current>
   <end>12345</current>
  </stats>
</job>

but this will be mostly a topic for the second part of this excercise
after we discuss the APIs.

The new infrastructure will also allow adding a flag for all the
existing APIs which kick-off a job so that the job will persist even
after it finishes. This will also properly implement the statistics for
a finished migration and similar.

Obviously we will need to take special care when wiring up these so that
the old APIs work for old situations and also the events are reported
correctly.

The initial idea would be to implement the stats XML both for the domain
jobs (migration, dump) and blockjobs to simplify the job for mgmt apps
so that they won't have to infer whether the given job type is already
reported in the new API.

Additionally we can also implement flags for the XML getter API that
will skip the stats gathering as that may require monitor interactions.
Also one possibility would be to return an abbreviated XML in the
listing API.
---
 include/libvirt/libvirt-domain.h | 91 +++++++++++++++++++++++++++++++
 src/libvirt-domain.c             | 94 ++++++++++++++++++++++++++++++++
 2 files changed, 185 insertions(+)

diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index 2dbd74d4f3..dac77771be 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -4485,6 +4485,28 @@ typedef void (*virConnectDomainEventBlockThresholdCallback)(virConnectPtr conn,
                                                             unsigned long long excess,
                                                             void *opaque);

+/**
+ * virConnectDomainEventJobStateCallback:
+ * @conn: connection object
+ * @dom: domain on which the event occurred
+ * @jobname: name of job which changed state
+ * @jobtype: type of the job
+ * @newstate: the new state the job entered
+ * @opaque: application specified data
+ *
+ * The callback occurs when a long running domain job (see virDomainJobList)
+ * changes state.
+ *
+ * The callback signature to use when registering for an event of type
+ * VIR_DOMAIN_EVENT_ID_JOB_STATE with virConnectDomainEventRegisterAny()
+ */
+typedef void (*virConnectDomainEventJobStateCallback)(virConnectPtr conn,
+                                                      virDomainPtr dom,
+                                                      const char *jobname,
+                                                      virDomainJobType jobtype,
+                                                      virDomainJobState newstate,
+                                                      void *opaque);
+
 /**
  * VIR_DOMAIN_EVENT_CALLBACK:
  *
@@ -4527,6 +4549,7 @@ typedef enum {
     VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED = 22, /* virConnectDomainEventDeviceRemovalFailedCallback */
     VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */
     VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* virConnectDomainEventBlockThresholdCallback */
+    VIR_DOMAIN_EVENT_ID_JOB_STATE = 25, /* virConnectDomainEventJobStateCallback */

 # ifdef VIR_ENUM_SENTINELS
     VIR_DOMAIN_EVENT_ID_LAST
@@ -4896,4 +4919,72 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
                                    int *nparams,
                                    unsigned int flags);

+typedef enum {
+    VIR_DOMAIN_JOB_TYPE_NONE = 0,
+    VIR_DOMAIN_JOB_TYPE_MIGRATION = 1,
+    VIR_DOMAIN_JOB_TYPE_BLOCK_PULL = 2,
+    [...]
+
+# ifdef VIR_ENUM_SENTINELS
+    VIR_DOMAIN_JOB_TYPE_LAST
+# endif
+} virDomainJobType;
+
+
+typedef enum {
+    VIR_DOMAIN_JOB_STATE_NONE = 0, /* unknown job state */
+    VIR_DOMAIN_JOB_STATE_RUNNING = 1, /* job is currently running */
+    VIR_DOMAIN_JOB_STATE_READY = 2, /* job reached a synchronized state and may be finalized */
+    VIR_DOMAIN_JOB_STATE_FAILED = 3, /* job has failed */
+    VIR_DOMAIN_JOB_STATE_COMPLETED = 4, /* job has completed successfully */
+    VIR_DOMAIN_JOB_STATE_ABORTED = 5, /* job has been aborted */
+    [...]
+
+# ifdef VIR_ENUM_SENTINELS
+    VIR_DOMAIN_JOB_STATE_LAST
+# endif
+} virDomainJobState;
+
+
+typedef struct _virDomainJob virDomainJob;
+typedef virDomainJob *virDomainJobPtr;
+struct _virDomainJob {
+    char *name;
+    virDomainJobType type;
+    virDomainJobState state;
+
+    /* possibly overkill? - currently empty*/
+    virTypedParameterPtr data;
+    size_t ndata;
+};
+
+
+void virDomainJobFree(virDomainJobPtr job);
+
+int virDomainJobList(virDomainPtr domain,
+                     virDomainJobPtr **jobs,
+                     unsigned int flags);
+
+int virDomainJobGetXMLDesc(virDomainPtr domain,
+                           const char *jobname,
+                           unsigned int flags);
+
+typedef enum {
+    VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
+    VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
+    VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
+    VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
+    VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
+    VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
+
+# ifdef VIR_ENUM_SENTINELS
+    VIR_DOMAIN_JOB_CONTROL_OPERATION_LAST
+# endif
+} virDomainJobControlOperation;
+
+int virDomainJobControl(virDomainPtr domain,
+                        const char *jobname,
+                        virDomainJobControlOperation op,
+                        unsigned int flags);
+
 #endif /* LIBVIRT_DOMAIN_H */
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 3d12e7c125..aa5571818f 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -12362,3 +12362,97 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
     virDispatchError(domain->conn);
     return -1;
 }
+
+
+/**
+ * virDomainJobFree:
+ * @job: pointer to virDomainJob object
+ *
+ * Frees the memory associated with @job.
+ */
+void
+virDomainJobFree(virDomainJobPtr job)
+{
+    [...]
+}
+
+
+/**
+ * virDomainJobList:
+ * @domain: pointer to a domain
+ * @jobs: Pointer to a variable to store the array containing job description
+ *        objects or NULL if the list is not required.
+ * @flags: optional flags (currently unused, callers should always pass 0)
+ *
+ * Collects a list of background jobs associated with @domain and returns it in
+ * an allocated array of virDomainJobPtr structs. The jobs include migration jobs
+ * block jobs and any other possibly long running asynchronous operation.
+ *
+ * The caller is responsible for freeing the members of the returned @jobs array
+ * using virDomainJobFree and the whole array using free();
+ *
+ * Returns the number of jobs running on @domain on success (optionally filling
+ * @jobs if non-NULL) or -1 on error (value of @jobs is set to NULL).
+ */
+int
+virDomainJobList(virDomainPtr domain,
+                 virDomainJobPtr **jobs,
+                 unsigned int flags)
+{
+    [...]
+}
+
+
+/**
+ * virDomainJobGetXMLDesc:
+ * @domain: pointer to a domain
+ * @jobname: name of the domain job to operate on
+ * @flags: optional flags (currently unused, callers should always pass 0)
+ *
+ * Returns a string containing an UTF-8 encoded XML document describing the
+ * configuration, state and progress of domain job @jobname. Please refer to the
+ * job XML documentation for information on the format of the returned document.
+ *
+ * In case of error NULL is returned. Caller is responsible for free()-ing the
+ * returned string.
+ */
+char *
+virDomainJobGetXMLDesc(virDomainPtr domain,
+                       const char *jobname,
+                       unsigned int flags)
+{
+    [...]
+}
+
+
+/**
+ * virDomainJobControl:
+ * @domain: pointer to a domain
+ * @jobname: name of the domain job to operate on
+ * @op: operation to perform on @jobname
+ * @flags: optional flags (currently unused, callers should always pass 0)
+ *
+ * Requests change of state of @jobname. Note that it depends on the type of
+ * @jobname whether @op is supported.
+ *
+ * VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE are supported only with
+ * VIR_DOMAIN_JOB_TYPE_BLOCK_COPY and VIR_DOMAIN_JOB_TYPE_BLOCK_COMMIT_ACTIVE.
+ *
+ * VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE and VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME
+ * are supported only with VIR_DOMAIN_JOB_TYPE_BLOCK_* type jobs.
+ *
+ * The request to change state is asynchronous and callers should install an
+ * event callback for VIR_DOMAIN_EVENT_ID_JOB_STATE if they wish to be notified
+ * when the state change occured. (Note that the callback may fire before this
+ * API returns).
+ *
+ * Returns 0 on success or -1 on error.
+ */
+int
+virDomainJobControl(virDomainPtr domain,
+                    const char *jobname,
+                    virDomainJobControlOperations op,
+                    unsigned int flags)
+{
+    [...]
+}
-- 
2.21.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC] New domain job control and stat APIs
Posted by Daniel P. Berrangé 4 years, 8 months ago
On Wed, Jul 10, 2019 at 02:27:21PM +0200, Peter Krempa wrote:
> Currently we don't have a consolidated approach for managing
> asynchronous long-running domain jobs. Historically there were
> long-running jobs which interlocked with each other and thus there was
> only one such job possible at given time (migration, save, restore, dump)
> 
> These jobs have a not very flexible set of APIs:
> virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.

Yeah the person who designed virDomainGetJobInfo was not doing
a very good job of it ;-P

> With this series I want to introduce a set of APIs for managing the jobs
> which are designed to be universal enough and a new event so that noting
> will try to retrofit onto existing infrastructure.
> 
> An example of the job XML would be:
> 
> <job type='block-commit-active' state='ready'>

I'd expect to see a unique identifier of some kind here.

>   <config>
>     <disk>vda</disk>
>     <top>vda[1]</top>
>     <base>vda[5]</base>
>   </config>
>   <stats>
>    <current>12345</current>
>    <end>12345</current>
>   </stats>
> </job>
> 
> but this will be mostly a topic for the second part of this excercise
> after we discuss the APIs.

> +typedef enum {
> +    VIR_DOMAIN_JOB_TYPE_NONE = 0,
> +    VIR_DOMAIN_JOB_TYPE_MIGRATION = 1,
> +    VIR_DOMAIN_JOB_TYPE_BLOCK_PULL = 2,
> +    [...]
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_TYPE_LAST
> +# endif
> +} virDomainJobType;
> +
> +
> +typedef enum {
> +    VIR_DOMAIN_JOB_STATE_NONE = 0, /* unknown job state */
> +    VIR_DOMAIN_JOB_STATE_RUNNING = 1, /* job is currently running */
> +    VIR_DOMAIN_JOB_STATE_READY = 2, /* job reached a synchronized state and may be finalized */
> +    VIR_DOMAIN_JOB_STATE_FAILED = 3, /* job has failed */
> +    VIR_DOMAIN_JOB_STATE_COMPLETED = 4, /* job has completed successfully */
> +    VIR_DOMAIN_JOB_STATE_ABORTED = 5, /* job has been aborted */
> +    [...]
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_STATE_LAST
> +# endif
> +} virDomainJobState;
> +
> +
> +typedef struct _virDomainJob virDomainJob;
> +typedef virDomainJob *virDomainJobPtr;
> +struct _virDomainJob {
> +    char *name;
> +    virDomainJobType type;
> +    virDomainJobState state;
> +
> +    /* possibly overkill? - currently empty*/
> +    virTypedParameterPtr data;
> +    size_t ndata;
> +};

I'd have a preference for keeping this as an opaque
struct not exposed in the public header, and making
it a first class object. So instead have accessors

  char * virDomainJobGetName(virDomainJobPtr job);
  virDomainJobType virDOmainJobGetType(virDomainJobPtr job);
  virDomainJobState virDOmainJobGetState(virDomainJobPtr job);

which fetch from the local struct. The local struct would
also need to contain a reference to the virDomainPtrpass over

This avoids the query you have about virTypedParameterPtr
inclusion, as we can ignore it now, and if we get a compelling
need, we then just add a remotable method:

   virDomainJobGetStats(virDomainJobPtr job,
                        virTypedParameterPtr **params,
			unsigned int *nparams);

> +void virDomainJobFree(virDomainJobPtr job);
> +
> +int virDomainJobList(virDomainPtr domain,
> +                     virDomainJobPtr **jobs,
> +                     unsigned int flags);
> +
> +int virDomainJobGetXMLDesc(virDomainPtr domain,
> +                           const char *jobname,
> +                           unsigned int flags);

This would become

 int virDomainJobGetXMLDesc(virDomainJobPtr job,
                            unsigned int flags);


> +
> +typedef enum {
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_LAST
> +# endif
> +} virDomainJobControlOperation;
> +
> +int virDomainJobControl(virDomainPtr domain,
> +                        const char *jobname,
> +                        virDomainJobControlOperation op,
> +                        unsigned int flags);

And again change to

 int virDomainJobControl(virDomainJobPtr job,
                         virDomainJobControlOperation op,
                         unsigned int flags);


I think I'd also have a preference to identify a job based on a UUID too,
as that gives stronger uniqueness. eg when debugging there's no risk of
confusing two jobs which are against different domains but happened to get
the same name. 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC] New domain job control and stat APIs
Posted by Pavel Hrdina 4 years, 8 months ago
On Wed, Jul 24, 2019 at 03:43:14PM +0100, Daniel P. Berrangé wrote:

[...]

> > +typedef struct _virDomainJob virDomainJob;
> > +typedef virDomainJob *virDomainJobPtr;
> > +struct _virDomainJob {
> > +    char *name;
> > +    virDomainJobType type;
> > +    virDomainJobState state;
> > +
> > +    /* possibly overkill? - currently empty*/
> > +    virTypedParameterPtr data;
> > +    size_t ndata;
> > +};
> 
> I'd have a preference for keeping this as an opaque
> struct not exposed in the public header, and making
> it a first class object. So instead have accessors
> 
>   char * virDomainJobGetName(virDomainJobPtr job);
>   virDomainJobType virDOmainJobGetType(virDomainJobPtr job);
>   virDomainJobState virDOmainJobGetState(virDomainJobPtr job);
> 
> which fetch from the local struct. The local struct would
> also need to contain a reference to the virDomainPtrpass over
> 
> This avoids the query you have about virTypedParameterPtr
> inclusion, as we can ignore it now, and if we get a compelling
> need, we then just add a remotable method:
> 
>    virDomainJobGetStats(virDomainJobPtr job,
>                         virTypedParameterPtr **params,
> 			unsigned int *nparams);

Agreed to make it first class object.  Shouldn't we rename it to
virJob* as well or do we want to have it tied to domains only?

> > +void virDomainJobFree(virDomainJobPtr job);
> > +
> > +int virDomainJobList(virDomainPtr domain,
> > +                     virDomainJobPtr **jobs,
> > +                     unsigned int flags);
> > +
> > +int virDomainJobGetXMLDesc(virDomainPtr domain,
> > +                           const char *jobname,
> > +                           unsigned int flags);
> 
> This would become
> 
>  int virDomainJobGetXMLDesc(virDomainJobPtr job,
>                             unsigned int flags);
> 
> 
> > +
> > +typedef enum {
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
> > +
> > +# ifdef VIR_ENUM_SENTINELS
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_LAST
> > +# endif
> > +} virDomainJobControlOperation;
> > +
> > +int virDomainJobControl(virDomainPtr domain,
> > +                        const char *jobname,
> > +                        virDomainJobControlOperation op,
> > +                        unsigned int flags);
> 
> And again change to
> 
>  int virDomainJobControl(virDomainJobPtr job,
>                          virDomainJobControlOperation op,
>                          unsigned int flags);
> 
> 
> I think I'd also have a preference to identify a job based on a UUID too,
> as that gives stronger uniqueness. eg when debugging there's no risk of
> confusing two jobs which are against different domains but happened to get
> the same name. 

Agreed, we need to have a unique ID and we should definitely avoid any
magic with ID=0 to pick the "current" job as that would only lead to lot
of issues.

Overall looks good.

Pavel
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC] New domain job control and stat APIs
Posted by Daniel P. Berrangé 4 years, 8 months ago
On Thu, Jul 25, 2019 at 01:52:01PM +0200, Pavel Hrdina wrote:
> On Wed, Jul 24, 2019 at 03:43:14PM +0100, Daniel P. Berrangé wrote:
> 
> [...]
> 
> > > +typedef struct _virDomainJob virDomainJob;
> > > +typedef virDomainJob *virDomainJobPtr;
> > > +struct _virDomainJob {
> > > +    char *name;
> > > +    virDomainJobType type;
> > > +    virDomainJobState state;
> > > +
> > > +    /* possibly overkill? - currently empty*/
> > > +    virTypedParameterPtr data;
> > > +    size_t ndata;
> > > +};
> > 
> > I'd have a preference for keeping this as an opaque
> > struct not exposed in the public header, and making
> > it a first class object. So instead have accessors
> > 
> >   char * virDomainJobGetName(virDomainJobPtr job);
> >   virDomainJobType virDOmainJobGetType(virDomainJobPtr job);
> >   virDomainJobState virDOmainJobGetState(virDomainJobPtr job);
> > 
> > which fetch from the local struct. The local struct would
> > also need to contain a reference to the virDomainPtrpass over
> > 
> > This avoids the query you have about virTypedParameterPtr
> > inclusion, as we can ignore it now, and if we get a compelling
> > need, we then just add a remotable method:
> > 
> >    virDomainJobGetStats(virDomainJobPtr job,
> >                         virTypedParameterPtr **params,
> > 			unsigned int *nparams);
> 
> Agreed to make it first class object.  Shouldn't we rename it to
> virJob* as well or do we want to have it tied to domains only?

I guess the obvious use case would be in the storage pool APIs
which already have long running background tasks where this would
be useful.

The difficulty though is that we then need APIs for the job
dispatched to different drivers depending on the instance of
the job. eg one job needs to be processed by QEMU driver
whle the next job needs to be processed by the storage driver.
This could get quite fugly with the way we do API dispatch.
We already have this with the virStream APIs of course, but
I think they are one of the more horrible parts of our internal
framework.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC] New domain job control and stat APIs
Posted by Eric Blake 4 years, 9 months ago
On 7/10/19 7:27 AM, Peter Krempa wrote:
> Currently we don't have a consolidated approach for managing
> asynchronous long-running domain jobs. Historically there were
> long-running jobs which interlocked with each other and thus there was
> only one such job possible at given time (migration, save, restore, dump)

Yes, I agree with the problem statement. In fact, one of my questions
when starting on incremental backup was whether I'd have to implement
any of this first, or punt it to later. Well, here we are; it's later,
and I still don't have incremental backup in.

> 
> These jobs have a not very flexible set of APIs:
> virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.
> 
> These don't really allow selecting which job to terminate since there's
> only one, thus if we wanted to add different kinds of jobs which not
> necessarily interlock but are able to run in parallel we had to
> introduce another set of APIs.
> 
> This resulted into creation of block job APIs:
> virDomainBlockJobAbort, virDomainGetBlockJobInfo
> 
> These allow parallel jobs (discriminated by disk to which the job
> belongs) but are not universal and nor allow parallel jobs on a single
> disk.
> 
> Similarly blockjobs can also become detached from the disk e.g. if the
> guest unplugs the disk fronted. That way the job would linger in a limbo
> and would not be controllable. (This is certainly a possibility with
> -blockdev).
> 
> With -blockdev we also get a potentially long-running blockdev-create
> job which is not bound to any disk as part of kicking of a snapshot or
> block copy job. This one might also get stuck and in the current state
> is not really controllable.
> 
> Additionally the upcomming block-backup job will be a combination of the
> above. It's a job which spans multiple disks (thus not really a block
> job in libvirt terminology) but not a domain job either as there
> can be potentially more than one block backup job. The proposal for
> block-backup introduces it's own single-purpose set of APIs for managing
> the backup job only, but abuses the block job and domain job events to
> distribute the async state updates.

At the bare minimum, a push job implementation has to send async state
update events, but a pull job does not. If the backup API goes into
5.6.0 but not your new job control APIs, we can limit the qemu
implementation to JUST pull backups for now (the XML has been shown to
be extensible to add in push support once we also have sane APIs for
managing async state updates with push support).

> 
> With this series I want to introduce a set of APIs for managing the jobs
> which are designed to be universal enough and a new event so that noting
> will try to retrofit onto existing infrastructure.
> 
> An example of the job XML would be:
> 
> <job type='block-commit-active' state='ready'>
>   <config>
>     <disk>vda</disk>
>     <top>vda[1]</top>
>     <base>vda[5]</base>
>   </config>

Where what forms a valid <config> is dependent on the type='...' of the
<job>, correct?

>   <stats>
>    <current>12345</current>
>    <end>12345</current>
>   </stats>
> </job>

Looks reasonable.

> 
> but this will be mostly a topic for the second part of this excercise
> after we discuss the APIs.
> 
> The new infrastructure will also allow adding a flag for all the
> existing APIs which kick-off a job so that the job will persist even
> after it finishes. This will also properly implement the statistics for
> a finished migration and similar.
> 
> Obviously we will need to take special care when wiring up these so that
> the old APIs work for old situations and also the events are reported
> correctly.
> 
> The initial idea would be to implement the stats XML both for the domain
> jobs (migration, dump) and blockjobs to simplify the job for mgmt apps
> so that they won't have to infer whether the given job type is already
> reported in the new API.
> 
> Additionally we can also implement flags for the XML getter API that
> will skip the stats gathering as that may require monitor interactions.
> Also one possibility would be to return an abbreviated XML in the
> listing API.

Makes sense.

> ---
>  include/libvirt/libvirt-domain.h | 91 +++++++++++++++++++++++++++++++
>  src/libvirt-domain.c             | 94 ++++++++++++++++++++++++++++++++
>  2 files changed, 185 insertions(+)
> 
> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> index 2dbd74d4f3..dac77771be 100644
> --- a/include/libvirt/libvirt-domain.h
> +++ b/include/libvirt/libvirt-domain.h
> @@ -4485,6 +4485,28 @@ typedef void (*virConnectDomainEventBlockThresholdCallback)(virConnectPtr conn,
>                                                              unsigned long long excess,
>                                                              void *opaque);
> 
> +/**
> + * virConnectDomainEventJobStateCallback:
> + * @conn: connection object
> + * @dom: domain on which the event occurred
> + * @jobname: name of job which changed state
> + * @jobtype: type of the job
> + * @newstate: the new state the job entered
> + * @opaque: application specified data
> + *
> + * The callback occurs when a long running domain job (see virDomainJobList)
> + * changes state.
> + *
> + * The callback signature to use when registering for an event of type
> + * VIR_DOMAIN_EVENT_ID_JOB_STATE with virConnectDomainEventRegisterAny()
> + */
> +typedef void (*virConnectDomainEventJobStateCallback)(virConnectPtr conn,
> +                                                      virDomainPtr dom,
> +                                                      const char *jobname,
> +                                                      virDomainJobType jobtype,
> +                                                      virDomainJobState newstate,
> +                                                      void *opaque);
> +
>  /**
>   * VIR_DOMAIN_EVENT_CALLBACK:
>   *
> @@ -4527,6 +4549,7 @@ typedef enum {
>      VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED = 22, /* virConnectDomainEventDeviceRemovalFailedCallback */
>      VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */
>      VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* virConnectDomainEventBlockThresholdCallback */
> +    VIR_DOMAIN_EVENT_ID_JOB_STATE = 25, /* virConnectDomainEventJobStateCallback */
> 
>  # ifdef VIR_ENUM_SENTINELS
>      VIR_DOMAIN_EVENT_ID_LAST
> @@ -4896,4 +4919,72 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
>                                     int *nparams,
>                                     unsigned int flags);
> 
> +typedef enum {
> +    VIR_DOMAIN_JOB_TYPE_NONE = 0,
> +    VIR_DOMAIN_JOB_TYPE_MIGRATION = 1,
> +    VIR_DOMAIN_JOB_TYPE_BLOCK_PULL = 2,
> +    [...]
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_TYPE_LAST
> +# endif
> +} virDomainJobType;
> +
> +
> +typedef enum {
> +    VIR_DOMAIN_JOB_STATE_NONE = 0, /* unknown job state */
> +    VIR_DOMAIN_JOB_STATE_RUNNING = 1, /* job is currently running */
> +    VIR_DOMAIN_JOB_STATE_READY = 2, /* job reached a synchronized state and may be finalized */
> +    VIR_DOMAIN_JOB_STATE_FAILED = 3, /* job has failed */
> +    VIR_DOMAIN_JOB_STATE_COMPLETED = 4, /* job has completed successfully */
> +    VIR_DOMAIN_JOB_STATE_ABORTED = 5, /* job has been aborted */
> +    [...]
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_STATE_LAST
> +# endif
> +} virDomainJobState;

Not all job types will utilize all of the possible job states, but I
don't see that as an issue.

> +
> +
> +typedef struct _virDomainJob virDomainJob;
> +typedef virDomainJob *virDomainJobPtr;
> +struct _virDomainJob {
> +    char *name;
> +    virDomainJobType type;
> +    virDomainJobState state;
> +
> +    /* possibly overkill? - currently empty*/
> +    virTypedParameterPtr data;

Can the XML can provide all the same information as what you would place
in typed parameters?

> +    size_t ndata;
> +};
> +
> +
> +void virDomainJobFree(virDomainJobPtr job);
> +
> +int virDomainJobList(virDomainPtr domain,
> +                     virDomainJobPtr **jobs,
> +                     unsigned int flags);
> +
> +int virDomainJobGetXMLDesc(virDomainPtr domain,
> +                           const char *jobname,
> +                           unsigned int flags);
> +
> +typedef enum {
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_LAST
> +# endif
> +} virDomainJobControlOperation;
> +
> +int virDomainJobControl(virDomainPtr domain,
> +                        const char *jobname,

Do we want const char *jobname or int jobid?  Which is easier to work
with? My backup code currently proposed the use of int jobid, switching
to a job name may be more effort. Also, I was able to document that a
jobid==0 implies "the current backup job, if there is exactly one",
regardless of what other job id it may have (I guess a jobname==NULL
would serve the same purpose - but passing around char* instead of int
means you also have to start worrying about malloc lifetimes).

> +                        virDomainJobControlOperation op,
> +                        unsigned int flags);
> +
>  #endif /* LIBVIRT_DOMAIN_H */
> diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
> index 3d12e7c125..aa5571818f 100644
> --- a/src/libvirt-domain.c
> +++ b/src/libvirt-domain.c
> @@ -12362,3 +12362,97 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
>      virDispatchError(domain->conn);
>      return -1;
>  }
> +
> +
> +/**
> + * virDomainJobFree:
> + * @job: pointer to virDomainJob object
> + *
> + * Frees the memory associated with @job.
> + */
> +void
> +virDomainJobFree(virDomainJobPtr job)
> +{
> +    [...]
> +}
> +
> +
> +/**
> + * virDomainJobList:
> + * @domain: pointer to a domain
> + * @jobs: Pointer to a variable to store the array containing job description
> + *        objects or NULL if the list is not required.
> + * @flags: optional flags (currently unused, callers should always pass 0)

One possible use for flags - as a way to filter on specific job types.
With one flag bit per supported job type, I could do virDomainJobList(,
JOB_LIST_MIGRATE | JOB_LIST_BACKUP).

Another filter might be on whether to list all jobs active or otherwise,
vs. just the jobs that have completed but are not yet cleaned up (if I
only want to know about the jobs that have a pending status waiting for
me to collect).

> + *
> + * Collects a list of background jobs associated with @domain and returns it in
> + * an allocated array of virDomainJobPtr structs. The jobs include migration jobs

s/$/,/

> + * block jobs and any other possibly long running asynchronous operation.
> + *
> + * The caller is responsible for freeing the members of the returned @jobs array
> + * using virDomainJobFree and the whole array using free();

s/;/./

> + *
> + * Returns the number of jobs running on @domain on success (optionally filling
> + * @jobs if non-NULL) or -1 on error (value of @jobs is set to NULL).
> + */
> +int
> +virDomainJobList(virDomainPtr domain,
> +                 virDomainJobPtr **jobs,
> +                 unsigned int flags)
> +{
> +    [...]
> +}
> +
> +
> +/**
> + * virDomainJobGetXMLDesc:
> + * @domain: pointer to a domain
> + * @jobname: name of the domain job to operate on
> + * @flags: optional flags (currently unused, callers should always pass 0)
> + *
> + * Returns a string containing an UTF-8 encoded XML document describing the
> + * configuration, state and progress of domain job @jobname. Please refer to the
> + * job XML documentation for information on the format of the returned document.
> + *
> + * In case of error NULL is returned. Caller is responsible for free()-ing the
> + * returned string.
> + */
> +char *
> +virDomainJobGetXMLDesc(virDomainPtr domain,
> +                       const char *jobname,
> +                       unsigned int flags)
> +{
> +    [...]
> +}
> +
> +
> +/**
> + * virDomainJobControl:
> + * @domain: pointer to a domain
> + * @jobname: name of the domain job to operate on
> + * @op: operation to perform on @jobname
> + * @flags: optional flags (currently unused, callers should always pass 0)
> + *
> + * Requests change of state of @jobname. Note that it depends on the type of
> + * @jobname whether @op is supported.
> + *
> + * VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE are supported only with

s/are/is/

> + * VIR_DOMAIN_JOB_TYPE_BLOCK_COPY and VIR_DOMAIN_JOB_TYPE_BLOCK_COMMIT_ACTIVE.
> + *
> + * VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE and VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME
> + * are supported only with VIR_DOMAIN_JOB_TYPE_BLOCK_* type jobs.
> + *
> + * The request to change state is asynchronous and callers should install an
> + * event callback for VIR_DOMAIN_EVENT_ID_JOB_STATE if they wish to be notified
> + * when the state change occured. (Note that the callback may fire before this

either 'occurs' or 'occurred'

> + * API returns).
> + *
> + * Returns 0 on success or -1 on error.
> + */
> +int
> +virDomainJobControl(virDomainPtr domain,
> +                    const char *jobname,
> +                    virDomainJobControlOperations op,
> +                    unsigned int flags)
> +{
> +    [...]
> +}
> 

Overall seems reasonable. Figuring out how to connect existing jobs
(both domain jobs and block jobs) into the new scheme may be an
interesting effort.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC] New domain job control and stat APIs
Posted by Peter Krempa 4 years, 9 months ago
On Wed, Jul 10, 2019 at 08:38:20 -0500, Eric Blake wrote:
> On 7/10/19 7:27 AM, Peter Krempa wrote:
> > Currently we don't have a consolidated approach for managing
> > asynchronous long-running domain jobs. Historically there were
> > long-running jobs which interlocked with each other and thus there was
> > only one such job possible at given time (migration, save, restore, dump)
> 
> Yes, I agree with the problem statement. In fact, one of my questions
> when starting on incremental backup was whether I'd have to implement
> any of this first, or punt it to later. Well, here we are; it's later,
> and I still don't have incremental backup in.
> 
> > 
> > These jobs have a not very flexible set of APIs:
> > virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.
> > 
> > These don't really allow selecting which job to terminate since there's
> > only one, thus if we wanted to add different kinds of jobs which not
> > necessarily interlock but are able to run in parallel we had to
> > introduce another set of APIs.
> > 
> > This resulted into creation of block job APIs:
> > virDomainBlockJobAbort, virDomainGetBlockJobInfo
> > 
> > These allow parallel jobs (discriminated by disk to which the job
> > belongs) but are not universal and nor allow parallel jobs on a single
> > disk.
> > 
> > Similarly blockjobs can also become detached from the disk e.g. if the
> > guest unplugs the disk fronted. That way the job would linger in a limbo
> > and would not be controllable. (This is certainly a possibility with
> > -blockdev).
> > 
> > With -blockdev we also get a potentially long-running blockdev-create
> > job which is not bound to any disk as part of kicking of a snapshot or
> > block copy job. This one might also get stuck and in the current state
> > is not really controllable.
> > 
> > Additionally the upcomming block-backup job will be a combination of the
> > above. It's a job which spans multiple disks (thus not really a block
> > job in libvirt terminology) but not a domain job either as there
> > can be potentially more than one block backup job. The proposal for
> > block-backup introduces it's own single-purpose set of APIs for managing
> > the backup job only, but abuses the block job and domain job events to
> > distribute the async state updates.
> 
> At the bare minimum, a push job implementation has to send async state
> update events, but a pull job does not. If the backup API goes into
> 5.6.0 but not your new job control APIs, we can limit the qemu
> implementation to JUST pull backups for now (the XML has been shown to
> be extensible to add in push support once we also have sane APIs for
> managing async state updates with push support).
> 
> > 
> > With this series I want to introduce a set of APIs for managing the jobs
> > which are designed to be universal enough and a new event so that noting
> > will try to retrofit onto existing infrastructure.
> > 
> > An example of the job XML would be:
> > 
> > <job type='block-commit-active' state='ready'>
> >   <config>
> >     <disk>vda</disk>
> >     <top>vda[1]</top>
> >     <base>vda[5]</base>
> >   </config>
> 
> Where what forms a valid <config> is dependent on the type='...' of the
> <job>, correct?

Yes exactly.

> 
> >   <stats>
> >    <current>12345</current>
> >    <end>12345</current>
> >   </stats>
> > </job>
> 
> Looks reasonable.
> 
> > 
> > but this will be mostly a topic for the second part of this excercise
> > after we discuss the APIs.
> > 
> > The new infrastructure will also allow adding a flag for all the
> > existing APIs which kick-off a job so that the job will persist even
> > after it finishes. This will also properly implement the statistics for
> > a finished migration and similar.
> > 
> > Obviously we will need to take special care when wiring up these so that
> > the old APIs work for old situations and also the events are reported
> > correctly.
> > 
> > The initial idea would be to implement the stats XML both for the domain
> > jobs (migration, dump) and blockjobs to simplify the job for mgmt apps
> > so that they won't have to infer whether the given job type is already
> > reported in the new API.
> > 
> > Additionally we can also implement flags for the XML getter API that
> > will skip the stats gathering as that may require monitor interactions.
> > Also one possibility would be to return an abbreviated XML in the
> > listing API.
> 
> Makes sense.
> 
> > ---

[...]

> > +typedef enum {
> > +    VIR_DOMAIN_JOB_STATE_NONE = 0, /* unknown job state */
> > +    VIR_DOMAIN_JOB_STATE_RUNNING = 1, /* job is currently running */
> > +    VIR_DOMAIN_JOB_STATE_READY = 2, /* job reached a synchronized state and may be finalized */
> > +    VIR_DOMAIN_JOB_STATE_FAILED = 3, /* job has failed */
> > +    VIR_DOMAIN_JOB_STATE_COMPLETED = 4, /* job has completed successfully */
> > +    VIR_DOMAIN_JOB_STATE_ABORTED = 5, /* job has been aborted */
> > +    [...]
> > +
> > +# ifdef VIR_ENUM_SENTINELS
> > +    VIR_DOMAIN_JOB_STATE_LAST
> > +# endif
> > +} virDomainJobState;
> 
> Not all job types will utilize all of the possible job states, but I
> don't see that as an issue.

Some of the states will be only for the event in the initial
implementation until somebody will request that we add the possibility
to keep the job data also after the job finished until it's dismissed.

This will e.g. allow to figure out job state after libvirtd restart.

> > +typedef struct _virDomainJob virDomainJob;
> > +typedef virDomainJob *virDomainJobPtr;
> > +struct _virDomainJob {
> > +    char *name;
> > +    virDomainJobType type;
> > +    virDomainJobState state;
> > +
> > +    /* possibly overkill? - currently empty*/
> > +    virTypedParameterPtr data;
> 
> Can the XML can provide all the same information as what you would place
> in typed parameters?

Yes, this would merely be a subset. This definitely would not include
stats because you want this API to return quick without any monitor
interaction.

> 
> > +    size_t ndata;
> > +};
> > +
> > +
> > +void virDomainJobFree(virDomainJobPtr job);
> > +
> > +int virDomainJobList(virDomainPtr domain,
> > +                     virDomainJobPtr **jobs,
> > +                     unsigned int flags);
> > +
> > +int virDomainJobGetXMLDesc(virDomainPtr domain,
> > +                           const char *jobname,
> > +                           unsigned int flags);
> > +
> > +typedef enum {
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
> > +
> > +# ifdef VIR_ENUM_SENTINELS
> > +    VIR_DOMAIN_JOB_CONTROL_OPERATION_LAST
> > +# endif
> > +} virDomainJobControlOperation;
> > +
> > +int virDomainJobControl(virDomainPtr domain,
> > +                        const char *jobname,
> 
> Do we want const char *jobname or int jobid?  Which is easier to work
> with? My backup code currently proposed the use of int jobid, switching

I strongly prefer name/string. For programatic consumption it does not
matter much and for human consumption its WAY better since the jobs can
bear descriptive names (e.g. contain the job type in the name).

> to a job name may be more effort. Also, I was able to document that a
> jobid==0 implies "the current backup job, if there is exactly one",
> regardless of what other job id it may have (I guess a jobname==NULL

I definitely do not want to do this kind of magic solution. If a job
finishes and a second one is started while a human is issuing the
command you can attempt to kill a different job.

Also really this would only be an optimization for humans as software
using that will presumably always pass in an explicit identifier.

> would serve the same purpose - but passing around char* instead of int
> means you also have to start worrying about malloc lifetimes).

You get it from the job list. The job listing API is the only way to get
the job name unless you want to retrofit all blockjobs to new APIs which
will start to return the job identifier somehow.

[...]

> > diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
> > index 3d12e7c125..aa5571818f 100644
> > --- a/src/libvirt-domain.c
> > +++ b/src/libvirt-domain.c
> > @@ -12362,3 +12362,97 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
> >      virDispatchError(domain->conn);
> >      return -1;
> >  }
> > +
> > +
> > +/**
> > + * virDomainJobFree:
> > + * @job: pointer to virDomainJob object
> > + *
> > + * Frees the memory associated with @job.
> > + */
> > +void
> > +virDomainJobFree(virDomainJobPtr job)
> > +{
> > +    [...]
> > +}
> > +
> > +
> > +/**
> > + * virDomainJobList:
> > + * @domain: pointer to a domain
> > + * @jobs: Pointer to a variable to store the array containing job description
> > + *        objects or NULL if the list is not required.
> > + * @flags: optional flags (currently unused, callers should always pass 0)
> 
> One possible use for flags - as a way to filter on specific job types.
> With one flag bit per supported job type, I could do virDomainJobList(,
> JOB_LIST_MIGRATE | JOB_LIST_BACKUP).
> 
> Another filter might be on whether to list all jobs active or otherwise,
> vs. just the jobs that have completed but are not yet cleaned up (if I
> only want to know about the jobs that have a pending status waiting for
> me to collect).

Yes I thought about filtering.

[...]

> > + * Collects a list of background jobs associated with @domain and returns it in
> > + * an allocated array of virDomainJobPtr structs. The jobs include migration jobs
> > + * API returns).
> > + *
> > + * Returns 0 on success or -1 on error.
> > + */
> > +int
> > +virDomainJobControl(virDomainPtr domain,
> > +                    const char *jobname,
> > +                    virDomainJobControlOperations op,
> > +                    unsigned int flags)
> > +{
> > +    [...]
> > +}
> > 
> 
> Overall seems reasonable. Figuring out how to connect existing jobs
> (both domain jobs and block jobs) into the new scheme may be an
> interesting effort.

Block jobs will be easy. I'm finishing up my block job tracking refactor
which will register all jobs in a global hash table so a mere
virHashForeach to fill it will be sufficient.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list